Unbiased Approaches to Quantifying Motor and Other Visible Behaviors in Humans with Autism

A photo of a boy using the smartphone AI app BlinkLab, designed to support clinicians in the assessment of autism. SFARI held a workshop to explore advanced quantitative phenotyping methods for capturing visible behaviors, including motor phenotypes; one presentation featured BlinkLab, a platform that uses smartphones to perform reflex-based neurometric assessments. Credit: BlinkLab

Simons Foundation Executive Vice President of Autism and Neuroscience Kelsey Martin introduced the meeting, giving an overview of the Simons Foundation and its programs. She spoke about SFARI’s strategic planning efforts and its focus on the basic science of autism, explaining that those efforts gave rise to the current workshop. She highlighted SFARI’s four human cohorts and the Research Match program that connects researchers with participating families, as well as recent research findings using data from those cohorts. She also highlighted approaches that researchers are using to quantify motor behavior in animals.

SFARI Scientific Officer Amy Norovich further explained the goals and structure of the workshop, which brought together people working in animal models with researchers studying human populations, both with and without autism, to explore advanced quantitative phenotyping methods for capturing visible behaviors, including motor phenotypes.

Norovich noted, for example, that motor differences are common in autism. She proposed that unbiased approaches to quantifying behavior could uncover novel motor signatures of autism, and remarked that other autism-relevant behaviors — such as social interaction and sensory responsiveness — might be captured in the motor output observed in videos. Objective measures of behavior could enable cross-species studies, better parsing of autism heterogeneity, and more informed autism diagnosis and treatment, leading to better understanding of underlying mechanisms and improved patient outcomes.

Quantitative Phenotyping in Animal Models: Methods and Insights

Improving accessibility to quantitative phenotyping in animal models
Matt Whiteway, Ph.D. (Columbia University)

Matt Whiteway discussed current tools for quantitative phenotyping in animal models, highlighting their limitations and strategies for addressing these. He described a common phenotyping pipeline in which animal behavior is captured on video, keypoints are tracked and used for pose estimation, and pose estimations are used for behavioral segmentation. He explained that most tools to carry out this kind of analysis succeed when videos capture a single view of a single animal’s behavior, but often fail when data becomes more complex — such as when videos involve multiple animals or multiple camera views.
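The pipeline Whiteway described can be sketched schematically. The sketch below is a toy illustration with invented function names and random stand-in data, not the API of any particular tool:

```python
import random

# Hypothetical three-stage phenotyping pipeline: frames -> keypoints -> poses -> segments.
# All functions are illustrative stand-ins, not any specific tool's interface.

def track_keypoints(frame):
    """Stand-in keypoint detector: returns (x, y) for a fixed set of body parts."""
    parts = ["nose", "left_paw", "right_paw", "tail_base"]
    return {p: (random.random(), random.random()) for p in parts}

def estimate_pose(keypoints):
    """Flatten keypoints into a pose vector for downstream modeling."""
    return [coord for part in sorted(keypoints) for coord in keypoints[part]]

def segment_behavior(poses, window=5):
    """Toy segmentation: label each window by its total movement magnitude."""
    segments = []
    for start in range(0, len(poses) - window + 1, window):
        displacement = sum(
            abs(a - b)
            for p1, p2 in zip(poses[start:start + window], poses[start + 1:start + window])
            for a, b in zip(p1, p2)
        )
        segments.append("active" if displacement > 1.0 else "still")
    return segments

random.seed(0)
frames = range(20)  # stand-in for video frames
poses = [estimate_pose(track_keypoints(f)) for f in frames]
labels = segment_behavior(poses)
print(len(labels))
```

In real pipelines the detector and segmenter are learned models (for example, Lightning Pose for keypoints), but the data flow (frames to keypoints to pose vectors to behavioral labels) is the same.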

Whiteway stressed the importance of making computer-vision tools broadly accessible. He identified data annotation as a major bottleneck in the analysis pipeline — particularly for tracking animals that are less commonly studied — and noted that the user interface should be a key consideration in software development. Lightning Pose and Lightning Pose 3D, pose estimation tools developed by Whiteway and colleagues, use semisupervised learning to reduce the need for manual annotation and include a user interface for labeling data, training models, running inference and visualizing results.

Computer-vision foundation models can also track animals’ poses without manual annotation. These models are becoming increasingly powerful, but they can be too slow to be practical for labs with large datasets and limited computing power; they are also trained on data that may look very different from the data that researchers want to analyze. Whiteway argued that there is value in customizing these models, training them on relevant data to track specific points and behaviors.

Following the talk, the group discussed strategies for ensuring that computer-vision models are suitable for accurately quantifying relevant behaviors, including in clinical settings. Speakers pointed out the need for robust quality control measures and compatibility with smartphone-captured video.

Exploring behavior using Motion Sequencing
Sandeep “Bob” Datta, Ph.D. (Harvard University)

Bob Datta discussed Motion Sequencing (MoSeq), which his group developed to characterize the structure of mouse behavior. MoSeq uses depth or keypoint data to identify and analyze recurring behavioral modules called behavioral syllables. Datta stressed that the hypothesis that behavior is modular has led to methods that segment behavior into discrete actions. However, some aspects of behavior are fundamentally continuous and should be characterized as such. He pointed out the need to consider the timescale on which behavioral modules are discretized. MoSeq’s behavioral syllables are about 100 to 200 milliseconds long, which corresponds to the timescale of sensory sampling and cortico-basal ganglia loops.

Datta’s lab has linked MoSeq behavioral analyses to neural data, and found activity in the dorsolateral striatum of the basal ganglia that appears to encode the selection and sequencing of behavioral syllables. Datta also showed how MoSeq can characterize and predict the behavior of mice administered different psychoactive drugs. By combining these drug-response profiles with MoSeq descriptions of the behavior of autism mouse models, his team has identified candidate therapeutics that rescue upregulated behavioral syllables in the Cntnap2 mouse model. He acknowledged that large amounts of data have gone into these analyses; it is not clear how much data would be needed to use their models to make predictions about new mouse strains or conditions.

Datta also highlighted ShMoSeq, a tool for identifying higher-order behavioral states, which they have found correspond to animals’ self-directed goals and correlate with neural activity in the prefrontal cortex. He noted that morphological variation between animals, including variation associated with sex and age, can affect the presentation and identification of behavioral syllables. He also shared preliminary data in which his team has used MoSeq to identify facial syllables in people with autism.

High-resolution behavioral phenotyping in rat models of ASD
Bence Ölveczky, Ph.D. (Harvard University)

Bence Ölveczky described the methods his lab has historically used to quantitatively describe animal behavior — beginning with tracking rats’ tattooed paws in 2D using deep-learning pose-estimation tools such as DeepLabCut and SLEAP, and later capturing 3D pose dynamics with multiple cameras and retroreflectors affixed via body piercings, using MotionMapper to identify and describe natural behaviors. Early on, they used this 3D behavioral analysis to characterize the behavior of animal models of autism, revealing distinctive grooming behavior in Fmr1-knockout rats. He suggested that with new methods and further analysis, these data might reveal additional motor phenotypes and their relationship to behaviors observed in human cohorts.

To make 3D pose estimation more versatile and accessible, Ölveczky’s team developed DANNCE (3-Dimensional Aligned Neural Network for Computational Ethology), which obtains precise 3D poses from 2D video without the need for markers, using a convolutional network trained on the team’s 3D motion-capture data as ground truth. While originally developed using multiple camera views, DANNCE can also reconstruct 3D poses from data obtained with a single camera, including an iPhone. DANNCE has been used to track the 3D behavior of a variety of animals, including rats, marmosets and chickadees. The method is noninvasive and can be used to analyze the behavior of young animals for developmental studies.

Ölveczky highlighted opportunities to use DANNCE to study social behavior, explaining that interactions between animals can be treated as a single system and features such as animals’ distance, orientation, and touch can be measured and tracked to parse social behavior.

Discussion
Matt Whiteway, Bob Datta and Bence Ölveczky
Moderated by Amy Norovich

In the discussion following the first set of talks, the group discussed how biases enter computational methods for animal and human behavior phenotyping, and the limits of current pose estimation and behavioral modeling.

Whiteway stressed that unsupervised methods, in which algorithms are not provided with labels when they are trained, are not unbiased. He clarified that all models are inherently biased by design choices, including model architecture, hyperparameters and timescales. He added that while supervised models are trained with labeled data, unsupervised methods also require substantial human labor, with supervision occurring at the output stage. Ölveczky added that significant bias comes from the data that are used to train the model and how it is collected, which should be informed by the research question. Likewise, models can be selected and fine-tuned to different scientific questions to make sense of particular dimensions of data. It is useful to be explicit about a model’s biases, though this is not possible for many neural networks.

The group agreed that pose estimation is not a solved problem. Challenges remain, especially when dealing with occlusions or tracking multiple animals, even when data is captured within the well-controlled environments that are possible for many animal studies. For human research, complex scenes, historical video, suboptimal lighting and occlusions complicate things further. Speakers noted that these factors mean foundation models often fail in precisely the most interesting behavioral situations. Moving from pose tracking to action recognition — especially for clinically meaningful constructs like autism-relevant behaviors — adds an additional level of complexity, particularly as the perception of these behaviors can vary even among trained clinicians. The group also noted the lack of reliable confidence measures for these methods. Other obstacles include limited accessibility and poor integration across tools.

The group discussed metrics and behavioral axes that might generalize across species, such as movement variability across development, low-level motor behaviors and abstract measures of social interactions, such as the degree of coupling between two systems. They discussed the hope that data-driven behavioral axes will eventually map onto mechanistic axes, potentially via joint models uniting neural and behavioral data.

Finally, speakers emphasized the need for modular, maintainable pipelines that can swap in new computer-vision models, as well as the need for sustained funding for software engineering and tool maintenance.

Quantitative Phenotyping in Non-Autism Human Populations: Methods and Insights

Learning the dynamics of human behavior
Gordon Berman, Ph.D. (Emory University)

Gordon Berman talked about the importance of representing behavior as a process in time, pointing out that key information is lost when the description of dynamic behavior is collapsed into a static summary. He explained that understanding behavioral dynamics requires uncovering the hidden processes that drive observable outputs. He advocated for seeking a dynamic whole-body understanding of behaviors like gait, using representations of continuous processes that move forward in time.

Berman explained how his team analyzes human gait by training recurrent neural networks to predict future gait from motion-tracking data collected from patients on treadmills. Examining the networks’ latent space, they use the networks’ internal activations to generate low-dimensional gait signatures for individuals. They have found that able-bodied individuals have similar gait signatures, whereas those of stroke survivors are much more heterogeneous, varying even between individuals with similar clinical diagnoses.
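The idea of reading out a gait signature from a network’s latent activations can be sketched structurally. In the sketch below the recurrent weights are random and untrained, which is only enough to show the data flow; in the approach Berman described, the network is first trained to predict future gait:

```python
import math
import random

# Structural sketch: summarize a walking bout by the time-averaged hidden
# activations of a recurrent network. Weights here are random and untrained
# (illustration only); the real approach trains the network to predict
# future gait before reading out its latent space.

random.seed(1)
N_IN, N_HIDDEN = 4, 8  # joint-angle inputs per frame and hidden units (assumed sizes)
W_in = [[random.uniform(-0.5, 0.5) for _ in range(N_IN)] for _ in range(N_HIDDEN)]
W_rec = [[random.uniform(-0.3, 0.3) for _ in range(N_HIDDEN)] for _ in range(N_HIDDEN)]

def rnn_hidden_states(frames):
    """Run an Elman-style recurrence, returning the hidden state at each frame."""
    h = [0.0] * N_HIDDEN
    states = []
    for x in frames:
        h = [math.tanh(sum(W_in[i][j] * x[j] for j in range(N_IN)) +
                       sum(W_rec[i][j] * h[j] for j in range(N_HIDDEN)))
             for i in range(N_HIDDEN)]
        states.append(h)
    return states

def gait_signature(frames):
    """Low-dimensional signature: mean hidden activation over the bout."""
    states = rnn_hidden_states(frames)
    return [sum(s[i] for s in states) / len(states) for i in range(N_HIDDEN)]

# Invented joint-angle traces for a 60-frame walking bout.
bout = [[math.sin(t / 3 + phase) for phase in range(N_IN)] for t in range(60)]
signature = gait_signature(bout)
print(len(signature))
```

Because the hidden state is bounded and averaged over time, the signature is a fixed-length vector regardless of bout duration, which is what makes comparisons across individuals possible.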

Berman also talked about the dynamic nature of social interactions. He described his team’s approach to analyzing interactions between mothers and infants, using temporal convolutional autoencoders and surrogate signals. Their preliminary analyses, using data from both screen-mediated and face-to-face interactions, show differences in the behavior of typically developing infants and those who were later diagnosed with autism.

During the Q&A, Berman noted that generative models based on long short-term memory (LSTM) networks offer better biomechanical and neuromechanical interpretability than categorical models like MotionMapper. He also addressed the challenge of phenotyping behaviors that change over time, noting that decisions about longitudinal boundaries should be guided by biological and developmental knowledge.

Inferring individuals’ perceptual, cognitive and motor properties from naturalistic behavior with inverse modeling
Constantin Rothkopf, Ph.D. (Technical University Darmstadt)

Constantin Rothkopf described a sensorimotor control framework for modeling naturalistic human behavior that incorporates state-dependent sensory uncertainties, time-dependent representational uncertainties and signal-dependent action uncertainties. In this framework, perception, decision-making and motor behavior are intertwined processes. Using inverse modeling customized to describe a broad range of different sensorimotor tasks, his group infers latent variables like perceptual uncertainty, internal effort costs and beliefs about task dynamics at the individual level. This allows them to quantify individual differences in sensorimotor behavior and to attribute these differences to meaningful cognitive quantities.

Rothkopf described three tasks in which this quantitative framework can successfully predict and explain interindividual differences in behavior. He started with the most frequent human visual behavior — blinking — and showed how people strategically time blinks to balance the need to blink against the risk of missing important visual stimuli. In a laboratory test, subjects adjusted their blink timing as they developed beliefs about when a relevant image would appear. Rothkopf’s group’s inverse models parameterize individuals’ beliefs about the distribution of these events and the trade-off between blinking and task performance. Their model explains variations in interblink intervals among individuals, showing that the behavior is shaped by how much an individual prioritizes task performance.
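The trade-off Rothkopf described can be illustrated with a toy expected-cost model. The cost structure and all weights below are invented for illustration and are not the group’s actual model:

```python
# Toy expected-cost model of interblink-interval choice (illustrative only:
# the cost structure and all weights are invented, not Rothkopf's model).
# The agent trades off ocular discomfort, which grows the longer it
# suppresses blinking, against the chance of missing task events while
# its eyes are closed.

BLINK_DURATION = 0.3  # seconds the eyes stay closed per blink (assumed)

def expected_cost(interval, urge_weight, miss_weight):
    discomfort = urge_weight * interval                      # grows as blinks are delayed
    fraction_closed = BLINK_DURATION / (interval + BLINK_DURATION)
    return discomfort + miss_weight * fraction_closed        # misses scale with eyes-closed time

def best_interval(urge_weight, miss_weight):
    candidates = [i / 100 for i in range(10, 1001)]          # grid from 0.1 s to 10 s
    return min(candidates, key=lambda t: expected_cost(t, urge_weight, miss_weight))

casual = best_interval(urge_weight=1.0, miss_weight=10.0)    # low task priority
diligent = best_interval(urge_weight=1.0, miss_weight=50.0)  # high task priority
print(casual, diligent)
```

Raising the miss penalty, a stand-in for how much an individual prioritizes task performance, lengthens the optimal interblink interval, qualitatively matching the individual differences described above.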

Rothkopf’s team has also used inverse modeling to infer individuals’ sensory sensitivity, beliefs about a task and costs of motor control during a continuous psychophysics task in which individuals track a target through a noisy environment. This newly introduced experimental paradigm is particularly promising in clinical settings and with children, as it requires a relatively small amount of data and is more engaging than conventional methods for measuring sensory sensitivity.

Rothkopf’s final example applied the framework to an often-used virtual-reality landmark-based navigation task, where experimenters can manipulate subjects’ available landmark and self-motion cues. He showed how modeling human sensorimotor control as belief-space planning explains subjects’ navigational strategies. This model is able to reproduce data from multiple experiments and labs, including how people move their eyes, heads and bodies; their errors; and variability in navigation. Thus, these methods can be used to quantify individual differences in navigation behavior, a fundamental everyday behavior, in terms of different cognitive properties.

Phenotypic development: Why, when, what, where, how
Karen Adolph, Ph.D. (New York University)

Karen Adolph discussed approaches for studying phenotypic development in the context of basic science research. For studies of autism, she argued that it is most important to study infants, in order to learn about the early development of the condition. She recommended using time-locked videos that capture both behavior and context, and described the ways her team uses video to study behavioral development in infants.

Adolph noted the advantages and limitations of fixed, roving and wearable cameras, comparing angles, occlusions and the ability to capture environmental context, which is crucial for interpreting infants’ behavior. She illustrated how infants’ developing motor skills reshape what they see, touch and do, and how this affects opportunities for learning. She noted advantages to studying behavior in both home and lab settings, explaining that homes capture everyday richness while labs offer more control over the environment, and pointed out that families are often more willing to consent to video that is captured at home. She also compared naturalistic and structured tasks, illustrating how each can reveal developmental trajectories when paired with rich video and computer vision.

Adolph also highlighted Databrary, a secure web-based video library for sharing and reusing developmental research videos, enabling researchers to leverage existing resources and scale up behavioral analyses. Tools for annotation and analysis are integrated into Databrary. Access is open but controlled, and videos are shared with participant consent.

The Storyline Health AI platform: Potential for scalable autism research and care
Chris Gregg, Ph.D. (University of Utah)

Chris Gregg introduced Storyline Health, an AI platform for adaptive, algorithmic care, explaining how it can help guide the design of chemotherapy regimens, physical therapy programs and mental health management. Storyline uses assessments that can be completed on a smartphone to collect behavioral data, tracking features such as facial expressions, gaze, motor behavior, speech content and prosody. Videos collected through Storyline are analyzed with AI models within a secure cloud environment, generating knowledge that can be used to adjust and update treatment. These models are continuously updated and can be guided by expert-curated literature to ensure accuracy and clinical relevance. The team is also exploring building foundation models for human behavior with its multimodal data.

To support patients and encourage them to use the platform, features like symptom monitoring, patient education and behavioral assessments are integrated with an AI assistant that can manage users’ daily tasks. Data is managed in a HIPAA- and Systems and Organizational Controls-compliant system, and users retain control over their data.

Gregg emphasized Storyline’s flexibility, which allows users to make and share their own assessments, and its capacity for deep behavioral phenotyping. He proposed that the platform could be deployed for autism research, suggesting that its AI agent, Cleo, could be used to study social interactions, while also providing support for autistic individuals as a social partner and coach. He clarified that further validation was needed to support clinical use, and the Storyline team is working to become established in spaces outside of medicine, such as supporting self-guided care, before pursuing clinical trials.

Discussion
Gordon Berman, Constantin Rothkopf, Karen Adolph and Chris Gregg
Moderated by Paul Wang, Ph.D., Simons Foundation

The discussion addressed questions about tailoring data collection and modeling strategies to different goals and populations, the importance of internal states for characterizing and interpreting behavior, and the potential for cross-species studies to inform human research.

Participants talked about the richness of data that can be captured with wearable sensors, contrasting these with more accessible and scalable smartphone data. The breadth of data captured by these tools was noted to be valuable; Adolph pointed out that researchers do not always know where to look for meaningful behavioral signals, citing as an example the caregiver input that shapes language development early in life.

In a discussion about gait analysis, Adolph pointed out that infants rarely use a periodic gait, instead using a variety of modes of locomotion that are easily recognizable to a human observer but poorly tracked with computer vision. She also noted that humans are better able than AI models to infer when an infant’s locomotion is goal-directed. Others responded that computer-vision models could be trained to track these kinds of movements, and that inverse modeling makes it possible to infer internal states and to better understand the measures that inform them. Such inferences would be valuable in interpreting behavior, such as recognizing potential motor impairments or discriminating between motor variability and uncertainty in an individual’s internal models.

Considering the relative advantages of structured tasks and naturalistic behavior, Adolph suggested that structured tasks are most appropriate for diagnosis or evaluating interventions, whereas naturalistic studies are necessary to understand real‑life behavior. Wilson argued that both are needed and both should be pursued simultaneously, with structured tasks being valuable for comparing typical with atypical behavior, and naturalistic tasks offering insight into how people move and adapt in variable environments.

Megan Carey, Ph.D., of the Champalimaud Foundation, proposed using video corpora of ADOS assessments for behavioral analysis, noting the opportunity to use findings to refine the assessment tool for more meaningful diagnoses. LeeAnne Green Snyder, Ph.D., from SPARK, Boston Children’s Hospital and Dartmouth Health, clarified that although ADOS scoring isn’t based on neurotypical norms, many neurotypical children have completed the assessment. Wang pointed out that while there are many recordings of both autistic and neurotypical children completing the assessment, consent limits broad sharing, and video quality is often not good. Berman added that his team has found little correlation between ADOS score and the structure of infant-caregiver interactions.

In a discussion about cross‑species studies, Berman talked about the importance of understanding evolutionary biology in identifying homologous behaviors, structures and neurobiology, and the value of an interdisciplinary approach. Gregg talked about opportunities to identify behavioral fingerprints through perturbations of internal states in model organisms, then look for these in humans. Berman also noted that his group’s gait analysis has been applied in flies, where optogenetic perturbation linked specific somatosensory neurons to gait control.

Quantitative Analysis in Autism Populations: Methods and Insights

Neurobehavioral assessment of sensorimotor function in autism using smartphone technology
Henk-Jan Boele, M.D., Ph.D. (Erasmus University Medical Center; BlinkLab; Princeton University)

Henk-Jan Boele talked about BlinkLab, which uses smartphones to perform reflex-based neurometric assessments, such as pre-pulse inhibition (PPI), habituation and eye-blink conditioning. The BlinkLab app delivers precisely timed sensory stimuli while recording facial responses, capturing audio at the same time. Computer vision tracks movements of participants’ eyelids, mouth, eyebrows and head. Participants complete tests at home, which Boele’s team has found improves data quality. Researchers have the flexibility to design their own experiments, and testing can be done over multiple sessions or days so researchers can study learning and neuroplasticity. Data are stored in a secure database.

Boele described potential clinical applications for BlinkLab, highlighting the need for earlier diagnosis and intervention in autism. In a case-control study involving 536 children at autism centers in Morocco, he and colleagues used BlinkLab to compare PPI and startle habituation in autistic and neurotypical individuals. They found that whereas a pre-pulse leads to strong inhibition of the startle blink in neurotypical children, autistic children were less likely to exhibit PPI. Many autistic children exhibited pre-pulse excitation instead, reacting more strongly to a second stimulus than to the first. Habituation of the startle reflex was stronger in children with autism than in neurotypical children, due to the high amplitude of their initial response. In contrast to children with autism, the team has found elevated responsivity to pre‑pulses in children with ADHD. The team is also comparing vocal responses and hand and head movements in autistic and neurotypical children recorded during the same tasks.
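PPI is conventionally quantified as the percent reduction of startle amplitude on pre-pulse trials relative to pulse-alone trials; negative values correspond to the pre-pulse excitation described above. A minimal sketch with invented amplitudes:

```python
def ppi_percent(pulse_alone_amp, prepulse_pulse_amp):
    """Percent pre-pulse inhibition of the startle reflex.

    Positive values mean the pre-pulse inhibited the startle blink;
    negative values indicate pre-pulse excitation (a stronger response
    when the pre-pulse is present).
    """
    return 100.0 * (1.0 - prepulse_pulse_amp / pulse_alone_amp)

# Invented example amplitudes in arbitrary eyelid-closure units:
typical = ppi_percent(pulse_alone_amp=1.0, prepulse_pulse_amp=0.4)     # strong inhibition
excitation = ppi_percent(pulse_alone_amp=1.0, prepulse_pulse_amp=1.3)  # excitation
print(round(typical), round(excitation))
```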

Quantitative analysis of gait phenotypes in individuals with nongenetic and genetic autism
Rujuta Wilson, M.D. (University of California, Los Angeles)

Rujuta Wilson described motor impairments that impact some children with autism and explained her lab’s approach to characterizing gait and other motor phenotypes. Her team uses a pressure-sensor gait mat and computer-vision technology to derive spatiotemporal measures of pace, postural control and gait variability. She pointed out factors that can influence these assessments in autistic children, including caregiver hand-holding, variations in speed, toe-walking and repetitive behaviors — and noted how her team has developed approaches to control for these.
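Spatiotemporal gait measures of the kind Wilson described can be computed directly from footfall timestamps and positions. The sketch below is a generic illustration with invented toddler footfall data, not her team’s pipeline:

```python
from statistics import mean, stdev

# Generic spatiotemporal gait measures computed from footfall data of the kind
# a pressure-sensing gait mat records (timestamps in seconds, positions in cm
# along the direction of travel). Illustrative sketch only.

def gait_measures(footfall_times, footfall_positions):
    step_times = [t2 - t1 for t1, t2 in zip(footfall_times, footfall_times[1:])]
    step_lengths = [p2 - p1 for p1, p2 in zip(footfall_positions, footfall_positions[1:])]
    duration = footfall_times[-1] - footfall_times[0]
    distance = footfall_positions[-1] - footfall_positions[0]
    return {
        "velocity_cm_s": distance / duration,
        "cadence_steps_min": 60 * len(step_times) / duration,
        "mean_step_length_cm": mean(step_lengths),
        # Coefficient of variation of step time, a common gait-variability index.
        "step_time_cv_pct": 100 * stdev(step_times) / mean(step_times),
    }

# Invented footfalls: a toddler taking five steps over about 2.6 seconds.
times = [0.0, 0.5, 1.1, 1.6, 2.0, 2.6]
positions = [0.0, 18.0, 34.0, 52.0, 70.0, 90.0]
measures = gait_measures(times, positions)
print({k: round(v, 1) for k, v in measures.items()})
```

Pace corresponds to velocity and cadence, while the coefficient of variation of step time is one way to index the gait variability mentioned above.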

Wilson described her team’s efforts to characterize the earliest onset of walking in toddlers, some of whom were later diagnosed with autism. They found that in the first three years of life, autistic toddlers displayed gait patterns that are distinct from those of their typically developing peers, particularly their slower pace. Wilson stressed that comparisons were made using both chronological and mental age–matched comparison groups, suggesting that gait differences in children with autism are not associated with cognitive ability.

Wilson also highlighted studies of children with 15q duplication (Dup15q) syndrome, who have high rates of motor impairments and autism. Although standard motor assessments found few differences among individuals, the lab’s quantitative gait assessment found significant deficits and identified markers that differentiate Dup15q from both idiopathic autism and neurotypical controls. These metrics can be utilized across conditions to identify deficits in gait and postural control. Wilson pointed out the translational nature of these methods, and parallels in their findings in children and studies of a mouse model of Dup15q, including a wide stance and slower velocity.

Wilson described her team’s work adapting and validating a computer-vision pose-estimation pipeline against gold-standard methods for use in pediatric populations, noting that this validation is needed because pediatric populations pose distinct challenges for these techniques. Wilson demonstrated the ability to conduct scalable longitudinal phenotyping of key motor domains with this approach. Her team is also using computer vision to assess upper-extremity motor function in individuals with genetic forms of autism. She also described her team’s work using wearable sensors to capture infant motor skills longitudinally in naturalistic environments and its progress developing metrics that improve earlier prediction of a later autism diagnosis.

Quantifying autistic behaviors during clinical assessments with computer vision and AI
Ilan Dinstein, Ph.D. (Ben-Gurion University)

Ilan Dinstein talked about the difficulty in measuring autism severity, noting frequent disagreement between scores on different assessment tools. He argued that digital phenotyping efforts should focus on quantifying specific behaviors rather than overall autism severity — particularly those that have the most impact on autistic people and their families. He recommended breaking broad domains like social communication and restricted/repetitive behaviors into concrete measurable units.

Dinstein detailed the approach his team has used to quantify stereotypical movements and social communication behaviors during ADOS assessments, using a large library of video recorded with multicamera, multimicrophone setups at autism centers in Israel. To identify stereotypical movements, which can be very diverse, his team first trained an object-detection model (YOLO) to identify the child in each video. They used OpenPose to extract full-body skeletons. Then, clinician-trained students manually annotated more than 5,000 stereotypical movements from hundreds of children, which were used to fine-tune a PoseC3D action-recognition network. The algorithm identifies video segments with stereotypical movements with more than 90 percent recall and 68 percent precision. Dinstein’s group is using the same pipeline to identify social communication behaviors like pointing, showing and headshakes. He pointed out the potential to use the same tools to analyze video collected outside clinical settings and the value of tracking quantitative measures of behavior over time.
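Recall and precision figures like those reported can be understood with a simple frame-level evaluation. The label sequences below are invented for illustration:

```python
# Frame-level precision and recall for detected stereotypical-movement
# segments. The label sequences are invented; the actual evaluation uses
# clinician-annotated video segments.

def precision_recall(predicted, annotated):
    tp = sum(1 for p, a in zip(predicted, annotated) if p and a)
    fp = sum(1 for p, a in zip(predicted, annotated) if p and not a)
    fn = sum(1 for p, a in zip(predicted, annotated) if a and not p)
    return tp / (tp + fp), tp / (tp + fn)

# 1 = frame inside a stereotypical-movement segment, 0 = outside.
annotated = [0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0]
predicted = [0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0]
precision, recall = precision_recall(predicted, annotated)
print(round(precision, 2), round(recall, 2))
```

An operating point with high recall and lower precision, like the one reported, catches most annotated movements at the cost of some false positives.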

Following the talk, workshop participants discussed the pipeline’s ability to detect subtle examples of stereotypical movements and the possibility of using unsupervised methods to detect such movements as deviations from normative motor behavior.

Into the wild: Automated quantification of autism-relevant behaviors from naturalistic home videos
Rachel Reetzke, Ph.D. (Johns Hopkins University School of Medicine; Kennedy Krieger Institute)

Rachel Reetzke described how her team is using AI-driven techniques, such as computer vision, to quantify early autism-relevant behaviors from home videos. She stressed the need for scalable, valid measures that can be tolerated by infants and toddlers, and the importance of measuring autism-relevant behaviors that change over the course of development.

In a fully remote study that recruited research participants across the United States through SPARK Research Match, Reetzke and her team collected data from 147 children with and without autism. Parents submitted questionnaires and home videos of children at approximately 14 and 36 months, and children participated in remote language assessments. About 35 hours of video were submitted and underwent manual quality checks, metadata extraction and extensive manual annotation of behavior to establish a “ground truth” dataset. Consistent with the prospective autism literature, the team detected significant differences between autistic and non-autistic groups in their use of gestures, vocalizations, repetitive behaviors and response to name. They also saw significant differences between groups in grasping ability, but did not find significant group differences in locomotion. Cluster-based analyses revealed higher- and lower-skill trajectories, separated by different features at 14 and 36 months.

Reetzke emphasized the significant labor required to manually annotate behavior, further underscoring the need for automated quantification approaches. To accelerate this work, her group has partnered with Satrajit Ghosh at the Massachusetts Institute of Technology to develop a quantitative phenotyping pipeline that integrates multiple computer-vision approaches to track children’s body movements and analyze how those movements unfold over time. By combining these complementary streams of information, the team is able to automatically identify autism-relevant behaviors like repetitive motor movements, gesture use and response to name. They are also developing a family-friendly app to support scalable data collection and automate quality checks.

Having demonstrated proof of concept and developed new tools, the group now aims to use their approach to support large-scale prospective remote phenotyping.

Discussion
Henk-Jan Boele, Rujuta Wilson, Ilan Dinstein and Rachel Reetzke
Moderated by Jennifer Foss-Feig, Ph.D. (Simons Foundation)

The group discussed opportunities to uncover patterns of behavior in autistic children that have not yet been clinically described, potentially enabling earlier diagnosis and intervention. Wilson talked about the power of characterizing motor patterns and trajectories in autistic children that may underlie broad developmental outcomes. Reetzke pointed out the opportunity to use quantitative methods to detect kinematic differences that are difficult to describe or too subtle to see by eye, such as differences in hand-flapping frequency and amplitude, head-movement variability and postural control, but that could help clinicians understand and predict individuals’ developmental trajectories.

Dinstein favored focusing on clinically relevant behavioral dimensions that can inform treatment and lead to better outcomes for autistic individuals and their families. Wilson suggested that identifying early motor signatures of autism could create opportunities to introduce early motor therapies that might broadly impact development.

Wilson suggested that while machine learning models are good at identifying severe autistic behaviors, there are many areas where better tools are needed, including for children with broad developmental delays, individuals who are profoundly impacted by autism and younger children. She also talked about the tension between developing approaches to identify behaviors that are difficult for a clinician to pick up, and the need to associate these with clinically validated measures.

The group also considered how behaviors that are useful diagnostically — which might not be those directly targeted by interventions — might be used to stratify populations and guide interventions. Wilson talked about the opportunity to identify behavioral signatures that predict developmental trajectories for phenotypes like autism severity, language development, and cognitive ability, and to use these to guide early interventions. Reetzke pointed out the value of identifying and automating outcome measures for clinical trials and of having markers to track development and treatment response.

The group considered the value of foundational models for pose estimation and action segmentation. Liam Paninski, Ph.D., of Columbia University, recommended labeling and fine-tuning pose estimation models to improve results, which is commonly done in animal research. R. James Cotton, M.D., Ph.D., from Northwestern University, argued that models for human pose estimation have already been trained on massive datasets, and so the gains from additional labeling would, in most cases, be minimal. It was also pointed out that fine-tuning these foundational models risks overfitting to a particular dataset. Dinstein pointed out that while foundational models do a good job identifying facial landmarks, there was a lot of room for improvement in identifying facial expressions and emotions, for which manual labeling would be useful. Talmo Pereira, Ph.D., of the Salk Institute, argued for bypassing pose estimation and developing models that recognize actions directly from pixels.

Carey argued that the real challenge is to glean biological insights from behavioral data, and urged the group to focus on higher‑level conceptual questions. Some speakers agreed that for many clinically relevant questions, granular biomechanical data is not needed. Some argued for more neural data to pair with behavioral data in pursuit of mechanistic insights, while Cotton advocated for using biomechanics to develop physics simulations.

The group also discussed the challenge of distinguishing autism from other neurodevelopmental conditions, like ADHD and intellectual disability, and the importance of avoiding low-specificity tools. Dinstein argued for focusing less on categorical diagnoses and more on clinically relevant behavioral dimensions that can directly guide treatment.

Group Activities

On the second day of the workshop, participants met in small groups, tasked with outlining an approach that capitalizes on new technologies to better quantify behavior in humans with autism. SFARI Vice President and Senior Scientific Officer Jennifer Foss-Feig introduced the activity, asking participants to consider potential next steps, opportunities that could be unlocked with new technologies or methods, and how SFARI cohorts might be leveraged in data-collection efforts. Foss-Feig and Martin spoke about opportunities to add rich behavioral data to SPARK, and the group discussed how better quantification of behavior could allow researchers to parse autism’s heterogeneity and lead to mechanistic insights.

Following the small group work, the full group discussed concerns about family burden and the need to design studies and tools that support, not exhaust, caregivers. They noted that thoughtful consideration of study design, including investing in easy-to-use tools for collecting and sharing data, can help minimize burden. It was also pointed out that study participation should offer value to participants as well as researchers.

Wilson stressed that behavioral studies have the potential to reduce burden on families of autistic individuals in the long term by identifying diagnostic tools that could reduce or eliminate the need for lengthy standardized assessments. Many families see this as valuable, which should be factored in when weighing the benefits and burden of study participation. Cotton pointed out that many rehabilitative technologies fail because usability and patient acceptance are afterthoughts in their development. He advocated for a significant investment in engineering to ensure a positive user experience that would support the adoption of relevant technologies by study participants.

The group also considered how to balance immediate and longer-term priorities, debating how broad behavioral data collection should be. Rothkopf argued that the current focus should be on developing and validating methods for robust measurement and analysis of behavior. Berman added that significant methodological advancement should be anticipated in the near future, and thus data collection strategies should be designed accordingly. Ölveczky argued for targeted data collection focused on data useful for addressing specific questions about autism. Lisa Yankowitz, Ph.D., of Children’s Hospital of Philadelphia, suggested that as technology progresses, it would be valuable to collect norming data for behavioral assessments.

Carey and Wilson noted that motor function impacts communication, social interactions and daily functioning. Gait, another speaker said, is a useful biomarker for motor control strategy more broadly. Reetzke made the point that fine-motor abilities have been found to be more predictive of later outcomes than gross motor function. Wilson noted that the majority of findings on gross motor function are based on standardized developmental assessments. Rothkopf added that motor behaviors can also reflect sensory perception, internal modeling, planning, and other processes. Nanthia Suthana, Ph.D., of Duke University, added that gait, for example, is a good bridge between animal and human studies, and explained how neural signatures across species can be bridged with field potentials and population coding.

Closing Discussion

Moderated by Amy Norovich, Jennifer Foss-Feig, and Paul Wang

To begin the discussion, the group considered which elements could not be compromised if scientific rigor and phenotyping goals were to be preserved, while remaining mindful of feasibility and participant burden. Pereira stressed the importance of obtaining rich metadata, including time stamps, at the time data is collected, which can be achieved with software engineering and a well-designed user interface. Wilson advocated for including “ground truth tasks” alongside naturalistic tasks, to understand the relationship between new and established behavioral measures. Another priority was preserving autism’s heterogeneity, enabling better understanding of phenotypic variability. Carey added that it will be vital to preserve raw data, retaining all of its features for future analysis.

The group also considered privacy, consent and other issues related to data sharing. Participants agreed that it was important to be clear with study participants about how and by whom their data might be used — whether for academic or commercial purposes, for example, and whether it would be used to train AI — and that they should be able to consent to different levels of data sharing. Berman advocated for seeking legal and ethical advice about consenting participants to data sharing, particularly with respect to AI.

The group discussed the trade-off between retaining and sharing raw video versus derived features. Most agreed that although researchers are generally more interested in derived features, it was important to preserve raw data — to allow for further validation of those features, to preserve features that may be important but were not initially recognized as such, and to make it possible to revisit the data with new methods in the future.

With respect to equity and inclusion, speakers noted that collecting and sharing large video files is not practical or accessible for all families. Efficient video encoding, providing devices and connectivity, and the development of simple low‑burden apps for data collection and sharing can enable wider participation — some of which can be done on a case-by-case basis, rather than study-wide. It was also noted that to protect vulnerable families, participants should be able to decide which videos they share. Several speakers noted that offering participants the option to delete or choose not to share specific videos can improve their comfort and willingness to consent.

Wilson highlighted the need to consider not only how video will be shared but also data from wearable sensors, which study participants might be more willing to provide. She recommended developing a clear pipeline that includes documenting details of how data has been collected, including which devices were used and where they were placed.

In a discussion about how to establish ground truth in the face of autism’s heterogeneity, the group acknowledged that there is more than one kind of ground truth, and that new behavioral data could allow for the discovery of new, more meaningful ground truths. With respect to ground truth diagnoses of autism, many advocated for relying on established clinical measures.

Several speakers pointed out that heterogeneity in autism will likely be useful in uncovering underlying mechanisms, and this complexity should be embraced. It was pointed out that clinicians would also like to better understand autism heterogeneity, particularly in order to better understand and predict developmental trajectories.
