Tools for Automated Assessment of Language

Language and communication problems are core features of a number of neurodevelopmental disorders, including Developmental Language Disorders (DLD) and Autism Spectrum Disorders (ASD).

It is increasingly recognized that assessment should include spontaneous natural language samples. However, established measures of such samples, such as the Index of Productive Syntax (IPSyn), are laborious to apply because they require manual analysis of a corpus of sentences collected from the child. There is therefore a strong need for automated tools to aid clinicians in this task and to provide more robust analyses for assessment.
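The final step of IPSyn scoring is a tally: in the usual convention, each syntactic structure on the scale earns one point per distinct exemplar found in the sample, up to a maximum of two. The following minimal C++ sketch shows only that tallying step, assuming a hypothetical upstream analyzer has already identified which structures occur in each utterance. The structure IDs (such as "N1"), the IpsynScore type and the ScoreSample function are illustrative names, not part of any existing tool; recognizing the structures in the first place is the laborious part that the rest of an automated system would have to handle.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical output of an upstream analyzer: for each utterance, the IDs
// of the IPSyn structures detected in it (e.g. "N1", "V5"). Assumption: the
// analyzer reports only exemplars distinct enough to earn credit.
using UtteranceStructures = std::vector<std::string>;

struct IpsynScore {
  std::map<std::string, int> item_points;  // points credited per structure
  int total = 0;                           // sum of all item_points
};

// Credits each structure once per reported exemplar, capped at max_credit
// (two in the usual IPSyn convention).
IpsynScore ScoreSample(const std::vector<UtteranceStructures>& sample,
                       int max_credit = 2) {
  IpsynScore score;
  for (const UtteranceStructures& utterance : sample) {
    for (const std::string& id : utterance) {
      int& points = score.item_points[id];  // starts at 0 for new IDs
      if (points < max_credit) {
        ++points;
        ++score.total;
      }
    }
  }
  return score;
}
```

For example, a sample in which "N1" is detected three times and "V1" and "Q1" once each would score 2 + 1 + 1 = 4, because the third "N1" exemplar earns no further credit.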

Many widely used commercial software packages, such as the Systematic Analysis of Language Transcripts (SALT), do little more than count linguistic features, and they still require someone to code those features by hand. Fully automated systems, such as Computerized Profiling (CP), do exist, but reports of how well they work vary. Recent advances in natural language processing, in particular in grammar adaptation, now make it possible to produce high-quality commercial software that clinical practitioners can use in their assessment of child language.

This project will build a commercially viable suite of fully automated, software-based tools, requiring no special equipment beyond a standard personal computer, for clinicians who work with children with neurodevelopmental disorders. The system will be evaluated not only on data from children with Typical Development (TD) but also on data from children with ASD and DLD.

The system will include the following components: (1) text normalization tools to help clean up and normalize transcriptions; (2) a state-of-the-art part-of-speech tagger; (3) a state-of-the-art morphological analyzer; (4) a state-of-the-art syntactic parser; (5) a dependency analyzer and semantic-role labeler; and (6) a scoring module that maps the output of the language analysis onto the IPSyn or other scales.
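The listed components lend themselves to a pipeline in which each stage enriches a shared analysis record that the scoring module finally reads. The following C++ sketch shows one possible way to express that architecture; the Analysis record, the Stage interface, the Pipeline class and the ToyNormalizer are hypothetical names chosen for illustration, and only a trivial normalizer is filled in.

```cpp
#include <cctype>
#include <memory>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record shared by all components; each stage fills in the
// fields it is responsible for.
struct Analysis {
  std::string raw_transcript;
  std::string normalized;             // set by text normalization (1)
  std::vector<std::string> tokens;    // set by text normalization (1)
  std::vector<std::string> pos_tags;  // would be set by the tagger (2)
  // Morphology, parses, dependencies, semantic roles and scores would be
  // further fields filled by components (3) to (6).
};

// Interface assumed for every component in the chain.
class Stage {
 public:
  virtual ~Stage() = default;
  virtual void Process(Analysis& analysis) const = 0;
};

// Runs the registered stages, in order, over one transcript.
class Pipeline {
 public:
  void Add(std::unique_ptr<Stage> stage) {
    stages_.push_back(std::move(stage));
  }

  Analysis Run(const std::string& transcript) const {
    Analysis analysis;
    analysis.raw_transcript = transcript;
    for (const auto& stage : stages_) stage->Process(analysis);
    return analysis;
  }

 private:
  std::vector<std::unique_ptr<Stage>> stages_;
};

// Toy normalizer: lowercases the transcript and splits it on whitespace.
// A real normalizer would also handle transcription conventions such as
// fillers, false starts and unintelligible segments.
class ToyNormalizer : public Stage {
 public:
  void Process(Analysis& analysis) const override {
    analysis.normalized.clear();
    for (unsigned char c : analysis.raw_transcript) {
      analysis.normalized += static_cast<char>(std::tolower(c));
    }
    analysis.tokens.clear();
    std::istringstream in(analysis.normalized);
    std::string token;
    while (in >> token) analysis.tokens.push_back(token);
  }
};
```

In this design, a tagger or parser would be added as another Stage subclass, so the scoring module depends only on the shared record and not on how any individual component is implemented.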

The software will be written in C++ and Java and will follow a rigorous industrial coding style, namely the Google style guide, which is publicly available.

As part of an NIH-funded project on autism, the Center for Spoken Language Understanding, in collaboration with colleagues at Yale University, has collected a corpus of video- and audio-recorded interactions with children with ASD and DLD, as well as TD children, aged 4 to 8. In this STTR-funded project, we will perform a manual IPSyn assessment of these data and use the results as a benchmark of how well the system can be expected to perform in the field.

While English will be the target language for initial development, the software will be written in a fully language-independent fashion: no properties of English will be hard-coded into the system. All that would be required to 'port' the system to a new language is appropriate training material for that language.
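One way to meet this requirement is for every component to learn its behavior from data. The following deliberately simple C++ sketch illustrates the idea with a most-frequent-tag baseline, swapped in here for clarity; it is not the state-of-the-art tagger the project describes. Nothing in the code refers to English, so switching languages means calling Train on tagged text in the new language. The class name, the fallback-tag parameter and the tag labels used in the example are assumptions for illustration.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Baseline, data-driven tagger: each word receives the tag it carried most
// often in the training data. No language-specific rules are encoded.
class MostFrequentTagTagger {
 public:
  using TaggedSentence = std::vector<std::pair<std::string, std::string>>;

  explicit MostFrequentTagTagger(std::string fallback_tag)
      : fallback_tag_(std::move(fallback_tag)) {}

  // Counts (word, tag) pairs and keeps the most frequent tag per word.
  // Ties go to the alphabetically first tag, since std::map is ordered.
  void Train(const std::vector<TaggedSentence>& corpus) {
    std::map<std::string, std::map<std::string, int>> counts;
    for (const auto& sentence : corpus) {
      for (const auto& [word, tag] : sentence) ++counts[word][tag];
    }
    best_tag_.clear();
    for (const auto& [word, tag_counts] : counts) {
      std::string best;
      int best_count = 0;
      for (const auto& [tag, n] : tag_counts) {
        if (n > best_count) {
          best = tag;
          best_count = n;
        }
      }
      best_tag_[word] = best;
    }
  }

  // Tags each word; words unseen in training get the fallback tag.
  std::vector<std::string> Tag(const std::vector<std::string>& words) const {
    std::vector<std::string> tags;
    for (const auto& word : words) {
      auto it = best_tag_.find(word);
      tags.push_back(it != best_tag_.end() ? it->second : fallback_tag_);
    }
    return tags;
  }

 private:
  std::string fallback_tag_;                      // for unseen words
  std::map<std::string, std::string> best_tag_;  // word -> learned tag
};
```

The code uses C++17 structured bindings. A production tagger would use context and a much richer model, but the same principle applies: the language-specific knowledge lives in the training data rather than in the source code.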

Public Health Relevance

Language and communication problems are core features of a number of neurodevelopmental disorders, and it is increasingly recognized that assessment should involve the analysis of spontaneous language samples. This project will develop a package of software programs for the automatic analysis of spontaneous language samples from children with neurodevelopmental disorders. The software will be usable directly by clinicians in their assessment of patients.

Principal Investigator

Richard Sproat