2000 — 2004
Van Santen, Jan; Macon, Michael (co-PI)
ITR: Modeling Degree of Articulation For Speech Synthesis @ Oregon Health and Science University
The automatic conversion of text to speech provides a means to achieve universal access to on-line information. However, except for simple messages, speech generated by current synthesizers is both unpleasant and hard to understand: even though words presented individually are quite intelligible, listeners are generally unable to comprehend longer or more complex messages without intense concentration. A key reason for this "incomprehensibility" is the lack of proper prosody in synthetic speech. Prosody refers to the rhythmic and melodic characteristics of speech, which are used by the speaker to structure information for the listener. That is, prosody conveys to the listener which words or phrases are important (prominence), and which words belong together in some semantic or syntactic sense (phrasing). Prosody involves a host of acoustic features, such as variations in fundamental frequency (F0), timing, and features that are related to the speaker's level of effort. Current synthesizers have poor prosody for two main reasons: (i) accurate prediction from text of timing and F0 is intrinsically difficult, and (ii) they can neither predict nor control features in speech that correspond to the speaker's articulatory effort. While many techniques exist for control of segmental duration (one aspect of timing) and F0 characteristics of speech, little attention has been paid to control of this second category of effects, and the quality of current synthesizers is poor as a result.
The PI has defined a concept of "degree of articulation" to refer to the fact that, at a given speaking rate, speakers can control the precision and speed of the motions of their tongue, lips, velum, etc. with varying degrees of effort, from "hypo-articulate" (sloppy) to "hyper-articulate" (precise). Acoustic correlates of degree of articulation have been shown to covary with linguistic factors such as word emphasis and syllabic stress. While clearly important, this concept is nevertheless vague and its static and dynamic acoustic correlates have not been well established. Moreover, no quantitative models exist that predict degree of articulation from text or that provide a sufficiently precise quantitative description of these acoustic correlates for implementation in a synthesizer. The overarching goal of this project is to develop principled quantitative models for the prediction of acoustic features associated with degree-of-articulation, and to implement these results in a speech synthesizer. The strategy will be (a) to use text materials that systematically vary in prominence-related factors in order to elicit varying levels of degree of articulation in read speech; (b) to analyze speech signal, laryngograph signal, and jaw/lip articulatory data; and (c) to use the analysis results to generate mathematical descriptions of the relationship between prosodic structure and spectral features of the speech signal.
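The abstract leaves the model form open; as a minimal sketch of step (c), assuming synthetic data, invented factor names, and a simple additive linear model linking prominence-related factors to a single acoustic correlate of degree of articulation:

```python
# Minimal sketch (synthetic data, hypothetical factors): an additive linear
# model relating prominence-related factors to one acoustic correlate of
# degree of articulation, e.g., vowel-space expansion.
import numpy as np

rng = np.random.default_rng(0)
n = 200
emphasis = rng.integers(0, 2, n)   # hypothetical word-emphasis factor (0/1)
stress = rng.integers(0, 2, n)     # hypothetical syllabic-stress factor (0/1)

# Assumed ground truth: expansion grows with emphasis and stress, plus noise.
expansion = 1.0 + 0.5 * emphasis + 0.3 * stress + rng.normal(0, 0.1, n)

# Least-squares fit of: expansion ~ b0 + b1*emphasis + b2*stress.
X = np.column_stack([np.ones(n), emphasis, stress])
coef, *_ = np.linalg.lstsq(X, expansion, rcond=None)
print(f"intercept={coef[0]:.2f}, emphasis={coef[1]:.2f}, stress={coef[2]:.2f}")
```

The fitted coefficients play the role of the "mathematical descriptions" the project seeks, though the actual models would be estimated from acoustic, laryngograph, and articulatory measurements rather than simulated data.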
The outcomes of this project will include the following: Improved understanding of the acoustic, glottal, and articulatory correlates of degree of articulation, including both static and dynamic features. This knowledge will impact not only basic science, but also technologies like speech synthesis and automatic speech recognition; Accurate prediction of spectral features of the speech signal from prosodic structure, based on a principled model that incorporates both acoustic and articulatory knowledge; Techniques for more natural-sounding speech synthesis that places a lower attentional demand on the listener. This will lead to greater user acceptance of synthesized speech in applications including voice-based information access, language training, and tools for visually or vocally disabled persons.
2001 — 2005
Hosom, John-Paul; Van Santen, Jan
Making Dysarthric Speech Intelligible @ Oregon Health and Science University
This is the first-year funding of a three-year continuing award. Of the 2.5 million or more adult Americans with significant disability due to chronic neurologic impairment, a large percentage present with dysarthria, or speech impairment, as one of their disabling conditions, and there are no known cures. Dysarthric individuals report loss of employment, educational opportunities, social integration, and quality of life. Despite some strategies for compensating, the isolation caused by communication impairment is pervasive. In this project, the PI will develop new algorithms that, when implemented in wearable devices, will enable dysarthric individuals to be more easily understood. Currently available devices are essentially (digital or analog) spectral filters and amplifiers that enhance certain parts of the spectrum. While these can help certain types of dysarthria, many dysarthric persons suffer from speech problems that require forms of speech modification that are much more profound and complex, such as: irregular sub-glottal pressure, resulting in loudness bursts that can be difficult to adjust to; absence, or poor control, of voicing; systematic mispronunciation of certain phoneme groups, resulting in certain sounds becoming indistinguishable or unrecognizable; variable mispronunciation; and poor prosody (pitch control, timing, and loudness). For these difficult problems, new approaches are needed that do not merely filter the speech signal but analyze it at acoustic, articulatory, phonetic, and linguistic levels. These approaches can be combined to generate an output speech signal that, while preserving certain features of the input speech, modifies the input speech along as many dimensions as is needed to achieve intelligibility. The past decade has seen a revolution in speech technology that can be applied to these problems; while few of the technologies developed to date are applicable to dysarthria in their present form, the underlying algorithms can form a basis for the creation of innovative techniques that are specifically targeted to address these more difficult speech problems. The PI will create these technologies in a diagnostic framework, so that the appropriate technology is used for a given type of dysarthria. The results will be of great value for dysarthric individuals; the scientific challenges are formidable, and meeting them will produce insights that will be broadly useful for other speech technologies as well.
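As a hedged illustration of just one of the modifications listed above, damping irregular loudness bursts, here is a minimal signal-level sketch (the frame size and smoothing constant are assumptions; the project's actual algorithms also operate at articulatory, phonetic, and linguistic levels):

```python
# Minimal sketch (assumed parameters): scale each frame's RMS toward a slowly
# varying running target level, damping sudden loudness bursts.
import numpy as np

def smooth_loudness(signal, sr, frame_ms=20, alpha=0.9, eps=1e-8):
    """Scale each frame so its RMS tracks an exponentially smoothed target."""
    frame = int(sr * frame_ms / 1000)
    out = signal.astype(float)
    target = None
    for start in range(0, len(out) - frame, frame):
        seg = out[start:start + frame]
        rms = np.sqrt(np.mean(seg ** 2)) + eps
        # Running target: exponential smoothing of observed frame levels.
        target = rms if target is None else alpha * target + (1 - alpha) * rms
        out[start:start + frame] = seg * (target / rms)
    return out

# Usage on a synthetic signal with an artificial loudness burst:
sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
x = np.sin(2 * np.pi * 150 * t)
x[4000:6000] *= 4.0          # simulated sub-glottal pressure burst
y = smooth_loudness(x, sr)
print(f"peak before={np.abs(x).max():.2f}, after={np.abs(y).max():.2f}")
```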
2002 — 2008
Sproat, Richard (co-PI); Hosom, John-Paul; Van Santen, Jan; Black, Alan
ITR: Prosody Generation For Child Oriented Speech Synthesis @ Oregon Health and Science University
This project focuses on innovative algorithms for generating highly expressive synthetic speech. Current text-to-speech synthesis (TTS) systems generate speech that lacks expressiveness. This is a serious obstacle for the potential application of TTS to computer-based language and speech remediation for children. Using TTS has these advantages over recorded speech, which is currently the standard in remedial systems: (i) TTS provides complete flexibility in textual materials, and enables interactivity and individualization, which are both key for successful language teaching and remediation. (ii) TTS output can be modified more easily and along far more dimensions than recorded speech, including temporal, intonational, and spectral dimensions, so that speech output can be adjusted to a child's individual pattern of needs. Generating expressive speech involves three hard research problems: (i) computation of abstract tags that specify, e.g., which words need emphasis, and phrasing (e.g., where to pause); (ii) based on these tags, computation of a fundamental frequency contour; and (iii) severe modification of the stored speech fragments ("acoustic units") to obtain these contours. The central goal of the project is to address these research problems, and create a TTS system that will make the next generation of TTS-based remedial systems viable.
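As a rough sketch of problem (ii), computing a fundamental frequency contour from abstract tags, one simple superpositional scheme adds local accent curves to a declining phrase curve; the curve shapes and constants below are illustrative assumptions, not the project's actual parameterization:

```python
# Minimal sketch (assumed shapes and constants): a superpositional F0 model --
# a slowly declining phrase curve plus Gaussian accent bumps on emphasized
# syllables derived from abstract emphasis tags.
import numpy as np

def f0_contour(duration_s, accents, base_hz=120.0, decl_hz=20.0,
               accent_hz=40.0, accent_width_s=0.15, fs=100):
    """accents: times (s) of emphasized syllables; fs: frames per second."""
    t = np.arange(0, duration_s, 1.0 / fs)
    phrase = base_hz - decl_hz * (t / duration_s)         # declination line
    accent = np.zeros_like(t)
    for ta in accents:                                    # local accent curves
        accent += accent_hz * np.exp(-0.5 * ((t - ta) / accent_width_s) ** 2)
    return t, phrase + accent

t, f0 = f0_contour(2.0, accents=[0.4, 1.3])
print(f"F0 range: {f0.min():.1f}-{f0.max():.1f} Hz")
```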
2003 — 2007
Van Santen, Jan
ITR: Objective Methods For Predicting and Optimizing Synthetic Speech Quality @ Oregon Health and Science University
The goal of this project is to improve the quality of text-to-speech synthesis. Text-to-speech synthesis is an increasingly widely used technology that plays a core role in automated information access by telephone, in universal access for individuals with visual or other challenges, and in educational software.
The project focuses on how humans perceive acoustic discontinuities in speech. Current text-to-speech synthesis technology operates by retrieving intervals of stored digitized speech from a database and splicing ("concatenating") them to form the output utterance. Unavoidably, there are acoustic discontinuities at the time points where the successive speech intervals meet. For reasons that are currently poorly understood, many of these acoustic discontinuities are not audible even when they seem large by any objective measure. This relative insensitivity of human hearing is the reason that concatenative synthesis works at all. Conversely, it also often occurs that seemingly small discontinuities are audible. These facts raise the scientific question of how one can construct an objective acoustic discontinuity measure that accurately predicts from the quantitative, acoustic properties of two to-be-concatenated speech intervals whether humans will hear a discontinuity.
This question is not only of interest for a better understanding of human hearing, but is also of immediate practical relevance. Many text-to-speech synthesis systems select speech intervals at run time from a large speech corpus. During selection, the system searches through the space of all possible sequences of speech intervals that can be used for the utterance and selects the sequence that has the lowest overall objective cost measure, such as the Euclidean distance between the final frame and initial frame of two successive intervals. However, research has already shown that this method and related methods do not predict well whether humans will hear a discontinuity. The current research, by being explicitly focused on perceptually optimized objective cost measures, will directly contribute to the perceptual accuracy of cost measures and hence to synthesis quality.
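A minimal sketch of the Euclidean join cost mentioned above, computed between the boundary feature frames of two candidate units; the frames here are random stand-ins for real spectral features such as MFCCs:

```python
# Minimal sketch (synthetic frames): Euclidean join cost between the last
# feature frame of one unit and the first frame of the next, plus selection
# of the lowest-cost candidate for the join.
import numpy as np

def join_cost(unit_a_frames, unit_b_frames):
    """Euclidean distance between boundary frames of two candidate units."""
    return float(np.linalg.norm(unit_a_frames[-1] - unit_b_frames[0]))

rng = np.random.default_rng(1)
unit_a = rng.normal(size=(30, 13))   # 30 frames x 13 coefficients
unit_b = rng.normal(size=(25, 13))
print(f"join cost: {join_cost(unit_a, unit_b):.2f}")

# Selecting the lowest-join-cost candidate for the next unit:
candidates = [rng.normal(size=(20, 13)) for _ in range(5)]
best = min(candidates, key=lambda c: join_cost(unit_a, c))
print(f"best candidate cost: {join_cost(unit_a, best):.2f}")
```

The project's point is precisely that such purely geometric costs correlate poorly with audibility; a perceptually optimized measure would replace the plain norm with a function fit to listener judgments.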
2005 — 2009
Van Santen, Jan P
R01 Activity Code Description: To support a discrete, specified, circumscribed project to be performed by the named investigator(s) in an area representing his or her specific interest and competencies.
Expressive and Receptive Prosody in Autism @ Oregon Health and Science University
DESCRIPTION (provided by applicant): Autistic Spectrum Disorders (ASD) form a group of neuropsychiatric conditions whose core behavioral features include impairments in reciprocal social interaction and communication, and repetitive, stereotyped, or restricted interests and behaviors. The importance of prosodic deficits, both for the adaptive communicative competence of speakers with ASD and for a fuller understanding of the social disabilities central to these disorders, is generally recognized; yet current studies are few in number and have significant methodological limitations. The objective of the proposed project is to detail prosodic deficits in young speakers with ASD through a series of experiments that address these disabilities and related areas of function. Key features of the project include: 1) the application of innovative technology: the study will apply computer-based speech and language technologies for quantifying expressive prosody, for computing dialogue structure, and for generating acoustically controlled speech stimuli for measuring receptive prosody; moreover, all experiments will be delivered via computer to ensure consistency of stimuli and accuracy of recording responses; 2) broad coverage of the dimensions of prosody: all three functions of prosody (grammatical, pragmatic, and affective) will be addressed; expressive and receptive tasks are included; and both contextualized tasks (dialogue, story comprehension and memory) and decontextualized tasks (e.g., vocal affect recognition) will be used; 3) inclusion of neuropsychological assessment and classification methodologies to address within-group heterogeneity and obtain a detailed characterization of the groups; 4) inclusion of two comparison groups: children with typical development and those with Developmental Language Disorder; 5) inclusion of an experimental treatment program to enhance the prosodic abilities of speakers with ASD.
2009 — 2013
Shafran, Izhak (co-PI); Song, Xubo; Kain, Alexander (co-PI); Van Santen, Jan; Black, Lois
HCC: Medium: Automatic Detection of Atypical Patterns in Cross-Modal Affect @ Oregon Health and Science University
"This award is funded under the American Recovery and Reinvestment Act of 2009 (Public Law 111-5)."
The expression of affect in face-to-face situations requires the ability to generate a complex, coordinated, cross-modal affective signal, having gesture, facial expression, vocal prosody, and language content modalities. This ability is compromised in neurological disorders such as Parkinson?s disease and autism spectrum disorder (ASD). The PI's long term goal is to build computer-based interactive, agent based systems for remediation of poor affect communication and diagnosis of the underlying neurological disorders based on analysis of affective signals. A requirement for such systems is technology to detect atypical patterns in affective signals. The objective of this project is to develop that technology. Toward that end the PI will develop a play situation for eliciting affect, will collect audio-visual data from approximately 60 children between the ages of 4-7 years old, half of them with ASD and the other half constituting a control group of typically developing children. The PI will label the data on relevant affective dimensions, will develop algorithms for the analysis of affective incongruity, and will then test the algorithms against the labeled data in order to determine their ability to differentiate between ASD and typical development. While automatic methods for cross-modal recognition of discrete affect classes already have yielded promising results, automatic detection and quantification of atypical patterns in affective signals, and the ability to do so in semi-natural interactive situations, is unexplored territory. The PI expects this research will lead to new methods for affect recognition based on facial affective features (with special emphasis on facial frontalization algorithms and on modeling of facial expressive dynamics), vocal affective features, and lexical affective features, as well as to new methods for automated measurement of cross-modal affective incongruity.
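The abstract does not specify how incongruity will be quantified; one plausible minimal sketch scores cross-modal affective incongruity as decorrelation between per-modality valence time series (the data below are synthetic, and the measure is an illustrative assumption):

```python
# Minimal sketch (synthetic data): cross-modal affective incongruity scored
# as 1 - Pearson correlation between face and voice valence trajectories:
# 0 = perfectly congruent, 2 = anti-correlated.
import numpy as np

def incongruity(face_valence, voice_valence):
    r = np.corrcoef(face_valence, voice_valence)[0, 1]
    return 1.0 - r

rng = np.random.default_rng(2)
base = np.sin(np.linspace(0, 4 * np.pi, 200))          # shared affect trajectory
congruent = incongruity(base + 0.1 * rng.normal(size=200),
                        base + 0.1 * rng.normal(size=200))
incongruent = incongruity(base, rng.normal(size=200))  # unrelated channels
print(f"congruent: {congruent:.2f}, incongruent: {incongruent:.2f}")
```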
Broader Impacts: The expression of affect in special populations is a largely neglected area in affective computing and robotics; yet, these populations may be among the most important beneficiaries of these technologies. Affective expression impairments afflict many individuals, including those with neuro-developmental disorders such as autism, and those with neuro-degenerative disorders such as Parkinson's disease. Because these impairments concern a core aspect of human communication and, hence, may cause profound social isolation in these individuals, intervention is highly desirable. However, one-on-one intervention by therapists, if effective, would be available only to relatively few individuals, thereby making computer-based intervention critical for broader access to such treatment. Accurate processing of the affective signal will be of use as a research and diagnostic tool for a range of neurological disorders. The CSLU research team will continue its tradition of disseminating research findings and technology, including speech corpora and software, to the research community.
2009 — 2010
Tjaden, Kris; Van Santen, Jan P
R21 Activity Code Description: To encourage the development of new research activities in categorical program areas. (Support generally is restricted in level of support and in time.)
Quantitative Modeling of Segmental Timing in Dysarthria @ Oregon Health & Science University
DESCRIPTION (provided by applicant): Quantitative, acoustic models of segmental timing in spoken English, such as those developed for text-to-speech synthesis (TTS), acknowledge that segment durations in connected speech reflect the combined influence of systematic factors as well as nonsystematic or random factors. Systematic Variability in segment durations reflects factors such as context, stress, speaking style or register, and cognitive load. Segment durations also reflect within-speaker variability (termed Random Variability) that cannot be attributed to any of these systematic factors. An individual talker's speech duration patterns therefore can be mathematically characterized in terms of the magnitude of the effects of each systematic factor (e.g., amount of lengthening associated with word stress), as well as in terms of the relative and absolute amounts of systematic and random variability. Importantly, this powerful modeling framework can be applied to meaningful sentence productions, and is capable of isolating the effects of individual systematic factors without requiring the use of artificial speech materials. This approach to quantitatively modeling segmental timing in TTS has further proven crucial for successfully synthesizing intelligible, natural-sounding speech. Given the importance of this modeling framework for generating high-quality speech synthesis, it is surprising that similar modeling efforts have not been applied to dysarthria as a means of understanding the source of reduced intelligibility and naturalness in this speech disorder. Aberrancies in the temporal patterning of speech are ubiquitous in most persons with dysarthria, and the contribution of speech duration variables to intelligibility and naturalness is suggested in a variety of studies. The approach used in many existing studies is to document whether speech durations in dysarthria are, on average, atypically short, long, or variable as compared to normal speech. The TTS modeling framework described above, however, goes beyond this type of simple description to identify the relative contribution of specific systematic factors influencing segment durations for an individual speaker, as well as the combined relative and absolute contributions of systematic and random factors to segmental timing for that individual. The TTS modeling framework further allows model parameters for an individual speaker to be manipulated via speech synthesis to determine the impact on intelligibility and naturalness. The proposed exploratory project seeks to apply such a quantitative modeling framework to segment durations in sentences produced by speakers with a variety of neurological diagnoses and dysarthrias. The perceptual relevance of model parameters will be further studied via speech resynthesis to determine their impact on judgments of intelligibility and naturalness.

PUBLIC HEALTH RELEVANCE: Effective and efficacious treatment of reduced intelligibility and naturalness in dysarthria requires knowledge of factors explaining or underlying these functional limitations. The proposed exploratory project seeks to apply a quantitative model of segmental timing, developed for text-to-speech synthesis, to persons with dysarthria for whom anomalies in the temporal patterning of speech are common.
Findings from this project will provide a new and comprehensive model of aberrancies in the temporal patterning of speech in dysarthria; the contribution of model parameters to perceptual judgments of intelligibility and naturalness also will be determined.
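As a hedged illustration of the TTS-style timing framework described above, here is a toy model in which a hypothetical intrinsic segment duration is scaled by systematic factors (stress, phrase-final lengthening) and perturbed by random within-speaker variability; all constants are invented for illustration:

```python
# Minimal sketch (hypothetical values): a multiplicative segmental timing
# model -- intrinsic duration scaled by systematic factors, plus Random
# Variability as additive Gaussian noise.
import numpy as np

INTRINSIC_MS = {"a": 120.0, "t": 60.0, "s": 95.0}   # hypothetical intrinsic durations
STRESS_SCALE = 1.3                                  # systematic: stress lengthening
FINAL_SCALE = 1.5                                   # systematic: phrase-final lengthening

def segment_duration(phone, stressed, phrase_final, sd_ms=8.0, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    dur = INTRINSIC_MS[phone]
    if stressed:
        dur *= STRESS_SCALE
    if phrase_final:
        dur *= FINAL_SCALE
    return dur + rng.normal(0.0, sd_ms)             # random within-speaker variability

rng = np.random.default_rng(3)
print(f"/a/ stressed, final: {segment_duration('a', True, True, rng=rng):.0f} ms")
print(f"/a/ unstressed, medial: {segment_duration('a', False, False, rng=rng):.0f} ms")
```

Fitting the factor scales and the noise variance to a dysarthric speaker's productions, then manipulating them in resynthesis, is the kind of analysis the project proposes.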
2010 — 2015
Klabbers, Esther; Kain, Alexander; Van Santen, Jan
HCC: Medium: Synthesis and Perception of Speaker Identity @ Oregon Health and Science University
This proposal addresses the problem of synthesizing speaker identity when only a small training sample is available. To achieve the goal of synthesis of speaker identity from a small training corpus, the project will address problems including trainable abstract parameterizations of the prosodic patterns that characterize a speaker, and voice conversion methods. The project falls into the general category of building a Text-to-Speech (TTS) synthesis system in order to generate speech that sounds like that of a specific individual (Speaker Identity Synthesis, or SIS). Systems of this kind have numerous applications, including the creation of personalized voices for individuals with neurodegenerative disorders who anticipate becoming users of Speech Generating Devices (SGDs) in the future, and many other applications in the consumer products and entertainment industry. Consumer products such as navigation systems and mobile phones are rapidly being developed that make use of linguistic information about the generated utterance. The project will also provide new tools and data for human perception of speaker identity. The tools developed in the process and the associated perceptual studies are also relevant for assessment of speaker recognition systems, and the project provides a new generation of concise, trainable characterizations of a speaker's prosodic patterns that can be incorporated in these systems. The proposed study will elucidate the trade-offs and algorithmic issues of the proposed SIS systems, and it is likely that the proposed work will have a strong intellectual impact in the field of speech synthesis.
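A minimal sketch of the voice-conversion component in its simplest conceivable form: a linear spectral mapping from source-speaker to target-speaker frames estimated from a small set of aligned pairs (synthetic data; practical systems use richer GMM- or neural-network-based mappings):

```python
# Minimal sketch (synthetic data): least-squares linear mapping from
# source-speaker feature frames to target-speaker frames, trained on a
# small set of aligned pairs -- the "small training sample" setting.
import numpy as np

rng = np.random.default_rng(4)
dim, n_pairs = 13, 150                      # small training sample

src = rng.normal(size=(n_pairs, dim))       # stand-in source MFCC frames
true_map = np.eye(dim) + 0.1 * rng.normal(size=(dim, dim))
tgt = src @ true_map.T + 0.05 * rng.normal(size=(n_pairs, dim))

# Least-squares estimate of the mapping matrix W such that src @ W ~ tgt.
W, *_ = np.linalg.lstsq(src, tgt, rcond=None)
converted = src @ W
err = np.mean(np.linalg.norm(converted - tgt, axis=1))
print(f"mean conversion error per frame: {err:.3f}")
```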
2010
Van Santen, Jan P
R21 Activity Code Description: To encourage the development of new research activities in categorical program areas. (Support generally is restricted in level of support and in time.)
Expressive Crossmodal Affect Integration in Autism @ Oregon Health & Science University
DESCRIPTION (provided by applicant): Children with autism spectrum disorder (ASD) have often been observed to express affect either weakly, only in one modality at a time (e.g., choice of words), or in multiple modalities but not in a coordinated fashion. These difficulties in crossmodal integration of affect expression may have roots in certain global characteristics of brain structure in autism, specifically atypical interconnectivity between brain areas. Poor crossmodal integration of affect expression may also play a critical role in the communication difficulties that are well documented in ASD. Not understanding how, e.g., facial expression can be used to modify the interpretation of words undermines social reciprocity. Impairment in crossmodal integration of affect is thus a potentially powerful explanatory concept in ASD. The study will provide much needed data on expressive crossmodal integration impairment in ASD and its association with receptive crossmodal integration impairment, using innovative technologies to create stimuli for a judgmental procedure that makes possible independent assessment of the individual modalities; these technologies are critical because human observers are not able to selectively filter out modalities. In addition, the vocal measures and the audiovisual database lay the essential groundwork for the next step: creation of audiovisual analysis methods for automated assessment of expressive crossmodal integration. These methods will be applied to audio-visual recordings of a structured play situation; the child will participate in this play situation twice, once with a caregiver and once with an examiner. This procedure for measuring expressive crossmodal integration will be complemented by a procedure for measuring crossmodal integration of affect processing using dynamic talking-face stimuli in which the audio and video streams are recombined (preserving perfect synchrony of the facial and vocal channels) to create stimuli with congruent vs. incongruent affect expression. Both procedures will be applied to three groups: children with ASD, children with Developmental Language Disorder (DLD), and typically developing children; ages will be six to ten. Our study would be the first to perform a comprehensive analysis of crossmodal integration of affect expression in ASD. If the study confirms the existence of these impairments in ASD, and provides a detailed picture of these impairments, this could (i) guide brain studies to specifically target areas responsible for affect expression; (ii) provide a deeper understanding of impairments in social reciprocity; and (iii) help design remedial programs for intensive training of under-used or uncoordinated expressive modalities. The study thus contributes to etiology, diagnosis, and treatment.
2011 — 2014
Jimison, Holly; Kaye, Jeffrey; Hayes, Tamara; Van Santen, Jan
SHB: Large: Collaborative Research: Integrated Communications and Inference Systems For Continuous Coordinated Care of Older Adults in the Home @ Oregon Health and Science University
NSF 10-575 (PI Bajcsy, UC Berkeley; Co-PI Jimison, OHSU)
This research project addresses the important problem of improving and maintaining people's healthy lifestyles by inventing smart technology based on fundamental scientific principles. The approaches are economically feasible and socially compelling, with a focus on maintaining the health and independence of older adults in a home environment. The project uses a mix of networking and monitoring technologies to connect older adults with a remote health coach (a real person facilitated by a semi-automated program) and remote family members. One of the key design issues is how best to preserve privacy and enable the participants to control the distribution and sharing of their data. The intervention is designed to provide coordinated and continuous health management.
The research for this project uses the integration of data from a variety of sensors in the home, yielding information for activity monitoring, sleep monitoring, gait and movement analysis, and socialization measures, as well as a variety of cognitive metrics derived from computer interactions with adaptive games. Rigorous computational engineering models of the cognitive and physical functions of the patient, as well as of context and environment, are used to infer patient state and provide feedback for the patient and the remote health coach. The modeling techniques include Partially Observable Markov Processes and Hybrid Control Modes. User models that incorporate behavior change principles are then used to drive algorithms to optimize automated feedback and recommendations that serve as prompts for a health coach managing a large number of patients. These approaches to remote health management are evaluated by leveraging an existing prototype platform with the capability of collecting data from the homes of elderly participants.
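As a hedged sketch of the Partially Observable Markov machinery mentioned above, here is a two-state belief update inferring a hypothetical "declining routine" state from a binary activity sensor; the states, probabilities, and sensor model are invented for illustration:

```python
# Minimal sketch (hypothetical two-state model): Bayes filter belief update
# over a hidden patient state, driven by a binary activity sensor.
import numpy as np

T = np.array([[0.95, 0.05],    # state transition probabilities
              [0.10, 0.90]])   # states: 0 = healthy routine, 1 = declining
O = np.array([[0.8, 0.2],      # P(observation | state): rows = state,
              [0.3, 0.7]])     # cols = sensor reading (0 = active, 1 = inactive)

def belief_update(belief, obs):
    """Standard Bayes filter: predict with T, correct with O, renormalize."""
    predicted = belief @ T
    updated = predicted * O[:, obs]
    return updated / updated.sum()

belief = np.array([0.9, 0.1])            # start mostly confident in state 0
for obs in [1, 1, 1, 0, 1]:              # mostly "inactive" sensor readings
    belief = belief_update(belief, obs)
print(f"P(declining routine) = {belief[1]:.2f}")
```

A full POMDP would add actions (e.g., coach prompts) and a reward model on top of this belief state; the filter above is only the inference core.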
2012 — 2015
Van Santen, Jan P
R01 Activity Code Description: To support a discrete, specified, circumscribed project to be performed by the named investigator(s) in an area representing his or her specific interest and competencies.
Computational Characterization of Language Use in Autism Spectrum Disorder @ Oregon Health & Science University
DESCRIPTION (provided by applicant): Atypical or impaired language is one of the core features of autism spectrum disorder (ASD). Yet the precise characteristics of language in ASD, and how they differ from those in other disorders such as Specific Language Impairment (SLI), remain substantially unknown. An important obstacle for the study of language in any disorder is that conventional structured instruments (i.e., instruments consisting of a sequence of items, each eliciting a typically brief response, such as the Clinical Evaluation of Language Fundamentals [CELF]) may not provide adequate breadth of information: analysis of natural language samples is required. The proposed research will build on recent progress in Natural Language Processing (NLP) technology, an area of Computer Science concerned with computational analysis of text. The goal of the proposed research is to develop and validate new NLP-based methods that automatically measure language characteristics of ASD based on raw (i.e., not coded) transcripts of natural language samples. The objective is to improve the analysis of natural language samples by enhancing efficiency, reliability, and richness of information extracted. Data on three groups of children ages four to eight will be analyzed, obtained from an earlier study: ASD, SLI, and typically developing children. If successful, the new methods will have important impacts on research and clinical practice for ASD and for other disorders in which language is affected, by enabling analysis of more representative and ecologically valid natural language samples, as well as by creating opportunities for discovery of currently unknown language characteristics of ASD through the effortless extraction of numerous language features.
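As a minimal illustration of what can be computed directly from a raw, uncoded transcript, here are two of the simplest candidate measures, mean length of utterance in words and type-token ratio; the transcript is a toy example, and the project's NLP features are far richer:

```python
# Minimal sketch (toy transcript): mean length of utterance (MLU, in words)
# and type-token ratio (TTR), computed from raw utterance strings.
def mlu_words(utterances):
    return sum(len(u.split()) for u in utterances) / len(utterances)

def type_token_ratio(utterances):
    tokens = [w.lower() for u in utterances for w in u.split()]
    return len(set(tokens)) / len(tokens)

sample = [
    "I saw a big dog",
    "the dog ran",
    "he ran fast",
]
print(f"MLU = {mlu_words(sample):.2f} words")
print(f"TTR = {type_token_ratio(sample):.2f}")
```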
2016 — 2019
Van Santen, Jan P
R01 Activity Code Description: To support a discrete, specified, circumscribed project to be performed by the named investigator(s) in an area representing his or her specific interest and competencies.
Automatic Voice-Based Assessment of Language Abilities @ Oregon Health & Science University
DESCRIPTION (provided by applicant): Since untreated language disorder, a disorder with a prevalence of at least 7%, can lead to serious behavioral and educational problems, large-scale early language assessment is urgently needed not only for early identification of language disorder but also for planning interventions and tracking progress. This is all the more so because a recent study found that 71% of children diagnosed with Specific Language Impairment (a type of language disorder) had not been previously identified. However, such large-scale efforts would pose a large burden on professional staff and on other scarce resources. As a result, clinicians, educators, and researchers have argued for the use of computer-based assessment. Recently, progress has been made with computer-based language assessment, but it has been limited to language comprehension (i.e., receptive vocabulary and grammar). Thus, computer-based assessment of language production, that is, expressive language and particularly discourse skills, is still lacking. One contributing factor is that a key technology needed for this, Automatic Speech Recognition (ASR), is perceived as inadequate for accurate scoring of language tests, since even the best ASR systems have word error rates in excess of 20%. However, this perception is based on a limited perspective of how ASR can be used for assessment, in which a general-purpose ASR system provides an (often inaccurate) transcript of the child's speech, which would then be scored automatically according to conventional rules. We take an alternative perspective, and propose an innovative approach that comprises two core concepts. The first is that of creating special-purpose, test-specific ASR systems whose search space is carefully matched to the space of responses a test may elicit. The second is that of integrating these systems with machine-learning based scoring algorithms, whereby the latter operate not on the final, best transcript generated by the ASR system but on the rich layers of intermediate representations that the ASR system computes in the process of recognizing the input speech (rich representation). Earlier experiments in our lab with digit and narrative recall tests have demonstrated the feasibility of this approach. In the proposed project we will create computer-based scoring and test administration systems for tests in the expressive modality as well as in the vocabulary, grammar, and discourse domains; we will also create a system for a non-word repetition test. The systems will be applied to a diverse group of 300 children ages 3-9 with typical development and with neurodevelopmental disorders, and will be validated against conventional language measures. The automated language tests developed in the project cover core diagnostic criteria for language disorders, but also create a technological foundation for the computerization of a much broader array of tests for voice-based language and cognitive assessment.
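A toy sketch of the first core concept, a test-specific search space: for a digit-recall item, recognition is constrained to the responses the item can elicit, and an errorful transcript is matched against that space (real systems operate on ASR-internal representations rather than plain transcripts, and the item design here is invented):

```python
# Minimal sketch (toy response set): constrain recognition to an item-specific
# response space and match a noisy transcript against it by edit distance.
from itertools import product

def expected_responses(digits, max_len=3):
    """All digit strings up to max_len: the item-specific search space."""
    return {" ".join(p) for n in range(1, max_len + 1)
            for p in product(digits, repeat=n)}

def edit_distance(a, b):
    """Word-level Levenshtein distance via dynamic programming."""
    d = [[i + j if i * j == 0 else 0 for j in range(len(b) + 1)]
         for i in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1,
                          d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
    return d[-1][-1]

space = expected_responses(["one", "two", "three"])
noisy_transcript = "won two three".split()        # simulated ASR errors
best = min(space, key=lambda r: edit_distance(r.split(), noisy_transcript))
print(f"best match: {best!r} (distance {edit_distance(best.split(), noisy_transcript)})")
```

Constraining the space this way is what makes scoring robust to a 20% word error rate: the recognizer only has to discriminate among plausible responses, not transcribe open vocabulary.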
2017 — 2018
Hill, Alison Presmanes; Van Santen, Jan P
R01 Activity Code Description: To support a discrete, specified, circumscribed project to be performed by the named investigator(s) in an area representing his or her specific interest and competencies.
Automated Measurement of Language Outcomes For Neurodevelopmental Disorders @ Oregon Health & Science University
Improving conversational use of spoken language is an important goal for many new interventions and treatments for children with neurodevelopmental disorders. However, progress in testing these treatments is limited by the lack of informative outcome measures to indicate whether or not an intervention or treatment is having the desired effect on a child's conversational use of language (i.e., discourse skills). The long-term goal of the proposed renewal project is to harness the benefits of Natural Language Processing (NLP) to impact functional spoken language outcomes for children with neurodevelopmental disorders. The goal of the parent R01 (R01DC012033) is to develop and validate new NLP-based methods that automatically measure discourse-related skills, including language productivity (talkativeness), grammar and vocabulary, and discourse, based on raw (i.e., not coded or annotated) transcripts of natural language samples. Our objective in this proposal is to take the next step: to evaluate the suitability of these NLP-based measures as outcomes for children with a range of intellectual abilities, language levels, and diagnoses. NLP algorithms require choices of pivotal parameter settings, such as word-frequency-dependent weights. While our previous results, involving between-group contrasts, were insensitive to these settings, our proposed project, involving psychometric quantities such as validity, may be sensitive to them. Building on our progress from the parent R01, we propose to pursue three specific aims: (1) identify pivotal parameter settings that optimize stability of NLP discourse measures, and examine responsiveness to real change; (2) evaluate consistency of NLP discourse measures, and identify key measurement factors that impact consistency; and (3) evaluate validity of NLP discourse measures, and differences in validity as a function of diagnostic group, age, IQ, and language ability. Our approach will focus on optimizing stability of such measures, and assessing responsiveness to change over time, consistency across sampling contexts and different sample lengths, and validity of each measure. The contribution of the proposed project will be to systematically assess the psychometric properties of NLP discourse measures. The proposed research is innovative because it represents a substantial departure from the status quo by taking the crucial next step: the development of scalable, psychometrically sound measures of discourse skills that can be used to assess between-group differences as well as within-individual change over time. The proposed research is significant because it is expected to result in viable spoken language outcome measures for children with a range of neurodevelopmental disorders, making it possible to target and meaningfully measure improvements in clinical trials and behavioral interventions. Ultimately, the successful completion of this study will provide the immediate ability to scale up treatment evaluations involving measurement of spoken language use, allowing flexible data collection across sites and studies, and in the future provide new targets for to-be-developed behavioral and pharmacological interventions.
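As a hedged sketch of one psychometric check from Aim 2, here is split-half consistency of a discourse measure, computed for type-token ratio over synthetic language samples; all data and parameters are invented for illustration:

```python
# Minimal sketch (synthetic data): split-half consistency of a discourse
# measure -- here type-token ratio (TTR) -- computed by correlating the
# measure across odd- and even-numbered utterances of each child's sample.
import numpy as np

def ttr(utterances):
    tokens = [w for u in utterances for w in u.split()]
    return len(set(tokens)) / len(tokens)

rng = np.random.default_rng(5)
samples = []
for _ in range(30):                              # 30 simulated children
    vocab_size = int(rng.integers(10, 60))       # child-specific lexical diversity
    vocab = [f"w{i}" for i in range(vocab_size)]
    samples.append([" ".join(rng.choice(vocab, size=8)) for _ in range(40)])

odd = [ttr(s[0::2]) for s in samples]            # odd-numbered utterances
even = [ttr(s[1::2]) for s in samples]           # even-numbered utterances
print(f"split-half correlation of TTR: {np.corrcoef(odd, even)[0, 1]:.2f}")
```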