2009 — 2012 |
Jaeger, T. Florian |
N/A |
Collaborative Research: Bayesian Cue Integration in Probability-Sensitive Language Processing @ University of Rochester
This award is funded under the American Recovery and Reinvestment Act of 2009 (Public Law 111-5).
The process of sentence comprehension involves incrementally accessing the meanings of individual words and combining them into larger representations. In this process, readers and listeners use probabilistic cues to guide their expectations about upcoming words and about the syntactic roles those words will play. For example, previous research has demonstrated that people are sensitive to word frequency (more frequent words or word meanings are easier to process than less frequent ones), syntactic frequency (more frequent rules are easier to process than less frequent rules), and world knowledge (more likely events are easier to process than less likely events). However, a complete theory of language processing must not only identify the cues that people are sensitive to, but also specify how a reader or listener combines them. Toward this end, this project investigates the extent to which readers' syntactic processing mechanisms fit several cue combination models based on different Bayesian inference methods. Bayesian models provide a formalism that specifies how any set of probabilistic information sources can be optimally weighted and combined. In this research, each cue combination model is applied to a wide range of language cues that people have been shown to rely on. The models will then be compared in terms of their fit to human reading-time data. The project will use existing reading-time data sets from previous experiments, as well as new reading-time data collected on single sentences presented in null contexts and in supportive contexts, systematically varying several cues that previous studies have shown to have measurable effects on reading times. This will show which of the inference methods most accurately describes the computations that human readers perform in order to understand sentences.
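As a rough illustration of what "optimally weighted and combined" can mean, the following sketch implements the simplest Bayesian combination rule. It assumes that the cues are conditionally independent given the syntactic analysis, so the posterior probability of each analysis is proportional to its prior times the likelihood of each observed cue. This is not one of the project's models: the function name `combine_cues`, the two candidate analyses (main verb, MV, vs. reduced relative clause, RR) and all probabilities are hypothetical values chosen only for illustration.

```python
import math


def combine_cues(prior, cue_likelihoods):
    """Combine probabilistic cues with Bayes' rule (hypothetical helper).

    prior: dict mapping each candidate analysis to P(analysis).
    cue_likelihoods: list of dicts, one per observed cue, each mapping
        analysis -> P(cue | analysis).
    Assumes the cues are conditionally independent given the analysis,
    so P(analysis | cues) is proportional to
    P(analysis) * product over cues of P(cue | analysis).
    """
    # Sum log probabilities instead of multiplying them, to avoid
    # numerical underflow when many cues are combined.
    log_scores = {}
    for analysis, p in prior.items():
        score = math.log(p)
        for likelihood in cue_likelihoods:
            score += math.log(likelihood[analysis])
        log_scores[analysis] = score

    # Normalize with the log-sum-exp trick so the posteriors sum to 1.
    max_score = max(log_scores.values())
    total = sum(math.exp(s - max_score) for s in log_scores.values())
    return {a: math.exp(s - max_score) / total for a, s in log_scores.items()}


# Invented numbers for an ambiguous fragment such as "The evidence examined ...":
# main-verb (MV) vs. reduced-relative (RR) reading.
prior = {"MV": 0.9, "RR": 0.1}  # syntactic frequency favors MV (hypothetical)
cues = [
    {"MV": 0.3, "RR": 0.7},  # verb bias (hypothetical)
    {"MV": 0.1, "RR": 0.9},  # world knowledge: evidence cannot examine anything
]
posterior = combine_cues(prior, cues)
print({a: round(p, 3) for a, p in posterior.items()})  # {'MV': 0.3, 'RR': 0.7}
```

With these made-up numbers, a strong frequency prior for the main-verb reading is overturned by two cues that favor the reduced relative. Combination rules that relax the independence assumption or weight cues differently would yield different posteriors, which is the kind of contrast that comparing models against reading times can adjudicate.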
Previous work in the field of sentence comprehension has established that people use a diverse range of probabilistic cues when they interpret a sentence. However, two important questions remain unanswered: (a) how much each cue matters in typical texts, and (b) how the cues are combined in the course of comprehension. The main advance of this research is to use a combination of experimental and computational modeling methods to answer these two questions. The techniques and results developed will be broadly useful in at least three general areas: (1) cognitive science, (2) engineering, and (3) human applications of language research. First, the project will provide a publicly available database of reading times for a large corpus of English text, which any researcher will be able to use to evaluate theories of language processing. In addition, the project will provide open-source software that researchers can use or modify to address related theoretical questions in language and in other fields of cognitive science. Second, the project will give computer engineers who research language a way to investigate the effects to which human readers are sensitive, indicating potentially fruitful directions for research. Third, in the long run, an understanding of how language is processed will aid in developing better diagnostic tools and treatments for people with developmental and acquired language disorders.
|
1.009 |
2009 — 2012 |
Jaeger, T. Florian |
N/A |
Collaborative Research: Studying Language Production in the Field: Accessibility Effect On Variation @ University of Rochester
This award is funded under the American Recovery and Reinvestment Act of 2009 (Public Law 111-5).
When speakers encode their thoughts into linguistic utterances, they can often choose between several ways of conveying the same message. For example, English speakers may use an active or a passive sentence. They are more likely to choose a passive over an active if the patient, but not the agent, is human and hence more accessible ("The girl was struck by lightning"). Such accessibility effects are well established for a small set of (mostly related) languages and have had a tremendous influence on linguistic and psycholinguistic theory. Yet the languages previously studied psycholinguistically are unsuited for distinguishing between competing accounts of accessibility effects. For example, according to availability accounts, speakers prefer to mention the more accessible referent earlier in the sentence. According to certain alignment accounts, on the other hand, speakers prefer to make the most accessible referent the subject of the sentence. Both accounts correctly predict the English choice between passive and active, because in English the subject is typically also the first-mentioned referent. To distinguish between these accounts, this study will look at morphosyntactic variation in a language that is typologically very different from English: Yucatec (Mayan). The studies will exploit properties of Yucatec to distinguish between accessibility accounts where previously studied languages fail to do so. The studies will also contribute to establishing an interdisciplinary research program of field-based psycholinguistics. Since most of the world's languages are spoken far away from psycholinguistic laboratories, it is crucial to adapt and apply psycholinguistic methods to the study of variation under field conditions, where recruiting participants is non-trivial and familiarity with the very concept of an "experiment" cannot be assumed. The research will employ two production methodologies, recall studies and video description tasks, as well as grammaticality ratings.
Production studies are the primary methodology of psycholinguists working on choice in language production, providing quantitative data on what speakers produce. Grammaticality ratings are the primary methodology of theoretical linguists doing fieldwork. The project will extend this method to a quantitative level, making it possible to study gradient preferences in alternations. Using both methods will increase the relevance and accessibility of the results for both research communities. In addition to their relevance for psycholinguistic research, the studies will close significant gaps in the scientific record of Yucatec grammar, for which almost no quantitative record of syntactic variation exists. These data will help to distinguish between competing analyses of Yucatec syntax and will contribute to syntactic typology.
The successful completion of the research will have a potentially transformative effect on several disciplines. For psycholinguistics, it will allow theories of production to be tested against a wider set of language data. For field linguistics, it will add to the methodological repertoire available for future fieldwork, advancing the state of the art in language description. The postdoctoral researcher, graduate research assistants, and undergraduate students will participate in, and become specialists in, the newly emerging research paradigm of field-based psycholinguistics. They will be trained in psycholinguistic theory as well as in psycholinguistic and fieldwork methodologies (including state-of-the-art statistical data analysis). The native-speaker consultants, who are members of indigenous communities, will receive valuable linguistic training. The sound files from the research will constitute the first large-scale documentation of syntactic variation in Mayan. All data will be archived and made freely available for further study. To facilitate similar investigations of other understudied languages, all experimental stimuli and scripts will also be made available.
|
1.009 |
2012 — 2017 |
Jaeger, T. Florian |
N/A |
CAREER: Communicative Efficiency and Adaptiveness in the Ideal Speaker @ University of Rochester
Human communication is typically robust even at high speeds. This suggests that both speakers and listeners deal efficiently with the uncertainty and noise inherent in perception, production, and the environment. This CAREER award investigates how the human brain accomplishes this. A mathematical model of efficient communication based on probability and information theory (the Ideal Speaker model) is tested against data from conversational speech. Specifically, the project investigates how the pronunciation of words in spontaneous speech depends on a word's expected confusability in context, the cognitive load the speaker is under, and the situational incentive for robust communication. The Ideal Speaker model also predicts that efficient communication with a particular interlocutor requires adapting to that interlocutor, a prediction that the project tests against task-oriented speech production in behavioral paradigms.
The project contributes to our understanding of how humans produce language, why language has the properties it has, and to what extent the neural systems underlying language production can adjust to different communicative task demands. These insights can contribute to the development of better automatic speech recognition systems (within this project, work on such systems is limited to their evaluation). In addition, the project develops novel paradigms for gathering large amounts of language data, which will dramatically cut research costs. Finally, training in the emerging field of computational psycholinguistics is provided to a broad international audience via summer schools and workshops. This will contribute to a new generation of multidisciplinary scientists working across the traditional boundaries between computer science, linguistics, and cognitive psychology.
|
1.009 |
2013 — 2014 |
Ferreira, Victor; Jaeger, T. Florian |
N/A |
Workshop: How the Brain Accommodates Variability in Linguistic Representations; July, 2013 - University of Michigan @ University of Rochester
When we listen, we rapidly and reliably decode speakers' intentions, and we mostly do so regardless of whom we are talking to. Yet anyone who has interacted with an automated speech recognition system (e.g., while booking a flight) is painfully aware that speech recognition is a computationally hard problem: although we hardly ever become aware of it, the physical signal corresponding to one speaker's "b" can be identical to another speaker's "p", making it hard for computers to distinguish between them. How, then, does the human brain accomplish this task with such apparent ease?
This NSF-funded workshop brings together researchers from computer science, linguistics, and the cognitive sciences to discuss and investigate how the brain achieves robust language understanding despite this variability. The invited speakers are internationally known experts. Representatives from both industry and academia will present the state of the art in automated speech recognition, implicit learning during language understanding, and the neural systems underlying speech perception. The workshop will take place in conjunction with the 2013 Linguistic Society of America's Summer Institute (the largest international linguistics summer school) and will thereby provide training to a large number of young language researchers.
|
1.009 |