1994 — 1998
Traver, Alfred; Tucker, Richard; Haas, Carl; Gibson, Edward
Development of Large Scale Manipulator Technology for Construction @ University of Texas at Austin
The work envisioned will extend the manipulator machine's capabilities and utility by demonstrating feasibility for more general material handling applications involving truck loading and unloading and steel erection. The incorporation of additional control system features should allow for use of the manipulator in a teach-learning mode, which would reduce the requirement for operator attention during routine repetitive operations and allow demonstration of the task and path planning developed in the simulation studies already completed. In achieving these objectives, fundamental knowledge about extending the functionality of large scale manipulators in construction will be gained.
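The teach-learning mode described here is, at its core, a record-and-replay controller, a standard pattern in industrial robotics. The following minimal sketch illustrates that pattern under simple assumptions (a joint-pose tuple format and a `move_to` motion hook, both hypothetical); it is not the project's actual control system.

```python
class TeachLearnController:
    """Record-and-replay "teach mode": an operator demonstrates a task once,
    and the controller repeats it unattended thereafter."""

    def __init__(self):
        self.program = []              # recorded sequence of joint poses

    def record(self, pose):
        self.program.append(pose)      # called during the demonstration

    def replay(self, move_to, cycles=1):
        for _ in range(cycles):        # routine repetitive operation
            for pose in self.program:
                move_to(pose)          # hand each pose to the motion system

controller = TeachLearnController()
for pose in [(0.0, 0.5, 1.2), (0.3, 0.7, 1.0), (0.6, 0.4, 0.8)]:
    controller.record(pose)            # hypothetical 3-joint poses (radians)
controller.replay(print, cycles=2)     # `print` stands in for a motion command
```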

1995 — 1999
Wexler, Kenneth; Gibson, Edward
Computational Studies of Parameter Setting in Language @ Massachusetts Institute of Technology
The problem of developing an appropriate and workable computational theory of parameter-setting in language is a central problem for linguistic theory, language acquisition, learning theory and cognitive science in general. The purpose of this grant is to make a significant increase, compared to previous studies, in the size and scope of the syntactic parameter spaces, and to do computational parameter-setting investigations of these parameter spaces. We plan to investigate the learning of a space of 256 grammars or larger, based on 8 parameters (plus a "lexical" parameter). The major questions that we will investigate include:
1. With a larger set of parameters, does the theory of parameter-setting work (i.e., for all natural languages stated in terms of those parameters, does the algorithm converge)?
2. Do some algorithms work better than others?
3. What kinds of special assumptions about markedness or default values work in the specific parameter spaces that we study? Are there general principles of markedness or default values that seem to apply to many different parameters? Are default values necessary in general?
4. What happens to computational results in parameter-setting when we "scale up"? That is, if a certain pattern of results obtains for a parameter space, do we find that this pattern remains when the space is embedded in a larger space? Or were the results artifacts of the smaller space?
5. Suppose that an algorithm converges on the correct parameter-settings for a class of parameter-settings that we know is instantiated (i.e., there are natural languages which exhibit these parameter-settings), but doesn't converge for some other parameter-settings. Can we show that these parameter-settings are not instantiated in natural language, so that there are in fact learnability reasons why some parameter-settings don't show up?
6. How fast do particular algorithms converge? This can be stated in terms of the number of examples that need to be given. Are some algorithms more realistic than others in this regard?
7. What is the relation of the properties of the algorithms to empirical research in language acquisition? Can early properties of acquisition be shown to relate to the algorithms?
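One much-studied algorithm of this kind is the Triggering Learning Algorithm (TLA) of Gibson and Wexler, which flips a single parameter in response to a parse failure. The sketch below runs a TLA-style learner over an 8-parameter (2^8 = 256 grammar) space; the `parses` criterion is a toy assumption made so the example is self-contained, not one of the linguistically motivated parameter spaces under study.

```python
import random

def parses(grammar, sentence):
    # Toy success criterion (an assumption, not a real grammar space):
    # a "sentence" is a (parameter, value) pair that a grammar licenses
    # iff its parameter is set to that value.
    pos, val = sentence
    return grammar[pos] == val

def sample_sentence(target):
    pos = random.randrange(len(target))
    return (pos, target[pos])

def tla_learn(target, n_params=8, max_steps=100_000):
    """TLA-style learner: on a parse failure, flip one randomly chosen
    parameter (single-value constraint) and keep the flip only if the new
    grammar parses the current input (greediness constraint)."""
    g = [random.randint(0, 1) for _ in range(n_params)]
    for step in range(max_steps):
        if g == target:
            return step                  # converged on the target grammar
        s = sample_sentence(target)
        if parses(g, s):
            continue                     # no error signal: keep hypothesis
        g2 = list(g)
        i = random.randrange(n_params)
        g2[i] ^= 1
        if parses(g2, s):
            g = g2
    return None                          # failed to converge

random.seed(0)
print(tla_learn([1, 0, 1, 1, 0, 0, 1, 0]))  # one of the 2^8 = 256 grammars
```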

1998 — 2001
Gibson, Edward
Empirical Investigations of Locality Effects in Linguistic Complexity @ Massachusetts Institute of Technology
The goal of this research is to investigate the relationship between the human sentence processing mechanism and the available computational resources. A promising new theory of this relationship is proposed whose main notion is locality: the Syntactic Prediction Locality Theory (SPLT). There are two components to the theory, an integration component and a memory cost component:
- Integration resources are required to link new words into current syntactic and discourse structures. This includes thematic role assignment and matching syntactic category predictions. It is proposed that longer-distance integrations are more costly and hence more time consuming than shorter-distance integrations.
- According to the memory resource component of the theory, memory resources are required to maintain the prediction of each syntactic category that is needed to complete the input string as a grammatical sentence. It is proposed that the memory resources required to retain a predicted category increase over distance.
The research consists of three sets of self-paced reading experiments that address a number of predictions of the SPLT and other theories of language comprehension. The first two sets of experiments test predictions of the SPLT in English, and the third set explores the cross-linguistic generality of the SPLT by testing the theory in a language with a very different syntactic structure from English: Japanese.
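As a schematic illustration of the integration component, the sketch below scores each word by the number of discourse-new words spanned by the dependencies completed at that word. This is in the spirit of the SPLT's distance-based costs rather than its exact formulation; the dependency list and discourse-referent markings are hand-built assumptions.

```python
def integration_costs(words, dependencies, discourse_new):
    """Distance-based integration cost: integrating word j with an earlier
    word at position i costs the number of discourse-new words in the
    intervening span (a schematic stand-in for the SPLT's cost metric)."""
    costs = [0] * len(words)
    for i, j in dependencies:               # (earlier word, integration site)
        costs[j] += sum(discourse_new[i + 1 : j + 1])
    return costs

# Object-extracted relative clause: 'reporter' must be integrated at both
# the embedded verb and the main verb, producing high costs at the verbs.
words = "the reporter who the senator attacked admitted the error".split()
deps = [(0, 1), (3, 4), (1, 5), (4, 5), (1, 6), (5, 6), (7, 8), (6, 8)]
new = [0, 1, 0, 0, 1, 1, 1, 0, 1]           # 1 = introduces a discourse referent
print(list(zip(words, integration_costs(words, deps, new))))
```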

2002 — 2006
Gibson, Edward
Intonational Boundaries in Sentence Production and Comprehension @ Massachusetts Institute of Technology
With National Science Foundation support, Dr. Edward Gibson will conduct three years of psycholinguistic research on relations between intonational (or prosodic) phrasing and syntactic structure in sentence production and comprehension. This research asks where people tend to place intonational boundaries (pauses, roughly speaking) in producing sentences, and where people perceive such boundaries in sentences. The working hypothesis of the production experiments is that the probability of producing an intonational boundary at a given location is proportional to the sum of (1) the number of phonological phrases over which the most recently processed syntactic phrase extends, and (2) the number of phonological phrases over which the upcoming syntactic phrase extends, as long as it is not an argument of the most recently processed word. This and related hypotheses will be tested using analyses of natural speech corpora and a reader-listener paradigm. Participants in the reader-listener paradigm say sentences that they have read in advance. Other participants answer comprehension questions on the sentences after they are produced. The research also investigates whether preferences in comprehension mirror those in production. Methods for investigating comprehension will include complexity ratings, comprehension question accuracy, and cross-modal lexical decision. This project is important for several reasons. First, it will broaden our knowledge of the relationship between language and other aspects of human cognition, such as memory. The results of the work will also be of interest to researchers in computer speech generation and analysis, and language acquisition. Speech processing systems need to model human preferences in intonational boundary placement in order to both improve understanding of human speech and synthesize more natural sounding speech. With respect to language acquisition, it has been proposed that intonation can help learners acquire syntactic knowledge. Uncovering the relationship between intonational phrasing and syntactic structure will help to evaluate whether such claims are viable.
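The working hypothesis above is directly computable. A minimal sketch follows, with phonological-phrase counts and argumenthood flags supplied by hand (in practice they would come from a syntactic and prosodic analysis); normalizing the summed counts into probabilities is an additional simplifying assumption.

```python
def boundary_probs(junctions):
    """Each junction is (left, right, is_arg): the number of phonological
    phrases spanned by the just-completed syntactic phrase, the number
    spanned by the upcoming phrase, and whether the upcoming phrase is an
    argument of the most recently processed word (in which case its size
    does not count). Scores are normalized into boundary probabilities."""
    raw = [left + (0 if is_arg else right) for left, right, is_arg in junctions]
    total = sum(raw)
    return [score / total for score in raw]

# Three candidate boundary sites in one sentence; all counts are invented.
print(boundary_probs([(1, 2, True), (3, 2, False), (1, 1, False)]))
# -> [0.125, 0.625, 0.25]: a boundary is most likely at the second site
```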

2008 — 2010
Gibson, Edward
Doctoral Dissertation Research: Empirical Studies and Probabilistic Models of Word Segmentation and Word Learning @ Massachusetts Institute of Technology
How do children learn their first words? To learn even a simple word like "table," a child must first pick out that word, rather than, for example, the non-word "the-tay," from the continuous string of syllables comprising a sentence like "it's under the table." The child must also learn that the word "table" refers to one particular object (a table), rather than any of the other things present at the time the word was used (a chair, the family dog, etc.). Recent research suggests that children make use of the statistical distribution of phonemes to solve the first problem, word segmentation, and similarly make use of the co-occurrence statistics of words and objects to solve the second problem, word-world mapping.
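A simplified sketch of the segmentation idea: compute forward transitional probabilities between adjacent syllables and posit word boundaries at probability troughs. The syllable stream and threshold are invented for illustration; this is the familiar Saffran-style statistic, not the dissertation's actual model.

```python
from collections import Counter

def transition_probs(stream):
    """Forward transitional probabilities P(next | current) over syllables."""
    pairs = Counter(zip(stream, stream[1:]))
    firsts = Counter(stream[:-1])
    return {(a, b): n / firsts[a] for (a, b), n in pairs.items()}

def segment(stream, tps, threshold=0.9):
    """Posit a word boundary wherever the transitional probability dips."""
    words, word = [], [stream[0]]
    for a, b in zip(stream, stream[1:]):
        if tps[(a, b)] < threshold:     # TP trough: posit a word boundary
            words.append("".join(word))
            word = []
        word.append(b)
    words.append("".join(word))
    return words

# Three artificial words (pabiku, tibudo, golatu) concatenated without pauses
stream = "pa bi ku ti bu do go la tu ti bu do pa bi ku go la tu pa bi ku ti bu do".split()
print(segment(stream, transition_probs(stream)))
# -> ['pabiku', 'tibudo', 'golatu', 'tibudo', 'pabiku', 'golatu', 'pabiku', 'tibudo']
```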
Graduate student Michael Frank, under the guidance of Dr. Edward Gibson, will use computational methods to help characterize the nature of the learning mechanisms involved in these tasks. This grant will support studies with adults at MIT and a collaboration with Dr. Anne Fernald at Stanford University to conduct experiments with infants and young children. The goal of these experiments is to vary natural parameters in the learning situation, using simple artificial languages, to find out what makes learning harder or easier. Computational models will be evaluated on their fit to these data. This project will both contribute to our understanding of how children acquire the first words of their language and provide new directions for computer scientists attempting to create natural language processing systems.

2009 — 2012
Gibson, Edward
Collaborative Research: Bayesian Cue Integration in Probability-Sensitive Language Processing @ Massachusetts Institute of Technology
This award is funded under the American Recovery and Reinvestment Act of 2009 (Public Law 111-5).
The process of sentence comprehension involves incrementally accessing the meanings of individual words and combining them into larger representations. In this process, readers and listeners use probabilistic cues to guide their expectations of upcoming words and of the syntactic roles that those words will play. For example, previous research has demonstrated that people are sensitive to word frequency (more frequent words or word meanings are easier to process than less frequent ones), syntactic frequency (more frequent rules are easier to process than less frequent rules) and world knowledge (more likely events are easier to process than less likely events). However, a complete theory of language processing must not only identify the cues that people are sensitive to, but also specify how a reader or listener combines them. Towards this end, this project investigates the extent to which readers' syntactic processing mechanisms fit several cue combination models that are based on different Bayesian inference methods. Bayesian models provide a formalism specifying how any set of probabilistic information sources can be optimally weighted and combined. In this research, each of the cue combination models is applied to a wide range of language cues which people have been shown to rely on. The models will then be compared in terms of their fit to human reading-time data. The project will use existing reading-time data sets from previous experiments, as well as new reading-time data from language materials consisting of single sentences in null contexts and in supportive contexts, systematically varying several cues that have been shown to have measurable reading-time effects in previous literature. This will demonstrate which of the inference methods provides the most accurate description of the computations that human readers perform in order to understand sentences.
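One natural baseline for such a combination rule treats the cues as conditionally independent and combines them multiplicatively with the prior, as in the sketch below. This is a generic illustration of one candidate scheme among those the project compares; the hypothesis names and all probabilities are invented.

```python
def combine_cues(prior, cue_likelihoods):
    """Posterior over parse hypotheses: P(h | cues) is proportional to
    P(h) times the product over cues of P(cue | h), i.e., a naive-Bayes
    combination of conditionally independent cues."""
    posterior = {}
    for h, p in prior.items():
        for likelihood in cue_likelihoods:
            p *= likelihood[h]
        posterior[h] = p
    z = sum(posterior.values())
    return {h: p / z for h, p in posterior.items()}

# Two parses of a temporarily ambiguous string; all numbers are invented.
prior = {"main-clause": 0.7, "reduced-relative": 0.3}
cues = [
    {"main-clause": 0.2, "reduced-relative": 0.6},  # verb-form cue
    {"main-clause": 0.1, "reduced-relative": 0.8},  # plausibility cue
]
print(combine_cues(prior, cues))  # the reduced-relative parse now dominates
```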
Previous work in the field of sentence comprehension has established that people use a diverse range of probabilistic cues when they interpret a sentence. However, several important questions remain unanswered, including (a) how much each cue matters in typical texts; and (b) how the cues are combined in the course of comprehension. The main advance of this research is to use a combination of experimental and computational modeling methods to answer these two questions. The techniques and results developed will be broadly useful in at least three general areas: (1) cognitive science; (2) engineering; and (3) human applications of language research. First, this project will help researchers by providing a database of reading times for a large corpus of English text, which any researcher will be able to use to evaluate theories of language processing. In addition, the project will provide open-source software that researchers can use or modify to evaluate related theoretical questions in language and other fields of cognitive science. Second, the project will provide a way for computer engineers who research language to investigate the effects that human readers are sensitive to, indicating potential directions for fruitful research. And third, an understanding of how language is processed will in the long run aid in developing better diagnostic tools and treatments for people with developmental and acquired language disorders.

2010 — 2012
Gibson, Edward; Piantadosi, Steven
Doctoral Dissertation Research: Discovering Semantic Primitives @ Massachusetts Institute of Technology
Many words in language have meanings which require a rich and structured representational system. For instance, the meaning of "most" compares the relative sizes of two sets: a sentence like "Most musicians are happy" is true if the happy musicians outnumber the unhappy musicians. However, the representational system which supports these types of set comparisons is not well-understood. Indeed, there are often multiple ways a computational system could realize such meanings. For "most," one might compare the number of happy musicians to the number of unhappy musicians, or to half the total number of musicians (e.g. Hackl 2009, Pietroski et al. 2009). The current project aims to discover the representational system for these types of complex and abstract word meanings.
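The two candidate procedures for "most" mentioned above are easy to state as set computations, and they are extensionally equivalent even though they implicate different representational primitives, which is exactly why behavioral evidence is needed to tell them apart. A minimal illustration:

```python
def most_by_comparison(A, B):
    # "Most A are B" via set comparison: |A intersect B| > |A minus B|
    # (the happy musicians outnumber the unhappy musicians)
    return len(A & B) > len(A - B)

def most_by_half(A, B):
    # "Most A are B" via a cardinality threshold: |A intersect B| > |A| / 2
    return len(A & B) > len(A) / 2

musicians = {"ann", "bo", "cy", "di", "ed"}
happy = {"ann", "bo", "cy", "zoe"}           # 3 of the 5 musicians are happy
print(most_by_comparison(musicians, happy))  # True: 3 > 2
print(most_by_half(musicians, happy))        # True: 3 > 2.5
```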
Many results in cognitive science have found that people are biased to learn concepts which are simpler in their representational system: people find it easier to learn a concept like "black chairs" from examples than the more complex "short and black chairs." The current project will teach people novel, language-like concepts. It will use a new computational model to predict what generalizations people should make in the learning experiment according to different possible representational theories, under the assumption that representational "simplicity" influences learning. This will allow multiple representational systems to be compared to see which best fits human learning patterns.
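A sketch of the modeling logic, assuming a description-length prior over candidate concepts (all names and numbers are illustrative; the dissertation's model is richer):

```python
def posterior(hypotheses, data):
    """Simplicity-biased concept learning: prior P(h) proportional to
    2^(-size(h)); the likelihood keeps only hypotheses consistent with
    every labeled example."""
    scores = {}
    for name, (size, predict) in hypotheses.items():
        consistent = all(predict(x) == y for x, y in data)
        scores[name] = 2.0 ** -size if consistent else 0.0
    z = sum(scores.values())
    return {name: s / z for name, s in scores.items()}

# Objects are (color, height) pairs; both concepts fit the examples,
# but the shorter formula receives more posterior weight.
hypotheses = {
    "black":           (1, lambda o: o[0] == "black"),
    "short and black": (2, lambda o: o[0] == "black" and o[1] == "short"),
}
data = [(("black", "short"), True), (("red", "tall"), False)]
print(posterior(hypotheses, data))  # {'black': 0.67, 'short and black': 0.33}
```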
Uncovering the basic operations which underlie linguistic representation is important for understanding the way in which the mind learns and uses complex meanings. Moreover, the representational systems which best describe human learning are likely to provide a good basis for artificial systems which use and learn natural language like humans.

2010 — 2013
Gibson, Edward
Origins of Numerical Competence: Assessment of Number Sense in Piraha @ Massachusetts Institute of Technology
This RAPID project focuses on "fundamental" research on three distinct aspects of number sense: (1) a small exact number system; (2) a large approximate number system; and (3) a system for set-based quantification. Recent exploratory research suggests a strong linkage between number sense (particularly the large approximate number system) and student abilities in other domains of mathematics. The investigators will study the links among these three aspects, and the study is designed to extend these preliminary findings.
The investigators propose a correlational study in which they test a population of the indigenous Piraha people of Brazil (a small, isolated, monolingual hunter-gatherer group from Amazonas) and a sample of Americans (60 in each group) on a battery of cognitively oriented tasks which measure different core numerical systems as well as other basic cognitive abilities such as short-term memory and face perception (as control tasks). The Piraha are an ideal test case for understanding the relationship among core numerical systems because their language has no words for numbers. In addition, the Piraha do not use exact number in their society, and they do not adopt cultural or linguistic conventions from other cultures. A RAPID award is justified because their population is threatened by imminent development.
This research is important because a deeper understanding of the conceptual and cognitive components of number sense, and of how they interrelate, may change our understanding of how students learn and teachers teach this area of mathematics. And because number sense is so foundational to mathematics, and because preliminary research results show its potential importance to future mathematics learning, the project may have a transformative impact on the field.

2012 — 2015
Gibson, Edward; Kline, Melissa (co-PI)
Doctoral Dissertation Research: Causal Representations in Children's Transitive Sentences @ Massachusetts Institute of Technology
This Doctoral Dissertation Research Improvement grant will support the work of doctoral student Melissa Kline under the direction of Dr. Edward Gibson.
The structures of sentences contain clues to their meanings. A nonsense sentence like "The dax gorped the blicket to the voom" allows for some initial guesses: 'gorping' probably means something about sending or transferring. The research carried out in this project will address the issue of how children learn the rules that match sentence meanings with sentence form. An important test for this question is a basic sentence type, the transitive sentence (e.g. "Jane broke the lamp.") Across languages, transitive sentences are reliably used to describe causal events, but some languages like English also allow for other kinds of meanings (e.g. "Jane liked the lamp.") Even so, adult speakers tend to expect new transitive verbs to refer to causal events. Do children use a similar strategy to help them learn new verbs? Depending on the learning biases children use and the input they hear from their parents, children might make very different guesses than adults about new verb meanings.
To address these questions, we will examine the guesses that toddlers make about the meaning of a novel word like 'daxing.' Co-PI Kline will also use conversation transcripts to compare these guesses to the transitive sentences that parents actually say to their children. Currently, we know that children have a broad preference for certain scenes to match with transitive sentences: a girl pushing a boy in a wheelbarrow is a better guess for "The girl daxed the boy" than a girl and a boy waving their arms separately. In this research, Ms. Kline will use what is known about causal perception to create closely matched scene contrasts to discover the specific cues and features that children associate with transitive verbs.
This project will contribute to our understanding of how children learn verbs and sentence structures and of how conceptual representations are related to language. Funding this project also contributes to the training of a graduate student.

2014 — 2015
Bergen, Leon (co-PI); Gibson, Edward
Doctoral Dissertation: Investigating the Role of Grammatical Representation in Language Learnability @ Massachusetts Institute of Technology
Technologies which process natural language have become ubiquitous in the last decade. Web search engines, for example, process billions of pages of text, in order to determine which of those pages best match a user's search query. Many interfaces for interacting with computers -- for example, Apple's Siri personal assistant -- take voice-issued commands from their users, and must process these commands in order to follow the users' instructions. Finally, machine translation technologies have become available for many of the world's most common languages, allowing users to automatically translate text that they find in foreign books or websites. These technologies mostly rely on simple models of language, known as n-gram models or context-free grammars, which were developed in the 1950s and 1960s, and refined in later decades. These simple models of language have many advantages, most notably that they can be used to process large amounts of data very quickly. Because of their simplicity, however, these models are not able to capture many aspects of meaning in natural language. This has resulted in limitations for the technologies discussed above; virtual personal assistants are only able to process very simple types of instructions, and machine translation is still far from being as accurate as human translation. In the current project, Leon Bergen and Dr. Edward Gibson will be investigating more sophisticated kinds of language models, with the goal of increasing the ability of computers to understand language.
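For concreteness, the following is a minimal bigram (two-word) model of the simple n-gram kind the paragraph describes; it is a toy sketch, not any particular production system:

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """P(w_i | w_{i-1}) estimated by counting adjacent word pairs, with
    start/end markers; the whole model is just a table of counts."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for a, b in zip(tokens, tokens[1:]):
            counts[a][b] += 1
    return {a: {b: n / sum(c.values()) for b, n in c.items()}
            for a, c in counts.items()}

model = train_bigram(["Mary kicked the ball", "Mary liked the ball"])
print(model["Mary"])   # {'kicked': 0.5, 'liked': 0.5}
```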
Under the direction of Dr. Gibson, Mr. Bergen will be studying language models known as mildly context-sensitive grammars. These grammars are able to express certain types of linguistic knowledge that humans have, but which cannot be expressed using simpler types of grammatical formalisms. For example, native speakers of English know that a declarative sentence like "Mary kicked the ball" is closely related in meaning to the question "What did Mary kick?" Although this fact seems obvious, it is difficult (or impossible) to express using simple types of grammars. However, mildly context-sensitive grammars can be used to express this knowledge in a very natural way. Mr. Bergen and Dr. Gibson will be studying whether mildly context-sensitive grammars can be automatically learned from examples of grammatical sentences. To do this, they will be using techniques from machine learning, a branch of computer science and statistics that develops algorithms that can automatically learn from data. The researchers will integrate these learning algorithms with their grammatical formalism, and will test whether their method learns an accurate grammar. The accuracy of the grammar will be evaluated using a corpus -- a collection of sentences -- in which every sentence has been manually annotated with its correct grammatical structure. If accurate mildly context-sensitive grammars can be learned in this manner, then this provides a potential method for improving the natural language processing technologies discussed above. In particular, because this method does not require an expert to write down the complete grammar for a language, it has the potential to be deployed without tremendous engineering effort, and may be extended easily to other languages.

2015 — 2016
Mahowald, Kyle; Gibson, Edward
Doctoral Dissertation Research: Investigating Cognitive and Communicative Pressures on Natural Language Lexicons @ Massachusetts Institute of Technology
Understanding how humans produce and comprehend language is a critical step in understanding high-level human cognition and the human brain more generally. Moreover, basic research into human language has been, and will continue to be, useful for building computational natural language processing systems that enable humans to interact naturally with computers. The lexicons of the world's thousands of languages--that is, the sets of words that exist in any given language--offer a particularly rich source of insight into the language production and comprehension mechanism. The words of any given language have undergone thousands of years of evolution, sometimes changing dramatically over one or two generations as sounds change, new words are invented or borrowed from other languages, and old words die. What all languages have in common, however, is that they enable their speakers to successfully communicate with one another. Therefore, a language's lexicon is necessarily constrained by the cognitive and communicative demands of speakers. Consequently, studying the statistical properties of lexicons, the ways that lexicons evolve, and the process by which words are formed is a promising avenue for answering fundamental questions about human cognition.
Building on previous work by this research group showing that lexicons tend to be structured for efficient communication, this research will harness the power of large cross-linguistic data sets available through the Internet, including Wikipedia and Google Books, in order to study the lexicons of a large number of world languages (~100). Specifically, this analysis will focus on how words cluster or spread out in phonetic space, exploring competing demands for words to consist of easy-to-pronounce and easy-to-comprehend sequences but also to be phonetically distinct from one another. A second major component of this work is a series of human-participant behavioral experiments that, in a controlled laboratory setting and in a smaller number of languages, explore the mechanisms that underlie how words change over time. Finally, a computational model will be used to integrate the insights of the statistical analyses and behavioral experiments in order to explore and predict how words enter and exit the lexicon over time. This research program has implications not just for higher-level human cognition but for any engineering applications that require human-computer interaction involving natural language and also for any applications that require building a cognitively tractable communication system that allows people to communicate efficiently.
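A crude version of the clustering statistic can be computed with nothing more than string edit distance, as below; the dissertation's analyses use richer phonetic representations and controls, so this is purely illustrative.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def mean_pairwise_distance(lexicon):
    """Average form distance between words: one crude index of how
    clustered vs. dispersed a lexicon is in phonetic space."""
    pairs = [(a, b) for i, a in enumerate(lexicon) for b in lexicon[i + 1:]]
    return sum(levenshtein(a, b) for a, b in pairs) / len(pairs)

print(mean_pairwise_distance(["cat", "bat", "hat", "rat"]))     # tightly clustered
print(mean_pairwise_distance(["cat", "dog", "fish", "horse"]))  # more dispersed
```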

2015 — 2019
Gibson, Edward
The Role of Noise in Information-Theoretic Models of Sentence Comprehension and Production @ Massachusetts Institute of Technology
Human language as it is produced and understood is full of errors: people make speech errors, they make typographical errors when typing or texting, and there is often background noise that makes it impossible to perceive words accurately. Given the noisy nature of human language in practice, it is surprising that people can understand one another so well. The question of how people can communicate given noise is not yet solved, and is the focus of this work. Understanding how humans understand noisy language is critical for two reasons. First, language technologies must be capable of processing noisy language input: translation services need to account for errors in the text being translated; search engines need to process noisily-generated web content. Evidence concerning how humans understand language in noise can lead to improvements in the design of language technologies. In addition, until dialogue systems can produce coherent language responses--likely decades away--any practical application of such systems must be designed with an understanding of how humans deal with noisy or confusing language input. Second, on the clinical side, understanding how humans understand language which might contain errors will provide insights into language comprehension disorders. Recent research has shown that individuals with aphasia appear to assume the presence of more errors in the input than healthy participants, and thus show stronger reliance on their prior beliefs about the world when interpreting language. Applications of this work may lead to more efficient diagnosis and treatment options for such patients.
The goals of the proposed research are twofold. First, the researchers will investigate noise in the process of language comprehension, where noise falls into three categories: (a) deletions, such that the listener or reader might miss something that was intended; (b) insertions, such that the producer might accidentally insert something; and (c) swaps, such that the producer might accidentally switch elements in the stream. Second, the researchers will investigate an information-theoretic approach to memory in sentence production, where memory is a source of potential errors in language use. Recent human vision research suggests that memory capacity is best modeled as a limitation on the complexity of the representations, in terms of information-theoretic units called "bits". Simple representations require very few bits of information, but complex representations require many. The proposed research extends this idea to language, such that high-frequency words and phrases such as "the boy sees the girl" should be easy to store in memory, while less frequent components such as "the woman who the man met was tall" should be difficult to store in memory.
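A schematic sketch of the comprehension side: score each candidate intended sentence by its prior probability times a penalty per corruption, where corruptions are counted with a word-level edit distance covering the three noise categories above (plus substitutions). The candidate set, priors, and edit penalty are all invented for illustration; the project's actual models are richer.

```python
def edits(x, y):
    """Word-level Damerau-Levenshtein distance: deletions, insertions,
    substitutions, and adjacent swaps."""
    n, m = len(x), len(y)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i - 1][j] + 1,                      # deletion
                          d[i][j - 1] + 1,                      # insertion
                          d[i - 1][j - 1] + (x[i - 1] != y[j - 1]))
            if i > 1 and j > 1 and x[i - 1] == y[j - 2] and x[i - 2] == y[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)     # adjacent swap
    return d[n][m]

def interpret(perceived, prior, p_edit=0.1):
    """P(intended | perceived) proportional to P(intended) * p_edit^(#edits)."""
    scores = {c: p * p_edit ** edits(perceived.split(), c.split())
              for c, p in prior.items()}
    z = sum(scores.values())
    return {c: round(s / z, 3) for c, s in scores.items()}

# Priors over plausible intended sentences (numbers invented): the literal
# reading is implausible, so one inferred deletion of "to" wins out.
prior = {"the mother gave the candle the daughter": 0.05,
         "the mother gave the candle to the daughter": 0.85,
         "the mother gave the daughter the candle": 0.10}
print(interpret("the mother gave the candle the daughter", prior))
```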

2016 — 2017
Gibson, Edward; Futrell, Richard
Doctoral Dissertation Research: A Communicative Perspective on Quantitative Syntax @ Massachusetts Institute of Technology
This research addresses the question: why are human languages the way they are? The researchers propose that a number of the properties of languages can be explained as design choices that maximize ease of communication for the human brain. Understanding how languages are (or aren't) optimized for communication will be crucial to developing scientific models of human language understanding and learning. The resulting understanding of the purpose of certain language structures will enable more effective teaching of those structures in second language pedagogy. In addition, understanding why languages are the way they are will also affect computer systems for natural language understanding. Over the last decade there have been major advances in computer understanding of natural language, often using surprisingly simple algorithms. It is not known what properties of language make it possible to get so far with such simple algorithms. Once these properties are understood, it will be possible to leverage them to develop even more effective algorithms.
The research focuses on predicting the quantitative word order properties of languages as observed in large annotated corpora of text. Data come from the Universal Dependencies project, a recent collaboration of several universities and Google to develop uniformly annotated dependency-parsed corpora for about 40 languages. The researchers will develop simple models of communication that incorporate the probability of error in communication and certain well-known points of difficulty in human language comprehension (due to factors such as limited working memory resources). The frequency distribution of different word orders in the dependency corpora will be modeled by assuming that approximately rational speakers select utterances which have a high probability of being understood. The researchers aim to use these models to explain the distribution of dependency length (the distance between syntactically related words) and the distribution of the degree of word order freedom (the extent to which words can appear in different orders while keeping the same meaning).
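Dependency length itself is straightforward to compute from a dependency parse. A minimal sketch, using the 1-based head-index convention of CoNLL-format Universal Dependencies files (the toy parses below are hand-built, not corpus output):

```python
def total_dependency_length(heads):
    """Sum of linear distances between each word and its syntactic head.
    heads[i] is the 1-based index of word (i+1)'s head, 0 for the root."""
    return sum(abs((i + 1) - h) for i, h in enumerate(heads) if h != 0)

# "John threw out the trash" vs. "John threw the trash out":
# the early particle placement yields a shorter total dependency length.
print(total_dependency_length([2, 0, 2, 5, 2]))  # 6: "threw out the trash"
print(total_dependency_length([2, 0, 4, 2, 2]))  # 7: "threw the trash out"
```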

2016 — 2017
Gibson, Edward
Workshop on Language Processing and Language Evolution: Special Session at the 2017 CUNY Conference on Human Sentence Processing @ Massachusetts Institute of Technology
It is a central goal of the field of linguistics to characterize the process by which the human brain links the forms of language with representations of meaning--a process unique to humans. Understanding this process has practical value in a world where language barriers and foreign language learning are increasingly important, where new media and new forms of language use (for example, on the Internet) play an increasingly large role, and where algorithms that seek to endow computers with an understanding of human languages are proliferating and have increasing impact on day-to-day life. Within the field of psycholinguistics, the most common approach is to take languages as given and to study the processes by which humans comprehend and produce utterances in those languages. Here, using ideas from functional linguistics, quantitative typology, and evolution, workshop participants will explore the reverse approach. Taking some knowledge of human information processing as given, participants will examine systematic differences among languages, variation among grammatical constructions within languages, and the diachronic emergence and development of language structures. Using this evolutionary view of language, presenters will discuss novel hypotheses about the relationships between processing and linguistic structure, and shed new light on how human language operates both within speakers and over time.
This project is for a special conference session, Language Processing and Language Evolution, to be held in conjunction with the 30th Annual CUNY Conference on Human Sentence Processing. The CUNY Human Sentence Processing conference is the premier event in North America for scientists interested in how humans comprehend and produce language. The conference regularly receives well over 300 abstracts for roughly 35 oral presentation slots and 150 poster slots. Approximately 400 scientists (faculty, postdocs, graduate and undergraduate students) attend the conference as audience members and presenters. Conference attendees come from all over the United States and Canada, and from Europe, Australia, East Asia and Latin America. The conference has been remarkably successful as an interdisciplinary forum, drawing researchers from the fields of linguistics, psychology, computer science, education, neuroscience, and philosophy. The special session is designed to extend our current knowledge of language processing by providing a richer characterization of the linguistic systems that humans learn and use, from the perspective of models of language evolution: how these linguistic systems got to be the way they are. Workshop participants will explore how human languages have and have not been shaped by information processing constraints. The ultimate goal is to create a more realistic and representative picture of diversity among languages and of how the human mind represents and processes those languages.

2020 — 2023
Gibson, Edward
Evaluating Meaning-Based Explanations of Syntactic Island Effects Cross-Linguistically @ Massachusetts Institute of Technology
A core problem in understanding constraints on syntax in human languages is understanding constraints on long-distance dependencies, such as in "Which sportscar did the color of __ delight the baseball player?" The long-distance dependency between the wh-phrase (a question word or phrase beginning with "wh") "which sportscar" and its interpretation position following "color of" is not acceptable in English. This kind of observation led Chomsky to propose that constraints on long-distance dependencies may be innate and unlearnable: part of Universal Grammar. An alternative approach proposes that the difficulty of a long-distance dependency depends on the discourse status of the extracted element in the construction at hand: the extracted element is a focus (corresponding to new information) in wh-questions, but not in relative clauses. The current project investigates predictions of these two kinds of accounts. The results will inform theories about the form of extraction constraints in English and French, and in human language more generally. This project will provide interdisciplinary training to a postdoctoral researcher and to undergraduate students, including under-represented minorities. This research will build connections between the fields of experimental psychology, cognitive science and linguistics.
The research to be conducted here consists of behavioral acceptability experiments in English and French on extraction out of subjects and extraction out of adjuncts, both of which have been claimed in the syntactic literature to be difficult to extract from. For example, according to a discourse-based theory, extractions from subjects will improve even in wh-questions if the extraction position can be marked as a focus, e.g., with contrastive focus markers like "even" and "only", as in "Which sportscar did even the color of __ delight the baseball player?"
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

2021 — 2024
Gibson, Edward; Levy, Roger
CompCog: Noisy-Channel Processing in Human Language Understanding @ Massachusetts Institute of Technology
Every day we understand hundreds of sentences that we have never encountered and we produce hundreds more. This success is remarkable given the noisy environments in which language takes place, the errors speakers make, and limitations of our memory and attention. The present project develops and tests a theory of robust language understanding. The investigators combine tools of information theory, natural language processing, linguistics, and experimental psychology to provide a mathematically formalized model of human language comprehension as probabilistic inference over a “noisy channel”. The project contributes to our basic scientific understanding of human language and the human mind, while strengthening bridges between psycholinguistics and contemporary artificial intelligence research. The work has wide-ranging long-term potential to enhance our understanding of healthy cognitive performance and development in the area of language and to identify and guide treatments for developmental and acquired language disorders.
In this program of research, the investigators develop a computationally and algorithmically precise theory of how human understanding of sentences unfolds moment-by-moment. This incremental noisy-channel theory is implemented using state-of-the-art symbolic and neural network-based approaches to modeling language from artificial intelligence and natural language processing. A key component includes an account of how the distributional statistics of language shape noisy memory representations used during real-time language processing. Distinctive empirical predictions regarding robustness to errors in the linguistic input and regarding when and how the proposed mechanisms influence comprehension, allow this approach to be evaluated relative to alternative psycholinguistic theories. The predictions are tested using controlled behavioral experiments on how native speakers process and interpret linguistic input.
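As an illustration of how a noisy memory representation can be made computationally precise, the sketch below erases each context word independently and averages the next word's surprisal over the resulting noisy memories. The framing and the toy language model are assumptions for illustration, not the project's implemented theory.

```python
import math
import random

def lossy_context_surprisal(context, word, lm, p_erase=0.2, samples=2000):
    """Average surprisal of `word` when the context is stored lossily:
    each context word is independently erased with probability p_erase,
    and difficulty is the mean surprisal over sampled noisy memories.
    `lm(context, word)` is any conditional-probability function supplied
    by the caller."""
    total = 0.0
    for _ in range(samples):
        noisy = [w for w in context if random.random() > p_erase]
        total += -math.log2(lm(noisy, word))
    return total / samples

def toy_lm(context, word):
    # Invented numbers: "was" is likelier if the embedded verb "met"
    # survives in memory to signal that the relative clause has closed.
    if word == "was":
        return 0.6 if "met" in context else 0.05
    return 0.01

random.seed(0)
context = "the woman who the man met".split()
print(lossy_context_surprisal(context, "was", toy_lm))
```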
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.