2000 — 2001
Resnik, Philip
Workshop: Student Research in Computational Linguistics, at the ACL'2000 Conference @ University of Maryland College Park
This is funding to subsidize expenses of student participants in the Student Research Workshop organized in conjunction with the Association for Computational Linguistics Conference (ACL'2000), which was held October 3-6, 2000, in Hong Kong. The Association for Computational Linguistics (ACL) is the primary international organization in the field of natural language processing and language engineering, with two regional chapters, Europe (EACL) and North America (NAACL), of approximately equal size. The Association's annual conference, which rotates between North America and Europe, is the major international meeting in the field, and was held for the first time this year in a Pacific Rim country. The workshop format allows students sufficient time to present their research (25 minutes) and receive feedback from a panel of established researchers in the field (15 minutes). It provides students with invaluable exposure to outside perspectives on their work at a critical time in their research, through feedback from the panel and other student participants. The ACL Student Workshop is an inexpensive yet highly effective means of encouraging up-and-coming computational linguists. The intimate format encourages the student participants to begin building a rapport with established researchers. This nurturing effort should pay dividends by more effectively guiding students in this rapidly changing research field. In addition, by building a supportive environment for these students, it is more likely that down the road they will in turn lend a supporting hand to other students who follow.

2001 — 2004
Dorr, Bonnie (co-PI); Resnik, Philip
Collaborative Proposal: Using the Web as a Corpus for Empirical Linguistic Research @ University of Maryland College Park
This project will develop tools that make it possible to retrieve naturally occurring sentences from the World Wide Web on the basis of lexical content and syntactic structure, providing linguists with an immediate, easily accessible source of raw linguistic data. The PIs will investigate specific linguistic hypotheses at the lexical semantics/syntax interface as an illustrative application of these tools. At a high level, the planned work constitutes an important step toward a new paradigm for linguistic research. Rather than relying entirely on introspective data generated by the linguist who is trying to (dis)prove a particular hypothesis, Web-enabled linguistic research will draw on the methodology and the tools developed by the PIs to supply naturally occurring data on which theories can rest. With regard to specific linguistic questions, the goal is to provide an explanation of the rules and constraints that govern three transitivity alternations (Middle, Unaccusative, Unspecified Object Deletion), and the PIs expect data made available by their tools to shed light on the "grey" area between competence and performance, that is, the linguistic behavior that seems to fall outside of rule-governed behavior. Although naturally occurring data are not accorded great emphasis in generative syntax, the use of text corpora has a tradition in the greater linguistic enterprise. An explosive new phenomenon in the world of naturally occurring text, the World Wide Web is an essentially untapped resource that embodies the rich and dynamic nature of language, presenting a data resource of unparalleled size and diversity.

2003 — 2007
Druin, Allison; Preece, Jennifer (co-PI); Resnik, Philip
Technology for Cross-Cultural Communication in a Children's International Book Community @ University of Maryland College Park
Every day news media report misunderstandings, intolerance or outright aggression between people from different cultures. Age-old disputes over land, water, religious and cultural practices fuel intolerance. Children absorb the ambiance, culture and attitudes of their community. Consequently, cycles of intolerance pass from generation to generation. But there is some hope for change: research has shown that sharing personal experiences can change attitudes. When it comes to developing tolerance, early intervention is best. The multicultural education literature highlights the effectiveness of children interacting with children from other cultural groups, as well as reading children's literature from other cultures. The aim of this project is to develop technology and social structures needed for children who speak different languages to learn with and from each other in a digital community setting centered on children's books. The PIs will address two closely linked technical challenges. The first is the design of child-friendly user interfaces to support online interaction and communication between children; this will be addressed via cooperative inquiry, in which design teams (including researchers and child participants) gather field data, initiate ideas, and test and develop new prototypes. The second is the problem of translation; this will be addressed by involving the communicating children themselves as collaborative participants in an innovative approach to developing language technology resources that support cross-language communication without full-scale machine translation. A combination of methods will be used to evaluate the proposed research. Data gathered in classrooms will be used to understand changes in how children from different cultures use new technologies to communicate, interpret stories, develop collaborative narratives, and perceive other cultures.
This research will leverage the ongoing NSF ITR-funded International Children's Digital Library (ICDL), with which the PIs are involved, in order to provide access to books that create common ground for children's "ICDL communities." This project will add value to ICDL by contributing technologies that give children who speak different languages an opportunity to interact. The intellectual merit of this research includes: new interfaces that enable children who speak different languages to communicate safely with each other in ways that are comfortable for them; advances in translation technology; and evaluation of the impact of children's experiences in ICDL communities.
Broader Impacts: The broad impacts deriving from this research include two significant contributions: a model online community using technology to advance the cause of tolerance and understanding; and a technology test-bed that will be made available to the scientific community in order to lower the barriers to entry for further development of cross-cultural online communities for children and adults.

2008 — 2011
Resnik, Philip; Lin, Jimmy
Putting the Clouds in Context: Statistical Machine Translation with MapReduce @ University of Maryland College Park
Statistical machine translation (SMT) promises to bridge the language divide in today's multi-cultural and multi-faceted society. Systems capable of converting text from one language into another have the potential to transform how diverse individuals and organizations communicate. Despite recent successes, we see two critical impediments to continued progress in translation technology: (1) the development of systems depends on access to large amounts of data, and the growth of available resources has far outpaced increases in the performance of individual computers; and (2) current systems for the most part do not take the context of what they are translating into account. With few exceptions, systems translate sentence by sentence, and do not differentiate whether the input text is a newswire article or a children's book. This project advances the state of the art in SMT by addressing both issues. Since divide-and-conquer techniques running on multiple processors are currently the only practical solutions to large-data problems, we must develop scalable algorithms that can exploit large computer clusters. MapReduce is an attractive framework for tackling these challenges since it hides low-level distributed processing issues such as synchronization, fault tolerance, etc., allowing the researcher to focus on actually solving the problem. By coupling network analysis with cross-language information retrieval techniques, we can build rich, multilingual contextual models that will guide an SMT system in translating different types of text. We focus on cross-language enrichment of Wikipedia as an application for demonstrating this technology. Although Wikipedia has emerged as a valuable repository of human knowledge, it has yet to transcend the language barrier. For the most part, contributors work in silos defined by languages, without the benefit of knowledge that is being accumulated elsewhere. 
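The divide-and-conquer pattern described above can be illustrated in miniature. The sketch below is not the project's actual system: the tiny bitext, the function names, and the use of whole (rather than fractional) co-occurrence counts are all invented for illustration, and the MapReduce shuffle is simulated in-process rather than run on a cluster.

```python
from collections import defaultdict

# Hypothetical toy corpus of aligned (source, target) sentence pairs;
# a real SMT pipeline reads millions of pairs from distributed storage.
BITEXT = [
    ("das haus", "the house"),
    ("das buch", "the book"),
    ("ein haus", "a house"),
]

def map_phase(sentence_pair):
    """Mapper: emit a count of 1 for each (source word, target word)
    co-occurrence. Real alignment mappers emit fractional expected
    counts; whole counts keep the sketch simple."""
    src, tgt = sentence_pair
    for s in src.split():
        for t in tgt.split():
            yield ((s, t), 1)

def reduce_phase(key, values):
    """Reducer: sum all counts observed for one word pair."""
    return key, sum(values)

# Simulate the shuffle: group mapper output by key.
grouped = defaultdict(list)
for pair in BITEXT:
    for key, value in map_phase(pair):
        grouped[key].append(value)

counts = dict(reduce_phase(k, vs) for k, vs in grouped.items())
print(counts[("haus", "house")])  # "haus" co-occurs with "house" in two pairs
```

Because each mapper call touches only one sentence pair and each reducer call only one key, both phases parallelize trivially across machines, which is exactly the property that lets such pipelines scale with data size.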
The potential broader impact of this project is no less than knowledge dissemination across language boundaries, which will serve to enrich the lives of all the world's citizens.

2008 — 2010
Resnik, Philip
SGER: Exploiting Alternative Packagings of Source Meaning in Statistical Machine Translation @ University of Maryland College Park
Current approaches in statistical machine translation (MT) miss a key fact: the source language sentence is not the only way the author's meaning could have been expressed. The idea that the source sentence is just one of various "packagings" of underlying meaning was, of course, one familiar motivation for interlingual approaches to translation; however, interlingual semantic representations have generally been abandoned as notoriously difficult to define, and equally difficult to obtain accurately with broad coverage once defined. In this project, we are revisiting the idea of "packagings" of meaning, but exploring it in practical ways consistent with current practice in statistical MT. Unlike semantic transfer or interlingual approaches, we encode alternatives as source paraphrase lattices, a representation that allows us to exploit generalizations about the source language while still maintaining the surface-to-surface orientation that characterizes the statistical state of the art. Our exploratory work focuses on capturing syntactic and semantic variation using Lexicalized Well Founded Grammars (LWFG), a recent formalism that balances expressiveness with practical and provable learnability results. We are quantifying and characterizing the information available in source paraphrase lattices, assessing the value of shallow paraphrasing, and exploring the relative promise of deeper techniques for source paraphrase generation using LWFG and other constraint-based grammatical frameworks. The ability to capture generalizations via source paraphrase may open new possibilities in the translation of minority and endangered languages, which lack training corpora on the scale necessary to support standard statistical MT techniques.
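To make the lattice idea concrete, here is a deliberately minimal sketch, not the project's implementation: real source paraphrase lattices are weighted finite-state graphs, whereas this toy represents one as a list of alternative word sets per position and simply enumerates the surface sentences it encodes. The example lattice and function name are invented.

```python
# Hypothetical toy lattice: each position offers alternative words;
# every path through the positions is one "packaging" of the meaning.
LATTICE = [["the"], ["quick", "fast", "speedy"], ["fix"]]

def expand(lattice):
    """Enumerate every surface sentence encoded by the lattice --
    the alternative inputs a decoder could choose among."""
    sentences = [[]]
    for alternatives in lattice:
        sentences = [s + [w] for s in sentences for w in alternatives]
    return [" ".join(s) for s in sentences]

for sentence in expand(LATTICE):
    print(sentence)
```

In practice a decoder would search the lattice directly rather than expand it, since the number of paths grows multiplicatively with each position's alternatives.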

2009 — 2013
Bederson, Benjamin; Resnik, Philip
CDI-Type I: Translation as a Collaborative Process @ University of Maryland College Park
Natural language translation remains a crucial problem that is expensive, slow to develop solutions for, and difficult to scale. While automated approaches often convey the gist, fully automated high-quality translation remains far out of reach for the vast majority of the world's languages. A variety of projects are now emerging that tap into the Web-based community of people willing to help translate, but bilingual expertise is quite rare compared to the total availability of volunteers. This project will investigate whether a combination of machine translation and human participants that speak only a single language (i.e., monolingual speakers) can result in high quality translation. The research is organized around development of an iterative protocol that combines elements of machine translation, human and semi-automated language annotation, and human correction, motivated by concepts in information theory and discourse analysis. This research framework will support both synchronous and asynchronous pairwise interaction among human participants as well as a "bag of tasks" approach that permits truly distributed human computation.
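The shape of such an iterative protocol can be sketched as a loop. Everything below is an invented stub, not the project's protocol: the MT system, the monolingual fluency edit, and the adequacy judgment are stand-in functions, and the one-entry "translation table" exists only so the sketch runs.

```python
# A minimal sketch of iterative monolingual collaboration (all
# functions are hypothetical stubs standing in for real components).

def machine_translate(src):
    """Stub MT system: a one-entry lookup table for illustration."""
    return {"el perro duerme": "the dog sleep"}.get(src, src)

def monolingual_edit(hypothesis):
    """A target-language speaker fixes fluency without seeing the
    source (here, a hard-coded correction)."""
    return hypothesis.replace("dog sleep", "dog sleeps")

def adequacy_ok(hypothesis):
    """Stand-in for a source-language speaker's judgment that the
    meaning survived (here, a trivial keyword check)."""
    return "dog" in hypothesis

def translate_iteratively(src, max_rounds=3):
    """Alternate machine output and monolingual correction until the
    result is judged adequate or the round budget is exhausted."""
    hyp = machine_translate(src)
    for _ in range(max_rounds):
        hyp = monolingual_edit(hyp)
        if adequacy_ok(hyp):
            break
    return hyp

print(translate_iteratively("el perro duerme"))
```

The point of the loop structure is that neither participant needs to be bilingual: the machine bridges languages, and each human improves only the side they can read.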
With respect to broader impacts, this project is among the first to investigate the potential of hybrid human/machine translation involving non-bilingual human participants, combining practical implementation with empirically driven experimentation. If successful, this project will lower the bar for translation of natural languages, resulting in a widely available approach offering high quality translation for an unprecedentedly wide range of language pairs while reducing requirements and costs for bilingual expertise. The technology to be developed will be evaluated on a real-world problem: translation of books within the (previously NSF-funded) International Children's Digital Library project (www.childrenslibrary.org). The ICDL currently contains 4,000 books in 60 languages and has an active user population including 1,000 volunteers with differing language skills who are interested in helping with translation. Participants in Mexico, Romania, Mongolia, and the U.S. will act as early adopters in K-12 educational settings, supporting the ICDL's goal of enabling greater shared cultural understanding through this existing and growing resource.

2010 — 2014
Resnik, Philip; Lin, Jimmy; Boyd-Graber, Jordan (co-PI)
DC: Small: Cross-Language Bayesian Models for Web-Scale Text Analysis Using MapReduce @ University of Maryland College Park
The Web promises unprecedented access to the perspectives of an enormous number of people on a wide range of issues. Turning that still untamed cacophony into meaningful insights requires dealing with the linguistic diversity and scale of the Web. Most current research focuses on specialized tasks such as tracking consumer opinions, and virtually all current research treats the Web as both monolithic and monolingual, ignoring the variety of languages represented and the rich interplay between topics and issues under discussion.
This project moves the state of the art forward by focusing on two key challenges. The first is the development of highly scalable MapReduce algorithms for linguistic modeling within a Bayesian framework, making use of variational inference to achieve a high degree of parallelization on Web-scale datasets. The second is the creation of novel Bayesian models that learn consistent interpretations of text across languages and a wide range of response variables of interest (for example, views on an issue, strength of emotion relative to an event, and focus of attention).
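What makes Bayesian inference amenable to MapReduce is that, given fixed global parameters, each document's expectations can be computed independently. The toy below illustrates that structure only; it is not the project's model. The two hand-set topic-word distributions, the tiny documents, and the single hard-coded E-step/M-step round are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical fixed topic-word probabilities (invented values); a
# real model has large vocabularies and learned parameters.
TOPICS = {
    0: {"gene": 0.6, "cell": 0.3, "ball": 0.1},
    1: {"ball": 0.6, "team": 0.3, "gene": 0.1},
}

DOCS = [["gene", "cell", "gene"], ["ball", "team"]]

def e_step_mapper(doc):
    """Mapper: per-document E-step. For each token, compute its
    responsibility under each topic and emit expected (topic, word)
    counts. Documents are independent given TOPICS, which is what
    makes this step embarrassingly parallel under MapReduce."""
    for word in doc:
        weights = {k: TOPICS[k].get(word, 1e-9) for k in TOPICS}
        z = sum(weights.values())
        for k, w in weights.items():
            yield ((k, word), w / z)

def m_step_reducer(grouped):
    """Reducer: sum expected counts into new (unnormalized)
    topic-word statistics -- one parallel M-step update."""
    return {key: sum(vals) for key, vals in grouped.items()}

# Simulate the shuffle: group mapper output by key.
grouped = defaultdict(list)
for doc in DOCS:
    for key, val in e_step_mapper(doc):
        grouped[key].append(val)

stats = m_step_reducer(grouped)
print(stats[(0, "gene")] > stats[(1, "gene")])  # True: topic 0 explains "gene"
```

A full variational algorithm would iterate this map/reduce round until convergence, renormalizing the statistics into new topic-word distributions after each pass.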
The techniques developed in this project will be demonstrated on large crawls of Web pages and blogs. Potential applications for these technologies include helping a schoolchild learn that people in different countries may view some issues very differently, helping a politician understand how constituents are reacting to proposed legislation, or helping an intelligence analyst understand how public opinion is evolving in a hostile country.
For further information see the project Web page: http://www.umiacs.umd.edu/~jimmylin/cloud-computing

2012 — 2015
Resnik, Philip
SoCS: Collaborative Research: Data Driven, Computational Models for Discovery and Analysis of Framing @ University of Maryland College Park
This project studies framing, a central concept in political communication that refers to portraying an issue from one perspective with corresponding de-emphasis of competing perspectives. Framing is known to significantly influence public attitudes toward policy issues and policy outcomes. As social media allow greater citizen engagement in political discourse, scientific study of the political world requires reliable analysis of how issues are framed, not only by traditional media and elites but by citizens participating in public discourse. Yet conventional content analysis for frame discovery and classification is complex and labor-intensive. Additionally, existing methods are ill-equipped to capture those many instances when one frame evolves into another frame over time.
This project therefore develops new computational modeling methods, grounded in data-driven computational linguistics, aimed at improving the scientific understanding of how issues are framed by political elites, the media, and the public. This collaboration between political scientists and computer scientists has four goals: (a) developing novel methods for semi-automated frame discovery, whereby computational models guided by political scientists' expert knowledge speed up and augment their analytical process; (b) developing novel algorithms based on natural language processing for automatic frame analysis, producing measurably accurate results comparable with those of reliable human coders; (c) establishing the validity of these processes on well-understood cases; and (d) applying these methods to several current policy issues, using data across years and across traditional and social media streams. The resulting evolutionary framing data will help unpack the mechanisms of framing and help predict trends in public opinion and policy.

2020 — 2021
Resnik, Philip
RAPID: Advanced Topic Modeling Methods to Analyze Text Responses in COVID-19 Survey Data @ University of Maryland College Park
As the COVID-19 pandemic continues, public and private organizations are deploying surveys to inform responses and policy choices. Survey designs using multiple choice responses are by far the most common -- "open-ended" questions, where survey participants provide a longer-form written response, are used far less. This is true despite the fact that when participants can provide unconstrained spoken or text responses, it is possible to obtain richer, fine-grained information that clarifies the other responses, as well as useful "bottom-up" information that the survey designers did not know to ask for. A key problem is that analyzing the unstructured language in open-ended responses is a labor-intensive process, creating obstacles to using them especially when speedy analysis is needed and resources are limited. Computational methods can help, but they often fail to provide coherent, interpretable categories, or they can fail to do a good job connecting the text in the survey with the closed-end responses. This project will develop new computational methods for fast and effective analysis of survey data that includes text responses, and it will apply these methods to support organizations doing high-impact survey work related to COVID-19 response. This will improve these organizations' ability to understand and mitigate the impact of the COVID-19 pandemic.
This project's technical approach builds on recent techniques bringing together deep learning and Bayesian topic models. Several key technical innovations will be introduced that are specifically geared toward improving the quality of information available in surveys that include both closed- and open-ended responses. A common element in these approaches is the extension of methods commonly used in supervised learning settings, such as task-based fine-tuning of embeddings and knowledge distillation, to unsupervised topic modeling, with a specific focus on producing diverse, human-interpretable topic categories that are well aligned with discrete attributes such as demographic characteristics, closed-end responses, and experimental condition. Project activities include assisting in the analysis of organizations' survey data, conducting independent surveys aligned with their needs to obtain additional relevant data, and the public release of a clean, easy-to-use computational toolkit facilitating more widespread adoption of these new methods.
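The core analytic move, aligning open-ended text with a discrete closed-end response, can be shown in its simplest possible form. The sketch below is emphatically not the proposed deep-learning topic model: it is a stdlib-only illustration that groups word counts by closed-end answer and ranks words by a smoothed frequency ratio. The survey records and function names are invented.

```python
from collections import Counter

# Hypothetical survey records: (closed-end answer, open-ended text).
RESPONSES = [
    ("very worried", "worried about losing my job and paying rent"),
    ("very worried", "my job is gone and rent is due"),
    ("not worried", "working from home has been fine"),
    ("not worried", "life at home is mostly fine"),
]

def group_counts(records):
    """Count open-ended words separately for each closed-end answer --
    the simplest alignment of free text with a discrete attribute."""
    by_group = {}
    for answer, text in records:
        by_group.setdefault(answer, Counter()).update(text.split())
    return by_group

def distinctive(by_group, group, smoothing=1.0):
    """Rank words by smoothed frequency ratio: how much more often a
    word appears in this group than in all other groups combined."""
    ours = by_group[group]
    others = Counter()
    for g, c in by_group.items():
        if g != group:
            others.update(c)
    return sorted(
        ours,
        key=lambda w: (ours[w] + smoothing) / (others[w] + smoothing),
        reverse=True,
    )

counts = group_counts(RESPONSES)
print(distinctive(counts, "very worried")[:4])
```

A topic model plays the same role at scale: instead of ranking individual words, it produces interpretable topic categories whose prevalence can be compared across closed-end answers, demographics, or experimental conditions.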
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

2020 — 2022
Resnik, Philip; Miler, Kristina (co-PI)
RI: Small: Modeling Co-Decisions: a Computational Framework Using Language and Metadata @ University of Maryland College Park
In many settings, entire groups are presented with a decision -- for example, a set of legislators can be presented with a bill to vote on, a set of scientific authors can decide whether or not to cite a piece of research in their papers, or a set of social media users might decide whether or not to share a piece of online content. It is very standard to look independently at choices made by individuals, but a more complete scientific understanding of decision-making can be obtained by looking at the decision process in terms of whether individuals will make the same decision or not, taking into account what the individual deciders do and do not have in common. This project is looking at that question by developing new computational methods to help better understand what goes into the decisions people make.
The project begins with computational models of co-voting in political contexts, moving from "how does an individual vote, and why?" to "do these individuals vote the same way, and why?". It generalizes and extends such models, going beyond established factors such as party and state, by enabling incorporation of unstructured language from bills to characterize the issues under consideration and to incorporate analysis and comparison of individuals' language. The extended framework will be validated by demonstrating improved predictive performance on datasets derived from proceedings of the U.S. Congress, making possible direct evaluation against prior work and enabling new substantive analyses of political rhetoric and decision making. In the process, the project also develops richer analysis of individuals' language using techniques that identify interpretable, task-relevant language and by applying recently developed methods for incorporating covariates into topic analysis. These advances will be validated by incorporating them within the extended co-voting framework, and will also contribute to the investigation of substantive questions about Congressional decision-making. Finally, the project will address more general use cases by applying the approach beyond the political domain, moving from modeling of co-voting to modeling co-decisions, where a decision is a generalized vote. The generalized model will be validated via application to the problem of scientific citation recommendation.
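The shift from individual votes to co-votes amounts to a change in the unit of analysis, which can be sketched directly. The toy below is an invented illustration, not the project's model: it recasts a tiny hypothetical roll-call dataset as legislator pairs and compares agreement rates for same-party versus cross-party pairs, the simplest covariate a richer model would extend with bill text and rhetoric.

```python
from itertools import combinations

# Hypothetical roll-call data: legislator -> (party, {bill: vote}).
VOTES = {
    "A": ("blue", {"hr1": 1, "hr2": 1, "hr3": 0}),
    "B": ("blue", {"hr1": 1, "hr2": 1, "hr3": 1}),
    "C": ("red",  {"hr1": 0, "hr2": 1, "hr3": 1}),
}

def co_vote_pairs(votes):
    """Recast individual votes as pairwise co-decisions: for every
    pair of legislators and every bill both voted on, record whether
    they voted the same way and whether they share a party."""
    rows = []
    for (x, (px, vx)), (y, (py, vy)) in combinations(votes.items(), 2):
        for bill in vx.keys() & vy.keys():
            rows.append({
                "pair": (x, y),
                "bill": bill,
                "same_party": px == py,
                "agree": vx[bill] == vy[bill],
            })
    return rows

def agreement_rate(rows, same_party):
    """Fraction of co-decisions that agree, within one party status."""
    hits = [r["agree"] for r in rows if r["same_party"] == same_party]
    return sum(hits) / len(hits)

rows = co_vote_pairs(VOTES)
print(agreement_rate(rows, True), agreement_rate(rows, False))
```

In the full framework, `same_party` would be one feature among many: text-derived measures of a bill's issue content and similarities between individuals' own language would enter the same pairwise prediction problem.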
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

2021 — 2025
Espy-Wilson, Carol; Resnik, Philip; Dickerson, John
SCH: INT: Collaborative Research: Using Multi-Stage Learning to Prioritize Mental Health @ University of Maryland, College Park
According to the World Health Organization and the Global Burden of Disease 2010 studies, mental health issues are a top contributor to global disease and a leading cause of disability worldwide. The personal and societal toll is enormous. Mental illness is a common precursor to suicide, which is the second leading cause of death among youth and young adults between 10 and 34 years of age. In economic terms, mental illness exceeds cardiovascular diseases in the projected 2011-2030 cost of noncommunicable diseases (USD 16.3 trillion worldwide). Complicating this picture further is the fact that mental healthcare is desperately resource-limited, and clinicians treating people for mental health problems operate in a vacuum between visits. This project proposes a fundamental shift in how machine learning is used to approach the problem of mental health detection and monitoring, with a technological investigation that brings together speech analysis, language analysis, and machine learning research, informed by deep clinical experience and expertise and fueled by ethically collected data. A tiered multi-armed bandit framework will be used to provide a highly flexible way to evaluate multiple kinds of evidence in settings where there can be diverse methods for assessment that vary in cost and the value of the information they provide. As such, it is an excellent fit for the real-world problem of mental health assessment in resource-limited settings. Investigations will include simulations of patient monitoring between clinical visits that will be informed by realistic, real-world assumptions and team members' clinical experience treating patients with schizophrenia, depression, and risk of suicide.
At the core of this project's technical approach is the recognition that the “multi-armed bandit” problem in machine learning is a good fit for the real-world scenario that mental health providers face when monitoring a population of patients in treatment: what is the best way to allocate limited resources among competing choices, given only limited information? This project develops a tiered multi-armed bandit formulation, where a succession of stages is applied to a population of patients in order to best allocate different types of resources, each with different per-patient impact but also cost. Conceptually, tiered approaches are familiar in current medical practice. For example, patient contact typically progresses from a receptionist, to a nurse or intake coordinator, perhaps to a certified nurse practitioner, to a primary care doctor, ultimately to a specialist, each step involving corresponding increases in both the cost of the professional involved and their degree of expertise. The tiered multi-armed bandit model developed by this award includes concerns of stochastic and adverse selection, where patients at one tier do not proceed deterministically to the next, even when explicitly selected. It also incorporates complex (e.g., non-linear, such as monotone submodular) objective functions that better capture within-cohort interactions. One core strength of the tiered model is that it provides a flexible way to incorporate multiple kinds of evaluative evidence in settings where there can be diverse methods for assessment that vary in cost and the value of the information they provide. Toward that end, this project also includes both text analysis and speech analysis components that make use of ethically collected language and speech data and clinically validated assessments of mental condition.
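The tiered allocation idea, spend cheap assessments broadly and escalate only the highest-priority cases to expensive ones, can be sketched deterministically. Everything below is invented for illustration: the two stub assessment functions, the costs, the patient scores, and the greedy escalation rule. The project's actual model additionally handles stochastic selection and submodular objectives, which this sketch omits.

```python
# Tier definitions: (cost per patient, assessment function).

def cheap_screen(patient):
    """Tier 1 stub: an inexpensive, coarse signal (e.g., an automated
    text-based score)."""
    return patient["text_score"]

def clinician_review(patient):
    """Tier 2 stub: an expensive, more informative assessment."""
    return patient["clinical_score"]

TIERS = [(1.0, cheap_screen), (10.0, clinician_review)]

def run_funnel(patients, budget, keep_fraction=0.5):
    """Push a cohort through successive tiers, spending budget on
    cheaper assessments first and escalating only the top-scoring
    fraction -- the 'prioritization funnel' pattern."""
    cohort = list(patients)
    spent = 0.0
    for cost, assess in TIERS:
        affordable = int((budget - spent) // cost)
        cohort = cohort[:affordable]          # respect remaining budget
        spent += cost * len(cohort)
        cohort.sort(key=assess, reverse=True) # assess, then rank
        keep = max(1, int(len(cohort) * keep_fraction))
        cohort = cohort[:keep]                # escalate only the top
    return cohort, spent

patients = [
    {"id": 1, "text_score": 0.9, "clinical_score": 0.2},
    {"id": 2, "text_score": 0.8, "clinical_score": 0.9},
    {"id": 3, "text_score": 0.1, "clinical_score": 0.5},
    {"id": 4, "text_score": 0.7, "clinical_score": 0.1},
]

final, spent = run_funnel(patients, budget=25.0)
print([p["id"] for p in final], spent)
```

Note how the funnel corrects itself: patient 1 looks highest-risk on the cheap tier-1 signal, but the tier-2 assessment (run only on the escalated half of the cohort) surfaces patient 2 instead, at a fraction of the cost of reviewing everyone.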
Techniques developed under this award, while directly motivated by and tested in the mental health setting, will be useful in other healthcare settings as well as in other domains where a "prioritization funnel" is in play, including talent sourcing and customer acquisition.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.