2008 — 2013 |
Nenkova, Ani |
RI-Medium: Collaborative Research: Corpus-Based Studies of Lexical, Acoustic, and Discourse Entrainment in Spoken Dialogue @ University of Pennsylvania
Participants in human-human conversation often entrain to one another, adopting the vocabulary and other behaviors of their partners. Evidence of this has been found in laboratory studies and in observations of real-life situations. We are investigating many types of entrainment in two large corpora of human-human conversations to improve system behavior in Spoken Dialogue Systems (SDS). We want to discover which types of entrainment occur generally across speakers and which seem to be speaker-specific, which types can be reliably linked to task success and perceived naturalness, and which types can be automatically modeled in SDS. This research is important for the construction of better SDS. Currently, research SDS attempt to entrain users to system vocabularies to improve speech recognition accuracy: since users are likely to employ the same vocabulary in their answers that systems use in their queries, systems have a better chance of recognizing user input correctly if they can predict word usage. However, there has been little attempt to create SDS that entrain to user behavior, despite evidence that people rate both humans and systems that behave more like themselves more highly than those that do not. Our work focuses on determining which types of system entrainment to users will be most important to users and most feasible for SDS. Our results will be disseminated through papers and presentations at speech and language conferences. We will also provide publicly available annotated corpora for future research by others.
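As a minimal illustration of the kind of measurement involved, the sketch below scores lexical entrainment as overlap between two speakers' most frequent words over a dialogue. The function names, the top-k overlap statistic, and the toy data are assumptions for illustration, not the project's actual metric.

```python
from collections import Counter

def vocab(turns):
    """Bag of word counts over one speaker's turns (lists of tokens)."""
    counts = Counter()
    for turn in turns:
        counts.update(w.lower() for w in turn)
    return counts

def lexical_entrainment(turns_a, turns_b, top_k=25):
    """Fraction of speaker A's top-k words that are also in B's top-k.

    A crude proxy for the vocabulary convergence studied in the project.
    """
    top_a = {w for w, _ in vocab(turns_a).most_common(top_k)}
    top_b = {w for w, _ in vocab(turns_b).most_common(top_k)}
    return len(top_a & top_b) / top_k if top_k else 0.0

# Toy usage: two speakers converging on "the boston flight".
a = [["the", "flight", "to", "boston"], ["book", "the", "boston", "flight"]]
b = [["which", "flight"], ["the", "boston", "flight", "please"]]
print(f"entrainment score: {lexical_entrainment(a, b, top_k=5):.2f}")  # 0.60
```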
2010 — 2017 |
Nenkova, Ani |
CAREER: Capturing Content and Linguistic Quality in Automatic Extractive and Abstractive Summarization @ University of Pennsylvania
This CAREER proposal deals with the development of novel systems for automatic summarization that incorporate both linguistic and content quality considerations in their operation. The main motivation for the work is that even the best current systems do not take the characteristics of the input into account, cannot estimate how successfully they perform content selection, and completely ignore the linguistic quality of their output.
Improvement of the linguistic quality of summaries requires combining and weighing a wide range of text quality factors: discourse relations, topic/entity/word coherence, form of referring expressions, and vocabulary. Tools for automatic extraction of such models from the input text, including automatic discourse analysis of explicit and implicit discourse relations, are developed as part of the project. The resulting models of linguistic quality will have broader impact on a whole range of text-producing applications, including question answering, machine translation, automatic essay grading, and computer-assisted writing tutoring.
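To make the coherence factors concrete, here is a deliberately simple sketch that scores a text by how often adjacent sentences share an entity-like mention. The capitalization/length heuristic for mentions is a hypothetical stand-in for real entity detection, not the model developed in the project.

```python
def entity_coherence(sentences):
    """Fraction of adjacent sentence pairs sharing an entity-like mention.

    sentences: list of token lists.  Capitalized or long words serve as a
    crude proxy for entity mentions (hypothetical heuristic).
    """
    def mentions(sent):
        return {w.lower() for w in sent if w[:1].isupper() or len(w) > 5}

    pairs = list(zip(sentences, sentences[1:]))
    if not pairs:
        return 1.0
    shared = sum(1 for s1, s2 in pairs if mentions(s1) & mentions(s2))
    return shared / len(pairs)

doc = [["John", "filed", "the", "report"],
       ["The", "report", "covered", "earnings"],
       ["Rain", "is", "expected", "tomorrow"]]
print(entity_coherence(doc))  # 0.5: only the first pair shares a mention
```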
Improvement of content quality requires taking characteristics of the input into account. In particular, we develop measures of input difficulty, which enable systems to predict automatically whether they can produce a good-quality summary for a given input and permit a change of summarization strategy when necessary. Specialized summarization strategies for input types on which current system performance is known to be suboptimal are also developed.
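One way to picture an input difficulty measure is as a single statistic over the input's vocabulary; the sketch below uses word-distribution entropy as an illustrative signal of heterogeneity. The statistic and the threshold are assumptions for illustration, not the measures proposed here.

```python
import math
from collections import Counter

def vocabulary_entropy(documents):
    """Entropy of the word distribution over a multi-document input.

    Higher entropy means a more diffuse vocabulary, a plausible
    (illustrative) signal that the input is heterogeneous and
    therefore harder to summarize.
    """
    counts = Counter(w.lower() for doc in documents for w in doc.split())
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def predict_difficulty(documents, threshold=6.0):
    """Flag inputs whose vocabulary entropy exceeds a tuned threshold."""
    return "hard" if vocabulary_entropy(documents) > threshold else "easy"

docs = ["the senate passed the budget bill",
        "the budget bill passed the senate vote"]
print(predict_difficulty(docs))  # "easy": the two documents largely agree
```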
Text quality and summarization are research topics with cross-disciplinary appeal. The PI will offer project-based courses at the undergraduate and graduate levels, which have the potential to attract young people to the field of computer science.
2011 — 2014 |
Nenkova, Ani |
CI-P: Collaborative Research: Summarizing Opinion and Speaker Attitude in Speech @ University of Pennsylvania
As part of this planning project, the PIs test the feasibility of collecting a corpus of conversational speech including both broadcast and telephone conversations. The corpus is annotated to support research in extractive and abstractive summarization of opinion and attitude in speech. The goal of the pilot annotation effort is the adaptation and refinement of current opinion and attitude annotation schemes for conversational data. The PIs are also organizing a workshop at the 2011 meeting of the Association for Computational Linguistics, during which they plan to solicit annotation desiderata and feedback from the researchers who are the future users of the resource. The pilot annotations include abstractive and extractive summaries; rich mark-up with existing automatic tools for prosodic event detection, discourse relations, and topic words; and extractive summaries from current baseline systems.
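Of the mark-up layers above, topic words are commonly selected by contrasting word frequencies in the target data against a background corpus; the smoothed log-odds sketch below is one standard recipe, offered for illustration rather than as the project's actual tool.

```python
import math
from collections import Counter

def topic_words(target_tokens, background_tokens, k=10):
    """Rank words by smoothed log-odds of target vs. background frequency."""
    t, b = Counter(target_tokens), Counter(background_tokens)
    nt, nb = sum(t.values()), sum(b.values())

    def log_odds(w):
        p_target = (t[w] + 1) / (nt + len(t))      # add-one smoothing
        p_background = (b[w] + 1) / (nb + len(b))
        return math.log(p_target / p_background)

    return sorted(t, key=log_odds, reverse=True)[:k]

target = "we should tax carbon emissions heavily".split()
background = "we should go to the park and we should eat".split()
print(topic_words(target, background, k=3))  # ['tax', 'carbon', 'emissions']
```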
It is increasingly important to track opinions and information on a wide spectrum of issues, and increasingly difficult to do so in the face of enormous amounts of information in textual and audio form. Speech data is notoriously hard to search using current technologies, so developing new tools for this type of speech search is particularly important. The corpus the PIs are developing will enable the development of automatic techniques to address this problem. Users will benefit from better tools to identify and summarize opinions that concern their daily choices related to health, diet, purchases, and the environment.
2012 — 2013 |
Nenkova, Ani |
NAACL-HLT 2012 Student Workshop @ University of Pennsylvania
The NAACL HLT conference is the major international conference in North America in the field of natural language processing. The goal of this grant is to subsidize the travel, conference, and housing expenses of students selected to participate in the NAACL HLT Student Research Workshop, which will be held during the conference, June 3-8, in Montreal, Canada. The workshop aims to attract papers both from authors in the early stages of academic work (possibly undergraduate and masters students) and from students who are approaching graduation and would like to present their thesis work. The goal is to attract students from the first group to further academic work, and to help students from the second group in their job search and career planning.
Papers from the student workshop are presented as posters during the main poster session of the conference. Senior researchers are assigned as mentors to each student and provide individual feedback. A general session on how to review papers and what to expect from reviewers is held during a lunch slot. The workshop is organized and run by students.
The Student Research Workshop provides a valuable opportunity for the next generation of natural language processing researchers to enter the computational linguistics community. It allows the best students in the field to take their first important step toward becoming professional computational linguists by receiving critical feedback on their work from external experts, and by making contacts with other students and senior researchers in their field. The students who are involved in running and selecting papers for the workshop also gain valuable opportunities for professional growth and interaction with the researchers on the organizing committee of the main conference. The workshop contributes to the maintenance and development of a skilled and diverse computational linguistics and natural language processing research community.
2016 — 2017 |
Ives, Zachary (co-PI); Nenkova, Ani; Wallace, Byron Casey |
UH2 | Activity Code Description: To support the development of new research activities in categorical program areas. (Support generally is restricted in level of support and in time.) |
Crowdsourcing Mark-Up of the Medical Literature to Support Evidence-Based Medicine and Develop Automated Annotation Capabilities @ Northeastern University
Evidence-based medicine (EBM) promises to transform the way that physicians treat their patients, resulting in better quality and more consistent care informed directly by the totality of relevant evidence. However, clinicians do not have the time to keep up to date with the vast medical literature. Systematic reviews, which provide rigorous, comprehensive and transparent assessments of the evidence pertaining to specific clinical questions, promise to mitigate this problem by concisely summarizing all pertinent evidence. But producing such reviews has become increasingly burdensome (and hence expensive) due in part to the exponential expansion of the biomedical literature base, hampering our ability to provide evidence-based care. If we are to scale EBM to meet the demands imposed by the rapidly growing volume of published evidence, then we must modernize EBM tools and methods. More specifically, if we are to continue generating up-to-date evidence syntheses, then we must optimize the systematic review process. Toward this end, we propose developing new methods that combine crowdsourcing and machine learning to facilitate efficient annotation of the full-texts of articles describing clinical trials. These annotations will comprise mark-up of sections of text that discuss clinically relevant fields of importance in EBM, such as discussion of patient characteristics, interventions studied and potential sources of bias. Such annotations would make literature search and data extraction much easier for systematic reviewers, thus reducing their workload and freeing more time for them to conduct thoughtful evidence synthesis. This will be the first in-depth exploration of crowdsourcing for EBM. We will collect annotations from workers with varying levels of expertise and cost, ranging from medical students to workers recruited via Amazon Mechanical Turk. We will develop and evaluate novel methods of aggregating annotations from such heterogeneous sources. And we will use the acquired manual annotations to train machine learning models that automate this markup process. Models capable of automatically identifying clinically salient text snippets in full-text articles describing clinical trials would be broadly useful for biomedical literature retrieval tasks and would have impact beyond our immediate application of EBM.
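The aggregation step can be pictured with a minimal weighted-vote sketch over token-level span labels, where each worker's vote counts in proportion to an estimated reliability. This is an illustrative baseline (the weights are assumed known here; Dawid-Skene-style estimation is the usual refinement), not the method the project will develop.

```python
from collections import defaultdict

def aggregate_spans(annotations, worker_weight):
    """Weighted vote over token-level span labels from many workers.

    annotations: {worker_id: set of token indices the worker marked,
    e.g., as 'patient characteristics'}.
    worker_weight: {worker_id: reliability in [0, 1]}, assumed known
    here; in practice it would be estimated (e.g., Dawid-Skene style).
    Returns the tokens whose weighted support exceeds half the total.
    """
    score = defaultdict(float)
    for worker, tokens in annotations.items():
        for tok in tokens:
            score[tok] += worker_weight[worker]
    half = sum(worker_weight.values()) / 2
    return {tok for tok, s in score.items() if s > half}

# Toy usage: one trusted medical student and two crowd workers.
votes = {"student": {3, 4, 5}, "turk1": {4, 5, 6}, "turk2": {5}}
weights = {"student": 0.9, "turk1": 0.4, "turk2": 0.4}
print(sorted(aggregate_spans(votes, weights)))  # [3, 4, 5]
```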
2017 — 2019 |
Nenkova, Ani |
EAGER: Predicting Domain-Level Reading Comprehension Difficulty to Support Adult Learning @ University of Pennsylvania
Regardless of their level of education, participants in the modern workforce are expected to be flexible in their ability to read and learn in new, often technical domains, including scientific subfields, medicine, policy, and care. Remarkably, however, there is little, if any, technological support for such learning, which is often dynamic and requires reading texts as needed rather than in an ordered curriculum as in classroom learning. This EArly-concept Grant for Exploratory Research (EAGER) addresses the need for technological support for adult learning in technical domains by exploring the feasibility of simulating the behavior of an expert reader in identifying important content and drawing inferences that connect rich background knowledge to the text at hand. Readers will be provided access to pilot implementations of the inferences and importance judgements of the simulated expert, to focus their attention and strengthen their comprehension of the text.
The development of exploratory models of expert reader behavior relies on techniques for separating text vocabulary into technical and plain language by contrasting word occurrence statistics in the domain with those in general text, such as a random sample of telephone conversations or news. The project will also explore the adaptation of techniques for definition mining and for deriving prototypical event sequences and shallow ontologies from large volumes of typical domain text.
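The vocabulary-contrast idea lends itself to a compact sketch: words whose relative frequency in domain text far exceeds their frequency in general text are flagged as technical. The ratio statistic, smoothing, and threshold below are illustrative assumptions, not the project's method.

```python
import math
from collections import Counter

def technical_terms(domain_tokens, general_tokens, min_log_ratio=1.0):
    """Words much more frequent in domain text than in general text.

    Crude add-one smoothing; the threshold is illustrative and would
    need tuning to corpus size in any real use.
    """
    d, g = Counter(domain_tokens), Counter(general_tokens)
    nd, ng = sum(d.values()), sum(g.values())
    scores = {}
    for w in d:
        ratio = math.log(((d[w] + 1) / (nd + 1)) / ((g[w] + 1) / (ng + 1)))
        if ratio >= min_log_ratio:
            scores[w] = ratio
    return sorted(scores, key=scores.get, reverse=True)

domain = "the epithelial cells express the receptor protein".split()
general = "the cat sat on the mat and the dog ran".split()
print(technical_terms(domain, general))  # domain words; 'the' filtered out
```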