2000 — 2003 |
Radev, Dragomir |
ITR: Information Fusion Across Multiple Text Sources: A Common Theory @ University of Michigan Ann Arbor
A common theory of information fusion across multiple text sources is to be developed. Three main tasks are undertaken: (a) robust techniques for identifying structure across sets of related textual documents in arbitrary domains are developed and used to produce graph representations of the document sets, (b) an environment in which users can specify their summarization preferences is created, and (c) graph-based methods are applied to produce personalized multi-document summaries of clusters of the related documents based on the users' priorities.
Cross-document structure is based on features such as paraphrasing, contradiction, change of perspective, and complementation. A large-scale taxonomy of cross-document links is being investigated.
Providing users with personalized abstracts of large amounts of critical textual information is expected to speed up and otherwise facilitate their access to the Web. Large-scale deployment of a Web-based summarization system based on cross-document structure is planned and is expected to be used by millions of users.
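The graph-based summarization in task (c) can be sketched with a LexRank-style centrality computation over a sentence-similarity graph. The threshold, damping factor, and bag-of-words similarity below are illustrative assumptions, not the project's actual algorithm:

```python
import math
import re
from collections import Counter

def tokenize(sentence):
    return re.findall(r"[a-z']+", sentence.lower())

def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = math.sqrt(sum(v * v for v in a.values())) * \
          math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def lexrank(sentences, threshold=0.1, damping=0.85, iters=50):
    """Score sentences by centrality in a similarity graph (LexRank-style)."""
    bags = [Counter(tokenize(s)) for s in sentences]
    n = len(sentences)
    # Connect sentence pairs whose similarity clears the threshold.
    adj = [[1.0 if i != j and cosine(bags[i], bags[j]) > threshold else 0.0
            for j in range(n)] for i in range(n)]
    # Row-normalize into a stochastic matrix (uniform row if isolated).
    for row in adj:
        s = sum(row)
        for j in range(n):
            row[j] = row[j] / s if s else 1.0 / n
    # Power iteration with damping, as in PageRank.
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [(1 - damping) / n +
                  damping * sum(scores[i] * adj[i][j] for i in range(n))
                  for j in range(n)]
    return scores

def summarize(sentences, k=2):
    """Extract the k most central sentences, kept in original order."""
    scores = lexrank(sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: -scores[i])
    return [sentences[i] for i in sorted(ranked[:k])]
```

Personalization, in this sketch, would amount to reweighting edges or the damping vector according to the user's stated priorities.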
2002 — 2003 |
Radev, Dragomir |
Workshop on Effective Tools and Methodologies for Teaching Natural Language Processing and Computational Linguistics; Philadelphia, PA @ University of Michigan Ann Arbor
The goal of this project is to bring together young and experienced college faculty to encourage scholarship in the teaching of computational linguistics. This will take place at a workshop on "Effective Tools and Methodologies for Teaching Natural Language Processing and Computational Linguistics," to be held during the annual conference of the Association for Computational Linguistics on July 7, 2002, in Philadelphia, PA. The workshop will enable new and prospective computational linguistics faculty to learn from their peers and to share resources related to teaching NLP and CL.
2003 — 2007 |
Radev, Dragomir; Abney, Steven |
Collaborative Research: Semantic Entity and Relation Extraction From Web-Scale Text Document Collections @ University of Michigan Ann Arbor
This project addresses current limitations in automatic information extraction technology. Specific objectives are to: (1) use bootstrapping techniques to greatly increase the number of types of entities and relations that can be extracted and the rate at which one is able to create new extractors; (2) improve the performance of supervised training for entity and relation extractors by using bootstrapping to add additional training features and by applying new supervised learning techniques, including new perceptron and discriminative training techniques; and (3) address meta-data issues of provenance, confidence, and temporal extent of facts, focusing particularly on the construction of a model of the expected lifetime of facts based on a longitudinal corpus of Web data.
The outcome of the project will be scientific understanding and technology for automatic information extraction from free text, making it possible to convert large document collections into formal databases suitable for automated processing. This will represent a significant enhancement in the utility and societal benefit of digital libraries and the World Wide Web. Project results will be disseminated in the form of publications and publicly available code for information extraction and learning of extractors.
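The bootstrapping idea in objective (1) can be illustrated with a minimal DIPRE-style sketch: alternate between inducing textual patterns from known entity pairs and using those patterns to harvest new pairs. The example relation, the literal-string pattern representation, and the absence of any confidence filtering are simplifying assumptions; a real system would score patterns and prune noisy ones:

```python
import re

def bootstrap(corpus, seeds, rounds=2):
    """DIPRE-style bootstrapping sketch for one target relation.

    corpus: list of sentences; seeds: set of (X, Y) tuples, e.g.
    (organization, headquarters location)."""
    pairs = set(seeds)
    for _ in range(rounds):
        # Induce patterns: the literal text between a known pair's mentions.
        patterns = set()
        for x, y in pairs:
            for sent in corpus:
                m = re.search(re.escape(x) + r"(.+?)" + re.escape(y), sent)
                if m:
                    patterns.add(m.group(1))
        # Apply each pattern to the corpus to extract new pairs.
        for pat in patterns:
            for sent in corpus:
                for m in re.finditer(r"(\w[\w ]*?)" + re.escape(pat) +
                                     r"(\w[\w ]*)", sent):
                    pairs.add((m.group(1).strip(), m.group(2).strip()))
    return pairs
```

Starting from a single seed pair, the loop generalizes the connecting context into a pattern and recovers analogous pairs from unseen sentences.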
2003 — 2006 |
Radev, Dragomir |
Probabilistic and Link-Based Methods For Exploiting Very Large Textual Repositories @ University of Michigan Ann Arbor
This research project addresses the disconnect between the way in which humans ask questions on the Web and the existing interfaces to state-of-the-art search engines. Search engines require online searchers to formulate their requests in idiosyncratic query languages whose syntax is unnatural and hard for typical users to learn. Furthermore, existing search engines are notoriously bad at returning documents that contain none of the terms given by the user and yet are relevant to the user's information need. The proposed work focuses on two areas of research: (1) probabilistic question-to-query transformation (query modulation) for Web access and (2) models of content transfer over Web links. The approach for (1) involves designing and evaluating algorithms and systems for automatic, rule-based conversion of natural language queries into the language of specific search engines. Part (2) facilitates retrieval of relevant Web documents by virtue of the links to them from other relevant documents. The expected outcomes and impact of this project are threefold: (1) a better understanding of the interaction between document retrieval and question answering in a Web environment, (2) better models describing how document relevance is transferred over the Web hypergraph, and (3) better algorithms for natural language access to the Web, which will make it easier for millions of Web users to find the information they need in a timely, accurate, and intuitive way. All findings and artifacts developed under this grant will be widely disseminated and incorporated into a public-domain search engine, and the results will be accessible via the project Web site (http://tangra.si.umich.edu/clair).
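Query modulation can be illustrated with a toy rule-based transformer. The stopword list, rules, and engine names below are invented for illustration; the project's actual transformations are probabilistic and tailored to real engine syntaxes:

```python
import re

# Hypothetical question words and stopwords to drop (illustrative only).
STOPWORDS = {"what", "who", "where", "when", "how", "why", "is", "are",
             "the", "a", "an", "of", "in", "do", "does", "did", "was", "were"}

def modulate(question, engine="default"):
    """Rule-based question-to-query transformation (query modulation).

    Converts a natural language question into a keyword query in the
    syntax of a (hypothetical) target search engine."""
    tokens = re.findall(r"\w+", question.lower())
    content = [t for t in tokens if t not in STOPWORDS]
    if engine == "plus":      # engines that mark mandatory terms with '+'
        return " ".join("+" + t for t in content)
    if engine == "boolean":   # engines with explicit AND syntax
        return " AND ".join(content)
    return " ".join(content)  # plain keyword fallback
```

A probabilistic version would score several candidate reformulations and pick the one most likely to retrieve an answer, rather than applying one fixed rule.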
2005 — 2007 |
Quinn, Kevin; Monroe, Burt; Radev, Dragomir; Abney, Steven; Colaresi, Michael (co-PI) |
DHB: The Dynamics of Political Representation and Political Rhetoric @ University of Michigan Ann Arbor
The verbatim records of democratic legislatures [or "parliaments", "representative assemblies", "councils", etc.] represent a source of untapped information of unique importance for the study of both democratic societies and language over multiple time scales. For linguists, there is no other place where we have a systematic minute-by-minute record of the spoken word exchanged by a slowly changing and overlapping set of individuals over significant lengths of time. For political scientists, these are also unique sources of deliberations by elected political representatives about the issues of the day. As such, legislative records provide exceptional opportunities for studying the dynamics of language and rhetoric, of democratic politics and representation, and of their interactions, over time scales ranging from minutes to centuries.
The record of a single legislature can, however, run to thousands of pages in a single day. It is impossible for any one person to read, much less absorb or analyze, the entire record of a legislature as quickly as it is produced. Increasing availability of these records in electronic form, however, opens possibilities for various forms of computerized analysis. This multidisciplinary project applies and advances recent developments in computer science, information science, and statistics - for natural language processing in particular and statistical learning from massive databases in general - to the analysis of legislative records from democracies worldwide, illuminating important questions of dynamics of political representation and political rhetoric.
In the first stage of the project, new corpora (linguistic databases) will be developed from legislative records. Using data reduction techniques including scaling, classification, and summarization, novel statistical and computational methodologies for the dynamic analysis of parliamentary language and parliamentary speakers will be developed and refined.
In the second stage of this project, these data and techniques will be applied to questions of importance to linguistics, political science, and social science more generally. In all cases, a wide range of democratic legislatures (from subnational to international), in a wide range of languages, over multiple time scales will be examined. Examples of these important questions include:
What can legislative speech tell us about the role of political parties in a democracy? When do they compete, when do they cooperate, when do they polarize, and on what issues?
What can legislative speech tell us about democratic representation? How and when are new issues incorporated into the agenda of a legislature? When and whom do representatives lead; when and whom do they follow?
What can legislative speech tell us about individual democratic representatives? Do they change their rhetorical behavior in response to citizen preferences, to career motivations, or not at all? Does gender or group identity affect rhetorical choices?
What can legislative speech teach us about language itself? How has the political content of language changed over the last two centuries? Do legislative debates actually involve exchanges of information or persuasion?
How do events and political rhetoric interact? When do events cause a shift in political rhetoric? When does talk about policy or other political change translate into actual change? Is it predictable?
The broader impacts of this multidisciplinary international project include enhanced scientific infrastructure in the form of new software and data for the study of politics and language, enhanced public infrastructure for monitoring what are now impossibly large records of democratic institutions, and new statistical and computational techniques for analyzing large-scale textual databases in general applications. This research will ultimately lead to an increased ability to understand and forecast political and policy changes around the world as well as a greater understanding of how language affects politics, how politics affects language, and how the interaction between them affects democracy.
2006 — 2009 |
Radev, Dragomir |
Collaborative Research: BlogoCenter - Infrastructure for Collecting, Mining, and Accessing Blogs @ University of Michigan Ann Arbor
The BlogoCenter project is a collaborative effort (0534784, Junghoo 'John' Cho, University of California-Los Angeles and 0534323, Dragomir Radev, University of Michigan Ann Arbor) with a goal to develop innovative technologies for building a system that (1) continuously monitors, collects, and stores personal Weblogs (or blogs) at a central location; (2) discovers hidden structures and trends automatically from the blogs; and (3) makes them easily accessible to general users. By making the new information on the blogs easy to discover and access, this project is helping blogs realize their full potential for societal change as the "grassroots media." It is also collecting an important hypertext dataset of human interactions for further analysis by the social and behavioral sciences research communities.
In developing such a system, the project investigates new research challenges in three areas: (1) novel monitoring algorithms that discover and download new information from rapidly-changing distributed sources with minimal delay; (2) new text and graph mining techniques appropriate for large-scale hypertext corpora; and (3) novel text ranking and summarization algorithms to help the users access new and high-quality information quickly from the rapidly-evolving blogs.
The project will make a significant impact on the scientific community by making the collected datasets and the source code of the prototype available to other research groups via the Web (http://www.eecs.umich.edu/~radev/blogocenter), accelerating progress in blog-related research. The new research findings will be disseminated via scientific conferences and journals, spurring significant advancements in distributed Web-source monitoring, text summarization and ranking, and large-scale text and graph mining. In addition, this project will support graduate and undergraduate student research and foster cross-institution collaboration.
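As a toy illustration of monitoring challenge (1), a crawler might divide a fixed polling budget across blogs in proportion to their estimated posting rates. This naive proportional heuristic is an assumption for illustration, not the project's monitoring algorithm, which seeks policies that minimize detection delay:

```python
def allocate_polls(change_rates, budget):
    """Split a fixed per-day crawl budget across sources in proportion
    to each source's estimated posting rate (posts per day), giving
    every source at least one visit."""
    total = sum(change_rates.values())
    return {src: max(1, round(budget * r / total))
            for src, r in change_rates.items()}
```

In practice the estimated rates would themselves be learned from the observed history of each blog and updated as new posts are detected.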
2008 — 2011 |
Radev, Dragomir |
Collaborative: SGER: New Problem Genres for the North American Computational Linguistics Olympiad @ University of Michigan Ann Arbor
NACLO is a high school Olympiad contest in linguistics and language technologies. Formulating contest problems is not straightforward because the contest has no prerequisites. Because students are not expected to know linguistics, specific languages, advanced math, or computer programming, the problems must be self-contained, yet must still test skills such as algorithmic thinking, abstractly representing a solution space, reducing a solution space, and evaluating a solution. These skills represent computational thinking, the part of computer science that does not involve computers. Some problems also introduce tools of linguistics and language technologies such as finite state machines and context-free grammars. More generally, this Small Grant for Exploratory Research tackles the new problem of designing a curriculum of training sessions and contest problems that introduce high school students to computational thinking as it applies to the processing of human languages. The importance of the project is to inspire students to study linguistics and computer science and to increase participation and diversity in those fields.
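For a flavor of the finite-state-machine material mentioned above, here is a minimal deterministic finite automaton in code. Contest problems present such machines as paper-and-pencil puzzles, not programs; the machine below, accepting binary strings with an even number of 1s, is an invented example:

```python
def run_dfa(transitions, start, accepting, s):
    """Run a deterministic finite automaton over the string s and
    report whether it ends in an accepting state."""
    state = start
    for ch in s:
        state = transitions[(state, ch)]  # exactly one move per symbol
    return state in accepting

# Example machine: accepts binary strings with an even number of 1s.
EVEN_ONES = (
    {("even", "0"): "even", ("even", "1"): "odd",
     ("odd", "0"): "odd",   ("odd", "1"): "even"},
    "even",    # start state
    {"even"},  # accepting states
)
```

A typical contest problem gives students a transition table like `EVEN_ONES` and asks them to characterize, in plain language, the set of strings it accepts.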
2010 — 2012 |
Radev, Dragomir |
Collaborative Research: EAGER: Computational Thinking Olympiad @ University of Michigan Ann Arbor
This project focuses on developing the infrastructure for a self-sustaining organization that can manage, grow, and evangelize olympiads that involve young students (middle and junior high) in computational thinking. A large component is the creation of pilot olympiads in a select few cities in the United States. Specific goals of this project include: (1) identifying a set of foundational skills that underlie computational thinking that can be taught before college and high school; (2) identifying a style of problems and scenarios that engage a wide variety of students; and (3) implementing a curriculum of training sessions and contest questions that exemplify those foundational skills.
There are two broad reasons for creating a Computational Thinking Olympiad: first, to expose the fundamentals of computational thinking to a broad audience of potential researchers and practitioners in the field, thus increasing participation and diversity in computing; and second, to ensure long-lasting impact beyond this project.
The success of the Computational Thinking Olympiad will have a significant impact on our society by introducing middle school students to computational thinking in its breadth and depth: (1) encouraging students to have fun with computational thinking in an arena that is both cooperative and competitive; (2) encouraging students to pursue education in computing; (3) introducing the unplugged parts of computing to those who have not had access to the plugged-in parts; and (4) showing that computational thinking is not "just" programming.
2010 — 2015 |
Resnick, Paul (co-PI); Radev, Dragomir; Sami, Rahul (co-PI); Mei, Qiaozhu |
SoCS: Assessing Information Credibility Without Authoritative Sources @ University of Michigan Ann Arbor
Rumors, smears, and conspiracy theories can now spread quickly through email, blogs, and other social media. Recipients of such messages may not question their validity. Moreover, even upon careful investigation and reflection, not everyone will agree about the validity of particular claims. This project will develop tools that help people make personal assessments of credibility. Rather than relying on particular sources as authoritative arbiters of ground truth, the goal is to minimize the amount of "social implausibility." That is, the tool will identify assertions that are disbelieved by "similar" people (those with whom, after careful consideration, a user has tended to agree in the past) or that come from sources with which a user has tended to disagree. A text mining system for online media will be developed to extract controversial assertions and the beliefs expressed by users about those assertions. Comparisons of beliefs about common assertions, and retractions or updates to beliefs, will be tracked as part of personalized reputation measures.
This work is the first attempt to formally address the automatic assessment of information credibility based on text mining and social computational systems. The techniques will provide the solution to many challenging research problems in information retrieval and reputation networks. The techniques are broadly applicable to other domains where the credibility of content and reputation of sources is a concern, to help a broad class of information consumers. Prototype tools will be released freely and demonstrated in high schools, thereby building awareness of the diversity of beliefs around topics of public interest.
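The agreement-weighted credibility idea can be sketched as follows. The data layout, the neutral 0.5 prior, and the simple agreement fraction are assumptions for illustration; the project's personalized reputation measures are more elaborate:

```python
def agreement(user_a, user_b, beliefs):
    """Fraction of shared assertions on which two users expressed the
    same belief. `beliefs` maps user -> {assertion_id: True/False}."""
    shared = set(beliefs[user_a]) & set(beliefs[user_b])
    if not shared:
        return 0.5  # no common history: neutral prior
    same = sum(beliefs[user_a][k] == beliefs[user_b][k] for k in shared)
    return same / len(shared)

def plausibility(me, assertion, beliefs):
    """Weight each other user's stance on `assertion` by their past
    agreement with `me`. Returns a score in [0, 1]; a low score marks
    the assertion as socially implausible for this particular user."""
    num = den = 0.0
    for user, stances in beliefs.items():
        if user == me or assertion not in stances:
            continue
        w = agreement(me, user, beliefs)
        num += w * (1.0 if stances[assertion] else 0.0)
        den += w
    return num / den if den else 0.5  # nobody weighed in: neutral
```

Because the weights are computed per user, two people with different histories can receive different plausibility scores for the same assertion, which is exactly the personalized, no-single-arbiter behavior described above.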