2003 — 2005
Mihalcea, Rada
N/A. Activity Code Description: No activity code was retrieved; click on the grant title for more information.
Sger: Exploratory Research of Word Sense Disambiguation Methods For All Words in Open Text @ University of North Texas
Word Sense Disambiguation (WSD) is a core task in natural language processing and is considered essential for major applications like text understanding, common sense reasoning, and machine translation. Previous research on WSD has produced good disambiguation schemes for the relatively few words for which training data has been available. In contrast, there have been few attempts to create systems that disambiguate all words in open text. The goal of this one-year project is to conduct exploratory research of various WSD techniques to enable the development of a tool for semantic tagging of all words in open text.
The methods to be investigated rely on inner and outer representations of word sense. The inner representation comes from examples of word meanings, whereas the outer representation is given by semantic relations between word senses. These two representations correspond to two different views of word meaning that can be used to derive complementary WSD techniques, which can ultimately be combined into a tool for resolving the semantic ambiguity of all words in open text.
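As a rough illustration of how the two views could be combined, the sketch below scores each sense of a word using both example-based ("inner") and relation-based ("outer") word overlap, then adds the two scores. The sense inventory is invented for illustration; this is a toy stand-in, not the project's actual method.

```python
from collections import Counter

# Hypothetical toy sense inventory: each sense carries example sentences
# (the "inner" view) and glosses of semantically related senses (the
# "outer" view).
SENSES = {
    "bank/finance": {
        "examples": ["deposit money at the bank", "the bank raised interest rates"],
        "related_glosses": ["institution that accepts deposits", "loan money credit"],
    },
    "bank/river": {
        "examples": ["sat on the bank of the river", "the river bank eroded"],
        "related_glosses": ["sloping land beside water", "shore of a stream"],
    },
}

def overlap(context, texts):
    """Count distinct word overlaps between a context and a list of texts."""
    ctx = Counter(context.lower().split())
    return sum(min(ctx[w], 1) for t in texts for w in set(t.lower().split()) if w in ctx)

def disambiguate(context):
    """Score each sense with inner (example) and outer (relation) evidence."""
    scores = {}
    for sense, info in SENSES.items():
        inner = overlap(context, info["examples"])
        outer = overlap(context, info["related_glosses"])
        scores[sense] = inner + outer  # simple additive combination
    return max(scores, key=scores.get)

print(disambiguate("fishing from the bank of the river"))  # bank/river
```

In a real system the two scores would come from trained models over sense-annotated examples and a lexical-semantic network, rather than raw word overlap.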
The techniques developed as part of this project are expected to significantly improve the performance level of WSD applications for open text and should have an impact on other important applications in natural language processing.
2004 — 2005
Mihalcea, Rada
Workshop: Senseval-3 - Evaluation of Systems For the Semantic Analysis of Text; July 25-26, 2004; Barcelona, Spain @ University of North Texas
This award subsidizes travel, conference, and housing expenses for students participating in the Senseval-3 workshop, held in conjunction with the Association for Computational Linguistics (ACL) meeting on July 25-26, 2004, in Barcelona, Spain. ACL is the primary international organization for researchers in the field of computational linguistics, and Senseval is a major event and international meeting for the ACL Special Interest Group on the Lexicon (ACL-SIGLEX).
The main purpose of Senseval-3 is to analyze and discuss the results of systems participating in the Senseval-3 evaluations, held in March-April 2004. Fourteen different tasks were organized as part of Senseval-3 to evaluate systems that perform automatic semantic analysis of text, including word sense disambiguation for various languages, identification of semantic roles, logic forms, multilingual annotations, and subcategorization acquisition. The supported students will participate in this international evaluation exercise with their systems and have the opportunity to gain invaluable feedback from senior participants.
2008 — 2015
Mihalcea, Rada
Career: Semantic Interpretation With Monolingual and Cross-Lingual Evidence @ University of North Texas
Word meanings are central to the semantic interpretation of texts. Although much work to date has focused on statistical approaches that often ignore the explicit understanding of the text, recent research work has begun to challenge this simplification, demonstrating that semantic interpretation is indeed essential for a number of language processing applications.
The key observation underlying this CAREER project is that word meaning distinctions differ from one lexical resource to another and that the optimality of word meaning representations should be dictated by the target application. The project is exploring rich and flexible word meaning representations that combine the benefits of multiple monolingual and cross-lingual lexical resources and that can be adapted to the context and to the target application. In particular, the multilingual nature of these representations allows for an effective exploitation of the knowledge and resources available in different languages. The project also explores the role played by these word meaning representations and the corresponding monolingual and cross-lingual knowledge sources in several natural language processing tasks including lexical substitution, word and text translation, and text-to-text semantic similarity.
Another aim of the project is to integrate natural language processing into educational applications and explore the use of the word meaning interpretation models to build a comprehension-assistant tool for students of English as a second language (ESL) and English as a foreign language (EFL). The educational program also fosters increased awareness of research in multilingual natural language processing among undergraduate and graduate students, through a college outreach program and a new course on multilingual computational linguistics, as well as increased exposure of students to international experiences through international collaborations.
2008 — 2009
Mihalcea, Rada
Sger: Collaborative Research: Exploring the Role of Word Senses in Subjectivity Analysis @ University of North Texas
Approaches to subjectivity and sentiment analysis often rely on manually or automatically constructed lexicons. Most such lexicons are compiled as lists of words, rather than word meanings (senses). However, many words have both subjective and objective senses, which is a major source of ambiguity in subjectivity and sentiment analysis.
The goal of this exploratory research project is to address this gap, by investigating novel methods for subjectivity sense labeling, and exploiting the results in sense-aware subjectivity analysis. Specifically, the project targets two research objectives. The first objective is to develop new methods for assigning subjectivity labels to word senses in a taxonomy. The second objective is to explore contextual subjectivity disambiguation techniques that will effectively make use of the word sense subjectivity annotations. By achieving these objectives, the project is expected to contribute to the understanding of the connections among subjectivity, word senses, and contextual subjectivity analysis, which will serve as a stepping stone for continued research efforts in this area.
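A minimal sketch of the first objective, on an invented sense graph: subjectivity labels are propagated from a few seed senses along taxonomy links, and a sense is labeled only when its labeled neighbors agree. The sense names and graph are hypothetical, not from the project.

```python
# Invented taxonomy neighborhood: each sense lists the senses it is
# directly related to (e.g., hypernyms/hyponyms and similar-to links).
NEIGHBORS = {
    "terrible/awful": ["bad/evaluative"],
    "bad/evaluative": ["terrible/awful", "quality/attribute"],
    "quality/attribute": ["bad/evaluative", "length/attribute"],
    "length/attribute": ["quality/attribute", "meter/unit"],
    "meter/unit": ["length/attribute"],
}
labels = {"terrible/awful": "subjective", "meter/unit": "objective"}  # seeds

for _ in range(3):  # a few propagation passes over the graph
    for sense, nbrs in NEIGHBORS.items():
        if sense in labels:
            continue
        votes = [labels[n] for n in nbrs if n in labels]
        # label only on unanimous evidence; ambiguous senses stay unlabeled
        if votes and votes.count(votes[0]) == len(votes):
            labels[sense] = votes[0]

print(labels)
```

Note that "length/attribute" never gets labeled: its neighbors disagree, which mirrors how senses on the subjective/objective boundary resist a clean label.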
The resources created in this project will be made available to the research community, which will help advance the state of the art in automatic sentiment and subjectivity analysis.
2009 — 2013
Mihalcea, Rada
Ri: Small: Collaborative Research: Word Sense and Multilingual Subjectivity Analysis @ University of North Texas
Approaches to subjectivity and sentiment analysis often rely on manually or automatically constructed lexicons. Most such lexicons are compiled as lists of words, rather than word meanings ("senses"). However, many words have both subjective and objective senses, as well as senses of different polarities, which is a major source of ambiguity in subjectivity and sentiment analysis. The proposed work addresses this gap by investigating novel methods for subjectivity sense labeling and exploiting the results in sense-aware subjectivity and sentiment analysis. To achieve these goals, three research objectives are targeted. The first is developing methods for assigning subjectivity labels to word senses in a taxonomy. The second is developing contextual subjectivity disambiguation techniques to effectively make use of the word sense subjectivity annotations. The third is applying these techniques to multiple languages, including languages with fewer resources than English.
The project will have broader impacts in both research and education. First, it will make subjectivity and sentiment resources and tools more widely available, in multiple languages, to the research community, which will help advance the state of the art in automatic subjectivity analysis and in turn benefit end applications. Second, several educational goals will be pursued: training graduate and undergraduate students in computational linguistics; augmenting artificial intelligence courses with projects based on the proposed research, which will offer students hands-on experience with natural language processing research; and reaching out to women and minorities to increase their exposure to text processing technologies and access to research opportunities.
2010 — 2014
Mihalcea, Rada; Tarau, Paul
Iii: Small: Collaborative Research: Building a Large Multilingual Semantic Network For Text Processing Applications @ University of North Texas
This project is devoted to building a large multilingual semantic network through the application of novel techniques for semantic analysis specifically targeted at the Wikipedia corpus. The driving hypothesis of the project is that the structure of Wikipedia can be effectively used to create a highly structured graph of world knowledge in which nodes correspond to entities and concepts described in Wikipedia, while edges capture ontological relations such as hypernymy and meronymy. Special emphasis is given to exploiting the multilingual information available in Wikipedia in order to improve the performance of each semantic analysis tool. Significant research effort is therefore aimed at developing tools for word sense disambiguation, reference resolution and the extraction of ontological relations that use multilingual reinforcement and the consistent structure and focused content of Wikipedia to solve these tasks accurately. An additional research challenge is the effective integration of inherently noisy evidence from multiple Wikipedia articles in order to increase the reliability of the overall knowledge encoded in the global Wikipedia graph. Computing probabilistic confidence values for every piece of structural information added to the network is an important step in this integration, and it is also meant to provide increased utility for downstream applications. The proposed highly structured semantic network complements existing semantic resources and is expected to have a broad impact on a wide range of natural language processing applications in need of large scale world knowledge.
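The evidence-integration step described above can be sketched as follows, with invented relation triples: repeated, noisy observations of the same edge across multiple articles are combined into a single probabilistic confidence value. A noisy-OR combination is used here as one plausible choice; the project's actual model may differ.

```python
import math
from collections import defaultdict

# Invented extractions: (head, relation, tail, per-article confidence).
evidence = [
    ("Paris", "capital_of", "France", 0.9),  # extracted from article A
    ("Paris", "capital_of", "France", 0.8),  # extracted from article B
    ("Paris", "capital_of", "Texas", 0.3),   # a noisy extraction
]

# Group the evidence for each candidate edge of the semantic network.
edges = defaultdict(list)
for head, relation, tail, conf in evidence:
    edges[(head, relation, tail)].append(conf)

# Noisy-OR: the edge holds if at least one evidence source is correct.
network = {
    triple: 1 - math.prod(1 - c for c in confs)
    for triple, confs in edges.items()
}
print(network)
```

Edges seen independently in several articles end up with high confidence (0.98 above), while one-off noisy extractions keep a low score that downstream applications can threshold away.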
For further information, please see the project website: http://lit.csci.unt.edu/index.php/Mu.Se.Net
2013 — 2017
Mihalcea, Rada; Burzo, Mihai
Eager: Physio-Linguistic Models of Deception Detection @ University of Michigan Ann Arbor
The goal of this Early-concept Grant for Exploratory Research is to explore a new generation of computational tools for joint modeling of physiological and linguistic signals of human behavior. The project is the first to investigate physio-linguistic models for deception analysis. To achieve this goal, the following three research objectives are pursued. First, a novel physio-linguistic dataset of deceit is built, covering several different domains. Second, rule-based classifiers for deception detection are explored, using physiological features (e.g., heart rate, respiration rate, galvanic skin response, skin temperature), as well as linguistic features. Third, data-driven learning approaches for multimodal deception detection are developed, taking advantage of the recent progress in early, late, and temporal fusion models.
The project is exploratory in nature and acts as a catalyst for novel research problems. First, it explores rich sets of multimodal features extracted from physiological and linguistic modalities, analyzing their effectiveness in the recognition of deceit. Second, it explores the integration of multiple physio-linguistic modalities, through experiments with rule-based and data-driven techniques that fuse multimodal features into joint deception analysis models. To address the challenges of multimodal research, the team working on this project brings together experts from the fields of bio-sensors, computational linguistics, and physiology and behavioral sciences.
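The two main fusion strategies mentioned above can be sketched as follows, with invented feature values and stand-in per-modality scoring functions in place of trained classifiers.

```python
# Invented per-modality feature vectors for one recording.
physio = [72.0, 16.0, 0.4]   # e.g., heart rate, respiration rate, GSR
linguistic = [0.12, 0.05]    # e.g., pronoun rate, negation rate

# Early fusion: concatenate modalities into one joint feature vector,
# which would then be fed to a single classifier.
early_features = physio + linguistic

# Late fusion: each modality is scored independently, and the
# per-modality decisions are combined (here, a simple average).
def physio_score(features):      # stand-in for a trained classifier
    return 0.7

def linguistic_score(features):  # stand-in for a trained classifier
    return 0.4

late_score = (physio_score(physio) + linguistic_score(linguistic)) / 2
print(early_features, late_score)
```

Early fusion lets the classifier exploit cross-modality feature interactions, while late fusion keeps the modalities independent until decision time, which is more robust when one sensor stream is missing or noisy.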
The project has high potential payoffs, as models of deception detection have broad applicability, including: the development of critical tools for various applications in fields such as criminal justice, intelligence, and security; the enhancement of applications that can be negatively affected by the presence of deceit, such as opinion analysis or modeling of human communication; and a deeper understanding of fundamental aspects of human behavior, which can positively impact medical applications in psychiatry and psychology. The tools and datasets produced during this project will be made freely available for the research community.
For further information see the project web site at: http://web.eecs.umich.edu/~mihalcea/deceptiondetection/
2013 — 2017
Mihalcea, Rada; Pennebaker, James
Inspire Track 1: Language-Based Computational Methods For Analyzing Worldviews @ University of Michigan Ann Arbor
This INSPIRE award is partially funded by the Cyber-Human Systems Program in the Directorate for Computer and Information Science and Engineering, the Robust Intelligence Program in the Directorate for Computer and Information Science and Engineering, and the Social Psychology Program in the Directorate for Social, Behavioral and Economic Sciences. The goal of this project is to gather new insights into the ways people organize and understand their worlds within and across different cultures by means of innovative methodologies and tools from the fields of psychology and computational linguistics. The findings from this project will provide a better understanding of people on the individual psychological level as well as the cultures themselves, while developing and demonstrating new research techniques that can be used in the future by many disciplines to exploit the vast troves of scientifically valuable textual data currently available online. Specifically, the project targets the following three main research objectives: 1) Construct a very large multicultural database of writings from English-speaking cultures, covering several styles and genres, including social media (e.g., blogs, tweets), news articles, literary works, and student writings. 2) Build computational linguistic models that can automatically identify differences in concept usage for different cultures, and apply these models on a large scale. 3) Validate the findings of these computational models through psychological qualitative and quantitative methods in laboratory studies.
The ways people use words can provide insights into the ways they see and understand their worlds. Everyday language can also tell us about people's social, emotional, and psychological states and even the ways they think about themselves and others. Particularly interesting is that many of the social and psychological insights we find with the language of individuals can be extrapolated to groups, communities, and entire cultures. This project seeks to analyze the written language of people across several cultures in a way that will allow us to better understand the ways groups of people understand their worlds. In short, it will use advances in computational linguistics and social psychology to track the underlying values, beliefs, and concerns of very large groups of people by analyzing the ways they use words. Unlike previous studies, which have been limited to relatively small self-report surveys targeting a handful of concepts across cultures, this project will help us understand the differences in perception for thousands of concepts, by several cultures representing hundreds of thousands of people.
This project promises to shed new light on cultural differences by analyzing the ways people understand their worlds through their everyday language use. The approach will inform applications in communication, threat control, tracking of cultural values, and others. The project will also provide educational opportunities, in the form of training for students in both computer science and psychology, who will be directly exposed to interdisciplinary research, cultural diversity, and international experiences. Finally, the large multicultural dataset that will be created as part of this project, along with the tools to process it, will be made publicly available, thus enabling future research, as well as educational projects concerned with the analysis and understanding of cultural diversity and worldview.
2018 — 2021
Gonzalez, Richard (co-PI); Mihalcea, Rada; Banea, Carmen (co-PI)
Ri: Small: Demographic-Aware Lexical Semantics @ University of Michigan Ann Arbor
A central challenge in natural language processing is to develop methods for determining how meanings of words relate to one another. This task is called "lexical semantics", because "lexical" means "word" and "semantics" means "meaning". Traditional dictionaries do not solve the problem of lexical semantics, because definitions are often circular or incomplete, especially for the most common words. Instead, models of lexical semantics are computed by processing large bodies of text, using the principle that pairs of words that often appear in the same contexts must have meanings that are similar along some dimensions. For example, the words "man" and "boy" would be inferred to have similar meanings along the dimensions of "human" and "gender". However, a limitation of current models is that they assume that the meaning of words is the same for all speakers of a language. This is plainly false: we know, for example, that English speakers use words differently depending on, among other factors, their age, gender, field of work, and geographic location; that is, on the basis of their demographics. This project will overcome this limitation by developing methods for demographic-aware lexical semantics, where people-centric information complements language-based information. This work will help improve systems for natural language communication between people and computers, such as Siri or Alexa, as well as improve systems for automatically translating between different languages.
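The distributional principle described above can be sketched concretely: each word is represented by a vector of the contexts it occurs in, and similarity is the cosine between those vectors. The tiny corpus below is invented; real models are trained on much larger text collections.

```python
import math
from collections import Counter

# Invented toy corpus; "man" and "boy" appear in identical contexts.
corpus = [
    "the man walked the dog",
    "the boy walked the dog",
    "the man read a book",
    "the boy read a book",
    "the car needs fuel",
]

def context_vector(word, window=2):
    """Count the words appearing within `window` tokens of `word`."""
    vec = Counter()
    for sentence in corpus:
        tokens = sentence.split()
        for i, t in enumerate(tokens):
            if t == word:
                for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                    if j != i:
                        vec[tokens[j]] += 1
    return vec

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u)
    norm = lambda w: math.sqrt(sum(c * c for c in w.values()))
    return dot / (norm(u) * norm(v))

print(cosine(context_vector("man"), context_vector("boy")))  # 1.0
```

Because "man" and "boy" share all their contexts in this corpus, their cosine similarity is 1.0, while a contextually unrelated word like "car" scores much lower against either.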
Recent years have witnessed significant progress in research in lexical semantics using corpus-based approaches such as distributional vector-space models and word embeddings. At the same time, the growth of Web 2.0 has led to tremendous volumes of texts, most of which are rich in explicit or implicit demographic information, such as the age, gender, industry, or location of the writer. The goal of this project is to take the next natural step at the confluence of these two trends, and develop methods for demographic-aware lexical semantics, where people-centric information complements language-based information for enhanced linguistic representations that explicitly account for the demographics and traits of the people behind the language. The project targets the following three main research objectives. First, it develops novel demographic-aware word representation models that account not only for contextual knowledge but also for people-centric information. Methods that are explored include distributional vector-space models that can be composed to create demographic-aware vector-space representations for various demographic profiles, and joint word embeddings that combine generic context-based embeddings with specialized embeddings that reflect the specifics of given demographic dimensions. Second, building upon extensive previous work in behavioral studies targeting the identification of systematic heterogeneity across groups, lab studies are devised to validate the findings from the computational models. Third, the application of these novel people-centric word representations is explored in three core tasks in natural language processing, ranging from simple to complex: word associations, text similarity, and diversified news.
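One of the joint-embedding strategies named above can be sketched with invented vectors: a generic context-based embedding is concatenated with a specialized embedding for a demographic dimension (here, age group), so downstream models can see both the shared and the group-specific components.

```python
# Invented two-dimensional embeddings for illustration only.
generic = {"sick": [0.2, 0.9]}                  # shared across all speakers
demographic = {
    ("sick", "teen"):  [0.8, 0.1],              # slang use: "excellent"
    ("sick", "adult"): [0.1, 0.7],              # literal use: "ill"
}

def demographic_embedding(word, group):
    # Concatenate the generic and demographic-specific components.
    return generic[word] + demographic[(word, group)]

print(demographic_embedding("sick", "teen"))  # [0.2, 0.9, 0.8, 0.1]
```

In a trained model both components would be learned from text written by the respective demographic groups rather than specified by hand.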
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
2020 — 2022
Samson, Perry; Mihalcea, Rada
Pfi-Tt: Using Artificial Intelligence to Identify Additional Educational Resources Based On What Was Discussed in Class @ Regents of the University of Michigan - Ann Arbor
The broader impact/commercial potential of this Partnerships for Innovation - Technology Translation (PFI-TT) project is to create the Contextual Linkaging for Undergraduate Education (CLUE) Service that analyzes transcriptions from class recordings and automatically identifies related resources in a student's learning ecosystem. This system will allow students to search class videos to find when a concept was discussed, indicate moments during a class when they are confused, and subsequently receive additional information on the confusing topics. By automatically identifying when key concepts were discussed during class, this project will enable educational platforms to deliver content to the learner personalized to the material presented during each class session. Instructors will receive feedback on which resources students find valuable for each topic. The CLUE Service will be designed to help educational resource providers by providing contextual linkages to resources in other educational platforms. University Chief Information Officers will find value in it as a way to contextually integrate the numerous learning resources they support.
The proposed project builds on former NSF-funded projects and requires technical expertise in natural language processing and the design of educational technologies. The CLUE Service will use computer-generated transcriptions from class captures to identify key terms and phrases discussed during class sessions. The combination of analyzed key terms and corresponding timestamps will allow contextual linkages to be created between moments in class captures and other educational resources. The envisioned prototype will be a subscribable Application Programming Interface (API) that will allow contextual linkages between educational services. Technical challenges for this project include designing unsupervised machine learning methods that identify keywords and/or topics in a class session, along with the best timestamp(s) in class to represent those keywords or topics, and providing a resource that works with any video delivery system, so that students can indicate when they would like additional information during a streaming or recorded class session. Initially, the results of this research will be made available to participating instructors or their assistants, with tools that will allow them to assess the potential value of the recommended resources to their students. This feedback will inform improvements in resource selection for the CLUE Service.
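The keyword-plus-timestamp idea can be sketched as follows, with an invented transcript: each timestamped transcript segment is scored with a simple TF-IDF weighting, so a keyword can be linked back to the moment in the recording where it was discussed. This is one plausible unsupervised baseline, not the CLUE Service's actual algorithm.

```python
import math
from collections import Counter

# Invented class transcript: (start time in seconds, segment text).
segments = [
    (0,   "today we will review newtonian mechanics and forces"),
    (300, "a force causes acceleration force equals mass times acceleration"),
    (600, "next week we will cover energy and work"),
]

def tfidf_keywords(segments, top_n=2):
    """Return the top TF-IDF terms for each timestamped segment."""
    docs = [Counter(text.split()) for _, text in segments]
    n = len(docs)
    df = Counter(w for d in docs for w in d)  # document frequency per term
    linkages = {}
    for (timestamp, _), tf in zip(segments, docs):
        scored = {w: c * math.log(n / df[w]) for w, c in tf.items()}
        linkages[timestamp] = sorted(scored, key=scored.get, reverse=True)[:top_n]
    return linkages

keywords = tfidf_keywords(segments)
print(keywords)
```

Terms that recur within a segment but rarely elsewhere ("force", "acceleration" at the 300-second mark) rank highest, giving the contextual linkage between a concept and its moment in class.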
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.