1985 — 1991 |
McKeown, Kathleen |
N/A (no activity code was retrieved) |
Presidential Young Investigator Award (Information Science) |
0.915 |
1991 — 1997 |
McKeown, Kathleen |
N/A (no activity code was retrieved) |
FAW: Research in Language Generation: Word Choice, Functional Unification, and Content Planning
This Faculty Award for Women Scientists is for research in the area of natural language processing. This particular research focuses on the problem of word choice and on the use of a unification formalism to implement and represent generation decisions, including syntactic processes, word choice, and content planning. The Principal Investigator is testing previously developed language generation techniques as part of one or more generation systems. These include interactive question-answering and multimedia systems, as well as systems that can generate financial reports.
|
0.915 |
1996 — 2000 |
Allen, James; McKeown, Kathleen; Passonneau, Rebecca |
N/A (no activity code was retrieved) |
CARD: Corpus Analysis Resources for Discourse
This is a collaborative effort among three universities (Columbia, Rochester, and Pittsburgh) to construct, evaluate, and disseminate a package of Corpus Analysis Resources for Discourse (CARD). The goal is to provide the means for a large-scale, robust analysis of language use, both within and across distinct types of discourse corpora. The three components of CARD are a Discourse Annotation Language (DAL) to encode information pertaining to language use directly within discourse corpora; reliability measures of the degree of variability in DAL annotations; and a library of DAL-annotated corpora, varying in modality, number of participants, domain, and communicative task. DAL follows the Text Encoding Initiative guidelines and is implemented in Standard Generalized Markup Language (SGML) so that common authoring and editing utilities can be used with it. DAL is a modular language with five layers of linguistic representation: morpho-syntactic, prosodic, anaphoric, lexical, and segmental.
|
0.915 |
1998 — 2000 |
McKeown, Kathleen R |
R01 (Activity Code Description: To support a discrete, specified, circumscribed project to be performed by the named investigator(s) in an area representing his or her specific interest and competencies.) |
Customized Multimedia Summaries of Pat Stat Postbypa @ Columbia Univ New York Morningside
DESCRIPTION (Taken from application abstract): The long-term objective of this research is to develop computer systems to automatically describe a patient's clinical status, providing care givers with an easy means for quickly obtaining the exact information they need for patient care. Specifically, the research will develop a system for automatically producing multimedia briefings that provide care givers with timely, effective updates on a patient's condition after a CABG (Coronary Artery Bypass Graft) operation. The system will use AI techniques to create summaries that combine natural language and graphics that are generated on the fly and tailored to the information needs of different care givers. By integrating data currently available in the computerized operating room and other online databases at Columbia Presbyterian Medical Center, these summaries will provide a timely and concise overview of information that would otherwise be difficult to obtain. Evaluation will test the hypothesis that these multimedia briefings will improve continuity of care for post-CABG patients, by deploying the system in the cardiac ICU (intensive care unit) and measuring changes in delay and error in treatment of arriving patients as well as reductions in the amount of direct communication needed among care givers. Measurements will compare online records of drips with and without use of the system. 
The research plan includes scaling an initial prototype, developed prior to the proposed research, to handle multimedia summarization for a large number of input patient cases; developing tools to customize the briefing for cardiologists, ICU residents, and ICU nurses; developing techniques to allow the care giver to interrupt the system; developing an inferencing component to identify significant events during the operation; and evaluating the proposed work at early stages, to feed care giver preferences back into system development, and at late stages, to quantify the system's effectiveness.
|
0.939 |
1999 — 2006 |
McKeown, Kathleen |
N/A (no activity code was retrieved) |
DLI-Phase 2: A Patient Care Digital Library: Personalized Retrieval and Summarization of Multimedia Information
Abstract
IIS-9817434 McKeown, Kathleen Columbia University $953,188 - 12 mos.
DLI-Phase 2: A Patient Care Digital Library
This is the first-year funding of a five-year Cooperative Agreement award. Healthcare consumers and providers both need quick and easy access to a wide range of online resources. The goal of this project is to provide personalized access to a distributed patient care digital library through the development of a system, PERSIVAL (Personalized Retrieval and Summarization of Image, Video And Language resources). PERSIVAL will tailor search, presentation, and summarization of online medical literature and consumer health information to the end user, whether patient or healthcare provider. PERSIVAL will use the secure online patient records available at Columbia Presbyterian Medical Center (CPMC) as a sophisticated, pre-existing user model that can aid in predicting users' information needs and interests. Key features of the proposed work include personalized access to distributed, multimedia resources available both locally and over the Internet; fusion of repeated information and identification of conflicting information from multiple relevant sources; and presentation of information in concise multimedia summaries that cross-link images, video, and text. When the latest medical information is provided at the point of patient care, it can help practicing clinicians avoid missed diagnoses and minimize impending complications. When expressed in understandable terms, it can empower patients to take charge of their healthcare.
|
0.915 |
2002 — 2008 |
Kender, John (co-PI); McKeown, Kathleen; Kaiser, Gail (co-PI); Feiner, Steven (co-PI); Schulzrinne, Henning |
N/A (no activity code was retrieved) |
CISE Research Infrastructure: Pervasive Pixels
EIA 02-02063: Schulzrinne, Henning; Feiner, Steven K.; Kaiser, Gail E.; Kender, John R.; McKeown, Kathleen R.; Boston University
Title: RI: Pervasive Pixels
Project Proposed:
This project develops a flexible, department-wide collaborative work infrastructure, "Pervasive Pixels," that will serve as a testbed for research in collaborative systems and will itself be used to conduct research and teaching. Pervasive Pixels will capture and deliver multimedia information across heterogeneous networks and devices. The system will schedule meetings, manage and prefetch multimedia objects, and lay out material on individual and shared displays. Meetings will be facilitated by locating, tracking, and identifying users as they wish, and will be recorded, annotated, and summarized, with extensive search capabilities. While improvements in core computation and communication technologies encourage remote work and interaction, interdisciplinary collaborations that span buildings, cities, and countries routinely encounter severe limitations imposed by current collaboration support systems. Pervasive Pixels is intended to address these problems and should make possible:
- Capturing and delivering multimedia information (including video) through heterogeneous networks, clients, and devices;
- Scheduling meetings, managing and prefetching work documents and multimedia objects, and laying out materials on individual and shared displays, based on models of workflow needs and models of temporal, spatial, and semantic interrelationships;
- Facilitating meetings by locating, tracking, and identifying users, and understanding their gestures, in live and captured video and audio;
- Recording, annotating, summarizing, and searching meeting content, from multiple physical perspectives and via multiple types of database queries, thus reducing the effects of temporal differences.
The infrastructure emphasizes:
- Large, instrumented, multi-display workspaces in a variety of locations, to accommodate group interactions;
- Networked mobile devices of various capacities, used individually and in the context of larger workspaces;
- Transparent and automatic adaptability to changes of place, platform, or group composition, allowing mobile users to interact as they move about without having to account for these changes manually;
- Support for a wide range of hardware and software, beginning with commercial off-the-shelf commodity components, whose capabilities are retained as the system evolves, ultimately leading to new standards for meeting environments.
|
0.915 |
2003 — 2006 |
McKeown, Kathleen |
N/A (no activity code was retrieved) |
ITR: Collaborative Research: Interlingual Annotation of Multilingual Text Corpora
This multi-site research effort is aimed at developing a coherent, consistent, standardized interlingual representation, along with a methodology and sharable tools for annotating large bilingual corpora of parallel texts. It has four central components. First, six corpora are being compiled, each consisting of a number of texts in a particular source language along with three translations of each text into English. Second, a standardized interlingual representation is being developed based on a comparative analysis of these parallel text corpora. Third, the bilingual corpora are being annotated using the standardized interlingua and following a predefined annotation procedure. Fourth, metrics are being developed for evaluating the accuracy and appropriateness of the interlingual representations in terms of the grain size of the representation given a particular task. The metrics are based on inter-coder reliability, the growth rate of the interlingual representation, and the quality of the target-language text generated from the interlingua.
The resulting annotated, multilingual, parallel corpora will be useful as an empirical basis for developing a wide variety of interlingual NLP systems for tasks such as machine translation, question answering, web searching, summarization, or presentation generation, as well as a host of other research and development efforts in theoretical and applied linguistics, foreign language pedagogy, translation studies, and other related disciplines.
The participants include CRL at NMSU, ISI at USC, UMIACS at the University of Maryland, LTI at CMU, Columbia University, and The MITRE Corporation. The source languages include Arabic, Chinese, French, Hindi, Japanese, Spanish and English.
|
0.915 |
2006 — 2010 |
McKeown, Kathleen |
N/A (no activity code was retrieved) |
Text-to-Text Generation for Summarizing Informal Genres
This project aims at the generation of coherent and on-target summaries and answers through the use of text-to-text generation, an approach which generates new sentences from the input text, fusing relevant phrases and discarding irrelevant ones. A syntactic, statistical framework for text-to-text generation is being developed that can be applied to informal genres, such as transcribed speech and email, where sentences are not guaranteed to be either complete or grammatical. It is exactly these genres that stand to benefit the most from this approach; for them, summarization using sentence extraction alone is not an option.
The aim is a fully developed, syntactic statistical framework for text-to-text generation which features the use of a full syntactic grammar within a statistical framework for compression and combination, a model for incorporating constraints from pragmatics and semantics into the generation system, the ability to produce fluent, grammatical sentences from fragmentary and ungrammatical input, and the ability to generate sentences that make high level abstractions from input document sentences.
The project features the integration of compression and language models into a lexicalized head-driven framework, enabling the generator to keep the sentence grammatical and avoid wording changes that dramatically alter meaning. Its framework can incorporate an arbitrary number of features beyond syntax that are important for summarization. A new dynamic programming technique allows the automatic extraction of large amounts of training data from a summary/document corpus. Information about who speaks to whom and paraphrasing rules will increase the range of revisions that can be addressed.
|
0.915 |
2009 — 2011 |
McKeown, Kathleen |
N/A (no activity code was retrieved) |
EAGER: Corpus-Based Narrative Semantics
This Early-concept Grant for Exploratory Research (EAGER) explores approaches for computational analysis of narrative. Despite the ubiquitous nature of narrative, computational linguists have shied away from research on narrative since the 1970s, viewing analysis of stories and literature as too difficult. The goal of this EAGER project is to show that analysis of narrative is now possible and that its study can also be relevant to the development of practical, web-based systems. The project features the development of a declarative, symbolic representation of narrative, a method for manually analyzing the content units of narrative using this representation, and a computational approach for automatically processing a corpus of narratives to derive structural and content-oriented patterns. For example, a learning model may be developed to identify and describe dilemmas that a character faces or to identify thematic similarity between stories. In the first 12 months of the project, researchers are focusing on the development of the annotation methodology, a project to collect annotations of short fables and parables, and the development of learning algorithms. In the following six months, the researchers plan to apply the work to a larger domain in order to show broader impact, namely the processing of news text for tasks such as summarization. The project features a collaboration between computer scientists and an expert in literary theory in order to incorporate modes of analysis that are well grounded from the perspective of narratology. The researchers will provide a range of resources for further work in the narratology and computational linguistics communities, including the annotated corpus and annotation methodology (called DramaBank) as well as software for annotation and automatic analysis; these will enable both communities to continue a new line of research on literature and other forms of narrative occurring on the web.
|
0.915 |
2009 — 2014 |
McKeown, Kathleen |
N/A (no activity code was retrieved) |
RI: Large: Collaborative Research: Richer Representations for Machine Translation
Research in machine translation of human languages has made substantial progress recently, and surface patterns gleaned automatically from online bilingual texts work remarkably well for some language pairs. However, for many language pairs, the output of even the best systems is garbled, ungrammatical, and difficult to interpret. Chinese-to-English systems need particular improvement, despite the importance of this language pair, while English-to-Chinese translation, equally important for communication between individuals, is rarely studied. This project develops methods for automatically learning correspondences between Chinese and English at a semantic rather than surface level, allowing machine translation to benefit from recent work in semantic analysis of text and natural language generation. One part of this work determines what types of semantic analysis of source language sentences can best inform a translation system, focusing on analyzing dropped arguments, co-reference links, and discourse relations between clauses. These linguistic phenomena must generally be made more explicit when translating from Chinese to English. A second part of the work integrates natural language generation into statistical machine translation, leveraging generation technology to determine sentence boundaries, ordering of constituents, and production of function words that translation systems tend to get wrong. A third part develops and compares algorithms for training and decoding machine translation models defined on semantic representations. All of this research exploits newly-developed linguistic resources for semantic analysis of both Chinese and English.
The ultimate benefits of improved machine translation technology are easier access to information and easier communication between individuals. This in turn leads to increased opportunities for trade, as well as better understanding between cultures. This project's systems for both Chinese-to-English and English-to-Chinese are developed with the expectation that the approaches will be applied to other language pairs in the future.
|
0.915 |
2014 — 2017 |
McKeown, Kathleen |
N/A (no activity code was retrieved) |
RI: Small: Describing Disasters and the Ensuing Personal Toll
The research goal of this project is to help people react to major disasters, whether man-made or resulting from climate events, by developing a system that can automatically generate descriptions of disasters. By generating descriptions of disaster impact several months after the disaster has occurred, the system can document the impact of disasters of different scales in locations ranging from countries to cities. Information for the description is drawn from news and social media. Descriptions are told from an objective point of view, describing the facts as known, and from a personal point of view, describing the experiences and emotions of people who lived through the disaster. Enriching the descriptions of disaster impact with personal stories provides a compelling look at how disasters affect individuals. By generating descriptions of the impact of a disaster as it happens, the system can help relief organizations better coordinate delivery of aid to where it is needed. The generated descriptions are being made available through a public website so that anyone interested in the impact of a disaster will have access. This project is creating technology with the potential for social good and thus will appeal to many students, including undergraduates who seek to make their contributions meaningful to society.
The project brings together research on generating descriptions that highlight the structure of large-scale events with research on automatic identification of riveting personal stories. Semi-supervised and supervised approaches to the problem are used, drawing on large-scale online sources of data as well as smaller collections of annotated data. The project features the use of semi-supervised approaches to learning event relevance that exploit 11 years of summaries generated by Columbia's Newsblaster system plus other online large-scale semantic information. It features construction of an event tree from textual descriptions in news and social media, where nodes represent events (both larger-scale events and sub-events) and the tree structure can represent subsumption, temporal, and causal relations between events. Finally, it uses a supervised approach to learning when a text taken from social media conveys an interesting story, based on a socio-linguistic theory of narrative structure.
|
0.915 |
2015 — 2020 |
McKeown, Kathleen |
N/A (no activity code was retrieved) |
BD Hubs: Northeast: The Northeast Big Data Innovation Hub
The Northeast Big Data Innovation Hub will speed innovation and research by creating an ecosystem for a diverse group of researchers and practitioners to learn from each other. The Hub will encourage collaboration through workshops, visiting research positions, student ambassadors, virtual meetings, research projects and an online web presence, and facilitate sharing of data, tools, infrastructure, techniques, and insights to address challenges using big data. It will also facilitate the development, collection and organization of educational materials and learning opportunities for teachers in pre-K through high school. To communicate the importance of big data in everyday life, the Hub will work with museum educators and curators to develop materials and exhibits on this topic and share work already being done across the region. Finally, the Hub will develop programs for government and the general public on how to take advantage of data analytics.
The scale of this project will allow the hub to make substantial intellectual gains. By identifying shared challenges and facilitating the deployment of human and financial resources, solutions will be developed that can be applied to a range of research questions, government and economic problems and that can be useful to society generally. The Northeast Hub will be organized in a "hub and spoke model" with common theme activities (connectors) that are centrally organized and spokes that reflect regional interests aligned with national priorities. Initial areas of focus, or spokes, will be: Health, Energy, Finance, Cities/Regions, Discovery Science and Data Science in Education. Cross-cutting connectors will be: Data Sharing, Privacy & Security, Ethics & Policy and Education. Topics of importance to the northeast will be added to the Hub during the first three years.
For further information see the project web site: http://northeastbdhub.dsi.columbia.edu/
|
0.915 |
2017 — 2018 |
McKeown, Kathleen |
N/A (no activity code was retrieved) |
Cybersecurity Risk Conference
In the past several years, cyber-crime and cyber-breaches have exploded, due to the increase in commerce conducted on the Internet, as well as increasing quantities of sensitive information being stored in databases that can be accessed by outside parties. In addition, cyber-criminals have advanced, not only employing technical acumen but also preying on human error by using sophisticated techniques like phishing. To address these challenges, the Northeast Big Data Innovation Hub (NEBDIH) will convene two workshops to bring together experts from academia, government, and industry in the fields of insurance, cybersecurity, data science, and policy. The ultimate goal is to develop a collaborative model (e.g., a cross-sector consortium) that will help to define the frequency and economic/social severity of successful attacks, and to begin to develop baselines for measuring and managing cyber risk more effectively. NEBDIH will seek to gather leading academic, government, risk management, general industry, and healthcare participants, whose mission will be to identify, quantify, and help mitigate risk associated with cyber-related criminal activity. The workshops will serve as a forum for these experts to collaborate on: 1) better understanding and defining the interrelated cybersecurity landscape across sectors, and 2) planning for the resources necessary to develop baselines for measuring and managing cybersecurity risk. Expected outcomes from these activities include developing a framework for filling the gaps in quantifying and managing cybersecurity risk across organizations and sectors.
This may take the form of an initiative that captures the best-of-breed technologies, vendors, and best practices; captures a substantial portion of the data that is required, in aggregate, to make a cybersecurity risk assessment at multiple levels (by industry, by region, by company); provides the tools to make the data discoverable and interoperable; and provides a basic suite of tools to allow for high-level analysis of the data within the context of cybersecurity risk.
|
0.915 |
2021 — 2024 |
McKeown, Kathleen; Patton, Desmond; Grieser, Jessica |
N/A (no activity code was retrieved) |
RI: Medium: Automatically Understanding and Identifying Digital Expression of Black Grief
In today’s world, multiple large-scale events have converged, causing increased emotional distress for many in the United States. In addition to large-scale events, such as COVID-19, incidents of police brutality against Blacks, and the economic downturn, people also experience distressing personal events, such as loss of a close family member or friend. This project develops novel machine learning-based natural language processing (NLP) tools to automatically identify the online expression of grief and component emotions that occur in reaction to these triggering events. The focus is on Black grief, a phenomenon that is not well understood, especially when it occurs in a networked public. The results of this project will include a dataset, annotated at different levels, that scholars and computational researchers can use to understand the online expression of Black grief and develop novel NLP models for its identification. The project has the potential for truly broad and profound impact in society. Given the rate at which people post online, an NLP tool that can automatically identify grief expressed in a post would be useful to professionals who respond to grief. Automatic flagging of posts indicating that the poster may need help would be more efficient than having professionals manually scan all online spaces of interest, an approach that is now common. New NLP tools developed during the project have the potential to shift how social workers, mental health professionals, and outreach workers treat complex grief online, informing new intervention and treatment programs that respond to an individual’s digital life. The investigators work with Black Harlem residents who are helping other residents cope with and process emotions including grief and other disturbing events, engaging them in the evaluation of the developed NLP tools.
This work is an interdisciplinary collaboration between computer scientists, social work researchers, and linguists. It includes the use of layered annotation and computational methods to analyze social media posts after triggering, often traumatic, events to identify how people communicate about different types of loss. The goal is to understand the digital expression of grief in posts by Black community members. The plan is to collect corpora containing expressions of grief in reaction to triggering events, and to produce a layered annotation of the corpora reflecting semantic interpretation and context, psychological interpretation of expressed emotion, as well as linguistic expression of grief. Using this data, a computational approach will be developed to automatically identify grief, its component emotions and intensity, and how emotional reactions change over time. The Natural Language Processing (NLP) team will develop new semi-supervised methods to identify grief, its component emotions and intensity as expressed in different dialects, as well as conversational patterns that lead to different resolutions of grief over time. The social work team will perform a qualitative analysis of complex historical trauma, bias, and racism embedded in annotations of social media posts. They will work with community experts to identify the best strategies for deciphering different expressions of emotions that use hyper-local language that is deeply regional, nuanced, and cultural. The linguistics team’s work will advance understanding of the role of specific digital language strategies in the creation of social meaning, identifying the significance of morphosyntactic variation in digital language.
The approach also includes identifying racial bias in systems developed under the award and understanding the impact on predictions when the computational model is applied to the language of different demographic groups in communities (e.g., by age or socio-economic status).
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
|
0.915 |