2001 — 2005 |
Hunter, Lawrence E |
U01 Activity Code Description: To support a discrete, specified, circumscribed project to be performed by the named investigator(s) in an area representing his or her specific interest and competencies. |
Integrated Neuroinformatics Resource For Alcoholism (in* @ University of Colorado Denver
DESCRIPTION (provided by applicant): The aim of this proposal is to establish an Integrated Neuroinformatics Resource on Alcoholism (INRA) as the informatics core component of the Integrative Neuroscience Initiative on Alcoholism Consortium (INIA). The overall goal of the INRA will be to create an integrated, multiresolution repository of neuroscience data, ranging from molecules to behavior for collaborative research on alcoholism. As the neuroinformatics core of the INIAC, the INRA will enable the integration of all data generated by all components of the INIAC. Furthermore, it will support synthesis of new knowledge through computational neurobiology tools for exploratory analysis including visualization, data mining and simulation. The INRA will represent a synthesis of emerging approaches in bioinformatics and existing methods of neuroinformatics to provide the INIAC a versatile toolbox of computational methods for elucidating the effects of alcohol on the nervous system. The specific aims of the INRA will be: (i) implementation of an informatics infrastructure for integrating complex neuroscience data, from molecules to behavior, generated by the consortium and relevant data available in the public domain; (ii) development of an integrated secure web-based environment so that consortium members can interactively visualize, search and update the integrated neuroscience knowledge; and (iii) development of data mining tools, including biomolecular sequence analysis, gene expression array analysis, characterization of biochemical pathways, and natural language processing to support hypothesis generation and testing regarding ethanol consumption and neuroadaptation to alcohol. We will also collaborate with related neuroscience projects to utilize existing resources for brain atlases, neuronal circuits and neuronal properties.
The INRA will be made available to the INIAC through a Web-based system through interactive graphical user interfaces that will seamlessly integrate tools for data entry, modification, search, retrieval and mining. The core of the INRA will be based on robust knowledge management methods and tools that will effectively integrate disparate forms of neuroscience data and make it amenable to complex inferences. Our proposed strategy ensures that the informatics resource is: (i) flexible and scalable to address the evolving needs of the INIAC, and (ii) highly intuitive and user-friendly to ensure optimal utilization by the INIAC members. The proposed INRA is a novel system for collaborative research in neuroscience and alcoholism which will be developed by an interdisciplinary team of experts in bioinformatics, computational biology, neuroscience and alcoholism research. We believe the INRA will greatly enhance the pace of discovery in the area of ethanol consumption and neuroadaptation to alcohol within the INIAC as well as the general research community.
|
0.979 |
2004 — 2012 |
Hunter, Lawrence E |
R01 Activity Code Description: To support a discrete, specified, circumscribed project to be performed by the named investigator(s) in an area representing his or her specific interest and competencies. |
Technology Development For a Molbio Knowledge-Base @ University of Colorado Denver
DESCRIPTION (provided by applicant): In the three years since the original proposal was submitted, the claims we made about the impending readiness of knowledge-based approaches and natural language processing to address pressing problems of information overload in molecular biology have been resoundingly confirmed, and such methods have become increasingly accepted within the computational bioscience and systems biology communities. We are now well into the era of broad use of semantic representation technology to support biomedical research, and at the cusp of the use of biomedical natural language processing software to create the enormous number of necessary formal representations automatically from biomedical texts. The results of the work during the last funding period have not only contributed innovative and significant new methods, but have helped us identify a set of specific research issues we claim are now the rate-limiting factors in building an extensive, high-quality computational knowledge-base of molecular biology. The aims of this competitive renewal are to address those factors, making it possible to scale our impressive results on intentionally narrow applications to much larger (and more significant) tasks, specifically: (1) to create an enriched, relationally decomposed set of conceptual frames, hewing closely to multiple, community-curated ontologies; (2) to develop language processing tools capable of recognizing and populating instances of those conceptual frames; and (3) to develop systems for integrating and using diverse knowledge from multiple sources to generate scientific insights, focusing on the analysis of sets of dozens to hundreds of genes produced by diverse high-throughput methodologies. An innovative aspect of this proposal is the creation and application of novel, insight-based extrinsic evaluation techniques for such systems.
|
0.979 |
2006 — 2008 |
Hunter, Lawrence E |
R01 |
Beyond Abstracts: Issues in Mining Full Texts @ University of Colorado Denver
DESCRIPTION (provided by applicant): Biomedical language processing, the application of computational techniques to human-generated texts in biomedicine, is an increasingly important enabling technology for basic and applied biomedical research. The exponential growth of the peer-reviewed literature and the breakdown of disciplinary boundaries associated with high-throughput techniques have increased the importance of automated tools for keeping scientists abreast of all of the published material relevant to their work. However, despite decades of research, the performance of state-of-the-art tools for basic language processing tasks like information extraction and document retrieval remains below the level necessary for adequate utility and widespread adoption of this technology. The development, performance and evaluation of text mining systems depend crucially on the availability of appropriate corpora: collections of representative documents that have been annotated with human judgments relevant to a language-processing task. Corpora play two roles in the development of this technology: first, they act as "gold standards" by which alternative automated methods can be fairly compared, and second, they provide data for the training of statistical and machine learning systems that create empirical models of patterns in language use. The conventional view is that corpora are neutral, random samples of the domain of interest. Our preliminary work suggests that the restrictions in size, quality, genre, and representational schema of the small number of existing corpora are themselves a critical limiting factor for near-term breakthroughs in biomedical text processing technology.
Therefore, we propose to test the following hypothesis: Creation of large, high-quality, biomedical corpora from multiple genres will lead to significant improvements in the performance of biomedical text mining systems and the creation of new approaches to text mining tasks. Specific aims include constructing several large corpora covering a range of genres and incorporating a rich knowledge representation; identifying factors that affect differential performance on full text versus abstracts; and developing new methods for language processing, especially of full text. Because improvements in the ability to automatically extract information from many textual genres will assist scientists and clinicians in the crucial task of keeping up with the burgeoning biomedical literature, the potential public health impact is quite large.
|
0.979 |
2007 — 2011 |
Hunter, Lawrence E |
T15 Activity Code Description: To assist professional schools and other public and nonprofit institutions to establish, expand, or improve programs of continuing professional education, especially for programs of extensive continuation, extension, or refresher education dealing with new developments in the science of technology of the profession. |
Computational Bioscience Program Training Grant @ University of Colorado Denver
DESCRIPTION (provided by applicant): The Computational Bioscience Program (CBP) of the University of Colorado School of Medicine is an independent Ph.D.-granting and postdoctoral training program. We have an innovative and highly productive approach to training pre- and post-doctoral fellows for research careers. We are a second-generation teaching program, informed by the experiences of the many computational biology training models that have come before us. Our program is designed to produce graduates with depth in both computational methods and molecular biology, an intimate familiarity with the science and technology that synthesizes the two, and the skills necessary to pioneer novel computational approaches to significant biomedical questions. We are aware of the difficulty of achieving both breadth and depth in a reasonable amount of time, and believe we have identified a novel approach that is capable of training productive interdisciplinary scientists in a relatively short period. The program is tightly focused on transforming already strong students and recent Ph.D.s into mature and productive scientists. Our program is structured around a set of four categories of educational goals and objectives: knowledge, communication skills, professional behavior and self-directed life-long learning. Our graduates demonstrate the knowledge of core concepts and principles of computational bioscience, and have the ability to apply computation to gain insight into significant biomedical problems. Their knowledge includes mastery of the fundamentals of biomedicine, statistics and computer science, as well as proficiency in the integration of these fields. Our graduates will contribute to the discovery and dissemination of new knowledge. 
Our graduates demonstrate interpersonal, oral and written skills that enable them to interact productively with scientists from both biomedical and computational domains, to clearly communicate the results of their work in appropriate formats, and to teach others computational bioscience skills. Our graduates are able to bridge the gap between biomedical and computational cultures. Our graduates demonstrate the highest standards of professional integrity and exemplary behavior, as reflected by a commitment to the ethical conduct of research, continuous professional development, and thoughtfulness regarding the broader implications of their work. Our graduates demonstrate habits and skills for self-directed and life-long learning, and recognize that computational bioscience is a rapidly evolving discipline. Our focus is on the development of adaptive, flexible and curious scientists able to comfortably assimilate new ideas and technologies during the course of their professional development.
|
0.979 |
2007 — 2009 |
Hunter, Lawrence E |
G08 Activity Code Description: A grant available to health-related institutions to improve the organization and management of health related information using computers and networks. |
Construction of a Full Text Corpus For Biomedical Text Mining @ University of Colorado Denver
DESCRIPTION (provided by applicant): There is a demonstrated community need for an annotated corpus consisting of the full texts of biomedical journal articles. There are many reasons to believe that the rate-limiting factor impeding progress in biomedical language processing today is the lack of availability of the right kind of expertly annotated data. An annotated corpus is a collection of texts with information about the meaning or structure associated with particular textual elements. Annotated corpora are a critical component of biomedical natural language processing research in two ways. First, most contemporary approaches to language processing rely at least in part on machine learning or statistical models. Such systems must be "trained" on sets of examples with known outputs, so annotated corpora provide the training data vital to the construction of modern NLP systems. Second, annotated corpora provide the gold standard by which various approaches to particular text mining tasks are evaluated. Due to their central roles in training and testing language processing systems, the quality of the design and operational creation of annotated corpora places fundamental limits on what can be accomplished with such systems. Although there has been valuable work done on annotating abstracts, there are important differences between abstracts and full-text articles from a text mining perspective, and annotation of full-text journal articles has been negligible. Workers in both the biological (especially model organism database curation) community and the text mining community have independently pointed out the importance of processing the full text of scientific publications if the biomedical world is to be able to fully utilize text mining. We propose to build a large, fully annotated corpus consisting of full texts of biomedical journal articles.
Additionally, previous biomedical corpus annotation efforts have often utilized ad hoc ontologies that have limited their utility outside of the groups that created them. We will ensure community acceptability by annotating with respect to community-consensus ontologies such as the Gene Ontology and the UMLS. Since the task involves expensive human labor, efficiency is a key issue in creating corpora. For this reason, we propose to build a team that includes the builder of the largest semantically annotated corpus to date, one of the pioneers of the model organism databases, and an already-assembled cadre of experienced linguistic and domain-expert annotators.
|
0.979 |
2007 — 2010 |
Hunter, Lawrence E |
R01 |
Ontologies and Biomedical Language Processing @ University of Colorado Denver
DESCRIPTION (provided by applicant): We hypothesize that there are significant synergies between the applications of biomedical ontologies and of biomedical language processing (BLP) which can be used to improve the quality and scope of both activities. A growing body of work suggests such synergies might exist, but there has yet to be a systematic exploration of their potential. We propose to carry out a focused effort to explore both the potential for, and obstacles to, the mutual application of biomedical ontologies and biomedical language processing. To provide immediate biological relevance to our work, we propose to focus on the topics of autoimmune and pulmonary disease. We group our proposed explorations into three specific aims: (1) Create novel tools and approaches for the application and maintenance of biomedical ontologies, based on an assessment of the processes and tools used for the ontological annotation of textual corpora in the biomedical language processing community. Particularly, we will focus on the creation of new methods for effective search through large ontologies, compositional approaches to annotation, effective capture of the evidence underlying annotations, and the use of automated suggestions for manual confirmation. (2) Evaluate the utility of BLP tools and techniques when applied to terms and definitions of biomedical ontologies, both to enrich and interconnect orthogonal ontologies, and to provide quality assurance and quality control mechanisms. Particularly, we propose to develop and evaluate methods for connecting terms within and across ontologies, for assessing completeness of an ontology against the literature, and for implementing automatically executable measures of ontology quality. (3) Compare the differences between annotations produced by manual procedures and those produced by automated BLP methods for completeness and correctness. 
Based on the resulting data, produce guidelines for the optimal interplay between manual and automatic procedures for producing broad, accurate and useful knowledge-bases. Because ontologies are the central organizing tool of the model organism databases, improvements in their quality and in the ease and efficiency of their use will have a major effect on the model organism databases, speed the translational process generally, and create a potentially large public health impact.
|
0.979 |
2009 — 2013 |
Hunter, Lawrence E. |
R01 |
Biomedical Language Processing Writ Large: Scaling to All of Pubmedcentral @ University of Colorado Denver
DESCRIPTION (provided by applicant): Recent developments in text mining research, and in scientific publication, have brought us to the moment when the long-standing potential of natural language processing technology to benefit biomedical researchers may finally be realized. Technological advances, recent results in computational linguistics, maturation of biomedical ontology, and the advent of resources such as PubMedCentral have set the stage for an attempt at an integrated computational analysis of a large proportion of the full text biomedical literature. Such an analysis has the potential to dramatically extend the way that biomedical researchers can effectively use the scientific literature, particularly in the analysis of genome-scale datasets, broadly accelerating and increasing the efficiency of scientific discovery. We hypothesize that it is now possible to extract a wide variety of ontologically-grounded entities and relationships by processing the entire PubMedCentral document collection accurately and with good coverage, to use this extracted information to produce new genres of scientifically valuable tools and analysis techniques, and to demonstrate its utility in the analysis of genome-scale data. The challenges that we plan to overcome range from fundamental linguistic issues (e.g. cross-document coreference resolution) to high-performance computing (e.g. scaling up integrated processing to include millions of complex documents), to fielding practical systems that can exploit enormous knowledge-bases to accelerate the analysis of very large molecular data sets.
|
0.979 |
2010 |
Hunter, Lawrence E |
R01 |
Automated Literature Mining For Validation of High-Throughput Function Prediction @ University of Colorado Denver
DESCRIPTION (provided by applicant): The function of millions of proteins remains unknown, and automated protein function prediction systems have a poor record of performance. We will test hypotheses about protein functional sites by validating high-throughput predictions derived from computational biology techniques through a novel automated system that will mine the literature for targeted information relevant to those predictions. The impact of our work will be to enable large-scale, validated annotation of protein function and in turn to facilitate progress in tackling drug discovery for treatment of diseases. High-throughput experiments and bioinformatics techniques are creating an exploding volume of data with which we hope to transcribe the genetic blueprints of life. Targeted experiments are required to validate biomedical discoveries from these sources. Fortunately, the information to confirm or refute a prediction is often already available in an existing publication and the biologist can take advantage of this supporting evidence for validation. However, the sheer volume of predictions from high-throughput methods exceeds the capacity of researchers to perform even the necessary literature searches. This gap in capacity must be addressed using automated literature mining methods that perform comparably to a human expert; indeed, development of such methods is a grand challenge of modern biology. We will mine the full text literature to validate computational predictions of functional sites in proteins. The innovations in our approach include: (1) using computational predictions as the context for a literature search; (2) information extraction of protein functional sites from full text journal publications; (3) high-throughput text mining; and (4) using primary information in protein databases to evaluate the methods. Understanding of protein function is a critical bottleneck in the progress of biomedical research.
It is time to truly integrate the biological literature into the protein function prediction problem. By doing so, we will enable a critical advance in high-throughput protein function prediction.
|
0.979 |
2012 — 2021 |
Hunter, Lawrence E |
T15 |
Colorado Biomedical Informatics Training Program @ University of Colorado Denver
The Colorado Biomedical Informatics Training Program is an independent, Ph.D.-granting and postdoctoral training program based in the University of Colorado School of Medicine, with a 15-year track record of innovative and effective training of pre- and post-doctoral fellows for research careers. We are a second-generation teaching program, informed by the experience of the many biomedical informatics training models that have come before us. Our program is designed to produce graduates with depth in both computational methods and biomedicine, an intimate familiarity with the science and technology that synergizes the two, and the skills necessary to pioneer novel computational approaches to significant biomedical questions. We are aware of the difficulty of achieving both breadth and depth in a reasonable amount of time, and believe we have identified a novel approach that is capable of training productive interdisciplinary scientists in a relatively short period. The program is tightly focused on transforming already strong students and recent Ph.D.'s into mature and productive scientists. Our program is structured around a set of four categories of educational goals and objectives: knowledge, communication skills, professional behavior, and self-directed life-long learning. Our graduates demonstrate the knowledge of core concepts and principles of biomedical informatics, and have the ability to apply computation to gain insight into important biomedical problems. Their knowledge includes mastery of the fundamentals of biomedicine, clinical and translational research, statistics, and computer science, as well as proficiency in the integration of these fields. Our graduates have contributed to the discovery and dissemination of new knowledge.
They demonstrate interpersonal, oral, and written skills that enable them to interact productively with scientists from both the biomedical and the computational domains, to communicate the results of their work in appropriate formats, and to teach others biomedical informatics skills; they effectively bridge the gap between biomedical and computational cultures. Our graduates demonstrate the highest standards of professional integrity and exemplary behavior, as reflected in a commitment to the ethical conduct of research, continuous professional development, and thoughtfulness regarding the broader implications of their work. Our graduates demonstrate habits and skills for self-directed and life-long learning, and recognize that biomedical informatics is a rapidly evolving discipline. Our program itself is also undergoing continuous improvement, carefully tracking our efforts and quickly responding to changes in the field and in our situation. We are justifiably proud of our outstanding track record as well as of our dynamic and adaptive approach to the training of adept, flexible, and curious scientists able to comfortably assimilate new ideas and technologies during the course of their professional careers. Based on our successful track record, we are requesting that our current slot allocation be continued, that is, 8 predoctoral, 7 postdoctoral and 4 short term positions.
|
0.979 |
2014 — 2017 |
Hunter, Lawrence E. |
R01 |
Developing and Applying Information Extraction Resources and Technology to Create @ University of Colorado Denver
DESCRIPTION (provided by applicant): Building on 8 years of highly productive work in technology development that included the creation of the Colorado Richly Annotated Full Text corpus (CRAFT), we hypothesize that text mining resources and methods are approaching the level of maturity required to productively process a significant proportion of the full text biomedical literature to create a well-represented formal knowledge base of molecular biology. We propose a detailed, integrated plan to achieve this long-standing goal. Success in this effort will make possible a transformative new way for the biomedical research community to identify, access, and integrate existing knowledge, breaking down disciplinary boundaries and other silos that have kept scientists from fully exploiting relevant prior results in their research. Our successes in the prior funding period broadened the applicability of biomedical concept identification systems to a much wider set of tasks, demonstrating the ability to target multiple community-curated ontologies in text mining, and to generate scientifically significant insights from the results. The proposed work would take advantage of the resources we produced to transcend several of the limitations of previous efforts. We propose innovative new approaches to formal knowledge representation and to characterizing relationships between textual elements and semantic content. We will design, implement and evaluate computational systems that have the potential to transform enormous text collections into semantically rich, logic-based, standards-compliant, formal representations of biomedical knowledge with clearly identified provenance. The resulting representations will express complex assertions about a very wide range of entities, processes, qualities, and, most importantly, their specific relationships with one another.
|
0.979 |
2015 — 2018 |
Hunter, Lawrence E. |
R01 |
Bio Text Nlp @ University of Colorado Denver
DESCRIPTION (provided by applicant): Since our last renewal, the challenges for biomedical researchers of keeping up with the scientific literature have become even more acute. Last year marked the first time that Medline indexed more than a million journal articles; more than 210,000 of these had full text deposited in PubMedCentral, bringing the total number of full texts archived in PMC to over 3 million. The stunning pleiotropy of genes and their products, combined with the adoption of genome-scale technologies throughout biomedical research, has made obsolete the notion that reading within one's own specialty plus a few top journals is enough to keep track of all of the results relevant to one's research. Fortunately, advances in biomedical natural language processing and increasing access to digital full text journal publications offer the potential for innovative new approaches to delivering relevant information to working bench scientists. We hypothesize that realizing the potential of biomedical natural language processing applied to full text journal articles to make a sustained and powerful contribution to biomedical research requires contextualizing biomedical natural language processing in the daily life of bench scientists, focusing on their unmet information gathering needs, and providing interfaces that fit well into existing research workflows.
|
0.979 |
2018 — 2021 |
Hunter, Lawrence E. |
R01 |
Knowledge-Based Biomedical Data Science @ University of Colorado Denver
Knowledge-based biomedical data science: In the previous funding period, we designed and constructed breakthrough methods for creating a semantically coherent and logically consistent knowledge-base by automatically transforming and integrating many biomedical databases, and by directly extracting information from the literature. Building on decades of work in biomedical ontology development, and exploiting the architectures supporting the Semantic Web, we have demonstrated methods that allow effective querying spanning any combination of data sources in purely biological terms, without the queries having to reflect anything about the structure or distribution of information among any of the sources. These methods are also capable of representing apparently conflicting information in a logically consistent manner, and tracking the provenance of all assertions in the knowledge-base. Perhaps the most important feature of these methods is that they scale to potentially include nearly all knowledge of molecular biology. We now hypothesize that using these technologies we can build knowledge-bases with broad enough coverage to overcome the "brittleness" problems that stymied previous approaches to symbolic artificial intelligence, and then create novel computational methods which leverage that knowledge to provide critical new tools for the interpretation and analysis of biomedical data. To test this hypothesis, we propose to address the following specific aims: 1. Identify representative and significant analytical needs in knowledge-based data science, and refine and extend our knowledge-base to address those needs in three distinct domains: clinical pharmacology, cardiovascular disease and rare genetic disease. 2. Develop novel and implement existing symbolic, statistical, network-based, machine learning and hybrid approaches to goal-driven inference from very large knowledge-bases.
Create a goal-directed framework for selecting and combining these inference methods to address particular analytical problems. 3. Overcome barriers to broad external adoption of developed methods by analyzing their computational complexity, optimizing performance of knowledge-based querying and inference, developing simplified, biology-focused query languages, lightweight packaging of knowledge resources and systems, and addressing issues of licensing and data redistribution.
|
0.979 |
2020 |
Hunter, Lawrence E. |
OT2 Activity Code Description: A single-component research award that is not a grant, cooperative agreement or contract using Other Transaction Authorities |
High Performance Text Mining For Translator @ University of Colorado Denver |
0.979 |
2020 — 2021 |
Hunter, Lawrence E |
R01 |
Scientific Questions: a New Target For Biomedical Nlp @ University of Colorado Denver
Project Summary: Natural language processing (NLP) technology is now widespread (e.g. Google Translate) and has several important applications in biomedical research. We propose a new target for NLP: extraction of scientific questions stated in publications. A system that automatically captures and organizes scientific questions from across the biomedical literature could have a wide range of significant impacts, as attested to in our diverse collection of support letters from researchers, journal editors, educators and scientific foundations. Prior work focused on making binary (or probabilistic) assessments of whether a text is hedged or uncertain, with the goal of downgrading such statements in information extraction tasks, not on computationally capturing what the uncertainty is about. In contrast, we propose an ambitious plan to identify, represent, integrate and reason about the content of scientific questions, and to demonstrate how this approach can be used to address two important new use cases in biomedical research: contextualizing experimental results and enhancing literature awareness. Contextualizing results is the task of linking elements of genome-scale results to open questions across all of biomedical research. Literature awareness is the ability to understand important characteristics of large, dynamic collections of research publications as a whole. We propose to produce rich computational representations of the dynamic evolution of research questions, and to prototype textual and visual interfaces to help students and researchers explore and develop a detailed understanding of key open scientific questions in any area of biomedical research.
|
0.979 |