2009 — 2014 |
Xie, Xiaohui |
N/A |
CAREER: Computational Tools for Interpreting Genomes @ University of California-Irvine
This award is funded under the American Recovery and Reinvestment Act of 2009 (Public Law 111-5).
This is a CAREER award to support the research of Dr. Xiaohui Xie in the Department of Computer Science at UC-Irvine. Dr. Xie is a second-year, tenure-track Assistant Professor. Identification of all functional elements encoded in genomes is a fundamental need in genomic research. A powerful approach for discovering functional elements in the genome is comparative genomics. Functional sequences are often under strong selection pressure to remain conserved, so they can stand out from the surrounding sequence by virtue of their greater levels of conservation. This research is developing novel statistical and computational tools for comparative genome analysis and for discovering functional elements in genomes by modeling the evolutionary constraints on these elements from their biased nucleotide substitution patterns. An assumption underlying more common methods for comparative genomics is that functional sequences evolve at a slower rate than neutral sequences, so they are modeled as having a shorter evolutionary distance between species than neutral sequences. This assumption has two limitations. First, many functional positions can change between certain nucleotides without affecting the function they encode, so mutation-based approaches may have less power for detecting less obvious functional elements. Second, the rate-based methods only determine whether a sequence is conserved or not; they do not provide information about the specific constraints encoded at each nucleotide of the conserved sequence. This work is examining whether substitution patterns between different nucleotides show a bias over evolutionary time, by developing algorithms to infer these patterns directly from sequence alignments.
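To make the contrast between rate-based and pattern-based analysis concrete, the following is a minimal sketch, not the project's actual algorithm. It simply tallies which nucleotide pairs co-occur in each column of a multiple alignment. The function name `column_substitution_counts` and the toy alignment are hypothetical, and the pairwise counting ignores the phylogenetic tree that a real model would use.

```python
from collections import Counter
from itertools import combinations


def column_substitution_counts(alignment):
    """Count unordered pairs of differing nucleotides in each alignment column.

    `alignment` is a list of equal-length aligned sequences, one per species.
    Gaps and ambiguous characters are skipped. This is a simplification:
    it compares all species pairs and does not model the phylogeny.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    per_column = []
    for col in range(length):
        # Keep only unambiguous bases observed in this column.
        bases = [seq[col].upper() for seq in alignment]
        bases = [b for b in bases if b in "ACGT"]
        counts = Counter()
        for a, b in combinations(bases, 2):
            if a != b:
                counts[tuple(sorted((a, b)))] += 1
        per_column.append(counts)
    return per_column


# Hypothetical toy alignment: four species, five columns.
toy_alignment = ["ACGTA", "ACATA", "ACGTC", "ACATA"]
counts = column_substitution_counts(toy_alignment)

for col, column_counts in enumerate(counts):
    print(col, dict(column_counts))
```

In this toy data, columns 2 and 4 both vary, so a rate-only score would treat them alike as "less conserved". The per-column tallies show that column 2 only ever switches between A and G, which is the kind of position-specific constraint the abstract describes.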
As part of his CAREER plan, Dr. Xie is developing an extensive curriculum of two new bioinformatics courses, one undergraduate and one graduate, and two additional courses in computational biology. The research involves students from the minority science program at UCI and from the California State Summer School for Mathematics and Science (COSMOS), a summer program for high school students. The results from this research will provide new scientific resources because the computational tools and results will be freely available through publicly accessible web services at http://www.ics.uci.edu/~xhx/.
|
0.957 |
2012 — 2015 |
Xie, Xiaohui Cho, Ken Blitz, Ira |
N/A |
Deciphering BMP Signaling and the Role of Schnurri Transcription Factors Using a Deep Sequencing Approach in Xenopus @ University of California-Irvine
BMPs were first identified for their bone-inducing properties. It was subsequently discovered that BMPs control numerous events in embryonic development and adult tissue homeostasis, including cell type specification, stem cell maintenance, cell death, and cell division. Because BMPs play roles in a large variety of biological processes, defects in BMP activity are linked to various diseases, including developmental defects and cancer. To further uncover the root causes of developmental defects and diseases, identifying the genes regulated by BMPs is essential. This project incorporates state-of-the-art molecular biology approaches to uncover the intricacies of BMP target gene regulation. The intellectual merit of this proposal lies in providing a global view of BMP function at a depth that is unprecedented in the current literature. This project is likely to uncover the involvement of BMPs in various diseases and biological processes, including the maintenance of stem cell properties. Findings from this project will also explain how BMP signaling has evolved to control development in different biological and evolutionary contexts. This project will also have an educational impact, as over one dozen undergraduate and graduate students will be trained in molecular biology, developmental biology, and computational biology.
|
0.957 |
2012 — 2014 |
Xie, Xiaohui |
R01 (Activity Code Description: To support a discrete, specified, circumscribed project to be performed by the named investigator(s) in an area representing his or her specific interest and competencies.) |
Machine Learning Methods to Increase Genomic Accessibility by Next-Gen Sequencing @ University of California-Irvine
DESCRIPTION (provided by applicant): DNA sequencing has become an indispensable tool in many areas of biology and medicine. Recent technological breakthroughs in next-generation sequencing (NGS) have made it possible to sequence billions of bases quickly and cheaply. A number of NGS-based tools have been created, including ChIP-seq, RNA-seq, Methyl-seq and exon/whole-genome sequencing, enabling a fundamentally new way of studying diseases, genomes and epigenomes. The widespread use of NGS-based methods calls for better and more efficient tools for the analysis and interpretation of the NGS high-throughput data. Although a number of computational tools have been developed, they are insufficient in mapping and studying genome features located within repeat, duplicated and other so-called unmappable regions of genomes. In this project, computational algorithms and software that expand genomic accessibility of NGS to these previously understudied regions will be developed. The algorithms will begin with a new way of mapping raw reads from NGS to the reference genome, followed by a machine learning method to resolve ambiguously mapped reads, and will be integrated into a comprehensive analysis pipeline for ChIP-seq. More specifically, the three aims of the research are to develop: (1) Data structures and efficient algorithms for read mapping to rapidly identify all mapping locations. Unlike existing methods, the focus of this research is to rapidly identify all candidate locations of each read, instead of one or only a few locations. (2) Machine learning algorithms for read analysis to resolve ambiguously mapped reads for both ChIP-seq analysis and genetic variation detection. This work will develop probabilistic models to resolve ambiguously mapped reads by pooling information from the entire collection of reads. (3) A comprehensive ChIP-seq analysis pipeline to systematically study genomic features located within unmappable regions of genomes.
These algorithms will be tested and refined using both publicly available data and data from established wet-lab collaborators. In addition to discovering new genomic features located within repeat, duplicated or other previously inaccessible regions, this work will provide the NGS community with (a) a faster and more accurate tool for mapping short sequence reads, (b) a general methodology for expanding genomic accessibility of NGS, (c) a versatile, modular, open-source toolbox of algorithms for NGS data analysis, and (d) a comprehensive analysis of protein-DNA interactions in repeat regions in all publicly available ChIP-seq datasets. This work is a close collaboration between computer scientists and wet-lab biologists who are developing NGS assays to study biomedical problems. In particular, we will collaborate with Timothy Osborne of Sanford-Burnham Medical Research Institute to study regulators involved in cholesterol and fatty acid metabolism, with Kyoko Yokomori of UC Irvine to study Cohesin, Nipbl and their roles in Cornelia de Lange syndrome, and with Ken Cho of UC Irvine to study the roles of FoxH1 and Schnurri in development and growth control. PUBLIC HEALTH RELEVANCE: DNA sequencing has become an indispensable tool for basic biomedical research as well as for discovering new treatments and helping biomedical researchers understand disease mechanisms. Next-generation sequencing, which enables rapid generation of billions of bases at relatively low cost, poses a significant computational challenge: how to analyze the large amount of sequence data efficiently and accurately. The goal of this research is to develop open-source software that improves both the efficiency and the accuracy of next-generation sequencing analysis tools, thereby allowing biomedical researchers to take full advantage of next-generation sequencing to study biology and disease.
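The first two aims can be illustrated with a small sketch, given here under loud assumptions: it is not the project's software. A plain k-mer hash index with exact matching stands in for the proposed mapping data structures, and a basic expectation-maximization (EM) loop stands in for the proposed probabilistic model. All names (`build_kmer_index`, `all_exact_hits`, `resolve_multireads`) and the toy reference are hypothetical. Locations are identified by read start position, whereas a real pipeline would pool reads by region or peak.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer in the reference to all of its start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def all_exact_hits(read, reference, index, k):
    """Return every position where the whole read matches exactly.

    The first k bases serve as a seed; each seed hit is then verified
    against the full read, so all candidate locations are reported.
    """
    return [pos for pos in index.get(read[:k], [])
            if reference.startswith(read, pos)]


def resolve_multireads(read_candidates, n_iter=50):
    """Fractionally assign reads to candidate locations with a simple EM.

    read_candidates: one list of distinct candidate locations per read.
    Returns (assignments, abundance); assignments[i] maps each candidate
    location of read i to its posterior probability.
    """
    locations = {loc for cands in read_candidates for loc in cands}
    abundance = {loc: 1.0 / len(locations) for loc in locations}
    assignments = []
    for _ in range(n_iter):
        # E-step: split each read across its candidates by current abundance.
        assignments = []
        for cands in read_candidates:
            total = sum(abundance[loc] for loc in cands)
            assignments.append({loc: abundance[loc] / total for loc in cands})
        # M-step: re-estimate abundance from the pooled fractional counts.
        expected = dict.fromkeys(locations, 0.0)
        for posterior in assignments:
            for loc, p in posterior.items():
                expected[loc] += p
        abundance = {loc: c / len(read_candidates)
                     for loc, c in expected.items()}
    return assignments, abundance


# Hypothetical toy reference in which "ACGTA" occurs at positions 0 and 8.
reference = "ACGTACCCACGTAGGG"
k = 4
index = build_kmer_index(reference, k)

# Three reads unique to position 0, one unique to position 8, one ambiguous.
reads = ["ACGTAC", "ACGTAC", "ACGTAC", "ACGTAG", "ACGTA"]
candidates = [all_exact_hits(r, reference, index, k) for r in reads]
assignments, abundance = resolve_multireads(candidates)

print(candidates[-1])   # all candidate positions of the ambiguous read
print(assignments[-1])  # its fractional assignment after EM
```

In this toy case the ambiguous read is split according to the uniquely mapped reads around it, ending with a posterior close to 0.75 for position 0. That is the sense in which information is pooled "from the entire collection of reads".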
|
0.957 |
2017 — 2020 |
Xie, Xiaohui |
N/A |
III: Small: Integrating and Interpreting Heterogeneous Genomic Data Through Deep Learning @ University of California-Irvine
Comprehensive identification of all functional elements encoded in genomes is a fundamental need in both basic and applied biological research. Although the coding regions of genomes are well understood, the noncoding regions, which represent over 98% of mammalian genomes, are far less studied, yet they hold the key to understanding gene regulation, evolution, and the genetic basis of complex phenotypes. The goal of this project is to develop computational methods to infer the function of noncoding sequences by leveraging the plethora of publicly available genomic data and state-of-the-art machine learning algorithms. These algorithms can greatly expand the utility of existing genomic data, improve the accuracy of annotating the pathogenicity of noncoding variants, and offer a new way of studying the grammars of gene regulation encoded in noncoding sequence. The project will additionally create opportunities to facilitate interactions between biologists and computer scientists, and offer interdisciplinary training for both undergraduate and graduate students, especially those from traditionally underrepresented groups.
The goal of this project is to develop a new computational framework based on deep learning to understand noncoding sequences. Over the past few years, researchers have generated thousands of genome-scale datasets on chromatin accessibility, histone modifications, DNA methylation, protein-binding, and others, spanning a broad range of tissue and cell types. This project will integrate these heterogeneous datasets to derive a comprehensive characterization of noncoding sequence through innovative machine learning algorithms based on convolutional and recurrent neural nets, and deep generative models. The PI will develop deep learning algorithms to map the relationship between noncoding sequences and the diverse genomic measurements, learn chromatin states and discover novel functional elements from these measurements, and predict effects of noncoding genetic variants. Training a flexible and scalable learning model with large amounts of data provides a way of characterizing noncoding sequences in an unbiased and robust fashion, and offers a better chance of extracting complex regulatory rules encoded within noncoding sequences than conventional methods. This project will provide the genomics community with a versatile, modular, open-source toolbox of software packages, with the goal of greatly improving the accuracy of current genome analyses.
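As a rough illustration of how a convolutional layer reads DNA, the sketch below one-hot encodes a sequence and slides a single filter along it. This is only an assumed, minimal building block, not the project's model. The filter is set by hand to respond to a hypothetical "TATA" motif, whereas a trained network would learn many such filters from data. The names `one_hot` and `conv1d_scan` are illustrative.

```python
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a list of 4-element indicator vectors."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def conv1d_scan(encoded, kernel, bias=0.0):
    """Slide one filter over the sequence (valid mode), then apply ReLU."""
    width = len(kernel)
    activations = []
    for start in range(len(encoded) - width + 1):
        score = bias
        for offset in range(width):
            for channel in range(len(BASES)):
                score += encoded[start + offset][channel] * kernel[offset][channel]
        activations.append(max(0.0, score))  # ReLU keeps only strong matches
    return activations


# Hand-set filter (an assumption for illustration): weight 1 on each motif base.
tata_filter = one_hot("TATA")

# Hypothetical reference and variant alleles differing at one base.
ref_seq = "GCTATAGC"
alt_seq = "GCTAGAGC"

ref_act = conv1d_scan(one_hot(ref_seq), tata_filter, bias=-3.0)
alt_act = conv1d_scan(one_hot(alt_seq), tata_filter, bias=-3.0)

# A crude variant-effect score: change in the filter's strongest response.
variant_effect = max(alt_act) - max(ref_act)

print(ref_act)
print(alt_act)
print(variant_effect)
```

The filter fires only at the window that matches the motif in the reference allele, and the single-base change removes that response. Comparing model outputs for the two alleles is one simple way to score a noncoding variant.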
|
0.957 |