2016 |
Kundaje, Anshul |
DP2 Activity Code Description: To support highly innovative research projects by new investigators in all areas of biomedical and behavioral research. |
Deep Learning Frameworks For Regulatory Genomics.
PROJECT SUMMARY The deluge of genome sequencing and functional genomic data in multiple cellular contexts across healthy and diseased individuals provides a unique opportunity to decipher the regulatory and genetic architecture of diseases and traits. Novel computational methods are required that can address fundamental problems involving data representation, data integration, learning accurate predictive models from large-scale datasets, and extraction of novel biological insights from complex models. We propose novel machine learning frameworks based on deep neural networks, with new interpretation and hypothesis generation engines, capable of integrating a wide variety of key genomic data types to learn predictive models of chromatin architecture and chromatin state; integrative models of transcription factor binding; determinants of macroscale three-dimensional genome architecture involving long-range chromatin contacts; and the regulatory basis of functional non-coding regulatory variants. Our methods are highly generalizable to several other related problems in regulatory genomics and lay the foundation for a paradigm shift in computational genomics.
|
1 |
2017 — 2019 |
Blau, Helen M (co-PI); Kundaje, Anshul |
R01 Activity Code Description: To support a discrete, specified, circumscribed project to be performed by the named investigator(s) in an area representing his or her specific interest and competencies. |
Learning Regulatory Drivers of Chromatin and Expression Dynamics During Nuclear Reprogramming
PROJECT SUMMARY The deluge of sequencing-based functional genomic data profiling the transcriptome, regulome and epigenome in hundreds of diverse cellular contexts has spurred the development of powerful computational methods to learn integrative models of gene regulation. However, models learned on data from static cellular contexts only reveal correlative regulatory relationships. There is a paucity of tractable model systems and experiments profiling dynamic cellular processes and a corresponding lack of computational methods that can learn putative causal mechanisms controlling the precise timing and temporal order of changes in genomic chromatin state and gene expression. Here, we propose novel machine learning methods to learn dynamic models of transcription regulation in the context of cellular reprogramming. In Aim 1, we propose deep learning frameworks with new interpretation engines that can integrate dynamic chromatin accessibility and gene expression data to reveal networks of cis-regulatory elements, transcription factor binding complexes and cascades of trans-acting regulatory factors that control cell fate. In Aim 2, we will apply our modeling framework to investigate early dynamics of nuclear reprogramming of human fibroblasts to pluripotency. We will leverage a powerful heterokaryon cell fusion model system to generate global chromatin and gene expression profiles over a two-day time course. In Aim 3, we will perform perturbation experiments using RNAi and CRISPR/Cas9 technologies to validate hypotheses generated by our models and test the effectiveness of predicted pluripotency factors and regulatory elements in inducing reprogramming. The validation experiments will be further used to iteratively refine the computational models. We will integrate the time-course data generated in our model system with data from large reference compendia of functional genomic data such as the Encyclopedia of DNA Elements (ENCODE) and The Roadmap Epigenomics Project.
Our analyses will reveal molecular mechanisms crucial to early and transient stages of nuclear reprogramming, providing novel contributions to our fundamental knowledge of regenerative medicine. Finally, the proposed end-to-end integrative framework is highly generalizable and will be of broad utility to learn dynamic models of transcriptional regulation from time-course datasets in other model systems.
|
1 |
2018 — 2021 |
Baker, Julie C; Kundaje, Anshul; Winn, Virginia D |
R01 Activity Code Description: To support a discrete, specified, circumscribed project to be performed by the named investigator(s) in an area representing his or her specific interest and competencies. |
Molecular Images and Machine Learning to Extract Placental Function From Maternal cfDNA
Abstract Circulating cell free DNA (cfDNA) has revolutionized prenatal diagnostics, but this is the tip of the iceberg, as cfDNA fragmentation patterns embed epigenetic footprints indicative of cell of origin, cellular function and pathological state. cfDNA is fragmented with sizes centered around 145bp and 166bp, which are approximately the lengths of DNA wrapped around a nucleosome, and a nucleosome plus its linker, respectively. Shorter fragments (30-100bp) also exist and have a clear periodicity of 10bp, corresponding to a turn of the DNA helix wrapped around the core histone. Recent reports have shown that the fragmentation sizes of cfDNA are tissue specific, which is a product of distinct nucleosome spacing that is inherent in the function of individual tissues. When these individual fragments are compared with existing epigenetic data from tissues, they can be binned into cell of origin simply based on whether they reveal the nucleosome positioning information of the originating tissue. Identifying cfDNA fragments of placental origin from maternal circulation would provide a non-invasive means of assessing placental function during human pregnancy. Several major barriers inhibit cfDNA as a non-invasive method for examining placental function: 1) the ability to accurately identify the placental origin of the short <160bp cfDNA fragments that constitute regulatory information (paternal SNPs occur at a frequency of approximately 1/2000bp); 2) the ability to use these fragments to piece together precise epigenetic states of the placenta; and 3) the cost of deep whole genome sequencing that has traditionally been required to deconvolute epigenetic profiles of admixed cellular origins. Our goal is to overcome each of these barriers by exploiting state-of-the-art genomics and machine learning techniques to extract precise information about human placental function from cfDNA.
We will first compile robust and accurate nucleosome information, including epigenetic and transcription factor occupancy, from the human placenta and then we will establish machine-learning platforms to elucidate placental cfDNA from maternal circulation at low cost. Success in this project will enable earlier intervention for high-risk pregnancies and facilitate the longitudinal, non-invasive real-time monitoring of pregnancy progression, thereby informing adaptive treatment decision-making.
|
1 |
2021 |
Kundaje, Anshul; Montgomery, Stephen Blair; Montine, Thomas J |
U01 Activity Code Description: To support a discrete, specified, circumscribed project to be performed by the named investigator(s) in an area representing his or her specific interest and competencies. |
Multi-Omic Functional Assessment of Novel AD Variants Using High-Throughput and Single-Cell Technologies
PROJECT SUMMARY / ABSTRACT Through decades of research, genome-wide association studies (GWAS) have identified heritable coding and noncoding single-nucleotide polymorphisms (SNPs) that lead to an increased risk of developing Alzheimer's disease (AD). However, the vast majority of these SNPs remain largely under-characterized, and their contribution to AD pathogenesis remains unclear, marking a critical roadblock to our understanding of AD genetics and pathogenesis. While SNPs within the APOE and TREM2 genes have identified vital nodes in AD biology, most AD-related SNPs reside within the noncoding genome, making their functional roles in the disease less clear. Co-inheritance of nearby SNPs (linkage disequilibrium) and the cell type-specificity of noncoding regulatory elements further complicate functional annotation of noncoding SNPs in AD. As part of the Alzheimer's Disease Sequencing Project Functional Genomics Consortium (ADSP FGC), this project will provide a robust and conclusive functional characterization of AD-related noncoding SNPs. To do this, we will first create a comprehensive single-cell atlas of gene expression and chromatin accessibility across a cohort of diverse clinico-pathologic states related to AD (Aim 1). Using these cell type-specific gene regulatory landscapes, we will develop and implement innovative machine learning and statistical genomics methods to predict functional noncoding, splicing, and coding SNPs (Aim 2). We will then validate these predictions using massively parallel reporter assays (MPRAs) and large-scale, scarless, single-base CRISPR editing of iPSCs followed by cell type-specific differentiations (Aim 3). Taken together (Aim 4), this project will pinpoint the functional SNPs and target cell types for dozens of AD-related risk loci and provide an unprecedented picture of the gene regulatory landscape of AD. 
This work will be performed as a collaboration between Stanford University and the Gladstone Institutes at UCSF. Our team, with many long-standing collaborations, has extensive experience in consortium science with long-term involvement in the Encyclopedia of DNA Elements, The Cancer Genome Atlas, and The Genotype-Tissue Expression Project. The proposed project is thus well-positioned to integrate into the highly collaborative ADSP Functional Genomics Consortium.
|
1 |
2021 |
Kundaje, Anshul |
U01 Activity Code Description: To support a discrete, specified, circumscribed project to be performed by the named investigator(s) in an area representing his or her specific interest and competencies. |
Predicting Context-Specific Molecular and Phenotypic Effects of Genetic Variation Through the Lens of the Cis-Regulatory Code
ABSTRACT A central challenge in human genomics is to interpret the regulatory functions of the noncoding genome, and to identify and interpret variants with regulatory functions. In this project we plan to leverage recent advances in experimental functional genomics (including single cell methods and high throughput perturbation methods) alongside recent progress in deep learning models of gene regulation, to make fundamental progress on these problems. We have assembled a team of investigators with diverse and complementary expertise (in deep learning, single-cell genomics, cellular QTLs and GWAS, and high throughput validations) to build, test, and implement predictive models for interpreting disease associations. Specifically, we aim to (1) Develop interpretable base-resolution deep-learning models for regulatory sequences; (2) Predict and validate cell type-specific effects of regulatory variants on molecular phenotypes and disease; (3) Collaborate with the IGVF Consortium to build nucleotide-level regulatory maps. Our ultimate goal in this project will be to create a nucleotide-resolution cis-regulatory map of the human genome to connect disease variants to functions and phenotypes, in diverse cell types, states, and spatial contexts.
|
1 |