2011 — 2014 |
Engelhardt, Barbara Elizabeth |
K99 Activity Code Description: To support the initial phase of a Career/Research Transition award program that provides 1-2 years of mentored support for highly motivated, advanced postdoctoral research scientists. R00 Activity Code Description: To support the second phase of a Career/Research Transition award program that provides 1-3 years of independent research support (R00) contingent on securing an independent research position. Award recipients will be expected to compete successfully for independent R01 support from the NIH during the R00 research transition award period. |
Statistical Models to Investigate Long-Distance QTL Transcription Regulation
DESCRIPTION (provided by applicant): Thousands of genome-wide association studies link specific diseases or complex phenotypes to single mutations in the human genome. But translating these results into medical treatments requires a precise understanding of how each mutation contributes to the mechanism of disease. Currently, what is known about the regulatory role of single nucleotide polymorphisms (SNPs) is, for the most part, confined to local, or cis-, expression quantitative trait loci (eQTLs) in a small number of human tissues. But not all diseases or complex phenotypes are mediated by cis-eQTLs. Very few long-distance, or trans-, eQTLs have been identified and validated in human tissues, although trans-eQTLs play an important role in some complex phenotypes. Alternative splicing has also been shown to modulate certain phenotypes; however, little is known about SNPs that regulate alternative splicing. The proposed K99/R00 research seeks to design statistical methods that build gene and transcript networks to identify SNPs that regulate gene and mRNA isoform transcription, both locally and over long distances, and to validate those findings, in order to provide insight into the mechanisms of complex phenotypes and disease. We propose to leverage the cis-eQTLs and human gene expression data from our current work to build precise, directed gene networks on a genome-wide scale. We will build these networks using Bayesian statistical models that compute the probability of a particular network jointly with respect to every gene in the network, with associated eQTLs providing information about whether regulated genes are upstream or downstream of other network genes. We will use Markov chain Monte Carlo and linear programming relaxation methods, which have been shown to find near-optimal solutions to this type of problem.
We will use these networks to identify trans-eQTLs and to quantify the effect of each trans-eQTL in a particular process, using Bayesian statistical tests developed in our lab. Subsequently, we propose to exploit novel RNA sequencing techniques and nonparametric statistical models to identify the transcript isoforms of each transcribed gene and, simultaneously, individual-specific transcript levels, by extending sparse factor analysis models. By extending our eQTL identification methods, this will enable us to identify QTLs that regulate the transcription of specific transcript isoforms (tQTLs) via alternative splicing events. We will then use the methodology we developed for eQTLs to build networks of transcript isoforms (transcript networks). Finally, we will use transcript networks to identify and quantify tQTLs that regulate individual-specific levels of transcript isoforms both locally and over long genetic distances, as with eQTLs. We will make all of our methods and results publicly available. PUBLIC HEALTH RELEVANCE: Thousands of genome-wide association studies link specific diseases or complex traits to single mutations in the human genome, but these results cannot yet be translated into medical treatments, because knowing that a mutation is associated with a disease does not, in itself, give us insight into how that mutation contributes to the mechanism of disease. Our proposed research will design and validate statistical methods that provide a comprehensive road map for understanding the biological role of the mutations identified in these association studies. With the roles of thousands of possibly disease-related mutations in hand, researchers can begin to piece together the mechanism of a disease and translate their findings into treatments much more quickly.
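The abstract's idea that an eQTL can tell us whether a gene sits upstream or downstream of another network gene can be illustrated with a much simpler test than the proposed Bayesian network models. The sketch below is a minimal illustration under stated assumptions, not the method described above: it simulates a hypothetical linear-Gaussian chain SNP → gene A → gene B (uniform genotype dosages, made-up effect sizes) and compares the marginal correlation between the SNP and gene B with partial correlations. If A is upstream of B, conditioning on A should remove most of the SNP's association with B, while conditioning on B should not remove its association with A. All function names are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y, linearly controlling for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def simulate_chain(n, seed=0):
    """Hypothetical linear-Gaussian chain: SNP -> gene A -> gene B."""
    rng = random.Random(seed)
    snp, gene_a, gene_b = [], [], []
    for _ in range(n):
        g = rng.choice([0, 1, 2])        # genotype dosage, uniform for simplicity
        a = 0.8 * g + rng.gauss(0, 1)    # assumed cis-eQTL effect of the SNP on A
        b = 0.9 * a + rng.gauss(0, 1)    # A regulates B; the SNP reaches B only via A
        snp.append(g)
        gene_a.append(a)
        gene_b.append(b)
    return snp, gene_a, gene_b


snp, gene_a, gene_b = simulate_chain(5000)
marginal = pearson(snp, gene_b)                  # expected to be clearly nonzero
conditional = partial_corr(snp, gene_b, gene_a)  # expected to be near zero
reverse = partial_corr(snp, gene_a, gene_b)      # expected to stay clearly nonzero
print(f"corr(SNP, B)     = {marginal:.3f}")
print(f"corr(SNP, B | A) = {conditional:.3f}")
print(f"corr(SNP, A | B) = {reverse:.3f}")
```

Real data would call for population genotype frequencies, covariates, and a proper significance test; the point of the example is only the asymmetry between the two partial correlations, which is what lets an eQTL orient an edge.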
|
1.009 |
2013 — 2016 |
Brown, Christopher David; Engelhardt, Barbara Elizabeth |
R01 Activity Code Description: To support a discrete, specified, circumscribed project to be performed by the named investigator(s) in an area representing his or her specific interest and competencies. |
Identification and Validation of Cell-Specific eQTLs by Bayesian Modeling @ University of Pennsylvania
DESCRIPTION (provided by applicant): Project Summary: Non-coding single nucleotide polymorphisms (SNPs) account for over 85% of the genotype-phenotype associations identified in genome-wide association studies (GWAS), yet we understand almost nothing about their functional mechanisms. Numerous lines of evidence demonstrate that regulatory SNPs play causal roles in many complex human phenotypes. GWAS associations are enriched for variants associated with gene expression levels (eQTLs) and for variants within cis-regulatory elements (CREs). Because eQTLs and CREs are often functional in only a subset of cell types, and because a particular cell type is often of interest for a given disease, it is critical that analyses of GWAS-eQTL overlap consider cell specificity. Our long-term research objective is to determine, for every non-coding SNP, whether it is functional in a particular cell type and, if so, the specific mechanism by which it functions. To reach this goal, we need a large set of cell-specific, causal functional SNPs from which we can begin to generalize. The results from current eQTL studies are typically insufficient because they are not always relevant to the cell type of interest, they identify tag SNPs instead of the causal SNP, and they do not integrate CREs. Our objectives in this proposal are to develop statistical models to identify, quantify, and functionally interpret cell-specific eQTLs in cis and trans, and to experimentally validate causal variant predictions using novel massively parallel CRE reporter assays. In Aim 1, we will develop multivariate Bayesian regression models that will improve power for eQTL detection, improve the interpretability of eQTL cell specificity, and identify the CREs through which each SNP functions. In Aim 2, we will develop structured sparse latent factor models to identify cell-specific gene coexpression modules that will be used to identify trans-eQTLs while simultaneously controlling for hidden confounding variables.
In Aim 3, we will develop and apply massively parallel CRE reporter assays to validate thousands of predicted causal variants that underlie eQTL associations. With such a large collection of cell-specific causal expression quantitative trait nucleotides (eQTNs) and CREs in hand, we hope to mechanistically interpret GWAS associations, identify cancer-causing somatic mutations, and specify novel drug targets for human disease.
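Aim 1's notion of eQTL cell specificity can be illustrated, in its simplest frequentist form, by running a separate eQTL regression of expression on genotype dosage in each cell type. The sketch below is an assumption-laden stand-in, not the multivariate Bayesian regression model the proposal describes: it simulates hypothetical data in which a SNP has an effect in cell type "A" but none in cell type "B", and reports the least-squares slope and its t-statistic for each. Function names, effect sizes, and sample sizes are invented for illustration.

```python
import math
import random


def ols_slope_t(genotypes, expression):
    """Fit expression ~ intercept + slope * genotype by least squares.

    Returns the slope and its t-statistic (slope / standard error).
    """
    n = len(genotypes)
    mx = sum(genotypes) / n
    my = sum(expression) / n
    sxx = sum((g - mx) ** 2 for g in genotypes)
    sxy = sum((g - mx) * (e - my) for g, e in zip(genotypes, expression))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((e - intercept - slope * g) ** 2 for g, e in zip(genotypes, expression))
    se = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    return slope, slope / se


def simulate_cell_type(effect, n=400, seed=0):
    """Hypothetical data: expression = effect * genotype dosage + N(0, 1) noise."""
    rng = random.Random(seed)
    genotypes = [rng.choice([0, 1, 2]) for _ in range(n)]
    expression = [effect * g + rng.gauss(0, 1) for g in genotypes]
    return genotypes, expression


# Assumed scenario: the SNP is an eQTL in cell type "A" but not in cell type "B".
results = {}
for cell_type, effect, seed in [("A", 0.5, 1), ("B", 0.0, 2)]:
    genotypes, expression = simulate_cell_type(effect, seed=seed)
    results[cell_type] = ols_slope_t(genotypes, expression)

for cell_type, (slope, t_stat) in results.items():
    print(f"cell type {cell_type}: slope = {slope:.2f}, t = {t_stat:.1f}")
```

Fitting each cell type independently, as here, wastes power when effects are shared; a multivariate model of the kind proposed would typically share information across cell types instead.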
|
0.979 |
2017 — 2020 |
Brown, Christopher David; Engelhardt, Barbara Elizabeth |
R01 Activity Code Description: To support a discrete, specified, circumscribed project to be performed by the named investigator(s) in an area representing his or her specific interest and competencies. |
Epigenetic Fine-Mapping of Cardiometabolic Disease Loci in the Human Liver @ University of Pennsylvania
Summary: Cardiovascular disease (CVD) is the leading cause of mortality in the world: an estimated 17,500,000 people worldwide died from CVD-related illness in 2012. While disease-altering therapies such as statins have had a tremendous health impact, many individuals are unresponsive to treatment or go undiagnosed until a fatal event occurs. Moreover, while clinical risk factors and family history are significantly predictive of CVD risk, risk prediction and early clinical intervention must be improved to reduce the lethality of the disease. Scientific studies have uncovered common genetic variation at more than 182 separate genetic loci that contributes to variability in CVD, coronary artery disease (CAD), and myocardial infarction (MI) risk, and in associated metabolites including blood lipids. However, several critical limitations have restricted the translational impact of these findings on clinical medicine. Importantly, while we know that temporal, genomic, and cellular context varies dramatically across individuals, current GWAS assume a static context across all samples. Indeed, it is precisely this dynamic context that will shed light on how a specific genetic variant affects molecular traits, which, in turn, modulate disease risk. Furthermore, a primary tissue involved in CVD is the human liver, which has been hard to phenotype deeply because liver samples are difficult to acquire. In this proposal, the PIs will address these critical limitations by creating a deep molecular phenotype map of 200 human liver biopsy samples, by developing essential statistical tools to predict the genomic regulatory signals in these rich liver data, and by using these predictions to drive experimental validation of regulatory signals through reporter assays and genome editing, in order to study the mechanisms of the genetic regulation of CVD risk.
In Aim 1, in collaboration with two transplant surgeons at Penn, the PIs propose to build a comprehensive map of the genetic and epigenetic traits of 200 human liver biopsy samples. In Aim 2, the PIs propose to develop statistical methods that identify regulatory genetic variants, using a paired-sample design to share strength across the multiple epigenetic traits. While data from studies of this type are currently rare, we anticipate substantial growth in such studies and broad use of our analytic approaches. In Aim 3, the PIs propose to develop experimental methods to validate the mechanisms by which functional SNPs affect CVD risk. In particular, we will develop massively parallel CRE reporter assays and genome engineering in iPSC-derived hepatocytes to characterize the precise mechanism of multiple CVD risk variants. Throughout this proposal, the PIs will develop, evaluate, and make public new analytic tools that take advantage of many-core computing environments, and will make publicly available all of the genetic and epigenetic data generated from the liver samples.
|
0.979 |
2019 — 2022 |
Ramadge, Peter; Adams, Ryan (co-PI); Vonholdt, Bridgett; Engelhardt, Barbara; Mittal, Prateek (co-PI) |
N/A Activity Code Description: No activity code was retrieved. |
MRI Acquisition of a High Performance Large Memory Computing Cluster for Large Scale Data-Driven Research
This project will acquire a state-of-the-art High Performance Computing (HPC) cluster to support large-scale, data-driven research. The instrument will support a variety of projects from computer science, electrical engineering, ecology, evolutionary biology, neuroscience and genomics. In neuroscience, the cluster will allow the use of advanced statistical techniques at scale to identify and connect anatomical and functional brain-imaging features of diseased and healthy subjects with specific underlying genetic profiles. In computer science, using machine learning algorithms deployed on the instrument, researchers will seek new ways to protect the security and privacy of users in large-scale networked systems. Finally, the cluster will also enable research that will improve our understanding of evolutionary history and the molecular complexities of traits through the analysis of multi-animal, large-scale genomic datasets. In addition, through short courses and multiday boot camps, the instrument will provide valuable opportunities for training postdoctoral fellows, graduate students, and advanced undergraduates in large-scale computational data science. The instrument will also be a valuable asset for certificate programs in statistics and machine learning (one for undergraduate students, the other for graduate students) and for a certificate program in computational science, all of which will support broadening participation of groups underrepresented in STEM. The research and training enabled by the instrument are expected to help improve our understanding of human health and well-being, help create new knowledge that will aid economic competitiveness, and help maintain the country's leadership in science and engineering.
The computing cluster will consist of nodes with very large memory. The system complements the institution's investments in research cyberinfrastructure and will be managed by the Princeton Institute for Computational Science and Engineering (PICSciE) and the Office of Information Technology (OIT). The instrument will initially be used by five research groups that are part of the Center for Statistics and Machine Learning (CSML), which will leverage existing programs and partnerships to increase participation in data science. The initial five projects are united under a common theme: machine learning will be used to analyze big data sets that may not be easily broken into smaller pieces for processing. Specifically, they will examine the following: 1) the use of probabilistic models for large-scale scientific analysis and de novo design in application areas such as mechanical metamaterials and mixed-signal circuit development; 2) statistical machine learning in genomics, biomedicine, and health biostatistics, including the analysis of hospital records to help doctors take early action to improve patient outcomes, the heritability of neuropsychiatric diseases and drug responses, and statistical and experimental examination of cardiovascular disease risk; 3) security and privacy challenges in networked systems, using machine learning techniques to detect and isolate attackers in networked systems such as social media; 4) large-scale machine learning for neuroscience, such as joint analysis of many large-scale, multi-subject fMRI datasets where the size and number of the datasets; 5) evolutionary genomic and epigenomic analyses through the collection and analysis of large datasets to investigate the evolutionary history and molecular complexities of traits. Collectively, these research groups comprise forty graduate students and ten postdocs, and host on average thirteen undergraduate research projects per year.
The instrument will also be used by other researchers engaged in large-scale, data-driven research across a wide variety of disciplines. Hence both the capacity and the capability aspects of the proposed instrument will be highly utilized and will enable the continued advancement of research at the University.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
|
0.915 |