2005 — 2009 |
Sunyaev, Shamil |
R01 Activity Code Description: To support a discrete, specified, circumscribed project to be performed by the named investigator(s) in an area representing his or her specific interest and competencies. |
Approaches to Multiorganismal Comparative Proteomics @ Brigham and Women's Hospital
DESCRIPTION (provided by applicant): Comparative genomics has had great success in revealing evolutionary mechanisms and in predicting the functionality of proteins and non-coding functional regions. Mass spectrometry-based proteomics generates new types of data on the composition and localization of protein complexes and the interactions between complexes and individual proteins. The comparative analysis of these data has great potential as a useful approach to the characterization of the organization of the cell protein machinery. We propose to develop computational and experimental strategies for cross-species proteomic analysis. In Aim 1, we will improve existing, and develop new, computational methods for the homology-based identification of proteins and protein complexes using mass spectrometry data. These methods will not rely on the availability of protein sequences in the current database and, thus, will enable the analysis of multiple organisms currently out of the scope of proteomics. They will also increase the sensitivity of protein complex identifications in organisms with sequenced genomes because of the robustness with respect to incorrect gene predictions, sequencing errors, splicing isoforms and polymorphisms. In Aim 2, we will compare the protein interaction data for available and de novo generated examples of "molecular machines" from different organisms. We will further derive common patterns of evolutionary events in terms of the changes in protein interaction graphs. The observed changes in the composition of protein complexes, in the interactions between complexes and in the individual proteins will be correlated with changes at the level of protein sequence and structure. Gene duplication events will be characterized in terms of the interaction pattern, and we will seek to design approaches that generate meaningful functional predictions for the specific "molecular machines."
In Aim 3, we will develop automated computational methods for the comparative analysis of protein-protein interaction networks from different organisms. The application of these methods to emerging large-scale data on protein-protein interaction networks from multiple species will enable the identification of conserved interaction sets, the understanding of functionality and robustness of "molecular machines" and the characterization of the functional role of individual proteins.
|
0.915 |
2007 — 2011 |
Sunyaev, Shamil |
RL1 |
Mass Spectrometry of Proteins Involved in Organogenesis (5 of 10) @ Brigham and Women's Hospital
NIH Roadmap Initiative tag; mass spectrometry
|
0.915 |
2007 — 2017 |
Sunyaev, Shamil |
R01 Activity Code Description: To support a discrete, specified, circumscribed project to be performed by the named investigator(s) in an area representing his or her specific interest and competencies. |
New Methods and Enhanced Software For Predicting Functional Snps @ Brigham and Women's Hospital
DESCRIPTION (provided by applicant): Single nucleotide polymorphisms (SNPs) comprise the majority of the genetic differences between human individuals. Non-synonymous coding SNPs (nsSNPs), which result in amino acid replacements in protein sequences, together with cis-regulatory SNPs affecting transcription and splicing are thought collectively to account for much of the genetic component of individual variation in susceptibility to complex diseases, response to pharmaceuticals, and other phenotypes. Identification of functional nsSNPs can be facilitated by computational predictions based on the analysis of protein multiple sequence alignments, 3D structures and sequence annotations. This analysis was earlier automated in the computer program PolyPhen, an online tool maintained in our laboratory. Numerous researchers in diverse fields currently use PolyPhen to predict the effect of nsSNPs on protein structure and function. However, there is an increasing need for more accurate computational approaches to improve such predictions and to expand applicability of PolyPhen to all classes of polymorphisms. This proposal focuses on improving the methods incorporated in PolyPhen for predicting the functional effect of SNPs in the human genome, and on transforming PolyPhen into scalable, user-friendly, cross-platform software. The proposal targets three Specific Aims: First, we propose to improve accuracy of PolyPhen by introducing new computational strategies for prediction of the effect of nsSNPs on protein structure and function (Specific Aim 1). Methodological innovations will include development of a multiple sequence alignment pipeline suppressing false predictions arising from misalignments. A new method will eliminate false-negative predictions resulting from compensatory substitutions in homologous sequences.
We will use a structurally optimized Bayesian classifier to predict the functional effect of nsSNPs based on multiple features derived from protein sequence and structure. Next, we propose to extend the prediction method to non-coding SNPs (Specific Aim 2). We plan to take advantage of the extensive comparative genomic data that have been and continue to be generated. We will introduce a computational approach to predict functional SNPs in non-coding regions on the basis of probabilistic evolutionary models. Finally, we plan to incorporate these developments into a new version of the PolyPhen software system, which will address significant demand for a robust, cross-platform tool that can be easily applied by diverse investigators to the problem of functional analysis of human SNPs (Specific Aim 3). This new version of PolyPhen will be incorporated into the Clinical Research Chart developed by I2b2 National Center of Biomedical Computing and integrated with VISTA visualization tools.
|
0.915 |
2008 — 2010 |
Sunyaev, Shamil |
R01 Activity Code Description: To support a discrete, specified, circumscribed project to be performed by the named investigator(s) in an area representing his or her specific interest and competencies. |
Statistical Methods For the Design and Interpretation of Deep Resequencing Studie @ Brigham and Women's Hospital
DESCRIPTION (provided by applicant): The ability to generate sequence data is rapidly becoming a reality. Sequencing efforts are already underway at candidate gene regions surrounding association peaks identified by genome-wide association studies (GWAS), paving the way for "whole-exome" and, ultimately, whole-genome sequencing studies. Comprehensive sequencing has the potential to reveal a vast trove of low frequency variants, but most statistical association methods used for GWAS are likely inadequate because they are targeted towards common variants and have been optimized for identifying associations at a single variant at a time, and therefore, do not account for multiple variants acting at the same locus. For sequencing studies to attain their full potential, the development of new statistical methods will be critical. We propose to develop new methods for both targeted and genome-wide sequencing approaches. In Specific Aim 1 we will develop statistical methods for identifying causal variants inside a targeted region, such as a GWAS peak or candidate gene. DNA sequencing provides a complete picture of genetic variation, enabling the localization of association signal(s) in order to identify true causal alleles against a background of correlated variants due to linkage disequilibrium. We will design statistical strategies for finding causal variants underlying association peaks. We will consider the presence of multiple causal alleles at a locus. In Specific Aim 2 we will develop statistical methods for sequencing studies to optimally capture the association signal arising from multiple rare variants acting within the same disease gene. The initial focus will be on candidate gene sequencing with an eye towards whole-exome and even whole-genome sequencing. Associations of individual rare alleles with disease are difficult to detect because low-frequency alleles have limited power in single-variant association tests. 
We will develop methods combining multiple rare variants from the same gene (or pathway) and treat genes (pathways) rather than individual alleles as the unit for the association test. Recent studies demonstrate that genes underlying certain quantitative phenotypes display an excess of rare coding variation in individuals at one phenotypic extreme. In addition to combining multiple rare variants in a single test, we will also develop methods incorporating both rare and common variants, which will be important when whole-genome sequencing eventually becomes practical. In Specific Aim 3 we will assess the power of both targeted and genome-wide approaches and generate study design recommendations, using a population genetic model based on allele frequency distributions from empirical sequencing data sets. We will make recommendations on sequencing strategies, sample sizes, and inclusion of specific populations. All power calculations and recommendations will critically depend on assumptions about allele frequency distributions, which we will rigorously model using empirical sequence data. Our population genetic model will incorporate complex demographic histories, recombination and natural selection in addition to mutation and genetic drift. RESEARCH NARRATIVE: The study of human genetic variation has already begun to pay big dividends, as genome-wide association studies (GWAS) focusing on common genetic variation have identified risk variants for numerous complex diseases. However, for most diseases the fraction of genetic heritability explained by these findings is extremely small, motivating deep resequencing studies, which will be able to identify rare risk variants. These resequencing studies will require new statistical methods that will have great potential for furthering our understanding of disease etiology, leading to possible drug targets, and may also be useful for diagnostic testing in healthy individuals.
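The gene-level collapsing idea described under Specific Aim 2 above can be illustrated with a minimal sketch. This is not the applicants' method: it is a generic burden-style test, and the function names (`burden_scores`, `permutation_pvalue`), the frequency cutoff of 0.01, and the mean-difference statistic are all assumptions chosen for illustration.

```python
import random

def burden_scores(genotypes, maf, maf_cutoff=0.01):
    """Collapse rare variants in one gene into a per-individual burden.

    genotypes: list of rows, one per individual, holding minor-allele counts (0/1/2).
    maf: minor-allele frequency of each variant (same order as the columns).
    maf_cutoff: hypothetical threshold defining "rare".
    """
    rare = [j for j, f in enumerate(maf) if f < maf_cutoff]
    return [sum(row[j] for j in rare) for row in genotypes]

def permutation_pvalue(scores, is_case, n_perm=2000, seed=0):
    """One-sided permutation test: do cases carry a higher mean burden?"""
    rng = random.Random(seed)

    def mean_difference(labels):
        cases = [s for s, c in zip(scores, labels) if c]
        controls = [s for s, c in zip(scores, labels) if not c]
        return sum(cases) / len(cases) - sum(controls) / len(controls)

    observed = mean_difference(is_case)
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # shuffling keeps the case/control counts fixed
        if mean_difference(labels) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

The gene, rather than any single allele, is the unit being tested here: each individual contributes one number, so variants too rare to test on their own still add to the signal.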
|
0.915 |
2012 — 2016 |
Beier, David R. (co-PI); Goessling, Wolfram; Sunyaev, Shamil |
R01 Activity Code Description: To support a discrete, specified, circumscribed project to be performed by the named investigator(s) in an area representing his or her specific interest and competencies. |
Mutant Mapping and Identification in Zebrafish by Next Generation Sequencing @ Brigham and Women's Hospital
DESCRIPTION (provided by applicant): The power of the zebrafish system stems from its utility as a developmental biology model combined with the ease of its genetic manipulation and experimentation. Our understanding of key genetic mechanisms of vertebrate development has been propelled by the phenotypic characterization, genetic mapping and positional cloning of induced and spontaneous mutations in zebrafish. However, the potential of this system has not been fully realized, as inefficient microsatellite-based mapping remains the primary method in the field. We propose to apply technological and computational advances of present day genomics to genetic mapping in the zebrafish system. Specifically, we propose to develop a method for rapid and accurate mapping of recessive zebrafish mutants using Next Generation Sequencing (NGS) of pooled samples. We also propose to investigate parameters of screen design and sample analysis to optimize the use of this protocol. Finally, we aim to develop methods for identification of the causal mutation among the variants discovered within the mapping interval. Application of NGS technology, complemented by specifically developed computational techniques, will provide an efficient, accurate and inexpensive method for genetic mapping in zebrafish. This approach will enable the simultaneous identification of informative genetic markers, mapping of the mutation position, and potential identification of the causal sequence change in a single experiment. The data obtained in these genomic analyses and the methods developed will be made available to the zebrafish community. Importantly, these approaches will also be widely applicable to genetic analysis of other model systems. PUBLIC HEALTH RELEVANCE: Analysis of zebrafish mutants has enabled the identification of genes contributing to fundamental biological processes, including human diseases; however, the methods used for mutant mapping and gene discovery are inefficient. 
Here, we propose a fast and cost-effective method for genetic mapping and mutation identification using Next Generation Sequencing. This will facilitate the more rapid discovery of gene function and the translation of this knowledge to biomedical investigation.
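As a rough illustration of how pooled sequencing can localize a recessive mutation, the sketch below scans marker allele frequencies measured in a mutant pool and reports the window that deviates most from the 0.5 expected at unlinked loci. It is a simplified assumption-laden example, not the protocol proposed above; the function name `map_interval`, the fixed window size, and the deviation score are hypothetical.

```python
def map_interval(positions, allele_freqs, window=3):
    """Return (start, end) of the marker window most skewed from 0.5.

    positions: marker coordinates along one chromosome, sorted ascending.
    allele_freqs: frequency of the mutant-strain allele at each marker in the pool.
    window: number of consecutive markers per window (illustrative choice).
    """
    if len(positions) != len(allele_freqs) or len(positions) < window:
        raise ValueError("need at least `window` markers with matching frequencies")
    best_start, best_score = 0, -1.0
    for i in range(len(positions) - window + 1):
        # Mean absolute deviation from 0.5: near 0 when unlinked, near 0.5 when linked.
        score = sum(abs(f - 0.5) for f in allele_freqs[i:i + window]) / window
        if score > best_score:
            best_start, best_score = i, score
    return positions[best_start], positions[best_start + window - 1]
```

A real analysis would also have to handle sequencing depth, marker density, and genotyping error, which this sketch ignores.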
|
0.915 |
2013 — 2016 |
Sunyaev, Shamil |
R01 Activity Code Description: To support a discrete, specified, circumscribed project to be performed by the named investigator(s) in an area representing his or her specific interest and competencies. |
Statistical Methods For Studies of Rare Variants @ Brigham and Women's Hospital
DESCRIPTION (provided by applicant): Genome-wide association studies focusing on common variants have explained a fraction of the heritable risk for many complex traits, but for many psychiatric diseases, the majority of heritable risk remains unknown. It is widely believed that rare variants also contribute to disease risk, and we and others have published examples of rare variants that contribute to psychiatric disease. Improvements in technology have now made it possible to generate large comprehensive data sets focusing on rare variants, using exome sequencing as well as the exome chip that we designed. We propose to assess the overall contribution of rare variants to disease heritability, develop statistical tests to localize these signals that are robust to population stratification, and build a map of mutation rates across the human genome for application to analysis of de novo mutations and case-only association tests. We will guide our research using >40,000 samples from psychiatric disease data sets. In Specific Aim 1 we will quantify components of heritability attributable to rare variants. Initial exome sequencing studies in complex traits have had limited success in identifying new disease genes. This leaves the field of genetics at a crossroads. Should even greater resources be invested in sequencing studies with very large sample sizes, or should the focus shift to other approaches? We will explore the idea that even if current sample sizes are not large enough to identify new genes, they are large enough to quantify the components of heritability explained by rare variants. We will develop new methods and apply them to several psychiatric disease data sets. This work will quantify the potential of future sequencing studies in larger sample sizes to identify new disease genes. In Specific Aim 2 we will extend rare variant tests to account for population stratification. 
We and others have developed statistical tests for multiple rare variants, including both burden and over-dispersion tests. These tests can succeed in detecting genes containing multiple associated rare variants, but only if sample sizes are very large. Unfortunately, large sample sizes increase the dangers of false-positive associations due to population stratification. Recent work showing differing patterns of population structure in common versus rare variants highlights the dangers of applying standard approaches using information from common variants. We will develop new methods to effectively correct for population stratification in rare variant tests and perform extensive simulations to demonstrate the efficacy of each approach. In Specific Aim 3 we will build a map of mutation rates across the human genome. We and others have recently shown that de novo mutation screens have a potential to identify genes of interest for neuropsychiatric phenotypes. We will construct a mutation rate map informed by comparative genomics and functional genomics data and will develop new statistical approaches for the analysis of human de novo mutations and their involvement in psychiatric diseases.
|
0.915 |
2014 — 2017 |
Sunyaev, Shamil |
R01 Activity Code Description: To support a discrete, specified, circumscribed project to be performed by the named investigator(s) in an area representing his or her specific interest and competencies. |
Improving Polygenic Prediction Using Next-Generation Data Sets @ Brigham and Women's Hospital
DESCRIPTION (provided by applicant): Understanding the relationship between genotype and phenotype is the central goal of genetics. Available heritability estimates for many human traits of medical relevance suggest that 30-80% of phenotypic variation is due to underlying genetic variation. The ability to predict phenotypes based on genotypes is the ultimate test of our understanding of complex trait genetics. Since the dawn of complex trait genetics in the early 20th century, progress has been limited by the availability of genetic data in well-phenotyped populations. Now, due to the extraordinary progress in technology, microarray genotyping datasets, exome sequencing datasets and targeted sequencing datasets are available for large clinically phenotyped populations, and functional data is becoming available. A future explosion of whole-genome sequencing data is also widely anticipated. This shifts the focus from data acquisition to data interpretation and development of computational and statistical methods for predicting phenotypes from genotypes and functional information. We propose to develop new methods for predicting phenotypes from genotypes and apply these methods to newly collected data on human complex traits of direct medical interest, including both quantitative and disease traits. Our work on phenotype prediction will be informative about the allelic architecture of complex traits and will provide guidance for future genetic studies. From a practical perspective, there is an ongoing debate on the potential of genetic diagnostics in identification of individuals at elevated risk for specific complex diseases early in life. If successful, genetic diagnostics may inform selection of patients for early therapeutic intervention. However, the practical utility of genetics in evaluating risk of complex diseases has not been proven and is widely debated. We will rigorously test the hypothesis of the utility of genotype-based phenotypic predictions. 
In Specific Aim 1 we will develop and test new statistical methods for predicting phenotypes from microarray genotyping data. We will investigate several model selection and shrinkage strategies. We will evaluate whether it is more efficient to estimate contributions of individual markers independently or to fit all markers simultaneously. In Specific Aim 2 we will improve polygenic prediction in populations of diverse ancestry. It is important that medical progress not be limited to European populations. Our methods will generate predictions across human populations, accounting for population differences in allele frequencies, rates of allelic variation and patterns of linkage disequilibrium. In Specific Aim 3 we will develop and test statistical methods for predicting phenotypes from sequencing data. Sequencing data provide a distinct set of statistical challenges because they contain low-frequency and rare allelic variants, and often the effects of individual rare variants cannot be estimated. In Specific Aim 4 we will incorporate functional data into methods for phenotype prediction. We will investigate whether incorporation of functional data can improve phenotype predictions from genetic data.
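The contrast drawn in Specific Aim 1 between estimating markers independently and fitting them jointly can be made concrete with a small sketch of the first option. The code below estimates each marker's effect separately by simple regression, applies one uniform shrinkage factor, and sums the weighted genotypes into a score. The names (`marginal_effect`, `fit_weights`, `polygenic_score`) and the shrinkage value are assumptions for illustration only and do not represent the methods developed under this grant.

```python
def marginal_effect(genotype_column, phenotype):
    """Least-squares slope of phenotype on a single marker's allele counts."""
    n = len(phenotype)
    g_mean = sum(genotype_column) / n
    y_mean = sum(phenotype) / n
    cov = sum((g - g_mean) * (y - y_mean) for g, y in zip(genotype_column, phenotype))
    var = sum((g - g_mean) ** 2 for g in genotype_column)
    # A monomorphic marker carries no information; give it zero weight.
    return cov / var if var > 0 else 0.0

def fit_weights(genotypes, phenotype, shrinkage=0.5):
    """Estimate every marker independently, then shrink all effects toward zero."""
    n_markers = len(genotypes[0])
    return [
        shrinkage * marginal_effect([row[j] for row in genotypes], phenotype)
        for j in range(n_markers)
    ]

def polygenic_score(row, weights):
    """Weighted sum of one individual's allele counts."""
    return sum(w * g for w, g in zip(weights, row))
```

Fitting all markers simultaneously would instead account for correlation between markers (linkage disequilibrium), which independent estimates ignore.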
|
0.915 |
2017 — 2021 |
Sunyaev, Shamil |
R01 Activity Code Description: To support a discrete, specified, circumscribed project to be performed by the named investigator(s) in an area representing his or her specific interest and competencies. |
Rare and Common Variants in Complex Disease @ Brigham and Women's Hospital
Analyses of common and rare genetic variation have produced key biological insights for many complex diseases. However, for most diseases, including psychiatric disease, the bulk of heritability remains unexplained. The genetics community is increasingly focusing on rare variants, motivated by improvements in technology that are enabling the generation of large whole-genome and whole-exome sequencing (WGS and WES) data sets. A growing number of high-profile studies on rare and common variant analysis have been published, including studies published by the PIs of this renewal application and funded by R01MH101244. Nonetheless, there are many unanswered questions about the genetic architecture of complex diseases. Here, we propose a research program that will investigate complex disease architectures and develop methods to optimally leverage rare and common variant contributions to produce new biological discoveries. We will assess contributions to disease heritability across the allele frequency spectrum; identify gene sets and functional annotations that are enriched for disease heritability; and leverage these findings to increase statistical power in studies of rare and common variants while controlling for confounding. Our collaboration has multiple strengths: our statistical and computational expertise; our extensive publication record in the previous funding cycle; our track record of producing practical software that is widely used by the community; and our data-driven approach, which ensures that the methods we develop will be broadly applied to psychiatric and other disease data sets. We will guide our research using hundreds of thousands of samples from large psychiatric GWAS, WES and WGS disease data sets.
|
0.915 |
2018 — 2021 |
Sunyaev, Shamil |
R35 Activity Code Description: To provide long term support to an experienced investigator with an outstanding record of research productivity. This support is intended to encourage investigators to embark on long-term projects of unusual potential. |
The Origin, the Function and the Phenotypic Impact of Human Alleles @ Brigham and Women's Hospital
Genetic variation is the primary source of evolutionary innovation and a major factor responsible for phenotypic variation. Consequently, understanding such variation has great importance in both basic biology and evolution, and ultimately Mendelian and complex disease. We will study the origin of genetic variation through spontaneous mutational processes. Computational analysis of sequencing datasets will shed light on the mechanistic forces underlying germ-line and somatic cancer mutations in humans. We will design new statistical models of de novo mutation that will have applications in population genetics, cancer genomics and genetics of neuropsychiatric disease. Next, we will improve computational methods for interpreting and predicting the effect of mutation on molecular function, including both coding and non-coding variation. Our methods integrate data from evolutionary genetics and biophysics and rely on comparative, functional and structural data. The newly developed methods will have applications in both medical and population genetics. We will study the population dynamics of alleles to estimate the forces that shape genetic variation within populations. We will rely on population genetics models to analyze evolutionary maintenance and genetic architecture of human phenotypes. Fascinated by the relationship between genotype and phenotype, we will combine theoretical models and statistical analysis of large-scale sequencing datasets to infer properties of the allelic architecture of complex traits. We will design new approaches to characterize and predict the genetic component of common disease risk.
|
0.915 |
2021 |
Price, Alkes L; Raychaudhuri, Soumya; Sunyaev, Shamil |
U01 Activity Code Description: To support a discrete, specified, circumscribed project to be performed by the named investigator(s) in an area representing his or her specific interest and competencies. |
Predicting the Impact of Genetic Variants, Genes and Pathways On Human Disease @ Brigham and Women's Hospital
Project Summary: Over the past decade, genome-wide association studies have discovered complex disease-associated genetic variants while at the same time whole genome sequencing studies have been identifying risk alleles for Mendelian and complex diseases. These variants have the potential to shed light on human disease mechanisms. But there are several important challenges. More than 90% of complex disease-associated variants lie within non-coding regions, posing a challenge of identifying relevant cell types and cell states, target genes, and regulatory mechanisms. The important task of linking these variants to genes itself can be challenging. In addition, although our ability to identify de novo and rare mutations for complex and Mendelian diseases is rapidly expanding, the function of those de novo alleles, and which genes and pathways they affect, remains uncertain. To address these challenges, we will predict the functional impact of disease risk variants at the level of individual variants, individual genes, and pathways to elucidate disease biology. In all aims of this proposal we will utilize IGVF functional genomic data. In Aim 1, we will predict the regulatory potential of variants in disease-critical cell types/states at a single base-pair resolution. We will identify pathogenic cell-states by analyzing single cell transcriptional data sets in a disease context, and then integrate single-cell epigenetic data to define the regulatory landscape of these rare disease cell-states. The regulatory regions identified in this analysis can be used to annotate variants for potential function. Finally, to understand functionality of specific variants in regulatory regions, we will quantify selective pressure using large-scale whole genome sequencing data. In Aim 2, we will predict functional impacts of genes by effectively linking variants to genes. Defining causal disease genes is critically important since they may be important for therapeutic targeting.
We will develop strategies to use genetic data and functional genomic data to predict downstream genes, and evaluate these methods with a set of gold-standard causal genes from Mendelian phenotypes. In Aim 3, we will focus on rare and de novo mutations with large effect sizes. Here we recognize that predicting the function of these alleles requires an understanding of the pathways they affect, models to connect rare non-coding variants to genes, and strategies to define functionality of the variants based on population genetic parameters. In Aim 4, we will develop a framework to synergize with the IGVF consortium to advance consortium goals, outlining our integration plan and flexible programmatic framework. The proposal represents a collaboration between Drs. Soumya Raychaudhuri, Alkes Price, and Shamil Sunyaev, bringing analytical expertise across functional genomics, single-cell data integration, and population genetics. These investigators have a history of successful collaborations with a strong publication record integrating functional genomics data with GWAS and sequencing studies to uncover disease mechanisms.
|
0.915 |