2002 — 2006 |
Stephens, Matthew |
R01 Activity Code Description: To support a discrete, specified, circumscribed project to be performed by the named investigator(s) in an area representing his or her specific interest and competencies. |
Genome Analysis: Data Accuracy, Haplotyping, and Mapping @ University of Washington
DESCRIPTION (provided by applicant): Long-term objective: to develop quantitative methods and software for the interpretation and analysis of human genetic variation. The methods will be tailored to the specific needs of large-scale studies of sequence variation, particularly those attempting to understand the genetic basis of complex diseases. The aim will be to supply scientists involved in such studies with an integrated set of tools to a) monitor and improve data quality, b) design effective studies, and c) perform powerful data analyses, ultimately reducing the cost of developing effective medical treatments for common diseases. Major specific aims: 1. To develop automatic methods for calling genotypes from sequence trace data, and for assigning each genotype call a "quality score", quantifying the probability that the call is correct, allowing data accuracy to be carefully monitored. 2. To extend an existing statistical method for inferring haplotypes from population genotype data to allow it to impute missing genotypes, identify potential genotyping errors, and make it more applicable to data on a larger (genomic) scale. 3. To develop methods to infer recombination rates, and identify potential recombination "hotspots" or "coldspots", from population data (information that will aid in the design of effective mapping studies aiming to locate variants affecting disease susceptibility). 4. To develop methods for linkage disequilibrium mapping that make efficient use of data from many SNP markers simultaneously, thus reducing the costs, and increasing the chances of success, of mapping studies. Aim 1 will be achieved through a statistical analysis of pertinent sequence trace features for analyst-called genotypes. Aims 2-4 will exploit population genetics models that make predictions about patterns of haplotypic variation expected in natural populations, and how patterns of linkage disequilibrium will be affected by variations in local recombination rate.
Computational statistical methods, such as Markov chain Monte Carlo, will be used extensively in implementing these methods. The methods will be tested on real and simulated data. User-friendly software will be developed, documented, distributed and supported.
|
1 |
2006 — 2008 |
Stephens, Matthew |
U01 Activity Code Description: To support a discrete, specified, circumscribed project to be performed by the named investigator(s) in an area representing his or her specific interest and competencies. |
Multipoint and Significance Methods For Genome-Wide Association Studies @ University of Washington
DESCRIPTION (provided by applicant): This project will develop statistical methods for analyzing both genome-wide association studies and studies on multiple candidate genes, where the phenotype of interest is quantitative. The methods will include novel multipoint methods designed to extract the maximum amount of information from the available data, and methods for assessing significance of the results that deal effectively with the large number of multiple comparisons being performed in these large-scale studies. The proposed multipoint approach to association mapping is motivated by the fact that, even with a genome-wide scan of 250,000 SNPs, many SNPs affecting phenotype will be untyped. The idea is to assess whether an untyped SNP affects phenotype by first using surrounding haplotypic variation to predict plausible genotypes at the untyped SNP, and then assessing association between the predicted genotypes and observed phenotypes. The methods for assessing significance will be based on controlling the "False Discovery Rate" (the proportion of positive findings that turn out to be incorrect). The methods will be applied to a genome-wide scan (250,000 SNPs in 1,000 individuals) and candidate gene studies aimed at identifying genetic variants and genes responsible for differential response to statin drugs, and to data from a candidate gene study aimed at identifying genetic variants affecting quantitative phenotypes associated with atherosclerosis, plaque inflammation, and thrombosis, all factors associated with cardiovascular disease. Findings from these studies may aid understanding of the genetic factors affecting cardiovascular disease, and its treatment. In addition, user-friendly software implementing the statistical methods will be developed and distributed, allowing other researchers conducting similar studies worldwide to have access to these tools.
These tools have the potential to improve the effectiveness and efficiency of studies aimed at determining the underlying genetic basis of common diseases, potentially leading to new treatment strategies for maintaining health and preventing disease. Public health relevance: This project will generate statistical tools for analyzing large-scale studies that aim to help understand the genetic basis of common diseases and drug response. These tools have the potential to improve the effectiveness and efficiency of such studies, potentially leading to new treatment strategies for maintaining health and preventing disease.
|
1 |
2008 — 2016 |
Stephens, Matthew |
R01 Activity Code Description: To support a discrete, specified, circumscribed project to be performed by the named investigator(s) in an area representing his or her specific interest and competencies. |
Genome Analysis: Data Accuracy, Haplotyping and Mapping
DESCRIPTION (provided by applicant): This project will develop a comprehensive array of new statistical methods for analyzing genome-wide association studies, and will apply these methods, and other appropriate methods, to perform in-depth analyses of NIH-funded association studies that attempt to unravel the genetic basis of common complex diseases. The overall objective is for the work to produce, and enable others to produce, discoveries and insights that aid the development of medical diagnostic tests, more effective therapies, and, ultimately, prevention of disease. Our focus will be primarily on developing new Bayesian statistical methods, which complement and improve on existing analysis approaches. The specific aims include the refinement of existing Bayesian statistical approaches to assessing correlation between genotype and quantitative phenotype to improve their robustness to deviations from underlying modeling assumptions; extension of these methods to allow analysis of binary (case/control) phenotypes, and family-based designs; and modification of these approaches to incorporate relevant biological prior information (e.g., information on molecular pathways). The result will be a suite of tools, implemented in user-friendly software, for performing both single-marker and multi-marker analyses for many of the most commonly used association study designs, including both quantitative and binary (case/control) phenotypes for population samples and parent-offspring trios.
|
1 |
2009 |
Stephens, Matthew |
P41 |
A Nested Mixture Model For Protein Identification Using Mass Spectrometry @ University of Washington
This subproject is one of many research subprojects utilizing the resources provided by a Center grant funded by NIH/NCRR. The subproject and investigator (PI) may have received primary funding from another NIH source, and thus could be represented in other CRISP entries. The institution listed is for the Center, which is not necessarily the institution for the investigator. Mass spectrometry provides a high-throughput way to identify proteins in biological samples. In a typical experiment, proteins in a sample are first broken into their constituent peptides. The resulting mixture of peptides is then subjected to mass spectrometry, which generates thousands of spectra, each characteristic of its generating peptide. Here we consider the problem of inferring, from these spectra, which proteins and peptides are present in the sample. We develop a statistical approach to the problem, based on a nested mixture model. In contrast to commonly-used two-stage approaches, this model provides a one-stage solution that simultaneously identifies which proteins are present, and which peptides are correctly identified. In this way our model incorporates the evidence feedback between proteins and their constituent peptides. Using simulated data and a yeast dataset, we compare and contrast our method with existing widely-used approaches. For peptide identification, our single-stage approach yields consistently more accurate results. For protein identification the methods have similar accuracy in most settings, although we exhibit some scenarios in which the existing methods perform poorly.
|
1 |
2009 — 2010 |
Stephens, Matthew |
R01 Activity Code Description: To support a discrete, specified, circumscribed project to be performed by the named investigator(s) in an area representing his or her specific interest and competencies. |
Genome Analysis: Data Accuracy, Haplotyping and Mapping
DESCRIPTION (provided by applicant): This project will develop a comprehensive array of new statistical methods for analyzing genome-wide association studies, and will apply these methods, and other appropriate methods, to perform in-depth analyses of NIH-funded association studies that attempt to unravel the genetic basis of common complex diseases. The overall objective is for the work to produce, and enable others to produce, discoveries and insights that aid the development of medical diagnostic tests, more effective therapies, and, ultimately, prevention of disease. Our focus will be primarily on developing new Bayesian statistical methods, which complement and improve on existing analysis approaches. The specific aims include the refinement of existing Bayesian statistical approaches to assessing correlation between genotype and quantitative phenotype to improve their robustness to deviations from underlying modeling assumptions; extension of these methods to allow analysis of binary (case/control) phenotypes, and family-based designs; and modification of these approaches to incorporate relevant biological prior information (e.g., information on molecular pathways). The result will be a suite of tools, implemented in user-friendly software, for performing both single-marker and multi-marker analyses for many of the most commonly used association study designs, including both quantitative and binary (case/control) phenotypes for population samples and parent-offspring trios.
|
1 |
2013 — 2015 |
Stephens, Matthew |
R01 Activity Code Description: To support a discrete, specified, circumscribed project to be performed by the named investigator(s) in an area representing his or her specific interest and competencies. |
Statistical Analysis of Gene Expression Quantitative Trait Loci (eQTL)
DESCRIPTION (provided by applicant): In recent years, studies that associate genetic variation with gene expression (eQTL studies) have become a major tool for identifying regulatory genetic variation. However, the difficulty of securing primary tissue samples means that up to now these eQTL studies have been conducted in a limited range of cell and tissue types. Most notably, the largest studies have been conducted in EBV-transformed lymphoblastoid cell lines, and it is unclear to what extent eQTLs identified in these cell lines will be relevant to human disease mapping. The GTEx Project will provide data to remedy this situation, collecting RNA-seq and genotype data on 30 tissues in hundreds of individuals. However, current analytic tools are limited in their ability to fully exploit the richness of these data. In particular, available methods fall short in their ability to jointly analyze data on all tissues to maximize power, while simultaneously allowing for differences among eQTLs present in each tissue. Here we propose to develop novel statistical methods to help address these issues. We will apply these methods to identify eQTLs in the GTEx project data, integrate the GTEx data with other relevant data such as those available from the ENCODE project, and disseminate the results on the internet in a convenient form. We will also provide researchers with convenient tools to cross-reference results of the GTEx project with results of genome-wide association studies. The overall goal of the project is to build and apply an infrastructure for improved eQTL analyses, helping to maximize the utility and accessibility of GTEx data to the broad community of scientists who would like to use these data.
|
1 |
2019 — 2021 |
Stephens, Matthew |
R01 Activity Code Description: To support a discrete, specified, circumscribed project to be performed by the named investigator(s) in an area representing his or her specific interest and competencies. |
Genome Analysis: Statistical Methods and Applications
Project Summary: In recent years new data and technologies have transformed our understanding of transcriptional processes and how they are influenced by genetic variation. The GTEx project has measured both genetic variation and transcriptional variation in 50 tissues across hundreds of individuals, and identified hundreds of thousands of genetic variants that are associated with gene expression (eQTLs). And technological innovations have now made it possible to interrogate transcription, genome-wide, in single cells. The Human Cell Atlas (HCA) project is currently using such technologies to profile millions of cells, with the ambitious goal of providing a comprehensive atlas of the diverse cell types that make up human bodies. However, current analytic tools are limited in their ability to fully exploit the richness of these data. Current analysis tools for identifying eQTLs across 50 tissues perform well for identifying associations (both tissue-specific effects and those that are broadly shared across tissues) but are not yet designed for fine-mapping the underlying functional variants that explain these association signals. And methods for summarizing and characterizing transcriptional heterogeneity among single cells are not capable of capturing the complex layered character of this heterogeneity: for example, that cells might cluster into different groups depending on which genes or transcriptional processes are considered. Here we propose to develop novel statistical methods to address these issues. We will develop dimension reduction techniques for single cell analysis, aimed at capturing the complex patterns of heterogeneity that existing methods ignore. We will develop statistical tools for reliably assessing the genes and processes that show transcriptional differences among groups of cells.
And we will develop and apply methods to fine-map the functional variants underlying many of the eQTLs in the GTEx project data, fully exploiting the information in the many tissues profiled, and disseminate the results on the internet in a convenient form. The overall goal of the project is to build and apply methods and software to help fully exploit the rich information in projects like GTEx and HCA, and make them available to the broad community of biological and medical scientists who can benefit from the results.
|
1 |