2012 — 2013 |
Zhang, Min |
N/A. Activity Code Description: No activity code was retrieved. |
Meeting: Interactions Between Omics and Statistics: Analyzing High Dimensional Data to Be Held At the 8th Intl Purdue Symposium On Statistics June 20-24, 2012, West Lafayette, IN
To meet the challenges brought by the increasing amount of high-throughput data generated in many fields, especially plant genome research, a one-day session entitled "Interactions Between Omics and Statistics: Analyzing High Dimensional Data" will be held as part of the 8th International Purdue Symposium on Statistics, June 20 - 24, 2012. The theme of the overall symposium is "Diversity in the Statistical Sciences for the 21st Century," and the session will be organized as an interactive forum for leading researchers in plant biology and statistics, to bridge the gap between plant biologists and statisticians and to provide new insight into addressing the issues associated with high-dimensional data analysis. The topics of the presentations include, but are not limited to, an overall introduction to high-dimensional omics data, the special statistical challenges of these data, and newly developed statistical methods to analyze such high-dimensional data. In addition, future directions will be discussed to further improve the analysis of high-dimensional omics data. A companion workshop featured in the symposium, entitled "iPlant Data Store and iPlant Discovery Environment," will be organized and presented by the iPlant Collaborative. NSF funds will in part defray the costs for graduate students and postdocs to attend the symposium, session, and workshop as part of their training in the highly interdisciplinary field of statistical genomics.
|
0.961 |
2014 — 2017 |
Zhang, Min; Mukherjee, Bhramar |
N/A. Activity Code Description: No activity code was retrieved. |
Set Based Tests For Genetic Association and Gene-Environment Interaction in Longitudinal Studies @ University of Michigan Ann Arbor
Most human diseases have a multifactorial etiology, characterized by a complex interplay of multiple genes and environmental factors. The effects of these genetic and environmental factors on disease risk are likely to change dynamically over different life stages. Longitudinal studies of risk factors for common and chronic diseases, such as blood pressure and body mass index, provide a valuable opportunity to explore how genetic variants affect these traits over time. The ability to detect disease susceptibility genes can be improved by jointly utilizing the entire set of longitudinal outcomes. Moreover, since disease risk factors and phenotypes are likely influenced by the joint effect of multiple variants in a gene or genomic region, a joint analysis of these variants that accounts for linkage disequilibrium and potential interactions among the variants may help to explain additional heritability. Integrating repeated measures of environmental exposure data into these genetic association models will help to identify specific subgroups of individuals who may be more susceptible to environmental exposures. Identification of gene-environment interactions may have implications for targeted intervention and prevention. In this project, the investigators will utilize the temporally varying outcome-exposure profile available in a longitudinal genetic association study to enhance the power of statistical tests for genetic association and interaction. Using data from a multi-ethnic cohort, the project team will explore time-dependent genetic associations and gene-environment interactions with genes and pathways, rather than a single marker at a given locus, as the unit of analysis. This approach is biologically more meaningful because genes, not single nucleotide polymorphisms, are the functional units, and joint analysis of rare and common genetic variants in a region may come closer to capturing functional variation. Several environmental factors measuring an individual's diet, physical activity, psychosocial behavior, and perception of their neighborhood will be considered in the planned analysis.
There are several technical challenges that will be addressed in the project. A primary goal of the study team will be to develop simple generalized score tests derived under a random field model involving multiple phenotypes, genes, and environmental factors in a longitudinal study. The approach reduces the dimensionality of the inference problem by translating an association test involving many predictors into one with a reduced number of parameters, and hence into tests with reduced degrees of freedom. The developed methods will use and extend classical spatial random field theory and recent results on multi-marker tests to characterize complex time-dependent associations and interactions. Several essential methodological improvements necessary for handling longitudinal data will be carried out to enhance robustness to misspecification of the within-subject correlation structure and to improve computational efficiency. The methods will then be extended to a gene-environment set association test using longitudinal data. Several important dimension reduction techniques to handle correlated environmental exposure data are proposed. The project team also considers the treatment of time-varying exposures and time-varying interaction effects under this set-based framework. There are no multi-marker tests presently available in the literature that use the richness of longitudinal outcome and exposure data, and the current project is expected to fill that gap. To summarize, the project introduces a novel genetic random field framework to formulate this class of multivariable association problems involving disease outcomes, genes, environment, and time, leading to powerful statistical inference.
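The project's own tests are derived under the genetic random field model described above. As an illustrative reference point only, the sketch below implements a much simpler relative of such set-based tests: a variance-component (kernel) score statistic for a single continuous trait, with a Satterthwaite approximation to its null distribution. It is a cross-sectional simplification assuming independent subjects and identity variant weights; it omits the longitudinal, gene-environment, and robustness components of the project, and every function and variable name is hypothetical.

```python
# Minimal sketch of a set-based variance-component score test (SKAT-style),
# a simplified cousin of the random-field tests described above.
import numpy as np
from scipy import stats

def set_based_score_test(y, G, X=None):
    """Test H0: no association between the variant set G and the trait y.

    y : (n,) continuous phenotype
    G : (n, m) genotypes for the m variants in the set (0/1/2 coding)
    X : (n, p) covariates; an intercept is added automatically
    Returns the score statistic Q and a Satterthwaite-approximate p-value.
    """
    n = len(y)
    X = np.ones((n, 1)) if X is None else np.column_stack([np.ones(n), X])

    # Fit the null (covariates-only) model by least squares.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])

    # Variance-component score statistic Q = r' G G' r / sigma^2.
    Q = (resid @ G) @ (G.T @ resid) / sigma2

    # Under H0, Q is a mixture of chi-squares; approximate it by a scaled
    # chi-square matched to its first two moments (Satterthwaite).
    P = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)  # residual projection
    A = G.T @ P @ G
    e1, e2 = np.trace(A), np.trace(A @ A)
    scale, df = e2 / e1, e1**2 / e2
    return Q, stats.chi2.sf(Q / scale, df)

# Toy data: 500 subjects, a 10-variant set, 2 covariates, no true signal.
rng = np.random.default_rng(0)
G = rng.binomial(2, 0.3, size=(500, 10)).astype(float)
X = rng.normal(size=(500, 2))
y = X @ np.array([0.5, -0.2]) + rng.normal(size=500)
print(set_based_score_test(y, G, X))
```

In this simplified form, a large Q means the residual trait variation aligns with genetic similarity within the variant set; the random field formulation described in the abstract builds the same intuition into a model with repeated measures and time-varying exposures.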
|
0.951 |
2015 — 2017 |
Zhang, Min |
R25. Activity Code Description: For support to develop and/or implement a program as it relates to a category in one or more of the areas of education, information, training, technical assistance, coordination, or evaluation.
Big Data Training For Translational Omics Research
DESCRIPTION (provided by applicant): The explosion of biomedical big data (e.g., imaging, clinical records, and omic analyses) that captures multiple levels of complexity has the potential to dramatically accelerate the translation of knowledge from bench to bedside. However, the effective use of these data requires skills in computer science, statistics, and bioinformatics, as well as detailed knowledge of biology and medicine to aid in the interpretation of the data analysis. Unfortunately, biomedical researchers are not trained in the computational and statistical methods needed to handle high-density biomedical big data. As a result, many biomedical scientists are frustrated by their inability to: (a) analyze big data, (b) utilize the valuable public resources containing big data, and (c) effectively communicate with computer scientists, statisticians, and bioinformaticians. These barriers have significantly hampered the translational application of the large body of big data that has accumulated thus far. To overcome these challenges, this team proposes to create a summer training course that is built upon case studies and is specifically designed for biomedical researchers who are novices in big data analysis. The investigators identified the need for this course in a survey of administrators and researchers at Midwest and Big Ten universities. This course will raise awareness of the potential uses of biomedical big data and will develop skills for locating, accessing, managing, visualizing, analyzing, and integrating various types of publicly available big data. The proposed big data training program has three goals: (1) introduce the fundamental concepts of big data in biomedical research to raise awareness of the value of this research approach, (2) provide face-to-face instruction that develops the technical competency needed for big data science, and (3) develop educational and data analysis resources using the HUBzero platform to aid the face-to-face instruction and provide post-instruction opportunities for reinforcing and expanding technical skills. The course will exploit available big data resources and tools so that biologists can productively explore big data within a short time. The educational program will target graduate students, postdoctoral trainees, physician-scientists, and biomedical scientists who have strong biomedical backgrounds but limited advanced coursework in statistics, bioinformatics, and computer science. This course will be centered at Purdue University, a large public university with recognized strengths in statistics and computer science, with the goal of serving scientists in the Midwest. In addition, the HUBzero platform, a unique technology developed at Purdue, will be used to house computational tools, deliver the educational program, and lower the technical barriers that challenge participants. This approach will complement the classical curricula in biomedical training programs and serve as a foundation for more advanced training. The proposed course is directly responsive to RFA-HG-14-008 because it will enable biomedical researchers to more confidently explore existing biomedical big data, implement their own data collection and analysis plans, and communicate within research teams.
|
0.961 |
2016 |
Zhang, Min |
R25. Activity Code Description: For support to develop and/or implement a program as it relates to a category in one or more of the areas of education, information, training, technical assistance, coordination, or evaluation.
Administrative Supplement to: Big Data Training For Translational Omics Research
DESCRIPTION (provided by applicant): The explosion of biomedical big data (e.g., imaging, clinical records, and omic analyses) that captures multiple levels of complexity has the potential to dramatically accelerate the translation of knowledge from bench to bedside. However, the effective use of these data requires skills in computer science, statistics, and bioinformatics, as well as detailed knowledge of biology and medicine to aid in the interpretation of the data analysis. Unfortunately, biomedical researchers are not trained in the computational and statistical methods needed to handle high-density biomedical big data. As a result, many biomedical scientists are frustrated by their inability to: (a) analyze big data, (b) utilize the valuable public resources containing big data, and (c) effectively communicate with computer scientists, statisticians, and bioinformaticians. These barriers have significantly hampered the translational application of the large body of big data that has accumulated thus far. To overcome these challenges, this team proposes to create a summer training course that is built upon case studies and is specifically designed for biomedical researchers who are novices in big data analysis. The investigators identified the need for this course in a survey of administrators and researchers at Midwest and Big Ten universities. This course will raise awareness of the potential uses of biomedical big data and will develop skills for locating, accessing, managing, visualizing, analyzing, and integrating various types of publicly available big data. The proposed big data training program has three goals: (1) introduce the fundamental concepts of big data in biomedical research to raise awareness of the value of this research approach, (2) provide face-to-face instruction that develops the technical competency needed for big data science, and (3) develop educational and data analysis resources using the HUBzero platform to aid the face-to-face instruction and provide post-instruction opportunities for reinforcing and expanding technical skills. The course will exploit available big data resources and tools so that biologists can productively explore big data within a short time. The educational program will target graduate students, postdoctoral trainees, physician-scientists, and biomedical scientists who have strong biomedical backgrounds but limited advanced coursework in statistics, bioinformatics, and computer science. This course will be centered at Purdue University, a large public university with recognized strengths in statistics and computer science, with the goal of serving scientists in the Midwest. In addition, the HUBzero platform, a unique technology developed at Purdue, will be used to house computational tools, deliver the educational program, and lower the technical barriers that challenge participants. This approach will complement the classical curricula in biomedical training programs and serve as a foundation for more advanced training. The proposed course is directly responsive to RFA-HG-14-008 because it will enable biomedical researchers to more confidently explore existing biomedical big data, implement their own data collection and analysis plans, and communicate within research teams.
|
0.961 |
2016 |
Zhang, Min |
R03. Activity Code Description: To provide research support specifically limited in time and amount for studies in categorical program areas. Small grants provide flexibility for initiating studies which are generally for preliminary short-term projects and are non-renewable.
New Statistical Methods to Model Metabolite Profiles For Disease Detection
PROJECT SUMMARY Increasingly available metabolomics data enable a greater understanding of metabolite changes in response to physiological or disease processes. Recent developments have proven metabolomics to be a valuable technology for significantly advancing medical research by accelerating the translation of knowledge from bench to bedside. However, the effective use of these data requires expertise from both metabolomics and statistics, because a series of data pre-processing steps precedes statistical analysis, such as data conversion, data scaling, data normalization, peak alignment, and metabolite annotation, among many others. Despite the promise of metabolomics in the clinic, there are well documented challenges that limit its full potential, such as the identification of metabolite biomarkers, the validation of metabolite biomarkers, and metabolite-based disease prediction or progression modeling. These barriers have significantly hampered the application of metabolomics to clinical and translational research. To overcome these challenges, our team proposes to develop a series of multivariate statistical methods that are specifically designed for metabolomics data analysis. More specifically, instead of investigating one metabolite at a time, a group of biologically related metabolites will be modeled simultaneously. Meanwhile, other clinical covariates (such as gender, age, and BMI) will be evaluated for their effects on the metabolites. The proposed project has three main goals: (1) introduce the new idea of using a group of metabolites as potential biomarkers for diseases; by incorporating biological knowledge in grouping correlated metabolites, we propose to employ the seemingly unrelated regression model to investigate the relationship between a group of metabolites and disease status while adjusting for the effects of other clinical covariates; (2) construct metabolic networks to better understand the systematic perturbations that accompany human diseases, where the networks can serve as more robust biomarkers for disease diagnostics; and (3) advocate disease prediction based on the combination of metabolite profiles, clinical covariates, and their interactions. A direct modeling approach, generalized orthogonal components regression, is proposed to handle the large number of metabolites relative to the small number of individuals. The utility of the methods will be evaluated extensively through simulation studies and real data collected from different diseases, including publicly available data as well as in-house data from our ongoing cancer care engineering project. With all these data, the methods will be compared to the most popular method, partial least squares discriminant analysis. The proposed statistical methods will be made freely available to the research community through GitHub, cceHUB, the Metabolomics Consortium Data Repository, and the Metabolomics Workbench. The project is directly responsive to RFA-RM-15-021 because it will foster close collaboration between metabolomics experts and biostatisticians, produce efficient and reliable statistical methods that can be used to maximize the value of existing metabolomics resources, and enable the promise of metabolomics in the early diagnosis of common complex diseases.
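The proposal's grouped-metabolite modeling relies on seemingly unrelated regression and generalized orthogonal components regression. As a rough, hypothetical illustration of the underlying idea, treating a biologically related group of metabolites as the unit of analysis, the sketch below jointly tests a disease effect on a metabolite block while adjusting for clinical covariates, using ordinary multivariate regression and Wilks' lambda rather than the project's methods; all data and names are simulated.

```python
# Minimal sketch: joint test of a disease effect on a group of correlated
# metabolites, adjusting for clinical covariates (multivariate regression +
# Wilks' lambda). Illustrative only; not the proposed SUR/GOCRE methodology.
import numpy as np
from scipy import stats

def metabolite_group_test(M, disease, covars):
    """M: (n, q) metabolite group; disease: (n,) 0/1; covars: (n, p) covariates."""
    n, q = M.shape
    X_full = np.column_stack([np.ones(n), covars, disease])
    X_null = np.column_stack([np.ones(n), covars])

    def resid_sscp(X):
        B, *_ = np.linalg.lstsq(X, M, rcond=None)
        R = M - X @ B
        return R.T @ R

    E = resid_sscp(X_full)                 # residual SSCP, full model
    E0 = resid_sscp(X_null)                # residual SSCP, without disease
    wilks = np.linalg.det(E) / np.linalg.det(E0)

    # With a single hypothesis degree of freedom, Wilks' lambda has an exact F.
    df_e = n - X_full.shape[1]
    F = (1 - wilks) / wilks * (df_e - q + 1) / q
    return wilks, stats.f.sf(F, q, df_e - q + 1)

# Toy data: 8 correlated metabolites, 200 subjects, a modest disease shift.
rng = np.random.default_rng(1)
n, q = 200, 8
covars = rng.normal(size=(n, 2))
disease = rng.binomial(1, 0.4, size=n)
M = (rng.multivariate_normal(np.zeros(q), 0.5 * np.eye(q) + 0.5, size=n)
     + 0.3 * disease[:, None] + 0.2 * covars @ rng.normal(size=(2, q)))
print(metabolite_group_test(M, disease, covars))
```

Modeling the block jointly borrows strength across correlated metabolites and yields a single group-level p-value, which is the same motivation behind using seemingly unrelated regression in the proposal.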
|
0.961 |
2019 — 2021 |
Zhang, Min |
R25. Activity Code Description: For support to develop and/or implement a program as it relates to a category in one or more of the areas of education, information, training, technical assistance, coordination, or evaluation.
Big Data Training For Cancer Research
PROJECT SUMMARY The increasing volume of big data in cancer research has the potential to dramatically accelerate the translation of knowledge from bench to bedside. Unfortunately, most cancer researchers are unable to: (i) utilize the valuable big data that are readily available in the public domain, and (ii) extract knowledge from cancer big data through communication with computer scientists, statisticians, and bioinformaticians. Traditionally, cancer researchers are trained in the biologically related sciences that are relevant to the manifestation of the disease. This knowledge is, and remains, critical for understanding the biological and molecular mechanisms that result in the disease and that can be targeted for clinical intervention. Historically, however, cancer researchers have not been trained to handle large volumes of data. There was no need; few approaches generated large-scale data. Yet, with the advent of high-throughput approaches, in particular those related to genomics, proteomics, and metabolomics, a significant gap in the training of cancer researchers has become apparent: the need for skills in computer science and statistics to analyze big data and interpret the results of the analyses. In the absence of quantitative training for cancer researchers, a bottleneck will remain in the translation of the large body of cancer big data to clinical practice. This need was confirmed in a needs assessment of researchers from 95 Cancer Centers conducted last year (including all 69 NCI-Designated Cancer Centers). To address the need for a big data training course, the investigators propose to build on a previously NIH-funded big data training course to develop and deliver a new training course tailored to cancer researchers across the country. Through a partnership between the Purdue University Center for Cancer Research (PCCR), the Indiana University Simon Cancer Center (IUSCC), and a group of traditionally trained biostatisticians, the team is in a unique position to leverage basic and clinical cancer centers (the only two NCI-Designated Cancer Centers in the State) to work together on this multi-disciplinary training program. In contrast to the previous successful big data training course designed for general biomedical researchers who were novices in big data science, this new course will target cancer researchers who recognize the value of big data but lack the quantitative skills necessary to work with it. Based on case studies from both PCCR and IUSCC researchers, the goal of the course is to help participants develop skills for managing, visualizing, analyzing, and integrating various types of cancer big data that are publicly available. This is increasingly important as more and more precision oncology-focused treatments come online. With this customized big data training, cancer researchers can realize the transformative potential of big data by translating it from bench to bedside.
|
0.961 |
2020 — 2021 |
Raftery, Daniel; Zhang, Dabao (co-PI); Zhang, Min |
R01. Activity Code Description: To support a discrete, specified, circumscribed project to be performed by the named investigator(s) in an area representing his or her specific interest and competencies.
Modeling Homeostasis of Human Blood Metabolites @ University of Washington
PROJECT SUMMARY Metabolite levels in human blood are regulated by a relatively strict system of homeostatic control. Previous investigations of homeostasis have taken a number of approaches, and models of glucose and a few other metabolites have been developed, typically focused on a single organ. However, an accurate and quantitative model of blood metabolite levels under homeostasis, while potentially extremely useful, does not currently exist. It is well known that numerous demographic and clinical factors such as gender, age, BMI, and smoking, as well as pre-analytical factors and many diseases, significantly affect the levels of blood metabolites. Numerous studies in the field of metabolomics have attempted to account for the effects of many such factors. However, efforts to quantify these effects and validate them across different studies have so far been challenging and have resulted in consistent failures to validate putative biomarkers. The challenges of integrating metabolite profiles with clinical and demographic factors are compounded by the high dimensionality of the data and the numerous correlations among the metabolites. Traditional statistical methods are incapable of accounting for these factors, and hence investigations suffer from a high false discovery rate (FDR). To overcome these challenges, we propose to develop quantitative statistical models of blood metabolite levels in healthy adults, and thereby produce a predictive model of homeostasis. Our preliminary work indicates that we can predict metabolite levels with much reduced variance using the reproducibly measured levels of a large pool of blood metabolites together with clinical and demographic variables. We propose to develop sophisticated models of homeostasis based on advanced statistical methods and evaluate their predictive performance across different sample sets and metabolite classes. The proposed project has four main aims: (1) Obtain broad-based metabolomics data on blood samples collected from geographically distinct sites to explore the effects of a range of confounding factors on metabolite levels. (2) Model individual metabolites or biologically related groups of metabolites using multivariate statistical approaches to determine the contribution of clinical/demographic and pre-analytical variables and their predictability across collection sites. (3) Investigate the interactions between metabolites and clinical/demographic variables using machine learning approaches to identify stable metabolites and key interactions. (4) Provide the community with user-friendly software packages for the prediction of blood metabolite levels under homeostasis. An overall model of metabolite concentrations in blood will be highly useful for a number of applications, including a better understanding of systems biology at the whole-organism level and, ultimately, improved risk prediction, disease diagnosis, treatment monitoring, and outcomes analysis.
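As a rough illustration of the preliminary observation cited above, that metabolite levels can be predicted with much reduced variance from a large pool of other blood metabolites plus clinical and demographic variables, the sketch below predicts each metabolite in a simulated panel from the rest using cross-validated ridge regression. The data, variable names, and the choice of ridge regression are assumptions made purely for illustration and are not the project's models.

```python
# Minimal sketch: how predictable is each metabolite from the other metabolites
# plus clinical/demographic variables? (Simulated data, ridge regression.)
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n, q = 300, 20                                    # subjects, metabolites
clinical = rng.normal(size=(n, 3))                # e.g., age, BMI, a lifestyle score
latent = rng.normal(size=(n, 4))                  # shared physiological drivers
metabolites = (latent @ rng.normal(size=(4, q))
               + 0.3 * clinical @ rng.normal(size=(3, q))
               + 0.5 * rng.normal(size=(n, q)))

# Predict metabolite j from all remaining metabolites and the clinical variables.
r2 = []
for j in range(q):
    y = metabolites[:, j]
    X = np.column_stack([np.delete(metabolites, j, axis=1), clinical])
    model = RidgeCV(alphas=np.logspace(-2, 3, 20))
    r2.append(cross_val_score(model, X, y, cv=5, scoring="r2").mean())

print("median cross-validated R^2 across metabolites:", round(float(np.median(r2)), 2))
```

A high cross-validated R^2 marks a metabolite as tightly coupled to the rest of the homeostatic system, while a low value flags metabolites whose levels are driven largely by factors outside the measured pool.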
|
0.955 |