2001 — 2006
Portnoy, Stephen (co-PI); He, Xuming; Martinsek, Adam
Developments On Quantile Regression @ University of Illinois At Urbana-Champaign
Francis Galton, the progenitor of modern regression, chided those of his statistical colleagues who "limit their inquiries to Averages and do not revel in more comprehensive views". Arguing that any complete analysis of the full variety of experience requires the entire distribution of a trait, not just a measure of its central tendency, he introduced the empirical quantile function as a convenient graphical device for this purpose. Unfortunately, the very success of least squares methods throughout applied statistics has obscured the need for a more complete analysis of the statistical relationships among variables. Least squares regression limits its inquiry to the conditional mean function and thus can fail to detect structural relationships in the data that depend on the size of the response. For example, patients with long survival times may respond to treatment differently from those with average survival times, and persons with long periods of unemployment may respond to training differently from those with shorter unemployment periods. Such differences cannot be seen in standard analyses that model only the mean response. The investigators propose to extend conditional quantile functions to more complex situations, specifically to parametric and semiparametric regression quantiles for correlated or censored response variables (which are common in both examples mentioned above). The computation of the conditional quantile functions is facilitated by modern linear programming algorithms, and appropriate statistical inference can be developed through traditional large sample theory or the Markov chain marginal bootstrap being developed by the PI and his colleagues.
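To make the computational point concrete, here is a minimal Python sketch (an editor's illustration, not the project's software) of the standard linear programming formulation of quantile regression, solved with SciPy; the simulated heteroscedastic example at the end is hypothetical and shows the fitted slope differing across quantile levels, exactly the kind of structure a mean-only analysis would miss.

```python
import numpy as np
from scipy.optimize import linprog

def quantile_regression(X, y, tau):
    """Fit a linear conditional quantile function at level tau by solving
    the LP:  min tau*1'u + (1-tau)*1'v  s.t.  X b + u - v = y,  u, v >= 0."""
    n, p = X.shape
    c = np.concatenate([np.zeros(p), tau * np.ones(n), (1 - tau) * np.ones(n)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])          # X b + u - v = y
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)   # b free; u, v >= 0
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[:p]

# hypothetical heteroscedastic data: the slope grows with the quantile level
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 2.0, 200)
X = np.column_stack([np.ones(200), x])
y = 1.0 + x + x * rng.standard_normal(200)
beta_low, beta_high = quantile_regression(X, y, 0.1), quantile_regression(X, y, 0.9)
```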
Conditional quantile functions help data analysts understand general heterogeneity in a population. They are often of direct interest in applications ranging from biomedical research and economic and business analyses to infrastructure studies. The proposed research will establish a firm statistical theory for regression quantiles and provide a complete toolkit for their application in complex problems with correlated and/or censored data.
2006 — 2010
He, Xuming; Liang, Feng
A Virtual Center to Promote Collaboration Between US- and China-Based Researchers in Statistical Science @ University of Illinois At Urbana-Champaign
In this project, the PIs will establish a virtual center to promote and support exchanges and collaborations in statistical science between US-based and China-based researchers and educators. The center will be guided by a scientific committee. The goal is to enable research collaborations in areas where the two countries have complementary strengths. The center will sponsor or co-sponsor Ambassador Lectures in China given by prominent US-based researchers on cutting-edge topics in statistics; invite prominent Chinese scholars to give theme lectures in the US and to collaborate with US researchers; co-sponsor workshops in China on frontiers of statistics and assist US participants in attending those workshops; and support collaborations between researchers from the two countries, especially junior researchers.
The virtual center recognizes the importance of enabling US researchers and educators to advance their work through international collaboration. The activities supported by this proposal will also help ensure that future generations of US researchers in statistical science have substantial interaction with their counterparts in China, a country whose research and education enterprise is expanding at a rapid pace.
2006 — 2010
He, Xuming |
Inferential Methods For Quantile Regression @ University of Illinois At Urbana-Champaign
While least squares regression targets the conditional mean function in a regression model, quantile regression provides more complete information on the conditional distribution of the response variable. It is especially valuable when there is heteroscedasticity or general heterogeneity in the population. To facilitate quantile regression modelling in a wider range of applications, this proposal aims to develop inferential procedures for quantile regression models that account for the presence of random effects or random censoring in the observations. Although random effects and censoring have been well studied under linear models equipped with parametric, and often Gaussian, likelihoods, the conventional inference procedures do not have straightforward extensions to the quantile regression model when standard minimal assumptions are made on the conditional distributions. The principal investigator aims to make focused efforts to develop new ideas and tools that make appropriate inference possible in quantile regression models with random effects or with censoring. The proposed research will build upon recent developments in quantile regression modelling and incorporate innovative ideas to develop appropriate inferential methods that are mathematically justified, mainly through large sample theory, and statistically meaningful at realistic sample sizes.
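As a point of reference for the inference problem described above, the sketch below implements the paired (xy) bootstrap for quantile regression coefficients, a generic baseline that is valid under independent sampling but, as the abstract notes, does not extend straightforwardly to random effects or censoring. It assumes the statsmodels package; this is an editor's illustration, not the proposal's method.

```python
import numpy as np
import statsmodels.api as sm

def paired_bootstrap_ci(X, y, tau=0.5, B=500, alpha=0.05, seed=0):
    """Percentile confidence intervals for quantile regression coefficients
    from the paired (xy) bootstrap: resample whole (x, y) pairs and refit."""
    rng = np.random.default_rng(seed)
    n = len(y)
    estimate = sm.QuantReg(y, X).fit(q=tau).params
    draws = np.empty((B, X.shape[1]))
    for b in range(B):
        idx = rng.integers(0, n, size=n)   # resample cases with replacement
        draws[b] = sm.QuantReg(y[idx], X[idx]).fit(q=tau).params
    lo, hi = np.percentile(draws, [100 * alpha / 2, 100 * (1 - alpha / 2)], axis=0)
    return estimate, lo, hi
```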
Currently available methods for statistical inference in quantile regression models are not well developed for handling random effects or random censoring. For example, the analysis of GeneChip data in genomics would result in inflated false discovery rates without treating the array effect as random. The proposed research will develop new methods that preserve statistical confidence in a wider range of quantile-regression-based applications. The PI will pursue collaboration with other scientists to ensure that the methodologies under development are valuable to researchers in the biological sciences, health sciences, engineering, economics, and finance. The proposed activities will also involve training graduate students as future researchers in statistics, as well as providing selected undergraduate students with research experience.
2007 — 2011
He, Xuming; Wuebbles, Donald; Liang, Xin-Zhong; Shao, Xiaofeng (co-PI)
CMG Collaborative Research: Statistical Evaluation of Model-Based Uncertainties Leading to Improved Climate Change Projections At Regional to Local Scales @ University of Illinois At Urbana-Champaign
This research project brings together an interdisciplinary team of atmospheric scientists and statisticians to attack an outstanding issue in the field of climate change research: namely, how to obtain statistically robust projections of future climate change at regional to local scales. It is well known that global change is modified by local and regional features in ways that even regional models are challenged to capture, producing unique patterns in each individual region. Quantifying these patterns of change is essential to identifying appropriate adaptation and mitigation strategies to cope with the likely impacts of climate change on both human and natural systems. Driven both by the persistent limitations of present-day modeling capacity and by the potential global-scale impacts of climate change, the investigators propose to develop a set of scientifically and statistically advanced techniques to reduce the uncertainties inherent in the use of global and regional climate model output fields to generate local-scale climate projections. Utilizing available observations, reanalysis data, and historical global and regional climate model simulations, the investigators will first develop a set of statistical techniques that will reduce the dimensionality of both global and regional model differences relative to observations. Statistical techniques to quantify model-observational differences and capture the range of future climate projections will include proven methods for spatial interpolation of observations, as well as new spectral and wavelet analyses, and the development of an advanced quantile regression approach with Bayesian empirical likelihoods. Building on the investigators' previous research analyzing the ability of both global and regional climate models to simulate key atmospheric dynamical features, the team will then assess the physical features of the models that are likely contributing to these differences. Both physical and statistical characterizations of model limitations will then be applied to reduce uncertainty in a range of IPCC AR4 global model simulations of future climate change, based on multiple realizations of future emissions scenarios and available regional climate model simulations. The final project goal is to synthesize the above methods into a generalized framework that combines physical and statistical analyses to assess historical global and regional model performance, and then use these characterizations of model performance to reduce the uncertainty in future projections of key surface climate variables at regional to local scales.
The work proposed addresses an ongoing and crucial need in climate change research: to characterize and account for model limitations in order to reduce uncertainties at the regional to local scales where the societal, economic, and environmental impacts of climate change occur. This project is unique from both a scientific and a statistical perspective, combining a well-established research program on global and regional climate model analysis with innovative statistical approaches. Advanced statistical methods will be used to merge all available information, including observations, data assimilations, global and regional climate model simulations, and other depictions of the internal variability of the climate system, to characterize model differences relative to observations and to produce improved high-resolution projections of future changes in surface climate. This project will involve the extensive use of high-performance computing capabilities. The capabilities that will be developed are designed to reduce uncertainties in the likely range of future climate change, enabling more effective analyses of the potential impacts of climate change at regional to local scales. At the same time, the project will challenge the state of the art in the techniques and statistical tools developed and in their application to the field of regional climate projections. The proposed collaborative research will also provide interdisciplinary training to students and postdoctoral fellows at several institutions, with the cross-disciplinary fertilization of ideas fostered through close interactions on this project providing invaluable insights into both the research and the educational processes.
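One simple quantile-based device used in statistical downscaling is empirical quantile mapping: correct the model output so that its historical distribution matches observations, then apply the same transfer function to future projections. The sketch below illustrates only that generic idea; the project's quantile regression with Bayesian empirical likelihoods, and its spectral and wavelet analyses, are substantially more sophisticated.

```python
import numpy as np

def quantile_map(model_hist, obs_hist, model_future, n_quantiles=99):
    """Empirical quantile mapping: map each future model value to the
    observed quantile that matches its rank in the model's historical run."""
    probs = np.linspace(0.01, 0.99, n_quantiles)
    model_q = np.quantile(model_hist, probs)
    obs_q = np.quantile(obs_hist, probs)
    p = np.interp(model_future, model_q, probs)   # rank on the model scale
    return np.interp(p, probs, obs_q)             # read off the observed scale

# toy example: a model that runs 2 degrees too warm with inflated variance
rng = np.random.default_rng(0)
obs_hist = rng.normal(15.0, 3.0, 5000)
model_hist = rng.normal(17.0, 4.0, 5000)
model_future = rng.normal(19.0, 4.0, 5000)
corrected_future = quantile_map(model_hist, obs_hist, model_future)
```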
2008 — 2011
He, Xuming; Hu, Jianhua
R01 — Activity Code Description: To support a discrete, specified, circumscribed project to be performed by the named investigator(s) in an area representing his or her specific interest and competencies.
Low-Rank Approximation to Probe-Level Data With Application to Exon Tiling Arrays @ University of Texas MD Anderson Cancer Center
DESCRIPTION (provided by applicant): Findings from the Human Genome Project highlight the intricacy of interactions between cell regulation, genes, and proteins. It is generally understood that biological functions and activities are controlled by subsets of genes interacting with proteins in a highly regulated manner. High-throughput technologies such as microarrays are valuable for studying a large number of biological components simultaneously, but sound conclusions from these technologies depend on appropriate statistical analyses of the genomic/proteomic data. The long-term objective of this proposal is to develop appropriate statistical tools to explore gene/protein interactions and to discover how these interactions function in biological activities (e.g., induction of a disease phenotype). This proposal concerns the analysis of short oligonucleotide data, as in GeneChip studies and exon tiling arrays. Low-rank approximations to the expression data matrices play a central role in the proposed research. The specific aims are: (1) to develop a fast and robust algorithm for low-rank approximation of a data matrix that is subject to outliers; (2) to develop diagnostic tools and statistical tests for determining whether a low-rank representation is adequate to capture gene expression profiles; (3) to develop both nonparametric and likelihood-based approaches for flagging and detecting alternative splicing with exon tiling arrays. Singular value decomposition is a starting point for the proposed work towards these specific aims. Alternating robust (outlier-resistant) regression methods will be used for Aims (1) and (3). Likelihood-based and data-adaptive methods will be developed for Aims (2) and (3). The proposed research distinguishes itself from most of the existing statistical work on microarray data in that it focuses on probe-level rather than gene-level data. The investigators believe that the standard uni-dimensional summary of gene expression data can lead to loss of important information. PUBLIC HEALTH RELEVANCE: Successful completion of the proposed research will lead to efficient and effective statistical tools for analyzing microarray data, with wide-ranging applications in biomedical and public health research, as evidenced by the recent discovery of target genes for cervical cancer and prostate cancer. Those tools are needed to support better applications of microarray technology in clinical and biomedical research.
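A minimal sketch of the alternating robust regression idea behind Aims (1) and (3): initialize from the SVD, then alternate median (L1) regressions for the row and column loadings so that outlying probe values do not dominate the fit. The code assumes statsmodels and is an editor's illustration of the general technique, not the investigators' algorithm.

```python
import numpy as np
import statsmodels.api as sm

def robust_low_rank(Y, rank=2, n_iter=10):
    """Outlier-resistant rank-`rank` approximation Y ~ U @ V.T obtained by
    alternating median (L1) regressions, initialized from the ordinary SVD."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    U = U[:, :rank] * s[:rank]      # n x rank row loadings
    V = Vt[:rank].T                 # m x rank column loadings
    for _ in range(n_iter):
        for i in range(Y.shape[0]):   # refit each row's loadings given V
            U[i] = sm.QuantReg(Y[i], V).fit(q=0.5).params
        for j in range(Y.shape[1]):   # refit each column's loadings given U
            V[j] = sm.QuantReg(Y[:, j], U).fit(q=0.5).params
    return U, V
```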
2009 — 2010
He, Xuming; Hu, Jianhua
R21 — Activity Code Description: To encourage the development of new research activities in categorical program areas. (Support generally is restricted in level of support and in time.)
Nonparametric Analysis of Reverse-Phase Protein Lysate Array Data @ University of Texas MD Anderson Cancer Center
DESCRIPTION (provided by applicant): Proteins play major roles as biological effectors and diagnostic markers. One level of their complexity is due to post-translational modifications, which cannot be detected at the genome level and which make it desirable to measure proteins directly. Recently, new protein microarray technologies have emerged for this purpose. We focus on reverse-phase protein lysate arrays, which allow us to quantify the relative expression levels of a protein in many different cellular samples simultaneously. One advantage of this technology is that it requires a small number of cells and just one antibody binding. However, protein lysate arrays are more challenging to analyze than DNA arrays, and at the present time their applications are still in the exploratory stage, lacking reliable statistical tools for quantifying the information (including the uncertainty) from protein arrays. We find it difficult, if possible at all, to model all the samples with a simple parametric family of response curves. We propose a robust approach to quantifying protein lysate arrays by fitting a monotone nonparametric response curve to all samples on the same array. The proposed method has been shown to fit the data more adaptively, avoiding bias due to parameterization. We aim to incorporate modern shrinkage ideas from statistics into the nonparametric approach, leading to more stable quantification in time-course experiments where the number of replicates at each time point is small. We also propose to use the wild bootstrap for assessing the uncertainty of the protein concentration estimates and the influence of such uncertainties on follow-up analyses. When completed, our research will enable more reliable analysis of protein lysate arrays and provide feedback to chip makers to improve the design of protein microarrays, both of which are essential in making lysate arrays a useful tool in biological and medical research. PUBLIC HEALTH RELEVANCE: Successful completion of the proposed research will lead to efficient and effective statistical and computing tools for analyzing protein lysate array data, with wide-ranging applications in biomedical and public health research, as evidenced by the recent discovery of a target protein in signaling-pathway profiling related to prostate cancer. These tools are needed to support better applications of protein lysate array technology in clinical and biomedical research.
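The monotone nonparametric fit described above can be illustrated with isotonic regression, which estimates a non-decreasing response curve without committing to a parametric (e.g., sigmoidal) family. The dilution series below is hypothetical, and scikit-learn's IsotonicRegression stands in for the proposal's more refined, shrinkage-augmented estimator.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

# hypothetical 8-step dilution series for one sample on a lysate array
dilution_step = np.arange(8)
signal = np.array([0.9, 1.4, 2.7, 4.9, 8.8, 14.1, 18.9, 20.3])

# fit a monotone (non-decreasing) response curve; no parametric family is
# assumed, so the fit adapts to the data and avoids parameterization bias
iso = IsotonicRegression(increasing=True)
fitted_curve = iso.fit_transform(dilution_step, signal)
```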
2010 — 2014
He, Xuming |
Efficient Modeling in Quantile Regression @ University of Michigan Ann Arbor
Quantile regression has emerged in recent years as a powerful supplement to the more conventional least squares regression. By modeling the conditional quantile functions, researchers are often able to gain a much more comprehensive picture of how a response variable is associated with its covariates. The prevailing approach in quantile regression is to analyze the conditional quantile functions one percentile level at a time. This approach offers great modeling flexibility at the cost of statistical efficiency. The Principal Investigator proposes to develop and study new approaches to efficient modeling of conditional quantile functions. By "borrowing strength" across neighboring quantiles and utilizing a Bayesian empirical likelihood approach, the investigator aims to advance the theory, methodology, and applications of efficient quantile regression. Efficiency gain is an important consideration in any statistical research, and the proposed modeling techniques are especially helpful in the analysis of quantiles in data-sparse areas. The Bayesian empirical likelihood approach for quantile regression can be used in conjunction with optimal weighting for semiparametric efficiency, and with Markov chain Monte Carlo sampling for effective computation in a high dimensional parameter space. The proposed models, to be called semi-local quantile models, strike a balance between bias and variance; when the models do not hold exactly, the proposed estimators follow the spirit of regularization.
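A caricature of the "borrowing strength" idea: fit neighboring quantile levels one at a time, then smooth the coefficient estimates across levels with kernel weights, trading a little bias for lower variance in the data-sparse upper tail. This toy version (assuming statsmodels; the data are simulated) is only suggestive of the model-based semi-local approach proposed here.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 300
x = rng.uniform(0.0, 2.0, n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + (0.5 + 0.5 * x) * rng.standard_normal(n)

# fit neighboring upper quantiles one level at a time ...
taus = [0.85, 0.90, 0.95]
fits = np.array([sm.QuantReg(y, X).fit(q=t).params for t in taus])

# ... then shrink the 0.90 estimate toward its neighbors (triangular weights)
weights = np.array([0.25, 0.50, 0.25])
semi_local_estimate = weights @ fits
```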
Inference in data-sparse areas, including but not restricted to the analysis of upper tails, is highly valuable in a wide range of scientific and social studies. The proposed research is motivated by the investigator's interdisciplinary research in climate studies and public health, and will provide researchers in statistics and other fields with novel tools for better understanding and quantifying relationships between measurements. The proposed activities include new opportunities for graduate students to participate in transformative research, and will enable the investigator to continue integrating research with teaching and mentoring. The investigator pursues active academic exchanges through lectures and collaborations, and free distribution of software, for broad dissemination of the research results.
2013 — 2017
He, Xuming |
New Directions in Quantile-Based Modeling and Analysis @ University of Michigan Ann Arbor
The quantile, as a descriptive and analytic tool for data, has had an established place in statistics for over a hundred years. In recent years, research on quantile modeling that incorporates the effect of covariates and handles multivariate data has accelerated in response to needs arising from a broad range of applications. The investigator addresses an important but often neglected question about the validity of posterior inference on quantile regression for the pseudo-Bayesian methods that have become popular in the literature. The investigator conducts a careful investigation into how the choice of a working likelihood and the choice of a prior play their respective roles, both in finite-sample problems and in the asymptotic theory. The investigator studies a new class of shrinking priors as an asymptotic framework for understanding the efficiency gains of Bayesian methods for estimation and prediction of quantiles in data-sparse areas and in problems involving high dimensional covariates. The proposed research will deepen our understanding of the validity of pseudo-posterior inference and suggest asymptotically valid and efficient inferential methods for quantile regression at single or multiple quantile levels. The research will also facilitate a new pseudo-Bayesian framework for model selection beyond quantile regression. Furthermore, the investigator studies a new notion of quantile for multivariate data.
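The pseudo-Bayesian setup whose validity is in question can be sketched directly: treat the exponentiated negative check loss (the asymmetric Laplace working likelihood commonly used in the literature) as if it were a genuine likelihood and sample the resulting pseudo-posterior. The random-walk Metropolis sampler below is a minimal illustration; as the abstract indicates, interval estimates read off this pseudo-posterior are not automatically valid without the kind of correction the project studies.

```python
import numpy as np

def check_loss(r, tau):
    """Summed quantile check loss: rho_tau(r) = r * (tau - 1{r < 0})."""
    return np.sum(r * (tau - (r < 0.0)))

def pseudo_posterior_sampler(X, y, tau=0.5, n_draws=5000, step=0.05, seed=0):
    """Random-walk Metropolis for the pseudo-posterior proportional to
    exp(-sum rho_tau(y - X b)), i.e. a flat prior times the asymmetric
    Laplace working likelihood."""
    rng = np.random.default_rng(seed)
    b = np.zeros(X.shape[1])
    loglik = -check_loss(y - X @ b, tau)
    draws = np.empty((n_draws, X.shape[1]))
    for t in range(n_draws):
        proposal = b + step * rng.standard_normal(X.shape[1])
        loglik_prop = -check_loss(y - X @ proposal, tau)
        if np.log(rng.uniform()) < loglik_prop - loglik:   # accept/reject
            b, loglik = proposal, loglik_prop
        draws[t] = b
    return draws
```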
The proposed activities will stimulate novel ideas and critical thinking in the areas of quantile modeling and Bayesian inference. The new insights and the new tools to be developed will be useful for estimation, prediction, and hypothesis testing regarding rare events in climate research, public health, and other scientific endeavors. The notion of multivariate quantiles will lead to an efficient statistical downscaling method for better climate projections at localized scales. The proposed activities will engage graduate students directly as part of their academic training. The investigator will work with other researchers and scientists to ensure that the research results are disseminated appropriately to the broad scientific community.
2016 — 2019
He, Xuming |
New Algorithms For Consistent Model Selection Beyond Linear Models @ University of Michigan Ann Arbor
Statistical model building is an important part of scientific discovery. In the big data era, high dimensional data arise frequently. Model selection in the presence of high dimensional features, in the framework of linear models, generalized linear models, and models with censored data, has been a very active area of research in recent years. The PI aims to develop new algorithms for model selection, within a Bayesian computational framework, that are scalable for high dimensional problems. The PI motivates the proposed research through collaborations with scientists in atmospheric sciences, genetics, and kinesiology, and aims to develop methodologies that are broadly applicable in statistical modeling and data analysis. Much of the recent work has focused on shrinkage through penalization or regularization. Bayesian computational methods, when interpreted broadly, play a valuable role in statistics, including in model selection and estimation, but face important hurdles in high dimensional statistics, both in theoretical intricacy and in computational scalability. The PI aims to develop a theoretical framework to demonstrate model selection consistency from the frequentist perspective, which offers interesting insights into why Bayesian model selection methods can provide an asymptotic approximation to the L0 penalty. An important part of the proposed work is the development of a modified Gibbs sampler for the selection of sparse models that is much more scalable than standard MCMC algorithms in the presence of high dimensional variables. The Bayesian methods are especially useful in problems with non-convex objective functions, where they can be more robust in performance than direct optimization. A primary application of this kind considered in the project is quantile regression for censored data. In addition to model selection, the PI proposes a new estimation method for censored quantile regression that promises to be computationally and statistically efficient. Equally importantly, the new method adapts easily to general forms of censoring that other estimation methods have found difficult to handle. The PI will continue integrating research with education by working with PhD students and by providing research experiences for undergraduate students. The research output will be disseminated through conferences and workshops and through publication in widely read journals in statistical science.
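The link between Bayesian model selection and the L0 penalty can be seen in miniature through BIC, the classical large-sample approximation to an L0-penalized criterion. The exhaustive search below is feasible only for a handful of predictors, which is precisely why scalable samplers matter; it is an editor's illustration under a linear model with an intercept, not the project's algorithm.

```python
import numpy as np
from itertools import combinations

def best_subset_bic(X, y):
    """Exhaustive L0-style selection: minimize n*log(RSS/n) + |S|*log(n)
    over all subsets S. Assumes column 0 of X is an intercept, always kept."""
    n, p = X.shape
    best_bic, best_subset = np.inf, (0,)
    for k in range(p):                          # number of non-intercept terms
        for extra in combinations(range(1, p), k):
            S = [0, *extra]
            beta, *_ = np.linalg.lstsq(X[:, S], y, rcond=None)
            rss = np.sum((y - X[:, S] @ beta) ** 2)
            bic = n * np.log(rss / n) + len(S) * np.log(n)
            if bic < best_bic:
                best_bic, best_subset = bic, tuple(S)
    return best_subset, best_bic
```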
2018 — 2019
He, Xuming |
Statistics At a Crossroads: Challenges and Opportunities in the Data Science Era @ University of Michigan Ann Arbor
The workshop "Statistics at a Crossroads: Challenges and Opportunities in the Data Science Era" is scheduled to take place at the Crystal Gateway Marriot, Arlington, Virginia, October 15-17, 2018. A previous workshop was held at the National Science Foundation in 2002 to discuss the future challenges and opportunities for the statistics community. That was a time when the statistics community saw rapid changes and sustained growth from the emergence of more and larger-scale data. Since then, the growth of the field, including the size of the undergraduate and graduate programs in statistics and the breadth of interactions between statistics and other fields, has accelerated. In the meantime, both the public and the private sectors have embraced big data, as more and more people recognize that big data can provide insights into the nature of biological processes, precision medicine, climate change, social and economic behavior, risk assessment and decision making. The statistics community recognizes that we are at a crossroads with an unprecedented opportunity to modernize the discipline to become the major player in data science, but also with a non-ignorable risk to make ourselves obsolete in the broad community of data science. The proposed workshop asks a critical question, where do we go from here? The workshop seeks to invite leading researchers and educators in statistics and data science to address the question and to produce a report narrating the challenges to, and opportunities for, statistics stakeholders (e.g., individual researchers, academic departments, and funding agencies).
The steering committee of the workshop has identified six themes for in-depth discussion: (1) Foundations of statistics and data science; (2) Statistics and computation; (3) Emerging applications; (4) Data challenges; (5) Inference in the age of big data; and (6) Statistics education in the new era. These themes were chosen to cover a wide range of timely and forward-looking issues and research topics for discussion. Pre- and post-workshop discussions via online forums are planned to seek broader community input. The workshop report will be made available to the National Science Foundation and to the scientific community at large. The recommendations made at the workshop will provide guidance to young researchers and students pursuing careers in quantitative fields and beyond. The workshop will also promote the participation of women and other under-represented groups in the research community. More information can be found at the workshop website https://hub.ki/groups/statscrossroad/overview.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
2019 — 2022
He, Xuming |
Towards Efficient Bias Correction in Data Snooping @ University of Michigan Ann Arbor
The choice of a statistical model is often a critical part of data analysis because a useful model helps researchers extract relevant information from noisy data to reach interpretable findings. While scientific or economic theories do help formulate models in some applications, most data analysts have to rely on empirical models. Using the same data to select a model and then to perform model-based statistical inference is commonly known as data snooping. Unfortunately, data snooping is intrinsically risky without a careful analysis of the potential bias resulting from such practices. The primary goal of this project is to study how to understand and correct bias from data snooping and develop sound statistical inference methods. The research will provide valuable tools for scientists, researchers, and policy makers who rely on data-driven models for uncertainty assessment and confirmatory data analysis.
This project focuses on regression-adjusted inference on treatment effects and inference on the best selected subgroup. The proposed work is motivated by the pressing need for more fundamental research on the handling of "post-selection bias" in statistical analysis. The repeated data-splitting method for de-biased inference on a structural parameter (for example, the average treatment effect) enables efficient bias removal in addressing an intrinsic scientific question. The proposed inference on the best selected subgroup provides a bias correction to a natural estimate of the subgroup effect size, and therefore reduces the risk of data snooping and false discoveries in subgroup analysis. In the big data era, data-driven models and subgroup analyses are often used to take advantage of anticipated sparsity in the data structure or to explore data heterogeneity. The proposed research aims to provide insights, theory, and tools for more informed decision making in such endeavors. The project will involve collaborations with researchers investigating the risk of concussion, as well as with scientists in the biotechnology industry who routinely rely on subgroup analysis. Graduate and undergraduate students will be engaged in the proposed research.
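The upward bias of a snooped subgroup estimate, and the sample-splitting remedy, can be seen in a toy simulation (an editor's illustration; the proposal's repeated data-splitting and bias-correction methods are considerably more refined and more efficient):

```python
import numpy as np

rng = np.random.default_rng(1)
n_groups, n_per = 10, 100
# every subgroup has true effect 0, so any apparent "best" effect is noise
data = rng.standard_normal((n_groups, n_per))

naive_best = data.mean(axis=1).max()         # snooped estimate: biased upward

# sample splitting: select the best subgroup on one half of the data,
# then estimate its effect on the untouched other half
half = n_per // 2
selected = data[:, :half].mean(axis=1).argmax()
honest = data[selected, half:].mean()        # unbiased given the selection
```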
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.