2007 — 2010 |
Zou, Hui |
N/A |
Statistical Modeling With High-Dimensional Data: Variable Selection and Regularization @ University of Minnesota-Twin Cities
With high-dimensional data, parsimonious models are preferred because they are much more interpretable and at the same time reduce prediction error. Regularization is an essential component of most modern developments in data analysis, in particular when the number of predictors is large. Non-regularized fitting is guaranteed to give badly over-fitted and useless models. The investigators take a regularization approach to the variable selection problem in high-dimensional statistical modeling, so that the resulting model enjoys excellent prediction accuracy and at the same time has a sparse representation. In particular, the investigators develop: (1) new fused variable selection methods for proteomics data analysis, which has been a revolutionary cancer diagnostic tool; (2) a novel kernel logistic regression model that automatically adopts a support-vector representation; (3) several new techniques for performing simultaneous variable selection in estimating multiple quantile regression functions. The investigators also study the theory of these new variable selection techniques. Efficient algorithms and software are developed for public use.
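As a generic illustration of how regularization can yield a sparse model, the sketch below fits an L1-penalized least squares regression (the lasso) by cyclic coordinate descent. This is a textbook technique, not the investigators' fused, kernel, or quantile methods; the function names, the simulated data, and the penalty level `lam` are all assumptions chosen for illustration.

```python
import numpy as np


def soft_threshold(z, t):
    """Shrink z toward zero by t; values with |z| <= t become exactly zero."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=200):
    """Minimize (1/2n)||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n      # per-column curvature
    resid = y - X @ beta                   # current residual
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (column j removed).
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            resid += X[:, j] * (beta[j] - new)   # keep the residual in sync
            beta[j] = new
    return beta


# Simulated data (hypothetical): 20 predictors, only the first 3 matter.
rng = np.random.default_rng(0)
n, p = 100, 20
X = rng.standard_normal((n, p))
true_beta = np.zeros(p)
true_beta[:3] = [3.0, -2.0, 1.5]
y = X @ true_beta + 0.1 * rng.standard_normal(n)

beta_hat = lasso_cd(X, y, lam=0.2)
selected = np.flatnonzero(beta_hat)        # indices of predictors kept in the model
print("selected predictors:", selected)
```

The soft-thresholding step is what sets small coefficients exactly to zero, so variable selection and estimation happen in a single fit.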
Modern scientific innovations allow scientists to collect massive and high-dimensional data. It is critical in scientific investigations to extract useful information from this huge amount of data. For this reason, variable selection and dimension reduction play a fundamental role in high-dimensional statistical modeling. Variable selection problems arise in a wide range of fields: machine learning, drug discovery, biomarker finding, genetics, proteomics, brain imaging analysis, financial modeling, and environmental sciences, to name a few. The research project aims to develop state-of-the-art statistical tools that help researchers in various fields analyze their data.
|
0.954 |
2009 — 2015 |
Zou, Hui |
N/A |
CAREER: New Statistical Methodology and Theory For Mining High-Dimensional Data @ University of Minnesota-Twin Cities
The area of high-dimensional modeling is developing rapidly. This research aims to push these developments forward to meet new challenges arising in different fields. In particular, the investigator studies (a) new statistical methodology and theory for mapping high-dimensional datasets onto a space of much lower dimension while ensuring minimum distortion; (b) efficient and robust variable selection in semiparametric models; (c) a novel regularization approach to nonparametric model selection and estimation.
Modern computing power and scientific innovations allow scientists to easily collect high-dimensional data in various disciplines. Analysis of high-dimensional data poses many challenges and offers great opportunities to statisticians. The availability of high-dimensional data has reshaped statistical modeling. This proposal focuses on new statistical methodology and theory for knowledge discovery and information retrieval from high-dimensional data. The investigator plans to develop user-friendly computer programs for public use. The research will make significant contributions to areas outside statistics as well, including biology, computer science, biomedical engineering, medical informatics, and economics. The integrated educational program includes substantial initiatives that will involve undergraduate and graduate students and expose them to state-of-the-art research on topics related to the proposal. These include new courses, workshops, and mentoring. The research results will be integrated into K-12 education and applied to industrial research.
|
0.954 |
2015 — 2018 |
Zou, Hui |
N/A |
Collaborative Research: New Statistical Methods and Theory For High-Dimensional Data @ University of Minnesota-Twin Cities
High-dimensional data have become ubiquitous in this big-data era. In recent years, many statistical methods and theories have been developed for analyzing high-dimensional data, with successful applications in practice. There are still many challenges and open problems to be addressed, and their solutions call for innovative ideas. The proposed research projects are motivated by real applications where the current state-of-the-art high-dimensional data analytic methods fail to deliver good solutions. The research results will be directly applicable in various fields such as genomics, medical imaging, public health, social networks, and e-commerce, among others. For example, methods developed in this proposal will enable us to better understand how a social network evolves and how brain functions change with age. The research results will be disseminated through journal publications, conference presentations, and seminar talks. This proposal has an education program that contributes to the education and training of next-generation statisticians.
In this project, novel statistical methods and theory are proposed to study three important topics of large-scale statistical inference: (a) dynamic graphical models and latent graphical models, (b) high-dimensional regression with noisy and corrupted data, and (c) profile matrix inference in structural pursuit. The investigators will develop innovative techniques to handle the methodological, computational, and theoretical challenges. The research results will not only provide powerful new data analytic tools for solving open problems in (a), (b), and (c), but also shed light on general principles for statistical learning from complex high-dimensional data. To make the research outcomes readily available to other researchers and practitioners, the investigators will implement the methodology developed in this proposal in software packages that will be publicly distributed.
|
0.954 |
2019 — 2022 |
Zou, Hui |
N/A |
Flexible Statistical Modelling For High Dimensional Data @ University of Minnesota-Twin Cities
Scientific and technological innovations have made massive high-dimensional data ubiquitous in various fields, such as biological science, medical studies, public health, social sciences, e-commerce, finance, and climate studies. During the past decade, statisticians have developed a rich collection of new tools for high-dimensional statistical modeling. Despite these important advances, there are still many challenges and open problems to be dealt with in high-dimensional data analysis. Their solutions require innovative ideas and techniques to handle the methodological, computational, and theoretical challenges. The goal of this research is to develop mathematically solid and computationally efficient methods to address these pressing and important inferential challenges.
This research consists of three projects. The first project concerns measurement errors in high-dimensional M-estimation. The PI will study a new unified convex approach to solving errors-in-variables penalized M-regression problems, including Huber regression, logistic regression, quantile regression, and the support vector machine. In the second project, the PI will establish a new inference tool named composite M-estimation and demonstrate its applications in high-dimensional learning. In the third project, the PI will study a flexible heterogeneity pursuit method for understanding heterogeneity effects in high-dimensional data. Software packages will be created to make the new methods readily available to other researchers and practitioners.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
|
0.954 |
2020 — 2023 |
Zou, Hui |
N/A |
Novel Inference Procedures For Non-Standard High-Dimensional Regression Models @ University of Minnesota-Twin Cities
The statistical theory of hypothesis testing plays a fundamental role in virtually all scientific studies. In the era of big data, high-dimensional data are ubiquitous in many scientific fields such as the natural sciences, social sciences, medicine, and public health. Modern applications therefore often involve hypothesis testing in high dimensions, which calls for new statistical inference theory. As regression is the most popular statistical analysis tool in applications, some recent work has focused on hypothesis testing in high-dimensional least squares regression. However, it is well known that the standard least squares regression model has severe limitations in real applications. This research aims to develop new statistical inference theory for more flexible high-dimensional regression models.
This research focuses on the development of inference theory for several important non-standard regression models under ultra-high dimensions. Specifically, the PI will develop tests of linear hypotheses under three models: high-dimensional expectile regression, high-dimensional heteroscedastic regression, and robust high-dimensional regression. Asymptotic distributions of the test statistics will be established rigorously. The theoretical study will fill important gaps in the high-dimensional statistics literature. A unified, efficient algorithm will be developed to tackle the computational challenges. The research will provide principled tools for studying expectile functions, for examining heterogeneity in high-dimensional data, and for performing robust inference. Research training opportunities for graduate students will be provided.
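For readers unfamiliar with expectiles, the sketch below computes a sample expectile under the standard definition: the tau-expectile minimizes an asymmetric squared loss, with weight tau on observations above it and 1 - tau on those below. It shows only the basic quantity that expectile regression generalizes, not the high-dimensional tests proposed in this award; the function name and the toy data are assumptions for illustration.

```python
import numpy as np


def expectile(y, tau, n_iter=100, tol=1e-10):
    """Sample tau-expectile via iteratively reweighted averaging.

    The tau-expectile m minimizes sum_i w_i * (y_i - m)^2, where
    w_i = tau if y_i > m and 1 - tau otherwise.
    """
    y = np.asarray(y, dtype=float)
    m = y.mean()                           # tau = 0.5 gives exactly the mean
    for _ in range(n_iter):
        w = np.where(y > m, tau, 1.0 - tau)
        m_new = np.sum(w * y) / np.sum(w)  # weighted mean under current weights
        if abs(m_new - m) < tol:
            m = m_new
            break
        m = m_new
    return m


# Toy data (hypothetical) with one large value in the upper tail.
data = np.array([1.0, 2.0, 3.0, 4.0, 10.0])
low, mid, high = (expectile(data, t) for t in (0.1, 0.5, 0.9))
print(low, mid, high)
```

Varying tau traces out the lower and upper parts of a distribution much as quantiles do, while the loss stays smooth and squared, which is one reason expectiles are used to examine heteroscedasticity.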
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
|
0.954 |
2022 — 2025 |
Zou, Hui; Qian, Feng (co-PI); Ding, Jie |
N/A |
IMR: MM-1A: Evolutionary Modeling and Acquisition of Multidimensional 5G Internet Measurements @ University of Minnesota-Twin Cities
Commercial 5G networks are being quickly rolled out in the U.S. The high-throughput, low-latency nature of 5G enables numerous exciting applications, such as cloud/edge-assisted machine learning, networked virtual/augmented reality, connected and autonomous vehicles, low-latency IoT applications, and digital agriculture. However, despite 5G's potential, the research community still lacks a thorough understanding of 5G performance in the wild, for the following reasons: (1) unlike its predecessors, 5G encompasses more diverse technologies; (2) the underlying data patterns are often time-varying at different scales; and (3) scientists often have limited resources to model and acquire multidimensional 5G measurements. An overarching goal of this project is to develop novel statistical methods for modeling complex internet measurements and designing data collection under practical constraints. The proposed research is expected to have a broader impact on practice and education across statistics, machine learning, signal processing, internet data analysis, and data privacy. The project will integrate the materials it develops into courses in statistics and computer science. In addition, the project will actively reach out to local high schools and colleges to organize workshops or summer camps for underrepresented minorities in STEM and engage them in hands-on learning projects.
In this project, the cross-disciplinary team aims to significantly advance the fundamental understanding of the modeling and sampling of 5G measurements. The PI and Co-PIs will leverage their expertise to develop learning frameworks, advanced algorithms, and analysis techniques for internet measurements in two interconnected research thrusts. First, evolutionary space-time modeling, an innovative and principled framework for statistical modeling of 5G internet measurements across space and time, will be developed. This modeling paradigm can flexibly incorporate modern nonparametric supervised learning techniques and perform online computation and updating of the model. Second, an influence-based approach to data acquisition, in which system designers can address various constraints, such as variable sparsity and data privacy, will be developed. The research outcomes will offer valuable and practically powerful tools for scientists to understand the 5G internet from streaming data across a large span of space and time.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
|
0.954 |