2007 — 2010 |
James, Gareth |
N/A |
Collaborative Research: Generalized Variable Selection With Applications to Functional Data Analysis and Other Problems @ University of Southern California
When variable selection is performed in situations where the number of predictors is significantly larger than the number of observations, one generally assumes sparsity in the regression coefficients, i.e., most of the coefficients are zero. However, there turn out to be many practical applications where, rather than the parameters being sparse, certain predefined functions of the parameters are sparse. This is referred to as "Generalized Variable Selection" (GVS). Specifically, the investigators study four important applications of GVS in areas as diverse as functional regression, principal component analysis (both standard and functional), multivariate non-parametric regression, and transcription regulation network problems for microarray experiments.
The investigators have direct connections to many fields outside statistics, such as Biology, Finance, Manufacturing, Marketing, Medicine and Physics. The investigators believe that statisticians can, and should, make important contributions in all these areas. With the advent of new technologies such as bar code scanners and microarrays, enormous data sets are becoming increasingly common in these and many other fields. Such vast quantities of data have made it important to develop statistical methodologies that can produce sparse and interpretable solutions. The investigators aim to systematically implement the proposed methods in free software packages, such as R, make them readily available, and publicize them in all these fields. The investigators believe that, because of the interpretive power of their proposed methods, the software will be widely used once it is available.
|
0.915 |
2009 — 2012 |
James, Gareth; Fan, Yingying |
N/A |
Regularization Methods in High Dimensions With Applications to Functional Data Analysis, Mixed Effects Models and Classification @ University of Southern California
Historically, statistics has dealt with the problem of extracting as much information as possible from a small data set. Over the last decade, however, technological advances in fields such as image processing, computational biology, climatology, economics and finance have made the analysis of data sets with enormous numbers of predictors one of the most important active research topics in statistics. Such large-scale problems may be abstracted as statistical regression and classification problems in which the number of explanatory variables is much larger than the number of observations. In these situations some form of regularization is essential. The investigators study a general class of penalty functions and the theoretical properties of the resulting regularization methods in regression and classification settings. In addition, they develop two specific penalty functions, each of which motivates a different methodology, and investigate the theoretical and empirical properties of these methods in the most common setting, linear regression. Finally, the investigators study extensions of the methodologies to areas that are less well explored in the high-dimensional setting, namely mixed effects models, functional linear regression, and classification problems.
The proposed research is expected to have a broad impact on both the practice and the teaching of statistics, as well as on fields outside statistics. The common theme underlying the entire proposal is the development of general regularization penalties and related methodologies for high-dimensional problems. Together, the investigators have direct connections to many fields outside statistics, such as Computational Biology, Finance, Marketing, Machine Learning, and Econometrics. The investigators will systematically implement the proposed methods in free software packages, such as R, make them readily available, and publicize them in all these fields. High-dimensional data are becoming increasingly common, so the developed methodologies and software will be widely utilized. The research will also contribute to the training and development of future data analysts, including both statisticians and researchers outside statistics who analyze data.
|
0.915 |