2000 — 2004
El Ghaoui, Laurent
CAREER: Robust Optimization and Applications @ University of California-Berkeley
Most models used in engineering are prone to errors: poor measurement of physical parameters, complexity of the phenomena at hand making "exact" models too hard to handle, etc. These uncertainties are now an important limiting factor in the use of optimization, which by nature tends to finely tune solutions to problem data. I propose a general framework for decision problems with uncertainty, based on a robust optimization approach. In this framework, uncertainty on problem data is treated as deterministic and unknown-but-bounded. A robust solution is one that tolerates changes in the problem data up to the given bound. Such a solution is too hard to compute in general, but relaxation techniques based on convex optimization can be devised that approximate the problem while covering the entire uncertainty set. An efficient trade-off between computational effort and accuracy is obtained by carefully exploiting the uncertainty structure in the relaxation process.

The idea of robustness is well known in the analysis and design of feedback control systems, where I and others have developed very effective relaxation methods based on Linear Matrix Inequalities and related convex optimization. Robust optimization is clearly relevant far beyond feedback control, for example to the combinatorial optimization problems with uncertain data that arise in communications network design. This opens new avenues of research in which seemingly very different problems are analyzed and solved in a unified framework.

My proposal centers on theory and algorithms for robust optimization: convex optimization relaxations, large-scale problems, accuracy/complexity trade-offs, and software. A specific target of this research is the development of various ellipsoidal approximation tools, with applications to data fitting and estimation, simulation, and control of systems with structured uncertainty. I also seek to apply these techniques to several important design problems in engineering, including timing analysis of RC circuits, filter and antenna array design, communications network design, and others. In addition to robust optimization tools and engineering applications, I plan to study several important problems in finance and management from the viewpoint of robustness and worst-case risk, notions that are beginning to attract great interest in the community. I intend to develop fast and efficient robust optimization tools specific to finance and management, based on ellipsoidal techniques for worst-case simulation and control.

My education plan aims at providing a solid theoretical and practical basis for our students in the area of optimization, emphasizing the crucial issues of uncertainty and robustness. This goal will be implemented through course development in two subjects: an advanced course in robust optimization and a basic graduate course in convex optimization. An important aspect of this plan is the development of user-friendly robust optimization software tools that complement those I have developed in the past for convex optimization.
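To make the unknown-but-bounded idea concrete, here is a minimal sketch (not code from the award itself) of a linear constraint made robust against ellipsoidal uncertainty, written with the off-the-shelf cvxpy modeling package; the problem data and the uncertainty bound rho are illustrative assumptions.

```python
# Minimal robust-LP sketch: minimize c'x subject to a'x <= b for EVERY
# perturbation of a within the ellipsoid ||delta||_2 <= rho
# (deterministic, unknown-but-bounded uncertainty).
# The worst case of (a + delta)'x over that ellipsoid is a'x + rho*||x||_2,
# so the robust counterpart is a single second-order cone constraint.
import cvxpy as cp
import numpy as np

np.random.seed(0)
n = 5
c = np.random.randn(n)       # illustrative objective
a = np.random.randn(n)       # nominal constraint data
b = 10.0
rho = 0.5                    # illustrative uncertainty bound

x = cp.Variable(n)
nominal = cp.Problem(cp.Minimize(c @ x),
                     [a @ x <= b, cp.norm(x, "inf") <= 1])
nominal.solve()
x_nom = x.value

robust = cp.Problem(cp.Minimize(c @ x),
                    [a @ x + rho * cp.norm(x, 2) <= b,
                     cp.norm(x, "inf") <= 1])
robust.solve()
x_rob = x.value

# The robust solution pays a small price in objective value in exchange
# for feasibility under every admissible perturbation of a.
print(c @ x_nom, c @ x_rob)
```

The second-order cone reformulation is exactly the kind of convex relaxation the abstract alludes to: the infinite family of perturbed constraints collapses into one tractable constraint.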
2006 — 2009
Jordan, Michael (co-PI); El Ghaoui, Laurent
Collaborative Research: MSPA-MCS: Sparse Multivariate Data Analysis @ University of California-Berkeley
This proposal develops and studies sparse variants of classic multivariate data analysis methods. It focuses primarily on sparse principal component analysis (PCA) and the related sparse canonical correlation analysis (CCA), but also intends to explore sparse variants of methods such as correspondence analysis and discriminant analysis. The motivation for developing sparse multivariate analysis algorithms is their potential for yielding statistical results that are more interpretable and more robust than classical analyses, while giving up as little as possible in statistical efficiency and expressive power. The investigators have derived a convex relaxation of sparse PCA as a large-scale semidefinite program. The proposed research first studies the theoretical and practical performance of this relaxation, as well as the computational complexity of solving large-scale instances of the corresponding semidefinite programs. It then focuses on extending these results to the other multivariate data analysis methods cited above.
Principal component analysis (PCA) is a classic statistical tool used to study experimental data with a very large number of variables (meteorological records, gene expression coefficients, the interest rate curve, social networks, etc.). It is primarily used as a dimensionality reduction tool: PCA produces a reduced set of synthetic variables that captures a maximum amount of information about the data. This makes it possible to represent data sets with thousands of variables on a three-dimensional graph while still capturing most of the features of the original data, making visualization and interpretation easier. Unfortunately, the key shortcoming of PCA is that these new synthetic variables are a weighted sum of all the original variables, making their physical interpretation difficult. The proposed research will study algorithms for sparse PCA, i.e., for computing new synthetic variables that are the weighted sum of only a few problem variables while keeping most of the features of the original data set. Sparse PCA is a hard combinatorial problem, but the investigators have produced a relaxation that can be solved efficiently using recent results in convex optimization. The investigators plan to study the theoretical and practical performance of this relaxation and to extend these results to other statistical methods.
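The following is a small-scale sketch in the spirit of the penalized semidefinite relaxation the abstract describes, written with the generic cvxpy package rather than the investigators' own code; the data matrix, penalty level rho, and threshold are illustrative assumptions.

```python
# Sketch of a semidefinite relaxation of sparse PCA:
#   maximize Tr(S X) - rho * sum_ij |X_ij|
#   subject to Tr(X) = 1, X positive semidefinite.
# rho trades variance explained against sparsity of the loadings.
import cvxpy as cp
import numpy as np

np.random.seed(0)
p = 10
A = np.random.randn(50, p)          # stand-in for real observations
S = np.cov(A, rowvar=False)         # p-by-p sample covariance matrix
rho = 0.5                           # illustrative sparsity penalty

X = cp.Variable((p, p), symmetric=True)
prob = cp.Problem(
    cp.Maximize(cp.trace(S @ X) - rho * cp.sum(cp.abs(X))),
    [cp.trace(X) == 1, X >> 0],
)
prob.solve()

# Recover an approximately sparse loading vector from the leading
# eigenvector of the solution, then zero out the tiny entries.
w, V = np.linalg.eigh(X.value)
loading = V[:, -1]
loading[np.abs(loading) < 1e-3] = 0.0
print(loading)
```

The relaxation replaces the combinatorial cardinality constraint with a convex l1-type penalty on the matrix variable, which is what makes large-scale instances tractable in principle.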
2008 — 2014
Yu, Bin; El Ghaoui, Laurent
CDI-Type II: Collaborative Research: Sparse Inference: New Tools For Structural Knowledge Discovery @ University of California-Berkeley
In most areas of human knowledge, the information revolution has resulted in a massive data explosion. The scale of data sets, their distributed and heterogeneous nature, and the need to quickly deliver results easily interpretable by non-experts raise new theoretical and computational challenges for statistical learning, from variable selection and structural inference to visualization and online learning. This project uses sparse statistical inference as a powerful approach to meet these challenges. Its key insight is that seeking sparsity is a meaningful way of simultaneously stabilizing inference procedures and highlighting structure in the underlying data. This work thus combines fundamental advances in sparse statistical learning with cutting-edge computational tools from mathematical programming to create a new framework for structural knowledge discovery in large-scale, streaming data sets.

The project is focused on two fundamental themes in sparse inference: variable selection and structural inference. Variable selection seeks to isolate a few key variables from high-dimensional data sets and is a fundamental preprocessing tool in statistical learning. Structural inference then aims to consistently identify a few core dependence relationships among these variables, to highlight the structure of the data. From a computational point of view, many recent results in machine learning have relied on advanced methods from convex optimization, such as semidefinite programming and robust optimization, and this project seeks to improve the complexity of these algorithms and their capacity to handle very large-scale, streaming data.

In practice, this project is motivated by the desire to help the public understand our democracies by analyzing large-scale political and social data sets, with a particular focus on voting records, online news sources, and polling data. Its approach is to apply statistical inference principles to the social sciences, using collaborations with experts in political science and economics to shape the models and techniques under study. In carrying out the research, this project will train graduate students from statistics and electrical/financial engineering to become interdisciplinary researchers at the interface of statistics, optimization, and subject-matter areas such as finance and political science. In addition, the PIs plan to develop a web site, accessible first to a restricted set of social science researchers, to let them analyze mid-sized corpora of online news in text format, in the form of, say, sparse graphs of words showing statistical associations between given keywords. The PIs also plan to develop a software toolbox implementing these results, interfaced with common numerical packages such as MATLAB, R, or Python, as well as an undergraduate course on "Statistical Analysis of Online Data" at Berkeley and Princeton, incorporating some of the material produced in this project.
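As one concrete instance of the structural-inference step, here is a minimal sketch that estimates a sparse set of conditional-dependence relationships with the graphical lasso; it uses scikit-learn's stock estimator rather than the toolbox proposed here, and the data matrix and regularization level alpha are illustrative assumptions.

```python
# Sketch of sparse structural inference: fit a sparse inverse covariance
# (graphical lasso), whose nonzero off-diagonal entries define a graph of
# conditional-dependence relationships among the variables.
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 8))   # stand-in for a real data matrix

model = GraphicalLasso(alpha=0.2).fit(X)   # alpha controls sparsity
precision = model.precision_

# Nonzero off-diagonal entries are the retained dependence edges;
# a zero means the pair is conditionally independent given the rest.
edges = np.argwhere(np.triu(np.abs(precision) > 1e-4, k=1))
print(edges)
```

The l1 penalty on the precision matrix is what delivers the "few core dependence relationships" the abstract describes, with the graph read directly off the sparsity pattern.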
2008 — 2013
El Ghaoui, Laurent; D'Aspremont, Alexandre
CDI-Type II: Collaborative Research: Sparse Inference: New Tools For Structural Knowledge Discovery
(Abstract identical to the collaborative CDI-Type II award listed above.)
2010 — 2013
El Ghaoui, Laurent
Collaborative Research: Mathematical Programming For Streaming Data @ University of California-Berkeley
A large amount of data is now easily accessible in real time in a streaming fashion: news, traffic, temperature, and other physical measurements sent by sensors on cell phones. Applying statistical and machine-learning methods to these streaming data sets offers tremendous opportunities for a better real-time understanding of complex physical, social, and economic phenomena. Such algorithms could be used, for example, to understand trends in how news media cover certain topics and how those trends evolve over time, or to track incidents in transportation networks.
Unfortunately, most algorithms for large-scale data analysis are not designed for streaming data; typically, adding data points (representing, say, today's batch of news articles from the Associated Press) requires re-solving the entire problem. In addition, many of these algorithms require the whole data set under consideration to be stored in one place. These constraints make classical methods impractical for modern, live data sets.
This project's focus is on optimization algorithms designed to work in online mode, allowing faster, possibly real-time, updating of solutions when new data or constraints are added to the problem. Efficient online algorithms are currently known for only a few special cases. Using homotopy methods and related ideas, this work will seek to enable online updating for a host of modern data analysis problems. A special emphasis will be put on problems involving sparsity or grouping constraints; such constraints are important, for example, for identifying the few key features in a data set that explain most of the changes in the data. These new online algorithms will be amenable to distributed implementations, allowing parts of the data to be stored on different servers.
These methods will be applied to streaming news data coming from major US media, and also to the problem of online detection, which arises when tracking some important signal over, say, a communication network, in an online fashion.
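The sketch below illustrates the flavor of online updating described above: when a new batch of observations arrives, the lasso is re-solved starting from the previous solution rather than from scratch. scikit-learn's warm-started coordinate-descent solver stands in for the homotopy algorithms the project actually studies; all data and parameters are illustrative.

```python
# Sketch of online updating for a sparse regression problem: re-fit the
# lasso from the previous coefficients (warm start) when new data arrive,
# instead of restarting the solver from zero.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))   # data seen so far
y = rng.standard_normal(100)

model = Lasso(alpha=0.1, warm_start=True)
model.fit(X, y)                      # initial solve

# A new batch arrives (say, today's articles); append and re-fit.
X_new = rng.standard_normal((10, 20))
y_new = rng.standard_normal(10)
X = np.vstack([X, X_new])
y = np.concatenate([y, y_new])
model.fit(X, y)                      # resumes from the previous coef_

print(np.flatnonzero(model.coef_))   # indices of retained features
```

A true homotopy method goes further, tracing the solution path exactly as each data point is added; the warm start above only approximates that behavior but shows why incremental updates can be far cheaper than full re-solves.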
2012 — 2017
Huang, Haiyan (co-PI); El Ghaoui, Laurent; Yu, Bin; El Karoui, Noureddine (co-PI); Bickel, Peter
FRG: Collaborative Research: Unified Statistical Theory For the Analysis and Discovery of Complex Networks @ University of California-Berkeley
The investigators will develop a unified nonparametric theoretical framework to study stochastic network models and will design scalable algorithms (and software) to fit these models. They intend to carry out the development and validation of their methods with collaborators in biology who have gathered extensive new data for the assessment of protein structure and the determination of biological pathways, particularly in Drosophila. Such problems are omnipresent in genomics, and the investigators expect their methods to carry over widely. They will also use networks with uncertainty measures to study relationships between words and phrases in a newspaper database, in order to provide media analysts with automatic and scalable algorithms. In all cases they will focus on methods with general statistical confidence guarantees, which have been lacking in work so far.
Our world is connected through relationships among "actors" who can be people, organizations, words, genes, proteins, and more. Advances in information technology have enabled the collection of massive amounts of data in all disciplines from which to infer relationships between these actors. These relationships can be effectively described as networks, and properties or patterns in these networks can be either random noise or genuine knowledge. Responding to this recent data availability and the huge potential for knowledge discovery, research on networks is attracting much attention from researchers in physics, social science, computer science, and probability. While contributing to the development of core statistical research, the proposed research will directly impact the interdisciplinary field of network analysis and the study of complex networks. The applications of the research results are diverse and extend well beyond the two fields studied in the proposal, genomics and media analysis; they include national security, communications, sociology, political science, and infectious disease. The statistical tools developed are unifying and could change how many scientists approach network analysis. As a result, statistical research will become more prominent in the networks community.
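As a toy illustration of the stochastic network models mentioned above, the sketch below fits a two-block stochastic block model by spectral clustering of a synthetic graph. This is a generic textbook recipe, not the unified framework proposed here; the graph size and connection probabilities are illustrative assumptions.

```python
# Sketch: recover community structure in a two-block stochastic block
# model by clustering the bottom eigenvectors of the graph Laplacian.
import numpy as np
from scipy.sparse.csgraph import laplacian
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n, k = 100, 2
labels_true = rng.integers(0, k, size=n)

# Edge probability 0.3 within a block, 0.05 across blocks (illustrative).
P = np.where(labels_true[:, None] == labels_true[None, :], 0.3, 0.05)
A = (rng.random((n, n)) < P).astype(float)
A = np.triu(A, 1)
A = A + A.T                       # symmetric adjacency, no self-loops

# Spectral embedding: bottom eigenvectors of the normalized Laplacian.
L = laplacian(A, normed=True)
w, V = np.linalg.eigh(L)
embedding = V[:, :k]

labels_hat = KMeans(n_clusters=k, n_init=10).fit_predict(embedding)
print(labels_hat[:10], labels_true[:10])
```

A point estimate like this is where classical network analysis stops; the confidence statements the abstract calls for would quantify how reliable such recovered block assignments are.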