2009 — 2015
Nguyen, Xuanlong; Clark, James (co-PI); Gelfand, Alan; Agarwal, Pankaj
CDI-Type II: Integrating Algorithmic and Stochastic Modeling Techniques For Environmental Prediction
Predicting biodiversity, i.e., the abundance of species, in response to climate change is a goal of environmental change research. Despite recent valuable advances in understanding biodiversity and climate, current understanding remains limited. There are two widely recognized obstacles: first, because of the complexity of the underlying processes, the existing models intended for understanding and prediction are not computationally scalable. Second, coarse-scale environmental models fail to capture the interactions among species that control biodiversity, while models based on fine-scale, short-term observations are unable to make long-term predictions. This project aims to develop a prediction framework that coherently combines broad-scale pattern data with fine-scale data on species interactions and that is computationally scalable. It focuses on prediction at the geographic scale and on using geographic-scale data to improve understanding at the scales where species interactions occur.
The goal is to develop a multiscale modeling framework and to design algorithms that make environmental models computationally scalable. The approach hinges upon strong interplay of algorithmic and statistical techniques. Statistical inference brings stochastic modeling sophistication in space and time, yielding improved characterization of the process and the possibility of full inference. Sophisticated algorithms make models and processes scalable and provide trade-offs between accuracy and efficiency.
The project draws on a wide range of topics in computer science and statistics, including geometric algorithms, approximation algorithms, hierarchical specifications within a Bayesian framework, and space-time process modeling. The problem areas addressed in the proposed prototypical example point to broadly applicable and consequential challenges for both computer science and statistics. These include maintaining/updating distributions and summaries, dynamic algorithms, data-driven algorithms, stochastic optimization, and assessing uncertainty and multi-scale nonlinear interactions in inference. Techniques for obtaining trade-offs between conflicting goals are needed in order to optimize the overall performance of the model.
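As a concrete but hedged illustration of the hierarchical Bayesian specifications mentioned above, the sketch below fits a two-level Gaussian model in which site-level abundance means are partially pooled toward a regional mean via conjugate Gibbs sampling; the synthetic data, the fixed variances, and the model itself are illustrative assumptions, not the project's actual multiscale framework.

```python
# A minimal sketch, assuming a two-level Gaussian hierarchy with known variances.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: an abundance index observed at S sites, n plots per site.
S, n = 20, 15
true_mu, true_tau, sigma = 5.0, 1.0, 2.0
site_means = rng.normal(true_mu, true_tau, size=S)
y = rng.normal(site_means[:, None], sigma, size=(S, n))

def gibbs(y, sigma=2.0, tau=1.0, mu0=0.0, tau0=10.0, iters=2000):
    """Gibbs sampler for site means theta_s and regional mean mu (variances fixed)."""
    S, n = y.shape
    ybar = y.mean(axis=1)
    mu = ybar.mean()
    mu_draws = []
    for _ in range(iters):
        # theta_s | mu, y: site averages shrunk toward the regional mean.
        prec = n / sigma**2 + 1.0 / tau**2
        theta = rng.normal((n * ybar / sigma**2 + mu / tau**2) / prec,
                           np.sqrt(1.0 / prec))
        # mu | theta: regional mean given the current site-level effects.
        prec_mu = S / tau**2 + 1.0 / tau0**2
        mu = rng.normal((theta.sum() / tau**2 + mu0 / tau0**2) / prec_mu,
                        np.sqrt(1.0 / prec_mu))
        mu_draws.append(mu)
    return np.array(mu_draws)

mu_draws = gibbs(y)
print("posterior mean of the regional abundance level:", mu_draws[500:].mean())
```

The partial pooling step is the basic hierarchical mechanism at work here: sites with few or noisy observations borrow strength from the regional level.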
2010 — 2013
Michalak, Anna; Scott, Clayton (co-PI); Cafarella, Michael; Nguyen, Xuanlong; Yadav, Vineet (co-PI)
SI2-SSI: Real-Time Large-Scale Parallel Intelligent CO2 Data Assimilation System @ University of Michigan Ann Arbor
Intellectual Merit: This project creates a state-of-the-art autonomous software platform for real-time integration of in-situ and satellite-based atmospheric CO2 measurements within a Data Assimilation (DA) system, producing estimates of global land and oceanic CO2 exchange at weekly to bi-weekly intervals. The proposed software infrastructure will be capable of autonomously processing large volumes of data through a multi-stage pipeline, without the delays conventionally associated with such processing. Within the DA component, we will provide options for multiple DA algorithms for estimating global CO2 exchange. Users will, for the first time, be able to use these multiple methods as part of a single system to compare estimates of CO2 exchange and to obtain an improved understanding of the relative advantages of the various DA methods. As part of the analysis component of the software, we will build a carbon-climate surveillance system drawing on a range of techniques in pattern recognition and high-dimensional statistical inference. This system will be able to detect and analyze localized variations in CO2 exchange within any user-specified spatio-temporal window. In addition, summaries of CO2 exchange will be provided at annual and monthly temporal scales for continents and countries.
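As a hedged illustration of the Bayesian update at the heart of such a data assimilation component, the sketch below performs a linear Gaussian synthesis inversion on synthetic CO2 fluxes; the transport matrix H, the covariances Q and R, and the problem dimensions are assumptions made for illustration and do not come from the project's system.

```python
# A minimal sketch (not the project's software): a linear Bayesian synthesis
# inversion of the kind used in atmospheric CO2 flux estimation. Fluxes x have
# a Gaussian prior; observations satisfy y = H x + noise, with H an assumed
# transport/footprint matrix. All matrices here are synthetic assumptions.
import numpy as np

rng = np.random.default_rng(1)
n_flux, n_obs = 50, 200                      # flux unknowns, observations

H = rng.normal(size=(n_obs, n_flux))         # assumed transport operator
x_true = rng.normal(0.0, 1.0, size=n_flux)   # "true" fluxes (synthetic)
R = 0.5**2 * np.eye(n_obs)                   # observation error covariance
Q = 1.0**2 * np.eye(n_flux)                  # prior flux error covariance
x_prior = np.zeros(n_flux)
y = H @ x_true + rng.normal(0.0, 0.5, size=n_obs)

# Posterior mean and covariance of the fluxes (standard Gaussian update).
K = Q @ H.T @ np.linalg.inv(H @ Q @ H.T + R)     # gain matrix
x_post = x_prior + K @ (y - H @ x_prior)
P_post = (np.eye(n_flux) - K @ H) @ Q

print("prior RMSE:    ", np.sqrt(np.mean((x_prior - x_true) ** 2)))
print("posterior RMSE:", np.sqrt(np.mean((x_post - x_true) ** 2)))
print("avg posterior variance:", np.trace(P_post) / n_flux)
```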
Broader Impacts: This software can be used by researchers and governmental institutions for evaluating both the natural components of the carbon cycle and anthropogenic carbon emissions, as well as in the design of new satellites for improved monitoring of CO2. All data and software will be publicly available and open-source development platforms will be used whenever possible. The algorithm prototypes developed as part of this project will be used in undergraduate and graduate courses at the University of Michigan, and will be made available online for educators at other institutions. Finally, the project will train three graduate students, with a focus on developing their cross-disciplinary skills in the fields of Earth science, statistics, computer science, and atmospheric science.
2011 — 2014
Nguyen, Xuanlong |
CIF: Collaborative Research: Small: Distributed Detection Algorithms and Stochastic Modeling For Large Monitoring Sensor Networks @ University of Michigan Ann Arbor
Operational and safety goals for the built environment demand robust, scalable and reliable large-scale monitoring of infrastructure systems. High-performance real-time event detection and decision making requires models and algorithms that can process large amounts of data from dense sensor networks deployed in these systems. Despite advances in the development of detection algorithms for such networks, there are two widely recognized and conflicting obstacles: detection rules need to be sufficiently complex to adapt to spatiotemporal changes in the environment, which requires sharing data; yet the rules are constrained by statistical performance guarantees and by the computation and communication budgets imposed by the network. This project addresses these challenges by developing a fundamentally new approach that jointly accounts for statistical detection, communication constraints and distributed computation. The research develops a framework that integrates the distributed computation and communication constraints of the underlying network infrastructure with flexible stochastic modeling and learning algorithms for spatiotemporal data. The models and algorithms enable simultaneous and sequential decision making at many local sites by borrowing information across the network in a statistically coherent and computationally efficient manner. Combining the formalisms of sequential change point detection, nonparametric and probabilistic graphical models, and spatiotemporal statistics, the project develops distributed, sequential message-passing algorithms for detecting changes in the underlying distributions generating network data. The models developed also offer new theoretical understanding of the trade-offs between statistical model complexity, distributed computation efficiency, and the structure of communication constraints within the network.
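As a small, hedged illustration of one ingredient named above, sequential change point detection, the sketch below runs per-sensor CUSUM statistics for a shift in mean and fuses them with a naive sum rule; the shift sizes, the threshold, and the fusion rule are illustrative assumptions rather than the project's distributed message-passing algorithms.

```python
# A minimal sketch, assuming Gaussian sensor readings with a known mean shift.
import numpy as np

rng = np.random.default_rng(2)

def cusum(x, mu0=0.0, mu1=1.0, sigma=1.0):
    """Running CUSUM log-likelihood-ratio statistic for a mean shift mu0 -> mu1."""
    llr = (mu1 - mu0) / sigma**2 * (x - (mu0 + mu1) / 2.0)
    s = np.zeros(len(x))
    for t in range(1, len(x)):
        s[t] = max(0.0, s[t - 1] + llr[t])
    return s

# Three sensors, change at t = 300 affecting all of them weakly.
T, change = 600, 300
sensors = [np.concatenate([rng.normal(0.0, 1.0, change),
                           rng.normal(0.5, 1.0, T - change)]) for _ in range(3)]

local = np.array([cusum(x, mu1=0.5) for x in sensors])
fused = local.sum(axis=0)                       # naive fusion of local statistics
threshold = 15.0                                # illustrative threshold
alarm = int(np.argmax(fused > threshold)) if (fused > threshold).any() else None
print("fused alarm raised at t =", alarm, "(true change at t = 300)")
```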
This interdisciplinary research brings together students and researchers from different areas, utilizing and developing knowledge and cross-disciplinary skills in the fields of computer science, statistics, signal processing and civil engineering.
2014 — 2019
Nguyen, Xuanlong |
CAREER: Geometric Approaches to Hierarchical and Nonparametric Model-Based Inference @ University of Michigan Ann Arbor
Hierarchical and nonparametric models present some of the most fundamental and powerful tools in modern statistics. Despite valuable advances made in the past decades, there are several widely recognized and emerging problems. First, even as these models are increasingly applied to large data sets and complex domains, a statistical theory for inferential behaviors of the hierarchy of latent variables present in the models is not yet available. Second, local inference methods based on sampling, although simple to derive, tend to converge too slowly, thereby losing their effectiveness. Third, most existing methods are incapable of handling highly distributed data sources, which are increasingly responsible for the influx of big data. Addressing these challenges requires fundamentally new ideas in theory, modeling and algorithms that must account for the contrast and interplay between the global geometry of an inference problem and the need for decentralization of inference and algorithmic implementation. This project aims to make fundamental contributions toward advancing hierarchical model-based inference. They include a statistical theory for the latent hierarchy of variables and for analyzing the effects of transfer learning. They also include variational inference algorithms based on the global geometry of latent structures and geometric analyses of the tradeoffs between statistical and computational efficiencies. Both the algorithms and theory are unified by the use of Wasserstein geometry, which arises from the mathematical theory of optimal transportation. Moreover, scalable hierarchical models will be developed that can exploit highly distributed data sources and decentralized inference architectures.
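As a hedged, self-contained example of the Wasserstein geometry referenced above, the sketch below computes the 2-Wasserstein distance between two discrete mixing measures (mixture-component locations with weights) by solving the optimal transport linear program directly; the measures are synthetic and the computation is generic, not the project's methodology.

```python
# A minimal sketch, assuming small discrete mixing measures so that the optimal
# transport problem can be solved exactly as a linear program.
import numpy as np
from scipy.optimize import linprog

def wasserstein2(atoms_p, w_p, atoms_q, w_q):
    """W2 distance between sum_i w_p[i] delta_{atoms_p[i]} and sum_j w_q[j] delta_{atoms_q[j]}."""
    atoms_p, atoms_q = np.atleast_2d(atoms_p), np.atleast_2d(atoms_q)
    m, n = len(w_p), len(w_q)
    # Squared Euclidean cost between every pair of atoms.
    C = ((atoms_p[:, None, :] - atoms_q[None, :, :]) ** 2).sum(-1)
    # Marginal constraints: rows of the coupling sum to w_p, columns to w_q.
    A_eq = np.zeros((m + n, m * n))
    for i in range(m):
        A_eq[i, i * n:(i + 1) * n] = 1.0
    for j in range(n):
        A_eq[m + j, j::n] = 1.0
    b_eq = np.concatenate([w_p, w_q])
    res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    return np.sqrt(res.fun)

# Two Gaussian-mixture mixing measures: component means with mixing weights.
d = wasserstein2(np.array([[0.0], [3.0]]), np.array([0.5, 0.5]),
                 np.array([[0.2], [2.5], [6.0]]), np.array([0.4, 0.4, 0.2]))
print("W2 between mixing measures:", round(float(d), 3))
```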
This research will improve our ability to manage, analyze and make decisions with large-scale, high-dimensional and complex data, especially in research and applications involving networks and the environment. The decentralized detection algorithms for highly distributed data sources have the potential to advance state-of-the-art technologies that support data-driven and high-performance distributed computing architectures. As such, this research has the potential to extend the capabilities of the real-time detection and tracking devices currently deployed in the health-care and security domains. The optimal transport based theory will deepen our understanding of hierarchical Bayesian inference, a fundamental concept of modern statistics. The algorithms and geometric analyses will provide useful tradeoffs between statistical and computational complexity, an important issue lying at the interface of statistics and computer science. This research will also provide support for broadening the current statistics curriculum at the University of Michigan. The PI will integrate the teaching of statistical and computational tools with modern applications by developing synthesis courses that interact closely with the research topics of the project. This provides an excellent opportunity to train students with a broad base of knowledge and cross-disciplinary skills in the fields of statistics, probability, machine learning, distributed computation and networked systems.
2014 — 2018
Nguyen, Xuanlong |
TWC: Medium: Collaborative: Data Is Social: Exploiting Data Relationships to Detect Insider Attacks @ University of Michigan Ann Arbor
Insider attacks present an extremely serious, pervasive and costly security problem in critical domains such as national defense and the financial and banking sector. Accurate insider threat detection has proved to be a very challenging problem. This project explores detecting insider threats in a banking environment by analyzing database searches.
This research addresses the challenge by formulating and devising machine learning-based solutions to the insider attack problem on relational database management systems (RDBMS), which are ubiquitous and highly susceptible to insider attacks. In particular, the research uses a new general model for database provenance, which captures the data values accessed or modified by a user's activity and summarizes both the computational path and the underlying relationships between those data values. The provenance model leads naturally to modeling user activities by labeled hypergraph distributions and by a Markov network whose factors represent the data relationships. The key tradeoff studied theoretically is between the expressivity and the complexity of the provenance model. The research results are validated and evaluated through close collaboration with a large financial institution to build a prototype insider threat detection engine operating on its existing operational RDBMS. In particular, with the help of the security team from the financial institution, the research team addresses database performance, learning scalability, and software tool development issues arising during the evaluation and deployment of the system. Research results are reported via technical papers and disseminated through conferences and journals, through a new research webpage at UB's NSA- and DHS-certified center of excellence (CAE) in Information Assurance, and at the center's future workshops.
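The sketch below is a heavily simplified, hypothetical stand-in for the provenance-based modeling described above: each user session is reduced to a bag of accessed relations, a smoothed per-user multinomial baseline is learned from history, and sessions with unusually low likelihood are flagged; the labeled-hypergraph and Markov-network machinery of the actual model is not represented.

```python
# A minimal sketch, assuming sessions can be summarized as bags of relation names.
import numpy as np
from collections import Counter

def fit_baseline(sessions, alpha=1.0):
    """Smoothed multinomial over relation names from a user's historical sessions."""
    counts = Counter(rel for s in sessions for rel in s)
    relations = sorted(set(counts) | {"<unseen>"})
    total = sum(counts.values()) + alpha * len(relations)
    return {r: (counts.get(r, 0) + alpha) / total for r in relations}

def session_score(session, baseline):
    """Average log-likelihood of a session under the baseline; lower means more anomalous."""
    logp = [np.log(baseline.get(rel, baseline["<unseen>"])) for rel in session]
    return float(np.mean(logp))

history = [["accounts", "customers"], ["accounts", "transactions"],
           ["customers", "accounts", "transactions"]]
baseline = fit_baseline(history)

normal = ["accounts", "customers"]
suspicious = ["payroll", "hr_records", "payroll"]   # relations never touched before
print("normal session score:    ", round(session_score(normal, baseline), 2))
print("suspicious session score:", round(session_score(suspicious, baseline), 2))
```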
2020 — 2023
Nguyen, Xuanlong |
Parameter Estimation Theory and Algorithms Under Latent Variable Models and Model Misspecification @ Regents of the University of Michigan - Ann Arbor
Latent variable models have become one of the most powerful tools in modern statistics and data science. They are indispensable in the core data-driven technologies responsible for advancing a vast array of domains of engineering and the sciences. While these tools represent remarkable achievements in which statisticians have played fundamental and decisive roles, there are urgent and formidable challenges ahead. As these tools are increasingly applied to ever bigger data sets and systems, there are deep concerns that they may no longer be well understood, nor their construction and deployment reliable or robust. When treated as merely black-box modeling devices for fitting densities and curves, latent variable models are difficult to interpret, and it can be hard to detect or fix problems when something goes wrong, whether the model is severely misspecified or the learning algorithms simply break down. This project aims to address the theoretical and computational issues that arise in modern latent variable models, and the learning efficiency and interpretability of such statistical models when they are misspecified.
The goals of this project are to develop new methods, algorithms and theory for latent variable models. There are three major aims: (1) a statistical theory for parameter estimation in latent variable models; (2) scalable parameter learning algorithms that account for the geometry of the latent structures, as well as the geometry of the data representation arising from specific application domains; and (3) an understanding of the impacts of model misspecification on parameter estimation, motivating the development of new methods. These three broadly described aims are partly motivated by the PI's collaborative efforts with scientists and engineers in several data-driven domains, namely intelligent transportation, astrophysics and topic modeling for information extraction. In all these domains, latent variable models are favored as an effective approximation device, but practitioners are interested not only in predictive performance but also in interpretability. In terms of methods and tools, this research draws from and contributes to several related areas, including statistical learning, nonparametric Bayesian statistics and non-convex optimization. In terms of broader impacts, the development of scalable geometric and variational inference algorithms for latent variable models will help expand the statistical and computational toolbox that is indispensable in the analysis of complex and big data. The investigation into the geometry of singularity structures and the role of optimal transport based theory in the analysis of models and the development of algorithms will help accelerate cross-fertilization between statistics and mathematics, computer science and operations research. In terms of education and training, the interdisciplinary nature of this project provides an exciting opportunity to attract and train a generation of researchers and students in variational methods and optimization, statistics and mathematics, as well as machine learning and intelligent infrastructure. The materials developed in this project will be integrated into an undergraduate honors course and a summer school for statistical science and big data analytics developed at the University of Michigan.
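As a textbook illustration of parameter estimation in a latent variable model, and not a representation of the project's methods, the sketch below runs the EM algorithm for a two-component univariate Gaussian mixture on synthetic data; all parameter values are arbitrary choices for the example.

```python
# A minimal sketch, assuming a two-component Gaussian mixture in one dimension.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(2.0, 1.0, 700)])

# Initial guesses for mixing weights, means, and standard deviations.
w, mu, sd = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])

for _ in range(100):
    # E-step: posterior responsibility of each component for each point.
    dens = w * norm.pdf(x[:, None], mu, sd)          # shape (n, 2)
    resp = dens / dens.sum(axis=1, keepdims=True)
    # M-step: responsibility-weighted maximum-likelihood parameter updates.
    n_k = resp.sum(axis=0)
    w = n_k / len(x)
    mu = (resp * x[:, None]).sum(axis=0) / n_k
    sd = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / n_k)

print("weights:", w.round(2), "means:", mu.round(2), "sds:", sd.round(2))
```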
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.