2012 — 2017 |
Dasgupta, Sanjoy (co-PI); Freund, Yoav; Chaudhuri, Kamalika |
N/A |
RI: Medium: Quantifying and Utilizing Confidence in Machine Learning @ University of California-San Diego
This project defines meaningful notions of confidence in prediction, designs procedures for computing such notions, and applies these procedures to core machine learning tasks such as active learning, crowd-sourced learning, and tracking. In many applications it is helpful to have classifiers that output, together with each prediction, a rating of the confidence that the prediction is in fact correct. Existing literature either provides various ad hoc ways of computing such ratings, which typically lack a rigorous mathematical footing, or provides mathematically consistent methods (in the Bayesian framework) for computing confidence ratings under very strong assumptions that are unlikely to hold in practice. The research team investigates methods of computing measures of confidence that are mathematically rigorous while making minimal assumptions on the way data is generated, and uses these measures to further develop solutions to core machine learning tasks.
Defining and computing mathematically sound measures of confidence lies at the heart of machine learning, pattern recognition and uncertainty in AI. Confidence-rated prediction, active learning, and tracking are fundamental tasks of machine learning and statistics that arise repeatedly in large-scale problems; this project will develop rigorous solutions to these problems. The algorithms developed in this work are tested and used in the Automatic Cameraman project, an interactive, audio-visual installation in the UCSD Computer Science department. The interactive Automatic Cameraman system is used as an educational tool that can be extended in many different directions by teams of students at a variety of skill levels.
|
0.976 |
2013 — 2018 |
Chaudhuri, Kamalika |
N/A |
CAREER: Differentially-Private Machine Learning With Applications to Biomedical Informatics @ University of California-San Diego
Machine learning on large-scale patient medical records can lead to the discovery of novel population-wide patterns enabling advances in genetics, disease mechanisms, drug discovery, healthcare policy, and public health. However, concerns over patient privacy prevent biomedical researchers from running their algorithms on large volumes of patient data, creating a barrier to important new discoveries through machine learning.
The goal of this project is to address this barrier by developing privacy-preserving tools to query, cluster, classify and analyze medical databases. In particular, the project aims to ensure differential privacy, a formal mathematical notion of privacy designed by cryptographers that has gained considerable attention in the systems, algorithms, machine-learning and data-mining communities in recent years. The primary challenge in applying differentially-private machine learning tools to biomedical informatics is their lack of statistical efficiency, that is, the large number of samples they require.
The project will overcome this challenge by drawing on insights obtained from the PI's expertise to develop differentially-private and highly statistically-efficient machine learning tools for classification and clustering. The proposed research will advance the state-of-the-art in privacy-preserving data analysis by combining insights from differential privacy, statistics, machine learning, and database algorithms.
The proposed research is closely tied to the development of the undergraduate and graduate curricula at UCSD, feeding into the PI's new undergraduate machine learning class, a new graduate learning theory class, and updates to an algorithm design and analysis class. The corresponding materials will be publicly disseminated through the PI's website. The PI is strongly committed to increasing the participation of women and minorities, and will engage in outreach activities to attract and retain women in computer science.
|
0.976 |
2016 — 2019 |
Chaudhuri, Kamalika |
N/A |
RI: Small: Collaborative Research: New Directions in Spectral Learning With Applications to Comparative Epigenomics @ University of California-San Diego
The goal of this project is to design algorithms and statistical tools to build complex probabilistic models from massive quantities of data in a computationally efficient manner. This work is motivated by an important current problem in genomics, namely comparative epigenetics. While every cell in an organism has the same DNA sequence, epigenetic marks on the genome are known to be highly correlated with variation between cells. A pressing question in biology is to compare the epigenetic marks across different cell types to understand these differences. While massive amounts of data have been generated for this purpose, there is a great need for computational tools that can operate on this data and provide biologically meaningful solutions. This work will thus advance the state-of-the-art in the analysis of large complex data sets and advance the field of epigenomics. The broader impact of the work includes organizing workshops and tutorials at machine learning and bioinformatics venues, involving undergraduate students in research, and releasing open source software for the community.
Specifically, this project will focus on spectral learning, which has recently provided principled and computationally efficient methods for learning parameters of probabilistic graphical models. While spectral learning methods are known for some simple latent variable models, a major barrier to realizing the potential of spectral learning in real-world applications is the lack of associated statistical tools such as regularization and hypothesis testing that connect these methods in a principled manner to end-to-end application frameworks. This project proposes to develop such statistical tools by integrating modern spectral learning with the classical statistical literature in econometrics on the Generalized Method of Moments. The project proposes to formulate generalized method of moments procedures for complex graphical models in the context of spectral learning as constrained optimization problems, and proposes ways of solving these problems. Finally, the novel algorithms developed will be directly applied to model epigenomics data sets from the ENCODE and Roadmap Epigenomics Projects to yield methods that can operate on the massive quantities of data and provide biologically meaningful solutions. These algorithms and software have the potential to have a widespread impact on the understanding of complex human diseases such as cancer and mental disorders. This will provide a basis for designing therapeutics for these diseases and advance society towards a future of Personalized Medicine.
|
0.976 |
2017 — 2020 |
Javidi, Tara (co-PI); Chaudhuri, Kamalika |
N/A |
CCF: CIF: Small: Interactive Learning From Noisy, Heterogeneous Feedback @ University of California-San Diego
The goal of this project is to develop interactive learning frameworks and methods that can learn predictors based on complex, imperfect feedback adaptively solicited in an on-line fashion from human annotators. Such predictors can significantly benefit the practice of machine learning by making it more accessible in domains where annotations are expensive. Currently, beyond a handful of heuristic studies, the only well-understood interactive learning setting is active binary classification, where a single annotator interactively provides labels to a learning algorithm. The main challenge in exploiting richer feedback is that human responses are inherently inconsistent and imperfect. This project will overcome this challenge by assuming that the responses come from unknown probability distributions with some mild yet realistic properties, which will be exploited to provide methods that can learn reliably from complex feedback.
Specifically, this project will introduce a general framework for interactive learning from imperfect, complex feedback, and develop methods for three common cases: (1) Active Learning with Abstention Feedback, where annotators can either provide a label or declare "I Don't Know"; (2) Active Learning for Multiclass Classification, where the goal is to learn a classifier for a large number of classes; and (3) Active Learning with Feedback from Multiple Annotators, where the goal is to combine feedback from many labelers with varying amounts of expertise subject to a budget. These problems will be approached through two main tools: adaptive hypothesis testing and surrogate loss minimization. Combining these approaches will lead to principled algorithms for building accurate machine learning predictors with low annotation cost, which, in turn, will benefit the practice of machine learning in domains where annotated data is expensive.
|
0.976 |
2018 — 2023 |
Chaudhuri, Kamalika |
N/A |
SaTC: CORE: Frontier: Collaborative: End-to-End Trustworthiness of Machine-Learning Systems @ University of California-San Diego
This frontier project establishes the Center for Trustworthy Machine Learning (CTML), a large-scale, multi-institution, multi-disciplinary effort whose goal is to develop scientific understanding of the risks inherent to machine learning, and to develop the tools, metrics, and methods to manage and mitigate them. The center is led by a cross-disciplinary team developing unified theory, algorithms and empirical methods within complex and ever-evolving ML approaches, application domains, and environments. The science and arsenal of defensive techniques emerging within the center will provide the basis for building future systems in a more trustworthy and secure manner, as well as fostering a long-term community of research within this essential domain of technology. The center has a number of outreach efforts, including a massive open online course (MOOC) on this topic, an annual conference, and broad-based educational initiatives. The investigators continue their ongoing efforts at broadening participation in computing via a joint summer school on trustworthy ML aimed at underrepresented groups, and by engaging in activities for high school students across the country via a sequence of webinars advertised through the She++ network and other organizations.
The center focuses on three interconnected and parallel investigative directions that represent the different classes of attacks on ML systems: inference attacks, training attacks, and abuses of ML. The first direction explores inference-time security, namely methods to defend a trained model from adversarial inputs. This effort emphasizes developing formally grounded measurements of robustness against adversarial examples (defenses), as well as understanding the limits and costs of attacks. The second research direction aims to develop rigorously grounded measures of robustness to attacks that corrupt the training data and new training methods that are robust to adversarial manipulation. The final direction tackles the general security implications of sophisticated ML algorithms including the potential abuses of generative ML models, such as models that generate (fake) content, as well as data mechanisms to prevent the theft of a machine learning model by an adversary who interacts with the model.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
|
0.976 |
2019 — 2023 |
Chaudhuri, Kamalika; Riek, Laurel; Twamley, Elizabeth (co-PI) |
N/A |
SCH: INT: Tailored: Training for Independent Living Through Observant Robots and Design @ University of California-San Diego
The goal of this project is to create human-centered robotics technology to provide personalized neurorehabilitation to support older adults with mild cognitive impairment (MCI). The project investigates innovative approaches to this research area, and will make the following contributions to smart and connected health: create new approaches to support longitudinal, personalized robot learning in real-world environments; pioneer novel methods for delivering and sustaining cognitive neurorehabilitation, currently one of the only known treatments to prolong independence and slow the onset of disability caused by MCI; and contribute new methods to the fields of human-robot interaction (HRI), aging science, and behavioral science to support the co-creation of new technologies and new intervention delivery methods with older adults with cognitive impairments, their caregivers, and their providers. Harnessing technology to provide cognitive support and rehabilitation for older adults could potentially assist millions of people to maintain or improve their functioning and quality of life, and maintain their ability to live independently. Ultimately, these improvements could alleviate significant human suffering and lower healthcare costs for millions of people.
This project will inform multiple key research questions including: uncovering new methods for longitudinal preference learning, particularly with regard to how time-varying contextual bandits under concept drift can be employed across multimodal datasets; identifying principles for engaging in community-focused, stakeholder-centered research with people with MCI, with a particular focus on designing for resilience and autonomy; discovering how hybrid approaches to cognitive training delivered via a robot can inform cognitive functioning; and exploring how to design interventions for sustainability, both for people with MCI and other populations. The project team will also engage the public in intergenerational research between older adults and college students, empower older adults with cognitive impairments and their family members by giving them a voice in technology creation, recruit research students from groups underrepresented in computing and behavioral science, and broadly disseminate the research via publications, presentations, and publicly available software frameworks with models, algorithms, and evaluation metrics.
|
0.976 |
2022 — 2027 |
Dasgupta, Sanjoy (co-PI); Wang, Yusu (co-PI); Chaudhuri, Kamalika; Mazumdar, Arya (co-PI); Saha, Barna |
N/A |
Collaborative Research: EnCORE: Institute for Emerging CORE Methods in Data Science @ University of California-San Diego
The proliferation of data-driven decision making, and its increased popularity, has fueled the rapid emergence of data science as a new scientific discipline. Data science is seen as a key enabler of future businesses, technologies, and healthcare that can transform all aspects of socioeconomic life. Its fast adoption, however, often comes with ad hoc implementation of techniques with suboptimal, and sometimes unfair and potentially harmful, results. The time is ripe to develop principled approaches to lay solid foundations of data science. This is particularly challenging as real-world data is highly complex with intricate structures, unprecedented scale, rapidly evolving characteristics, noise, and implicit biases. Addressing these challenges requires a concerted effort across multiple scientific disciplines such as statistics for robust decision making under uncertainty; mathematics and electrical engineering for enabling data-driven optimization beyond worst case; theoretical computer science and machine learning for new algorithmic paradigms to deal with dynamic and sensitive data in an ethical way; and basic sciences to bring the technical developments to the forefront of health sciences and society. The proposed institute for emerging CORE methods in data science (EnCORE) brings together a diverse team of researchers spanning the aforementioned disciplines from the University of California San Diego, the University of Texas at Austin, the University of Pennsylvania, and the University of California Los Angeles. It presents an ambitious vision to transform the landscape of the four CORE pillars of data science: C for complexities of data, O for optimization, R for responsible learning, and E for education and engagement. Along with its transformative research vision, the institute fosters a bold plan for outreach and broadening participation by engaging students of diverse backgrounds at all levels from K-12 to postdocs and junior faculty.
The project aims to impact a wide demographic of students by offering collaborative courses across its partner universities and a flexible co-mentorship plan for truly multidisciplinary research. With regular organization of workshops, summer schools, and seminars, the project aims to engage the entire scientific community to become the new nexus of research and education on foundations of data science. To bring the fruit of theoretical development to practice, EnCORE will continuously work with industry partners and domain scientists, and will forge strong connections with other National Science Foundation Harnessing the Data Revolution institutes across the nation.
EnCORE as an institute embodies intellectual merit that has the potential to lead ground-breaking research to shape the foundations of data science in the United States. Its research mission is organized around three themes. The first theme on data complexity addresses the complex characteristics of data such as massive size, huge feature space, rapid changes, variety of sources, implicit dependence structures, arbitrary outliers, and noise. A major overhaul of the core concepts of algorithm design is needed with a holistic view of different computational complexity measures. Faced with noise and outliers, uncertainty estimation is both necessary and, at the same time, difficult due to dynamic and changing data. Data heterogeneity poses major challenges even in basic classification tasks. The structural relationships hidden inside such data are crucial in understanding and processing it, and for downstream data analysis tasks such as in visualization and neuroscience. The second theme of EnCORE aims to transform the classical area of optimization where adaptive methods and human intervention can lead to major advances.
It plans to revisit the foundations of distributed optimization to include heterogeneity, robustness, safety, and communication; and address statistical uncertainty due to distributional shift in dynamic data in control and reinforcement learning. The third and final theme of EnCORE proposes to build the foundations of responsible learning. Applications of machine learning in human-facing systems are severely hampered when the learned models are hard for users to understand and reproduce, may give biased outcomes, are easily changeable by an adversary, and reveal sensitive information. Thus, interpretability, reproducibility, fairness, privacy, and robustness must be incorporated in any data-driven decision making. The experience and dedication to mentoring and outreach, collaborative curriculum design, socially aware responsible research program, extensive institute activities, and industrial partnerships would pave the way for a substantial broader impact for EnCORE. Summer schools with year-long mentoring will take place in three states, involving a wide demographic. Joint courses with hybrid and fully online offerings will be developed. Utilizing prior experience of running the Thinkabit Lab, which has impacted over 74,000 K-12 students so far, EnCORE will embark on an ambitious and thoughtful outreach program to improve the representation of under-represented groups and help create a future generation of workforce that is diverse, responsible, and has solid foundations in data science.
|
0.976 |