2022 — 2026 |
Balasubramanian, Vijay (co-PI); Chaudhari, Pratik |
Collaborative Research: RI: Medium: MoDL: Occam's Razor in Deep and Physical Learning @ University of Pennsylvania
Deep neural networks (DNNs) are machine learning models inspired by how neurons compute in the animal brain. Over the past decade these models have driven revolutions across science and engineering, from predicting the next word typed on a mobile phone keyboard to selecting among cosmological models that best explain the structure of the universe. Although computer scientists have become adept at building these systems, they do not yet understand why the systems work or when they can fail. The research agenda focuses on developing theoretical tools to build such an understanding for DNNs, with the hope that the same tools will also shed light on how learning occurs in biological systems, e.g., networks of neurons in the brain. The intellectual goal of the project is to identify common themes in the ways artificial and biological systems learn. The educational and outreach goals include (a) developing curricula at the intersection of computer science, neuroscience, and mathematics, (b) organizing tutorials on artificial intelligence for high-school students in Philadelphia, and (c) mentoring young researchers in the LatinX mathematical research community.

Training a deep network reduces to a high-dimensional, large-scale, non-convex optimization problem; curiously, simple algorithms such as stochastic gradient descent are not just sufficient but seemingly necessary for training DNNs. Accepted statistical wisdom holds that the larger the model class, the more likely the learned model is to overfit the training data. Yet DNNs generalize extremely well to new data.
This project seeks to unravel this apparent paradox. The central hypothesis is that DNNs succeed when the learning tasks exhibit a characteristic structure called “sloppiness”: the Fisher Information Matrix of the learned network has eigenvalues that are distributed uniformly across a range exponentially large in the rank of the matrix. The project will investigate how this sloppy structure leads the training process to explore only a tiny subset of the function space, thereby yielding both rapid training and good generalization. It will characterize the shape of this subset to understand why networks learn simple, low-dimensional functions on typical learning tasks. Connections will be made to biological and physical systems that learn through local learning rules and also exhibit such sloppy structure (e.g., networks of neurons in the brain and elastic polymer networks such as proteins). The technical objective is to reveal universal principles of learning, namely a drive toward simplicity and low-dimensional internal representations exhibited by both DNNs and physical learning networks.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
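To make the sloppiness hypothesis concrete, the following is a minimal illustrative sketch (not the project's actual methodology): it samples a synthetic eigenvalue spectrum spread uniformly in log over many decades, as a sloppy Fisher Information Matrix would exhibit, and computes the participation ratio, a standard summary of effective dimensionality. The specific rank, decade range, and use of the participation ratio are assumptions made here for illustration.

```python
import numpy as np

# Hypothetical sloppy spectrum: eigenvalues spread log-uniformly over
# 20 decades for a rank-50 matrix, i.e. a range exponentially large in
# the rank. (Parameters chosen only for illustration.)
rng = np.random.default_rng(0)
rank = 50
eigenvalues = 10.0 ** rng.uniform(-20, 0, size=rank)  # in [1e-20, 1]

# Participation ratio: (sum of eigenvalues)^2 / (sum of squares).
# For a sloppy spectrum it is far smaller than the rank, because a few
# "stiff" directions dominate while most directions are "sloppy".
participation_ratio = eigenvalues.sum() ** 2 / (eigenvalues ** 2).sum()
print(f"rank = {rank}, effective dimension ~ {participation_ratio:.2f}")
```

A low effective dimension relative to the rank is one way to quantify the claim that training explores only a tiny subset of the function space.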