2009 — 2015
Tu, Zhuowen
CAREER: Holistic 3D Brain Image Parsing by Integrating Implicit and Explicit Models @ University of California-Los Angeles
Designing automated algorithms to extract and analyze anatomical brain structures from neuroimages is of significant scientific and clinical importance for detecting abnormal brain patterns, analyzing brain diseases, and studying brain growth.
This project will develop a general statistical modeling/computing framework to perform holistic 3D brain image understanding. The framework emphasizes rigorous, efficient, and effective learning-based statistical models that integrate the complex appearances, varying 3D shapes, and large spatial configurations of anatomical brain structures.
Implicit models obtained through discriminative approaches have the advantage of fusing a large amount of information and reaching decisions quickly. Explicit models obtained through generative approaches can represent the information directly and thus better explain structure and model transformation and scale change. The PI explores the harmonious relationship between discriminative and generative models for 3D image parsing by combining implicit and explicit models along several directions: (1) learning-based models with rich appearance and implicit shape and context; (2) integrating skeletons with surfaces for 3D shapes; (3) effective 3D shape representations and similarity measures; (4) component-based simultaneous registration and segmentation.
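To make the implicit/explicit distinction concrete, here is a minimal sketch, not drawn from the project itself, of how a discriminatively trained appearance term and a generative shape prior might be fused for per-voxel labeling. NumPy is assumed, and all function and variable names are hypothetical.

```python
import numpy as np

def parse_structure(p_app, p_shape, w=1.0):
    """Label each voxel by fusing two sources of evidence.

    p_app:   (D, H, W) implicit evidence, P(structure | appearance),
             e.g. from a discriminatively trained voxel classifier.
    p_shape: (D, H, W) explicit evidence, P(structure | shape prior),
             e.g. from a registered probabilistic atlas.
    w:       weight balancing the implicit vs. explicit terms.
    """
    eps = 1e-8
    score_fg = np.log(p_app + eps) + w * np.log(p_shape + eps)
    score_bg = np.log(1 - p_app + eps) + w * np.log(1 - p_shape + eps)
    return (score_fg > score_bg).astype(np.uint8)

# Toy usage: random "probability maps" over a 16^3 volume.
rng = np.random.default_rng(0)
labels = parse_structure(rng.random((16, 16, 16)), rng.random((16, 16, 16)))
```

The project's actual models are far richer (3D shapes, context, joint registration), but this log-linear fusion is the basic pattern for combining discriminative and generative evidence.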
This research will contribute to automating the extraction of a large number of anatomical structures and to enhancing the shape analysis needed for detecting brain diseases, monitoring health conditions, studying drug effects, and discovering brain functions. The scope of the proposed model goes beyond medical image analysis and applies to other problems in statistical modeling/computing, computer vision, and multi-variate labeling in machine learning.
2012 — 2016
Tu, Zhuowen
RI: Small: Unsupervised Object Class Discovery via Bottom-Up Multiple Class Learning @ University of California-Los Angeles
This project develops an integrated framework that performs simultaneous object discovery and detector training in an unsupervised setting. It takes advantage of the large quantity (millions or even billions) of well-organized internet images to automatically learn rich image representations for a wide range of objects. The main activities of this project are as follows. (1) The central component is a formulation that turns unsupervised data into weakly-supervised "noisy input", through which commonalities are explored for rich object representation using a new learning method (sketched below). (2) A large dictionary of mid-level image representations will be learned on a large number of images retrieved through internet search engines using thousands of object words. (3) A new flexible object representation is developed to handle articulated and non-rigid objects.
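As an illustration of the "noisy input" formulation in item (1), the sketch below treats each retrieved image as a bag of candidate windows carrying a noisy image-level label, in the style of multiple-instance learning. It is a simplified stand-in for the project's learning method, with hypothetical names throughout.

```python
import numpy as np

def bag_probability(instance_probs):
    """Noisy-OR: a bag is positive if any instance is positive."""
    return 1.0 - np.prod(1.0 - instance_probs)

def bag_loss(instance_probs, noisy_label, eps=1e-8):
    """Negative log-likelihood of the (noisy) image-level label."""
    p = bag_probability(instance_probs)
    return -(noisy_label * np.log(p + eps)
             + (1 - noisy_label) * np.log(1 - p + eps))

# Example: an image retrieved for the query "dog" (noisy label 1) whose
# candidate windows score [0.1, 0.7, 0.2] under the current detector.
print(bag_loss(np.array([0.1, 0.7, 0.2]), 1))
```

Minimizing such bag-level losses over many retrieved images is one standard way commonalities across noisy bags can drive discovery of a shared object representation.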
The project advances the computer vision and machine learning fields by developing an unsupervised paradigm for exploring internet images at large scale. The mid-level and high-level representations learned from images retrieved using thousands of words can significantly enhance object representation power and benefit researchers in object recognition. The formulations, algorithms, and methods resulting from this project are also helpful to researchers in other fields such as medical imaging and data mining. The dissemination plan includes releasing the source code and the learned mid-level and high-level representations.
2016 — 2019
Tu, Zhuowen
RI: Small: Unraveling and Building Top-Down Generators in Deep Convolutional Neural Networks @ University of California-San Diego
Deep learning has recently and significantly advanced research fields closely related to artificial intelligence. The fundamental problem of knowledge representation, however, remains open, and the role of top-down processes in deep learning is not yet well understood. For example, to train a deep learning algorithm simply to detect the translation of a dog in an image, a purely data-driven approach would require generating thousands of training samples by moving the dog around the image; a top-down model, if available, can represent translation directly with two variables along the axes. The main goal of this project is to explore a path to discover, learn, and build embedded deep learning models that account for a rich family of top-down spatial transformations and geometric compositions in convolutional neural networks. The resulting models provide a transparent way of understanding the embedded top-down transformation process through neural network layers. The learned neurally-inspired top-down knowledge representation will benefit studies across multiple disciplines, including visual perception, brain sciences, cognitive modeling, and decision making.
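The translation example can be made concrete with a spatial-transformer-style sketch, assuming PyTorch: translation is represented top-down by two explicit parameters rather than learned from thousands of shifted training samples. This is an illustration of the general idea, not the project's model.

```python
import torch
import torch.nn.functional as F

def translate(image, tx, ty):
    """Shift a batch of images by (tx, ty) in normalized [-1, 1] coordinates."""
    n = image.size(0)
    theta = torch.zeros(n, 2, 3)   # per-image 2x3 affine matrix
    theta[:, 0, 0] = 1.0           # identity scale on x
    theta[:, 1, 1] = 1.0           # identity scale on y
    theta[:, 0, 2] = tx            # top-down variable 1: horizontal shift
    theta[:, 1, 2] = ty            # top-down variable 2: vertical shift
    grid = F.affine_grid(theta, image.size(), align_corners=False)
    return F.grid_sample(image, grid, align_corners=False)

img = torch.rand(1, 3, 64, 64)              # e.g. an image containing a dog
shifted = translate(img, tx=0.25, ty=0.0)   # shift content by a quarter frame
```

Because the transformation is differentiable, the two variables (tx, ty) can be inferred by gradient descent instead of exhaustive data augmentation.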
The current practice of deep learning, exemplified by convolutional neural networks (CNNs), is largely dominated by data-driven bottom-up approaches. While the performance of CNNs on various applications is impressive, a large gap nevertheless exists between what bottom-up CNNs can offer and what comprehensive intelligence requires. These strongly bottom-up characteristics leave considerable room to give deep learning the ability to also incorporate top-down information for effective knowledge representation, network learning, cognitive modeling, and visual inference. This project builds a roadmap toward developing top-down generators: unraveling the role of explicit top-down knowledge representation and propagation, studying the feature flows produced inside convolutional neural networks, building robust analysis-by-synthesis methods that combine top-down and bottom-up processes, and creating explicit generative models to assist a wide range of applications. Studying top-down generators promises benefits across a broad family of applications, including but not limited to: creating network-internal data augmentation, building object detection and scene understanding systems, modeling compositional and contextual object configurations, and performing zero-shot learning.
2017 — 2020
Tu, Zhuowen
RI: Small: Unsupervised Discriminatively-Generative Learning @ University of California-San Diego
Great success has been achieved in obtaining powerful discriminative classifiers via supervised training, where humans provide manual annotations for the training data. Unsupervised learning, in which the input data is not accompanied by task-specific annotations, is of great importance since a large number of tasks come with little to no supervision; however, it remains one of the most difficult problems in machine learning. A typical unsupervised learning task learns effective generative representations for highly structured data such as images, videos, speech, and text. Existing generative models for unsupervised learning are often constrained by their simplifying assumptions, while existing discriminative models for supervised learning have limited generation capabilities. This project develops a new introspective machine learning framework that greatly enhances and expands the power of both generation and discrimination within a single model. The outcome of the project, introspective generative/discriminative learning, significantly improves the learning capabilities of existing algorithms by building stronger computational models for a wide range of fields including computer vision, machine learning, cognitive science, computational linguistics, and data mining.
This research investigates a new machine learning framework, introspective generative/discriminative learning (IGDL), which attains a single generator/discriminator capable of performing both generation and classification. The IGDL generator is itself a discriminator, capable of introspection: it can self-evaluate the difference between its generated samples and the given training data. When followed by iterative discriminative learning, desirable properties of modern discriminative classifiers such as convolutional neural networks (CNNs) can be directly inherited by the IGDL generator. Moreover, the discriminator aspect of IGDL also produces competitive results in fully supervised classification by using self-generated new data (called pseudo-negatives) to enhance classification performance against adversarial samples. The training process of IGDL is carried out using a two-step synthesis-by-classification algorithm via efficient backpropagation (sketched below). Effective stochastic gradient descent Monte Carlo sampling processes for IGDL training are studied. Across three key areas of machine learning (unsupervised, semi-supervised, and fully supervised learning), IGDL produces competitive results in a wide range of applications including texture synthesis, object modeling, and image classification.
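A minimal sketch, assuming PyTorch, of the synthesis step in the two-step synthesis-by-classification loop: the discriminator generates pseudo-negatives by taking noisy gradient steps on its own score. This is a simplified stand-in for the stochastic-gradient Monte Carlo sampling studied in the project, and the function names are hypothetical.

```python
import torch

def synthesize(discriminator, x, steps=50, step_size=0.1, noise=0.01):
    """Refine random inputs x until the discriminator scores them as data."""
    x = x.clone().requires_grad_(True)
    for _ in range(steps):
        score = discriminator(x).sum()        # logit of the "real data" class
        grad, = torch.autograd.grad(score, x)
        with torch.no_grad():
            x += step_size * grad             # climb the classifier's own score
            x += noise * torch.randn_like(x)  # Langevin-style noise term
    return x.detach()

# Usage sketch: pseudo_negs = synthesize(D, torch.randn(16, 3, 32, 32));
# the pseudo-negatives are added to the negative set and D is retrained,
# alternating synthesis and classification until the two distributions match.
```

Because the same network both scores and synthesizes, generation quality improves exactly as the classifier sharpens, which is the introspective property described above.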
2021 — 2024
Tu, Zhuowen; Li, Tzu-Mao
RI: Small: Panoptic 3D Parsing in the Wild @ University of California-San Diego
Humans have the remarkable capability of recognizing and understanding 3D objects and scenes, thanks to effective (yet not fully understood) representations that recover the intrinsic 3D world from its 2D projections. One of the main objectives of computer vision is to develop systems that can "see" the world. This project points to a new direction, panoptic 3D parsing (Panoptic3D), which jointly performs semantic segmentation, object detection, depth estimation, 3D shape reconstruction, and 3D layout estimation for single-view RGB images of natural scenes. Rapid developments in 2D and 3D image modeling, representation learning, deep models, and large-scale cross-modality datasets provide an unprecedented opportunity to build Panoptic3D systems. Such a system can assist scientific studies and experiments in disciplines beyond computer science, including cognitive science, neuroscience, health care, transportation/civil engineering, mechanical engineering, and computational biology.
This project lays out a roadmap for building a novel system, Panoptic 3D Parsing (Panoptic3D), that jointly performs semantic segmentation, object detection, instance segmentation, depth estimation, 3D shape reconstruction, and 3D layout estimation for single-view RGB images in the wild. The problem of image understanding and 3D (shape and layout) reconstruction from a single view is deeply rooted in decades of development in computer vision and photogrammetry. The project is inspired by recent developments in holistic image understanding and single-view 3D shape/layout reconstruction, the availability of large-scale 2D/3D image datasets, and successes in deep learning and representation learning. A number of technical innovations will be made by developing new 3D modeling and computing algorithms that combat the absence of comprehensive multi-modality ground-truth annotations for the segmentation, objects, 3D shapes, and 3D layout of natural images in the wild. The potential gain of pursuing this new direction is substantial, and the proposed Panoptic3D system is applicable to a range of domains including computer vision, computer graphics, autonomous driving, mapping, robotics, human-computer interaction, and augmented reality.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
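As a structural illustration of the joint single-view prediction described above, the following sketch, assuming PyTorch, shares one backbone across task-specific heads for segmentation, depth, and a global 3D shape/layout code. It is a hypothetical skeleton for exposition, not the proposed Panoptic3D architecture.

```python
import torch
import torch.nn as nn

class Panoptic3DSketch(nn.Module):
    def __init__(self, classes=21, shape_dim=128):
        super().__init__()
        self.backbone = nn.Sequential(             # stand-in feature extractor
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        )
        self.seg_head = nn.Conv2d(64, classes, 1)  # per-pixel semantics
        self.depth_head = nn.Conv2d(64, 1, 1)      # per-pixel depth
        self.shape_head = nn.Sequential(           # global 3D shape/layout code
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, shape_dim))

    def forward(self, rgb):
        f = self.backbone(rgb)
        return {"segmentation": self.seg_head(f),
                "depth": self.depth_head(f),
                "shape_code": self.shape_head(f)}

out = Panoptic3DSketch()(torch.rand(1, 3, 128, 128))  # one RGB view in
```

Sharing the backbone lets the 2D parsing tasks and the 3D reconstruction tasks regularize one another, which is one way joint training can compensate for the missing comprehensive multi-modality annotations noted above.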