2008 — 2013
Torralba, Antonio
CAREER: Integrated System For Object and Scene Recognition @ Massachusetts Institute of Technology
Abstract
Title: Integrated system for object and scene recognition
PI: Antonio Torralba
Institution: MIT
In traditional computer vision, scene and object recognition are two related visual tasks generally studied separately. By devising systems that solve these tasks in an integrated fashion, it is possible to build more efficient and robust recognition systems. At the lowest level, significant computational savings can be achieved if different categories share a common set of features. More importantly, jointly trained recognition systems can use similarities between object categories to their advantage by learning features that lead to better generalization. In complex natural scenes, object recognition systems can be further improved by using contextual knowledge both about the objects likely to be found in a given scene and about the spatial relationships between those objects. Object detection and recognition are generally posed as a matching problem between the object representation and the image features, with background features rejected by an outlier process. The PI will instead formulate object detection as a problem of aligning elements of the entire scene: the background, rather than being treated as a set of outliers, will be used to guide the detection process.
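The idea of letting scene context guide detection can be sketched in a toy form. The numbers and categories below are hypothetical and purely illustrative, not the PI's actual model: a local appearance score is combined with a scene-conditioned prior over object categories, so an unlikely object in a given scene is down-weighted even when its local evidence is comparable.

```python
import numpy as np

# Toy illustration of context-aware detection: the final score
# combines local appearance evidence with a scene-level prior.
categories = ["car", "sofa", "toaster"]

# Hypothetical detector outputs (local appearance evidence, 0..1):
# all three categories look roughly equally plausible locally.
appearance = np.array([0.60, 0.55, 0.58])

# Hypothetical prior p(object | scene) for a "street" scene:
# cars are likely outdoors, sofas and toasters are not.
scene_prior = np.array([0.80, 0.05, 0.02])

# Combine the two sources of evidence (a simple product, renormalized).
posterior = appearance * scene_prior
posterior /= posterior.sum()

for c, p in zip(categories, posterior):
    print(f"{c}: {p:.2f}")
```

Even with near-identical appearance scores, the scene prior makes "car" dominate, which is the intuition behind using the background to guide, rather than merely survive, the detection process.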
In developing integrated systems that try to recognize many objects, the lack of large annotated datasets becomes a major problem. The PI created and will extend two datasets: LabelMe and the 80 Million Tiny Images dataset. LabelMe is an online annotation tool that allows sharing and labeling images for computer vision research. Both datasets offer an invaluable resource for research and teaching in computer vision and computer graphics. The datasets are also intended to foster creativity, as they allow students at all levels to explore well-established algorithms as well as devise new applications in computer vision and computer graphics. The PI will also develop new image and video datasets by exploiting the millions of images available on the internet.
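As an illustration of the kind of polygon annotations LabelMe provides, here is a minimal sketch that parses a LabelMe-style XML annotation with the Python standard library. The XML below is a hand-made example following the general shape of the LabelMe schema (object name plus polygon vertices), not a real dataset file:

```python
import xml.etree.ElementTree as ET

# Hand-made example in the style of a LabelMe annotation file:
# each <object> has a <name> and a <polygon> of <pt> x/y pairs.
xml_text = """
<annotation>
  <filename>street.jpg</filename>
  <object>
    <name>car</name>
    <polygon>
      <pt><x>10</x><y>20</y></pt>
      <pt><x>110</x><y>20</y></pt>
      <pt><x>110</x><y>80</y></pt>
      <pt><x>10</x><y>80</y></pt>
    </polygon>
  </object>
</annotation>
"""

root = ET.fromstring(xml_text)
objects = {}
for obj in root.iter("object"):
    name = obj.findtext("name")
    pts = [(int(pt.findtext("x")), int(pt.findtext("y")))
           for pt in obj.iter("pt")]
    objects[name] = pts

print(objects)  # {'car': [(10, 20), (110, 20), (110, 80), (10, 80)]}
```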
The creation of robust systems for scene understanding will have a major impact on many fields by enabling smart devices that can understand and interact with their environment, from aids for the visually impaired to autonomous vehicles, robotic assistants, and online tools for searching visual information.
The PI will extend his teaching and research activities beyond the boundaries of the classroom and the laboratory by developing a substantial amount of online material.
URL: http://people.csail.mit.edu/torralba/integratedSceneRecognition/
2015 — 2018
Oliva, Aude (co-PI); Torralba, Antonio; Pantazis, Dimitrios (co-PI)
NCS-FO: Algorithmically Explicit Neural Representation of Visual Memorability @ Massachusetts Institute of Technology
As Lewis Carroll famously wrote in Through the Looking-Glass, "It's a poor sort of memory that only works backwards." On this side of the mirror, we cannot remember visual events before they happen; our work, however, will help predict what people will remember as they see an image or an event. Our team of investigators in cognitive science, human neuroscience, and computer vision brings the synergistic expertise needed to determine how visual memories are encoded in the human brain at millisecond and millimeter resolution. Cognitive-level algorithms of memory would be a game changer for society, enabling applications ranging from accurate diagnostic tools to human-computer interfaces that foresee the needs of humans and compensate when cognition fails.
The project capitalizes on the spatiotemporal dynamics of memory encoding while providing a computational framework for determining the representations formed from perception to memory across the whole human brain. A fundamental function of cognition is the encoding of information, a dynamic and complex process underlying much of our successful interaction with the external environment. Here, we propose to combine three approaches to predict what makes an image memorable or forgettable: neuroimaging technologies that record where encoding happens in the human brain (spatial scale) and when it happens (temporal scale), together with models of what types of computation are performed at the different stages of storage (computational scale). Characterizing the spatiotemporal dynamics of visual memorability, and determining the type of computation and representation a successful memorability system performs, is a crucial endeavor for both basic and applied sciences.
2015 — 2018
Torralba, Antonio
RI: Small: Advancing Visual Recognition With Feature Visualizations @ Massachusetts Institute of Technology
The goal of this work is to develop a set of tools to visualize the information extracted by computer vision systems, so that it is easier for researchers and users to understand their behavior. With the success of new computational architectures for visual processing, such as deep neural networks with many processing layers (e.g., convolutional neural networks) and access to large databases with millions of annotated images (e.g., ImageNet, Places), the state of the art in computer vision is advancing rapidly and becoming integrated into many commercial products. But these advances come at the price of greater complexity, making it harder for researchers and users to diagnose and understand the representations these systems build. The work will therefore develop new techniques for visualizing what the algorithms are doing in order to elucidate their behavior.
The work will focus on developing algorithms for generic feature inversion. Most features perform complex non-linear operations over the image, and it is not always possible to obtain analytic expressions that invert those computations. The goal is to introduce new techniques for inverting descriptors without constraining the descriptors themselves. A second challenge consists in understanding the properties of the inversion so as to allow comparisons among different descriptors: if the inversion involves approximations, such comparisons might not be meaningful, so it will be important to understand the convergence properties of the inversion algorithms. Another issue arises from the compressive nature of most descriptors. In general, some of the input image information is lost when encoded by an image descriptor, so the inversion has to be a one-to-many function. Understanding the space of images that are equivalent under a particular descriptor will provide insights into the likely errors made by a recognition system that uses it. The project will perform a variety of experiments with the feature visualizations, such as examining invariances in both engineered features and features learned by deep networks, visualizing learned models and decision boundaries, and diagnosing false alarms and missed detections.
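A minimal sketch of the inversion idea, under a deliberately simplified setup that is not the actual features studied in the project: a toy compressive descriptor (a random linear map to fewer dimensions followed by a tanh nonlinearity) is inverted by gradient descent on the feature-space reconstruction error. Because the descriptor discards dimensions, the inversion is one-to-many: the recovered input matches the descriptor closely while remaining far from the original input.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "descriptor": a compressive linear map (16-d -> 8-d) followed
# by a tanh nonlinearity.  The dimensionality reduction makes the
# inversion one-to-many: many inputs share the same descriptor.
W = rng.standard_normal((8, 16)) / 8.0

def descriptor(x):
    return np.tanh(W @ x)

x_true = rng.standard_normal(16)
target = descriptor(x_true)

# Invert by gradient descent on || descriptor(x) - target ||^2.
x = np.zeros(16)
for _ in range(20000):
    h = np.tanh(W @ x)
    grad = W.T @ ((h - target) * (1.0 - h ** 2))  # chain rule through tanh
    x -= 1.0 * grad

feature_err = np.abs(descriptor(x) - target).max()
input_err = np.linalg.norm(x - x_true)
print(f"feature-space error: {feature_err:.2e}")
print(f"distance to original input: {input_err:.2f}")
```

The recovered x reproduces the descriptor almost exactly yet differs from x_true by its component in the null space of W, a small-scale analogue of the "space of equivalent images" the proposal sets out to characterize.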