2003 — 2008 | Wu, Ying
Transductive Learning For Retrieving and Mining Visual Contents @ Northwestern University
Contemporary visual learning methods for visual content mining tasks are plagued by several critical and fundamental challenges: (1) the unavailability of large annotated datasets prevents effective supervised learning; (2) the variability across different working environments challenges the generalization of inductive learning approaches; and (3) the high dimensionality of these tasks strains the efficiency of many existing learning techniques. The goal of this research project is to overcome these challenges by exploring a novel transductive learning approach.
The approach provides a unified framework accommodating four subtasks: (1) transduction that integrates unlabeled and labeled data to alleviate the challenge of limited supervision and to enable automatic annotation propagation; (2) model transduction that automatically adapts a learned model to untrained environments for efficient model reuse; (3) co-transduction that facilitates transduction with multiple modalities to handle high dimensionality in visual data; and (4) co-inference that exploits the interactions among multiple modalities to enable efficient model transduction.
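To make the transduction idea concrete, the sketch below shows a generic graph-based label propagation routine that mixes labeled and unlabeled samples. It is a minimal illustration of transductive learning in general, not the project's actual formulation; the function name, kernel, and parameters are all assumed.

```python
# Minimal sketch of graph-based transduction (label propagation), assuming a small
# feature matrix X and a partially observed label vector y (-1 marks unlabeled items).
# Illustrative only; not the project's method.
import numpy as np

def label_propagation(X, y, sigma=1.0, alpha=0.9, n_iter=50):
    """Propagate labels from labeled to unlabeled samples over a similarity graph."""
    # Pairwise affinities from an RBF kernel over the features.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Symmetrically normalized smoothing operator S = D^{-1/2} W D^{-1/2}.
    D_inv_sqrt = 1.0 / np.sqrt(W.sum(1) + 1e-12)
    S = W * D_inv_sqrt[:, None] * D_inv_sqrt[None, :]

    classes = np.unique(y[y >= 0])
    Y0 = np.zeros((len(y), len(classes)))
    for j, c in enumerate(classes):
        Y0[y == c, j] = 1.0
    F = np.zeros_like(Y0)
    # Iterative propagation: mix neighborhood evidence with the initial labels.
    for _ in range(n_iter):
        F = alpha * S @ F + (1 - alpha) * Y0
    return classes[F.argmax(1)]

# Toy usage: two labeled points anchor two clusters; the rest are inferred.
X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
y = np.array([0, -1, -1, 1, -1, -1])
print(label_propagation(X, y))  # expected: [0 0 0 1 1 1]
```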
The research is linked to educational activities, including the development of an integrated course on content-based visual data mining and the development of innovative course projects to engage students in research. The project disseminates research to other research communities through workshops and tutorials, and to the general public, minority groups, and women students through Open House events.
The results of this project will lead to significant improvement in the quality of content-based and object-level multimedia retrieval, will greatly benefit visual recognition that requires large datasets for training and evaluation, will significantly reduce the effort of training brand new models for untrained scenarios, and will be very useful in intelligent video surveillance applications, thus having a great impact on homeland security. A website, http://www.ece.nwu.edu/~yingwu, provides access to research results, including demos, constructed benchmark datasets, and software.
2004 — 2012 | Wu, Ying
CAREER: Visual Analysis of High-Dimensional Motion: A Distributed/Collaborative Approach @ Northwestern University
This project is about analyzing high-dimensional motion (HDM) from video. HDM refers to various complex motions with high degrees of freedom, including the articulation of the human body, the deformation of elastic shapes, and the multi-motion of multiple occluding targets. The goal of this project is to overcome the curse of dimensionality embedded in this challenging visual inference problem, by systematically pursuing a new distributed/collaborative approach that unifies various HDMs.
Substantially different from centralized methods, the new approach distributes HDM into a networked representation of subpart motions, based on Markov network models. The prohibitive HDM inference tasks can then be effectively and efficiently fulfilled by the "collaborations" among the distributed but mutually constrained small-scale visual inference processes, as revealed by the proposed theoretical study of this model and implemented by the proposed collaborative particle network algorithms. This new approach is expected to be significantly more efficient, more scalable and flexible, and more robust. The project has impact on intelligent video surveillance by enabling fast and accurate human tracking and detection techniques, and significantly benefits research in human-computer interaction and medical imaging.
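As a rough illustration of the distributed/collaborative idea (not the project's actual algorithm), the sketch below keeps a small particle set per body part and reweights each set using its own observation plus soft compatibility messages from neighboring parts on a chain. The observation model, the spring-like compatibility term, and all numbers are assumptions.

```python
# Illustrative sketch: per-part particle filters that "collaborate" through pairwise
# compatibility terms on a chain-structured Markov network (1-D toy example).
import numpy as np

rng = np.random.default_rng(0)

def likelihood(particles, observation, obs_sigma=0.5):
    """Per-part observation model: how well each particle explains the measurement."""
    return np.exp(-((particles - observation) ** 2) / (2 * obs_sigma ** 2))

def compatibility(particles_a, particles_b, rest_len=1.0, sigma=0.3):
    """Pairwise potential between neighboring parts (soft link-length constraint)."""
    d = np.abs(particles_a[:, None] - particles_b[None, :])
    return np.exp(-((d - rest_len) ** 2) / (2 * sigma ** 2))

def collaborative_step(particles, weights, observations):
    """One round of per-part reweighting using local evidence plus neighbor messages."""
    n_parts = len(particles)
    new_weights = []
    for i in range(n_parts):
        w = likelihood(particles[i], observations[i])
        for j in (i - 1, i + 1):            # neighbors on the kinematic chain
            if 0 <= j < n_parts:
                msg = compatibility(particles[i], particles[j]) @ weights[j]
                w *= msg
        new_weights.append(w / w.sum())
    return new_weights

# Toy usage: a 3-part "chain" in 1-D, 200 particles per part.
n_particles = 200
particles = [rng.normal(loc=k, scale=1.0, size=n_particles) for k in range(3)]
weights = [np.full(n_particles, 1.0 / n_particles) for _ in range(3)]
observations = [0.0, 1.0, 2.0]
weights = collaborative_step(particles, weights, observations)
print([float((p * w).sum()) for p, w in zip(particles, weights)])  # posterior means per part
```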
The research is linked to educational activities aimed at promoting learning and innovation through (1) developing an integrated curriculum for visual computing and statistical modeling; (2) motivating students to explore unknown frontiers via innovative course projects and real-world applications; (3) reaching out to other related research communities via conferences and websites; and (4) disseminating the research to the general public and to female and minority students through Vision Open House events.
2005 — 2009 | Katsaggelos, Aggelos (co-PI); Choudhary, Alok; Wu, Ying; Memik, Seda; Memik, Gokhan (co-PI)
Collaborative Research: High-Performance Techniques, Designs and Implementation of Software Infrastructure For Change Detection and Mining @ Northwestern University
ABSTRACT: NSF 0536994 (Choudhary); NSF 0536947 (Fox)
Problems in managing, automatically discovering, and disseminating information are of critical importance to national defense, homeland security, and emergency preparedness and response. Much of this data originates from on-line sensors that act as streaming data sources, providing a continuous flow of information. As sensor sources proliferate, the flow of data becomes a deluge, and the extraction and delivery of important features in a timely and comprehensible manner becomes an ever more difficult problem. More specifically, developing data mining and assimilation tools for data-deluged applications faces three fundamental challenges. First, the amount of distributed real-time streaming data is so large that even current extreme-scale computing cannot effectively process it. Second, today's broadly deployable network protocols and web services do not provide the low latency and high bandwidth required by high-volume real-time data streams and distributed computing resources connected over networks with high bandwidth-delay products. Finally, the vast majority of today's statistical and data mining algorithms assume that all the data is co-located and at rest in files. Here, the real-time data streams are distributed, and the applications that consume them must be optimized to process multiple high-volume real-time streams. The goal is to develop novel algorithms and hardware acceleration schemes to allow real-time statistical modeling and change detection on such large-scale streaming data sets. Using Service-Oriented Architecture principles, a framework for integrating high-performance change detection software services, including accelerations of commonly used kernels in statistical modeling, into a Grid messaging substrate will be developed and tested. Geographical Information System (GIS) services will be supported using Open Geospatial Consortium standards to enable geo-referencing.
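For a concrete feel of one commonly used change-detection kernel of the kind such an infrastructure would accelerate, here is a minimal CUSUM-style detector over a stream. It is a generic textbook method written in plain Python, not the project's optimized or hardware-accelerated implementation; the parameters and the toy data are assumptions.

```python
# Hedged sketch of a CUSUM-style change detector operating on a real-time data stream.
def cusum_stream(stream, target_mean=0.0, drift=0.5, threshold=5.0):
    """Yield (index, value, alarm) for each sample; alarm=True when a shift is detected."""
    g_pos = g_neg = 0.0
    for i, x in enumerate(stream):
        g_pos = max(0.0, g_pos + (x - target_mean) - drift)   # upward-shift statistic
        g_neg = max(0.0, g_neg - (x - target_mean) - drift)   # downward-shift statistic
        alarm = g_pos > threshold or g_neg > threshold
        if alarm:
            g_pos = g_neg = 0.0                                # restart after an alarm
        yield i, x, alarm

# Toy usage: the mean jumps from 0 to 3 at sample 50; an alarm fires shortly after.
import random
random.seed(1)
data = [random.gauss(0, 1) for _ in range(50)] + [random.gauss(3, 1) for _ in range(50)]
alarms = [i for i, _, alarm in cusum_stream(data) if alarm]
print(alarms[:1])  # index of the first detected change
```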
This project has the potential to have near-term and long-term impact in several important areas. In the near term, the implementation of kernels and modules of statistical modeling and change detection algorithms will allow end-user applications (e.g., homeland security, defense) to achieve one to two orders of magnitude improvement in performance for data-driven decision support. In the longer term, the availability of toolkits and kernels for the change detection and data mining algorithms will facilitate the development of applications in many areas including defense, security, science, and others. Furthermore, this research will introduce reconfigurable architectural acceleration of functions on streaming data, including change detection and data mining, thereby opening new avenues of research and enabling new data-driven applications on complex datasets. Both graduate and undergraduate students (through undergraduate fellowships) are engaged in the research. In addition, team members actively engage with minority-serving institutions using audio/video and distance education tools.
2009 — 2015 | Wu, Ying
RI: Small: Computational Models of Context-Awareness and Selective Attention For Persistent Visual Target Tracking @ Northwestern University
Although persistent and long-duration tracking of general targets is a basic function of the human vision system, this task is quite challenging for computer vision algorithms, because the visual appearances of real-world targets vary greatly and the environments are heavily cluttered and distracting. This large gap has been a bottleneck in many video analysis applications. This project aims to bridge this gap and to overcome the challenges that confront the design of long-duration tracking systems, by developing new computational models that integrate and represent important aspects of the human visual perception of dynamics, including selective attention and context-awareness, which have been largely ignored in existing computer vision algorithms.
This project performs in-depth investigations of a new computational paradigm, called the synergetic selective attention model, that integrates four processes: the early selection process that extracts informative attentional regions (ARs); the synergetic tracking process that estimates the target motion based on these ARs; the robust integration process that resolves the inconsistency among the motion estimates of these ARs for robust information fusion; and the context-aware learning process that performs late selection and learning on-the-fly to discover contextual associations and to learn discriminative ARs for adaptation.
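The robust-integration step can be pictured with the small sketch below, which fuses the motion estimates of several hypothetical attentional regions by iteratively down-weighting inconsistent ones. The AR extraction, the per-AR trackers, and the exact robust rule used in the project are not represented here; everything in the example is a made-up illustration.

```python
# Illustrative sketch: robust fusion of per-AR motion estimates (2-D translations).
import numpy as np

def robust_fuse(ar_motions, n_iter=5, scale=2.0):
    """Iteratively re-weighted mean of per-AR motion estimates."""
    ar_motions = np.asarray(ar_motions, dtype=float)   # shape (num_ARs, 2)
    estimate = ar_motions.mean(axis=0)
    for _ in range(n_iter):
        residuals = np.linalg.norm(ar_motions - estimate, axis=1)
        weights = 1.0 / (1.0 + (residuals / scale) ** 2)   # Cauchy-style down-weighting
        estimate = (weights[:, None] * ar_motions).sum(0) / weights.sum()
    return estimate, weights

# Toy usage: four ARs agree on a (2, 1) shift; one AR is hijacked by a distractor.
motions = [(2.1, 1.0), (1.9, 0.9), (2.0, 1.1), (2.2, 1.0), (9.0, -4.0)]
fused, w = robust_fuse(motions)
print(np.round(fused, 2), np.round(w, 2))  # fused stays near (2, 1); outlier weight is small
```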
This research enriches the study of visual motion analysis by accommodating aspects of human visual perception and leads to significant improvements in video analysis. It benefits many important areas including intelligent video surveillance, human-computer interaction, and video information management. The project is linked to educational activities to promote learning and innovation through curriculum development, research opportunities, knowledge dissemination through conferences and the internet as well as other outreach activities, and the involvement of underrepresented groups.
2010 — 2012 | Wu, Ying
Collaborative Research: Sino-USA Summer School in Vision, Learning, Pattern Recognition (VLPR 2010) @ Northwestern University
The recent decade has witnessed rapid advances in computer vision research, not only in its fundamental studies but also in its emerging applications. This Sino-USA summer school in Vision, Learning and Pattern Recognition (VLPR 2010) is held in Xi'an City, China. It brings together a high-quality team of leading American and Chinese researchers in computer vision to offer a one-week educational program to students and junior scholars from both the US and China. The program provides an important opportunity to discuss recent advances in Perception, Motion and Events, and allows technical and cultural exchanges between researchers from the two countries. Such interactions are important for fostering new understanding and new collaborations in science, education, and culture.
Summer School web site: http://vlpr2010.eecs.northwestern.edu/
2012 — 2016 | Argall, Brenna; Lynch, Kevin; Murphey, Todd (co-PI); Colgate, J. Edward; Wu, Ying
MRI: Equipment Development: Bimanual Robotic Manipulation and Sensory Workspace @ Northwestern University
Proposal #: 12-29566
PI(s): Lynch, Kevin M; Argall, Brenna; Colgate, J. Edward; Murphey, Todd D; Wu, Ying
Institution: Northwestern University
Title: MRI/Dev.: Bimanual Robotic Manipulation and Sensory Workspace
Project Proposed: This project, developing an instrument consisting of robot arms/hands and vision-related equipment, aims to advance research in dexterous dynamic robot manipulation. The major components of the system are a two-arm manipulation system consisting of two 7-DOF WAM arms and three-finger hands with tactile and force-torque sensors; a sensory workspace consisting of high-speed vision for object tracking and color-depth cameras for lower-speed color imaging and occupancy maps; and a user command and control workstation, all integrated using software running under the Robot Operating System (ROS). The instrument enables research in various areas, such as manipulation, haptics, learning-by-demonstration, gesture recognition, rehabilitation, prosthetics, and novel sensing modalities (e.g., active electrosense).
Broader Impacts: The area of human-robot interaction should gain much from this instrument. An active area of research, dual-arm manipulation is extremely relevant in the context of manufacturing, which constitutes an important and urgent national concern. In terms of outreach and the involvement of under-represented groups, the team's track record is evidenced by institutional programs, such as summer research opportunities and partnerships with Girl Scout troops and local science museums. This project aims to place PhD students involved in the project as interns at Barrett Technologies, thus providing opportunities to collaborate closely with the robot arm/hand designer. Such collaboration transcends the traditional vendor-buyer relationship to possibly co-design and co-publish material and software components.
2012 — 2017 | Wu, Ying
RI: Small: Mining and Learning Visual Contexts For Video Scene Understanding @ Northwestern University
This project investigates a fundamental and critical, but largely unexplored, issue: automatically identifying visual contexts and discovering visual patterns. Many contemporary approaches that attempt to divide and conquer video scenes by analyzing visual objects separately face serious limitations. Exploring visual context has shown promise for video scene understanding. Discovering visual contexts, however, is a challenging task, due to the content uncertainty in visual data, the structure uncertainty in visual contexts, and the semantic uncertainty in visual patterns. The goal of this project is to lay the foundation of contextual mining and learning for video scene understanding by pursuing innovative approaches to discovering visual collocation patterns, empowering contextual matching of visual patterns, and facilitating contextual modeling for visual recognition. The research team develops a unified approach to mining visual collocation patterns and learning visual contexts, and provides methods and tools that facilitate contextual matching and modeling.
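As a toy illustration of collocation mining (under assumed inputs, not the project's formulation), the sketch below counts which pairs of visual words co-occur within a spatial neighborhood far more often than chance would predict; the radius, lift threshold, and data are all made up.

```python
# Hedged sketch: mining visual collocation patterns as high-lift co-occurring word pairs.
from collections import Counter
from itertools import combinations
import math

def collocation_pairs(features, radius=30.0, min_lift=2.0):
    """features: list of (x, y, word_id). Return word pairs with high co-occurrence lift."""
    word_counts = Counter(w for _, _, w in features)
    pair_counts = Counter()
    for (x1, y1, w1), (x2, y2, w2) in combinations(features, 2):
        if math.hypot(x1 - x2, y1 - y2) <= radius:
            pair_counts[tuple(sorted((w1, w2)))] += 1
    n = len(features)
    results = []
    for (a, b), c in pair_counts.items():
        expected = word_counts[a] * word_counts[b] / n      # rough chance co-occurrence
        lift = c / max(expected, 1e-9)
        if lift >= min_lift:
            results.append(((a, b), lift))
    return sorted(results, key=lambda r: -r[1])

# Toy usage: words 0 and 1 always appear close together (a "collocation"); word 2 floats freely.
feats = [(i * 100, 0, 0) for i in range(5)] + [(i * 100 + 10, 5, 1) for i in range(5)] \
        + [(i * 100 + 500, 400, 2) for i in range(5)]
print(collocation_pairs(feats)[:1])  # the (0, 1) pair should surface with high lift
```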
This research significantly advances video scene modeling and understanding, and produces an important enabling technology for a wide range of applications including image/video management and search, intelligent surveillance and security, human-computer interaction, and social networks. The research program contributes to education through curriculum development, student involvement, and workshops and tutorials outside the vision community. The project also reaches out to K-12 education, and it provides datasets and software on its website to the community.
2016 — 2019 | Wu, Ying
RI: Small: Modeling and Learning Visual Similarities Under Adverse Visual Conditions @ Northwestern University
In many emerging applications such as autonomous/assisted driving, intelligent video surveillance, and rescue robots, the performance of visual sensing and analytics is largely jeopardized by various adverse visual conditions in complex unconstrained environments, e.g., bad weather and illumination conditions. This project studies how, and to what extent, such adverse visual conditions can be coped with. It will advance and enrich the fundamental research of computer vision, and bring significant impact on developing "all-weather" computer vision systems that benefit security/safety, autonomous driving, and robotics. The project contributes to education through curriculum development, student training, and knowledge dissemination. It also includes interactions with K-12 students for participation and research opportunities.
This research seeks innovative solutions to overcome adverse visual conditions for visual sensing and analytics. It explores a unified approach that avoids explicit image restoration, which is in general computationally demanding. It is focused on learning the "alignment" between the two image spaces under adverse and normal conditions, rather than learning everything from scratch. By acting on low-quality data directly without image restoration, this research leads to innovative and computationally efficient solutions for handling adverse visual conditions. Visual restoration can also be obtained as a by-product, and the same approach also provides a general solution to target attribute estimation. The research is focused on: (1) constructing a principled model, called space alignment, that models and learns visual similarity, along with its theoretical foundation; (2) developing new effective visual matching and tracking approaches based on learning the appropriate visual similarity under various adverse visual conditions; (3) investigating visual attribute estimation and identification via learning reconstruction-based visual regression; and (4) developing effective and efficient tools and prototype systems for visual detection, identification, tracking, and recognition.
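A minimal sketch of the "alignment" idea, under assumed synthetic data: learn a simple map from features computed under an adverse condition to the normal-condition feature space, so that matching can act on low-quality inputs without explicit restoration. The least-squares map below is only an illustration; the project's space alignment model is not claimed to be this.

```python
# Hedged sketch: aligning "adverse" features to the "normal" feature space with a ridge map.
import numpy as np

rng = np.random.default_rng(0)

def fit_alignment(F_adverse, F_normal, reg=1e-3):
    """Ridge-regression map A such that F_adverse @ A approximates F_normal."""
    d = F_adverse.shape[1]
    A = np.linalg.solve(F_adverse.T @ F_adverse + reg * np.eye(d), F_adverse.T @ F_normal)
    return A

# Toy usage: "adverse" features are a noisy linear distortion of the clean ones.
F_clean = rng.normal(size=(500, 16))
distortion = rng.normal(size=(16, 16)) * 0.2 + np.eye(16)
F_bad = F_clean @ distortion + rng.normal(scale=0.05, size=(500, 16))

A = fit_alignment(F_bad, F_clean)
query_bad = F_bad[0]
aligned = query_bad @ A
# After alignment, the corrupted query is much closer to its clean counterpart.
print(np.linalg.norm(query_bad - F_clean[0]), np.linalg.norm(aligned - F_clean[0]))
```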
2018 — 2021 | Wu, Ying
RI: Small: A Unified Compositional Model For Explainable Video-Based Human Activity Parsing @ Northwestern University
An ultimate goal of computer vision is understanding scenes and activities from images and video. This task involves many perceptual and cognitive processes at various semantic levels. A next step beyond visual classification is visual interpretation, that is, explaining the relations among visual entities through visual inference and reasoning. Due to the enormous variability across instances of this problem, semantic parsing for explaining a visual scene and activities is highly challenging. This project studies how the structural composition of visual entities can be used to overcome the diversity in visual scenes and activities. It advances and enriches the basic research of computer vision, and brings significant impact to many emerging applications, including autonomous or assisted driving, intelligent robots, and intelligent video surveillance. This research also contributes to education through curriculum development, student training, and knowledge dissemination. It includes interactions with K-12 students for participation and research opportunities.
The goal of this research is to develop a unified visual compositional model that can effectively learn complex semantic concepts in a scalable end-to-end fashion, while achieving good generalizability and providing explainable parsing of the visual data. The project is focused on: (1) a principled model and its theoretical foundation, by designing a stochastic grammar based on the probabilistic And/Or-Graph to model the structural composition; (2) an effective computational approach for learning and parsing, by exploiting data-driven pattern mining to discover structural components and by exploring how the patterns may be self-formed; (3) a solid case study on video human activity parsing and interpretation, by inferring the complex compositions of human actions, body movements, and interactions with the environment; and (4) tools and prototype systems for human articulated body pose estimation, contextual object discovery, and video-based human activity analysis and interpretation.
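To picture what an And/Or-Graph composition looks like in code, the following toy sketch defines a two-level grammar for a hypothetical activity and parses it by choosing the best alternative at Or-nodes and composing part scores at And-nodes. The grammar, the detector scores, and the scoring rule are all made up for illustration and are not the project's grammar.

```python
# Illustrative sketch: a tiny probabilistic And/Or graph for activity parsing.
import math

GRAMMAR = {
    "activity": ("OR",  ["drink", "read"]),
    "drink":    ("AND", ["reach_cup", "raise_cup"]),
    "read":     ("AND", ["hold_book", "look_down"]),
}

def parse(node, leaf_scores):
    """Return (log-score, parse tree) for the best interpretation rooted at `node`."""
    if node not in GRAMMAR:                      # terminal: score from a (hypothetical) detector
        return math.log(leaf_scores.get(node, 1e-6)), node
    kind, children = GRAMMAR[node]
    if kind == "AND":                            # compose all required parts
        scores, trees = zip(*(parse(c, leaf_scores) for c in children))
        return sum(scores), (node, list(trees))
    best = max((parse(c, leaf_scores) for c in children), key=lambda t: t[0])
    return best[0], (node, [best[1]])            # OR: pick the best alternative

# Toy usage: detectors strongly support the "drink" sub-parts.
leaf_scores = {"reach_cup": 0.9, "raise_cup": 0.8, "hold_book": 0.2, "look_down": 0.7}
score, tree = parse("activity", leaf_scores)
print(round(score, 2), tree)   # the parse explains the activity as "drink" with its parts
```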
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
2020 — 2023 | Wu, Ying
RI: Small: Visual Reasoning and Self-Questioning For Explainable Visual Question Answering @ Northwestern University
Visual question answering (VQA), which aims to answer a question posed in natural language about a given image, is still in its infancy. Current approaches lack the flexibility and generalizability to handle diverse questions without retraining. It is therefore desirable to explore explainable VQA (or X-VQA), which can provide explanations of its reasoning in natural language in addition to answers. This requires integrating computer vision, natural language, and knowledge representation, and it is an incredibly challenging task. By exploring X-VQA, this project advances and enriches fundamental research in computer vision, image understanding, visual semantic analysis, machine learning, and knowledge representation. It also greatly facilitates a wide range of applications including visual chatbots, visual retrieval and recommendation, and human-computer interaction. This research also contributes to education through curriculum development, student training, and knowledge dissemination. It includes interactions with K-12 students for participation and research opportunities.
The major goal of this research is to develop a novel computational model, with a solid theoretical foundation and effective methods, to facilitate X-VQA that provides explanations of its visual reasoning. This challenging task involves many fundamental aspects and needs to integrate vision, language, learning, and knowledge. This project focuses on: (1) A unified computational model of X-VQA and its theoretical foundation. This model integrates domain knowledge and visual observations for reasoning: what and how hidden facts can be inferred from incomplete and inaccurate visual observations; how visual observations, hidden facts, and domain knowledge can be represented for efficient question answering; and how the question answering can be made scalable. The study of these critical issues creates the foundation for X-VQA. (2) A new model for question-driven, task-oriented visual observation. It is inefficient to collect all visual observations before answering a question; vision needs to be question-driven and task-oriented. This project pursues a new model for the interaction of questions, visual reasoning, and visual observation, so as to automatically steer attention to the question-related aspects of an image. (3) An innovative approach to self-questioning for training X-VQA agents. Training simply based on question-answer data is not viable for X-VQA, as it is unable to provide explanations for and insights into the answer. This project pursues a novel approach to self-questioning, in which the VQA agents can also generate and ask questions. It investigates how self-questioning can be combined with reinforcement learning, and how it can deal with versatile questions to improve the scalability of X-VQA. (4) A solid case study on X-VQA.
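The self-questioning idea can be caricatured with the toy skeleton below, in which an agent samples its own relational questions about a scene, answers them against known facts, and accumulates a reward that a reinforcement learner could optimize. The scene facts, question templates, and reward are hypothetical stand-ins; the project's X-VQA model is far beyond this skeleton.

```python
# Highly simplified, hypothetical sketch of a self-questioning episode for X-VQA training.
import random

random.seed(0)

SCENE_FACTS = {("cup", "on", "table"), ("person", "holds", "cup"), ("book", "on", "shelf")}
OBJECTS = ["cup", "table", "person", "book", "shelf"]
RELATIONS = ["on", "holds"]

def generate_question():
    """Self-questioning: sample a relational query about the scene."""
    return (random.choice(OBJECTS), random.choice(RELATIONS), random.choice(OBJECTS))

def answer(question, facts):
    """Answer yes/no, returning the supporting fact as a rudimentary 'explanation'."""
    return (True, question) if question in facts else (False, None)

def self_questioning_episode(n_questions=20):
    """One episode: reward the agent for questions whose answers are grounded in facts."""
    reward = 0.0
    for _ in range(n_questions):
        q = generate_question()
        ans, support = answer(q, SCENE_FACTS)
        reward += 1.0 if ans else -0.1           # toy reward shaping for informative questions
    return reward

print(self_questioning_episode())
```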
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.