We are testing a new system for linking grants to scientists.
The funding information displayed below comes from the
NIH Research Portfolio Online Reporting Tools and the
NSF Award Database.
The grant data on this page is limited to grants awarded in the United States and is therefore only partial. It can nonetheless be used to understand how funding patterns influence mentorship networks and vice versa, which has deep implications for how research is done.
You can help! If you notice any inaccuracies, please
sign in and mark grants as correct or incorrect matches.
Sign in to see low-probability grants and correct any errors in linkage between grants and researchers.
High-probability grants
According to our matching algorithm, Olga Russakovsky is the likely recipient of the following grants.
Years | Recipients | Code | Title / Keywords | Matching score
2021 — 2025 | Russakovsky, Olga; Narasimhan, Karthik | N/A | RI: Medium: Improving Grounding, Generalization and Contextual Reasoning in Vision and Language Models | 1
Recent Artificial Intelligence (AI) advances have brought us closer to the possibility of important and exciting real-world applications: ranging from robot assistants for the elderly or differently-abled, to large-scale video analysis of footage from police body-worn cameras to examine police-civilian interactions. Such applications require AI models to understand both visual and natural language cues. However, the state of vision-and-language technology is still not quite ready for these scenarios. Current visual recognition models appear to recognize many different objects but lack an understanding of the interconnection and structure of the visual world. Current image captioning systems output reasonable but completely generic image descriptions. Modern visual question answering systems are not robust to simple changes like synonyms or word rearrangements. This research will lead to fundamental advances in visual recognition and natural language understanding, laying the groundwork for more effective human-machine collaboration.
The goal of this research is to move towards a tighter, more accurate and contextual integration of visual recognition and natural language processing. This involves addressing three key challenges: (1) enabling accurate and scalable grounding by establishing robust bi-directional connections between visual input and natural language tokens; (2) improving generalization of vision-and-language models to novel concepts and tasks; and (3) enabling contextual reasoning to allow models to effectively adapt to human or task-specific needs. The unifying theme is that all three challenges require innovation in not only modeling but also in reliable and insightful benchmarking: current evaluation frameworks are insufficient to drive progress in this space. The roadmap is to redesign existing benchmarks and evaluation paradigms, use the newly formulated metrics to identify the shortcomings in existing systems, and rely on these insights to drive the deep learning modeling innovations. This research uses the team’s expertise in designing multi-modal models for vision and language as well as in constructing effective large-scale benchmarks. The findings will be disseminated through technical workshops, open access publications, and open-source code. They will also be integrated into undergraduate, graduate and K-12 curriculum through collaboration with foundations like AI4ALL.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.