Michael Schatz - US grants

Affiliations:

2010

Computer Science

University of Maryland, College Park, College Park, MD


We are testing a new system for linking grants to scientists.

The funding information displayed below comes from the NIH Research Portfolio Online Reporting Tools and the NSF Award Database.
The grant data on this page is limited to grants awarded in the United States and is thus partial. It can nonetheless be used to understand how funding patterns influence mentorship networks and vice versa, which has deep implications for how research is done.
You can help! If you notice any inaccuracies, please sign in and mark grants as correct or incorrect matches.

High-probability grants

According to our matching algorithm, Michael Schatz is the likely recipient of the following grants.

Years	Recipients	Code	Title / Keywords	Matching score
1999–2004	Schatz, Michael	N/A	Experiments On Dynamics and Control of Spatiotemporal Chaos in Thermal Convection @ Georgia Tech Research Corporation The research will explore the fundamentals of controlling fluid flow in convective systems. Experiments will be carried out on surface-tension-driven (Marangoni) convection where the driving forces, thermally induced surface tension gradients confined to an interface, can be probed and altered for both laminar and chaotic convection. The driving will be determined via infrared imaging of the interfacial temperature field and altered directly via nearly simultaneous, multipoint heating by an infrared laser scanner. Dynamics in three areas will be investigated: 1) wavenumber selection mechanism of spatially periodic patterns; 2) defect dynamics in disordered convective flow; 3) the role of unstable periodic orbits in spatiotemporal chaos.	0.903
2002–2006	Neitzel, G. Paul (co-PI); Smith, Marc; Schatz, Michael	N/A	SGER: Opto-Microfluidics: Containerless, Optically Controlled Microscale Fluid Management @ Georgia Tech Research Corporation CTS-0201610 Michael Schatz, Georgia Institute of Technology. This is a one-year exploratory program to study the use of thermocapillary-based opto-microfluidics with water-based samples. It is a "proof-of-principle" study to determine whether evaporation effects adversely affect the intended motion of such fluid samples by the proposed opto-microfluidic techniques. Experiments will be focused on aqueous microdroplets on immiscible liquid substrates. The effect of biologically-based surface-active agents on the motion of the microdroplets, and the metering and mixing of such samples by the proposed technique will also be studied.	0.903
2004–2009	Schatz, Michael	N/A	Collaborative Research: MSPA-CSE: State Estimation and Predictability of High-Dimensional Complex Systems--Theory and Experiment @ Georgia Tech Research Corporation The goals of this project are to develop techniques for the detection and prediction of low-dimensional features in high-dimensional, spatially extended systems, to investigate how coherent structures within the system affect predictability, to test control strategies for high-dimensional, spatially extended systems, and to improve understanding of uncertainty as a function of spatial scale in such systems. One motivation for the work is the desire to develop better state estimation techniques for complex environmental systems. The model system for this work is that of shallow thermal convection. The project combines laboratory experiment with theory and modeling. As well as being a good model system, convection is important in many environmental systems in the atmosphere, ocean and interior of the Earth, and in industrial settings. Thermal convection exhibits multi-scale spatial and temporal complexity and laboratory convection experiments provide plentiful, high quality, observational data under relatively well-controlled conditions. The laboratory experiments use carbon dioxide as the convecting medium in a shallow cell. Specific spiral-defect chaos flow patterns can be initiated using a computer-steered laser system that selectively heats parts of the fluid. The primary theoretical tool is a state estimation technique in which a local ensemble Kalman filter is applied to a numerical Navier-Stokes solver. This will be applied to a range of flow patterns of increasing complexity exhibited by the convection cell as the Rayleigh number is increased. The data to be assimilated will be taken from shadowgraph images of the convection.
Later stages of the project will include experiments in control of the convection using selective heating guided by output from the state estimation system. It is anticipated that the results of this work will be applicable to other spatially-extended complex systems.	0.903
2006–2010	Catrambone, Richard (co-PI); Schatz, Michael; Marr, Marcus	N/A	Collaborative Research: Institutionalizing a Reform Curriculum in Large Universities @ Georgia Tech Research Corporation In universities with large science and engineering programs, the introductory calculus-based physics course plays a central role in the education of very large numbers of students who will become scientists and engineers. Despite repeated calls from the physics community for improvement and modernization of this introductory physics course, the content and structure of the traditional course taught at most such large institutions has changed very little in the past fifty years. Although science and engineering universities often play a lead role in setting the standards for courses taught at other institutions, the large enrollment in their introductory courses, and the involvement of a large number of research faculty and academic support staff, has made it difficult to implement substantive curricular changes. Recently three large universities (NC State, Purdue, and Georgia Tech) have begun the process of implementing the Matter & Interactions curriculum, which was initially developed at Carnegie Mellon University. Matter & Interactions is a calculus-based introductory physics curriculum in which twentieth-century physics is integrated as a central part of the curriculum, in which a small set of fundamental principles are emphasized and used as the starting point for all analyses, and in which computation is an integral part of the course.
The collaborative work in this project, focused on facilitating the implementation and on widening the base of dissemination, involves creating supporting infrastructure and activities, studying and documenting the changes and adaptations necessary to make the curriculum work well at different institutions, assessing the impact of this curriculum on both students and faculty, and working on further improvements to the instructional materials used by students. Workshops and working group meetings will initially involve participants from the three institutions; in subsequent years teams from other interested institutions are participating. Intellectual Merit: Research and development in this project focuses on documenting and studying in detail the issues that arise, as well as carrying out the adaptation and customization necessary to implement a reform curriculum at different large institutions. The project is also studying student learning in the context of this curriculum, and identifying and remedying deficiencies in the instructional materials themselves. Documenting the process of dissemination on this scale can inform future large-scale content reforms, both in physics and in other physical science and engineering disciplines. The existing body of research in physics education does not cover some of the central concepts and skills students in this new curriculum need to acquire, so that continued research on student learning is also important. The involvement of nationally known cognitive scientists brings important expertise and a different perspective to this research. Broader Impact: None of the previous attempts to reform the content and emphasis of the introductory university-level calculus-based physics course have achieved long-term and broad institutionalization, despite the excellence of the content of textbooks such as the Feynman Lectures and the Berkeley Physics series.
The importance of contemporary concepts and models is even more marked now than it was in the past, because science and engineering students need this background to work on contemporary problems such as the design of new conducting materials; fast, high density data storage and retrieval; new communication technologies; nanoscience and nanotechnology; and computer modeling of extremely complex systems, including climate and geophysical phenomena. NC State, Purdue, and Georgia Tech are large and highly visible universities with strong science and engineering programs. Effective implementation of an innovative curriculum at these institutions can inspire other large institutions to consider similar reforms. Smaller institutions may not need to make use of all of the materials and structures developed by this project, but much of the work is producing materials and methods also useful in institutions in which teaching is done on a smaller scale.	0.903
2009–2013	Schatz, Michael; Webster, Donald	N/A	Laboratory Studies of Exact Coherent Structures in Wall Turbulence @ Georgia Tech Research Corporation 0853691 Schatz. The PIs plan novel laboratory tests of a new and fundamentally different theoretical view of wall turbulence. Organized flows (coherent structures) near smooth walls play a central role in turbulence production over a wide range of Reynolds numbers. Recent theoretical work with large scale numerical computation has revealed a class of unstable, exact Navier-Stokes solutions termed "exact coherent structures" (ECS), which capture essential features of classic coherent structures. In particular, theory suggests ECS can be used to construct simplified descriptions of wall turbulence. While theory/numerics have made a compelling case for an ECS-based description in highly idealized turbulent flows with periodic boundary conditions, very little is known about whether this viewpoint can describe turbulence in the lab. The PIs plan experimental studies of ECS in circular Couette flow where turbulence is initialized by precise, optically-imposed disturbances and measured by 3D velocimetry. The PI's unique method of optical distributed flow actuation has already been tested in other flows; thus, the experiments will focus directly on flow physics instead of actuator development. Laboratory testing of an ECS-based description will begin first with the adaptation of theoretical state space visualization techniques to experiments. The experiments will then identify important ECS. In particular, the PIs will focus on the Lower Branch ECS, which appears to play a key "gatekeeper" role in the transition between laminar and turbulent flow. Characterization of Lower Branch ECS will set the stage for novel approaches to turbulence flow control.
Turbulence is a major consideration in the design of transport vehicles on land, at sea or in the air. In practical applications, even modest improvements in turbulence control could have enormous economic impact. The tools of dynamical systems are becoming ever more important in solving engineering problems. Thus, our planned work will implement a laboratory module on state space visualization of pendulum dynamics that will be included in a novel introductory calculus-based engineering physics curriculum at Georgia Tech and at Spelman College, the country's leading Historically Black College/University for women.	0.903
2009–2014	Schatz, Michael; Kohlmyer, Matthew	N/A	Transforming Homework Into Cyberlearning in An Introductory STEM Course @ Georgia Tech Research Corporation Physics (13). This multi-institutional project is developing novel cyberlearning exercises involving computation for introductory, calculus-based mechanics. The exercises provide students with a rich dynamic setting to gain experience with solving a wide variety of mechanics problems and build correct physical intuition by using computation to visualize motion and to model more physically realistic situations (e.g. visualize 3D motion dynamically and calculate a wide range of quantities describing the motion). The exercises are also intended to help students overcome anxieties associated with using computation, thereby reinforcing the importance of computation as a key tool for solving today's science and engineering problems.	0.903
2011–2012	McCombie, W. Richard; Schatz, Michael; Witkowski, Jan	N/A	Meeting On the Future of Plant Genome Sequencing and Analysis to Be Held May 18-20, 2011 At Cold Spring Harbor Laboratory, Cold Spring Harbor, NY. @ Cold Spring Harbor Laboratory The last few years have seen a revolution in DNA sequencing instrumentation and technology. In that short time period sequencing instrument capabilities have increased more than 1000-fold and are likely to continue to increase about 5-fold each year for the next several years. As such it is now affordable and efficient to sequence to high coverage large plant genomes that had previously been prohibitively expensive and complex to attempt. However, analysis methods have not improved nearly as much during the same time period and a variety of technical limitations of these new DNA sequencing instruments make it even more difficult to carry out whole genome sequencing of novel genomes (de novo sequencing). These limitations also make it more difficult to use the new instruments to carry out older clone based strategies for de novo sequencing, such as BAC-by-BAC approaches. The purpose of this meeting, to be held at Cold Spring Harbor Laboratory May 18-20, 2011, is to assess the current state of next generation sequencing in terms of de novo, whole genome plant sequencing, what can be expected to develop in the near future, and then determine which advances are needed to allow these exciting technologies to be used to carry out de novo sequencing of entire complex plant genomes. The meeting will bring together stakeholders with a broad range of expertise in high-throughput sequencing and genomics, plant biology, bioinformatics and databases drawn from the academic, private and international sectors.
Meeting outcomes will be captured in the form of a report to be developed by participants that will be submitted for publication to Genome Research.	0.916
2011–2013	Schatz, Michael; Roy, Rajarshi (co-PI); Swinney, Harry; Showalter, Kenneth (co-PI)	N/A	Hands-On Research: Complex Systems Advanced Study Institute (China) @ Georgia Tech Research Corporation This award provides partial funding for an advanced study institute (ASI) on complex systems in physics to be held in Shanghai, China, in June 2012. A team of 12 senior researchers and 24 assistants from the U.S. will join colleagues at Shanghai Jiao Tong University (SJTU) to offer a two-week hands-on course demonstrating table-top experiments in complex non-linear physical systems. About 70 participants, primarily junior faculty members, will be selected from underdeveloped regions of Central and Southeast Asia. The objective is to demonstrate that interesting and productive experiments can be conducted with relatively inexpensive and available materials. This ASI is a successor to similar programs that have been held in Africa, India, and Brazil. The local expenses will be supported by SJTU, and participant expenses as well as administrative costs are sponsored by the International Center for Theoretical Physics (ICTP) in Trieste, Italy. This workshop will engage leading researchers and well qualified post-doctoral fellows and graduate students with participants from underdeveloped regions in Asia to conduct the table-top experiments as well as to introduce useful computer software to the participants. As has already been demonstrated by the previous Hands-on ASIs, these activities stimulate the curiosity of the participants and raise a variety of interesting research questions that they can pursue on their own and in continuing collaborations. In addition to raising research issues in the study of complex phenomena the institute demonstrates useful and effective methods of scientific education. This experience is beneficial to the young U.S. assistants as much as to the Asian participants.	0.903
2011–2017	Schatz, Michael	N/A	CDI-Type II--Collaborative Research: Using Algebraic Topology to Connect Models With Measurements in Complex Nonequilibrium Systems @ Georgia Tech Research Corporation Numerous complex systems in nature and in technology defy concise characterization because they exhibit strongly nonlinear behaviors that lack all symmetries and are highly non-periodic on a wide range of spatial and temporal scales. Characterization by detailed measurement (in lab experiments or direct numerical simulations) is now possible in many cases using modern measurement technologies or computational techniques. However, the resulting deluge of data often leads to little insight; in particular, there is frequently no good way to connect quantitatively experimental measurements of a particular complex system with the output from simulations/models of the same system. New, computationally-based, mathematical tools from algebraic topology have the potential to bridge the gap between measurements and models; the proposed research will explore the use of algebraic topology to link numerical simulations and laboratory experiments in situations where complexity arises because the system under study is driven out of thermodynamic equilibrium. The research focuses on an outstanding paradigm for nonequilibrium complexity: fluid flow driven by temperature gradients (thermal convection). The planned work brings three unique capabilities together in a single effort: (1) the experimental ability both to measure and to manipulate precisely complex, convective flows; (2) efficient methods for state-of-the-art, large scale, high-resolution numerical simulations of convective flow; (3) open source, general purpose, and efficient computational algorithms and software for computing algebraic topological invariants on large data sets.
Topological tools will be developed both to characterize and to minimize model error as well as to compare and to quantify dynamical properties including Lyapunov exponents, dimensionality and bifurcations between complex spatiotemporal flow states. This effort should ultimately identify ways in which homology-based metrics can be used for building reduced order models that permit prediction and, perhaps, control of convective flow. More generally, we expect the metrics developed for convection should find broad application to PDE-modeled problems ranging from the control of cardiac arrhythmias to the prediction of weather and climate. The behaviors of complex systems in the world around us can now both be measured with high fidelity using advanced sensing technologies and simulated with great realism using modern computer techniques. However, the enormous data sets typically produced in these cases are often difficult to interpret because there exist few good mathematical tools to connect quantitatively the experimental measurements of a given complex system with the output of computer simulations of that same system. The proposed research explores the use of the mathematics of topology to relate lab measurements to computer outputs in a particular complex system, thermal convection. The results of this work should lead to new ways to understand, to predict, and, perhaps, to control convective flow, which plays a direct role in natural processes (e.g., volcanism, earthquake dynamics, continental drift) and industrial applications (e.g., thermal regulation of many devices, the growth of semiconductor materials).
Moreover, the topological tools developed for thermal convection should apply more generally to a wide variety of other problems involving complex systems including the forecasting of weather and climate; the dynamics of the biomass in the oceans; the onset of turbulence; the evolution of reagent patterns on a catalytic metal surface; and ventricular fibrillation in a human heart.	0.903
2012–2015	Ware, Doreen; Lippman, Zachary (co-PI); Schatz, Michael; Churchland, Anne (co-PI)	N/A	REU Site: CSHL NSF-REU Bioinformatics and Computational Biology Summer Undergraduate Program @ Cold Spring Harbor Laboratory A Research Experience for Undergraduates (REU) Sites award has been made to Cold Spring Harbor Laboratory (CSHL) that will provide research training for 8 students, for 10 weeks during the summers of 2012-2014. The program trains participants on the present and growing need to integrate biological research with sophisticated computational tools and techniques. CSHL has over 40 faculty members, including members of a newly established Quantitative Biology Department, who will serve as bioinformatics and computational biology mentors in fields ranging from plant biology to machine learning for biology. Through this NSF-REU support, students are afforded the opportunity to conduct full-time research in an appropriately matched lab based on mutual interests and goals. CSHL REU participants have access to individual and shared laboratory facilities such as flow cytometry, high throughput sequencing and analysis, imaging, and proteomics facilities. Participants attend multiple seminars and workshops, such as the responsible conduct in research, professional communication skills, the graduate school application process, and introduction to science careers. REU participants also are invited to attend the CSHL summer courses or meetings, which cover a range of topics such as Computational Neuroscience and Single Cell Analysis. All students are housed on campus within walking distance of their laboratories and the CSHL cafeteria, where they receive the majority of their meals.
The multilayer recruitment effort consists of both traditional and digital mailings to potential students and their professors, as well as recruitment visits to universities throughout the country. Students are selected based on academic record, motivation for the proposed program of study, and potential as future researchers. Alumni successes are monitored to determine their continued interest in their academic field of study, their career paths, and the long-term impact of their research experience. Information about the program will be assessed using faculty and student evaluations, as well as the use of an REU common assessment tool. More information is available by visiting http://www.cshl.edu/education/urp/nsf-sponsored-reu-in-bioinformatics-and-computational-biology, or by contacting the PI (Dr. Zachary Lippman at lippman@cshl.edu) or the co-PI (Dr. Doreen Ware at ware@cshl.edu).	0.916
2012–2017	Schatz, Michael; Grigoriev, Roman	N/A	Dynsyst_special_topics: Dynamics of Turbulent Flow Via Unstable Exact Navier-Stokes Solutions: Connecting Theory & Numerics With Experiments @ Georgia Tech Research Corporation The objective of this research program is to develop and to test experimentally a revolutionary new approach to modeling and predicting two-dimensional turbulent flows. A set of weakly unstable invariant Navier-Stokes solutions will be identified and transitions between invariant solutions will be characterized to provide a coarse global description of the nonlinear dynamics of turbulent flow. Quasi-2D flow in a shallow electrolyte layer continually driven by Lorentz forces provides the setting for theoretical, analytic and experimental development of this approach. Novel and proven techniques, such as periodic orbit theory, group representation theory, Krylov-subspace numerical methods, Newton and variational solvers will be used to develop this viewpoint, which will be tested in experiments where the flow can be measured with full spatial and temporal resolution throughout the entire flow domain. If successful, the results of this research will impact several areas of science, engineering, and medicine. Although the focus of this investigation is on fluid turbulence in two dimensions, the proposed approach has the potential to provide new ways to model, and ultimately control, a wide range of spatiotemporally chaotic systems, such as magnetic confinement fusion reactors and abnormal cardiac dynamics (from mild arrhythmias to potentially lethal fibrillation). The most immediate practical application, however, is the reduction of turbulent drag responsible for a significant part of the fuel consumed in the automotive, aviation, and shipping industries.
Even an incremental reduction of drag by the proposed flow control methodology would have a tremendous economic impact. All software and data produced by the research program will be made publicly available with a central aim of lowering the barrier of entry to dynamical systems research by providing well-documented, easy-to-use interfaces to state-of-the-art numerical algorithms.	0.903
2012–2017	Schatz, Michael; Van Eck, Joyce; Lippman, Zachary	N/A	Genes and Networks Regulating Shoot Maturation and Flower Production in Tomato and Related Nightshades @ Cold Spring Harbor Laboratory PI: Zachary B. Lippman (Cold Spring Harbor Laboratory) Co-PIs: Michael C. Schatz (Cold Spring Harbor Laboratory) and Joyce Van Eck (Boyce Thompson Institute for Plant Research) Key Collaborators: Molly Hammell and Jesse Gillis (Cold Spring Harbor Laboratory) Plants show remarkable variation in the number of flowers they produce during their lifetime. This widespread variation traces back to differences in how, when, and where plants switch from making leaves to making flowers - the flowering transition. Although vitally important to crop yields, the transition to flowering and the subsequent effects on shoot growth and flower production remain poorly understood in many types of plants. For example, it is still not known why one plant will form just a single flower each time there is a flowering transition, as in pepper, and yet another plant will grow dozens of branches bearing hundreds of flowers, as in some types of tomato. To address this fundamental question in plant biology, this project is uniting a unique set of genetic, genomic, and natural variation tools in tomato and related Solanaceae plants, such as pepper, potato, and petunia, to reveal the genes and networks controlling how, when, and where plants undergo flowering transitions throughout development to continuously generate new branches and flowers. By analyzing a wide range of tomato mutants and wild Solanaceae species reflecting a wide range of flower production, this research will identify and characterize the differences in gene expression and DNA sequences that underlie variation in flowering transitions and flower production.
This multi-dimensional project will provide the most detailed information yet on the key genetic regulators that drive the initiation and production of flowers in both agricultural and wild plants, which will enable the application of novel strategies to improve crop yields. The Solanaceae comprise the most valuable family for vegetable crop production, and we will deliver to both the public and scientific community broad genetic and genomic data in tomato, pepper, and edible wild Solanaceae species that have the potential to become agriculturally important crops. This project will train high school and college students in interdisciplinary plant research, and a unique outreach program has been developed with an elementary school in Queens, New York to excite young students about plant biology and to explain the importance of integrating multiple research disciplines to create the knowledge and tools that will ensure food security. Students will meet scientists, experience plant genetic research in their own school, experiment in a "Virtual Greenhouse" with kid-friendly genetics games, and practice science writing. Each year, several students will be awarded a daylong visit to CSHL to experience modern plant biology research firsthand. All data from this project, including gene expression, genetic mapping, network analyses, and computational tools for analyzing DNA sequences will be made publicly available immediately after passing quality control. All DNA sequence data will be deposited in Genbank (http://www.ncbi.nlm.nih.gov/Genbank/), the SOL Genomics Network (SGN) website (http://www.sgn.cornell.edu/), and a project web site that will be developed.	0.916
2014–2019	Schatz, Michael	N/A	CAREER: Algorithms For Single Molecule Sequence Analysis @ Cold Spring Harbor Laboratory The Cold Spring Harbor Laboratory is awarded a CAREER grant for the PI Michael Schatz to develop new computational methods for processing DNA sequencing data from the latest high-throughput sequencing technologies. DNA sequencing costs and throughput have improved by orders of magnitude over the last three decades, although many questions remain unsolved, especially because of the short sequence lengths currently available. Emerging "third generation" sequencing technologies from Pacific Biosciences, Moleculo, Oxford Nanopore, and other companies are poised to revolutionize genomics by enabling the sequencing of long, individual molecules of DNA and RNA. The sequence lengths with these technologies can reach up to tens of thousands of nucleotides; however, few or no analysis packages are capable of dealing with these types of genetic sequence data. This project will overcome these limitations by developing several novel analysis algorithms specifically for long read single molecule sequencing and their associated complex error models. The outcomes will help answer biological questions of profound significance to all of society, such as: What were the genetic implications of the domestication of rice? What genes and regulatory elements give rise to the incredible regenerative properties of the flatworm? Or, what can be understood from assembling reference genomes of sugarcane and pineapple towards breeding more robust plant crops and biofuels?
Specific objectives of the research include working towards assembling entire plant and animal chromosomes into complete, haplotype-phased sequences; identifying fusion genes and complex alternative splicing patterns responsible for diseases or adaptability; and searching for structural variations associated with improved crop yield or human diseases such as cancer or autism. Even if some future technology is capable of directly reading entire transcripts or entire genomes, this research will remain necessary to examine the higher level relationships across populations of genomes or in measuring the dynamics of gene expression and splicing. This project will tightly integrate research and education, promoting opportunities at high school through postdoctoral levels with the development of new course materials, hands-on research opportunities, and one-on-one mentoring experiences. This effort will specifically target the intersection of computer science and biology, promoting interdisciplinary education, and ensuring the next generation of scientists are ready for the complexities of quantitative and digital biology. To engage the widest possible audience, Dr. Schatz will also develop novel online teaching materials made available through a yearly bioinformatics contest. The first round of the contest reached nearly 1000 students around the world and at all levels of education, engaging students far beyond our physical limits. The products of the research will be made available as open-source software and installed into the graphical iPlant Discovery Environment, making them easily accessible to the large community of plant researchers around the world.	0.939
2016 — 2018	Schatz, Michael	N/A	Graduate Teaching Assistant Professional Development (Gta-Pd) Workshop @ Georgia Tech Research Corporation The chief importance of this project is its impact on increasing the number of Science, Technology, Engineering and Mathematics (STEM) degrees awarded at U.S. universities. The nation needs a sufficient number of STEM-degree holders, who possess technical skills that are crucially important for the nation's economic health and growth. Unfortunately, too many potential STEM majors are currently lost because of poor experiences in introductory university STEM courses. This project aims to improve dramatically the retention of students in STEM majors by propagating widely (by means of a national workshop) "best practices" for preparing high-quality instructors of key introductory STEM courses. Graduate Teaching Assistants (GTAs) are the primary instructors in laboratories and recitations in large lecture STEM courses; however, GTAs are often inadequately prepared to apply cutting-edge, evidence-based pedagogies necessary for excellent STEM instruction and, thereby, enhanced STEM-major retention. The goal of this project is widespread improvements to GTA preparation in introductory physics and chemistry courses via propagating GTA professional development programs. In a workshop, planning teams from Ph.D.-granting physics and chemistry departments will devise customized, realistic GTA professional development plans by interacting with each other and with world-class experts on best practices for GTA support. Twenty STEM departments will develop plans; each implemented plan will benefit 15 new GTAs annually, who then interact with 900 new undergraduates. Thus, the propagation of best practices by this workshop should impact positively approximately 13,500 new undergraduates each year.	0.903
2016 — 2019	Schatz, Michael Churchland, Anne [⬀]	N/A	Reu Site: Cshl Nsf-Reu Bioinformatics and Computational Neuroscience Summer Undergraduate Program @ Cold Spring Harbor Laboratory This REU Site award to Cold Spring Harbor Laboratory (CSHL), located in Cold Spring Harbor, NY, will support the training of ten students for ten weeks during the summers of 2016-2018. This award is supported by the Division of Biological Infrastructure in the Directorate for Biological Sciences (BIO) and the Division for Mathematical Sciences in the Directorate for Mathematics and Physical Sciences (MPS). CSHL's REU in Bioinformatics and Computational Neuroscience (BCN) provides participants with an exceptional research experience, integrating genomics and neuroscience through shared analysis tools. Spanning genomes, cells, organisms and the brain, the program trains students to approach complex biological systems quantitatively. Students conduct full-time, independent research under the mentorship of one of CSHL's approximately 50 faculty members working in genomics, quantitative biology, and neuroscience. Participants have access to state-of-the-art technologies, such as high-throughput sequencing and two-photon imaging, and attend lab meetings and research seminars. The REU curriculum includes workshops on quantitative techniques, responsible conduct of research, scientific communication, and scientific careers. The REU culminates with a symposium in which participants present their work to CSHL's scientific community. Students are housed on CSHL's 110-acre campus, within walking distance of laboratories and dining halls. Participants receive room and board and a summer stipend and have access to campus amenities. Students apply online, supplying a personal statement, two letters of recommendation, and academic records.
REU participants are selected based on academics, motivation, and demonstrated potential. It is anticipated that 30 students, primarily from schools with limited research opportunities or those from underrepresented groups, will be trained in CSHL's REU in BCN. Participants will learn to interrogate biological questions with computational tools and techniques. Through the 10-week REU experience, participants will learn how research is conducted. Many will present the results of their work at national scientific conferences, furthering their identity as independent scientists. A common web-based assessment tool used by all REU programs funded by the Division of Biological Infrastructure (Directorate for Biological Sciences) is used to determine the effectiveness of the training program. Students are tracked after the program to determine career paths. Students will be asked to respond to an automatic email sent via the NSF reporting system. More information about the program is available by visiting http://www.cshl.edu/Education/NSF-REU-in-Bioinformatics-and-Computational-Neuroscience.html, or by contacting the PI (Dr. Anne Churchland, churchland@cshl.edu) or the co-PI (Dr. Michael Schatz, mschatz@cshl.edu).	0.916
2016 — 2019	Mccombie, W. Richard (co-PI) [⬀] Birnbaum, Kenneth Jackson, David (co-PI) [⬀] Schatz, Michael Gingeras, Thomas	N/A	Maizecode - An Initial Analysis of Functional Elements in the Maize Genome @ Cold Spring Harbor Laboratory PI: Thomas Gingeras (Cold Spring Harbor Laboratory) CoPIs: David Jackson, Robert Martienssen, W. Richard McCombie, Michael Schatz, and David Micklos (Cold Spring Harbor Laboratory), Doreen Ware (USDA-ARS/Cold Spring Harbor Laboratory); and Ken Birnbaum (New York University) Maize (corn) is one of the most economically and agriculturally important crops grown in the world. It has assumed this position after centuries of careful genetic breeding to enhance many of its growth and nutritional properties. The generation of high quality genome sequences paired with diverse molecular data allows scientists to better understand the effects of this selective breeding at both the genetic and epigenetic levels. The MaizeCODE project aims to create a comprehensive reference encyclopedia of highly useful genomic reference sequence resources for breeders and plant scientists to use to improve economically important crop traits like disease resistance and yield. In addition, this project will provide broad and comprehensive training opportunities for students, breeders and practicing scientists through specific courses and workshops that will address various approaches to obtain and analyze MaizeCODE data. The Education, Outreach, and Training (EOT) effort will prepare faculty from primarily undergraduate institutions (PUIs) to analyze MaizeCODE with undergraduate students and will provide travel awards for graduate students to attend MaizeCODE training at professional meetings.
The EOT program will be unique in promoting Science, Technology, Engineering and Math (STEM) disciplines by anticipating and encouraging broad participation in primary data analysis by undergraduate and graduate students. Improved assemblies of the maize genome will provide a foundation for the identification of biochemically active and biologically functional elements encoded in this working-draft sequence. A comprehensive catalog of these elements will be a critical component in strategies to link genotype with important traits in maize, a classical genetic system. This is the overarching goal of the MaizeCODE project. The human ENCODE project is a model for such a comprehensive catalog. Building on the project team's leadership experience with ENCODE, this similarly integrated and multi-disciplinary project has three main objectives: 1) to develop high-quality working drafts for two inbred maize lines and one teosinte inbred line, 2) to identify regions of the maize genome that are transcribed, methylated, bound by specific modified histones in six cell types that are the major progenitors of the root and shoot systems (focusing on histone modification in three root cell types) and transcription factors in five unrelated tissues, and 3) to store, collate, display and disseminate the data to the broader community of plant biologists worldwide. Given the wealth of "genome to phenome" studies in maize, and the emerging realization that much of the variation under selection acts at the level of gene regulation, it is expected that this project will have broad and significant impact on maize genetics research and breeding, with the potential to inform similar research in other grass crops. The project expects to extend significantly the functional annotations and the current understanding of the regulation of gene activity in maize, adding critical content to established databases and graphical genome display centers.
Information generated in this project will be rapidly and broadly disseminated using publicly available databases and the CyVerse (www.cyverse.org) online resource.	0.916
2016 — 2019	Schatz, Michael	N/A	Collaborative Research: Revealing the Geometry of Spatio-Temporal Chaos With Computational Topology: Theory, Numerics and Experiments @ Georgia Tech Research Corporation The weather we experience is driven by convection: sunlight warms the earth, which heats the atmosphere, which is cooled by the cold temperatures of outer space. Most people are not interested in microscopic behavior, for example the behavior of the individual molecules in the air, nor macroscopic behavior, such as worldwide average temperature. What is of interest are mesoscopic patterns, for example weather fronts which result in local changes in temperature. This interest in mesoscopic, as opposed to micro- or macroscopic, features of large scale systems occurs in a wide variety of complex large scale physical phenomena such as combustion in engines, dynamics of biomass in the oceans, ventricular fibrillation in a human heart, etc. These mesoscopic patterns take on many different shapes and sizes and change with time, sometimes slowly and sometimes rapidly. The form of these patterns and how they evolve in time is often very dependent on parameters. New technologies are greatly increasing our abilities to measure and simulate these physical phenomena, resulting in enormous data sets, but our ability to extract and quantify this information in a way that leads to understanding, predictability, and control of these systems is not keeping pace. We will explore the use of new mathematical tools to address this problem. The spatial and temporal complexity of Rayleigh-Bénard convection produces high dimensional time series data. A relatively new algebraic topological tool called Persistent Homology will be used to provide new tools for nonlinear dimension reduction.
To ensure the applicability of these methods and that physically important mesoscopic features of the dynamics are preserved, they will be developed in conjunction with the further development of carefully controlled high precision convection experiments and state-of-the-art, large scale, high-resolution numerical simulations of the Boussinesq equations. This includes the analysis of the geometry of covariant Lyapunov exponents. The new computational tools developed in this work should find broad application in a wide variety of problems involving complex nonequilibrium systems in nature (oceanic and atmospheric flows, climate and weather forecasting) and in technology (nonlinear optical systems, combustion and chemical reactions) where understanding and prediction of complex behavior is desired.	0.903
2016 — 2019	Dida, Mathews Odeny, Damaris Devos, Katrien Schatz, Michael Khang, Chang Hyun (co-PI) [⬀]	N/A	Bread Abrdc: Development of Essential Genetic and Genomic Resources For Finger Millet @ University of Georgia Research Foundation Inc Finger millet is a grain crop of strategic importance to food security in Eastern Africa. The grain has high nutritional value, can grow in arid environments and thus is important to the livelihood of smallholder farmers. A major agricultural goal in the region is to develop higher yielding varieties of finger millet through reducing or eliminating diseases that impact growth of the plant. Blast fungus is a pathogen that reduces yield by up to 80% and is one of the main diseases affecting finger millet. To understand how to control disease outbreaks, this project uses genomic sequencing as a powerful approach to identify precise strains of the fungus and to study how the fungus causes disease symptoms in the plant. Sequence analyses of blast strains collected in Kenya, Tanzania, Uganda and Ethiopia will provide information on the genetic diversity of the pathogen in Eastern Africa, and provide a resource to identify the factors that are responsible for infection of finger millet. The knowledge from this approach is essential to develop efficient disease management strategies. Furthermore, sequence analyses of the finger millet host will clarify why some cultivars are more resistant to blast than others. The generated resources will also be used as a vehicle to train undergraduate and graduate students in Eastern Africa in bioinformatics, an expertise that is essential to translate the information to improve breeding strategies.
The specific aims of the project are to (1) Generate 80X PacBio sequence for the allotetraploid finger millet genome (1C=1.8 Gb) to produce a high quality genome assembly; (2) Resequence 200 Eastern African isolates of the finger millet blast fungus Magnaporthe oryzae, including 24 that were collected 10 years ago, to determine the diversity and evolution of this finger millet pathogen both over time and across geographic regions. The blast genome sequences will be mined to identify candidate effector genes using an effector prediction pipeline that incorporates common characteristics of known effectors (secretion and high polymorphism levels); (3) Analyze the blast-finger millet interaction transcriptome using RNA-Seq to identify genes that are induced at early stages of infection. Genes encoding secreted proteins will be identified from the RNA-Seq experiment and cross-referenced to those identified using the effector prediction pipeline. Host genes that are differentially expressed will be compared between compatible and incompatible interactions, and with genes that are differentially expressed during early stages of blast infection in rice; and (4) Develop a nested association mapping panel of some 4000 RILs derived from 21 diverse parents using a double round robin design. This population will represent the first mapping resource that captures substantial diversity present in finger millet germplasm and has a high quantitative trait loci detection power.	0.936
2017 — 2020	Schatz, Michael Grigoriev, Roman	N/A	Geometry and Topology of Fluid Turbulence: Theory and Experiment @ Georgia Tech Research Corporation This research project explores and experimentally tests a radically new mathematical framework for understanding and predicting complicated behaviors in numerous fundamental and practical problems in science, engineering, and medicine (e.g., weather forecasting, characterization of cardiac arrhythmias, etc.). Complex behaviors in many such problems are often governed by patterns that appear fleetingly but repeatedly. The research develops general, powerful techniques for identifying and quantifying key patterns, including the temporal sequences in which the patterns may appear; knowledge of the patterns and sequences can then be harnessed to construct "road maps" for predicting future behaviors. This study will focus on demonstrating "proof of principle" by constructing road maps of complex behavior observed in turbulent fluid flow in laboratory experiments. If successful, the results of this study will lead directly to the development of faster and more accurate ways to make predictions of complicated behavior in large real world problems. For example, the ability to identify and quantify important patterns and sequences in atmospheric turbulence should enable weather forecasts that are better and more rapid than those currently possible today. All software and useful solution data produced by the research activities will be made publicly available. The research program tightly integrates with teaching and learning at the undergraduate and graduate levels and includes activities to increase participation of underrepresented groups. The primary goal of this research program is to develop a novel geometrical/topological approach to modeling and prediction of turbulent flows and to validate it experimentally.
Investigation will focus on a weakly turbulent flow in a shallow electrolyte fluid layer. A combination of existing numerical methods and new methods developed as a part of this program will be used to compute a large set of unstable states (known as exact coherent states in fluid dynamics) and the network of connections between these states, given by the numerically exact solutions of the mathematical model of the flow. Temporal averages will be compared with state averages in experiment and simulations to verify the statistical predictions of periodic orbit theory. A low-dimensional predictive model for the dynamics based on the topology of the network of connections will similarly be validated against experiment and simulations. Understanding, prediction, and control of spatiotemporally chaotic dynamics, in general, and of turbulent fluid flows, in particular, is largely an open problem of both practical and fundamental significance. The geometrical/topological framework that will be developed and tested under this project will provide a novel reduced-order, predictive, dynamical description of turbulent fluid flows. This framework will also provide a connection between the dynamical description and the conventional statistical description of fluid turbulence. In addition, this framework should serve as a foundation for a radically new way to control complex dynamical regimes in a wide range of applications.	0.903
2017 — 2021	Van Der Knaap, Esther Van Eck, Joyce Lippman, Zachary [⬀] Schatz, Michael	N/A	Research Pgr: Structural Variant Landscapes in Tomato Genomes and Their Role in Natural Variation, Domestication and Crop Improvement @ Cold Spring Harbor Laboratory Genome DNA sequences for many crops have been determined in the last two decades, providing the blueprints to discover genes that underlie key agricultural traits. However, a great challenge is identifying the differences in DNA between related varieties of the same crop, which are responsible for the subtle trait variation that plant breeders exploit to improve productivity. A major contributor to this trait variation is 'genome structural variation' where pieces of DNA are deleted, inserted, or rearranged resulting in changes in gene expression. This project will focus on how structural variation contributed to domestication and breeding of tomatoes. A related goal is to expand and develop new molecular tools to create structural variation for crop improvement. This project will improve US agriculture by providing new knowledge and tools to efficiently and predictably enhance crop productivity. A major part of the project will also include training of young scientists in fundamental principles of plant genome research that can be applied to agriculture. This knowledge will also be shared through outreach programs in inner city New York schools that do not have access to research opportunities. Project personnel will develop hands-on teaching activities that will highlight the importance of plant genomics and new genome editing technologies to improve crops and meet the agricultural needs of the 21st century. Limited knowledge on the extent and diversity of structural variation in plant genomes is hindering the ability to link genes to important crop phenotypes.
This project will unite new long-read sequencing technologies, computational biology, developmental and quantitative genetics, and genome editing to elucidate and manipulate structural variation (SV) at a scale never before achieved for a major crop. Tomato provides a powerful system due to its relatively small and high quality reference genome and availability of resequenced genomes. By applying SV-detection algorithms to existing short-read Illumina sequencing data from hundreds of accessions, more than 40 genomes will be selected, capturing the majority of predicted SV diversity, to establish new reference genomes using the latest long-read sequencing technology (PacBio and 10X Genomics). From these data, a compendium of validated SVs will be generated and integrated with ongoing genome-wide association studies. Significant gene-associated SVs, including those affecting gene activity measured by genome-wide transcript profiling, will be characterized using CRISPR/Cas9 gene editing and quantitative phenotypic analyses, focusing on reproductive traits that drive crop productivity. In parallel, CRISPR/Cas9 gene editing will be used to generate a collection of SV mutations in known yield and fruit quality genes in two related wild Solanaceae with agricultural potential, with the goal of achieving major steps towards domestication and for comparative developmental genetics studies. This project will greatly expand our knowledge of genomic diversity in tomato, and provide a road map for dissecting SVs in other crops, where such knowledge can be exploited to improve productivity.	0.916
2020	Blobel, Gerd A (co-PI) [⬀] Bodine, David M. (co-PI) [⬀] Hardison, Ross C [⬀] Schatz, Michael Weiss, Mitchell J (co-PI) [⬀] Zhang, Yu	R24	Vision: Validated Systematic Integration of Epigenomic Data @ Pennsylvania State University-Univ Park Project Summary VISION: ValIdated Systematic IntegratiON of hematopoietic epigenomes Technological advances enabling the production of large numbers of rich, genome-wide, sequence-based datasets have transformed biology. However, the volume of data is overwhelming for most investigators. Also, we do not know the mechanisms by which the vast majority of epigenetic features regulate normal differentiation or lead to aberrant function in disease. We have formed an interdisciplinary, collaborative team of investigators to address the problem of how to effectively utilize the enormous amount of epigenetic data both for basic research and precision medicine. At this point, acquisition of data is no longer the major barrier to understanding mechanisms of gene regulation during normal and pathological tissue development. The chief challenges are how to: (i) integrate epigenetic data in terms that are accessible and understandable to a broad community of researchers, (ii) build validated quantitative models explaining how the dynamics of gene expression relates to epigenetic features, and (iii) translate information effectively from mouse models to potential applications in human health. These needs are addressed by the proposed ValIdated Systematic IntegratiON (VISION) of epigenetic data to analyze mouse and human hematopoiesis, a tractable system with clear clinical significance and importance to NIDDK.
By pursuing the following Specific Aims, the interdisciplinary collaboration will deliver comprehensive catalogs of cis regulatory modules (CRMs), extensive chromatin interaction maps and deduced regulatory domains, validated quantitative models for gene regulation, and a guide for investigators to translate insights from mouse models to human clinical studies. These deliverables will be provided to the community in readily accessible, web-based platforms including customized genome browsers, databases with facile query interfaces, and data-driven on-line tools. Specifically, the proposed work in Aim 1 will build comprehensive, integrative catalogs of hematopoietic CRMs and transcriptomes by compiling and determining informative epigenetic features and transcript levels in hematopoietic stem and progenitor cells and in mature cells. CRMs will be predicted using the novel IDEAS (Integrative and Discriminative Epigenome Annotation System) method. Work proposed in Aim 2 will build and validate quantitative models for gene regulation informed by chromatin interaction maps and epigenetic data. Compiling and determining chromosome interaction frequencies will predict likely target genes for CRMs. Gene regulatory models will be built that predict the contributions of CRMs and specific proteins to regulated expression; these models will be validated by extensive testing using genome-editing in ten reference loci. Finally, work in Aim 3 will produce a guide for investigators to translate insights from mouse models to human clinical studies. This effort will include categorizing orthologous mouse and human genes by conservation versus divergence of expression patterns, assigning CRMs to informative categories of epigenomic evolution, and testing the interspecies functional maps experimentally by genome-editing.	0.94
2020 — 2021	Goecks, Jeremy Morgan, Martin T Schatz, Michael	U24: To support research projects contributing to improvement of the capability of resources to serve biomedical research.	Implementing the Genomic Data Science Analysis, Visualization, and Informatics Lab-Space (Anvil) @ Johns Hopkins University Project Summary The NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-space (AnVIL) powers the next generation of computational genomics research using cloud-scale data and compute resources. The platform is built on a set of established components, including the Terra computing platform and Dockstore for standards-based sharing of containerized tools and workflows. It also provides multiple entry points for data access and analysis, including batch workflows with Terra, notebook environments including Jupyter and RStudio, Bioconductor packages for building analysis on top of AnVIL APIs and services, and will soon offer Galaxy instances for interactive analysis. By providing a unified environment for data management and compute, AnVIL eliminates the need for data movement, allows for controlled access to sensitive data and monitoring, and provides elastic, shared computing resources that can be acquired by researchers as needed. NIH-sponsored biomedical research is increasingly moving to cloud-based data storage and analysis systems, with major cloud portals established for GTEx, Kids First, TOPMed, TCGA and several other major initiatives. However, using these systems together is a challenge. The individual data portals enable researchers to browse and query their own data but have limited functionality to share data or user registrations across portals or with cloud based workspaces, like Terra and Galaxy. The recently established NIH Cloud Platform Interoperability (NCPI) effort aims to address these issues by implementing key interoperability technologies across multiple NIH institutes.
Under this project, we will work with the NCPI working groups to define the use cases and standards for interoperability as well as implement three major technologies recommended by the NCPI within the Galaxy and R/Bioconductor components of AnVIL. First, we will implement the NIH Researcher Auth Service (RAS) to provide a common mechanism for researchers to establish their identity and access data they are authorized to use across Terra and Galaxy. Second, we will implement the Global Alliance for Genomics and Health (GA4GH) Data Repository Service (DRS) so that data consumers, including workflow systems, can access data objects in a single, standard way regardless of where they are stored and how they are managed. Finally, we will develop initial support in AnVIL for the Fast Healthcare Interoperability Resources (FHIR) standard. This standard describes data formats, elements, and an API for exchanging electronic health records (EHR), especially to ensure these records are available, discoverable, and understandable as patients move around the healthcare ecosystem. FHIR support in AnVIL will facilitate access to eMERGE and related projects by users once the data are ingested in AnVIL.	0.939
2020 — 2021	Goecks, Jeremy [⬀] Schatz, Michael	U24: To support research projects contributing to improvement of the capability of resources to serve biomedical research.	A Federated Galaxy For User-Friendly Large-Scale Cancer Genomics Research @ Oregon Health & Science University Project Summary Cancer research is now a data-driven discipline, but only a minority of cancer researchers are data scientists. This severely restricts our ability to effectively study and cure the disease. The far-reaching significance of our project is in federating disparate data and computational resources in order to provide a unifying analysis platform for computational cancer research. We will extend the popular scientific workbench Galaxy (https://galaxyproject.org) so that it can integrate with distributed data and compute resources used and needed by cancer researchers, including those resources in the NCI Cancer Research Data Commons (NCR DC). Our Federated Galaxy system will allow users to seamlessly access NCR DC data across multiple resources. It will support multiple analysis scenarios tuned to skills and computational requirements of individual researchers. The aims of this project are: Aim 1. Extend Galaxy for working with distributed cancer genomics and phenotypic data. This will enable Galaxy users to access both public and private cancer data regardless of their actual physical location. Best-practice approaches will be used for accessing restricted datasets. Aim 2. Enhance Galaxy for context-aware, distributed cancer genomics analyses using shared workflow representations. This will enable Galaxy users to run genomics analyses on different clouds, ultimately reducing the time, cost, and data transfer associated with analyses. Aim 3. Apply Federated Galaxy to precision oncology research.
Workflows developed in this aim will leverage the technologies in Aims 1 and 2 to benchmark machine learning algorithms for predicting tumor phenotype and drug response. Interactive reports will summarize benchmarking results and utilize ITCR visualizations for deep dives into results. Our system will provide a single access point to distributed cancer datasets and will enable these data to be analyzed within a single portal in a way that satisfies multiple analysis scenarios and utilizes diverse computational resources. Finally, a cloud-centric Galaxy built for the NCR DC will substantially grow the community of users working with the GDC and the NCR DC. This is because Galaxy brings with it a vibrant worldwide community of users and developers, which numbers tens of thousands of scientists. These individuals will help to tune the GDC and other resources within the NCR DC to the needs of real-life analysis scenarios and will enrich the set of tools accessible to cancer researchers.	0.927
2020 — 2021	Nekrutenko, Anton [⬀] Pond, Sergei L Kosakovsky Schatz, Michael	R01 (Activity Code Description: To support a discrete, specified, circumscribed project to be performed by the named investigator(s) in an area representing his or her specific interest and competencies.)	Tuning Big Data Analysis Infrastructure For HIV Research @ Pennsylvania State University-Univ Park Summary: The COVID-19/SARS-CoV-2 pandemic is a once-in-a-generation, "all-hands-on-deck" event for the scientific community. This pandemic is also the first in which real-time genomic data are available, e.g. via GISAID [1], where genomic sequences are deposited daily. Vital insights about the virus and the epidemic depend on rapid and reliable genomic analysis of diverse viral sample sequences by multiple laboratories. Yet we repeatedly encounter the same avoidable shortcomings early in viral investigations, including COVID-19: lack of reproducibility, rigor, and data/analytic sharing. Only about 10% of the published genomes have quality metrics, primary data (read files), or any level of detail on analytics, making these data irreproducible and unverifiable; over 40% of GISAID submissions to date provide no information about how the sequences were generated. Essential questions about the extent of intra-host genomic variability (indicative of adaptation or multiple infection), viral evolution (selection, recombination), and transmission (phylogenetic and phylogeographic) cannot be answered reliably if researchers cannot trust or replicate the source data and analytical approaches. One of the key goals/deliverables of this supplement will be open analytic workflows that can be used to curate and standardize genomic data, together with high-quality annotated variation data.	0.94
2021	Schatz, Michael	U01 (Activity Code Description: To support a discrete, specified, circumscribed project to be performed by the named investigator(s) in an area representing his or her specific interest and competencies.)	Integrative Genomic and Epigenomic Analysis of Cancer Using Long Read Sequencing @ Johns Hopkins University Project Summary: The last twenty years have seen extensive growth in the sequencing of cancer genomes, leading to a dramatically increased understanding of the role of genetic and epigenetic mutations in cancer. This has largely been enabled by developments in high-throughput "second-generation" sequencing technology and analysis that characterize cancer genomes using short reads. Recently, a new generation of high-throughput long-read sequencing instruments, primarily from Pacific Biosciences and Oxford Nanopore, has become available and is poised to displace short-read sequencing for many applications. We and others have used these technologies to discover tens of thousands of variants per cancer genome that are not detectable using short reads, including structural variants and differentially methylated regions in known oncogenes and cancer risk genes. These technologies carry the potential to address many open questions in cancer biology; however, the analysis of long-read sequencing data is computationally demanding and needs specialized algorithms that are either too inefficient to use at scale or do not yet exist. In this proposal, we will address several gaps in the application of long-read technology for basic research and clinical use in cancer genomics. First, we will develop improved methods for finding structural variants and complex repeat expansions from long reads, both of which are major diagnostic and prognostic indicators of disease, yet are not accurately identified using existing methods.
Leveraging the improved phasing capabilities of long reads, this work will include the detection of mosaic variants, revealing tumor heterogeneity and variants in precancerous tissues. Next, we will apply machine learning and systems-level advances to accelerate and improve the comparison of variants across large patient cohorts. Critically, this will compensate for the error-prone nature of single-molecule long-read sequencing, making these comparisons more accurate for tumor-normal samples or pedigrees of related patients so that recurrent driving mutations can be accurately identified. Finally, we will develop integrative methods for the joint analysis of genome, transcriptome, and epigenetic profiling of cancer genomes. These advances will improve the identification of fusion genes and allow for entirely new forms of epigenetic analysis, such as the allele-specific analysis of methylation across transposable elements and other repetitive elements. Synthesizing the many thousands of novel variants we will detect using our methods, we will then develop algorithms that will identify and evaluate recurrent genetic or epigenetic variations as putative driving mutations. All methods will be released open-source and will empower us, our ITCR collaborators, and the cancer genomics community at large to study genetic and epigenetic variants with near-perfect accuracy and thereby unlock many new associations to treatment and disease.	0.939
2021	Nekrutenko, Anton [⬀] Schatz, Michael	U24 (Activity Code Description: To support research projects contributing to improvement of the capability of resources to serve biomedical research.)	Democratization of Data Analysis in Life Sciences Through Galaxy @ Pennsylvania State University-Univ Park Project Summary: For over a decade, the Galaxy Project (https://galaxyproject.org/) has worked to solve key issues plaguing modern data-intensive biology: the ability of researchers to access cutting-edge analysis methods, to share analysis results transparently, and to precisely reproduce complex computational analyses. Galaxy has become one of the largest and most widely used open source platforms for biological data science. Promoting openness and collaboration in all facets of the project, from technical decisions to training and leadership, has enabled us to build a vibrant community of users, developers, system engineers, and educators who continuously contribute new software features, add the latest tools, adapt to the most modern infrastructure, author training materials, and lead research and training workshops. Genomics research is continuously evolving, and current challenges include the rapid growth in size and complexity of new datasets, the increasing availability of controlled-access datasets with human genomic components, and the continuing expansion in the breadth of research areas capable of generating high-throughput data. The core Galaxy development team submitting this proposal will respond to these challenges by focusing on the following key priorities: (1) rearchitect Galaxy for scalability and security using software container technologies; (2) design a new user interface (UI) for working with thousands of tools, workflows, and samples; (3) enable interactive exploratory data analysis in Galaxy; (4) facilitate community growth and support; (5) enable effective training and outreach.
Concentrating on these broad priorities will allow us to achieve the ultimate goal of the Galaxy Project: developing a data analysis medium connecting biomedical experts across the full spectrum of skill sets, scientific domains, and research practices. For biomedical researchers it will provide a powerful analysis platform populated with the latest tools and data. For tool developers it will provide a community-supported mechanism for deploying tools before a wide audience of users. For system administrators and engineers it will provide a framework they will feel comfortable deploying on any infrastructure. For educators it will provide a comprehensive collection of materials covering most data analysis needs and an infrastructure for delivering interactive, hands-on training workshops for audiences of different sizes.	0.94
2022 — 2024	Schatz, Michael	N/A	Collaborative Research: EAGER: Unraveling the Nature and Onset of Instabilities in Suspension Flows @ Georgia Tech Research Corporation Flows containing particles (suspension flows) are found in countless settings in nature and in technology; examples range from silt-laden water streaming in a river to blood coursing through a cell-counting analyzer. Pure Newtonian fluids are well known to undergo instabilities that lead to significant changes in the flow behavior. Suspension flows also experience instabilities; however, the mechanisms that drive suspension flow instabilities are not yet understood. In this project, proven techniques for characterizing instabilities in pure Newtonian fluids will be applied to suspension flow instabilities. This approach should reveal how such instabilities can be probed and manipulated in service of developing better ways to predict how the particles move and are distributed in practical applications. The proposed project is also expected to have significant educational impacts, including providing training on complex flow problem-solving for the next generation of scientists and engineers, attracting and training new graduate and undergraduate students from underrepresented groups, and communicating the main ideas in a non-technical form to students at all levels of the educational system and the general public. The primary goal of this project is to demonstrate that the vast fundamental and applied knowledge of instabilities in pure (Newtonian) flows can be harnessed to achieve breakthrough understanding of instabilities in suspension flows.
Specifically, this project will test the main Newtonian insight that structuring the flow geometry can unfold the transition process to reveal well-separated, non-turbulent transitions arising from instabilities that can be manipulated by imposing suitably designed perturbations. The project employs new laboratory experiments and existing theory to explore suspension flows in structured channels. First, the laminar steady state will be characterized as a function of Reynolds number for a specified particle size and selected average particle volume fractions. The research then examines instabilities in both pure Newtonian fluid flows and suspension flows. The outcomes of this project should lay the foundations for future studies to investigate new and heretofore uncharted fundamental fluid physics that arises when inertial particles are added to the flow. The results of our work should set the stage for the discovery of new methods to manipulate flow and particles. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.	0.903
2022 — 2027	Gillis, Jesse Frary, Amy Schatz, Michael Van Eck, Joyce Lippman, Zachary [⬀]	N/A	RESEARCH-PGR: Dissecting the Dynamic Evolution of Paralogs in Shaping Trait Variation Across the Solanum Pan-Genome @ Cold Spring Harbor Laboratory The growing population and climate extremes are threatening food security. Agriculture is largely based on a few major crops, and revolutionary technologies in genome sequencing and CRISPR genome engineering are accelerating their improvement. These technologies can also improve “orphan” crops, which are not widely cultivated or studied but have the potential to increase the diversity and resilience of food production. Orphan crops are related to major crops, allowing translation of knowledge between them. However, orphan crops lack research tools, and an even greater challenge is determining whether specific genetic mutations that benefitted major crops can be engineered to improve traits similarly in orphan crops. This is because gene sequence and function change as species evolve, especially among genes that become duplicated, which is common in plants. This project will take advantage of the nightshade family – a source of many major and orphan crops, such as eggplant, pepino, and tomato – to study how duplicated genes evolve and affect agricultural traits in related species. Combining genome sequencing and CRISPR will reveal sequence diversity among thousands of duplicated genes and enable improved predictability in engineering genes and traits across species. This project will train young scientists with a focus on diversity and inclusion, as well as promote public understanding of genome engineering in plant biology through a community science program on orphan crops.
Finally, new curricula and research opportunities for undergraduate students at a small liberal arts college will broaden participation and training of underrepresented groups in the plant sciences. This project will exploit advances in large-scale reference genome sequencing, gene co-expression analyses, and CRISPR genome editing to dissect how paralog diversification impacts species-specific phenotypes in a genus of both fundamental and applied importance. Fifty Solanum species, including 16 orphan crops, will be sequenced to establish a Solanum Pan-Genome with telomere-to-telomere reference assemblies, providing a foundation for genus-wide comparative genomics and functional genetics. Computational approaches based on genomics data will be developed for precise assembly and comparison of complex genomes, and identification and classification of paralogs and their relationships based on their variants and expression patterns. Simultaneously, transformation protocols and genome editing will be developed and deployed for an array of Solanum to test how paralogs impact genotype-to-phenotype relationships within and between species. By focusing on major domestication gene families and the adaptation and productivity traits they control, this synergistic work will provide both a new understanding of paralog diversification in evolution and a more robust translation of agriculturally relevant genotype-to-phenotype relationships to orphan crops.
Beyond a valuable community resource of Solanum reference genomes, expression data, and CRISPR lines for plant researchers and breeders, this multidisciplinary project will result in new tools, resources, and principles that will enable the study and engineering of other taxa and traits of significance to both plant biology and crop improvement. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.	0.916