1998 — 2001 |
Kim, Junhyong |
N/A |
Estimating Large Scale Phylogenies: the Performance of Reconstruction Methods Under Increased Taxon Sampling
Kim 9806570. Abstract: Dr. Kim's proposed project will investigate the scaling properties of phylogenetic reconstruction methods under various models of tree evolution and how such information can be used to design the most powerful and efficient strategies for analyzing large-scale phylogenies. The study addresses fundamental and controversial questions of concern to all evolutionary biologists; in fact, the choice of tree model in light of taxon sampling is currently an intractable problem for most biologists who do phylogenetic reconstructions. The PI will study three clearly justified stochastic tree evolution models, three models of character evolution, and four different taxon sampling strategies. The study could also have broad sociological implications for the scientific environment if it clearly resolved the questions surrounding these controversial issues. The study will help to provide answers to fundamental questions raised by systematic biologists.
|
0.97 |
2003 — 2007 |
Donoghue, Michael (co-PI); Bader, David; Warnow, Tandy; Moret, Bernard; Kim, Junhyong; Williams, Tiffani |
N/A |
Information Technology Research (ITR): Building the Tree of Life -- a National Resource For Phyloinformatics and Computational Phylogenetics @ University of New Mexico
This collaborative project aims to establish a national computational resource to move the research community much closer to the realization of the goal of the Tree of Life initiative, namely, to reconstruct the evolutionary history of all organisms. This goal is the computational Grand Challenge of evolutionary biology. Current methods are limited to problems several orders of magnitude smaller, and they fail to provide sufficient accuracy at the high end of their range.
The planned resource will be designed as an incubator to promote the development of new ideas for this enormously challenging computational task; it will create a forum for experimentalists, computational biologists, and computer scientists to share data, compare methods, and analyze results, thereby speeding up tool development while also sustaining current biological research projects.
The resource will be composed of a large computational platform, a collection of interoperable high-performance software for phylogenetic analysis, and a large database of datasets, both real and simulated, and their analyses; it will be accessible through any Web browser by developers, researchers, and educators. The software, freely available in source form, will be usable on scales varying from laptops to high-performance, Grid-enabled, compute engines such as this project's platform, and will be packaged to be compatible with current popular tools. In order to build this resource, this collaborative project will support research programs in phyloinformatics (databases to store multilevel data with detailed annotations and to support complex, tree-oriented queries), in optimization algorithms, Bayesian inference, and symbolic manipulation for phylogeny reconstruction, and in simulation of branching evolution at the genomic level, all within the context of a virtual collaborative center.
Biology, and phylogeny in particular, have been almost completely redefined by modern information technology, both in terms of data acquisition and in terms of analysis. Phylogeneticists have formulated specific models and questions that can now be addressed using recent advances in database technology and optimization algorithms. The time is thus exactly right for a close collaboration of biologists and computer scientists to address the IT issues in phylogenetics, many of which call for novel approaches, due to a combination of combinatorial difficulty and overall scale. The project research team includes computer scientists working in databases, algorithm design, algorithm engineering, and high-performance computing, evolutionary biologists and systematists, bioinformaticians, and biostatisticians, with a history of successful collaboration and a record of fundamental contributions, to provide the required breadth and depth.
This project will bring together researchers from many areas and foster new types of collaborations and new styles of research in computational biology; moreover, the interaction of algorithms, databases, modeling, and biology will give new impetus and new directions in each area. It will help create the computational infrastructure that the research community will use over the next decades, as more whole genomes are sequenced and enough data are collected to attempt the inference of the Tree of Life. The project will help evolutionary biologists understand the mechanisms of evolution, the relationships among evolution, structure, and function of biomolecules, and a host of other research problems in biology, eventually leading to major progress in ecology, pharmaceutics, forensics, and security.
The project will publicize evolution, genomics, and bioinformatics through informal education programs at museum partners of the collaborating institutions. It also will motivate high-school students and college undergraduates to pursue careers in bioinformatics. The project provides an extraordinary opportunity to train students, both undergraduate and graduate, as well as postdoctoral researchers, in one of the most exciting interdisciplinary areas in science. The collaborating institutions serve a large number of underrepresented groups and are committed to increasing their participation in research.
|
0.955 |
2003 — 2009 |
Kim, Junhyong |
N/A |
AToL: Collaborative Research: a Phylogenomic Toolbox For Assembling the Tree of Life @ University of Pennsylvania
The University of Pennsylvania has been awarded a grant to develop new methods and software tools to help construct the genealogical "tree of life" of all biological species. The focus of these efforts will be on extracting information from the vast molecular sequence databases to build collections of smaller trees that can be assembled into larger, more comprehensive pictures of the tree of life. The scale of the data input is large: GenBank, for example, contains tens of millions of sequences sampled from over 100,000 species. Whereas extensive research has focused on the problem of building a tree from a single data set, relatively little is known about extracting these data sets en masse from sequence databases and then assembling a synthesis. The proposal is to study a set of novel computational problems that are as challenging as the basic tree building problem itself. These occur in three broad areas: (1) assessment of the potential information in sequence databases of various kinds; (2) optimal extraction of data from databases to bring the best information to bear on individual tree reconstruction; and (3) integration of these smaller trees into "supertrees" (larger trees assembled from smaller ones that share species in common), especially by identifying target sets of new sequences needed to construct optimal supertrees. Theoretical results will be evaluated by analysis of three diverse databases that pose a range of computational challenges (a subset of GenBank, SWISS-PROT, and the TIGR EGO database). This work will characterize the phylogenetic information content of these sequence sets, identify maximal sets of combinable sequence information, construct nonredundant partitions of the database to permit estimation of collections of trees, and assemble supertrees from these collections.
The interdisciplinary team includes phylogenetic biologists and computer scientists with experience in phylogenetic theory, data analysis, and algorithm development and implementation. The project is also collaborating with three existing Tree-of-Life projects (each aimed at reconstructing particular portions of the tree) to provide tests of the sequence targeting algorithms.
|
1 |
2003 — 2005 |
Kim, Junhyong |
P20 Activity Code Description: To support planning for new programs, expansion or modification of existing resources, and feasibility studies to explore various approaches to the development of interdisciplinary programs that offer potential solutions to problems of special significance to the mission of the NIH. These exploratory studies may lead to specialized or comprehensive centers. |
Comparative Approaches to Bio-Knowledge Discovery @ University of Pennsylvania
DESCRIPTION (provided by applicant): In this planning grant to establish a Program of Excellence in Biomedical Computing, the University of Pennsylvania and the Children's Hospital of Philadelphia propose to develop a new organization that will serve as a central conduit of biomedical computing research tying together the activities of three schools and six research institutes. The organization will consist of a scientific steering committee with internal and external members to oversee research activities, an oversight committee to provide institutional support, an executive committee to govern day-to-day activities, and an office of education to coordinate the training activities. The organizational structure will be generated under the umbrella of the Penn Genomics Institute and the Penn Center for Bioinformatics to leverage existing resources. Interdisciplinary research interactions will be promoted by funding 12 new seed grants (made possible by matching funds) focusing on comparative approaches to biomedical knowledge discovery. In the first year, four projects will be funded: (1) pattern discovery in comparative genomics; (2) computational phylogeny reconstruction; (3) comparative text mining for cancer research; and (4) comparative informatics approach to sickle-cell disease. In subsequent years, new projects will be added to the first four through an internal solicitation for proposals. The Scientific Steering Committee will review these proposals and four new projects will be funded in Years 2 and 3. Each year, existing projects will be reviewed and at the end of the planning grant, all projects will be reviewed for consolidation into a small number of high impact projects. New interactions between existing computational faculty and biomedical faculty will be encouraged by holding opportunity presentation retreats to introduce researchers from complementary fields to biological problems and computational methods. 
Faculty basic education seminars will be held monthly where basic concepts like "transcription" will be discussed in a highly interactive format. The existing core facility for bioinformatics will be augmented with additional high-performance computing hardware, support for teaching basic computational biology tools, and a facility to coordinate dissemination of software tools developed from this grant. An existing PhD level training program in biomedical computing will be supplemented to provide research experience for undergraduates and masters students.
|
1 |
2006 — 2011 |
Davidson, Susan (co-PI); Tannen, Val; Kim, Junhyong; Miller, Mark (co-PI) |
N/A |
Collaborative Research: Core Database Technologies to Enable the Integration of AToL Information @ University of Pennsylvania
AToL (Assembling the Tree of Life) is a large-scale collaborative research effort sponsored by the National Science Foundation to reconstruct the evolutionary origins of all living things. Currently 31 projects involving 150+ PIs are underway generating novel data, including studies of bacteria, microbial eukaryotes, vertebrates, flowering plants and many more. The data being generated by these projects include, but are not limited to: (i) Specimens and their provenance including collection information, voucher deposition, etc.; (ii) Phenotypic descriptions and their provenance; (iii) Genotypic descriptions and their provenance; (iv) Interpretation of the primary measurements including homology; (v) Estimates of phylogenies and methods employed; and (vi) Post-tree analyses such as character evolution hypotheses. While the data collection, storage, and dissemination within each project are well coordinated, there is a critical need to develop the infrastructure to integrate all AToL data sources, allowing the individual efforts to become multipliers for global hypotheses. Furthermore, as the projects continue to expand and address diverse corners of the Tree of Life, efficient project management will be greatly aided by workflow and data management tools targeted towards the AToL problem domain. The project will develop new, compact, abstract data models for phylogenetics, leveraging use cases from a broad survey of empirical projects. The integration system will develop novel mappings between different phylogenetic data domains, and allow individual projects to join a network of integrated databases in an incremental manner. The data provenance system, which allows tracking of how each data object was created, will be unique to systematics data management.
The provenance system will not only allow tracking of what kinds of decisions were made in producing a particular tree or a particular column of a data matrix, but will also allow tracking of alternative data lineages such that, for example, different opinions on character homology might be tracked. The results of the research will be delivered in robust software tools that can be used by the entire evolutionary biology community. The study will develop a community-based formal model of data objects used in systematics, primarily through a continuing set of workshops. This activity will not only develop new data management tools, but will also have the effect of synthesizing disparate views of the phylogenetics research domains. The results of the system will be extensible to other domains of evolutionary biology, thereby contributing to the broader mission of evolutionary synthesis. The project will also provide training for the general systematics community in the latest database technologies. Finally, by leveraging existing outreach efforts at the Penn Center for Bioinformatics, the project will link to other biological database efforts in genomics and biomedical sciences, disseminating phylogenetic information to the broad biomedical research community.
|
1 |
2009 — 2012 |
Eberwine, James H (co-PI); Kim, Junhyong |
R01 Activity Code Description: To support a discrete, specified, circumscribed project to be performed by the named investigator(s) in an area representing his or her specific interest and competencies. |
Single-Cell Comparative Genomics of the Neuron @ University of Pennsylvania
DESCRIPTION (provided by applicant): Mammalian behavior spans a fantastic range of function and ability, from complex linguistic processing, to social and sexual behavior, to simple stimulus-response. Traditional explanations of the mechanisms for this diversity include brain size, neuroanatomy, and functional neuroanatomy including connectivity patterns. Establishing these neuroanatomical differences requires evolutionary differences in the genes guiding developmental processes. However, there have been few comparative studies focused on individual neuronal function in a non-developmental context. Previously, we initiated a project to understand what sequence motifs govern sub-cellular localization of mRNA to dendrites in rat neurons. Surprisingly, we found evidence that an evolutionarily novel element may partly govern dendritic localization. Furthermore, this element is abundant in the rat genome but an order of magnitude less abundant in the mouse genome. A micro-dissection and expression array survey of mouse neurons seems to suggest that there is only 36% overlap between the homologous mRNA found in the mouse dendrites and the rat dendrites. Thus, we hypothesize that the genome-scale molecular physiology of neurons from different tissues and closely related species differs broadly, and that functional non-coding RNA derived from evolutionarily novel elements plays a role in establishing these differences. If true, this would have important consequences for translating animal neurobiological studies to humans and would also suggest that evolutionarily novel elements such as retroviral-derived elements may be important in brain function and dysfunction. We propose to test our hypothesis using comparative single-cell localization assays, single-cell transcriptome assays, whole-transcriptome sequencing, and functional analysis.
PUBLIC HEALTH RELEVANCE: In this project, we hypothesize that the genome-scale molecular physiology of neurons from different tissues and closely related species differs broadly, and that functional non-coding RNA derived from evolutionarily novel elements plays a role in establishing these differences. We propose to test our hypothesis using comparative single-cell localization assays, single-cell transcriptome assays, whole-transcriptome sequencing, and functional analysis. The results of our investigation will have important consequences for translating animal model neurobiological studies to human diseases and may also suggest that viral-derived elements are important in brain function and neurodegenerative diseases.
|
1 |
2012 — 2016 |
Eberwine, James H; Kim, Junhyong |
U01 Activity Code Description: To support a discrete, specified, circumscribed project to be performed by the named investigator(s) in an area representing his or her specific interest and competencies. |
Role of Single Cell mRNA Variation in Systems Associated Electrically Excitable C @ University of Pennsylvania
DESCRIPTION (provided by applicant): The goal of this U01 is to characterize and understand the variability in the expressed transcriptome of human excitable cells. There are two predominant types of excitable cell in the human body, neurons and muscle cells, including cardiac cells. Many human CNS diseases result from modulation of the electrical responsiveness of neurons, while cardiac arrhythmias account for most heart-associated deaths. However, at the level of individual cells there is considerable heterogeneity in function, response, and dysfunction. Here, we present preliminary data showing large-scale single cell variability that is difficult to explain as simple molecular noise. We hypothesize that there is a many-to-one relationship between transcriptome states and a cell's phenotype. In this relationship the functional molecular ratios of the RNA are determined by the cell systems' stoichiometric constraints, which underdetermine the transcriptome state. Because a broad set of multi-genic combinations support a particular phenotype, changes in the transcriptome state do not necessarily lead to changes in the phenotype, potentially explaining cellular heterogeneity in phenotype response to variant conditions such as the application of therapeutic molecules. To test this hypothesis we propose to investigate the extent of single cell variation for the whole transcriptome for excitable cells that are in their natural environment using a novel mRNA capture methodology (TIVA-tag), and on a subset of the transcriptome, the mRNAs encoding the therapeutically important and manipulable G protein-coupled receptor (GPCR) pathways. The use of functional genomics techniques developed in the Eberwine and Kim labs (TIPeR) will permit an assessment of the biological role of multigenic transcriptome variation. These studies are truly interdisciplinary, involving the collaboration of two clinicians (Drs. Grady, Neurosurgeon, and Kuhn, Cardiologist), two genomicists (Drs. Eberwine and Kim), one of whom is a computational scientist (Dr. Kim), a neuro/cardio-pharmacologist (Dr. Bartfai), and a biophotonics expert (Dr. Sul).
|
1 |
2013 |
Kim, Junhyong |
S10 Activity Code Description: To make available to institutions with a high concentration of NIH extramural research awards, research instruments which will be used on a shared basis. |
High Performance IBM IDataPlex/SONAS Computing Cluster For Genomics @ University of Pennsylvania
DESCRIPTION (provided by applicant): Recent dramatic increases in capacity and reductions in cost for Next Generation Sequencing (NGS) are generating revolutionary new information for biomedical sciences. New information from NGS is providing new insights into genomic aberrations and factors underlying complex diseases, and is uncovering new non-coding RNAs as well as novel RNA modifications associated with cell function. However, the NGS data require intensive computing even to put the data in usable form, and modeling and analysis of these data require even more computing. Simply put, without matching computational resources the NGS machines are useless. The University of Pennsylvania has six installed NGS machines with four additional machines planned, which are being used by a large community of NIH-sponsored investigators. As the costs continue to come down, the user community will only increase. Processing the raw data from 10 NGS machines requires up to 14.4 million CPU hours of computing per year. Existing computational instruments can only meet 20-30% of this demand. The only other possible source, external commercial cloud computing, has significantly higher costs and data security risks, and still requires additional infrastructure to store the resulting data. To meet the critical demand of NGS technologies, we propose to purchase and operate a dedicated high-performance computation instrument. The proposed instrument from IBM (IDataPlex/SONAS) will have 1,440 computing cores and an expandable multi-tier storage with a total capacity of 1.9 petabytes. This instrument features efficient power and cooling, which is critical for extremely large scale computing, and a modular storage system that can be fine-tuned for NGS performance and cost-effectively enlarged using a balance of hard drives and tapes.
Unlike most other specialized equipment, this high-performance computing instrument for biomedical data will impact the research of hundreds of investigators, postdoctoral fellows, and graduate trainees. The instrument will remove a significant bottleneck to utilizing NGS technology; potentially alleviate more than $1 million of computing costs per year for NIH-sponsored investigators; and make prior institutional and NIH investment more efficient and useful.
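The capacity figures quoted in this abstract can be cross-checked with simple arithmetic. The sketch below is only an illustration, not part of the proposal: it assumes that one core delivers one CPU hour per wall-clock hour, that the cluster runs at full utilization for a 365-day year, and it uses hypothetical names (`annual_core_hours`, `coverage`) that do not come from the source.

```python
# Back-of-the-envelope check of the figures quoted in the abstract.
# Assumptions (not from the source): 1 core = 1 CPU hour per wall-clock hour,
# 100% utilization, 365-day year.

HOURS_PER_YEAR = 24 * 365  # 8,760 wall-clock hours


def annual_core_hours(cores: int, utilization: float = 1.0) -> float:
    """CPU hours a cluster can deliver in one year at the given utilization."""
    return cores * HOURS_PER_YEAR * utilization


demand_cpu_hours = 14_400_000   # stated upper bound for 10 NGS machines
ngs_machines = 10               # six installed plus four planned
cluster_cores = 1_440           # stated size of the proposed instrument

per_machine_demand = demand_cpu_hours / ngs_machines   # 1,440,000 CPU hours
supply = annual_core_hours(cluster_cores)              # 12,614,400 CPU hours
coverage = supply / demand_cpu_hours                   # 0.876

print(f"Demand per NGS machine: {per_machine_demand:,.0f} CPU hours/year")
print(f"Cluster supply at full utilization: {supply:,.0f} CPU hours/year")
print(f"Share of peak demand covered: {coverage:.1%}")
```

Under these idealized assumptions the proposed cluster alone would cover a bit under nine tenths of the stated peak demand; real utilization is lower than 100%, so the actual share would be smaller.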
|
1 |
2014 — 2016 |
Eberwine, James H; Kim, Junhyong |
R25 Activity Code Description: For support to develop and/or implement a program as it relates to a category in one or more of the areas of education, information, training, technical assistance, coordination, or evaluation. |
Advanced Techniques For Single Cell Transcriptomics @ University of Pennsylvania
DESCRIPTION (provided by applicant): The Penn Genome Frontiers Institute (PGFI) will offer a five-day Advanced Techniques in Single Cell Transcriptomics course in 2014, 2015 and 2016 that will be open to researchers (PIs, Postdocs, and Graduate Students) nationally. Course participants will be trained to successfully perform the entire process of quantifying RNA from individual cells from multicellular organisms. The base set of techniques that will be taught includes single cell isolation, single cell RNA isolation, RNA amplification by aRNA and PCR, NextGen seq library construction, sequencing and data analysis. Lectures and discussions on the broader context of single cell transcriptomics within single cell analysis and on the most recent technological developments will enhance the hands-on bench training. The final session of the course will cover RNAseq data analysis and interpretation. Course materials, including basic curriculum, protocols and datasets, will be available on PGFI-hosted, publicly accessible web pages. This Single Cell Transcriptomics course will provide a research training opportunity for which we expect there to be broad interest and applicability among researchers working in gene expression and functional genomics in a diverse range of fundamental and biomedical disciplines. Researchers constrained by access to very small cell populations or directly interested in cell heterogeneity (e.g., characterization of transcriptomes in development and normal states, cancer and other disease-state genomics) and its consequences would gain valuable tools from this workshop. Dissemination of tools for single cell analysis is critical for enabling this promising area to expand and deliver on its potential.
|
1 |
2015 — 2019 |
Bucan, Maja (co-PI); Kim, Junhyong |
T32 Activity Code Description: To enable institutions to make National Research Service Awards to individuals selected by them for predoctoral and postdoctoral research training in specified shortage areas. |
Training Grant in Computational Biology @ University of Pennsylvania
DESCRIPTION (provided by applicant): The goal of the Penn Computational Genomics Training Grant (PENN CG-TG) Program is to train the next generation of quantitative genomic scientists who will develop new algorithms and quantitative models to address biomedical problems using genomic technologies. Recent developments in genomics of dynamic functional data, as well as $1,000-genome next-generation sequencing, are accelerating the need for well-trained computational genomicists. Penn has trained computational genomicists since 1994 and has had an independent PhD degree program since 2001. Leveraging the extensive experience and resources of Penn's research and training programs, PENN CG-TG will train 8 pre-doctoral students in years 3 and 4 of their PhD program, supporting their training with core courses, seminars, mentoring, and symposiums. Students will learn foundational knowledge to understand algorithms and modeling at a deep level, learn the biology background needed to generate computational models of key biomedical problems, learn to communicate and disseminate quantitative material, and understand the importance of provenance and integrity in large-scale data analysis.
|
1 |
2015 — 2016 |
Ives, Zachary; Kim, Junhyong |
U01 Activity Code Description: To support a discrete, specified, circumscribed project to be performed by the named investigator(s) in an area representing his or her specific interest and competencies. |
Approximating and Reasoning About Data Provenance @ University of Pennsylvania
DESCRIPTION (provided by applicant): In many Big Data applications today, such as Next-Generation Sequencing, data processing pipelines are highly complex, span multiple institutions, and include many human and computational steps. The pipelines evolve over time and vary across institutions, so it is difficult to track and reason about the processing pipelines to ensure consistency and correctness of results. Provenance-enabled scientific workflow systems promise to aid here - yet such workflow systems are often avoided due to perceptions of inflexibility, lack of good provenance analytics tools, and emphasis on supporting the data consumer rather than producer. We propose to better incentivize the adoption of workflow and other provenance tracking tools: (1) Instead of requiring a single workflow system across the entire pipeline, which can be inflexible, we allow for integration across multiple autonomous systems (provenance-enabled workflow systems, provenance tracking systems for languages like Python and R, etc.), and even across steps performed without any provenance tracking at all. (2) We develop provenance reasoning capabilities specifically useful to the data provider, such as provenance analytics across time, sites, and users; finding the code modules that best explain why two results are different; regression testing to determine whether a code change would affect prior results; and reconstructing missing provenance for steps that were not captured. These capabilities are expected to lead to wider tracking of data provenance, and ultimately to more consistent, reproducible, and reliable science. We will validate this hypothesis through the evaluation of our technologies within a Next-Generation Sequencing pipeline run by one of the PIs with collaborators at other institutions.
|
1 |
2016 — 2017 |
Choi, Yongwon (co-PI); Kim, Junhyong; Lee, Daeyeon |
R21 Activity Code Description: To encourage the development of new research activities in categorical program areas. (Support generally is restricted in level of support and in time.) |
Identifying Rare Subtypes of CD8 T-Cells Using Single Cell Reactors @ University of Pennsylvania
DESCRIPTION (provided by applicant): Understanding the molecular characteristics of immune cells, such as memory cells, is critical for understanding adaptive immune response. In particular, gaps remain in our understanding of whether metabolic changes can drive T cell differentiation (and if so, how), or rather, if these changes are simply by-products of responses to external factors (e.g., cytokines) encountered during the course of differentiation. Further, it remains unclear whether there exists a direct causal relationship between modulation of the fatty acid metabolizing machinery and the critical cell fate decision(s) dictating the transition from an effector to a memory cell, and whether metabolic states in individual cells can be correlated with larger population dynamics. We seek to identify a panel of molecular markers that will allow future functional testing and identification of fate decision pathways for this critically important cell type. To overcome the limitations of single cell molecular profiling and standard metabolic assays, we will develop a novel microfluidics device that will enable rapid single cell metabolic profiling and sorting coupled to single cell transcriptome profiling. Utilizing this device, we will isolate rare subpopulations of CD8 T cells by their metabolic signatures and identify molecular markers for each subpopulation. In Aim 1, we will develop a microfluidics system for rapid single cell metabolic profiling and sorting. Specifically, a high-speed fluorocarbon (FC) oil droplet cell encapsulating microfluidics device coupled to metabolic functional dye probes and high-content detector sorting system will be developed. The device will be optimized for speed, capture efficiency, cell viability, high-speed detection, and sorting. In Aim 2, we will identify molecular markers in subpopulations of CD8 T cells during the response program using single cell RNASeq.
To achieve this aim, we will couple the microfluidics device with select functional dyes to characterize individual cells' metabolic states. The profiles of the metabolic readouts will be analyzed by computational methods to identify distinct subpopulations. These metabolic signatures will be used to gate the cells by the microfluidics device in a sorting mode and the resulting subpopulations of cells will be analyzed by single cell RNASeq to identify molecular markers for each subtype. The results of this study will lead to future studies on the role of identified molecules in T cell response dynamics. Understanding the molecular basis of T cell functional subtypes will not only enhance understanding of our body's response to diseases, it will also lead to translational applications in areas such as cancer immunotherapy.
|
1 |
2017 |
Ives, Zachary [⬀] Kim, Junhyong |
U01Activity Code Description: To support a discrete, specified, circumscribed project to be performed by the named investigator(s) in an area representing his or her specific interest and competencies. |
Approximating and Reasoning About Data Provenance @ University of Pennsylvania
DESCRIPTION (provided by applicant): In many Big Data applications today, such as Next-Generation Sequencing, data processing pipelines are highly complex, span multiple institutions, and include many human and computational steps. The pipelines evolve over time and vary across institutions, so it is difficult to track and reason about the processing pipelines to ensure consistency and correctness of results. Provenance-enabled scientific workflow systems promise to aid here, yet such workflow systems are often avoided due to perceptions of inflexibility, lack of good provenance analytics tools, and emphasis on supporting the data consumer rather than producer. We propose to better incentivize the adoption of workflow and other provenance tracking tools: (1) Instead of requiring a single workflow system across the entire pipeline, which can be inflexible, we allow for integration across multiple autonomous systems (provenance-enabled workflow systems, provenance tracking systems for languages like Python and R, etc.), and even across steps performed without any provenance tracking at all. (2) We develop provenance reasoning capabilities specifically useful to the data provider, such as provenance analytics across time, sites, and users; finding the code modules that best explain why two results are different; regression testing to determine whether a code change would affect prior results; and reconstructing missing provenance for steps that were not captured. These capabilities are expected to lead to wider tracking of data provenance, and ultimately to more consistent, reproducible, and reliable science. We will validate this hypothesis through the evaluation of our technologies within a Next-Generation Sequencing pipeline run by one of the PIs with collaborators at other institutions.
|
1 |
2018 — 2021 |
Bucan, Maja [⬀] Kim, Junhyong |
R25Activity Code Description: For support to develop and/or implement a program as it relates to a category in one or more of the areas of education, information, training, technical assistance, coordination, or evaluation. |
Diversity Action Plan At the University of Pennsylvania (Penn) Genomics Program (Dappg) @ University of Pennsylvania
SUMMARY The objective of the Diversity Action Plan at the University of Pennsylvania (Penn) Genomics Program (DAPPG) is to guide undergraduates and recent college graduates from underrepresented (UR) groups into graduate school to pursue genomics research. The Diversity Action Plan will be developed and implemented in partnership with NHGRI T32 training programs in Computational Genomics, Genomic Medicine, and the Ethical, Legal and Social Implications of genetics and genomics, as well as a research program in sub-cellular genomics. This program especially emphasizes vertical integration of regional training, recognizing the geographic mobility restrictions common to many socio-economically disadvantaged UR groups. At least 10 UR scholars (7-10 summer students and 3 post-baccalaureate students), from the Greater Philadelphia area along with additional candidates from across the country, will be identified each year who are interested in genomics and genomic medicine research but lack the experience or expertise necessary for graduate or medical school training. Each scholar will be matched with a research mentor among the graduate training faculty as well as PhD/Postdoctoral level mentors from the T32 programs and provided with a significant one- to two-year research project. An Individual Student Development Plan (IDP) is developed for each scholar to ensure that the scholar undertakes training appropriate to his or her own scientific needs and interests, including graduate or advanced undergraduate coursework. Scholars will also develop the skills necessary for success in graduate school by completing workshops in genomics and computational biology, grant writing, the responsible conduct of research, critical analysis of scientific literature, and oral and written presentation skills. Scholars will also participate in a one-on-one writing workshop with a professional writing instructor.
Scholars will meet weekly as a group to discuss scientific journal articles and their own research, in order to further develop their skills in the critical evaluation of research and the delivery of scientific presentations, as well as to increase their exposure to research outside their own research groups. In addition, scholars will receive advising for the graduate school application process, including selecting programs, writing application essays, and practicing interviews. If necessary, scholars will take a GRE or MCAT preparation course. Scholars will also participate in various lunches and seminars with Penn faculty, postdoctoral fellows and graduate students who share their training experiences, on-going research, and academic paths. The ultimate measure of the program's success is the percentage of scholars who are admitted to PhD or MD-PhD training programs and pursue a research career.
|
1 |
2018 — 2021 |
Eberwine, James H. (co-PI) [⬀] Kim, Junhyong |
RM1Activity Code Description: To support a large-scale research project with a complex structure that cannot be appropriately categorized into an available single component activity code. The performance period may extend up to seven years but only through the established deviation request process. ICs desiring to use this activity code for programs greater than 5 years must receive OPERA prior approval through the deviation request process. |
Center For Sub-Cellular Genomics @ University of Pennsylvania
A cell is a highly complex system with distributed molecular physiologies in structured sub-cellular compartments whose interplay with the nuclear genome determines the functional characteristics of the cell. A classic example of distributed genomic processes is found in neurons. Learning and memory require modulation of individual synapses through RNA localization, localized translation, and localized metabolites such as those from dendritic mitochondria. Dendrites of neurons integrate distributed synaptic signals into both electrical and nuclear transcriptional responses. Dysfunction of these distributed genomic functions in neurons can result in a broad spectrum of neuropsychiatric diseases such as bipolar and depressive disorders and autism, among others. Understanding complex genomic interactions within a single cell requires new technologies: we need nano-scale ability to make genome-wide measurements at highly localized compartments and to effect highly localized functional genomic manipulations, especially in live tissues. To address this need, we propose to establish a Center for Sub-Cellular Genomics using neurons as model systems. The center will develop new optical and nanotechnology approaches to isolate sub-cellular scale components for genomic, metabolomic, and lipidomic analyses. The center will also develop new mass spectrometry methods, molecular biology methods, and informatics models to create a platform technology for sub-cellular genomics.
|
1 |
2020 — 2021 |
Humphreys, Benjamin D. Kim, Junhyong Mcmahon, Andrew P. (co-PI) [⬀] |
UC2Activity Code Description: To support high impact ideas through cooperative agreements that may lay the foundation for new fields of investigation; accelerate breakthroughs; stimulate early and applied research on cutting-edge technologies; foster new approaches to improve the interactions among multi- and interdisciplinary research teams; or, advance the research enterprise in a way that could stimulate future growth and investments and advance public health and health care delivery. This activity code could support either a specific research question or propose the creation of a unique infrastructure/resource designed to accelerate scientific progress in the future. This is the cooperative agreement companion to the RC2. |
Single-Cell Analysis to Promote Kidney Repair
Summary Acute kidney injury (AKI) has a wide spectrum of outcomes from recovery to a long-term transition to chronic kidney disease (CKD). Between 2000 and 2014, AKI hospitalizations increased from 3.5 to 11.7 per 1000 persons. Medicare patients aged 66 years and older hospitalized for AKI have a 35% cumulative probability of a recurrent AKI hospitalization within one year, and 28% will be diagnosed with CKD in the same time frame. Men have a higher risk of AKI, and of developing progressive CKD, although the mechanisms are poorly understood. In the mouse, males also show a heightened vulnerability to AKI. Recent single cell RNA-seq studies from the McMahon and Kim groups have highlighted marked differences in gene expression between the sexes in proximal tubule segments, the region of the nephron most susceptible to AKI. Preliminary studies in the Humphreys and McMahon laboratories using single nuclear sequencing identified a cell type resulting from failed repair of proximal tubule cells (FR-PTC) following mild to severe AKI with a pro-inflammatory, pro-fibrotic signature. FR-PTCs are hypothesized to drive progressive kidney disease following AKI. This proposal centers on the postulates that an understanding of sex differences in response to AKI, and the application of genetic approaches to target proinflammatory properties of FR-PTCs and to eliminate FR-PTCs following renal repair, will be effective routes to ultimately benefit patient outcomes post AKI. To this end, we have assembled a complementary team, with prior collaborative experience: Humphreys (Washington University), Kim (University of Pennsylvania) and McMahon (University of Southern California). All team members have participated in the ReBuilding a Kidney Consortium.
In Specific Aim 1, we will characterize successful versus failed proximal tubule repair with single nucleus transcriptomics (snRNA-seq) and single nuclear chromatin accessibility studies (scATAC-seq) in male and female mouse models, examining key findings in human kidney biopsies. In Specific Aim 2, we will harmonize multimodal datasets generated in Specific Aim 1 to facilitate viewing and interrogation of these data by the broad research community. Mining of these data by the group will focus on defining the regulatory logic of repair strategies and outcomes in the male and female kidney. In Specific Aim 3, we will examine the hypothesis that adverse outcomes in the male kidney following AKI are driven by NF-kB pathway components Nfkb1 and TNIK in FR-PTCs, genetically eliminating the action of these genes. We will generate and validate a new transgenic mouse resource for the community, enabling genetic modification and elimination of FR-PTCs. We will determine whether FR-PTC removal has a favorable outcome, as we predict, on progressive kidney disease following AKI.
|
0.948 |
2020 — 2021 |
Bucan, Maja (co-PI) [⬀] Kim, Junhyong |
T32Activity Code Description: To enable institutions to make National Research Service Awards to individuals selected by them for predoctoral and postdoctoral research training in specified shortage areas. |
Training Program in Computational Genomics @ University of Pennsylvania
Project Summary The University of Pennsylvania has trained computational genomicists for the past 20 years supported by the NHGRI T32 program, training 52 predoctoral and 13 postdoctoral trainees, the majority of whom have gone on to careers in research and development. Here we propose to continue our Computational Genomics Training program with eight predoctoral and two postdoctoral trainees focused on the theme of Data Science and Machine/Statistical Learning methods as applied to genomics data. Our training program concentrates on a rigorous course-based curriculum supported by courses in multiple graduate groups. In addition, our training also involves 13-16 hours of Responsible Conduct of Research (RCR) and Scientific Rigor and Reproducibility (SRR) training, an Individual Development Plan, and utilization of Electronic Notebooks and code repositories. Research training is enhanced by a dual mentorship model whenever possible. Our program is supported by a greater genomics training program consisting of three NHGRI T32 programs in Computational Genomics (this program), Genomic Medicine, and ELSI, as well as an NHGRI R25 Diversity Action Plan (DAP). In particular, the DAP program recruits URM undergraduate and postbacc trainees focused on the Greater Philadelphia Area, whose many institutions serve the urban URM population. This vertical regional integration will allow us to develop a regional pipeline of URM students with undergraduate research experience in genomics. Our program consists of 31 trainers, of whom 14 are female scientists and 2 are URM scientists. Twenty-one of our 31 trainers have an active computational genomics research program. The expertise of the trainers spans disease genomics, genomic technologies, multidimensional statistics, algorithms, data sciences, and machine learning.
Our training environment is enhanced by key facilities including large biobanks, a high-throughput genomics core, a high-performance computing core, and a unique immersive data visualization facility. Penn overall hosts more than 60 NIH training programs with strong institutional administrative support for managing the training programs, including an Office of Biomedical Postdoctoral Programs, an Office of Diversity and Inclusion, and combined Biomedical Graduate Studies, among others. Success of our training program will help train the next generation of the genomics workforce in the skills and knowledge necessary to apply state-of-the-art computational techniques to genomics and develop new techniques for novel genomic data.
|
1 |
2021 |
Kim, Junhyong |
U54Activity Code Description: To support any part of the full range of research and development from very basic to clinical; may involve ancillary supportive activities such as protracted patient care necessary to the primary research or R&D effort. The spectrum of activities comprises a multidisciplinary attack on a specific disease entity or biomedical problem area. These differ from program project in that they are usually developed in response to an announcement of the programmatic needs of an Institute or Division and subsequently receive continuous attention from its staff. Centers may also serve as regional or national resources for special research purposes, with funding component staff helping to identify appropriate priority needs. |
Penn Tmc: Coordination Core @ University of Pennsylvania
The Coordination Core's goal is to manage and coordinate all activities of the Penn Center for Multi-Scale Molecular Mapping of the Female Reproductive System. The Center's goal is to create a multi-modal, multi-scale molecular characterization of ~700 different samples of the human female reproductive system. We will coordinate the activities of the Penn Center for Multi-Scale Molecular Mapping of the Female Reproductive System. We will create a management structure with a leadership group of the MPIs and a Project Manager and four Functional Groups, each led by an expert investigator. We will create a single workstream with the four functional groups: (1) Clinical Sampling Group; (2) 3D Modeling Group; (3) Molecular Assay Group; and (4) Data Coordination Group. We will use electronic work systems for communication and project management. Detailed monthly production goals will be established and managed by the MPIs and the project manager. In addition, we will coordinate activities of the Penn Center with the HuBMAP consortium groups. The Coordination Core will manage all interactions with the HuBMAP consortium, NIH Program Staff, and other outside entities. We will participate in all meetings, conference calls, and coordination events. We will work with other TMCs to collaborate in establishing robust protocols and adopting the assays deployed by other centers. We will coordinate with the HIVE groups to deposit and disseminate the data, as well as to develop a common 3D anatomical model and coordinate framework for the female reproductive system. The activities of the Coordination Core will ensure efficient operations of the Penn Center and help meet the milestones of our public resource generation project.
|
1 |
2021 |
Kim, Junhyong O'neill, Kathleen Elise (co-PI) [⬀] |
U54Activity Code Description: To support any part of the full range of research and development from very basic to clinical; may involve ancillary supportive activities such as protracted patient care necessary to the primary research or R&D effort. The spectrum of activities comprises a multidisciplinary attack on a specific disease entity or biomedical problem area. These differ from program project in that they are usually developed in response to an announcement of the programmatic needs of an Institute or Division and subsequently receive continuous attention from its staff. Centers may also serve as regional or national resources for special research purposes, with funding component staff helping to identify appropriate priority needs. |
Penn Center For Multi-Scale Molecular Mapping of the Female Reproductive System @ University of Pennsylvania
The female reproductive system (the uterus, fallopian tubes, and ovaries) is a complex interrelated set of organs that is physiologically dynamic and not only important for fertility but critically interrelated with general health. Single cell studies of the human female reproductive system and related tissues have been conducted previously, but, as yet, a comprehensive program, aligned with the goals of the HuBMAP, to define a molecular map of the entire system, integrating multi-modal assays, spatial diversity, and individual variations, has not been established. The Penn Department of Obstetrics and Gynecology performs approximately 3,500 surgical procedures annually, many of which allow sampling of multiple organs and locations from the same subject under normal conditions. Here, we propose to leverage the sampling opportunities afforded by the Penn ObGyn group and the single cell biology expertise of Penn investigators to establish a Penn Center for Multi-scale Molecular Map of the Female Reproductive System. We will obtain a comprehensive molecular characterization of the female reproductive system using six different molecular assays for at least ~700 tissue samples in anatomically indexed samples, creating a key resource for both basic science and women's health. The molecular assays include single cell RNAseq, clampFISH spatial transcriptomics, simultaneous single cell open chromatin and RNA assays, and spatial open chromatin assay, among others. We will also generate a 3D anatomical model to provide spatial coordinates for our molecular characterization. All assay data will be registered to our 3D anatomical map, which will be integrated with the HIVE Common Coordinate Framework. All metadata from subject records, clinical procedures, molecular procedures, and informatics pipelines will be collected, curated, and deposited as structured data. All data, including an extensive set of metadata, will be made available as a public resource.
The completion of this resource will impact reproductive medicine for women's health and also inform basic biology of human cell communities.
|
1 |