2006 — 2009 |
Guan, Yongtao |
N/A |
Spatial Point Pattern Analysis Using Composite Likelihood
The proposed research introduces a new likelihood-based method for fitting spatial point process models using the idea of composite likelihood. Composite likelihood (CL) has been successfully applied in numerous settings where full maximum likelihood is not feasible or not available. Its use in spatial point process modeling, however, has never been studied. This research intends to show that a CL can be formed for any spatial point process whose second-order intensity function can be explicitly defined. The proposed likelihood is easy to obtain and can be used in many different spatial point pattern analyses. In particular, it can be used to fit both homogeneous and inhomogeneous spatial point process models to data. Furthermore, it can also be used to select the bandwidth used in estimating the pair correlation function, which is an extremely useful exploratory and model fitting tool in spatial point pattern analysis.
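As a rough illustration of how a composite likelihood can be built from a second-order intensity function, one possible form is sketched below. The notation is an assumption made for this sketch and is not taken from the abstract: $x_1, \dots, x_n$ are the observed points in a window $D$, $\lambda_2(\cdot, \cdot; \theta)$ is the second-order intensity, and $R$ is a user-chosen maximum distance between paired points.

```latex
% Illustrative sketch only; all symbols (D, R, \lambda_2, \theta) are assumed notation.
% Each pair of points closer than R contributes a normalized pair density.
\[
  f(x, y; \theta) =
  \frac{\lambda_2(x, y; \theta)}
       {\int_D \int_D \lambda_2(u, v; \theta)\, \mathbf{1}\{\|u - v\| \le R\}\, du\, dv},
  \qquad
  \ell_{\mathrm{CL}}(\theta) =
  \sum_{i \ne j:\ \|x_i - x_j\| \le R} \log f(x_i, x_j; \theta).
\]
```

Maximizing $\ell_{\mathrm{CL}}(\theta)$ only requires an explicit $\lambda_2$, which is the condition the abstract names.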
This research is motivated by the red oak borer data that were collected by entomologists at the University of Arkansas. The data consist of mapped locations of attack holes caused by larvae of red oak borers when they eat their way into the trees. The main objective is to understand what influences where adult red oak borers decide to lay their eggs. The proposed model fitting procedures will be applied to link the locations of attack holes to important tree characteristics such as side and height of the tree. This work will have both biological and practical significance. In particular, the results will provide biological insight into the breeding habits of the adult borers. This insight can in turn be used in practice to guide the development of more effective trapping techniques to control the population of red oak borers. Thus the proposed research may play a critical role in the effort to save millions of red oak trees in the US that are being threatened by outbreaks of red oak borers.
|
0.972 |
2009 — 2014 |
Zhang, Heping; Guan, Yongtao |
N/A |
Career: New Statistical Methods For Massive Spatial, Temporal and Spatial-Temporal Processes
This award is funded under the American Recovery and Reinvestment Act of 2009 (Public Law 111-5).
Dimension reduction plays an essential role in reducing the complexity of data so that the most useful information in the data can be successfully extracted. Most existing dimension reduction methods are developed under the assumption that the data are independent. Consequently, they may be inefficient and sometimes even inappropriate for analyzing spatial/temporal data, which are often naturally correlated. The proposed research intends to fill this gap by developing inverse-regression-based dimension reduction methods for data arising from three different types of spatial/temporal processes: spatial point processes, recurrent event processes and quantitative spatial processes. Specific goals of the project include 1) developing general frameworks and methods for conducting dimension reduction for both univariate and multivariate spatial point processes and 2) generalizing these methods to the cases of recurrent event processes and quantitative spatial processes. Special attention will be given to the case where the dimension of the response is also high. In addition, the PI will develop computationally efficient analytical tools such as second-order analysis for the modeling of massive recurrent event process data.
With the fast development of modern data collection technologies, especially the increased availability of more accurate Global Positioning Systems and Geographical Information Systems, large-scale spatial, temporal and spatial-temporal data have rapidly become available in recent years. Many of these data are massive and highly complex in nature, posing unprecedented challenges to data analysis. The proposed research will develop efficient statistical tools that can be used to analyze such data. The PI will collaborate closely with field scientists from various disciplines to apply these tools to solve the real-life problems that have motivated this research. Specific goals of these collaborations include, but are not limited to, 1) improving the understanding of tropical forest diversity, 2) better assessing the health effects of air pollution on asthmatic children and 3) providing more accurate spatial predictions of US watershed characteristics such as discharges and fluxes. Key educational components of the project include providing interdisciplinary statistical training to students, especially minority students, at both the graduate and undergraduate levels and helping three local high schools improve their AP Statistics teaching.
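For readers unfamiliar with inverse regression, the sketch below shows classical sliced inverse regression (SIR) for independent observations, that is, the kind of existing method the abstract says needs extending to correlated data. It is not the proposed method. The function name, the simulated data and the choice of ten slices are all assumptions made for illustration.

```python
import numpy as np

def sliced_inverse_regression(X, y, n_slices=10, n_directions=1):
    """Classical SIR for independent observations (illustrative only)."""
    n, p = X.shape
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    # Inverse square root of the covariance via its eigendecomposition.
    evals, evecs = np.linalg.eigh(cov)
    cov_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - mean) @ cov_inv_sqrt  # standardized predictors
    # Slice the observations by the sorted response.
    order = np.argsort(y)
    M = np.zeros((p, p))
    for idx in np.array_split(order, n_slices):
        m = Z[idx].mean(axis=0)  # slice mean of the standardized predictors
        M += (len(idx) / n) * np.outer(m, m)
    # Leading eigenvectors of M, mapped back to the original predictor scale.
    w, v = np.linalg.eigh(M)
    top = v[:, np.argsort(w)[::-1][:n_directions]]
    return cov_inv_sqrt @ top

# Hypothetical simulated data: the response depends on X only through X @ b.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
b = np.array([1.0, -1.0, 0.0, 0.0, 0.0]) / np.sqrt(2.0)
y = (X @ b) ** 3 + 0.1 * rng.normal(size=2000)

direction = sliced_inverse_regression(X, y)[:, 0]
# Absolute cosine between the estimated and true directions (sign is arbitrary).
cosine = abs(direction @ b) / np.linalg.norm(direction)
```

With independent data the estimated direction should align closely with `b`. The abstract's concern is that this guarantee may fail, or the estimate may be inefficient, when observations are spatially or temporally correlated.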
|
0.97 |
2011 — 2013 |
Guan, Yongtao |
R01 (Activity Code Description: To support a discrete, specified, circumscribed project to be performed by the named investigator(s) in an area representing his or her specific interest and competencies.) |
Statistical Methods For Understanding Heterogeneity in Cocaine Relapse @ University of Miami Coral Gables
DESCRIPTION (provided by applicant): The main objective of this project is to develop novel statistical methods for the analysis of cocaine relapse data. A particular interest is to understand the causes of the heterogeneity in cocaine relapse. Conventionally, summary measures such as percent of days abstinent and time to first relapse have been used to characterize one's cocaine relapse pattern. However, these commonly used measures fail to utilize information that could be important for further distinguishing the behaviors of subjects who share similar values of these measures. Moreover, they cannot describe the temporal trend of the data and thus prohibit the study of the relationship between relapse and time-varying variables such as craving/stress. The specific aims of this project are to: 1) derive informative measures of one's cocaine use behavior and use such measures to subtype one's cocaine relapse pattern; 2) link one's cocaine relapse pattern to pretreatment variables, including variables characterizing one's baseline cocaine use pattern and demographic variables such as gender and age; 3) understand the dynamic relationship between cocaine relapse and time-varying craving/stress levels after treatment; 4) implement the developed statistical methods in user-friendly computer programs and make them available to the scientific community. To achieve these aims, we will develop cutting-edge statistical methods covering several areas, including functional data analysis, measurement errors, recurrent event processes and joint modeling of longitudinal outcomes and recurrent event processes. Statistical properties of these methods will be thoroughly investigated through both rigorous theoretical derivations and extensive simulations. Specifically, the proposed subtyping methods will enhance our understanding of the causes of cocaine relapse.
This can in turn potentially inform the design of new individually targeted relapse prevention and pharmacological strategies to improve outcomes associated with cocaine dependence. Modeling the data as recurrent event processes allows us to make full use of the information available in one's relapse pattern and thus provides a more powerful way to detect any treatment effect. The proposed method to jointly model longitudinal outcomes and recurrent event processes allows us to assess the dynamic relationship between relapse and temporally varying variables such as craving/stress. A greater understanding of this relationship could lead to more effective treatment for cocaine abuse. In terms of statistical innovation, we for the first time introduce the concept of conducting functional data analysis for the mean and correlation structures of a stochastic process simultaneously, develop model-free methods to account for the measurement error when some predictors are derived from a stochastic process, and propose computationally efficient algorithms to jointly model longitudinal outcomes and recurrent event processes. PUBLIC HEALTH RELEVANCE: This project will develop novel statistical methods for the analysis of cocaine relapse data. These new methods can be used to help understand the causes of the heterogeneity in individual cocaine relapse patterns. This in turn can potentially inform the design of new individually targeted, more effective relapse prevention and pharmacological strategies to improve outcomes associated with cocaine dependence.
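The limitation of the conventional summary measures described above can be seen in a small example. The two 12-day use records below are hypothetical and were constructed for illustration only. They share the same percent of days abstinent and the same time to first relapse, yet their event times, which a recurrent event model would use in full, are very different.

```python
def percent_days_abstinent(use):
    """Share of follow-up days with no recorded use, as a percentage."""
    return 100.0 * sum(1 for d in use if d == 0) / len(use)

def time_to_first_relapse(use):
    """1-based day of the first recorded use, or None if there is none."""
    for day, d in enumerate(use, start=1):
        if d == 1:
            return day
    return None

def event_days(use):
    """All days with recorded use: the raw recurrent event times."""
    return [day for day, d in enumerate(use, start=1) if d == 1]

# Hypothetical records: 1 = use on that day, 0 = abstinent.
subject_a = [0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # one early cluster of use
subject_b = [0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1]  # regularly spaced use

summary_a = (percent_days_abstinent(subject_a), time_to_first_relapse(subject_a))
summary_b = (percent_days_abstinent(subject_b), time_to_first_relapse(subject_b))

print(summary_a == summary_b)                           # identical summaries
print(event_days(subject_a), event_days(subject_b))     # different patterns
```

Both subjects are indistinguishable on the two summary measures, while the full sequences of event days separate a short early cluster from sustained periodic use.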
|
0.97 |
2013 — 2016 |
Guan, Yongtao; Ma, Xiaomei (co-PI) |
R01 |
New Statistical Methods to Handle Spatial Uncertainty in Cancer Risk Estimation @ University of Miami Coral Gables
DESCRIPTION (provided by applicant): The primary goal of this project is to develop novel statistical methods to handle spatial uncertainty in the event locations when conducting cancer risk estimation. We consider two different types of spatial uncertainty: 1) coarsening due to the practice of releasing location information at an area level rather than the point level and 2) geocoding error resulting from the use of geographic information systems software to convert residential addresses to geographic coordinates (i.e., longitudes and latitudes). Cancer epidemiologists can extract data from many different sources such as the census, statewide health surveys, tumor registries and population-based case-control studies, and each source may yield data with different types of spatial uncertainty. Analytic methods are usually adversely affected by the presence of spatial uncertainty, resulting in biased parameter estimates, inflated standard errors, and reduced statistical power to detect spatial clustering and trends. To address these challenges, we propose a set of highly versatile estimation procedures to account for the spatial uncertainty and to efficiently combine data obtained from multiple sources. These procedures are based upon established theories on estimating equations and as such can be easily implemented in practice. Compared with existing methods, the proposed methods are novel because 1) they permit the inclusion of individual-level risk factors for subjects with spatially uncertain locations, 2) the proposed intensity model admits a flexible semiparametric form and hence removes potentially restrictive assumptions such as the population density being constant over small geographic areas, and 3) they explicitly account for spatial correlation in the disease locations in both parameter estimation and statistical inference.
In the substantive applications, we propose to supplement population-based case-control data with tumor registry data, census data and statewide health survey data. To the best of our knowledge, such an approach would be the first in the field and unparalleled. We will implement our proposed methods in a free, user-friendly R package. Our package will provide much-needed tools for more objective investigations of cancer risk factors by accounting for spatial uncertainty in the event locations. It will allow researchers to take advantage of the full spectrum of available data and use the data more effectively to reduce the burden of disease.
|
0.964 |
2015 — 2018 |
Guan, Yongtao |
R01 |
Incorporating Local Haplotype Sharing to Detect Genetic Associations @ Baylor College of Medicine
DESCRIPTION (provided by applicant): This proposal aims to develop statistical models and computational methods to quantify the degree of local haplotype sharing between two individuals at an arbitrary marker, and to provide an in-depth understanding of how haplotypes affect disease phenotypes directly or serve as genetic background for polymorphic sites to affect phenotypes differentially, and of how haplotype background serves as a medium for rare variants to aggregate and affect phenotypes. An investigation into these problems will provide insights into the etiology of complex traits and computational tools for disease association mapping, a strategic goal in which NIH has invested heavily, and will shed light on the lingering puzzle of the missing heritability. Our haplotype method reinvents haplotype association mapping to provide several benefits: no phasing requirement, no sliding-window requirement, an ability to work directly with next-generation sequencing data, and enhanced interpretability of association findings. Because each SNP serves as a core SNP for its local haplotypes, our haplotype method has the same number of tests as the single SNP analysis. Detecting genetic associations while accounting for haplotype backgrounds at each marker will shift the paradigm for large-scale genetic association studies. The single-marker test assumes that an allele has the same effect, independent of its haplotype background. Our fundamental assumption is that, depending on its local haplotype background, an allele can have a positive effect, zero effect, or a negative effect on a phenotype (for example, due to local epistatic interactions). When all individuals share the same local haplotype background, our assumption reduces to the conventional assumption of homogeneous effect; when individuals have different local haplotype backgrounds, our assumption generates more power.
For example, when an allele has a large effect when present on a particular haplotype background and zero effect otherwise, traditional analysis, which ignores the haplotype background, will fail to detect the association because the signal is diluted by individuals with other haplotype backgrounds. On the other hand, if correctly quantified, haplotype background can control and reduce the noise introduced by those individuals. Aggregating rare variants within an LD block makes the aggregation approach applicable to whole genome sequencing data. Current methods aggregate rare variants based on the gene annotation and are difficult to extend to whole genome sequencing data. Our method can quantify LD blocks, allowing for aggregation of rare variants in an LD block. This not only avoids arbitrariness in aggregating variants, but also helps in interpreting associations. In addition, current methods aggregate rare variants while ignoring the variants' haplotype background, which inevitably loses power. An extreme example is analyzing sequencing data of admixed samples, where ignoring the haplotype background is equivalent to not controlling for the local ancestry. Thus, we propose methods to aggregate the rare variants according to their haplotype background.
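The dilution argument above can be made concrete with a toy calculation. The model below is an assumption introduced purely for illustration and is not taken from the proposal: $a \in \{0, 1\}$ indicates the allele, $b \in \{0, 1\}$ indicates the relevant haplotype background with $P(b = 1) = q$, $b$ is taken to be independent of $a$, and $\varepsilon$ is mean-zero noise independent of both.

```latex
% Toy model (assumed for illustration): the allele acts only on background b = 1.
\[
  y = \mu + \beta\, a\, b + \varepsilon .
\]
% Marginal (single-marker) contrast, ignoring the background:
\[
  E[y \mid a = 1] - E[y \mid a = 0] = \beta\, P(b = 1) = q\,\beta .
\]
% Contrast within the relevant background:
\[
  E[y \mid a = 1, b = 1] - E[y \mid a = 0, b = 1] = \beta .
\]
```

With $q = 0.1$, for instance, the single-marker contrast is only one tenth of the effect seen within the relevant background, which is the sense in which the signal is diluted.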
|
0.97 |
2018 — 2021 |
Guan, Yongtao |
N/A |
New Development in Point Process Theory, Methods and Applications
With the rapid development of modern data collection technologies, high-resolution spatial, temporal, and spatial-temporal data have become available at an unprecedented speed in recent years. The complexity and magnitude of these new data call for new statistical modeling tools. The proposed research will develop new modeling tools that can be used to analyze complex and large data. Novel applications of the proposed methods will be considered to answer scientific questions arising in disciplines such as epidemiology, finance, and sociology.
This project will develop new theory and methods in point processes. In particular, the project will develop (1) more efficient estimation procedures based on quasi-likelihood to fit point process models and (2) a novel framework to conduct principal component analysis for marked point processes. For the first aim, efficient computational algorithms will be developed and theoretical properties of these algorithms will be investigated. For the second aim, the marks and points of the marked point process are linked through two potentially correlated latent processes, and principal component analysis is conducted for each of the two latent processes. Theoretical properties of the proposed method will be investigated. Data-driven procedures will be developed to select the tuning parameters.
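To indicate what estimation based on quasi-likelihood can look like for a point process, a generic first-order estimating function is sketched below. The notation is assumed for this sketch and does not come from the abstract: $X$ is the point process observed in a window $W$, $\lambda(u; \theta)$ is its intensity, and $\phi(u; \theta)$ is a vector-valued weight function.

```latex
% Generic estimating function (illustrative; notation assumed).
% It has mean zero at the true \theta by Campbell's formula, for any weight \phi.
\[
  e_{\phi}(\theta) =
  \sum_{u \in X \cap W} \phi(u; \theta)
  - \int_W \phi(u; \theta)\, \lambda(u; \theta)\, du .
\]
% The Poisson likelihood score is the special case
\[
  \phi(u; \theta) = \nabla_{\theta} \log \lambda(u; \theta) .
\]
```

Under this framing, a more efficient procedure would amount to choosing $\phi$ to reduce the variance of the resulting estimator when the points are dependent, rather than using the Poisson choice by default.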
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
|
0.972 |