2006 — 2009 |
Guan, Yongtao |
N/A (Activity Code Description: No activity code was retrieved) |
Spatial Point Pattern Analysis Using Composite Likelihood
The proposed research introduces a new likelihood-based method for fitting spatial point process models using the idea of composite likelihood. Composite likelihood (CL) has been successfully applied in numerous settings where full maximum likelihood is not feasible or not available. Its use in spatial point process modeling, however, has never been studied. This research intends to show that a CL can be formed for any spatial point process whose second-order intensity function can be explicitly defined. The proposed likelihood is easy to obtain and can be used in many different spatial point pattern analyses. In particular, it can be used to fit both homogeneous and inhomogeneous spatial point process models to data. Furthermore, it can be used to select the bandwidth for estimating the pair correlation function, which is an extremely useful exploratory and model-fitting tool in spatial point pattern analysis.
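To make the role of the bandwidth concrete, here is a minimal sketch of a kernel estimator of the pair correlation function for a homogeneous pattern. It is not the composite likelihood procedure described in the abstract. The function name `pair_correlation`, the Epanechnikov kernel, and the omission of edge correction are assumptions made for illustration, and the bandwidth is supplied by hand rather than selected from the data.

```python
import numpy as np

def pair_correlation(points, r_values, bandwidth, area):
    """Kernel estimate of the pair correlation function g(r).

    Assumes a homogeneous pattern observed in a window of the given area.
    No edge correction is applied (an illustrative simplification).
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    # Distances between all ordered pairs (i, j) with i != j.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    dist = dist[~np.eye(n, dtype=bool)]
    # Estimate of the squared intensity.
    lam2 = n * (n - 1) / area ** 2
    g = np.empty(len(r_values))
    for k, r in enumerate(r_values):
        u = (r - dist) / bandwidth
        # Epanechnikov kernel supported on [-bandwidth, bandwidth].
        kern = np.where(np.abs(u) <= 1, 0.75 * (1 - u ** 2), 0.0) / bandwidth
        g[k] = kern.sum() / (2 * np.pi * r * lam2 * area)
    return g

# Usage: a completely random pattern in the unit square should give g(r) near 1.
rng = np.random.default_rng(0)
pattern = rng.random((1000, 2))
estimate = pair_correlation(pattern, np.array([0.05]), bandwidth=0.01, area=1.0)
```

A small bandwidth gives a noisy estimate and a large one oversmooths, which is why a data-driven choice matters. Without edge correction the estimate is biased downward at distances that are not small relative to the window.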
This research is motivated by the red oak borer data collected by entomologists at the University of Arkansas. The data consist of mapped locations of attack holes caused by larvae of red oak borers when they eat their way into the trees. The main objective is to understand what influences where adult red oak borers lay their eggs. The proposed model fitting procedures will be applied to link the locations of attack holes to important tree characteristics such as side and height of the tree. This work will have both biological and practical significance. In particular, the results will provide biological insight into the breeding habits of the adult borers. This insight can in turn guide the development of more effective trapping techniques to control the red oak borer population. Thus the proposed research may play a critical role in the effort to save millions of red oak trees in the US that are threatened by outbreaks of red oak borers.
|
0.972 |
2009 — 2014 |
Zhang, Heping; Guan, Yongtao |
N/A |
Career: New Statistical Methods For Massive Spatial, Temporal and Spatial-Temporal Processes
This award is funded under the American Recovery and Reinvestment Act of 2009 (Public Law 111-5).
Dimension reduction plays an essential role in reducing the complexity of data so that the most useful information in the data can be successfully extracted. Most existing dimension reduction methods are developed under the assumption that the data are independent. Consequently, they may be inefficient and sometimes even inappropriate for analyzing spatial/temporal data, which are often naturally correlated. The proposed research intends to fill this gap by developing inverse-regression-based dimension reduction methods for data arising from three different types of spatial/temporal processes: spatial point processes, recurrent event processes and quantitative spatial processes. Specific goals of the project include 1) developing general frameworks and methods for conducting dimension reduction for both univariate and multivariate spatial point processes and 2) generalizing these methods to the cases of recurrent event processes and quantitative spatial processes. Special attention will be given to cases where the dimension of the response is also high. In addition, the PI will develop computationally efficient analytical tools, such as second-order analysis, for the modeling of massive recurrent event process data.
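For orientation, the sketch below shows classical sliced inverse regression for independent observations, which is the setting the abstract says most existing methods assume. It is not the proposed extension to point processes. The function name `sliced_inverse_regression` and its parameters are hypothetical, chosen for illustration.

```python
import numpy as np

def sliced_inverse_regression(X, y, n_slices=10, n_directions=1):
    """Classical sliced inverse regression for independent data.

    Returns a (p, n_directions) array whose columns are unit-norm
    estimates of the dimension reduction directions.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    # Standardize the predictors: Z = (X - mean) cov^{-1/2}.
    cov = np.cov(X, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - X.mean(axis=0)) @ inv_sqrt
    # Slice the sample by the sorted response and average Z within slices.
    order = np.argsort(y)
    M = np.zeros((p, p))
    for idx in np.array_split(order, n_slices):
        slice_mean = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(slice_mean, slice_mean)
    # Leading eigenvectors of the weighted covariance of the slice means.
    _, vecs = np.linalg.eigh(M)
    top = vecs[:, ::-1][:, :n_directions]
    # Map back to the original predictor scale and normalize.
    directions = inv_sqrt @ top
    return directions / np.linalg.norm(directions, axis=0)
```

The slice means estimate the inverse regression curve E[X | y], which is why the approach is called inverse regression. With correlated spatial or temporal data, the independence behind this simple averaging no longer holds.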
With the fast development of modern data collection technologies, especially the increased availability of more accurate Global Positioning System and Geographic Information System technologies, large-scale spatial, temporal and spatial-temporal data have become rapidly available in recent years. Many of these data are massive and highly complex in nature, posing unprecedented challenges to data analysis. The proposed research will develop efficient statistical tools that can be used to analyze such data. The PI will collaborate closely with field scientists from various disciplines to apply these tools to the real-life problems that have motivated this research. Specific goals of these collaborations include, but are not limited to, 1) improving the understanding of tropical forestry diversity, 2) better assessing the health effects of air pollution on asthmatic children and 3) providing more accurate spatial predictions of US watershed characteristics such as discharges and fluxes. Key educational components of the project include providing interdisciplinary statistical training to students, especially minority students, at both the graduate and undergraduate levels and helping three local high schools improve their AP Statistics teaching.
|
0.97 |
2011 — 2013 |
Guan, Yongtao |
R01 (Activity Code Description: To support a discrete, specified, circumscribed project to be performed by the named investigator(s) in an area representing his or her specific interest and competencies.) |
Statistical Methods For Understanding Heterogeneity in Cocaine Relapse @ University of Miami Coral Gables
DESCRIPTION (provided by applicant): The main objective of this project is to develop novel statistical methods for investigating heterogeneity in daily cocaine use trajectories among cocaine dependents. The specific aims of this project are to: 1) derive informative summary measures for daily cocaine use trajectories and use such summary measures to subtype these trajectories; 2) link posttreatment cocaine craving/stress and cocaine relapse to baseline variables while accounting for measurement error in the derived baseline summary measures; 3) understand the dynamic relationship between posttreatment cocaine craving/stress and cocaine relapse; 4) implement the developed statistical methods in a user-friendly computer package and make it available to the scientific community. To achieve these aims, we will develop cutting-edge statistical methods covering several areas including functional data analysis, measurement error and joint modeling of longitudinal and recurrent event data. Statistical properties of these methods will be thoroughly investigated through both rigorous theoretical derivation and extensive simulation. Specifically, the proposed functional data analysis techniques can help generate novel and more accurate subtypes for one's cocaine use patterns. The proposed methods accounting for measurement error can significantly reduce the potentially large bias in estimated regression coefficients due to measurement error and hence lead to more objective assessment of risk factors and treatment effect. The proposed statistical methods for modeling recurrent event processes allow us to make full use of the rich information contained in one's entire cocaine relapse trajectory, and thus can be more powerful in identifying risk factors as well as in detecting treatment effect.
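The bias from measurement error mentioned above can be illustrated with a small simulation. This is a sketch under the classical additive error model with a known error variance, not the nonparametric methods the project proposes. The function name `attenuation_demo` and all parameter values are hypothetical.

```python
import numpy as np

def attenuation_demo(n=20000, true_slope=2.0, error_sd=1.0, seed=0):
    """Simulate regression on a predictor observed with additive error.

    Returns the naive slope (regressing on the noisy predictor) and a
    slope corrected by the reliability ratio, assumed known here.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                        # true predictor, variance 1
    w = x + rng.normal(scale=error_sd, size=n)    # observed, error-prone version
    y = true_slope * x + rng.normal(scale=0.5, size=n)
    # Ordinary least squares slope of y on the noisy predictor w.
    naive = np.cov(w, y)[0, 1] / np.var(w, ddof=1)
    # Reliability ratio var(x) / (var(x) + var(error)), with var(x) = 1.
    reliability = 1.0 / (1.0 + error_sd ** 2)
    corrected = naive / reliability
    return naive, corrected

naive_slope, corrected_slope = attenuation_demo()
```

With equal variances for the predictor and the error, the naive slope is pulled toward about half of the true value, while the corrected slope is close to the truth. In practice the error variance is rarely known, which is part of what makes the problem hard.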
The proposed joint modeling of longitudinal and recurrent event data allows us to assess the relationship between cocaine relapse and time-varying variables such as craving and stress. A greater understanding of this relationship can inspire the development of novel pharmacological strategies to treat cocaine dependence. In terms of statistical innovation, we introduce the novel concept of conducting functional data analysis for the mean and correlation structures of a stochastic process simultaneously, develop nonparametric methods to account for the measurement error when some predictors are derived from a stochastic process, and propose computationally efficient algorithms to jointly model longitudinal and recurrent event data. PUBLIC HEALTH RELEVANCE: This project will develop novel statistical methods for investigating heterogeneity in daily cocaine use trajectories among cocaine dependents. Such analyses can significantly enhance our understanding of the causes of cocaine relapse by revealing important risk factors. They can also help assess the efficacy of treatment approaches by providing a more fine-tuned analysis of cocaine relapse trajectories.
|
0.97 |
2013 — 2016 |
Guan, Yongtao; Ma, Xiaomei (co-PI) |
R01 |
New Statistical Methods to Handle Spatial Uncertainty in Cancer Risk Estimation @ University of Miami Coral Gables
DESCRIPTION (provided by applicant): The primary goal of this project is to develop novel statistical methods to handle spatial uncertainty in the event locations when conducting cancer risk estimation. We consider two different types of spatial uncertainty, specifically, 1) coarsening due to the practice of releasing location information at an area level but not the point level and 2) geocoding error resulting from the use of geographic information systems software to convert residential addresses to geographic coordinates (i.e. longitudes and latitudes). Cancer epidemiologists can extract data from many different sources such as census, statewide health surveys, tumor registries and population-based case-control studies, and each source may yield data with different types of spatial uncertainty. Analytic methods are usually adversely affected by the presence of spatial uncertainty, resulting in biased parameter estimates, inflated standard errors, and reduced statistical power to detect spatial clustering and trends. To address these challenges, we propose a set of highly versatile estimation procedures to account for the spatial uncertainty and to efficiently combine data obtained from multiple sources. These procedures are based upon established theories on estimating equations and as such they can be easily implemented in practice. Compared with existing methods, the proposed methods are novel because 1) they permit the inclusion of individual-level risk factors for subjects with spatially uncertain locations, 2) the proposed intensity model admits a flexible semiparametric form and hence removes potentially restrictive assumptions such as the population density being constant over small geographic areas, and 3) they explicitly account for spatial correlation in the disease locations in both parameter estimation and statistical inference.
In the substantive applications, we propose to supplement population-based case-control data with tumor registry data, census data and statewide health survey data. To the best of our knowledge, such an approach would be the first in the field. We will implement our proposed methods in a free, user-friendly R package. Our package will provide much-needed tools for more objective investigations of cancer risk factors by accounting for spatial uncertainty in the event locations. It will allow researchers to take advantage of the full spectrum of available data and use the data more effectively to reduce the burden of disease.
|
0.964 |
2015 — 2018 |
Guan, Yongtao |
R01 |
Incorporating Local Haplotype Sharing to Detect Genetic Associations
DESCRIPTION (provided by applicant): This proposal aims to develop statistical models and computational methods to quantify the degree of local haplotype sharing between two individuals at an arbitrary marker, and to provide an in-depth understanding of how haplotypes affect disease phenotypes directly, or serve as genetic background for polymorphic sites to affect phenotypes differentially, and how haplotype background serves as a medium for rare variants to aggregate and affect phenotypes. An investigation into these problems will provide insights into the etiology of complex traits and computational tools for disease association mapping, a strategic goal in which NIH has invested heavily, and will shed light on the lingering puzzle of the missing heritability. Our haplotype method reinvents haplotype association mapping to provide several benefits -- no phasing requirement, no sliding-window requirement, an ability to work directly with next-generation sequencing data, and enhanced interpretability of association findings. Because each SNP serves as a core SNP for its local haplotypes, our haplotype method has the same number of tests as the single SNP analysis. Detecting genetic associations accounting for haplotype backgrounds at each marker will shift the paradigm for large-scale genetic association studies. The single-marker test assumes that an allele has the same effect, independent of its haplotype background. Our fundamental assumption is that, depending on its local haplotype background, an allele can have a positive effect, zero effect, or a negative effect on a phenotype (for example, due to local epistatic interactions). When all individuals share the same local haplotype background, our assumption reduces to the conventional assumption of homogeneous effect; when individuals have different local haplotype backgrounds, our assumption generates more power.
For example, when an allele has a large effect when present on a particular haplotype background and zero effect otherwise, traditional analysis, which ignores the haplotype background, will fail to detect the association because the signal is diluted by individuals with other haplotype backgrounds. On the other hand, if correctly quantified, haplotype background can control and reduce the noise introduced by those individuals. Aggregating rare variants within an LD block makes the aggregation approach applicable to whole genome sequencing data. Current methods aggregate rare variants based on the gene annotation and are difficult to extend to whole genome sequencing data. Our method can quantify LD blocks, allowing for aggregation of rare variants in an LD block. This not only avoids arbitrariness in aggregating variants, but also helps in interpreting associations. On the other hand, current methods aggregate rare variants while ignoring the variants' haplotype background. This will inevitably lose power. An extreme example is analyzing sequencing data of admixed samples, where ignoring the haplotype background is equivalent to not controlling for the local ancestry. Thus, we propose methods to aggregate the rare variants according to their haplotype background.
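The dilution argument can be illustrated with a toy simulation. This is a deliberately simplified sketch, not the proposed haplotype method: it assumes a haploid carrier indicator, a single binary background that is independent of the allele, and a background that is known rather than inferred. The function name `dilution_demo` and all parameter values are hypothetical.

```python
import numpy as np

def _slope(x, y):
    """Ordinary least squares slope of y on a single predictor x."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

def dilution_demo(n=50000, effect=1.0, allele_freq=0.3,
                  background_freq=0.2, seed=0):
    """Allele effect that exists only on one haplotype background.

    Returns the marginal (single-marker) slope and the slope estimated
    among individuals who carry the relevant background.
    """
    rng = np.random.default_rng(seed)
    allele = rng.random(n) < allele_freq
    background = rng.random(n) < background_freq
    # The allele shifts the phenotype only when it sits on the background.
    y = effect * (allele & background) + rng.normal(size=n)
    marginal = _slope(allele.astype(float), y)
    on_background = _slope(allele[background].astype(float), y[background])
    return marginal, on_background

marginal_slope, background_slope = dilution_demo()
```

Under these assumptions the marginal slope is close to the effect multiplied by the background frequency, while the slope within the background recovers roughly the full effect. That gap is the signal a single-marker test loses.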
|
0.97 |
2018 — 2021 |
Guan, Yongtao |
N/A |
New Development in Point Process Theory, Methods and Applications
With the rapid development of modern data collection technologies, high-resolution spatial, temporal, and spatial-temporal data have become available at an unprecedented speed in recent years. The complexity and magnitude of these new data call for new statistical modeling tools. The proposed research will develop new modeling tools that can be used to analyze complex and large data. Novel applications of the proposed methods will be considered to answer scientific questions arising in disciplines such as epidemiology, finance, and sociology.
This project will develop new theory and methods in point processes. In particular, the project will develop (1) more efficient estimation procedures based on quasi-likelihood to fit point process models and (2) a novel framework to conduct principal component analysis for marked point processes. For the first aim, efficient computational algorithms will be developed and theoretical properties of these algorithms will be investigated. For the second aim, the marks and points of the marked point process are linked through two potentially correlated latent processes, and principal component analysis is conducted for each of the two latent processes. Theoretical properties of the proposed method will be investigated. Data-driven procedures will be developed to select the tuning parameters.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
|
0.972 |