Area: statistical machine learning
We are testing a new system for linking grants to scientists. The funding information displayed below comes from the NIH Research Portfolio Online Reporting Tools and the NSF Award Database.
The grant data on this page is limited to grants awarded in the United States and is therefore partial. It can nonetheless be used to understand how funding patterns influence mentorship networks and vice versa, which has deep implications for how research is done.
High-probability grants
According to our matching algorithm, Guy Lebanon is the likely recipient of the following grants.
Years | Recipients | Code | Title / Keywords | Matching score |
2006–2008 | Lebanon, Guy | N/A |
Assessing the Readability of Documents and Statistical Tools For Non-Euclidean Data
Documents are written with a specific audience in mind, and audiences vary along several dimensions. One such dimension is readability level, which ranges from elementary (child) readability to adult readability. The investigator develops statistical models for predicting readability and experiments with different alternatives. Because most standard document representations are not well described by Euclidean geometry, the investigator focuses on non-Euclidean modeling of the word-histogram (term-frequency) representation. Specifically, the task is non-linear regression in which the covariates are points in the simplex that do not obey Euclidean geometry.
Predicting the readability of documents is an important task. A likely outcome of advances in this area is better matching of readability level to the documents retrieved by search systems. This in turn will benefit children and non-native speakers of English in their internet searches and other automated text-processing tasks. Because the research is interdisciplinary, it is expected to bring together, and foster future collaboration among, the statistics, machine learning, and information retrieval communities.
| 0.961 |
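The abstract frames readability prediction as non-linear regression whose covariates are word histograms on the simplex, measured with a non-Euclidean geometry, but it does not say which geometry or regressor the investigator uses. The sketch below assumes one standard choice, the Fisher information geometry of the multinomial simplex, whose geodesic distance is d(p, q) = 2 arccos(Σ_i √(p_i q_i)), combined with Nadaraya-Watson kernel regression. The function names, vocabulary, documents, and readability levels are hypothetical.

```python
import math
from collections import Counter

def term_frequencies(text, vocabulary):
    """Normalized word histogram of `text` over `vocabulary`: a point on the simplex."""
    vocab_set = set(vocabulary)
    counts = Counter(w for w in text.lower().split() if w in vocab_set)
    total = sum(counts.values())
    if total == 0:
        # No in-vocabulary words: fall back to the uniform histogram.
        return [1.0 / len(vocabulary)] * len(vocabulary)
    return [counts[w] / total for w in vocabulary]

def fisher_geodesic(p, q):
    """Fisher-information geodesic distance on the multinomial simplex:
    d(p, q) = 2 * arccos(sum_i sqrt(p_i * q_i))."""
    affinity = sum(math.sqrt(a * b) for a, b in zip(p, q))
    # Clamp to 1.0 so rounding error cannot push acos outside its domain.
    return 2.0 * math.acos(min(1.0, affinity))

def predict_readability(train_x, train_y, x, bandwidth=0.5):
    """Nadaraya-Watson regression: average of training labels weighted by a
    Gaussian kernel of the geodesic distance to the query histogram x."""
    weights = [math.exp(-(fisher_geodesic(xi, x) / bandwidth) ** 2) for xi in train_x]
    return sum(w * y for w, y in zip(weights, train_y)) / sum(weights)

# Hypothetical toy data: readability levels (e.g. school grade) for three documents.
vocab = ["cat", "dog", "ran", "ontology", "epistemic", "framework"]
docs = ["the cat ran", "the dog ran to the cat", "an epistemic ontology framework"]
levels = [1.0, 2.0, 12.0]
train_x = [term_frequencies(d, vocab) for d in docs]
print(predict_readability(train_x, levels, term_frequencies("a dog and a cat ran", vocab)))
```

Swapping `fisher_geodesic` for Euclidean distance turns this into ordinary kernel regression on raw histograms; the geodesic version instead respects the curved geometry of the simplex, which is the concern the abstract raises.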
2007–2011 | Lebanon, Guy | N/A |
Ips: Decision Theoretic Approaches to Measuring and Minimizing Customized Privacy Risk
The goal of this project is to provide a principled way to quantitatively characterize the effect of disclosing private data. Based on statistical decision theory, the proposed framework incorporates user-defined sensitivity information and an identification model into a personalized risk function. The risk is intuitive and interpretable because it relies only on a user-specified loss function and elementary laws of probability and statistics. The framework yields a more accurate measure of the consequences of popular disclosure policies such as k-anonymity, and enables an efficient search for new optimal policies.
Currently, private data is disclosed according to general policies that do not necessarily reflect users' preferences. The new framework will let users obtain a quantitative grasp of the consequences of current data-disclosure policies. Because the risk is simple and interpretable, this applies in particular to people without technical or scientific training, who would otherwise remain uninformed about the use of their private data. Effective dissemination of the research results to industry and the popular press has the potential to make current disclosure policies more focused on serving the needs of the community. The project also aims to enhance graduate and undergraduate education in the interdisciplinary area of statistical approaches to privacy preservation. Outreach efforts include mentoring minority students in science and technology. The results of this project are disseminated via the web page http://www.ecn.purdue.edu/~lebanon/privacyRisk.
| 0.961 |
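The abstract defines risk as expected loss under a user-specified loss function and an identification model, and names k-anonymity as a policy to evaluate. The minimal sketch below assumes a simple identification model, namely an attacker who knows the target's true quasi-identifiers and guesses uniformly among released records that match them after generalization, together with a loss that charges a fixed amount only when the guess is correct. The records, policies, loss, and function names are illustrative assumptions, not the project's framework.

```python
def generalize(record, policy):
    """Apply a disclosure policy: coarsen each quasi-identifier named in `policy`."""
    return tuple(policy[attr](record[attr]) for attr in sorted(policy))

def personalized_risk(records, target_id, policy, loss):
    """Expected loss for `target_id` when an attacker who knows the target's true
    quasi-identifiers guesses uniformly among the released records that match them."""
    target_key = generalize(records[target_id], policy)
    matches = [rid for rid, rec in records.items() if generalize(rec, policy) == target_key]
    p_guess = 1.0 / len(matches)  # uniform identification model (assumption)
    return sum(p_guess * loss(target_id, guess) for guess in matches)

# Hypothetical records with two quasi-identifiers.
records = {
    "alice": {"age": 34, "zip": "47906"},
    "bob": {"age": 37, "zip": "47907"},
    "carol": {"age": 52, "zip": "47906"},
}

def alice_loss(target, guess):
    """User-specified loss: Alice rates being re-identified as costing 10 units."""
    return 10.0 if guess == target else 0.0

exact = {"age": lambda a: a, "zip": lambda z: z}
# Age rounded down to the decade, ZIP truncated: Alice now shares her values with Bob.
coarse = {"age": lambda a: a // 10 * 10, "zip": lambda z: z[:4]}

print(personalized_risk(records, "alice", exact, alice_loss))   # 10.0
print(personalized_risk(records, "alice", coarse, alice_loss))  # 5.0
```

Under this model a user whose equivalence class has size k faces risk equal to the loss divided by k, which is one way a k-anonymity guarantee translates into a personalized number; other loss functions plug into the same expected-loss sum.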
2008–2014 | Lebanon, Guy | N/A |
CAREER: Multiresolution Representations of Documents
An effective document representation is a crucial component of text processing; without one, even the most sophisticated methods and models perform poorly. Current document representations such as the bag of words or Markov n-gram models ignore nearly all sequential information and focus instead on the histogram of words or short phrases. The proposed work develops sequential document representations that go beyond bag-of-words and Markov models and effectively capture a wide range of sequential information. The main idea is to use smoothing techniques to transform the word sequence into a smooth curve that represents sequential content through changes in the local word histogram. By varying the amount of smoothing, the representations interpolate between different sequential resolutions, capturing sequential detail at varying levels of granularity. The proposed work improves document analysis, including the classification, segmentation, and summarization of documents. It also enables visualization of sequential trends in documents, thereby supporting the emergence of computer-assisted document-browsing technology. In addition to computer experiments validating improved modeling accuracy, the project includes a series of user studies intended to demonstrate its wide applicability.
Broader impacts include the development of visualization tools that will assist users in reading and browsing documents, potentially helping millions of people absorb textual information quickly and effectively. Education components include assisting foreign-language learning, strengthening the computational aspects of the statistics program at Purdue, and mentoring minority students.
http://www.stat.purdue.edu/~lebanon/research/projects/multiResDocuments/
| 0.961 |
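The abstract describes turning a word sequence into a smooth curve of local word histograms, with the amount of smoothing controlling the resolution. The minimal sketch below assumes a Gaussian smoothing kernel over relative position in the document; the kernel choice, the function names, and the toy document are assumptions rather than details taken from the project.

```python
import math

def local_histogram(words, vocabulary, mu, sigma):
    """Smoothed word histogram at relative position mu in [0, 1].

    The token at position t (of n) sits at (t + 0.5) / n and adds the Gaussian
    weight exp(-0.5 * ((position - mu) / sigma) ** 2) to its word's bin; the bins
    are then normalized so the result is a point on the simplex."""
    n = len(words)
    index = {w: i for i, w in enumerate(vocabulary)}
    hist = [0.0] * len(vocabulary)
    for t, w in enumerate(words):
        if w in index:
            offset = (t + 0.5) / n - mu
            hist[index[w]] += math.exp(-0.5 * (offset / sigma) ** 2)
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist

def smooth_curve(words, vocabulary, sigma, num_points=5):
    """Sample the curve mu -> local histogram at num_points (>= 2) evenly spaced mu."""
    mus = [i / (num_points - 1) for i in range(num_points)]
    return [local_histogram(words, vocabulary, mu, sigma) for mu in mus]

# Hypothetical document whose topic shifts halfway through.
doc = ["goal"] * 10 + ["score"] * 10
vocab = ["goal", "score"]
for sigma in (0.05, 100.0):
    curve = smooth_curve(doc, vocab, sigma)
    print(sigma, [round(h[0], 2) for h in curve])  # share of "goal" along the document
```

With the small bandwidth the share of "goal" moves from near 1 at the start to near 0 at the end, while the very large bandwidth flattens every local histogram toward the global bag-of-words histogram (about 0.5 here), illustrating the interpolation between resolutions that the abstract describes.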
2009–2012 | Lebanon, Guy; Mei, Yajun (co-PI) | N/A |
Statistical Inference For Censored Preference Data @ Georgia Tech Research Corporation
Ranked data arise when m raters each order n items, by some mechanism, to express their preferences among the items. Such data can represent election voting, psychological and medical surveys, book and movie recommendations, and website ranking systems such as search engines. In this proposal the investigators develop the theory and methodology of statistical inference for the case where n and m tend to infinity and each rater provides increasingly censored, or partial, preference information. In this setting, they show how to obtain consistent non-parametric estimators and develop efficient computational procedures for using them. The project also examines how to visualize preference data by embedding it in a low-dimensional space, and how to design appropriate surveys for collecting preference data.
The methodology and theory developed in this proposal should help build better recommendation systems, which are increasingly popular among online businesses. Such systems build a customized list of recommended items based on a user's past preferences. The proposal also develops visualization techniques for such data, which should improve businesses' ability to analyze customer survey data. In the past, such techniques have been either ad hoc and lacking a statistical interpretation, or computationally prohibitive. This proposal aims to develop useful tools for preference data that are both statistically interpretable and computationally efficient in a realistic, large-data setting.
| 0.906 |
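Top-k lists, in which a rater orders only a few favorite items and leaves the rest unranked, are one common kind of censored preference data. The abstract does not state its estimators, so the sketch below is only a simple illustrative baseline: it converts each top-k list into the pairwise comparisons it implies (listed items beat unlisted ones and keep their order among themselves) and estimates each pairwise preference probability by its empirical frequency. Pairs that a list leaves unresolved are skipped, and the sketch makes no claim to the consistency results described in the abstract. Item names and lists are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def pairwise_preferences(top_k_lists, items):
    """Empirical estimate of P(i preferred to j) from top-k (censored) rankings."""
    wins = defaultdict(int)  # wins[(a, b)]: comparisons in which a beat b
    seen = defaultdict(int)  # seen[{a, b}]: comparisons involving the pair
    for ranking in top_k_lists:
        listed = set(ranking)
        unlisted = [x for x in items if x not in listed]
        # combinations() preserves list order, so a is ranked above b.
        for a, b in combinations(ranking, 2):
            wins[(a, b)] += 1
            seen[frozenset((a, b))] += 1
        # Every listed item beats every unlisted item.
        for a in ranking:
            for b in unlisted:
                wins[(a, b)] += 1
                seen[frozenset((a, b))] += 1
    # Unlisted-vs-unlisted pairs are censored and never counted.
    return {(i, j): wins[(i, j)] / seen[frozenset((i, j))]
            for i in items for j in items
            if i != j and seen[frozenset((i, j))] > 0}

lists = [["A", "B"], ["B"], ["A", "C"]]
est = pairwise_preferences(lists, ["A", "B", "C", "D"])
print(est[("A", "B")])  # A is above B in 2 of the 3 lists that compare them
```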