2007 — 2011 |
Martinez, Aleix |
N/A | Activity Code Description: No activity code was retrieved. |
RI: Computer Vision Algorithms for the Study of Facial Expressions of Emotions in Sign Languages @ Ohio State University Research Foundation
PI: Aleix Martinez | Institution: Ohio State University
Title: RI: Computer Vision Algorithms for the Study of Facial Expressions of Emotions in Sign Languages
It is known that there exist important perceptual differences between deaf native users of American Sign Language (ASL) and hearing people with no prior exposure to ASL. This project will systematically investigate the differences between these two groups as they observe and classify images of faces with regard to the displayed emotion. These perceptual differences may have their roots in the distinct manner in which native users of ASL and non-users code and analyze 2D and 3D motion patterns. We will thus study how these differences relate to the perception of movement. Finally, we will develop a face avatar that can emulate the facial movements of users and non-users of ASL. To achieve this goal, we will develop a set of computer vision algorithms that can be used to study the differences in the production of facial expressions of emotion between native users of ASL and non-signers. A necessary step is to collect a database of facial expressions of emotion as produced by users of ASL. This database will reveal differences at the production level and will allow for the study of perceptual differences.
The research described above addresses several critical issues. First, these studies are fundamental to fully understanding the mechanisms the brain uses to analyze, code and recognize facial expressions of emotion. Although research on facial expressions of emotion has proven extremely challenging, most studies to date have targeted only hearing subjects. This proposal will study the mechanisms by which native users of ASL code, produce and interpret facial expressions of emotion. Unfortunately, the computer vision algorithms necessary to carry out these studies are not available; the research in this project is set to remedy this shortcoming.
The facial analysis studies conducted during the course of this project can be used in a large number of applications, ranging from human-computer interaction systems, where the computer interprets expressions from its user, to the study of the role each facial feature plays in the grammar of ASL. Furthermore, the study of emotional gestures will be valuable to anthropologists attempting to understand and model the evolution of emotions, and could be used to develop mechanisms to detect lies and deceit. The database of facial expressions collected during the course of this project will be made available to the research community and to educators of ASL. We will open collaborations with the School for the Deaf and encourage deaf students to pursue careers in computing and engineering.
URL: http://cbcsl.ece.ohio-state.edu/research/
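The abstract above does not describe the algorithms themselves. The following is a minimal illustrative sketch, not the project's actual pipeline, of how production differences between the two groups might be quantified from 2D facial landmarks; the landmark arrays, group data, and function names are all hypothetical.

```python
# Illustrative sketch only: compares facial-expression production between two
# groups (e.g., native ASL signers vs. non-signers) using 2D facial landmarks.
# All inputs are placeholder data; the grant's actual algorithms are not shown.
import numpy as np


def normalize_shape(landmarks):
    """Remove translation and scale from an (n_points, 2) landmark array."""
    centered = landmarks - landmarks.mean(axis=0)
    norm = np.linalg.norm(centered)
    return centered / norm if norm > 0 else centered


def group_mean_shape(samples):
    """Average normalized shape over a list of (n_points, 2) arrays."""
    return np.mean([normalize_shape(s) for s in samples], axis=0)


def production_difference(signer_samples, nonsigner_samples):
    """Per-landmark distance between the two group mean shapes."""
    diff = group_mean_shape(signer_samples) - group_mean_shape(nonsigner_samples)
    return np.linalg.norm(diff, axis=1)  # one distance per landmark


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    signers = [rng.normal(size=(68, 2)) for _ in range(10)]     # placeholder data
    nonsigners = [rng.normal(size=(68, 2)) for _ in range(10)]  # placeholder data
    print(production_difference(signers, nonsigners))
```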
|
0.973 |
2010 — 2014 |
Martinez, Aleix M |
R01 | Activity Code Description: To support a discrete, specified, circumscribed project to be performed by the named investigator(s) in an area representing his or her specific interest and competencies. |
A Study of the Computational Space of Facial Expressions of Emotion
DESCRIPTION (provided by applicant): Past research has been very successful in defining how facial expressions of emotion are produced, including which muscle movements create the most commonly seen expressions. These facial expressions of emotion are then interpreted by our visual system. Yet, little is known about how these facial expressions are recognized. The overarching goal of this proposal is to define the form and dimensions of the cognitive (computational) space used in this visual recognition. In particular, this proposal will study the following three hypotheses: Although facial expressions are produced by a complex set of muscle movements, expressions are generally easily identified at different spatial and temporal resolutions. However, it is not known what these limits are. Our first hypothesis (H1) is that recognition of facial expressions of emotion can be achieved at low resolutions and after short exposure times. In Aim 1, we define experiments to determine how many pixels and milliseconds (ms) are needed to successfully identify different emotions. The fact that expressions of emotion can be recognized quickly at low resolution indicates that simple features robust to image manipulation are employed. Our second hypothesis (H2) is that the recognition of facial expressions of emotion is partially accomplished by an analysis of configural features. Configural cues are known to play an important role in other face recognition tasks, but their role in the processing of expressions of emotion is not yet well understood. Aim 2 will identify a number of these configural cues. We will use real images of faces, manipulated versions of these face images, and schematic drawings. It is also known that shape features play a role in facial expressions (e.g., the curvature of the mouth in happiness). In Aim 3, we define a shape-based computational model. Our hypothesis (H3) is that the configural and shape features are defined as deviations from a mean (or norm) face, as opposed to being described by a set of independent exemplars (Gnostic neurons). The importance of this computational space is not only to further justify the results of the previous aims, but to make new predictions that can be verified with additional experiments with human subjects. PUBLIC HEALTH RELEVANCE: Understanding how facial expressions of emotion are processed by our cognitive system will be important for studies of abnormal face and emotion visual processing in schizophrenia, autism and Huntington's disease. Also, abused children are more acute at recognizing emotions, suggesting a higher degree of expertise with some image features. Identifying which features are used by the cognitive system will help develop protocols for reducing their unwanted effects. Understanding the limits in spatial and temporal resolution will also be important for studies of low vision (acuity), which is a typical problem in several eye diseases and in the normal process of aging.
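As a rough illustration of the norm-based coding idea in hypothesis H3 above, the sketch below represents each face by its deviation from a mean (norm) face and feeds that representation to a simple classifier. The landmark data, emotion labels, and choice of classifier are assumptions made for the example, not the study's computational model.

```python
# Illustrative sketch only: norm-based coding of facial shape (deviation from a
# mean face) used as input to a generic classifier. Placeholder data throughout.
import numpy as np
from sklearn.linear_model import LogisticRegression


def norm_based_features(landmarks, norm_face):
    """Flattened deviation of an (n_points, 2) landmark array from the norm face."""
    return (landmarks - norm_face).reshape(-1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_faces, n_points = 60, 68
    faces = rng.normal(size=(n_faces, n_points, 2))  # placeholder landmarks
    labels = rng.integers(0, 6, size=n_faces)        # six hypothetical emotion categories
    norm_face = faces.mean(axis=0)                   # the "norm" (mean) face
    X = np.stack([norm_based_features(f, norm_face) for f in faces])
    clf = LogisticRegression(max_iter=1000).fit(X, labels)
    print("training accuracy:", clf.score(X, labels))
```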
|
0.958 |
2010 — 2011 |
Martinez, Aleix M |
R21 | Activity Code Description: To encourage the development of new research activities in categorical program areas. (Support generally is restricted in level of support and in time.) |
Computational Methods For Analysis of Mouth Shapes in Sign Languages
DESCRIPTION (provided by applicant): American Sign Language (ASL) grammar is specified by the manual sign (the hand) and by the nonmanual components (the face). These facial articulations perform significant semantic, prosodic, pragmatic, and syntactic functions. This proposal will systematically study mouth positions in ASL. Our hypothesis is that ASL mouth positions are more extensive than those used in speech. To study this hypothesis, the project is divided into three aims. In our first aim, we hypothesize that mouth positions are fundamental for the understanding of signs produced in context because they are very distinct from those seen in isolated signs. To study this, we have recently collected a database of ASL sentences and nonmanuals in over 3600 video clips from 20 Deaf native signers. Our experiments will use this database to identify potential mappings from visual to linguistic features. To successfully do this, our second aim is to design a set of shape analysis and discriminant analysis algorithms that can efficiently analyze the large number of frames in these video clips. The goal is to define a linguistically useful model, i.e., the smallest model that contains the main visual features from which further predictions can be made. Then, in our third aim, we will explore the hypothesis that the linguistically distinct mouth positions are also visually distinct. In particular, we will use the algorithms defined in the second aim to determine whether distinct visual features are used to define different linguistic categories. This result will show whether linguistically meaningful mouth positions are not only necessary in ASL (as hypothesized in aim 1), but also whether they are defined using non-overlapping visual features (as hypothesized in aim 3). These aims address a critical need. At present, the study of nonmanuals must be carried out manually, that is, the shape and position of each facial feature in each frame must be recorded by hand. Furthermore, to be able to draw conclusive results for the design of a linguistic model, it is necessary to study many video sequences of related sentences as produced by different signers. It has thus proven nearly impossible to continue this research manually. The algorithms designed in the course of this grant will facilitate the analysis of ASL nonmanuals and lead to better teaching materials. PUBLIC HEALTH RELEVANCE: Deafness limits access to information, with consequent effects on academic achievement, personal integration, and lifelong financial situation, and also inhibits valuable contributions by Deaf people to the hearing world. The public benefits of our research include: (1) the goal of a practical and useful device to enhance communication between Deaf and hearing people in a variety of settings; and (2) the removal of a barrier that prevents Deaf individuals from achieving their full potential. An understanding of the nonmanuals will also change how ASL is taught, leading to an improvement in the training of teachers of the Deaf, sign language interpreters and instructors, and crucially parents of deaf children.
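The second and third aims above rest on discriminant analysis of mouth-shape features across linguistic categories. The sketch below shows the general flavor of such an analysis under stated assumptions: the mouth-landmark feature matrix, the category labels, and the use of scikit-learn's linear discriminant analysis are all placeholders, not the grant's actual data or algorithms.

```python
# Illustrative sketch only: linear discriminant analysis of flattened mouth-shape
# features against hypothetical linguistic categories, with cross-validated
# accuracy as a crude measure of visual separability.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_frames, n_mouth_points = 500, 20
X = rng.normal(size=(n_frames, n_mouth_points * 2))  # placeholder mouth landmarks
y = rng.integers(0, 5, size=n_frames)                # placeholder linguistic categories

lda = LinearDiscriminantAnalysis(n_components=2)
scores = cross_val_score(lda, X, y, cv=5)            # separability across categories
print("cross-validated accuracy:", scores.mean())

# Project frames into the 2D discriminant space to inspect category overlap.
embedded = lda.fit(X, y).transform(X)                # shape: (n_frames, 2)
print(embedded.shape)
```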
|
0.958 |
2016 — 2021 |
Martinez, Aleix M |
R01 | Activity Code Description: To support a discrete, specified, circumscribed project to be performed by the named investigator(s) in an area representing his or her specific interest and competencies. |
Computational Methods For the Study of American Sign Language Nonmanuals Using Very Large Databases
DESCRIPTION (provided by applicant): American Sign Language (ASL) grammar is specified by the manual sign (the hands) and by the nonmanual components, which include the face. Our general hypothesis is that nonmanual facial articulations perform significant semantic and syntactic functions by means of a more extensive set of facial expressions than that seen in other communicative systems (e.g., speech and emotion). This proposal will systematically study this hypothesis. Specifically, we will study the following three hypotheses needed to properly answer the general hypothesis stated above. First, we hypothesize (H1) that the facial muscles involved in the production of clause-level grammatical facial expressions in ASL and/or their intensity of activation are more extensive than those seen in speech and emotion. Second, we hypothesize (H2) that the temporal structure of these facial configurations is more extensive than that seen in speech and emotion. Finally, we hypothesize (H3) that eliminating these ASL nonmanual markers from the original videos drastically reduces the chances of correctly identifying the clause type of the signed sentence. To test these three hypotheses, we define a highly innovative approach based on the design of computational tools for the analysis of nonmanuals in signing. In particular, we will pursue the following three specific aims. In Aim 1, we will build a series of computer algorithms that automatically (i.e., without the need of any human intervention) detect the face and its facial features, as well as the movements of the facial muscles and their intensity of activation. These tools will be integrated into ELAN, a standard software package used for linguistic analysis. These tools will then be used to test six specific hypotheses needed to successfully study H1. In Aim 2, we define computer vision and machine learning algorithms to identify the temporal structure of ASL facial configurations and examine how these compare to those seen in speech and emotion. We will study six specific hypotheses to successfully address H2. Alternative hypotheses are defined in both aims. Finally, in Aim 3 we define algorithms to automatically modify the original videos of facial expressions in ASL to eliminate the identified nonmanual markers. Native users of ASL will complete behavioral experiments to examine H3 and test potential alternative hypotheses. Comparative analyses with non-signer controls will also be completed. These studies will thus further validate H1 and H2. We provide evidence of our ability to successfully complete the tasks in each of these aims. These aims address a critical need; at present, the study of nonmanuals must be carried out by hand. To be able to draw conclusive results, it is necessary to study thousands of videos. The proposed computational approach represents at least a 50-fold reduction in time compared to manual methods.
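Aim 2 above concerns the temporal structure of facial configurations. One simple way such structure could be summarized, sketched below purely for illustration, is to cluster per-frame facial-muscle (action unit) intensities into a small set of states and read off the run-length-encoded state sequence. The AU-intensity matrix, the number of states, and the clustering method are assumptions for the example, not the project's algorithms.

```python
# Illustrative sketch only: cluster per-frame AU intensities into facial
# configuration states and return their temporal segments. Placeholder data.
import numpy as np
from sklearn.cluster import KMeans


def temporal_segments(au_intensities, n_states=4, random_state=0):
    """Return (state, first_frame, last_frame) runs over the frame sequence."""
    labels = KMeans(n_clusters=n_states, n_init=10,
                    random_state=random_state).fit_predict(au_intensities)
    segments, start = [], 0
    for t in range(1, len(labels) + 1):
        if t == len(labels) or labels[t] != labels[start]:
            segments.append((int(labels[start]), start, t - 1))
            start = t
    return segments


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = rng.random((300, 17))  # placeholder: 300 frames x 17 AU intensities
    print(temporal_segments(frames)[:5])
```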
|
0.958 |