2021 — 2022
Djuric, Petar (co-PI); Fabus, Renee; Yao, Shanshan
Eager: Lip Reading by Unobtrusive Multimodal Sensors and Machine Learning Algorithms
The project aims to build an unobtrusive system to enable lip reading for patients with Amyotrophic Lateral Sclerosis (ALS, also known as Lou Gehrig's disease) and individuals with speech and hearing disorders. Although there is rich literature on lip reading, the bulkiness, obtrusiveness, and/or immobility of existing solutions impedes their application in daily practice, especially for patients with neuromuscular disorders. There is an urgent need to develop novel lip-reading technologies to improve the communication capabilities of ALS patients with loved ones and healthcare providers. The proposed system can considerably improve on existing solutions for tracking and interpreting facial movements and, more broadly, body movements such as finger motions and body gestures. The ability to gather multimodal motion patterns from unobtrusive sensors and apply machine learning (ML) to interpret the acquired data would greatly facilitate diagnosis, treatment, and rehabilitation of motion-related disorders such as stroke and Parkinson's disease. In addition, this work paves the way for the development of nonverbal communication interfaces enabled by facial/body gestures and opens new avenues for rehabilitation, robotics, and human-machine interfaces. This project presents an excellent opportunity for students to participate in cross-disciplinary research. Part of the research will be integrated into the PI's courses and capstone design projects. The PIs are committed to outreach activities and to increasing diversity through local minority organizations and the Vertically Integrated Program at Stony Brook University.
The overarching goal of this project is to build an unobtrusive hardware-software platform for ALS patients that can capture speech-relevant lip gestures and decode lip movements into speech. First, a skin-like multimodal strain and electromyography (EMG) sensing system will be designed to track both skin deformations and muscle activities associated with lip movements. Self-assembled structures will be introduced to render the sensors ultrathin, breathable, and semi-transparent. Second, the feasibility of converting the sensed lip signals into corresponding spoken words will be demonstrated. Modern ML methods, in particular ensemble Gaussian processes (GPs), will be exploited for speech recognition. In the proposed scheme, each GP serves as a classifier, and the final decision is made by fusing the results of all the GPs using methods within the Bayesian framework. The potential contributions of the proposed work include: 1) Design of skin-like strain and EMG sensors with high sensitivity and good skin compatibility through a scalable self-assembly process. 2) Integration of multimodal sensors for comprehensive in vivo quantification of lip movements associated with speech. 3) Development of ML algorithms that precisely convert lip movements to speech. 4) Laying the groundwork for a truly natural and unobtrusive hardware-software system for lip reading. The proposed work can fill the gaps in existing solutions with an intuitive and unobtrusive lip-reading technology.
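The ensemble-GP scheme described above can be sketched in a few lines. The following is a minimal illustration, not the project's actual design: the synthetic dataset, the split of features into "strain" and "EMG" channels, and the use of held-out log-likelihood as a proxy for each model's evidence are all assumptions made for the example. Each GP classifies from one modality, and the final decision fuses the per-GP class probabilities with Bayesian-model-averaging-style weights.

```python
# Illustrative sketch of an ensemble of GP classifiers with Bayesian fusion.
# NOTE: the dataset, modality split, and evidence proxy are assumptions for
# this example, not the project's actual method.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF
from sklearn.model_selection import train_test_split

# Synthetic stand-in for multimodal lip-movement features:
# columns 0-4 play the role of strain channels, columns 5-9 of EMG channels.
X, y = make_classification(n_samples=300, n_features=10, n_informative=6,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

modalities = [slice(0, 5), slice(5, 10)]  # one GP classifier per modality
gps, log_evidence = [], []
for cols in modalities:
    gp = GaussianProcessClassifier(kernel=1.0 * RBF(1.0), random_state=0)
    gp.fit(X_tr[:, cols], y_tr)
    gps.append(gp)
    # Held-out log-likelihood of the true labels, used here as a simple
    # proxy for each model's evidence.
    proba = gp.predict_proba(X_te[:, cols])
    log_evidence.append(np.log(proba[np.arange(len(y_te)), y_te] + 1e-12).sum())

# Posterior model weights proportional to exp(evidence), normalized to 1.
log_evidence = np.array(log_evidence)
w = np.exp(log_evidence - log_evidence.max())
w /= w.sum()

# Fused decision: weighted average of per-GP class probabilities.
fused = sum(wi * gp.predict_proba(X_te[:, cols])
            for wi, gp, cols in zip(w, gps, modalities))
y_hat = fused.argmax(axis=1)
print("fused accuracy:", (y_hat == y_te).mean())
```

In a real deployment the two GPs would be trained on the strain and EMG streams respectively, and the evidence terms would come from a proper Bayesian treatment rather than a held-out likelihood proxy.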
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.