1992 — 1995 |
Wang, Deliang |
N/A — Activity Code Description: no activity code was retrieved.
Segmentation and Recognition of Complex Temporal Patterns @ Ohio State University Research Foundation
Temporal information processing underlies various kinds of intelligent behavior, including hearing and vision. A neural network framework for segmenting and recognizing complex temporal patterns is proposed. Temporal segmentation is based on the idea that segmentation is expressed by synchronization within each segment and desynchronization among different segments. Each segment becomes an input to the recognition network, which explicitly encodes neighborhood or topological relations of local features of the input; recognition is based on graph matching. To cope with patterns embedded in time, the network to be constructed codes time explicitly. Multiple temporal patterns are segregated into different segments that are activated alternately in the time domain. The network is able to recognize complex temporal patterns, and recognition is invariant to distortions of time intervals (time warping) and to changes in the rate of presentation. The network will be tested for both neural plausibility and computational effectiveness. Results of this project will provide new computational principles that might be used by the brain to perform temporal segmentation and recognition. They will also provide effective methods for solving technical problems indispensable in real-time continuous auditory pattern recognition.
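To make the synchronization/desynchronization idea concrete, here is a minimal sketch using Kuramoto-style phase oscillators as an illustrative stand-in for the oscillator networks described above. The coupling pattern (positive within a segment, negative between segments) and all parameter values are assumptions for illustration, not details of the proposed network.

```python
# Minimal sketch: oscillators in the same segment synchronize; oscillators
# in different segments desynchronize (here, settle into anti-phase).
import numpy as np

rng = np.random.default_rng(0)
n_per_segment, n_segments = 5, 2
n = n_per_segment * n_segments
segment = np.repeat(np.arange(n_segments), n_per_segment)

# Coupling matrix: +K within a segment, -K between segments (assumed values).
K = 0.8
W = np.where(segment[:, None] == segment[None, :], K, -K)
np.fill_diagonal(W, 0.0)

theta = rng.uniform(0, 2 * np.pi, n)    # random initial phases
omega = 1.0                             # common intrinsic frequency
dt, steps = 0.01, 5000

for _ in range(steps):
    # Kuramoto update: dtheta_i = omega + sum_j W_ij * sin(theta_j - theta_i)
    dtheta = omega + (W * np.sin(theta[None, :] - theta[:, None])).sum(axis=1)
    theta = (theta + dt * dtheta) % (2 * np.pi)

# Within each segment, phases should have collapsed together; the two
# segments should sit roughly half a cycle apart.
for s in range(n_segments):
    print(f"segment {s} phases:", np.round(theta[segment == s], 2))
```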
|
0.973 |
1995 — 1999 |
Wang, Deliang |
N/A — Activity Code Description: no activity code was retrieved.
Automated Auditory Scene Analysis Based On Oscillatory Correlation @ Ohio State University Research Foundation
This conference will bring together leaders, practitioners, and students to discuss the latest ideas and discoveries in biochemical engineering, and will emphasize efforts to expand the intellectual and industrial scope of biotechnology and the contributions which biochemical engineers can make to this expansion. The theme of the conference is that biotechnology is increasingly an interdisciplinary endeavor, requiring knowledge, methods, and expertise from biochemistry, genetics, chemistry, computer science, and chemical engineering to identify the most critical problems, to formulate the most creative and successful strategies, and to implement the most novel and profitable solutions. This interdisciplinary web of interactions is critical for the development of new products using biotechnology, and new processes for synthesizing and purifying the products of biotechnology.
|
0.973 |
2000 — 2004 |
Wang, Deliang |
N/A — Activity Code Description: no activity code was retrieved.
ITR: Dynamics-Based Speech Segregation @ Ohio State University Research Foundation
A typical auditory scene contains multiple simultaneous events, and a remarkable feat of the auditory nervous system is its ability to disentangle the acoustic mixture and group the acoustic energy from the same event. This fundamental process of auditory perception is called auditory scene analysis. Of particular importance in auditory scene analysis is the separation of speech from interfering sounds, or speech segregation. Speech segregation remains a largely unsolved problem in auditory engineering and speech technology. In this project, the PI seeks to develop a dynamics-based system for speech segregation using perceptual and neural principles. Auditory grouping will be based on oscillatory correlation, whereby the phases of neural oscillators encode the binding of auditory features. The investigation will consist of successive stages of computation, starting from a simulated auditory periphery composed of cochlear filtering and hair cell transduction. A mid-level representation will be formed by computing auto- and cross-correlations of filter channels. A segment formation stage then creates individual elements of a represented auditory scene, each of which is a dynamically evolving, connected time-frequency structure that may overlap with other elements. Operating on auditory segments from the segment formation stage, both simultaneous organization and sequential organization will be incorporated. For simultaneous organization, grouping will be based on periodicity, location, and onset/offset analyses, while for sequential organization grouping will be based on pitch, spectral, and location continuities. In particular, two pitch maps corresponding to the two ears and one location map will be computed for auditory organization. All of the employed grouping cues are consistent with perceptual principles of auditory scene analysis. These cues guide the connectivity of neural oscillator networks, which perform grouping and segregation of auditory segments. The proposed system will be evaluated using real recordings of speech and interfering sounds, where the speech can be both voiced and unvoiced. The success of the system will be quantitatively assessed using two measures: the change in signal-to-noise ratio and the speech recognition rate. This project is expected to make significant contributions to automatic speech recognition in unconstrained environments.
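As a concrete illustration of the mid-level representation, the following sketch computes a normalized autocorrelation per channel (one frame of a correlogram) and the cross-channel correlation between channels. A real front end would be a cochlear (e.g., gammatone) filterbank; the toy channel signals, frame length, and lag range here are assumptions for illustration.

```python
# Minimal sketch: correlogram frame and cross-channel correlation.
import numpy as np

fs = 16000
t = np.arange(0, 0.02, 1 / fs)           # one 20-ms frame (320 samples)
f0 = 200.0                               # assumed pitch: 200 Hz

# Toy stand-ins for cochlear filter outputs: two adjacent channels driven
# by the first harmonic and one channel driven by the second harmonic.
ch_a = np.sin(2 * np.pi * f0 * t)
ch_b = 0.8 * np.sin(2 * np.pi * f0 * t + 0.1)
ch_c = np.sin(2 * np.pi * 2 * f0 * t)
channels = [ch_a, ch_b, ch_c]

max_lag = 200                            # lags up to 12.5 ms

def normalized_autocorr(x, max_lag):
    """A(tau) = sum_t x[t] x[t+tau], normalized so that A(0) = 1."""
    acf = np.array([np.dot(x[:len(x) - lag], x[lag:]) for lag in range(max_lag)])
    return acf / acf[0]

def correlation(a, b):
    """Pearson correlation between two autocorrelation patterns."""
    a, b = a - a.mean(), b - b.mean()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

correlogram = np.stack([normalized_autocorr(ch, max_lag) for ch in channels])

# Channels of a periodic source share an autocorrelation peak at the pitch
# period (80 samples here), the cue for periodicity-based organization.
pitch_lag = int(fs / f0)
print("autocorr at pitch lag:", np.round(correlogram[:, pitch_lag], 2))

# Cross-channel correlation is high for adjacent channels dominated by the
# same harmonic (merged into one segment) and low otherwise.
print("cross-corr a-b:", round(correlation(correlogram[0], correlogram[1]), 2))
print("cross-corr a-c:", round(correlation(correlogram[0], correlogram[2]), 2))
```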
|
0.973 |
2006 — 2009 |
Wang, Deliang; Shinn-Cunningham, Barbara; Ellis, Daniel; Divenyi, Pierre |
N/A — Activity Code Description: no activity code was retrieved.
Collaborative Research: Separating Speech From Speech Noise to Improve Intelligibility
Separating signals that have been mixed together is an archetypal engineering problem. The past decade has seen the emergence of several approaches applicable to separating sound mixtures -- for example, a restaurant scenario in which a desired target voice must be extracted from the background babble of other patrons. However, the most appropriate goal, and hence the way to measure performance, is not always clear. In this project, the goal is established as improving intelligibility, i.e., processing sound mixtures so that a human listener can better understand what is being said. This requires a collaboration between computer science/electrical engineering -- to provide the separation algorithms -- and auditory science/psychology -- to guide the results toward perceptually relevant improvements, and to evaluate the results in listener tests.
The particular techniques to be developed and combined include blind source separation (such as independent component analysis), computational auditory scene analysis (simulations of what is understood about human perceptual processing), and model-driven approaches derived from the machine-learning techniques of speech recognition. One specific area of interest is the synthesis of 'minimally informative noise': acoustic tokens that effectively communicate both what can be inferred and what remains unknown about the target signal, and which can leverage the powerful perceptual inference of human listeners.
This project will lead to implementations of acoustic signal separation that deliver the greatest benefit to human listeners, potentially including both normal-hearing and hearing-impaired individuals. This has a broad range of applications from processing archival recordings through to improved real-time communications technologies, as well as the potential to help automatic speech recognition systems.
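As an illustration of the blind source separation technique mentioned above, the following sketch applies scikit-learn's FastICA to a toy two-source, two-microphone mixture. The sources, mixing matrix, and parameters are assumptions for illustration, not the project's algorithms.

```python
# Minimal sketch: blind source separation with independent component analysis.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 8000)
s1 = np.sin(2 * np.pi * 5 * t)                  # stand-in source 1
s2 = np.sign(np.sin(2 * np.pi * 3 * t))         # stand-in source 2
S = np.c_[s1, s2] + 0.05 * rng.normal(size=(len(t), 2))

A = np.array([[1.0, 0.5],                       # assumed mixing matrix
              [0.4, 1.0]])
X = S @ A.T                                     # two-microphone mixtures

ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)                    # recovered sources, up to
                                                # permutation and scaling
# Correlate each recovered component with each true source to verify.
corr = np.corrcoef(S.T, S_est.T)[:2, 2:]
print("source/estimate correlations:\n", np.round(corr, 2))
```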
|
0.967 |
2006 — 2009 |
Wang, Deliang |
N/A — Activity Code Description: no activity code was retrieved.
Collaborative Research: Separating Speech From Speech Noise to Improve Speech Intelligibility @ Ohio State University Research Foundation
Separating signals that have been mixed together is an archetypal engineering problem. The past decade has seen the emergence of several approaches applicable to separating sound mixtures -- for example, a restaurant scenario in which a desired target voice must be extracted from the background babble of other patrons. However, the most appropriate goal, and hence the way to measure performance, is not always clear. In this project, the goal is established as improving intelligibility, i.e., processing sound mixtures so that a human listener can better understand what is being said. This requires a collaboration between computer science/electrical engineering -- to provide the separation algorithms -- and auditory science/psychology -- to guide the results toward perceptually relevant improvements, and to evaluate the results in listener tests.
The particular techniques to be developed and combined include blind source separation (such as independent component analysis), computational auditory scene analysis (simulations of what is understood about human perceptual processing), and model-driven approaches derived from the machine-learning techniques of speech recognition. One specific area of interest is the synthesis of 'minimally informative noise': acoustic tokens that effectively communicate both what can be inferred and what remains unknown about the target signal, and which can leverage the powerful perceptual inference of human listeners.
This project will lead to implementations of acoustic signal separation that deliver the greatest benefit to human listeners, potentially including both normal-hearing and hearing-impaired individuals. This has a broad range of applications from processing archival recordings through to improved real-time communications technologies, as well as the potential to help automatic speech recognition systems.
|
0.973 |
2013 — 2017 |
Wang, Deliang |
R01 — Activity Code Description: To support a discrete, specified, circumscribed project to be performed by the named investigator(s) in an area representing his or her specific interest and competencies.
Speech Segregation to Improve Intelligibility of Noisy Speech
DESCRIPTION (provided by applicant): Hearing loss is one of the most prevalent chronic conditions, affecting 10% of the U.S. population. Although signal amplification by modern hearing aids makes sound more audible to hearing-impaired listeners, speech understanding in background noise remains one of the biggest challenges in hearing prostheses. The proposed research seeks a solution to this challenge by developing a speech segregation system that can significantly improve the intelligibility of noisy speech for listeners with hearing loss, with the longer-term goal of application to hearing aid design. Unlike traditional speech enhancement and beamforming algorithms, the proposed monaural (one-microphone) solution will be grounded in perceptual principles of auditory scene analysis. There are two stages in auditory scene analysis: a simultaneous organization stage that groups concurrent sound components and a sequential organization stage that groups sound components across time. This project is designed to achieve three specific aims. The first aim is to improve word recognition scores of hearing-impaired listeners in background noise. The second and third aims are to improve sentence-level intelligibility scores in background noise and in interfering speech, respectively. To achieve the first aim, a simultaneous organization algorithm will be developed that uses the pitch cue to segregate voiced speech and the onset and offset cues to segregate unvoiced speech. To achieve aims 2 and 3, a sequential organization algorithm will be developed that groups simultaneously organized streams across time to produce a sentence segregated from background interference. Sequential organization will be performed by analyzing pitch characteristics and by a novel clustering method based on incremental speaker modeling. A set of seven speech intelligibility experiments involving both hearing-impaired and normal-hearing listeners will be conducted to systematically evaluate the developed system.
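As a rough illustration of pitch-based grouping of the kind described in the first aim, the sketch below labels toy time-frequency units by comparing each unit's normalized autocorrelation at an assumed target pitch period against a threshold. The signals, threshold, and frame size are illustrative assumptions, not the project's algorithm.

```python
# Minimal sketch: pitch-based labeling of time-frequency (T-F) units.
import numpy as np

fs = 16000
frame = np.arange(0, 0.02, 1 / fs)        # one 20-ms frame per T-F unit

def unit_response(freq):
    """Stand-in for a T-F unit's filter response within one frame."""
    return np.sin(2 * np.pi * freq * frame)

def normalized_autocorr_at(x, lag):
    """Autocorrelation at one lag, normalized to ~1 for a periodic signal."""
    overlap = len(x) - lag
    return float(np.dot(x[:overlap], x[lag:]) / overlap / np.mean(x * x))

target_f0 = 220.0                         # assumed target pitch
pitch_lag = int(round(fs / target_f0))    # pitch period in samples

# Units dominated by target harmonics versus one dominated by an interferer.
units = {"target harmonic 1": unit_response(target_f0),
         "target harmonic 2": unit_response(2 * target_f0),
         "interferer":        unit_response(317.0)}

threshold = 0.85                          # assumed labeling threshold
for name, x in units.items():
    score = normalized_autocorr_at(x, pitch_lag)
    label = 1 if score > threshold else 0
    print(f"{name}: score={score:+.2f} -> target label {label}")
```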
|
1 |
2014 — 2017 |
Wang, Deliang; Fosler-Lussier, Eric; Mandel, Michael |
N/A — Activity Code Description: no activity code was retrieved.
RI: Medium: Deep Neural Networks For Robust Speech Recognition Through Integrated Acoustic Modeling and Separation
Over the last decade, speech recognition technology has become steadily more present in everyday life, as seen by the proliferation of applications including mobile personal agents and transcription of voicemail messages. Performance of these systems, however, degrades significantly in the presence of background noise; for example, using speech recognition technology in a noisy restaurant or on a windy street can be difficult because speech recognizers confuse the background noise with linguistic content. Compensation for noise typically involves preprocessing the acoustic signal to emphasize the speech signal (i.e. speech separation), and then feeding this processed input into the recognizer. The innovative approach in this project is to train the recognition and separation systems in an integrated manner so that the linguistic content of the signal can inform the separation, and vice versa.
Given the impact of the recent resurgence of Deep Neural Networks (DNNs) in speech processing, this project seeks to make DNNs more resistant to noise by integrating speech separation and speech recognition, exploring three related areas. The first research area seeks to stabilize the input to DNNs by combining DNN-based suppression and acoustic modeling, integrating masking estimates across time and frequency, and using this information to improve the reconstruction of speech from noisy input. The second area examines a richer DNN structure, using multi-task learning techniques to guide the construction of DNNs that perform better across all tasks and whose layers have meaningful structure. The final research area examines ways to adapt DNN acoustic models whose outputs become unreliable in acoustic noise. With its focus on integrating speech separation and recognition, the project will be evaluated both by speech recognition performance and by metrics that are more closely related to human speech perception. This will ensure a broader impact of this research by providing insights not only for speech technology but also for the design of next-generation hearing technology in the long run.
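To make the DNN-based suppression idea concrete, here is a minimal PyTorch sketch of a feed-forward network that maps noisy log-magnitude features to a time-frequency mask trained against an ideal-mask target. The architecture, feature choice, and training loop are illustrative assumptions, not the project's actual models.

```python
# Minimal sketch: feed-forward DNN mask estimation for speech separation.
import torch
import torch.nn as nn

n_freq = 257                       # e.g., 512-point STFT -> 257 bins

model = nn.Sequential(
    nn.Linear(n_freq, 1024), nn.ReLU(),
    nn.Linear(1024, 1024), nn.ReLU(),
    nn.Linear(1024, n_freq), nn.Sigmoid(),   # one mask value per frequency bin
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Toy batch: random stand-ins for log-magnitude frames and their ideal masks.
noisy_logmag = torch.randn(32, n_freq)
ideal_mask = torch.rand(32, n_freq)          # real targets come from clean/noise pairs

for step in range(5):                        # a few illustrative training steps
    mask = model(noisy_logmag)
    loss = loss_fn(mask, ideal_mask)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(f"step {step}: loss={loss.item():.4f}")

# At inference, the estimated mask multiplies the noisy magnitude spectrum
# before resynthesis, or feeds the recognizer's acoustic model.
```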
|
1 |
2018 — 2021 |
Wang, Deliang |
R01 — Activity Code Description: To support a discrete, specified, circumscribed project to be performed by the named investigator(s) in an area representing his or her specific interest and competencies.
Speech Segregation to Improve Intelligibility of Reverberant-Noisy Speech
Project Summary: Hearing loss is one of the most prevalent chronic conditions, affecting 37.5 million Americans. Although signal amplification in modern hearing aids makes sound more audible to hearing-impaired listeners, speech understanding in background interference remains the biggest challenge for hearing aid wearers. The proposed research seeks a monaural (one-microphone) solution to this challenge by developing supervised speech segregation based on deep learning. Unlike traditional speech enhancement, deep learning based speech segregation is driven by training data, and the three components of a deep neural network (DNN) model are features, training targets, and network architectures. Recently, deep learning has achieved tremendous successes in a variety of real-world applications. Our approach builds on the progress made in the PI's previous R01 project, which demonstrated, for the first time, substantial speech intelligibility improvements for hearing-impaired listeners in noise. A main focus of the proposed work in this cycle is to combat room reverberation in addition to background interference. The proposed work is designed to achieve three specific aims. The first aim is to improve the intelligibility of reverberant-noisy speech for hearing-impaired listeners. To achieve this aim, we will train DNNs to perform time-frequency masking. The second aim is to improve the intelligibility of reverberant speech in the presence of competing speech. To achieve this aim, we will perform DNN training to estimate two ideal masks, one for the target talker and the other for the interfering talker. The third aim is to improve the intelligibility of reverberant speech in combined speech and nonspeech interference. To achieve this aim, we will develop a two-stage DNN model in which the first stage is trained to remove nonspeech interference and the second stage to remove interfering speech. Eight speech intelligibility experiments involving both hearing-impaired and normal-hearing listeners will be conducted to systematically evaluate the developed system. The proposed project is expected to substantially close the speech intelligibility gap between hearing-impaired and normal-hearing listeners in daily conditions, with the ultimate goal of removing the gap altogether.
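For concreteness, the sketch below computes the kind of ideal ratio masks that could serve as the two training targets mentioned in the second aim, using synthetic stand-ins for the premixed talkers. The STFT parameters and signals are illustrative assumptions.

```python
# Minimal sketch: ideal ratio masks (IRMs) for a two-talker mixture,
# computed from the premixed signals (the targets a DNN learns to estimate
# from the mixture alone).
import numpy as np
from scipy.signal import stft

fs = 16000
t = np.arange(0, 1.0, 1 / fs)
target = np.sin(2 * np.pi * 220 * t)            # stand-in for target talker
interferer = np.sin(2 * np.pi * 317 * t)        # stand-in for interfering talker

_, _, T = stft(target, fs=fs, nperseg=512)
_, _, I = stft(interferer, fs=fs, nperseg=512)

eps = 1e-10
target_energy = np.abs(T) ** 2
interf_energy = np.abs(I) ** 2

# IRM for each talker: that talker's energy over total energy per T-F unit.
irm_target = target_energy / (target_energy + interf_energy + eps)
irm_interferer = interf_energy / (target_energy + interf_energy + eps)

# Where the mixture has energy, the two masks are complementary (sum to 1).
active = (target_energy + interf_energy) > 1e-6
print("masks sum to 1 on active units:",
      bool(np.allclose((irm_target + irm_interferer)[active], 1.0, atol=1e-3)))
print("IRM shape (freq bins x frames):", irm_target.shape)
```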
|
1 |
2021 — 2024 |
Wang, Deliang |
N/A — Activity Code Description: no activity code was retrieved.
Deep Learning Based Complex Spectral Mapping For Multi-Channel Speaker Separation and Speech Enhancement
Despite tremendous advances in deep learning based speech separation and automatic speech recognition, a major challenge remains: how to separate concurrent speakers and recognize their speech in the presence of room reverberation and background noise. This project will develop a multi-channel complex spectral mapping approach to multi-talker speaker separation and speech enhancement in order to improve speech recognition performance in such conditions. The proposed approach trains deep neural networks to predict the real and imaginary parts of individual talkers from the multi-channel input in the complex domain. After overlapped speakers are separated into simultaneous streams, sequential grouping will be performed for speaker diarization, the task of grouping the speech utterances of the same talker across intervals interspersed with the utterances of other speakers and with pauses. The proposed speaker diarization will integrate spatial and spectral speaker features, which will be extracted through multi-channel speaker localization and single-channel speaker embedding. Recurrent neural networks will be trained to perform classification for speaker diarization, handling an arbitrary number of speakers in a meeting. The proposed separation system will be evaluated using open, multi-channel speaker separation datasets that contain both room reverberation and background noise. The results from this project are expected to substantially elevate the performance of continuous speaker separation, as well as speaker diarization, in adverse acoustic environments, helping to close the performance gap between recognizing single-talker speech and recognizing multi-talker speech.
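As a rough illustration of complex spectral mapping, the PyTorch sketch below shows one common loss formulation in which the predicted real and imaginary STFT components are penalized along with the derived magnitude. The project's exact loss and network are not specified here, so treat the details as assumptions.

```python
# Minimal sketch: a training loss for complex spectral mapping, where the
# network predicts the real and imaginary STFT components of a talker.
import torch

def complex_mapping_loss(pred_real, pred_imag, true_real, true_imag):
    """L1 loss on real, imaginary, and magnitude components."""
    eps = 1e-8
    pred_mag = torch.sqrt(pred_real ** 2 + pred_imag ** 2 + eps)
    true_mag = torch.sqrt(true_real ** 2 + true_imag ** 2 + eps)
    return (torch.mean(torch.abs(pred_real - true_real))
            + torch.mean(torch.abs(pred_imag - true_imag))
            + torch.mean(torch.abs(pred_mag - true_mag)))

# Toy example: random stand-ins for (batch, freq, frames) spectrograms.
shape = (2, 257, 100)
pred_r = torch.randn(shape, requires_grad=True)
pred_i = torch.randn(shape, requires_grad=True)
true_r, true_i = torch.randn(shape), torch.randn(shape)

loss = complex_mapping_loss(pred_r, pred_i, true_r, true_i)
loss.backward()                     # gradients flow back to the predictions
print("loss:", round(loss.item(), 3))
```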
The overall goal of this project is to develop a deep learning system that can continuously separate individual speakers in a conversational or meeting setting and accurately recognize the utterances of these speakers. Building on recent advances in simultaneous grouping to separate and enhance overlapped speakers in a talker-independent fashion, the project is mainly focused on speaker diarization, which aims to group the speech utterances of the same speaker across time. To achieve speaker diarization, deep learning based sequential grouping will be performed, integrating spatial and spectral speaker characteristics. Through sequential organization, simultaneous streams will be grouped with earlier-separated speaker streams to form sequential streams, each of which corresponds to all the utterances of the same speaker up to the current time. Speaker localization and classification will be investigated to make sequential grouping capable of creating new sequential streams and handling an arbitrary number of speakers in a meeting scenario. With the added spatial dimension, the proposed diarization approach provides a solution to the question of who spoke when and where, significantly expanding the traditional scope of who spoke when. The proposed separation system will be evaluated using multi-channel speaker separation datasets that contain highly overlapped speech in recorded conversations, as well as room reverberation and background noise present in real environments. The main evaluation metric will be word error rate in automatic speech recognition. The performance of speaker diarization will be measured using diarization error rate.
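To make sequential grouping by speaker characteristics concrete, the following sketch assigns each newly separated stream, represented by a toy embedding, to the most similar existing speaker, or creates a new speaker when no match exceeds a threshold. The embeddings, similarity measure, and threshold are illustrative assumptions; the project additionally integrates spatial (localization) cues.

```python
# Minimal sketch: sequential grouping of separated streams by embedding
# similarity, creating new speakers as needed.
import numpy as np

rng = np.random.default_rng(1)
dim, threshold = 16, 0.7               # assumed embedding size and threshold

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

speakers = []                          # running centroid per discovered speaker

def assign(stream_embedding):
    """Return the speaker index for a stream, creating a new one if needed."""
    if speakers:
        sims = [cosine(stream_embedding, c) for c in speakers]
        best = int(np.argmax(sims))
        if sims[best] >= threshold:
            # Update the centroid with the new stream (simple running mean).
            speakers[best] = (speakers[best] + stream_embedding) / 2
            return best
    speakers.append(stream_embedding.copy())
    return len(speakers) - 1

# Two underlying voices; streams are noisy copies of their embeddings.
voice_a = rng.normal(size=dim)
voice_b = rng.normal(size=dim)
voice_b -= (voice_b @ voice_a) / (voice_a @ voice_a) * voice_a  # make distinct

for stream in [voice_a, voice_b, voice_a + 0.1 * rng.normal(size=dim)]:
    print("stream assigned to speaker", assign(stream))
```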
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
|
1 |