1990 — 1994 |
Morgan, Nelson |
Application of Signal Processing CAD to the Digital Realization of Artificial Neural Networks @ International Computer Science Institute
This is a project to extend the Lager CAD system with cells and tools for neural network design. The extended system is being used to design several systems of increasing sizes, the largest being a machine to recognize connected speech after being trained for each speaker. The resulting machine is expected to run much faster in the training phase than existing machines.
1993 — 1997 |
Morgan, Nelson |
A System For Connectionist Speech Recognition Research @ International Computer Science Institute
This project is constructing a computer optimized toward speech recognition algorithms, and is evaluating speech algorithms on the machine. A fundamental goal of the project is to explore the architectural changes needed for speech processing in future production systems. The new computer is a low-degree multiprocessor, each node of which contains a high-speed general-purpose processor, a multiply-accumulate processor, memory, and a communications interface for the multiprocessor interconnect. The computer will also be capable of being extended to include analog processing or smart sensors. This new machine will provide the performance of supercomputers at a small fraction of their cost on the speech recognition problem, and will contribute to the development of speech recognition systems for everyday use in commodity computers.
1997 — 1998 |
Morgan, Nelson |
SGER: Incorporating Higher-Level Information Into Dynamic Pronunciation Modeling For ASR @ International Computer Science Institute
In large-vocabulary spontaneous speech, the variability of the pronunciations of words is much higher than in read speech situations. At the 1996 Summer Workshop on Large Vocabulary Conversational Speech Recognition (WS96), a model for this variability to be used in Automatic Speech Recognition (ASR) systems was developed based on machine-derived descriptions of speech data. The continuation of this work in this grant focuses on studying the correlation of variation in pronunciations in continuous speech and higher-level information not usually brought to bear in an ASR pronunciation model. One important element in this model is the rate of speech, which has been shown to be a good predictor of word error rate on both read and spontaneous speech corpora. Investigations into the effects of resyllabification (movement of syllable boundaries when words are spoken in sequence) and word frequency on word pronunciations are also undertaken. The goal of this project is to improve the predictability of variation for speech recognition models, in particular for the reduction of recognition error for spontaneous and conversational speech. The techniques will be evaluated on the Switchboard corpus.
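As a concrete illustration, a rate-conditioned pronunciation model can be sketched as a lexicon storing canonical and reduced variants per word, with probability shifting toward reduced forms at fast speaking rates. The function, the phone sequences, the probabilities, and the threshold below are all hypothetical, not taken from the project:

```python
# Hypothetical sketch of rate-conditioned pronunciation selection.
# All names, variants, and numbers here are invented for illustration.

def pick_pronunciation(word, rate, lexicon, threshold=4.0):
    """Choose a pronunciation variant given speaking rate (syllables/sec).

    At fast rates, reduced variants become more likely; the fixed
    probabilities stand in for ones a real system would estimate from
    phonetically transcribed data.
    """
    canonical, reduced = lexicon[word]
    p_reduced = 0.7 if rate > threshold else 0.2
    return reduced if p_reduced > 0.5 else canonical

# Toy lexicon: canonical vs. reduced pronunciation of "probably".
lexicon = {"probably": (["p", "r", "aa", "b", "ax", "b", "l", "iy"],
                        ["p", "r", "aa", "b", "l", "iy"])}

print(pick_pronunciation("probably", rate=5.2, lexicon=lexicon))  # reduced form
print(pick_pronunciation("probably", rate=3.0, lexicon=lexicon))  # canonical form
```

In an actual dynamic pronunciation model, the variant probabilities would be estimated from data and conditioned jointly on rate, word frequency, and syllable context rather than a single hard threshold.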
2001 — 2006 |
Ostendorf, Mari (co-PI); Morgan, Nelson; Stolcke, Andreas; Ellis, Daniel; Kirchhoff, Katrin (co-PI) |
ITR/PE+SY: Mapping Meetings: Language Technology to Make Sense of Human Interaction @ International Computer Science Institute
Meetings are essential and ongoing processes in almost every enterprise. To record meetings is to provide a history of human interactions. However, two central challenges remain: (1) how to make sense of the group dynamics in those meetings and (2) how to search through a history of those interactions to find the information one may want. This research aims to develop automatic information processing systems based on the metaphor of a "meeting map", a structured representation that supports the presentation of multiple views of a meeting at different scales. The project will focus on two broad map categories: content maps, portraying topics discussed and decisions made; and interaction maps, identifying the roles and relationships of the participants and the level of concurrence. Building content and interaction maps will involve automatic classification of information from topic changes and salience to disagreement/consensus. These maps will be used for generating simple indicative summaries, and off-the-shelf visualization tools will be used for map presentation. The project will build on analyses of 100 hours of meetings. Evaluations will use objective recognition accuracies and expert assessments of automatic summaries. Meeting maps respect the diversity of information present in meeting scenarios, and provide effective support for human-to-human interactions.
2005 — 2008 |
Zhu, Qifeng; Morgan, Nelson; Wooters, Chuck |
OIA/MRI: Acquisition of a Computational Server For Large Vocabulary Connectionist Speech Recognition @ International Computer Science Institute
This project, supporting experimental research methods for large speech recognition tasks, aims at purchasing a large Symmetric Multi-Processor (SMP) system. The research involves the development of models and algorithms that will reduce automatic speech recognition (ASR) errors for natural conversations, which may be exacerbated by realistic but difficult acoustic conditions. Major improvement in algorithm robustness opens a wider range of future applications, including voice access to networked information and information retrieval and extraction for meetings. Head-mounted microphones or microphone arrays may not be feasible due to low Signal to Noise Ratio (SNR) and the effects of reverberation. Integration of multiple estimators, either at the level of probability streams or hypothesized word sequences, with associated confidence measures, can greatly improve overall performance. Research has shown that such properties can significantly increase recognition accuracy, even for high SNR tasks that require the transcription of informal conversational speech. For problems of scale, training of even a single-stream system can take weeks using a 2005-generation PC or workstation. A fast multi-processor system might overcome these resource limitations and greatly enhance the ability to explore promising solutions to the current constraints on performance. Hence, research requiring multiple probability streams or more computationally intensive algorithms should benefit from this new multi-layered system infrastructure.
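The multi-stream integration mentioned above, at the level of probability streams, is often realized as a log-linear (weighted geometric-mean) combination of per-stream phone posteriors. The sketch below assumes that formulation; the stream weights and toy posteriors are illustrative, not from the project:

```python
import numpy as np

def combine_streams(posteriors, weights):
    """Log-linear combination of per-stream phone posteriors.

    Each array in `posteriors` has shape (frames, classes); the result
    is the weighted geometric mean, renormalized per frame.
    """
    log_p = sum(w * np.log(p + 1e-12) for p, w in zip(posteriors, weights))
    p = np.exp(log_p)
    return p / p.sum(axis=-1, keepdims=True)

# Two toy streams of posteriors over three phone classes for one frame:
a = np.array([[0.7, 0.2, 0.1]])
b = np.array([[0.5, 0.4, 0.1]])
combined = combine_streams([a, b], weights=[0.5, 0.5])
print(combined)
```

In practice the stream weights would themselves be tuned, possibly per frame using confidence measures such as stream entropy, which is one reason training such systems is computationally demanding.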
Broader Impact: The planned research supports technical explorations that become the basis of PhD dissertations. Other areas, such as computational biology, natural language processing, digital communications, computer vision, human activity modeling, and human-computer interaction, may also benefit from the research. ICSI involves many female researchers, runs a high school outreach program, and trains students and other investigators.
2010 — 2012 |
Morgan, Nelson; Hakkani-Tur, Dilek |
CI-P: Towards a Consensus Representation For Understanding Structure of Multiparty Conversations @ International Computer Science Institute
Meetings provide unique knowledge-sharing opportunities and are an efficient way for people with different areas of expertise to interact. The amount of such human communication stored in audio form has grown rapidly in recent years, providing ample source material for later use. In particular, the increased prominence of search as a basic user activity means that the ability to automatically browse, summarize, or graphically visualize various aspects of spoken content has become far more important. There are several studies on the representation and detection of various types of events in multiparty conversations, such as agreements/disagreements and decisions. However, there is no consensus on how to represent the structure of meeting discussions for later use in human browsing or in further automatic processing, such as summarization.
This planning project works towards a research infrastructure for a better understanding of meeting structure, and aims to organize a community effort in the form of a workshop, to identify the consensus needs of the meeting processing research and education community, for enhancing the existing and widely used ICSI meetings corpus with annotations of structure of meeting discussions. Such annotations are critical in initiating research on automatic detection and annotation of meeting discussions, and would also be useful for research on meeting visualization, browsing and summarization. Furthermore, such an infrastructure would provide students and researchers in natural language and speech processing a framework to experiment with and enable social scientists interested in interactional structures to develop more robust analysis mechanisms. The workshop discussions will contribute to the decision on an annotation schema and the design of annotation guidelines, with a small set of sample annotated meetings, which will be made publicly available.
2011 — 2012 |
Morgan, Nelson |
International: An Analysis of Speaker Diarization System Errors @ International Computer Science Institute
This project will support the three-month visit of a US PhD student to the Idiap Research Institute in Switzerland, a leading international laboratory that has developed a state-of-the-art diarization system. Speaker diarization is the task of determining ?who spoke when? without a priori knowledge of the number of speakers or speaker identities. The focus of the current effort is to perform error analysis in audio-only speaker diarization for the meeting domain. There are two main areas of interest. The first is to build a framework to analyze speaker diarization performance on specific types of segments (e.g., speaker changes, interruption, overlapped speech, short utterances, long utterances, etc.). By analyzing where speaker diarization systems perform poorly, speaker diarization researchers can focus on improving performance during those problematic types of segments. The second area is to compare speaker diarization performance across systems.
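The error analysis described above builds on diarization scoring. A minimal sketch of a simplified frame-level diarization error rate follows, assuming hypothesis speaker labels have already been mapped to reference identities (real evaluations, such as NIST's md-eval scoring, additionally compute an optimal speaker mapping and apply a forgiveness collar around boundaries):

```python
def frame_der(ref, hyp):
    """Simplified frame-level diarization error rate.

    `ref` and `hyp` are per-frame speaker labels; None marks non-speech.
    Errors on scored (reference speech) frames cover misses and speaker
    confusions; hypothesized speech during reference non-speech counts
    as false alarm. The rate is normalized by scored reference frames.
    """
    scored = [i for i, r in enumerate(ref) if r is not None]
    errors = sum(1 for i in scored if hyp[i] != ref[i])  # miss + confusion
    false_alarm = sum(1 for i, r in enumerate(ref)
                      if r is None and hyp[i] is not None)
    return (errors + false_alarm) / len(scored)

ref = ["A", "A", "B", "B", None, "B"]
hyp = ["A", "A", "A", "B", None, "B"]
print(frame_der(ref, hyp))  # 1 confusion over 5 scored frames -> 0.2
```

Breaking this rate out by segment type (speaker changes, overlap, short utterances) as the project proposes amounts to restricting the scored frames to each category before normalizing.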
The project has substantial broader impacts. Speaker diarization is a useful step in meeting analysis. Considering the time people spend in meetings, improved speaker diarization could be useful for a broad portion of the population. While the goal is to characterize current speaker diarization errors, the knowledge gained from this work will be useful for improving future speaker diarization systems. In particular, by comparing where errors occur across multiple systems, the speaker diarization community can gain insight into the strengths and weaknesses of the various systems, which could suggest new ways of combining systems to improve diarization performance. In addition, the project will support the development of an international network of collaborators for a US graduate student.
2012 — 2015 |
Morgan, Nelson |
EAGER: Collaborative Research: Towards Modeling Human Speech Confusions in Noise @ International Computer Science Institute
This EArly-concept Grant for Exploratory Research (EAGER) supports an exploratory study to evaluate model components for prediction of human speech recognition in the presence of noise. Such a model has the potential to predict confusions between fine phonetic distinctions in different levels of background noise and at different speaking rates. The study takes advantage of modern physiological results that indicate that the primary auditory cortex performs spectro-temporal filtering; that is, that there are cells that are sensitive to particular spectro-temporal modulations at each auditory frequency. In this project, perceptual experiments in the presence of both stationary and non-stationary additive noise and at different signal-to-noise ratios for a database of CVC syllables recorded at 2 different speaking rates yield confusion statistics. These statistics are then compared to those resulting from an auditory model enhanced by elements incorporating these spectro-temporal filters.
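The spectro-temporal filters described above are commonly modeled as 2-D Gabor functions over a spectrogram: a sinusoid with chosen spectral and temporal modulation rates under a smooth envelope. The sketch below assumes that formulation; the sizes and modulation parameters are illustrative, not those used in the study:

```python
import numpy as np

def gabor_str_filter(n_freq=15, n_time=15, omega_s=0.25, omega_t=0.06):
    """One 2-D Gabor spectro-temporal filter (freq x time).

    omega_s: spectral modulation rate (cycles per frequency channel).
    omega_t: temporal modulation rate (cycles per frame).
    A Hann envelope localizes the complex sinusoid; the real part is
    returned as the filter kernel.
    """
    f = np.arange(n_freq) - n_freq // 2
    t = np.arange(n_time) - n_time // 2
    carrier = np.exp(2j * np.pi * (omega_s * f[:, None] + omega_t * t[None, :]))
    env = np.hanning(n_freq)[:, None] * np.hanning(n_time)[None, :]
    return np.real(carrier * env)

def apply_filter(spec, filt):
    """Valid 2-D correlation of a (freq x time) spectrogram with the filter."""
    F, T = filt.shape
    out = np.zeros((spec.shape[0] - F + 1, spec.shape[1] - T + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(spec[i:i + F, j:j + T] * filt)
    return out
```

A bank of such filters at different (omega_s, omega_t) pairs gives a crude analogue of the modulation-sensitive cells in primary auditory cortex that the abstract refers to.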
Successful results from this study will suggest enhancements to current hearing models and ultimately, after a broader study for which this EAGER is a pilot, advance the understanding of human speech perception. Background noise presents a challenging problem for a variety of speech and hearing devices including hearing aids and automatic speech recognition (ASR) systems. Since normal-hearing human listeners are extremely adept at perceiving speech in noise, this improved understanding of human models could lead to better artificial systems for speech processing. The databases and tools developed for this study will be disseminated to the research community.
2013 — 2016 |
Morgan, Nelson; Ellis, Daniel |
RI: Small: Collaborative Research: Towards Modeling Source Separation From Measured Cortical Responses @ International Computer Science Institute
This project will use new technologies for measuring brain activity to understand in detail how human listeners are able to separate competing, overlapping voices, and thereby to help design automatic systems capable of the same feat. Natural environments are full of overlapping sounds, and successful audio processing by both humans and machines relies on a fundamental ability to separate out sound sources of interest. This is commonly referred to as the "cocktail party effect," based on the ability of people to hear what a single person is saying despite the noisy background audio from other speakers. Despite the long history of research in hearing, this exceptional human capability for sound source separation is still poorly understood, and efforts to automatically separate overlapping voices by machine are correspondingly crude: although great advances have been made in robust processing of noisy speech by machine, separation of complex natural sounds (such as overlapping voices) remains a challenge. Advances in sensor technology now enable the modeling of this function in humans, giving an unprecedented, detailed view of sound representation processing in the brain. This project works specifically with measurements of neuroelectric response made directly on the surface of the human cortex (currently with a 256-electrode sensor array) for patients awaiting neurosurgery. Using such measurements made for controlled mixtures of voices, the project will endeavor to develop models of voice separation in the human cortex by reconstructing an approximation to the acoustic stimulus from the neural population response, in the process learning the linear mapping from the neural response back to a spectrogram measure of the stimulus.
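The linear mapping from neural response to spectrogram described above is typically fit as a regularized least-squares (ridge) regression. The sketch below assumes that formulation; the data are synthetic and purely illustrative:

```python
import numpy as np

def fit_reconstruction(R, S, alpha=1.0):
    """Ridge-regression stimulus-reconstruction weights.

    R: neural responses, shape (frames, electrodes).
    S: stimulus spectrogram, shape (frames, frequency bins).
    Solves W = (R'R + alpha*I)^-1 R'S so that R @ W approximates S.
    """
    d = R.shape[1]
    return np.linalg.solve(R.T @ R + alpha * np.eye(d), R.T @ S)

# Toy example: a synthetic linear response so recovery is near-exact.
rng = np.random.default_rng(0)
R = rng.standard_normal((200, 16))     # 200 frames, 16 "electrodes"
W_true = rng.standard_normal((16, 8))  # 8 spectrogram bins
S = R @ W_true
W = fit_reconstruction(R, S, alpha=1e-6)
S_hat = R @ W                          # reconstructed spectrogram
```

Real cortical data would also require time-lagged response features and cross-validated choice of alpha; reconstruction quality is then scored by correlating S_hat with the held-out stimulus spectrogram.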
To attempt to significantly improve the ability of machine algorithms to mimic human source separation capability, the project will also focus on a signal processing framework that supports experiments with different combinations of cues and strategies to optimize agreement with the recordings of neural activity. The engineering model is based on the Computational Auditory Scene Analysis (CASA) framework, a family of approaches that have shown competitive results for handling sound mixtures.