Universität Paderborn, P1.3.01
We cast classical signal pre-processing problems into a new regression setting by learning the nonlinear mapping from noisy speech spectra to clean speech features based on deep neural networks (DNNs), combining the emerging deep learning and big data paradigms. DNN-enhanced speech demonstrates good quality and intelligibility in challenging acoustic conditions. Furthermore, this paradigm facilitates an integrated learning framework in which the three key modules of an automatic speech recognition (ASR) system, namely signal conditioning, feature extraction and acoustic modeling, are trained jointly in a unified manner. The proposed approach was tested on recent challenging ASR tasks in CHiME-2, CHiME-4 and REVERB, designed to evaluate ASR robustness in mixed-speaker, multi-channel, and reverberant conditions, respectively. Leveraging the top speech quality achieved in speech separation, microphone-array-based speech enhancement and speech dereverberation, as needed for the three corresponding speaking environments, our team scored the lowest word error rates in all three scenarios.
Speaker: Chin-Hui Lee, School of ECE, Georgia Tech
Chin-Hui Lee is a professor at the School of Electrical and Computer Engineering, Georgia Institute of Technology. Before joining academia in 2001, he had accumulated 20 years of industrial experience, ending at Bell Laboratories, Murray Hill, as a Distinguished Member of Technical Staff and Director of the Dialogue Systems Research Department. Dr. Lee is a Fellow of the IEEE and a Fellow of ISCA. He has published over 500 papers and 30 patents, with more than 34,000 citations and an h-index of 75 on Google Scholar. He has received numerous awards, including the Bell Labs President's Gold Award in 1998. He won the IEEE Signal Processing Society's 2006 Technical Achievement Award for “Exceptional Contributions to the Field of Automatic Speech Recognition”. In 2012 he gave an ICASSP plenary talk on the future of automatic speech recognition. In the same year he was awarded the ISCA Medal for Scientific Achievement for “pioneering and seminal contributions to the principles and practice of automatic speech and speaker recognition”.