EIM News

Renowned researcher Chin-Hui Lee to speak at Paderborn University on speech processing

5 July 2018 | EIM News, CS News

Prof. Chin-Hui Lee, one of the best-known figures in the field of automatic speech recognition, will visit Paderborn University on 23 July. At the invitation of Prof. Häb-Umbach of the Department of Communications Engineering, he will give a talk titled "A Machine Learning Approach to Acoustic Signal Processing" at 2:00 p.m. in room P1.3.01, and he will be available for questions and discussions throughout the day.

Chin-Hui Lee is a professor at the Georgia Institute of Technology. Until 2001 he worked at the renowned Bell Laboratories in Murray Hill, New Jersey, where he was a "Distinguished Member of Technical Staff" and most recently served as director of the "Dialogue Systems" research department. Dr. Lee is a Fellow of the IEEE and of the ISCA (International Speech Communication Association). His many honors include the prestigious Technical Achievement Award of the IEEE Signal Processing Society for "Exceptional Contributions to the Field of Automatic Speech Recognition".

Talk abstract

We cast classical signal pre-processing problems into a new regression setting by learning the nonlinear mapping from noisy speech spectra to clean speech features based on deep neural networks (DNNs), combining the emerging deep learning and big data paradigms. DNN-enhanced speech demonstrates good quality and intelligibility in challenging acoustic conditions. Furthermore, this paradigm facilitates an integrated learning framework to train the three key modules in an automatic speech recognition (ASR) system, namely signal conditioning, feature extraction and acoustic modeling, jointly in a unified manner. The proposed approach was tested on recent challenging ASR tasks in CHiME-2, CHiME-4 and REVERB, designed to evaluate ASR robustness in mixed-speaker, multi-channel, and reverberant conditions, respectively. Leveraging the top speech quality achieved in speech separation, microphone-array-based speech enhancement and speech dereverberation, as needed for the three corresponding speaking environments, our team scored the lowest word error rates in all three scenarios.