Courses
CS/EE 506/606 Research Programming
EE 530/630 Speech Synthesis
EE 553/653 Speech Signal Processing
My work can be grouped into three categories:
- Signal analysis. This category describes work that performs feature analysis and subsequent model fitting on speech and other biological signals. Examples include modeling the coarticulation of speech to diagnose classes of dysarthria, modeling affective aspects of speech to assist in the diagnosis of autism, and automatically classifying overnight sleep breathing sounds for low-cost, ubiquitous, minimally obtrusive screening and assessment of sleep apnea.
- Signal synthesis. This category describes work that creates signals, e.g. a speech audio waveform, from models that have been created either from real observations or by rule. Most commonly, this refers to text-to-speech (TTS) synthesis systems, which employ a variety of models describing the evolution of pitch, duration, and spectral features over the time course of an utterance. TTS-based augmentative devices help individuals who have lost their voice to communicate, and TTS systems are also used as reading machines for the blind. Examples of work that advances the state of the art include increasing spectral control in concatenative text-to-speech synthesizers to improve their naturalness, and representing the acoustic inventories of TTS systems with an asynchronous interpolation model, which allows high rates of compression, elimination of signal artifacts, and signal modification.
- Signal transformation. This category combines the first two and adds a machine learning component: speech transformation systems have speech audio waveforms as both their input and output. One of the main applications is the personalization of voices used in speech-generating devices, using an approach referred to as voice conversion (the main focus of my Ph.D. thesis). This makes it possible to build personalized TTS systems for individuals with speech disorders who can only intermittently produce normal speech sounds, or for individuals who are about to undergo surgery that will irreversibly alter their speech. Another application involves automatically increasing the intelligibility of conversationally spoken speech, for inclusion in future-generation hearing aid devices. A third application is enhancing both the intelligibility and the naturalness of dysarthric or aphonic speech.