Machine Learning Approaches to Articulatory Inversion

Articulatory inversion is the problem of recovering the sequence of vocal tract shapes that produce a given acoustic utterance. Articulatory representations are useful for automatic speech recognition, speech production research, language therapy, and language learning. Articulatory inversion is a hard problem because different vocal tract shapes can produce the same acoustics, yet the articulatory trajectory must obey the mechanical constraints of the human vocal tract. Other examples of inversion problems over a sequence, which share the multivalued nature of the mappings and the existence of constraints, are: the recovery of facial gestures associated with a speech utterance; the inverse kinematics of a robot arm; and the recovery of 3D motion from video.

This project approaches articulatory inversion from a machine learning standpoint, based on a framework introduced by the PI. The low-dimensional manifold in articulatory-acoustic space is represented probabilistically by a density model estimated from data (recorded using a microphone and electromagnetic articulography). Multivalued mappings are represented explicitly by the modes of the conditional distributions of this density, and the articulatory trajectory is disambiguated using a continuity constraint.
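The disambiguation step can be illustrated with a minimal sketch. It assumes that the candidate articulatory configurations for each acoustic frame (for example, the modes of a conditional distribution) have already been computed, and it takes the continuity constraint to mean the smallest summed squared Euclidean jump between consecutive frames. Both the function name smoothest_path and that choice of cost are assumptions made for illustration, not necessarily the project's actual formulation.

```python
import numpy as np

def smoothest_path(candidates):
    """Pick one candidate per frame so that the summed squared distance
    between consecutive picks is minimal (dynamic programming).

    candidates: list of arrays, one per frame, each of shape (k_t, d),
    holding the k_t candidate articulatory vectors for frame t.
    Returns an array of shape (T, d) with the chosen trajectory.
    """
    # cost[j]: best cumulative cost of a path ending at candidate j
    # of the frame processed so far.
    cost = np.zeros(len(candidates[0]))
    back = []  # back[t][j]: best predecessor in frame t for candidate j in frame t+1
    for prev, cur in zip(candidates[:-1], candidates[1:]):
        # Pairwise squared distances, shape (len(prev), len(cur)).
        d2 = ((prev[:, None, :] - cur[None, :, :]) ** 2).sum(axis=2)
        total = cost[:, None] + d2
        back.append(total.argmin(axis=0))
        cost = total.min(axis=0)

    # Backtrack from the cheapest final candidate.
    j = int(cost.argmin())
    picks = [j]
    for b in reversed(back):
        j = int(b[j])
        picks.append(j)
    picks.reverse()
    return np.array([c[j] for c, j in zip(candidates, picks)])

# Toy data (made up): two candidate branches in the first two frames,
# and a single candidate in the last frame that resolves the ambiguity.
frames = [
    np.array([[0.0], [5.0]]),
    np.array([[0.1], [5.1]]),
    np.array([[0.2]]),
]
trajectory = smoothest_path(frames)
```

In the toy example, either branch is plausible frame by frame, and only the requirement that the whole trajectory be continuous selects the branch near 0. The search is linear in the number of frames and quadratic in the number of candidates per frame.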

The project poses new problems in dimensionality reduction, density estimation, and regularization (such as multivalued regression and graph learning from noisy data), and introduces new models and algorithms. Its expected outcomes are: basic research in machine learning, and the introduction of mapping inversion problems to research and education; improved articulatory inversion (for which code will be made freely available); and advocacy of data-driven approaches in speech production research and education.

Funding source

NSF CAREER award to former CSLU faculty member Miguel Á. Carreira-Perpiñán