Semi-Supervised Discriminatrive Training of Language Models

This project is conducting fundamental research in statistical language modeling to improve human language technologies, including automatic speech recognition (ASR) and machine translation (MT).

A language model (LM) is conventionally optimized, using text in the target language, to assign high probability to well-formed sentences. This method has a fundamental shortcoming: the optimization does not explicitly target the kinds of distinctions necessary to accomplish the task at hand, such as discriminating (for ASR) between different words that are acoustically confusable or (for MT) between different target-language words that express the multiple meanings of a polysemous source-language word.

Discriminative optimization of the LM, which would overcome this shortcoming, requires large quantities of paired input-output sequences: speech and its reference transcription for ASR or source-language (e.g. Chinese) sentences and their translations into the target language (say, English) for MT. Such resources are expensive, and limit the efficacy of discriminative training methods.

In a radical departure from convention, this project is investigating discriminative training using easily available, *unpaired* input and output sequences: un-transcribed speech or monolingual source-language text and unpaired target-language text. Two key ideas are being pursued: (i) unlabeled input sequences (e.g. speech or Chinese text) are processed to learn likely confusions encountered by the ASR or MT system; (ii) unpaired output sequences (English text) are leveraged to discriminate between these well-formed sentences from the (supposed) ill-formed sentences the system could potentially confuse them with.

This self-supervised discriminative training, if successful, will advance machine intelligence in fundamental ways that impact many other applications.

Funding source