Product Licensing

The Center for Spoken Language Understanding has developed a number of products available for academic and corporate use. To license any of the products listed below, please visit the Technology Transfer & Business Development Office website for Speech & Language Technologies.

For additional information contact:
Patricia Dickerson
or
503-346-3753

OHSU # 0631
Full Text To Speech with OGIResLPC
This is a signal processing sub-system for the Festival open-source text-to-speech system. Its task is to concatenate acoustic units and then modify the pitch and timing of the concatenated speech signal. The sub-system is based on residual-excited linear prediction.

Recordings are analyzed via linear prediction, and the reflection coefficients and residuals are stored. During pitch and timing modification, overlap-add algorithms are applied to the residual signal. This system produces higher quality speech than systems in which the excitation is provided by pulses or white noise.
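
As an illustration of the analysis step described above, the following is a minimal sketch, not the OGIresLPC code. It assumes NumPy, uses the textbook Levinson-Durbin recursion, and runs on a synthetic frame; the function names, the frame, and the prediction order are illustrative assumptions. It shows that when the residual itself is kept as the excitation, the frame can be rebuilt from the predictor and the residual. The overlap-add pitch and timing modification is not shown.

```python
import numpy as np

def lpc_analyze(frame, order):
    """Levinson-Durbin recursion on the frame's autocorrelation.
    Returns the predictor polynomial a (with a[0] == 1) and the
    reflection coefficients k (sign conventions vary between texts)."""
    n = len(frame)
    r = np.array([np.dot(frame[:n - lag], frame[lag:]) for lag in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    k = np.zeros(order)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k[i - 1] = -acc / err
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k[i - 1] * a_prev[i - j]
        a[i] = k[i - 1]
        err *= 1.0 - k[i - 1] ** 2
    return a, k

def lpc_residual(frame, a):
    """Inverse-filter the frame: e[n] = sum_j a[j] * x[n - j]."""
    return np.convolve(frame, a)[:len(frame)]

def lpc_synthesize(residual, a):
    """Excite the all-pole filter 1/A(z) with the residual."""
    order = len(a) - 1
    out = np.zeros(len(residual))
    for n in range(len(residual)):
        acc = residual[n]
        for j in range(1, min(order, n) + 1):
            acc -= a[j] * out[n - j]
        out[n] = acc
    return out

# Synthetic stand-in for one frame of recorded speech (an assumption).
rng = np.random.default_rng(0)
t = np.arange(400)
frame = np.sin(0.1 * t) + 0.5 * np.sin(0.37 * t) + 0.05 * rng.standard_normal(400)

a, k = lpc_analyze(frame, order=8)
residual = lpc_residual(frame, a)
rebuilt = lpc_synthesize(residual, a)
```

Replacing the stored residual with a pulse train or white noise would leave only the spectral envelope in the rebuilt frame, which is the contrast the description draws.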

OHSU # 0665
Voice Transformation (High Resolution)
The essence of the system is the following:

  1. Parallel recordings are obtained of a [to-be-transformed] source speaker and a [to-be-mimicked] target speaker. "Parallel" refers to the fact that the exact same text is used.
  2. An automatic alignment system is used to find the correspondence between the phonemes in the two sets of recordings.
  3. The target speaker's speech is analyzed using Linear Predictive Coding (LPC), producing LPC coefficients and LPC residuals. Both contain information about the speaker.
  4. A first mapping is trained that maps spectral envelopes of the source speaker onto the corresponding spectral envelopes of the target speaker.
  5. A second mapping is computed between the target LPC coefficients and the target LPC residuals.
  6. During operation, the first mapping computes target LPC coefficients from input source LPC coefficients. The second mapping computes target LPC residuals from the computed target LPC coefficients. LPC synthesis is used to generate speech from the computed target LPC coefficients and the computed target LPC residuals.
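
The data flow of the two mappings in the steps above can be sketched as follows. The description does not say what form the mappings take, so this sketch substitutes two simple stand-ins: an affine least-squares regression for the first mapping and a nearest-neighbour lookup for the second. All function names and the random feature vectors are hypothetical; real LPC analysis, alignment, and the final LPC synthesis are omitted.

```python
import numpy as np

def train_affine_map(src, tgt):
    """First mapping (stand-in): least-squares affine map from
    source-speaker features to aligned target-speaker features."""
    X = np.hstack([src, np.ones((len(src), 1))])
    W, *_ = np.linalg.lstsq(X, tgt, rcond=None)
    return W

def apply_affine_map(W, src):
    X = np.hstack([src, np.ones((len(src), 1))])
    return X @ W

def nearest_residuals(train_coeffs, train_residuals, coeffs):
    """Second mapping (stand-in): for each predicted target coefficient
    vector, return the residual of the closest target training frame."""
    dist = np.linalg.norm(train_coeffs[None, :, :] - coeffs[:, None, :], axis=2)
    return train_residuals[np.argmin(dist, axis=1)]

# Random numbers standing in for aligned training frames (an assumption).
rng = np.random.default_rng(0)
src_train = rng.standard_normal((200, 4))
A = rng.standard_normal((4, 4))
b = rng.standard_normal(4)
tgt_train = src_train @ A + b                   # target "LPC coefficients"
tgt_residuals = rng.standard_normal((200, 16))  # target "LPC residuals"

# Training: fit the first mapping; the second needs only the target data.
W = train_affine_map(src_train, tgt_train)

# Operation: source coefficients -> target coefficients -> target residuals.
pred_coeffs = apply_affine_map(W, src_train[:5])
pred_residuals = nearest_residuals(tgt_train, tgt_residuals, pred_coeffs)
```

The point of the sketch is only the chaining: the second mapping takes the output of the first as its input, so the source speaker's own residuals are never used.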

Competitive Advantages
The key invention and improvement over earlier systems is the use of target LPC residuals. Older systems do not use this information, yet it adds considerably to the quality of the mimicry. For the "foreign accent reduction" system, the plan is to use this method, train it on a speaker with an Asian Indian accent and a speaker with a US American accent, and test it on speech from the former speaker that was not used during training.

OHSU # 0681-A
22 Language v1.5
The 22 Language corpus consists of telephone speech from 21 languages. Some of the calls in each language are transcribed orthographically.
Developed with support from the National Science Foundation.
Sample

OHSU # 0681-B
Alphadigit v1.3 
The Alphadigit Corpus is a collection of 78,044 examples from 3,025 speakers saying six-character strings of letters and digits over the telephone.
Sample

OHSU # 0681-C
Apple Words and Phrases v1.3
Developed with support from Apple Computer, Inc. 3008 calls were collected and each caller repeated a list of command phrases as they were prompted.
Sample

OHSU # 0681-D
Cellular Words and Phrases v1.3
Consists of utterances gathered from 336 callers who were using cellular telephones. Each caller listened and responded to a series of pre-recorded prompts.
Sample

OHSU # 0681-E
Foreign Accented English v1.2
The corpus contains 4925 telephone quality utterances from native speakers of 23 languages speaking English.
Sample

OHSU # 0681-F
Isolet v1.3
Isolet is a corpus of letters of the English alphabet spoken in isolation. The database consists of 7800 spoken letters, 2 productions of each letter by 150 speakers.
Sample

OHSU # 0681-G
Kids' Speech v1.1 
This final release of the Kids' Speech Corpus comprises spontaneous and scripted utterances from children in grades K through 10. All children read approximately 60 items from a total list of 319 phonetically-balanced but simple words, sentences, or digit strings. Each utterance of spontaneous speech begins with a recitation of the alphabet and contains a monologue of about one minute in duration. Orthographic transcriptions of each spontaneous utterance are included, and transcriptions of the scripted utterances are available via table lookup. All files have been verified for accuracy.
Sample

OHSU # 0681-H
Multi Channel Overlapping Numbers Corpus (MONC) v1.0
A portion of the Numbers corpus played through loudspeakers and re-recorded on a 12-channel table-top microphone array in a meeting room.

NOTE: This corpus is currently not available for commercial use.
Sample

OHSU # 0681-I
Multi-Language Telephone Speech v1.2
The OGI Multi-language Telephone Speech Corpus consists of telephone speech from 11 languages: English, Farsi, French, German, Hindi, Japanese, Korean, Mandarin, Spanish, Tamil, Vietnamese. Time-aligned phonetic transcriptions are available for some of the utterances.
Sample

OHSU # 0681-J
Names v1.3
The Names Corpus is a collection of first and last name utterances. All of the utterances have been phonetically transcribed.
Sample

OHSU # 0681-K
National Cellular v2.3 
Consists of cellular telephone speech from 2336 callers from locations throughout the United States.
Developed with support from the National Science Foundation.
Sample

OHSU # 0681-L
Numbers v1.3
The Numbers Corpus is a collection of naturally produced numbers. The utterances were taken from other CSLU telephone speech data collections, and include isolated digit strings, continuous digit strings, and ordinal/cardinal numbers.
Sample

OHSU # 0681-M
Portland Cellular v1.3 
The Portland Cellular Corpus consists of utterances gathered from callers who were using cellular telephones in the Portland, Oregon area.
Sample

OHSU # 0681-N
Speaker Recognition v1.1 
The Speaker Recognition corpus (formerly known as Speaker Verification) consists of telephone speech from 91 participants. Each participant has recorded speech in twelve sessions over a two-year period.
Developed with support from the National Science Foundation.
Sample

OHSU # 0681-O
Spelled and Spoken Words v1.2
The Spelled and Spoken Words corpus consists of spelled and spoken words from over 4000 callers. 1000 callers also recited the English alphabet with pauses between the letters. In addition, a subset of the calls has been phonetically labeled.
Sample

OHSU # 0681-P
SR4X v1.2 
This corpus is a collection of 36 speakers saying 11 words 6 times on 4 different channels.
Sample

OHSU # 0681-Q
Stories v1.2 
The Stories Corpus is made up of extemporaneous speech collected from English speakers in the CSLU Multi-language Telephone Speech data collection.
Sample

OHSU # 0681-R
The Spoltech Brazilian Portuguese v1.0
The Spoltech Brazilian Portuguese corpus consists of prompted sentences and answers to questions, recorded in a number of regions of Brazil. The speech data (8080 utterances) from 477 speakers have been recorded at 44.1 kHz, and there are 2572 orthographic transcriptions and 5507 time-aligned phoneme-level transcriptions. The acoustic environment was not controlled, in order to provide realistic background conditions.
Sample

OHSU # 0681-S
VOICES v1.0
The VOICES Corpus contains 12 speakers reading 50 phonetically rich sentences. The recording procedure involved a "mimicking" approach which resulted in a high degree of natural time-alignment between different speakers.
Sample

OHSU # 0681-T
Yes/No v1.2
The Yes/No Corpus is a collection of answers to yes/no questions from other CSLU corpora.
Sample

OHSU # 0952
Sound Identification Tutor
The Sound Identification Tutor is a collection of software used for language training. It is aimed at improving phonological awareness through discrimination, identification, and imitation drills using pairs of words or syllables that differ in one phoneme. The main software components are an editor that is used to create new lessons, a runtime module that plays lessons, and a data analysis module that converts data collected during runtime to a human-readable form.
This Sound Identification Tutor must be used in conjunction with the CSLU Toolkit (OHSU #0680).
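
To make "pairs of words or syllables that differ in one phoneme" concrete, here is a small sketch of how such a pair could be checked. It is not part of the Sound Identification Tutor or its lesson format; the function name and the phoneme symbols are illustrative assumptions, and only same-length pairs with a single substituted phoneme are covered.

```python
def differs_in_one_phoneme(a, b):
    """True if two phoneme sequences have the same length and differ
    at exactly one position (a substitution-type minimal pair)."""
    if len(a) != len(b):
        return False
    return sum(x != y for x, y in zip(a, b)) == 1

# Illustrative phoneme sequences (symbols are an assumption).
bat = ["b", "ae", "t"]
pat = ["p", "ae", "t"]
pad = ["p", "ae", "d"]

is_pair = differs_in_one_phoneme(bat, pat)      # one phoneme differs
not_a_pair = differs_in_one_phoneme(bat, pad)   # two phonemes differ
```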

OHSU # 1195
Clear-Speech Corpus, Speaker JPH
Corpus collected for analyzing differences between two speaking styles, i.e. "clear" and "conversational" speech. Provides 140 sentences and parallel recordings of clear and conversational speech as well as associated phonetic labels and manually verified pitch marks, i.e. glottal closure instants.
The Center for Spoken Language Understanding (CSLU) distributes corpora to commercial entities and academic institutions. Corporate members can use these corpora for research and also for creating commercial products, for example by generating acoustic models for speech recognition.
Developing a successful spoken language system typically requires vast amounts of data, and CSLU has established itself as a collector and distributor of speech corpora. Recognizing that speech corpora are important resources for anyone conducting research in the area of voice processing, we have collected and transcribed telephone and cellular speech data in over 20 languages. CSLU usually has at least one data collection going at any given time.
 
OHSU # 1359
German female speaker diphone voice
Recordings of a female German human voice processed and reformatted to be usable as an Acoustic Inventory Corpus (AIC) by the OGIresLPC sub-system (OHSU #0631) in the Festival open-source text-to-speech system.
The AIC consists of time-stamped digital recordings of raw speech or processed speech together with index files that use the time stamps to map phoneme sequences to the corresponding positions in the digital recordings. The AIC was designed to have optimal coverage of phoneme sequences in which substantial coarticulation can be expected.
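
The role of the index files can be pictured with a small sketch. The actual AIC file format is not described here, so the structure below is purely hypothetical: the `UnitEntry` fields, the recording identifier, the phoneme symbols, and the 16 kHz sample rate are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class UnitEntry:
    recording: str   # hypothetical recording identifier
    start_s: float   # start time stamp in seconds
    end_s: float     # end time stamp in seconds

def slice_unit(samples, sample_rate, entry):
    """Cut the samples between an entry's two time stamps."""
    start = int(round(entry.start_s * sample_rate))
    end = int(round(entry.end_s * sample_rate))
    return samples[start:end]

# Hypothetical index: a phoneme sequence maps to a span in a recording.
index = {("a", "b"): UnitEntry("rec_001", 0.50, 0.75)}

# Stand-in for one second of audio at an assumed 16 kHz sample rate.
samples = list(range(16000))
unit = slice_unit(samples, 16000, index[("a", "b")])
```

A synthesizer would look up each phoneme sequence it needs in such an index and concatenate the returned spans.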

OHSU # 1360
German male speaker diphone voice
Recordings of a male German human voice processed and reformatted to be usable as an Acoustic Inventory Corpus (AIC) by the OGIresLPC sub-system (OHSU #0631) in the Festival open-source text-to-speech system.
The AIC consists of time-stamped digital recordings of raw speech or processed speech together with index files that use the time stamps to map phoneme sequences to the corresponding positions in the digital recordings. The AIC was designed to have optimal coverage of phoneme sequences in which substantial coarticulation can be expected.

OHSU # 1361
American English male speaker diphone voice
Recordings of a male American-English human voice processed and reformatted to be usable as an Acoustic Inventory Corpus (AIC) by the OGIresLPC sub-system (OHSU #0631) in the Festival open-source text-to-speech system.
The AIC consists of time-stamped digital recordings of raw speech or processed speech together with index files that use the time stamps to map phoneme sequences to the corresponding positions in the digital recordings. The AIC was designed to have optimal coverage of phoneme sequences in which substantial coarticulation can be expected.

OHSU # 1362
American English female diphone voice (AS)
Recordings of a female American-English human voice processed and reformatted to be usable as an Acoustic Inventory Corpus (AIC) by the OGIresLPC sub-system (OHSU #0631) in the Festival open-source text-to-speech system.
The AIC consists of time-stamped digital recordings of raw speech or processed speech together with index files that use the time stamps to map phoneme sequences to the corresponding positions in the digital recordings. The AIC was designed to have optimal coverage of phoneme sequences in which substantial coarticulation can be expected.

OHSU # 1363
American English female diphone voice (TL)
Recordings of a female American-English human voice processed and reformatted to be usable as an Acoustic Inventory Corpus (AIC) by the OGIresLPC sub-system (OHSU #0631) in the Festival open-source text-to-speech system.
The AIC consists of time-stamped digital recordings of raw speech or processed speech together with index files that use the time stamps to map phoneme sequences to the corresponding positions in the digital recordings. The AIC was designed to have optimal coverage of phoneme sequences in which substantial coarticulation can be expected.

OHSU # 1364
Mexican Spanish male diphone voice
Recordings of a male Mexican-Spanish human voice processed and reformatted to be usable as an Acoustic Inventory Corpus (AIC) by the OGIresLPC sub-system (OHSU #0631) in the Festival open-source text-to-speech system.
The AIC consists of time-stamped digital recordings of raw speech or processed speech together with index files that use the time stamps to map phoneme sequences to the corresponding positions in the digital recordings. The AIC was designed to have optimal coverage of phoneme sequences in which substantial coarticulation can be expected.