CORPORA from CSLU: Cellular Words and Phrases


The Cellular Words and Phrases Corpus consists of utterances collected from callers using cellular telephones. Each caller listened and responded to a series of pre-recorded prompts from a fixed protocol. The corpus includes speech from 346 callers.

Recording Details:
The data were recorded from an analog line through a Gradient Technologies analog-to-digital conversion box. The .wav files use the standard RIFF file format with 16-bit linear encoding.
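As a minimal sketch, the Python standard-library `wave` module can read these files and confirm the 16-bit sample width. The sampling rate and channel count are not stated in this description, so the sketch reads them from each file's header rather than assuming values; `read_pcm16` is a hypothetical helper name, not part of the distribution.

```python
import array
import sys
import wave


def read_pcm16(path):
    """Return (sample_rate, channels, samples) for a 16-bit linear RIFF .wav file."""
    with wave.open(path, "rb") as wav:
        # getsampwidth() is in bytes; 16-bit linear encoding means 2 bytes per sample.
        width = wav.getsampwidth()
        if width != 2:
            raise ValueError(f"{path}: expected 16-bit samples, got {8 * width}-bit")
        rate = wav.getframerate()      # read from the header, not assumed
        channels = wav.getnchannels()  # read from the header, not assumed
        raw = wav.readframes(wav.getnframes())
    # RIFF stores PCM samples little-endian; typecode "h" is a signed 16-bit integer.
    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()
    return rate, channels, samples
```

For instance, `read_pcm16("/cwp/speech/4/CEcall-49.yes.wav")` would return the sampling rate, channel count, and decoded samples of the example utterance CEcall-49.yes.wav.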

Directory Structure:
There are five top-level directories in this distribution: docs, speech, trans, labels, and misc. The docs directory contains assorted documentation files; the misc directory contains archival material and possibly scripts and tools; the labels directory contains time-aligned phoneme labels.

The speech and trans directories contain the data files, which are named as follows:

CEcall-##.yyyyy.zzz

## = call number
yyyyy = utterance code
zzz = file extension (txt/wav)

For example:

CEcall-49.yes.wav

This utterance is from caller 49 and should contain the word "Yes".

The corresponding transcription is in the file CEcall-49.yes.txt.
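The naming scheme can be parsed with a short regular expression. This is a sketch only: the fixed `CEcall-` prefix is inferred from the single example name CEcall-49.yes.wav, and the assumption that utterance codes contain no dots is unverified. `parse_name`, `CorpusFile`, and `transcription_for` are hypothetical helpers.

```python
import re
from typing import NamedTuple

# Assumed pattern, inferred from the example CEcall-49.yes.wav:
# "CEcall-" + call number + "." + utterance code (assumed dot-free) + "." + txt|wav
FILENAME_RE = re.compile(r"^CEcall-(\d+)\.([^.]+)\.(txt|wav)$")


class CorpusFile(NamedTuple):
    call: int        # call number (##)
    utterance: str   # utterance code (yyyyy)
    extension: str   # file extension (zzz): "txt" or "wav"


def parse_name(name: str) -> CorpusFile:
    """Split a corpus file name into call number, utterance code, and extension."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a corpus file name: {name!r}")
    call, utterance, extension = match.groups()
    return CorpusFile(int(call), utterance, extension)


def transcription_for(wav_name: str) -> str:
    """Return the name of the .txt transcription that accompanies a .wav file."""
    if parse_name(wav_name).extension != "wav":
        raise ValueError(f"expected a .wav name: {wav_name!r}")
    # Swap only the extension so the rest of the name is kept verbatim.
    return wav_name[: -len(".wav")] + ".txt"
```

Here `parse_name("CEcall-49.yes.wav")` yields call 49, utterance code "yes", and extension "wav", and `transcription_for` maps that name to CEcall-49.yes.txt.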

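The audio and text files are grouped into subdirectories by call number div 10 (integer division, discarding the remainder). Call 49 therefore falls in subdirectory 4, so CEcall-49.yes.wav and CEcall-49.yes.txt would be found in /cwp/speech/4 and /cwp/trans/4, respectively.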
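The subdirectory rule amounts to integer division of the call number by 10. The sketch below builds a full path from a call number and utterance code; it assumes the corpus root is /cwp, as in the example paths, and reuses the file-name pattern inferred from CEcall-49.yes.wav. `corpus_path` is a hypothetical helper, not part of the distribution.

```python
from pathlib import PurePosixPath

# Top-level data directory -> file extension stored in it.
EXTENSIONS = {"speech": "wav", "trans": "txt"}


def corpus_path(call: int, utterance: str, kind: str, root: str = "/cwp") -> PurePosixPath:
    """Return the path of a speech (.wav) or transcription (.txt) file.

    Assumption: the corpus root defaults to /cwp, as in the example paths;
    pass another root if your copy is installed elsewhere.
    """
    if kind not in EXTENSIONS:
        raise ValueError(f"kind must be 'speech' or 'trans', not {kind!r}")
    subdir = str(call // 10)  # "call number div 10": integer division
    # Assumed name pattern, inferred from the example CEcall-49.yes.wav.
    name = f"CEcall-{call}.{utterance}.{EXTENSIONS[kind]}"
    return PurePosixPath(root, kind, subdir, name)
```

With this rule, call 49 maps to subdirectory 4, and the highest call numbers in a 346-caller corpus would land in subdirectory 34.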

The Center for Spoken Language Understanding (CSLU) distributes corpora to commercial entities and academic institutions for a fee. Commercial entities may use these corpora both for research and for developing commercial products, such as acoustic models for speech recognition.


The text transcriptions follow the non-time-aligned, word-level conventions described in the CSLU Labeling Guide.


To place your order:

1. Click the type of license you wish to order: "Academic or non-profit entity" or "Commercial entity".

2. To view the terms of the license agreement, click the word "terms".

3. By clicking "Add to Order" and proceeding to the next screen, you agree to the terms of the license agreement.

4. If the information on the "Order Contents" screen is correct, press "Check out".

5. On the next screen, enter a brief "Intended Use" (required). Under "Recipient Scientist Information", enter your own details or, if you are placing the order on behalf of another person, that person's details. We will use this information if we have questions about the order, payment, or shipping address.

6. Once OHSU has received and verified your payment, Technology Transfer & Business Development will approve your order, and the Center for Spoken Language Understanding will then ship the DVD by FedEx within 5-10 business days.


For more information, contact:

Trina Voss
Technology Development Manager

