CORPORA from CSLU: Numbers v1.3
OHSU # 0681-L
Inventors:
- CSLU, SOM CSLU
General Description
The Numbers Corpus is a collection of naturally produced numbers. The utterances were taken from other CSLU telephone speech data collections, and include isolated digit strings, continuous digit strings, and ordinal/cardinal numbers. A total of 23902 files are included in this corpus.
Recording Details
The data in this corpus were collected over both analog and digital telephone lines.
The analog data were recorded using a Gradient Technologies analog-to-digital conversion box. These files were recorded at 8 kHz with 16-bit samples and stored in a linear format.
The digital data were recorded with the CSLU T1 digital data collection system. These files were sampled at 8 kHz with 8-bit samples and stored as μ-law (ulaw) files.
All of the data in this distribution are linearly encoded as 16-bit samples in the RIFF standard file format.
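As a quick sanity check on the format described above, the header of a speech file can be inspected with Python's standard wave module. This is a minimal sketch, assuming the distributed files are ordinary PCM RIFF/WAV files that this module can open; the function names are illustrative and not part of the corpus distribution.

```python
import wave


def describe_wav(path):
    """Read basic format facts from the header of a RIFF/WAV file."""
    with wave.open(path, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_width_bits": wav.getsampwidth() * 8,
            "sample_rate_hz": wav.getframerate(),
            "duration_s": wav.getnframes() / wav.getframerate(),
        }


def matches_documented_format(info):
    """Check the two properties stated in the description: 16-bit, 8 kHz."""
    return info["sample_width_bits"] == 16 and info["sample_rate_hz"] == 8000
```

The channel count is reported but not checked, because the description does not state it.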
Directory Structure
There are five top-level directories in this distribution: docs, speech, labels, trans, and misc. The docs directory contains assorted documentation files.
The speech, trans, and labels directories contain the data files, which have the following name structure:
NU-xxxxx.yyyy.zzz
xxxxx = call number
yyyy = utterance code
zzz = file extension (txt/wav/phn)
For example:
NU-1016.street.wav
This utterance is from caller 1016 and contains numbers from a street address.
Corresponding text and phonetic transcriptions can be found in these files:
NU-1016.street.txt
NU-1016.street.phn
These audio and transcription files are subdivided into directories named for the call number divided by 100, discarding the remainder (1016 div 100 = 10). So, the three files above would be found in /numbers/speech/10, /numbers/trans/10, and /numbers/labels/10, respectively.
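The naming and directory rules above can be sketched in a few lines of Python. This is an illustration only: the helper names are hypothetical, the /numbers root is taken from the example paths, and the sketch assumes that utterance codes contain no dots and that directory names are not zero-padded.

```python
from pathlib import PurePosixPath

# Top-level directory -> file extension, following the example above.
KINDS = {"speech": "wav", "trans": "txt", "labels": "phn"}


def parse_name(filename):
    """Split 'NU-1016.street.wav' into (call_number, utterance_code, extension)."""
    prefix, code, ext = filename.split(".")
    if not prefix.startswith("NU-"):
        raise ValueError(f"unexpected file name: {filename!r}")
    return int(prefix[3:]), code, ext


def corpus_paths(call_number, code, root="/numbers"):
    """Build the expected speech, trans, and labels paths for one utterance."""
    subdir = str(call_number // 100)  # "call number div 100"
    stem = f"NU-{call_number}.{code}"
    return {
        kind: str(PurePosixPath(root, kind, subdir, f"{stem}.{ext}"))
        for kind, ext in KINDS.items()
    }
```

For instance, corpus_paths(1016, "street") yields paths under /numbers/speech/10, /numbers/trans/10, and /numbers/labels/10, matching the example.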
Transcriptions
The text transcriptions were performed according to the non-time-aligned word-level conventions described in the CSLU Labeling Guide.
Phonetic transcriptions are plain text files that carry time-aligned phonetic labels. The first two lines of the file are a header, which defines the length of a "frame" in milliseconds. The rest of the file consists of lines that each give two numbers defining a frame range, followed by a label that applies to that region. For example:
MillisecondsPerFrame: 1.000000
END OF HEADER
2 113 .pau
113 191 w
191 267 ^
267 395 n
So, we can see here that a frame corresponds to 1 millisecond (ms) of time, and that from 2 to 113 ms into the file, there is a pause (.pau), with the first phoneme (w) starting at 113 ms and stretching to 191 ms.
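A label file in this layout can be read with a short parser. The following is a minimal sketch based only on the example above; the function name parse_phn is hypothetical, and the sketch assumes the header always contains a MillisecondsPerFrame line followed by END OF HEADER.

```python
def parse_phn(text):
    """Return a list of (start_ms, end_ms, label) tuples from .phn file contents."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    ms_per_frame = None
    body_start = None
    for i, line in enumerate(lines):
        if line.startswith("MillisecondsPerFrame:"):
            ms_per_frame = float(line.split(":", 1)[1])
        elif line == "END OF HEADER":
            body_start = i + 1
            break
    if ms_per_frame is None or body_start is None:
        raise ValueError("incomplete .phn header")

    segments = []
    for line in lines[body_start:]:
        # Each body line is: start frame, end frame, label.
        start, end, label = line.split(None, 2)
        segments.append((int(start) * ms_per_frame, int(end) * ms_per_frame, label))
    return segments
```

Frame indices are multiplied by the header value, so the same code should also handle files whose frame length is not 1 ms.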
The Center for Spoken Language Understanding (CSLU) distributes corpora to commercial entities and academic institutions for a fee. Commercial entities can use these corpora for research and also for creating commercial products, for example by generating acoustic models for speech recognition.
To place your order:
1. Click on the type of license you wish to order. The fee is $50 for academic or non-profit entities and $2,500 for commercial entities.
2. Terms of the license agreement can be viewed by clicking on the word "terms".
3. You agree to the terms of the license agreement when you click on "Add to Order" and proceed to the next screen.
4. If information on the "Order Contents" screen is correct, press "Check out".
5. On the next screen, a brief "Intended Use" is required. For "Recipient Scientist Information", enter your own information or, if you are placing the order for another person, that person's information. We will use this information should we have questions about the order, payment, or shipping address.
6. Once your payment has been received and verified by OHSU, your order will be approved by Technology Transfer & Business Development and then the DVD will be sent out by the Center for Spoken Language Understanding by FedEx within 5-10 business days.
For more information, visit the CSLU Corpora website at:
http://www.cslu.ogi.edu/corpora/corpCurrent.html
For more information, contact:
Michele Gunness
Senior Technology Development Manager
503-494-4184
