CORPORA from CSLU: Spelled and spoken words v1.2
OHSU # 0681-O
The Spelled and Spoken Words corpus contains telephone recordings of spelled and spoken words. A total of 3647 callers were prompted to say and spell their first and last names, to say what city they grew up in and what city they were calling from, and to answer two yes/no questions. To collect sufficient instances of each letter, about 1000 of the callers also recited the English alphabet with pauses between the letters. Each call was transcribed by two people, and all differences were resolved. In addition, a subset of the calls has been phonetically labeled.
Each subject called the CSLU data collection system by dialing a toll-free number. An analog telephone line was connected to a Gradient Technologies box, which recorded the data from incoming calls. The sampling rate was 8 kHz, and the files were stored in 16-bit linear format on a UNIX file system. Each utterance was recorded as a separate file.
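The recording parameters above (8 kHz sampling, 16-bit linear samples) imply 16,000 bytes of audio per second. The following is a minimal sketch of how such data could be decoded and timed. It assumes headerless sample data; the container format and byte order of the distributed files are not stated here, so the byte order is left as a parameter. The function names decode_linear16 and duration_seconds are hypothetical and are not part of any CSLU tool.

```python
import struct

SAMPLE_RATE_HZ = 8000   # sampling rate stated in the corpus description
BYTES_PER_SAMPLE = 2    # 16-bit linear samples


def decode_linear16(raw: bytes, big_endian: bool = False) -> list:
    """Decode headerless 16-bit linear PCM bytes into signed integers.

    Assumption: `raw` holds only sample data (no file header). The byte
    order of the actual corpus files is not documented here, so the
    caller must choose it.
    """
    if len(raw) % BYTES_PER_SAMPLE != 0:
        raise ValueError("byte count is not a multiple of the sample size")
    count = len(raw) // BYTES_PER_SAMPLE
    # '<' / '>' select little- / big-endian with standard 2-byte 'h' values.
    fmt = (">" if big_endian else "<") + f"{count}h"
    return list(struct.unpack(fmt, raw))


def duration_seconds(num_samples: int) -> float:
    """Length of an utterance in seconds at the corpus sampling rate."""
    return num_samples / SAMPLE_RATE_HZ


if __name__ == "__main__":
    # Synthetic stand-in for one utterance: one second of silence.
    raw = bytes(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)
    samples = decode_linear16(raw)
    print(len(samples), "samples,", duration_seconds(len(samples)), "s")
```

If the distributed files carry a header (for example a WAV or NIST SPHERE header), the header would need to be parsed or skipped before the sample data is decoded this way.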
A press release describing our research project and the need for volunteers produced newspaper, radio and television coverage. In addition, we posted requests for callers on several university bulletin boards and national computer newsgroups.
Each file in the corpus was listened to and transcribed by two transcribers. Any differences between the two transcribers' transcriptions were examined and resolved.
Some of the utterances were phonetically transcribed using a TIMIT-like phonetic alphabet. The transcription followed conventions that provided the groundwork for the more elaborate conventions described in The CSLU Labeling Guide.
Cole, R. A., M. Fanty and K. Roginski, "A Telephone Speech Database of Spelled and Spoken Names", Proceedings of the International Conference on Spoken Language Processing, Banff, Alberta, Oct. 12-16, 1992, pp. 891-893.
The Center for Spoken Language Understanding (CSLU) distributes corpora to commercial entities and academic institutions for a fee. Commercial entities may use these corpora for research and also for creating commercial products, for example by generating acoustic models for speech recognition.
To place your order:
1. Click on the type of license you wish to order: Academic or non-profit entity or Commercial entity.
2. Terms of the license agreement can be viewed by clicking on the word "terms".
3. You agree to the terms of the license agreement when you click on "Add to Order" and proceed to the next screen.
4. If information on the "Order Contents" screen is correct, press "Check out".
5. On the next screen, a brief "Intended Use" is required. For "Recipient Scientist Information", enter your own information or, if you are placing the order for another person, that person's information. We will use this information should we have questions about the order, payment or shipping address.
6. Once your payment has been received and verified by OHSU, your order will be approved by Technology Transfer & Business Development. The Center for Spoken Language Understanding will then send the DVD by FedEx within 5-10 business days.
For demos and more information, visit the CSLU Corpora website at:
- CSLU, SOM CSLU
For more information, contact:
Technology Development Manager