Fake Voice Recordings Easy to Make, Hard to Detect
04/09/03 Portland, Ore. Investigators at OHSU's OGI School of Science & Engineering say many scientists can transform one person's voice into another's
Are the audiotapes periodically released by Osama bin Laden real or fake? Their poor audio quality and the increasing availability of reliable voice transformation technologies certainly mean there is a chance they're fake, said an Oregon scientist.
At the OGI School's Center for Spoken Language Understanding, Alexander Kain, Ph.D., one of van Santen's researchers, developed a new method for voice transformation. The method mimics the fine acoustic detail that reflects the unique characteristics of an individual's oral cavity and vocal cords.
"The important point isn't that we know how to do this, but that many scientists already have similar transformation methods and that these methods are simple to implement and readily available to anyone in the literature and on the Internet," noted van Santen.
To mimic someone's voice effectively, you need an original recording of the person you want to imitate. You then need to find and record someone reading exactly the same text in a similar voice, word set and dialect.
"The actor that is used needs to do a reasonably good job of imitating the original speaker's dialect, melody and rhythm," said Kain. "Our system, for example, is then 'trained' on these two parallel recordings and 'learns' how to transform new speech into speech that sounds like someone else's. Once these elements are in place, all the actor needs to do is say new text with the original speaker's melody and rhythm, and the system transforms these recordings into speech that will have the same voice characteristics."
Said van Santen, "This is just one example of a voice transformation method that might be used. Many scientists have similar methods, so it is becoming increasingly difficult to detect whether a recorded voice is real or fake.
"We think we know how to detect whether our voice transformation method was used, but each method leaves behind different cues, so it may soon become completely impossible to detect whether a tape is an original, particularly if there is poor audio quality and a good voice transformation system has been used."
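To make Kain's description of training on two parallel recordings more concrete, the sketch below shows the general train-then-transform idea in its simplest possible form. It is not the OGI method, whose details are not given here. It assumes each recording has already been reduced to a list of time-aligned numeric feature vectors, and it fits a separate straight-line mapping for each feature by least squares. The function names (fit_affine, transform) and the toy numbers are hypothetical and chosen only for illustration.

```python
# Illustrative only: a per-feature straight-line (affine) mapping fitted by
# least squares. This is NOT the OGI system; names and data are hypothetical.

def fit_affine(source_frames, target_frames):
    """Fit y = a*x + b independently for each feature dimension.

    Both arguments are equal-length lists of time-aligned feature vectors,
    one taken from the actor's recording and one from the original speaker's.
    """
    if len(source_frames) != len(target_frames) or not source_frames:
        raise ValueError("need the same non-zero number of aligned frames")
    n = len(source_frames)
    dims = len(source_frames[0])
    params = []
    for d in range(dims):
        xs = [frame[d] for frame in source_frames]
        ys = [frame[d] for frame in target_frames]
        mean_x = sum(xs) / n
        mean_y = sum(ys) / n
        var_x = sum((x - mean_x) ** 2 for x in xs)
        cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        # If a feature never varies, fall back to a pure offset.
        a = cov_xy / var_x if var_x > 0 else 1.0
        b = mean_y - a * mean_x
        params.append((a, b))
    return params


def transform(frames, params):
    """Apply the learned per-feature mapping to new frames from the actor."""
    return [[a * x + b for x, (a, b) in zip(frame, params)] for frame in frames]


# Toy "parallel recordings": two features per frame, already time-aligned.
actor = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
target = [[2.5, 7.0], [4.5, 12.0], [6.5, 17.0], [8.5, 22.0]]

mapping = fit_affine(actor, target)      # "training" on the parallel data
new_actor_speech = [[5.0, 50.0]]         # the actor says something new
converted = transform(new_actor_speech, mapping)
print(converted)
```

A real system would work on acoustic features extracted from the audio and would first have to align the two recordings in time, which is one reason the actor is asked to match the original speaker's melody and rhythm.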
The Center for Spoken Language Understanding has five full-time faculty, four postdocs, a dozen graduate students and additional programming staff who focus on speech technologies for a wide range of applications, especially in education and health. Speech technology could someday be used to help people learn to read, to help non-native speakers learn English, and to give autistic people more ways to communicate. For more information, visit http://cslu.cse.ogi.edu/.
The OGI School of Science & Engineering (formerly the Oregon Graduate Institute of Science & Technology) became one of four schools of the Oregon Health & Science University in 2001. The OHSU School of Science & Engineering has more than 100 full-time and adjunct faculty, and more than 300 master's and doctoral students seeking degrees in five academic departments. In addition, 300 students are taking credit courses but are not seeking degrees at this time. Each year, the school's Center for Professional Development enrolls more than 1,000 working professionals who take not-for-credit classes.