Quantum math models speech

By Eric Smalley, Technology Research News

It is easy to tell whether a voice heard over the phone is that of a person or a computer. This is a good indication that scientists still don't fully understand how the human voice works.

Researchers at King's College London and Phonologica Ltd. are using mathematical tools from quantum physics to address the problem. They have found that the vocal tract shapes sound waves in a more complicated way than the conventional wisdom, which is based on science more than a century old, holds.

The researchers' concise model of the physics of speech could play a significant role in improving telecommunications, speech recognition and speech synthesis technologies.

The researchers modeled the way the frequencies that make up sound waves spread out when the waves encounter the dents and bumps that appear in the human vocal tract during speech. Although wave dispersion is widely studied in optics, in part because it degrades optical communications, the standard model of vocal acoustics does not take wave dispersion into account.

The researchers' model can be understood in terms of sound waves in a straight pipe, like an organ pipe, said Barbara Forbes, founder of Phonologica Ltd. and a visiting research associate in physics at King's College London. "We find that the natural resonance frequencies of a straight pipe can be shifted up or down in a precise and controllable way by the introduction of [dents and bumps] at particular places," she said.
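The general idea that a local change in a pipe's cross-section shifts its resonances can be sketched numerically. The example below is not the researchers' model: it uses the textbook lossless concatenated-tube approximation, with a pipe closed at one end (the glottis) and open at the other (the lips). The pipe length, speed of sound, relative areas and all function names are assumptions chosen for illustration.

```python
import math

SPEED_OF_SOUND = 350.0  # m/s, assumed value for warm, humid air


def closed_end_term(freq, areas, seg_len):
    """Return the chain-matrix element that must vanish at a resonance.

    The pipe is a chain of equal-length lossless sections, listed from the
    closed (glottis) end to the open (lips) end. Each section's matrix is
    [[cos, j*sin/A], [j*A*sin, cos]]. The imaginary unit is folded into the
    minus signs of the product, and the constant rho*c is set to 1 because
    it cancels in the diagonal terms.
    """
    k = 2.0 * math.pi * freq / SPEED_OF_SOUND
    cs, sn = math.cos(k * seg_len), math.sin(k * seg_len)
    a, b, c, d = 1.0, 0.0, 0.0, 1.0  # start from the identity matrix
    for area in areas:
        a2, b2, c2, d2 = cs, sn / area, sn * area, cs
        a, b, c, d = (a * a2 - b * c2, a * b2 + b * d2,
                      c * a2 + d * c2, d * d2 - c * b2)
    return d


def resonances(areas, length, f_max=4000.0, step=5.0):
    """Scan for sign changes of closed_end_term and refine them by bisection."""
    seg_len = length / len(areas)
    found = []
    f_lo = step
    g_lo = closed_end_term(f_lo, areas, seg_len)
    while f_lo < f_max:
        f_hi = f_lo + step
        g_hi = closed_end_term(f_hi, areas, seg_len)
        if g_lo * g_hi < 0.0:
            lo, hi, g_left = f_lo, f_hi, g_lo
            for _ in range(40):
                mid = 0.5 * (lo + hi)
                g_mid = closed_end_term(mid, areas, seg_len)
                if g_left * g_mid <= 0.0:
                    hi = mid
                else:
                    lo, g_left = mid, g_mid
            found.append(0.5 * (lo + hi))
        f_lo, g_lo = f_hi, g_hi
    return found


LENGTH = 0.17  # m, assumed pipe length, roughly an adult vocal tract
N = 20         # number of sections

uniform = resonances([1.0] * N, LENGTH)
# Hypothetical "dents": halve the area of two sections at one end or the other.
narrow_lips = resonances([1.0] * (N - 2) + [0.5] * 2, LENGTH)
narrow_glottis = resonances([0.5] * 2 + [1.0] * (N - 2), LENGTH)

for name, freqs in [("uniform", uniform),
                    ("narrow at lips", narrow_lips),
                    ("narrow at glottis", narrow_glottis)]:
    print(name, [round(f) for f in freqs])
```

With these assumed values, the uniform pipe's lowest resonance sits near 515 Hz; narrowing the lip end lowers it and narrowing the glottis end raises it. Moving the narrowed sections elsewhere along the pipe changes how each higher resonance responds.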

The traditional view of sound wave physics holds that the maximum shift in resonance occurs at the point where a sound wave pressure node, or point of minimum pressure, meets a change in the shape of the pipe wall. The researchers' results, however, showed that the wave does not spread out at that point. Instead, they found that complex effects near the pressure node are responsible for the shifts, said Forbes.

The researchers are able to shift multiple resonance frequencies independently, which is a key aspect of how humans produce speech. They found that specific degrees of change in curvature at only six places in the vocal tract are sufficient to reproduce 30 vowel sounds, said Forbes. "[This is] enough to describe the basic systems of all the world's languages," she said.
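The role of curvature can be made concrete with the standard change of variables that turns the one-dimensional horn equation of acoustics into a Klein-Gordon form, the equation named in the title of the researchers' paper. The notation below is an assumption drawn from the general acoustics literature rather than from this article: p is the sound pressure, S(x) the cross-sectional area at position x along the tract, and k the wavenumber.

```latex
% Webster horn equation for pressure p(x) in a duct of area S(x)
\frac{1}{S}\frac{d}{dx}\!\left(S\,\frac{dp}{dx}\right) + k^{2}p = 0
% Substituting \psi = p\sqrt{S} removes the first-derivative term:
\frac{d^{2}\psi}{dx^{2}} + \bigl(k^{2} - U(x)\bigr)\,\psi = 0,
\qquad
U(x) = \frac{1}{\sqrt{S}}\,\frac{d^{2}\sqrt{S}}{dx^{2}}
```

The second equation has the same shape as the time-independent Schrödinger equation, with U(x) acting as a potential. Because the square root of S is proportional to the tract radius, U(x) depends on the curvature of the wall profile, which is zero for a straight pipe.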

The model is a step toward a simple method of analyzing and reproducing speech. Keeping things simple is key to advancing speech-related technologies. "The search for a minimal number of parameters to describe speech acoustics and the speech signal has been going on since the 1950s," said Forbes.

Having a small number of parameters accurately represent speech makes it possible to compress the relatively large amount of acoustical information that makes up speech into a much smaller amount of digital information, making it easier to transmit and store. "Mapping the full-bandwidth speech signal onto a sparse representation or code is necessary for ultra-low-bit-rate technologies such as mobile telephony," she said.
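A rough calculation shows why a small parameter set matters for bit rate. The sketch below compares standard telephone PCM (8 kHz sampling, 8 bits per sample) with a purely hypothetical parametric code. The count of six parameters echoes the six vocal-tract locations mentioned above; the 8 bits per parameter and 50 frames per second are assumptions, and a real codec would also need to carry pitch, voicing and loudness information.

```python
def bit_rate(params_per_frame, bits_per_param, frames_per_second):
    """Bits per second needed to send one parameter vector per frame."""
    return params_per_frame * bits_per_param * frames_per_second


# Standard telephone PCM: 8,000 samples per second, 8 bits per sample.
pcm_rate = 8000 * 8

# Hypothetical parametric code: all three values are illustrative assumptions.
param_rate = bit_rate(params_per_frame=6, bits_per_param=8, frames_per_second=50)

print("PCM bits per second:", pcm_rate)
print("parametric bits per second:", param_rate)
print("ratio:", pcm_rate / param_rate)
```

Under these assumptions the parametric stream needs a small fraction of the PCM bit rate, which is the sense in which a sparse representation makes speech easier to transmit and store.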

The model could be used in new approaches to speech recognition. "Current systems work by statistical modeling alone, and make no use of knowledge about either vocal tract physics or linguistics," said Forbes. Current systems use statistical probabilities to match sound wave patterns to phonemes. "This is why they have such problems in adapting to natural human... speech in normal levels of background noise," she said. "Our system is based on [a] parameterization of vocal tract physics, and we believe this will eventually lead to a more natural speech interface."

Speech synthesis has improved in recent years but still has a long way to go to produce natural-sounding voices. The researchers have used their model to generate vowel sounds, but the results are preliminary. "Really natural speech synthesis will require incorporation of finer physiological detail than we are currently considering," said Forbes. "For example, our current simulations assume a rather simple model of excitation at the larynx," she said.

The researchers are extending their wave-mechanical model to consonant sounds, and are using quantum mathematics to determine the parameters of speech acoustics, said Forbes.

The researchers are aiming to have a prototype recognition system ready for demonstration within two years, said Forbes. "The modeling of connected speech processes will take a bit longer, say around 5 years," she said.

Forbes' research colleague was E. Roy Pike of King's College London. They published the research in the July 30, 2004 issue of Physical Review Letters. The research was funded by the UK Engineering and Physical Sciences Research Council and IP2IPO PLC.

Timeline:   2, 5 years
Funding:   Government; Private
TRN Categories:  Human-Computer Interaction; Physics
Story Type:   News
Related Elements:  Technical paper, "Acoustical Klein-Gordon Equation: A Time-Independent Perturbation Analysis," Physical Review Letters, July 30, 2004


October 6/13, 2004


© Copyright Technology Research News, LLC 2000-2006. All rights reserved.