Common sense boosts speech software

By Eric Smalley, Technology Research News

There's nothing like losing an ability you take for granted to inspire creative solutions.

A researcher at the Massachusetts Institute of Technology's Media Lab found that out when a bicycle accident broke both of his hands, leaving him unable to type for a few months.

"I decided it was a good time to learn about speech recognition," said Henry Lieberman. "I realized that the work we were doing in common sense reasoning could help. We had already done an interface for predictive typing and I realized the same principles would apply," he said.

Speech recognition software matches strings of phonemes -- the sounds that make up words -- to words in a vocabulary database. The software finds close matches and presents the best one. It does not understand word meaning, however, which makes it difficult for the software to distinguish among words that sound the same or similar.

The Open Mind Common Sense Project database contains more than 700,000 facts that MIT Media Lab researchers have been collecting from the public since the fall of 2000. The facts capture everyday common sense, like the knowledge that a dog is a type of pet, rather than formal knowledge, like the fact that a dog is a type of mammal.

The researchers used the common sense database to reorder the close matches returned by speech recognition software. In the example "My bike has a squeaky brake", ordinary speech recognition software might have trouble distinguishing between "brake" and "break", but the researchers' system knows that bicycles have brakes, and so makes the correct choice, said Lieberman.
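
The reordering idea can be illustrated with a minimal Python sketch. It is not the researchers' implementation: the tiny fact table, the relation names and the functions `related`, `context_score` and `rerank` are all hypothetical, and the scoring rule (counting context words linked to a candidate by any fact) is an assumption chosen for simplicity.

```python
# Toy common sense facts as (subject, relation, object) triples.
# Hypothetical data; the real Open Mind database is far larger.
FACTS = {
    ("bike", "has", "brake"),
    ("bicycle", "has", "brake"),
    ("dog", "is_a", "pet"),
}


def related(a, b):
    """Return True if any toy fact links the two words, in either direction."""
    return any((s == a and o == b) or (s == b and o == a) for s, _, o in FACTS)


def context_score(candidate, context_words):
    """Count the context words that some fact links to the candidate."""
    return sum(related(candidate, w) for w in context_words)


def rerank(candidates, context_words):
    """Reorder recognizer candidates so the best-supported ones come first.

    sorted() is stable, so candidates with equal scores keep the
    recognizer's original order.
    """
    return sorted(candidates, key=lambda c: -context_score(c, context_words))


# The recognizer (assumed) ranked "break" ahead of "brake".
context = ["my", "bike", "has", "a", "squeaky"]
print(rerank(["break", "brake"], context))  # ['brake', 'break']
```

When no fact connects a candidate to the context, the recognizer's own ordering is left unchanged, so in this sketch the filter only intervenes where the toy database has something to say.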

The researchers evaluated their common sense speech recognition technique by logging the errors and dictation times of users who dictated text that contained topics covered by the Open Mind database. The technique prevented 17 percent of the errors and reduced dictation time by 7.5 percent, said Lieberman.

In addition to reducing errors, the approach improves error correction. When a user indicates an error while dictating using speech recognition software, the software presents a menu of alternatives and the user selects one. The researchers found that users often gave up searching the menu before reaching the end, and so dictated phrases over again even though the correct word or phrase was available at the end of the menu. The common sense filtering makes it more likely that the correct word appears at the top of the list, he said.

Researchers have used other ways of improving the choices speech recognition software makes, including methods that put more emphasis on the most common English words, words that commonly occur together, and the speaker's most recent words, said Lieberman. However, none of these can tell if a word makes sense in a given context, he said.

"One surprising thing about testing interfaces like this is that sometimes, even if they don't get the absolutely correct answer, users like them a lot better," said Lieberman. "This is because they make plausible mistakes, for example 'tennis clay court' for 'tennis player', rather than completely arbitrary mistakes that a statistical recognizer might make, for example 'tennis slayer'," he said.

"This suggests that there ought to be more research into how to get computers to make better mistakes," said Lieberman.

The researchers are working on an improved interface for correcting speech recognition mistakes, said Lieberman. Menu correction takes 10 times as long as saying a single word, so directly inserting one of a few likely correction alternatives chosen using the common sense technique would improve throughput, he said.
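
The throughput argument can be made concrete with a back-of-the-envelope model in Python. Only the 10-to-1 ratio comes from the article. The rest is assumed for illustration: a direct insertion costs about one word-time, a wrong insertion wastes that time before the user falls back to the menu, and the hit rates are made-up inputs rather than measured results.

```python
WORD_COST = 1.0    # time unit: saying a single word
MENU_COST = 10.0   # from the article: menu correction takes about 10x as long


def expected_correction_cost(p_direct_hit):
    """Expected cost of one correction when direct insertion is tried first.

    Assumption: a hit costs one word-time; a miss costs that wasted
    word-time plus a full menu correction.
    """
    if not 0.0 <= p_direct_hit <= 1.0:
        raise ValueError("p_direct_hit must be between 0 and 1")
    miss_cost = WORD_COST + MENU_COST
    return p_direct_hit * WORD_COST + (1.0 - p_direct_hit) * miss_cost


# Hypothetical hit rates, for illustration only.
for p in (0.0, 0.5, 0.9):
    print(f"hit rate {p:.1f}: expected cost {expected_correction_cost(p):.2f} word-times")
```

Under these assumptions, trying a direct insertion first beats going straight to the menu whenever the inserted alternative is right more than about 10 percent of the time.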

The software could be used with today's commercial speech recognition technology, according to Lieberman. "Certainly with a year or so of development work, people could see substantial improvements," he said.

Lieberman's research colleagues were Alexander Faaborg, Waseem Dahera and José Espinoza. They presented the research at the Intelligent User Interfaces Conference (IUI 2005), held in San Diego, January 9 to 12, 2005. The research was funded by the MIT Media Lab's corporate and government sponsors.

Timeline:   Now
Funding:   Corporate, Government
TRN Categories:  Human-Computer Interaction
Story Type:   News
Related Elements:  Technical paper, "How to Wreck a Nice Beach You Sing Calm Incense," Intelligent User Interfaces Conference (IUI 2005), San Diego, January 9-12, 2005


March 23/30, 2005

© Copyright Technology Research News, LLC 2000-2006. All rights reserved.