queries bridge search and speech
Technology Research News
After 30-odd years of computer speech recognition
development, researchers are still looking for ways to make it easier
for a computer to sift individual words from the constant stream of syllables
that is spoken language.
What is most difficult for speech software is recognizing the borders
of a word that's not in its dictionary. Researchers from the Japanese
University of Library and Information Science and the Japanese National
Institute of Advanced Industrial Science and Technology have made this
a little easier.
The researchers have found a way to help speech recognition programs identify
out-of-vocabulary sequences of syllables when the programs are used to retrieve
information from data collections like the Web. In a sense, the researchers
have given computers a faster way to sound out words they don't already know.
State-of-the-art information retrieval systems allow users to input any
number of keywords into a vocabulary. "It is often the case that a couple
million terms are indexed for a single information retrieval system,"
said Atsushi Fujii, a research assistant at the University of Library and
Information Science in Japan.
State-of-the-art speech recognition systems, however, have to limit vocabulary
size to a few tens of thousands of words in order to match each syllable sequence
to a word in real time.
Because of this limit, when speech is used to query information retrieval
systems, some of the query words may not be in the speech recognition vocabulary.
The trick to finding these words is knowing where to look. When someone
uses speech recognition as an interface to search a collection of data,
he or she naturally utters words related to the unrecognized query term, said Fujii.
To take advantage of this, the system carries out the query using the
words the computer does recognize, then looks in those documents for words
that are phonetically identical or similar to the unrecognized syllable
sequences. The system then queries the documents again using the new-found
words. This two-step process makes it possible for the computer to match
an unrecognized syllable sequence to a real word relatively quickly, according
to Fujii.
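The two-step process can be illustrated with a minimal sketch. This is not the researchers' implementation: the function names, the toy documents, the term-overlap scoring and the 0.7 threshold are all assumptions made for illustration, and plain string similarity from Python's difflib stands in for the phonetic comparison the article describes.

```python
from difflib import SequenceMatcher


def retrieve(documents, terms):
    """Rank document ids by how many distinct query terms each contains."""
    scored = []
    for doc_id, text in enumerate(documents):
        words = set(text.lower().split())
        score = sum(1 for term in set(terms) if term in words)
        if score > 0:
            scored.append((-score, doc_id))
    # Highest score first; ties fall back to document order.
    return [doc_id for _, doc_id in sorted(scored)]


def recover_word(documents, doc_ids, unknown, known_terms, threshold=0.7):
    """Find the word in the retrieved documents closest to the unknown sequence.

    Assumption: string similarity is a stand-in for phonetic similarity.
    """
    best_word, best_score = None, 0.0
    for doc_id in doc_ids:
        for word in documents[doc_id].lower().split():
            if word in known_terms:
                continue
            score = SequenceMatcher(None, unknown, word).ratio()
            if score > best_score:
                best_word, best_score = word, score
    return best_word if best_score >= threshold else None


def two_step_search(documents, recognized, unknown):
    """Query with the recognized words, recover the unknown word, query again."""
    first_pass = retrieve(documents, recognized)
    word = recover_word(documents, first_pass, unknown, recognized)
    terms = recognized + [word] if word else recognized
    return word, retrieve(documents, terms)


# Hypothetical toy collection and query.
documents = [
    "tokyo consumer price index rises",
    "nikkei index falls in tokyo trading",
    "osaka baseball game results",
]
recognized = ["tokyo", "index"]  # words the recognizer already knows
unknown = "nikei"                # rough rendering of the unrecognized syllables

word, ranked = two_step_search(documents, recognized, unknown)
print(word, ranked)  # prints: nikkei [1, 0]
```

In this toy run the first pass ties the two Tokyo documents, and the recovered word moves the document that contains it to the top in the second pass. The candidate words come only from documents the first pass retrieved, which keeps the comparison far smaller than a search over the full index vocabulary.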
The researchers tested their method by dictating queries to archives of
newspaper articles. The method improved the information retrieval system's
accuracy and did not increase the search time, according to Fujii.
The researchers also used their data retrieval method to beef up a speech
recognition system's vocabulary with appropriate new words. "We used a
target collection to recover speech recognition errors so as to improve
the quality of [both] speech recognition and information retrieval," Fujii
said.
The method is a way to improve speech-driven information retrieval systems,
which could lead to interactive dialogue and question-answering systems
that allow users to control computers by speech, according to Fujii. Possible
applications include car navigation systems and Web search using telephones
and mobile computers, he said.
The researchers have come up with a "clever trick" for turning sequences
of syllables that are not in a speech recognizer's vocabulary into words,
said Brian Roark, a senior technical staff member at AT&T Research. "This
takes a step toward solving the problem of turning... syllable sequences
into [correctly spelled] words," he said.
The method is potentially useful for speech recognition in general, Roark
said. "If you can somehow leverage a particular task to give an indication
of likely [out-of-vocabulary] words in a particular context, it might
be possible to exploit this," he said.
But because large vocabulary recognition programs don't come across a
lot of out-of-vocabulary sequences, the total possible gain in recognition
from this method would probably be fairly small, Roark added.
The researchers' next step is to do larger-scale experiments using different
types of document collections, such as technical papers and Web pages,
Fujii said.
The researchers' current experiments use Japanese speech that is dictated
directly to the computer, said Fujii. Ultimately, the researchers are
aiming to be able to process spontaneous speech in different languages,
he said.
Practical applications using dictated speech are technically possible
within two years, said Fujii. Applications that can handle spontaneous
speech will take more than three years, he added.
Fujii's research colleagues were Katunobu Itou of the National Institute
of Advanced Industrial Science and Technology in Japan, and Tetsuya Ishikawa
of the University of Library and Information Science. The research was
funded by the University of Library and Information Science and the Japan
Science and Technology Corporation (JST).
Timeline: < 2 years, > 3 years
Funding: University, Corporate
TRN Categories: Databases and Information Retrieval; Human-Computer
Story Type: News
Related Elements: Technical paper, "A Method for Open-Vocabulary
Speech-Driven Text Retrieval," posted in the arXiv physics archive at