Software referees group calls

By Kimberly Patch, Technology Research News

Although teleconferencing has been around for decades, it suffers from logistical limitations. Even with only a handful of participants, it is difficult to tell who is talking, or to choose the right time to chime in. In an increasingly wired world, the lack of a flexible audio interface is becoming more of a problem.

Researchers from Palo Alto Research Center, Inc. (PARC), Stanford University, and Carnegie Mellon University have devised a scheme that gives a group of wireless phone or handheld computer users a more natural teleconferencing environment by keeping track of who is talking when.

The scheme uses the moment-by-moment dynamics of talk to determine which members of a group are actively conversing with each other, and adjusts the audio accordingly, said Paul Aoki, a researcher at Palo Alto Research Center. This makes the teleconferencing space more like a room where people can have side conversations while keeping tabs on the group as a whole.

For instance, in a group of four people -- Alice, Bob, Charlie and Diane -- Alice and Bob might start gossiping about a friend who Charlie and Diane don't know and aren't interested in, and at the same time Charlie and Diane might start a conversation about where they're all going to meet for dinner, said Aoki.

The system will notice these groupings and will make Charlie and Diane quieter from Alice's and Bob's perspectives, and Alice and Bob quieter from Charlie's and Diane's perspectives, Aoki said. But when Bob overhears Diane suggest a restaurant he doesn't like and he starts talking to Charlie and Diane, the system will respond by adjusting the audio so that they're all at the same normal volume level again.
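The behavior Aoki describes can be pictured as a per-listener mixer: speakers in the listener's current conversational group play at full volume, while everyone else is merely attenuated, not muted. The following is a minimal sketch of that idea; the function name, the attenuation factor, and the data structures are illustrative assumptions, not the researchers' actual implementation.

```python
# Hypothetical per-listener mixer: speakers outside a listener's
# conversational group are attenuated rather than muted, so side
# conversations stay audible in the background.

ATTENUATED = 0.3  # assumed background gain; 1.0 = normal volume

def gains_for(listener, groups, participants):
    """Return a volume gain for each speaker from one listener's perspective."""
    # Find the listener's current group (default: everyone together).
    group = next((g for g in groups if listener in g), set(participants))
    return {p: (1.0 if p in group else ATTENUATED)
            for p in participants if p != listener}

participants = ["Alice", "Bob", "Charlie", "Diane"]

# Two side conversations in progress:
groups = [{"Alice", "Bob"}, {"Charlie", "Diane"}]
print(gains_for("Alice", groups, participants))

# Bob joins Charlie and Diane's conversation:
groups = [{"Bob", "Charlie", "Diane"}]
print(gains_for("Bob", groups, participants))
```

In this sketch, when Alice and Bob are paired off, Charlie and Diane drop to the background gain from Alice's perspective; once the system regroups Bob with Charlie and Diane, they return to full volume for him and Alice becomes the background voice.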

"It's useful to adjust the audio in a remote conversation like this so that you can more easily understand the people with whom you are talking," said Aoki. "It's as if you're at a party, and when you start talking with someone next to you, all of the other people nearby very politely move away to make it easier for you to have your conversation," he said.

The researchers tested their prototype on two groups of four people between the ages of 20 and 40. The groups' conversations included regular teleconferencing and teleconferencing using the researchers' automatic sound adjustment software.

Each group of four split into prearranged pairs and played a party game that involved answering questions intended to provoke conversation. When both pairs had finished their questions, the four participants formed opposite pairs and repeated the question-and-answer process.

The participants were easily able to work with the adjusting audio, said Aoki. "We didn't... tell them what the system was supposed to do... they were able to figure it out on their own."

The researchers observed a large difference between speech patterns in the regular teleconference and in the adjusted audio space, said Aoki. In the regular teleconference, the paired conversations got in each other's way, he said. The pairs "kept starting and stopping their speech in response to the other pair's bursts of talk... like what you do when you're saying something and a train goes by in the middle of your sentence."

When the shared audio space made the correct choice about who was talking to whom, "the pairs could talk as they normally would if they were face-to-face," said Aoki. The conversation was natural enough that when one pair was waiting for the other to finish, "they would often just kill time by talking about topics like sports," he said.

The researchers tapped a sociological discipline -- conversation analysis -- to find ways to automatically tell who is talking to whom.

"Talk isn't just words in a certain order, but is the result of some pretty complicated social practices that we don't generally realize that were using," said Aoki. For instance, people generally take turns talking, and during turn-taking very little time passes from the moment one person stops and another starts.

Conversation analysts review examples of human interaction in order to understand how these practices work.

The researchers quantified speech patterns, identified by conversation analysts, that generally show whether people are in conversation, and built software that determines which grouping of people the evidence best supports. "The techniques we use are simple enough that all of the processing easily runs in real-time on a desktop PC," he said.
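One way to picture this kind of evidence-weighing is to score each candidate pairing by how much clean turn-taking it contains: one speaker stopping and another in the same group promptly starting counts as evidence they are conversing. The sketch below illustrates that idea for four participants; the event format, the gap threshold, and the scoring rule are simplified assumptions for illustration, not the researchers' actual algorithm.

```python
MAX_GAP = 1.0  # assumed: handoffs longer than this don't count as turn-taking

def turn_score(events, group):
    """Count quick speaker handoffs within a group as evidence of conversation."""
    turns = sorted((e for e in events if e[0] in group), key=lambda e: e[1])
    score = 0
    for (s1, _, end1), (s2, start2, _) in zip(turns, turns[1:]):
        gap = start2 - end1
        if s1 != s2 and 0 <= gap <= MAX_GAP:
            score += 1  # one speaker stops, another promptly starts
    return score

def best_pairing(events, people):
    """Pick the two-pair grouping of four participants best supported by the evidence."""
    a, rest = people[0], people[1:]
    candidates = [[{a, b}, set(rest) - {b}] for b in rest]
    return max(candidates,
               key=lambda pairing: sum(turn_score(events, g) for g in pairing))

# (speaker, start_time, end_time) for each stretch of talk, in seconds
events = [("Alice", 0.0, 2.0), ("Bob", 2.2, 4.0), ("Alice", 4.3, 6.0),
          ("Charlie", 0.5, 3.0), ("Diane", 3.1, 5.0), ("Charlie", 5.2, 7.0)]
print(best_pairing(events, ["Alice", "Bob", "Charlie", "Diane"]))
```

Here Alice and Bob hand the floor back and forth with sub-second gaps, as do Charlie and Diane, while cross-pair transitions either overlap or leave long silences, so the Alice-Bob / Charlie-Diane pairing wins.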

The method is better suited to mobile phones than approaches that aim to make audio more intuitive by using stereo effects to trick the human brain into thinking sounds are coming from specific directions in space, said Aoki. Because the PARC method does not require stereo, it could be used in today's mobile phones fairly easily, he said. In addition, changing sound level rather than direction makes it less likely that users will be confused about whether a sound is coming from the phone or from the real world, he said.

The researchers' current prototype correctly determines conversational groupings between 68 and 86 percent of the time, according to Aoki.

The system sidesteps a pair of potential pitfalls of shared audio spaces, according to Aoki. "You don't want to just turn off the other people since what they're talking about may become relevant to you very quickly," he said.

And reconfiguring the conversation by punching in a code on a handset is cumbersome, Aoki said. Manual controls are useful too, but "it helps if the system can do some things automatically so that the conversation can move along smoothly," he said.

The researchers are working to improve the system's accuracy, said Aoki. The method could be ready for use in practical applications in three to five years, he said.

Aoki's research colleagues were Matthew Romain of Stanford University, Margaret H. Szymanski, James D. Thornton and Allison Woodruff of Palo Alto Research Center, and Daniel Wilson of Carnegie Mellon University. The researchers presented the work at the Association for Computing Machinery Computer-Human Interaction (ACM CHI) conference in Fort Lauderdale, Florida, April 5-10, 2003. The research was funded by PARC.

Timeline:   3-5 years
Funding:   Corporate
TRN Categories:  Applied Technology
Story Type:   News
Related Elements:  Technical paper, "The Mad Hatter's Cocktail Party: a Social Mobile Audio Space Supporting Multiple Simultaneous Conversations," presented at the Association for Computing Machinery Computer-Human Interaction (ACM CHI) conference in Fort Lauderdale, Florida, April 5-10, 2003


June 18/25, 2003


© Copyright Technology Research News, LLC 2000-2006. All rights reserved.