Software referees group calls
Technology Research News
Although teleconferencing has been around
for decades, it suffers from logistical limitations. Even with only a
handful of participants, it is difficult to tell who is talking, or to
choose the right time to chime in. In an increasingly wired world, the
lack of a flexible audio interface is becoming more of a problem.
Researchers from Palo Alto Research Center, Inc. (PARC), Stanford
University, and Carnegie Mellon University have devised a scheme that
gives a group of wireless phone or handheld computer users a more natural
teleconferencing environment by keeping track of who is talking when.
The scheme uses the moment-by-moment dynamics of talk to determine
which members of a group are actively conversing with each other, and
adjusts the audio accordingly, said Paul Aoki, a researcher at Palo Alto
Research Center. This makes the teleconferencing space more like a room
where people can have side conversations while keeping tabs on the group
as a whole.
For instance, in a group of four people -- Alice, Bob, Charlie
and Diane -- Alice and Bob might start gossiping about a friend who Charlie
and Diane don't know and aren't interested in, and at the same time Charlie
and Diane might start a conversation about where they're all going to
meet for dinner, said Aoki.
The system will notice these groupings and will make Charlie and
Diane quieter from Alice's and Bob's perspectives, and Alice and Bob quieter
from Charlie's and Diane's perspectives, Aoki said. But when Bob overhears
Diane suggest a restaurant he doesn't like and he starts talking to Charlie
and Diane, the system will respond by adjusting the audio so that they're
all at the same normal volume level again.
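The adjustment Aoki describes can be pictured as a per-listener gain table. The sketch below is purely illustrative -- the function name and gain values are assumptions, not details of the PARC prototype: once each participant has been assigned to a conversational grouping, a listener hears same-group speakers at full volume and other groups attenuated rather than silenced.

```python
def mixing_gains(participants, floors, same_floor_gain=1.0, other_floor_gain=0.3):
    """Illustrative per-listener gains. `floors` maps each participant
    to the conversational group (floor) they are currently in."""
    gains = {}
    for listener in participants:
        for speaker in participants:
            if speaker == listener:
                continue
            same = floors[speaker] == floors[listener]
            gains[(listener, speaker)] = same_floor_gain if same else other_floor_gain
    return gains

people = ["Alice", "Bob", "Charlie", "Diane"]
floors = {"Alice": 0, "Bob": 0, "Charlie": 1, "Diane": 1}
g = mixing_gains(people, floors)
print(g[("Alice", "Bob")])    # 1.0 -- same conversation, full volume
print(g[("Alice", "Diane")])  # 0.3 -- side conversation, quieter but audible
```

Keeping the other pair audible at a reduced level, rather than muting it, is what lets Bob overhear Diane's restaurant suggestion and rejoin the larger conversation.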
"It's useful to adjust the audio in a remote conversation like
this so that you can more easily understand the people with whom you are
talking," said Aoki. "It's as if you're at a party, and when you start
talking with someone next to you, all of the other people nearby very
politely move away to make it easier for you to have your conversation."
The researchers tested their prototype on two groups of four people
between the ages of 20 and 40. The groups' conversations included regular
teleconferencing and teleconferencing using the researchers' automatic
sound adjustment software.
Each group of four split into prearranged pairs and played a party
game that involved answering questions intended to provoke conversation.
When both pairs had finished their questions, the four participants formed
opposite pairs and repeated the question-and-answer process.
The participants were easily able to work with the adjusting audio,
said Aoki. "We didn't... tell them what the system was supposed to do...
they were able to figure it out on their own."
The researchers observed a large difference between speech patterns
in the regular teleconference and in the adjusted audio space, said Aoki.
In the regular teleconference, the paired conversations got in each other's
way, he said. The pairs "kept starting and stopping their speech in response
to the other pair's bursts of talk... like what you do when you're saying
something and a train goes by in the middle of your sentence."
When the shared audio space made the correct choice about who
was talking to whom, "the pairs could talk as they normally would if they
were face-to-face," said Aoki. The conversation was natural enough that
when one pair was waiting for the other to finish, "they would often just
kill time by talking about topics like sports," he said.
The researchers tapped a sociological discipline -- conversation
analysis -- to find ways to automatically tell who is talking to whom.
"Talk isn't just words in a certain order, but is the result of
some pretty complicated social practices that we don't generally realize
that we're using," said Aoki. For instance, people generally take turns
talking, and during turn-taking very little time passes from the moment
one person stops and another starts.
Conversation analysts review examples of human interaction in
order to understand how these practices work.
The researchers quantified speech patterns gleaned by conversation
analysts that generally show whether or not people are in conversation,
and built software that determines what grouping of people is supported
by the best evidence. "The techniques we use are simple enough that all
of the processing easily runs in real-time on a desktop PC," he said.
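One way to picture this kind of evidence-weighing is sketched below. This toy example, with assumed names and thresholds, is not the researchers' actual algorithm: it enumerates the possible pairings of four participants, scores each pairing by how often its members hand the turn off to each other with only a brief gap -- the turn-taking cue described above -- and picks the grouping with the best evidence.

```python
def pairings(people):
    """All three ways to split four people into two pairs."""
    first = people[0]
    for partner in people[1:]:
        rest = tuple(p for p in people[1:] if p != partner)
        yield [(first, partner), rest]

def pair_score(p, q, turns):
    """Count quick hand-offs between p and q: one starts talking
    less than half a second after the other stops."""
    score = 0
    for speaker1, _, end1 in turns:
        for speaker2, start2, _ in turns:
            if {speaker1, speaker2} == {p, q} and 0 <= start2 - end1 < 0.5:
                score += 1
    return score

def best_grouping(people, turns):
    """Pick the pairing best supported by turn-taking evidence."""
    return max(pairings(people),
               key=lambda grouping: sum(pair_score(p, q, turns)
                                        for p, q in grouping))

# Simulated turns: (speaker, start_time, end_time) in seconds.
people = ["Alice", "Bob", "Charlie", "Diane"]
turns = [("Alice", 0.0, 1.0), ("Charlie", 0.5, 1.5), ("Bob", 1.2, 2.0),
         ("Diane", 1.6, 2.5), ("Alice", 2.1, 3.0)]
print(best_grouping(people, turns))
# [('Alice', 'Bob'), ('Charlie', 'Diane')]
```

Even this brute-force version is cheap for small groups, which is consistent with Aoki's point that the processing runs comfortably in real time on a desktop PC.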
The method is better suited to mobile phones than approaches that aim
to make audio more intuitive by using stereo effects to trick the human
brain into thinking sounds are coming from specific directions in space,
said Aoki. Because the PARC method does not require stereo, it could be
used in today's mobile phones fairly easily, he said. In addition, changing
sound level rather than direction makes it less likely that users will
be confused about whether a sound is coming from the phone or the real
world, he said.
The researchers' current prototype correctly determines conversational
groupings between 68 and 86 percent of the time, according to Aoki.
The system sidesteps a pair of potential pitfalls of shared audio
spaces, according to Aoki. "You don't want to just turn off the other
people since what they're talking about may become relevant to you very
quickly," he said.
And reconfiguring the conversation by punching in a code on a
handset is cumbersome, Aoki said. Manual controls are useful too, but
"it helps if the system can do some things automatically so that the conversation
can move along smoothly," he said.
The researchers are working to improve the system's accuracy,
said Aoki. The method could be ready for use in practical applications
in three to five years, he said.
Aoki's research colleagues were Matthew Romain of Stanford University,
Margaret H. Szymanski, James D. Thornton and Allison Woodruff of Palo
Alto Research Center, and Daniel Wilson of Carnegie Mellon University.
The researchers presented the work at the Association for Computing Machinery
Computer-Human Interaction (ACM-CHI) conference in Fort Lauderdale, Florida,
April 5-10, 2003. The research was funded by PARC.
Timeline: 3-5 years
TRN Categories: Applied Technology
Story Type: News
Related Elements: Technical paper, "The Mad Hatter's Cocktail
Party: a Social Mobile Audio Space Supporting Multiple Simultaneous Conversations,"
presented at the Association for Computing Machinery Computer-Human
Interaction (ACM-CHI) conference in Fort Lauderdale, Florida, April 5-10, 2003
June 18/25, 2003