Conversational engagement tracked
Technology Research News
It would be useful if a computer could sense ebbs and flows in conversation
in order to automatically adjust remote communications systems. A system
could, for instance, automatically switch from a walkie-talkie-type
push-to-talk system to a telephone-like full-duplex audio connection when
the participants become highly engaged in a conversation.
Language is often fairly cryptic, however. The phrase "I am interested
in this conversation", for instance, can signal enjoyment or polite boredom.
Researchers from the University of Rochester and the Palo Alto Research
Center are aiming to allow computers to automatically assess people's engagement
in a conversation by analyzing the way they speak rather than what they say.
The researchers' system analyzes tone of voice and prosodic style,
which includes changes in strength, pitch and rhythm. "We do not look at
what users say, but how involved they are in the conversation when they
say it -- how into the conversation they are," said Chen Yu, a University
of Rochester researcher who is now an assistant professor of psychology
and cognitive science at Indiana University.
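The paper does not spell out how the prosodic measurements are computed, but the kind of signal analysis involved can be sketched. The function below, a toy illustration only, measures two of the cues the article names -- strength (RMS energy) and pitch (via a crude autocorrelation search) -- from a frame of audio samples; real systems use far more robust pitch trackers.

```python
import math

def prosodic_features(samples, sample_rate=8000):
    """Crude per-utterance prosody measurements: RMS energy ("strength")
    and an autocorrelation-based pitch estimate. Illustrative only."""
    n = len(samples)
    rms = math.sqrt(sum(s * s for s in samples) / n)
    # Search lags corresponding to 80-300 Hz (typical speech pitch range)
    # for the strongest autocorrelation peak.
    best_lag, best_corr = 0, 0.0
    for lag in range(sample_rate // 300, sample_rate // 80 + 1):
        corr = sum(samples[i] * samples[i + lag] for i in range(n - lag))
        if corr > best_corr:
            best_corr, best_lag = corr, lag
    pitch = sample_rate / best_lag if best_lag else 0.0
    return rms, pitch
```

Tracking how these values change across an utterance -- rather than the words spoken -- is the kind of "how they say it" signal the researchers describe.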
As voice communication shifts from traditional telephone networks
to the more flexible Internet, it is becoming easier to seamlessly switch
between different communication channels, said Paul Aoki, a research scientist
at the Palo Alto Research Center. The system could be used to automatically
adapt voice channels on the fly.
The system could also make it possible for computers to adjust to
users in other ways, said Aoki. "If your computer can detect that you are
deeply engaged in conversation with another person, whether on the telephone
or in the same room... it might defer a loud announcement that you have new
email, or it might set your instant messaging status to busy," he said.
Although humans are social animals, machine understanding of users'
social states has received relatively little attention, said Yu.
Detecting how engaged people are from the sound of their voices
is not straightforward, said Aoki. Previous research has tried to glean
information about engagement by detecting emotion. But engagement is not
the same as emotion. "You can be highly engaged in sad... or angry conversations
as well as happy ones," he said.
The researchers' system adds the ability to sense characteristics
of conversational engagement to previous methods of recognizing speech emotion,
taking into consideration changes in emotion over time and the influence
of participants on each other.
The system measures the prosodic aspects for individual users and
feeds the results into a first-level module that has been trained to recognize
patterns in these measurements, associating certain patterns with particular
emotional states, said Aoki. The system measures the strength of emotion,
whether the emotion is positive or negative, and emotion type -- anger,
panic, sadness, happiness, interest, boredom, and the absence of emotion.
This first-level measurement only reflects an individual's state at a moment in time.
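The article describes the first-level module only as a trained pattern recognizer that maps prosodic measurements to emotional states. One minimal way to sketch that idea is a nearest-centroid classifier; the feature set (mean pitch, pitch range, energy, speaking rate) and every centroid value below are invented for illustration and are not taken from the paper.

```python
import math

# Hypothetical prosodic feature vectors: (mean pitch in Hz, pitch range in Hz,
# RMS energy, speaking rate in syllables/sec). Values are illustrative only.
EMOTION_CENTROIDS = {
    "anger":     (220.0, 90.0, 0.80, 5.5),
    "panic":     (240.0, 95.0, 0.85, 6.0),
    "happiness": (210.0, 80.0, 0.60, 5.0),
    "sadness":   (140.0, 25.0, 0.30, 3.0),
    "boredom":   (150.0, 20.0, 0.35, 3.5),
    "interest":  (190.0, 60.0, 0.55, 4.5),
    "neutral":   (170.0, 40.0, 0.45, 4.0),
}

def classify_emotion(features):
    """Return the emotion label whose centroid is nearest to this
    utterance's prosodic feature vector (Euclidean distance)."""
    def dist(centroid):
        return math.sqrt(sum((f - c) ** 2 for f, c in zip(features, centroid)))
    return min(EMOTION_CENTROIDS, key=lambda e: dist(EMOTION_CENTROIDS[e]))
```

In the researchers' actual system the recognizer is trained on recorded speech rather than hand-set values, but the shape of the mapping -- prosodic measurements in, a momentary emotion label out -- is the same.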
To decide how engaged the user is in the conversation, the second
level looks at patterns in the stream of emotion states over time, and at
the emotion states of the other person in the conversation, said Aoki. "We
added this consideration of both time and other people because we wanted
to model the fact that conversation is a social interaction," he said. "Whether
or not you are engaged in a particular conversation at a given moment is
part of a social process that changes over time and involves all of the
participants in the conversation."
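The second level can likewise be sketched in miniature. This toy version, assuming a made-up emotion-to-engagement lookup table and a made-up partner weighting (the real system uses a trained classifier over both speakers' emotion streams), shows the structural idea: engagement is estimated from a window of one's own recent emotion states combined with the other participant's.

```python
# Illustrative mapping from momentary emotion labels to engagement evidence
# on the article's 1-5 scale; these numbers are assumptions, not the paper's.
EMOTION_ENGAGEMENT = {
    "anger": 4, "panic": 5, "happiness": 4, "interest": 4,
    "sadness": 3, "neutral": 2, "boredom": 1,
}

def engagement_level(own_states, partner_states, partner_weight=0.3):
    """Estimate a 1-5 engagement level from a window of one speaker's
    emotion states, weighted with the other speaker's stream."""
    def avg(states):
        return sum(EMOTION_ENGAGEMENT[s] for s in states) / len(states)
    score = ((1 - partner_weight) * avg(own_states)
             + partner_weight * avg(partner_states))
    return round(score)
```

The key design point the researchers describe survives even in this sketch: the estimate depends on time (a window of states, not a single moment) and on all participants, not just the speaker being scored.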
The system measures five levels of engagement. The researchers
used recorded phone conversations to test the system. Using just the first-level
emotion detector they were able to rank the levels of engagement with
a 47 percent accuracy rate, which is more than double the 20 percent accuracy
that would result from random choices. The method to track emotion over
time boosted the accuracy rate to 61 percent. Adding emotion tracking of
the person the subject was talking to boosted the accuracy rate to 63 percent.
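The chance baseline the article cites follows directly from the five-level scale, and the reported gains can be laid out next to it:

```python
levels = 5
chance = 1 / levels        # random guessing over five levels: 20 percent
first_level = 0.47         # emotion detector alone
with_time = 0.61           # plus emotion tracked over time
with_partner = 0.63        # plus the other speaker's emotion stream

print(first_level / chance)  # first level alone is 2.35x better than chance
```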
One technical challenge in building the system was finding methods
that categorize emotional states accurately and work well across different
speakers, said Yu. "People's emotional responses and the ways in which they
convey emotion using speech vary widely across individuals," he said.
The Palo Alto Research Center scientists are working to add the
software to their existing voice communication system in order to do real-world testing.
The overall goal of the research is to build voice communication
systems that respond to the way people talk, said Aoki. Now that lots of
people have mobile phones, talk within tight social groups like teenage
or young adult friends can be very frequent. At the same time, frequent
phone calls can be annoying. "We're trying to build systems that let people
ease in and out of remote conversations, just as you can when people are
physically together," said Aoki. "Determining how engaged users are in the
conversation is one part of that research."
The method could be used in practical applications in three to six
years, said Yu.
Yu and Aoki's research colleague was Allison Woodruff, who is now
at Intel Research. The work appeared in the proceedings of the 8th International
Conference on Spoken Language Processing (ICSLP) held October 4 to 8, 2004
on Jeju Island in Korea. The research was funded by the Palo Alto Research Center.
Timeline: 3-6 years
TRN Categories: Human-Computer Interaction; Pattern Recognition
Story Type: News
Related Elements: Technical paper, "Detecting User Engagement
in Everyday Conversations," proceedings of the 8th International Conference
on Spoken Language Processing (ICSLP) on Jeju Island in Korea October 4-8,
2004 and posted on the Computing Research Repository (CoRR) at arxiv.org/PS_cache/cs/pdf/0410/0410027.pdf.
December 1/8, 2004