Conversational engagement tracked
Technology Research News
It would be useful if a computer could sense ebbs and flows in conversation
in order to automatically adjust remote communications systems. A system
could, for instance, automatically switch from a walkie-talkie-type
push-to-talk system to a telephone-like full-duplex audio connection when
the participants become highly engaged in a conversation.
Language is often fairly cryptic, however. The phrase "I am interested
in this conversation", for instance, can signal enjoyment or polite boredom.
Researchers from the University of Rochester and the Palo Alto Research
Center are aiming to allow computers to automatically assess people's engagement
in a conversation by analyzing the way they speak rather than what they say.
The researchers' system analyzes tone of voice and prosodic style,
which includes changes in strength, pitch and rhythm. "We do not look at
what users say, but how involved they are in the conversation when they
say it -- how into the conversation they are," said Chen Yu, a University
of Rochester researcher who is now an assistant professor of psychology
and cognitive science at Indiana University.
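The paper does not spell out how the prosodic measurements are computed, but the kind of signal analysis involved can be sketched. The function below, a toy illustration only, measures two of the cues the article names -- strength (RMS energy) and pitch (via a crude autocorrelation search) -- from a frame of audio samples; real systems use far more robust pitch trackers.

```python
import math

def prosodic_features(samples, sample_rate=8000):
    """Crude per-utterance prosody measurements: RMS energy ("strength")
    and an autocorrelation-based pitch estimate. Illustrative only."""
    n = len(samples)
    rms = math.sqrt(sum(s * s for s in samples) / n)
    # Search lags corresponding to 80-300 Hz (typical speech pitch range)
    # for the strongest autocorrelation peak.
    best_lag, best_corr = 0, 0.0
    for lag in range(sample_rate // 300, sample_rate // 80 + 1):
        corr = sum(samples[i] * samples[i + lag] for i in range(n - lag))
        if corr > best_corr:
            best_corr, best_lag = corr, lag
    pitch = sample_rate / best_lag if best_lag else 0.0
    return rms, pitch
```

Tracking how these values change across an utterance -- rather than the words spoken -- is the kind of "how they say it" signal the researchers describe.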
As voice communication shifts from traditional telephone networks
to the more flexible Internet, it is becoming easier to seamlessly switch
between different communication channels, said Paul Aoki, a research scientist
at the Palo Alto Research Center. The system could be used to automatically
adapt voice channels on the fly.
The system could also make it possible for computers to adjust to
users in other ways, said Aoki. "If your computer can detect that you are
deeply engaged in conversation with another person, whether on the telephone
or in the same room... it might defer a loud announcement that you have new
email, or it might set your instant messaging status to busy," he said.
Although humans are social animals, machine understanding of users'
social states has received relatively little attention, said Yu.
Detecting how engaged people are from the sound of their voices
is not straightforward, said Aoki. Previous research has tried to glean
information about engagement by detecting emotion. But engagement is not
the same as emotion. "You can be highly engaged in sad... or angry conversations
as well as happy ones," he said.
The researchers' system adds the ability to sense characteristics
of conversational engagement to previous methods of recognizing speech emotion,
taking into consideration changes in emotion over time and the influence
of participants on each other.
The system measures the prosodic aspects for individual users and
feeds the results into a first-level module that has been trained to recognize
patterns in these measurements, associating certain patterns with particular
emotional states, said Aoki. The system measures the strength of emotion,
whether the emotion is positive or negative, and emotion type -- anger,
panic, sadness, happiness, interest, boredom, and the absence of emotion.
This first-level measurement only reflects an individual's state at a moment in time.
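The article describes the first-level module only as a trained pattern recognizer that maps prosodic measurements to emotional states. One minimal way to sketch that idea is a nearest-centroid classifier; the feature set (mean pitch, pitch range, energy, speaking rate) and every centroid value below are invented for illustration and are not taken from the paper.

```python
import math

# Hypothetical prosodic feature vectors: (mean pitch in Hz, pitch range in Hz,
# RMS energy, speaking rate in syllables/sec). Values are illustrative only.
EMOTION_CENTROIDS = {
    "anger":     (220.0, 90.0, 0.80, 5.5),
    "panic":     (240.0, 95.0, 0.85, 6.0),
    "happiness": (210.0, 80.0, 0.60, 5.0),
    "sadness":   (140.0, 25.0, 0.30, 3.0),
    "boredom":   (150.0, 20.0, 0.35, 3.5),
    "interest":  (190.0, 60.0, 0.55, 4.5),
    "neutral":   (170.0, 40.0, 0.45, 4.0),
}

def classify_emotion(features):
    """Return the emotion label whose centroid is nearest to this
    utterance's prosodic feature vector (Euclidean distance)."""
    def dist(centroid):
        return math.sqrt(sum((f - c) ** 2 for f, c in zip(features, centroid)))
    return min(EMOTION_CENTROIDS, key=lambda e: dist(EMOTION_CENTROIDS[e]))
```

In the researchers' actual system the recognizer is trained on recorded speech rather than hand-set values, but the shape of the mapping -- prosodic measurements in, a momentary emotion label out -- is the same.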
To decide how engaged the user is in the conversation, the second
level looks at patterns in the stream of emotion states over time, and at
the emotion states of the other person in the conversation, said Aoki. "We
added this consideration of both time and other people because we wanted
to model the fact that conversation is a social interaction," he said. "Whether
or not you are engaged in a particular conversation at a given moment is
part of a social process that changes over time and involves all of the
participants in the conversation."
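The second level can likewise be sketched in miniature. This toy version, assuming a made-up emotion-to-engagement lookup table and a made-up partner weighting (the real system uses a trained classifier over both speakers' emotion streams), shows the structural idea: engagement is estimated from a window of one's own recent emotion states combined with the other participant's.

```python
# Illustrative mapping from momentary emotion labels to engagement evidence
# on the article's 1-5 scale; these numbers are assumptions, not the paper's.
EMOTION_ENGAGEMENT = {
    "anger": 4, "panic": 5, "happiness": 4, "interest": 4,
    "sadness": 3, "neutral": 2, "boredom": 1,
}

def engagement_level(own_states, partner_states, partner_weight=0.3):
    """Estimate a 1-5 engagement level from a window of one speaker's
    emotion states, weighted with the other speaker's stream."""
    def avg(states):
        return sum(EMOTION_ENGAGEMENT[s] for s in states) / len(states)
    score = ((1 - partner_weight) * avg(own_states)
             + partner_weight * avg(partner_states))
    return round(score)
```

The key design point the researchers describe survives even in this sketch: the estimate depends on time (a window of states, not a single moment) and on all participants, not just the speaker being scored.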
The system measures five levels of engagement. The researchers
used recorded phone conversations to test the system. Using just the first-level
emotion detector they were able to rank the levels of engagement with
a 47 percent accuracy rate, which is more than double the 20 percent accuracy
that would result from random choices. The method to track emotion over
time boosted the accuracy rate to 61 percent. Adding emotion tracking of
the person the subject was talking to boosted the accuracy rate to 63 percent.
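The chance baseline the article cites follows directly from the five-level scale, and the reported gains can be laid out next to it:

```python
levels = 5
chance = 1 / levels        # random guessing over five levels: 20 percent
first_level = 0.47         # emotion detector alone
with_time = 0.61           # plus emotion tracked over time
with_partner = 0.63        # plus the other speaker's emotion stream

print(first_level / chance)  # first level alone is 2.35x better than chance
```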
One technical challenge in building the system was finding methods
that categorize emotional states accurately and work well across different
speakers, said Yu. "People's emotional responses and the ways in which they
convey emotion using speech vary widely across individuals," he said.
The Palo Alto Research Center scientists are working to add the
software to their existing voice communication system in order to do real-world testing.
The overall goal of the research is to build voice communication
systems that respond to the way people talk, said Aoki. Now that lots of
people have mobile phones, talk within tight social groups like teenage
or young adult friends can be very frequent. At the same time, frequent
phone calls can be annoying. "We're trying to build systems that let people
ease in and out of remote conversations, just as you can when people are
physically together," said Aoki. "Determining how engaged users are in the
conversation is one part of that research."
The method could be used in practical applications in three to six
years, said Yu.
Yu and Aoki's research colleague was Allison Woodruff, who is now
at Intel Research. The work appeared in the proceedings of the 8th International
Conference on Spoken Language Processing (ICSLP) held October 4 to 8, 2004
on Jeju Island in Korea. The research was funded by the Palo Alto Research Center.
Timeline: 3-6 years
TRN Categories: Human-Computer Interaction; Pattern Recognition
Story Type: News
Related Elements: Technical paper, "Detecting User Engagement
in Everyday Conversations," proceedings of the 8th International Conference
on Spoken Language Processing (ICSLP) on Jeju Island in Korea October 4-8,
2004 and posted on the Computing Research Repository (CoRR) at arxiv.org/PS_cache/cs/pdf/0410/0410027.pdf.
December 1/8, 2004