Software referees group calls
By Kimberly Patch, Technology Research News
 Although teleconferencing has been around 
        for decades, it suffers from logistical limitations. Even with only a 
        handful of participants, it is difficult to tell who is talking, or to 
        choose the right time to chime in. In an increasingly wired world, the 
        lack of a flexible audio interface is becoming more of a problem.
 
 Researchers from Palo Alto Research Center, Inc. (PARC), Stanford 
        University, and Carnegie Mellon University have devised a scheme that 
        gives a group of wireless phone or handheld computer users a more natural 
        teleconferencing environment by keeping track of who is talking when.
 
 The scheme uses the moment-by-moment dynamics of talk to determine 
        which members of a group are actively conversing with each other, and 
        adjusts the audio accordingly, said Paul Aoki, a researcher at Palo Alto 
        Research Center. This makes the teleconferencing space more like a room 
        where people can have side conversations while keeping tabs on the group 
        as a whole.
 
 For instance, in a group of four people -- Alice, Bob, Charlie 
        and Diane -- Alice and Bob might start gossiping about a friend who Charlie 
        and Diane don't know and aren't interested in, and at the same time Charlie 
        and Diane might start a conversation about where they're all going to 
        meet for dinner, said Aoki.
 
 The system will notice these groupings and will make Charlie and 
        Diane quieter from Alice's and Bob's perspectives, and Alice and Bob quieter 
        from Charlie's and Diane's perspectives, Aoki said. But when Bob overhears 
        Diane suggest a restaurant he doesn't like and he starts talking to Charlie 
        and Diane, the system will respond by adjusting the audio so that they're 
        all at the same normal volume level again.
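
 The volume adjustment Aoki describes can be pictured as a per-listener gain table. The following is a minimal Python sketch, not the researchers' implementation. The function name `listener_gains` and the reduced level `side_gain=0.3` are illustrative assumptions, since the article does not say how much quieter the other conversation is made.

```python
def listener_gains(participants, groups, side_gain=0.3):
    """Return gains[listener][speaker] for one set of conversational groups.

    Speakers in the listener's own group play at full volume (1.0);
    everyone else is turned down to side_gain but never muted.
    side_gain is an assumed value, not one taken from the research.
    """
    group_of = {}
    for index, group in enumerate(groups):
        for name in group:
            group_of[name] = index

    gains = {}
    for listener in participants:
        gains[listener] = {
            speaker: 1.0 if group_of[speaker] == group_of[listener] else side_gain
            for speaker in participants
            if speaker != listener
        }
    return gains


people = ["Alice", "Bob", "Charlie", "Diane"]

# Two side conversations: each pair hears the other pair more quietly.
split = listener_gains(people, [{"Alice", "Bob"}, {"Charlie", "Diane"}])

# Bob joins the dinner discussion and the group merges: all back to normal volume.
merged = listener_gains(people, [{"Alice", "Bob", "Charlie", "Diane"}])
```

 Keeping the other pair audible at a lower level, rather than silent, is what lets Bob overhear Diane's restaurant suggestion in the first place.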
 
 "It's useful to adjust the audio in a remote conversation like 
        this so that you can more easily understand the people with whom you are 
        talking," said Aoki. "It's as if you're at a party, and when you start 
        talking with someone next to you, all of the other people nearby very 
        politely move away to make it easier for you to have your conversation," 
        he said.
 
 The researchers tested their prototype on two groups of four people 
        between the ages of 20 and 40. The groups' conversations included regular 
        teleconferencing and teleconferencing using the researchers' automatic 
        sound adjustment software.
 
 Each group of four split into prearranged pairs and played a party 
        game that involved answering questions intended to provoke conversation. 
        When both pairs had finished their questions, the four participants formed 
        opposite pairs and repeated the question-and-answer process.
 
 The participants were easily able to work with the adjusting audio, 
        said Aoki. "We didn't... tell them what the system was supposed to do... 
        they were able to figure it out on their own."
 
 The researchers observed a large difference between speech patterns 
        in the regular teleconference and in the adjusted audio space, said Aoki. 
        In the regular teleconference, the paired conversations got in each other's 
        way, he said. The pairs "kept starting and stopping their speech in response 
        to the other pair's bursts of talk... like what you do when you're saying 
        something and a train goes by in the middle of your sentence."
 
 When the shared audio space made the correct choice about who 
        was talking to whom, "the pairs could talk as they normally would if they 
        were face-to-face," said Aoki. The conversation was natural enough that 
        when one pair was waiting for the other to finish, "they would often just 
        kill time by talking about topics like sports," he said.
 
 The researchers tapped a sociological discipline -- conversation 
        analysis -- to find ways to automatically tell who is talking to whom.
 
 "Talk isn't just words in a certain order, but is the result of 
        some pretty complicated social practices that we don't generally realize 
        that we're using," said Aoki. For instance, people generally take turns 
        talking, and during turn-taking very little time passes from the moment 
        one person stops and another starts.
 
 Conversation analysts review examples of human interaction in 
        order to understand how these practices work.
 
 The researchers quantified speech patterns gleaned by conversation 
        analysts that generally show whether or not people are in conversation, 
        and built software that determines what grouping of people is supported 
        by the best evidence. "The techniques we use are simple enough that all 
        of the processing easily runs in real-time on a desktop PC," he said.
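
 One way to picture "the grouping supported by the best evidence" is to score every possible grouping by how much turn-taking it explains. The Python sketch below is only an illustration under stated assumptions and is not the researchers' algorithm. It uses a single cue, a quick handoff from one speaker to the next, and the names `count_handoffs`, `partitions` and `best_grouping`, the 0.5-second gap and the `min_handoffs` threshold are all hypothetical.

```python
from itertools import combinations


def count_handoffs(segments, max_gap=0.5):
    """Count, per pair of people, how often one starts soon after the other stops.

    segments is a list of (speaker, start_seconds, end_seconds) tuples.
    max_gap is an assumed limit on the pause between turns.
    """
    counts = {}
    for speaker_a, _, end_a in segments:
        for speaker_b, start_b, _ in segments:
            gap = start_b - end_a
            if speaker_a != speaker_b and 0 <= gap <= max_gap:
                pair = frozenset((speaker_a, speaker_b))
                counts[pair] = counts.get(pair, 0) + 1
    return counts


def partitions(people):
    """Yield every way of splitting a list of people into groups."""
    if not people:
        yield []
        return
    first, rest = people[0], people[1:]
    for smaller in partitions(rest):
        for i, block in enumerate(smaller):
            yield smaller[:i] + [[first] + block] + smaller[i + 1:]
        yield [[first]] + smaller


def best_grouping(people, segments, min_handoffs=2):
    """Pick the grouping whose same-group pairs show the most turn-taking.

    Each same-group pair contributes (handoffs - min_handoffs), so putting
    two people together who rarely trade turns lowers the score.
    """
    counts = count_handoffs(segments)

    def score(grouping):
        return sum(
            counts.get(frozenset(pair), 0) - min_handoffs
            for block in grouping
            for pair in combinations(block, 2)
        )

    best = max(partitions(list(people)), key=score)
    return sorted(sorted(block) for block in best)


# Made-up timings: Alice and Bob trade turns while Charlie and Diane do the same.
PEOPLE = ["Alice", "Bob", "Charlie", "Diane"]
SEGMENTS = [
    ("Alice", 0.0, 1.0), ("Bob", 1.2, 2.0), ("Alice", 2.1, 3.0), ("Bob", 3.3, 4.0),
    ("Charlie", 0.4, 1.5), ("Diane", 1.6, 2.5), ("Charlie", 2.6, 3.5), ("Diane", 3.7, 4.5),
]
grouping = best_grouping(PEOPLE, SEGMENTS)
```

 Trying every grouping is only practical for a handful of participants, which matches the small groups described in the article.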
 
 The method is more applicable to mobile phones than research aimed 
        at making audio more intuitive by using stereo effects to trick the human 
        brain into thinking sounds are coming from specific directions in space, 
        said Aoki. Because the PARC method does not require stereo, it could be 
        used in today's mobile phones fairly easily, he said. In addition, changing 
        sound level rather than direction makes it less likely that users will 
        be confused about whether a sound is coming from the phone or the real 
        world, he said.
 
 The researchers' current prototype correctly determines conversational 
        groupings between 68 and 86 percent of the time, according to Aoki.
 
 The system sidesteps a pair of potential pitfalls of shared audio 
        spaces, according to Aoki. "You don't want to just turn off the other 
        people since what they're talking about may become relevant to you very 
        quickly," he said.
 
 And reconfiguring the conversation by punching in a code on a 
        handset is cumbersome, Aoki said. Manual controls are useful too, but 
        "it helps if the system can do some things automatically so that the conversation 
        can move along smoothly," he said.
 
 The researchers are working to improve the system's accuracy, 
        said Aoki. The method could be ready for use in practical applications 
        in three to five years, he said.
 
 Aoki's research colleagues were Matthew Romain of Stanford University, 
        Margaret H. Szymanski, James D. Thornton and Allison Woodruff of Palo 
        Alto Research Center, and Daniel Wilson of Carnegie Mellon University. 
        The researchers presented the work at the Association for Computing Machinery 
        Computer-Human Interaction (ACM-CHI) conference in Fort Lauderdale, Florida, 
        April 5-10, 2003. The research was funded by PARC.
 
 Timeline:   3-5 years
 Funding:   Corporate
 TRN Categories:  Applied Technology
 Story Type:   News
 Related Elements:  Technical paper, "The Mad Hatter's Cocktail 
        Party: a Social Mobile Audio Space Supporting Multiple Simultaneous Conversations," 
        presented at the Association for Computing Machinery Computer-Human 
        Interaction (ACM-CHI) conference in Fort Lauderdale, Florida, April 5-10, 
        2003
 
 June 18/25, 2003