Integrated inputs improve interactivity
By Kimberly Patch, Technology Research News
Communicating with a computer through the usual channels -- pressing plastic keys and pointing with a mouse -- is pretty limited compared to the sound, eye contact, gesture and touch of human conversation.
For years, researchers have been working on widening human-computer bandwidth via speech recognition, eye and gesture trackers, and force feedback devices designed to allow a computer to communicate tactile sensations.
Each of these technologies promises to improve human-computer communication, but the real trick is being able to use them all at once in a way that feels natural. To that end, researchers at Rutgers University have put together a desktop system, dubbed Stimulate, that coordinates input from all of these means of communication.
"Our focus was ... to achieve more natural communication between the human user and the networked computer system," said project leader James Flanagan, VP for Research at Rutgers University. Flanagan defines natural human interaction as including things like facial expression and manual gestures "in a hands-free mode where you don't have to wear or hold sound pickup equipment in order to transmit your message."
To do this, Stimulate uses a camera to track the user's face and eye movements; an array microphone mounted on the monitor to pick up the user's voice and distinguish it from background noise; and a three-ounce glove to track finger gestures and provide tactile feedback via pneumatic pistons.
The system uses speech recognition software to interpret the user's words, and text-to-speech synthesis so it can answer back.
The camera is gimbaled, meaning it can swivel in all directions to track fine movements. It uses an ultrasonic range finder and a face recognition algorithm to find the user's face and watch for visual gestures. Software maps the cursor movement to the user's eye movement so "you can just move the cursor by looking," said Flanagan. The camera tracks pupil movement by shining an infrared beam at the eye and computing the angle between the center of the pupil and the beam's reflection off the cornea, he said.
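The pupil-and-reflection scheme described above can be sketched in a few lines. This is a hypothetical illustration, not the Rutgers implementation: it assumes the tracker reports the pupil center and the glint (the infrared beam's reflection off the cornea) in camera pixels, and that a simple per-axis calibration, found by having the user fixate on known screen points, maps the pupil-glint offset to a cursor position. The function name and calibration form are invented for the example.

```python
# Minimal sketch of pupil-center / corneal-reflection cursor mapping
# (hypothetical; not the actual Stimulate code). The difference vector
# between pupil center and glint changes with gaze angle, so an affine
# calibration per axis can turn it into screen coordinates.

def gaze_to_cursor(pupil, glint, calib):
    """Map a pupil-glint offset to a screen position.

    pupil, glint -- (x, y) positions in camera pixels
    calib -- (ax, bx, ay, by): per-axis gain and offset obtained by
             calibrating against known fixation targets
    """
    dx = pupil[0] - glint[0]
    dy = pupil[1] - glint[1]
    ax, bx, ay, by = calib
    return (ax * dx + bx, ay * dy + by)

# Example: 20 screen pixels per pixel of pupil-glint offset,
# centered on a 1024x768 display.
calib = (20.0, 512.0, 20.0, 384.0)
print(gaze_to_cursor((310.0, 242.0), (300.0, 240.0), calib))  # (712.0, 424.0)
```

Real trackers also compensate for head movement, which is presumably one job of the gimbal and range finder; the linear mapping here only holds while the head stays roughly still.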
The glove includes a position detector and little pneumatic thrusters that can apply pressure to the fingertips. (See picture.) "You can reach into a complex scene and move an object -- you can detect the position of it, the shape of it [and] the squishiness of it -- how much it pushes back when you grab it," said Flanagan.
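One common way to produce the "squishiness" Flanagan describes is a linear spring model: when a fingertip penetrates a virtual object's surface, the piston pushes back with a force proportional to the penetration depth, and the stiffness constant sets how hard or soft the object feels. The sketch below assumes that model; the function and parameter names are hypothetical and the actual glove controller may work differently.

```python
# Hedged sketch of spring-model force feedback (an assumption, not
# the documented glove controller). Force is proportional to how far
# the fingertip has pushed past the virtual surface; zero outside.

def fingertip_force(finger_pos, surface_pos, stiffness):
    """Return the restoring force (N) a piston should apply.

    finger_pos, surface_pos -- positions in meters along the contact normal
    stiffness -- spring constant k in N/m; large k feels hard, small k squishy
    """
    penetration = surface_pos - finger_pos  # depth past the surface
    return stiffness * penetration if penetration > 0 else 0.0

# A hard object (k = 500 N/m) pushes back ten times harder than a
# squishy one (k = 50 N/m) at the same 2 mm penetration.
print(fingertip_force(0.098, 0.100, 500.0))  # about 1.0 N
print(fingertip_force(0.098, 0.100, 50.0))   # about 0.1 N
```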
The difficult part was coordinating the different inputs, said Flanagan. The researchers' Fusion Agent software interprets the sensory inputs and estimates the user's intent by putting everything in context, "which is essentially a type of semantic analysis," he said.
For instance, a user might point to an object and say 'move this to there.' Interpreting the command involves knowing what 'this' is and where 'there' is. The software must look at all the inputs simultaneously because the user might point with an eye movement or a hand gesture. It gets more complicated when inputs are redundant or contradictory, Flanagan said. "You might speak and point ... so the software agent has to maintain some context awareness of what the transaction is and what objects are being addressed and what actions are requested." In order to interpret all this, the software must perform syntax analysis and semantic analysis, Flanagan said.
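The 'move this to there' example can be illustrated with a toy fusion step. This is not the researchers' Fusion Agent, just a minimal sketch of the idea: each recognized word carries a timestamp, each pointing event (from eye or hand tracking) carries a target and a timestamp, and the deictic words 'this' and 'there' are bound to whichever pointing event is nearest in time. All names and the data format are invented for the illustration.

```python
# Toy multimodal fusion (hypothetical, not the actual Fusion Agent):
# bind each deictic word to the pointing event closest to it in time.

def resolve_deictics(words, pointing_events):
    """words: list of (word, timestamp) from speech recognition.
    pointing_events: list of (target_name, timestamp) from eye or
    hand tracking. Returns the command with 'this'/'there' replaced
    by the targets they most plausibly refer to."""
    resolved = []
    for word, t in words:
        if word in ("this", "there"):
            # pick the gesture or fixation nearest in time to the word
            target, _ = min(pointing_events, key=lambda e: abs(e[1] - t))
            resolved.append(target)
        else:
            resolved.append(word)
    return " ".join(resolved)

words = [("move", 0.0), ("this", 0.4), ("to", 0.8), ("there", 1.2)]
points = [("cube", 0.5), ("shelf", 1.1)]  # eye/hand pointing events
print(resolve_deictics(words, points))    # move cube to shelf
```

A real fusion agent also has to handle the redundant and contradictory cases Flanagan mentions -- for example, speech and gesture picking different objects -- which is where the context awareness and semantic analysis come in; simple nearest-in-time binding is only the easy case.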
"I think it is a significant work," said Jie Yang, a research scientist at Carnegie Mellon University. "In terms of the microphone array they are on the leading edge. [And the] major problem is you have to coordinate all the components together. This is a tough problem -- it's not trivial."
It is this combination and coordination of sight, sound and tactile technologies that, "even though they are quite primitive technologies, transcend[s] the capabilities of the traditional mouse and keyboard," said Flanagan. For instance, it is somewhat difficult to rotate a virtual object 22-and-a-half degrees to the right with a mouse and keyboard. "But if you wanted to reach into the scene and twist the object 22-and-a-half degrees you can do that, or if you want to say 'rotate that 22-and-a-half degrees clockwise' by speech, that's fairly convenient as well," he said.
The researchers are currently working on a wireless version of the system. "It's at a very early stage," said Flanagan, but the goal is "to be able to walk around with your personal digital assistant and use conversational interaction, eye tracking ... manual gesture [and] stylus gesture." Toward that end, the researchers are working on a miniature gimbaled camera, said Flanagan.
Although there's a lot of work to do on both the input technologies and the software, the utility of more natural human-computer communication is clear, said Flanagan. "I could imagine you will see applications in selected places in less than five years," he said.
Flanagan's colleagues on the project were Rutgers professors Greg Burdea, Joe Wilder, Ivan Marsic and Cas Kulikowski. The project was funded by the National Science Foundation.
Timeline: < 5 years
TRN Categories: Human-Computer Interaction
Story Type: News
Related Elements: Photo
September 13, 2000
© Copyright Technology Research News, LLC 2000-2006. All rights reserved.