March 9/16, 2005
People detect meaning in even subtle facial expressions. Because this seemingly basic ability to read each other evolved over millions of years, we tend to take our communication skills for granted.
Computers, on the other hand, have more in common with toasters than the human brain. There’s much work to be done to make computers communicate with us on anything like the terms we’re accustomed to.
The underlying technology that lets computers hear words, follow a gaze, pick up gestures, and keep track of a person moving around a room is pattern recognition software.
The first task of pattern recognition software is finding patterns in streams of raw data like digital audio or video. The software then matches the patterns to structures it knows. Patterns can include words, gestures and human bodies.
Four types of pattern recognition software are key to computer interfaces: artificial neural networks, hidden Markov models, nearest-neighbor algorithms and support vector machines.
Artificial neural network software mimics the basic structure of biological brains. The software learns what a hand looks like by building a pattern of connections among brain-cell-like components through repeated exposure to stimuli like digital video images of hands.
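The learning-by-repeated-exposure idea can be sketched with a single artificial neuron, a perceptron, that strengthens or weakens its input connections each time it gets a training example wrong. This is a minimal illustration of the principle, not the software the article describes; the two-number "image features" and labels are invented.

```python
def train_perceptron(samples, epochs=20, lr=0.1):
    """samples: list of (feature_vector, label) with label in {+1, -1}."""
    n = len(samples[0][0])
    w = [0.0] * n        # connection strengths, adjusted by experience
    b = 0.0
    for _ in range(epochs):
        for x, y in samples:
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:  # wrong (or undecided) answer:
                # nudge each connection toward the correct response
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Toy "hand vs. not-hand" features: two invented numbers per image.
data = [((2.0, 2.0), 1), ((3.0, 1.0), 1),
        ((-1.0, -2.0), -1), ((-2.0, -1.0), -1)]
w, b = train_perceptron(data)
```

Real networks stack many such units in layers, but the repeated-exposure training loop is the same in spirit.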
The hidden Markov model divides sensory input into a series of extremely brief events - typically thousandths of a second - and makes predictions about the nature of an event based on the event before and the one after. A sequence of events could be the sounds that make up a spoken word.
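The prediction step described above is usually carried out with the Viterbi algorithm, which finds the most probable sequence of hidden states (say, speech sounds) given the observed events. The sketch below uses two made-up states, "C" (consonant-like) and "V" (vowel-like), and invented probabilities; only the algorithm itself is standard.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state sequence for obs."""
    # best[t][s] = (probability, previous state) of the best path ending in s
    best = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for o in obs[1:]:
        row = {}
        for s in states:
            # weigh each state against the state that came before it
            prob, prev = max(
                (best[-1][r][0] * trans_p[r][s] * emit_p[s][o], r)
                for r in states)
            row[s] = (prob, prev)
        best.append(row)
    # Trace back from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for t in range(len(obs) - 1, 0, -1):
        state = best[t][state][1]
        path.append(state)
    return path[::-1]

states = ("C", "V")                      # hypothetical sound classes
start_p = {"C": 0.6, "V": 0.4}
trans_p = {"C": {"C": 0.3, "V": 0.7}, "V": {"C": 0.7, "V": 0.3}}
emit_p = {"C": {"k": 0.45, "a": 0.1, "t": 0.45},
          "V": {"k": 0.1, "a": 0.8, "t": 0.1}}
```

Decoding the observed sounds "k", "a", "t" with this toy model labels them consonant, vowel, consonant, which is how a speech recognizer pieces brief events into a word.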
Nearest-neighbor algorithms map sensory input as points in an imaginary space, then classify the points assuming that points near to each other are similar. Two points representing video images of hands, for example, would be closer than a point representing a hand and a point representing a head.
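The hand-versus-head example above can be made concrete in a few lines: an unknown point takes the label of whichever stored example sits closest to it in feature space. The two-number features are invented for illustration.

```python
import math

def nearest_neighbor(point, examples):
    """examples: list of (feature_vector, label); return the closest label."""
    def dist(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    # The nearest stored example decides the classification.
    _, label = min((dist(point, x), lab) for x, lab in examples)
    return label

examples = [
    ((1.0, 1.0), "hand"), ((1.2, 0.9), "hand"),   # two hand images cluster...
    ((5.0, 5.0), "head"), ((5.1, 4.8), "head"),   # ...far from two head images
]
```

Practical systems usually consult the k closest neighbors and take a vote, which makes the decision less sensitive to a single mislabeled example.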
Support vector machines map sensory input features statistically. The software recognizes patterns by comparing the ways a feature map resembles the maps of examples it was trained on.
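One common way to train a linear support vector machine is gradient descent on the hinge loss, which pushes the dividing boundary away from the training examples until they sit outside a margin. The sketch below is a bare-bones version of that idea with invented toy data, not the article's software; real SVMs typically also use kernels to handle patterns a straight boundary cannot separate.

```python
def train_linear_svm(samples, epochs=200, lr=0.01, reg=0.01):
    """samples: list of (features, label) with label in {+1, -1}."""
    n = len(samples[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, y in samples:
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:   # inside the margin: push the boundary away
                w = [wi - lr * (reg * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
            else:            # safely classified: only shrink the weights
                w = [wi - lr * reg * wi for wi in w]
    return w, b

def classify(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Invented two-number feature maps for two classes of training examples.
data = [((2.0, 2.0), 1), ((3.0, 1.0), 1),
        ((-1.0, -2.0), -1), ((-2.0, -1.0), -1)]
w, b = train_linear_svm(data)
```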
Recognizing patterns, however, gets a computer only so far. Once it learns to recognize an eye, a hand or a human, a computer has to be able to track the object as it moves.
Object tracking software measures an object, predicts where the object will be next, narrows the area to be measured based on the prediction, and uses new measurements to improve the predictions.
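The measure-predict-correct loop can be sketched as a simple alpha-beta filter tracking a position along one axis: predict where the object will be, compare the prediction with the new measurement, and blend the two. The gains and measurements below are invented for illustration, and the step where the prediction narrows the search area is omitted.

```python
def track(measurements, alpha=0.5, beta=0.3, dt=1.0):
    """Return smoothed position estimates for a 1-D moving object."""
    pos, vel = measurements[0], 0.0
    estimates = []
    for z in measurements[1:]:
        predicted = pos + vel * dt          # 1. predict the next position
        residual = z - predicted            # 2. compare with the measurement
        pos = predicted + alpha * residual  # 3. correct the position estimate...
        vel = vel + (beta / dt) * residual  #    ...and the velocity estimate
        estimates.append(pos)
    return estimates

# An object moving steadily: estimates settle toward the true positions.
estimates = track([0.0, 1.0, 2.0, 3.0, 4.0])
```

Kalman filters, widely used for this job, follow the same loop but also keep an estimate of their own uncertainty, which is what lets them narrow the region to be measured next.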
Even being able to track and interpret the types of input humans use to communicate - gestures, words and facial expressions - is not enough. Meaning is often conveyed by a combination of different types of sensory input. Words and gestures, for example, can go together to produce meaning that cannot be determined from simply examining the inputs separately.
To tackle this problem, the computer needs to recognize and interpret each type of input, track the timing of the inputs, group segments of sensory input from each type chronologically, combine segments like words denoting space and pointing gestures, then interpret the combination to extract its meaning.
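The grouping-and-combining step can be sketched by pairing time-stamped words with time-stamped gestures: a pointing word like "that" only gains its meaning from a gesture that happens at nearly the same moment. The events, labels, and half-second window below are invented for illustration.

```python
def fuse(words, gestures, window=0.5):
    """words, gestures: lists of (time_in_seconds, value), sorted by time.
    Pair each space-denoting word with the nearest gesture that falls
    within `window` seconds of it."""
    deictic = {"this", "that", "here", "there"}  # words that point at space
    fused = []
    for t, word in words:
        if word in deictic and gestures:
            t_g, target = min(gestures, key=lambda g: abs(g[0] - t))
            if abs(t_g - t) <= window:
                fused.append((word, target))    # word + gesture = meaning
                continue
        fused.append((word, None))              # word stands on its own
    return fused

# "Put that there" plus two pointing gestures, with invented timestamps.
words = [(0.0, "put"), (0.4, "that"), (1.0, "there")]
gestures = [(0.5, "red block"), (1.1, "table corner")]
```

Neither input stream alone identifies the object or the destination; only the timed combination yields the full command.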
© Copyright Technology Research News, LLC 2000-2006. All rights reserved.