Software organizes email by task
Technology Research News
There is a lot of structure to a person's email. Rather than random isolated documents,
individual email messages are often portions of a larger activity. Despite
the inherent structure, and despite organizational tools such as folders,
much of the world's email remains relatively unorganized.
Researchers from University College Dublin in Ireland and IBM
Research have developed a way to use the inherent structure of related email
messages to automatically organize the messages by task.
"We realized that quite a lot of emails are not just random isolated
messages, but rather relate in some specific way to earlier messages [by
way of] some underlying activity or process" such as travel, meetings, or
asset management, said Nicholas Kushmerick, a lecturer in computer science
at University College Dublin in Ireland and a visiting scientist at IBM's
Dublin Software Laboratory. "Our email activity management technique enables
the email client to automatically recognize the structure of these activities
and group messages," he said.
The method could eventually be used in tools that automatically
organize email and allow a user to query the system based on the underlying
organization. It could also lead to related tools like task schedulers,
according to Kushmerick.
The researchers' prototype uses a three-phase process, said Kushmerick.
First, the system groups messages according to the activity or task
they relate to. For a user who participates in multiple eBay auctions simultaneously,
the system would partition eBay messages into messages pertaining to the
different auctions -- for example, a desk auction, a bed auction, and a
dollhouse auction, said Kushmerick.
Second, the system detects occurrences of the process across those
activities and re-groups the messages. EBay auction steps include email
acknowledgments of bids and notifications of outbids. For example, the eBay
messages would be grouped into the 'thanks for the bid' messages for the
desk, bed, and dollhouse, and the 'you've been outbid' messages for
the desk, bed, and dollhouse, said Kushmerick.
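The regrouping in the second phase can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the word-overlap (Jaccard) measure, the 0.5 threshold, and the function name `cluster_by_step` are choices made for this example, not details of the researchers' prototype.

```python
def tokens(text):
    """Lowercased set of words in a message."""
    return set(text.lower().split())

def jaccard(a, b):
    """Word-overlap similarity between two token sets (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def cluster_by_step(messages, threshold=0.5):
    """Greedy clustering: a message joins the first cluster whose
    representative (its first member) is similar enough."""
    clusters = []
    for msg in messages:
        for cluster in clusters:
            if jaccard(tokens(msg), tokens(cluster[0])) >= threshold:
                cluster.append(msg)
                break
        else:
            clusters.append([msg])  # no match: start a new step cluster
    return clusters

# Hypothetical message texts, loosely modeled on the auction example.
messages = [
    "Thanks for the bid on the desk",
    "You have been outbid on the desk",
    "Thanks for the bid on the bed",
    "You have been outbid on the dollhouse",
]
steps = cluster_by_step(messages)
```

Here the two acknowledgment messages land in one cluster and the two outbid notices in another, regardless of which auction they belong to.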
Third, the system organizes activities and steps into a single representation,
or process model, that stipulates the order in which the process steps occur,
said Kushmerick. "The complication is that many real-world processes can
contain loops -- a single eBay auction might contain many pairs of bid-outbid
messages," he said.
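One simple way to picture a process model with loops is a table of observed step-to-step transitions. The Python sketch below is a simplified stand-in for automata induction, not the researchers' algorithm; the step names (including "won") and the `START` and `END` markers are hypothetical.

```python
from collections import defaultdict

def build_process_model(step_sequences):
    """Collect every observed step-to-step transition across activities."""
    transitions = defaultdict(set)
    for steps in step_sequences:
        path = ["START", *steps, "END"]
        for current, following in zip(path, path[1:]):
            transitions[current].add(following)
    return {state: sorted(nexts) for state, nexts in transitions.items()}

# Hypothetical step sequences, one per auction.
sequences = [
    ["bid", "outbid", "bid", "outbid", "bid", "won"],
    ["bid", "outbid"],
    ["bid", "won"],
]
model = build_process_model(sequences)
```

The loop shows up as a cycle in the table: "bid" can lead to "outbid", and "outbid" can lead back to "bid", however many times the pair repeats in a single auction.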
The researchers used text classification, text clustering and automata
induction algorithms to carry out the process, said Kushmerick. Text classification
algorithms assign documents to categories based on their content. Text clustering algorithms
group documents into related sets. Automata induction algorithms generate
process flow models. Each of these pieces has been developed independently
for decades, but no one had previously thought to apply them to this particular
problem or integrate them in this manner, he said.
Compared to artificial intelligence approaches to data representation,
the three-phase process is shallow, but appropriate, said Kushmerick. "We
can get away with shallow techniques because the messages and processes
we're dealing with are generally quite structured," he said. Every message
from eBay, for example, contains a unique identifier such as 9188139a; information
like this can be exploited to organize messages.
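As a rough illustration of how such an identifier could be exploited, the following Python sketch groups messages by an ID found in the subject line. The subject wording, the `ITEM_ID` pattern, and the function name `group_by_activity` are assumptions made for this example and are not taken from the prototype.

```python
import re
from collections import defaultdict

# Assumed identifier shape: six or more digits plus an optional letter,
# e.g. "9188139a". Real messages may use a different format.
ITEM_ID = re.compile(r"\b\d{6,}[a-z]?\b")

def group_by_activity(subjects):
    """Partition subject lines by the first identifier each one contains."""
    groups = defaultdict(list)
    for subject in subjects:
        match = ITEM_ID.search(subject)
        key = match.group(0) if match else "unknown"
        groups[key].append(subject)
    return dict(groups)

# Hypothetical subject lines for two separate auctions.
subjects = [
    "Thanks for your bid on item 9188139a",
    "You have been outbid on item 9188139a",
    "Thanks for your bid on item 7720415b",
]
activities = group_by_activity(subjects)
```

Each key in `activities` then stands for one auction, with all of its messages collected under it.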
From the user's point of view, the system is entirely automated,
said Kushmerick. The user "gives the system a set of messages and says 'please
organize them'," he said. "There is no need for the user to provide background
such as 'I have pending reports for two trips -- to Singapore, and Berlin'."
The researchers' next step is to begin real-world testing of the
technique with large collections of messages. They are also working on completing
the user interface. They're aiming to make the system easy to visualize
and easy to correct, said Kushmerick.
The interface will enable users to correct system mistakes and to
provide hints to help the system generalize correctly, said Kushmerick.
"No matter how carefully we tune our algorithms, fully automated techniques
will probably never be 100 percent accurate, so we need to make sure that
occasional mistakes do not harm the benefit that accrues from our more sophisticated
activity-centric presentation," he said.
The current prototype is 91 percent accurate at classifying and
grouping messages, according to Kushmerick.
The researchers are also working on ways to allow the user and machine
to cooperate to discover the appropriate task structure when the computer
cannot do it on its own. When the algorithm is stumped, "we don't want the
computer to throw up its hands and say 'I've no idea'," said Kushmerick.
"Instead, the computer should ask a series of pointed questions that will
disambiguate the situation, such as... 'it would appear that eBay occasionally
sends bid acknowledgments twice. Is that correct?'"
They are also working on enabling the user to carry out high-level
queries that have to do with the underlying activities, said Kushmerick.
For example, "Calculate the average amount I spent on each online grocery
order last year," or "Check the travel reimbursement transactions to estimate
the total number of days I spent away from home for business purposes in
the last six months."
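Once messages are grouped into activities, a query of the first kind reduces to simple arithmetic over those groups. The Python sketch below assumes a hypothetical `orders` structure with made-up totals; it shows only the calculation, not how the system would extract amounts from email.

```python
from statistics import mean

# Hypothetical result of the grouping step: one entry per grocery order.
orders = {
    "order-1001": {"year": 2004, "total": 82.50},
    "order-1002": {"year": 2004, "total": 67.50},
    "order-1003": {"year": 2005, "total": 91.00},
}

def average_order_total(orders, year):
    """Average amount spent per order in the given year, or None if no orders."""
    totals = [o["total"] for o in orders.values() if o["year"] == year]
    return mean(totals) if totals else None

avg_2004 = average_order_total(orders, 2004)
```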
One drawback of the prototype is that it is overly dependent on
computer-generated messages, said Kushmerick. "Extending our techniques
from machine-person messages to person-person messages will be very challenging,
but we have already started to make some progress," he said.
The technology could be ready for commercial use in two to three
years, said Kushmerick. "A patent application covering our technology has
already been filed, and IBM is currently exploring potential avenues to
commercialization," he said.
Further down the line, the researchers' system could be used to
automate other tools like schedulers and email analysis tools. An analysis
tool could, for instance, automatically notice when you send a message to
someone requesting that they send you a document, add this request to a
"pending" list, and automatically mark the request "satisfied" when the
document arrives, said Kushmerick.
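The bookkeeping side of such a tool might look like the Python sketch below. The `PendingTracker` class and its method names are hypothetical, and the sketch deliberately leaves out the hard part, which is recognizing from message text that a request was made or fulfilled.

```python
from dataclasses import dataclass, field

@dataclass
class PendingTracker:
    # Maps (person, document) to "pending" or "satisfied".
    pending: dict = field(default_factory=dict)

    def request_sent(self, person, document):
        """Record that a document was requested from someone."""
        self.pending[(person, document)] = "pending"

    def document_received(self, person, document):
        """Mark a matching request as satisfied when the document arrives."""
        if (person, document) in self.pending:
            self.pending[(person, document)] = "satisfied"

# Hypothetical address and file name.
tracker = PendingTracker()
tracker.request_sent("alice@example.com", "budget.xls")
tracker.document_received("alice@example.com", "budget.xls")
```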
The ultimate aim is to enable ordinary end-users, as opposed to
specialized technical support personnel, to personalize and customize their
computing environments, said Kushmerick. "Each of us has a distinctive suite
of activities that we engage in, preferences for the way information is
presented, notions of what constitutes high-priority, constraints about
divulging confidential information, et cetera," he said.
In the last several years machine learning technologies have improved
enough that it is possible to envision practical self-customizing software
and high-level tools that allow ordinary users to customize applications.
Kushmerick's research colleague was Tessa Lau. They presented the
work at the Intelligent User Interfaces Conference (IUI 2005) held January
9 to 12, 2005 in San Diego. The research was funded by IBM's T. J. Watson
Timeline: 2-3 years
TRN Categories: Data Representation and Simulation; Databases
and Information Retrieval; Internet
Story Type: News
Related Elements: Technical paper, "Automated Email Activity
Management: an Unsupervised Learning Approach," presented at the Intelligent
User Interfaces Conference (IUI '05) in San Diego, January 9-12, 2005 and
posted at www.cs.ucd.ie/staff/nick/home/research/download/kushmerick-iui05.pdf
March 9/16, 2005