Software sorts Web data

By Kimberly Patch, Technology Research News

A research consortium is putting the finishing touches on a set of software programs designed to do for data comparison what the Web has already done for sharing documents.

Today, comparing sunspot activity with heart attack data or cross-referencing tire safety data with ZIP codes is possible, but it's not a quick, point-and-click task. It involves finding relevant sets of data, scraping them into a program, converting them to comparable units, and futzing with the data to make it readable.

For the past four years, Project DataSpace, a university, government and corporate consortium, has been building an infrastructure to change all that. "The idea is there's a lot of data out there but it's not always easy to see how it's related to other data," said Robert Grossman, director of the Laboratory for Advanced Computing at the University of Illinois, CEO of Magnify Inc. and DataSpace project leader.

The DataSpace project involves several pieces of software. The four major parts are DataSpace Transfer Protocol (DSTP), a protocol for moving columns of data over the Web; Predictive Model Markup Language (PMML), a set of tags for marking columns of data; and open source DSTP client software and DSTP server software, which allow computers to exchange such data. DSTP is the data equivalent of Hypertext Transfer Protocol (HTTP), and PMML is the data equivalent of Hypertext Markup Language (HTML).

Data needs its own markup and transfer protocols because two columns of data must share common units in order to be compared meaningfully, said Grossman. The DataSpace format is like email versus a fax, he said. "If you send a fax, you see the same image but you can't manipulate it. When data is in HTML it cannot be immediately manipulated even though the information is visually apparent. You need some protocol that has some format that is understood by your application," Grossman said.
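The point about common units can be sketched in a few lines of Python. This is an illustration only, not DSTP or PMML: the `Column` class, the `to_unit` function and the `TO_METERS` table are hypothetical names invented here, and the sketch assumes each column carries its unit as metadata that an application can read.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Column:
    """A column of numbers plus the unit it is measured in (hypothetical structure)."""
    name: str
    unit: str
    values: List[float]

# Conversion factors from each supported length unit to meters.
TO_METERS = {"m": 1.0, "km": 1000.0, "mi": 1609.344}

def to_unit(col: Column, target: str) -> Column:
    """Return a copy of the column with its values expressed in the target unit."""
    factor = TO_METERS[col.unit] / TO_METERS[target]
    return Column(col.name, target, [v * factor for v in col.values])

# Made-up sample values, for illustration only.
distances_km = Column("distance", "km", [1.0, 2.5])
distances_m = to_unit(distances_km, "m")
```

Because the unit travels with the numbers, a program can convert the column without a person reading a page to work out what the figures mean.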

The DataSpace protocol addresses the problem by adding universal keys to columns of data. A user can compare data that has keys in common regardless of its location and format. For example, with ZIP code and date keys in common, "you can compare diabetes related deaths per ZIP code with average income and average education per ZIP code," said Grossman. The DataSpace project includes many standard keys and also allows users to find more.
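Comparing columns that share a key amounts to a join on that key. The Python sketch below shows the idea under stated assumptions: the `join_on_key` function, the dictionary layout and all of the numbers are invented for illustration and are not part of the DataSpace software or its protocol.

```python
from typing import Dict, List

# A column maps a universal key value (here a ZIP code) to a measurement.
Column = Dict[str, float]

def join_on_key(*columns: Column) -> List[tuple]:
    """Line up columns by shared key, keeping only keys present in every column."""
    shared = set(columns[0])
    for col in columns[1:]:
        shared &= set(col)
    return [(key, *(col[key] for col in columns)) for key in sorted(shared)]

# Made-up sample data; the two columns could come from different sources.
deaths_per_zip = {"60607": 12.0, "60614": 7.0, "60637": 15.0}
income_per_zip = {"60607": 48000.0, "60614": 91000.0, "10001": 67000.0}

rows = join_on_key(deaths_per_zip, income_per_zip)
for zip_code, deaths, income in rows:
    print(zip_code, deaths, income)
```

Only the ZIP codes that appear in both columns survive the join, which is why a shared key matters more than where the data is stored or how it is formatted.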

One of the four pieces has been completed each year of the project. With the advent of the transfer protocol, DataSpace is ready to be driven, said Grossman, who compared the project to a car that now has four working wheels.

"It's ready to be used. The [PMML] language is being supported by [Microsoft, IBM, Oracle and NCR]. The protocol is well-defined. We have shown through Java applications that we're giving out on the Web that it's useful and easy to use and we're encouraging people to... adapt them to needs of their own," Grossman said.

The scientists are continuing to stress test the system. One big road test is slated for November, when 10 applications involving about 50 scientists and engineers will debut at the Supercomputing tradeshow in Dallas. "They'll be [showing] everything from looking for patterns in genomic data to looking for patterns in business data to looking for patterns in engineering data to looking at the Firestone tire data [related to] fatalities," said Grossman.

The project has involved about 50 scientists and engineers a year from various universities, companies and government labs, said Grossman. The most recent work included a testbed at the University of Illinois, with funding from the National Science Foundation and input from a dozen other entities. Software, demonstrations and a full list of participants are available at www.dataspace.net.

Timeline:   Now
Funding:   Government, University, Corporate
TRN Categories:   Internet
Story Type:   News
Related Elements:   Project DataSpace web site: www.dataspaceweb.net




September 20, 2000

© Copyright Technology Research News, LLC 2000-2006. All rights reserved.