Virtual computers reconfigure on the fly

By Ted Smalley Bowen, Technology Research News

Grid computing, which pieces together temporary, virtual computers from resources on the Internet, is in theory a good way to handle tough number-crunching tasks that change over time.

Grid software combines the muscle of a few or even hundreds of computers by coordinating scheduling and security across the different types of systems. These combined resources are needed to speed up scientific and engineering applications that frequently involve complicated equations and elaborate graphical simulations.

But early efforts have only been able to handle simple, relatively predictable programs, rather than the complex, custom programs run by scientists, engineers, and their ilk.

What is lacking is a reliable mechanism for maintaining sufficient compute power for the duration of a virtual Grid computer's tasks. Grid applications need to be able to monitor the resources and performance of the systems that fuel them and switch to other appropriate systems when the original contributors fail to meet their requirements.

Toward this end, a group of researchers at Argonne National Laboratory, the University of California at Berkeley, the University of Chicago, and the Max Planck Institute for Gravitational Physics has developed software that reconfigures a virtual Grid computer on the fly in order to keep it humming.

This adaptive approach is designed to help existing Grid computing software address the compute power problem. "Grid computing must be adaptive, because... one is required to operate in an environment about which one has imperfect knowledge and that has dynamically varying characteristics," said Ian Foster, a professor of computer science at the University of Chicago, and an associate director in the Department of Computer Science at Argonne National Laboratory in Argonne, IL.

The researchers' software uses notification and event services to determine when things change, said Foster.

To create the system, the researchers started with the Cactus set of Grid computing tools, which allow programmers to run groups of calculations in parallel across multiple computers that can range from PCs to supercomputers. The researchers also used the Globus toolkit to provide Grid resource discovery, access, location, migration and security functions.

The researchers added programs for adapting applications to run on different types of computer systems, for detecting drops in performance, for finding appropriate resources, and for handling the migration process.

They also added software that keeps tabs on the progress of a given program through a series of checkpoints in order to carry that information over to new systems as they are recruited.

The checkpoints save a snapshot of the computation in a form that permits the job to be shifted to another system, even one that has a very different architecture and operating system, or different amounts of disk space and memory, said Foster.
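The checkpointing idea can be illustrated with a minimal Python sketch. It is not the researchers' code: the function names, the JSON file format, and the toy running-sum computation are assumptions made for illustration. It shows only the general pattern of saving state in a machine-independent form and resuming from it on another system.

```python
import json
import os
import tempfile


def save_checkpoint(path, step, state):
    """Save the step counter and state as JSON (hypothetical format).

    JSON is plain text, so the file does not depend on the byte order or
    word size of the machine that wrote it. The data is written to a
    temporary file and then renamed, so an interrupted write never
    replaces a good checkpoint with a partial one.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump({"step": step, "state": state}, handle)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return (step, state) from a checkpoint written by save_checkpoint."""
    with open(path) as handle:
        data = json.load(handle)
    return data["step"], data["state"]


def run(path, total_steps, checkpoint_every=10):
    """Toy computation: a running sum that resumes from a checkpoint if present."""
    if os.path.exists(path):
        step, state = load_checkpoint(path)
    else:
        step, state = 0, {"total": 0}
    while step < total_steps:
        state["total"] += step
        step += 1
        if step % checkpoint_every == 0:
            save_checkpoint(path, step, state)
    return state["total"]
```

In this sketch, calling run with the same checkpoint path on a second machine picks up at the last saved step rather than starting over. A real scientific code would checkpoint large arrays in a portable binary format rather than JSON.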

The various systems involved in a virtual Grid computer using the researchers' software must perform to the standards of a contract between the user and the systems providing the compute power. If a contract is broken, the software finds other resources and reconfigures the virtual computer.

The researchers evaluated their software on several Grid testbeds, loading down virtual Grid computers with more and more tasks until performance dropped by more than 10 percent. They set the software so it found alternative resources that gave the bogged-down virtual computer more compute power after three such drop-offs.
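The trigger used in the evaluation can be sketched in a few lines of Python. The class and field names are hypothetical, and measuring performance as time steps per second is an assumption. Only the 10 percent threshold and the count of three drop-offs come from the description above.

```python
from dataclasses import dataclass


@dataclass
class ContractMonitor:
    """Counts contract violations from per-time-step measurements (illustrative)."""
    baseline_rate: float          # expected time steps per second
    max_drop: float = 0.10        # a drop of more than 10 percent is a violation
    violations_allowed: int = 3   # request migration at the third violation
    violations: int = 0

    def record_step(self, measured_rate: float) -> bool:
        """Record one measurement; return True once migration should be requested."""
        if measured_rate < self.baseline_rate * (1.0 - self.max_drop):
            self.violations += 1
        return self.violations >= self.violations_allowed


monitor = ContractMonitor(baseline_rate=10.0)
rates = [10.0, 9.5, 8.0, 8.9, 9.2, 8.5, 7.0]
decisions = [monitor.record_step(rate) for rate in rates]
```

With these made-up measurements, the third rate below 9.0 steps per second arrives at the sixth measurement, which is where the monitor first returns True. This sketch counts violations cumulatively. Whether the actual software requires them to be consecutive is not stated in the description above.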

The researchers' system currently requires the operators to monitor this performance manually. "We obtain per-time-step measurements, and monitor according to a user-specified definition of what forms a contract violation. Future plans have us doing this automatically," Foster said.

The experiment involved no scheduling software, although eventually computers participating in Grid applications will be subject to random use and will need to prioritize their resources. "So far, we assume no scheduling technology: applications discover unloaded servers, and initiate computation there if authorized," said Foster.

The researchers plan to add asynchronous notification of resources, meaning an application can begin at a lower speed or fidelity, and improve if and when more resources become available, he said.

The software is powerful and generic enough that it can accommodate many different variables for determining application migration, said Henri Casanova, a researcher in the Grid computing lab at the University of California San Diego. It is likely to "motivate Grid application developers to architect their applications in ways that will support migration," he said.

In doing so it will open up the interesting questions of how to decide whether to trigger migration, and when and where to do it, Casanova said.

The work also opens the way for a more detailed exploration of Grid computing issues like scheduling, resource selection, and application adaptability, he said.

In general, the scheme is best suited for large applications that must run over long periods, said Casanova. In large scientific simulations that consume large amounts of tightly coordinated resources, migration will be useful if the cost of migrating is not greater than the cost of running the application on potentially sub-optimal resources, he said.
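Casanova's criterion is stated qualitatively, but the trade-off can be put in rough numbers. The following Python sketch assumes a simple model that is not from the study: a fixed one-time migration cost in seconds and a constant processing rate on each set of resources. The function name and parameters are hypothetical.

```python
def should_migrate(remaining_steps, current_rate, new_rate, migration_seconds):
    """Return True if migrating is expected to finish sooner than staying put.

    Rates are in time steps per second. migration_seconds is the assumed
    one-time cost of checkpointing, transferring, and restarting the job.
    """
    time_if_staying = remaining_steps / current_rate
    time_if_moving = migration_seconds + remaining_steps / new_rate
    return time_if_moving < time_if_staying


# Long job: staying takes 2,000 s; moving takes 600 s + 1,000 s = 1,600 s.
long_job = should_migrate(10_000, current_rate=5.0, new_rate=10.0, migration_seconds=600.0)

# Short job: staying takes 200 s; moving takes 600 s + 100 s = 700 s.
short_job = should_migrate(1_000, current_rate=5.0, new_rate=10.0, migration_seconds=600.0)
```

Under these assumed numbers the long job benefits from moving and the short one does not, which is consistent with the point that migration suits applications that run for long periods.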

Foster’s colleagues in the study were Gabrielle Allen, Gerd Lanfermann, Thomas Radke and Ed Seidel of the Max Planck Institute, David Angulo and Chuang Liu of the University of Chicago, and John Shalf of Lawrence Berkeley National Laboratory. The work is slated to appear in an upcoming issue of the International Journal of Supercomputer Applications. The study was funded by the National Science Foundation (NSF).

Timeline: Now
Funding: Government
TRN Categories: Distributed Computing; Applied Computing; Supercomputing
Story Type: News
Related Elements: Technical paper, “The Cactus Worm: Experiments with Dynamic Resource Discovery and Allocation in a Grid Environment”, slated for publication in November in the International Journal of Supercomputer Applications.



November 28, 2001

© Copyright Technology Research News, LLC 2000-2006. All rights reserved.