Software speeds gene comparison
By Kimberly Patch, Technology Research News
When biologists want to compare a group of individuals from one species to a group from another species in order to determine how closely the two species are related, they make comparative gene maps, a tedious process done by hand that can take weeks or months.
It takes a long time because of the large number of comparisons that need to be made. For example, to find out how far back corn and rice diverged from a common ancestor, a biologist would compare about 100 markers, or DNA segments, on each of the 10 chromosomes that make up the corn genome to 100 markers on each of the 12 chromosomes of rice. The larger the number of matches, the more recently the plants became separate species.
As a computer problem, it is difficult because simply going through those comparisons by rote, comparing everything to everything else, would result in a number of comparisons so large it would take half a page of space to write the number out: a 12 followed by 1,000 zeroes. Although the hand comparisons are tedious, humans can cut down on the work by intuitively skipping over blocks of comparisons that are not likely to produce results.
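The article does not say how that count is derived. One plausible reading, offered here only as an assumption, is that each of the roughly 1,000 corn markers could be assigned to any of the 12 rice chromosomes. The short Python sketch below works out how large the number of possible assignments would be under that reading.

```python
# Assumed counting model (not stated in the article): every corn marker
# is independently assigned to one of the rice chromosomes.
CORN_CHROMOSOMES = 10
MARKERS_PER_CHROMOSOME = 100  # "about 100" in the article
RICE_CHROMOSOMES = 12

corn_markers = CORN_CHROMOSOMES * MARKERS_PER_CHROMOSOME

# Python integers have arbitrary precision, so this is the exact count.
assignments = RICE_CHROMOSOMES ** corn_markers
digits = len(str(assignments))

print(f"{corn_markers} markers give a {digits}-digit number of assignments")
```

Whatever the exact counting model, a number with more than a thousand digits is far beyond what any computer could enumerate one case at a time.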
A group of Cornell researchers has addressed the problem by using a dynamic programming algorithm similar to those used for analyzing human and programming languages. The algorithm considers all the possibilities without actually going through them by determining what to compare next based on the results it has amassed so far. "What it has looked at so far... will limit what it considers next," said Cornell applied mathematics graduate student Debra Goldberg.
The algorithm produces the comparison results in a few minutes of computer time, Goldberg said.
The quick results may prove especially useful because the markers used to compare chromosomes are from many individuals, and the more samples, the better the comparison. But because going through a full comparison is difficult, new data from more individuals is not incorporated into the maps very quickly. "We felt we [could] do something that would speed up the process and then revise it as more data came in," said Goldberg.
The problem was also a difficult one because the practice of comparing chromosomes is not standardized.
This is because gene mapping is of necessity a subjective process. The genes the process is trying to compare have been shuffled and reshuffled over millions of years. Finding evidence of common markers in different species involves a good bit of detective work.
Chromosomes are very long strings of DNA, the molecules that contain sequences of the four bases that make up the genetic material of life. Genes are specific regions, or sequences of bases, that a cell uses to make certain proteins.
The order of genes on chromosomes can change in several ways. During the course of reproduction, the progeny gets a mix of its parents' chromosomes. During the course of cell division, portions of chromosomes can break off and recombine in the wrong order. Common forces like radiation can damage DNA, causing further changes.
All these variables can make comparisons difficult. For example, if a gene is common to two species, but has also been copied within a single species, "you'll have trouble figuring out how to match them up," said Dannie Durand, an associate professor of biological sciences at Carnegie Mellon University.
Mice and humans both use hemoglobin proteins to carry oxygen in the blood, and these proteins are coded by a certain gene. Humans, however, have at least eight copies of the gene, four of which have mutated into genes that are only turned on while a fetus is in the womb's low oxygen environment. "You can see how if you didn't have a complete dataset you might think you found two analogous genes when you haven't," said Durand.
For their algorithm, the Cornell researchers decided on a set of standards. This is likely both a plus and a minus. One of the reasons gene maps are not standardized is that biologists studying different organisms care about different aspects of the data. A given approach "may work very well for some types [of] application problems but not others," simply because the scientists are asking different questions about very different species, said Durand.
The Cornell researchers' program has just one variable, which allows biologists to choose how closely they would like groups of bases to match in order to be called a match. "We came up with some simple rules that are general enough that they're not going to violate any biologist's sense of what is correct," said Goldberg. The idea, she said, was to balance accuracy with parsimony.
Parsimony requires not changing things constantly. "We don't want to be flip-flopping back and forth for every gene in our labeling," said Goldberg.
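To give a feel for how dynamic programming can weigh accuracy against parsimony, here is a minimal Python sketch. It is not the Cornell group's algorithm. The cost model (one unit for every marker whose label disagrees with its own data, plus a `switch_penalty` for every change of label along the chromosome), the function names and the sample data are all assumptions made for illustration. The sketch labels a row of markers in a single pass and checks the result against a rote search of every labeling.

```python
from itertools import product


def label_markers(observed, labels, switch_penalty):
    """Return (cost, labeling) with the lowest assumed cost.

    observed[i] is the chromosome that marker i matches on its own.
    Hypothetical cost: mismatches + switch_penalty * label changes.
    """
    # best[lab] = (cost, path) of the cheapest labeling so far ending in lab
    best = {lab: (float(observed[0] != lab), [lab]) for lab in labels}
    for obs in observed[1:]:
        step = {}
        for lab in labels:
            # Cheapest way to arrive at lab, paying the penalty on a change.
            cost, path = min(
                (
                    (c + (switch_penalty if prev != lab else 0.0), p)
                    for prev, (c, p) in best.items()
                ),
                key=lambda item: item[0],
            )
            step[lab] = (cost + (obs != lab), path + [lab])
        best = step
    return min(best.values(), key=lambda item: item[0])


def brute_force_cost(observed, labels, switch_penalty):
    """Lowest cost found by trying every labeling (tiny inputs only)."""
    def cost(labeling):
        mismatches = sum(o != lab for o, lab in zip(observed, labeling))
        switches = sum(a != b for a, b in zip(labeling, labeling[1:]))
        return mismatches + switch_penalty * switches
    return min(cost(c) for c in product(labels, repeat=len(observed)))


# Made-up data: ten markers, each with the chromosome it matches by itself.
observed = [3, 3, 3, 7, 3, 3, 5, 5, 5, 5]
labels = [3, 5, 7]

cost, labeling = label_markers(observed, labels, switch_penalty=1.5)
print(cost, labeling)
```

The single pass does work proportional to the number of markers times the square of the number of labels, while the rote search grows exponentially with the number of markers. In this made-up example the stray 7 is absorbed into the surrounding run of 3s, and the sustained run of 5s gets its own label.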
It is tricky to know exactly what means what in gene comparisons. For example, a single instance of a certain portion of maize chromosome six matching a certain portion of rice chromosome three may be misleading, Goldberg said. "Perhaps this is evidence of an ancient linkage group, but just a single marker is not enough evidence that we would want to commit to such a thing. There're many other ways that any one gene might have a match in another genome, so we don't want to assign too much weight to any one piece of data," she said.
At the same time, if there are several instances in comparisons among individuals of different species where six markers in a row match, a seventh marker does not match, and then the next four markers match, it may be enough evidence to call the region as a whole a match. This is because events like mutations could have independently changed a single marker on one of the plants after they diverged from a common ancestor. Decisions like these "can be thought of as a smoothing function," said Goldberg.
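The six-one-four example can be put in numbers with a small Python sketch. The cost model is an assumption made for illustration, not the researchers' rule: each marker whose label disagrees with its own match costs one unit, and each change of label along the chromosome costs a `switch_penalty`. The chromosome numbers are made up.

```python
def labeling_cost(observed, labeling, switch_penalty):
    """Hypothetical cost: mismatches plus a penalty per label change."""
    mismatches = sum(o != lab for o, lab in zip(observed, labeling))
    switches = sum(a != b for a, b in zip(labeling, labeling[1:]))
    return mismatches + switch_penalty * switches


def prefers_one_region(observed, smoothed, switch_penalty):
    """True when the smoothed labeling is cheaper than following every marker."""
    literal = list(observed)  # label each marker exactly as its data suggests
    return (labeling_cost(observed, smoothed, switch_penalty)
            < labeling_cost(observed, literal, switch_penalty))


# Six matching markers, one stray marker, then four more matches.
observed = [3] * 6 + [8] + [3] * 4
one_region = [3] * 11

print(prefers_one_region(observed, one_region, switch_penalty=1.0))  # True
print(prefers_one_region(observed, one_region, switch_penalty=0.4))  # False
```

Under this assumed cost, calling the whole stretch one region costs a single mismatch, while following the stray marker costs two label changes. The region is smoothed over whenever the penalty for a change is more than half the cost of a mismatch.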
In some ways the algorithm produces simpler comparisons than the more complicated hand-produced comparison maps.
On the other hand, because they follow the same rules, comparison maps produced by the algorithm can be quickly and easily compared to each other. "A formalized set of rules [means research] groups can compare different comparative maps," said Goldberg.
The algorithm may also allow comparisons in areas where it is too costly to do them otherwise. The data comparing humans and mice, for instance, has been looked at thoroughly because of its importance in human medicine. "But in many other species we don't have the resources, the money or biologists' time," Goldberg said.
The researchers are working on providing a better interface to the program that will allow researchers to more easily extract data from the comparison by asking questions, said Goldberg.
The algorithm will be available later this year for researchers to use, she said.
Goldberg's research colleagues were Jon Kleinberg and Susan McCouch of Cornell. They presented their research at the Plant and Animal Genome Conference in San Diego, January 13-17, 2001. The research was funded by the National Science Foundation, the Packard Foundation, the U.S. Department of Agriculture (USDA), the Cooperative State Research, Education, and Extension Service, the Alfred P. Sloan Foundation, and the Office of Naval Research (ONR).
Funding: Government, Private
TRN Categories: Applied Computing; Data Structures and Algorithms
Story Type: News
Related Elements: Technical paper, "Automated Comparative Mapping," presented at the Plant and Animal Genome Conference in San Diego, January 13-17, 2001; Technical paper, "Algorithms for Constructing Comparative Maps," presented at the Gene Order Dynamics, Comparative Maps and Multigene Families (DCAF) conference in September, 2000 in Sainte-Adèle, Québec. The paper is posted at www.cam.cornell.edu/~debra/research.html
February 14, 2001
© Copyright Technology Research News, LLC 2000-2006. All rights reserved.