Biology's Big Project Is Computer Challenge

By Gina Kolata



June 11, 1996

The University of Washington has jumped into the DNA sequencing business in a big way, building a world-class center to study the exact order of the chemicals that make up human DNA. And the seed money, the $12 million that has allowed the university to lure the stars of science to its campus? It came from Bill Gates, chairman of the Microsoft Corp.

Experts on sequencing, the term for determining the order of a gene's building blocks, say that the connection is not surprising. In the last few years, it has become increasingly clear that the Human Genome Project, the plan to sequence human DNA and map the position of human genes, would have to turn into a project in computer science and in engineering such things as robots to do laboratory work. So computer scientists who used to spend their days devising clever methods for speeding up solutions to classic mathematical problems are now figuring out how to speed up the processing of data on DNA sequences.

"We've always known that the day would come when engineering would play a critical role" in the human genome project, said Dr. David Botstein, referring to the invasion of computer engineers and programmers into biology laboratories. "That day has arrived," he added.

The influx has not come without some resistance from biologists dismayed at the prospect of having their own subject wrested away from them, Botstein said. At first, he said, "biologists viewed it with considerable alarm and, on the part of some people, quite vocal distaste about the necessity of making it that way." But most have become resigned to the situation, Botstein said. These biologists hope, some would say in vain, that once the computer experts have processed all the information in the genome, they will go back where they came from and let biologists resume the study of life.

One of the computer scientists is in fact a biologist who spent years learning the computer side of the business. Dr. Eric Green, chief of the Genome Technology Branch at the National Center for Human Genome Research in Bethesda, Md., said, "I was drawn in for the engineering challenge."

Since he is, in fact, a biologist, albeit one with a strong computer science bent, some of his colleagues have looked askance at his current technical focus. Green said, "Some academics would say, 'You're not doing real science. You're not doing real biology.'"

Seemingly overnight, the technology became so sophisticated that the major academic players, selected a few months ago for what Botstein calls the "heavy duty sequencing," are at just six centers in the United States and four outside the country. The work has become so specialized that "it's getting very late for new start-ups," said Dr. Maynard V. Olson, a biologist on the team at the University of Washington in Seattle.

Computer scientists say that it takes a full-time commitment to remain a player in this field. Dr. James Orlin, a professor of operations research at the Massachusetts Institute of Technology, said that he had recently dropped out of the project after three years of work because he could not devote all his efforts to it. "To succeed in this area," Orlin said, "you have to devote yourself to it more or less full time."

Although biologists are still calling the shots, some see an entirely new field of science emerging as the processing of biological data becomes the domain of computer experts.

"We still do frame the problems and define the areas," said Dr. Leroy Hood, a biologist who heads the molecular biotechnology department at the University of Washington and who personally interested Gates in the project. But Hood added, "As computer scientists come in knowing more and more about biology, they will chart their own course. Once they get into it, we don't tell them what to do or where to go. They just take off."

Hood said that it had been hard for some computer scientists to appreciate the complex nature of biological problems and the creative effort it takes to solve them. When he first approached computer scientists for help, he said, they were arrogant and somewhat dismissive.

"Computer scientists would say, 'I can solve your problem -- just tell me what it is,'" Hood said. But then, he said, they would come back with a solution to an idealized problem with all the wrinkles removed, which had little to do with reality. Computer scientists, Hood said, "aren't worth anything unless they really learn the biology."

But now, Hood said, "several of them see this as an unbelievable opportunity" to solve new types of problems as they help decipher the ancient codes that evolution has embedded in the genome. Computer scientists are now starting to change the way biologists think, Hood said. They can help biologists move from the level of a single gene to the level of complex systems of genes working together, he explained.

"The future of molecular biology is studying complexity," Hood said. "The past has been studying one gene or one protein. The brain is a nice example. If we were to take one nerve cell and study it for 30 years, it wouldn't tell us one iota of how the system works."

The immediate goal of figuring out the human genome is so formidable that the old ways of the molecular biologists who work at the single-gene level are inappropriate to the task.

"We want to take the human out of the loop as much as possible," said Dr. Richard M. Karp, a leading computer scientist who joined the University of Washington team in August.

"The act of running sequencing experiments is deadly boring," Karp said. "If it is done by humans, it requires a meticulous attention to detail that few people can summon eight hours a day. The keys to making it effective are to automate the process -- to design robotic systems to do most of the work, to automate the process of scanning the data into the computer and then to use algorithms" so computers can analyze the data, he added.

One reason for the reliance on computers is the sheer scale of what is being attempted, scientists said.

"This project is huge," said Dr. Robert Waterson, a former mathematician who directs the sequencing effort at Washington University in St. Louis. The human genome consists of three billion nucleotides, the basic building blocks of DNA, arranged like beads on a string. The project aims to find out their correct order.

Waterson's group has been cutting its teeth on a project to determine the complete DNA sequence of the roundworm, Caenorhabditis elegans. The group has been working for six years with a group headed by Dr. John Sulston, director of the Sanger Center in Cambridge, England, the world's largest DNA sequencing center, and it is about halfway through, while starting on human DNA at the same time.

To give an idea of the magnitude of the task, Waterson said that if each nucleotide were one millimeter, or four-hundredths of an inch, wide, the nucleotides that make up the worm's DNA would stretch about 120 miles, the distance from St. Louis to Columbia, Mo. Those making up the human genome would stretch from St. Louis to Los Angeles, about 1,600 miles.

"In the last six years, with all this sequencing, we are not even halfway to Columbia yet," Waterson said. "And now we have the temerity to suggest that it is time to set out for L.A."

Because the human genome is so enormous, scientists had to find some way to orient themselves along its long stretches. A first step in the sequencing was to find landmarks, making it possible to have maps of the DNA with particular sequences flagged. Recently, scientists completed the first such map. Now Karp and his colleagues are trying to make it more detailed.

The information in each human chromosome -- each cell has 23 chromosome pairs -- is roughly equivalent to the information in 50 thick telephone books, Karp said. A map like the one that was recently generated gives the equivalent of 20 telephone numbers from each book. "We're trying to give a phone number on every page," Karp said of his group's effort to devise computer algorithms that can create the maps from DNA sequencing data.

But scientists still face the thorny computational problem of actually determining the sequences of DNA within the landmarks.

DNA sequencing machines, which can automatically determine the sequence of a piece of DNA, can handle only small segments, no longer than about 500 nucleotides each. So the trick is to feed small chunks of DNA to a sequencer to get the linear order of the nucleotides in those chunks, then to assemble the pieces in order.

Because of the way the DNA segments must be obtained, the sequences overlap; because of the nature of the DNA itself, many segments contain sections in which a particular sequence is repeated over and over. The scientists must figure out clever ways to take those sequences and cut out the overlaps, then put the nucleotides in order.
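To make the reassembly idea concrete, here is a minimal sketch in Python of one simple approach, a greedy merge that repeatedly joins the two fragments sharing the longest suffix-to-prefix overlap. It is an illustration only, not the method used by any of the groups mentioned here; the function names, the toy fragments and the `min_overlap` threshold are all assumptions made for the example. It also ignores sequencing errors and does nothing special about repeated sections, which is exactly where a naive approach like this one can join pieces in the wrong place.

```python
def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Return the length of the longest suffix of `left` that is also a
    prefix of `right`, or 0 if it is shorter than `min_overlap`."""
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(fragments: list[str], min_overlap: int = 3) -> list[str]:
    """Hypothetical greedy assembler: merge the best-overlapping pair
    until no pair overlaps by at least `min_overlap` characters."""
    # Drop exact duplicate fragments while keeping their original order.
    contigs = list(dict.fromkeys(fragments))
    while True:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i == j:
                    continue
                size = overlap_length(a, b, min_overlap)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            # Nothing left to merge; what remains are separate pieces.
            return contigs
        # Join the pair, writing the shared overlap only once.
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Toy fragments (made up for illustration), each overlapping the next by 3 letters.
reads = ["ATGCGT", "CGTTAC", "TACGGA"]
print(greedy_assemble(reads))
```

With real data the fragments are about 500 nucleotides long and number in the tens of thousands, so comparing every pair as this sketch does would be far too slow; that gap between the toy version and the real problem is part of what makes the algorithmic questions hard.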

Waterson said that every week, his group generated the sequences of 27,000 DNA segments, each made up of 500 nucleotides. The problem is to reassemble them, a process akin to putting together a giant jigsaw puzzle.

Even though finding the sequences of 27,000 DNA fragments each week is a tenfold improvement on the rate at which the group found sequences three years ago, it is still too slow. Waterson said that his goal was to be getting the sequences of 40,000 DNA segments a week by the end of the year. "We figure that for us to do one-third of the human genome in five to six years," he said, "we have to get reads of 80,000 to 90,000 a week."
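The arithmetic behind these figures can be worked through in a short Python sketch. The genome size, segment length and weekly counts come from the article; the assumption of 52 working weeks a year, and the function and variable names, are introduced here only for illustration. The last function compares the raw number of nucleotides read with the size of the target region, which gives a rough sense of how many times over each stretch of DNA would be read, given that the segments overlap.

```python
GENOME_NUCLEOTIDES = 3_000_000_000  # human genome size quoted in the article
READ_LENGTH = 500                   # nucleotides per sequenced segment
WEEKS_PER_YEAR = 52                 # assumption: sequencing runs every week


def raw_nucleotides_per_week(reads_per_week: int,
                             read_length: int = READ_LENGTH) -> int:
    """Nucleotides read in one week, counting overlaps more than once."""
    return reads_per_week * read_length


def implied_redundancy(reads_per_week: int, years: float,
                       target_nucleotides: int = GENOME_NUCLEOTIDES // 3) -> float:
    """Raw nucleotides read over `years`, divided by the size of the target
    region (by default one-third of the genome)."""
    raw_total = raw_nucleotides_per_week(reads_per_week) * WEEKS_PER_YEAR * years
    return raw_total / target_nucleotides


# Current rate, year-end goal, and the range Waterson says is needed.
for reads in (27_000, 40_000, 80_000, 90_000):
    print(reads, "reads/week ->", raw_nucleotides_per_week(reads), "nucleotides/week")

# Redundancy implied by the low and high ends of the stated requirement.
print(implied_redundancy(80_000, 5))
print(implied_redundancy(90_000, 6))
```

Under these assumptions, the required rate works out to reading each part of the target region many times over, which is consistent with the need to sequence overlapping segments and then reconcile them.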

One measure of the importance of the invasion of computer scientists into the human genome project is the fame of the people who have joined in.

For example, Karp is a founder of the field of theoretical computer science, a man whose name is a household word among those who study computing. Yesterday he was one of eight scientists awarded the National Medal of Science, the nation's highest science award. His decision to move from the University of California at Berkeley, where he studied classic problems in computer science, to the University of Washington, where he works on the sequencing projects, made a strong impression on computer scientists.

Karp, said Dr. Tandy Warnow, a computer scientist at the University of Pennsylvania, "is one of the most important forces in the field." She said Karp "has the most respect of any computer scientist I know," adding, "He's helping to change the ways we think about what we are doing as computer scientists."

Karp said he saw the genome project as a large part of the future for his field.

"There's a revolution occurring in biology, particularly at the molecular level," Karp said. "It's turning biology into an information science. Many biologists consider the acquisition of sequences to be boring. But from a computer science point of view, these are first-rate and challenging algorithmic questions."

Karp added that at this point, the work "is far removed from biology."

"There are these basic experimental techniques that people worked very hard to refine," Karp said. "But we think of them as black boxes that pour out all of these data."

Copyright 1996 The New York Times Company

