University of Illinois at Urbana-Champaign

How computers can revolutionize molecular biology


Saurabh Sinha

Saurabh Sinha is using computer science to unlock one of life's biggest mysteries: How genes function. Genes can be represented as strings of characters, and computer scientists are very good at studying strings. Sinha's research combines statistics, machine learning, and algorithm theory to help piece together the genetic puzzle. But to understand this work, some background on the biology is necessary.

The puzzle begins with DNA

DNA (deoxyribonucleic acid) is present in every cell. Its structure is like a thread made of beads, with billions of beads in every thread. Each bead is one of four types, so whole threads can be represented as strings over an alphabet of four characters. DNA threads are written as strings of letters A, C, G, and T.

A gene is a substring of DNA with length about 1,000, and there are thousands of them. We know that genes somehow encode information about us (how we look, how we act, how we function on a daily basis), but we do not yet understand how all this is done.

The next piece of the puzzle is proteins. Proteins are big molecules, consisting of many atoms. Unlike DNA, they are not threads but complicated three-dimensional structures. They do almost all of the work in our cells. For example, if a food particle comes in and has to be broken up and used for energy generation, a protein gets the job done.

Genes are responsible for producing proteins. The genes act as casts or templates for protein production. Each protein is produced by one gene. It is a one-to-one match. This is an interesting phenomenon, Sinha pointed out, because genes are linear and proteins are not. "To study an organism's function," said Sinha, "you have to study the proteins. And to study the proteins, you have to study the genes."

Sinha first described the task of finding where the genes are in DNA as follows: Imagine a textbook in a different language in which all punctuation and spaces are removed. If you are asked to, say, find sentences that talk about the day's weather in that book, how would you go about doing that without knowing the language? There might be ten sentences related to weather scattered throughout the book. This is the kind of data extraction problem that Sinha is facing. "Finding where the genes are is the first problem," he said, "but we are at least beginning to understand some of the features of genes. This has led to a harder problem: Gene regulation."

A cell in the nose differs from a cell in the eye because of the proteins present. The DNA is the same. Different cells produce different proteins, and this is what makes us all different, from humans to monkeys to flies and bees. The difference is not at the gene level; rather, it is the proteins that are produced differently. Different cells have different proteins that are active.

There is a regulatory mechanism that turns a gene on or off. If the gene is on, it will produce its corresponding protein in that cell. This leads to diversity of animal forms. The regulatory mechanism is encoded near the gene itself, and the action happens in a nearby substring, about 10 characters long, called the regulatory site. This short string acts as a switch that turns the nearby gene on or off. To complicate matters, minor variations in the site do not interfere with the mechanism. Genes can be measured experimentally to see if they are active; thousands of genes can be measured simultaneously. (This is done with an instrument called a "DNA microarray chip.") If ten genes are active in an orchestrated or synchronized way, it can be theorized that something common is regulating them, and perhaps they all have the same short string (regulatory site) near them. To find this common site, we can look for statistically overrepresented words near the genes. Enter the computer scientist. "That is the basic idea of all my work," Sinha said, after having explained the biological processes involved. "Sometimes these sequences are like loosely organized sentences. In that case, I look for bunches of words."
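The idea of looking for overrepresented words can be illustrated with a small Python sketch. This is not Sinha's actual method, only a minimal illustration under stated assumptions: the function names, the toy sequences, the word length of 6, and the simple smoothed ratio score are all hypothetical choices made for this example. It counts how many sequences near co-regulated genes contain each short word, and compares that with a background set of sequences.

```python
from collections import Counter


def count_words(sequences, k):
    """Count, for each length-k word, how many sequences contain it at least once."""
    counts = Counter()
    for seq in sequences:
        # A set ensures a word is counted once per sequence, however often it repeats.
        words = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        counts.update(words)
    return counts


def overrepresented_words(coregulated, background, k, pseudocount=1.0):
    """Rank words by how much more often they occur near the co-regulated genes.

    The score is a smoothed ratio of occurrence rates (an illustrative choice,
    not a formal statistical test). Higher means more overrepresented.
    """
    fg = count_words(coregulated, k)
    bg = count_words(background, k)
    scores = {}
    for word, n in fg.items():
        fg_rate = (n + pseudocount) / (len(coregulated) + 2 * pseudocount)
        bg_rate = (bg[word] + pseudocount) / (len(background) + 2 * pseudocount)
        scores[word] = fg_rate / bg_rate
    # Sort by descending score, breaking ties alphabetically for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Toy data invented for illustration: three made-up sequences "near" genes
# assumed to be active together, and three made-up background sequences.
coregulated = ["AATGACGTCC", "GGTGACGTAA", "CCTGACGTGG"]
background = ["AACCGGTTAA", "GGTTAACCGG", "CCAAGGTTCC"]

ranked = overrepresented_words(coregulated, background, k=6)
print(ranked[0])  # the highest-scoring word and its score
```

In this toy data, the word shared by all three co-regulated sequences rises to the top of the ranking. The sketch matches words exactly; since minor variations in a real regulatory site do not interfere with the mechanism, a realistic search would also need to tolerate mismatches and use a proper statistical model of the background.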

Sinha is also interested in how genes and regulatory regions differ among species, for example between humans and chimpanzees or between two different kinds of flies. "The genes don't change much," he said, "but the regulatory regions are quite different. So I study the evolution of these things and try to find patterns in how they evolve. If we can understand these patterns and how sequences evolve, we can understand how evolution as a phenomenon works."

Sinha is working on two collaborative research projects to address some of these issues: one is with honey bees, and one is with flatworms.

The social behavior of honey bees: Nature or nurture?

The social behavior of bees mirrors some aspects of the social behavior of all living creatures, including humans. Perhaps we can learn from them. Sinha is working with biologists Gene Robinson and Charlie Whitefield, both UIUC professors, to learn how the social behavior of bees is affected, or even determined, by genes. One way we know that bees are social is that they will gather food even if they are not hungry themselves. Another is that they will nurse young larvae even if they are not their own offspring. The bees are thinking more about their role in the colony. Social behavior is a function of both genes and the environment. One hypothesis is that the environment, perhaps through exposure to a chemical, can turn on certain genes in the body, which might prompt another gene to turn on, and a whole cascade of events will happen inside the cell. The final result might govern the organism's social behavior. This effect has been demonstrated experimentally among honey bees. Bee nurses who had been fed a certain chemical became foragers; their social behavior could be controlled by artificially injected inputs. "The question is," said Sinha, "what is mediating this process? What happens inside the cell? Some mechanism is turning genes on and off, so we are looking at the honey bee genome and microarray chip snapshots of the bee's brain, integrating the gene sequence with gene activity in the brain. This is very exciting for us."

Can the flatworm unlock the secret of life?

In this new research trajectory, Sinha is working with Professor Phil Newmark of UIUC to examine the genetic basis of "regeneration." What is amazing about the flatworm is that if you cut it in two, anywhere along its body, it will regrow what's missing, including the nervous system, thereby regenerating itself. Most creatures start as one cell that divides and continues to divide until it becomes a whole adult body. The flatworm is somehow able to do something similar even in adulthood. What is going on? Again, Sinha is looking at the gene sequence and snapshots of gene activity to find out. And if this work doesn't immediately unlock the secret of life, it does have practical applications here and now. For example, it turns out that chemical insecticides work via similar mechanisms. Insects adapt to withstand these chemical assaults by evolving into different strains. If we understand the mechanism, perhaps we can manage the strains that have become resistant.

"So many things work by serendipity," said Sinha, pointing out how many medicines derived from plants were discovered to work quite by accident. "What this new technology allows us to do is to study the details in a systematic, high-throughput manner. All that I study could be done by card-carrying biologists with microscopes and traditional techniques, but they might arrive at one development in twenty years. With a computer, we can make predictions a thousand times faster than that. Computers are very adept at making pattern discoveries and correlations between different data. My job is to help them do this effectively and efficiently. This is a completely new era in science, and computers are leading the way."

Written by Judy Tolliver, March 24, 2006


--
Last Modified August 07 2006 09:02:19.


Department of Computer Science, Thomas M. Siebel Center for Computer Science, 201 N Goodwin Ave,
Urbana, IL 61801-2302. The Department is part of the College of Engineering at the University of Illinois at Urbana-Champaign. Contact academic@cs.uiuc.edu with academic questions
or webmaster@cs.uiuc.edu with questions or comments on this page.