PROBABILISTIC ALIGNMENT
We are developing new, efficient ways to align multiple DNA sequences using maximum likelihood and Bayesian approaches. We use a simple evolutionary model in which the events are substitutions, insertions and deletions, and the parameters are their rates and, for indels, their length distributions. The goal is to infer the most plausible explanation of the data by training various aspects of the model. The big challenge is to do all this while keeping the learning algorithms efficient; after all, we have to scale up to megabase pairs of sequence!
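As a small illustration of what "training" one aspect of such a model can look like, the sketch below fits an indel length distribution by maximum likelihood. It assumes, purely for illustration, that indel lengths follow a geometric distribution with extension probability `q`; the text above does not say which length distribution is used, and the function names and the example lengths are hypothetical.

```python
import math


def fit_geometric(lengths):
    """Return the ML estimate of q for P(L) = (1 - q) * q**(L - 1), L >= 1."""
    if not lengths or min(lengths) < 1:
        raise ValueError("need at least one indel, and every length must be >= 1")
    # Setting the derivative of the log-likelihood to zero gives q = 1 - n / sum(L).
    return 1.0 - len(lengths) / sum(lengths)


def log_likelihood(lengths, q):
    """Log-likelihood of the observed indel lengths for extension probability q."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    return sum(math.log(1.0 - q) + (length - 1) * math.log(q) for length in lengths)


# Made-up indel lengths, for illustration only (assumption, not real data).
observed_lengths = [1, 2, 3, 2]
q_hat = fit_geometric(observed_lengths)
```

The closed-form estimate keeps this step linear in the number of annotated indels, which matters when the input runs to megabases of sequence.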
Annotating insertions and deletions with probabilistic model-based methods: Despite a large volume of prior research on multiple alignment, we still lack tools that approach alignment from an evolutionary perspective rather than the information-theoretic one taken by most current algorithms. Insertions and deletions should be treated as evolutionary events, not simply as the absence of conservation, as their status as "gaps" in the alignment implies. We have recently published preliminary work in this direction. Our program, Indelign, is specialized for annotating insertions and deletions in a given multiple alignment, and offers a powerful alternative to the parsimony-based approaches used so far.
Probabilistic alignment of cis-regulatory modules: Cis-regulatory modules (CRMs) contain multiple binding sites, and during their evolution some of these binding sites remain conserved in (relative) position. This information should improve the quality of alignment of such cis-regulatory sequences. We have demonstrated how this is possible within the Hidden Markov Model (HMM) framework for parsing CRMs. Our program, Morph, can align two CRMs from moderately diverged species in a binding-site-aware manner, and provides clear benefits over a traditional alignment tool.
CRE: CIS-REGULATORY EVOLUTION
Once we have built reliable tools for multiple sequence comparison, our goal is to examine the evolution of the transcription factor binding sites that control gene expression.
Sequence turnover and indels in cis-regulatory modules: Our journey into this exciting field began with a simple study of CRMs in Drosophila, in which we compared D. melanogaster and D. yakuba, with D. pseudoobscura as an outgroup. We found a surprising abundance of insertions and deletions in CRMs, as well as a highly non-neutral signature in the form of equal amounts of the two. We also found a very high coverage of short tandem repeats in these CRMs.
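To illustrate why an outgroup is needed to tell insertions from deletions, here is a minimal parsimony-style sketch. It is not the method used in the study; it only shows the basic logic. A gap in one ingroup sequence that the outgroup shares is read as an insertion in the other ingroup lineage, and a gap the outgroup does not share is read as a deletion in the gapped lineage. The function name, the lineage labels `"a"` and `"b"`, and the toy sequences are hypothetical.

```python
def polarize_indels(ingroup_a, ingroup_b, outgroup, gap="-"):
    """Classify gap runs between two aligned ingroup sequences using an outgroup.

    Returns (start, end, kind, lineage) tuples with end exclusive. A run is
    split wherever the gapped lineage or the outgroup's gap state changes.
    """
    n = len(ingroup_a)
    if not (len(ingroup_b) == n and len(outgroup) == n):
        raise ValueError("all three aligned sequences must have the same length")

    def column_state(i):
        a_gap = ingroup_a[i] == gap
        b_gap = ingroup_b[i] == gap
        if a_gap == b_gap:
            return None  # no indel between the two ingroup sequences here
        return ("a" if a_gap else "b", outgroup[i] == gap)

    events = []
    start, prev = 0, None
    for i in range(n + 1):
        state = column_state(i) if i < n else None
        if state != prev:
            if prev is not None:
                gapped, outgroup_gapped = prev
                if outgroup_gapped:
                    # Outgroup agrees with the gapped lineage: the other one gained sequence.
                    events.append((start, i, "insertion", "b" if gapped == "a" else "a"))
                else:
                    # Outgroup has sequence here: the gapped lineage lost it.
                    events.append((start, i, "deletion", gapped))
            start, prev = i, state
    return events


# Toy aligned sequences, for illustration only (assumption, not real data).
events = polarize_indels("ACGT--ACGTAC", "ACGTTTAC--AC", "ACGT--ACGTAC")
```

With a single outgroup this rule cannot detect events on the outgroup lineage itself, which is one reason a model-based annotation can be preferable.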
Binding site turnover and length constraints in CRMs: We are currently examining the selective pressures on the length and placement of CRMs by comparing the 12 recently sequenced Drosophila species. The same study combines the Morph program (see above) with the consistency transformation of the ProbCons algorithm (Do et al.) to align CRMs from the 12 Drosophila genomes and follow their evolution. We find that the "loss rate" of binding sites depends significantly on the DNA-binding specificity of the transcription factor. This work also makes a significant contribution to the visualization of "annotated multiple alignments".
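For readers unfamiliar with the consistency transformation, the sketch below shows the idea as it is commonly described for ProbCons: the match probability between positions of two sequences x and y is re-estimated by averaging, over every sequence z in the set, the product of the x-to-z and z-to-y posterior matrices, with the identity matrix used when z is x or y. This is an illustrative pure-Python version, not the code used in the study; the names `post`, `lengths` and the toy probabilities are hypothetical.

```python
def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    cols = transpose(b)
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in a]


def lookup(post, x, y, lengths):
    """Posterior match matrix for (x, y), whichever orientation is stored."""
    if x == y:
        return identity(lengths[x])
    if (x, y) in post:
        return post[(x, y)]
    return transpose(post[(y, x)])


def consistency_transform(post, lengths):
    """Apply one round of the transformation to every stored sequence pair."""
    names = list(lengths)
    out = {}
    for (x, y) in post:
        acc = [[0.0] * lengths[y] for _ in range(lengths[x])]
        for z in names:
            prod = matmul(lookup(post, x, z, lengths), lookup(post, z, y, lengths))
            for i, row in enumerate(prod):
                for j, value in enumerate(row):
                    acc[i][j] += value
        out[(x, y)] = [[value / len(names) for value in row] for row in acc]
    return out


# Toy example: three sequences of length 1 with made-up match probabilities.
lengths = {"a": 1, "b": 1, "c": 1}
post = {("a", "b"): [[0.9]], ("a", "c"): [[0.8]], ("b", "c"): [[0.5]]}
transformed = consistency_transform(post, lengths)
```

In the toy example the a-b match is weakened because the path through c supports it only weakly, which is the intended effect: pairwise evidence is adjusted by what the other sequences imply.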