Cis-regulatory modules (CRMs) that perform similar functions should share binding sites for the same transcription factors. Often these binding sites are unknown, and the relevant transcription factor motifs may be unknown as well. Nevertheless, the shared binding sites should affect the statistical properties of functionally related sequences, and this is what we seek to exploit in developing alignment-free measures of similarity between regulatory sequences. Alignment-free sequence comparison can serve two purposes: (i) given the cis-regulatory modules in one species, to discover their orthologs in a highly diverged species (e.g., from fruit fly to mosquito), and (ii) given the cis-regulatory modules belonging to a pathway, to find other CRMs of that pathway in the same species.
The D2Z score: In the following paper, we developed the statistics for such an alignment-free measure of similarity between regulatory sequences. The basic idea is to count the short words (k-mers) shared between two given sequences, and to "normalize" this count so as to measure its statistical significance.
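
The word-counting idea behind the D2Z score can be illustrated with a minimal Python sketch. It counts matching k-mer pairs between two sequences (the D2 count) and converts that count to a z-score. Note the assumption: the null mean and standard deviation here are estimated by shuffling both sequences, which is a stand-in chosen for simplicity and not the statistics developed for the D2Z score itself. The function names and default parameters are hypothetical.

```python
import random
from collections import Counter
from statistics import mean, pstdev


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_a, seq_b, k):
    """Number of matching k-mer pairs: sum over words of count_a * count_b."""
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    return sum(n * counts_b[word] for word, n in counts_a.items())


def shuffled(seq, rng):
    """Return seq with its letters permuted (base composition is preserved)."""
    chars = list(seq)
    rng.shuffle(chars)
    return "".join(chars)


def d2_zscore(seq_a, seq_b, k=3, n_shuffles=200, seed=0):
    """Z-score of the observed D2 count against a shuffle-based null.

    Assumption: the null distribution is estimated empirically by
    shuffling both sequences; this is an illustrative substitute for
    an analytical mean and variance.
    """
    rng = random.Random(seed)
    observed = d2(seq_a, seq_b, k)
    null = [
        d2(shuffled(seq_a, rng), shuffled(seq_b, rng), k)
        for _ in range(n_shuffles)
    ]
    mu, sigma = mean(null), pstdev(null)
    return (observed - mu) / sigma if sigma > 0 else 0.0
```

A call such as `d2_zscore(crm_a, crm_b, k=5)` would then give a rough significance score for a pair of candidate sequences; larger values indicate more shared words than expected from base composition alone.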
Ab initio module discovery: We have used the D2Z score, together with another alignment-free similarity score, in conjunction with simulated annealing search strategies to discover CRMs in the control regions of co-expressed genes.
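
To make the search idea concrete, here is a rough sketch of a simulated annealing loop that picks one fixed-width window in each control region so that the windows share as many k-mers with each other as possible. Everything specific in it is an assumption made for illustration: the objective (summed pairwise shared-word counts instead of a normalized score), the move set (re-drawing one window start at random), the geometric cooling schedule, and all function and parameter names. Each region is assumed to be at least `width` long.

```python
import math
import random
from collections import Counter
from itertools import combinations


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def shared_words(seq_a, seq_b, k):
    """Number of matching k-mer pairs between two sequences."""
    counts_a, counts_b = kmer_counts(seq_a, k), kmer_counts(seq_b, k)
    return sum(n * counts_b[word] for word, n in counts_a.items())


def total_score(regions, starts, width, k):
    """Sum of pairwise shared-word counts over the chosen windows."""
    windows = [r[s:s + width] for r, s in zip(regions, starts)]
    return sum(shared_words(a, b, k) for a, b in combinations(windows, 2))


def anneal_windows(regions, width, k=3, steps=2000,
                   t_start=5.0, cooling=0.998, seed=0):
    """Search for one window per region that maximizes total_score."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(r) - width + 1) for r in regions]
    score = total_score(regions, starts, width, k)
    best_starts, best_score = list(starts), score
    temp = t_start
    for _ in range(steps):
        # Propose moving the window of one randomly chosen region.
        i = rng.randrange(len(regions))
        proposal = list(starts)
        proposal[i] = rng.randrange(len(regions[i]) - width + 1)
        new_score = total_score(regions, proposal, width, k)
        delta = new_score - score
        # Metropolis rule: always accept improvements, sometimes accept losses.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            starts, score = proposal, new_score
            if score > best_score:
                best_starts, best_score = list(starts), score
        temp *= cooling  # geometric cooling
    return best_starts, best_score
```

The function returns the best window starts seen during the run together with their score, so a late uphill move cannot discard a good configuration found earlier.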
Supervised CRM prediction: Given the known CRMs active in a specific tissue or stage of development, we can search near other co-expressed genes, or genome-wide, for functionally related CRMs using our alignment-free measures of similarity. This is work in progress.
CRM discovery across large evolutionary divergence: A large body of experimentally validated CRMs for Drosophila development is catalogued in the REDfly database (Gallo et al. 2006). We are interested in finding the orthologs of these CRMs, to the extent that they are conserved, in highly diverged insect species such as the mosquito, beetle, wasp and honeybee. Traditional methods for finding orthologs, which rely on genome-wide alignments, break down for this application because the non-coding genomes of these insect species do not align well. We are therefore searching for these missing orthologs in the control regions of orthologous genes, using our alignment-free measures. This is work in progress.
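
Both searches described above come down to sliding a window along a control region and scoring each window against a set of known CRMs. The sketch below shows one simple way this could look. It is only an illustration under stated assumptions: the score is the raw shared k-mer count averaged over the known CRMs (no normalization for significance), and the function names, window width and step size are hypothetical.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def scan_region(known_crms, region, width, step=10, k=3):
    """Score each window of region against a set of known CRMs.

    Returns (start, score) pairs sorted from best to worst, where score
    is the shared k-mer count averaged over the known CRMs.
    """
    profiles = [kmer_counts(crm, k) for crm in known_crms]
    hits = []
    for start in range(0, len(region) - width + 1, step):
        window = kmer_counts(region[start:start + width], k)
        per_crm = [
            sum(n * profile[word] for word, n in window.items())
            for profile in profiles
        ]
        hits.append((start, sum(per_crm) / len(profiles)))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy usage: one known CRM and a region with a matching stretch in the middle.
known = ["ACGTACGTACGT"]
region = "T" * 12 + "ACGTACGTACGT" + "T" * 12
ranked = scan_region(known, region, width=12, step=12, k=3)
best_start, best_score = ranked[0]
```

In practice the top-ranked windows would be candidates only; a normalized score and a background model would be needed before reading anything into the ranking.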