CS598 SS Probabilistic Methods for Biological Sequence Analysis (Spring 2007)
Instructor: Saurabh Sinha
Schedule (tentative)
Crash course for project
- (01/16) Basic molecular biology: biological processes and sequence features, the genome and its annotation. Powerpoint Slides
- (01/18) More detailed molecular biology. The importance of gene regulation. Fruit fly segmentation. Binding sites and motifs. Modules and their composition. Tandem repeats. Bioinformatics goals for this system: motif finding (supervised, ab initio), module finding (supervised, ab initio), regulatory pathways and networks. Powerpoint Slides
- (01/23) Module finding. Hidden Markov models (DEKM 3, BB 8.2). Expectation Maximization (DEKM 11.6). Powerpoint Slides
- (01/25) Evolution and tree of life. Comparative genomics. Alignments. Multi-species motif finding (alignment free). Parsimony and its probabilistic interpretation. (Footprinter, and probability calculations.) Powerpoint Slides
Sequence Alignment
- (01/30) Sequence similarity, dynamic programming (local alignment and the importance of expected negative score, DEKM 2.3 Smith-Waterman), affine gap penalty (DEKM 2.4). Paper presentation:
- Brudno et al. LAGAN and Multi-LAGAN: efficient tools for large-scale multiple alignment of genomic DNA, Genome Research 2003 Apr;13(4):721-31. Powerpoint Slides
- (02/01) Quadratic time complexity of pairwise alignment, and the need for something faster. Seed-based local alignments. BLAST. BLAST statistics (DEKM 2.5, 2.7): extreme value distribution.
- (02/06) The Bayesian take on pairwise alignment. (DEKM 2.6)
- (02/08) Multiple alignment (DEKM 6.5). Algorithms: sum of pairs, progressive alignment, DIALIGN. Profile HMMs (proteins) and Pfam domains.
- (02/15) Paper presentation (by Hareesh Gadde):
- Do et al. ProbCons: Probabilistic consistency-based multiple sequence alignment. Genome Res. 2005 Feb;15(2):330-40.
Motif finding
- (02/20) Weight matrix, relative entropy. Paper presentation (Manish Agrawal):
- Lawrence et al. Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science. 1993 Oct 8;262(5131):208-14. Powerpoint slides
- (02/22) Overrepresentation of binding site motifs. Paper presentation:
- Tompa. An Exact Method for Finding Short Motifs in Sequences, with Application to the Ribosome Binding Site Problem. ISMB 1999. Powerpoint slides
- Sinha & Tompa. A Statistical Method for Finding Transcription Factor Binding Sites. Proc Int Conf Intell Syst Mol Biol. 2000;8:344-54. Powerpoint slides
- (02/27) Paper presentation (Lyndsy Kron):
- van Nimwegen et al. Probabilistic clustering of sequences: Inferring new bacterial regulons by comparative genomics. PNAS 2002 May 28;99(11):7323-8. Powerpoint slides
Bayesian inference
- (03/01) Bayesian Inference (BB 2.3): priors, Gaussian, Gamma, Dirichlet, MAP & ML as the first level of Bayesian inference. Lagrange multipliers. Bayesian example (BB 3.1): the single die model with sequence data or with count data.
- (03/06) (Combined with the 03/08 lecture.)
- (03/08) The Bayesian approach to motif finding: Expectation-Maximization. MEME and Stubb/PhyME. The statistical mechanics connection (BB 3.2.1 - 3.2.4, 4.4).
Module Detection
- (03/13) Hidden Markov Models for module detection. Paper presentation (Wen-Ting Lin):
- Sinha et al. A Probabilistic Method to detect Regulatory Modules. In Intelligent Systems for Molecular Biology 2003.
- Rajewsky et al. Computational detection of genomic cis regulatory modules, applied to body patterning in the early Drosophila embryo. BMC Bioinformatics 3:30 (2002).
- (03/15) Mid-term examination.
- (03/27) Paper presentation (by instructor):
- Zhou, Q. and Wong, W.H. (2004), CisModule: De novo Discovery of Cis-Regulatory Modules by Hierarchical Mixture Modeling, Proc. Natl. Acad. Sci. USA, 101: 12114-12119. Powerpoint slides
- Aerts et al. Computational detection of cis regulatory modules. Bioinformatics. 2003 Oct;19 Suppl 2:II5-II14. Powerpoint slides
Evolution
- (03/29) Evolution models (DEKM 8.1 - 8.3), calculating likelihood of alignment, reversibility, Metropolis algorithm for phylogenetic tree construction (DEKM 8.4), evolutionary models with gaps (DEKM 8.5).
- (04/03) Paper presentation (Josh Smith):
- A. Siepel and D. Haussler. Combining phylogenetic and hidden Markov models in biosequence analysis. Proc. 7th Annual Int'l Conf. on Research in Computational Molecular Biology (RECOMB '03), pp. 277-286, 2003.
- (04/05) Evolutionary events - large repeat families, minisatellites, microsatellites. Gene duplications and pseudogenes. Tandem repeat detection (with statistics). Applications - sequence turnover. Implications for probabilistic sequence analysis algorithms. Powerpoint slides
- (04/10) Paper presentation (by instructor).
- (04/12) To be decided.
- (04/17) Paper presentation (by instructor): Powerpoint slides
- Coin & Durbin. Improved techniques for the identification of pseudogenes. Bioinformatics, 20 Suppl 1:I94-I100, 2004.
- Wexler et al. Finding approximate tandem repeats in genomic sequences. Proceedings of the eighth annual international conference on Research in computational molecular biology, 2004.
Evolution and Motif finding
- (04/19) Paper presentation (by instructor): Powerpoint slides
- Siddharthan et al. PhyloGibbs. (Full reference to be provided later.)
Population Genetics
- (04/24) (Combined with the 04/26 lecture.)
- (04/26) Wright-Fisher model: random drift, with selection only, with mutations only. Coalescent theory. Neutral sequence.
Project presentations
- (05/01)