CS598 SS Probabilistic Methods for Biological Sequence Analysis (Fall 2005)
Instructor: Saurabh Sinha
Schedule (tentative)
Crash course for project
- (8/26) Basic molecular biology: biological processes and sequence features, the genome and its annotation. Powerpoint Slides
- (8/31) More detailed molecular biology. The importance of gene regulation. Fruitfly segmentation. Binding sites and motifs. Modules and their composition. Tandem repeats. Bioinformatics goals for this system: motif finding (supervised, ab initio), module finding (supervised, ab initio), regulatory pathways and networks. Powerpoint Slides
- (9/2) Module finding. Hidden Markov models (DEKM 3, BB 8.2). Single species Stubb and E-M. Stubb posterior score as a measure of binding site content. Powerpoint Slides
- (9/7) Evolution and tree of life. Comparative genomics. Alignments. Multi-species motif finding (alignment free). Parsimony and its probabilistic interpretation. (Footprinter, and probability calculations.) Powerpoint Slides
Sequence Alignment
- (9/9) Sequence similarity, dynamic programming (local alignment and the importance of expected negative score, DEKM 2.3 Smith-Waterman), affine gap penalty (DEKM 2.4). Paper presentation:
- Brudno et al. LAGAN and Multi-LAGAN: efficient tools for large-scale multiple alignment of genomic DNA, Genome Research 2003 Apr;13(4):721-31. Presented by: Andra Ivan and Richard Leduc.
- (9/14) Quadratic time complexity of pairwise alignment, and the need for something faster. Seed-based local alignments. BLAST. BLAST statistics (DEKM 2.5, 2.7): extreme value distribution.
- (9/16) Multiple alignment (DEKM 6.5). Algorithms -- Sum of pairs, Progressive alignment, DIALIGN. Profile HMMs (proteins) and PFAM domains.
- (9/21) Paper presentation:
- Do et al. ProbCons: Probabilistic consistency-based multiple sequence alignment. Genome Res. 2005 Feb;15(2):330-40. Presented by Aditya Ramani and Qian Yang.
- Lippert et al. Finding Anchors for Genome wide Sequence Alignment. Proceedings of the eighth annual international conference on Research in computational molecular biology (RECOMB), 2004. Presented by Benjamin Liebald and Neelay Shah.
- (9/23) Paper presentation:
Motif finding
- (9/28) Recap of transcriptional regulation and binding sites. Overrepresentation of binding site motifs. Paper presentation:
- Sinha & Tompa. A Statistical method for finding transcription factor binding sites. Proc Int Conf Intell Syst Mol Biol. 2000;8:344-54. Presented by Chen Chen.
- Tompa. An Exact Method for Finding Short Motifs in Sequences, with Application to the Ribosome Binding Site Problem. ISMB 1999. Presented by Yoonkyong Lee.
- (9/30) Weight matrix, relative entropy. Paper presentation:
- Lawrence et al. Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science. 1993 Oct 8;262(5131):208-14. Presented by Xin He.
- (10/5) Paper presentation:
- van Nimwegen et al. Probabilistic clustering of sequences: Inferring new bacterial regulons by comparative genomics. PNAS 2002 May 28;99(11):7323-8. Presented by Hamid Chitsaz.
Bayesian inference
- (10/7) Bayesian approach to local alignment statistics. Bayesian Inference (BB 2.3): priors, Gaussian, Gamma, Dirichlet, MAP & ML as the first level of Bayesian inference. Bayesian example (BB 3.1): the single die model with sequence data or with count data.
- (10/12) (Combined with the 10/14 lecture.)
- (10/14) The Bayesian approach to motif finding: Expectation-Maximization. MEME and Stubb/PhyME. The statistical mechanics connection (BB 3.2.1 - 3.2.4, 4.4).
Module Detection
- (10/19) Hidden Markov Models for module detection. Paper presentation:
- Sinha et al. A Probabilistic Method to detect Regulatory Modules. In Intelligent Systems for Molecular Biology 2003. Presented by Tian Xia.
- Rajewsky et al. Computational detection of genomic cis regulatory modules, applied to body patterning in the early Drosophila embryo. BMC Bioinformatics 3:30 (2002). Presented by Bin Tan.
- (10/21) Paper presentation:
- Zhou, Q. and Wong, W.H. (2004), CisModule: De novo Discovery of Cis-Regulatory Modules by Hierarchical Mixture Modeling, Proc. Natl. Acad. Sci. USA, 101: 12114-12119. Presented by Qiaozhu Mei and Hong Cheng.
- Aerts et al. Computational detection of cis regulatory modules. Bioinformatics. 2003 Oct;19 Suppl 2:II5-II14. Presented by Chul Yun Kim.
Evolution
- (10/26) Evolution models (DEKM 8.1 - 8.3), calculating likelihood of alignment, reversibility, Metropolis algorithm for phylogenetic tree construction (DEKM 8.4), evolutionary models with gaps (DEKM 8.5).
- (10/28) Paper presentation:
- A. Siepel and D. Haussler. Combining phylogenetic and hidden Markov models in biosequence analysis. Proc. 7th Annual Int'l Conf. on Research in Computational Molecular Biology (RECOMB '03), pp. 277-286, 2003. Presented by Younhee Ko and Jaebum Kim.
- (11/2) Evolutionary events - large repeat families, minisatellites, microsatellites. Gene duplications and pseudogenes. Tandem repeat detection (with statistics). Applications - sequence turnover. Implications for probabilistic sequence analysis algorithms. Powerpoint slides
- (11/4) Paper presentation: (by instructor) Powerpoint slides
- Coin & Durbin. Improved techniques for the identification of pseudogenes. Bioinformatics, 20 Suppl 1:I94-I100, 2004.
- Wexler et al. Finding approximate tandem repeats in genomic sequences. Proceedings of the eighth annual international conference on Research in computational molecular biology, 2004.
Evolution and Motif finding
- (11/9) Paper presentation: Powerpoint slides
- Siddharthan et al. PhyloGibbs. (Full reference to be provided later.) (2 students)
Population Genetics
- (11/11) (Combined with the 11/16 lecture.)
- (11/16) Wright-Fisher model: random drift, with selection only, with mutations only. Coalescent theory. Neutral sequence.
Project presentations
- (11/18) 1. Rich Leduc 2. Jaebum Kim, Yoonkyong Lee, Younhee Ko.
- (11/30) Lecture on Stochastic Grammars.
- (12/2) 1. Kranthi Varala, Ajith Harish, ChulYun Kim. 2. Neelay Shah, Andra Ivan, Ben Liebald.
- (12/7) 1. Qiaozhu Mei, Hong Cheng. 2. Aditya Ramani.
- (12/9) 1. Xin He, Bin Tan, Qian Yang. 2. Tian Xia, Chen Chen.