Resources

Newsgroup

Class Schedule

Class meets every Wednesday, 4:00-5:00 pm, in 3403 SC.

Two students present per class unit. Each research paper gets a 20-minute presentation and 5 minutes of discussion, so two papers are covered per unit. When needed, we may occasionally extend the discussion to 30 minutes and cover only one paper in that unit.

You are encouraged to select papers that you find interesting and that are of excellent quality.  Papers published, or to be published, in 2004 or 2005 are strongly recommended.  Please discuss your choice with the course instructor before you finalize your paper selection.

Recommended conference proceedings: KDD, SIGMOD, VLDB, PODS, ICDE, WWW, ICDM, SDM, ICML, EDBT, CIKM, PKDD, PAKDD, SSDBM, etc.  You can access the SIGMOD/PODS’04, KDD’04, VLDB’04, ICDE’05, SIGMOD/PODS’05, KDD’05, and VLDB’05 e-proceedings by clicking the corresponding links.  Use CiteSeer or other Web services to find the papers you want to select.

Recommended journals:  IEEE TKDE, DMKD (Data Mining and Knowledge Discovery), KDD Explorations, Machine Learning, ACM Trans. Database Systems, JIIS, Information Systems, VLDB Journal, Data and Knowledge Engineering, Knowledge and Information Systems (KAIS), etc.

Survey papers will usually be allocated a full slot.

For every paper to be presented, the presenter must distribute the electronic paper one week before the presentation and the electronic slides 4 hours before the presentation.


Presentations for CS591 (Data Mining), Fall 2005

Week 1 (Aug. 24)  Course organization meeting

Week 2 (Aug. 31) Class will not meet due to VLDB’05 conference

Week 3 (Sept. 7) Hong Cheng

1.     Hong Cheng: Mining Tree Queries in a Graph, by Bart Goethals, Eveline Hoekx, and Jan Van den Bussche, http://www.cs.uiuc.edu/class/fa05/cs591han/kdd05/docs/p61.pdf. KDD'05.

Week 4 (Sept. 14)  Deng Cai

1.     Deng Cai: Orthogonal Locality Preserving Indexing, by Deng Cai and Xiaofei He, The 28th Annual International ACM SIGIR Conference (SIGIR'2005), Salvador, Brazil, Aug. 2005. http://www.ews.uiuc.edu/~dengcai2/f33-cai.pdf  (This paper introduces a new dimensionality reduction algorithm, so I will also share our thoughts on dimensionality reduction more broadly.)

Week 5 (Sept. 21) Chen Chen and Hector Gonzalez

1.     Chen Chen: Variable Latent Semantic Indexing, by A. Dasgupta (Cornell University), R. Kumar (IBM Almaden Research Center), P. Raghavan (Yahoo! Research), and A. Tomkins (IBM Almaden Research Center), KDD'05.

2.     Hector Gonzalez: Probabilistic Workflow Mining, by Ricardo Silva, Jiji Zhang, and James G. Shanahan, KDD'05.

Week 6 (Sept. 28)   Xiaoxin Yin and Tianyi Wu

1.     Xiaoxin Yin: Efficient Classification from Multiple Heterogeneous Databases, by Xiaoxin Yin and Jiawei Han, Proc. 2005 European Conf. on Principles and Practice of Knowledge Discovery in Databases (PKDD'05), Porto, Portugal, Oct., 2005.

2.     Tianyi Wu: Discovering Large Dense Subgraphs in Massive Graphs, by David Gibson, Ravi Kumar, and Andrew Tomkins (IBM Almaden Research Center, USA), VLDB’05

Week 7 (Oct. 5)   Anthony Cozzie and Tao Cheng

1. Anthony Cozzie: Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations, by J. Leskovec (Carnegie Mellon University), J. Kleinberg (Cornell University), and C. Faloutsos (Carnegie Mellon University), KDD’05, http://www.cs.uiuc.edu/class/fa05/cs591han/kdd05/docs/p177.pdf.

2. Tao Cheng: The TEXTURE Benchmark: Measuring Performance of Text Queries on a Relational DBMS, by Vuk Ercegovac, David J. DeWitt, Raghu Ramakrishnan (University of Wisconsin at Madison, USA), VLDB’05. http://www.cs.uiuc.edu/class/fa05/cs591han/vldb05/papers/p313-ercegovac.pdf

Week 8 (Oct. 12)   Dong Xin and Muyuan Wang

1.     Dong Xin: Catching the Best Views of Skyline: A Semantic Approach Based on Decisive Subspaces, by Jian Pei, Wen Jin, Martin Ester, and Yufei Tao, VLDB’05. Paper: http://www.vldb2005.org/program/paper/tue/p253-pei.pdf Slides: http://www.vldb2005.org/program/slides/tue/s253-pei.ppt

2.     Muyuan Wang: Streaming Pattern Discovery in Multiple Time-Series, by Spiros Papadimitriou, Jimeng Sun, Christos Faloutsos [paper][slides]

Week 9 (Oct. 19)   Qian Yang and Chulyun Kim

1.     Chulyun Kim: XWAVE: Optimal and Approximate Extended Wavelets for Streaming Data, by Sudipto Guha (Univ. of Pennsylvania), Chulyun Kim, and Kyuseok Shim (Seoul National Univ.), VLDB’04, http://www.cs.uiuc.edu/class/fa05/cs591han/vldb04/contents/pdf/RS8P1.PDF

2.     Qian Yang: Reasoning about Sets using Redescription Mining, by Mohammed J. Zaki (Rensselaer Polytechnic Institute) and Naren Ramakrishnan (Virginia Tech.)

Week 10 (Oct. 26)   Xu Ling

1.     Xu Ling: De novo identification of repeat families in large genomes, by Alkes L. Price, Neil C. Jones, and Pavel A. Pevzner, ISMB’05, http://bioinformatics.oxfordjournals.org/cgi/reprint/21/suppl_1/i351

Week 11 (Nov. 2)   Zheng Shao and Adam Boot

1.     Zheng Shao: Feature Bagging for Outlier Detection, by A. Lazarevic and V. Kumar (University of Minnesota), KDD’05, http://www.cs.uiuc.edu/class/fa05/cs591han/kdd05/docs/p157.pdf

2.     Adam Boot:  Selectivity Estimation for Fuzzy String Predicates in Large Data Sets, by Liang Jin (University of California, Irvine) and Chen Li (University of California, Irvine), VLDB2005 (slides)

Week 12 (Nov. 9)   Charis Ermopoulos and Xiaolei Li

1.    Charis Ermopoulos: B.-C. Chen, L. Chen, Y. Lin, and R. Ramakrishnan, Prediction Cubes, VLDB 2005. [pdf]

2.    Xiaolei Li: SVM Selective Sampling for Ranking with Application to Data Retrieval, by H. Yu (University of Iowa), KDD’05, p. 354.

Week 13 (Nov. 16) Chao Liu and Sangkyum Kim

1.     Chao Liu: SOBER: Statistical Model-based Bug Localization, by C. Liu, X. Yan, L. Fei, J. Han, and S. Midkiff, Proc. 2005 ACM SIGSOFT Symp. on the Foundations of Software Engineering (FSE 2005), Lisbon, Portugal, Sept. 2005.

2.     Sangkyum Kim: Using Association Rules for Fraud Detection in Web Advertising Networks, by Ahmed Metwally, Divyakant Agrawal, and Amr El Abbadi (University of California, Santa Barbara, USA), VLDB’05 (slides)

Week 14 (Nov. 23) Thanksgiving break: No class

Week 15 (Nov. 30) Class cancelled (due to ICDM'05 conference)

Week 16 (Dec. 7) Kely Garcia and Long Vu

1.    Kely Garcia: Using the Structure of Web Sites for Automatic Segmentation of Tables, by Kristina Lerman, Craig Knoblock (USC Information Sciences Institute), and Lise Getoor (University of Maryland), SIGMOD’04

2.    Long Vu: Xuehua Shen, Bin Tan, ChengXiang Zhai, Context-Sensitive Information Retrieval with Implicit Feedback, Proceedings of ACM SIGIR 2005. (pdf)




Past CS591 Presentations (Spring 2005 and earlier)


 

Presentations done at CS591 (Data Mining), Spring 2005

Week 1 (Jan. 19)   Course organization meeting

  • Class organization and research directions in data mining

Week 2 (Jan. 26) Class canceled due to an NSF review meeting in Washington, D.C.

Week 3 (Feb. 2) Class presentation and award competition: “What are the coolest research topics in data mining?” (Please either send me your slides or bring them on a memory stick!)

•        Each student needs to prepare a 2-minute presentation of a few slides on the topic; the awards will be determined by class vote.

Week 4 (Feb. 9) Research paper presentations:   Yifan Li,  Xiaoxin Yin

1.     Xin Luna Dong and Alon Halevy, “A Platform for Personal Information Management and Integration”, CIDR’05 (to be presented by Xiaoxin Yin)

2.     Keogh, E., Lonardi, S. and Ratanamahatana, C. "Towards Parameter-Free Data Mining", SIGKDD'04  (to be presented by Yifan Li)

Week 5 (Feb. 16) Research paper presentations: Hong Cheng, Sangkyum Kim

1.     H. Cheng, X. Yan, and J. Han, “SeqIndex: Indexing Sequences by Sequential Pattern Analysis”, Proc. 2005 SIAM Int. Conf. on Data Mining (SDM'05), Newport Beach, CA, April 2005 (to be presented by Hong Cheng)

2.     J. S. Downie, “The Scientific Evaluation of Music Information Retrieval Systems: Foundations and Future”, Computer Music Journal, 28:2, pp. 12-23, Summer 2004, together with two companion papers, ‘Automatic Musical Genre Classification of Audio Signals’ and ‘A Comparative Study on Content-Based Music Genre Classification’ (to be presented by Sangkyum Kim). All three can be downloaded from http://www-courses.cs.uiuc.edu/~cs591han/misc/.

Week 6 (Feb. 23) Research paper presentations:  Zheng Shao, Chen Chen

1.     Parag and Pedro Domingos, “Multi-Relational Record Linkage”, Proc. of the KDD-2004 Workshop on Multi-Relational Data Mining, pp. 31-48, 2004. Seattle, WA: ACM Press. http://www.cs.washington.edu/homes/pedrod/papers/mrdm04.pdf (to be presented by Zheng Shao)

2.     Xin Zhang, Nikos Mamoulis, David W. L. Cheung, and Yutao Shou, “Fast Mining of Spatial Collocations”, Proc. 2004 ACM-SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'04), Seattle, WA, Aug. 2004 (to be presented by Chen Chen)

Week 7 (Mar. 2) Research paper presentations: Xiaolei Li, Alex Kotov

1.     A.J. Bagnall, G.J. Janacek, "Clustering Time Series from ARMA Models with Clipped Data", Proc. of the International Conference on Knowledge Discovery and Data Mining (KDD'04), pp. 49-58, Seattle, WA, Aug. 2004. (to be presented by Alex Kotov)

2.     X. Li, J. Han, X. Yin, and D. Xin, “Mining Evolving Customer-Product Relationships in Multi-Dimensional Space”, Proc. 2005 Int. Conf. on Data Engineering (ICDE'05), Tokyo, Japan, April 2005.  (to be presented by Xiaolei Li)

Week 8 (Mar. 9) Research paper presentations:   Xifeng Yan, Chao Liu

1.     C. Liu, X. Yan, H. Yu, J. Han, and P. S. Yu, “Mining Behavior Graphs for ‘Backtrace’ of Noncrashing Bugs”, Proc. 2005 SIAM Int. Conf. on Data Mining (SDM'05), Newport Beach, CA, April 2005.

2.     X. Yan, X. J. Zhou, J. Han, “Mining Closed Relational Graphs with Connectivity Constraints”, Proc. 2005 Int. Conf. on Data Engineering (ICDE'05), Tokyo, Japan, April 2005.

Week 9 (Mar. 16) Research paper presentations: Hong Cheng, Hongyan Liu

1.     F. Afrati, A. Gionis,  and H. Mannila, "Approximating a collection of Frequent Sets", KDD'04.

2.     G. Cong, K.-L. Tan, A. K. H. Tung, and X. Xu, “Mining Top-k Covering Rule Groups for Gene Expression Data”, SIGMOD’05.

Week 10 (Mar. 23) Spring-break, no class

Week 11 (Mar. 30) Class canceled due to a meeting in Chicago

Week 12 (Apr. 6) James Galagan’s invited talk in 2405 SC.

Week 13 (Apr. 13) Research paper presentations: Deng Cai, Long Vu

1.     Cristiano Rocha, Daniel Schwabe, and Marcus Poggi Aragao, A hybrid approach for searching in the semantic web, WWW’04, pp. 374-383 (to be presented by Long Vu)

2.     Xiaofei He, Deng Cai, Haifeng Liu and Wei-Ying Ma, "Locality Preserving Indexing for Document Representation," The 27th Annual International ACM SIGIR Conference (SIGIR'2004), July 2004. http://www.ews.uiuc.edu/~dengcai2/p250-he.pdf

Week 14 (Apr. 20) Research paper presentations: Tao Cheng, Charis Ermopoulos

1.     Oren Etzioni, Michael Cafarella, Doug Downey, Stanley Kok, Ana-Maria Popescu, Tal Shaked, Stephen Soderland, Daniel Weld, and Alex Yates, "Web-Scale Information Extraction in KnowItAll", WWW 2004. http://www.cs.washington.edu/research/knowitall/papers/www-paper.pdf (to be presented by Tao Cheng)

2.     Christian Böhm, Karin Kailing, Peer Kröger, and Arthur Zimek, “Computing Clusters of Correlation Connected Objects”, SIGMOD’04, http://www-courses.cs.uiuc.edu/~cs591han/sigmod04/R-374.pdf  (to be presented by Charis Ermopoulos)

Week 15 (Apr. 27) Research paper presentations: Hector Gonzalez, Dong Xin

1.     Surajit Chaudhuri, Gautam Das (Microsoft Research), Vagelis Hristidis (Florida International Univ.), and Gerhard Weikum (MPI Informatik), “Probabilistic Ranking of Database Query Results”, in Proceedings of VLDB’04, http://www.vldb04.org/protected/eProceedings/contents/pdf/RS22P3.PDF (to be presented by Dong Xin)

2.     N. Mamoulis, H. Cao, G. Kollios, M. Hadjieleftheriou, Y. Tao, and D. Cheung. Mining, Indexing, and Querying Historical Spatio-Temporal Data. Proc. of the 10th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), Seattle, WA, August 2004.  (to be presented by Hector Gonzalez)

Week 16 (May 4)  There will be no class meeting.  

1.     Note: The data mining research group will have a semester summary meeting in this time slot (and location).


Presentations done at CS591 (Data Mining), Fall, 2004

Week 2 (Sept. 3)   Mining Multiple Relational Databases (Xiaoxin Yin)

1.     S. Dzeroski.   Multi-relational data mining: an introduction. ACM SIGKDD Explorations, Volume 5, Issue 1, July, 2003.

2.     X. Yin, J. Han, J. Yang, and P. S. Yu, “CrossMine: Efficient Classification across Multiple Database Relations”, Proc. 2004 Int. Conf. on Data Engineering (ICDE'04), Boston, MA, March 2004 

Week 3 (Sept. 10) Graph mining (Xifeng Yan)

  1. X. Yan, P. S. Yu, and J. Han, “Graph Indexing: A Frequent Structure-based Approach”, Proc. 2004 ACM-SIGMOD Int. Conf. on Management of Data (SIGMOD'04), Paris, France, June 2004 
  2. Christos Faloutsos, Kevin McCurley, and Andrew Tomkins, “Fast Discovery of ‘Connection Subgraphs’”, Proc. 2004 ACM-SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'04), Seattle, WA, Aug. 2004

Week 4 (Sept. 17) OLAP, Cubing, and Data Warehousing (Xiaolei Li and Zheng Shao)  

1.     X. Li, J. Han, and H. Gonzalez, “High-Dimensional OLAP: A Minimal Cubing Approach”, Proc. 2004 Int. Conf. on Very Large Data Bases (VLDB'04), Toronto, Canada, Aug. 2004 (to be presented by Xiaolei Li)  

2.     Z. Shao, J. Han, and D. Xin, “MM-Cubing: Computing Iceberg Cubes by Factorizing the Lattice Space”, Proc. 2004 Int. Conf. on Scientific and Statistical Database Management (SSDBM'04), Santorini Island, Greece, June 2004 (to be presented by Zheng Shao)  

Week 5 (Sept. 24) Social Network Analysis and Linkage Analysis (Deng Cai)  

1.     Deng Cai, Xiaofei He, Ji-Rong Wen, and Wei-Ying Ma. Block-level Link Analysis (pdf), The 27th Annual International ACM SIGIR Conference (SIGIR'2004), July 2004.

2.     Deng Cai, Xiaofei He, Zhiwei Li, Wei-Ying Ma, and Ji-Rong Wen. Hierarchical Clustering of WWW Image Search Results Using Visual, Textual and Link Analysis (pdf), ACM Multimedia 2004, Oct. 2004.

Week 6 (Oct. 1) Spatiotemporal Data Indexing and Mining (Yifan Li and Dong Xin)

1.     Yufei Tao (City University of Hong Kong), Christos Faloutsos (Carnegie Mellon University), Dimitris Papadias and Bin Liu (Hong Kong University of Science and Technology), Prediction and Indexing of Moving Objects with Unknown Motion Patterns. Proc. 2004 ACM-SIGMOD Int. Conf. on Management of Data (SIGMOD'04), Paris, France, June 2004 (to be presented by Yifan Li)

2.     Yuhan Cai and Raymond Ng (University of British Columbia, Vancouver, Canada), Indexing Spatio-Temporal Trajectories with Chebyshev Polynomials. Proc. 2004 ACM-SIGMOD Int. Conf. on Management of Data (SIGMOD'04), Paris, France, June 2004 [Best paper award] (to be presented by Dong Xin)

Week 7 (Oct. 8)   Clustering High-Dimensional Data (Tao Cheng)

•        L. Parsons, E. Haque, and H. Liu, Subspace Clustering for High Dimensional Data: A Review, SIGKDD Explorations, Vol. 6(1), June 2004 (to be presented by Tao Cheng)

Week 8 (Oct. 15)   Data Mining and Software Engineering (Chao Liu and Charilaos Ermopoulos)

1.    Jeremy Kolter and Marcus A. Maloof, Learning to Detect Malicious Executables in the Wild, Proc. 2004 ACM-SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'04), Seattle, WA, Aug. 2004 (to be presented by Chao Liu)

2.    Deepayan Chakrabarti, Spiros Papadimitriou, Dharmendra Modha, Christos Faloutsos, Fully Automatic Cross-Associations, Proc.  2004 ACM-