Class meets every Wednesday.
Two students present per class unit: each research paper gets a 20-minute
presentation and a 5-minute discussion, so two papers are covered per class
unit. When needed, we may occasionally extend the discussion to 30 minutes
and cover only one paper in that unit.
You are encouraged to select papers that you
believe are interesting and of excellent quality. Presenting papers
published, or to be published, in 2004 or 2005 is strongly recommended.
Please discuss your selection with the course instructor before you finalize
it.
Recommended conference proceedings: KDD, SIGMOD, VLDB, PODS, ICDE, WWW, ICDM, SDM, ICML, EDBT, CIKM, PKDD, PAKDD, SSDBM, etc. You can access the SIGMOD/PODS04 E-Proceedings, KDD04 E-Proceedings, VLDB04 E-Proceedings, ICDE05 E-Proceedings, SIGMOD/PODS05 E-Proceedings, KDD05 E-Proceedings, and VLDB05 E-Proceedings by clicking the corresponding links. Use CiteSeer or other Web services to find the papers you want to select.
Recommended journals: IEEE
TKDE, DMKD (Data Mining and Knowledge Discovery), KDD Explorations, Machine
Learning, ACM Trans. Database Systems,
JIIS, Information Systems, VLDB Journal, Data and Knowledge Engineering,
Knowledge and Information Systems (KAIS), etc.
A survey paper will usually be allocated one
full slot.
For every paper to be presented, the presenter
must distribute the e-paper one week before the presentation and the e-slides
4 hours before the presentation.
Presentations for CS591 (Data
Mining), Fall 2005
Week 1
(Aug. 24) Course organization meeting
Week 2
(Aug. 31) Class will not meet due to VLDB’05 conference
Week 3
(Sept. 7) Hong Cheng
1.
Hong Cheng: Mining Tree Queries in a Graph, by Bart Goethals,
Eveline Hoekx, and Jan Van
den Bussche, http://www.cs.uiuc.edu/class/fa05/cs591han/kdd05/docs/p61.pdf.
KDD'05.
Week 4
(Sept. 14) Deng Cai
1.
Deng Cai: Orthogonal Locality Preserving
Indexing, by Deng Cai and Xiaofei
He, The 28th Annual International ACM SIGIR Conference (SIGIR'2005),
Salvador, Brazil, Aug. 2005. http://www.ews.uiuc.edu/~dengcai2/f33-cai.pdf. (This paper introduces a new
dimensionality reduction algorithm, so I will share our thoughts on
dimensionality reduction.)
Week 5
(Sept. 21) Chen Chen and Hector Gonzalez
1. Chen Chen: Variable
Latent Semantic Indexing, by A. Dasgupta (Cornell
University), R. Kumar (IBM Almaden Research Center),
P. Raghavan (Yahoo! Research), and A. Tomkins (IBM Almaden Research
Center), KDD'05.
2.
Hector Gonzalez: Probabilistic Workflow
Mining, by Ricardo Silva, Jiji Zhang, James G.
Shanahan, KDD'05.
Week 6
(Sept. 28) Xiaoxin Yin and Tianyi Wu
1. Xiaoxin Yin: Efficient
Classification from Multiple Heterogeneous Databases, by Xiaoxin Yin and Jiawei Han, Proc.
2005 European Conf. on Principles and Practice of Knowledge Discovery in
Databases (PKDD'05), Porto, Portugal, Oct., 2005.
2.
Tianyi Wu: Discovering
Large Dense Subgraphs in Massive Graphs, by David Gibson, Ravi Kumar, Andrew Tomkins (
Week 7
(Oct. 5) Anthony Cozzie and Tao Cheng
1. Anthony Cozzie: Graphs
over Time: Densification Laws, Shrinking Diameters and Possible Explanations,
by J. Leskovec (
2. Tao Cheng: The
TEXTURE Benchmark: Measuring Performance of Text Queries on a Relational DBMS,
by Vuk Ercegovac, David J.
DeWitt, Raghu Ramakrishnan
(University of Wisconsin at Madison, USA), VLDB’05.
http://www.cs.uiuc.edu/class/fa05/cs591han/vldb05/papers/p313-ercegovac.pdf
Week 8
(Oct. 12) Dong Xin and Muyuan Wang
1.
Dong Xin: Catching the Best
Views of Skyline: A Semantic Approach Based on Decisive Subspaces, by
2.
Muyuan Wang: Streaming
Pattern Discovery in Multiple Time-Series, by Spiros Papadimitriou, Jimeng Sun,
Christos Faloutsos [paper][slides]
Week 9 (Oct. 19) Qian
Yang and Chulyun Kim
1.
Chulyun Kim:
XWAVE: Optimal
and Approximate Extended Wavelets for Streaming Data, by Sudipto Guha (
2.
Qian Yang: Reasoning
about Sets using Redescription Mining, by
Mohammed J. Zaki (Rensselaer Polytechnic Institute)
and Naren Ramakrishnan
(Virginia Tech.)
Week 10
(Oct. 26) Xu Ling
1.
Xu Ling: De
novo identification of repeat families in large genomes, by Alkes L. Price, Neil C. Jones and Pavel
A. Pevzner, ISMB’05, http://bioinformatics.oxfordjournals.org/cgi/reprint/21/suppl_1/i351
Week 11
(Nov. 2) Zheng Shao
and Adam Boot
1.
Zheng Shao: Feature
Bagging for Outlier Detection, by A. Lazarevic,
V. Kumar (
2.
Adam Boot: Selectivity
Estimation for Fuzzy String Predicates in Large Data Sets, by Liang Jin (
Week 12
(Nov. 9) Charis Ermopoulos
and Xiaolei Li
1.
Charis Ermopoulos: B.-C. Chen, L.
Chen, Y. Lin, and R. Ramakrishnan, Prediction cubes,
VLDB 2005. [pdf]
2.
Xiaolei Li: SVM Selective Sampling for
Ranking with Application to Data Retrieval (Page 354) H. Yu (
Week 13
(Nov. 16) Chao Liu and Sangkyum
Kim
1.
Chao Liu: SOBER: Statistical
Model-based Bug Localization, by C. Liu, X. Yan,
L. Fei, J. Han, and S. Midkiff,
Proc. 2005 ACM SIGSOFT Symp. on the Foundations of
Software Engineering (FSE 2005),
2.
Sangkyum Kim: Using Association Rules for Fraud Detection
in Web Advertising Networks by
Week 14
(Nov. 23) Thanksgiving break: No class
Week 15 (Nov. 30) Class cancelled (due to ICDM'05 conference)
Week 16
(Dec. 7) Kely Garcia and Long Vu
1.
Kely Garcia: Using the Structure of Web Sites for Automatic Segmentation of
Tables, Kristina Lerman, Craig Knoblock - USC Information Sciences Institute,
Lise Getoor -
2.
Long Vu: Xuehua Shen, Bin Tan, ChengXiang
Zhai, Context-Sensitive
Information Retrieval with Implicit Feedback, Proceedings of ACM SIGIR 2005. (pdf)
Past CS591 Presentations (Spring 2005 and earlier)
Presentations done at
CS591 (Data Mining), Spring 2005
Week 1 (Jan. 19) Course organization meeting
Week 2 (Jan.
26) Class canceled due to NSF review meeting at D.C.
Week 3 (Feb. 2) Class
presentation and award competition: “What are the coolest research topics in
data mining?” (Please either send me your slides or bring them on a memory
stick!)
Each student will need to prepare a 2-minute presentation of a
few slides on the topic; the awards will be determined by class vote.
Week 4 (Feb.
9) Research paper presentations: Yifan Li, Xiaoxin Yin
1. Xin Luna Dong and Alon Halevy. “A Platform for Personal Information Management and
Integration”, CIDR’05 (to be presented by Xiaoxin
Yin)
2. Keogh, E., Lonardi, S. and Ratanamahatana,
C. "Towards Parameter-Free Data Mining", SIGKDD'04 (to be presented by Yifan
Li)
Week 5 (Feb.
16) Research paper presentations: Hong Cheng, Sangkyum
Kim
1. H. Cheng, X. Yan, and J. Han, “SeqIndex: Indexing Sequences by Sequential Pattern Analysis”, Proc. 2005 SIAM Int. Conf. on Data Mining (SDM'05), Newport
Beach, CA, April 2005 (to be presented
by Hong Cheng)
2. J. S. Downie,
“The Scientific Evaluation of Music Information Retrieval
Systems: Foundations and Future”, Computer Music Journal, 28:2, pp. 12-23,
Summer 2004 (to be presented by Sangkyum Kim), plus two companion papers: 'Automatic
Musical Genre Classification of Audio Signals' and 'A Comparative Study on Content-Based
Music Genre Classification'. All three can be downloaded from
http://www-courses.cs.uiuc.edu/~cs591han/misc/.
Week 6 (Feb.
23) Research paper presentations: Zheng Shao, Chen Chen
1.
Parag and Pedro Domingos,
“Multi-Relational Record Linkage”, Proc. the KDD-2004 Workshop on
Multi-Relational Data Mining, pp. 31-48, 2004. Seattle, WA: ACM Press.
http://www.cs.washington.edu/homes/pedrod/papers/mrdm04.pdf (to be presented by Zheng
Shao)
2.
Xin Zhang, Nikos Mamoulis,
David W. L. Cheung, Yutao Shou,
“Fast Mining of Spatial Collocations”, Proc. 2004
ACM-SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'04), Seattle, WA, Aug. 2004 (to be presented by Chen Chen)
Week 7 (Mar.
2) Research paper presentations: Xiaolei Li,
Alex Kotov
1.
A.J. Bagnall,
G.J. Janacek, "Clustering Time Series from ARMA Models with Clipped Data", Proc. of the International Conference on Knowledge
Discovery and Data Mining (KDD'04), pp. 49-58,
2.
X. Li, J. Han, X. Yin, and
D. Xin, “Mining Evolving Customer-Product Relationships in
Multi-Dimensional Space”, Proc.
2005 Int. Conf. on Data Engineering (ICDE'05),
Week 8 (Mar.
9) Research paper presentations: Xifeng Yan, Chao
Liu
1. C. Liu, X. Yan, H. Yu, J. Han, and
P. S. Yu, “Mining Behavior Graphs for ‘Backtrace’
of Noncrashing Bugs”, Proc. 2005 SIAM Int. Conf. on Data Mining (SDM'05), Newport
Beach, CA, April 2005.
2. X. Yan, X. J. Zhou, J. Han, “Mining Closed Relational Graphs with Connectivity
Constraints”, Proc. 2005 Int. Conf.
on Data Engineering (ICDE'05),
Week 9 (Mar.
16) Research paper presentations: Hong Cheng, Hongyan
Liu
1. F. Afrati,
A. Gionis, and H. Mannila,
"Approximating a collection of Frequent Sets", KDD'04.
Week 10 (Mar.
23) Spring break: no class
Week 11 (Mar.
30) Research paper presentations: Class canceled due to a meeting in
Week 12 (Apr. 6) James Galagan’s invited talk at
2405 SC.
Week 13 (Apr.
13) Research paper presentations: Deng Cai, Long
Vu
1. Cristiano Rocha, Daniel Schwabe and Marcus Poggi Aragao, A hybrid approach for searching in the semantic web, WWW '04, pp. 374-383
(to be presented by Long Vu)
2. Xiaofei He, Deng Cai, Haifeng Liu and Wei-Ying Ma, "Locality Preserving Indexing for
Document Representation," The 27th Annual International ACM SIGIR
Conference (SIGIR'2004), July 2004. http://www.ews.uiuc.edu/~dengcai2/p250-he.pdf
Week 14 (Apr.
20) Research paper presentations: Tao Cheng, Charis
Ermopoulos
1. Oren Etzioni,
Michael Cafarella, Doug Downey, Stanley Kok, Ana-Maria Popescu, Tal Shaked, Stephen Soderland, Daniel Weld, and Alex Yates, "Web-Scale
Information Extraction in KnowItAll", WWW 2004. http://www.cs.washington.edu/research/knowitall/papers/www-paper.pdf (to be presented by Tao
Cheng)
2. Christian Bohm, Karin Kailing, Peer Kroger,
Arthur Zimek, “Computing Clusters of Correlation
Connected Objects”, http://www-courses.cs.uiuc.edu/~cs591han/sigmod04/R-374.pdf (to be presented by Charis
Ermopoulos)
Week 15 (Apr.
27) Research paper presentations: Hector Gonzalez, Dong Xin
1. Surajit Chaudhuri,
Gautam Das (Microsoft
Research), Vagelis Hristidis
(Florida International Univ.), Gerhard Weikum (MPI Informatik), “Probabilistic Ranking of Database Query
Results”, in Proceedings of VLDB’04. http://www.vldb04.org/protected/eProceedings/contents/pdf/RS22P3.PDF (to be presented by Dong
Xin)
2. N. Mamoulis, H. Cao, G. Kollios, M. Hadjieleftheriou, Y. Tao, and D. Cheung. Mining, Indexing, and Querying Historical Spatio-Temporal Data. Proc. of the 10th
ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD),
Week 16 (May 4)
There will be no class meeting.
Note: The data mining
research group will hold a semester summary meeting in this time slot (and
location).
Presentations done at
CS591 (Data Mining), Fall 2004
Week 2 (Sept.
3) Mining Multiple Relational Databases (Xiaoxin Yin)
1. S. Dzeroski. Multi-relational data mining: an introduction. ACM SIGKDD Explorations,
Volume 5, Issue 1, July, 2003.
2. X. Yin, J. Han, J. Yang,
and P. S. Yu, “CrossMine: Efficient Classification across Multiple Database
Relations”, Proc. 2004 Int. Conf. on Data Engineering (ICDE'04),
Week
3 (Sept. 10) Graph mining (Xifeng
Yan)
Week
4 (Sept. 17) OLAP, Cubing, and Data Warehousing (Xiaolei Li and Zheng Shao)
1.
X. Li, J. Han, and H. Gonzalez, “High-Dimensional
OLAP: A Minimal Cubing Approach”, Proc. 2004 Int. Conf. on Very Large Data
Bases (VLDB'04), Toronto, Canada, Aug. 2004 (to be presented by Xiaolei Li)
2.
Z. Shao, J. Han, and D. Xin, “MM-Cubing: Computing Iceberg Cubes by Factorizing the
Lattice Space”, Proc. 2004 Int. Conf. on Scientific and Statistical Database
Management (SSDBM'04), Santorini Island, Greece, June
2004 (to be presented by Zheng Shao)
Week
5 (Sept. 24) Social Network Analysis and Linkage Analysis (Deng Cai)
1.
Deng Cai, Xiaofei
He, Ji-Rong Wen and Wei-Ying Ma. Block-level Link Analysis (pdf), The 27th Annual
International ACM SIGIR Conference (SIGIR'2004), July 2004.
2.
Deng Cai, Xiaofei
He, Zhiwei Li, Wei-Ying Ma
and Ji-Rong Wen. Hierarchical
Clustering of WWW Image Search Results Using Visual, Textual and Link Analysis (pdf), ACM Multimedia 2004, Oct. 2004.
Week 6 (Oct. 1) Spatiotemporal Data
Indexing and Mining (Yifan Li and Dong Xin)
1.
Yufei Tao - City
University of Hong Kong, Christos Faloutsos - Carnegie Mellon University, Dimitris Papadias, Bin
Liu - Hong Kong University of Science, Prediction and Indexing of Moving Objects
with Unknown Motion Patterns.
Proc. 2004
ACM-SIGMOD Int. Conf. on Management of Data (SIGMOD'04), Paris, France, June
2004 (to be presented by Yifan Li)
2.
Yuhan Cai, Raymond
Ng -
Week 7 (Oct.
8) Clustering High-Dimensional Data (Tao Cheng)
L. Parsons, E. Haque
and H. Liu, Subspace Clustering for High Dimensional Data: A Review, SIGKDD
Explorations, Vol. 6(1), June 2004 (to
be presented by Tao Cheng)
Week
8 (Oct. 15) Data Mining and Software Engineering (Chao Liu and Charilaos Ermopoulos)
1.
Jeremy Kolter and
Marcus A. Maloof, Learning to Detect Malicious Executables in the Wild, Proc. 2004 ACM-SIGKDD Int. Conf. on Knowledge Discovery and
Data Mining (KDD'04),
2. Deepayan Chakrabarti, Spiros Papadimitriou, Dharmendra Modha, Christos Faloutsos, Fully Automatic Cross-Associations, Proc. 2004 ACM-