CS512 Spring 2008 “Data Mining: Principles and Algorithms”

Online-only class: lectures are available online at the beginning of each week.

All lectures are pre-recorded from Spring 2007; please check the schedule page for updated information on the class.

Videos are available at: https://agora.cs.uiuc.edu/display/I2CS/CS512

About the Course

This is a graduate-level (or advanced) course on data mining. It introduces the principles, algorithms, implementations, and applications of data mining, including the mining of sequential and structured data, stream data, text data, Web data, spatiotemporal data, biomedical data, and other forms of complex data. The course mainly serves CS graduate students interested in data mining. It may also attract students from other disciplines who need to implement or use data mining systems or methods to analyze large amounts of data.

Prerequisites

·        Background: CS 411 or CS 373, or consent of the instructor. Good knowledge of statistics and machine learning will help in understanding the course materials. We strongly encourage students to take the undergraduate-level data mining course (CS412, offered every Fall semester).

·        Programming: There will be one programming assignment for the course, which is expected to be implemented in C++. We assume that students attending this course have basic C++ programming skills. We will not cover programming-specific issues in this course.

Textbook

·        Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, 2nd ed., Morgan Kaufmann, 2006. See the book's home page for errata, course slides, and other reference materials.

Reference

The following texts are recommended for reference, and are also on reserve at Grainger Engineering Library. There are numerous other books or online resources on data mining available. The books marked in red are highly recommended textbooks.

1.      C. M. Bishop, Pattern Recognition and Machine Learning, Springer, 2007.

2.      S. Chakrabarti, Mining the Web: Statistical Analysis of Hypertext and Semi-Structured Data, Morgan Kaufmann, 2002.

3.      R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd ed., Wiley-Interscience, 2001.

4.      M. H. Dunham, Data Mining: Introductory and Advanced Topics, Prentice Hall, 2002.

5.      U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, Advances in Knowledge Discovery and Data Mining, The MIT Press, 1996.

6.      U. Fayyad, G. Grinstein, and A. Wierse, Information Visualization in Data Mining and Knowledge Discovery, Morgan Kaufmann, 2001.

7.      D. J. Hand, H. Mannila, and P. Smyth, Principles of Data Mining, MIT Press, 2001.

8.      T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer-Verlag, 2001.

9.      T. M. Mitchell, Machine Learning, McGraw Hill, 1997.

10.  P.-N. Tan, M. Steinbach, and V. Kumar, Introduction to Data Mining, Addison-Wesley, 2006. ISBN 0-321-32136-7.

11.  S. M. Weiss and N. Indurkhya, Predictive Data Mining, Morgan Kaufmann, 1998.

12.  I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, 2nd ed., Morgan Kaufmann, 2005. ISBN 0-12-088407-0.

Major conference proceedings will also be used in class, including ACM SIGKDD (KDD), ACM SIGMOD, VLDB, ICDM, SDM (SIAM Data Mining conference), ICDE, ICML, WWW, and other related conferences.

Course Format and Activities

This course will draw materials mainly from the textbook and some recent data mining literature. Students will study the materials and complete all the course requirements.

Assignments and Course Project

There will be about three homework assignments (some of which may be essentially programming assignments), one theme-based survey report, and one research project. The theme-based survey report is expected to be 10-15 pages in length and will be evaluated by a standard similar to that of a survey paper publishable in a journal or magazine. The course project will be evaluated by a standard similar to that of a research paper publishable at a conference.

Examination

There will be one midterm exam, 75 minutes in length, held around the 8th or 9th week of the course. We will not normally give make-ups for missed exams; please see the policies.

Evaluation

We plan to determine final grades of the course in the following way:

·        Written Assignments: 20% (about 7% for each assignment, 3 assignments in total)

·        Theme-based data mining wikipedia: 15% (see: https://agora.cs.uiuc.edu/display/dmwiki/Home)

·        Midterm exam: 30%

·        Final course project: 35%. The project is due at the end of the semester, but a one-page proposal is due at the end of the 4th week. The final project will be evaluated based on technical innovation, thoroughness of the work, and clarity of presentation.
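As a rough illustration of how the weights above combine, the following Python sketch computes a weighted course score. It is not an official grading tool. The function names, the 0-100 score scale, and the equal averaging of the three written assignments are assumptions made for this example. Only the four percentages come from the breakdown above.

```python
# Weights taken from the evaluation breakdown above; they sum to 100%.
WEIGHTS = {
    "written_assignments": 0.20,
    "survey": 0.15,
    "midterm": 0.30,
    "project": 0.35,
}


def written_assignment_score(assignment_scores):
    """Average the written assignments (assumption: each counts equally)."""
    return sum(assignment_scores) / len(assignment_scores)


def weighted_course_score(scores):
    """Combine per-component scores (assumed to be on a 0-100 scale)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, for illustration only.
example = {
    "written_assignments": written_assignment_score([85, 90, 95]),
    "survey": 80,
    "midterm": 70,
    "project": 85,
}
total = weighted_course_score(example)
```

With these hypothetical scores, the components contribute 18, 12, 21, and 29.75 points, for a weighted total of about 80.75 out of 100. How that total maps to a letter grade is not specified here.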

Administration

·        Course policies


Jiawei Han