Tuesday and Thursday, 9:30 – 10:45am,
1304
This is a graduate-level (or advanced) course on data mining. It introduces the
principles, algorithms, implementations, and applications of data mining,
including the mining of sequential and structured data, stream data, text
data, Web data, spatiotemporal data, biomedical data, and other forms of complex
data. The course mainly serves CS graduate students interested in data
mining. It may also attract students from other disciplines who need
to implement or use data mining systems or methods to analyze large amounts
of data.
· Background: CS 411 or CS 373, or consent of the instructor. Good knowledge of statistics and machine learning will help in understanding the course materials. We strongly encourage students to take the undergraduate-level data mining course (CS 412, offered every Fall semester).
· Programming: There will be one programming assignment for the course, which is expected to be implemented in C++. We assume that students attending this course have basic C++ programming skills. We will not cover programming-specific issues in this course.
Jiawei Han and Micheline Kamber, “Data Mining: Concepts and Techniques”, 2nd ed., Morgan Kaufmann, 2006. See the book's home page for errata, course slides, and other reference materials.
The following texts are recommended for reference, and are also on reserve at Grainger Engineering Library. There are numerous other books or online resources on data mining available. The books marked in red are highly recommended textbooks.
1. C. M. Bishop, Pattern Recognition and Machine Learning, Springer, 2007.
2. S. Chakrabarti, Mining the Web: Statistical Analysis of Hypertext and Semi-Structured Data, Morgan Kaufmann, 2002.
3. R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd ed., Wiley-Interscience, 2001.
4. M. H. Dunham, Data Mining: Introductory and Advanced Topics, Prentice Hall, 2002.
5. U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, Advances in Knowledge Discovery and Data Mining, The MIT Press, 1996.
6. U. Fayyad, G. Grinstein, and A. Wierse, Information Visualization in Data Mining and Knowledge Discovery, Morgan Kaufmann, 2001.
7. D. J. Hand, H. Mannila, and P. Smyth, Principles of Data Mining, MIT Press, 2001.
8. T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer-Verlag, 2001.
9. T. M. Mitchell, Machine Learning, McGraw Hill, 1997.
10. P.-N. Tan, M. Steinbach, and V. Kumar, Introduction to Data Mining, Addison-Wesley, 2006, ISBN 0-321-32136-7.
11. S. M. Weiss and
12. I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, 2nd ed., Morgan Kaufmann, 2005, ISBN 0-12-088407-0.
Major conference proceedings will also be used in class, including those of ACM SIGKDD (KDD), ACM SIGMOD, VLDB, ICDM, SDM (SIAM Data Mining conference), ICDE, ICML, WWW, and other related conferences.
This course will draw materials mainly from the textbook and some recent data mining literature. Students will study the materials and complete all the course requirements.
There will be about three homework assignments (some of which may be essentially programming assignments), one class presentation, one theme-based survey report, and one research project (which will first be presented in class and then handed in as a written report). The theme-based survey report is expected to be 10-15 pages in length and will be evaluated by a standard similar to that of a survey paper publishable in a journal or magazine. The course project will be evaluated by a standard similar to that of a research paper publishable in a conference.
There will be one midterm exam, 75 minutes in length, held around the 8th or 9th week of the course. We will not normally give make-ups for missed exams; please see the policies.
We plan to determine final grades of the course in the following way:
· Written assignments: 21% (3 assignments in total, 7% each)
· Theme-related presentation: 7% (each presentation may take 15 minutes)
· Theme-based data mining wikipedia: 7% (see: https://agora.cs.uiuc.edu/display/cs512/Home)
· Midterm exam: 30%
· Final course project: 35%. The project is due at the end of the semester, but a one-page proposal is due at the end of the 4th week. The final project will be evaluated on technical innovation, thoroughness of the work, and clarity of presentation.
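As a rough illustration of how the weights above combine, the following Python sketch computes a final score as a weighted sum. It assumes that each component is first normalized to a 0-100 scale and that the components are combined linearly; the syllabus does not specify either detail, and the names `WEIGHTS`, `final_score`, and the sample scores are hypothetical.

```python
# Weights copied from the grading breakdown (percent of the final grade).
WEIGHTS = {
    "written_assignments": 21,  # 3 assignments at 7% each
    "presentation": 7,          # theme-related presentation
    "wiki": 7,                  # theme-based data mining wikipedia
    "midterm": 30,
    "project": 35,              # final course project
}


def final_score(scores):
    """Combine per-component scores (assumed 0-100 each) into a 0-100 total."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100.0


# Hypothetical student, for illustration only.
example = {
    "written_assignments": 90,
    "presentation": 80,
    "wiki": 100,
    "midterm": 70,
    "project": 85,
}

print(final_score(example))  # 82.25
```

Because the midterm and the project together carry 65% of the weight, they dominate the result in this sketch. Actual final grades are determined by the instructor and may not follow a strict linear formula.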