CS412 Fall 2007.  Introduction to Data Warehousing and Data Mining

Wednesday and Friday, 12:30 - 1:45 pm, 1304 Siebel Center → 1310 DCL (starting on Sept. 5)

Please note that, because of the large class size, our classroom will change to 1310 DCL starting on Wed., Sept. 5, 2007.

Administration

About the Course

This introductory course covers the concepts, algorithms, techniques, and systems of data warehousing and data mining, including (1) data preprocessing, (2) design and implementation of data warehouse and OLAP systems, and (3) methods for effective and scalable data mining, including frequent pattern and correlation analysis, classification and predictive modeling, and cluster analysis. The course will serve both senior-level computer science undergraduate students and first-year graduate students interested in the field. The course may also attract students from other disciplines who need to implement and/or use data warehouse and data mining systems to analyze large amounts of data.

Prerequisites

  • Background: “Data Structure and Software Principles” or consent of instructor (a good knowledge of statistics and machine learning will help in understanding the course materials).
  • Programming: We will give one or two programming assignments. You will need to be familiar with at least one programming language, such as C++ or Java. We will not cover programming-specific issues in this course.

Textbook

  • Data Mining: Concepts and Techniques, 2nd ed., Jiawei Han and Micheline Kamber, Morgan Kaufmann, 2006. See the book's home page for course slides and other reference materials.

Reference

The following texts are recommended for reference but not required, and are on reserve at Grainger Engineering Library. Numerous other books and online resources on data mining are also available.

  • D. J. Hand, H. Mannila, and P. Smyth, Principles of Data Mining, MIT Press, 2001.
  • T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer-Verlag, 2001.
  • T. M. Mitchell, Machine Learning, McGraw Hill, 1997.
  • P.-N. Tan, M. Steinbach, and V. Kumar, Introduction to Data Mining, Addison Wesley, 2005. 
  • I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, Morgan Kaufmann, 2nd ed. 2005.

Course Format and Activities

This course will draw materials mainly from the textbook.  Students will study the materials and complete all the course requirements.

Reading: Before and After Classes

We encourage students to read the materials ahead of the lecture in which they will be discussed. If you have questions that you would like discussed in class, email them to your section TA with the subject "CS412: Lecture Questions" by 10 am the day before the corresponding lecture, and we will try to address them in class if they reflect common points of confusion.

Assignments

There will be about four assignments, spaced out over the course of the semester. At least one of them will be a programming assignment.

Examinations

There will be two exams. The midterm exam will be 75 minutes in length, and the final will be 2 hours in length. We will not normally give make-ups for missed exams; please see the policies.

Extra Quarter-Unit Work

This course is designed for three credit hours. However, graduate students who want to demonstrate their research strength may take this course for one extra unit. Those taking the class for the extra unit are expected to attend a one-hour meeting each week and to complete a course project. Please refer to the project description for more details.

Evaluation

We plan to determine final grades of the course in the following way:

  • Written Assignments: 25% (3 homework assignments expected)
  • Programming assignments: 10% (one programming assignment expected)
  • Midterm exam: 30%
  • Final exam: 35%
  • Project (only for students taking the extra unit): 25%. The overall scores will be scaled proportionally.
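
The weighting above can be sketched as a small calculation. This is only an illustration under two assumptions that the syllabus does not spell out: each component score is expressed as a percentage (0-100), and "scaled proportionally" means dividing the weighted sum by the total weight (100 without the project, 125 with it). The names `BASE_WEIGHTS`, `PROJECT_WEIGHT`, and `overall_score` are hypothetical, and the actual grading procedure may differ.

```python
# Weights taken from the list above (in percent).
BASE_WEIGHTS = {
    "written": 25,
    "programming": 10,
    "midterm": 30,
    "final": 35,
}
PROJECT_WEIGHT = 25  # applies only to students taking the extra unit


def overall_score(scores, project=None):
    """Return an overall score on a 0-100 scale.

    scores:  dict mapping each key of BASE_WEIGHTS to a percentage (0-100).
    project: project percentage (0-100) for extra-unit students, else None.

    Assumption: "scaled proportionally" is read as dividing the weighted
    sum by the total weight of the components that apply.
    """
    weights = dict(BASE_WEIGHTS)
    all_scores = dict(scores)
    if project is not None:
        weights["project"] = PROJECT_WEIGHT
        all_scores["project"] = project
    total_weight = sum(weights.values())  # 100 without project, 125 with
    weighted_sum = sum(weights[name] * all_scores[name] for name in weights)
    return weighted_sum / total_weight
```

For example, under these assumptions a student scoring 80 (written), 90 (programming), 70 (midterm), and 85 (final) would get 79.75 overall; with a project score of 100 as well, the same student would get 83.8.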

Jiawei Han