Resources: Supplementary Readings
Part I: Reference Materials for CS412
Introduction
- U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, R. Uthurusamy, Advances in Knowledge Discovery and
Data Mining. The MIT Press, 1996.
- J.
Han and M. Kamber. Data Mining:
Concepts and Techniques. Morgan Kaufmann, 2000.
- R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, 2nd ed.,
Wiley-Interscience, 2001.
- U. Fayyad, G. Grinstein, and
A. Wierse, Information Visualization in Data Mining and Knowledge Discovery,
Morgan Kaufmann, 2001
- T. Hastie, R. Tibshirani, and
J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and
Prediction, Springer-Verlag, 2001
- I. H. Witten and E. Frank, Data
Mining: Practical Machine Learning Tools and Techniques with Java
Implementations, Morgan Kaufmann, 2001
- V.
Ganti, J. Gehrke, R. Ramakrishnan. Mining
very large databases. COMPUTER, 32(8):38-45, 1999.
- S.
Chaudhuri, U. Dayal, and V. Ganti, Database
Technology for Decision Support Systems. Computer,
34(12):48-55, Dec. 2001.
Data Preprocessing
- T. Dasu and T. Johnson, Exploratory Data Mining and Data
Cleaning, John Wiley & Sons, Inc., New Jersey, 2003.
- D. Barbará et al. The New Jersey Data Reduction Report. Bulletin of the Technical
Committee on Data Engineering, 20, Dec. 1997, pp. 3-45.
- H. Liu, F. Hussain, C. L. Tan, and M. Dash. Discretization:
An enabling technique. Data Mining and Knowledge Discovery,
6(4): 393-423, 2002.
- V.
Raman and J. M. Hellerstein. Potter's
Wheel: An Interactive Data Cleaning System, Proc. 2001 Int.
Conf. on Very Large Data Bases (VLDB'01), Rome, Italy, pp. 381-390, Sept.
2001.
- H. Galhardas, D. Florescu, D. Shasha, E. Simon, and C.-A. Saita. Declarative
Data Cleaning: Language, Model, and Algorithms. Proc. 2001 Int. Conf.
on Very Large Data Bases (VLDB'01), Rome, Italy, pp. 371-380, Sept. 2001.
- D.
Pyle. Data Preparation for Data Mining. Morgan Kaufmann, 1999.
- T.
Dasu, T. Johnson, S. Muthukrishnan, V.
Shkapenyuk. Mining
Database Structure; Or, How to Build a Data Quality Browser. Proc.
2002 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD'02), Madison, WI,
pp. 240-251, June 2002.
Data Warehouse, OLAP, and Data Generalization
- R. Kimball. The Data Warehouse Toolkit, 2nd ed.,
John Wiley & Sons, New York, 2002.
- S. Chaudhuri and U. Dayal.
An overview of data warehousing and OLAP technology. ACM SIGMOD
Record, 26(1):65-74, 1997.
- J.
Gray, S. Chaudhuri, A. Bosworth, A. Layman, D. Reichart, M. Venkatrao, F.
Pellow, and H. Pirahesh. Data
cube: A relational aggregation operator generalizing group-by, cross-tab
and sub-totals. Data Mining and Knowledge Discovery,
1(1):29-54, 1997.
- V. Harinarayan,
A. Rajaraman, and J. D. Ullman.
Implementing data cubes efficiently. In SIGMOD'96, pp. 205-216,
Montreal, Canada, June 1996.
- S.
Agarwal, R. Agrawal, P. M. Deshpande, A. Gupta, J. F. Naughton, R.
Ramakrishnan, and S. Sarawagi. On
the computation of multidimensional aggregates. In Proc. 1996 Int.
Conf. Very Large Data Bases (VLDB'96), pp. 506-521, Bombay, India,
Sept. 1996.
- Y.
Zhao, P. M. Deshpande, and J. F. Naughton. An
array-based algorithm for simultaneous multidimensional aggregates. In
SIGMOD'97, pp. 159-170, Tucson,
Arizona, May 1997.
- R.
Agrawal, A. Gupta, and S. Sarawagi. Modeling
multidimensional databases. In Proc. 1997 Int. Conf. Data
Engineering (ICDE'97), Birmingham,
England,
April 1997.
- J. Han, Y. Cai and N. Cercone, Knowledge
Discovery in Databases: An Attribute-Oriented Approach. In VLDB'92,
pp. 547-559, Vancouver, Canada, August 1992.
- S.
Sarawagi, R. Agrawal, and N. Megiddo.
Discovery-driven exploration of OLAP data cubes. In Proc. Int.
Conf. of Extending Database Technology (EDBT'98), Valencia, Spain, pp.
168-182, March 1998.
- S. Sarawagi. Explaining
Differences in Multidimensional Aggregates. In Proc. Int. Conf. of
Very Large Data Bases (VLDB'99), pp. 42-53.
- K. A. Ross, D. Srivastava, and D.
Chatziantoniou. Complex aggregation at multiple
granularities. In EDBT'98, pp. 263-277, Valencia, Spain, March 1998.
- K. Beyer and R. Ramakrishnan. Bottom-up
computation of sparse and iceberg cubes. In SIGMOD'99, pp.
359-370, Philadelphia, PA, June 1999.
- J. Han. Towards
on-line analytical mining in large databases. ACM SIGMOD Record,
27:97-107, 1998.
- G. Sathe and S. Sarawagi. Intelligent
Rollups in Multidimensional OLAP Data. In Proc. Int. Conf. of Very
Large Data Bases (VLDB'01), Rome, Italy, pp. 531-540.
- J. Han, J. Pei, G. Dong, and K. Wang. Efficient
computation of iceberg cubes with complex measures. In SIGMOD'01,
pp. 1-12, Santa Barbara, CA, May 2001.
- G.
Dong, J. Han, J. Lam, J. Pei, and K. Wang. Mining
Multi-Dimensional Constrained Gradients in Data Cubes. In VLDB'01,
Rome, Italy, Sept. 2001.
- W. Wang, H. Lu, J. Feng, and J. X. Yu. Condensed
Cube: An Effective Approach to Reducing Data Cube Size. In Proc.
2002 Int. Conf. Data Engineering (ICDE'02), San Francisco, CA,
April 2002.
- L. V.
S. Lakshmanan, J. Pei, and J. Han, Quotient
Cube: How to Summarize the Semantics of a Data Cube, Proc.
2002 Int. Conf. on Very Large Data Bases (VLDB'02), Hong Kong, China, Aug.
2002.
- D.
Xin, J. Han, X. Li, B. W. Wah, “Star-Cubing:
Computing Iceberg Cubes by Top-Down and Bottom-Up Integration”,
Proc. 2003 Int. Conf. on Very Large Data Bases (VLDB'03), Berlin, Germany, Sept. 2003.
- X. Li, J. Han, and H.
Gonzalez, “High-Dimensional
OLAP: A Minimal Cubing Approach”, Proc. 2004 Int. Conf. on Very
Large Data Bases (VLDB'04), Toronto,
Canada,
Aug. 2004
- Z. Shao, J. Han, and D. Xin,
“MM-Cubing:
Computing Iceberg Cubes by Factorizing the Lattice Space”, Proc. 2004
Int. Conf. on Scientific and Statistical Database Management (SSDBM'04), Santorini Island, Greece, June 2004
Mining Frequent Patterns and Association Rules in Large Databases
Basic concepts
- R.
Agrawal, T. Imielinski, and A. Swami.
Mining association rules between sets of items in large
databases. SIGMOD'93,
207-216, Washington, D.C. (citeseer)
- H. Mannila, H. Toivonen, and A. I.
Verkamo. Efficient algorithms for discovering association rules.
KDD'94, 181-192, Seattle,
WA, July 1994. (citeseer)
Efficient mining algorithms (including efficient algorithms for mining max and closed patterns)
- R.
Agrawal and R. Srikant. Fast
algorithms for mining association rules. In VLDB'94, pp.
487-499, Santiago, Chile, Sept. 1994.
- Ashoka
Savasere, Edward Omiecinski, Shamkant B. Navathe: An Efficient Algorithm
for Mining Association Rules in Large Databases. VLDB 1995: 432-444. (citeseer)
- J.S. Park,
M.S. Chen, and P.S. Yu. An effective hash-based algorithm for mining
association rules. SIGMOD'95, San
Jose, CA, May
1995. (citeseer)
- D.W.
Cheung, J. Han, V. Ng, and C.Y. Wong. Maintenance of discovered
association rules in large databases: An incremental updating technique.
ICDE'96, New Orleans, LA.
(citeseer)
- T.
Fukuda, Y. Morimoto, S. Morishita, and T. Tokuyama. Data mining using
two-dimensional optimized association rules: Scheme, algorithms, and
visualization. SIGMOD'96, Montreal,
Canada.
(citeseer)
- H.
Toivonen. Sampling large databases
for association rules. VLDB'96,
134-145, Bombay, India, Sept. 1996. (citeseer)
- J. Han, J. Pei, and Y. Yin. Mining
Frequent Patterns without Candidate Generation, Proc. 2000
ACM-SIGMOD Int. Conf. on Management of Data (SIGMOD'00), Dallas, TX,
May 2000.
- R.
Agarwal, C. Aggarwal, and V. V. V. Prasad. A tree projection algorithm for
generation of frequent itemsets. In Journal of Parallel and Distributed
Computing, 2000. (citeseer)
- J. Pei, J. Han, H. Lu, S. Nishio, S. Tang, and D. Yang. H-Mine:
Hyper-Structure Mining of Frequent Patterns in Large Databases,
Proc. 2001 Int. Conf. on Data Mining (ICDM'01), San Jose, CA, Nov. 2001.
- M. J. Zaki and C.-J. Hsiao. CHARM:
An Efficient Algorithm for Closed Itemset Mining, Proc. 2002 SIAM Int. Conf. Data Mining (SDM'02), Arlington, VA,
pp. 457-473, April 2002.
- J.
Wang, J. Han, and J. Pei, “CLOSET+:
Searching for the Best Strategies for Mining Frequent Closed Itemsets”,
Proc. 2003 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining
(KDD'03), Washington, D.C., Aug. 2003.
- Y. Xu, J. X. Yu, G. Liu, H. Lu, From
Path Tree To Frequent Patterns: A Framework for Mining Frequent Patterns,
Proc. 2002 Int. Conf. on Data Mining (ICDM'02), Japan, Dec. 2002
- F. Pan, G. Cong, A. K. H. Tung,
J. Yang, and M. Zaki, CARPENTER:
Finding Closed Patterns in Long Biological Datasets, Proc. 2003
ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'03),
Washington, D.C., Aug. 2003.
- G. Liu, H. Lu, Y. Xu, J. X. Yu, Ascending
Frequency Ordered Prefix-tree: Efficient Mining of Frequent Patterns,
Proc. 2003 Int. Conf. on Database Systems for Advanced Applications
(DASFAA’03), Kyoto, Japan, March
2003.
- Mohammad
El-Hajj and Osmar R. Zaïane, Inverted Matrix: Efficient Discovery of Frequent
Items in Large Datasets in the Context of Interactive Mining,
in Proc. 2003 Int'l Conf. on Data Mining and Knowledge Discovery (ACM
SIGKDD), Washington, DC, USA, August 24-27, 2003
- G. Liu, H. Lu, W. Lou, J. X. Yu, On
Computing, Storing and Querying Frequent Patterns, Proc. 2003 ACM
SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'03),
Washington, D.C., Aug. 2003.
- B.
Goethals, M. Zaki: FIMI: Workshop on Frequent Itemset Mining
Implementations (An Introduction). ICDM-FIMI Workshop, Melbourne, Florida,
Nov. 2003.
- Gao
Cong, Anthony K.H. Tung, Xin Xu, Feng Pan, Jiong Yang, FARMER:
Finding Interesting Rule Groups in Microarray Datasets, SIGMOD’04
Extension of the scope: Mining multilevel, quantitative rules, correlation and causality
- J.
Han and Y. Fu. Discovery
of multiple-level association rules from large databases. In VLDB'95,
pp. 420-431, Zürich, Switzerland, Sept. 1995.
- R.
Srikant and R. Agrawal. Mining
generalized association rules. In VLDB'95, pp. 407-419, Zürich, Switzerland, Sept. 1995.
- R.
Srikant and R. Agrawal. Mining
quantitative association rules in large relational tables. In SIGMOD'96,
pp. 1-12, Montreal, Canada, June 1996.
- M.J.
Zaki, S. Parthasarathy, M. Ogihara, and W. Li. New algorithms for fast
discovery of association rules. KDD’97. August 1997. (citeseer)
- B.
Lent, A. Swami, and J. Widom. Clustering
association rules. In ICDE'97, pp. 220-231, Birmingham, England,
April 1997.
- S. Brin, R. Motwani, and C.
Silverstein. Beyond
market basket: Generalizing association rules to correlations. In SIGMOD'97,
pp. 265-276, Tucson, Arizona, May 1997.
- C. Silverstein, S. Brin, R. Motwani, and
J. Ullman. Scalable techniques for mining causal structures. VLDB'98, 594-605, New York, NY.
(citeseer)
- D. Tsur, J. D. Ullman, S. Abiteboul, C. Clifton, R. Motwani, and S. Nestorov.
Query flocks: A generalization of
association-rule mining. SIGMOD'98, 1-12, Seattle, Washington. (citeseer)
- Y. Aumann and Y. Lindell. A
Statistical Theory for Quantitative Association Rules. Proc. 1999
Int. Conf. Knowledge Discovery and Data Mining (KDD'99), San Diego, CA,
261-270, Aug. 1999.
- R.
J. Bayardo. Efficiently mining long patterns from databases. SIGMOD'98,
85-93, Seattle, Washington. (citeseer)
- J.
Han, J. Wang, Y. Lu, and P. Tzvetkov, “Mining
Top-K Frequent Closed Patterns without Minimum Support”, Proc.
2002 Int. Conf. on Data Mining (ICDM'02), Maebashi, Japan,
Dec. 2002.
- A. Savasere, E. Omiecinski, S. B. Navathe, Mining
for Strong Negative Associations in a Large Database of Customer
Transactions, In ICDE'98, Orlando, Florida, Feb. 1998.
- E.
Omiecinski. Alternative Interest Measures for Mining
Associations, IEEE Trans. Knowledge and Data Engineering,
15(1):57-69, 2003.
- Y.-K. Lee, W.-Y. Kim, Y. D. Cai, and J.
Han, “CoMine:
Efficient Mining of Correlated Patterns”, Proc.
2003 Int. Conf. on Data Mining (ICDM'03), Melbourne, FL,
Nov. 2003.
- Deepayan Chakrabarti, Spiros
Papadimitriou, Dharmendra Modha, Christos Faloutsos, Fully
Automatic Cross-Associations, Proc.
2004 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'04), Seattle, WA,
Aug. 2004, pp. 79-88
Constraint-based mining:
- R.
Srikant, Q. Vu, and R. Agrawal. Mining association rules with item
constraints. KDD'97, 67-73, Newport
Beach, California,
1997. (citeseer)
- R. Ng, L. V. S. Lakshmanan, J. Han, and A. Pang. Exploratory
mining and pruning optimizations of constrained association rules. In
SIGMOD'98, pp. 13-24, Seattle, Washington, June 1998.
- F.
Korn, A. Labrinidis, Y. Kotidis, and C. Faloutsos. Ratio rules: A new
paradigm for fast, quantifiable data mining. VLDB'98, 582-593, New York, NY.
(citeseer)
- J. Han, L. V. S. Lakshmanan,
and R. T. Ng. Constraint-based,
multidimensional data mining. COMPUTER, 32(8): 46-50, 1999.
- Edith
Cohen, Mayur Datar, Shinji Fujiwara, Aristides Gionis, Piotr Indyk, Rajeev
Motwani, Jeffrey D. Ullman, Cheng Yang: Finding Interesting
Associations without Support Pruning. In Proc. Int. Conf. on Data
Engineering (ICDE 2000), pp. 489-499, 2000.
- R.
J. Bayardo and R. Agrawal. Mining the most interesting rules. In Proc. 1999
Int. Conf. Knowledge Discovery and Data Mining (KDD'99), pp. 145-154, San Diego, CA,
Aug. 1999. (citeseer)
- N.
Pasquier, Y. Bastide, R. Taouil, and L. Lakhal. Discovering frequent
closed itemsets for association rules. In Proc. 7th Int. Conf. Database
Theory (ICDT'99), pages 398-416, Jerusalem,
Israel,
Jan. 1999. (citeseer)
- J. Pei, J. Han, and L. V. S.
Lakshmanan. Mining
Frequent Itemsets with Convertible Constraints, Proc. 2001 Int.
Conf. on Data Engineering (ICDE'01), Heidelberg,
Germany,
April 2001.
- G.
Grahne, L. Lakshmanan, and X. Wang. Efficient mining of constrained
correlated sets. ICDE'00, 512-521, San
Diego, CA, Feb.
2000. (citeseer)
Language primitives and applications:
- R.
Meo, G. Psaila, and S. Ceri. A new
SQL-like operator for mining association rules. In VLDB'96, pp.
122-133, Bombay, India, Sept. 1996.
- T.
Imielinski and A. Virmani. MSQL: a
query language for database mining. Data Mining and Knowledge
Discovery, 3(4): 373-408, 1999.
- G. Dong, J. Han, J. Lam, J. Pei, K.
Wang, and W. Zou, “Mining
Constrained Gradients in Multi-Dimensional Databases”, IEEE
Transactions on Knowledge and Data Engineering, 16(5), 2004.
Classification and Prediction
- J.
R. Quinlan. Induction of decision trees. Machine Learning,
1:81-106, 1986.
- T. M. Mitchell, Machine
Learning, McGraw Hill, 1997.
- S. M. Weiss and N. Indurkhya, Predictive Data Mining, Morgan
Kaufmann, 1998
- J.
Shafer, R. Agrawal, and M. Mehta. SPRINT:
A scalable parallel classifier for data mining. In VLDB'96, pp.
544-555, Bombay, India, Sept. 1996.
- J. Gehrke, R. Ramakrishnan, V. Ganti.
RainForest:
A framework for fast decision tree construction of large datasets. In VLDB'98,
pp. 416-427, New York,
NY, August 1998.
- J. Gehrke, V. Ganti, R. Ramakrishnan, and W.-Y. Loh, BOAT
-- Optimistic Decision Tree Construction. In SIGMOD'99, Philadelphia, Pennsylvania,
1999
- S. K.
Murthy. Automatic
construction of decision trees from data: A multi-disciplinary survey.
Data Mining and Knowledge Discovery, 2(4): 345-389, 1998.
- C. J.
C. Burges. A
Tutorial on Support Vector Machines for Pattern Recognition. Data
Mining and Knowledge Discovery, 2(2): 121-168, 1998.
- B.
Liu, W. Hsu, and Y. Ma. Integrating
Classification and Association Rule Mining. Proc. 1998 Int. Conf.
Knowledge Discovery and Data Mining (KDD'98) New York, NY,
Aug. 1998.
- M. Ankerst, M. Ester, and H.-P.
Kriegel. Towards an effective cooperation of the user and the
computer for classification. In Proc. 2000 Int. Conf. Knowledge
Discovery and Data Mining (KDD'00), pages 179-188, Boston, MA,
Aug. 2000. (citeseer)
- W. Li, J. Han, and J. Pei, CMAR:
Accurate and Efficient Classification Based on Multiple Class-Association
Rules, Proc. 2001 Int. Conf. on Data Mining (ICDM'01), San Jose, CA,
Nov. 2001.
- X. Yin and J. Han, “CPAR:
Classification based on Predictive Association Rules”, Proc. 2003 SIAM
Int. Conf. on Data Mining (SDM'03), San Francisco, CA,
May 2003.
- H. Yu,
J. Yang, and J. Han, “Classifying
Large Data Sets Using SVM with Hierarchical Clusters”, Proc. 2003 ACM
SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'03),
Washington, D.C., Aug. 2003.
- X.
Yin, J. Han, J. Yang, and P. S. Yu, “CrossMine:
Efficient Classification across Multiple Database Relations”, Proc.
2004 Int. Conf. on Data Engineering (ICDE'04), Boston, MA,
March 2004.
Cluster Analysis
- L.
Kaufman and P. J. Rousseeuw. Finding
Groups in Data: an Introduction to Cluster Analysis. John Wiley
& Sons, 1990.
- R. Ng
and J. Han. Efficient
and effective clustering method for spatial data mining. In VLDB'94,
pp. 144-155, Santiago,
Chile,
Sept. 1994.
- T.
Zhang, R. Ramakrishnan, and M. Livny. BIRCH:
An efficient data clustering method for very large databases. In SIGMOD'96,
pp. 103-114, Montreal,
Canada,
June 1996.
- E.
Schikuta. Grid clustering: An efficient hierarchical clustering method for
very large data sets. Proc. 1996 Int. Conf. on Pattern Recognition,
101-105. (citeseer)
- M. Ester, H.-P. Kriegel, J. Sander,
and X. Xu. A
density-based algorithm for discovering clusters in large spatial
databases. In KDD'96, pp. 226-231, Portland, Oregon,
August 1996.
- W. Wang, J. Yang, R. Muntz, STING: A Statistical Information Grid Approach to
Spatial Data Mining, VLDB’97, 1997. (citeseer)
- S.
Guha, R. Rastogi, and K. Shim. CURE:
An efficient clustering algorithm for large databases. In SIGMOD'98,
pp. 73-84, Seattle, Washington, June 1998.
- S.
Guha, R. Rastogi, and K. Shim. ROCK:
A robust clustering algorithm for categorical attributes. In ICDE'99,
pp. 512-521, Sydney, Australia, March 1999.
- R.
Agrawal, J. Gehrke, D. Gunopulos, and P. Raghavan. Automatic
subspace clustering of high dimensional data for data mining applications.
In SIGMOD'98, pp. 94-105, Seattle,
Washington, June 1998.
- Alexander
Hinneburg, Daniel A. Keim: An Efficient Approach to Clustering in Large
Multimedia Databases with Noise. KDD 1998: 58-65, 1998. (citeseer)
- G.
Sheikholeslami, S. Chatterjee, and A. Zhang. WaveCluster:
A multi-resolution clustering approach for very large spatial databases.
In VLDB'98, pp. 428-439, New
York, NY, August
1998.
- D. Gibson, J. Kleinberg, and P.
Raghavan. Clustering categorical data: An approach based on
dynamic systems. In Proc. VLDB’98. (citeseer)
- G.
Karypis, E.-H. Han, and V. Kumar. CHAMELEON:
A Hierarchical Clustering Algorithm Using Dynamic Modeling. COMPUTER,
32(8): 68-75, 1999.
- Wei Wang, Jiong Yang, Richard Muntz. STING+:
an approach to active spatial data mining. ICDE 99, pp. 116-125.
1999. (citeseer)
- M. Ankerst, M. Breunig, H.-P.
Kriegel, and J. Sander. OPTICS:
Ordering points to identify the clustering structure. In SIGMOD'99,
pp. 49-60, Philadelphia, PA, June 1999.
- V. Ganti, J. Gehrke, R. Ramakrishnan. CACTUS: Clustering Categorical Data Using
Summaries. Proc. 1999 Int. Conf. Knowledge Discovery and Data Mining
(KDD'99), San Diego, CA, 261-270, Aug. 1999. (citeseer) (Journal
version: citeseer)
- M. M. Breunig, H.-P. Kriegel, R. Ng,
J. Sander. LOF: Identifying Density-Based Local Outliers. In Proc.
ACM SIGMOD Int. Conf. on Management of Data (SIGMOD 2000), Dallas, TX,
2000, pp. 93-104. (citeseer)
- A. K. H. Tung, J. Han, L. V. S. Lakshmanan, and R. T. Ng. Constraint-Based
Clustering in Large Databases, Proc. 2001 Int. Conf. on
Database Theory (ICDT'01), London, U.K., Jan. 2001.
- A. K. H. Tung, J. Hou, and J. Han. Spatial
Clustering in the Presence of Obstacles, Proc. 2001 Int. Conf.
on Data Engineering (ICDE'01), Heidelberg, Germany, April 2001
- H.
Wang, W. Wang, J. Yang, and P.S. Yu. Clustering
by pattern similarity in large data sets, Proc. the ACM
SIGMOD International Conference on Management of Data (SIGMOD), Madison, Wisconsin,
2002.
- F. Beil, M. Ester, and X. Xu. "Frequent
Term-Based Text Clustering", Proc. 8th Int. Conf. on Knowledge
Discovery and Data Mining (KDD'02), Edmonton, Alberta, Canada, 2002.
- L. Parsons, E. Haque and
H. Liu, Subspace
Clustering for High Dimensional Data: A Review, SIGKDD Explorations,
6(1), June 2004
- Samer Nassar, Jörg Sander, Corrine Cheng, Incremental
and Effective Data Summarization for Dynamic Hierarchical Clustering,
SIGMOD’04
- Sugato Basu, Mikhail Bilenko,
Raymond Mooney, A
Probabilistic Framework for Semi-Supervised Clustering, Proc. 2004 ACM SIGKDD Int. Conf. on
Knowledge Discovery and Data Mining (KDD'04), Seattle, WA, Aug. 2004
Part II: Reference Materials for CS512
Stream Data Mining
- R. Motwani and P. Raghavan, Randomized
Algorithms, Cambridge University Press, 1995.
- S. Babu and J. Widom. Continuous
Queries over Data Streams. SIGMOD Record, pp. 109-120, Sept. 2001.
- B. Babcock, S. Babu, M. Datar, R. Motwani and J. Widom, “Models
and Issues in Data Stream Systems”, Proc. 2002
ACM-SIGACT/SIGART/SIGMOD Symp. on Principles of Database Systems (PODS'02),
Madison, WI, June 2002. (Conference tutorial)
- M.
Garofalakis, J. Gehrke, R. Rastogi, “Querying
and Mining Data Streams: You Only Get One Look”, Tutorial
at 2002 ACM-SIGMOD Int. Conf. on Management of Data (SIGMOD'02), Madison, WI,
June 2002.
- S.
Muthukrishnan, Data streams: algorithms and
applications, Proceedings
of the fourteenth annual ACM-SIAM symposium on Discrete algorithms, 2003.
- Y. Chen, G. Dong, J. Han, B. W. Wah, and J. Wang, “Multi-Dimensional
Regression Analysis of Time-Series Data Streams”, Proc. 2002 Int.
Conf. on Very Large Data Bases (VLDB'02), Hong Kong, China, Aug. 2002.
- S.
Guha, N. Mishra, R. Motwani, and L. O'Callaghan. Clustering
Data Streams, Proc. IEEE Symposium on Foundations of Computer
Science (FOCS'00), Redondo
Beach, CA, pp.
359-366, 2000
- Stratis
Viglas, Jeffrey Naughton, Rate-Based
Query Optimization for Streaming Information Sources, SIGMOD’02
- Samuel Madden, Mehul Shah, Joseph Hellerstein, Vijayshankar Raman, Continuously
Adaptive Continuous Queries over Streams, SIGMOD’02.
- Alin Dobra, Minos N. Garofalakis, Johannes Gehrke, Rajeev Rastogi, Processing
Complex Aggregate Queries over Data Streams, SIGMOD’02
- Gurmeet Singh Manku, Rajeev Motwani. Approximate
Frequency Counts over Data Streams, VLDB’02
- Yunyue
Zhu, Dennis Shasha. StatStream: Statistical
Monitoring of Thousands of Data Streams in Real Time, VLDB’02
- J. Gehrke, F. Korn, D.
Srivastava. On
computing correlated aggregates over continuous data streams.
Proc. 2001 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD'01), Santa Barbara, CA,
pp. 13-24, May 2001.
- Geoff
Hulten, Laurie Spencer, Pedro Domingos: Mining
time-changing data streams. KDD
2001: 97-106
- J. Han, “Mining
Dynamics of Data Streams in Multidimensional Space” (in PowerPoint),
ICDM'02 Keynote Speech, Maebashi City, Japan, Dec. 2002.
- C.
Aggarwal, J. Han, J. Wang, P. S. Yu, “A
Framework for Clustering Evolving Data Streams”, Proc.
2003 Int. Conf. on Very Large Data Bases (VLDB'03), Berlin, Germany,
Sept. 2003.
- H. Wang, W. Fan, P. S. Yu,
and J. Han, “Mining
Concept-Drifting Data Streams using Ensemble Classifiers”, Proc. 2003
ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'03),
Washington, D.C., Aug. 2003.
- C. Giannella, J. Han,
J. Pei, X. Yan and P.S. Yu, “Mining
Frequent Patterns in Data Streams at Multiple Time Granularities”, H.
Kargupta, A. Joshi, K. Sivakumar, and Y. Yesha (eds.), Next Generation
Data Mining, 2003.
- Wei Fan, Systematic
Data Selection to Mine Concept-Drifting Data Streams, KDD-04.
Mining Time-Series
- R.
Agrawal, K.-I. Lin, H.S. Sawhney, and K. Shim. Fast
similarity search in the presence of noise, scaling, and translation in
time-series databases. In VLDB'95, pp. 490-501, Zurich, Switzerland, Sept. 1995.
- Y.-S. Moon, K.-Y. Whang, W.-K. Loh. Duality-Based
Subsequence Matching in Time-Series Databases, Proc. 2001 Int.
Conf. Data Engineering (ICDE'01), Heidelberg, Germany,
pp. 263-272, April 2001
- R.
Agrawal, G. Psaila, E. L. Wimmers, and M. Zait. Querying
shapes of histories. In VLDB'95, pp. 502-514, Zürich, Switzerland,
Sept. 1995.
- Michail Vlachos, Chris Meek, Zografoula Vagena, Dimitrios
Gunopulos, Identifying Similarities, Periodicities and Bursts for
Online Search Queries, SIGMOD’04, pp. 213-224
Mining Sequential Patterns
- R. Agrawal and R. Srikant. Mining
sequential patterns. In ICDE'95, pp. 3-14, Taipei, Taiwan,
March 1995.
- H. Mannila, H. Toivonen, and A. I. Verkamo, Discovery
of Frequent Episodes in Event Sequences. Data Mining and Knowledge
Discovery, 1997, vol. 1, no. 3, pp. 259-289
- M.
Garofalakis, R. Rastogi, and K. Shim. SPIRIT:
Sequential pattern mining with regular expression constraints. In Proc.
1999 Int. Conf. Very Large Data Bases (VLDB'99), pp. 223-234,
Edinburgh, UK, Sept. 1999.
- J. Pei, J. Han, H. Pinto, Q. Chen, U. Dayal, and M.-C. Hsu. PrefixSpan:
Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth,
Proc. 2001 Int. Conf. on Data Engineering (ICDE'01), Heidelberg, Germany, April 2001.
- J.
Han, G. Dong, and Y. Yin. Efficient mining of
partial periodic patterns in time series database. In ICDE'99,
pp. 106-115, Sydney, Australia, April 1999.
- J. Pei, J. Han, and W. Wang, “Mining
Sequential Patterns with Constraints in Large Databases”, Proc.
2002 Int. Conf. on Information and Knowledge Management (CIKM'02), Washington, D.C.,
Nov. 2002.
- X. Yan, J. Han, and R. Afshar, “CloSpan:
Mining Closed Sequential Patterns in Large Datasets”, Proc. 2003 SIAM Int. Conf. on Data Mining (SDM'03), San Francisco, CA,
May 2003.
- P. Tzvetkov, X. Yan, and J. Han, “TSP: Mining Top-K
Closed Sequential Patterns”, Proc.
2003 Int. Conf. on Data Mining (ICDM'03), Melbourne, FL,
Nov. 2003.
- J.
Wang and J. Han, “BIDE: Efficient
Mining of Frequent Closed Sequences”, Proc. 2004 Int. Conf. on Data Engineering (ICDE'04), Boston,
MA, March 2004.
Spatial, Spatiotemporal, and Multimedia Data Mining
- K.
Koperski and J. Han. Discovery
of spatial association rules in geographic information databases. In Proc.
4th Int'l Symp. on Large Spatial Databases (SSD'95), pp. 47-66, Portland, Maine,
Aug. 1995.
- X.
Zhou, D. Truffet, and J. Han. Efficient polygon
amalgamation methods for spatial OLAP and spatial data mining. In SSD'99,
pp. 167-187, Hong Kong, Aug. 1999.
- J. Han, R. B. Altman, V. Kumar, H. Mannila and D. Pregibon, “Emerging
Scientific Applications in Data Mining”, Communications of ACM,
45(8):54-58, 2002.
- Shashi Shekhar and Sanjay
Chawla, Spatial Databases: A Tour, Prentice Hall, 2003 (ISBN
013-017480-7). Chapter 7: Introduction to Spatial Data Mining.
- S. Shekhar and Y. Huang, Discovering
Spatial Co-location Patterns: A Summary of Results, Proc. 7th
International Symposium on Spatial and Temporal Databases (SSTD01), L.A.,
CA, July 2001.
- S. Shekhar, C. T. Lu, and P. Zhang, A
Unified Approach to Detecting Spatial Outliers, GeoInformatica,
2003 (A shorter version appeared in SIGKDD 2001).
- S. Shekhar, V. R. Raju, P. Schrater, W. Wu, Spatial
Contextual Classification and Prediction Models for Mining Geospatial Data,
IEEE Trans. on Multimedia Systems, vol. 4, no. 2, June 2002.
- Tom Barclay, Jim Gray, and Don Slutz, Microsoft
TerraServer: A Spatial Data Warehouse, 2000 ACM SIGMOD, Dallas, TX,
pp. 307-318
- Yufei Tao (City University of Hong Kong), Christos Faloutsos (Carnegie Mellon University),
Dimitris Papadias, Bin Liu (Hong Kong University of Science),
Prediction and Indexing of Moving Objects with Unknown
Motion Patterns. Proc. 2004 ACM-SIGMOD Int. Conf. on Management
of Data (SIGMOD'04), Paris, France, June 2004
- Yuhan Cai, Raymond Ng (University of British Columbia, Vancouver, Canada),
Indexing Spatio-Temporal Trajectories with Chebyshev
Polynomials. Proc. 2004 ACM-SIGMOD
Int. Conf. on Management of Data (SIGMOD'04), Paris, France,
June 2004
- Man Lung Yiu, Nikos Mamoulis, Clustering
Objects on a Spatial Network, SIGMOD’04
Mining Graphs and Structured Patterns
- X.
Yan and J. Han, “gSpan:
Graph-Based Substructure Pattern Mining”, Proc. 2002 Int. Conf. on
Data Mining (ICDM'02), Maebashi,
Japan,
Dec. 2002.
- X.
Yan and J. Han, “CloseGraph:
Mining Closed Frequent Graph Patterns”, Proc. 2003 ACM SIGKDD Int.
Conf. on Knowledge Discovery and Data Mining (KDD'03), Washington, D.C.,
Aug. 2003.
- G. Jeh and J. Widom, Mining
the Space of Graph Properties, KDD'04, pp. 187-197.
- Christos Faloutsos, Kevin
McCurley, and Andrew Tomkins, “Fast
Discovery of 'Connection Subgraphs',” Proc. 2004 ACM SIGKDD Int. Conf. on
Knowledge Discovery and Data Mining (KDD'04), Seattle, WA, Aug. 2004
- X.
Yan, P. S. Yu, and J. Han, “Graph
Indexing: A Frequent Structure-based Approach”, Proc. 2004 ACM-SIGMOD
Int. Conf. on Management of Data (SIGMOD'04), Paris, France, June
2004
Biological Data Mining
- J. Yang, P. Yu, W. Wang, and J. Han, “Mining
Long Sequential Patterns in a Noisy Environment”, Proc. 2002
ACM-SIGMOD Int. Conf. on Management of Data (SIGMOD'02), Madison, WI, June
2002.
- H. Wang, W. Wang, J. Yang, and
P.S. Yu. Clustering
by pattern similarity in large data sets, Proc. the ACM SIGMOD
International Conference on Management of Data (SIGMOD), Madison, Wisconsin,
2002.
Mining Social Networks
- P. Domingos and M. Richardson. Mining
the Network Value of Customers, in Proc. 2001 ACM SIGKDD Int. Conf. on
Knowledge Discovery and Data Mining (pp. 57-66), 2001. San Francisco, CA:
ACM Press.
- P.
Domingos and M. Richardson. Mining
Knowledge-Sharing Sites for Viral Marketing, Proceedings of the Eighth
International Conference on Knowledge Discovery and Data Mining, 2002.
- D. Kempe, J. Kleinberg, E. Tardos. Maximizing
the Spread of Influence through a Social Network. Proc. 9th ACM SIGKDD
Intl. Conf. on Knowledge Discovery and Data Mining, 2003.
- Deng Cai, Xiaofei He, Ji-Rong
Wen and Wei-Ying Ma. Block-level Link Analysis, The 27th Annual International ACM
SIGIR Conference (SIGIR'2004), July 2004.
- Deng Cai, Xiaofei He,
Zhiwei Li, Wei-Ying Ma and Ji-Rong Wen. Hierarchical Clustering of WWW
Image Search Results Using Visual, Textual and Link Analysis, ACM Multimedia 2004, Oct. 2004.
Multi-relational Data Mining
- S. Dzeroski. Multi-relational data mining: an introduction. ACM
SIGKDD Explorations, Volume 5, Issue 1, July, 2003.
- X. Yin, J.
Han, J. Yang, and P. S. Yu, “CrossMine: Efficient Classification across Multiple
Database Relations”, Proc. 2004 Int. Conf. on Data
Engineering (ICDE'04), Boston, MA, March 2004
Intrusion Detection and Data Mining
- S.
Mukkamala et al. “Intrusion
detection: support vector machines and neural networks,” in IEEE IJCNN
(May 2002).
- W.
Lee, S. Stolfo, and K. Mok. A
data mining framework for building intrusion detection models. In
Information and System Security, Vol. 3, No. 4, 2000.
- Stefan Axelsson, “Intrusion
Detection Systems: A Taxonomy and Survey”, Technical Report No
99-15, Dept. of Computer Engineering, Chalmers University of Technology, Sweden,
March 2000.
- Stefan Axelsson, “Research
in Intrusion Detection Systems: A Survey”, Technical Report No
98-17, Dept. of Computer
Engineering, Chalmers University of Technology, Sweden, Dec 15, 1998 revised
Aug 19, 1999.
- L. Mé and C.
Michel, Intrusion
Detection: A Bibliography - (2001)
- The Snort project,
Snort
User Manual 2.1.1, 2004
Collaborative Filtering and Data Mining
- Badrul
M. Sarwar, George Karypis, Joseph A. Konstan, John Riedl: “Analysis
of recommendation algorithms for e-commerce.” ACM Conference on
Electronic Commerce 2000:158-167
- J.
Breese, D. Heckerman, C. Kadie, “Empirical
Analysis of Predictive Algorithms for Collaborative Filtering”, In Proceedings of Fourteenth Conference
on Uncertainty in Artificial Intelligence, Madison, WI,
Morgan Kaufmann, July 1998 (also
has some item-item methods)
- B. M. Sarwar, G. Karypis, J. A.
Konstan, and J. Riedl. “Item-based
collaborative filtering recommendation algorithms”. In Proc. of the
10th International World Wide Web Conference (WWW10), Hong Kong, May 2001.
- Weiyang Lin, Sergio A. Alvarez, and
Carolina Ruiz. “Efficient
adaptive-support association rule mining for recommender systems”,
Data Mining and Knowledge Discovery, 6:83-105, 2002
Web Mining
- S.
Chakrabarti, B. E. Dom, S. R. Kumar, P. Raghavan, S. Rajagopalan, A.
Tomkins, D. Gibson, and J. Kleinberg. Mining
the Web's link structure. COMPUTER, 32(8):60-67, 1999.
- J.
M. Kleinberg. Authoritative
Sources in a Hyperlinked Environment. Journal of ACM, 46(5):604-632,
1999.
- H. Yu, J. Han, and K. C.-C. Chang, “PEBL:
Positive Example Based Learning for Web Page Classification Using SVM”,
Proc. 2002 Int. Conf. on Knowledge Discovery in Databases (KDD'02), Edmonton, Canada, July 2002.
- K. Wang, S. Zhou and S. C. Liew. Building
hierarchical classifiers using class proximity. In VLDB'99, Edinburgh, UK, Sept. 1999.
- J. Han and K. C.-C. Chang, “Data Mining
for Web Intelligence”, Computer, Nov. 2002
- Corin
R. Anderson, Pedro Domingos, Daniel S. Weld: Personalizing Web
Sites for Mobile Users. In WWW 2001: pages 565-575. 2001.
Data Mining Applications and Trends in Data Mining
- H. Mannila, Theoretical
Frameworks of Data Mining. SIGKDD Explorations, 1(2): 30-32, 2000
- C.
Clifton and D. Marks. Security
and Privacy Implications of Data Mining. In Proc. 1996 SIGMOD'96
Workshop on Research Issues on Data Mining and Knowledge Discovery
(DMKD'96), Montreal, Canada, pp. 15-20, June 1996.
- R. Agrawal and R. Srikant. Privacy-preserving
data mining. In Proc. 2000 ACM-SIGMOD Int. Conf. Management of Data
(SIGMOD'00), pages 439-450, Dallas,
TX, May 2000.
- H.
V. Jagadish, J. Madar, and R. Ng. Semantic
compression and pattern extraction with fascicles. In Proc. 1999
Int. Conf. Very Large Data Bases (VLDB'99), pages 186-197, Edinburgh, UK, Sept. 1999.
- Qiming Chen, Umesh Dayal, Meichun Hsu, OLAP-based Scalable Profiling of
Customer Behavior, In Proc. 1999 Int. Conf. Data Warehousing
and Knowledge Discovery (DaWaK'99), Italy, 1999.
- Ron Kohavi, Mining E-Commerce Data: The Good, the
Bad, and the Ugly, KDD’2001, 2001.
- S. Hill and F. Provost, The Myth of the Double-Blind
Review? Author Identification Using Only Citations, SIGKDD Explorations,
5(2), Jan. 2004
Data Mining and Software Engineering
- Jeremy Kolter and Marcus A. Maloof, Learning to Detect Malicious Executables in the Wild,
Proc. 2004 ACM SIGKDD Int. Conf.
on Knowledge Discovery and Data Mining (KDD'04), Seattle, WA, Aug. 2004
Jiawei Han