· All answers must be computer generated (including text and diagrams).
· The hand-in version must be ordered correctly and stapled in the top left corner.
· The hand-in version must include a header page (or with sufficient space) indicating: student name, student number, user id, course number and assignment number.
1. Multi-dimensional stream data outlier analysis: An important task in stream data analysis is to detect outliers in a multidimensional environment, such as detect unusual power surge in a multi-dimensional space, including time (i.e., comparing with the normal duration), region (i.e., comparing with surrounding regions), sector (i.e., university, residence, government, and so on), etc. Outline an efficient stream OLAP method that may detect outliers effectively and efficiently in data streams, and reason why your design can ensure such quality.
2.
Frequency counts
for frequent items over data streams: The lossy
counting algorithm ensures a theoretical error bound for maintaining counts of
frequent items over data streams.
However, it does not consider how to maintain the approximate counts for
maximal number of items over data streams given a limited amount of main
memory. Moreover, it does not give more
weights to more recent items than older ones.
Outline two efficient stream counting methods that overcome these
deficiencies, respectively, that is, one method that takes limited memory into
consideration, and the second one considers the assigned weights of items
arriving at different times. Reason why
your design has such advantages.
3.
Proposing an effective method for mining botnet: Botnet
is a serious kind of attack that uses a large pool of compromised hosts called
zombie machines to launch malicious activities, such as DOS attacks, illegal
data hosting, valuable information gathering, etc. These machines are infected
with a bot, which is typically an executable program,
acting as Trojan horses, backdoors, or worms, under the control of the
attacker. The collection of bots, running on thousands of zombie machines, is
therefore formed a botnet. An attacker could remotely
control the large number of infected zombie machines via the botnet to attack against other target machines, such as
school, enterprise, or government resources, or use it for other malicious
activities. Propose
a data mining method that may automatically detect as many as possible infected
machines in the botnet and eventually uncover the
whole botnet as well as the hidden controlled
attackers.
4. Sequential
patterns in the same sequence: Sequential
pattern mining algorithm introduced by Srikant and Agrawal (1996) is to find
sequential patterns among a set of sequences. Although there has been
interesting follow up studies to develop interesting mining algorithms, such as
PrefixSpan (
Page maintained by: Jiawei Han
Last update: January 31, 2007