I am currently a Ph.D. student in the Computer Science Department at the University of Illinois at Urbana-Champaign. My Ph.D. advisor is Prof. Jiawei Han.
I received my Master's degree from the Department of Automation at Tsinghua University in 2003.
Before that, I received my B.E. degree from the same department in 2000.
My research interests include data mining, machine learning and information retrieval.
My main research interests are in the field of information processing. Related topics include data mining, information retrieval, machine learning, and pattern recognition. I would like to help people get what they need more easily: to let computers do more for us with less help from us, learn from experience, adapt effortlessly, and discover new knowledge. We need computers that reduce information overload by extracting the important patterns from masses of data, and we need computers that understand what we need. This poses many deep and fascinating scientific problems: How can a computer decide autonomously which representation is best for the target knowledge? How can it tell genuine regularities from chance occurrences? How can pre-existing knowledge be exploited? How can learned results be made understandable to us?
My research addresses these and related questions. Research topics that I'm working on, or have recently worked on, include:
Learning on Graph (Manifold)
Spectral Regression
Spectral Regression (SR) is a novel regression framework for efficient regularized subspace learning. It casts the problem of learning an embedding function into a regression framework, which facilitates both efficient computation and the use of regularization techniques.
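As a rough illustration of the idea, the sketch below first obtains target responses y from a graph eigenproblem (W y = lambda D y) and then fits a ridge regression from the inputs to those responses. It is a minimal sketch under stated assumptions, not the published implementation: the function names, the heat-kernel affinity, the parameters `sigma` and `alpha`, and the toy data are all hypothetical choices made for this example, and only a one-dimensional embedding is computed.

```python
import numpy as np

def heat_kernel_affinity(X, sigma=1.0):
    """Dense affinity W_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)), zero diagonal.
    The heat kernel is an assumed graph choice for this sketch."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W

def spectral_regression_1d(X, sigma=1.0, alpha=0.1):
    """Return (a, b) so that X @ a + b approximates a 1-D graph embedding."""
    W = heat_kernel_affinity(X, sigma)
    degree = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)

    # Step 1: responses y from W y = lambda D y, solved through the
    # symmetric matrix S = D^{-1/2} W D^{-1/2}.
    S = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(S)       # eigenvalues in ascending order
    # The top eigenvector gives a constant y, so use the second largest.
    y = d_inv_sqrt * eigvecs[:, -2]

    # Step 2: ridge regression of y on the centered inputs.
    mean = X.mean(axis=0)
    Xc = X - mean
    yc = y - y.mean()
    a = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    b = y.mean() - mean @ a
    return a, b

# Toy data: two small groups of points in the plane.
X = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2],
              [3.0, 3.0], [3.2, 3.1], [3.1, 3.3], [3.3, 3.2]])
a, b = spectral_regression_1d(X)
z = X @ a + b   # learned 1-D embedding; the two groups land on opposite sides
```

Because the second step is an ordinary regularized least-squares problem, the embedding function can be applied to unseen points with a single dot product, and the penalty `alpha` can be swapped for another regularizer without touching the eigenproblem.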
Locality Preserving Criteria
In many applications, high-dimensional data lie on a low-dimensional manifold embedded in the ambient space. Such a manifold structure can usually be revealed by assuming that neighboring points probably belong to the same underlying class (or have similar responses). We call this the Locality Preserving Criterion. Based on this criterion, we have been developing manifold-aware algorithms for unsupervised and semi-supervised learning.
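One common way to make the "neighbors should agree" assumption concrete is to build a nearest-neighbor graph and penalize functions that change across its edges. The sketch below is illustrative only: the helper names `knn_graph` and `locality_penalty`, the unit edge weights, and the toy points are assumptions of this example rather than details of the algorithms mentioned above.

```python
import math

def knn_graph(points, k=2):
    """Symmetric k-nearest-neighbor graph as {(i, j): weight} with i < j.
    Unit weights are an assumed choice for this sketch."""
    n = len(points)
    edges = {}
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        for j in others[:k]:
            edges[(min(i, j), max(i, j))] = 1.0
    return edges

def locality_penalty(edges, f):
    """Sum of w_ij * (f_i - f_j)^2 over graph edges.
    This equals the graph Laplacian quadratic form f^T L f and is small
    when neighboring points receive similar values."""
    return sum(w * (f[i] - f[j]) ** 2 for (i, j), w in edges.items())

# Two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
edges = knn_graph(points, k=2)

agrees_with_neighbors = [0, 0, 0, 1, 1, 1]   # constant within each group
ignores_neighbors = [0, 1, 0, 1, 0, 1]       # alternates inside each group

smooth = locality_penalty(edges, agrees_with_neighbors)
rough = locality_penalty(edges, ignores_neighbors)
```

Here `smooth` is lower than `rough`, which is the sense in which a labeling that respects the neighborhood structure is preferred under this kind of criterion.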
Mining the Web
VIsion-based Page Segmentation (VIPS)
In contrast to traditional document retrieval, a web page as a whole is not a good information unit for search, because it often contains multiple topics and a lot of irrelevant information from the navigation, decoration, and interaction parts of the page. I am working on the fundamental problem of how to extract the semantic structure of a web page based on its visual perception.
Block Level Link Analysis
By extracting the page-to-block and block-to-page relationships from link structure and page layout analysis, we can construct a semantic graph over the WWW in which each node represents exactly one semantic topic. We further developed two ranking algorithms, block-level PageRank and block-level HITS.
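To give a feel for how the two relationships can be combined, the sketch below multiplies a page-to-block matrix by a block-to-page matrix to get a weighted page graph and then runs a standard PageRank power iteration on it. This is a simplified sketch, not the published algorithm: the matrix names, the block importance values, the damping factor, and the three-page example are all hypothetical.

```python
def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def pagerank(W, damping=0.85, iters=100):
    """Power iteration on a weighted graph W (list of rows)."""
    n = len(W)
    P = []
    for row in W:
        total = sum(row)
        # Rows without out-links jump uniformly to every node.
        P.append([v / total for v in row] if total > 0 else [1.0 / n] * n)
    r = [1.0 / n] * n
    for _ in range(iters):
        r = [(1 - damping) / n
             + damping * sum(r[i] * P[i][j] for i in range(n))
             for j in range(n)]
    return r

# Hypothetical example: page 0 has a main-content block (0) and a
# navigation block (1); pages 1 and 2 have one block each (2 and 3).
# page_to_block[p][b]: assumed importance of block b within page p.
page_to_block = [[0.8, 0.2, 0.0, 0.0],
                 [0.0, 0.0, 1.0, 0.0],
                 [0.0, 0.0, 0.0, 1.0]]
# block_to_page[b][q]: probability that a link in block b leads to page q.
block_to_page = [[0.0, 1.0, 0.0],   # main content links to page 1
                 [0.0, 0.0, 1.0],   # navigation bar links to page 2
                 [1.0, 0.0, 0.0],
                 [1.0, 0.0, 0.0]]

page_graph = matmul(page_to_block, block_to_page)
ranks = pagerank(page_graph)
```

In this toy graph page 1 ends up ranked above page 2, because its only in-link comes from the block that was given more weight. With plain page-level links the two pages would be treated identically.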
Web Image Searching and Clustering
With VIsion-based Page Segmentation, a web page can be partitioned into blocks, each containing semantically coherent information, and the textual and link information of an image can be accurately extracted within each image block.
The textual information is used for image representation.
A large image graph can be obtained from block-level link analysis. This method is less sensitive to noisy links than previous methods such as PicASHOW, and hence the image graph can to some extent reflect the semantic relationships between images. Using spectral techniques, the resulting image graph can be partitioned into clusters, which are used to enhance the search results.
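The partitioning step can be illustrated with a basic spectral bipartition: take the eigenvector for the second-smallest eigenvalue of the normalized graph Laplacian and split the nodes by its sign. This is one standard spectral technique, shown here as a sketch. The text does not say which variant is used, and the function name `spectral_bipartition` and the six-node toy graph are assumptions of this example. The code also assumes every node has at least one edge.

```python
import numpy as np

def spectral_bipartition(W):
    """Split a weighted undirected graph into two groups (labels 0 and 1).
    Assumes W is symmetric and every node has positive degree."""
    W = np.asarray(W, dtype=float)
    degree = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    # Normalized Laplacian L = I - D^{-1/2} W D^{-1/2}.
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L_sym)   # eigenvalues in ascending order
    # Eigenvector of the second-smallest eigenvalue, mapped back with D^{-1/2}.
    fiedler = d_inv_sqrt * eigvecs[:, 1]
    return (fiedler > 0).astype(int)

# Toy image graph: two tightly linked triples joined by one weak edge.
W = np.zeros((6, 6))
for i, j, w in [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
                (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0),
                (2, 3, 0.1)]:
    W[i, j] = W[j, i] = w

labels = spectral_bipartition(W)   # nodes 0-2 get one label, nodes 3-5 the other
```

Which group receives label 0 depends on the arbitrary sign of the eigenvector, so only the grouping itself is meaningful. Splitting into more than two clusters would need further eigenvectors or a recursive split.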