UCAIR: You Care

ChengXiang Zhai
Professor ChengXiang Zhai is as frustrated with search as everyone else, and he's doing something about it by creating a software agent for intelligent searching that considers both user information and search context. This software, based on machine learning and Bayesian decision theory, keeps track of a user's recent history and employs this information to help reorganize and optimize search results. "The idea is that every user should have their own search assistant," he said, "and you can integrate all the information that surrounds the user." A bonus is that Zhai's system resides on the desktop.
A major limitation common to existing retrieval models and systems is that the retrieval decision is, in general, based solely on the query and the document collection. Information about the actual user and the search context is largely ignored. Zhai has developed a system called User-Centered Adaptive Information Retrieval (UCAIR), in which a personal search agent uses past queries and viewed documents to improve document ranking accuracy, refining its results as it goes along.
Zhai's approach keeps more information on the client side. By keeping the user's query history on the desktop, rather than in a log on a Google server, he avoids the privacy issues that have plagued the search industry. It also distributes the workload among many more machines, thereby relieving the server.
Typing a search term like "IR" into a search engine such as Google will fetch articles about "infrared" as well as "information retrieval." Typing the same term into the UCAIR toolbar will return the same results, with one important difference: the hits at the top of the list are going to be about "information retrieval," because the system knows that the word "infrared" has not shown up in any documents browsed by the user in the last three months, but "information retrieval" has.
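The "IR" example can be illustrated with a small Python sketch. This is not UCAIR's actual code; the function names (`history_profile`, `rerank`), the sample documents, and the simple word-overlap score are all assumptions made for illustration. It only shows the general idea of moving results that match the user's recent reading toward the top of the list.

```python
from collections import Counter

def history_profile(viewed_docs):
    """Count how often each lowercase word appears in recently viewed documents."""
    counts = Counter()
    for text in viewed_docs:
        counts.update(text.lower().split())
    return counts

def rerank(results, profile):
    """Sort results so those sharing more vocabulary with the history come first.

    The score is a plain word-overlap count (an illustrative assumption,
    not the ranking function used by UCAIR). Ties keep their original order.
    """
    def overlap(result):
        return sum(profile[word] for word in set(result.lower().split()))
    return sorted(results, key=overlap, reverse=True)

# Hypothetical browsing history and search results for the query "IR".
viewed = [
    "information retrieval models for web search",
    "evaluating retrieval ranking accuracy",
]
results = [
    "IR infrared sensor datasheet",
    "IR information retrieval course notes",
]

ranked = rerank(results, history_profile(viewed))
print(ranked[0])
```

Both results are still returned; only their order changes, which matches the behavior described above.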
Zhai has based his framework on Bayesian decision theory. This is a statistical approach used to solve pattern recognition problems in which something is already known about the information; in this case, what is known is probabilistic (the probability that "information retrieval" is sought when the user types in "IR").
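To make the probabilistic idea concrete, here is a minimal Python sketch of Bayes' rule applied to the "IR" example. The numbers are invented for illustration, and the `posterior` function is a hypothetical helper, not part of Zhai's framework. The prior stands for what the user's history suggests, and the likelihood stands for how likely someone with each interest is to type "IR".

```python
def posterior(prior, likelihood):
    """Bayes' rule: P(sense | query) is proportional to P(query | sense) * P(sense)."""
    unnormalized = {sense: prior[sense] * likelihood[sense] for sense in prior}
    total = sum(unnormalized.values())
    return {sense: value / total for sense, value in unnormalized.items()}

# Hypothetical prior: the user's history points strongly to information retrieval.
prior = {"information retrieval": 0.9, "infrared": 0.1}

# Hypothetical likelihood of typing "IR" when each sense is what the user wants.
likelihood = {"information retrieval": 0.4, "infrared": 0.8}

post = posterior(prior, likelihood)
for sense, probability in post.items():
    print(f"{sense}: {probability:.2f}")
```

Even though "IR" is assumed here to be a more typical query for "infrared," the history-based prior keeps "information retrieval" the more probable interpretation.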
One technical challenge is how to balance current information against historical information. If the software relies too much on history, then the information provided by the search may not be useful. The same applies if it doesn't use enough of the history. The system also has to notice that perhaps in the future, the user does want to find out about something "infrared." A technique called Adaptive Personalization uses a statistical model to build on user clicks. If a user clicks on a particular link, the software assumes more certainty about what the user is looking for.
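One common way to express this kind of balance is a weighted mix of a model built from the current query and a model built from the history. The Python sketch below uses simple linear interpolation with a weight `alpha`; this is an assumed, generic technique chosen for illustration, not a description of how Adaptive Personalization is actually implemented. The names `normalize` and `combine` and all the numbers are hypothetical.

```python
from collections import Counter

def normalize(counts):
    """Turn raw word counts into a probability distribution."""
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

def combine(query_model, history_model, alpha):
    """Linear interpolation: alpha weights the current query, (1 - alpha) the history."""
    words = set(query_model) | set(history_model)
    return {
        word: alpha * query_model.get(word, 0.0)
        + (1 - alpha) * history_model.get(word, 0.0)
        for word in words
    }

# Hypothetical history counts gathered from previously viewed documents.
history_counts = Counter({"information": 3, "retrieval": 7})

# The user now searches for something new and clicks an "infrared" result;
# the clicked result's words are added to the history as extra evidence.
history_counts.update("infrared camera".split())
history_model = normalize(history_counts)

query_model = {"infrared": 1.0}

# A high alpha trusts the current query; a low alpha leans on the history.
mixed = combine(query_model, history_model, alpha=0.8)
print(sorted(mixed.items(), key=lambda item: item[1], reverse=True))
```

With `alpha` close to 1 the new "infrared" interest dominates; with `alpha` close to 0 the older "retrieval" history would win out, which is the trade-off described above.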
Given a set of news articles to summarize, the software can use comparative text mining to automatically generate a Theme Evolution Graph that shows which topics are covered. To build the graph, the software extracts common and unique themes from a set of comparable text collections (e.g., news articles). The next step is to identify phrases instead of just words. Zhai is also developing new retrieval methods that leverage similarities among users to better infer one particular user's information need from information about similar users, which means his technology can be applied to the customer service domain.
In addition to personalized search, Zhai is working on several other ways to enhance our capacity to access and exploit online information. To help a user navigate a huge information space, he is developing techniques to automatically construct a multi-resolution topic map. Such a map is especially useful when a user cannot come up with a good query for the information being sought. It will guide a user through an information space in the same way a geographic map guides a person touring a city. "Human beings are already comfortable navigating by written maps," said Zhai, "and this information space map will be multi-resolution, in which the user can zoom in and out of topic spaces."
To help a user analyze and summarize topic patterns in a large set of documents, such as search results, Zhai is also developing a suite of contextual text mining tools that can automatically extract "contextual topic patterns" from text, such as the temporal and spatial distribution of topics. These tools can be used to summarize customers' opinions about products, to track and detect emerging trends in research publications, and to compare business competitors in terms of their product strengths.
Written by Judy Tolliver, July 5, 2006
--
Last Modified August 07 2006 08:57:12.