University of Illinois at Urbana-Champaign

Applying Medical Practice on Software Quality Care


Yuanyuan Zhou

Just as the practice of medicine is as much an art as a science, so is computer programming, according to computer science professor Yuanyuan Zhou. A computer program with buggy code is like a sick patient and should be treated as such. Software bugs account for as much as 40% of computer system failures and cost the U.S. economy an estimated $59.5 billion annually (about 0.6% of the gross domestic product), so healthy programs are crucial. Identifying and fixing bugs in large programs is extremely labor intensive, especially in commercial software, so Zhou is trying to automate as much of this process as possible. Her efforts are directed not only toward detecting, diagnosing, and fixing bugs; she is also exploring techniques that allow software to survive in the presence of bugs.

Detecting bugs with psychology

Zhou has developed a suite of tools that intelligently find software bugs by analyzing source code for defects. She has already found many bugs in open-source code such as Linux and the Apache HTTP Server, and since the work was published in flagship conferences on operating systems and software engineering, she and her students have received numerous emails inquiring about the availability of the tools. Her project is called ARTS: Available, Robust, and Trusted Software, a name true to her belief that programming is truly an art.

Unlike most debugging tools, hers infer the programmer's intent, and most of them do not even require running the program. Normally, bug detection programs require reproduction of the bug during execution: the program is run very slowly, and the debugger looks at what is going on inside, one step at a time. Most of Zhou's tools don't care whether the bug manifests itself. Instead, they simply scan the code to find the bugs. The bonuses are many: the tools are fast, scale to millions of lines of code, and can detect bugs peculiar to parallel programs. For bugs that require dynamic monitoring during program execution, she leverages hardware support to reduce the monitoring overhead and perturbation.

Zhou's tools draw upon how programmers write code to illuminate how and why bugs are commonly introduced. Copy-pasted code is routinely found in large programs, and although copy-paste saves programming effort, it is prone to introducing bugs, especially those related to consistency. For instance, the programmer may forget to rename a pasted identifier to match the surrounding code. Unfortunately, copy-pasted code has been hard to spot: existing tools for copy-paste detection can neither identify copy-pasted code that has been modified nor detect bugs introduced by copy-pasting.

Zhou's copy-paste tool, called CP-Miner, uses data mining techniques to find copy-paste induced bugs, and it can even be used on operating systems with millions of lines of code. In fact, in less than twenty minutes, CP-Miner found 190,000 copy-pasted segments in Linux, which account for about 20% of the code. Among the top 60 errors detected by CP-Miner, Linux developers have confirmed 49 to be real bugs in the latest version of Linux and fixed them after Zhou and her students reported them.
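
To make the idea of a consistency bug concrete, here is a minimal Python sketch. It is not CP-Miner's algorithm, which the article does not describe in detail; the function names (`shape`, `inconsistent_renames`) and the tiny keyword list are assumptions made for illustration. The sketch treats two fragments as a copy-paste pair if they match once identifiers are abstracted away, then flags any identifier that was renamed in some places of the copy but not in others.

```python
import re
from collections import defaultdict

# Toy tokenizer: identifiers, numbers, or single punctuation characters.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")
KEYWORDS = {"for", "if", "else", "while", "return", "int"}  # illustrative subset


def tokens(code):
    return TOKEN.findall(code)


def is_identifier(tok):
    return bool(re.match(r"[A-Za-z_]", tok)) and tok not in KEYWORDS


def shape(code):
    """Replace every identifier with a placeholder so renamed copies compare equal."""
    return ["ID" if is_identifier(t) else t for t in tokens(code)]


def inconsistent_renames(original, pasted):
    """Map each original identifier to the names it became in the copy.

    Returns None if the fragments are not copies up to renaming, otherwise a
    dict holding only the identifiers that map to more than one name.
    """
    if shape(original) != shape(pasted):
        return None
    mapping = defaultdict(set)
    for a, b in zip(tokens(original), tokens(pasted)):
        if is_identifier(a):
            mapping[a].add(b)
    return {name: names for name, names in mapping.items() if len(names) > 1}


original = "for (i = 0; i < len(a); i++) sum += a[i];"
# The programmer renamed `a` to `b` in the loop body but not in the loop bound.
pasted = "for (i = 0; i < len(a); i++) sum += b[i];"
suspicious = inconsistent_renames(original, pasted)
print(suspicious)
```

In this example the only flagged identifier is `a`, which maps to both `a` and `b` in the copy. A real tool would also have to find the copied segments in the first place and tolerate inserted or deleted statements, which this sketch does not attempt.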

Programs follow many implicit programming rules, most of which are not documented by programmers, so Zhou developed a related tool, called PR-Miner, to detect when these rules have been violated. Like CP-Miner, it is based on data mining techniques, and it has also detected many new bugs in the latest versions of Linux, Apache, and other open-source programs.
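
As a rough illustration of mining implicit rules, the following Python sketch infers rules of the form "a function that calls A usually also calls B" and reports the functions that break them. It is a simplified stand-in, not PR-Miner's actual method; the function names, thresholds, and sample data are all hypothetical.

```python
from collections import Counter
from itertools import permutations


def mine_rules(functions, min_support=3, min_confidence=0.75):
    """functions maps a function name to the set of calls it makes.

    Returns (a, b, confidence) triples meaning "callers of a usually call b".
    """
    single = Counter()
    pair = Counter()
    for calls in functions.values():
        single.update(calls)
        pair.update(permutations(sorted(calls), 2))
    rules = []
    for (a, b), together in pair.items():
        confidence = together / single[a]
        if single[a] >= min_support and confidence >= min_confidence:
            rules.append((a, b, confidence))
    return rules


def violations(functions, rules):
    """List (function, a, b) for every function that calls a but not b."""
    return sorted(
        (name, a, b)
        for a, b, _ in rules
        for name, calls in functions.items()
        if a in calls and b not in calls
    )


# Hypothetical call sets extracted from four functions.
functions = {
    "read_a": {"lock", "unlock", "read"},
    "read_b": {"lock", "unlock", "read"},
    "write_a": {"lock", "unlock", "write"},
    "write_b": {"lock", "write"},  # never releases the lock
}
rules = mine_rules(functions)
suspects = violations(functions, rules)
print(suspects)
```

With this data, three of the four callers of `lock` also call `unlock`, so the rule is inferred without any documentation, and `write_b` is reported as the one function that violates it.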

Already, these tools and others developed by the ARTS project are robust enough to be used in production settings. Although a fancy interface has not been developed yet, the tools can be easily customized and extended to find bugs in most large commercial and open source software.

Surviving software failures by treating bugs as allergies

Bug detection should find bugs before something bad happens, such as a system crash or a security breach. Failure recovery, on the other hand, happens after the unfortunate fact. Rx is a recovery technique Zhou has developed that allows software to survive software failures by treating bugs like allergies. "You cannot beat them, but you can run away from them," she said. "If you are allergic to cats, you simply don't touch one when you see it. That is the spirit of the idea behind Rx." Inspired by real-life allergy treatment, Rx responds to a software failure by rolling the buggy program back to a recent checkpoint and re-executing it in a modified environment. The idea is based on the observation that many bugs correlate with the execution environment, in the same way a human allergy may be correlated with the amount of pollen in the air.
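
The checkpoint, rollback, and re-execute cycle described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the state is an in-memory dictionary, the "environment" is a single hypothetical `padding` setting, and `buggy_handler` simulates a buffer overflow. None of these names come from Rx itself.

```python
import copy


def run_with_recovery(state, request, handler, environments):
    """Run handler under each environment in turn until one succeeds.

    After a failure, the state is rolled back to the checkpoint taken before
    the first attempt, so partial side effects of the failed run are undone.
    """
    checkpoint = copy.deepcopy(state)
    for env in environments:
        try:
            return handler(state, request, env), env
        except Exception:
            state.clear()
            state.update(copy.deepcopy(checkpoint))
    raise RuntimeError("failure persisted in every environment tried")


def buggy_handler(state, request, env):
    """Hypothetical handler whose fixed-size buffer is too small for long requests."""
    state["handled"] += 1  # side effect that happens before the failure
    capacity = 8 + env["padding"]
    if len(request) > capacity:
        raise MemoryError("simulated buffer overflow")
    state["log"].append(request)
    return len(request)


state = {"handled": 0, "log": []}
environments = [{"padding": 0}, {"padding": 16}]  # normal first, then padded
result, used_env = run_with_recovery(state, "0123456789", buggy_handler, environments)
print(result, used_env, state)
```

The first attempt fails in the normal environment, the state is restored, and the second attempt succeeds once the buffer is padded. The bug is still in the code, but the program has stepped around the condition that triggers it.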

Triage for software

Like a complicated organism, gigantic programs with many lines of code are difficult to dissect and analyze. Zhou's Triage tool works like an emergency room in a hospital. The tool dynamically diagnoses a software failure online, as it occurs at the end user's site. It leverages the moment of the failure and follows a human-like diagnosis protocol to identify the nature and type of the bug, the buggy code regions, the bug-triggering inputs, the bug-triggering environment, and so on, all of which help programmers quickly understand the failure and fix the bug.

"We treat software like a human being that needs to periodically have a physical examination," she explained. "Or if something is abnormal, we will go see the doctor. Maybe the doctor will prescribe some medicine if we're sick. If it doesn't work, we go back. Like humans, the health of software must be monitored in a similar way." The most efficient way to do this is with hardware support to monitor the software execution. If something bad happens or the software crashes, Triage's diagnosis protocol will start automatically and a recovery strategy will be implemented.

Zhou would like to see bug detection support incorporated into the run-time system, including the hardware, operating system, middleware, and network. In the future, she hopes that programmers will have special systems specifically designed for software testing, "all the way from the hardware to the operating system to the tools, so that programmers will have a special execution environment to improve their productivity." Taking a cue from the car manufacturing industry, in which reliability improves as fewer humans take part in the process, Zhou believes that the construction of software should be more automated, starting from software debugging and testing.

Written by Judy Tolliver, May 31, 2006


--
Last Modified August 07 2006 09:03:22.


Department of Computer Science, Thomas M. Siebel Center for Computer Science, 201 N Goodwin Ave,
Urbana, IL 61801-2302. The Department is part of the College of Engineering at the University of Illinois at Urbana-Champaign. Contact academic@cs.uiuc.edu with academic questions
or webmaster@cs.uiuc.edu with questions or comments on this page.