CS PhD student Jon Calhoun recently received one of six Blue Waters Graduate Fellowships in recognition of his research.
Calhoun has long been interested in taking things to the next level. He remembered being in the eighth grade and receiving a programmable graphing calculator. The teacher gave the class some formulas to “program” into the calculator. “But it didn’t actually run,” he said. “It would just show the formula.”
Calhoun read the calculator’s manual, and learned how to make the programs run. From then on, “Every formula that came in the class, I made a program that calculated everything for me,” Calhoun said.
Soon, he was programming text-based adventure games with his calculator. “Much of my high school career was spent making or doing some sort of programming. I enjoyed it so much that I decided to pursue it during my undergrad as my degree,” Calhoun said.
Calhoun still spends time thinking about how he can take computing tools to the next level. In particular, with the Blue Waters petascale computing facility just down the road, Calhoun is creating processes that will enable successful exascale computing, a thousandfold increase over the petascale level. In his PhD research, he is working with advisors Luke Olson and Marc Snir.
As high-performance computing moves to the exascale, “fault tolerance becomes a primary concern,” Calhoun said.
"A vast majority of science and engineering applications involve solving linear systems,” said Calhoun. “And algebraic multigrid [AMG] has proven itself currently as a scalable and robust linear solver." In AMG, a sparse linear system is solved by first creating a hierarchy of sparse linear systems, where the linear system at any level is dependent on the previous level in the hierarchy, and secondly traversing this hierarchy by using a predefined order known as cycling. These new linear systems are smaller and therefore faster to solve. AMG is attractive because of its scalability, robustness, and efficiency.
Calhoun explains, “As we go to exascale, we would like to make algebraic multigrid first off scale to the size of the system, but also it needs to be able to handle other challenges like fault tolerant issues, and that’s why I’m looking at it. When you are running machines with this large number of cores, some of the nodes will go down, but what can be more damaging is what if the nodes don’t go down; they just have computations that are incorrect. These will lead either to application crashes or silent data corruptions inside the application. How from an application perspective can we handle those?”
Calhoun’s research focuses on improving the resiliency of AMG. The detection of transient faults at scale will be the first stage of his project. The work will transcend AMG and can be applied to other nonnumerical codes. Calhoun then plans to utilize algorithmic properties of AMG to create a tailored recovery scheme.
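The article does not describe which detection scheme Calhoun uses. As a generic illustration of how an application can notice a silently corrupted result, the sketch below applies a checksum test to a matrix-vector product, a classic algorithm-based fault tolerance idea. The function names and the `inject` parameter are hypothetical and exist only to simulate a transient fault in this example.

```python
def column_checksums(A):
    """Precompute c[j] = sum of column j, so that sum(A @ x) equals dot(c, x)."""
    return [sum(col) for col in zip(*A)]

def checked_matvec(A, x, checksums, tol=1e-9, inject=None):
    """Compute y = A @ x and test it against the precomputed checksum."""
    y = [sum(a * xi for a, xi in zip(row, x)) for row in A]
    if inject is not None:
        # Hypothetical fault injection: perturb one entry of the result.
        index, delta = inject
        y[index] += delta
    expected = sum(c * xi for c, xi in zip(checksums, x))
    passed = abs(sum(y) - expected) <= tol * max(1.0, abs(expected))
    return y, passed

A = [[4.0, -1.0, 0.0],
     [-1.0, 4.0, -1.0],
     [0.0, -1.0, 4.0]]
x = [1.0, 2.0, 3.0]
checksums = column_checksums(A)

y_clean, clean_ok = checked_matvec(A, x, checksums)
y_bad, bad_ok = checked_matvec(A, x, checksums, inject=(1, 0.5))
print(clean_ok, bad_ok)
```

A single checksum like this only signals that something went wrong. It cannot say which entry is corrupted, and it misses errors that happen to cancel in the sum, which is one reason recovery schemes tailored to a specific algorithm are of interest.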
The Blue Waters Fellowship provides a year of support, including a $38,000 stipend, up to $12,000 in tuition allowance, an allocation on the powerful Blue Waters petascale computing system, and support for travel to the annual Blue Waters Symposium.