CS 296 Honors Projects

Course Instructor: Eric Shaffer (shaffer1@uiuc.edu)

CS296 is a 1-hour honors course that CS225 students can take in addition to the regular CS225 course. That is, you should NOT drop CS225 if you want to enroll in CS296; instead, you should enroll in both CS225 and CS296. You will receive a separate grade for each course, and thus your CS296 work cannot affect the grade you get in CS225.

There are different sections for CS296; make sure you are enrolled in section &25. If you do want to enroll in the course, you should try and do this by the undergrad class-add date, which is about two weeks after the semester starts. If you want to try and add the course after that date, you will need to go to your college office and talk to them, and there's still no guarantee that they will let you sign up late.

You can work alone or in groups. Group projects will be expected to be more elaborate than individual projects. You can propose your own project or you can select a project from the list provided on this page. Any project proposal will need to be approved by the course instructor.   You are free to work very independently on this project; you are only required to meet with the instructor to propose a project and give a mid-term report. "Handing in" the project will be done via e-mail.

Using a code management system, specifically the subversion system, is strongly reccommended for group projects. It could also be useful for individual projects as it will provide you with a way to backup your project easily. If you are unfamiliar with subversion, you can read about it here. If you wish to use subversion to keep track of your code, e-mail me and we can have the department set up a repository for you. 

You should feel free to e-mail the instructor for help at any time. The goal of this project is for you to accomplish something challenging and rewarding and we will assist as much as is feasible to make sure that happens. The final project handin is the only thing that will be graded.

Important dates:

2/11/2008   --- Project proposal due. E-mail the proposal by this date and we'll schedule a face-to-face meeting to discuss it.

3/22/2008   --- Midterm report due. E-mail an update on your progress and schedule a meeting.

5/9/2008     --- Project due

Suggested Projects

  1. B-Tree [out-of-core data structures]

    The B-Tree is a data structure used to organize data on a hard disk to allow efficient insertions, deletions, and searches. This is very useful, as a lot of  data sets are too large to fit in the memory of commodity computers. The utility of the B-Tree is attested to by its use in a multiplicity of databases and filesystems. The goal of this project is to implement the B-Tree as a C++ template class. We will choose an interesting and very large data set (maybe biological data or census data) to test your implementation.

    References: [Wikipedia B-Tree article]

  2. External sorting [out-of-core data structures]

    Many applications require the manipulation of data that is too large for the memory of conventional computers. Sorting is, of course, one of the fundamental algorithmic tasks in the field of computing and the ability to sort very large data sets can. The goal of this project will be to implement an external sorting algorithm in C++.  The exact algorithm used can be chosen according your interests; multiway mergesort or radix sorting would both be good candidates. I would like the end result of this project to be a production-quality code that can be made available on the web.  Free and easy to use external sorting codes are not easily found and I think putting one up on the web would generate a few hits and possibly be a nice line on your resume.

    References: [Wikipedia External Sorting article]

  3. Three-dimensional kd-tree [geometric data structures]

    The kd-tree is a geometric data structure used in a number of computer graphics applications (ray-tracing and photon mapping are two examples). It is also useful for k-nearest neighbor queries and that will be the application in which we are interested. In this project, you will implement a 3D kd-tree as a C++ template class. The operations the tree should support will be insertion of data using a 3D point as a key, clear (delete everything), and finding the k-nearest neighbors of a query point. We will provide an implementation of 3D points and code to read and write points to files as well as some test data.

    References: [Wikipedia kd-tree article]

  4. Computing the distance between two 3D surfaces [geometric data structures]

    Representations of 3D surfaces, typically meshes of triangles, are useful in many computational activities. One important operation on meshes is calculating the "distance" between two surfaces in an attempt to understand how similar the meshes are. The goal of this project is to implement an algorithm to find the maximum of the set of minimum distances between two triangle meshes. This actually is less confusing than it sounds -- it's called the Hausdorff distance and you can find a full description of it by reading the linked references below. Implementation will be in C++ and we will provide you with code to read, write and view meshes. For this project, a simple approximation of the distance by sampling at the vertices of the meshes will be sufficient. Data structures come in to play in that you will need to implement a simple spatial grid data structure to accelerate the computation. If this is a single person project, an STL multimap can be used instead.

    References:  [McGill University webpage on Hausdorff distance]
                        [MESH: Measuring Errors between Surfaces using the Hausdorff distance(PDF)]

  5. Streaming point set processing [geometric data sturctures]

    Systems for acquiring 3D data (e.g. laser scanners) have become much more common over the last decade. These systems typically generate data sets of three-dimensional points. Before being used in computer graphics applications, these points sets often need to have "noise" removed; this means smoothing the point set to remove spurious sharp features that are errors from the scanning process. One method for doing this is proposed in the paper linked below. The goal of this project will be to implement a subset of the point set operations described in the paper (such as estimating normal vectors, smoothing, and rendering).  We will provide some starter code for points, reading and writing files, and code for a kd-heap. If the project goes well and you would like to expand it to a research project we can try to implement operations not described in the paper (e.g. simplifcation and refinement) and possibly write a paper about it.

    References: [Stream-Processing Point Data(PDF)]

  6. Triangle mesh repair [geometric data structures]

    The triangle mesh is perhaps the most commonly used data structure for representing 3D surfaces in computer graphics and scientific computing. The mesh generation process can result in meshes that are "broken" in some sense. They can have duplicate triangles, holes, self-intersections, mid-edge intersections, poorly-shaped triangles and numerous other issues. In this project, we will create a tool to fix some of these problems. The implementation will be in C++. I can provide code to read, write and view meshes as well as some meshes to serve as test cases.

    References:[Automatic Restoration of Polygon Models (PDF)][Structure Preserving CAD Model Repair (PDF)]

  7. Clustering for automatic bookmark organization [graph algorithms, web computing]

    As you may be aware, some people have made a great deal of money by using graph theory to analyze the structure of the web. This project will do that as well (except for the making money part...). The set of bookmarks from your browser is a subset of the web. By analyzing the relationship of your bookmarked sites to each other in the graph formed by the web, it should be possible to cluster bookmarks of related sites together. You can imagine a number of simple clustering techniques: a cluster could be defined as a  set of sites that are all directly linked to each other, or are at most n links away from each other in the web, etc. This project will be to write a program in a language of your choosing that reads in a bookmark file and a file describing the subset of the web containing the bookmarks. The program will then organize the bookmarks into folders using clustering. You get to decide how to cluster on your own, and are encouraged to try multiple methods if you have time. I can provide code to generate the "subset web file" and will give you a description of that file format along with technical references to help with the programming. Please note  -- I do not expect this program will do a good job of organizing bookmarks, but it should still be interesting to see how well it performs. For all I know, someone has already done this and created a Firefox add-on using this idea. Don't use that code -- write your own.


  8. Implementing the Smith-Waterman sequence alignment algorithm [bioinformatics]

    In compuational biology, aligning two nucleotide or amino sequences is done to find the most similar subsequences. These subsequences identify important functional or evolutionary realtionships (e.g. they could identify the gene responsible for a given physical feature of an organism). The Smith-Waterman algorithm is one method for finding these subsequences. In this project you will implement this algorithm in C++ and analyze it's efficiency by doing some performance testing. Again, you can probably find this code on the web. Don't use that code -- write your own. We may also try writing a multi-threaded version of the algorithm that will accelerate it on mult-core processors, depending on your interests.

    References: [Smith-Waterman Wikipedia article][Sequence alignment Wikipedia article]

  9. Monte Carlo option pricing [computational finance]

    "Options" are essentially contracts that convey the right, but not the obligation, to engage in a financial transaction in the future. For example, you can buy a "call option" giving you the right to buy a specified stock at a set "strike price" some time on or before the expiration of the option. If the price of the stock goes higher than the strike price, you could exercise the option and purchase the stock. In that way, you would earn a profit (assuming the cost of the option wasn't more than the money made on the stock transaction). Being able to determine the value of an option has been an area of great interest in the financial community for decades. One method for valuing options is essentially to run many simulatations of the results of buying the option using different random values to describe market conditions, etc. In this way, one can determine the expected payoff of buying the option. Computational methods that use repeated random sampling to calculate somthing are called Monte Carlo methods. In this project, you will implement some Monte Carlo option pricing algorithms in C++.  The code to do this at a basic level is very simple, so we'll probably implement multiple versions (e.g. evaluating European and American options, computing derivatives of option prices) and do a performance analaysis of the code using simple profiling tools (e.g. gprof).

    References:[Monte Carlo methods in finance Wikipedia article][Option pricing Wikipedia article]

     
  10. Graph algorithms on a GPU [parallel computing, graph algorithms]

    A lot of recent research in computing has focussed on using graphics cards for general purpose computation. Modern graphics cards are essentially parallel computers that are programmable but have significantl restrictions. To be efficiently implemented on a GPU, an algorithm must be parallel and able to run in a "SIMD" (single-instruction multiple data) formulation. In this project you explore the possibility of implementing graph algorithms on a GPU. Breadth-first search, shortest path algorithms, and graph layout algorithms have all been successfully implemented by other researchers. You can implement some of these algorithms or try to implement something of your own choosing. You will be given access to a Linux machine with and NVIDIA 8800 GPU. Programming will be done in CUDA, NVIDIA's GPU programming language (it greatly resembles C in it's syntax).  This is probably the most time-consuming project listed here, as it involves learning a new language and parallel programming concepts. But, it should be interesting and a potentially good resume-builder. Parallel programming is widely regarded as a key technology for the future; the computing industry seems to be betting that "many-core" machines will become ubiquitous and there is growing interest in figuring out how best to exploit such hardware.

    References: [Accelerating Large Graph Algorithms on the GPU using CUDA(PDF)][CUDA Documentation]

  11. Implement a game-playing AI [artificial intelligence, state-space search]

    Implementing a game-playing AI is another possible project. Typically this means selecting a game and then figuring what the state space and some good rules for ranking the states would be. At runtime, the AI searches the state space starting from the current board configuration and determines what move would be most favoravble for it. A game like Othello or Reversi would a good choice (things like tic-tac-toe and checkers are too simple), but any non-trivial turn-based strategy game would definitely be acceptable.  We can find appropriate references for the game you are interested in when we discuss your proposal. 

  12. Submit an entry to the IEEE Visualization Design Contest [scientific visualization]

    This is perhaps the most difficult project on this list, but it also offers a great chance to do something very impressive and be recognized for it.  Every year the IEEE Visualization conference has a design contest. They provide the data and contestants create visualizations of the data that scientists can use to answer questions they have. This year's contest involves visualizing data from a simulation of star formation in the early history of the universe. If you want to consider working on this, you should have strong technical skills or a willingness to spend a lot of time on the project. Both would probably be good. This would be a particularly good project for a group. It would be easiest to use an existing visualization package such as the Visualization Toolkit (VTK) or VISIT. Writing a program from scratch using OpenGL and FLTK or something similar is an option, but I would not suggest doing that unless you already have some familiarity with OpenGL.  You can read about the contest here and decide if you want to attempt it. If you would like to discuss the specific requirements for this project in more depth before agreeing to do it, just send me mail.

    References: [IEEE Vis contest][VTK][Visit][OpenGL][FLTK]

     

Proposal Requirements

If you select one of the suggested projects, simply send me an e-mail saying you want to do that project. If someone else has already selected that project, you can work as a team if you both agree to do so.

If you wish to propose your own project, you are encouraged to do so. This can be implementing an algorithm or data structure or something completely different. One different thing could be doing a survey: find out which websites CS students read the most, which "technical" sites do they read the most, how do they read them (e.g. laptop, phone, desktop,...) or try to determine if people in CS 225 using IDEs write programs with fewer errors than people that just use editors, etc. If you propose a survey project,  we'll need to work with the department to get apporval and also make sure you have enough participation to make the project worthwhile. In all cases, we reserve the right to reject a project proposal if it is too simple, or infeasible for some reason.

If you choose to propose a project, e-mail the following to me:
Mid-term Report Requirements

For the mid-term report, e-mail me the following:
The questions can be technical (e.g. how to implement something, how an algorithm works, etc.), practical (e.g. how can I get something to compile on Windows), or suggestions for how the project could be improved (e.g. improvements to an algorithm, etc).  We'll attempt to answer the questions together.

Final Project Handin

This can be done via e-mail. In the case of a code project you should e-mail the following
The basic idea behind the automated test case is that you will create a short test program that exercises your code and checks results against a known correct result. This "correct result" can be hard-coded into the program or can be in file whose contents are compared to the code output.

For non-code projects (e.g. survey), you will be expected to hand in a report of five or more pages in either a PDF or Word file. The exact format of the report will be decided when we discuss the project proposal.

You are free to hand-in the project before the due date and to schedule an in-person demo if you feel a demo would better help us understand the project.