CS 296 Honors Projects
Course Instructor: Eric
Shaffer (shaffer1@uiuc.edu)
CS296 is a 1-hour honors course that CS225 students can take in
addition to the regular CS225 course. That is, you should NOT drop
CS225 if you want to enroll in CS296; instead, you should enroll in
both CS225 and CS296. You will receive a separate grade for each
course, and thus your CS296 work cannot affect the grade you get in
CS225.
There are different sections for CS296; make sure you are enrolled in
section &25. If
you do want to enroll in the course, you should try and do this by the
undergrad class-add date, which is about two weeks after the semester
starts. If you want to try and add the course after that date, you will
need to go to your college office and talk to them, and there's still
no guarantee that they will let you sign up late.
You can work alone or in groups. Group projects will be expected to be
more elaborate than individual projects. You can propose your own
project or you can select a project from the list provided on this
page. Any project proposal will need to be approved by the course
instructor. You are free to work very independently on this
project; you are only required to meet with the instructor to propose a
project and give a mid-term report. "Handing in" the project will be
done via e-mail.
Using a code management system, specifically the subversion
system, is strongly reccommended for group projects. It could also be
useful for individual projects as it will provide you with a way to
backup your project easily. If you are unfamiliar with subversion, you
can read about it here.
If you wish to use subversion to keep track of your code,
e-mail me and we can have the department set up a repository for
you.
You should feel free to e-mail the instructor for help at any time. The
goal of this project is for you to accomplish something challenging and
rewarding and we will assist as much as is feasible to make sure that
happens. The final project handin is the only thing that will be graded.
Important dates:
2/11/2008
--- Project proposal due. E-mail the proposal by this date and we'll
schedule a face-to-face meeting to discuss it.
3/22/2008 --- Midterm
report due. E-mail an update on your progress and schedule a meeting.
5/9/2008
--- Project due
Suggested
Projects
- B-Tree [out-of-core data
structures]
The B-Tree is a data structure used to organize data on a hard disk to
allow efficient insertions, deletions, and searches. This is very
useful, as a lot of data sets are too large to fit in the
memory of commodity computers. The utility of the B-Tree is attested to
by its use in a multiplicity of databases and filesystems. The goal of
this project is to implement the B-Tree as a C++ template class. We
will choose an interesting and very large data set (maybe biological
data or census data) to test your implementation.
References: [Wikipedia
B-Tree article]
- External sorting
[out-of-core data structures]
Many applications require the manipulation of data that is too large
for the memory of conventional computers. Sorting is, of course, one of
the fundamental algorithmic tasks in the field of computing and the
ability to sort very large data sets can. The goal of this project will
be to implement an external sorting algorithm in C++. The exact
algorithm used can be chosen according your interests; multiway
mergesort or radix sorting would both be good candidates. I would like
the end result of this project to be a production-quality code that can
be made available on the web. Free and easy to use external
sorting codes are not easily found and I think putting one up on the
web would generate a few hits and possibly be a nice line on your
resume.
References: [Wikipedia
External Sorting article]
- Three-dimensional kd-tree
[geometric data structures]
The kd-tree is a geometric data structure used in a number of computer
graphics applications (ray-tracing and photon mapping are two
examples). It is also useful for k-nearest
neighbor queries and that will be the application in which we
are interested. In this project, you will implement a 3D kd-tree as a C++
template class. The operations the tree
should support will be insertion of data using a 3D point as a key,
clear (delete everything), and finding the k-nearest neighbors of a
query point. We will provide an implementation of 3D points and code to
read and write points to files as well as some test data.
References: [Wikipedia
kd-tree article]
- Computing the distance between
two 3D surfaces [geometric data structures]
Representations of 3D surfaces, typically meshes of triangles, are
useful in many computational activities. One important operation on
meshes is calculating the "distance" between two surfaces in an attempt
to understand how similar the meshes are. The goal of this project is
to implement an algorithm to find the maximum of the set of minimum
distances between two triangle meshes. This actually is less confusing
than it sounds -- it's called the Hausdorff
distance and you can find a full description of it by reading the
linked references below. Implementation will be in C++ and we will
provide you with code to read, write and view meshes. For this project,
a simple approximation of the distance by sampling at the vertices of
the meshes will be sufficient. Data structures come in to play in that
you will need to implement a simple spatial grid data structure to
accelerate the computation. If this is a single person project, an STL
multimap can be used instead.
References: [McGill
University webpage on Hausdorff distance]
[MESH:
Measuring Errors between Surfaces using the Hausdorff
distance(PDF)]
- Streaming point set processing
[geometric data sturctures]
Systems for acquiring 3D data (e.g. laser scanners) have become
much more common over the last decade. These systems typically generate
data sets of three-dimensional points. Before being used in computer
graphics applications, these points sets often need to have "noise"
removed; this means smoothing the point set to remove spurious sharp
features that are errors from the scanning process. One method for
doing this is proposed in the paper linked below. The goal of this
project will be to implement a subset of the point set operations
described in the paper (such as estimating normal vectors, smoothing,
and rendering). We will provide some starter code for
points, reading and writing files, and code for a kd-heap. If the
project goes well and you would like to expand it to a research project
we can try to implement operations not described in the paper (e.g.
simplifcation and refinement) and possibly write a paper about it.
References: [Stream-Processing
Point Data(PDF)]
- Triangle mesh repair
[geometric data structures]
The triangle mesh is perhaps the most commonly used data structure for
representing 3D surfaces in computer graphics and scientific computing.
The mesh generation process can result in meshes that are "broken" in
some sense. They can have duplicate triangles, holes,
self-intersections, mid-edge intersections, poorly-shaped triangles and
numerous other issues. In this project, we will create a tool to fix
some of these problems. The implementation will be in C++. I can
provide code to read, write and view meshes as well as some meshes to
serve as test cases.
References:[Automatic
Restoration of Polygon Models (PDF)][Structure
Preserving CAD Model Repair (PDF)]
- Clustering for automatic
bookmark organization [graph algorithms, web computing]
As you may be aware, some people have
made
a great deal of money by using graph theory to analyze the structure of
the web. This project will do that as well (except for the making money
part...). The set of bookmarks from your browser is a subset of the
web. By analyzing the relationship of your bookmarked sites to each
other in the graph formed by the web, it should be possible to cluster
bookmarks of related sites together. You can imagine a number of simple
clustering techniques: a cluster could be defined as a set of
sites that are all directly linked to each other, or are at most n links
away from each other in the web, etc. This project will be to write a
program in a language of your choosing that reads in a bookmark file
and a file describing the subset of the web containing the bookmarks.
The program will then organize the bookmarks into folders using
clustering. You get to decide how to cluster on your own, and are
encouraged to try multiple methods if you have time. I can provide code
to generate the "subset web file" and will give you a description of
that file format along with technical references to help with the
programming. Please note
-- I do not expect this program will do a good job of organizing
bookmarks, but it should still be interesting to see how well it
performs. For all I know, someone has already done this and created a
Firefox add-on using this idea. Don't use that code -- write your own.
- Implementing the Smith-Waterman
sequence alignment algorithm [bioinformatics]
In compuational biology, aligning two nucleotide or amino
sequences is done to find the most similar subsequences. These
subsequences identify important functional or evolutionary
realtionships (e.g. they could identify the gene responsible for a
given physical feature of an organism). The Smith-Waterman algorithm is
one method for finding these subsequences. In this project you will
implement this algorithm in C++ and analyze it's efficiency by doing
some performance testing. Again, you can probably find this code on the
web. Don't use that code -- write your own. We may also
try writing a multi-threaded version of the algorithm that will
accelerate it on mult-core processors, depending on your interests.
References: [Smith-Waterman
Wikipedia article][Sequence
alignment Wikipedia article]
- Monte Carlo option pricing
[computational finance]
"Options" are essentially contracts that convey the right, but not
the obligation, to engage in a financial transaction in the future. For
example, you can buy a "call option" giving
you the right to buy a specified stock at a set "strike price"
some time on or before the expiration of the option. If the price of
the stock goes higher than the strike price, you could exercise the
option and purchase the stock. In that way, you would earn a profit
(assuming the cost of the option wasn't more than the money made on the
stock transaction). Being able to determine the value of an option has
been an area of great interest in the financial community for decades.
One method for valuing options is essentially to run many simulatations
of the results of buying the option using different random values to
describe market conditions, etc. In this way, one can determine the
expected payoff of buying the option. Computational methods that use
repeated random sampling to calculate somthing are called Monte
Carlo methods. In this project, you will implement some Monte Carlo
option pricing algorithms in C++. The code to do this at a basic
level is very simple, so we'll probably implement multiple versions
(e.g. evaluating European and American options, computing derivatives
of option prices) and do a performance analaysis of the code using
simple profiling tools (e.g. gprof).
References:[Monte
Carlo methods in finance Wikipedia article][Option
pricing Wikipedia article]
- Graph algorithms on a GPU
[parallel computing, graph algorithms]
A lot of recent research in computing has focussed on using graphics
cards for general purpose computation. Modern graphics cards are
essentially parallel computers that are programmable but have
significantl restrictions. To be efficiently implemented on a GPU, an
algorithm must be parallel and able to run in a "SIMD"
(single-instruction multiple data) formulation. In this project you
explore the possibility of implementing graph algorithms on a GPU.
Breadth-first search, shortest path algorithms, and graph layout
algorithms have all been successfully implemented by other researchers.
You can implement some of these algorithms or try to implement
something of your own choosing. You will be given access to a Linux
machine with and NVIDIA 8800 GPU. Programming will be done in CUDA,
NVIDIA's GPU programming language (it greatly resembles C in it's
syntax). This is probably the most time-consuming project listed
here, as it involves learning a new language and parallel programming
concepts. But, it should be interesting and a potentially good
resume-builder. Parallel programming is widely regarded as a key
technology for the future; the computing industry seems to be betting
that "many-core" machines will become ubiquitous and there is
growing interest in figuring out how best to exploit such hardware.
References: [Accelerating Large Graph Algorithms on the
GPU using CUDA(PDF)][CUDA
Documentation]
- Implement a game-playing AI [artificial intelligence, state-space
search]
Implementing a game-playing AI is another possible project. Typically
this means selecting a game and then figuring what the state space and
some good rules for ranking the states would be. At runtime, the AI
searches the state space starting from the current board configuration
and determines what move would be most favoravble for it. A game like
Othello or Reversi would a good choice (things like tic-tac-toe and
checkers are too simple), but any non-trivial turn-based strategy game
would definitely be acceptable. We can find appropriate
references for the game you are interested in when we discuss your
proposal.
- Submit an entry to the IEEE
Visualization Design Contest [scientific
visualization]
This is perhaps the most difficult project on this list, but it also
offers a great chance to do something very impressive and be recognized
for it. Every year the IEEE Visualization conference has
a design contest. They provide the data and contestants create
visualizations of the data that scientists can use to answer questions
they have. This year's contest involves visualizing data from a
simulation of star formation in the early history of the universe. If
you want to consider working on this, you should have strong technical
skills or a willingness to spend a lot of time on the project. Both
would probably be good. This would be a particularly good project for a
group. It would be easiest to use an existing visualization package
such as the Visualization Toolkit (VTK) or VISIT. Writing a program
from scratch using OpenGL and FLTK or something similar is an option,
but I would not suggest doing that unless you already have some
familiarity with OpenGL. You can read
about the contest here
and decide if you want to attempt it. If you would like to discuss the
specific requirements for this project in more depth before agreeing to
do it, just send me mail.
References: [IEEE
Vis contest][VTK][Visit][OpenGL][FLTK]
Proposal
Requirements
If you select one of the suggested projects, simply send me an
e-mail saying you want to do that project. If someone else has already
selected that project, you can work as a team if you both agree to do
so.
If you wish to propose your own project, you are encouraged to do so.
This can be implementing an algorithm or data structure or something
completely different. One different thing could be doing a survey: find
out which websites CS students read the most, which "technical"
sites do they read the most, how do they read them (e.g. laptop, phone,
desktop,...) or try to determine if people in CS 225 using IDEs write
programs with fewer errors than people that just use editors, etc. If
you propose a survey project, we'll need to work with the
department to get apporval and also make sure you have enough
participation to make the project worthwhile. In all cases, we reserve
the right to reject a project proposal if it is too simple, or
infeasible for some reason.
If you choose to propose a project, e-mail the following to me:
- A short description of the algorithm or data structure you wish
to implement, or whatever it is you propose to do
- How many people will work on the project
- What operating system your code will build and run on (if you are
writing code)
- Any references (e.g. papers, websites, etc.) I should read to get
familiar with the algorithm/data structure, or whatever
- What you propose to have complete by the mid-term report date
- 3 suggested days & times we could meet to discuss this
in-person; office hours (Tu 10-11) are a good choice
Mid-term
Report Requirements
For the mid-term report, e-mail me the following:
- A short description of what you have completed
- Any questions you have about the project
- 3 suggested days & times we could meet to discuss this
in-person; again, office hours (Tu 10-11) are a good choice
The questions can be technical (e.g. how to implement
something, how an algorithm works, etc.), practical (e.g. how can I get
something to compile on Windows), or suggestions for how the project
could be improved (e.g. improvements to an algorithm, etc). We'll
attempt to answer the questions together.
Final
Project Handin
This can be done via e-mail. In the case of a code project you
should e-mail the following
- Your source code, appropriately commented per the CS 225
commenting standard
- At least two test cases for your code which automatically
generate "PASSED" or "FAILED" messages
- A short text file called "readme.txt" which describes how to
build and run your code and the test cases
The basic idea behind the automated test case is that you will create a
short test program that exercises your code and checks results against
a known correct result. This "correct result" can be hard-coded into
the program or can be in file whose contents are compared to the code
output.
For non-code projects (e.g. survey), you will be expected to hand in a
report of five or more pages in either a PDF or Word file. The exact
format of the report will be decided when we discuss the project
proposal.
You are free to hand-in the project before the due date and to schedule
an in-person demo if you feel a demo would better help us understand
the project.