| Unit Project |
|
The primary objective of the unit project is to have the students explore
a programming language concept or problem in greater depth than is possible
with the standard lecture and machine problems.
Those of you who are signed up for 4 hours of credit for CS 421
will need to complete a unit project. The deadlines are listed to
the right of this page. You are encouraged to work in groups of two to
three people, although you may work alone. You should expect about two MP's worth of work for
each person involved in the project. You may do any of the following:
- pick a project topic from the list below
- find a research paper published in ACM or IEEE that details a
programming language concept, and implement it.
- develop a project idea of your own, based on previous experience or
research interests.
Write a one page proposal that details the scope of your project and an
implementation schedule. If you are basing your project on a paper, cite
the paper and provide a URL to a PDF or postscript version. Be sure to
include a description of the target language you want to work on, the
concept you want to implement, and an outline of how you would like to to
demonstrate your project at the end of the semester. Also, give a clear
outline of each individual's responsibility in the project.
Your proposals are due by June 23. You should submit your proposal
electronically, in plain text form contained
in the body of an e-mail message (no attachments please). It should be not
more than 2 pages long, and is expected to be around a page. Feel free to ask for an appointment to discuss the
project proposal with me before the proposals are due.
Some of the projects below are based on
LLVM, a compiler framework that
has been developed in Prof. Adve's group over the past several years and has been
distributed in open-source form since Oct. 2003. It has begun to attract
a number of users and developers in the open-source and research
communities outside UIUC. LLVM is based on a
language-independent, typed assembly language that serves
both as the compiler's internal representation and as the
persistent "bytecode" representation for programs.
LLVM is designed to support effective analysis and optimization of
programs from arbitrary source-level languages (but does not aim to
support language-specific techniques).
They currently have front-ends for C and C++, based on GCC 3.4.
The LLVM compiler system supports sophisticated
link-time optimizations, just-in-time or static code generation (for x86
or Sparc), and they are developing a novel runtime optimization system.
Technically, the LLVM instruction set uses RISC-like instructions, but
with an infinite single-assignment register set, and a rich
language-independent type system. It is a very small, elegant instruction
set and can be understood in an afternoon.
There is extensive help available for LLVM developers. This includes a
rich set of detailed documentation, an active mailing list
(llvmdev@cs.uiuc.edu) for LLVM-related questions,
and (for people at UIUC) anyone in Prof. Adve's group.
See the LLVM Web pages
for information, documentation, and how to install the code.
Here are some potential project ideas. All these suggestions either involve
writing standalone projects using OCAML or in the LLVM framework using C++.
- COOL front-end to LLVM
-
Write a translator from the COOL
Abstract Syntax Tree to LLVM. Some of most interesting aspect will be
implementing classes and run-time type inquiries (case) using
the low-level types available in LLVM. A 3-person team should also
implement Garbage Collection for COOL, which requires the compiler to
generate information about pointers for our existing GC library.
- OCAML front-end to LLVM
-
Write a translator for the
functional subset of OCAML subset to LLVM. A relatively easy but
limited approach would be to use the simple lexer, parser, and AST we
use for MP4. A more ambitious approach would be to retarget parts of
the complete OCAML parser from INRIA. A key interesting aspect will be
implementing closures efficiently in LLVM. A 3-person team should
to incorporate a public-domain Garbage Collection library.
- Other language front-ends for LLVM
-
Choose your favorite
research or production language (other than C or C++), and write a
front-end to LLVM for it.
- Prolog implementation in OCAML
-
Our course could use an
implementation of Prolog in OCaml. In a prior semester, one group of students
made a very good start by writing a working but partial implementation
of the language, using ocamllex and ocamlyacc to parse Prolog
programs (with several restrictions). Use this implementation (if I can find it!) as a
starting point to provide a more sophisticated Prolog implementation.
Some of the key improvements needed include more of the language syntax,
lists, equality, negation, and better user interaction.
- Adding polymorphism to the LLVM type system
-
LLVM uses a simple, imperative type system with primitive types (ints,
floats, bool, and void) and exactly four derived types: pointers,
structures, arrays, and functions. A key missing feature is polymorphism,
which forces us to use char* or to duplicate code in many places (as you
have to do in C), even for code from polymorphic languages like OCaml.
Write a simple type-inference engine for LLVM that infers polymorphic
types for LLVM instructions and functions. A mechanism for expressing
such a type (before inference) already exists, so the work should mainly
require:
- Write formal type rules for the LLVM instructions, using
polymorphic types wherever it makes sense (there are only 28
instructions and most are trivial, like arithmetic, logical,
and control-flow).
- Write a simple unification engine in C++, similar to the one in MP3.
- Write a type-inference algorithm for the LLVM type rules and
internal representation.
- Garbage Collection
-
Write a generational mark-and-sweep garbage collector for LLVM. There
is already a well-defined GC interface between LLVM-generated code and
GC libraries. There is also a simple working GC algorithm based on
copying between two GC spaces, which will show you how to use the
interface and give you the basic structure of the implementation.
Because these exist, most of your time would be spent on the interesting
part: learning and implementing a more sophisticated GC algorithm.
- Other Ideas
- There are many other potential project ideas -- these are just several that have been developed over the last several years as the course has been offered.
- If you have a strong interest in a particular area, one possibility is to use OCaml as a framework for developing a domain specific language. You should develop the syntax and semantics of the language, providing a lexer, parser, and interpreter. It should be non-trivial and well-focused on the problem domain.
-
-
- Another possibility is to use an OCaml definition of a language as a springboard for problems in program analysis. The goals here would be to formulate a simple language or use an existing language, and then drive various analyses off of this. For instance, if you are interested in concurrency, you could verify properties of CCS (Milner's Calculus of Communicating Systems) programs.
-
-
- Also, look in papers and follow your interests. The project should be something fun and, if you are doing research, hopefully something you can tie back in. You cannot just pick as a project existing research, though -- it should be something new.
-
The due dates for this project are listed at the bottom of this page.
Half-way into the project, you should submit a very short (1-2 page) status
report by e-mail describing your progress on your implementation. This
report should be as specific as possible about what has been implemented and
what has not.
At the end of the project, you need to turn in two things:
- a tar file containing your source code, tests, and Makefile; and
- a project report, as a PDF file, containing the information below.
Later, we will arrange a demo session where each group will give a brief
presentation and demo of their code.
Your final report should have 4 sections:
- Overview
- Describe the motivation, goals, and broad accomplishments of your
project in general terms.
- Implementation
- A brief description of the important aspects of your implementation,
in terms of
(a) the major tasks or capabilities of your code;
(b) components of the code;
(d) status of the project -- what works well, what works partially, and
and what is not implemented at all. You MUST compare these with
your original proposed goals in the project proposal.
- Tests
-
Coming up with appropriate tests is one of the most important part of good
software development. Your tests should include unit tests, feature
tests, and larger test codes. Give a short description of the tests
used, performance results if appropriate (e.g., memory consumption for
garbage collection) etc. Be sure to explain how these tests exercise the
concept(s) you've implemented.
- Listing
- A listing of your code. The code should be documented thoroughly and
clearly. You don't need to comment every single line or even every single
function. Instead, focus on the central functions and data structures in
your implementation, and document them well.
We will have demos on the afternoon of Friday, August 4 (contact me if you
have an exam scheduled at that time). During this time, your group will
make a brief presentation of your work and then give a demo of your project. NOTE: I am currently trying to pick a different time for this, as this is when I2CS students will be taking their final exams. I'll have a new time posted soon!
|
|
| Milestone Dates |
| Proposal |
6/23/2006 |
| Status Report |
TBA |
| Final Report |
TBA |
| Demos |
TBA |
|
|