CS 421: Programming Languages and Compilers
Unit Project

Objectives

[top]

The primary objective of the unit project is to have the students explore a programming language concept or problem in greater depth than is possible with the standard lecture and machine problems.

Overview

[top]

Those of you who are signed up for 1 unit of credit for CS 421 will need to complete a unit project. The deadlines are listed to the right of this page. You must work in groups of two to three people. You should expect about two MP's worth of work for each person involved in the project. You may do any of the following:
  • Pick a project topic from the list below

  • Find a research paper published in ACM or IEEE that details a programming language concept, and implement it.

  • Pick two published research papers on programming language topics (some may be found here), write a two page summary of each, and prepare a twenty minute presentation of each.
  • Develop a project idea of your own, based on previous experience or research interests.

Project Proposal

[top]

Write a one page proposal that details the scope of your project and an implementation schedule. If you are basing your project on a paper, cite the paper and provide a URL to a PDF or postscript version. Be sure to include a description of the target language you want to work on, the concept you want to implement, and an outline of how you would like to to demonstrate your project at the end of the semester. Also, give a clear outline of each individual's responsibility in the project.

Your proposals are due by Oct. 15. You should submit your proposal electronically, in plain text form contained in the body of an e-mail message (no attachments please). It should be not more than 2 pages long. Feel free to ask for an appointment to discuss the project proposal with me before the proposals are due.

The LLVM Instruction Set and Compiler Infrastructure

[top]

Some of the projects below are based on LLVM, a compiler framework that has been developed in my group over the past 3 years and has been distributed in open-source form since Oct. 2003. It has begun to attract a number of users and developers in the open-source and research communities outside UIUC. LLVM is based on a language-independent, typed assembly language that serves both as the compiler's internal representation and as the persistent "bytecode" representation for programs. LLVM is designed to support effective analysis and optimization of programs from arbitrary source-level languages (but does not aim to support language-specific techniques). We currently have front-ends for C and C++, based on GCC 3.4. The LLVM compiler system supports sophisticated link-time optimizations, just-in-time or static code generation (for x86 or Sparc), and we are developing a novel runtime optimization system.

Technically, the LLVM instruction set uses RISC-like instructions, but with an infinite single-assignment register set, and a rich language-independent type system. It is a very small, elegant instruction set and can be understood in an afternoon.

There is extensive help available for LLVM developers. This includes a rich set of detailed documentation, an active mailing list (llvmdev@cs.uiuc.edu) for LLVM-related questions, and (for people at UIUC) anyone in my group. See the LLVM Web pages for information, documentation, and how to install the code. See the LLVM Web pages for information, documentation, and how to install the code.

Project Ideas

[top]

Here are some potential project ideas. All these suggestions either involve writing standalone projects using OCAML or in the LLVM framework using C++.

COOL front-end to LLVM
Write a translator from the COOL Abstract Syntax Tree to LLVM. Some of most interesting aspect will be implementing classes and run-time type inquiries (case) using the low-level types available in LLVM. A 3-person team should also implement Garbage Collection for COOL, which requires the compiler to generate information about pointers for our existing GC library.
OCAML front-end to LLVM
Write a translator for the functional subset of OCAML subset to LLVM. A relatively easy but limited approach would be to use the simple lexer, parser, and AST we use for MP4. A more ambitious approach would be to retarget parts of the complete OCAML parser from INRIA. A key interesting aspect will be implementing closures efficiently in LLVM. A 3-person team should to incorporate a public-domain Garbage Collection library.
Other language front-ends for LLVM
Choose your favorite research or production language (other than C or C++), and write a front-end to LLVM for it.
Prolog implementation in OCAML
Our course could use an implementation of Prolog in OCaml. Last year, one group of students made a very good start by writing a working but partial implementation of the language, using ocamllex and ocamlyacc to parse Prolog programs (with several restrictions). Use this implementation as a starting point to provide a more sophisticated Prolog implementation. Some of the key improvements needed include more of the language syntax, lists, equality, negation, and better user interaction.
Adding polymorphism to the LLVM type system
LLVM uses a simple, imperative type system with primitive types (ints, floats, bool, and void) and exactly four derived types: pointers, structures, arrays, and functions. A key missing feature is polymorphism, which forces us to use char* or to duplicate code in many places (as you have to do in C), even for code from polymorphic languages like OCaml. Write a simple type-inference engine for LLVM that infers polymorphic types for LLVM instructions and functions. A mechanism for expressing such a type (before inference) already exists, so the work should mainly require:
  • Write formal type rules for the LLVM instructions, using polymorphic types wherever it makes sense (there are only 28 instructions and most are trivial, like arithmetic, logical, and control-flow).
  • Write a simple unification engine in C++, similar to the one in MP3.
  • Write a type-inference algorithm for the LLVM type rules and internal representation.
    Garbage Collection
    Write a generational mark-and-sweep garbage collector for LLVM. There is already a well-defined GC interface between LLVM-generated code and GC libraries. There is also a simple working GC algorithm based on copying between two GC spaces, which will show you how to use the interface and give you the basic structure of the implementation. Because these exist, most of your time would be spent on the interesting part: learning and implementing a more sophisticated GC algorithm.
  • Project Reports

    [top]

    The due dates for this project are listed at the bottom of this page.

    Half-way into the project, you should submit a very short (1-2 page) status report by e-mail describing your progress on your implementation. This report should be as specific as possible about what has been implemented and what has not.

    At the end of the project, you need to turn in two things:

  • a tar file containing your source code, tests, and Makefile; and
  • a project report, as a PDF file, containing the information below. Later, we will arrange a demo session where each group will give a brief presentation and demo of their code.

    Your final report should have 4 sections:

    Overview
    Describe the motivation, goals, and broad accomplishments of your project in general terms.
    Implementation
    A brief description of the important aspects of your implementation, in terms of (a) the major tasks or capabilities of your code; (b) components of the code; (d) status of the project -- what works well, what works partially, and and what is not implemented at all. You MUST compare these with your original proposed goals in the project proposal.
    Tests
    Coming up with appropriate tests is one of the most important part of good software development. Your tests should include unit tests, feature tests, and larger test codes. Give a short description of the tests used, performance results if appropriate (e.g., memory consumption for garbage collection) etc. Be sure to explain how these tests exercise the concept(s) you've implemented.
    Listing
    A listing of your code. The code should be documented thoroughly and clearly. You don't need to comment every single line or even every single function. Instead, focus on the central functions and data structures in your implementation, and document them well.

    Project Demo

    [top]

    We will have demos on the afternoon of Monday, Dec. 13 (contact me if you have an exam scheduled at that time). During this time, your group will make a brief presentation of your work and then give a demo of your project.

  • Milestone Dates
    Proposal TBA
    Status Report TBA
    Final Report TBA
    Demos TBA

    Contents
    Objectives
    Overview
    Project Proposal
    The LLVM Instruction Set and Compiler Infrastructure
    Project Ideas
    Project Reports
    Project Demo

    Last updated Sat Oct 9 17:44:58 CDT 2004