CS 421: Programming Languages
Unit Project

Objectives

[top]

The primary objective of the unit project is to have the students explore a programming language concept or problem in greater depth than is possible with the standard lecture and machine problems.

Overview

[top]

Those of you who are signed up for 4 hours of credit for CS 421 will need to complete a unit project. The deadlines are listed to the right of this page. You are encouraged to work in groups of two to three people, although you may work alone. You should expect about two MP's worth of work for each person involved in the project. You may do any of the following:
  • pick a project topic from the list below

  • find a research paper published in ACM or IEEE that details a programming language concept, and implement it.

  • develop a project idea of your own, based on previous experience or research interests.

Project Proposal

[top]

Write a one page proposal that details the scope of your project and an implementation schedule. If you are basing your project on a paper, cite the paper and provide a URL to a PDF or postscript version. Be sure to include a description of the target language you want to work on, the concept you want to implement, and an outline of how you would like to to demonstrate your project at the end of the semester. Also, give a clear outline of each individual's responsibility in the project.

Your proposals are due by June 23. You should submit your proposal electronically, in plain text form contained in the body of an e-mail message (no attachments please). It should be not more than 2 pages long, and is expected to be around a page. Feel free to ask for an appointment to discuss the project proposal with me before the proposals are due.

The LLVM Instruction Set and Compiler Infrastructure

[top]

Some of the projects below are based on LLVM, a compiler framework that has been developed in Prof. Adve's group over the past several years and has been distributed in open-source form since Oct. 2003. It has begun to attract a number of users and developers in the open-source and research communities outside UIUC. LLVM is based on a language-independent, typed assembly language that serves both as the compiler's internal representation and as the persistent "bytecode" representation for programs. LLVM is designed to support effective analysis and optimization of programs from arbitrary source-level languages (but does not aim to support language-specific techniques). They currently have front-ends for C and C++, based on GCC 3.4. The LLVM compiler system supports sophisticated link-time optimizations, just-in-time or static code generation (for x86 or Sparc), and they are developing a novel runtime optimization system.

Technically, the LLVM instruction set uses RISC-like instructions, but with an infinite single-assignment register set, and a rich language-independent type system. It is a very small, elegant instruction set and can be understood in an afternoon.

There is extensive help available for LLVM developers. This includes a rich set of detailed documentation, an active mailing list (llvmdev@cs.uiuc.edu) for LLVM-related questions, and (for people at UIUC) anyone in Prof. Adve's group. See the LLVM Web pages for information, documentation, and how to install the code.

Project Ideas

[top]

Here are some potential project ideas. All these suggestions either involve writing standalone projects using OCAML or in the LLVM framework using C++.

COOL front-end to LLVM
Write a translator from the COOL Abstract Syntax Tree to LLVM. Some of most interesting aspect will be implementing classes and run-time type inquiries (case) using the low-level types available in LLVM. A 3-person team should also implement Garbage Collection for COOL, which requires the compiler to generate information about pointers for our existing GC library.
OCAML front-end to LLVM
Write a translator for the functional subset of OCAML subset to LLVM. A relatively easy but limited approach would be to use the simple lexer, parser, and AST we use for MP4. A more ambitious approach would be to retarget parts of the complete OCAML parser from INRIA. A key interesting aspect will be implementing closures efficiently in LLVM. A 3-person team should to incorporate a public-domain Garbage Collection library.
Other language front-ends for LLVM
Choose your favorite research or production language (other than C or C++), and write a front-end to LLVM for it.
Prolog implementation in OCAML
Our course could use an implementation of Prolog in OCaml. In a prior semester, one group of students made a very good start by writing a working but partial implementation of the language, using ocamllex and ocamlyacc to parse Prolog programs (with several restrictions). Use this implementation (if I can find it!) as a starting point to provide a more sophisticated Prolog implementation. Some of the key improvements needed include more of the language syntax, lists, equality, negation, and better user interaction.
Adding polymorphism to the LLVM type system
LLVM uses a simple, imperative type system with primitive types (ints, floats, bool, and void) and exactly four derived types: pointers, structures, arrays, and functions. A key missing feature is polymorphism, which forces us to use char* or to duplicate code in many places (as you have to do in C), even for code from polymorphic languages like OCaml. Write a simple type-inference engine for LLVM that infers polymorphic types for LLVM instructions and functions. A mechanism for expressing such a type (before inference) already exists, so the work should mainly require:
  • Write formal type rules for the LLVM instructions, using polymorphic types wherever it makes sense (there are only 28 instructions and most are trivial, like arithmetic, logical, and control-flow).
  • Write a simple unification engine in C++, similar to the one in MP3.
  • Write a type-inference algorithm for the LLVM type rules and internal representation.
    Garbage Collection
    Write a generational mark-and-sweep garbage collector for LLVM. There is already a well-defined GC interface between LLVM-generated code and GC libraries. There is also a simple working GC algorithm based on copying between two GC spaces, which will show you how to use the interface and give you the basic structure of the implementation. Because these exist, most of your time would be spent on the interesting part: learning and implementing a more sophisticated GC algorithm.
    Other Ideas
    There are many other potential project ideas -- these are just several that have been developed over the last several years as the course has been offered.
  • If you have a strong interest in a particular area, one possibility is to use OCaml as a framework for developing a domain specific language. You should develop the syntax and semantics of the language, providing a lexer, parser, and interpreter. It should be non-trivial and well-focused on the problem domain.
  • Another possibility is to use an OCaml definition of a language as a springboard for problems in program analysis. The goals here would be to formulate a simple language or use an existing language, and then drive various analyses off of this. For instance, if you are interested in concurrency, you could verify properties of CCS (Milner's Calculus of Communicating Systems) programs.
  • Also, look in papers and follow your interests. The project should be something fun and, if you are doing research, hopefully something you can tie back in. You cannot just pick as a project existing research, though -- it should be something new.
     
  • Project Reports

    [top]

    The due dates for this project are listed at the bottom of this page.

    Half-way into the project, you should submit a very short (1-2 page) status report by e-mail describing your progress on your implementation. This report should be as specific as possible about what has been implemented and what has not.

    At the end of the project, you need to turn in two things:

  • a tar file containing your source code, tests, and Makefile; and
  • a project report, as a PDF file, containing the information below. Later, we will arrange a demo session where each group will give a brief presentation and demo of their code.

    Your final report should have 4 sections:

    Overview
    Describe the motivation, goals, and broad accomplishments of your project in general terms.
    Implementation
    A brief description of the important aspects of your implementation, in terms of (a) the major tasks or capabilities of your code; (b) components of the code; (d) status of the project -- what works well, what works partially, and and what is not implemented at all. You MUST compare these with your original proposed goals in the project proposal.
    Tests
    Coming up with appropriate tests is one of the most important part of good software development. Your tests should include unit tests, feature tests, and larger test codes. Give a short description of the tests used, performance results if appropriate (e.g., memory consumption for garbage collection) etc. Be sure to explain how these tests exercise the concept(s) you've implemented.
    Listing
    A listing of your code. The code should be documented thoroughly and clearly. You don't need to comment every single line or even every single function. Instead, focus on the central functions and data structures in your implementation, and document them well.

    Project Demo

    [top]

    We will have demos on the afternoon of Friday, August 4 (contact me if you have an exam scheduled at that time). During this time, your group will make a brief presentation of your work and then give a demo of your project. NOTE: I am currently trying to pick a different time for this, as this is when I2CS students will be taking their final exams. I'll have a new time posted soon!

  • Milestone Dates
    Proposal 6/23/2006
    Status Report TBA
    Final Report TBA
    Demos TBA

    Contents
    Objectives
    Overview
    Project Proposal
    The LLVM Instruction Set and Compiler Infrastructure
    Project Ideas
    Project Reports
    Project Demo

    Last updated Th Jun 15 10:55:00 CDT 2006