What would it take for a computer to read, and understand, a newspaper? In
order to construct a representation of the meaning of a text, it is necessary to
parse its sentences, i.e. identify their grammatical structures. Although compilers
contain parsers too, natural languages are both more complex and more
ambiguous than programming languages. Computational linguists have
developed a number of expressive grammar formalisms that are intended to
capture these additional complexities, as well as parsing models that use
machine-learning techniques to find the most likely structure of a sentence. To
achieve wide coverage (i.e. be able to deal with actual newspaper text)
and high accuracy, these models require significant amounts of labeled training
data -- so-called treebanks, i.e. corpora of sentences that have been annotated
with their correct analyses. Creating a parser for a specific formalism therefore
often requires first translating a treebank into the desired formalism.
This course will give an overview of the most commonly used grammar formalisms in
natural language processing and of current research on grammar extraction and
wide-coverage parsing.
Prerequisites: basic exposure to AI and/or machine learning, or an intro to
natural language processing.
Assessment:
- 25% Homework. This consists of 25 (ungraded) reading assignments (textbook chapters or original research papers), available from the syllabus page. Each assignment comes with a list of questions that you must answer and bring to class. The questions test whether you have understood the reading and can assess it. I will collect your answers. Each completed questionnaire contributes 1% to your grade.
- 25% Practical. This will be a practical assignment that mostly tests how well you understand and can use existing tools and resources.
- 50% Term paper. This will be an in-depth literature review of one aspect of this course. If you are a grad student, you will be expected to go beyond what we covered in class.