What would it take for a computer to read, and understand, a newspaper? To construct a representation of the meaning of a text, it is necessary to parse its sentences, i.e. to identify their grammatical structures. Although compilers contain parsers too, natural languages are both more complex and more ambiguous than programming languages. Computational linguists have developed a number of expressive grammar formalisms intended to capture these additional complexities, as well as parsing models that use machine-learning techniques to find the most likely structure of a sentence. To achieve wide coverage (i.e. to be able to handle actual newspaper text) and high accuracy, these models require significant amounts of labeled training data: so-called treebanks, corpora of sentences that have been annotated with the correct analysis. Creating a parser for a specific formalism therefore often requires first translating a treebank into that formalism. This course gives an overview of the formalisms most commonly used in natural language processing and of current research on grammar extraction and wide-coverage parsing.

Prerequisites: basic exposure to AI and/or machine learning, or an introductory course in natural language processing.

Assessment: