Note: these lexicons are extracted from an old version of CCGbank. They will
soon be replaced by ones extracted from the final version of CCGbank (which you
can get from the LDC).
Here you can download lexicons that we have extracted from CCGbank, our
translation of the Penn Treebank into a corpus of Combinatory Categorial
Grammar derivations. You are free to use these for research purposes;
however, we would appreciate it if you could acknowledge us by citing the
following reference:
Julia Hockenmaier and Mark Steedman. Acquiring Compact Lexicalized Grammars
from a Cleaner Treebank. In Proceedings of the Third International Conference
on Language Resources and Evaluation, Las Palmas, 2002.
Data format
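Each entry has five columns: the word, its lexical category, P(word | category),
P(category | word), and the number of times the word was observed with that
category: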
mail N 0.00013495 0.508772 29
mail N/N 0.000168503 0.438596 25
mail S[b]\NP 0.000367647 0.0175439 1
mail ((S[b]\NP)/NP)/NP 0.00359712 0.0175439 1
mail ((S[dcl]\NP)/PP)/NP 0.000619963 0.0175439 1
The probabilities are simple relative frequency estimates obtained from the
observed frequency counts.
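As a sanity check on the format, the sketch below parses the `mail` entries shown above and recomputes the fourth column as count(word, category) divided by the total count of the word. It assumes whitespace-separated columns and that the fourth column is P(category | word), which the sample numbers bear out (29/57 ≈ 0.508772). The names `LexEntry` and `parse_entry` are hypothetical and not part of any distributed tool.

```python
from collections import defaultdict
from typing import NamedTuple


class LexEntry(NamedTuple):
    word: str
    category: str
    p_word_given_cat: float   # column 3 (assumed: P(word | category))
    p_cat_given_word: float   # column 4 (assumed: P(category | word))
    count: int                # column 5: observed frequency count


def parse_entry(line: str) -> LexEntry:
    """Split one whitespace-separated lexicon line into its five columns."""
    word, cat, p3, p4, n = line.split()
    return LexEntry(word, cat, float(p3), float(p4), int(n))


# Raw string so the backslashes in categories like S[b]\NP stay literal.
SAMPLE = r"""
mail N 0.00013495 0.508772 29
mail N/N 0.000168503 0.438596 25
mail S[b]\NP 0.000367647 0.0175439 1
mail ((S[b]\NP)/NP)/NP 0.00359712 0.0175439 1
mail ((S[dcl]\NP)/PP)/NP 0.000619963 0.0175439 1
"""

entries = [parse_entry(l) for l in SAMPLE.strip().splitlines()]

# Total count of each word over all of its categories.
word_totals = defaultdict(int)
for e in entries:
    word_totals[e.word] += e.count

# Relative-frequency estimate: count(word, cat) / count(word).
for e in entries:
    estimate = e.count / word_totals[e.word]
    print(f"{e.category:<28} file={e.p_cat_given_word:.6f} recomputed={estimate:.6f}")
```

The recomputed values agree with the file to the printed precision, which is what one would expect if the fourth column is a plain relative frequency over the word's counts.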
In the .tags files, we have appended the POS tag to each word (word|TAG):
mail|NN N 0.000134947 0.537037 29
mail|NN N/N 0.000168487 0.462963 25
mail|VB S[b]\NP 0.000367647 0.5 1
mail|VB ((S[b]\NP)/NP)/NP 0.00359712 0.5 1
mail|VBP ((S[dcl]\NP)/PP)/NP 0.000619963 1 1
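In the .tags entries, the fourth column appears to be conditioned on the word/tag pair rather than on the word alone: `mail|NN` occurs 29 + 25 = 54 times, and 29/54 ≈ 0.537037. The sketch below assumes that reading. It splits each token at the last `|` (an assumption, chosen so that a `|` inside a word would survive) and regroups the counts by pair. `split_tagged` is a hypothetical helper name.

```python
from collections import defaultdict

# Raw string so the backslashes in the categories stay literal.
TAGGED = r"""
mail|NN N 0.000134947 0.537037 29
mail|NN N/N 0.000168487 0.462963 25
mail|VB S[b]\NP 0.000367647 0.5 1
mail|VB ((S[b]\NP)/NP)/NP 0.00359712 0.5 1
mail|VBP ((S[dcl]\NP)/PP)/NP 0.000619963 1 1
"""


def split_tagged(token: str) -> tuple[str, str]:
    """Split 'word|TAG' at the last '|' so a '|' inside the word survives."""
    word, tag = token.rsplit("|", 1)
    return word, tag


rows = []
for line in TAGGED.strip().splitlines():
    token, cat, _p3, p4, n = line.split()
    word, tag = split_tagged(token)
    rows.append((word, tag, cat, float(p4), int(n)))

# Assumed grouping: counts are totalled per (word, tag) pair.
pair_totals = defaultdict(int)
for word, tag, _cat, _p4, n in rows:
    pair_totals[(word, tag)] += n

for word, tag, cat, p4, n in rows:
    recomputed = n / pair_totals[(word, tag)]
    print(f"{word}|{tag:<4} {cat:<28} file={p4:.6f} recomputed={recomputed:.6f}")
```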
The files