
Treebanking Finnish

Katri Haverinen, Timo Viljanen, Veronika Laippala, Samuel Kohonen, Filip Ginter, Tapio Salakoski, Treebanking Finnish. In: Markus Dickinson, Kaili Müürisep, Marco Passarotti (Eds.), Proceedings of the Ninth International Workshop on Treebanks and Linguistic Theories, 79-90, Northern European Association for Language Technology (NEALT), 2010.


In this paper, we present the current version of a
syntactically annotated corpus for Finnish, the Turku Dependency
Treebank (TDT). This is the first publicly available Finnish treebank
of practical size, currently consisting of 4,838 sentences (66,042
tokens). The treebank includes both morphological and syntactic
analyses, the morphological information being produced using the FinCG
analyzer, and the syntax being human-annotated in the Stanford
Dependency scheme. Additionally, we conduct an experiment in automatic
pre-annotation and find the overall effect positive. In particular,
pre-annotation may be tremendously helpful in terms of both speed and
accuracy for an annotator still in training, although no such obvious
benefit was observed for more experienced annotators.

In addition to the treebank itself, we have constructed custom
annotation software, as well as a web-based interface with advanced
search functions. Both the treebank, including the full edit history
with exact timings, and its associated software are publicly available
under an open license at http://bionlp.utu.fi.



BibTeX entry:

  title = {Treebanking Finnish},
  booktitle = {Proceedings of the Ninth International Workshop on Treebanks and Linguistic Theories},
  author = {Haverinen, Katri and Viljanen, Timo and Laippala, Veronika and Kohonen, Samuel and Ginter, Filip and Salakoski, Tapio},
  editor = {Dickinson, Markus and Müürisep, Kaili and Passarotti, Marco},
  publisher = {Northern European Association for Language Technology (NEALT)},
  pages = {79--90},
  year = {2010},
  keywords = {treebank, Finnish, Stanford Dependency, annotation},

Belongs to TUCS Research Unit(s): Turku BioNLP Group
