This study discusses evaluation methods for linguists to use when employing an automatically annotated treebank as a source of linguistic evidence. While treebanks are usually evaluated with a general measure over all the data, linguistic studies often focus on a particular construction or a group of structures. To judge the quality of linguistic evidence in this case, it would be beneficial to estimate annotation quality over all instances of a particular construction. I discuss the relative advantages and disadvantages of four approaches to this type of evaluation: manual evaluation of the results, manual evaluation of the text, falling back to simpler annotation, and searching for particular instances of the construction. Furthermore, I illustrate the approaches using an example from Dutch linguistics, two-verb cluster constructions, and estimate precision and recall for this construction on a large automatically annotated treebank of Dutch.

The subject of this paper is the expansion of n-gram training data with the aid of morpho-syntactic transformations, in order to create a larger amount of reliable n-grams for Dutch language models. The main aim of this technique is to alleviate a classical problem for language models: data sparsity. Moreover, since language models for automatic speech recognition are usually trained on written language resources while they are tested on spoken language, certain patterns that are typical of spontaneous spoken language will be under-represented and patterns characteristic of written language will be over-represented. By adding transformed n-grams, we hope to adapt the language model so that it better matches spoken language. We investigate whether a language model trained on the expanded data performs better than a baseline n-gram model with modified Kneser-Ney smoothing in terms of perplexity and word error rate. Several alternatives for the probability estimation of the transformed n-grams are explored, and an approach to deal with separable verbs in Dutch is also discussed.
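
As a rough illustration of the perplexity comparison mentioned above, the following minimal sketch computes perplexity from the probabilities a model assigns to each token of a test sequence. It is not the setup used in the paper: the function name `perplexity` and the two probability lists are hypothetical, and no actual n-gram model or Kneser-Ney smoothing is implemented here.

```python
import math


def perplexity(token_probs):
    """Perplexity of a test sequence, given the model probability of each token.

    Computed as exp(-(1/N) * sum(log p_i)), where N is the number of tokens.
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    log_sum = sum(math.log(p) for p in token_probs)
    return math.exp(-log_sum / len(token_probs))


# Hypothetical per-token probabilities, invented for illustration only.
baseline_probs = [0.25, 0.25, 0.25, 0.25]  # stand-in for a baseline model
expanded_probs = [0.5, 0.5, 0.5, 0.5]      # stand-in for a model trained on expanded data

baseline_ppl = perplexity(baseline_probs)
expanded_ppl = perplexity(expanded_probs)
```

Lower perplexity means the model assigns higher probability to the test data, so under these made-up numbers the second model would count as the better fit.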