Text Chunking based on a Generalization of Winnow
Tong Zhang, Fred Damerau, David Johnson;
2(Mar):615-637, 2002.
Abstract
This paper describes a text chunking system based on
a generalization of the Winnow algorithm. We propose
a general statistical model for text chunking which we then convert
into a classification problem. We argue that the Winnow family of
algorithms is particularly suitable for solving classification
problems arising from NLP applications, due to their robustness to
irrelevant features.
However, in theory, Winnow may fail to converge on linearly non-separable data.
To remedy this problem, we employ a generalization of the original Winnow
method.
An additional advantage of the new algorithm is that it provides reliable
confidence estimates for its classification predictions. This property
is required in our statistical modeling approach.
We show that our system achieves state-of-the-art performance in
text chunking at a lower computational cost than previous systems.
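The abstract does not spell out the Winnow update itself. As background, here is a minimal Python sketch of the classic (positive) Winnow algorithm of Littlestone. It is not the generalized, regularized variant this paper proposes. The sketch assumes 0/1 features, a promotion/demotion factor `alpha = 2` and a threshold `theta = n`. The toy task (label = `x[0] OR x[1]` among 1,000 features) and all function names are hypothetical illustrations.

```python
import random


def winnow_train(examples, n_features, alpha=2.0, epochs=1):
    """Classic (positive) Winnow on 0/1 feature vectors.

    Returns the weights, the threshold, and the number of mistakes made
    during online training.
    """
    w = [1.0] * n_features          # every weight starts at 1
    theta = float(n_features)       # one common threshold choice: theta = n
    mistakes = 0
    for _ in range(epochs):
        for x, y in examples:       # x: list of 0/1 values, y: 0 or 1
            active = [i for i, xi in enumerate(x) if xi]
            score = sum(w[i] for i in active)
            pred = 1 if score >= theta else 0
            if pred != y:
                mistakes += 1
                # Mistake-driven multiplicative update, applied only to the
                # active features: promote after a missed positive,
                # demote after a false positive.
                factor = alpha if y == 1 else 1.0 / alpha
                for i in active:
                    w[i] *= factor
    return w, theta, mistakes


def winnow_predict(w, theta, x):
    """Predict 1 if the summed weight of the active features reaches theta."""
    return 1 if sum(wi for wi, xi in zip(w, x) if xi) >= theta else 0


def make_data(n_examples, n_features, p_active=0.05, seed=0):
    """Hypothetical toy task: y = x[0] OR x[1]; every other feature is noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_examples):
        x = [1 if rng.random() < p_active else 0 for _ in range(n_features)]
        data.append((x, 1 if (x[0] or x[1]) else 0))
    return data


N = 1000                            # 2 relevant features, 998 irrelevant ones
train = make_data(1000, N)
w, theta, mistakes = winnow_train(train, N, epochs=2)
print("mistakes during training:", mistakes)
```

Because the updates are multiplicative, the number of mistakes on a monotone disjunction of k literals grows only logarithmically with the total number of features. This is the robustness to irrelevant features that the abstract refers to. The sketch assumes linearly separable data and outputs hard 0/1 predictions. It has neither the convergence behavior on non-separable data nor the confidence estimates that motivate the paper's generalization.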