SYNOPSIS

fst-train [ options ] file [ input-file ]

OPTIONS

-t file

Use multiple transducers in the same way as fst-infl2.

-b

This option is used for supervised training with disambiguated data.

-d

Disambiguate the analyses symbolically as described in the man pages of fst-infl2.

-q

Quiet mode.

DESCRIPTION

fst-train learns statistical weights for the transitions of a transducer from training data. Training is either unsupervised (the default) or supervised (option -b).

In supervised mode, the input contains fully disambiguated data giving both the surface form and the analysis form. The format restrictions are identical to those that apply to lexicon entries, i.e. all operators other than the colon operator (:) are interpreted literally.

In unsupervised mode, the input data consists of surface strings. The format is identical to the input format of fst-infl and fst-infl2.
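The two training modes differ on the command line only in the -b option. The following Python sketch assembles an argument list according to the SYNOPSIS line above, without running anything. The helper name build_fst_train_command and the file names morph.a, words.txt and gold.txt are hypothetical, and repeating -t once per additional transducer is an assumption based on the reference to fst-infl2.

```python
import shlex


def build_fst_train_command(transducer, input_file=None, supervised=False,
                            extra_transducers=(), quiet=False):
    """Return an argv list of the form: fst-train [ options ] file [ input-file ]."""
    argv = ["fst-train"]
    if supervised:
        argv.append("-b")          # supervised training with disambiguated data
    if quiet:
        argv.append("-q")          # quiet mode
    for extra in extra_transducers:
        # Assumption: -t is given once per additional transducer.
        argv += ["-t", extra]
    argv.append(transducer)        # the mandatory transducer file
    if input_file is not None:
        argv.append(input_file)    # the optional input file
    return argv


# Hypothetical file names, for illustration only.
unsupervised = build_fst_train_command("morph.a", "words.txt")
supervised = build_fst_train_command("morph.a", "gold.txt",
                                     supervised=True, quiet=True)

print(shlex.join(unsupervised))
print(shlex.join(supervised))
```

The resulting lists can be passed to subprocess.run on a system where fst-train is installed.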

The transducer weights are stored in files whose names are obtained by appending .prob to the names of the transducer files.
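As a small illustration of this naming rule, the following Python sketch derives the weight file name for each transducer file. The helper name weight_file_for and the file names are hypothetical. Note that .prob is appended to the full file name and does not replace an existing extension.

```python
def weight_file_for(transducer_path):
    """Append .prob to the transducer file name (the extension is kept)."""
    return transducer_path + ".prob"


# Hypothetical transducer file names, for illustration only.
transducers = ["morph.a", "guesser.a"]
weight_files = [weight_file_for(path) for path in transducers]

for path, weights in zip(transducers, weight_files):
    print(path, "->", weights)
```

Plain string concatenation is used here on purpose: pathlib's with_suffix would replace the .a extension rather than append to it.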

BUGS

No bugs are known so far.

SEE ALSO

fst-infl2, fst-compiler

AUTHOR

Helmut Schmid, Institute for Computational Linguistics, University of Stuttgart, Email: [email protected]. This software is available under the GNU Public License.