Gaussian mixture model (gmm) training
gmm [-h] [-v] -i string [-g int] [-n int] [-P] [-N double] [-o string] [-p double] [-r] [-S int] [-s int] [-T double] [-t int] [-V]
This program fits a Gaussian mixture model (GMM) to a dataset, using the EM algorithm to find the maximum likelihood estimate of the model parameters. The trained model is saved to an XML file, which contains information about each Gaussian.
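The EM fitting described above can be illustrated with a minimal one-dimensional sketch in Python. This is not the code used by this program: the function names (fit_gmm_1d, demo_data), the quantile-based initialization, and the use of the change in log-likelihood as the convergence test are all assumptions made for illustration. The max_iterations and tolerance parameters are modeled on the --max_iterations and --tolerance options, including the convention that 0 iterations means "run until convergence". The sketch has no safeguard against a component collapsing to zero variance.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(data, k, max_iterations=250, tolerance=1e-10):
    """Fit a k-component 1-D GMM with EM.

    Returns (weights, means, variances, log_likelihood), where the
    log-likelihood is the one computed in the last E-step.
    """
    n = len(data)
    ordered = sorted(data)
    # Assumed initialization: means at evenly spaced quantiles of the data.
    means = [ordered[int((c + 0.5) * n / k)] for c in range(k)]
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    variances = [overall_var] * k
    weights = [1.0 / k] * k

    previous = -math.inf
    log_likelihood = previous
    iteration = 0
    # max_iterations == 0 means "iterate until the tolerance is met".
    while max_iterations == 0 or iteration < max_iterations:
        iteration += 1

        # E-step: responsibility of each component for each point.
        responsibilities = []
        log_likelihood = 0.0
        for x in data:
            joint = [w * normal_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(joint)
            log_likelihood += math.log(total)
            responsibilities.append([j / total for j in joint])

        # M-step: re-estimate weight, mean, and variance of each component.
        for c in range(k):
            mass = sum(r[c] for r in responsibilities)
            weights[c] = mass / n
            means[c] = sum(r[c] * x for r, x in zip(responsibilities, data)) / mass
            variances[c] = sum(r[c] * (x - means[c]) ** 2
                               for r, x in zip(responsibilities, data)) / mass

        # Stop once the log-likelihood has stopped improving.
        if abs(log_likelihood - previous) < tolerance:
            break
        previous = log_likelihood

    return weights, means, variances, log_likelihood


def demo_data(seed=0):
    """Two well-separated clusters, centered near 0 and near 10."""
    rng = random.Random(seed)
    return ([rng.gauss(0.0, 1.0) for _ in range(200)]
            + [rng.gauss(10.0, 1.0) for _ in range(200)])
```

Calling fit_gmm_1d(demo_data(), 2) should recover two components with means close to 0 and 10 and roughly equal weights.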
If GMM training fails with an error indicating that a covariance matrix could not be inverted, make sure that the 'no_force_positive' flag was not specified. Alternatively, adding a small amount of Gaussian noise to the entire dataset (see the --noise option) may help prevent Gaussians with zero variance in a particular dimension, which is usually the cause of non-invertible covariance matrices.
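To see why added noise helps, consider a dataset in which one dimension is constant. The Python sketch below is only an illustration, not this program's implementation: the helper names (variance_per_dimension, add_gaussian_noise) are hypothetical. As with the --noise option, the noise parameter is treated as a variance, so the standard deviation passed to the random generator is its square root.

```python
import random


def variance_per_dimension(points):
    """Population variance of each coordinate across all points."""
    n = len(points)
    dims = len(points[0])
    result = []
    for d in range(dims):
        mean = sum(p[d] for p in points) / n
        result.append(sum((p[d] - mean) ** 2 for p in points) / n)
    return result


def add_gaussian_noise(points, noise_variance, seed=1):
    """Return a copy of points with zero-mean Gaussian noise added to every value."""
    rng = random.Random(seed)
    sigma = noise_variance ** 0.5  # standard deviation from the variance
    return [[x + rng.gauss(0.0, sigma) for x in p] for p in points]


# The second coordinate is identical for every point, so its variance is
# exactly zero and any covariance matrix estimated from it is singular.
points = [[float(i), 5.0] for i in range(20)]
before = variance_per_dimension(points)

# After adding a little noise, the second coordinate has positive variance.
after = variance_per_dimension(add_gaussian_noise(points, 1e-4))
```

Here before[1] is zero while after[1] is small but positive, which is what makes the estimated covariance invertible again.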
The 'no_force_positive' flag, if set, skips the checks performed after each iteration of the EM algorithm to ensure that the covariance matrices are positive definite. Specifying the flag can reduce runtime, but may also produce covariance matrices that are not positive definite, which will cause the program to crash.
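One common way to perform such a check is to attempt a Cholesky factorization, which succeeds only for symmetric positive definite matrices, and to add a small value to the diagonal when it fails. The Python sketch below shows that idea under stated assumptions: it is not necessarily how this program enforces positive definiteness, and the function names, the starting jitter of 1e-10, and the tenfold escalation are all hypothetical choices.

```python
import math


def is_positive_definite(matrix):
    """Attempt a Cholesky factorization of a symmetric matrix.

    Returns True if every pivot is strictly positive, False otherwise.
    """
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - partial
                if pivot <= 0.0:
                    return False
                lower[i][j] = math.sqrt(pivot)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return True


def force_positive_definite(matrix, jitter=1e-10, max_tries=50):
    """Return a copy of matrix with just enough added to the diagonal to pass the check."""
    n = len(matrix)
    added = 0.0
    for _ in range(max_tries):
        candidate = [[matrix[i][j] + (added if i == j else 0.0) for j in range(n)]
                     for i in range(n)]
        if is_positive_definite(candidate):
            return candidate
        # Start with a tiny amount, then grow it tenfold on each failure.
        added = jitter if added == 0.0 else added * 10.0
    raise ValueError("could not make the matrix positive definite")


# A covariance matrix with zero variance in its second dimension.
singular = [[1.0, 0.0], [0.0, 0.0]]
repaired = force_positive_definite(singular)
```

The extra factorization per covariance matrix per iteration is the kind of cost that skipping the check would avoid.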
--input_file (-i) [string] File containing the data on which the model will be fit.
--gaussians (-g) [int] Number of Gaussians in the GMM. Default value 1.
--help (-h) Default help info.
--info [string] Get help on a specific module or option. Default value ''.
--max_iterations (-n) [int] Maximum number of iterations of EM algorithm (passing 0 will run until convergence). Default value 250.
--no_force_positive (-P) Do not force the covariance matrices to be positive definite.
--noise (-N) [double] Variance of zero-mean Gaussian noise to add to data. Default value 0.
--output_file (-o) [string] The file to write the trained GMM parameters into (as XML). Default value 'gmm.xml'.
--percentage (-p) [double] If using --refined_start, specify the fraction of the dataset used for each sampling (should be between 0.0 and 1.0). Default value 0.02.
--refined_start (-r) During the initialization, use refined initial positions for k-means clustering (Bradley and Fayyad, 1998).
--samplings (-S) [int] If using --refined_start, specify the number of samplings used for initial points. Default value 100.
--seed (-s) [int] Random seed. If 0, 'std::time(NULL)' is used. Default value 0.
--tolerance (-T) [double] Tolerance for convergence of EM. Default value 1e-10.
--trials (-t) [int] Number of trials to perform in training GMM. Default value 10.
--verbose (-v) Display informational messages and the full list of parameters and timers at the end of execution.
--version (-V) Display the version of mlpack.
For further information, including relevant papers, citations, and theory, consult the documentation found at http://www.mlpack.org or included with your distribution of mlpack.