Phonetic transcription for Slovak
g2p-sk [--color] [--dl <debug level>] [--help] [--stats] [--ofile <file_name>] [<input file>]
Phonetic transcription is essential for some linguistic and speech recognition applications. Depending on the language, either a rule-based or a statistical approach is used. g2p-sk implements the rule-based approach, but in the future it may be replaced by a statistical one.
Each input word, a sequence of graphemes, is transcribed into a sequence of phones in SAMPA coding. If no input file is specified, the input is read from standard input. If an input file is given, the output is also written to a file whose name is the input filename with the extension "_trans.txt".
The input and output code page is ISO 8859-2. To use a different code page, run the input and output through a code page converter using pipes. For example, for UTF-8 input and output, use (for interactive use):
filterm UTF8-iso2 iso2-UTF8 g2p-sk
or (for batch processing):
iconv -f UTF-8 -t ISO_8859-2 | g2p-sk | iconv -f ISO_8859-2 -t UTF-8
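For repeated batch use, the iconv pipeline above can be wrapped in a small shell function. This is a minimal sketch and not part of g2p-sk: the function name g2p_utf8 and the G2P_CMD variable are hypothetical, introduced only so that the transcriber command can be substituted.

```shell
# Hypothetical helper: read UTF-8 on stdin, write UTF-8 on stdout,
# running the transcriber in ISO 8859-2 in between.
# G2P_CMD is an assumed override variable; it defaults to g2p-sk.
g2p_utf8() {
    iconv -f UTF-8 -t ISO_8859-2 |
        ${G2P_CMD:-g2p-sk} "$@" |
        iconv -f ISO_8859-2 -t UTF-8
}
```

A possible invocation is g2p_utf8 --dl 0 < words_utf8.txt. Note that in a plain sh pipeline the exit status is that of the last iconv, not of g2p-sk.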
The quality of the phonetic transcription depends on the morphematic segmentation. To improve the segmentation, the small version of the simple morphematic dictionary in /usr/share/g2p_sk/Exceptions/morfemy.ddat can be replaced with a better one. Syllabic segmentation is as important as morphematic segmentation; it is provided by the sylseg-sk package.
The design of g2p-sk is language dependent. To use it for another language, all the rules need to be rewritten.
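Replacing the dictionary can be done so that the shipped file is preserved. The following is a sketch under assumptions: the function name replace_dictionary and the ".orig" backup suffix are hypothetical, the replacement file must already be in the format g2p-sk expects (not described here), and writing under /usr/share usually requires root privileges.

```shell
# Hypothetical helper: replace a dictionary file, keeping a backup.
# $1 = new dictionary, $2 = installed dictionary (defaults to the path above).
replace_dictionary() {
    new=$1
    target=${2:-/usr/share/g2p_sk/Exceptions/morfemy.ddat}
    [ -f "$new" ] || { echo "missing: $new" >&2; return 1; }
    # Back up only on the first run, so the shipped file is never overwritten.
    [ -e "$target.orig" ] || cp -p "$target" "$target.orig" || return 1
    cp "$new" "$target"
}
```

Copying the ".orig" file back over the target restores the shipped dictionary.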
--color
Enable color output.
--dl 0..5
Set the debug level, which controls the amount of displayed information. Debug level 0 displays nothing. The maximum level, 5, displays a full debugging report. The default debug level is 1.
--help
Display a short help text.
--ofile <file_name>
Also write the output to the given file.
--stats
Count and display statistics for each phone.
Use standard input and debug level 3:
g2p-sk --dl 3
Process all the words from the file aaa.txt:
g2p-sk aaa.txt
g2p-sk returns zero if it succeeds in processing all the input words.
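In a script, the return value can be tested directly. The sketch below assumes that g2p-sk returns a non-zero status when it fails, which is implied but not stated above; the function name transcribe_file and the G2P_CMD variable are hypothetical.

```shell
# Hypothetical wrapper: transcribe one file and report failure on stderr.
# G2P_CMD is an assumed override variable; it defaults to g2p-sk.
transcribe_file() {
    if ${G2P_CMD:-g2p-sk} "$1"; then
        echo "transcribed: $1"
    else
        status=$?   # exit status of the failed command
        echo "g2p-sk failed on $1 (exit status $status)" >&2
        return "$status"
    fi
}
```

For example, transcribe_file aaa.txt || exit 1 stops a batch script at the first file that could not be processed completely.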
Jozef Ivanecky (dodo (at) kanoistika.sk)