M-text objects and API for them.
In the m17n library, text is represented as an object called M-text rather than as a C-string (char * or unsigned char *). An M-text is a sequence of zero or more characters, and can be created from various character sources such as C-strings, files, and character codes.
M-texts are more useful than C-strings in the following respects.
M-texts can handle a mixture of characters of various scripts, including all Unicode characters and more. This is an indispensable facility when handling multilingual text.
Each character in an M-text can have properties called text properties. Text properties store various kinds of information attached to parts of an M-text to provide application programs with a unified view of that information. Because rich information can be stored in M-texts in the form of text properties, functions in application programs can be simple.
In addition, the library provides many functions to manipulate an M-text just the same way as a C-string.
Type of M-texts. The type MText is for an M-text object. Its internal structure is concealed from application programs.
Enumeration for specifying the format of an M-text. The enum MTextFormat is used as an argument of the mtext_from_data() function to specify the format of data from which an M-text is created.
Enumerator:
MTEXT_FORMAT_US_ASCII
US-ASCII encoding
MTEXT_FORMAT_UTF_8
UTF-8 encoding
MTEXT_FORMAT_UTF_16LE
UTF-16LE encoding
MTEXT_FORMAT_UTF_16BE
UTF-16BE encoding
MTEXT_FORMAT_UTF_32LE
UTF-32LE encoding
MTEXT_FORMAT_UTF_32BE
UTF-32BE encoding
MTEXT_FORMAT_MAX
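The format must be stated because the same bytes decode to different characters under different formats. The following is a minimal, self-contained sketch in plain C, not m17n code: the enum SketchFormat and the function sketch_read_utf16_unit() are hypothetical stand-ins that only mirror the UTF-16LE and UTF-16BE distinction, and their names and values are assumptions rather than anything taken from the m17n headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the UTF-16 members of MTextFormat.
   These names and values are NOT taken from the m17n headers. */
enum SketchFormat
{
  SKETCH_UTF_16LE,
  SKETCH_UTF_16BE
};

/* Read the 16-bit code unit at index `i` of `data`, interpreting the
   byte pair according to `format`.  The caller must ensure that `data`
   holds at least 2 * (i + 1) bytes. */
static uint16_t
sketch_read_utf16_unit (const unsigned char *data, size_t i,
                        enum SketchFormat format)
{
  const unsigned char *p;

  assert (data != NULL);
  p = data + 2 * i;
  if (format == SKETCH_UTF_16LE)
    return (uint16_t) (p[0] | (p[1] << 8));   /* low byte comes first */
  return (uint16_t) ((p[0] << 8) | p[1]);     /* high byte comes first */
}
```

For the byte pair 0x41 0x00, the little-endian reading gives the code unit 0x0041 while the big-endian reading gives 0x4100, so a caller that names the wrong format gets a different character.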
Enumeration for specifying a set of line breaking options. The enum MTextLineBreakOption controls the line breaking algorithm of the function mtext_line_break(). Its members are combined with logical-or and passed in the argument option.
Enumerator:
MTEXT_LBO_SP_CM
Specify the legacy support for the space character as a base for combining marks. See section 8.3 of UAX#14.
MTEXT_LBO_KOREAN_SP
Specify to use space characters for line breaking Korean text.
MTEXT_LBO_AI_AS_ID
Specify to treat characters of the ambiguous line-breaking class as belonging to the ideographic line-breaking class.
MTEXT_LBO_MAX
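Combining options with logical-or can be illustrated with a small, self-contained sketch. It does not use the m17n headers: the enum SketchLineBreakOption and the helper sketch_option_has() are hypothetical, and the distinct-bit values are an assumption made so that the members can be combined and tested independently.

```c
#include <assert.h>

/* Hypothetical stand-ins for the MTextLineBreakOption members.  The
   one-bit-per-member values are an assumption of this sketch, not
   values read from the m17n headers. */
enum SketchLineBreakOption
{
  SKETCH_LBO_SP_CM     = 1 << 0,
  SKETCH_LBO_KOREAN_SP = 1 << 1,
  SKETCH_LBO_AI_AS_ID  = 1 << 2
};

/* Return nonzero if `flag` is set in the combined `option` value. */
static int
sketch_option_has (int option, enum SketchLineBreakOption flag)
{
  assert (option >= 0);
  return (option & flag) != 0;
}
```

A caller would build the value as SKETCH_LBO_KOREAN_SP | SKETCH_LBO_AI_AS_ID and pass it as a single integer. With the real library, the analogous combination of MTextLineBreakOption members would go in the option argument of mtext_line_break().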
Variable of value MTEXT_FORMAT_UTF_16LE or MTEXT_FORMAT_UTF_16BE. The global variable MTEXT_FORMAT_UTF_16 is initialized to MTEXT_FORMAT_UTF_16LE on a 'Little Endian' system (storing words with the least significant byte first), and to MTEXT_FORMAT_UTF_16BE on a 'Big Endian' system (storing words with the most significant byte first).
SEE ALSO
mtext_from_data()
Variable of value MTEXT_FORMAT_UTF_32LE or MTEXT_FORMAT_UTF_32BE. The global variable MTEXT_FORMAT_UTF_32 is initialized to MTEXT_FORMAT_UTF_32LE on a 'Little Endian' system (storing words with the least significant byte first), and to MTEXT_FORMAT_UTF_32BE on a 'Big Endian' system (storing words with the most significant byte first).
SEE ALSO
mtext_from_data()
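How the library chooses the values of MTEXT_FORMAT_UTF_16 and MTEXT_FORMAT_UTF_32 at initialization is not described here. The sketch below shows one common way to detect the byte order of the running system in plain C. The enum SketchByteOrder and the function sketch_host_byte_order() are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result type for this sketch; not part of m17n. */
enum SketchByteOrder
{
  SKETCH_LITTLE_ENDIAN,
  SKETCH_BIG_ENDIAN
};

/* Store a known 16-bit value and inspect the byte at the lowest
   address.  A little-endian system stores the least significant byte
   (0x02) first; a big-endian system stores the most significant byte
   (0x01) first. */
static enum SketchByteOrder
sketch_host_byte_order (void)
{
  const uint16_t probe = 0x0102;
  unsigned char first;

  memcpy (&first, &probe, 1);
  assert (first == 0x01 || first == 0x02);
  return first == 0x02 ? SKETCH_LITTLE_ENDIAN : SKETCH_BIG_ENDIAN;
}
```

A program could use such a check to pick between a little-endian and a big-endian format constant once at startup, which is the role the two global variables play for mtext_from_data().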
Generated automatically by Doxygen for The m17n Library from the source code.
Copyright (C) 2001 Information-technology Promotion Agency (IPA)
Copyright (C) 2001-2011 National Institute of Advanced Industrial Science and Technology (AIST)
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License <http://www.gnu.org/licenses/fdl.html>.