Detailed Description

M-text objects and API for them.

In the m17n library, text is represented as an object called M-text rather than as a C-string (char * or unsigned char *). An M-text is a sequence of zero or more characters, and can be created from various character sources, e.g. C-strings, files, character codes, etc.
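To illustrate why a sequence of characters differs from a C-string, the following minimal sketch counts the characters in a UTF-8 encoded C-string. It does not use the m17n library and is not how the library is implemented; the function name sketch_utf8_char_count is hypothetical, and the input is assumed to be valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the m17n API.
   Counts characters in a NUL-terminated UTF-8 string by counting every
   byte that is not a continuation byte (bit pattern 10xxxxxx).
   Assumes the input is valid UTF-8. */
static size_t sketch_utf8_char_count(const char *s)
{
    size_t count = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

For the bytes E3 81 82 followed by 'a', strlen() reports 4 bytes while this helper reports 2 characters. A character-oriented text object spares the application this kind of bookkeeping.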

M-texts are more useful than C-strings in the following respects.

  • M-texts can handle a mixture of characters of various scripts, including all Unicode characters and more. This is an indispensable facility when handling multilingual text.

  • Each character in an M-text can have properties called text properties. Text properties store various kinds of information attached to parts of an M-text, giving application programs a unified view of that information. Because rich information can be stored in M-texts in the form of text properties, functions in application programs can be kept simple.

In addition, the library provides many functions that manipulate an M-text in much the same way as a C-string.

Typedef Documentation

typedef struct MText MText

Type of M-texts. The type MText is for an M-text object. Its internal structure is concealed from application programs.

Enumeration Type Documentation

enum MTextFormat

Enumeration for specifying the format of an M-text. The enum MTextFormat is used as an argument of the mtext_from_data() function to specify the format of data from which an M-text is created.

Enumerator:

MTEXT_FORMAT_US_ASCII

US-ASCII encoding

MTEXT_FORMAT_UTF_8

UTF-8 encoding

MTEXT_FORMAT_UTF_16LE

UTF-16LE encoding

MTEXT_FORMAT_UTF_16BE

UTF-16BE encoding

MTEXT_FORMAT_UTF_32LE

UTF-32LE encoding

MTEXT_FORMAT_UTF_32BE

UTF-32BE encoding

MTEXT_FORMAT_MAX

enum MTextLineBreakOption

Enumeration for specifying a set of line breaking options. The enum MTextLineBreakOption controls the line breaking algorithm of the function mtext_line_break(): the argument option is given as the logical-or of the members below.

Enumerator:

MTEXT_LBO_SP_CM

Specify the legacy support for a space character as the base for combining marks. See section 8.3 of UAX #14.

MTEXT_LBO_KOREAN_SP

Specify that space characters are used for line breaking in Korean text.

MTEXT_LBO_AI_AS_ID

Specify that characters of the ambiguous line-breaking class (AI) are treated as characters of the ideographic line-breaking class (ID).

MTEXT_LBO_MAX
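Combining options by logical-or can be sketched as follows. The enum below is a hypothetical stand-in for MTextLineBreakOption so that the example compiles without the library; its numeric values are an assumption chosen so that each member occupies its own bit, and the helper sketch_has_option is not part of the m17n API.

```c
#include <assert.h>

/* Hypothetical stand-in for MTextLineBreakOption.
   The values are assumed, not taken from the library headers. */
enum SketchLineBreakOption {
    SKETCH_LBO_SP_CM     = 1,
    SKETCH_LBO_KOREAN_SP = 2,
    SKETCH_LBO_AI_AS_ID  = 4
};

/* Return nonzero if the given flag is set in a combined option value. */
static int sketch_has_option(int option, enum SketchLineBreakOption flag)
{
    assert(flag != 0);  /* a zero flag would always test as unset */
    return (option & flag) != 0;
}
```

With option set to SKETCH_LBO_SP_CM | SKETCH_LBO_AI_AS_ID, the helper reports the first and third members as set and the Korean option as unset. An application would pass the real MTEXT_LBO_* members, combined the same way, as the option argument.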

Variable Documentation

enum MTextFormat MTEXT_FORMAT_UTF_16

Variable of value MTEXT_FORMAT_UTF_16LE or MTEXT_FORMAT_UTF_16BE. The global variable MTEXT_FORMAT_UTF_16 is initialized to MTEXT_FORMAT_UTF_16LE on a 'Little Endian' system (storing words with the least significant byte first), and to MTEXT_FORMAT_UTF_16BE on a 'Big Endian' system (storing words with the most significant byte first).

SEE ALSO

mtext_from_data()

const int MTEXT_FORMAT_UTF_32

Variable of value MTEXT_FORMAT_UTF_32LE or MTEXT_FORMAT_UTF_32BE. The global variable MTEXT_FORMAT_UTF_32 is initialized to MTEXT_FORMAT_UTF_32LE on a 'Little Endian' system (storing words with the least significant byte first), and to MTEXT_FORMAT_UTF_32BE on a 'Big Endian' system (storing words with the most significant byte first).

SEE ALSO

mtext_from_data()
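The byte-order rule described for these variables can be sketched as follows. This is only an illustration of how a native-endian format might be selected and why it matters when decoding; it is not the library's implementation. The enum SketchFormat and all sketch_ functions are hypothetical stand-ins, and the enum values are illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the two UTF-16 members of MTextFormat. */
enum SketchFormat { SKETCH_UTF_16LE, SKETCH_UTF_16BE };

/* Return 1 if this system stores the least significant byte first. */
static int sketch_is_little_endian(void)
{
    const uint16_t probe = 0x0102;
    unsigned char first;

    memcpy(&first, &probe, 1);
    return first == 0x02;
}

/* Pick the format that matches the byte order of this system. */
static enum SketchFormat sketch_native_utf16(void)
{
    return sketch_is_little_endian() ? SKETCH_UTF_16LE : SKETCH_UTF_16BE;
}

/* Decode one 16-bit code unit from two bytes in the given format. */
static uint16_t sketch_decode_unit(const unsigned char *b,
                                   enum SketchFormat format)
{
    assert(b != NULL);
    if (format == SKETCH_UTF_16LE)
        return (uint16_t)(b[0] | (b[1] << 8));
    return (uint16_t)((b[0] << 8) | b[1]);
}
```

The bytes 42 30 decode to 0x3042 when read as UTF-16LE and to 0x4230 when read as UTF-16BE. Data held in memory as native 16-bit or 32-bit words should therefore be described with the native-endian variables rather than with a hard-coded LE or BE member.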

MSymbol Mlanguage

The symbol whose name is 'language'.

Author

Generated automatically by Doxygen for The m17n Library from the source code.

COPYRIGHT

Copyright (C) 2001 Information-technology Promotion Agency (IPA)

Copyright (C) 2001-2011 National Institute of Advanced Industrial Science and Technology (AIST)

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License <http://www.gnu.org/licenses/fdl.html>.