Modification of UTF-7 encoding for IMAP
    use Encode qw/encode decode/;
    use Encode::IMAPUTF7;

    print encode('IMAP-UTF-7', "R\x{e9}pertoire");   # "Répertoire" as a character string
    print decode('IMAP-UTF-7', 'R&AOk-pertoire');
IMAP mailbox names are encoded in a modified UTF-7 when they contain international characters outside the printable ASCII range. The modified UTF-7 encoding is defined in RFC 2060 (section 5.1.3).
There is another CPAN module with the same purpose, Unicode::IMAPUtf7. However, it works correctly only with strings whose encoded form does not contain a plus sign. For example, the Cyrillic string \x{043f}\x{0440}\x{0435}\x{0434}\x{043b}\x{043e}\x{0433} is represented in UTF-7 as +BD8EQAQ1BDQEOwQ+BDM- (note the second plus sign, 4 characters before the end). Unicode::IMAPUtf7 encodes this string as +BD8EQAQ1BDQEOwQ&BDM-, which is not valid modified UTF-7: the ampersand and the plus are swapped, and the correct form is &BD8EQAQ1BDQEOwQ+BDM-. The current module solves this problem; it is a slightly modified Encode::Unicode::UTF7 and has nothing in common with Unicode::IMAPUtf7.
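The plus sign in the example comes from the base64 payload itself, not from the shift character. The following is a minimal sketch using only the core Encode and MIME::Base64 modules (it does not call Encode::IMAPUTF7 or Unicode::IMAPUtf7, and the variable names are illustrative). It shows that only the leading shift character differs between the two forms, so replacing plus signs blindly cannot work.

```perl
use strict;
use warnings;
use Encode qw(encode);
use MIME::Base64 qw(encode_base64);

# The Cyrillic example string from the text, as Perl characters.
my $word = "\x{043f}\x{0440}\x{0435}\x{0434}\x{043b}\x{043e}\x{0433}";

# Base64 of the UTF-16BE octets, on one line and without "=" padding.
my $payload = encode_base64(encode('UTF-16BE', $word), '');
$payload =~ s/=+\z//;

# Plain UTF-7 shifts in with "+" and out with "-".
my $utf7 = "+$payload-";

# Modified UTF-7 shifts in with "&" and uses "," instead of "/".
# A "+" inside the payload is ordinary base64 data and stays as it is.
(my $modified = $payload) =~ tr{/}{,};
my $imap = "&$modified-";

print "$utf7\n";    # +BD8EQAQ1BDQEOwQ+BDM-
print "$imap\n";    # &BD8EQAQ1BDQEOwQ+BDM-
```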
By convention, international mailbox names are specified using a modified version of the UTF-7 encoding described in [UTF-7]. The purpose of these modifications is to correct the following problems with UTF-7:
1) UTF-7 uses the "+" character for shifting; this conflicts with the common use of "+" in mailbox names, in particular USENET newsgroup names.
2) UTF-7's encoding is BASE64 which uses the "/" character; this conflicts with the use of "/" as a popular hierarchy delimiter.
3) UTF-7 prohibits the unencoded usage of "\"; this conflicts with the use of "\" as a popular hierarchy delimiter.
4) UTF-7 prohibits the unencoded usage of "~"; this conflicts with the use of "~" in some servers as a home directory indicator.
5) UTF-7 permits multiple alternate forms to represent the same string; in particular, printable US-ASCII characters can be represented in encoded form.
In modified UTF-7, printable US-ASCII characters except for "&" represent themselves; that is, characters with octet values 0x20-0x25 and 0x27-0x7e. The character "&" (0x26) is represented by the two-octet sequence "&-".
All other characters (octet values 0x00-0x1f, 0x7f-0xff, and all Unicode 16-bit octets) are represented in modified BASE64, with a further modification from [UTF-7] that "," is used instead of "/". Modified BASE64 MUST NOT be used to represent any printing US-ASCII character which can represent itself.
"&" is used to shift to modified BASE64 and "-" to shift back to US-ASCII. All names start in US-ASCII, and MUST end in US-ASCII (that is, a name that ends with a Unicode 16-bit octet MUST end with a "-").
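The encoding rules above can be sketched in a few lines of Perl. This is an illustration built on the core Encode and MIME::Base64 modules, not the implementation used by Encode::IMAPUTF7; the function name imap_utf7_encode_sketch is hypothetical, and the input is assumed to be a Perl character string rather than UTF-8 octets.

```perl
use strict;
use warnings;
use Encode qw(encode);
use MIME::Base64 qw(encode_base64);

# Hypothetical helper: encode a character string as modified UTF-7.
sub imap_utf7_encode_sketch {
    my ($name) = @_;
    my $out = '';

    # Walk the string as runs of (1) printable ASCII other than "&",
    # (2) a single "&", or (3) anything outside 0x20-0x7e.
    while ($name =~ /\G(?:([\x20-\x25\x27-\x7e]+)|(&)|([^\x20-\x7e]+))/gc) {
        if (defined $1) {
            $out .= $1;        # printable ASCII represents itself
        }
        elsif (defined $2) {
            $out .= '&-';      # "&" is written as "&-"
        }
        else {
            my $run = $3;      # copy before handing it to encode()
            my $b64 = encode_base64(encode('UTF-16BE', $run), '');
            $b64 =~ s/=+\z//;  # modified BASE64 carries no padding
            $b64 =~ tr{/}{,};  # "," replaces "/"
            $out .= "&$b64-";  # shift in with "&", shift out with "-"
        }
    }
    return $out;
}

print imap_utf7_encode_sketch("R\x{e9}pertoire"), "\n";    # R&AOk-pertoire
print imap_utf7_encode_sketch('Drafts & Notes'), "\n";     # Drafts &- Notes
```

Each run of non-ASCII characters is encoded as a single shifted block, which keeps the output in the one canonical form the rules require.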
For example, here is a mailbox name which mixes English, Japanese, and Chinese text: ~peter/mail/&ZeVnLIqe-/&U,BTFw-
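To see what the example name contains, the shifted sections can be decoded by hand. The sketch below uses only the core Encode and MIME::Base64 modules and is not the decoder shipped with Encode::IMAPUTF7; the function name imap_utf7_decode_sketch is hypothetical, and error handling is limited to rejecting input that does not follow the shift rules.

```perl
use strict;
use warnings;
use Encode qw(decode);
use MIME::Base64 qw(decode_base64);

# Hypothetical helper: decode a modified UTF-7 mailbox name.
sub imap_utf7_decode_sketch {
    my ($name) = @_;
    my $out = '';

    # Alternate between literal runs and "&...-" shifted sections.
    while ($name =~ /\G(?:([^&]+)|&([A-Za-z0-9+,]*)-)/gc) {
        if (defined $1) { $out .= $1; next }

        my $b64 = $2;
        if ($b64 eq '') { $out .= '&'; next }          # "&-" is a literal "&"

        $b64 =~ tr{,}{/};                              # back to standard base64
        $b64 .= '=' x ((4 - length($b64) % 4) % 4);    # restore the padding
        $out .= decode('UTF-16BE', decode_base64($b64));
    }

    # Anything left over means an unterminated or malformed shift.
    die "malformed modified UTF-7 name\n" if (pos($name) // 0) < length $name;
    return $out;
}

binmode(STDOUT, ':encoding(UTF-8)');
my $decoded = imap_utf7_decode_sketch('~peter/mail/&ZeVnLIqe-/&U,BTFw-');
print "$decoded\n";    # ~peter/mail/日本語/台北
```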
Please report any requests, suggestions or bugs via the RT bug-tracking system at http://rt.cpan.org/ or email to [email protected].
http://rt.cpan.org/NoAuth/Bugs.html?Dist=Encode-IMAPUTF7 is the RT queue for Encode::IMAPUTF7. Please check to see if your bug has already been reported.
Copyright 2005 Sava Chankov
Sava Chankov, [email protected]
This software may be freely copied and distributed under the same terms and conditions as Perl.
Peter Makholm <[email protected]>, current maintainer
Sava Chankov <[email protected]>, original author
perl(1), Encode.