Encode::IMAPUTF7: modification of UTF-7 encoding for IMAP

SYNOPSIS

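  use utf8;                      # the literals below contain non-ASCII text
  use Encode qw/encode decode/;
  use Encode::IMAPUTF7;

  # Character string to modified UTF-7 octets: prints R&AOk-pertoire
  print encode('IMAP-UTF-7', 'Répertoire');
  # Modified UTF-7 octets back to a character string (restores the é)
  print decode('IMAP-UTF-7', 'R&AOk-pertoire');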

ABSTRACT

IMAP mailbox names are encoded in a modified UTF-7 when they contain international characters outside of the printable ASCII range. The modified UTF-7 encoding is defined in RFC 2060 (section 5.1.3).

There is another CPAN module with the same purpose, Unicode::IMAPUtf7. However, it works correctly only with strings whose encoded form does not contain a plus sign. For example, the Cyrillic string предлог (\x{043f}\x{0440}\x{0435}\x{0434}\x{043b}\x{043e}\x{0433}) is represented in UTF-7 as +BD8EQAQ1BDQEOwQ+BDM-. Note the second plus sign, four characters before the end. Unicode::IMAPUtf7 encodes the above string as +BD8EQAQ1BDQEOwQ&BDM-, which is not valid modified UTF-7 (the ampersand and the plus are swapped); the valid form is &BD8EQAQ1BDQEOwQ+BDM-. The problem is solved by the current module, which is a slightly modified Encode::Unicode::UTF7 and has nothing in common with Unicode::IMAPUtf7.
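The plus sign in that example is ordinary BASE64 payload, not a shift character. The following is a minimal sketch of where it comes from, using only the core Encode and MIME::Base64 modules. It is not the code of either module, and the variable names are made up for the illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);
use MIME::Base64 qw(encode_base64);

# The Cyrillic example string, written with escapes so the source stays ASCII.
my $word = "\x{043f}\x{0440}\x{0435}\x{0434}\x{043b}\x{043e}\x{0433}";

# BASE64 of the UTF-16BE octets, without line breaks or "=" padding.
my $payload = encode_base64(encode('UTF-16BE', $word), '');
$payload =~ s/=+\z//;

# Plain UTF-7 shifts into BASE64 with "+".
my $utf7 = "+$payload-";

# Modified UTF-7 shifts with "&" and writes "," for "/" inside the payload.
# A "+" inside the payload is data and stays untouched.
(my $modified = $payload) =~ tr{/}{,};
my $imap = "&$modified-";

print "$utf7\n";   # +BD8EQAQ1BDQEOwQ+BDM-
print "$imap\n";   # &BD8EQAQ1BDQEOwQ+BDM-
```

Only the leading shift character differs between the two forms here, which is why swapping every "+" and "&" in the encoded text cannot be correct.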

RFC2060 - section 5.1.3 - Mailbox International Naming Convention

By convention, international mailbox names are specified using a modified version of the UTF-7 encoding described in [UTF-7]. The purpose of these modifications is to correct the following problems with UTF-7:

1) UTF-7 uses the "+" character for shifting; this conflicts with
   the common use of "+" in mailbox names, in particular USENET
   newsgroup names.

2) UTF-7's encoding is BASE64 which uses the "/" character; this
   conflicts with the use of "/" as a popular hierarchy delimiter.

3) UTF-7 prohibits the unencoded usage of "\"; this conflicts with
   the use of "\" as a popular hierarchy delimiter.

4) UTF-7 prohibits the unencoded usage of "~"; this conflicts with
   the use of "~" in some servers as a home directory indicator.

5) UTF-7 permits multiple alternate forms to represent the same
   string; in particular, printable US-ASCII characters can be
   represented in encoded form.

In modified UTF-7, printable US-ASCII characters except for "&" represent themselves; that is, characters with octet values 0x20-0x25 and 0x27-0x7e. The character "&" (0x26) is represented by the two-octet sequence "&-".

All other characters (octet values 0x00-0x1f, 0x7f-0xff, and all Unicode 16-bit octets) are represented in modified BASE64, with a further modification from [UTF-7] that "," is used instead of "/". Modified BASE64 MUST NOT be used to represent any printing US-ASCII character which can represent itself.

"&" is used to shift to modified BASE64 and "-" to shift back to US-ASCII. All names start in US-ASCII, and MUST end in US-ASCII (that is, a name that ends with a Unicode 16-bit octet MUST end with a "-").
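The encoding rules above can be sketched in a few lines of Perl. This is an illustration only, built on the core Encode and MIME::Base64 modules; imap_utf7_encode_sketch is a hypothetical helper name, not a function exported by Encode::IMAPUTF7, and the module's own implementation may differ.

```perl
use strict;
use warnings;
use Encode qw(encode);
use MIME::Base64 qw(encode_base64);

# Hypothetical helper: turn a Perl character string into a modified
# UTF-7 mailbox name, following the rules quoted from RFC 2060.
sub imap_utf7_encode_sketch {
    my ($name) = @_;
    my $out = '';

    # Walk the string as alternating runs of printable US-ASCII
    # (0x20-0x7e) and of everything else.
    while ($name =~ /\G([\x20-\x7e]+|[^\x20-\x7e]+)/g) {
        my $run = $1;
        if ($run =~ /\A[\x20-\x7e]/) {
            # Printable US-ASCII represents itself; "&" becomes "&-".
            $run =~ s/&/&-/g;
            $out .= $run;
        }
        else {
            # Other characters: UTF-16BE octets in BASE64, without
            # padding, with "," instead of "/", between "&" and "-".
            my $b64 = encode_base64(encode('UTF-16BE', $run), '');
            $b64 =~ s/=+\z//;
            $b64 =~ tr{/}{,};
            $out .= "&$b64-";
        }
    }
    return $out;
}

print imap_utf7_encode_sketch("R\x{e9}pertoire"), "\n";   # R&AOk-pertoire
print imap_utf7_encode_sketch("Tom & Jerry"), "\n";       # Tom &- Jerry
```

Encoding each non-ASCII run as a whole, rather than character by character, keeps the output in the single canonical form that the convention asks for.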

For example, here is a mailbox name which mixes English, Japanese, and Chinese text: ~peter/mail/&ZeVnLIqe-/&U,BTFw-
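To see what that name contains, the convention can be reversed. The sketch below is a hedged illustration using only the core Encode and MIME::Base64 modules; imap_utf7_decode_sketch and decode_run_sketch are hypothetical helper names, not part of Encode::IMAPUTF7, and the sketch assumes well-formed input.

```perl
use strict;
use warnings;
use Encode qw(decode);
use MIME::Base64 qw(decode_base64);

# Hypothetical helper: decode one modified BASE64 run (the text between
# "&" and "-") into Perl characters.
sub decode_run_sketch {
    my ($b64) = @_;
    $b64 =~ tr{,}{/};                    # undo the "," for "/" substitution
    $b64 .= '=' while length($b64) % 4;  # restore the stripped padding
    return decode('UTF-16BE', decode_base64($b64));
}

# Hypothetical helper: decode a whole modified UTF-7 mailbox name.
sub imap_utf7_decode_sketch {
    my ($name) = @_;
    # "&-" is a literal ampersand; "&...-" is a shifted run.
    $name =~ s{&([^-]*)-}{ $1 eq '' ? '&' : decode_run_sketch($1) }ge;
    return $name;
}

binmode STDOUT, ':encoding(UTF-8)';
print imap_utf7_decode_sketch('~peter/mail/&ZeVnLIqe-/&U,BTFw-'), "\n";
# ~peter/mail/日本語/台北
```

The "/" characters between the path components stay readable in the encoded name because the modified BASE64 alphabet never produces them.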

REQUESTS & BUGS

Please report any requests, suggestions or bugs via the RT bug-tracking system at http://rt.cpan.org/ or email to [email protected].

http://rt.cpan.org/NoAuth/Bugs.html?Dist=Encode-IMAPUTF7 is the RT queue for Encode::IMAPUTF7. Please check to see if your bug has already been reported.

COPYRIGHT

Sava Chankov, [email protected]

This software may be freely copied and distributed under the same terms and conditions as Perl.

AUTHORS

Peter Makholm <[email protected]>, current maintainer

Sava Chankov <[email protected]>, original author

SEE ALSO

perl(1), Encode.

Encode::IMAPUTF7 (3pm)