<html><head><meta name="color-scheme" content="light dark"></head><body><pre style="word-wrap: break-word; white-space: pre-wrap;">package MIME::Words;

=head1 NAME

MIME::Words - deal with RFC 2047 encoded words


=head1 SYNOPSIS

Before reading further, you should see L&lt;MIME::Tools&gt; to make sure that
you understand where this module fits into the grand scheme of things.
Go on, do it now.  I'll wait.

Ready?  Ok...


    use MIME::Words qw(:all);

    ### Decode the string into another string, forgetting the charsets:
    $decoded = decode_mimewords(
          'To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= &lt;keld@dkuug.dk&gt;',
          );

    ### Split string into array of decoded [DATA,CHARSET] pairs:
    @decoded = decode_mimewords(
          'To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= &lt;keld@dkuug.dk&gt;',
          );

    ### Encode a single unsafe word:
    $encoded = encode_mimeword("\xABFran\xE7ois\xBB");

    ### Encode a string, trying to find the unsafe words inside it:
    $encoded = encode_mimewords("Me and \xABFran\xE7ois\xBB in town");



=head1 DESCRIPTION

Fellow Americans, you probably won't know what the hell this module
is for.  Europeans, Russians, et al, you probably do.  C&lt;:-)&gt;.

For example, here's a valid MIME header you might get:

      From: =?US-ASCII?Q?Keith_Moore?= &lt;moore@cs.utk.edu&gt;
      To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= &lt;keld@dkuug.dk&gt;
      CC: =?ISO-8859-1?Q?Andr=E9_?= Pirard &lt;PIRARD@vm1.ulg.ac.be&gt;
      Subject: =?ISO-8859-1?B?SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=?=
       =?ISO-8859-2?B?dSB1bmRlcnN0YW5kIHRoZSBleGFtcGxlLg==?=
       =?US-ASCII?Q?.._cool!?=

The fields basically decode to (sorry, I can only approximate the
Latin characters with 7 bit sequences /o and 'e):

      From: Keith Moore &lt;moore@cs.utk.edu&gt;
      To: Keld J/orn Simonsen &lt;keld@dkuug.dk&gt;
      CC: Andr'e  Pirard &lt;PIRARD@vm1.ulg.ac.be&gt;
      Subject: If you can read this you understand the example... cool!


=head1 PUBLIC INTERFACE

=over 4

=cut

require 5.001;

### Pragmas:
use strict;
use re 'taint';
use vars qw($VERSION @EXPORT_OK %EXPORT_TAGS @ISA);

### Exporting:
use Exporter;
%EXPORT_TAGS = (all =&gt; [qw(decode_mimewords
			   encode_mimeword
			   encode_mimewords
			   )]);
Exporter::export_ok_tags('all');

### Inheritance:
@ISA = qw(Exporter);

### Other modules:
use MIME::Base64;
use MIME::QuotedPrint;



#------------------------------
#
# Globals...
#
#------------------------------

### The package version, both in 1.23 style *and* usable by MakeMaker:
$VERSION = "5.509";

### Nonprintables (controls + x7F + 8bit):
my $NONPRINT = "\\x00-\\x1F\\x7F-\\xFF";


#------------------------------

# _decode_Q STRING
#     Private: used by _decode_header() to decode "Q" encoding, which is
#     almost, but not exactly, quoted-printable.  :-P
sub _decode_Q {
    my $str = shift;
    local $1;
    $str =~ s/_/\x20/g;                                # RFC-1522, Q rule 2
    $str =~ s/=([\da-fA-F]{2})/pack("C", hex($1))/ge;  # RFC-1522, Q rule 1
    $str;
}

# _encode_Q STRING
#     Private: used by _encode_header() to decode "Q" encoding, which is
#     almost, but not exactly, quoted-printable.  :-P
sub _encode_Q {
    my $str = shift;
    local $1;
    $str =~ s{([ _\?\=$NONPRINT])}{sprintf("=%02X", ord($1))}eog;
    $str;
}

# _decode_B STRING
#     Private: used by _decode_header() to decode "B" encoding.
sub _decode_B {
    my $str = shift;
    decode_base64($str);
}

# _encode_B STRING
#     Private: used by _decode_header() to decode "B" encoding.
sub _encode_B {
    my $str = shift;
    encode_base64($str, '');
}



#------------------------------

=item decode_mimewords ENCODED

I&lt;Function.&gt;
Go through the string looking for RFC 2047-style "Q"
(quoted-printable, sort of) or "B" (base64) encoding, and decode them.

B&lt;In an array context,&gt; splits the ENCODED string into a list of decoded
C&lt;[DATA, CHARSET]&gt; pairs, and returns that list.  Unencoded
data are returned in a 1-element array C&lt;[DATA]&gt;, giving an effective
CHARSET of C&lt;undef&gt;.

    $enc = '=?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= &lt;keld@dkuug.dk&gt;';
    foreach (decode_mimewords($enc)) {
        print "", ($_-&gt;[1] || 'US-ASCII'), ": ", $_-&gt;[0], "\n";
    }

B&lt;In a scalar context,&gt; joins the "data" elements of the above
list together, and returns that.  I&lt;Warning: this is information-lossy,&gt;
and probably I&lt;not&gt; what you want, but if you know that all charsets
in the ENCODED string are identical, it might be useful to you.
(Before you use this, please see L&lt;MIME::WordDecoder/unmime&gt;,
which is probably what you want.)

In the event of a syntax error, $@ will be set to a description
of the error, but parsing will continue as best as possible (so as to
get I&lt;something&gt; back when decoding headers).
$@ will be false if no error was detected.

Any arguments past the ENCODED string are taken to define a hash of options:

=cut

sub decode_mimewords {
    my $encstr = shift;
    my @tokens;
    local($1,$2,$3);
    $@ = '';           ### error-return

    ### Collapse boundaries between adjacent encoded words:
    $encstr =~ s{(\?\=)\s*(\=\?)}{$1$2}gs;
    pos($encstr) = 0;
    ### print STDOUT "ENC = [", $encstr, "]\n";

    ### Decode:
    my ($charset, $encoding, $enc, $dec);
    while (1) {
	last if (pos($encstr) &gt;= length($encstr));
	my $pos = pos($encstr);               ### save it

	### Case 1: are we looking at "=?..?..?="?
	if ($encstr =~    m{\G             # from where we left off..
			    =\?([^?]*)     # "=?" + charset +
			     \?([bq])      #  "?" + encoding +
			     \?([^?]+)     #  "?" + data maybe with spcs +
			     \?=           #  "?="
			    }xgi) {
	    ($charset, $encoding, $enc) = ($1, lc($2), $3);
	    $dec = (($encoding eq 'q') ? _decode_Q($enc) : _decode_B($enc));
	    push @tokens, [$dec, $charset];
	    next;
	}

	### Case 2: are we looking at a bad "=?..." prefix?
	### We need this to detect problems for case 3, which stops at "=?":
	pos($encstr) = $pos;               # reset the pointer.
	if ($encstr =~ m{\G=\?}xg) {
	    $@ .= qq|unterminated "=?..?..?=" in "$encstr" (pos $pos)\n|;
	    push @tokens, ['=?'];
	    next;
	}

	### Case 3: are we looking at ordinary text?
	pos($encstr) = $pos;               # reset the pointer.
	if ($encstr =~ m{\G                # from where we left off...
			 (.*?    #   shortest possible string,
			  \n*)             #   followed by 0 or more NLs,
		         (?=(\Z|=\?))      # terminated by "=?" or EOS
			}sxg) {
	    length($1) or die "MIME::Words: internal logic err: empty token\n";
	    push @tokens, [$1];
	    next;
	}

	### Case 4: bug!
	die "MIME::Words: unexpected case:\n($encstr) pos $pos\n\t".
	    "Please alert developer.\n";
    }
    return (wantarray ? @tokens : join('',map {$_-&gt;[0]} @tokens));
}

#------------------------------

=item encode_mimeword RAW, [ENCODING], [CHARSET]

I&lt;Function.&gt;
Encode a single RAW "word" that has unsafe characters.
The "word" will be encoded in its entirety.

    ### Encode "&lt;&lt;Franc,ois&gt;&gt;":
    $encoded = encode_mimeword("\xABFran\xE7ois\xBB");

You may specify the ENCODING (C&lt;"Q"&gt; or C&lt;"B"&gt;), which defaults to C&lt;"Q"&gt;.
You may specify the CHARSET, which defaults to C&lt;iso-8859-1&gt;.

=cut

sub encode_mimeword {
    my $word = shift;
    my $encoding = uc(shift || 'Q');
    my $charset  = uc(shift || 'ISO-8859-1');
    my $encfunc  = (($encoding eq 'Q') ? \&amp;_encode_Q : \&amp;_encode_B);
    "=?$charset?$encoding?" . &amp;$encfunc($word) . "?=";
}

#------------------------------

=item encode_mimewords RAW, [OPTS]

I&lt;Function.&gt;
Given a RAW string, try to find and encode all "unsafe" sequences
of characters:

    ### Encode a string with some unsafe "words":
    $encoded = encode_mimewords("Me and \xABFran\xE7ois\xBB");

Returns the encoded string.
Any arguments past the RAW string are taken to define a hash of options:

=over 4

=item Charset

Encode all unsafe stuff with this charset.  Default is 'ISO-8859-1',
a.k.a. "Latin-1".

=item Encoding

The encoding to use, C&lt;"q"&gt; or C&lt;"b"&gt;.  The default is C&lt;"q"&gt;.

=back

B&lt;Warning:&gt; this is a quick-and-dirty solution, intended for character
sets which overlap ASCII.  B&lt;It does not comply with the RFC 2047
rules regarding the use of encoded words in message headers&gt;.
You may want to roll your own variant,
using C&lt;encode_mimeword()&gt;, for your application.
I&lt;Thanks to Jan Kasprzak for reminding me about this problem.&gt;

=cut

sub encode_mimewords {
    my ($rawstr, %params) = @_;
    my $charset  = $params{Charset} || 'ISO-8859-1';
    my $encoding = lc($params{Encoding} || 'q');

    ### Encode any "words" with unsafe characters.
    ###    We limit such words to 18 characters, to guarantee that the
    ###    worst-case encoding give us no more than 54 + ~10 &lt; 75 characters
    my $word;
    local $1;
    $rawstr =~ s{([a-zA-Z0-9\x7F-\xFF]+\s*)}{     ### get next "word"
	$word = $1;
	(($word !~ /(?:[$NONPRINT])|(?:^\s+$)/o)
	 ? $word                                          ### no unsafe chars
	 : encode_mimeword($word, $encoding, $charset));  ### has unsafe chars
    }xeg;
    $rawstr =~ s/\?==\?/?= =?/g;
    $rawstr;
}

1;
__END__


=back

=head1 SEE ALSO

L&lt;MIME::Base64&gt;, L&lt;MIME::QuotedPrint&gt;, L&lt;MIME::Tools&gt;

For other implementations of this or similar functionality (particularly, ones
with proper UTF8 support), see:

L&lt;Encode::MIME::Header&gt;, L&lt;MIME::EncWords&gt;, L&lt;MIME::AltWords&gt;

At some future point, one of these implementations will likely replace
MIME::Words and MIME::Words will become deprecated.

=head1 NOTES

Exports its principle functions by default, in keeping with
MIME::Base64 and MIME::QuotedPrint.


=head1 AUTHOR

Eryq (F&lt;eryq@zeegee.com&gt;), ZeeGee Software Inc (F&lt;http://www.zeegee.com&gt;).
Dianne Skoll (dfs@roaringpenguin.com) http://www.roaringpenguin.com

All rights reserved.  This program is free software; you can redistribute
it and/or modify it under the same terms as Perl itself.

Thanks also to...

      Kent Boortz        For providing the idea, and the baseline
                         RFC-1522-decoding code!
      KJJ at PrimeNet    For requesting that this be split into
                         its own module.
      Stephane Barizien  For reporting a nasty bug.

</pre></body></html>