4. Basic Functions

This chapter describes the basic, ground-level functions for parsing and handling messages. Covered here are parsing From lines, removing comments from header lines, decoding encoded words, parsing date headers and so on. High-level functionality is dealt with in the first chapter (see section 1. Decoding and Viewing).

4.1 rfc2045  Encoding Content-Type headers.
4.2 rfc2231  Parsing Content-Type headers.
4.3 ietf-drums  Handling mail headers defined by RFC822bis.
4.4 rfc2047  En/decoding encoded words in headers.
4.5 time-date  Functions for parsing dates and manipulating time.
4.6 qp  Quoted-Printable en/decoding.
4.7 base64  Base64 en/decoding.
4.8 binhex  Binhex decoding.
4.9 uudecode  Uuencode decoding.
4.10 yenc  Yenc decoding.
4.11 rfc1843  Decoding HZ-encoded text.
4.12 mailcap  How parts are displayed is specified by the `.mailcap' file.

4.1 rfc2045

RFC2045 is the "main" MIME document, and as such, one would imagine that there would be a lot to implement. But there isn't, since most of the implementation details are delegated to the subsequent RFCs.

So `rfc2045.el' has only a single function:

rfc2045-encode-string
Takes a parameter and a value and returns a `PARAM=VALUE' string. The value will be quoted if there are non-safe characters in it.

4.2 rfc2231

RFC2231 defines a syntax for the Content-Type and Content-Disposition headers. Its snappy name is MIME Parameter Value and Encoded Word Extensions: Character Sets, Languages, and Continuations.

In short, these headers look something like this:

Content-Type: application/x-stuff;
 title*0*=us-ascii'en'This%20is%20even%20more%20;
 title*1*=%2A%2A%2Afun%2A%2A%2A%20;
 title*2="isn't it!"

They usually aren't this bad, though.

The following functions are defined by this library:

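rfc2231-parse-string
Parse a Content-Type header and return a list describing its elements.

(rfc2231-parse-string
 "application/x-stuff;
 title*0*=us-ascii'en'This%20is%20even%20more%20;
 title*1*=%2A%2A%2Afun%2A%2A%2A%20;
 title*2=\"isn't it!\"")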
=> ("application/x-stuff"
    (title . "This is even more ***fun*** isn't it!"))

rfc2231-get-value
Takes one of the lists in the format above and returns the value of the specified attribute.

rfc2231-encode-string
Encode a parameter in headers like Content-Type and Content-Disposition.
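The parsing and lookup functions can be combined as in the following minimal sketch. It assumes the library is loaded with (require 'rfc2231) and that the functions are named rfc2231-parse-string and rfc2231-get-value in your Emacs; the header value is made up for illustration.

```elisp
(require 'rfc2231)

;; Parse an illustrative header value.  The result has the form
;; (TYPE (ATTRIBUTE . VALUE) ...), with attributes as symbols.
(let ((ct (rfc2231-parse-string "text/plain; charset=utf-8")))
  (list (car ct)                          ; the media type
        (rfc2231-get-value ct 'charset))) ; the charset parameter
;; => ("text/plain" "utf-8")
```

Looking attributes up by symbol means the caller does not need to know in which order the parameters appeared in the header.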

4.3 ietf-drums

drums is an IETF working group that is working on the replacement for RFC822.

The functions provided by this library include:

ietf-drums-remove-comments
Remove the comments from the argument and return the results.

ietf-drums-remove-whitespace
Remove linear white space from the string and return the results. Spaces inside quoted strings and comments are left untouched.

ietf-drums-get-comment
Return the last comment in the string.

ietf-drums-parse-address
Parse an address string and return a list that contains the mailbox and the plain text name.

ietf-drums-parse-addresses
Parse a string that contains any number of comma-separated addresses and return a list that contains mailbox/plain text pairs.

ietf-drums-parse-date
Parse a date string and return an Emacs time structure.

ietf-drums-narrow-to-header
Narrow the buffer to the header section of the current buffer.
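As a rough illustration of the address functions described above, the sketch below parses one address and extracts one comment. It assumes the function names ietf-drums-parse-address and ietf-drums-get-comment; the addresses are invented.

```elisp
(require 'ietf-drums)

;; Split an illustrative address into (MAILBOX . NAME).
(ietf-drums-parse-address "Jane Doe <jane@example.org>")
;; => ("jane@example.org" . "Jane Doe")

;; Extract the comment from an old-style address.
(ietf-drums-get-comment "jane@example.org (Jane Doe)")
;; => "Jane Doe"
```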

4.4 rfc2047

RFC2047 (Message Header Extensions for Non-ASCII Text) specifies how non-ASCII text in headers is to be encoded. This is actually rather complicated, so a number of variables are necessary to tweak what this library does.

The following variables are tweakable:

rfc2047-header-encoding-alist
This is an alist of header / encoding-type pairs. Its main purpose is to prevent encoding of certain headers.

The keys can either be header regexps, or t.

The values can be nil, in which case the header(s) in question won't be encoded, mime, which means that they will be encoded, or address-mime, which means the header(s) will be encoded carefully assuming they contain addresses.

rfc2047-charset-encoding-alist
RFC2047 specifies two forms of encoding: Q (a Quoted-Printable-like encoding) and B (base64). This alist specifies which charset should use which encoding.

rfc2047-encode-function-alist
This is an alist of encoding / function pairs. The encodings are Q, B and nil.

rfc2047-encoded-word-regexp
When decoding words, this library looks for matches to this regexp.

rfc2047-encode-encoded-words
The boolean variable specifies whether encoded words (e.g. `=?hello?=') should be encoded again.
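As an illustration of the header / encoding-type alist described above, the following sketch exempts one header from encoding. It assumes the alist is the variable rfc2047-header-encoding-alist (the name used later in this section); the header name X-Example-Tag is hypothetical.

```elisp
(require 'rfc2047)

;; Keys are header regexps (or t); a nil value means "do not encode".
;; "X-Example-Tag" is a made-up header used only for illustration.
(add-to-list 'rfc2047-header-encoding-alist '("X-Example-Tag" . nil))
```

add-to-list puts the new pair at the front of the alist, ahead of any catch-all t entry.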

Those were the variables, and these are the functions:

rfc2047-narrow-to-field
Narrow the buffer to the header on the current line.

rfc2047-encode-message-header
Should be called narrowed to the header of a message. Encodes according to rfc2047-header-encoding-alist.

rfc2047-encode-region
Encodes all encodable words in the region specified.

rfc2047-encode-string
Encode a string and return the results.

rfc2047-decode-region
Decode the encoded words in the region.

rfc2047-decode-string
Decode a string and return the results.

rfc2047-encode-parameter
Encode a parameter in the RFC2047-like style. This is a replacement for the rfc2231-encode-string function. See section 4.2 rfc2231.

When attaching files as MIME parts, file names containing non-ASCII characters should be specified using the RFC2231 encoding. However, many mail programs do not support it in practice, and recipients will not be able to extract files with the correct names. The RFC2047-like encoding is generally accepted instead. This function provides that RFC2047-like encoding, yielding to this regrettable trend. To use it, put the following line in your `~/.gnus.el' file:

(defalias 'mail-header-encode-parameter 'rfc2047-encode-parameter)
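To give a feel for what the string decoding function does, here is a small sketch that decodes one B-encoded and one Q-encoded word. It assumes the function is named rfc2047-decode-string; the second input is the well-known sample from RFC2047 itself.

```elisp
(require 'rfc2047)

;; B encoding: the payload is base64.
(rfc2047-decode-string "=?utf-8?B?SGVsbG8=?=")
;; => "Hello"

;; Q encoding: `=E9' is the byte for e-acute in ISO-8859-1.
(rfc2047-decode-string "=?iso-8859-1?Q?Andr=E9?= Pirard")
;; => "André Pirard"
```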

4.5 time-date

While not really a part of the MIME library, it is convenient to document this library here. It deals with parsing Date headers and manipulating time. (Not by using tesseracts, though, I'm sorry to say.)

These functions convert between five formats: a date string, an Emacs time structure, a decoded time list, a second number, and a day number.

Here's a bunch of time/date/second/day examples:

(parse-time-string "Sat Sep 12 12:21:54 1998 +0200")
=> (54 21 12 12 9 1998 6 nil 7200)

(date-to-time "Sat Sep 12 12:21:54 1998 +0200")
=> (13818 19266)

(time-to-seconds '(13818 19266))
=> 905595714.0

(seconds-to-time 905595714.0)
=> (13818 19266 0)

(time-to-days '(13818 19266))
=> 729644

(days-to-time 729644)
=> (961933 65536)

(time-since '(13818 19266))
=> (0 430)

(time-less-p '(13818 19266) '(13818 19145))
=> nil

(subtract-time '(13818 19266) '(13818 19145))
=> (0 121)

(days-between "Sat Sep 12 12:21:54 1998 +0200"
              "Sat Sep 07 12:21:54 1998 +0200")
=> 5

(date-leap-year-p 2000)
=> t

(time-to-day-in-year '(13818 19266))
=> 255

(time-to-number-of-days
 (time-since
  (date-to-time "Mon, 01 Jan 2001 02:22:26 GMT")))
=> 4.146122685185185

And finally, we have safe-date-to-time, which does the same as date-to-time, but returns a zero time if the date is syntactically malformed.

The five data representations used are the following:

date
An RFC822 (or similar) date string. For instance: "Sat Sep 12 12:21:54 1998 +0200".

time
An internal Emacs time. For instance: (13818 19266).

seconds
A floating point representation of the internal Emacs time. For instance: 905595714.0.

days
An integer number representing the number of days since 00000101. For instance: 729644.

decoded time
A list of decoded time. For instance: (54 21 12 12 9 1998 6 t 7200).

All the examples above represent the same moment.
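The relationship between the two-element time structure and the seconds value can be checked by hand: a time of the form (HIGH LOW) stands for HIGH * 65536 + LOW seconds since the epoch. The sketch below works this out for the pair used in the examples above.

```elisp
(require 'time-date)

;; HIGH * 2^16 + LOW gives the number of seconds since the epoch.
(+ (* 13818 65536) 19266)
;; => 905595714

;; The library function returns the same value as a float.
(time-to-seconds '(13818 19266))
;; => 905595714.0
```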

These are the functions available:

date-to-time
Take a date and return a time.

time-to-seconds
Take a time and return seconds.

seconds-to-time
Take seconds and return a time.

time-to-days
Take a time and return days.

days-to-time
Take days and return a time.

date-to-day
Take a date and return days.

time-to-number-of-days
Take a time and return the number of days that it represents.

safe-date-to-time
Take a date and return a time. If the date is not syntactically valid, return a "zero" time.

time-less-p
Take two times and say whether the first time is less (i.e., earlier) than the second time.

time-since
Take a time and return a time saying how long it was since that time.

subtract-time
Take two times and subtract the second from the first. That is, return the time between the two times.

days-between
Take two dates and return the number of days between them.

date-leap-year-p
Take a year number and say whether it's a leap year.

time-to-day-in-year
Take a time and return the day number within the year that the time is in.

4.6 qp

This library deals with decoding and encoding Quoted-Printable text.

Very briefly explained, qp encoding means translating all 8-bit characters (and lots of control characters) into things that look like `=EF'; that is, an equal sign followed by the byte encoded as a hex string.

The following functions are defined by the library:

quoted-printable-decode-region
QP-decode all the encoded text in the specified region.

quoted-printable-decode-string
Decode the QP-encoded text in a string and return the results.

quoted-printable-encode-region
QP-encode all the encodable characters in the specified region. The third optional parameter fold specifies whether to fold long lines. (Long here means 72.)

quoted-printable-encode-string
QP-encode all the encodable characters in a string and return the results.
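A short sketch of the string variants, assuming they are named quoted-printable-decode-string and quoted-printable-encode-string. Plain ASCII inputs are used so that the results do not depend on any character set settings.

```elisp
(require 'qp)

;; `=48' and `=65' are the hex codes for "H" and "e".
(quoted-printable-decode-string "=48=65llo")
;; => "Hello"

;; The equal sign itself must be escaped, since it introduces escapes.
(quoted-printable-encode-string "a=b")
;; => "a=3Db"
```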

4.7 base64

Base64 is an encoding that encodes three bytes into four characters, thereby increasing the size by about 33%. The alphabet used for encoding is very resistant to mangling during transit.

The following functions are defined by this library:

base64-encode-region
base64 encode the selected region. Return the length of the encoded text. Optional third argument no-line-break means do not break long lines into shorter lines.

base64-encode-string
base64 encode a string and return the result.

base64-decode-region
base64 decode the selected region. Return the length of the decoded text. If the region can't be decoded, return nil and don't modify the buffer.

base64-decode-string
base64 decode a string and return the result. If the string can't be decoded, nil is returned.
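The three-bytes-into-four-characters rule can be seen directly with the string functions. This is a minimal sketch using base64-encode-string and base64-decode-string on ASCII input.

```elisp
;; Three bytes become exactly four characters.
(base64-encode-string "abc")
;; => "YWJj"

;; Five bytes are padded with `=' to a multiple of four characters.
(base64-encode-string "Hello")
;; => "SGVsbG8="

;; Decoding restores the original bytes.
(base64-decode-string "SGVsbG8=")
;; => "Hello"
```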

4.8 binhex

binhex is an encoding that originated in Macintosh environments. The following function is supplied to deal with these:

binhex-decode-region
Decode the encoded text in the region. If given a third parameter, only decode the binhex header and return the filename.

4.9 uudecode

uuencode is probably still the most popular encoding of binaries used on Usenet, although base64 rules the mail world.

The following function is supplied by this package:

uudecode-decode-region
Decode the text in the region.

4.10 yenc

yenc is used for encoding binaries on Usenet. The following function is supplied by this package:

yenc-decode-region
Decode the encoded text in the region.

4.11 rfc1843

RFC1843 deals with mixing Chinese and ASCII characters in messages. In essence, RFC1843 switches between ASCII and Chinese by doing this:

This sentence is in ASCII.
The next sentence is in GB.~{<:Ky2;S{#,NpJ)l6HK!#~}Bye.

Simple enough, and widely used in China.

The following functions are available to handle this encoding:

rfc1843-decode-region
Decode HZ-encoded text in the region.

rfc1843-decode-string
Decode an HZ-encoded string and return the result.

4.12 mailcap

The `~/.mailcap' file is parsed by most MIME-aware message handlers and describes how elements are supposed to be displayed. Here's an example file:

image/*; gimp -8 %s
audio/wav; wavplayer %s
application/msword; catdoc %s ; copiousoutput ; nametemplate=%s.doc

This says that all image files should be displayed with gimp, that WAVE audio files should be played by wavplayer, and that MS-WORD files should be inlined by catdoc.

The mailcap library parses this file, and provides functions for matching types.

mailcap-mime-data
This variable is an alist of alists containing backup viewing rules.

Interface functions:

mailcap-parse-mailcaps
Parse the `~/.mailcap' file.

mailcap-mime-info
Takes a MIME type as its argument and returns the matching viewer.
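The two interface functions are typically used together, roughly as in this sketch. It assumes they are named mailcap-parse-mailcaps and mailcap-mime-info; the result depends entirely on the mailcap entries and backup rules present on your system, so no output is shown.

```elisp
(require 'mailcap)

;; Read the user's and the system's mailcap files.
(mailcap-parse-mailcaps)

;; Ask for the viewer that matches a MIME type.  With the example
;; file above this would be expected to match the `image/*' entry.
(mailcap-mime-info "image/png")
```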

This document was generated by XEmacs Webmaster on October, 2 2007 using texi2html