
18. Internationalization

Texinfo has some support for writing in languages other than English, although this area still needs considerable work.

For a list of the various accented and special characters Texinfo supports, see Inserting Accents.



18.1 @documentlanguage ll[_cc]: Set the Document Language

The @documentlanguage command declares the current document locale. Write it on a line by itself, near the beginning of the file, but after @setfilename (see section @setfilename):

 
@documentlanguage ll[_cc]

Include a two-letter ISO 639-1 language code (ll) following the command name, optionally followed by an underscore and a two-letter ISO 3166 country code (cc). If you have a multilingual document, the intent is that you can use this command multiple times, once at each language change. If the command is not used at all, the default is en_US for US English.

As with GNU Gettext (see (gettext)Top section ‘Top’ in Gettext), if the country code is omitted, the main dialect is assumed where possible. For example, de is equivalent to de_DE (German as spoken in Germany).
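The ll[_cc] form described above can be illustrated with a small Python sketch. It is not part of Texinfo: the function name, the regular expression, and the assumption that the language code is lowercase and the country code uppercase (as in de_DE and en_US) are all hypothetical. It deliberately does not try to guess a "main dialect" when the country code is omitted.

```python
import re

# Hypothetical helper, not part of Texinfo: split a @documentlanguage
# argument of the form ll[_cc] into its language and country parts.
# Assumption: lowercase language code, uppercase country code.
_LANG_RE = re.compile(r"^([a-z]{2})(?:_([A-Z]{2}))?$")

DEFAULT_LOCALE = "en_US"  # default named in the text when the command is absent


def parse_document_language(arg=None):
    """Return (language, country); country is None when it was omitted."""
    if arg is None:
        arg = DEFAULT_LOCALE
    match = _LANG_RE.match(arg.strip())
    if match is None:
        raise ValueError(f"not of the form ll[_cc]: {arg!r}")
    return match.group(1), match.group(2)
```

For example, `parse_document_language("de")` yields the language `de` with no country, leaving the choice of dialect to whatever consumes the result.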

For Info and other online output, this command changes the translation of various document strings such as “see” in cross-references (see section Cross References), “Function” in defuns (see section Definition Commands), and so on. Some strings, such as “Node:”, “Next:”, “Menu:”, etc., are keywords in Info output, so are not translated there; they are translated in other output formats.

For TeX, this command causes a file ‘txi-locale.tex’ to be read (if it exists). If the @documentlanguage argument contains the optional ‘_cc’ suffix, the file name with that suffix is tried first. For example, with @documentlanguage de_DE, TeX first looks for ‘txi-de_DE.tex’, then ‘txi-de.tex’.
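The lookup order for the TeX locale file can be sketched in Python as follows. This only mirrors the order stated above; it is not how texinfo.tex is implemented, and the function name is hypothetical.

```python
def txi_candidates(locale):
    """List txi-*.tex file names in the order they would be tried.

    Hypothetical helper: `locale` is the @documentlanguage argument,
    in the form ll or ll_cc.
    """
    language, sep, country = locale.partition("_")
    names = []
    if sep and country:
        # The name including the country code is tried first.
        names.append(f"txi-{language}_{country}.tex")
    # The language-only name is the fallback.
    names.append(f"txi-{language}.tex")
    return names


print(txi_candidates("de_DE"))  # ['txi-de_DE.tex', 'txi-de.tex']
print(txi_candidates("de"))     # ['txi-de.tex']
```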

Such a ‘txi-*’ file is intended to redefine the various English words used in TeX output, such as ‘Chapter’, ‘See’, and so on. We are aware that individual words like these cannot always be translated in isolation, and that a very different strategy would be required for ideographic (among other) scripts. Help in improving Texinfo’s language support is welcome.

It would also be desirable for this command to change TeX’s idea of the current hyphenation patterns (via the TeX primitive \language), but this is unfortunately not currently implemented.

In September 2006, the W3C Internationalization Activity released a new recommendation for specifying languages: http://www.rfc-editor.org/rfc/bcp/bcp47.txt. When Gettext supports this new scheme, Texinfo will too.

Since the lists of language codes and country codes are updated relatively frequently, we don’t attempt to list them here. The valid language codes are on the official home page for ISO 639, http://www.loc.gov/standards/iso639-2/. The country codes and the official web site for ISO 3166 can be found via http://en.wikipedia.org/wiki/ISO_3166.



18.2 @documentencoding enc: Set Input Encoding

The @documentencoding command declares the input document encoding. Write it on a line by itself, with a valid encoding specification following, near the beginning of the file but after @setfilename (see section @setfilename):

 
@documentencoding enc

At present, Texinfo supports only these encodings:

US-ASCII

This has no particular effect, but it’s included for completeness.

UTF-8

The vast global character encoding, expressed in 8-bit bytes. The Texinfo processors have no deep knowledge of Unicode; for the most part, they just pass along the input they are given to the output.

ISO-8859-1
ISO-8859-15
ISO-8859-2

These specify the standard encodings for Western European (the first two) and Eastern European languages (the third), respectively. ISO 8859-15 replaces some little-used characters from 8859-1 (e.g., precomposed fractions) with more commonly needed ones, such as the Euro symbol (€).

A full description of the encodings is beyond our scope here; one useful reference is http://czyborra.com/charsets/iso8859.html.

koi8-r

This is the commonly used encoding for the Russian language.

koi8-u

This is the commonly used encoding for the Ukrainian language.
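To illustrate the difference between ISO-8859-1 and ISO-8859-15 noted above, the following Python sketch decodes the same four bytes under both encodings. It relies on Python's own codecs, not on any Texinfo processor, and the byte values chosen are just a sample of the positions where the two encodings differ.

```python
# Four byte values at which ISO 8859-15 differs from ISO 8859-1.
SAMPLE_BYTES = bytes([0xA4, 0xBC, 0xBD, 0xBE])


def decode_both(data):
    """Decode the same bytes as ISO-8859-1 and as ISO-8859-15."""
    return data.decode("iso-8859-1"), data.decode("iso-8859-15")


latin1, latin9 = decode_both(SAMPLE_BYTES)
print(latin1)  # currency sign and three precomposed fractions
print(latin9)  # Euro symbol, then OE ligatures and Y with diaeresis
```

The same input file therefore displays differently depending on which of the two encodings is declared, which is why the declaration should match how the file was actually saved.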

Specifying an encoding enc has the following effects:

In Info output, unless the option ‘--disable-encoding’ is given to makeinfo, a so-called ‘Local Variables’ section (see (xemacs)File Variables section ‘File Variables’ in XEmacs User’s Manual) is output including enc. This allows Info readers to set the encoding appropriately.

 
Local Variables:
coding: enc
End:

Also, in Info and plain text output (barring ‘--disable-encoding’), accent constructs and special characters, such as @'e, are output as the actual 8-bit character in the given encoding.

In HTML output, a ‘<meta>’ tag is output, in the ‘<head>’ section of the HTML, that specifies enc. Web servers and browsers cooperate to use this information so the correct encoding is used to display the page, if supported by the system.

 
<meta http-equiv="Content-Type" content="text/html;
     charset=enc">

In split HTML output, if ‘--transliterate-file-names’ is given (see section HTML Cross-reference 8-bit Character Expansion), the names of HTML files are formed by transliteration of the corresponding node names, using the specified encoding.

In XML and Docbook output, the given document encoding is written in the output file as usual with those formats.

In TeX output, the characters which are supported in the standard Computer Modern fonts are output accordingly. (For example, this means using constructed accents rather than precomposed glyphs.) Using a missing character generates a warning message, as does specifying an unimplemented encoding.
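As a rough illustration of the Info and plain text behavior described above, where an accent construct such as @'e becomes an actual character in the declared encoding, this Python sketch encodes the character e-acute in each supported encoding. It uses Python's codecs only; it does not show what makeinfo itself does, in particular when the character has no representation in the chosen encoding. The helper name is hypothetical.

```python
def encode_accent(char, encoding):
    """Return the bytes for `char` in `encoding`, or None if it has none."""
    try:
        return char.encode(encoding)
    except UnicodeEncodeError:
        return None


E_ACUTE = "\u00e9"  # the character written as @'e in Texinfo

for enc in ("US-ASCII", "UTF-8", "ISO-8859-1", "ISO-8859-15",
            "ISO-8859-2", "koi8-r", "koi8-u"):
    print(enc, encode_accent(E_ACUTE, enc))
```

In the three ISO 8859 encodings the result is a single byte, in UTF-8 it is two bytes, and in US-ASCII and the two KOI8 encodings the helper returns None because the character is not available there.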



This document was generated by Aidan Kehoe on December 27, 2016 using texi2html 1.82.