22. Generating HTML

makeinfo generates Info output by default, but given the ‘--html’ option, it will generate HTML, for web browsers and other programs. This chapter gives some details on such HTML output.

makeinfo can also write in XML and Docbook format, but we do not as yet describe these further. See section Output Formats, for a brief overview of all the output formats.

22.1 HTML Translation

makeinfo will include segments of Texinfo source between @ifhtml and @end ifhtml in the HTML output (but not any of the other conditionals, by default). Source between @html and @end html is passed without change to the output (i.e., suppressing the normal escaping of input ‘<’, ‘>’ and ‘&’ characters which have special significance in HTML). See section Conditional Commands.
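As a rough illustration of the escaping just described, here is a minimal Python sketch. It is not makeinfo's actual code: the function name render_segment is hypothetical, and we assume the standard library's html.escape is an adequate stand-in for the escaping applied to ordinary text.

```python
import html

def render_segment(text, raw=False):
    """Return text as it might appear in HTML output.

    raw=True models source between @html and @end html, which is
    passed through unchanged; otherwise '<', '>' and '&' are escaped.
    """
    if raw:
        return text
    # quote=False: only '&', '<' and '>' are escaped, not quote characters.
    return html.escape(text, quote=False)
```

Ordinary text such as "a < b & c" would come out with entities, while a raw segment such as "<b>bold</b>" would be copied as-is.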

The ‘--footnote-style’ option is currently ignored for HTML output; footnotes are always linked to the end of the output file.

By default, a navigation bar is inserted at the start of each node, analogous to Info output. The ‘--no-headers’ option suppresses this if used with ‘--no-split’. Header <link> elements in split output can support info-like navigation with browsers like Lynx and Emacs W3 which implement this HTML 1.0 feature.

The HTML generated is mostly standard (i.e., HTML 2.0, RFC-1866). One exception is that HTML 3.2 tables are generated from the @multitable command, but tagged to degrade as well as possible in browsers without table support. The HTML 4 ‘lang’ attribute on the ‘<html>’ element is also used. (Please report output from an error-free run of makeinfo which has browser portability problems as a bug.)

22.2 HTML Splitting

When splitting output (which is the default), makeinfo writes HTML output into (generally) one output file per Texinfo source @node.

The output file name is the node name with special characters replaced by ‘-’’s, so it can work as a filename. In the unusual case of two different nodes having the same name after this treatment, they are written consecutively to the same file, with HTML anchors so each can be referred to separately. If makeinfo is run on a system which does not distinguish case in filenames, nodes which are the same except for case will also be folded into the same output file.

When splitting, the HTML output files are written into a subdirectory, with the name chosen as follows:

makeinfo first tries the subdirectory with the base name from @setfilename (that is, any extension is removed). For example, HTML output for @setfilename gcc.info would be written into a subdirectory named ‘gcc’.
If that directory cannot be created for any reason, then makeinfo tries appending ‘.html’ to the directory name. For example, output for @setfilename texinfo would be written to ‘texinfo.html’.
If the ‘name.html’ directory can’t be created either, makeinfo gives up.

In any case, the top-level output file within the directory is always named ‘index.html’.

Monolithic output (--no-split) is named according to @setfilename (with any ‘.info’ extension replaced by ‘.html’) or --output (the argument is used literally).
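The naming rules above can be sketched in Python as follows. This is only an illustration under stated assumptions, not makeinfo's implementation: both function names are hypothetical, and the handling of a @setfilename argument without a ‘.info’ extension in the monolithic case is our guess.

```python
import os.path

def split_output_dirs(setfilename):
    """Directories to try, in order, for split HTML output."""
    # Any extension on the @setfilename argument is removed.
    base, _ext = os.path.splitext(setfilename)
    return [base, base + ".html"]

def mono_output_name(setfilename):
    """File name for --no-split output when --output is not given."""
    base, ext = os.path.splitext(setfilename)
    # Assumption: a name without a '.info' extension simply gains '.html'.
    return (base if ext == ".info" else setfilename) + ".html"
```

If neither candidate directory can be created, makeinfo gives up, as described above; the sketch only lists the candidates and does not touch the file system.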

22.3 HTML CSS

Cascading Style Sheets (CSS for short) is an Internet standard for influencing the display of HTML documents: see http://www.w3.org/Style/CSS/.

By default, makeinfo includes a few simple CSS commands to better implement the appearance of some of the environments. Here are two of them, as an example:

pre.display { font-family:inherit }
pre.smalldisplay { font-family:inherit; font-size:smaller }

A full explanation of CSS is (far) beyond this manual; please see the reference above. In brief, however, this specification tells the web browser to use a ‘smaller’ font size for @smalldisplay text, and to use the ‘inherited’ font (generally a regular roman typeface) for both @smalldisplay and @display. By default, the HTML ‘<pre>’ element uses a monospaced font.

You can influence the CSS in the HTML output with two makeinfo options: ‘--css-include=file’ and ‘--css-ref=url’.

The option ‘--css-ref=url’ adds to each output HTML file a ‘<link>’ tag referencing a CSS style sheet at the given url. This allows the use of external style sheets.

The option ‘--css-include=file’ includes the contents of file in the HTML output, as you might expect. However, the details are somewhat tricky, as described in the following, to provide maximum flexibility.

The CSS file may begin with so-called ‘@import’ directives, which link to external CSS specifications for browsers to use when interpreting the document. Again, a full description is beyond our scope here, but we’ll describe how they work syntactically, so we can explain how makeinfo handles them.

There can be more than one ‘@import’, but they have to come first in the file, with only whitespace and comments interspersed, no normal definitions. (Technical exception: an ‘@charset’ directive may precede the ‘@import’’s. This does not alter makeinfo’s behavior; it just copies the ‘@charset’ if present.) Comments in CSS files are delimited by ‘/* ... */’, as in C. An ‘@import’ directive must be in one of these two forms:

@import url(http://example.org/foo.css);
@import "http://example.net/bar.css";

As far as makeinfo is concerned, the crucial characters are the ‘@’ at the beginning and the semicolon terminating the directive. When reading the CSS file, it simply copies any such ‘@’-directive into the output, as follows:

If file contains only normal CSS declarations, it is included after makeinfo’s default CSS, thus overriding it.
If file begins with ‘@import’ specifications (see above), then the ‘@import’’s are included first (they have to come first, according to the standard), and then makeinfo’s default CSS is included. If you need to override makeinfo’s defaults from an ‘@import’, you can do so with the ‘! important’ CSS construct, as in:
pre.smallexample { font-size: inherit ! important }
If file contains both ‘@import’ and inline CSS specifications, the ‘@import’’s are included first, then makeinfo’s defaults, and lastly the inline CSS from file.
Any @-directive other than ‘@import’ and ‘@charset’ is treated as a CSS declaration, meaning makeinfo includes its default CSS and then the rest of the file.

If the CSS file is malformed or erroneous, makeinfo’s output is unspecified. makeinfo does not try to interpret the meaning of the CSS file in any way; it just looks for the special ‘@’ and ‘;’ characters and blindly copies the text into the output. Comments in the CSS file may or may not be included in the output.
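To make the ordering concrete, here is a small Python sketch of how the pieces might be assembled. It is a simplification under stated assumptions, not makeinfo's code: assemble_css and DEFAULT_CSS are hypothetical names, DEFAULT_CSS is a one-line stand-in for the real defaults, and comments interspersed among the leading directives are not handled.

```python
import re

# Stand-in for makeinfo's default CSS (assumption: one rule is enough here).
DEFAULT_CSS = "pre.display { font-family:inherit }"

# One leading '@import' or '@charset' directive: from '@' to the first ';'.
_LEADING = re.compile(r"\s*@(?:import|charset)\b[^;]*;")

def assemble_css(user_css, default_css=DEFAULT_CSS):
    """Order the output as: leading @-directives, defaults, rest of file."""
    pos = 0
    directives = []
    while True:
        m = _LEADING.match(user_css, pos)
        if not m:
            break
        directives.append(m.group().strip())
        pos = m.end()
    rest = user_css[pos:].strip()
    parts = directives + [default_css.strip()]
    if rest:
        # Inline CSS comes last, so it overrides the defaults.
        parts.append(rest)
    return "\n".join(parts) + "\n"
```

Any other @-directive (for instance ‘@media’) fails the pattern, so it falls into the "rest" and lands after the defaults, in line with the last rule above.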

22.4 HTML Cross-references

Cross-references between Texinfo manuals in HTML format amount, in the end, to a standard HTML <a> link, but the details are unfortunately complex. This section describes the algorithm used in detail, so that Texinfo can cooperate with other programs, such as texi2html, by writing mutually compatible HTML files.

This algorithm may or may not be used for links within HTML output for a Texinfo file. Since no issues of compatibility arise in such cases, we do not need to specify this.

We try to support references to such “external” manuals in both monolithic and split forms. A monolithic (mono) manual is entirely contained in one file, and a split manual has a file for each node. (See section HTML Splitting.)

Acknowledgement: this algorithm was primarily devised by Patrice Dumas in 2003–04.

22.4.1 HTML Cross-reference Link Basics

For our purposes, an HTML link consists of four components: a host name, a directory part, a file part, and a target part. We always assume the http protocol. For example:

http://host/dir/file.html#target

The information to construct a link comes from the node name and manual name in the cross-reference command in the Texinfo source (see section Cross References), and from external information, which is currently simply hardwired. In the future, it may come from an external data file.

We now consider each part in turn.

The host is hardwired to be the local host. This could either be the literal string ‘localhost’, or, according to the rules for HTML links, the ‘http://localhost/’ could be omitted entirely.

The dir and file parts are more complicated, and depend on the relative split/mono nature of both the manual being processed and the manual that the cross-reference refers to. The underlying idea is that there is one directory for Texinfo manuals in HTML, and a given manual is either available as a monolithic file ‘manual.html’, or a split subdirectory ‘manual/*.html’. Here are the cases:

If the present manual is split, and the referent manual is also split, the directory is ‘../referent/’ and the file is the expanded node name (described later).
If the present manual is split, and the referent manual is mono, the directory is ‘../’ and the file is ‘referent.html’.
If the present manual is mono, and the referent manual is split, the directory is ‘referent/’ and the file is the expanded node name.
If the present manual is mono, and the referent manual is also mono, the directory is ‘./’ (or just the empty string), and the file is ‘referent.html’.

One exception: the algorithm for node name expansion prefixes the string ‘g_t’ when the node name begins with a non-letter. This kludge (due to XHTML rules) is not necessary for filenames, and is therefore omitted.

Any directory part in the filename argument of the source cross-reference command is ignored. Thus, @xref{,,,../foo} and @xref{,,,foo} both use ‘foo’ as the manual name. This is because any such attempted hardwiring of the directory is very unlikely to be useful for both Info and HTML output.

Finally, the target part is always the expanded node name.
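The four cases can be summarized in a short Python sketch. This is an illustration only: xref_href is a hypothetical name, the expanded node name is taken as given, the host part is omitted (as the rules allow), and the special treatment of the Top node is left out.

```python
def xref_href(manual, expanded_node, present_split, referent_split):
    """Build the href for a cross-reference to an external manual."""
    # Any directory part in the manual name is ignored.
    manual = manual.rsplit("/", 1)[-1]
    if referent_split:
        directory = ("../" if present_split else "") + manual + "/"
        # The 'g_t' prefix is needed only for targets, not for filenames.
        file_part = expanded_node
        if file_part.startswith("g_t"):
            file_part = file_part[len("g_t"):]
        file_part += ".html"
    else:
        directory = "../" if present_split else ""
        file_part = manual + ".html"
    # The target part is always the expanded node name.
    return directory + file_part + "#" + expanded_node
```

Stripping a leading ‘g_t’ is safe in this sketch because, as noted in the node name expansion rules, that string cannot otherwise occur at the start of an expanded name.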

Whether the present manual is split or mono is determined by user option; makeinfo defaults to split, with the ‘--no-split’ option overriding this.

Whether the referent manual is split or mono is another bit of the external information. For now, makeinfo simply assumes the referent manual has the same form (split or mono) as the present manual.

There can be a mismatch between the format of the referent manual that the generating software assumes, and the format it’s actually present in. See section HTML Cross-reference Mismatch.

22.4.2 HTML Cross-reference Node Name Expansion

As mentioned in the previous section, the key part of the HTML cross-reference algorithm is the conversion of node names in the Texinfo source into strings suitable for XHTML identifiers and filenames. The restrictions are similar for each: plain ASCII letters, numbers, and the ‘-’ and ‘_’ characters are all that can be used. (Although HTML anchors can contain most characters, XHTML is more restrictive.)

Cross-references in Texinfo can actually refer either to nodes or anchors (see section @anchor: Defining Arbitrary Cross-reference Targets), but anchors are treated identically to nodes in this context, so we’ll continue to say “node” names for simplicity.

(@-commands and 8-bit characters are not presently handled by makeinfo for HTML cross-references. See the next section.)

A special exception: the Top node (see section The ‘Top’ Node and Master Menu) is always mapped to the file ‘index.html’, to match web server software. However, the HTML target is ‘Top’. Thus (in the split case):

@xref{Top, Introduction,, xemacs, XEmacs User's Manual}.
⇒ <a href="xemacs/index.html#Top">

The standard ASCII letters (a-z and A-Z) are not modified. All other characters are changed as specified below.
The standard ASCII numbers (0-9) are not modified except when a number is the first character of the node name. In that case, see below.
Multiple consecutive space, tab and newline characters are transformed into just one space. (It’s not possible to have newlines in node names with the current implementation, but we specify it anyway, just in case.)
Leading and trailing spaces are removed.
After the above has been applied, each remaining space character is converted into a ‘-’ character.
Other ASCII 7-bit characters are transformed into ‘_00xx’, where xx is the ASCII character code in (lowercase) hexadecimal. This includes ‘_’, which is mapped to ‘_005f’.
If the node name does not begin with a letter, the literal string ‘g_t’ is prefixed to the result. (Due to the rules above, that string can never occur otherwise; it is an arbitrary choice, standing for “GNU Texinfo”.) This is necessary because XHTML requires that identifiers begin with a letter.

For example:

@node A  node --- with _'%
⇒ A-node-_002d_002d_002d-with-_005f_0027_0025

Notice in particular:

‘_’ ⇒ ‘_005f’
‘-’ ⇒ ‘_002d’
‘A node’ ⇒ ‘A-node’
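The rules above, restricted to 7-bit ASCII input, can be sketched in Python as follows. The function name expand_node_name is hypothetical, and the sketch deliberately rejects non-ASCII characters and does not handle @-commands, which are covered by the following sections.

```python
import re

def expand_node_name(name):
    """Expand a plain-ASCII node name into an XHTML-safe identifier."""
    if not name.isascii():
        raise ValueError("this sketch handles 7-bit ASCII only")
    # Runs of space, tab and newline become one space; trim the ends.
    name = re.sub(r"[ \t\n]+", " ", name).strip(" ")
    out = []
    for ch in name:
        if ch.isalnum():
            out.append(ch)                   # letters and digits unchanged
        elif ch == " ":
            out.append("-")                  # remaining spaces become '-'
        else:
            out.append("_%04x" % ord(ch))    # e.g. '_' becomes '_005f'
    result = "".join(out)
    # XHTML identifiers must begin with a letter.
    if not result[:1].isalpha():
        result = "g_t" + result
    return result
```

Applied to the example node name above, this sketch reproduces the expansion shown; a name such as ‘1st steps’ would gain the ‘g_t’ prefix.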

On case-folding computer systems, nodes differing only by case will be mapped to the same file.

In particular, as mentioned above, Top always maps to the file ‘index.html’. Thus, on a case-folding system, Top and a node named ‘Index’ will both be written to ‘index.html’.

Fortunately, the targets serve to distinguish these cases, since HTML target names are always case-sensitive, independent of operating system.
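A program generating split output might want to know in advance which nodes would share a file on a case-folding system. Here is a hedged Python sketch; file_for and case_fold_collisions are hypothetical helpers, and the node names are assumed to be already expanded and to begin with a letter.

```python
from collections import defaultdict

def file_for(node_id):
    """File name for an expanded node name; Top maps to index.html."""
    return "index.html" if node_id == "Top" else node_id + ".html"

def case_fold_collisions(node_ids):
    """Group nodes whose file names differ only by case."""
    groups = defaultdict(list)
    for node_id in node_ids:
        groups[file_for(node_id).lower()].append(node_id)
    # Keep only file names claimed by more than one node.
    return {name: ids for name, ids in groups.items() if len(ids) > 1}
```

The targets within a shared file remain distinct, since they keep their original case.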

22.4.3 HTML Cross-reference Command Expansion

In standard Texinfo, node names may not contain @-commands. makeinfo has an option ‘--commands-in-node-names’ which partially supports it (see section Running makeinfo from a Shell), but it is not robust and not recommended.

Thus, makeinfo does not fully implement this part of the HTML cross-reference algorithm, but it is documented here for the sake of completeness.

First, comments are removed.

Next, any @value commands (see section @set and @value) and macro invocations (see section Invoking Macros) are fully expanded.

Then, for the following commands, the command name and braces are removed, and the text of the argument is recursively transformed:

@asis @b @cite @code @command @dfn @dmn @dotless
@emph @env @file @indicateurl @kbd @key
@samp @sc @slanted @strong @t @var @w

For @sc, any letters are capitalized.

The following commands are replaced by constant text, as shown. If any of these commands have non-empty arguments, as in @TeX{bad}, it is an error, and the result is unspecified. ‘(space)’ means a space character, ‘(nothing)’ means the empty string, etc. The notation ‘U+xxxx’ means Unicode code point xxxx (in hex, as usual). There are further transformations of many of these expansions for the final file or target name, such as space characters to ‘-’, etc., according to the other rules.

`@(newline)`	(space)
`@(space)`	(space)
`@(tab)`	(space)
`@!`	‘`!`’
`@*`	(space)
`@-`	(nothing)
`@.`	‘`.`’
`@:`	(nothing)
`@?`	‘`?`’
`@@`	‘`@`’
`@{`	‘`{`’
`@}`	‘`}`’
`@LaTeX`	‘`LaTeX`’
`@TeX`	‘`TeX`’
`@arrow`	U+2192
`@bullet`	U+2022
`@comma`	‘`,`’
`@copyright`	U+00A9
`@dots`	U+2026
`@enddots`	‘`...`’
`@equiv`	U+2261
`@error`	‘`error-->`’
`@euro`	U+20AC
`@exclamdown`	U+00A1
`@expansion`	U+2192
`@geq`	U+2265
`@leq`	U+2264
`@minus`	U+2212
`@ordf`	U+00AA
`@ordm`	U+00BA
`@point`	U+2605
`@pounds`	U+00A3
`@print`	U+22A3
`@questiondown`	U+00BF
`@registeredsymbol`	U+00AE
`@result`	U+21D2
`@textdegree`	U+00B0
`@tie`	(space)

Quotation mark commands are likewise replaced by their Unicode values (see section Inserting Quotation Marks).

An @acronym or @abbr command is replaced by the first argument, followed by the second argument in parentheses, if present. See section @acronym{acronym[, meaning]}.

An @email command is replaced by the text argument if present, else the address. See section @email{email-address[, displayed-text]}.

An @image command is replaced by the filename (first) argument. See section Inserting Images.

A @verb command is replaced by its transformed argument. See section @verb{<char>text<char>}.

Any other command is an error, and the result is unspecified.
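A small Python sketch may help show how the brace-removal and constant-text rules interact. It is far from complete and is not how makeinfo works internally: expand_commands is a hypothetical name, only a handful of the constant-text commands are listed, and non-braced commands such as ‘@@’, as well as @acronym, @email, @image and @verb, are not handled.

```python
import re

# Commands whose name and braces are dropped, keeping the argument.
STYLE = {"asis", "b", "cite", "code", "command", "dfn", "dmn", "dotless",
         "emph", "env", "file", "indicateurl", "kbd", "key", "samp", "sc",
         "slanted", "strong", "t", "var", "w"}

# A subset of the constant-text commands from the table above.
CONSTANT = {"TeX": "TeX", "LaTeX": "LaTeX", "enddots": "...",
            "dots": "\u2026", "point": "\u2605", "comma": ",", "tie": " "}

# A braced command whose argument contains no further commands or braces.
_INNERMOST = re.compile(r"@([A-Za-z]+)\{([^{}@]*)\}")

def _replace(match):
    cmd, arg = match.group(1), match.group(2)
    if cmd in CONSTANT:
        if arg:
            raise ValueError("@%s takes no argument" % cmd)
        return CONSTANT[cmd]
    if cmd in STYLE:
        return arg.upper() if cmd == "sc" else arg
    raise ValueError("unsupported command @%s" % cmd)

def expand_commands(text):
    """Expand innermost commands repeatedly until nothing changes."""
    previous = None
    while previous != text:
        previous = text
        text = _INNERMOST.sub(_replace, text)
    return text
```

Working from the innermost command outward gives the recursive behavior the rules call for: ‘@b{@emph{x}}’ first becomes ‘@b{x}’ and then ‘x’. The result would still need the space, punctuation and Unicode transformations described in the other sections.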

22.4.4 HTML Cross-reference 8-bit Character Expansion

Usually, characters other than plain 7-bit ASCII are transformed into the corresponding Unicode code point(s) in Normalization Form C, which uses precomposed characters where available. (This is the normalization form recommended by the W3C and other bodies.) This holds when that code point is 0xffff or less, as it almost always is.

These will then be further transformed by the rules above into the string ‘_xxxx’, where xxxx is the code point in hex.

For example, combining this rule and the previous section:

@node @b{A} @TeX{} @u{B} @point{}@enddots{}
⇒ A-TeX-B_0306-_2605_002e_002e_002e

Notice: 1) @enddots expands to three periods, which in turn expand to three ‘_002e’’s; 2) @u{B} is a ‘B’ with a breve accent, which does not exist as a pre-accented Unicode character, and therefore expands to ‘B_0306’ (B with combining breve).

When the Unicode code point is above 0xffff, the transformation is ‘__xxxxxx’, that is, two leading underscores followed by six hex digits. Since Unicode has declared that its highest code point is 0x10ffff, this is sufficient. (We felt it was better to define this extra escape than to always use six hex digits, since the first two would nearly always be zeros.)
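The two escape forms can be sketched in Python using the standard unicodedata module for Normalization Form C. The function names are hypothetical, and this sketch leaves 7-bit ASCII characters untouched, since they are handled by the node name expansion rules.

```python
import unicodedata

def escape_code_point(ch):
    """'_xxxx' up to 0xffff, '__xxxxxx' above (lowercase hex)."""
    cp = ord(ch)
    return "__%06x" % cp if cp > 0xFFFF else "_%04x" % cp

def expand_non_ascii(text):
    """Normalize to NFC, then escape every non-ASCII character."""
    text = unicodedata.normalize("NFC", text)
    return "".join(ch if ord(ch) < 0x80 else escape_code_point(ch)
                   for ch in text)
```

Because normalization comes first, ‘e’ followed by a combining acute accent is composed into a single character before escaping, whereas ‘B’ with a combining breve has no precomposed form and keeps its separate ‘_0306’.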

This method works fine if the node name consists mostly of ASCII characters and contains only a few 8-bit ones. If the document is written in a language whose script is not based on the Latin alphabet (such as Ukrainian), it will create file names consisting entirely of ‘_xxxx’ notations, which is inconvenient.

To handle such cases, makeinfo offers the ‘--transliterate-file-names’ command-line option. This option enables transliteration of node names into ASCII characters for the purposes of file name creation and referencing. The transliteration is based on phonetic principles, which makes the produced file names easily readable.

For the definition of Unicode Normalization Form C, see Unicode report UAX#15, http://www.unicode.org/reports/tr15/. Many related documents and implementations are available elsewhere on the web.

22.4.5 HTML Cross-reference Mismatch

As mentioned earlier (see section HTML Cross-reference Link Basics), the generating software has to guess whether a given manual being cross-referenced is available in split or monolithic form, and inevitably it might guess wrong. However, when the referent manual itself is generated, it is possible to handle at least some mismatches.

In the case where we assume the referent is split, but it is actually available in mono, the only recourse would be to generate a ‘manual/’ subdirectory full of HTML files which redirect back to the monolithic ‘manual.html’. Since this is essentially the same as a split manual in the first place, it’s not very appealing.

On the other hand, in the case where we assume the referent is mono, but it is actually available in split, it is possible to use JavaScript to redirect from the putatively monolithic ‘manual.html’ to the different ‘manual/node.html’ files. Here’s an example:

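// Send the browser from the putatively monolithic manual.html to the
// split file for the node named in the URL fragment.
function redirect() {
  switch (location.hash) {
    case "#Node1":
      location.replace("manual/Node1.html#Node1");
      break;
    case "#Node2":
      location.replace("manual/Node2.html#Node2");
      break;
    …
    default:
      // Unknown or empty fragment: stay on this page.
      break;
  }
}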

Then, in the <body> tag of ‘manual.html’:

<body onLoad="redirect();">

Once again, this is something the software which generated the referent manual has to do in advance; it is not something the software generating the actual cross-reference in the present manual can control.
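Software generating a split manual could produce such a redirect function mechanically from its list of nodes. Here is a hedged Python sketch; redirect_script is a hypothetical name, and we assume the node names are already expanded and begin with a letter, so that file names and targets coincide.

```python
def redirect_script(manual, node_ids):
    """Return JavaScript source redirecting manual.html#Node
    to manual/Node.html#Node for each listed node."""
    lines = ["function redirect() {",
             "  switch (location.hash) {"]
    for node_id in node_ids:
        lines.append('    case "#%s":' % node_id)
        lines.append('      location.replace("%s/%s.html#%s");'
                     % (manual, node_id, node_id))
        lines.append("      break;")
    # Unknown fragments fall through without redirecting.
    lines += ["    default:",
              "      break;",
              "  }",
              "}"]
    return "\n".join(lines)
```

The output would be placed in a script element of the stand-in ‘manual.html’ and invoked from its body tag as shown above.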

Ultimately, we hope to allow for an external configuration file to control which manuals are available from where, and how.

This document was generated by Aidan Kehoe on December 27, 2016 using texi2html 1.82.

22.1 HTML Translation		Details of the HTML output.
22.2 HTML Splitting		How HTML output is split.
22.3 HTML CSS		Influencing HTML output with Cascading Style Sheets.
22.4 HTML Cross-references		Cross-references in HTML output.