TTH: a "TEX to HTML" translator.

Version 3.87

Abstract

TTH translates TEX documents that use the Plain macro package or LATEX into HTML. It is extremely fast and completely portable. It produces web documents that are more compact and manageable, and faster-viewing, than those from other converters, because it really translates the equations instead of converting them into images.

Contents

1  Capabilities
    1.1  Plain TEX
        1.1.1  Mathematics
        1.1.2  Formatting and Macro Support
    1.2  LATEX
        1.2.1  Environments:
        1.2.2  LATEX Commands:
    1.3  Special TEX usage for TTH
    1.4  Unsupported Commands
2  Installation
3  Usage
4  Messages
5  Mathematics
    5.1  Equations
    5.2  In-line Equation Limitations
    5.3  Mathematics Layout Style Improvement using CSS
6  Features dependent on external programs.
    6.1  Independence of [La]TEX installation and the -L switch.
    6.2  BibTeX bibliographies
    6.3  Indexing
        6.3.1  Glossaries.
    6.4  Graphics Inclusion: epsfbox/includegraphics
    6.5  Picture Environments
7  Tabular Environment or Halign for Tables
    7.1  Tabular
    7.2  Halign
    7.3  Longtables
8  Boxes, Dimensions, and fills
9  TEX command definitions and other extensions
    9.1  Delimited-parameter macros and Conditionals
    9.2  Macro- and Style-file inclusion
    9.3  Layout to include arguments of unknown commands
    9.4  Restrictions on redefinition of internal commands
        9.4.1  Footnotes
10  Color
    10.1  LATEX Color
    10.2  Plain Color
    10.3  Limitations
11  HTML and output
    11.1  Formal HTML validation
    11.2  HTML Styles
12  Browser and Server Problems
    12.1  Accessing Symbol Fonts: Overview
    12.2  Accessing Symbol Fonts: Details
    12.3  Printing
    12.4  Netscape/Mozilla Composer
    12.5  Other Browser Bugs
    12.6  Web server problems
13  Code Critique
14  License
15  Acknowledgements
A  Appendix: Non-Standard TEX Macros
B  Appendix: Frequently Asked Questions
    B.1  Building and Running TTH
    B.2  [La]TeX constructs TTH does not seem to recognize
    B.3  HTML output that does not satisfy
    B.4  How to write TEX designed for Web publishing
    B.5  Formerly Frequently Asked Now Rarely Asked
Index

1  Capabilities

1.1  Plain TEX

1.1.1  Mathematics

Almost all of TEX's mathematics is supported with the exception of a few obscure symbols that are absent from the fonts normally available to browsers. Support includes, for example, in-line equations with subscripts and superscripts, display equations with built-up fractions, over accents, large delimiters, operators with limits; matrix, pmatrix, cases, [but not bordermatrix]; over/underbrace [but using a rule, not a brace].

1.1.2  Formatting and Macro Support

1.2  LATEX

LATEX support includes essentially all mathematics plus the following

1.2.1  Environments:

em, verbatim, center, flushright, verse, quotation, quote, itemize, enumerate, description, list [treated as if description], figure, table, tabular[*,x], equation, displaymath, eqnarray, math, array [not generally in in-line equations], thebibliography, [raw]html, index [as description], minipage [ignoring optional argument], longtable [but see 7.3].

1.2.2  LATEX Commands:

[re]newcommand, newenvironment, chapter, section, subsection, subsubsection, caption, label, ref, pageref [no number], emph, textit, texttt, textbf, centering, raggedleft, includegraphics, [e]psfig, title, author, date [maketitle ignored: title etc inserted when defined], lefteqn, frac, tableofcontents, input, include [as input, includeonly ignored], textcolor, color, footnote [ignoring optional arg], cite, bibitem, bibliography, tiny ... normalsize ... Huge, newcounter, setcounter, addtocounter, value [inside set or addto counter], arabic, the, stepcounter, newline, verb[*] [can't use @ as separator], bfseries, itshape, ttfamily, textsc, ensuremath, listoftables, listoffigures, newtheorem [no optional arguments permitted], today, printindex, boldmath, unboldmath, newfont, thanks, makeindex, index, @addtoreset, verbatiminput, paragraph, subparagraph, url, makebox, framebox, mbox, fbox, parbox [ignoring optional argument], definecolor, colorbox, fcolorbox [not in equations], pagecolor [discouraged], savebox, sbox, usebox.
These cover most of the vital LATEX constructs. Internal hypertext cross-references are automatically generated (e.g. by ref and tableofcontents) provided LATEX has previously been run on the document and the appropriate command-line switch is used.

1.3  Special TEX usage for TTH

A few non-standard TEX commands are supported as follows 3. See also 6.4.
\epsfbox{file.[e]ps} Puts in an anchor called "Figure" linked to 
    file.[e]ps (default), or alternatively calls user-supplied script 
    to convert the [e]ps file to a gif image and optionally inline it.  
\special{html:"tags"} inserts ``tags'' into the HTML e.g. for images etc.  
\href{reference}{anchor} highlights ``anchor'' with href=``reference''.
\url{URL} like \href but with URL providing both reference and anchor.
\begin{[raw]html} ... \end{[raw]html} environment passed direct to output.
\tthtensor Subscripts and superscripts immediately following, on simple
    characters, are stacked up in displaystyle equations, not staggered. 
\tthdump{...} The group is omitted by tth. Define \tthdump as a nop for TeX.
%%tth:... The rest of the comment line is passed to tth (not TeX) for parsing.
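
As an illustration of how \tthdump and %%tth: can be combined so that one source file serves both TeX and TTH, here is a minimal sketch. The pass-through \def of \tthdump is one way of making it a "nop for TeX"; it is assumed here that TTH keeps its own built-in meaning of \tthdump when it meets this definition. The sentences are placeholders and are not taken from any file in the distribution.

```latex
\documentclass{article}
% For TeX, \tthdump simply passes its argument through unchanged.
% TTH is assumed to keep its built-in \tthdump and omit the whole group.
\def\tthdump#1{#1}
\begin{document}
Text that appears in both the printed and the HTML version.

\tthdump{This sentence is typeset by TeX but omitted by TTH.}

%%tth:This sentence is parsed by TTH but is only a comment to TeX.
%%tth:\begin{html}<hr>\end{html}
\end{document}
```

The last line uses the html environment on a %%tth: comment line, so the raw tag reaches the HTML output while TeX never sees it.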

1.4  Unsupported Commands

When TTH encounters TEX constructs that it cannot handle either because there is no HTML equivalent, or because it is not clever enough, it tries to remove the mess they would otherwise cause in the HTML code, generally giving a warning of the action if it is not sure what it is doing. The following are not translated.
 \magnification \magstep etc : Removes the whole construct.
 Some boxes in equations.
 \raisebox, \lowerbox and similar usages.
 \accent, \mathaccent. 

2  Installation

The source for TTH is flex code which is processed to produce a C program tth.c which comprises the distribution. This file is compiled by
	gcc -o tth tth.c

or whatever C compiler you are using. Compilation takes typically less than a minute on a modern PC.
The executable should then be copied to whatever directory you want (preferably on your path of course). That's all!
Alternatively you may be able to obtain a precompiled executable from wherever you accessed this file. The Wind@ws executable comes with a batch file for installation. Just run it by the command "install".

3  Usage

The command line is as follows. The order of the parts is irrelevant. Switches (preceded by a minus sign -) can appear anywhere on the line. Square brackets should not be entered; they simply indicate optional parts of the command line.
Either filter style (the < and > are file redirection operators):
  tth [-a -c -d ... ] <file.tex [>file.html] [2>err]

Or, specifying the input file as an argument (output is then implied):
  tth [-a -c -d ... ] file[.tex] [2>err]

Switches:
   -a automatic picture environment conversion using latex2gif (default omit). 
   -c prefix header "Content-type: text/HTML" (for direct web serving).
   -d disable delimited definitions.
   -e? epsfbox handling: -e1 convert figure to gif using user-supplied ps2gif.
       -e2 convert and include inline. -e0 (default) no conversion, just ref. 
   -f? sets the depth of grouping to which fractions are constructed built-up
       -f5 (default) allows five levels built-up, -f0 none, -f9 lots.
   -g don't guess an HTML equivalent for font definitions, just remove.
   -h print help. -? print usage.
   -i use italic as default math font.
   -Lfile  tells tth the base file (no extension) for LaTeX auxiliary input, 
      enables LaTeX commands (e.g. \frac) without a \documentclass line.
   -n? HTML title format control. 0 raw. 1 expand macros. 2 expand equations.
   -ppath specify additional directories (path) to search for input files.
      -pNULL is a special switch that disables all \input or \includes.
   -r output raw HTML (no preamble or postlude) for inclusion in other HTML.
      -r2 omit just the time stamp. -r1 is equivalent to -r.
   -t display built-up items in textstyle equations (default in-line).
   -u? unicode character encoding. Default 2 (unicode 3.2). 0 (iso8859-1)
   -v give verbose commentary. 
   -w? html writing style: 0 no title construction. 1 use head/body. 2 XHTML.
   -y? equation style: bit 1 compress vertically; bit 2 inline overaccents.
   -xmakeindx  specify a non-standard makeindex command line.

With no arguments other than switches starting with a "-", the program is a filter, i.e. it reads from stdin and writes to stdout. In addition, diagnostic messages concerning its detection of unknown or untranslated constructs are sent to stderr. If these standard channels are not redirected using < and >, then the input is read from the keyboard, and both output and error messages are printed on the screen.
If a non-switch argument is present, it is assumed to be the name of the input file. The file must have extension ".tex" but the extension may be omitted. The output file is then constructed from the argument by removing the extension ".tex" if specified, and adding ".html".
TTH is extremely fast in default mode on any reasonable hardware. Conversion of even large TEX files should be a matter of a second or two. This makes it possible to use TTH in a CGI script to output HTML directly from TEX source if desired (stderr may then need to be redirected).

4  Messages

Messages about TTH's state and its assessment of the TEX it is translating are always output to the stderr stream, which normally displays on the console, but under Un*x type systems can be redirected to a file if necessary. Normally these messages are one of three types:

Error Messages

These start **** Error: and indicate some improper condition or error either in TTH or in the TEX of the file being translated. Some errors are fatal and cause TTH to stop. On others it will continue, but the TEX file probably should be corrected in order to get correct output.

Warnings

These start **** but without reporting Error. They are messages by which TTH indicates aspects of the translation process that may not be fully satisfactory, usually because of known limitations, but which quite likely will not prevent the translated file from displaying correctly, and so do not necessarily require intervention. Examples include the use of some dimensions, glue, or similar TEX commands that have no HTML equivalent.

Informational and external

Lines with no **** are either informational, meaning the state of the translation is not considered abnormal, or else they may come from external programs (e.g. makeindex), over which TTH has no control.
The switch -v causes more verbose messages to be output, which may be helpful for understanding why errors are reported. A higher level of verbosity -V can be invoked, but is intended primarily for internal debugging of TTH and will rarely be comprehensible!
The presumption that lies behind TTH message design is that the file being translated has been debugged using TEX or LATEX to remove syntax errors. TTH is not good at understanding or reporting TEX syntax errors and counts only the lines in the main TEX file, not those in files read by \input. Therefore error reporting by TTH does not reach even the low standard of clarity set by TEX and LATEX error messages. Although TEX files can be debugged using TTH alone, since it is very fast, the process is not recommended for inexpert TEX users. Moreover, since TTH understands both TEX and LATEX simultaneously, it can parse some files that TEX or LATEX separately cannot.

5  Mathematics

5.1  Equations

Equations are translated internally into HTML. TTH uses HTML tables for layout of built-up fractions in display equations. It also uses the HTML tag <font face="symbol"> to render Greek and large delimiters etc. Untranslatable TEX math tokens are inserted verbatim.
The internal approach to equation translation is a major area where TTH departs from the philosophy of LATEX2html and its derivatives. TTH does not use any images to try to represent hard-to-translate constructs like equations. Instead it uses the native ability of HTML to the fullest in providing a semantically correct rendering of the equation. The aesthetic qualities obtained are in practice no worse on average than LATEX2html's inlined images, which are generally slightly misaligned and of uncertain scaling relative to the text. Some limitations in the HTML code are inevitable, of course, but one ends up with a compact representation that can be rendered directly by the browser without the visitor having to download any additional helper code (e.g. Java equation renderer).
The option -i to TTH makes italic the default font within equations, and thus the style more TEX-like. The italic font appearance in browsers is not as satisfactory as TEX's math italic, so for many documents roman looks better.
Spacing in equations is handled slightly differently by TTH than by TEX. The reason is that most browsers use fonts that will crowd the characters horizontally too close for comfort in many cases (for example: M||/2). Also, built-up HTML equations are more spread out vertically than in TEX. Therefore TTH equations look better if spaces are added between some characters. So TTH does not remove spaces in the original TEX file between characters in equations. The author is thus able to control this detail of layout in the HTML without messing up their TEX file, since TEX will ignore any spaces inserted. Legacy TEX code that contains a lot of spurious whitespace (ignored by TEX) may, as a result, occasionally become too spread-out when translated.

5.2  In-line Equation Limitations

Some TEX capabilities are extremely difficult or impossible to translate into HTML, because of browser limitations, and are best avoided if possible. Arrays or matrices or built-up fractions in in-line equations cannot be properly supported because tables cannot be placed in-line in HTML. TTH output often will be strangely disjointed. As an option, TTH provides switch -t to convert inline equations that use built-up constructs into a sort of half display, half in-line equation. Each time a new in-line equation is encountered [$ ... $ or \( ... \)] that needs to be built up, an HTML table that starts a new line is begun, and the text then flows on afterwards. For example
$1 \over 2 + n$, which is set built-up with the 1 above the 2 + n.

This option gives a slightly strange layout for simple equations but is a big improvement for some situations, e.g. in-line matrices.
Likewise most over- and under-accents, and indeed anything that requires specific placement on the page other than simple subscript or superscript and underline, cannot be rendered in-line in plain HTML, although TTH will render them well in displaystyle. These latter constructs are nevertheless commonly used in in-line TEX. By default TTH renders these constructs in a relatively intuitive way. For example $\hat{a}$ is rendered [^a]. The result is rarely elegant but it is unambiguous.

5.3  Mathematics Layout Style Improvement using CSS

Some of the mathematics rendering limitations just mentioned can be overcome using Cascading Style Sheets. These are an extension of HTML that allows finer control over the layout. Most modern browsers support some fraction of the CSS specification, but historically its implementation has been slow and buggy. So it used to be awkward and dangerous to adopt CSS for authoring because of never knowing whether one was producing HTML that would simply be broken on a particular browser. Moreover using style sheets slows down the browser's rendering.
Vertical Compression of the otherwise sometimes rather spread-out mathematics is implemented in TTH using a simple built-in style sheet to reduce unwanted vertical space. The implementation works around the different idiosyncrasies of the browsers' implementations as well as it can, and is designed to degrade gracefully in browsers without CSS support or with the support switched off. This compression can be controlled by the switch -y, which permits a numeric argument, e.g. -y1. Compression is on by default, which corresponds to the first bit of the -y switch value being 1 (in other words the value is odd). It may be switched off by using the switch -y with an even numeric argument (or none at all), e.g. -y2 or -y.
In-line over-accents can be rendered explicitly using the relative positioning available in CSS2. The result is visually preferable to the indicative base rendering. However, it does not fall back gracefully, and the application of over-accents to multiple-character groups produces a poorly aligned result. Nevertheless, since TTH version 3.87 it is used as default. It can be turned off using the -y switch with the second bit (2) zeroed in the numeric argument. For example -y1 or -y0 turns it off.

6  Features dependent on external programs.

6.1  Independence of [La]TEX installation and the -L switch.

A major difference between TTH and LaTeX2HTML is that TTH does not call the LATEX or tex programs at all by default, and is not specifically dependent upon these, or indeed any other (e.g. PERL), programs being installed on the translating system. Its portability is therefore virtually universal.
Forward references in LATEX are handled by multiple passes that write auxiliary files. TTH does only a single pass through the source. If you want TTH to use LATEX constructs (e.g. tableofcontents, bibliographic commands, etc.) that depend on auxiliary files, then you do need to run LATEX on the code so that these files are generated. Alternatively, the TTH switch -a causes TTH automatically to attempt to run latex on the file, if no auxiliary file .aux exists.
When run specifying a filename on the command line as a non-switch argument, TTH constructs the name of the expected auxiliary LATEX files in the usual way and looks for them in the same directory as the file. If you are using TTH as a filter, you must tell TTH, using the switch -Lfilename, the base file name of these auxiliary files (which is the name of the original file omitting the extension). If TTH cannot find the relevant auxiliary file because you didn't run LATEX and generate the files or didn't include the switch, then it will omit the construct and warn you. Forward references via ref will not work if the .aux file is unavailable, but backward references will. The -L switch with no filename may be used to tell TTH that the document being translated is to be interpreted as a LATEX file even though it lacks the usual LATEX header commands. This may be useful for translating single equations that (unwisely) use the \frac command.
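
For instance, a file holding a single equation and no LATEX header might look like the sketch below. The file name fragment.tex is hypothetical. Because there is no \documentclass line, such a file would be passed through TTH with the bare -L switch, for example tth -L <fragment.tex >fragment.html, so that \frac is recognized.

```latex
% fragment.tex (hypothetical name): a lone display equation with no
% \documentclass line, so TTH needs -L to treat \frac as LaTeX.
$$ x = \frac{a + b}{c + d} $$
```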

6.2  BibTeX bibliographies

TTH supports bibliographies that are created by hand using \begin{thebibliography} etc. Such bibliographies do not require anything beyond the .aux file. TTH also supports bibliographies created using BibTEX from a bibliography database. The filename.bbl file is input at the correct place in the document. However, this filename.bbl is not created automatically by LATEX. In addition to running LATEX on the source file to create the auxiliary file, you must also execute bibtex filename in the same directory, to create the filename.bbl file, and then run LATEX again to get the references right. (This is, of course, no more than the standard procedure for using BibTEX with LATEX but it must be done if you want TTH to get your bibliography right). If you don't create the .bbl file, or if you create it somewhere else that TTH does not search, then naturally TTH won't find it. Since the BibTEX process is relatively tortuous, TTH offers an alternative. Using the -a switch with TTH will cause it to attempt to generate the required .bbl file automatically using BibTEX and LATEX.
There are many different styles for bibliographies, and a large number of different LATEX extension packages have grown up to implement them, which TTH does not support. More recently, a significant rationalization of the situation has been achieved by the package natbib. TTH has rudimentary support built in for its commands \citep and \citet in the default author-date form without a second optional argument. A style file for natbib is distributed with TTHgold which makes it possible to accommodate most of its more useful styles and commands and easily switch from author-date citation to numeric citation.
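
The simplest case described in this section, a bibliography written by hand, can be sketched as follows. The citation key and the bibliography entry are placeholders invented for the illustration, not real references. LATEX would be run on the file first so that the .aux file exists when TTH translates it.

```latex
\documentclass{article}
\begin{document}
Fast translation of equations is discussed in~\cite{example}.

% A hand-made bibliography: no BibTeX run and no .bbl file are involved.
\begin{thebibliography}{9}
\bibitem{example} A. N. Author, \emph{A Placeholder Title}, 2000.
\end{thebibliography}
\end{document}
```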

6.3  Indexing

TTH can make an extremely useful hyperlinked index using LATEX automatic indexing entries. But indexing an HTML document is different from indexing a printed document, because a printed index refers to page numbers, which have no meaning in HTML because there are no page breaks. TTH indexes LATEX documents by section number rather than by page; assuming, of course, that they have been prepared with index entries in the standard LATEX fashion.
When processing a LATEX file that contains the \makeindex command in its preamble, TTH will construct an appropriately cross-hyperlinked index that will be input when the command \printindex is encountered, which must be after all the index references \index{ ... } in the document. TTH does this independently of LATEX, but not of the subsidiary program makeindex that is normally used with LATEX to produce the final index. TTH creates its index entries in a file with extension .tid (Tth InDex). Unfortunately the standard form that makeindex expects for compound numbering of its sections or pages is "1-2", separated by a dash. TTH changes that to "1.2" using a point, and has to output a style file filename.mst, where filename is the base filename of the latex file being processed, to enable makeindex to handle this form. When the \printindex command is encountered, TTH closes the .tid file and runs the command
makeindex -o filename.tin filename.tid

on it. This creates an output file filename.tin, and then TTH reads that file in as its index. If, instead of creating an index file during TTH processing, one wants to use with TTH an index file already created, all that is needed is to remove the \makeindex command from the top of the LATEX source and copy the existing .ind file to a .tin file that will be input by \printindex. No indexing files will be written or deleted without a \makeindex command in the document.
The \makeindex command, if present, will also cause TTH to add a linked entry called "Index" to the end of any table of contents. This entry is a highly desirable feature for an HTML file, but if there is no \printindex command at the end of the document, the index will not exist, so the reference will be non-existent.
On some operating systems with file name length restrictions, the makeindex program is called makeindx. Therefore a TTH switch is provided: -xcommandline, which substitutes commandline for the default call makeindex. Therefore, -xmakeindx will switch to the correct program name on one of these limited operating systems. This switch also allows additional parameters or switches to be passed to makeindex. If the -xcommandline contains any spaces, then it is interpreted as the complete command-line (not just the first word of the command-line), in which the base filename may be referenced up to 3 times as "%s". For example -x"makeindex -s style.sty -o %s.tin %s.tid" will handle the index using a different style file "style.sty". If you don't have the makeindex program, you can't create indexes with TTH or LATEX, except by hand.
All of the index file processing naturally requires that TTH have write permission for the directory in which the original LATEX file (specified by the -L switch) resides.
Layout of the index can be controlled with the switch -j with an immediately following argument that specifies the minimum number of lines in a column before the column will be terminated. Because index entries are usually short, books almost always adopt a two-column format for the index. TTH will also do so by default, but since an HTML document has no page breaks, the question arises how long the individual columns are allowed to be. The default (no switch) is equivalent to -j20. A switch -j with no argument is equivalent to specifying a very large number of lines, with the result that only one column is used. A switch -j1 will cause the columns to break at every indexspace, that is generally at every new letter, so letter lists will alternate between columns.
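
The source-file side of the indexing procedure in this section can be sketched with a small LATEX document. The section titles and index entries are invented for the illustration, and the \usepackage{makeidx} line is there for LATEX's benefit (it provides \printindex); how TTH treats that line is not covered by this sketch.

```latex
\documentclass{article}
\usepackage{makeidx}  % LaTeX needs this for \printindex
\makeindex            % in the preamble: index entries are to be collected
\begin{document}
\section{Installation}
Compile the source\index{compilation} with any C compiler\index{compiler!C}.

\section{Usage}
TTH can be run as a filter\index{filter}.

\printindex           % must come after every \index{...} entry
\end{document}
```

If this file were called sample.tex (a hypothetical name), then by the description above TTH would write sample.tid and sample.mst, run makeindex to produce sample.tin, and read that file in at \printindex, with each entry referring to a section number rather than a page.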

6.3.1  Glossaries.

LATEX has a parallel set of commands for glossary construction, replacing "index" with "glossary". However, there is no \printglossary command and the .glo file that LATEX produces cannot be handled by the makeindex program without a specific style file being defined. Therefore glossary entries are highly specialized and rarely used. TTH does not support a glossary separate from the index. Instead it simply defines the command as \def\glossary{\index} with the result that glossary entries are placed in the index. It may be necessary to add \makeindex and \printindex commands to make TTH handle the glossary entries for a file that has only a \makeglossary command.
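
A minimal sketch of a file prepared along these lines is given below. The glossary entry is a placeholder, and \makeindex and \printindex have been added next to \makeglossary so that TTH has an index in which to place the glossary entries.

```latex
\documentclass{article}
\usepackage{makeidx}
\makeglossary   % what the original LaTeX file is assumed to contain
\makeindex      % added so that TTH collects index (and glossary) entries
\begin{document}
A translator\glossary{translator} converts one markup language to another.

\printindex     % added: TTH places the glossary entries in this index
\end{document}
```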

6.4  Graphics Inclusion: epsfbox/includegraphics

The standard way in plain TEX to include a graphic is using the epsf macros. The work is done by \epsfbox{file.[e]ps} which TTH can parse. By default TTH produces a simple link to such a postscript file, or indeed any format file.
Optionally TTH can use a more appropriate graphics format, possibly using a user-supplied (script or) program called ps2png or ps2gif to convert the postscript file to a png4 or gif file, "file.png" or "file.gif". ["file" is the name of the original postscript file without the extension and png or gif are interchangeable as far as matters for this description]. When the switch -e1 or -e2 is specified, if "file.png", "file.gif" or "file.jpg" already exists in the same directory as implied by the reference to "file.ps" then no conversion is done and the file found is used instead. That graphics file is then automatically either linked (-e1) or inlined (-e2) in the document. If no such file is found, TTH tries to find a postscript file with extension that starts either .ps or .eps and convert it, first using ps2png then, if unsuccessful, ps2gif. Linux (un*x) ps2png and ps2gif scripts using Ghostscript and the netpbm utilities for this purpose are included with the distribution. A comparable batch program can be constructed to work under other operating systems 5 or else the conversion can be done by hand. Naturally you need these utility programs or their equivalent on your system to do the conversion. The calling command-line for whatever ps2png (or gif) is supplied must be of the form:
ps2png inputfile.ext outputfile.ext
The program must have permission to write the outputfile (file.png) in the directory in which the file.ps resides.
By popular request, a third graphics option -e3 for generating icons is now available. If no previously translated graphics file, e.g. "file.png" exists, TTH passes to ps2gif (or png) a third argument consisting of the name, "file_icon.gif", of an icon file. ps2gif is expected to create it from the same postscript file. In other words the call becomes
ps2gif file.eps file.gif file_icon.gif
This third argument is then the file that is inlined, while the larger gif file named "file.gif" is linked such that clicking on the icon displays the full-size gif file. The icon will not be created if "file.gif" already exists, because ps2gif will not then be called.
The LATEX2e command \includegraphics{...} and the older \[e]psfig{file=...} are treated the same as \epsfbox. Their optional arguments are ignored.
If the extension is omitted for the graphics file specification, then .ps or .eps is tried. If the extension of the file specified is non-null and not .ps or .eps, no conversion is done but the file is referenced or in-lined as an image. In effect, then, TTH supports postscript, encapsulated postscript, gif, and jpeg, plus any future formats that become supported by common browsers. However, LATEX does not support these other formats, so it will give an error message if it can't find a postscript file, unless you specify the bounding box, thus preventing LATEX interrogating the file.
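
The two cases in the last paragraph can be illustrated with a short LATEX sketch. The file names plot and photo.jpg, the caption, and the bounding-box numbers are all hypothetical; the bb key shown is the graphicx package's way of stating a bounding box, which TTH ignores along with the other optional arguments.

```latex
\documentclass{article}
\usepackage{graphicx}
\begin{document}
\begin{figure}
  \centering
  % Extension omitted: TTH tries plot.ps or plot.eps.
  \includegraphics{plot}
  \caption{A placeholder caption.}
\end{figure}

% Non-postscript file: TTH references or in-lines it without conversion.
% The explicit bounding box keeps LaTeX from interrogating the file.
\includegraphics[bb=0 0 200 100]{photo.jpg}
\end{document}
```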

6.5  Picture Environments

The picture environment cannot be translated to HTML. Pictures using the built-in LATEX commands must be converted to a graphics file such as a gif, and then included using \includegraphics, see 6.4. The switch -a causes TTH to attempt automatic picture conversion using a user-supplied routine latex2gif. When this switch is used, TTH outputs the picture to a file picn.tex, where n is the number of the picture (if there does not already exist a file picn.gif). It then calls the command latex2gif picn which must be a command (e.g. a script using LATEX, dvips, etc.) on the system, which converts the file picn.tex to a file picn.gif. An example Linux script is included in the distribution but this conversion script is dependent on the system and so is entirely the user's responsibility. For viewing the results, the files picn.gif must be accessible to the browser in the same directory as the HTML files, then they will be included in-line. It is impossible for a picture environment to be converted in this automatic fashion if it contains macros defined somewhere else in the original LATEX file, because the macros will then be undefined in the picture file that is extracted, and LATEX will be stumped. In that case, manual intervention is necessary.

7  Tabular Environment or Halign for Tables

The tabular environment is the recommended way to construct tables in LATEX. In plain TEX, although \settabs etc. is supported, the \halign{ ... } command is recommended. (The LATEX tabbing environment is not supported by TTH because it is antithetical to the spirit of HTML document description, and because it is an extremely complicated construct. If you are lucky, TTH will not mess up your tabbing environment too much, but it makes no attempt to interpret it properly.) Considerable effort has been expended to translate the tabular environment, including interpreting the alignment argument of the environment, into as near an equivalent in HTML as reasonably achievable6. However, the limitations of HTML tables impose the following limitations on the translation.

7.1  Tabular

7.2  Halign

7.3  Longtables

8  Boxes, Dimensions, and fills

Boxes, dimensions, and fills are rarely appropriate for web documents because they imply an attempt to control the fine details of layout. Browsers make their own choices about layout of a document in HTML. For example they make the lines fit whatever size of window happens to be present. This dynamic formatting makes mincemeat of most detailed TEX layout. In fact, if you want your readers to see exactly what you see, that is impossible with HTML, and you should use some other representation of your document.
There are nevertheless many cases when a TEX document containing boxes, dimensions, and fills needs to be translated. Limited translation of these constructs is supported. They are translated, where appropriate and possible, into HTML tables with widths and vertical skips estimated to give a reasonable result on a browser. It must be stressed that accurate translation is inherently impossible because browsers deal in pixel sizes and default font sizes that vary and are out of the control of the publisher.
The types of box usage that translate quite well are when things like
\hbox to \hsize{The left \hfil the Right}
\vbox{\hsize=2in Matter to be set in horizontal mode to a 
  limited hsize}
\makebox[0.6\hsize][r]{Stuff to the right of the makebox.}
\framebox{check}

are on a line by themselves. You get:
The left the Right

Matter to be set in horizontal mode to a limited hsize
Stuff to the right of the makebox.

check
Usages that translate poorly tend to be boxes within a line of text. That is because current HTML table implementations have to start a new line unless they happen to be adjacent to a table already. Thus an hbox in a line will give a line break that you might not have wanted. This behaviour is really a bug in the browsers, but we are currently stuck with it. The behaviour of HTML tables is buggy [see 12.5] when their alignment is specified, which means that strange results are likely if more than one box on a line is being set. Boxes in equations are troublesome. The only type that is reasonably supported is \mbox which is often used in LATEX for introducing text inside equations.
Negative skips are not supported at all.
The only important dimension parameter that is currently interpreted is \hsize. It is what controls the width of a vbox. It can be reset using the plain TEX format e.g. \hsize = 3in or scaled or advanced e.g. \hsize=0.6\hsize but only within a group. It makes no sense for the HTML file to try to specify the width of the line at the outermost level. That is the browser's business.
New dimensions can be defined, set, advanced, scaled and used to set other dimensions including \hsize.
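As a rough illustration of the dimension handling just described, the following plain TEX fragment defines a new dimension, advances and scales it, and then uses it to set \hsize inside a group. The name \colwidth is invented for this example, and the width actually rendered depends on the browser.

```tex
\newdimen\colwidth          % \colwidth is a hypothetical name
\colwidth=2in               % set
\advance\colwidth by 1in    % advance: now 3in
\colwidth=0.5\colwidth      % scale: now 1.5in
\vbox{\hsize=\colwidth      % \hsize is changed only inside this group
  Matter to be set in horizontal mode to the narrower width.}
```

Because the assignment to \hsize sits inside the \vbox group, the line width at the outermost level is left to the browser.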
TTH tries valiantly to mimic the sort of text alignment that is obtained using glue such as \hfil and \hss, provided it is inside a box. However, the alignment algorithm of HTML tables makes it impossible to obtain fills with exactly equal sizes. So don't be surprised if some results look disagreeable. Moreover, TTH will completely ignore the glue outside an hbox, and it doesn't know the difference between \hfil and \hfill, etc.

9  TEX command definitions and other extensions

9.1  Delimited-parameter macros and Conditionals

Delimited parameter definitions are fully supported. However, macros in some style files are written in such a way that the recognition of the delimited parameter depends on other TEX behaviour (e.g. dimensions) that is not supported or is handled differently by TTH. In such cases it is all too possible for the delimited parameter not to be matched, resulting in a runaway argument situation. Thus, delimited parameter macros are especially dangerous when using TTH, or indeed any process other than TEX itself. (And they are never exactly "safe TEX".) The recognition of these definitions can be disabled using the -d switch, in which case the definitions are simply discarded.
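For readers unfamiliar with the term, here is a minimal sketch of a delimited-parameter definition. The macro name \point is invented for illustration; the parentheses and comma in the definition are the delimiters that must be matched literally in every use.

```tex
% \point is a hypothetical macro; "(", "," and ")" delimit its parameters.
\def\point(#1,#2){$x=#1$, $y=#2$}
\point(3,4)   % matched: #1 is 3 and #2 is 4
% If a use lacked the closing ")", the scan for #2 would run on through
% the following text: the runaway argument situation described above.
```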
Conditionals such as \if, \ifnum and so on are supported, as listed above (1.1.2). In TTH they have one syntax limitation. Further `if' commands are not permitted in, or as part of a command expanded in, the tokens, characters, or numbers being tested. Thus, an example of truly perverse usage such as
\ifnum 1=\if ab 1\else 2\fi  True \else False \fi
will likely break. Nested `if' constructs are permitted in the conditional text, however, so
\ifnum 1=1 True\if ab -true\else -false\fi \else False \fi
is fine. Because TTH does not internally resemble TEX, whereas the result of conditionals such as \if and \ifx may depend on internal representations, there cannot be 100% compatibility of such tests at the lowest level. Still, tests on externally defined commands ought generally to give correct results. When authoring documents in TEX one is generally well advised to avoid conditionals.
Although TTH supports a remarkably complete subset of LATEX, it does not support all of the complicated primitive details of TEX, partly because that would be unnecessary. For example, practically any TEX that redefines category codes (other than @ which TTH treats universally as a letter) will break because TTH knows nothing about the concept of category codes. (If you don't know much either, about this unfortunate aspect of TEX, join the vast majority of TEX users!) A related example is that TTH expects only letters or @ in user-defined command names, not punctuation characters etc.

9.2  Macro- and Style-file inclusion

Macro definitions are fully supported by TTH. However, special macro packages designed for a specific layout of journal or conference, for example, often use unsupported constructs such as catcode changes. It may then be inadvisable to use the macro package. TTH does not recognize the \usepackage command by default because the LATEX macros that are input by this command almost always contain catcode changes or other usages incompatible with TTH. That is another reason why TTH does not normally have directory paths defined the same as TEX. If a macro package is on the TEXINPUTS path it will be found by TEX but not by TTH. Thus, the macro definitions are included when "TEX"ing the file, but not when "TTH"ing it. It should be clear from this discussion, however, that TTH generally does not support any of the enormous number of extensions to LATEX unless they are mentioned in this manual, because most extension packages are incompatible with TTH.
TTH will find an input file if
  1. the full path is specified relative to the directory from which TTH is run, e.g.
    \input /home/myhome/mytexdir/mymacro.tex
  2. the -p switch specifies a path on which the file is found, or
  3. the TTHINPUTS environment variable is defined to be a path on which the file is found.
Paths are searched in this order until an appropriate file is found or all directory options are exhausted.
This policy provides a mechanism for making available the alternative package for TTH, without alteration of the original TEX files, by placing the (simplified) version of the macro package on the path TTH searches. An example using the -p switch might be
tth >file.html <file.tex -p/usr/local/tthinputs:~/mytthinputs

Since it is impossible to anticipate all style file incompatibilities, it must be the responsibility of the user (or the journal) to decide how to translate the concepts implemented in the original complicated macro package into simpler, TTH-compatible, TEX macros.
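A simplified stand-in for a style file might look like the following sketch. The file name jstyle.tex and the macros \keywords and \affiliation are invented for illustration; a real replacement has to define whichever commands of the original package the document actually uses.

```tex
% jstyle.tex: hypothetical simplified version of a journal macro file.
% Place it on the path TTH searches (the -p path or TTHINPUTS); TeX
% continues to find the original package on TEXINPUTS.
\def\keywords#1{\par\noindent{\bf Keywords:} #1\par}
\def\affiliation#1{\par\noindent{\it #1}\par}
```

The definitions use only letters in command names and no catcode changes, in keeping with the restrictions described in section 9.1.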
When TTH is used within a CGI script accepting arbitrary TEX for translation, its ability to input any file on the system is a serious security hole. It can be used to view all sorts of files on the system by \inputing them. Therefore a special switch -pNULL is provided that disables all \input or \include files.

9.3  Layout to include arguments of unknown commands

Unrecognized or undefined commands of the form \dothis{one}{two}{three} are treated by discarding all the following adjacent brace groups. A space between the close and open braces will terminate the discarded arguments and cause the following brace group(s) to be scanned as if they were ordinary text. This makes it possible to use formatting to make TEX code come out right in both TEX and HTML. For example, if TTH encounters a command written "\boxthis{width} {boxed material}", which might be designed in TEX to provide a width to a defined command and is written with a space after the first argument, it will ignore the width and scan the boxed material into the text.
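The following sketch shows one way the \boxthis example could be arranged. The definition given here is an assumption made for illustration, and it is taken to live in a macro file that TEX reads but TTH does not find, so that the command is unknown to TTH.

```tex
% In a macro file seen only by TeX (hypothetical definition):
\def\boxthis#1#2{\vbox{\hsize=#1 #2}}

% In the document: note the space between the two brace groups.
\boxthis{2in} {Boxed material that should appear in both outputs.}
```

TEX skips the space when it scans for the second undelimited argument, so both groups reach \boxthis. TTH, not knowing the command, discards only {2in} and scans the second group as text.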

9.4  Restrictions on redefinition of internal commands

In TTH (unlike TEX) most internal commands cannot normally be redefined; any redefinition will simply be ignored (except inside edef and a few other places). This prevents TTH from safely allowing use of major packages that redefine standard TEX commands. For example amsTEX redefines footnote to have just one argument, which will cause problems. This particular example is potentially a problem with LATEX too, which also redefines footnote. TTH handles this by keeping track of whether the file is LATEX or TEX; therefore you should not mix the two dialects in a single file even though there is no need to tell TTH explicitly which type the file is. (Besides, a mixed file will play havoc with TEX itself.)

9.4.1  Footnotes

Footnotes are placed together at the end of the document, or, in the case of TTHgold splitting files, in a separate file called footnote.html. The title of this end section is determined by the macro \tthfootnotes. By default this is "Footnotes", but can be redefined by the user at will, e.g. by \def\tthfootnotes{Tailnotes}.

10  Color

TTH supports the coloring of text using the color package macros for LATEX, supported by dvips (but not xdvi). TTH also supports the Plain TEX colordvi macros contained in the package colordvi.tex that do the same thing.

10.1  LATEX Color

The LATEX syntax is recommended because the 68 standard named colors (footnote 7) are directly supported internally by TTH using the named model. Any numerical CMYK, RGB or Gray color can also be prescribed. For example, each of the following commands is set in the color that it specifies (visible only where color is rendered): \textcolor[named]{BrickRed}{...}, \textcolor[rgb]{0.,.5,0.}{...}, \textcolor[cmyk]{0.,.5,0.,0.3}{...}. You can define custom colors in the usual way using, for example,
{\definecolor{Puce}{rgb}{1.,.5,.8}
\color{Puce} This is my own Puce.}

which gives "This is my own Puce."
The command \pagecolor is supported but discouraged. It is highly likely to give rise to an HTML file that will fail validation because it inserts an HTML tag <body bgcolor=...> which will not be in its correct position (immediately following the title). The only way to be certain to produce an HTML file that passes validation is to put the title and body commands in by hand, using e.g. \special{html:<title>...</title><body ...>}. Netscape seems not to mind a body tag out of order, but only the first one is able to set the page background color.
The commands \colorbox and \fcolorbox are supported via CSS style sheet commands. They will only work to set the background color of included text if the browser is set to use style sheets. "This sentence" is the result of the command \colorbox{green}{``This sentence''}. If it is colored, then your browser supports style sheets to this extent. If not, check your preferences settings.

10.2  Plain Color

The Plain TEX syntax using commands such as \Red{red text} requires the file colordvi.tex to be input prior to their use. But because TTH does not search the standard TEX paths, that file will not usually be found unless the full path is explicitly specified. If the file is not found, only the 8 standard colors
\Red, \Green, \Blue, \Cyan, \Magenta, \Yellow, \Black, and \White

are recognized internally by TTH. You can use the user-defined CMYK numeric style
\Color{0. .5 .5 0.}{pale red}

without the colordvi file. It gives the result "pale red" but the notation becomes cumbersome unless you define your color e.g. like
\def\redcolor{0. .5 .5 0.}
\Color{\redcolor}{The stuff that is red.}

Another difficulty with the colordvi command \textColor (which is the color switch - LATEX syntax reversed that usage and changed to comma-delimited arguments just to confuse us) is that it is a global setting. It then becomes almost impossible to maintain proper nesting of the closure of the font commands used for colors in HTML. As a result, use of \textColor often gives HTML files that won't pass HTML validation.

10.3  Limitations

Color commands do not propagate into different cells of HTML tables because of what may be regarded as a browser bug [12.5]. For that reason, tables and equations will not color correctly if the color commands enclose more than one cell (for tables) or equation element. Remember also that some computers may be limited in their color display capability, so the subtleties of colors will be lost in some circumstances.

11  HTML and output

11.1  Formal HTML validation

TTH takes as its standard HTML that can be rendered by Netscape and IE browsers versions 4 and higher (with the caveats above). The formal standard that TTH-translated documents follow is strictly HTML4.0 Transitional [1] (footnote 8). However, TTH does not formally validate its documents, and can be made to violate the standard by some TEX usage.
One reason for violation arises because HTML4.0 requires a <title>...</title> for every document. A title is constructed from LATEX files that contain the \title{...} command, in which case HTML conformance is ensured by putting the \title command before any text (i.e. in the preamble, where it belongs). If the \title command is not desired in the TEX file, for example because it is a plain TEX document, a title can be provided by the author for the HTML document by putting a line like this at the top of the TEX file.
%%tth:\begin{html}<title>Put the title here</title>\end{html}

This line will be ignored by TEX. Actually, any raw HTML output at the start of the file is assumed by TTH to indicate that the author has explicitly output a title. If no title indication of any of the above types is present, TTH attempts to construct a title from the first few plain words in the document, in much the way that the first line can become the title of a hymn.
If commands like \item that output material to the HTML file occur before the title has been constructed, the HTML title command will be out of order and the formal standard will be violated.
In the case where the title construction fails, or if some other TEX usage causes a violation of the formal standard, browsers will still render the output correctly if this manual is followed.

11.2  HTML Styles

There are good reasons why the <head> and <body> tags are by default omitted by TTH. See the FAQ [B.3] for a brief discussion. However, the evolution of HTML standards (not yet browsers) is towards imposing more restrictions on the freedom to omit tags. For example XHTML requires that containers have both opening and closing tags. Therefore TTH has a switch -w? (where the question mark denotes an optional integer) that controls its writing style as follows.
  Default    Construct title. Do not enter head and body tags.
  -w, -w0    Do not construct title or enter head/body tags.
  -w1        Enter head and body tags assuming that the title is the dividing point.
  -w2        Use XHTML syntax.
  -w4        Use block level font size commands between paragraphs.
At present, in addition to the default style that attempts to construct a title but does not enter head and body tags, -w or equivalently -w0 prevents TTH from attempting to construct a title or anything else in the way of head/body divisions. This style is best used for documents where the author has explicitly entered the required HTML tags. The switch -w1 invokes pedantic HTML style, which enters head and body tags under the assumption that the title (possibly constructed automatically) is the last thing in the head section. The style -w2 produces XHTML documents but requires cascading style sheet (CSS) support in the browser; otherwise the rendering will not be as satisfactory as the default.
Addition of four to the writing style index (e.g. -w4) makes TTH employ block-level font size commands if the size is changed immediately after a \par or implied paragraph. An additional CSS style sheet is inserted and, of course, the browser must support CSS. The purpose of this writing style is to accommodate tables and equations inside sections of larger or smaller text in a manner that will pass standards validation. According to the standard, HTML font changing commands like most others, are either of inline type, in which case they are forbidden to contain block level constructs like tables, or block type, in which case they force a new line and so can't be used within a paragraph. The -w4 switch can't universally fix this unnecessarily restrictive requirement of the standard (which most browsers wisely do not honor). There are situations where TEX usage is simply impossible to express in HTML. However, the -w4 switch does fix the vast majority of sensible usages.

12  Browser and Server Problems

TTH translates TEX into standard HTML and takes account as far as possible of the idiosyncrasies of the major browsers. Nevertheless, there are several problems that are associated with the browsers, and a few that are associated with web servers. Authors and publishers should recognize that these are not TTH bugs. Font-related problems are complicated. If you don't need all the gory details, you might want to read section 12.1 and then skip the details in section 12.2.

12.1  Accessing Symbol Fonts: Overview

Many of the most serious difficulties of Mathematics rendering in HTML are associated with the need for extra symbols. In addition to various Greek letters and mathematical operators, one needs access to the glyphs used to build up from parts the large brackets matching the height of built-up fractions. These symbols are almost universally present on systems with graphical browsers, which all have a "Symbol" font, generally based on that made freely available by Adobe. The problem lies in accessing the font because of shortcomings in the browsers and the HTML standards that relate to font use.
In brief, there are three ways to access the symbol fonts; these will be described in more detail below. The following table indicates which of these approaches to accessing the symbol fonts works with which browser. It also outlines which of the mathematics rendering improvements via CSS positioning are satisfactory.
                   |              Symbol Encoding              |      CSS Positioning
                   | 8-bit numeric  Adobe Private  Unicode 3.2 | relative height  compress
  TTH switch       | -u0            -u1            -u2         | -y2              -y1
  Browser:
  MSIE 5.0         | Yes            No             No          | Yes              Buggy
  Mozilla 1.x X    | Alias/Font     Buggy          Buggy       | Yes              Yes
  Firefox 1.x X    | Alias/Font     Buggy          Buggy       | Yes              Yes
  Firefox 1.x Win  | Yes            Buggy          Buggy       | Yes              Yes
  Konqueror 1.9.8  | Alias          No             No          | Yes              Yes
  Firefox 3.5 X    | No             Buggy          Ugly        | Yes              Yes
  Chrome 4.0 X     | No             Buggy          Ugly        | Yes              Yes
  Firefox 3.5 Win  | Yes            No             Buggy       | Yes              Yes
  MSIE 8.0 Win     | Yes            No             Ugly        | Yes              Yes
This situation is painful. The 8-bit numeric style symbol access method, which was the approach originally pioneered by TTH, used to work with a significant number of browsers but needed additional font settings for X-window systems. This is the approach that TTH used to use by default. However Mozilla and Firefox have systematically moved towards disabling this method under linux and OSX, presumably because they consider it not standards-compliant. They have not properly implemented the unicode 3.2 alternative, because the glyphs they use for built-up delimiters are incorrectly sized and leave ugly gaps. In some cases the spacing is completely erroneous. One is left with the choice between the traditional 8-bit approach, which works well with all MSWindows systems up to Vista, but does not work with most recent X-based operating systems; or Unicode 3.2 which works with most browsers, but is badly buggy in Windows Firefox and ugly everywhere.
In the interests of an eventual rationalization of this situation, TtH has changed to make the Unicode 3.2 coding its default from the 2010 version 3.87 on, but this is by no means universally satisfactory.

12.2  Accessing Symbol Fonts: Details

Prior to HTML4.0, that is, during the major phase of the evolution of HTML, the default encoding for HTML documents was ISO-8859-1 (sometimes called ISO Latin-1). The document encoding defines a mapping between the bytes of the file itself and characters. The HTML4.0 standard draws a strict (but often confused) distinction between the document "character set", sometimes referred to more recently as the character "repertoire" (which refers to all the characters that might be used in it), and the "document encoding" (which encodes a subset of the character set by mapping them to bytes). The confusion is compounded by the entrenched usage of the term "charset" to refer to the "document encoding" (not the character set). This usage is presumably a reflection of the prior lack of any significant distinction between the two.
Purists since the adoption of HTML4.0 regard the selection of a glyph as governed by the process: (byte) code → glyph-name → font-glyph. In this view, even though the font contains the glyphs in a well defined order, the glyph is accessed not by its position in the font but by its name. For example, in a document with ISO-8859-1 encoding, the byte with decimal value 97 maps to the "latin small letter a" which is accessed from the font on that basis. On this view, it is not possible, or rather ought not to be possible, to access the Greek letter alpha by specifying that the font is Symbol and the byte coding decimal value is 97, despite the fact that the Greek alpha is indeed in the same position in the Symbol font as the lower case a in its font. This is because (the story goes) 97 means "latin small letter a" and the Symbol font simply does not contain the latin small letter a.
In practice, of course, most browsers, including Internet Explorer (to 8.x), have not taken so pedantic an approach. In a document that is encoded in the same order as the fonts on the system, as is the case for ISO-8859 on systems other than the (old) Macintosh, the browser maps code to glyph directly on the basis of numeric position in the font. Therefore it is perfectly sensible to specify eight-bit code 97 and Symbol font to obtain alpha. In other words, the browsers treat the Symbol font as if it were an ISO-8859 font even though, as far as the glyph names are concerned, it is not. It can be argued, even within the world-view of standards lawyers, that a document that does not explicitly specify its encoding (and TTH documents do not) could be considered to obey its own font encoding or some unspecified encoding, in which case, bytes ought to be permitted to refer directly to numeric font positions, in just this fashion, regardless of whether the font is identified as ISO 8859. But such arguments are usually a waste of breath. In any case, recent versions of Mozilla and its derivatives on the Windows operating system will properly render symbols provided they are told that the DOCTYPE is HTML 4.0, not HTML 4.01. This is the reason why TTH has reverted to giving its documents this rather out of date DOCTYPE.
On the X-windows system, a distinction between fonts is provided directly in the system via the font naming conventions. Mozilla takes notice of this font allocation by permitting access only to fonts whose names end 8859-1, for default encoded documents. The symbol font is not one of those fonts unless additional steps are taken. The enabling of the symbol font requires specification of some system font aliases, or installation of a specially encoded Symbol font, which then ensures that the Symbol font is treated as if it were ISO-8859-1 encoded. Notice that this type of problem arises for any document that wants to access more than one language of font. Thus, any document desiring a mixture of, for example, western and Cyrillic characters would face the same problem.
To summarise, the symbol font is present on practically every computer on the planet that runs a graphical browser. Under the MSWindows operating system, IE up to version 8.x and Mozilla (gecko)-based browsers treat the symbol font as if it were a numerically encoded font and compatible with ISO 8859-1 encoding, provided the DOCTYPE is HTML 4.0 Transitional. Treating the font as such enables the glyphs to be accessed using eight-bit codes in just the same way as standard ASCII characters. This is the way that documents have accessed these glyphs for years.
The HTML4.01 standard says that unicode (ISO 10646, also called UCS) is the character set of HTML, and that the way characters outside the current document encoding should be accessed is through unicode points. Unicode is backwardly compatible with ISO 8859-1 in a way that we need not dwell on. Unicode is supposed to fix all the font problems that are described here, and with luck eventually it will indeed help. The problem is that (1) Unicode is enormous, so only a tiny fraction of it is so far supported, and (2) in its original incarnation unicode does not even assign points to the parts of large delimiters that are needed for mathematics. They are present in the new version of unicode, 3.2, becoming current. However, as the table above shows, no browser cleanly supports the new unicode assignments. Mozilla used to support some assignments of points in unicode's designated "private usage area" to the glyphs we need. Apparently these assignments have become de-facto standards for the Adobe Symbol font in typographic circles. No other browser supports them. They are not and, according to unicode principles, never will be part of the unicode standard, and appear to be on the way out.
The option that mathematics web publishing currently has, then, is either an approach that works with Windows browsers but which purists say is not consistent with latest standards, or a representation that is consistent with the standard but useless with some browsers. It would be really nice if the browsers would get their act together on mathematical symbols.

12.3  Printing

In many browsers, the printing fonts are hard coded into the browser and the font-changing commands are ignored when printing. For that reason, visitors viewing TTH documents will often not be able to print readable versions of documents with lots of mathematics. This problem could, and should, be fixed in the browsers. However, if you want your readers to be able to print a high-quality paper copy of the file, then you probably want to make available to them either the TEX source or a common page-description format such as Postscript or PDF. Since HTML documents download and display so much faster and better than these other formats on the screen, TTH's translation provides the natural medium for people to browse, but not necessarily the best medium for paper production.

12.4  Netscape/Mozilla Composer

Netscape Composer and Mozilla Composer are too clever for their own good. If you run an HTML document produced by TTH through Netscape Composer, all sorts of internal translations are performed that are detrimental to its eventual display. For example, if you subsequently save the document with the usual encoding set (Western), the eight-bit codes that work with Macs are replaced with HTML4.0 entities such as &ograve; or &pound;. This effectively breaks the document for viewing on Macs because it undoes everything just explained. Even if you use User-Defined encoding, which prevents this particular substitution, Composer will rearrange the document in various ways that it thinks are better, but that make the display of the document worse. The moral is, don't run TTH documents through Netscape Composer. You therefore cannot use the "publish" facility of Composer. Transferring the document to the server with plain old ftp will keep it away from Composer's clutches.

12.5  Other Browser Bugs

Font changing commands do not propagate from cell to cell of HTML tables. In rendering equations (using tables) TTH circumvents this bug (excuse me, feature) at the cost of significant extra effort and slightly verbose HTML. However, for tables generated by \halign or \begin{tabular} TTH takes no special steps to avoid this problem. A change of font face in a cell, for example by \it, will not carry over to the next cell. A document containing this problem will not pass some HTML validations. It is prevented if every cell of a TEX table is enclosed in braces and the required style applied separately to every cell: a serious annoyance.
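The per-cell workaround can be sketched as follows for a LATEX tabular. The cell contents are placeholders; the point is only that each cell carries its own braces and its own font command.

```tex
\begin{tabular}{ll}
{\it first cell}  & {\it second cell} \\
{\bf third cell}  & {\bf fourth cell} \\
\end{tabular}
```

Writing \it once at the start of a row and relying on it to carry across the & would, as explained above, affect only the first cell in the HTML output.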
Tables are incapable of being properly embedded within a line of text. They generally force a new line. This is quite a significant handicap when translating in-line material that could use a table. It can be argued that this behaviour is required by the HTML standard. Specifically, the <p> element is defined as having in-line attributes which prevent it from containing any elements defined as being block type, of which <table> or actually strictly <td> is one. However, even if you ensure that text is not inside a <p>, most browsers force a new line.

12.6  Web server problems

The HTML files that TTH produces are encoded using the charset ISO-8859-1, like most web files. In newer linux systems the default file encoding on the computer is in many cases now UTF-8. For the characters with codes above 128, this can cause problems with the web server. The web server may wrongly assume that the HTML file is a UTF-8-encoded file, and declare this assumption in the http content-type header that it sends to browsers when they access the file. For gecko-based browsers, the http content-type declaration overrides any internal file declaration of the encoding of the file. Consequently, the browser treats this file as if it is UTF-8 encoded, with the result that codes higher than 128 are misinterpreted. This is an inadequacy in the web server (apache is known to behave this way in some situations).
There are several options to work around this problem.
It is possible to convert all files from ISO-8859-1 to UTF-8 encoding, using a utility called iconv, present on most modern linux installations. This is not an attractive solution because then when the files are browsed locally (via file://...) they will display incorrectly. Locally, the browser does not have the http content-type declaration to guide (or misguide) it, and it thinks the files are ISO-8859-1 encoded. But if they've been converted, they are not.
The better approach seems to be to fix the web server so that it gets the file content-type right. This can be done on a per-directory basis by creating a file called .htaccess in the directory. This file should contain the line:
  AddType text/html;charset=ISO-8859-1 html

This tells the server that all files in this directory and its subdirectories that have extension html are to be considered of type HTML and encoded with the ISO-8859-1 charset.
Unfortunately some web servers are configured not to pay attention to the .htaccess file. If yours is one, you have to get the web master to edit the server configuration file (/etc/httpd/conf/httpd.conf). The lines that read AllowOverride None must read instead AllowOverride FileInfo. Alternatively, get the webmaster to change the line in that configuration file that reads AddDefaultCharset UTF-8 to read instead
AddDefaultCharset ISO-8859-1

and once the server is restarted all your troubles will be over without any of those pesky .htaccess files.
There are other ways of accomplishing the same thing in the web server, if you are a guru. Information is available at the W3C FAQ.

13  Code Critique

If you think you have found a bug, you can report it to tth(at)hutchinson.belmont.ma.us (with the usual character substituted in the email address). You are most likely to get help if your report is accompanied by the brief section of TEX code that causes the problem. Let me repeat, in addition to a brief description of the problem, send the TEX code, preferably a short section isolating the problem, in a document that can be processed by TEX. It is the only way for me to establish what the problem is. But please don't send LATEX2.09 files or files that do not conform to the (1994) LATEX users' guide. And please check this TTH manual and especially the FAQ (B) first.
The code has been compiled and run on Linux, MSDOS, Wind*ws, Open VMS, and sundry other operating systems. See http://hutchinson.belmont.ma.us/tth/platform.html.

14  License

TTH is copyright © Ian Hutchinson, 1997-2010.
You may freely use TTH. If you distribute any copies, you must include this file and these conditions must apply to the recipient. No warranty of fitness for any purpose whatever is given, intended, or implied. You use this software entirely at your own risk. If you choose to use TTH, or TTHgold, by your actions you acknowledge that any direct or consequential damage whatever is your responsibility, not mine.

15  Acknowledgements

Many thanks for useful discussions and input to Robert Curtis, Ken Yap, Paul Gomme, Michael Sanders, Michael Patra, Bryan Anderson, Wolfram Gloger, Ray Mines, John Murdie, David Johnson, Jonathan Barron, Michael Hirsch, Jon Nimmo, Alan Flavell, Ron Kumon, Magne Rudshaug, Rick Mabry, Andrew Trevorrow, Guy Albertelli II, Steve Schaefer and for bug reports from others too numerous to mention.

A  Appendix: Non-Standard TEX Macros

The following macro definitions, although not needed for TTH, will enable a TEX file that uses the non-standard TTH commands to be correctly parsed by Plain TEX.
\def\hyperlink#1#2{\special{html:<a href="\##1">}#2\special{html:</a>}}
  % Incorrect link name in \TeX\ because # can't be passed properly to a special.
\def\hypertarget#1#2{\special{html:<a name="#1">}#2\special{html:</a>}}
\long\def\tthdump#1{#1} % Do nothing. The following are not done for TtH.
\tthdump{%
\def\title#1{\bgroup\leftskip 0 pt plus1fill \rightskip 0 pt plus1fill
\pretolerance=100000 \lefthyphenmin=20 \righthyphenmin=20
\noindent #1 \par\egroup}% Centers a possibly multi-line title.
 \let\author=\title % Actually smaller font than title in \LaTeX.
 \input epsf     % PD package defines \epsfbox for figure inclusion
  % Macro for http reference inclusion, per hypertex.
 \def\href#1#2{\special{html:<a href="#1">}#2\special{html:</a>}}
 \def\urlend#1{#1\endgroup}
 \def\url{\begingroup \tt 
  \catcode`\_=13 % Don't know why this works.
  \catcode`\~=11 \catcode`\#=11 \catcode`\^=11 
  \catcode`\$=11 \catcode`\&=11 \catcode`\%=11
\urlend}% \url for plain \TeX.
}

B  Appendix: Frequently Asked Questions

B.1  Building and Running TTH

Why does my compiler crash when compiling TTH?  
TTH comes in the form of a single C source file because it is mostly one very large function which is produced by flex. It is completely standard C code but the size challenges compilers' capabilities, especially if you try to optimize using the -O switch. With gcc under Linux it is possible to compile an optimized version, but optimization hardly affects the speed and reduces the disk size of the already modest executable only by about 20%. Therefore it is no significant loss to compile without optimization. Under DOS, even unoptimized compilation can cause DJGPP to crash if its stack size is less than about 1024k. The fix (using stubedit on cc1.exe) for this DJGPP bug is described in its FAQ.
Why does my TTH executable, which I compiled myself, crash?  
Assuming that this is not a problem caused by invalid TEX, or by you poking around inside the C code, it is probably a compiler shortcoming. Some default settings of some compilers give TTH too little stack space and cause it to crash. Most self-respecting compilers have switches or settings to increase that space. Try increasing it, or get one of the binary distributions.
Why won't TTH run from Program Manager in Wind*ws?  
You need a command line. Call up the DOS prompt. If you feel the need for a drag and drop facility, get TTHgold.

B.2  [La]TeX constructs TTH does not seem to recognize

TTH does not recognize tableofcontents, backward references, listoffigures, ...  
Yes it does, see section 6.1, and use the -L switch.
TTH does not insert my picture environments.  
If picture environment pictures are to be included, conversion to a gif file is needed. See 6.5.
TTH messes up my tabbing environment.  
Tabbing is not currently supported. It is alien to the HTML document mark-up approach. See section 7.
Why doesn't \frac work in equations?  
It does, but only in LATEX documents because \frac is not a plain TEX command. The document you are presenting to TTH doubtless has no \documentclass command and other LATEX blurb at the top. If you insist on having LATEX commands available in such a document, you can use the -L switch. But note that other changes in interpretation (e.g. in footnotes) are implied by using this switch to tell TTH that this is a LATEX file.
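If you would rather not switch to LATEX interpretation, a plain TEX document can supply its own \frac built on the primitive \over. The following is only a sketch, not part of TTH; it assumes that TTH, like TEX, accepts a user definition of \frac when the -L switch is absent.

```tex
% Assumed user-level definition: plain TeX itself does not define \frac.
\def\frac#1#2{{#1\over #2}}
$$ y = \frac{a+b}{c} $$
\bye
```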
Why does TTH not recognize ... command from ... style package?  
Let's be perfectly clear here. TTH does not currently recognize \usepackage and, with the exception of commands explicitly mentioned in this manual, does not support any of the zillions of extensions to LATEX that exist, even if they are part of the "standard distribution". TTH does support macro definitions, see section 9.2, and you might find that if you explicitly \input the style file that you need it will recognize the macro. However, many LATEX extension packages are written in a complicated manner such that they depend on changes in catcodes, which TTH does not support. Therefore no guarantee can be given. This is one reason why TTH deliberately does not recognize \usepackage.
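As a sketch of the explicit \input approach, suppose the macros you need live in a file mymacros.sty (a hypothetical name) in the same directory as the document. LATEX loads it through \usepackage, which TTH does not recognize, so a line hidden from LATEX behind %%tth: can hand the same file to TTH. Whether TTH then digests the file depends entirely on how it is written, as explained above.

```tex
\documentclass{article}
\usepackage{mymacros}       % read by LaTeX; not recognized by TTH
%%tth:\input mymacros.sty
\begin{document}
Text that uses the macros defined in mymacros.sty.
\end{document}
```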
Why does TTH not recognize my ends of lines properly?  
If you transfer a file from one operating system to another as a binary file, the line-end codes are likely to be messed up, because Un*x, DOS, and Mac use different codes. Usually TEX is not bothered by this. TTH is somewhat more sensitive. Use ASCII transfer.
Why does TTH complain about my skip, space, ... command?  
Dimensions are often inappropriate for HTML. TTH tries to do something sensible with dimension, space, and glue commands. Usually it is successful. If so, you need do nothing. In some rare cases, you might see some irrelevant left-over characters from the dimension command that have to be removed by hand.
Can TTH be made to support BibTEX bibliographies?  
It already does; see 6.2. If TTH is not finding the .bbl file even though you used the -L switch, then you probably forgot to generate it using LATEX and BibTEX, or perhaps it is in the wrong place. Try using the TTH switch -a.
Does TTH support ...?  
Probably yes if it is part of LATEX. But if you want a specific additional capability, and find that it is not supported, why not write a TeX macro to support it and translate it into suitable HTML using the functions described in this manual? Then you will have your support and, if you send it to tth(at)hutchinson.belmont.ma.us (with the usual character substituted in the email address), it may be possible to include it in the standard TTH executable and you'll have helped all the other users of TTH.

B.3  HTML output that does not satisfy

Why doesn't TTH automatically generate <head> and <body> HTML tags?
First, the <head> and <body> tags are optional in the HTML specification. There is no need for TTH to generate them to satisfy the standard. Second, TEX and LATEX files do not have a corresponding structural division into separate head and body sections. It might seem as if LATEX does, with \begin{document} being the divider, but there are many cases where this mapping is incorrect. For example, the title may not be defined until after \begin{document}, corresponding to the HTML body section, whereas it must be in the head section. Finally, if TTH automatically entered <head> and <body> tags, then the thoughtful author would not be able to enter them where they ought to be by using, for example:
%%tth: \begin{html} <head> \end{html}
Therefore, the choice not to produce these tags automatically is a deliberate one based on a careful consideration of the advantages and disadvantages. An author can always adjust their TEX code to include them, if they wish to be pedantic about the division. See also the section on HTML style [11.2].
Why don't TEX commands get expanded in the HTML title?  
In HTML, the stuff that goes in the <title>...</title> of a page is not permitted by the specification to contain HTML tags - things in angle brackets - and tags are not interpreted. If an equation or some other command that TTH translates into HTML formatting is in the title, then the title will break when expanded. Therefore TTH deals with commands differently in the title. By default it leaves them in the TEX form that they started as, since that is about as easy to read as any unformatted mathematics. Using the -n? switch enables control of the precise behaviour. See 3.
How do I make TTH border my tabular table?  
TTH looks in the format string argument of the \begin{tabular} environment and if it begins with a | (vertical bar) then the HTML table is bordered.
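For illustration, the two tables below differ only in whether the format string starts with a vertical bar. The cell contents are invented for the example.

```tex
% Format string begins with |, so TTH borders the HTML table.
\begin{tabular}{|l|r|}
\hline
Item   & Count \\
\hline
Apples & 3  \\
Pears  & 12 \\
\hline
\end{tabular}

% Format string begins with l, so the HTML table has no border.
\begin{tabular}{lr}
Item   & Count \\
Apples & 3  \\
\end{tabular}
```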
TTH inserts the title and author even without the maketitle command  
True, TTH inserts them when you define them. This gives you a chance to fine-tune the presentation if you wish.
What is this strange result using \dot \hat \tilde \frac \vec ... in in-line equations?  
Neither over and under accents nor built-up constructs such as fractions can be rendered in-line (i.e. in a textstyle equation produced by $ ... $) in HTML. Therefore, TTH outputs something that is not elegant but reasonably indicates the original intention. Additional brackets are inserted to ensure that fractions are unambiguous. TTH will render all these built-up constructs correctly in a display equation. See also 5.2 and 5.3 for alternatives.
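A small LATEX fragment makes the distinction concrete. The same constructs appear once in a textstyle equation and once in a display equation; the exact HTML produced depends on the TTH version and switches in use, so no output is shown here.

```tex
In-line, $\frac{a}{b+c}$ and $\hat{x}$ come out in a linear, bracketed form.

In a display, they are built up as in print:
$$ \frac{a}{b+c} = \hat{x} $$
```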
Why does the large square root sign look so ugly?  
There are some things that browser symbol fonts can't do well.
Why does a dagger sign come out strange?  
Browsers don't generally have a dagger sign in their fonts. TTH uses a kludge.
The file I "published" using Netscape Composer looks messed up when viewed on a Mac.  
Don't use Composer on TTH documents. See section 12.4.
Why does TTH mess up my \fbox, minipage, etc?  
The whole concept of a "box" is not really translatable into HTML. TTH tries to mimic the box using tables. But in some cases, especially in equations, it can't cope.
How do I get calligraphic fonts, {\cal E}, AMS fonts, etc?  
You can't because browsers don't have access to them. TTH can only support fonts that are available on the browsers that eventually visit the page. By default TTH tells the browser to render calligraphic as italic helvetica font. You may, if you wish, define \cal to be something different, such as %%tth:\def\cal{\it\color{red}}.
Why does TTH turn double-quotes into an accent instead of quotes?  
In basic TEX the double quotes character " is not defined, and hence may do anything that the local installation feels like. Double quotes must be inserted by using two quote '' or back-quote `` characters. In German TEX implementations, the double-quotes character is used to provide the umlaut over accent and for some other special needs. TTH supports these German uses in some appropriate contexts. English speakers should adopt proper TEX quote usage. There is essentially never a situation in LATEX where it is advisable to use a double quote to represent itself outside of a verbatim section (where it will naturally be treated literally). In Plain TEX you might need it. If so, \char`" is an absolutely fool-proof way to insert it. Here it is:". You can also just enclose it in braces thus:{"}.
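The three usages just described look like this in a source file (the sentences themselves are made up for the example):

```tex
``This is quoted text,'' she said.   % opening `` and closing ''
The character \char`" is inserted by its character code.
So is {"} when it is enclosed in braces.
```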
Why doesn't TtH output use <p> for paragraphs?  
For the first years of its existence it did. However, standards of HTML interpretation have grown tighter to the point where <p> is a great liability. In XHTML (the latest HTML standard) <p> is a container element. It must have a closing </p>; so that every paragraph must be its own group. This compulsion is contrary to TEX usage. Therefore TtH changed to dispense completely with any use of <p>, using an empty <div> with an associated CSS style instead. This has the significant benefit of ensuring that for standards-compliant browsers, font changes propagate even into the cells of tables. (NS4 is not compliant, Mozilla, NS7 etc are, in this respect.)

B.4  How to write TEX designed for Web publishing

How do I insert code that is used only by TTH, not TEX?  
Use %%tth: followed by the material you wish to pass to TTH. TEX omits this line as a comment. Alternatively, insert \newif\iftth at the top of your document, then use a conditional: \iftth \TtH\ material \fi. TtH recognizes \iftth as a special `if' that is always true, whereas to TEX it is simply a new `if', which by default is false when defined.
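Both mechanisms can be seen side by side in a short plain TEX file. This is a sketch with invented text; only %%tth: and \iftth come from TTH.

```tex
\newif\iftth  % TeX: a new conditional, initially false. TTH: always true.
%%tth:This line is a comment to TeX but ordinary input to TTH.
\iftth
  This paragraph reaches only the HTML output.
\fi
This paragraph reaches both the printed and the HTML output.
\bye
```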
How do I insert HTML tags into my file without TEX knowing?  
Use %%tth: then on this line put \begin{html} tags \end{html}. Do not try to continue this html onto a second line with a second %%tth: before the \end{html} because the html environment will output the %%tth:, which is probably not what you want. Another way to pass codes directly to the output is the \special{html: ... } command. Do not use \begin{verbatim} to pass HTML tags. It will convert the greater than and less than signs to make them appear in the display and not be interpreted as tags.
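A sketch of the two methods follows. The link target index.html and the anchor name top are placeholders chosen for the example.

```tex
% The whole html environment sits on a single %%tth: line.
%%tth:\begin{html}<hr><a href="index.html">Back to the index</a>\end{html}

% \special{html:...} is seen by TeX too, which just writes it to the dvi file.
\special{html:<a name="top">}Top of the page\special{html:</a>}
```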
How do I insert code that is used only by TEX, not TTH?  
Insert \newif\iftth at the top of the file and then use the conditional construction:
\iftth\beginsection{The \TtH\ Header}\par\else\beginsection{The \TeX\ Header}\fi

The `else' clause may also be used with a blank first clause, of course: \iftth\else ... \fi. Alternatively, insert the definition \def\tthdump#1{#1} at the top of the file and then use \tthdump{\TeX\ material} to pass stuff only to TEX. The command \tthdump is an internal command for TTH (which cannot be redefined) that simply discards its argument. Thus, for example, the following will output alternate versions from TEX and TTH.
\def\tthdump#1{#1}
%%tth:\begin{html}<H1>The HTML Header</H1>\end{html}
\tthdump{\beginsection{The \TeX\ Header}\par}

How do I include the style file ...sty for the TEX paper I prepared for... journal?  
If you must, put it in the same directory as your .tex file and see if it works. If it crashes, you may have to write a simpler one. Remove the old style file. Look at your TEX file, or the TTH messages telling you which commands are unknown. Decide which of the journal's specific commands or environments you used or need. Write a little style file that defines them to do something simple and sensible, or translates them into standard LATEX commands. Or ask the journal to provide such a style file! If you are a journal publisher, distribute your simplified style file to your authors.
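A simplified style file of the kind suggested might look like the sketch below. The file name simple.sty and the commands \affiliation, \keywords and \received are hypothetical stand-ins for whatever your journal's style defines; the replacement text uses only ordinary LATEX commands, on the assumption that TTH supports them (see section 1.2).

```tex
% simple.sty: a hypothetical stand-in for a journal style file.
% Each journal command is mapped onto something simple in standard LaTeX.
\newcommand{\affiliation}[1]{\begin{center}\textit{#1}\end{center}}
\newcommand{\keywords}[1]{\par\noindent\textbf{Keywords:} #1\par}
\newcommand{\received}[1]{}   % discard material that has no use on the web
```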
In bordered tables I want an empty cell to look empty. How do I make TTH do that?  
HTML tables by default "fill in" an empty cell, so that it gives the visual impression of being absent. This is sometimes useful, so TTH does not prevent it. If you want it to look like an empty cell, put a non-break space in it by &~& in the TEX.
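For example, in the bordered table below the two rows differ only in their middle cell (the cell text is invented):

```tex
\begin{tabular}{|l|c|r|}
\hline
left & ~ & right \\   % middle cell holds a non-break space: looks empty
left &   & right \\   % middle cell has no content: HTML fills it in
\hline
\end{tabular}
```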
How do I include into a macro I am defining a # sign for an HTML reference?  
When you do \special{html:<a href="#reference">} TTH just puts the html tag in the output verbatim. TEX does essentially the same for its dvi file and the dvi processor later may or may not complain about not understanding it; but generally it is ignored. However if you try to define a macro like \def\localref{\special{html:<a href="#reference">}} then TEX will complain as follows:
! Illegal parameter number in definition of \localref.
<to be read again> 
                   r
l.3 \def\localref{\special{html:<a href="#r
                                            eference">}}
?
This problem is caused by TEX's syntax analysis of the contents of the definition. One solution is to hide the definition from TEX using %%tth:. An alternative definition that avoids this problem must also be included for TEX's benefit, for example thus:
\def\tthdump#1{#1}
\tthdump{\edef\localref{[a hyperreference]}}
%%tth:\def\localref{\special{html:<a href="#reference">}}

Alternatively, use \# in place of # in the hypertex reference. TTH specifically recognizes this as a literal and does appropriate translation. For example
\def\localhref#1#2{\special{html:<a href="\##1">}#2\special{html:</a>}}

will use its first parameter as a local anchor reference, preceded by #, and its second as the text of the anchor. The sequences \% and \\ are also treated as escaped literals, inserting their second character, inside a raw html section.
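As a usage sketch, the macro can be paired with \hypertarget, one of the non-standard TTH commands for which Appendix A gives a plain TEX definition. The anchor name results and the surrounding text are invented for the example.

```tex
\def\localhref#1#2{\special{html:<a href="\##1">}#2\special{html:</a>}}

\hypertarget{results}{Results}   % defines the anchor named results

See the \localhref{results}{results section} for details.
```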
How do I construct a macro to take as a single argument a URL, which may contain special TEX characters like _ ~ @ & etc, that makes TTH construct a hyperreference but TEX just enter it in the text?
Use the built-in command \url{...}. This behaves in essentially the same way as the command defined in LATEX's url.sty. The reference will appear verbatim in the text (in teletype font).
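A brief usage sketch: the first address is the platform page mentioned in section 13; the second is a made-up address on example.org that exercises some of the special characters. To process the same file with plain TEX, the \url definition from Appendix A is needed.

```tex
The platform notes are at
\url{http://hutchinson.belmont.ma.us/tth/platform.html}.

% Special characters are written as they stand inside \url:
\url{http://example.org/~user/my_file.html?a=1&b=2}
```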

B.5  Formerly Frequently Asked Now Rarely Asked

Why does TTH only manage to input a limited number of files, perhaps 15 or so, then report "file not found" after that?  
This is a limitation of the operating system. It has only a limited number of file handles available. In MSDOS this number is set by a command FILES=... in the operating system configuration file config.sys. It needs to be set to a number large enough to accommodate all the input or include files that your TEX document uses, plus whatever other file overhead the operating system is using. Under OS/2 a similar limitation exists and is avoided by increasing the number of allowable file handles in the emx run-time system (e.g. SET EMXOPT=-c -h400 in config.sys).
TTH seems not to work on WinNT when converting included PostScript files, even though my ps2gif program works fine from the command line. What is the problem?
The problem is not TTH. It appears to be an operating system problem. The batch program ps2gif is breaking for some strange reason when called from TTH. See footnote 5.
TTH does not recognize environment ... even though it claims to.  
Probably you left a spurious space, e.g. \begin {enumerate} between the \begin and the following brace. TTH occasionally won't accept that, even though LATEX does. It is bad style.

Index (showing section)


-a switch, 6.1, 6.2, 6.5, B.2
auxiliary files, 6.1

BibTEX, 6.2
bibtex, B.2
block level elements, 11.2
<body>, B.3
bugs, 13.0

calligraphic, B.3
catcodes, 9.1
CGI script, 3.0
character set, 12.2
color, 10.0
colordvi, 10.0
commands
     LATEX supported, 1.2
     alternative files, 9.2
     handling unsupported, 1.4
     redefining, 9.4
     renaming, 9.4
     unknown, 9.3
     unsupported, 1.4
compile, 2.0
Composer, B.3
compression
     vertical, 5.3
conditionals, see \if
CSS, 5.3

dagger, B.3
definitions
     delimited, 9.1
double-quotes, B.3

encoding, 12.2
environment
     not recognized, B.5
environments, 1.2
equations
     textstyle, see in-line equations, overaccents
Error, 4.0
extensions to LATEX, B.2

fbox, B.3
file not found, B.5
FILES, B.5
flex, 2.0
font
     face="symbol", 5.1
fonts, 1.1
     accessing, 12.0
     details, 12.2
footnotes, 9.4
frac command
     see switch -L, B.2

gif, 6.4
glossary, 6.3
graphics files, 6.4

\halign, 7.2
hash sign, B.4
<head>, B.3
\headline, 1.1
HTML
     3.2, 5.1
     4.0, 5.1
     insertion, 1.3
     tags, B.4

icons, 6.4
\if, 1.1, 9.1
iftth, B.4
in-line equations
     arrays, 5.2
     built-up display, 5.2
     fractions, 5.2
     overaccents, 5.2
\includegraphics, 6.4
index
     layout in one or two columns and the equivalent page length, 6.3
indexing, 6.3
\input
     "file not found" error, B.5
     disabling, 9.2
     TEXINPUTS, 9.2
     TTHINPUTS, 9.2
italic
     equation style, 5.1

jpeg, 6.4

LATEX extension packages, B.2
LaTeX2HTML
     differences, 5.1, 6.1
license, 14.0
limitations, 5.2
line-ends, B.2
longtable, 7.3

macro files, 9.2
macros
     alternate, 9.2
     special use, A.0
makeindex, 6.3
mathematics, 1.1
     layout style, 5.3
messages, 4.0

Netscape/Mozilla Composer, 12.4

<p>, B.3
picture environment, 6.5
portability, 6.1
postscript, 6.4
printing, 12.3
ps2gif, 6.4
ps2png, 6.4
publish
     through composer disallowed, 12.4

references
     forward, 6.1
\rm, 1.1

skip space and dimension commands, B.2
spacing, 5.1
square root, B.3
stderr, 3.0
stdin, 3.0
stdout, 3.0
Style Sheets, 5.3
styles, B.4
support, B.2
switches, 6.2, B.2
     -L, 3.0, 6.1, B.2
     -a, 6.1, 6.5
     -j, 6.3
     -u, 12.1
     -y1, 5.3, 12.1
     -y2, 5.3, 12.1
     TTH, 3.0
symbol font
     accessing, 12.0

Table of Contents
     Index entry, 6.3
tables
     bordered cells filled in, B.4
TEX-only code, B.4
texinputs path, 9.2
title
     HTML construction, 11.1
     TeX commands not expanded in, B.3
TTH-only code, B.4

unknown commands, see commands, unknown
URL, B.4
\usepackage, B.2
UTF-8, 12.6

warning, 4.0
web-server, 12.6
WinNT, B.5

Footnotes:

1The problem with \rm in text is that HTML has no <rm> tag, and relies on cancelling all previous (e.g.) <i> or <b> tags. By default (using style -y1) TTH uses Cascading Style Sheets to solve this problem. However not all older browsers support CSS and even in those that do, the user can turn off the CSS support. The best solution is to avoid \rm by using proper grouping of non-roman text. (In equations \rm is essential, but TTH has a work-around in equations.)
2Conditionals \if and \ifx are not 100% TEX compatible for cases where they refer to internal TEX commands because TTH internals are not identical. Catcodes are also unknown to TTH.
3See appendix for TEX macros supporting these commands
4The PNG graphics file format is an improved replacement for the GIF standard. Netscape has built in rendering for PNG. The GIF standard is plagued with legal problems related to a ridiculous patent on the type of file compression it uses.
5May 1999 reports indicated that there is a batch program in circulation bearing the comment ":#batchified by cschenk@snafu.de" that tries to implement the functionality of ps2gif and gives errors on WinNT when called by TTH but not when called from the command line. I have not had recent reports of problems, so I think this problem has been fixed.
6The alignment argument of the math array environment was ignored in TTH versions earlier than 2.20 but is now honored.
7See the file colordvi.tex for a list of the named colors.
8It proves to be better to specify 4.0 as the HTML Doctype because on some operating systems symbol font rendering is not honored for 4.01 documents.


File translated from TEX by TTHgold, version 3.81.
On 6 Feb 2010, 12:47.