Web Browser

If you click on [T], you can typeset the HTML document that you have opened or created, converting it into high-quality Portable Document Format with the help of the TeX engine, software developed by Professor Donald Ervin Knuth and his students at Stanford University around the year 1980. The kernel of TeX version 3.141592x is frozen and considered to be error-free, in contrast to most other software of that size, but the system is still flexible enough to be extended.

Between 1996 and 1998, support for right-to-left typesetting was implemented by Peter Breitenlohner, and Hàn Thế Thành integrated a post-processor which creates PDF output and embeds PostScript and TrueType fonts as well as JPG and PNG images. The additional code of both extensions is contained in the pdfetex binary.

Markup Shredder, which can use this wonderful piece of well-tested open source software, adds only 300 kilobytes of TeX macros and twice that amount of shell, batch and PHP scripts to implement an HTML parser and a font metric processor together with three simple user interfaces.

Traditionally, writers had to learn hundreds of proprietary commands to run the TeX typesetting engine. Markup Shredder wants to be TeX made easy: basic HTML and CSS knowledge should now be enough.

Like HTML-Tidy, which is described in the analyse section, GMS performs a syntax check of the input document’s markup and writes a log file that is given the same base name with a LOG extension. Each tag that has been processed is listed there, and error messages are inserted wherever the GMS macro layer or the TeX engine itself detects an irregularity.

HTML-Tidy and GMS do not always agree in their syntax check results. For instance, because GMS tries to help authors write valid XHTML documents, Markup Shredder might report missing end tags or missing quotation marks around attribute values where HTML-Tidy gives a strict HTML rating without warnings. Another difference is that Markup Shredder does not test whether lower case is used for all element and attribute names in documents that claim to be of type XHTML. GMS also tolerates the characters € to Ÿ (positions 128 to 159 in Microsoft’s Windows code page 1252), which the specifications do not allow, although Microsoft placed important characters there, such as dashes, the Euro sign, French œ ligatures and German „gaensefuesschens“.
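To see which of these characters a document actually contains, you could scan its bytes before typesetting. The following is only a sketch and not part of GMS: it assumes the file is stored in a single-byte encoding (such as Windows code page 1252), and the function name windows_1252_extras is made up for this illustration.

```python
def windows_1252_extras(data):
    """Return (offset, byte value, character) for every byte from 0x80 to 0x9F.

    Assumption: `data` holds a document in a single-byte encoding. In a
    UTF-8 file the same byte values occur as parts of multi-byte sequences,
    so the result would be meaningless there.
    """
    found = []
    for offset, byte in enumerate(data):
        if 0x80 <= byte <= 0x9F:
            # A few positions are unassigned in code page 1252; they are
            # shown as the Unicode replacement character.
            char = bytes([byte]).decode("cp1252", errors="replace")
            found.append((offset, byte, char))
    return found


if __name__ == "__main__":
    sample = "10 € for „quotes“".encode("cp1252")
    for offset, byte, char in windows_1252_extras(sample):
        print(offset, hex(byte), char)
```

An empty result means that the document stays within the range the specifications allow, at least with respect to these code positions.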

It is instructive to compare both test results with the HTML and XHTML specifications.

At the end of the log, you will find a link to the PDF output file. If GMS runs on a remote network computer, you can download and save the file by clicking on the link with the right mouse button and selecting "save target as" in the context menu.

Text Mode

You can typeset a given HTML file by pressing [T]. The result of the syntax check will be displayed in the text viewer; press [Q] (Linux) or [Esc] (Dos, Windows) to quit.

You can select another TeX binary (tex, etex, pdftex, or pdfetex) by pressing [S] and [P]. The TeX binary and its associated message pool file should be found in the search path or in [GMS_BINARIES], a sub-directory of [GMS_ROOT]/bin.

Command Line

Say gms -t /myfolder/myfile.htm (Linux) or gms /t x:\myfolder\myfile.htm (Dos, Windows) to typeset a markup document. If the file was opened or created before, it is sufficient to call gms -t or gms /t.

Alternatively, you can execute the command [GMS_BINARIES]/pdfetex -progname=gerolf [GMS_ROOT]/doc/default/default.htm, provided that the [TEXINPUTS] variable is set to [GMS_ROOT]/etc.
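If you want to start this direct call from a script of your own, the command and its environment could be assembled as in the following sketch. It is an illustration only: the function name build_typeset_command and the example directories are hypothetical, and the real location of [GMS_BINARIES] depends on your installation. Only the binary name, the -progname=gerolf option, the default document and the TEXINPUTS setting are taken from the description above.

```python
import os
from pathlib import Path


def build_typeset_command(gms_root, gms_binaries, document=None):
    """Return (argv, env) for a direct pdfetex run as described above.

    gms_root and gms_binaries stand for [GMS_ROOT] and [GMS_BINARIES];
    their actual values are installation-specific.
    """
    root = Path(gms_root)
    if document is None:
        # The example document named in the text.
        document = root / "doc" / "default" / "default.htm"
    argv = [
        str(Path(gms_binaries) / "pdfetex"),
        "-progname=gerolf",
        str(document),
    ]
    # TEXINPUTS must point to [GMS_ROOT]/etc for this call to work.
    env = dict(os.environ, TEXINPUTS=str(root / "etc"))
    return argv, env


if __name__ == "__main__":
    # Hypothetical installation paths, for illustration only.
    argv, env = build_typeset_command("/opt/gms", "/opt/gms/bin/linux")
    print(" ".join(argv))
    print("TEXINPUTS=" + env["TEXINPUTS"])
```

The returned pair can be handed to subprocess.run(argv, env=env) to start the run; the sketch itself only prints the command so that you can check it first.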


Thanks to the operating system’s disk buffer, the second typesetting run of an HTML input will be a bit faster than the first. In the text mode interface on Windows NT/XP, the process may run faster if the window is hidden. GMS indicates in the task bar whether it is still busy, or how long the run has taken. In the web browser interface, as well as in the other interfaces on Windows 9x, you get no feedback about the typesetting progress until the run is finished, though it might take some time.

Sometimes you may discover an error in your input file while TeX is still working. In the text mode interface or on the command line, you can cancel the process by pressing [Ctrl+C]. If you are asked what to do next, answer [X][Enter] to exit at this point, or press [Ctrl+C] again to cancel output production. On Dos and Windows, you are also asked: Cancel batch process (Y/N)? Answer [N][Enter] to return to the GMS menu.

If TeX stops with an exhausted memory message on a large document, increase the value that is assigned to the main_memory variable in texmf.cnf and initialize the TeX format file again.


The pdfTeX engine can embed images, but only those of type JPG (for photographs), PNG (for graphics with a reduced number of colors), or PDF (e.g. single pages created with pdfTeX and LaTeX). The popular GIF image file format is not supported, but you do not have to modify your HTML document: Just provide a JPG or PNG file with the same base name for every inserted GIF image and place it into the same directory.

So you can have low-resolution GIF images which are displayed by browsers, while GMS, when processing GIF requests, will look for matching JPG or PNG files that may have a higher resolution. If such an image file cannot be found, however, the replacement function may return a different file with the same name from another directory within the document search path.
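The lookup described above can be pictured with the following sketch. It is not the GMS implementation (GMS does this in its TeX macro layer) but a small illustration of the same idea. The function name find_replacement is made up, and the order in which JPG and PNG are tried is an assumption, since the text does not say which one wins when both exist.

```python
from pathlib import Path


def find_replacement(gif_path, search_path=(), extensions=(".jpg", ".png")):
    """Look for a JPG or PNG file that has the same base name as a GIF image.

    The directory of the GIF file is tried first, then every directory in
    search_path. Returns the first match as a Path, or None if no file fits.
    Assumption: JPG is preferred over PNG; swap `extensions` to change that.
    """
    gif = Path(gif_path)
    directories = [gif.parent] + [Path(d) for d in search_path]
    for directory in directories:
        for ext in extensions:
            candidate = directory / (gif.stem + ext)
            if candidate.is_file():
                return candidate
    return None
```

The sketch also shows why a file from another directory may be picked up: if nothing matches next to the GIF image, the search simply continues along the search path. Keeping base names unique across those directories avoids such surprises.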

You may change your document to use only PNG images, which avoids shipping each image in two different data formats, but Internet Explorer 3x and early Netscape Navigator 4x do not render PNG images. Using a JPG replacement for a GIF image will usually lead to a larger file size or a loss of quality. For ease of page design, GMS treats 1in = 25.4mm = 72pt = 72px, that is, 1px = 1pt, though this is not recommended by the CSS2 specification.
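When you plan a page layout, the unit equation above lets you convert any length into points (and therefore into pixels) with one multiplication. The following sketch only restates that equation in code; the names UNITS_IN_PT and to_points are made up for this illustration and do not exist in GMS.

```python
# Length of one unit in points, following 1in = 25.4mm = 72pt = 72px.
UNITS_IN_PT = {
    "pt": 1.0,
    "px": 1.0,          # GMS treats 1px = 1pt
    "in": 72.0,
    "mm": 72.0 / 25.4,
}


def to_points(value, unit):
    """Convert a length to points under the GMS unit equation."""
    return value * UNITS_IN_PT[unit]


if __name__ == "__main__":
    # Width of a 210 mm page, rounded to two decimal places.
    print(round(to_points(210, "mm"), 2))
```

Because px and pt are identical under this rule, an image that is 360 pixels wide occupies 360pt, or 5 inches, on the printed page.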


Here’s the main trick to fine-tune an HTML document for print via GMS and Acrobat Reader without changing its appearance in a browser on screen:


You can select the language of your document and the corresponding hyphenation rules by saying <html lang = "en-UK">, for example. The codes for the representation of names of languages are defined in ISO 639-2. If you discover wrong or missing hyphenation in the output PDF produced by GMS, use soft hyphens, like man&shy;u&shy;script or ap&shy;pen&shy;dix. Old browsers like Internet Explorer 3x and Netscape Navigator 4x, however, display &shy; as a dash that is always visible.

Between 1996 and 2006, reformed spelling and hyphenation rules for German were established. In GMS, you can enable reformed German hyphenation rules by saying <html lang = "de-rf"> in your document. For traditional texts, however, the <html lang = "de"> declaration alone is insufficient, because there are extra rules that modify the spelling of some words when they are split between lines. You therefore have to write Bä<span>ck</span>er and Be<span>tt</span>uch to tell GMS to take a sonderweg leading to Bäcker/Bäk-ker and Bettuch/Bett-tuch.

In any language, if a word is split in a way that you do not like, you can restrict hyphenation to the desired places by inserting soft hyphens &shy;. A word will not be split if it is enclosed by an inline-level element like <span>, unless a trailing space or punctuation mark is enclosed too.


Since version 0.06a, Gerolf Markup Shredder supports genealogical data markup according to the Gedcom XML 6.0 Specification. While you can open the example file gedcom60.xml directly with the text mode interface, the web browser interface will refuse to do so. Therefore a copy of this file named gedcom60.htm is still required.

For languages other than English, you have to modify the generated content which is defined at the end of gedcom60.css, e.g. ‘Gender:before {content: "Geschlecht: "}’ for German. Internet Explorer does not generate this content, so use Mozilla Firefox or Opera for browsing. The gedcom60.pdf output file produced by GMS includes images as generated content for <URI> elements, if JPG or PNG files can be found locally.