If you click on [T], you can typeset the HTML document that you have opened or created, converting it into high-quality Portable Document Format (PDF) with the help of the TeX engine, a program developed by Professor Donald Ervin Knuth and his students at Stanford University around 1980. The kernel of TeX version 3.141592x is frozen and considered to be error-free, in contrast to most other software of that size, but the system is still flexible enough to be extended.
Between 1996 and 1998, support for right-to-left typesetting was implemented by Peter Breitenlohner, and Hàn Thế Thành extended the engine so that it writes PDF output directly and embeds PostScript and TrueType fonts as well as JPG and PNG images. Both extensions are combined in the pdfetex binary.
Markup Shredder, which can use this wonderful piece of well-tested open source software, adds only 300 kilobytes of TeX macros and twice that amount of shell, batch and PHP scripts to implement an HTML parser and a font metric processor together with three simple user interfaces.
Traditionally, writers had to learn hundreds of specialized commands to use the TeX typesetting engine. Markup Shredder aims to be TeX made easy: basic HTML and CSS knowledge should now be enough.
Like HTML-Tidy, which is described in the analyse section, GMS performs a syntax check of the input document’s markup and creates a log file with the same base name and a LOG extension. Each processed tag is listed in it, and error messages are inserted wherever the GMS macro layer or the TeX engine itself detects an irregularity.
HTML-Tidy and GMS do not always agree in their syntax check results. For instance, Markup Shredder may report missing end tags or missing quotation marks around attribute values where HTML-Tidy rates a document as strict HTML without any warnings, because GMS tries to help authors write valid XHTML documents. Conversely, Markup Shredder does not test whether all element and attribute names are in lower case in documents that claim to be XHTML. GMS also tolerates the characters 128 to 159 (€ to Ÿ in the Windows-1252 code page), which the specifications do not allow, though Microsoft placed important characters there, such as dashes, the Euro sign, the French œ ligature and the German „Gänsefüßchen“ quotation marks.
It is instructive to compare both test results with the HTML and XHTML specifications.
At the end of the log, you will find a link to the PDF output file. If GMS runs on a remote computer, you can download and save the file by right-clicking the link and selecting Save Target As from the context menu.
You can typeset a given HTML file by pressing [T]. The result of the syntax check will be displayed in the text viewer; press [Q] (Linux) or [Esc] (DOS, Windows) to quit.
You can select another TeX binary (tex, etex, pdftex, or pdfetex) by pressing [S] and [P]. The TeX binary and its associated message pool file must be available in the search path or in [GMS_BINARIES], a subdirectory of [GMS_ROOT]/bin.
Type gms -t /myfolder/myfile.htm (Linux) or gms /t x:\myfolder\myfile.htm (DOS, Windows) to typeset a markup document. If the file was opened or created before, it is sufficient to call gms -t or gms /t.
Alternatively, you can execute the command [GMS_BINARIES]/pdfetex -progname=gerolf [GMS_ROOT]/doc/default/default.htm, provided that the [TEXINPUTS] variable is set to [GMS_ROOT]/etc.
Thanks to the operating system’s disk cache, a second typesetting run on the same HTML input is a bit faster than the first. In the text mode interface on Windows NT/XP, you can speed up the process by hiding the window; the task bar then tells you whether GMS is still busy or how long the run has taken. In the web browser interface, as well as in the other interfaces on Windows 9x, you get no feedback about the typesetting progress until it has finished, even if the run takes a while.
Sometimes you may discover an error in your input file while TeX is still working. In the text mode interface or on the command line, you can cancel the process by pressing [Ctrl+C]. If TeX then asks what to do next, answer [X][Enter] to exit at this point, or press [Ctrl+C] again to cancel output production. On DOS and Windows, you are also asked: Terminate batch job (Y/N)? Answer [N][Enter] to return to the GMS menu.
If TeX aborts with a message that its memory is exhausted on a large document, increase the value assigned to the main_memory variable in texmf.cnf and regenerate the TeX format file.
The pdfTeX engine can embed images, but only those of type JPG (for photographs), PNG (for graphics with a reduced number of colors), or PDF (e.g. single pages created with pdfTeX and LaTeX). The popular GIF image file format is not supported, but you do not have to modify your HTML document: just provide a JPG or PNG file with the same base name for every inserted GIF image and place it in the same directory.
This way, browsers can display low-resolution GIF images, while GMS, whenever a GIF image is referenced, looks for a matching JPG or PNG file that may have a higher resolution. If no such file exists in the same directory, however, the replacement function may return an unrelated file with the same name from another directory in the document search path.
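The lookup described above can be modelled roughly as follows. This is a minimal Python sketch under stated assumptions, not the actual GMS implementation: the function name find_print_image and the order in which the extensions are tried (JPG before PNG) are hypothetical, and the sketch only checks the GIF’s own directory, so it deliberately does not reproduce the search-path fallback mentioned above.

```python
from pathlib import Path
from typing import Optional

# Hypothetical order of preference; the order GMS really uses is not documented here.
REPLACEMENT_SUFFIXES = (".jpg", ".png")


def find_print_image(image_path: str) -> Optional[Path]:
    """Return the file to embed for image_path.

    Non-GIF images are returned unchanged. For a GIF, look for a JPG or PNG
    with the same base name in the same directory and return None if there
    is none (the sketch does not search other directories).
    """
    image = Path(image_path)
    if image.suffix.lower() != ".gif":
        # JPG, PNG and PDF can be embedded directly by pdfTeX.
        return image
    for suffix in REPLACEMENT_SUFFIXES:
        # Note: on case-sensitive file systems, LOGO.JPG would not match.
        candidate = image.with_suffix(suffix)
        if candidate.is_file():
            return candidate
    return None
```

With logo.gif and logo.png side by side, find_print_image("logo.gif") returns the PNG; once logo.jpg is added as well, the JPG wins because it is tried first in this sketch.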
You may change your document to use only PNG images to avoid shipping them in two different data formats, but Internet Explorer 3.x and early versions of Netscape Navigator 4.x do not render PNG images. Using a JPG replacement for a GIF image will usually lead to a larger file size or a loss of quality. Just for ease of page design, GMS treats 1in = 25.4mm = 72pt = 72px, i.e. 1px = 1pt, though the CSS2 specification does not recommend this.
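To make this unit convention concrete, the following Python sketch converts a few absolute CSS lengths to points, counting px exactly like pt. The names POINTS_PER_UNIT and to_points are hypothetical and GMS’s own unit handling may differ; cm is derived from 1cm = 10mm, and relative units such as em or % are deliberately rejected.

```python
# Points per unit under the convention 1in = 25.4mm = 72pt = 72px.
POINTS_PER_UNIT = {
    "in": 72.0,
    "cm": 72.0 / 2.54,
    "mm": 72.0 / 25.4,
    "pt": 1.0,
    "px": 1.0,  # GMS convention (1px = 1pt); not what CSS2 recommends
}


def to_points(length: str) -> float:
    """Convert an absolute CSS length such as '8cm' or '50px' to points."""
    length = length.strip()
    value, unit = length[:-2], length[-2:].lower()
    if unit not in POINTS_PER_UNIT:
        raise ValueError(f"unsupported or relative unit in {length!r}")
    return float(value) * POINTS_PER_UNIT[unit]
```

With these factors, to_points("8cm") is about 226.8pt, and to_points("72px") equals to_points("1in").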
Here’s the main trick to fine-tune an HTML document for print via GMS and Acrobat Reader without changing its appearance in a browser on screen:
Create two style sheets, screen.css and print.css, and link them to the document’s <head> element, saying:
<link rel="stylesheet" type="text/css" href="screen.css" media="screen" />
<link rel="stylesheet" type="text/css" href="print.css" media="print" />
To control the width of a table cell, add a class attribute to its definition tag, <td class="td1">, and an entry like .td1 {width: 8cm} to print.css (you can then still rely on your browser’s automatic column widths for screen rendering), or add .td1 {width: 50%} to screen.css.
Define .noscreen {display: none} in screen.css and .noprint {display: none} in print.css. Now, if you start your document body with Good <span class="noprint">morning</span><span class="noscreen">evening</span>, the rendering reads ‘Good morning’ in a browser on screen and ‘Good evening’ in the PDF output of GMS.
The media attribute applies to the <style> element as well, so you do not need external CSS files.
media value and thus forcing authors to optimize their pages for Microsoft products. So it may be necessary to load an empty dummy file as the last style sheet.
You can select the language of your document and
the corresponding hyphenation rules by saying <html lang="en-UK">, for example. The codes for the representation of names of languages are defined in ISO 639-2. If you discover wrong or missing hyphenation in the PDF output produced by GMS, use soft hyphens (&shy;), as in man&shy;u&shy;script or ap&shy;pen&shy;dix. Old browsers like Internet Explorer 3.x and Netscape Navigator 4.x, however, display &shy; as a hyphen that is always visible.
Between 1996 and 2006, reformed spelling and hyphenation rules for German were established. In GMS, you can enable reformed German hyphenation rules by saying <html lang="de-rf"> in your document. For traditional texts, however, the <html lang="de"> declaration is insufficient, because extra rules change the spelling of some words when they are split between lines. Thus you have to write Bä<span>ck</span>er and Be<span>tt</span>uch to tell GMS to take this special path, which yields the breaks Bäk-ker for Bäcker and Bett-tuch for Bettuch.
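The two traditional spelling changes at a line break can be illustrated with a small Python sketch. It is not how GMS implements the rule internally; it only models the two cases named above for a word that is split at the letters enclosed in the <span>, and the function name split_traditional is hypothetical. Deciding where a word may be broken at all is left to the hyphenation rules and is not modelled here.

```python
from typing import Tuple


def split_traditional(before: str, marked: str, after: str) -> Tuple[str, str]:
    """Split a word written as before<span>marked</span>after at the marked
    letters, applying the traditional German spelling changes.

    Returns the end of the first line (including the hyphen) and the start
    of the next line.
    """
    if marked == "ck":
        # Traditional rule: "ck" is broken as "k-k" (Bäcker -> Bäk-ker).
        return before + "k-", "k" + after
    if len(marked) == 2 and marked[0] == marked[1]:
        # A compound that dropped one of three equal consonants before a
        # vowel regains the third one at the break (Bettuch -> Bett-tuch).
        return before + marked + "-", marked[0] + after
    raise ValueError(f"no special break rule for {marked!r}")
```

For example, split_traditional("Schi", "ff", "ahrt") returns ("Schiff-", "fahrt"), the traditional break of Schiffahrt.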
In any language, if a word is split in a way that you do not like, you can restrict hyphenation to the desired positions by inserting soft hyphens (&shy;). A word will not be split at all if it is enclosed in an inline element like <span>, unless a trailing space or punctuation mark is enclosed as well.
Since version 0.06a, Gerolf Markup Shredder supports genealogical data markup according to the Gedcom XML 6.0 Specification. While you can open the example file gedcom60.xml directly in the text mode interface, the web browser interface refuses to do so. Therefore a copy of this file named gedcom60.htm is still required.
For languages other than English, you have to modify the generated content defined at the end of gedcom60.css, e.g. ‘Gender:before {content: "Geschlecht: "}’ for German. Internet Explorer does not generate this content, so use Mozilla Firefox or Opera for browsing. The gedcom60.pdf output file produced by GMS includes images as generated content for <URI> elements, if JPG or PNG files can be found locally.