SGML Doctype Descriptions:

              Doctype Classes
                   |
                   |
                Doctype (Generic Document Type)
                   |
                   |
                   ^
                SGMLNORM
               /        \
             SGML        XML

SGML:

  A full validating SGML parser. It works in two passes. In the first pass the document is parsed, validated, and normalized into a canonical form and placed into a persistent cache. In the second pass the canonical document is parsed and processed by the SGMLNORM handler.

The MIME type (for CGI applications) is: text/sgml

XML:

"Beta Release (subject to change)"

The Extensible Markup Language (XML) is a language derived from SGML. XML has been designed to retain most of the power of SGML but with a substantially simpler grammar that is suited to tools such as yacc and lex.

The motivation for XML is to provide:

  • A structured open document type for distributed applications working on large-scale networks such as the Internet.
  • A simple language compatible with SGML.
  • A document format that is easy to author, even by hand.
  • An inter-operable basis for push delivery models.

The most significant features of XML are:

  • A document can be processed without need of the DTD.
  • All container tags are normalized.
  • Unicode (UCS-2) character set support via &#nnnn; style entities.
  • Empty tags are explicit using a <tag/> markup convention.
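
The features above can be illustrated with a small sketch. It uses Python's standard xml.etree.ElementTree module, not the XML doctype itself, and the sample document (SAMPLE, with its doc, title, sep and body tags) is made up for illustration. It shows a document parsed without any DTD, a &#nnnn; style entity being decoded, and an explicit empty tag.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: no DOCTYPE declaration and no DTD anywhere.
SAMPLE = "<doc><title>Caf&#233; menu</title><sep/><body>Open daily</body></doc>"

# Parsing succeeds without a DTD because every container is explicitly closed.
root = ET.fromstring(SAMPLE)

# The numeric entity &#233; is decoded to the corresponding Unicode character.
title = root.find("title").text

# <sep/> is an explicit empty tag: it has no text and no children.
sep = root.find("sep")

print(title)
print(sep.text is None and len(sep) == 0)
```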

The emerging XML specification is currently being tracked. The XML doctype supports the draft specification as well as a number of enhancements. It does not provide any significant language validation and, since it is a child of SGMLNORM, it also supports several SGML constructs that are not, and will not be, part of the standard.

Note: For the Autodetect mechanism to work correctly, one must either use the .xml (or .XML) file name extension or specify <?XML VERSION="Version"?> (only the <?XML is significant) in the document declaration. Since the XML document will be correctly parsed by the SGMLNORM doctype, the major effect of incorrect identification would be in the MIME type for the Raw record.
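
The detection rule in the note can be sketched roughly as follows. This is not the actual Autodetect code: the function name guess_doctype, the skipping of leading whitespace, the case-sensitive comparison and the fallback to SGMLNORM are all assumptions made for illustration.

```python
import os


def guess_doctype(filename: str, head: str) -> str:
    """Return "XML" or "SGMLNORM" for a document (illustrative sketch only).

    `head` is the beginning of the document text. The fallback value
    "SGMLNORM" is an assumption, not the documented Autodetect behaviour.
    """
    # Rule 1: the .xml or .XML file name extension.
    extension = os.path.splitext(filename)[1]
    if extension in (".xml", ".XML"):
        return "XML"
    # Rule 2: only the leading "<?XML" of the declaration is significant.
    # Skipping leading whitespace is an assumption.
    if head.lstrip().startswith("<?XML"):
        return "XML"
    return "SGMLNORM"


print(guess_doctype("report.xml", "<doc></doc>"))
print(guess_doctype("report.sgm", '<?XML VERSION="1.0"?><doc></doc>'))
print(guess_doctype("report.sgm", "<doc></doc>"))
```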

SGMLNORM:

Handles SGML-type documents with "normalized" tags and entities, viz. end-tags and entity replacement.
  • It is not a validating SGML parser and is not in complete adherence to 15.3 of the ISO standard.
  • It has been designed to provide the basic parsing and presentation services for hierarchical data for the Isearch engine, within the limitations and constraints of the Isearch architecture.
  • Supports (versions >= $1.8$) the following Doctype options:
    • Key to define the container for the record key (index time option)
    • Headline to define the headline container (search time option)
  • The DTD is not processed and need not even be available on the indexing platform. Although the input is assumed to have ...
  • Each non-empty tag (container) defines a field.
  • In addition to storing tag values as fields, it also stores the values of (complex) attributes to allow for relatively complete search and retrieval of document content.

    In: <ONE TWO="Three">Four</ONE>
      Four is stored as the value of ONE
      Three is stored as the values for ONE@ and ONE@TWO

  • SGML comments and declarations are correctly ignored. Note: Version $1.8$ corrects a bug in the handling of nested declarations.
  • Empty tags per SGML Handbook Annex C.1.1.1, p.68 are also supported. The "null end tag" (NET), p.69, with the short-tag feature used in markup such as <tag/hello world/ is supported in versions >= $1.7$.
  • In versions >$1.10$ the parser also correctly processes XML conventions, viz. <EMPTY/> for empty containers.
  • Version 2.x will include GRS-1, HTML and SUTRS Presentation.
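
The tag and attribute field naming described in the list above (ONE, ONE@ and ONE@TWO) can be sketched as follows. This is a minimal illustration, not the SGMLNORM implementation: it relies on Python's xml.etree.ElementTree, so it only accepts well-formed input, and the function name extract_fields is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def extract_fields(markup: str) -> dict:
    """Map field names to lists of values, following the naming shown above.

    Simplification: only the text before an element's first child is used
    as the element's value.
    """
    fields = defaultdict(list)
    for elem in ET.fromstring(markup).iter():
        text = (elem.text or "").strip()
        if text:
            # Each non-empty container defines a field named after its tag.
            fields[elem.tag].append(text)
        for name, value in elem.attrib.items():
            # TAG@ collects every attribute value of the tag;
            # TAG@ATTR collects the values of one specific attribute.
            fields[elem.tag + "@"].append(value)
            fields[elem.tag + "@" + name].append(value)
    return dict(fields)


fields = extract_fields('<ONE TWO="Three">Four</ONE>')
for name, values in fields.items():
    print(name, values)
```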

The MIME type (for CGI applications) is: text/sgml

SGMLTAG:

Handles documents with SGML-like markup. It is NOT intended for SGML documents.

In v2.0+ of the doctype suite the SGMLTAG doctype is tuned to the needs of GILS documents. The headline is constructed from the 2nd level tag.

The MIME type (for CGI applications) is: Application/X-SGMLTAG-<level-1>, where <level-1> is the name of the top-level tag.

Example:

  <Rec>
  <Title> Personnel Action System </Title>
  <Originator>U.S. Geological Survey </Originator>

For the above example the MIME type is Application/X-SGMLTAG-Rec and the headline is Personnel Action System.
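
How the MIME type and headline follow from the record can be sketched as below. This is an illustration only, not the SGMLTAG code: the function name sgmltag_summary, the regular expression, and the choice of the first second-level tag as the headline source are assumptions.

```python
import re
from typing import Tuple

# Matches an opening tag such as <Rec> or <Title>; end tags are not matched.
OPEN_TAG = re.compile(r"<([A-Za-z][\w.-]*)[^>]*>")


def sgmltag_summary(record: str) -> Tuple[str, str]:
    """Return (MIME type, headline) for a record with SGML-like markup."""
    tags = list(OPEN_TAG.finditer(record))
    if len(tags) < 2:
        raise ValueError("record needs a level-1 tag and a level-2 tag")
    level1 = tags[0].group(1)
    second = tags[1]
    # The headline is the content of the first second-level tag (assumption).
    end = record.find("</" + second.group(1) + ">", second.end())
    if end == -1:
        end = len(record)
    headline = " ".join(record[second.end():end].split())
    return "Application/X-SGMLTAG-" + level1, headline


RECORD = (
    "<Rec> <Title> Personnel Action System </Title> "
    "<Originator>U.S. Geological Survey </Originator>"
)
mime_type, headline = sgmltag_summary(RECORD)
print(mime_type)
print(headline)
```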

GILS:

Handles documents with SGML-type markup for use in GILS systems. It is NOT intended for SGML documents. It uses SGMLTAG.

The MIME type (for CGI applications) is: Application/X-GILS-<level-1>

© Copyright 1995-1996   Basis Systeme netzwerk, Munich. All Rights Reserved.