This document describes recent user-visible changes in SP. Bug fixes are not described.
Better support for XML based on the Web SGML Adaptations Annex to ISO 8879.
New SX application that converts SGML to XML.
The architecture engine has been updated to match HyTime 2nd Edition. (This means that you must use <?IS10744 ArcBase arch> rather than <?ArcBase arch>.)
The Extended Naming Rules TC is supported. The extensions supported in external concrete syntaxes have been changed for compatibility with this.
The handling of character sets in the multi-byte version is more sophisticated. The character sets HTML page gives more information.
SP has built-in knowledge of many more base character sets.
nsgmls will report empty elements if the -oempty option is used.
DTD-less parsing is possible by using the -wno-valid option to allow undefined elements and attributes. This allows parsing of well-formed XML documents, whether or not valid.
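As a rough illustration of the -wno-valid option described above, the following Python sketch builds an nsgmls command line for DTD-less parsing. The helper names and the file name are hypothetical; only the -wno-valid option comes from this document, and nsgmls is assumed to be installed and on the PATH.

```python
import subprocess


def dtdless_command(document, program="nsgmls"):
    """Build an argv list that parses `document` without requiring a DTD."""
    # -wno-valid allows undefined elements and attributes.
    return [program, "-wno-valid", document]


def parse_dtdless(document):
    """Run the parser; return (exit status, standard output, error messages)."""
    result = subprocess.run(dtdless_command(document),
                            capture_output=True, text=True)
    return result.returncode, result.stdout, result.stderr
```

Calling parse_dtdless("book.xml") would run the parser on a hypothetical file named book.xml.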
There is now generalized support for architectural form processing.
Documentation is now in HTML format.
A BASE catalog entry can be used to specify a base system identifier for resolving relative storage object identifiers occurring in the catalog.
A LITERAL storage manager is now provided.
Programs have a -E option that sets the maximum number of errors.
A DELEGATE catalog entry allows distributed resolution of public identifiers.
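The idea behind delegation can be sketched in Python as prefix matching on public identifiers. This is only an illustration, not SP's implementation: the function name, the data layout, and the choice to try longer prefixes first are all assumptions of this sketch.

```python
def delegated_catalogs(public_id, delegate_entries):
    """Return the catalogs to consult for `public_id`.

    `delegate_entries` is a list of (public-id prefix, catalog) pairs.
    Ordering by longest prefix first is a choice made by this sketch.
    """
    matches = [(prefix, catalog)
               for prefix, catalog in delegate_entries
               if public_id.startswith(prefix)]
    # sorted() is stable, so equal-length prefixes keep their catalog order.
    matches = sorted(matches, key=lambda entry: len(entry[0]), reverse=True)
    return [catalog for _, catalog in matches]
```

With hypothetical entries for the prefixes "-//ACME//" and "-//ACME//DTD ", a lookup of "-//ACME//DTD Memo//EN" would return both delegated catalogs, the more specific one first.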
nsgmls has a -B (batch mode) option that allows you to parse multiple documents with a single invocation of nsgmls.
In nsgmls the -c option now specifies a catalog as it does in spam and sgmlnorm, in addition to the -m option that previously did this.
The -n option has been replaced by a -onotation-sysid option, which applies to nsgmls only, and a -wnotation-sysid option, which applies generally.
SP can be built as a DLL under Win32.
The syntax of system identifiers has completely changed. The new syntax is based on the syntax of formal system identifiers defined in ISO/IEC 10744 (HyTime) Technical Corrigendum 1, Annex D.
The NSGMLS_CODE environment variable has been renamed to SP_BCTF. nsgmls has a -b option to specify the bit combination transformation format to be used for output.
A list of directories in which files specified in system identifiers should be searched for can be specified using the environment variable SGML_SEARCH_PATH or the option -D.
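A directory search of this kind can be sketched in Python as follows. This is not SP's actual lookup procedure: the function names are hypothetical, and the use of the platform's path separator to split the variable's value is an assumption.

```python
import os


def split_search_path(value, sep=os.pathsep):
    """Split a search-path string into directories, dropping empty items."""
    # Assumption: directories are separated by the platform path separator.
    return [directory for directory in value.split(sep) if directory]


def find_file(name, directories):
    """Return the first existing file called `name`, or None if not found."""
    for directory in directories:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None
```

For example, find_file("memo.dtd", split_search_path(os.environ.get("SGML_SEARCH_PATH", ""))) would look for a hypothetical memo.dtd in each listed directory in turn.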
Individual SYSTEM identifiers in external identifiers can be overridden using SYSTEM entries in the catalog.
The OVERRIDE catalog entry now takes a YES/NO argument. (This change was required for conformance to the SGML Open TR.) It applies to each entry individually rather than to the entire catalog.
The -w options of nsgmls and spam have been enhanced. In spam, the -w option takes an argument as with nsgmls. There are new warnings for minimized start and end tags (-wunclosed, -wempty, -wnet and -wmin-tag); for unused short reference maps (-wunused-maps); for unused parameter entities (-wunused-param). -wall now doesn't include those warnings that are about conditions that, in the opinion of the author, there is no reason to avoid. A warning can be turned off by using its name prefixed by no-; thus -wmin-tag -wno-net is equivalent to -wunclosed -wempty. The -w option is also used to turn off errors: -wno-idref replaces the -x option; -wno-significant replaces the -X option.
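The way successive -w arguments combine can be modelled with a small Python sketch. It is a simplified model rather than SP's code: the function name is hypothetical, and the membership of the min-tag group is inferred from the equivalence stated above.

```python
# Assumption: min-tag stands for these three warnings, as implied by
# "-wmin-tag -wno-net is equivalent to -wunclosed -wempty".
WARNING_GROUPS = {"min-tag": {"unclosed", "empty", "net"}}


def enabled_warnings(names):
    """Apply -w arguments (given without the -w) in order, left to right."""
    enabled = set()
    for name in names:
        turn_off = name.startswith("no-")
        base = name[3:] if turn_off else name
        members = WARNING_GROUPS.get(base, {base})
        if turn_off:
            enabled -= members   # a no- prefix switches the warning off
        else:
            enabled |= members
    return enabled
```

In this model, enabled_warnings(["min-tag", "no-net"]) and enabled_warnings(["unclosed", "empty"]) yield the same set.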
In the output of nsgmls, characters that cannot be represented in the encoding translation specified by the NSGMLS_BCTF environment variable are represented using an escape sequence of the form \#N; where N is a decimal integer.
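The \#N; convention can be illustrated with a short Python sketch. It is not taken from nsgmls: the function names are hypothetical, it treats N as a Unicode code point (an assumption), and it ignores every other escape except that it leaves a doubled backslash untouched.

```python
import re


def encode_with_escapes(text, encoding):
    """Encode `text`, writing unrepresentable characters as \\#N; escapes."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(encoding)
        except UnicodeEncodeError:
            # Assumption: N is the character's Unicode code point.
            out += ("\\#%d;" % ord(ch)).encode("ascii")
    return bytes(out)


# Matches either a doubled backslash or a \#N; escape.
_ESCAPE = re.compile(r"\\(?:(\\)|#([0-9]+);)")


def decode_escapes(text):
    """Replace each \\#N; escape with the character numbered N."""
    def replace(match):
        if match.group(1) is not None:
            return match.group(0)   # doubled backslash: leave as it is
        return chr(int(match.group(2)))
    return _ESCAPE.sub(replace, text)
```

Matching the doubled backslash first keeps a literal backslash followed by #N; from being mistaken for an escape.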
In the multi-byte versions of nsgmls there are new BCTFs is8859-N for N = 1,...,9.
There is a -o option to nsgmls which makes it output additional information: -oentity outputs information about all entities; -oid distinguishes attributes with a declared value of id; -oincluded distinguishes included subelements.
nsgmls now automatically searches for a catalog entry file called "catalog" in the same place as the document entity. Note that when the document entity is specified with a URL, this matches the behaviour of Panorama.
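The location rule can be sketched in Python like this. The function name is hypothetical, and the test for "://" to recognise a URL is a simplification made for this sketch, not how SP classifies system identifiers.

```python
import os
from urllib.parse import urljoin


def default_catalog(document):
    """Return where a file called "catalog" would sit next to `document`."""
    if "://" in document:
        # URL case: replace the last path segment with "catalog".
        return urljoin(document, "catalog")
    # File case: same directory as the document entity.
    return os.path.join(os.path.dirname(document), "catalog")
```

For a document given as http://example.org/docs/book.sgml, the sketch yields http://example.org/docs/catalog.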
A catalog entry file can contain CATALOG entries specifying additional catalog entry files. This matches the behaviour of Panorama.
The parser can now make available to an application complete information about the markup of prologs and SGML declarations. It would now be possible, for example, to use SP to write a DTD editor. spam exploits this to a limited extent: if the -p option is specified twice, then parameter entity references between declarations will be expanded; the -mreserved option puts all reserved names in upper-case; with the -mshortref option short reference use declarations and short reference mapping declarations will be removed; attribute specification lists in data attribute specifications in entity declarations can be normalized like attribute specification lists in start-tags; with -mms it resolves IGNORE/INCLUDE marked sections.
nsgmls has a -C option which causes the command line filenames to be treated as a catalog whose DOCUMENT entry specifies the document entity.
nsgmls has a -n option which causes it to generate system identifiers for notations in the same way as it does for entities.
spam now has a -f option like nsgmls.
The interface between the parser and entity manager has been redesigned so that the entity manager can be used independently of the parser. This is exploited by a new program called spent that prints an entity with a specified system identifier on the standard output.
In most cases, a Control-Z occurring as the last byte in a file will be stripped. This is controlled by the zapeof attribute in formal system identifiers.
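The effect on file contents can be sketched in a few lines of Python. The function name is hypothetical; its zapeof parameter merely borrows the name of the attribute and is not SP's interface.

```python
CONTROL_Z = b"\x1a"


def strip_trailing_control_z(data, zapeof=True):
    """Drop a single Control-Z if it is the last byte of `data`."""
    if zapeof and data.endswith(CONTROL_Z):
        return data[:-1]
    return data
```

Only a final Control-Z is removed; one occurring elsewhere in the data is left alone.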
External concrete syntaxes, character sets and capacity sets are supported using PUBLIC entries in catalog files. The multicode core and reference syntaxes are no longer built-in. Only a few character sets are now built-in.
Within external concrete syntaxes, various useful extensions are permitted. In particular, an ellipsis syntax is allowed for the specification of name characters and single character short references. It is now practical to specify tens of thousands of additional name characters.
The default SGML declaration is more permissive.
nsgmls has a -x option that inhibits checking of idrefs.
nsgmls has a -w option that can enable additional warnings. In particular, -wmixed will warn about mixed content models that do not allow #pcdata everywhere.
The meaning of the f command in the output of nsgmls has changed slightly. It now gives the effective system identifier of the entity.
The functionality of the rast program has been merged into the nsgmls program and the rast program has been removed. The -t option makes nsgmls generate a RAST result.
spam has a -l option that uses lower-case for added names that were subject to upper-case substitution.
spam has a -mcurrent option that adds omitted attribute specifications for current attributes.
James Clark