DSSSL WWW Enhancements

James Clark (jjc@jclark.com)

1997-05-23

This document proposes a number of enhancements to DSSSL (ISO/IEC 10179:1996) for use on the Web.

These enhancements could be standardized within ISO. However, they could also be standardized by some other organization using the extension mechanisms provided in DSSSL: this would involve defining a set of public identifiers for the added procedures, flow object classes and characteristics.

This document is still evolving. Whilst the list of issues is intended to be comprehensive, it does not yet propose solutions for all the issues.

Comments are welcome.

CSS1 formatting

All the formatting capabilities of CSS1 should be achievable in DSSSL. Furthermore they should be achievable in a way that is convenient to specify and is sufficiently easy to implement that these capabilities can be incorporated in dsssl-o.

sRGB color space

The sRGB color space used by CSS1 should be supported in DSSSL. This can be achieved simply by defining an appropriate public identifier.
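As a sketch of what this would give style sheet authors, assuming such a public identifier existed (the one below is a hypothetical placeholder, not an identifier defined by the standard), sRGB colors could then be written with the existing color-space and color procedures:

;; Hypothetical public identifier: DSSSL itself defines no sRGB space.
(define srgb
  (color-space "-//Example//Color-Space Family::sRGB"))

;; CSS1 "color: #CC0000", with each component scaled to 0..1 (204/255 = 0.8).
(define dark-red (color srgb 0.8 0 0))

(element EM
  (make sequence
    color: dark-red))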

First line

CSS1 allows special formatting to be applied to the first line of a paragraph using the first-line pseudo-element.

To support this, a first-line-style: characteristic should be added to the paragraph flow object. The value is a style object. For any inline flow object all of whose resulting areas lie in the first line of the paragraph, the style object will be interposed in the inheritance hierarchy immediately beneath the paragraph flow object. This is very similar to what currently happens with the table-column flow object.

For example, consider the following style-sheet:

(element P
  (make paragraph
    font-size: 10pt
    first-line-style: (style font-size: (+ (inherited-font-size) 4pt))))

(element SMALL
   (make sequence
     font-size: (- (inherited-font-size) 2pt)))

and imagine a paragraph containing a SMALL element whose first part lies on the first line of the paragraph. Then:

Characters on the first line but not in the SMALL element will be at 14pt (10pt + 4pt).
Characters in the SMALL element on the first line will be at 12pt (14pt - 2pt).
Characters in the SMALL element not on the first line will be at 8pt (10pt - 2pt).
Other characters will be at 10pt.

This is not trivial to implement.

First letter

CSS1 allows special formatting to be applied to the first letter of a paragraph using the first-letter pseudo-element.

In DSSSL this could be achieved by using a query to select the first letter. However, this has several disadvantages:

Identifying the first letter is non-trivial. The first letter might be nested in some inline elements occurring within the paragraph, or it might be generated text. In this context we are interested in the first letter of a paragraph in the flow object tree: trying to identify this using the grove requires duplicating some of the work of the flow object tree construction process.
The first letter glyph may be composed of several characters: for example, there may be accents or other non-spacing characters.
There may be some characters before the first letter, such as an opening quotation mark, that should also have the same style as the first letter.

For these reasons, it is better to leave the identification of the first letter to the formatter. A first-letter-style: characteristic should be added to the paragraph flow object. The value is a style object. For flow objects that comprise the first letter, this style object will be interposed in the inheritance hierarchy immediately below the paragraph flow object and below the style object for the first-line-style: characteristic, if any.

CSS1 achieves drop caps through a combination of the first-letter pseudo-element and the float property. I don't think this is a good solution. A drop cap is vertically aligned in a special way, and computing its size is not trivial: a drop cap that is to be dropped two lines is not simply twice the normal size. Normally the size is computed so that the baseline of the dropped cap is aligned with the baseline of the second line and the top of the dropped cap is aligned with the top of the capital letters in the first line. In other languages the computation may be more complex. For example, Thai has no capital letters, and some letters extend below the baseline. When the initial letter extends below the baseline, the size of the dropped initial letter is computed so that the bottom of its descender aligns with the bottom of the descenders on the last indented line; when the initial letter doesn't extend below the baseline, the letter is sized as with Latin scripts; and when the initial letter is taller than normal letters, the top of the initial letter extends above the first line.
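As a worked illustration of the Latin case, under a simplifying assumption introduced here (not part of the proposal) that cap height is a fixed fraction c of the font size: let s be the body font size, L the line spacing and n the number of lines dropped. Aligning the top of the dropped cap with the tops of the capitals in the first line, and its baseline with the baseline of line n, requires c·S = c·s + (n − 1)·L for the drop-cap size S, so

\[ S = s + \frac{(n-1)\,L}{c} \]

For 10pt type on 12pt line spacing with c = 0.7 and n = 2, this gives S = 10pt + 12pt/0.7 ≈ 27.1pt, rather than the 20pt that simply doubling the size would give.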

A first-letter-drop-n-lines: characteristic should be added to the paragraph flow object; its value is either #f or an integer >= 2 specifying the number of lines by which the first letter is dropped.
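A style sheet using both proposed characteristics might look like the following sketch. first-letter-style: and first-letter-drop-n-lines: are the characteristics proposed above, not part of DSSSL as it stands, and LEAD is a hypothetical element type for a chapter's opening paragraph:

;; Sketch only: uses the proposed first-letter-style: and
;; first-letter-drop-n-lines: characteristics.
(element LEAD
  (make paragraph
    font-size: 10pt
    ;; Interposed for the flow objects the formatter identifies as the
    ;; first letter (possibly with opening punctuation grouped with it).
    first-letter-style: (style font-weight: 'bold)
    ;; Drop the initial letter over three lines; the formatter, not the
    ;; style sheet, computes the resulting size.
    first-letter-drop-n-lines: 3))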

Floating elements

In HTML, the values of LEFT and RIGHT for the ALIGN attribute on the IMG element cause the image to be floated to the left or right margin. The CLEAR attribute on the BR element can be used to make the next element move down past any floating images.

The float and clear properties in CSS1 provide similar capabilities.

Here is one simple way in which DSSSL could be extended to support this.

A side-float?: characteristic is added to the same flow objects that display-alignment: applies to. This is a boolean valued characteristic. It is not inherited. The default value is #f. When its value is #t, then the value of display-alignment: must not be center, and the area produced by the flow object will be floated to the side specified by the display-alignment: characteristic. An area whose flow object has a true side-float?: characteristic is called a side-floating area.

A side-floating area is treated by its area container differently from a normal display area:

A side-floating area that occurs in a paragraph does not cause a break.
Recall that the display areas with which an area container is filled are always created so that their size in the direction perpendicular to the filling-direction is equal to the size of the area container in that direction. This is not true for a side-floating area: it is not automatically filled out to the display-size.
When a side-floating area comes from a flow object occurring in a paragraph, then the top of the side-floating area should be aligned with the top of the line that contains the flow object preceding the side-floating area's flow object. If this is not possible, then the formatter should place the area as close to the line as possible.
The start-indent and end-indent characteristics for a side-floating area specify the position of the starting or ending edge of the side-floating area.

A side-float-margin: characteristic specifies the space between the side-floating area and the displayed areas placed beside it. The value is a length-spec. It is inherited. The initial value is 0pt. When a displayed flow object is placed alongside a side-floating area on the start side, its start-indent is replaced by the sum of the side-floating area's start-indent, the size of the side-floating area and the side-float-margin, but only if this sum is greater than the original start-indent. The end side is handled similarly.

In HTML and CSS1, floating images can be placed horizontally next to each other. There should be a characteristic that controls whether this is allowed (maybe side-float-multiple?:). There would also need to be some sort of control on the minimum amount of space left for text.

The space-before: and space-after: characteristics on a side-floating area are treated specially; exactly how remains to be worked out.

To float one or more paragraphs into the margin, the paragraphs must be wrapped inside an included-container-area flow object with a true side-float?: characteristic.

The effect of the CLEAR attribute on the BR element can be achieved with an additional non-inherited characteristic, side-float-clear:, on displayed flow objects. Its value is one of none, start, end or both, specifying the sides on which side floats are not allowed at the start of the areas produced by the flow object.
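An HTML-like use of these characteristics might be sketched as follows. side-float?:, side-float-margin: and side-float-clear: are the proposed characteristics; using 'both as the symbol form of the side-float-clear: value, and treating SRC as a system identifier, are simplifying assumptions. Images without ALIGN=LEFT or ALIGN=RIGHT simply become ordinary displayed graphics in this sketch:

;; Sketch: HTML IMG with ALIGN=LEFT/RIGHT as a side-floating graphic.
(element IMG
  (let ((align (attribute-string "ALIGN")))
    (make external-graphic
      entity-system-id: (attribute-string "SRC")
      display?: #t
      ;; Float only for LEFT or RIGHT (name tokens assumed case-folded).
      side-float?: (or (equal? align "LEFT") (equal? align "RIGHT"))
      ;; A floated area goes to the start or end side, never center.
      display-alignment: (if (equal? align "RIGHT") 'end 'start)
      ;; Gap between the image and the text placed beside it.
      side-float-margin: 6pt)))

;; A heading that must start below any floats on either side.
(element H2
  (make paragraph
    side-float-clear: 'both))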

Positioning

The W3C working draft on positioning extends CSS to allow explicit positioning of HTML elements. DSSSL should be extended to support this kind of functionality.

Inline vertical alignment

The vertical-align property in CSS1 allows inline elements to be positioned vertically. It corresponds to the position-point-shift characteristic in DSSSL. When the vertical-align property is being used for the positioning of inline images (and other replaced elements), it corresponds to the position-point-y characteristic. The vertical-align property specifies the positioning in terms of properties of the current font or the maximum size of objects on the same line as the element to be positioned.
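For example, CSS1 "vertical-align: super" can be approximated with the existing position-point-shift: characteristic, assuming the raise is expressed as a fixed fraction of the font size (the 0.33 and 0.8 factors below are arbitrary choices, not taken from either specification):

;; Approximation of CSS1 "vertical-align: super"; a positive shift raises
;; the content. Both factors are arbitrary.
(element SUP
  (make sequence
    position-point-shift: (* (inherited-font-size) 0.33)
    font-size: (* (inherited-font-size) 0.8)))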

Case folding

In CSS1, the text-transform property allows upper-casing and lower-casing to be done easily. In DSSSL this can be achieved through the char-map characteristic. This is more general, but hard to implement efficiently. It also depends on the fairly complex machinery for explicitly declaring language-dependent rules for case conversion and collation.

DSSSL should allow the symbols uppercase, lowercase and capitalize as values of char-map. These would use the language and country characteristics to determine how to perform the case conversion.

Procedures should also be added to the expression language to perform case conversion using a specified language and country (these would default to the system's default language and country). At the moment the DSSSL expression language allows explicit specification of language-specific rules for case conversion, and provides procedures for performing case conversion using these rules, which allows for maximum interchangeability but is burdensome for users and implementors. Many operating systems provide facilities to define and use language-specific rules for case conversion. The new procedures would allow DSSSL implementations to take advantage of these facilities.
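Both parts of the proposal might be used as in this sketch. The symbol value of char-map: is the one proposed above; string-upcase-for is a hypothetical name for the new expression-language procedure, whose name the proposal leaves open. Turkish is chosen because its rules differ from English: lower-case i upper-cases to a dotted capital I.

;; Proposed symbol value for char-map:; the rules are selected by the
;; language: and country: characteristics.
(element ACRONYM
  (make sequence
    language: 'TR
    country: 'TR
    char-map: 'uppercase))

;; string-upcase-for is hypothetical: upper-case a string using the rules
;; for the given language and country.
(element TITLE
  (make paragraph
    (literal (string-upcase-for (data (current-node)) 'TR 'TR))))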

Borders

The effects of CSS1 borders are achieved in DSSSL with the box flow object.

CSS1 allows each side of a box to have different characteristics. DSSSL should have 8 new inherited characteristics (4 for displayed boxes and 4 for inline boxes) whose values are style objects specifying the style for the corresponding border. Characteristics not specified in the style object would be inherited from the box flow object. The hardest thing is to choose names for each side of the box:

box-before-border-style: For a display box, specifies style for border that is perpendicular to the placement direction and first in the placement direction.
box-after-border-style: For a display box, specifies style for border that is perpendicular to the placement direction and last in the placement direction.
box-start-border-style: For a display box, specifies style for border that is perpendicular to the writing-mode direction and first in the writing-mode direction.
box-end-border-style: For a display box, specifies style for border that is perpendicular to the writing-mode direction and last in the writing-mode direction.
box-line-before-border-style: For an inline box, specifies style for border that is perpendicular to the line progression direction and first in the line progression direction.
box-line-after-border-style: For an inline box, specifies style for border that is perpendicular to the line progression direction and last in the line progression direction.
box-escapement-before-border-style: For an inline box, specifies style for border perpendicular to the escapement direction and first in the escapement direction.
box-escapement-after-border-style: For an inline box, specifies style for border perpendicular to the escapement direction and last in the escapement direction.

CSS1 allows various 3D effects for borders (groove, ridge, inset, outset). I believe these effects can be achieved by using different colors for opposite sides, and (for groove, ridge) by using a line-repeat of 2 and supporting different colors for each line.
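As a sketch, an outset-style border of the kind described above could be approximated with the four proposed display-box characteristics, giving the before and start sides (top and left, in left-to-right horizontal writing) a lighter color than the after and end sides. display?:, box-type:, line-thickness: and color: are existing characteristics; the border-style characteristics are the proposed ones, and the PANEL element type and colors are illustrative:

(define rgb
  (color-space "ISO/IEC 10179:1996//Color-Space Family::Device RGB"))
(define light (color rgb 0.9 0.9 0.9))
(define dark (color rgb 0.4 0.4 0.4))

;; Sketch of an "outset" border. Each style object sets only color:;
;; line-thickness: is inherited from the box flow object.
(element PANEL
  (make box
    display?: #t
    box-type: 'border
    line-thickness: 2pt
    box-before-border-style: (style color: light)
    box-start-border-style: (style color: light)
    box-after-border-style: (style color: dark)
    box-end-border-style: (style color: dark)))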

Background images

CSS1 offers more extensive control over background images than DSSSL:

The background-repeat property allows control over whether the background image is repeated.
The background-attachment property allows control over whether the background image scrolls along with the content.
The background-position property allows control over the positioning of the background image.

These should be added as inherited characteristics of the scroll flow object.
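On a scroll, the three properties might then be specified as in this sketch. The characteristic names and symbol values simply mirror CSS1 and are hypothetical; the image itself is assumed to come from the scroll flow object's background-tile: characteristic, and "paper.gif" is an illustrative system identifier:

;; Sketch: hypothetical scroll characteristics mirroring CSS1.
(root
  (make scroll
    background-tile: "paper.gif"
    ;; Repeat horizontally only, like CSS1 "background-repeat: repeat-x".
    background-repeat: 'repeat-x
    ;; Keep the image fixed while the content scrolls.
    background-attachment: 'fixed
    ;; Offset of the image from the start and before edges.
    background-position: (list 0pt 0pt)
    (process-children)))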

Blinking text

In CSS1 text can be made to blink using a blink value for the text-decoration property.

A blink?: characteristic should be added to DSSSL. This would be inherited. It would apply to the same flow objects that layer: currently applies to.

Should there also be a characteristic to control the blink rate?
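A minimal sketch of the CSS1 blink value expressed with the proposed characteristic (BLINK is an illustrative element type):

;; Sketch: CSS1 "text-decoration: blink" using the proposed blink?:
;; characteristic. Being inherited, it reaches all of the content.
(element BLINK
  (make sequence
    blink?: #t))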

Link anchor formatting

CSS1 allows different formatting to be specified for visited and active anchors using the anchor pseudo-classes. In DSSSL different formatting for visited anchors can be achieved using the address-visited? procedure. However, this is not easy to implement as it requires an implementation to compute dependencies between the flow object tree and visited status of anchors and incrementally recompute the flow object tree whenever an anchor is visited.

There are a number of possible solutions:

New visited-style: and active-style: characteristics on the link flow object; the value would be a style object that would apply when the destination is visited or active.
New if-visited: and if-active: characteristics that specify a sosofo to be used in place of the content of the link flow object when the destination is visited or active. These would not be inherited; the default value would be #f, meaning that the content is not replaced when the destination is visited or active.
New if-visited and if-active procedures that take an address and two sosofos and return one of the two sosofos according to whether the address is visited or active. This is similar to the if-first-page and if-front-page procedures used in Jade for simple-page-sequence headers and footers.
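The first and third options might be used as in this sketch. visited-style:, active-style: and if-visited are the proposed names, url-address is the procedure proposed under URLs, and the colors are illustrative. The two rules are alternatives; a real style sheet would contain only one.

(define rgb
  (color-space "ISO/IEC 10179:1996//Color-Space Family::Device RGB"))

;; Option 1: proposed style-valued characteristics on link.
(element A
  (make link
    destination: (url-address (attribute-string "HREF"))
    visited-style: (style color: (color rgb 0.5 0 0.5))
    active-style: (style color: (color rgb 1 0 0))))

;; Option 3: proposed if-visited procedure choosing between two sosofos.
(element A
  (let ((addr (url-address (attribute-string "HREF"))))
    (make link
      destination: addr
      (if-visited addr
                  (make sequence font-posture: 'italic (process-children))
                  (process-children)))))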

Small caps

CSS1 allows small caps to be easily specified using the font-variant property. In DSSSL, small caps are specified with a glyph-subst-table. This is not very convenient, since users must explicitly specify the relevant AFII glyph ids.

The DSSSL standard should define a public identifier for use with glyph-subst-method that handles caps and small caps. Maybe a public identifier for old-style digits should also be added.
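With such an identifier, small caps would need no glyph tables at all. A sketch, assuming the value of glyph-subst-method: can be given as a public identifier string; the identifier and the SC element type are hypothetical placeholders:

;; Hypothetical public identifier for a caps-and-small-caps method.
(element SC
  (make sequence
    glyph-subst-method:
      "-//Example//Glyph Substitution Method::Small Caps"))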

Font substitution

CSS1 allows the font-family property to be specified as a prioritized list. This prioritized list is used in two ways:

It allows an alternative font-family to be used when a font-family is not available
It allows an alternative font to be searched for a glyph when a font does not contain the glyph

CSS1 also allows the use of generic font families (serif, sans-serif, cursive, fantasy, monospace).

DSSSL should be extended to support the following:

Specifying the kind of font. The categories offered by CSS are very coarse and don't address non-Roman scripts. Panose numbers might be one useful approach here.
Specifying alternative font families to try when a specified font-family is not available
Specifying alternative font families to search when a font-family does not contain a glyph
Specifying a different font-family for each script. This is a very common need for multilingual typesetting. Handling this by simply specifying a list of fonts and using the first font that contains the required glyph doesn't seem desirable: with TrueType it is increasingly common for a font to support multiple scripts, so the first font in the list that contains a glyph may well not be the font intended for that script. The existing char-script-case procedure handles this, but is hard to implement.
Specifying a set of glyphs that a font must have for it to be eligible for selection. If I am typesetting Swedish, I want to use a font that contains all the characters needed for Swedish: I don't want to use one font for the characters in Swedish that are also in English, and another font for the characters in Swedish that are not in English.

These requirements can be met by introducing a new font-family data-type that would be used as the value of a new font-family: characteristic. Procedures would be added that construct a font-family object for a generic font, construct a font-family object from a list of alternate fonts, and compose font-family objects for different scripts to create a new font-family. For example,

(define serif-font
  (script-font-family
     "ISO/IEC 10179:1996//Script::Latin"
     (alternate-font-family "Times Roman"
                            "Times New Roman"
                            (panose-font-family 2 2 6 3 5 4 5 2 3 4))
     "ISO/IEC 10179:1996//Script::Han"
     "MS Mincho"))

(make paragraph
      font-family: serif-font)
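The generic-font case could be sketched the same way. generic-font-family is a hypothetical name for the proposed constructor of a font-family object for a generic family; here it serves as the final fallback when neither named font is available:

;; Hypothetical: generic-font-family builds a font-family object for one
;; of the CSS generic families, used here as the last alternative.
(define body-font
  (alternate-font-family "Palatino"
                         "Book Antiqua"
                         (generic-font-family 'serif)))

(make paragraph
      font-family: body-font)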

Linking

URLs

A url-address procedure should be added that allows URLs to be used with the link flow object.

When URLs occur directly in documents rather than in the system identifier of entity declarations, the application must be responsible for specifying how relative URLs are to be resolved. In general the base URL for resolving a relative URL should be the URL of the storage object in which the relative URL occurs. One problem is that this information is not available in the grove.
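One way an HTML style sheet might use url-address is sketched below. Because the grove does not record the URL of the storage object, the sketch falls back on HTML's BASE element; passing a base URL as a second argument to url-address is a hypothetical extension of the proposed procedure, and base-url is a hypothetical helper:

;; Hypothetical helper: the HREF of the document's BASE element, or #f.
(define (base-url)
  (let ((base (select-elements (descendants (ancestor "HTML")) "BASE")))
    (if (node-list-empty? base)
        #f
        (attribute-string "HREF" (node-list-first base)))))

(element A
  (let ((href (attribute-string "HREF")))
    (if href
        ;; url-address is the proposed procedure; its second (base URL)
        ;; argument is hypothetical.
        (make link
          destination: (url-address href (base-url)))
        (process-children))))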

Link flow object

It should be possible to specify whether the destination of the link is shown in a new window when the link is traversed. This could be achieved with a non-inherited, boolean-valued new-window?: characteristic on the link flow object.

It should be possible to specify a label for a link that would be displayed like a tool tip when the mouse is within a link flow object. Should this be a sosofo or just a string?
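Both features might be sketched as follows. new-window?: is the proposed characteristic and url-address the proposed procedure; link-label: is a hypothetical name for the label characteristic, given a string here although the question of string versus sosofo is still open. Taking the values from HTML's TARGET and TITLE attributes is an illustrative choice:

;; Sketch: proposed new-window?: plus a hypothetical link-label:.
(element A
  (make link
    destination: (url-address (attribute-string "HREF"))
    new-window?: (equal? (attribute-string "TARGET") "_blank")
    link-label: (attribute-string "TITLE")))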

Extended links

The link flow object conveniently supports only links where:

the linking element is one of the resources of the link,
there are exactly two resources,
the link is one-directional, and
the direction of the link is from the linking element resource to the other resource.

This covers simple links in XML, A elements with an HREF attribute in HTML and clinks in HyTime.

DSSSL should be extended so that richer hyperlinking models can be supported. The main problem is how to specify the presentation of link resources when a resource is separate from the linking element.

Non-SGML packaging

Currently DSSSL specifications are packaged in SGML documents. This provides a lot of flexibility but has a significant implementation cost, and also tends to confuse users.

An alternative simpler method for packaging should be developed that doesn't embed style specifications in SGML documents. With this packaging scheme, specifications would consist entirely of Scheme-like syntax. There would be no declarations relating to character sets: an implementation would get this information by external means (maybe it would support only one character repertoire, or maybe it would use a MIME header). The ability to combine separate style specifications, which is provided at the moment by the SGML packaging with the USE attribute, could be provided in this new packaging method by a declaration:

(import "sysid")

Note that implementations will automatically be able to distinguish this alternative packaging method from the current one: after any leading whitespace, a specification using the alternative packaging method must start with a semicolon or an open parenthesis, neither of which can begin an SGML document.
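Under this scheme a complete specification could be an ordinary text file like the following sketch (the imported system identifier "common.dsl" is hypothetical). Because it begins with a comment, its first non-whitespace character is a semicolon:

; A specification in the proposed non-SGML packaging: no SGML wrapper
; and no character set declarations.
(import "common.dsl")

(element P
  (make paragraph
    font-size: 10pt))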

Implementation simplifications

Ports

Ports add significant complexity to the implementation. They are also hard to explain. Dsssl-o should be modified so that support for ports is not required.

Ports are currently used in dsssl-o in two places:

table headers and footers: Ports could be avoided by having two additional characteristics on table-part: table-part-n-header-rows which gives the number of initial rows that should be treated as header rows, and table-part-n-footer-rows which gives the number of trailing rows that should be treated as footer rows.
multi-mode flow object: Ports could be avoided by having an additional multi-mode-mode flow object. The multi-mode flow object would have a single port that accepts only flow objects of class multi-mode-mode.
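For tables, the port-free alternative might look like this sketch, in which table-part-n-header-rows: and table-part-n-footer-rows: are the proposed characteristics and the rows come from the ordinary content rather than from header and footer ports:

;; Sketch: proposed row-count characteristics replace the header and
;; footer ports of table-part.
(element TABLE
  (make table
    (make table-part
      ;; The first row of the content is treated as a header row.
      table-part-n-header-rows: 1
      table-part-n-footer-rows: 0
      (process-children))))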

Inserting Objects

A flow object should be added that allows objects to be inserted in a similar manner to the OBJECT tag in HTML.

Java Applets

An applet flow object should be added that allows embedding of Java applets in a similar manner to the APPLET tag in HTML.

Forms

Flow objects should be added that allow forms to be specified as in HTML.

Scripting

It should be possible to use a scripting language to allow the flow object tree to have dynamic behaviour.

Math

There is a proposal for mathematics on the Web called MathML (http://www.w3.org/pub/WWW/TR/WD-math/). This should be examined to determine whether changes are needed to the DSSSL math flow objects to handle this.

Alternative Syntax

Should an alternative syntax be provided? If so, what should it look like? C, CSS, XML?

Miscellaneous

White space

XML documents (especially those without a DTD) will typically have lots of whitespace that needs to be ignored. This is inconvenient in DSSSL at the moment (process-children-trim helps but doesn't completely solve the problem).

The input-whitespace-treatment characteristic should apply to paragraphs in addition to characters; when the value is collapse, initial and trailing whitespace should be ignored. I believe this mimics the HTML behaviour of current browsers (process-children-trim was designed to do that, but actually doesn't).
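With this change, an XML paragraph whose content starts and ends with line breaks could be handled without process-children-trim, as in this sketch (PARA is an illustrative element type; applying the characteristic at the paragraph level is the proposed behaviour):

;; Proposed: with 'collapse on the paragraph, whitespace at the start
;; and end of the paragraph is ignored as well as collapsed.
(element PARA
  (make paragraph
    input-whitespace-treatment: 'collapse
    (process-children)))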

Support for element type subclassing

CSS1 allows the CLASS attribute to be conveniently used for sub-classing element types. Obviously DSSSL cannot attach any special meaning to a particular attribute, but it could allow a style sheet to declare a particular attribute as working like CLASS in HTML, and then provide some way for referring to the class in an element construction rule. For example,

(declare-class-syntax "CLASS" "!")

(element P!WARNING  (make paragraph (literal "Warning: ") (process-children)))

One problem would be to decide exactly what the syntax of the class attribute should be:

A NAME attribute
A NAMES attribute
A CDATA attribute treated as a single class (possibly case insensitive)
A CDATA attribute treated as a comma-separated list of classes (possibly case insensitive)