W3C WD-DOM/WD-level-one-xml-971209

Document Object Model (XML)
Level 1

W3C Working Draft 9-December-1997

This version:
http://www.w3.org/TR/WD-DOM/level-one-xml-971209
Latest version:
http://www.w3.org/TR/WD-DOM/level-one-xml
Previous version:
http://www.w3.org/TR/WD-DOM/level-one-xml-971009
WG Chair:
Lauren Wood, SoftQuad, Inc.
Editors:
Gavin Nicol, INSO
Mike Champion, ArborText
Principal contributors:
Vidur Apparao, Netscape; Mike Champion, ArborText; Scott Isaacs, Microsoft; Arnaud Le Hors, W3C; Gavin Nicol, INSO; Peter Sharpe, SoftQuad, Inc.; Bill Smith, Sun Microsystems Inc; Jared Sorensen, Novell; Bob Sutor, IBM

Status

This document is part of the Document Object Model Specification; check the W3C web site for its current status. It is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to use W3C Working Drafts as reference material or to cite them as other than "work in progress". Note: Since working drafts are subject to frequent change, you are advised to check the list of current W3C working drafts.

Abstract

The Document Object Model (DOM) level one provides a mechanism for software developers and web script authors to access and manipulate parsed HTML and XML content. This document defines a set of objects that extends the Document Object Model (Core) such that the combination can represent all parts of a parsed XML document, and such that XML validity checkers can be written using the interfaces defined herein.

Languages used

As in the Core DOM specification, the primary Document Object Model type definitions are presented using the Object Management Group's Interface Definition Language (IDL, ISO standard 14750).


Section 1. Introduction

The DOM Level One (Core) specification defines a set of object definitions that are sufficient to represent a document instance (the objects that occur within the document itself). This specification extends the DOM Level One (Core) specification such that document type definitions, entities, CDATA marked sections, and conditional sections can also be represented.

The objects and interfaces defined within this document are sufficient to allow validators and other applications that make use of a DTD (Document Type Definition) to be written. For editors, the interfaces defined here will probably be insufficient for fine-grained editing, where information about the document type declaration may be necessary, though structural isomorphism should be easily accomplished.


Section 2. Document Type Definition support overview

A Document Type Definition (DTD) defines three things:

  1. A definition of a grammar for a markup language.
  2. A (possibly empty) set of entity declarations.
  3. A (possibly empty) set of notation declarations.

This specification gives access to all of these, though only in the post-parse form.

From a practical point of view, this means that while all the information contained within a DTD is available, not all of the information about what created it is. Parameter entity references, for example, are assumed to have been already expanded, and hence, their boundaries are lost.


Section 3. Descriptions of objects related to the Document Type Definition

This section describes the objects that are used to represent the DTD of a document. The objects are not specific to XML, although some attributes are specific to the HTML DTD. Such cases are clearly marked.


3.1. DocumentType

interface DocumentType : Node {
  attribute wstring  name;

  attribute NodeEnumerator externalSubset;
  attribute NodeEnumerator internalSubset;
  
  attribute NamedNodeList  generalEntities;
  attribute NamedNodeList  parameterEntities;
  attribute NamedNodeList  notations;
  attribute NamedNodeList  elementTypes;
};

Each document has a (possibly null) attribute that contains a reference to a DocumentType object. The DocumentType class provides an interface to access all of the entity declarations, notation declarations, and all the element type declarations.


3.1.1. Attributes

name

The name attribute is a wstring that holds the name of the DTD; i.e. the name immediately following the DOCTYPE keyword.

externalSubset

The externalSubset attribute is an enumerator that allows iteration over the list of nodes (definitions) that occurred in the external subset of a document. In this example:

<!DOCTYPE ex SYSTEM "ex.dtd" >
<ex/>

it would iterate over all of the declarations that occurred within the ex.dtd external entity.

Note: An iterator interface is used so as not to constrain implementations.

internalSubset

The internalSubset attribute is an enumerator that allows iteration over all the definitions that occurred within the internal subset of a document (the part that appears within the document instance). In this example:

<!DOCTYPE ex SYSTEM "ex.dtd" [
<!ENTITY ex "example">
]>
<ex/>

it would iterate over a single node: the definition of the ex entity.

Note: An iterator interface is used so as not to constrain implementations.

generalEntities

This is a NamedNodeList providing an interface to the list of general entities that were defined within the external and the internal subset. For example in:

<!DOCTYPE ex SYSTEM "ex.dtd" [
<!ENTITY foo "foo">
<!ENTITY bar "bar">
<!ENTITY % baz "baz">
]>
<ex/>

the interface would provide access to foo and bar but not baz. All objects supporting the Node interface that are accessed through this attribute will also support the Entity interface (defined below).

parameterEntities

This is a NamedNodeList providing an interface to the list of parameter entities that were defined within the external and the internal subset. In the example above, the interface would provide access to baz but not foo or bar. All objects supporting the Node interface that are accessed through this attribute will also support the Entity interface (defined below).
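
For comparison, the org.w3c.dom API bundled with the JDK makes a similar distinction: the map returned by its DocumentType.getEntities() holds general entities only. That API is not the one drafted here (it has no counterpart to parameterEntities, for instance), so the following is only a rough analogue. It is a sketch that assumes a JDK whose default parser accepts DOCTYPE declarations. The class name EntityListing is hypothetical, and the external identifier is dropped from the example so that no ex.dtd file is needed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.w3c.dom.NamedNodeMap;
import org.xml.sax.InputSource;

// Hypothetical demo class; it uses the JDK's org.w3c.dom API, not this draft's.
class EntityListing {

  // The example above, without the external subset.
  static final String XML =
      "<!DOCTYPE ex [\n"
      + "<!ENTITY foo \"foo\">\n"
      + "<!ENTITY bar \"bar\">\n"
      + "<!ENTITY % baz \"baz\">\n"
      + "]>\n"
      + "<ex/>";

  // Parses the given XML text and returns its document type node.
  static DocumentType parseDoctype(String xml) {
    try {
      DocumentBuilder builder =
          DocumentBuilderFactory.newInstance().newDocumentBuilder();
      Document doc = builder.parse(new InputSource(new StringReader(xml)));
      return doc.getDoctype();
    } catch (Exception e) {
      throw new IllegalStateException("could not parse the example", e);
    }
  }

  public static void main(String[] args) {
    DocumentType doctype = parseDoctype(XML);
    NamedNodeMap entities = doctype.getEntities();
    System.out.println("DTD name: " + doctype.getName());
    // Only general entities are listed; the parameter entity baz is absent.
    for (int i = 0; i < entities.getLength(); i++) {
      System.out.println("general entity: " + entities.item(i).getNodeName());
    }
  }
}
```

Looking up a name with getNamedItem on the returned map plays the role that a lookup in generalEntities would play under this draft.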

notations

This is a NamedNodeList providing an interface to the list of notations that were defined within the external and the internal subset. All objects supporting the Node interface that are accessed through this attribute will also support the Notation interface (defined below).

elementTypes

This is a NamedNodeList providing an interface to the list of element types that were defined within the external and the internal subset. All objects supporting the Node interface that are accessed through this attribute will also support the ElementDefinition interface (defined below).


3.2. ElementDefinition

interface ElementDefinition : Node {
  enum ContentType {
    EMPTY,
    ANY,
    PCDATA,
    MODEL_GROUP
  };

  attribute wstring        name;
  attribute ContentType    contentType;
  attribute ModelGroup     contentModel;

  attribute NamedNodeList  attributeDefinitions;
  attribute StringList     inclusions;
  attribute StringList     exceptions;
};

The definition of each element type declared within the external or internal subset (provided it is parsed) will be available through the elementTypes attribute of the DocumentType object. The name, attribute list, and content model are all available for inspection.


3.2.1. Attributes

name

This is the name of the type of element being defined.

contentType

This attribute specifies the type of content of the element. The different types are:

EMPTY

The element is an empty element, and cannot have content.

ANY

The element may have character data, or any of the other elements defined within the DTD as content, in any order and sequence.

PCDATA

The element can have only PCDATA (Parsed Character Data) as content.

MODEL_GROUP

The element has a specific content model associated with it. The model is accessible through the contentModel attribute (below).

contentModel

If the contentType is MODEL_GROUP, then this will provide access to a ModelGroup (below) object that is the root of the content model object hierarchy for this element. For other content types, this will be null.
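
The mapping from an element type declaration to a contentType value can be sketched as follows. This is a minimal illustration and not part of the draft: the class ContentTypes and the method classify are hypothetical, and the content specification is assumed to be available as a string, which the interfaces above do not provide. Treating mixed content as MODEL_GROUP is likewise an assumption.

```java
// Hypothetical helper; the draft does not define how contentType is computed.
class ContentTypes {

  // Mirrors the ContentType enumeration of ElementDefinition.
  enum ContentType { EMPTY, ANY, PCDATA, MODEL_GROUP }

  // Classifies the content specification of an element type declaration,
  // for example "EMPTY", "ANY", "(#PCDATA)" or "(title, para+)".
  static ContentType classify(String contentSpec) {
    String spec = contentSpec.trim();
    if (spec.equals("EMPTY")) {
      return ContentType.EMPTY;
    }
    if (spec.equals("ANY")) {
      return ContentType.ANY;
    }
    // Remove all whitespace so that "( #PCDATA )" is recognized as well.
    String compact = spec.replaceAll("\\s+", "");
    if (compact.equals("(#PCDATA)") || compact.equals("(#PCDATA)*")) {
      return ContentType.PCDATA;
    }
    // Everything else, including mixed content such as "(#PCDATA | em)*",
    // is assumed to be reported as a model group.
    return ContentType.MODEL_GROUP;
  }

  public static void main(String[] args) {
    System.out.println(classify("EMPTY"));
    System.out.println(classify("(#PCDATA)"));
    System.out.println(classify("(title, para+)"));
  }
}
```

Under this reading, contentModel would be non-null only for declarations that classify returns MODEL_GROUP for.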

attributeDefinitions

This NamedNodeList provides an interface for accessing the list of attributes that were defined to be on an ElementDefinition. Each object supporting the Node interface that is accessed through this attribute will also support the AttributeDefinition interface.

inclusions

This provides an interface to a list of element type names that are included in the content model of this element by the SGML inclusion/exception mechanism (not available in XML, but used in HTML).

exceptions

This provides an interface to a list of element type names that are excluded from the content model of this element by the SGML inclusion/exception mechanism (not available in XML, but used in HTML).


3.3. ModelGroup

enum OccurrenceType {
  OPT,     // ?
  PLUS,    // +
  REP      // *
};

interface PCDATAToken : Node {
  // Token type for the string #PCDATA
};

interface ElementToken : Node {
  attribute wstring          name;
  attribute OccurrenceType   occurrence;
};

interface ModelGroup : Node {
  enum ConnectionType {
    OR,   // |
    SEQ,  // ,
    AND
    };

  attribute ConnectionType  connector;
  attribute OccurrenceType  occurrence;
  attribute NodeList        tokens;
};

The ModelGroup object represents the content model of an ElementDefinition. The content model is represented as a tree, where each node specifies how its children are connected, and the number of times that it can occur within its parent. Leaf nodes in the tree are either PCDATAToken or ElementToken.


3.3.1. Attributes for ModelGroup

connector

This attribute specifies how the members of tokens are joined together.

occurrence

This specifies how often this ModelGroup may occur at its position in the content model.

tokens

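This provides access to the list of tokens that are allowed within this ModelGroup. Note that only PCDATAToken, ElementToken, and nested ModelGroup objects may occur within the token list; the leaves of the resulting tree are always PCDATAToken or ElementToken objects.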


3.3.2. Attributes on ElementToken

name

This is the type name for the element.

occurrence

This indicates how many times this element can occur in its position in the content model.
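
To illustrate the tree structure described in this section, the following sketch rebuilds the DTD notation of a content model from a tree of tokens. It is not part of the draft. All class names are simplified, hypothetical stand-ins for the interfaces above, and two assumptions are made: a nested ModelGroup may appear in a token list, as the tree description implies, and an extra ONCE value stands for a token with no occurrence indicator, since OccurrenceType defines none. The `&` symbol for AND is the SGML connector.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified stand-ins for the interfaces of section 3.3.
class ContentModelSketch {

  // ONCE is an assumption: the draft's OccurrenceType has no value for
  // a token that carries no occurrence indicator.
  enum Occurrence {
    ONCE(""), OPT("?"), PLUS("+"), REP("*");
    final String suffix;
    Occurrence(String suffix) { this.suffix = suffix; }
  }

  enum Connector {
    OR(" | "), SEQ(", "), AND(" & ");
    final String symbol;
    Connector(String symbol) { this.symbol = symbol; }
  }

  // Common supertype of everything that may appear in a token list.
  interface Token { String render(); }

  static final class PCDATAToken implements Token {
    public String render() { return "#PCDATA"; }
  }

  static final class ElementToken implements Token {
    final String name;
    final Occurrence occurrence;
    ElementToken(String name, Occurrence occurrence) {
      this.name = name;
      this.occurrence = occurrence;
    }
    public String render() { return name + occurrence.suffix; }
  }

  static final class ModelGroup implements Token {
    final Connector connector;
    final Occurrence occurrence;
    final List<Token> tokens;
    ModelGroup(Connector connector, Occurrence occurrence, Token... tokens) {
      this.connector = connector;
      this.occurrence = occurrence;
      this.tokens = new ArrayList<Token>(Arrays.asList(tokens));
    }
    // Walks the tree depth first and joins the children with the connector.
    public String render() {
      StringBuilder out = new StringBuilder("(");
      for (int i = 0; i < tokens.size(); i++) {
        if (i > 0) {
          out.append(connector.symbol);
        }
        out.append(tokens.get(i).render());
      }
      return out.append(")").append(occurrence.suffix).toString();
    }
  }

  public static void main(String[] args) {
    ModelGroup model = new ModelGroup(Connector.SEQ, Occurrence.ONCE,
        new ElementToken("title", Occurrence.ONCE),
        new ModelGroup(Connector.OR, Occurrence.REP,
            new ElementToken("para", Occurrence.ONCE),
            new ElementToken("list", Occurrence.ONCE)),
        new ElementToken("index", Occurrence.OPT));
    System.out.println(model.render());  // (title, (para | list)*, index?)
  }
}
```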


3.4. AttributeDefinition

interface AttributeDefinition : Node {
  enum DeclaredValueType {
    CDATA,
    ID,
    IDREF,
    IDREFS,
    ENTITY,
    ENTITIES,
    NMTOKEN,
    NMTOKENS,
    NOTATION,
    NAME_TOKEN_GROUP
    };

  enum DefaultValueType {
    VALUE,
    FIXED,
    REQUIRED,
    IMPLIED
    };

  attribute wstring            name;
  attribute StringList         allowedTokens;
  attribute DeclaredValueType  declaredType;
  attribute DefaultValueType   defaultType;
  attribute NodeList           defaultValue;
};

The AttributeDefinition interface is used to access information about a particular attribute definition on a given element. Objects supporting this interface are available from the ElementDefinition object through the attributeDefinitions attribute.


3.4.1. Attributes

name

The name of the attribute.

allowedTokens

The list of tokens that are allowed as values. For example, in

<!DOCTYPE ex [
<!ELEMENT ex (#PCDATA) >
<!ATTLIST ex test (FOO|BAR) "FOO" >
]>
<ex></ex>

this would hold FOO and BAR.

declaredType

This attribute indicates the type of values the attribute may contain.

defaultType

This specifies whether the attribute must be specified in the document instance and, if it need not be, how its value is determined when it is omitted.

defaultValue

This provides an interface to a list of Nodes that make up the default value for an attribute. This value is used if the attribute was not given an explicit value in the document instance.
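
How a validator might combine allowedTokens, defaultType and defaultValue can be sketched as follows. This is an illustration under stated assumptions, not part of the draft: the class AttributeDefaults and the method effectiveValue are hypothetical, the default value is simplified to a String rather than a NodeList, and the VALUE constant is taken from the enumeration listed in Appendix A.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified stand-in for AttributeDefinition.
class AttributeDefaults {

  enum DefaultValueType { VALUE, FIXED, REQUIRED, IMPLIED }

  static final class AttributeDefinition {
    final String name;
    final List<String> allowedTokens;  // empty when any value is allowed
    final DefaultValueType defaultType;
    final String defaultValue;         // assumption: a String, not a NodeList

    AttributeDefinition(String name, List<String> allowedTokens,
                        DefaultValueType defaultType, String defaultValue) {
      this.name = name;
      this.allowedTokens = allowedTokens;
      this.defaultType = defaultType;
      this.defaultValue = defaultValue;
    }
  }

  // Returns the value the attribute has in the instance, or null when an
  // IMPLIED attribute is absent. Pass null for an attribute that was not
  // specified. Throws if the instance violates the definition.
  static String effectiveValue(AttributeDefinition def, String specified) {
    if (specified == null) {
      switch (def.defaultType) {
        case REQUIRED:
          throw new IllegalArgumentException(
              "missing required attribute: " + def.name);
        case IMPLIED:
          return null;
        default:  // VALUE or FIXED: fall back to the declared default
          return def.defaultValue;
      }
    }
    if (!def.allowedTokens.isEmpty()
        && !def.allowedTokens.contains(specified)) {
      throw new IllegalArgumentException(
          "value not among the allowed tokens: " + specified);
    }
    if (def.defaultType == DefaultValueType.FIXED
        && !specified.equals(def.defaultValue)) {
      throw new IllegalArgumentException(
          "fixed attribute must have the value: " + def.defaultValue);
    }
    return specified;
  }

  public static void main(String[] args) {
    // Corresponds to <!ATTLIST ex test (FOO|BAR) "FOO" >
    AttributeDefinition test = new AttributeDefinition(
        "test", Arrays.asList("FOO", "BAR"), DefaultValueType.VALUE, "FOO");
    System.out.println(effectiveValue(test, null));
    System.out.println(effectiveValue(test, "BAR"));
  }
}
```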


3.5. Notation

interface Notation : Node {
  attribute wstring name;
  
  attribute boolean isPublic;

  attribute string  publicIdentifier;
  attribute string  systemIdentifier;
};

The Notation object is used to represent the definition of a notation within a DTD.


3.5.1. Attributes

name

This is the name of the notation.

isPublic

If a public identifier was specified in the notation declaration, this will be TRUE, and the publicIdentifier attribute will contain the string for the public identifier.

publicIdentifier

If a public identifier was specified in the notation declaration, this will hold the public identifier string, otherwise it will be null.

systemIdentifier

If a system identifier was specified in the notation declaration, this will hold the system identifier string, otherwise it will be null.
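
The relationship between the three identifier-related attributes can be sketched by rebuilding a notation declaration from them. This is a minimal illustration, not part of the draft: the class NotationSketch is a hypothetical stand-in for the Notation interface, isPublic is derived rather than stored, and identifiers are assumed not to contain a double quote.

```java
// Hypothetical, simplified stand-in for the Notation interface.
class NotationSketch {
  final String name;
  final String publicIdentifier;  // null if none was declared
  final String systemIdentifier;  // null if none was declared

  NotationSketch(String name, String publicIdentifier,
                 String systemIdentifier) {
    if (publicIdentifier == null && systemIdentifier == null) {
      throw new IllegalArgumentException(
          "a notation needs at least one identifier");
    }
    this.name = name;
    this.publicIdentifier = publicIdentifier;
    this.systemIdentifier = systemIdentifier;
  }

  // Derived here, so that it cannot disagree with publicIdentifier.
  boolean isPublic() {
    return publicIdentifier != null;
  }

  // Rebuilds the declaration; assumes no identifier contains a double quote.
  String toDeclaration() {
    StringBuilder out = new StringBuilder("<!NOTATION ").append(name);
    if (isPublic()) {
      out.append(" PUBLIC \"").append(publicIdentifier).append('"');
      if (systemIdentifier != null) {
        out.append(" \"").append(systemIdentifier).append('"');
      }
    } else {
      out.append(" SYSTEM \"").append(systemIdentifier).append('"');
    }
    return out.append('>').toString();
  }

  public static void main(String[] args) {
    // The notation name and identifiers below are made up for illustration.
    System.out.println(
        new NotationSketch("gif", null, "viewer.exe").toDeclaration());
    System.out.println(
        new NotationSketch("tex", "+//EXAMPLE//NOTATION TeX//EN", null)
            .toDeclaration());
  }
}
```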


Section 4. Descriptions of objects related to Entities

To be written.


Section 5. Descriptions of objects related to CDATA and Conditional Sections

CDATA and conditional sections are objects specific to XML. CDATA sections are used in the document instance, and conditional sections in the DTD.


5.1. CDATA Sections

interface CDATASection : Node {
  attribute wstring content;
};

CDATA sections are used in the document instance, and provide a region in which most of the XML delimiter recognition does not take place. The primary purpose is for including material such as XML fragments, without needing to escape all the delimiters.


5.1.1. Attributes

content

This holds the text that was contained by the CDATA section. Note that this may contain characters that need to be escaped outside of CDATA sections.
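
The escaping mentioned in the note can be sketched as follows. This is an illustration, not part of the draft: the class CDataText and its methods are hypothetical helpers that an application might use when writing the content attribute back out, either as ordinary character data or as CDATA sections again.

```java
// Hypothetical helper for writing CDATASection content back out as XML.
class CDataText {

  // Escapes text for use as ordinary character data outside a CDATA section.
  static String escape(String content) {
    StringBuilder out = new StringBuilder();
    for (int i = 0; i < content.length(); i++) {
      char c = content.charAt(i);
      switch (c) {
        case '&': out.append("&amp;"); break;
        case '<': out.append("&lt;"); break;
        case '>': out.append("&gt;"); break;
        default:  out.append(c);
      }
    }
    return out.toString();
  }

  // Wraps text in CDATA sections. Because "]]>" cannot appear inside a
  // section, each occurrence is split across two adjacent sections.
  static String wrap(String content) {
    return "<![CDATA[" + content.replace("]]>", "]]]]><![CDATA[>") + "]]>";
  }

  public static void main(String[] args) {
    String content = "<ex>a & b</ex>";
    System.out.println(escape(content));  // &lt;ex&gt;a &amp; b&lt;/ex&gt;
    System.out.println(wrap(content));    // <![CDATA[<ex>a & b</ex>]]>
  }
}
```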


5.2. Conditional Sections

interface ConditionalSection : Node {
  attribute boolean    included;
  attribute Node       condition;
  attribute NodeList   content;
};

Conditional sections are used in the DTD to provide a limited form of control over inclusion or exclusion of DTD fragments.


5.2.1. Attributes

included

This is a flag indicating whether this section was included during parsing.

condition

This Node indicates the condition. Generally, it will be a Text node containing either INCLUDE or IGNORE.

content

The content of this section.
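
As an illustration of how the included flag might be used, the following sketch collects the declarations that are actually in effect, descending only into sections that were included. It is not part of the draft: the classes are hypothetical, simplified stand-ins (a declaration is reduced to its text), and the condition attribute is left out because the flag alone decides the outcome here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified stand-ins for DTD nodes and ConditionalSection.
class ConditionalSections {

  // Marker for anything that can occur in a DTD subset.
  interface DtdNode {}

  static final class Declaration implements DtdNode {
    final String text;
    Declaration(String text) { this.text = text; }
  }

  static final class ConditionalSection implements DtdNode {
    final boolean included;
    final List<DtdNode> content;
    ConditionalSection(boolean included, DtdNode... content) {
      this.included = included;
      this.content = new ArrayList<DtdNode>(Arrays.asList(content));
    }
  }

  // Returns the text of every declaration that is in effect, in document
  // order, skipping the content of sections that were not included.
  static List<String> effectiveDeclarations(List<DtdNode> nodes) {
    List<String> out = new ArrayList<String>();
    collect(nodes, out);
    return out;
  }

  private static void collect(List<DtdNode> nodes, List<String> out) {
    for (DtdNode node : nodes) {
      if (node instanceof Declaration) {
        out.add(((Declaration) node).text);
      } else if (node instanceof ConditionalSection) {
        ConditionalSection section = (ConditionalSection) node;
        if (section.included) {
          collect(section.content, out);  // sections may nest
        }
      }
    }
  }

  public static void main(String[] args) {
    List<DtdNode> subset = new ArrayList<DtdNode>();
    subset.add(new Declaration("<!ELEMENT ex (#PCDATA)>"));
    subset.add(new ConditionalSection(true,
        new Declaration("<!ENTITY draft \"yes\">")));
    subset.add(new ConditionalSection(false,
        new Declaration("<!ENTITY draft \"no\">")));
    System.out.println(effectiveDeclarations(subset));
  }
}
```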

Appendix A. IDL Interface for Document Type Definitions

typedef sequence<wstring> StringList;

interface DocumentType : Node {
  attribute wstring  name;

  attribute NodeEnumerator externalSubset;
  attribute NodeEnumerator internalSubset;
  
  attribute NamedNodeList  generalEntities;
  attribute NamedNodeList  parameterEntities;
  attribute NamedNodeList  notations;
  attribute NamedNodeList  elementTypes;
};

enum OccurrenceType {
  OPT,     // ?
  PLUS,    // +
  REP      // *
};

interface ModelGroup : Node {
  enum ConnectionType {
    OR,   // |
    SEQ,  // ,
    AND
    };

  attribute ConnectionType  connector;
  attribute OccurrenceType  occurrence;
  attribute NodeList        tokens;
};

interface ElementDefinition : Node {
  enum ContentType {
    EMPTY,
    ANY,
    PCDATA,
    MODEL_GROUP
  };

  attribute wstring        name;
  attribute ContentType    contentType;
  attribute ModelGroup     contentModel;

  attribute NamedNodeList  attributeDefinitions;
  attribute StringList     inclusions;
  attribute StringList     exceptions;
};

interface PCDATAToken : Node {
  // Token type for the string #PCDATA
};

interface ElementToken : Node {
  attribute wstring          name;
  attribute OccurrenceType   occurrence;
};

interface AttributeDefinition : Node {
  enum DeclaredValueType {
    CDATA,
    ID,
    IDREF,
    IDREFS,
    ENTITY,
    ENTITIES,
    NMTOKEN,
    NMTOKENS,
    NOTATION,
    NAME_TOKEN_GROUP
    };

  enum DefaultValueType {
    VALUE, 
    FIXED,
    REQUIRED,
    IMPLIED
    };

  attribute wstring            name;
  attribute StringList         allowedTokens;
  attribute DeclaredValueType  declaredType;
  attribute DefaultValueType   defaultType;
  attribute NodeList           defaultValue;
};

interface Notation : Node {
  attribute wstring name;
  
  attribute boolean isPublic;

  attribute string  publicIdentifier;
  attribute string  systemIdentifier;
};

Appendix B. IDL Interface for Entities

typedef sequence<octet> buffer;

interface Entity : Node {
  attribute wstring name;
  attribute boolean isParameterEntity;
};

interface InternalEntity : Entity {
  attribute wstring content;
};

interface ExternalEntity : Entity {
  attribute boolean isNDATA;
  attribute boolean isPublic;

  attribute string  publicIdentifier;
  attribute string  systemIdentifier;
};

interface ExternalTextEntity : ExternalEntity {
  attribute wstring content;
};

interface ExternalNDATAEntity : ExternalEntity {
  attribute Notation  notation;
  attribute buffer    content;
};

interface NDATA : Node {
  attribute buffer content;
};

Appendix C. IDL Interface for CDATA and Conditional Sections

interface CDATASection : Node {
  attribute wstring content;
};

interface ConditionalSection : Node {
  attribute boolean    included;
  attribute Node       condition;
  attribute NodeList   content;
};

Appendix D. Java XML API definitions

//
// Note: the IDL contains the following definition for a StringList:
//
//   typedef sequence<wstring> StringList;
//
// Because Java does not support templates, we are using a Vector for this.
//

import java.util.Vector;

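public interface DocumentType extends Node {

  void setName(String name);
  String getName();

  // NodeEnumerator is used here to match the IDL definition.
  void setExternalSubset(NodeEnumerator externalSubset);
  NodeEnumerator getExternalSubset();

  void setInternalSubset(NodeEnumerator internalSubset);
  NodeEnumerator getInternalSubset();

  void setGeneralEntities(NamedNodeList generalEntities);
  NamedNodeList getGeneralEntities();

  void setParameterEntities(NamedNodeList parameterEntities);
  NamedNodeList getParameterEntities();

  void setNotations(NamedNodeList notations);
  NamedNodeList getNotations();

  void setElementTypes(NamedNodeList elementTypes);
  NamedNodeList getElementTypes();

};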

public final class OccurrenceType {
  public static final int OPT  = 0;    // ?
  public static final int PLUS = 1;    // +
  public static final int REP  = 2;    // *
};

public interface ElementDefinition extends Node {

  public final class ContentType {
    public static final int EMPTY       = 0;
    public static final int ANY         = 1;
    public static final int PCDATA      = 2;
    public static final int MODEL_GROUP = 3;
  };

  void setName(String name);
  String getName();

  // The ints for the following two methods should be
  // constants defined in the ContentType class.

  void setContentType(int contentType);
  int getContentType();

  void setContentModel(ModelGroup contentModel);
  ModelGroup getContentModel();

  void setAttributeDefinitions(NamedNodeList attributeDefinitions);
  NamedNodeList getAttributeDefinitions();

  void setInclusions(Vector inclusions);
  Vector getInclusions();

  void setExceptions(Vector exceptions);
  Vector getExceptions();

};

public interface ModelGroup extends Node {
  
  public final class ConnectionType {
    public static final int OR  = 0;  // |
    public static final int SEQ = 1;  // ,
    public static final int AND = 2;
  };

  // The ints for the following two methods should
  // be constants defined in the ConnectionType class.

  void setConnector(int connector);
  int getConnector();

  // The ints for the two methods below should be
  // constants defined in the OccurrenceType class.

  void setOccurrence(int occurrence);
  int getOccurrence();

  void setTokens(NodeList tokens);
  NodeList getTokens();

};

public interface PCDATAToken extends Node {
  // Token type for the string #PCDATA
};

public interface ElementToken extends Node {

  void setName(String name);
  String getName();

  // The ints for the following two methods should be
  // constants defined in the OccurrenceType class.

  void setOccurrence(int occurrence);
  int getOccurrence();

};

public interface AttributeDefinition extends Node {

  public final class DeclaredValueType {
    public static final int CDATA            = 0;
    public static final int ID               = 1;
    public static final int IDREF            = 2;
    public static final int IDREFS           = 3;
    public static final int ENTITY           = 4;
    public static final int ENTITIES         = 5;
    public static final int NMTOKEN          = 6;
    public static final int NMTOKENS         = 7;
    public static final int NOTATION         = 8;
    public static final int NAME_TOKEN_GROUP = 9;
  };

  public final class DefaultValueType {
    public static final int VALUE    = 0;
    public static final int FIXED    = 1;
    public static final int REQUIRED = 2;
    public static final int IMPLIED  = 3;
  };

  void setName(String name);
  String getName();

  void setAllowedTokens(Vector allowedTokens);
  Vector getAllowedTokens();

  // The ints for the following two methods should be
  // constants declared in the DeclaredValueType class.

  void setDeclaredType(int declaredType);
  int getDeclaredType();

  // The ints for the following two methods should be
  // constants declared in the DefaultValueType class.

  void setDefaultType(int defaultType);
  int getDefaultType();

  void setDefaultValue(NodeList defaultValue);
  NodeList getDefaultValue();

};

public interface Notation extends Node {

  void setName(String name);
  String getName();

  void setIsPublic(boolean isPublic);
  boolean getIsPublic();

  void setPublicIdentifier(String publicIdentifier);
  String getPublicIdentifier();

  void setSystemIdentifier(String systemIdentifier);
  String getSystemIdentifier();

};

public interface CDATASection extends Node {

  void setContent(String content);
  String getContent();

};

Appendix E. ECMAScript XML API definitions

(This section has yet to be written.)

Appendix F. Glossary

The DOM uses a large number of terms that may be unfamiliar to many readers. We suggest that you review the glossary if you encounter terms that aren't familiar.