/*-------------------------------------------------------------------------
PROJECT YAO
PORTABLE OBJECT-ORIENTED ENTITY MANAGER (POEM)
Version 1.0 Alpha level 2.0
-------------------------------------------------------------------------
(C) 1993 Yuan-ze Institute of Technology

Distributed under the terms of the License and Disclaimer of Warranties
for Project YAO Materials, which includes the following text:

"YUAN-ZE hereby grants to any user: (1) an irrevocable royalty-free,
worldwide, non-exclusive license to use, execute, reproduce, display,
perform and distribute copies of, and to prepare derivative works based
upon these materials; and (2) the right to authorize others to do any of
the foregoing."

The full text of the License and Disclaimer of Warranties for Project YAO
Materials should be consulted before using these materials.
-------------------------------------------------------------------------
//// poemspec.txt
/// POEM specification
// Erik Naggum & Charles F. Goldfarb 8-9-94
-------------------------------------------------------------------------*/

The Portable Object-oriented Entity Manager (POEM) is an implementation
of an entity manager that satisfies the requirements set forth in Charles
Goldfarb's paper, "Entity Management in SGML". That paper, to a large
extent, serves as the specification for POEM. This document provides
further details in those areas that require it.

NOTE: POEM is currently at Alpha release 2.0 status, so this
specification is subject to change.

There are two parts to POEM:

a) a services module, tentatively known as the Virtual Entity to Real
   Storage Environment (VERSE) server, a preliminary version of which
   has been implemented; and

b) external identifier mapping to VERSE services (not yet implemented),
   from both system identifiers and public identifier catalog entries.

1. System identifier syntax

In POEM, a system identifier must be a "Formal System Identifier" (FSI),
as described below.

1.1. Full syntax

An FSI consists of a list of storage object specifications, separated by
the double solidi known from the formal public identifier. The components
of a storage object specification are separated by the double colon
introduced in ISO/IEC 9070.

The storage object specification consists of four parts: the storage
system type, the storage object identifier, a record boundary indicator,
and a substring specification. The substring specification is a list of
dimspecs that select substrings of the storage object, optionally
followed by an overrun handling keyword ("trunc" or "error").

The dimspec list is optional, and defaults to "1 -1", or the entire
object; overrun handling defaults to "trunc".

The record boundary indicator is also optional; when omitted, it defaults
to the first actual line terminator found in the storage object. It is
required, however, when substrings are specified: the substring
specification refers to byte counts in the actual storage object, which
presupposes a known line terminator convention.

If the storage object identifier contains double solidi or double colons,
it can be delimited with either LITA or LIT in the reference concrete
syntax (but not with the characters used to delimit the system identifier
itself). In case both occur in the storage object identifier, it can be
delimited with asterisks.

Example: For a storage system type "FILE", to denote the local file
system, a storage object identifier "example.sgml" to denote a local file
name, the record boundary indicated by the line terminator "CRLF", and
the dimspec "1 500" with an error reported for overruns, the system
identifier would be "FILE::example.sgml::CRLF::1 500 ERROR".

1.2. Minimized syntax

An FSI containing a single storage object specification with the local
file system as the storage system type, a storage object identifier
(i.e., file name) that does not contain double solidi or double colons,
and defaulted record boundary indicator and substring specification, may
be expressed as the storage object identifier alone.
This minimization supports the common and intuitive notion of the system
identifier as being a local file name.

1.3. Formal definition

system identifier =
    storage object specification,
    ( "//", storage object specification )*

storage object specification =
    ( storage system type, "::", storage object identifier,
      ( "::", (record boundary indicator | line terminator),
        ( "::", substring specification )? )? )
    | storage object identifier

storage system type = name

storage object identifier =
    character data
    | ( LIT, character data, LIT )
    | ( LITA, character data, LITA )
    | ( "*", character data, "*" )

record boundary indicator =
    line terminator | "NONE" | "RMS" | ( "RERS", number, number )

line terminator = "LF" | "CR" | "CRLF" | "LFCR"

substring specification = dimspec+, ( "TRUNC" | "ERROR" )?

1.4. Storage system type, storage object identifier

More than one storage system type may be supported by the application.
In order to support a storage system type, the application needs to
inform the entity manager of the storage system types supported, and to
supply it with the necessary functions for each supported storage system.

There are two predefined storage systems that all entity managers will
support: "FILE" and "FUNCHAR".

"FILE" is the local file system, and the storage object identifier is a
filename or a pathname.

"FUNCHAR" is a pseudo-storage system that allows the function characters
RE, RS, and SPACE to be placed between files or other objects that may
not otherwise contain them. The storage object identifier is one or more
of "RE", "RS", and "SPACE", separated by spaces or record boundaries.

Other storage system types may be defined. The storage object identifier
will be passed to the storage system when the object is accessed, and may
contain whatever the storage system requires.
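
As an illustration of the formal definition in 1.3, the following Python
sketch splits an FSI into its storage object specifications. It is not
part of POEM: the names StorageObjectSpec and parse_fsi are invented
here, keywords are assumed to be case-insensitive, and a dimspec is
assumed to be a pair of integers.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

DELIMS = "\"'*"  # LIT, LITA (reference concrete syntax) and the asterisk
LINE_TERMINATORS = {"LF", "CR", "CRLF", "LFCR"}


@dataclass
class StorageObjectSpec:  # hypothetical container, not a POEM type
    storage_system: str
    identifier: str
    rbi: Optional[tuple] = None  # None: detect from the storage object
    dimspecs: List[Tuple[int, int]] = field(default_factory=lambda: [(1, -1)])
    overrun: str = "TRUNC"


def split_fields(fsi: str) -> List[List[str]]:
    """Split on "//" and "::", skipping over delimited identifiers."""
    specs, fields, start, i = [], [], 0, 0
    while i < len(fsi):
        if i == start and fsi[i] in DELIMS:
            close = fsi.find(fsi[i], i + 1)
            if close == -1:
                raise ValueError("unterminated delimiter")
            i = close + 1
        elif fsi.startswith("::", i):
            fields.append(fsi[start:i])
            i += 2
            start = i
        elif fsi.startswith("//", i):
            fields.append(fsi[start:i])
            specs.append(fields)
            fields = []
            i += 2
            start = i
        else:
            i += 1
    fields.append(fsi[start:])
    specs.append(fields)
    return specs


def strip_delims(text: str) -> str:
    if len(text) >= 2 and text[0] in DELIMS and text[-1] == text[0]:
        return text[1:-1]
    return text


def parse_rbi(text: str) -> tuple:
    words = text.upper().split()
    if len(words) == 1 and words[0] in LINE_TERMINATORS | {"NONE", "RMS"}:
        return (words[0],)
    if len(words) == 3 and words[0] == "RERS":
        return ("RERS", int(words[1]), int(words[2]))
    raise ValueError(f"invalid record boundary indicator: {text!r}")


def parse_substrings(text: str):
    words = text.upper().split()
    overrun = "TRUNC"
    if words and words[-1] in {"TRUNC", "ERROR"}:
        overrun = words.pop()
    if not words or len(words) % 2:
        raise ValueError(f"invalid substring specification: {text!r}")
    numbers = [int(w) for w in words]
    return list(zip(numbers[0::2], numbers[1::2])), overrun


def parse_fsi(fsi: str) -> List[StorageObjectSpec]:
    specs = []
    for fields in split_fields(fsi):
        if len(fields) == 1:  # minimized syntax: identifier alone
            specs.append(StorageObjectSpec("FILE", fields[0]))
            continue
        if len(fields) > 4:
            raise ValueError("too many '::' components")
        spec = StorageObjectSpec(fields[0].upper(), strip_delims(fields[1]))
        if len(fields) >= 3:
            spec.rbi = parse_rbi(fields[2])
        if len(fields) == 4:
            spec.dimspecs, spec.overrun = parse_substrings(fields[3])
        specs.append(spec)
    return specs
```

Under these assumptions, parse_fsi("FILE::example.sgml::CRLF::1 500 ERROR")
yields one specification with record boundary indicator CRLF, the single
dimspec (1, 500) and overrun handling ERROR, while parse_fsi("example.sgml")
yields the FILE defaults.
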
The storage object identifier is not interpreted by the entity manager
(except to remove delimiters), but passed to the storage manager as is,
so that the storage manager may make any use of it that it sees fit.

It is an error for a system identifier to contain a storage system type
that has not been enrolled with the entity manager.

1.5. Record boundary indicator ("RBI")

The purpose of the record boundary indicator is to support more than one
file format, whether the files are stored on a central server or
exchanged between systems that import files from one another without
converting the line terminator convention. Failing to convert could be
said to be a mistake on the part of the user (not necessarily human), but
correcting it often requires manual intervention, and such mistakes are
not always easy to forestall.

The keyword "NONE" means that no attempt will be made to interpret the
input as consisting of records. This is the required mode if the text is
to be read as blocks instead of characters.

The keyword "RMS" stands for "Record Management System".

If no record boundary indicator is specified, the first of the four line
terminator conventions ("LF", "CR", "CRLF" and "LFCR") found in the
storage object (if any) will be treated as the record boundary indicator
for the rest of that storage object. If none is found, "NONE" is assumed.
If each block is smaller than the expected block size, "RMS" is assumed.

The keyword "RERS" is followed by the character numbers used for the RE
and RS, respectively. It indicates that record boundary insertion has
already been performed for the entity.

It is an error for an RBI to be present and not to be one of the
indicated values.

1.6. Substring specification

The substring specification refers to the actual storage units (bytes) in
the object. It is specified using the HyTime dimspec format.
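
A minimal Python sketch of how a dimspec could be turned into a byte
range follows. It assumes the usual HyTime reading of the two markers: a
positive first marker counts from the start of the object and a negative
one from its end, while a positive second marker is a length and a
negative one names the last byte counted from the end. The function
names, the OverrunError class and the exact truncation behaviour are
assumptions made for illustration, not POEM definitions.

```python
from typing import List, Tuple


class OverrunError(ValueError):
    """Raised when a dimspec overruns the object and handling is ERROR."""


def resolve_dimspec(first: int, second: int, size: int,
                    overrun: str = "TRUNC") -> Tuple[int, int]:
    """Return a 0-based, half-open byte range for one dimspec."""
    if first == 0 or second == 0:
        raise ValueError("dimspec markers must be nonzero")
    # 1-based position of the first and last selected byte.
    start = first if first > 0 else size + first + 1
    last = start + second - 1 if second > 0 else size + second + 1
    if last < start:
        # An end before the start is rejected outright in this sketch.
        raise ValueError("dimspec selects a reversed range")
    lo, hi = max(start, 1), min(last, size)
    if (lo, hi) != (start, last) and overrun.upper() == "ERROR":
        raise OverrunError(f"dimspec {first} {second} overruns {size} bytes")
    if hi < lo:
        return (0, 0)  # nothing of the requested range lies in the object
    return (lo - 1, hi)


def extract_substrings(data: bytes, dimspecs: List[Tuple[int, int]],
                       overrun: str = "TRUNC") -> List[bytes]:
    """Apply each dimspec in turn and return the selected substrings."""
    ranges = [resolve_dimspec(a, b, len(data), overrun) for a, b in dimspecs]
    return [data[lo:hi] for lo, hi in ranges]
```

With a ten-byte object, the default dimspec "1 -1" resolves to the whole
object, and "1 500" is cut back to the ten available bytes under "trunc"
but raises OverrunError under "error".
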
Since the dimspec format allows an offset from the end of the object, and
not all storage systems are able to ascertain the size of objects
beforehand, the largest negative offset from the end of the object that
is guaranteed to work for all storage systems is more or less arbitrarily
set to -4. This allows truncation of unwanted trailing line terminators
or end-of-file markers. If the storage system can support them, it may
accept larger negative offsets, but is not required to do so. The
intention is to allow the system to read to the end of the storage
object, and be able to locate the end within the last read buffer.

Note that more than one dimspec is allowed in the specification, so that
more than one substring from the same object can be used without
requiring the full storage object specification for that object again.

2. Public identifier catalog entries

The public identifier catalog is constructed by POEM from one or more
"mapping table" entities. A mapping table is a sequence of external
identifiers, in the reference concrete syntax, in which both the public
identifier and system identifier are present. Comment parameters and
comment declarations are permitted.

mapping table = (external identifier | comment)*

external identifier =
    ( "PUBLIC", s+, unrestricted public identifier, s+,
      system identifier, s+ )

where "comment", "s", and "system identifier" are all defined by SGML,
and "unrestricted public identifier" is defined in "Entity Management in
SGML".
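
The mapping table grammar above can be illustrated with a small Python
scanner. This is only a sketch under simplifying assumptions: both
identifiers are taken to be literals delimited by LIT or LITA, the
public identifier text is kept exactly as written, the trailing "s+" is
not enforced, and the function name parse_mapping_table and the sample
identifiers are invented for the example.

```python
import re
from typing import List, Tuple

# One alternative per token kind; earlier alternatives take priority.
TOKEN = re.compile(r"""
    (?P<ws>\s+)
  | (?P<commentdecl><!(?:\s*--.*?--)*\s*>)    # comment declaration
  | (?P<comment>--.*?--)                      # bare comment
  | (?P<literal>"[^"]*"|'[^']*')              # LIT- or LITA-delimited
  | (?P<name>[A-Za-z]+)                       # keyword such as PUBLIC
""", re.VERBOSE | re.DOTALL)


def parse_mapping_table(text: str) -> List[Tuple[str, str]]:
    """Return (public identifier, system identifier) pairs in file order."""
    entries, pending, pos = [], [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at offset {pos}")
        pos = match.end()
        kind = match.lastgroup
        if kind in ("ws", "commentdecl", "comment"):
            continue
        pending.append((kind, match.group()))
        if len(pending) == 3:
            (k1, keyword), (k2, public), (k3, system) = pending
            if (k1 != "name" or keyword.upper() != "PUBLIC"
                    or k2 != "literal" or k3 != "literal"):
                raise ValueError("expected PUBLIC followed by two literals")
            entries.append((public[1:-1], system[1:-1]))
            pending = []
    if pending:
        raise ValueError("incomplete external identifier at end of table")
    return entries


SAMPLE = """<!-- hypothetical mapping table -->
PUBLIC "-//Acme//DTD Memo//EN" "FILE::memo.dtd"
-- the next entry uses LITA delimiters --
PUBLIC '-//Acme//ENTITIES Symbols//EN' 'symbols.ent'
"""
```

Calling parse_mapping_table(SAMPLE) returns the two pairs in order. How
duplicate public identifiers should be resolved is left open here, which
is why the sketch returns a list rather than a dictionary.
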