***************************************************************
****************** WELCOME TO SGML NEWSWIRE *******************
***************************************************************
* *
* To subscribe, send mail to sgmlinfo@avalanche.com *
* *
* (Please pass along to interested colleagues) *
* *
***************************************************************
MICROSOFT SPEAKS OUT ON SGML
============================
The following was posted by Microsoft to the CompuServe news
service in October 1993. It was also announced on SGML Newswire
on 5 October 1993, but that posting did not include the full text
given here.
Microsoft (R) Word & The SGML Standard
INTRODUCTION
This document provides background information on Standard
Generalized Markup Language (SGML) and briefly discusses a
forthcoming product from Microsoft to address the SGML authoring
needs of Word users.
SGML BACKGROUND
What follows is a brief history of the emergence of SGML. [1]
A Standard is Born
The term "markup" originally referred to the marks handwritten
on a manuscript by the copy editor or book designer to tell the
compositor how the manuscript was to be formatted. With the
introduction of computers and their use in typesetting, the
markup instructions would typically be embedded in the text of
the document through a process called "specific markup." These
markup instructions were typically surrounded by obscure control
characters to offset them from the body text, which made
entering them a manual, time-consuming task.
In addition, each new phototypesetting system used its own
proprietary markup language, thereby locking consumers into a
particular language and vendor.
In the early 1980s the Graphic Communications Association
(GCA) set out to define a standard markup language known as
"GenCode." However, it quickly became apparent that it would be
very difficult to build a tag set that was general enough to
serve the needs of all typesetter manufacturers without being
unwieldy in size and scope. At the same time the GCA was working
on solving these problems, an ANSI committee was defining a
standard based on another computer typesetting language,
Generalized Markup Language (GML). This standard represented the
document as a hierarchical tree of different related elements,
each of which would be formatted in a certain way. The two
organizations combined their efforts and focused on the task of
building one standard. In December 1986 the combined efforts of
the committees were introduced by the International Organization
for Standardization (ISO) as standard 8879, SGML.
What the Standard Offers
First, SGML is a completely open standard which is platform,
vendor, and application independent. SGML files are stored as
ASCII text which ensures that they can be used on virtually any
platform. Second, the power and promise of SGML emerge once a
document has been marked up with the appropriate SGML tags. By
defining structure and relationship within previously
unstructured information, SGML enables entirely new ways of
managing, publishing and reusing that information. For example,
an SGML database could store thousands of tagged documents, and
it could use these tags to publish customized versions of the
same document on demand.
As a hypothetical and extremely simple example of how such a
system could work, suppose you have a 100,000-page airplane
manual. As this document was originally written, every
paragraph was tagged with a security clearance to say whether it
was unclassified, classified, secret, or top secret. Chapters of
the document were tagged to determine their relevance for
technicians, air traffic controllers, pilots, and flight
attendants. All of this information could then be parsed into an
SGML database publishing system to facilitate on-demand
publishing. Using the encoded tags to understand the structure
of the document, you could create customized versions based on
the original data. For example, you could just as easily create
a version relevant to pilots with classified security clearance
as a version relevant to air traffic controllers with only
unclassified clearance. Because SGML stores information about the
structure of the documents and not the formatting, different
presentations can be used to suit the distribution model. For
example, the information could just as easily be presented as a
printed document or viewed by an on-line viewing tool. What
were cross references in the printed document (e.g. see figure
on page x) become hypertext jumps in the on-line document. By
structuring this information, whole new levels of control and
flexibility are gained to allow the use and reuse of the
content.
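The on-demand selection described above can be sketched in a few lines
of Python. This is only an illustration under stated assumptions: the
Paragraph record, the ordering of clearance levels, and the customize
function are hypothetical names invented here, and a real SGML
publishing system would read these values from tags and attributes in
the parsed document rather than from Python objects.

```python
from dataclasses import dataclass

# Assumed ordering of clearance levels, lowest to highest.
CLEARANCE_LEVELS = ["unclassified", "classified", "secret", "top secret"]


@dataclass
class Paragraph:
    text: str
    clearance: str        # clearance tagged on the paragraph
    audiences: frozenset  # audiences tagged on the enclosing chapter


def customize(paragraphs, audience, clearance):
    """Return the text of paragraphs relevant to `audience` and readable at `clearance`."""
    allowed = CLEARANCE_LEVELS.index(clearance)
    return [
        p.text
        for p in paragraphs
        if audience in p.audiences
        and CLEARANCE_LEVELS.index(p.clearance) <= allowed
    ]


# Hypothetical sample content standing in for the tagged manual.
manual = [
    Paragraph("Pre-flight checklist.", "unclassified",
              frozenset({"pilots", "technicians"})),
    Paragraph("Radar countermeasures.", "secret",
              frozenset({"pilots"})),
    Paragraph("Tower handoff procedure.", "unclassified",
              frozenset({"air traffic controllers"})),
    Paragraph("Encrypted radio settings.", "classified",
              frozenset({"pilots", "air traffic controllers"})),
]

pilot_view = customize(manual, "pilots", "classified")
controller_view = customize(manual, "air traffic controllers", "unclassified")
```

Both views are produced from the same source data; only the selection
criteria differ, which is the point of tagging structure rather than
formatting.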
WHAT IS SGML?
So what is SGML? It is a data description language designed for,
but not limited to, describing the structure of textual data. An
SGML document has two parts: a DTD (Document Type Definition)
and a document instance (the actual data). The DTD describes the
structure of the instance. It identifies the legal tags in a
document and their relationships to each other. The document
instance contains the data, delimited by tags defined in the
DTD.
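To make the division of labor between the DTD and the instance
concrete, here is a minimal Python sketch. It is not SGML syntax and is
not taken from any real DTD: the element names (section, title, para),
the CONTENT_MODEL table, and the is_valid function are assumptions made
for illustration, and the check covers only which child elements may
appear inside which parent, a small part of what a real DTD can express.

```python
# Hypothetical stand-in for a DTD: each element maps to the set of child
# elements it may contain. "#text" marks elements that may hold character data.
CONTENT_MODEL = {
    "section": {"title", "para"},
    "title": {"#text"},
    "para": {"#text"},
}


def is_valid(node):
    """Check a (name, children) tree against CONTENT_MODEL.

    Children are either strings (character data) or nested
    (name, children) tuples.
    """
    name, children = node
    if name not in CONTENT_MODEL:
        return False  # tag is not declared in the stand-in "DTD"
    allowed = CONTENT_MODEL[name]
    for child in children:
        if isinstance(child, str):
            if "#text" not in allowed:
                return False  # character data not permitted here
        elif child[0] not in allowed or not is_valid(child):
            return False
    return True


# The document instance: data delimited by declared tags.
instance = ("section", [
    ("title", ["This is a Title"]),
    ("para", ["This is body text."]),
])

# An instance that uses a tag the stand-in "DTD" does not declare.
bad_instance = ("section", [("footnote", ["Not declared."])])

print(is_valid(instance))
print(is_valid(bad_instance))
```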
An Example
1.2 This is a Title, (Top Secret) This is body Text, this is
body text, This is body text, this is body text, This is body
text, this is body text, This is body text, this is body text,
This is body text.
The preceding paragraph could be encoded numerous ways. How it
is represented in SGML depends on the DTD used to "mark it up."
If you mark it up using the following DTD fragment (taken from a
CALS [2] DTD), you would get the following SGML.
The DTD Fragment
[DTD declarations not preserved in this copy]
Resulting SGML
[markup tags not preserved in this copy; only the text content remains]
This is a Title
This is body Text, this is body text, This
is body text, this is body text, This is body text, this is body
text, This is body text, this is body
text, This is body text, this is body text, This is body text,
this is body text, This is body text
MICROSOFT SGML AUTHOR FOR WORD
MARKET BACKGROUND
Microsoft's vision of SGML is to broaden the accessibility of
this technology without requiring users to understand the
details of the technology. The largest problem facing SGML usage
today is the increased cost of tagging documents due to
decreased productivity. The currently available SGML editing
tools are typically not very user friendly and are designed
almost exclusively for the UNIX environment. Our approach
contrasts starkly with the current offerings, and we hope
that our product will greatly increase author productivity
by allowing authors to work in a familiar and comfortable
editing environment (Word) while still producing SGML
output.
GOALS OF SGML AUTHOR
* Make SGML easy, and make SGML authors more productive.
* Allow end users to create SGML without knowing "the standard."
* Allow MIS to configure the converter for any DTD.
HOW SGML AUTHOR WORKS
SGML Author has two parts, a converter (end-user focused) and a
separate mapping application (MIS focused).
The End User Model
To author an SGML document the end users simply construct their
documents in Word as they normally would, except they must use
styles for all formatting. To ensure that they use the styles
appropriately, the users format according to an MIS-provided
style guide and set of Word templates. To create SGML, the user
then saves the file as SGML just as they would export to any
other file format.
Once the user has chosen to save an SGML representation of the
file, an ASCII text file is created which contains syntactically
correct (i.e. parseable) SGML. To achieve this syntactically
correct SGML, the converter may modify the Word file to ensure
conformity to the DTD. For example, a DTD might have a list
element which requires that there be at least two items in the
list. If the user had only created one list item, the converter
would create a necessary, albeit empty, second item and inform
the user of this fact. The results of any necessary
modifications are returned to the user in the form of a new Word
file which has been annotated to describe in Word terminology
why the file was changed. For example, if the DTD required that
one element always follow another and the user did not follow this
convention, then the converter would automatically create the
required structure in the Word document. It would also insert a
Word annotation to inform the user why this change was
necessary. The user could then determine whether or not this new
Word document is semantically correct (i.e., it has the correct
meaning), and then make any appropriate edits. Importantly, the
end user needs to know little about SGML throughout the entire
process. This usage model is detailed in the diagram below.
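The repair-and-annotate behavior described above might look roughly
like the following Python sketch. The function name enforce_min_items,
the default minimum of two, and the annotation wording are all
assumptions for illustration; the source does not describe how the
converter is actually implemented.

```python
def enforce_min_items(items, minimum=2):
    """Pad a list with empty items until it has at least `minimum` entries.

    Returns the repaired list plus annotations explaining each change,
    mirroring the idea of handing an annotated file back to the user.
    """
    repaired = list(items)  # leave the caller's list untouched
    annotations = []
    while len(repaired) < minimum:
        repaired.append("")  # a necessary, albeit empty, item
        annotations.append(
            f"Added empty list item {len(repaired)}: "
            f"a list must contain at least {minimum} items."
        )
    return repaired, annotations


# Hypothetical one-item list authored by the end user.
repaired, notes = enforce_min_items(["Check fuel level"])
```

The repaired list is syntactically acceptable, but only the author can
decide whether the empty item should be filled in or the list
restructured, which is why the annotations are returned alongside it.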
The MIS Model
To ensure that the desired result is achieved, the converter has
to be pre-configured to create the appropriate SGML. This is
done by creating a mapping file using a provided Mapping
Application. This application is aimed at individuals who know
SGML, and it allows them to build
specific mappings between Word templates (i.e. styles) and the
structures in the SGML DTD. Where standard DTDs exist (e.g.,
CALS), Microsoft will provide pre-assembled mapping files and
templates. Customers who have built their own DTDs will need
to use the mapping application to build corresponding
templates and mapping files. This is detailed in the diagram
below.
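As a rough illustration of what a mapping between Word styles and DTD
structures could contain, the Python sketch below pairs style names
with element names and uses the pairs to tag styled paragraphs. The
style names, element names, STYLE_TO_ELEMENT table, and to_sgml
function are hypothetical; the actual mapping file format is not
described in this document.

```python
# Hypothetical mapping an MIS group might define: Word style -> SGML element.
STYLE_TO_ELEMENT = {
    "Heading 1": "title",
    "Body Text": "para",
}


def to_sgml(styled_paragraphs, mapping=STYLE_TO_ELEMENT):
    """Wrap each (style, text) pair in the element its style maps to.

    Raises KeyError if a paragraph uses a style with no mapping, which is
    why end users are asked to follow the style guide. Text is inserted
    as-is; escaping of markup characters is omitted to keep the sketch short.
    """
    lines = []
    for style, text in styled_paragraphs:
        element = mapping[style]
        lines.append(f"<{element}>{text}</{element}>")
    return "\n".join(lines)


# Hypothetical document as (style, text) pairs.
document = [
    ("Heading 1", "This is a Title"),
    ("Body Text", "This is body text."),
]
sgml = to_sgml(document)
```

Keeping the mapping as data separate from the conversion logic reflects
the split the text describes: the MIS group configures the table once
per DTD, and end users never need to see it.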
Expected Availability
This product is planned for commercial availability in the first
half of 1994. The initial release will be for Microsoft Windows,
and it will be followed by releases for the Apple (R) Macintosh
(R) and Windows NT. These products will be sold and distributed
as separate add-ons to Microsoft Word, and they will require
Word 6.0 or later to run.
[1] This is derived from an article by Elizabeth Gilmore in
the Journal of the Society for Technical Communication
(Volume 40, Number 2), May 1993.
[2] The CALS (Computer-aided Acquisition and Logistic Support)
Initiative is a program of the US Department of Defense
(DOD) and its vendors that requires the use of SGML to
maintain documentation, contracts, and contract
proposals. CALS governs a vendor's interaction with the
DOD.
(C) 1993 Microsoft Corporation. All rights reserved. Printed in
the United States of America.
The information contained in this document represents the
current view of Microsoft Corporation on the issues discussed as
of the date of publication. Because Microsoft must respond to
changing market conditions, it should not be interpreted to be a
commitment on the part of Microsoft, and Microsoft cannot
guarantee the accuracy of any information presented after the
date of publication.
This technical overview is for informational purposes only.
MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS
SUMMARY.
Microsoft, MS, and MS-DOS are registered trademarks and Windows
and Windows NT are trademarks of Microsoft Corporation. Apple
and Macintosh are registered trademarks of Apple Computer, Inc.
10/93 Part. No. 098-53048
**************************************************************
* SGML NEWSWIRE LIST MANAGER *
* *
* Linda Turner *
* Corporate Communications *
* Avalanche *
* 947 Walnut Street *
* Boulder, CO 80302 *
* sgmlinfo@avalanche.com *
* linda@avalanche.com *
* Vox: (303) 449-5032 *
* Fax: (303) 449-3246 *
**************************************************************