***************************************************************
****************** WELCOME TO SGML NEWSWIRE *******************
***************************************************************
*                                                             *
*      To subscribe, send mail to sgmlinfo@avalanche.com      *
*                                                             *
*        (Please pass along to interested colleagues)         *
*                                                             *
***************************************************************

MICROSOFT SPEAKS OUT ON SGML
============================

The following was posted by Microsoft to the CompuServe news service in October 1993. It was also announced on SGML Newswire on 5 October 1993, but that posting did not include the full text given here.

Microsoft (R) Word & The SGML Standard

INTRODUCTION

This document provides background information on Standard Generalized Markup Language (SGML) and briefly discusses a forthcoming product from Microsoft to address the SGML authoring needs of Word users.

SGML BACKGROUND

What follows is a brief history of the emergence of SGML.[1]

A Standard is Born

The term "markup" originally referred to the marks handwritten on a manuscript by the copy editor or book designer to tell the compositor how the manuscript was to be formatted. With the introduction of computers and their use in typesetting, the markup instructions would typically be embedded in the text of the document through a process called "specific markup." These markup instructions were typically surrounded by obscure control characters to offset them from the body text, making the task of entering them manual and time-consuming. In addition, each new phototypesetting system used its own proprietary markup language, thereby locking consumers into a particular language and vendor.

In the early 1980s the Graphic Communications Association (GCA) set out to define a standard markup language known as "GenCode."
However, it quickly became apparent that it would be very difficult to build a tag set that was general enough to serve the needs of all typesetter manufacturers without being unwieldy in size and scope.

While the GCA was working on these problems, an ANSI committee was defining a standard based on another computer typesetting language, Generalized Markup Language (GML). This standard represented the document as a hierarchical tree of related elements, each of which would be formatted in a certain way. The two organizations combined their efforts and focused on the task of building one standard. In December 1986 the combined efforts of the committees were introduced by the International Organization for Standardization (ISO) as standard 8879, SGML.

What the Standard Offers

First, SGML is a completely open standard that is platform, vendor, and application independent. SGML files are stored as ASCII text, which ensures that they can be used on virtually any platform.

Second, the power and promise of SGML come once a document has been marked up with the appropriate SGML tags. By defining structure and relationships within previously unstructured information, SGML enables entirely new ways of managing, publishing, and reusing that information. For example, an SGML database could store thousands of tagged documents, and it could use these tags to publish customized versions of the same document on demand.

As a hypothetical and extremely simple example of how such a system could work, suppose you have a 100,000-page airplane manual. When this document was originally written, every paragraph was tagged with a security clearance to say whether it was unclassified, classified, secret, or top secret. Chapters of the document were tagged to indicate their relevance for technicians, air traffic controllers, pilots, and flight attendants.
All of this information could then be parsed into an SGML database publishing system to facilitate on-demand publishing. Using the encoded tags to understand the structure of the document, you could create customized versions based on the original data. For example, you could just as easily create a version relevant to pilots with classified security clearance as you could create a version relevant to air traffic controllers with unclassified clearance.

Because SGML stores information about the structure of the documents and not the formatting, different presentations can be used to suit the distribution model. For example, the information could just as easily be presented as a printed document or viewed with an on-line viewing tool. What were cross references in the printed document (e.g., "see figure on page x") become hypertext jumps in the on-line document. Structuring the information in this way provides whole new levels of control and flexibility in using and reusing the content.

WHAT IS SGML?

So what is SGML? It is a data description language designed for, but not limited to, describing the structure of textual data.

An SGML document has two parts: a DTD (Document Type Definition) and a document instance (the actual data). The DTD describes the structure of the instance. It identifies the legal tags in a document and their relationships to each other. The document instance contains the data, delimited by tags defined in the DTD.

An Example

    1.2 This is a Title, (Top Secret)

    This is body Text, this is body text, This is body text, this is body text, This is body text, this is body text, This is body text, this is body text, This is body text.

The preceding paragraph could be encoded in numerous ways. How it is represented in SGML depends on the DTD used to "mark it up." If you mark it up using the following DTD fragment (taken from a CALS[2] DTD), you would get the following SGML.

The DTD Fragment

    [DTD declarations not preserved]

Resulting SGML

    [tags not preserved; the tagged text follows]

    This is a Title
    This is body Text, this is body text, This is body text, this is body text, This is body text, this is body text, This is body text, this is body text, This is body text, this is body text, This is body text, this is body text, This is body text

MICROSOFT SGML AUTHOR FOR WORD

MARKET BACKGROUND

Microsoft's vision of SGML is to broaden the accessibility of this technology without requiring users to understand its details. The largest problem facing SGML usage today is the increased cost of tagging documents due to decreased productivity. The currently available SGML editing tools are typically not very user-friendly and are designed almost exclusively for the UNIX environment. Our approach contrasts rather starkly with the current offerings, and we hope that our product will greatly increase author productivity by allowing authors to work in a familiar and comfortable editing environment (Word) while still producing SGML.

GOALS OF SGML AUTHOR

* Make SGML easy, and make SGML authors more productive.
* Allow end users to create SGML without knowing "the standard."
* Allow MIS to configure the converter for any DTD.

HOW SGML AUTHOR WORKS

SGML Author has two parts: a converter (end-user focused) and a separate mapping application (MIS focused).

The End User Model

To author an SGML document, end users simply construct their documents in Word as they normally would, except that they must use styles for all formatting. To ensure that they use the styles appropriately, the users format according to an MIS-provided style guide and set of Word templates. To create SGML, the user then saves the file as SGML just as they would export to any other file format.

Once the user has chosen to save an SGML representation of the file, an ASCII text file is created which contains syntactically correct (i.e., parseable) SGML.
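
The end-user model above can be illustrated with a minimal sketch in Python. It is not Microsoft's converter. The style names, the element names ("title", "para"), the mapping table, and both function names are hypothetical, chosen only for illustration, and the escaping step assumes the DTD in use declares the "amp" and "lt" entities.

```python
# Hypothetical style-to-element mapping. A real mapping would come from
# the MIS-built mapping file, whose format the text does not describe.
STYLE_TO_ELEMENT = {
    "Heading 1": "title",
    "Normal": "para",
}


def escape_text(text):
    """Replace markup-significant characters with entity references.

    Assumes the target DTD declares the 'amp' and 'lt' entities.
    """
    return text.replace("&", "&amp;").replace("<", "&lt;")


def paragraphs_to_sgml(paragraphs, style_map):
    """Turn (style name, text) pairs into one tagged line per paragraph.

    Raises ValueError for a style that has no mapping, since such a
    paragraph could not be placed in the document structure.
    """
    lines = []
    for style, text in paragraphs:
        if style not in style_map:
            raise ValueError(f"no mapping for style {style!r}")
        element = style_map[style]
        lines.append(f"<{element}>{escape_text(text)}</{element}>")
    return "\n".join(lines)


# A styled document, represented here as (style name, text) pairs.
document = [
    ("Heading 1", "This is a Title"),
    ("Normal", "This is body text."),
]

sgml = paragraphs_to_sgml(document, STYLE_TO_ELEMENT)
print(sgml)
```

The sketch rejects unmapped styles rather than guessing, which mirrors the requirement that users apply styles for all formatting. It does not validate the output against a DTD.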

To achieve this syntactically correct SGML, the converter may modify the Word file to ensure conformity to the DTD. For example, a DTD might have a list element that requires at least two list items. If the user had created only one list item, the converter would create a necessary, albeit empty, second item and inform the user of this fact.

The results of any necessary modifications are returned to the user in the form of a new Word file which has been annotated to describe, in Word terminology, why the file was changed. For example, if the DTD required that one element always follow another and the user did not follow this convention, then the converter would automatically create the required structure in the Word document. It would also insert a Word annotation to inform the user why this change was necessary. The user could then determine whether or not this new Word document is semantically correct (i.e., has the correct meaning), and then make any appropriate edits. Importantly, the end user needs to know little about SGML throughout the entire process.

This usage model is detailed in the diagram below.

The MIS Model

To ensure that the desired result is achieved, the converter has to be pre-configured to create the appropriate SGML. This is done by creating a mapping file using a provided Mapping Application. This application is aimed at the SGML-knowledgeable individual, and it allows this individual to build specific mappings between Word templates (i.e., styles) and the structures in the SGML DTD.

Where standard DTDs exist (e.g., CALS), Microsoft will provide pre-assembled mapping files and templates. Customers who have built their own DTDs will need to use the mapping application to build corresponding templates and mapping files.

This is detailed in the diagram below.

Expected Availability

This product is planned for commercial availability in the first half of 1994.
The initial release will be for Microsoft Windows, and it will be followed by releases for the Apple (R) Macintosh (R) and Windows NT. These products will be sold and distributed as separate add-ons to Microsoft Word, and they will require Word 6.0 or later to run.

[1] This is derived from an article by Elizabeth Gilmore in the Journal of the Society for Technical Communication (Volume 40, Number 2), May 1993.

[2] The CALS (Computer-aided Acquisition and Logistic Support) Initiative is a program of the US Department of Defense (DOD) and its vendors that requires the use of SGML to maintain documentation, contracts, and contract proposals. CALS governs a vendor's interaction with the DOD.

(c) 1993 Microsoft Corporation. All rights reserved. Printed in the United States of America.

The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication.

This technical overview is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY.

Microsoft, MS, and MS-DOS are registered trademarks and Windows and Windows NT are trademarks of Microsoft Corporation. Apple and Macintosh are registered trademarks of Apple Computer, Inc.

10/93 Part No. 098-53048

**************************************************************
*                 SGML NEWSWIRE LIST MANAGER                 *
*                                                            *
*                  Linda Turner                              *
*                  Corporate Communications                  *
*                  Avalanche                                 *
*                  947 Walnut Street                         *
*                  Boulder, CO 80302                         *
*                  sgmlinfo@avalanche.com                    *
*                  linda@avalanche.com                       *
*                  Vox: (303) 449-5032                       *
*                  Fax: (303) 449-3246                       *
**************************************************************