This library provides procedures for converting multifont text created with the jrichtext.tcl library (or with compatible tags) in a Tk text widget to a variety of other formats. It also provides a generic `Save As...' panel you can use to prompt your users for a filename and file type.
The library contains a lot of procedures; currently, only the most important public procedures are documented. If all you want to do is let your users save the contents of a richtext widget in various formats, the only procedure you'll need is j:tc:saveas.
Currently, the following output formats are supported:
* Tclformat richtext as supported by jrichtext.tcl
* TeX
* HTML
* PostScript
With one exception, only font information is converted; underlining, colours, and other tags are not (yet :-) converted. The exception is that jdoc hypertext links to other documents (i.e., not within the same document) are preserved after a fashion when converting to HTML. The HTML links generated for links to other jdoc documents (as opposed to standard URLs) may need hand-editing, however, since relative links in HTML documents and in jdoc don't follow the same rules.
Thanks to Miguel Santana <santana@imag.fr> for permission to use the /reencodeISO procedure from his a2ps program when converting richtext to PostScript.
This library considers the following tags:
richtext:font:roman
richtext:font:italic
richtext:font:bold
richtext:font:bolditalic
richtext:font:typewriter
richtext:font:heading0
richtext:font:heading1
richtext:font:heading2
richtext:font:heading3
richtext:font:heading4
richtext:font:heading5
jdoc:link:link
(where link is a URL or the name of a jdoc document)
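If you are not using jrichtext.tcl, text can be given compatible tags with the ordinary Tk text widget tag add command. The following is only a sketch: the procedure name mark_font is hypothetical and not part of this library, and it assumes t is an existing text widget.

```tcl
# Hypothetical helper, not part of jtextconvert.tcl.
# Attach one of the richtext:font:* tags listed above to a range of text.
# "font" is the last component of the tag name, e.g. bold or heading1.
proc mark_font {t font start end} {
    $t tag add "richtext:font:$font" $start $end
}

# Example call (assumes a text widget named .t exists):
#   mark_font .t bold 1.0 1.5
```

This only attaches the tag name to the text; how the tag looks on screen is a separate matter.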
j:tc:saveas t
t is the text widget whose content is to be converted
This procedure brings up a File Selection panel with an option button that lets the user choose among the supported file formats. When the user chooses a format and a name and clicks OK or presses Return, the text widget t is saved in the chosen format in the specified file.
The File Selection panel seems to cause Tk scripts to crash under at least some beta versions of Tk 4.0.
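A typical use of j:tc:saveas is as the command of a button or menu entry. The sketch below assumes that Tk is running and that this library and whatever it depends on have already been loaded; the widget names .t and .save and the procedure name make_editor are hypothetical.

```tcl
# Sketch only: build a text widget with a "Save As..." button beneath it.
# Assumes Tk is loaded and j:tc:saveas is already defined.
proc make_editor {} {
    text .t
    # Pressing the button brings up the File Selection panel for .t
    button .save -text "Save As..." -command {j:tc:saveas .t}
    pack .save -side bottom -fill x
    pack .t -expand 1 -fill both
}
```

Calling make_editor once at startup is enough; j:tc:saveas itself is only invoked when the user presses the button.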
All of the following take as their sole argument a text widget whose contents are to be converted, and return the contents of that text widget converted to the given format as their value. (Note that this can mean that you're schlepping around some fairly large strings.)
j:tc:tclrt:convert_text t - convert to Tclformat richtext; see jrichtext.tcl
j:tc:tex:convert_text t - convert to TeX source
j:tc:html:convert_text t - convert to HTML (without links, currently)
j:tc:ps:convert_text t - convert to PostScript
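Since these procedures return the converted text as a string, you can write it wherever you like without going through the File Selection panel. The following sketch shows one way to do that; save_converted is a hypothetical helper and not part of this library, and its format argument is assumed to be one of tclrt, tex, html or ps.

```tcl
# Hypothetical helper, not part of jtextconvert.tcl.
# Convert text widget t to the given format and write it to filename.
# Returns the number of characters written.
proc save_converted {t format filename} {
    # Build the procedure name, e.g. j:tc:html:convert_text
    set converter "j:tc:${format}:convert_text"
    set content [$converter $t]
    set f [open $filename w]
    puts -nonewline $f $content
    close $f
    return [string length $content]
}

# Example call (assumes a text widget named .t exists):
#   save_converted .t html notes.html
```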
Because Tclformat richtext is designed to be written into a text widget, it is the most faithful of the output formats.
The distinction between j:rt:par and two successive j:rt:cr's is lost when converting text that was generated with the jrichtext.tcl library, but it's not actually reflected in the text widget in the first place.
The TeX generated by j:tc:tex:convert_text works, but it's really weird and unnecessarily verbose. It makes lots of characters active and changes some standard parameters, so if you try to embed it in TeX documents of your own you should enclose it in braces. If you don't use any non-ASCII characters, you can trim off most of the preamble, which provides support for the ISO 8859-1 character set.
In the TeX output, tabs are converted to a fixed amount of whitespace, and spaces at the beginning of a line are lost. Multiple blank lines are also lost.
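The two pieces of advice about embedding the generated TeX can be sketched as a pair of small helpers. Both names are hypothetical and neither is part of this library: tex_group wraps the generated TeX in braces so that it forms a group, and needs_latin1 reports whether a string contains any non-ASCII character, which is a rough way to decide whether the ISO 8859-1 part of the preamble is needed at all.

```tcl
# Hypothetical helpers, not part of jtextconvert.tcl.

# Enclose generated TeX in braces so that it forms a TeX group when
# pasted into a larger document.
proc tex_group {tex} {
    return "{\n$tex\n}"
}

# Return 1 if text contains any character outside 7-bit ASCII, else 0.
proc needs_latin1 {text} {
    return [expr {![string is ascii $text]}]
}

# Example call (assumes a text widget named .t exists):
#   set tex [tex_group [j:tc:tex:convert_text .t]]
```

Trimming the preamble itself still has to be done by hand; the check only tells you whether it is safe to try.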
When converting to HTML, tabs are lost, as is any spacing at the beginnings of lines.
The distinction between paragraphs and line breaks is lost; all sequences of line breaks are translated as a single <P> code.
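To get a feel for what this loss of whitespace means in practice, the following sketch imitates the behaviour described above on a plain string. It is an illustration only and is not the code used by j:tc:html:convert_text; the name collapse_breaks is hypothetical, and it assumes that even a single line break counts as a sequence.

```tcl
# Illustration only; not the code used by the library.
proc collapse_breaks {text} {
    # Tabs are lost.
    regsub -all {\t} $text {} text
    # Spaces at the beginnings of lines are lost.
    regsub -all -line {^ +} $text {} text
    # Every run of one or more line breaks becomes a single <P>.
    regsub -all {\n+} $text "\n<P>\n" text
    return $text
}
```

Under these assumptions, a string containing three consecutive line breaks between two words comes out with just one <P> between them.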
The line-breaking algorithm used for PostScript output is hideous, and long words are likely to be wrapped across lines.
Tabs are rendered as a fixed amount of space. Spaces occasionally appear at the beginnings of lines when they shouldn't (similarly to the way they do in the Tk text widget).
What is generated is actually a PostScript program that generates the formatting, rather than a set of simple page descriptions, so it makes a lot of demands on your PostScript interpreter, and may print more slowly than you expect. Also, it doesn't conform to the PostScript comment conventions (it can't), so tools that need to work with PostScript files page-by-page will fail.
The ISO 8859-1 character set is supported only if you have a Level 2 PostScript interpreter (or at least an interpreter that knows ISOLatin1Encoding).
* The code needs to be reorganised. Code is shared between different formats that shouldn't be, and code isn't shared that should be.
* Whitespace is often lost or garbled in many of the formats.
* Much of the code is pretty inefficient.
* Tags other than font tags should be handled; for instance, colour and underlining should be supported (where possible).
* In addition to improving the existing conversions (and they really need it!), I'd like to provide modes for plain text (with lines broken sensibly, and maybe capitalisation for headers) and formatted text (like nroff(1) output). LaTeX, RTF, and troff are other possibilities.
* It would be nice to support WYSIWYG writing of manual pages, or generation of them from jdoc documents. This would probably require a little additional information beyond what's in the text widget (e.g., name and description, section of the manual, etc.).
* When jdoc documents are converted to HTML, I'd like to translate hypertext links and anchors as well as fonts. (The capabilities of jdoc are closely modelled after those expressible in HTML.)
* The exact fonts used when generating PostScript and TeX should be user preferences.
* The TeX conversion does a lot of work to support ISO 8859-1. This should only be done if there are actually non-ASCII characters in the text (or perhaps it should be a user preference). The PostScript conversion should support ISO 8859-1 even on Level 1 interpreters (it's easy enough).