=========================================================================
Date:         Tue, 5 Feb 91 10:56:20 GMT
Reply-To:     Text Encoding Initiative public discussion list
Sender:       Text Encoding Initiative public discussion list
From:         DEL2@PHOENIX.CAMBRIDGE.AC.UK
Subject:      Unicode 1.0

I recently requested a copy of the draft spec of the Unicode 1.0 character
encoding. Although not able to give it all the time I'd have liked, my brief
look does raise a number of comments. I'm grateful to have the opportunity
to plug my comments into the general discussion (via TEI, HUMANIST and the
UNICODE team themselves: microsoft!asmusf@uunet.uu.net).

(a) There are a number of significant typos; is anyone keeping a master
record of these?

(b) Robin Cover has raised the question of why there are not separate
encodings for Hebrew SIN and SHIN. They are certainly at least as distinct
as, say, LATIN E followed by ACUTE and LATIN E ACUTE. I take it that the
reason the latter case has two encodings is previous ISO encodings; but
since those are in any case ASCII encodings (and Unicode is intended as a
replacement for ASCII), how relevant is that?

The question also raises a more fundamental problem in my mind. There are a
number of situations where a glyph (or conglomerate of glyphs) can
reasonably be encoded in alternative ways; HYPHEN (U+2010 = U+002d) would be
a case in point. We are told that some of these redundancies are there so
that natural pairing can be used "if desired" (page 6). However, these coded
pairs are not consistently provided (e.g. CAPITAL DOTTED I). What worries me
is that two encodings of an identical text may thus turn out to be very
different; and for anyone using computer comparison of texts this could be
quite problematic. So against those who complained that, e.g., separate
codings for GREEK ALPHA+GRAVE are not available, I would voice the opposite
disquiet: the encodings are too comprehensive.
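[A modern illustration of the comparison problem raised above. The Python
unicodedata module and the NFD normalization form are later developments
(they did not exist when this message was written) and are assumed here only
to show how alternative encodings of the same text can be reconciled before
comparison:]

```python
import unicodedata

# Two encodings of the same visible text: precomposed vs. decomposed.
precomposed = "\u00e9"        # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"        # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT

# Naive code-for-code comparison reports the identical texts as different:
assert precomposed != decomposed

# Normalizing both sides to a canonical form makes them compare equal:
assert (unicodedata.normalize("NFD", precomposed)
        == unicodedata.normalize("NFD", decomposed))

# Canonical ordering also resolves accent-order ambiguity: combining marks
# with distinct combining classes are sorted into a fixed order, so
# differently ordered but equivalent sequences normalize identically.
a = "a\u0316\u0301"   # a + COMBINING GRAVE ACCENT BELOW + COMBINING ACUTE
b = "a\u0301\u0316"   # the same two marks, in the other order
assert unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)
```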
If ALL accentuation were added as a separate code, I think comparison of
texts would be easier. The ordering of the accents would then of course be
important, and I don't think the algorithm given (centre-out) is terribly
helpful; which is nearest the centre in GREEK ROUGH BREATHING+ACUTE+IOTA
SUBSCRIPT? Wouldn't an additional algorithm (clockwise starting at twelve
o'clock) be useful?

(c) While we're on Greek, I couldn't find a Greek semicolon (raised dot).
Maybe I just didn't look hard enough, but full punctuation would be useful.
But see my comment (e) below. I also failed to locate LATIN CAPITAL LETTER
WYNN.

(d) In general I approve of the policy that by adding the special Coptic
forms to the Greek alphabet one can generate Coptic text, with hard copy
generated by choosing an appropriate font. (And mutatis mutandis for other
languages.) However, there are some drawbacks to this policy; I foresee the
following problems:

(i) It may be necessary to indicate to someone (if only the compositor)
where to change font. Could a coding for change-of-language be incorporated?

(ii) In some Greek texts it may be important to indicate where ligatures are
used; there seems to be no way in this encoding to distinguish between GREEK
KAPPA + GREEK ALPHA + GREEK IOTA on the one hand and the ligature which
stood for "kai" on the other.

I am sometimes in the position of needing to say (as indeed the authors of
the manual were) something like "There are three possible forms of LATIN
SMALL LETTER G CEDILLA (U+0123) and they look like ...". How could I encode
my ellipsis? Could the whole of the manual as printed be sensibly encoded in
Unicode? Oddly, there are some forms which are exclusively graphic variants
(i.e. one would not find them together in a "natural" text) which do attract
separate codings; GREEK SMALL LETTER SCRIPT THETA, for instance. Perhaps
consistency is unattainable, but to me it is a desideratum.

(e) The encoding of special numerals seemed odd.
As well as a select group of fractions (thirds, quarters and eighths, I
think), there is the top half of fractional 1/nnn (U+215f). How is its use
envisaged? Wouldn't a generalised "fraction line" be better (let's call it
U+nnnn), so that nnnn is to be interpreted as a fraction? Similarly, Roman
12 (XII) is encoded as U+216b, but 13 (XIII) must be (presumably) U+2169
U+2162. Why not a single code for "roman numerals follow here" (or just use
ROMAN CAPITAL LETTER X &c)? If codes for general *modes* like "Greek font",
"roman numeral", "fraction" were included, then many ambiguities and
problems could be reduced. My Greek semicolon, for instance, could be
"GREEK FONT + ;".

This contribution could be better thought-out, but it was this or nothing.
If the latter seems preferable, please discard!

Sincerely, Douglas de Lacey.
=========================================================================
Date:         Tue, 5 Feb 91 12:52:13 HNE
Reply-To:     Text Encoding Initiative public discussion list
Sender:       Text Encoding Initiative public discussion list
From:         PADROUIN@LAVALVM1.BITNET
Subject:      CIL 92

This circular replaces the one published a few days ago.
cipl92@lavalvm1
========================================================================

XVe Congres international des linguistes
Quebec, Canada, 9-14 aout 1992
Organise par l'Universite Laval avec le concours de l'Association canadienne
de linguistique (ACL) et sous les auspices du Comite international permanent
des linguistes (CIPL)
1ere circulaire / Renseignements generaux

XVth International Congress of Linguists
Quebec City, Canada, August 9-14, 1992
Organized by Laval University in collaboration with the Canadian Linguistic
Association (CLA), under the auspices of the Permanent International
Committee of Linguists (PICL)
1st circular / General information

CIL92
Departement de Langues et Linguistique
Universite Laval
Quebec City, (Que.), G1K 7P4, CANADA
Telephone: (418) 656-5323   FAX: (418) 656-2019
E-Mail: CIPL92@LAVALVM1.BITNET

ANNOUNCEMENT

XVth International Congress of Linguists
Organized by Laval University in collaboration with the Canadian Linguistic
Association (CLA), under the auspices of the Permanent International
Committee of Linguists (PICL)
Quebec City, August 9-14, 1992

General theme of the Congress: "The Survival of Endangered Languages"

Honorary President: Michel Gervais, Rector of Laval University

Organizing Committee:
President: Pierre Auger, Department of Languages and Linguistics, Laval
University
Vice-President: Walter Hirtle, Department of Languages and Linguistics,
Laval University
General Secretary: Silvia Faitelson-Weiser, Department of Languages and
Linguistics, Laval University
Program: Marie Surridge, Past President of the CLA, Department of French
Studies, Queen's University, Kingston, Ontario
Local Arrangements: Jean-Louis Tremblay, Department of Languages and
Linguistics, Laval University
Publications: Conrad Ouellon, Director of CIRAL, Department of Languages and
Linguistics, Laval University

GENERAL INFORMATION

DATE AND LOCATION: August 9-14, 1992, Laval University, Quebec City, Canada

ACCOMMODATION: Hotels in all price ranges and limited
accommodation in university residence halls. For information on
accommodation, contact:

OFFICE DU TOURISME ET DES CONGRES DE LA COMMUNAUTE URBAINE DE QUEBEC
399, Saint-Joseph East Street
Quebec City, Quebec
Canada, G1K 8E2
Tel: (418) 522-3511

PASSPORTS AND VISAS: All visitors to Canada, except residents of the United
States, are required to have a valid passport. Citizens of some countries
are also required to have a visa. All enquiries should be addressed to the
closest Canadian embassy, consulate or high commission.

REGISTRATION FEES:
                        Participants      Accompanying      Students*
                                          Guests
Before 91/04/30:        $160.50 (U.S.)**  $80.25 (U.S.)     $160.50 (U.S.)
                        $187.25 (CAN.)    $107.00 (CAN.)    $187.25 (CAN.)
From 91/05/01
  to 91/12/31:          $214.00 (U.S.)    $107.00 (U.S.)    $160.50 (U.S.)
                        $251.45 (CAN.)    $133.75 (CAN.)    $187.25 (CAN.)
From 92/01/01
  to 92/08/09:          $285.00 (U.S.)    $142.50 (U.S.)    $171.00 (U.S.)
                        $342.00 (CAN.)    $171.00 (CAN.)    $199.50 (CAN.)

Congress fees may be paid by cheque to CIPL92, or by credit card (American
Express, Master Card and Visa). In the event of a cancellation, part of the
registration fee will be refunded (75% before February 28, 1992; 50% from
March 1, 1992 to May 31, 1992). There will be no reimbursement for
cancellations received at the Congress office after May 31, 1992. However,
the persons concerned will be sent Congress registration packets.

* Only participants with an official letter from their universities
certifying their student status will pay the student registration fee.
** All taxes included within the registration fees.
PROGRAM:

Sunday, August 9       Registration; Reception
Monday, August 10      Opening ceremony; Plenary session; Oral
                       presentations; Poster sessions; Panel discussions
Tuesday, August 11     Plenary session; Oral presentations; Poster sessions;
                       Panel discussions
Wednesday, August 12   Excursions
Thursday, August 13    Plenary session; Oral presentations; Poster sessions;
                       Panel discussions
Friday, August 14      Plenary session; Oral presentations; Poster sessions;
                       Panel discussions; Closing ceremony

OFFICIAL LANGUAGES: The languages of the Congress will be Canada's two
official languages, French and English.

PLENARY SESSIONS: As is customary, the topic of each plenary session will be
introduced by three or more speakers. This will be followed by a general
discussion. The topics of these sessions are:
1. Semantics, syntax, pragmatics
2. The word
3. Endangered languages
4. Theoretical approaches to language: the state of the art and prospects
   for the future

PAPERS: Conference papers may take the form of oral presentations or poster
sessions. Oral presentations are scheduled to last twenty minutes, including
a five-minute question period. Participants choosing the poster session will
be allowed two hours. The schedule of papers will be announced in the third
circular. The following is a provisional list of section topics:
1. Sounds, phonemes and intonation
2. The word (morphology, lexicology, lexicography, terminology)
3. The sentence (syntax, function, etc.)
4. Meaning (semantics, lexical meaning, grammatical meaning, etc.)
5. Spoken or written text (pragmatics, discourse analysis, etc.)
6. Language and society (sociolinguistics, linguistic variation, language
   and culture, etc.)
7. Language and the individual (psycholinguistics, neurolinguistics,
   language acquisition, etc.)
8. The history of language
9. Language planning
10. Language learning
11. Survival of endangered languages
12. Theories of language
13. Language and the computer
14. Pidgins and creoles
15. The history of linguistics
16. Methodology (data observation, corpus gathering and processing,
    experimentation)
17. Other (language and women, sign language, etc.)

Participants wishing to present a paper will be requested to send an
abstract before October 1, 1991. See the second circular (May 1991) for
details.

PANEL DISCUSSIONS: The Organizing Committee invites participants to propose
topics for panel discussions by April 1, 1991. Participants whose topic is
chosen will be responsible for organizing their panel discussion.

PRESENTING A PAPER: The second circular will be sent to those who complete
the enclosed answer card.

INFORMATION:
CIL92, Pierre Auger
Departement de langues et linguistique
Universite Laval
Quebec City, (Que.) G1K 7P4, CANADA
Telephone: (418) 656-5323   FAX: (418) 656-2019
E-Mail: CIPL92@LAVALVM1

Early Registration Form
XVth International Congress of Linguists

Name: Mr./Ms. ___________________________________________
Title: __________________________________________________
Institution or Agency: __________________________________
Address: ________________________________________________
_________________________________________________________
Tel.: _____________________  FAX: _______________________
E-Mail: ___________________

REGISTRATION
                  Before 91/04/30   From 91/05/01     From 92/01/01
                                    to 91/12/31       to 92/08/09
Regular           $160.50 (U.S.)    $214.00 (U.S.)    $285.00 (U.S.)
                  $187.25 (CAN.)    $251.45 (CAN.)    $342.00 (CAN.)
Students          $160.50 (U.S.)    $160.50 (U.S.)    $171.00 (U.S.)
                  $187.25 (CAN.)    $187.25 (CAN.)    $199.50 (CAN.)
Accompanying      $80.25 (U.S.)     $107.00 (U.S.)    $133.75 (U.S.)
Guests            $107.00 (CAN.)    $133.75 (CAN.)    $171.00 (CAN.)
PAYMENT
Cheque ___  Master Card ___  Visa ___  American Express ___
Expiration date: __________________  Signature: __________________

I would like to present a paper.  Yes ___  No ___
Chosen sections (in order of preference):
1. ______________________________________
2. ______________________________________
Preferred way to present a paper:  -oral  -poster session  -no preference

ANSWER CARD
(To be filled out by anyone wishing to receive the second circular)
Name: Mr./Ms. ___________________________________________
Address: ________________________________________________
_________________________________________________________
Tel: ____________________  FAX: _________________________
E-Mail: _________________________

CIL92, Pierre Auger
Departement de langues et linguistique
Universite Laval
Quebec City, (Que.) G1K 7P4, CANADA
Telephone: (418) 656-5323   FAX: (418) 656-2019
E-Mail: CIPL92@LAVALVM1.BITNET

+++++++++++++++++++++++++++++++++++++++++++++++

ANNONCE

XVe Congres international des linguistes
Organise par l'Universite Laval avec le concours de l'Association canadienne
de linguistique (ACL) et sous les auspices du Comite international permanent
des linguistes (CIPL)
Quebec, 9 au 14 aout 1992

Theme principal du Congres: "La survie des langues menacees"

President d'honneur: M. Michel Gervais, Recteur de l'Universite Laval

Comite d'organisation:
President: M. Pierre Auger, Departement de langues et linguistique,
Universite Laval
Vice-president: M. Walter Hirtle, Departement de langues et linguistique,
Universite Laval
Secretaire-generale: Mme Silvia Faitelson-Weiser, Departement de langues et
linguistique, Universite Laval
Programme: Mme Marie Surridge, Presidente sortante de l'ACL, Departement
d'Etudes Francaises, Universite Queen, Kingston, Ontario
Accueil: M. Jean-Louis Tremblay, Departement de langues et linguistique,
Universite Laval
Publications: M.
Conrad Ouellon, Directeur du CIRAL, Departement de langues et linguistique,
Universite Laval

RENSEIGNEMENTS GENERAUX

DATE ET LIEU DU CONGRES: 9 au 14 aout 1992, Universite Laval, Quebec, Canada

HEBERGEMENT: Hotels de differentes categories et nombre limite de logements
economiques dans les residences de l'Universite Laval. Toutes les demandes
concernant l'hebergement doivent etre acheminees a:

OFFICE DU TOURISME ET DES CONGRES DE LA COMMUNAUTE URBAINE DE QUEBEC
399, rue Saint-Joseph Est
Quebec, Quebec
Canada, G1K 8E2

PASSEPORTS ET VISAS: Tous les visiteurs entrant au Canada, sauf les
residents des Etats-Unis, doivent etre en possession d'un passeport valide.
Pour les ressortissants de certains pays, un visa est egalement requis.
Chacun des participants est encourage a consulter l'ambassade, le consulat
ou le haut-commissariat canadien le plus pres, pour verifier les conditions
qui s'appliquent a sa situation.

FRAIS D'INSCRIPTION:
                     Congressistes     Accompagnants     Etudiants*
Avant 91/04/30:      160.50$ (U.S.)**  80.25$ (U.S.)     160.50$ (U.S.)
                     187.25$ (CAN.)    107.00$ (CAN.)    187.25$ (CAN.)
Du 91/05/01
  au 91/12/31:       214.00$ (U.S.)    107.00$ (U.S.)    160.50$ (U.S.)
                     251.45$ (CAN.)    133.75$ (CAN.)    187.25$ (CAN.)
Du 92/01/01
  au 92/08/09:       285.00$ (U.S.)    142.50$ (U.S.)    171.00$ (U.S.)
                     342.00$ (CAN.)    171.00$ (CAN.)    199.50$ (CAN.)

Les frais d'inscription peuvent etre payes par cheque, a l'ordre de CIPL92,
ou par carte de credit (American Express, Master Card ou Visa). En cas
d'annulation, une partie des frais d'inscription sera remboursee, soit 75%
avant le 28 fevrier 1992, 50% du 1er mars au 31 mai 1992. Les annulations
effectuees apres le 31 mai 1992 ne seront pas remboursees; cependant, les
personnes inscrites recevront tout le materiel distribue pour la tenue de ce
congres.

* Seuls les participants munis d'une lettre de leur universite attestant
leur statut d'etudiant pourront beneficier du tarif etudiant.
** Ont ete ajoutees aux frais d'inscription les taxes federale et
provinciale sur les biens et les services.
PROGRAMME

Dimanche, 9 aout     Inscription; Soiree d'accueil
Lundi, 10 aout       Ceremonie d'ouverture; Session pleniere; Communications
                     orales; Communications par affiche; Tables-rondes
Mardi, 11 aout       Session pleniere; Communications orales; Communications
                     par affiche; Tables-rondes
Mercredi, 12 aout    Journee libre, excursions
Jeudi, 13 aout       Session pleniere; Communications orales; Communications
                     par affiche; Tables-rondes
Vendredi, 14 aout    Session pleniere; Communications orales; Communications
                     par affiche; Tables-rondes; Ceremonie de cloture

LANGUES DU CONGRES: Les deux langues du Congres seront les langues
officielles du Canada, soit le francais et l'anglais.

SESSIONS PLENIERES: Selon l'usage, le sujet de chaque session pleniere sera
presente par trois conferenciers ou plus. Une discussion generale suivra.
Les sujets de ces sessions seront les suivants:
1. Semantique, syntaxe, pragmatique
2. Le mot
3. Les langues menacees
4. Les approches theoriques: le present et l'avenir

COMMUNICATIONS: Des communications orales ou des communications par affiche
seront acceptees. Le temps alloue pour une communication orale et la
discussion qui suit est de vingt minutes. L'auteur d'une communication par
affiche beneficiera d'une periode de deux heures pour presenter sa
communication. L'horaire des communications figurera dans la troisieme
circulaire. La liste provisoire des sujets retenus pour les sections est la
suivante:
1. Les sons, les phonemes et l'intonation
2. Le mot (morphologie, lexicologie, lexicographie, terminologie, etc.)
3. La phrase (syntaxe, fonction, etc.)
4. Le sens (semantique, signification lexicale, signification grammaticale,
   etc.)
5. Le texte parle ou ecrit (pragmatique, analyse de discours, etc.)
6. Langage et societe (sociolinguistique, variation linguistique, langue et
   culture, etc.)
7. La langue et l'individu (psycholinguistique, neurolinguistique,
   acquisition et apprentissage des langues)
8. La langue dans le temps
9. L'amenagement linguistique
10.
Apprentissage des langues
11. La survie des langues menacees
12. Theorie du langage
13. Langage et informatique
14. Pidgins et creoles
15. Histoire de la langue
16. Methodologie (observation des donnees, constitution et traitement de
    corpus, experimentation, etc.)
17. Autres (la langue et les femmes, le langage par signes, etc.)

Les participants sont invites a proposer une communication dans une des
sections ci-haut mentionnees. Ils devront faire parvenir un resume de leur
communication avant le 1er octobre 1991 au bureau du Congres. Pour de plus
amples renseignements, consultez la deuxieme circulaire disponible en mai
1991.

TABLES-RONDES: Le Comite d'organisation invite les participants interesses
a organiser une table-ronde a soumettre des propositions de sujets, et ceci
avant le 1er avril 1991. Le participant dont le sujet sera accepte sera
responsable de l'organisation de la table-ronde.

PRESENTATION D'UNE COMMUNICATION: La deuxieme circulaire (mai 1991) sera
envoyee a ceux qui auront rempli la fiche de reponse ci-jointe. Veuillez la
retourner le plus tot possible.

INFORMATION:
CIL92, Pierre Auger
Departement de langues et linguistique
Universite Laval
Quebec, (Que.) G1K 7P4, CANADA
Telephone: (418) 656-5323   FAX: (418) 656-2019
E-mail: CIPL92@LAVALVM1.BITNET

Fiche de preinscription
XVe Congres international des linguistes

Nom: ___________________________  Prenom: M./Mme ________________________
Titre: ________________________________________________
Etablissement ou organisme: ___________________________
Adresse: ______________________________________________
_______________________________________________________
Tel.: (____)____________________  FAX: (____)__________
E-Mail: _______________________________________________

INSCRIPTION:
                Avant 91/04/30    Du 91/05/01       Du 92/01/01
                                  au 91/12/31       au 92/08/09
Regulier        160.50$ (U.S.)    214.00$ (U.S.)    285.00$ (U.S.)
                187.25$ (CAN.)    251.45$ (CAN.)    342.00$ (CAN.)
Etudiants       160.50$ (U.S.)    160.50$ (U.S.)    171.00$ (U.S.)
                187.25$ (CAN.)    187.25$ (CAN.)    199.50$ (CAN.)
Accompagnants   80.25$ (U.S.)     107.00$ (U.S.)    133.75$ (U.S.)
                107.00$ (CAN.)    133.75$ (CAN.)    171.00$ (CAN.)

MODE DE PAIEMENT:
Cheque ___  Master Card ___  Visa ___  American Express ___
Date d'expiration: ____________________  Signature: ____________________

Je desire presenter une communication.  Oui ___  Non ___
Sections choisies (par ordre de preference):
1. ___________________________________________________
2. ___________________________________________________
Mode de presentation choisi:  -oral  -par affiche  -aucune preference

Fiche de reponse
(a remplir par tous ceux qui veulent recevoir la deuxieme circulaire)
Nom: _________________________  Prenom: M./Mme _________________________
Adresse: _______________________________________________
________________________________________________________
Tel.: (____)___________________  FAX: (____)___________________
E-Mail: ______________________________

CIL 92, Pierre Auger
Departement de langues et linguistique
Universite Laval
Quebec, (Que.) G1K 7P4, CANADA
Telephone: (418) 656-5323   FAX: (418) 656-2019
E-Mail: CIPL92@LAVALVM1.BITNET

=========================================================================
Date:         Tue, 5 Feb 91 19:55:18 -0500
Reply-To:     Text Encoding Initiative public discussion list
Sender:       Text Encoding Initiative public discussion list
From:         Don Walker
Subject:      COLING-92 First Announcement & Call for Papers

Fourteenth International Conference on Computational Linguistics
COLING-92
23-28 July 1992, Nantes, France

FIRST ANNOUNCEMENT AND CALL FOR PAPERS

DATES: The conference will last five full days (not counting Sunday).
Pre-COLING tutorials will take place on 20-22 July (2-1/2 days).

ORGANIZERS: GETA and IMAG, Grenoble (F. Peccoud, Ch. Boitet, J. Courtin);
Palais des Congres, Nantes (M. Gillet); Universite de Nantes (M.H.
Jayez), EC2 (G. d'Aumale). PROGRAMME CHAIR: Prof. A. Zampolli, Universita di Pisa, ILC, via della Faggiola 32, I-56100 Pisa, ITALY (tel: +39.50.560481; fax: +39.50.589055). DEADLINES: Send six A4 or 8-1/2 by 11 inch copies of the full paper to Prof. Zampolli before 1 November 1991. Notifications of acceptance will be sent by 1 March 1992. Camera-ready copies of final papers conforming to the COLING-90 style sheet must reach GETA (GETA-IMAG, COLING-92, BP 53X, F-38041 Grenoble, FRANCE) by 1 May 1992. TOPICS: All topics in Computational Linguistics are acceptable. Papers concerning real applications will be especially welcome. A special session on language industry is planned. Please indicate main areas of papers using two-level categories: computational models and formalisms (in morphology, syntax, semantics, pragmatics, discourse, dialogue, . . .), methods (symbolic, numerical, statistical, neural, . . .), tools (specialized languages, environments), large-scale resources (textual, lexical, grammatical databases), applications (natural language interfaces, information retrieval, text generation, machine translation, machine aids to writing, translating, abstracting, learning, . . .), hypermedia and natural language processing (integration of text, speech, graphics, video), generic questions in language industry (engineering, ergonomics, legal aspects, normalization, . . .). TYPES OF PAPERS: Topical papers (maximum seven pages in final format) on crucial issues in Computational Linguistics, and project notes (maximum five pages). Only unpublished papers will be accepted. Papers should describe substantial and original work, especially new methodologies and applications. They should emphasize completed rather than intended work. PRELIMINARY SCHEDULE: Twelve 30-minute lecture slots daily (hopefully in only three parallel sessions) and three 30-minute demonstration slots during the lunch break (hopefully in at least ten parallel sessions). 
It should be possible to have lunch and go to two or even three demos. DEMONSTRATIONS: Demonstrations are strongly encouraged. A project note without a demo will have a lower probability of acceptance. With a demo, it will get three consecutive demo slots. A topical paper including a demo will be presented as a lecture and as a demo. LANGUAGES: One extra page will be allowed for a long abstract in English, if the paper is written in another language, or conversely (paper in English and long abstract in another language). Speakers not giving their talk in English are encouraged to use visual aids in English. EXHIBITION: An exhibition of language industry products will be organized in parallel by EC2, the well known organizer of the annual Avignon meetings on Expert Systems. Industrial firms are encouraged to present state-of-the-art NLP products. OTHER ACTIVITIES: A social programme will be proposed to participants and companions. Individual discovery is also possible, as Nantes and its region are culturally very active and full of picturesque places. Organized on behalf of the International Committee on Computational Linguistics Martin Kay, Palo Alto (President); Eva Hajicova, Prague (Vice President); Donald E. 
Walker, Morristown (Secretary General); Christian Boitet, Grenoble;
Nicoletta Calzolari, Pisa; Brian Harris, Ottawa; David Hays, New York
(Honorary); Kolbjorn Heggstad, Bergen; Hans Karlgren, Stockholm; Olga
Kulagina, Moscow; Winfried Lenders, Bonn; Makoto Nagao, Kyoto; Helmut
Schnelle, Bochum; Petr Sgall, Prague; Yorick Wilks, Las Cruces; Antonio
Zampolli, Pisa

=========================================================================
Date:         Tue, 5 Feb 91 19:57:12 -0500
Reply-To:     Text Encoding Initiative public discussion list
Sender:       Text Encoding Initiative public discussion list
From:         Don Walker
Subject:      ACL Applied Natural Language Processing Conference - Trento
              1992

CALL FOR PAPERS

3rd Conference on Applied Natural Language Processing
Trento, Italy, 1-3 April 1992
sponsored by the Association for Computational Linguistics

PURPOSE

The focus of this conference is on the application of natural language
processing techniques to real world problems. It will include invited and
contributed papers, tutorials, an industrial exhibition, and demonstrations.
A special video session is also being organised. The organizers want the
conference to be as international as possible, and to feature the best
applied natural language work presently available in the world. This
conference follows on from those held in Santa Monica, California in 1983,
and in Austin, Texas in 1988.

AREAS OF INTEREST

Original papers are being solicited in all areas of applied natural language
processing, including but not limited to: dialog systems; integrated speech
and natural language systems; machine translation; explanation and
generation; database interface systems; tool development; text and message
processing; grammar and style checking; corpus development; knowledge
acquisition; lexicons; language teaching aids; evaluation; adaptive systems;
multilanguage systems; multimedia systems; help systems; and other
applications.
Papers may discuss applications, evaluations, limitations, and general tools and techniques. Papers that critically evaluate a relevant formalism or processing strategy are especially welcome. REQUIREMENTS FOR SUBMISSION Authors should submit, by 10 September 1991, a) six copies of a full-length paper (min 9, max 18 double-spaced pages, minimum font size 12, exclusive of references); b) 16 copies of a 20-30 line abstract; c) a declaration that the paper has not been accepted nor is under review for a journal or other conference nor will it be submitted during the conference review period. Papers arriving after the deadline will be returned unopened. We regret that papers cannot be submitted electronically, or by fax. Papers should describe completed rather than intended work, identify distinctive aspects of the work, and clearly indicate the extent to which an implementation has been completed; vague or unsubstantiated claims will be given little weight. Both the paper and the abstract should include the title, the name(s) of the author(s), complete addresses and e-mail address. Papers from Europe and Asia should be sent to: Oliviero Stock (ANLP-3) phone: +39-461-814444 I.R.S.T. fax: +39-461-810851 38050 Povo (Trento), ITALY email: stock@irst.it Papers from America and other continents should be sent to: Madeleine Bates (ANLP-3) phone: +1-617-8733634 BBN Systems & Technologies fax: +1-617-8733776 10 Moulton Street email: bates@bbn.com Cambridge, MA 02138, USA Authors will be notified of acceptance or rejection by 30 November 1991. Full-length versions of accepted papers, prepared according to instructions, must be received, along with a signed copyright release statement, by 15 January 1992. All papers will be reviewed by members of the program committee, which is co-chaired by Madeleine Bates (BBN Systems & Technologies) and Oliviero Stock (IRST) and also includes: Robert Amsler, MITRE Kathy McKeown, Columbia Univ. Giacomo Ferrari, Univ. 
of Pisa; Sergei Nirenburg, Carnegie Mellon Univ.; Eduard Hovy, USC/ISI;
Makoto Nagao, Kyoto Univ.; Paul Jacobs, General Electric; Remko Scha, Univ.
of Amsterdam; Martin Kay, Xerox PARC; Karen Sparck Jones, Univ. of
Cambridge; Mark Liberman, Univ. of Pennsylvania; Henry Thompson, Univ. of
Edinburgh; Paul Martin, MCC; Wolfgang Wahlster, DFKI

VIDEOTAPES

Videotapes are sought that display interesting research on NLP applications
to real-world problems, even if presented as promotional videos (not
advertisements). An ongoing video presentation will be organized that will
demonstrate the current level of usefulness of NLP tools and techniques.
Authors should submit one copy of a videotape of at most 15 minutes
duration, accompanied by a submission letter giving permission to copy the
tape to a standard format, and two copies of a one to two page abstract that
includes: title; name, address and email or fax number of the authors; tape
format of the submitted tape (VHS, in any of NTSC, PAL or SECAM); and
duration. The final tape format provided by the authors should be one of
VHS, 3/4'' U-Matic or BVU, in any of NTSC, PAL or SECAM. Videotapes cannot
be returned. Tape submissions should be sent to the same address as the
papers (see above). The timetable for submissions, notification of
acceptance or rejection, and receipt of final versions is the same as for
the papers. See above for details. Tapes will be reviewed and selected for
presentation during the conference. Abstracts of accepted videos will appear
in the conference proceedings. We are also considering the possibility of
producing a collection of video proceedings, for those videotapes that
authors agree to distribute. A preliminary indication on this matter will be
appreciated.

DEMONSTRATIONS

Besides demonstrations to be carried out within a regular booth at the
industrial exhibition, there will be a program of demonstrations on standard
equipment available at the conference (SUNs, MACs, etc.).
Anyone wishing to present a demo should send a one-page description of the demo and a specification of the system requirements by 1 December 1991 to Carlo Strapparava phone: +39-461-814444 I.R.S.T. fax: +39-461-810851 38050 Povo (Trento), ITALY email: strappa@irst.it PRIZE A prize will be given for the best nonindustrial demonstration. TUTORIALS The meeting will be preceded by one or two days of tutorials by noted contributors to the field. Responsible for tutorials: Jon Slack phone: +39-461-814444 I.R.S.T. fax: +39-461-810851 38050 Povo (Trento), ITALY email: slack@irst.it WORKSHOPS Proposals for organizing workshops in Trento immediately after the conference can be addressed to Oliviero Stock at the above address. INDUSTRIAL EXHIBITION Facilities for exhibits will also be available. Persons wishing to arrange an exhibit should send a brief description together with a specification of physical requirements (space, power, telephone connections, table, etc.) by 1 September 1991 to Giampietro Carlevaro phone: +39-461-814444 I.R.S.T. fax: +39-461-810851 38050 Povo (Trento), ITALY email: carleva@irst.it GENERAL INFORMATION Local arrangements are being handled by Tullio Grazioli and Oliviero Stock phone: +39-461-814444 I.R.S.T. fax: +39-461-810851 38050 Povo (Trento), ITALY email: interne@irst.it For information on the ACL, contact Donald E. Walker (ACL) phone: +1-201-8294312 Bellcore, MRE 2A379 fax: +1-201-4551931 445 South Street, Box 1910 email: walker@flash.bellcore.com Morristown, NJ 07960, USA The conference is also supported by the European Coordinating Committee for Artificial Intelligence (ECCAI), the Italian Association for Artificial Intelligence (AI*IA) and Istituto Trentino di Cultura. 
========================================================================= Date: Wed, 6 Feb 91 06:37:33 CST Reply-To: Text Encoding Initiative public discussion list Sender: Text Encoding Initiative public discussion list From: "Eric Johnson DSU, Madison, SD 57042" Subject: Conference I C E B O L 5 Fifth International Conference on Symbolic and Logical Computing Dakota State University April 18-19, 1991 Madison, SD 57042 KEYNOTE SPEAKER Nancy M. Ide Professor and Chair, Computer Science Department, Vassar College. Author of _Pascal for the Humanities_ and articles on William Blake, artificial intelligence, and programming for the analysis of texts. FEATURED SPEAKER Ralph Griswold One of the creators of the Icon programming language and SNOBOL4. He is the editor of two newsletters, and the author of six books and dozens of articles on computer languages and string and list processing. He is Professor of Computer Science at the University of Arizona. ICEBOL5, the fifth International Conference on Symbolic and Logical Computing, is designed for teachers, scholars, and programmers who want to meet to exchange ideas about computer programming for non-numeric applications -- especially those in the humanities. In addition to a focus on SNOBOL4, SPITBOL, and Icon, ICEBOL5 will feature presentations on processing in a variety of programming languages such as Pascal, Prolog, C, and REXX. SCHEDULED TOPICS Music Score Recognition Automatic File Generation Predicate Logic Parallel Logic Programming Tools for Navajo Lexicography Expert System for Advising Computer Analysis of Poetry and Prose Simulating Neural Activity Parsing Texts Data Integrity Checking Digitized Voice Management Selecting Expert Systems Grammar and Machine Translation Editing Large Texts Logical Modeling of Complex Systems ACCOMMODATIONS Please make your own reservations. Lake Park Motel (Single $23);(Double $26) W. Hwy. 34 Phone (605) 256-3424 Super 8 (Single $26);(Double $32) W. Hwy. 
34 Phone (605) 256-6931 All major chains available in Sioux Falls, SD (50 miles from conference site)

- - - - - - - - - - - - - REGISTRATION FORM - - - - - - - - - - - - - -
FIFTH INTERNATIONAL CONFERENCE ON SYMBOLIC AND LOGICAL COMPUTING
April 18-19, 1991

Indicate the number for the following:                           Amount
______ Advance registration $150.00 (includes two lunches,
       coffee breaks, banquet, one copy of the proceedings);
       On-site registration $175.00                           $________
______ Additional copies of ICEBOL5 proceedings ($35.00 each) $________
______ Additional banquet tickets ($15.00)                    $________
______ Shuttle from Sioux Falls airport
       ($40.00 per passenger round trip)                      $________
       Arrival:   Flight________ Date ________ Time ________
       Departure: Flight________ Date ________ Time ________
       (Rental cars are available at the Sioux Falls, SD airport)

                                      Total Amount Enclosed   $________

Name_______________________________ College or Firm_____________________
Mailing address_________________________________________________________
_________________________________________________________
_________________________________________________________

Return this form to: Eric Johnson, ICEBOL5 Director, 114 Beadle Hall, Dakota State University, Madison, South Dakota 57042 USA
========================================================================= Date: Wed, 6 Feb 91 15:45:50 PST Reply-To: Text Encoding Initiative public discussion list Sender: Text Encoding Initiative public discussion list From: Ken Whistler Subject: Reply to Douglas de Lacey, Re: Unicode 1.0 I sent the following reply letter to Mr. de Lacey's recent comments on Unicode 1.0. Since he posted his comments to TEI-L, I am also forwarding my reply to that list. (Ken Whistler) Mr. de Lacey, Asmus Freytag forwarded your comments to several of us who are currently working on the Unicode 1.0 draft. While formal resolution of commentary will await decisions by the Unicode Technical Committee, I thought it might prove useful to clarify a few things now. 
These are my own opinions, and do not necessarily reflect the decisions of the UTC. Many of the bizarre characteristics of the symbols area that you note (encoding of fractions, Roman numerals, etc.) are simply the price we have had to pay to preserve interconvertibility with other, important and already-implemented character encodings. We fully expect that any "smart" Unicode implementation will ignore most of the fraction hacks, for example, and encode fractions in a uniform and productive way. There is, in fact, a dual-argument fraction operator in Unicode (U+20DB) to support such implementations. The coexistence of composite Latin letters (e.g. E ACUTE) with productive composition using non-spacing diacritics is also forced by compromises between competing requirements for mapping to old standards and the implementation needs of the various parties which will use Unicode. While this has been (accurately) criticized as leading to non-unique encoding--in the sense that alternative, correct "spellings" of the "same text" can be generated--it is my considered opinion, after long arguments with proponents of other approaches, that uniqueness is not obtainable. In other words, we could design a scheme which could theoretically lead to unique encoding, but it would be unacceptable as a practical character encoding--so we wouldn't get it anyway. Unicode started out as you envision it--with only baseforms and non-spacing diacritics for Latin/Greek/Cyrillic, so that all accented letters would be composed. But that allowed for no acceptable evolutionary path from where we are to where we would like to be. The other approach, which tries to encode every single combination anyone could use (i.e. ISO DIS 10646), is necessarily incomplete, in that it refuses to acknowledge productivity in the application of diacritics (e.g. for IPA). So Unicode is admittedly a chimera--but a practical, real chimera that will be implemented, rather than an impractical and unimplementable one. 
You identify a problem which arises from non-uniqueness, namely: >two encodings of an identical text may thus turn out to be very >different; and for anyone using computer comparison of texts this could be >quite problematic. I would imagine this also disturbs the dreams of many who are working on the text encoding initiative. But again, I think there is no way to guarantee uniqueness. Furthermore, the entire notion of "identical text" requires rigorous definition before algorithmic comparisons by computer make any sense. Is a text on a Macintosh comparable to the "identical text" on an IBM PC? Well, perhaps, once considerations of several layers of hardware, software, and text formatting, together with character set mapping are resolved. Such comparisons involve appropriate filters, so that canonical forms are properly compared. All Unicode implementers I know of are fully aware of the problem of canonical form for text representation. (By the way, it might be fair to say that this is an order-of-magnitude more critical problem for corporate database implementors than it is for text analysis.) Another thing to keep distinct in understanding Unicode is that not everything which can appear on a page can be encoded in Unicode plain text. Changes of font, changes of language, or metatextual references to a particular glyph: >"There are three possible form of LATIN SMALL LETTER G CEDILLA (U+0123) >and they look like ..." require a higher level of text structure than simply a succession of characters one after another. Unicode is definitely not going to be defining a bunch of ESCAPE code sequences to be embedded into text with particular semantics such as "change font to...". Modern text editing, analyzing, and rendering software deals with such things by means of distinctions on a "plane above" the text itself. 
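Whistler's point about comparing texts only after filtering them to a canonical form can be made concrete. The normalization machinery that the Unicode standard eventually defined (NFC composes, NFD decomposes, and both put combining marks into a canonical order) postdates this 1991 exchange, so the following is only a modern illustrative sketch, using Python's standard `unicodedata` module, of the kind of filter he describes:

```python
import unicodedata

# Two "spellings" of the same text: a precomposed E ACUTE versus
# the base letter plus a non-spacing (combining) acute accent.
composed = "caf\u00e9"     # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "cafe\u0301"  # 'e' + U+0301 COMBINING ACUTE ACCENT

# A naive code-point comparison sees two different strings.
print(composed == decomposed)            # False

# A canonical-form filter makes them comparable: normalize both
# sides to one form (NFC composes, NFD decomposes) before comparing.
print(unicodedata.normalize("NFC", decomposed) == composed)    # True
print(unicodedata.normalize("NFD", composed) == decomposed)    # True

# Normalization also addresses the ordering of stacked diacritics:
# marks with different combining classes are sorted into a canonical
# order, so acute (class 230) and cedilla (class 202) applied to 'e'
# in either order normalize to the same sequence.
a = unicodedata.normalize("NFD", "e\u0301\u0327")
b = unicodedata.normalize("NFD", "e\u0327\u0301")
print(a == b)                            # True
```

Both halves of the debate are visible here: de Lacey's worry (two encodings of an "identical text" compare unequal byte for byte) and Whistler's answer (run both through a canonical-form filter first, at which point the comparison succeeds).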
The plain answer to the question, "could the whole of the manual as printed be sensibly encoded in Unicode?", is clearly no, since it requires a layer of formatting and distinguishes multiple fonts. The particular case of the GREEK SMALL LETTER SCRIPT THETA is just baggage dragged along from mistakes made in earlier encodings (thus also the other admitted glyphs encoded separately in the Greek block). There is a scheme for indicating preferential rendering (where possible) using ligatures (such as Greek "kai"). The ZERO WIDTH JOINER (U+200D) and ZERO WIDTH NON-JOINER (U+200C) can be used as rendering hints for ligatures, as well as serving as an important part of the proper implementation of cursive scripts such as Arabic. I don't think there is a LATIN CAPITAL LETTER WYNN to be found. This is a good case for following the "How to Request Adding a Character to Unicode" guidelines. If you can provide clear textual evidence that wynn appears in regular use with a case distinction, then a capital form would be a good candidate for addition. The Greek semicolon was unified with MIDDLE DOT (U+00B7). The diacritic ordering algorithm (centre-out) is meant to apply independently to diacritics on top and to diacritics on the bottom. The issue of how to specify unambiguously side-by-side ordering within diacritics at the same vertical level is a good one, and I think it will have to be addressed in the final draft. I hope these clarifications are helpful. --Ken Whistler ========================================================================= Date: Thu, 7 Feb 91 07:50:34 EST Reply-To: Text Encoding Initiative public discussion list Sender: Text Encoding Initiative public discussion list From: "Robert A. Amsler" Subject: UNICODE 1.0 Draft I am still concerned over offering two (or more) encodings of characters WITHOUT any prescriptive guidance to those who have the option of selecting either for new texts to be encoded or translated. 
It is not necessary to forbid alternate encodings, only to discourage their continued use. Some guidance as to which encoding is preferred seems desirable. If UNICODE doesn't provide such guidance I would advocate the TEI adding its own recommendations. As Yaccov Choueka once pointed out to me, all the text that has been encoded up to now is but the tiniest fraction of the text that WILL BE encoded in the future. We owe the future a better chance to use more desirable encodings than we may have to put up with because of poor planning in the past. ========================================================================= Date: Thu, 7 Feb 91 09:56:00 EDT Reply-To: Text Encoding Initiative public discussion list Sender: Text Encoding Initiative public discussion list From: John Lavagnino Subject: Against conference announcements I propose that conference organizers be discouraged from posting their announcements to this list, unless the conference has sessions devoted to SGML or the TEI. To my mind, there have been too many postings recently about linguistics conferences with no particular connection to this list's subject. If there's anybody who is a) interested in those conferences and b) heard about them only on this list, and not also on various linguistics and humanities lists, I would be surprised. John Lavagnino Department of English and American Literature, Brandeis University ========================================================================= Date: Thu, 7 Feb 91 17:31:12 +0100 Reply-To: Text Encoding Initiative public discussion list Sender: Text Encoding Initiative public discussion list From: Timothy.Reuter@MGH.BADW-MUENCHEN.DBP.DE Subject: Conference announcements I second John Lavagnino's request. 
I would prefer it if conference announcements on ALL lists took the form of a ten-line statement that details of such and such a conference can be obtained from LISTSERV@SOMEWHERE_OR_OTHER together with the closing date for papers and for registration - but perhaps that's too much to hope for. Timothy Reuter, MGH, Munich ========================================================================= Date: Thu, 7 Feb 91 13:12:00 EDT Reply-To: Text Encoding Initiative public discussion list Sender: Text Encoding Initiative public discussion list Comments: Warning -- RSCS tag indicates an origin of VERONIS@VAXSAR From: Jean Veronis Subject: Re: Conference announcements It seems reasonable that conference announcements appear on TEI-L only if they are related to the TEI. More generally, I would enjoy reading more TEI-related discussion on the list, which has been lacking in recent months. It's difficult for me to understand why, since there seems to be a lot of activity within the TEI. Why so little on the list? Jean Veronis ========================================================================= Date: Thu, 7 Feb 91 17:11:07 CST Reply-To: Text Encoding Initiative public discussion list Sender: Text Encoding Initiative public discussion list From: Don Goldhamer Subject: Re: Conference announcements VERY brief announcements of "relevant" conferences (the suggested 10 lines or less) with pointers to more information would seem most desirable. I would prefer to interpret "relevant" very liberally, so as to include those announcements we have recently received. Some of us are not subscribed to many lists. Donald H. 
Goldhamer - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Project Manager 1155 E 60th, Chicago IL 60637 Department of Academic & Public Computing dhgo@midway.UChicago.EDU University of Chicago Computing Organizations voice:(312)702-7166 ========================================================================= Date: Fri, 8 Feb 91 11:17:00 MDT Reply-To: Text Encoding Initiative public discussion list Sender: Text Encoding Initiative public discussion list From: CHERYLL BALL Subject: Mainframe Letters I am looking for schools that have a good way of producing mass quantities of letters on the mainframe. We have an IBM 9121 running MVS/XA. Our users have data collection on IDMS. Some user departments have PCs, some are hardwired to the mainframe, and some use the network to get to their systems. We need a letter processing system to produce approx. 500-2000 letters a night, 300,000 letters a year. We are looking at purchased software like IBM's ASF with Document Writing Feature/Document Composition Feature. We are looking at IDMS/PC to retain the capability for users to use their PCs. Please let me know what you are doing at your site. I do not belong to this listserv group; please send replies to: Cheryll Ball CBALL@UNMB or CBALL@TRITON.UNM.EDU University of New Mexico Analyst/Programmer II ========================================================================= Date: Mon, 11 Feb 91 02:04:52 EST Reply-To: Text Encoding Initiative public discussion list Sender: Text Encoding Initiative public discussion list From: Josh Hendrix Subject: Re: Unicode 1.0 Does anyone know where I could obtain a copy of the Unicode book? 
Josh Hendrix ========================================================================= Date: Mon, 11 Feb 91 18:14:25 +1100 Reply-To: Text Encoding Initiative public discussion list Sender: Text Encoding Initiative public discussion list From: mr p paul Subject: Re: Unicode 1.0 Please let me know when you've found it. ========================================================================= Date: Mon, 11 Feb 91 09:38:55 DNT Reply-To: Text Encoding Initiative public discussion list Sender: Text Encoding Initiative public discussion list From: Hans Joergen Subject: AHC 91 Call for papers In-Reply-To: Message of Thu, 7 Feb 91 09:56:00 EDT from AHC-conference in Odense 28th to 30th August 1991 The sixth international conference of the AHC will be held in Odense, Denmark and arranged by the Danish Data Archives. In the program committee for the conference are Peter Denley, Westfield College, London, Stefan Fogelvik, Stockholms Historiska Databas, Daniel Greenstein, Glasgow University, Hans Jørgen Marker, Dansk Data Arkiv, Jan Oldervol, Universitetet i Tromsø, Kevin Schurer, Cambridge Group, Josef Smets, Montpellier, and Manfred Thaller, Max Planck Institut für Geschichte, Göttingen. Topics of the Conference The topics of the conference will as usual be a broad presentation of everything that is going on in history and computing. Papers are invited on substantial subjects as well as methodological questions. Among the expected topics are -Standardization and exchange of machine readable data in the historical disciplines -Data analysis and presentation -Event history analysis -Text analysis -Simulation and modelling -Computer aided teaching -Social and economic history -Quantitative methods At the forthcoming conference it would be natural, as the conference is located in Scandinavia, for demographic studies and large data collections to be a central issue. Furthermore, a number of workshops on methodological questions will be held in the spring of 1991. 
These workshops will present their results for further discussion in workshop sessions at the conference. One of these workshops is dedicated to the application of the TEI guidelines in the field of history. Papers are invited on all aspects of computing in history. The papers will be published in a proceedings volume from the conference provided that they are submitted in machine readable form (WordPerfect or ASCII). Info Further information on the conference will be obtainable from Hans Jørgen Marker Danish Data Archives Munkebjergvænget 48 5230 Odense M Denmark Phone +45 66 15 79 20 Fax +45 66 15 83 20 E-Mail (EARN): DDAHM @ NEUVM1 ========================================================================= Date: Mon, 11 Feb 91 09:54:08 EST Reply-To: Text Encoding Initiative public discussion list Sender: Text Encoding Initiative public discussion list From: Susan_R._Harris@UM.CC.UMICH.EDU Subject: Re: Unicode 1.0 You can obtain a copy of the Unicode Final Review Document by sending mail to MICROSOFT!ASMUSF@UUNET.UU.NET. (This is what I heard in HUMANIST. I sent e-mail to this address last week, though, and haven't gotten a reply.) -Susan R. Harris ========================================================================= Date: Mon, 11 Feb 91 16:12:36 GMT Reply-To: Text Encoding Initiative public discussion list Sender: Text Encoding Initiative public discussion list From: DEL2@PHOENIX.CAMBRIDGE.AC.UK Subject: Unicode For those of you interested in the Unicode (character-encoding) debate: for e-mail responses the deadline has been extended to 25 February. So order your copy now from microsoft!asmusf@uunet.uu.net, and make sure your voice is heard. Regards, Douglas de Lacey. ========================================================================= Date: Mon, 11 Feb 91 10:57:00 CST Reply-To: Text Encoding Initiative public discussion list Sender: Text Encoding Initiative public discussion list From: "Robin C. 
Cover" Subject: UNICODE DOCUMENT SOURCE Re: > Date: Mon, 11 Feb 91 02:04:52 EST > Subject: Re: Unicode 1.0 > Does anyone know where I could obtain a copy of the Unicode book? > Josh Hendrix Ask for the UNICODE "Draft Standard - Final Review Document." The comment period is to officially close Feb. 15th, and a technical committee meeting for review of comments is scheduled for Feb 28th; email comment open till Feb 25th. I would be interested in whether the TEI editors (or relevant sub-committees) have supplied any comment to the UNICODE Consortium reflecting the interests of various TEI constituencies. Unicode Final Review c/o Asmus Freytag Building 2/Floor 2 Microsoft Corporation One Microsoft Way Redmond, WA 98052-6399 USA Email: microsoft!asmusf@uunet.uu.net Tel: (1 206) 882-8080 FAX: (1 206) 883-8101 Telex: 160520 ========================================================================= Date: Mon, 11 Feb 91 19:39:00 GMT Reply-To: Text Encoding Initiative public discussion list Sender: Text Encoding Initiative public discussion list From: Lou Burnard Subject: TEI views on Unicode Briefly, and rather unsatisfactorily, in answer to Robin Cover's latest query: yes, the TEI working group on character set problems, chaired by Harry Gaylord, was specifically charged with an evaluation of the relevance of Unicode to TEI concerns (among other things) when it was set up in December. Its report is due in a few weeks but I have already asked Harry to post a comment here too as soon as possible. Be patient though -- like me, he's up to his eyebrows trying to get things ready for the various TEI events at the ACH/ALLC Conference in Tempe next month! 
Lou Burnard EuroEd TEI ========================================================================= Date: Mon, 11 Feb 91 14:03:31 CST Reply-To: Text Encoding Initiative public discussion list Sender: Text Encoding Initiative public discussion list Comments: "ACH / ACL / ALLC Text Encoding Initiative" From: Michael Sperberg-McQueen 312 996-2477 -2981 Subject: Reminder: deadline for ACH/ALLC conference registration > Remember also that a TEI workshop will be held at ACH/ALLC '91. > If you are one of the many who'd like to get better information on > how the TEI scheme is supposed to work in practice, be sure to come > to ACH/ALLC and to the workshop. -CMSMcQ This is a reminder that "early" registration for ACH/ALLC '91 must be in by February 12 to qualify for the discount and to be sure of space in the dormitory. If you would like a registration form or a copy of the program, contact ATDXB@ASUACAD. Daniel Brink, Associate Dean for Technology Integration College of Liberal Arts and Sciences Arizona State University, Tempe, AZ 85287-1701 602/965-7748/1441 fax -1093 ATDXB@ASUVM.INRE.ASU.EDU ========================================================================= Date: Mon, 11 Feb 91 16:55:00 CST Reply-To: Text Encoding Initiative public discussion list Sender: Text Encoding Initiative public discussion list From: FORTIER@UOFMCC.BITNET Subject: Final Critique

THE TEI GUIDELINES (VERSION 1.1 10/90)
A CRITIQUE
by the Literature Working Group

Background

This critique of version P1.1 of the TEI Guidelines was drafted by the five members of the Literature Texts Work Group. These people work with texts in four natural languages, several literary genres and periods from the Middle Ages to the present. 
Among them they have recorded several million words of text, directed the development of several software systems, and published several dozen articles and half a dozen books based on computer analyses of texts; the methodology of these publications varies from traditional literary history to advanced statistical analyses. Much of the following critique is based on the Survey of the needs of scholars in literature carried out by the Work Group, to which some forty interdisciplinary producing scholars responded. A copy of the results of this Survey is available from the TEI Project. A preliminary version of this critique was circulated to the Editors of the project, and Michael Sperberg-McQueen's responses to it have been extremely helpful in arriving at this final version.

1. Perspective

The Work Group is impressed by the finished character of the current version of the Guidelines document, and the almost total absence of typographic errors. As people who work with and generate texts on a daily basis, we recognize the amount of effort which such an achievement represents. We wish to begin by expressing high praise for the current Guidelines as the result of concentrated and efficacious work on a difficult problem. Michael and Lou should be particularly singled out for this praise.

The comments which follow are offered in a spirit of friendly collaboration, in the hope that they will make an impressive document even better and will bring it more closely into conformity with the needs and perspectives of scholars working with literature.

The Work Group understands that the TEI is proposing a coding system for interchange, not for entry of texts. We realize also that many things are suggested as options, not as requirements. 
It must however also be recognized that simple considerations of efficiency -- it is practical to have a locally standard code as close as possible to the interchange code -- will tend to foster the use of TEI codes at the local level; ASCII was originally proposed as an interchange code; it is now a standard for alphanumeric representation.

The very polished and comprehensive nature of the present Guidelines also means that there will be a tendency for them to become standards, both for interchange and local processing, and even data entry; this possibility must be faced and taken into account as they are drafted. By a similar process, optional codes, in the absence of a clear distinction between the optional and the required, will tend to be considered as recommended or required, in spite of occasional or implicit indications to the contrary.

Three of the Poughkeepsie principles bear on this matter.

A. The Poughkeepsie Principles

2. The Guidelines are also intended to suggest principles for the encoding of texts in the same format.

5. The Guidelines should include a minimal set of conventions for encoding new texts in the format.

9. Conversion of existing machine-readable texts to the new format involves the translation of their conventions into the syntax of the new format. No requirements will be made for the addition of information not already coded in the texts.

It is our opinion that these three principles are of particular importance to scholars in literature, and that they are not sufficiently reflected in the current version of the Guidelines. Our reasons for this opinion will become clear in the rest of this report.

B. The Perspective of the Literature Scholar

Like most practitioners of an intellectual discipline, Literature Scholars are accustomed to working from a methodological perspective. 
The Guidelines would profit greatly from a theoretical introduction, making clear what is meant by such terms as "text", "tag", "hierarchy", etc. The fragments of discussion of this topic found here and there in the Guidelines (e.g. p. 71) are not adequate for this purpose. We realise that generating such definitions will not be an easy task, given that in a printed text titles, footnotes, and variants are clearly tags to the text, but in a TEI text they are treated as text. How the nature of text and tag changes as a result of a change in medium is not at all clear.

Similarly, we in literature recognize in a single text a plethora of structures: physical (page and line breaks), formal (parts, chapters, paragraphs), grammatical, semantic, actantial, narrative, psychological, and so on. Each can be deemed hierarchical from certain perspectives. Do the Guidelines permit all of these structures to be defined as hierarchies? Do they require such definition for their manipulation? Do they allow them to be handled simultaneously so that their interrelations can be examined? The suggestions for treating parallel texts in 5.10.12 (pp. 122-3) and elsewhere are not very clear on these matters.

Literary texts usually aim at richness of expression and multiplicity of levels of possible meaning. Can SGML-based Guidelines integrate this basic characteristic of literature, or do they attempt to abolish it? We realise that these are vexed questions, recalcitrant to simple answers, particularly when one accepts -- as we do with high praise -- the principle enunciated by the linguists (p. 130) that all theoretical positions must be welcomed by the Guidelines, but no one must be given pride of place. On the other hand, we consider it crucial for the acceptance of the Guidelines by our constituency that a thoughtful discussion of these matters be found at the beginning of the Guidelines document. 
For instance, the discussion of highlighting on pp. 78 and 124 would seem, in the absence of such a discussion, to be based on the premise that authorial intention is discernible from the text; such a premise ceased being intellectually respectable in our field about fifty years ago.

The pragmatics of work on literature texts is also a source of concern in a number of areas. The scholar in literature typically works with large amounts of data, since computer processing is used mainly when it is not practical to commit a text to memory. These scholars are concerned mainly with inputting texts as rapidly and at as reasonable a cost as possible, verifying them as effectively and cheaply as possible, and getting on as quickly as possible with the analytic work which was their reason for working with the machine.

Except when they are generating a canonical text, literature scholars work with a specific edition of a text which is considered canonical in the sense that it is the one which is cited and quoted in serious professional work. Depending on the situation, this specific edition will be a critical edition, a prestigious edition, or a trade edition. They will want to refer easily to pages and lines in this text. That the electronic version of this text be stable, and not subject to change other than to correct errors, is also a requirement. This perspective is made perfectly clear in the responses to the Survey and in the practices of the great repositories of machine-readable texts, like the Tresor de la langue francaise.

Literature scholars are not interested in -- in fact many object vehemently to -- the prospect of obtaining texts which already contain, explicitly or implicitly, literary interpretations. The responses and comments elicited by the Survey bear eloquent witness to this. 
For these reasons we recommend that the Guidelines clearly distinguish between a minimal set of required tags and a wide range of optional tags to be used at the discretion of the text preparer. The present version of the Guidelines is not in harmony with our perspective. Some examples:

p. 1 (1.1.1) The statement is made that the Guidelines "are also intended to provide guidance to the scholar embarking on the creation of an electronic text, both as to what textual features should be captured and as to how they should be represented". We do not find such a claim appropriate in what is clearly becoming a technical manual, not a user's guide. We consider that such a claim constitutes a dangerous trap for the neophyte. It should be removed.

p. 4, para 3. States that full tags need not be entered by hand, and allusion is made to macros or parsers; no examples, names, or references are furnished. Here again we are concerned about the effect on the neophyte. If macros and parsers exist, examples of both should be provided here, and at least half of the examples in the rest of the document should show their use.

p. 15 (2.1.4) Recommends embedding a given interpretation into markup at the time of data capture or conversion in the form of a DTD. The Survey clearly indicates that most scholars of literature strongly oppose finding interpretation already in texts which they receive. To recommend embodying such interpretation in an interchange format is paradoxical, to say the least. It is recognized that all coding can be seen as a kind of interpretation, but a fundamental distinction must be made here. A certain character is or is not in italic; once the way of representing italic has been decided, a simple either-or decision carrying very little intellectual content will resolve the matter. 
Why a word is italicised is open to a number of interpretations; scholars legitimately may not agree on which one or ones are valid. This is interpretation in the usual sense, and is the domain of the scholar working on the completed text, not that of the coder inputting or converting the text. Recommendations overlooking this distinction will alienate the vast majority of literature people working with computers. The Survey has made this clear.

p. 16 (2.1.4.2) Minimisation rules are a good idea. Examples (note the plural) should be provided.

p. 23, the example. The coding is much too wordy; the poem, which is tiny, disappears under the mass of codes. Responses to the Survey and discussions on TEI-LIST have made clear the dismay of the scholarly community with this wordiness. Minimisation will have to be carried much further, and software will have to be developed with a feature similar to the reveal codes/hide codes function of many word processors. This is not a minor problem but points to an underlying reality. If structural features are indicated by format, this indication suffices. Those features which require explicit coding will be more complex, more prone to error, more difficult to enter consistently, and more difficult to verify and proofread. Scholars are not likely to undertake such onerous tasks whose results will be so fragile. It should be recalled that in the final analysis the success of the TEI standards will depend on their acceptance and use by the scholarly community. In general, the very wordy nature of the tags recalls an archaic period in computing, when the user was expected to specify everything to the machine. A more contemporary and user-friendly mode of tagging is expected by current users and must be sought, since few users can be expected to put up with such wordiness any more.

p. 28 (2.1.7) Entity Reference (string substitution). This is excellent. 
It must be stressed more, alluded to more, and shown frequently in examples.
- p. 55 (4.1.4) Since most scholarly work in literature is based on a canonical text, in which pagination and lineation frequently vary with the PRINTING, not just the edition, it is essential to identify the date of printing and the print shop in the header material of a machine-readable file of a text based on a printed edition. Reference back to the original, verification and proofreading are impossible without them.
- p. 62. We suggest putting print shop and date of printing between the information on the publication and that of the distribution. This would also be the appropriate place to identify the location and shelf mark for manuscripts and incunabula.
- p. 65 (4.5) The encoding declarations are of course the ideal place to put allusions to and/or explanations of the local coding conventions. Please stress this fact here. In fact, we recommend making it a condition of conformity to TEI standards that local coding for features not available on the keyboard used (font changes, accented letters, etc.) be documented in a header record.
- p. 71 (5.1), para 1. The definition of text, "an extended stretch of natural discourse, whether written or spoken", is not correct. Not all texts are extended. Spoken natural discourse is not text until transcribed in written form.
- p. 71 (5.1), para 5. Again, the ability to point to a unique place in the text of the original printed document is essential to the needs of literature scholars. This must be stressed here and shown in the examples. The Survey is eloquent on this matter.
- p. 77 (5.2.5) Colophon -- not a term everyone can be expected to know. Note that the Pleiade edition shows this as front matter.
Given the practical importance of the printing date and print shop information included here, we recommend that it be put at the beginning of the file, right after the publisher identification.
- p. 77 (5.3.1) Given their importance for locating a quoted or identified passage, line breaks should be mentioned here and their importance stressed. The Survey made this abundantly clear.
- p. 93 (5.6) A strong recommendation to code page breaks: EXCELLENT. Please put in an equally strong or stronger recommendation to code line breaks, i.e. always put them in unless there is a compelling reason not to do so, even in prose texts. To do otherwise would be to ignore the contribution of the scholars who participated in the Survey.
- pp. 125-6 (5.11.2) Information about the layout of the edition input (i.e. page and line breaks), which permits reference back to the original text being studied, is crucial to the needs of most literature scholars. To state that the "line-break" tag "is intended only for cases where lineation of a prose text is considered of importance in its own right" (p. 126) suggests that such reference is rare, whereas it is THE NORM. It MUST NOT be downplayed in this fashion.
  Our judgement, confirmed by the Survey, is that most scholars use electronic text in a fashion that requires the ability to make unambiguous reference back to a precise place in the canonical printed text on which it is based. Thus lineation of a prose text is always considered important a priori, unless, as in cases like the Bible, a clear case can be made for coding in a different fashion. In short, the suggestion that lineation can somehow not be important in a text runs counter to the needs and practices of scholars of literature.
- p. 177 (7.3.1.1) It is not necessary to specify the metre attribute in every line. That is the work of the analyst, not the archivist or the scanner corrector.
- p. 178 (7.3.1.2) Even for rhyme of type "aa" French prosody recognizes at least three types: rime suffisante (not necessarily the same as assonance), rime pauvre, and rime riche. Perhaps this should also be taken into account. BETTER, given the range of languages to which the Guidelines are to apply and the large number of prosodic systems in question, perhaps the Guidelines should not be so prescriptive. The Work Group expects to work on optional codes for such things, once more pressing requirements of literature scholars have been attended to.
- p. 200 Putting tags, entities and redefinitions in a separate file for calling up by many texts is an excellent idea. Unfortunately the example is not at all clear, and makes this seem much more complex and confusing than it is or need be.
- pp. 207-09. It is a trap for the unwary and an irritation to the experienced to show the suppression of typographical information (line breaks) in an extended example like this. The justification that the edition used wasn't very good -- "the edition being used is of little editorial interest in itself" (208) -- makes things worse; poor editions should not be converted to machine-readable form!
- pp. 219-33 (A.6) We agree that in the case of the Bible the older and more authoritative method of identifying passages should prevail.

2. Coding Levels

The Guidelines recommend three levels of coding:
- 1. Required in any TEI conformant document (e.g. title, author, etc.)
- 2. Required for interchange, but a more succinct local code is recommended (e.g. accented letters, non-roman alphabetics).
- 3. Optional.
It is not always easy to tell which is which from the present version of the document. This distinction must be made clear.
We recommend a very small number of required codes: just what is necessary to identify fully the edition and printing used and to find a given passage in it in terms of pages and lines, divisions into chapters, acts and scenes, cantos, or books, etc., the character set used, and the representation used for features in the text but not in the character set (i.e. accented letters, font changes). All other codes must be optional. Examples of optional codes should be furnished. We repeat that the distinction between the two types must be made abundantly clear even to the uninformed, casual or negligent reader.
  In our view, a possible method would be to separate out each type and group them as required, or optional. An alternate method would be to tag each heading with a parenthetical indication of which class each tag or tag type belongs to. The optimum method would be to do both.
- Further comments on coding levels follow:
- p. 1 (1.1.2) The Guidelines recommend the use of simpler and less wordy codes in a local environment, which codes are to be translated into full TEI coding for interchange. EXCELLENT!!! BRAVO!!! PLEASE DO MORE OF THIS! It should be made very clear that this is the RECOMMENDED approach. Examples of existing coding schemes upgradable to TEI level taken from existing archives should be given. Other examples (made up for the purpose) should be given. It must be made clear to the user that clean, clear and easy codes are to be the NORM for local use, and that the full TEI codes are for interchange and possibly archive purposes only.
- p. 4 para 5. Interchange format does not allow any tag reduction. This is legitimate. But it MUST be made clearer that local minimization is encouraged, as long as automatic upgrading to full TEI codes is possible from the local code.
- pp. 13-14 The examples are the perfect place to show a local code first, then the full TEI code.
- pp. 45-52 (3.2) Character Sets.
It MUST be made clear that this applies to interchange only. Local codes MUST be recommended and SHOWN which are easy to input and easy to use on a screen and printer of MAC, DOS and Mainframe machines (at least 2 sets of examples for each of the three). Preferably get some from existing databases and some from the various forms of 8859.
  The exclusion of such an important punctuation mark as the exclamation mark puts a needless coding burden on scholars. This exclusion should be removed. SGML should not take precedence over the needs of scholars. Similar arguments can be made in favour of the pound sign and square brackets.
- p. 58 (para 4) The exclusion of recording the names of the person or persons who actually did the recording work reveals an inappropriate class and/or gender bias. Please delete this paragraph.
- pp. 58-59 The examples provide an excellent opportunity to show both local codes and TEI codes.
- p. 59 (4.3.2) para 5. The changes listed, "corrections of mis-spellings of data, changes in the arrangements of the contents, changes in the output format", are not in fact minor. This paragraph contradicts p. 55 (4.1.6). Please clarify, or better still, choose.
- pp. 82-83 (5.3.6, 5.3.7) It MUST be made clear that these very wordy and error-prone features are optional. Please try to cut down their length. It is essential to warn the potential user of their complexity and of the difficulty of coding them accurately in a text of any size. Their optional nature MUST be made more clear. In their present state they are counter-productive, both because of their wordiness and because of the technical naivete which such wordiness embodies.
- pp. 84-6 (5.3.8) List handling is excessively wordy and takes too much for granted. There must be an example of a simplified local code as well as the full TEI code here.
- pp. 86-89 (5.3.11) Numbers: a perfect example here of a trap for the unwary. Only "may" on p. 87 shows that this extremely wordy coding is optional.
- pp. 89-90 (5.4) This is a good idea but for a post-input markup. This fact must be made clear and encouraged. Mention that this is a relatively rare occurrence.
- p. 93 (5.6.1) It is absolutely necessary to have an example here and to show both local and TEI formats.
- p. 94 (5.6.1) It is absolutely necessary here to have an example and to show both local and TEI coding. It is very doubtful that any scholar or critic will ever use this kind of coding. Something more straightforward and user-friendly is required.
- p. 97 (5.6.4) Seems to suggest only fully explicit coding in milestones. You really need to show brief local codes here, PLUS their expansion into TEI codes.
- p. 103 (5.8.1) Explicit tagging of sentences. This is overkill. This must be clearly indicated as optional, and another part needs to be added suggesting how to set up a local code permitting automatic conversion to this level of coding.
- pp. 110 ff (5.10.3) The examples from pp. 110 through 117 are prime candidates for examples of both local and full TEI codes. The Critical Edition example is particularly weak. The example is trivial. The only clear presentation is the uncoded one. The explicit and wordy recording of the lack of variants, and the use of "&zero.var" for omissions, are bizarre in the extreme and fly in the face of a millennium of scholarly practice. This attempt to reduce three parallel texts to a single linearly expressed notation is clearly defective. The text has been destroyed and converted into an unreadable list of real and potential variants. The prime function of any text is to be read. This conversion has destroyed the text as text. Reference must be made to experts in this domain and their advice must be followed. Here again we hope to work on this, once more fundamental questions have been resolved.
- p. 170 (7.2.1) The encoding declarations are an EXCELLENT idea and to be encouraged, indeed made required. They also foster the definition of local standards which can be converted automatically into TEI format.
- pp. 207-09 A perfect place for a two-step example, the first part showing local code, the second showing TEI code.

3. Coding Types

Here are discussed the two types of coding: Presentational (capital letters, line breaks, italics, etc.) and Descriptive (proper noun, italics showing irony, stress or a foreign word, etc.).
  Our perspective is that coding (inputting or converting text) is not the same as interpreting. Descriptive coding as presented in the Guidelines is squarely in the domain of interpretation. Most scholars do not want interpreted texts; they expect to do that job themselves. They made this abundantly clear in the Survey; we must not ignore them. When possible, scholars hire assistants to input texts, and do not expect these assistants to do the interpretation. This whole aspect needs to be brought into conformity with scholarly practice, otherwise the TEI standards will not be respected.
  To repeat: one-to-one conversion of typographical features is not controversial; it should be done as faithfully as possible. It must be a requirement in a TEI conformant text. Coding or interpretation in the sense of description of authorial "intention", or the choice among several alternatives on the basis of judgement, is a different matter, which is designated descriptive coding. It can be allowed but never recommended. The Guidelines are quite unclear on this matter, and seem to make conflicting suggestions in different places.
  Descriptive mark up can at the limit be made an option for those who feel they must do it. But it must be made clear that such tagging is OPTIONAL and NOT REQUIRED.
- Comments on details follow:
- p. 12 (2.1.2) Direct quotation, indirect quotation, indirect discourse, free indirect discourse, authorial comment, description or narration -- all of these aspects of a text can blend one into another. Which is which is open to interpretation and debate. It is ludicrous to tag them as if such distinctions could be made once and for all. Not only must the optional nature of such tagging be stressed, but potential users must be cautioned to exercise prudence in such coding, to define categories carefully, to test them by hand on small samples and shake them down on larger samples of electronic text, before undertaking the tagging of a full text.
- p. 71 (5.1) Presentational mark up is allowed here, as well as descriptive. NO! Presentational mark up should be recommended, with descriptive at most recognized as possible if one wants to use it, but with warnings against it. The examples will have to be revised.
- pp. 77-78, 88, etc. The concept of crystals (or the choice of term) is not made clear; the examples are difficult to follow. Revision seems in order.
- pp. 78-9 (5.3.2) This section is presented primarily in terms of descriptive mark up, which is wrong. The presentational should be recommended, if only because it avoids the excessive wordiness of the descriptive approach. The wordiness of the so-called presentational mark up must be reduced; for example "highlighted rendition=italic" can be replaced with "ital" without any loss of information. In fact, the longer form is more descriptive than presentational. The earlier examples of handling of the underlying features of italics require so subjective an interpretation that any scientific rigour in a text coded using them would be destroyed.
- pp. 79-81 (5.3.3) Do NOT recommend tagging of underlying features; just the opposite. Stick with the for open and close quotes, suggest something else for block quotes, e.g. .
Remind the user that she can use open and close quotes or guillemets (other things for embedded quotes) for a local code and have a conversion program take care of the rest.
  "Guillemets", by the way, is used in the plural. There is no such thing as a single guillemet. What you show as such are greater-than and less-than signs. What is the use of 66U, etc. when character set tables are in the appendix?
  The recommendation to use "rendition = unmarked" (p. 80) with "q" is bizarre in the extreme. Many readers, and some of the better software, can be expected to identify an item as unmarked without the aid of a specific tag.
- pp. 81-82 (5.3.4, 5.3.5) Perfect traps for the unwary. This is interpretation and dependent on time; it adds unnecessary work, confusion and possibility for error. Particularly true in the case of "croissant" (p. 81) and in the example on p. 82.
- p. 83 (5.3.7) If anyone in our community sees the bibliographic tagging on 83, the TEI is a dead letter. The issues of how to handle names, abbreviations in names, etc. are important and not easy for programmers to deal with, but if this level of coding has to be done at the capture or transmission stage, we assure you, no one will use TEI. (Sorry, archivists and programmers might, but no one who is putting text into machine readable form in order to do anything critical or scholarly with it will ever do this kind of hiding of information in layers and layers of codes.)
- p. 103 (5.8.1) Explicit tagging of sentences. This takes for granted that such can be known, which is not the case for numerous poets, and even novelists since the 1930s, cf. Celine, Simon, etc. in French. Here is an excellent example of why descriptive coding is wrong.
- p. 105 (last para) It is most questionable whether one should EVER remove an interpretable feature from a text and replace it by an interpretation.
Not only does this make verification of the data impossible (it has to be re-interpreted, not proofread), but it also involves the coder usurping the role of the scholar who does the interpretation.
- p. 123 (5.11) Here presentational mark up is described as exceptional and extraordinary; earlier it was presented as a valid alternative; consistent standards never hurt. More important, presentational mark up should BE the standard, with descriptive only an option which is allowed with cautions.
- p. 123 (5.11) Use of "descriptive" in line one and of "presentation" in line 4 shows the problem presented by the SGML approach. If presentational markup had been used from the start as the sine qua non, none of this would be a problem.
- p. 124 (5.11.1) The example. What edition was used? What are the page and line boundaries? Or was this all made up too? This example is a perfect demonstration of the weakness of descriptive mark up: "Anglice" is not found in the standard Latin dictionary (Lewis and Short). What are we dealing with here? Are the italics quotes, emphasis or ironic? Let the coder code and leave the interpretation to the scholar.
- p. 176 (7.3) First, according to certain schools of interpretation texts can and should be regarded in isolation, and it is not the place of the TEI to pass judgement on this question of literary theory. Second, presentational mark up is essential because the Guidelines deal with coding a text, not its interpretation. The role of a given textual feature is ALWAYS open to interpretation, so the function of a good coding scheme is to facilitate interpretation, not pre-empt it.
- p. 214 (bottom) The Hamlet example. The stage type describes only the first half of the stage direction; this is the problem with descriptive tagging. Someone should try to reduce the wordiness of this tagging, particularly in the case of the speaker distinctions.

4. Other

This section contains comments that do not fit easily into the categories used above.
- pp. 75-76 (5.2.4) Why use etc.? The names given to the sections by the author are the text. If the author chooses to use a number "I" or "2" surrounded by blank space, that is what SGML should do. If it cannot code blank lines and blanks, then we are in rather serious trouble as literature scholars. We will be forced to describe, when presentation is what we want to do. This whole section is really designed for programmers, not for people in our area -- this type of material will only frighten users away from the Guidelines; it is virtually incomprehensible and in the long run not even true. There are alternatives other than the one listed, using the facts of the text rather than any imposed divisions, large or small.
- p. 76 (5.2.4) The distinction between legal and illegal forms is not clear. In any case the legalistic terminology is not appropriate.
- p. 79 (line 7). The "second" sentence. TYPO. It is the only sentence in the example, unless the TEI standards have subtleties which escaped the committee.
- p. 88 (example 1) TYPO. must go after "seventy-seven" if you care to be consistent with the date coded earlier as 1977-06-12.
- p. 90 (example after ) "Dumb clucks": Belittling the reader in this fashion is not amusing; it is offensive. Remove it and find a real example from a real text.
- p. 95 Assumes exactly what we do not want to assume: "text has been entered without preserving pagination". No need for an artificial reference scheme; one already exists (the page numbers and carriage returns at the ends of the lines).
- p. 96 (4.6.2) What can it mean to mark as "absent" a piece of text that is not present? What exactly is there to be marked?
- p. 105 (5.8.2) Soft hyphens EXIST in source texts. Please suggest more clearly how to handle them when they occur.
- pp. 110 ff. (5.10.3) Find a real text for a real example here. The imaginary and "humourous" one trivialises what is being done.
- p. 129 (6.1) para 2. Trying to define forms with no reference to content is a mug's game. The whole concept of structure shows that form determines content and content determines form, in varying degrees according to the context, example, and interpretative perspective, of course. In other words, you must create unanimity among the community of scholars BEFORE you can define the forms they can use. Not a practicable enterprise.
- p. 130 (6.1) The principle for linguistics (welcome all theoretical positions, favour none) is EXCELLENT. We recommend the same thing for literature; this is the basic premise of most of the preceding comments.
- pp. 140-44 (6.2.4) Incredibly wordy and unreadable coding for linguistic features. If the linguists consider this a good idea, more power to them. We recommend not getting into this for literature texts.
- p. 169 (7.1) "verse, drama and narrative". Narrative is used in the sense of prose. Not all prose is narrative (cf. cookbooks, or the TEI Guidelines); not even all literary prose is narrative (some is descriptive). If you are going to try to dictate, or even make suggestions, to scholars in literature, you must get the technical language right, and "sermons, guidebooks, recipe books, etc." (p. 176) are NOT narratives, formal or otherwise, in any accepted sense of the word.
- p. 180 (7.3.2.1) Overkill if both speaker and speech tell that the speaker is Cordelia -- why not just say so once, by recognizing the abbreviation of the speaker's name that is in the text to be the "tag" that it is. The real problem in dealing with speech in plays is that the speaker's tag needs to appear with each sentence (or all the words) of long speeches. Identifying "Cor" as Cor. does not contribute to solving this problem.
- p. 180 (7.3.2.2) Excellent example of giving the simple tag, mentioning that some investigators may want to also encode this, that and the other, but not giving prescriptive examples.
- p. 181 (7.3.2.4) French texts of plays also show the date and place of the first production as well as the names of the actors. You should provide for this.
- p. 181 (7.3.3) Use PROSE, not narrative, to include the essay and free-form creations (cf. Butor's works).
- p. 207 -- the original of this example is a printed document, not scanner output. Please begin by showing the original, not an intermediary stage of processing.
- p. 215 The idea of removing speaker tags, then identifying the speakers as speaker 1 and speaker 2, but then actually giving them names in the speech tag that follows, is to say the least messy. Either the speaker is tagged in the text or is not.
- p. 215 Note "Mar.Marc" -- clearly a leftover fragment of a redundant tag.
- p. 270 Alternate Base for DTD for drama -- If this goes out to any public other than programmers, then the TEI standards will not be used. Give us one reason why anyone would want to.
- Place of insertion to be chosen: Concern was expressed in the Work Group about the integrity of electronic texts. Simply counting the size of a file in bytes does not guarantee that one can recognize modifications in it. Shareware exists which generates a unique number for a text, a number which will change if any modifications are made to it. Please look into the possibility of recommending such software, or better, recommending that it, and the number generated by it, be included with archived or shared texts.
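The integrity-number idea in the last paragraph above can be sketched in a few lines. This is an illustration only, not the shareware the Work Group refers to: any digest function with the same property (the number changes whenever the text changes) would serve; SHA-256 is used here purely as a modern stand-in.

```python
# Sketch of an integrity number for an electronic text: a digest that
# changes if ANY modification is made to the text, unlike a byte count,
# which can stay the same after an edit.
import hashlib

def text_fingerprint(data: bytes) -> str:
    """Return a hexadecimal digest that changes if the text changes."""
    return hashlib.sha256(data).hexdigest()

original = b"To be, or not to be, that is the question."
modified = b"To be, or not to be, that is the question!"  # same length!

print(text_fingerprint(original))
print(text_fingerprint(original) == text_fingerprint(modified))  # digests differ
```

Note that the two versions above have the same length in bytes, so a simple file-size check would miss the change while the fingerprint catches it.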
=========================================================================
Date: Mon, 11 Feb 91 18:47:31 MST
Reply-To: Text Encoding Initiative public discussion list
Sender: Text Encoding Initiative public discussion list
From: Daniel Brink
Subject: Re: Final Critique
In-Reply-To: Message of Mon, 11 Feb 91 16:55:00 CST

Since there is a "Critique" session planned for ACH/ALLC, will there be any representation from the authors of this "Final Critique" at the conference?

Daniel Brink, Associate Dean for Technology Integration
College of Liberal Arts and Sciences
Arizona State University, Tempe, AZ 85287-1701
602/965-7748/1441 fax -1093
ATDXB@ASUVM.INRE.ASU.EDU
=========================================================================
Date: Tue, 12 Feb 91 09:33:00 MDT
Reply-To: Text Encoding Initiative public discussion list
Sender: Text Encoding Initiative public discussion list
From: CHERYLL BALL
Subject: Mass Mailings

For all you folks out there who responded to me so RUDELY, I was given this list by someone on another list who thought this list might help. I am truly sorry for being so NAIVE as to believe everything that I am told by other people. I had no IDEA that I would be chastised so HARSHLY. Believe me when I say I WILL NEVER POST TO THIS LIST AGAIN.
=========================================================================
Date: Tue, 12 Feb 91 12:07:10 EST
Reply-To: Text Encoding Initiative public discussion list
Sender: Text Encoding Initiative public discussion list
From: var@IRIS.BROWN.EDU
Subject: TEI and HyTime

There is an ANSI meeting coming up soon that might be of interest to many participants in the Text Encoding Initiative. This meeting deals with the standardization of hypertext around an extension to SGML called HyTime. What follows is the announcement for that meeting. I am enclosing it here since the meeting is a little over a week away, and it would be nice to have a few TEI folks join the meeting even on such short notice.
Victor Riley
Institute for Research in Information and Scholarship (IRIS)
Brown University
PO Box 1646
Providence, RI 02912
var@iris.brown.edu

====8<====8<====8<====8<====

X3V1.8M MUSIC IN INFORMATION PROCESSING STANDARDS (MIPS) COMMITTEE
operating under the rules and procedures of the American National Standards Institute

X3V1.8M Secretariats:
The Computer Music Association, c/o Larry Austin, President, P. O. Box 1634, San Francisco, California 94101-1634 USA (817 566 2235; cma@dept.csci.unt.edu) (X3V1.8M document orders and service to the music technology community)
Graphic Communications Association, c/o Marion Elledge, Vice President, Information Technologies, 100 Daingerfield Road, Alexandria, Virginia 22314 USA (703 519 8160; Fax: 703 548-2867) (X3V1.8M participant mailings and service to the publishing systems community)

MEETING NOTICE and DRAFT AGENDA - FIFTEENTH MEETING

MEETING NOTICE:

Meeting times:
Saturday, February 23, 1991, 10:00 AM - 5:00 PM
Sunday, February 24, 1991, 9:30 AM - 5:30 PM
Monday, February 25, 1991, 9:30 AM - 5:30 PM
Tuesday, February 26, 1991, 9:30 AM - 5:30 PM
Wednesday, February 27, 1991, 9:30 AM - 1:00 PM

Meeting Host: Graphic Communications Association (GCA), Norman Scharpf, President; Marion Elledge, Vice President, Information Technologies. The meeting is being held in conjunction with the GCA's "TechDoc Winter '91" conference. TechDoc Winter '91 is subtitled "Interactive Electronic Documentation (IED)." Tutorial sessions will occur simultaneously with X3V1.8M meetings (in different rooms, of course) on February 25 and 26, while the TechDoc Winter '91 conference will take place from February 27 (a one-day overlap with X3V1.8M's meeting) to March 1. There will be a tutorial on HyTime during Tuesday afternoon, February 26, which the X3V1.8M committee may or may not choose to attend.

Meeting Location:
The Radisson Hotel
1600 N. Indian Avenue
Palm Springs, California 92262
619 327 8311

WRITTEN CONTRIBUTIONS

The usual mailing of papers contributed since the last mailing, together with the most recent revision of X3V1.8M/SD-7, the Journal of Development for the HyTime Hypermedia/Time-based Document Representation Language (eighth draft), will be mailed to participants of record toward the end of January, 1991. Papers should be received in camera-ready form by January 15, 1991 by X3V1.8M Vice Chairman Steven R. Newcomb, Center for Music Research, School of Music, Florida State University R-71, Tallahassee, Florida 32306-2098 USA. (Voice: 904 644 5786, 904 422 3574. Fax: 904 386 2562 or 904 644 6100. Internet: srn@cmr.fsu.edu.)

LODGING

Lodging at the Radisson Hotel is available for $119/night, which is a special rate available to those who mention on the phone that they are there in conjunction with the Graphic Communications Association's TechDoc Winter '91 Conference. The phone number for Radisson reservations is 619 327 8311. There is, of course, no requirement that X3V1.8M participants stay at the Radisson, but, since the meeting will be held there, the Radisson will be the most convenient (if probably not the least expensive) lodging.

TRAVEL

It is possible to travel directly to Palm Springs by air. It is generally less expensive to go to Orange County Airport and drive for a couple of hours to Palm Springs, particularly if you are renting a car anyway.

NOTES TO NEW PARTICIPANTS/OBSERVERS:

1. Prospective members and observers are welcome at any time to participate in the current technical work of the committee. (You can be most effective in conveying your viewpoint if you can present it in the context of the current work -- in other words, please be familiar with X3V1.8M/SD-6, SD-7 and SD-8. If you don't have these, they can be obtained for a nominal charge from the Computer Music Association's X3V1.8M Secretariat.)
New participants are also urged to obtain and read ISO 8879 (Standard Generalized Markup Language). ISO 8879 is obtainable from the Graphic Communications Association for $67.50 (156 pp.). You should also obtain International Standard ISO 8879:1986/Amendment 1 from the same organization.

2. As usual, a portion of the second day's meeting (Sunday) has been set aside for persons who wish to address the committee on topics of their own choosing, relating to the subject matter or methodology of the committee's work. Mr. Brian Caporlette of the U. S. Air Force's Human Resources Laboratory at Wright-Patterson AFB will be presenting the recent revisions to the Content Data Model (CDM) for Interactive Electronic Technical Manuals (IETMs) his organization has made in order to make the CDM conform to HyTime.

3. New participants are asked (but not required) to inform Charles Goldfarb (c/o Sue Orlando, IBM Almaden Research Center, 408/927-2578) or Steve Newcomb (Florida State University Center for Music Research, Tallahassee, FL 32306-2098, 904/644-5786) if they plan to attend.

DRAFT AGENDA:

Saturday
Administrative matters, including: opening, approval of agenda, introduction of new participants, and scheduling the sixteenth (and possibly the seventeenth) meeting(s). Technical work will include a review of the changes to SD-7 made as a result of work done at the fourteenth meeting.

Sunday
Continuation of review of SD-7, particularly the application of the "HyTime architectural form" idea to additional elements. Presentation by Mr. Caporlette on the AFHRL Content Data Model as revised to conform to the HyTime hyperlink and document location facilities. Reconsideration of the "endsets" idea, which would allow certain link end locations to be restricted to a given list of generic identifiers.

Monday
Continuation of SD-7 review, including the generalization of the time model to space and time.

Tuesday
Continuation of Monday's agenda.
Review of the operating model of a HyTime engine outlined at the thirteenth meeting. Possible adjournment to HyTime Tutorial in the afternoon, which will include a presentation of the proto-SD-9 document, "HyTime Review," by Messrs. Kipp and Newcomb. Wednesday Enumeration of instructions to the editors regarding revisions to the working draft of HyTime. Adjournment. {Revised 91/02/12} ========================================================================= Date: Thu, 14 Feb 91 18:19:18 CST Reply-To: Text Encoding Initiative public discussion list Sender: Text Encoding Initiative public discussion list Comments: "ACH / ACL / ALLC Text Encoding Initiative" From: Michael Sperberg-McQueen 312 996-2477 -2981 Subject: TEI progress report, February 1991 As a change from conference announcements, we thought readers of this list might like a brief summary of what has been happening in the TEI since the distribution of the first Draft Guidelines last fall. Sincere apologies to those who feel such a report is long overdue! 1. TEI Deliverables 1.a. Documents First, a brief recap on the project's overall timescale and objectives. What will the TEI deliver in June 1992, when the funding dries up? It seems clear that a single massive report (a revised and extended version of the current document TEI P1) will not be enough. The need for a brief introductory guide, setting out the basic TEI framework and philosophy, has been repeatedly pointed out to us, sometimes privately and often publicly, as has the pressing need for tutorial material, and for demonstrations of TEI encoded texts in action. No effort was put into producing these in the first cycle, for the good reason that we did not at that time know what exactly we would be providing an introductory guide to! Now that the basic TEI framework is a little less nebulous, it seems appropriate to address these problems. 
Preparations for the forthcoming TEI Workshop at Tempe will provide one important source of such materials, and input from the affiliated projects another. It's possible that readers of this list may also have prepared some summary or explanatory material which might be of use -- don't be shy about letting us know about it, if you have. (For starters, we were recently delighted to receive a translation into Hungarian of the four page `executive summary' of P1). 1.b. Software -- a non-deliverable After tutorial and introductory materials the most frequently expressed desire at present seems to be for TEI-conformant software: systems which behave like the analytic packages we all know and love, but can also take advantage of the new capabilities offered by SGML. As a first step, we need programs (filters, as they are known in the trade) to translate from the TEI encoding scheme to those required by the application programs we use, and back in the other direction. For rolling one's own software, the community needs generally available routines which can read and understand TEI documents and which can be built into software individuals or projects develop for themselves or others (TEI parsers). Equally important for the usability of the encoding scheme in the community at large will be TEI-aware data-entry software -- editors and word processors which can exploit the rich text structure provided by SGML, simple routines to allow TEI tags to be entered into a text with a keystroke or two instead of ten or twenty (or in extreme cases even more!), and other tools to help make new texts in the form recommended by the TEI. Approximations to some of these are already available, and we hope to be demonstrating some of them at the Tempe Workshop. 
As we have often said, the TEI is not in the business of software development: nevertheless, it's clear that when any opportunity of steering software developers into channels likely to benefit the TEI community presents itself, we'd be foolish not to take it. So far, only encouraging noises have been heard from most, but products like DynaText (from Electronic Book Technologies) are a clear indication of the kinds of software we should expect to be able to choose amongst by the time the project ends. The Metalanguage Committee has accepted a `watching brief' to monitor and report on the features of commercially available SGML software, and has already produced a preliminary working paper (ML P28) which lists several products of interest to the TEI community, as well as a revised and expanded version of Robin Cover's monumental bibliography of SGML related information (ML W14). (These are not yet publicly available; ML P28 is being revised to correct a slip or two, and ML W14 will be put on the TEI-L file server just as soon as we can sweet-talk the UIC system management into the necessary megabyte or so of disk space and move the data to Chicago from Kingston.) 1.c. And more documents Just as many people have asked for some description of TEI encoding less technical and formal than TEI P1, so also some have asked for a more formal treatment of the scheme, so that it would be easier to write the TEI-conformant software they'd like to develop. In this connection, some work is proceeding (slowly!) on a formal presentation of the subset of SGML required by the TEI; the Metalanguage committee is also working on a more explicit definition of the notion 'TEI conformance'; this concept was intentionally left vague in the first draft but it appears that such vagueness has less to recommend it than we thought. 2. 
TEI Workplans If we're not producing any software, and only grudgingly getting round to explaining the work done in the first cycle, what, you might reasonably enquire, are we in fact doing? The major objective during the second funding cycle will be to extend the scope and coverage of the Guidelines. Those who have read P1 closely will be aware, as we are, of the very large number of topics sketched out, adumbrated or downright neglected therein. We remain confident that P1 provides a good general framework for most forms of text-based scholarship, but we need to put this claim to the test in more (and more different) areas of specialisation than was possible during the first cycle. How will this be done? One way, as we've already indicated, will be through the testing of the Guidelines in a practical situation which the Affiliated Projects will carry out. The other will be through the setting-up of a number of small but tightly-focussed working groups to make recommendations in specified areas, either directly where an area is already well-defined, or indirectly by sketching out a problem domain and proposing other work groups which need to be set up within it. Each work group will be given a specific charge and will work to a specified deadline. 
So far, about a dozen such groups have been set up, most of which are due to report back by the end of March: a list of currently active work groups and their heads is given below: TR1: Character sets (Harry Gaylord, University of Groningen) TR2: Text criticism (Robert Kraft, University of Pennsylvania) TR3: Hypertext and hypermedia (Steven DeRose, EBT) TR4: Mathematical formulae and tables (Paul Ellison, University of Exeter) TR6: Language corpora (Douglas Biber, Northern Arizona University) AI1: General linguistics (Terry Langendoen, University of Arizona) AI2: Spoken texts (Stig Johansson, University of Oslo) AI3: Literary studies (Paul Fortier, University of Manitoba) AI4: Historical studies (Daniel Greenstein, University of Glasgow) AI5: Machine-readable dictionaries (Robert Amsler, Mitre Corporation) AI6: Computational lexica (Robert Ingria, BBN) Each group is formally assigned to one of the two major working committees of the TEI, depending on whether its work is primarily concerned with Text Representation (TR) or Text Analysis and Interpretation (AI). These two committees will then review and endorse the findings of each work group, though we expect that for some areas we will also seek expert outside reviewers, perhaps with the assistance of the Advisory Board. A number of other work group topics have already been identified, and are in the process of being set up: these include the following: TR5: Newspapers TR7: General reference works TR8: Physical description of manuscripts and incunabula TR9: Analytic bibliography AI7: Terminological data For some of these we have already identified suitably qualified members; for others (in particular the first two) * * * * * * * * * * * * * * * * * * * * * * * * * * we are soliciting volunteers or nominations. * * * * * * * * * * * * * * * * * * * * * * * * * * If there is an area of textual scholarship which you feel has been unjustly neglected by the current draft, please don't hesitate to let us know about it! 
Among other areas already proposed for consideration are
- version control and the gradual enrichment of machine-readable texts
- ephemera (tickets, matchbooks, advertising)
- fragmentary ancient media (potsherds, inscriptions etc.)
- emblems (both isolated and libri emblematum)
A meeting was held in Oxford in early December for the heads of all then-constituted workgroups, and some workgroups are already well advanced in their work. As reports become available, their existence will be publicized on this list and elsewhere. (You have already seen one working paper produced by the work group on literary studies.) In addition, of course, we will be making a full TEI progress report at the Tempe conference. 3. TEI Working Documents We are in the process of revising and making more accessible the TEI document register at Chicago, which holds information about all TEI-related working papers, reports and publications. Wherever possible, we will try to make sure that finalized reports of general interest are posted on this ListServ in the usual way. To find out what is currently available, send a note to LISTSERV@UICVM containing the line GET TEI-L FILELIST. Specific documents can be requested in the same way, or by contacting Wendy Plotkin (U49127@UICVM) who looks after the register. The one document most requested (P1 itself) is still, we regret, not available in electronic form -- we just haven't buckled down to the task of recoding its current rather esoteric markup. Please bear with us! 
However, the following documents are now or will soon be available (as are others of ephemeral or less general interest -- contact Wendy Plotkin for a full list), some tagged in TeX, some in (an extended form of) Waterloo or IBM GML, some without explicit tags in a form designed for reading onscreen or simple printing: TEI PC P1 The Preparation of Text Encoding Guidelines (closing statement of the planning meeting in Poughkeepsie, NY, November 1987 -- often referred to in TEI documents as the "Poughkeepsie Principles") TEI AB P1 Closing Statement of the Text Encoding Initiative Advisory Board Meeting, February 1989 (just what the title says) TEI J6 Welcome to TEI-L TEI J10 Guide to the Structure of the TEI (September 1989 -- now slightly out of date, since this document doesn't cover the work groups described above) TEI PO A1 List of Participating Organizations TEI ED P1 Design Principles for Text Encoding Guidelines (a statement of basic design goals for the TEI) TEI ED P3 Theoretical Stance and Resolution of Theory Conflict (possible outcomes in fields with competing theoretical approaches) TEI ED W5 Tags and Features (a stab at a basic taxonomy of tags and textual features, with the specification of a database record design for a database of tags; rather technical, has been described as unreadable by some readers, as fairly useful by others) TEI ML W13 Guidelines for TEI Use of SGML (virtually identical with section 2.2 of TEI P1; rather technical) TEI ML W14 SGML Bibliography (Barnard and Cover) (very large bibliography of work on SGML and text encoding; will be available soon electronically from TEI-L and as tech report from Queen's University, Ontario) TEI AI3 W4 Literature Needs Survey Results (responses to a survey on needs of literary scholars conducted by the work group for literary studies) TEI AI3 W5 The TEI Guidelines (Version 1.1): A Critique by the Literature Working Group (a detailed commentary on TEI P1 from the point of view of literary scholars) 
TEI AI1 W2 List of Common Morphological Features for Inclusion in TEI Starter Set of Grammatical-Annotation Tags (list of grammatical features and the values they may take, for the languages of the EEC and Russian; makes no concessions for the non-linguist and does not discuss the mechanisms required for abbreviating grammatical annotation) TEI AI1 W3 Feature System Declarations and the Interpretation of Feature Structures (technical treatment of problems arising in use of feature structures as defined in TEI P1 chapter 6, and proposal for a method of solving them with a specialized SGML document declaring the feature system in use. No concessions for lack of linguistic or SGML knowledge.) 4. A plea for help We've said it before and we'll say it again: the TEI will only succeed with the active critical participation of the community it aims to serve. If you have views on any of the topics addressed by the TEI we want to hear them. Post a note to this bulletin board, or to us directly: we may not respond as fully or as quickly as we might wish to, but be sure that your comments will be taken note of and forwarded to the appropriate technical committee or workgroup. We are committed to responding to and summarizing all comments on our proposals, and it is a commitment we take very seriously indeed. (A summary of comments received through November is in progress, as are formal replies to them.) At the very least, we want to hear from everyone who received a copy of TEI P1 -- so please don't forget to complete and send in the 'User Response and Comment' form that came with your copy, if you have one! 
Lou Burnard (LOU@VAX.OXFORD.AC.UK) Michael Sperberg-McQueen (U35395@UICVM.BITNET) ========================================================================= Date: Fri, 15 Feb 91 15:45:20 MET Reply-To: Text Encoding Initiative public discussion list Sender: Text Encoding Initiative public discussion list From: Harry Gaylord Subject: Unicode I have been asked by several people to say something about the implications of the arrival of Unicode for TEI. Several useful comments in general have appeared on Humanist, TEI-L, and 10646 about relevant issues. Yet it is difficult to say anything succinctly at this point. One thing is clear. No character set so far has tackled the problem of the need to encode the lang characteristic in texts. This was already pointed out in P1 and elsewhere. This, it seems to me, is very important regardless of which coded character set one uses. There are advantages and disadvantages to both Unicode and ISO 10646 as they are currently formulated. Hopefully they will be merged into one ISO standard. There is no need for two multi-byte standards to be used in different systems or, even worse, in single systems. Unicode, 10646, and the 8859 family of coded character sets have different understandings of what a character is and how it will be used. Unicode says nothing about the imaging of texts on a screen or printing on paper. In a Unicode file the Greek letter alpha + IOTA SUBSCRIPT + ROUGH BREATHING MARK + GRAVE ACCENT would be coded as four 16-bit units. The software used to image this text would have to recognize this combination of one spacing and three non-spacing characters and put the image on your screen. In ISO 10646 and the 8859 family the approach has been to have each combination as a different coded character. Therefore this combination would be one character in 10646. This would be one 32-bit unit if one were using the full 10646 set, or possibly 16 or 8 bits if one were using one of the compression techniques. 
The software running the system with Unicode would also have to know that when there are two accents above, they have to be positioned differently above the letter than when there is only one. On the other hand some languages have so many different combinations that it is common practice to use "floating accents" or graphic character combination encoding. An example of this is Hebrew, which has 23 consonants and 5 final forms. Its vowels and other signs are imaged in relation to the consonants. If one had a coded character for each possible combination, the resulting set would be enormous. Therefore present systems, e.g. Nota Bene SLS, and others encode these separately. This is also true of Unicode and 10646. It is uneconomical to do it otherwise. Two basic criticisms of the present proposals in 10646 are the very large number of wasted control character positions in it, and inadequate provision for graphic character combination encoding. On the latter point there is an appendix referring to the way this can be done under another ISO standard, but this appendix is not a required part of the standard itself. The TG on character sets is in contact with Unicode and ISO with our concerns for their work. We must remember that the final outcome of what is delivered is still very uncertain. The standards have to be formulated and then hardware manufacturers have to be convinced of the importance of them and implement them. This all takes time. It is also important to note that the big players have people working in the Unicode consortium and the ISO 10646 committee. One concern that I have is the need for representing text as it is contained in older books and manuscripts. Neither standard as far as I can see has the long s of English printing in earlier books. Yet we need it for many scholarly purposes. From the standpoint of both of these standards it would be classified as a "presentational variant" of s and be placed in a completely different section of the character set. 
This is even more true of letter shapes as they appear in manuscripts. There is room in each proposal for private use characters which can be used by agreement of two or more parties. Yet the more that is included in a standard as standard, the better off we are. There are currently attempts to combine the work of the Unicode consortium and the committee for 10646. Let's hope they are successful and that the results improve on both. Harry Gaylord ========================================================================= Date: Mon, 18 Feb 91 12:16:54 MST Reply-To: Text Encoding Initiative public discussion list Sender: Text Encoding Initiative public discussion list From: lexical@NMSU.EDU Subject: model lexica opportunity One of our new directives is the Consortium for Lexical Research, in which good models of lexica (and software to make them informative) could be very influential. Perhaps you'd like to help build up and use this resource archive. It's sponsored by the ACL, who see the need not only for standardization but also distribution, and it is funded and encouraged by DARPA.

The Consortium for Lexical Research
Rio Grande Research Corridor
Computing Research Laboratory
New Mexico State University
Box 30001, Las Cruces, NM 88003.
lexical@nmsu.edu (505) 646-5466 Fax: (505) 646-6218

Work in computational linguistics has reached the point where the performance of many natural language processing systems is limited by a "lexical bottleneck". That is, such systems could handle much more text and produce much more impressive application results were it not for the fact that their lexicons are too small. The Association for Computational Linguistics has established the Consortium for Lexical Research (CLR), and DARPA has agreed to fund this. 
It will be sited at the Computing Research Laboratory, New Mexico, USA, under its Director, Yorick Wilks, and an ACL committee consisting of Roy Byrd, Ralph Grishman, Mark Liberman and Don Walker.

The Consortium for Lexical Research will be an organization for sharing lexical data and tools used to perform research on natural language dictionaries and lexicons, and for communicating the results of that research. Members of the Consortium will contribute resources to a repository and withdraw resources from it in order to perform their research. There is no requirement that withdrawals be compensated by contributions in kind.

A basic premise of the proposal for cooperation on lexical research is that the research must be "precompetitive". That is, the CLR will not have as its goal the creation of commercial products. The goal of precompetitive research would be to augment our understanding of what lexicons contain and, specifically, to build computational lexicons having those contents.

The task of the CLR is primarily to facilitate research, making available to the whole natural language processing community certain resources now held only by a few groups that have special relationships with companies or dictionary publishers. The CLR would as far as is practically possible accept contributions from any source, regardless of theoretical orientation, and make them available as widely as possible for research. There is also an underlying theoretical assumption or hope: that the contents of major lexicons are very similar, and that some neutral, or "polytheoretic," form of the information they contain can be at least a research goal, and would be a great boon if it could be achieved. A major activity of the CLR will be to negotiate agreements with "providers" on terms reassuring and advantageous to both suppliers and researchers. 
Major funders of work in this area in the US have indicated interest in making participation in the CLR a condition for financial support of research. An annual fee will be charged for membership. It is intended that after an initial start-up period, the Consortium become self-supporting.

The Computing Research Lab (CRL) already has an active research program in computational lexicons, text processing, machine translation, etc., funded by DARPA and NSF, as well as a range of machines appropriate for advanced computing on dictionaries.

Resources and Services of the Consortium

The following lists of lexical data and tools seem to provide a reasonable starting content for the repository. We will continually solicit and encourage additions to this list.

Data
1. word lists (proper nouns, count/mass nouns, causative verbs, movement verbs, predicative adjectives, etc.)
2. published dictionaries
3. specialized terminology, technical glossaries, etc.
4. statistical data
5. synonyms, antonyms, hypernyms, pertainyms, etc.
6. phrase lists

Tools
1. lexical data base management tools
2. lexical query languages
3. text analysis tools (concordance, KWIC, statistical analysis, collocation analysis, etc.)
4. SGML tools (particularly tuned to dictionary encoding)
5. parsers
6. morphological analyzers
7. user interfaces to dictionaries
8. lexical workbenches
9. dictionary definition sense taggers

Services
Repository management will involve cataloging and storing material in disparate formats, and providing for their retransmission (with conversion, where appropriate tools exist). In addition, it will be necessary to maintain a library of documentation describing the repository's contents and containing research papers resulting from projects that use the material. A brief description of the services to be provided is as follows:
a. 
CRL will provide a catalog of, and act as a clearinghouse for, utilities programs that have been written for existing online lexical data.
b. CRL will compile a list of known mistakes, misprints, etc. that occur in each of the major published sources (dictionaries etc.).
c. CRL will set up a new memorandum series explicitly devoted to the lexical center.
d. CRL will also be a clearinghouse for preprints and hard-to-find reprints on machine-readable dictionaries.
e. CRL also expects to conduct workshops in this area, including an inaugural workshop in late 1991 or early 1992.
f. CRL would provide a catalog for access to repositories of corpus-manipulation tools held elsewhere.

We invite you to participate in the Consortium for Lexical Research. Anyone interested in participating even in principle as a provider or consumer of data, tools, or services should send a message to lexical@nmsu.edu or lexical@nmsu.bitnet, as should anyone who would like to be on our lexical information list. ========================================================================= Date: Mon, 18 Feb 91 14:39:39 EST Reply-To: Text Encoding Initiative public discussion list Sender: Text Encoding Initiative public discussion list From: myl@COMA.ATT.COM Subject: out of the office from 18-25 February I will be at the DARPA Speech and Natural Language workshop Asilomar Conference Center, 408-372-8016, 7227 Fax Pacific Grove, CA 93950 Your mail will be read when I return. You can reach the Penn Linguistics department at 215-898-6046. Regards, Mark Liberman ========================================================================= Date: Mon, 18 Feb 91 14:42:36 -0500 Reply-To: Text Encoding Initiative public discussion list Sender: Text Encoding Initiative public discussion list From: Don Walker Subject: Away from the office from 18 to 25 February I will be in California at the Speech and Natural Language Processing Workshop. 
For urgent matters, contact my secretary Elaine Molchan at em@flash.bellcore.com or (+1-201)829-4594 for information on how to reach me there. Don Walker ========================================================================= Date: Tue, 19 Feb 91 11:22:51 +0100 Reply-To: Text Encoding Initiative public discussion list Sender: Text Encoding Initiative public discussion list From: Timothy.Reuter@MGH.BADW-MUENCHEN.DBP.DE Subject: Unicode I think it's important to think about the general aspects as well as about whether Unicode does or does not have the variant form for this or that letter in this or that writing system. Some general points occur to me: a. Pace Harry Gaylord, Unicode seems to me to be biased towards display rather than other forms of data processing. Some semantic distinctions are observed, but roughly speaking, if things look substantially different, even though semantically substantially the same, they get different codes (e.g. medial and final sigma in Greek - or alphabets of Roman numerals or letters in circles - or the various forms of cedilla at U+0300 up). If on the other hand they look substantially the same, even though semantically different, they may well get the same code (e.g. hacek is considered to be identical with superscript v, and the overlaps are very acute in the mathematical symbol area). Digraphs only get in if they are in existing standards (German ss, Dutch ij, Slav Dz), i.e. since you can display, say, Spanish "ch" as "c" followed by "h" there is no provision for a code to mean "ch", though this might well be helpful in non-display contexts. b. "Unicode makes no pretense to correlate character encoding with collation or case" and indeed it doesn't. The basic setup (for those who haven't seen the draft) is that the high byte is used to indicate a kind of code page, which may contain one or more alphabets/syllabaries/symbol sets, etc. 
There's no attempt to use bit fields of non-byte width within the 16 bits, except in so far as sequences within existing eight-bit standards have done this. The difference between lc and uc can be 1, 32 or 48 (possibly others as well), while runs of letters can be interrupted by numerals and non-letters. Previous standards play a role here, but there seems to me to be no compelling reason if you're drawing up a 16-bit code to say that you will take over all existing standards on the basis of eight-bit code + fixed offset! It's an opportunity to eliminate rather than perpetuate things which in any case only originated because of restrictions which no longer apply. c. Diacritics are trailing "non-spacing" separate characters (actually they're backspacing). Diacritics modifying two letters follow the second one. The point has already been made that you can't really do it any other way (though in a 32-bit code you could probably do it with bit-fields). However, trailing diacritics seem to me undesirable, because you have to "maintain state" (something the Unicode people claim to eliminate) in any programming you do. If you're reading a file or a string sequentially you can't even send a character to the printer or the screen until you have checked the one after it to make sure it's not a trailing diacritic! For the user, the order of storage is irrelevant; for the programmer, preceding diacritics are much easier to handle in most contexts. The slavish take-over of existing eight-bit standards means that many diacritics are also codable as "static" single characters - as has been pointed out, this leads to potential ambiguities. Diacritics apart, there seem to be conflicts of interest between different applications, which *necessarily* lead to ambiguities or difficulties for someone. Take the "s" problem. 
Harry Gaylord says he needs long s as a code of its own; Unicode itself distinguishes between Greek medial and final sigma, and between "s" + "s" and German "szet", on the basis of existing standards. Any text containing these coding distinctions can be displayed more easily and more faithfully to its original than it can without them (though I would have thought there was no serious problem about identifying final sigma and acting accordingly). But other kinds of analysis become *more* difficult if such coding is used: regular expressions involving "s" are much more difficult to construct, as are collating and comparison sequences. This is an area where SGML-style entities are positively advantageous, simply because they announce their presence: if long s is always coded as &slong; in a base text, different applications can be fed with different translations. Precisely because Unicode puts so much emphasis on how things look rather than what they mean, it won't eliminate the need for such "kludges", as someone on HUMANIST thought it would. Timothy Reuter, Monumenta Germaniae Historica, Munich ========================================================================= Date: Tue, 19 Feb 91 02:27:57 PST Reply-To: Text Encoding Initiative public discussion list Sender: Text Encoding Initiative public discussion list From: Rindfleisch@SUMEX-AIM.STANFORD.EDU Subject: Away from my Mail I will be gone and not reading my mail until Sunday, February 24. Your message regarding " Unicode" will be read when I return. If your message concerns something urgent, please contact Monica Wong (Wong@SUMEX-AIM) or phone my office at (415) 723-5569. Tom R. ========================================================================= Date: Tue, 19 Feb 91 08:38:29 EST Reply-To: Text Encoding Initiative public discussion list Sender: Text Encoding Initiative public discussion list From: "Robert A. Amsler" Subject: Presentation vs. 
Descriptive CHARACTER Markup Timothy Reuter's note on UNICODE suggests that we ought to be careful that the same guidelines that have led the TEI to select descriptive markup for text not be abandoned when we get to characters. The TEI's concern should first and foremost be whether a character representation represents the meaning of the characters to the authors, and not their presentation format. Likewise, this also means that how the representation is achieved is rather irrelevant to whether or not the markup captures the meaning of the character. I think it worth noting that there seems to be a need for two standards for characters: one to represent their meaning, the other to represent their print images. The print image representation has a LOT of things to take into account, and may in fact only be possible in some form such as the famous "Hershey fonts" released long ago by the US National Bureau of Standards. That is, the print images of characters and symbols may have to be accompanied by representations, as bit maps or as equations, of how to draw the characters within a specified rectangular block of space. Within the descriptive markup, there clearly are enough problems to solve without adding the burden of achieving consistent print representations on all display devices. For example, one descriptive issue is whether the representation is adequate for spoken or only written forms of the language. While the TEI has addressed the concerns of researchers in linguistics dealing with speech, there exists a need to address the concerns of ordinary text users concerned with representing spoken-language information in printed form. Some of this is a bit arcane, such as how to represent text dialogues to be spoken with a foreign accent, but representing EMPHASIS is a continual issue and emphasis can descend to the characteristics of individual letters. 
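Reuter's point earlier about SGML-style entities, and the meaning/print-image split argued for here, can be made concrete with a small sketch. The entity name &slong; is Reuter's; the mapping tables and function below are illustrative, not any existing TEI tool, and long s is mapped to the code point later assigned to it (U+017F).

```python
# A base text codes long s as the entity &slong;; because the entity
# announces its presence, each application supplies its own translation.
# Illustrative mappings only, not a real TEI or SGML toolkit.
ANALYSIS_MAP = {"&slong;": "s"}       # fold to plain s: regular expressions and collation work
DISPLAY_MAP  = {"&slong;": "\u017F"}  # long-s code point: faithful to the original printing

def translate(text, mapping):
    """Replace each entity in the text with its application-specific value."""
    for entity, replacement in mapping.items():
        text = text.replace(entity, replacement)
    return text

base = "Congre&slong;s"
print(translate(base, ANALYSIS_MAP))  # Congress
print(translate(base, DISPLAY_MAP))   # Congreſs
```

The same base text thus feeds both a search application and a display application without ambiguity, which is the advantage Reuter claims for entities over direct coding.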
========================================================================= Date: Tue, 19 Feb 91 15:06:49 MET Reply-To: Text Encoding Initiative public discussion list Sender: Text Encoding Initiative public discussion list From: "E. van Konijnenburg" Subject: Re: model lexica opportunity In-Reply-To: ; from "lexical@NMSU.EDU" at Feb 18, 91 12:16 pm Hi. Please include me in your information list. Regards, Erik AND Software bv ------------------------------------------------------- Attn. E. van Konijnenburg Westersingel 108 Tel: +31 10 4367100 3015 LD ROTTERDAM Fax: +31 10 4367110 The Netherlands Email: ========================================================================= Date: Tue, 19 Feb 91 16:15:56 CST Reply-To: Text Encoding Initiative public discussion list Sender: Text Encoding Initiative public discussion list From: "Robin C. Cover" Subject: CHAR ENCODING AND TEXT PROCESSING A propos of recent comments by Timothy Reuter and Robert A. Amsler on the relationship between character encodings and (optimized) text processing, two notes: (1) Timothy writes that "Unicode seems to me to be biased towards display rather than other forms of data processing." We note that UNICODE indeed does contain algorithms for formatting right-to-left text and bi-directional text, but (as far as I know) it has no general support for indicating the language in which a text occurs. (2) On the matter of separating "form and function" (various two-level distinctions germane to character encoding and writing systems: character and graph; graph and image; language and script; writing system and script), the following article by Gary Simons may be of interest. (I do not know if it represents his current thinking in every detail.) Gary F. Simons, "The Computational Complexity of Writing Systems." Pp. 538-553 in _The Fifteenth LACUS Forum 1988_ (edited by Ruth M. Brend and David G. Lockwood). Lake Bluff, IL: Linguistic Association of Canada and the United States, 1989. 
In this article the author argues that computer systems, like their users, need to be multilingual. "We need computers, operating systems, and programs that can potentially work in any language and can simultaneously work with many languages at the same time." The article proposes a conceptual framework for achieving this goal. Section 1, "Establishing the baseline," focuses on the problem of graphic rendering and illustrates the range of phenomena which an adequate solution to computational rendering of writing systems must account for. These include phenomena like nonsequential rendering, movable diacritics, positional variants, ligatures, conjuncts, and kerning. Section 2, "A general solution to the complexities of character rendering," proposes a general solution to the rendering of scripts that can be printed typographically. (The computational rendering of calligraphic scripts adds further complexities which are not addressed.) The author first argues that the proper modeling of writing systems requires a two-level system in which a functional level is distinguished from a formal level. The functional level is the domain of characters (which represent the underlying information units of the writing system). The formal level is the domain of graphs (which represent the distinct graphic signs which appear on the surface). The claim is then made that all the phenomena described in section 1 can be handled by mapping from characters to graphs via finite-state transducers -- simple machines guaranteed to produce results in linear time. A brief example using the Greek writing system is given. Section 3, "Toward a conceptual model for multilingual computing," goes beyond graphic rendering to consider the requirements of a system that would adequately deal with other language-specific issues like keyboarding, sorting, transliteration, hyphenation, and the like.
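The character-to-graph mapping of Section 2 can be sketched in a few lines of Python: one rule of Greek rendering (sigma takes its final form at the end of a word) modeled with a single symbol of lookahead, in the spirit of Simons's finite-state transducers. The character names and word-boundary convention below are invented for illustration:

```python
# Functional level: characters.  Formal level: graphs (surface signs).
GRAPHS = {"lambda": "\u03bb", "omicron": "\u03bf", "gamma": "\u03b3",
          "sigma": "\u03c3"}          # medial/initial sigma
FINAL_SIGMA = "\u03c2"

def to_graphs(chars):
    """Map a character sequence to graphs, choosing the final-form
    sigma when no letter follows -- a toy finite-state rule."""
    out = []
    for i, c in enumerate(chars):
        at_word_end = (i + 1 == len(chars)) or chars[i + 1] == "space"
        if c == "space":
            out.append(" ")
        elif c == "sigma" and at_word_end:
            out.append(FINAL_SIGMA)   # positional variant chosen at the
        else:                         # formal level, not stored in the text
            out.append(GRAPHS[c])
    return "".join(out)

# One character "sigma" in the data; the renderer picks the graph.
print(to_graphs(["lambda", "omicron", "gamma", "omicron", "sigma"]))
```

The point of the two-level model is visible even in this toy: the stored text never has to say which sigma it means, and the mapping runs in a single left-to-right pass, i.e. in linear time.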
The author observes that every piece of textual data stored in a computer is expressed in a particular language, and it is the identity of that language which determines how the data should be rendered, keyboarded, sorted, and so on. He thus argues that a rendering-centered approach which simply develops a universal character set for all languages will not solve the problem of multilingual computing. Using examples from the world's languages, he goes on to define language, script, and writing system as distinct concepts and argues that a complete system for multilingual computing must model all three. Availability: Offprints of this article are available from the author at the following Internet address: gary@txsil.lonestar.org. The volume is available from LACUS, P.O. Box 101, Lake Bluff, IL 60044. Robin Cover BITNET: zrcc1001@smuvm1 INTERNET: robin@ling.uta.edu INTERNET: robin@txsil.lonestar.org ========================================================================= Date: Tue, 19 Feb 91 19:15:04 PST Reply-To: Text Encoding Initiative public discussion list Sender: Text Encoding Initiative public discussion list From: Ken Whistler Subject: Re: CHAR ENCODING AND TEXT PROCESSING Dear Mr. Cover, I would like to respond to your recent note, and the implications of the abstract you have made from Gary Simons's article. (In this I am speaking personally, and my opinions do not necessarily represent those of the Unicode Technical Committee.) First of all, I want to make it clear that Unicode is not, nor does it purport to be, a text description language. It is a character encoding. We need to code the LATIN CAPITAL LETTER A and the ARABIC LETTER ALEF and the DEVANAGARI LETTER A in order for any text to be encoded, and for any textual process to be programmed to operate on that text.
However, assigning 16-bit values to those characters (0041, 0627, and 0905, respectively) does not, ipso facto, specify whether the LATIN CAPITAL LETTER A is being used in an English, Czech, or Rarotongan text, or the ARABIC LETTER ALEF in Arabic, Sindhi, or Malay, or the DEVANAGARI LETTER A in Hindi or Nepali. Trying to mix the character encoding with specification of textual language is guaranteed to mess up the character encoding; the appropriate place to handle this is at a metalevel of text/document description above the level of the character encoding. On the other hand, the bidirectional text problem is specifiable independent of any particular language--or even script, for that matter, since the generic problem is the same for Hebrew as it is for Arabic (scripts). The fundamental reason why Unicode is going to great lengths to include a bidirectional plain text model is that without an explicit statement of how to do this, the content of texts which contain both left-to-right and right-to-left scripts mixed can be compromised or corrupted when such texts are interchanged. If we do not come down squarely in favor of an implicit model (or an explicit model with direction-changing controls, or a visual order model), then bidirectional Unitext will regularly get scrambled, and no one will know how to interpret a number embedded in bidi text, etc., etc. Regarding form/function distinctions, I think you are preaching to the converted. I do not think you will be able to find another multilingual character encoding of this scope which has been developed with such a meticulous attention to the distinctions you mention: character vs. glyph (i.e. "graph" as you quote Simons) We have been educating people about this for years. Granted, there are glyphs encoded as characters in Unicode, too, but the main reason they got there is because Unicode has to be interconvertible to a lot of other "character" standards which couldn't distinguish the two. 
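Whistler's metalevel point can be made concrete in a few lines: the code value is invariant, and language is attached to spans of text by markup *above* the encoding. The tagging scheme below is invented purely for illustration:

```python
# The code value 0x0041 is LATIN CAPITAL LETTER A in every language.
assert chr(0x0041) == "A"

# Language is a property of a span of text, stated at a metalevel
# (e.g. as a markup attribute) -- not a property of the code value.
# (lang, text) pairs: an invented tagging scheme for illustration.
spans = [("en", "A table"),
         ("cs", "Adresa"),
         ("rar", "Aere")]

for lang, text in spans:
    # All three spans begin with the identical character 0x0041;
    # only the metalevel tag tells us how to sort or hyphenate them.
    print(lang, hex(ord(text[0])))
```

Mixing the two levels, as Whistler says, would mean a different "A" per language, which is exactly the explosion the character encoding must avoid.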
And why does Unicode have to be interconvertible? A) Because that is the only way to get it accepted and move into the future, and B) Because that serves the purpose of creating better software to handle text processing requirements for preexisting data. glyph vs. image Also clearly distinguished amongst our discussions. I think Unicoders are supportive of the concept of proceeding to develop a definitive registry of glyphs. This would be most helpful to font foundries and font vendors, but also would help the software makers in performing the correct operations to map characters (in particular language and script contexts) into glyphs for rendering as images. But registry of glyphs is a different task from encoding of characters. For one thing, the universe of glyphs is much larger than the universe of characters. Unicode 1.0 is aimed at completing the character encoding as expeditiously and correctly as possible, rather than at taking on the larger glyph registry problem. language vs. script Also clearly distinguished. Unicode characters, taken by blocks, can be assigned to scripts. Hence the characters from 0980 to 09F9 are all part of the Bengali script. But no one is confusing that with the fact that some subset of those is used in writing the Bengali language and another subset in writing Assamese. script vs. writing system Again, I think you will find us sympathetic and not unaware of the distinctions involved. For example, most of us have worked on or are currently working on implementations of the Japanese writing system for one product or another on computer. Anyone with a smattering of knowledge of Japanese knows that the writing system is a complicated mix of two syllabaries, Han characters (kanji), and an adapted form of European scripts which can be rendered either horizontally or rotated for vertical rendering.
It is a complicated writing system which is difficult to implement properly on computer--but that is a separate issue from how to encode the characters. You quote Gary Simons as stating that: "We need computers, operating systems, and programs that can potentially work in any language and can simultaneously work with many languages at the same time." I can guarantee you that this is the passionate concern of those who have been working on Unicode for the last two years. It is precisely because the character encoding alternatives (ISO 2022, ISO DIS 10646, various incomplete corporate multilingual sets, and font-based encodings which confuse characters and font-glyphs) are so dismal that we have worked so hard to design a multilingual character set with the correct attributes for support of multilingual operating systems, multilingual applications, multilingual text interchange and email, multilingual displays and printers, multilingual input schemes, and yes, multilingual text processing. Don't expect the holy grail by Tuesday, but if we really think all those things are worth aiming for, it is vitally important that those who build the operating systems, the networks, the low-level software components, and the high-level applications reach a reasonably firm consensus about the character encoding now. --Ken Whistler ========================================================================= Date: Tue, 19 Feb 91 20:52:12 PST Reply-To: Text Encoding Initiative public discussion list Sender: Text Encoding Initiative public discussion list From: Ken Whistler Subject: Re: Unicode Dear Mr. Reuter, I addressed some of your concerns in my reply to Robin Cover, but I would like to respond to a few of the specific points which you have raised. (Disclaimer: These are personal opinions, and do not necessarily reflect the position of the Unicode Technical Committee.) Regarding your point a., that Unicode seems biased towards display rather than other forms of data processing. 
First of all, you must understand that Unicode has been visited with the sins of our fathers. The medial and final sigma are already distinguished in the Greek standard. We cannot unify them without Hellenic catastrophe. (In fact the Classicists inform us that there are good reasons why we must introduce a third sigma, the "lunate sigma", in order to have a correct and complete encoding.) Nobody likes the Roman numerals, or the parenthesized letters, or the squared Roman abbreviations, ... The general reaction has been Sheesh! But important Chinese, Japanese, and Korean standards which have to be interconvertible with Unicode have already encoded such stuff, and we are stuck with it. Why? Because the design goal of a perfect, de novo, consistent, and principled character encoding is unattainable (believe me, we tried), and because the higher goal of attaining a usable, implementable, and well-engineered character encoding in a finite time is greatly furthered by including as much as possible of the preexisting character encoding standards. You also noted that the semantic overlaps are very acute in the mathematical symbol area. Nobody can tell us how many distinct semantic usages there are for "tilde", for example. Should we encode 1, 3, 7, 16 of them?? We made what I think is the best compromise we could under the circumstances. The TILDE OPERATOR is encoded as a math operator (distinct from accents, whether spacing or non-spacing), but no further attempt is made to separate all the possible semantics applicable. Note that if we start trying to distinguish "difference" from "varies with" from "similar", from "negation", etc., we would be forcing applications (and users) to encode the correct semantic--even when they don't know or can't distinguish them. This has the potential for being WORSE for text processing, rather than better. Over-differentiation in encoding is just as bad as under-differentiation. 
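The interconvertibility requirement behind "the sins of our fathers" is, in effect, a round-trip guarantee: every code of a grandfathered standard must map to a distinct Unicode value and back without loss. A schematic check, using an invented three-character "legacy" table (the byte values and the miniature standard are made up for illustration):

```python
# An invented miniature legacy standard: byte value -> Unicode value.
LEGACY_TO_UNICODE = {0xA1: 0x03B1,   # alpha
                     0xA2: 0x03C3,   # medial sigma
                     0xA3: 0x03C2}   # final sigma -- must stay distinct in
                                     # Unicode because the legacy standard
                                     # already distinguishes the two
UNICODE_TO_LEGACY = {u: b for b, u in LEGACY_TO_UNICODE.items()}

def round_trips(byte):
    """True if a legacy code survives conversion to Unicode and back."""
    return UNICODE_TO_LEGACY[LEGACY_TO_UNICODE[byte]] == byte

# Interconvertibility demands that every legacy code round-trips --
# which is impossible if Unicode were to unify the two sigmas.
assert all(round_trips(b) for b in LEGACY_TO_UNICODE)
print("round trip preserved for", len(LEGACY_TO_UNICODE), "legacy codes")
```

This is why "unifying" medial and final sigma, however attractive in principle, would break conversion of existing Greek data.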
I don't understand your concern about not distinguishing hacek and superscript v. Unicode does not encode superscript v at all. Except for those superscripts grandfathered in from other standards (remember the sins of our [grand]fathers), superscript variants of letters are considered rendering forms outside the scope of Unicode altogether. If someone uses a font which has hacek rendered in a form which looks like a superscript v, that is a separate issue. From a Unicode point of view that would simply be mapping the character HACEK onto the glyph {LATIN SMALL V} in some particular typeface for rendering above some other glyph. A font vendor could do that. It might even be the correct thing to do, for example, in building a paleographic font for manuscript typesetting. Regarding your b. item concerns about the layout of Unicode: First of all, I am sensitive about your using the term "code page" in referring to the Unicode charts. "Code page" is properly applied to 8-bit (or to some double 8-bit) encodings which can be "swapped in" or "swapped out" to change the interpretation of a particular numeric value as a character. Unicode values are fixed, unambiguous, and unswappable for anything else. The charts are simply a convenient packaging unit for human visual consumption and education. The fact that we tried to align new scripts with high byte boundaries resulted from the implementation requirement that software have easy and quick tests for script identity. The subordering within script blocks does attempt to follow existing standards, where feasible. We tried the alternative of simply enumerating all the characters in a script and then packing them in next to each other in what would pass for the "best" alphabetic order, but that introduces other problems AND makes the relevant "owners" of that script gag at the introduction of a layout unfamiliar to them. In the end all such processes such as case folding, sorting, parsing, rendering, etc. 
depend on table lookup of attributes and properties. There is no hard-coded shortcut which will always work--even for 7-bit ASCII. The compromise which pleased the most competing interests (and which, by the way, got us to a conclusion on this issue) was to follow national standards orders as applicable. You might note that the one REALLY BIG case where we have to depart from this is in unifying 18,000+ Han characters. The only way to do this is to depart from ALL of the Asian standards--so nobody can convert from a Chinese, Japanese, or Korean standard to Unicode by a fixed offset! Believe me, that has occasioned much more grumbling (to put it mildly) than any ordering issue for Greek or Cyrillic! Concerning your point c., about diacritics being specified as following a baseform rather than preceding it: Clearly we had to come down on one side or the other. Not specifying it would be disastrous. So we made a choice. Granted, that having diacritics follow rather than precede baseforms favors rendering algorithms over parsing algorithms. To have made the opposite choice would have reversed the polarity of benefits and costs. It is a tradeoff with no absolutely right answer. Nevertheless, I think the choice made was the correct one. First, the rendering involved is not really as you have characterized it. "Non-spacing" diacritics are NOT backspacing. Such terminology is more properly applied to spacing diacritics (such as coded in ISO 8859-1 or ISO DIS 10646), which for proper rendering use require the sending of a BACKSPACE control code between a baseform and an accent. That's the way composite characters used to be printed on daisy-wheel printers, for example. But that is a defective rendering model which ignores the complex typographical relationship between baseforms and diacritics. The kind of rendering model we are talking about involves "smart" fonts with kerning pair tables. 
The "printhead" is not trundled back so that an accent can be overstruck; instead, a diacritic "draws itself" appropriately, in whatever medium, on a baseform in context. The technology for doing this is fairly well understood but quite complex. I think it would be fair to say that if I were writing a text processing program (and I have), I would rather have system support for such rendering and deal with the look-ahead problem than have to deal with font rendering problems in my program. Second, the "state" that has to be maintained in parsing diacritics is quite different from the "state" that Unicode claims to eliminate. Parse states have to be maintained for all kinds of things. If I am parsing Unicode which uses non-spacing diacritics, then I have to maintain a parse state to identify text elements; but even parsing for word boundaries, for example (an elementary operation in editing) has to maintain state to find boundaries which may depend on combinations of punctuation, or on ambiguous interpretation of some characters which can only be disambiguated in context, etc., etc. More complicated parsing often maintains elaborate parse trees with multiple states. The "statefulness" that Unicode is trying to eliminate is a state in which the interpretation of the bit pattern for a character changes, depending on which state you are in. This is the "code page sickness", where one time the 94 means "o-umlaut", and next time it means "i-circumflex", and next time it means "partial differential symbol", depending on what code page you are using, and what code page shift state you happen to be in. The two-byte encodings currently are horrible in this respect, since they may mix single-byte and two-byte interpretations in ways which may mean that figuring out what a particular byte is supposed to represent in any random location can be very difficult.
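The "code page sickness" is easy to demonstrate with a toy shift-based encoding (the shift byte and code pages below are invented; the byte 0x94 echoes Whistler's o-umlaut example): the same byte decodes differently depending on state, so no byte can be interpreted in isolation, whereas a fixed-width code value means one thing always.

```python
# A toy stateful encoding, invented for illustration: byte 0x94 means
# o-umlaut on the default page, but i-circumflex after the shift byte.
SHIFT = 0x0E
PAGE_A = {0x94: "\u00f6"}   # o-umlaut
PAGE_B = {0x94: "\u00ee"}   # i-circumflex

def decode_stateful(data):
    """Decode bytes whose meaning depends on which page is in force."""
    page, out = PAGE_A, []
    for b in data:
        if b == SHIFT:
            page = PAGE_B   # the *interpretation* of later bytes changes
        else:
            out.append(page[b])
    return "".join(out)

# The same byte, two meanings -- you must know the state to read it:
print(decode_stateful([0x94]))          # o-umlaut
print(decode_stateful([SHIFT, 0x94]))   # i-circumflex

# A fixed-width code has no such state: 0x00F6 is o-umlaut, always.
assert chr(0x00F6) == "\u00f6"
```

This is the kind of state Unicode eliminates; the parse state needed to find word boundaries or text elements is a different (and unavoidable) matter.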
You have to find an anchor position from which you can parse sequentially, maintaining state, until you get to the byte in question to find out what it means. Unicode eliminates THAT kind of state maintenance. I find myself agreeing with your statement that "there seem to be conflicts of interest between different applications, which *necessarily* lead to ambiguities or difficulties for someone." The way I would put it, following a distinction made elegantly by Joe Becker, is that there is no way that any encoding of CODE ELEMENTS (i.e. the "characters" assigned numbers in Unicode) will automatically result in one-to-one mappability to all the TEXT ELEMENTS which might ever be of interest to anyone or have to be processed as units for one application or another. Your mention of "ch" as a collation unit for Spanish is one obvious example. Fixing the CODE ELEMENTS of Unicode should not preclude efforts to identify appropriate TEXT ELEMENTS for various processes. Such TEXT ELEMENTS will have to be identified as to their appropriate domain of application--and that does include language as well as other factors. But it is not the job of the character encoding to do that work. The character encoding should be designed so as not to impede TEXT ELEMENT identification and processing--for example, it would be crazy to refuse to encode LATIN LETTER SMALL I because it could be composed of a dotless-i baseform and a non-spacing dot over! But character encoding cannot BE the TEXT ELEMENT encoding, however much we might desire a simpler world to work with. --Ken Whistler ========================================================================= Date: Wed, 20 Feb 91 15:31:17 -0500 Reply-To: Text Encoding Initiative public discussion list Sender: Text Encoding Initiative public discussion list From: Katharina Klemperer Subject: sgml editors I would like to know if anyone has any experience with Macintosh SGML editors. 
By this I mean a "word processor" that assists in the insertion of SGML tags into a document. I saw Author/Editor, from SoftQuad, Inc., demonstrated a couple of years ago, and it looked nice, but I would like to know if there are additional similar products in the marketplace, and what experiences people have had with them. Kathy Klemperer Dartmouth College Library ========================================================================= Date: Thu, 21 Feb 91 07:37:39 CST Reply-To: Text Encoding Initiative public discussion list Sender: Text Encoding Initiative public discussion list From: FEEM@QUCDN.BITNET Subject: mailing list Please exclude my name from your mailing list. Thank you. M. Fee, Strathy Language Unit, Queen's University Fleming Hall, Room 206, Kingston, Ont. (613) 545-2152 FEEM@Qucdn ========================================================================= Date: Thu, 21 Feb 91 10:42:27 PST Reply-To: Text Encoding Initiative public discussion list Sender: Text Encoding Initiative public discussion list From: Lynne_Price.PARC@XEROX.COM Subject: Re: sgml editors In-Reply-To: <91Feb20.123652pst.16169@alpha.xerox.com> Kathy, Another SGML editor on the Mac is CheckMark from Software Exoterica 383 Parkdale Ave.
Suite 406 Ottawa, Ontario Canada K1Y 4R4 (613) 722-1700 I have used CheckMark fairly extensively and found it a valuable tool. I believe it supports a richer subset of the optional SGML features than does Author/Editor. In particular, it supports all markup minimization features. It can convert all or part of a document that uses minimization to a normalized form that does not. Furthermore, CheckMark can continue checking for additional SGML errors whether or not the user fixes the first problem detected. It has a special scroll bar for indicating where errors occur and allows the user to decide when to repair them. However, CheckMark is not a word processor or document formatter. It creates an SGML document, but has no provisions for displaying a formatted version of the document--for instance, it can't center or italicize certain elements. --Lynne Price ========================================================================= Date: Mon, 25 Feb 91 20:42:24 EST Reply-To: Text Encoding Initiative public discussion list Sender: Text Encoding Initiative public discussion list From: Brian Subject: Re: mailing list In-Reply-To: Message of Thu, 21 Feb 91 07:37:39 CST from Please take my name off the list. David Megginson will keep in touch on my behalf. Brian Merrilees, University of Toronto, MERRILEE@vm.epas.utoronto.ca>