This chapter describes elements which may appear in any kind of text
and the tags used to mark them in all TEI documents. Most of these
elements are freely floating phrases, which can appear at any point
within the textual structure, although they must generally be contained
by a higher-level element of some kind (such as a paragraph). A few of
the elements described in this chapter (for example, bibliographic
citations and lists) have a comparatively well-defined internal
structure, but most of them have no consistent inner structure of their
own. In the general case, they contain only a few words, and are often
identifiable in a conventionally printed text by the use of typographic
conventions such as shifts of font, use of quotation or other
punctuation marks, or other changes in layout.
To use the terminology introduced in section ,
most of the elements described in this chapter are members of the class
phrase, and a small number are members of the classes chunk or inter.
This chapter begins by describing the p tag used to mark
paragraphs, which serve as the fundamental formal unit for running text
in many base tag sets, and are available in all. This is followed, in
section , by a discussion of some specific problems
associated with the interpretation of conventional punctuation, and the
methods proposed by the current Guidelines for resolving ambiguities
therein.
The next section (section ) describes a number of
phrase-level elements commonly marked by typographic features (and thus
well-represented in conventional markup languages). These include
features commonly marked by font shifts (section ) and features commonly marked by quotation marks (section
) as well as such features as terms, cited
words, and glosses (section ).
The next section (section ) describes several
phrase-level and inter-level elements which, although often of interest
for analysis or processing, are rarely explicitly identified in
conventional printing. These include names (section ), numbers and measures (section ), dates and times (section ), abbreviations (section ),
and addresses (section ).
Section introduces some phrase-level elements
which may be used to record simple editorial emendation or correction of
the encoded text. The tags described here constitute a simple subset of
the mechanisms for encoding such information (described in full in
chapter ), which should be adequate for most commonly
encountered situations.
In the same way, the following section (section )
presents only a subset of the facilities available for the encoding
of cross-references or text-linkage. The full story may be found in
chapter ; the tags presented here are intended to
be usable for a wide variety of simple applications.
Sections , and , describe two
kinds of quasi-structural elements, lists and notes, which may appear
either within chunk-level elements such as paragraphs, or between them.
Several kinds of lists, of arbitrary complexity, are catered for. The
section on notes discusses both notes found in the source and simple
mechanisms for adding annotations of an interpretive nature during
the encoding; again, only a subset of the facilities described in full
elsewhere (specifically, in chapter ) is discussed.
Next, section , describes methods of
encoding within a text the conventional system or systems used when
making references to the text. Some reference systems have attained
canonical authority and must be recorded to make the text usable in
normal work; in other cases, a convenient reference system must be
created by the creator or analyst of an electronic text.
Like lists and notes, the bibliographic citations discussed in
section , may be regarded as structural elements in
their own right. A range of possibilities is presented for the encoding
of bibliographic citations or references, which may be treated as
simple phrases within a running text, or as highly structured
components suitable for inclusion in a bibliographic database.
Additional elements for the encoding of passages of verse or drama
(whether prose or verse) are discussed in section .
The chapter concludes with a technical overview of the structure and
organization of the tag set described here. This should be read in
conjunction with chapter , describing the structure of
the TEI document type definition.
Paragraphs
The paragraph is the fundamental organizational unit for all prose
texts, being the smallest regular unit into which prose can be divided.
Prose can appear in all TEI texts, not simply in those using the prose
base (section ); the paragraph is therefore described
here, as an element which can appear in any kind of text.
Paragraphs can contain any of the other elements described within
this chapter, as well as some other elements which are specific to
individual text types. We distinguish phrase-level
elements, which must be entirely contained within a paragraph and cannot
appear except within one, from chunks, which can appear
between, but not within, paragraphs, and from inter-level
elements, which can appear either within a single paragraph or between
paragraphs. The class of phrases includes emphasized or quoted phrases,
names, dates, etc. The class of inter-level elements includes
bibliographic citations, notes, lists, etc. The class of chunks
includes the paragraph itself.
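The three classes can be illustrated with a small fragment. This is only a sketch: the element names emph, list, and item are assumed here for illustration (they are not introduced in this section), and the sample text is invented.

```xml
<body>
  <!-- p is a chunk: it appears between other chunks, never inside one. -->
  <p>A phrase-level element such as <emph>emphasis</emph> (assumed name)
     may appear only inside a paragraph. An inter-level element may
     appear inside one as well:
     <list>
       <item>a list within the paragraph</item>
     </list>
  </p>
  <!-- The same inter-level element, this time between paragraphs. -->
  <list>
    <item>a list standing between two paragraphs</item>
  </list>
  <p>A second paragraph.</p>
</body>
```

The fragment places the list in both positions to show what distinguishes inter-level elements from phrases (which could not stand between the paragraphs) and from chunks (which could not appear inside one).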
Because paragraphs may appear in different base or additional tag
sets, their possible contents may differ in different kinds of
documents. In particular, additional elements not listed in this
chapter may appear in paragraphs in certain kinds of text. However, the
elements described in this chapter are available by default in
all kinds of text.
The paragraph is marked using the p element:
p: marks paragraphs in prose.
If a consistent internal subdivision of paragraphs is desired, the
s or seg (segment) elements may
be used, as discussed in chapters and
respectively. More usually, however, paragraphs have no firm internal
structure, but contain prose encoded as a mix of characters, entity
references, phrases marked as described in the rest of this chapter, and
embedded elements like lists, figures, or tables.
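As a rough illustration of a consistent internal subdivision, a paragraph might be divided with seg as follows. The sample sentences are invented, attributes are omitted, and the chapters referred to above remain the authority on how s and seg should actually be used.

```xml
<p>
  <!-- Each seg wraps one subdivision chosen by the encoder. -->
  <seg>Paragraphs usually have no firm internal structure.</seg>
  <seg>Where one is wanted, it can be marked explicitly.</seg>
</p>
```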
Since paragraphs are usually explicitly marked in Western texts,
typically by indentation, the application of the p tag
usually presents few problems.
In some cases, the body of a text may comprise but a single
paragraph:
I fully appreciate Gen. Pope's splendid achievements with their
invaluable results; but you must know that Major Generalships in the
Regular Army, are not as plenty as blackberries.
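Marked up, such a text might look like the following sketch. The enclosing body element is an assumption made for illustration; only the p element is defined in this section.

```xml
<body>
  <!-- The whole body of the text is one paragraph. -->
  <p>I fully appreciate Gen. Pope's splendid achievements with their
  invaluable results; but you must know that Major Generalships in the
  Regular Army, are not as plenty as blackberries.</p>
</body>
```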