Base Tag Set for Verse

This base tag set is intended for use when encoding texts which are entirely or predominantly in verse, and for which the elements for encoding verse structure already provided by the core tag set are inadequate.

The tags described in section include elements for the encoding of verse lines and line groups such as stanzas: these are available for any TEI document, irrespective of the base tag set it uses. Like the base tag sets for prose and for drama, the base tag set for verse additionally makes use of the tag set defined in chapter to define the basic formal structure of a text, in terms of front, body and back elements and the text-division elements into which these may be subdivided.

The base tag set for verse extends the facilities provided by these two tag sets in the following ways: numbered lg elements are provided, by analogy with the numbered divn class elements discussed in section (see section ); a special purpose caesura element is provided, to allow for segmentation of the verse line (see section ); and a set of attributes is provided for the encoding of rhyme scheme and metrical information (see sections and ).

To enable the base tag set for verse texts, a parameter entity TEI.verse must be declared within the document type subset, the value of which is INCLUDE, as further described in section . A document using this base tag set and no additional tag sets will thus begin with a document type declaration whose subset contains this entity declaration. This declaration makes available the elements and attributes described in this chapter, in addition to those described in chapters and .

Structure of the Base Tag Set for Verse

The base tag set for verse contains the following entity declarations:

The base tag set for verse contains the following element declarations: %TEI.structure.dtd;

Structural Divisions of Verse Texts

Like other kinds of text, texts written in verse may be of widely differing lengths and structures. A complete poem, no matter how short, may be treated as a free-standing text, and encoded in the same way as a distinct prose text. A group of poems functioning as a single unit may be encoded either as a group or as a text, depending on the encoder's view of the text. For further discussion, including an example encoding for a verse anthology, see chapter .

Many poems consist only of ungrouped lines. This short poem by Emily Dickinson is a simple case:

1755 To make a prairie it takes a clover and one bee, One clover, and a bee, And revery. The revery alone will do, If bees are few.

Often, however, lines are grouped, formally or informally, into stanzas, verse paragraphs, etc. The lg element defined in the core tag set (in section ) may be used for all such groupings. It may thus serve for informal groupings of lines such as those of the following example from Allen Ginsberg:

My Alba Now that I've wasted five years in Manhattan life decaying talent a blank talking disconnected patient and mental sliderule and number machine on a desk

It may also be used to mark the verse paragraphs into which longer poems are often divided, as in the following example from Samuel Taylor Coleridge's Frost at Midnight:

The Frost performs its secret ministry, Unhelped by any wind.... Whose puny flaps and freaks the idling Spirit By its own moods interprets, every where Echo or mirror seeking of itself, And makes a toy of Thought. But O! how oft, How oft, at school, with most believing mind Presageful, have I gazed upon the bars, To watch that fluttering stranger! ... Dear Babe, that sleepest cradled by my side,

Note, in the above example, the use of the part attribute on the l element, where a verse line is broken between two line groups, as discussed in section . (Note also that here and in some other examples in this chapter the end-tags for the l element, being redundant, have been omitted, as is allowed when the SGML feature OMITTAG is enabled.)

Most typically, however, the lg element is used to mark the highly regular line groups which characterize stanzaic and similar verse forms, as in the following example from Chaucer:

Sire Thopas was a doghty swayn; White was his face as payndemayn, His lippes rede as rose; His rode is lyk scarlet in grayn, And I yow telle in good certayn, He hadde a semely nose. His heer, his ber was lyk saffroun, That to his girdel raughte adoun;

Like other text-division elements, lg elements may be nested hierarchically. For example, one particularly common English stanzaic form consists of a quatrain or sestet followed by a couplet. The lg element may be used to encode both the stanza and its components, as in the following example from Byron:

In the first year of Freedom's second dawn Died George the Third; although no tyrant, one Who shielded tyrants, till each sense withdrawn Left him nor mental nor external sun: A better farmer ne'er brushed dew from lawn, A worse king never left a realm undone! He died ‐ but left his subjects still behind, One half as mad ‐ and t'other no less blind.

Note the use of the type attribute to name the type of unit encoded by the lg element; this attribute is common to all members of the divn class (see section ). For discussion of other attributes of this class, see . Sestet and couplet might conceivably also be used as the values of the met attribute in a metrical analysis, for which see below, section . The type attribute is intended solely for conventional names of different classes of text block; the met attribute is intended for systematic metrical analysis.

The above example uses un-numbered line groups which can nest within each other to any depth. When the base tag set for verse is in use, numbered line groups may also be used as an alternative:

lg1 contains a first-level (i.e. largest) group of verse lines functioning as a formal unit e.g. a stanza, refrain, verse paragraph, etc.

lg2 contains a second-level (i.e. second largest) group of verse lines functioning as a formal unit e.g. a stanza, refrain, verse paragraph, etc.

The base tag set for verse defines up to five such numbered line group elements, lg1 to lg5 inclusive. These function in exactly the same way as the numbered divn class elements discussed in section .

As an example of their use, consider the Shakespearean sonnet. This may be divided into two parts: a concluding couplet, and a body of twelve lines, itself subdivided into three quatrains:

My Mistres eyes are nothing like the Sunne, Currall is farre more red, then her lips red If snow be white, why then her brests are dun: If haires be wiers, black wiers grown on her head: I have seene Roses damaskt, red and white, But no such Roses see I in her cheekes, And in some perfumes is there more delight, Then in the breath that from my Mistres reekes. I love to heare her speake, yet well I know, That Musicke hath a farre more pleasing sound: I graunt I never saw a goddesse goe, My Mistres when shee walkes treads on the ground. And yet by heaven I think my love as rare, As any she beli'd with false compare.

Particularly lengthy poetic texts are often subdivided into units larger than stanzas or paragraphs, which may themselves be subdivided. Spenser's Faery Queene, for example, consists of twelve books each of which contains a prologue followed by twelve cantos. Each prologue and each canto consists of nine-line stanzas, each of which follows the same regular pattern. Other examples in the same tradition are easy to find.

Large structures of this kind are most conveniently represented by the divn class elements such as div or div1 described in section . Thus the start of the Faery Queene might be encoded as follows:

A noble knight was pricking on the plain Ycladd in mightie armes and silver shielde,

The encoder must choose at which point in the hierarchy of structural units to introduce lg elements rather than a yet smaller div element: it would (for example) also be possible to encode the above example as follows:

A noble knight was pricking on the plain Ycladd in mightie armes and silver shielde,

One reason for preferring the former version is that not all of Spenser's stanzaic verse is organized into both cantos and books. In a corpus containing other works as well as the Faery Queene, it would be inconvenient for stanzas to appear in one part as div3 elements, and in another as div2 elements. Another way of avoiding this problem would be to use un-numbered div elements; see further .

The numbered line group elements have the following formal definition:

Components of the Verse Line

It is often convenient for various kinds of analysis to encode subdivisions of verse lines. The general purpose seg element defined in the tag set for segmentation and alignment (section ) is provided for this purpose:

seg contains any arbitrary phrase-level unit of text (including other seg elements). Attributes include part, which specifies whether or not the segment is complete. Legal values are: Y (the segment is incomplete); N (either the segment is complete, or no claim is made as to its completeness); I (the initial part of an incomplete segment); M (a medial part of an incomplete segment); F (the final part of an incomplete segment).

To use this element together with the base tag set for verse, the tag set for segmentation and alignment must also be enabled, by an additional declaration in the document type subset, as further described in section . A document using the base tag set for verse and this additional tag set will thus begin with a document type declaration whose subset enables both tag sets.

In Old and Middle English alliterative verse, individual verse lines are typically split into half lines. The seg element may be used to mark these explicitly, as in the following example from Langland's Piers Plowman:

In a somer seson, whan softe was the sonne, I shoop me into shroudes as I a sheep were, In habite as an heremite unholy of werkes, Went wide in this world wondres to here.

The seg element can be nested hierarchically, in the same way as the lg element, down to whatever level of detailed structure is required. In the following example, the line has been divided into feet, each of which has been further subdivided into syllables. (The eccentric formatting ensures that no spaces are introduced within words unless they are within SGML tags and thus not significant.)

Arma virumque cano Troiae qui primus ab oris

The seg element may be used to identify any subcomponent of a line which has content; its type attribute may characterize such units in any way appropriate to the needs of the encoder. For the specific case of labeling each foot with its formal type (dactyl, spondee, etc.), and each syllable with its metrical or prosodic status (syllables bearing primary or secondary stress, long syllables, short syllables), however, the specialized attributes met and real are defined, which provide a more systematic framework than the type attribute; see section below.

In classical verse, a hexameter like that above may also be formally divided into two hemistiches. This example provides an interesting case, in that the boundary of the first hemistich falls in the middle of one of the feet (between the syllables no and Tro). If both kinds of segmentation are required, the part attribute might be used to mark the overlapping structure as follows. (This example uses the slightly artificial convention that word space is included within the syllable segments, and all other white space is to be ignored; though convenient for visual clarity in presentations like this one, this convention is not recommended for normal use.)

Ar ma vi rum que ca no Tro iae qui

Numbered seg elements may also be used; these have the disadvantage (or advantage) of requiring a consistent, layer-by-layer analysis, and the advantage of providing a useful default for the type attribute, which must be specified only once for each level, as well as allowing the omission of many end-tags, which makes the presentation more compact.

Arma vi rumque ca no Tro iae qui

Instead of using the part attribute on the seg2 element, it might be simpler just to mark the point at which the caesura occurs. An additional element is provided for analyses of this kind, in which what is to be marked are points between the words, which have some significance within a verse line:

caesura marks the point at which a metrical line may be divided.

In classical prosody, the caesura, which occurs within a foot, is distinguished from a diaeresis, which occurs on a foot boundary (not to be confused with the division of a diphthong into two syllables, or the diacritic symbol used to indicate such division, each of which is also termed diaeresis). This distinction is rarely made nowadays, the term caesura being used for any division irrespective of foot boundaries. No special-purpose diaeresis element is therefore provided.

As an example of the caesura element, we refer again to the example from Langland. An encoder might choose simply to record the location of the caesura within each line, rather than encoding each half-line as a segment in its own right, as follows:

In a somer seson, whan softe was the sonne, I shoop me into shroudes as I a sheep were, In habite as an heremite unholy of werkes, Went wide in this world wondres to here.

Logically, the opposite of caesura might be considered to be enjambement. The base tag set for verse defines an enjamb attribute for the l element, which allows the presence or absence of enjambement to be signaled. The following lines demonstrate the use of the enjamb attribute to mark places where there is a discrepancy between the boundaries of the l elements and the syntactic structure of the verse (a discrepancy of some significance in some schools of verse):

Un astrologue, un jour, se laissa choir Au fond d'un puits.

The elements discussed in this section are formally defined as follows:

Rhyme and Metrical Analysis

When the base tag set for verse is in use, the following additional attributes are available to record information about rhyme and metrical form:

met contains a user-specified encoding for the conventional metrical structure of the element.

real contains a user-specified encoding for the actual realization of the conventional metrical structure applicable to the element.

rhyme specifies the rhyme scheme applicable to a group of verse lines.

These attributes may be attached to the lg element, to any of its numbered equivalents lg1, lg2, etc., to the higher-level text-division elements div, div1, etc., or to the body element itself if the whole of a text is in verse. In general, the attributes should be specified at the highest level possible; they may not however be specifiable at the highest level if some of the subdivisions of a text are in prose and others in verse. All these attributes may also be attached to the l and seg elements, but the default notation for the rhyme attribute has no defined meaning when specified on l or seg. The value for these attributes may take any form desired by the encoder; the nature of the notation used will determine how well the attribute values can be processed by automatic means.

The primary function of the metrical attributes is to encode the conventional metrical or rhyming structure within which the poet is working, rather than the actual prosodic realization of each line; the latter can be recorded using the real attribute, as further discussed below. There is no provision at this time, however, for encoding the particular realization of a rhyme pattern.

Sample Metrical Analyses

As a simple example of the use of these attributes, consider the following lines from Pope's Essay on Criticism:

... 'Tis hard to say, if greater Want of Skill Appear in Writing or in Judging ill; But, of the two, less dang'rous is th'Offence, To tire our Patience, than mis-lead our Sense:

The body of this text is written entirely in heroic couplets; each line is iambic pentameter (which, using a common notation, can be described with the formula -+|-+|-+|-+|-+/, each - denoting a metrically unstressed syllable, each + a metrically stressed one, each | a foot boundary, and the / a line-end), and the couplets rhyme (which can be represented with the conventional formula aa).
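
As a rough illustration of how software might read such a formula, the following Python sketch splits a met value into lines, feet and syllables using the symbols just described. The function name parse_met is hypothetical, and the interpretation of the symbols is an assumption tied to this one notation; other user-defined notations would need their own reader.

```python
def parse_met(formula):
    """Split a met value such as '-+|-+|-+|-+|-+/' into lines, feet, syllables.

    Assumes the notation described in the text: '-' unstressed, '+' stressed,
    '|' foot boundary, '/' line end.
    """
    lines = []
    for line in formula.split("/"):
        if not line:
            # the piece after the final '/' is empty; skip it
            continue
        feet = line.split("|")
        lines.append([list(foot) for foot in feet])
    return lines


pentameter = parse_met("-+|-+|-+|-+|-+/")
first_line = pentameter[0]
print(len(first_line))                        # number of feet in the line
print(sum(len(foot) for foot in first_line))  # number of syllables in the line
```

For the iambic pentameter formula this yields one line of five feet and ten syllables, each foot carrying a single stressed position.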

Because both rhyme pattern and metrical form are consistent throughout the poem, they may be most conveniently specified on the body element; the values given for the attributes will be inherited by any metrical unit contained within this body element, and must be interpreted in the appropriate way.

Since the notation used in the met, real, and rhyme attributes is user-defined, no binding description can be given of its details or of how its interpretation must proceed. (A default notation is provided for the rhyme attribute, which however the encoder can replace with another; see section .) It is expected, however, that software should be able to support these attributes in useful ways; the more intelligent the software is, and the more knowledge of metrics is built into it, the better it will be able to support these attributes. In the extract given above, for example, the met and rhyme attribute values specified on the body element are inherited directly by the lg elements nested within the body. Since the met value specifies the metrical form of a single verse line, the structure of the lg as a whole is understood to involve as many repetitions of the pattern as there are lines in the verse paragraph. The same attribute value, when inherited in turn by the l element, must be understood not to repeat. With sufficiently sophisticated software, segments within the line might even be understood as inheriting precisely that portion of the formula which applies to the segment in question; this will, however, be easier to accomplish for some languages than for others.

The rhyme attribute in this example uses the default notation to specify a rhyme scheme applicable only to pairs of lines. As elsewhere, the default notation for the rhyme attribute has no meaning for metrical units at the line level or below. In verse forms where line-internal rhyme is structurally significant, e.g. in some skaldic poetry, the default notation is incapable of expressing the required information, since the rhyme pattern may need to be specified for units smaller than the line. In such cases, a user-specified rhyme notation must be substituted for the default notation, or else the rhyme pattern must be described using some alternative method (e.g. by using the link mechanism described below).

The precise semantics of the met attribute, and the inferences which software is expected or able to draw from it, are implementation-dependent; so are the semantics and processing of the rhyme attribute, when user-specified notations are used.

A formal definition of the significance of each component of the pattern given as the value of the met attribute may be provided in the metDecl element within the encodingDesc element in the TEI header (see section ). The encoder is free to invent any notation appropriate to his or her analytic needs, provided that it is adequately documented in this element. The notation may define metrical components using invented or traditional names (such as iamb or hexameter) or in terms of basic units such as codes for stressed or unstressed syllables, or a combination of the two.

The real (for realization) attribute may optionally be specified to indicate any deviation from the pattern defined by the met attribute which the encoder wishes to record. By default, the real attribute has the same value as the met attribute on the same element; it is only necessary to provide an explicit value when the realization differs in some way from the abstract metrical pattern. The tension between conventional metrical pattern and its realization may thus be recorded explicitly. For example, many readers of the above passage would stress the word But at the beginning of the third line rather than the word of following it, as the metrical pattern would normally require. This variation might be encoded as follows:

But of the two...

Where the real attribute is used to over-ride the default or conventional metrical pattern, it applies only to the element on which it is specified. The default pattern for any subsequent lines is unaffected.
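
The defaulting rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Element class and both function names are hypothetical, the real value shown for the third line is an illustrative guess at how stressing But might be notated, and the sketch assumes that real is never inherited by descendants, which the text does not state explicitly.

```python
class Element:
    """A minimal stand-in for an encoded element (hypothetical, not a TEI API)."""

    def __init__(self, name, parent=None, met=None, real=None):
        self.name = name
        self.parent = parent
        self.met = met
        self.real = real


def effective_met(element):
    # met is inherited: walk upwards until some ancestor specifies it
    while element is not None:
        if element.met is not None:
            return element.met
        element = element.parent
    return None


def effective_real(element):
    # real defaults to the met value in force on the same element;
    # an explicit real affects only the element that carries it
    if element.real is not None:
        return element.real
    return effective_met(element)


body = Element("body", met="-+|-+|-+|-+|-+/")
group = Element("lg", parent=body)
line2 = Element("l", parent=group)
line3 = Element("l", parent=group, real="+-|-+|-+|-+|-+/")  # illustrative value
line4 = Element("l", parent=group)

for line in (line2, line3, line4):
    print(effective_met(line), effective_real(line))
```

Only the third line reports a realization different from the inherited pattern; the line after it falls back to the default, as the paragraph above requires.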

As it happens, this particular kind of variation is very common in the English iambic pentameter (it even has a name: trochaic substitution); an encoder might therefore choose to regard this not as an instance of a variant realization, but as an instance of a variant metrical form:

But of the two...

Alternatively, a different metrical notation might be defined, in which this kind of variation was permitted throughout the text.

In choosing whether to over-ride a metrical specification in this way or by using the real attribute, the encoder is required to determine whether the change is a systematic or conventional one (as in this example) or an occasional variation, perhaps for local effect. In the following example, from Goethe's Auf dem See, the variation is a matter of local realization:

Und frische Nahrung, neues Blut Saug' ich aus freier Welt; Wie ist Natur so hold und gut, Die mich am Busen haelt! Die Welle wieget unsern Kahn Im Rudertakt hinauf, Und Berge, wolkig himmelan, Begegnen unserm Lauf.

On the other hand, the famous inserted alexandrine in Pope's Essay on Criticism might be encoded as follows:

A needless alexandrine ends the song, That, like a wounded snake, drags its slow length along.

Here the met attribute indicates that a different metrical convention (the alexandrine) is in force, while the real attribute indicates that there is a variation from that convention. As with many other aspects of metrical analysis, however, this is of necessity an entirely interpretive judgment.

Segment-Level versus Line-level Tagging

The examples given so far have encoded information about the realization of metrical conventions at the level of the whole verse-line. This has obvious advantages of simplicity, but the disadvantage that any deviation from metrical convention is not marked at its precise point of occurrence in the text. Greater precision may be achieved, but only at the cost of marking deviant metrical units explicitly. This may be done with the seg element, giving the variant realization as the value of the real attribute on that element. Using this method, the example given immediately above might be encoded as follows:

A needless alexandrine ends the song, That, like a wounded snake, drags its slow length along.

The marking of the foot boundaries with the symbol | in the met attribute value of the l element allows the human reader, or a sufficiently intelligent software program, to isolate the correct portion of that attribute value as the default value for the same attribute on the seg1 elements for feet, namely -+. It is of course up to the encoder to decide whether or not to include the n attribute of seg here, and whether or not also to tag the feet in the line in which there is no deviation from the metrical convention. The ability of software to infer which foot is being marked, if not all are tagged, will depend heavily on the language of the text and the knowledge of prosody built into the software; the fuller and more explicit the markup, the easier it will be for software to handle it. It may prove useful, however, to mark metrical deviations in the manner shown, even if the available software is not sufficiently intelligent to scan lines without aid from the markup. Human readers who are interested in prosody may well be able to exploit the markup in useful ways even with less sophisticated software.
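
How a program might isolate the portion of a line-level pattern that applies to a numbered foot, and locate the feet at which a realization departs from it, can be sketched as follows. The function names are hypothetical, the real value is an illustrative assumption rather than a value taken from these Guidelines, and the sketch presumes that the met and real values mark foot boundaries with | and the line end with /.

```python
def split_feet(pattern):
    """Return the feet of a single-line pattern such as '-+|-+|-+|-+|-+/'."""
    return pattern.rstrip("/").split("|")


def foot_default(met, n):
    """Default met value for the n-th foot (counting from 1) of a line."""
    return split_feet(met)[n - 1]


def deviating_feet(met, real):
    """List (foot number, conventional form, realized form) where they differ."""
    met_feet = split_feet(met)
    real_feet = split_feet(real)
    if len(met_feet) != len(real_feet):
        raise ValueError("met and real must describe the same number of feet")
    return [
        (number, expected, actual)
        for number, (expected, actual) in enumerate(zip(met_feet, real_feet), start=1)
        if expected != actual
    ]


line_met = "-+|-+|-+|-+|-+/"
line_real = "+-|-+|-+|-+|-+/"   # illustrative: first foot inverted
print(foot_default(line_met, 3))
print(deviating_feet(line_met, line_real))
```

A report of this kind identifies which foot would need its own seg element with an explicit real value, leaving the remaining feet to take their defaults from the line.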

There are circumstances where it may also be useful to use the met attribute of seg. If we wish to identify the exact location of the different types of foot in the first line of Virgil's Aeneid, the text could be encoded as follows (for simplicity's sake the caesura has been omitted):

Arma vi rumque ca no Tro iae qui primus ab oris

An appropriate value of the met attribute might also be encoded on the enclosing div or body element, to indicate that each foot may be made up of a dactyl or a spondee, so that the values given here for met at the level of the foot may be considered a series of local variations on this fundamental pattern; in cases like this, of course, the local variations may also be considered aspects of realization rather than of convention, in which case the real attribute may be used instead of met, if desired.

Metrical Analysis of Stanzaic Verse

The method described above may be used to encode quite complex verse forms, for instance various kinds of fixed-form stanzas. Let us take one of Dante's canzoni, in which each stanza except the last has the same combination of eleven-syllable and seven-syllable lines, and the same rhyme scheme:

Doglia mi reca nello core ardire

Here the met attribute specifies a metrical pattern for each of the twenty-one lines making up a stanza of the canzone. Each stanza inherits this definition from the parent div0 element. The rhyme attribute specifies a rhyme scheme for each stanza, in the same way.

In the metrical notation used here, the letter E represents a line containing nine syllables which may or may not be metrically prominent, a tenth which is prominent and an optional non-prominent eleventh syllable. The letter S is used to represent a line containing five syllables which may or may not be metrically prominent, a sixth which is prominent and an optional non-prominent seventh syllable. A suitable definition for this notation might be given by a metDecl element defining the following symbols: E = xxxxxxxxx+o; S = xxxxx+o; x = metrically prominent or non-prominent; + = metrically prominent; o = optional non prominent; / = line division.
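
A notation of this kind lends itself to mechanical expansion. The following Python sketch expands the letters E and S into their syllable-level definitions and derives the permitted syllable counts for each line. The function names are hypothetical, and the short pattern E/S/E/ is an invented illustration; it is not the twenty-one-line pattern of the canzone.

```python
# Symbol definitions taken from the notation described in the text
SYMBOLS = {"E": "xxxxxxxxx+o", "S": "xxxxx+o"}


def expand(pattern, symbols=SYMBOLS):
    """Expand a stanza pattern into one syllable-level pattern per line.

    '/' is read as the line division; unknown characters pass through as-is.
    """
    lines = [piece for piece in pattern.split("/") if piece]
    return ["".join(symbols.get(ch, ch) for ch in line) for line in lines]


def syllable_range(terminal):
    """Minimum and maximum syllable count; 'o' marks an optional syllable."""
    optional = terminal.count("o")
    required = len(terminal) - optional
    return required, required + optional


for line_pattern in expand("E/S/E/"):   # hypothetical three-line pattern
    print(line_pattern, syllable_range(line_pattern))
```

Under these definitions an E line admits ten or eleven syllables and an S line six or seven, which matches the prose description of the two line types.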

As noted above, the metrical pattern specified on the div0 applies to each lg (stanza) element contained within the div0. In fact however, after seven stanzas of this type, there is a final stanza, known as a commiato or envoi, which follows a different metrical and rhyming scheme. The solution to this problem is simply to specify a new met attribute on the eighth stanza itself, which will override the default value inherited from the parent div0, as follows:

... Canzone, presso di qui

Note that, in the same way as for the real attribute, over-riding of this kind does not affect subsequent elements at the same hierarchic level. Any lg1 element following the commiato above would be assumed to use the same metrical and rhyming scheme as the one preceding the commiato. Moreover, although it is quite regular (in the sense that the last stanza of each canzone is a commiato), the over-riding must be specified for each case.

Rhyme

The rhyme attribute is used to specify the rhyme pattern of a verse form. Like the met attribute, it can be used with a user-specified notation documented by the metDecl element in the TEI header. Unlike met, however, the rhyme attribute has a default notation; if this default notation is used, no metDecl element need be given.

The default notation for rhyme offers the ability to record patterns of rhyming lines, using the traditional notation in which distinct letters stand for rhyming lines. For a work in rhyming couplets, like the Pope example above, the rhyme attribute simply specifies aa, indicating that pairs of adjacent lines rhyme with each other. For a slightly more complex scheme, applicable to groups of four lines, in which lines 1 and 3 rhyme, as do lines 2 and 4, this attribute would have the value abab. The traditional Spenserian stanza has the pattern ababbcbcc, indicating that within each nine line stanza, lines 1 and 3 rhyme with each other, as do lines 2, 4, 5 and 7, and lines 6, 8 and 9.
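
The reading of the default notation given above can be checked mechanically. The following Python sketch, with a hypothetical function name, groups line numbers by rhyme letter; it assumes one character per line and treats a hyphen or an x as marking a non-rhyming line.

```python
def rhyme_groups(scheme):
    """Map each rhyme letter to the (1-based) line numbers that share it."""
    groups = {}
    for number, letter in enumerate(scheme, start=1):
        if letter in "-x":
            # hyphen or x: a line that rhymes with no other
            continue
        groups.setdefault(letter, []).append(number)
    return groups


print(rhyme_groups("aa"))
print(rhyme_groups("abab"))
print(rhyme_groups("ababbcbcc"))
```

For the Spenserian pattern this groups lines 1 and 3 under a, lines 2, 4, 5 and 7 under b, and lines 6, 8 and 9 under c.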

Non-rhyming lines within such a group may be represented using a hyphen or an x, as in the following example:

The sunlight on the garden Hardens and grows cold, We cannot cage the minute Within its nets of gold When all is told We cannot beg for pardon.

Note however that the default notation includes no specific way of recording internal rhyme, such as that between the end of the first line and the start of the second in this particular poem. For this, a special user-defined notation would need to be declared using the metDecl element in the header. Alternatively, rhyme, like alliteration or assonance, may be considered as a special form of correspondence, and hence encoded using the mechanisms defined for that purpose in section .

To use the correspondence mechanisms to represent the complex rhyming pattern of the above example, we first need to delimit and identify each rhyming sequence within the text: the seg element may be used for this purpose, as follows:

The sunlight on the garden Hardens and grows cold, We cannot cage the minute Within its nets of gold When all is told We cannot beg for pardon.

Now that each rhyming word, or part-word, has been tagged and allocated an arbitrary identifier, the general purpose link element may be used to indicate which of the seg elements share the same rhyme.
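
As a rough illustration of how software might consume such links, the following Python sketch merges links into rhyme classes. It assumes that each link is available as a whitespace-separated list of segment identifiers; the identifiers r1 to r6 and the function name are hypothetical and are not taken from the example above.

```python
def rhyme_classes(links):
    """Group segment identifiers into classes connected by links."""
    parent = {}

    def find(identifier):
        # union-find lookup with path halving
        parent.setdefault(identifier, identifier)
        while parent[identifier] != identifier:
            parent[identifier] = parent[parent[identifier]]
            identifier = parent[identifier]
        return identifier

    for targets in links:
        ids = targets.split()
        if not ids:
            continue
        root = find(ids[0])
        for other in ids[1:]:
            # merge the class of each further target into the first one
            parent[find(other)] = root

    classes = {}
    for identifier in parent:
        classes.setdefault(find(identifier), set()).add(identifier)
    return sorted(sorted(members) for members in classes.values())


# Hypothetical links: r1 with r2, r2 with r6, and r3, r4, r5 together
print(rhyme_classes(["r1 r2", "r2 r6", "r3 r4 r5"]))
```

Because links are merged transitively, two segments end up in the same class even when no single link names them both, which suits rhymes recorded pair by pair.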

For further discussion of the link and linkGrp elements, see section .

Encoding Procedures For Other Verse Features

A number of procedures that may be of particular concern to encoders of verse texts are dealt with elsewhere in these guidelines. Some aspects of layout and physical appearance, especially important in the case of free verse, are dealt with in chapter . Some initial recommendations for the encoding of phonetic or prosodic transcripts, which may be helpful in the analysis of sound structures in poetry, are to be found in chapter ; it may also be found convenient to use standard entity names (those proposed for the International Phonetic Alphabet suggest themselves) to mark positions of suprasegmentals such as primary and secondary stress, or other aspects of accentual structure.

As already indicated, chapter contains much which will be found useful for the aligning of multiple levels of commentary and structure within verse analysis. Encoders of verse (as of other types of literary text) will frequently wish to attach identifying labels to portions of text that are not part of a system of hierarchical divisions, may overlap with one another, and/or may be discontinuous; for instance passages associated with particular characters, themes, images, allusions, topoi, styles, or modes of narration. Much of the computerized analysis of verse seems likely to require dividing texts up into blocks in this way. The span element discussed in provides the means for doing this. Finally, the procedures for the tagging of feature structures, described in chapter , provide a powerful means of encoding a wide variety of aspects of verse literature, including not only the metrical structures discussed above, but also such stylistic and rhetorical features as metaphor.

For other features it must for the time being be left to encoders to devise their own terminology. Elements such as <metaphor tenor= ... vehicle= ... > ... </metaphor> might well suggest themselves; but given the problems of definition involved, and the great richness of modern metaphor theory, it is clear that any such format, if pre-defined by these Guidelines, would have seemed objectionable to some and excessively restrictive to many. Leaving the choice of tagging terminology to individual encoders carries with it one vital corollary, however: the encoder must be utterly explicit, in the TEI header, about the methods of tagging used and the criteria and definitions on which they rest. Where no formal elements are currently proposed, such information may readily be given as simple prose description within the encodingDesc element defined in section .