4.1 Tokens

PSPP divides most syntax file lines into a series of short chunks called tokens. Tokens are then grouped to form commands, each of which tells PSPP to take some action: read in data, write out data, perform a statistical procedure, and so on. Each type of token is described below.

Identifiers
Identifiers are names that typically specify variables, commands, or subcommands. The first character in an identifier must be a letter, #, or @. The remaining characters in the identifier must be letters, digits, or one of the following special characters:
          
          . _ $ # @

Identifiers may be any length, but only the first 64 bytes are significant. Identifiers are not case-sensitive: foobar, Foobar, FooBar, FOOBAR, and FoObaR are different representations of the same identifier.
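The character and significance rules above can be sketched in Python. This is a minimal illustration, not PSPP's lexer: it assumes ASCII letters only (how PSPP treats non-ASCII letters is not covered here), and the names `is_valid_identifier`, `significant_part`, and `same_identifier` are hypothetical.

```python
import string

# Assumption: "letter" means an ASCII letter in this sketch.
FIRST_CHARS = set(string.ascii_letters) | set("#@")
REST_CHARS = set(string.ascii_letters) | set(string.digits) | set("._$#@")


def is_valid_identifier(name: str) -> bool:
    """True if name obeys the first-character and remaining-character rules."""
    return (
        len(name) > 0
        and name[0] in FIRST_CHARS
        and all(ch in REST_CHARS for ch in name[1:])
    )


def significant_part(name: str) -> bytes:
    """Upper-case the name (case does not matter) and keep its first 64 bytes."""
    return name.upper().encode("utf-8")[:64]


def same_identifier(a: str, b: str) -> bool:
    """Two spellings denote the same identifier if their significant parts match."""
    return significant_part(a) == significant_part(b)
```

With these helpers, `same_identifier("foobar", "FoObaR")` is true, and two names that differ only after their 64th byte also compare equal.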

Some identifiers are reserved. Reserved identifiers may not be used in any context besides those explicitly described in this manual. The reserved identifiers are:

          
          ALL AND BY EQ GE GT LE LT NE NOT OR TO WITH
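A sketch of a reserved-word check, assuming the comparison is case-insensitive like every other identifier comparison. `is_reserved` and `check_variable_name` are hypothetical helpers for illustration; note that only whole identifiers are reserved, so a name such as TOTAL is unaffected even though it begins with TO.

```python
# The reserved identifiers listed above.
RESERVED = frozenset("ALL AND BY EQ GE GT LE LT NE NOT OR TO WITH".split())


def is_reserved(name: str) -> bool:
    """Reserved identifiers are matched without regard to case."""
    return name.upper() in RESERVED


def check_variable_name(name: str) -> None:
    """Hypothetical validation step for a user-supplied variable name."""
    if is_reserved(name):
        raise ValueError(f"{name!r} is a reserved identifier")
```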

Keywords
Keywords are a subclass of identifiers that form a fixed part of command syntax. For example, command and subcommand names are keywords. Keywords may be abbreviated to their first 3 characters if this abbreviation is unambiguous. (Unique abbreviations of 3 or more characters are also accepted: FRE, FREQ, and FREQUENCIES are equivalent when the last is a keyword.)
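The abbreviation rule can be sketched as a prefix match over a set of candidate keywords. This is a simplified model under stated assumptions: it treats the candidates as a flat list, lets an exact match win over a longer keyword that shares the prefix, and ignores any additional rules PSPP may apply to multi-word command names. `match_keyword` and the keyword list in the usage note are hypothetical.

```python
from typing import Optional


def match_keyword(word: str, keywords: list[str]) -> Optional[str]:
    """Return the keyword that word spells or abbreviates, or None.

    A word matches if it equals a keyword (ignoring case), or if it is a
    prefix of at least 3 characters that matches exactly one keyword.
    """
    w = word.upper()
    # Assumption: a full spelling always wins, even if it is also a prefix
    # of some longer keyword.
    for k in keywords:
        if k.upper() == w:
            return k
    if len(w) < 3:
        return None
    hits = [k for k in keywords if k.upper().startswith(w)]
    # An abbreviation shared by two or more keywords is ambiguous.
    return hits[0] if len(hits) == 1 else None
```

For example, with the hypothetical list `["FREQUENCIES", "FILTER", "COMPUTE", "COMPARE"]`, both `FRE` and `FREQ` resolve to `FREQUENCIES`, while `COM` is rejected as ambiguous and `FR` as too short.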

Reserved identifiers are always used as keywords. Other identifiers may be used both as keywords and as user-defined identifiers, such as variable names.

Numbers
Numbers are expressed in decimal. A decimal point is optional. Numbers may be expressed in scientific notation by adding e and a base-10 exponent, so that 1.234e3 has the value 1234. Here are some more examples of valid numbers:
          -5  3.14159265359  1e100  -.707  8945.
     

Negative numbers are expressed with a - prefix. However, in situations where a literal - token is expected, what appears to be a negative number is treated as - followed by a positive number.

No white space is allowed within a number token, except for horizontal white space between - and the rest of the number.
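The number syntax described above can be approximated with a regular expression. In this sketch, accepting an uppercase `E` and a signed exponent are assumptions, not rules stated here, and the context-dependent case where a leading `-` is read as a separate token is not modeled. `NUMBER_RE` and `parse_number` are hypothetical names.

```python
import re

# Optional "-" followed by optional horizontal white space (spaces or tabs
# only), then digits with an optional decimal point or a leading ".", then
# an optional exponent. Exponent sign and uppercase "E" are assumptions.
NUMBER_RE = re.compile(r"(?:-[ \t]*)?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?")


def parse_number(token: str) -> float:
    """Check that token is a valid number and return its value."""
    if not NUMBER_RE.fullmatch(token):
        raise ValueError(f"not a number token: {token!r}")
    # Drop the white space allowed after "-" so that float() accepts it.
    return float(token.replace(" ", "").replace("\t", ""))


for example in ["-5", "3.14159265359", "1e100", "-.707", "8945."]:
    print(example, "->", parse_number(example))
```

The pattern rejects white space anywhere except directly after the minus sign, so `- 5` is accepted while `1 000` is not.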

The last example above, 8945., is interpreted as two tokens, 8945 and ., if it is the last token on a line. See Forming commands of tokens.

Strings
Strings are literal sequences of characters enclosed in pairs of single quotes (') or double quotes ("). To include the quoting character itself in the string, double it, as in 'it''s an apostrophe'. White space and letter case are significant inside strings.

Strings can be concatenated using +, so that "a" + 'b' + 'c' is equivalent to 'abc'. Concatenation is useful for splitting a single string across multiple source lines. The maximum length of a string, after concatenation, is 255 characters.
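A sketch of quoting and concatenation: each quoted piece is unquoted by collapsing doubled quote characters, the pieces joined by `+` are concatenated, and the 255-character limit is checked afterward. The names `parse_string_expr` and `unquote` are hypothetical, and the sketch does not model how line breaks interact with `+`.

```python
import re

# A quoted string: any run of non-quote characters or doubled quotes
# between a matching pair of single or double quotes.
SINGLE = r"'(?:[^']|'')*'"
DOUBLE = r'"(?:[^"]|"")*"'
STRING_RE = re.compile(SINGLE + "|" + DOUBLE)
PLUS_RE = re.compile(r"\s*\+\s*")
MAX_LEN = 255


def unquote(token: str) -> str:
    """Strip the outer quotes and turn each doubled quote into one."""
    q = token[0]
    return token[1:-1].replace(q + q, q)


def parse_string_expr(text: str) -> str:
    """Parse 'a' + "b" + ... and return the concatenated value."""
    pos = 0
    parts = []
    while True:
        m = STRING_RE.match(text, pos)
        if not m:
            raise ValueError(f"expected a string at offset {pos}")
        parts.append(unquote(m.group()))
        pos = m.end()
        plus = PLUS_RE.match(text, pos)
        if not plus:
            break
        pos = plus.end()
    if text[pos:].strip():
        raise ValueError("unexpected text after string")
    value = "".join(parts)
    if len(value) > MAX_LEN:
        raise ValueError(f"string longer than {MAX_LEN} characters")
    return value
```

Because the limit is checked after joining, two pieces of 200 and 100 characters are rejected even though each one alone would fit.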

Strings may also be expressed as hexadecimal, octal, or binary character values by prefixing the opening quote with X, O, or B, or their lowercase equivalents. Each group of two hexadecimal, three octal, or eight binary digits is transformed into a single character with the given value. If the final group is incomplete, its missing trailing digits are assumed to be 0. These forms of strings are nonportable because different operating systems associate numeric values with different characters, so their use should be confined to syntax files that will not be widely distributed.

The character with value 00 is reserved for internal use by PSPP. Using it in a string causes an error, and the character is replaced by a space.
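The decoding of X, O, and B strings, including zero-padding of an incomplete final group and the substitution of a space for value 00, can be sketched as follows. Mapping each value to a character by its Unicode code point is an assumption of this sketch and is exactly the platform-dependent step warned about above. `decode_radix_string` and the `RADIX` table are hypothetical names; the function returns the decoded text together with a list of error messages.

```python
# Prefix letter -> (digits per character, radix).
RADIX = {"X": (2, 16), "O": (3, 8), "B": (8, 2)}


def decode_radix_string(token: str) -> tuple[str, list[str]]:
    """Decode a token such as X'414243' and return (text, errors)."""
    if len(token) < 3 or token[1] not in "'\"" or token[-1] != token[1]:
        raise ValueError(f"malformed string token: {token!r}")
    group, base = RADIX[token[0].upper()]
    digits = token[2:-1]
    chars, errors = [], []
    for i in range(0, len(digits), group):
        # An incomplete final group is padded with trailing zeros.
        chunk = digits[i:i + group].ljust(group, "0")
        value = int(chunk, base)
        if value == 0:
            # Value 00 is reserved: record an error and use a space instead.
            errors.append(f"character value 0 at digit offset {i}")
            chars.append(" ")
        else:
            # Assumption: values map to characters by Unicode code point.
            chars.append(chr(value))
    return "".join(chars), errors
```

Under these assumptions, `X'414243'`, `O'101102103'`, and `B'010000010100001001000011'` all decode to `ABC`, and `x'414'` decodes to `A@` because the lone trailing `4` is padded to `40`.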

Punctuators and Operators
These tokens are the punctuators and operators:
          
          , / = ( ) + - * / ** < <= <> > >= ~= & | .
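Several operators share a first character with a shorter one (`*` and `**`, `<` and `<=`, `<>`). A minimal sketch of scanning them, assuming the longest match is taken at each position, is shown below. It handles only punctuators and operators; identifiers, numbers, strings, the negative-number ambiguity, and the special end-of-line treatment of the period are left out. `scan_punctuators` is a hypothetical name.

```python
# Two-character operators are tried before their one-character prefixes.
TWO_CHAR = {"**", "<=", "<>", ">=", "~="}
ONE_CHAR = set(",/=()+-*<>&|.")


def scan_punctuators(text: str) -> list[str]:
    """Split a string made only of punctuators and operators into tokens."""
    tokens = []
    i = 0
    while i < len(text):
        if text[i].isspace():
            i += 1  # white space separates tokens but is not a token
        elif text[i:i + 2] in TWO_CHAR:
            tokens.append(text[i:i + 2])
            i += 2
        elif text[i] in ONE_CHAR:
            tokens.append(text[i])
            i += 1
        else:
            raise ValueError(f"not a punctuator: {text[i]!r}")
    return tokens
```

With longest match, `**` is one exponentiation token while `* *` is two multiplication tokens, and `<>=` splits as `<>` followed by `=`.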

Most of these appear within the syntax of commands, but the period (.) punctuator is used only at the end of a command. It is a punctuator only as the last character on a line, ignoring trailing white space. When it is the last non-space character on a line, a period is not treated as part of another token, even if it would otherwise be part of, e.g., an identifier or a floating-point number.
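The end-of-line rule for the period can be sketched as a check on each line before ordinary tokenizing: if the last non-space character is the terminator, it is split off, so `8945.` at the end of a line yields the number 8945 followed by a command terminator. The function name `split_command_terminator` is hypothetical, and its `endcmd` parameter is an assumption meant to stand in for the SET ENDCMD setting mentioned below.

```python
def split_command_terminator(line: str, endcmd: str = ".") -> tuple[str, bool]:
    """Split off the command terminator if it ends the line.

    Returns (rest_of_line, ended). Only the last non-space character is
    examined, so a period elsewhere on the line is left in place for the
    ordinary tokenizer (for example, as part of a number).
    """
    if len(endcmd) != 1:
        raise ValueError("the command terminator must be a single character")
    stripped = line.rstrip()
    if stripped.endswith(endcmd):
        return stripped[:-1], True
    return line, False
```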

Actually, the character that ends a command can be changed with SET's ENDCMD subcommand (see SET), but we do not recommend doing so. Throughout the remainder of this manual we will assume that the default setting is in effect.