2.3.8.2 Non-ASCII Characters in Strings

You can include a non-ASCII international character in a string constant by writing it literally. There are two text representations for non-ASCII characters in Emacs strings (and in buffers): unibyte and multibyte. If the string constant is read from a multibyte source, such as a multibyte buffer or string, or a file that would be visited as multibyte, then the character is read as a multibyte character, which makes the string multibyte. If the string constant is read from a unibyte source, then the character is read as a unibyte character, which makes the string unibyte.
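As a sketch of this rule, the following reads the same one-character string constant from a multibyte and from a unibyte temporary buffer, then asks which representation the resulting string uses. The helper name `my-read-a-grave' is hypothetical. The sketch assumes that `(decode-char 'ucs #xe0)' yields the character ‘a’ with grave accent and that 224 is that character's Latin-1 byte; it assumes a recent Emacs, and behavior in older versions was not checked.

```elisp
;; Hypothetical helper: read the string constant "à" from a temporary
;; buffer that is multibyte if MULTIBYTE is non-nil, unibyte otherwise.
(defun my-read-a-grave (multibyte)
  "Read a one-character string constant from a temporary buffer.
The buffer is multibyte if MULTIBYTE is non-nil, else unibyte."
  (with-temp-buffer
    (set-buffer-multibyte multibyte)
    ;; In the multibyte buffer, insert the character `a' with grave
    ;; accent; in the unibyte buffer, insert its code as one raw byte.
    (insert ?\" (if multibyte (decode-char 'ucs #xe0) 224) ?\")
    (goto-char (point-min))
    ;; Read the string constant starting at the opening quote.
    (read (current-buffer))))

(multibyte-string-p (my-read-a-grave t))    ;; ⇒ t
(multibyte-string-p (my-read-a-grave nil))  ;; ⇒ nil
```

In both cases the text in the buffer is the same constant; only the representation of the source differs, and that alone decides the representation of the string.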

You can also represent a multibyte non-ASCII character with its character code: use a hex escape, ‘\xnnnnnnn’, with as many digits as necessary. (Multibyte non-ASCII character codes are all greater than 256.) Any character that is not a valid hex digit terminates this construct. If the next character in the string could be interpreted as a hex digit, write ‘\ ’ (backslash and space) to terminate the hex escape. For example, ‘\x8e0\ ’ represents one character, ‘a’ with grave accent. ‘\ ’ in a string constant is just like backslash-newline: it does not contribute any character to the string, but it does terminate the preceding hex escape.
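The following forms sketch how ‘\ ’ ends a hex escape. They check only string lengths and the representation, not which character code #x8e0 denotes, because that depends on the Emacs version's internal character codes (in Emacs 23 and later, character codes are Unicode code points).

```elisp
;; "\ " ends the hex escape, so the following "1" is an ordinary
;; character rather than another hex digit.
(length "\x8e0\ 1")            ;; ⇒ 2
;; A plain space also ends the escape, but the space itself
;; becomes part of the string.
(length "\x8e0 ")              ;; ⇒ 2
;; "\ " contributes no character of its own.
(length "\x8e0\ ")             ;; ⇒ 1
;; #x8e0 is above 255, so the escape makes the string multibyte.
(multibyte-string-p "\x8e0\ ") ;; ⇒ t
```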

Using a multibyte hex escape forces the string to be multibyte. You can represent a unibyte non-ASCII character with its character code, which must be in the range from 128 (0200 octal) to 255 (0377 octal). Writing a unibyte character this way forces the string to be unibyte.
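A short sketch of unibyte character codes written as octal escapes: each code from 0200 to 0377 becomes one byte of a unibyte string, and the elements of that string are the byte values themselves.

```elisp
;; Octal escapes in the range 0200 (128) through 0377 (255)
;; denote unibyte characters, so this constant is a unibyte string.
(multibyte-string-p "\200\377")  ;; ⇒ nil
;; Each element is the byte value given by its escape.
(aref "\200\377" 0)              ;; ⇒ 128
(aref "\200\377" 1)              ;; ⇒ 255
;; A unibyte string uses exactly one byte per character.
(string-bytes "\200\377")        ;; ⇒ 2
```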

See Text Representations, for more information about the two text representations.