Proposed Revision of RTP

Abstract

The protocol description contained in this memo is based on the outline of proposed changes presented at the March IETF meeting. It is the result of several email and teleconference discussions among Steve Casner, Ron Frederick, Van Jacobson and Henning Schulzrinne to define the protocol details not addressed in March. Most of the protocol details have been resolved here, but there are some remaining open questions for which wider working group input is sought.

To keep this memo short and help focus attention on the protocol changes, the introductory text and appendices have been left out, and as a result there are a few unresolved references. This memo is to be followed by a complete protocol specification with more explanatory text once feedback from the working group has been received.

Please note the "TBD" (to be determined or to be discussed) notation identifying some specific issues. For details, see the section 'Open Issues'.

1 Changes

This draft contains a number of changes discussed at the March 1994 IETF meeting. This draft does not completely specify certain aspects of the protocol and is likely to change in some details.

  o Control information is no longer carried as options in the RTP data packets. Instead, control information is carried in RTCP packets on a separate lower-layer transport association (e.g., UDP port). RTCP packets are self-contained units that can be processed independently. There is a hook provided to allow profile-specific extensions to the header of data packets should a requirement arise.

  o The RTP timestamp now measures time at a clock rate defined by the encoding rather than a fixed 65536 Hz clock. The timestamp itself has no predictable relation to wallclock time, but the relation to an NTP timestamp may be carried explicitly in an RTCP packet.

  o SSRC has changed from a 16-bit locally unique identifier inserted by translators to a 32-bit globally unique random identifier present in every packet.
    The same values are used for CSRC identifiers.

  o The channel identifier has been dropped. Multiplexing is provided by lower layers or encapsulations.

  o The fixed RTP header contains a counter for the number of CSRC items present.

  o Encryption covers the whole packet and is indicated by validity checks on the header fields rather than an option. No authentication is specified at this time.

  o The version field has a value of two instead of one. It is also called type to allow for the future use of the value three to indicate an encapsulation for authentication or other purposes.

  o The payload type field has been increased from 6 to 7 bits. Other bits have been slightly rearranged.

  o The synchronization bit has been renamed to a marker bit, with its meaning defined by the application.

  o A padding bit indicates end-of-packet padding for encryption.

  o Sender/receiver reports, which are similar to the QOS option in version 1 of RTP, are established as an integral and required part of the protocol to manage distribution of the data. Similarly, the SDES CNAME is required to provide the external binding for the random SSRC identifiers.

  o The unicast reverse path RTP control packet is eliminated because the change in SSRC identifiers and the encryption method preclude reverse mapping in translators. Normal (multicast) RTCP is used instead.

  o The BOS option (sequence number of start of sync unit) is eliminated. Encodings which need that information may include it within the payload.

2 Open Issues

There are several specific issues for which additional input from other working group members is sought. Please consider the following questions in light of applications you may have in mind.

  o A description is included here for the RTCP FMT packet. Should this be kept?
    An argument in favor is that there are combinatorially more formats that a workstation's audio hardware and software may support than may be encoded into the payload type field, and that the creator of a session won't know which of those formats some participant may need to use to play a prerecorded clip during a session, so this method of defining dynamic payload type values is required. The counter argument is that only a few of the many possible formats will actually be used to record clips due to requirements for compatibility, and that including this method of defining payload types requires keeping track of the payload types per source.

  o This draft does not fully specify RTP authentication, as the details will depend on external machinery such as authentication servers or signature hierarchies which have not yet been deployed. However, a means to add authentication is provided using type (version) 3 to mark an encapsulation header that would contain the information necessary for authentication.

  o The description of SDES CNAME (Section 4.5.1) includes a TBD paragraph proposing that in some circumstances the CNAME identifier may be less unique than a fully qualified domain name. This may not be feasible because the tools are not going to know when a more or less specific identifier is needed unless the higher level protocol tells them. To keep the CNAME unique, we want it to be set automatically, not by a person unless there's no other way.

  o For IP multicast operation, sender/receiver reports are a necessary part of the protocol. For purely unidirectional systems, receivers cannot send reports, but should the sender still be required to send sender reports? This is still useful to establish the relation between RTP timestamps and real time for synchronization purposes. Under what circumstances should it be allowed that sender/receiver reports be omitted?

  o Separation of RTP data and control onto two transport associations would require two ST-II connections.
    This might be advantageous since the characteristics of the data and control flows may differ considerably. However, separate connections may have a higher cost and introduce strange failure modes if only one connection succeeds. It would be possible to define an encapsulation that provided demultiplexing of the control and data to allow both to be sent in one connection, and perhaps to allow both control and data packets in one ST-II packet. (An encapsulation was previously proposed to provide a checksum.) What should be done?

  o Jon Crowcroft had asked for a means to include Ian Wakeman's congestion probe scheme in the data packets, but this is not included as part of the spec as it stands. Van Jacobson believes that the receiver reception report scheme described here conveys at least as much information as the sender initiated reporting scheme described in [1] and does so with equivalent or better time bounds. If this is so, the control algorithm in [1] could simply be driven off of RTP reception reports and an additional option would not be needed. Wakeman and Jacobson have yet to discuss this.

  o The X bit (for header extension) has been taken from the CSRC count field, reducing that field to 4 bits. We believe that 15 sources is more than enough for audio mixing (typically no more than 3 or 4 sources will actually be mixed), and that for applications where 15 is not enough it is also likely that 31 is not enough. Those applications might provide for a larger count either with a header extension or within the payload. Other choices would be to put the X bit in the second byte with the marker and payload type, or to omit the header extension function entirely.

  o In this draft, there are several length fields that specify length in octets for consistency with IP and UDP. However, some of these lengths, such as in the header extension and in the RTCP header, must always be a multiple of 4 octets.
    Van Jacobson suggests that those lengths be in 32-bit words rather than octets to engineer out a potential error rather than test for it at runtime. Furthermore, he suggests that those lengths not count the word containing the length to avoid the need to check for 0 length as protection against an infinite loop. This simplifies the parsing code; currently it is: check that bottom two bits are zero, check that length is non-zero, check that length doesn't go off end of packet. Changing the length would make it: shift left 2, check that length doesn't go off end of packet. Comments?

In addition to these issues which affect the main RTP spec, there are two more issues affecting the audio and video profiles, which we must write as companion documents.

  o The bits in the second byte of the RTP data header may be allocated as needed by a profile. The default allocation specified here is for one marker bit and 7 bits of payload type (compared to 6 bits for the format field in RTPv1). For the audio and video profiles, we must choose the number of marker bits and their meaning. For audio, we could have one bit marking the beginning of a talkspurt, or one bit marking the end, or both. In RTPv1, there was just one bit which always marked the end, since one can deduce that the next packet must be the beginning. However, this requires that both packets be received to work, and for audio it is the beginning mark that we need most to make playout delay adjustments. Henning Schulzrinne has suggested allocating both marker bits so that the end marker can be used to control the choice of gap fill or between-talkspurt silence fill. Van Jacobson claims that an end marker serves no purpose since the process of audio fill always involves an interpolation (between the last received frame and either silence or the end of the gap) and the algorithm is unaffected by whether there is or is not an end marker. For video, the end-of-frame marker is useful to control displaying an image.
    For those encodings that need to know the start of a frame, is that information implicit in the encoding?

  o Two methods are defined for encryption. At the RTP level, encryption covers the entire packet. The alternative, which allows encrypting only the data while leaving the headers in the clear, is to define additional payload types for encrypted encodings. Which should we choose as the default method for the audio and video profiles? Under the general security principle that as much information should be encrypted as possible, RTP level encryption should be chosen. However, this precludes third party monitoring and requires that recorders operating without the key be able to work with only observed packet timing rather than source timing. What applications are being considered for which one or the other choice of encryption method is a hard requirement?

3 RTP Data Transfer Protocol

3.1 RTP Fixed Header Fields

The RTP header has the following format:

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |T=2|P|X|  CC   |M|     PT      |       sequence number         |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                           timestamp                           |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                             SSRC                              |
    +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
    |                             CSRCs                             |
    |                             ....                              |
    +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+

The first twelve octets are present in every RTP packet, while the list of CSRC identifiers is present only when inserted by a bridge. The fields have the following meaning:

type (T): 2 bits
    Identifies the type of RTP packet. The type of the packet described here is two (2). (The value of 2 was chosen to easily distinguish packets from those of the prior version of RTP and the protocol used by the vat audio tool.)

padding (P): 1 bit
    If the padding bit is one, the packet contains one or more additional bytes at the end which are not part of the payload. The very last byte of the packet is a count of how many padding bytes should be ignored. Padding may be needed by some encryption algorithms with fixed block sizes or for carrying several RTP packets in a lower-layer protocol data unit.

extension (X): 1 bit
    The bit indicates that the fixed header is followed by exactly one header extension, with a format defined below.

CSRC count (CC): 4 bits
    This field contains the number of CSRC identifiers that follow the fixed header.

marker (M): 1 bit
    The interpretation of this field is defined by a profile. A profile may define additional marker bits by reducing the number of bits in the payload type field.

payload type (PT): 7 bits
    The payload type forms an index into a table defined through profiles, the RTCP FMT packet (see Section 4.7) and non-RTP mechanisms (see Section ??). The mapping establishes the format of the RTP payload and determines its interpretation by the application. A profile specifies a standard mapping. An initial set of default mappings for audio and video is specified in the companion profile document RFC TBD, and may be extended in future editions of the Assigned Numbers RFC.

sequence number: 16 bits
    The sequence number counts RTP packets. The sequence number increments by one for each packet sent. The sequence number may be used by the receiver to detect packet loss and to restore packet sequence. The initial value of the sequence number is random (unpredictable) to make known-plaintext attacks on encryption more difficult, even if the source itself does not encrypt.

timestamp: 32 bits
    The timestamp is incremented with a clock frequency determined by the format of data carried as payload. For example, for fixed-rate audio, the timestamp would likely increment by one for each sample.
    The clock frequency is determined statically by a profile for a set of payload types, dynamically during a session through RTCP FMT packets, or through other, non-RTP means. Several consecutive RTP packets may have equal timestamps if they are (logically) generated at once, e.g., belong to the same video frame. The initial value of the timestamp is random, as for the sequence number.

SSRC: 32 bits
    Synchronization source identifier. This value is chosen randomly, with the intent that no two synchronization sources within the same channel have the same SSRC value. Details are described in Section 3.2.

CSRC: up to 15 items, 32 bits each
    Zero or more content source identifiers. The number of content source identifiers is given by CC. There can be no more than 15 content sources. CSRC identifiers are inserted only by bridges, using the SSRC identifiers of contributing sources. For example, for audio packets, all sources that were mixed together to create a packet are enumerated, allowing correct talker indication at the receiver. A CC value of 0 indicates that the SSRC is the content source. If CC is non-zero, the SSRC (a bridge) is not a content source; a bridge that is also a content source for some packet must explicitly include itself in the CSRC list for that packet. The CSRC list is not modified by translators.

3.2 SSRC Random Identifier Allocation

The SSRC identifier described above is a random 32-bit quantity that is intended to be globally unique within a media session. An example of how to generate such an identifier is presented in Section ??. A source should monitor the transmission of other sources before picking its own value. If a source discovers at any time that another source is already using the same SSRC identifier, it randomly chooses a different SSRC identifier. (A source should send a BYE control packet with the old SSRC identifier before switching to allow applications to clear any records for this SSRC.)
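
The allocation rule in the preceding paragraph might be sketched as follows. This is only an illustration in Python, not the generation example promised in Section ??: the names `choose_ssrc`, `Source` and `send_bye` are hypothetical, and the use of the `secrets` module as the source of randomness is an assumption.

```python
import secrets


def choose_ssrc(in_use):
    """Return a random 32-bit value that is not among the SSRCs already observed."""
    while True:
        candidate = secrets.randbits(32)
        if candidate not in in_use:
            return candidate


class Source:
    """Hypothetical holder of one source's SSRC state."""

    def __init__(self, observed=()):
        # SSRC values seen while monitoring other sources before transmitting.
        self.observed = set(observed)
        self.ssrc = choose_ssrc(self.observed)

    def on_remote_ssrc(self, ssrc, send_bye):
        """Record an SSRC heard from another source.

        On a collision, call send_bye(old_ssrc) (assumed to emit a BYE control
        packet) and then pick a different identifier at random.
        """
        self.observed.add(ssrc)
        if ssrc == self.ssrc:
            send_bye(self.ssrc)
            self.ssrc = choose_ssrc(self.observed)
        return self.ssrc
```

The BYE is sent before the switch so that receivers can discard state kept under the old identifier.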

If N is the number of sources and L the length of the identifier (here, 32 bits), the probability that two sources independently pick the same value can be approximated for large N [2, p. 33] as 1 - exp(- N**2 / 2**(L+1)). For N=1000, the probability is roughly 0.01%.

Because the random identifiers are globally unique, they can be used to detect loops that may be introduced by bridges. For each source, both the SSRC and CSRC identifiers should be recorded (they will be the same for sources that are not received through a bridge). If a packet arrives with a different SSRC than the one recorded, that packet is ignored. Intentional changes in the path are accommodated because the SSRC and CSRC state for the old path will be discarded after a timeout since the packets with that SSRC will have ceased.

3.3 RTP Header Extension

A mechanism is provided for extending the RTP data packet header in an application-specific or profile-specific way. If an extension is needed across all profiles, a new version of RTP should be defined instead to make a permanent change to the fixed header.

If the X bit in the RTP header is one, the RTP header (including any CSRC list) is followed by a variable-length header extension. The header extension contains a 16-bit length field that counts the number of octets in the extension, including the four-octet extension header. The first 16 bits of the header extension are defined by the profile.

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |      defined by profile       |            length             |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The main RTP specification does not define any header extensions. Every conformant RTP application must be able to skip the header extension, but is not required to process it.
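
As an illustration of Sections 3.1 and 3.3, the following Python sketch unpacks the fixed header and the CSRC list, skips a header extension without interpreting it, and strips end-of-packet padding. The names `RtpPacket` and `parse_rtp` are hypothetical. The sketch assumes that the padding count includes the count byte itself, and it follows this draft in treating the extension length as a count of octets that includes the four-octet extension header.

```python
import struct
from typing import NamedTuple, Tuple


class RtpPacket(NamedTuple):
    padding: bool
    extension: bool
    marker: bool
    payload_type: int
    sequence: int
    timestamp: int
    ssrc: int
    csrcs: Tuple[int, ...]
    payload: bytes


def parse_rtp(packet: bytes) -> RtpPacket:
    if len(packet) < 12:
        raise ValueError("shorter than the 12-octet fixed header")
    b0, b1, seq, ts, ssrc = struct.unpack_from("!BBHII", packet, 0)
    if b0 >> 6 != 2:
        raise ValueError("type field is not 2")
    padding = bool(b0 & 0x20)      # P bit
    extension = bool(b0 & 0x10)    # X bit
    cc = b0 & 0x0F                 # CSRC count, 0..15
    marker = bool(b1 & 0x80)       # M bit (default allocation)
    pt = b1 & 0x7F                 # 7-bit payload type

    offset = 12 + 4 * cc
    if offset > len(packet):
        raise ValueError("CSRC list runs past end of packet")
    csrcs = struct.unpack_from("!%dI" % cc, packet, 12)

    end = len(packet)
    if extension:
        if offset + 4 > end:
            raise ValueError("truncated header extension")
        _profile_bits, ext_len = struct.unpack_from("!HH", packet, offset)
        # Length is in octets, includes the 4-octet extension header,
        # and must be a multiple of 4.
        if ext_len < 4 or ext_len % 4 or offset + ext_len > end:
            raise ValueError("bad header extension length")
        offset += ext_len          # skip, do not process

    if padding:
        if end <= offset:
            raise ValueError("padding bit set on empty payload")
        pad = packet[end - 1]      # assumption: count includes this byte
        if pad == 0 or pad > end - offset:
            raise ValueError("bad padding count")
        end -= pad

    return RtpPacket(padding, extension, marker, pt, seq, ts, ssrc,
                     csrcs, packet[offset:end])
```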

4 RTP Control Protocol --- RTCP

4.1 Introduction

The RTP control protocol (RTCP) provides two functions: (1) monitoring the distribution of data, and (2) conveying minimal session information. The first function is performed by the RTCP sender or receiver report packets, described below. This function is an integral part of RTP's role as a transport protocol, and is mandatory when RTP is used in the IP multicast environment. The second RTCP function provides support for "loosely controlled" sessions, i.e., where participants enter and leave without membership control and parameter negotiation.

RTCP packets are sent to all members of a session, using the same distribution mechanism as for data packets. The underlying protocol must provide multiplexing of the data and control packets, for example using separate port numbers with UDP.

The period between RTCP packets should be varied randomly to avoid synchronization of all sources. Its mean should increase with the number of participants in the session to limit the growth of the overall network and host interrupt load to a small fraction of the load induced by the media data. An algorithm for calculating the period is given in Appendix ??. The length of the period determines, for example, how long a receiver joining a session has to wait until it can identify the source. A receiver may remove from its list of active sites a site that it has not heard from for a given time-out period; the time-out period may depend on the number of sites or the observed average interarrival time of RTCP messages.

Note that not every RTCP packet has to contain all RTCP descriptions for a source; for example, SDES EMAIL might only be sent every few messages.

4.2 RTCP packet format

Each RTCP packet begins with a fixed part similar to that of RTP data packets, followed by structured elements that may be of variable length but always end on a 32-bit boundary.
The length field and alignment requirement are included to make RTCP packets "stackable". Multiple RTCP packets may be sent in a single packet of the lower layer protocol such as UDP to combine as much information as possible into one packet, particularly for translators and bridges. This is advisable since per-packet processing overhead in the network and in many operating systems is high. For example, in a Unix operating system running the X windowing system, each packet is likely to cause a hardware interrupt, a software interrupt, a context switch and an X event.

Any combination of RTCP packets may be stacked in one lower-layer packet, and each RTCP packet is processed independently. An application may skip RTCP packets with types unknown to it. Additional RTCP packet types may be registered with the Internet Assigned Numbers Authority.

The first RTCP packet is always a report packet, which may be in either of two forms: a sender report (SR) for sources that have recently transmitted RTP data packets, or a receiver report (RR) for sources that have not recently sent RTP data. It may optionally be followed by more receiver report (RR) packets if the number of sources being reported exceeds 31, the number that will fit into one SR or RR packet. These one or more report packets are followed by an SDES packet containing at least the CNAME item. Finally, FMT, APP, BYE or other, yet to be defined packet types may follow in any order. Packet types may appear more than once.
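
The stacking rule can be illustrated with a short Python sketch that walks the RTCP packets found in one lower-layer packet. The function name `split_rtcp` is hypothetical. The sketch uses the length definition of this draft (octets, counting the four-octet header and any padding) and applies the three checks listed under Open Issues: multiple of four, non-zero, and not past the end of the datagram. Numeric packet type values are not interpreted, so unknown types are simply passed through.

```python
import struct


def split_rtcp(datagram: bytes):
    """Split a lower-layer packet into (packet_type, count, body) tuples.

    count is the 5-bit field after the padding bit (RC in SR/RR, CC in SDES);
    body is everything after the first 32-bit word, padding included.
    """
    packets = []
    offset = 0
    while offset < len(datagram):
        if offset + 4 > len(datagram):
            raise ValueError("truncated RTCP header")
        b0, packet_type, length = struct.unpack_from("!BBH", datagram, offset)
        if b0 >> 6 != 2:
            raise ValueError("type field is not 2")
        if length % 4:
            raise ValueError("length is not a multiple of 4 octets")
        if length < 4:
            # A zero length would otherwise loop forever.
            raise ValueError("length does not cover the header")
        if offset + length > len(datagram):
            raise ValueError("length runs past end of datagram")
        count = b0 & 0x1F
        packets.append((packet_type, count,
                        datagram[offset + 4:offset + length]))
        offset += length
    return packets
```

A caller would dispatch on the returned packet type and ignore the types it does not know.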

4.3 SR: Sender report

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |T=2|P|   RC    |     PT=SR     |            length             |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                             SSRC                              |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |             NTP timestamp, most significant word              |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |             NTP timestamp, least significant word             |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                         RTP timestamp                         |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                     sender's packet count                     |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                      sender's byte count                      |
    +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
    |                            SSRC_1                             |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |             cumulative number of packets received             |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |             cumulative number of packets expected             |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                      interarrival jitter                      |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                         last SR (LSR)                         |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                  delay since last SR (DLSR)                   |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                            SSRC_2                             |
                                   ...
    +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
    |                application-specific extensions                |
    +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+

The sender report packet consists of two sections. The first section, the actual sender report, is 24 octets long and is present in every sender report packet. The second section contains zero or more reception reports depending on the number of sources heard since the last report. The fields have the following meaning:

type (T): 2 bits
    The current value of the type identifier is 2 (two), as in RTP packets.

padding (P): 1 bit
    If the padding bit is one, the packet contains some additional bytes at the end which are not part of the payload. The very last byte of the packet is a count of how many padding bytes should be ignored. Padding may be needed by some encryption algorithms with fixed block sizes.

reception report count (RC): 5 bits
    This field contains the number of reception report blocks contained in this packet. A value of zero is valid.

length: 16 bits
    The length of this RTCP packet in octets, including the header and any padding.

SSRC: 32 bits
    Synchronization source identifier for the sender of this packet.

NTP timestamp: 64 bits
    The NTP timestamp corresponds to the wallclock time when this traffic report is sent so that it may be used in combination with timestamps returned in reception reports from other receivers to measure round-trip propagation to those receivers. Receivers should expect that the measurement accuracy of the timestamp may be limited to far less than the resolution of the NTP timestamp. The measurement uncertainty of the timestamp is not transmitted as it is usually difficult to estimate with any degree of reliability. A sender that can keep track of real time but has no notion of wallclock time may use the elapsed time of the session instead. It is permissible to use the sampling clock to estimate elapsed wallclock time. This is assumed to be less than 68 years, so the high bit will be zero. A sender that has no notion of real time may set the NTP timestamp to zero.

RTP timestamp: 32 bits
    Reference timestamp that corresponds to the same time as the NTP timestamp (above). This correspondence may be used for intra- and inter-media synchronization for sources whose NTP timestamps are synchronized, and may be used by media-independent receivers to estimate the nominal RTP clock frequency.

sender's packet count: 32 bits
    Counts the total number of RTP packets transmitted by the source since the source has started transmission and until the time this SR packet was generated.

sender's byte count: 32 bits
    Counts the total number of bytes transmitted in RTP packets by the source since the source has started transmission and until the time this sender report packet was generated. The byte count includes only the payload of RTP data packets. This field can be used to estimate the overall payload data rate.

Each reception report in the second section of the sender report packet conveys statistics on the reception of RTP packets from a single synchronization source. These statistics are:

source identifier: 32 bits
    Identifies the source to which the information in this report pertains.

cumulative number of packets received: 32 bits
    The field contains the total number of RTP packets received from the source since the beginning of reception. By taking the difference in this number between two reception reports from a given source, and dividing by the interval between those two reports, a received packet rate may be calculated.

cumulative number of packets expected: 32 bits
    The field contains the total number of packets expected by the receiver, which may be computed according to the algorithm in Appendix ??. Together with the cumulative number of packets received, a monitor can measure the packet loss rate over both short and long time periods. The number of packets expected may also be used to judge the statistical validity of any loss estimates. (For example, 1 out of 5 packets lost has a different significance than 200 out of 1000.) There will be no loss indication (and likely no reception report issued) for a source if all recent packets from that source have been lost.

interarrival jitter: 32 bits
    The interarrival jitter is computed by the following algorithm: TBD.

last SR (LSR): 32 bits
    The middle 32 bits of the last NTP timestamp received as part of an RTCP sender report (SR) packet from the source being reported.

delay since last SR (DLSR): 32 bits
    Delay, expressed in units of 1/65536 seconds, between receiving the sender's SR packet and sending this report packet.

The 'last SR' and 'delay since last SR' fields allow the computation of round trip time by the sender of the SR. This may be used to cluster nodes according to propagation delay. If the reception report for SSRC S from receiver R arrives at time A at S, S can compute the round-trip time to R as A - LSR - DLSR. Note that round-trip time may be of limited use for many real-time applications and that some links have very asymmetric delays.

All reported numbers except interarrival jitter are cumulative. The difference between two reports can be used to estimate recent quality of the distribution. A fixed clock (NTP timestamp) is chosen so that quality monitors do not have to be cognizant of the clock rate for the current encoding. If a source cannot compute a particular value, it inserts a value of zero.

A receiver (end system or bridge) should send sender/receiver report packets including a reception report for each source from which it has received RTP packets since the last report, or for as many such sources as will fit. A bridge should not send reception reports on one side for sources it has received on the other side.

A profile may define application specific extensions to the sender report if there is additional information that should be reported regularly about the sender or receivers. If information about receivers is to be included, that data may be structured as an array of blocks parallel to the array of receiver reports in the second section of the sender report.
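
The round-trip and loss computations described in this section can be written out as a small Python sketch. The function names are hypothetical. It assumes that the arrival time A is expressed in the same form as LSR (the middle 32 bits of an NTP timestamp, i.e., units of 1/65536 seconds) and that all counters wrap modulo 2**32.

```python
MASK32 = 0xFFFFFFFF


def ntp_middle32(seconds: int, fraction: int) -> int:
    """Middle 32 bits of a 64-bit NTP timestamp.

    Takes the low 16 bits of the seconds word and the high 16 bits of the
    fraction word, giving a value in units of 1/65536 seconds.
    """
    return ((seconds & 0xFFFF) << 16) | ((fraction >> 16) & 0xFFFF)


def round_trip_seconds(arrival: int, lsr: int, dlsr: int) -> float:
    """Compute A - LSR - DLSR modulo 2**32 and convert to seconds."""
    return ((arrival - lsr - dlsr) & MASK32) / 65536.0


def interval_loss_fraction(previous, current) -> float:
    """Loss fraction between two reception reports for the same source.

    previous and current are (received, expected) pairs of the cumulative
    counters; the differences give the counts for the interval.
    """
    received = (current[0] - previous[0]) & MASK32
    expected = (current[1] - previous[1]) & MASK32
    if expected == 0:
        return 0.0
    # Duplicates can make received exceed expected; report no loss then.
    return max(0, expected - received) / expected
```

For example, an SR stamped at second 0x12345678 plus one half, answered after a one-second hold (DLSR = 65536) and arriving at second 0x1234567A, yields a round-trip time of 0.5 seconds.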

4.4 RR: Receiver report

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |T=2|P|   RC    |     PT=RR     |            length             |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                             SSRC                              |
    +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
    |                            SSRC_1                             |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |             cumulative number of packets received             |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |             cumulative number of packets expected             |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                      interarrival jitter                      |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                         last SR (LSR)                         |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                  delay since last SR (DLSR)                   |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                            SSRC_2                             |
                                   ...
    +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
    |                application-specific extensions                |
    +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+

The RR packet is issued in place of an SR packet only if the application has not recently sent any RTP data packets. (Unless specified by a profile, the timeout delay between sending the last RTP packet and ceasing to send SR packets should be a small multiple of the current reporting interval.) The packet fields have the same meaning as for the SR packet. Additional RR packets may follow the initial SR or RR packet if there are more than 31 sources to be reported.

4.5 SDES: Source description

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |T=2|P|   CC    |    PT=SDES    |            length             |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                          SSRC/CSRC_1                          |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                          SDES items                           |
    |                              ...                              |
    +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
    |                          SSRC/CSRC_2                          |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                          SDES items                           |
    |                              ...                              |
    +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+

The SDES packet is composed of a header and zero or more chunks containing items describing the sources identified in those chunks. The items are described individually below.

type (T), padding (P), payload type (SDES), length:
    As described for the SR packet.

CC: 5 bits
    This field contains the number of SSRC/CSRC chunks included in this SDES packet.

Each chunk consists of an SSRC/CSRC identifier followed by a list of zero or more items, which carry information about the SSRC/CSRC. Each chunk starts on a 32-bit boundary. Each item consists of an 8-bit type field, an 8-bit octet count (including this two-byte header) and text. Items are contiguous, i.e., items are not individually padded to a 32-bit boundary. Text is not zero terminated. The list of items in each chunk is terminated by one or more zeroes to denote the end of the list and pad until the next 32-bit boundary. An SDES packet with zero chunks or a chunk with zero items is valid but useless.

End systems send one SDES packet containing their own source identifier (the same as the SSRC in the fixed RTP header). A bridge sends one SDES packet containing a chunk for each content source from which it is receiving SDES information, or more than one SDES packet if there are more than 31 such sources.

The following SDES items are currently defined. Additional items may be defined in a profile; some items shown here may be useful for particular profiles only.
Not all items need to be sent with every SDES packet, except for the CNAME item, which is mandatory.(1)

------------------------------
1. Items are defined here rather than in the profile to simplify profile-independent applications, using common type numbers.

4.5.1 CNAME: Canonical end-point identifier

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |     CNAME     |    length     | user and domain name        ...
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The CNAME identifier serves four purposes:

  o Because the randomly allocated SSRC identifier may change if a conflict is discovered or if a program is restarted, the CNAME item is required to provide the binding to an identifier for the source that remains constant.

  o Like the SSRC identifier, the CNAME identifier should also be unique within one medium of a session.

  o To provide a binding among multiple media tools used in a session by one participant, the CNAME should be fixed for that participant.

  o To facilitate third-party monitoring, the CNAME should be suitable for either a program or a person to locate the source.

Therefore, the CNAME should be derived algorithmically and not entered manually. To meet these requirements, the following format should be used unless a profile specifies an alternate syntax or semantics. The CNAME item should have the format "user@host" or "host", where "host" is the fully qualified domain name of the host from which the real-time data originates, formatted according to the rules specified in RFC 1034, RFC 1035 and Section 2.1 of RFC 1123. The "host" form may be used if a user name is not available, for example on single-user systems. Only if a system cannot obtain a valid domain name may it use the printable representation of its lowest numbered numeric network address. Hosts using IP Version 4 use the 'dotted decimal' (also known as 'dotted quad') representation. Examples are:

    "doe@sleepy.megacorp.com" or "sleepy.megacorp.com" or
    "doe@192.35.149.160" or "192.35.149.160"

The user name should be in a form that a program such as "finger" or "talk" could use, i.e., it typically is the login name rather than the real-life name. Note that the host name is not necessarily identical to the electronic mail address of the participant.

Note that this syntax will not provide unique identifiers for each source if an application permits a user to generate multiple sources from one host. Such an application would have to rely on the SSRC to further identify the source, or the profile for that application would have to specify additional syntax for the CNAME identifier.

[TBD: In some cases it may be desirable to only use the domain name (as in "megacorp.com"), rather than the full host name, or to permit a user to intentionally designate two senders as equivalent by assigning them the same CNAME. This allows the same identity to be maintained when changing hosts locally or when using separate hosts for multiple applications in the same conference, e.g., in a video and audio session. This choice should be used with great care and only after ascertaining that the resulting identifier is indeed globally unique. (In particular, a user identifier must be unique within the domain.) Should this generic format be allowed?]

4.5.2 NAME: User name

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |     NAME      |    length     | common name of source       ...
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The real name used to describe the source, e.g., "John Doe, Bit Recycler, Megacorp". This name may be in any form desired by the user. For applications such as conferencing, this form of name may be the most desirable for display in participant lists, and therefore might be sent most frequently (profiles may establish such priorities).
The NAME value is expected to remain constant at least for the duration of a session. It should not be relied upon to be unique across the session.

4.5.3 EMAIL: User's electronic mail address

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     EMAIL     |     length    | email address of source     ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The email address is formatted according to RFC 822, for example, "John.Doe@megacorp.com". The EMAIL value is expected to remain constant for the duration of a session.

4.5.4 LOC: Geographic user location

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|      LOC      |     length    | geographic location of site ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Depending on the application, different degrees of detail are appropriate for this item. For conference applications, a string like "Murray Hill, New Jersey" may be sufficient, while, for an active badge system, strings like "Room 2A244, AT&T BL MH" might be appropriate. The degree of detail is left to the implementation and/or user, but format and content may be prescribed by a profile. The LOC value is expected to remain constant for the duration of a session, except for mobile hosts.

4.5.5 TXT: Text describing the source

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|      TXT      |     length    | text describing source      ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Message describing the current state of the source, e.g., "can't talk, having lunch". During a seminar, this field might be used to convey the title of the talk. The TXT value is likely to change during a session.
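
As an illustration of the SDES chunk layout described in Section 4.5 (an SSRC/CSRC identifier, contiguous items each carrying an 8-bit type and an 8-bit octet count that includes the two-byte item header, and a zero terminator that pads to the next 32-bit boundary), the following Python sketch encodes and decodes a single chunk. It is a minimal sketch, not a reference implementation. The numeric type codes SDES_CNAME and SDES_NAME and the function names are hypothetical: this memo names the items but the text shown here does not assign their numeric values.

```python
import struct

# Hypothetical item type codes, chosen only for illustration.
SDES_CNAME = 1
SDES_NAME = 2


def encode_item(item_type: int, text: bytes) -> bytes:
    """One item: 8-bit type, 8-bit octet count (header included), text."""
    count = len(text) + 2  # the count covers the two-byte item header
    if count > 255:
        raise ValueError("SDES item text too long for an 8-bit count")
    return bytes([item_type, count]) + text


def encode_chunk(ssrc: int, items) -> bytes:
    """SSRC/CSRC identifier, contiguous items, then zero terminator/padding."""
    body = struct.pack("!I", ssrc)
    body += b"".join(encode_item(t, v) for t, v in items)
    # One to four zero octets: at least one terminator, ending on a
    # 32-bit boundary.
    pad = 4 - (len(body) % 4)
    return body + b"\x00" * pad


def decode_chunk(data: bytes, offset: int = 0):
    """Return (ssrc, items, next_offset); offset must be 32-bit aligned."""
    (ssrc,) = struct.unpack_from("!I", data, offset)
    pos = offset + 4
    items = []
    while pos < len(data) and data[pos] != 0:
        if pos + 1 >= len(data):
            raise ValueError("truncated SDES item header")
        item_type, count = data[pos], data[pos + 1]
        if count < 2 or pos + count > len(data):
            raise ValueError("malformed SDES item")
        items.append((item_type, data[pos + 2:pos + count]))
        pos += count
    # Skip the terminator and padding up to the next 32-bit boundary.
    pos += 4 - ((pos - offset) % 4)
    return ssrc, items, pos
```

For example, a chunk carrying the CNAME "doe@sleepy.megacorp.com" and the NAME "John Doe" occupies 4 + 25 + 10 = 39 octets before termination, so a single zero octet both terminates the item list and completes the 32-bit alignment.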
4.6 BYE: Goodbye

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|T=2|P|    CC   |      BYE      |            length             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           SSRC/CSRC                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                               ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| reason for leaving                                          ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The BYE packet indicates that one or more sources are no longer active.

type (T), padding (P), payload type (BYE), length:
     As described for the SR packet.

CC: 5 bits
     This field contains the number of SSRC/CSRC identifiers included in this BYE packet. A count value of zero is valid, but meaningless.

If a BYE packet is received by a bridge, the bridge forwards the BYE packet with the identifier(s) unchanged. If a bridge shuts down, it should send a BYE packet listing all content sources it handles, as well as its own SSRC value. Optionally, the BYE packet may include a string indicating the reason for leaving, e.g., "equipment malfunction".

4.7 FMT: Payload type mapping

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|T=2|P|reserved |    PT=FMT     |            length             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             SSRC                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                   reserved                    |    RTP-PT     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     nominal clock frequency, integer part     |   fraction    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          format name                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                     format-dependent data                     |
|                              ...                              |

A FMT mapping establishes the interpretation of a given format value carried in the fixed RTP header starting at the packet containing the FMT option.
The new interpretation applies only to packets from the same synchronization source as the packet containing the FMT option, i.e., it is source-specific. An existing mapping, e.g., one established through a profile or directory service, must not be changed. Sources should not send FMT packets for encodings that are defined by a profile. [TBD: Need for FMT/implications need to be discussed.]

type (T), padding (P), payload type (PT), length:
     See description of SR packet.

SSRC: 32 bits
     Synchronization source identifier for the sender of this packet.

RTP payload type (RTP-PT): 8 bits
     The payload type field corresponds to the index value from the payload type field in the RTP header. This mapping is expected to remain constant for the duration of a session.

nominal clock frequency: 32 bits
     The nominal clock rate is specified as the closest fixed-point value consisting of a 24-bit integer and an 8-bit binary fraction. A value of zero indicates that the clock rate is unknown or variable. Note that the actual clock rate of a source may well differ slightly from this nominal value. For example, an audio source might indicate a nominal clock rate of 8000 Hz, while it emits samples at an actual rate of 8005 Hz, due to imperfections in the local crystal oscillator. The nominal clock rate is used since measuring the actual clock rate may be difficult. Furthermore, for precise synchronization over longer time periods, the SR clock values are to be used. The clock rate information allows recorders to play back RTP packets without knowledge of the semantics of the payload data.

format name: 32 bits
     The format name describes the format in an unambiguous way and is registered with the Internet Assigned Numbers Authority. The format name is interpreted as a sequence of four ASCII characters, with uppercase and lowercase characters treated as distinct. Format names beginning with the letter 'X' are reserved for experimental use and not subject to registration.

format-dependent data: variable length
     Format-dependent data may or may not appear in a FMT packet. It is interpreted by the application and not RTCP itself.

4.8 APP: Application-defined

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|T=2|P| subtype |    PT=APP     |            length             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           SSRC/CSRC                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         name (ASCII)                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| application-dependent data                                  ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The APP packet is intended for experimental use as new applications and new features are developed, without requiring packet type value registration. APP packets with unrecognized names should be ignored. After testing and if wider use is justified, it is recommended that each APP packet be redefined without the subtype and name fields and registered with the Internet Assigned Numbers Authority using an RTCP packet type.

type (T), padding (P), packet type (APP), length:
     As defined for the SR packet.

subtype: 5 bits
     May be used as a subtype to allow a set of APP packets to be defined under one unique name, or for any application-dependent data.

name: 4 octets
     A name chosen by the person defining the set of APP packets to be unique with respect to other APP packets this application might receive. The application creator might choose to use the application name, and then coordinate the allocation of subtype values to others who want to define new packet types for the application. Alternatively, it is recommended that others choose a name based on the entity they represent, then coordinate the use of the name within that entity. The name is interpreted as a sequence of four ASCII characters, with uppercase and lowercase characters treated as distinct.

application-dependent data: variable length
     Application-dependent data may or may not appear in an APP packet. It is interpreted by the application and not RTP itself.

5 Security

This section defines a confidentiality security service and defines standard algorithms for both RTP and RTCP. Other services, other implementations of services and other algorithms may be defined in the future. The selection presented here is meant to simplify implementation of interoperable, secure applications and provide guidance to implementors. No claim is made that the methods presented here are appropriate for a particular security need. A profile specifies which of the services and algorithms should be offered by applications, and may provide guidance as to their appropriate use.

Confidentiality means that only the intended receiver(s) can decode the received packets; for others, the packet contains no useful information. Confidentiality of the content is achieved by encryption.

All RTP and RTCP packets in a single lower-layer protocol data unit are encrypted as a unit. For RTCP, some such lower-layer packets may be sent encrypted and others in the clear. (This accommodates monitors that are not privy to the encryption key.) For RTP, no additional data structures are required. For RTCP, a 32-bit random number is prepended to the unit before encryption to deter known-plaintext attacks. The presence of encryption and the use of the correct key are confirmed by the receiver through header or payload consistency checks. An example of such a consistency check is given in Section ?? (TBD!).

[TBD: Alternative methods for encrypting only part of the RTCP information would be to encrypt each RTCP packet separately but not encrypt the first four bytes of the RTCP headers so the length is exposed, or to have an encryption wrapper around encrypted RTCP packets.]
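
The RTCP random-prefix handling described above might be sketched in Python as follows. This is a rough sketch under stated assumptions: the cipher itself is omitted (encryption and decryption would occur between the two functions), the function names are hypothetical, and the consistency check shown (testing that the two-bit T field of the first RTCP header equals 2) is only one plausible check, since this memo leaves the example check as TBD.

```python
import os
from typing import Optional

RANDOM_PREFIX_LEN = 4  # the 32-bit random number prepended before encryption


def add_random_prefix(rtcp_unit: bytes) -> bytes:
    """Sender side: prepend a 32-bit random number to the RTCP unit.

    The result is what would then be handed to the cipher.
    """
    return os.urandom(RANDOM_PREFIX_LEN) + rtcp_unit


def strip_prefix_and_check(decrypted: bytes) -> Optional[bytes]:
    """Receiver side: drop the random prefix and sanity-check the header.

    Returns the RTCP unit, or None if the check fails, which would suggest
    a wrong key or a unit that was not encrypted in this way.
    """
    unit = decrypted[RANDOM_PREFIX_LEN:]
    if len(unit) < 4:
        return None
    # Assumed check (not specified by this memo): the top two bits of the
    # first octet carry the T field, which is 2 for this protocol version.
    if unit[0] >> 6 != 2:
        return None
    return unit
```

A two-bit check alone accepts one in four random inputs, so a practical receiver would presumably combine several header fields; which ones is left open here, as it is in the text.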

The default encryption algorithm is the Data Encryption Standard (DES) algorithm in CBC (cipher block chaining) mode, as described in Section 1.1 of RFC 1423 [5], except that padding to a multiple of 8 octets is indicated as described for the P bit in Section 3.1. The initialization vector is zero because random values are supplied in the RTP header or by the random prefix for RTCP packets. For details on the use of CBC initialization vectors, see [6]. Implementations that support encryption should always support the DES algorithm in CBC mode.

As an alternative to encryption at the RTP level as described above, profiles may define additional payload types for encrypted encodings. Those encodings must specify how padding and other aspects of the encryption should be handled. This method allows encrypting only the data while leaving the headers in the clear for applications where that is desired. It may be particularly useful for hardware devices that will handle both decryption and decoding.

References

[1] J. Bolot, T. Turletti and I. Wakeman, "Scalable feedback control for multicast video distribution in the Internet," in Proceedings of SIGCOMM '94, London, England, Sept. 1994. Also available as cs.ucl.ac.uk:darpa/multicast-congestion.ps

[2] W. Feller, An Introduction to Probability Theory and its Applications, Volume 1, New York, New York: John Wiley and Sons, third ed., 1968.

[3] B. Kaliski, "The MD2 Message-Digest algorithm," Request for Comments (Informational) RFC 1319, Internet Engineering Task Force, Apr. 1992.

[4] R. Rivest, "The MD5 Message-Digest algorithm," Request for Comments (Informational) RFC 1321, Internet Engineering Task Force, Apr. 1992.

[5] D. Balenson, "Privacy enhancement for internet electronic mail: Part III: algorithms, modes, and identifiers," Request for Comments (Proposed Standard) RFC 1423, Internet Engineering Task Force, Feb. 1993. Obsoletes RFC 1115.

[6] V. L. Voydock and S. T. Kent, "Security mechanisms in high-level network protocols," ACM Computing Surveys, vol. 15, pp. 135--171, June 1983.