



In this new blog series, we cover RTCP, the companion protocol to RTP.
Each RTP stream should also have a parallel RTCP stream providing information about and control of that stream. Like RTP, RTCP is a packet-based protocol, though its bandwidth tends to be much lower than that of the corresponding RTP stream, as it does not convey any media itself. RTCP is defined alongside RTP in RFC 3550.
The core RTCP messages are periodic and informational, providing feedback about the RTP stream. However, numerous extensions have been added to RTCP since its inception, some of which request the far end to take certain actions. These extensions are normally negotiated via the rtcp-fb SDP attribute (see that section of this blog).
Note that, like RTP, RTCP sent across UDP is prone to packet loss; the receipt of any given RTCP packet cannot be guaranteed. This is generally not problematic for periodic feedback messages, but it poses a design and implementation issue for one-off instructions to the far end. In most cases this is solved by monitoring the media to confirm that the requested action has actually been taken, and sending the request again if it has not.
The RTCP format
RTCP defines a number of different packet types for conveying different types of information, but they all share the same basic 4-byte header, which has the following format:
- Version (V): 2 bits representing the version. As with RTP, the value is always 2.
- Padding (P): 1 bit indicating additional bytes of padding at the end of the packet. If set to 1, there are one or more bytes of padding after the payload; as with RTP, the final byte is a count of how many padding bytes there are (including itself).
- Item Count (IC): 5 bits counting the number of items in the packet, if relevant to the packet type. Different RTCP packet types use this for different purposes – see the individual packet type descriptions.
- Payload Type (PT): 8 bit value, with fixed values for each packet type.
- Length: 16 bits corresponding to the length of the packet in 32-bit words, not including the initial 32-bit header, but including any padding. As such, 0 is a valid length representing a packet that includes only this header.
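As a rough illustration, the header fields above can be decoded with Python's standard struct module. This is a minimal sketch. The names RtcpHeader, parse_rtcp_header and packet_size_bytes are hypothetical and were chosen for this example.

```python
import struct
from typing import NamedTuple


class RtcpHeader(NamedTuple):
    version: int
    padding: bool
    count: int         # Item Count: meaning depends on the packet type
    payload_type: int
    length_words: int  # 32-bit words after the header, including padding


def parse_rtcp_header(data: bytes) -> RtcpHeader:
    """Decode the 4-byte header shared by all RTCP packets."""
    if len(data) < 4:
        raise ValueError("RTCP header needs at least 4 bytes")
    first, pt, length = struct.unpack("!BBH", data[:4])
    version = first >> 6              # top 2 bits
    if version != 2:
        raise ValueError(f"unexpected RTCP version {version}")
    return RtcpHeader(
        version=version,
        padding=bool(first & 0x20),   # next bit
        count=first & 0x1F,           # low 5 bits
        payload_type=pt,
        length_words=length,
    )


def packet_size_bytes(header: RtcpHeader) -> int:
    """Total packet size in bytes: the header word plus Length words."""
    return 4 * (header.length_words + 1)
```

Take the header bytes 0x81 0xC8 0x00 0x0C as an example. They decode as a Sender Report (PT 200) with one Report Block and a total size of 52 bytes: a 4-byte header, a 4-byte SSRC, 20 bytes of Sender Info and a 24-byte Report Block.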
Key RTCP packet types
RFC 3550 defines five RTCP packet types:
- Sender Report (SR): Statistics about media streams being transmitted and received, sent by entities that are sending media and may also be receiving it.
- Receiver Report (RR): Statistics about media streams being received, sent by entities that are only receiving media. Very similar in format to the SR, but without sender information.
- Source Description (SDES): Information about the media stream, some of which is important for the synchronization of streams.
- Goodbye (BYE): An announcement that a stream is ending.
- Application-specific (APP): An application-defined message that can be used to extend the protocol and provide additional functionality.
A number of further RTCP packet types have since been standardized, of which the most widely adopted in video conferencing by far is:
- Feedback (FB): Immediate feedback messages from receivers to senders to allow for adaptation and repair to improve problematic media streams.
A receiver should ignore RTCP packets with a payload type it does not recognize.
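The Length field gives the size of each packet. A receiver can therefore step through a buffer holding several RTCP packets back to back (RFC 3550 calls this a compound packet) and skip any payload type it does not recognize. The sketch below assumes the whole UDP datagram is already in memory, and the function name split_rtcp is hypothetical. Values 200 to 204 are the RFC 3550 types listed above. Values 205 and 206 are the transport-layer and payload-specific feedback types assigned by RFC 4585.

```python
import struct

# 200-204 are defined in RFC 3550; 205/206 are the RFC 4585 feedback types.
KNOWN_TYPES = {
    200: "SR",
    201: "RR",
    202: "SDES",
    203: "BYE",
    204: "APP",
    205: "RTPFB",
    206: "PSFB",
}


def split_rtcp(datagram: bytes) -> list[tuple[str, bytes]]:
    """Split back-to-back RTCP packets, dropping unrecognized payload types."""
    packets = []
    offset = 0
    while offset + 4 <= len(datagram):
        pt = datagram[offset + 1]
        (length_words,) = struct.unpack_from("!H", datagram, offset + 2)
        size = 4 * (length_words + 1)   # header word + Length words
        if offset + size > len(datagram):
            raise ValueError("truncated RTCP packet")
        name = KNOWN_TYPES.get(pt)
        if name is not None:            # silently ignore unknown types
            packets.append((name, datagram[offset:offset + size]))
        offset += size
    return packets
```

Using the Length field to skip a packet works even when its type is unknown. This is what makes it safe for a receiver to ignore packet types it does not understand.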
Sender Report (SR)
The format of an SR in the RFC can appear intimidating, but in practice it’s simply made up of a few logical components:
After the common header (where the Payload Type for Sender Reports is 200 and the Item Count equals the number of Report Blocks in the packet) comes a 32-bit Reporter SSRC field. This SSRC should match the current SSRC of the RTP stream being sent. This is followed by a Sender Info block and 0-31 Report Blocks, one for each RTP stream being received in this session. If this is a sendonly media session, no Report Blocks should be included.
Sender Info
The Sender Info block contains the following fields:
- NTP timestamp: A 64 bit field that contains the time at which the report was sent, in the format of the Network Time Protocol (NTP), which counts seconds since 00:00 UTC on 1 January 1900. This is often referred to as the wallclock time. The current time is available via many standard libraries, but usually relative to the Unix epoch of 1 January 1970, in which case 2,208,988,800 seconds must be added. The first 32 bits represent the number of seconds, while the second 32 bits represent fractions of a second. RTCP does not define a required precision for the fractional portion, but it should be precise to at least the millisecond to be useful for purposes such as calculating round-trip times. Note that, while it uses the NTP format, the information need not come from NTP. The value is mostly used for relative calculations, so absolute accuracy is not particularly important, provided an implementation uses a consistent system clock for the Sender Reports on all its streams.
- RTP timestamp: A 32 bit field that represents the same instant as the NTP timestamp (the time at which the report was sent), but expressed in the units and offset of the RTP media stream's timestamps. This allows an association to be made between the RTP packets and wallclock time. Note that implementations should calculate this value for the Sender Report rather than just reusing the most recent RTP packet timestamp, as the RTCP packet and the RTP packets will not be sent at the same moment.
- Sender’s packet count: A 32 bit field containing the total number of RTP media packets sent with the current SSRC since transmission began. As such, if the SSRC changes, the count should reset to 0. Note that this is the total count, not the count since the previous SR.
- Sender’s octet count: A 32 bit field containing the total bytes of RTP payload data (not including headers, padding, etc.) sent with the current SSRC since transmission began. If the SSRC changes, the count should reset to 0. Again, note that this is the total count, not the count since the previous SR.
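The sketch below shows one way to produce the Sender Info fields. It converts a Unix wallclock time to the 64-bit NTP format. It then extrapolates the RTP timestamp for the moment the report is sent, rather than reusing the last packet's timestamp. Finally, it packs a Sender Report with no Report Blocks (the sendonly case). The function names are hypothetical. The sketch also assumes the sender records the wallclock time at which some reference RTP timestamp was sampled, which is hypothetical bookkeeping.

```python
import struct

NTP_UNIX_OFFSET = 2_208_988_800  # seconds from 1 Jan 1900 to 1 Jan 1970


def unix_to_ntp(unix_seconds: float) -> int:
    """Convert Unix time (seconds since 1970) to a 64-bit NTP timestamp."""
    seconds = int(unix_seconds)
    fraction = int((unix_seconds - seconds) * (1 << 32))  # units of 2^-32 s
    return (((seconds + NTP_UNIX_OFFSET) & 0xFFFFFFFF) << 32) | fraction


def rtp_timestamp_at(ref_rtp_ts: int, ref_wallclock: float,
                     now: float, clock_rate: int) -> int:
    """Extrapolate the RTP timestamp for wallclock time `now`.

    Assumes ref_rtp_ts was sampled at wallclock time ref_wallclock
    (hypothetical sender bookkeeping); clock_rate is in Hz.
    """
    elapsed = now - ref_wallclock
    return (ref_rtp_ts + round(elapsed * clock_rate)) & 0xFFFFFFFF


def build_sender_report(ssrc: int, ntp_ts: int, rtp_ts: int,
                        packet_count: int, octet_count: int) -> bytes:
    """Pack an SR with a Sender Info block and no Report Blocks."""
    body = struct.pack(
        "!IIIIII",
        ssrc,                        # Reporter SSRC
        ntp_ts >> 32,                # NTP seconds
        ntp_ts & 0xFFFFFFFF,         # NTP fraction
        rtp_ts,
        packet_count & 0xFFFFFFFF,   # totals since this SSRC began, mod 2^32
        octet_count & 0xFFFFFFFF,
    )
    # V=2, P=0, Item Count=0; Length counts the 32-bit words after the header.
    header = struct.pack("!BBH", 2 << 6, 200, len(body) // 4)
    return header + body
```

In a real sender, unix_to_ntp(time.time()) would be one way to obtain the NTP timestamp for a report being sent now. Passing the same now value to rtp_timestamp_at keeps the two timestamps referring to the same instant.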
Report Block
One Report Block should be included per RTP stream being received as part of the RTP session. Note that if one stream is being received and its SSRC changes, the next SR/RR should still include only a single report block for that stream. If no packets for a stream have been received since the last report block was sent, the SR/RR should not contain a report block for it. A report block contains the following fields:
- SSRC: A 32 bit field containing the current SSRC of the RTP stream to which this report block corresponds.
- Fraction lost: An 8 bit field containing the fraction of the RTP stream's packets lost since the last report block about the stream was sent (or since the start of the call, if this is the first report sent), calculated from the sequence numbers of the packets received. The value is the fraction multiplied by 256 and rounded down to an integer. RFC 3550 includes a sample algorithm in Appendix A.3, and implementers are recommended to use it.
- Cumulative number of packets lost: A signed 24-bit field containing the total number of packets of the RTP stream lost since the stream began (not since the last report block, unlike the previous field), as determined by sequence numbers. The field is signed because duplicated packets can make the count negative. As a result, its largest value is 8388607 (0x7FFFFF), and a larger count should be reported as 8388607 rather than allowed to wrap around. As with the fraction lost field, Appendix A.3 of RFC 3550 contains a solid method for calculating this.
- Extended highest sequence number received: A 32 bit field containing the highest 16-bit RTP sequence number received, extended by an additional 16 bits. This means that the 16 bit sequence number from the RTP header is augmented with a 16 bit rollover counter, which starts at 0 and is incremented by 1 each time the RTP sequence number rolls over from 65535 back to 0. When picking the highest sequence number, the whole extended value needs to be considered (a sequence number of 1 with a rollover counter of 5 is larger than a sequence number of 65532 with a rollover counter of 4). Internally, an implementation can either store the extended value directly, or store the received RTP value and rollover counter separately. In the latter case, it writes the rollover counter into the top 16 bits of the field and the RTP sequence number into the bottom 16 bits.
- Interarrival jitter: A 32 bit field containing an estimate of the variation in the time between the arrival of RTP packets, also known as jitter, expressed in RTP timestamp units. The methodology for calculating this is defined in RFC 3550, along with a sample algorithm in Appendix A.8, and it is highly recommended that implementations use this or a variant derived from it.
- Last Sender Report timestamp: A 32 bit field containing the middle 32 bits of the most recent NTP timestamp from the Sender Info associated with this RTP stream. The middle 32 bits are used because they carry most of the relevant information. The top 16 bits only increment once every 18 hours or so, while the bottom 16 bits represent intervals finer than about 15 microseconds (1/65536 of a second). If no Sender Info has been received for the stream, this value should be 0.
- Delay since last Sender Report: A 32 bit field containing the time, in units of 1/65536ths of a second, between receiving the last Sender Info associated with this RTP stream and sending this Report Block. If no Sender Info has been received for the stream this value should be 0.
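Several of the fields above need per-stream state at the receiver. The following is a simplified sketch in the spirit of the RFC 3550 appendix algorithms: A.1 for sequence tracking, A.3 for loss and A.8 for jitter. It is not a transcription of them. It omits the probation and large-jump handling of A.1, and it assumes arrival times in seconds from a local monotonic clock. The class and method names are hypothetical. It clamps the cumulative loss at both ends of the signed 24-bit range, as Appendix A.3 suggests.

```python
class ReceiverStats:
    """Per-SSRC receive statistics feeding one Report Block (simplified)."""

    def __init__(self, clock_rate: int):
        self.clock_rate = clock_rate  # RTP timestamp units per second
        self.base_seq = None          # first 16-bit sequence number seen
        self.max_seq = None           # highest 16-bit sequence number seen
        self.rollovers = 0            # times the sequence number wrapped
        self.received = 0             # includes duplicates, as in RFC 3550
        self.expected_prior = 0
        self.received_prior = 0
        self.last = None              # (arrival seconds, RTP timestamp)
        self.jitter = 0.0             # in RTP timestamp units

    def on_packet(self, seq: int, rtp_ts: int, arrival: float) -> None:
        self.received += 1
        if self.max_seq is None:
            self.base_seq = self.max_seq = seq
        else:
            # A forward step of less than half the 16-bit space is "newer".
            delta = (seq - self.max_seq) & 0xFFFF
            if 0 < delta < 0x8000:
                if seq < self.max_seq:  # wrapped from 65535 back to 0
                    self.rollovers += 1
                self.max_seq = seq
        # Interarrival jitter (RFC 3550 A.8): J += (|D| - J) / 16
        if self.last is not None:
            prev_arrival, prev_ts = self.last
            # Signed difference of 32-bit RTP timestamps, tolerating wrap.
            ts_delta = ((rtp_ts - prev_ts + 0x80000000) & 0xFFFFFFFF) - 0x80000000
            d = abs((arrival - prev_arrival) * self.clock_rate - ts_delta)
            self.jitter += (d - self.jitter) / 16
        self.last = (arrival, rtp_ts)

    def make_report(self) -> tuple[int, int, int, int]:
        """Return (fraction lost, cumulative lost, extended highest seq, jitter).

        Call only after at least one packet has been received.
        """
        extended_max = (self.rollovers << 16) | self.max_seq
        expected = extended_max - self.base_seq + 1
        # Signed 24-bit field: clamp rather than wrap (serialize as & 0xFFFFFF).
        cumulative = max(-0x800000, min(expected - self.received, 0x7FFFFF))
        expected_interval = expected - self.expected_prior
        received_interval = self.received - self.received_prior
        self.expected_prior = expected
        self.received_prior = self.received
        lost_interval = expected_interval - received_interval
        if expected_interval == 0 or lost_interval <= 0:
            fraction = 0
        else:
            # Fraction * 256, rounded down, capped to fit in 8 bits.
            fraction = min((lost_interval << 8) // expected_interval, 255)
        return fraction, cumulative, extended_max & 0xFFFFFFFF, int(self.jitter)
```

Consider sequence numbers 65534, 65535, 0 and 2 arriving, so that 1 is missing. The extended highest sequence number then becomes 65538, the cumulative loss 1, and the fraction lost 256 // 5 = 51.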
Using Report Block Data
Sender Reports can seem highly finicky to implement, and in less mature implementations it may seem tempting to skip sending them and/or processing them when received. However, they offer considerable value in a range of ways.
Firstly, they are a key diagnostic tool when media is having issues, or not working at all. It is highly recommended that implementations make the information they contain easily available. Key values, such as the current fraction of packet loss for each media stream being sent and received, should be visible to the user alongside other information such as resolution, framerate and bitrate.
These and the other more detailed values should also be logged periodically. They can be logged at the rate at which SR/RR packets arrive, or less frequently in some aggregated fashion if logging capacity is limited. These kinds of logs are vital for diagnosing customer-reported issues around media quality. Along with the raw numbers, it is recommended that either the system itself produces graphs of all these values, or a tool is created to parse the logs and create graphs when diagnosing customer issues. The higher the granularity of the data, the easier it will be to diagnose issues that happen for brief periods, so avoid aggregating the data if at all possible.
Along with the values directly reported, a sender can use the Last SR timestamp and Delay since last SR values to calculate the round-trip time (RTT) between itself and the receiver. To calculate the RTT, take the arrival time of the Report Block and subtract both the Last SR timestamp and the Delay since last SR. Note that the latter two share the same units (1/65536ths of a second), which can help simplify an implementation, particularly if the arrival time is stored in the same format, i.e. as the middle 32 bits of the NTP time.
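The Last SR field echoes the sender's own NTP timestamp, so the arrival time must come from the same clock that stamped the Sender Report. The minimal sketch below performs the subtraction on middle-32-bit values modulo 2^32, so it tolerates the values wrapping around. The function names are hypothetical. Treating a "negative" result (top bit set) as invalid is a design choice of this sketch.

```python
from typing import Optional


def ntp_middle32(ntp_ts: int) -> int:
    """Middle 32 bits of a 64-bit NTP timestamp, in 1/65536ths of a second."""
    return (ntp_ts >> 16) & 0xFFFFFFFF


def round_trip_time(arrival_ntp: int, lsr: int, dlsr: int) -> Optional[float]:
    """RTT in seconds from a Report Block's Last SR and Delay since last SR.

    arrival_ntp: 64-bit NTP time at which the Report Block arrived, taken
    from the same clock used for this sender's Sender Reports.
    Returns None if the block carries no Sender Report reference (LSR of 0),
    or if the result comes out negative, which points to a clock problem.
    """
    if lsr == 0:
        return None
    arrival = ntp_middle32(arrival_ntp)
    # All three values share 1/65536 s units; subtract modulo 2^32.
    rtt_units = (arrival - lsr - dlsr) & 0xFFFFFFFF
    if rtt_units >= 0x80000000:
        return None
    return rtt_units / 65536
```

Suppose an SR is sent at NTP time 1000.0 s and the receiver holds it for 0.5 s (a Delay since last SR of 32768). If the Report Block arrives back at 1000.75 s, the RTT works out to 0.25 s.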
The logs should also record whether Report Blocks were being received, along with the number of RTCP packets received. This lets someone diagnosing a call tell whether a loss count of 0 for a sent stream means that all packets were arriving, or that no Report Blocks were received and hence there is no data. When no Report Blocks are being received for an outgoing stream, either the media being sent is not arriving at the far end, or the RTCP sent in response is not reaching the media originator. By checking whether any RTCP packets are being received at all, a person diagnosing a problematic call can determine which is the case.
Note that some implementations that receive media packets but are unable to process any of them (due to encryption failures, or lack of a video keyframe) may not generate Report Blocks.
Also note that RFC 3611 defines a new RTCP message type, Extended Reports (XR), which can include Report Blocks that allow for much more fine-grained detail. These have not been widely adopted in the industry, but as an implementor, if you are looking to add more detailed reporting between your own devices, you should be aware of this specification.
The next blog will cover the Receiver Report, and other reports that allow for the synchronization of media streams (e.g., lipsync).