A Wireshark View of RTP
December 8, 2014 · by Andrew Prokop · in Real-Time Protocol, RTP, SIP
Despite the fact that we’ve entered the holiday season, the weeks between Thanksgiving and New Year’s
are proving to be some of the busiest of the year. Last week I was in Tampa, and this week I travel to Salt
Lake City and Phoenix. It doesn’t stop there, though. Next Monday, I fly to Detroit for a three-day SIP
engagement. While I am excited that people are that interested in hearing me speak, I am not thrilled
about being away from home for half of December.
Okay, now that I’ve gotten that out of my system, let’s get on to today’s subject: a Wireshark view of
the Real-time Transport Protocol (RTP).
As I am sure you already know, SIP is a signaling protocol. While it is certainly responsible for
establishing media connections, it is not itself a media protocol. It leaves that to the Session Description
Protocol (SDP) and RTP. SDP describes the media, and RTP transmits it.
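To make that division of labor concrete, here is a minimal Python sketch of the hand-off from SDP to RTP: the SDP body carried in a SIP message tells the far end which address and port to send RTP to, and which payload types it may use. The function name and the addresses and ports in the sample offer are hypothetical, and the sketch looks only at the c=, m=audio and a=rtpmap lines, so treat it as an illustration rather than a real SDP parser.

```python
def rtp_target_from_sdp(sdp: str):
    """Return (address, port, {payload type: rtpmap encoding}) for the audio stream.

    Hypothetical helper: only the c=, m=audio and a=rtpmap lines are examined.
    A real SDP parser must also handle multiple media sections, IPv6, etc.
    """
    address, port, payload_types, rtpmap = None, None, [], {}
    for line in sdp.splitlines():
        if line.startswith("c=IN IP4 "):
            address = line.split()[2]                 # c=IN IP4 <address>
        elif line.startswith("m=audio "):
            fields = line[2:].split()                 # audio <port> RTP/AVP <pt> <pt> ...
            port = int(fields[1])
            payload_types = [int(pt) for pt in fields[3:]]
        elif line.startswith("a=rtpmap:"):
            pt, encoding = line[len("a=rtpmap:"):].split(maxsplit=1)
            rtpmap[int(pt)] = encoding                # e.g. 18 -> "G729/8000"
    return address, port, {pt: rtpmap.get(pt) for pt in payload_types}


# Hypothetical SDP offer (192.0.2.0/24 is a documentation address range).
offer = "\r\n".join([
    "v=0",
    "o=alice 2890844526 2890844526 IN IP4 192.0.2.10",
    "s=-",
    "c=IN IP4 192.0.2.10",
    "t=0 0",
    "m=audio 49170 RTP/AVP 0 18",
    "a=rtpmap:0 PCMU/8000",
    "a=rtpmap:18 G729/8000",
])
print(rtp_target_from_sdp(offer))
# ('192.0.2.10', 49170, {0: 'PCMU/8000', 18: 'G729/8000'})
```

Once the offer and answer have been exchanged, SIP and SDP step aside and the media itself flows as RTP to that address and port.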
RTP is a datagram protocol that is nearly always carried in UDP (User Datagram Protocol) packets. This
means that RTP is an unreliable protocol. A sender sends an RTP packet without any assurance that the
packet will ever be received. Unreliable also means that even if a packet is received by the far end, the
sender never learns whether that packet was corrupted during transmission. RTP makes a best-effort
attempt to send each packet and hopes that it arrives. There are no retransmissions for lost or dropped packets.
Of course, it doesn’t make sense to retransmit real-time media. Once a voice or video stream has begun,
you can’t go backwards in time. The receiver decodes and plays what it receives as it receives it.
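Because each RTP packet carries a 16-bit sequence number, a receiver can tell when a packet is missing, but it doesn’t wait for it. The sketch below assumes G.711 μ-law with 20 ms packets (160 bytes each) and uses the simplest possible policy: play silence in place of a lost packet and move on. Real endpoints use jitter buffers and smarter packet loss concealment; the function and constant names here are hypothetical.

```python
FRAME_BYTES = 160                       # assumption: G.711 mu-law, 20 ms per packet
MULAW_SILENCE = b"\xff" * FRAME_BYTES   # 0xFF encodes a zero sample in mu-law


def playout(received: dict[int, bytes], first_seq: int, count: int) -> list[bytes]:
    """Return one frame per expected sequence number, never waiting for resends.

    Packets that did not arrive are simply replaced with silence.
    """
    frames = []
    for i in range(count):
        seq = (first_seq + i) & 0xFFFF   # RTP sequence numbers are 16 bits and wrap
        frames.append(received.get(seq, MULAW_SILENCE))
    return frames


# Sequence number 0 was lost in transit; the receiver plays silence in its place.
arrived = {65535: b"\x10" * FRAME_BYTES, 1: b"\x20" * FRAME_BYTES}
out = playout(arrived, first_seq=65535, count=3)
print([frame == MULAW_SILENCE for frame in out])   # [False, True, False]
```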
Note: RTP isn’t limited to just SIP. H.323 also uses RTP for transmitting media.
Codecs and packetization intervals (how many milliseconds of media each packet carries) determine the
number of packets that make up a voice or video conversation, but in all cases, there will be a lot of them.
It takes as few as five SIP messages to establish a voice call, but that call might generate thousands and
thousands of RTP packets. The longer the conversation, the more packets all parties send.
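A back-of-the-envelope calculation shows the scale. Assuming 20 ms packetization (50 packets per second), an audio-only call, and no silence suppression, each party sends one packet every 20 ms. This hypothetical helper does the arithmetic:

```python
def packets_per_direction(call_seconds: int, ptime_ms: int = 20) -> int:
    """RTP packets one party sends: one per packetization interval (ptime)."""
    return call_seconds * 1000 // ptime_ms


one_way = packets_per_direction(180)   # a three-minute call, 20 ms packets
print(one_way, 2 * one_way)            # 9000 18000 (both parties sending)
```

So a three-minute two-party conversation produces roughly 18,000 RTP packets, compared with a handful of SIP messages to set it up.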
An RTP packet includes the following fields:
Sequence Number: This 16-bit number identifies each RTP packet sent. The sender increments it by one for
each new packet, starting from a random value and wrapping around after 65535, so the receiver can detect
lost or out-of-order packets.
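Timestamp: The timestamp allows the receiver to play back the packets at the appropriate intervals. It is a
32-bit value counted in units of the media’s sampling clock, so for 8 kHz audio such as G.711 it advances by
160 with each 20 ms packet.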
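Payload Type: This seven-bit value identifies the format of the media carried by RTP, which in practice
means the codec. For instance, this is where G.711, G.729, or H.264 is indicated. G.711 and G.729 have
static values (0 for G.711 μ-law, 8 for G.711 A-law, and 18 for G.729), while H.264 uses a dynamic value
between 96 and 127 that SDP maps to the codec.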
RTP Payload: This is the media itself, and the amount of data sent depends on the codec and the
packetization interval. For example, G.729 (8 kbps) yields 20 bytes for a 20 ms voice payload, while
G.711 (64 kbps) with that same 20 ms interval yields 160 bytes. The important thing to realize is that any
codec’s data (G.729a, G.711, iLBC, etc.) will be contained here.
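To see how these fields sit in the packet, here is a minimal Python sketch that builds and parses the 12-byte fixed RTP header defined in RFC 3550, using only the standard struct module. The class and function names are hypothetical; build_rtp writes only the fixed header (no CSRC list, header extension, or padding), and payload_bytes assumes a constant-bit-rate codec such as G.711 or G.729.

```python
import struct
from dataclasses import dataclass

RTP_VERSION = 2
FIXED_HEADER = struct.Struct("!BBHII")  # V/P/X/CC, M/PT, sequence, timestamp, SSRC


@dataclass
class RtpPacket:
    payload_type: int   # 7 bits, e.g. 0 = G.711 mu-law (PCMU)
    sequence: int       # 16 bits, +1 per packet
    timestamp: int      # 32 bits, in sampling-clock units
    ssrc: int           # 32-bit stream identifier
    marker: bool
    payload: bytes


def payload_bytes(bitrate_bps: int, ptime_ms: int) -> int:
    """Payload size for a constant-bit-rate codec: bits/s * seconds / 8."""
    return bitrate_bps * ptime_ms // 8000


def build_rtp(pkt: RtpPacket) -> bytes:
    """Serialize with only the 12-byte fixed header (no CSRCs, extension, or padding)."""
    byte0 = RTP_VERSION << 6
    byte1 = (int(pkt.marker) << 7) | (pkt.payload_type & 0x7F)
    return FIXED_HEADER.pack(byte0, byte1, pkt.sequence & 0xFFFF,
                             pkt.timestamp & 0xFFFFFFFF, pkt.ssrc) + pkt.payload


def parse_rtp(data: bytes) -> RtpPacket:
    """Decode the fixed header; skip any CSRCs, header extension, and padding."""
    if len(data) < FIXED_HEADER.size:
        raise ValueError("shorter than the 12-byte RTP fixed header")
    byte0, byte1, seq, ts, ssrc = FIXED_HEADER.unpack_from(data)
    if byte0 >> 6 != RTP_VERSION:
        raise ValueError("not an RTP version 2 packet")
    offset = FIXED_HEADER.size + 4 * (byte0 & 0x0F)   # CC: number of 32-bit CSRCs
    if byte0 & 0x10:                                  # X bit: header extension present
        _profile, words = struct.unpack_from("!HH", data, offset)
        offset += 4 + 4 * words
    end = len(data)
    if byte0 & 0x20:                                  # P bit: last byte = padding length
        end -= data[-1]
    return RtpPacket(byte1 & 0x7F, seq, ts, ssrc, bool(byte1 & 0x80), data[offset:end])


def next_packet(prev: RtpPacket, payload: bytes, samples: int) -> RtpPacket:
    """Sequence number +1 and timestamp +samples, each wrapping at its field width."""
    return RtpPacket(prev.payload_type, (prev.sequence + 1) & 0xFFFF,
                     (prev.timestamp + samples) & 0xFFFFFFFF, prev.ssrc, False, payload)


frame = payload_bytes(64000, 20)                       # G.711: 160 bytes per 20 ms
first = RtpPacket(0, 1000, 8000, 0x1234ABCD, True, b"\xff" * frame)
second = next_packet(first, b"\xff" * frame, samples=160)  # 8 kHz * 20 ms
wire = build_rtp(second)
decoded = parse_rtp(wire)
print(len(wire), decoded.sequence, decoded.timestamp)  # 172 1001 8160
```

This is also why so many RTP packets begin with the byte 0x80: version 2, with no padding, extension, or CSRCs.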
The following is an example of an RTP packet carrying G.711 data, sent during a simple point-to-point audio call.
Wireshark makes understanding the packet extremely simple. It can even play back the RTP packets,
allowing you to recreate a captured conversation. Of course, this works only because we haven’t encrypted
the media with Secure RTP (SRTP). Wireshark cannot decode or play back SRTP media, since the payload is encrypted.
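Playing back a captured G.711 call boils down to putting the payloads back in order and handing them to a decoder. The sketch below illustrates that idea only; it is not how Wireshark implements it. It assumes payload type 0 (G.711 μ-law), assumes the payloads have already been extracted along with their sequence numbers, and packs them into a Sun .au file, a format that stores μ-law samples directly. The function name is hypothetical. If the stream were SRTP, those payload bytes would be ciphertext and the result would play as noise.

```python
import struct

AU_MAGIC = b".snd"
AU_ENCODING_MULAW = 1      # .au encoding code for 8-bit G.711 mu-law
AU_HEADER_SIZE = 24


def mulaw_payloads_to_au(packets: list[tuple[int, bytes]], sample_rate: int = 8000) -> bytes:
    """Concatenate G.711 mu-law RTP payloads in sequence-number order into .au file bytes.

    packets is a list of (sequence number, payload) pairs. Wraparound of the
    16-bit sequence number and lost packets are ignored to keep this short.
    """
    audio = b"".join(payload for _seq, payload in sorted(packets))
    # .au header fields (big-endian): magic, data offset, data size,
    # encoding, sample rate, channel count
    header = struct.pack(">4sIIIII", AU_MAGIC, AU_HEADER_SIZE, len(audio),
                         AU_ENCODING_MULAW, sample_rate, 1)
    return header + audio


captured = [(1002, b"\x20" * 160), (1000, b"\x00" * 160), (1001, b"\x10" * 160)]
au = mulaw_payloads_to_au(captured)
print(len(au))  # 504: 24-byte header + 3 * 160 bytes of audio
```

Writing the returned bytes to a file such as call.au gives something most audio tools can open.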
That’s really all there is to it. You need a signaling protocol like SIP to establish a media connection, but
RTP does the heavy lifting of moving digitized data between all the parties in a multimedia call.