6Transport-Part2
Recall: Components of a solution for reliable transport
❖ Checksums (for error detection)
❖ Timers (for loss detection)
❖ Acknowledgments
  ▪ Cumulative
  ▪ Selective
❖ Sequence numbers (duplicates, windows)
❖ Sliding windows (for efficiency)
  ▪ Go-Back-N (GBN)
  ▪ Selective Repeat (SR)
What does TCP do?
Many of our previous ideas, but some key differences
❖ Checksum
TCP Header
Source port | Destination port
Sequence number
Acknowledgment
HdrLen | 0 | Flags | Receive window
Checksum | Urgent pointer
Options (variable)
Data
❖ Checksum: computed over header and data (same as UDP)
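The fixed part of the header above is 20 bytes, which can be packed and unpacked with Python's `struct` module. This is a minimal sketch, assuming no options and leaving the checksum at 0; the helper names `build_header` and `parse_header` are hypothetical. HdrLen counts 32-bit words, so 5 words means 20 bytes.

```python
import struct

# Fixed 20-byte TCP header (no options), network byte order:
# src port, dst port, seq, ack, HdrLen+flags, receive window, checksum, urgent pointer
TCP_HEADER = struct.Struct("!HHIIHHHH")

SYN, ACK = 0x02, 0x10  # flag bits


def build_header(src_port, dst_port, seq, ack, flags, rwnd, hdr_len_words=5):
    # HdrLen occupies the top 4 bits and counts 32-bit words (5 words = 20 bytes)
    len_and_flags = (hdr_len_words << 12) | (flags & 0x1FF)
    # Checksum left as 0 in this sketch; a real stack computes it over header and
    # data (plus an IP pseudo-header)
    return TCP_HEADER.pack(src_port, dst_port, seq % 2**32, ack % 2**32,
                           len_and_flags, rwnd, 0, 0)


def parse_header(raw):
    src, dst, seq, ack, len_and_flags, rwnd, checksum, urgent = TCP_HEADER.unpack(raw[:20])
    return {
        "src_port": src, "dst_port": dst, "seq": seq, "ack": ack,
        "hdr_len_bytes": (len_and_flags >> 12) * 4,
        "flags": len_and_flags & 0x1FF,
        "rwnd": rwnd, "checksum": checksum, "urgent": urgent,
    }


hdr = build_header(12345, 80, seq=1000, ack=0, flags=SYN, rwnd=65535)
print(len(hdr), parse_header(hdr)["seq"])  # 20 1000
```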
TCP “Stream of Bytes” Service …
Application @ Host A: Byte 0, Byte 1, Byte 2, Byte 3, …, Byte 80
Application @ Host B: Byte 0, Byte 1, Byte 2, Byte 3, …, Byte 80
… Provided Using TCP “Segments”
[Figure: Host A’s bytes (Byte 0, Byte 1, Byte 2, Byte 3, …) are carried as TCP Data in segments and arrive at Host B as the same stream (Byte 0 … Byte 80)]
TCP Maximum Segment Size
[Figure: an IP packet of at most MTU bytes: IP Hdr followed by IP Data, where the IP Data is a TCP Hdr followed by TCP Data (segment)]
❖ IP packet
  ▪ No bigger than Maximum Transmission Unit (MTU) of link layer
  ▪ E.g., up to 1500 bytes with Ethernet
❖ TCP packet
  ▪ IP packet with a TCP header and data inside
  ▪ TCP header ≥ 20 bytes long
❖ TCP segment
  ▪ No more than Maximum Segment Size (MSS) bytes
  ▪ E.g., up to 1460 consecutive bytes from the stream
  ▪ MSS = MTU – 20 (minimum IP header) – 20 (minimum TCP header)
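The MSS formula is simple arithmetic. The sketch below assumes IPv4 and TCP headers without options (20 bytes each); `mss_for_mtu` and `segment_sizes` are illustrative helper names, not a library API.

```python
IP_HEADER_MIN = 20   # bytes, minimum IPv4 header (no options)
TCP_HEADER_MIN = 20  # bytes, minimum TCP header (no options)


def mss_for_mtu(mtu):
    """Largest TCP payload that fits in one IP packet on a link with this MTU."""
    return mtu - IP_HEADER_MIN - TCP_HEADER_MIN


def segment_sizes(stream_len, mss):
    """Split a byte stream of stream_len bytes into payload sizes of at most mss."""
    full, rest = divmod(stream_len, mss)
    return [mss] * full + ([rest] if rest else [])


print(mss_for_mtu(1500))                       # 1460 (Ethernet)
print(segment_sizes(4000, mss_for_mtu(1500)))  # [1460, 1460, 1080]
```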
Sequence Numbers
[Figure: Host A’s byte stream, starting at the ISN (initial sequence number); a segment begins k bytes into the stream]
Sequence number = 1st byte in segment = ISN + k
Sequence numbers:
• byte stream “number” of first byte in segment’s data
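The rule “sequence number = ISN + k” can be sketched as a function that cuts a byte stream into segments. This assumes the slides' convention that the first data byte is numbered ISN (the SYN is not modeled) and uses the fact that TCP sequence numbers are 32 bits and wrap; `segment_stream` is a hypothetical name.

```python
def segment_stream(data, isn, mss):
    """Return (sequence number, payload) pairs for a byte stream.

    Each segment's sequence number is ISN + k, where k is the offset of its
    first byte in the stream; sequence numbers are 32 bits and wrap around.
    """
    segments = []
    for k in range(0, len(data), mss):
        seq = (isn + k) % 2**32
        segments.append((seq, data[k:k + mss]))
    return segments


for seq, payload in segment_stream(b"x" * 120, isn=100, mss=50):
    print(seq, len(payload))
# 100 50
# 150 50
# 200 20
```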
Sequence & Ack Numbers
[Figure: Host A and Host B byte streams, with the ISN (initial sequence number) and offset k marked]
ACKing and Sequence Numbers
An Example
Host A → Host B (ISN=100)
Seq=???, Data=50
Seq=100, Data=50
Piggybacking
❖ So far, we’ve assumed distinct “sender” and “receiver” roles
❖ Usually both sides of a connection (i.e., the application processes) send some data
  ▪ request/response is a common pattern
[Figure: Client/Server exchanges, Without Piggybacking vs. With Piggybacking]
Example
Note: Connection establishment not shown. Alice’s end point selects the initial sequence number as 0, while Bob’s end point selects the initial sequence number as 10.
Another Example
Note: Connection establishment not shown. Alice’s end point selects the initial sequence number as 0, while Bob’s end point selects the initial sequence number as 10.
[Figure: Alice/Bob exchange with the labels “Seq = ?, 2 KBytes of data, ACK = ?”, “Seq = 2149” and “ACK = 1024 + 1024 = 2048”]
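To make the numbering concrete, here is a minimal sketch that tracks both endpoints' sequence and ACK numbers for a hypothetical Alice/Bob exchange (not the one in the figures). It assumes the slides' convention that the first data byte carries the ISN (connection setup not modeled) and in-order, loss-free delivery; the `Endpoint` class and the 100-byte reply are illustrative. Every segment carries an ACK, so Bob's data segment piggybacks his acknowledgment.

```python
class Endpoint:
    """Tracks one side's sequence/ACK numbers (connection setup not modeled)."""

    def __init__(self, name, isn, peer_isn):
        self.name = name
        self.next_seq = isn             # number of the next byte this side will send
        self.next_expected = peer_isn   # next byte expected from the peer = ACK value

    def send(self, nbytes):
        seg = {"from": self.name, "seq": self.next_seq,
               "ack": self.next_expected, "len": nbytes}
        self.next_seq += nbytes
        return seg

    def receive(self, seg):
        # in-order delivery assumed: the cumulative ACK moves past the segment's data
        if seg["seq"] == self.next_expected:
            self.next_expected += seg["len"]


alice = Endpoint("Alice", isn=0, peer_isn=10)
bob = Endpoint("Bob", isn=10, peer_isn=0)
trace = []
for sender, receiver, n in [(alice, bob, 1024), (bob, alice, 100),
                            (alice, bob, 1024), (bob, alice, 0)]:
    seg = sender.send(n)
    receiver.receive(seg)
    trace.append(seg)

for seg in trace:
    print(seg)
# last line: {'from': 'Bob', 'seq': 110, 'ack': 2048, 'len': 0}
```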
TCP seq. numbers, ACKs
[Figure: Host A and Host B exchanging segments, e.g., Seq=g, ACK=h]
Loss with cumulative ACKs
❖ Assume the fifth packet (seq. no. 500) is lost, but no others
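The effect of one loss on cumulative ACKs can be simulated. This sketch assumes 100-byte segments numbered from 100 (so the fifth one starts at 500), eight segments in total, one ACK per arriving segment (no delayed ACKs), ACK = next expected byte, and a receiver that buffers out-of-order data; `cumulative_acks` is a hypothetical helper.

```python
def cumulative_acks(first_seq, seg_len, n_segments, lost):
    """ACK numbers a receiver returns, one per segment that arrives."""
    next_expected = first_seq
    buffered = set()              # out-of-order segments held by the receiver
    acks = []
    for i in range(n_segments):
        seq = first_seq + i * seg_len
        if seq in lost:
            continue              # never arrives, so no ACK is generated
        if seq == next_expected:
            next_expected += seg_len
            while next_expected in buffered:   # fill in from the buffer
                buffered.remove(next_expected)
                next_expected += seg_len
        else:
            buffered.add(seq)
        acks.append(next_expected)
    return acks


print(cumulative_acks(100, 100, 8, lost={500}))
# [200, 300, 400, 500, 500, 500, 500]
```

Every segment after the gap produces another ACK for 500, which is exactly the duplicate-ACK pattern that fast retransmit relies on.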
TCP round trip time, timeout
Q: how to set TCP timeout value?
  ▪ longer than RTT, but RTT varies!
  ▪ too short: premature timeout, unnecessary retransmissions
  ▪ too long: slow reaction to segment loss
Q: how to estimate RTT?
  ▪ SampleRTT: measured time from segment transmission until ACK receipt
    • ignore retransmissions
  ▪ SampleRTT will vary, want estimated RTT “smoother”
    • average several recent measurements, not just current SampleRTT
TCP round trip time, timeout
EstimatedRTT = (1 - α)*EstimatedRTT + α*SampleRTT
  ▪ exponential weighted moving average (EWMA)
  ▪ influence of past sample decreases exponentially fast
  ▪ typical value: α = 0.125
[Figure: RTT (milliseconds) vs. time (seconds) for gaia.cs.umass.edu to fantasia.eurecom.fr, plotting SampleRTT and EstimatedRTT]
Practice Problem:
http://wps.pearsoned.com/ecs_kurose_compnetw_6/216/55463/14198700.cw/index.html
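The EWMA update can be written as a short loop. This is a minimal sketch, assuming the first SampleRTT seeds the estimate (a common choice, not stated on the slide); `estimate_rtt` is a hypothetical name.

```python
def estimate_rtt(samples, alpha=0.125):
    """Return EstimatedRTT after each SampleRTT (first sample seeds the estimate)."""
    estimated = None
    history = []
    for sample in samples:
        if estimated is None:
            estimated = sample        # seed with the first measurement
        else:
            estimated = (1 - alpha) * estimated + alpha * sample
        history.append(estimated)
    return history


print(estimate_rtt([100, 200]))  # [100, 112.5]
```

A single 200 ms sample moves the estimate only one eighth of the way from 100 ms, which is the smoothing the slide asks for.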
TCP round trip time, timeout
Timeout interval = EstimatedRTT + 4*DevRTT
[Figure: RTT, EstimatedRTT and DevRTT; two timelines with an Original Transmission, a Retransmission and an ACK, showing that the SampleRTT could be measured from either transmission]
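The slide gives only the timeout formula. The sketch below fills in DevRTT with the standard EWMA of the deviation, DevRTT = (1 - β)*DevRTT + β*|SampleRTT - EstimatedRTT| with typical β = 0.25, initialised as in RFC 6298 (EstimatedRTT = first sample, DevRTT = half of it). Minimum and maximum timeout bounds are omitted, and samples from retransmitted segments are assumed to have been dropped already, per “ignore retransmissions”. `timeout_intervals` is a hypothetical name.

```python
def timeout_intervals(samples, alpha=0.125, beta=0.25):
    """Timeout interval (EstimatedRTT + 4*DevRTT) after each SampleRTT."""
    est = dev = None
    out = []
    for s in samples:
        if est is None:
            est, dev = s, s / 2       # first measurement (RFC 6298-style start)
        else:
            dev = (1 - beta) * dev + beta * abs(s - est)   # uses the previous EstimatedRTT
            est = (1 - alpha) * est + alpha * s
        out.append(est + 4 * dev)
    return out


print(timeout_intervals([100, 100]))  # [300.0, 250.0]
```

With steady samples, DevRTT shrinks and the timeout converges towards EstimatedRTT; with jittery samples, the 4*DevRTT safety margin grows.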
PUTTING IT
TCP ACK generation [RFC 1122, RFC 2581]
Note: You may neglect delayed ACKs in the exam unless explicitly told to consider them.
TCP: retransmission scenarios
[Figure: Host A/Host B timelines. Lost ACK: Seq=92, 8 bytes of data; ACK=100 is lost (X); after the timeout Host A resends Seq=92, 8 bytes of data and receives ACK=100. The other scenarios are labelled SendBase=92, ACK=100, ACK=120, SendBase=120, timeout, ACK=?, ACK = 92 and Seq=120, 15 bytes of data]
TCP fast retransmit
❖ if sender receives 3 additional ACKs for same data (“triple duplicate ACKs”), resend unACKed segment with smallest seq #
  ▪ likely that unACKed segment lost, so don’t wait for timeout
[Figure: Host A sends Seq=92, 8 bytes of data and Seq=100, 20 bytes of data; the second segment is lost (X); Host B keeps returning ACK=100; on receipt of three duplicate ACKs, Host A resends the segment before its timeout expires]
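The sender-side rule can be sketched as a duplicate-ACK counter. This assumes ACK = next expected byte and ignores sequence-number wraparound; `FastRetransmitSender` is a hypothetical class that records which segment it would resend instead of actually sending.

```python
class FastRetransmitSender:
    DUP_THRESHOLD = 3              # "triple duplicate ACKs"

    def __init__(self, send_base):
        self.send_base = send_base  # oldest unACKed byte
        self.dup_count = 0
        self.retransmitted = []     # seq numbers of segments resent early

    def on_ack(self, ack):
        if ack > self.send_base:    # ACK for new data
            self.send_base = ack
            self.dup_count = 0
        elif ack == self.send_base:  # duplicate ACK for old data
            self.dup_count += 1
            if self.dup_count == self.DUP_THRESHOLD:
                # resend unACKed segment with smallest seq #, without waiting for timeout
                self.retransmitted.append(self.send_base)


sender = FastRetransmitSender(send_base=92)
for ack in [100, 100, 100, 100]:   # first ACK=100 is new; the next three are duplicates
    sender.on_ack(ack)
print(sender.retransmitted)        # [100]
```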
Quiz: TCP Sequence Numbers?
A TCP sender is about to send a segment of size 100 bytes with sequence number 1234 and ack number 436. Up to (and including) which sequence number has this sender received all data from the receiver?
A. 1233
B. 436
C. 435
D. 1334
E. 536
Transport Layer Outline
3.1 transport-layer services
3.2 multiplexing and demultiplexing
3.3 connectionless transport: UDP
3.4 principles of reliable data transfer
3.5 connection-oriented transport: TCP
  ▪ segment structure
  ▪ reliable data transfer
  ▪ flow control
  ▪ connection management
3.6 principles of congestion control
3.7 TCP congestion control
TCP flow control
Q: What happens if network layer delivers data faster than application layer removes data from socket buffers?
[Figure: receiver protocol stack: the application process removes data from the TCP socket receiver buffers; below it the TCP code and IP code; the network layer delivers IP datagram payload (from sender) into the TCP socket buffers]
TCP flow control
  ▪ TCP receiver “advertises” free buffer space in rwnd field in TCP header
    • RcvBuffer size set via socket options (typical default is 4096 bytes)
    • many operating systems autoadjust RcvBuffer
  ▪ sender limits amount of unACKed (“in-flight”) data to received rwnd
[Figure: receiver-side buffering: TCP segment payloads enter a RcvBuffer holding buffered data and rwnd of free buffer space; data goes up to the application process]
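The advertisement and the sender's limit are both subtractions. In this sketch the variable names `last_byte_rcvd`, `last_byte_read`, `last_byte_sent` and `last_byte_acked` are assumed bookkeeping counters (they are not fields in the TCP header), and the 4096-byte buffer is the default mentioned above.

```python
def advertised_rwnd(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Free space in the receive buffer, which the receiver puts in the rwnd field."""
    buffered = last_byte_rcvd - last_byte_read    # arrived but not yet read by the app
    return rcv_buffer - buffered


def sendable_bytes(rwnd, last_byte_sent, last_byte_acked):
    """How many more bytes the sender may put in flight without exceeding rwnd."""
    in_flight = last_byte_sent - last_byte_acked  # unACKed ("in-flight") data
    return max(0, rwnd - in_flight)


rwnd = advertised_rwnd(4096, last_byte_rcvd=3000, last_byte_read=1000)
print(rwnd, sendable_bytes(rwnd, last_byte_sent=3500, last_byte_acked=3000))  # 2096 1596
```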
TCP flow control
flow control: # bytes receiver willing to accept
TCP flow control
❖ What if rwnd = 0?
  ▪ Sender would stop sending data
  ▪ Eventually the receive buffer would have space when the application process reads some bytes
  ▪ But how does the receiver advertise the new rwnd to the sender?
❖ Sender keeps sending TCP segments with one data byte to the receiver
❖ These segments are dropped but acknowledged by the receiver with a zero-window size
❖ Eventually, when the buffer empties, a non-zero window is advertised
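The probing loop can be simulated with a toy receive buffer. This sketch assumes one probe per “round” and that a probe arriving when space exists is kept like any other byte; the `Receiver` class and `probe_until_open` helper are hypothetical.

```python
class Receiver:
    """Receive buffer whose free space is advertised as rwnd."""

    def __init__(self, capacity, buffered=0):
        self.capacity = capacity
        self.buffered = buffered        # bytes waiting for the application to read

    def rwnd(self):
        return self.capacity - self.buffered

    def on_probe(self):
        """Handle a one-byte probe and return the rwnd carried in the ACK."""
        if self.rwnd() > 0:
            self.buffered += 1          # room available: keep the byte
        # otherwise the byte is dropped, but the ACK still reports rwnd (0)
        return self.rwnd()

    def app_read(self, n):
        self.buffered -= min(n, self.buffered)


def probe_until_open(receiver, app_reads):
    """Send one probe per entry of app_reads until rwnd > 0.

    app_reads[i] is how many bytes the application reads before probe i arrives.
    Returns (number of probes sent, first non-zero rwnd or 0).
    """
    probes = 0
    for n in app_reads:
        receiver.app_read(n)
        probes += 1
        rwnd = receiver.on_probe()
        if rwnd > 0:
            return probes, rwnd
    return probes, 0


full = Receiver(capacity=100, buffered=100)
print(probe_until_open(full, [0, 0, 50]))  # (3, 49)
```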
TCP connection management
before exchanging data, sender/receiver “handshake”:
  ▪ agree to establish connection (each knowing the other willing to establish connection)
  ▪ agree on connection parameters (e.g., starting seq #s)
[Figure: two hosts, each with application and network layers]
2-way handshake scenarios
[Figure: client chooses x and sends req_conn(x); server enters ESTAB and replies acc_conn(x); client enters ESTAB and sends data(x+1); server accepts data(x+1) and returns ACK(x+1); connection x completes]
No problem!
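2-way handshake scenarios
[Figure: client chooses x and sends req_conn(x), then retransmits req_conn(x) while the first copy is still in the network; server enters ESTAB and replies acc_conn(x); client enters ESTAB; connection x completes, the client terminates and the server forgets x; the old req_conn(x) then arrives, and the server enters ESTAB and sends acc_conn(x)]
Problem: half open connection! (no client)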
2-way handshake scenarios
[Figure: as in the previous scenario, but the client also retransmits data(x+1); after connection x completes, the client terminates and the server forgets x, the old req_conn(x) and data(x+1) arrive, and the server enters ESTAB again and accepts data(x+1) a second time]
Problem: dup data accepted!
TCP 3-way handshake
(The SYN consumes 1 sequence number.)
Server (state LISTEN):
    serverSocket = socket(AF_INET,SOCK_STREAM)
    serverSocket.bind(('',serverPort))
    serverSocket.listen(1)
    connectionSocket, addr = serverSocket.accept()
Client (state LISTEN):
    clientSocket = socket(AF_INET, SOCK_STREAM)
    clientSocket.connect((serverName,serverPort))
❖ Client: choose init seq num, x; send TCP SYN msg (SYNbit=1, Seq=x); state SYNSENT
❖ Server: choose init seq num, y; send TCP SYNACK msg, acking SYN (SYNbit=1, Seq=y; ACKbit=1, ACKnum=x+1); state SYN RCVD
❖ Client: received SYNACK(x) indicates server is live; send ACK for SYNACK (ACKbit=1, ACKnum=y+1, Seq=x+1); this segment may contain client-to-server data; state ESTAB
❖ Server: received ACK(y) indicates client is live; state ESTAB
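The socket calls on the slide can be assembled into a runnable loopback sketch. The operating system performs the SYN, SYNACK, ACK exchange itself: a blocking `connect()` returns once the handshake is done, and `accept()` hands back an already-established connection. Binding to port 0 on 127.0.0.1 (so the OS picks a free port) and the helper name `handshake_demo` are assumptions for the demo.

```python
import socket
import threading


def handshake_demo(message=b"hello"):
    # Server side, as on the slide (port 0: the OS picks any free port)
    serverSocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    serverSocket.bind(("127.0.0.1", 0))
    serverSocket.listen(1)
    serverPort = serverSocket.getsockname()[1]
    received = []

    def serve():
        # accept() returns a socket for a connection whose handshake has completed
        connectionSocket, addr = serverSocket.accept()
        with connectionSocket:
            chunks = []
            while True:
                data = connectionSocket.recv(1024)
                if not data:          # empty read: the client's FIN has arrived
                    break
                chunks.append(data)
            received.append(b"".join(chunks))

    server_thread = threading.Thread(target=serve)
    server_thread.start()

    # Client side: connect() returns once the kernel has completed SYN, SYNACK, ACK
    clientSocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    clientSocket.connect(("127.0.0.1", serverPort))
    clientSocket.sendall(message)
    clientSocket.close()              # starts the FIN exchange

    server_thread.join()
    serverSocket.close()
    return received[0]


print(handshake_demo())  # b'hello'
```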
What if the SYN Packet Gets Lost?
❖ Suppose the SYN packet gets lost
  ▪ Packet is lost inside the network, or:
  ▪ Server discards the packet (e.g., it’s too busy)
TCP: closing a connection
❖ client, server each close their side of connection
  ▪ send TCP segment with FIN bit = 1
❖ respond to received FIN with ACK
  ▪ on receiving FIN, ACK can be combined with own FIN
❖ simultaneous FIN exchanges can be handled
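Normal Termination, One at a Time
(The FIN consumes 1 sequence number.)
[Figure: the server, in LAST_ACK, sends FINbit=1, seq=y and can no longer send data; the client replies ACKbit=1; ACKnum=y+1 and enters TIMED_WAIT, a timed wait for 2*max segment lifetime, before reaching CLOSED; the server reaches CLOSED on receiving the ACK]
TIMED_WAIT: can retransmit ACK if last ACK is lost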
Normal Termination, Both Together
client state: ESTAB; server state: ESTAB
❖ clientSocket.close(): the client enters FIN_WAIT_1 and sends FINbit=1, seq=x; it can no longer send but can receive data
❖ the server enters CLOSE_WAIT and replies with FIN + ACK together (ACKbit=1; ACKnum=x+1; FINbit=1, seq=y), entering LAST_ACK; it can no longer send data
❖ the client enters TIMED_WAIT, then CLOSED
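Simultaneous Closure
client state: ESTAB; server state: ESTAB
❖ both sides close at once (clientSocket.close() on the client): each enters FIN_WAIT_1 and can no longer send but can receive data; the client sends FINbit=1, seq=x and the server sends FINbit=1, seq=y
❖ each side receives the other’s FIN, sends an ACK and enters CLOSING: the server sends ACKbit=1, ACKnum=x+1 and the client sends ACKbit=1, ACKnum=y+1
❖ on receiving the ACK of its own FIN, each side enters TIMED_WAIT, then CLOSED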
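The three termination cases can be captured in one transition table. This is a sketch, not a full TCP state machine: the event names are hypothetical labels, and FIN_WAIT_2 (the “wait for server close” state, named in RFC 793 but not on these slides) is included so the one-at-a-time case has somewhere to wait.

```python
# (state, event) -> next state. Event names are hypothetical labels for this sketch.
CLOSE_TRANSITIONS = {
    # side that closes first
    ("ESTAB", "app_close"): "FIN_WAIT_1",         # send FIN
    ("FIN_WAIT_1", "rcv_ack"): "FIN_WAIT_2",      # own FIN ACKed; wait for server close
    ("FIN_WAIT_1", "rcv_fin_ack"): "TIMED_WAIT",  # FIN + ACK arrive together; send ACK
    ("FIN_WAIT_1", "rcv_fin"): "CLOSING",         # simultaneous closure; send ACK
    ("FIN_WAIT_2", "rcv_fin"): "TIMED_WAIT",      # send ACK
    ("CLOSING", "rcv_ack"): "TIMED_WAIT",
    ("TIMED_WAIT", "timeout_2msl"): "CLOSED",     # waited 2 * max segment lifetime
    # side that closes second
    ("ESTAB", "rcv_fin"): "CLOSE_WAIT",           # send ACK
    ("CLOSE_WAIT", "app_close"): "LAST_ACK",      # send FIN
    ("LAST_ACK", "rcv_ack"): "CLOSED",
}


def close_path(events, state="ESTAB"):
    """Apply events in order and return every state visited."""
    path = [state]
    for event in events:
        state = CLOSE_TRANSITIONS[(state, event)]  # KeyError: event not valid here
        path.append(state)
    return path


print(close_path(["app_close", "rcv_fin", "rcv_ack", "timeout_2msl"]))
# ['ESTAB', 'FIN_WAIT_1', 'CLOSING', 'TIMED_WAIT', 'CLOSED']
```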
Abrupt Termination
[Figure: timeline between hosts A and B with SYN, ACK, Data and RST segments]
TCP SYN Cookie
❖ On receipt of SYN, server does not create connection state
❖ It creates an initial sequence number (init_seq) that is a hash of the source & dest IP addresses and port numbers of the SYN packet (a secret key is used for the hash)
  ▪ Replies with a SYN ACK containing init_seq
  ▪ Server does not need to store this sequence number
❖ If the original SYN is genuine, an ACK will come back
  ▪ Same hash function is run on the same header fields to get the initial sequence number (init_seq)
  ▪ Checks whether the ACK number equals init_seq + 1
  ▪ Only creates connection state if the above is true
❖ If the SYN is fake, no harm is done since no state was created
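A keyed hash of the addresses and ports is enough to illustrate the idea. This sketch uses HMAC-SHA256 truncated to 32 bits; the key handling and the helper names `syn_cookie` and `ack_is_valid` are assumptions, and deployed SYN-cookie schemes also fold in a time counter and an encoded MSS, which are omitted here.

```python
import hashlib
import hmac
import os

SECRET_KEY = os.urandom(16)   # server-only secret (hypothetical key management)


def syn_cookie(src_ip, src_port, dst_ip, dst_port, key=SECRET_KEY):
    """init_seq derived from the SYN's addresses/ports; nothing is stored."""
    msg = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")   # 32-bit sequence number


def ack_is_valid(ack_num, src_ip, src_port, dst_ip, dst_port, key=SECRET_KEY):
    """Recompute the cookie from the ACK's header fields and check ACK == init_seq + 1."""
    expected = (syn_cookie(src_ip, src_port, dst_ip, dst_port, key) + 1) % 2**32
    return ack_num == expected


init_seq = syn_cookie("10.0.0.1", 5555, "10.0.0.2", 80)
print(ack_is_valid((init_seq + 1) % 2**32, "10.0.0.1", 5555, "10.0.0.2", 80))  # True
```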
Quiz: TCP Connection Management
Assume that one end point of the TCP connection sends a FIN segment. If it never receives an ACK, what should it do?
C. Transmit an ACK
D. Start crying
Principles of congestion control
congestion:
❖ informally: “too many sources sending too much data too fast for network to handle”
❖ different from flow control!
❖ manifestations:
  ▪ lost packets (buffer overflow at routers)
  ▪ long delays (queueing in router buffers)
❖ a top-10 problem!
Congestion
[Figure: cartoon of a router whose buffer overflows into the trash; the router says “Ugh. I so can’t deal with this right now!”]
Router’s buffer: incoming rate is faster than outgoing link can support.
Congestion Collapse
[Figure: senders S1 and S2 sharing Link A; S2’s traffic continues over Link B]
❖ One sender (S1) starts, but there’s still capacity at link A.
❖ A second sender, S2, starts as well.
❖ Unrelated traffic passes through and congests link B.
❖ S2’s traffic is being dropped at Link B, so it starts retransmitting on top of what it was sending.
  ▪ (This is very bad. S2 is now sending lots of traffic over link A that has no hope of crossing link B.)
❖ Increased traffic from S2 causes Link A to become congested. S1 starts retransmitting.
❖ Congestion propagates backwards…
Without congestion control
congestion:
❖ Increases delays
  ▪ If delays > RTO, sender retransmits
❖ Increases loss rate
  ▪ Dropped packets also retransmitted
❖ Increases retransmissions, many unnecessary
  ▪ Wastes capacity of traffic that is never delivered
  ▪ Increase in load results in decrease in useful work done
❖ Increases congestion, cycle continues …
Cost of Congestion
[Figure: Throughput vs. Load and Delay vs. Load, marking the knee, the cliff (packet loss) and congestion collapse]
❖ Knee: point after which
  ▪ Throughput increases slowly
  ▪ Delay increases fast
❖ Cliff: point after which
  ▪ Throughput starts to drop to zero (congestion collapse)
  ▪ Delay approaches infinity
Approaches towards congestion control
TCP’s Approach in a Nutshell
❖ TCP connection maintains a window
  ▪ Controls number of packets in flight
❖ rate ≈ cwnd / RTT bytes/sec
[Figure: sender’s byte stream marked “last byte ACKed” and “last byte sent”; the bytes between them are sent, not-yet ACKed (“in-flight”) and span cwnd]
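The window-to-rate relation and the in-flight limit can be checked numerically. This is a sketch with hypothetical helper names, assuming an MSS of 1460 bytes and a window measured in bytes.

```python
def approx_rate(cwnd_bytes, rtt_seconds):
    """Rough sending rate: about one window of data per RTT."""
    return cwnd_bytes / rtt_seconds


def may_send(last_byte_sent, last_byte_acked, cwnd_bytes, next_len):
    """Keep sent-but-not-yet-ACKed ("in-flight") bytes within cwnd."""
    in_flight = last_byte_sent - last_byte_acked
    return in_flight + next_len <= cwnd_bytes


MSS = 1460
print(approx_rate(10 * MSS, 0.5))          # 29200.0 bytes/sec
print(may_send(5000, 0, 10 * MSS, MSS))    # True: 5000 + 1460 <= 14600
```

Doubling cwnd at a fixed RTT doubles the rate, which is why TCP controls its sending rate by adjusting the window.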
CWND
Two Basic Questions
Detecting Congestion: Infer Loss
❖ Duplicate ACKs: isolated loss
  ▪ dup ACKs indicate network capable of delivering some segments
RECAP: TCP fast retransmit (dup ACKs)
[Figure: Host A receives ACK=100 four times and resends Seq=100, 20 bytes of data before the timeout]
❖ Basic structure:
  ▪ Upon receipt of ACK (of new data): increase rate
  ▪ Upon detection of loss: decrease rate
TCP Slow Start (Bandwidth discovery)
❖ when connection begins, increase rate exponentially until first loss event:
  ▪ initially cwnd = 1 MSS
  ▪ double cwnd every RTT (all ACKs)
  ▪ Simpler implementation achieved by incrementing cwnd for every ACK received
    • cwnd += 1 for each ACK
❖ summary: initial rate is slow but ramps up exponentially fast
[Figure: Host A/Host B timeline: one segment in the first RTT, then two segments, then four segments]
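The per-ACK rule produces doubling per RTT. This sketch assumes cwnd in MSS units, one ACK per segment (no delayed ACKs) and no loss; `slow_start` is a hypothetical helper.

```python
def slow_start(rtts, cwnd=1):
    """cwnd (in MSS) at the start of each RTT, growing by 1 MSS per ACK."""
    history = [cwnd]
    for _ in range(rtts):
        acks_this_rtt = cwnd          # one ACK per segment sent this RTT
        for _ in range(acks_this_rtt):
            cwnd += 1                 # cwnd += 1 for each ACK
        history.append(cwnd)
    return history


print(slow_start(3))  # [1, 2, 4, 8]
```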
Adjusting to Varying Bandwidth
❖ Slow start gave an estimate of available bandwidth
AIMD
❖ approach: sender increases transmission rate (window size), probing for usable bandwidth, until another congestion event occurs
  ▪ additive increase: increase cwnd by 1 MSS every RTT until loss detected
    • For each successful RTT (all ACKs), cwnd = cwnd + 1 (in multiples of MSS)
    • Simple implementation: for each ACK, cwnd = cwnd + 1/cwnd (since, with cwnd measured in MSS, there are cwnd packets in a window)
  ▪ multiplicative decrease: cut cwnd in half after loss
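Both halves of AIMD fit in a few lines. This sketch assumes cwnd in MSS units, one ACK per full segment in the window at the start of each RTT, and a floor of 1 MSS after a decrease; `aimd` and its event labels are hypothetical.

```python
def aimd(events, cwnd=1.0):
    """events: 'rtt' = one loss-free RTT of ACKs, 'loss' = loss detected."""
    trace = [cwnd]
    for ev in events:
        if ev == "rtt":
            for _ in range(int(cwnd)):    # one ACK per full segment in the window
                cwnd += 1 / cwnd          # per-ACK additive increase
        elif ev == "loss":
            cwnd = max(cwnd / 2, 1.0)     # multiplicative decrease (1 MSS floor assumed)
        trace.append(cwnd)
    return trace


print(aimd(["loss"], cwnd=16.0))  # [16.0, 8.0]
```

Because cwnd grows during the RTT, the per-ACK rule adds slightly less than 1 MSS per RTT (about 0.95 MSS starting from cwnd = 10), which is close enough to “+1 per RTT” in practice.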
Leads to the TCP “Sawtooth”
[Figure: Window vs. t: exponential “slow start”, then a sawtooth pattern with a drop at each Loss]
Slow-Start vs. AIMD
❖ When does a sender stop Slow-Start and start Congestion Avoidance?
Implementation
❖ State at sender
  ▪ CWND (initialized to a small constant)
    • the slides use multiples of MSS
  ▪ ssthresh (initialized to a large constant)
  ▪ [Also dupACKcount and timer, as before]
❖ Events
  ▪ ACK (new data)
  ▪ dupACK (duplicate ACK for old data)
  ▪ Timeout
Event: ACK (new data)
❖ If CWND < ssthresh (slow start phase)
  ▪ CWND += 1
    • Hence after one RTT (all ACKs with no drops): CWND = 2 × CWND
❖ Else (“congestion avoidance” phase, additive increase)
  ▪ CWND = CWND + 1/CWND
    • Hence after one RTT (all ACKs with no drops): CWND = CWND + 1
Event: dupACK
❖ dupACKcount++
Event: TimeOut
❖ On Timeout
  ▪ ssthresh ← CWND/2
  ▪ CWND ← 1
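The three event handlers can be combined into one sender sketch. ACK and timeout follow the rules above; the slides show only “dupACKcount++” for duplicate ACKs, so the third-duplicate action here (fast retransmit, ssthresh ← CWND/2, CWND ← ssthresh) is an ASSUMPTION borrowed from common TCP Reno-style descriptions. The class name and the initial ssthresh of 64 (standing in for “a large constant”) are hypothetical.

```python
class TcpCongestionSketch:
    """Sender congestion state in MSS units (no real sending)."""

    def __init__(self, cwnd=1.0, ssthresh=64.0):
        self.cwnd = cwnd               # small initial constant
        self.ssthresh = ssthresh       # large initial constant (hypothetical value)
        self.dup_ack_count = 0

    def on_new_ack(self):
        self.dup_ack_count = 0
        if self.cwnd < self.ssthresh:
            self.cwnd += 1                 # slow start
        else:
            self.cwnd += 1 / self.cwnd     # congestion avoidance (additive increase)

    def on_dup_ack(self):
        self.dup_ack_count += 1
        if self.dup_ack_count == 3:
            # ASSUMPTION: fast retransmit and halve the window (rule not shown on the slides)
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh
            return "fast_retransmit"
        return None

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2      # ssthresh <- CWND/2
        self.cwnd = 1.0                    # CWND <- 1
        self.dup_ack_count = 0


s = TcpCongestionSketch(cwnd=1.0, ssthresh=4.0)
for _ in range(4):
    s.on_new_ack()
print(s.cwnd)  # 4.25: three slow-start increments, then one 1/CWND increment
```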
Example
[Figure: Window vs. time, marking a Timeout, a Fast Retransmission, and where SSThresh is “Set to Here”]
In the figure, how many slow start intervals can you identify?
A. 0
B. 1
C. 2
D. 3
E. 4
Quiz: TCP Timeout