
CSCI-1680

Transport Layer II
Data over TCP

Rodrigo Fonseca

Based partly on lecture notes by David Mazières, Phil Levis, and John Jannotti
Last Class

•  Introduction to TCP
–  Header format
–  Connection state diagram
•  Today: sending data

[Figure: TCP connection state diagram (CLOSED, LISTEN, SYN_RCVD, ESTABLISHED, FIN_WAIT_1, ...) with transitions such as passive open, Send/SYN, SYN/SYN+ACK, SYN+ACK/ACK, Close/FIN, FIN/ACK]
First Goal

•  We should not send more data than the receiver can take: flow control
•  When to send data?
–  Sender can delay sends to get larger segments
•  How much data to send?
–  Data is sent in MSS-sized segments
•  Chosen to avoid fragmentation
Flow Control

•  Part of TCP specification (even before 1988)


•  Receiver uses window header field to tell sender
how much space it has
Flow Control
[Figure: TCP send and receive buffers. (a) sending side: LastByteAcked, LastByteSent, LastByteWritten; (b) receiving side: LastByteRead, NextByteExpected, LastByteRcvd]
•  Receiver: AdvertisedWindow = MaxRcvBuffer – ((NextByteExpected – 1) – LastByteRead)
•  Sender: LastByteSent – LastByteAcked <= AdvertisedWindow
–  EffectiveWindow = AdvertisedWindow – BytesInFlight
–  LastByteWritten – LastByteAcked <= MaxSendBuffer
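As a concrete illustration, here is a minimal Python sketch of these window computations (variable names follow the slide; the buffer size and byte positions are made-up example values):

    def advertised_window(max_rcv_buffer, next_byte_expected, last_byte_read):
        # Receiver: advertise the free space left in the receive buffer.
        return max_rcv_buffer - ((next_byte_expected - 1) - last_byte_read)

    def effective_window(advertised, last_byte_sent, last_byte_acked):
        # Sender: how much new data may still be put in flight.
        bytes_in_flight = last_byte_sent - last_byte_acked
        return max(0, advertised - bytes_in_flight)

    # Example: 64 KB receive buffer, application has read up to byte 1000,
    # bytes up to 5000 have arrived in order, 4000 bytes are in flight.
    adv = advertised_window(64 * 1024, next_byte_expected=5001, last_byte_read=1000)
    eff = effective_window(adv, last_byte_sent=9000, last_byte_acked=5000)
    print(adv, eff)  # 61536 57536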
Flow Control
[Figure: TCP send and receive buffers, as above]
•  Advertised window can fall to 0
–  How?
–  Sender eventually stops sending, blocks application
•  Sender keeps sending 1-byte segments until window
comes back > 0
When to Transmit?
•  Nagle’s algorithm
•  Goal: reduce the overhead of small packets
If available data and window >= MSS
    Send an MSS-sized segment
Else
    If there is unACKed data in flight
        Buffer the new data until the ACK arrives
    Else
        Send all the new data now
•  Receiver should avoid advertising a window <= MSS after advertising a window of 0
Delayed Acknowledgments

•  Goal: Piggy-back ACKs on data


–  Delay ACK for 200ms in case application sends data
–  If more data received, immediately ACK second
segment
–  Note: never delay duplicate ACKs (if missing a
segment)
•  Warning: can interact very badly with Nagle
–  Temporary deadlock
–  Can disable Nagle with TCP_NODELAY
–  Application can also avoid many small writes
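For reference, disabling Nagle from an application is a one-line socket option; in Python, for example:

    import socket

    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY disables Nagle's algorithm: small writes go out immediately
    # instead of waiting for outstanding data to be ACKed.
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)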
Limitations of Flow Control

•  Network may be the bottleneck


•  Signal from receiver not enough!
•  Sending too fast will cause queue overflows,
heavy packet loss
•  Flow control provides correctness
•  Need more for performance: congestion
control
Second Goal

•  We should not send more data than the network can take: congestion control
A Short History of TCP

•  1974: 3-way handshake


•  1978: IP and TCP split
•  1983: January 1st, ARPAnet switches to TCP/IP
•  1984: Nagle predicts congestion collapses
•  1986: Internet begins to suffer congestion collapses
–  LBL to Berkeley drops from 32Kbps to 40bps
•  1987/8: Van Jacobson fixes TCP, publishes seminal paper* (TCP Tahoe)
•  1990: Fast retransmit and fast recovery added (TCP Reno)

* Van Jacobson. Congestion Avoidance and Control. SIGCOMM '88
Congestion Collapse
Nagle, rfc896, 1984

•  Mid-1980s: a problem with the protocol implementations, not the protocol!
•  What was happening?
–  Load on the network → buffers at routers fill up → round-trip time increases
•  If close to capacity, and, e.g., a large flow arrives suddenly…
–  RTT estimates become too short
–  Lots of retransmissions → increase in queue size
–  Eventually many drops happen (full queues)
–  Fraction of useful packets (not copies) decreases
TCP Congestion Control

•  3 Key Challenges
–  Determining the available capacity in the first place
–  Adjusting to changes in the available capacity
–  Sharing capacity between flows

•  Idea
–  Each source determines network capacity for itself
–  Rate is determined by window size
–  Uses implicit feedback (drops, delay)
–  ACKs pace transmission (self-clocking)
Dealing with Congestion

•  TCP keeps congestion and flow control windows


–  Max packets in flight is the lesser of the two
•  Sending rate: ~Window/RTT (see the example below)
•  The key here is how to set the congestion window to respond to congestion signals
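For instance, the Window/RTT bound with illustrative numbers (both values are assumptions):

    window_bytes = 64 * 1024  # assumed 64 KB window
    rtt_seconds = 0.1         # assumed 100 ms round-trip time
    rate_bps = window_bytes * 8 / rtt_seconds
    print(f"{rate_bps / 1e6:.1f} Mbit/s")  # ~5.2 Mbit/s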
Starting Up

•  Before TCP Tahoe
–  On connection, nodes send a full (receive) window of packets
–  Retransmit each packet immediately after its timer expires
•  Result: window-sized bursts of packets in the network
Bursts of Packets

[Figure: startup behavior of TCP without slow start; packet sequence number (KB) vs. send time (sec). Trace data of the start of a TCP conversation between two Sun 3/50s running Sun OS 3.5. Graph from Van Jacobson and Karels, 1988.]
Determining Initial Capacity

•  Question: how do we set w initially?
–  Should start at 1 MSS (to avoid overloading the network)
–  Could increase additively until we hit congestion
–  May be too slow on fast networks
•  Start by doubling w each RTT
–  Then we will dump at most one extra window into the network
–  This is called slow start
•  Slow start? This sounds quite fast!
–  In contrast to the initial algorithm: sender would dump the entire flow control window at once
[Figure: startup behavior of TCP with slow start; packet sequence number (KB) vs. send time (sec). Same conditions as the previous figure (same time of day, same Suns, same network path, ...).]
Slow start implementation

•  Let w be the size of the window in bytes
–  We have w/MSS segments per RTT
•  We are doubling w after each RTT
–  We receive w/MSS ACKs each RTT
–  So we can set w = w + MSS on every ACK
•  At some point we hit the network limit
–  Experience loss
–  We are at most one window size above the limit
–  Remember the window size (ssthresh) and reduce the window (see the sketch below)
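A minimal sketch of this per-ACK update in Python (the MSS and initial ssthresh are assumed values):

    MSS = 1460                # assumed segment size
    w = MSS                   # congestion window, in bytes
    ssthresh = 64 * 1024      # assumed initial threshold

    def on_ack():
        # w/MSS ACKs arrive per RTT, so adding MSS per ACK doubles w each RTT.
        global w
        w += MSS

    def on_loss():
        # Remember where we hit the limit, then reduce the window.
        global w, ssthresh
        ssthresh = max(w // 2, MSS)
        w = MSS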
Dealing with Congestion

•  Assume losses are due to congestion
•  After a loss, reduce the congestion window
–  How much to reduce?
•  Idea: conservation of packets at equilibrium
–  Want to keep roughly the same number of packets in the network
–  Analogy with water in a fixed-size pipe
–  Put a new packet into the network when one exits
How much to reduce window?

•  Crude model of the network
–  Let Li be the load (# packets) in the network at time i
–  If the network is uncongested, roughly constant: Li = N
•  What happens under congestion?
–  Some fraction γ of packets can't exit the network
–  Now Li = N + γ·Li−1, i.e., Li ≈ γ^i·L0
–  Exponential increase in congestion
•  Sources must decrease offered rate exponentially
–  i.e., multiplicative decrease in window size
–  TCP chooses to cut the window in half
How to use extra capacity?
•  Network signals congestion, but says nothing of
underutilization
–  Senders constantly try to send faster, see if it works
–  So, increase window if no losses… By how much?
•  Multiplicative increase?
–  Easier to saturate the network than to recover
–  Too fast, will lead to saturation, wild fluctuations
•  Additive increase?
–  Won’t saturate the network
–  Remember fairness (third challenge)?
Chiu Jain Phase Plots

Goal: fair and efficient!

[Phase plots: Flow Rate A (x-axis) vs. Flow Rate B (y-axis), with the fairness line A = B and the efficiency line A + B = C. Successive slides show the trajectories of MI/MD, AI/AD, and AI/MD policies.]
AIMD Implementation
•  In practice, send MSS-sized segments
–  Let window size in bytes be w (a multiple of MSS)
•  Increase:
–  After w bytes ACKed, could set w = w + MSS
–  Smoother to increment on each ACK:
•  w = w + MSS × MSS/w
•  (receive w/MSS ACKs per RTT, increase by MSS/(w/MSS) for each)
•  Decrease:
–  After a packet loss, w = w/2
–  But we don't want w < MSS
–  So react differently to multiple consecutive losses
–  Back off exponentially (pause with no packets in flight)
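The same rules as a small Python sketch (a simplified model, not a kernel implementation):

    MSS = 1460  # assumed segment size

    def aimd_on_ack(w):
        # Additive increase, smoothed over each ACK: w/MSS ACKs per RTT,
        # each adding MSS/(w/MSS) bytes, for a total of ~1 MSS per RTT.
        return w + MSS * MSS // w

    def aimd_on_loss(w):
        # Multiplicative decrease, but never below one segment.
        return max(w // 2, MSS)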
AIMD Trace

•  AIMD produces a sawtooth pattern of window size over time
–  The window is always probing for available bandwidth

[Figure: AIMD window trace, KB (10 to 70) vs. time (1.0 to 10.0 seconds), showing the sawtooth]
Putting it together

•  TCP has two states: Slow Start (SS) and Congestion Avoidance (CA)
•  A window size threshold governs the state transition
–  Window <= threshold: slow start
–  Window > threshold: congestion avoidance
•  States differ in how they respond to ACKs
–  Slow start: w = w + MSS
–  Congestion avoidance: w = w + MSS²/w (1 MSS per RTT)
•  On loss event: set w = 1, slow start (combined in the sketch below)
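A simplified Tahoe-style sketch combining the two states (constants are assumptions, and many real-world details are omitted):

    MSS = 1460

    class TahoeSender:
        """Simplified model of the slow start / congestion avoidance states."""

        def __init__(self):
            self.w = MSS                # congestion window, bytes
            self.ssthresh = 64 * 1024   # assumed initial threshold

        def on_ack(self):
            if self.w <= self.ssthresh:
                self.w += MSS                   # slow start: doubles per RTT
            else:
                self.w += MSS * MSS // self.w   # congestion avoidance: +1 MSS per RTT

        def on_loss(self):
            self.ssthresh = max(self.w // 2, MSS)
            self.w = MSS                        # back to slow start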
How to Detect Loss
•  Timeout
•  Any other way?
–  Gap in sequence numbers at receiver
–  Receiver uses cumulative ACKs: drops => duplicate
ACKs
•  3 duplicate ACKs are considered a loss

•  Which one is worse?


Putting it all together

[Figure: cwnd vs. time, showing repeated cycles of slow start up to ssthresh, AIMD, and timeouts]
RTT

•  We want an estimate of RTT so we can know a packet was likely lost, and not just delayed
•  Key for correct operation
•  Challenge: RTT can be highly variable
–  Both at long and short time scales!
•  Both average and variance increase a lot with load
•  Solution
–  Use exponentially weighted moving average (EWMA)
–  Estimate deviation as well as expected value
–  Assume packet is lost when time is well beyond reasonable
deviation
Originally

•  EstRTT = (1 – α) × EstRTT + α × SampleRTT


•  Timeout = 2 × EstRTT
•  Problem 1:
–  In case of retransmission, which send does the ACK correspond to?
–  Solution: only sample RTT for segments with no retransmission
•  Problem 2:
–  Does not take variance into account: too aggressive when there is more load!
Jacobson/Karels Algorithm (Tahoe)

•  EstRTT = (1 – α) × EstRTT + α × SampleRTT


–  Recommended α is 0.125
•  DevRTT = (1 – β) × DevRTT + β × |SampleRTT – EstRTT|
–  Recommended β is 0.25
•  Timeout = EstRTT + 4 × DevRTT
•  For successive retransmissions: use exponential backoff
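A direct transcription of these formulas into Python (the initial estimates are assumptions):

    ALPHA, BETA = 0.125, 0.25

    est_rtt, dev_rtt = 1.0, 0.0  # assumed initial values, in seconds

    def update_timeout(sample_rtt):
        global est_rtt, dev_rtt
        err = sample_rtt - est_rtt
        est_rtt += ALPHA * err                  # EWMA of the mean RTT
        dev_rtt += BETA * (abs(err) - dev_rtt)  # EWMA of the deviation
        return est_rtt + 4 * dev_rtt            # retransmission timeout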
Old RTT Estimation

[Figure: performance of an RFC 793 retransmit timer; RTT (sec.) vs. packet number]
Tahoe RTT Estimation

[Figure: performance of a Mean+Variance retransmit timer; RTT (sec.) vs. packet number. Same data as above, but the solid line shows the retransmit timer computed with the Mean+Variance estimator.]
Slow start every time?!

•  Losses have a large effect on throughput
•  Fast Recovery (TCP Reno)
–  Same as TCP Tahoe on timeout: w = 1, slow start
–  On triple duplicate ACK: w = w/2
–  Retransmit the missing segment (fast retransmit)
–  Stay in Congestion Avoidance mode (sketched below)
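Sketching Reno's additions on top of the earlier Tahoe model (simplified; real Reno has further details during recovery):

    MSS = 1460

    def reno_on_triple_dup_ack(w):
        # Fast retransmit the missing segment, halve the window,
        # and stay in congestion avoidance (no slow start).
        return max(w // 2, MSS)

    def reno_on_timeout(w):
        # Timeouts are handled as in Tahoe: back to one segment, slow start.
        return MSS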
Fast Recovery and Fast Retransmit

[Figure: cwnd vs. time with fast retransmit; slow start, then an AI/MD sawtooth where fast retransmit halves the window without returning to slow start]
3 Challenges Revisited
•  Determining the available capacity in the first
place
–  Exponential increase in congestion window
•  Adjusting to changes in the available capacity
–  Slow probing, AIMD
•  Sharing capacity between flows
–  AIMD
•  Detecting Congestion
–  Timeout based on RTT
–  Triple duplicate acknowledgments
•  Fast retransmit/Fast recovery
–  Reduces slow starts, timeouts
Next Class

•  More Congestion Control fun


•  Cheating on TCP
•  TCP under extreme conditions
•  TCP Friendliness
•  TCP Future
