
Congestion Control & Optimization

Steven Low
netlab.CALTECH.edu
Cambridge 2011

Goal of tutorial
- Top-down summary of congestion control on the Internet
- Introduction to mathematical models of congestion control
- Illustration of theory-guided CC algorithm design

Theory-guided design
- Tight integration of theory, design, experiment
  - Analysis done at design time, not after
- Theory does not replace intuitions or heuristics
  - Refines, validates/invalidates them
- Theory provides structure and clarity
  - Guides design
  - Suggests ideas and experiments
  - Explores boundaries that are hard to experiment with

Theory-guided design
- Integration of theory, design, experiment can be very powerful
  - Each needs the other
  - Combination much more than sum
- Tremendous progress in the last decade
  - Not as impossible as most feared
  - Very difficult, but worth the effort
  - Most critical: mindset
- How to push the theory-guided design approach further?

Agenda
9:00 Congestion control protocols
10:00 break
10:15 Mathematical models
11:15 break
11:30 Advanced topics
12:30 lunch

CONGESTION CONTROL PROTOCOLS

Congestion control protocols
- Why congestion control?
- Where is CC implemented?
- Window control mechanism
- CC protocols and basic structure
- Active queue management (AQM)

Congestion collapse
- October 1986, the first congestion collapse on the Internet was detected
  - Link between UC Berkeley and LBL
  - 400 yards, 3 hops, 32 Kbps
  - throughput dropped to 40 bps
  - a factor of ~1000 drop!
- Why?
- 1988, Van Jacobson proposed TCP congestion control
[Figure: throughput vs. load, collapsing beyond the knee]

Network milestones
[Timeline figure, 1969-2006: ARPANet (1969), TCP (1974), cutover to TCP/IP (1981-83), Tahoe (1988), HTTP (1991); backbone speed growing from 50-56 kbps (ARPANet) through T1 (NSFNet), T3 (NSFNet), OC12 (MCI), OC48 (vBNS) to OC192 (Abilene) -- the network is exploding]

Application milestones
[Timeline figure: ARPANet (1969), Network Mail (1971), Telnet, File Transfer -- simple applications on a 50-56 kbps ARPANet; after the TCP/IP cutover (1983) and Tahoe (1988): HTTP, Internet Talk Radio, Internet Phone, Whitehouse online, Napster music, AT&T VoIP, iTunes video, YouTube -- diverse & demanding applications on ever-faster backbones (T1/T3 NSFNet, OC12 MCI, OC48 vBNS, OC192 Abilene)]

Network Mail (1971)
- First Internet (ARPANet) application
[Photo: the first network email was sent by Ray Tomlinson between two computers at BBN connected by the ARPANet]

Internet applications (2006)
- Telephony
- TV & home theatre
- Music
- Mail
- Library at your fingertips
- Finding your way
- Friends
- Games
- Cloud computing

Congestion collapse
[Timeline figure: the congestion collapse detected at LBL (1986) falls between the cutover to TCP/IP (1983) and Tahoe (1988) on the milestones timeline]


Why the 1986 collapse
- 5,089 hosts on the Internet (Nov 1986)
- Backbone speed: 50-56 kbps
- Control mechanism focused only on receiver congestion, not network congestion
  - Large number of hosts sharing a slow (and small) network
  - Network became the bottleneck, as opposed to receivers
  - But TCP flow control only prevented overwhelming receivers
- Jacobson introduced feedback control to deal with network congestion in 1988

Tahoe and its variants (1988)
- Jacobson, Sigcomm 1988
  + Avoid overwhelming network
  + Window control mechanisms
- Dynamically adjust sender window based on congestion (as well as receiver window)
- Loss-based AIMD
  - Based on idea of Chiu, Jain, Ramakrishnan

"...important considering that TCP spans a range from 800 Mbps Cray channels to 1200 bps packet radio links"
-- Jacobson, 1988

TCP congestion control
[Timeline figure, annotated: flow control (File Transfer era) prevents overwhelming the receiver; + congestion control (Tahoe, 1988) prevents overwhelming the network]

Transport milestones
[Timeline figure: ARPANet (1969), TCP (1974), TCP/IP (1983), Tahoe (1988) and DECNet AIMD; then 1994-2000: Vegas (delay-based), NUM, the 1/sqrt(p) formula, reverse engineering of TCP; 2006: systematic design of TCPs]

Congestion control protocols
- Why congestion control?
- Where is CC implemented?
- Window control mechanism
- CC protocols and basic structure
- Active queue management (AQM)

Packet networks
- Packet-switched as opposed to circuit-switched
  - No dedicated resources
  - Simple & robust: states in packets
- More efficient sharing of resources
  - Multiplexing gain
- Less guarantee on performance
  - Best effort

Network mechanisms
- Transmit bits across a link
  - encoding/decoding, modulation/demodulation, synchronization
- Medium access
  - who transmits, when, and for how long
- Routing
  - choose path from source to destination
- Loss recovery
  - recover packet loss due to congestion, error, interference
- Flow/congestion control
  - efficient use of bandwidth/buffer without overwhelming receiver/network

Protocol stack
- Network mechanisms implemented as a protocol stack
- Each layer designed separately, evolves asynchronously
- Many control mechanisms:
    application
    transport  -- error control, congestion control (TCP)
    network    -- routing (IP)
    link       -- medium access control
    physical   -- coding, transmission, synchronization

The Internet hourglass
[Hourglass figure: many applications (Web, Search, Mail, News, Video, Audio, Friends ...) run over TCP, which runs over IP, which runs over many link technologies (Ethernet, 802.11, 3G/4G, ATM, Optical, Satellite, Bluetooth ...)]

IP layer
- Routing from source to destination
  - Distributed computation of routing decisions
  - Implemented as a routing table at each router
  - Shortest-path (Dijkstra) algorithm within an autonomous system
  - BGP across autonomous systems
- Datagram service
  - Best effort
  - Unreliable: loss, error, out-of-order delivery
- Simple and robust
  - Robust against failures
  - Robust against, and enables, rapid technological evolution above & below IP

TCP layer
- End-to-end reliable byte stream
  - On top of unreliable datagram service
  - Correct, in-order, without loss or duplication
- Connection setup and tear down
  - 3-way handshake
- Loss and error recovery
  - CRC to detect bit errors
  - Sequence numbers to detect packet loss/duplication
  - Retransmits packets that are lost or contain errors
- Congestion control
  - Source-based distributed control

Protocol data format
[Stack figure:
  Applications (e.g. Telnet, HTTP)
  TCP | UDP
  IP | ICMP | ARP
  Link Layer (e.g. Ethernet, ATM)
  Physical Layer (e.g. Ethernet, SONET)]

Protocol data format
[Encapsulation figure: an Application Message is segmented (MSS) into TCP Segments (20-byte TCP hdr + TCP data), each carried in an IP Packet (20-byte IP hdr + IP data), each carried in an Ethernet Frame (14-byte Ethernet hdr + Ethernet data + 4-byte trailer; MTU 1500 bytes)]

Congestion control protocols
- Why congestion control?
- Where is CC implemented?
- Window control mechanism
- CC protocols and basic structure
- Active queue management (AQM)

Early TCP
- Pre-1988
- Go-back-N ARQ
  - Detects loss from timeout
  - Retransmits from the lost packet onward
- Receiver window flow control
  - Prevents overflow at the receive buffer
  - Receiver sets awnd in the TCP header of each ACK
    - Closes when data is received and ACKed
    - Opens when data is delivered to the application
  - Sender sets W = awnd
- Self-clocking

TCP congestion control
- Post-1988
- ARQ, awnd from ACKs, self-clocking; in addition:
- Source calculates cwnd from indications of network congestion
  - Packet loss
  - Packet delay
  - Marks (explicit congestion notification)
- Source sets W = min(cwnd, awnd)
- Algorithms to calculate cwnd:
  - Reno, Vegas, FAST, CUBIC, CTCP, ...
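A minimal sketch (Python) of how this window rule gates transmission; the names (cwnd, awnd, inflight) are generic illustrations, not from any particular TCP stack:

  def can_send(inflight, cwnd, awnd):
      # Sender may emit a new segment only while the number of
      # unacknowledged segments is below W = min(cwnd, awnd):
      # cwnd tracks network congestion, awnd the receiver's buffer.
      W = min(cwnd, awnd)
      return inflight < W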

Congestion control protocols
- Why congestion control?
- Where is CC implemented?
- Window control mechanism
- CC protocols and basic structure
- Active queue management (AQM)

Key references
- TCP/IP spec
  - RFC 791, Internet Protocol
  - RFC 793, Transmission Control Protocol
- AIMD idea: Chiu, Jain, Ramakrishnan 1988-90
- Tahoe/Reno: Jacobson 1988
- Vegas: Brakmo and Peterson 1995
- FAST: Jin, Wei, Low 2004
- CUBIC: Ha, Rhee, Xu 2008
- CTCP: Tan et al. 2006
- RED: Floyd and Jacobson 1993
- REM: Athuraliya, Low, Li, Yin 2001
- There are many, many other proposals and references

TCP Congestion Control
- Has four main parts:
  1. Slow Start (SS)           \
  2. Congestion Avoidance (CA)  | Tahoe
  3. Fast Retransmit           /
  4. Fast Recovery              -- added in Reno
- ssthresh: slow start threshold determines whether to use SS or CA
- Assumption: packet losses are caused by buffer overflow (congestion)

TCP Tahoe (Jacobson 1988)
[Figure: window vs. time -- SS: Slow Start, then CA: Congestion Avoidance]

TCP Reno (Jacobson 1990)
[Figure: window vs. time -- SS, CA, and fast retransmission/fast recovery]

Delay-based TCP: Vegas (Brakmo & Peterson 1994)
[Figure: window vs. time -- SS, CA]
- Reno with a new congestion avoidance algorithm
- Converges (provided buffer is large)!

TCP CC variants
- Differ mainly in Congestion Avoidance
  - Vegas: delay-based
  - FAST: delay-based, scalable
  - CUBIC: time since last congestion
  - CTCP: uses both loss & delay
[State-machine figure: slow start -> congestion avoidance; dupACKs trigger FR/FR (fast retransmit/fast recovery); timeout triggers retransmit and slow start]

Congestion avoidance

Reno (Jacobson 1988):
  for every ACK {
    W += 1/W        (AI)
  }
  for every loss {
    W := W/2        (MD)
  }

Vegas (Brakmo, Peterson 1995):
  for every RTT {
    if W/RTTmin - W/RTT < alpha then W++
    if W/RTTmin - W/RTT > alpha then W--
  }
  for every loss {
    W := W/2
  }

Congestion avoidance

FAST (Jin, Wei, Low 2004):
  periodically {
    W := (baseRTT/RTT) * W + alpha
  }
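A runnable sketch (Python) of the three congestion-avoidance rules above, one function per event. Windows are in packets; the alpha defaults are illustrative, not the protocols' standardized constants:

  def reno_ack(W):                       # additive increase, ~+1 per RTT
      return W + 1.0 / W

  def reno_loss(W):                      # multiplicative decrease
      return W / 2.0

  def vegas_rtt(W, rtt_min, rtt, alpha=1.0):
      # once per RTT: compare expected vs. actual throughput
      diff = W / rtt_min - W / rtt
      if diff < alpha:
          return W + 1.0
      if diff > alpha:
          return W - 1.0
      return W

  def fast_update(W, base_rtt, rtt, alpha=100.0):
      # periodic FAST update: scale window by baseRTT/RTT, plus alpha
      return (base_rtt / rtt) * W + alpha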

Congestion control protocols
- Why congestion control?
- Where is CC implemented?
- Window control mechanism
- CC protocols and basic structure
- Active queue management (AQM)

Feedback control
[Figure: sources set rates $x_i(t)$; links feed back congestion measure $p_l(t)$]
Example congestion measure $p_l(t)$:
- Loss (Reno)
- Queueing delay (Vegas)

TCP/AQM
[Feedback-loop figure: TCP algorithms (Reno, Vegas, FAST) set $x_i(t)$; AQM algorithms (DropTail, RED, REM/PI, AVQ) set $p_l(t)$]
- Congestion control is a distributed asynchronous algorithm to share bandwidth
- It has two components
  - TCP: adapts sending rate (window) to congestion
  - AQM: adjusts & feeds back congestion information
- They form a distributed feedback control system
  - Equilibrium & stability depend on both TCP and AQM
  - And on delay, capacity, routing, #connections

Implicit feedback
- Drop-tail
  - FIFO queue
  - Drop packet that arrives at a full buffer
- Queueing process implicitly computes and feeds back the congestion measure
  - Delay: simple dynamics
  - Loss: no convenient model

Active queue management
- Explicit feedback
  - Provide congestion information by probabilistically marking packets
  - 2 ECN bits in the IP header allocated for AQM
- Supported by all new routers, but usually turned off in the field

RED (Floyd & Jacobson 1993)
- Congestion measure: average queue length
    $b_l(t+1) = [b_l(t) + y_l(t) - c_l]^+$
    $r_l(t+1) = (1 - \alpha)\, r_l(t) + \alpha\, b_l(t)$
- Embedding: piecewise-linear marking probability function
[Figure: marking probability rising from 0 to 1 in the average queue]
- Feedback: dropping or ECN marking
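A sketch (Python) of RED's two pieces, EWMA averaging and piecewise-linear marking. Parameter names follow the usual RED description (min_th, max_th, max_p, averaging weight w), with illustrative values:

  import random

  class RED:
      def __init__(self, min_th=10, max_th=40, max_p=0.1, w=0.002):
          self.min_th, self.max_th = min_th, max_th
          self.max_p, self.w = max_p, w
          self.avg = 0.0                      # average queue length r_l

      def on_arrival(self, queue_len):
          # r <- (1 - w) r + w b : EWMA of the instantaneous queue
          self.avg = (1 - self.w) * self.avg + self.w * queue_len
          # piecewise-linear marking probability in the average queue
          if self.avg < self.min_th:
              p = 0.0
          elif self.avg >= self.max_th:
              p = 1.0
          else:
              p = self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)
          return random.random() < p          # True: mark/drop this packet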

REM (Athuraliya & Low 2000)
- Congestion measure: price
    $b_l(t+1) = [b_l(t) + y_l(t) - c_l]^+$
    $p_l(t+1) = [p_l(t) + \gamma(\alpha_l b_l(t) + y_l(t) - c_l)]^+$
- Embedding: exponential marking probability function
[Figure: link marking probability $1 - \phi^{-p_l}$ rising from 0 to 1 in the link congestion measure]
- Feedback: dropping or ECN marking

REM
- Clear buffer and match rate:
    $p_l(t+1) = [p_l(t) + \gamma(\underbrace{\alpha_l b_l(t)}_{\text{clear buffer}} + \underbrace{y_l(t) - c_l}_{\text{match rate}})]^+$
- Sum prices: the end-to-end marking probability
    $1 - \phi^{-\sum_l p_l(t)}$
  reveals the sum of link prices along the path
- Theorem (Paganini 2000): global asymptotic stability for general utility functions (in the absence of delay)
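A sketch (Python) of REM's price update and exponential embedding. The defaults reuse the values quoted in the Reno/REM validation slide later on; mapping them to gamma, alpha, phi in that order is my assumption:

  class REM:
      def __init__(self, capacity, gamma=0.05, alpha=0.4, phi=1.15):
          self.c = capacity
          self.gamma, self.alpha, self.phi = gamma, alpha, phi
          self.p = 0.0                      # link price p_l

      def update(self, backlog, rate):
          # p <- [p + gamma*(alpha*b + y - c)]+ : clear buffer + match rate
          self.p = max(0.0, self.p + self.gamma *
                       (self.alpha * backlog + rate - self.c))

      def mark_prob(self):
          # exponential embedding 1 - phi^(-p); summing prices over a path
          # makes the end-to-end marking probability 1 - phi^(-sum p_l)
          return 1.0 - self.phi ** (-self.p)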

Summary: CC protocols
- End-to-end CC implemented in TCP
  - Basic window mechanism
  - TCP performs connection setup, error recovery, and congestion control
  - CC dynamically computes cwnd, which limits the max #pkts en route
- Distributed feedback control algorithm
  - TCP: adapts congestion window
  - AQM: adapts congestion measure

Agenda
9:00 Congestion control protocols
10:00 break
10:15 Mathematical models
11:15 break
11:30 Advanced topics
12:30 lunch

MATHEMATICAL MODELS

Mathematical models
- Why mathematical models?
- Dynamical systems model of CC
- Convex optimization primer
- Reverse engr: equilibrium properties
- Forward engr: FAST TCP

Why mathematical models
[Protocol-stack figure: application / transport / network / link / physical]
- Protocols are critical, yet difficult, to understand and optimize
- Local algorithms, distributed spatially and vertically, produce global behavior
- Designed separately, deployed asynchronously, evolve independently

Why mathematical models
[Protocol-stack figure]
- Need a systematic way to understand, design, and optimize
  - Their interactions
  - Resultant global behavior

Why mathematical models
- Not to replace intuitions, experiments, heuristics
- Provides structure and clarity
  - Refines intuition
  - Guides design
  - Suggests ideas
  - Explores boundaries
  - Understands structural properties
- Risk: all models are wrong... some are useful
  - Validate with simulations & experiments

Structural properties
- Equilibrium properties
  - Throughput, delay, loss, fairness
- Dynamic properties
  - Stability
  - Robustness
  - Responsiveness
- Scalability properties
  - Information scaling (decentralization)
  - Computation scaling
  - Performance scaling

Low, Peterson, Wang, JACM 2002

Limitations of basic model
- Static and deterministic network
  - Fixed set of flows, link capacities, routing
  - Real networks are time-varying and random
- Homogeneous protocols
  - All flows use the same congestion measure
- Fluid approximation
  - Ignores packet-level effects, e.g. burstiness
  - Inaccurate buffering process
- Difficulty in analysis of model
  - Global stability in the presence of feedback delay
  - Robustness, responsiveness
- The basic model has been generalized to address these issues to various degrees

Mathematical models
- Why mathematical models?
- Dynamical systems model of CC
- Convex optimization primer
- Reverse engr: equilibrium properties
- Forward engr: FAST TCP

TCP/AQM
[Feedback-loop figure: TCP (Reno, Vegas, FAST) sets $x_i(t)$; AQM (DropTail, RED, REM/PI, AVQ) sets $p_l(t)$]
- Congestion control is a distributed asynchronous algorithm to share bandwidth
- Two components: TCP adapts sending rate (window) to congestion; AQM adjusts & feeds back congestion information
- They form a distributed feedback control system
  - Equilibrium & stability depend on both TCP and AQM, and on delay, capacity, routing, #connections

Network model
- Network: links l with capacities $c_l$ and congestion measures $p_l(t)$
- Sources i with rates $x_i(t)$
- Routing matrix R, e.g.
    $R = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}$
  so that
    $x_1 + x_2 \le c_1$
    $x_1 + x_3 \le c_2$
[Figure: sources $x_1(t), x_2(t), x_3(t)$ over two links with prices $p_1(t), p_2(t)$]

Network model
[Block diagram: TCP maps $F_1 \ldots F_N$ take $q = R^T p$ to rates x; the network R turns x into link loads y; AQM maps $G_1 \ldots G_L$ take y to prices p]
- $R_{li} = 1$ if source i uses link l
- TCP CC model consists of specs for $F_i$ and $G_l$:
    $x(t+1) = F(x(t),\, R^T p(t))$    (TCP: Reno, Vegas)
    $p(t+1) = G(R\, x(t),\, p(t))$    (AQM: Droptail, RED)
  with R given by IP routing

Examples
- Derive the $(F_i, G_l)$ model for:
  - Reno/RED
  - Vegas/Droptail
  - FAST/Droptail
- Focus on Congestion Avoidance

Model: Reno
  for every ACK (CA) { W += 1/W }
  for every loss     { W := W/2 }

    $\dot w_i(t) = x_i(t)\big(1 - q_i(t)\big)\,\frac{1}{w_i(t)} - x_i(t)\, q_i(t)\,\frac{w_i(t)}{2}$

Model: Reno
  for every ACK (CA) { W += 1/W }
  for every loss     { W := W/2 }

    $\dot w_i(t) = \underbrace{x_i(t)\big(1 - q_i(t)\big)}_{\text{throughput}}\,\frac{1}{\underbrace{w_i(t)}_{\text{window size}}} - x_i(t)\, q_i(t)\,\frac{w_i(t)}{2}$

    $q_i(t) = \sum_l R_{li}\, p_l(t)$ : round-trip loss probability, from link loss probabilities $p_l(t)$

Model: Reno
  for every ACK (CA) { W += 1/W }
  for every loss     { W := W/2 }

    $\dot w_i(t) = x_i(t)\big(1 - q_i(t)\big)\,\frac{1}{w_i(t)} - x_i(t)\, q_i(t)\,\frac{w_i(t)}{2}$

which, using $w_i(t) = x_i(t)\, T_i$ and $q_i(t) \approx 0$ (so $1 - q_i(t) \approx 1$), gives

    $x_i(t+1) = x_i(t) + \frac{1}{T_i^2} - \frac{x_i^2(t)}{2}\, q_i(t) \;=:\; F_i\big(x_i(t), q_i(t)\big)$

Model: RED
    $y_l(t) = \sum_i R_{li}\, x_i(t)$              (aggregate link rate)
    $b_l(t+1) = [b_l(t) + y_l(t) - c_l]^+$         (queue length)
    $p_l(t) = \min\{\rho\, b_l(t),\, 1\}$          (marking probability, slope $\rho$)
  i.e. $p_l(t) = G_l\big(y_l(t), p_l(t)\big)$

Model: Reno/RED
    $x_i(t+1) = x_i(t) + \frac{1}{T_i^2} - \frac{x_i^2(t)}{2}\, q_i(t) = F_i\big(x_i(t), q_i(t)\big)$
    $q_i(t) = \sum_l R_{li}\, p_l(t)$
    $b_l(t+1) = [b_l(t) + y_l(t) - c_l]^+$
    $p_l(t) = \min\{\rho\, b_l(t),\, 1\} = G_l\big(y_l(t), p_l(t)\big)$
    $y_l(t) = \sum_i R_{li}\, x_i(t)$
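These maps can be iterated directly. A sketch (Python) for one Reno flow on one RED-like link, Euler-stepping the $(F_i, G_l)$ maps; all constants (RTT, capacity, slope rho, step dt) are illustrative:

  T, c, rho = 0.1, 1000.0, 5e-4    # RTT (s), capacity (pkt/s), marking slope
  dt = 0.01                        # Euler step (s)
  x, b = 100.0, 0.0                # source rate (pkt/s), queue (pkts)
  for _ in range(200000):
      p = min(rho * b, 1.0)                              # G: marking prob
      b = max(b + (x - c) * dt, 0.0)                     # queue integrates excess
      x = max(x + (1/T**2 - 0.5 * p * x * x) * dt, 1.0)  # F: Reno update
  print(round(x), round(b))        # x should hover near the capacity c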

Decentralization structure
[Block diagram: TCP maps $F_i$ and AQM maps $G_l$ interconnected only through R]
    $x(t+1) = F\big(x(t), q(t)\big)$
    $p(t+1) = G\big(y(t), p(t)\big)$
    $q_i(t) = \sum_l R_{li}\, p_l(t)$
    $y_l(t) = \sum_i R_{li}\, x_i(t)$

Validation: Reno/REM
- 30 sources, 3 groups with RTT = 3, 5, 7 ms
- Link capacity = 64 Mbps, buffer = 50 kB
- Smaller window due to small RTT (~0 queueing delay)

Queue comparison:
- REM (gamma = 0.05, alpha = 0.4, phi = 1.15): queue = 1.5 pkts, utilization = 92%
  - p decoupled from queue: p = Lagrange multiplier!
- DropTail: queue = 94% full
- RED (min_th = 10 pkts, max_th = 40 pkts, max_p = 0.1)
  - p increasing in queue!

Model: Vegas/Droptail
  for every RTT {
    if W/RTTmin - W/RTT < alpha then W++
    if W/RTTmin - W/RTT > alpha then W--
  }
  for every loss { W := W/2 }

F_i:
    $x_i(t+1) = x_i(t) + \frac{1}{T_i^2(t)}$   if $w_i(t) - d_i x_i(t) < \alpha_i d_i$
    $x_i(t+1) = x_i(t) - \frac{1}{T_i^2(t)}$   if $w_i(t) - d_i x_i(t) > \alpha_i d_i$
    $x_i(t+1) = x_i(t)$                        else

G_l (queue size):
    $p_l(t+1) = [p_l(t) + y_l(t)/c_l - 1]^+$

with $T_i(t) = d_i + q_i(t)$

Model: FAST/Droptail
  periodically {
    W := (baseRTT/RTT) * W + alpha
  }

    $x_i(t+1) = x_i(t) + \frac{\gamma_i}{T_i(t)}\big(\alpha_i - x_i(t)\, q_i(t)\big)$
    $p_l(t+1) = \Big[p_l(t) + \frac{1}{c_l}\big(y_l(t) - c_l\big)\Big]^+$

Low, Peterson, Wang, JACM 2002

Validation: matching transients
    $\dot p(t) = \frac{1}{c}\Big(\sum_i \frac{w_i(t - \tau_i^f)}{d_i + p(t)} + x_0(t) - c\Big)$
[Jacobsson et al 2009]
[Figure panels: same RTT, no cross traffic; same RTT, cross traffic; different RTTs, no cross traffic]

Recap
- Protocol (Reno, Vegas, FAST, Droptail, RED):
    $x(t+1) = F\big(x(t), q(t)\big)$
    $p(t+1) = G\big(y(t), p(t)\big)$
- Equilibrium
  - Performance: throughput, loss, delay
  - Fairness
  - Utility
- Dynamics
  - Local stability
  - Global stability

Mathematical models
- Why mathematical models?
- Dynamical systems model of CC
- Convex optimization primer
- Reverse engr: equilibrium properties
- Forward engr: FAST TCP

Background: optimization
    $\max_{x \ge 0} \sum_i U_i(x_i)$  subject to  $Rx \le c$
- Called a convex program if the $U_i$ are concave functions

Background: optimization
    $\max_{x \ge 0} \sum_i U_i(x_i)$  subject to  $Rx \le c$
- Called a convex program if the $U_i$ are concave functions
  - Local optimum is globally optimal
  - First-order optimality (KKT) condition is necessary and sufficient
- Convex programs are polynomial-time solvable
  - Whereas nonconvex programs are generally NP-hard

Background: optimization
    $\max_{x \ge 0} \sum_i U_i(x_i)$  subject to  $Rx \le c$
Theorem
- Optimal solution $x^*$ exists
- It is unique if the $U_i$ are strictly concave
[Figure: a strictly concave utility curve vs. one that is not]

Background: optimization
    $\max_{x \ge 0} \sum_i U_i(x_i)$  subject to  $Rx \le c$
Theorem
$x^*$ is optimal if and only if there exists $p^* \ge 0$ such that
    $U_i'(x_i^*) = q_i^* := \sum_l R_{li}\, p_l^*$      (Lagrange multiplier)
    $y_l^* := \sum_i R_{li}\, x_i^* \le c_l$, with $y_l^* = c_l$ if $p_l^* > 0$
(complementary slackness: all bottlenecks are fully utilized)

Background: optimization
    $\max_{x \ge 0} \sum_i U_i(x_i)$  subject to  $Rx \le c$
Theorem
- $p^*$ can be interpreted as prices
- Optimal $x_i^*$ maximizes its own benefit:
    $\max_{x_i} \; U_i(x_i) - x_i \sum_l R_{li}\, p_l^*$
  (incentive compatible)

Background: optimization
    $\max_{x \ge 0} \sum_i U_i(x_i)$  subject to  $Rx \le c$
Theorem
The gradient descent algorithm to solve the dual problem is decentralized:
    $p_l(t+1) = \big[p_l(t) + \gamma\big(y_l(t) - c_l\big)\big]^+$   (law of supply & demand)
    $x_i(t) = U_i'^{-1}\big(q_i(t)\big)$
with $q_i(t) = \sum_l R_{li}\, p_l(t)$ and $y_l(t) = \sum_i R_{li}\, x_i(t)$

Background: optimization
    $\max_{x \ge 0} \sum_i U_i(x_i)$  subject to  $Rx \le c$
Theorem
The gradient descent algorithm to solve the dual problem is decentralized:
    $p_l(t+1) = \big[p_l(t) + \gamma\big(y_l(t) - c_l\big)\big]^+$
    $x_i(t) = U_i'^{-1}\big(q_i(t)\big)$
- A gradient-like algorithm to solve NUM defines a TCP CC algorithm!
  - reverse/forward engineer TCP
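A sketch (Python) of this dual gradient iteration on the two-link, three-flow example from the network-model slide ($x_1$ crosses both links), with log utilities so $x_i = 1/q_i$; capacities and step size are illustrative:

  R = [[1, 1, 0],        # link 1 carries x1 and x2
       [1, 0, 1]]        # link 2 carries x1 and x3
  c = [1.0, 2.0]         # link capacities
  p = [1.0, 1.0]         # link prices
  gamma = 0.05
  for _ in range(20000):
      q = [sum(R[l][i] * p[l] for l in range(2)) for i in range(3)]
      x = [1.0 / qi for qi in q]                        # x_i = U_i'^{-1}(q_i)
      y = [sum(R[l][i] * x[i] for i in range(3)) for l in range(2)]
      p = [max(0.0, p[l] + gamma * (y[l] - c[l])) for l in range(2)]
  print([round(v, 3) for v in x])   # proportionally fair rates, ~[0.42, 0.58, 1.58]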

Mathematical models
- Why mathematical models?
- Dynamical systems model of CC
- Convex optimization primer
- Reverse engr: equilibrium properties
- Forward engr: FAST TCP

Duality model of TCP/AQM
- TCP/AQM equilibrium:
    $x^* = F\big(x^*, R^T p^*\big)$
    $p^* = G\big(R x^*, p^*\big)$
- Equilibrium $(x^*, p^*)$ is primal-dual optimal for
    $\max_{x \ge 0} \sum_i U_i(x_i)$  subject to  $Rx \le c$
- F determines the utility function U
- G guarantees complementary slackness
- $p^*$ are Lagrange multipliers
  (Kelly, Maulloo, Tan 1998; Low, Lapsley 1999)
- Uniqueness of equilibrium:
  - $x^*$ is unique when U is strictly concave
  - $p^*$ is unique when R has full row rank

Duality model of TCP/AQM
- (As above: equilibrium is primal-dual optimal; F determines U; G guarantees complementary slackness; $p^*$ are Lagrange multipliers)
- The underlying convex program also leads to simple dynamic behavior

Duality model of TCP/AQM
- Equilibrium $(x^*, p^*)$ is primal-dual optimal for
    $\max_{x \ge 0} \sum_i U_i(x_i)$  subject to  $Rx \le c$
- Mo & Walrand 2000 (alpha-fair utilities):
    $U_i(x_i) = (1-\alpha)^{-1} x_i^{1-\alpha}$  if $\alpha \ne 1$
    $U_i(x_i) = \log x_i$                        if $\alpha = 1$
  - $\alpha = 1$: Vegas, FAST, STCP
  - $\alpha \approx 1.2$: HSTCP
  - $\alpha = 2$: Reno
  - $\alpha \to \infty$: XCP (single link only)
  (Low 2003)

Duality model of TCP/AQM
- With the same alpha-fair utilities:
  - $\alpha = 0$: maximum throughput
  - $\alpha = 1$: proportional fairness
  - $\alpha = 2$: min delay fairness
  - $\alpha \to \infty$: maxmin fairness
  (Low 2003)

Some implications
- Equilibrium
  - Always exists, unique if R is full rank
  - Bandwidth allocation independent of AQM or arrival
  - Can predict macroscopic behavior of large-scale networks
- Counter-intuitive throughput behavior
  - Fair allocation is not always inefficient
  - Increasing link capacities does not always raise aggregate throughput
  [Tang, Wang, Low, ToN 2006]

Forward engineering: FAST TCP
- Design, analysis, experiments
  [Wei, Jin, Low, Hegde, ToN 2006]

Equilibrium throughput
    Reno:         $x_i = \dfrac{1.225}{T_i\, q_i^{0.5}}$
    HSTCP:        $x_i = \dfrac{0.120}{T_i\, q_i^{0.84}}$
    Vegas, FAST:  $x_i = \dfrac{\alpha_i}{q_i}$
(the constants 1.225 (Reno) and 0.120 (HSTCP) play the role of alpha)
- Reno penalizes long flows
- Reno's square-root-p throughput formula
- Vegas, FAST: equilibrium condition = Little's Law
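These formulas are exactly what pins down each protocol's utility function in the duality model. A short worked note (standard reasoning, added here, not verbatim from the slides): in equilibrium $U_i'(x_i) = q_i$, so inverting each throughput formula identifies $U_i'$:

\[
U_i'(x_i) = q_i \quad\Longrightarrow\quad
\begin{cases}
\text{Reno:} & x_i \propto q_i^{-1/2} \Rightarrow U_i'(x_i) \propto x_i^{-2}, \text{ i.e. } \alpha = 2,\\
\text{HSTCP:} & x_i \propto q_i^{-0.84} \Rightarrow U_i'(x_i) \propto x_i^{-1/0.84}, \text{ i.e. } \alpha \approx 1.2,\\
\text{Vegas/FAST:} & x_i = \alpha_i/q_i \Rightarrow U_i'(x_i) = \alpha_i/x_i, \text{ i.e. } U_i(x_i) = \alpha_i \log x_i \ (\alpha = 1).
\end{cases}
\]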

Vegas/FAST: effect of RTT error
- Persistent congestion can arise due to error in propagation delay estimation
- Consequences
  - Excessive backlog
  - Unfairness to older sources
- Theorem: a relative error of $\epsilon_s$ in propagation delay estimation distorts the utility function to
    $\hat U_s(x_s) = (1 + \epsilon_s)\, \alpha_s \log x_s + \epsilon_s\, x_s$

Validation
[Figure panels: without estimation error; with estimation error]
- Single link, capacity = 6 pkt/ms, $\alpha_s$ = 2 pkts/ms, $d_s$ = 10 ms
- With finite buffer: Vegas reverts to Reno

Validation
Source rates (pkts/ms), measured (theory):
  #  src1         src2         src3         src4         src5
  1  5.98 (6)
  2  2.05 (2)     3.92 (4)
  3  0.96 (0.94)  1.46 (1.49)  3.54 (3.57)
  4  0.51 (0.50)  0.72 (0.73)  1.34 (1.35)  3.38 (3.39)
  5  0.29 (0.29)  0.40 (0.40)  0.68 (0.67)  1.30 (1.30)  3.28 (3.34)

  #  queue (pkts)   baseRTT (ms)
  1  19.8 (20)      10.18 (10.18)
  2  59.0 (60)      13.36 (13.51)
  3  127.3 (127)    20.17 (20.28)
  4  237.5 (238)    31.50 (31.50)
  5  416.3 (416)    49.86 (49.80)

Mathematical models
- Why mathematical models?
- Dynamical systems model of CC
- Convex optimization primer
- Reverse engr: equilibrium properties
- Forward engr: FAST TCP

Reno design
- Reno TCP
  - Packet level:
      ACK:  W <- W + 1/W
      Loss: W <- W - 0.5 W
  - Flow level:
      Equilibrium:  $w_i = x_i T_i = \sqrt{1.5/q_i}$ pkts   (Mathis formula 1996)
      Dynamics:     $\dot w_i(t) = \frac{1}{T_i}\Big(1 - \frac{2}{3}\, w_i^2(t)\, q_i(t)\Big)$
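For reference, a short derivation of the $\sqrt{1.5/q_i}$ equilibrium from the deterministic AIMD sawtooth (the usual argument behind the Mathis formula; added here, not on the original slide):

\[
w \text{ oscillates between } \tfrac{1}{2} w_{\max} \text{ and } w_{\max}, \text{ so}
\quad \text{pkts per cycle} \approx \tfrac{3}{8} w_{\max}^2 = \tfrac{1}{q}
\;\Rightarrow\; w_{\max} = \sqrt{\tfrac{8}{3q}},
\qquad
\bar w = \tfrac{3}{4}\, w_{\max} = \sqrt{\tfrac{3}{2q}} = \sqrt{\tfrac{1.5}{q}} .
\]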

Reno design
- Packet level
  - Designed and implemented first
- Flow level
  - Understood afterwards
- Flow-level dynamics determine
  - Equilibrium: performance, fairness
  - Stability
- Design flow-level equilibrium & stability
- Implement flow-level goals at packet level

Forward engineering
1. Decide congestion measure
   - Loss, delay, both
2. Design flow-level equilibrium properties
   - Throughput, loss, delay, fairness
3. Analyze stability and other dynamic properties
   - Control theory; simulate; improve model/algorithm
4. Iterate 1-3 until satisfactory
5. Simulate, prototype, experiment
   - Compare with theoretical predictions
   - Improve model, algorithm, code
Iterate 1-5 until satisfactory

Forward engineering
- Tight integration of theory, design, experiment
  - Performance analysis done at design time, not after
- Theory does not replace intuitions and heuristics
  - Refines, validates/invalidates them
- Theory provides structure and clarity
  - Guides design
  - Suggests ideas and experiments
  - Explores boundaries that are hard to experiment with

Packet level description
  Reno   AIMD(1, 0.5)
    ACK:  W <- W + 1/W
    Loss: W <- W - 0.5 W
  HSTCP  AIMD(a(w), b(w))
    ACK:  W <- W + a(w)/W
    Loss: W <- W - b(w) W
  STCP   MIMD(a, b)
    ACK:  W <- W + 0.01
    Loss: W <- W - 0.125 W
  FAST
    RTT:  W <- (baseRTT/RTT) W + alpha
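A sketch (Python) of this table as one parametrized family; the HSTCP profiles a(w), b(w) are stubbed here (the real ones are window-dependent tables in RFC 3649), and the numeric constants are the slide's:

  def a(w): return 1.0      # stub for HSTCP's increase profile a(w)
  def b(w): return 0.5      # stub for HSTCP's decrease profile b(w)

  def on_ack(w, proto):
      if proto == "reno":   return w + 1.0 / w      # AIMD(1, 0.5)
      if proto == "hstcp":  return w + a(w) / w     # AIMD(a(w), b(w))
      if proto == "stcp":   return w + 0.01         # MIMD(a, b)
      return w

  def on_loss(w, proto):
      if proto == "reno":   return w - 0.5 * w
      if proto == "hstcp":  return w - b(w) * w
      if proto == "stcp":   return w - 0.125 * w
      return w

  def on_rtt(w, base_rtt, rtt, alpha):              # FAST: per-RTT update
      return (base_rtt / rtt) * w + alpha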

Flow level: Reno, HSTCP, STCP, FAST
- Common flow-level dynamics!
    $\dot w_i(t) = \kappa_i(t)\Big(1 - \frac{q_i(t)}{u_i(t)}\Big)$
  window adjustment driven by control gain $\kappa_i(t)$ and flow-level goal $u_i(t) = U_i'(x_i(t))$
- Different gain $\kappa_i$ and utility $U_i$
  - They determine equilibrium and stability
- Different congestion measure $q_i$
  - Loss probability (Reno, HSTCP, STCP)
  - Queueing delay (Vegas, FAST)

Flow level: Reno, HSTCP, STCP, FAST
- Common flow-level dynamics:
    $\dot w_i(t) = \kappa_i(t)\Big(1 - \frac{q_i(t)}{u_i(t)}\Big)$
- Small adjustment when close to the target, large when far away
  - Need to estimate how far the current state is from the target
  - Scalable
- Reno, Vegas: window adjustment independent of $q_i$
  - Depends only on current window
  - Difficult to scale

Caltech FAST Project
[Timeline figure, 2000-2007: Lee Center; FAST TCP theory; IPAM workshop; WAN-in-Lab testbed -- NetLab (netlab.caltech.edu), rsrg SISL, Prof. Steven Low]

- Internet: the largest distributed nonlinear feedback control system
- Reverse engineering: TCP is a real-time distributed algorithm over the Internet to maximize utility
    $\max_{x \ge 0} \sum_i U_i(x_i)$  s.t.  $Rx \le c$
- Forward engineering: invention of FastTCP based on control theory & convex optimization
    $\dot x_i(t) = \frac{\gamma_i}{T_i}\Big(\alpha_i - x_i(t) \sum_l R_{li}\, p_l(t)\Big)$
    $\dot p_l(t) = \frac{1}{c_l}\Big(\sum_i R_{li}\, x_i(t) - c_l\Big)$
- Cycle: theory -> experiment -> testbed -> deployment
- Collaborators: Doyle (Caltech), Newman (Caltech), Paganini (Uruguay), Tang (Cornell), Andrew (Swinburne), Chiang (Princeton); CACR, CERN, Internet2, SLAC, Fermi Lab, StarLight, Cisco

[SC02 demo figure]
- Theory: control & optimization of networks
- Testbed: WAN-in-Lab, a one-of-a-kind wind-tunnel in academic networking, with 2,400 km of fiber, optical switches, routers, servers, accelerators
- Experiment / deployment:
  - Scientists used FastTCP to break world records on data transfer between 2002-2006 (SC 2004, Internet2 LSR, SuperComputing BC)
  - FAST is commercialized by FastSoft; it accelerates the world's 2nd-largest CDN and Fortune 100 companies
[Figure: FAST in a box -- throughput with FAST vs. without FAST]

Some benefits
- Transparent interaction among components
  - TCP, AQM
- Clear understanding of structural properties
- Understanding the effect of parameters
  - Change protocol parameters, topology, routing, link capacity, set of flows
  - Re-solve NUM
  - Systematic way to tune parameters

Extreme resilience to loss
- Heavy packet loss in Sprint network (SF -> New York, June 3, 2007):
  - Without FAST: throughput 1 Mbps
  - With FAST: throughput 120 Mbps
  - FAST TCP increased throughput by 120x!

10G appliance customer data
- Average download speed 8/24-30, 2009, CDN customer (10G appliance)
- FAST vs. TCP stacks in BSD, Windows, Linux

Summary: math models
- Integration of theory, design, experiment can be very powerful
  - Each needs the other
  - Combination much more than sum
- Theory-guided design approach
  - Tremendous progress in the last decade; not as impossible as most feared
  - Very difficult, but worth the effort
  - Most critical: mindset
- How to push the theory-guided design approach further?

Agenda
9:00 Congestion control protocols
10:00 break
10:15 Mathematical models
11:15 break
11:30 Advanced topics
12:30 lunch

ADVANCED TOPICS

Advanced topics
- Heterogeneous protocols
- Layering as optimization decomposition

The world is heterogeneous
- Linux 2.6.13 allows users to choose congestion control algorithms
- Many protocol proposals
  - Loss-based: Reno and a large number of variants
  - Delay-based: CARD (1989), DUAL (1992), Vegas (1995), FAST (2004), ...
  - ECN: RED (1993), REM (2001), PI (2002), AVQ (2003), ...
  - Explicit feedback: MaxNet (2002), XCP (2002), RCP (2005), ...

Some implications
                                     homogeneous   heterogeneous
  equilibrium                        unique        ?
  bandwidth allocation vs. AQM       independent   ?
  bandwidth allocation vs. arrival   independent   ?

- Throughputs depend on AQM

FAST throughput
[Figure: FAST throughput with buffer size = 80 pkts vs. buffer size = 400 pkts]
- FAST and Reno share a single bottleneck router
  - NS2 simulation
  - Router: DropTail with variable buffer size
  - With 10% heavy-tailed noise traffic

Multiple equilibria: throughput depends on arrival
Dummynet experiment:
            eq 1   eq 2
  path 1    52M    13M
  path 2    61M    13M
  path 3    27M    93M

Tang, Wang, Hegde, Low, Telecom Systems, 2005

Multiple equilibria: throughput depends on arrival
Dummynet experiment (same setup); there is also an eq 3, which is unstable.

Tang, Wang, Hegde, Low, Telecom Systems, 2005

Duality model:
    $\max_{x \ge 0} \sum_i U_i(x_i)$  s.t.  $Rx \le c$
    $x_i^* = F_i\Big(\sum_l R_{li}\, p_l^*,\; x_i^*\Big)$
- Why can't we use the $F_i$'s of FAST and Reno in the duality model?
- They use different prices!
    FAST:  $F_i = x_i + \frac{\gamma_i}{T_i}\Big(\alpha_i - x_i \sum_l R_{li}\, p_l\Big)$   ($p_l$ = delay)
    Reno:  $F_i = x_i + \frac{1}{T_i^2} - \frac{x_i^2}{2} \sum_l R_{li}\, p_l$              ($p_l$ = loss)

Duality model:
- FAST and Reno react to different prices, so one Lagrange-multiplier vector cannot explain both:
    FAST: delay price, with $\dot p_l = \frac{1}{c_l}\Big(\sum_i R_{li}\, x_i(t) - c_l\Big)$
    Reno: loss price, with $\dot p_l = g_l\Big(p_l(t),\; \sum_i R_{li}\, x_i(t)\Big)$

Homogeneous protocol
[Block diagram: TCP maps $F_i$, AQM maps $G_l$, coupled through the network]
    $x_i(t+1) = F_i\Big(\sum_l R_{li}\, p_l(t),\; x_i(t)\Big)$
  same price for all sources

Heterogeneous protocol
[Block diagram: TCP maps $F_i$, AQM maps $G_l$, coupled through the network]
    $x_i^j(t+1) = F_i^j\Big(\sum_l R_{li}\, m_l^j(p_l(t)),\; x_i^j(t)\Big)$
  heterogeneous prices $m_l^j(p_l)$ for type-j sources

Heterogeneous protocols
- Equilibrium: p that satisfies
    $x_i^j(p) = f_i^j\Big(\sum_l R_{li}\, m_l^j(p_l)\Big)$
    $y_l(p) := \sum_{i,j} R_{li}^j\, x_i^j(p) \le c_l$, with equality if $p_l > 0$
- Duality model no longer applies!
  - $p_l$ can no longer serve as Lagrange multiplier

Heterogeneous protocols
- Equilibrium: p as above
- Need to re-examine all issues
  - Equilibrium: exists? unique? efficient? fair?
  - Dynamics: stable? limit cycle? chaotic?
  - Practical networks: typical behavior? design guidelines?

Notation
- Simpler notation: p is an equilibrium if $y(p) = c$ on bottleneck links
- Jacobian: $J(p) := \frac{\partial y}{\partial p}(p)$
- Linearized dual algorithm: $\dot p = J(p^*)\, p(t)$

Tang, Wang, Low, Chiang, ToN, 2007
Tang, Wei, Low, Chiang, ToN, 2010

Existence
Theorem
- Equilibrium p exists, despite the lack of an underlying utility maximization
- Generally non-unique
  - There are networks with a unique bottleneck set but infinitely many equilibria
  - There are networks with multiple bottleneck sets, each with a unique (but distinct) equilibrium

Regular networks
Definition: a regular network is a tuple (R, c, m, U) for which all equilibria p are locally unique, i.e.
    $\det J(p) := \det \frac{\partial y}{\partial p}(p) \ne 0$
Theorem
- Almost all networks are regular
- A regular network has a finite and odd number of equilibria (e.g. 1)

Global uniqueness
    $m_l^j \in [a_l,\, 2^{1/L} a_l]$ for any $a_l > 0$
    $m_l^j \in [a^j,\, 2^{1/L} a^j]$ for any $a^j > 0$
Theorem
- If price heterogeneity is small, then the equilibrium is globally unique
- Implication: a network of RED routers with slope inversely proportional to link capacity almost always has a globally unique equilibrium

Local stability
Theorem
- If price heterogeneity is small, then the unique equilibrium p is locally stable
- If all equilibria p are locally stable, then the equilibrium is globally unique
- Linearized dual algorithm: $\dot p = J(p^*)\, p(t)$; equilibrium p is locally stable if
    $\mathrm{Re}\,\lambda\big(J(p)\big) < 0$

Summary
                                     homogeneous   heterogeneous
  equilibrium                        unique        non-unique
  bandwidth allocation vs. AQM       independent   dependent
  bandwidth allocation vs. arrival   independent   dependent

- Interesting characterizations of equilibrium
- But not much understanding of dynamics

Efficiency
Result: every equilibrium p* is Pareto efficient
Proof: every equilibrium p* yields a (unique) rate $x(p^*)$ that solves, for some positive weights (written here as $\gamma_i^j(p^*)$),
    $\max_{x \ge 0} \sum_{i,j} \gamma_i^j(p^*)\, U_i^j(x_i^j)$  s.t.  $Rx \le c$

Efficiency
Result: every equilibrium p* is Pareto efficient
Measure of optimality:
    $V^* := \max_{x \ge 0} \sum_{i,j} U_i^j(x_i^j)$  s.t.  $Rx \le c$
Achieved:
    $V(p^*) := \sum_{i,j} U_i^j\big(x_i^j(p^*)\big)$
Loss of optimality is bounded:
    $\dfrac{V(p^*)}{V^*} \ge \dfrac{\min_{l,j} m_l^j}{\max_{l,j} m_l^j}$

Efficiency
- Every equilibrium p* is Pareto efficient, and the loss of optimality is bounded by the price heterogeneity (previous slide)
- e.g. a network of RED routers with default parameters suffers no loss of optimality

Intra-protocol fairness
Result: fairness among flows within each type is unaffected, i.e., still determined by their utility functions and Kelly's problem with reduced link capacities
Proof idea:
- Each equilibrium p chooses a partition of link capacities among types, $c^j := c^j(p)$
- Rates $x^j(p)$ then solve
    $\max_{x^j \ge 0} \sum_i U_i^j(x_i^j)$  s.t.  $R^j x^j \le c^j$

Inter-protocol fairness
Theorem: any fairness is achievable with a linear scaling of utility functions
    $x^j(a) := \arg\max_{x^j \ge 0} \sum_i a^j\, U_i^j(x_i^j)$  s.t.  $R^j x^j \le c^j$
all achievable rates: $X := \{x(a) : a > 0\}$

Slow timescale control
- Slow timescale scaling of the utility function:
    $x_i^j(t) = f_i^j\big(\lambda_i^j(t)\, q_i^j(t)\big)$   (scaling of end-to-end price)
    $\lambda_i^j(t+1) = \theta_i^j\, \lambda_i^j(t) + (1 - \theta_i^j)\, \dfrac{\sum_l m_l^j(p_l(t))}{\sum_l p_l(t)}$   (slow timescale update of scaling factor)

ns2 simulation: FAST throughput
[Figure: buffer = 80 pkts -- without vs. with slow timescale control]
[Figure: buffer = 400 pkts -- without vs. with slow timescale control]

Advanced topics
- Heterogeneous protocols
- Layering as optimization decomposition

The Internet hourglass
[Hourglass figure: many applications (Web, Search, Mail, News, Video, Audio, Friends ...) over TCP over IP over many link technologies (Ethernet, 802.11, 3G/4G, ATM, Optical, Satellite, Bluetooth ...)]

But what is architecture
"Architecture involves or facilitates
- system-level function (beyond components)
- organization and structure
- protocols and modules
- risk mitigation, performance, evolution
but is more than the sum of these."
-- John Doyle, Caltech

"... the architecture of a system defines how the system is broken into parts and how those parts interact."
-- Clark, Sollins, Wroclawski, ..., MIT

But what is architecture
- Things that persist over time
- Things that are common across networks
- Forms that enable functions
- Frozen, but evolves
- Intrinsic but artificial
Key features (John Doyle, Caltech):
- Layering as optimization decomposition
- Constraints that deconstrain
- Robust yet fragile

Layering as opt decomposition
- Each layer designed separately and evolves asynchronously
- Each layer optimizes certain objectives:
    application -- minimize response time (web layout)
    transport   -- maximize utility (TCP/AQM)
    network     -- minimize path costs (IP)
    link        -- reliability, channel access, ...
    physical    -- minimize SIR, max capacities, ...

Layering as opt decomposition
- Each layer is abstracted as an optimization problem
- Operation of a layer is a distributed solution
- Results of one problem (layer) are parameters of others
- Layers operate at different timescales
    $\max_{x \ge 0} \sum_i U_i(x_i)$   (application: utility)
    subject to $Rx \le c(p)$, $x \in X$
  with R from IP routing, $c(p)$ from link scheduling, and p from physical-layer power

Layering as opt decomposition
- Each layer is abstracted as an optimization problem
- Operation of a layer is a distributed solution
- Results of one problem (layer) are parameters of others
- Layers operate at different timescales
1) Understand each layer in isolation, assuming other layers are designed nearly optimally
2) Understand interactions across layers
3) Incorporate additional layers
4) Ultimate goal: entire protocol stack as solving one giant optimization problem, where individual layers are solving parts of it

Layering as opt decomposition
    Network    <->  generalized NUM
    Layers     <->  subproblems
    Layering   <->  decomposition methods
    Interfaces <->  functions of primal or dual variables
(Steps 1-4 as on the previous slide)

Examples
- Optimal web layer: Zhu, Yu, Doyle 2001
- HTTP/TCP: Chang, Liu 2004
- TCP: Kelly, Maulloo, Tan 1998, ...
- TCP/IP: Wang et al. 2005, ...
- TCP/MAC: Chen et al. 2005, ...
- TCP/power control: Xiao et al. 2001, Chiang 2004, ...
- Rate control/routing/scheduling: Eryilmaz et al. 2005, Lin et al. 2005, Neely et al. 2005, Stolyar 2005, Chen et al. 2005
Detailed survey in Proc. of IEEE, 2006

Example: dual decomposition
- Design via dual decomposition
  - Congestion control, routing, scheduling/MAC
  - As a distributed gradient algorithm to jointly solve NUM
- Provides
  - basic structure of key algorithms
  - framework to aid protocol design

Lijun Chen, Steven H. Low and John C. Doyle. Computer Networks Journal, Special Issue

Wireless mesh network
[Figure: node i with source rate $x_i^d$ and per-destination link flows $f_{ij}^d$]
    $x_i^d \le \sum_j f_{ij}^d - \sum_j f_{ji}^d$   for all $i \in N$, $d \in D$
    $x_i^d = 0$ if $i \notin S$

Wireless mesh network
- Underlying optimization problem:
    $\max_{x, f \ge 0} \; \underbrace{\sum_{(s,d)} U_s^d(x_s^d)}_{\text{utility to flows } (s,d)} - \underbrace{\sum_{(i,j)} \sum_d \beta_{ij}\, f_{ij}^d}_{\text{cost of using links } (i,j)}$
    s.t.  $x_i^d \le \sum_j f_{ij}^d - \sum_j f_{ji}^d$   (local flow constraint)
          f in the schedulability region                  (schedulability constraint)

Dual decomposition
- Congestion control: source i uses the local congestion price $p_i^d$ (price = queueing delay) and its utility $U_i^d$ to set the transmission rate $x_i^d$
- Routing: each node picks the output queue d* to service, using neighbor congestion prices $p_j^d$; this yields link weights $w_{ij}$
- Scheduling: choose the links (i, j) to transmit via a max-weight computation over the conflict graph
[Figure: congestion control / routing / scheduling loop exchanging prices, rates, and weights]
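A minimal sketch (Python) of one time slot of this decomposition, in the spirit of back-pressure; everything here (data structures, the greedy conflict resolution, the inverse marginal utility) is an illustrative assumption, not the paper's algorithm:

  def one_slot(q, links, conflicts, Uprime_inv):
      # q[(i, d)] : queue length at node i for destination d = price p_i^d
      # links     : list of directed links (i, j)
      # conflicts : set of frozenset({l, m}) link pairs that cannot be on together
      dests = {d for (_, d) in q}

      # 1) congestion control: source rate from the local price
      x = {(i, d): Uprime_inv(max(q[(i, d)], 1e-9)) for (i, d) in q}

      # 2) routing: per link, serve the destination with max differential backlog
      w, choice = {}, {}
      for (i, j) in links:
          d_star = max(dests,
                       key=lambda d: q.get((i, d), 0.0) - q.get((j, d), 0.0))
          choice[(i, j)] = d_star
          w[(i, j)] = max(q.get((i, d_star), 0.0) - q.get((j, d_star), 0.0), 0.0)

      # 3) scheduling: greedy max-weight set of mutually non-conflicting links
      active = []
      for l in sorted(w, key=w.get, reverse=True):
          if all(frozenset({l, m}) not in conflicts for m in active):
              active.append(l)
      return x, choice, active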

Algorithm architecture
[Block diagram: Application (utility $U_i^d$) -> Congestion Control (local price $p_i^d$, rate $x_i^d$) -> Routing (neighbor queue lengths $p_j^d$; selects d*) -> Scheduling/MAC (conflict graph, weights $w_{ij}$; En/Dequeue, Xmit/Rcv) -> Physical Xmission; supporting modules: Estimation, Topology Control, Radio Mgt, Mobility Management, Security Mgt]
