18csc310j Unit 5
for
V Sem
B.Tech (CSE - Cloud Computing)
Department of NWC
CO - Course Learning Outcomes
• CO1 - Apply various data centric networking concepts.
• CO2 - Identify the different data centre architectures & core network
connectivity issues.
• CO3 - Design of server architectures at the layer 2-3 level for Data centres.
• CO4 - Demonstrate various networking protocols in layer 2 networks.
• CO5 - Evaluate and choose the appropriate networking techniques used in
Layer 3 networks
18CSC310J-DCNSD NWC/SRMIST 2
18CSC310J
Data Centric Networking and
System Design
Unit V
Outline of the Presentation
• Introduction to Layer 3 Networks
• Layer 3 Data Center Technologies
• Locator Identifier Separation Protocol (LISP)
• Layer 3 Multicasting
• Protocol: IPv4
• Protocol: IPv6
• Protocols: MPLS, OSPF
• Protocols: IS-IS, BGP
• OTV & VPLS Layer 2 Extension
Introduction to Layer 3 Networks
Layer 3 Networks
• In today’s rapidly evolving digital landscape, data centers
play a critical role in ensuring the seamless functioning of
businesses and organizations.
• Among the various types of data centers, Layer 3 data
centers stand out as the backbone of modern networking.
Layer 3 Networks
• A Layer 3 data center, also known as a network layer data
center, is an advanced infrastructure that operates at the
network layer of the OSI (Open Systems Interconnection)
model.
• It acts as a gateway, facilitating communication between
various networks, both internally and externally.
• Layer 3 data centers are responsible for routing and
forwarding data packets across multiple networks,
ensuring efficient and secure transmission.
Layer 3 Networks
• A Layer 3 Data Center is a type of data center that utilizes
Layer 3 switching technology to provide network
connectivity and traffic control.
• Layer 3 Data Centers are typically used in large-scale
enterprise networks, providing reliable services and high
performance.
Layer 3 Networks
• Layer 3 Data Centers are differentiated from other data
centers by their use of Layer 3 switching.
• Layer 3 switching, also known as Layer 3 networking, is a
switching technology that operates at the third layer of the
Open Systems Interconnection (OSI) model, the network
layer.
• This switching type manages network routing, addressing,
and traffic control and supports various protocols.
Layer 3 Networks
• Layer 3 Data Centers are typically characterized by their
use of high-performance routers and switches.
• These routers and switches are designed to deliver robust
performance, scalability, and high levels of security.
• In addition, by using Layer 3 switching, these data centers
can provide reliable network services such as network
access control, virtual LANs, and Quality of Service (QoS)
management.
Layer 3 Networks
Layer 3 DC Key Features and Functionalities:
1. Network Routing: Layer 3 data centers excel in routing data packets
across networks, using advanced routing protocols such as OSPF
(Open Shortest Path First) and BGP (Border Gateway Protocol). This
enables efficient traffic management and optimal utilization of
network resources.
2. IP Addressing: Layer 3 data centers assign and manage IP
addresses, allowing devices within a network to communicate with
each other and external networks. IP addressing helps in identifying
and locating devices, ensuring reliable data transmission.
Layer 3 Networks
Layer 3 DC Key Features and Functionalities:
3. Interconnectivity: Layer 3 data centers provide seamless
connectivity between different networks, whether they are local area
networks (LANs), wide area networks (WANs), or the internet. This
enables organizations to establish secure and reliable connections
with their branches, partners, and customers.
4. Load Balancing: Layer 3 data centers distribute network traffic
across multiple servers or network devices, ensuring that no single
device becomes overwhelmed. This helps to maintain network
performance, improve scalability, and prevent bottlenecks.
Layer 3 Networks
Benefits of Layer 3 Data Centers:
1. Enhanced Performance: Layer 3 data centers optimize network
performance by efficiently routing traffic, reducing latency, and
ensuring faster data transmission. This results in improved application
delivery, enhanced user experience, and increased productivity.
2. Scalability: Layer 3 data centers are designed to support the
growth and expansion of networks. Their ability to route data across
multiple networks enables organizations to scale their operations
seamlessly, accommodate increasing traffic, and add new devices
without disrupting the network infrastructure.
Layer 3 Networks
Benefits of Layer 3 Data Centers:
3. High Security: Layer 3 data centers provide enhanced security
measures, including firewall protection, access control policies,
and encryption protocols. These measures safeguard sensitive
data, protect against cyber threats, and ensure compliance with
industry regulations.
4. Flexibility: Layer 3 data centers offer network architecture and
design flexibility. They allow organizations to implement different
network topologies based on their specific requirements, such as
hub-and-spoke, full mesh, or partial mesh.
Layer 3 Data Center Technologies
Layer 3 Data Center
Technologies
Multipath Route Forwarding
• Many networks implement VLANs to support random IP
address assignment and IP mobility.
• The switches perform layer-2 forwarding even though
they might be capable of layer-3 IP forwarding.
• For example, they forward packets based on MAC
addresses within a subnet, yet a layer-3 switch does not
need Layer 2 information to route IPv4 or IPv6 packets.
Layer 3 Data Center
Technologies
Multipath Route Forwarding
• Cumulus has gone one step further and made it possible
to configure every server-to-ToR interface as a Layer 3
interface.
• Their design permits multipath default route
forwarding, removing the need for ToR interconnects and
common broadcast domain sharing of uplinks.
Layer 3 Data Center
Technologies
Bonding Vs. ECMP
• A typical server environment consists of a single server with two
uplinks.
• For device and link redundancy, uplinks are bonded into a port
channel and terminated on different ToR switches, forming an
MLAG.
• As this is an MLAG design, the ToR switches need an inter-
switch link.
• Therefore, you cannot bond server NICs to two separate ToR
switches without creating an MLAG.
Layer 3 Data Center
Technologies
Bonding Vs. ECMP
Layer 3 Data Center
Technologies
Pure layer-3 solution complexities
• Firstly, we cannot have one IP address with two MAC
addresses. To overcome this, we implement additional Linux
features.
• First, Linux has the capability for an unnumbered interface,
permitting the assignment of the same IP address to both
interfaces, one IP address for two physical NICs.
• Next, we assign a /32 Anycast IP address to the host via a
loopback address.
Layer 3 Data Center
Technologies
Pure layer-3 solution complexities
• Secondly, the end hosts must send to a next-hop, not a shared
subnet.
• Linux allows you to specify an attribute to the received default
route, called “on-link.”
• This attribute tells end-hosts, “I might not be on a directly
connected subnet to the next hop, but trust me, the next hop is
on the other side of this link.”
• It forces hosts to send ARP requests regardless of common
subnet assignment.
Layer 3 Data Center
Technologies
Pure layer-3 solution complexities
• These techniques enable the assignment of the same IP
address to both interfaces and permit forwarding a default
route out of both interfaces.
• Each interface is on its own broadcast domain.
• Subnets can span two ToRs without requiring bonding
or an inter-switch link.
Layer 3 Data Center
Technologies
Standard ARP processing still works.
• Although the Layer 3 ToR switch doesn’t need Layer 2 information to
route IP packets, the Linux end-host believes it has to deal with the
traditional L2/L3 forwarding environment.
• As a result, the Layer 3 switch continues to reply to incoming ARP
requests.
• The host will ARP for the ToR Anycast gateway (even though it’s not
on the same subnet), and the ToR will respond with its MAC address.
• The host ARP table will only have one ARP entry because the default
route points to a next-hop, not an interface.
Layer 3 Data Center
Technologies
Standard ARP processing still works.
• Return traffic is slightly different, depending on what the ToR
advertises to the network.
• There are two modes; firstly, if the ToR advertises a /24 to
the rest of the network, everything works fine until
the server-to-ToR link fails.
• Then, it becomes a layer-2 problem: the ToR has told the network it can
reach the whole subnet, so return traffic must traverse an
inter-switch ToR link to get back to the server.
Layer 3 Data Center
Technologies
Standard ARP processing still works.
• But this goes against our previous design requirement of
removing any ToR inter-switch links.
• Essentially, you need to opt for the second mode and
advertise a /32 for each host back into the network.
Layer 3 Data Center
Technologies
Standard ARP processing still works.
• Take the information learned in ARP, consider it a host routing
protocol, and redistribute it into the data center protocol, i.e.,
redistribute ARP.
• The ARP table gets you the list of neighbors, and the redistribution
pushes those entries into the routed fabric as /32 host routes.
• This allows you to redistribute only what /32 are active and present
in ARP tables.
• It should be noted that this is not a default mode and is currently an
experimental feature.
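The "redistribute ARP" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the entry fields (`ip`, `state`, `interface`), the `REACHABLE` state name, and the function `arp_to_host_routes` are hypothetical, not the switch's actual implementation.

```python
import ipaddress

def arp_to_host_routes(arp_table, server_subnet):
    """Turn live ARP neighbours inside server_subnet into /32 host routes."""
    subnet = ipaddress.ip_network(server_subnet)
    routes = []
    for entry in arp_table:
        addr = ipaddress.ip_address(entry["ip"])
        # Only neighbours that are active and belong to the server subnet
        # are redistributed; stale entries are not advertised.
        if entry["state"] == "REACHABLE" and addr in subnet:
            routes.append((ipaddress.ip_network(f"{addr}/32"), entry["interface"]))
    return routes

# Hypothetical ARP table on one ToR switch.
arp_table = [
    {"ip": "10.1.1.10", "state": "REACHABLE", "interface": "swp1"},
    {"ip": "10.1.1.11", "state": "STALE", "interface": "swp2"},
    {"ip": "192.0.2.7", "state": "REACHABLE", "interface": "swp3"},
]
host_routes = arp_to_host_routes(arp_table, "10.1.1.0/24")
```

Only the first entry becomes a host route here, which mirrors the point that only /32s that are active and present in the ARP table are advertised.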
Locator Identifier Separation Protocol
(LISP)
LISP
• Cisco Locator ID Separation Protocol (LISP) is a mapping
and encapsulation protocol, originally developed to
address the routing scalability issues on the Internet.
Layer 3 Multicasting
Layer 3 multicast protocols include multicast group
management protocols and multicast routing protocols.
Layer 3 Multicasting
Multicast group management protocols:
IP Service
• IP supports the following services:
• one-to-one (unicast)
• one-to-all (broadcast)
• one-to-several (multicast)
IP datagram
IP Datagram Format
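Each row is 4 bytes (bits 0 to 31):
  bits 0-3 version | bits 4-7 header length | bits 8-13 DS | bits 14-15 ECN | bits 16-31 total length (in bytes)
  bits 0-15 Identification | bits 16-18 flags (0, DF, MF) | bits 19-31 Fragment offset
  bits 0-7 time-to-live (TTL) | bits 8-15 protocol | bits 16-31 header checksum
  source IP address (32 bits)
  destination IP address (32 bits)
  options (0 to 40 bytes)
  payload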
Total length of the datagram = Length of the header + Length of the data
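As a rough illustration of the layout above, the fixed 20-byte part of the header can be decoded with Python's `struct` module. The function name `parse_ipv4_header` and the sample bytes are made up for this sketch; options are not decoded.

```python
import struct

def parse_ipv4_header(raw: bytes) -> dict:
    """Decode the fixed 20-byte part of an IPv4 header (options are ignored)."""
    (ver_ihl, ds_ecn, total_length, identification, flags_frag,
     ttl, protocol, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    return {
        "version": ver_ihl >> 4,
        "header_length_bytes": (ver_ihl & 0x0F) * 4,  # field counts 32-bit words
        "ds": ds_ecn >> 2,
        "ecn": ds_ecn & 0x03,
        "total_length": total_length,
        "identification": identification,
        "df": bool(flags_frag & 0x4000),
        "mf": bool(flags_frag & 0x2000),
        "fragment_offset": flags_frag & 0x1FFF,       # in units of 8 bytes
        "ttl": ttl,
        "protocol": protocol,
        "header_checksum": checksum,
        "source": ".".join(str(b) for b in src),
        "destination": ".".join(str(b) for b in dst),
    }

# Hand-made sample: version 4, 20-byte header, total length 84, DF set,
# TTL 64, protocol 1 (ICMP), 10.0.0.1 -> 10.0.0.2, checksum left as zero.
sample = bytes.fromhex("45000054abcd4000400100000a0000010a000002")
header = parse_ipv4_header(sample)
data_length = header["total_length"] - header["header_length_bytes"]
```

Here `data_length` follows the relation stated above: total length minus header length.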
Fields of the IP Header
• DS/ECN field (1 byte)
– This field was previously called the Type-of-Service (TOS) field.
The role of this field has been redefined, but it is “backwards
compatible” with the TOS interpretation
– Differentiated Service (DS) (6 bits):
• Used to specify service level (currently not supported in the
Internet)
– Explicit Congestion Notification (ECN) (2 bits):
• New feedback mechanism used by TCP
• Flags (3 bits):
– First bit always set to 0
– DF bit (Do not fragment)
– MF bit (More fragments)
(Explained later under Fragmentation.)
Fields of the IP Header
• Time To Live (TTL) (1 byte):
– Specifies the maximum number of hops a datagram may traverse before it is dropped
– Role of TTL field: Ensure that packet is eventually dropped
when a routing loop occurs
Used as follows:
– Sender sets the value (e.g., 64)
– Each router decrements the value by 1
– When the value reaches 0, the datagram is dropped
Fields of the IP Header
• Protocol (1 byte):
• Specifies the higher-layer protocol.
• Used for demultiplexing to higher layers.
Value (Decimal)   Protocol
1                 ICMP
2                 IGMP
4                 IP-in-IP encapsulation
6                 TCP
17                UDP
• The limit on the maximum IP datagram size, imposed by the data link protocol is
called maximum transmission unit (MTU)
MTU
• The amount of data that can be transmitted in a single
frame is called Maximum Transmission Unit (MTU) and varies
with the network technology that is used.
• MTU size is measured in bytes.
• For example, the MTU for Ethernet is 1,500 bytes,
whereas it is 4,352 bytes for FDDI.
IP Fragmentation
• What if the size of an IP datagram exceeds the MTU?
IP datagram is fragmented into smaller units.
• What if the route contains networks with different MTUs?
[Figure: Host A, Router, and Host B connected over an FDDI ring (MTU 4352) and an Ethernet (MTU 1500)]
• Fragmentation:
• IP router splits the datagram into several datagrams
• Fragments are reassembled at receiver
IP Fragmentation
• If a datagram fits in a single frame, data transmission
is simple.
• However, if the datagram is larger than the frame can
accommodate, it must be divided into smaller units called
fragments and sent in multiple frames.
• The process of dividing a datagram into fragments is
called fragmentation.
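A small worked example may help here. The sketch below splits a datagram for a smaller MTU, assuming a 20-byte header with no options; the function name `fragment` and its return format are illustrative choices, not part of any standard API. It relies on the rule that the fragment offset is counted in 8-byte units, so every fragment except the last carries a multiple of 8 data bytes.

```python
def fragment(total_length: int, mtu: int, header_len: int = 20):
    """Return a list of (offset_in_8_byte_units, data_bytes, more_fragments)."""
    data = total_length - header_len
    # Largest data size per fragment that is a multiple of 8 and fits the MTU.
    max_data = (mtu - header_len) // 8 * 8
    fragments = []
    sent = 0
    while sent < data:
        size = min(max_data, data - sent)
        more = sent + size < data      # MF flag: set on all but the last fragment
        fragments.append((sent // 8, size, more))
        sent += size
    return fragments

# A 4352-byte datagram (FDDI MTU) forwarded onto Ethernet (MTU 1500).
result = fragment(4352, 1500)
# result == [(0, 1480, True), (185, 1480, True), (370, 1372, False)]
```

The three fragments carry 1480 + 1480 + 1372 = 4332 data bytes, which is the original 4352 bytes minus the 20-byte header.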
Where is Fragmentation done?
Router
What’s involved in Fragmentation?
• The following fields in the IP header are involved:
[Figure: first three rows of the IP header: version, header length, DS, ECN, total length (in bytes); Identification, flags (0, DF, MF), Fragment offset; time-to-live (TTL), protocol, header checksum]
IP Service
• Delivery service of IP is minimal
• Service is:
  • Unreliable: Losses, duplicates, out-of-order delivery
  • Best effort: Packets not discarded capriciously,
    delivery failure not necessarily reported
  • Connectionless: Each packet is treated
    independently
• Consequences:
Figure 24-3 IP Datagram
Why IPv6?
• Deficiency of IPv4
• Address space exhaustion
• New types of service -> Integration
– Multicast
– Quality of Service
– Security
– Mobility (MIPv6)
• Header and format limitations
Problems with IPv4: Address Space Exhaustion
Problems with IPv4: Routing Table Explosion
Problems with IPv4: Other Limitations
IP Address Extension
• Strict monitoring of IP address assignment
• Private IP addresses for intranets
– Only class C or a part of class C to an organization
– Encourage use of proxy services
• Application level proxies
• Network Address Translation (NAT)
• Remaining class A addresses may use CIDR
• Reserved addresses may be assigned
IPv6: Advanced Features
• Header format simplification
• Expanded routing and addressing capabilities
• Improved support for extensions and options
• Flow labeling (for QoS) capability
• Auto-configuration and Neighbour discovery
• Authentication and privacy capabilities
• Simple transition from IPv4
IPv6: Datagram
Format of the Base header
IPv6 Header Fields
• Version number (4-bit field)
The value is always 6.
• Flow label (20-bit field)
Used to label packets requesting special handling by routers.
• Traffic class (8-bit field)
Used to mark classes of traffic.
The nodes that originate a packet must identify different classes or different priorities of IPv6
packets. The nodes use the Traffic Class field in the IPv6 header to make this identification.
The routers that forward the packets also use the Traffic Class field for the same purpose.
• Payload length (16-bit field)
Length of the packet following the IPv6 header, in octets.
• Next header (8-bit field)
The type of header immediately following the IPv6 header.
IPv6 Header Fields
• Hop limit (8-bit field)
Decremented by 1 by each node that forwards the packet.
Packet discarded if hop limit is decremented to zero.
• Source Address (128-bit field)
An address of the initial sender of the packet.
• Destination Address (128-bit field)
An address of the intended recipient of the packet. May not be the
ultimate recipient, if Routing Header is present.
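The field widths listed above add up to a fixed 40-byte base header. The following sketch packs one with Python's `struct` and `ipaddress` modules; the helper name `build_ipv6_header` and the sample addresses (from the 2001:db8::/32 documentation range) are assumptions for illustration only.

```python
import ipaddress
import struct

def build_ipv6_header(traffic_class, flow_label, payload_length,
                      next_header, hop_limit, src, dst) -> bytes:
    """Pack the 40-byte IPv6 base header from its individual fields."""
    if not (0 <= traffic_class < 2**8 and 0 <= flow_label < 2**20):
        raise ValueError("traffic class is 8 bits, flow label is 20 bits")
    # First 32-bit word: version (4 bits) | traffic class (8) | flow label (20).
    first_word = (6 << 28) | (traffic_class << 20) | flow_label
    return (struct.pack("!IHBB", first_word, payload_length, next_header, hop_limit)
            + ipaddress.IPv6Address(src).packed
            + ipaddress.IPv6Address(dst).packed)

base = build_ipv6_header(0, 0x12345, 100, 6, 64, "2001:db8::1", "2001:db8::2")
version = base[0] >> 4
hop_limit_after_one_router = base[7] - 1   # each forwarding node decrements by 1
```

A forwarding node would discard the packet once the decremented hop limit reaches zero, as described above.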
FLOW LABEL
• In version 6, the flow label has been directly added to
the format of the IPv6 datagram to allow us to use IPv6
as a connection-oriented protocol.
• To a router, a flow is a sequence of packets that share
the same characteristics, such as traveling the same
path, using the same resources, having the same
kind of security, and so on.
Flow label
• A router that supports the handling of flow labels has a flow
label table.
• The table has an entry for each active flow label; each entry
defines the services required by the corresponding flow
label.
• When the router receives a packet, it consults its flow label
table to find the corresponding entry for the flow label value
defined in the packet.
• It then provides the packet with the services mentioned in
the entry.
Flow label
• In its simplest form, a flow label can be used to
speed up the processing of a packet by a router.
• When a router receives a packet, instead of
consulting the routing table and going through a
routing algorithm to define the address of the next
hop, it can easily look in a flow label table for the
next hop.
Flow label
• Flow label can be used to support the transmission of real-time
audio and video.
• Real-time audio or video, particularly in digital form, requires
resources such as high bandwidth, large buffers, long
processing time, and so on.
• A process can make a reservation for these resources
beforehand to guarantee that real-time data will not be delayed
due to a lack of resources.
• The use of real-time data and the reservation of these
resources require other protocols such as Real-Time Protocol
(RTP) and Resource Reservation Protocol (RSVP) in addition to
IPv6
Flow label
• To allow the effective use of flow labels, three rules have been
defined:
1. The flow label is assigned to a packet by the source host. The
label is a random number between 1 and 2^20 – 1. A source
must not reuse a flow label for a new flow while the existing flow
is still alive.
2. If a host does not support the flow label, it sets this field to zero.
If a router does not support the flow label, it simply ignores it.
3. All packets belonging to the same flow have the same source,
same destination, same priority, and same options.
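Rule 1 can be illustrated with a small allocator on the source host. This is a sketch under assumptions: it uses the 20-bit Flow label width listed under IPv6 Header Fields, and the class `FlowLabelAllocator`, its methods, and the flow key format are hypothetical names chosen for the example.

```python
import random

FLOW_LABEL_BITS = 20  # assumption: width of the Flow label field in the base header

class FlowLabelAllocator:
    """Hands out random, non-zero flow labels that are unique among live flows."""

    def __init__(self, rng=None):
        self._rng = rng or random.Random()
        self._active = {}  # flow key -> label currently in use

    def label_for(self, flow_key):
        # Packets of an existing flow keep the label already assigned to it.
        if flow_key in self._active:
            return self._active[flow_key]
        while True:
            label = self._rng.randint(1, 2**FLOW_LABEL_BITS - 1)
            # A label must not be reused while another flow still holds it.
            if label not in self._active.values():
                self._active[flow_key] = label
                return label

    def release(self, flow_key):
        """Forget a finished flow so its label may be reused later."""
        self._active.pop(flow_key, None)

alloc = FlowLabelAllocator(random.Random(0))
video = alloc.label_for(("2001:db8::1", "2001:db8::2", "video"))
audio = alloc.label_for(("2001:db8::1", "2001:db8::2", "audio"))
```

A host that does not support flow labels would simply leave the field at zero, as rule 2 states.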
Header: from IPv4 to IPv6
Changed Removed
IPv4 Vs IPv6
IPv4 & IPv6 Header Comparison
IPv4 Header:
  Version | IHL | Type of Service | Total Length
  Identification | Flags | Fragment Offset
  Time to Live | Protocol | Header Checksum
  Source Address
  Destination Address
  Options | Padding
IPv6 Header:
  Version | Traffic Class | Flow Label
  Payload Length | Next Header | Hop Limit
  Source Address
  Destination Address
Extension Headers
MPLS
• VPNs - using MPLS, service providers can create IP tunnels throughout their
network, without the need for encryption or end-user applications
• Layer 2 Transport - New standards being defined by the IETF's PWE3 and
PPVPN working groups allow service providers to carry Layer 2 services
including Ethernet, Frame Relay and ATM over an IP/MPLS core
Some MPLS Benefits
• Elimination of Multiple Layers - Typically most carrier networks employ an
overlay model where SONET/SDH is deployed at Layer 1, ATM is used at
Layer 2 and IP is used at Layer 3. Using MPLS, carriers can migrate many
of the functions of the SONET/SDH and ATM control plane to Layer 3,
thereby simplifying network management and network complexity.
Eventually, carrier networks may be able to migrate away from SONET/SDH
and ATM altogether, which means elimination of ATM's inherent "cell-tax" in
carrying IP traffic.
MPLS History
• IP over ATM
• IP Switching by Ipsilon
• Cell Switching Router (CSR) by Toshiba
• Tag switching by Cisco
• Aggregate Route-based IP Switching (IBM)
• IETF – MPLS
– http://www.ietf.org/html.charters/mpls-charter.html
– RFC3031 – MPLS Architecture
– RFC2702 – Requirements for TE over MPLS
– RFC3036 – LDP Specification
MPLS and the OSI model
(MPLS is a layer 2.5 protocol)
Applications
TCP UDP
IP
MPLS MPS
PPP FR ATM Ethernet DWDM
Physical
• What is it?
• Goal: sending a packet from A to B
– We can do it in a broadcast way.
– We can use source routing where the source
determines the path.
– How do we do it on the Internet today?
• Hop-by-hop routing: continue asking who is
closer to B at every stop (hop).
Using Label on the network
(This is not new!)
• ATM: VPI/VCI
• Frame Relay: DLCI
• X.25: LCI (logical Channel Identifier)
• TDM: the time slot (Circuit Identification Code)
• Ethernet switching: ???
Q: do you see any commonality of these labels?
Label Substitution (swapping)
Label-A1 Label-B1
Label-A2 Label-B2
Label-A3 Label-B3
Label-A4 Label-B4
MPLS
• A protocol to establish an end-to-end path from
source to the destination
• A hop-by-hop forwarding mechanism
• Use labels to set up the path
– Require a protocol to set up the labels along the path
• It builds a connection-oriented service on the IP
network
Terminology
• LSR - Routers that support MPLS are called Label Switch Router
• LER - LSR at the edge of the network is called Label Edge Router (a.k.a
Edge LSR)
– Ingress LER is responsible for adding labels to unlabeled IP packets.
– Egress LER is responsible for removing the labels.
• Label Switched Path (LSP) – the path defined by the labels through LSRs
between two LERs.
• Label Forwarding Information Base (LFIB) – a forwarding table that maps
incoming labels to outgoing labels and interfaces.
• Forwarding Equivalence Class (FEC) – a group of IP packets that follow the
same path on the MPLS network and receive the same treatment at each node.
How does it work?
[Figure: IP Routing -> Label Switching -> Label Switching -> IP Routing]
MPLS Operation
Router   In label   In interface   Destination   Out interface   Out label
R1       ---        E0             172.16.1.0    S1              6
R2       6          S0             172.16.1.0    S2              11
R3       11         S0             172.16.1.0    S3              7
R4       7          S1             172.16.1.0    E0              --
Q: create LFIB for R4 => R3 => R2 => R1
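The table above can be walked through in code. The sketch below follows one packet along R1 => R2 => R3 => R4 using the label values from the slide; the dictionary layout and the function `forward` are illustrative assumptions, with `None` standing for "no label" (an unlabeled IP packet).

```python
# Per-router LFIB: incoming label -> (outgoing interface, outgoing label).
LFIB = {
    "R1": {None: ("S1", 6)},   # ingress LER: pushes label 6
    "R2": {6: ("S2", 11)},     # LSR: swaps 6 -> 11
    "R3": {11: ("S3", 7)},     # LSR: swaps 11 -> 7
    "R4": {7: ("E0", None)},   # egress LER: removes the label
}

def forward(path, lfib):
    """Trace a packet hop by hop, returning (router, in_label, out_if, out_label)."""
    label = None               # the packet enters the MPLS network unlabeled
    trace = []
    for router in path:
        out_interface, out_label = lfib[router][label]
        trace.append((router, label, out_interface, out_label))
        label = out_label      # the next router sees this label as its input
    return trace

trace = forward(["R1", "R2", "R3", "R4"], LFIB)
```

Each router looks up only the incoming label, never the destination address, which is the point of label swapping. The reverse-direction LFIB asked for in the question is left as the exercise.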
MPLS process
Routing Protocol
L2 Header | Label Header | IP Datagram
MPLS Encapsulation is specified over various media
types. Labels may use existing format (e.g., VPI/VCI)
or use a new shim label format.
Shim Header
• Traffic Engineering
• Virtual Private Network
• Quality of Service (QoS)
Traffic Engineering
• Traffic engineering allows a network administrator to make the path
deterministic and bypass the normal routed hop-by-hop paths. An
administrator may elect to explicitly define the path between stations
to ensure QoS or have the traffic follow a specified path to reduce
traffic loading across certain hops.
• The network administrator can reduce congestion by forcing the
frame to travel around the overloaded segments. Traffic engineering,
then, enables an administrator to define a policy for forwarding
frames rather than depending upon dynamic routing protocols.
• Traffic engineering is similar to source-routing in that an explicit
path is defined for the frame to travel. However, unlike source-
routing, the hop-by-hop definition is not carried with every frame.
Rather, the hops are configured in the LSRs ahead of time along with
the appropriate label values.
MPLS – Traffic Engineering
[Figure: labelled IP packets travel from LER 1 to LER 4 around two overloaded segments; the explicit path reads "Forward to LSR 2, LSR 3, LSR 4, LSR X"]
[Figure: MPLS VPN. Customer sites attach to MPLS Edge routers around an MPLS Core. VPN_A sites: 10.1.0.0, 10.2.0.0, 11.5.0.0, 11.6.0.0. VPN_B sites: 10.1.0.0, 10.2.0.0, 10.3.0.0.]
[Figure: label tables along LSPs between networks 192.168.3.0 and 192.168.4.0; each entry pairs an incoming label and interface with an outgoing label and interface (labels 10 to 80, interfaces E1 to E3)]
MPLS and QoS
• An important proposed MPLS capability is quality of service (QoS) support.
• QoS mechanisms:
– Pre-configuration based on physical interface
– Classification of incoming packets into different classes
– Classification based on network characteristics (such as congestion,
throughput, delay, and loss)
• A label corresponding to the resultant class is applied to the packet.
• Labeled packets are handled by LSRs in their path without needing to be
reclassified.
• MPLS enables simple logic to find the state that identifies how the packet should be
scheduled.
• The exact use of MPLS for QoS purposes depends a great deal on how QoS is
deployed.
• Supports various QoS protocols, such as IntServ, DiffServ, and RSVP.
FEC QoS Classification
[Figure: between LER and LSR, the Layer 3 IPv4 header (Version, Length, ToS, other IP header info, Data) is examined; the 1-byte ToS field, bits 7 to 0, holds the IP Precedence bits followed by unused bits]
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                Label                  | EXP |S|      TTL      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
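The 32-bit shim header drawn above (20-bit Label, 3-bit EXP, 1-bit S, 8-bit TTL) can be packed and unpacked with a few shifts. The function names `pack_shim` and `unpack_shim` are made up for this sketch.

```python
import struct

def pack_shim(label: int, exp: int, s: int, ttl: int) -> bytes:
    """Encode one MPLS shim header entry as 4 bytes in network byte order."""
    if not (0 <= label < 2**20 and 0 <= exp < 8 and s in (0, 1) and 0 <= ttl < 256):
        raise ValueError("field out of range")
    return struct.pack("!I", (label << 12) | (exp << 9) | (s << 8) | ttl)

def unpack_shim(raw: bytes) -> dict:
    """Decode the first 4 bytes of raw into the shim header fields."""
    (word,) = struct.unpack("!I", raw[:4])
    return {
        "label": word >> 12,
        "exp": (word >> 9) & 0x7,   # carries the QoS class of the packet
        "s": (word >> 8) & 0x1,     # 1 on the bottom entry of the label stack
        "ttl": word & 0xFF,
    }

shim = pack_shim(label=6, exp=5, s=1, ttl=64)
fields = unpack_shim(shim)
```

Because the class travels in the EXP bits, an LSR can schedule the packet from the label entry alone, without reclassifying the IP header.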
MPLS between Carriers?
Carrier-B
Carrier-A
[Figure: LSAs exchanged between routers R1, R2, R3, and R4 on an FDDI dual ring; links to networks N1, N4, and N5 are labelled Cost = 10]
OSPF: How it works
• Hello Protocol
– Responsible for establishing and maintaining
neighbour relationships
– Elects Designated Router on broadcast networks
[Figure: routers exchanging Hello packets across an FDDI dual ring]
OSPF: How it works
• Hello Protocol
– Hello Packets sent periodically on all OSPF enabled interfaces
– Adjacencies formed between some neighbours
• Hello Packet
– Contains information like Router Priority, Hello Interval, a list of
known neighbours, Router Dead Interval, and the network
mask
OSPF: How it works
• Trade Information using LSAs
– LSAs are added to the OSPF database
– LSAs are passed on to OSPF neighbours
• Each router builds an identical link state database
• SPF algorithm run on the database
• Forwarding table built from the SPF tree
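The last two steps can be sketched with Dijkstra's shortest-path algorithm, which is what the SPF run amounts to. The link-state database below is a hypothetical four-router topology, and `spf` is an illustrative helper, not an OSPF implementation.

```python
import heapq

def spf(lsdb, root):
    """Run Dijkstra over the link-state database; return (cost, next_hop) maps."""
    dist = {root: 0}
    next_hop = {}
    heap = [(0, root)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > dist[node]:
            continue                      # stale heap entry, a better path exists
        for neighbour, link_cost in lsdb.get(node, {}).items():
            new_cost = cost + link_cost
            if new_cost < dist.get(neighbour, float("inf")):
                dist[neighbour] = new_cost
                # The first hop is inherited from the parent, except for
                # direct neighbours of the root.
                next_hop[neighbour] = neighbour if node == root else next_hop[node]
                heapq.heappush(heap, (new_cost, neighbour))
    return dist, next_hop

# Hypothetical database; every router would hold an identical copy.
lsdb = {
    "R1": {"R2": 10, "R3": 5},
    "R2": {"R1": 10, "R3": 3, "R4": 1},
    "R3": {"R1": 5, "R2": 3},
    "R4": {"R2": 1},
}
cost, forwarding_table = spf(lsdb, "R1")
```

From R1, the route to R2 goes through R3 (cost 5 + 3 = 8) rather than over the direct link of cost 10, so the forwarding table points R2 and R4 at next hop R3.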
OSPF: How it works
• When change occurs:
– Announce the change to all OSPF neighbours
– All routers run the SPF algorithm on the revised database
– Install any change in the forwarding table
Broadcast Networks
• These are network technologies such as Ethernet and
FDDI
• Introduces Designated and Backup Designated routers
(DR and BDR)
– Only DR and BDR form full adjacencies with other routers
– The remaining routers remain in a “2-way” state with each other
• If they were adjacent, we’d have n-squared scaling problem
– If DR or BDR “disappear”, re-election of missing router takes
place
Designated Router
[Figure: two broadcast segments, each with a Designated Router and a Backup Designated Router]
Designated Router
• All routers are adjacent to the DR
– All routers are adjacent to the BDR also
• All routers exchange routing information with DR (..)
– All routers exchange routing information with the BDR
• DR updates the database of all its neighbours
– BDR updates the database of all its neighbours
• This scales! 2n problem rather than having an n-squared
problem.
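The scaling claim is easy to check numerically. In this sketch the two helper functions are illustrative names; the DR/BDR count assumes every other router is adjacent to both the DR and the BDR, plus the one adjacency between DR and BDR.

```python
def full_mesh_adjacencies(n: int) -> int:
    """Adjacencies if every router on the segment were adjacent to every other."""
    return n * (n - 1) // 2

def dr_bdr_adjacencies(n: int) -> int:
    """Adjacencies when routers only become adjacent to the DR and the BDR."""
    if n < 2:
        return 0
    # (n - 2) other routers, two adjacencies each, plus the DR-BDR adjacency.
    return 2 * (n - 2) + 1

comparison = {n: (full_mesh_adjacencies(n), dr_bdr_adjacencies(n)) for n in (4, 10, 50)}
# comparison == {4: (6, 5), 10: (45, 17), 50: (1225, 97)}
```

With 50 routers on one segment the difference is 1225 adjacencies against 97, which is the "2n rather than n-squared" behaviour.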
Designated Router
DR BDR
DR
144.254.3.5
More Advanced OSPF
• OSPF Areas
• Virtual Links
• Router Classification
• OSPF route types
• External Routes
• Route authentication
• Equal cost multipath
OSPF Areas
• Group of contiguous hosts and networks
• Per area topological database
– Invisible outside the area
– Reduction in routing traffic
• Backbone area contiguous
– All other areas must be connected to the backbone
• Virtual Links
[Figure: Area 0 (Backbone Area) connected to Area 1, Area 2, Area 3, and Area 4]
OSPF Areas
• Reduces routing traffic in area 0
• Consider subdividing network into areas
– Once area 0 is more than 10 to 15 routers
– Once area 0 topology starts getting complex
• Area design often mimics typical ISP core network design
• Virtual links are used for “awkward” connectivity
topologies (…)
Virtual Links
• OSPF requires that all areas MUST be connected to area
0
• If topology is such that an area cannot have a physical
connection to a device in area 0, then a virtual link must
be configured
• Otherwise the disconnected area will only be able to have
connectivity to its immediately neighbouring area, and not
the rest of the network
Classification of Routers
• Internal Router (IR)
• Area Border Router (ABR)
• Backbone Router (BR)
• Autonomous System Border Router (ASBR)
[Figure: Area 0 with Areas 1, 2, and 3, showing an IR, an ABR/BR, and an ASBR connecting to other AS]
OSPF Route Types
[Figure: path to N1 via R3, Cost = 8]
Network   Type 1   Next Hop
N1        11       R2
N1        10       R3   (Selected Route)
External Routes
• Type 2 external metric: metrics are compared without
adding to the internal link cost
[Figure: R1 reaches N1 via R2 (Cost = 10, External Cost = 1) or via R3 (Cost = 8, External Cost = 2)]
Network   Type 2   Next Hop
N1        1        R2   (Selected Route)
N1        2        R3
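The two tables differ only in how the metric is compared. The sketch below reproduces both selections from the figure's costs, on the understanding that a Type 1 metric is the external cost added to the internal link cost (10 + 1 = 11 via R2, 8 + 2 = 10 via R3). The record layout and `select_external` are illustrative; the internal-cost tie-break for Type 2 is an assumption about typical OSPF behaviour rather than something the slide states.

```python
def select_external(routes, metric_type):
    """Pick the preferred external route for the given metric type (1 or 2)."""
    if metric_type == 1:
        # Type 1: external cost is added to the internal cost.
        def key(route):
            return route["internal_cost"] + route["external_cost"]
    else:
        # Type 2: only the external cost is compared; internal cost is used
        # here purely as a tie-break (assumption).
        def key(route):
            return (route["external_cost"], route["internal_cost"])
    return min(routes, key=key)

routes_to_n1 = [
    {"next_hop": "R2", "internal_cost": 10, "external_cost": 1},
    {"next_hop": "R3", "internal_cost": 8, "external_cost": 2},
]
type1_choice = select_external(routes_to_n1, 1)["next_hop"]
type2_choice = select_external(routes_to_n1, 2)["next_hop"]
```

The same pair of routes yields R3 under Type 1 and R2 under Type 2, matching the "Selected Route" marks in the two tables.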
Route Authentication
• Now recommended to use route authentication for OSPF
– …and all other routing protocols
• Susceptible to denial of service attacks
– OSPF runs directly over IP (protocol 89), part of the TCP/IP suite
– Automatic neighbour discovery
• Route authentication – Cisco example:
router ospf <pid>
 network 192.0.2.0 0.0.0.255 area 0
 area 0 authentication
interface ethernet 0/0
 ip ospf authentication-key <password>
Equal Cost Multipath
• If n paths to same destination have equal cost, OSPF will
install n entries in the forwarding table
– Loadsharing over the n paths
– Useful for expanding links across an ISP backbone
• Don’t need to use hardware multiplexors
• Don’t need to use static routing
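One common way to share load over the n installed entries is to hash each flow onto one of them, so packets of the same flow stay on the same path. The sketch below assumes that per-flow hashing approach, which the slide itself does not specify; `ecmp_next_hop`, the flow tuple layout, and the next-hop names are hypothetical.

```python
import hashlib

def ecmp_next_hop(flow, next_hops):
    """Map a flow (tuple of header fields) onto one of the equal-cost next hops."""
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    # Use the first 4 bytes of the hash as an index into the next-hop list.
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]

equal_cost_paths = ["core-1", "core-2", "core-3"]
flow = ("10.0.0.1", "10.0.1.1", 6, 49152, 443)  # src, dst, protocol, ports
chosen = ecmp_next_hop(flow, equal_cost_paths)
```

Calling the function again with the same flow returns the same next hop, which avoids reordering packets within a flow.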
Summary
• Link State Protocol
• Shortest Path First
• OSPF operation
• Broadcast networks
– Designated and Backup Designated Router
• Advanced Topics
– Areas, router classification, external networks, authentication,
multipath
IS-IS
Level 1 routers only know what the local area looks like.
If a level 1 router wants to reach something outside of its
area, it has to use a level 2 router. In each area, we
configure one router as a level 1-2 router.
IS-IS
Areas and Router Roles
These level 1-2 routers will establish two neighbor
adjacencies:
R2 now has a second database, the level 2 database, in addition to its level 1
database and level 1 LSP. It generates a level 2 LSP containing all prefixes for
interfaces that are directly connected and advertised in IS-IS.
IS-IS
Link State Packets
A few seconds later, R1 and R2 form a level 1 neighbor adjacency:
IS-IS
Link State Packets
• Once again, R1 and R2 will exchange their level 1
LSPs.
• R2 receives the level 1 LSP from R1 and it copies
new prefixes from its level 1 database to the LSP
in the level 2 database.
• In this example, that is 1.1.1.1/32 from R1.
IS-IS
Link State Packets
• Add a second area now, similar to area 12.
• There is no connection yet between the two
areas but the routers have formed a level 1
neighbor adjacency within the area:
IS-IS
Link State Packets
R4 has learned about the 3.3.3.3/32 prefix from R3 and copies this
prefix from the LSP in the level 1 database to its own LSP in the level
2 database.
BGP
IP route lookup: Longest match routing
[Figure: R1 -- R2 -- R4 (10.1/16); R3 holds all of 10/8 except 10.1/16]
R2’s IP routing table:
  10/8    -> R3
  10.1/16 -> R4
  20/8    -> R5
  30/8    -> R6
  …..
Packet: Destination IP address: 10.1.1.1
  10/8 -> R3:    10.1.1.1 & FF.0.0.0 is equal to 10.0.0.0 & FF.0.0.0. Match!
  10.1/16 -> R4: 10.1.1.1 & FF.FF.0.0 is equal to 10.1.0.0 & FF.FF.0.0. Match as well!
  20/8 -> R5:    10.1.1.1 & FF.0.0.0 is not equal to 20.0.0.0 & FF.0.0.0. Does not match!
  Result: 10.1/16 -> R4 is the longest match (16 bit netmask)
IP route lookup: Longest match routing
• default is 0.0.0.0/0
• can handle it using the normal longest match algorithm
• matches everything. Always the shortest match.
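The lookup described above can be written with Python's standard `ipaddress` module. The table mirrors R2's routing table from the slides, with prefixes written in full; the function `lookup` and the `default-gw` next hop are illustrative assumptions.

```python
import ipaddress

ROUTES = {
    "10.0.0.0/8": "R3",
    "10.1.0.0/16": "R4",
    "20.0.0.0/8": "R5",
    "30.0.0.0/8": "R6",
}

def lookup(destination, routes):
    """Return the next hop of the longest matching prefix, or None if no match."""
    address = ipaddress.ip_address(destination)
    best = None
    for prefix, next_hop in routes.items():
        network = ipaddress.ip_network(prefix)
        # A prefix matches when the masked address equals the network address.
        if address in network and (best is None or network.prefixlen > best[0].prefixlen):
            best = (network, next_hop)
    return best[1] if best else None

specific = lookup("10.1.1.1", ROUTES)      # matches 10/8 and 10.1/16
general = lookup("10.2.3.4", ROUTES)       # matches only 10/8
unmatched = lookup("8.8.8.8", ROUTES)      # no route without a default

# Adding the default route: it matches everything but is always the shortest match.
with_default = dict(ROUTES, **{"0.0.0.0/0": "default-gw"})
fallback = lookup("8.8.8.8", with_default)
still_specific = lookup("10.1.1.1", with_default)
```

No special case is needed for 0.0.0.0/0: its prefix length of 0 loses to every other matching entry.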
Forwarding
[Figure: routing flow (announce, accept) and packet flow (ingress) between AS 1 and AS 2]
• Static Routes
– configured manually
• Connected Routes
– created automatically when an interface is ‘up’
• Interior Routes
– Routes within an AS
• Exterior Routes
– Routes exterior to AS
What Is an IGP?
• Interior
– Automatic discovery
– Generally trust your IGP routers
– Routes go to all IGP routers
• Exterior
– Specifically configured peers
– Connecting with outside networks
– Set administrative boundaries
Hierarchy of Routing Protocols
Figure: a provider network running BGP4 / OSPF, with BGP4 toward other ISPs and the local NAP (FDDI), and BGP4 or static routes toward customers.
Demilitarized Zone (DMZ)
Figure: AS 100 (routers A, B), AS 101 (routers C, D) and AS 102 attach to a shared DMZ network.
• Terminology
• Protocol Basics
• Messages
• General Operation
• Peering relationships (EBGP/IBGP)
• Originating routes
Terminology
• Neighbor
– Configured BGP peer
• NLRI/Prefix
– NLRI - network layer reachability information
– Reachability information for an IP address & mask
• Router-ID
– Highest IP address configured on the router
• Route/Path
– NLRI advertised by a neighbor
Protocol Basics
Peering
Figure: AS 100 (routers A, B; 220.220.8.0/24), AS 101 (routers C, D; 220.220.16.0/24) and AS 102 (router E; 220.220.32.0/24), joined by eBGP TCP/IP peer connections.
• BGP speakers are called peers
• Peers in different ASes are called External Peers
• Note: eBGP Peers normally should be directly connected.
BGP Peers
Figure: AS 100 (routers A, B; 220.220.8.0/24) and AS 101 (routers C, D; 220.220.16.0/24) exchanging BGP Update Messages (NLRI).
Configuring BGP Peers
Figure: A (.2) and B (.1) share 220.220.8.0/24 in AS 100; B (.2) and C (.1) share the 222.222.10.0/30 link that carries the eBGP TCP connection; C (.2) and D (.1) share 220.220.16.0/24 in AS 101.
Figure: AS 400 originates 150.10.0.0/16. Paths as seen in AS 500:
Network          Path
180.10.0.0/16    300 200 100
170.10.0.0/16    300 200
150.10.0.0/16    300 400
Next Hop Attribute
Figure: AS 100 (router A, 160.10.0.0/16), AS 200 (routers B and C, 150.10.0.0/16) and AS 300 (routers D and E, 140.10.0.0/16). A (.1) and B (.2) share 192.20.2.0/30; C (.1) and D (.2) share 192.10.1.0/30. The BGP Update Messages carry:
Network          Next-Hop      Path
160.10.0.0/16    192.20.2.1    100
• Next hop to reach a network
• Usually a local network is the next hop in an eBGP session
Next Hop Attribute
Figure: AS 100 (router A, 160.10.0.0/16), AS 200 (routers B and C, 150.10.0.0/16) and AS 300 (routers D and E, 140.10.0.0/16). A (.1) and B (.2) share 192.20.2.0/30; C (.1) and D (.2) share 192.10.1.0/30. The BGP Update Messages carry:
Network          Next-Hop      Path
150.10.0.0/16    192.10.1.1    200
160.10.0.0/16    192.10.1.1    200 100
• Next hop to reach a network
• Usually a local network is the next hop in an eBGP session
• Next Hop updated between eBGP Peers
Next Hop Attribute
Figure: AS 100 (router A, 160.10.0.0/16), AS 200 (routers B and C, 150.10.0.0/16) and AS 300 (routers D and E, 140.10.0.0/16). A (.1) and B (.2) share 192.20.2.0/30; C (.1) and D (.2) share 192.10.1.0/30. The BGP Update Messages carry:
Network          Next-Hop      Path
150.10.0.0/16    192.10.1.1    200
160.10.0.0/16    192.10.1.1    200 100
• Next hop not changed between iBGP peers
Next Hop Attribute (more)
• IGP should carry route to next hops
• Recursive route look-up
• Unlinks BGP from actual physical topology
• Allows IGP to make intelligent forwarding decision
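The recursive route look-up mentioned above can be sketched as two longest-match lookups: one in the BGP routes to find the next hop, and one in the IGP routes to find how to reach that next hop. This is a minimal sketch under stated assumptions: the BGP entry reuses 160.10.0.0/16 with next hop 192.10.1.1 from the slides, while the IGP table, the interface name `eth0` and the function names are hypothetical.

```python
import ipaddress

# The BGP entry is taken from the slides; the IGP entry and the interface
# name are hypothetical values chosen for this sketch.
BGP_ROUTES = {"160.10.0.0/16": "192.10.1.1"}  # prefix -> BGP next hop
IGP_ROUTES = {"192.10.1.0/30": "eth0"}        # prefix -> outgoing interface


def longest_match(address, table):
    """Return the longest prefix in table that contains address, or None."""
    addr = ipaddress.ip_address(address)
    matches = [ipaddress.ip_network(p) for p in table
               if addr in ipaddress.ip_network(p)]
    if not matches:
        return None
    return str(max(matches, key=lambda n: n.prefixlen))


def resolve(destination, bgp=BGP_ROUTES, igp=IGP_ROUTES):
    """Recursive lookup: BGP gives the next hop, the IGP says how to reach it."""
    bgp_prefix = longest_match(destination, bgp)
    if bgp_prefix is None:
        return None
    next_hop = bgp[bgp_prefix]
    igp_prefix = longest_match(next_hop, igp)
    if igp_prefix is None:
        return None  # next hop unreachable, so the BGP route is unusable
    return next_hop, igp[igp_prefix]
```

If the IGP has no route to the next hop, the lookup fails even though BGP knows the prefix, which is why the IGP should carry routes to the next hops.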
BGP Updates: Withdrawn Routes
Figure: AS 123 and AS 321 peer over 192.168.10.0/24 (.1 and .2). Connectivity to 192.192.25.0/24 is lost, so a BGP Update Message withdraws the route 192.192.25.0/24.
BGP Routing Information Base
Route Table
D 10.1.2.0/24
D 160.10.1.0/24
D 160.10.3.0/24
R 153.22.0.0/16
S 192.1.1.0/24
• BGP ‘aggregate-address’ commands may be used to install summary routes in the BGP RIB
BGP RIB
Network             Next-Hop      Path
*> 160.10.0.0/16    0.0.0.0       i
* i                 192.20.2.2    i
s> 160.10.1.0/24    192.20.2.2    i
s> 160.10.3.0/24    192.20.2.2    i
*> 192.1.1.0/24     192.20.2.2    ?
• Best paths installed in routing table if:
– prefix and prefix length are unique
– lowest “protocol distance”
Route Table
D 10.1.2.0/24
D 160.10.1.0/24
D 160.10.3.0/24
R 153.22.0.0/16
S 192.1.1.0/24
B 173.21.0.0/16
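The "lowest protocol distance" rule can be sketched as follows. The slides do not give distance values, so this example assumes Cisco-style default administrative distances (static 1, eBGP 20, EIGRP 90, RIP 120) and assumes the route codes S, B, D and R stand for static, BGP, EIGRP and RIP as in Cisco IOS output. The function name `install` is made up for this illustration.

```python
# Assumed Cisco-style default administrative distances, keyed by route code.
# B is treated as eBGP here; iBGP would normally use a higher value.
DISTANCE = {"S": 1, "B": 20, "D": 90, "R": 120}


def install(candidates):
    """Pick one route per prefix: the candidate with the lowest distance.

    candidates is a list of (code, prefix) pairs offered by the protocols.
    Returns a dict mapping each prefix to the code of the winning protocol.
    """
    table = {}
    for code, prefix in candidates:
        current = table.get(prefix)
        if current is None or DISTANCE[code] < DISTANCE[current]:
            table[prefix] = code
    return table


offered = [
    ("D", "10.1.2.0/24"),
    ("R", "153.22.0.0/16"),
    ("S", "192.1.1.0/24"),
    ("B", "192.1.1.0/24"),   # also known to BGP, but static wins
    ("B", "173.21.0.0/16"),  # unique prefix, so the BGP route is installed
]
table = install(offered)
```

Under these assumptions 192.1.1.0/24 stays a static route while 173.21.0.0/16 is installed from BGP, which is consistent with the second route table in the slides.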
The ‘Bible’ & other resources
• Route-views.oregon-ix.net
Figure: example inter-AS topology (AS 100, AS 101, AS 200, AS 21, AS 675).
• Withdrawn routes
• Path Attributes
• Advertised routes
BGP Path Attributes: Why?
• Encoded as Type, Length & Value (TLV)
• Transitive/Non-Transitive attributes
• Some are mandatory
• Used in path selection
• To apply policy for steering traffic
BGP Path Attributes...
• Origin
• AS-path
• Next-hop
• Multi-Exit Discriminator (MED)
• Local preference
• BGP Community
• Others...
AS-PATH
• Sequence of ASes a route has traversed
• Loop detection
Figure: AS 100 (180.10.0.0/16), AS 200 (170.10.0.0/16), AS 300 and AS 400 (150.10.0.0/16); an update for 180.10.0.0/16 is marked as dropped.
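AS-path construction and loop detection can be sketched in a few lines of Python. This is an illustration only: the function names `advertise` and `accept_update` are invented for the example, and the path is modelled as a plain list of AS numbers with the most recent AS first.

```python
def advertise(local_as, as_path):
    """Prepend the local AS number when sending a route to an eBGP peer."""
    return [local_as] + list(as_path)


def accept_update(local_as, as_path):
    """Loop detection: drop the update if our own AS is already in the path."""
    return local_as not in as_path


# 180.10.0.0/16 originates in AS 100 and travels through AS 200 and AS 300.
path = advertise(300, advertise(200, advertise(100, [])))
```

The resulting path is 300 200 100, matching the path shown for 180.10.0.0/16 in the slides. A router in AS 500 would accept it, while a router in AS 100 would see its own AS number in the path and drop the update.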
Figure: AS 100 (160.10.0.0/16), AS 200 (150.10.0.0/16) and AS 300, with routers A and B. Both routes carry the same next hop:
150.10.0.0/16    150.10.1.1
160.10.0.0/16    150.10.1.1
• Next hop router to reach a network
• Advertising router/Third party in EBGP
• Unmodified in IBGP
Figure (third-party next hop): C (150.1.1.1) in AS 200 peers with A (150.1.1.2) in AS 201. 192.68.1.0/24 lies behind B (150.1.1.3), so the route is advertised to C with next hop 150.1.1.3.
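Figure (local preference): AS 400 (routers A, B, C) learns 160.10.0.0/16 from AS 100 via AS 200 (router D, value 500) and via AS 300 (router E, value 800). The path marked best (>) is the one with value 800.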
Multi-Exit Discriminator
• Non-transitive
• Represented as a numeric value (0-0xffffffff)
• Used to convey the relative preference of entry points
• Comparable if paths are from the same AS
• Path with lower MED wins
• IGP metric can be conveyed as MED
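The MED rules above can be sketched as a small comparison function. This is a simplified illustration, not the full BGP best-path algorithm: it looks at MED only and ignores the attributes compared before it. The dictionary keys `via`, `neighbor_as` and `med` and the function name are assumptions for the example; the values 2000 and 1000 for 192.68.1.0/24 from AS 201 come from the MED figure in this deck.

```python
def prefer_by_med(path_a, path_b):
    """Compare two paths to the same prefix using MED only.

    Returns the preferred path, or None when MED cannot decide: the paths
    come from different neighbor ASes (not comparable) or the MEDs are equal.
    """
    if path_a["neighbor_as"] != path_b["neighbor_as"]:
        return None
    if path_a["med"] == path_b["med"]:
        return None
    return path_a if path_a["med"] < path_b["med"] else path_b


# 192.68.1.0/24 advertised by AS 201 over two links with different MEDs.
via_a = {"via": "A", "neighbor_as": 201, "med": 2000}
via_b = {"via": "B", "neighbor_as": 201, "med": 1000}
best = prefer_by_med(via_a, via_b)
```

Here the path through B wins because its MED is lower. If the second path came from a different AS, the function returns None and another attribute would have to break the tie.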
Multi-Exit Discriminator (MED)
Figure: AS 201 (routers A, B) advertises 192.68.1.0/24 to router C in AS 200 with MED 2000 via A and MED 1000 via B; the path with MED 1000 is preferred.
Origin
Community
• Transitive, Non-mandatory
• Represented as a numeric value (0-0xffffffff)
• Used to group destinations
• Each destination could be a member of multiple
communities
• Flexibility to scope a set of prefixes within or across
ASes for applying policy
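The 32-bit numeric value is conventionally written as two 16-bit halves, `AS:value`, as in the 201:110 notation used in this deck's community example. A minimal sketch of that encoding, with function names chosen only for this illustration:

```python
def encode_community(asn, value):
    """Pack AS:value into the 32-bit community number (AS in the high 16 bits)."""
    if not (0 <= asn <= 0xFFFF and 0 <= value <= 0xFFFF):
        raise ValueError("both halves must fit in 16 bits")
    return (asn << 16) | value


def decode_community(community):
    """Split a 32-bit community number back into (AS, value)."""
    if not 0 <= community <= 0xFFFFFFFF:
        raise ValueError("community must fit in 32 bits")
    return community >> 16, community & 0xFFFF


# A destination can carry several communities at once.
route_communities = {encode_community(201, 110), encode_community(201, 120)}
```

A policy can then match a route by testing whether a given community is in the set attached to it, for example `encode_community(201, 110) in route_communities`.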
Community...
Figure: Customer AS 201 (routers A, B) advertises 192.68.1.0/24 to routers C and D, tagged Community:201:110 toward C and Community:201:120 toward D.
Synchronization
Figure: routers A, B, C and D; labels 1880, 690 and 209; OSPF as the IGP; prefix 35/8.
• C not running BGP (non-pervasive BGP)
• A won’t advertise 35/8 to D until the IGP is in sync
• Turn synchronization off!
– Run pervasive BGP
Figure: AS 400 (routers A, B) can reach AS 100 through AS 200 or through AS 300 (router D).
• Increase AS path attribute length by at least 1
• AS 400’s Policy to reach AS100
– AS 200 preferred path
– AS 300 backup
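The effect of lengthening the AS path can be sketched as follows. The slides do not say which router applies the change, so this example assumes AS 100 repeats its own AS number once on the announcement sent toward AS 300. The function names and the `extra` parameter are hypothetical, and only AS-path length is compared.

```python
def prepend(local_as, as_path, extra=0):
    """Path sent to an eBGP neighbor: local AS once, plus `extra` repeats."""
    return [local_as] * (1 + extra) + list(as_path)


def shortest(paths):
    """Prefer the path with the fewest AS hops (other attributes ignored)."""
    return min(paths, key=len)


# Assumed policy: AS 100 announces normally toward AS 200 and with one extra
# copy of its AS number toward AS 300.
via_200 = prepend(200, prepend(100, []))
via_300 = prepend(300, prepend(100, [], extra=1))
preferred = shortest([via_300, via_200])
```

AS 400 then sees 200 100 and 300 100 100, so the path through AS 200 is preferred and the longer path through AS 300 remains as backup.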
OTV
• Trunk configuration will extend more than one VLAN across the
overlay. There is no need to apply OTV-specific configuration to
these interfaces.
• “Join” the overlay network and discover the other remote OTV
edge devices.
• Form OTV adjacencies with the other OTV edge devices
belonging to the same VPN.