M07 - Fabric Operation and Forwarding
Packet comes in
MAC A1 MAC A2 MAC B1
IP A1 IP A2 IP B1
It is not magic. It still follows standard L2/L3 behavior (most of the time)
Still need to understand how legacy networking works
Layer the new concepts in ACI on top of that knowledge.
Virtual eXtensible LAN (VXLAN) Basics
© 2013-2014 Cisco and/or its affiliates. All rights reserved. Cisco Confidential 4
VXLAN Overview
Problem: Layer 2 adjacency over a Layer 3 Fabric
VXLAN extends L2 connectivity across L3 boundary
• Provides integration between VXLAN and non-VXLAN infrastructures
• Higher scalability than VLANs – up to 16 million VXLAN segments
• Utilizes all available paths – no STP blocking
• Cisco Nexus 9000 Series switches provide hardware-based VXLAN function
VXLAN Technology Overview
• Uses MAC-in-UDP encapsulation
• Leverages multicast in the transport network to simulate
flooding behavior for BUM (broadcast, unknown unicast
and multicast) traffic in the Layer 2 segment
• Leverages ECMP to achieve optimal path usage over the underlay
(transport) network
Data Center Challenges
Old Challenges:
• Layer 2 limitations
• Layer 3 benefits
• vMotion requirements
New Challenges:
• Move to microsegmentation requires more segments (more than 4096)
• Multitenancy
• Overlapping IP addresses
• Dynamic network constructs
• Layer-2 adjacency
VXLAN Solutions to Legacy Problems
Challenge: VXLAN Solution
• MicroSegmentation (802.1q supports only 4096 VLANs): 24-bit VNIs provide up to 16 million different VNIDs
• Multitenancy, Overlapping Address Space: Through use of VXLAN, IP space can be reused in different VNIDs
• Network Agility: Using network virtualization and an underlay network, it is possible to carry out network adaptations quickly from software without having to adjust physical network components
• Layer 2 Adjacency: VXLAN allows Layer-2 network adjacency to be stretched across L3 underlay networks
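The 24-bit VNI arithmetic above can be checked with a short Python sketch. It assumes the standard 8-byte VXLAN header layout from RFC 7348 (a flags byte with the I bit set, then a 24-bit VNI); it does not model any ACI-specific header fields, and the function names are illustrative only.

```python
import struct

VNI_BITS = 24
MAX_SEGMENTS = 2 ** VNI_BITS  # 16,777,216 segments, versus 4096 VLAN IDs


def pack_vxlan_header(vni: int) -> bytes:
    """Build a standard 8-byte VXLAN header (RFC 7348 layout)."""
    if not 0 <= vni < MAX_SEGMENTS:
        raise ValueError("VNI must fit in 24 bits")
    flags = 0x08  # I flag: the VNI field is valid
    # Word 1: flags (8 bits) followed by 24 reserved bits.
    # Word 2: VNI (24 bits) followed by 8 reserved bits.
    return struct.pack("!II", flags << 24, vni << 8)


def unpack_vni(header: bytes) -> int:
    """Extract the 24-bit VNI from the first 8 bytes of a VXLAN header."""
    _, word2 = struct.unpack("!II", header[:8])
    return word2 >> 8


header = pack_vxlan_header(1500635)  # VNID value used in this module's figure
```

The same VNID can be reused as a lookup key on every VTEP, which is what lets overlapping IP space live in different segments.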
VXLAN Overview
• Ingress VTEP encapsulates in VXLAN header
• Routing decisions made VTEP to VTEP across the Underlay Network
• Egress VTEP decapsulates VXLAN header
• VNID (Representing Segment), e.g. VNID: 1500635
• A VTEP can be a Virtual VTEP (vSwitch) or a Physical VTEP
Flags/DRE
LB – Indicates that the Discounting Rate Estimator (DRE) load balancing is in use
DL – Indicates destination TEP should not learn source TEP to Inner IP binding
E – Indicates frame has experienced a forwarding exception (fast rerouting, bounce)
SP – Indicates Source Policy has been applied
DP – Indicates Destination Policy has been applied
DRE – Used to indicate extent of path load (only used when LB bit is set)
Mcast – Indicates if Multicast frame is to be bridged or routed and the original encap
Source Group – Identifier indicating which EPG policy the frame belongs to
M – Indicates which Bank of Atomic Counters should be used for this packet
LBTag – Indicates which LBTag the LB Feedback is for (which group)
LBMetric – Indicates congestion metric for the specific LBTag
SPort – Indicates the original source port
VXLAN Tunnel Endpoint (VTEP)
VTEPs originate and terminate VXLAN tunnels
Each VTEP function has two interfaces:
• Switch interface on the local LAN segment
• IP interface to the transport IP network
Example VTEP addresses:
• SPINE1: 10.0.208.126/32
• SPINE2: 10.0.40.95/32
• IPv4 Proxy VTEP: 10.50.50.101/32
• MAC Proxy VTEP: 10.50.30.101/32
• LEAF1: 10.0.208.127/32
• LEAF2: 10.0.208.124/32
ACI Initial Configuration Overview
Mapping the ACI Logical Model to 7 Layer OSI for Network Engineers
7 Layer OSI Model: ACI Constructs that apply
• Application, Presentation, Session, Transport: Contracts, Graphs, ANP, EPG
• Network: BD (SVI), L3 Private Network (VRF lite)
• Data Link: BD, Policy Groups (VPC, PC, Interfaces), Encapsulation (VLAN, VXLAN, NVGRE)
• Physical: Access Policy, AEP, Domains (Physical/VMM)
ACI Initial Configuration Steps
Infrastructure Admin:
• Fabric Discovery
• Out of Band Management (OOB)
• Configure NTP
• Configure Access Policies (ports)
• Create VLAN pools
• Create Physical Domain
• Configure Attachable Entity Profile
• Create Tenant(s)
• Create Tenant Admin role(s)
Tenant Admin:
• Create L3 Private Networks (VRFs)
• Create Bridge Domains
• Associate with Physical Domain
• Create EPGs
• Associate EPGs to L3 Private Networks and Bridge Domains
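The Tenant Admin steps can be pictured as one nested object tree. The following Python sketch only builds a JSON document and sends nothing; it assumes the APIC object-model class names fvTenant, fvCtx, fvBD, fvRsCtx, fvAp, fvAEPg, fvRsBd and fvRsDomAtt, and every object name in it (ExampleTenant, VRF1, BD1, App1, Web, PhysDom1) is hypothetical. Check the class and attribute names against the APIC documentation for your release before using anything like it.

```python
import json


def mo(cls, children=None, **attrs):
    """Build one managed object as {class: {"attributes": ..., "children": ...}}."""
    body = {"attributes": attrs}
    if children:
        body["children"] = children
    return {cls: body}


def tenant_payload(tenant, vrf, bd, app, epg, phys_domain):
    """Tenant with a VRF, a bridge domain tied to the VRF, and one EPG."""
    return mo("fvTenant", name=tenant, children=[
        mo("fvCtx", name=vrf),                                   # L3 Private Network (VRF)
        mo("fvBD", name=bd, children=[
            mo("fvRsCtx", tnFvCtxName=vrf),                      # BD -> VRF association
        ]),
        mo("fvAp", name=app, children=[
            mo("fvAEPg", name=epg, children=[
                mo("fvRsBd", tnFvBDName=bd),                     # EPG -> BD association
                mo("fvRsDomAtt", tDn=f"uni/phys-{phys_domain}"), # EPG -> physical domain
            ]),
        ]),
    ])


payload = tenant_payload("ExampleTenant", "VRF1", "BD1", "App1", "Web", "PhysDom1")
print(json.dumps(payload, indent=2))
```

The nesting mirrors the order of the steps: the VRF and bridge domain exist first, and the EPG refers to both by name.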
ACI Policy Types
Two policy types under fabric:
• Fabric Policies configure interfaces that connect spine and leaf switches
• Access Policies configure external-facing interfaces (e.g. servers)
Example Fabric Policies:
– NTP
– IS-IS
– BGP
– DNS
ACI Fabric Fundamentals
ACI Encapsulation Normalization
Fabric translates an external identifier to distinguish different
application end-points via the internal VXLAN tagging format
External Identifiers are localized to specific Leaf or Leaf ports
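Encapsulation normalization can be sketched as a lookup from a locally significant external identifier to a fabric-wide internal segment ID. The table below is a minimal Python illustration: the leaf names, ports, VLAN numbers and the second VNID are hypothetical, and a real leaf does this in hardware.

```python
from typing import NamedTuple


class ExternalEncap(NamedTuple):
    leaf: str
    port: str
    encap: str  # external identifier, e.g. "vlan-10"


# Hypothetical normalization table: (leaf, port, external encap) -> internal VNID.
NORMALIZATION = {
    ExternalEncap("leaf1", "eth1/1", "vlan-10"): 1500635,
    ExternalEncap("leaf2", "eth1/7", "vlan-20"): 1500635,  # different VLAN, same segment
    ExternalEncap("leaf2", "eth1/8", "vlan-10"): 1500700,  # same VLAN, different segment
}


def normalize(leaf: str, port: str, encap: str) -> int:
    """Translate an external identifier into the internal VXLAN segment ID."""
    return NORMALIZATION[ExternalEncap(leaf, port, encap)]
```

Because the key includes the leaf and port, VLAN 10 on one leaf port and VLAN 10 on another need not mean the same thing inside the fabric.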
Figure labels: VTEP; Eth / IP / Payload; 802.1q / IP / Payload
• Group of EPs which decides who can talk to who from
• MAC/IP B1 -> Leaf3
Spine Proxy
• If ingress Leaf doesn’t know the destination, Leaf sends packet to one of the Spines for proxy
Fabric Forwarding
Leaf Packet Flow
Ingress:
• Derive EPG
• Station lookup
• Policy lookup
• Encap
Egress:
• VxLAN termination
• Station lookup
• Policy lookup
• Egress port selection
Figure labels: Proxy Lookup; Cached; Unknown on Leaf; Bounce
These protocols do not leave the fabric. They are internal only.
COOP Protocol
Two roles, two lines of algorithm
Roles
Citizen
Oracle
Two lines of algorithm
If you’re a Citizen, and you learn something, go tell an Oracle
If you’re an Oracle and you learn something from a Citizen, tell the other Oracles
COOP is used to distribute endpoint information only.
Spines know where every endpoint is; spines learn this through leaf + COOP
Leafs know where their local endpoints are and cache *some* endpoints on remote leafs
If a leaf doesn’t know where an endpoint is, the spines will know
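The two lines of algorithm can be sketched in Python as follows. This is a toy model under stated assumptions: the class and method names are invented for illustration, endpoints are plain strings, and nothing here reflects the real COOP message format.

```python
class Oracle:
    """Spine role: holds the full endpoint (identity -> location) database."""

    def __init__(self, name):
        self.name = name
        self.peers = []      # the other Oracles
        self.endpoints = {}  # identity -> leaf name

    def learn_from_citizen(self, identity, location):
        # Line 2: an Oracle that learns from a Citizen tells the other Oracles.
        self.endpoints[identity] = location
        for peer in self.peers:
            peer.endpoints[identity] = location

    def lookup(self, identity):
        return self.endpoints.get(identity)


class Citizen:
    """Leaf role: knows local endpoints and caches some remote ones."""

    def __init__(self, name, oracle):
        self.name = name
        self.oracle = oracle
        self.local = set()
        self.remote_cache = {}

    def learn_local(self, identity):
        # Line 1: a Citizen that learns something goes and tells an Oracle.
        self.local.add(identity)
        self.oracle.learn_from_citizen(identity, self.name)

    def locate(self, identity):
        """Return the known leaf, or None when the spine proxy must be used."""
        if identity in self.local:
            return self.name
        return self.remote_cache.get(identity)


spine1, spine2 = Oracle("spine1"), Oracle("spine2")
spine1.peers, spine2.peers = [spine2], [spine1]
leaf1, leaf3 = Citizen("leaf1", spine1), Citizen("leaf3", spine2)
leaf3.learn_local("B1")
```

After `leaf3.learn_local("B1")`, both spines can answer for B1 even though leaf1 has no entry for it yet.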
ISIS
• ISIS is only used for the internal infrastructure IP addresses/VTEPs (one and the same)
• ISIS is not supported for external routes
• ISIS creates a very simple, fairly static routing table of the fabric
• ISIS table only changes when links go down or leafs/spines are added or removed
• Each spine and leaf gets an infrastructure IP (Infra IP), used for reachability, VTEP, etc.
• An anycast IP is used as the IP address leafs use to notify the Oracles
• An additional anycast VTEP is assigned to vPC pairs; this is the destination IP for hosts that are on a vPC
• Links are IP unnumbered, keeping the routing table simple
MP-BGP
This is not MP-BGP EVPN
This is MP-BGP, and only used to distribute external routes learned on a single leaf to the rest of the leafs
• Two spines are designated as route reflectors (configured when fabric is built)
• External router announces route 10.10.10.0/22 (OSPF in the figure)
• Route reflectors distribute learned external route to other leafs
• Leaf 1 learns about 10.10.10.0/22 through MP-BGP route reflector
Location Independent Forwarding – Layer 2 and Layer 3
• Layer 2/3 forwarding semantics supported without changes to applications or EP IP stacks
• Fabric provides pervasive SVI (distributed default gateway)
• Forwards Layer 2 and Layer 3 traffic directly to the destination endpoint
• Fabric learns MAC for non-IP packets
• Fabric learns IP address for all other packets
Example endpoints: 10.1.1.10/24, 10.1.3.12/24, 10.6.3.2/24, 10.1.3.35/24
ACI Fabric Unicast Forwarding Overview
1 Packet sourced from VM attached to Ingress Port Group or directly from physical server
4a If Leaf has learned the Inner IP to egress VTEP binding, it will set required VTEP address and forward directly to egress Leaf
4b If ingress Leaf does not contain cache entry for IP to egress VTEP binding, set VTEP address as anycast VTEP, which will perform inline HW lookup and perform egress VTEP rewrite. No additional latency nor any decrease in throughput due to lookup
7 Packet transmitted on vSwitch port
Frame on the fabric: VTEP | VXLAN | IP | Payload
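Steps 4a and 4b amount to a cache lookup with a proxy fallback. A minimal Python sketch, assuming the anycast VTEP in step 4b is the IPv4 Proxy VTEP address listed earlier in this module and that the leaf cache is a plain dictionary (both assumptions made for illustration):

```python
IPV4_PROXY_VTEP = "10.50.50.101"  # assumed anycast spine proxy address


def select_outer_destination(inner_dst_ip, leaf_cache):
    """Return (outer destination VTEP, path type) for a unicast IP packet."""
    egress_vtep = leaf_cache.get(inner_dst_ip)
    if egress_vtep is not None:
        return egress_vtep, "direct"   # 4a: binding known, go straight to egress leaf
    return IPV4_PROXY_VTEP, "proxy"    # 4b: unknown, let the spine proxy rewrite it


# Hypothetical cache on the ingress leaf: inner IP -> egress VTEP (LEAF2).
cache = {"10.1.3.35": "10.0.208.124"}
```

A miss is not an error and is not flooded; the packet is simply addressed to the proxy, which knows every endpoint through COOP.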
Forwarding – Bounce
• Ingress Leaf will continue to forward to original egress Leaf until return traffic updates cache entry (location, identity) with new Leaf or COOP updates mapping entry
• Leaf sends an update via COOP to spine Oracles
• Leaf has multiple equal cost routes to all other leaves across all active paths
• On loss of fabric link, leaf node removes the entry for that link and HW hashes subsequent frames across remaining paths
• Hardware detects link failure and updates forwarding entries in ~125 µsec
• Spine will forward traffic to any remaining leaves on detection of direct path loss to the target TEP
• Leaf will forward along any valid path to target TEP from the list of valid paths (excluding the arriving interface)
• IS-IS will eventually remove route to spine (with the failed downlink)
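The link-failure behaviour on the leaf (remove the failed entry, hash across what remains) can be illustrated with a small Python sketch. CRC32 is only a stand-in for whatever hash the hardware uses, the uplink names are hypothetical, and DRE-based load balancing is not modeled.

```python
import zlib


def pick_uplink(flow, uplinks):
    """Hash a flow tuple onto one of the currently valid uplinks."""
    if not uplinks:
        raise RuntimeError("no valid path to the target TEP")
    key = "|".join(str(field) for field in flow).encode()
    return uplinks[zlib.crc32(key) % len(uplinks)]


def without_link(uplinks, failed):
    """Return the path list after removing the entry for a failed link."""
    return [link for link in uplinks if link != failed]


uplinks = ["spine1", "spine2", "spine3", "spine4"]
flow = ("10.1.1.10", "10.1.3.35", 6, 49152, 443)  # src, dst, proto, sport, dport
remaining = without_link(uplinks, "spine2")
```

Only the local path list changes; no routing protocol has to converge before frames start using the remaining links.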
Overview of ACI Fabric Multicast Forwarding
• Spine Switches maintain a table of GIPo (Multicast IP Overlay Group) to Leaf binding.
• A Leaf will receive traffic for a GIPo if the EPG BD exists on that Leaf.
• The GIPo represents a multicast TEP.
• FTAG roots (one spine each): Trees 0, 4, 8, 12; Trees 1, 5, 9, 13; Trees 2, 6, 10, 14; Trees 3, 7, 11, 15
4 Spine replicates and forwards the frame to the Leafs based on the GIPo address
Frame on the fabric: GIPo | VXLAN | IP | Payload
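The FTAG root assignment shown in the figure (trees 0, 4, 8, 12 on one spine, trees 1, 5, 9, 13 on the next, and so on) is consistent with a simple modulo over four spines. The Python sketch below is a reading of that figure, not a statement of how roots are elected; the spine names, leaf names and the GIPo address are hypothetical.

```python
SPINES = ["spine1", "spine2", "spine3", "spine4"]  # hypothetical names
NUM_FTAG_TREES = 16


def ftag_root(tree_id: int) -> str:
    """Map an FTAG tree (0-15) to its root spine, as laid out in the figure."""
    if not 0 <= tree_id < NUM_FTAG_TREES:
        raise ValueError("FTAG tree id must be 0-15")
    return SPINES[tree_id % len(SPINES)]


def replicate(gipo: str, gipo_to_leafs: dict) -> list:
    """Leafs that receive a copy: only those where the BD for this GIPo exists."""
    return sorted(gipo_to_leafs.get(gipo, ()))


# Hypothetical GIPo-to-leaf binding table held on a spine.
bindings = {"225.0.0.16": {"leaf1", "leaf3"}}
```

For example, `ftag_root(5)` falls on the second spine, and a frame for the hypothetical GIPo above is replicated to leaf1 and leaf3 only.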
Endpoint Learning & Lookup
Tenant Endpoint Learning:
1. Data plane learning creates (location, identity) mapping at the Leaf
2. DHCP forwarding can be used to learn (location, identity) mapping entry
3. ARP/GARP/RARP learning
4. VMM programs (location, identity) mapping
5. APIC statically programs (location, identity) mapping
• COOP used to communicate the (location, identity) mapping to the spine proxy
• Spine maintains copy of all EP (location, identity)
• VMM updates APIC upon VM creation
Figure labels: APIC; DME; ARP/DHCP; Data Plane Learning; Port Grp
Endpoint Identity/Location Learning Example
COOP ensures all spines maintain a consistent copy of Endpoint addresses and location information
3 Leaf forwards EP address information to Spine ‘Oracle’ via COOP
4 Database Replication
5 Egress Leaf learns internal IP/MAC; caches source VTEP address
Frame on the fabric: VTEP | VXLAN | IP | Payload
Header fields: LISP flags | Flags/DRE | Source Group == EPG | VXLAN Instance ID (VNID) | M/LB/SP
SP: Indicates Source Policy has been applied
DP: Indicates Destination Policy has been applied
ACI Policy Enforcement Example
1 EPG derived based on ingress classification (port group, physical port, IP address, VLAN)
Ingress Leaf populates Source Group based on classification
3 If Leaf knows egress EPG > enforces policy
SP policy bit set (indicates ingress policy invoked)
4 If SP flag = ‘0’ > egress Leaf sets DP policy bit & enforces policy
Forwards frame to vSwitch.
Frame after ingress enforcement: VTEP | Flags SP==1 | Source Group | VNID | Payload
Frame after egress enforcement: VTEP | Flags DP==1 | Source Group | VNID | Payload
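The SP/DP handshake between ingress and egress leaf can be sketched in Python. This is a simplified model under stated assumptions: policy is reduced to a set of permitted (source EPG, destination EPG) pairs, the EPG names are hypothetical, and the frame carries only the fields needed for the example.

```python
from dataclasses import dataclass

# Hypothetical contract table: permitted (source EPG, destination EPG) pairs.
CONTRACTS = {("web", "app")}


@dataclass
class FabricFrame:
    source_group: str  # EPG derived at ingress classification
    dst_ep: str
    sp: bool = False   # Source Policy applied
    dp: bool = False   # Destination Policy applied


def ingress_leaf(src_epg, dst_ep, known_dst_epgs):
    """Enforce at ingress only if the egress EPG is already known there."""
    frame = FabricFrame(source_group=src_epg, dst_ep=dst_ep)
    dst_epg = known_dst_epgs.get(dst_ep)
    if dst_epg is not None:
        if (src_epg, dst_epg) not in CONTRACTS:
            return None          # dropped by policy at ingress
        frame.sp = True
    return frame


def egress_leaf(frame, local_epgs):
    """If SP is 0, enforce at egress and set DP; otherwise just forward."""
    if not frame.sp:
        dst_epg = local_epgs[frame.dst_ep]
        if (frame.source_group, dst_epg) not in CONTRACTS:
            return None          # dropped by policy at egress
        frame.dp = True
    return frame


local = {"10.1.3.35": "app"}
known = egress_leaf(ingress_leaf("web", "10.1.3.35", local), local)
unknown = egress_leaf(ingress_leaf("web", "10.1.3.35", {}), local)
```

Carrying the Source Group in the header is what lets the egress leaf enforce policy when the ingress leaf could not, so policy is applied once, on one side or the other.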
What is APIC?
APIC is the policy controller
It’s not the control plane
It’s not in the data path
EPG Operational Tab – Client Endpoints
Navigate to the EPG Operational Tab and verify End Point is learned, including the path.
EPG Operational Tab – Global Endpoints
Navigate to the Fabric > Topology and verify End Point is learned, including the path.
EPG Operational Tab – Global Endpoints
Filter for a specific MAC or IP Address
Fabric Innovations
Fabric Scaling – Hardware-based Directed ARP Forwarding
• ARP Frame sourced from end point
• Leaf uses DST_IP address in ARP header to perform VTEP lookup for forwarding
• ARP Frame forwarded to destination end point
Frame on the fabric: VTEP | VXLAN | MAC | ARP Payload
ARP Payload fields:
• Hardware type, Protocol type
• Hardware size, Protocol size
• Operation
• Sender MAC address
• Sender IP address
• Destination MAC address
• Destination IP address
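Directed ARP forwarding comes down to reading the destination IP out of the ARP payload and using it as the lookup key. The Python sketch below assumes the standard 28-byte Ethernet/IPv4 ARP layout; the lookup table, the sender MAC and the proxy fallback for an unknown target are illustrative assumptions.

```python
import socket
import struct

# htype, ptype, hlen, plen, oper, sender MAC, sender IP, target MAC, target IP
ARP_FORMAT = "!HHBBH6s4s6s4s"  # 28 bytes for Ethernet/IPv4


def arp_target_ip(arp_payload: bytes) -> str:
    """Extract the destination (target) IP address from an ARP payload."""
    fields = struct.unpack(ARP_FORMAT, arp_payload[:28])
    return socket.inet_ntoa(fields[8])


def directed_arp_vtep(arp_payload, ip_to_vtep, proxy_vtep):
    """Pick the VTEP to unicast the ARP to instead of flooding it."""
    return ip_to_vtep.get(arp_target_ip(arp_payload), proxy_vtep)


# Hypothetical ARP request: 10.1.3.12 asks who has 10.1.3.35.
request = struct.pack(
    ARP_FORMAT, 1, 0x0800, 6, 4, 1,
    bytes.fromhex("0050569a0001"), socket.inet_aton("10.1.3.12"),
    b"\x00" * 6, socket.inet_aton("10.1.3.35"),
)
vtep_table = {"10.1.3.35": "10.0.208.124"}  # inner IP -> egress VTEP (LEAF2)
```

With the target found in the table, the request is sent to a single VTEP rather than to every leaf in the segment.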
• Traffic to original VTEP destination is ‘bounced’ to new Leaf
Frame on the fabric: VTEP | VXLAN | MAC | GARP
ARP Payload fields: Hardware type, Protocol type, Hardware size, Protocol size, Operation
Figure labels: Standard Priority; High Priority
vPC Configuration in ACI Fabric
vPC Support in ACI Fabric