
M07 - Fabric Operation and Forwarding

Cisco ACI

Fabric Operation and Forwarding


www.lumoscloud.com
learning@lumosconsultinginc.com
Agenda
 Virtual eXtensible LAN (VXLAN) Basics
 ACI Fabric Fundamentals
 ACI Fabric Forwarding
 ACI Endpoint Learning & Lookup
 ACI Fabric Innovations
ACI Magic
 Packet comes in
 ACI Magic! It knows where to send it
 No Flooding
 No difficult routing / switching
 Forget about legacy network
Diagram endpoints: MAC A1 / IP A1, MAC A2 / IP A2, MAC B1 / IP B1

This may be an image of yours from a typical sales/marketing talk

It is not magic. It still follows standard L2/L3 behavior (most of the time)
 Still need to understand how legacy networking works
 Layer the new concepts in ACI on top of that knowledge.
Virtual eXtensible LAN
(VXLAN) Basics

© 2013-2014 Cisco and/or its affiliates. All rights reserved. Cisco Confidential 4
VXLAN Overview
Problem: Layer 2 adjacency over a Layer 3 Fabric
 VXLAN extends L2 connectivity across L3 boundary
 Provides integration between VXLAN and non-VXLAN infrastructures
 Higher scalability than VLANs – up to 16 million VXLAN segments
 Utilizes all available paths – no STP blocking
 Cisco Nexus 9000 Series switches provide hardware-based VXLAN function
VXLAN Technology Overview
• Uses MAC-in-UDP encapsulation
• Leverages multicast in the transport network to simulate
flooding behavior for BUM (broadcast, unknown unicast
and multicast) traffic in the Layer 2 segment
• Leverages ECMP to achieve optimal path usage over the underlay (transport) network
Data Center Challenges
Old Challenges
 Layer 2 limitations
 Layer 3 benefits
 vMotion requirements
New Challenges
 Move to microsegmentation requires more segments (more than 4096)
 Multitenancy
 Overlapping IP addresses
 Dynamic network constructs
 Layer-2 adjacency
VXLAN Solutions to Legacy Problems
Challenge: VXLAN Solution
Microsegmentation (802.1q supports only 4096 VLANs): 24-bit VNIs provide up to 16 million different VNIDs
Multitenancy, Overlapping Address Space: Through use of VXLAN, IP space can be reused in different VNIDs
Network Agility: Using network virtualization and an underlay network, it is possible to carry out network adaptations quickly from software without having to adjust physical network components
Layer 2 Adjacency: VXLAN allows Layer-2 network adjacency to be stretched across L3 underlay networks
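The segment counts quoted above follow directly from the identifier widths. A quick check of the arithmetic, assuming the usual 12-bit 802.1q VLAN ID and the 24-bit VNID (the variable names are only illustrative):

```python
VLAN_ID_BITS = 12   # width of the 802.1q VLAN ID field
VNID_BITS = 24      # width of the VXLAN network identifier

vlan_segments = 2 ** VLAN_ID_BITS
vxlan_segments = 2 ** VNID_BITS

print(vlan_segments)                    # 4096
print(vxlan_segments)                   # 16777216 (the "16 million" on the slide)
print(vxlan_segments // vlan_segments)  # 4096 times more segments
```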
VXLAN Overview

[Figure: a virtual VTEP (vSwitch) and a physical VTEP connected across the underlay network; VNID 1500635 represents the segment; hosts 10.1.10.100 and 10.1.10.200 are both in VLAN 10]
 Ingress VTEP encapsulates in VXLAN header
 Routing decisions made VTEP to VTEP
 Egress VTEP decapsulates VXLAN header
VXLAN Encapsulation and Packet Format
• Entire L2 frame encapsulated in UDP; 50 bytes of overhead
• IP multicast used for L2 broadcast/multicast, unknown unicast
VXLAN Encapsulation (added in front of the original frame):
Outer MAC DA | Outer MAC SA | Outer 802.1q | Outer IP DA | Outer IP SA | Outer UDP | VXLAN ID (24-bits)
Original Ethernet pkt:
Inner MAC DA | Inner MAC SA | Optional Inner 802.1q | Original Ethernet Payload | CRC
VXLAN header (8 Bytes):
LISP flags (1 Byte) | Flags/DRE | Source Group | VXLAN Instance ID (VNID, 3 Bytes) | M/LB/SP
LISP flags: N L rsvd I rsvd
N – Nonce-present bit
L – Locator-Status-Bits field enabled bit
I – Instance ID bit; indicates valid VNID field
ACI maps user NW context into the VNID:
• Tenant traffic was sourced from
• VRF if pkt is to be routed
• Bridge Domain if pkt is to be switched (bridged)
• EPG for Service Graph if pkt is to be redirected
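To make the 8-byte header and the 50-byte overhead concrete, here is a minimal sketch that packs and parses a plain VXLAN header. It assumes the standard layout (one flags byte with the I bit as 0x08, three reserved bytes, a 24-bit VNID, one reserved byte) and does not model the additional ACI fields such as Source Group; the function names are illustrative only.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # "I" bit in the first header byte

def pack_vxlan_header(vnid: int) -> bytes:
    """Build an 8-byte VXLAN header carrying the given 24-bit VNID."""
    if not 0 <= vnid < 2 ** 24:
        raise ValueError("VNID must fit in 24 bits")
    # word 0: flags byte + 3 reserved bytes; word 1: VNID + 1 reserved byte
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vnid << 8)

def unpack_vnid(header: bytes) -> int:
    """Return the VNID, checking that the I bit marks it as valid."""
    word0, word1 = struct.unpack("!II", header[:8])
    if not (word0 >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("I bit not set; VNID field is not valid")
    return word1 >> 8

# Overhead with an untagged outer Ethernet header and outer IPv4.
OVERHEAD_BYTES = {"outer Ethernet": 14, "outer IPv4": 20, "outer UDP": 8, "VXLAN": 8}

header = pack_vxlan_header(1500635)   # VNID taken from the overview figure
print(len(header), unpack_vnid(header), sum(OVERHEAD_BYTES.values()))
```

An outer 802.1q tag, if present, would add 4 more bytes to the total.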
Additional ACI VXLAN (VXLAN) Header Details
LISP flags | Flags/DRE | Source Group | VXLAN Instance ID (VNID) | M/LB/SP
Flags/DRE (bits: LB, DL, E, SP, DP, DRE, Mcast)
LB – Indicates that the Discounting Rate Estimator (DRE) load balancing is in use
DL – Indicates destination TEP should not learn source TEP to Inner IP binding
E – Indicates frame has experienced a forwarding exception (fast rerouting, bounce)
SP – Indicates Source Policy has been applied
DP – Indicates Destination Policy has been applied
DRE – used to indicate extent of path load (only used when LB bit is set)
Mcast – Indicates if Multicast frame is to be bridged or routed and the original encap
Source Group
Source Group – Identifier indicating which EPG policy the frame belongs to
M/LB/SP (fields: M, LBTag, LBMetric, SPort)
M – Indicates which Bank of Atomic Counters should be used for this packet
LBTag – Indicates which LBTag the LB Feedback is for (which group)
LBMetric – Indicates congestion metric for the specific LBTag
SPort – Indicates the original source port
VXLAN Tunnel Endpoint (VTEP)
VTEPs originate and terminate VXLAN tunnels
Each VTEP function has two interfaces:
 Switch interface on the local LAN segment
 IP interface to infrastructure VLAN with the following functions:
 Uniquely identifies the VTEP on transport network
 Encapsulates and decapsulates VXLAN frames
 Discovers remote VTEPs for its configured VXLAN segments
 Learns remote MAC address-to-VTEP mappings to be used for forwarding lookups
ACI Fabric Addressing
Configure the IP addressing for the infrastructure VLAN during APIC initial setup
 Default: 10.0.0.0/16
Example VTEP addresses from the diagram:
SPINE1: 10.0.208.126/32
SPINE2: 10.0.40.95/32
LEAF1: 10.0.208.127/32
LEAF2: 10.0.208.124/32
IPv4 Proxy VTEP: 10.50.50.101/32
MAC Proxy VTEP: 10.50.30.101/32
ACI Initial Configuration Overview
Mapping the ACI Logical Model to 7 Layer OSI for Network Engineers
7 Layer OSI Model: ACI Constructs that apply
Application, Presentation, Session, Transport: Contracts, Graphs, ANP, EPG
Network: BD (SVI), L3 Private Network (VRF lite)
Data Link: BD, Policy Groups (VPC, PC, Interfaces), Encapsulation (VLAN, VXLAN, NVGRE)
Physical: Access Policy, AEP, Domains (Physical/VMM)
ACI Initial Configuration Steps

Infrastructure Admin:
 Fabric Discovery
 Out of Band Management (OOB)
 Configure NTP
 Configure Access Policies (ports)
 Create VLAN pools
 Create Physical Domain
 Configure Attachable Entity Profile
 Create Tenant(s)
 Create Tenant Admin role(s)
Tenant Admin:
 Create L3 Private Networks (VRFs)
 Create Bridge Domains
 Associate with Physical Domain
 Create EPGs
 Associate EPGs to L3 Private Networks and Bridge Domains
ACI Policy Types
Two policy types under fabric:
 Fabric Policies configure interfaces
that connect spine and leaf switches
 Access Policies configure external-facing interfaces (e.g. toward servers)
Example Fabric Policies:
– NTP
– IS-IS
– BGP
– DNS
ACI Fabric Fundamentals

ACI Encapsulation Normalization
 Fabric translates an external identifier to distinguish different
application end-points via the internal VXLAN tagging format
 External Identifiers are localized to specific Leaf or Leaf ports

[Figure: frames arrive at leaf VTEPs in different external encapsulations (802.1q VLAN 10, VXLAN VNID 5876, VXLAN VNID 2394, NVGRE VSID 7456) and are all carried inside the fabric in one internal format: VTEP | Flags | Source Group | VNID | MAC/IP Payload]
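One way to picture encapsulation normalization is as a per-leaf classification table that translates whatever identifier a frame arrives with into the fabric's internal tags. The sketch below is only an illustration: the leaf names, source-group values and internal VNIDs are hypothetical, and only the external identifiers (VLAN 10, VNID 5876, VNID 2394, VSID 7456) come from the figure.

```python
from typing import NamedTuple

class Internal(NamedTuple):
    source_group: int  # hypothetical EPG tag
    vnid: int          # hypothetical internal VNID

# Keys include the leaf because external identifiers are only locally
# significant: VLAN 10 on one leaf need not mean VLAN 10 on another.
CLASSIFY = {
    ("leaf1", "vlan", 10): Internal(100, 15000),
    ("leaf2", "vxlan", 5876): Internal(100, 15000),
    ("leaf3", "vxlan", 2394): Internal(100, 15000),
    ("leaf4", "nvgre", 7456): Internal(100, 15000),
    ("leaf4", "vlan", 10): Internal(200, 15001),
}

def normalize(leaf: str, encap: str, ident: int) -> Internal:
    """Translate a leaf-local external identifier to the internal tags."""
    return CLASSIFY[(leaf, encap, ident)]

print(normalize("leaf1", "vlan", 10))
print(normalize("leaf4", "nvgre", 7456))
print(normalize("leaf4", "vlan", 10))
```

The first two lookups land in the same internal segment despite different external encapsulations, while the third shows the same VLAN ID meaning something else on a different leaf.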
Decoupled Identity and Location
ACI Fabric decouples tenant endpoint address from the VTEP address
 Forwarding within fabric is between VTEPs
 Mapping internal tenant MAC/IP address to location is performed by VTEP using distributed mapping database maintained in spines.
Frame inside the fabric: VTEP | VXLAN | IP | Payload
ACI Forwarding Basic and Important New Concepts
 End Point (EP)
• All hosts data (MAC/IP) are handled as an EP.
 End Point Group (EPG)
• Group of EPs which decides who can talk to who from security perspective (contract)
 Bridge Domain
• L2 forwarding domain in ACI is Bridge Domain (BD), which could have multiple subnets on it
• L2 Flood in BD regardless of VLAN
 Pervasive Gateway
• Each Leaf could be a default gateway for directly attached EndPoints
 COOP
• All EndPoints MAC/IP are stored in COOP table in each Spine
 Spine Proxy
• If ingress Leaf doesn’t know the destination, Leaf sends packet to one of Spines for proxy
COOP Table (example)
• MAC/IP A1 -> Leaf1
• MAC/IP A2 -> Leaf2
• MAC/IP B1 -> Leaf3
Diagram: one BD with a GW on each leaf and VLAN 1, VLAN 2, VLAN 3; EPG-A contains MAC A1 / IP A1 and MAC A2 / IP A2; EPG-B contains MAC B1 / IP B1
Fabric Forwarding

Leaf Packet Flow
Ingress:
 Derive EPG
 Station lookup
 Policy lookup
 Encap
Egress:
 VxLAN termination
 Station lookup
 Policy lookup
 Egress port selection
 Bounce
Diagram: a destination cached on the Leaf is sent directly to the egress VTEP; an unknown destination goes to Proxy Lookup on the spine

Functions Supported by ACI Spine


 Transit: IP forwarding of traffic between VTEPs
 Proxy Lookup: Data Plane based directory for forwarding traffic based on mapping database of EID to
VTEP bindings
 Multicast Root: Root for one of the 16 multicast forwarding topologies (used for optimization of multicast
load balancing and forwarding)
 Not all functions are required on all spine switches
Distributed Forwarding Control Plane – Station Tables
Leaf forwarding table includes local (directly attached) and global entries
 Local station table contains addresses of all hosts attached directly to Leaf
 Global station table contains local cache of fabric endpoints
 Proxy station table in Spine contains addresses of all hosts attached to the fabric
Proxy Station Table (Spine)
10.1.3.36 -> Leaf 2
10.1.3.11 -> Leaf 1
00:25:b5:12:34:56 -> Leaf 3
00:25:b5:ab:cd:ef -> Leaf 5
...
Leaf 1 Station Tables
Local: 10.1.3.11 -> Port 9
Global: 10.1.3.36 -> Leaf 2; 00:2…d:ef -> Leaf 5; * -> Proxy
Diagram endpoints: 10.1.3.11, 10.1.3.36, 00:25:b5:12:34:56, 00:25:b5:ab:cd:ef
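The lookup order implied by the station tables can be sketched as three steps: check the local table, then the cached global table, and otherwise fall through to the spine proxy (the `*` entry). This is a simplified model, not the hardware implementation; the table contents reuse addresses from the slide, and the function names are assumptions.

```python
# Leaf 1 tables (values from the slide; only the unabbreviated entries are used).
LEAF1_LOCAL = {"10.1.3.11": "Port 9"}
LEAF1_GLOBAL = {"10.1.3.36": "Leaf 2"}

# Spine proxy table: every endpoint attached to the fabric.
SPINE_PROXY = {
    "10.1.3.36": "Leaf 2",
    "10.1.3.11": "Leaf 1",
    "00:25:b5:12:34:56": "Leaf 3",
    "00:25:b5:ab:cd:ef": "Leaf 5",
}

def leaf_lookup(dest, local, global_cache):
    """Return (table, next hop) as the leaf would resolve the destination."""
    if dest in local:
        return ("local", local[dest])
    if dest in global_cache:
        return ("global", global_cache[dest])
    return ("proxy", "spine")  # the '*' entry: let a spine resolve it

def spine_lookup(dest, proxy_table):
    """Resolve a destination the ingress leaf did not know; None if unknown."""
    return proxy_table.get(dest)

for dest in ("10.1.3.11", "10.1.3.36", "00:25:b5:12:34:56"):
    table, hop = leaf_lookup(dest, LEAF1_LOCAL, LEAF1_GLOBAL)
    if table == "proxy":
        hop = spine_lookup(dest, SPINE_PROXY)
    print(dest, table, hop)
```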
Protocols in Cisco ACI Fabric Forwarding
 There are three internal protocols that govern forwarding in the ACI fabric
 COOP (Council of Oracle Protocol) – Distributes learned end points
 ISIS - Underlay network (VTEP to VTEP) routing
 MP-BGP - Distribute routes learned on border leafs to other leafs

 These protocols do not leave the fabric. They are internal only.
COOP Protocol
 Two roles, two lines of algorithm
 Roles
 Citizen
 Oracle
 Two lines of algorithm
 If you’re a Citizen, and you learn something, go tell an Oracle
 If you’re an Oracle and you learn something from a Citizen, tell the other Oracles
 COOP is used to distribute end point information only.
 Spines know where every endpoint is, spines learn this through leaf+COOP
 Leafs know where their local endpoints are and cache *some* remote endpoints
 If a leaf doesn’t know where an endpoint is, the spines will know
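The "two roles, two lines of algorithm" description can be modelled in a few lines. The sketch below is a toy simulation under stated assumptions: the class names mirror the roles, a Citizen always reports to its first Oracle (the real selection method is not described here), and replication is a direct in-memory copy rather than a network protocol.

```python
class Oracle:
    """Spine role: holds the full endpoint database."""
    def __init__(self, name):
        self.name = name
        self.peers = []   # the other Oracles
        self.db = {}      # endpoint -> leaf

    def learn_from_citizen(self, endpoint, leaf):
        self.db[endpoint] = leaf
        for peer in self.peers:       # rule 2: tell the other Oracles
            peer.db[endpoint] = leaf

class Citizen:
    """Leaf role: knows its local endpoints and reports them."""
    def __init__(self, name, oracles):
        self.name = name
        self.oracles = oracles
        self.local = {}   # endpoint -> port

    def learn(self, endpoint, port):
        self.local[endpoint] = port
        # rule 1: go tell an Oracle (first one, as a simplification)
        self.oracles[0].learn_from_citizen(endpoint, self.name)

spine1, spine2 = Oracle("spine1"), Oracle("spine2")
spine1.peers, spine2.peers = [spine2], [spine1]
leaf1 = Citizen("leaf1", [spine1, spine2])

leaf1.learn("10.1.3.11", "Port 9")
print(spine1.db, spine2.db)
```

After the single `learn` call both Oracles hold the same endpoint-to-leaf mapping, which is the property the spine proxy relies on.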
ISIS
 ISIS is only used for the internal infrastructure IP addresses/VTEPs (one and the same)
 ISIS is not supported for external routes
 ISIS creates a very simple, fairly static routing table of the fabric
 ISIS table only changes when links go down or leafs/spines are added or removed
Diagram callouts:
 Each spine and leaf gets an infrastructure IP, used for reachability, VTEP, etc.
 An anycast IP is used as the IP address leafs use to notify the Oracles
 Additional anycast VTEP is assigned to vPC pairs; this is the destination IP for hosts that are on a vPC
 Links are IP unnumbered, keeping the routing table simple
MP-BGP
 This is not MP-BGP EVPN
 This is MP-BGP, and only used to distribute external routes
learned on a single leaf to the rest of the leafs
Diagram callouts:
 Two spines are designated as route reflectors (configured when fabric is built)
 External router announces route: 10.10.10.0/22 (OSPF)
 Route reflectors distribute learned external route to other leafs (MP-BGP)
 Leaf 1 learns about 10.10.10.0/22 through MP-BGP route reflector
Location Independent Forwarding – Layer 2 and Layer 3
Layer 2/3 forwarding semantics supported without changes to applications or EP IP stacks
 Fabric provides pervasive SVI
 Forwards Layer 2 and Layer 3 traffic directly to the destination endpoint
 Fabric learns MAC for non-IP packets
 Fabric learns IP address for all other packets
Diagram: distributed default gateway; endpoints 10.1.1.10/24, 10.1.3.12/24, 10.6.3.2/24, 10.1.3.35/24
ACI Fabric Unicast Forwarding Overview
Frame inside the fabric: VTEP | VXLAN | IP | Payload
1. Packet sourced from VM attached to Ingress Port Group or directly from physical server
2. vSwitch (VMWare or MSFT) encapsulates frame and forwards to Leaf VTEP
3. Leaf swaps ingress encapsulation with VXLAN
4a. If Leaf has learned the Inner IP to egress VTEP binding, it will set required VTEP address and forward directly to egress Leaf
4b. If ingress Leaf does not contain cache entry for IP to egress VTEP binding, it sets the VTEP address to the anycast VTEP, which will perform inline HW lookup and perform egress VTEP rewrite. No additional latency nor any decrease in throughput due to lookup
5. Egress Leaf will swap outer VXLAN with correct egress encapsulation and perform any required policy
6. Leaf forwards frame to vSwitch or directly to physical server
7. Packet transmitted on vSwitch port
Forwarding – Bounce
Ingress Leaf will continue to forward to original egress Leaf until return traffic updates cache entry (location, identity) with new Leaf or COOP updates mapping entry
 vMotion or similar workload move occurs. GARP or RARP sourced from vSwitch or end point
 Leaf sends an update via COOP to spine Oracles
 Leaf creates a ‘bounce’ forwarding entry
Diagram endpoints: 10.1.3.11, 10.6.3.2
 Bounce entries are created on the Leaf ‘A’ where the end point was originally attached
 On arrival of a unicast GARP or flooded RARP
 On notification via COOP of an endpoint move as initiated by new Leaf ‘B’
 ‘Bounced’ frames marked with forwarding exception ‘E’ bit in VXLAN header
 Exception bit prevents redirect looping
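The role of the exception bit can be illustrated with a small model of what the original leaf does with a frame for an endpoint that has moved. This is a sketch under assumptions: the dictionary-based frame, the leaf names and the choice to drop an already-bounced frame are illustrative, not a description of the actual ASIC behavior.

```python
LOCAL = {"leafA": set(), "leafB": {"10.1.3.11"}}   # endpoint now lives on leafB
BOUNCE = {"leafA": {"10.1.3.11": "leafB"}, "leafB": {}}

def handle_at_leaf(leaf, frame):
    """Return (action, target, frame) for a frame arriving from the fabric."""
    dst = frame["dst"]
    if dst in LOCAL[leaf]:
        return ("deliver", leaf, frame)
    if dst in BOUNCE[leaf]:
        if frame["e_bit"]:
            # Already bounced once: do not redirect again (prevents looping).
            return ("drop", leaf, frame)
        # Redirect to the new leaf and mark the forwarding exception.
        return ("bounce", BOUNCE[leaf][dst], dict(frame, e_bit=True))
    return ("drop", leaf, frame)

frame = {"dst": "10.1.3.11", "e_bit": False}
action, target, frame = handle_at_leaf("leafA", frame)   # stale ingress cache
print(action, target, frame["e_bit"])
print(handle_at_leaf(target, frame)[0])
```

The frame is bounced exactly once from Leaf 'A' to Leaf 'B' with the E bit set, and is then delivered locally.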
Fast Re-Route – Unicast

Leaf has multiple equal cost routes to all other leaves across all active paths
 On loss of fabric link, leaf node removes the entry for that link and HW hashes subsequent frames across remaining paths
 Hardware detects link failure and updates forwarding entries in ~125 µsec
Spine will forward traffic to any remaining leaves on detection of direct path loss to the target TEP
 Leaf will forward along any valid path to target TEP from the list of valid paths (excluding the arriving interface)
 IS-IS will eventually remove route to spine (with the failed downlink)
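The leaf-side behavior (remove the failed link, hash subsequent frames across what remains) can be sketched as follows. The CRC32-based hash, the uplink names and the flow tuple are assumptions chosen for a deterministic illustration; the real hardware hash is not described in the slides.

```python
import zlib

def pick_uplink(flow, uplinks):
    """Hash a 5-tuple onto one of the currently valid uplinks."""
    key = "|".join(str(field) for field in flow).encode()
    return uplinks[zlib.crc32(key) % len(uplinks)]

uplinks = ["spine1", "spine2", "spine3", "spine4"]
flow = ("10.1.3.11", "10.1.3.36", 6, 49152, 443)  # src, dst, proto, sport, dport

before = pick_uplink(flow, uplinks)

# The link to spine2 fails: remove its entry, then hash over the remaining paths.
remaining = [u for u in uplinks if u != "spine2"]
after = pick_uplink(flow, remaining)

print(before, after)
```

Because the failed link is no longer in the candidate list, no subsequent frame can be hashed onto it, without waiting for the routing protocol to converge.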
Overview of ACI Fabric Multicast Forwarding
• Spine Switches maintain a table of GIPo (Multicast IP Overlay Group) to Leaf binding.
• A Leaf will receive traffic for a GIPo if the EPG BD exists on that Leaf.
• The GIPo represents a multicast TEP.
Spines are FTAG roots: one spine is root for trees 0, 4, 8, 12; one for trees 1, 5, 9, 13; one for trees 2, 6, 10, 14; one for trees 3, 7, 11, 15
1. Multicast frame is received from a server attached to a unique EPG Bridge Domain, either on a VXLAN, VLAN, NVGRE Segment or Port
2. For destinations on remote leaves, the ingress Leaf maps the multicast traffic to a GIPo and hashes flow to an FTAG tree. Traffic to locally attached destinations is replicated locally
3. Frame crosses the fabric as GIPo | VXLAN | IP | Payload
4. Spine replicates and forwards the frame to the Leafs based on the GIPo address
5. Leaf replicates to all downstream servers, vSwitches, …
6. Multicast frame is received by a server when attached to the unique EPG Bridge Domain
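The FTAG assignment in the diagram (trees 0, 4, 8, 12 on one spine, trees 1, 5, 9, 13 on the next, and so on) amounts to a modulo mapping over four spines. A minimal sketch, assuming four spines with illustrative names and a CRC32 flow hash that stands in for whatever the hardware actually uses:

```python
import zlib

NUM_FTAG_TREES = 16
SPINES = ["spine1", "spine2", "spine3", "spine4"]  # hypothetical names

def root_for_tree(tree_id: int) -> str:
    """Spine acting as root for a given FTAG tree (modulo assignment)."""
    return SPINES[tree_id % len(SPINES)]

def pick_ftag(flow) -> int:
    """Hash a multicast flow onto one of the 16 FTAG trees."""
    key = "|".join(str(field) for field in flow).encode()
    return zlib.crc32(key) % NUM_FTAG_TREES

print([t for t in range(NUM_FTAG_TREES) if root_for_tree(t) == "spine1"])  # [0, 4, 8, 12]

tree = pick_ftag(("10.1.3.11", "239.1.1.1"))
print(tree, root_for_tree(tree))
```

Spreading flows over 16 trees rooted on different spines is what lets multicast traffic use all spines rather than a single root.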
Endpoint Learning & Lookup

Endpoint Learning & Lookup
Tenant Endpoint Learning:
1. Data plane learning creates (location, identity) mapping at the Leaf
2. DHCP forwarding can be used to learn (location, identity) mapping entry
3. ARP/GARP/RARP learning
4. VMM programs (location, identity) mapping
5. APIC statically programs (location, identity) mapping
• COOP used to communicate the (location, identity) mapping to the spine proxy
Diagram callouts:
 Spine maintains copy of all EP (location, identity)
 VMM updates APIC upon VM creation
 Labels: APIC, DME, ARP/DHCP, Data Plane Learning, Port Grp
Endpoint Identity/Location Learning Example
COOP ensures all spines maintain a consistent copy of Endpoint addresses and location information
Frame inside the fabric: VTEP | VXLAN | IP | Payload
1. vSwitch is programmed with VM address to port and VTEP mapping by VMM
2. Ingress Leaf learns IP/MAC+ingress port+encap
 Optionally programmed by APIC
3. Leaf forwards EP address information to Spine ‘Oracle’ via COOP
4. Database Replication
5. Egress Leaf learns internal IP/MAC; caches source VTEP address
6. Leaf forwards frame to vSwitch
7. Packet transmitted on vSwitch port
ACI Policy Identification and Enforcement
ACI VXLAN (VXLAN) header identifies the policy attributes of application end point
 Source Group used as tag/label; identifies specific end point for each EPG
 Policy is enforced between ingress EPG and egress EPG
 Policy can be enforced at source or destination
All application traffic carries a policy identifier (source group tag) within the VXLAN header
Header: LISP flags | Flags/DRE | Source Group == EPG | VXLAN Instance ID (VNID) | M/LB/SP
Flags/DRE bits: LB, DL, E, SP, DP, DRE, Mcast
SP: Indicates Source Policy has been applied
DP: Indicates Destination Policy has been applied
Diagram EPGs: WEB, DB, Storage
ACI Policy Enforcement Example
Frame inside the fabric: VTEP | Flags (SP==1 or DP==1) | Source Group | VNID | Payload
1. EPG derived based on ingress classification (port group, physical port, IP address, VLAN)
2. vSwitch (VMWare or MSFT) encapsulates packet using assigned identifier (VLAN/VXLAN/NVGRE)
3. Ingress Leaf populates Source Group based on classification
 If Leaf knows egress EPG > enforces policy
 SP policy bit set (indicates ingress policy invoked)
4. If SP flag = ‘0’ > egress Leaf sets DP policy bit & enforces policy
 Forwards frame to vSwitch.
5. vSwitch enforces policy based on port group
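The interplay of the SP and DP bits can be sketched as two functions, one per leaf. This is a simplified model: the contract set, the EPG names and the dictionary frame are hypothetical, and real policy lookups involve far more than an allowed-pair check.

```python
ALLOWED = {("WEB", "DB")}   # hypothetical contract: WEB may talk to DB

def ingress_leaf(src_epg, known_dst_epg=None):
    """Tag the frame with its Source Group; enforce if the egress EPG is known."""
    frame = {"source_group": src_epg, "sp": 0, "dp": 0, "dropped": False}
    if known_dst_epg is not None:
        frame["sp"] = 1   # ingress (source) policy has been applied
        frame["dropped"] = (src_epg, known_dst_epg) not in ALLOWED
    return frame

def egress_leaf(frame, dst_epg):
    """Enforce at egress only when the ingress leaf could not (SP == 0)."""
    if not frame["dropped"] and frame["sp"] == 0:
        frame["dp"] = 1   # destination policy has been applied
        frame["dropped"] = (frame["source_group"], dst_epg) not in ALLOWED
    return frame

# Ingress leaf knows the destination EPG: policy applied once, at the source.
print(egress_leaf(ingress_leaf("WEB", "DB"), "DB"))
# Ingress leaf does not know it: the egress leaf applies policy instead.
print(egress_leaf(ingress_leaf("WEB"), "DB"))
# No contract from DB to WEB in this sketch: dropped at egress.
print(egress_leaf(ingress_leaf("DB"), "WEB"))
```

Either way the policy is applied exactly once, which is why the Source Group has to travel in the header.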
What is APIC?
APIC is the policy controller
 It’s not the control plane
 It’s not in the data path

 It’s a highly redundant cluster of 3-7 Servers (N+2 for redundancy)
EPG Operational Tab – Client Endpoints
Navigate to the EPG Operational Tab and verify End Point is learned, including the path.
EPG Operational Tab – Global Endpoints
Navigate to the Fabric > Topology and verify End Point is learned, including the path.
EPG Operational Tab – Global Endpoints
Filter for a specific MAC or IP Address
Fabric Innovations

Fabric Scaling – Hardware-based Directed ARP Forwarding
Leaf uses DST_IP address in ARP header to perform VTEP lookup for forwarding
ARP Payload fields: Hardware type, Protocol type, Hardware size, Protocol size, Operation, Sender MAC address, Sender IP address, Destination MAC address, Destination IP address
Diagram: ARP frame (MAC | ARP) sourced from end point; carried in the fabric as VTEP | VXLAN | MAC | ARP; ARP frame forwarded to destination end point
 Leaf ASIC forwards ARP packet to target destination IP identified in ARP payload
 Result is no Flooding of ARP traffic
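In software terms, directed ARP forwarding means reading the destination (target) IP out of the ARP payload and using it as the lookup key instead of flooding. The sketch below parses the standard 28-byte IPv4-over-Ethernet ARP payload; the endpoint table and the `"spine proxy"` fallback are illustrative assumptions, with addresses reused from the station table slide.

```python
import socket
import struct

# htype, ptype, hlen, plen, oper, sender MAC, sender IP, target MAC, target IP
ARP_FMT = "!HHBBH6s4s6s4s"

def build_arp_request(sender_mac: bytes, sender_ip: str, target_ip: str) -> bytes:
    """Build an Ethernet/IPv4 ARP request payload (operation 1)."""
    return struct.pack(ARP_FMT, 1, 0x0800, 6, 4, 1,
                       sender_mac, socket.inet_aton(sender_ip),
                       b"\x00" * 6, socket.inet_aton(target_ip))

def arp_target_ip(payload: bytes) -> str:
    """Extract the destination IP address field from an ARP payload."""
    fields = struct.unpack(ARP_FMT, payload[:struct.calcsize(ARP_FMT)])
    return socket.inet_ntoa(fields[8])

# Hypothetical endpoint table on the ingress leaf.
ENDPOINT_TO_LEAF = {"10.1.3.36": "Leaf 2"}

def forward_arp(payload: bytes) -> str:
    """Send the ARP toward the leaf that owns the target IP, not everywhere."""
    return ENDPOINT_TO_LEAF.get(arp_target_ip(payload), "spine proxy")

request = build_arp_request(bytes.fromhex("0025b5123456"), "10.1.3.11", "10.1.3.36")
print(len(request), arp_target_ip(request), forward_arp(request))
```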
Fabric Scaling – Gratuitous ARP and Device Movement

ARP Payload fields: Hardware type, Protocol type, Hardware size, Protocol size, Operation, Sender MAC address, Sender IP address, Destination MAC address, Destination IP address
Diagram: GARP frame (MAC | GARP) sourced from Hyper-V vSwitch; carried in the fabric as VTEP | VXLAN | MAC | GARP; traffic to original VTEP destination is ‘bounced’ to new Leaf
• Leaf ASIC forwards GARP to Leaf where VM originally attached
• Original source Leaf installs Bounce entry to redirect traffic to new VTEP destination until learning occurs on remaining Leaves
ACI Fabric Load Balancing – Flowlet Switching
 State-of-the-art ECMP hashes flows (5-
tuples) to path to prevent reordering TCP
packets.
 Flowlet switching* routes bursts of packets
from the same flow independently.
 No packet re-ordering when Gap ≥ |d1 – d2|
[Figure: TCP flow from H1 to H2 over two paths with delays d1 and d2]

*Flowlet Switching (Kandula et al ’04)
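The flowlet idea can be sketched as splitting a flow's packets wherever the idle gap is at least the difference in path delays, |d1 – d2|: a burst that starts after such a gap can take a different path without overtaking earlier packets. The timestamps and threshold below are made-up values for illustration.

```python
def split_flowlets(timestamps, min_gap):
    """Group packet timestamps into flowlets separated by gaps >= min_gap."""
    flowlets = []
    for t in timestamps:
        if flowlets and t - flowlets[-1][-1] < min_gap:
            flowlets[-1].append(t)    # still the same burst
        else:
            flowlets.append([t])      # gap is large enough: new flowlet
    return flowlets

d1, d2 = 12, 7                        # hypothetical path delays
arrivals = [0, 1, 2, 10, 11, 30]      # hypothetical packet times of one flow

print(split_flowlets(arrivals, abs(d1 - d2)))  # [[0, 1, 2], [10, 11], [30]]
```

Each inner list could be hashed to a path independently, which is what allows finer-grained load balancing than per-flow ECMP.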


ACI Fabric Dynamic Flow Prioritization
Real traffic is a mix of large (elephant) and small (mice) flows.
 Single priority: Large flows impact performance for small flows
 Fabric detects each flow’s initial flowlets
 Fabric automatically assigns higher priority to small flows.
Diagram queues: Standard Priority, High Priority
vPC Configuration in ACI Fabric
vPC Support in ACI Fabric
ACI Leaves support vPC
 Similar to Nexus (802.3ad port channels; links split across two devices)
Differences between ACI vPC and standard vPC
 No Peer Link is required
 Peer communication happens via the Fabric
 Path recovery also happens via the Fabric and not peer link
 CFS (Cisco Fabric Services) is replaced by AFS (ACI Fabric Services)
 Multicast Forwarding selection (which peer will forward a frame)
vPC interfaces use an anycast VTEP within the ACI Fabric, which is active on both vPC peers.
Diagram: APIC; two leaf VTEPs sharing a vPC Anycast VTEP; Host or Switch attached to both
vPC Traffic in ACI Fabric
Traffic within the Fabric is sent to the vPC anycast address
Traffic is both sourced and destined to the anycast vPC VTEP address from remote Leaves
 Hardware flow hashing between peers via spine; spine determines which peer forwards a specific flow downstream to attached device
Upon downlink failure on a peer:
1. Bounce entry created pointing to the peer’s VTEP for end points reachable via the port channel
2. All MAC/IP-to-leaf bindings for the specific vPC are removed from COOP database and the spine proxy
3. On failure of a peer, the remaining leaf converts all vPC ports to non-VPC local ports
Diagram: APIC; two leaf VTEPs sharing a vPC Anycast VTEP; Host or Switch attached to both
Port-Channels in NX-OS (Reference)
• Port Channels in Nexus standalone mode
N5K# interface GigabitEthernet1/7-8
switchport trunk allowed vlan 200
spanning-tree portfast trunk
channel-group 10 mode active
Diagram labels: Leaf103 and Leaf104 (port 1/25 on each); Nexus 5000 (ports 1/7 and 1/8)
Configuring VPC Interface Policy Group

Navigate: Fabric | Access Policies | Interfaces | Leaf Interfaces | Policy Groups | VPC Interface
