FTD Clustering CiscoLive
BRKSEC-3691
Your Presenter
Luis Restrepo
• Electronics Engineer.
• 6 years as Technical Consulting Engineer in NGFW TAC.
• AMER
• EMEA
• Colombia.
• Passionate about Network/CyberSecurity.
• Hobbies:
• Family Time, Running, Traveling.
BRKSEC-3691 © 2024 Cisco and/or its affiliates. All rights reserved. Cisco Public 3
Agenda
• Introduction
• Troubleshooting
• Know the Allies
• Key Concepts
• Ticket Reports
• Conclusion
Introduction
Cisco Secure Firewall Hardware Portfolio
Virtual Cluster
Private Cloud: VMware
Public Cloud: AWS, GCP, Azure
Cluster Support
High-Availability (Failover)
Configuration replication
PRIMARY SECONDARY
Failover Link
ACTIVE STANDBY READY
Stateful Link
Clustering
• Multiple devices grouped as one logical device.
• Connection states preserved on member failures.
• Virtual IP/MAC for first hop redundancy.
Poll Query = 2
1/3 Holdtime
Poll Query = 3
Health Monitoring Settings FMC (7.3+)
Time a unit waits to receive heartbeat messages before marking a peer as dead.
Troubleshooting
Champ Tip 1 – Understanding the problem is half the solution.
Collect as much information as possible from all cluster units.
This is key to save time in the overall troubleshooting process.
Champ Tip 2 – Ask the right questions
Know The Allies
FMC Allies – Cluster Status
Under Devices > Device Management > Cluster > General
FMC Allies – Cluster Status
Unit state
FMC Allies - Health Monitoring
On Health > Monitor, performance and alert information is available.
FMC Allies - Health Monitoring
Selecting the device shows graphs for CPU, memory, throughput, connections, etc.
CLI Allies – Cheat Sheet Reference
Key Concepts
Unit Roles/Functions
• Control Unit – (Previously Master)
• One per cluster, elected based on configured priority or first to join.
• In charge of centralized functions and management.
• Has ownership of virtual IP address for connections to the cluster.
• Processes regular transit connections.
Clustering Election Process
Health or sync failure
Control Unit
Data Unit
Cluster Info Command
Champ Tip 4 - Always use the show cluster info command as the first reference point for troubleshooting.
FTD-Cluster-BVG-2# show cluster info
Cluster FTD-Cluster-RB: On
Interface mode: spanned
Cluster Member Limit : 16
This is "unit-2-1" in state SLAVE
ID : 1
Site ID : 1
Version : 9.18(3)53
Serial No.: FCH22247MKJ
CCL IP : 10.99.2.1
CCL MAC : 0015.c500.028f
Last join : 00:46:31 CET Dec 2 2023
Last leave: 00:41:28 CET Dec 2 2023
Other members in the cluster:
Unit "unit-1-1" in state MASTER
ID : 0
Site ID : 1
Version : 9.18(3)53
Serial No.: FCH22247LNK
CCL IP : 10.99.1.1
CCL MAC : 0015.c500.018f
Last join : 10:34:43 CET Nov 30 2023
Last leave: N/A
Fields to check: Unit State, CCL IP/MAC, Last Join/Leave.
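When this output is collected from several units, it can help to turn it into structured data before comparing states and join/leave times. The following is a minimal Python sketch, not a Cisco tool: the function name parse_cluster_info and the trimmed SAMPLE text are illustrative assumptions, and the regular expressions only cover the field layout shown on this slide.

```python
import re

# Trimmed copy of the "show cluster info" output from the slide (illustrative).
SAMPLE = """\
Cluster FTD-Cluster-RB: On
    Interface mode: spanned
    This is "unit-2-1" in state SLAVE
        ID        : 1
        CCL IP    : 10.99.2.1
        Last join : 00:46:31 CET Dec 2 2023
        Last leave: 00:41:28 CET Dec 2 2023
Other members in the cluster:
    Unit "unit-1-1" in state MASTER
        ID        : 0
        CCL IP    : 10.99.1.1
        Last join : 10:34:43 CET Nov 30 2023
        Last leave: N/A
"""

# A unit header looks like: "unit-2-1" in state SLAVE
UNIT_RE = re.compile(r'"(?P<name>[^"]+)" in state (?P<state>\S+)')
# A field looks like: key : value (the key may contain spaces and dots)
FIELD_RE = re.compile(r'^\s*(?P<key>[A-Za-z][A-Za-z .]*?)\s*:\s*(?P<value>.+)$')


def parse_cluster_info(text):
    """Return one dict per unit, in the order the units appear in the output."""
    units = []
    current = None
    for line in text.splitlines():
        unit = UNIT_RE.search(line)
        if unit:
            current = {"name": unit["name"], "state": unit["state"]}
            units.append(current)
            continue
        field = FIELD_RE.match(line)
        # Fields seen before the first unit header (cluster-wide lines) are skipped.
        if field and current is not None:
            current[field["key"].strip()] = field["value"].strip()
    return units


if __name__ == "__main__":
    for unit in parse_cluster_info(SAMPLE):
        print(unit["name"], unit["state"], unit["CCL IP"], unit["Last leave"])
```

A unit whose "Last leave" is recent while its peers show "N/A" is a natural first candidate for closer inspection.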
Cluster Control Link (CCL)
• Carries all data and communications between cluster members (Po48 CCL).
• Cluster control protocol: Unicast, UDP 4193.
• Data path messages: info about flow Owner/Director/Forwarder.
• Data packets: belonging to traffic flows forwarded by/to other units.
• CCL interface flaps force the unit out of the cluster.
New TCP Connection
New UDP Connection
Member Failures
Connection Roles Information Reference
Fragment Owner: Unit that handles fragmented traffic. Flag: -
Flags Reference Examples
FTD-Cluster-RRB-1# cluster exec show conn
unit-1-1(LOCAL):******************************************************
18 in use, 40 most used
Cluster:
fwd connections: 0 in use, 12 most used
dir connections: 0 in use, 22 most used
centralized connections: 0 in use, 10 most used
TCP OUTSIDE 172.18.202.150:443 INSIDE 172.18.201.100:44394, idle 0:00:00, bytes 487413076, flags UIO N1
Slide callouts for unit-1-1: Connection Count (the fwd/dir/centralized lines); Owner + Director (the TCP entry).
unit-2-1:*************************************************************
18 in use, 46 most used
Cluster:
fwd connections: 0 in use, 16 most used
dir connections: 0 in use, 8 most used
centralized connections: 0 in use, 0 most used
unit-3-1:*************************************************************
15 in use, 42 most used
Cluster:
fwd connections: 1 in use, 7 most used
dir connections: 1 in use, 32 most used
centralized connections: 0 in use, 0 most used
TCP OUTSIDE 172.18.202.150:443 INSIDE 172.18.201.100:44394, idle 0:00:06, bytes 0, flags y
Slide callout for unit-3-1: Backup Owner (the TCP entry).
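Output from cluster exec is one long text with a banner line per unit, so splitting it per unit makes it easier to see which unit holds which role for a flow. The sketch below is an illustration only: split_by_unit and conn_flags are hypothetical helper names, the banner pattern is inferred from the output on this slide, and the flags are returned as raw strings without interpretation.

```python
import re

# Banner lines look like "unit-1-1(LOCAL):*****" or "unit-2-1:*****".
BANNER_RE = re.compile(r'^(?P<unit>[\w-]+?)(?:\(LOCAL\))?:\*+\s*$')
# Connection entries end with "flags <letters>".
FLAGS_RE = re.compile(r'flags\s+(?P<flags>.+?)\s*$')

SAMPLE = """\
unit-1-1(LOCAL):******************************************************
18 in use, 40 most used
TCP OUTSIDE 172.18.202.150:443 INSIDE 172.18.201.100:44394, idle 0:00:00, bytes 487413076, flags UIO N1
unit-2-1:*************************************************************
18 in use, 46 most used
unit-3-1:*************************************************************
15 in use, 42 most used
TCP OUTSIDE 172.18.202.150:443 INSIDE 172.18.201.100:44394, idle 0:00:06, bytes 0, flags y
"""


def split_by_unit(text):
    """Map each unit name to the output lines that follow its banner."""
    sections = {}
    current = None
    for line in text.splitlines():
        banner = BANNER_RE.match(line.strip())
        if banner:
            current = banner["unit"]
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections


def conn_flags(sections):
    """Collect the raw flags string of every connection entry, per unit."""
    return {
        unit: [m["flags"] for line in lines if (m := FLAGS_RE.search(line))]
        for unit, lines in sections.items()
    }


if __name__ == "__main__":
    for unit, flags in conn_flags(split_by_unit(SAMPLE)).items():
        print(unit, flags)
```

The same split_by_unit helper can be reused for any other cluster exec output, such as show capture or show interface detail.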
Setup Cluster Troubleshooting Methodology Reference (flowchart)
Data Plane Troubleshooting Methodology Reference (flowchart)
Checkpoints shown in the flowchart (branch order is not recoverable from the extraction):
• Identify 5-Tuple.
• Identify Conn Owner, Director, Forwarder.
• Split-Brain?
• Check CCL Connectivity/MTU.
• CPU/Memory/Blocks ok?
• Conns Number & Throughput ok?
• Fragmentation Issues?
• Check ASP Drops counter.
• Contact TAC.
Ticket Reports
Giant Snorty (Imaginary-Scenario Company)
• Has a 3-unit cluster of FPR4125s.
• This cluster acts as perimeter firewall for their network.
• 7 Tickets were opened for the security engineer to handle.
FTD-Cluster-RB
Po6 Inside
Po48 CCL
Po7 Outside
SW-INFRA
Ticket Report #1
Ticket #1 – General Questions
Customer Symptom:
• Are DHCP Server and client supported with clustering setups?
• Are dynamic routing protocols supported with clustering setups?
Resolution:
• According to Cisco documentation, DHCP server and client are unsupported features with clustering.
• Dynamic routing protocols are supported; dynamic routing is a centralized feature.
Unsupported Features
• Remote Access VPN (SSL/IPsec).
• DHCP client, server, and proxy.
• Virtual tunnel interfaces (VTI).
• Management Center UCAPL/CC mode.
• Integrated routing and bridging.
• Failover configuration.
Centralized Features
The following features are supported only on the Control node.
• Application inspections (DCERPC, ESMTP, NetBIOS, PPTP, RSH,
SQLNET, SUNRPC, TFTP, XDMCP).
• Static route monitoring.
• Site-to-Site VPN.
• IGMP/PIM multicast control plane protocol processing.
• Dynamic Routing.
Ticket Report #2
Ticket #2 – Throughput Testing Issues
Customer Symptom:
• On our FPR4125 cluster we expect 135 Gbps of throughput (datasheet information). When running performance tests we cannot reach that value. Why?
Resolution:
• When combining multiple units into a cluster, the total expected performance is ~80% of the maximum combined throughput.
• In this case, each unit delivers 45 Gbps as standalone, so the 3-unit cluster has a maximum combined throughput of 3 x 45 = 135 Gbps and an approximate expected throughput of 80% of 135 Gbps = 108 Gbps.
• Calculations are based on 1024B packet size.
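The arithmetic above can be written as a small helper for sizing discussions. This is a sketch under the assumptions stated on the slide (a flat ~80% scaling factor and 45 Gbps per standalone unit); the function name expected_cluster_throughput is illustrative.

```python
def expected_cluster_throughput(per_unit_gbps, units, scaling=0.8):
    """Approximate cluster throughput: combined standalone throughput times the scaling factor."""
    combined = per_unit_gbps * units
    return combined * scaling


if __name__ == "__main__":
    # Three units at 45 Gbps each, scaled by ~80%.
    print(round(expected_cluster_throughput(45, 3), 1), "Gbps")
```

The scaling factor is kept as a parameter because it is an approximation, not a guaranteed figure.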
Performance Scaling Factor
Failover Throughput 10 Gbps
Ticket Report #3
Ticket #3 – Datacenter Activity Report
Customer Symptom:
• Yesterday there was a planned activity in the datacenter.
• Clustering on two units was reported as disabled afterwards.
FTD-Cluster-RB
Po6 Inside
Po48 CCL
Po7 Outside
SW-INFRA
Champ Tip 5 - Use FMC as the starting point for troubleshooting
FMC Alerts
Unit 3-1
Champ Tip 6 – Check the Allies, Control Unit first
FTD-Cluster-RRB-1# Asking slave unit unit-3-1 to quit because it failed unit health-check.
FTD-Cluster-RRB-1# Asking slave unit unit-2-1 to quit because it failed interface health check 1 times (last failure on Port-channel6), rejoin will be attempted after 5 min.
Champ Tip 8 – Divide & Conquer, one issue/unit at a time
Control Unit:
FTD-Cluster-RRB-1# show cluster info trace | inc unit-2-1
Dec 06 20:12:13.832 [INFO]Peer unit-2-1(1) reported its Port-channel6 is down
Dec 06 20:12:13.832 [INFO]Slave unit unit-2-1 reports inconsistent cluster interface state for interface Port-channel6 (up on master unit, down on slave unit).
Dec 06 20:12:13.832 [DBUG]Send CCP message to unit-2-1(1): CCP_MSG_IFC_REJOIN_FAIL_COUNTER
Dec 06 20:12:13.832 [DBUG]Send CCP message to unit-2-1(1): CCP_MSG_QUIT from unit-1-1 to unit-2-1 for reason CLUSTER_QUIT_REASON_IFC_HC
Dec 06 20:12:13.832 [ALERT]Asking slave unit unit-2-1 to quit because it failed interface health check 1 times (last failure on Port-channel6), rejoin will be attempted after 5 min
Dec 06 20:12:13.832 [INFO]State machine notify event CLUSTER_EVENT_MEMBER_STATE (unit-2-1,DISABLED,0)
.
Dec 06 20:17:17.674 [DBUG]Receive CCP message: CCP_MSG_ELEC_REQ from unit-2-1
Dec 06 20:17:17.784 [INFO]State machine notify event CLUSTER_EVENT_MEMBER_STATE (unit-2-1,SLAVE_COLD,0)
Dec 06 20:17:17.784 [INFO]FTD - CD proxy received state notification (SLAVE_COLD) from unit unit-2-1
Dec 06 20:17:17.794 [INFO]CCL MTU test to unit unit-2-1 passed
Dec 06 20:17:17.814 [INFO]State machine notify event CLUSTER_EVENT_MEMBER_STATE (unit-2-1,SLAVE_APP_SYNC,0)
Dec 06 20:19:37.793 [INFO]Peer unit-2-1(1) reported its Port-channel6 is down
Dec 06 20:32:03.176 [INFO]Peer unit-2-1(1) reported its Port-channel6 is down
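Reading the trace as a timeline makes the rejoin behaviour easier to follow: the unit is asked to quit, and roughly five minutes later it sends an election request. The sketch below parses the timestamps and measures that gap. It is illustrative only: parse_trace and first_event are hypothetical helper names, the year is supplied by the caller because the trace lines do not carry one, and the line pattern is inferred from the output shown here.

```python
import re
from datetime import datetime

# Trace lines look like: "Dec 06 20:12:13.832 [INFO]message text"
LINE_RE = re.compile(
    r'^(?P<ts>\w{3} \d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \[(?P<level>\w+)\](?P<msg>.*)$'
)

SAMPLE = """\
Dec 06 20:12:13.832 [INFO]Peer unit-2-1(1) reported its Port-channel6 is down
Dec 06 20:12:13.832 [DBUG]Send CCP message to unit-2-1(1): CCP_MSG_QUIT from unit-1-1 to unit-2-1 for reason CLUSTER_QUIT_REASON_IFC_HC
Dec 06 20:17:17.674 [DBUG]Receive CCP message: CCP_MSG_ELEC_REQ from unit-2-1
"""


def parse_trace(text, year=2023):
    """Return (timestamp, level, message) tuples for every recognised trace line."""
    events = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # wrapped continuation lines or elisions such as "."
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S.%f")
        events.append((ts, m["level"], m["msg"].strip()))
    return events


def first_event(events, token):
    """Timestamp of the first event whose message contains the token, else None."""
    for ts, _level, msg in events:
        if token in msg:
            return ts
    return None


if __name__ == "__main__":
    events = parse_trace(SAMPLE)
    gap = first_event(events, "CCP_MSG_ELEC_REQ") - first_event(events, "CCP_MSG_QUIT")
    print("Seconds between quit and election request:", gap.total_seconds())
```

In this trace the gap lines up with the "rejoin will be attempted after 5 min" message logged by the Control Unit.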
Champ Tip 9 – Check Data Unit (cluster is disabled)
LINA
FTD-Cluster-BVG-2# Unit is kicked out from cluster because of interface health check failure.
FTD-Cluster-BVG-2# Cluster disable is performing cleanup..done.
FTD-Cluster-BVG-2# All data interfaces have been shutdown due to clustering being disabled. To recover either enable clustering or remove cluster group configuration.
FXOS
FTD-Cluster-BVG-2# scope eth-uplink; scope fabric a; show port-channel
Port Channel:
Port Channel Id  Name            Port Type  Admin State  Oper State  Port Channel Mode  Allowed Vlan  State Reason
---------------  --------------  ---------  -----------  ----------  -----------------  ------------  ----------------------
6                Port-channel6   Data       Enabled      Failed      Active             All           No operational members
7                Port-channel7   Data       Enabled      Failed      Active             All           No operational members
48               Port-channel48  Cluster    Enabled      Up          Active             All           Port is enabled and up
Port-channel6 is down.
FXOS
FTD-Cluster-BVG-2# connect fxos
FTD-Cluster-BVG-2(fxos)# show lacp internal event-history interface ethernet 1/2
64) FSM:<Ethernet1/2> Transition at 297515 usecs after Wed Dec 6 19:12:13 2023 Previous state: [LACP_ST_PORT_MEMBER_COLLECTING_AND_DISTRIBUTING_ENABLED] Triggered event: [LACP_EV_UNGRACEFUL_DOWN] Next state: [LACP_ST_PORT_IS_DOWN_OR_LACP_IS_DISABLED]
65) FSM:<Ethernet1/2> Transition at 376781 usecs after Wed Dec 6 19:12:13 2023 Previous state: [LACP_ST_PORT_IS_DOWN_OR_LACP_IS_DISABLED] Triggered event: [LACP_EV_UNGRACEFUL_DOWN] Next state: [FSM_ST_NO_CHANGE]
Ungraceful down from LACP events.
SWITCH
GIANT-SNORTY-CORE1#show int status | inc 4/15
Gi4/15 FTD-BVG-2-P2 - E disabled 201 full auto 10/100/1000BaseT
GIANT-SNORTY-CORE1#show int status | inc 4/17
Gi4/17 FTD-BVG-2-P4-CCL - E connected 209 a-full a-1000 10/100/1000BaseT
GIANT-SNORTY-CORE1#show int status | inc 4/18
Gi4/18 FTD-BVG-2-P5-CCL - E connected 209 a-full a-1000 10/100/1000BaseT
interface GigabitEthernet4/15
description KSEC-FPR4125-2 - E1/2
switchport
switchport access vlan 201
switchport mode access
shutdown
channel-group 40 mode active
spanning-tree portfast edge
end
Interface was shut down as part of the activity.
Ticket #3 – Troubleshoot
FTD-Cluster-CRV-3# show cluster info
Cluster FTD-Cluster-RB: On
Interface mode: spanned
Cluster Member Limit : 16
This is "unit-3-1" in state MASTER
ID : 0
Site ID : 1
Version : 9.18(3)53
Serial No.: FLM251700E8
CCL IP : 10.99.3.1
CCL MAC : 0015.c500.038f
Last join : 18:52:39 UTC Dec 6 2023
Last leave: 18:46:50 UTC Dec 6 2023
Other members in the cluster:
There is no other unit in the cluster
FTD-Cluster-CRV-3# show cluster history
18:46:53 UTC Dec 6 2023
SLAVE DISABLED Cluster interface down
18:51:54 UTC Dec 6 2023
DISABLED ELECTION Enabled from CLI
18:52:39 UTC Dec 6 2023
ELECTION MASTER_CONFIG Enabled from CLI
18:52:39 UTC Dec 6 2023
MASTER_CONFIG MASTER_POST_CONFIG Client progression done
18:52:40 UTC Dec 6 2023
MASTER_POST_CONFIG MASTER Master post config done and waiting for ntfy
• Units 1-1 & 3-1 are Control at the same time (Split-Brain).
• Unit 3-1 doesn’t see other units on the CCL.
• Unit 3-1 transitions from Data > Disabled > Control.
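The split-brain condition described here can be stated as a simple rule: more than one unit reports itself in the MASTER state. The sketch below applies that rule to the state each unit reports about itself in show cluster info. The function names and the states_by_unit dictionary are illustrative assumptions; the unit-2-1 entry is included only as an example of a non-control state.

```python
def control_units(states_by_unit):
    """Names of the units that report themselves as Control (state MASTER)."""
    return sorted(unit for unit, state in states_by_unit.items() if state == "MASTER")


def is_split_brain(states_by_unit):
    """True when more than one unit claims the Control role at the same time."""
    return len(control_units(states_by_unit)) > 1


if __name__ == "__main__":
    # Self-reported state per unit, as collected from each unit's own CLI.
    states_by_unit = {
        "unit-1-1": "MASTER",
        "unit-2-1": "DISABLED",  # illustrative value for a unit that left the cluster
        "unit-3-1": "MASTER",
    }
    print("Control units:", control_units(states_by_unit))
    print("Split-brain:", is_split_brain(states_by_unit))
```

The check only works when the state is collected from every unit directly, since a unit in split-brain does not see its peers over the CCL.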
Ticket #3 - Troubleshoot
LINA
FTD-Cluster-CRV-3# show int ip br
Interface IP-Address OK? Method Status Protocol
Port-channel6 172.18.201.1 YES manual up up
Port-channel7 172.18.202.1 YES manual up up
Port-channel48 10.99.3.1 YES unset up up
Ethernet1/1 unassigned YES unset up up
FTD-Cluster-CRV-3#
FTD-Cluster-CRV-3# show int po48
Interface Port-channel48 "cluster", is up, line protocol is up
Hardware is EtherSVI, BW 2000 Mbps, DLY 1000 usec
Description: Clustering Interface
MAC address 0015.c500.038f, MTU 1600
IP address 10.99.3.1, subnet mask 255.255.0.0
Interface is up and the right MTU is set.
Ticket #3 - Troubleshoot
FXOS
FTD-Cluster-CRV-3# connect fxos
FTD-Cluster-CRV-3(fxos)# show port-channel database
port-channel48
Last membership update is successful
2 ports in total, 2 ports up
First operational port is Ethernet1/5
Age of the port-channel is 7d:01h:11m:52s
Time since last bundle is 7d:01h:11m:38s
Last bundled member is Ethernet1/5
Ports: Ethernet1/4 [active ] [up]
Ethernet1/5 [active ] [up] *
Port-channel status ok, member ports active/up.
SWITCH
GIANT-SNORTY-CORE1# show run int Po45
interface Port-channel45
switchport
switchport access vlan 206
switchport mode access
mtu 1600
spanning-tree portfast edge
end
Champ Tip 11 - Always have a working configuration from adjacent devices at hand for comparison.
Ticket #3 – Summary
• Two Data units were kicked out from the cluster (units 2 & 3).
• The only recent change was an activity performed on the datacenter switches.
• After investigation, the configuration was OK on the cluster units; however, the Control Unit reported:
• Unit-2-1: Interface health check failure.
• Unit-3-1: Unit health check failure.
Ticket Report #4
Ticket #4 – Service Health Report
Customer Symptom:
• Today two of our units were reported as kicked out from the cluster at
different times.
• There was no impact, but customer is afraid it can happen again.
FTD-RRB-1 FTD-BVG-2 FTD-CRV-3
FTD-Cluster-RB
Po6 Inside
Po48 CCL
Po7 Outside
SW-INFRA
Ticket #4 – General Troubleshooting
FTD-Cluster-RRB-1# show cluster info
Cluster FTD-Cluster-RB: On
Interface mode: spanned
Cluster Member Limit : 16
This is "unit-1-1" in state MASTER
ID : 0
Site ID : 1
Version : 9.18(3)53
Serial No.: FCH22247LNK
CCL IP : 10.99.1.1
CCL MAC : 0015.c500.018f
Last join : 10:34:43 CET Nov 30 2023
Last leave: N/A
Units 2-1 & 3-1 were kicked out from the cluster.
Ticket #4 – General Troubleshooting
FTD-Cluster-RRB-1# show cluster info health
Member ID to name mapping:
0 - unit-1-1(myself) 1 - unit-2-1 2 - unit-3-1
Disk Snort
SW-INFRA
Ticket #4 – Snort Troubleshooting
Snort3 crash detected.
Dec 8 19:48:34 FTD-Cluster-CRV-3 Notification Daemon[14571]: Notification Daemon: Sending UP Status Update NGFW-1.0-snort-1.0
Dec 8 19:48:34 FTD-Cluster-CRV-3 Notification Daemon[14571]: Service Up: Last Heartbeat received at Fri Dec 8 19:48:34 2023
Ticket #4 – Snort Troubleshooting
root@FTD-Cluster-CRV-3:/home/admin# ls -l /ngfw/var/log/crashinfo/
-rw-r--r-- 1 root root 1037 Dec 08 19:40 snort3-crashinfo.1692444378.572272
Provide this file to TAC for analysis.
Champ Tip 12 – Snort cores/crash files can be found in the following locations:
Snort 2 - /ngfw/var/data/cores/ or /ngfw/var/common/
Snort 3 - /ngfw/var/log/crashinfo/ - /ngfw/var/data/cores/ - /ngfw/var/common/
Champ Tip 13 –
1. Copy crash/core files to the /ngfw/var/common/ folder in expert mode.
2. Access FMC via HTTPS and go to System > Health > Monitor.
3. Select the FTD where the core files were generated, then Advanced Troubleshooting > View System & Troubleshooting details > File Download.
Ticket #4 – Disk Troubleshooting
LINA
FTD-Cluster-RRB-1# show cluster history
Ticket #4 – Disk Troubleshooting
root@FTD-Cluster-BVG-2:/ngfw# find /ngfw -type f -exec du -Sh {} + | sort -rh | head -n 15
find: File system loop detected; '/ngfw/Volume/root1/ngfw' is part of the same file system loop as '/ngfw'.
171G /ngfw/badfile
8.8G /ngfw/Volume/.swaptwo
531M /ngfw/var/sf/cloud_download/cisco_uridb_large_1705310873
531M /ngfw/usr/local/sf/cloud_download/cisco_uridb_large_1705310873
Champ Tip 15 – Increase available disk space by deleting old backup files and troubleshoot files under /ngfw/var/common/. Don’t delete files/folders unless you are completely sure.
Useful commands (run in expert mode):
df -ha
find /ngfw -type f -exec du -Sh {} + | sort -rh | head -n 15
lsof | grep deleted
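Where the find/du pipeline is not convenient, for example when analysing a copy of the file system offline, the same "largest files first" listing can be approximated in Python. This is a sketch, not a Cisco utility: largest_files is a hypothetical helper, it reports apparent file size via os.path.getsize rather than the disk usage that du reports, and it skips symbolic links to avoid the file system loop warned about in the output above.

```python
import os


def largest_files(root, limit=15):
    """Return up to `limit` (size_in_bytes, path) tuples, largest first."""
    sizes = []
    # followlinks=False keeps os.walk from descending into symlinked directories.
    for dirpath, _dirnames, filenames in os.walk(root, followlinks=False):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.islink(path):
                    continue
                sizes.append((os.path.getsize(path), path))
            except OSError:
                # Files can disappear or be unreadable while walking; skip them.
                continue
    sizes.sort(reverse=True)
    return sizes[:limit]


if __name__ == "__main__":
    for size, path in largest_files(".", limit=15):
        print(f"{size:>15,d}  {path}")
```

As with the shell commands, the listing only identifies candidates; whether a file is safe to delete still has to be confirmed.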
Ticket #4 – Summary
• Two Data units were kicked out from the cluster (units 2 & 3).
• No recent changes were performed.
• After investigation, the configuration was OK on the cluster units; however, the Control Unit reported:
• Unit-2-1: Application health check failure due to disk.
• Unit-3-1: Application health check failure due to Snort.
• The big file filling the disk on unit 2 was removed.
• The Snort crash was identified and provided to TAC for review.
Ticket Report #5
Ticket #5 – Unit Replacement Report
Customer Symptom:
• One of the cluster units had a hardware failure and was replaced.
• Replacement unit is not able to join the cluster.
FTD-Cluster-RB
Po6 Inside
Po48 CCL
Po7 Outside
SW-INFRA
Ticket #5 – FMC Checks
WARNING: Unit unit-3-1 is not reachable in CCL jumbo frame ICMP test, please check cluster interface and switch MTU configuration
WARNING: Unit unit-3-1 is not reachable in CCL jumbo frame ICMP test, please check cluster interface and switch MTU configuration
Ticket #5 – Troubleshooting MTU
FTD-Cluster-RRB-1# ping 10.99.2.1 size 1600
Type escape sequence to abort.
Sending 5, 1600-byte ICMP Echos to 10.99.2.1, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/2/10 ms
Ping test to unit 2-1 is working.
FTD-Cluster-RRB-1# ping 10.99.3.1 size 1600
Type escape sequence to abort.
Sending 5, 1600-byte ICMP Echos to 10.99.3.1, timeout is 2 seconds:
?????
Success rate is 0 percent (0/5)
Ping test to unit 3-1 is not working.
FTD-Cluster-RRB-1# cluster exec show interface detail
unit-1-1(LOCAL):******************************************************
Interface Port-channel48 "cluster", is up, line protocol is up
Hardware is EtherSVI, BW 1000 Mbps, DLY 10 usec
Description: Clustering Interface
MAC address 0015.c500.018f, MTU 1600
IP address 10.99.1.1, subnet mask 255.255.0.0
unit-2-1:*************************************************************
Interface Port-channel48 "cluster", is up, line protocol is up
Hardware is EtherSVI, BW 1000 Mbps, DLY 10 usec
Description: Clustering Interface
MAC address 0015.c500.028f, MTU 1600
IP address 10.99.2.1, subnet mask 255.255.0.0
FTD-Cluster-CRV-3# show int detail
Interface Port-channel48 "cluster", is up, line protocol is up
Hardware is EtherSVI, BW 1000 Mbps, DLY 10 usec
Description: Clustering Interface
MAC address 0015.c500.038f, MTU 1600
IP address 10.99.3.1, subnet mask 255.255.0.0
Cluster exec is used to check the CCL MTU on the cluster members.
Ticket #5 – Troubleshooting MTU
GIANT-SNORTY-CORE1#show int po41
Port-channel41 is up, line protocol is up (connected)
Hardware is EtherChannel, address is 0021.a03d.e666 (bia 0021.a03d.e666)
MTU 1600 bytes, BW 2000000 Kbit, DLY 10 usec,
reliability 255/255, txload 1/255, rxload 1/255
Encapsulation ARPA, loopback not set
Keepalive set (10 sec)
Full-duplex, 1000Mb/s, media type is unknown
GIANT-SNORTY-CORE1#show int po43
Port-channel43 is up, line protocol is up (connected)
Hardware is EtherChannel, address is 0021.a03d.e660 (bia 0021.a03d.e660)
MTU 1600 bytes, BW 2000000 Kbit, DLY 10 usec,
reliability 255/255, txload 1/255, rxload 1/255
Encapsulation ARPA, loopback not set
Keepalive set (10 sec)
Full-duplex, 1000Mb/s, media type is unknown
GIANT-SNORTY-CORE1#show int po45
Port-channel45 is up, line protocol is up (connected)
Hardware is EtherChannel, address is 0021.a03d.e648 (bia 0021.a03d.e648)
MTU 1500 bytes, BW 2000000 Kbit, DLY 10 usec,
reliability 255/255, txload 1/255, rxload 1/255
Encapsulation ARPA, loopback not set
Keepalive set (10 sec)
Full-duplex, 1000Mb/s, media type is unknown
Diagram labels: FTD-RRB-1 (MTU 1600), FTD-BVG-2 (MTU 1600), FTD-CRV-3 (MTU 1500); Po6 Inside, Po48 CCL, Po7 Outside.
Champ Tip 16 – MTU on CCL must always match between the switch and FTD. CCL MTU needs to be 100+ bytes more than data interfaces MTU.
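The two MTU rules from Champ Tip 16 can be expressed as a small consistency check to run against values gathered from the FTD and the switch. This is a sketch only: check_ccl_mtu is a hypothetical helper, and the data interface MTU of 1500 used in the example is an assumption, since the slides do not show it.

```python
def check_ccl_mtu(ftd_ccl_mtu, switch_ccl_mtu, data_mtu, min_overhead=100):
    """Return a list of human-readable problems; an empty list means both rules hold."""
    problems = []
    # Rule 1: the CCL MTU must match between the switch and the FTD.
    if ftd_ccl_mtu != switch_ccl_mtu:
        problems.append(
            f"CCL MTU mismatch: FTD {ftd_ccl_mtu} vs switch {switch_ccl_mtu}"
        )
    # Rule 2: the CCL MTU must be at least 100 bytes above the data interface MTU.
    if ftd_ccl_mtu < data_mtu + min_overhead:
        problems.append(
            f"CCL MTU {ftd_ccl_mtu} is less than data MTU {data_mtu} + {min_overhead}"
        )
    return problems


if __name__ == "__main__":
    # Values per unit: (FTD Po48 MTU, switch port-channel MTU); data MTU 1500 is assumed.
    observed = {"unit-1-1": (1600, 1600), "unit-2-1": (1600, 1600), "unit-3-1": (1600, 1500)}
    for unit, (ftd_mtu, switch_mtu) in observed.items():
        print(unit, check_ccl_mtu(ftd_mtu, switch_mtu, data_mtu=1500) or "OK")
```

With the values from this ticket, only unit 3-1 is flagged, which matches the failed 1600-byte ping.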
Ticket #5 – Summary
• After replacement, unit 3-1 was unable to join the cluster.
• After investigation, the configuration was OK on the cluster units; however, the Control Unit reported:
• Unit-3-1: CCL MTU test failed.
• A misconfigured MTU was identified on the switch side.
• After the right value was set, unit 3-1 was able to join the cluster and FMC.
Ticket Report #6
Ticket #6 – PAT/Internet Access Report
Customer Symptom:
• Giant Snorty recently acquired Tiny Snort, which has a two-unit cluster.
• The devices are running version 6.6, and internet connectivity issues have been reported both with and without a PAT pool configured.
Po2 Outside
FTD-BROWNIE-1 FTD-COOKIES-2
Po1 Inside
Po48 CCL
SW-INFRA
Ticket #6 - Without PAT Pool
• The public IP is assigned to the Control Unit; none is available for the Data Unit.
• Internet-bound traffic received by the Data Unit is forwarded over the CCL to the Control Unit, which can cause overhead or CCL congestion.
Po2 Outside
136.228.226.200
FTD-BROWNIE-1 FTD-COOKIES-2
Po1 Inside
Po48 CCL
LAN
Ticket #6 - Without PAT Pool
FTD-BROWNIE-1# show nat pool cluster
IP Outside:Giant-Snorty-PATPool 136.228.226.200, owner unit-1-1, backup unit-2-1
Information about PAT pool owner/backup.
FTD-BROWNIE-1# show xlate
TCP PAT from Inside:172.16.100.30/31733 to Outside:136.228.226.200/31733 flags ri idle 0:00:07 timeout 0:00:30
TCP PAT from Inside:172.16.100.31/35883 to Outside:136.228.226.200/35883 flags ri idle 0:00:04 timeout 0:00:30
Use the show xlate command to check translations.
FTD-BROWNIE-1#
Phase: 4
Type: CLUSTER-EVENT
Subtype:
Result: ALLOW
Additional Information:
Input interface: 'Inside'
Flow type: NO FLOW
I (0) got initial, attempting ownership.
Phase: 5
Type: CLUSTER-EVENT
Subtype:
Result: ALLOW
Additional Information:
Input interface: 'Inside'
Flow type: NO FLOW
I (0) am becoming owner
Control Unit capture trace shows the unit becoming connection Owner.
Ticket #6 - Without PAT Pool
FTD-COOKIES-2#
Phase: 4
Type: CLUSTER-EVENT
Result: ALLOW
Additional Information:
Input interface: 'Inside'
Flow type: NO FLOW
I (1) got initial, attempting ownership.
.
Phase: 5
Type: CLUSTER-EVENT
Result: ALLOW
Additional Information:
Input interface: 'Inside'
Flow type: NO FLOW
I (1) am becoming owner
.
Phase: 10
Type: CLUSTER-EVENT
Result: ALLOW
Config:
Additional Information:
Input interface: 'Inside'
Flow type: NO FLOW
NAT: I (1) am redirecting packet to master (0) for PAT.
Data Unit capture trace shows the unit attempting connection ownership, but it redirects the packet to the Control Unit.
Champ Tip 17 – PAT pool size must always be equal to or bigger than the number of cluster units.
Ticket #6 - With PAT Pool
Balanced PAT allocation on units:
FTD-BROWNIE-1# show nat pool cluster
IP Outside:Giant-Snorty-PATPool 136.228.226.200, owner unit-1-1, backup unit-2-1
IP Outside:Giant-Snorty-PATPool 136.228.226.201, owner unit-2-1, backup unit-1-1
Allocation becomes imbalanced, even when the Data unit is back online:
FTD-BROWNIE-1# show nat pool cluster
IP Outside:Giant-Snorty-PATPool 136.228.226.200, owner unit-1-1, backup unit-2-1
IP Outside:Giant-Snorty-PATPool 136.228.226.201, owner unit-1-1, backup unit-2-1
Workaround: Add more IPs to the PAT pool or clear xlates for one IP.
FTD Clustering PAT Improvements (6.7+)
• IPs are no longer assigned entirely to a single cluster member.
• Each PAT IP is split into port blocks that are evenly distributed across members.
• IP stickiness is also used.
object network Giant-Snorty-PATPool
range 136.228.226.2 136.228.226.4
nat (Inside,Outside) after-auto source dynamic Giant-Snorty-LAN pat-pool Giant-Snorty-PATPool
FTD Clustering PAT Improvements
FTD-Cluster-RRB-1# show nat pool cluster
IP Outside:Giant-Snorty-PATPool 136.228.226.2
[1024-1535], owner unit-1-1, backup unit-2-1
[1536-2047], owner unit-1-1, backup unit-2-1
[2048-2559], owner unit-1-1, backup unit-2-1
[2560-3071], owner unit-1-1, backup unit-2-1
[3072-3583], owner unit-1-1, backup unit-2-1
[17920-18431], owner unit-2-1, backup unit-3-1
[18432-18943], owner unit-2-1, backup unit-3-1
[18944-19455], owner unit-2-1, backup unit-3-1
[19456-19967], owner unit-2-1, backup unit-3-1
[19968-20479], owner unit-2-1, backup unit-3-1
[20480-20991], owner unit-2-1, backup unit-3-1
[33280-33791], owner unit-3-1, backup unit-1-1
[33792-34303], owner unit-3-1, backup unit-1-1
[34304-34815], owner unit-3-1, backup unit-1-1
[34816-35327], owner unit-3-1, backup unit-1-1
[35328-35839], owner unit-3-1, backup unit-1-1
[35840-36351], owner unit-3-1, backup unit-1-1
[36352-36863], owner unit-3-1, backup unit-1-1
[36864-37375], owner unit-3-1, backup unit-1-1
[49664-50175], owner <RESERVED>, backup <RESERVED>
[50176-50687], owner <RESERVED>, backup <RESERVED>
[50688-51199], owner <RESERVED>, backup <RESERVED>
[51200-51711], owner <RESERVED>, backup <RESERVED>
[51712-52223], owner <RESERVED>, backup <RESERVED>
Blocks are owned by unit 1-1 (backup 2-1), unit 2-1 (backup 3-1), and unit 3-1 (backup 1-1); the remaining blocks are reserved.
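To judge whether port blocks are spread evenly, the block lines can be counted per owner. The sketch below is illustrative: blocks_per_owner and block_sizes are hypothetical helpers, the pattern is inferred from the output format shown on this slide, and the SAMPLE is a short excerpt rather than the full pool.

```python
import re
from collections import Counter

# Block lines look like: "[1024-1535], owner unit-1-1, backup unit-2-1"
BLOCK_RE = re.compile(
    r'\[(?P<start>\d+)-(?P<end>\d+)\], owner (?P<owner>[^,\s]+), backup (?P<backup>[^,\s]+)'
)

SAMPLE = """\
IP Outside:Giant-Snorty-PATPool 136.228.226.2
[1024-1535], owner unit-1-1, backup unit-2-1
[1536-2047], owner unit-1-1, backup unit-2-1
[17920-18431], owner unit-2-1, backup unit-3-1
[33280-33791], owner unit-3-1, backup unit-1-1
[49664-50175], owner <RESERVED>, backup <RESERVED>
"""


def blocks_per_owner(text):
    """Count how many port blocks each owner holds."""
    return Counter(m["owner"] for m in BLOCK_RE.finditer(text))


def block_sizes(text):
    """Set of distinct block sizes (number of ports per block) seen in the output."""
    return {int(m["end"]) - int(m["start"]) + 1 for m in BLOCK_RE.finditer(text)}


if __name__ == "__main__":
    for owner, count in blocks_per_owner(SAMPLE).items():
        print(owner, count)
    print("Ports per block:", block_sizes(SAMPLE))
```

A large difference in block counts between active units would be the 6.7+ equivalent of the whole-IP imbalance seen on 6.6.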
Ticket #6 – Summary
• Internet connectivity issues were seen when using a single IP address for PAT, or with a PAT pool when one unit was rebooted/kicked out from the cluster.
• Devices are running FTD version 6.6.
• The solution was to add additional IP addresses to the PAT pool or clear xlates for one IP after an imbalance is detected.
• Version 6.7+ redesigns PAT allocation to address these limitations.
Ticket Report #7
Ticket #7 – Data Plane Issues Report
• Customer Symptom:
• Sometimes there are connectivity issues for certain traffic through the
cluster.
• Need some guidance on how to troubleshoot such scenarios.
FTD-Cluster-RB
Po6 Inside
Po48 CCL
Po7 Outside
SW-INFRA
Ticket #7 – Data Plane Troubleshoot
Champ Tip 18
• Collect as many details as possible about the affected flow(s).
• Identify 5-Tuple (Source/Destination IP/Port + Protocol).
• Identify interfaces and units involved in traffic forwarding.
Source IP – 172.18.201.99
Destination IP – 18.239.18.70
Source Port - X
Destination Port - 443
Protocol - TCP
Ingress Interface - Inside
Egress Interface - Outside
Units Involved – Unit 1-1 & 2-1
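Once the 5-tuple is known, it can be recorded in one place and reused to build the capture filter for every interface. The sketch below is illustrative: the FiveTuple class and capture_command function are hypothetical names, and the generated string simply follows the cluster exec capture syntax used in this deck's Ticket #7 capture example. The source port is left unset because it is unknown ("X") and is not part of the match.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FiveTuple:
    protocol: str
    src_ip: str
    dst_ip: str
    dst_port: int
    src_port: Optional[int] = None  # unknown ("X") for this ticket; not used in the match


def capture_command(name, interface, flow, buffer_bytes=33554432):
    """Build a cluster-wide capture command string for the given flow."""
    return (
        f"cluster exec capture {name} buffer {buffer_bytes} interface {interface} "
        f"match {flow.protocol} host {flow.src_ip} host {flow.dst_ip} eq {flow.dst_port}"
    )


if __name__ == "__main__":
    flow = FiveTuple(protocol="tcp", src_ip="172.18.201.99", dst_ip="18.239.18.70", dst_port=443)
    print(capture_command("IN", "Inside", flow))
```

For the egress side the translated source address would be used instead of the real one, so a second FiveTuple would be needed for the Outside capture.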
Ticket #7 - Packet Captures
FTD-Cluster-RRB-1# cluster exec show capture
unit-1-1(LOCAL):****************
FTD-Cluster-RRB-1# cluster exec capture IN buffer 33554432 interface Inside match tcp host 172.18.201.99 host 18.239.18.70 eq 443
FTD-Cluster-RRB-1# cluster exec show capture
unit-1-1(LOCAL):******************************************************
capture IN type raw-data buffer 33554432 interface Inside [Capturing - 1260 bytes]
match tcp host 172.18.201.99 host 18.239.18.70 eq https
unit-2-1:*************************************************************
capture IN type raw-data buffer 33554432 interface Inside [Capturing - 0 bytes]
match tcp host 172.18.201.99 host 18.239.18.70 eq https
unit-3-1:*************************************************************
capture IN type raw-data buffer 33554432 interface Inside [Capturing - 0 bytes]
match tcp host 172.18.201.99 host 18.239.18.70 eq https
FTD-Cluster-RRB-1#
Ticket #7 - Packet Captures
FTD-Cluster-RRB-1# cluster exec show capture IN
unit-1-1(LOCAL):******************************************************
10 packets captured
1: 10:23:12.879226 802.1Q vlan#201 P0 172.18.201.99.31349 > 18.239.18.70.443: S 2225395909:2225395909(0) win 29200 <mss
1460,sackOK,timestamp 1110209649 0,nop,wscale 7>
2: 10:23:12.880401 802.1Q vlan#201 P0 18.239.18.70.443 > 172.18.201.99.31349: S 719653963:719653963(0) ack 2225395910 win
28960 <mss 1380,sackOK,timestamp 1120565119 1110209649,nop,wscale 7>
3: 10:23:12.880691 802.1Q vlan#201 P0 172.18.201.99.31349 > 18.239.18.70.443: . ack 719653964 win 229 <nop,nop,timestamp
1110209650 1120565119>
4: 10:23:12.880783 802.1Q vlan#201 P0 172.18.201.99.31349 > 18.239.18.70.443: P 2225395910:2225396054(144) ack 719653964
win 229 <nop,nop,timestamp 1110209650 1120565119>
unit-2-1:*************************************************************
0 packet captured
0 packet shown
unit-3-1:*************************************************************
0 packet captured
0 packet shown
Ticket #7 - Packet Captures Options
Champ Tip 20 – The trace option shows how the unit handles ingress traffic. By default only
the first 50 ingress packets are traced, but trace-count can raise this to 1000.
FTD-Cluster-RRB-1# cluster exec capture OUT interface Outside buffer 33554432 trace trace-count 1000 match tcp host 136.228.226.2 host 18.239.18.70 eq 443
unit-2-1:*************************************************************
unit-3-1:*************************************************************
1: 09:28:12.111429 802.1Q vlan#202 P0 18.239.18.70.443 > 136.228.226.2.31349: S 301658077:301658077(0) ack 441626017 win
28960 <mss 1460,sackOK,timestamp 1125686319 1115330849,nop,wscale 7>
Champ Tip 21 – The same packet can have different packet numbers on different units. Compare
timestamps to follow the packet flow.
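One way to apply the timestamp tip is to merge the per-unit capture listings into a single timeline. The Python sketch below does that for saved capture text. The `merge_by_time` helper is hypothetical, and it assumes the unit clocks are synchronized and that all packets fall within the same day.

```python
import re

# A packet line starts with "<number>: <HH:MM:SS.ffffff> "; wrapped lines are skipped.
PACKET_RE = re.compile(r"^\s*(\d+): (\d{2}:\d{2}:\d{2}\.\d+) (.*)$")


def merge_by_time(captures: dict) -> list:
    """Return (timestamp, unit, packet_number, summary) tuples sorted by time."""
    merged = []
    for unit, text in captures.items():
        for line in text.splitlines():
            match = PACKET_RE.match(line)
            if match:
                merged.append((match.group(2), unit,
                               int(match.group(1)), match.group(3)))
    # Fixed-width HH:MM:SS.ffffff strings sort correctly as plain text.
    return sorted(merged)


captures = {
    "unit-1-1": "2: 09:28:12.118341 802.1Q vlan#202 P0 18.239.18.70.443 > 136.228.226.2.31349: S",
    "unit-3-1": "1: 09:28:12.111429 802.1Q vlan#202 P0 18.239.18.70.443 > 136.228.226.2.31349: S",
}

timeline = merge_by_time(captures)
for timestamp, unit, number, _ in timeline:
    print(timestamp, unit, f"packet {number}")
```

With the two SYN/ACK entries from this ticket, the timeline lists unit-3-1 packet 1 before unit-1-1 packet 2, even though the packet numbers alone would suggest otherwise.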
Ticket #7 - Packet Captures Trace Option
FTD-Cluster-RRB-1# cluster exec show cap OUT packet-number 2 trace
unit-1-1(LOCAL):******************************************************
2: 09:28:12.118341 802.1Q vlan#202 P0 18.239.18.70.443 > 136.228.226.2.31349: S 301658077:301658077(0) ack 441626017 win
28960 <mss 1460,sackOK,timestamp 1125686319 1115330849,nop,wscale 7>
Phase: 1
Type: CAPTURE
Subtype:
Result: ALLOW
Config:
Additional Information:
MAC Access list
FTD-Cluster-RRB-1# cluster exec unit unit-3-1 show cap OUT packet-number 1 trace
1: 09:28:12.111429 802.1Q vlan#202 P0 18.239.18.70.443 > 136.228.226.2.31349: S 301658077:301658077(0) ack 441626017 win
28960 <mss 1460,sackOK,timestamp 1125686319 1115330849,nop,wscale 7>
Phase: 1
Type: CAPTURE
Subtype:
Result: ALLOW
Config:
Additional Information:
MAC Access list
Ticket #7 - CCL/ASP Packet Captures
Configure CCL captures on all units.
FTD-Cluster-RRB-1# cluster exec capture CCLCAP interface cluster headers-only
unit-1-1(LOCAL):******************************************************
unit-2-1:*************************************************************
unit-3-1:*************************************************************
Champ Tip 22 – Data interface captures show all packets by default (those that reach the
interface from the network plus packets reinjected from the CCL).
Use the reinject-hide option to hide reinjected packets (useful to verify asymmetry).
The headers-only option is useful when the packet payload is of no interest.
In addition, asp-drop captures are useful to check whether a given flow has software drops.
Configure ASP drop captures on all units.
FTD-Cluster-RRB-1# cluster exec cap ASPDROP type asp-drop all buffer 33554432
unit-1-1(LOCAL):******************************************************
unit-2-1:*************************************************************
unit-3-1:*************************************************************
Ticket #7 - ASP Packet Captures
• ASP drop captures can be used to check the main reasons behind flow or packet drops.
• Troubleshooting approach goes as follows:
1. Clear ASP drop counters.
2. Run show asp drop a few times to identify high counters.
3. Configure drop-specific captures.
FTD-Cluster-RRB-1# show asp drop
Frame drop:
  Flow is being freed (flow-being-freed) 21
  Unexpected packet (unexpected-packet) 13
  No route to host (no-route) 1045842
  Reverse-path verify failed (rpf-violated) 625454
  Flow is denied by configured rule (acl-drop) 1491856
  First TCP packet not SYN (tcp-not-syn) 15005
  TCP failed 3 way handshake (tcp-3whs-failed) 112
  FP L2 rule drop (l2_acl) 974637
  Interface is down (interface-down) 8
  Dispatch queue tail drops (dispatch-queue-limit) 231
FTD-Cluster-RRB-1# cap ASP type asp-drop no-route
FTD-Cluster-RRB-1# show cap ASP
2 packets captured
1: 14:41:05.029325 172.18.100.100.33448 > 172.19.220.100.53: udp 39 Drop-reason: (no-route) No route to host, Drop-location: frame 0x000055d135ca7895 flow (NA)/NA
2: 14:41:05.029386 172.18.100.100.33448 > 172.19.220.100.53: udp 39 Drop-reason: (no-route) No route to host, Drop-location: frame 0x000055d135ca7895 flow (NA)/NA
2 packets shown
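Step 2 of the approach above (running the command a few times to spot counters that keep climbing) can be sketched in Python by comparing two saved `show asp drop` snapshots. The helper names are hypothetical. The first snapshot reuses counters from this slide, while the second snapshot's values are invented purely for illustration.

```python
import re

# Matches a counter line such as "No route to host (no-route) 1045842".
COUNTER_RE = re.compile(r"\(([\w-]+)\)\s+(\d+)\s*$")


def parse_asp_drop(output: str) -> dict:
    """Map each drop-reason keyword to its counter value."""
    counters = {}
    for line in output.splitlines():
        match = COUNTER_RE.search(line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


def fastest_growing(before: dict, after: dict, top: int = 3) -> list:
    """Return (reason, delta) pairs for counters that increased, largest first."""
    deltas = {name: after[name] - before.get(name, 0) for name in after}
    growing = [(name, delta) for name, delta in deltas.items() if delta > 0]
    return sorted(growing, key=lambda item: item[1], reverse=True)[:top]


first = parse_asp_drop("""\
No route to host (no-route) 1045842
Reverse-path verify failed (rpf-violated) 625454
Flow is denied by configured rule (acl-drop) 1491856
""")

# Hypothetical second snapshot, taken some time later.
second = parse_asp_drop("""\
No route to host (no-route) 1046900
Reverse-path verify failed (rpf-violated) 625454
Flow is denied by configured rule (acl-drop) 1491870
""")

print(fastest_growing(first, second))
```

The drop reason at the top of the result is the natural candidate for a drop-specific capture in step 3.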
Ticket #7 - Additional Commands Dataplane
FTD-Cluster-RRB-1# show logging
%FTD-6-747004: Clustering: State machine changed from state SLAVE_CONFIG to SLAVE_FILESYS
%FTD-6-747004: Clustering: State machine changed from state SLAVE_FILESYS to SLAVE_BULK_SYNC
%FTD-7-747005: Clustering: State machine notify event CLUSTER_EVENT_MEMBER_IFC_STATE
%FTD-7-747005: Clustering: State machine notify event CLUSTER_EVENT_MEMBER_IFC_STATE
Ticket #7 – Summary
• For data plane issues, first identify the details of the affected
traffic.
• Captures with trace, together with syslog, are extremely useful to
understand traffic flow and detect missing packets.
• CCL/ASP packet captures, along with checking connections and
xlates, come in handy during troubleshooting.
Ticket Reports Complete
RADKit
The Churn in Issue Lifecycle
Remote Automation Development Kit (RADKit)
• RADKit is a Software Development Kit (SDK): a set of ready-to-use
tools and Python modules that allow efficient and scalable
interaction with local or remote equipment, with the aim of eliminating
50% of the total time spent in the problem-solving lifecycle.
Champ Cheat Sheet Reference
LINA
show cluster info – Shows cluster information and roles.
show cluster history – Shows unit event history.
show cluster conn count – Shows overall and per-unit connection count.
show cluster xlate count – Shows overall and per-unit xlate count.
show cluster traffic – Shows overall and per-unit traffic statistics.
show cluster info trace – Shows additional debug-level clustering details.
show cluster resource usage – Shows overall and per-unit resource utilization.
show cluster cpu – Shows overall and per-unit CPU utilization.
show cluster memory – Shows overall and per-unit memory utilization.
show cluster info load-monitor – Shows general information about conns, buffer drops, memory and CPU.
show cluster info health – Shows general information about unit health (interfaces, disk, snort).
cluster exec capture <name> – Configures packet captures on all units.
cluster exec show cap <name> – Shows packet capture contents from all units.
show cluster info conn-distribution – Shows information about connection distribution in cluster.
show cluster info packet-distribution – Shows information about packet distribution in cluster.
show nat pool cluster summary – Shows PAT pool distribution.
show conn detail – Shows details about connections.
show xlate detail – Shows details about translations.
show asp drop – Shows software drop counters.
Champ Cheat Sheet Reference
FXOS commands:
scope eth-uplink; scope fabric a; show port-channel
connect fxos
show port-channel summary
show lacp internal event-history interface ethernet <int>
show port-channel database
FMC
Devices > Device Management > Cluster > General – To check cluster information and history from FMC.
Health > Monitor – Check cluster health and graphs.
Documentation
Configuration Guides:
https://www.cisco.com/c/en/us/td/docs/security/secure-firewall/management-center/device-config/720/management-center-device-config-72.html
Clustering Troubleshooting Document:
https://www.cisco.com/c/en/us/support/docs/security/firepower-ngfw/216745-troubleshoot-firepower-threat-defense-f.html
RADKit:
https://radkit.cisco.com/
Compatibility Guide:
https://www.cisco.com/c/en/us/td/docs/security/secure-firewall/compatibility/threat-defense-compatibility.html
FTD Syslog Messages:
https://www.cisco.com/c/en/us/td/docs/security/firepower/Syslogs/b_fptd_syslog_guide/about.html
Conclusion
Key Session Learnings
• A structured approach, the Allies, the tools and the Champ Tips make
clustering troubleshooting faster and more effective.
• Monitor as much as possible with health monitoring.
• Make sure MTU is set properly.
• Must be the same on the FTD and switch sides.
• The CCL MTU must be at least 100 bytes higher than the data interface MTU.
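The CCL MTU guideline is simple enough to check mechanically. Below is a minimal Python sketch, assuming a hypothetical `ccl_mtu_ok` helper and illustrative MTU values, that compares the CCL MTU against the largest data interface MTU plus the 100-byte margin stated above.

```python
CCL_MTU_MARGIN = 100  # bytes, taken from the guideline in this session


def ccl_mtu_ok(ccl_mtu: int, data_mtus: list) -> bool:
    """True when the CCL MTU exceeds the largest data MTU by the margin."""
    return ccl_mtu >= max(data_mtus) + CCL_MTU_MARGIN


# Illustrative values only; read the real MTUs from the device configuration.
print(ccl_mtu_ok(1600, [1500, 1500]))
print(ccl_mtu_ok(1500, [1500]))
```

The first call passes because 1600 is at least 1500 + 100. The second fails because the CCL and data interfaces share the same MTU.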
CTF booth at World of Solutions
• Recommended Sessions
Continue Troubleshooting
Thank you