Enterprise SONiC Distribution Lifecycle Management
Abstract
This document describes the outlook of current and future management infrastructure in
SONiC, and the benefits and options offered by the new SONiC Management Framework
pioneered by Dell Technologies. Zero Touch Provisioning (ZTP), monitoring and telemetry, and
flow analysis are also discussed.
June 2020
Notes, cautions, and warnings
NOTE: A NOTE indicates important information that helps you make better use of your product.
CAUTION: A CAUTION indicates either potential damage to hardware or loss of data and tells you how to avoid the
problem.
WARNING: A WARNING indicates a potential for property damage, personal injury, or death.
© 2020 Dell Inc. or its subsidiaries. All rights reserved. Dell, EMC, and other trademarks are trademarks of Dell Inc. or its subsidiaries. Other
trademarks may be trademarks of their respective owners.
Contents
1 Introduction to SONiC
  Dell Technologies vision
  SONiC
  Containerized architecture
  Enterprise SONiC Distribution by Dell Technologies
    Enterprise SONiC Distribution by Dell Technologies 3.0
    Enterprise SONiC Distribution by Dell Technologies offerings
  Typographical conventions
  Support and feedback
    Technical support
    Feedback for this document
4 Zero Touch Provisioning
  Introduction
  Advantages of using ZTP
  ZTP process
  User-defined ZTP JSON file use case example
  Example of configuring ZTP using CLI
    Enable ZTP
    Verify ZTP
    Disable ZTP
  Summary
6 Flow Analysis
  Introduction
  sFlow
    Advantages of sFlow
    sFlow operations
    sFlow use case and example
  Introduction to packet-level network telemetry
  Benefits of packet-level network telemetry
  Everflow
    Everflow components and operation
    Everflow use case and example
  Summary
7 References
  SONiC documentation
  Ansible documentation
  Dell Technologies Networking Infrastructure Solutions documentation
1
Introduction to SONiC
Topics:
• Dell Technologies vision
• SONiC
• Containerized architecture
• Enterprise SONiC Distribution by Dell Technologies
• Typographical conventions
• Support and feedback
SONiC
Software for Open Networking in the Cloud (SONiC) is an open-source Network Operating System (NOS) based on Linux.
Switches from various vendors support SONiC, and an active open-source community contributes to its development.
SONiC has been adopted by companies that run large enterprise data centers and by cloud providers.
The benefits that SONiC offers are:
• Hardware independence
• Containerized architecture
• Scale with ease
• Open source
• High performance
• Agility with a flexible management framework
Containerized architecture
Containerized architecture provides maximum control, flexibility, and choice, as shown in the following figure. Various modules in SONiC
architecture use the Redis database to work with each other. The modules are placed in docker containers which include:
• DHCP-relay
• Pmon
• Snmp
• Lldp
• Bgp
• Teamd
• Database
• Swss
• Syncd
Docker containers, some of which are integrated within Linux, work to carry out significant functionalities. An example is how the SONiC
shell provides the CLI.
Tools and various management platforms integrate seamlessly into the SONiC switches. The Switch Abstraction Interface is an integral
part of the SONiC NOS. The SONiC NOS can run on supported Dell EMC PowerSwitch hardware or switch hardware from supported
vendors. This provides flexibility for the user and prevents vendor lock-in.
SONiC is a competitive choice for large enterprises, private cloud, and cloud environments and is suitable for scalable solutions. A use case
example includes a typical leaf-spine (also known as Clos) architecture as well as a leaf-spine architecture with super spines. The first of
the following figures shows a typical Leaf-spine or Clos architecture. The second of the following figures shows a leaf-spine topology with
super spines.
Figure 3. Leaf-spine topology with super spines
Enterprise SONiC Distribution by Dell Technologies offerings
Customers can deploy the Enterprise SONiC Distribution by Dell Technologies 3.0 offering that best suits their requirements.
The offerings include:
• Cloud standard
• Enterprise standard
• Cloud premium
• Enterprise premium
Typographical conventions
The CLI and GUI examples in this document use the following conventions:
Technical support
For technical support, visit http://www.dell.com/support or call (USA) 1-800-945-3355.
2
Outlook of the Current Management Infrastructure
Topics:
• Current management models
• Config_db.json
• Minigraph.xml
• Linux shell
• Python-based SONiC CLI
• Free Range Routing
• Simple Network Management Protocol
• Summary of management models
• Challenges with current management models
• Summary
Config_db.json
The primary databases hosted in Redis include APPL_DB, CONFIG_DB, STATE_DB, ASIC_DB, and COUNTERS_DB.
Stock SONiC keeps the configuration in ConfigDB. ConfigDB uses a table-object schema, and config_db.json is a serialization of that database.
The following example shows a Device Metadata table. The Device Metadata table has a single object named localhost, which holds fields
such as hwsku, platform, and so on.
"DEVICE_METADATA": {
"localhost": {
"hostname": "sonic-leaf1",
"hwsku": "DellEMC-S5232f-C32",
"mac": "3c:2c:30:49:20:00",
"platform": "x86_64-dellemc_s5232f_c3538-r0",
"type": "LeafRouter"
}
}
"VLAN": {
"Vlan1003": {
"members": [
"Ethernet48",
"Ethernet0"
],
"vlanid": "1003"
}
},
"VLAN_INTERFACE": {
"Vlan1003|172.10.3.253/24": {}
},
"VLAN_MEMBER": {
"Vlan1003|Ethernet0": {
"tagging_mode": "tagged"
},
"Vlan1003|Ethernet48": {
"tagging_mode": "untagged"
}
}
You can view the added configuration in the config_db.json file from the Linux shell. See the following figure.
NOTE: For the configurations added to the config_db.json file to take effect in the Redis database, use the config load or
config reload command to load the JSON file into the database. Alternatively, reboot the device.
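As a rough illustration of the table-object schema, the following Python sketch parses a small fragment that mirrors the DEVICE_METADATA and VLAN_MEMBER tables shown above and groups the VLAN members by VLAN. The helper name vlan_members is hypothetical, and the "table, then object key, then fields" layout is inferred from the example rather than taken from a schema definition.

```python
import json

# Fragment mirroring the DEVICE_METADATA and VLAN_MEMBER tables shown above.
CONFIG_DB_TEXT = """
{
    "DEVICE_METADATA": {
        "localhost": {"hostname": "sonic-leaf1", "type": "LeafRouter"}
    },
    "VLAN_MEMBER": {
        "Vlan1003|Ethernet0": {"tagging_mode": "tagged"},
        "Vlan1003|Ethernet48": {"tagging_mode": "untagged"}
    }
}
"""


def vlan_members(config_db):
    """Group VLAN_MEMBER objects by VLAN name (hypothetical helper)."""
    members = {}
    for key, fields in config_db.get("VLAN_MEMBER", {}).items():
        # Object keys join the VLAN name and the port with a '|' separator.
        vlan, port = key.split("|", 1)
        members.setdefault(vlan, {})[port] = fields["tagging_mode"]
    return members


config_db = json.loads(CONFIG_DB_TEXT)
hostname = config_db["DEVICE_METADATA"]["localhost"]["hostname"]
members = vlan_members(config_db)
print(hostname, members)
```

Reading the file this way only inspects the serialized configuration; it does not change what is loaded in the running database.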
Minigraph.xml
A legacy configuration method uses an XML file named minigraph.xml. The user can edit the minigraph.xml file
located in /etc/sonic/, or copy the file from a remote server. The configuration in minigraph.xml is loaded using the
config load_minigraph command. Optionally, the user can reboot the device to make the configuration effective. For example, the
device information can be defined in the minigraph.xml file.
The hostname and HwSku information are defined in the root element <DeviceMiniGraph>.
<DeviceMiniGraph>
    ...
    <Hostname>sonic-leaf1</Hostname>
    <HwSku>DellEMC-S5232f-C32</HwSku>
</DeviceMiniGraph>
NOTE: The preceding code is from the Device information section of https://github.com/Azure/SONiC/wiki/Configuration-with-Minigraph-(~Sep-2017).
admin@Leaf1:~$ help
GNU bash, version 4.4.12(1)-release (x86_64-pc-linux-gnu)
These shell commands are defined internally. Type `help' to see this list.
Type `help name' to find out more about the function `name'.
Use `info bash' to find out more about the shell in general.
Use `man -k' or `info' to find out more about commands not in this list.
A star (*) next to a name means that the command is disabled.
job_spec [&] history [-c] [-d offset] [n] or hist>
(( expression )) if COMMANDS; then COMMANDS; [ elif C>
. filename [arguments] jobs [-lnprs] [jobspec ...] or jobs >
: kill [-s sigspec | -n signum | -sigs>
[ arg... ] let arg [arg ...]
[[ expression ]] local [option] name[=value] ...
alias [-p] [name[=value] ... ] logout [n]
bg [job_spec ...] mapfile [-d delim] [-n count] [-O or>
bind [-lpsvPSVX] [-m keymap] [-f file> popd [-n] [+N | -N]
break [n] printf [-v var] format [arguments]
builtin [shell-builtin [arg ...]] pushd [-n] [+N | -N | dir]
caller [expr] pwd [-LP]
***** OUTPUT TRUNCATED *****
For example, the following command lists the contents of the /etc/sonic/ directory:
admin@Leaf1:~$ ls /etc/sonic/
agent_config.cfg frr snmp.yml
asic_config_checksum generated_services.conf sonic_branding.yml
config_db.json hamd sonic_version.yml
constants.yml init_cfg.json updategraph.conf
Another example to view the contents of the config_db.json file is shown below:
NOTE: The Python-based SONiC CLI is also referred to as the Click CLI, or SONiC CLI.
The following are some of the configurations supported using the Python-based SONiC CLI:
Options:
--help Show this message and exit.
Commands:
aaa AAA command line
acl ACL-related configuration tasks
bgp BGP-related configuration tasks
classifier Classifiers related configuration tasks
copp Configure COPP
core Configure coredump
custom_assert Configuration action on assert
ecn ECN-related configuration tasks
export
flow Flow related configuration tasks
hardware Configure hardware parameters
hostname
igmp_snooping igmp-snooping configuration tasks
interface Interface-related configuration tasks
***** OUTPUT TRUNCATED *****
For example, both IPv4 and IPv6 addresses can be assigned to interfaces. To assign or remove an IP address for an interface, use the following
commands:
The following example shows the configuration of VLAN 1611 on a switch using the Python-based SONiC CLI.
To configure an interface in a VLAN, assign the VLAN ID, then assign an interface to the VLAN ID. Optionally, an IP address can be
assigned to the VLAN.
Enter the following command to create VLAN 1611:
FRR provides the configuration and management of Layer 3 protocols. FRR can be used to configure routing protocols such as BGP and
RIP. FRR runs as a suite of cooperating daemons, with each protocol implemented as a separate service. Some of the
advantages of using FRR include:
• High resiliency
• Independent daemons for protocols
• Flexibility
• Open source
Some of the protocols that FRR supports include:
• EIGRP
• BABEL
• RIP
• OSPF
• IS-IS
• PIM
• VRRP
• BGP
FRR also supports various use cases that require routing and routing protocols.
NOTE: For detailed information about FRR and supported platforms, see http://docs.frrouting.org/en/latest/overview.html.
By default, the FRR stack is installed in the SONiC NOS and is available when the switch boots. The FRR CLI is accessible from the Linux
shell.
SONiC routing protocols are configured in the FRR shell. To enter the FRR shell from SONiC, enter the vtysh command.
The following is a sample eBGP configuration that uses the FRR shell on Leaf1, as shown in the previous figure.
Summary
This chapter covered the legacy management models for SONiC NOS, with configuration methods and examples for the
Linux shell, Python-based SONiC CLI, FRRouting shell, config_db.json file, and SNMP.
Using these disparate management models brings advantages and disadvantages, along with operational complexity. To avoid these
hardships, the SONiC Management Framework model pioneered by Dell Technologies provides a single, consistent way to configure and
manage features from one place.
Management framework
To overcome the challenges discussed in the previous chapter, Dell Technologies has taken the lead to introduce a new Management
Framework for SONiC. The Management Framework includes:
• CLI
• REST
• gNMI
• gNOI
The Management Framework CLI is a SONiC application responsible for managing configuration and status on SONiC switches. This
application provides a coherent way to validate, configure, and manage the features running on the SONiC NOS. This intuitive and holistic
CLI provides a familiar way of configuring various functions and features available in the switch. There are numerous benefits in using the
CLI as it allows for centralized operation and management.
SONiC may also be configured and administered using a REST API and a gRPC network management interface (gNMI) using YANG data
models. The Management Framework supports both standard and custom YANG models for communication with the corresponding
management servers. The Management Framework runs in a single container named sonic-mgmt-framework. The following
command lists the docker containers and displays only the Management Framework docker container:
NOTE: Legacy methods of configuration, such as the FRR shell and config_db.json, are supported.
sonic# show ?
aaa Show aaa info
bfd Bidirection Forwarding Detection
bgp show bgp community, ext community, as list information
image Show image information
interface Show Interface info
ip show IP commands
ipv6 show IP commands
kdump Show kdump status
link Show link state tracking information
lldp Show lldp information
mac Show MAC
mclag Show mclag domain
mirror-session Show Mirror session information
nat Show NAT info
NeighbourSuppressStatus Show arp and nd suppression status
platform Show platform information
PortChannel LAG status and configuration
***** OUTPUT TRUNCATED ******
A user can take advantage of the CLI to access the information and data that is associated with the SONiC NOS and hardware. The
following examples depict how the show command gathers system details.
The show system command lists the attributes such as the hostname, boot time, current date and time, and domain name.
The show system memory command provides details on the total available memory and the used memory.
The CLI supports many configuration commands. Access the configuration mode by using the command configure terminal.
sonic(config)# ?
aaa AAA configuration
bfd Configure BFD peers
bgp bgp command list
end Exit to the exec Mode
evpn EVPN Global Configuration
The following example shows the initial configuration steps for assigning an IP address to a loopback interface, and point-to-point
interfaces in switch Leaf 1 within the leaf-spine topology shown in the previous figure.
interface Loopback 0
ip address 10.0.2.1/32
exit
interface Ethernet 0
description To-Spine1
ip address 192.168.1.1/31
no shutdown
exit
interface Ethernet 12
description To-Dynamic-neighbor
ip address 50.50.50.2/24
no shutdown
exit
REST
SONiC Management Framework provides various common North Bound Interfaces (NBIs) for managing configuration and retrieving
status on SONiC switches. SONiC Management Framework includes support for NBIs such as CLI, gNMI, and REST. A typical use case is
a REST client that performs POST, PUT, PATCH, DELETE, and GET operations on the supported data models and URL paths.
The management REST server uses the HTTP/HTTPS protocol.
NOTE: For more information, see https://github.com/project-arlo/SONiC/blob/master/doc/mgmt/Management%20Framework.md#3122-REST.
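The REST use case can be sketched in Python as follows. The sketch only builds a PATCH request for an interface MTU and does not send it. The switch address 192.0.2.10 is a placeholder, and the /restconf/data/ URL prefix, the payload key, and the helper name build_mtu_patch are assumptions made for illustration; check the REST API listing on the switch for the paths it actually exposes.

```python
import json
import urllib.request


def build_mtu_patch(switch, interface, mtu):
    """Build (but do not send) a PATCH request for an interface MTU."""
    # Assumed RESTCONF-style URL layout; verify it against the switch's REST API listing.
    url = (f"https://{switch}/restconf/data/"
           f"openconfig-interfaces:interfaces/interface={interface}/config/mtu")
    body = json.dumps({"openconfig-interfaces:mtu": mtu}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={"Content-Type": "application/yang-data+json"},
    )


# 192.0.2.10 is a documentation-range placeholder, not a real switch.
request = build_mtu_patch("192.0.2.10", "Ethernet0", 1514)
print(request.get_method(), request.full_url)
```

Sending the request would additionally need credentials and TLS settings appropriate to the deployment, which are left out here.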
Select the REST API for openconfig-platform to open its page. The page lists all the REST API
paths and the actions available for each, such as GET, POST, PATCH, DELETE, and PUT. When an action is executed, the tool fetches the
requested data through the REST API, in this case the "components" data, as shown in the following figure.
sudo -i
docker exec -it telemetry bash
gnmi_set -username admin -password Dell\!234 \
  -update /openconfig-interfaces:interfaces/interface[name=Ethernet0]/config/mtu:@mtu.json \
  -target_addr 127.0.0.1:8080 -insecure true -pretty
{
"mtu":1514
}
OpenConfig option
OpenConfig aims to develop networks that are dynamic and programmable using software-defined networking principles, such as model-
driven management and operations. For more information, see http://openconfig.net/.
gNMI option
gRPC Network Management Interface (gNMI) uses the gRPC framework to enable telemetry and configuration management of devices.
gRPC is an open-source Remote Procedure Call framework, originally developed by Google, that connects services efficiently. The gNMI
service describes an interface through which the network management system interacts with a network element. gNMI streams data from
the network device and provides the functionality to configure and retrieve operational and configuration states, which makes it a
powerful method for switch management.
gNOI option
gRPC Network Operations Interface (gNOI) is a set of gRPC-based microservices that run commands on a network device.
The messages are serialized using protocol buffers, a format for serializing structured data.
Summary
This chapter covered the advantages and use cases of the SONiC Management Framework model, accessing and configuring features in the
CLI, infrastructure management integration with SONiC NOS, and the DevOps, automation, and telemetry options available.
The SONiC Management Framework makes it easier to configure and manage the Enterprise SONiC Distribution by Dell Technologies NOS,
overcoming the challenges that arose with the use of disparate traditional management models such as the FRR shell, config_db.json,
and so on.
Introduction
The Zero-Touch Provisioning (ZTP) service upgrades firmware and configures many switches simultaneously. ZTP can also perform
connectivity checks after the upgrade. As the data center scales and more network equipment is added, the switches must be upgraded
with the required firmware and then configured. Updating the firmware and configuring each switch manually is inefficient and
time-consuming.
To carry out ZTP, the booting switches require a connection to the remote provisioning server so that they can download the necessary
firmware, configuration files, and scripts. The SONiC ZTP service processes the provisioning information in the ZTP JSON file.
Bringing up the switch automatically according to user requirements, without user intervention, provides a significant advantage.
ZTP process
When a switch boots, the ZTP service starts and checks for the presence of the startup configuration file. The config_db.json
startup configuration file is at /etc/sonic/. DHCP discovery is used to get the information about where the ZTP JSON file is located.
The JSON file has the configuration needed for the switch. The administrator can define the configurations that are required for a
particular switch.
NOTE: When ZTP is disabled by the user, the switch boots the default configuration available on the switch.
The DHCP discovery process is done on in-band interfaces and also the management interface. DHCPv4 and DHCPv6 address discovery
are supported. The first interface that receives a valid DHCP offer determines the URL specifying the location of the ZTP JSON file. At
the DHCP server end, the DHCP offer must include the DHCPv4 Option 67 or the DHCPv6 Option 59 to specify the ZTP JSON file URL.
See the following figure.
The ZTP service performs the configuration steps, which may cause a switch to reboot multiple times. After the downloaded ZTP JSON
file is processed, the SONiC ZTP service exits, as shown in the following figures.
An example ZTP JSON file to update the firmware and configuration for Leaf3 in the figure is shown below:
{
    "ztp": {
        "01-firmware": {
            "install": {
                "url": "http://192.168.1.1/SONiC-OS-3.0.1-Enterprise_Base.bin"
            }
        },
        "02-configdb-json": {
            "dynamic-url": {
                "source": {
                    "prefix": "http://192.168.1.1/",
                    "identifier": "Leaf3",
                    "suffix": "_config_db.json"
                }
            }
        },
        "03-provisioning-script": {
            "plugin": {
                "url": "http://192.168.1.1/post_install.sh"
            },
            "reboot-on-success": true
        },
        "04-connectivity-check": {
            "ping-hosts": [ "172.16.11.1", "172.16.11.2" ]
        }
    }
}
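To make the dynamic-url section easier to follow, here is a minimal Python sketch that joins the prefix, identifier, and suffix into the URL the switch would fetch, and lists the sections in name order. It rests on two assumptions that are not stated in this document: that the identifier is used as a literal string, and that sections run in sorted order of their names. The helper name resolve_dynamic_url is hypothetical.

```python
import json

# Two sections taken from the example ZTP JSON file, deliberately listed out of order.
ZTP_TEXT = """
{
    "ztp": {
        "02-configdb-json": {
            "dynamic-url": {
                "source": {
                    "prefix": "http://192.168.1.1/",
                    "identifier": "Leaf3",
                    "suffix": "_config_db.json"
                }
            }
        },
        "01-firmware": {
            "install": {"url": "http://192.168.1.1/SONiC-OS-3.0.1-Enterprise_Base.bin"}
        }
    }
}
"""


def resolve_dynamic_url(source):
    """Join prefix, identifier, and suffix into one URL.

    Assumption: the identifier is used as a literal string.
    """
    return source.get("prefix", "") + source["identifier"] + source.get("suffix", "")


sections = json.loads(ZTP_TEXT)["ztp"]
# Assumption: sections run in sorted order of their names (01-, 02-, ...).
order = sorted(sections)
config_url = resolve_dynamic_url(sections["02-configdb-json"]["dynamic-url"]["source"])
print(order, config_url)
```

Under these assumptions, one ZTP JSON file can serve many switches, because only the identifier part of the URL differs per device.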
Enable ZTP
Use the ztp enable command to administratively enable ZTP.
NOTE: By default, the ZTP service is enabled. The ztp enable command reenables the ZTP after a user disables it.
Enable ZTP on the Leaf-3 switch by running the Leaf-3(config)# ztp enable command.
Verify ZTP
Use the show ztp-status command to view the current ZTP configuration of the switch and detailed information about the current
state of a ZTP session.
On the SONiC switch, the ZTP status is viewable:
----------------------------------------
01-configdb-json
----------------------------------------
Status : SUCCESS
Runtime : 02m 48s
Timestamp : 2019-09-11 19:11:55 UTC
Exit Code : 0
Ignore Result : False
----------------------------------------
02-connectivity-check
----------------------------------------
Status : SUCCESS
Runtime : 04s
Timestamp : 2019-09-11 19:12:16 UTC
Exit Code : 0
Ignore Result : False
Disable ZTP
To disable ZTP, stop and disable the ZTP service. Once the ZTP service is disabled, it does not start automatically on reboot or when
the startup configuration file is not present; you must reenable the service manually.
To accomplish this, use the no ztp enable command.
On the Leaf-3 switch, ZTP can be disabled by running the Leaf-3(config)# no ztp enable command.
Introduction
Network data is gaining further importance in scalable infrastructures, where the number of monitored devices keeps growing. Collecting
network data, such as operational state and configuration, can significantly assist network analysis, which improves network stability
and troubleshooting. Traditional methods for collecting network data include SNMP and Syslog. Network data can be collected in
two ways:
• Pulling data from a network device
• Having the networking device stream data to the management system
The efficiency of streaming telemetry rests in the streaming of incremental data updates from the network device to the management
system. The administrator can subscribe to the data they need.
Streaming telemetry offers several benefits. It removes the inefficiency of traditional network
telemetry (SNMP), which requires management systems to poll for data whether or not anything has changed.
Streaming telemetry provides insight into the network data, which can be used for network analysis, remediation of issues, and network
optimization. SONiC NOS supports streaming telemetry using gNMI.
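The difference between polling and incremental streaming can be sketched with a few lines of Python. The sample data and both function names are hypothetical; the sketch only counts how many updates each approach would transfer for the same sequence of states.

```python
def polled_updates(samples):
    """A poller transfers every sample, whether or not the value changed."""
    return list(enumerate(samples))


def on_change_updates(samples):
    """A streaming source sends a sample only when it differs from the previous one."""
    updates = []
    previous = object()  # sentinel that never equals a real sample
    for index, value in enumerate(samples):
        if value != previous:
            updates.append((index, value))
            previous = value
    return updates


# Hypothetical operational state of one interface at eight successive intervals.
oper_status = ["UP", "UP", "UP", "DOWN", "DOWN", "UP", "UP", "UP"]
polled = polled_updates(oper_status)
streamed = on_change_updates(oper_status)
print(len(polled), len(streamed))
```

For this sequence the poller transfers eight samples, while the on-change stream carries only the three state transitions.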
Telemetry terminology
This section describes the various terms for telemetry.
Term                  Definition
Sensor path           The path used to collect data for streaming telemetry.
Sensor group          A reusable group of multiple sensor paths and exclude filters.
Destination group     The IP address and transport port on a destination server to which telemetry data is streamed. You can configure multiple destinations and reuse the destination group in subscription profiles.
Subscription profile  Data collector destinations and stream attribute that is associated with sensor paths. A subscription ties sensor paths and a destination group with a transport protocol, encoding format, and streaming interval. The telemetry agent in the switch attempts to establish a session with each collector in the subscription profile, and streams data to the collector. If a collector is not reachable, the telemetry agent continuously tries to establish the connection at one-minute intervals.
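One way to picture how these terms relate is the following Python sketch. The class and field names are hypothetical and do not correspond to any SONiC configuration schema; the addresses, port, protocol, and encoding values are placeholders.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SensorGroup:
    name: str
    sensor_paths: List[str]
    exclude_filters: List[str] = field(default_factory=list)


@dataclass
class DestinationGroup:
    name: str
    destinations: List[tuple]  # (IP address, transport port) pairs


@dataclass
class SubscriptionProfile:
    name: str
    sensor_groups: List[SensorGroup]
    destination_group: DestinationGroup
    protocol: str
    encoding: str
    interval_seconds: int

    def streams(self):
        """List every (destination, sensor path) pair this profile produces."""
        return [(dest, path)
                for dest in self.destination_group.destinations
                for group in self.sensor_groups
                for path in group.sensor_paths]


# Placeholder values for illustration only.
sensors = SensorGroup("interfaces", ["/openconfig-interfaces:interfaces"])
collectors = DestinationGroup("lab-collectors",
                              [("192.0.2.50", 50051), ("192.0.2.51", 50051)])
profile = SubscriptionProfile("if-state", [sensors], collectors,
                              protocol="grpc", encoding="json", interval_seconds=30)
streams = profile.streams()
print(len(streams))
```

Keeping sensor groups and destination groups as separate objects reflects the reuse described in the table: the same group can appear in more than one subscription profile.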
gNMI modes
The two gNMI modes differ in which side initiates the telemetry session:
• Dial-in
• Dial-out
Dial-in
The collector initiates and establishes a session with the switch. Once the session is established, telemetry data is sent from the switch.
The existing SONiC telemetry framework has been extended to support the new gNMI services. All four gNMI services are supported:
• Get
• Set
• Capabilities
• Subscribe
A use case example includes using gNMI for streaming telemetry. For more information about the supported gNMI services and examples,
see https://github.com/Azure/SONiC/blob/master/doc/mgmt/Management%20Framework.md.
Congestion monitoring
Congestion monitoring provides real-time visibility into the network. It offers scalable tracking of buffer occupancies and the ability to
monitor peak or current occupancy. Granular counters can be accessed at the port level, port group level, or service pool level. It
also allows monitoring of various types of drop counters.
Mirror on drop
Mirror on drop captures drops in both the ingress and egress pipelines. It captures the first dropped packet and reports the
drop by generating an event. The generated event contains the drop reason and is sent to the collector for a mirror on drop flow.
The details include the first dropped packet with its drop reason, and any change in drop reason.
Inband flow analysis, congestion monitoring, and mirror on drop provide several advantages and solutions in network management. The
information obtained from granular, real-time insights from advanced merchant silicon-based telemetry will help with rapid provisioning,
troubleshooting, and resolving network issues.
Summary
This chapter covered the methods of obtaining network data using gNMI, including gNMI modes, supported
RPC operations, use cases, and examples. Advanced telemetry provides visibility into scalable networks.
Using gNMI to stream data from the network device, and to configure and retrieve
operational and configuration states, provides a powerful method for switch management.
Introduction
Flow-based services give the switch better control over traffic by providing a generic framework for the Match
and Set features. Incoming packets are classified according to match rules using fields from L2-L4 headers, and defined actions are taken
accordingly. Examples include QoS remarking and policing, monitoring using SPAN or sFlow, and forwarding actions such as PBR and L2 redirect.
Flow-based monitoring conserves bandwidth by only inspecting specified traffic instead of inspecting all interface traffic. Flow-based
tracking allows you to monitor only the traffic that is received by the source port, and that matches criteria in the ingress access-lists
(ACLs). IPv4 ACLs, IPv6 ACLs, and MAC ACLs support flow-based monitoring.
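The match-and-set idea can be sketched in Python as follows. The rule names, header field names, and action labels are hypothetical, and first-match-wins ordering is an assumption made for this illustration rather than a statement about how the switch evaluates its rules.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Rule:
    name: str
    match: Dict[str, object]  # header field -> required value
    action: str               # label for the action to apply


def classify(packet, rules):
    """Return the action of the first rule whose match fields all equal the packet's."""
    for rule in rules:
        if all(packet.get(key) == value for key, value in rule.match.items()):
            return rule.action
    return None  # no rule matched, so no flow-based action applies


# Hypothetical rules keyed on L3/L4 header fields.
rules = [
    Rule("voice", {"ip_proto": 17, "dst_port": 5060}, "remark-dscp-46"),
    Rule("web", {"ip_proto": 6, "dst_port": 443}, "mirror-to-span"),
]

web_action = classify({"ip_proto": 6, "dst_port": 443, "src_ip": "10.0.0.1"}, rules)
icmp_action = classify({"ip_proto": 1}, rules)
print(web_action, icmp_action)
```

Only packets that match a rule receive an action, which is how flow-based monitoring avoids inspecting all interface traffic.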
sFlow
sFlow is a standards-based sampling technology that is embedded within switches and routers to monitor network traffic. sFlow provides
traffic monitoring for high-speed networks with many switches and routers. There are two types of sampling:
• A packet-based sampling of packet flows
• A time-based sampling of interface counters
sFlow monitoring consists of an sFlow agent that is embedded in the device and an sFlow collector. The sFlow agent resides anywhere
within the path of the packet. The agent combines the flow samples and interface counters into sFlow datagrams and forwards them to
the sFlow collector at regular intervals. The datagrams include information about the packet header, ingress and egress interfaces,
sampling parameters, and interface counters. Application-specific integrated circuits (ASICs) handle packet sampling. The sFlow
collector analyzes the datagrams that are received from the different devices and produces a network-wide view of the traffic flows.
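The following sketch illustrates the statistics behind packet-based sampling. It assumes a simple model in which each packet is sampled independently with probability 1/N, and the collector scales the sample count by N to estimate total traffic. On the switch the ASIC performs the sampling; the `PacketSampler` class and its attribute names are hypothetical.

```python
import random


class PacketSampler:
    """Samples on average one packet in every `sampling_rate` packets."""

    def __init__(self, sampling_rate: int, seed=None):
        if sampling_rate < 1:
            raise ValueError("sampling_rate must be at least 1")
        self.sampling_rate = sampling_rate
        self._rng = random.Random(seed)
        self.packets_seen = 0     # packets that could have been sampled
        self.samples_taken = 0    # packets that were actually sampled

    def observe(self) -> bool:
        """Record one packet and return True if it was selected as a sample."""
        self.packets_seen += 1
        sampled = self._rng.randrange(self.sampling_rate) == 0
        if sampled:
            self.samples_taken += 1
        return sampled

    def estimated_packets(self) -> int:
        """Collector-side estimate: each sample stands for `sampling_rate` packets."""
        return self.samples_taken * self.sampling_rate


sampler = PacketSampler(sampling_rate=100, seed=1)
for _ in range(100_000):
    sampler.observe()
print(sampler.packets_seen, sampler.samples_taken, sampler.estimated_packets())
```

A larger sampling rate reduces the load on the agent and collector but widens the error of the estimate, which is the trade-off behind tuning the sampling rate per interface.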
Advantages of sFlow
Network monitoring on PowerSwitches running the Enterprise SONiC Distribution by Dell Technologies NOS is possible using sFlow. The
benefits of using sFlow include:
• Standards-based
• Allows scaling
• Monitoring accuracy
sFlow operations
The following operations can be performed using sFlow:
• Enable sFlow
• Disable sFlow
• Configure polling interval
• Disable polling interval
• Add a collector
• Add a collector with a port number
• Delete a collector
• Add agent-id information
• Disable sFlow agent
• Enable sFlow on the interface
• Disable sFlow on the interface
• Configure sampling rate on the interface
• Disable sampling rate on the interface
sFlow allows sampled packets to be analyzed using the sFlow collector. The polling interval and the sampling rate can also be tuned.
All these settings are configured using the CLI to enable and monitor the sFlow packet samples. Configure sFlow globally or on
individual interfaces on a SONiC switch.
In this example, the collector IP address is 100.67.204.144/24. The broad steps include enabling sFlow for the interface, specifying the
collector IP address (and, optionally, the port number), defining the sFlow agent-id, and optionally modifying the sampling rate and
polling interval from their default values.
The polling interval is the interval, in seconds, at which traffic samples or counters are collected. The range is from 5 to 300, and the
default is 20. Enter 0 to disable sFlow traffic polling.
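The polling interval rules stated above (range 5 to 300, default 20, and 0 to disable polling) can be captured in a small validation helper, for example in an automation script that prepares values before they are sent to the switch. This is a sketch only; the function and constant names are hypothetical and are not part of the SONiC CLI.

```python
DEFAULT_POLLING_INTERVAL = 20   # seconds, default stated in the text
MIN_POLLING_INTERVAL = 5
MAX_POLLING_INTERVAL = 300
POLLING_DISABLED = 0            # entering 0 disables sFlow traffic polling


def normalize_polling_interval(value=None) -> int:
    """Return a valid polling interval, or raise if the value is out of range.

    None selects the default, 0 means polling is disabled.
    """
    if value is None:
        return DEFAULT_POLLING_INTERVAL
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("polling interval must be an integer number of seconds")
    if value == POLLING_DISABLED:
        return POLLING_DISABLED
    if MIN_POLLING_INTERVAL <= value <= MAX_POLLING_INTERVAL:
        return value
    raise ValueError(
        f"polling interval must be 0 or between {MIN_POLLING_INTERVAL} "
        f"and {MAX_POLLING_INTERVAL}, got {value}"
    )
```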
exit
interface Ethernet 4
sflow enable
exit
Verify sFlow
Validate and verify the configurations for sFlow using the following validation examples:
Disable sFlow
To disable the sFlow configuration later, run the following commands in the CLI:
configure terminal
interface Ethernet 0
no sflow enable
exit
interface Ethernet 4
no sflow enable
exit
Benefits of packet-level network telemetry
Packet-level network telemetry helps diagnose various faults that conventional tools cannot. Faults that make debugging challenging are
described in the following table.
Silent blackhole     Packet-level network tracing allows you to detect and localize a silent blackhole. A silent blackhole
                     can occur when entries in the TCAM table are corrupted, which cannot be detected by examining the
                     forwarding table entries.
End-to-end latency   Hop-by-hop latencies between two endpoints are traced using packet-level network telemetry.
Load imbalance       Flows that are forwarded unevenly across a group of ECMP links can be detected with packet-level
                     network telemetry by counting the number of flows that match a specific 5-tuple pattern on each link.
                     This offers a more reliable and direct method of detection and debugging.
Protocol bugs        Bugs in the implementation of network protocols such as BGP, PFC, and RDMA can cause
                     performance and reliability issues in the network. Troubleshooting these protocols is difficult because
                     the protocols are implemented by a third party.
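The load imbalance check described in the table can be sketched as follows. The sketch assumes that mirrored packets have already been reduced to records of the form (5-tuple, ECMP link). The record format, the function names, and the max-to-mean ratio used as the imbalance measure are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (src_ip, dst_ip, ip_protocol, src_port, dst_port)
FiveTuple = Tuple[str, str, int, int, int]


def flows_per_link(records: Iterable[Tuple[FiveTuple, str]]) -> Dict[str, int]:
    """Count the distinct flows observed on each ECMP link."""
    flows = defaultdict(set)
    for five_tuple, link in records:
        flows[link].add(five_tuple)   # repeated packets of a flow count once
    return {link: len(tuples) for link, tuples in flows.items()}


def imbalance_ratio(counts: Dict[str, int]) -> float:
    """Ratio of the busiest link to the mean; 1.0 means perfectly even."""
    if not counts:
        return 1.0
    mean = sum(counts.values()) / len(counts)
    return max(counts.values()) / mean
```

A ratio well above 1.0 for a group of equal-cost links suggests that the hash is mapping a disproportionate share of flows to one member.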
Everflow
Everflow is a network telemetry system that provides scalable and flexible access to packet-level information in large data centers.
Everflow uses “match and mirror” functionality. Commodity switches can match packets against flexible patterns over packet headers or
payloads and then apply actions to the matched packets, such as mirroring them to analysis servers.
Everflow leverages the capability of commodity data center network (DCN) switches to match based on predefined rules and then execute
specific actions, such as mirror and encapsulate, to reduce the tracing overhead. The rules designed to handle DCN faults are as follows:
• In the DCN environment, flow size distribution is highly skewed, and even small flows are often associated with customer-facing
interactive services with strict performance requirements. Everflow traces every flow in the DCN by introducing new rules that match
based on the TCP SYN, FIN, and RST flags in the packet.
• To permit flexible tracing, the packets can be marked with an additional “debug” bit in the header. A new rule can be installed that
traces any packet with the “debug” bit set.
• Like regular data traffic, protocol traffic is also traced because it is critical for the health and performance of the DCN.
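The three kinds of match rules listed above can be expressed as a single predicate. This is a sketch only. The TCP flag values are the standard TCP header bits and 179 is the well-known BGP port, but the position of the “debug” bit and the function signature are hypothetical, and BGP stands in here for protocol traffic in general.

```python
# Standard TCP header flag bits
TCP_FIN = 0x01
TCP_SYN = 0x02
TCP_RST = 0x04

DEBUG_BIT = 0x01   # hypothetical: the header bit used as the "debug" marker
BGP_PORT = 179     # well-known TCP port for BGP, used as the protocol example


def should_mirror(tcp_flags: int = 0, debug_field: int = 0,
                  src_port=None, dst_port=None) -> bool:
    """Return True if any of the three tracing rules matches the packet."""
    # Rule 1: the start and end of every TCP flow (SYN, FIN, or RST set)
    if tcp_flags & (TCP_SYN | TCP_FIN | TCP_RST):
        return True
    # Rule 2: any packet explicitly marked for debugging
    if debug_field & DEBUG_BIT:
        return True
    # Rule 3: protocol traffic, illustrated with BGP
    if BGP_PORT in (src_port, dst_port):
        return True
    return False
```

Matching only the SYN, FIN, and RST packets lets every flow be observed at a small fraction of the cost of mirroring all of its packets.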
Figure 19. Everflow architecture
Everflow configuration can be carried out using the config_db.json file on the Enterprise SONiC Distribution by Dell Technologies
NOS. The high-level steps for configuring Leaf1 are as follows:
Define the ACL table for mirror_acl that lists the ports in a temporary JSON file
printf '{"ACL_TABLE": {"MIRROR_ACL": {"stage": "INGRESS", "type": "mirror", "policy_desc": "Mirror_ACLV4_CREATION", "ports": ["Ethernet8", "Ethernet16"]}}}\n' > /tmp/apply_json2.json
Define the ACL rule that lists the source and destination for L3 packets to be captured
printf '{"ACL_RULE": {"MIRROR_ACL|Mirror_Rule": {"PRIORITY": "999", "IP_PROTOCOL": "61", "MIRROR_ACTION": "Mirror_Ses", "SRC_IP": "172.16.11.10/24", "DST_IP": "172.16.12.10/24"}}}\n' > /tmp/apply_json2.json
{
    "ACL_RULE": {
        "MIRROR_ACL|Mirror_Rule": {
            "DST_IP": "172.16.12.10/24",
            "IP_PROTOCOL": "61",
            "MIRROR_ACTION": "Mirror_Ses",
            "PRIORITY": "999",
            "SRC_IP": "172.16.11.10/24"
        }
    },
    "ACL_TABLE": {
        "MIRROR_ACL": {
            "policy_desc": "Mirror_ACLV4_CREATION",
            "ports": [
                "Ethernet8",
                "Ethernet16"
            ],
            "stage": "INGRESS",
            "type": "mirror"
        }
    },
    "MIRROR_SESSION": {
        "Mirror_Ses": {
            "dscp": "50",
            "dst_ip": "12.0.0.2",
            "gre_type": "0x88be",
            "queue": "0",
            "src_ip": "172.16.11.10/24",
            "ttl": "100"
        }
    }
}
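The three tables in the listing above refer to each other by name: the ACL rule key is prefixed with its ACL table name, and its MIRROR_ACTION names a mirror session. The following sketch checks those cross-references before a fragment is applied. The values are copied from the listing; the `check_references` helper is hypothetical and is not a SONiC tool.

```python
import json
from typing import List

EVERFLOW_CONFIG = {
    "ACL_TABLE": {
        "MIRROR_ACL": {
            "policy_desc": "Mirror_ACLV4_CREATION",
            "ports": ["Ethernet8", "Ethernet16"],
            "stage": "INGRESS",
            "type": "mirror",
        }
    },
    "ACL_RULE": {
        "MIRROR_ACL|Mirror_Rule": {
            "DST_IP": "172.16.12.10/24",
            "IP_PROTOCOL": "61",
            "MIRROR_ACTION": "Mirror_Ses",
            "PRIORITY": "999",
            "SRC_IP": "172.16.11.10/24",
        }
    },
    "MIRROR_SESSION": {
        "Mirror_Ses": {
            "dscp": "50",
            "dst_ip": "12.0.0.2",
            "gre_type": "0x88be",
            "queue": "0",
            "src_ip": "172.16.11.10/24",
            "ttl": "100",
        }
    },
}


def check_references(config: dict) -> List[str]:
    """Return a list of dangling references; an empty list means consistent."""
    problems = []
    tables = config.get("ACL_TABLE", {})
    sessions = config.get("MIRROR_SESSION", {})
    for key, rule in config.get("ACL_RULE", {}).items():
        # Rule keys have the form "<table name>|<rule name>"
        table_name, _, _rule_name = key.partition("|")
        if table_name not in tables:
            problems.append(f"{key}: ACL table '{table_name}' is not defined")
        action = rule.get("MIRROR_ACTION")
        if action is not None and action not in sessions:
            problems.append(f"{key}: mirror session '{action}' is not defined")
    return problems


print(check_references(EVERFLOW_CONFIG))
print(json.dumps(EVERFLOW_CONFIG, indent=4, sort_keys=True))
```

Generating the fragment with `json.dumps` instead of a hand-escaped `printf` string also guarantees that the output is well-formed JSON.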
Summary
This chapter introduced flow-based services. It detailed the advantages of the standards-based sampling technology sFlow, along with
its operations, use case, and example configuration. It also discussed Everflow, the network telemetry system for monitoring
packet-level information, including its components and operation, use cases, and sample configuration.
7
References
Topics:
• SONiC documentation
• Ansible documentation
• Dell Technologies Networking Infrastructure Solutions documentation
SONiC documentation
The following pages provide additional information about SONiC.
SONiC GitHub
Management Documentation
Azure GitHub
SONiC Wiki page
SONiC quick start guide
SONiC Configuration Database Manual
SONiC Command Line Interface Guide
Extended SONiC Telemetry Architecture
SONiC EverFlow High Level Design
Ansible documentation
For additional information about Ansible, see:
Ansible Site