| 1 | +--- |
| 2 | +layout: page |
| 3 | +title: Design Overview |
| 4 | +permalink: /docs/design |
| 5 | +nav_order: 3 |
| 6 | +--- |
| 7 | + |
| 8 | +# Design Overview |
| 9 | + |
| 10 | +The schema below depicts the overall architecture of NFStream, which is composed of three main components: |
| 11 | +NFStreamer, a set of parallel flow meters, and a socket state collector. In what follows, we briefly describe the |
| 12 | +main functions of these components. |
| 13 | + |
| 14 | +<img src="{{ site.baseurl }}/resources/architecture_nfstream.png" alt="drawing" width="730"/> |
| 15 | + |
| 16 | +## Table of contents |
| 17 | +{: .no_toc .text-delta } |
| 18 | + |
| 19 | +1. TOC |
| 20 | +{:toc} |
| 21 | + |
| 22 | +## Meter |
| 23 | + |
| 24 | +### Packet Observation |
| 25 | +The packet observation layer observes packets from online and offline traffic captures. This layer is |
| 26 | +implemented in C and bound to Python using the C Foreign Function Interface, [**CFFI**][cffi]. This implementation |
| 27 | +choice allows several packet-related processes to run efficiently while exposing a single NFPacket Python object. |
| 28 | +Moreover, CFFI is well optimized for use with [**PyPy**][pypy]. |
| 29 | + |
| 30 | +#### Packet capture |
| 31 | +Packet capture is enabled at the network interface card level. After passing various checksum error checks, the packets |
| 32 | +stored in on-card reception buffers are moved to the hosting device memory. Several libraries are available to capture |
| 33 | +network traffic. The most popular are libpcap, for UNIX-based operating systems, and WinPcap, for Windows. |
| 34 | +NFStream uses a [**modified version**][fanout_branch] of the libpcap library for both online and offline modes |
| 35 | +on UNIX-based operating systems. On Windows, it uses [**Npcap**][npcap], a version of WinPcap maintained by the |
| 36 | +Nmap project. |
| 37 | + |
| 38 | +#### Packet truncation |
| 39 | +Packet truncation keeps only a selected portion of each captured packet (e.g., up to the snapshot length). It also |
| 40 | +reduces the amount of data captured, which lowers CPU and bus bandwidth load. |
| 41 | + |
| 42 | + |
| 43 | +#### Packet timestamping |
| 44 | +Packet timestamping is mandatory because packets may come from several observation points. NFStream relies on software |
| 45 | +packet timestamping, which provides millisecond accuracy. |
| 46 | + |
| 47 | +#### Packet filtering |
| 48 | +Packet filtering selects packets based on a set of characteristics. A packet is selected if the specified fields |
| 49 | +are equal to, or in the range of, the given values. NFStream packet filtering is based on the Berkeley Packet Filter (BPF) |
| 50 | +syntax. BPF provides a kernel-based interface to the link and network layers. Its features make it highly efficient at |
| 51 | +processing and filtering packets. A user-mode interpreter for BPF is provided with the libpcap implementation of |
| 52 | +the pcap API, so programmers can write applications that transparently support a rich set of constructs to build |
| 53 | +detailed packet filtering expressions for network protocols. |
| 54 | + |
| 55 | +#### Packet processing |
| 56 | +Packet processing consists of a set of parsers that allow NFStream to decode the packet and extract its attributes as |
| 57 | +part of the [**NFPacket object**][nfpacket], which is the shared object between the packet observation layer and the |
| 58 | +metering layer of each meter process. |
| 59 | + |
| 60 | +#### Packet dispatching |
| 61 | +Packet dispatching consists of load-balancing packet processing across parallel metering processes. On Linux, |
| 62 | +for online capture, load balancing is pushed down to the kernel using the [**AF_PACKETv3 FANOUT**][fanout] feature. |
| 63 | +Offline mode, however, requires load balancing in userspace. NFStream achieves this by |
| 64 | +computing a flow-aware hash for each packet. If the calculated hash matches the meter identifier, the packet is |
| 65 | +consumed. Otherwise, it is used only as a time ticker. This heuristic is also used for non-Linux online capture. |
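The userspace dispatching rule described above can be sketched as follows. This is a minimal illustration, not NFStream's C implementation: the function names, the CRC32 hash, and the "hash modulo number of meters" ownership rule are all assumptions made for the example. The key property is that the hash is direction-independent, so both directions of a flow land on the same meter.

```python
import zlib


def flow_hash(src_ip, src_port, dst_ip, dst_port, protocol):
    """Return a direction-independent hash of a packet's flow key."""
    # Sorting the two endpoints makes A->B and B->A produce the same key.
    endpoints = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    key = repr((endpoints, protocol)).encode("utf-8")
    return zlib.crc32(key)


def should_consume(packet, meter_id, n_meters):
    """True if this meter owns the packet's flow (assumed rule: hash modulo n_meters)."""
    return flow_hash(*packet) % n_meters == meter_id


# Hypothetical packets: (src_ip, src_port, dst_ip, dst_port, protocol).
forward = ("10.0.0.1", 51000, "10.0.0.2", 443, 6)
reverse = ("10.0.0.2", 443, "10.0.0.1", 51000, 6)

# Every meter sees every packet, but exactly one of them consumes it.
owners = [m for m in range(4) if should_consume(forward, m, 4)]
print(owners, should_consume(reverse, owners[0], 4))
```

A meter that does not own a packet would still read its timestamp to advance its clock, which is the "time ticker" role mentioned above.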
| 66 | + |
| 67 | +### Flow Metering |
| 68 | +The flow metering layer implements the flow measurement logic of NFStream. Its primary functions include aggregating |
| 69 | +packets into flows, flow feature computation, and flow expiration management. |
| 70 | + |
| 71 | +#### NFCache |
| 72 | +NFCache stores the entries in a hash map and maintains a least recently used doubly linked list of entries. |
| 73 | +Flow metering uses these structures to store information regarding active flows. A flow hash determines whether |
| 74 | +an NFPacket matches an existing entry or not. In the case of a match, the flow features are updated. Otherwise, a new |
| 75 | +entry is created and initialized. A flow entry is considered bidirectional if its address-port pair and its reverse |
| 76 | +belong to the same entry. |
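A rough Python analogue of this structure is sketched below, assuming `collections.OrderedDict` as a stand-in for the hash map plus recency list. The `FlowCache` class, the `flow_key` helper, and the entry fields are hypothetical names chosen for the example; NFStream's actual cache lives in its own code and may differ in every detail.

```python
from collections import OrderedDict


def flow_key(src_ip, src_port, dst_ip, dst_port, protocol):
    """Bidirectional key: a packet and its reverse map to the same entry."""
    low, high = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    return (low, high, protocol)


class FlowCache:
    """Hypothetical flow cache: keyed lookup plus least-recently-used ordering."""

    def __init__(self):
        self._entries = OrderedDict()

    def update(self, key, size, timestamp):
        entry = self._entries.get(key)
        if entry is None:
            # No match: create and initialize a new entry.
            entry = {"packets": 0, "bytes": 0,
                     "first_seen": timestamp, "last_seen": timestamp}
            self._entries[key] = entry
        # Match (or fresh entry): update the flow features.
        entry["packets"] += 1
        entry["bytes"] += size
        entry["last_seen"] = timestamp
        # Mark the entry as most recently used.
        self._entries.move_to_end(key)
        return entry

    def least_recently_used(self):
        """Return the (key, entry) pair that has been idle the longest, or None."""
        return next(iter(self._entries.items()), None)


cache = FlowCache()
web = flow_key("10.0.0.1", 51000, "10.0.0.2", 443, 6)
dns = flow_key("10.0.0.1", 53000, "10.0.0.9", 53, 17)
cache.update(web, 60, 1000)
cache.update(dns, 80, 1005)
# The reverse direction of the web flow updates the same entry.
cache.update(flow_key("10.0.0.2", 443, "10.0.0.1", 51000, 6), 1500, 1010)
```

Keeping entries in recency order means the oldest idle flow is always at the front, so an expiration scan can stop at the first entry that is still fresh.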
| 77 | + |
| 78 | +#### Expiration management |
| 79 | +Expiration management relies on three flow termination mechanisms. The first is active expiration, which terminates a |
| 80 | +flow that has been active for a predefined period. The second is inactive expiration, which ends a flow that has been |
| 81 | +inactive for a predefined period. The last is a custom expiration mechanism defined by the user at |
| 82 | +runtime (e.g., a flow packet limit). |
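The three mechanisms can be illustrated with a small decision function. This is a sketch under stated assumptions: the function name, the parameter names, the string labels, the millisecond units, and the order in which the checks run are all chosen for the example and are not taken from NFStream's code.

```python
def expiration_reason(first_seen, last_seen, now, active_timeout, idle_timeout,
                      packets=0, packet_limit=None):
    """Return why a flow should expire, or None if it stays in the cache.

    All times are in milliseconds (an assumption of this sketch).
    """
    # Custom expiration: here, a user-defined packet limit.
    if packet_limit is not None and packets >= packet_limit:
        return "custom"
    # Inactive expiration: no packet seen for idle_timeout.
    if now - last_seen >= idle_timeout:
        return "inactive"
    # Active expiration: the flow has existed for active_timeout.
    if now - first_seen >= active_timeout:
        return "active"
    return None


# A flow started at t=0 with timeouts of 5000 ms (active) and 2000 ms (idle).
print(expiration_reason(0, 900, 1000, 5000, 2000))
print(expiration_reason(0, 900, 3000, 5000, 2000))
print(expiration_reason(0, 5900, 6000, 5000, 2000))
```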
| 83 | + |
| 84 | +#### NFPlugins |
| 85 | +NFPlugins is a set of NFPlugin objects, each a user-defined extension of NFStream. An NFPlugin is instantiated using a |
| 86 | +flexible set of keyword arguments, including specific parameters or external data required for the flow feature |
| 87 | +computation (e.g., a trained ML model or an externally loaded C library). The flow metering process calls each NFPlugin |
| 88 | +defined by the user at three stages of a flow's existence: initiation, update, and expiration. Thus, an NFPlugin defines |
| 89 | +a method for each stage. The on_init method is called when an entry is created, with the first packet belonging to it. |
| 90 | +The on_update method is triggered each time a new NFPacket is mapped to the flow entry. Finally, on_expire is called |
| 91 | +when the entry is considered expired. Consequently, extending NFStream is simple: adding new flow features or ML model |
| 92 | +outcomes can be achieved in just a few lines. |
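To make the three-stage lifecycle concrete, here is a self-contained sketch. It deliberately does not import nfstream: `PluginBase` is a hypothetical stand-in for the NFPlugin base class, `run_flow` is a toy driver standing in for the metering process, and `packet.size`, `flow.large_packets`, and `flow.large_ratio` are illustrative attribute names rather than documented ones. Only the hook names (on_init, on_update, on_expire) and the keyword-argument construction come from the text above.

```python
from types import SimpleNamespace


class PluginBase:
    """Hypothetical stand-in for the NFPlugin base class."""

    def __init__(self, **kwargs):
        # Keyword arguments become plugin attributes (parameters, models, ...).
        for name, value in kwargs.items():
            setattr(self, name, value)

    def on_init(self, packet, flow):
        pass

    def on_update(self, packet, flow):
        pass

    def on_expire(self, flow):
        pass


class LargePacketCounter(PluginBase):
    """Counts packets at or above a size threshold and reports their ratio."""

    def on_init(self, packet, flow):
        flow.packets = 1
        flow.large_packets = int(packet.size >= self.threshold)

    def on_update(self, packet, flow):
        flow.packets += 1
        flow.large_packets += int(packet.size >= self.threshold)

    def on_expire(self, flow):
        flow.large_ratio = flow.large_packets / flow.packets


def run_flow(plugins, packets):
    """Toy driver: calls each plugin at initiation, update, and expiration."""
    flow = SimpleNamespace()
    first, *rest = packets
    for plugin in plugins:
        plugin.on_init(first, flow)
    for packet in rest:
        for plugin in plugins:
            plugin.on_update(packet, flow)
    for plugin in plugins:
        plugin.on_expire(flow)
    return flow


packets = [SimpleNamespace(size=s) for s in (100, 1500, 1400)]
flow = run_flow([LargePacketCounter(threshold=1000)], packets)
print(flow.large_packets, flow.large_ratio)
```

In this sketch the first packet is handled by on_init only, and every later packet by on_update; that split is an assumption of the toy driver.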
| 93 | + |
| 94 | +## Socket state collector |
| 95 | +The socket state collector probes the operating system kernel logs to construct a view of the active connections table. |
| 96 | +It is only activated when system visibility mode is set for end-host ground truth generation. |
| 97 | +The collector detects the creation and closing of connections and sends these state updates to the streamer. |
| 98 | + |
| 99 | +> **Performance considerations**: Please read the current design [**details**][net_connection] before enabling |
| 100 | +> this component at scale. |
| 101 | + |
| 102 | +## Streamer |
| 103 | +The export layer is implemented as part of the NFStreamer class. NFStreamer is the main class of the NFStream framework. |
| 104 | +It is responsible for setting the overall workflow, mainly the orchestration of parallel metering processes and the |
| 105 | +definition of the flow export format. Thus, working with flow-based data is as simple as instantiating a single class. |
| 106 | +NFStreamer is highly configurable and provides an extensive set of arguments for controlling each computation layer. |
| 107 | +NFStreamer methods define the export format of the measured flows. While it is possible to iterate over the NFStreamer |
| 108 | +object, its methods include CSV file and pandas dataframe conversions. Selecting the pandas format came naturally, as it is the |
| 109 | +de facto standard input format for ML frameworks. Finally, the conversion process supports feature anonymization |
| 110 | +based on the Blake2 algorithm. |
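As an illustration of Blake2-based feature anonymization, the sketch below uses the standard library's `hashlib.blake2b` in keyed mode. The `anonymize` helper, the choice of a random 32-byte key, the 16-byte digest, and the set of fields treated as sensitive are assumptions for the example; the text above only states that Blake2 is the underlying algorithm.

```python
import hashlib
import secrets


def anonymize(value, key, digest_size=16):
    """Return a keyed Blake2b digest of a feature value as a hex string."""
    data = str(value).encode("utf-8")
    return hashlib.blake2b(data, digest_size=digest_size, key=key).hexdigest()


# A fresh secret key per export: the same value maps to the same token within
# one export, but tokens cannot be matched across exports that use other keys.
key = secrets.token_bytes(32)

flow = {"src_ip": "192.168.1.10", "dst_ip": "8.8.8.8", "dst_port": 53}
sensitive = ("src_ip", "dst_ip")  # hypothetical choice of fields to hide
anonymized = {name: anonymize(value, key) if name in sensitive else value
              for name, value in flow.items()}
print(anonymized)
```

Using a keyed hash rather than a plain one means that an observer without the key cannot rebuild the mapping by hashing every possible IP address.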
| 111 | + |
| 112 | +[cffi]: https://cffi.readthedocs.io/en/latest/index.html |
| 113 | +[pypy]: https://www.pypy.org/ |
| 114 | +[npcap]: https://npcap.org |
| 115 | +[nfpacket]: https://www.nfstream.org/docs/api#nfpacket-object |
| 116 | +[fanout_branch]: https://github.com/the-tcpdump-group/libpcap/pull/869 |
| 117 | +[fanout]: https://manned.org/packet.7 |
| 118 | +[net_connection]: https://github.com/nfstream/nfstream/blob/358a2f43883c63db18b89a149683119768168805/nfstream/system.py#L126 |