|
11 | 11 | "cell_type": "markdown",
|
12 | 12 | "metadata": {},
|
13 | 13 | "source": [
|
14 |
| - "[**nfstream**][repo] is a Python package providing fast, flexible, and expressive data structures designed to make working with **online** or **offline** network data both easy and intuitive. It aims to be the fundamental high-level building block for\n", |
15 |
| - "doing practical, **real world** network data analysis in Python. Additionally, it has\n", |
16 |
| - "the broader goal of becoming **a common network data processing framework for researchers** providing data reproducibility across experiments.\n", |
| 14 | + "[**NFStream**][repo] is a Python framework providing fast, flexible, and expressive data structures designed to make working with **online** or **offline** network data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, **real world** network data analysis in Python. Additionally, it has the broader goal of becoming **a common network data analytics framework for researchers** providing data reproducibility across experiments.\n", |
17 | 15 | "\n",
|
18 |
| - "* **Performance:** **nfstream** is designed to be fast (x10 faster with pypy3 support) with a small CPU and memory footprint.\n", |
19 |
| - "* **Layer-7 visibility:** **nfstream** deep packet inspection engine is based on [**nDPI**][ndpi]. It allows nfstream to perform [**reliable**][reliable] encrypted applications identification and metadata extraction (e.g. TLS, QUIC, TOR, HTTP, SSH, DNS, etc.).\n", |
20 |
| - "* **Flexibility:** add a flow feature in 2 lines as an [**NFPlugin**][nfplugin].\n", |
21 |
| - "* **Machine Learning oriented:** add your trained model as an [**NFPlugin**][nfplugin].\n", |
| 16 | + "* **Performance:** NFStream is designed to be fast: parallel processing, native C \n", |
| 17 | + "(using [**CFFI**][cffi]) for critical computation and [**PyPy**][pypy] support.\n", |
| 18 | + "* **Encrypted layer-7 visibility:** NFStream deep packet inspection is based on [**nDPI**][ndpi]. \n", |
| 19 | + "It allows NFStream to perform [**reliable**][reliable] encrypted application identification and metadata \n", |
| 20 | + "fingerprinting (e.g. TLS, SSH, DHCP, HTTP).\n", |
| 21 | + "* **Statistical features extraction:** NFStream provides state-of-the-art flow-based statistical feature extraction. \n", |
| 22 | + "It includes both post-mortem statistical features (e.g. min, mean, stddev and max of packet size and inter-arrival time) \n", |
| 23 | + "and early flow features (e.g. sequence of first n packet sizes, inter-arrival times and\n", |
| 24 | + "directions).\n", |
| 25 | + "* **Flexibility:** NFStream is easily extensible using [**NFPlugins**][nfplugin]. It allows creating a new \n", |
| 26 | + "feature within a few lines of Python.\n", |
| 27 | + "* **Machine Learning oriented:** NFStream aims to make Machine Learning approaches for network traffic management \n", |
| 28 | + "reproducible and deployable. By using NFStream as a common framework, researchers ensure that models are trained using \n", |
| 29 | + "the same feature computation logic and thus a fair comparison is possible. Moreover, trained models can be deployed \n", |
| 30 | + "and evaluated on a live network using [**NFPlugins**][nfplugin]. \n", |
22 | 31 | "\n",
|
23 | 32 | "In this notebook, we demonstrate a subset of the features provided by [**nfstream**][repo].\n",
|
24 | 33 | "\n",
|
25 |
| - "[documentation]: https://nfstream.github.io/\n", |
26 | 34 | "[ndpi]: https://github.com/ntop/nDPI\n",
|
27 | 35 | "[nfplugin]: https://nfstream.github.io/docs/api#nfplugin\n",
|
28 | 36 | "[reliable]: http://people.ac.upc.edu/pbarlet/papers/ground-truth.pam2014.pdf\n",
|
29 |
| - "[repo]: https://nfstream.github.io/" |
| 37 | + "[repo]: https://nfstream.org/\n", |
| 38 | + "[pypy]: https://www.pypy.org/\n", |
| 39 | + "[cffi]: https://cffi.readthedocs.io/en/latest/index.html" |
30 | 40 | ]
|
31 | 41 | },
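The post-mortem features named in the statistical bullet above (min, mean, stddev and max of packet sizes and inter-arrival times) can be illustrated with a small self-contained sketch. The packet sizes and timestamps below are made up, and the helper is an illustration of the idea, not NFStream's actual implementation:

```python
import statistics

def flow_stats(values):
    """Post-mortem style summary features for one list of per-packet values."""
    return {
        "min": min(values),
        "mean": statistics.mean(values),
        "stddev": statistics.stdev(values) if len(values) > 1 else 0.0,
        "max": max(values),
    }

# Toy flow: packet sizes in bytes and arrival timestamps in milliseconds.
sizes = [60, 1500, 1500, 52]
timestamps = [0, 12, 15, 40]
# Inter-arrival times are the deltas between consecutive packets.
iats = [b - a for a, b in zip(timestamps, timestamps[1:])]

print(flow_stats(sizes))
print(flow_stats(iats))
```

NFStream computes these summaries per flow and per direction; the sketch only shows the arithmetic behind one feature group.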
|
32 | 42 | {
|
|
54 | 64 | "source": [
|
55 | 65 | "In the following, we are going to use the main object provided by nfstream, `NFStreamer`, which has the following parameters:\n",
|
56 | 66 | "\n",
|
57 |
| - "* `source` [default= `None` ]: Source of packets. Possible values: `live_interface_name` or `pcap_file_path`.\n", |
58 |
| - "* `snaplen` [default= `65535` ]: Packet capture length.\n", |
59 |
| - "* `idle_timeout` [default= `30` ]: Flows that are inactive for more than this value in seconds will be exported.\n", |
60 |
| - "* `active_timeout` [default= `300` ]: Flows that are active for more than this value in seconds will be exported.\n", |
61 |
| - "* `plugins` [default= `()` ]: Set of user defined NFPlugins.\n", |
62 |
| - "* `dissect` [default= `True` ]: Enable nDPI deep packet inspection library for Layer 7 visibility.\n", |
63 |
| - "* `max_tcp_dissections` [default= `80` ]: Maximum per flow TCP packets to dissect (ignored when dissect=False).\n", |
64 |
| - "* `max_udp_dissections` [default= `16` ]: Maximum per flow UDP packets to dissect (ignored when dissect=False).\n", |
65 |
| - "* `statistics` [default= `False`]: Enable statistical flow features extraction.\n", |
66 |
| - "* `account_ip_padding_size` [default= `False`]: Enable Ethernet padding accounting when reporting IP sizes.\n", |
67 |
| - "* `enable_guess` [default= True]: Enable/Disable identification engine port guess heuristic.\n", |
68 |
| - "* `decode_tunnels` [default= True]: Enable/Disable GTP/TZSP tunnels dissection.\n", |
69 |
| - "* `bpf_filter` [default= None]: Specify a BPF filter for filtering selected traffic\n", |
70 |
| - "* `promisc` [default= True]: Enable/Disable promiscuous capture mode.\n", |
| 67 | + "* `source` [default=None]: Packet capture source. Pcap file path or network interface name.\n", |
| 68 | + "* `decode_tunnels` [default=True]: Enable/Disable GTP/TZSP tunnels decoding.\n", |
| 69 | + "* `bpf_filter` [default=None]: Specify a [BPF filter][bpf] for selecting traffic.\n", |
| 70 | + "* `promiscuous_mode` [default=True]: Enable/Disable promiscuous capture mode.\n", |
| 71 | + "* `snapshot_length` [default=1500]: Control packet slicing size (truncation) in bytes.\n", |
| 72 | + "* `idle_timeout` [default=30]: Flows that are idle (no packets received) for more than this value in seconds are expired.\n", |
| 73 | + "* `active_timeout` [default=300]: Flows that are active for more than this value in seconds are expired.\n", |
| 74 | + "* `accounting_mode` [default=0]: Specify the accounting mode that will be used to report byte-related features (0: Link layer, 1: IP layer, 2: Transport layer, 3: Payload).\n", |
| 75 | + "* `udps` [default=None]: Specify user-defined NFPlugins used to extend NFStreamer.\n", |
| 76 | + "* `n_dissections` [default=20]: Number of per flow packets to dissect for the L7 visibility feature. When set to 0, L7 visibility is disabled.\n", |
| 77 | + "* `statistical_analysis` [default=False]: Enable/Disable post-mortem flow statistical analysis.\n", |
| 78 | + "* `splt_analysis` [default=0]: Specify the number of first packets to analyze for early statistical analysis (sequence of packet lengths and times). When set to 0, splt_analysis is disabled.\n", |
| 79 | + "* `n_meters` [default=0]: Specify the number of parallel metering processes. When set to 0, NFStreamer will automatically scale metering according to available physical cores on the running host.\n", |
| 80 | + "* `performance_summary` [default=False]: Enable/Disable printing a performance summary.\n", |
71 | 81 | "\n",
|
72 |
| - "`NFStreamer` returns a flow iterator. We can iterate over flows or convert it directly to pandas Dataframe using `to_pandas()` method." |
| 82 | + "`NFStreamer` returns a flow iterator. We can iterate over flows or convert it directly to a pandas DataFrame using the `to_pandas()` method.\n", |
| 83 | + "\n", |
| 84 | + "[bpf]: https://biot.com/capstats/bpf.html" |
73 | 85 | ]
|
74 | 86 | },
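The `idle_timeout` and `active_timeout` expiration rules listed above can be sketched with a toy flow table. Timestamps are in seconds, and this simplified check is only an illustration of the two rules, not NFStream's internal flow cache:

```python
IDLE_TIMEOUT = 30     # expire after 30 s without packets (default idle_timeout)
ACTIVE_TIMEOUT = 300  # expire after 300 s of life, even if packets keep arriving

def is_expired(flow, now):
    """Return True if a flow should be expired and exported, per the two timeout rules."""
    idle = now - flow["last_seen"] > IDLE_TIMEOUT
    active = now - flow["first_seen"] > ACTIVE_TIMEOUT
    return idle or active

flow = {"first_seen": 0, "last_seen": 100}
print(is_expired(flow, 120))  # idle 20 s, active 120 s -> False
print(is_expired(flow, 140))  # idle 40 s -> True (idle_timeout hit)
busy = {"first_seen": 0, "last_seen": 310}
print(is_expired(busy, 311))  # active 311 s -> True (active_timeout hit)
```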
|
75 | 87 | {
|
|
78 | 90 | "metadata": {},
|
79 | 91 | "outputs": [],
|
80 | 92 | "source": [
|
81 |
| - "df = NFStreamer(source=\"pcaps/instagram.pcap\").to_pandas()" |
| 93 | + "df = NFStreamer(source=\"tests/pcap/instagram.pcap\").to_pandas()" |
82 | 94 | ]
|
83 | 95 | },
|
84 | 96 | {
|
|
94 | 106 | "cell_type": "markdown",
|
95 | 107 | "metadata": {},
|
96 | 108 | "source": [
|
97 |
| - "We can enable statistical flow features extraction as follow:" |
| 109 | + "We can enable post-mortem statistical flow feature extraction as follows:" |
98 | 110 | ]
|
99 | 111 | },
|
100 | 112 | {
|
|
103 | 115 | "metadata": {},
|
104 | 116 | "outputs": [],
|
105 | 117 | "source": [
|
106 |
| - "df = NFStreamer(source=\"pcaps/instagram.pcap\", statistics=True).to_pandas()" |
| 118 | + "df = NFStreamer(source=\"tests/pcap/instagram.pcap\", statistical_analysis=True).to_pandas()" |
107 | 119 | ]
|
108 | 120 | },
|
109 | 121 | {
|
|
119 | 131 | "cell_type": "markdown",
|
120 | 132 | "metadata": {},
|
121 | 133 | "source": [
|
122 |
| - "We can enable IP anonymization as follow:" |
| 134 | + "We can enable early statistical flow feature extraction as follows:" |
123 | 135 | ]
|
124 | 136 | },
|
125 | 137 | {
|
|
128 | 140 | "metadata": {},
|
129 | 141 | "outputs": [],
|
130 | 142 | "source": [
|
131 |
| - "df = NFStreamer(source=\"pcaps/instagram.pcap\", statistics=True).to_pandas(ip_anonymization=True)" |
| 143 | + "df = NFStreamer(source=\"tests/pcap/instagram.pcap\", splt_analysis=10).to_pandas()" |
132 | 144 | ]
|
133 | 145 | },
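What `splt_analysis=10` extracts can be mimicked on a toy packet list: the sizes, inter-arrival times and directions of the first n packets, padded when the flow is shorter. The padding value of -1 and the layout below are a rough sketch rather than NFStream's exact output format:

```python
def early_features(packets, n):
    """First-n packet sizes, inter-arrival times and directions, padded with -1."""
    sizes = [p["size"] for p in packets[:n]]
    times = [p["time"] for p in packets[:n]]
    dirs = [p["direction"] for p in packets[:n]]  # 0: src->dst, 1: dst->src
    iats = [0] + [b - a for a, b in zip(times, times[1:])]  # first packet has no IAT
    pad = lambda xs: xs + [-1] * (n - len(xs))
    return pad(sizes), pad(iats), pad(dirs)

# Toy flow of 3 packets, asked for the first 5.
packets = [
    {"size": 60,   "time": 0, "direction": 0},
    {"size": 52,   "time": 5, "direction": 1},
    {"size": 1500, "time": 7, "direction": 0},
]
print(early_features(packets, 5))
# -> ([60, 52, 1500, -1, -1], [0, 5, 2, -1, -1], [0, 1, 0, -1, -1])
```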
|
134 | 146 | {
|
|
144 | 156 | "cell_type": "markdown",
|
145 | 157 | "metadata": {},
|
146 | 158 | "source": [
|
147 |
| - "Now that we have our Dataframe, we can start analyzing our data as any data. For example we can compute additional features:\n", |
148 |
| - "\n", |
149 |
| - "* Compute data ratio on both direction (src2dst and dst2src)" |
| 159 | + "We can enable IP anonymization as follows:" |
150 | 160 | ]
|
151 | 161 | },
|
152 | 162 | {
|
|
155 | 165 | "metadata": {},
|
156 | 166 | "outputs": [],
|
157 | 167 | "source": [
|
158 |
| - "df[\"src2dst_raw_bytes_data_ratio\"] = df['src2dst_raw_bytes'] / df['bidirectional_raw_bytes']\n", |
159 |
| - "df[\"dst2src_raw_bytes_data_ratio\"] = df['dst2src_raw_bytes'] / df['bidirectional_raw_bytes']" |
| 168 | + "df = NFStreamer(source=\"tests/pcap/instagram.pcap\", statistical_analysis=True).to_pandas(ip_anonymization=True)" |
160 | 169 | ]
|
161 | 170 | },
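Conceptually, `ip_anonymization=True` replaces addresses with a keyed digest, so the same address always maps to the same opaque token within a run while remaining unlinkable across runs. The key name and truncation below are hypothetical, and the exact scheme NFStream uses may differ from this sketch:

```python
import hashlib
import hmac

SESSION_KEY = b"random-per-run-secret"  # hypothetical per-run secret

def anonymize_ip(ip):
    """Deterministically map an IP string to an opaque hex token."""
    return hmac.new(SESSION_KEY, ip.encode(), hashlib.sha256).hexdigest()[:16]

a = anonymize_ip("192.168.1.10")
b = anonymize_ip("192.168.1.10")
c = anonymize_ip("8.8.8.8")
print(a == b, a == c)  # True False: consistent within a run, distinct across hosts
```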
|
162 | 171 | {
|
|
172 | 181 | "cell_type": "markdown",
|
173 | 182 | "metadata": {},
|
174 | 183 | "source": [
|
175 |
| - "* Filter data according to some criterias:" |
| 184 | + "Now that we have our DataFrame, we can analyze it like any other dataset. For example, we can compute additional features:\n", |
| 185 | + "\n", |
| 186 | + "* Compute the data ratio in both directions (src2dst and dst2src)" |
176 | 187 | ]
|
177 | 188 | },
|
178 | 189 | {
|
|
181 | 192 | "metadata": {},
|
182 | 193 | "outputs": [],
|
183 | 194 | "source": [
|
184 |
| - "df[df[\"dst_port\"] == 443].head()" |
| 195 | + "df[\"src2dst_bytes_data_ratio\"] = df['src2dst_bytes'] / df['bidirectional_bytes']\n", |
| 196 | + "df[\"dst2src_bytes_data_ratio\"] = df['dst2src_bytes'] / df['bidirectional_bytes']" |
185 | 197 | ]
|
186 | 198 | },
|
187 | 199 | {
|
188 |
| - "cell_type": "markdown", |
| 200 | + "cell_type": "code", |
| 201 | + "execution_count": null, |
189 | 202 | "metadata": {},
|
| 203 | + "outputs": [], |
190 | 204 | "source": [
|
191 |
| - "## Extend nfstream" |
| 205 | + "df.head()" |
192 | 206 | ]
|
193 | 207 | },
|
194 | 208 | {
|
195 | 209 | "cell_type": "markdown",
|
196 | 210 | "metadata": {},
|
197 | 211 | "source": [
|
198 |
| - "In some use cases, we need to add features that are computed as packet level. Thus, nfstream handles such scenario using [**NFPlugin**][nfplugin].\n", |
199 |
| - "\n", |
200 |
| - "[nfplugin]: https://nfstream.github.io/docs/api#nfplugin\n", |
201 |
| - "\n", |
202 |
| - "* Let's suppose that we want bidirectional packets with exact IP size equal to 40 counter per flow." |
203 |
| - ] |
204 |
| - }, |
205 |
| - { |
206 |
| - "cell_type": "code", |
207 |
| - "execution_count": null, |
208 |
| - "metadata": {}, |
209 |
| - "outputs": [], |
210 |
| - "source": [ |
211 |
| - "class packet_with_40_ip_size(NFPlugin):\n", |
212 |
| - " def on_init(self, pkt): # flow creation with the first packet\n", |
213 |
| - " if pkt.ip_size == 40:\n", |
214 |
| - " return 1\n", |
215 |
| - " else:\n", |
216 |
| - " return 0\n", |
217 |
| - " \n", |
218 |
| - " def on_update(self, pkt, flow): # flow update with each packet belonging to the flow\n", |
219 |
| - " if pkt.ip_size == 40:\n", |
220 |
| - " flow.packet_with_40_ip_size += 1" |
| 212 | + "* Filter data according to some criterias:" |
221 | 213 | ]
|
222 | 214 | },
|
223 | 215 | {
|
|
226 | 218 | "metadata": {},
|
227 | 219 | "outputs": [],
|
228 | 220 | "source": [
|
229 |
| - "df = NFStreamer(source=\"pcaps/google_ssl.pcap\", plugins=[packet_with_40_ip_size()]).to_pandas()" |
| 221 | + "df[df[\"dst_port\"] == 443].head()" |
230 | 222 | ]
|
231 | 223 | },
|
232 | 224 | {
|
233 |
| - "cell_type": "code", |
234 |
| - "execution_count": null, |
| 225 | + "cell_type": "markdown", |
235 | 226 | "metadata": {},
|
236 |
| - "outputs": [], |
237 | 227 | "source": [
|
238 |
| - "df.head()" |
| 228 | + "## Extend nfstream" |
239 | 229 | ]
|
240 | 230 | },
|
241 | 231 | {
|
242 | 232 | "cell_type": "markdown",
|
243 | 233 | "metadata": {},
|
244 | 234 | "source": [
|
245 |
| - "Our Dataframe have a new column named `packet_with_40_ip_size`.\n", |
246 |
| - "\n", |
247 |
| - "In some cases, we need volatile features.\n", |
248 |
| - "Let's have an example use case as following:\n", |
| 235 | + "In some use cases, we need to add features that are computed at packet level. NFStream handles such scenarios using [**NFPlugin**][nfplugin].\n", |
249 | 236 | "\n",
|
250 |
| - "* We want to compute the maximum per flow packet inter arrival time.\n", |
251 |
| - "* Our feature will be based on iat that we do not want as feature.\n", |
| 237 | + "[nfplugin]: https://nfstream.github.io/docs/api#nfplugin\n", |
252 | 238 | "\n",
|
253 |
| - "Note that such feature already implemented within nfstream statistical features." |
| 239 | + "* Let's suppose that we want a per-flow counter of bidirectional packets with an IP size of exactly 40 bytes." |
254 | 240 | ]
|
255 | 241 | },
|
256 | 242 | {
|
|
259 | 245 | "metadata": {},
|
260 | 246 | "outputs": [],
|
261 | 247 | "source": [
|
262 |
| - "class iat(NFPlugin):\n", |
263 |
| - " def on_init(self, pkt):\n", |
264 |
| - " return [-1, pkt.time] # [iat value, last packet timestamp]\n", |
265 |
| - " def on_update(self, pkt, flow):\n", |
266 |
| - " flow.iat = [pkt.time - flow.iat[1], pkt.time]\n", |
267 |
| - "\n", |
268 |
| - "class maximum_iat_ms(NFPlugin):\n", |
269 |
| - " def on_init(self, pkt):\n", |
270 |
| - " return -1 # we will set it as -1 as init value\n", |
271 |
| - " def on_update(self, pkt, flow):\n", |
272 |
| - " if flow.iat[0] > flow.maximum_iat_ms:\n", |
273 |
| - " flow.maximum_iat_ms = flow.iat[0]" |
| 248 | + "class Packet40Count(NFPlugin):\n", |
| 249 | + " def on_init(self, pkt, flow): # flow creation with the first packet\n", |
| 250 | + " if pkt.ip_size == 40:\n", |
| 251 | + " flow.udps.packet_with_40_ip_size = 1\n", |
| 252 | + " else:\n", |
| 253 | + " flow.udps.packet_with_40_ip_size = 0\n", |
| 254 | + " \n", |
| 255 | + " def on_update(self, pkt, flow): # flow update with each packet belonging to the flow\n", |
| 256 | + " if pkt.ip_size == 40:\n", |
| 257 | + " flow.udps.packet_with_40_ip_size += 1" |
274 | 258 | ]
|
275 | 259 | },
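The `on_init`/`on_update` lifecycle of the plugin above can be exercised without capturing any traffic by feeding it mock packets. The stand-ins below only mimic the attributes this plugin touches (`ip_size`, `flow.udps`); they are a toy harness, not nfstream's real NFPlugin machinery:

```python
from types import SimpleNamespace

class Packet40Count:  # same logic as the NFPlugin above, minus the base class
    def on_init(self, pkt, flow):  # called once, on the flow's first packet
        flow.udps.packet_with_40_ip_size = 1 if pkt.ip_size == 40 else 0

    def on_update(self, pkt, flow):  # called on every subsequent packet
        if pkt.ip_size == 40:
            flow.udps.packet_with_40_ip_size += 1

plugin = Packet40Count()
flow = SimpleNamespace(udps=SimpleNamespace())
packets = [SimpleNamespace(ip_size=s) for s in (40, 1500, 40, 40)]

plugin.on_init(packets[0], flow)     # first packet creates the flow
for pkt in packets[1:]:
    plugin.on_update(pkt, flow)      # remaining packets update it

print(flow.udps.packet_with_40_ip_size)  # -> 3
```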
|
276 | 260 | {
|
|
279 | 263 | "metadata": {},
|
280 | 264 | "outputs": [],
|
281 | 265 | "source": [
|
282 |
| - "df = NFStreamer(source=\"pcaps/instagram.pcap\", plugins=[iat(volatile=True), maximum_iat_ms()]).to_pandas()" |
| 266 | + "df = NFStreamer(source=\"tests/pcap/google_ssl.pcap\", udps=[Packet40Count()]).to_pandas()" |
283 | 267 | ]
|
284 | 268 | },
|
285 | 269 | {
|
|
295 | 279 | "cell_type": "markdown",
|
296 | 280 | "metadata": {},
|
297 | 281 | "source": [
|
298 |
| - "Our Dataframe have a new column named `maximum_iat_ms` containing the maximum observed packet \n", |
299 |
| - "inter arrval time per flow and set to -1 when there is only 1 packet." |
| 282 | + "Our DataFrame now has a new column named `udps.packet_with_40_ip_size`." |
300 | 283 | ]
|
301 | 284 | }
|
302 | 285 | ],
|
|