
Commit 801ec64: "Update demo." (1 parent: 9c51be9)

File tree: 163 files changed, +74 -91 lines


README.md

Lines changed: 2 additions & 2 deletions

demo_notebook.ipynb

Lines changed: 71 additions & 88 deletions
@@ -11,22 +11,32 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"[**nfstream**][repo] is a Python package providing fast, flexible, and expressive data structures designed to make working with **online** or **offline** network data both easy and intuitive. It aims to be the fundamental high-level building block for\n",
-"doing practical, **real world** network data analysis in Python. Additionally, it has\n",
-"the broader goal of becoming **a common network data processing framework for researchers** providing data reproducibility across experiments.\n",
+"[**NFStream**][repo] is a Python framework providing fast, flexible, and expressive data structures designed to make working with **online** or **offline** network data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, **real world** network data analysis in Python. Additionally, it has the broader goal of becoming **a common network data analytics framework for researchers**, providing data reproducibility across experiments.\n",
 "\n",
-"* **Performance:** **nfstream** is designed to be fast (x10 faster with pypy3 support) with a small CPU and memory footprint.\n",
-"* **Layer-7 visibility:** **nfstream** deep packet inspection engine is based on [**nDPI**][ndpi]. It allows nfstream to perform [**reliable**][reliable] encrypted applications identification and metadata extraction (e.g. TLS, QUIC, TOR, HTTP, SSH, DNS, etc.).\n",
-"* **Flexibility:** add a flow feature in 2 lines as an [**NFPlugin**][nfplugin].\n",
-"* **Machine Learning oriented:** add your trained model as an [**NFPlugin**][nfplugin].\n",
+"* **Performance:** NFStream is designed to be fast: parallel processing, native C \n",
+"(using [**CFFI**][cffi]) for critical computation and [**PyPy**][pypy] support.\n",
+"* **Encrypted layer-7 visibility:** NFStream deep packet inspection is based on [**nDPI**][ndpi]. \n",
+"It allows NFStream to perform [**reliable**][reliable] encrypted application identification and metadata \n",
+"fingerprinting (e.g. TLS, SSH, DHCP, HTTP).\n",
+"* **Statistical features extraction:** NFStream provides state-of-the-art flow-based statistical feature extraction. \n",
+"It includes both post-mortem statistical features (e.g. min, mean, stddev and max of packet size and inter-arrival time) \n",
+"and early flow features (e.g. sequence of first n packets' sizes, inter-arrival times and\n",
+"directions).\n",
+"* **Flexibility:** NFStream is easily extensible using [**NFPlugins**][nfplugin]. It allows creating a new \n",
+"feature within a few lines of Python.\n",
+"* **Machine Learning oriented:** NFStream aims to make Machine Learning approaches for network traffic management \n",
+"reproducible and deployable. By using NFStream as a common framework, researchers ensure that models are trained using \n",
+"the same feature computation logic and thus a fair comparison is possible. Moreover, trained models can be deployed \n",
+"and evaluated on a live network using [**NFPlugins**][nfplugin].\n",
 "\n",
 "In this notebook, we demonstrate a subset of features provided by [**nfstream**][repo].\n",
 "\n",
-"[documentation]: https://nfstream.github.io/\n",
 "[ndpi]: https://github.com/ntop/nDPI\n",
 "[nfplugin]: https://nfstream.github.io/docs/api#nfplugin\n",
 "[reliable]: http://people.ac.upc.edu/pbarlet/papers/ground-truth.pam2014.pdf\n",
-"[repo]: https://nfstream.github.io/"
+"[repo]: https://nfstream.org/\n",
+"[pypy]: https://www.pypy.org/\n",
+"[cffi]: https://cffi.readthedocs.io/en/latest/index.html"
 ]
 },
 {
@@ -54,22 +64,24 @@
 "source": [
 "In the following, we are going to use the main object provided by nfstream, `NFStreamer`, which has the following parameters:\n",
 "\n",
-"* `source` [default= `None` ]: Source of packets. Possible values: `live_interface_name` or `pcap_file_path`.\n",
-"* `snaplen` [default= `65535` ]: Packet capture length.\n",
-"* `idle_timeout` [default= `30` ]: Flows that are inactive for more than this value in seconds will be exported.\n",
-"* `active_timeout` [default= `300` ]: Flows that are active for more than this value in seconds will be exported.\n",
-"* `plugins` [default= `()` ]: Set of user defined NFPlugins.\n",
-"* `dissect` [default= `True` ]: Enable nDPI deep packet inspection library for Layer 7 visibility.\n",
-"* `max_tcp_dissections` [default= `80` ]: Maximum per flow TCP packets to dissect (ignored when dissect=False).\n",
-"* `max_udp_dissections` [default= `16` ]: Maximum per flow UDP packets to dissect (ignored when dissect=False).\n",
-"* `statistics` [default= `False`]: Enable statistical flow features extraction.\n",
-"* `account_ip_padding_size` [default= `False`]: Enable Ethernet padding accounting when reporting IP sizes.\n",
-"* `enable_guess` [default= True]: Enable/Disable identification engine port guess heuristic.\n",
-"* `decode_tunnels` [default= True]: Enable/Disable GTP/TZSP tunnels dissection.\n",
-"* `bpf_filter` [default= None]: Specify a BPF filter for filtering selected traffic\n",
-"* `promisc` [default= True]: Enable/Disable promiscuous capture mode.\n",
+"* `source` [default=None]: Packet capture source. Pcap file path or network interface name.\n",
+"* `decode_tunnels` [default=True]: Enable/Disable GTP/TZSP tunnels decoding.\n",
+"* `bpf_filter` [default=None]: Specify a [BPF filter][bpf] for filtering selected traffic.\n",
+"* `promiscuous_mode` [default=True]: Enable/Disable promiscuous capture mode.\n",
+"* `snapshot_length` [default=1500]: Control packet slicing size (truncation) in bytes.\n",
+"* `idle_timeout` [default=30]: Flows that are idle (no packets received) for more than this value in seconds are expired.\n",
+"* `active_timeout` [default=300]: Flows that are active for more than this value in seconds are expired.\n",
+"* `accounting_mode` [default=0]: Specify the accounting mode used to report bytes-related features (0: link layer, 1: IP layer, 2: transport layer, 3: payload).\n",
+"* `udps` [default=None]: Specify user-defined NFPlugins used to extend NFStreamer.\n",
+"* `n_dissections` [default=20]: Number of per-flow packets to dissect for the L7 visibility feature. When set to 0, L7 visibility is disabled.\n",
+"* `statistical_analysis` [default=False]: Enable/Disable post-mortem flow statistical analysis.\n",
+"* `splt_analysis` [default=0]: Specify the number of first packets whose lengths are used for early statistical analysis. When set to 0, splt_analysis is disabled.\n",
+"* `n_meters` [default=0]: Specify the number of parallel metering processes. When set to 0, NFStreamer automatically scales metering according to the available physical cores on the running host.\n",
+"* `performance_summary` [default=False]: Enable/Disable printing a performance summary.\n",
 "\n",
-"`NFStreamer` returns a flow iterator. We can iterate over flows or convert it directly to pandas Dataframe using `to_pandas()` method."
+"`NFStreamer` returns a flow iterator. We can iterate over flows or convert it directly to a pandas DataFrame using the `to_pandas()` method.\n",
+"\n",
+"[bpf]: https://biot.com/capstats/bpf.html"
 ]
 },
 {
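The documented defaults above can be collected into plain keyword arguments for an explicit call; a minimal sketch, the dict merely restates the parameter list (the commented `NFStreamer` call and the pcap path are from the notebook, and assume nfstream >= 6 is installed):

```python
# Documented NFStreamer defaults, restated from the parameter list above.
nfstreamer_defaults = dict(
    decode_tunnels=True,         # GTP/TZSP tunnel decoding
    bpf_filter=None,             # no traffic filtering
    promiscuous_mode=True,
    snapshot_length=1500,        # packet slicing size in bytes
    idle_timeout=30,             # seconds of inactivity before expiry
    active_timeout=300,          # seconds of activity before expiry
    accounting_mode=0,           # 0: link-layer byte accounting
    udps=None,                   # no user-defined plugins
    n_dissections=20,            # per-flow packets dissected for L7 visibility
    statistical_analysis=False,  # post-mortem statistical features off
    splt_analysis=0,             # early statistical analysis off
    n_meters=0,                  # auto-scale metering processes
)

# With nfstream installed, the equivalent explicit call would look like:
# from nfstream import NFStreamer
# df = NFStreamer(source="tests/pcap/instagram.pcap", **nfstreamer_defaults).to_pandas()
```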
@@ -78,7 +90,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"df = NFStreamer(source=\"pcaps/instagram.pcap\").to_pandas()"
+"df = NFStreamer(source=\"tests/pcap/instagram.pcap\").to_pandas()"
 ]
 },
 {
@@ -94,7 +106,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"We can enable statistical flow features extraction as follow:"
+"We can enable post-mortem statistical flow features extraction as follows:"
 ]
 },
 {
@@ -103,7 +115,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"df = NFStreamer(source=\"pcaps/instagram.pcap\", statistics=True).to_pandas()"
+"df = NFStreamer(source=\"tests/pcap/instagram.pcap\", statistical_analysis=True).to_pandas()"
 ]
 },
 {
@@ -119,7 +131,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"We can enable IP anonymization as follow:"
+"We can enable early statistical flow features extraction as follows:"
 ]
 },
 {
@@ -128,7 +140,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"df = NFStreamer(source=\"pcaps/instagram.pcap\", statistics=True).to_pandas(ip_anonymization=True)"
+"df = NFStreamer(source=\"tests/pcap/instagram.pcap\", splt_analysis=10).to_pandas()"
 ]
 },
 {
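`splt_analysis=10` above exports features of a flow's first 10 packets (their sizes, directions, and inter-arrival times). A minimal sketch of that early-feature idea on synthetic packet records; this is plain Python for illustration, not nfstream's implementation, and padding short flows with -1 is an assumption:

```python
def splt_features(packets, n=10):
    """Return (sizes, directions, inter-arrival times) of the first n packets.
    `packets` is a list of (timestamp_ms, size, direction) tuples; flows
    shorter than n packets are padded with -1 (an assumed convention here)
    so the early-feature vector has a fixed length."""
    first = packets[:n]
    sizes = [p[1] for p in first]
    dirs = [p[2] for p in first]
    # First packet has no predecessor, so its inter-arrival time is 0.
    iats = [0] + [first[i][0] - first[i - 1][0] for i in range(1, len(first))]
    pad = n - len(first)
    return sizes + [-1] * pad, dirs + [-1] * pad, iats + [-1] * pad

# Three packets of a synthetic flow: (timestamp_ms, size, direction 0/1).
pkts = [(0, 60, 0), (12, 1500, 1), (30, 40, 0)]
sizes, dirs, iats = splt_features(pkts, n=5)
```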
@@ -144,9 +156,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Now that we have our Dataframe, we can start analyzing our data as any data. For example we can compute additional features:\n",
-"\n",
-"* Compute data ratio on both direction (src2dst and dst2src)"
+"We can enable IP anonymization as follows:"
 ]
 },
 {
@@ -155,8 +165,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"df[\"src2dst_raw_bytes_data_ratio\"] = df['src2dst_raw_bytes'] / df['bidirectional_raw_bytes']\n",
-"df[\"dst2src_raw_bytes_data_ratio\"] = df['dst2src_raw_bytes'] / df['bidirectional_raw_bytes']"
+"df = NFStreamer(source=\"tests/pcap/instagram.pcap\", statistical_analysis=True).to_pandas(ip_anonymization=True)"
 ]
 },
 {
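`to_pandas(ip_anonymization=True)` above replaces the IP address columns with anonymized values. A minimal sketch of one common approach, keyed digests, on a small DataFrame; this is illustrative only and not nfstream's exact scheme, and the column names and key are assumptions:

```python
import hashlib

import pandas as pd


def anonymize_ips(df, columns=("src_ip", "dst_ip"), key=b"session-secret"):
    """Replace IP strings with keyed BLAKE2 digests: the same IP always maps
    to the same token within a key, but the mapping is not reversible
    without the key."""
    out = df.copy()
    for col in columns:
        out[col] = out[col].map(
            lambda ip: hashlib.blake2b(ip.encode(), key=key, digest_size=8).hexdigest()
        )
    return out


# Synthetic flow table (values invented for illustration).
flows = pd.DataFrame({"src_ip": ["10.0.0.1", "10.0.0.1"],
                      "dst_ip": ["8.8.8.8", "1.1.1.1"]})
anon = anonymize_ips(flows)
```

Determinism within a run is what keeps per-host flow grouping possible after anonymization.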
@@ -172,7 +181,9 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"* Filter data according to some criterias:"
+"Now that we have our DataFrame, we can analyze it like any other data. For example, we can compute additional features:\n",
+"\n",
+"* Compute the data ratio in both directions (src2dst and dst2src)"
 ]
 },
 {
@@ -181,43 +192,24 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"df[df[\"dst_port\"] == 443].head()"
+"df[\"src2dst_bytes_data_ratio\"] = df['src2dst_bytes'] / df['bidirectional_bytes']\n",
+"df[\"dst2src_bytes_data_ratio\"] = df['dst2src_bytes'] / df['bidirectional_bytes']"
 ]
 },
 {
-"cell_type": "markdown",
+"cell_type": "code",
+"execution_count": null,
 "metadata": {},
+"outputs": [],
 "source": [
-"## Extend nfstream"
+"df.head()"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"In some use cases, we need to add features that are computed as packet level. Thus, nfstream handles such scenario using [**NFPlugin**][nfplugin].\n",
-"\n",
-"[nfplugin]: https://nfstream.github.io/docs/api#nfplugin\n",
-"\n",
-"* Let's suppose that we want bidirectional packets with exact IP size equal to 40 counter per flow."
-]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {},
-"outputs": [],
-"source": [
-"class packet_with_40_ip_size(NFPlugin):\n",
-" def on_init(self, pkt): # flow creation with the first packet\n",
-" if pkt.ip_size == 40:\n",
-" return 1\n",
-" else:\n",
-" return 0\n",
-" \n",
-" def on_update(self, pkt, flow): # flow update with each packet belonging to the flow\n",
-" if pkt.ip_size == 40:\n",
-" flow.packet_with_40_ip_size += 1"
+"* Filter data according to some criteria:"
 ]
 },
 {
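The two ratio columns added above can be checked on a synthetic flow table; a minimal sketch assuming only pandas, with the column names as in the notebook cell and invented byte counts:

```python
import pandas as pd

# Synthetic flow records with the byte counters used above (values invented).
df = pd.DataFrame({
    "bidirectional_bytes": [1000, 400],
    "src2dst_bytes": [600, 100],
    "dst2src_bytes": [400, 300],
})

# Same two derived features as in the notebook cell above.
df["src2dst_bytes_data_ratio"] = df["src2dst_bytes"] / df["bidirectional_bytes"]
df["dst2src_bytes_data_ratio"] = df["dst2src_bytes"] / df["bidirectional_bytes"]

# Since bidirectional_bytes = src2dst_bytes + dst2src_bytes,
# the two ratios sum to 1 for every flow.
```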
@@ -226,31 +218,25 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"df = NFStreamer(source=\"pcaps/google_ssl.pcap\", plugins=[packet_with_40_ip_size()]).to_pandas()"
+"df[df[\"dst_port\"] == 443].head()"
 ]
 },
 {
-"cell_type": "code",
-"execution_count": null,
+"cell_type": "markdown",
 "metadata": {},
-"outputs": [],
 "source": [
-"df.head()"
+"## Extend nfstream"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Our Dataframe have a new column named `packet_with_40_ip_size`.\n",
-"\n",
-"In some cases, we need volatile features.\n",
-"Let's have an example use case as following:\n",
+"In some use cases, we need to add features that are computed at the packet level. nfstream handles such scenarios using [**NFPlugin**][nfplugin].\n",
 "\n",
-"* We want to compute the maximum per flow packet inter arrival time.\n",
-"* Our feature will be based on iat that we do not want as feature.\n",
+"[nfplugin]: https://nfstream.github.io/docs/api#nfplugin\n",
 "\n",
-"Note that such feature already implemented within nfstream statistical features."
+"* Let's suppose that we want a per-flow counter of bidirectional packets with IP size exactly 40."
 ]
 },
 {
@@ -259,18 +245,16 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"class iat(NFPlugin):\n",
-" def on_init(self, pkt):\n",
-" return [-1, pkt.time] # [iat value, last packet timestamp]\n",
-" def on_update(self, pkt, flow):\n",
-" flow.iat = [pkt.time - flow.iat[1], pkt.time]\n",
-"\n",
-"class maximum_iat_ms(NFPlugin):\n",
-" def on_init(self, pkt):\n",
-" return -1 # we will set it as -1 as init value\n",
-" def on_update(self, pkt, flow):\n",
-" if flow.iat[0] > flow.maximum_iat_ms:\n",
-" flow.maximum_iat_ms = flow.iat[0]"
+"class Packet40Count(NFPlugin):\n",
+" def on_init(self, pkt, flow): # flow creation with the first packet\n",
+" if pkt.ip_size == 40:\n",
+" flow.udps.packet_with_40_ip_size = 1\n",
+" else:\n",
+" flow.udps.packet_with_40_ip_size = 0\n",
+" \n",
+" def on_update(self, pkt, flow): # flow update with each packet belonging to the flow\n",
+" if pkt.ip_size == 40:\n",
+" flow.udps.packet_with_40_ip_size += 1"
 ]
 },
 {
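The `Packet40Count` plugin above relies only on the `on_init`/`on_update` callback contract: `on_init` runs on the flow's first packet and `on_update` on each later packet. That contract can be exercised with stand-in packet and flow objects; the mocks below are plain Python for illustration, not nfstream's classes:

```python
from types import SimpleNamespace


class Packet40Count:
    """Same callback logic as the NFPlugin in the diff above, without the
    NFPlugin base class so it can run standalone."""

    def on_init(self, pkt, flow):    # called on flow creation with the first packet
        flow.udps.packet_with_40_ip_size = 1 if pkt.ip_size == 40 else 0

    def on_update(self, pkt, flow):  # called for each subsequent packet of the flow
        if pkt.ip_size == 40:
            flow.udps.packet_with_40_ip_size += 1


# Stand-in packet/flow objects mimicking the attributes the plugin reads.
packets = [SimpleNamespace(ip_size=s) for s in (40, 1500, 40, 40)]
flow = SimpleNamespace(udps=SimpleNamespace())

plugin = Packet40Count()
plugin.on_init(packets[0], flow)
for pkt in packets[1:]:
    plugin.on_update(pkt, flow)
```

Three of the four mock packets have `ip_size == 40`, so the counter ends at 3.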
@@ -279,7 +263,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"df = NFStreamer(source=\"pcaps/instagram.pcap\", plugins=[iat(volatile=True), maximum_iat_ms()]).to_pandas()"
+"df = NFStreamer(source=\"tests/pcap/google_ssl.pcap\", udps=[Packet40Count()]).to_pandas()"
 ]
 },
 {
@@ -295,8 +279,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Our Dataframe have a new column named `maximum_iat_ms` containing the maximum observed packet \n",
-"inter arrval time per flow and set to -1 when there is only 1 packet."
+"Our DataFrame has a new column named `udps.packet_with_40_ip_size`."
 ]
 }
 ],

pcaps/KakaoTalk_talk.pcap (-476 KB): Binary file not shown.

pcaps/ookla.pcap (-4.64 MB): Binary file not shown.

pcaps/radiotap.pcap (-189 Bytes): Binary file not shown.

pcaps/tinc.pcap (-349 KB): Binary file not shown.

pcaps/whatsapp_login_call.pcap (-208 KB): Binary file not shown.

requirements.txt

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-nfstream>=5.1.5
+nfstream>=6.0.0
File renamed without changes.
File renamed without changes.

0 commit comments