|
11 | 11 | "cell_type": "markdown",
|
12 | 12 | "metadata": {},
|
13 | 13 | "source": [
|
14 |
| - "[**nfstream**][repo] is a Python package providing fast, flexible, and expressive data structures designed to make working with **online** or **offline** network data both easy and intuitive. It aims to be the fundamental high-level building block for\n", |
15 |
| - "doing practical, **real world** network data analysis in Python. Additionally, it has\n", |
16 |
| - "the broader goal of becoming **a common network data processing framework for researchers** providing data reproducibility across experiments.\n", |
| 14 | + "[**NFStream**][repo] is a Python framework providing fast, flexible, and expressive data structures designed to make working with **online** or **offline** network data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, **real world** network data analysis in Python. Additionally, it has the broader goal of becoming **a common network data analytics framework for researchers** providing data reproducibility across experiments.\n", |
17 | 15 | "\n",
|
18 |
| - "* **Performance:** **nfstream** is designed to be fast (x10 faster with pypy3 support) with a small CPU and memory footprint.\n", |
19 |
| - "* **Layer-7 visibility:** **nfstream** deep packet inspection engine is based on [**nDPI**][ndpi]. It allows nfstream to perform [**reliable**][reliable] encrypted applications identification and metadata extraction (e.g. TLS, QUIC, TOR, HTTP, SSH, DNS, etc.).\n", |
20 |
| - "* **Flexibility:** add a flow feature in 2 lines as an [**NFPlugin**][nfplugin].\n", |
21 |
| - "* **Machine Learning oriented:** add your trained model as an [**NFPlugin**][nfplugin].\n", |
| 16 | + "* **Performance:** NFStream is designed to be fast: parallel processing, native C \n", |
| 17 | + "(using [**CFFI**][cffi]) for critical computation and [**PyPy**][pypy] support.\n", |
| 18 | + "* **Encrypted layer-7 visibility:** NFStream deep packet inspection is based on [**nDPI**][ndpi]. \n", |
| 19 | + "It allows NFStream to perform [**reliable**][reliable] encrypted application identification and metadata \n", |
| 20 | + "fingerprinting (e.g. TLS, SSH, DHCP, HTTP).\n", |
| 21 | + "* **Statistical features extraction:** NFStream provides state-of-the-art flow-based statistical feature extraction. \n", |
| 22 | + "It includes both post-mortem statistical features (e.g. min, mean, stddev and max of packet size and inter-arrival time) \n", |
| 23 | + "and early flow features (e.g. sequence of first n packet sizes, inter-arrival times and\n", |
| 24 | + "directions).\n", |
| 25 | + "* **Flexibility:** NFStream is easily extensible using [**NFPlugins**][nfplugin]. It allows creating a new \n", |
| 26 | + "feature within a few lines of Python.\n", |
| 27 | + "* **Machine Learning oriented:** NFStream aims to make Machine Learning approaches for network traffic management \n", |
| 28 | + "reproducible and deployable. By using NFStream as a common framework, researchers ensure that models are trained using \n", |
| 29 | + "the same feature computation logic and thus a fair comparison is possible. Moreover, trained models can be deployed \n", |
| 30 | + "and evaluated on a live network using [**NFPlugins**][nfplugin]. \n", |
22 | 31 | "\n",
|
23 | 32 | "In this notebook, we demonstrate a subset of the features provided by [**nfstream**][repo].\n",
|
24 | 33 | "\n",
|
25 |
| - "[documentation]: https://nfstream.github.io/\n", |
26 | 34 | "[ndpi]: https://github.com/ntop/nDPI\n",
|
27 | 35 | "[nfplugin]: https://nfstream.github.io/docs/api#nfplugin\n",
|
28 | 36 | "[reliable]: http://people.ac.upc.edu/pbarlet/papers/ground-truth.pam2014.pdf\n",
|
29 |
| - "[repo]: https://nfstream.github.io/" |
| 37 | + "[repo]: https://nfstream.org/\n", |
| 38 | + "[pypy]: https://www.pypy.org/\n", |
| 39 | + "[cffi]: https://cffi.readthedocs.io/en/latest/index.html" |
30 | 40 | ]
|
31 | 41 | },
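The post-mortem features named in the statistical bullet above (min, mean, stddev and max of packet sizes and inter-arrival times) can be illustrated with a small self-contained sketch. The packet sizes and timestamps below are made up, and the helper is an illustration of the idea, not NFStream's actual implementation:

```python
import statistics

def flow_stats(values):
    """Post-mortem style summary features for one list of per-packet values."""
    return {
        "min": min(values),
        "mean": statistics.mean(values),
        "stddev": statistics.stdev(values) if len(values) > 1 else 0.0,
        "max": max(values),
    }

# Toy flow: packet sizes in bytes and arrival timestamps in milliseconds.
sizes = [60, 1500, 1500, 52]
timestamps = [0, 12, 15, 40]
# Inter-arrival times are the deltas between consecutive packets.
iats = [b - a for a, b in zip(timestamps, timestamps[1:])]

print(flow_stats(sizes))
print(flow_stats(iats))
```

NFStream computes these summaries per flow and per direction; the sketch only shows the arithmetic behind one feature group.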
|
32 | 42 | {
|
|
54 | 64 | "source": [
|
55 | 65 | "In the following, we are going to use the main object provided by nfstream, `NFStreamer`, which has the following parameters:\n",
|
56 | 66 | "\n",
|
57 |
| - "* `source` [default= `None` ]: Source of packets. Possible values: `live_interface_name` or `pcap_file_path`.\n", |
58 |
| - "* `snaplen` [default= `65535` ]: Packet capture length.\n", |
59 |
| - "* `idle_timeout` [default= `30` ]: Flows that are inactive for more than this value in seconds will be exported.\n", |
60 |
| - "* `active_timeout` [default= `300` ]: Flows that are active for more than this value in seconds will be exported.\n", |
61 |
| - "* `plugins` [default= `()` ]: Set of user defined NFPlugins.\n", |
62 |
| - "* `dissect` [default= `True` ]: Enable nDPI deep packet inspection library for Layer 7 visibility.\n", |
63 |
| - "* `max_tcp_dissections` [default= `80` ]: Maximum per flow TCP packets to dissect (ignored when dissect=False).\n", |
64 |
| - "* `max_udp_dissections` [default= `16` ]: Maximum per flow UDP packets to dissect (ignored when dissect=False).\n", |
65 |
| - "* `statistics` [default= `False`]: Enable statistical flow features extraction.\n", |
66 |
| - "* `account_ip_padding_size` [default= `False`]: Enable Ethernet padding accounting when reporting IP sizes.\n", |
67 |
| - "* `enable_guess` [default= True]: Enable/Disable identification engine port guess heuristic.\n", |
68 |
| - "* `decode_tunnels` [default= True]: Enable/Disable GTP/TZSP tunnels dissection.\n", |
69 |
| - "* `bpf_filter` [default= None]: Specify a BPF filter for filtering selected traffic\n", |
70 |
| - "* `promisc` [default= True]: Enable/Disable promiscuous capture mode.\n", |
| 67 | + "* `source` [default=None]: Packet capture source. Pcap file path or network interface name.\n", |
| 68 | + "* `decode_tunnels` [default=True]: Enable/Disable GTP/TZSP tunnels decoding.\n", |
| 69 | + "* `bpf_filter` [default=None]: Specify a [BPF filter][bpf] for selecting traffic.\n", |
| 70 | + "* `promiscuous_mode` [default=True]: Enable/Disable promiscuous capture mode.\n", |
| 71 | + "* `snapshot_length` [default=1500]: Control packet slicing size (truncation) in bytes.\n", |
| 72 | + "* `idle_timeout` [default=30]: Flows that are idle (no packets received) for more than this value in seconds are expired.\n", |
| 73 | + "* `active_timeout` [default=300]: Flows that are active for more than this value in seconds are expired.\n", |
| 74 | + "* `accounting_mode` [default=0]: Specify the accounting mode that will be used to report byte-related features (0: Link layer, 1: IP layer, 2: Transport layer, 3: Payload).\n", |
| 75 | + "* `udps` [default=None]: Specify user-defined NFPlugins used to extend NFStreamer.\n", |
| 76 | + "* `n_dissections` [default=20]: Number of per flow packets to dissect for the L7 visibility feature. When set to 0, L7 visibility is disabled.\n", |
| 77 | + "* `statistical_analysis` [default=False]: Enable/Disable post-mortem flow statistical analysis.\n", |
| 78 | + "* `splt_analysis` [default=0]: Specify the number of first packets to analyze for early statistical analysis (sequence of packet lengths and times). When set to 0, splt_analysis is disabled.\n", |
| 79 | + "* `n_meters` [default=0]: Specify the number of parallel metering processes. When set to 0, NFStreamer will automatically scale metering according to available physical cores on the running host.\n", |
| 80 | + "* `performance_summary` [default=False]: Enable/Disable printing a performance summary.\n", |
71 | 81 | "\n",
|
72 |
| - "`NFStreamer` returns a flow iterator. We can iterate over flows or convert it directly to pandas Dataframe using `to_pandas()` method." |
| 82 | + "`NFStreamer` returns a flow iterator. We can iterate over flows or convert it directly to a pandas DataFrame using the `to_pandas()` method.\n", |
| 83 | + "\n", |
| 84 | + "[bpf]: https://biot.com/capstats/bpf.html" |
73 | 85 | ]
|
74 | 86 | },
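The `idle_timeout` and `active_timeout` expiration rules listed above can be sketched with a toy flow table. Timestamps are in seconds, and this simplified check is only an illustration of the two rules, not NFStream's internal flow cache:

```python
IDLE_TIMEOUT = 30     # expire after 30 s without packets (default idle_timeout)
ACTIVE_TIMEOUT = 300  # expire after 300 s of life, even if packets keep arriving

def is_expired(flow, now):
    """Return True if a flow should be expired and exported, per the two timeout rules."""
    idle = now - flow["last_seen"] > IDLE_TIMEOUT
    active = now - flow["first_seen"] > ACTIVE_TIMEOUT
    return idle or active

flow = {"first_seen": 0, "last_seen": 100}
print(is_expired(flow, 120))  # idle 20 s, active 120 s -> False
print(is_expired(flow, 140))  # idle 40 s -> True (idle_timeout hit)
busy = {"first_seen": 0, "last_seen": 310}
print(is_expired(busy, 311))  # active 311 s -> True (active_timeout hit)
```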
|
75 | 87 | {
|
|
78 | 90 | "metadata": {},
|
79 | 91 | "outputs": [],
|
80 | 92 | "source": [
|
81 |
| - "df = NFStreamer(source=\"pcaps/instagram.pcap\").to_pandas()" |
| 93 | + "df = NFStreamer(source=\"tests/pcap/instagram.pcap\").to_pandas()" |
82 | 94 | ]
|
83 | 95 | },
|
84 | 96 | {
|
|
94 | 106 | "cell_type": "markdown",
|
95 | 107 | "metadata": {},
|
96 | 108 | "source": [
|
97 |
| - "We can enable statistical flow features extraction as follow:" |
| 109 | + "We can enable post-mortem statistical flow feature extraction as follows:" |
98 | 110 | ]
|
99 | 111 | },
|
100 | 112 | {
|
|
103 | 115 | "metadata": {},
|
104 | 116 | "outputs": [],
|
105 | 117 | "source": [
|
106 |
| - "df = NFStreamer(source=\"pcaps/instagram.pcap\", statistics=True).to_pandas()" |
| 118 | + "df = NFStreamer(source=\"tests/pcap/instagram.pcap\", statistical_analysis=True).to_pandas()" |
107 | 119 | ]
|
108 | 120 | },
|
109 | 121 | {
|
|
119 | 131 | "cell_type": "markdown",
|
120 | 132 | "metadata": {},
|
121 | 133 | "source": [
|
122 |
| - "We can enable IP anonymization as follow:" |
| 134 | + "We can enable early statistical flow feature extraction as follows:" |
123 | 135 | ]
|
124 | 136 | },
|
125 | 137 | {
|
|
128 | 140 | "metadata": {},
|
129 | 141 | "outputs": [],
|
130 | 142 | "source": [
|
131 |
| - "df = NFStreamer(source=\"pcaps/instagram.pcap\", statistics=True).to_pandas(ip_anonymization=True)" |
| 143 | + "df = NFStreamer(source=\"tests/pcap/instagram.pcap\", splt_analysis=10).to_pandas()" |
132 | 144 | ]
|
133 | 145 | },
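What `splt_analysis=10` extracts can be mimicked on a toy packet list: the sizes, inter-arrival times and directions of the first n packets, padded when the flow is shorter. The padding value of -1 and the layout below are a rough sketch rather than NFStream's exact output format:

```python
def early_features(packets, n):
    """First-n packet sizes, inter-arrival times and directions, padded with -1."""
    sizes = [p["size"] for p in packets[:n]]
    times = [p["time"] for p in packets[:n]]
    dirs = [p["direction"] for p in packets[:n]]  # 0: src->dst, 1: dst->src
    iats = [0] + [b - a for a, b in zip(times, times[1:])]  # first packet has no IAT
    pad = lambda xs: xs + [-1] * (n - len(xs))
    return pad(sizes), pad(iats), pad(dirs)

# Toy flow of 3 packets, asked for the first 5.
packets = [
    {"size": 60,   "time": 0, "direction": 0},
    {"size": 52,   "time": 5, "direction": 1},
    {"size": 1500, "time": 7, "direction": 0},
]
print(early_features(packets, 5))
# -> ([60, 52, 1500, -1, -1], [0, 5, 2, -1, -1], [0, 1, 0, -1, -1])
```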
|
134 | 146 | {
|
|
144 | 156 | "cell_type": "markdown",
|
145 | 157 | "metadata": {},
|
146 | 158 | "source": [
|
147 |
| - "Now that we have our Dataframe, we can start analyzing our data as any data. For example we can compute additional features:\n", |
148 |
| - "\n", |
149 |
| - "* Compute data ratio on both direction (src2dst and dst2src)" |
| 159 | + "We can enable IP anonymization as follows:" |
150 | 160 | ]
|
151 | 161 | },
|
152 | 162 | {
|
|
155 | 165 | "metadata": {},
|
156 | 166 | "outputs": [],
|
157 | 167 | "source": [
|
158 |
| - "df[\"src2dst_raw_bytes_data_ratio\"] = df['src2dst_raw_bytes'] / df['bidirectional_raw_bytes']\n", |
159 |
| - "df[\"dst2src_raw_bytes_data_ratio\"] = df['dst2src_raw_bytes'] / df['bidirectional_raw_bytes']" |
| 168 | + "df = NFStreamer(source=\"tests/pcap/instagram.pcap\", statistical_analysis=True).to_pandas(ip_anonymization=True)" |
160 | 169 | ]
|
161 | 170 | },
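Conceptually, `ip_anonymization=True` replaces addresses with a keyed digest, so the same address always maps to the same opaque token within a run while remaining unlinkable across runs. The key name and truncation below are hypothetical, and the exact scheme NFStream uses may differ from this sketch:

```python
import hashlib
import hmac

SESSION_KEY = b"random-per-run-secret"  # hypothetical per-run secret

def anonymize_ip(ip):
    """Deterministically map an IP string to an opaque hex token."""
    return hmac.new(SESSION_KEY, ip.encode(), hashlib.sha256).hexdigest()[:16]

a = anonymize_ip("192.168.1.10")
b = anonymize_ip("192.168.1.10")
c = anonymize_ip("8.8.8.8")
print(a == b, a == c)  # True False: consistent within a run, distinct across hosts
```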
|
162 | 171 | {
|
|
172 | 181 | "cell_type": "markdown",
|
173 | 182 | "metadata": {},
|
174 | 183 | "source": [
|
175 |
| - "* Filter data according to some criterias:" |
| 184 | + "Now that we have our DataFrame, we can analyze it like any other dataset. For example, we can compute additional features:\n", |
| 185 | + "\n", |
| 186 | + "* Compute the data ratio in both directions (src2dst and dst2src)" |
176 | 187 | ]
|
177 | 188 | },
|
178 | 189 | {
|
|
181 | 192 | "metadata": {},
|
182 | 193 | "outputs": [],
|
183 | 194 | "source": [
|
184 |
| - "df[df[\"dst_port\"] == 443].head()" |
| 195 | + "df[\"src2dst_bytes_data_ratio\"] = df['src2dst_bytes'] / df['bidirectional_bytes']\n", |
| 196 | + "df[\"dst2src_bytes_data_ratio\"] = df['dst2src_bytes'] / df['bidirectional_bytes']" |
185 | 197 | ]
|
186 | 198 | },
|
187 | 199 | {
|
188 |
| - "cell_type": "markdown", |
| 200 | + "cell_type": "code", |
| 201 | + "execution_count": null, |
189 | 202 | "metadata": {},
|
| 203 | + "outputs": [], |
190 | 204 | "source": [
|
191 |
| - "## Extend nfstream" |
| 205 | + "df.head()" |
192 | 206 | ]
|
193 | 207 | },
|
194 | 208 | {
|
195 | 209 | "cell_type": "markdown",
|
196 | 210 | "metadata": {},
|
197 | 211 | "source": [
|
198 |
| - "In some use cases, we need to add features that are computed as packet level. Thus, nfstream handles such scenario using [**NFPlugin**][nfplugin].\n", |
199 |
| - "\n", |
200 |
| - "[nfplugin]: https://nfstream.github.io/docs/api#nfplugin\n", |
201 |
| - "\n", |
202 |
| - "* Let's suppose that we want bidirectional packets with exact IP size equal to 40 counter per flow." |
203 |
| - ] |
204 |
| - }, |
205 |
| - { |
206 |
| - "cell_type": "code", |
207 |
| - "execution_count": null, |
208 |
| - "metadata": {}, |
209 |
| - "outputs": [], |
210 |
| - "source": [ |
211 |
| - "class packet_with_40_ip_size(NFPlugin):\n", |
212 |
| - " def on_init(self, pkt): # flow creation with the first packet\n", |
213 |
| - " if pkt.ip_size == 40:\n", |
214 |
| - " return 1\n", |
215 |
| - " else:\n", |
216 |
| - " return 0\n", |
217 |
| - " \n", |
218 |
| - " def on_update(self, pkt, flow): # flow update with each packet belonging to the flow\n", |
219 |
| - " if pkt.ip_size == 40:\n", |
220 |
| - " flow.packet_with_40_ip_size += 1" |
| 212 | + "* Filter data according to some criterias:" |
221 | 213 | ]
|
222 | 214 | },
|
223 | 215 | {
|
|
226 | 218 | "metadata": {},
|
227 | 219 | "outputs": [],
|
228 | 220 | "source": [
|
229 |
| - "df = NFStreamer(source=\"pcaps/google_ssl.pcap\", plugins=[packet_with_40_ip_size()]).to_pandas()" |
| 221 | + "df[df[\"dst_port\"] == 443].head()" |
230 | 222 | ]
|
231 | 223 | },
|
232 | 224 | {
|
233 |
| - "cell_type": "code", |
234 |
| - "execution_count": null, |
| 225 | + "cell_type": "markdown", |
235 | 226 | "metadata": {},
|
236 |
| - "outputs": [], |
237 | 227 | "source": [
|
238 |
| - "df.head()" |
| 228 | + "## Extend nfstream" |
239 | 229 | ]
|
240 | 230 | },
|
241 | 231 | {
|
242 | 232 | "cell_type": "markdown",
|
243 | 233 | "metadata": {},
|
244 | 234 | "source": [
|
245 |
| - "Our Dataframe have a new column named `packet_with_40_ip_size`.\n", |
246 |
| - "\n", |
247 |
| - "In some cases, we need volatile features.\n", |
248 |
| - "Let's have an example use case as following:\n", |
| 235 | + "In some use cases, we need to add features that are computed at packet level. NFStream handles such scenarios using [**NFPlugin**][nfplugin].\n", |
249 | 236 | "\n",
|
250 |
| - "* We want to compute the maximum per flow packet inter arrival time.\n", |
251 |
| - "* Our feature will be based on iat that we do not want as feature.\n", |
| 237 | + "[nfplugin]: https://nfstream.github.io/docs/api#nfplugin\n", |
252 | 238 | "\n",
|
253 |
| - "Note that such feature already implemented within nfstream statistical features." |
| 239 | + "* Let's suppose that we want a per-flow counter of bidirectional packets with an IP size of exactly 40 bytes." |
254 | 240 | ]
|
255 | 241 | },
|
256 | 242 | {
|
|
259 | 245 | "metadata": {},
|
260 | 246 | "outputs": [],
|
261 | 247 | "source": [
|
262 |
| - "class iat(NFPlugin):\n", |
263 |
| - " def on_init(self, pkt):\n", |
264 |
| - " return [-1, pkt.time] # [iat value, last packet timestamp]\n", |
265 |
| - " def on_update(self, pkt, flow):\n", |
266 |
| - " flow.iat = [pkt.time - flow.iat[1], pkt.time]\n", |
267 |
| - "\n", |
268 |
| - "class maximum_iat_ms(NFPlugin):\n", |
269 |
| - " def on_init(self, pkt):\n", |
270 |
| - " return -1 # we will set it as -1 as init value\n", |
271 |
| - " def on_update(self, pkt, flow):\n", |
272 |
| - " if flow.iat[0] > flow.maximum_iat_ms:\n", |
273 |
| - " flow.maximum_iat_ms = flow.iat[0]" |
| 248 | + "class Packet40Count(NFPlugin):\n", |
| 249 | + " def on_init(self, pkt, flow): # flow creation with the first packet\n", |
| 250 | + " if pkt.ip_size == 40:\n", |
| 251 | + " flow.udps.packet_with_40_ip_size = 1\n", |
| 252 | + " else:\n", |
| 253 | + " flow.udps.packet_with_40_ip_size = 0\n", |
| 254 | + " \n", |
| 255 | + " def on_update(self, pkt, flow): # flow update with each packet belonging to the flow\n", |
| 256 | + " if pkt.ip_size == 40:\n", |
| 257 | + " flow.udps.packet_with_40_ip_size += 1" |
274 | 258 | ]
|
275 | 259 | },
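The `on_init`/`on_update` lifecycle of the plugin above can be exercised without capturing any traffic by feeding it mock packets. The stand-ins below only mimic the attributes this plugin touches (`ip_size`, `flow.udps`); they are a toy harness, not nfstream's real NFPlugin machinery:

```python
from types import SimpleNamespace

class Packet40Count:  # same logic as the NFPlugin above, minus the base class
    def on_init(self, pkt, flow):  # called once, on the flow's first packet
        flow.udps.packet_with_40_ip_size = 1 if pkt.ip_size == 40 else 0

    def on_update(self, pkt, flow):  # called on every subsequent packet
        if pkt.ip_size == 40:
            flow.udps.packet_with_40_ip_size += 1

plugin = Packet40Count()
flow = SimpleNamespace(udps=SimpleNamespace())
packets = [SimpleNamespace(ip_size=s) for s in (40, 1500, 40, 40)]

plugin.on_init(packets[0], flow)     # first packet creates the flow
for pkt in packets[1:]:
    plugin.on_update(pkt, flow)      # remaining packets update it

print(flow.udps.packet_with_40_ip_size)  # -> 3
```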
|
276 | 260 | {
|
|
279 | 263 | "metadata": {},
|
280 | 264 | "outputs": [],
|
281 | 265 | "source": [
|
282 |
| - "df = NFStreamer(source=\"pcaps/instagram.pcap\", plugins=[iat(volatile=True), maximum_iat_ms()]).to_pandas()" |
| 266 | + "df = NFStreamer(source=\"tests/pcap/google_ssl.pcap\", udps=[Packet40Count()]).to_pandas()" |
283 | 267 | ]
|
284 | 268 | },
|
285 | 269 | {
|
|
295 | 279 | "cell_type": "markdown",
|
296 | 280 | "metadata": {},
|
297 | 281 | "source": [
|
298 |
| - "Our Dataframe have a new column named `maximum_iat_ms` containing the maximum observed packet \n", |
299 |
| - "inter arrval time per flow and set to -1 when there is only 1 packet." |
| 282 | + "Our DataFrame now has a new column named `udps.packet_with_40_ip_size`." |
300 | 283 | ]
|
301 | 284 | }
|
302 | 285 | ],
|
|