Jaejong dsn23
Abstract—We investigate the feasibility of targeted privacy attacks using only information available in physical channels of LTE mobile networks and propose three privacy attacks to demonstrate this feasibility: a mobile-app fingerprinting attack, a history attack, and a correlation attack. These attacks can reveal the geolocation of targeted mobile devices, the victim’s app usage patterns, and even the relationship between two users within the same LTE network cell. An attacker may also launch these attacks stealthily by capturing radio signals transmitted over the air, using only a passive sniffer as equipment. To assess the impact of these attacks on mobile users’ privacy, we perform evaluations in both laboratory and real-world settings, demonstrating their practicality and dependability. Furthermore, we argue that these attacks can target not only 4G/LTE but also the evolving 5G standards.
Index Terms—LTE, 4G, Fingerprinting, Privacy, Cellular, Machine Learning
I. INTRODUCTION
Mobile communication signals permeate the airwaves as the number of devices and users increases, while an average user spends almost 4 hours on their mobile devices every day [1]. The information that we send across these airwaves, and the mobile applications with which we send it, disclose intimate details about our lives, relationship statuses, sexual preferences, political views, and much more. These revelations can be so sensitive that even the disclosure of an installed app on a user’s phone can have significant personal and professional repercussions [2]. With the unprecedented and continuing growth of mobile networks, the protection of mobile user privacy is more pressing than ever.
One approach to protecting mobile user privacy is network traffic encryption, such as the Transport Layer Security (TLS) protocol. While TLS protects the contents of application traffic, research has shown that an attacker can reliably identify which mobile applications are in use by monitoring and fingerprinting unencrypted metadata in TLS traffic, such as destination IPs, destination ports, and TLS certificates [3]. Fortunately, modern mobile communication standards, such as Long-Term Evolution (LTE), encrypt even this data in the air interface (the radio layer) [4]. This means that previously-proposed attacks will only work when the attacker can access either the victim’s mobile device or the base station the victim’s device is connected to, where the attacker can sniff and analyze the transport layer traffic that is not protected by LTE encryption. In this case, compromising cell phones or breaking into base stations can be challenging in the real world. However, is this really necessary?
In this paper, we demonstrate that the physical channel of LTE alone is sufficient to carry out previously-proposed privacy attacks to identify which mobile applications are in use on a device. Because our attacks function on the physical layer of LTE, they can be carried out through self-deployed sniffers directly on the exact traffic transmitted in the air, and do not require physical or logical access to user equipment or network nodes. They also do not require the cooperation of mobile vendors. Worse, these techniques scale with the number (and coverage) of deployed sniffers, and by correlating the use of messaging applications between different devices, we could detect communication between targeted users. The capabilities that we describe in this paper allow an adversary with moderate resources, such as a nation-state or a local police department, to identify the pre-trained apps used by, and detect communication links between, targeted users, on a city-wide scale and with a reasonable success rate, using undetectable techniques.
To function on the physical layer of LTE, our attack cannot depend on techniques used by previous work, such as the analysis of IP addresses, ports, domain names, or TLS certificates. Instead, our mobile application fingerprinting framework uses physical-level information (e.g., transport block sizes) in wireless traffic to fingerprint mobile applications used by a victim. Using a labeled dataset collected by our framework, we trained a machine learning model to classify traffic and predict mobile applications that are in use. To ensure the practical feasibility of the proposed attacks in the real world, we assess the variables that affect the performance of the attacks, including changes in performance over time, the impact of multiple apps, and LTE handover. Based on this capability, we propose three privacy attacks against targeted mobile users: a mobile-app fingerprinting attack, a history attack that reveals the history of a user’s mobile application usage, and a correlation attack that reveals the relationship between any two mobile users.
Fig. 1: LTE Architecture and RRC Connection Procedure.
We evaluated our attack in both laboratory and real-world settings. Our dataset consists of a total of 350,521 traffic traces recorded over six months, including 220,278 traces recorded in a lab setting and 130,243 live network traces in a real-world setting across three major US mobile networks. In the real world, our mobile-app fingerprinting attack achieved F-Scores of 74% to 91%; our history attack achieved an 83% success rate, and our correlation attack showed varied but promising results, reaching up to 100% precision rates for certain classes of communication. Finally, we estimate the attack costs that an attacker needs to exert for continuous monitoring and deduce that the attacker can perform the same level of app fingerprinting and privacy attacks as a large organization.
Our attacks provide new insights into the vulnerability of LTE standards to side-channel attacks and potential directions for the security of LTE traffic. Because the evolving 5G standard suffers from similar weaknesses, we as a community must move quickly to address such weaknesses early. Our paper makes the following contributions:
• Physical-layer mobile application fingerprinting. We propose a mobile-app fingerprinting attack using only data available in LTE physical channels. This attack can be carried out by self-deployed sniffers, at scale, and cannot be detected by victims.
• Two novel privacy attacks in LTE networks. Atop the classification model, we propose two novel targeted privacy attacks (history attack and correlation attack), which allow attackers to reveal specific mobile users’ app usage patterns, locations, and even the relationship between users.
• Real-world evaluation. We evaluate our proposed attacks in both laboratory and real-world settings. Our attacks show high accuracy and reliability in both environments.
In the spirit of open science, we publicly release our lab-created dataset, the trained model, and the source code of our attack framework upon publication.1
1 https://github.com/sefcom/LTE-fingerprint
II. BACKGROUND
A. Overview of LTE Networks
As shown in Figure 1, LTE networks are comprised of three main components: user equipment (UE), evolved Node B (eNB), and evolved packet core (EPC). The UE provides end-users with access to cellular services. An eNB is a base station that provides a radio connection to the UEs. An EPC is a network that manages user registration and mobility. Multiple identifiers are broadcast to UEs to ensure proper data delivery via the unencrypted physical channel exposed to the public. A Radio Network Temporary Identifier (RNTI) is assigned to the UE by the eNB (②) using the DCI (Downlink Control Information) message, and a Temporary Mobile Subscriber Identity (TMSI) is assigned by the EPC (④). By assigning RNTIs to connected UEs, the eNB differentiates the user from other connected users. Similarly, TMSIs are unique identifiers assigned by the EPC when UEs are registered in LTE networks; they are limited to the context of the serving eNB. When UEs move to new cells, the TMSIs will no longer be valid. An eNB communicates scheduling information to its connected UEs through DCI messages that are carried within a Physical Downlink Control Channel (PDCCH), which is an unencrypted channel. While the actual traffic is sent over dedicated channels in encrypted forms, e.g., the Physical Uplink Shared Channel (PUSCH) and the Physical Downlink Shared Channel (PDSCH), PDCCH transmits DCI messages in plain text, which makes these messages trivially decodable.
To collect only a specific user’s traffic, it is necessary to know the RNTIs of the UE. The RNTI may change randomly, however, based on network policies or UE activity. For instance, to save power and network resources, a UE enters “idle mode” if no data is transmitted between the UE and its eNB for a threshold of time (default 10 s). When the eNB sends a paging call to the idling UE, the UE will switch to “connection mode”, reconnect to the eNB, and receive a new (usually different) RNTI from the eNB. Thus, the validity of an RNTI depends in part on the application-layer protocol of mobile apps [12]. The longer a mobile app transmits data and keeps the UE in connection mode, the longer its RNTI will stay valid.
B. Known Privacy Attacks in Mobile Networks
Privacy attacks in mobile networks have mainly been covered in terms of identity mapping, location leaking, and Web fingerprinting using side-channel information on man-in-the-middle or locally-intercepted traffic.
UE Identity Mapping. International Mobile Subscriber Identity (IMSI) is the permanent unique number of mobile subscribers. IMSI catching is a well-known attack aiming to reveal the number stored inside the UE [13], [14]. This can be done passively or actively, but passive opportunities to do so are rare. In most cases, it relies on an active attacker, such as a fake base station [15] or an overshadowing attack [16] that requires sending messages to the victims, or IMSI Extractor [17], which uses undetectable low-power message overshadowing with sniffers. Also, TMSIs and RNTIs can be leaked by decoding the sniffed traffic over the
TABLE I: Comparison of network fingerprint attacks (⃝: supported, X: unsupported, △: partially).
Capabilities | Wired [5], [6] | Wi-Fi [7], [8] | LTE [9]–[11] | Ours (LTE) | Our Approach
Physical layer fingerprints | X | ⃝ | △ | ⃝ | Decoding Physical Channel (section IV)
Multiple categories classification | ⃝ | ⃝ | X | ⃝ | 3 Classes: Streaming, Mobile VoIP, IM apps classification (section VI)
Targeted privacy attacks | X | △ | X | ⃝ | History attack (section VII-B), Correlation Attack (section VII-C)
Real-world evaluation | ⃝ | ⃝ | △ | ⃝ | U.S. major Mobile Network Operators (MNOs) (section VII-A)
air. Both are temporary identification numbers, and normally a TMSI is much longer-lived than an RNTI. The feasibility of the mapping between permanent and temporary IDs allows an adversary to reveal the subscriber’s identity, which is called an identity mapping attack. The first identity mapping attack was proposed by passively matching RNTI and TMSI in the Random Access Channel (RACH) procedure [11]. The attacker learns the identity of a user by eavesdropping on the connection establishment procedure, which is not detectable and completely passive, and which is a building block of our approach. In a different attack, Kohls et al. [10] proposed actively injecting watermarks in the traffic to identify the traffic of the specific user. Typically, an identity mapping attack serves as a stepping stone for the targeted attack [18], which allows the adversary to identify and track the specific user within a cell for gathering the critical data (or traffic) in a stealthy manner.
Location leaks. Due to the prevalence of the cellular network, the geographic location tracking of the mobile subscriber is a common research interest, for example by intercepting the victim’s paging channel and calling or texting the victim to invoke paging messages. The radio signal also leaks the tracking area code, the received power threshold to trigger a handoff to an adjacent cell, and a series of configuration parameters that could be leveraged to configure a rogue base station [19], [20]. Additionally, the timing advance parameter in the signaling plane is also used for location-based attacks [21].
Traffic Fingerprints. Research on traffic fingerprinting and classification has been extensive, ranging from the first attacks on Secure Socket Layer (SSL) traffic in 1998 by Cheng et al. [22] to the latest deep learning attacks on website traffic [3], [6], [23]–[25]. Table I shows comparisons of prior network fingerprint attacks performed on the wired, Wi-Fi, and LTE networks. The patterns of our collected traffic are highly similar to the transmission and encryption characteristics of HTTP and TCP/IP protocols.
While datasets of encrypted website fingerprints are available [3], [23], [24], few LTE traffic datasets have been collected at the physical channel for fingerprinting. Therefore, we create these datasets by collecting LTE traffic via the physical channels in a lab setting and a real-world setting.
C. Targeted Attacks
A targeted attack is a type of dedicated attack that aims at a specific user or group to gain access to critical data in a stealthy manner [18]. Our attack is aligned with the characteristic of targeted attacks, which is intended for pursuing specific victims the attacker is interested in and obtaining critical data in a hidden manner. Also, the attackers may wait for the appropriate opportunity to execute the devised attack.
D. Related work
Bae et al. [9] performed a video identification attack to identify mobile users who are watching specific videos and predict the video title that each of these users watches by fingerprinting labeled datasets in LTE networks. Their attacks achieved a high accuracy against video streaming. However, the performance and scalability of the attack could be limited to detection with known video stream traces. Kohls et al. [10] investigated a passive fingerprinting attack and an active identity-mapping attack on encrypted LTE/4G layer-two traffic in a private lab and a commercial network. They inject specific letter patterns into the mobile app to generate a specific signal pattern, which makes the attack impractical in the real world because the user can detect it, which leads to the blocking of the apps. Subsequently, Rupprecht et al. [11] presented an LTE layer-two security analysis and introduced a passive identity mapping attack and website fingerprinting attacks. However, they limited the scope of the capability to fingerprinting websites within experimental data and provided a proof-of-concept. Thus, a commercial network test for practicality is limited. As summarized in Table I, our approach is not the first attempt at fingerprinting LTE traffic, nor the first to propose privacy attacks. However, compared with previous work, we extend the targeted privacy attacks to derive the user’s history of activity and relationship with other callers in a passive manner. Also, we focus on the practicality of the attacks with various vectors based on real-life scenarios such as asynchronous sessions, background traffic noise, handovers, and the attack cost.
III. OVERVIEW
A. Threat Model
To determine the targeted victim, we assume that an attacker may profile the victims based on Open Source Intelligence (OSINT), such as social network accounts or websites providing individual information. Also, the attacker may stay in the victim’s physical cell and access the cell coverage.
We assume that the attacker’s sniffer is pre-installed within the target range of an LTE cell/eNB (or several sniffers within the range of several LTE cells) that victims are connected to. The sniffer also supports the LTE radio cell scanning feature to search the specific frequency (channel) and MNO information used for targeting the victim. All of the attacker’s other capabilities derive from this requirement. By fulfilling the requirement, the attacker can collect wireless messages from a sole downlink or
uplink channel with one sniffer. If an attacker wants to collect wireless messages from multiple N channels, N sniffers are required.
In this way, the attacker collects and analyzes “traces” (that is, physical channel metadata) of a victim’s LTE communications to identify apps on the victim’s mobile device and perform the aforementioned privacy-infringing attacks. The attacker can transfer the collected traces for offline analysis, whether to a cloud or private computing center for a pre-trained model to analyze the data. In addition, the attacker can launch attacks in a stealthy manner remotely, using their own multiple devices and their own infrastructure. These requirements can easily be satisfied by ordinary people at an affordable cost (500 to 1,000 USD per Software Defined Radio (SDR)-based sniffer, plus computing power).
B. Attack I. Mobile Apps Fingerprinting Attack
Our mobile app fingerprinting attack is the foundation of the next two privacy-infringing attacks. It begins with data collection: recording the victim’s RNTI, IMSI, and radio traffic between the victim’s UE and the eNB. The attacker can then use a pre-trained classifier to identify application fingerprints in the data, matching individual users to the applications used. Over the entire process, the attacker does not need to break any encryption or actively compromise any devices or infrastructure (such as the victim’s cellphone or any base stations). All that is required is a passive sniffer, which allows the attack to be extremely stealthy, and off-the-shelf physical channel LTE parsing routines.
C. Attack II. History Attack
The mobile app fingerprinting attack allows an attacker to deduce which apps a user (victim) is running at any specific time. By analyzing all LTE traffic of a victim over a certain period of time, we can recognize the app usage patterns of this victim. Since prior work allows us to associate the victim’s UE and all random RNTIs that eNBs assign to it [26], [27], we will then be able to reveal the victim’s movement history together with their app usage patterns.
Figure 2 exemplifies a history attack on User A, who is staying within or moving between (roaming or handing over) cell zones of the same mobile operator. User A frequently moves between Cell Zone A’ (their home), Cell Zone B’ (their workplace), and Cell Zone C’ (a grocery store). An attacker has pre-installed traffic sniffing devices in these cell zones and can continuously trace RNTIs of the victim’s UE across zones (i.e., handover between cells) using an IMSI catcher [13], or by performing the identity mapping attack [11] described in Section III-E. By launching the history attack, the attacker will reveal the victim’s movement history and their app usage in each location cooperatively.
Fig. 2: Examples of history attack and correlation attack.
D. Attack III. Correlation Attack
Since capturing messages in the LTE physical channel is easy and affordable, we extend our attack to multiple victims and propose a correlation attack: based on application usage patterns that two UEs have at the same time, an attacker may infer the relationship between the victims without having any prior knowledge of the victims or their app usage patterns. When two users communicate with each other using the same app, their LTE traffic patterns could be similar. If traffic patterns of two UEs correlate, we can learn that their users may be related in some way. For example, User A and User B in Figure 2 chat with each other on a messaging platform. An attacker could infer their relationship (based on communication) and level of intimacy (based on the application used to communicate) by launching a correlation attack.
We leverage correlation analysis to predict the interrelationships between users’ traffic patterns. Correlation analysis is a statistical method for measuring the strength of a relationship. We collect network traffic of mobile apps for each pair of User A and B and generate graphs with respect to the number of frames. With the help of pre-trained app fingerprints, we can identify apps in use and the exact type of application (e.g., VoIP calls, video streaming, etc.), and then extract traffic patterns from the traffic. By calculating the similarity between traffic patterns, we discover the degrees (similarity scores) of the relationship between two victims.
E. Overview of the Attack Framework
Figure 3 illustrates the high-level procedures of the proof-of-concept framework that we built for the proposed attacks. Our framework enables over-the-air, automated, and detailed traffic profiling in LTE physical channels, and implements the LTE physical-channel-based mobile app fingerprinting attack as four procedures: identity mapping, data acquisition, data pre-processing, and training and classification.
➊ Target Identity Mapping. First, target identity mapping matches an RNTI to a TMSI or IMSI, enabling us to identify users within a cell and serving as an important prerequisite step for performing the extended privacy attacks in terms of collecting the traffic of specific apps of the user continuously. We leverage the prior passive method [11] to map the RNTI to the TMSI by exploiting the contention-based resolution identity of the RRC connection setup message as shown in Figure 1. To this end, we first collect and maintain a list of active RNTIs using open-source software OWL [28] which identifies UEs
Fig. 3: Overview of the proof-of-concept framework for fingerprinting mobile apps using LTE physical channel messages
within a given cell. Although the RNTIs are refreshed by the eNB, we can detect these changes and correctly associate the UEs by decoding the corresponding DCI (Downlink Control Information) records with CRC bit masking over time using this method. Alternatively, to map the TMSI to the persistent identifier (IMSI), we could also consider the existing active methods [13], [29], [30] mentioned in Section II-B, which inject watermarks in the traffic or perform man-in-the-middle attacks. Once we adopt and employ them, our attacks are no longer performed entirely with a passive sniffer in this step. If we insist on the passive mode in the attack, our attack supports only passive mapping to a TMSI.
➋ Data acquisition. Next, the framework sniffs LTE data off the airwaves to decode DCI messages to obtain the raw data that will be used in the attack.
➌ Data preprocessing. Next, the framework generates traces based on decoded RNTIs and sorts traces by the user, as determined by the RNTI and IMSI. These traces are used in two ways according to their purpose.
➍ Training and classification. With a dataset of traces and ground truth for the applications used over time in the dataset, the framework trains a classifier. Given a trained classifier, the framework queries the classifier to determine what applications are responsible for the observed traces. We first identify the class of the application and then identify individual apps subsequently.
In this way, we use the trained classifier to fingerprint mobile apps by analyzing the DCI messages collected from a sniffer. In addition to the capability of fingerprinting mobile apps in the framework, we implemented history and correlation attacks. We will present the details of the framework in the following sections.
IV. DATA ACQUISITION AND PILOT STUDY
The very first step of the fingerprinting attack is collecting data that will later be used in the machine learning component. Our attacking framework uses side-channel leaks in DCI messages (Section II-A), such as sizes of the payload and intervals between consecutive messages, as features for classification. In the rest of this section, we will detail how the framework collects and decodes DCI messages, as well as which mobile apps to include in our study.
A. Selecting Mobile Apps
We select nine popular mobile apps from three categories that are representative of common mobile activities: streaming, messaging, and VoIP. Video streaming is a common target of traffic differentiation [31], while VoIP and IM apps allow social connections to be inferred. For streaming, we picked Netflix, YouTube, and Amazon Video; for messaging, we picked Facebook Messenger, WhatsApp, and Telegram; and Facebook Call, WhatsApp, and Skype were selected for VoIP. At the time of writing, these apps topped the charts in their corresponding categories on Google Play Store. Note that our attack is not specific to Android apps, since these apps all have counterparts on other mobile platforms, such as iOS.
B. Gaining Insights from Observations
Any data-driven method requires a comprehensive understanding of data characteristics and recognition of key patterns derived from these characteristics. To this end, for our selected mobile apps, we conduct controlled experiments to collect the traffic over LTE physical channels in a lab setting where variables can be manipulated and monitored. For each app, we manually created Application-Layer sessions using the following approach:
• Streaming apps. We pick several different videos and play each video for 10 minutes.
• Messaging apps. We send and receive text messages, files, voices, and emoticons for 10 minutes. These actions are automatically performed by Auto Clicker [32].
• VoIP apps. We play a piece of music or a speech for 10 minutes.
Streaming apps. These mobile apps generally use existing Application Layer protocols, such as Real-Time Streaming Protocol (RTSP), Dynamic Adaptive Streaming over HTTP (DASH), and Real-Time Messaging Protocol (RTMP) [33]. As a result, they segment their content at the Application Layer. We could spot traffic patterns and the impact of these protocols. For example, in the case of Netflix, frame sizes are distributed almost uniformly between 0 and 4000 bytes, and the intervals between traffic bursts are relatively long. In the cases of Amazon Prime Video and YouTube, we observe a more continuous frame transmission pattern with much shorter intervals between bursts (if there are any). Compared with traffic patterns of other mobile apps, video streaming apps seem to use much more radio resources at the beginning of each session (intuitively, due to video buffering).
Messaging apps. Intuitively, the traffic patterns of instant TABLE II: Selected features for the classifier
messaging (IM) apps are of a dynamic nature, since users Type Features
at both ends of a chat have full control over what messages
Time vector Interarrival time, Cumulative time
and media files to send when to send them, etc. We could
observe that IM apps tend to close a session (in the Application Size vector Frame size (block size)
Layer) if neither end of a chat sends anything for some time Direction vector Uplink, Downlink
(usually a few or tens of seconds) to save resources. When
the Application-Layer session is closed, the old RNTI may Identity vector RNTI
timeout; Once the chat is resumed and a new Application-
Layer session is re-established, a new RNTI is assigned in the Accordingly, we select the features from decoded traffics as
Physical Layer of LTE. As a result, the use of IM apps usually shown in Table II. Interarrival time refers to the time difference
involves a more frequent changing of RNTIs 2 . between the arrival of the frame and then the arrival of the next
VoIP apps. This class of apps usually involves a continuous frame. Cumulative time refers to the sum of interarrival time.
transmission and a more constant usage of radio resources Frame size is the size (in bytes) of one frame which is defined
throughout their sessions. Further, VoIP apps are the only class as Transport Block Size (TBS) in decoded LTE PDCCH, and
of mobile apps with a significant and similar amount of data it is the same size as the payload in the data link layer.
transmitted in both directions (uplink and downlink). Since The uplink and downlink features are used for identifying
mobile VoIP calls are usually bidirectional communication, the relations between two callers. Suppose the sender sent
this observation is reasonable. a specific amount of data at a certain time and the receiver
received an equal amount at that time, then we can assume
C. Configuring Devices for Collecting Data they communicated at that time.
Realizing that different categories of mobile apps have To train our classifier, we create a labeled dataset (trace
vastly different traffic patterns (at least in a lab setting), group) by running selected apps on UEs, detecting the RNTI
we continued to pursue our research by performing data within the PDCCH DCI data traces, and associating the
collection in more realistic settings. In the lab setting, we corresponding DCI trace with a label that identifies the app.
used a commercial off-the-shelf (COTS) radio device (USRP We made this labeling possible by identifying RNTIs from
B210 [34]) and a programmable SIM card [35]. We utilized 5 the traffic that we generated by the UE so that it could be
different UEs to collect the traffic: Google Pixel 3XL (Android identified among all other users. This trace grouping process
Pie), Samsung Galaxy Note 8 (Android Pie) and 10+ (Android is also applied to the classification attack phase except for
10), Motorola E5 (Android Pie), and iPhone11 pro (iOS13). labeling.
For the real-world setting, we used a pre-paid SIM card for VI. T RAINING AND C LASSIFICATION
each of the major US mobile carriers: Verizon, AT&T, and
Three major classes of apps show different patterns based on
T-Mobile. We use a server (Intel Core i7-4760 3.6GHz with
the frame size and its frame time. The identification of apps
32 GB RAM) with Ubuntu Server 18.04 LTS. Once the UE
from traffic is a classification problem where the adversary
established a connection with an eNB, we followed the same
gathers labeled traffic traces of candidate apps for a training set
traffic-generation strategy as described in Section IV-B to
to later test against unlabeled network traffic traces. The buffer
generate and collect traffic.
size for rendering, the feature of the video codec, and radio configurations could impact these patterns [12]. To adapt to this dynamic nature, we must use machine learning algorithms to build a classifier.

V. DATA PREPROCESSING

To generate fingerprints and train the model, we must select features from the network traffic. We extracted features from the collected data frames using pdsch_ue, a tool from the open-source srsLTE suite [36]. To distinguish traffic, we expect the sizes and intervals of the frames to differ depending on the behavior of the apps and their communication with back-end servers. For instance, streaming apps such as Netflix maintain a session for a certain period of time, with buffering on both the server and client side. IM apps, in contrast, show very different frame sizes and intervals from streaming services. To account for these differences in app behavior, we focus on the size and time variables of the decoded DCI messages.

2 We used input automation software to create continuous chat sessions so that data collection avoided RNTI refreshes. This may not be realistic, but we adopt this mimicking method for detection purposes.

To build a classifier from the recorded datasets, we used Random Forests (RF) [37] and the Weka open-source machine learning software [38]. RF handles multi-dimensional data with complex, nonlinear relationships between features and is relatively fast to train, so it scales easily to large datasets. For additional details, we provide a benchmark of learning classifiers in Section VIII-D to explain why we prefer RF over other algorithms. To assure the practicality of our framework in real-world settings, we consider asynchronous sessions, where the machine learning algorithm has no knowledge of where the sessions in the trace begin and end. Each session is split using a time window (in msec) that moves from the beginning of the session to the following frames in fixed time steps. We empirically set the time window to 100 ms and aggregate the frames in each window.
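The windowing step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each decoded frame is a (timestamp in ms, size in bytes) pair, uses non-overlapping 100 ms windows, and aggregates each window into a frame count and a byte total. The function name `aggregate_windows` and the choice of per-window features are hypothetical.

```python
from collections import defaultdict

def aggregate_windows(frames, window_ms=100):
    """Group (timestamp_ms, size_bytes) frames into fixed time windows.

    Returns (window_start_ms, frame_count, total_bytes) tuples in time order.
    The per-window features are an assumption made for illustration.
    """
    buckets = defaultdict(lambda: [0, 0])  # window index -> [count, bytes]
    for timestamp_ms, size_bytes in frames:
        index = int(timestamp_ms // window_ms)
        buckets[index][0] += 1
        buckets[index][1] += size_bytes
    return [(i * window_ms, c, b) for i, (c, b) in sorted(buckets.items())]

# Hypothetical decoded frames: (timestamp in ms, block size in bytes)
frames = [(3, 120), (40, 80), (150, 1500), (199, 60), (420, 300)]
windows = aggregate_windows(frames)
```

Windows without any frame are simply absent from the result; a classifier that expects a regular time series would need them filled with zeros.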
We tested different window sizes to derive the optimal value of this parameter for the performance of the model. Through this splitting procedure using sliding windows, we obtain the synchronization points where each session presents patterns similar to those learned by the classifier. Retraining the classifier. Traffic patterns of a mobile app may change over time and deviate from the original pattern collected for training. As such, we may train and update the classification model in an adaptive manner if a trace is unclassified or incorrectly classified for the same activity. Likewise, if a trace is detected with a low F-Score due to the introduction of a new carrier, a new eNB, or a new mobile app, the trace is transferred to the Training and Learning phase so that the model can adopt it. We discuss this retraining issue as the updating cost in Section VII-D.

VII. EVALUATION

Ethical consideration. We used a Faraday cage [39] to block the radio signal between our experimental lab setting and the commercial LTE networks. Thus, our experiments did not cause any disruption to the live LTE networks or to mobile devices other than the ones used for our evaluation, as wireless signals were blocked from reaching unsuspecting UEs of real-world users. The real-world experiments required our UEs to communicate with the base stations of the commercial mobile providers, but this communication was done using standard, unmodified UEs and followed normal LTE protocols. We ran no base station in the real-world experiment, only a passive sniffer to capture the traffic between UEs and the providers’ base stations. Additionally, our Institutional Review Board (IRB), which oversees research integrity and ethics, requested that we only store data from our own UEs and discard data from other UEs. We accomplished this by filtering for the RNTIs used by our UEs. While this reduces the types of measurements we can make from the dataset, it ensures that the private data of users uninvolved in (and fundamentally unable to consent to) the experiment is protected.

Lab vs. Real-world settings. We evaluate our attacks in two different environments: the lab setting and the real-world setting. The former setting used our own self-configured LTE base station (eNodeB) and our own UEs. As a result, we have a high level of control over this experiment, which is useful for verifying the proper functionality of our implementation and the efficacy of our approach. Additionally, since this dataset contains no data of real users (even in encrypted form), we can release this dataset to aid the work of future researchers.

In comparison, the real-world setting is based on traffic captured from live commercial mobile networks of three major US mobile carriers: Verizon, AT&T, and T-Mobile. We registered our own commercial UEs on these networks and measured our system’s ability to perform the described attacks on these UEs. Figure 4 illustrates the components for the tests in both the lab and the real-world settings.

Fig. 4: Experimental setup for lab and real-world settings.

Data collection. To capture and decode the LTE physical downlink control channel (PDCCH), we customized the pdsch_ue module of the open-source LTE software tool srsLTE [36]. Then, we extract frame metadata, including timestamp and total block size in bytes, from the decoded traffic. This is purely a decoding (not decryption) task, using standard LTE protocol decoding tools. The two extracted values are then used to generate the time series as described in Section V. This data collection procedure is repeated 10 times for a total of over 3,000,000 instances for Streaming, Messaging, and VoIP apps. The selected mobile apps are listed in Section IV-A.

Building the training dataset. To build a labeled training dataset in the lab and the real-world setting, we generated mobile-app traffic using our UEs and repeatedly captured the traffic. For each app, we used up to 350,000 samples in both the lab and real-world settings, capturing the traffic of each app for 10 minutes per trace. Network traces for each of the interactions were collected separately for 6 months per group.

Traffic patterns and frame metadata are sensitive to operator-specific configuration, such as the resource scheduling algorithms that eNodeBs use, which affect the radio resource allocation. Therefore, we build datasets and train our framework separately for each mobile network operator in the real-world setting.

Data drift in LTE traffic. We see that the properties of network traces change over time and that base stations at different geographic locations have their own settings, such as microcells and macrocells. Therefore, it is necessary to update the dataset periodically to maintain successful attack performance. We discuss the implications of this for real-world uses of these attacks in Sections VII-D and VIII.

A. Mobile-app Fingerprinting Attack

1) The Laboratory Setting: Table III shows the classification results in the lab setting. The F-Scores are 98% to 99% for Streaming apps, 93% to 95% for Messaging apps, and 97% to 99% for VoIP apps. This means our classifier can reliably recognize mobile apps based only on data readily available in LTE physical channels. Consequently, we proceed with our fingerprinting attack on a commercial network setup, which illustrates how fingerprinting attacks can be executed in a real-world scenario.

2) The Real-world Setting: The commercial mobile network exhibits a dynamic and evolving nature [40], which may significantly affect the features that our classifier relies on. We discuss the various types of real-world factors (time,
TABLE III: Mobile app classification results in the laboratory setting with Random Forest. The results of a classification in the controlled environment show that we can fingerprint the application running on the target mobile device with high accuracy by capturing wireless signals.
Category   Mobile App     | Down+Up                     | Down (1)                    | Up (0)
                          | F-Score  Precision  Recall  | F-Score  Precision  Recall  | F-Score  Precision  Recall
Streaming  Netflix        | 0.991    0.991      0.996   | 0.991    0.991      0.991   | 0.989    0.990      0.988
Streaming  YouTube        | 0.996    0.995      0.996   | 0.996    0.995      0.996   | 0.995    0.994      0.995
Streaming  Amazon Prime   | 0.988    0.988      0.988   | 0.988    0.988      0.988   | 0.987    0.986      0.987
Messenger  Facebook       | 0.949    0.966      0.932   | 0.950    0.968      0.933   | 0.873    0.929      0.824
Messenger  WhatsApp       | 0.952    0.945      0.959   | 0.934    0.923      0.944   | 0.941    0.930      0.954
Messenger  Telegram       | 0.931    0.929      0.933   | 0.917    0.907      0.917   | 0.903    0.901      0.906
VoIP call  Facebook       | 0.975    0.971      0.979   | 0.973    0.974      0.972   | 0.973    0.972      0.975
VoIP call  WhatsApp       | 0.986    0.988      0.983   | 0.984    0.985      0.984   | 0.985    0.986      0.983
VoIP call  Skype          | 0.996    0.996      0.996   | 0.995    0.995      0.996   | 0.996    0.996      0.996
TABLE IV: Mobile app classification results in a real-world setting (Downlink Only) with Random Forests. In real mobile networks, precision, recall, and F-score values decrease by 5-30%. Nevertheless, we can still identify the apps with sufficient confidence.
Category   Mobile App     | Verizon                     | AT&T                        | T-Mobile
                          | F-Score  Precision  Recall  | F-Score  Precision  Recall  | F-Score  Precision  Recall
Streaming  Netflix        | 0.818    0.821      0.815   | 0.811    0.821      0.801   | 0.811    0.809      0.814
Streaming  YouTube        | 0.789    0.799      0.798   | 0.793    0.789      0.798   | 0.798    0.801      0.795
Streaming  Amazon Prime   | 0.854    0.852      0.856   | 0.858    0.862      0.854   | 0.858    0.869      0.847
Messenger  Facebook       | 0.841    0.841      0.842   | 0.840    0.853      0.827   | 0.829    0.833      0.826
Messenger  WhatsApp       | 0.791    0.794      0.789   | 0.790    0.792      0.789   | 0.789    0.794      0.784
Messenger  Telegram       | 0.748    0.751      0.746   | 0.748    0.749      0.747   | 0.747    0.749      0.746
VoIP call  Facebook       | 0.901    0.900      0.902   | 0.839    0.891      0.895   | 0.911    0.912      0.911
VoIP call  WhatsApp       | 0.794    0.799      0.789   | 0.791    0.789      0.794   | 0.779    0.780      0.778
VoIP call  Skype          | 0.866    0.877      0.855   | 0.863    0.871      0.855   | 0.854    0.851      0.858
TABLE V: History attack results showing the duration of traffic capture, F-score, and correctness of the classification. From
the empirical observation, the prediction results become unstable if the F-score falls below 70%.
Location Date Start Time End Time Duration Categories F-score Prediction Result
Zone A’ 05/08/2022 15:00:08 15:06:07 0:05:59 Streaming apps 85.35% Netflix TRUE
Zone B’ 05/08/2022 15:30:10 15:35:25 0:05:15 Messaging apps 67.32% Telegram FALSE
Zone C’ 05/08/2022 17:10:08 17:18:05 0:07:57 VoIP apps 85.35% Netflix TRUE
Zone A’ 05/08/2022 17:25:55 17:35:52 0:09:57 Streaming apps 85.54% YouTube TRUE
Zone B’ 05/08/2022 18:12:25 18:18:06 0:05:41 Messaging apps 78.12% Facebook TRUE
Zone A’ 05/09/2022 11:10:52 11:16:51 0:05:59 VoIP apps 72.12% WhatsApp TRUE
Zone B’ 05/09/2022 12:00:08 12:06:07 0:05:59 Messaging apps 78.12% Netflix TRUE
Zone C’ 05/09/2022 12:35:25 12:41:17 0:05:52 Streaming apps 69.65% Amazon FALSE
Zone A’ 05/10/2022 17:12:16 17:22:00 0:09:44 Streaming apps 88.12% YouTube TRUE
Zone B’ 05/10/2022 17:32:16 17:39:26 0:07:10 VoIP apps 66.45% Skype TRUE
Zone A’ 05/10/2022 18:15:00 18:21:11 0:06:11 Messaging apps 90.52% Facebook TRUE
Zone A’ 05/10/2022 18:25:00 18:31:26 0:06:26 Streaming apps 89.41% Netflix TRUE
multiple apps, handover, and location) that can affect the accuracy of the classifier in Section VIII. Table IV shows the experimental results, where the F-Scores are 79% to 86% for Streaming apps, 74% to 84% for Messaging apps, and 77% to 91% for VoIP apps. As expected, the classification performance is lower than in the lab setting, and we observe noticeable drops in F-Scores (roughly 6 to 21 percentage points relative to the downlink results in Table III). However, even with this efficacy reduction, application fingerprinting remains relatively robust.

B. History Attack

To evaluate the history attack, we set up a radio monitoring environment (Figure 2) on the T-Mobile network (Figure 5, depicted by Cellmapper [41], which is a crowd-sourced cellular tower and coverage mapping service). Each data collection device (sniffer) records and decodes traffic within each cell zone (attack zone). In the experiment, the UE roamed (handover) between cells, and the victim used different apps for at least 10 minutes within each cell zone. As discussed earlier, we tracked only the UEs directly associated with our experiment. Afterward, we integrated all collected traffic into one dataset, which was then used for mobile-app fingerprinting and location tracking of UEs.

Table V shows the experimental results. We attempted a history attack 12 times in total over 3 days and successfully detected
TABLE VI: Similarity scores ($D(T_w, T_a)$) of the captured traffic traces. The mean values show how similar the traffic pairs are. Each cell gives Mean (STD-DEV).
           | Messaging                                     | VoIP
Setting    | Facebook       WhatsApp       Telegram        | Facebook Call  WhatsApp Call  Skype
Lab        | 0.850 (0.131)  0.862 (0.129)  0.750 (0.125)   | 0.896 (0.085)  0.886 (0.085)  0.930 (0.128)
AT&T       | 0.754 (0.088)  0.772 (0.089)  0.654 (0.091)   | 0.724 (0.066)  0.754 (0.064)  0.643 (0.080)
T-Mobile   | 0.716 (0.082)  0.736 (0.086)  0.688 (0.063)   | 0.696 (0.059)  0.741 (0.096)  0.676 (0.091)
Verizon    | 0.692 (0.100)  0.652 (0.077)  0.634 (0.095)   | 0.672 (0.057)  0.782 (0.099)  0.610 (0.052)
Average    | 0.721 (0.085)  0.730 (0.078)  0.653 (0.082)   | 0.697 (0.075)  0.759 (0.088)  0.642 (0.083)
TABLE VII: Precision and recall values for the similarity classification in correlation attacks. The values are the results of classification through logistic regression. Contact between users is easier to identify for VoIP apps. Each cell gives Precision / Recall.
           | Messaging                                     | VoIP
Setting    | Facebook       WhatsApp       Telegram        | Facebook Call  WhatsApp Call  Skype
Lab        | 0.891 / 0.833  0.865 / 0.834  0.975 / 0.961   | 1.000 / 0.980  0.993 / 0.937  1.000 / 0.994
AT&T       | 0.726 / 0.697  0.659 / 0.637  0.734 / 0.694   | 0.698 / 0.689  0.736 / 0.725  0.792 / 0.766
T-Mobile   | 0.728 / 0.727  0.773 / 0.761  0.773 / 0.767   | 0.774 / 0.750  0.869 / 0.846  0.736 / 0.738
Verizon    | 0.814 / 0.825  0.727 / 0.699  0.788 / 0.792   | 0.756 / 0.767  0.774 / 0.767  0.716 / 0.719
Fig. 5: History attack setup on T-Mobile Network.

Fig. 6: Privacy attack procedure.

all mobile apps used in 10 of those cases, achieving an 83% success rate. Our results confirmed that through the history attack, an attacker may obtain another user’s private information, such as their per-location app usage, with high confidence and without being noticed by the victim.

C. Correlation Attack

As described previously in Section III-D, the attacker may use the LTE traffic patterns captured from different UEs to identify relationships between users (i.e., correlation attack). Figure 6 shows three consecutive steps in our correlation attack. (1) Radio Scanning scans the broadcast channel to identify the victim’s radio information, (2) App Detection uses a hierarchical classification method based on Random Forest, and (3) Similarity Calculation uses the Dynamic Time Warping (DTW) algorithm as a distance metric to compare recorded traces [42].

To use DTW in our analysis, we generated 10 VoIP and instant messaging traffic traces for each pair of metadata to measure the similarity in both lab and real-world settings, where $T_w$ is the time threshold treated as one unit of traffic in the calculation and $T_a$ is the number of frames in a time window. By default, we set $T_w$ to 1 second. After creating a cost matrix, DTW compares the two time series using the Euclidean distance, represented as $d(T_w, T_a)$, and finds the similarity value $D(T_w, T_a)$ using Equation (1):

$$
D(T_w, T_a) = d(T_w, T_a) + \min
\begin{cases}
D(T_w - 1, T_a - 1), \\
D(T_w - 1, T_a), \\
D(T_w, T_a - 1)
\end{cases}
\qquad (1)
$$

Table VI summarizes the similarity scores (i.e., $D(T_w, T_a)$) for predicting users’ relationships in both the laboratory setting and the real-world setting. In each setting, we ran the experiment 10 times for each mobile app. We see that our attacks were largely successful, with similarity scores ranging from 0.61 to 0.93. The results show that traffic traces can be used to reliably detect contact between users.
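The recurrence in Equation (1) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the two traces are one-dimensional sequences (for example, bytes per time window), the local distance is the absolute difference (the Euclidean distance in one dimension), and the function returns the raw accumulated cost. How the reported similarity scores in the 0 to 1 range are derived from this cost is not described in the text, so no normalization is attempted here. The name `dtw_distance` is hypothetical.

```python
def dtw_distance(series_a, series_b):
    """Accumulated DTW cost between two 1-D sequences.

    Implements D(i, j) = d(i, j) + min(D(i-1, j-1), D(i-1, j), D(i, j-1)),
    with d(i, j) = |a_i - b_j|.
    """
    inf = float("inf")
    n, m = len(series_a), len(series_b)
    # cost[i][j] holds D(i, j); row and column 0 are the boundary.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(series_a[i - 1] - series_b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j - 1],
                                     cost[i - 1][j],
                                     cost[i][j - 1])
    return cost[n][m]

# Hypothetical per-window traffic volumes of two UEs
trace_a = [1, 2, 3, 4]
trace_b = [1, 2, 2, 3, 4]
distance = dtw_distance(trace_a, trace_b)
```

Because DTW may align one sample with several neighbors, the two example traces have zero accumulated cost even though their lengths differ, which is why the method tolerates small timing offsets between the two ends of a conversation.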
Fig. 7: Structuring adversary cost.

After comparing similarity scores between different mobile apps, we found that apps generating lower volumes of traffic usually had low similarity scores. To achieve better correlation attack results, we must collect a sufficient amount of traffic, which may take longer for apps that generate less traffic.

We also tried adjusting the time window to measure the effect of smaller samples on the classification. We found that when the time window shrinks, the similarity score increases until the time window reaches a certain threshold. Hence, we can determine the optimal value of the time window for each app. When we find an optimal time window value, we set it as the new default value for the DTW calculation. This way we iterate through a sequence of optimal values for the time window and improve the similarity measurement.

However, the similarity of traffic traces alone does not guarantee a correlation between the two users; it only shows the similarity of two independent traffic traces. In other words, it does not prove that the two users are communicating using the same mobile application. In addition to matching the two traffic traces using DTW, we leveraged logistic regression [43] to determine whether the matched traffic traces belong to traces of certain applications.

Table VII shows the precision and recall values of our logistic regression model in the lab and real-world settings. We collected the similarity values from both our lab and real-world environments. For the lab data, we collected data from each app in our controlled network environment using our own eNodeB. For the real-world data, we collected data from commercial mobile network operators including AT&T, T-Mobile, and Verizon. The Facebook Call and Skype results under the lab setting show 100% precision, which means that every contact predicted by the model was a true positive (no false positives). Despite the high precision, several other values, especially the recall values in the real-world settings, are lower than 70%, implying that the model fails to classify many instances of contact between users. While we expect that the overall accuracy of the model would increase if trained on a larger dataset of similarity values, we stress that, given high precision, an attacker just “needs to get lucky once” in detecting a connection between users over time.

D. Analytical Attacker Cost Model

Based on Figure 7, we build the analytical cost model of an attacker who wants to sustain a given attack performance and describe the scenarios in a holistic investigation. We skip ①, ②, ⑦, ⑧, ⑨, and ⑪ as common costs, and break down the fingerprinting attack cost for each task.

Collecting cost ③: Given the real-world considerations discussed in Section VIII, the attacker may want to train on a dataset that accounts for the time and size of multi-app and background traffic. If we denote the number of training apps by $A_t$, and assume that apps have $A_v$ versions that are different enough to diminish the classifier’s performance, the number of app instances the attacker needs to record is $A_n = A_t \times A_v \times A_i$, where $A_i$ is the number of instances per app. We denote the data recording cost as $Col_{cost}(A_n)$.

Training Cost ⑤: The training cost is the cost of training the classifier with the collected data, which includes the cost of measuring features $F_m$ and training a classifier $T_c$. Hence the cost of training is $Train_{cost}(A_n, F_m, T_c)$. If $T_s$ indicates the cost of training with a single instance of a traffic trace, then the cost of training the system could be $Train_{cost}(A_n, F_m, T_c) = A_n \times T_s$.

App identification cost ④, ⑥: To identify the app, the attacker should record test data $T_d$, measure features $F_m$, and classify using the classifier $T_c$. Let $V_n$ indicate the number of targeted victims and $A_a$ indicate the average number of apps run by each victim. Then the amount of test data is $T_d = V_n \times A_a$. The total testing cost could be $Col_{cost}(T_d) + Id_{cost}(T_d, F_m, T_c)$.

Retraining Cost ⑩: To maintain the performance of the classifier, the attacker should retrain the classifier over time. If the attacker wants to keep the performance of the classifier above a threshold $X$, the retraining cost comprises the cost of training on the data ($A_n$), measuring the features $F_m$, and training the classifier ($T_c$), which is denoted as $Retrain_{cost}(A_n, F_m, T_c)$. If, on average, apps change every $D$ days, the daily updating cost is $Retrain_{cost}(A_n, F_m, T_c) / D$. To reflect the daily change in the traffic patterns of mobile apps, the attacker trains the classifier daily for up to $D$ days. Then, the overall cost for an attacker to maintain an attack performance is:

$$
Perf(A_n, F_m, T_c, T_d) = Col_{cost}(A_n) + Train_{cost}(A_n, F_m, T_c) + Col_{cost}(T_d) + Id_{cost}(T_d, F_m, T_c)
\qquad (2)
$$

If the performance falls below the threshold $X$ within $D$ days, the attacker should retrain on the dataset, as described in Equation (3).

$$
Cost(A_n, F_m, T_c, T_d) = Perf(A_n, F_m, T_c, T_d) +
\begin{cases}
\sum_{1}^{D} \left( Retrain_{cost} / D \right), & \text{if } Perf() < X \\
0, & \text{otherwise}
\end{cases}
\qquad (3)
$$

For example, in Figure 8, the performance degrades to below 70% in seven days, so the attacker needs to retrain the classifier at this point to maintain the performance. Retraining the model for $D$ days to maintain the performance could be expensive for the attacker, depending on the attacker's goals.
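The cost terms above can be evaluated numerically with a short Python sketch. The text only fixes $A_n = A_t \times A_v \times A_i$, $T_d = V_n \times A_a$, and $Train_{cost} = A_n \times T_s$; everything else here is an assumption made for illustration: collection and identification costs are taken to be linear in the number of instances (`col_unit`, `id_unit`), and one retraining is taken to cost as much as the initial training. All function and parameter names are hypothetical.

```python
def recorded_instances(a_t, a_v, a_i):
    """A_n = A_t x A_v x A_i (apps x versions x instances per app)."""
    return a_t * a_v * a_i

def attack_cost(a_t, a_v, a_i, v_n, a_a, t_s, col_unit, id_unit,
                performance, threshold_x):
    """Evaluate Equations (2) and (3) under assumed linear unit costs."""
    a_n = recorded_instances(a_t, a_v, a_i)
    t_d = v_n * a_a                        # T_d = V_n x A_a
    train_cost = a_n * t_s                 # Train_cost = A_n x T_s
    # Equation (2): collection + training + test collection + identification
    perf_cost = (col_unit * a_n + train_cost
                 + col_unit * t_d + id_unit * t_d)
    # Assumption: one retraining costs as much as the initial training.
    retrain_cost = train_cost
    # sum over D days of (Retrain_cost / D) equals Retrain_cost, so D cancels.
    extra = retrain_cost if performance < threshold_x else 0
    return perf_cost + extra               # Equation (3)

# Hypothetical scenario: 9 apps, 2 versions, 10 instances each,
# 5 victims running 3 apps on average, threshold X = 0.7.
degraded = attack_cost(9, 2, 10, 5, 3, t_s=2, col_unit=1, id_unit=4,
                       performance=0.65, threshold_x=0.7)
healthy = attack_cost(9, 2, 10, 5, 3, t_s=2, col_unit=1, id_unit=4,
                      performance=0.80, threshold_x=0.7)
```

In this scenario the only difference between the two results is the retraining term, which makes explicit that the recurring cost of the attack is driven by how quickly the classifier falls below the threshold.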
Fig. 8: Decrease in performance over time.

Fig. 9: Impact of noise traffic.

VIII. DISCUSSION

A. Real-world Considerations and Limitations

Time effect: Mobile apps constantly receive updates with critical fixes or feature improvements. These updates are reflected in the traffic traces and, accordingly, in the performance of the attacks. The effect of time on performance is a challenge for the adversary, who should retrain the classifier on a regular basis. Depending on the performance that an attacker aims to maintain, the cost of training may differ; we evaluate this cost in Section VII-D. To measure the effect of time, we train a classifier with traces of the mobile apps recorded for 10 minutes on day t = 1 and test it using traces of the same apps recorded within 20 days. Figure 8 shows the result of the experiment, with the F-Score measured for the classifier for a streaming mobile app (T-Mobile, YouTube) over the same app on different days. For the rest of the mobile apps, we observed similar drops in performance.

Impacts of noise traffic: It is highly probable that the UE runs numerous other apps simultaneously in real-world usage. We consider this case as noise traffic and measure its impact by increasing the number of running apps. We evaluate the impact on performance when the classifier is trained on a single running app and tested on traces recorded with multiple foreground and background apps. To simulate this environment, we run a single app (YouTube on T-Mobile) while starting other apps in the background sequentially. We start 5 to 10 background apps with a delay of 3-4 seconds; they were chosen randomly from the Google store’s top 10 free apps, including the 9 apps we selected for fingerprinting. In this way, we created datasets of 5 different sizes (10, 20, 30, 40, and 50 K instances), which are compared against the trained classifier. We then train the classifier using a single app trace for 10 minutes and test it with the multi-app traces we recorded. As shown in Figure 9, the F-Score drops as we increase the size of the dataset by running multiple background apps. Each increase of 10 K instances of background traffic causes a performance drop in the range of 3% to 13%. If the minimum effective performance is assumed to be 0.6, identification may become impossible once the noise exceeds 30 K instances. Accordingly, our proposed attack reveals a limitation where the presence of noise may impede the effectiveness of the attack. In another scenario, where multiple UEs concurrently use the same apps within the same cell coverage area, noise traffic at the radio layer could increase. However, the traffic we collect and train on is obtained exclusively from a single targeted UE, filtered by RNTI. Consequently, our attacks are not influenced by interference from multiple UEs running identical applications concurrently or by any increase in the number of UEs (see Section III-A, Threat Model).

Handover case: In a real-world setting, extra costs, technical issues, and inconsistencies may inhibit time-demanding traffic sniffing. For the handover scenario, we already showed the feasibility of tracing in the history attack in Section VII-B, based on the threat-model assumption that the attacker leverages an IMSI catcher and identity mapping attacks. In another scenario, the victim’s smartphone may switch from the LTE data connection to Wi-Fi or even downgrade to 3G. In that case, the user’s radio resource usage pattern may not be detected by this attack. If the attacker wants to create a dataset for a different radio technology, our approach must be redesigned for the corresponding radio interface and paired with a customized machine-learning model.

B. Countermeasures

Because our privacy attacks rely on tracking the temporary ID (RNTI) of the victim, frequent reassignment of the RNTI by the base station can disrupt the tracking and collection of LTE traffic. Approaches that obfuscate the traffic characteristics at layer two can also be implemented to prevent revealing similarities [44]. In addition, existing work on anonymity networks [45] could be deployed if it is suitable for use in mobile networks. Although these techniques can reduce the effectiveness of traffic fingerprinting, obfuscating traffic generally imposes a high performance overhead on data transmission and is thus difficult to apply in practice. Bae et al. [9] proposed and implemented a modification of the eNodeB that enforces encryption of the connection procedure to hide the RNTI. However, the additional messages and performance overhead remain obstacles to adoption in commercial networks.
C. Extension to 5G

There are significant changes in the radio layer of the 5th generation mobile networks (5G) to support various types of radio applications (e.g., e-health, automotive, public safety, and smart grids) [46]. To guarantee the quality of the service, a 5G application requires a dedicated virtual network service. However, for the end user, even though the radio technologies are different, the high-level behavior of the application is not affected. Because the frequencies and radio channel technologies differ, applying our framework would require a device that supports the corresponding radio spectrum. Also, 5G adopts a new protection mechanism, a globally unique Subscription Permanent Identifier (SUPI) and Subscription Concealed Identifier (SUCI) [47], to prevent the exposure of a subscriber’s identifier such as the IMSI, which used to be mapped to the identity of the subscriber. Therefore, we would need to study a correlation methodology with SUPI/SUCI for collecting a specific user’s traffic.

D. Benchmark for Learning Classifiers

We compare the performance of various supervised learning algorithms to determine which classifier is most effective for our approach. We identified the optimal hyperparameters for each algorithm and present only the results obtained using these selected hyperparameters. A weighted accuracy is used to measure their performance on the same dataset. We selected four algorithms for our evaluation, namely Logistic Regression (LR), k-Nearest Neighbors (kNN), Convolutional Neural Networks (CNN), and Random Forest (RF). To create the dataset, we mixed the apps from each of the three classes (Streaming, Calling, Messenger), based on real-world deployments. The results, including details on the configuration parameters and implementation of the benchmark classifiers, can be found in Table VIII. Based on the weighted average of accuracy values, we can see that the RF model performed the best, with an accuracy of 0.821. The kNN model performed the second best, with an accuracy of 0.735. The LR model performed the third best, with an accuracy of 0.698, and the Convolutional Neural Network model performed the worst, with an accuracy of 0.677. The main limitation of LR is the assumption of linearity between the dependent variable and the independent variables. In our work, the data is rarely linearly separable because the relationship between input and output is nonlinear. We use cross-validation to determine the optimal k for the kNN model: we train and test the model iteratively across a range of k values, from 1 to 10. For each k value, we calculate the accuracy of the model. Ultimately, the optimal k value is chosen as 4 based on the highest accuracy achieved on the test set. With kNN, we found that the prediction stage may slow down when applied to large datasets. A CNN must keep learning until it finds the best set of features to obtain a satisfying predictive performance. Also, neural networks are organized in layers made up of interconnected nodes, which contain an activation function that computes the output of the network. Training is considerably slower and can become impractical in certain circumstances. In contrast, RF is an ensemble of decision trees where the leaf node represents either the majority class for classification problems or the average in the case of regression problems. RF and CNN are different techniques that learn differently but can be used in similar domains. However, RF does not require high-performance hardware such as a GPU and is hence less computationally expensive than neural networks. In our experience, a CNN requires more data and training to predict the results. Since our data is simple tabular data representing the frame size and cumulative time, we prefer RF over CNN due to its efficiency in training and resource consumption. Therefore, we believe the results obtained with the RF classifier are comparable with its more accurate counterpart.

TABLE VIII: Performance Comparison of Algorithms
Algorithm    LR       kNN      CNN                       RF
Streaming    0.613    0.752    0.591                     0.819
Calling      0.876    0.760    0.717                     0.850
Messenger    0.605    0.694    0.706                     0.793
Average      0.698    0.735    0.677                     0.821
Parameters   C=1      k=4      Number of class = 3,      Number of tree = 100,
                               LF = SCE                  Seed = 1
• LR: C refers to the inverse of the regularization strength
• kNN: k refers to the number of the nearest neighbors
• CNN: LF (Loss Function) is set to SCE (Softmax Cross-Entropy)
• RF: Number of trees is set via cross-validation
• Dataset (real-world): Mixed in equal proportions for each class app
  – Streaming 265,599 / Calling 109,692 / Messenger 38,333
  – Splitting of the dataset: 80% training, 20% testing

IX. CONCLUSION

The air interface of LTE is easily overlooked by operators and attackers with regard to privacy leakage. Also, there has been little research on applied attacks based on fingerprints using the physical side channel. In this work, we introduce a physical channel fingerprinting method that decodes the LTE traffic captured from the air interface. With the dataset collected in the lab and real-world settings, we built a hierarchical classification model by leveraging sophisticated machine learning techniques. To evaluate the performance and assure its applicability, we designed and implemented a targeted privacy attack framework that performs the History and Correlation attacks in both lab and real-world settings. The evaluation results show that the model and the attack framework perform sufficiently well. Moreover, our attack framework is implemented with open-source software and affordable SDR devices, so it can be practically adopted by anyone.

ACKNOWLEDGMENT

This work was partially supported by grants from the National Science Foundation (NSF-CICI-2232911) and the Institute for Information & Communications Technology Promotion (IITP-MSIT-2017-0-00168).
R EFERENCES [24] J. Ren, M. Lindorfer, D. J. Dubois, A. Rao, D. Choffnes, and N. Vallina-
Rodriguez, “A longitudinal study of pii leaks across android app ver-
[1] Comparitech. (2022) Screen time statistics: Average screen time sions,” in Network and Distributed System Security Symposium (NDSS),
in us vs. the rest of the world. [Online]. Available: https: 2018.
//www.comparitech.com/tv-streaming/screen-time-statistics/ [25] M. Lindorfer, M. Neugschwandtner, L. Weichselbaum, Y. Fratantonio,
[2] T. Guardian. (2021) Us church official resigns after news V. Van Der Veen, and C. Platzer, “Andrubis–1,000,000 apps later: A
outlet uses phone data to out him as grindr user. view on current android malware behaviors,” in 2014 third international
[Online]. Available: https://www.theguardian.com/us-news/2021/jul/21/ workshop on building analysis datasets and gathering experience returns
church-official-resigns-grindr-use-location-data-obtained-the-pillar for security (BADGERS). IEEE, 2014, pp. 3–17.
[3] T. van Ede, R. Bortolameotti, A. Continella, J. Ren, D. J. Dubois, [26] MathWorks. (2022) Lte toolbox: Ue detection using downlink
M. Lindorfer, D. Choffnes, M. van Steen, and A. Peter, “Flowprint: signals. [Online]. Available: https://www.mathworks.com/help/lte/
Semi-supervised mobile-app fingerprinting on encrypted network traf- examples/ue-detection-using-downlink-signals.html
fic,” 2020. [27] Q. Technologies. (2022) Network signal guru. [Online]. Available:
[4] 3GPP, “3gpp system architecture evolution (sae); security architecture,” https://apkpure.com/network-signal-guru/com.qtrun.QuickTest
TS33.401, 2015, latest release: 17.2.0 (2022-06-17). [Online]. Available: [28] N. Bui and J. Widmer, “Owl: A reliable online watcher for lte control
http://www.3gpp.org/DynaReport/33401.htm channel measurements,” in Proceedings of the 5th Workshop on All
[5] M. Swarnkar, N. Hubballi, N. Tripathi, and M. Conti, “Apphunter: Things Cellular: Operations, Applications and Challenges, 2016, pp.
Mobile application traffic classification,” in 2018 IEEE International 25–30.
Conference on Advanced Networks and Telecommunications Systems [29] S. R. Hussain, M. Echeverria, O. Chowdhury, N. Li, and E. Bertino,
(ANTS). IEEE, 2018, pp. 1–6. “Privacy attacks to the 4g and 5g cellular paging protocols using side
[6] V. Rimmer, D. Preuveneers, M. Juarez, T. Van Goethem, and W. Joosen, channel information.” in NDSS, 2019.
“Automated website fingerprinting through deep learning,” arXiv [30] J. Baek, S. Kyung, H. Cho, Z. Zhao, Y. Shoshitaishvili, A. Doupé, and
preprint arXiv:1708.06376, 2017. G.-J. Ahn, “Wi not calling: Practical privacy and availability attacks
[7] J. S. Atkinson, J. E. Mitchell, M. Rio, and G. Matich, “Your wifi in wi-fi calling,” in Proceedings of the 34th Annual Computer Security
is leaking: What do your mobile apps gossip about you?” Future Applications Conference, 2018, pp. 278–288.
Generation Computer Systems, vol. 80, pp. 546–557, 2018. [31] F. Li, A. A. Niaki, D. Choffnes, P. Gill, and A. Mislove, “A large-scale
[8] S. Chen, R. Wang, X. Wang, and K. Zhang, “Side-channel leaks in analysis of deployed traffic differentiation practices,” in Proceedings of
web applications: A reality today, a challenge tomorrow,” in 2010 IEEE the ACM Special Interest Group on Data Communication, 2019, pp.
Symposium on Security and Privacy. IEEE, 2010, pp. 191–206. 130–144.
[9] S. Bae, M. Son, D. Kim, C. Park, J. Lee, S. Son, and Y. Kim, “Watching [32] T. D. Studio. (2022) Auto clicker. [Online]. Available: https:
the watchers: Practical video identification attack in lte networks,” in //play.google.com/store/
31st USENIX Security Symposium (USENIX Security 22), 2022, pp. [33] YouTube. (2022) Live streaming api. [On-
1307–1324. line]. Available: https://developers.google.com/youtube/v3/live/guides/
[10] K. Kohls, D. Rupprecht, T. Holz, and C. Pöpper, “Lost traffic encryption: ingestion-protocol-comparison
fingerprinting lte/4g traffic on layer two,” in Proceedings of the 12th [34] N. Instruments. (2022) Usrp software defined radio de-
Conference on Security and Privacy in Wireless and Mobile Networks, vice. [Online]. Available: https://www.ni.com/en-us/shop/select/
2019, pp. 249–260. usrp-software-defined-radio-device
[11] D. Rupprecht, K. Kohls, T. Holz, and C. Pöpper, “Breaking LTE on [35] openLTE. (2022) Programming you own usim card. [On-
layer two,” in IEEE Symposium on Security & Privacy (SP). IEEE, line]. Available: https://sourceforge.net/p/openlte/wiki/Programming%
May 2019. 20you%20own%20USIM%20card/
[12] Schuster, Roei and Shmatikov, Vitaly and Tromer, Eran, “Beauty and [36] I. Gomez-Miguelez, A. Garcia-Saavedra, P. D. Sutton, P. Serrano,
the burst: Remote identification of encrypted video streams,” in 26th C. Cano, and D. J. Leith, “srslte: an open-source platform for lte
U SEN IX Security Symposium (U SEN IX Security 17), 2017. evolution and experimentation,” in Proceedings of the Tenth ACM
[13] F. Van Den Broek, R. Verdult, and J. de Ruiter, “Defeating imsi International Workshop on Wireless Network Testbeds, Experimental
catchers,” in Proceedings of the 22Nd ACM SIGSAC Conference on Evaluation, and Characterization, 2016, pp. 25–32.
Computer and Communications Security, 2015, pp. 340–351. [37] L. Breiman, “Random forests,” Machine learning, vol. 45, no. 1, pp.
[14] A. Shaik, R. Borgaonkar, N. Asokan, V. Niemi, and J.-P. Seifert, 5–32, 2001.
“Practical attacks against privacy and availability in 4g/lte mobile [38] U. of Waikato Machine Learning Group. (2022) Weka: Data mining
communication systems,” arXiv preprint arXiv:1510.07563, 2015. software in java. [Online]. Available: https://www.cs.waikato.ac.nz/ml/
[15] Z. Li, W. Wang, C. Wilson, J. Chen, C. Qian, T. Jung, L. Zhang, K. Liu, weka/
X. Li, and Y. Liu, “Fbs-radar: Uncovering fake base stations at scale in [39] G. Instruments. (2022) The faraday cage: What is it? how does it
the wild.” in NDSS, 2017. work? [Online]. Available: https://www.gamry.com/application-notes/
[16] H. Yang, S. Bae, M. Son, H. Kim, S. M. Kim, and Y. Kim, “Hiding instrumentation/faraday-cage/
in plain signal: Physical signal overshadowing attack on lte,” in 28th [40] P. Kocher, J. Jaffe, and B. Jun, “Differential power analysis,” in Annual
USENIX Security Symposium (USENIX Security 19), 2019, pp. 55–72. International Cryptology Conference. Springer, 1999, pp. 388–397.
[17] M. Kotuliak, S. Erni, P. Leu, M. Roeschlin, and S. Čapkun, “LTrack: [41] Cellmapper. (2022) Crowd-sourced cellular tower and coverage mapping
Stealthy tracking of mobile phones in LTE,” in 31st USENIX Security service. [Online]. Available: https://www.cellmapper.net/
Symposium (USENIX Security 22), 2022, pp. 1291–1306. [42] D. J. Berndt and J. Clifford, “Using dynamic time warping to find
[18] A. Sood and R. Enbody, Targeted cyber attacks: multi-staged attacks patterns in time series.” in KDD workshop, vol. 10, no. 16. Seattle,
driven by exploits and malware. Syngress, 2014. WA, 1994, pp. 359–370.
[19] R. P. Jover, “Lte security, protocol exploits and location track- [43] P. Peduzzi, J. Concato, E. Kemper, T. R. Holford, and A. R. Feinstein,
ing experimentation with low-cost software radio,” arXiv preprint “A Simulation Study of the Number of Events per Variable in Logistic
arXiv:1607.05171, 2016. Regression Analysis,” Journal of clinical epidemiology, vol. 49, no. 12,
[20] D. F. Kune, J. Koelndorfer, N. Hopper, and Y. Kim, “Location leaks on 1996.
the gsm air interface,” ISOC NDSS (Feb 2012), 2012. [44] C. V. Wright, S. E. Coull, and F. Monrose, “Traffic morphing: An
[21] J. D. Roth, M. Tummala, J. C. McEachen, and J. W. Scrofani, “On efficient defense against statistical traffic analysis.” in NDSS, vol. 9.
location privacy in lte networks,” IEEE Transactions on Information Citeseer, 2009.
Forensics and Security, vol. 12, no. 6, pp. 1358–1368, 2017. [45] I. Tor Project. (2022) The tor project: Privacy & freedom online.
[22] H. Cheng and R. Avnur, “Traffic Analysis of SSL Encrypted Web [Online]. Available: https://www.torproject.org/
Browsing,” URL citeseer. ist. psu. edu/656522. html, 1998. [46] N. Alliance, “5g white paper,” Next generation mobile networks, white
[23] J. Ren, A. Rao, M. Lindorfer, A. Legout, and D. Choffnes, “Recon: paper, vol. 1, 2015.
Revealing and controlling pii leaks in mobile network traffic,” in [47] 3GPP, “System architecture for the 5g system (5gs),” TS23.501,
Proceedings of the 14th Annual International Conference on Mobile 2017, latest release: 17.5.0 (2022-06-15). [Online]. Available: http:
Systems, Applications, and Services, 2016, pp. 361–374. //www.3gpp.org/DynaReport/23501.htm