Abstract—In the era of digitalization, a massive amount of data has been generated from people's online activities and their use of portable/wearable devices. These data often carry rich information about people. Therefore, privacy technologies are needed, from data generation to usage and from transmission to storage, to protect people's sensitive information. Although the research community is making great progress on advanced privacy protection technologies, very few educational materials have been developed to incorporate the latest research results and engage students, especially the younger generations, in learning privacy technologies. In this paper, we present our newly designed educational materials on privacy technologies, which can be used for training high-quality cybersecurity professionals to meet the ever-increasing demand. The developed learning modules not only incorporate the latest research results in privacy technologies but also include effective hands-on lab activities. To help other institutions teach privacy technologies effectively, we organized a faculty training workshop in summer 2022. Twenty-nine faculty from twenty institutions nationwide participated in the training. Survey results show that the participants gained a better understanding of privacy issues and demonstrated strong interest in teaching privacy technologies after attending the workshop.

Keywords—Data Privacy, De-anonymization, Relationship Privacy, Image Privacy, Location Privacy, Web Tracking, IoT Security and Privacy

I. INTRODUCTION

With the fast development of networking services such as social networks, the Internet of Things, and mobile applications, cybersecurity has never been more challenging than it is today [1], [2]. Cybersecurity education is given great importance in the newly released computer science curricula. Likewise, industry favors workers qualified in security technologies, and the demand is higher than ever [3], [4]. A student possessing this skill set will have a distinct competitive advantage in the job market. As an integral part of cybersecurity, privacy protection has gained more and more attention from both industry and academia due to serious privacy breaches in recent years. For example, Facebook was sued over the Cambridge Analytica data scandal [5], in which 87 million users' profiles were harvested; the Federal Communications Commission fined AT&T $25 million for failing to protect clients' personal information in 2015 [6]; and the University of Mississippi Medical Center was hit with a $2.75 million fine in 2016 by the Department of Health and Human Services over a health data breach [7]. According to the Ponemon Institute, the average total cost of a data breach increased from $3.86 million in 2020 to $4.24 million in 2021 [8]. Such violations have caused people to panic [9], especially considering that their daily online activities generate a massive amount of data containing personal information.

Research on privacy has been conducted intensively in the scientific community. Despite its critical societal importance, however, privacy education has not been well integrated into undergraduate computer science curricula. Privacy issues are often treated as optional topics instead of key fundamental concepts in security learning. A main obstacle is the serious lack of effective learning materials that enable students to understand critical privacy concepts and gain hands-on skills. To address these problems and better prepare qualified graduates for the future U.S. workforce, researchers from AAA University and BBB University collaborated in developing innovative learning materials on privacy protection.

The rest of this paper is organized as follows: Section II presents the background of current privacy education and our project objectives. Section III briefly introduces the seven lab modules that we developed. Section IV gives two examples of the hands-on lab activities of the learning modules: Data Anonymization and Web Tracking. Section V analyzes the feedback of the faculty who attended our training workshop on using the developed lab modules. Section VI concludes the paper.
II. BACKGROUND AND PROJECT OBJECTIVES

Recently, a cross-disciplinary team of computer scientists, educators, and social scientists at the International Computer Science Institute (ICSI) and UC Berkeley developed an online privacy curriculum targeting younger students [10]. The team designed and implemented ten principles with the purpose of spreading awareness of privacy protection among younger students and helping them better understand what
happens to personal information when it goes online, how it might be used to negatively affect users, and how they can defend their privacy by limiting what they share. That curriculum addresses online privacy in general and does not cover as broad a range of privacy-protection topics as this project does. Additionally, it is knowledge based rather than hands-on.

In this project, we focused on developing effective, hands-on learning modules on privacy protection. Through engaging lab activities, we expect to motivate students' interest in privacy technologies and deepen their understanding of privacy issues. The specific objectives of the project include:

• Design self-contained privacy learning modules by encapsulating the hands-on labs and related lecture contents, which can be infused into the teaching of different security and privacy subjects and easily adapted by other institutions.

• Develop effective hands-on labs on privacy breach and protection across various topics, including cutting-edge fields such as the Internet of Things (IoT) and social media, with a special effort on developing engaging lab settings/labware that enables students to gain first-hand experience.

• Evaluate the effectiveness of the experiential learning approach on students' learning outcomes, experience, motivation, and attitudes towards privacy study.

III. LAB MODULE OVERVIEW

So far, we have developed seven privacy learning modules on different topics: Data Privacy, De-anonymization, Relationship Privacy in Online Social Networks, Image Privacy, Location Privacy, Web Tracking, and IoT Security & Privacy. We created a VirtualBox image for each lab in the modules, which can easily be adopted by educators from other institutions. Next, we briefly introduce the lab modules. Two detailed lab examples are described in Section IV.

A. Data Privacy

Prevalent data collection, both offline and online, by governments and private entities has gradually become a new "norm". On the one hand, new technologies such as artificial intelligence and data mining rely heavily on gigantic volumes of data. On the other hand, privacy concerns about omnipresent data collection have been growing, especially when many different datasets can be obtained and cross-linked by the same entity. A well-known linkage attack against people's data privacy was revealed by Dr. Sweeney [11]. This module aims to highlight the importance and challenges of data privacy and to provide hands-on experience for students to understand data anonymization. The lab introduces the basic concepts of data anonymization, linkage attacks, and the k-anonymity privacy model, and it gives students an opportunity to practice using ARX [12], a data anonymization tool, to apply k-anonymity.

Learning Outcomes: Students will be able to (1) explain the importance and necessity of data privacy, (2) explain k-anonymity and its weaknesses, (3) analyze the utility of anonymized data, and (4) apply ARX to anonymize sensitive personal data.

Lab Design and Implementation: This lab consists of three tasks. The first task covers the linkage attack: a brief explanation is provided, and then an exercise asks students to apply the linkage attack to a small dataset. The second task covers k-anonymity: the relevant concepts of the k-anonymity protection model are first presented with small examples; then students are shown how to apply an open-source tool, ARX, to anonymize data under the k-anonymity model. Hands-on exercises are provided for both parts. ARX was picked for this lab for two reasons: first, it is free, open source, functionally rich, and well maintained; second, it can be installed on Windows, Linux, and macOS. The third task is on attacking k-anonymity, in which two attack methods, the homogeneity attack and the background knowledge attack, are introduced and practiced.
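To make the first task concrete, the sketch below carries out a linkage attack in the style of Dr. Sweeney's study [11]. It is an illustration only, not part of the lab's distributed code, and the inline tables and column names are hypothetical: joining a de-identified table with a public list on the quasi-identifier triple (ZIP code, date of birth, sex) re-attaches names to "anonymous" records.

    import pandas as pd

    # Hypothetical "anonymized" table: names removed, quasi-identifiers kept.
    medical = pd.DataFrame({
        "zip": ["13053", "13068", "13053"],
        "dob": ["1965-03-12", "1972-07-01", "1965-03-12"],
        "sex": ["F", "M", "F"],
        "diagnosis": ["flu", "diabetes", "hypertension"],
    })

    # Hypothetical public dataset (e.g., a voter list) that carries names.
    voters = pd.DataFrame({
        "name": ["Alice Smith", "Bob Jones"],
        "zip": ["13053", "13068"],
        "dob": ["1965-03-12", "1972-07-01"],
        "sex": ["F", "M"],
    })

    # Linkage attack: join the two tables on the quasi-identifiers. Every
    # matched row re-attaches a name to a "de-identified" medical record.
    quasi = ["zip", "dob", "sex"]
    linked = medical.merge(voters, on=quasi)
    print(linked[["name", "diagnosis"]])

Note that Alice's quasi-identifier combination matches two medical records, so the attacker learns a set of possible diagnoses rather than a single one; the k-anonymity model studied in the second task builds on exactly this observation.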
Challenges: In the first task, students are required to apply a linkage attack and explain the attack result. In the second task, students need to apply the k-anonymity model to a given table and perform the required operations in ARX to anonymize a given dataset. In the third task, students are asked to use the attack methods against k-anonymity to attack a large anonymized dataset.

B. De-anonymization

De-anonymization refers to re-identifying target people in anonymized data with the help of extra knowledge. The anonymized dataset is called the Target Set (TS), and the dataset carrying the extra knowledge is called the Auxiliary Set (AS). De-anonymization of social media data can use descriptive information, such as users' hobbies, membership groups, location information, or online behavioral patterns [13]–[16]; structural information, such as centrality and neighborhood topology [17]–[21]; or both [22]. De-anonymization can be implemented with seed-based or signature-based attacks. Seed-based attacks [23], [24] start from a small number of seeds, which are identifiable users, and attempt to identify their neighbors, then their neighbors' neighbors, and so forth. Signature-based attacks do not assume the availability of any seeds; instead, they rely on node signatures [22], [25], [26], which are uniquely generated from the nodes' descriptive and/or structural information, and match node signatures between the TS and the AS to re-identify users.
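As a rough illustration of the seed-based idea (a minimal sketch, not the algorithm used by our lab's backend), the following code propagates from seed pairs by matching the unmapped neighbors of already matched nodes whenever exactly one auxiliary neighbor shares a crude structural signature, here simply the node degree; the two toy graphs are hypothetical:

    # TS: anonymized target graph; AS: auxiliary graph with known identities.
    TS = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
    AS = {"1": {"2", "3"}, "2": {"1", "3"}, "3": {"1", "2", "4"}, "4": {"3"}}

    def deanonymize(ts, aux, seeds):
        mapping = dict(seeds)              # TS node -> AS node, seeded
        frontier = list(mapping.items())
        while frontier:
            t, a = frontier.pop()
            for tn in [n for n in ts[t] if n not in mapping]:
                # Candidate AS neighbors: still unmapped, and with the same
                # degree as tn (the degree acts as a structural signature).
                cands = [an for an in aux[a]
                         if an not in mapping.values()
                         and len(aux[an]) == len(ts[tn])]
                if len(cands) == 1:        # accept only unambiguous matches
                    mapping[tn] = cands[0]
                    frontier.append((tn, cands[0]))
        return mapping

    # One seed suffices here: propagation recovers b->2, c->3, and d->4.
    print(deanonymize(TS, AS, {"a": "1"}))

Real seed-based attacks [23], [24] use richer signatures and tolerate noise; this sketch stalls as soon as two candidates share a degree, which hints at why accuracy drops on heavily anonymized target sets.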
Learning Outcomes: Students will be able to (1) master the definition of de-anonymization, (2) know some de-anonymization technologies, (3) understand how the target set and the auxiliary set are prepared, (4) understand the implementation of seed-based de-anonymization, (5) understand how to use profile and topological attributes in de-anonymization, and (6) analyze experiment results to observe what can impact de-anonymization accuracy.

Lab Design and Implementation: We developed a web application using AngularJS, Node.js, Spring, and MongoDB. A dataset [27] collected from Weibo (a Chinese microblogging service similar to Twitter) was used to generate the TS and the AS. The system consists of three primary components: TS and AS data generation, de-anonymization, and experiment analysis. The data generation component allows users to configure the size of each set and the percentage of their common nodes, as well as how the TS is anonymized, e.g., by injecting randomness into the gender or year-of-birth values. The de-anonymization component runs a seed-based algorithm in the backend and visualizes the results in a graph implemented with Cytoscape.js [28]. It allows users to specify the number of initial seeds and to decide what information to leverage for de-anonymization: profile attributes (i.e., year of birth and gender) or structural attributes (i.e., degree and centrality). The summary of the de-anonymization result presents the number of node pairs matched between the TS and the AS and the number of correct matches. Lastly, the analysis component provides a user-friendly interface for analyzing the experiment results by plotting different charts.

Challenges: Students are required to run several groups of experiments with different configurations, from data generation to de-anonymization. They then need to observe and analyze what can affect the de-anonymization accuracy of the TS (e.g., how much of the TS is anonymized) and how.
C. Relationship Privacy

Among the massive amount of data generated by social media platforms, relationship data has received growing attention for protection in recent years [29], [30]. Researchers have noticed that users' relationship privacy might be compromised by others using friend search engines [31]. A friend search engine is an application that retrieves the friend lists of individual users. In designing a friend search engine, OSN operators tend to display the entire friend list in response to each query in order to increase the sociability of the site. However, some users may not feel comfortable displaying their full friend lists. In [31], the author proposed a privacy-aware friend search engine that handles the trade-off between privacy protection and the sociability of OSNs by setting a value k to control the number of friends displayed and by selecting which friends should be displayed.

Learning Outcomes: Students will (1) be aware that their behaviors in OSN applications may compromise other users' privacy, (2) observe the results of different display strategies of a friend search engine, (3) compare and evaluate different display strategies in terms of privacy preservation and impact on sociability, and (4) understand the trade-off between preserving users' privacy and enhancing the sociability of OSNs.

Labware Design and Implementation: We developed a web application that interacts with the Twitter API "GET followers/list" [32] to query users' followers. The system implements three display strategies, Random K (randomly selecting K friends), Rank K (choosing the K friends most influential to the sociability of the OSN), and Top K [31], where K = 4. A user can make queries using any of the implemented display strategies and visualize the results in a graph implemented with Cytoscape Web [28]. In reality, each user's impact on the sociability of an OSN should be measured by multiple factors, including but not limited to the user's online activities and social connections. However, because the Twitter APIs restrict how many queries can be made within a time window, a real-time evaluation was not feasible. Instead, the system randomly selects node weights in [0, 1000) as impact indicators. The total weight of the nodes visible in the result graph represents those nodes' impact on the site's sociability, and it increases with more queries. A line chart plots the number of users whose relations are compromised against the number of queries issued.
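Random K and Rank K can be summarized in a few lines. The sketch below follows the lab's convention of random impact weights in [0, 1000); the Top K strategy of [31] involves more elaborate selection logic and is deliberately not reproduced here:

    import random

    # Each friend is (user_id, weight); the weights play the role of the
    # randomly assigned impact indicators described above.
    friends = [(f"user{i}", random.randrange(1000)) for i in range(20)]
    K = 4

    def random_k(friend_list, k):
        """Random K: reveal k friends chosen uniformly at random."""
        return random.sample(friend_list, k)

    def rank_k(friend_list, k):
        """Rank K: reveal the k friends with the highest impact weights."""
        return sorted(friend_list, key=lambda f: f[1], reverse=True)[:k]

    print("Random K:", random_k(friends, K))
    print("Rank K:  ", rank_k(friends, K))

Comparing the two already exposes the trade-off the lab targets: Rank K maximizes the displayed impact weight but always reveals the same friends, while Random K leaks additional friends as queries repeat because each query may return a different sample.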
Challenges: Students are required to run several queries with the three display strategies and observe their privacy violation levels. They also need to compare the impact of the strategies on the sociability of OSNs to evaluate the trade-off between the preservation of relationship privacy and the retention of sociability.

D. Image Privacy

Photo sharing has become a popular activity for online social network users. Semantically rich photos often contain not only the information that the uploaders want to share but also information that is sensitive to others. However, most current OSNs do not have well-defined mechanisms for protecting user privacy. This labware, which we named Facelock, was developed for teaching photo privacy. The goal is to increase students' awareness of privacy protection when sharing photos on OSNs. Through the hands-on activities, students will gain an understanding of photo privacy and the essential concepts of face recognition.

Learning Outcomes: Students will (1) be aware of the privacy issues related to photo sharing on OSNs, (2) gain a basic understanding of face recognition and object detection through deep learning technologies, (3) know how to use blurring techniques to maintain a trade-off between privacy protection and utility loss, and (4) understand the access control mechanism available to OSN users for image sharing and privacy protection.

Labware Design and Implementation: The photo privacy labware consists of three components: (1) a SQLite database, (2) a web server, and (3) a deep learning based face recognition library. The SQLite database stores the information of the registered users, including their post records, profiles, friend lists, etc. For face recognition purposes, each registered user must upload a "standard" picture as part of his profile; the picture is used to detect the user in photos through the face recognition library. The Facelock web server hosts a Facebook-like social network environment where users can post messages and pictures, search for people, and join friend circles. In addition, Facelock enforces photo privacy protection. When a user attempts to post a photo, the web server uses the face recognition library to tag users in the photo. Each tagged user receives an alert and can then take action to edit the photo to meet his privacy preference. The post is not made publicly available until all tagged users respond with their privacy protection choices. The face recognition library plays a key role in user tagging. Different deep learning models have been developed for face recognition; considering the model complexity and the speed required, the one adopted by Facelock is an open-source application available on GitHub [33]. This library is lightweight and highly accurate: requiring only one profile picture per OSN user, it achieves a recognition accuracy of 99.38%.
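If the library in question is, for example, the popular Python face_recognition package (whose reported 99.38% accuracy on the Labeled Faces in the Wild benchmark matches the figure above — an assumption on our part), the tagging step reduces to a few calls; the profile paths and user names below are illustrative:

    import face_recognition

    # Hypothetical profile store: one "standard" picture per registered user.
    profiles = {"alice": "profiles/alice.jpg", "bob": "profiles/bob.jpg"}

    # Precompute one face encoding per user from the profile picture.
    known = {}
    for user, path in profiles.items():
        encodings = face_recognition.face_encodings(
            face_recognition.load_image_file(path))
        if encodings:                 # the profile photo must contain a face
            known[user] = encodings[0]

    def tag_users(photo_path):
        """Return the registered users recognized in an uploaded photo."""
        photo = face_recognition.load_image_file(photo_path)
        tagged = set()
        for enc in face_recognition.face_encodings(photo):
            hits = face_recognition.compare_faces(list(known.values()), enc)
            tagged |= {user for user, hit in zip(known, hits) if hit}
        return tagged

    print(tag_users("uploads/party.jpg"))   # e.g., {'alice'}

In Facelock, each user returned by such a call is alerted before the post goes public, as described above.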
E. Location Privacy

…users, the server can plot charts to show how a particular user's privacy preference changes over time.

Challenges: Students are required to use our Android client to request landmarks with different anonymization options. Then, they need to go to the LBS web site to filter requests and view his trace on the map. Last, students are required to visit the analytic service to analyze the trade-off between location protection and its cost (i.e., the accuracy loss of LBS responses).

F. Web Tracking

…page, observe the change of database records upon each product browsing activity, and analyze how the advertisement server displays the visitors' browser history information on the third-party website. Students are also required to manipulate browser settings to understand different web tracking and protection mechanisms.
G. IoT Security & Privacy

Recent years have witnessed the exponential growth of Internet of Things (IoT) technologies as well as a soaring increase in attacks against IoT devices. These attacks often exploit the weak security protection of many IoT products. Once IoT devices are compromised, they often become "bots" that are remotely controlled by attackers. Such IoT bots can be used to launch a variety of attacks that not only damage system security, such as distributed denial-of-service (DDoS) attacks against legitimate servers, but also breach data privacy, such as data theft or espionage through compromised wireless routers and Internet cameras. The famous Mirai botnet is one such example [37]. This module was designed to help students better understand how IoT security and privacy are attacked in practice by adapting the Mirai source code.

Learning Outcomes: Students will be able to (1) describe key facts about the Mirai malware, (2) explain how Mirai operates in both its infecting and attacking phases, (3) explain how to prevent the spread of Mirai, and (4) practice infecting a simulated Internet of Things (IoT) device with the provided Mirai emulation executables.
Lab Design and Implementation: This lab was built on modified Mirai source code, carefully adapted to preserve the essential Mirai operations, including infection and spreading, without risking a security breach on the local area network (LAN) or the Internet. The lab environment consists of four virtual machines (VMs): a command and control (C&C) server VM, a loader VM, a router VM, and a LAMP (Linux, Apache, MySQL, PHP/Perl/Python) server VM. The C&C server VM controls the execution of the Mirai "botnet", which is the router VM in our case. The router VM is the machine to be infected and then used to launch a DoS attack on the LAMP server VM, a local web server accessible to all other VMs. The loader VM is used to load Mirai onto the router VM. All four VMs are placed in an internal network within VirtualBox to contain the spread of Mirai. The lab implements the important functions for operating a Mirai botnet, including network scanning for victims, loading the appropriate code to attack a victim, controlling a bot remotely, and launching an attack from a bot.

Challenges: Students are required to follow instructions to perform a sequence of command-line operations on different VMs to practice how an (emulated) IoT device can be compromised by Mirai and later used for spreading malware or launching an attack.

IV. EXAMPLES OF HANDS-ON LAB ACTIVITIES

A. Data Anonymization

Our data privacy lab asks students to apply their knowledge of data anonymization and k-anonymity, and their skills with the ARX anonymization tool, to anonymize a reasonably large dataset containing 30,162 records. During the hands-on practice, the instructor first provides the necessary background on k-anonymity, then introduces ARX and demonstrates its basic operations. After that, students can launch ARX from the provided VirtualBox VM instance. They need to follow the instructions to perform the following operations: (1) start a new project; (2) import the provided dataset using the File Import function; (3) mark the quasi-identifying attributes in the input data section; (4) create a hierarchy for each quasi-identifying attribute without a predefined hierarchy, or import a hierarchy file for the corresponding attribute; (5) create a k-anonymity privacy model with k set to 2 once all the hierarchies are set; (6) customize certain configuration attributes in the general settings; (7) perform the anonymization; (8) visualize and review the anonymization results; (9) analyze utility; and (10) generate a certificate file. A screenshot of the ARX interface after step 5 is depicted in Fig. 1.

Fig. 1. A Screenshot of Using ARX for Data Anonymization
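After step (9), students can also sanity-check the exported result outside ARX. The sketch below verifies k-anonymity directly by confirming that every combination of quasi-identifier values occurs at least k times; the output file name, separator, and column names are assumptions to be adapted to the lab dataset:

    import pandas as pd

    # Assumed export from ARX; adjust the path, separator, and columns.
    df = pd.read_csv("arx_output.csv", sep=";")
    quasi = ["age", "zip", "sex"]
    k = 2

    # k-anonymity holds iff every equivalence class (every distinct
    # combination of quasi-identifier values) contains at least k records.
    class_sizes = df.groupby(quasi).size()
    print("smallest equivalence class:", class_sizes.min())
    print(f"{k}-anonymous:", bool(class_sizes.min() >= k))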
B. Web Tracking by Browser Fingerprinting

Our web tracking lab, depicted in Fig. 2, was developed on top of Dr. Wenliang Du's SEED lab "Web Tracking" [38], which introduces only web cookies; we therefore focus here on our browser fingerprinting lab and skip the cookie part. During the hands-on activities, students are required to take the following steps to understand the concepts, observe the results, and study the scripts: (1) visit the e-commerce websites and browse the products; (2) log into the MySQL database of the advertisement server, select the database (named "revive_adserver"), and open the record table (named "bt_FingerprintLog"); after selecting each product, they will observe the record change in the table; (3) visit the social network site ("www.wtlabelgg.com") and observe the products displayed in the banner area; (4) revisit an e-commerce website and open its source code to examine the JavaScript code for how the browser fingerprint is generated and how the product information is embedded; (5) open each product page and examine its source code to study how the page uses a PHP script to pass the browser fingerprint and product ID to a tracking script on the advertisement server side; (6) open the source code of the tracking script to examine how the browser fingerprint and product information are inserted into the database; (7) revisit the social network and open its source code to study how the script retrieves the historical records from the advertisement server and displays the result based on an analysis of the visitor's behavior. Finally, students are asked to manipulate the scripts to display different private information about the user.
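The lab's fingerprinting code lives in JavaScript and PHP, but the underlying idea is language independent: serialize a set of browser attributes deterministically and hash them into a stable visitor ID, with no cookie required. A Python sketch with illustrative attribute values:

    import hashlib

    # Attributes a fingerprinting script can read in the browser; the values
    # here stand in for what the lab's JavaScript collects client-side.
    attributes = {
        "userAgent": "Mozilla/5.0 (X11; Linux x86_64) Firefox/115.0",
        "language": "en-US",
        "screen": "1920x1080x24",
        "timezone": "America/New_York",
        "plugins": "PDF Viewer;OpenH264",
    }

    # Serialize deterministically, then hash: the digest is the visitor ID
    # the advertisement server stores alongside each browsed product.
    canonical = "|".join(f"{key}={attributes[key]}" for key in sorted(attributes))
    fingerprint = hashlib.sha256(canonical.encode()).hexdigest()
    print(fingerprint[:16])   # stable across visits until the browser changes

The same property explains step (7): because the ID is reproducible on every page that runs the script, the social network can query the advertisement server for the visitor's accumulated browsing history.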
V. EVALUATION

Some of the developed lab modules had already been tested with students in security courses that we taught in the past few years [XX, YY, ZZ]. To disseminate the project outcomes and evaluate the lab modules more widely, we organized a faculty training workshop in summer 2022. We believe that well-trained faculty can broaden the project's impact and engage more students in the fields of privacy and cybersecurity. In this paper, we focus on the survey analysis of the workshop participants.

A total of 29 faculty from twenty institutions nationwide participated in the workshop, including 20 men (69%) and 9 women (31%). There were 10 full professors (34%), 8 assistant professors (28%), 6 associate professors (21%), 4 lecturers (14%), and 1 part-time instructor (3%). Twenty-five participants completed both the pre-workshop and post-workshop surveys. The majority of the respondents were in the computer science departments of their institutions. Computer science and computer security were the two most frequently reported titles among the courses participants had taught. Seventeen of the participants (68%) reported their ethnicity as Asian, 5 (20%) as African American, and 3 (12%) as Caucasian.

Before the workshop, participants were not well aware of the session topics: the mean of their responses on a 5-point scale was around 3.51, which meant they knew only a few words about the session topics. After the workshop, the participants' response mean increased to 4.66, which meant that they knew the basic terms and could apply the concepts. Comparing the participants' responses to the pre- and post-workshop surveys, we found that participants statistically significantly improved their awareness of all seven session topics: Data Privacy (DP), Relationship Privacy (RP), Image Privacy (IP), De-anonymization (DA), IoT Security & Privacy (IoT), Location Privacy (LP), and Web Tracking (WT). The means and standard deviations of the participants' awareness of the workshop session topics before and after the workshop, the paired-sample Student's t-test results, and Cohen's d effect sizes are presented in Table I. The comparisons of pre- versus post-workshop awareness found statistically significant differences at the p = 0.01 level, meaning that participants greatly improved their awareness of the session topics. Cohen's d effect sizes close to or greater than 1 indicate a large difference in group means.
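Instructors who want to run the same analysis on their own survey data can reproduce the Table I statistics in a few lines. The sketch below uses made-up ratings; the Cohen's d variant shown (mean difference over the standard deviation of the differences) is one common convention for paired designs, and we note as an assumption that the paper does not state which variant was used:

    import numpy as np
    from scipy import stats

    # Made-up 5-point awareness ratings for one topic (25 paired responses).
    rng = np.random.default_rng(1)
    pre = rng.integers(2, 5, size=25).astype(float)
    post = np.clip(pre + rng.integers(0, 3, size=25), 1, 5)

    # Paired-sample t-test: is the mean pre/post difference zero?
    t, p = stats.ttest_rel(post, pre)

    # Cohen's d for paired data: mean difference over SD of the differences.
    diff = post - pre
    d = diff.mean() / diff.std(ddof=1)

    print(f"t = {t:.3f}, df = {len(diff) - 1}, p = {p:.4f}, d = {d:.3f}")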
TABLE I. CHANGES IN PARTICIPANTS' AWARENESS OF THE TOPICS

Lab  Pre: µ (σ)   Post: µ (σ)   t      df  p-value    d
DP   4.08 (0.79)  4.84 (0.36)   5.467  24  < 0.001 *  1.072
RP   3.44 (1.06)  4.72 (0.79)   6.799  24  < 0.001 *  1.333
IP   3.32 (0.88)  4.56 (0.63)   6.972  24  < 0.001 *  1.367
DA   3.04 (1.25)  4.52 (0.81)   6.268  24  < 0.001 *  1.229

TABLE II. CHANGES IN PARTICIPANTS' INTEREST IN TEACHING THE TOPICS

Lab  Pre: µ (σ)   Post: µ (σ)   t      df  p-value    d
DP   4.16 (0.96)  4.56 (0.69)   2.28   24  0.031 *    0.447