Privacy-Preserving Deep Action Recognition: An Adversarial Learning Framework and A New Dataset
Abstract—We investigate privacy-preserving, video-based action recognition in deep learning, a problem with growing importance in
smart camera applications. A novel adversarial training framework is formulated to learn an anonymization transform for input videos
such that the trade-off between target utility task performance and the associated privacy budgets is explicitly optimized on the
anonymized videos. Notably, the privacy budget, often defined and measured in task-driven contexts, cannot be reliably indicated using
any single model performance because strong protection of privacy should sustain against any malicious model that tries to steal
private information. To tackle this problem, we propose two new optimization strategies of model restarting and model ensemble to
achieve stronger universal privacy protection against any attacker models. Extensive experiments have been carried out and analyzed.
On the other hand, given few public datasets available with both utility and privacy labels, the data-driven (supervised) learning
cannot exert its full power on this task. We first discuss an innovative heuristic of cross-dataset training and evaluation, enabling the
use of multiple single-task datasets (one with target task labels and the other with privacy labels) in our problem. To further address
this dataset challenge, we have constructed a new dataset, termed PA-HMDB51, with both target task labels (action) and selected
privacy attributes (skin color, face, gender, nudity, and relationship) annotated on a per-frame basis. This first-of-its-kind video dataset
and evaluation protocol can greatly facilitate visual privacy research and open up other opportunities. Our codes, models, and the
PA-HMDB51 dataset are available at: https://github.com/VITA-Group/PA-HMDB51.
1 INTRODUCTION
generally outperforms the others under our framework and give intuitive explanations.

• Practical Approximations of "Universal" Privacy Protection. The privacy budget in our framework cannot be defined w.r.t. one model that predicts privacy attributes. Instead, the ideal protection of privacy must be universal and model-agnostic, i.e., preventing every possible attacker model from predicting private information. To resolve this so-called "∀ challenge", we propose two effective strategies, i.e., restarting and ensembling, to enhance the generalization capability of the learned anonymization to defend against unseen models. We leave it as our future work to find better methods for this challenge.

• A New Dataset with Action and Privacy Annotations. When it comes to evaluating privacy protection on complicated privacy attributes, there is no off-the-shelf video dataset with both action (utility) and privacy attributes annotated, either for training or testing. Such a dataset challenge is circumvented in our previous work [13] by using the VISPR [14] dataset as an auxiliary dataset to provide privacy annotations for cross-dataset evaluation (details in Section 3.6). However, this protocol inevitably suffers from the domain gap between the two datasets: while the utility was evaluated on one dataset, the privacy was measured on a different dataset. The incoherence in utility and privacy evaluation datasets makes the obtained utility-privacy trade-off less convincing. To reduce this gap, in this paper, we construct the very first testing benchmark dataset, dubbed Privacy-Annotated HMDB51 (PA-HMDB51), to evaluate privacy protection and action recognition on the same videos simultaneously. The new dataset consists of 515 videos originally from HMDB51. For each video, privacy labels (five attributes: skin color, face, gender, nudity, and relationship) are annotated on a per-frame basis. We benchmark our proposed framework on the new dataset and justify its effectiveness.

The paper is built upon our prior work [13] with multiple improvements: (1) a detailed discussion and comparison of three optimization strategies for the proposed framework; (2) a more extensive experimental and analysis section; and (3) most importantly, the construction of the new PA-HMDB51 dataset, together with the associated benchmark results.

2 RELATED WORK

2.1 Privacy Protection in Computer Vision

With pervasive cameras for surveillance or smart home devices, privacy-preserving action recognition has drawn increasing interest from both industry and academia.

Transmitting Feature Descriptors. A seemingly reasonable and computationally cheaper option is to extract feature descriptors from raw images and transmit only those features. Unfortunately, previous studies [15]–[19] revealed that considerable details of the original images could still be recovered from standard HOG, SIFT, LBP, 3D point clouds, Bag-of-Visual-Words, or neural network activations (even if they look visually distinct from natural images).

Homomorphic Cryptographic Solutions. Most classical cryptographic solutions secure communication against unauthorized access from attackers. However, they are not immediately applicable to preventing authorized agents (such as the back-end analytics) from the unauthorized abuse of information, causing privacy breach concerns. A few encryption-based solutions, such as Homomorphic Encryption (HE) [20], [21], were developed to locally encrypt visual information. The server can only get access to the encrypted data and conduct a utility task on it. However, many encryption-based solutions will incur high computational costs at local platforms. It is also challenging to generalize the cryptosystems to more complicated classifiers. Chattopadhyay et al. [22] combined the detection of regions of interest with real encryption techniques to improve privacy while allowing general surveillance to continue.

Anonymization by Empirical Obfuscations. An alternative approach towards a privacy-preserving vision system is based on the concept of anonymized videos. Such videos are intentionally captured or processed by empirical obfuscations to be in special low-quality conditions, which only allow for recognizing some target events or activities while avoiding the unwanted leak of the identity information of the human subjects in the video.

Ryoo et al. [10] showed that even at extremely low resolutions, reliable action recognition could be achieved by learning appropriate downsampling transforms, with neither unrealistic activity-location assumptions nor extra specific hardware resources. The authors empirically verified that conventional face recognition easily failed on the generated low-resolution videos. Butler et al. [11] used image operations like blurring and superpixel clustering to get anonymized videos, while Dai et al. [12] used extremely low resolution (e.g., 16×12) camera hardware to get anonymized videos. Winkler et al. [23] used cartoon-like effects with a customized version of mean shift filtering. Wang et al. [24] proposed a lens-free coded aperture (CA) camera system, producing visually unrecognizable and unrestorable image encodings. Pittaluga & Koppal [25], [26] proposed to use privacy-preserving optics to filter sensitive information from the incident light-field before sensor measurements are made, by k-anonymity and defocus blur. Earlier work of Jia et al. [27] explored privacy-preserving tracking and coarse pose estimation using a network of ceiling-mounted time-of-flight low-resolution sensors. Tao et al. [28] adopted a network of ceiling-mounted binary passive infrared sensors. However, both works [27], [28] handled only a limited set of activities performed at specific constrained areas in the room.

The usage of low-quality anonymized videos by obfuscations was computationally cheap and compatible with sensors' bandwidth constraints. However, the proposed obfuscations were not learned towards protecting any visual privacy, thus having limited effects. In other words, privacy protection came as a "side product" of obfuscation, and was not a result of any optimization, making the privacy protection capability very limited. What is more, the privacy-preserving effects were not carefully analyzed and evaluated by human studies or deep learning-based privacy recognition approaches. Lastly, none of the aforementioned empirical obfuscations extended their efforts to study deep learning-based action recognition, making their task performance less competitive. Similarly, the recent progress of low-resolution object recognition [29]–[31] also puts their privacy protection effects in jeopardy.
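To make the contrast with learned anonymization concrete, here is a minimal NumPy sketch of the simplest empirical obfuscation discussed above: spatial downsampling by block averaging, then upsampling back to the original frame size. The 4x factor and the block-averaging filter are illustrative assumptions rather than the exact settings of the cited works.

```python
import numpy as np

def downsample_obfuscate(frame: np.ndarray, factor: int = 4) -> np.ndarray:
    """Empirical obfuscation: average `factor` x `factor` blocks, then repeat each
    averaged pixel so the output keeps the input resolution."""
    h, w, c = frame.shape
    h, w = h - h % factor, w - w % factor            # crop to a multiple of `factor`
    blocks = frame[:h, :w].reshape(h // factor, factor, w // factor, factor, c)
    low_res = blocks.mean(axis=(1, 3))               # block average = box-filter downsample
    return np.repeat(np.repeat(low_res, factor, axis=0), factor, axis=1)

# Toy 120x160 RGB frame; a real pipeline would obfuscate every frame of the clip.
frame = np.random.randint(0, 256, (120, 160, 3)).astype(np.float32)
print(downsample_obfuscate(frame, factor=4).shape)   # (120, 160, 3)
```

Because nothing in this transform is optimized against a privacy objective, any protection it offers is a side effect, which is exactly the limitation discussed above.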
Learning-based Solutions. Very recently, a few learning-based approaches have been proposed to address privacy protection or fairness problems in vision-related tasks [13], [32]–[40]. Many of them exploited ideas from adversarial learning. They addressed this problem by learning data representations that simultaneously reduce the budget cost of privacy or fairness while maintaining the utility task performance.

Wu et al. [32] proposed an adversarial training framework dubbed Nuisance Disentangled Feature Transform (NDFT) to utilize free meta-data (i.e., altitudes, weather conditions, and viewing angles) in conjunction with associated UAV images to learn domain-robust features for object detection in UAV images. Pittaluga et al. [34] preserved the utility by maintaining the variance of the encoding or favoring a second classifier for a different attribute in training. Bertran et al. [35] motivated the adversarial learning framework as a distribution matching problem and defined the objective and the constraints in terms of mutual information. Roy & Boddeti [36] measured the uncertainty in the privacy-related attributes by the entropy of the discriminator's prediction. Oleszkiewicz et al. [41] proposed an empirical data-driven privacy metric based on mutual information to quantify the privatization effects on biometric images. Zhang et al. [37] presented an adversarial debiasing framework to mitigate the biases concerning demographic groups. Ren et al. [38] learned a face anonymizer in video frames while maintaining the action detection performance. Shetty et al. [39] presented an automatic object removal model that learns how to find and remove objects from general scene images via a generative adversarial network (GAN) framework.

2.2 Privacy Protection in Social Media/Photo Sharing

User privacy protection is also a topic of extensive interest in the social media field, especially for photo sharing. The most common means to protect user privacy in an uploaded photo is to add empirical obfuscations, such as blurring, mosaicing, or cropping out certain regions (usually faces) [42]. However, extensive research showed that such an empirical approach could be easily hacked [43], [44]. A recent work [45] described a game-theoretical system in which the photo owner and the recognition model strive for the antagonistic goals of disabling versus enabling recognition, and better obfuscation could be learned from their competition. However, their system was only designed to confuse one specific recognition model via finding its adversarial perturbations. Fooling only one recognition model can cause obvious overfitting, as merely changing to another recognition model will likely put the learning efforts in vain: such perturbations cannot even protect privacy from human eyes. The problem setting in [45] thus differs from our target problem. Another notable difference is that, in social photo sharing, one usually hopes to generate minimum perceptual quality loss to photos after applying any privacy-preserving transform to them. There is no such restriction in our scenario: we can apply a much more flexible and aggressive transformation to the image.

The visual privacy issues faced by blind people were revealed in [46] with the first dataset in this area. Concrete privacy attributes were defined in [14] together with their correlation with image content. The authors categorized possible private information in images, and they ran a user study to understand privacy preferences. They then provided a sizable set of 22k images annotated with 68 privacy attributes, on which they trained privacy attribute predictors.

3 METHOD

3.1 Problem Definition

Objective. Assume our training data X (raw visual data captured by a camera) are associated with a target utility task T and a privacy budget B. Since T is usually a supervised task, e.g., action recognition or visual tracking, a label set Y_T is provided on X, and a standard cost function L_T (e.g., cross-entropy) is defined to evaluate the task performance on T. Usually, there is a state-of-the-art deep neural network f_T, which takes X as input and predicts the target labels. On the other hand, we need to define a budget cost function J_B to evaluate its input data's privacy leakage: the smaller J_B(·) is, the less private information its input contains.

We seek an optimal anonymization function f_A^* to transform the original X into anonymized visual data f_A^*(X), and an optimal target model f_T^*, such that:

• f_A^* has filtered out the private information in X, i.e.,

    J_B(f_A^*(X)) ≪ J_B(X);

• the performance of f_T is minimally affected when using the anonymized visual data f_A^*(X) compared to when using the original data X, i.e.,

    L_T(f_T^*(f_A^*(X)), Y_T) ≈ min_{f_T} L_T(f_T(X), Y_T).

To achieve these two goals, we mathematically formulate the problem as solving the following optimization problem:

    f_A^*, f_T^* = argmin_{f_A, f_T} [ L_T(f_T(f_A(X)), Y_T) + γ J_B(f_A(X)) ].    (1)

Definition of J_B and L_T. The definition of the privacy budget cost J_B is not straightforward. Practically, it needs to be placed in concrete application contexts, often in a task-driven way. For example, in smart workplaces or smart homes with video surveillance, one might often want to avoid disclosure of the face or identity of persons. Therefore, reducing J_B could be interpreted as suppressing the success rate of identity recognition or verification. Other privacy-related attributes, such as race, gender, or age, can be similarly defined. We denote the privacy-related annotations (such as identity labels) as Y_B, and rewrite J_B(f_A(X)) as J_B(f_B(f_A(X)), Y_B), where f_B denotes the privacy budget model, which takes (anonymized or original) visual data as input and predicts the corresponding private information. Different from L_T, minimizing J_B will encourage f_B(f_A(X)) to diverge from Y_B. Without loss of generality, we assume both f_T and f_B to be classification models that output class labels.
Under this assumption, we choose both L_T and L_B as the cross-entropy function, and J_B as the negative cross-entropy function:

    J_B ≜ −H(Y_B, f_B(f_A(X))),

where H(·,·) is the cross-entropy function.

Two Challenges. Such a supervised, task-driven definition of J_B poses at least two challenges: (1) Dataset challenge: The privacy budget-related annotations, denoted as Y_B, often have less availability than target utility task labels. Specifically, it is often challenging to have both Y_T and Y_B available on the same X. (2) ∀ challenge: Considering the nature of privacy protection, it is not sufficient to merely suppress the success rate of one f_B model. Instead, we define a privacy prediction function family

    P : f_A(X) ↦ Y_B,

so that the ideal privacy protection by f_A should be reflected as suppressing every possible model f_B from P. That differs from the common supervised training goal, where only one model needs to be found to fulfill the target utility task successfully.

We address the dataset challenge in two ways: (1) cross-dataset training and evaluation (Section 3.4); and, more importantly, (2) building a new dataset annotated with both utility and privacy labels (Section 5). We defer their discussion to the respective experimental paragraphs.

Handling the ∀ challenge is more difficult. Firstly, we re-write the general form in Eq. (1) with the task-driven definition of J_B as follows:

    f_A^*, f_T^* = argmin_{(f_A, f_T)} [ L_T(f_T(f_A(X)), Y_T) + γ sup_{f_B ∈ P} J_B(f_B(f_A(X)), Y_B) ].    (2)

The ∀ challenge is the infeasibility of directly solving Eq. (2), due to the infinite search space of f_B in P. Secondly, we propose to solve the following approximate problem by setting f_B as a neural network with a fixed structure:

    f_A^*, f_T^* = argmin_{(f_A, f_T)} [ L_T(f_T(f_A(X)), Y_T) + γ max_{f_B} J_B(f_B(f_A(X)), Y_B) ].    (3)

Lastly, we propose "model ensemble" and "model restarting" (Section 3.5) to handle the ∀ challenge better and boost the experimental results further.

Considering the ∀ challenge, the evaluation protocol for privacy-preserving action recognition is more intricate than for the traditional action recognition task. We propose a two-step protocol (as described in Section 3.6) to evaluate f_A^* and f_T^* on the trade-off they have achieved between target task utility and privacy protection budget.

Solving the Minimax. Solving Eq. (3) is still challenging because the minimax problem is hard by its nature. Traditional minimax optimization algorithms based on alternating gradient descent can only find minimax points for convex-concave problems, and they achieve sub-optimal solutions on deep neural networks since these are neither convex nor concave. Some very recent minimax algorithms, such as K-Beam [47], have been shown to be promising in non-convex-concave and deep neural network applications. However, these methods rely on heavy parameter tuning and are effective only in limited situations. Besides, our optimization goal in Eq. (3) is even harder than common minimax objectives like those in GANs, which are often interpreted as a two-party competition game. In contrast, our Eq. (3) is more "hybrid" and can be interpreted as a more complicated three-party competition, where (adopting machine learning security terms) f_A is an obfuscator, f_T is a utilizer collaborating with the obfuscator, and f_B is an attacker trying to breach the obfuscator. Therefore, we see no obvious best choice among the off-the-shelf minimax algorithms to achieve our objective.

We are thus motivated to try different state-of-the-art minimax optimization algorithms on our framework. We tested two state-of-the-art minimax optimization algorithms, namely GRL [48] and K-Beam [47], on our framework and proposed an innovative entropy maximization method to solve Eq. (3). We empirically show that our entropy maximization algorithm outperforms both state-of-the-art minimax optimization algorithms and discuss its advantages. In Section 3.3, we present the comparison of the three methods and hope it will benefit future research on similar problems.

3.2 Basic Framework

Pipeline. Our framework is a privacy-preserving action recognition pipeline that uses video data as input. It is a prototype of the in-demand privacy protection in smart camera applications. Figure 1 depicts the basic framework implementing the proposed formulation in Eq. (3). The framework consists of three parts: the anonymization model f_A, the target utility model f_T, and the privacy budget model f_B. f_A takes raw video X as input, filters out private information in X, and outputs the anonymized video f_A(X). f_T takes f_A(X) as input and carries out the target utility task. f_B also takes f_A(X) as input and tries to predict the private information from f_A(X). All three models are implemented with deep neural networks, and their parameters are learnable during the training procedure. The entire pipeline is trained under the guidance of the hybrid loss of L_T and J_B. The training procedure has two goals. The first goal is to find an optimal anonymization model f_A^* that can filter out the private information in the original video while keeping useful information for the target utility task. The second goal is to find a target model that can achieve good performance on the target utility task using the anonymized videos f_A^*(X). Similar frameworks have been used in feature disentanglement [49]–[52]. After training, the learned anonymization model can be applied on a local device (e.g., a smart camera), by designing an embedded chipset responsible for the anonymization at the hardware level [38]. We can convert raw video to anonymized video locally and only transfer the anonymized video through the Internet to the backend (e.g., cloud) for target utility task analysis. The private information in the raw videos will thus be unavailable on the backend.

Implementation. Specifically, f_A is implemented using the model in [53], which can be taken as a 2D convolution-based frame-level filter. In other words, f_A converts each frame in X into a feature map of the same shape as the original frame. We use the state-of-the-art human action recognition model C3D [54] as f_T and state-of-the-art image classification models, such as ResNet [55] and MobileNet [56], as f_B.
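As a rough illustration of how the three models interact under Eq. (3), below is a minimal PyTorch-style sketch with tiny stand-in networks in place of the actual f_A (the transform of [53]), f_T (C3D), and f_B (ResNet/MobileNet). The module definitions, batch handling, and the single-frame budget input are simplifying assumptions; the real training follows the alternating schedules of Section 3.3 rather than a single loss call.

```python
import torch
import torch.nn as nn

# Tiny stand-ins for the real architectures: f_A is a frame-level 2D conv filter,
# f_T a video-level action classifier (C3D in the paper), and f_B an image-level
# privacy classifier (ResNet / MobileNet variants in the paper).
f_A = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(16, 3, 3, padding=1))
f_T = nn.Sequential(nn.AdaptiveAvgPool3d(1), nn.Flatten(), nn.Linear(3, 8))    # 8 actions
f_B = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(3, 13))   # 13 identities

ce = nn.CrossEntropyLoss()
gamma = 0.5  # trade-off weight between utility loss and privacy budget

def hybrid_losses(clip, y_T, y_B):
    """clip: (N, C, T, H, W) raw video; y_T: action labels; y_B: privacy labels."""
    n, c, t, h, w = clip.shape
    frames = clip.permute(0, 2, 1, 3, 4).reshape(n * t, c, h, w)
    anon = f_A(frames)                                 # anonymize every frame independently
    anon_clip = anon.reshape(n, t, c, h, w).permute(0, 2, 1, 3, 4)
    L_T = ce(f_T(anon_clip), y_T)                      # utility loss on anonymized video
    ce_B = ce(f_B(anon[::t]), y_B)                     # attacker loss on one frame per clip
    # With J_B = -H(Y_B, f_B(f_A(X))), Eq. (3) means: f_A and f_T minimize
    # L_T + gamma * J_B = L_T - gamma * ce_B, while f_B is trained to minimize ce_B.
    return L_T - gamma * ce_B, ce_B

clip = torch.randn(2, 3, 16, 112, 112)
obf_loss, attacker_loss = hybrid_losses(clip, torch.tensor([0, 1]), torch.tensor([2, 3]))
print(obf_loss.item(), attacker_loss.item())
```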
Algorithm 2: Ours-K-Beam Algorithm
1  Initialize θ_A, θ_T, and {θ_B^i}_{i=1}^K;
2  for t_0 ← 1 to max_iter do
3      /* L_T step: */
4      while Acc(f_T(f_A(X^v)), Y_T^v) ≤ th_T do
5          Update θ_T, θ_A using Eq. (6a)
6      end
7      /* L_B max step: */
8      Update j using Eq. (6b-i)
9      for t_1 ← 1 to d_iter do
10         Update θ_A using Eq. (6b-ii)
11     end
12     /* L_B min step: */
13     for i ← 1 to K do
14         while Acc(f_B^i(f_A(X^t)), Y_B^t) ≤ th_B do
15             Update θ_B^i using Eq. (6c)
16         end
17     end
18 end

where H(·) is the entropy function. Minimizing −H_B is equivalent to maximizing entropy, which will encourage "uncertain" predictions. We replace J_B in Eq. (4a) by −H_B, abbreviate H_B(f_B(f_A(X))) as H_B(θ_A, θ_B), and propose the following new update scheme:

    θ_A ← θ_A − α_A ∇_{θ_A} ( L_T(θ_A, θ_T) − γ H_B(θ_A, θ_B) ),    (7a)
    θ_T, θ_A ← θ_T, θ_A − α_T ∇_{θ_T, θ_A} L_T(θ_A, θ_T),    (7b)
    θ_B ← θ_B − α_B ∇_{θ_B} L_B(θ_A, θ_B),    (7c)

where L_T and L_B are still cross-entropy loss functions as in Eq. (4). Unlike in Eq. (4b), where we only update θ_T when minimizing L_T, we train θ_T and θ_A in an end-to-end manner as shown in Eq. (7b), since we find it achieves better performance in practice.

We denote this method as Ours-Entropy in the following parts and give the details in Algorithm 3.

Algorithm 3: Ours-Entropy Algorithm
1  Initialize θ_A, θ_T, and θ_B;
2  for t_0 ← 1 to max_iter do
3      Update θ_A using Eq. (7a)
4      while Acc(f_T(f_A(X^v)), Y_T^v) ≤ th_T do
5          Update θ_T, θ_A using Eq. (7b)
6      end
7      while Acc(f_B(f_A(X^t)), Y_B^t) ≤ th_B do
8          Update θ_B using Eq. (7c)
9      end
10 end

3.4 Cross-Dataset Training and Evaluation

When defining the privacy budget over privacy attributes, we run into the dataset challenge: in the literature, no existing datasets have both human action labels and privacy attributes provided on the same videos. Given the observation that a privacy attribute predictor trained on VISPR can correctly identify privacy attributes occurring in UCF101 and HMDB51 videos (examples in Appendix C), we hypothesize that the privacy attributes have good "transferability" across UCF101/HMDB51 and VISPR. Therefore, we can use a privacy prediction model trained on VISPR to assess the privacy leak risk on UCF101/HMDB51.

In view of that, we propose to use cross-dataset training and evaluation as a workaround. In brief, we train action recognition (the target utility task) on human action datasets, such as UCF101 [58] and HMDB51 [59], and train privacy protection (the budget task) on the visual privacy dataset VISPR [14], while letting the two interact via their shared component, the learned anonymization model. More specifically, during training, we have two pipelines: one is f_A and f_T trained on UCF101 or HMDB51 for action recognition; the other is f_A and f_B trained on VISPR to suppress multiple privacy attribute predictions. The two pipelines share the same parameters for f_A. During the evaluation, we evaluate model utility (i.e., action recognition) on the testing set of UCF101 or HMDB51 and privacy protection performance on the testing set of VISPR. Such cross-dataset training and evaluation sheds new possibilities for training privacy-preserving recognition models, even under the practical shortage of datasets annotated for both tasks. Notably, "cross-dataset training" and "cross-dataset testing (or evaluation)" are two independent strategies used in this paper; they can be used either together or separately. Details of our three experiments (SBU, UCF101, and HMDB51) are explained as follows:

• SBU (Section 4.1): we train and evaluate our framework on the same video set by considering actor identity as a simple privacy attribute. Neither cross-training nor cross-evaluation is involved.
• UCF101 (Section 4.2): we perform both cross-training and cross-evaluation, on UCF101 + VISPR. Such a method provides an alternative to flexibly train and test privacy-preserving video recognition for different utility/privacy combinations, without annotating specific datasets.
• HMDB51 (Section 5.5): we use cross-training on HMDB51 + VISPR, similarly to the UCF101 experiment; but for testing, we evaluate both utility and privacy performance on the same, newly-annotated PA-HMDB51 testing set. Therefore, it involves cross-training, but no cross-evaluation.

Beyond the above initial attempt, we further construct a new dataset dedicated to the privacy-preserving action recognition task, which will be presented in Section 5.
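A minimal sketch of the two shared-anonymizer pipelines described above is given below. Tiny stand-in networks and random tensors replace the real UCF101/HMDB51 and VISPR loaders; the frame-level utility head, the multi-label BCE budget loss, the learning rates, and the fixed alternation are illustrative assumptions, whereas the actual schedule follows the threshold-driven Algorithms 1-4.

```python
import torch
import torch.nn as nn

# Shared anonymizer f_A plus two heads: f_T (action, trained on UCF101/HMDB51 data)
# and f_B (privacy attributes, trained on VISPR images). Tiny stand-in networks.
f_A = nn.Conv2d(3, 3, 3, padding=1)
f_T = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(3, 51))   # 51 actions
f_B = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(3, 5))    # 5 attributes

opt_util = torch.optim.SGD(list(f_A.parameters()) + list(f_T.parameters()), lr=1e-3)
opt_anon = torch.optim.SGD(f_A.parameters(), lr=1e-3)
opt_budget = torch.optim.SGD(f_B.parameters(), lr=1e-3)
ce, bce, gamma = nn.CrossEntropyLoss(), nn.BCEWithLogitsLoss(), 0.5

for step in range(3):                                  # toy batches stand in for real loaders
    act_frames, y_T = torch.randn(4, 3, 112, 112), torch.randint(0, 51, (4,))
    vispr_imgs, y_B = torch.randn(4, 3, 112, 112), torch.randint(0, 2, (4, 5)).float()

    # Utility pipeline (action dataset): update f_A and f_T to keep recognition accurate.
    opt_util.zero_grad(); ce(f_T(f_A(act_frames)), y_T).backward(); opt_util.step()

    # Budget pipeline (VISPR): f_A is updated to *increase* the attribute loss ...
    opt_anon.zero_grad(); (-gamma * bce(f_B(f_A(vispr_imgs)), y_B)).backward(); opt_anon.step()
    # ... while f_B is updated to predict the attributes as well as it can.
    opt_budget.zero_grad(); bce(f_B(f_A(vispr_imgs).detach()), y_B).backward(); opt_budget.step()
```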
3.5.1 Privacy Budget Model Restarting

Motivation. The max step over J_B(f_B(f_A(X)), Y_B) in Eq. (3) can leave the optimizer stuck in bad local solutions (similar to "mode collapse" in GANs), which hinders the entire minimax optimization. Model restarting provides a mechanism to "bypass" a bad solution when it occurs, thus enabling the minimax optimizer to explore better solutions.

Approach. At a certain point of training (e.g., when the privacy budget L_B(f_B(f_A(X)), Y_B) stops decreasing any further), we re-initialize f_B with random weights. Such random restarting aims to avoid trivial overfitting between f_B and f_A (i.e., f_A being specialized only at confusing the current f_B), without requiring more parameters. We then start to train the new model f_B to be a strong competitor w.r.t. the current f_A(X): specifically, we freeze the training of f_A and f_T, and switch to minimizing L_B(f_B(f_A(X)), Y_B), until the new f_B has been trained from scratch to become a strong privacy prediction model over the current f_A(X). We then resume adversarial training by unfreezing f_A and f_T, as well as switching the loss for f_B back to the adversarial loss (negative entropy or negative cross-entropy). Such random restarting can be repeated multiple times.

3.5.2 Privacy Budget Model Ensemble

Motivation. Ideally, in Eq. (3) we should maximize the error over the "current strongest possible" attacker f_B from P (a large and continuous f_B family), over which searching/sampling is impractical. Therefore we propose a privacy budget model ensemble as an approximation strategy, where we approximate the continuous P with a discrete set of M sample functions. Such a strategy is empirically verified in Sections 4 and 5 to address the critical "∀ challenge" in privacy protection, i.e., enhancing the defense against unseen attacker models (compared to the clear "attacker overfitting" phenomenon when sticking to one f_B during training).

Approach. Given the budget model ensemble P̄_t ≜ {f_B^i}_{i=1}^M, where M is the number of f_B's in the ensemble during training, we turn to minimize the following discretized surrogate of Eq. (2):

    f_A^*, f_T^* = argmin_{f_A, f_T} [ L_T(f_T(f_A(X)), Y_T) + γ max_{f_B^i ∈ P̄_t} J_B(f_B^i(f_A(X)), Y_B) ].    (8)

The previous basic framework is a special case of Eq. (8) with M = 1. The ensemble strategy can be easily incorporated with restarting.

3.5.3 Incorporating Budget Model Restarting and Budget Model Ensemble with Ours-Entropy

Budget model restarting and budget model ensemble can be easily incorporated with all three optimization schemes described in Section 3.3. We take Ours-Entropy as an example here. When the model ensemble is used, we abbreviate L_B(f_B^i(f_A(X)), Y_B) and H_B(f_B^i(f_A(X))) as L_B(θ_A, θ_B^i) and H_B(θ_A, θ_B^i), respectively. The new parameter updating scheme is:

    θ_A ← θ_A − α_A ∇_{θ_A} ( L_T + γ max_{θ_B^i ∈ P̄_t} −H_B(θ_A, θ_B^i) ),    (9a)
    θ_A, θ_T ← θ_A, θ_T − α_T ∇_{(θ_A, θ_T)} L_T(θ_A, θ_T),    (9b)
    θ_B^i ← θ_B^i − α_B ∇_{θ_B^i} L_B(θ_A, θ_B^i),  ∀i ∈ {1, . . . , M}.    (9c)

That is to say, when updating the anonymization model f_A, we only suppress the model f_B^i with the largest privacy leakage −H_B, i.e., the one "most confident" about its current privacy prediction. But we still update all M budget models on the budget task. The formal description of Ours-Entropy with model restarting and ensemble is given in Algorithm 4, where {θ_B^i}_{i=1}^M is reinitialized every rstrt_iter iterations. Likewise, GRL and Ours-K-Beam can also be incorporated with restarting and ensemble, whose details are shown in Appendix A.

Algorithm 4: Ours-Entropy Algorithm (with Model Restarting and Model Ensemble)
1  Initialize θ_A, θ_T, and {θ_B^i}_{i=1}^M;
2  for t_0 ← 1 to max_iter do
3      if t_0 ≡ 0 (mod rstrt_iter) then
4          Reinitialize {θ_B^i}_{i=1}^M
5      end
6      Update θ_A using Eq. (9a)
7      while Acc(f_T(f_A(X^v)), Y_T^v) ≤ th_T do
8          Update θ_T, θ_A using Eq. (9b)
9      end
10     for i ← 1 to M do
11         while Acc(f_B^i(f_A(X^t)), Y_B^t) ≤ th_B do
12             Update θ_B^i using Eq. (9c)
13         end
14     end
15 end

3.6 Two-Step Evaluation Protocol

The solution to Eq. (2) gives an anonymization model f_A^* and a target utility task model f_T^*. We need to evaluate f_A^* and f_T^* on the trade-off they have achieved between target task utility and privacy protection in two steps: (1) whether the learned target utility task model maintains satisfactory performance on anonymized videos; (2) whether the performance of an arbitrary privacy prediction model on anonymized videos will deteriorate.
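Before detailing this protocol, the sketch below makes the training-side update of Eq. (9a) and the restart schedule of Algorithm 4 concrete. The stand-in networks, the tiny restart interval, and the omission of the L_T and L_B updates (Eqs. (9b) and (9c)) are simplifications for illustration only.

```python
import torch
import torch.nn as nn

def make_budget_model():
    return nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 13))

f_A = nn.Conv2d(3, 3, 3, padding=1)
budget_ensemble = [make_budget_model() for _ in range(4)]     # M = 4 attackers
opt_anon = torch.optim.SGD(f_A.parameters(), lr=1e-3)
gamma, rstrt_iter = 2.0, 2   # tiny restart interval so the toy loop actually restarts

def prediction_entropy(logits):
    p = torch.softmax(logits, dim=1)
    return -(p * torch.log(p + 1e-8)).sum(dim=1).mean()       # H_B averaged over the batch

for t in range(4):                                            # toy training loop
    if t > 0 and t % rstrt_iter == 0:                         # model restarting
        budget_ensemble = [make_budget_model() for _ in range(4)]
    anon = f_A(torch.randn(4, 3, 112, 112))
    # Eq. (9a): among the ensemble, find the "most confident" attacker, i.e. the one
    # with the lowest prediction entropy (largest -H_B), and update f_A to raise it.
    entropies = torch.stack([prediction_entropy(fb(anon)) for fb in budget_ensemble])
    most_confident = int(entropies.argmin())
    opt_anon.zero_grad()
    (-gamma * entropies[most_confident]).backward()           # maximize that attacker's entropy
    opt_anon.step()
    # Eq. (9b) (update f_T, f_A on L_T) and Eq. (9c) (train every budget model on its
    # own cross-entropy loss) would follow here; they are omitted for brevity.
```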
Suppose we have a training dataset X^t with target and budget task ground truth labels Y_T^t and Y_B^t, and an evaluation dataset X^e with target and budget task ground truth labels Y_T^e and Y_B^e. In the first step, when evaluating the target task utility, we follow the traditional routine: compare f_T^*(f_A^*(X^e)) with Y_T^e to get the evaluation accuracy on the target utility task, denoted as A_T, which we expect to be as high as possible. In the second step, when evaluating the privacy protection, it is insufficient to only observe that the learned f_A^* and f_B^* lead to poor classification accuracy on X^e, because of the ∀ challenge: the attacker can select any privacy budget model to steal private information from the anonymized videos f_A^*(X^e). To empirically verify that f_A^* prohibits reliable privacy prediction by other possible budget models, we propose a novel procedure:

• We randomly re-sample N privacy budget prediction models P̄_e ≜ {f_B^i}_{i=1}^N from P for evaluation. Note that these N models used in evaluation, P̄_e, have no overlap with the M privacy budget models of the training ensemble P̄_t, i.e., P̄_e ∩ P̄_t = ∅.
• We train these N models P̄_e on the anonymized training videos f_A^*(X^t) to make correct predictions on private information, i.e., min_{f_B^i} L_B(f_B^i(f_A^*(X^t)), Y_B^t), ∀ f_B^i ∈ P̄_e. Note that f_A^* is fixed during this training procedure.
• After that, we apply each f_B^i to the anonymized evaluation videos f_A^*(X^e) and compare the outputs f_B^i(f_A^*(X^e)) with Y_B^e to get the privacy budget accuracy of the i-th budget model, i.e., Acc(f_B^i(f_A^*(X^e)), Y_B^e).
• We select the highest accuracy among all N privacy budget models and use it as the final privacy budget accuracy A_B^N, which we expect to be as low as possible. Specifically, we have

    A_B^N = max_{f_B^i ∈ P̄_e} Acc(f_B^i(f_A^*(X^e)), Y_B^e).    (10)

4 SIMULATION EXPERIMENTS

We show the effectiveness of our framework on privacy-preserving action recognition on existing datasets.

Overview of Experiment Settings. The target utility task is human action recognition, since it is a highly demanded feature in smart home and smart workplace applications. Experiments are carried out on three widely used human action recognition datasets: the SBU Kinect Interaction Dataset [61], UCF101 [58], and HMDB51 [59]. The privacy budget task varies in different settings. In the SBU dataset experiments, the privacy budget is to prevent the videos from leaking human identity information. In the experiments on UCF101 and HMDB51, the privacy budget is to protect the visual privacy attributes defined in [14]. We emphasize that the general framework proposed in Section 3.2 can be used for a large variety of target utility task and privacy budget task combinations, not only the aforementioned settings.

Following the notations in Section 3.2, on all the video action recognition datasets, including SBU, UCF101, and HMDB51, we set W = 112, H = 112, C = 3, and T = 16 (C3D's required temporal length and spatial resolution). Note that the original resolutions for SBU, UCF101, and HMDB51 are 640×480, 320×240, and 320×240, respectively. We downsample video frames to a resolution of 160×120. To reduce the spatial resolution to 112×112, we use random-crop and center-crop in training and evaluation, respectively.

Baseline Approaches. We consistently use two groups of approaches as baselines across the three action recognition datasets. These two groups of baselines are naive downsamples and empirical obfuscations. The group of naive downsamples chooses downsample rates from {1, 2, 4, 8, 16}, where 1 stands for no downsampling. The group of empirical obfuscations includes approaches selected from different combinations in {box, segmentation} × {blurring, blackening} × {face, human body}. Details are listed below:

• Naive Downsample: Spatially downsample each frame.
• Box-Black-Face: Boxing and blackening faces.
• Box-Black-Body: Boxing and blackening bodies.
• Seg-Black-Face: Segmenting and blackening faces.
• Seg-Black-Body: Segmenting and blackening bodies.
• Box-Blur-Face: Boxing and blurring faces.
• Box-Blur-Body: Boxing and blurring bodies.
• Seg-Blur-Face: Segmenting and blurring faces.
• Seg-Blur-Body: Segmenting and blurring bodies.

Our Proposed Approaches. The previous two groups of baselines are compared with our three proposed approaches:

• GRL: as described in Section 3.3.1.
• Ours-K-Beam: as described in Section 3.3.2. We have tried K = 1, 2, 4, 8.
• Ours-Entropy: as described in Section 3.3.3. In the privacy budget model ensemble P̄_t, the M models are chosen from the MobileNet-V2 [62] family with different width multipliers. We have tried M = 1, 2, 4, 8.

All three approaches are evaluated with and without privacy budget model restarting.

Evaluation. In the two-step evaluation (as described in Section 3.6), we have used N = 10 different state-of-the-art classification networks, namely ResNet-V1-{50,101} [55], ResNet-V2-{50,101} [63], Inception-V1 [64], Inception-V2 [65], and MobileNet-V1-{0.25,0.5,0.75,1} [56], as P̄_e. Note that P̄_e ∩ P̄_t = ∅. All detailed numerical results reported in the following sections can be found in Appendix B.

4.1 Identity-Preserving Action Recognition on SBU: Single-Dataset Training

We compare our proposed approaches with the groups of baseline approaches to show our methods' significant superiority in balancing privacy protection and model utility. We use the three different optimization schemes described in Section 3.3 on our framework and empirically show that all three largely outperform the baseline methods. We also show that adding the model ensemble and model restarting described in Section 3.5 to the optimization procedure can further improve the performance of our method.

4.1.1 Experiment Setting

The SBU Kinect Interaction Dataset [61] is a two-person interaction dataset for video-based action recognition. 7 participants performed actions, and the dataset is composed of 21 sets. Each set uses a pair of actors to perform all 8 interactions. However, some sets use the same two actors but with the acting and reacting roles switched. For example, in set 1, actor 1 is acting and actor 2 is reacting; in set 4, actor 2 is acting and actor 1 is reacting. These two sets have the same actors, so we combine them as one class to better fit our experimental setting. In this way, we combine all sets with the same actors and finally get 13 different actor pairs. The target utility task on this dataset is action recognition, which can be taken as a classification task with 8 classes. The privacy budget task is to recognize the actor pair in a video, which can be taken as a classification task with 13 classes.

4.1.2 Implementation Details

In Algorithms 1-3, we set step sizes α_T = 10^-5, α_B = 10^-2, α_A = 10^-4, and accuracy thresholds th_T = 85%, th_B = 99%
Fig. 3 (plot omitted; x-axis: privacy attributes cMAP A_B^N (%), y-axis: action accuracy A_T (%)): The trade-off between privacy budget and action utility on the UCF101/VISPR dataset. For the Naive Downsample method, a larger marker means a larger downsampling rate is adopted. For the Ours-K-Beam method, a larger marker means a larger K (number of beams) in Algorithm 2. For Ours-Entropy and Ours-Entropy (restarting), a larger marker means a larger M (number of ensemble models) in Algorithm 4. Methods with a "+" superscript are incorporated with model restarting. Vertical and horizontal purple dashed lines indicate A_B^N and A_T on the original non-anonymized videos, respectively. The black dashed line indicates where A_B^N = A_T. Detailed experimental settings and numerical results for each method can be found in Appendix B.

4.2.3 Results and Analyses

We present the experimental results in Figure 3. All naive downsample and empirical obfuscation methods cause A_T to drop dramatically while A_B^N only drops a little, which means the utility of the videos is greatly reduced while the private information is hardly filtered out. In contrast, with the help of model restarting and model ensemble, Ours-Entropy can decrease A_B^N by 7% while keeping A_T as high as that on the original raw videos, meaning the privacy is protected at almost no cost to the utility. Hence, Ours-Entropy outperforms all naive downsample and empirical obfuscation baselines in this experiment. It also shows an advantage over GRL and Ours-K-Beam.

4.3 Anonymized Video Visualization

We provide the visualization of the anonymized videos on SBU, UCF101, and our new dataset PA-HMDB51 (see Section 5) in Figure 4. To save space, we only show the center frame of each anonymized video. The visualization shows that the privacy attributes in the anonymized videos are filtered out, but it is still possible to recognize the actions.

Fig. 4 (images omitted; columns: Original, M=1, M=1+, M=4+): The center frame of example videos before (column 1) and after (columns 2-4) applying the anonymization transform learned by Ours-Entropy. The first row shows a frame from a "pushing" video in the SBU dataset; the second row shows a frame from a "handstand" video in the UCF101 dataset; the third row shows a frame from a "push-up" video in the PA-HMDB51 dataset. Privacy attributes in the last two rows include semi-nudity, face, gender, and skin color. Model restarting and ensemble settings are indicated below each anonymized image. M is the number of ensemble models. Methods with a "+" superscript are incorporated with model restarting.

5 PA-HMDB51: A NEW BENCHMARK

5.1 Motivation

There is no public dataset in the literature containing both human action and privacy attribute labels on the same videos. This poses two challenges. Firstly, the lack of available datasets has increased the difficulty of employing a data-driven joint training method. Secondly, this complication has made it impossible to directly evaluate the trade-off between privacy budget and action utility achieved by a learned anonymization model f_A^*. To solve this problem, we annotate and present the very first human action video dataset with privacy attributes labeled, named PA-HMDB51 (Privacy-Annotated HMDB51). We evaluate our method on this newly built dataset and further demonstrate our method's effectiveness.

5.2 Selecting and Labeling Privacy Attributes

A recent work [14] has defined 68 privacy attributes that could be disclosed by images. However, most of them seldom make any appearance in public human action datasets. We carefully select, out of the 68 attributes from [14], the 5 privacy attributes that are most relevant to our smart home setting: skin color, gender, face, nudity, and personal relationship (only intimate relationships such as friends, couples, or family members are considered in our setting). The detailed description of each attribute, their possible ground truth values, and the corresponding meanings are listed in Table 1. Some annotated frames in our PA-HMDB51 dataset are shown in Table 2 as examples.

Privacy attributes may vary during a video clip. For example, in some frames we may see a person's full face, while in the next frames the person may turn around, and his/her face is no longer visible. Therefore, we decided to label all the privacy attributes on each frame (see footnote 4).

The annotation of privacy labels was manually performed by a group of students at the CSE department of Texas A&M University. Each video was annotated by at least three individuals and then cross-checked.

Footnote 4: A tiny portion of frames in some HMDB51 videos do not contain any person. No privacy attributes are annotated on those frames.
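For concreteness, one possible in-memory representation of such a per-frame annotation is sketched below, using the integer codes later listed in Table 1 and mirroring the first example of Table 2. The field names and the JSON-like layout are assumptions for illustration and not necessarily the schema of the released annotation files.

```python
# Hypothetical record layout: one action label per video, and one privacy-attribute
# code vector per frame (codes follow Table 1). Frames with no person carry no labels.
example_annotation = {
    "video": "brush_hair_0001",   # hypothetical clip identifier
    "action": "brush hair",
    "frames": [
        {"idx": 0, "skin_color": 1, "face": 0, "gender": 2, "nudity": 2, "relationship": 0},
        {"idx": 1, "skin_color": 1, "face": 1, "gender": 2, "nudity": 2, "relationship": 0},
    ],
}

def attribute_histogram(annotation, attribute):
    """Count how often each code of one privacy attribute occurs across the frames,
    which is the kind of frame-level statistic reported later in Fig. 6."""
    counts = {}
    for frame in annotation["frames"]:
        counts[frame[attribute]] = counts.get(frame[attribute], 0) + 1
    return counts

print(attribute_histogram(example_annotation, "face"))   # {0: 1, 1: 1}
```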
5.3 HMDB51 as the Data Source

Now that we have defined the 5 privacy attributes, we need to identify a source of human action videos for annotation. There are a number of choices available, such as [58], [59], [67]–[69]. We choose HMDB51 [59] to label privacy attributes since it contains more diverse private information, especially nudity/semi-nudity and relationship.

We provide a per-frame annotation of the selected 5 privacy attributes on 515 videos selected from HMDB51. In this paper, we treat all 515 videos as testing samples (see footnote 5). Our ultimate goal would be to create a larger-scale version of PA-HMDB51 that allows for both training and testing coherently on the same benchmark. For now, we use PA-HMDB51 to facilitate better testing, while still considering cross-dataset training as a rough yet useful option to train privacy-preserving video recognition (before the larger dataset becomes available).

5.4 Dataset Statistics

5.4.1 Action Distribution

When selecting videos from the HMDB51 dataset, we consider two criteria on action labels. First, the action labels should be balanced. Second (and more implicitly), we select more videos with non-trivial privacy labels. For example, the "brush hair" action contains many videos with a "semi-nudity" attribute, and the "pull-up" action contains many videos with a "partially visible face" attribute. Despite their practical importance, these privacy attributes are relatively rare in the entire HMDB51 dataset, so we tend to select more videos with these attributes, regardless of their action classes. The resultant distribution of action labels is depicted in Figure 5 (left panel), showing a relative class balance.

Fig. 5 (plots omitted): Left: action distribution of PA-HMDB51. Each bar shows the number of videos with a certain action; e.g., the last bar shows there are 25 "brush hair" videos in the PA-HMDB51 dataset. Right: action-attribute correlation in the PA-HMDB51 dataset. The x-axis lists all possible values, grouped by bracket, for each privacy attribute; the y-axis lists the different action types. The color represents the ratio of the number of frames of an action annotated with a specific privacy attribute value w.r.t. the total number of frames of that action.

5.4.2 Privacy Attribute Distribution

We try to make the label distribution for each privacy attribute as balanced as possible by manually selecting for labeling those videos that contain uncommon privacy attribute values in the original HMDB51. For instance, videos with semi-nudity are overall uncommon, so we deliberately select videos containing semi-nudity into our PA-HMDB51 dataset. Naturally, people are reluctant to release data with privacy concerns to the public, so the privacy attributes are highly unbalanced in any public video dataset. Although we have used this method to reduce the data imbalance, PA-HMDB51 is still unbalanced. Frame-level label distributions of all 5 privacy attributes are shown in Figure 6.

Fig. 6: Label distribution per privacy attribute in PA-HMDB51. SC, RL, FC, ND, and GR stand for skin color, relationship, face, nudity, and gender, respectively. The rounded ratios are given in % scale. Definitions of the label values (0, 1, 2, 3, 4) for each attribute are described in Table 1.

        0    1    2    3    4
  SC    3   71   14   10    2
  RL   84   16
  FC   25   38   37
  ND   40   44   15
  GR    2   55   36    7

5.4.3 Action-Attribute Correlation

If there were a strong correlation between a privacy attribute and an action, it would be harder to remove the private information from the videos without much harm to the action recognition task. For example, we would expect a high correlation between the attribute "gender" and the action "brush hair", since this action is carried out much more often by females than by males. We show the correlation between privacy attributes and actions in Figure 5 (right panel) and more details in Appendix D.

Footnote 5: Labeling per-frame privacy attributes on a video dataset is extremely labor-consuming and subjective (needing individual labeling and then cross-checking). As a result, the current size of PA-HMDB51 is limited. So far, we have only used PA-HMDB51 as the testing set, and we seek to annotate more data and hopefully expand PA-HMDB51 for training as future work.

Footnote 6: For "skin color" and "gender," we allow multiple labels to coexist. For example, if a frame showed a black person shaking hands with a white person, we would label both "black" and "white" for the "skin color" attribute. In the visualization, we use "coexisting" to represent the multi-label coexistence and do not show in detail whether it is "white and black coexisting" or "black and yellow coexisting." For the remaining three attributes, we label each attribute using the highest privacy-leakage risk among all persons in the frame. E.g., given a frame where a group of people are hugging, if there is at least one complete face visible, we would label the "face" attribute as "completely visible."
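As a small worked example of the statistic shown in the right panel of Figure 5, the sketch below computes, for one attribute, the fraction of an action's frames carrying each label value. The toy records and the handling of only single-label values (ignoring the coexisting cases of footnote 6) are simplifying assumptions.

```python
from collections import Counter, defaultdict

# Toy per-frame records of the form (action, {attribute: code}); in practice these
# would be read from the PA-HMDB51 annotations.
frames = [("brush hair", {"gender": 2}),
          ("brush hair", {"gender": 2}),
          ("situp", {"gender": 1})]

def action_attribute_ratio(frames, attribute):
    """Fraction of each action's frames annotated with each value of one privacy
    attribute, i.e. the quantity visualized in the right panel of Fig. 5."""
    per_action, totals = defaultdict(Counter), Counter()
    for action, attrs in frames:
        totals[action] += 1
        per_action[action][attrs[attribute]] += 1
    return {action: {value: count / totals[action] for value, count in counts.items()}
            for action, counts in per_action.items()}

print(action_attribute_ratio(frames, "gender"))
# {'brush hair': {2: 1.0}, 'situp': {1: 1.0}}
```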
TABLE 1: Attribute definition in the PA-HMDB51 dataset

Skin Color
  0: Skin color of the person(s) is unidentifiable.
  1: Skin color of the person(s) is white.
  2: Skin color of the person(s) is brown/yellow.
  3: Skin color of the person(s) is black.
  4: Persons with different skin colors are coexisting (see footnote 6).
Face
  0: Invisible (< 10% of the area is visible).
  1: Partially visible (≥ 10% and ≤ 70% of the area is visible).
  2: Completely visible (> 70% of the area is visible).
Gender
  0: The gender(s) of the person(s) is/are unidentifiable.
  1: The person(s) is/are male.
  2: The person(s) is/are female.
  3: Persons of different genders are coexisting.
Nudity
  0: No nudity, with long sleeves and pants.
  1: Partial nudity, with short sleeves, skirts, or shorts.
  2: Semi-nudity, with a half-naked body.
Relationship
  0: Personal relationship is unidentifiable.
  1: Personal relationship is identifiable.

TABLE 2: Examples of the annotated frames in the PA-HMDB51 dataset (frame images omitted)

Action: Brush hair. Privacy attributes: skin color: white; face: invisible; gender: female; nudity: semi-nudity; relationship: unrevealed.
Action: Situp. Privacy attributes: skin color: black; face: completely visible; gender: male; nudity: semi-nudity; relationship: unrevealed.

5.5 Benchmark Results on PA-HMDB51: Cross-Dataset Training

5.5.1 Experiment Setting

We train our models using cross-dataset training on the HMDB51 and VISPR datasets as we did in Section 4.2, except that we use the 5 attributes defined in Table 1 on VISPR instead of the 7 used in Section 4.2. The trained models are directly evaluated on the PA-HMDB51 dataset (see footnote 7) for both the target utility task T and the privacy budget task B, without any re-training or adaptation. We exclude the videos in PA-HMDB51 from HMDB51 to get the training set. Similar to the UCF101 experiments, the target utility task T (i.e., action recognition) can be taken as a video classification problem with 51 classes, and the privacy budget task B (i.e., privacy attribute prediction) can be taken as a multi-label image classification task with two classes for each privacy attribute label. Notably, although PA-HMDB51 provides concrete multi-class labels with specific privacy attribute classes, we convert them into binary labels during testing. For example, for the "gender" attribute, we have provided the ground truth labels "male," "female," "coexisting," and "cannot tell," but we only use "can tell" and "cannot tell" in our experiments, by combining "male," "female," and "coexisting" into the one class "can tell." This is because we must keep the testing protocol on PA-HMDB51 consistent with the training protocol on VISPR (a multi-label, "either-or" type binary classification task), so that our models cross-trained on UCF101-VISPR can be evaluated directly. We hope to extend training to PA-HMDB51 in the future so that the privacy budget task can be formulated and evaluated as a multi-label classification problem.

All implementation details are identical to the UCF101 case, except that we adjust th_T = 0.7 and th_B = 0.95.

Footnote 7: We only use PA-HMDB51 as the testing set so far, since the current size of PA-HMDB51 is limited for training.

5.5.2 Results and Analysis

The results on PA-HMDB51 are shown in Figure 7. Our methods achieve a much better trade-off between privacy budget and action utility compared with the baseline methods. When M = 4, our methods can decrease privacy cMAP by around 8% with little harm to utility accuracy. Overall, the privacy gains are more limited compared to the previous two experiments, because no (re-)training is performed; but the overall comparison shows the same consistent trends.

Asymmetrical Privacy Attribute Protection Cost. Different privacy attributes have different protection costs. After applying the learned anonymization optimized by Ours-Entropy (restarting, M=4) on PA-HMDB51, the drop in AP of "face" is much more significant than that of "gender," which indicates that the "gender" attribute is much harder to suppress than "face." Such an observation agrees with the fact that the gender attribute can be revealed by face, body, clothing, and even hairstyle. In future work, we will take such cost asymmetry into account by using a weighted loss combination of different privacy attributes or by training a dedicated privacy protector for the most informative private attribute.

Human Study on the Privacy Protection of Our Learned Anonymization. We use a human study to evaluate the trade-off between privacy budget and action utility achieved by our learned anonymization transform. We take both privacy protection and action recognition into account in the study. We emphasize here that both privacy protection and action recognition are evaluated at the video level. There are 515 videos distributed over 51 actions in PA-HMDB51. For each action in PA-HMDB51, we randomly pick one video for the human study. Among the 51 selected videos, we only keep 30 videos to reduce the human evaluation cost. There were 40 volunteers involved in the human study. In the study, they were asked to label all the privacy attributes and the action type on the raw videos and on the anonymized videos. According to the experimental results (shown in Appendix E), the actions in the anonymized videos are still distinguishable to humans, but the privacy attributes are not recognizable at all. This human study further justifies that our learned anonymization transform can protect privacy and maintain target utility task performance simultaneously.

6 CONCLUSION

We propose an innovative framework to address the newly-established problem of privacy-preserving action recognition. To tackle the challenging adversarial learning process, we investigate three different optimization schemes. To further tackle the ∀ challenge of universal privacy protection,
[34] F. Pittaluga, S. Koppal, and A. Chakrabarti, “Learning privacy preserving encodings through adversarial training,” in WACV, 2019.
[35] M. Bertran, N. Martinez, A. Papadaki, Q. Qiu, M. Rodrigues, G. Reeves, and G. Sapiro, “Adversarially learned representations for information obfuscation and inference,” in ICML, 2019.
[36] P. C. Roy and V. N. Boddeti, “Mitigating information leakage in image representations: A maximum entropy approach,” in CVPR, 2019.
[37] B. H. Zhang, B. Lemoine, and M. Mitchell, “Mitigating unwanted biases with adversarial learning,” in AIES, 2018.
[38] Z. Ren, Y. Jae Lee, and M. S. Ryoo, “Learning to anonymize faces for privacy preserving action detection,” in ECCV, 2018.
[39] R. R. Shetty, M. Fritz, and B. Schiele, “Adversarial scene editing: Automatic object removal from weak supervision,” in NeurIPS, 2018.
[40] T. Wang, J. Zhao, M. Yatskar, K.-W. Chang, and V. Ordonez, “Balanced datasets are not enough: Estimating and mitigating gender bias in deep image representations,” in ICCV, 2019.
[41] W. Oleszkiewicz, P. Kairouz, K. Piczak, R. Rajagopal, and T. Trzciński, “Siamese generative adversarial privatizer for biometric data,” in ACCV, 2018.
[42] Y. Li, N. Vishwamitra, B. P. Knijnenburg, H. Hu, and K. Caine, “Blur vs. block: Investigating the effectiveness of privacy-enhancing obfuscation for images,” in CVPRW, 2017.
[43] S. J. Oh, R. Benenson, M. Fritz, and B. Schiele, “Faceless person recognition: Privacy implications in social media,” in ECCV, 2016.
[44] R. McPherson, R. Shokri, and V. Shmatikov, “Defeating image obfuscation with deep learning,” arXiv, 2016.
[45] S. J. Oh, M. Fritz, and B. Schiele, “Adversarial image perturbation for privacy protection: A game theory perspective,” in ICCV, 2017.
[46] D. Gurari, Q. Li, C. Lin, Y. Zhao, A. Guo, A. Stangl, and J. P. Bigham, “VizWiz-Priv: A dataset for recognizing the presence and purpose of private visual information in images taken by blind people,” in CVPR, 2019.
[47] J. Hamm and Y.-K. Noh, “K-beam minimax: Efficient optimization for deep adversarial learning,” in ICML, 2018.
[48] Y. Ganin and V. Lempitsky, “Unsupervised domain adaptation by backpropagation,” in ICML, 2015.
[49] X. Xiang and T. D. Tran, “Linear disentangled representation learning for facial actions,” TCSVT, 2018.
[50] G. Desjardins, A. Courville, and Y. Bengio, “Disentangling factors of variation via generative entangling,” arXiv, 2012.
[51] A. Gonzalez-Garcia, J. van de Weijer, and Y. Bengio, “Image-to-image translation for cross-domain disentanglement,” arXiv, 2018.
[52] S. Reddy, I. Labutov, S. Banerjee, and T. Joachims, “Unbounded human learning: Optimal scheduling for spaced repetition,” in SIGKDD, 2016.
[53] J. Johnson, A. Alahi, and L. Fei-Fei, “Perceptual losses for real-time style transfer and super-resolution,” in ECCV, 2016.
[54] D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri, “Learning spatiotemporal features with 3D convolutional networks,” in ICCV, 2015.
[55] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in CVPR, 2016.
[56] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, “MobileNets: Efficient convolutional neural networks for mobile vision applications,” arXiv, 2017.
[57] D.-Z. Du and P. M. Pardalos, Minimax and Applications, 2013.
[58] K. Soomro, A. R. Zamir, and M. Shah, “UCF101: A dataset of 101 human actions classes from videos in the wild,” arXiv, 2012.
[59] H. Kuehne, H. Jhuang, E. Garrote, T. Poggio, and T. Serre, “HMDB: A large video database for human motion recognition,” in ICCV, 2011.
[60] B. Zoph, V. Vasudevan, J. Shlens, and Q. V. Le, “Learning transferable architectures for scalable image recognition,” in CVPR, 2018.
[61] K. Yun, J. Honorio, D. Chattopadhyay, T. L. Berg, and D. Samaras, “Two-person interaction detection using body-pose features and multiple instance learning,” in CVPRW, 2012.
[62] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, “MobileNetV2: Inverted residuals and linear bottlenecks,” in CVPR, 2018.
[63] K. He, X. Zhang, S. Ren, and J. Sun, “Identity mappings in deep residual networks,” in ECCV, 2016.
[64] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” in CVPR, 2015.
[65] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the inception architecture for computer vision,” in CVPR, 2016.
[66] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in ICLR, 2015.
[67] F. Caba Heilbron, V. Escorcia, B. Ghanem, and J. C. Niebles, “ActivityNet: A large-scale video benchmark for human activity understanding,” in CVPR, 2015.
[68] C. Gu, C. Sun, D. A. Ross, C. Vondrick, C. Pantofaru, Y. Li, S. Vijayanarasimhan, G. Toderici, S. Ricco, R. Sukthankar et al., “AVA: A video dataset of spatio-temporally localized atomic visual actions,” in CVPR, 2018.
[69] W. Kay, J. Carreira, K. Simonyan, B. Zhang, C. Hillier, S. Vijayanarasimhan, F. Viola, T. Green, T. Back, P. Natsev et al., “The Kinetics human action video dataset,” arXiv, 2017.

Zhenyu Wu received the M.S. and B.E. degrees from the Ohio State University and Shanghai Jiao Tong University, in 2017 and 2015 respectively. He is currently a Ph.D. student at Texas A&M University, advised by Prof. Zhangyang Wang. His research interests include privacy/fairness in machine learning, efficient vision, object detection, and adversarial learning.

Haotao Wang received the B.E. degree in EE from Tsinghua University, China, in 2018. He is working toward a Ph.D. degree at the University of Texas at Austin, under the supervision of Prof. Zhangyang Wang. His research interests include computer vision and machine learning, especially fairness/privacy in machine learning, adversarial robustness, and model compression.

Zhaowen Wang received the B.E. and M.S. degrees from Shanghai Jiao Tong University, China, in 2006 and 2009 respectively, and the Ph.D. degree in ECE from UIUC in 2014. He is currently a Senior Research Scientist with the Creative Intelligence Lab, Adobe Inc. His research focuses on understanding and enhancing images, videos and graphics via machine learning algorithms, with a particular interest in sparse coding and deep learning.

Hailin Jin is a Senior Principal Scientist at Adobe Research. He received his M.S. and Ph.D. in EE from WUSTL in 2000 and 2003. Between fall 2003 and fall 2004, he was a postdoc researcher at the CS Department, UCLA. His current research interests include deep learning, computer vision, and natural language processing. His work can be found in many Adobe products, including Photoshop, After Effects, Premiere Pro, and Photoshop Lightroom.

Zhangyang Wang is currently an Assistant Professor of ECE at UT Austin. He was an Assistant Professor of CSE at TAMU from 2017 to 2020. He received his Ph.D. in ECE from UIUC in 2016, and his B.E. in EEIS from USTC in 2012. Prof. Wang is broadly interested in the fields of machine learning, computer vision, optimization, and their interdisciplinary applications. His latest interests focus on automated machine learning (AutoML), learning-based optimization, machine learning robustness, and efficient deep learning.
APPENDIX A
GRL AND OURS-K-BEAM WITH MODEL RESTARTING

In this section, we provide the formal descriptions of the GRL and Ours-K-Beam algorithms with model restarting in Algorithm 5 and Algorithm 6, respectively.
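To give a concrete feel for the restarting strategy, the following is a minimal PyTorch-style sketch of the restarting idea only, not a transcription of Algorithm 5 or 6: the toy models, the single adversarial objective (target loss minus budget loss), and the hyper-parameter restart_period are placeholders introduced purely for illustration.

```python
# Illustrative sketch of adversarial training with periodic restarting of the
# privacy-budget model. fA anonymizes the input, fT is the target (action) model,
# and fB is the budget (attacker) model; all three are toy stand-ins here.
import torch
import torch.nn as nn

def make_budget_model():
    # Toy stand-in for a privacy-attribute classifier.
    return nn.Sequential(nn.Flatten(), nn.Linear(16, 2))

fA = nn.Sequential(nn.Conv2d(1, 1, 3, padding=1))    # anonymizer (toy)
fT = nn.Sequential(nn.Flatten(), nn.Linear(16, 5))   # target-task model (toy)
fB = make_budget_model()                             # budget/attacker model

opt_AT = torch.optim.Adam(list(fA.parameters()) + list(fT.parameters()), lr=1e-4)
opt_B = torch.optim.Adam(fB.parameters(), lr=1e-4)
ce = nn.CrossEntropyLoss()
restart_period = 100  # assumed hyper-parameter: restart fB every 100 iterations

for step in range(1000):
    x = torch.randn(8, 1, 4, 4)        # dummy batch of frames
    y_t = torch.randint(0, 5, (8,))    # dummy action labels
    y_b = torch.randint(0, 2, (8,))    # dummy privacy labels

    # (1) Update fA and fT: minimize the target loss while maximizing the budget loss.
    x_anon = fA(x)
    loss = ce(fT(x_anon), y_t) - ce(fB(x_anon), y_b)
    opt_AT.zero_grad(); loss.backward(); opt_AT.step()

    # (2) Update fB: keep the attacker strong on the anonymized inputs.
    loss_b = ce(fB(fA(x).detach()), y_b)
    opt_B.zero_grad(); loss_b.backward(); opt_B.step()

    # (3) Model restarting: periodically re-initialize fB so that fA does not
    #     overfit to defeating one particular, weakened attacker.
    if (step + 1) % restart_period == 0:
        fB = make_budget_model()
        opt_B = torch.optim.Adam(fB.parameters(), lr=1e-4)
```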
APPENDIX B
DETAILED NUMERICAL RESULTS

In Table 3, we provide the detailed numerical results reported in Figures 2, 3, and 7.

TABLE 3: Detailed numerical results of all the experiments. A_T stands for the target utility task (action recognition) accuracy, while A_B stands for the privacy budget prediction performance (accuracy in the classification task and cMAP in the multi-label classification task). r is the sampling rate for the downsampling baselines. {box: X, segmentation: S} × {blurring: B, blackening: K} × {face: F, human body: D} are the different empirical obfuscation baselines. K is the number of different sets of budget model parameters tracked by Ours-K-Beam. M is the number of ensemble budget models used by Ours-Entropy. Methods with a “+” superscript incorporate model restarting.

APPENDIX D
MORE STATISTICS ON PA-HMDB51 DATASET

Action Distribution. The distribution of action labels in PA-HMDB51 (as discussed in Section 5.4.3 of the main paper) is depicted in Figure 8, showing a relatively balanced class distribution.

Fig. 8: Action distribution of PA-HMDB51. Each column shows the number of videos with a certain action. For example, the first column shows that there are 25 “brush hair” videos in the PA-HMDB51 dataset.

Action-Attribute Correlation. We show the correlation between privacy attributes and actions (as discussed in Section 5.4.3 of the main paper) in Figure 9.
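As a minimal sketch of how such a per-action ratio can be computed from frame-level annotations, the snippet below assumes a hypothetical tabular layout (one row per frame and attribute); it is not the released PA-HMDB51 annotation format.

```python
# Illustrative sketch: for each (action, attribute value) pair, compute the fraction
# of that action's frames carrying the value, i.e., the quantity shown in Figure 9.
import pandas as pd

# Hypothetical frame-level records: one row per (frame, attribute).
frames = pd.DataFrame({
    "action":    ["kiss", "kiss", "kiss", "golf"],
    "frame_id":  [0, 1, 2, 0],
    "attribute": ["relationship", "relationship", "relationship", "relationship"],
    "value":     ["identifiable", "identifiable", "unidentifiable", "unidentifiable"],
})

counts = (frames.groupby(["action", "attribute", "value"])["frame_id"]
                .count().rename("n_frames").reset_index())
totals = frames.groupby(["action", "attribute"])["frame_id"].count().rename("n_total")
ratio = counts.join(totals, on=["action", "attribute"])
ratio["ratio"] = ratio["n_frames"] / ratio["n_total"]
print(ratio)  # e.g., ("kiss", "relationship", "identifiable") -> 2/3 of the kiss frames
```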
APPENDIX E
HUMAN STUDY ON OUR LEARNED ANONYMIZATION TRANSFORM

We use a human study to evaluate the privacy-utility trade-off achieved by our learned anonymization transform f_A^*. We take both privacy protection and action recognition into account in the study. We emphasize that both privacy protection and action recognition are evaluated at the video level.

Experiment Setting. There are 515 videos distributed over the 51 actions in PA-HMDB51. For each action, we randomly pick one video for the human study. Among the 51 selected videos, we keep only 30 to reduce the human evaluation cost; the center frames of these 30 videos are shown in Figure 10. Forty volunteers participated in the study. They were asked to label all the privacy attributes and the action type on both the raw videos and the anonymized videos. The guideline for labeling the privacy attributes is listed below:
• gender: the person(s)’ gender can be told;
• nudity: the person(s) is/are in semi-nudity (wearing shorts/skirts or naked to the waist);
• relationship: relationships (such as friends, couples, etc.) between/among the actors/actresses can be told;
• face: more than 10% of the face is visible;
• skin color: the skin color of the person(s) can be told;
• no privacy attributes found: you cannot tell any privacy attribute.

TABLE 4: Human study on the raw videos and the anonymized videos. A random-guess baseline is also provided. “P,” “R,” and “F1” stand for precision, recall, and F1-score.

              |   Raw Videos     | Anonymized Videos |   Random Guess
              |  P     R     F1  |  P     R     F1   |  P     R     F1
Skin Color    | 0.98  1.00  0.99 | 0.98  0.12  0.21  | 0.51  0.94  0.66
Face          | 0.97  0.99  0.98 | 0.94  0.40  0.56  | 0.47  0.66  0.55
Gender        | 0.98  1.00  0.99 | 0.94  0.36  0.52  | 0.49  0.91  0.64
Nudity        | 0.99  0.99  0.99 | 0.61  0.09  0.16  | 0.48  0.51  0.49
Relationship  | 0.97  0.88  0.92 | 0.47  0.39  0.43  | 0.49  0.14  0.22
Micro-Avg     | 0.98  0.99  0.98 | 0.86  0.25  0.39  | 0.49  0.64  0.55
Macro-Avg     | 0.98  0.97  0.97 | 0.79  0.27  0.38  | 0.49  0.63  0.51
Weighted-Avg  | 0.98  0.99  0.98 | 0.88  0.25  0.37  | 0.49  0.64  0.52
Samples-Avg   | 0.98  0.99  0.98 | 0.45  0.24  0.30  | 0.49  0.61  0.51

All experimental results of multi-label privacy attribute prediction in the human study are shown in Table 4, which reports the results on the raw videos, on the anonymized videos, and for random guess. We use the same notations as in the main paper: A_T stands for the action recognition accuracy, and A_B stands for the macro-F1 score of multi-label privacy attribute prediction. We use A_T^r, A_T^a, and A_T^g to denote the action recognition accuracy on the raw videos, on the anonymized videos, and by random guess, and likewise A_B^r, A_B^a, and A_B^g for the macro-F1 score of privacy attribute prediction. A_B^a (0.38) is much lower than A_B^r (0.97) and close to A_B^g (0.51), justifying the strong privacy protection achieved by our learned f_A^*. A_T^a (0.5783) is comparable to A_T^r (0.9616) and significantly higher than A_T^g (0.0333), justifying the good target utility preservation of our learned f_A^*.
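For clarity on how the micro, macro, weighted, and samples averages in Table 4 can be obtained, the snippet below is a small scikit-learn sketch over dummy multi-label matrices; the labels shown are placeholders, not the actual human-study responses.

```python
# Illustrative sketch: multi-label precision/recall/F1 with the four averaging modes
# used in Table 4, computed over the five privacy attributes.
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

attributes = ["skin color", "face", "gender", "nudity", "relationship"]
# Rows: videos; columns: binary presence of each attribute (dummy values).
y_true = np.array([[1, 1, 1, 0, 1],
                   [1, 1, 1, 1, 0],
                   [1, 0, 1, 0, 0]])
y_pred = np.array([[1, 1, 0, 0, 1],
                   [0, 1, 1, 1, 0],
                   [1, 0, 0, 0, 1]])

for avg in ["micro", "macro", "weighted", "samples"]:
    p, r, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average=avg, zero_division=0)
    print(f"{avg:>8}-Avg  P={p:.2f}  R={r:.2f}  F1={f1:.2f}")
```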
Fig. 9: Action-attribute correlation in the PA-HMDB51 dataset (five subplots: face, nudity, gender, skin color, and relationship). The color represents the ratio of the number of frames of a given action containing a specific privacy attribute value to the total number of frames of that action. For example, in the “relationship” subplot, the intersection of the row “identifiable” and the column “kiss” shows the percentage of frames with an “identifiable relationship” label among all “kiss” frames.

Fig. 10: Example frames from PA-HMDB51 used in the human study.
Fig. 11: Privacy attribute prediction on selected frames from UCF101 and HMDB51 (panels: “ApplyLipStick,” “HandStand,” “Kiss,” “Pullup,” “Pushup,” “ShavingBeard,” “Sit,” and “Sit-up”). In each example, the overlaid red text lines denote the privacy attributes (as defined in the VISPR dataset [14]) predicted by the privacy prediction model pretrained on VISPR, showing a high risk of privacy leakage in videos recording daily activities. The common privacy attributes in daily activities include “approximate age,” “approximate weight,” “hair color,” “skin color,” “partial face,” “complete face,” “race,” “semi-nudity,” “gender,” “personal relationship,” and so on.
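As a rough sketch of the kind of per-frame, multi-label prediction visualized in Figure 11, the snippet below runs a generic, randomly initialized backbone with a sigmoid multi-label head over a single frame; it is only a stand-in for the VISPR-pretrained privacy predictor, and the attribute names and the 0.5 threshold are assumptions for illustration.

```python
# Illustrative sketch: per-frame multi-label privacy attribute prediction.
import torch
import torch.nn as nn
from torchvision import models

attributes = ["semi-nudity", "gender", "partial face", "skin color", "relationship"]

backbone = models.resnet18()  # randomly initialized generic backbone (not VISPR-pretrained)
backbone.fc = nn.Linear(backbone.fc.in_features, len(attributes))  # multi-label head
backbone.eval()

frame = torch.randn(1, 3, 224, 224)  # a dummy preprocessed video frame
with torch.no_grad():
    scores = torch.sigmoid(backbone(frame))[0]  # independent per-attribute scores

predicted = [a for a, s in zip(attributes, scores) if s > 0.5]
print("predicted privacy attributes:", predicted)
```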