0% found this document useful (0 votes)

14 views14 pages

Deduplication 2023

Uploaded by

Rajan Subbarao

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

14 views14 pages

Deduplication 2023

Uploaded by

Rajan Subbarao

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 14

This article has been accepted for publication in a future issue of this journal, but has not been

fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TDSC.2022.3141521, IEEE
Transactions on Dependable and Secure Computing
1

VeriDedup: A Verifiable Cloud Data

Deduplication Scheme with Integrity and
Duplication Proof
Xixun Yu, Hui Bai, Zheng Yan, Senior Member, IEEE, Rui Zhang, Member, IEEE

Abstract—Data deduplication is a technique to eliminate duplicate data in order to save storage space and enlarge upload bandwidth,
which has been applied by cloud storage systems. However, a cloud storage provider (CSP) may tamper user data or cheat users to
pay unused storage for duplicate data that are only stored once. Although previous solutions adopt message-locked encryption along
with Proof of Retrievability (PoR) to check the integrity of deduplicated encrypted data, they ignore proving the correctness of
duplication check during data upload and require the same file to be derived into same verification tags, which suffers from brute-force
attacks and restricts users from flexibly creating their own individual verification tags. In this paper, we propose a verifiable
deduplication scheme called VeriDedup to address the above problems. It can guarantee the correctness of duplication check and
support flexible tag generation for integrity check over encrypted data deduplication in an integrative way. Concretely, we propose a
novel Tag-flexible Deduplication-supported Integrity Check Protocol (TDICP) based on Private Information Retrieval (PIR) by
introducing a novel verification tag called note set, which allows multiple users holding the same file to generate their individual
verification tags and still supports tag deduplication at the CSP. Furthermore, we make the first attempt to guarantee the correctness of
data duplication check by introducing a novel User Determined Duplication Check Protocol (UDDCP) based on Private Set Intersection
(PSI), which can resist a CSP from providing a fake duplication check result to users. Security analysis shows the correctness and
soundness of our scheme. Simulation studies based on real data show the efficacy and efficiency of our proposed scheme and its
significant advantages over prior arts.

Index Terms—Integrity Check, Duplication Check, Private Information Retrieval, Data Deduplication, Cloud Computing, Verifiable
Computation

1 I NTRODUCTION
LOUD computing has become a popular information technol- trusted CSP may modify, tamper or delete the uploaded data driven
C ogy service by providing huge amount of resources (e.g.,
storage and computing) to end users based on their demands.
by some profits. The damage of deduplicated data could cause
huge loss to all related users (e.g., data owners and holders). Thus,
Among all cloud computing services, cloud storage is the most the integrity of the data stored at the cloud should be verified,
popular. Since the volume of data in the world is increasing especially for duplicate data storage with deduplication.
rapidly, saving cloud storage becomes essential. One of the Several Proof of Retrievability (PoR) schemes [4]–[9] have
key reasons that causes storage waste is duplicate data storage. been proposed to address the issue of integrity check on cloud data
Multiple users may save same files or different files containing storage in recent decade. In such schemes, a user adds verification
same pieces of data blocks at the cloud. Obviously, duplicate data tags along with a file. During the verification, the user creates a
storage at the cloud introduces a big waste of storage resources. random challenge and sends it to the CSP, the CSP has to use
Data deduplication [1]–[3] provides a promising solution to this all the data in user’s corresponding files it stored as inputs to
issue. In a deduplication scheme, the CSP can cooperate with the compute a response back to the user. The user then checks the
cloud user to first check whether a pending uploaded file has been integrity of the stored file by verifying the response. However,
saved already or not, and then provide the user whose pieces of existing PoR solutions mainly aim to improve the performance at
file data are checked duplicate a way to access the file without the user side and assume that the CSP has infinite computation
storing another copy at the cloud. and storage resources. While, in practice, the CSP performs data
However, since the CSP cannot be fully trusted, the cloud users deduplication in order to achieve the most economic usage of
may suffer from some security and privacy issues. Notably, a semi- its storage. Unfortunately, existing solutions mentioned above are
incompatible with deduplication. This is because the verification
tags of these schemes are created with user individual private
• X. X. Yu and H. Bai is with the State Key Lab on Integrated Services
Networks, School of Cyber Engineering, Xidian University, Xi’an, 710071,
keys unknown to each other, thus different verification tags are
China. Email: xxyu, baih@stu.xidian.edu.cn. generated, given the same file held by different users. But these
• Z. Yan (corresponding author) is with the State Key Lab on Integrated verification tags cannot be deduplicated at the CSP as shown in
Services Networks, School of Cyber Engineering, Xidian University, Xi’an, Fig. 1(a).
710071, China and with the Department of Communications and Network-
ing, Aalto University, Espoo, 02150, Finland. Email: zyan@xidian.edu.cn Message-locked PoR [10], [11] provides a promising solution
• R. Zhang is with the Department of Computer and Information Sciences, to check data integrity when performing deduplication. It derives
University of Delaware, Newark, DE, 19716, USA. a same file into a same verification tag based on message-locked
Email: ruizhang@udel.edu.
encryption technique as shown in Fig. 1(b). However, such design

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TDSC.2022.3141521, IEEE
Transactions on Dependable and Secure Computing
2

TABLE 1
A game over duplication check between users and CSP

Duplication check The actual situation of Weather the result Reason

result of the CSP the CSP needs to be verified
Duplicated. Don’t CSP stores the file. If CSP pretends that it owns the file, when a user requests the file, the CSP is
No.
upload file. CSP has not stored the file. unable to provide it. This kind of cheat is easily figured out.
(cheat)
Unduplicated. Please CSP stores the file. (cheat) The user is cheated by CSP to upload the file that has already been stored and
Yes.
upload file. CSP has not stored the file. pays normal storage fee without a deserved discount due to deduplication.

new verification tag called note set in which each note is a

randomized bit sequence that is conform to a function f . The
note set is inserted into the files based on Private Information
Retrieval (PIR). TDICP allows the users to create their own indi-
vidual verification tags to check data integrity over the CSP with
(a) PoR schemes
deduplication compatibility. Meanwhile, the UDDCP explores
a new challenge and response mechanism based on Private Set
Intersection (PSI) to let the user instead of the CSP tell whether the
file is duplicate first, so that the CSP cannot cheat the user on the
result of duplication check during file upload. VeriDedup is built
upon our previous scheme [14], which offers such functionalities
as deduplication over ciphertext, Proof of Ownership (PoW) and
key assignment by employing proxy re-encryption (PRE). While,
in this paper, we focus on integrity check and duplication check
that are ignored in [14]. Thus, we assumed the functionalities of
(b) ML-PoR schemes
PoW and encrypted data deduplication are available and are not
the focus of this paper.
Fig. 1. Previous PoR schemes Specifically, the main contributes of this paper are summarized
as below:
• We propose a novel protocol named TDICP based on PIR
restricts the users from creating their own individual tags with
to check the integrity of uploaded files in the CSP with
their private keys. Practically, we expect an effective method that
deduplication employed. TDICP allows users to generate
can check data integrity with the support of deduplication where
their own individual verification tags for integrity check
each user can generate its own individual verification tags from its
while the verification tags can also be deduplicated at the
private key against brute-force attacks.
CSP although different.
Another security issue ignored by the previous literature is
• We propose another novel protocol named UDDCP to
the correctness guarantee of data duplication check provided by
guarantee the correctness of duplication check based on
the CSP. Several schemes [12], [13] motivate the CSP to perform
PSI, so that the CSP is impossible to cheat the user to pay
deduplication, but ignore that the CSP could cheat the users by
for unused storage space due to deduplication.
providing a fake duplication check result. The reason is simple
• We construct a novel deduplication scheme called VeriD-
since the CSP can gain an extra profit by asking the users to pay
edup that contains the above two novel protocols and other
normal storage fee without granting a deserved discount while
essential properties, such as PoW and data access key
performing deduplication to save storage space. As shown in
assignment by re-shaping our previous scheme in [14] in
Table 1, we illustrate four situations that the CSP deals with a
order to overcome its shortcomings regarding integrity and
duplication check about file storage. We find that a problem may
duplication proof.
happen in the third situation that the CSP actually has the file
• We prove the security of TDICP and UDDCP by construct-
tested duplicate but tells the user that it is not in order to let the
ing several games and conduct both theoretical analysis
user pay a normal storage fee without any discount, which should
and experimental simulation to evaluate their performance.
be offered due to deduplication and storage saving. By saving
Our results show their efficacy and efficiency.
extra storage space, the CSP can earn more by serving more users
with the same dishonest way. An effective mechanism should be The rest of the paper is structured as follows. Section 2 briefly
proposed to prevent the user from being cheated by the CSP in the reviews related work. Section 3 introduces the main techniques
phase of data duplication check. used in our scheme. Section 4 presents the system model, threat
In this paper, we propose a novel deduplication scheme called model, and design goals of VeriDedup. We present the detailed
VeriDedup to tackle the above two security issues in an integrative construction of VeriDedup that contains TDICP and UDDCP in
way. It contains a novel Tag-flexible Deduplication-supported Section 5. We prove the security of the above two novel protocols
Integrity Check Protocol (TDICP) and a novel User Determined in Section 6 and report the simulation based evaluation results in
Duplication Check Protocol (UDDCP). The TDICP explores a Section 7. Finally, we conclude this paper in Section 8.

2 R ELATED W ORK TABLE 2

Comparison with existing schemes
Our work is most related to the proof of retrievability (PoR)
solutions in cloud data deduplication [15]–[17]. Juels and Kaliski
Schemes UQ TDS TGF DCCG
[15] proposed a sentinel-based PoR scheme, in which a data [15] × × X ×
owner adopts Error Correcting Code (ECC) and inserts special [6], [7] X × X ×
blocks called sentinels. The sentinels are indistinguishable from [16], [18] X × X ×
encrypted blocks in an encrypted file. During integrity challenge, [21] X X × ×
a verifier asks a prover for those sentinels by disclosing their [24] X X × ×
positions to the verifier. Therefore, this solution supports a limited [5] X X × ×
number of PoR queries and after several times of queries, a data [10] X X × ×
owner has to download the whole file and insert new sentinels to VeriDedup X X X X

it. Ateniese et al. [6] proposed a scheme by defining the concept UQ: Unlimited queries; TDS: Tag deduplication support; TGF: Tag generation
flexibility; DCCG: Duplication check correctness guarantee; X: supported; ×:
of Provable Data Possession (PDP) based on homomorphic tags, non-supported
which is weaker than PoR in the way that it can verify that the CSP
possesses parts of the file (called blocks) but cannot guarantee that
the file is fully stored. Their scheme allows public verifiability,
the same verification tag. But multiple users holding the same file
which means that any third party can verify the integrity of the
stored at the cloud may create different tags as their willingness
files without disclosing any private information of the data owner.
for data integrity check, which improves integrity check security
However, the usage of homomorphic tags incurs high computation
by overcoming brute-force attacks, but impacts deduplication.
cost, which brings heavy computation burden to the data owner.
Table 2 compares our scheme with existing works in terms
Their later work [7] cooperates with an erasure code to help
of unlimited queries, tag deduplication support, tag generation
recover small corruptions. However, their solution suffers from
flexibility, and duplication check correctness guarantee. From
such an attack that CSP can selectively delete some of redundant
Table 2, we observe that existing works either cannot perform
blocks but still can succeed in providing valid proof to the data
deduplication on verification tags or do not allow the users to
owner.
flexibly create their own individual verification tags during dedu-
Much effort was then made to improve the performance of
plication. In particular, none of the existing schemes considers the
PoR schemes. Shacham and Waters [18] proposed a new solution
necessity of correctness guarantee on duplication check, which
based on their proposed concept of Compact PoR, which adopts
allows the CSP to cheat the users for gaining profits.
an erasure code and an authenticator with a BLS signature [19]
and Message Authentication Codes (MAC) [11]. However, the
computational complexity of generating the authenticator is high 3 P RELIMINARY
and the number of the authenticators is linear to the number of In this section, we introduce the main techniques used in VeriD-
blocks. Xu and Chang [16] proposed to enhance the scheme in [18] edup, including PRE, RSA-Private Set Intersection (RSA-PSI),
with an polynomial commitment [18] to reduce communication and PIR. PRE is applied to assign file keys to an authorized data
cost. Azraoui et al. [20] proposed a scheme called StealthGuard holder, RSA-PSI is applied to enable the data holder to first decide
by using PIR within Word Search (WS) technique to retrieve a whether a file is duplicate instead of the CSP, and PIR to enable the
witness of watchdogs (similar as tags) and allows an unlimited data holder to retrieve the note set without exposing the position
number of queries. Compared with other works, the generation of the set to the CSP.
of watchdogs is more lightweight than the generation of tags like
in [7], [18]. In addition, the overhead of storing the watchdogs 3.1 Proxy Re-Encryption (PRE)
is less than that of previous work. However, those works fail in
supporting deduplication over verification tags. A PRE scheme consists of five polynomial time algorithms:
Key generation(KG), Encryption(E), Re-encryption key genera-
The concept of message-locked proofs of retrievability was
tion(RG), Re-encryption(R) and Decryption(D):
then proposed to solve the above conflicts. Bellare et al. [21]
(KG, E, D) are the standard key generation, encryption and
formalized a new cryptographic primitive called Message-locked
decryption algorithms. Suppose we have two parties A and B .
Encryption (MLE) that subsumes convergent encryption [22], [23]
On input the security parameter 1k , KG outputs two public and
that derives the same data block to the same verification tag to al-
private key pairs (pkA , skA ) and (pkB , skB ). On input pkA and
low deduplication of all verification tags. Chen et al. [24] proposed
data M , E outputs a ciphertext CA = E(pkA , M ).
a secure data deduplication mechanism based on an improved
On input (pkA , skA , pkB ), the re-encryption key generation
MLE scheme to enable dual-level source-based deduplication of
algorithm RG outputs re-encryption key rkA→B for a proxy.
large files. Moreover, Zheng et al. [5] introduced a new proof of
On input rkA→B and ciphertext CA , the re-encryption func-
storage scheme with deduplication based on a publicly verifiable
tion R outputs R(rkA→B , CA ) = E(pkB , M ) = CB .
proof of data possession. In their scheme, users can verify the
On input CB and skB , the decryption algorithm D outputs the
correct storage of deduplicated data with the key of the first user
plaintext M = D(skB , CB ).
who actually uploads the file. However, this scheme has been
proved insecure under a weak key attack in [25] and it cannot
prevent the users from being cheated by the CSP. Vasilopoulos et 3.2 RSA-PSI
al. [10] proposed a scheme by transforming the existing PoR into a PSI [26]–[28] enables two parties to compute the intersection
form that is message-locked and integrating it with a deduplication of their inputs in a privacy-preserving way, such that only their
function. However, these works require to derive the same file into common inputs are revealed. A PSI scheme based on RSA blind

signature (RSA-PSI) consists of four main phases: base phase,

setup phase, online phase, and update phase:
Base phase: Suppose we have a client C and a server S . S and
C agree on the RSA public key (N, e) and the false positive rate
AA CSP
for the cuckoo filter CF [29]. S generates the RSA private key d,
C chooses Ncmax random numbers and calculates their inverses
as well as their modular exponentiation to the power e.
Setup phase: On input the set owned by S , S encrypts it using ……
its private key d and inserts the ciphertexts into the cuckoo filter Data Owner
and sends the CF to C .
Data Holders
Online phase: C first blinds its inputs with the encryption of
the respective random values and sends the resulting values to Fig. 2. System Model
S . S responds the result to C by encrypting the resulting values
using its private key d. Using the inverse of the respective random
values, C can then unblind the encrypted blinded values through as a data owner with regard to the same blocks. 2) CSP who
multiplications by applying the property of RSA that xed ≡ x provides a data storage service with deduplication to data holders.
mod N . C finally obtains the intersection by checking whether Only one data copy is stored at the CSP, which can be accessed by
the unblinded encrypted elements are in the CF that was sent by all data holders with authority. 3) Authenticated auditor (AA) who
S in the setup phase. serves as a third party to check data ownership, authorize data
Update phase: On inputs a new element ui to its input, S access and cooperate with other two types of entities aiming to
encrypts it using its private key d and decides an efficient option audit the whole procedure of data duplication check. The system
to insert it into CF and sends the updated CF to C . model of VeriDedup is shown in Fig. 2.

3.3 PIR
4.2 Threat Model
PIR [30], [31] enables a database user, or a client to obtain some
information from the database in a way that prevents the database We perform our research based on the following assumptions. We
from knowing which data was retrieved. Assume a dataset D is a assume that the data holder is honest. We assume the CSP is
X × Y matrix obtained by a server S and we have a client C , let semi-trusted. It may raise the following three security threats: 1)
l donate the index of column the client is interested in. In order Snooping the private data of the data holders; 2) Cheating the data
to execute a PIR request, a PIR scheme normally performs the holders by providing a wrong duplication check result in order to
following steps: ask a higher storage fee; 3) Causing data loss due to carelessness of
Setup phase: C generates a large number m as the order of a data maintenance. In VeriDedup, we focus on the last two issues
∗
group G, selects a random b ∈ ZM , where gcd(b, m) = 1, and since many existing solutions of the first issue can be found in
keeps b and m as a secret. the literature [5], [32]. Thus, we assume that the first issue has
Query phase: C generates a set (e0 , . . . , ei ) for each col- been solved, e.g., through data encryption. In addition, we assume
umn (x0 , . . . , xi ), which holds that for a random selected set AA and CSP do not collude. However, AA is semi-trusted, which
(a0 , . . . , ai ), if xil is one of queried columns, then eil = ail N r ; is curious about the data stored at the cloud, thus private data
otherwise, if xi is not one of the queried columns, then ei = should be kept away from AA. We assume data holders, CSP,
N l + ai N r . Meanwhile, it holds that all ei < m/(t(N − 1)). and AA communicate with each other through secure channels by
Then, C computes v = {vi |vi = bei mod m} and sends applying some security protocol (e.g., Open-Secure Sockets Layer
Req = {v, tag} to S . (SSL)). And all system parameters are shared with all related
Response phase: When receiving Req , S computes Resp = parties during system setup or initialization phase in a secure way.
v × D and sends it back to C .
Extraction phase: C computes Res = Resp ∗ b−1 mod m
and obtains the data of the queried column. 4.3 Design Goals
VeriDedup is a verifiable cloud data deduplication storage scheme
4 P ROBLEM STATEMENT with integrity and duplication Proof. It holds the following design
In this section, we describe the system model, the threat model goals:
and the design goals of VeriDedup.
• Independent integrity check when deduplication: VeriD-
edup allows the data holder to check the integrity of its
4.1 System Model files stored at the CSP without downloading the whole
VeriDedup offers grarantee on the correctness of duplication check files and interacting with the corresponding data owner.
and supports the integrity check of deduplicated encrypted data in • Flexible tag generation: VeriDedup allows each data
cloud storage. holder to create its own individual verification tags while
Our target system contains three types of entities: 1) Data still can perform data deduplication over those tags.
holder who owns data and saves its data that consists of multiple • Correctness guarantee of duplication check: VeriDedup
blocks at CSP. It is possible that a number of eligible data holders can assure the correctness of duplication check. Thus,
share the same encrypted data blocks in the CSP. In particular, the a semi-trusted CSP can never cheat the data holders to
data holder that first uploads the data blocks to the CSP is denoted upload any data that have already been stored by the CSP.

5 T HE P ROPOSED S CHEME System setup: On input the secret parameter λ, the data holder
In this section, we introduce VeriDedup that can realize both outputs a RSA key pair (e, d) under a large number N and AA
integrity check and duplication proof over encrypted cloud data initializes an empty cuckoo filter.
deduplication. Filter generation: On input the CSP maintained tag set {x},
AA outputs the cuckoo filter as follows: 1) CSP computes a =
H(x)d for each tag of its tag set; 2) AA verifies the number of
5.1 Overview involved tags, the signature of the tags, and the computation of the
VeriDedup follows the construction of our previous deduplication CSP; 3) AA inserts the set {a} into the cuckoo filter.
scheme [14] and improves it by using PSI and PIR to ensure both Check initialization: On input the secret parameter λ, the data
data integrity and duplication check correctness over encrypted holder outputs three coefficient set {r}, {rinv }, and {r0 } for its
data deduplication. Specifically, compared with previous work, maintained tag set {y} and computes the challenge A = H(y)∗r0
we introduce a PSI based challenge and response mechanism to for all y .
the duplication check procedure in order to let the data holder first Response computation: On input the challenge set {A}, CSP
tell whether the uploaded blocks are duplicate or not instead of the computes C = Ad mod m and responds {C} to the data holder.
CSP. In addition, we employ AA to verify the computations of the Duplication check: On input the response set {C}, the data
CSP during the duplication check, so that the CSP cannot cheat holder outputs the duplicate tags as follows: 1) validate the com-
the users to upload data blocks that have been stored already. putation of the CSP on {C}; 2) compute all C ×rinv mod N and
Furthermore, we propose a note insertion mechanism based on check them in the cuckoo filter to find intersections as the duplicate
PIR to let the data holder insert a specific set (called note set) tags. The data holder confirms the duplicate files corresponding to
that contains several randomized bit sequences, which conform to the duplicate tags.
a hidden function, as verification tags into the encrypted blocks Filter update: On input the update tag set {y 0 }, AA updates the
of a uploaded file. The data owners/holders who are proved to cuckoo filter as follows: 1) CSP computes a0 = H(y 0 )d for each
have the ownership of the corresponding blocks can verify the y 0 in the update tag set; 2) AA verifies the number of involved
integrity of the uploaded blocks through a challenge on whether tags, the signature of the tags, and the computation of the CSP; 3)
the notes are conform to the hidden function. Attention need be AA inserts the set {a0 } into the cuckoo filter.
given that the verification tags generated by multiple data holders
with various notes can also be deduplicated in VeriDedup, so that 5.4 VeriDedup Construction
the CSP will no longer be required to maintain multiple pieces of VeriDedup contains the following main procedures: System setup,
verification tags from the same block of different data holders for Data preprocessing and Duplication Check, Note set insertion
integrity check, which reduces storage consumption of performing and Data Upload, Data integrity check, and Data Download. The
deduplication. In what follows, we first introduce the two proposed details of the scheme are elaborated as follows:
novel protocols (i.e., TDICP and UDDCP) and then detail the
whole construction of VeriDedup. 5.4.1 System Setup
Assuming that e : G1 × G1 → GT is a bilinear map where G1 ,
5.2 TDICP Design Brief GT are two groups of prime order q , the system parameters are
The protocol TDICP contains the following main procedures: random generators g ∈ G1 and Z = e(g, g) ∈ GT .
System setup, Note generation and insertion, Check Initialization, During system setup, each data holder uw generates skw =
Response computation, and Integrity check. aw and pkw = g aw for PRE, where aw ∈ Zp∗ . The public key
System setup: On input the security parameter λ, AA outputs pkw is used to generate the re-encryption key at AA for uw . let
a hidden function f which is then applied for note generation. Eq(a, b) be an elliptic curve over GF (q), P ∗ be a base point
Note generation and insertion: On input the hidden function shared among system entities, sw ∈R {0, . . . , 2σ − 1} be the
f and the secret keys of a data holder, the data holder outputs Eillptic Curve secret key of data holder uw and Vw = −sw P ∗
a randomized note set S and a position set P according to be the corresponding public key and σ be a security parameter.
the uploaded blocks of its file and inserts the note set into the The keys (pkw , skw ) and (Vw , sw ) of uw are bound to a unique
corresponding positions of the encrypted blocks. identifier of the data holder, which can be a pseudonym that is
Check initialization: On input the check indexes of the blocks, crucial for the verification of user identity.
the data holder outputs a coefficient set e and computes the AA generates a hidden function f , as a consensus that all the
challenge set v = b ∗ e mod m, where gcd(m, b) = 1. data holders will later use to create their unique note sets Sw,i
Response computation: On input the challenge set v , CSP and broadcasts f among all data holders. Note that, f can be an
outputs the response Resp = v × D. arbitrary function chosen depending on the security level required
Integrity check: On input the response Resp, the data holder by the data holders. Furthermore, AA generates pkAA and skAA
outputs the check result by computing Res = Resp∗b−1 mod m for PRE and broadcast pkAA to the data holders.
to pick out the note set and validating whether these notes conform CSP initializes a RSA algorithm with a public and secret key
to the hidden function f . If the verification passes, the data holder pair (e, d) under the module N , The key pair is used to encode
confirms the integrity of the stored file. the uploaded tags of the data holders and the CSP for duplication
check.

5.3 UDDCP Design Brief 5.4.2 Data Preprocessing and Duplication Check
The protocol UDDCP contains the following main procedures: Suppose that two data holders u1 and u2 want to upload their data
System setup, Filter generation, Check initialization, Response files F1 and F2 to the CSP. Let u1 the first to upload the file, it
computation, Duplication check, and Filter update. performs the data preprocessing and duplication check as follows:

𝑢2 CSP
AA

Input: 𝑌 = 𝑥2,𝑖 ，𝑖 = 1, … , 𝑁𝑐 Input: The current number of the files on the server Δ, Input: 𝑋 = 𝑥𝑗 ，𝑆𝑖𝑔𝑛 𝐻 𝑥𝑗 ，𝑗 = 1, … , 𝑁𝑠 ，
Output: 𝑆 = 𝑌 ∩ 𝑋 the users’ public keys where 𝑆𝑖𝑔𝑛 𝐻 𝑥𝑗 = 𝑆𝐼𝐺𝑆𝐾𝑐 (𝐻 𝑥𝑗 )
𝑗
Output: The updated number of the files Δ‘ on the
server Output: ⊥
System setup:
System setup:
Initialize RSA parameter (e, 𝑑, 𝑁) and
Initialize the cuckoo filter broadcast 𝑁 𝑎𝑛𝑑 𝑒
Verify Δ =? 𝑁𝑠 𝑎𝑗 , 𝐻(𝑥𝑗 ),𝑆𝑖𝑔𝑛(𝐻(𝑥𝑗 ))， 𝑎𝑗 = 𝐻(𝑥𝑗 )𝑑 𝑚𝑜𝑑 𝑁
For 𝑖 = 1, … , 𝑁𝑠 𝑗 = 1, … , 𝑁𝑠
? −1
Verify 𝐻 𝑥𝑗 =
𝑆𝐼𝐺𝑃𝐾𝑐𝑗
𝑆𝑖𝑔𝑛 𝐻 𝑥𝑗
Generate 𝑁𝑣 non-overlap subset of {𝑎 𝑗 , 𝐻 𝑥𝑗 },
In each subset,
Verify ∏𝐻(𝑥𝑣𝑗 ) =? (∏𝑎 𝑣𝑗 )𝑒
Generate random numbers 𝑟2,1 , … , For 𝑖 = 1 to 𝑁𝑠 : CF=CF.Insert(𝑎𝑗 )
𝑟2,𝑁𝑐𝑚𝑎𝑥 𝐶𝐹
𝑖𝑛𝑣
For 𝑖 = 1 𝑡𝑜 𝑁𝑐𝑚𝑎𝑥 : 𝑟2,𝑖 =
−1 ′
𝑟2,𝑖 𝑚𝑜𝑑 𝑁 𝑟2,𝑖 = (𝑟2,𝑖 )𝑒 𝑚𝑜𝑑 𝑁
For 𝑖 = 1 𝑡𝑜 𝑁𝑐 : 𝐴 2, 𝑖 ，𝑖 = 1, … , 𝑁𝑐
′
𝐴 2, 𝑖 = 𝐻(𝑥2,𝑖 ) ∙ 𝑟2,𝑖 𝑚𝑜𝑑 𝑁
𝐹𝑜𝑟 𝑖 = 1 𝑡𝑜 𝑁𝑐 :
Generate 𝑁𝑣′ non-overlap subset of 𝐶 2, 𝑖 ，𝑖 = 1, … , 𝑁𝑐 𝐶 2, 𝑖 = (𝐴[2, 𝑖])𝑑 𝑚𝑜𝑑 𝑁
{𝐴 2, 𝑖 , 𝐶 2, 𝑖 },
In each subset,
Verify ∏𝐴[2, 𝑖] =? ∏𝐶[2, 𝑖]𝑒
If 𝐶𝐹. 𝐶ℎ𝑒𝑐𝑘(𝐶[2, 𝑖] ∙ 𝑟𝑖𝑖𝑛𝑣 𝑚𝑜𝑑 𝑁):
put 𝑥2,𝑖 into 𝑆
Output 𝑆.

Fig. 3. Procedures of UDDCP

0
Step 1: On input F1 and the symmetric key DEK1 . u1 and then computes A[1, i] = H(y1,i ) · r1,i mod N , where
perform the following computations: 1) Divide F1 into several i = 1, . . . , Nc , and sends them to the CSP. The CSP then com-
splits where each split contains m blocks. In order to protect the putes C[1, i] = (A[1, i])d mod N as a response to u1 . u1 then
0
file from small corruptions, adopt Error Correcting Code (ECC) randomly creates N1,v non-overlap subset of {C[1,Q i], A[1, i]}
C[1, i] = ( A[1, i])e
Q
to extend m blocks to m + d − 1 blocks, which can correct up to and in each subset verifies whether
d
2 errors with an efficient [m + d − 1, m, d] ECC, such as Reed- holds to prove the correctness of the CSP computations and finally
Solomon code [33] and obtain a set of blocks {B1,i }. 2) For each checks duplication with the cuckoo filter CF using algorithm
block B1,i , u1 generates a block tag y1,i = H(H(B1,i ) × P ∗ ). CF.check(C[1, i]·r1,i inv
mod N ) to confirm the duplicate blocks.
3) Send the set of tags {y1,i } to the CSP. The whole protocol is shown in Fig. 3.
Step 2: Suppose that the CSP maintains a tag set {xj } gathered
form previous data owners, the CSP interacts with AA and u1 to 5.4.3 Note Set Insertion and Data Upload
perform a duplication check according to the following procedure. Let u2 the second to upload the file that obtains the same pieces
1) For all xj , the CSP generates aj = H(xj )d mod N , and of blocks as u1 , u1 and u2 perform the note insertion and data
sends {aj , H(xj ), sign(H(xj ))}, ∆, where j = 1, . . . , Ns , to upload as follows:
the AA, where sign(H(xj )) is the signature of H(xj ) signed by Step 1: Since u1 is the first to upload a new file that has
the original data owner of the tag xj and ∆ is the total number not been stored by the CSP before, i.e., the duplication check is
of the tags held by the CSP. 2) Receiving what the CSP sends, negative, it is served as a data owner and is required to upload
AA first verifies whether {aj , H(xj ), sign(H(xj ))} contains its corresponding blocks {B1,i }. Assume the ith block B1,i is
∆ elements to guarantee that CSP uses all its maintained tags uploaded, u1 first encrypts B1,i with DEK1 to get CT1,i , which
to perform computations. If it holds, it secondly verifies all the is stored as a X×Y matrix, and encrypts DEK1 with pkAA to get
signatures on H(xj ), which ensures that the CSP indeed uses the CK1 . Let S1,i = {η1,i,0 , . . . , η1,i,k |f (η1,i,0 , . . . , η1,i,k ) = 0}
tags uploaded from the previous data owners. Thirdly, AA will be a note set that conforms to the hidden function f . According
further validate the correctness of the CSP computations on all xj to the PIR algorithm, let B1,i be a seed, u1 shuffles the column
using a batch verification by randomly creating Nv non-overlap index [1, . . . , X] and selects the first r columns as the ones to
subset of {aj , Q
H(xj )} and in each subset verifying whether insert the notes. Thus, in each column, c = dk/re notes are
H(xj ) = ( aj )e holds. If all the verification passes, the
Q
required to be inserted. Furthermore, in each selected column,
AA assumes that the CSP computations are correct and creates a u1 further shuffles the row index [1, . . . , Y ] and decides the first
cuckoo filter CF as input of {aj }, i.e., CF = CF.Insert({aj }) c indexes as the final positions to insert the notes. Denote the
as a response to u1 . Note that, this procedure is only executed once position indexes as P1,i = {p1,i,1 , . . . , p1,i,k }, u1 then inserts
during the system setup, if another data holder requires to upload all the notes {η1,i,k } into CT1,i according to the position indexes
new files to the CSP, the CSP will cooperate with AA to update P1,i to obtain CKI1,i and sends CKI1,i and CK1 to CSP along
the cuckoo filter, there is no need to re-calculate the parameters with pki . At the same time, u1 also uploads tags of the new
of previous data owners mentioned above. 3) For all y1,i , u1 first blocks {y1,i } for further duplication check. On receiving a new
inv
selects random numbers r1,1 , . . . , r1,Nc and computes r1,i = block tagSy1,i , the CSP first adds them to its maintained tag set
−1 0
e
r1,i mod N and r1,i = r1,i mod N for all i ∈ [1, . . . , Nc ] xj = xj y1,i and then computes ai = H(y1,i )d mod N , and

𝑢1 CSP 𝑢2

System initialize:
A public function 𝑓 used to create hidden
parameters for data owner and N for
module computation
When 𝐵2,𝑖′ is duplicated:
Generate the position set 𝑃2,𝑖′ =
Generate the position set 𝑃1,𝑖 =
𝑝2,𝑖′,1 , … , 𝑝2,𝑖′,𝑘 correlated to 𝐵2,𝑖′
𝑝1,𝑖,1 , … , 𝑝1,𝑖,𝑘 correlated to 𝐵1,𝑖 , 𝑆1,𝑖 = {𝐶𝑇𝐼1,𝑖 , 𝑡𝑎𝑔1,𝑖 }
Store the data with its
𝜂1,𝑖,0 , … , 𝜂1,𝑖,𝑘 𝑓 𝜂1,𝑖,1 , … , 𝜂1,𝑖,𝑘 = 0} corresponding tags {𝐶𝑇𝐼1,𝑖 , 𝑡𝑎𝑔1,𝑖 }
and insert the 𝑆1,𝑖 into the encrypted
block 𝐶𝑇1,𝑖 = 𝑒𝑛𝑐{𝐵1,𝑖 } based on 𝑃1,𝑖 to
obtain 𝐶𝑇𝐼1,𝑖 , and the block tag 𝑡𝑎𝑔1,𝑖 =
𝐻(𝐻(𝐵1,𝑖 ) × 𝑃 ∗).
Integrity check:
Generate large number 𝑚 as the order and
∗
𝑏 ∈ 𝑍𝑚 where gcd 𝑏, 𝑚 = 1, generate
secret coefficient set {𝑒1,𝑖,1 , … , 𝑒1,𝑖,z } and
computes 𝑣1,𝑖 = {𝑣1,𝑖,𝑙 |𝑣1,𝑖,𝑙 = 𝑅𝑒𝑞1,𝑖 = {𝑣1,𝑖 , 𝑡𝑎𝑔1,𝑖 }
𝑏𝑒1,𝑖,𝑙 𝑚𝑜𝑑 𝑚, 𝑙 = 1, … , 𝑧} Integrity check:
Compute 𝑅𝑒𝑠1,𝑖 = 𝑅𝑒𝑠𝑝1,𝑖 ∗ Compute 𝑅𝑒𝑠𝑝1,𝑖 = 𝑣1,𝑖 × 𝐷 Generate large number 𝑚′ as the order and 𝑏 ∈
𝑅𝑒𝑠𝑝1,𝑖 ∗
𝑏 −1 𝑚𝑜𝑑 𝑚 to obtain the queried 𝑍𝑚′ where gcd 𝑏′, 𝑚′ = 1, generate secret
𝑅𝑒𝑠𝑝2,𝑖 ′ coefficient set {𝑒2,𝑖′,1 , … , 𝑒2,𝑖′,𝑧 } and computes
column and pick up 𝜂′1,𝑖,0 , … , 𝜂′1,𝑖,𝑘 based
on 𝑃1,𝑖 . Then verify whether 𝑣2,𝑖′ = {𝑣2,𝑖′,𝑙 |𝑣2,𝑖′,𝑙 = 𝑏𝑒2,𝑖′,𝑙 𝑚𝑜𝑑 𝑚, 𝑙 = 1, … , 𝑧}
Compute 𝑅𝑒𝑠𝑝2,𝑖′ = 𝑣2,𝑖′ × 𝐷
𝑓 𝜂′1,𝑖,1 , … , 𝜂′1,𝑖,𝑘 = 0 𝑅𝑒𝑞2 = {𝑣2,𝑖′ , 𝑡𝑎𝑔2,𝑖′ } Compute 𝑅𝑒𝑠2,𝑖′ = 𝑅𝑒𝑠𝑝2,𝑖′ ∗ 𝑏′−1 𝑚𝑜𝑑 𝑚 to
obtain the queried column and pick up
𝜂′2,𝑖′,0 , … , 𝜂′2,𝑖′,𝑘 based on 𝑃2,𝑖′ . Then verify
whether 𝑓 𝜂 ′ 2,𝑖′,1 , … , 𝜂 ′ 2,𝑖′,𝑘 = 0, the same
equation holds, since 𝑃1,𝑖 = 𝑃2,𝑖′ .

Fig. 4. Procedures of TDICP

sends {ai , H(y1,i ), sign(H(y1,i ))} to AA. AA then first checks of a single block B1,i stored at CSP, it first initializes a large
00 ∗
the signatures on H(y1,i ), and further randomly creates N1,v non- number m and b ∈ Zm , where gcd(b, m) = 1, as a secret.
overlap subset of {{a 1,i }, {H(y1,i )}} and in each subset verifies According to the position indexes P1,i , it then generates a set
H(y1,i ) = ( a1,i )e . If the verification passes, AA
Q Q
whether (e1,i,0 , . . . , e1,i,z ) for each column (x1,i,0 , . . . , x1,i,z ) with ran-
assumes that the CSP computation is correct and updates the dom selected (d1,i,0 , . . . , d1,i,z ), where if x1,i,l ∈ P1,i , then
cuckoo filter CF using CF = CF.Insert({a1,i }), which will e1,i,l = d1,i,l N r ; otherwise, if x1,i,l ∈ / P1,i , then e1,i,l =
be used in the next duplication check round. If the duplication N l +d1,i,l N r . Meanwhile, it holds that all e1,i,z < m/(t(N −1))
check is positive and the pre-stored blocks are from the same data for some choice of l < r and d1,i,l . Finally, it computes
holder, the data holder will inform the CSP to do nothing but v1,i = {v1,i,l |v1,i,l = be1,i,l mod m, l ∈ [1, . . . , z]} and sends
maintain its blocks. If the blocks are from a different data holder, Req1,i = {v1,i , tag1,i } to the CSP. Receiving Req1,i , the CSP
it will inform the CSP to perform deduplication. computes Resq1,i = v1,i ×B1,i as a response and sends it back to
Step 2: Informed the duplication from a different user u2 , u1 . u1 then computes Res1,i = Resp1,i × b−1 mod m to obtain
the CSP first checks the ownership of the blocks by passing the the queried columns and then pick out the notes according to the
ownership verification tasks to the AA, which will challenge the position indexes P1,i to check whether the notes are conform to
data holder u2 on whether it is the real party who possesses the the hidden function. Similarly, when user u2 challenges the CSP,
data blocks B2,i0 = B1,i . We introduce an ownership verification it generates its unique (m0 , b0 ) as a secret and also its unique
protocol based on a cryptoGPS identification scheme [34]. In the (d2,i0 ,0 , . . . , d2,i0 ,z ) to generate other (e2,i0 ,0 , . . . , e2,i0 ,z ) and its
protocol, AA first randomly chooses c ∈R {0, . . . , 2σ − 1} and further Req2,i0 = (v2,i0 , tag2,i0 ). In cooperation with the CSP, u2
challenges u2 by c. u2 computes h = H(B2,i0 ) + (s2 × c) as can also obtain the note set based on the position indexes P2,i0 to
a response along with V2 to AA. AA will computes H(hP ∗ + check whether the notes are conform to the hidden function.
cV2 ) and compares it with tag y1,i . If the verification passes, i.e.
y1,i = H(hP ∗ + cV2 ), AA confirms that u2 has the duplicated Furthermore, suppose that u1 and u2 shares a same duplicated
blocks B2,i0 = B1,i and generates re-encryption key rkAA→uj = block {Bi∗ } and u1 has its unique block {Bi1 } and u2 has {Bi2 }.
RG(pkAA ; skAA ; P K2 ) and sends it to CSP. CSP then transfers For B1,i ∈ {Bi1 }, u1 verifies f (η1,i,0 , . . . , η1,i,k ) = 0 to check
CK1 to CK2 by computing R(rkAA→u2 ; E(pkAA ; DEK1 )) = the integrity of B1,i as well as for B2,i0 ∈ {Bi2 }, u2 verifies
E(pk2 ; DEK1 ) for u2 . f (η2,i0 ,0 , . . . , η2,i0 ,k ) = 0. For B∗,i ⊆ {Bi∗ }, although u2 is
At this moment, both u1 and u2 can access the same data unaware of the exact inserted notes of u1 , since they both share
blocks B1,i (B2,i0 ) stored at the CSP and use its corresponding the same hidden function f and P1,i = P2,i0 , they all can verify
CT I1,i (CT I2,i0 ) to perform the below integrity check. Note that, that f (η∗,i,0 , . . . , η∗,i,k ) = 0 to check the integrity of B∗,i .
each B1,i is only correlated with single CT I1,i , i.e. CT I1,i = Therefore, we not only deduplicate the same block uploaded to
CT I2,i0 . the CSP, but also take a further step to deduplicate the verification
tags of duplicated blocks generated by multiple data holders. Note
5.4.4 Data Integrity Check that since we apply ECC to help recovering the files, there is no
Assume that data owner u1 wants to upload a block set {B1,i } need to perform integrity check over all blocks. If u1 and u2 can
and data owner u2 wants to upload a block sets {B2,i0 }. Re- succeed in performing above γ times random verification in all its
gardless of deduplication, when user u1 challenges the integrity corresponding block sets, our protocol guarantees the integrity of

F1 and F2 . The whole procedure is shown in Fig. 4. During the Integrity check phase, the data holder computes as
follows:
5.4.5 Data Download Resp ∗ b−1 mod m = (v × D) ∗ b−1 mod m
When u1 wants to download F1 . It sends a request and the = (be × D) ∗ b−1 mod m
file name to the CSP. Upon receiving the request, the CSP
= e × D mod m
first checks if u1 has the authorization to download the file.
If passed, CSP returns the corresponding block sets {CT I1,i } Since e =(e1 , . . . , et ) and 
t = x,
to u1 . u1 then extracts all the notes according to the position d11 · · · d1y
indexes SP1,i on each block to get the ciphertexts {CT1,i } = D =  ... .. .. 

∗ . . 
{CTi1 } {CT S i } and decrypts Seach CT1,i using DEK1 directly dx1 · · · dxy
to obtain {B1,i }S= {Bi∗ } {Bi1 }. Owing to ECC, u1 can then,
recover F1 from {B1,i } with errors no more than d2 . As
x x x
for u2 , after following
S the same steps to obtain the ciphertexts
X X X
e × D mod m = ( ei di1 , ei di2 , · · · , ei diy ) mod m
{CT2,i } = {CTi∗ } {CTi2 }, it also receives a re-encrypted
i=1 i=1 i=1
DEK1 key D(sk2 ; E(pk2 ; DEK1 )) from the CSP. u2 can then Xx x
X x
X
obtain the key DEK1 using its key pair (pk2 , sk2 ) and decrypt =( ei di1 , ei di2 , · · · , ei diy )
each CTi∗ to get the duplication original blocks {Bi∗ } and its i=1 i=1 i=1
unique original blocks {Bi2S} by directly usingSDEK2 . Finally, it r l
can obtain the original file {B2,i } = {Bi∗ } {Bi2 } and recover Px ei is the P
When queried column, ei = N + aP
x l r
l N , we have
x
F2 using ECC. P ei dij = i=1 (N + al N )dij , then i=1 ei dij mod
i=1
N r = xi=1 N l dij
r
Otherwise,
Px ei = aPk N , we have
x r
Px
5.5 Further Discussion i=1 ei dij = i=1 (ak N )dij , then i=1 ei dij mod
Nr = 0
We recognize the fact that the CSP is likely to increase its income
with massive amounts of computation/storage from deduplication. PxAssume that Pr ir isl the queried column, it holds that
e d
i=1 i ij = i=1 N dir j = (dir j )N
In this case, confirming deduplication happened already at the Above all, all the elements in the queried ir th column are
CSP to get an offer of low storage charge becomes essential, obtained.
our paper aims to solve this issue. For motivating the adoption
of our scheme, in another line of our work, we study how to
6.2 Soundness of TDICP
make all related stakeholders to accept and use deduplication
schemes by applying game theory to design proper incentive or Then, we further prove the soundness of TDICP by introducing
punishment mechanisms in three cases: client-controlled dedupli- the following game.
cation [35], [36], server-controlled deduplication [12] and hybrid Assume there is an adversary A that corrupts on average ρadv
deduplication [13]. Since our scheme design is built upon the one blocks of an outsourced file, and succeed in the soundness game
in [14], belonging to server-controlled deduplication, the incentive of the proposed protocol with the probability of δ . In the following
mechanism [12] suitable for the server-controlled deduplication proof, we show that if the query times γ exceeds a threshold γneg ,
schemes can be applied to motivate scheme adoption. Moreover, our protocol can recover the whole file with a probability of more
linking a trust value to each CSP can help the users to choose a than 1 − 2nτ , where τ is the security parameter, when there exists
trustworthy CSP. an adversary A that can succeed in the soundness game with the
probability δ ≥ δneg = 21τ .
Remind that n is the length of the notes and s is the number
6 S ECURITY A NALYSIS of the notes in a note set, We first quantify δ with respect to the
parameter ρadv . In order to succeed in the soundness game, the
In this section, we prove the correctness and the soundness of adversary A can perform under the following two conditions. 1)
TDICP. Correctness means that the integrity check algorithm can it does not corrupt any note; 2) it corrupts some of the notes,
correctly extract a queried column and soundness means that but can still provide valid notes that conform to the hidden
the original file can be recovered if the corresponding TDICP function. Therefore, we define the probability that the adversary
integrity check passes. We also prove the soundness and privacy A can succeed in the soundness game with respect to ρadv as:
of UDDCP, and omit the proof of correctness, since it is obvious. A
ρ = P(Success,i) = (1 − ρadv ) + ρ2adv
ns .
Soundness means that the CSP cannot provide fake computation In TDICP, the integrity check requires the adversary A to
results during the whole procedure, privacy means that none of the response γ valid note sets to succeed in the soundness game,
information of both the CSP and the data holder are leaked to the therefore,
other except for the intersection, and correctness means that the
data holder can correctly pick up all the intersection of its tag set γ
X γρadv (1 − ρadv )γ−1 1
A
and the CSP’s tag set. δ= P(Success,i) = (1−ρadv )γ + ns
+ o( ns )
i=1 | 2 {z 2 }
ξ

6.1 Correctness of TDICP

Note that if ns is large enough, i.e., ns = 128, ξ will then be
We first prove the correctness of TDICP on extracting the queried negligible. We can simplify the above equation that if ns ≥ 128,
column where the notes are inserted based on the PIR algorithm. δ ≈ (1 − ρadv )γ .

We then define a threshold ρneg with respect to ρadv that 6.4 Soundness of UDDCP
if ρadv < ρneg , the probability of our protocol that fails in We prove the soundness of UDDCP by illustrating how it can
recovering the blocks is negligible. solve all potential cheats the CSP can perform, including 1) the
Since TDICP adopts ECC and can recover ρD = d2 errors, CSP may provide unauthorized tags that are not from previous
then for each block, if there exists more than corrupted d2 errors, data holders or delete some stored tags driven by some profits; 2)
σ
our protocol fails in recovering the blocks. Let P(F ail,i) be the the CSP may provide wrong computation results of aj or C[i] to
d
probability that a block has more than 2 errors. According to the AA or the data holder.
σ
Chernoff bounds, we can bound P(F ail,i) as: In UDDCP, the first cheat can be tested, since we employ AA
to verify all the signatures and record the number of the CSP’s
ρadv D ρ 2 tag set. Unauthorized tags created by the CSP are easily found out
σ
P(F ail,i) ≤ exp(− (1 − ) ) and the CSP is audited to provide all the tags from previous data
3 ρadv
owners. The second
Q cheat canQalso be tested, since we let AA to
σ σ 1
Let P(F ail,i) be negligible, i.e., P(F ail,i) < 2τ , then verify whether H(xj ) = ( aj )e holds, which can be proved
ρadv D ρ 2 1 correct according to the multiplication homomorphism of RSA.
exp(− 3 (1 − ρadv ) ) < 2τ . We derive ρneg as the bound of
ρadv : Wrong computations of any aj or C[i] can be detected by the
AA.
ρ 2 3 ln(2)τ
(1 − ) ρneg = and ρneg < ρ
ρneg D 7 P ERFORMANCE A NALYSIS AND E VALUATION
Next, we define a threshold γneg for the query time γ that if an In this section, we perform theoretical analysis, conduct simulation
adversary A corrupts more than ρneg fraction of the blocks, it will based evaluation on VeriDedup, and compare its performance with
be detected by our protocol with an overwhelming probability. In related previous works. In addition
other words, if γ > γneg and ρadv > ρneg , then the probability
of the adversary A to succeed in the soundness game is negligible. 7.1 Evaluation Metrics and Experimental Settings
Then, 7.1.1 Evaluation Metrics
We applied five metrics in our simulation studies to evaluate
1
δ = (1 − ρadv )γ ≤ (1 − ρadv )γneg ≤ δneg = TDICP, including (1) the data owner’s computational complexity
2τ for creating and inserting the note set; (2) the data holder’s
According to the equation ln x ≤ x − 1, when ρadv > ρneg : storage overhead for extra data storage in integrity check; (3) the
data holder’s computational complexity for challenging CSP and
ln(2)τ − ln(2)τ − ln(2)τ retrieving the inserted note set for verification; (4) CSP compu-
γneg = d e≤ ≤ tational complexity for responding the challenge from the data
ρneg ln(1 − ρneg ) ln(1 − ρadv )
holder; (5) Data holder-CSP communication cost for transferring
Finally, we define the probability of a file to be recovered. extra data in integrity check. The communication cost of AA to
Since if there exists one block Qfailing to be recovered, the whole broadcast the hidden function f is omitted, since it is a one-time
file fails to be recovered. Let
Q F ail Pbe the probability that the file cost regardless with integrity check interactions.
n
fails to be recovered, then F ail ≤ i=1 P(F ail,i) . If we assume Meanwhile, we used six metrics in our simulation studies to
the probability of the files that fails to be recovered is negligible, evaluate UDDCP, including (1) the data holder’s computational
1
i.e., P(F ail,i) ≤ 2τ . The probability of the files to be successfully complexity for initializing duplication check; (2) CSP’s computa-
recovered is: tional complexity for preprocessing its tag set and responding the
challenge from data holders; (3) AA’s computational complexity

Y
Y n for verifying CSP computation and setting up the cuckoo filter;
=1− ≥1− (4) the data holder’s computational complexity for confirming
2τ
Success F ail duplicate blocks; (5) the communication cost from CSP to AA for
constructing the cuckoo filter; (6) the communication cost between
the data holder and CSP for transferring extra data in duplication
6.3 Privacy of UDDCP
check. The communication cost from AA to the data holder for
We further prove the privacy of UDDCP based on the irreversibil- transferred the cuckoo filter is omitted, since it depends on the
ity of the cuckoo filter. concrete type of the cuckoo filter.
In UDDCP, the data holder is private, which leaks no infor- We can find several previous schemes [7], [15], [16], [18],
mation to the CSP about its private inputs. Since the data holder [20] with the aspect of integrity check. These schemes focus on
selects all values uniformly and at random, i.e., {r1 , · · · , rNc } ← integrity check in various scenarios. We found that StealthGuard
0
Zn∗ , thus, riinv and ri are all random sequences. The data holder [20] is the only one that targets on the same integrity check issue
0
masks its inputs A[i] to the CSP with random values ri , so that over data deduplication as ours. Thus, we chose StealthGuard as
CSP cannot obtain any other H(yj ) of the data holder except for a baseline scheme and compare its simulation performance with
the intersection. The CSP is private which leaks no information ours in terms of integrity check. Meanwhile, to the best of our
to the data holder since we introduce a cuckoo filter to store knowledge, our scheme is the first to consider the issue on proving
the computation results ai in filter generation phase. Due to the the correctness of duplication check. Thus, in what follows, we
irreversibility of the filter, the data holder cannot obtain any other provide the evaluation result of UDDCP without comparison with
H(xi ) except for the intersection. other previous work.

7.1.2 Experimental Settings whole integrity check by inserting the initial notes into each
We implemented our scheme in Python and tested it on a desktop block. Assume that the data owner would like to insert k pairs
equipped with an i5 CPU, 8GB RAM, and 64-bit Win10 OS. We notes into one block, it performs 4k pseudo-random permutation
chose SHA-256 for the cryptographic hash function and 1024- (PRP) operations to decide the position Pi and 2k pseudo-random
bit RSA for digital signature. We employed the Crypto library to function (PRF) operations and k HASH to derive all the notes.
realize Advanced Encryption Standard (AES) encryption. We ap- Therefore, in our test, a 4GB file will contain 32768 blocks, as we
plied a MySQL database to store data and their related information set k = 8 for comparison with previous work, TDICP requires the
and built secure channels between entities using SSLsocket. We data holder to perform 1048576 PRP, 524288 PRF, and 262144
employed a cuckoo filter with 12.6 bits on average per item, which HASH. Compared with the most related work StealthGuard,
can offer a false positive rate of 0.19%. In our tests, we focus on since we introduce a hidden function instead of the watchdog in
testing the performance of our proposed TDICP and UDDCP, the StealthGuard, we bring more costs to the data owner. However,
rest including encryption, decryption, and key re-encryption, can since the setup phase is a one-time cost regardless of the integrity
be found in our previous paper for details. check, it is reasonable since our protocol provides a new feature
Assume there exists n0 elements after the block has been that we can also deduplicate the verification tags, which is a step
transformed into a matrix. Let a note set conform to the hidden forward than StealthGuard.
function that contains r notes and a number of k note sets are Storage overhead of data holders. In order to fulfill the task
required to be inserted, there exists s = k ∗ r notes. of integrity check, all the data holders need to record the position
Assume that the data holder inserts c notes in each column at set Pi , so that they can later retrieve the notes based on the PIR
average, then s/(tc) times of queries will be needed to fetch all algorithm. In our protocol, the size of each position information is
the notes. Therefore, the communication cost of the data holder 15bits. Thus, the data holder needs 4k ∗ 15 bits additional storage
and the CSP to check the integrity is (x + y) ∗ |m| ∗ s/(tc). When to record these positions. As to a 4GB file, it costs an additional
x = y , the equation 32768 ∗ 4 ∗ 8 ∗ 15 = 1.875 MB. Compared with StealthGuard and
√ reaches the best. Thus, in our experiment, other previous work, our scheme saves more or less storage due to
we set x = y = n in order to minimize communication cost.
Meanwhile, the times of multiplication in the integrity check is the novel verification tags.
2∗x∗s/(tc)+x∗y∗s/(tc)+s = 2∗x∗s/(tc)+n∗s/(tc)+s+y . Computational complexity of CSP at response phase. As the
Therefore, the larger t ∗ c, the less multiplications needed. Thus, entity who responses the integrity challenge from the data holders,
we set c = y, t = ds/ce to minimize the times of multiplication the CSP performs 1∗x∗y multiplication (mul) to compute Resp =
needed for integrity check. vi × D. Therefore, assume that x = 135 and y = 136 for the
matrix D, the CSP performs 18360 mul to compute a response to
the data holder, and totally 1719 ∗ 18360 = 31560840 mul for
7.2 Performance Analysis
all 1719 notes. Compared with StealthGuard, our scheme reduces
In this subsection, we analyze the performance of our two pro- almost 20 times of computational complexity, it is mainly because
posed protocols. Hereafter, computation costs are represented with in StealthGuard, the CSP needs to transform the matrix D into a
exp for exponentiation, mul for multiplication, PRF for pseudo- bit matrix, which increases the number of computations.
random function, PRP for pseudo-random permutation, INV for Computational complexity of data holder at challenge and
inversion, and enc for encryption. response phase. In each challenge phase, the data holder performs
x mul and x PRF for generating the private coefficient ei and
7.2.1 Performance Analysis on TDICP x mul to compute vi = bei mod m. Assume that x = 135,
We first did theoretical analysis on integrity check performance. the data holder then performs 270 mul and 135 PRF to generate
The proposed TDICP involves two types of system entities: data one challenge. As a total, the data holder performs 464130 mul
holder/data owner and CSP. In order to be compatible with the and 232065 PRF to generate all challenges for a 4GB file. In
chunk setting of previous deduplication schemes [37] and compare each verification phase, there exist a best situation and a worst
computational complexity and communication cost with other situcation. In the best situation, the data holder performs 4 + y
pervious PoR schemes [7], [15], [16], [18], [20], we follow the mul to extract a queried column based on PIR algrithm. In the
settings of the prior arts to split a 4 GB file into a number worst situation, the data holder performs (4 + 3 + 2 + 1) + y mul
of blocks with 128 KB, so that we got each block containing to extract the queried column. Therefore, in the best situation, the
16384 elements and each element is 64 bits. In order to reduce data holder in TDICP performs 140 mul and 1 HASH to verify that
the probability of CSP to find collisions that can help passing the note conform to f . In the best situation, the data holder then
the integrity verification, we selected the hidden function as performs 146 mul and 1 HASH to verify the note set. As a total,
notes[1]knotes[2] = (HashSHA256 (notes[3]knotes[4]))128 , the data holder performs 250974 (240660) mul and 1719 HASH
which can be proved secure and efficient in [38] and inserted 8 to verify a 4GB files. Compared with StealthGuard, our scheme is
pairs of notes as verification tags into each block. The size of more efficient than StealthGuard at the challenge phase and a little
each notes is 64bits. We adopted ECC in all splits and required bit worse than StealthGuard during verification. The reason is that
it to correct 5% errors (912 elements). Thus, each block contains our TDICP performs less computations to retrieve the tags, but
18240 = 16384 + 32 + 912 ∗ 2 elements. We adopted AES for performs more computations to verify whether the notes conform
symmetric encryption and PRE proposed in [39]. We analyzed the to the hidden function. In fact, for a same file that contains one
computational complexity, storage overhead, and communication pair of notes or a single watchdog, our method of note insertion
cost of each entity at various phases as below. The result is shown only affects the complexity of the verification phase, and the effect
in Table 3. is very small regarding to the above statement.
Computational complexity of data owner at setup phase. Communication cost of challenge and response phase. When
Regarded as the first data uploader, the data owner setups the the data holder challenges the integrity of a block, it sends a

TABLE 3
Computational complexity and communication cost of TDICP compared with existing works

Scheme Parameter Setup cost Storage overhead Server cost Verifier cost Communication cost
Challenge:
764 PRP
1 exp
Block size: 2 KB 4.4 × 106 exp 764 PRF Challenge: 168 B
[7] tags: 267 MB Verfi:
tag size: 128 B 2.2 × 106 mul 765 exp Response: 148 B
766 exp
1528 mul
764 PRP
Challenge:
Block size: 128 bits
1719 PRF Challenge: 6 KB
[15] Number of sentinels: 2 × 106 PRF sentinels: 30.6 MB ⊥
Verfi: Response: 26.9 MB
2 × 106
⊥
Challenge:
Block size: 80 bits 1 enc
1 enc
Number of blocks: 1 MAC Challenge: 1.9 KB
[18] 5.4 × 106 PRF tags: 51 MB 7245 mul
in one split: 160 Verfi: Response: 1.6 KB
1.1 × 109 mul
tag size: 80 bits 45 PRF
160+205 mul
Challenge:
⊥
Block size: 160 bits
2.2 × 108 mul 160 exp Verfi: Challenge: 36 KB
[16] Number of blocks tags: 26 MB
1.4 × 106 PRF 2.6 × 105 mul 2 exp Response: 60 B
in one split: 160
1639 PRF
1639 mul
Challenge:
Block size: 256 bits
2.6 × 105 PRF 2.0 × 106 mul Challenge: 23.3 MB
[20] Number of blocks Watchdogs: 8 MB 6.2 × 108 mul
2.6 × 105 PRP Verfi: Response: 26.2 MB
in one split: 4096
1.4 × 105 mul
Challenge:
4.6 × 105 mul
Verif:
Element size: 64 bits 1.0 × 106 PRP worst:
Positions of hidden Challenge: 9.24 MB
Ours Number of elements 7.9 × 105 PRF 3.2 × 107 mul 2.5 × 105 mul
parameters: 1.875 MB Response: 9.31 MB
in one block: 16384 2.6 × 105 HASH 1.7 × 103 HASH
best:
2.4 × 105 mul
1.7 × 103 HASH

exp: exponentiation; mul: multiplication; PRP: pseudo-random permutation; PRF: pseudo-random function; enc: encryption; MAC: message authentication code

[1, x] vector to the CSP. The size of each element in the vector is TABLE 4
334 bits. Thus, the size of each challenge is 334×x bits. As a total, Computational complexity of UDDCP
the size of each challenge is 334 bits∗135 = 5636.25 bytes and
9.24 MB for a 4 GB file. As to the response phase, the CSP sends Initialization Filter generation Duplication check
Nc PRF
a [1, y] vector back to the data holder, the size of each element 2Nc mul
Data holder Nc INV
in the vector is also 334 bits. Thus, the size of each response is Nc exp
Nc exp
334 × xbits. Remembering that y = 136, as a total, the size of CSP Ns exp Nc exp
each response is 334 bits∗136 = 5678 bytes and 9.31 MB for a Ns + Ns /λ exp
4 GB file. Compared with StealthGuard, our communication cost AA 2Ns mul
is smaller since in our scheme, we reduce the data needed to be Ns CF.insert
retrieved based on PIR algorithm.
exp: exponentiation; mul: multiplication; PRF: pseudo-random function; INV:
inversion
7.2.2 Performance Analysis on UDDCP
We then theoretically analyze the performance of duplication
check. The proposed protocol involves three types of system setup and online phase performs module exponential calculations
entities: data holder, AA, and CSP. Assume that the data holder whose computation complexities are Ns and Nc exp, respectively.
holds Nc tags and the CSP has already maintained Ns tags. Computational complexity of AA at filter generation phase:
We adopt RSA for signing a signature and analyze the com- During the setup phase, AA performs three types of verification: 1)
?
putational complexity and comminication cost of each entity at Tag number verification, i.e., N = Ns , which can be omitted; 2)
various phases as below. The results with respect to computational Signature verification: AA performs Ns exp to verify all signatures
complexity are shown in Table 4. provided by the CSP; 3) CSP computation verification: Using
Computational complexity of data holder at preprocessing batch verification, AA performs 2 ∗ Ns and Ns /λ, where λ
phase: The data holder who wants to check whether its uploaded represents the size of each non-overlap subset that verifies the CSP
blocks are duplicate needs to perform Nc PRF, Nc INV, and Nc computations according to the corresponding tag value. Also, AA
exp to initialize the PSI algorithm, which are all in proportion to needs to construct a cuckoo filter with Ns elements. Attention
the number of tags held by the data holder. needs to be paid that, once the system is set up, when several
Computational complexity of CSP at filter generation and new tags which have not been maintained by the CSP come, AA
duplication check phase: As we can see in the protocol, the CSP at only needs to perform verification on the new coming tags instead

32KB 32KB 1.1 32KB

50 64KB 8 64KB 64KB

Check integrity (ms)

Remove notes (ms)

128KB 128KB 1.0 128KB

Insert notes (ms)

40 6 0.9
30 0.8
4 0.7
20
0.6
10 2
0.5
0
0 2 4 6 8 10 0 2 4 6 8 10 0 2 4 6 8 10
Notes ratio (%) Notes ratio (%) Notes ratio (%)
(a) Inserting note cost (b) Integrity check cost (c) Removing note cost

Fig. 5. Computational costs with regard to note ratio varying from 0.02 to 0.10

of all the tags maintained by the CSP. This implies that the total expectation. When the note ratio increases, all these costs increase
computational complexity of AA is proportional to Nc0 instead of linearly since our meta verification block is a note set that contains
Ns . 4 notes that conform to the hidden function. The increase of note
Computational complexity of data holder at duplication check ratio causes the increase of operation time regarding inserting,
phase: At the duplication check phase, the data holder first per- verifying, and removing those similar verification blocks.
forms Nc mul to create a challenge to the CSP. In order to verify Impact of tag size: Figs. 6(a) to 6(e) shows the setup cost, the
the response sent back from the CSP, the data holder then conducts data holder storage overhead, the CSP integrity check cost, the
Nc exp computations to verify CSP compuatation. Finally, Nc mul data holder integrity check cost , and the total integrity check cost
operations is needed for the data holder to check duplication with of TDICP with regard to the size of notes (watchdogs) varying
the help of the cuckoo filter provided by the AA. from 2 KB to 14 KB compared with StealthGuard. Fig. 6(a)
Communication cost from CSP to AA: During the setup compares the setup cost of our scheme with StealthGuard. The
phase, CSP sends its all maintained tag values, the corre- setup cost increases as the size of tag increases in both schemes
sponding signatures and the computation results {ai } to the as expected. As we can see, TDICP incurs a higher computation
AA, whose element size is 256 bits, 576 bits, and 1024 bits, cost than the StealthGuard at the setup phase. The reason is that
respectively. As a total, the CSP is required to transfer TDICP needs to additionally perform multiple HASH operations
(256 bits+576 bits+1024 bits)*Ns =232*Ns bytes to the AA for and permutations than the StealthGuard. Fig. 6(b) compares the
constructing the cuckoo filter, which is linear to the number of storage overhead of TDICP with StealthGraud at the data holder.
tags maintained at the CSP. StealthGuard incurs higher storage overhead since it requires the
Communication cost between data holder and CSP: At the data holder to record all the watchdogs and TDICP requires the
duplication check phase, the data holder sends {A[i]} to the data holder to store the position index P of the notes whose size
CSP. Since we set the module N as an integer with 1024 bits, is smaller than that of the watchdogs. Fig. 6(c) compares the
the size of each A[i] is 1024 bits. As a total, the size of CSP cost of TDICP with StealthGuard. StealthGuard incurs higher
each challenge is 1024*Nc bits. As to the CSP, it responses computation cost since it requires the CSP to transfer the data into
the challenge with {C[i]} whose element size is 1024 bits. As 80bits matrix, which increases the times of multiplication executed
a total, the size of each response is 1024*Nc bits. Thus, the at the CSP. Fig. 6(d) compares the data holder cost of TDICP
total communication cost between the data holder and CSP is with StealthGuard. We can see that StealthGuard incurs higher
(1024 bits+1024 bits)*Nc =256*Nc bytes, which is in proportion computation cost since StealthGrard requires the data holder to
to the tag number of the data holder. perform more computations on extracting the verification tags
from the response. As a total, Fig. 6(e) concludes and compares
7.3 Performance Evaluation the total cost on checking the integrity of a 128KB file with
In this subsection, we present simulation based evaluation results StealthGuard. We can see that TDICP outperforms StealthGuard
of the two proposed protocols. with respect to the computation cost in both the CSP and the data
holder side, and all of those costs increase as the size of notes
7.3.1 Performance Evaluation on TDICP (watchdogs) increases.
We first present the performance evaluation result of TDICP and
compare it with StealthGuard in terms of setup cost, integrity 7.3.2 Performance Evaluation on UDDCP
check cost at the CSP and the data holder (DH), respectively. We further tested the performance of UDDCP. We assume that
Since we propose the novel note set as the verification tags, the the CSP has already maintained a larger number of tags than
note ratio, i.e., the ratio of the size of inserted notes to that of the the data holder. Since the computational complexity of signature
block, is an unique evaluation parameter in TDICP, we evaluate it verification is obviously linear to the number of the tags. Our
without comparison. simulation focuses on the AA and data holder verification on
Impact of note ratio: Figs. 5(a) to 5(c) shows note insertion the CSP computations, respectively. We also evaluated UDDCP
cost, integrity check cost, and note removing cost of our scheme performance in various sizes of the non-overlap subset as we
with the note ratio varying from 0.02 to 0.10 and notes size of introduce batch verification into UDDCP.
32 KB, 64 KB, and 128 KB, respectively. As we can see, the Fig. 7(a) presents the verification cost of AA on CSP compu-
larger the note size is, the higher the note insertion cost, integrity tations. As we can see, in all sizes of the subset, the verification
check cost, and note removing cost, which is the same as our cost increases linearly to the the number of CSP tags as expected.

StealthGuard 14 StealthGuard 50 StealthGuard 30 StealthGuard 70 StealthGuard

2.0

CSP integrity check (ms)

DH integrity check (ms)

Ours 12 Ours Ours Ours Ours

Check integrity (ms)

Local storage (KB)

Setup phase (ms) 40 25
1.5 10 50
20
8 30 40
1.0 6 15
20 30
4 10
0.5 20
2 10 5 10
0.0 0 0 0
0.0 2.5 5.0 7.5 10.0 12.5 0.0 2.5 5.0 7.5 10.0 12.5 0.0 2.5 5.0 7.5 10.0 12.5 0.0 2.5 5.0 7.5 10.0 12.5 0.0 2.5 5.0 7.5 10.0 12.5
Size of notes (watchdogs) (KB) Size of notes (watchdogs) (KB) Size of notes (watchdogs) (KB) Size of notes (watchdogs) (KB) Size of notes (watchdogs) (KB)

(a) DH Setup cost (b) DH storage overhead (c) CSP integrity check cost (d) DH integrity check cost (e) Total integrity check cost

Fig. 6. Computational costs with regard to note size varying from 2KB to 14KB compared with StealthGuard

size=10 size=10

CSP-DH comm. cost (MB)

CSP-AA comm. cost (MB)
size=20

CSP comp. veri. cost (s)

size=20 0.05
CSP comp. veri. cost (s)

0.4 size=30 size=30 8

size=40 size=40 0.8
size=50 0.04 size=50
0.3 size=60 size=60
size=70 6
size=70 0.6
size=80
size=90
0.03 size=80
size=90
0.2 size=100 size=100 4
0.02 0.4
0.1 0.01 2 0.2
10000 20000 30000 40000 1000 2000 3000 4000 10000 20000 30000 40000 1000 2000 3000 4000
Number of elements in X Number of elements in Y Number of elements in X Number of elements in Y
(a) Veri. cost of AA (b) Veri. cost of DH (c) CSP-AA comm. cost (d) CSP-DH comm. cost

Fig. 7. Computational costs with regard to the size of the non-overlap subset

Meanwhile, the larger size of each non-overlap subset, the lower Academy of Finland under Grant 308087, Grant 335262 and Grant
the verification cost, since the times of exponentiation needed for 345072; in part by the open research project of ZheJiang Lab
verification decreases. Fig. 7(b) presents the verification cost of under grant 2021PD0AB01; in part by the Shaanxi Innovation
the data holder on CSP computations. Similar as the verification Team Project under Grant 2018TD-007; and in part by the 111
at AA, the verification cost increases linearly to the number of data Project under Grant B16037.
holder tags as expected. Also, the higher size of each non-overlap
subset, the lower the verification cost.
Fig. 7(c) presents the communication cost between the CSP R EFERENCES
and AA. We can see that the communication cost increases linearly [1] Z. Yan, L. F. Zhang, W. X. Ding, and Q. H. Zheng, “Heterogeneous
as the number of elements in the CSP tag set X increases, which data storage management with deduplication in cloud computing,” IEEE
is the same as expectation. The reason is that the CSP is required Transactions on Big Data, pp. 1–1, 2017.
[2] Z. Yan, W. X. Ding, and H. Q. Zhu, “A scheme to manage encrypted
to provide all its maintained tags to the AA for computation
data storage with deduplication in cloud,” in International Conference on
and signature auditing. Fig. 7(d) presents the communication cost Algorithms and Architectures for Parallel Processing, 2015.
between the CSP and the data holder. As the number of elements in [3] Z. Yan, M. J. Wang, Y. X. Li, and A. V. Vasilakos, “Encrypted data
data holder tag set Y increases, the communication cost increases management with deduplication in cloud computing,” IEEE Cloud Com-
puting, vol. 3, no. 2, pp. 28–35, 2016.
linearly as expectation. The reason is that the data holder sends all [4] W. Shen, Y. Su, and R. Hao, “Lightweight cloud storage auditing with
its masked tag values to the CSP as challenges and the CSP then deduplication supporting strong privacy protection,” IEEE Access, vol. 8,
responses all the challenges, which is linear to the number of tags. pp. 44 359–44 372, 2020.
[5] Q. Zheng and S. Xu, “Secure and efficient proof of storage with
deduplication,” in CODASPY ’12, New York, NY, USA, 2012, p. 1–12.
8 C ONCLUSION [6] A. Giuseppe, R. Burns, and C. Reza, “Provable data possession at un-
In this paper, we introduced VeriDedup to check the integrity trusted stores,” in Proceedings of the 14th ACM Conference on Computer
of an outsourced encrypted file and guarantee the correctness of and Communications Security, 2007, pp. 598–609.
[7] G. Ateniese, R. Burns, R. Curtmola, J. Herring, O. Khan, Z. Peterson,
duplication check in an integrated way. The integrity check proto- and D. Song, “Remote data checking using provable data possession,”
col TDICP of VeriDedup allows multiple data holders to verify ACM Transactions on Information and System Security, vol. 14, pp. 1–
the integrity of their outsourced file with their own individual 34, 2011.
[8] Z. Wen, J. Luo, H. Chen, J. Meng, X. Li, and J. Li, “A verifiable data
verification tags without interacting with the data owner. On the deduplication scheme in cloud computing,” in INCOS ’14, USA, 2014,
other hand, we employed a novel challenge and response mech- p. 85–90.
anism in the duplication check protocol UDDCP of VeriDedup [9] P. Meye, P. Raïpin, F. Tronel, and E. Anceaume, “A secure two-phase
to let the data holder instead of the CSP first tell whether a file data deduplication scheme,” in HPCC ’14, CSS ’14, ICESS ’14, 2014,
pp. 802–809.
is duplicate in order to guarantee the correctness of duplication [10] D. Vasilopoulos, M. Önen, K. Elkhiyaoui, and R. Molva, “Message-
check. Security and performance analysis show that VeriDedup is locked proofs of retrievability with secure deduplication,” in Proceedings
secure and efficient under the described security model. The result of the 2016 ACM on Cloud Computing Security Workshop, 2016, pp. 73–
of our computer simulation further shows its efficiency compared 83.
[11] M. Bellare, R. Canetti, and H. Krawczyk, “Keying hash functions for
with highly related prior arts. message authentication,” in CRYPTO ’96, Berlin, Heidelberg, 1996, pp.
1–15.
ACKNOWLEDGMENT [12] X. Q. Liang, Z. Yan, X. F. Chen, L. T. Yang, W. J. Lou, and Y. T. Hou,
“Game theoretical analysis on encrypted cloud data deduplication,” IEEE
This work is supported in part by the National Natural Science Transactions on Industrial Informatics, vol. 15, no. 10, pp. 5778–5789,
Foundation of China under Grant 62072351; in part by the 2019.

[13] X. Q. Liang, Z. Yan, R. H. Deng, and Q. H. Zheng, “Investigating the [39] G. Ateniese, K. Fu, M. Green, and S. Hohenberger, “Improved proxy
adoption of hybrid encrypted cloud data deduplication with game theory,” re-encryption schemes with applications to secure distributed storage,”
IEEE Transactions on Parallel and Distributed Systems, vol. 32, no. 3, Acm Transactions on Information and System Security, vol. 9, no. 1, pp.
pp. 587–600, 2021. 1–30, 2006.
[14] Z. Yan, W. Ding, X. Yu, H. Zhu, and R. H. Deng, “Deduplication on
encrypted big data in cloud,” IEEE Transactions on Big Data, vol. 2,
no. 2, pp. 138–150, 2016.
[15] A. Juels and B. S. Kaliski, “Pors: Proofs of retrievability for large files,”
Xixun Yu received the BEng degree in telecom-
in CCS ’07, New York, NY, USA, 2007, p. 584–597.
munications engineering from Xidian University,
[16] J. Xu and E.-C. Chang, “Towards efficient proofs of retrievability,” in Xi’an, China, in 2015. He was a visiting student
Proceedings of the 7th ACM Symposium on Information, Computer and in University of Delaware, USA, in 2017. He
Communications Security, New York, NY, USA, 2012, p. 79–80. is currently working toward the PhD degree in
[17] C. M. Tang and X. J. Zhang, “A new publicly verifiable data possession information security from the School of Cyber
on remote storage,” Journal of supercomputing, vol. 75, no. 1, pp. 77–91, Engineering, Xidian University. His research in-
2019. terests include cloud security and verifiable com-
[18] H. Shacham and B. Waters, “Compact proofs of retrievability,” in putation.
ASIACRYPT ’08, Berlin, Heidelberg, 2008, pp. 90–107.
[19] B. Dan, B. Lynn, and H. Shacham, “Short signatures from the weil
pairing,” in ASIACRYPT ’01, 2001, pp. 514–532.
[20] M. Azraoui, K. Elkhiyaoui, R. Molva, and M. Önen, “Stealthguard:
Proofs of retrievability with hidden watchdogs,” in European Symposium
on Research in Computer Security, 2014, pp. 39–256. Hui Bai received the BEng degree in Information
[21] M. Bellare, S. Keelveedhi, and T. Ristenpart, “Message-locked encryp- Security from Xidian University, Xi’an, China, in
tion and secure deduplication,” in EUROCRYPT ’13, 2013, pp. 296–312. 2019. She is currently working for her Master de-
[22] A. Kate, G. M. Zaverucha, and I. Goldberg, “Constant-size commitments gree in Cyberspace Security, Xidian University.
to polynomials and their applications,” in ASIACRYPT ’10, 2010, pp. Her research interests include verifiable compu-
177–194. tation and machine learning.
[23] G. Wallace, F. Douglis, H. Qian, P. Shilane, and W. Hsu, “Characteristics
of backup workloads in production systems,” in Proceedings of the 10th
USENIX conference on File and Storage Technologies, 2012, pp. 4–4.
[24] R. Chen, Y. Mu, G. Yang, and F. Guo, “Bl-mle: Block-level message-
locked encryption for secure large file deduplication,” IEEE Transactions
on Information Forensics and Security, vol. 10, no. 12, pp. 2643–2652,
2015.
[25] Y. Shin, J. Hur, and K. Kim, “Security weakness in the proof of storage Zheng Yan received the D.Sc. degree in tech-
with deduplication,” Cryptology ePrint Archive, Report 2012/554, 2012, nology from the Helsinki University of Technol-
https://eprint.iacr.org/2012/554. ogy, Espoo, Finland, in 2007. She is currently a
[26] A. Kiss, J. Liu, T. Schneider, N. Asokan, and B. Pinkas, “Private set Professor in the School of Cyber Engineering,
intersection for unequal set sizes with mobile applications,” Proceedings Xidian University, Xi’an, China and a Visiting
of Privacy Enhancing Technologies, vol. 2017, no. 4, pp. 177–197, 2017. Professor and Finnish Academy Research Fel-
[27] E. D. Cristofaro and G. Tsudik, “Practical private set intersection pro- low at the Aalto University, Helsinki, Finland. Her
tocols with linear complexity,” in Financial Cryptography and Data research interests are in trust, security, privacy,
Security, 2010, pp. 143–159. and security-related data analytics. Dr. Yan is
[28] E. Cristofaro and G. Tsudik, “Experimenting with fast private set inter- an area editor or an associate Editor of IEEE
section,” in Trust and Trustworthy Computing, Berlin, Heidelberg, 2012, INTERNET OF THINGS JOURNAL, Information
pp. 55–73. Fusion, Information Sciences, IEEE ACCESS, and Journal of Network
[29] B. Fan, D. G. Andersen, M. Kaminsky, and M. D. Mitzenmacher, and Computer Applications. She served as a General Chair or Program
“Cuckoo filter: Practically better than bloom,” in CoNEXT’14, 2014, pp. Chair for numerous international conferences, including IEEE TrustCom
77–85. 2015 and IFIP Networking 2021. She is a Founding Steering Committee
[30] E. Kushilevitz and R. Ostrovsky, “Replication is not needed: single Co-Chair of IEEE Blockchain conference. She received several awards
database, computationally-private information retrieval,” in Proceedings in recent years, including the Distinguished Inventor Award of Nokia,
of the 38th Annual Symposium on Foundations of Computer Science, Aalto ELEC Impact Award, the Best Journal Paper Award issued by
1997, pp. 364–373. IEEE Communication Society Technical Committee on Big Data and the
[31] J. Trostle and A. Parrish, “Efficient computationally private information Outstanding Associate Editor of 2017 and 2018 for IEEE Access, etc.
retrieval from anonymity or trapdoor groups,” in Proceedings of the 13th
International Conference on Information Security, Berlin, Heidelberg,
2010, p. 114–128.
[32] Z. Pooranian, M. Shojafar, S. Garg, R. Taheri, and R. Tafazolli, “Lever: Rui Zhang received the B.E. in Communication
Secure deduplicated cloud storage with encrypted two-party interactions Engineering and the M.E. in Communication and
in cyber–physical systems,” IEEE Transactions on Industrial Informatics, Information System from Huazhong University
vol. 17, no. 8, pp. 5759–5768, AUG 2021. of Science and Technology, China, in 2001 and
[33] S. Reed and G. Solomon, “Polynomial codes over certain finite fields,” 2005, respectively, and the PhD degree in elec-
Journal of the Society for Industrial and Applied Mathematics, vol. 8, trical engineering from the Arizona State Univer-
pp. 300–304, 1960. sity, in 2013. He has been an Assistant Professor
[34] M. O’Neill and M. Robshaw, “Low-cost digital signature architecture in the Department of Computer and Information
suitable for radio frequency identification tags.” IET Computers and Sciences Department at University of Delaware
Digital Techniques, vol. 4, no. 1, pp. 14–26, 2010. since 2016. Prior to joining UD, he had been an
[35] X. Liang, Z. Yan, and R. H. Deng, “Game theoretical study on client- Assistant Professor in the Department of Elec-
controlled cloud data deduplication,” Computers Security, vol. 91, p. trical Engineering at the University of Hawaii from 2013 to 2016. His
101730, 2020. primary research interests are security and privacy issues in wireless
[36] X. Liang, Z. Yan, W. Ding, and R. H. Deng, “Game theoretical study networks, mobile crowdsourcing, mobile systems for disabled people,
on a client-controlled deduplication scheme,” in IEEE UIC2019, Augest cloud computing, and social networks. He received the US NSF CA-
2019, pp. 1154–1161. REER Award in 2017.
[37] D. T. Meyer and W. J. Bolosky, “A study of practical deduplication,”
ACM Transactions on Storage, vol. 7, no. 4, p. 14, 2012.
[38] NIST, “Recommendation for applications using approved hash algo-
rithms,” 2008.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

A Hybrid Cloud Approach For Secure Authorized Deduplication
100% (4)
A Hybrid Cloud Approach For Secure Authorized Deduplication
9 pages
1 s2.0 S221421262300011X Main
No ratings yet
1 s2.0 S221421262300011X Main
13 pages
Regeneration
No ratings yet
Regeneration
87 pages
Enabling Parity Authenticator Based Public Auditing With Protection of A Valid User Revocation in Cloud
No ratings yet
Enabling Parity Authenticator Based Public Auditing With Protection of A Valid User Revocation in Cloud
9 pages
AF-DedupSecure Data Deduplication Based On Adaptive Dynamic Merkle Hash Forest PoW For Cloud Storage
No ratings yet
AF-DedupSecure Data Deduplication Based On Adaptive Dynamic Merkle Hash Forest PoW For Cloud Storage
11 pages
Solving Data De-Duplication Issues On Cloud Using Hashing and Md5 Techniques
No ratings yet
Solving Data De-Duplication Issues On Cloud Using Hashing and Md5 Techniques
11 pages
Electronics: Dynamic Data Integrity Auditing Based On Hierarchical Merkle Hash Tree in Cloud Storage
No ratings yet
Electronics: Dynamic Data Integrity Auditing Based On Hierarchical Merkle Hash Tree in Cloud Storage
20 pages
In Recent Years,: CPDP 1
No ratings yet
In Recent Years,: CPDP 1
54 pages
Detecting Replicated Files in The Cloud
No ratings yet
Detecting Replicated Files in The Cloud
9 pages
Electronic Electrical
No ratings yet
Electronic Electrical
1,536 pages
RCC 2
No ratings yet
RCC 2
22 pages
Certificateless Integrity Checking of Group Shared Data On Public Distributed Storage
No ratings yet
Certificateless Integrity Checking of Group Shared Data On Public Distributed Storage
8 pages
1, S4 New Curriculum Chemistry Chapter 3 - Trendsin The Periodic Table
No ratings yet
1, S4 New Curriculum Chemistry Chapter 3 - Trendsin The Periodic Table
9 pages
2 Efficient Client-Side Deduplication
No ratings yet
2 Efficient Client-Side Deduplication
10 pages
VTJDM04 FR1
No ratings yet
VTJDM04 FR1
20 pages
TSC 2024 10 0911 - Proof - Hi
No ratings yet
TSC 2024 10 0911 - Proof - Hi
15 pages
Guo 2020
No ratings yet
Guo 2020
15 pages
Deduplication Review
No ratings yet
Deduplication Review
18 pages
Public Auditing Springer
No ratings yet
Public Auditing Springer
26 pages
Journal Tiis 15-2 2086517432
No ratings yet
Journal Tiis 15-2 2086517432
22 pages
Blockchain-Based Privacy-Preserving Deduplication and Integrity Auditing in Cloud Storage
No ratings yet
Blockchain-Based Privacy-Preserving Deduplication and Integrity Auditing in Cloud Storage
13 pages
DeyPoS Deduplicatable Dynamic Proof of
No ratings yet
DeyPoS Deduplicatable Dynamic Proof of
9 pages
2022 V13i1173
No ratings yet
2022 V13i1173
8 pages
Ijct V3i1p7
No ratings yet
Ijct V3i1p7
9 pages
(IJCST-V10I5P53) :MR D Purushothaman, M Naveen
No ratings yet
(IJCST-V10I5P53) :MR D Purushothaman, M Naveen
8 pages
V3i501 PDF
No ratings yet
V3i501 PDF
6 pages
V3i102 PDF
No ratings yet
V3i102 PDF
7 pages
Securing Cloud Data Storage
No ratings yet
Securing Cloud Data Storage
7 pages
Revocation Based De-Duplication Systems For Improving Reliability in Cloud Storage
No ratings yet
Revocation Based De-Duplication Systems For Improving Reliability in Cloud Storage
6 pages
Deduplication On Encrypted Data in Cloud Computing
No ratings yet
Deduplication On Encrypted Data in Cloud Computing
4 pages
Deduplication On Encrypted Big Data in Cloud
No ratings yet
Deduplication On Encrypted Big Data in Cloud
13 pages
Data Storage Protocol For Cloud Security
No ratings yet
Data Storage Protocol For Cloud Security
15 pages
Toward Secure and Dependable Storage Services in Cloud Computing
No ratings yet
Toward Secure and Dependable Storage Services in Cloud Computing
13 pages
International Journal of Engineering Research and Development (IJERD)
No ratings yet
International Journal of Engineering Research and Development (IJERD)
5 pages
The Uses of 3 Dimensional Printing Technology in o
No ratings yet
The Uses of 3 Dimensional Printing Technology in o
5 pages
Securing Cloud Data Storage: S. P. Jaikar, M. V. Nimbalkar
No ratings yet
Securing Cloud Data Storage: S. P. Jaikar, M. V. Nimbalkar
7 pages
Block
No ratings yet
Block
16 pages
Analysis of Bleach Lab Report
83% (6)
Analysis of Bleach Lab Report
8 pages
ASurveyonDataIntegrityAuditingSchemesinCloud PDF
No ratings yet
ASurveyonDataIntegrityAuditingSchemesinCloud PDF
7 pages
IEEE2023 Data Secure De-Duplication and Recovery Based On Public Key Encryption With Keyword Search
No ratings yet
IEEE2023 Data Secure De-Duplication and Recovery Based On Public Key Encryption With Keyword Search
11 pages
Lightweight Cloud Storage Auditing With Deduplication Supporting
No ratings yet
Lightweight Cloud Storage Auditing With Deduplication Supporting
14 pages
Amol PCX - Report
No ratings yet
Amol PCX - Report
15 pages
LabVIEW MathScript
No ratings yet
LabVIEW MathScript
57 pages
Iaetsd Controlling Data Deuplication in Cloud Storage
No ratings yet
Iaetsd Controlling Data Deuplication in Cloud Storage
6 pages
Immediate Detection of Data Corruption by Integrating Blockchain in Cloud Computing
No ratings yet
Immediate Detection of Data Corruption by Integrating Blockchain in Cloud Computing
4 pages
An 3526982701
No ratings yet
An 3526982701
4 pages
CDO User's Guide: Uwe Schulzweida - MPI For Meteorology
No ratings yet
CDO User's Guide: Uwe Schulzweida - MPI For Meteorology
206 pages
Secure Way of Storing Data in Cloud Using Third Party Auditor
No ratings yet
Secure Way of Storing Data in Cloud Using Third Party Auditor
6 pages
Secure Data Deduplication and Auditing For Cloud Data Storage 1 1
No ratings yet
Secure Data Deduplication and Auditing For Cloud Data Storage 1 1
4 pages
Towards Secure and Dependable Storage Devices in Cloud Computing
No ratings yet
Towards Secure and Dependable Storage Devices in Cloud Computing
5 pages
Time Series Forecasting
100% (1)
Time Series Forecasting
52 pages
Near Sheltered and Loyal Storage Space Navigating in Cloud: N.Venkata Krishna, M.Venkata Ramana
No ratings yet
Near Sheltered and Loyal Storage Space Navigating in Cloud: N.Venkata Krishna, M.Venkata Ramana
5 pages
A Survey On: Application-Aware Big Data Deduplication in Cloud Environment
No ratings yet
A Survey On: Application-Aware Big Data Deduplication in Cloud Environment
7 pages
A Study On Application-Aware Local-Global Source Deduplication For Cloud Backup Services of Personal Storage
No ratings yet
A Study On Application-Aware Local-Global Source Deduplication For Cloud Backup Services of Personal Storage
3 pages
Diploma in Electrical Engineering Industrial Traning Report
No ratings yet
Diploma in Electrical Engineering Industrial Traning Report
42 pages
Valves Symbols
No ratings yet
Valves Symbols
4 pages
Enabling Public Auditability and Data Dynamics For Storage Security in Cloud Computing
No ratings yet
Enabling Public Auditability and Data Dynamics For Storage Security in Cloud Computing
13 pages
Towards Secure and Dependable Storage Services in Cloud Computing
No ratings yet
Towards Secure and Dependable Storage Services in Cloud Computing
7 pages
Towards Secure and Dependable Storage Services in Cloud Computing
No ratings yet
Towards Secure and Dependable Storage Services in Cloud Computing
6 pages
Identity Based Distributed Provable Data Possession in Multi Cloud Storage
No ratings yet
Identity Based Distributed Provable Data Possession in Multi Cloud Storage
11 pages
Verification of Data Integrity in Co-Operative Multicloud Storage
No ratings yet
Verification of Data Integrity in Co-Operative Multicloud Storage
9 pages
Secure Auditing and Deduplicating Data in Cloud: Akhila N P
No ratings yet
Secure Auditing and Deduplicating Data in Cloud: Akhila N P
14 pages
Research Paper
No ratings yet
Research Paper
6 pages
A Dynamic and Public Auditing With Fair Arbitration
No ratings yet
A Dynamic and Public Auditing With Fair Arbitration
17 pages
Microcontrollers
No ratings yet
Microcontrollers
13 pages
FT-891 Quick Manual: (PWR/LOCK) Key RF/SQL Knob
No ratings yet
FT-891 Quick Manual: (PWR/LOCK) Key RF/SQL Knob
2 pages
Development of A Static Aeroelastic Database Using NASTRAN SOL 14
No ratings yet
Development of A Static Aeroelastic Database Using NASTRAN SOL 14
108 pages
Ensuring Data Storage Security in Cloud Computing: STJEE504
No ratings yet
Ensuring Data Storage Security in Cloud Computing: STJEE504
3 pages
Data Integrity Proofs in Cloud Storage
No ratings yet
Data Integrity Proofs in Cloud Storage
6 pages
EE Review 2
No ratings yet
EE Review 2
5 pages
(L2) - (JLD 4.0) - Solutions - 30th April
No ratings yet
(L2) - (JLD 4.0) - Solutions - 30th April
34 pages
13.Privacy-Preserving Public Auditing For Secure Cloud Storage
No ratings yet
13.Privacy-Preserving Public Auditing For Secure Cloud Storage
3 pages
Unit 09
No ratings yet
Unit 09
9 pages
Human Skin Grade 6
No ratings yet
Human Skin Grade 6
15 pages
Exp 10 Relative Density Application
0% (1)
Exp 10 Relative Density Application
2 pages
Market Structure
No ratings yet
Market Structure
14 pages
Advanced Creating of 3D Dental Models in Blender Software: September 2016
No ratings yet
Advanced Creating of 3D Dental Models in Blender Software: September 2016
67 pages
Sewing Symbols in Tailoring
No ratings yet
Sewing Symbols in Tailoring
12 pages
Admission Test For 6th Class CBSE Answers
No ratings yet
Admission Test For 6th Class CBSE Answers
6 pages
Artificial Intelligence
No ratings yet
Artificial Intelligence
2 pages
Design of Unde Ground Water Tank
100% (2)
Design of Unde Ground Water Tank
18 pages
Physical Characterization of Activated Carbon Derived From Mangosteen Peel PDF
No ratings yet
Physical Characterization of Activated Carbon Derived From Mangosteen Peel PDF
5 pages
Novel Convolutional Neural Network (NCNN) For The Diagnosis of Bearing Defects in Rotary Machinery
No ratings yet
Novel Convolutional Neural Network (NCNN) For The Diagnosis of Bearing Defects in Rotary Machinery
10 pages
Softening Behavior of Reinforced Concrete Beams Under Cyclic Loading
No ratings yet
Softening Behavior of Reinforced Concrete Beams Under Cyclic Loading
24 pages
4G CN PDF
No ratings yet
4G CN PDF
2 pages
Zelio Electromechanical Relays - RHT4138E
No ratings yet
Zelio Electromechanical Relays - RHT4138E
2 pages
Corn Starch
No ratings yet
Corn Starch
8 pages
Cybersecurity in Cloud Computing
From Everand
Cybersecurity in Cloud Computing
Akula Achari
No ratings yet
Network Coding and Signcryption for Cloud Data Integrity
From Everand
Network Coding and Signcryption for Cloud Data Integrity
Noah Joan
No ratings yet
IGNOU MCS 227 Cloud Computing and IoT Previous Years Solved Papers
From Everand
IGNOU MCS 227 Cloud Computing and IoT Previous Years Solved Papers
Manish Soni
No ratings yet

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.

Deduplication 2023

Uploaded by

Deduplication 2023

Uploaded by

This article has been accepted for publication in a future issue of this journal, but has not been

VeriDedup: A Verifiable Cloud Data

Duplication check The actual situation of Weather the result Reason

new verification tag called note set in which each note is a

2 R ELATED W ORK TABLE 2

signature (RSA-PSI) consists of four main phases: base phase,

Fig. 3. Procedures of UDDCP

Fig. 4. Procedures of TDICP

6.1 Correctness of TDICP

32KB 32KB 1.1 32KB

Check integrity (ms)

Remove notes (ms)

Insert notes (ms)

StealthGuard 14 StealthGuard 50 StealthGuard 30 StealthGuard 70 StealthGuard

CSP integrity check (ms)

DH integrity check (ms)

Check integrity (ms)

Local storage (KB)

CSP-DH comm. cost (MB)

CSP comp. veri. cost (s)

0.4 size=30 size=30 8

You might also like

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.