Efficient Client-Side Deduplication

This article has been accepted for publication in a future issue of IEEE Access, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2018.2836328, IEEE Access.
ABSTRACT At present, the amount of data held by storage services is increasing considerably, alongside a dramatic evolution of networking techniques. In storage services with huge volumes of data, storage servers may want to reduce the volume of stored data, and clients may want to monitor the integrity of their data at low cost, since the cost of data-storage functions grows in proportion to the size of the data. To achieve these goals, secure deduplication and integrity auditing delegation techniques have been studied: the former reduces the volume of stored data by eliminating duplicated copies, and the latter permits clients to efficiently verify the integrity of stored files by delegating costly operations to a trusted party. So far, many studies have been conducted on each topic separately, whereas relatively few combined schemes, which support the two functions simultaneously, have been researched. In this paper, we design a combined technique that performs both secure deduplication of encrypted data and public integrity auditing of data. To support the two functions, the proposed scheme performs challenge-response protocols using a BLS-signature-based homomorphic linear authenticator. We utilize a third-party auditor to perform the public audit, in order to help low-powered clients. The proposed scheme satisfies all the fundamental security requirements. We also propose two variants that provide higher security and better performance.

INDEX TERMS Cloud storage, Cryptography, Data security, Information security, Public audit, Secure deduplication
I. INTRODUCTION
In cloud storage services, clients outsource data to a remote storage and access it whenever needed. Owing to this convenience, cloud storage services have become widespread, and their use continues to grow. Well-known services such as Dropbox and iCloud are used by individuals and businesses for various applications. A notable recent change in information-based services is the volume of data they handle, driven by the dramatic evolution of networking techniques. For example, 5G networks can transmit gigabits of data per second, which means that the size of the data handled by cloud storage services will increase with the performance of the new networking techniques. From this viewpoint, we can regard data volume as a main characteristic of cloud storage services. Many service providers have already prepared high-resolution content to exploit faster networks. For secure cloud services in this new era, it is important to prepare suitable security tools that support this change.

Larger volumes of data incur higher costs for managing the various aspects of the data, since the size of the data influences the cost of cloud storage services. The scale of storage should be increased according to the quantity of data to be stored.
VOLUME 4, 2016 1
2169-3536 (c) 2018 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
public key is distributed securely to the entities.

The proposed method includes the following three procedures.
• First upload procedure: A user uploads a file that is not yet stored in the CSS. First, a file ID/tag and a convergent encryption key K are generated; the file is encrypted using CE with K and then uploaded to the CSS. The CSS maintains the owner, tag, and ciphertext list. The user computes an authentication tag for the integrity auditing and sends it to the TPA.
• Subsequent upload procedure: This procedure is performed when a duplicate file is uploaded. The CSS checks for duplication using the file tag and, in the event of duplication, proceeds with the PoW protocol to examine the ownership of the user. If the user passes this process, the CSS adds the user's ownership to the stored file.
• Integrity auditing procedure: Periodic auditing is required to ensure that the files stored on the CSS are kept intact.

the client only needs to keep mk secret. However, the CSS should store ek of each user for the duplicate file.

2) Integrity auditing procedure
The TPA periodically checks the integrity of the data stored in the CSS. To do this, the TPA first selects a random subset I ⊆ [1, n], and then randomly selects v_i from Z_p for each i ∈ I. The challenge values for integrity auditing are Q = {I, (v_i)}. The TPA sends the Audit_Chall message with (T_F, Q) to the CSS. Then, the CSS computes the proof values {µ, τ} corresponding to the challenge as follows:

\mu \leftarrow \prod_{(i, v_i) \in Q} t_i^{v_i} \ (\in G) \qquad (1)

and

\tau \leftarrow \sum_{(i, v_i) \in Q} v_i \cdot CT_i \ (\in \mathbb{Z}_p). \qquad (2)
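As a concrete illustration, the aggregation in Eqs. (1) and (2) can be sketched in a few lines. Plain modular exponentiation stands in for the pairing-friendly group G here; the moduli, tag values t_i, and block values CT_i below are illustrative stand-ins, not the scheme's actual instantiation.

```python
import secrets

# Toy parameters: the real scheme works in a pairing-friendly group G of
# prime order p; here plain modular arithmetic stands in for G (illustrative).
P = 2**255 - 19   # prime modulus standing in for the group G
p = 2**89 - 1     # prime standing in for the order of Z_p

def audit_challenge(n, c):
    """TPA: choose a random subset I of [1, n] and a coefficient v_i in Z_p
    for each i in I; Q = {I, (v_i)} is the Audit_Chall payload."""
    rng = secrets.SystemRandom()
    I = sorted(rng.sample(range(1, n + 1), c))
    return [(i, rng.randrange(1, p)) for i in I]

def audit_proof(Q, tags, blocks):
    """CSS: aggregate the stored tags t_i and ciphertext blocks CT_i into
    the proof (mu, tau) of Eqs. (1) and (2)."""
    mu, tau = 1, 0
    for i, v in Q:
        mu = mu * pow(tags[i], v, P) % P   # Eq. (1): mu = prod t_i^{v_i}
        tau = (tau + v * blocks[i]) % p    # Eq. (2): tau = sum v_i * CT_i
    return mu, tau
```

The TPA would then check (µ, τ) against the file tag T_F; the pairing-based verification itself is omitted from this sketch.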
IV. ANALYSIS
The proposed scheme satisfies the security objectives mentioned above. From the privacy perspective, the proposed method outsources the ciphertext, encrypted under the convergent encryption key, to the CSS. In the integrity auditing process, the TPA can also only partially obtain information about the ciphertext. Assuming that the convergent key generation is performed through an OPRF (oblivious pseudo-random function) protocol with a trusted key server, as in DupLESS [11], the scheme is secure against offline brute-force attacks. An attacker who does not obtain the convergent key K cannot extract any information from the outsourced ciphertext, except the fact of duplication. The proposed scheme also supports secure deduplication by deduplicating the ciphertext and performing the PoW protocol. This likewise depends on the security of the convergent key, as mentioned above, and on the security of the BLS signature based HLA [14].

The TPA audits the integrity of the data without user intervention. This depends on the security of the BLS-based homomorphic authentication tag, whose security has been analyzed in [14]. Thus, the proposed scheme supports public verifiability. Finally, if the CSS keeps the data intact, it can pass the verification by the TPA. This is also guaranteed by the security of the BLS-based homomorphic authentication tag, thus ensuring the correctness of the storage.

To compare with the existing schemes, first consider the scheme proposed in [18]. It supports integrity auditing and deduplication using a polynomial-based authentication tag and a homomorphic linear tag. During the setup process, the user computes a homomorphic linear tag and uploads it to the cloud server. The TPA then performs integrity auditing with the cloud server through an interaction using a polynomial-based authentication tag. In the deduplication process, the cloud server randomly selects a set of block indexes for the PoW and sends them to the user, who transmits the corresponding plaintext blocks as the response. The cloud server then verifies the file ownership by checking the validity of the received blocks using the pairing operation.
The biggest problem with this scheme is that the data is used as plaintext on the cloud side; therefore, it does not support secure deduplication. In addition, regardless of file duplication, users always have to compute authentication tags, which results in a high computational overhead.

In the scheme proposed in [12], the client uploads a file to a TPA that is assumed to be honest. This is a very strong assumption, since most TPAs in the existing public-auditing literature are assumed to be semi-honest. It also wastes bandwidth on the client side, because the client always transmits the file to the TPA. When the client uploads the file to the TPA, the TPA computes a homomorphic signature for integrity auditing and uploads it to the cloud server along with the file.

The proposed scheme remedies the problems of the above two methods. Table 1 shows the comparison with the existing schemes. The TPA of the proposed scheme is assumed to be semi-honest, and the client does not upload the file to the TPA. In addition, the proposed method supports deduplication and integrity auditing for encrypted data, and the client only needs to perform a single authentication tag computation. The computational overhead is similar for the first upload and a duplicate upload: in the first upload, the authentication tag for the integrity audit is computed, and in a duplicate upload, the authentication tag for the PoW is generated. Therefore, the scheme provides better efficiency than the existing schemes in terms of client-side computational overhead.

V. VARIANTS
In Section III, we designed a new secure deduplication scheme supporting public auditing, and we analyzed its security and efficiency in Section IV. The proposed scheme is secure under a reasonable security model, and its performance is better than that of the existing schemes, as shown in Section IV. Here, we provide some techniques to achieve greater security and better performance.

A. IMPROVEMENT FROM THE VIEWPOINT OF SECURITY
In this section, we consider a slightly stronger attack scenario that was not considered in Section IV. Recall that we assumed the original data holder to be a reliable entity who behaves honestly, and we used this assumption to analyze the proposed scheme. Here, we discuss a possible attack that can be mounted by a valid user who is a legitimate data holder, and we provide a countermeasure by slightly modifying the scheme of Section IV.

Recall that the proposed model in Section III uses the same generator u to generate both the authentication tag for integrity verification and the authentication tag for PoW. In the case of a duplicate upload, it may be possible to attack the server in the PoW process by acquiring the authentication tag of the already uploaded file and using it to pass the ownership proof procedure. When the adversary is an outsider who cannot easily obtain the tag, the attack is hard to mount. However, if the original data holder helps an adversary obtain the authentication tag, the attack becomes easy. As recognized in [8], the original file holder can thereby utilize the storage service as a content distribution network, and this could be a threat to storage services. Hence, if we consider such a stronger adversary, we need a countermeasure that prevents a legitimate user from helping others obtain the stored data.

Note that it is not possible to prevent the data holder from giving his data to others; therefore, the goal of the countermeasure is to make it difficult for an adversarial user to use the storage service as a content distribution network. As a concrete countermeasure to alleviate this threat, we can randomly select a generator for each PoW process. The CSS randomly chooses a generator u′ and sends it to the client in the PoW_Chall message. The CSS can then perform a more secure PoW by verifying whether an appropriate response is returned by the client. Even though the original file holder can still help other users obtain ownership, such a user must hold the entire file and perform costly operations to do so. Hence, from the data holder's viewpoint, it is better to give the file to the adversary directly than to help the adversary pass the PoW procedure. Fig. 5 shows the subsequent upload process with improved security.

In the case of a duplicate file upload, the CSS performs the PoW process. To do this, the CSS chooses a random generator u′ and challenge values Q = {I, (v_i)}, where the challenge Q is generated as in Section III. The CSS sends the PoW_Chall message with (u′, Q) to the client. The client then returns the corresponding proof values (µ, τ), computed using the generator u′ and its own secret key α′. The CSS verifies the proof with u′ and the client's public key v′ by checking whether the following holds:

e(\mu, g) = e\Big(\prod_{i \in I} \mathrm{BLSHash}(i)^{v_i} \cdot u'^{\tau},\ v'\Big). \qquad (5)

After obtaining ownership, the client and the CSS use the same generator u to perform the integrity auditing. However, this no longer matters, since the client cannot help others pass the PoW and obtain ownership of the file F by disclosing these values.
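The randomized-generator PoW can be sketched as follows. The sketch simulates the group G with modular arithmetic and, lacking a pairing library, lets the verifier use the exponent α′ directly instead of the pairing check of Eq. (5) (the real CSS needs only the public key v′ = g^{α′}); it also assumes the CSS compares τ against its stored blocks. All parameter and function names are illustrative.

```python
import hashlib
import secrets

P = 2**255 - 19   # prime modulus standing in for the pairing group G (toy)

def bls_hash(i: int) -> int:
    """Stand-in for BLSHash: hash a block index into the group."""
    return int.from_bytes(hashlib.sha256(str(i).encode()).digest(), "big") % P

def pow_challenge(n: int, c: int):
    """CSS: a *fresh* random generator u' per PoW run, plus Q = {I, (v_i)}."""
    rng = secrets.SystemRandom()
    u = rng.randrange(2, P)
    I = sorted(rng.sample(range(1, n + 1), c))
    return u, [(i, rng.randrange(1, P - 1)) for i in I]

def pow_response(u, Q, blocks, alpha):
    """Client: per-block tags must be recomputed under the fresh u', so the
    client needs the actual blocks -- old tags for generator u are useless."""
    mu, tau = 1, 0
    for i, v in Q:
        sigma = pow(bls_hash(i) * pow(u, blocks[i], P) % P, alpha, P)
        mu = mu * pow(sigma, v, P) % P
        tau = (tau + v * blocks[i]) % (P - 1)
    return mu, tau

def pow_verify(u, Q, mu, tau, alpha, stored_blocks):
    """CSS: check tau against its own stored blocks, and check that mu is a
    valid aggregate tag on tau. The latter is simulated here by
    exponentiating with alpha; the paper's CSS instead checks Eq. (5)
    with a pairing and only the public key v'."""
    expected_tau = sum(v * stored_blocks[i] for i, v in Q) % (P - 1)
    rhs = 1
    for i, v in Q:
        rhs = rhs * pow(bls_hash(i), v, P) % P
    rhs = rhs * pow(u, tau, P) % P
    return tau == expected_tau and mu == pow(rhs, alpha, P)
```

Because u′ is fresh in each run, a response cannot be assembled from previously leaked tags; the prover must redo the tag computations over the full file.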
To reduce the computational complexity, the process of uploading duplicate files can be modified as shown in Fig. 6. Unlike in Section III, the client uploading the duplicate file computes only the authentication value τ over the challenged blocks CT_i. That is, the client does not compute µ; it sends the PoW_Res message with only the proof τ to the CSS. The CSS then computes µ and verifies τ. This reduces the amount of computation on the client side, while the CSS's computational overhead increases correspondingly. However, when the client is a lightweight device such as a mobile device, it is advantageous to transfer part of the computation to the CSS, which has higher performance than the client. Moreover, we can shorten the online step by choosing Q = {I, (v_i)} and pre-computing µ before a subsequent upload process is initiated by a new client. With this pre-computation technique, we can reduce the computational complexity without increasing the cost for the CSS in the online step.

VI. CONCLUSION
When storing data on remote cloud storage, users want assurance that their outsourced data are maintained accurately in the remote storage without being corrupted. In addition, cloud servers want to use their storage more efficiently. To
FIGURE 6: Subsequent upload process for improving the computational overhead on the client side
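The split of work in Fig. 6 can be sketched as follows, again with plain modular arithmetic standing in for the group G (all values illustrative): the client computes only the cheap sum τ of the Eq. (2) form, while the CSS performs the costly exponentiations for µ over the tags it already stores.

```python
P = 2**255 - 19   # prime modulus standing in for the group G (illustrative)

def client_tau(Q, blocks):
    """Lightweight client: only modular multiply-adds over the challenged
    blocks CT_i -- no exponentiations in G on the client side."""
    return sum(v * blocks[i] for i, v in Q) % (P - 1)

def css_mu(Q, tags):
    """CSS: the costly product mu = prod t_i^{v_i}, moved server-side; for a
    pre-chosen challenge Q it can even be pre-computed offline."""
    mu = 1
    for i, v in Q:
        mu = mu * pow(tags[i], v, P) % P
    return mu
```

The PoW_Res message then carries only τ; the CSS computes µ itself and runs the usual verification, which is why the server-side overhead grows while the client's shrinks.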
services: Deduplication in cloud storage," IEEE Security & Privacy, vol. 8, no. 6, pp. 40–47, Dec. 2010.
[10] A. Juels and B. S. Kaliski Jr., "PORs: proofs of retrievability for large files," in Proc. of the 14th ACM Conference on Computer and Communications Security (CCS '07), Alexandria, VA, USA, 2007, pp. 584–597.
[11] S. Keelveedhi, M. Bellare and T. Ristenpart, "DupLESS: server-aided encryption for deduplicated storage," in Proc. of the 22nd USENIX Security Symposium (USENIX Security 13), Washington, D.C., USA, 2013, pp. 179–194.
[12] J. Li, J. Li, D. Xie and Z. Cai, "Secure auditing and deduplicating data in cloud," IEEE Transactions on Computers, vol. 65, no. 8, pp. 2386–2396, Aug. 2016.
[13] X. Liu, W. Sun, H. Quan, W. Lou, Y. Zhang and H. Li, "Publicly verifiable inner product evaluation over outsourced data streams under multiple keys," IEEE Transactions on Services Computing, vol. 10, no. 5, pp. 826–838, Sept.–Oct. 2017.
[14] H. Shacham and B. Waters, "Compact proofs of retrievability," in Advances in Cryptology – ASIACRYPT 2008, Proc. of the 14th International Conference on the Theory and Application of Cryptology and Information Security, Melbourne, Australia, 2008, pp. 90–107.
[15] Q. Wang, C. Wang, K. Ren, W. Lou and J. Li, "Enabling public auditability and data dynamics for storage security in cloud computing," IEEE Transactions on Parallel and Distributed Systems, vol. 22, no. 5, pp. 847–859, Dec. 2011.
[16] T. Y. Youn, K. Y. Chang, K. H. Rhee and S. U. Shin, "Public Audit and Secure Deduplication in Cloud Storage using BLS signature," Research Briefs on Information & Communication Technology Evolution (ReBICTE), vol. 3, article no. 14, pp. 1–10, Nov. 2017.
[17] J. Yuan and S. Yu, "Proofs of retrievability with public verifiability and constant communication cost in cloud," in Proc. of the 2013 International Workshop on Security in Cloud Computing, Hangzhou, China, 2013, pp. 19–26.
[18] J. Yuan and S. Yu, "Secure and constant cost public cloud storage auditing with deduplication," in Proc. of the 2013 IEEE Conference on Communications and Network Security (CNS), National Harbor, MD, USA, 2013, pp. 145–153.

KYUNG-HYUNE RHEE received his M.S. and Ph.D. degrees from the Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea, in 1985 and 1992, respectively. He worked as a senior researcher at the Electronics and Telecommunications Research Institute (ETRI), Daejeon, Korea, from 1985 to 1993. He was also a visiting scholar at the University of Adelaide, Australia; the University of Tokyo, Japan; the University of California at Irvine, USA; and Kyushu University, Japan. He has served as Chairman of the Division of Information and Communication Technology, Colombo Plan Staff College for Technician Education, Manila, the Philippines. He is currently a professor in the Department of IT Convergence and Application Engineering, Pukyong National University, Busan, Korea. His research interests center on multimedia security and analysis, key management protocols, and mobile ad-hoc and VANET communication security.