DOI: 10.1145/1698790.1698811

Measuring risk and utility of anonymized data using information theory

Published: 22 March 2009

Abstract

Before releasing anonymized microdata (individual data), it is essential to evaluate whether: i) their utility is high enough for their release to make sense; ii) the risk that the anonymized data result in disclosure of respondent identity or respondent attribute values is low enough. The utility and disclosure risk measures used for this evaluation normally lack a common theoretical framework that would allow utility and risk to be traded off in a consistent way. In this paper we explore the use of information-theoretic measures based on the notion of mutual information.
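As a rough illustration of the quantity the abstract refers to, the sketch below estimates the mutual information I(X;Y) between an original categorical attribute X and its released (anonymized) counterpart Y from paired samples. This is a generic plug-in estimator and not the specific measures defined in the paper, which are not reproduced on this page; the function name `mutual_information` and the toy data are assumptions made for the example.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired categorical samples.

    Hypothetical helper for illustration only; it is not taken from the paper.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    if not xs:
        raise ValueError("at least one sample is required")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


# Toy data (assumed): an original attribute and two candidate releases.
original = ["a", "a", "b", "b"]
unchanged = ["a", "a", "b", "b"]   # released as-is
suppressed = ["*", "*", "*", "*"]  # every value replaced by the same symbol

print(mutual_information(original, unchanged))   # 1.0 bit: release reveals X fully
print(mutual_information(original, suppressed))  # 0.0 bits: release reveals nothing
```

In this toy setting, a release that keeps the attribute unchanged preserves all of its information, while full suppression preserves none; practical anonymization methods fall somewhere in between, which is the kind of trade-off a common information-theoretic scale is meant to expose.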

Published In

EDBT/ICDT '09: Proceedings of the 2009 EDBT/ICDT Workshops
March 2009
218 pages

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. information theory
  2. privacy

Qualifiers

  • Research-article

Funding Sources

  • Spanish Government

Conference

EDBT/ICDT '09
EDBT/ICDT '09: EDBT/ICDT '09 joint conference
March 22, 2009
Saint-Petersburg, Russia

Acceptance Rates

Overall Acceptance Rate 7 of 10 submissions, 70%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 8
  • Downloads (last 6 weeks): 3
Reflects downloads up to 20 Feb 2025

Citations

Cited By

  • (2023) Big Data Confidentiality: An Approach Toward Corporate Compliance Using a Rule-Based System. Big Data. DOI: 10.1089/big.2022.0201. Online publication date: 31-Oct-2023
  • (2022) Quantifying the Effects of Anonymization Techniques Over Micro-Databases. IEEE Transactions on Emerging Topics in Computing 10:4 (1979-1992). DOI: 10.1109/TETC.2022.3141754. Online publication date: 1-Oct-2022
  • (2019) The Influence of Conception Paradigms on Data Protection in E-Learning Platforms: A Case Study. IEEE Access 7 (64110-64119). DOI: 10.1109/ACCESS.2019.2915275. Online publication date: 2019
  • (2017) Practical Estimation of Mutual Information on Non-Euclidean Spaces. Machine Learning and Knowledge Extraction (123-136). DOI: 10.1007/978-3-319-66808-6_9. Online publication date: 24-Aug-2017
  • (2016) A Genetic Algorithm Approach to Synthetic Data Production. Proceedings of the 1st International Workshop on AI for Privacy and Security (1-4). DOI: 10.1145/2970030.2970034. Online publication date: 29-Aug-2016
  • (2016) The Risk-Utility Tradeoff for Data Privacy Models. 2016 8th IFIP International Conference on New Technologies, Mobility and Security (NTMS) (1-5). DOI: 10.1109/NTMS.2016.7792481. Online publication date: Nov-2016
  • (2016) Enhancing aggregation phase of microaggregation methods for interval disclosure risk minimization. Data Mining and Knowledge Discovery 30:3 (605-639). DOI: 10.1007/s10618-015-0432-z. Online publication date: 1-May-2016
  • (2014) Redacting sensitive information in software artifacts. Proceedings of the 22nd International Conference on Program Comprehension (314-325). DOI: 10.1145/2597008.2597138. Online publication date: 2-Jun-2014
  • (2014) Data Privacy with R. Advanced Research in Data Privacy (63-82). DOI: 10.1007/978-3-319-09885-2_5. Online publication date: 22-Aug-2014
  • (2012) Information fusion in data privacy. Information Fusion 13:4 (235-244). DOI: 10.1016/j.inffus.2012.01.001. Online publication date: 1-Oct-2012
