CIPT Online Module 4 Transcript
Introduction
Perhaps the greatest challenge for technology professionals is not the risk to individual privacy posed by technology itself, but rather determining how to use technology to safeguard privacy. The persistent
challenge that privacy programs face is how to translate abstract privacy harms into tangible, measurable,
and controllable technology capabilities. In this module, we will explore strategies and techniques for
countering privacy threats, as well as data- and process-oriented strategies that may be used to control
and protect personal information.
Information systems don’t contain people; they contain data about people. So, when we refer to identity,
we really mean the link between a piece of information and the individual or individuals associated with
that data. For example, a data item can be about an individual, created by an individual or sent to an
individual. Identity captures what we know about who that individual is.
Labels are characteristics that point to an individual. These can be precise, as in a name, or imprecise,
such as with an attribute, depending on the context. Labeling people based on an attribute, instead of by
name, may make them easier to identify in context. For instance, a man in a business suit may not be
identifiable within a large corporate office, but the same man in a business suit may be easily identifiable
among sunbathers on a beach.
Identifiers are codes or strings used to represent an individual, device or browser. The strength or weakness
of an identifier depends on how precise it is and the context of the data set in which it resides. Strong
identifiers may be numbers, such as those used in national identification, passports, or credit card numbers.
Weak identifiers are usually more general and may belong to more than one individual (for example, postal
zip code, area code or age). However, there are always exceptions, and context is important: identifying an individual as 80 years old may narrow the pool of possible matches considerably. Yet if the data set being analyzed comes from an extremely elderly population, such as the residents of a nursing home, age 80 may not be as strong an identifier, because many individuals in that data set are 80 years old. Taken to its extreme, the example shows how identifiability risk climbs: describing an individual as 105 years old singles out a very small population in any data set.
Quasi-identifiers combine data with external knowledge, such as publicly available information, to identify
an individual. For example, in the case of a data breach where account numbers are disclosed, the account numbers alone, while strong identifiers, may not reveal who the account holders are without additional information. However, if an account number can be linked to an individual using publicly available information, then it should be considered individually identifiable and protected as such. Many laws define personal data as
any information relating to an identified or identifiable individual. When assessing identifiability, it is
important not only to think of internal data sets in which quasi-identifiers may be combined towards
identifiability but also publicly available data sets. A good example is the Netflix Prize data set from many
years ago (https://www.cs.utexas.edu/~shmat/shmat_oak08netflix.pdf).
Identifiability is the extent to which a person can be identified. High identifiability coupled with certain
actions (such as creating a new account, or responding to a website’s request for demographic
information) may put individuals at higher risk of tracking and identity theft. The privacy technologist
should understand when device identifiers, like IP addresses, should be used and consider any risks
associated with their choice of identifier.
Deidentification
Deidentification is one of the primary techniques used to prevent an individual’s identity from being connected to their personal information. Deidentification can be accomplished in many ways, such as through anonymization, pseudonymization or data masking.
• Anonymization: In anonymized data, direct and indirect identifiers have been removed, and mechanisms have been put in place to prevent reidentification. Although many techniques have conditions under which data could theoretically be reidentified, the risk is smallest when data is properly anonymized.
• k-anonymity, l-diversity and t-closeness: These are three techniques developed to reduce the risk that the anonymity of a data set will be compromised by someone who combines it with known information to make assumptions about individuals in the data set. K-anonymity replaces direct identifiers with generalized, truncated or redacted quasi-identifiers such that a given minimum number (“k”) of individuals in a data set share the same combination of quasi-identifier values. L-diversity builds on k-anonymity by requiring at least “l” distinct values for sensitive attributes within each group of k records. T-closeness further extends l-diversity by requiring that the distribution of a sensitive attribute within each group be close to its distribution in the data set as a whole. Privacy technologists should be aware of the strengths and weaknesses of each technique, as increasing the protection of personal information using these techniques may reduce the usefulness of the resulting data set. (A brief sketch of k-anonymity follows below.)
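As a concrete illustration, here is a minimal Python sketch of a k-anonymity check: quasi-identifiers are generalized (ages bucketed into decades, ZIP codes truncated), and the data set passes only if every resulting group contains at least k records. The field names and records are hypothetical.

```python
# A minimal k-anonymity sketch: generalize quasi-identifiers, then verify
# that every combination of generalized values appears at least k times.
from collections import Counter

def generalize(record):
    """Replace precise quasi-identifiers with coarser values."""
    decade = (record["age"] // 10) * 10
    return (f"{decade}-{decade + 9}",        # age -> ten-year band
            record["zip"][:3] + "**")        # ZIP code -> first three digits

def is_k_anonymous(records, k):
    """True if every generalized quasi-identifier group has >= k members."""
    groups = Counter(generalize(r) for r in records)
    return all(count >= k for count in groups.values())

records = [
    {"age": 34, "zip": "60614", "diagnosis": "flu"},
    {"age": 36, "zip": "60611", "diagnosis": "asthma"},
    {"age": 38, "zip": "60657", "diagnosis": "flu"},
]
print(is_k_anonymous(records, k=3))  # True: all three share the group ("30-39", "606**")
```

Note how the generalization trades precision for protection: the diagnosis column is untouched, which is exactly the residual risk that l-diversity and t-closeness address.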
Aggregation
Another common technique for protecting personal information is data aggregation, where information is
expressed in a summary form that reduces the value and quality of data as well as the connection
between the data and the individual it belongs to. While using aggregation may reduce privacy concerns,
it is often still possible to determine individual values from aggregated statistics.
Consider these factors of aggregated data that serve to further protect an individual from being identified.
• Frequency versus magnitude data: When reviewing aggregate data, you must first determine if
the data is frequency data or magnitude data. The easiest way to distinguish these two types of
data is to determine whether individuals contribute equally or unequally to the value released. For
example, a count of the number of individuals at a given income and age is frequency data: each
individual contributes one to the cell he or she is in. A graph showing average income by age is
magnitude data: someone with a high income will affect the average much more than an individual
whose income is close to the average. For magnitude data, noise addition or entire suppression of
the cell is typically needed to ensure privacy; for frequency data, rounding techniques may well be
sufficient.
• Noise addition through differential privacy: When data is aggregated, personal identifiers are
removed from the data set being shared. However, it is still possible to reverse engineer the data
to discover the underlying identifiers that were used to create the aggregation (by using auxiliary
information, for example). One way to prevent reverse engineering is to “blur” the data points by
using noise addition through differential privacy. The goal is to ensure that the aggregated data is
still useful, while also making it nonspecific enough to avoid revealing the underlying identifiers.
This is done by using an algorithm to generate values that remain meaningful and yet are
nonspecific.
The following example illustrates how differential privacy might work.
An organization created a draft report showing the distribution of salaries among its employees.
The draft report showed a single individual with the highest salary. Based on this information, it
was easy to assume that data point belonged to the organization’s CEO.
Building off the example above, the single individual earning a particularly high salary could be
identified by reverse engineering the counts by salary. So instead of providing a count of 1 for the
high salary category, a number could be generated using an algorithm. For example: if the value is
queried 5 times, it might produce values of 1.225, 1.768, 3.167, 2.012, 2.988. The algorithm must
ensure that the true value cannot be derived after repeated querying.
• Differential identifiability: While the noise added under differential privacy reduces the risk that reverse engineering will result in privacy violations, differential privacy itself offers no clear guideline on how much noise to add before the quality of the aggregate value becomes poor. Differential identifiability builds on differential privacy by setting the parameters of the noise-generating algorithm according to an individual’s contribution to the data set, tying the noise level to the risk of identifying that individual. (A brief sketch of noise addition follows below.)
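Below is a minimal Python sketch of Laplace noise addition in the spirit of differential privacy, echoing the salary example above. The epsilon and sensitivity values are illustrative assumptions, and a real deployment would also track the cumulative privacy budget consumed by repeated queries.

```python
# A minimal sketch of Laplace noise addition: each query over a count
# receives fresh noise with scale sensitivity/epsilon.
import math
import random

def noisy_count(true_count, epsilon, sensitivity=1.0):
    """Return the true count plus Laplace noise (inverse-CDF sampling)."""
    u = random.random() - 0.5  # uniform on [-0.5, 0.5)
    noise = -(sensitivity / epsilon) * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_count + noise

# As in the salary example: querying the same cell (true count = 1) five
# times yields five different values, so the true value cannot simply be
# read off a single response.
for _ in range(5):
    print(round(noisy_count(1, epsilon=0.5), 3))
```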
Encryption
The fundamental technology of encryption is used to protect privacy in an increasingly digital world. Encryption scrambles information so that it can be read only by those authorized to access it. Encrypted information is far less likely to be compromised and is better protected when sent over the internet or stored on a laptop.
A cryptosystem is the entire collection of materials necessary to encrypt and decrypt information. This
includes algorithms and keys as well as the hardware, software, equipment and procedures needed to
encrypt, decrypt, transmit and manipulate information that is being protected. The strength of the entire
system is what determines the security of the encrypted data. The goal for the privacy technologist is to
use well-known and vetted algorithms where the security of the message lies with the key.
These terms are fundamental concepts of encryption.
• Keys: A key is a small piece of data that controls an algorithm’s execution and is required to
encrypt and decrypt a message.
• Asymmetric encryption: Asymmetric encryption uses one key for encryption and another key for
decryption. The sender uses a public key to encrypt the message, and the recipient uses a private
key to decrypt the message.
• Record encryption: In record encryption, records are encrypted one record at a time. This
provides enhanced protection because the protection is more granular; however, record encryption
may cause performance issues because encrypting and decrypting data can be time consuming.
• Field encryption: Field encryption provides the ability to encrypt specific fields of data; typically, fields that are considered sensitive, such as credit card numbers or health-related information. (A brief sketch follows this list.)
• Quantum encryption: Quantum encryption, also called quantum cryptography, uses the
principles of quantum mechanics to encrypt messages in a way that prevents anyone other than
the intended recipient from reading them. Quantum encryption is limited by its lack of practicality,
including transmission distance and key generation rate, as well as the need for technology that is
still in the early stages of development.
• Public-key infrastructure: Public-key infrastructure makes public-key cryptography workable by
providing tools for obtaining and verifying public keys that belong to individuals, web servers,
organizations and other entities that require digital identification.
• Homomorphic encryption: Homomorphic encryption allows encrypted information to be
manipulated without first being decrypted. Early homomorphic encryption was too slow to be of
practical use but is now fast enough to use with some applications that require high degrees of
privacy and security.
• Polymorphic encryption: In polymorphic encryption, the algorithm (the encryption/decryption
pair) is mutated with each copy of the code, while the outcome of the encryption remains the same
for any given key. The advantages of this type of encryption are that due to the frequent changes
the algorithm becomes more difficult to recognize over time, and it becomes harder to decrypt
because of the lack of an obvious relationship between the algorithm and the results.
• Mix networks: Mix networks, also known as onion routing networks, are a way to hide one’s
traffic within a crowd by combining traffic from multiple computers into a single channel that is
sent between multiple computers, and then separating the traffic out again.
• Secure multiparty computation: Secure multiparty computation is a class of algorithms that
allows programs running on different computers to participate in computations such that results
can be computed without compromising each party’s private data. Currently, multiparty
computation algorithms are faster than homomorphic encryption algorithms but harder to set up
because of the need to coordinate the computations.
• Private information retrieval: Private information retrieval (PIR) is a range of protocols through
which data can be retrieved from a database without revealing to the database or another observer
the information that is retrieved. PIR systems provide for data access but not necessarily for data
modification. Some PIR systems allow for the database to perform sophisticated operations, such
as searching for documents that match a particular keyword, or for retrieving encrypted documents
in sorted order, all without knowing the underlying data that is being operated upon.
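To make field encryption concrete, here is a minimal sketch using the symmetric Fernet interface from the Python cryptography package. The record fields are hypothetical, and a real system would keep the key in a key management service rather than generating it inline.

```python
# A minimal field-encryption sketch: only the sensitive field is encrypted;
# the rest of the record stays readable. Requires: pip install cryptography
from cryptography.fernet import Fernet

key = Fernet.generate_key()   # in practice, held in a key management system
cipher = Fernet(key)

record = {"name": "A. Jones", "card_number": "4111111111111111"}

# Encrypt only the sensitive field.
record["card_number"] = cipher.encrypt(record["card_number"].encode())

print(record["card_number"])                           # unreadable Fernet token
print(cipher.decrypt(record["card_number"]).decode())  # original value, given the key
```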
Nicholas Merker, CIPT, National Partner, Intellectual Property, Baker & McKenzie LLP, IAPP Faculty
Member
Alright, switching gears to encryption. So, this area… I’ve given this training a lot for the IAPP and this is
one area of confusion for a lot of folks who take this course. And so, what I’d like to do with this little
snippet is to describe what encryption is and, in particular, give the difference between symmetric and
asymmetric encryption. In my experience this is an area a lot of people trip up on.
So, at its very core, encryption is a combination of algorithms and keys. And the keys are put into the
algorithm in order to take plain text and turn it into encrypted text, or take encrypted text and turn it into
plain text. And, if you’re not familiar, plain text is something you can read, like, “The quick brown fox
jumped over the lazy dog.” And encrypted text is something you cannot read, where if you looked at it on
a screen I would call it gobbledygook—you don’t know what it says. And the translation from plain text to
encrypted text and back is done by using an encryption algorithm and putting a key into that algorithm.
There are many types of encryption algorithms. There’s AES, the Advanced Encryption Standard, which is kind of the norm now, but if you go way back there were tons and tons of different encryption algorithms in use. There were things like Blowfish, among many others. All these algorithms out there have kind of boiled down to just a few that have really won the day.
Another important concept in encryption is the difference between symmetric encryption and asymmetric
encryption. And what symmetric encryption is, it’s where you take plain text and you convert it to
encrypted text; you take encrypted text and you convert it to plain text, using the same key. There is only
one key in a symmetric encryption algorithm, and the example of this that I think most people would be
familiar with is called the Caesar cipher. You may also have heard it called ROT7 or ROT13, for the number of places each letter is rotated. Essentially, you take the letter A and you move it down the alphabet seven steps. You take the letter B, you move it down the alphabet seven steps. And that is your key: whatever the letter is, you just shift it seven places. And then it’s the same key going back. You take the encrypted text and you shift it seven places the other way.
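Here is a minimal Python sketch of the shift cipher Nick describes, with a shift of seven. The same key encrypts and decrypts, which is what makes it symmetric.

```python
# A minimal Caesar/ROT-style cipher: shift each letter by a fixed key.
def rot(text, shift):
    out = []
    for ch in text:
        if ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            out.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            out.append(ch)  # leave spaces and punctuation alone
    return "".join(out)

msg = "The quick brown fox jumps over the lazy dog"
enc = rot(msg, 7)    # encrypt: shift each letter seven places
print(enc)           # "Aol xbpjr iyvdu mve qbtwz vcly aol shgf kvn"
print(rot(enc, -7))  # decrypt: same key, shifted the other way
```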
One huge advantage to symmetric encryption is that it’s incredibly fast. That is a very, very small key.
Take this letter, shift it seven places. Very, very, very small key; very, very quick to do. One of the major
disadvantages of symmetric encryption is, “How do I get this key to you?” If you see a bunch of encrypted text sitting there and you don’t have the key, how am I going to get you that key? And if that key is intercepted, then someone can not only perform encryption with the key, they can perform decryption with the key. So that’s where asymmetric encryption comes in. Asymmetric encryption requires a key pair.
Most commonly it would be referred to as a public key and a private key. The two keys are in a mathematical relationship where anything encrypted by the public key can only be decrypted by the private key, and vice versa: if I encrypt something with a private key, it can only be decrypted with the public key.
So, this is commonly done where I would hold my private key near and dear to my heart, I’d give it to
absolutely no one. No one can ever see this private key but me. I take my public key and, as its name
implies, I put it out on the internet. I make it available to absolutely anyone. Then what happens is, let’s
say, you want to send me a message, an encrypted message. You would write out the message: “The
quick brown fox jumps over the lazy dog,” and you would encrypt it using my public key. So, you take this
plain text, you perform an encryption algorithm using the public key, and now you have the gobbledygook
encrypted text on the other end. You send that encrypted text to me, and because of the relationship with
these key pairs, only my private key, which I’m the only one who has access to, can decrypt that
message. So, I take the gobbledygook, I take my private key and decrypt it, and now I get, “The quick
brown fox jumps over the lazy dog.” That is asymmetric encryption.
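Here is a minimal sketch of that public/private key flow using RSA from the Python cryptography package. The key size and OAEP padding parameters are illustrative choices, not a hardening recommendation.

```python
# A minimal asymmetric-encryption sketch: anyone may encrypt with the
# public key; only the private-key holder can decrypt.
# Requires: pip install cryptography
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()   # safe to publish anywhere

oaep = padding.OAEP(
    mgf=padding.MGF1(algorithm=hashes.SHA256()),
    algorithm=hashes.SHA256(),
    label=None,
)

ciphertext = public_key.encrypt(b"The quick brown fox jumps over the lazy dog", oaep)
print(private_key.decrypt(ciphertext, oaep))  # only the private key recovers the message
```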
Asymmetric encryption, because it requires this public and private key pair, is slower. The keys are much,
much bigger, and the process is slower. But it solves the problem that we have with symmetric
encryption, where I can just put my public key out anywhere on the internet for anyone to consume, and
as long as I don’t lose that private key, then I am going to have a protected encryption process.
Asymmetric encryption is commonly used on the internet: SSL and TLS commonly use asymmetric encryption. Also, cryptocurrency digital wallets work with a private key and a public key. If you’ve heard of PKI, or Public Key Infrastructure, that is built on asymmetric encryption. Asymmetric encryption is used everywhere; the internet would not work without it.
Access management
Access management is an essential tool in enforcing privacy requirements regarding who is able to access
data. Access control lists can restrict the individuals, devices or services that may access a resource or set
of resources, and sophisticated access management techniques can restrict access to data based on the
type of data being accessed, the role of the person accessing the data, the location of the user, the time
of day, or the type of device being used to access the data. However, access management cannot ensure
that people with legitimate access to the data do the right things with the data once it’s in their
possession. Those who access the data must have appropriate training and accountability, as well as an
understanding of the legal basis for processing the data.
The idea of least privilege focuses on granting individuals and services the lowest possible access rights to
resources that still allows them to perform required duties. This practice minimizes the ability of the user
to access unnecessary resources or execute unneeded programs. Following a least-privilege regime can
minimize what information can be accessed by hackers or malware, since an attacker will be restricted to the data that the compromised user can access. However, care must be taken to avoid granting privileges too restrictively: if users must constantly ask for additional privileges, it may result in productivity challenges and frustration for both employees and technology administrators.
Both user-based and role-based access controls allow administrators to manage and control the access
rights of a set of individuals in the same way they can manage access controls for one person.
User-based access controls rely on the identity of the user to determine whether to grant or deny access
to a desired resource (for example, a file, directory or website), as well as the type of access to be given.
It is a good way to protect data but puts an extra burden on administrators because users must be added
and removed as their requirements change.
TIP: A common approach is a hybrid of role-based and user-based access controls. The role-based
controls are used for the “standard” access control settings, and the user-based controls are meant for
exceptions to the rule. This way, a company can quickly identify outliers during a privacy breach.
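A minimal Python sketch of that hybrid approach might look like the following; all role, user and permission names are hypothetical. Keeping the exceptions in their own structure is what makes outliers quick to identify during a breach review.

```python
# Hybrid access control sketch: role-based rules cover the standard cases;
# per-user exceptions are tracked separately so outliers are easy to audit.
ROLE_PERMISSIONS = {
    "support_rep": {"read:tickets"},
    "billing_admin": {"read:tickets", "read:invoices", "write:invoices"},
}

USER_EXCEPTIONS = {
    "jsmith": {"read:invoices"},  # a support rep temporarily granted invoice access
}

def is_allowed(user, role, permission):
    """Grant if the role allows it or an explicit per-user exception exists."""
    allowed = ROLE_PERMISSIONS.get(role, set()) | USER_EXCEPTIONS.get(user, set())
    return permission in allowed

print(is_allowed("jsmith", "support_rep", "read:invoices"))  # True, via exception
print(is_allowed("adoe", "support_rep", "read:invoices"))    # False
```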
Control over the access to resources on a network is based on the context in which the employee is
connected to the network. The broader the context of authority, the more challenging it will be to manage
the privacy of resources (more data, more privacy policies, and more interactions).
The ability to provide cross-enterprise authentication and authorization (also known as single sign-on, or
SSO) is a powerful tool that can streamline access to resources. With single sign-on, users must
remember only one ID and password that can be used across multiple sites, applications and services. If
the password is ever compromised, the user can go to the identity service and reset his or her password
for all sites in one location. However, single sign-on is not without risk. Because a user can be tracked
across each site where their credentials are used, users should be asked to reauthenticate before being granted access to sensitive resources.
In a federated identity model, a person’s identity is authenticated in a trusted centralized service. All other
services that require knowledge of the person’s identity refer to the trusted centralized service to acquire
the person’s identity. This is often done using tokens that are generated by the trusted service and passed
to the providing service by the user. This reduces the exposure of personal information and the intrusion
of malicious authenticators. A common example of this is single sign-on (SSO).
Remote access, telecommuting and allowing employees to access enterprise resources on their personal
devices, commonly known as “bring your own device” or BYOD, can provide benefits to both employees
and employers, but they do pose security and privacy risks which should be considered. These risks may
include the potential exposure of employee personal information, greater exposure to malware, potential
device theft, and the use of incorrectly configured devices. There is also a risk that individuals who access
the network remotely cannot necessarily be verified as employees. To reduce these risks, administrators
should allow only approved personal devices, provide notice to and obtain consent from users, limit data
transfers and types of access, mandate device controls, and limit social access. To reduce the risk of
unauthorized access, administrators should limit network access, require manual authentication, and use
multifactor authentication. Other mitigations to consider include the use of virtual private networks,
demilitarized zone networks, and firewalls.
Perspectives: Why shouldn’t employees have more access than they need?
Janella Hsia, CIPP/E, CIPP/US, CIPM, CIPT, Principal, Privacy Swan, IAPP Faculty Member
So when we set up an access and identity management program, usually we have to take away
privileges from people, because generally people have access to systems they’re not supposed to have
access to, or they have the wrong permissions in systems. And that’s sometimes a difficult conversation,
when you have to say to people, “We’re going to take away the access that you’ve had.” And so, when we
do that, I try to explain to people that the reason that we’re doing this is, it’s actually a benefit to them.
We’re actually making sure that we’re reducing the temptation, sometimes, to look at data that they
shouldn’t be looking at, or we’re preventing them from changing data that they maybe shouldn’t be
changing. So, by reducing the access that they have, it’s actually a benefit to the person and a benefit to
the organization. But some people can feel like it’s a demotion or that something’s being taken away. So,
you have to make sure that, as you’re doing this, that you explain the benefit for having less access to
information.
A long time ago, I was responsible for data centers, and I actually did an audit, which you should do on all of your systems: you should audit who has access to the systems. And when I did this
audit for this data center, everybody in the building had access to the data center, but that wasn’t needed.
Only the technicians who needed to get into the data center should have had access to it. And so I
had to take access away from, you know, 90% of the people.
The hardest people to take that access away from, sometimes, are the C-level executives. And they really had a hard time with that. And so, I explained to them that this was actually a good thing: if their credentials got compromised, or if their badge got lost or stolen, we wouldn’t have to rekey or reset everything, because their access wouldn’t be so high. If we were giving them the right access at the right level, then we had less worry over their credentials. And when we think about things like phishing
and credential compromise, which happens all of the time, that’s really a good reason for people to
understand why they shouldn’t have more access than they actually need to do their job.
Authentication
Authentication ensures that the right individuals are accessing the right resources. It can be accomplished using a variety of mechanisms that generally fall into four categories:
• Something you know (for example, a password or PIN)
• Something you have (for example, an RFID card, token or cell phone)
• Something you are (for example, a fingerprint or other biometric)
• Something you do (for example, a behavioral pattern such as typing rhythm)
Authentication may require a single factor, using only one of the aforementioned categories (for example, a password, a PIN, an RFID card or a fingerprint); or multiple factors, such as requiring an individual to scan an ATM card plus enter the correct PIN, or to enter an ID and password plus a code that is sent to the user’s cell phone.
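Here is a minimal Python sketch of a two-factor check combining something you know (a password verified against a salted hash) with something you have (a time-based one-time code, per RFC 6238, of the kind an authenticator app generates). All secrets and parameter values are hypothetical.

```python
# Two-factor sketch: factor 1 is a password checked against a PBKDF2 hash;
# factor 2 is a time-based one-time code derived from a shared secret.
import base64, hashlib, hmac, struct, time

def totp(secret_b32, step=30, digits=6):
    """Current time-based one-time code for a base32 shared secret (RFC 6238)."""
    key = base64.b32decode(secret_b32)
    counter = struct.pack(">Q", int(time.time() // step))
    mac = hmac.new(key, counter, hashlib.sha1).digest()
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return f"{code % 10 ** digits:0{digits}d}"

def login(password, code, *, salt, pw_hash, secret_b32):
    knows = hmac.compare_digest(
        hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000), pw_hash)
    has = hmac.compare_digest(code, totp(secret_b32))
    return knows and has  # both factors must succeed

salt = b"per-user-random-salt"   # in practice, random per user
pw_hash = hashlib.pbkdf2_hmac("sha256", b"hunter2", salt, 100_000)
secret = "JBSWY3DPEHPK3PXP"      # hypothetical authenticator-app secret
print(login("hunter2", totp(secret), salt=salt, pw_hash=pw_hash, secret_b32=secret))  # True
```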
• Digital Rights Management: DRM is used to ensure that digital content is only delivered to those who are authorized to receive it. It can also limit what assigned users can do with the content. For example, a person may be permitted to read a document, but not allowed to print it, email it to others, copy content from it or modify it.
Summary
• Identity is the link between a piece of information and the individual or individuals associated with
that data; it captures what we know about who that individual is.
• Labels are characteristics that point towards an individual and can be precise (e.g., an individual’s
name) or imprecise (e.g., an individual’s attribute).
• Identifiers can be strong (e.g., a passport number) or weak (e.g., a postal zip code). However,
there are exceptions. An individual labeled as being 105 years old would narrow down the
possibilities as to who that individual is.
• Identifiability is the extent to which a person can be identified and high identifiability in conjunction
with certain actions may put an individual at a higher risk for identity theft.
Review
1.
A token
A national ID number
A corporate ID number
Single sign-on credentials
2.
Deidentification
Anonymization
Algorithm
Label
3.
Key
Label
Token
Pseudo-identifier
4. What type of encryption uses one key for encryption and another key for decryption?
Field
Symmetric
Application
Asymmetric
Process-oriented strategies
There are many strategies for countering possible privacy violations. Some of these strategies are based
on data and others are based on process. Process-oriented strategies often focus on four main areas: (1)
enforcing policies and processes, (2) demonstrating compliance, (3) informing the individual, and (4)
providing user control.
Organizations must commit to processing personal information in a privacy-friendly way and ensuring
these commitments are honored. Policies, such as when to use encryption, should be routinely analyzed
and applied at the appropriate level. In the privacy-by-design model this would generally be done at the
implementation phase; however, some processes and policies are context-specific and should be
developed as such. For example, if the collection of sensitive personal information is necessary for an
organization’s operations, legal and compliance requirements must be addressed, and employees must be
properly trained. From the perspective of a privacy technologist, policy and process enforcement may
include the use of encryption, the application of security measures so that access to the information is
limited, and the use of automated destruction techniques once the information is no longer needed.
The following steps may be taken to create, maintain and uphold organizational privacy policies and processes:
• Create: Organizations should create internal privacy policies that best describe how the organization wishes to manage—and plans to protect—personal information; this includes ensuring that company employees are adhering to these policies. Processes should include specific strategies and tactics for data protection, such as hash functions, truncation, tokenization or cryptography (a brief sketch of these tactics follows this list), as well as contextual policies that explain the reason or purpose for privacy practices.
• Maintain: Organizations should maintain established policies and processes to ensure consistency
of privacy practices throughout the organization. An example of an inconsistent policy would be if
an organization had very strong controls in place to protect customer credit card numbers but did
not have established policies for protecting their own employees’ data. Another example would be
if the organization had a strong policy regarding protecting customer credit card data for one
stakeholder, but less stringent policies for a different stakeholder.
• Uphold: Organizations should uphold privacy and data protection policies as guiding principles
across the organization, treating personal information as an asset and privacy as a primary goal.
Upholding privacy and data protection policies—through employee training, process auditing, and
policy compliance enforcement—helps to demonstrate that the organization values, and is
dedicated to protecting, personal information in all forms and contexts.
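Here is a minimal Python sketch of three of the data protection tactics named in the Create step above: hashing, truncation and tokenization. The values are hypothetical, and a production tokenizer would persist its token vault in a separate, access-controlled store.

```python
# Hashing (one-way), truncation (keep only what's needed) and tokenization
# (reversible substitution via a separately guarded vault).
import hashlib
import secrets

card = "4111111111111111"

hashed = hashlib.sha256(card.encode()).hexdigest()  # match without storing the raw value
truncated = "*" * 12 + card[-4:]                    # keep only the last four digits

vault = {}                       # in production: a separate, secured token vault
token = secrets.token_hex(8)
vault[token] = card              # tokenization is reversible by design

print(hashed[:16] + "...", truncated, token)
print(vault[token] == card)      # detokenization recovers the original
```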
Demonstrating compliance
An organization can demonstrate to regulators that it is processing personal information according to its
established privacy policies and procedures by following these key steps.
• Log: Track all processing of data and review the information for anything that may present a risk.
Any deviations from standard processing procedures, whether due to design, chance, or malicious
actions, should be logged. Logs should be periodically reviewed so that post-activity sanctions,
process changes or technology changes can be imposed.
• Audit: Perform audits regularly to ensure that logging and organizational activities are following
established processes. Auditing provides visibility and an understanding of risks and ensures that
both formal and informal processes are identified, managed and followed.
• Report: Periodically collect information on tests, audits, and logs, and report feedback to those
personnel who are responsible for policy and process implementation within the organization. This
allows organizations to look at their privacy activities holistically and to use that information to
improve privacy practices and processes.
An organization informs individuals about the processing of their personal information by providing a
privacy notice and being transparent about how their information will be collected, used and shared.
The following should be considered when informing individuals about organizational privacy practices.
• Supply: Users should be informed about what personal information is being processed, as well as
organizational policies and procedures for processing, and potential risks. However, communication
about the collection, use and disclosure of personal data can fail due to vague, incomplete or
misinterpreted disclosures, or by overwhelming the recipient with information. Some individuals
also may not fully appreciate the ramifications of what they are being told or agreeing to. Privacy
technologists should consider how best to communicate privacy practices to individuals in a clear, comprehensible and concise way.
• Notify: Individuals should be notified by the organization if the personal information they provided
has been breached or if the organization wishes to use the information in a manner inconsistent
with the original disclosure or consent. Some countries or jurisdictions have regulations that
prohibit the use of data for a purpose or in a manner that was not originally disclosed, although
reuse may be allowed if the organization makes the individual aware of the new use and allows the
user to either provide updated consent or withdraw consent for the new use. If personal
information is exposed as the result of a breach, individuals should also be notified so they may
take corrective or remedial action if needed.
• Explain: Privacy notices should provide information in a concise and understandable form and
clearly explain why the processing is necessary. This concept is so important that the EU included a
requirement in the GDPR that privacy statements provided to data subjects should be “concise,
transparent, intelligible and [in an] easily accessible form, using clear and plain language.” (Article
12, paragraph 1).
User control
Applying the strategy of giving individuals control over their personal information allows for a more
balanced relationship between the individual and an organization. The following measures can be used to
provide individuals with control.
• Consent: The organization processes only personal information that has been freely given based
on explicit and informed consent.
• Choice: The organization allows the individual to select or exclude the personal information that
can be processed.
• Amend: The organization gives individuals the means to keep their personal information accurate
and up to date.
• Delete: The organization honors the individual’s right to have any personal information removed in
a timely manner, if requested.
Summary
• Process-oriented strategies often focus on four main areas: (1) enforcing policies and
processes, (2) demonstrating compliance, (3) informing the individual, and (4) providing user
control.
• Organizations must commit to processing personal information in a privacy-friendly way and
ensuring these commitments are honored. Policies should be routinely analyzed and applied at
the appropriate level.
• An organization can demonstrate compliance by following these steps: log (tracking all data
processing and reviewing information for risk factors); audit (performing regular audits to ensure procedures are being followed correctly); and report (collecting information and providing feedback to those individuals responsible for implementing the organization’s policies and processes).
• An organization informs individuals about the processing of their personal information by
providing a privacy notice and being transparent about how their information will be collected,
used and shared.
• Users should be:
o Informed about what information is being processed and potential risks.
o Notified if their information has been breached or used in a non-disclosed way.
o Provided privacy notices in a concise form to ensure understanding.
• Individuals should be given control over their personal information through the use of: consent,
choice, amending and deleting.
Review
1. Which of the following is NOT one of the four main areas process-oriented strategies focus on?
Demonstrating compliance
Providing user control
Encrypting data
Enforcing policies
Data-oriented strategies
Separating data
In addition to process-oriented data protection strategies, there are also strategies that are more data-
oriented. Some possible data-oriented strategies include separating the data, minimizing the data,
abstracting the data and hiding the data.
When data is collected on an individual, it is often shared with external organizations, such as third-party
vendors. To minimize the risk of a threat actor using multiple sources of data to correlate information
about a particular individual, data may be separated through distribution or isolation.
Distributing data
One method of separation is to distribute the collected information by either logically or physically
segregating it. For example, when a new employee starts at a company, some information, such as
accommodations for disabilities, may go to the human resources department, while salary information
would be directed to payroll. Logical separation may mean placing restricted access on sensitive data that
only allows certain users to access it. For example, a database architect might separate data at the
database layer, rather than at the application layer, in order to prevent programming or configuration
errors from allowing a threat actor to combine data. Physical separation might be accomplished by placing
different data sets on physically distinct servers.
Isolating data
Isolating data places it in a dedicated environment, separated from other network traffic, so that access to one system does not grant access to the data. A commercial entity, for example, may use a specific program exclusively for purchases, or a customer might be asked to create an account in a billing portal that exists separately from the rest of the network.
Minimizing data
Minimization involves limiting the amount of personal information that needs to be processed. Ideally, this
should be done at the time of collection by avoiding or preventing the collection of unnecessary
information, but the concept should extend throughout the information life cycle.
There are several steps which should be considered when an organization wishes to minimize the amount
of data being collected and processed.
• Exclude unnecessary data: Information that is not critical to the purpose of collection should be
excluded from collection at the start. Process designers should examine the privacy implications of
the information being collected to determine if collection is truly necessary, or if a design change
could minimize the amount of information collected.
• Select what data will be processed: Selection is similar to exclusion; an organization or
commercial entity can decide, case by case, to process only relevant personal information. For
example, if an individual is placing an order online for in-store pick up rather than delivery, the
individual should not have to provide their home address.
• Strip unnecessary data: Stripping allows for removal of unnecessary information for further
processing or distribution. For example, a service or company may have your name and address on
file but might not need to keep a credit card on file. Online commercial entities can have attributes
built into an address verification system that strips out all data, with the exception of street number and postal code.
• Destroy data when it is no longer needed: Once any personal information is no longer needed,
organizations should plan to destroy it or remove it completely from a system. There are three
times when an organization typically should destroy data:
1. When the data was inadvertently collected.
2. When the data is no longer needed for the purpose for which it was collected.
3. When an individual requests to have their data deleted.
The last two examples may require additional steps for data destruction because once information has
been processed, it may be backed up into the cloud, shared with other systems, or moved into disaster
recovery systems. Setting retention periods and processes prior to collection will help expedite the
destruction process and eliminate stockpiling of unnecessary data.
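Here is a minimal Python sketch of minimization by selection and stripping, following the online-order example above; the field names and purposes are hypothetical.

```python
# Minimization sketch: collect only fields relevant to the stated purpose,
# and strip fields that are not needed before further processing.
ALLOWED_FIELDS = {
    "store_pickup": {"name", "email", "payment"},         # no home address needed
    "delivery": {"name", "email", "payment", "address"},
}

def collect(order, purpose):
    """Select: keep only fields relevant to the purpose at collection time."""
    return {k: v for k, v in order.items() if k in ALLOWED_FIELDS[purpose]}

def strip_for_analytics(record):
    """Strip: drop payment details before passing the record downstream."""
    return {k: v for k, v in record.items() if k != "payment"}

order = {"name": "A. Jones", "email": "aj@example.com",
         "address": "12 Elm St", "payment": "4111111111111111"}
kept = collect(order, "store_pickup")  # the address never enters the system
print(strip_for_analytics(kept))       # {'name': 'A. Jones', 'email': 'aj@example.com'}
```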
Abstracting data
Abstraction limits the amount of detail in which personal information is processed. Reducing the precision
of data, while retaining the accuracy and suitability for its purpose, may yield the same desired results for
an organization collecting personal information.
For example, when creating an account online, users typically need to be over a certain age, and yet it
may not be necessary to know the individual’s exact date and time of birth. This information could be
limited to month and year, or a simpler question could be asked, such as, “Are you 14 years or older?”
Grouping aggregates data into correlated sets rather than processing it individually. Algorithms are
sometimes used to “crowdsource” connections by grouping individuals based on previous purchases. For
example, an online retail company might use an algorithm to aggregate purchasing data and suggest
additional purchases to shoppers (perhaps noting that individuals who bought hammers also bought
nails).
Summarizing puts detailed information into categories based on more abstract attributes. While grouping
information is about correlations, summarizing separates out a data element about individuals from
correlated groups. Using the previous example, the information collected on customers’ purchasing habits
could be summarized by their ages. If that business wanted to offer a gift or a special on a customer’s
birthday, that information would need to be further summarized by birth month and day.
Perturbing adds approximation or “noise” to data to reduce its specificity. For example, if technology
architects are developing a wayfinding app that allows a user to alert other drivers to potential road
hazards, the architects need to be cognizant of other information that could be transmitted (such as the
driver’s name, speed, or current location), and determine if conveying that information is necessary. They
might introduce “noise” by establishing a slight delay or combining similar reports from multiple drivers.
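Here is a minimal Python sketch of two abstraction tactics described above: summarizing a precise birth date into the single fact a sign-up flow needs, and perturbing a location by rounding away precision. The age threshold and rounding granularity are illustrative.

```python
# Abstraction sketch: keep only the answer needed ("14 or older?") and
# coarsen coordinates before sharing them.
from datetime import date

def is_14_or_older(birth_date, today):
    """Abstract: retain only the yes/no fact the sign-up flow needs."""
    years = today.year - birth_date.year - (
        (today.month, today.day) < (birth_date.month, birth_date.day))
    return years >= 14

def coarsen_location(lat, lon, places=2):
    """Perturb: round coordinates (roughly 1 km at 2 decimal places)."""
    return round(lat, places), round(lon, places)

print(is_14_or_older(date(2005, 6, 1), today=date(2024, 1, 1)))  # True (18 years old)
print(coarsen_location(41.889923, -87.624421))                   # (41.89, -87.62)
```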
Hiding data
Information can be protected by implementing a hiding strategy in the following ways.
• Restrict: Restricting prevents unauthorized access, for example, by requiring log-in credentials or
an encryption/decryption key before allowing access to specific data or functionality. Access
authorization may be based on a user’s role (for example, customer service representative) or a
given attribute (perhaps age or group membership) or a given context (such as a user who was
redirected from an authorized website), or it may be specific to an individual (for example, the user
can access only their own data).
• Mix: Mixing processes personal information randomly within a large group to reduce correlation. For example, if a cable television company wants to determine viewership for each channel it carries, it does not need to know which customers are watching which channels. If this information were sorted by customer, viewing patterns might become clear, which would reveal unnecessary information about that customer (perhaps that they had children or had a specific political affiliation); however, mixing the data would provide the needed information without revealing personal details that could be traced back to a specific individual or household.
• Obfuscate: Obfuscation obstructs the ability to read or understand personal information. This is
most commonly done with encryption or hashing but could also be done simply by using a code or
little-known language.
• Dissociate: Dissociation removes the correlation between subjects and their personal information.
An example might be a restaurant that offers food delivery. Initially, the restaurant needs to know
which customer ordered what food to complete the delivery; however, once the delivery is made,
the correlation could be dissociated if all the restaurant needed to know going forward was how
much of certain foods were ordered over the course of a month across all its customers to plan
food orders for the next month.
• Masking: Data obfuscation, also known as data masking, takes real data as its starting point and applies various kinds of manipulation to reduce the risk represented by the original data while preserving desired properties. Transformation techniques include removal of particular fields, suppression of particular values, shuffling of values across records, encryption or hashing of values, and generalization of values to render the data less precise. Most data masking techniques are irreversible in order to protect the original data; when data masking is reversible, the technique being deployed is usually actually tokenization. Dynamic masking is a form of masking performed in real time when a query is executed, usually against a database. If, for example, a non-privileged user requests a sensitive data field from a database, the database may strip out portions of the sensitive data to preserve its confidentiality while still rendering the data usable to the user. (A brief sketch follows this list.)
Summary
• To minimize the risk of a threat actor using multiple sources of data to correlate information about
a particular individual, data may be separated through distribution (logically or physically
segregating it) or isolation.
• Minimizing data limits the amount of personal information that needs processing. This can include:
excluding unnecessary data, selecting what data will be processed, stripping unnecessary data or
destroying data when it’s no longer needed.
• Abstraction limits the amount of detail in which personal information is processed and can be
achieved through grouping, summarizing or perturbing data.
• Hiding data protects personal information by making it unobservable to others.
Review
1. Which of the following are ways in which an organization can lessen the risk to personal data? Select
all that apply.
Minimization
Retention
Abstraction
Aggregation
Hiding
Review answers
Strategies and techniques for protecting privacy
1. Single sign-on credentials
2. Deidentification
3. Key
4. Asymmetric
Process-oriented strategies
1. Encrypting data
Data-oriented strategies
*Quiz questions are intended to help reinforce key topics covered in the module. They are not meant to
represent actual certification exam questions.