Module 4 Flashcards

Technical Measures and Privacy-Enhancing Technologies

1
Q

Identity

A

The link between a piece of information and the individual or individuals associated with the data; it captures what we know about who that individual is. In the language of data, identities are codes or strings used to represent an individual, device or browser. The more precise the identifier, the stronger it is. Strong identifiers are typically numbers (SSN, credit card numbers, etc.), while weak identifiers tend to be more general and may belong to more than one individual (zip code, area code, etc.). Note that an identifier's strength can also be affected by context.

2
Q

Quasi-identifiers

A

Data that can be combined with external knowledge, such as publicly available information, to identify an individual.

3
Q

Deidentification

A

A technique used to prevent an individual’s identity from being connected to their personal information.

4
Q

Pseudonymization

A

Replacing individual identifiers with numbers, letters, symbols or a combination of these, such that data points are not directly associated with a specific individual. Note: as long as the mapping between each original identifier and its pseudonym is documented, the original data can be restored.
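
A minimal sketch of the idea, assuming a simple in-memory lookup table. The function names, the `P0001`-style pseudonym format and the sample records are hypothetical and chosen only for illustration:

```python
from itertools import count

def pseudonymize(records, id_field):
    """Replace each identifier with a pseudonym and return the lookup table."""
    counter = count(1)
    mapping = {}  # original identifier -> pseudonym (must be stored securely)
    output = []
    for record in records:
        original = record[id_field]
        if original not in mapping:
            mapping[original] = f"P{next(counter):04d}"
        # Copy the record, swapping the identifier for its pseudonym.
        output.append({**record, id_field: mapping[original]})
    return output, mapping

def restore(records, id_field, mapping):
    """Reverse the pseudonymization using the documented mapping."""
    reverse = {pseudonym: original for original, pseudonym in mapping.items()}
    return [{**r, id_field: reverse[r[id_field]]} for r in records]

patients = [
    {"name": "Alice Smith", "diagnosis": "asthma"},
    {"name": "Bob Jones", "diagnosis": "diabetes"},
    {"name": "Alice Smith", "diagnosis": "flu"},
]
pseudonymized, table = pseudonymize(patients, "name")
```

Whoever holds `table` can call `restore` to get the original records back, which is why the mapping has to be protected as carefully as the data itself.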

5
Q

Anonymization

A

This method completely removes or alters personal information so that it’s impossible (or extremely difficult) to trace the data back to an individual. There’s no way to reverse it or find out who the data belongs to, even if you have more information.

Example: If a research study deletes all identifying information about participants (like names, addresses, or unique IDs) and generalizes data (e.g., “a 35-year-old male” instead of specific details), then even the researchers cannot figure out who the participants are.

6
Q

Tokens

A

Tokenization is a system of deidentifying data that uses random tokens as stand-ins for meaningful data.

7
Q

K-anonymity

A

It’s built on the idea that by combining sets of data with similar attributes, identifying information about any one of the individuals contributing to that data can be obscured. k-Anonymization is often referred to as the power of “hiding in the crowd.” Individuals’ data is pooled in a larger group, meaning information in the group could correspond to any single member, thus masking the identity of the individual or individuals in question.
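
As a rough sketch, the "crowd size" can be measured by grouping records on their quasi-identifiers and taking the size of the smallest group. The function name, field names and sample records below are assumptions made for illustration:

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Return the size of the smallest group sharing the same quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())

records = [
    {"age_range": "30-39", "zip_prefix": "021", "condition": "flu"},
    {"age_range": "30-39", "zip_prefix": "021", "condition": "asthma"},
    {"age_range": "30-39", "zip_prefix": "021", "condition": "flu"},
    {"age_range": "40-49", "zip_prefix": "100", "condition": "diabetes"},
    {"age_range": "40-49", "zip_prefix": "100", "condition": "flu"},
]
k = k_anonymity(records, ["age_range", "zip_prefix"])
```

Here `k` is 2: every combination of age range and zip prefix is shared by at least two people, so no record stands alone.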

8
Q

l-diversity

A

An extension of k-anonymity. In addition to pooling individuals into groups that share the same quasi-identifiers, it requires each group to contain at least l distinct values of the sensitive attribute. This prevents an observer from learning the sensitive value simply because everyone in the "crowd" shares it.
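
l-diversity builds on "hiding in the crowd" by also asking how varied the sensitive values are inside each crowd. A minimal sketch, with a hypothetical function name and made-up records:

```python
from collections import defaultdict

def l_diversity(records, quasi_identifiers, sensitive):
    """Return the smallest number of distinct sensitive values found in any group."""
    groups = defaultdict(set)
    for r in records:
        key = tuple(r[q] for q in quasi_identifiers)
        groups[key].add(r[sensitive])
    return min(len(values) for values in groups.values())

records = [
    {"age_range": "30-39", "condition": "flu"},
    {"age_range": "30-39", "condition": "asthma"},
    {"age_range": "40-49", "condition": "diabetes"},
    {"age_range": "40-49", "condition": "diabetes"},
]
l = l_diversity(records, ["age_range"], "condition")
```

Every group here has two members, yet `l` is 1 because everyone aged 40-49 has the same condition. Knowing that someone is in that group reveals their diagnosis even though they are not singled out.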

9
Q

t-closeness

A

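A property of a dataset and a further extension of k-anonymity and l-diversity. It requires the distribution of sensitive values within each group to be close to their distribution in the dataset as a whole, with the distance between the two not exceeding a threshold t.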

10
Q

Aggregation

A

Information is expressed in a summary form that reduces the value and quality of data as well as the connection between the data and the individual it belongs to.

11
Q

Frequency versus magnitude data

A

When reviewing aggregate data, you must first determine whether the data is frequency data or magnitude data. Frequency Data: This tells you how often something happens or how many times an event occurs. It is simply a count. For example, imagine you're looking at data from a school. If you want to know how many students got an "A" grade in math, frequency data would show that 30 students received an "A." It just counts how many times the grade "A" appeared.

Magnitude Data: This measures how large or intense something is. It tells you about the size, amount, or level of something, not just how many times it happens. For example, if you're looking at the total sales for a store, magnitude data would show how much money was made (e.g., $50,000 in sales last month). It's about the total value, not how many transactions occurred.
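
The distinction can be sketched with a few lines of code. The grades and sale amounts below are made-up sample values:

```python
from collections import Counter

grades = ["A", "B", "A", "C", "A"]
sales = [19.99, 5.00, 120.50]

# Frequency data: how many times each value occurs.
grade_frequency = Counter(grades)
transaction_count = len(sales)

# Magnitude data: how large the underlying quantity is.
total_sales = sum(sales)
```

`grade_frequency["A"]` and `transaction_count` are counts, while `total_sales` is an amount. The same list of sales yields both kinds of aggregate depending on the question asked.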

12
Q

Noise addition through differential privacy

A

When data is aggregated, personal identifiers are removed from the data set being shared. However, it is still possible to reverse engineer the data to discover the underlying identifiers that were used to create the aggregation (by using auxiliary information, for example). One way to prevent reverse engineering is to “blur” the data points by using noise addition through differential privacy. The goal is to ensure that the aggregated data is still useful, while also making it nonspecific enough to avoid revealing the underlying identifiers. This is done by using an algorithm to generate values that remain meaningful and yet are nonspecific.
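
The card does not name a specific algorithm. One widely used choice is the Laplace mechanism, sketched below as an assumption. The privacy parameter `epsilon`, the `sensitivity` default and the function names are illustrative, and Python's `random` module is used for simplicity even though it is not suitable for production privacy systems:

```python
import random

def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def noisy_count(true_count, epsilon, rng, sensitivity=1.0):
    """Release a count with noise scaled to sensitivity / epsilon."""
    return true_count + laplace_noise(sensitivity / epsilon, rng)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
released = noisy_count(30, epsilon=0.5, rng=rng)
```

A smaller `epsilon` means more noise and stronger privacy; a larger one keeps the released value closer to the true count. Averaged over many releases the noise cancels out, which is why the aggregate stays useful.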

13
Q

Differential Identifiability

A

While the algorithm used in differential privacy ensures that reverse engineering does not result in privacy violations, there is no clear guideline on how much noise to add before the quality of the aggregate value becomes poor. Differential identifiability improves on differential privacy by setting parameters (based on the individual identification’s contribution) for the algorithm to generate noise.

14
Q

Encryption

A

The scrambling of information so that it can be read only by someone with authorized access, typically the holder of the correct key.

15
Q

Algorithms

A

Mathematical applications applied to a block of data.

16
Q

Keys

A

A small piece of data that controls an algorithm's execution and is required to encrypt and decrypt a message.

17
Q

Symmetric encryption

A

A way of keeping information secret by using a special code, called a key, to lock (encrypt) and unlock (decrypt) the information. Think of it like a padlock where the same key is used to lock and unlock it. The advantage of this type of encryption is that it is fast and efficient compared to asymmetric encryption.
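
A toy sketch of the "same key locks and unlocks" idea, using a one-time-pad style XOR. This is an illustration only and an assumption of this example; real systems use vetted ciphers such as AES from a maintained library:

```python
import secrets

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each data byte with the matching key byte (the same call encrypts and decrypts)."""
    if len(key) < len(data):
        raise ValueError("key must be at least as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))

message = b"meet at noon"
key = secrets.token_bytes(len(message))  # the single shared secret key

ciphertext = xor_bytes(message, key)     # lock
recovered = xor_bytes(ciphertext, key)   # unlock with the same key
```

Both sides need the same `key`, so the practical challenge with symmetric encryption is getting that key to the other party safely.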

18
Q

Asymmetric encryption

A

A way of keeping information secure by using two different keys: one for locking (encrypting) the information and another for unlocking (decrypting) it. Think of it like a mailbox: anyone can put a letter in (encrypt), but only the person with the key can open the mailbox and read the letter (decrypt). How it works:

Public Key (Locking/Encrypting): You have a key that you share with everyone. This key is used by others to lock up information they want to send you. It’s like the open slot of a mailbox where anyone can drop a letter in.

Private Key (Unlocking/Decrypting): You have another key that you keep secret and don’t share with anyone. This key is used to unlock the information and read it. It’s like the key that opens the mailbox, allowing you to take out and read the letters.

The advantage is that the private key never has to be shared, which makes it highly secure; the trade-off is that it is slower than symmetric encryption.
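
The public/private split can be sketched with textbook RSA and deliberately tiny primes. This is a toy example under stated assumptions: the numbers and function names are for illustration, and real deployments use large keys, padding and a vetted library:

```python
# Toy RSA with tiny primes: for illustration only, never for real use.
p, q = 61, 53
n = p * q                  # 3233, shared by both keys
phi = (p - 1) * (q - 1)    # 3120
e = 17                     # public exponent
d = pow(e, -1, phi)        # private exponent (modular inverse, Python 3.8+)

public_key = (e, n)        # shared with everyone (the mailbox slot)
private_key = (d, n)       # kept secret (the mailbox key)

def encrypt(message: int, key) -> int:
    exponent, modulus = key
    return pow(message, exponent, modulus)

def decrypt(ciphertext: int, key) -> int:
    exponent, modulus = key
    return pow(ciphertext, exponent, modulus)

message = 65
ciphertext = encrypt(message, public_key)    # anyone can do this
recovered = decrypt(ciphertext, private_key)  # only the key holder can do this
```

Anyone holding `public_key` can produce `ciphertext`, but turning it back into `message` requires `private_key`.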

19
Q

Application encryption

A

File-level or document-based encryption that is built into a program and applied throughout it.

20
Q

Record encryption

A

Records are encrypted one record at a time. Provides enhanced protection, but is time consuming and may cause performance issues.

21
Q

Field encryption

A

Ability to encrypt specific fields of data (credit card number, for example).

22
Q

Quantum Encryption

A

Uses principles of quantum mechanics to encrypt messages in a way that prevents anyone other than the intended recipient from reading them.

23
Q

Public Key Infrastructure

A

Makes public-key cryptography work by providing tools for obtaining and verifying public keys that belong to organizations, individuals, web servers and other entities.

24
Q

Homomorphic Encryption

A

Special type of encryption that allows data to be processed and analyzed without ever being decrypted. In other words, it lets you work with encrypted data as if it were still in its original, readable form, but without ever exposing the actual data.
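
A small sketch of computing on encrypted values, using the fact that textbook RSA happens to be multiplicatively homomorphic. This is an assumption chosen for brevity: the tiny key is insecure, and practical homomorphic encryption schemes are far more involved:

```python
# Toy RSA key (tiny primes, illustration only).
p, q = 61, 53
n = p * q
e = 17
d = pow(e, -1, (p - 1) * (q - 1))  # modular inverse, Python 3.8+

def encrypt(m: int) -> int:
    return pow(m, e, n)

def decrypt(c: int) -> int:
    return pow(c, d, n)

a, b = 7, 6
encrypted_a = encrypt(a)
encrypted_b = encrypt(b)

# Multiply the ciphertexts without decrypting either one.
encrypted_product = (encrypted_a * encrypted_b) % n

# Only the key holder decrypts, and sees the product of the plaintexts.
result = decrypt(encrypted_product)
```

The party doing the multiplication never sees 7 or 6; only the holder of `d` learns that the result is 42. The trick works here only while the product stays below `n`.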

25
Q

Polymorphic encryption

A

Polymorphic encryption is a type of encryption that changes, or “morphs,” every time it’s used, even if you’re encrypting the same data over and over. This makes it much harder for attackers to crack because the encryption pattern is always different.

How It Works:
Encrypting Data: When you encrypt data, polymorphic encryption uses a method that changes the encryption key or the way the data is encrypted each time. So, even if the same data is encrypted multiple times, the output looks different every time.

Dynamic Changes: The encryption process adapts or changes over time, making it unpredictable and much more secure against attacks that rely on spotting patterns.

Decrypting Data: When it’s time to decrypt the data, the correct key or method is used to reverse the encryption, turning the data back into its original form.

26
Q

Mix network

A

A way of hiding one’s traffic by combining the traffic of multiple computers into a single channel. The traffic is eventually separated again at the receiving end.

27
Q

Secure Multiparty Computation

A

A class of algorithms that allows programs running on different computers to participate in a computation such that results can be computed without compromising each party’s private data.
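
One simple member of this class is additive secret sharing, sketched below as an assumption: three parties learn the sum of their salaries without any party revealing its own figure. The modulus, function name and salary values are illustrative:

```python
import secrets

MODULUS = 2**61 - 1  # all arithmetic is done modulo this value

def share(value, n_parties):
    """Split a value into n random-looking shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares

salaries = [52000, 61000, 58000]  # each value is private to one party
n = len(salaries)

# all_shares[i][j] is the share that party i sends to party j.
all_shares = [share(s, n) for s in salaries]

# Each party j adds the shares it received and publishes only that partial sum.
partial_sums = [sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)]

# Combining the published partial sums reveals the total and nothing else.
total = sum(partial_sums) % MODULUS
```

Each individual share is a uniformly random number, so what one party receives from another tells it nothing about that party's salary.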

28
Q

Private information retrieval

A

A range of protocols through which data can be retrieved from a database without revealing to the database or another observer the information that is retrieved.

29
Q

Access Management

A

An arrangement in which an individual user has restrictions on what can and cannot be accessed. Restrictions may be based on the type of data being accessed, the role of the person, the location of the user, the time of day and the type of device.

30
Q

Principle of Least Privilege

A

Gives an individual the minimal access to do their job (only provide access to the data that is absolutely needed).

31
Q

User-based access

A

Grants access based on the identity of the individual user.

32
Q

Role-based

A

In access management, a system that controls who can access specific data or perform certain actions based on their role within an organization. Instead of giving everyone access to everything, it restricts access based on what someone’s job requires.
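
A minimal sketch of a role-based check. The role names, resource names and `can_access` helper are hypothetical:

```python
# Each role maps to the set of resources that job needs (least privilege).
ROLE_PERMISSIONS = {
    "hr": {"disability_records"},
    "payroll": {"salary_records"},
    "admin": {"disability_records", "salary_records"},
}

def can_access(role: str, resource: str) -> bool:
    """Allow access only if the role explicitly includes the resource."""
    return resource in ROLE_PERMISSIONS.get(role, set())

hr_can_see_salaries = can_access("hr", "salary_records")
payroll_can_see_salaries = can_access("payroll", "salary_records")
```

Permissions are attached to the role, not the person, so changing someone's job only requires changing their role. Unknown roles get no access by default.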

33
Q

Authentication

A

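The process of verifying that a user is who they claim to be before access is granted. Common factors are something you know, something you are, something you have and where you are.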

34
Q

Something you know

A

User name or password

35
Q

Something you are

A

Fingerprint or face/voice recognition

36
Q

Something you have

A

Tokens, keys, ID badges, smart cards

37
Q

Where you are

A

Physical location

38
Q

Multifactor authentication (MFA)

A

Security method that requires you to prove who you are in more than one way before you can access something, like your online bank account or email. It’s like having multiple locks on a door—each lock needs a different key to open it.

39
Q

Domain-based Message Authentication, Reporting and Conformance (DMARC)

A

A security protocol that helps protect email senders and receivers from spam, phishing, and other email-based attacks. It’s like setting up a rulebook for how emails claiming to be from your domain (e.g., yourcompany.com) should be handled by email servers to make sure they’re genuine and not fake.

How It Works:
Authentication: DMARC checks if an email that says it’s from your domain is actually from you. It does this by looking at the results of two other mechanisms, SPF (Sender Policy Framework) and DKIM (DomainKeys Identified Mail). An email passes DMARC if at least one of them passes and matches (aligns with) the domain in the From address. Think of SPF and DKIM as ID checks that DMARC looks at.

Reporting: DMARC allows you to receive reports from email servers about emails sent from your domain. These reports tell you if emails are passing or failing the DMARC checks. It’s like getting feedback from the bouncers at different clubs on whether people pretending to be from your company are getting in or not.

Conformance: You set rules for what happens if an email fails the DMARC check. For example, you can tell email servers to reject, quarantine, or accept emails that fail. It’s like deciding what the bouncer should do if someone fails the ID check—turn them away, put them in a waiting area, or let them in with a warning.
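
A domain publishes its DMARC rules as a DNS TXT record under `_dmarc.` followed by the domain name. The sketch below parses one such record. The `v`, `p` and `rua` tags are standard, while the report address and the `parse_dmarc` helper are hypothetical:

```python
def parse_dmarc(record: str) -> dict:
    """Split a DMARC TXT record into its tag=value pairs."""
    tags = {}
    for part in record.split(";"):
        part = part.strip()
        if not part:
            continue
        name, _, value = part.partition("=")
        tags[name.strip()] = value.strip()
    return tags

# Example record: quarantine failing mail and send aggregate reports.
record = "v=DMARC1; p=quarantine; rua=mailto:dmarc-reports@example.com"
policy = parse_dmarc(record)
```

Here `policy["p"]` holds the conformance rule (`none`, `quarantine` or `reject`) and `policy["rua"]` says where receiving servers should send their aggregate reports.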

40
Q

Digital Rights Management

A

Used to ensure that digital content is only delivered to those who are authorized to receive it. DRM is a technology used to control how digital content, like music, movies, e-books, and software, is used and shared. Think of it like a set of locks that content creators put on their products to make sure they’re only used in ways that they allow.

41
Q

Process-oriented strategies

A

Approaches that focus on improving the way tasks are done, rather than just focusing on the end results. The idea is to optimize and refine the steps involved in completing a task to make the whole process more efficient, consistent, and effective. There are four main areas of focus:

  1. Policy and process enforcement
  2. Demonstrating compliance
  3. Informing the individual
  4. Providing user control
42
Q

Isolating Data

A

Prevents others from having access to any other network traffic. For example, a customer might be asked to create an account in the billing portal that is entirely separate from the rest of the network.

43
Q

Distributing Data

A

A method of separating personal data by either logically or physically segregating it. For example, disability data may go to the HR department, while salary data goes to payroll.

44
Q

Minimizing Data

A

Limiting the amount of personal information that needs to be processed. This can be done by excluding unnecessary data, selecting what data will be processed, stripping unnecessary data, and destroying data when it is no longer needed.

45
Q

Abstraction

A

Limits the amount of detail in which personal information is processed. For example, when you sign up for a service online, you might enter your full name, address, phone number, and email. However, when the service sends you a confirmation email, it might only show your first name and the last few digits of your phone number. The rest of your information is hidden or not shown at all.

46
Q

Grouping

A

Data abstraction method that groups aggregate data into correlated sets rather than processing it individually. This means taking individual pieces of personal information, grouping them together, and then summarizing or analyzing them as a whole rather than looking at each person’s details separately. This process helps protect individual privacy while still allowing useful insights to be drawn from the data. Here is an example:

Imagine a company wants to understand the spending habits of its customers without looking at each customer’s specific purchases. Instead of examining what every single customer bought, the company groups customers into categories like age groups or income levels.

For instance, they might look at how much people aged 25-35 spend on average compared to people aged 36-45. They might notice that the 25-35 group spends more on online shopping, while the 36-45 group spends more on groceries.

In this example, the company isn’t looking at what you, specifically, are buying. Instead, they group customers by age and analyze the average spending of each group. This way, your individual purchase history remains private, but the company can still understand trends across different age groups. This is data abstraction—focusing on group-level insights rather than individual details to protect personal privacy while still gaining useful information.
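
The age-group example can be sketched as follows. The customer records, the band boundaries in `age_group` and the variable names are made-up for illustration:

```python
from collections import defaultdict
from statistics import mean

def age_group(age: int) -> str:
    """Map an exact age to a broad band."""
    if 25 <= age <= 35:
        return "25-35"
    if 36 <= age <= 45:
        return "36-45"
    return "other"

customers = [
    {"age": 27, "spend": 120.0},
    {"age": 33, "spend": 80.0},
    {"age": 38, "spend": 200.0},
    {"age": 44, "spend": 100.0},
]

# Collect spending per group, then keep only the group-level average.
by_group = defaultdict(list)
for customer in customers:
    by_group[age_group(customer["age"])].append(customer["spend"])

average_spend = {group: mean(values) for group, values in by_group.items()}
```

Only `average_spend` would be passed on for analysis; the individual rows in `customers` never need to leave the system that holds them.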

47
Q

Summarize

A

Data abstraction method that separates out data elements about individuals from correlated groups. This means taking specific, detailed personal data and simplifying it by placing it into broader categories or general groups. This helps to protect the privacy of individuals while still allowing useful patterns or trends to be identified. An example would be:

Imagine a hospital collects detailed medical information about its patients, like their exact blood pressure readings, cholesterol levels, and other specific health metrics. Instead of looking at each patient’s exact numbers, the hospital might group patients into broader categories based on these details, such as “high risk,” “moderate risk,” and “low risk” for heart disease.

Here’s how it works:

Instead of showing that Patient A has a blood pressure of 150/90 and cholesterol of 240 mg/dL, which is very detailed, the hospital might place Patient A into the “high risk” category.

Patient B, with slightly lower numbers, might fall into the “moderate risk” category.

By summarizing this detailed information into more abstract categories like “high risk” or “moderate risk,” the hospital can focus on managing care for these groups without needing to always dive into the specific numbers for each patient. This approach helps to ensure that individual details are not overly exposed while still allowing the hospital to make important decisions based on broader trends and patterns.

This abstraction process protects personal details but still allows the hospital to identify and address key health concerns across different groups of patients.

48
Q

Perturbing

A

Data abstraction method of adding noise to data to reduce its specificity. This means slightly altering the details in the data so that it becomes less precise. This way, even if someone tries to analyze the data, they can’t pinpoint exact information about any individual, but the overall trends or patterns in the data still remain useful. An example would be:

Imagine a survey collects the exact ages of people in a town. The survey wants to share this data with researchers but also wants to protect people’s privacy. Instead of sharing the exact age of each person, they add a small amount of “noise” by randomly adjusting each age by a year or two up or down.

For instance:

If someone is 34 years old, the data might show them as 33 or 35 years old instead.

If someone else is 27, the data might record them as 26 or 28.

These small changes (adding noise) make it harder for anyone to identify the exact age of any individual in the town, but researchers can still study general patterns, like how many young adults or seniors live in the town.

This method of adding noise to data ensures that the privacy of individuals is protected, while the data remains useful for analyzing broader trends without compromising personal details.
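
A minimal sketch of the age example, shifting each age by one or two years in a random direction. The function name, the `max_shift` parameter and the optional `seed` are assumptions made for illustration:

```python
import random

def perturb_ages(ages, max_shift=2, seed=None):
    """Shift each age up or down by 1..max_shift years, chosen at random."""
    rng = random.Random(seed)
    shifts = [s for s in range(-max_shift, max_shift + 1) if s != 0]
    return [age + rng.choice(shifts) for age in ages]

ages = [34, 27, 61, 45]
noisy_ages = perturb_ages(ages, seed=42)  # seed fixed so the sketch is repeatable
```

Each value in `noisy_ages` is off by at most two years, so counts of young adults or seniors stay roughly the same while no single reported age can be trusted as exact.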

49
Q

Hiding Data

A

Focuses on protecting personal information by making it unconnectable or unobservable to others. Hiding strategies are as follows:

  1. Restrict: Prevents unauthorized access by requiring log-in credentials or an encryption / decryption key.
  2. Mix: Processing personal information randomly within a large group to reduce correlation.
  3. Obfuscate: Obstructs the ability to read or understand personal information. Most commonly done with encryption or hashing.
  4. Dissociate: Removes the correlation between subjects and their personal information.
  5. Masking: Takes real data as its starting point and applies various kinds of manipulation to reduce the risk represented by the original data while preserving desired properties.
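
Two of these strategies, masking and obfuscation by hashing, can be sketched briefly. The function names, the choice of salted SHA-256 and the sample card number are assumptions made for illustration:

```python
import hashlib

def mask_card_number(number: str, visible: int = 4) -> str:
    """Masking: hide all but the last few digits, preserving the length."""
    digits = number.replace(" ", "").replace("-", "")
    return "*" * (len(digits) - visible) + digits[-visible:]

def obfuscate(value: str, salt: bytes) -> str:
    """Obfuscate: replace a value with a salted SHA-256 digest."""
    return hashlib.sha256(salt + value.encode("utf-8")).hexdigest()

masked = mask_card_number("4111 1111 1111 1111")
digest = obfuscate("alice@example.com", salt=b"demo-salt")
```

Masking keeps a useful property of the original (the last four digits and the length), while the hash keeps none of the readable content but still lets equal inputs be matched. Values drawn from a small set can still be guessed by hashing candidates, so a secret key or salt matters in practice.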