w10L1 Data Security Flashcards

1
Q

Informational Harms

A

Informational harms
can occur when others use research results or data, learn about subjects as a result of their
participation, and then violate the subjects’ rights or negatively affect the subjects’ interests.

2
Q

Information Privacy & Confidentiality

A

Information Privacy & Confidentiality denotes broadly the interests that individuals and groups have in controlling information about or from them

3
Q

Belmont Report

A

The Belmont Report sets out three core principles for ethics in research: respect for persons, beneficence, and justice.

Two of these have strong implications for information
privacy and confidentiality.

The principle of respect for persons
implies that individuals should be treated as autonomous agents,
and
persons with diminished autonomy are entitled to protection.

This implies informed consent and also implies that we should respect people’s choices over confidentiality
and privacy.

The principle of beneficence states
that research must have individual or societal benefit to justify risks.

This implies that informational risks should be minimized relative to the expected benefit.

4
Q

Participant Harm

A

An informational harm to research participants occurs when others use research results or data, learn about the individual as a result of their participation in the research, and then violate their rights or negatively impact their interests.

5
Q

Information Security

A

Information Security

Control and protection against unauthorized access, use, disclosure, disruption, modification, or destruction of information.

VS:

Information Privacy

Control and protection over the extent and circumstances of information collection, sharing, and use

6
Q

Information Privacy

A

Information Privacy

Control and protection over the extent and circumstances of information collection, sharing, and use

VS:

Information Security

Control and protection against unauthorized access, use, disclosure, disruption, modification, or destruction of information.

7
Q

Fair Information Practice Principles:

A

Fair Information Practice Principles:
– Notice/Awareness: individuals are told how information is being collected and used.

– Choice/Consent

– Access/Participation (including the opportunity to verify the accuracy of data and correct it)

– Integrity/Security (mechanisms to ensure the security and integrity of information)

– Enforcement/Redress: if something goes wrong, there should be ways to remedy it

8
Q

5 Safes for data protection planning!

A

Another framework to consider is the five safes principles,
which originated at the UK Data Service.
This gives a number of simple dimensions to consider as rules of thumb.
1) First is safe projects.
Is the use of the data in this context appropriate?
2) Safe people– can the people who use this data be trusted to use it in an appropriate manner?
How were they vetted or selected?
3) Safe settings refers to how the data is accessed. Is it accessed within a facility that limits unauthorized use?
4) Safe data concerns the disclosure risk from the data itself.
What would happen if the data were widely circulated?
5) Safe outputs concerns the risks from the analysis. When the results of the analysis are released,
do they reveal information about people?

Note that these are not really binary issues.
Safety is a continuous matter, and the risk
depends both on the context of the information sharing
and on the subjects involved.

9
Q

modern principles
for privacy analysis

A

We proposed a set of modern principles
for privacy analysis that complement these.

1) One is the principle of calibration.
Privacy controls should be calibrated to the intended use and privacy risks associated with the data.

2) The second is to consider inferential risks.
When you think about the harms of information, consider not just re-identification, but also the potential for others to learn about individuals from their inclusion in the data.

3) Third is to have a tiered approach where we use a combination of privacy and security controls
and a variety of ways of getting at information rather than assuming that everyone will access data
in one way and that one set of protections will be enough to control privacy and security for all purposes.

4) The last principle is to anticipate change.
This is a rapidly changing field.
The science, the regulations, and the law
are all changing.
So it is important to think about how the data landscape
and the regulatory landscape are changing
as you move between lifecycle stages
and as the risks and methods evolve over time.

10
Q

What are some strategies for mitigating risks when making measurement choices? (Select all that apply)

Determine if the sensitive data is necessary for the study

Categorize responses (e.g. income or age) into groups or brackets

Randomized responses

Collecting group responses

None of the above

A

What are some strategies for mitigating risks when making measurement choices? (Select all that apply)

Determine if the sensitive data is necessary for the study

Categorize responses (e.g. income or age) into groups or brackets

Randomized responses (e.g. rocks = list randomization; randomized response = flip a coin … answer “yes” or answer truthfully)

Collecting group responses

ANSWER: A, B, C, and D (the first four options)
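
The coin-flip form of randomized response can be sketched in Python. This is only an illustration under assumptions that are not in the original card: a fair coin, a yes/no question, made-up data in which 30% of respondents hold the sensitive trait, and hypothetical function names. Each respondent can plausibly deny a "yes", yet the overall rate can still be estimated.

```python
import random

def randomized_response(truth: bool, rng: random.Random) -> bool:
    """Flip a fair coin: heads means always report 'yes', tails means report the truth."""
    if rng.random() < 0.5:
        return True
    return truth

def estimate_true_rate(reports: list[bool]) -> float:
    """Invert the design: P(yes) = 0.5 + 0.5 * p, so p = 2 * P(yes) - 1."""
    observed_yes = sum(reports) / len(reports)
    return max(0.0, min(1.0, 2 * observed_yes - 1))

rng = random.Random(0)
true_answers = [i < 3000 for i in range(10000)]  # made-up data: 30% hold the trait
reports = [randomized_response(t, rng) for t in true_answers]
estimate = estimate_true_rate(reports)
print(f"estimated rate: {estimate:.3f}")
```

The estimate is noisier than a direct survey of the same size, which is the price paid for protecting individual answers.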

11
Q

Which of the following is NOT true about IRBs? (Select all that apply)

IRB approval is sufficient in protecting researchers from legal obligations

IRBs can provide advice on whether the data publication process protects study participants

Data management and data security plans are often subject to approval by IRBs

IRBs can determine the sensitivity of the data being collected

None of the above

A

Which of the following is NOT true about IRBs? (Select all that apply)

ANSWER (the statement that is NOT true): IRB approval is sufficient in protecting researchers from legal obligations

ALL OTHERS ARE TRUE:
IRBs can provide advice on whether the data publication process protects study participants

Data management and data security plans are often subject to approval by IRBs

IRBs can determine the sensitivity of the data being collected

None of the above

12
Q

Describe data transformations that are protective:

Data partitioning
Redaction
Encryption

A

1) data partitioning, which divides data into different parts
to make the more sensitive parts easier to protect;

2) redaction, which removes information from the data,
either for legal purposes or for information protection;

3) and encryption, which effectively
scrambles the data to make it meaningless
to outside observers.

13
Q

Data partitioning

A

Data partitioning divides the data into multiple pieces so that the more sensitive or identifying parts can be
subject to greater protections.

This reduces risks in information management
and allows you to share some parts of the data
more freely than you would otherwise be able to do.
You should partition the data based on its sensitivity
and on its identifiability.
Typically, data is partitioned into three or more parts–
one which contains highly identifying information,
another which contains highly sensitive information,
another which contains the other measured characteristics.
This set of pieces is then segregated
and can be stored with different levels of protection,
offered to different people at different levels of access,
even collected and transmitted through different channels
depending on the risks involved.
You should plan to segregate data as early as feasible
and to link segregated information
with artificial keys so that you can reassemble that information
if you absolutely need to.
When you choose keys, they should
be chosen at random or in a cryptographically secure way.
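
A rough Python sketch of this three-way partitioning follows. The field names, the sample records, and the helper functions `partition` and `reassemble` are hypothetical; the only point taken from the card is that the parts are linked by artificial keys generated in a cryptographically secure way (here, the standard library's `secrets` module).

```python
import secrets

def partition(records, identifying, sensitive):
    """Split each record into identifying, sensitive, and other parts linked by a random key."""
    id_part, sensitive_part, other_part = {}, {}, {}
    for record in records:
        key = secrets.token_hex(16)  # random key that carries no information about the person
        id_part[key] = {f: record[f] for f in identifying}
        sensitive_part[key] = {f: record[f] for f in sensitive}
        other_part[key] = {f: v for f, v in record.items()
                           if f not in identifying and f not in sensitive}
    return id_part, sensitive_part, other_part

def reassemble(key, *parts):
    """Rejoin the parts of one record; only possible for someone holding every part."""
    merged = {}
    for part in parts:
        merged.update(part[key])
    return merged

records = [  # made-up data
    {"name": "A. Example", "zip": "00000", "diagnosis": "flu", "age_group": "30-39"},
    {"name": "B. Example", "zip": "11111", "diagnosis": "none", "age_group": "40-49"},
]
ids, sens, other = partition(records, identifying={"name", "zip"}, sensitive={"diagnosis"})
```

Each of the three dictionaries can then be stored, shared, or transmitted under its own level of protection.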

14
Q

Redaction

A

De-identification and anonymization
are legal concepts that typically
lack a general, rigorous statistical definition
in the law.

De-identification is often accomplished by redaction
or simply removing information.
This can be useful legally and can reduce risk in practice
by making it more difficult to associate sensitive information
with particular individuals.

It’s particularly useful to redact information
that was received during data collection but wasn’t intended to be measured as part of the research design.

For example, we might receive identifying or sensitive
information as part of open-ended responses
where they weren’t anticipated.

Redaction by itself may be sufficient for legal purposes
in some circumstances.
But generally, it does not reliably control the risk
to individuals by itself.
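
As an illustration of redacting unanticipated identifiers from open-ended responses, here is a small Python sketch. The two patterns (email addresses and one phone-number layout) and the sample response are assumptions chosen for the example; real responses can contain names, addresses, and other identifiers that simple patterns like these will miss, which is consistent with the caution above.

```python
import re

# Illustrative patterns only; they are not a complete list of identifiers.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")

def redact(text: str) -> str:
    """Replace matched identifiers with a placeholder so the rest of the response stays usable."""
    text = EMAIL.sub("[REDACTED EMAIL]", text)
    return PHONE.sub("[REDACTED PHONE]", text)

response = "Happy to talk more, email me at jane@example.org or call 555-010-1234."
print(redact(response))
```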

15
Q

What are some ways in which data can be de-identified? (Select all that apply)

Rewriting

Redaction/Removal

Hiding

Partitioning

Encryption

None of the above

A

What are some ways in which data can be de-identified? (Select all that apply)

Rewriting NO

Redaction/Removal YES

Hiding NO

Partitioning YES

Encryption YES

None of the above

16
Q

What kind of information can be encrypted?

A

What can be encrypted?
files
media
file systems
transmissions
computations
databases

17
Q

The main goal of information protection during the storage
phase is often to provide….

A

The main goal of information protection during the storage
phase is often to provide information security.
As you may recall, information security
means preserving the properties of integrity, availability,
and confidentiality.

18
Q

Violations of Integrity

Violations of Availability

Violations of Confidentiality

A

Confidentiality is violated when unauthorized users
can see the data.

Availability is violated when you can’t
get the data when it’s needed.

And integrity is violated when the data is changed without authorization or a record.
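
One common way to notice an integrity violation is to record a cryptographic digest of the data and compare it later. The Python sketch below is an illustration, not something described in the original card; the file contents are made up, and it assumes the recorded digest itself is kept somewhere an attacker cannot change.

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()

original = b"id,score\n1,42\n2,17\n"  # made-up file contents
recorded = fingerprint(original)      # stored when the data was last known to be good

tampered = b"id,score\n1,42\n2,71\n"  # one value silently altered
unchanged = fingerprint(original) == recorded
detected = fingerprint(tampered) != recorded
print(unchanged, detected)
```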

19
Q

3 general sources
and areas of vulnerability.

A

General sources
and areas of vulnerability.

Sources fall into three categories:

natural, unintentional, and intentional threats.

A natural threat is a fire or a flood.
This can affect the availability of your systems, and the integrity of your data if you have no backups, but it generally doesn’t affect confidentiality.

Unintentional human actions, such as sending an email to the wrong place, and
intentional human actions,
like trying to break into a system,
can affect confidentiality.

20
Q

Privacy Core Concepts:

Identifiability

3 types:

Record linkage

Indistinguishability (k-anonymous)

Limits on Adversarial learning

A

Privacy Core Concepts:

Identifiability

Potential for learning about individuals from computations
based on data in which they are included

Identifiability can be measured in a number of different ways.

1) The most common and the weakest is the record linkage model,

which can sometimes be thought of as “Where’s Waldo.”

In the record linkage model of identification, we achieve identification if we can match a real person to a specific record in a database based on what we know.

An example is a direct identifier:
a characteristic that is observable in the world
and also exists in the database.

If I know your name, and I can see it in the database, I have linked that record.
A record linkage can also be established
from a collection of information, such as your birth
date, the first four digits of your social security number, and your zip code, that together uniquely identifies you both in public and in the database.

Generally, controlling for direct identification, or controlling for record linkage, may satisfy compliance with specific laws,
but it doesn’t remove the potential for information harm.

2) A stronger definition of identifiability is indistinguishability.
This can be thought of as hiding in the crowd.

I’m indistinguishable from other records if, based on what’s publicly known about me, I can only be narrowed down to a cluster of records in the database and not to a single one.

For example, if you know my height and weight, and there are seven other records in the database with that height and weight,

I’m indistinguishable from those records on that basis.

A database is called k-anonymous,
which is a form of indistinguishability, if for every combination of quasi-identifying characteristics that appears in it, the database has at least k records with that combination.

Even when the database is indistinguishable at some level, there’s still a potential for substantial harm.

First, there may be information that is observable, but was not counted in your indistinguishability
calculation.
Perhaps we know somebody’s favorite flavor of ice cream,
and that wasn’t included in your list of identifying attributes.

Just as important, being indistinguishable doesn’t mean that someone can’t learn something
about you.

For example, if there are a million people in the database,
you may still be able to learn that I’m one of a hundred people who have been convicted of a crime and have served time. In that case, you’ve learned something important about me, even though you haven’t identified exactly which record in the database is mine.

3) A more general and stronger approach to measuring identifiability is to limit adversarial learning.

This can be thought of as guaranteeing confidentiality, and the most popular form of this is called

differential privacy.

This places formal statistical bounds on the total amount that we can learn about any individual in the database
based on the data release.

And so it guarantees that no matter what other information you have, you can’t learn much more about me than if I hadn’t participated in data collection to begin with.

However, this is a high bar, and currently a challenging one
to implement, and often requires moving from publishing data
as a whole to offering interactive access to data.
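
To make the idea of interactive access concrete, here is a minimal Python sketch of the Laplace mechanism, one standard way to answer a counting query with differential privacy. The privacy parameter `epsilon`, the function name `noisy_count`, and the data are assumptions for illustration. The sketch does not track a privacy budget across repeated queries and is not hardened against floating-point attacks, so treat it as a teaching aid only.

```python
import random

def noisy_count(values, predicate, epsilon, rng):
    """Answer a counting query with Laplace noise of scale 1/epsilon."""
    true_count = sum(1 for v in values if predicate(v))
    scale = 1.0 / epsilon  # adding or removing one person changes a count by at most 1
    # The difference of two independent exponential draws is Laplace-distributed.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise

rng = random.Random(42)
ages = [25, 31, 47, 52, 38, 29, 61, 44]  # made-up data; the true count of ages >= 40 is 4
answer = noisy_count(ages, lambda a: a >= 40, epsilon=0.5, rng=rng)
print(round(answer, 2))  # the true count plus random noise
```

A smaller `epsilon` adds more noise and gives a stronger guarantee; a larger one gives more accurate answers and a weaker guarantee.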

21
Q

A dataset is considered K-anonymous when:

For each record, at least k-1 records contain the same identifying characteristics to make them indistinguishable

There are k variables that, when removed, make the dataset fully de-identified

K records contain identifying characteristics

After some records are removed, k records remain in the dataset that are indistinguishable

A

A dataset is considered K-anonymous when:

CORRECT ANSWER:
For each record, at least k-1 records contain the same identifying characteristics to make them indistinguishable

INCORRECT ANSWERS BELOW:

There are k variables that, when removed, make the dataset fully de-identified

K records contain identifying characteristics

After some records are removed, k records remain in the dataset that are indistinguishable
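
The correct definition can be checked mechanically: group the records by their quasi-identifier values and find the smallest group. The Python sketch below assumes hypothetical field names and made-up records; which fields count as quasi-identifiers is a judgment the analyst has to make, and the code does not make it for you.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Return the size of the smallest group sharing the same quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())

records = [  # made-up data
    {"age_group": "30-39", "zip3": "021", "diagnosis": "flu"},
    {"age_group": "30-39", "zip3": "021", "diagnosis": "asthma"},
    {"age_group": "30-39", "zip3": "021", "diagnosis": "flu"},
    {"age_group": "40-49", "zip3": "021", "diagnosis": "none"},
    {"age_group": "40-49", "zip3": "021", "diagnosis": "flu"},
]
k = k_anonymity(records, ["age_group", "zip3"])
print(k)  # every record shares its quasi-identifiers with at least k - 1 others
```

Adding another attribute to the quasi-identifier list can only keep k the same or lower it, which is why an attribute left out of the calculation can undermine the guarantee.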

22
Q

Generalization

vs.

Suppression

vs.

Aggregation

vs.

Perturbation

A

GENERALIZATION: where we take specific values and make them more general, for example, remapping the category of neurosurgeon
to physician.

SUPPRESSION: where we essentially
delete a cell that would render a record too identifiable.

AGGREGATION: where we take a set of records and aggregate the value
for at least one characteristic.

PERTURBATION: where we swap different values
between records or add noise to the data.
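
The four transformations can be compared side by side in a short Python sketch. The records, field names, and helper functions are hypothetical; the neurosurgeon-to-physician mapping is the card's own example. Perturbation is shown here as added noise, one of the two variants the card mentions.

```python
import random
from statistics import mean

records = [  # made-up data
    {"job": "neurosurgeon", "town": "Town A", "income": 410_000},
    {"job": "pediatrician", "town": "Town B", "income": 190_000},
    {"job": "plumber", "town": "Town B", "income": 80_000},
]

GENERAL = {"neurosurgeon": "physician", "pediatrician": "physician"}

def generalize(record):
    """Map a specific job title to a broader category."""
    return {**record, "job": GENERAL.get(record["job"], record["job"])}

def suppress(record, field):
    """Blank out a single cell that would make the record too identifiable."""
    return {**record, field: None}

def aggregate(rows, field):
    """Replace individual values with one summary value for the group."""
    return mean(r[field] for r in rows)

def perturb(record, field, rng, spread):
    """Add bounded random noise to a numeric value."""
    return {**record, field: record[field] + rng.uniform(-spread, spread)}

rng = random.Random(7)
generalized = [generalize(r) for r in records]
suppressed = suppress(records[0], "town")
average_income = aggregate(records, "income")
perturbed = perturb(records[2], "income", rng, spread=5_000)
```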

23
Q

Public Data Archives

Data enclaves

Controlled remote access

Model servers

A

There are a number of models that support data reuse broadly.

1) PUBLIC DATA ARCHIVE:
The least protective, but most accessible, is the public data archive.
Examples of this might be the Dataverse network or Data.gov.

Typically, data in public data archives
has some disclosure control or redaction applied to it, is available through a lightweight license
such as an online click-through, and in limited cases involves
some minor vetting of users.

Maybe they have to sign up and provide an email address.

The advantages of this are that it’s very transparent
and allows flexible use of the data.
The disadvantage is that there are
limits to what de-identification can do and still
keep the data useful.
So for some uses of the data, this
will not provide appropriate control and/or usefulness.

2) DATA ENCLAVE:
A step more rigorous is a data enclave or a virtual data enclave.
These provide either physical or logical restrictions on access to the data.

Example - have to go on site to get access to the data.

3) CONTROLLED REMOTE ACCESS:
Example - log in to a remote server
under the right circumstances.

These models provide access to raw data or to minimally processed data
for users that are highly vetted, and
both the analysis and the outputs
are audited and reviewed by experts for disclosure.
The advantage is that this can support rich access
to very risky data, but the disadvantages are
that these can be expensive, slow down the research process,
and are generally not as convenient to access.

4) MODEL SERVER:
Another alternative that’s increasingly
in use in statistical agencies
and other large organizations is a model server, where the server provides remote access not to the data itself
but to a set of analyses.

The advantage of limiting data access to analyses is that data access can be made available more widely.
It’s easier to analyze a specific set of methods and make sure that they’re secure and that they do not disclose too much information.

The flip side of that is that the set of analyses that a model server can provide is limited, and there will always be research projects and research questions that can only be answered by going back to the original data.

24
Q

Functions of Consent Document

A

Functions of Consent Document

Ensure informed consent to participate in research

Communicate foreseeable risks

Communicate potential societal benefits

Identify mechanisms for questions and for withdrawal

Communicate procedures for protecting the confidentiality of personal information about the participants

Communicate limits to confidentiality

Communicate processes and benefits of sharing data

25
Q

Consent Document – Practices to avoid

A

Consent Document – Practices to avoid

X Assurances of complete anonymity / privacy
X No plans to share data
X No plans to disseminate/ archive data
X Data will be shared only in an anonymized form
X All data destroyed at end of research

26
Q

Functions of Service Level Agreement

A

Service Level Agreement (SLA) is an official commitment between a service provider and a customer

SLAs generally specify the type, cost, quality, and availability of service

Services for research data should also specify information security and
privacy controls

The function of a service level agreement is really to protect the researcher and the researcher’s institution from the organization that
is handling different parts of the information technology infrastructure.

A service level agreement is an official commitment between a service provider, like a storage vendor,
and a customer.

Generally these specify things like the type, cost, quality, and availability of services.

Services for research data, with information privacy or security
concerns, should also specify particular information security
and privacy controls.

There are a number of key elements
to include in a service level agreement.

One is the privacy and security standard that is used for compliance or for reference.

Any additional information controls that are used should also be included.

In particular, include access control policies, backup and retention policies, usage logging and sharing policies, and any breach notification policies.

For highly sensitive data that poses a higher than average informational risk, you may also consider
incorporating additional elements into your service level agreement.

Many service level agreements for highly sensitive data include duty of care, so that the service provider
has a responsibility to take the same precautions for protection of the data as your institution would.

They may also include agreements on information residency: where the data is stored can affect the legal mechanisms that can be used to reach it. And SLAs for highly sensitive data should cover incident response, external auditing,
and responding to legal records requests,

both your own requests and how the provider responds to other parties’ requests for your sensitive data.

27
Q

Transforming data is one control used when making data publicly available. What type of transformation is done when a birthdate field is transformed to age (in years)?

Local Suppression

Partitioning

Aggregation

Perturbation

Generalization

None of the Above

A

Transforming data is one control used when making data publicly available. What type of transformation is done when a birthdate field is transformed to age (in years)?

Local Suppression

Partitioning

Aggregation

Perturbation

ANSWER: Generalization

None of the Above
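
The birthdate-to-age generalization in this question can be written as a small Python function. The function name and the sample dates are assumptions for illustration; the point is that many distinct birthdates collapse into the same age value, which makes any single record less distinctive.

```python
from datetime import date

def age_in_years(birthdate: date, as_of: date) -> int:
    """Generalize an exact birthdate to age in whole years on a reference date."""
    had_birthday = (as_of.month, as_of.day) >= (birthdate.month, birthdate.day)
    return as_of.year - birthdate.year - (0 if had_birthday else 1)

reference = date(2024, 12, 31)
first = age_in_years(date(1990, 1, 1), reference)
second = age_in_years(date(1990, 6, 15), reference)
print(first, second)  # two different birthdates, one generalized value
```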

28
Q

Data Use Agreements

A

Data use agreements are used when you give data to a third party, and they fulfill a number of different functions.

First, they communicate the requirements for handling that data to the third party.

They also help to share some of the risks of data security between your institution and that third party.

They support transparency and reuse
so that there is a way for others to access the data that underlies a particular research finding.

They may support legal requirements,
and they can also grant subjects the individual right to take action should their data be misused.

Data use agreements contain a number of key elements.

1) The first is the who.
Who may receive the information and who may use it?

2) The second element is what are the prohibited or permitted uses?

What types of uses are allowed for that data, and are there any further restrictions on sharing, either for privacy or intellectual property
reasons?

3) Data use agreements should also specify any safeguards for use and disclosure that you would want a third party to take for that data, any controls on storage
and transmission, and any controls on further access.

4) If the data needs to be destroyed at some point after a third party receives it, that should be included.

5) And any requirements for auditing,
or review, or reporting or notification,
whether it’s notification of publication based on the data, or notification of the discovery of an individual within it, all of those things should be in a data use agreement.

6) You may also consider indemnification for the institution, although often, third parties are not
practically able to do this.

7) You may consider giving the right of action to individual subjects. This is one of the only ways in which individual subjects would
be able to reach a third party directly if it misuses data.

8) You may consider transparency requirements, for example, the requirement
to cite data and publications, and any further sharing
or openness requirements.

29
Q

What is included in an informed consent document? (Select all that apply)

Communicates how the researchers will protect the confidentiality of the participants

Explains the purpose of the study

Assures complete confidentiality of the study participants

Discusses how data will be destroyed at the end of the research

Allows the participant to opt out of the study at any time

Discusses the benefits of the research

None of the above

A

What is included in an informed consent document? (Select all that apply)

CORRECT Communicates how the researchers will protect the confidentiality of the participants

CORRECT Explains the purpose of the study

INCORRECT Assures complete confidentiality of the study participants

INCORRECT Discusses how data will be destroyed at the end of the research

CORRECT Allows the participant to opt out of the study at any time

CORRECT Discusses the benefits of the research

None of the above