w10L1 Data Security Flashcards
Informational Harms
Informational harms can occur when others use research results or data, learn about subjects as a result of their participation, and then violate the subjects' rights or negatively affect the subjects' interests.
Information Privacy & Confidentiality
Information Privacy & Confidentiality denotes broadly the interests that individuals and groups have in controlling information about or from them
Belmont Report
The Belmont Report sets out three core principles for ethics in research: respect for persons, beneficence, and justice. Two of these have strong implications for information privacy and confidentiality.
The principle of respect for persons implies that individuals should be treated as autonomous agents and that persons with diminished autonomy are entitled to protection. This implies informed consent and also implies that we should respect people’s choices over confidentiality and privacy.
The principle of beneficence states that research must have individual or societal benefit to justify its risks. This implies that informational risks should be minimized relative to the benefit obtained.
Participant Harm
An informational harm to research participants occurs when others use research results or data, learn about an individual as a result of their participation in the research, and then violate that individual's rights or negatively impact their interests.
Information Security
Information Security
Control and protection against unauthorized access, use, disclosure, disruption, modification, or destruction of information.
VS:
Information Privacy
Control and protection over the extent and circumstances of information collection, sharing, and use
Information Privacy
Information Privacy
Control and protection over the extent and circumstances of information collection, sharing, and use
VS:
Information Security
Control and protection against unauthorized access, use, disclosure, disruption, modification, or destruction of information.
Fair Information Practice Principles:
Fair Information Practice Principles:
– Notice/Awareness: how information is being collected and used.
– Choice/Consent
– Access/Participation: access to the processes, including the opportunity to verify the accuracy of the data and correct it.
– Integrity/Security: mechanisms to ensure the security and integrity of information.
– Enforcement/Redress: if something goes wrong, there should be ways to remedy it.
5 Safes for data protection planning!
Another framework to consider is the Five Safes, which originated in the UK at the Office for National Statistics and is used by the UK Data Service.
This gives a number of simple dimensions to consider as rules of thumb.
1) Safe projects: is the use of the data in this context appropriate?
2) Safe people: can the people who use this data be trusted to use it in an appropriate manner? How were they vetted or selected?
3) Safe settings: how is the data accessed? Is it accessed within a facility that limits unauthorized use?
4) Safe data: what is the disclosure risk from the data itself? What would happen if the data were widely circulated?
5) Safe outputs: what are the risks from the analysis? When the results of the analysis are released, do they reveal information about people?
Note that these are not really binary issues. Safety is a continuous matter, and the risk depends both on the context of the information sharing and on the subjects involved.
Modern principles for privacy analysis
We proposed a set of modern principles for privacy analysis that complement these.
1) The first is the principle of calibration: privacy controls should be calibrated to the intended use and privacy risks associated with the data.
2) The second is to consider inferential risks. When you think about the harms of information, consider not just re-identification, but also the potential for others to learn about individuals from their inclusion in the data.
3) The third is to take a tiered approach, using a combination of privacy and security controls and a variety of ways of getting at information, rather than assuming that everyone will access data in one way and that one set of protections will be enough to control privacy and security for all purposes.
4) The last principle is to anticipate change. This is a rapidly changing field: the science, the regulations, and the law are all changing. Thinking about how the data landscape and the regulatory landscape are changing is important as you move between lifecycle stages and as the risks and methods evolve over time.
What are some strategies for mitigating risks when making measurement choices? (Select all that apply)
Determine if the sensitive data is necessary for the study
Categorize responses (e.g., income or age) into groups or brackets
Randomized responses
Collecting group responses
None of the above
What are some strategies for mitigating risks when making measurement choices? (Select all that apply)
Determine if the sensitive data is necessary for the study
Categorize responses (e.g., income or age) into groups or brackets
Randomized responses (e.g. rocks = list randomization, random response = flip coin … yes or answer truthfully)
Collecting group responses
Answer: A, B, C, and D (all four strategies apply)
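The coin-flip idea behind randomized responses can be sketched as follows. This is a minimal illustration, not a method taken from the course. It assumes a forced-response design in which a respondent who flips heads always says yes and one who flips tails answers truthfully. The function names and the simulated 30% rate are hypothetical.

```python
import random


def randomized_response(truth: bool, rng: random.Random) -> bool:
    """Assumed design: heads -> always say yes; tails -> answer truthfully."""
    if rng.random() < 0.5:  # simulated fair coin lands heads
        return True
    return truth


def estimate_true_rate(answers: list[bool]) -> float:
    """Under this design P(yes) = 0.5 + 0.5 * pi, so pi = 2 * P(yes) - 1."""
    p_yes = sum(answers) / len(answers)
    # Clamp, because sampling noise can push the raw estimate outside [0, 1].
    return max(0.0, min(1.0, 2 * p_yes - 1))


# Simulated population in which roughly 30% hold the sensitive trait (made-up figure).
rng = random.Random(0)
population = [rng.random() < 0.30 for _ in range(100_000)]
answers = [randomized_response(t, rng) for t in population]
estimate = estimate_true_rate(answers)
```

No single "yes" reveals whether that respondent has the trait, yet the aggregate rate can still be estimated. The cost is a noisier estimate than direct questioning would give for the same sample size.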
Which of the following is NOT true about IRBs? (Select all that apply)
IRB approval is sufficient in protecting researchers from legal obligations
IRBs can provide advice on whether the data publication process protects study participants
Data management and data security plans are often subject to approval by IRBs
IRBs can determine the sensitivity of the data being collected
None of the above
Which of the following is NOT true about IRBs? (Select all that apply)
Answer is A: "IRB approval is sufficient in protecting researchers from legal obligations" is the statement that is not true.
All others are true:
IRBs can provide advice on whether the data publication process protects study participants
Data management and data security plans are often subject to approval by IRBs
IRBs can determine the sensitivity of the data being collected
None of the above
Describe data transformations that are protective:
Data partitioning
Redaction
Encryption
1) Data partitioning, which divides data into different parts to make the more sensitive parts easier to protect;
2) Redaction, which removes information from the data, either for legal purposes or for information protection;
3) Encryption, which effectively scrambles the data to make it meaningless to outside observers.
Data partitioning
Data partitioning divides the data into multiple pieces so that the more sensitive or identifying parts can be subject to greater protections. This reduces risks in information management and allows you to share some parts of the data more freely than you would otherwise be able to.
You should partition the data based on its sensitivity and its identifiability. Typically, data is partitioned into three or more parts: one that contains highly identifying information, another that contains highly sensitive information, and another that contains the other measured characteristics.
These pieces are then segregated and can be stored with different levels of protection, offered to different people at different levels of access, and even collected and transmitted through different channels, depending on the risks involved.
You should plan to segregate data as early as feasible and to link the segregated information with artificial keys so that you can reassemble it if you absolutely need to. Keys should be chosen at random or generated in a cryptographically secure way.
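A minimal sketch of three-way partitioning with an artificial link key is shown below. The field names, the `link_key` column, and both helper functions are hypothetical choices for illustration. Which fields count as identifying or sensitive depends on the study.

```python
import secrets

# Hypothetical field classification; a real study would define its own.
IDENTIFYING = {"name", "email"}
SENSITIVE = {"diagnosis"}
LINK = "link_key"


def partition_record(record: dict) -> tuple[dict, dict, dict]:
    """Split one record into identifying, sensitive, and other parts."""
    # Artificial key from a cryptographically secure source; it is not
    # derived from anything in the record, so it carries no information.
    key = secrets.token_hex(16)
    identifying, sensitive, other = {LINK: key}, {LINK: key}, {LINK: key}
    for field, value in record.items():
        if field in IDENTIFYING:
            identifying[field] = value
        elif field in SENSITIVE:
            sensitive[field] = value
        else:
            other[field] = value
    return identifying, sensitive, other


def reassemble(identifying: dict, sensitive: dict, other: dict) -> dict:
    """Rejoin the parts, only if they share the same link key."""
    if len({identifying[LINK], sensitive[LINK], other[LINK]}) != 1:
        raise ValueError("parts do not share a link key")
    merged = {**identifying, **sensitive, **other}
    del merged[LINK]
    return merged


record = {"name": "Ada Example", "email": "ada@example.org",
          "diagnosis": "asthma", "age_bracket": "30-39"}
ident, sens, other = partition_record(record)
```

Each part could then be stored or shared under its own level of protection. Only someone holding all three parts can reassemble the full record.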
Redaction
De-identification and anonymization are legal concepts that typically lack a general, rigorous statistical definition in the law. De-identification is often accomplished by redaction, that is, by simply removing information. This can be useful legally and can reduce risk in practice by making it more difficult to associate sensitive information with particular individuals.
It’s particularly useful to redact information that was received during data collection but wasn’t intended to be measured as part of the research design. For example, we might receive identifying or sensitive information in open-ended responses where it wasn’t anticipated.
Redaction may be sufficient for legal purposes in some circumstances, but by itself it generally does not reliably control the risk to individuals.
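As a rough illustration of redacting unanticipated identifiers from open-ended responses, the sketch below strips email addresses and US-style phone numbers using regular expressions. The patterns and placeholder strings are assumptions chosen for this example. They miss names, addresses, and many other identifiers, which is one reason redaction alone does not reliably control risk.

```python
import re

# Simplified, illustrative patterns; real identifiers take many more forms.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def redact(text: str) -> str:
    """Replace matched email addresses and phone numbers with placeholders."""
    text = EMAIL.sub("[REDACTED EMAIL]", text)
    text = PHONE.sub("[REDACTED PHONE]", text)
    return text


response = "Reach me at jane.doe@example.org or 555-123-4567."
cleaned = redact(response)
```

Here `cleaned` no longer contains the email address or phone number, but a response such as "I'm the only nurse on the night shift" would pass through unchanged while still being identifying.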
What are some ways in which data can be de-identified? (Select all that apply)
Rewriting
Redaction/Removal
Hiding
Partitioning
Encryption
None of the above
What are some ways in which data can be de-identified? (Select all that apply)
Rewriting: NO
Redaction/Removal: YES
Hiding: NO
Partitioning: YES
Encryption: YES
None of the above