DS&E 3 Data Gathering Flashcards

Data Gathering: privacy (continued), bias and experimentation

1
Q

What is:

The main use of hashing?

A

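Hashing is a one-way transformation of plaintext into a message digest: the digest is easy to compute, but it is practically infeasible to recover the plaintext from it. The outcome is a string of fixed length, whatever the length of the input.
The main use of hashing is integrity: the same input always gives the same digest, so data (or a password) can be checked by comparing hashes, without needing to store or work with the plaintext.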
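A minimal sketch using Python's standard hashlib module with SHA-256, one common hash function chosen here only for illustration. It shows the two properties from the card: the digest always has the same length, and comparing digests reveals whether the data changed.

```python
import hashlib

def digest(data: str) -> str:
    """Return the SHA-256 digest of a string as a 64-character hex string."""
    return hashlib.sha256(data.encode("utf-8")).hexdigest()

short_digest = digest("hi")
long_digest = digest("a much longer message " * 1000)

# Fixed-length output regardless of input length.
print(len(short_digest), len(long_digest))  # 64 64

# Integrity check: a small change to the input gives a completely different digest.
original = digest("transfer 100 EUR to Alice")
received = digest("transfer 900 EUR to Alice")
print(original == received)  # False
```

There is no function that turns a digest back into the text; the only way "back" is guessing inputs and hashing them, which is what the attacks on the next cards do.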

2
Q

What is:

The main use of encryption?

A

Encryption is a two-way transformation of plaintext into ciphertext: with the right key, the ciphertext can be turned back into the plaintext. The outcome is a string of variable length, usually depending on the length of the message (and different messages give different ciphertexts under the same key).
The main use of encryption is confidentiality: only someone holding the key can decrypt the ciphertext back into the plaintext.
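Python's standard library has no general-purpose encryption, so the sketch below uses a one-time pad (XOR with a random key as long as the message). It was chosen only because it is short and self-contained; real systems use vetted algorithms such as AES or RSA through a cryptography library. It shows the two-way nature of encryption: the same key turns the ciphertext back into the plaintext, and the ciphertext length follows the message length.

```python
import secrets

def generate_key(length: int) -> bytes:
    """Random key as long as the message (a one-time pad requirement)."""
    return secrets.token_bytes(length)

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each byte of data with the matching key byte."""
    return bytes(d ^ k for d, k in zip(data, key))

message = "meet at noon".encode("utf-8")
key = generate_key(len(message))  # must be random and never reused

ciphertext = xor_bytes(message, key)    # encrypt
plaintext = xor_bytes(ciphertext, key)  # decrypt with the same key

print(len(ciphertext) == len(message))  # True: length follows the message
print(plaintext.decode("utf-8"))        # meet at noon
```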

3
Q

What is:

A “salt”?

A

A salt is a randomly generated string that is added to a password before hashing; the hash of the combined string is stored together with the salt. This makes the stored hashes more secure: two users with the same (even very popular) password get different hashes, so precomputed tables of password hashes no longer work without the salt. However, when the table with both the hash and the salt of each user leaks, an attacker can still build a separate rainbow table per user, hashing candidate passwords combined with that user's salt.
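A minimal sketch of salting, assuming the scheme described on the card: hash = SHA-256(salt + password), with a fresh random salt per user. The function names are hypothetical. Production systems typically also use a deliberately slow key-derivation function (for example hashlib.pbkdf2_hmac, bcrypt or Argon2) instead of a single SHA-256.

```python
import hashlib
import secrets
from typing import Optional, Tuple

def hash_password(password: str, salt: Optional[str] = None) -> Tuple[str, str]:
    """Return (salt, hash) where hash = SHA-256(salt + password)."""
    if salt is None:
        salt = secrets.token_hex(16)  # fresh random salt per user
    digest = hashlib.sha256((salt + password).encode("utf-8")).hexdigest()
    return salt, digest

def verify(password: str, salt: str, stored_hash: str) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    return secrets.compare_digest(hash_password(password, salt)[1], stored_hash)

# Two users with the same popular password get different stored hashes.
salt_a, hash_a = hash_password("123456")
salt_b, hash_b = hash_password("123456")
print(hash_a == hash_b)                   # False
print(verify("123456", salt_a, hash_a))   # True
print(verify("letmein", salt_a, hash_a))  # False
```

The salt is stored in plain sight next to the hash; it does not need to be secret, it only needs to be different per user.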

4
Q

What is:

A rainbow table?

A

A rainbow table is a precomputed table of a subset of all possible (popular) passwords and their respective hashes, like a dictionary. It is used to break into accounts that use any of these passwords: the attacker checks whether a hash in a leaked database matches a hash in the rainbow table and then uses the corresponding password to log in.
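A simplified sketch of the attack described on the card, using a plain precomputed dictionary from hash to password. Real rainbow tables store chains of hashes to trade lookup time for memory, but the idea of matching leaked hashes against precomputed ones is the same. The password list and the leaked database are hypothetical.

```python
import hashlib

def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Precomputed lookup table: hash -> password, for a few popular passwords.
popular_passwords = ["123456", "password", "qwerty", "letmein", "iloveyou"]
lookup_table = {sha256_hex(pw): pw for pw in popular_passwords}

# Hypothetical leaked database of unsalted password hashes.
leaked = {
    "alice": sha256_hex("qwerty"),
    "bob": sha256_hex("correct horse battery staple"),
}

for user, stored_hash in leaked.items():
    cracked = lookup_table.get(stored_hash)  # None if the password is not in the table
    print(user, cracked)
# alice qwerty
# bob None
```

If each hash had been computed with a random per-user salt, this single precomputed table would not match any entry.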

5
Q

What is the main issue if a user asks to ‘delete all their data’?

A

Deleting all of a user's data at once is cumbersome, because the data may have been downloaded many times, so there are many copies on multiple devices (e.g. company laptops). A solution is to refer to the user everywhere by a hashed personal ID and to keep the only table linking these hashes to the real personal IDs in one central database. When the user's entry in that central table is deleted, the remaining copies can no longer be linked to the person (pseudonymization).
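A minimal sketch of this deletion strategy, under assumptions not in the card: the pseudonym is a salted SHA-256 of the personal ID, and the central table (an in-memory dict here) stores the salt and pseudonym per person. All names are hypothetical. Once the central row is deleted, the pseudonym in a scattered copy cannot be traced back.

```python
import hashlib
import secrets

# Central, access-controlled table: real ID -> (salt, pseudonym).
# This is the only place that links a pseudonym back to a person.
central_table: dict[str, tuple[str, str]] = {}

def register(person_id: str) -> str:
    """Create and store a salted-hash pseudonym for a person."""
    salt = secrets.token_hex(16)
    pseudonym = hashlib.sha256((salt + person_id).encode("utf-8")).hexdigest()
    central_table[person_id] = (salt, pseudonym)
    return pseudonym

def lookup(pseudonym: str):
    """Return the real ID behind a pseudonym, or None if it is no longer linked."""
    for person_id, (_, stored) in central_table.items():
        if stored == pseudonym:
            return person_id
    return None

def forget(person_id: str) -> None:
    """Delete the single central row; scattered copies become unlinkable."""
    central_table.pop(person_id, None)

pseudonym = register("alice@example.com")
laptop_copy = [{"user": pseudonym, "purchase": "book"}]  # exported copy

before = lookup(laptop_copy[0]["user"])
forget("alice@example.com")
after = lookup(laptop_copy[0]["user"])
print(before, after)  # alice@example.com None
```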

6
Q

What is:

Pseudonymization?

A

Pseudonymization is replacing personal data with a hashed value (a pseudonym) throughout the system, while the table that links the hashes back to the original personal data is kept separate and secure.
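A minimal sketch of pseudonymizing a data set. One assumption beyond the card: it uses a keyed hash (HMAC-SHA256 with a secret key) rather than a plain hash, because a plain hash of a guessable value such as an email address can be reversed by hashing candidate values, much like a rainbow table. The records and key handling are hypothetical.

```python
import hashlib
import hmac
import secrets

# Secret key, stored separately from the pseudonymized data (hypothetical setup).
SECRET_KEY = secrets.token_bytes(32)

def pseudonym(value: str) -> str:
    """Keyed hash (HMAC-SHA256) of a personal identifier."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()

records = [
    {"email": "alice@example.com", "page": "/home"},
    {"email": "bob@example.com", "page": "/shop"},
    {"email": "alice@example.com", "page": "/cart"},
]

# Replace the identifier everywhere in the working data set.
pseudonymized = [{"user": pseudonym(r["email"]), "page": r["page"]} for r in records]

# Mapping back to the real identity, to be kept secure and apart from the data.
mapping = {pseudonym(r["email"]): r["email"] for r in records}

# The same person still gets the same pseudonym, so records can be linked.
print(pseudonymized[0]["user"] == pseudonymized[2]["user"])  # True
print(any("@" in r["user"] for r in pseudonymized))          # False
```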

7
Q

What is:

Superposition?

A

Superposition is a concept in quantum computing: a quantum bit (qubit) can be in a combination of the states 0 and 1 at the same time. This superposition collapses into a single definite state when the qubit is observed (measured).
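A qubit's state is commonly written as α|0⟩ + β|1⟩ with |α|² + |β|² = 1; measuring it gives 0 with probability |α|² and 1 with probability |β|². The sketch below is only a classical simulation of one qubit (a hypothetical Qubit class, not real quantum hardware) to show the two ideas on the card: a superposition gives random outcomes, and after one measurement the state has collapsed, so repeated measurements agree.

```python
import math
import random

class Qubit:
    """Classical simulation of alpha|0> + beta|1> with |alpha|^2 + |beta|^2 = 1."""

    def __init__(self, alpha: complex, beta: complex):
        norm = math.sqrt(abs(alpha) ** 2 + abs(beta) ** 2)
        self.alpha, self.beta = alpha / norm, beta / norm

    def measure(self, rng: random.Random) -> int:
        # Outcome 0 with probability |alpha|^2, otherwise 1.
        outcome = 0 if rng.random() < abs(self.alpha) ** 2 else 1
        # Collapse: the qubit is now in the observed basis state.
        self.alpha, self.beta = (1, 0) if outcome == 0 else (0, 1)
        return outcome

rng = random.Random(42)

q = Qubit(1, 1)  # equal superposition of 0 and 1
first = q.measure(rng)
repeats = [q.measure(rng) for _ in range(5)]
print(first, repeats)  # every repeat equals the first outcome

# Many fresh qubits in equal superposition: about half measure as 1.
outcomes = [Qubit(1, 1).measure(rng) for _ in range(10_000)]
print(sum(outcomes) / len(outcomes))  # close to 0.5
```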

8
Q

Why is quantum computing a threat and a savior for data privacy?

A

For personal data protection, quantum computing is a threat because it could crack RSA, the popular algorithm for asymmetric encryption, thanks to Shor’s algorithm for factoring large numbers. However, current quantum computers have only a handful of qubits, whereas thousands are needed.
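Why factoring matters: in RSA, the private key can be computed from the two prime factors of the public modulus. The toy example below uses tiny textbook primes (completely insecure) and breaks the key by classical trial division, which only works because n is small. Shor's algorithm would make this factoring step efficient for realistic key sizes on a sufficiently large quantum computer; the factor function here is a hypothetical helper, not Shor's algorithm.

```python
def factor(n: int) -> tuple[int, int]:
    """Find two factors of n by trial division (only feasible for tiny n)."""
    for candidate in range(2, int(n ** 0.5) + 1):
        if n % candidate == 0:
            return candidate, n // candidate
    raise ValueError("n is prime")

# Toy RSA key pair (real RSA moduli are 2048 bits or more).
p, q = 61, 53
n = p * q                 # public modulus
e = 17                    # public exponent
phi = (p - 1) * (q - 1)
d = pow(e, -1, phi)       # private exponent (modular inverse, Python 3.8+)

message = 65
ciphertext = pow(message, e, n)  # anyone can encrypt with the public key (n, e)

# Attacker knows only (n, e, ciphertext); factoring n reveals the private key.
p2, q2 = factor(n)
d_recovered = pow(e, -1, (p2 - 1) * (q2 - 1))
print(pow(ciphertext, d_recovered, n))  # 65
```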

9
Q

What is a government backdoor, and why is it used?

A

A government backdoor is a way for government agencies to access messages that companies encrypt, as a means of detecting criminal or harmful activity. E.g. Israel used cellphone data, originally collected for counterterrorism, to detect contact between individuals and COVID-19 carriers.

10
Q

What is the argument in favor of government backdoors?

A

Criminals could be counting on the privacy “marketing pitch” of certain companies (e.g. strong encryption) to stay under the radar with criminal activities, and governments specifically want to prevent this. The argument boils down to the fact that “privacy isn’t absolute”.

11
Q

What is the argument against government backdoors?

A

The security gained through these backdoors can be at odds with:

1) Freedom: Government backdoors could lead to a loss of (perceived) freedom from government intrusion in our lives.
2) Security: Government backdoors are basically a special key to decrypt all messages on a service. But this key should be locked behind a safe wall, locked with… another key? What to do with all of these keys? Who precisely gets access?
3) Futility: Backdoors tend not to work well against most criminal activities, since criminals tend to resort to alternative communication, e.g. terrorists often speak in person rather than over an app.

12
Q

What is:

A database?

A

A database is a collection of independent works, data or other materials which are arranged in a systematic or methodical way and are individually accessible by electronic or other means.

13
Q

What is:

Web scraping?

A

Web scraping is the practice of automatically extracting public data from websites, e.g. collecting tweets with a script. Some platforms, such as Facebook, do not allow it.
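At its core, scraping means taking a page's HTML and pulling out the parts you want. A minimal sketch with Python's standard html.parser module, run on a hypothetical inline HTML snippet instead of a live site; the "tweet" class name and the TweetScraper class are invented for the example.

```python
from html.parser import HTMLParser

class TweetScraper(HTMLParser):
    """Collect the text inside <p class="tweet"> elements."""

    def __init__(self):
        super().__init__()
        self.in_tweet = False
        self.tweets: list[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs.
        if tag == "p" and ("class", "tweet") in attrs:
            self.in_tweet = True
            self.tweets.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_tweet = False

    def handle_data(self, data):
        if self.in_tweet:
            self.tweets[-1] += data

# Hypothetical page content; a real scraper would first download the page
# (e.g. with urllib.request.urlopen), subject to the site's terms.
page = """
<html><body>
  <p class="tweet">Data gathering lecture today</p>
  <p class="ad">Buy now!</p>
  <p class="tweet">Privacy matters</p>
</body></html>
"""

scraper = TweetScraper()
scraper.feed(page)
print(scraper.tweets)  # ['Data gathering lecture today', 'Privacy matters']
```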

14
Q

Why is the database right important for public data?

A

Public data is not free-to-copy data: you cannot extract substantial parts of an online/public database without the owner’s consent, even if its content is not protected by copyright.

15
Q

What are:

Application Programming Interfaces?

A

Application Programming Interfaces (APIs) are interfaces that help software developers build software on top of other software, or that give (controlled) access to a company’s data.
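Unlike scraping HTML, a data API is usually called with a documented request and returns structured data, often JSON. A minimal sketch of the client side; the endpoint URL, its parameters and the response fields are all hypothetical, and the response body is an inline sample so the code runs without network access (a real call could fetch the URL with urllib.request.urlopen).

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://api.example.com/v1/posts"  # hypothetical endpoint

def build_request_url(user: str, limit: int) -> str:
    """Encode the query parameters the (hypothetical) API expects."""
    return f"{BASE_URL}?{urlencode({'user': user, 'limit': limit})}"

def parse_response(body: str) -> list[str]:
    """Extract post texts from the (hypothetical) JSON response format."""
    data = json.loads(body)
    return [post["text"] for post in data["posts"]]

url = build_request_url("alice", 2)
print(url)  # https://api.example.com/v1/posts?user=alice&limit=2

sample_body = '{"posts": [{"text": "hello"}, {"text": "world"}]}'
print(parse_response(sample_body))  # ['hello', 'world']
```

Because the company decides which endpoints and fields exist, an API gives controlled access to its data.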

16
Q

What are the issues with Clearview AI?

A

Clearview AI scraped billions of pictures from websites like Facebook to create an immense database with pictures of people.
The issues are (1) that public data was gathered even though it is not free to copy, (2) that the databases for customers (government…) can leak, and (3) that not only law enforcement has access to the database.

17
Q

What is:

Bias?

A

Bias is systematic prejudice against a certain (often sensitive) group.
Different kinds: e.g. bias in the data sample, bias in the data or model against a sensitive group, the bias/variance trade-off and bias in the learned model…

18
Q

What is:

Cognitive bias?

A

Cognitive bias is a systematic prejudice in decision-making or thought processes e.g. anchoring bias or confirmation bias.

19
Q

What is:

Statistical bias?

A

Statistical bias is a systematic error in data collection and estimation, e.g. sampling bias and self-selection bias.
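A small simulation of self-selection bias, with an invented population and assumed response rates (80% for people with high scores, 10% for everyone else). A random sample estimates the population mean well, while the self-selected sample is systematically too high, no matter how many people respond.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical population: satisfaction scores from 1 to 10.
population = [rng.randint(1, 10) for _ in range(100_000)]

# Random sample: every person is equally likely to be included.
random_sample = rng.sample(population, 1_000)

# Self-selected sample: people with high scores are far more likely to
# respond to the survey (assumed response rates: 80% vs 10%).
self_selected = [s for s in population if rng.random() < (0.8 if s >= 8 else 0.1)]

print(round(statistics.mean(population), 2))
print(round(statistics.mean(random_sample), 2))  # close to the population mean
print(round(statistics.mean(self_selected), 2))  # systematically too high
```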

20
Q

What is:

A/B testing?

A

A/B testing is the practice of (randomly) dividing a group/sample/population into subgroups and giving each subgroup a different treatment, to detect the effect of the difference in treatment.
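A minimal sketch of the two steps of an A/B test: random assignment of users to treatments, then a check of whether the difference in outcomes is larger than chance. The two-proportion z-test is one common choice for comparing conversion rates, not the only one, and the conversion counts below are hypothetical numbers for illustration, not real data.

```python
import math
import random

def assign_groups(user_ids, seed: int = 0) -> dict[str, list]:
    """Randomly assign each user to treatment A or B."""
    rng = random.Random(seed)
    groups: dict[str, list] = {"A": [], "B": []}
    for uid in user_ids:
        groups[rng.choice("AB")].append(uid)
    return groups

def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Compare conversion rates of A and B; return (z statistic, two-sided p-value)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF: Phi(x) = (1 + erf(x / sqrt(2))) / 2.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

groups = assign_groups(range(2_000))
print(len(groups["A"]), len(groups["B"]))

# Hypothetical outcome counts: 120/1000 conversions with A, 150/1000 with B.
z, p = two_proportion_z_test(120, 1000, 150, 1000)
print(round(z, 2), round(p, 4))  # z is about 1.96, p just under 0.05
```

Randomization is what makes the comparison fair: any difference between the groups other than the treatment is due to chance, which the test accounts for.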

21
Q

What is:

C/D testing?

A

C/D testing is a bad practice of A/B testing, in which users are deceived by the different treatments, e.g. Facebook’s emotional contagion experiment and OKCupid’s bad-matches experiment.