DS&E 3 Data Gathering Flashcards
Data Gathering: privacy (continued), bias and experimentation
What is:
The main use of hashing?
Hashing is a one-way transformation of plaintext into a message digest, using a hash function that is computationally infeasible to reverse. The output is a string of fixed length, whatever the length of the input.
The main goal of hashing is integrity: since the same input always gives the same hash, you can check whether data (or a password) matches without working with the plaintext itself.
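A minimal sketch using Python's standard `hashlib` with SHA-256 (our choice of hash function; the card names none) shows the fixed-length output and how comparing digests checks integrity without keeping the plaintext around.

```python
import hashlib


def digest(text: str) -> str:
    """Return the SHA-256 message digest of a string as hex."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


short_msg = digest("hi")
long_msg = digest("a much longer message " * 100)

# Fixed length: SHA-256 always yields 256 bits = 64 hex characters.
print(len(short_msg), len(long_msg))  # 64 64

# Integrity check: the same input always gives the same digest,
# while a one-character change gives a completely different one.
print(digest("transfer 100 EUR") == digest("transfer 100 EUR"))  # True
print(digest("transfer 100 EUR") == digest("transfer 900 EUR"))  # False
```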
What is:
The main use of encryption?
Encryption is a two-way (reversible) transformation of plaintext into ciphertext using a key. The output is a string of variable length, usually depending on the length of the message (and, for a given key, different messages give different ciphertexts).
The main goal of encryption is confidentiality: only someone holding the right key can decrypt the ciphertext back into the plaintext message.
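Python's standard library has no general-purpose cipher, so this sketch uses a toy one-time pad (XOR with a random key as long as the message) purely to show the two-way, key-based nature of encryption. It is an illustration only; real systems would use a vetted cipher such as AES from a dedicated cryptography library.

```python
import secrets


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each byte of data with the matching byte of key."""
    return bytes(d ^ k for d, k in zip(data, key))


message = "meet at noon".encode("utf-8")

# One-time pad: a random key as long as the message, used only once.
key = secrets.token_bytes(len(message))

ciphertext = xor_bytes(message, key)    # encrypt
recovered = xor_bytes(ciphertext, key)  # decrypt: XOR with the same key undoes it

# The ciphertext length follows the message length (variable, unlike a hash).
print(len(ciphertext) == len(message))  # True
print(recovered.decode("utf-8"))        # meet at noon
```

Without `key`, the ciphertext reveals nothing beyond the message length; with it, the plaintext comes back exactly.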
What is:
A “salt”?
A salt is a randomly generated string that is usually combined with hashing: it is added to the password, and then the hash of that combined string is taken. The hash is thus more secure, since even commonly used passwords can’t be cracked with a precomputed table of hashes without the salt information. If the table with both the hash and the salt of a user leaks, however, an attacker can still set up a rainbow table for each user, hashing candidate passwords combined with that user’s salt.
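A minimal sketch of salting as the card describes it, assuming SHA-256 over `salt + password` and a 16-byte random salt (both our choices for illustration). It shows that two users with the same popular password end up with different stored hashes.

```python
import hashlib
import os


def hash_password(password: str, salt: bytes) -> str:
    """Hash salt + password with SHA-256 (illustration only)."""
    return hashlib.sha256(salt + password.encode("utf-8")).hexdigest()


def verify(attempt: str, salt: bytes, stored: str) -> bool:
    """Recompute the salted hash of a login attempt and compare it."""
    return hash_password(attempt, salt) == stored


# Two users who happen to pick the same popular password.
salt_alice = os.urandom(16)
salt_bob = os.urandom(16)
stored_alice = hash_password("123456", salt_alice)
stored_bob = hash_password("123456", salt_bob)

# Different random salts give different stored hashes for the same password.
print(stored_alice == stored_bob)  # False (with overwhelming probability)

print(verify("123456", salt_alice, stored_alice))   # True
print(verify("letmein", salt_alice, stored_alice))  # False
```

For real password storage, a deliberately slow function such as `hashlib.pbkdf2_hmac` or `hashlib.scrypt` is preferable to a single SHA-256, and `hmac.compare_digest` avoids timing leaks in the comparison.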
What is:
A rainbow table?
A rainbow table is a table with a subset of all possible (popular) passwords and their respective hashes, like a dictionary. It is used to break into accounts that use any of these passwords: the attacker checks whether a hash in a leaked database matches a hash in the rainbow table and then logs in with the corresponding password.
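The lookup idea can be sketched with a plain precomputed hash-to-password dictionary. This is a simplification: real rainbow tables store chains of hashes and reduction steps to save space, but the attack on unsalted hashes works the same way. The password list and leaked users are hypothetical.

```python
import hashlib


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Attacker's precomputed table: hash -> popular password.
popular = ["123456", "password", "qwerty", "letmein"]
lookup = {sha256_hex(pw): pw for pw in popular}

# Leaked, unsalted password hashes from a hypothetical user database.
leaked = {
    "alice": sha256_hex("qwerty"),
    "bob": sha256_hex("correct horse battery staple"),
}

# Any leaked hash that appears in the table reveals the password directly.
cracked = {user: lookup[h] for user, h in leaked.items() if h in lookup}
print(cracked)  # {'alice': 'qwerty'}
```

Bob's uncommon password is not in the table, so his hash stays uncracked; a per-user salt would have protected Alice too.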
What is the main issue if a user asks to ‘delete all their data’?
If a user asks to ‘delete all their data’ from the database at once, that is a cumbersome task, since their data may have been downloaded many times, so there can be many copies on multiple devices (e.g. company laptops). A solution could be to use a hashed personal ID throughout the system (pseudonymization): once the central database linking the hashes to the corresponding personal IDs is deleted, the remaining copies can no longer be linked to the subject.
What is:
Pseudonymization?
Pseudonymization is the practice of replacing personal data by a hashed value throughout the system, while keeping the table that links these hashes to the original information secure.
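A minimal sketch of pseudonymization with a central link table and a deletion request. As an assumption, each person gets a random salt stored only in the link table: with a plain unsalted hash, anyone who knows the ID could simply recompute the pseudonym, even after the table is deleted. The names `link_table`, `pseudonymize` and `erase` are hypothetical.

```python
import hashlib
import os

# Central, secured link table: personal ID -> (random salt, pseudonym).
link_table = {}


def pseudonymize(personal_id: str) -> str:
    """Return a salted-hash pseudonym for a personal ID, creating it once."""
    if personal_id not in link_table:
        salt = os.urandom(16)
        pseudonym = hashlib.sha256(salt + personal_id.encode("utf-8")).hexdigest()
        link_table[personal_id] = (salt, pseudonym)
    return link_table[personal_id][1]


def erase(personal_id: str) -> None:
    """Honour a deletion request by dropping the only link (and its salt)."""
    link_table.pop(personal_id, None)


# Copies exported to laptops only ever contain the pseudonym.
p = pseudonymize("NL-123456789")
exported_copy = [{"person": p, "purchase": "book"}]
print(pseudonymize("NL-123456789") == p)  # True: same person, same pseudonym

erase("NL-123456789")
# Without the salt, hashing the ID again cannot reproduce p, so the
# exported copies can no longer be linked back to the person.
print(any(pseudo == p for _, pseudo in link_table.values()))  # False
```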
What is:
Superposition?
Superposition is a concept in quantum computing meaning that a qubit (quantum bit) can be in a combination of two states (0 and 1) at the same time. This superposition collapses into a single state when the qubit is observed (measured).
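In the standard notation, a qubit’s state is written as a weighted combination of the basis states 0 and 1; measuring it collapses it to one of them, with probabilities given by the squared magnitudes of the weights:

```latex
\begin{aligned}
|\psi\rangle &= \alpha\,|0\rangle + \beta\,|1\rangle,
  \qquad \alpha, \beta \in \mathbb{C}, \quad |\alpha|^2 + |\beta|^2 = 1,\\
P(0) &= |\alpha|^2, \qquad P(1) = |\beta|^2.
\end{aligned}
```

For example, \alpha = \beta = 1/\sqrt{2} gives a 50/50 chance of reading 0 or 1.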
Why is quantum computing a threat and a savior for data privacy?
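For personal data protection, quantum computing is a threat because it could crack the popular RSA algorithm for asymmetric encryption thanks to Shor’s algorithm, which factors large numbers efficiently, while RSA’s security relies on factoring its public modulus being infeasible. However, current quantum computers only have a limited number of qubits, whilst thousands of (error-corrected) qubits are needed to break realistic RSA keys.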
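Why factoring breaks RSA can be sketched with textbook RSA on tiny primes (no padding, insecure sizes, illustration only; requires Python 3.9+). Classical trial division only works here because `n` is tiny; for a 2048-bit modulus it is hopeless, which is exactly the step Shor’s algorithm would make efficient on a large enough quantum computer.

```python
# Textbook RSA with tiny primes (toy example: no padding, insecure sizes).
p, q = 61, 53            # secret primes
n = p * q                # public modulus: 3233
e = 17                   # public exponent
phi = (p - 1) * (q - 1)  # 3120, computable only if you know p and q
d = pow(e, -1, phi)      # private exponent: 2753

message = 65
ciphertext = pow(message, e, n)  # encrypt with the public key (e, n): 2790


def factor(n: int) -> tuple[int, int]:
    """Find a factorization n = a * b by trial division (fine for tiny n)."""
    a = 2
    while n % a != 0:
        a += 1
    return a, n // a


# An attacker who can factor n rebuilds the private key from public data only.
a, b = factor(n)
d_attacker = pow(e, -1, (a - 1) * (b - 1))
print(pow(ciphertext, d_attacker, n))  # 65: the plaintext is recovered
```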
What is a government backdoor, and why is it used?
A government backdoor is a way for government agencies to access encrypted messages on companies’ services, as a means of detecting criminal or harmful activity. E.g. Israel used cellphone data, whose original purpose was counterterrorism, to detect contact between individuals and COVID-19 carriers.
What is the argument in favor of government backdoors?
Criminals could be counting on certain companies’ privacy “marketing pitch” to keep their criminal activities under the radar, which is exactly what governments want to prevent. The argument boils down to “privacy isn’t absolute”.
What is the argument against government backdoors?
The extra security offered by those backdoors can be at odds with:
1) Freedom: Government backdoors could reduce our perceived freedom from government intrusion in our lives.
2) Security: Government backdoors are basically a special key to decrypt all messages on a service. But this key must itself be kept behind a secure wall, locked with… another key? What to do with all of these keys? Who precisely gets access?
3) Futility: Backdoors tend not to work well against most criminal activity, since criminals tend to resort to alternative communication channels, e.g. terrorists meeting in person rather than talking over an app.
What is:
A database?
A database is a collection of independent works, data or other materials which are arranged in a systematic or methodical way and are individually accessible by electronic or other means.
What is:
Web scraping?
Web scraping is a practice that extracts public data from websites, e.g. collecting tweets that meet certain criteria. Not every platform allows it: Facebook, for example, does not.
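A minimal scraping sketch using Python’s standard `html.parser`. To stay self-contained it parses a hypothetical, hard-coded HTML snippet instead of downloading a page, and the `class="post"` markup is an assumption. Whether scraping a real site is allowed depends on its terms and on the database right.

```python
from html.parser import HTMLParser

# Hypothetical page snippet; a real scraper would first download the HTML.
PAGE = """
<ul>
  <li class="post">Data gathering lecture today</li>
  <li class="ad">Buy now!</li>
  <li class="post">Notes on pseudonymization</li>
</ul>
"""


class PostScraper(HTMLParser):
    """Collect the text of every <li class="post"> element."""

    def __init__(self) -> None:
        super().__init__()
        self.in_post = False
        self.posts: list[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "li" and ("class", "post") in attrs:
            self.in_post = True

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_post = False

    def handle_data(self, data):
        if self.in_post and data.strip():
            self.posts.append(data.strip())


scraper = PostScraper()
scraper.feed(PAGE)
print(scraper.posts)  # ['Data gathering lecture today', 'Notes on pseudonymization']
```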
Why is the database right important for public data?
Public data is not free-to-copy data: you cannot extract substantial parts of online/public databases without the owner’s consent, even if the content is not copyright-protected.
What are:
Application Programming Interfaces?
Application Programming Interfaces, or APIs, are tools that help software developers build software, and they can give access to a company’s data.
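A minimal sketch of accessing data through an API with Python’s standard `urllib`. The endpoint `https://api.example.com/v1/posts`, the bearer token and the JSON shape (`{"data": [{"text": ...}]}`) are all hypothetical; the request is only built, not sent, and a sample reply is parsed instead. Unlike scraping HTML, an API returns structured data on the provider’s terms.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint, for illustration only.
BASE_URL = "https://api.example.com/v1/posts"


def build_request(query: str, token: str) -> Request:
    """Prepare an authenticated GET request for the (hypothetical) API."""
    url = f"{BASE_URL}?{urlencode({'q': query, 'limit': 10})}"
    return Request(url, headers={"Authorization": f"Bearer {token}"})


def parse_response(body: str) -> list:
    """Extract the text fields from a (hypothetical) JSON reply."""
    return [item["text"] for item in json.loads(body)["data"]]


req = build_request("privacy", "my-token")
print(req.full_url)  # https://api.example.com/v1/posts?q=privacy&limit=10

# Sending would be urllib.request.urlopen(req); here we parse a sample reply.
sample = '{"data": [{"text": "first post"}, {"text": "second post"}]}'
print(parse_response(sample))  # ['first post', 'second post']
```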