lecture 4 (1) Flashcards

Question 1

Q

What are data-driven studies good for?

Answer

A

Classification
Prediction
Causation

Question 2

Q

Classification

Answer

A

Identifying which of a set of categories an observation belongs to

E.g. is this email spam or not?
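
As an illustration of the spam example, here is a minimal sketch of a rule-based classifier. The keyword list, the threshold, and the function name `classify_email` are hypothetical and chosen only for illustration; real spam filters are usually learned from data rather than written by hand.

```python
# Toy classifier: assigns an email to one of two categories.
# SPAM_WORDS and the threshold are assumptions made up for this sketch.
SPAM_WORDS = {"free", "winner", "prize", "urgent"}


def classify_email(text: str, threshold: int = 2) -> str:
    """Return 'spam' if at least `threshold` distinct spam keywords appear."""
    # Normalise each word: strip common punctuation and lowercase it.
    words = {w.strip(".,!?:;").lower() for w in text.split()}
    hits = len(words & SPAM_WORDS)
    return "spam" if hits >= threshold else "not spam"


print(classify_email("URGENT: you are a winner, claim your free prize!"))
print(classify_email("Lecture notes for week 4 are attached."))
```

The observation (an email) goes in, and one label from a fixed set of categories comes out, which is what makes this a classification task.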

Question 3

Q

Prediction

Answer

A

Making projections (not necessarily into the future) about the possible values of a target variable

E.g. what is the next word you will write, given the words you have already written?
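
The next-word example can be sketched with simple word-pair counts. This is a toy under stated assumptions: the corpus and the helper names `train_bigrams` and `predict_next` are hypothetical, and the predictor only looks at the single previous word.

```python
from collections import Counter, defaultdict


def train_bigrams(text: str) -> dict:
    """Count, for each word, which words follow it in the text."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following


def predict_next(following: dict, word: str):
    """Return the most frequent follower of `word`, or None if unseen."""
    counts = following.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


corpus = "the cat sat on the mat and the cat slept"  # made-up training text
model = train_bigrams(corpus)
print(predict_next(model, "the"))
```

Here the target variable is the next word, and the projection is based only on patterns in previously observed data.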

Question 4

Q

Causation

Answer

A

Making inferences about whether and how certain variables causally affect other variables

E.g. how do the eating habits in my town causally impact the inflow of patients at the hospital

Question 5

Q

Data science

Answer

A

The study and application of algorithmic, statistical and mathematical techniques for data mining

Question 6

Q

Data mining

Answer

A

The art of finding useful patterns in very large sets of data (similar to “exploratory data analysis” in statistics)

Question 7

Q

Data

Answer

A

Public records produced by sensory observation or by some measuring device

E.g. your clicks and searches online, your choices at the supermarket, etc.

Question 8

Q

Algorithm

Answer

A

An explicit set of step-by-step instructions for answering some question, or for performing some task

E.g. for tying your shoes, for making spaghetti alla carbonara, for multiplying two numbers, etc.
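
To make "explicit step-by-step instructions" concrete, here is one possible algorithm for multiplying two numbers, written as repeated addition. It is only a sketch: the function name `multiply` is hypothetical, it is restricted to non-negative integers, and it is not how computers multiply in practice.

```python
def multiply(a: int, b: int) -> int:
    """Multiply two non-negative integers by repeated addition."""
    if a < 0 or b < 0:
        raise ValueError("this sketch only handles non-negative integers")
    total = 0           # Step 1: start from zero.
    for _ in range(b):  # Step 2: repeat b times...
        total += a      # ...adding a to the running total.
    return total        # Step 3: report the result.


print(multiply(6, 7))
```

Every step is spelled out, and following the steps mechanically always produces the answer, with no judgment required.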

Question 9

Q

Example from the text (autonomous cars)

Answer

A

If an autonomous car hits a person, who is responsible?

Question 10

Q

When does data count as big data?

Answer

A

Large (=size of the files used to archive and distribute data) datasets

(typically) in digital format

Efficiently (=in a reasonable amount of time) analyzable with computational standards

Question 11

Q

How does data become big?

Answer

A

Data-sharing and data-producing practices are supported by political and economic interests

– Before the 19th century: data were mostly private, gathered and used by scientists (e.g., astronomical data) or the administrators of a state (e.g., demographics).
– 1900s: international institutions (such as the UN) gathering and spreading information on health, employment, migration, etc. to base policy on.
– 2000s: corporations (e.g., Google, Amazon, TikTok) creating and controlling data left by billions of people on the Internet.

Question 12

Q

Why should we care about big data?

Answer

A

Ask yourself: Would you mind if I downloaded the content of your phone and used it for purposes you may not be aware of? Would you mind if I used everything you wrote online to train ChatGPT? Would you mind if you had to run a study on nutrition and healthcare but could not access or use the relevant data (because of privacy rules, or because you do not have the right computing infrastructure)?

Question 13

Q

One common misunderstanding of the nature of data science

Answer

A

Data reflects objective truth

Question 14

Q

Data should be

Answer

A

Found and stored

This is subject to various legal and IT constraints, e.g. privacy laws on data protection, rules on the commercialization of data collection and distribution, and the availability of suitable technology

Analysed: all tools for data analysis make assumptions (about the statistical structure of the dataset, about how to weight different sources of data)

Question 15

Q

That “data reflects objective truth” could mean that data is simply “out there in the world uncontaminated”, free from human biases and theorising

But this understanding is misguided because of

Answer

A

Fake data (e.g. from (chat)bots)

Incomplete data (e.g. missing records)

Question 16

Q

Are data uncontaminated?

Answer

A

No: data are culturally and historically situated

E.g. linguistic data reflecting local or past social norms: is the word “doctor” in contemporary English more likely to refer to a man or a woman?

Question 17

Q

Upshots

Answer

A

Data does not speak for itself

Studies based on big data make (often implicit or opaque) assumptions

Assumptions might or might not be reasonable, just like in any other type of research, but do have real-world consequences

Question 18

Q

Answer

A

(18 cards)