lecture 4 (1) Flashcards

1
Q

What are data driven studies good for?

A

Classification
Prediction
Causation

2
Q

Classification

A

Identifying which of a set of categories an observation belongs to

E.g. is this email spam or not
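
The spam example can be sketched as a toy classifier. This is only an illustration, not how real spam filters work: the keyword list, the threshold, and the function name `classify_email` are all assumptions made up for this card.

```python
# Hypothetical keyword list; a real filter would learn such signals from labelled emails.
SPAM_WORDS = {"free", "winner", "prize", "urgent"}

def classify_email(text: str, threshold: int = 2) -> str:
    """Assign an email to one of two categories: 'spam' or 'not spam'."""
    words = text.lower().split()
    # Count words that match the keyword list, ignoring surrounding punctuation.
    hits = sum(1 for w in words if w.strip(".,!?:;") in SPAM_WORDS)
    return "spam" if hits >= threshold else "not spam"

print(classify_email("URGENT: you are a winner, claim your free prize!"))
print(classify_email("Lecture 4 slides are attached."))
```

The point is only that a classifier maps each observation (an email) to exactly one category from a fixed set.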

3
Q

Prediction

A

Making projections (not necessarily into the future) about the possible values of a target variable

E.g. What is the next word you will write, given the words you have already written
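
A minimal sketch of the next-word example, assuming a simple bigram count model (this technique is chosen here for illustration; the card does not name one). The corpus and the function names `train_bigrams` and `predict_next` are hypothetical.

```python
from collections import Counter, defaultdict

def train_bigrams(text: str) -> dict:
    """Count, for every word, which words follow it in the text."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following

def predict_next(following: dict, word: str):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    counts = following.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# Made-up toy corpus standing in for "the words you have already written".
corpus = "the cat sat on the mat and the cat slept"
model = train_bigrams(corpus)
print(predict_next(model, "the"))
```

Here the target variable is the next word, and the projection is based only on counts in the data seen so far.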

4
Q

Causation

A

Making inferences about whether and how certain variables causally affect other variables

E.g. how do the eating habits in my town causally impact the inflow of patients at the hospital

5
Q

Data science

A

The study and application of algorithmic, statistical and mathematical techniques for data mining

6
Q

Data mining

A

The art of finding useful patterns in very large sets of data (similar to “exploratory data analysis” in statistics)

7
Q

Data

A

Public records produced by sensory observation or by some measuring device

E.g. your clicks and searches online, your choices at the supermarket, etc.

8
Q

Algorithm

A

An explicit set of step-by-step instructions for answering some question, or for performing some task

E.g. for tying your shoes, for making spaghetti alla carbonara, for multiplying two numbers, etc.
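
As one possible illustration of "step-by-step instructions", multiplying two numbers can be written as repeated addition. This is a sketch under the assumption of non-negative whole numbers; it is not the only (or the fastest) multiplication algorithm, and the function name `multiply` is made up for this card.

```python
def multiply(a: int, b: int) -> int:
    """Multiply two non-negative integers by repeated addition."""
    if a < 0 or b < 0:
        raise ValueError("this sketch only handles non-negative integers")
    total = 0
    # Step: add `a` to the running total, and repeat this `b` times.
    for _ in range(b):
        total += a
    return total

print(multiply(6, 7))
```

Every step is explicit and leaves nothing to judgment, which is what makes it an algorithm.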

9
Q

example from text (autonomous cars)

A

If an autonomous car hits a person, who is responsible?

10
Q

When does data count as big data?

A

Large datasets (large = the size of the files used to archive and distribute the data)

(typically) in digital format

Efficiently (=in a reasonable amount of time) analyzable with computational standards

11
Q

How does data become big?

A

Data-sharing and data-producing practices are supported by political and economic interests

– Before the 19th century: data were mostly private, gathered and used by scientists (e.g., astronomical data) or the administrators of a state (e.g., demographics).
– 1900s: international institutions (such as the UN) gathering and spreading information on health, employment, migration, etc. to base policy on.
– 2000s: corporations (e.g., Google, Amazon, TikTok) creating and controlling data left by billions of people on the Internet.

12
Q

Why should we care about big data?

A

Ask yourself: Would you mind if I downloaded the content of your phone and used it for purposes you may not be aware of? Would you mind if I used everything you wrote online to train ChatGPT? Would you mind if you had to run a study on nutrition and healthcare but could not access or use the relevant data (because of privacy rules or because you do not have the right computing infrastructure)?

13
Q

One common misunderstanding of the nature of data science:

A

Data reflects objective truth

14
Q

Data should be

A

Found and stored

E.g. various legal and IT constraints, including privacy laws on data protection, on commercialization of data collection and distribution, availability of suitable technology

Data should be analysed, and all tools for data analysis make assumptions (about the statistical structure of the dataset, about how to weight different sources of data)

15
Q

That “data reflects objective truth” could mean that data is simply “out there in the world uncontaminated”, free from human biases and theorising

But this understanding is misguided because of

A

Fake data (e.g. (chat)bots)

Incomplete data (e.g. missing records)

16
Q

Are data uncontaminated?

A

Culturally and historically situated data
E.g. linguistic data reflecting local or past social norms – Is the word "doctor" in contemporary English more likely to refer to a man or a woman?

17
Q

Upshots

A

Data does not speak for itself

Studies based on big data make (often implicit or opaque) assumptions

Assumptions might or might not be reasonable, just like in any other type of research, but do have real-world consequences
