Chapter 9 Flashcards

1
Q

What is Big Data?

A

Data sets so large and complex that traditional data-processing tools struggle to handle them.

2
Q

Where does Big Data come from?

A

Social media,
Smartphones,
Sensors.

3
Q

What is primary data in Big Data?

A

Data collected specifically for the research purpose at hand.

4
Q

What is secondary data in Big Data?

A

Data not specifically collected for the research at hand; it was originally gathered for another purpose.

5
Q

Which three characteristics of Big Data are good for research?

A

1: Big
2: Always on
3: Nonreactive

6
Q

Which seven characteristics of Big Data are bad for research?

A

1: Incomplete
2: Inaccessible
3: Nonrepresentative
4: Drifting
5: Algorithmically confounded
6: Dirty
7: Sensitive

7
Q

Why is Big an advantage?

A

A large data set contains enough observations to study rare events and heterogeneity between subgroups.

8
Q

Why is always on an advantage?

A

You have real-time measurements (e.g. for spotting trends).

9
Q

Why is nonreactive an advantage?

A

Measurement through Big Data sources is less likely to change people's behavior.

10
Q

Why is incomplete a disadvantage?

A

Important data may be left out.

11
Q

Why is inaccessible a disadvantage?

A

Legal and compliance barriers restrict access to the data.

12
Q

Why is nonrepresentative a disadvantage?

A

The data does not necessarily reflect the whole population. E.g. consumers say reviews are important, but the review data is not always valid.

13
Q

Why is drifting a disadvantage?

A

A Big Data source can change over time: its users, how it is used, or the platform itself.

14
Q

Why is algorithmically confounded a disadvantage?

A

The design of the platform can change behavior. (Facebook encourages users to have at least 20 friends, so the low end of friend counts is hard to study.)

15
Q

Why is dirty a disadvantage?

A

Big Data can be loaded with junk.

16
Q

Why is sensitive a disadvantage?

A

Big Data can contain sensitive information.

17
Q

How can you learn from Big Data?

A

Through measurement, prediction, and experiments.

18
Q

What can you do to fight against overfitting?

A

Cross-validation

Regularization

19
Q

What is cross-validation?

A

Validating the model on another data set, one that was not used to fit it.
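
A minimal sketch of the idea, assuming k-fold cross-validation and a deliberately trivial model that always predicts the mean of its training data. The function names (k_fold_splits, cross_validate_mean_model) and the toy data are illustrative assumptions, not part of the chapter.

```python
def k_fold_splits(n, k):
    """Yield (train_indices, test_indices) for k contiguous, roughly equal folds."""
    indices = list(range(n))
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def cross_validate_mean_model(y, k):
    """Average held-out mean squared error of a 'predict the training mean' model."""
    fold_errors = []
    for train, test in k_fold_splits(len(y), k):
        prediction = sum(y[i] for i in train) / len(train)  # fit on training folds only
        mse = sum((y[i] - prediction) ** 2 for i in test) / len(test)  # score on held-out fold
        fold_errors.append(mse)
    return sum(fold_errors) / len(fold_errors)


y = [2.0, 4.0, 6.0, 8.0]  # hypothetical toy outcomes
cv_error = cross_validate_mean_model(y, k=2)
print(cv_error)
```

The point of the sketch is that every observation is scored by a model that never saw it during fitting, so an overfitted model cannot hide behind its training error.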

20
Q

What is regularization?

A

Penalizing model complexity, which raises the bar a variable must clear to stay in the model.
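
One common form of regularization adds a penalty on coefficient size to the fitting objective. The sketch below assumes ridge (L2) regression with a single predictor and no intercept, where minimizing sum((y - b*x)^2) + penalty * b^2 gives b = sum(x*y) / (sum(x^2) + penalty). The function name ridge_slope and the data are illustrative assumptions; the chapter may have a different penalty in mind.

```python
def ridge_slope(x, y, penalty):
    """Closed-form ridge estimate of the slope for one predictor, no intercept."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    # A larger penalty inflates the denominator and shrinks the slope toward zero.
    return sxy / (sxx + penalty)


x = [1.0, 2.0, 3.0]  # hypothetical predictor values
y = [2.0, 4.0, 6.0]  # hypothetical outcomes, exactly y = 2x

unpenalized = ridge_slope(x, y, penalty=0.0)
penalized = ridge_slope(x, y, penalty=14.0)
print(unpenalized, penalized)
```

With no penalty the slope fits the data exactly; with the penalty it is pulled toward zero, which trades a little bias for less sensitivity to noise.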

21
Q

What is MapReduce?

A

A programming model for processing large data sets in parallel across many machines.
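
A rough sketch of the three usual phases (map, shuffle, reduce), using a word count as the example. It runs sequentially in a single process purely for illustration; in a real MapReduce system the map and reduce calls are independent and would be distributed over many machines. All function names here are illustrative assumptions, not the API of any particular framework.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(mapped_pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in mapped_pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # Each map_phase call depends only on its own document, so these calls
    # could run in parallel; here they run one after another.
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data is big", "data is dirty"])
print(counts)
```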

22
Q

Data can be big in which two ways?

A

Tall (many observations)

Wide (many variables, often with relatively few observations)

23
Q

What is a lift?

A

How often one variable occurs relative to another variable.
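
As a sketch, assuming the card refers to lift as used in association-rule analysis: lift(A, B) = P(A and B) / (P(A) * P(B)). A value above 1 means A and B occur together more often than expected if they were independent. The function name and the basket data are hypothetical.

```python
def lift(transactions, item_a, item_b):
    """Lift of item_a and item_b: observed co-occurrence over expected co-occurrence."""
    n = len(transactions)
    count_a = sum(1 for t in transactions if item_a in t)
    count_b = sum(1 for t in transactions if item_b in t)
    count_both = sum(1 for t in transactions if item_a in t and item_b in t)
    if count_a == 0 or count_b == 0:
        raise ValueError("lift is undefined when an item never occurs")
    # (count_both / n) / ((count_a / n) * (count_b / n)) simplifies to this:
    return (count_both * n) / (count_a * count_b)


# Hypothetical shopping baskets
baskets = [
    {"bread", "butter"},
    {"bread", "butter"},
    {"bread"},
    {"milk"},
]

bread_butter = lift(baskets, "bread", "butter")
bread_milk = lift(baskets, "bread", "milk")
print(bread_butter, bread_milk)
```

Here bread and butter appear together more often than independence would predict, so their lift is above 1, while bread and milk never co-occur, so their lift is 0.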