Session 2.1 Flashcards

1
Q

Data Structure (Volume/Velocity)

A
  • Cross sectional
  • Transactional
  • Panel
2
Q

Cross sectional

A

Data that (almost) never changes (e.g., city names and locations, a customer’s birth date).

3
Q

Transactional

A

One observation represents one transaction (e.g., a website visit or a purchase).

4
Q

Panel

A

One observation represents one individual during one time period (e.g., a monthly bill, website visits per week).
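
As a rough illustration of how transactional and panel data relate, the sketch below aggregates visit-level records into one row per customer per week. The customer names, week labels, and the `visits` / `panel` variables are all made up for the example.

```python
from collections import Counter

# Hypothetical transactional data: one row per website visit (customer, week).
visits = [
    ("alice", "2024-W01"),
    ("alice", "2024-W01"),
    ("bob", "2024-W01"),
    ("alice", "2024-W02"),
]

# Panel data: one entry per (customer, week), holding the number of visits.
panel = Counter(visits)

for (customer, week), n_visits in sorted(panel.items()):
    print(customer, week, n_visits)
```

Each key of `panel` is one individual in one time period, which matches the panel definition on this card.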

5
Q

Data Structure: Tidy Data

Rules?

A

Put your data on a single table according to the following rules:

1 Each variable must have its own column.
2 Each observation must have its own row.
3 Each value must have its own cell.
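
The three rules can be illustrated with a small reshaping sketch. It assumes a hypothetical "wide" sales table in which the variable "year" is hidden in the column names; the helper `to_tidy` and all data values are invented for the example.

```python
# Hypothetical "wide" table: the variable "year" is spread across column names.
wide = [
    {"city": "Lisbon", "2022": 10, "2023": 12},
    {"city": "Porto", "2022": 7, "2023": 9},
]


def to_tidy(rows, id_column, variable_name, value_name):
    """Reshape wide rows so that each (id, variable) pair becomes its own row."""
    tidy_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the id column is repeated on every output row
            tidy_rows.append(
                {id_column: row[id_column], variable_name: column, value_name: value}
            )
    return tidy_rows


# After reshaping: one column per variable (city, year, sales), one row per observation.
tidy = to_tidy(wide, "city", "year", "sales")
for row in tidy:
    print(row)
```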

6
Q

Structured Data (Variety)

  • Qualitative/Categorical Data
A

Nominal: categories have no natural order
-> e.g., race, gender, country

Ordinal: categories have a natural order
-> e.g., age bracket, satisfaction level
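
One way to see the difference in practice: nominal values can only be compared for equality and counted, while ordinal values can also be sorted once their order is stated explicitly. The sketch below assumes a hypothetical three-level satisfaction scale and made-up country codes.

```python
from collections import Counter

# Nominal: no natural order, so counting is meaningful but sorting by "size" is not.
countries = ["PT", "DE", "PT", "FR"]
country_counts = Counter(countries)

# Ordinal: the order must be declared, because alphabetical order would be wrong.
SATISFACTION_ORDER = ["low", "medium", "high"]  # assumed scale for illustration
rank = {level: position for position, level in enumerate(SATISFACTION_ORDER)}

answers = ["high", "low", "medium", "low"]
sorted_answers = sorted(answers, key=lambda level: rank[level])

print(country_counts)
print(sorted_answers)
```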

7
Q

Structured Data (Variety)

  • Quantitative Data
A

Discrete: a countable number of distinct values
-> e.g., age, number of kids

Continuous: any value within an interval
-> e.g., wage, temperature

8
Q

Unstructured Data (Variety)

A

  • Text-based documents (e.g., tweets, webpages, complaints, emails)
  • Images / Videos

9
Q

Unstructured Data (Variety)

Methods to transform unstructured data into structured data:

A
  • Topic Modeling (text)
  • Sentiment Analysis (text)
  • Feature extraction (image / video / sound)
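
To make the idea of turning text into structured data concrete, here is a deliberately naive sentiment sketch. The word lists, the `sentiment_score` function, and the sample texts are assumptions for illustration only; real sentiment analysis typically relies on trained models or curated lexicons.

```python
# Toy word lists, invented for this example (not a real sentiment lexicon).
POSITIVE = {"great", "love", "fast"}
NEGATIVE = {"broken", "slow", "refund"}


def sentiment_score(text):
    """Count positive words minus negative words in a whitespace-split text."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


# Unstructured input: free-text customer messages.
messages = ["Love the fast delivery", "Arrived broken and support was slow"]

# Structured output: one row per message with a numeric sentiment column.
structured = [{"text": m, "sentiment": sentiment_score(m)} for m in messages]
for row in structured:
    print(row)
```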
10
Q

Data Quality (Veracity)

Data quality can be affected in two major ways:

A
  1. Missing Data
  2. Measurement Error

11
Q

Missing Data (Veracity)

A
  • Missing observations
  • Missing values in some observations

12
Q

Reasons for missing data (Veracity)

A
  1. Missing at random

If data are missing at random, the remaining observations are still a representative sample of the population

Simplest Solution: listwise deletion, i.e., delete all observations that do not have values for all variables in the analysis

  2. Missing not at random

If data are missing not at random, then the remaining observations are not a representative sample of the population

No Simple Solution
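
A minimal sketch of listwise deletion as described on this card, assuming missing values are stored as `None`. The rows and the variable names `age` and `wage` are hypothetical.

```python
# Hypothetical observations; None marks a missing value.
rows = [
    {"age": 34, "wage": 2100.0},
    {"age": None, "wage": 1800.0},
    {"age": 51, "wage": None},
    {"age": 28, "wage": 1950.0},
]

# Variables used in the analysis; a row must have a value for every one of them.
analysis_variables = ["age", "wage"]

# Listwise deletion: keep only rows with no missing value in the analysis variables.
complete_cases = [
    row for row in rows
    if all(row[variable] is not None for variable in analysis_variables)
]

print(len(rows), "rows before,", len(complete_cases), "rows after deletion")
```

This is only defensible when the data are missing at random; otherwise the remaining rows are no longer representative.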

13
Q

Selection Bias (Veracity) occurs…

A

when the sampling procedure is not random, and thus the sample is not representative of the population

14
Q

Selection Bias (Veracity)

A
  1. Self-selection

some members of the population are more likely to be included in the sample because of their characteristics

e.g., participants in a voluntary insurance program

  2. Attrition

some observations may be less likely to be present in the sample due to time constraints

e.g., tendency to look only at firms that survive
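
The firm-survival example can be sketched numerically. The returns below are invented, and the point is only that averaging over survivors gives a different (here, rosier) picture than averaging over all firms.

```python
from statistics import mean

# Hypothetical firms with an annual return and a survival flag.
firms = [
    {"annual_return": 0.12, "survived": True},
    {"annual_return": 0.08, "survived": True},
    {"annual_return": -0.30, "survived": False},
    {"annual_return": -0.10, "survived": False},
    {"annual_return": 0.05, "survived": True},
]

# Mean over the full population of firms.
population_mean = mean(f["annual_return"] for f in firms)

# Mean over the biased sample that contains only surviving firms.
survivor_mean = mean(f["annual_return"] for f in firms if f["survived"])

print(f"all firms: {population_mean:.3f}, survivors only: {survivor_mean:.3f}")
```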

15
Q

Measurement Error (Veracity) occurs…

A

when the collected data contains non-random errors

16
Q

Measurement Error (Veracity)

A
  1. Recall bias

respondents recall some events more vividly than others

e.g., child deaths by guns vs. swimming pools

  2. Sensitive questions

respondents may not report data accurately

e.g., wages, health conditions

  3. Faulty equipment

equipment that exhibits systematic measurement errors

e.g., a thermometer that is off by 1 degree Celsius
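
The thermometer example can be sketched to show why systematic errors matter more than random ones: random noise tends to cancel out in an average, while a constant offset does not. All temperatures and noise values below are made up, and the noise is chosen to sum to exactly zero to keep the arithmetic transparent.

```python
from statistics import mean

# Hypothetical true temperatures in degrees Celsius.
true_temps = [20.0, 21.5, 19.0, 22.5]

# Random error: illustrative noise values that sum to zero.
noise = [0.5, -0.5, 0.25, -0.25]
noisy_readings = [t + e for t, e in zip(true_temps, noise)]

# Systematic error: a thermometer that always reads 1 degree too high.
BIAS = 1.0
biased_readings = [t + BIAS for t in true_temps]

# Error in the average reading under each scenario.
random_mean_error = mean(noisy_readings) - mean(true_temps)
systematic_mean_error = mean(biased_readings) - mean(true_temps)

print("mean error with random noise:", random_mean_error)
print("mean error with faulty thermometer:", systematic_mean_error)
```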