Data collection and preparation Flashcards

1
Q

ways to get data from online sources

A
  1. web scraping
  2. API access
    2.1 REST APIs (low-frequency information)
    2.2 Streaming APIs (high-frequency, real-time information)
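
A minimal web-scraping sketch that uses only Python's standard-library `html.parser`. The page is hard-coded so the example runs offline; a real scraper would download it first (for example with `urllib.request.urlopen(url).read().decode()`). `LinkCollector` and `scrape_links` are hypothetical names for illustration. The example only extracts links, but the same handler pattern works for any tag.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (href, link text) pairs for every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []            # collected (href, text) pairs
        self._current_href = None  # href of the <a> tag we are inside, if any
        self._text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) tuples
            self._current_href = dict(attrs).get("href")
            self._text_parts = []

    def handle_data(self, data):
        if self._current_href is not None:
            self._text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            self.links.append((self._current_href, "".join(self._text_parts).strip()))
            self._current_href = None


def scrape_links(html):
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hard-coded stand-in for a downloaded page.
page = """
<html><body>
  <h1>Products</h1>
  <a href="/item/1">Laptop</a>
  <a href="/item/2">Phone</a>
</body></html>
"""
print(scrape_links(page))  # [('/item/1', 'Laptop'), ('/item/2', 'Phone')]
```

An API, by contrast, usually returns structured data (often JSON) directly, so no HTML parsing step is needed.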
2
Q

Types and properties of data (the four V's)

A
  1. volume
    - data at rest (space)
  2. velocity
    - data in motion (how fast)
  3. variety
    - data in many forms
  4. veracity
    - data in doubt (uncertainty)
3
Q

Data Structure: cross-sectional vs transactional vs panel data

A

cross-sectional: one observation per individual, describing attributes that rarely change (e.g. age, gender)
transactional: one observation represents one transaction
panel: one observation represents one individual during a time period (e.g. a monthly bill)
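
A small sketch of how the three structures relate. The customer names, months and amounts are hypothetical. Transactional rows (one per purchase) are summed into panel rows (one per customer per month, like a monthly bill), and those are collapsed into a cross-sectional table (one row per customer). Total spend stands in for a per-customer attribute here.

```python
from collections import defaultdict

# Hypothetical transactional data: one row per purchase (customer, month, amount).
transactions = [
    ("alice", "2024-01", 20.0),
    ("alice", "2024-01", 15.0),
    ("alice", "2024-02", 30.0),
    ("bob", "2024-01", 40.0),
    ("bob", "2024-02", 10.0),
    ("bob", "2024-02", 5.0),
]

# Panel data: one row per customer per month.
panel = defaultdict(float)
for customer, month, amount in transactions:
    panel[(customer, month)] += amount

# Cross-sectional data: one row per customer (here, total spend over all months).
cross_section = defaultdict(float)
for (customer, _month), total in panel.items():
    cross_section[customer] += total

print(dict(panel))
# {('alice', '2024-01'): 35.0, ('alice', '2024-02'): 30.0, ('bob', '2024-01'): 40.0, ('bob', '2024-02'): 15.0}
print(dict(cross_section))  # {'alice': 65.0, 'bob': 55.0}
```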

4
Q

Structured data vs Unstructured data

A

Structured data:
- qualitative/categorical data
- quantitative data
Unstructured data:
- text-based documents
- images/videos/sound

5
Q

measurement bias

A

1. recall bias: respondents recall some events more vividly than others
2. sensitive questions: respondents may not report data accurately
3. faulty equipment: instruments record inaccurate values

6
Q

two groups of data analysis methods

A
  1. unsupervised methods (no specific target variable)
    - affinity grouping
    - similarity matching
    - clustering
    - sentiment analysis
  2. supervised methods (specific target variable)
    - predictive modeling
    - causal modeling
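
To illustrate "no specific target variable", here is a minimal 1-D k-means clustering sketch. k-means is one common clustering algorithm, chosen here for illustration. It groups customers using only their values, with no label to predict. The spend figures, starting centers and the `kmeans_1d` name are hypothetical.

```python
def kmeans_1d(values, centers, iterations=10):
    """Plain k-means on 1-D data: uses only the values, no target variable."""
    centers = list(centers)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to its cluster mean (keep it if the cluster is empty).
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters


# Hypothetical monthly spend of eight customers; nobody is labeled.
spend = [12, 15, 11, 14, 80, 85, 90, 78]
centers, clusters = kmeans_1d(spend, centers=[0, 100])
print(centers)   # [13.0, 83.25]
print(clusters)  # [[12, 15, 11, 14], [80, 85, 90, 78]]
```

A supervised method would instead be given a target column (e.g. whether each customer churned) and learn to predict it.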
7
Q

types of predictive modeling

A
  1. regression
    predict the numerical value of some variable
  2. classification
    predict which of a small set of classes an individual belongs to
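
A sketch contrasting the two predictive tasks on toy, hypothetical data. Regression is shown with ordinary least squares for a straight line, which predicts a number. Classification is shown with a nearest-centroid rule, which predicts one of a small set of classes. Nearest-centroid is a swapped-in technique chosen for brevity, not the only way to classify.

```python
def fit_line(xs, ys):
    """Regression: ordinary least squares for y = a + b*x (numeric target)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    b = cov / var
    a = mean_y - b * mean_x
    return a, b


def nearest_centroid(train, x):
    """Classification: pick the class whose mean feature value is closest to x."""
    means = {}
    for label in {lbl for _, lbl in train}:
        vals = [v for v, lbl in train if lbl == label]
        means[label] = sum(vals) / len(vals)
    return min(means, key=lambda lbl: abs(x - means[lbl]))


# Hypothetical regression data: ad spend (x) vs. sales (y).
a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a + b * 5)  # 11.0  (predicted sales for x = 5)

# Hypothetical classification data: monthly usage hours labeled by outcome.
train = [(20, "churn"), (25, "churn"), (70, "stay"), (80, "stay")]
print(nearest_centroid(train, 30))  # churn
```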