Data collection and preparation Flashcards
Ways to get data from online sources
1 web scraping
2 API access
2.1 REST APIs (low-frequency information)
2.2 streaming APIs (continuous, real-time information)
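A minimal Python sketch of the access routes above, using only the standard library. The HTML page, `base_url`, the query parameters, and the newline-delimited JSON stream format are illustrative assumptions. Real sites and APIs define their own page structure, endpoints, authentication, and terms of use.

```python
import json
import urllib.parse
import urllib.request
from html.parser import HTMLParser


class LinkScraper(HTMLParser):
    """Web scraping sketch: collect the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def scrape_links(html):
    parser = LinkScraper()
    parser.feed(html)
    parser.close()
    return parser.links


def fetch_rest(base_url, params):
    """REST API sketch: one GET request per call, JSON body decoded to Python objects.

    base_url and params are hypothetical placeholders, not a real API.
    """
    url = base_url + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def read_stream(lines):
    """Streaming API sketch: decode records one at a time as they arrive.

    Assumes newline-delimited JSON (one object per line).
    """
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            yield json.loads(line)


page = '<html><body><a href="/a">A</a><p>text</p><a href="https://example.com/b">B</a></body></html>'
print(scrape_links(page))
print(list(read_stream(['{"x": 1}', "", '{"x": 2}'])))
```

Only `scrape_links` and `read_stream` run offline; `fetch_rest` needs a real endpoint. The REST helper makes one request per call, which suits low-frequency pulls, while the stream reader consumes records as an open connection delivers them.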
Types and properties of data
- volume: data at rest (space)
- velocity: data in motion (how fast)
- variety: data in many forms
- veracity: data in doubt (uncertainty)
Data structure: cross-sectional vs transactional vs panel data
cross-sectional: one observation per individual at a single point in time, describing attributes that almost never change
transactional: one observation represents one transaction
panel: one observation represents one individual during a time period (e.g., a monthly bill)
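To make the three structures concrete, here is a small Python sketch that rolls hypothetical transactional rows up into panel rows (one per customer per month) and then attaches cross-sectional attributes. The customer IDs, dates, amounts, and `birth_year` field are invented for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transactional data: one row per purchase.
transactions = [
    ("c1", date(2024, 1, 5), 20.0),
    ("c1", date(2024, 1, 19), 15.0),
    ("c1", date(2024, 2, 2), 30.0),
    ("c2", date(2024, 1, 7), 12.5),
]

# Hypothetical cross-sectional data: one row per customer, rarely changes.
customers = {"c1": {"birth_year": 1990}, "c2": {"birth_year": 1985}}


def to_panel(rows):
    """Aggregate transactions into one row per (customer, year, month)."""
    totals = defaultdict(float)
    for customer_id, day, amount in rows:
        totals[(customer_id, day.year, day.month)] += amount
    return dict(sorted(totals.items()))


def join_attributes(panel, attributes):
    """Attach each customer's cross-sectional attributes to every panel row."""
    return {key: {"total": total, **attributes[key[0]]} for key, total in panel.items()}


panel = to_panel(transactions)
joined = join_attributes(panel, customers)
for key, row in joined.items():
    print(key, row)
```

Going from transactional to panel data loses the detail of individual purchases but yields regular, comparable time periods per individual.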
Structured data vs Unstructured data
Structured data:
- qualitative/categorical data
- quantitative data
Unstructured data:
- text-based documents
- images/videos/sound
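A minimal Python sketch of the contrast: a structured record has fixed fields whose types separate categorical from quantitative data, while free text has no fields until some are derived from it. The record, the review text, and the word-count (bag-of-words) representation are illustrative choices. Splitting fields by Python type is a simplifying assumption, since numeric codes such as postal codes are really categorical.

```python
import re
from collections import Counter

# Hypothetical structured record: fixed fields with known types.
customer = {
    "segment": "gold",       # qualitative / categorical
    "region": "north",       # qualitative / categorical
    "age": 34,               # quantitative
    "monthly_spend": 120.5,  # quantitative
}

categorical = [k for k, v in customer.items() if isinstance(v, str)]
quantitative = [k for k, v in customer.items() if isinstance(v, (int, float))]

# Hypothetical unstructured text: no fixed fields, so features must be derived.
review = "Great service, great prices. Delivery was slow."


def bag_of_words(text):
    """Turn free text into word counts, a very simple structured representation."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


print(categorical, quantitative)
print(bag_of_words(review))
```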
Measurement bias
1 recall bias: respondents recall some events more vividly than others
2 sensitive questions: respondents may not report data accurately
3 faulty equipment: instruments systematically record inaccurate values
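For the faulty-equipment case, one simple way to quantify bias, assuming trusted reference values are available, is the average signed error between readings and references. The weights below are hypothetical. Recall bias and sensitive questions usually have no such reference, which is part of why they are hard to correct.

```python
from statistics import mean

# Hypothetical reference weights (kg) and readings from a miscalibrated scale.
true_values = [50.0, 62.5, 71.0, 80.0]
readings = [51.5, 64.0, 72.5, 81.5]


def measurement_bias(measured, reference):
    """Average signed error; a value far from 0 signals systematic bias."""
    return mean(m - r for m, r in zip(measured, reference))


bias = measurement_bias(readings, true_values)
corrected = [m - bias for m in readings]  # assumes a constant offset
print(bias, corrected)
```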
Two groups of data analysis methods
- unsupervised methods (no specific target variable)
  - affinity grouping
  - similarity matching
  - clustering
  - sentiment analysis
- supervised methods (specific target variable)
  - predictive modeling
  - causal modeling
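A short Python sketch of three unsupervised tasks, none of which uses a target variable: pair support for affinity grouping, cosine similarity for similarity matching, and a minimal k-means for clustering. The baskets and points are hypothetical, and k-means with fixed-seed random initialization is one common clustering method chosen here for illustration, not the only one.

```python
import math
import random
from collections import Counter
from itertools import combinations

# Affinity grouping: which items co-occur in the same basket (hypothetical baskets).
baskets = [{"bread", "milk"}, {"bread", "butter", "milk"}, {"beer", "chips"}, {"bread", "butter"}]


def pair_support(baskets):
    """Fraction of baskets that contain each pair of items."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n / len(baskets) for pair, n in counts.items()}


# Similarity matching: cosine similarity between two feature vectors.
def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


# Clustering: a minimal k-means on points given as tuples.
def kmeans(points, k, iterations=20, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to its cluster mean; keep the old one if a cluster is empty.
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


support = pair_support(baskets)
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
centroids, clusters = kmeans(points, k=2)
print(support)
print(sorted(centroids))
```

Requires Python 3.8 or later for `math.dist` and multi-argument `math.hypot`.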
Types of predictive modeling
- regression: predict the numerical value of some variable
- classification: predict which of a small set of classes an individual belongs to
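A minimal Python sketch of the two types: ordinary least squares with one predictor for regression (a numeric output) and k-nearest neighbors with a majority vote for classification (a class label). The data, the "churn"/"stay" labels, and the choice of these two specific algorithms are illustrative assumptions; many other models do the same jobs.

```python
import math
from collections import Counter


def fit_line(xs, ys):
    """Regression: ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def knn_classify(train, query, k=3):
    """Classification: majority vote among the k labeled points nearest to query."""
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical data: advertising spend (x) and sales (y).
slope, intercept = fit_line([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
print(slope * 5.0 + intercept)  # predicted sales for a spend of 5.0

# Hypothetical labeled customers: (features, class).
train = [
    ((1.0, 1.0), "churn"), ((1.5, 2.0), "churn"), ((2.0, 1.0), "churn"),
    ((8.0, 8.0), "stay"), ((9.0, 8.5), "stay"), ((8.5, 9.0), "stay"),
]
print(knn_classify(train, (1.2, 1.5)))
```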