Intro/EDA Flashcards
1
Q
ways to classify analytics
A
- descriptive
- prescriptive
- predictive
2
Q
what is descriptive analytics
A
- gathers and organises data
- tells you what is happening
3
Q
what is predictive analytics
A
- uses data to predict future
- uses associations among variables to predict the likelihood of a phenomenon, based on the relationships identified
4
Q
what is prescriptive analytics/decision analytics
A
- looks at multiple options and strategies, then decides on the best course of action
- recommends course of action
5
Q
steps to EDA
A
- define problem
- gather data
- analyse data
- act on analysis
6
Q
population
A
includes all entities of interest in a study
7
Q
sample
A
- subset of population
- often randomly chosen and preferably representative of the population as a whole
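A minimal sketch of drawing a simple random sample, assuming a hypothetical population of 100 numbered entities. The names `population`, `rng` and `sample` are illustrative only, and the seed is fixed just to make the draw repeatable.

```python
import random

# Hypothetical population: 100 entities identified by the numbers 1..100.
population = list(range(1, 101))

# A seeded generator makes the random draw reproducible.
rng = random.Random(42)

# Draw 10 distinct entities without replacement; every entity is equally likely.
sample = rng.sample(population, k=10)

print(sample)
```

Random selection gives every entity the same chance of being chosen, which is what makes a representative sample more likely. It does not guarantee one, especially when the sample is small.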
8
Q
what is descriptive statistics
A
- data for the whole population
- techniques to describe data
- parameters
9
Q
what is inferential statistics
- steps (3)?
A
- generalise findings of a sample to population
- statistics
- model, estimate parameters, estimate errors via testing
10
Q
types of data
A
- numerical data: continuous, discrete / interval, ratio
- categorical data: nominal, ordinal
- text
- geolocation data
11
Q
what is numerical data
- types?
A
- discrete
- continuous
- cross-sectional: data on cross section of a population at a distinct point in time
- time series: data collected over time
- pooled data: time series of cross sections; observations in each cross section not the same
- panel data: samples of SAME cross-sectional data observed at multiple points in time
12
Q
descriptive measures for numerical variables
A
- mean: average value of an interval or ratio variable; affected by outliers
- median: middle value for data arranged in either ascending or descending order; good for ordinal data; less affected by outliers
- mode: most frequent value for a variable; good for nominal data
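The three measures above can be sketched with Python's standard `statistics` module. The data set is made up for illustration: five typical values plus one outlier, to show how the mean reacts compared with the median.

```python
import statistics

# Hypothetical data: five typical values plus one outlier (100).
values = [2, 3, 3, 4, 5, 100]

mean_value = statistics.mean(values)      # pulled upward by the outlier
median_value = statistics.median(values)  # middle of the sorted data
mode_value = statistics.mode(values)      # most frequent value

print(mean_value, median_value, mode_value)  # 19.5 3.5 3
```

The outlier drags the mean to 19.5, while the median stays at 3.5, close to the bulk of the data.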
13
Q
measuring spread for numerical data
A
- range: max-min
- IQR: Q3 - Q1
- variance
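A short sketch of the three spread measures, using the standard `statistics` module on made-up data. Quartiles have several competing conventions; this example relies on the default "exclusive" method of `statistics.quantiles`, so other tools may report a slightly different IQR for the same data.

```python
import statistics

# Hypothetical data set.
values = [1, 2, 3, 4, 5, 6, 7, 8]

# Range: largest value minus smallest value.
value_range = max(values) - min(values)

# IQR: Q3 - Q1, with quartiles from the default "exclusive" method.
q1, q2, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

# Variance: average squared deviation from the mean.
population_variance = statistics.pvariance(values)  # divides by n
sample_variance = statistics.variance(values)       # divides by n - 1

print(value_range, iqr, population_variance, sample_variance)  # 7 4.5 5.25 6
```

Use `pvariance` when the data covers the whole population and `variance` when it is a sample used to estimate the population variance.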
14
Q
measures of symmetry of distribution
A
- skewness
- kurtosis
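Both measures can be computed from central moments. The sketch below assumes the common moment-based population definitions: skewness = m3 / m2^1.5 and excess kurtosis = m4 / m2^2 - 3, where mk is the k-th central moment. The function names and data are illustrative, and statistical packages may apply sample-size corrections that give slightly different numbers.

```python
def central_moment(values, k):
    """k-th central moment: average of (x - mean) ** k."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** k for x in values) / len(values)

def skewness(values):
    """Moment-based skewness: 0 for symmetric data, > 0 for a long right tail."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Moment-based kurtosis minus 3, so a normal distribution scores about 0."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]      # hypothetical symmetric data
right_skewed = [1, 1, 1, 2, 10]  # hypothetical data with a long right tail

print(skewness(symmetric))         # zero: the data is symmetric about its mean
print(skewness(right_skewed))      # positive: long right tail
print(excess_kurtosis(symmetric))  # negative: flatter than a normal distribution
```

Positive skewness indicates a longer right tail and negative skewness a longer left tail. Excess kurtosis compares how heavy the tails are relative to a normal distribution.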