Descriptive Analytics Flashcards
Data
A set of individual pieces of information (e.g., likes, tweets)
3 Questions to Ask With New Data
1) How many variables are in the dataset?
2) How many observations are in the dataset?
3) At what level are the observations recorded (customer, transaction, store, etc.)?
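A minimal sketch of answering the three questions in Python. The toy rows, column names, and the `is_unique_key` helper are all made up for illustration; a column that is unique per row is a reasonable hint about the observation level.

```python
# Hypothetical toy dataset: each row is one transaction.
rows = [
    {"transaction_id": 1, "customer_id": "c1", "amount": 20.0},
    {"transaction_id": 2, "customer_id": "c1", "amount": 35.5},
    {"transaction_id": 3, "customer_id": "c2", "amount": 12.0},
]

n_variables = len(rows[0])   # 1) number of variables (columns)
n_observations = len(rows)   # 2) number of observations (rows)


def is_unique_key(rows, column):
    """3) A column marks the observation level if no value repeats."""
    values = [row[column] for row in rows]
    return len(values) == len(set(values))


print(n_variables, n_observations)
print(is_unique_key(rows, "transaction_id"))  # unique per row: transaction level
print(is_unique_key(rows, "customer_id"))     # repeats: not customer level
```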
Quantitative
Scale is meaningful (differences between values can be interpreted)
Numerical
Qualitative
Set of categories
Example: coding gender {m, f, o} as {0, 1, 2} doesn't mean o is double the value of f; the numbers are only labels
Discrete
Takes only certain values
Continuous
Can take on any numerical value
Variation
Observations of a single variable take on different values; for example, not all values of "churn" will be the same in the data set
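A quick way to check for variation, sketched in Python. The 0/1 churn flags and the `has_variation` helper are hypothetical.

```python
churn = [0, 1, 0, 0, 1]      # hypothetical churn flags: values differ
constant = [1, 1, 1, 1, 1]   # every observation is identical


def has_variation(values):
    """True if the variable takes on more than one distinct value."""
    return len(set(values)) > 1


print(has_variation(churn))
print(has_variation(constant))
```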
Mean
Average
Sensitive to outliers
Median
Middle value when the observations are sorted
Ignores a lot of information
Can move abruptly
Mode
Most common value
Not good for continuous variables
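The contrast between mean, median, and mode can be sketched with Python's standard `statistics` module. The numbers are made up; the point is how one outlier moves the mean far more than the median, and why the mode says little when no value repeats.

```python
import statistics

values = [2, 3, 3, 4, 5]          # made-up data
with_outlier = values + [100]     # same data plus one extreme value

mean_before = statistics.mean(values)
mean_after = statistics.mean(with_outlier)

median_before = statistics.median(values)
median_after = statistics.median(with_outlier)

mode_value = statistics.mode(values)

# Continuous measurements rarely repeat, so every value ties for "most common".
continuous = [1.01, 2.37, 3.14]
modes_continuous = statistics.multimode(continuous)

print(mean_before, mean_after)      # the mean jumps
print(median_before, median_after)  # the median barely moves
print(mode_value, modes_continuous)
```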
Frequency Distribution
Table showing fraction of observations for which a variable takes on each possible value
Variable value | # Obs. | % of overall obs.
A
B
C
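A frequency distribution like the one above could be built as follows. This is a sketch using `collections.Counter`; the observations are invented.

```python
from collections import Counter

observations = ["A", "B", "A", "C", "A", "B", "A", "C"]  # made-up data

counts = Counter(observations)
total = len(observations)

# One row per value: (value, number of observations, share of all observations)
freq_table = [
    (value, count, count / total)
    for value, count in sorted(counts.items())
]

for value, count, share in freq_table:
    print(f"{value:<8}{count:>6}{share:>10.1%}")
```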
Range of Values
Important if you:
Care about the likelihood of a very good outcome (e.g., a blockbuster)
Want to avoid a bad outcome (e.g., healthcare)
Care about inequality/dispersion (e.g., inventory management)
Interquartile Range
Difference between the 75th and 25th percentiles
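A sketch of the interquartile range next to the plain range (max minus min), using `statistics.quantiles`. The data are made up, and percentile conventions differ between tools, so other software may give slightly different quartiles than the default method used here.

```python
import statistics


def value_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)


def iqr(values):
    """75th percentile minus 25th percentile (default 'exclusive' method)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


data = list(range(1, 12))           # 1, 2, ..., 11
with_outlier = data[:-1] + [100]    # replace the largest value with an outlier

print(value_range(data), iqr(data))
print(value_range(with_outlier), iqr(with_outlier))  # range explodes, IQR does not
```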
Standard Deviation
sqrt(variance)
Gives the typical distance of observations from the mean, in the units of the variable
Lower means the data sit close to the mean
Higher means the data are spread out
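A worked sketch of the standard deviation as the square root of the variance. It assumes the population form (divide by n); the card does not say whether the population or sample form (divide by n - 1) is intended. The data are made up.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]    # made-up data

mean = sum(data) / len(data)

# Variance: average squared deviation from the mean (population form).
variance = sum((x - mean) ** 2 for x in data) / len(data)
sd = math.sqrt(variance)

# The standard library's population standard deviation should agree.
sd_library = statistics.pstdev(data)

print(mean, variance, sd, sd_library)
```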
Coefficient of Variation
SD / Mean
Compares the degree of variation between data sets, even when their scales differ
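A sketch of why dividing by the mean helps when comparing data sets of different scale. The two stores and their sales figures are hypothetical, and the population standard deviation is an assumption.

```python
import statistics


def coefficient_of_variation(values):
    """Standard deviation divided by the mean (population SD assumed)."""
    return statistics.pstdev(values) / statistics.mean(values)


small_store = [8, 10, 12]          # hypothetical daily sales
large_store = [980, 1000, 1020]    # hypothetical daily sales

sd_small = statistics.pstdev(small_store)
sd_large = statistics.pstdev(large_store)
cv_small = coefficient_of_variation(small_store)
cv_large = coefficient_of_variation(large_store)

# The large store has the bigger SD, but relative to its mean it varies less.
print(sd_small, sd_large)
print(cv_small, cv_large)
```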