unit 1 - chapter 2 - descriptive statistics Flashcards
mean, median and mode
Where is the middle of the distribution? (shown on a bar graph that curves to the left)
Mode: the highest point on the graph
Median: somewhere in the middle
Mean: pulled/dragged toward the outliers
Bell curve (symmetric): mean, median and mode are all the same
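A quick sketch of this card in Python, using the standard statistics module. Both samples are made up for illustration (they are not the data behind the graph): one has a long tail of large values, the other is symmetric.

```python
import statistics

# Hypothetical skewed sample: most values are small, a few are large.
skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]
# Hypothetical symmetric (bell-like) sample.
symmetric = [1, 2, 3, 3, 3, 4, 5]

for name, data in [("skewed", skewed), ("symmetric", symmetric)]:
    mode = statistics.mode(data)      # most frequent value (peak of the graph)
    median = statistics.median(data)  # middle value of the sorted data
    mean = statistics.mean(data)      # sum of the values divided by n
    print(name, "mode:", mode, "median:", median, "mean:", mean)
```

In the skewed sample the mode (2) sits at the peak, the median (3) is in the middle, and the mean (4.6) is dragged toward the large values. In the symmetric sample all three are 3.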
levels of data… measurements to use
1 - Nominal… Mode
2 - Ordinal… Median (P50 / 50th percentile)
3 - Interval… Mean
4 - Ratio… Mean
nominal - mode
Mode for category or value for the graph of top billion dollar content companies?
Disney is the mode because it is the top of the chart (the most)
Mode for Netflix original content hours?
Drama and Kids because they are the top of the chart (the most)
Mode for time per day on Netflix?
Highest point on the graph is Sunday (2.10 HRS)
It is not Tuesday/Wednesday and Friday, even though they are the same (1:30 HRS) and that value shows up the most
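A minimal sketch of finding the mode of nominal data in Python. The genre labels below are invented for illustration and are not the chart's real data; statistics.multimode is used because nominal data can have more than one mode (as with Drama and Kids).

```python
from collections import Counter
import statistics

# Hypothetical nominal observations (one genre label per title).
genres = ["Drama", "Kids", "Comedy", "Drama", "Kids", "Documentary"]

counts = Counter(genres)              # frequency of each category
modes = statistics.multimode(genres)  # every category tied for the highest count

print(counts)
print(modes)
```

With these labels, Drama and Kids each appear twice, so both are returned as modes.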
ordinal - median
Median for movie rating and attendance?
For value it is 300, which is PG-13: go halfway up the attendance and read across
Median for levels of pain and frequency?
3.5 level of pain, found by going halfway up the frequency
Median position = PCT × (n + 1)
PCT = percentile as a proportion (0.50 for the median)
n = sample size
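The position formula can be sketched in Python as follows. The function name median_by_position and both sample lists are hypothetical; the averaging of the two neighbouring values when the position lands on a half is the usual convention and is assumed here.

```python
import math

def median_by_position(values):
    """Locate the median with position = 0.50 * (n + 1) in the sorted data."""
    ordered = sorted(values)
    pos = 0.50 * (len(ordered) + 1)          # 1-based position of the median
    lower = ordered[math.floor(pos) - 1]     # value at or just below the position
    upper = ordered[math.ceil(pos) - 1]      # value at or just above the position
    return (lower + upper) / 2               # same value twice when pos is whole

# Hypothetical ordinal ratings (odd n): position 4, the 4th sorted value.
print(median_by_position([2, 5, 3, 4, 1, 3, 4]))
# Hypothetical ratings (even n): position 3.5, halfway between the 3rd and 4th.
print(median_by_position([1, 2, 3, 4, 5, 6]))
```

An even sample size is how a median such as 3.5 can fall between two levels.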
interval and ratio - mean
Add everything up and divide by n
x̄ (x bar) = Σx / n
Mean for blended strawberry
CF is cumulative frequency
n = 52 (CF top number)
x̄ (x bar) = Σ(x × f) / n
Sum of (units × frequency), divided by the top CF
Check this…..
The range is highest unit - lowest unit
Get it from units (not the frequencies)!!
= 60 - 55
= 5
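A sketch of the frequency-table mean and the range in Python. The units run from 55 to 60 and the frequencies add up to n = 52 to match the figures on this card, but the individual frequencies are invented, so the resulting mean is illustrative only.

```python
from itertools import accumulate

# Units (x) from the card's range; frequencies (f) are hypothetical.
units = [55, 56, 57, 58, 59, 60]
freqs = [4, 8, 14, 12, 9, 5]

cumulative = list(accumulate(freqs))  # CF column
n = cumulative[-1]                    # last CF value is the sample size

weighted_sum = sum(x * f for x, f in zip(units, freqs))  # sum of x * f
mean = weighted_sum / n                                   # x bar

data_range = max(units) - min(units)  # taken from the units, not the frequencies

print("n:", n)
print("mean:", round(mean, 2))
print("range:", data_range)
```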
mode
components:
quantity:
outliers:
components: no formula
quantity: one or more
outliers: not affected
median
components:
quantity:
outliers:
components: size of dataset
quantity: only one
outliers: not affected
mean
components:
quantity:
outliers:
components: dataset size and data points
quantity: only one
outliers: affected
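A small check of the "outliers" rows on the three cards above, written in Python. The numbers are made up: the second list is the first with its largest value swapped for an extreme one.

```python
import statistics

# Hypothetical data, and the same data with the largest value made extreme.
base = [3, 4, 4, 5, 6]
with_outlier = [3, 4, 4, 5, 60]

for label, data in [("base", base), ("with outlier", with_outlier)]:
    print(
        label,
        "mode:", statistics.mode(data),
        "median:", statistics.median(data),
        "mean:", statistics.mean(data),
    )
```

The mode and median stay at 4 in both lists, while the mean moves from 4.4 to 15.2.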
4 levels of data
- Nominal - variation ratio
- Ordinal - median deviation
- Interval - standard deviation
- Ratio - standard deviation
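For the nominal row, a minimal sketch of the variation ratio in Python, assuming the usual definition (1 minus the share of cases in the modal category). The function name and the company labels are hypothetical.

```python
from collections import Counter

def variation_ratio(labels):
    """Share of observations that are NOT in the modal category."""
    counts = Counter(labels)
    modal_count = max(counts.values())   # frequency of the mode
    return 1 - modal_count / len(labels)

# Hypothetical nominal data: 5 of 10 observations fall in the modal category.
labels = ["Disney"] * 5 + ["Netflix"] * 3 + ["Comcast"] * 2
print(variation_ratio(labels))
```

A value near 0 means almost everything is in one category; larger values mean the data are more spread across categories.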
standard deviation vs variance
sd
Sample = s (more risk)
Population = sigma (σ)
variance
Sample = s^2
Population = sigma^2 (σ^2)
parameter = population
statistic = sample
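The four symbols map onto four functions in Python's standard statistics module. The data below are hypothetical; the sample versions divide by n - 1 and the population versions divide by n.

```python
import statistics

# Hypothetical data set.
data = [2, 4, 4, 4, 5, 5, 7, 9]

s = statistics.stdev(data)           # sample standard deviation (s)
sigma = statistics.pstdev(data)      # population standard deviation (sigma)
s2 = statistics.variance(data)       # sample variance (s^2)
sigma2 = statistics.pvariance(data)  # population variance (sigma^2)

print("s:", round(s, 3), "sigma:", sigma)
print("s^2:", round(s2, 3), "sigma^2:", sigma2)
```

The sample values come out slightly larger than the population values for the same data because of the smaller divisor.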
standard deviation
Different answers: s or σ (s^2 or σ^2)
Easier to solve by hand
Square the numerator because the deviations (x - x bar) add up to 0
Downside of s^2 and σ^2
Problem is the variance is on a larger scale than the data
Answer is in squared units
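A by-hand version in Python showing why the deviations are squared. The data are hypothetical, and the sample formula (divide by n - 1) is assumed.

```python
# Hypothetical data set.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
mean = sum(data) / n  # x bar

deviations = [x - mean for x in data]
print("sum of deviations:", sum(deviations))  # the raw deviations cancel out

squared = [d ** 2 for d in deviations]        # squaring removes the signs
sample_variance = sum(squared) / (n - 1)      # s^2, in squared units
sample_sd = sample_variance ** 0.5            # s, back on the scale of the data

print("s^2:", round(sample_variance, 3))
print("s:", round(sample_sd, 3))
```

Taking the square root at the end is what puts the standard deviation back on the same scale as the original data.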
standard deviation
Will not zero out
Is based on the mean
Average distance (ruler)
Same scale as original data
Quiz question: Thus the standard deviation is the..
Standard (benchmark)*
Note: SD is influenced by outliers
standard deviation is used for
Used as a descriptor
Used to normalize data
In business as a measure of volatility, risk, control and outcome assignment
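One common way the SD is used to normalize data is the z-score, (x - mean) / SD. A minimal Python sketch, assuming a hypothetical data set and the population SD (using the sample SD instead would be an equally reasonable choice):

```python
import statistics

# Hypothetical data set.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)
sd = statistics.pstdev(data)  # assumption: population SD

# Each z-score says how many SDs a value sits above or below the mean.
z_scores = [(x - mean) / sd for x in data]
print(z_scores)
```

Here the smallest value is 1.5 SDs below the mean and the largest is 2 SDs above it.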
the variance
Historical value
Appears as an element of aggregate variability in statistical tools such as ANOVA, SLR, and multiple regression
Dropped in bigger formulas
facebook practice problem
Can you calculate the mean for less than $40 million
What facebook said they would do:
Avg duration of video viewed = total time watched / total number of users watching video
What they actually did:
Avg duration of video viewed = total time watched / total number of users watching video for 3 or more seconds
Denominator is smaller, so the avg will be bigger
Average duration metrics were inflated 150-200%
Highlight clips increased this number
TikTok vs YouTube videos on Facebook
Calculate the mean for less than 40 million dollar contract for Facebook
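The effect of the smaller denominator can be sketched in Python. The view durations are invented for illustration and say nothing about Facebook's real numbers; only the two formulas come from the card.

```python
# Hypothetical view durations in seconds, one entry per user.
durations = [1, 2, 2, 5, 10, 30]

total_time = sum(durations)  # numerator: total time watched by everyone

# Stated metric: divide by every user who watched the video.
stated_avg = total_time / len(durations)

# Actual metric: divide only by users who watched 3 or more seconds.
long_views = [d for d in durations if d >= 3]
reported_avg = total_time / len(long_views)

inflation_pct = (reported_avg / stated_avg - 1) * 100

print("stated avg:", round(stated_avg, 2))
print("reported avg:", round(reported_avg, 2))
print("inflation %:", round(inflation_pct, 1))
```

With these made-up durations, half the users fall under 3 seconds, so the reported average comes out at about double the stated one.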