unit 1 - chapter 2 - descriptive statistics Flashcards

1
Q

mean, median and mode

A

Where's the middle of the distribution? (Shown on a bar graph that curves to the left.)

Mode: the highest point on the graph
Median: somewhere in the middle
Mean: pulled/dragged toward the outliers

In a symmetric bell curve, the mean, median and mode are all the same
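
A minimal Python sketch of the outlier effect. The values are made up for illustration and are not taken from the graph on the card:

```python
from statistics import mean, median, mode

# Hypothetical data: most values cluster near 3, with one large outlier (40).
values = [2, 3, 3, 3, 4, 4, 5, 40]

most_common = mode(values)   # 3   -> the highest bar on a frequency graph
middle = median(values)      # 3.5 -> halfway through the sorted values
average = mean(values)       # 8   -> dragged upward by the single outlier

print(most_common, middle, average)
```

The mode and median stay near the bulk of the data, while the mean lands above every value except the outlier.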

2
Q

levels of data… measurements to use

A

1 - Nominal: mode

2 - Ordinal: median (P50 / 50th percentile)

3 - Interval: mean
4 - Ratio: mean

3
Q

nominal - mode

A

Mode (category or value) for the graph of top billion-dollar content companies?
Disney is the mode: it is the top of the chart (the most)

Mode for Netflix original content hours?
Drama and Kids, because they are the top of the chart (the most)

Mode for time per day on Netflix?
The highest point on the graph is Sunday (2.10 HRS)
It is not Tuesday/Wednesday/Friday, even though those three days share the same value (1:30 HRS)

4
Q

ordinal - median

A

Median for movie rating and attendance?
The halfway point of attendance is 300, which falls in PG-13

Median for levels of pain and frequency?
Pain level 3.5, found by going halfway up the frequency

Median position = PCT × (n + 1)
PCT = percentile as a proportion (0.50 for the median)
n = sample size
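
A small Python sketch of the position formula. The function names and the pain scores are hypothetical, chosen only to show how PCT × (n + 1) locates the median in sorted data:

```python
def percentile_position(pct, n):
    """1-based position in the sorted data: pct * (n + 1), with pct as a proportion."""
    return pct * (n + 1)


def value_at_position(sorted_data, position):
    """Read the value at a 1-based position, interpolating between neighbours."""
    lower = int(position)          # whole part of the position
    fraction = position - lower    # how far toward the next value
    if fraction == 0:
        return sorted_data[lower - 1]
    below, above = sorted_data[lower - 1], sorted_data[lower]
    return below + fraction * (above - below)


# Hypothetical pain scores (not the data from the card).
pain_levels = sorted([1, 2, 2, 3, 4, 4, 5, 5])

pos = percentile_position(0.50, len(pain_levels))  # 0.5 * 9 = 4.5
median_pain = value_at_position(pain_levels, pos)  # halfway between the 4th and 5th values

print(pos, median_pain)
```

With an even number of scores the position lands between two values, which is how a median such as 3.5 can appear on a whole-number scale.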

5
Q

interval and ratio - mean

A

Add everything up and divide by n
x̄ = Σx / n

Mean for blended strawberry (from a frequency table)
CF is cumulative frequency
n = 52 (the largest CF value)

x̄ = Σ(x × f) / n
Sum of (units × frequency), divided by n

Check this:
The range comes from the units (the x values), not the frequencies
Range = 60 − 55
= 5
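
A hedged Python sketch of the frequency-table mean. Only n = 52 and the units 55 to 60 come from the card; the individual frequencies below are invented so that they add up to 52:

```python
# Hypothetical frequency table: unit value -> frequency.
# The frequencies are assumptions; only their total (52) matches the card.
freq_table = {55: 4, 56: 8, 57: 14, 58: 12, 59: 9, 60: 5}

n = sum(freq_table.values())                         # same as the last CF value
total = sum(x * f for x, f in freq_table.items())    # sigma(x * f)
x_bar = total / n                                    # weighted mean

# The range uses the units (keys), never the frequencies.
data_range = max(freq_table) - min(freq_table)

print(n, total, round(x_bar, 2), data_range)
```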

6
Q

mode

components:
quantity:
outliers:

A

components: no formula
quantity: one or more
outliers: not affected

7
Q

median

components:
quantity:
outliers:

A

components: size of dataset
quantity: only one
outliers: not affected

8
Q

mean

components:
quantity:
outliers:

A

components: dataset size and data points
quantity: only one
outliers: affected

9
Q

4 levels of data… measures of variability to use

A
  1. Nominal - variation ratio
  2. Ordinal - median deviation
  3. Interval - standard deviation
  4. Ratio - standard deviation
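
The variation ratio for nominal data is the share of observations that are not in the modal category: 1 − (mode frequency / n). A short Python sketch with made-up genre labels:

```python
from collections import Counter

# Hypothetical nominal data (category labels only, no ordering).
genres = ["drama", "kids", "drama", "comedy", "drama", "kids"]

counts = Counter(genres)
mode_category, mode_count = counts.most_common(1)[0]

# Share of observations outside the modal category.
variation_ratio = 1 - mode_count / len(genres)

print(mode_category, variation_ratio)
```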
10
Q

standard deviation vs variance

A

sd
(more risk) Sample = s
Population = σ (sigma)

variance
Sample = s²
Population = σ² (sigma squared)

parameter = population
statistic = sample
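
Python's standard `statistics` module has separate functions for the sample and population versions. A quick sketch on made-up data:

```python
from statistics import pstdev, pvariance, stdev, variance

# Hypothetical data set.
data = [2, 4, 4, 4, 5, 5, 7, 9]

pop_var = pvariance(data)   # population variance (sigma squared): divides by n
pop_sd = pstdev(data)       # population standard deviation (sigma)
samp_var = variance(data)   # sample variance (s squared): divides by n - 1
samp_sd = stdev(data)       # sample standard deviation (s)

print(pop_var, pop_sd, samp_var, samp_sd)
```

The sample versions come out slightly larger because they divide by n − 1 instead of n.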

11
Q

standard deviation

A

Different answers: s or σ (s² or σ²)
Easier to solve by hand
Square the deviations in the numerator because the sum of (x − x̄) is always 0

Downside of s² and σ²
The problem is that variance is on a different magnitude from the data
The answer is in squared units
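
A worked Python sketch of why the deviations get squared, using hypothetical times in minutes:

```python
import math

# Hypothetical data, in minutes.
data = [10, 12, 14, 16, 18]

x_bar = sum(data) / len(data)              # 14.0
deviations = [x - x_bar for x in data]     # -4, -2, 0, 2, 4

raw_sum = sum(deviations)                  # always 0, so it says nothing about spread
squared_sum = sum(d ** 2 for d in deviations)

sample_variance = squared_sum / (len(data) - 1)   # units: minutes squared
sample_sd = math.sqrt(sample_variance)            # back in minutes

print(raw_sum, squared_sum, sample_variance, round(sample_sd, 3))
```

Taking the square root is what returns the answer to the same scale as the original data.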

12
Q

standard deviation

A

Will not zero out
Is based on the mean
Average distance (ruler)
Same scale as original data
Quiz question: Thus the standard deviation is the..
Standard (benchmark)*
Note: SD is influenced by outliers

13
Q

standard deviation is used for

A

Used as a descriptor
Used to normalize data
In business as a measure of volatility, risk, control and outcome assignment
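
One common way to normalize data with the standard deviation is the z-score, (x − mean) / sd. The card does not name a specific method, so treat this as an assumed example on made-up values:

```python
from statistics import mean, stdev

# Hypothetical values, e.g. monthly returns in percent.
returns = [1.0, 2.0, 3.0, 4.0, 5.0]

m = mean(returns)
s = stdev(returns)   # sample standard deviation

# Each z-score is the distance from the mean, measured in standard deviations.
z_scores = [(x - m) / s for x in returns]

print([round(z, 2) for z in z_scores])
```

After normalizing, the values have a mean of 0 and a standard deviation of 1, so data on different scales can be compared.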

14
Q

the variance

A

Historical value
Appears as an element of aggregate variability in statistical tools such as ANOVA, simple linear regression (SLR), and multiple regression
Dropped in bigger formulas

15
Q

facebook practice problem

A

Can you calculate the mean for less than $40 million?
What Facebook said they would do:
Avg duration of video viewed = total time watched / total number of users watching the video

What they actually did:
Avg duration of video viewed = total time watched / total number of users watching the video for 3 or more seconds
With a smaller denominator, the average will be bigger
Average duration metrics were inflated 150-200%
Highlight clips increased this number
TikTok vs YouTube videos on Facebook

Calculate the mean for less than 40 million dollar contract for Facebook
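
A toy Python sketch of the denominator effect. The watch times are invented and are not Facebook's data; the point is only that dropping short views from the denominator raises the average:

```python
# Hypothetical watch times in seconds, one entry per viewer.
watch_times = [1, 2, 2, 5, 10, 40]

total_time = sum(watch_times)   # numerator is the same in both versions

# Stated method: divide by everyone who watched.
stated_avg = total_time / len(watch_times)

# Actual method: divide only by viewers who watched 3 or more seconds.
long_views = [t for t in watch_times if t >= 3]
reported_avg = total_time / len(long_views)

inflation_pct = (reported_avg - stated_avg) / stated_avg * 100

print(stated_avg, reported_avg, inflation_pct)
```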

16
Q

quartiles vs percentiles

A

Quartiles are special percentiles. The first quartile, Q1, is the same as the 25th percentile, and the third quartile, Q3, is the same as the 75th percentile. The median, M, is called both the second quartile and the 50th percentile.

The third quartile, Q3, is nine. Three-fourths (75%) of the ordered data set are less than nine.
The interquartile range is a number that indicates the spread of the middle half or the middle 50% of the data.

To calculate quartiles and percentiles, the data must be ordered from smallest to largest. Quartiles divide ordered data into quarters. Percentiles divide ordered data into hundredths.
Scoring in the 90th percentile of an exam does not necessarily mean that you received 90% on the test. It means that 90% of test scores are the same as or less than your score, and 10% of the test scores are the same as or greater than your score.
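
A short Python sketch using the standard `statistics.quantiles` function. The data set and the `percentile_rank` helper are hypothetical; the default "exclusive" method places each quartile at position proportion × (n + 1) in the ordered data:

```python
from statistics import quantiles

# Hypothetical ordered data set.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

# n=4 cuts the data into quarters and returns Q1, Q2 (the median), Q3.
q1, q2, q3 = quantiles(data, n=4)

iqr = q3 - q1   # spread of the middle 50% of the data


def percentile_rank(scores, score):
    """Percent of scores that are the same as or less than the given score."""
    return 100 * sum(s <= score for s in scores) / len(scores)


print(q1, q2, q3, iqr, percentile_rank(data, 11))
```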

17
Q

sample mean vs population mean

A

The letter used to represent the sample mean is an x with a bar over it (pronounced “x bar”): x̄

The Greek letter μ (pronounced “mew”) represents the population mean. One of the requirements for the sample mean to be a good estimate of the population mean is for the sample taken to be truly random.

18
Q
  1. when is the mean = median
  2. when is the mean > median
A
  1. when the distribution is symmetrical
  2. when the distribution is skewed to the right

mean not mode!
where does the average lie?
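
A quick Python check of both cases, on made-up data:

```python
from statistics import mean, median

# Hypothetical symmetric data: values mirror each other around 3.
symmetric = [1, 2, 3, 4, 5]

# Hypothetical right-skewed data: one large value stretches the right tail.
right_skewed = [1, 2, 3, 4, 20]

sym_mean, sym_median = mean(symmetric), median(symmetric)
skew_mean, skew_median = mean(right_skewed), median(right_skewed)

print(sym_mean, sym_median)     # equal for the symmetric data
print(skew_mean, skew_median)   # mean sits above the median
```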

19
Q

why is the standard deviation important?

A
  1. provides a numerical measure of the overall amount of variation in a data set, and
  2. can be used to determine whether a particular data value is close to or far from the mean.
20
Q

variability in samples

A

Observational or measurement variability
Natural variability
Induced variability
Sample variability

21
Q

variability in samples - measurement variability

A

Measurement variability occurs when there are differences in the instruments used to measure or in the people using those instruments.

If we are gathering data on how long it takes for a ball to drop from a height by having students measure the time of the drop with a stopwatch, we may experience measurement variability if the two stopwatches used were made by different manufacturers.

22
Q

variability in samples - natural variability

A

Natural variability arises from the differences that naturally occur because members of a population differ from each other.

For example, if we have two identical corn plants and we expose both plants to the same amount of water and sunlight, they may still grow at different rates simply because they are two different corn plants.

23
Q

variability in samples - induced variability

A

Induced variability is the counterpart to natural variability; this occurs because we have artificially induced an element of variation (that, by definition, was not present naturally):

For example, we assign people to two different groups to study memory, and we induce a variable in one group by limiting the amount of sleep they get.

24
Q

variability in samples - sample variability

A

Sample variability occurs when multiple random samples are taken from the same population. For example, if I conduct four surveys of 50 people randomly selected from a given population, the differences in outcomes may be affected by sample variability.
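
A small Python simulation of the four-survey example. The population (10,000 values drawn around 50) is an assumption made up for the sketch:

```python
import random
from statistics import mean

random.seed(0)   # fixed seed so the run is repeatable

# Hypothetical population: 10,000 values centred near 50.
population = [random.gauss(50, 10) for _ in range(10_000)]

# Four separate random samples of 50 people each, all from the same population.
sample_means = [mean(random.sample(population, 50)) for _ in range(4)]

for i, m in enumerate(sample_means, start=1):
    print(f"survey {i}: mean = {m:.2f}")
```

Each survey gives a slightly different mean even though nothing about the population changed between them.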