Lesson 2.2: Numerically Summarizing Data Flashcards

Question 1

Q

population mean

Answer

A

denoted by 𝜇
population size = 𝑁

Question 2

Q

sample mean

Answer

A

denoted by x̄ (x-bar)
sample size = 𝑛
R: mean(DataTable$Column)

Question 3

Q

median

Answer

A

middle number of data in ascending order
if odd number of observations: middle number
if even number of observations: mean of two middle numbers
not greatly affected by outliers
R: median(DataTable$Column)
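
Example (a minimal sketch; the vectors are made up for illustration):

```r
odd_obs  <- c(7, 1, 5)       # sorted: 1 5 7
even_obs <- c(7, 1, 5, 3)    # sorted: 1 3 5 7

median_odd  <- median(odd_obs)    # middle value
median_even <- median(even_obs)   # mean of the two middle values, (3 + 5) / 2

# Adding an outlier shifts the mean far more than the median
with_outlier   <- c(even_obs, 1000)
median_outlier <- median(with_outlier)
mean_outlier   <- mean(with_outlier)
```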

Question 4

Q

mode

Answer

A

most frequent observation
R: Mode <- function(x) {
  ux <- unique(x)  # distinct values
  ux[which.max(tabulate(match(x, ux)))]  # value with the highest count
}
histogram: hist(data, breaks = seq(low, high, interval))
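
Example (a sketch applying the Mode helper to a made-up vector; with ties, this helper returns the tied value that appears first):

```r
Mode <- function(x) {
  ux <- unique(x)                          # distinct values
  ux[which.max(tabulate(match(x, ux)))]    # value with the highest count
}

scores <- c(3, 5, 5, 2, 5, 3)   # illustrative data
mode_value <- Mode(scores)      # 5 occurs three times
```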

Question 5

Q

Range
(R)

Answer

A

difference between largest and smallest value
R: range(DataTable$Column) returns the lowest and highest values; diff(range(DataTable$Column)) gives the difference between them
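
Example (a sketch with made-up values):

```r
values <- c(12, 4, 9, 20, 7)   # illustrative data

lo_hi      <- range(values)               # smallest and largest value
spread     <- diff(lo_hi)                 # largest minus smallest
spread_alt <- max(values) - min(values)   # same result, written out
```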

Question 6

Q

population variance

Answer

A

denoted by σ²
sum of the squared deviations about the population mean, Σ(x - μ)²,
divided by the number of observations in the population (N)

R: n <- length(data$Score)
x <- data$Score
v.population <- sum((x - mean(x))^2) / n
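
Example (a sketch on a made-up vector, assuming it holds the whole population; it also shows how to get the same number from var() by rescaling):

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)   # illustrative data, mean = 5
n <- length(x)

v_population <- sum((x - mean(x))^2) / n   # divide by N
v_from_var   <- var(x) * (n - 1) / n       # rescale the sample variance
```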

Question 7

Q

sample variance

Answer

A

denoted by s²
sum of the squared deviations about the sample mean, Σ(x - x̄)²,
divided by the number of observations in the sample minus 1 (n - 1)

R: var(DataTable$Column)
R's var() always computes the sample variance (divides by n - 1)
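
Example (a sketch checking the n - 1 formula against var() on a made-up sample):

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)   # illustrative sample
n <- length(x)

s2_manual  <- sum((x - mean(x))^2) / (n - 1)   # squared deviations / (n - 1)
s2_builtin <- var(x)                           # built-in sample variance
```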

Question 8

Q

standard deviation

population and sample

Answer

A

square root of the variance
- denoted by σ (population) or s (sample)
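
Example (a sketch on a made-up vector, computing both versions as the square root of the matching variance):

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)   # illustrative data

s     <- sqrt(var(x))                     # sample standard deviation
sigma <- sqrt(mean((x - mean(x))^2))      # population standard deviation
```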

Question 9

Q

Coefficient of Variation
(CV)

Answer

A

measures the scatter in the data relative to mean
CV = standard deviation (s) / sample mean (x̄)
expressed as a percentage: CV = (s / x̄) × 100%
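
Example (a sketch with a made-up sample):

```r
x <- c(10, 12, 8, 11, 9)   # illustrative sample, mean = 10

cv_percent <- sd(x) / mean(x) * 100   # scatter relative to the mean, in percent
```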

Question 10

Q

Computing standard deviation

Excel and R

Answer

A

Excel: =STDEV(data range)
R: sd(vector)
both return the sample standard deviation (n - 1 denominator)

Question 11

Q

Empirical Rule

Answer

A

Approximately 68% of data between μ − 1σ and μ + 1σ
Approximately 95% of data between μ − 2σ and μ + 2σ
Approximately 99.7% of data between μ − 3σ and μ + 3σ

left to right percentages under curve:
0.15 + (2.35 + (13.5 + (34 + 34) + 13.5) + 2.35) + 0.15
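
Example (a sketch checking the rule against the standard normal curve with pnorm(); the rule's percentages are rounded versions of these areas):

```r
within_1 <- pnorm(1) - pnorm(-1)   # area within 1 sd of the mean, about 0.6827
within_2 <- pnorm(2) - pnorm(-2)   # area within 2 sd, about 0.9545
within_3 <- pnorm(3) - pnorm(-3)   # area within 3 sd, about 0.9973
```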

Question 12

Q

Standardized values

z value

Answer

A

Standardized values (z-values) are used to:
- compute probabilities
- compare two different distributions
z = 2 : Data value is 2 standard deviations above the mean
z = -1.6 : Data value is 1.6 standard deviations below the mean

z = (y - μ) / σ, where y = data value, μ = mean, σ = standard deviation
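
Example (a sketch comparing scores from two different distributions; z_score is a hypothetical helper and the second test's mean and sd are made up):

```r
# Hypothetical helper: standardize a data value y
z_score <- function(y, mu, sigma) (y - mu) / sigma

z_a <- z_score(600, mu = 500, sigma = 100)   # test A: 1 sd above the mean
z_b <- z_score(28,  mu = 21,  sigma = 5)     # test B: 1.4 sd above the mean

# The larger z-value is the better result relative to its own distribution
better <- if (z_b > z_a) "B" else "A"
```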

Question 13

Q

p-value

Answer

A

Represents the area under the normal distribution curve to the left of a z-value (cumulative probability)
z = 1.0: 0.15 + 2.35 + 13.50 + 34.00 + 34.00 = 84%, p-value ≈ 0.8413
total area under the normal distribution curve = 1

Question 14

Q

p-value calculation for specific range

example

Answer

A

probability of a student scoring between 450 and 600 on the SAT
mean = 500, sd = 100
z = (600 - 500) / 100 = 1.0, p-value = 0.8413
z = (450 - 500) / 100 = -0.50, p-value = 0.3085
0.8413 - 0.3085 = 0.5328 or 53.28%
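
The same calculation as a sketch in R (variable names are illustrative):

```r
mu    <- 500
sigma <- 100

z_high <- (600 - mu) / sigma   #  1.0
z_low  <- (450 - mu) / sigma   # -0.5

p_between <- pnorm(z_high) - pnorm(z_low)   # left area at 1.0 minus left area at -0.5

# pnorm() also accepts the mean and sd directly, skipping the z step
p_direct <- pnorm(600, mean = mu, sd = sigma) - pnorm(450, mean = mu, sd = sigma)
```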

Question 15

Q

Standard normal curve
(𝜇=0, 𝜎=1)

Answer

A

Symmetric about its mean (μ = 0, σ = 1)
Mean = Median = Mode
Single peak at z = 0
Inflection points at z = −1 and z = +1
Area under the curve = 1
Area to the left of the mean (μ = 0) = area to the right = ½
Follows the Empirical Rule

Question 16

Q

Normalizing Data

Computing z-values

Excel and R

Answer

A

Excel: =STANDARDIZE(data value, mean, sd)
R: scale(data, center = mean, scale = sd), or (data - mean) / sd
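
Example (a sketch on a made-up vector; with its defaults, scale() centers on the mean and divides by the sample standard deviation, and it returns a one-column matrix):

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)   # illustrative data

z_manual <- (x - mean(x)) / sd(x)   # z-values by hand
z_scaled <- as.vector(scale(x))     # same values; as.vector() drops the matrix shape
```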

Question 17

Q

Computing p-values
P(x < z)

Excel and R

Answer

A

‘Left’ Area (Probability) under Standard Normal Curve
- Excel: =NORMSDIST(z-value) (standard normal distribution)
- R: pnorm(z-value)

‘Right’ Area (Probability) under Standard Normal Curve
- Excel: =1-NORMSDIST(z-value)
- R: 1 - pnorm(z-value)

‘In Between’ Area (Probability) under Standard Normal Curve
- Excel: =NORMSDIST(high z) - NORMSDIST(low z)
- R: pnorm(high z) - pnorm(low z)
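
Example (a sketch of the left and right areas at z = 1.0):

```r
z <- 1.0

left      <- pnorm(z)                       # area to the left of z
right     <- 1 - pnorm(z)                   # area to the right of z
right_alt <- pnorm(z, lower.tail = FALSE)   # same right area, computed directly
```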

Question 18

Q

Converting p-value (‘left’ area) to z-value

Excel and R

Answer

A

Excel: =NORMSINV(p-value)
R: qnorm(p-value)
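
Example (a sketch showing that qnorm() is the inverse of pnorm(); the z-value 1.25 is arbitrary):

```r
z_975  <- qnorm(0.975)         # z-value with 97.5% of the area to its left, about 1.96
z_back <- qnorm(pnorm(1.25))   # round trip: z -> left area -> z
```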

Question 19

Q

Skewness

types (3)

Answer

A

Mean < Median < Mode: Negative / left skewed distribution
Mean = Median = Mode: Symmetrical distribution with zero skewness
Mean > Median > Mode: Positive / right skewed distribution

Question 20

Q

Percentile

Answer

A

The kth percentile of a set of data is a value such that k percent of the observations are less than or equal to that value
e.g., P2: 2% of observations are <= the value

Question 21

Q

Quartile

Answer

A

The quartiles divide the data into 4 equal parts

First quartile: Q1
- Bottom 25% = 25th percentile

Second quartile: Q2
- Bottom 50% = 50th percentile (median)

Third quartile: Q3
- Bottom 75% = 75th percentile
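
Example (a sketch using quantile() on made-up data; quantile() interpolates between observations and offers several definitions through its type argument, so results on small data sets can differ slightly from hand calculations):

```r
x <- 1:9   # illustrative data

quartiles <- quantile(x, probs = c(0.25, 0.50, 0.75))   # Q1, Q2, Q3
p90       <- quantile(x, probs = 0.90)                  # 90th percentile
```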

Question 22

Q

Boxplot

features and R command

Answer

A

Line = median
box = Q1 and Q3
whiskers (brackets) = min and max, excluding outliers
dot = outlier
R: boxplot(data vector)
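
Example (a sketch using boxplot.stats(), which returns the numbers that boxplot() draws; the data are made up, and by default points more than 1.5 box lengths beyond the box are flagged as outliers):

```r
x <- c(1:9, 30)   # illustrative data with one extreme value

bs <- boxplot.stats(x)
five_numbers <- bs$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
outliers     <- bs$out     # points drawn as dots

# boxplot(x) draws the plot from these same numbers
```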

Question 23

Q

Skewness and Boxplots

Answer

A

Symmetric distribution (e.g., normal)
- (Q3-Q2) = (Q2-Q1)

Positive Skew
- (Q3-Q2) > (Q2-Q1)

Negative Skew
- (Q3-Q2) < (Q2-Q1)
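
Example (a sketch on made-up right-skewed data, comparing the two halves of the box):

```r
x <- c(1, 2, 2, 3, 3, 4, 6, 9, 15)   # illustrative data with a long right tail

q <- unname(quantile(x, probs = c(0.25, 0.50, 0.75)))   # Q1, Q2, Q3
upper_half <- q[3] - q[2]   # Q3 - Q2
lower_half <- q[2] - q[1]   # Q2 - Q1

skew <- if (upper_half > lower_half) "positive" else
        if (upper_half < lower_half) "negative" else "none"
```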

Question 24

Q

Data Standardization and Scaling

Answer

A

Standardization (z-value)
- (value - mean) / standard deviation
- Typical range: about -3 to +3

Scaling (min-max)
- (value - min value) / (max value - min value)
- Range: 0 to 1
- not effective with outliers because they compress the scaled values of the other data elements
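
Example (a sketch of min-max scaling and the outlier problem; min_max is a hypothetical helper and the data are made up):

```r
# Hypothetical helper: rescale values to the 0-1 range
min_max <- function(x) (x - min(x)) / (max(x) - min(x))

x <- c(10, 20, 30, 40, 50)   # illustrative data
scaled <- min_max(x)         # evenly spread between 0 and 1

# With an outlier, the original five values are squeezed near 0
scaled_outlier <- min_max(c(x, 1000))
```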

Central Tendencies and Variability (24 cards)