Lesson 3: Normal Distribution Flashcards

1
Q

Distribution

A
  • A histogram over the sample space (all possible values)
  • X-axis: all possible values in the sample space
  • Y-axis: frequency of each value
2
Q

Standardized values

z value

A
  • Used to compute probabilities and to compare values from two different distributions
  • z = 2: the data value is 2 standard deviations above the mean
  • z = -1.6: the data value is 1.6 standard deviations below the mean

z = (data value (y) - mean (μ)) / standard deviation (σ)
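The z formula can be sketched in R as a one-line function. The name `z_value` and the inputs (mean 500, sd 100, data values 700 and 340) are illustrative assumptions chosen to reproduce the two bullet examples.

```r
# Standardize a data value: z = (y - mu) / sigma
# (z_value is a hypothetical helper name, not part of base R)
z_value <- function(y, mu, sigma) {
  (y - mu) / sigma
}

# Illustrative values (assumed): mean 500, sd 100
z_value(700, mu = 500, sigma = 100)  # 2: two sd above the mean
z_value(340, mu = 500, sigma = 100)  # -1.6: 1.6 sd below the mean
```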

3
Q

p-value

A
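  • The area under the standard normal curve to the left of z, i.e. P(Z < z) (this lesson uses "p-value" for this left-tail area)
  • For z = 1.0: 0.15% + 2.35% + 13.50% + 34.00% + 34.00% ≈ 84% from the Empirical Rule bands (exact value 0.8413)
  • Total area under the normal distribution curve = 1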
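A quick R check of the z = 1.0 example: `pnorm()` gives the exact left area, while summing the Empirical Rule bands listed on the card gives the rough 84%.

```r
# Exact left-tail area under the standard normal curve up to z = 1
left_area <- pnorm(1)

# Empirical Rule bands (in %) from the far left up to z = 1:
# below -3, -3 to -2, -2 to -1, -1 to 0, 0 to 1
bands <- c(0.15, 2.35, 13.50, 34.00, 34.00)
approx_area <- sum(bands) / 100

round(left_area, 4)  # 0.8413
approx_area          # 0.84
```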
4
Q

p-value calculation for specific range

example

A

Probability of a student scoring between 450 and 600 on the SAT
mean = 500, sd = 100
z = (600 - 500)/100 = 1.0 → p-value 0.8413
z = (450 - 500)/100 = -0.50 → p-value 0.3085
0.8413 - 0.3085 = 0.5328, or 53.28%
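The same SAT calculation in R, a sketch that follows the card's steps. The second form passes the mean and sd straight to `pnorm()` and should give the same answer.

```r
mu <- 500     # SAT mean from the example
sigma <- 100  # SAT standard deviation from the example

# Convert both bounds to z-values
z_high <- (600 - mu) / sigma  # 1.0
z_low  <- (450 - mu) / sigma  # -0.5

# Area between the bounds = left area at high z minus left area at low z
p_between <- pnorm(z_high) - pnorm(z_low)
round(p_between, 4)  # 0.5328

# pnorm() also accepts mean and sd directly, skipping the z step
p_direct <- pnorm(600, mean = mu, sd = sigma) - pnorm(450, mean = mu, sd = sigma)
```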

5
Q

Standard normal curve
(πœ‡=0, 𝜎=1)

A
  • Symmetric about its mean μ = 0 (with σ = 1)
  • Mean = Median = Mode
  • Single peak at z = 0
  • Inflection points at z = -1 and z = +1
  • Area under the curve = 1
  • Area to the left of the mean (μ = 0) = area to the right = ½
  • Follows the Empirical Rule
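These properties can be checked numerically in R. A small sketch; the second-derivative formula (z² - 1)·φ(z) used for the inflection points is a standard calculus result supplied here, not something stated on the card.

```r
# Total area under the standard normal curve (should be 1)
total_area <- integrate(dnorm, lower = -Inf, upper = Inf)$value

# Area to the left of the mean (0) equals the area to the right: 1/2
left_half <- pnorm(0)

# Symmetry: the density at -z equals the density at +z
dnorm(-1.3) - dnorm(1.3)

# Inflection points: the second derivative of the density,
# (z^2 - 1) * dnorm(z), changes sign at z = -1 and z = +1
second_deriv <- function(z) (z^2 - 1) * dnorm(z)
signs <- sign(second_deriv(c(-1.1, -0.9, 0.9, 1.1)))
signs  # 1 -1 -1 1
```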
6
Q

Normalizing Data

Computing z-values

Excel and R

A
  • Excel: =STANDARDIZE(data value, mean, sd)
  • R: scale(data, center = mean, scale = sd) (returns a matrix; wrap in as.vector() for a plain vector)
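A sketch of `scale()` in R on a hypothetical vector of scores. Passing `center` and `scale` mirrors Excel's STANDARDIZE; leaving them at their defaults standardizes with the sample mean and sample sd instead.

```r
# Hypothetical SAT-style scores (illustrative data, not from the lesson)
scores <- c(450, 500, 600, 700)

# z-values using a known mean and sd (like Excel's STANDARDIZE)
z_known <- as.vector(scale(scores, center = 500, scale = 100))
z_known  # -0.5 0.0 1.0 2.0

# With the defaults, scale() uses the sample mean and sample sd of the data
z_sample <- as.vector(scale(scores))
mean(z_sample)  # (essentially) 0
sd(z_sample)    # 1
```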
7
Q

Computing p-values
P(Z < z)

Excel and R

A

'Left' Area (Probability) under Standard Normal Curve
- Excel: =NORMSDIST(z-value) (standard normal CDF; newer Excel: =NORM.S.DIST(z-value, TRUE))
- R: pnorm(z-value) (cumulative distribution function, CDF)

'Right' Area (Probability) under Standard Normal Curve
- Excel: =1-NORMSDIST(z-value)
- R: 1 - pnorm(z-value)

'In Between' Area (Probability) under Standard Normal Curve
- Excel: =NORMSDIST(high z) - NORMSDIST(low z)
- R: pnorm(high z) - pnorm(low z)
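The three areas in R for an illustrative z = 1.5 (the value is an assumption). `lower.tail = FALSE` is an alternative to `1 - pnorm()` for the right area.

```r
z <- 1.5

left    <- pnorm(z)                      # P(Z < 1.5)
right   <- 1 - pnorm(z)                  # P(Z > 1.5)
right2  <- pnorm(z, lower.tail = FALSE)  # same right area, computed directly
between <- pnorm(1.5) - pnorm(-0.5)      # P(-0.5 < Z < 1.5)

round(c(left = left, right = right, between = between), 4)
```

For very large z, `pnorm(z, lower.tail = FALSE)` keeps more precision than `1 - pnorm(z)`, which can round to 0.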

8
Q

Converting p-value ('left' area) to z-value

Excel and R

A
  • Excel: =NORMSINV(p-value) (newer Excel: =NORM.S.INV(p-value))
  • R: qnorm(p-value) (quantile function, the inverse of pnorm)
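A few `qnorm()` calls in R; the probabilities are illustrative. The round trip at the end shows that `qnorm()` undoes `pnorm()`.

```r
# Convert left areas (probabilities) back to z-values
qnorm(0.8413)  # close to 1.0
qnorm(0.975)   # about 1.96, the familiar 95% two-sided cutoff
qnorm(0.5)     # 0: half the area lies left of the mean

# qnorm() is the inverse of pnorm(): a round trip returns the original z
z <- -0.75
round_trip <- qnorm(pnorm(z))
```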
9
Q

Relative Frequency
(Histogram)

A
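  • Relative frequency = frequency of a bin / total frequency (relative frequencies sum to 1)
  • Density histogram: y-axis = relative frequency / bin width
  • Total area of the density histogram bars = 1.0 (100%)
  • A density histogram approximates the probability density function (PDF)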
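A sketch in R on simulated data (the sample is an assumption, generated with `rnorm()`), checking that relative frequencies sum to 1 and that the density bars have total area 1.

```r
set.seed(1)       # reproducible illustrative data
x <- rnorm(500)   # assumed sample, not from the lesson

# Bin edges every 0.5 that are guaranteed to span the data
h <- hist(x, breaks = seq(floor(min(x)), ceiling(max(x)), by = 0.5), plot = FALSE)

rel_freq  <- h$counts / sum(h$counts)     # relative frequencies
bin_width <- diff(h$breaks)
area      <- sum(h$density * bin_width)   # total bar area of the density histogram

sum(rel_freq)  # 1
area           # 1
# hist(x, freq = FALSE) draws the density version of the histogram
```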
10
Q

Uniform Distribution Function

A
  • Every possible outcome in the sample space has the same probability
  • e.g. each of the 6 sides of a fair die has probability 1/6 (a discrete uniform distribution)
  • R: runif(n, min = 0, max = 1) generates n random values from a continuous uniform distribution
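A short R sketch contrasting the two cases on the card. A die is a discrete uniform, so it is simulated here with `sample()`; `runif()` draws continuous values. The sample sizes and seed are arbitrary choices.

```r
set.seed(2)

# Discrete uniform: rolls of a fair six-sided die (each face has probability 1/6)
rolls <- sample(1:6, size = 60000, replace = TRUE)
props <- as.vector(table(rolls)) / length(rolls)
round(props, 3)  # each close to 0.167

# Continuous uniform: runif(n, min, max); defaults are min = 0, max = 1
u <- runif(n = 1000, min = 0, max = 1)
range(u)  # all values fall inside [0, 1]
```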
11
Q

Continuous Distributions

types (12)

A
  1. Uniform
  2. Normal
  3. Chi Square
  4. Fisher’s F
  5. Student’s t
  6. Gamma
  7. Exponential
  8. Beta
  9. Cauchy
  10. Lognormal
  11. Logistic
  12. Weibull
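In R, each of these distributions has a density function in the stats package, with matching `p*` (CDF), `q*` (quantile) and `r*` (random) versions. A sketch that lists the names and checks they exist; the mapping from distribution to function name is supplied here, not taken from the card.

```r
# Density functions in R's stats package for the 12 continuous distributions
continuous_density <- c(
  Uniform = "dunif", Normal = "dnorm", ChiSquare = "dchisq", F = "df",
  t = "dt", Gamma = "dgamma", Exponential = "dexp", Beta = "dbeta",
  Cauchy = "dcauchy", Lognormal = "dlnorm", Logistic = "dlogis",
  Weibull = "dweibull"
)

# Every density function exists
sapply(continuous_density, exists)

# The matching random-number generators (d -> r prefix) exist too
random_gen <- sub("^d", "r", continuous_density)
sapply(random_gen, exists)

# Example: exponential density (rate 1) at x = 0 is 1
dexp(0, rate = 1)
```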
12
Q

Discrete Distributions

types (5)

A
  1. Binomial
  2. Poisson
  3. Hypergeometric
  4. Negative Binomial
  5. Wilcoxon rank sum statistic (R: wilcox)
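R has a probability mass function for each of these; the argument values below are illustrative assumptions. This reads "Wilcox" as R's Wilcoxon rank sum statistic distribution (`dwilcox`).

```r
# Probability mass functions in R for the discrete distributions
p_binom <- dbinom(3, size = 10, prob = 0.5)    # P(3 heads in 10 fair coin flips)
p_pois  <- dpois(0, lambda = 2)                # P(0 events) when the mean count is 2
p_hyper <- dhyper(1, m = 4, n = 6, k = 3)      # P(1 success in 3 draws without
                                               # replacement from 4 successes + 6 failures)
p_nb    <- dnbinom(2, size = 3, prob = 0.5)    # P(2 failures before the 3rd success)
p_wilc  <- dwilcox(0, m = 2, n = 2)            # P(W = 0), rank sum statistic, sizes 2 and 2

c(p_binom, p_pois, p_hyper, p_nb, p_wilc)
```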
13
Q

Normal Distribution

properties

A
  • Symmetric about its mean μ
  • Mean = Median = Mode
  • Single peak at x = μ
  • Inflection points at x = μ - σ and x = μ + σ
  • Area under the curve = 1
  • Area to the left of the mean (μ) = area to the right = ½
  • Follows the Empirical Rule
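The Empirical Rule holds for any mean and sd, which a short R sketch can confirm; μ = 50 and σ = 8 are arbitrary assumed values.

```r
mu <- 50     # illustrative mean (assumed)
sigma <- 8   # illustrative standard deviation (assumed)

# Area within k standard deviations of the mean, for k = 1, 2, 3
k <- 1:3
area_within <- pnorm(mu + k * sigma, mean = mu, sd = sigma) -
               pnorm(mu - k * sigma, mean = mu, sd = sigma)
round(area_within, 4)  # 0.6827 0.9545 0.9973: the 68-95-99.7 rule

# Half the area lies on each side of the mean, whatever mu and sigma are
pnorm(mu, mean = mu, sd = sigma)  # 0.5
```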
14
Q

Density Distribution

R function

A

dnorm(x, mean = 0, sd = 1): height of the normal density curve at x
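A sketch of what `dnorm()` returns: a curve height, not a probability. The SAT parameters (mean 500, sd 100) come from the earlier example; the other inputs are illustrative.

```r
# Height of the density curve at a point
peak     <- dnorm(0)                          # peak of the standard normal, 1/sqrt(2*pi)
sat_590  <- dnorm(600, mean = 500, sd = 100)  # height at x = 600 on the SAT curve

# Probabilities are areas under the curve: integrate the density
area_pm1 <- integrate(dnorm, lower = -1, upper = 1)$value  # same as pnorm(1) - pnorm(-1)

# curve(dnorm(x), from = -4, to = 4) would draw the bell curve
```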

15
Q

Random Numbers Distribution

A
  • R: rnorm(n = ?, mean = ?, sd = ?)
  • Generates n random numbers from a normal distribution with the specified mean and sd
  • If n is large enough, the histogram looks bell-shaped and the QQ plot is close to a straight line
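A sketch in R: draw 1000 values with `rnorm()`, check the sample mean and sd, and measure how straight the QQ plot is via the correlation of its coordinates. The seed and sizes are arbitrary assumptions; plotting calls are left as comments.

```r
set.seed(42)  # make the random draws reproducible
x <- rnorm(n = 1000, mean = 500, sd = 100)

mean(x)  # close to 500
sd(x)    # close to 100

# QQ check without drawing: correlation between theoretical and sample quantiles
q <- qqnorm(x, plot.it = FALSE)
cor(q$x, q$y)  # close to 1 for normal data

# Plots: hist(x); qqnorm(x); qqline(x)
```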
16
Q

Testing Normality

Histogram

A
  • Should look like a bell-shaped curve if the data are normally distributed
  • The shape changes with the bin size, so it is not a precise test
  • There is no single "right" bin size
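A sketch of the bin-size problem: the same simulated sample (an assumption) gives differently shaped histograms with bin widths of 1 and 0.25.

```r
set.seed(7)
x <- rnorm(200)  # illustrative normal sample (assumed)

lo <- floor(min(x))
hi <- ceiling(max(x))

# Same data, two bin widths
coarse <- hist(x, breaks = seq(lo, hi, by = 1),    plot = FALSE)
fine   <- hist(x, breaks = seq(lo, hi, by = 0.25), plot = FALSE)

coarse$counts  # few wide bins: smooth but coarse shape
fine$counts    # many narrow bins: jagged shape

# Both describe the same 200 values
c(sum(coarse$counts), sum(fine$counts))
```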
17
Q

Testing Normality

Quantile-Quantile (QQ) Plot

A
  • Sorted data are plotted against the quantiles of a theoretical normal distribution
  • If the points fall close to a straight line, the data are approximately normally distributed
  • X-axis: theoretical quantiles (z-values) of the standard normal distribution
  • Y-axis: sorted data

Procedure:
1. Sort the data
2. Divide the standard normal curve into n+1 portions of equal area
3. Take the z-value at the boundary of each portion (n z-values)
4. Plot the z-values on the x-axis against the sorted data on the y-axis
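The procedure above, step by step in R on a simulated sample (seed, size and parameters are assumptions). The correlation is a numeric stand-in for eyeballing straightness.

```r
set.seed(3)
y <- rnorm(20, mean = 10, sd = 2)  # illustrative sample (assumed)

n <- length(y)
sorted <- sort(y)          # 1. sort the data
p <- (1:n) / (n + 1)       # 2. n + 1 equal-area portions give n cut points
z <- qnorm(p)              # 3. z-value at each cut point

# 4. theoretical z-values on the x-axis, sorted data on the y-axis
# plot(z, sorted); abline(lm(sorted ~ z))
cor(z, sorted)  # near 1 suggests the points lie close to a straight line
```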

18
Q

QQ Plot

Calculating z-values

A
  1. Assign a rank i to each sorted point (i = 1, 2, ..., n)
  2. p-value for each point = i / (n+1)
  3. Convert the p-values to z-values (Excel: =NORMSINV(p), R: qnorm(p))
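A worked example of these three steps for n = 4 (an illustrative size).

```r
n <- 4               # four sorted data points (illustrative)
i <- 1:n             # rank of each sorted point
p <- i / (n + 1)     # 0.2 0.4 0.6 0.8
z <- qnorm(p)        # Excel equivalent: =NORMSINV(p)
round(z, 4)          # -0.8416 -0.2533  0.2533  0.8416
```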
19
Q

QQ Plot generation

R

A

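qqnorm(vector)   # QQ plot of the data against standard normal quantiles
qqline(vector)   # adds a reference line through the first and third quartiles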
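R's `qqnorm()` does not use i/(n+1) exactly: it uses `ppoints(n)`, which is (i - 0.5)/n when n > 10 and (i - 3/8)/(n + 1/4) otherwise, so its x-values differ slightly from the hand method. A sketch that confirms this on a simulated sample (an assumption).

```r
set.seed(5)
y <- rnorm(15)  # illustrative sample

q <- qqnorm(y, plot.it = FALSE)  # q$x: theoretical quantiles, q$y: the data

# qqnorm pairs each data value with qnorm(ppoints(n)) in rank order
manual_x <- qnorm(ppoints(length(y)))[order(order(y))]
all.equal(q$x, manual_x)

# ppoints(n) is (i - 0.5) / n when n > 10
all.equal(ppoints(15), ((1:15) - 0.5) / 15)

# Interactive use: qqnorm(y); qqline(y)
```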

20
Q

R commands

sort column values

ascending

A

table[order(table$column),]
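A sketch on a small hypothetical data frame; the trailing comma inside `[ , ]` keeps all columns.

```r
# Small illustrative table (assumed data)
scores <- data.frame(student = c("A", "B", "C", "D"),
                     sat     = c(620, 480, 550, 700))

asc  <- scores[order(scores$sat), ]                     # ascending by sat
desc <- scores[order(scores$sat, decreasing = TRUE), ]  # descending by sat
asc
```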

21
Q

R commands

histogram frequency counts

A
  • variable = hist(table$column, breaks = seq(start, end, interval))
  • variable$counts
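A sketch with hypothetical values; `plot = FALSE` returns the counts without drawing. R's bins are right-closed by default, so a value equal to an edge (700 here) goes into the bin ending at that edge.

```r
sat <- c(420, 480, 510, 530, 560, 610, 640, 700)  # illustrative values

# seq(start, end, interval) gives the bin edges
h <- hist(sat, breaks = seq(400, 800, 100), plot = FALSE)
h$breaks  # 400 500 600 700 800
h$counts  # 2 3 3 0
```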
22
Q

R commands

cumulative sum linear plot

A

- cumsum() returns the running total of the vector values
- plot(cumsum(variable$counts), type = "l")
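A sketch with hypothetical per-bin counts.

```r
counts <- c(2, 3, 3, 0)  # e.g. per-bin counts from hist()$counts
running <- cumsum(counts)
running                  # 2 5 8 8
# plot(running, type = "l") draws the cumulative counts as a line
```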

23
Q

R commands

calculate boxplot stats

A
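  • variable = boxplot(table$column)
  • Outliers = variable$out
  • Lower whisker (minimum, excluding outliers) = variable$stats[1]
  • 1st Quartile (lower hinge) = variable$stats[2]
  • Median = variable$stats[3]
  • 3rd Quartile (upper hinge) = variable$stats[4]
  • Upper whisker (maximum, excluding outliers) = variable$stats[5]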
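A sketch on hypothetical data with one extreme value, showing why `stats[5]` is the upper whisker rather than the maximum.

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)  # illustrative data with one extreme value

b <- boxplot(x, plot = FALSE)  # same statistics as the drawn boxplot
b$stats[, 1]  # 1.0 3.0 5.5 8.0 9.0 (whisker, hinge, median, hinge, whisker)
b$out         # 100: beyond 1.5 * IQR from the hinges, so flagged as an outlier
max(x)        # 100, but the upper whisker stops at 9
```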
24
Q

R command

filtering results

A

new.variable = (table$column > low.value) & (table$column < high.value)   # logical vector (TRUE/FALSE)
table[new.variable, ]   # rows that pass the filter
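A sketch on a small hypothetical data frame: the comparison builds a TRUE/FALSE vector, and indexing rows with it keeps the matches. The 450 to 600 bounds echo the SAT example.

```r
scores <- data.frame(student = c("A", "B", "C", "D"),
                     sat     = c(620, 480, 550, 700))  # illustrative data

# Logical mask: TRUE where 450 < sat < 600
in_range <- (scores$sat > 450) & (scores$sat < 600)
in_range  # FALSE TRUE TRUE FALSE

# Use the mask to keep only the matching rows
kept <- scores[in_range, ]
kept
# subset(scores, sat > 450 & sat < 600) gives the same rows
```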