Distributions and scores Flashcards
Z-score
z = (x – μ) / σ
z = (x – μ) / σ
z-score, where:
x = the raw value being standardized
μ = the mean
σ = standard deviation
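A minimal sketch of the z-score formula in Python. The function name `z_score` and the example numbers are made up for illustration.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Return how many standard deviations x lies from the mean mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


# Hypothetical example: a score of 85 when the mean is 70 and sigma is 10.
print(z_score(85, 70, 10))  # 1.5
```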
Bell curve
graphic representation of a normal distribution
Empirical rule
The empirical rule, or the 68-95-99.7 rule, tells you where most of the values lie in a normal distribution:
Around 68% of values are within 1 standard deviation of the mean.
Around 95% of values are within 2 standard deviations of the mean.
Around 99.7% of values are within 3 standard deviations of the mean.
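The three percentages can be checked numerically. This is a rough sketch, assuming a normal distribution and the identity P(|Z| < k) = erf(k / √2); the helper name `share_within` is hypothetical.

```python
import math


def share_within(k: float) -> float:
    """Share of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    # Prints values close to 68%, 95% and 99.7%.
    print(f"within {k} SD: {share_within(k):.2%}")
```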
the 68-95-99.7 rule
aka the empirical rule:
Around 68% of values are within 1 standard deviation of the mean.
Around 95% of values are within 2 standard deviations of the mean.
Around 99.7% of values are within 3 standard deviations of the mean.
t-score formula
t = (x̄ – μ0) / (s / √n)
where:
x̄ = sample mean
μ0 = population mean (the hypothesized value being tested)
s = sample standard deviation
n = sample size
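A small sketch using the symbols defined on this card, assuming the usual one-sample form t = (x̄ – μ0) / (s / √n). The function name `t_score` and the sample data are invented for illustration.

```python
import math
import statistics


def t_score(sample, mu0):
    """One-sample t-score: (sample mean - mu0) / (s / sqrt(n))."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    if s == 0:
        raise ValueError("sample standard deviation is zero")
    return (x_bar - mu0) / (s / math.sqrt(n))


# Hypothetical measurements compared against a hypothesized mean of 5.0.
sample = [5.1, 4.9, 5.3, 5.2, 5.0]
print(round(t_score(sample, 5.0), 3))  # about 1.414
```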
2 conditions under which a t-score is chosen over a z-score
- small sample size (e.g. under 30 observations)
- the population standard deviation (σ) is unknown
degrees of freedom
n - 1
the number of independent pieces of information that went into calculating the estimate;
the number of values that are free to vary in a data set
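One way to see why n - 1 values are "free to vary": once the mean is fixed, the deviations from it must sum to zero, so the last deviation is determined by the others. A short sketch with made-up data:

```python
import statistics

data = [2, 4, 4, 6]  # hypothetical data set
n = len(data)
mean = statistics.mean(data)
deviations = [x - mean for x in data]

# The deviations sum to zero, so the last one follows from the first n - 1.
last_from_others = -sum(deviations[:-1])
print(last_from_others == deviations[-1])  # True

# The sample variance therefore divides by n - 1, as statistics.variance does.
sample_var = sum(d * d for d in deviations) / (n - 1)
print(abs(sample_var - statistics.variance(data)) < 1e-12)  # True
```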
What is the standard deviation equal to on a standard normal distribution?
1
How does the t-score change as n increases?
its magnitude goes up too (all else being equal), because the standard error s / √n shrinks
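A quick illustration, assuming the one-sample form t = (x̄ – μ0) / (s / √n) and holding x̄, μ0 and s fixed while only n changes. The helper `t_from_summary` and the numbers are hypothetical.

```python
import math


def t_from_summary(x_bar, mu0, s, n):
    """t-score computed from summary statistics instead of raw data."""
    return (x_bar - mu0) / (s / math.sqrt(n))


# Same sample mean (52), hypothesized mean (50) and s (10); only n varies.
for n in (4, 25, 100):
    print(n, round(t_from_summary(52, 50, 10, n), 2))
# The printed t-scores are 0.4, 1.0 and 2.0.
```

If the sample mean were below μ0, the t-score would be negative and would move further from zero as n grows.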
What does a Z-score of 0 say?
the value is exactly at the mean
what does a Z-score of 1 say?
exactly one standard deviation above the mean
where are the negative Z-scores, relative to the mean?
to the left of it
where are the positive Z-scores, relative to the mean?
to the right of it
formula to convert Z-score to T-score
T = (Z × 10) + 50
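The conversion is a simple rescaling, so that the mean lands on 50 and each standard deviation is worth 10 points. A minimal sketch; the function name `z_to_t` is made up.

```python
def z_to_t(z: float) -> float:
    """Convert a Z-score to a T-score on a scale with mean 50 and SD 10."""
    return z * 10 + 50


print(z_to_t(0))     # 50
print(z_to_t(1.5))   # 65.0
print(z_to_t(-2))    # 30
```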
Confidence levels: what does a 95% confidence level mean?
A 95 percent confidence level says that if the poll or survey were repeated over and over again, about 95 percent of the confidence intervals built from those samples would contain the true population value.
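The "repeated over and over" idea can be simulated. The sketch below assumes a normal population with a known standard deviation and counts how often a 95% interval around the sample mean contains the true mean. All names and parameter values (`coverage`, the population mean of 100, and so on) are hypothetical.

```python
import math
import random
from statistics import NormalDist


def coverage(true_mean=100.0, sigma=15.0, n=50, repeats=2000, level=0.95, seed=0):
    """Fraction of simulated confidence intervals that contain the true mean."""
    rng = random.Random(seed)
    # Two-sided critical value, e.g. about 1.96 for a 95% level.
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z_star * sigma / math.sqrt(n)
    hits = 0
    for _ in range(repeats):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        sample_mean = sum(sample) / n
        # The interval sample_mean ± half_width covers true_mean in this case.
        if abs(sample_mean - true_mean) <= half_width:
            hits += 1
    return hits / repeats


print(coverage())  # expected to be close to 0.95
```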
confidence interval graph
confidence coefficient
The confidence coefficient is the confidence level stated as a proportion, rather than as a percentage. For example, if you had a confidence level of 99%, the confidence coefficient would be .99.
The following table lists confidence coefficients and the equivalent confidence levels.
Confidence coefficient (1 – α) | Confidence level ((1 – α) × 100%)
0.90 | 90%
0.95 | 95%
0.99 | 99%
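As an illustration of how the coefficient is used, the sketch below turns each confidence coefficient into its percentage and into a two-sided critical z value, assuming a standard normal distribution. The helper name `critical_z` is hypothetical.

```python
from statistics import NormalDist


def critical_z(confidence_coefficient: float) -> float:
    """Two-sided critical z value for a given confidence coefficient (1 - alpha)."""
    if not 0 < confidence_coefficient < 1:
        raise ValueError("confidence coefficient must be between 0 and 1")
    alpha = 1 - confidence_coefficient
    return NormalDist().inv_cdf(1 - alpha / 2)


for c in (0.90, 0.95, 0.99):
    print(f"{c:.2f} -> {c:.0%} -> z* = {critical_z(c):.3f}")
```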
Chi-square analysis: type of data it is best applied on?
nominal
Chi-square statistic: essence
A chi-square (χ2) statistic is a test that measures how a model compares to actual observed data. The data used in calculating a chi-square statistic must be random, raw, mutually exclusive, drawn from independent variables, and drawn from a large enough sample.
Chi-square statistic: depends on three things
χ2 depends on:
- the size of the difference between expected and observed values
- the degrees of freedom
- the sample size.
what can a Chi-square statistic be used to test?
- χ2 can be used to test whether two variables are related or independent from one another.
- to test the goodness-of-fit between an observed distribution and a theoretical distribution of frequencies.
Formula of the Chi-square statistic
χ2 = Σ (O – E)² / E, where O = observed frequency and E = expected frequency
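A minimal sketch, assuming the standard Pearson form χ2 = Σ (O – E)² / E with O the observed and E the expected frequency in each category. The function name `chi_square` and the counts are invented for illustration.

```python
def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts in three categories, with 20 expected in each.
print(round(chi_square([18, 22, 20], [20, 20, 20]), 3))  # 0.4
```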
What is the Chi-square test for independence?
A χ2 test for independence can tell us how likely it is that random chance can explain any observed difference between the actual frequencies in the data and the frequencies we would expect if the two variables were independent.
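A rough sketch of the calculation behind a test for independence, assuming the usual approach: each cell's expected count is (row total × column total) / grand total, and the degrees of freedom are (rows - 1) × (columns - 1). The function name and the 2×2 table are hypothetical.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical survey: rows are two groups, columns are yes/no answers.
stat, dof = chi_square_independence([[30, 10], [20, 40]])
print(round(stat, 2), dof)  # 16.67 1
```

With 1 degree of freedom, the critical value at the 0.05 level is about 3.841, so a statistic this large would count against independence.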
What is the Chi-square goodness-of-fit test?
χ2 provides a way to test how well a sample of data matches the (known or assumed) characteristics of the larger population that the sample is intended to represent. If the sample data do not fit the expected properties of the population that we are interested in, then we would not want to use this sample to draw conclusions about the larger population.
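A small goodness-of-fit sketch, assuming the expected counts come from known population proportions and the degrees of freedom are the number of categories minus 1. The function name and the die-roll counts are made up.

```python
def goodness_of_fit(observed, proportions):
    """Return (chi-square statistic, degrees of freedom) against expected proportions."""
    if len(observed) != len(proportions):
        raise ValueError("observed and proportions must have the same length")
    total = sum(observed)
    expected = [total * p for p in proportions]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    dof = len(observed) - 1
    return stat, dof


# Hypothetical: 60 rolls of a die that is assumed to be fair (1/6 per face).
rolls = [8, 12, 10, 9, 11, 10]
stat, dof = goodness_of_fit(rolls, [1 / 6] * 6)
print(round(stat, 3), dof)
```

Here the statistic comes out at about 1.0 with 5 degrees of freedom, well below the 0.05 critical value of about 11.07, so these counts are consistent with a fair die.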
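null hypothesis
the default claim of no effect or no difference; what we want to NULLIFY (reject)
alternative hypothesis
the claim we are looking for evidence of; it is supported when the data lead us to reject the null hypothesis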