Lecture 5 REVISED Flashcards
distribution = ?
a collection of data/scores
how are the values of a distribution ordered?
commonly ordered (e.g., smallest to largest)
steps to building a discrete probability distribution
- define the random variable
- identify values for the random variable
- assign probabilities to values of the random variable
what are the two general requirements for discrete distributions?
every probability is greater than or equal to 0
the sum of all probabilities must equal 1
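A minimal Python sketch of the two requirements above. The function name, the tolerance, and the coin-flip probabilities are illustrative assumptions, not from the lecture.

```python
import math

def is_valid_discrete_distribution(probabilities, tol=1e-9):
    """Return True if the probabilities satisfy both requirements."""
    # Requirement 1: every probability is greater than or equal to 0.
    all_non_negative = all(p >= 0 for p in probabilities)
    # Requirement 2: the probabilities sum to 1 (within a small float tolerance).
    sums_to_one = math.isclose(sum(probabilities), 1.0, abs_tol=tol)
    return all_non_negative and sums_to_one

# Hypothetical example: number of heads in two fair coin flips (0, 1 or 2).
print(is_valid_discrete_distribution([0.25, 0.50, 0.25]))   # True
print(is_valid_discrete_distribution([0.50, 0.60, -0.10]))  # False (negative probability)
```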
how to calculate expected value/mean of a random variable?
multiply each value of the random variable by its probability, then sum the products
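A short Python sketch of the expected value calculation: each value is weighted by its probability and the products are summed. The coin-flip distribution is a hypothetical example.

```python
def expected_value(values, probabilities):
    """Weight each value by its probability and sum the products."""
    return sum(x * p for x, p in zip(values, probabilities))

# Hypothetical example: number of heads in two fair coin flips.
values = [0, 1, 2]
probabilities = [0.25, 0.50, 0.25]

mean = expected_value(values, probabilities)
print(mean)  # 1.0  (0*0.25 + 1*0.50 + 2*0.25)
```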
how to calculate the variance of a random variable?
- subtract the mean from each value of the random variable
- square the difference
- multiply the squared difference by that value's probability
- sum up all the squared deviations*probabilities
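The variance steps can be sketched in Python as follows. The function name and the coin-flip distribution are assumptions for illustration. The standard deviation line is an extra, added only because it is the square root of the variance.

```python
import math

def variance(values, probabilities):
    """Probability-weighted sum of squared deviations from the mean."""
    mean = sum(x * p for x, p in zip(values, probabilities))
    # Deviation from the mean, squared, then weighted by the probability.
    return sum((x - mean) ** 2 * p for x, p in zip(values, probabilities))

# Hypothetical example: number of heads in two fair coin flips (mean = 1).
values = [0, 1, 2]
probabilities = [0.25, 0.50, 0.25]

var = variance(values, probabilities)
print(var)                       # 0.5  (1*0.25 + 0*0.50 + 1*0.25)
print(round(math.sqrt(var), 4))  # 0.7071 (standard deviation)
```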
normal distribution
most popular distribution
represents many natural phenomena
bell-shaped curve, symmetrical about the mean
what is the highest point on a normal distribution curve?
the mean (which is also the mode & median)
what is standard normal distribution?
special normal distribution with a mean of 0 and a standard deviation of 1
standard normal distribution is centred at 0 and has intervals that increase by 1
each number on horizontal axis is a z-score
what do z-scores tell us?
tells us how many standard deviations a data point is from the mean
e.g., z-score -2 is 2 standard deviations to the left of the mean 0
what does a z-score table tell us?
tells us the total amount of area contained to the left of z
how do you calculate the area to the right of z?
1 - (area to the left of z, read from the z-score table)
e.g., z = 0.57, 0.7157 area to the left, 1-0.7157=0.2843 area to the right
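The table lookup in the example above can be reproduced with Python's standard library, where `NormalDist().cdf(z)` plays the role of the z-score table (area to the left of z). A minimal sketch:

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

z = 0.57
area_left = standard_normal.cdf(z)  # area to the left of z (the table value)
area_right = 1 - area_left          # total area under the curve is 1

print(round(area_left, 4))   # 0.7157
print(round(area_right, 4))  # 0.2843
```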
how do you calculate z-scores for a normal distribution?
z = (x-mean)/standard deviation
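A small Python sketch of the z-score formula. The exam-score numbers (mean 70, standard deviation 10) are hypothetical.

```python
def z_score(x, mean, standard_deviation):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / standard_deviation

# Hypothetical exam scores with mean 70 and standard deviation 10.
print(z_score(85, 70, 10))  # 1.5  (1.5 standard deviations above the mean)
print(z_score(50, 70, 10))  # -2.0 (2 standard deviations below the mean)
```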
two ways standard normal distribution can be used?
forward - from x, calculate z & find the probability/area associated with z
reverse - from probability/area, find z and calculate the data value x associated with that area
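Both directions can be sketched with Python's `statistics.NormalDist`: `cdf` gives the area to the left of z (forward), and `inv_cdf` gives the z for a given area (reverse). The mean of 70, standard deviation of 10, and the target area of 0.90 are assumed values for illustration.

```python
from statistics import NormalDist

mean, sd = 70, 10  # hypothetical normal distribution of exam scores
standard_normal = NormalDist()  # mean 0, standard deviation 1

# Forward: from x, calculate z, then find the area to the left of z.
x = 85
z = (x - mean) / sd
area_left = standard_normal.cdf(z)
print(z, round(area_left, 4))  # 1.5 0.9332

# Reverse: from an area, find z, then convert z back to a data value x.
target_area = 0.90
z_for_area = standard_normal.inv_cdf(target_area)
x_for_area = mean + z_for_area * sd  # rearranged z = (x - mean) / sd
print(round(z_for_area, 2), round(x_for_area, 1))  # 1.28 82.8
```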
rule 68-95-99.7 / empirical rule?
used to remember the percentage of values that lie within an interval
applies only to normal distributions
68.3% of data falls within 1 standard deviation of the mean
95% of the data falls within 2 standard deviations of the mean
99.7% of the data falls within 3 standard deviations of the mean
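The three percentages can be checked against the standard normal curve with a short Python sketch. The helper name is an assumption. Note that the exact figure for 2 standard deviations is closer to 95.4%, which the rule rounds to 95%.

```python
from statistics import NormalDist

standard_normal = NormalDist()

def share_within(k):
    """Area under the standard normal curve between z = -k and z = +k."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.1%}")
# within 1 standard deviation(s): 68.3%
# within 2 standard deviation(s): 95.4%
# within 3 standard deviation(s): 99.7%
```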
central limit theorem (CLT)?
as sample size increases, the sampling distribution of the sample mean increasingly resembles the bell shape of a normal distribution, whatever the shape of the population distribution
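A rough simulation sketch of the CLT in Python. It assumes a flat (uniform) population, which is not bell shaped, and checks what share of the sample means falls within 1 standard deviation of their own mean. For a bell curve that share is about 68%, and for single draws from a uniform population it is lower, at about 58%. The function names, the seed, and the number of samples are arbitrary choices.

```python
import random
from statistics import mean, stdev

def sample_means(sample_size, num_samples=2000, seed=0):
    """Draw many samples from a uniform(0, 1) population; return each sample's mean."""
    rng = random.Random(seed)
    return [
        sum(rng.random() for _ in range(sample_size)) / sample_size
        for _ in range(num_samples)
    ]

def share_within_one_sd(data):
    """Fraction of the data lying within 1 standard deviation of its mean."""
    centre = mean(data)
    spread = stdev(data)
    inside = sum(1 for value in data if abs(value - centre) <= spread)
    return inside / len(data)

# As the sample size grows, the share should move towards roughly 0.68.
for n in (1, 5, 30):
    print(n, round(share_within_one_sd(sample_means(n)), 3))
```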
law of large numbers (LLN)?
n > 30 is commonly treated as a ‘large’ sample size (a rule of thumb usually quoted for the CLT)
when an action is repeated many times, the average of the outcomes approaches the expected value (mean)
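A small simulation sketch of the LLN in Python, assuming a fair six-sided die (expected value 3.5). The function name and the seed are arbitrary. The average of the rolls should drift towards 3.5 as the number of rolls grows.

```python
import random

def mean_of_die_rolls(num_rolls, seed=0):
    """Average outcome of num_rolls simulated rolls of a fair six-sided die."""
    rng = random.Random(seed)
    total = 0
    for _ in range(num_rolls):
        total += rng.randint(1, 6)  # each face 1..6 is equally likely
    return total / num_rolls

# The expected value of one roll is (1+2+3+4+5+6)/6 = 3.5.
for n in (10, 100, 10_000, 100_000):
    print(n, mean_of_die_rolls(n))
```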
hypothesis = ?
a statement made about a characteristic of a particular population
two types of hypotheses?
null hypothesis - statement made, to be tested (H0)
alternative hypothesis - opposite to the null hypothesis (Ha)
status quo = ?
null hypothesis
what is a one/two tailed test?
one tailed = tests for an effect in one direction only (one tail of the distribution), directional
two tailed = tests for a difference in either direction (both tails of the distribution), non-directional
right/left tailed test?
right tailed = the hypothesis is tested in the upper (right) tail of the distribution
left tailed = the hypothesis is tested in the lower (left) tail of the distribution
significance level?
the threshold used to decide whether to reject the null hypothesis; it is the probability of rejecting the null hypothesis when it is actually true
(alpha) usually 0.05 (5%)
failure to reject a null hypothesis?
there is not enough evidence to reject the null hypothesis; this does not prove that the null hypothesis is true
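A hedged Python sketch that ties together significance level, tails, and the reject / fail-to-reject decision. It assumes a one-sample z test with a known population standard deviation, which these cards do not cover explicitly. The function name and all the numbers (null mean 70, standard deviation 10, sample of 36 with mean 73) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_test_decision(sample_mean, null_mean, population_sd, n, alpha=0.05, tail="two"):
    """Return (z, reject) for a one-sample z test at significance level alpha."""
    # Test statistic: distance of the sample mean from the H0 mean,
    # measured in standard deviations of the sample mean.
    z = (sample_mean - null_mean) / (population_sd / sqrt(n))
    standard_normal = NormalDist()
    if tail == "two":
        # alpha is split between both tails.
        reject = abs(z) > standard_normal.inv_cdf(1 - alpha / 2)
    elif tail == "right":
        reject = z > standard_normal.inv_cdf(1 - alpha)
    elif tail == "left":
        reject = z < standard_normal.inv_cdf(alpha)
    else:
        raise ValueError("tail must be 'two', 'right' or 'left'")
    return z, reject

for tail in ("two", "right"):
    z, reject = z_test_decision(73, 70, 10, 36, alpha=0.05, tail=tail)
    print(tail, round(z, 2), "reject H0" if reject else "fail to reject H0")
# two 1.8 fail to reject H0
# right 1.8 reject H0
```

The same data can lead to different decisions depending on the tail: at alpha = 0.05 the two-tailed cutoff is about 1.96, while the right-tailed cutoff is about 1.645.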