Week 2: Hypothesis Testing and Its Implications Flashcards
What question is the framework in this lecture focusing on?
Does the data meet the assumptions of parametric tests?
Answering this question determines whether our continuous data can be tested
with parametric or non-parametric tests
A normal distribution is a distribution that always has the same general shape, which is a
bell shape
A normal distribution curve is symmetric around
the mean μ
A normal distribution is defined by two parameters - (2)
the mean (μ) and the standard deviation (σ).
Many statistical tests (parametric) cannot be used if the data is not
normally distributed
What does this diagram show? - (2)
μ = 0 is the peak of the distribution
Blocked areas under the curve give insight into how data are distributed and how likely certain scores are if they belong to a normal distribution, e.g., 34.1% of values lie between the mean and one SD below it
A z score in a standard normal distribution reflects the number of
SDs a particular score lies above or below the mean
How to calculate a z score?
Take a participant's value (e.g., 56 years old), subtract the mean of the distribution (e.g., the class mean age of 23), and divide by the SD (e.g., 2): z = (X - μ) / σ
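A minimal sketch of this calculation in Python, using the hypothetical example numbers from the card above (participant aged 56, class mean age 23, SD 2):

```python
# z score = (value - mean) / SD: how many SDs a score lies above (+) or below (-) the mean
def z_score(value, mean, sd):
    return (value - mean) / sd

print(z_score(56, 23, 2))  # (56 - 23) / 2 = 16.5 SDs above the mean
```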
If a person scored a 70 on a test with a mean of 50 and a standard deviation of 10
Converting the test scores to z scores, an X of 70 would be…
What the result means… - (2)
z = (70 - 50)/10 = 2
a z score of 2 means the original score was 2 standard deviations above the mean
We can convert our z scores to
percentiles
Example: What is the percentile rank of a person receving a score of 90 on the test? - (3)
Mean - 80
SD = 5
First calculate the z score: the graph shows that most people scored below 90. Since 90 is 2 standard deviations above the mean, z = (90 - 80)/5 = 2
The z score can be converted to a percentile using a z table: a z score of 2 is equivalent to the 97.7th percentile
The proportion of people scoring below 90 is thus .977, and the proportion scoring above 90 is 2.3% (1 - 0.977)
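A short sketch of the same conversion, assuming SciPy is available; norm.cdf gives the proportion of a standard normal distribution falling below a given z score:

```python
from scipy.stats import norm

mean, sd, score = 80, 5, 90
z = (score - mean) / sd       # = 2.0
below = norm.cdf(z)           # proportion scoring below 90, ~0.977 (97.7th percentile)
above = 1 - below             # proportion scoring above 90, ~0.023
print(z, round(below, 3), round(above, 3))
```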
We cannot always measure the whole… for a study
population
What is the sample mean?
an unbiased estimate of the population mean.
Example of sample vs population - (3)
You want to study political attitudes in young people.
Your population is the 300,000 undergraduate students in the Netherlands.
Because it’s not practical to collect data from all of them, you use a sample of 300 undergraduate volunteers from three Dutch universities – this is the group who will complete your online survey.
How can we know whether our sample mean estimate is representative of the population mean?
By computing the standard error of the mean (SEM); the smaller the SEM, the better
What does this diagram show you? - (2)
If you take several samples from the same population,
each sample has its own mean, and some sample means will differ from the population mean; the variability of sample means around the population mean (the error) is known as the SEM
What is sample variation and example - (2)
samples will vary because they contain different members of the population;
a sample that, by chance, includes some very good lecturers will have a higher average (higher rating of all lectures) than a sample that, by chance, includes some awful lecturers.
Standard deviation is used as a measure of how
representative the mean was of the observed data.
Small standard deviations represented a scenario in which
most data points were close to the mean
Large standard deviation represented a situation in which data points were
widely spread from the mean.
How to calculate the standard error of mean?
computed by dividing the standard deviation of the sample by the square root of the number of observations in the sample: SEM = SD / √N
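A minimal sketch of this computation with NumPy, using a small made-up sample of ages:

```python
import numpy as np

sample = np.array([23, 25, 21, 22, 27, 24, 26, 23, 22, 25])  # hypothetical ages
sd = sample.std(ddof=1)              # sample standard deviation
sem = sd / np.sqrt(len(sample))      # SEM = SD / sqrt(N)
print(round(sd, 2), round(sem, 2))
```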
The larger the sample the smaller the - (2)
standard error of the mean
more confident we can be that the sample mean is representative of the population.
The central limit theorem proposes that
as samples get large (usually defined as greater than 30), the sampling distribution of the mean is a normal distribution with a mean equal to the population mean and an SD equal to the SEM
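A small simulation sketch of this idea (the population and sample sizes are assumptions for illustration, not from the lecture): drawing repeated samples of size 50 from a clearly non-normal population, the sample means come out approximately normal with mean ≈ population mean and SD ≈ SEM:

```python
import numpy as np

rng = np.random.default_rng(0)
population = rng.exponential(scale=2.0, size=100_000)  # skewed, non-normal population

n = 50                                                 # sample size > 30
sample_means = [rng.choice(population, size=n).mean() for _ in range(2_000)]

print(population.mean(), np.mean(sample_means))             # means match closely
print(population.std() / np.sqrt(n), np.std(sample_means))  # SD of sample means ~ SEM
```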
The standard deviation of sample means is known as the
SEM (standard error of the mean)
A different approach to assessing the accuracy of the sample mean as an estimate of the population mean, aside from the SE, is to - (2)
calculate boundaries, a range of values within which we believe the true value of the population mean will fall.
Such boundaries are called confidence intervals.
Confidence intervals are created by
samples
A 95% confidence interval is constructed such that
95% of these intervals (created by samples) will contain the population mean
95% confidence intervals for 100 samples (one CI constructed for each) would mean that
for 95 of these samples, the confidence interval we constructed would contain the true value of the population mean.
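A simulation sketch of this interpretation (the population mean of 100, SD of 15, and sample size of 40 are assumptions for illustration, not from the lecture): constructing a 95% CI for each of 100 samples, roughly 95 intervals should contain the population mean:

```python
import numpy as np

rng = np.random.default_rng(1)
pop_mean, pop_sd, n = 100, 15, 40

hits = 0
for _ in range(100):
    sample = rng.normal(pop_mean, pop_sd, n)
    sem = sample.std(ddof=1) / np.sqrt(n)
    lb, ub = sample.mean() - 1.96 * sem, sample.mean() + 1.96 * sem
    hits += (lb <= pop_mean <= ub)

print(hits, "of 100 intervals contain the population mean")  # typically around 95
```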
In fact, for a specific confidence interval, the probability that it contains the population value is either - (2)
0 (it does not contain it) or 1 (it does contain it).
You have no way of knowing which it is.
Diagram shows - (4)
- Dots show the means for each sample
- Lines sticking out represent the CI for each sample mean
- A vertical line down the plot represents the population mean
- If confidence intervals don't overlap, this suggests a significant difference between the sample means
If our sample means were normally distributed with a mean of 0 and a
standard error of 1, then the limits of our confidence interval
would be -1.96 and +1.96
95% of z scores fall between
-1.96 and 1.96
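These cut-offs can be checked directly, assuming SciPy is available:

```python
from scipy.stats import norm

print(norm.ppf(0.975))                   # ~1.96: z below which 97.5% of scores fall
print(norm.cdf(1.96) - norm.cdf(-1.96))  # ~0.95: proportion of z scores between -1.96 and +1.96
```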
Confidence intervals can be constructed for any estimated parameter, not just
μ - mean
If the mean represents the true mean well, then the confidence interval of that mean should be
small
If the confidence interval is very wide, then the sample mean could be
very different from the true mean, indicating that it is a bad representation of the population
Remember that the standard error of the mean gets smaller with the number of observations, and thus our confidence interval also gets
smaller - this makes sense, as the more we measure, the more certain we are that the sample mean is close to the population mean
Calculating confidence intervals (for observations) - rearranging the z formula
95% of scores fall between z = -1.96 (lower bound) and z = +1.96 (upper bound)
LB = sample mean - (1.96 × sample SD)
UB = sample mean + (1.96 × sample SD)
Calculating Confidence Intervals for sample means - rearranging in z formula
LB = Mean - (1.96 * SEM)
UB = Mean + (1.96 * SEM)
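A minimal sketch of both rearranged formulas as Python functions (the function names are illustrative, not from the lecture):

```python
def ci_for_observations(mean, sd, z=1.96):
    """Range expected to contain ~95% of individual observations."""
    return mean - z * sd, mean + z * sd

def ci_for_sample_mean(mean, sem, z=1.96):
    """95% confidence interval for a sample mean, given its SEM."""
    return mean - z * sem, mean + z * sem
```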
The standard deviation of SAT verbal scores in a school system is known to be 100. A researcher wishes to estimate the mean SAT score and compute a 95% confidence interval from a random sample of 10 scores.
The 10 scores are: 320, 380, 400, 420, 500, 520, 600, 660, 720, and 780.
Calculate CI
* M = 530
* N = 10
* SEM = 100 / √10 = 31.62
* The value of z for a 95% CI is the number of SDs one must go from the mean (in both directions) to contain 0.95 of the scores
* The value 1.96 is found in a z table
* Since each tail is to contain 0.025 of the scores, you find the value of z below which 1 - 0.025 = 0.975 of the scores fall
* 95% of z scores lie between -1.96 and +1.96
* Lower limit = 530 - (1.96) (31.62) = 468.02
* Upper limit = 530 + (1.96)(31.62) = 591.98
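A short check of this worked example with NumPy, reproducing the limits above:

```python
import numpy as np

scores = np.array([320, 380, 400, 420, 500, 520, 600, 660, 720, 780])
m = scores.mean()                  # 530
sem = 100 / np.sqrt(len(scores))   # known population SD of 100, so SEM ~ 31.62
lb, ub = m - 1.96 * sem, m + 1.96 * sem
print(m, round(sem, 2), round(lb, 2), round(ub, 2))  # 530.0 31.62 468.02 591.98
```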
Null hypothesis is that there is
no effect of the predictor variable on the outcome variable
The alternate hypothesis is that there is an effect of
the predictor variable on the outcome variable
Null hypothesis significance testing computes the probability of obtaining the observed data if the null hypothesis were true, which is referred to as the
p-value
To test the fit of statistical models to test our hypotheses, we calculate
the probability of getting that model (data) if the null hypothesis H0 were true (statistical significance)
What if the probability (p-value) is small?
we conclude the model fits the data well (explains a lot of the variance) and we gain confidence in the alternative hypothesis H1
Steps in Hypothesis testing (6)
- specify the null hypothesis H0 and the alternative hypothesis H1
- select a significance level. Typically the 0.05 or the 0.01 level.
- calculate a statistic analogous to the parameter specified by the null hypothesis. (e.g. if null defined by parameter μ1- μ2 (diff between two means) then the statistic is M1-M2 (difference between sample means))
- calculate the probability of obtaining a statistic (computed from the data) as different or more different from the parameter specified in the null hypothesis (often 0, or based on past evidence that the mean stays the same)
- probability value computed in Step 4 is compared with the significance level chosen in Step 2.
- If the outcome is statistically significant, then the null hypothesis is rejected in favor of the alternative hypothesis.
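A sketch of these steps with a made-up two-group example (the data and the use of an independent-samples t-test are assumptions for illustration, not from the lecture):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Step 1: H0: mu1 - mu2 = 0; H1: mu1 - mu2 != 0 (hypothetical two-group study)
group1 = rng.normal(52, 10, 30)
group2 = rng.normal(48, 10, 30)

alpha = 0.05                               # Step 2: significance level
diff = group1.mean() - group2.mean()       # Step 3: statistic analogous to mu1 - mu2
t, p = stats.ttest_ind(group1, group2)     # Step 4: probability of a result this extreme if H0 were true

# Steps 5-6: compare p with alpha and decide
print(round(diff, 2), round(t, 2), round(p, 4),
      "reject H0" if p < alpha else "fail to reject H0")
```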
Think of the test statistic as capturing
signal/noise
In hypothesis testing, we use a
test statistic for which the frequency of particular values is known (t, F, chi-square), and thus we can calculate the
probability of obtaining a certain value, i.e., the p-value.