stats 7 Flashcards
first step of hypothesis testing
State the null hypothesis
second step of hypothesis testing
Set a critical value
third step of hypothesis testing
Calculate a test statistic.
fourth step of hypothesis testing
Compare the test statistic to the critical value
fifth step of hypothesis testing
Find the p-value
sixth step of hypothesis testing
Compare the p-value of your data to the significance level (alpha).
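The six steps above can be sketched in Python. This is a minimal illustration, assuming a two-tailed one-sample z-test with a known population standard deviation; the function name `one_sample_z_test`, the data, and the values `mu0=50` and `sigma=5` are made up for the example.

```python
from math import sqrt
from statistics import NormalDist, mean


def one_sample_z_test(sample, mu0, sigma, alpha=0.05):
    """Two-tailed one-sample z-test laid out as the six steps."""
    # Step 1: state the null hypothesis, H0: population mean == mu0.
    # Step 2: set the critical value for the chosen significance level.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Step 3: calculate the test statistic.
    z = (mean(sample) - mu0) / (sigma / sqrt(len(sample)))
    # Step 4: compare the test statistic to the critical value.
    beyond_critical = abs(z) > z_crit
    # Step 5: find the two-tailed p-value.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Step 6: compare the p-value to the significance level.
    reject_h0 = p_value < alpha
    return z, z_crit, p_value, beyond_critical, reject_h0


scores = [52, 55, 49, 58, 54, 56, 53, 57, 51, 55]  # made-up data
z, z_crit, p_value, beyond, reject = one_sample_z_test(scores, mu0=50, sigma=5)
print(f"z={z:.2f}  critical={z_crit:.2f}  p={p_value:.4f}  reject H0: {reject}")
```

For a two-tailed test, steps 4 and 6 always lead to the same decision; they are two views of the same comparison.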
Choose a difference in means test when:
1. you are testing 2 variables,
2. the independent variable is categorical and the dependent variable is numeric,
3. the numeric dependent variable is normally distributed, and
4. you are interested in the difference in the average values of the dependent variable across the categories of the independent variable
significance level, alpha (α)
the probability of rejecting the null hypothesis when it is actually true; it is the threshold for statistical significance
difference in means testing relies on the Student’s
t-distribution
The Student’s t-distribution is
the distribution of the values that the differences in sample means can take.
The Student’s t-distribution is generally
bell-shaped, like the normal distribution
However, when the sample size is small (fewer than 30 observations), the Student’s t-distribution shows
more variability (i.e., it is flatter, with heavier tails) than the normal distribution.
Independent sample t-test
a statistical test that compares the means of two independent groups to see if there is a significant difference
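A minimal sketch of the independent samples t statistic, assuming the pooled-variance (equal variances) form; the function name `independent_t` and the data are made up for the example.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(a, b):
    """Pooled-variance t statistic and degrees of freedom for two independent groups."""
    n1, n2 = len(a), len(b)
    # Weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    # Standard error of the difference between the two sample means.
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / se, n1 + n2 - 2


group_a = [5, 7, 8, 6, 9]  # made-up data
group_b = [3, 4, 6, 5, 2]
t_stat, df = independent_t(group_a, group_b)
print(f"t={t_stat:.2f}, df={df}")
```

With these numbers t is 3.00 on 8 degrees of freedom, which exceeds the two-tailed 5% critical value of 2.306 found in a standard t-table.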
Paired Samples t-test (Dependent t-test)
a statistical test that compares the means of two related groups or matched pairs.
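A minimal sketch of the paired samples t statistic: the test reduces to a one-sample test on the within-pair differences. The function name `paired_t` and the before/after data are made up for the example.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second):
    """t statistic and degrees of freedom for matched pairs."""
    # Work with the difference inside each pair, not the raw scores.
    diffs = [y - x for x, y in zip(first, second)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1


before = [10, 12, 9, 11, 13]  # made-up data, same five subjects measured twice
after = [12, 15, 10, 14, 14]
t_stat, df = paired_t(before, after)
print(f"t={t_stat:.2f}, df={df}")
```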
One-Sample t-test
a statistical test that compares the mean of a single sample to a known value (often the population mean)
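A minimal sketch of the one-sample t statistic. Unlike a z-test, it estimates the standard deviation from the sample itself. The function name `one_sample_t`, the data, and the reference value 50 are made up for the example.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, mu0):
    """t statistic and degrees of freedom for a single sample against mu0."""
    n = len(sample)
    # Standard error of the mean, using the sample standard deviation.
    se = stdev(sample) / sqrt(n)
    return (mean(sample) - mu0) / se, n - 1


scores = [52, 55, 49, 58, 54, 56, 53, 57, 51, 55]  # made-up data
t_stat, df = one_sample_t(scores, mu0=50)
print(f"t={t_stat:.2f}, df={df}")
```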
Each of these tests can be
one- or two-tailed
One-tailed t-test
a statistical test used to determine if there is a significant difference in the means of two groups, with a specific directional hypothesis.
The one-tailed t-test only looks at
one end (tail) of the distribution
Two-tailed t-test
a statistical test used to determine if there is a significant difference in the means of two groups, without specifying a direction
The two-tailed t-test looks at
both ends (tails) of the distribution
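The difference between one- and two-tailed tests shows up in the p-value. The sketch below is a simplification: it assumes a large sample so that the normal curve can stand in for the t-distribution (the Python standard library has no t-distribution). The function name `tail_p_values` and the statistic 1.8 are made up for the example.

```python
from statistics import NormalDist


def tail_p_values(z):
    """One-tailed and two-tailed p-values for a test statistic z."""
    std_normal = NormalDist()
    upper = 1 - std_normal.cdf(z)           # one-tailed: looks only at the upper tail
    lower = std_normal.cdf(z)               # one-tailed: looks only at the lower tail
    two = 2 * (1 - std_normal.cdf(abs(z)))  # two-tailed: looks at both tails
    return upper, lower, two


upper, lower, two = tail_p_values(1.8)
print(f"upper-tail p={upper:.4f}, lower-tail p={lower:.4f}, two-tailed p={two:.4f}")
```

Here the same statistic is significant at the 5% level in the upper-tailed test (p is about 0.036) but not in the two-tailed test (p is about 0.072), which is why the choice of tails matters.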
Normality
The data in each group should be approximately normally distributed.
* This assumption is particularly important when sample sizes are small (typically n < 30).
* For larger sample sizes, the t-test is robust to violations of normality due to the Central Limit Theorem.
Homogeneity of Variances
the variances of the two groups should be equal (or approximately equal).
* Homogeneity can be tested using Levene’s test or Bartlett’s test.
* If the variances are significantly different, you may need to use Welch’s t-test, which does not assume equal variances.
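A minimal sketch of Welch’s t statistic, which keeps the two variances separate instead of pooling them and adjusts the degrees of freedom with the Welch-Satterthwaite approximation. The function name `welch_t` and the data are made up for the example.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom (unequal variances)."""
    n1, n2 = len(a), len(b)
    # Each group's contribution to the squared standard error.
    v1, v2 = variance(a) / n1, variance(b) / n2
    t = (mean(a) - mean(b)) / sqrt(v1 + v2)
    # Welch-Satterthwaite approximation; usually not a whole number.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


tight = [5, 7, 8, 6, 9]    # made-up data with a small variance
spread = [1, 9, 4, 12, 4]  # made-up data with a much larger variance
t_stat, df = welch_t(tight, spread)
print(f"t={t_stat:.2f}, df={df:.2f}")
```

The adjusted degrees of freedom fall between the smaller group’s n - 1 and the pooled n1 + n2 - 2, so very unequal variances make the test more conservative.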
test of significance
asking whether the difference between samples is large enough to indicate a real difference between populations
difference between means
if we compare an infinite number of pairs of reasonably large samples from this population, we could form a frequency distribution of the differences between pairs of sample means
a difference between means is another
statistic; it is descriptive of two samples rather than one
we need to identify the - of a difference
direction
the centre point of the distribution represents the
frequency of pairs of samples with zero difference between their means
sampling distribution of the differences between means
differences between two means of a huge number of pairs of random samples drawn from the same population.
standard error of the differences between means
the standard deviation of the sampling distribution of differences between sample means; it measures the dispersion of that distribution
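The sampling distribution of differences between means can be simulated. This is a rough sketch, assuming a normally distributed population; the population values (`MU`, `SIGMA`), the sample size `N`, and the number of pairs are made up for the example.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(0)               # fixed seed so the run is repeatable
MU, SIGMA, N = 100, 15, 40   # hypothetical population and sample size


def sample_mean(n):
    """Mean of one random sample of size n drawn from the population."""
    return sum(random.gauss(MU, SIGMA) for _ in range(n)) / n


# Draw many pairs of samples from the SAME population and record each difference.
diffs = [sample_mean(N) - sample_mean(N) for _ in range(5000)]

centre = mean(diffs)                   # should sit close to zero
empirical_se = stdev(diffs)            # dispersion of the simulated differences
theoretical_se = SIGMA * sqrt(2 / N)   # sqrt(SIGMA**2 / N + SIGMA**2 / N)
print(f"centre={centre:.3f}  empirical SE={empirical_se:.3f}  theoretical SE={theoretical_se:.3f}")
```

The differences pile up around zero, and their standard deviation comes out close to the theoretical standard error of the difference between means.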
in significance testing there are two opposite risks
type 1 error
type 2 error
type 1 error
someone may accept a difference as significant when it is not (rejecting a null hypothesis that is true)
we guard against a type 1 error by
demanding a more stringent level of significance (ex 1% rather than 5%)
type 2 error
someone may fail to recognize a real difference: the bigger the difference between sample means we demand before accepting that the populations really differ, the more likely we are to miss a difference that is real
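The type 1 error rate can be checked by simulation. This is a rough sketch, assuming both samples come from the same standard normal population (so every "significant" result is a type 1 error) and a two-tailed z-test with known standard deviation 1; the function name `false_positive_rate` and its parameters are made up for the example.

```python
import random
from math import sqrt
from statistics import NormalDist


def false_positive_rate(alpha, trials=2000, n=30, seed=1):
    """Share of trials that wrongly find a difference when none exists."""
    random.seed(seed)  # same seed, so each alpha is judged on the same samples
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sqrt(2 / n)   # standard error of the difference, population sd = 1
    errors = 0
    for _ in range(trials):
        mean_a = sum(random.gauss(0, 1) for _ in range(n)) / n
        mean_b = sum(random.gauss(0, 1) for _ in range(n)) / n
        if abs(mean_a - mean_b) / se > z_crit:
            errors += 1  # a difference was declared, but the populations are identical
    return errors / trials


rate_5 = false_positive_rate(0.05)
rate_1 = false_positive_rate(0.01)
print(f"type 1 error rate at 5%: {rate_5:.3f}, at 1%: {rate_1:.3f}")
```

The stricter 1% level produces fewer type 1 errors, at the cost of making real differences harder to detect (more type 2 errors).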
the emphasis is on avoiding type —- errors
1
the greater the difference in standard deviation between 2 samples, the —
less accurately can we establish the significance of the difference between their means.
parametric
describes most of the classical statistical techniques, which assume that samples are drawn from normally distributed populations and allow us to estimate the parameters of such populations
the idea of normal distribution is inappropriate to
category-data
non parametric tests require differences to be much
bigger if they are to be accepted as significant
when both variables are continuous
we can visually detect covariation
researchers should state whether they are using a 1- or 2-tailed test
before collecting the data
what does a 1% significance level imply
we want a difference between means so large that the probability of its occurring by chance, if the samples came from populations with no real difference, would be 1% or less
critical region
where the alternative hypothesis will seem more acceptable to us than the null hypothesis
z test
use standard deviation as a unit (z unit) for measuring the point where the critical region begins and relate it to the proportions of the normal curve
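The point where the critical region begins can be read off the normal curve. This is a minimal sketch, assuming a two-tailed test; the function name `critical_z` is made up for the example.

```python
from statistics import NormalDist


def critical_z(alpha):
    """z value beyond which the two-tailed critical region begins."""
    # Half of alpha sits in each tail of the standard normal curve.
    return NormalDist().inv_cdf(1 - alpha / 2)


for alpha in (0.05, 0.01):
    print(f"alpha={alpha:.2f}: critical region begins at z = +/-{critical_z(alpha):.2f}")
```

At the 5% level the critical region begins about 1.96 standard deviation units from the centre; at the stricter 1% level it begins about 2.58 units out.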
the test that is accurate only with large samples
z-test
with samples of fewer than 30, - are used
t-tests
a test where several samples can be compared at once
F-test (analysis of variance)
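A minimal sketch of the F statistic for a one-way analysis of variance, which compares the variation between group means with the variation inside the groups. The function name `f_statistic` and the three groups are made up for the example.

```python
from statistics import mean


def f_statistic(*groups):
    """One-way ANOVA F statistic with its two degrees of freedom."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within, k - 1, n_total - k


g1 = [5, 7, 8, 6, 9]  # made-up data
g2 = [3, 4, 6, 5, 2]
g3 = [8, 9, 11, 10, 7]
f_stat, df_between, df_within = f_statistic(g1, g2, g3)
print(f"F={f_stat:.2f}, df=({df_between}, {df_within})")
```

A large F means the group means differ by more than the spread inside the groups would explain; the value is then compared with an F-table for the two degrees of freedom.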