Stats Flashcards

1
Q

Type 1 error

A

A type 1 error (aka false positive) occurs when a researcher incorrectly rejects a true null hypothesis.
This means findings are reported as significant when they have actually occurred by chance.
The risk of committing a Type I error is reduced by using a lower significance level (e.g. α = 0.01 means there is a 1% chance of committing a Type I error when the null hypothesis is true), and by correcting for multiple comparisons.
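
A minimal Python sketch of a Bonferroni correction for multiple comparisons (the p-values below are invented purely for illustration):

```python
# Bonferroni correction: divide the significance level by the number of tests.
# The p-values here are made-up examples, not real results.
p_values = [0.03, 0.008, 0.20, 0.04]      # hypothetical results of 4 tests
alpha = 0.05
adjusted_alpha = alpha / len(p_values)    # 0.05 / 4 = 0.0125

for p in p_values:
    verdict = "significant" if p < adjusted_alpha else "not significant"
    print(p, verdict)
```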

2
Q

Type 2 error

A

A type II error (aka false negative) occurs when a researcher fails to reject a null hypothesis that is actually false. Here the researcher concludes there is no significant effect when, in fact, there is one.
The risk of committing a type II error is decreased by ensuring the test has enough power, which means having a sample size large enough to detect a practically meaningful difference when one truly exists.
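
A rough sketch of a power calculation using statsmodels (the effect size of 0.5 is an assumed value, not from the card):

```python
# Estimate the sample size per group needed for ~80% power in an
# independent-samples t test, assuming a medium effect size of 0.5.
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(round(n_per_group))   # participants needed in each group
```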

3
Q

Null hypothesis

A

Proposes that no statistical significance exists in a set of observations and that effects seen are due to chance alone

Note: we can never prove or disprove the null hypothesis, only provide sufficient or insufficient evidence to reject it

4
Q

p value definition

A

The p value (a function of the observed sample results) is defined as the probability of obtaining a result equal to or more extreme than the one actually observed, assuming that the null hypothesis is true

Here, “more extreme” is dependent on the way the hypothesis is tested. Before the test is performed, a threshold value is chosen, called the significance level of the test, traditionally 5% (p<0.05) or 1% (p<0.01)

With p<0.05, a difference this big (or bigger) would occur by chance 5% of the time (1 in 20) if the null hypothesis were true (NOT that the chance of being wrong is 5%)
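
A small sketch of how a p value is obtained in practice, using SciPy on simulated data (all numbers are illustrative):

```python
# Two-sample t test: the p value is the probability of a difference at least
# this extreme if the null hypothesis (equal population means) were true.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group_a = rng.normal(loc=100, scale=15, size=30)   # simulated data
group_b = rng.normal(loc=108, scale=15, size=30)

t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(t_stat, p_value)
```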

5
Q

SD

A

Reflects the variability (spread) of the individual observations around the sample mean – i.e. how dispersed the data are

Descriptive stats

SD = square root of the variance
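
A tiny sketch of the SD as the root of the variance (the data values are made up):

```python
# Sample SD = square root of the sample variance (ddof=1 gives the n-1 denominator).
import numpy as np

x = np.array([4.0, 8.0, 6.0, 5.0, 3.0])   # made-up data
variance = x.var(ddof=1)
sd = np.sqrt(variance)                    # identical to x.std(ddof=1)
print(variance, sd)
```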

6
Q

SEM

A

Standard error of the sample mean’s estimate of the true population mean.

It is a measure of the variability of the sample means that would be obtained if many similar samples were taken from the population of possible measurements

SEM = s/√n

Inferential statistics
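
A quick sketch showing SEM = s/√n, checked against scipy.stats.sem (the data values are made up):

```python
# SEM: sample SD divided by the square root of the sample size.
import numpy as np
from scipy import stats

x = np.array([4.0, 8.0, 6.0, 5.0, 3.0])   # made-up data
sem_manual = x.std(ddof=1) / np.sqrt(len(x))
print(sem_manual, stats.sem(x))           # the two values should match
```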

7
Q

Confidence intervals

A

Confidence intervals consist of a range of values (an interval) that acts as a good estimate of, and is likely to contain, the unknown true population mean

A 95% confidence interval around the sample mean means that, over many repeated samples, 95% of such intervals would contain the true population mean

We can say that we are 95% confident that the true population mean is within this interval

95% CI = mean ± (1.96 × SEM)

The narrower the confidence interval, the more precise the estimate 🡪 as a general rule, as sample size increases the CI should become narrower

If the 95% CIs of two group means do not overlap, we can say that there is a statistically significant difference at the 0.05 significance level

However, if the CI of a difference includes 0 we cannot say there is a significant difference – e.g. for ‘does exam stress make people perform better or worse?’, the effect could go in either direction
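
A short sketch of the 95% CI formula above, using the normal approximation on made-up data:

```python
# 95% CI = mean +/- 1.96 * SEM (normal approximation; illustrative data).
import numpy as np
from scipy import stats

x = np.array([4.0, 8.0, 6.0, 5.0, 3.0])
mean, sem = x.mean(), stats.sem(x)
ci_low, ci_high = mean - 1.96 * sem, mean + 1.96 * sem
print(f"95% CI: {ci_low:.2f} to {ci_high:.2f}")
```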

8
Q

Effect size

A

effect size = (mean of experimental group - mean of control group) / SD (SD is pooled or averaged between the groups)

A simple way of quantifying the difference between groups – it gives us an idea of the magnitude of the difference

Generally, an effect size > 0.8 indicates a large effect

Effect sizes of 0.5 and 0.2 are considered moderate and small, respectively

Advantages:

Emphasises the size of the difference rather than confounding this with sample size

Particularly valuable for quantifying the effectiveness of a particular intervention
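
A sketch of the effect size formula above (Cohen's d with a pooled SD) on simulated groups:

```python
# Cohen's d = (mean of experimental - mean of control) / pooled SD.
# The groups below are simulated purely for illustration.
import numpy as np

rng = np.random.default_rng(1)
experimental = rng.normal(loc=110, scale=15, size=40)
control = rng.normal(loc=100, scale=15, size=40)

pooled_sd = np.sqrt((experimental.var(ddof=1) + control.var(ddof=1)) / 2)
cohens_d = (experimental.mean() - control.mean()) / pooled_sd
print(cohens_d)   # ~0.8 or above would count as a large effect
```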

9
Q

What will happen to SD and SEM as you increase sample size?

A

SD should in theory stay the same (it reflects the spread of the underlying distribution, which does not depend on sample size)

SEM should decrease (since SEM = s/√n, it shrinks as n increases)

10
Q

Why would you use SEM over SD?

A

To test differences between means (inference), rather than to assess the spread and variability of the data (description)

11
Q

T test

A

t = difference in means / standard error of the difference

Assumptions:

  • Normality – the populations tested must be normally distributed
    • This can be tested graphically and/or statistically – the Shapiro-Wilk and Kolmogorov-Smirnov tests are frequently used
  • Homoscedasticity (homogeneity of variance)
    • The 2 samples tested should have the same finite variance, which can be tested for using Levene’s test
    • This requirement can be relaxed by using a ‘t test with unequal variance’ (Welch’s t test), which uses an approximation to the degrees of freedom

However, T tests (and ANOVAs) are generally very ‘robust’ and work well even if the above assumptions have been violated (especially the homogeneity of variance)
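
A sketch of checking these assumptions and running Student's versus Welch's t test with SciPy (simulated data, illustrative only):

```python
# Normality (Shapiro-Wilk), homogeneity of variance (Levene), then
# a standard t test and a Welch (unequal-variance) t test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
a = rng.normal(100, 15, 30)   # simulated samples
b = rng.normal(110, 25, 30)

print(stats.shapiro(a), stats.shapiro(b))       # normality checks
print(stats.levene(a, b))                       # equal-variance check
print(stats.ttest_ind(a, b))                    # assumes equal variances
print(stats.ttest_ind(a, b, equal_var=False))   # Welch's t test
```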

12
Q

Z test

A

Used to determine whether 2 population means are significantly different when the variances are known and the sample size is large

The test statistic is assumed to follow a normal distribution, and parameters such as the SD should be known for an accurate Z test to be performed

T test vs Z test:

  • A Z-test is a statistical hypothesis test whose test statistic follows a normal distribution, while a T-test’s statistic follows Student’s t-distribution.
  • A T-test is appropriate when you are handling small samples (n < 30), while a Z-test is appropriate when you are handling moderate to large samples (n > 30)
  • The T-test is more adaptable than the Z-test, since the Z-test requires conditions (such as known variance) that are rarely met in practice, and T-tests come in many variants to suit different designs.
  • T-tests are more commonly used than Z-tests.
  • Z-tests are preferred over T-tests when the population standard deviations are known
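
A sketch of a two-sample Z test computed by hand, assuming the population SDs are known (all numbers are invented):

```python
# z = (mean1 - mean2) / sqrt(sigma1^2/n1 + sigma2^2/n2), with a two-tailed p value.
import numpy as np
from scipy import stats

mean1, mean2 = 102.0, 98.0     # sample means (illustrative)
sigma1 = sigma2 = 15.0         # known population SDs (assumed)
n1 = n2 = 100                  # large samples

z = (mean1 - mean2) / np.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
p = 2 * stats.norm.sf(abs(z))  # two-tailed p value from the normal distribution
print(z, p)
```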
13
Q

ANOVA assumptions

A

Normal distribution of the dependent variable (population distributions are approximately normal)

Homogeneity of variances

Observations within each sample are independent of each other (does not apply for repeated measures ANOVA)

For repeated measures ANOVAs the assumption of sphericity must be met, which is the condition where the variances of the differences between all combinations of related groups are equal

Violation of sphericity is serious for RM ANOVA and causes the test to be too liberal (increase in type I error rate)
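
A brief sketch of a one-way ANOVA with a homogeneity-of-variance check (three simulated groups, purely for illustration):

```python
# Levene's test for equal variances, then a one-way ANOVA.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
g1 = rng.normal(10, 2, 20)   # simulated groups
g2 = rng.normal(12, 2, 20)
g3 = rng.normal(11, 2, 20)

print(stats.levene(g1, g2, g3))     # homogeneity of variances
print(stats.f_oneway(g1, g2, g3))   # F statistic and p value
```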

14
Q

Repeated measures ANOVA

A

Equivalent to a one-way ANOVA, but for related (not independent) groups; it is the extension of the paired t test. It is needed when you follow the same people over time and gather multiple data points from each (e.g. measuring blood pressure at 5 time points per person over the course of treatment)
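
A sketch of the blood-pressure example using statsmodels' AnovaRM (the subjects, time points and readings below are made up):

```python
# Repeated measures ANOVA: one dependent variable measured at several
# time points for the same subjects (long-format data).
import pandas as pd
from statsmodels.stats.anova import AnovaRM

data = pd.DataFrame({
    "subject":   [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "timepoint": ["t1", "t2", "t3"] * 3,
    "bp":        [140, 132, 128, 150, 141, 135, 138, 130, 126],  # invented values
})

result = AnovaRM(data, depvar="bp", subject="subject", within=["timepoint"]).fit()
print(result)
```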

15
Q

Pearson’s assumptions

What is the r value?

A

Assumptions: relationship is linear, observations are independent

Pearson’s ‘r’ value is the correlation coefficient derived by the product moment method

The ‘r²’ value is the square of this ‘r’ value and represents the ‘coefficient of determination’

r² ranges between 0 and 1, with 1 representing a perfect linear correlation between the two variables (r itself ranges from −1 to +1)

The p value gives the probability of observing a correlation at least this strong if there were really no correlation, i.e. by chance alone.
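
A short sketch of r, r² and the p value with SciPy (simulated, roughly linearly related variables):

```python
# Pearson's r and the coefficient of determination r^2.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = rng.normal(size=50)
y = 2 * x + rng.normal(scale=0.5, size=50)   # roughly linear relationship

r, p = stats.pearsonr(x, y)
print(r, r**2, p)   # r lies in [-1, 1]; r^2 lies in [0, 1]
```

(The non-parametric Spearman alternative is available as scipy.stats.spearmanr.)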

16
Q

Difference between Pearson and Spearman

A

Pearson = parametric

Spearman’s = non-parametric, can be used for