Key Concepts Flashcards

1
Q

What are the two different types of data?

A

Categorical and quantitative

2
Q

Name three types of categorical data and give an example of each

A

Binary (two levels) - e.g. smoker (yes or no)

Nominal (no ranking) - e.g. ethnicity

Ordinal (ranked) - e.g. pain severity (mild, moderate, severe)

3
Q

Name two types of quantitative data

A

Discrete (isolated values) e.g. number of therapy sessions completed 1, 2, 3

Continuous (any values in interval) e.g. age, clinical scales

4
Q

Give 4 factors that define a normal distribution of continuous data and include an example

A

Symmetrical
Most data close to the middle
Extreme values are rare
Mathematically helpful
E.g. height of men

5
Q

How does positive/right skewed distribution of continuous data appear?

A

Most values are clustered on the left side of the distribution, while the right tail is longer

6
Q

How does negative/left skewed distribution of continuous data appear?

A

Most values are clustered on the right side of the distribution, while the left tail is longer

7
Q

What is a fat-tailed distribution?

A

Where extreme values are more likely
E.g. Distribution of wealth, 80/20 rule

8
Q

What can make classical statistics difficult?

A

Fat tailed distribution

9
Q

What do descriptive statistics describe?

A

Data collected

10
Q

What cannot be used to make inference about the wider population as values in the true population could differ due to chance?

A

Descriptive statistics

11
Q

What is typically used to describe quantitative (continuous) data?

A

A measure of the average (mean or median)

A measure of variability (standard deviation, quartiles)

12
Q

For symmetric data, the mean equals…

A

median

13
Q

True or false:

For skewed data, the mean does not equal the median

A

True

14
Q

What is sensitive to outliers?

A

Mean
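A small sketch of why the mean is described as sensitive to outliers. The session counts are made-up values for illustration, and the median is shown alongside for comparison.

```python
from statistics import mean, median

# Hypothetical therapy-session counts (illustrative values only).
sessions = [4, 5, 5, 6, 7]
# The same data with one extreme value added.
with_outlier = sessions + [60]

# Without the outlier the mean and median are close together.
print(mean(sessions), median(sessions))
# With the outlier the mean moves a lot; the median barely changes.
print(mean(with_outlier), median(with_outlier))
```

This is one reason the median is often reported for skewed data.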

15
Q

What is on the same scale as your data?

A

Standard deviation

16
Q

What is not on the same scale as your data?

A

Variance

17
Q

What are two main approaches to measuring variability?

A
  1. SD and variance
  2. Percentiles
18
Q

What is the difference between standard deviation and variance?

A

Variance is the average squared deviations from the mean, while standard deviation is the square root of this number
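The definition above can be sketched in a few lines. The function names and data are illustrative assumptions. Note that the sample version divides by n - 1 rather than n; the card's "average" corresponds to dividing by n.

```python
from math import sqrt

def variance(values, sample=True):
    """Average squared deviation from the mean.

    sample=True divides by n - 1 (the usual sample estimate);
    sample=False divides by n (a plain average, as on the card).
    """
    n = len(values)
    m = sum(values) / n
    squared_devs = [(x - m) ** 2 for x in values]
    return sum(squared_devs) / (n - 1 if sample else n)

def standard_deviation(values, sample=True):
    """Square root of the variance, so it is on the scale of the data."""
    return sqrt(variance(values, sample))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values
print(variance(data, sample=False), standard_deviation(data, sample=False))
```

Because the variance is in squared units, the standard deviation is usually the one reported next to a mean.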

19
Q

What is the empirical rule?

A

The percentage of values that lie within given intervals of a normal distribution: approximately 68%, 95%, and 99.7% of the values lie within one, two, and three standard deviations of the mean, respectively.
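The three percentages can be checked directly from the normal distribution. This is a minimal sketch using the error function from Python's standard library; the function name `share_within` is an illustrative assumption.

```python
from math import erf, sqrt

def share_within(k):
    """Proportion of a normal distribution within k SDs of the mean."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    # Print the percentage of values within k standard deviations.
    print(k, round(100 * share_within(k), 2))
```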

20
Q

What descriptive statistics are used to describe categorical data?

A

Binary and multinomial data:
Number and proportion in each category

Ordinal data:
Small number of categories: Number and proportion in each category

Larger number of categories: Median and 25th and 75th percentiles
Mean (SD) - less common.

21
Q

What is statistical inference?

A

Making statements about the population from the sample

22
Q

What does statistical inference not address?

A
  1. If a study is biased
  2. If observed associations are causal
23
Q

What are two different approaches to statistical inference?

A
  1. Frequentist
  2. Bayesian
24
Q

What approach to statistical inference is more common in medical and psychological research at the moment?

A

Frequentist

25
Q

What features define a frequentist approach to statistical inference?

A
  1. Using p-values, confidence intervals, maximum likelihood
  2. Inference is based on the observed data.
  3. Make probability statements about the data, given the value of a parameter: “The probability of observing data as extreme as this, given there is no treatment effect, is 3%.”
  4. Different people will get the same results applying the same analysis to the same data.
26
Q

What features define a bayesian approach to statistical inference?

A
  1. Credible intervals, priors, posterior probability
  2. Incorporates prior beliefs into statistical inference
  3. Allows probability statements about parameters, given the data (and prior beliefs), e.g. “Given the data we have observed, there is a 97% chance the treatment is effective”
  4. Different people will get different results depending on their prior beliefs
27
Q

In bayesian and frequentist statistics conclusions will be similar if..

A

The sample size is large enough and the strength of prior beliefs weak.

28
Q

What is used as a measure of uncertainty when using a frequentist approach?

A
  1. Confidence interval
  2. p-value
29
Q

What is an alpha-level confidence interval?

A

An interval of uncertainty around an estimate for a parameter

30
Q

Confidence intervals are…

A

Intervals that, under repeated sampling, would contain the true value a fixed percentage of the time (e.g. 95% of the time for a 95% confidence interval).

31
Q

What do we typically calculate?

A

95% confidence intervals

32
Q

What is the standard error?

A

The standard deviation of an estimate’s sampling distribution

33
Q

Often the standard error can be calculated from the standard deviation of the population the statistic is being calculated on

True or false

A

True

34
Q

How can you calculate the standard error for a mean?

A

Divide standard deviation by the square root of the sample size
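A minimal sketch of this calculation. The function name and the numbers are illustrative assumptions; the second and third calls show the standard error shrinking as the sample size grows while the standard deviation stays the same.

```python
from math import sqrt

def standard_error_of_mean(sd, n):
    """Standard error of a mean: standard deviation / sqrt(sample size)."""
    return sd / sqrt(n)

# Same standard deviation, increasing sample size (made-up numbers).
print(standard_error_of_mean(10, 25))
print(standard_error_of_mean(10, 100))
print(standard_error_of_mean(10, 400))
```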

35
Q

The standard error will, in most cases..

A

Get smaller as n increases

36
Q

The standard deviation does change systematically with sample size

True or false

A

False

The standard deviation does not change systematically with sample size

37
Q

If we assume our estimate is from a normal distribution how can we calculate the confidence interval?

A

95% CI = estimate ± 1.96 × standard error
E.g. for a mean: 95% CI = estimate ± 1.96 × (standard deviation)/√n
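The calculation for a mean can be sketched as follows, assuming the normal approximation described on this card. The function name and the data are illustrative; with a sample this small a t-distribution multiplier would usually be preferred.

```python
from math import sqrt
from statistics import mean, stdev

def mean_ci_95(values):
    """Normal-approximation 95% CI for a mean: estimate +/- 1.96 * SE."""
    estimate = mean(values)
    # Standard error of the mean: sample SD divided by sqrt(n).
    se = stdev(values) / sqrt(len(values))
    return estimate - 1.96 * se, estimate + 1.96 * se

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values
lower, upper = mean_ci_95(scores)
print(round(lower, 2), round(upper, 2))
```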

38
Q

As sample size increases confidence interval becomes what?

A

Smaller

39
Q

For means what is often used instead of a normal distribution and what does this often lead to?

A

t-distribution

This leads to a multiplier for the standard error that differs from 1.96 but is usually fairly close to 2

40
Q

What is a p-value?

A

The probability of observing the data, or data more extreme, given that the parameter of interest takes a given value.
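As a hedged illustration, a two-sided p-value can be computed from an estimate and its standard error if we assume the estimate is normally distributed (a z-test). The function name and inputs below are assumptions for this sketch, not a specific method from the cards.

```python
from math import erfc, sqrt

def two_sided_p_value(estimate, standard_error, null_value=0.0):
    """Two-sided p-value under a normal approximation (z-test).

    Probability of a result at least this far from the null value,
    in either direction, if the null value were the true parameter.
    """
    z = (estimate - null_value) / standard_error
    return erfc(abs(z) / sqrt(2))

# An estimate 1.96 standard errors from the null gives p close to 0.05.
print(two_sided_p_value(1.96, 1.0))
```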

41
Q

What is a null hypothesis?

A

The value the parameter is set to take

Typically the null hypothesis is for no effect or association.

42
Q

p-values reported from models are typically..

A

for parameters to be equal to zero

43
Q

What is used to make decisions (or inference) about the value of a population parameter?

A

Statistical test of hypothesis

44
Q

What does a statistical test of hypothesis consist of?

A

A statistical test of hypothesis consists of several parts:

  1. The null hypothesis, denoted by H0
  2. The alternative hypothesis, denoted by H1
    - One tailed: H1: parameter > H0
    - Two tailed: H1: parameter ≠ H0
    - Two tailed p-values are almost always used
  3. The p-value
  4. A significance threshold (0.05)
45
Q

When would we reject the null hypothesis?

A

If the p-value is below the significance threshold we reject the null hypothesis and conclude there is evidence for the alternative hypothesis

46
Q

If the p-value is not below the significance threshold we do not have evidence to reject the null hypothesis.

Why is this?

A

This does not mean the null hypothesis is true

A non-significant p-value tells us we do not know much

47
Q

What does ‘p < 0.05’ mean?

A

There is evidence that there is a difference: if there were no difference, we would have been unlikely to see the data we did.

48
Q

What does p > 0.05 mean?

A

There is insufficient evidence to conclude there is a difference. If there were no difference, our results would not be unexpected. But we cannot rule out a difference.

If a p-value is not statistically significant we cannot conclude that there is no difference.

49
Q

What are two errors from hypothesis tests?

A

Type 1 error (α)

Type 2 error (β)

50
Q

What is a type 1 (α) error?

A

Falsely conclude there is a difference

Controlled with significance threshold

If the significance threshold is 0.05 we expect a type 1 error rate of 5%
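The 5% figure can be illustrated with a small simulation. This is a sketch under stated assumptions: the null hypothesis is true in every simulated study, observations are normal with known standard deviation 1, and a z-test is used. All names and settings are illustrative.

```python
import random
from math import erfc, sqrt

def simulate_type1_rate(n_studies=2000, n=30, alpha=0.05, seed=0):
    """Share of simulated studies that are 'significant' when the null is true."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(n_studies):
        # The null is true: every observation has mean 0 and SD 1.
        sample = [rng.gauss(0, 1) for _ in range(n)]
        # z statistic for the sample mean with known SD 1.
        z = (sum(sample) / n) / (1 / sqrt(n))
        p = erfc(abs(z) / sqrt(2))  # two-sided p-value
        if p < alpha:
            false_positives += 1
    return false_positives / n_studies

rate = simulate_type1_rate()
print(rate)  # expected to be close to alpha, with simulation noise
```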

51
Q

What is a Type 2 error (β)?

A

Fail to conclude that there is evidence for a difference when there is a true difference

Sample size, magnitude of true difference, and variability of data affect type 2 error rates

Power = 1 - β

Power is the probability of concluding there is a difference, when true.

Low powered test: Unlikely to be significant even if there is a difference
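To make the link between power, sample size, true difference, and variability concrete, here is a rough sketch for a two-sided one-sample z-test. The test choice, function name, and numbers are assumptions for illustration only.

```python
from math import sqrt
from statistics import NormalDist

def power_one_sample_z(true_difference, sd, n, alpha=0.05):
    """Approximate power of a two-sided one-sample z-test."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)       # about 1.96 for alpha = 0.05
    shift = true_difference * sqrt(n) / sd  # true difference in SE units
    # Probability the test statistic lands in either rejection region.
    return z.cdf(-z_crit + shift) + z.cdf(-z_crit - shift)

# Power rises with sample size for a fixed difference and variability.
for n in (10, 32, 100):
    print(n, round(power_one_sample_z(0.5, 1.0, n), 2))
```

With no true difference the function returns the type 1 error rate, which ties the two error types together.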

52
Q

What can you determine when given a 1 - α level confidence interval?

A

Whether the p-value is statistically significant at the α level.

i.e. given a 95% confidence interval you can tell if the p-value would be significant at the 5% level

53
Q

If the confidence interval contains the null hypothesis, p > 0.05

If the confidence interval does not contain the null hypothesis, p < 0.05

What is an example of this?

A

For example if the null hypothesis is 0:

95% CI of -1.1 to -0.1 would correspond to a statistically significant result

-1.1 to 0.1 would correspond to a result that was not statistically significant.
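The rule in this card amounts to checking whether the null value falls inside the interval. A minimal sketch, with an illustrative function name:

```python
def significant_at_5_percent(ci_lower, ci_upper, null_value=0.0):
    """True when a 95% CI excludes the null value (corresponding to p < 0.05)."""
    return not (ci_lower <= null_value <= ci_upper)

print(significant_at_5_percent(-1.1, -0.1))  # interval excludes 0
print(significant_at_5_percent(-1.1, 0.1))   # interval contains 0
```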

54
Q

What causes type 1 error?

A

Multiple testing & p-hacking

55
Q

What enhances the issue of multiple testing and p-hacking?

A

  1. Selective reporting enhances the problem, e.g.:
    - Only reporting significant results and ignoring non-significant results
    - Placing more emphasis on significant results
  2. Selective reporting can occur at the study level: studies with non-significant findings are less likely to be published

56
Q

What are solutions to multiple testing and p hacking?

A
  1. Bonferroni correction: divide significance threshold by number of tests
    - This can be conservative
    - Leads to larger sample sizes being required
  2. Pre-specification of outcomes, analysis methods, and studies
    - Can specify primary outcome - stops emphasis being shifted to significant results
    - Makes visible the number of tests conducted
    - Compulsory in randomised controlled trials e.g. All trials campaign http://www.alltrials.net/
    - Harder to do in more exploratory studies
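The Bonferroni correction listed as the first solution above can be sketched as follows. The p-values are made up and the function name is an illustrative assumption.

```python
def bonferroni(p_values, alpha=0.05):
    """Divide the significance threshold by the number of tests.

    Returns the corrected threshold and, for each p-value, whether it
    is below that threshold.
    """
    threshold = alpha / len(p_values)
    return threshold, [p < threshold for p in p_values]

p_values = [0.004, 0.02, 0.03, 0.2]  # made-up results from four tests
threshold, significant = bonferroni(p_values)
print(threshold, significant)
```

Three of these four p-values are below 0.05, but only one survives the corrected threshold, which shows why the method is described as conservative.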
57
Q

What are some reasons for banning the p-value?

A
  1. Can be manipulated with multiple testing
  2. They are often misinterpreted
    People often interpret p > 0.05 as meaning “no effect”
  3. Over reliance on significance thresholds
    p = 0.04 given wildly different interpretation to p = 0.06
  4. Bayesian argument:
    p-value tells us probability of observing the data given no effect
    What we want to know is probability of an effect. This can only be achieved with Bayesian inference.