Statistical models 1 Flashcards

1
Q

Why summarise data

A
  • We can make general statements beyond specific observations.
  • Typically done using tables or graphs.
2
Q

Summarising data using tables

A
  • Frequency distributions
  • Cumulative distributions
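
As a minimal sketch (not part of the original card; the scores are made up), a frequency distribution and a cumulative distribution built in Python:

```python
from collections import Counter
from itertools import accumulate

scores = [3, 1, 2, 2, 3, 4, 2, 5, 3, 3]            # hypothetical scores

freq = Counter(sorted(scores))                      # frequency distribution: value -> count
cum = dict(zip(freq, accumulate(freq.values())))    # cumulative distribution: value -> running total

print(freq)   # counts per value, e.g. the value 3 occurs four times
print(cum)    # running totals, ending at the sample size (10)
```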
3
Q

Summarising data using graphs

A
  • Histograms
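
A small sketch of a histogram, assuming matplotlib is available (the reaction times are made up):

```python
import matplotlib.pyplot as plt

reaction_times = [0.41, 0.52, 0.47, 0.60, 0.55, 0.49, 0.72, 0.44, 0.58, 0.51]  # hypothetical data

plt.hist(reaction_times, bins=5)   # bar heights show how many values fall in each bin
plt.xlabel("Reaction time (s)")
plt.ylabel("Frequency")
plt.show()
```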
4
Q

What is a distribution?

A

Information about the data you have for one variable: which values occur and how often.

Properties of distributions:
  • What the central tendency is (mean, median or mode).
  • How symmetrical the data are either side of the mean (skew).
  • How variable the data are (e.g. data range, standard deviation and kurtosis).
  • Whether it is a “normal distribution”.

5
Q

Central tendency (the average)

A
  • Mean: (sum of values) divided by (number of values).
  • Median: middle value in a list ordered from smallest to largest. 50th percentile.
  • Mode: most frequently occurring value on the list.
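
As a quick check on these definitions, a sketch using Python's built-in statistics module (the numbers are made up):

```python
import statistics

values = [2, 3, 3, 4, 5, 5, 5, 9]

print(statistics.mean(values))    # 4.5 -> sum of values divided by number of values
print(statistics.median(values))  # 4.5 -> middle of the ordered list (50th percentile)
print(statistics.mode(values))    # 5   -> most frequently occurring value
```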
6
Q

Skew (symmetry of distribution)

A

Positive skew: the tail points to the right (towards positive values).

Negative skew: the tail points to the left (towards negative values).

A normal distribution is symmetrical (no skew).
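
A small sketch, assuming SciPy is available, showing the sign of the skew statistic for made-up samples:

```python
from scipy.stats import skew

symmetric    = [1, 2, 3, 4, 5, 6, 7]     # evenly balanced around the middle
right_tailed = [1, 2, 2, 3, 3, 3, 12]    # one large value drags the tail to the right

print(skew(symmetric))     # 0.0 -> no skew
print(skew(right_tailed))  # > 0 -> positive skew (tail points right)
```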

7
Q

Kurtosis

A

Positive kurtosis
Leptokurtic: a very high, sharply peaked centre (heavy tails).

Negative kurtosis
Platykurtic: a very flat centre (light tails).

Normal distribution
Mesokurtic: the normal bell curve.
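
A small sketch, assuming NumPy and SciPy; note that scipy.stats.kurtosis returns excess kurtosis by default, so a mesokurtic (normal) sample comes out near 0:

```python
import numpy as np
from scipy.stats import kurtosis   # excess kurtosis by default (normal ~ 0)

rng = np.random.default_rng(0)

normal_sample = rng.normal(size=100_000)             # mesokurtic
heavy_tailed  = rng.standard_t(df=5, size=100_000)   # leptokurtic: high centre, heavy tails
flat_sample   = rng.uniform(size=100_000)            # platykurtic: flat centre

print(kurtosis(normal_sample))  # ~0
print(kurtosis(heavy_tailed))   # > 0 (positive kurtosis)
print(kurtosis(flat_sample))    # < 0 (negative kurtosis)
```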

8
Q

Normal distribution

A

Symmetrical bell curve in which the mean, median and mode are equal (or very close in real data).
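
A quick sketch with simulated data (NumPy assumed), showing that for a roughly normal sample the mean and median sit very close together:

```python
import numpy as np

rng = np.random.default_rng(1)
sample = rng.normal(loc=100, scale=15, size=10_000)   # simulated bell-curve scores

print(np.mean(sample))     # ~100
print(np.median(sample))   # ~100 -> mean and median almost coincide in a symmetric bell curve
```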

9
Q

Variability

A

How spread out a set of data is.

10
Q

Range

A

The range of a variable is the biggest value minus the smallest value. Vulnerable to extreme scores.
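
A tiny sketch (made-up numbers) of the range and its sensitivity to a single extreme score:

```python
data = [4, 7, 9, 10, 12]
data_with_outlier = data + [60]

print(max(data) - min(data))                            # 8
print(max(data_with_outlier) - min(data_with_outlier))  # 56 -> one extreme score changes the range a lot
```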

11
Q

Interquartile range

A

The interquartile range (IQR) is like the range, but instead of the difference between the biggest and smallest values, it is the difference between the 75th percentile and the 25th percentile. Used a lot.
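
A minimal sketch, assuming NumPy, of computing the IQR as the 75th minus the 25th percentile (made-up scores):

```python
import numpy as np

data = [4, 7, 9, 10, 12, 13, 15, 60]    # hypothetical scores with one extreme value

q25, q75 = np.percentile(data, [25, 75])
print(q75 - q25)   # the IQR; unlike the range, it is barely affected by the extreme value 60
```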

12
Q

Mean absolute deviation

A

Mean absolute deviation is the mean of all of the absolute deviation scores in a data set. An absolute deviation is the absolute (unsigned) difference between a score and the mean. Used sometimes.
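
A short sketch of the mean absolute deviation computed by hand in Python (made-up numbers):

```python
values = [2, 4, 4, 4, 5, 5, 7, 9]
mean = sum(values) / len(values)                   # 5.0

abs_deviations = [abs(x - mean) for x in values]   # unsigned distance of each score from the mean
mad = sum(abs_deviations) / len(abs_deviations)

print(mad)   # 1.5
```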

13
Q

Variance

A

The variance is the mean of the squared deviations from the mean. Not used much on its own (its units are the square of the original units).
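
Continuing the same made-up numbers, a sketch of the (population) variance as the mean of the squared deviations from the mean:

```python
values = [2, 4, 4, 4, 5, 5, 7, 9]
mean = sum(values) / len(values)                        # 5.0

squared_deviations = [(x - mean) ** 2 for x in values]
variance = sum(squared_deviations) / len(values)

print(variance)   # 4.0 (population variance; statistics.pvariance(values) gives the same)
```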

14
Q

Standard deviation

A

The square root of the variance. Used the most.

For roughly normally distributed data, you should expect about 68% of the data to fall within 1 standard deviation of the mean, 95% within 2 standard deviations, and 99.7% within 3 standard deviations.
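
A sketch, with simulated normal data (NumPy assumed), of the standard deviation and the 68/95/99.7 rule:

```python
import numpy as np

rng = np.random.default_rng(2)
sample = rng.normal(loc=0, scale=1, size=100_000)

sd = np.std(sample)        # square root of the variance
mean = np.mean(sample)

for k in (1, 2, 3):
    inside = np.mean(np.abs(sample - mean) < k * sd)   # proportion within k SDs of the mean
    print(k, round(inside, 3))                          # roughly 0.68, 0.95, 0.997
```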

15
Q

Standard score

A

(raw score - mean) divided by the standard deviation.
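
Written out as a small sketch (the score, mean and SD are made up):

```python
raw_score = 130
mean = 100
sd = 15

z = (raw_score - mean) / sd
print(z)   # 2.0 -> this score lies 2 standard deviations above the mean
```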

16
Q

z-score

A

The position of a raw score in terms of its distance from the mean, when measured in standard deviation units.

The z-score is positive if the value lies above the mean, and negative if it lies below the mean. A z-score of 1 means the value lies 1 SD above the mean.

z-scores tell us how an individual score sits within a distribution.

17
Q

What is a statistical model?

A
  • A statistical model uses maths to summarise a dataset in terms of multiple variables.
  • A simple description of relationships in the dataset.
  • Where descriptive statistics describe the data, inferential statistics use statistical models. These models enable you to make inferences about the data, e.g. you can decide whether two variables are associated or whether one group is bigger than the other.
  • Data = model + error/residuals.
  • We can use models to predict values (but these predictions will always have error around them).
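
A minimal sketch of “data = model + error”, assuming NumPy: fit a simple straight-line model and look at the residuals (the variable names and numbers are invented for illustration):

```python
import numpy as np

hours_slept  = np.array([5, 6, 6, 7, 7, 8, 8, 9])          # predictor (hypothetical)
memory_score = np.array([55, 61, 58, 66, 70, 71, 75, 80])  # outcome (hypothetical)

slope, intercept = np.polyfit(hours_slept, memory_score, 1)  # the model: a straight line
predicted = slope * hours_slept + intercept
residuals = memory_score - predicted                         # error = data - model

print(predicted.round(1))
print(residuals.round(1))   # the part of the data the model does not explain
```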
18
Q

What is a good statistical model?

A
  • The smaller the error, the better the model “fit”.

You can include so many variables that you “overfit” the model to the data, which is problematic in terms of generalisation.

19
Q

Why do we care about error?

A

It tells us that we don’t fully understand our outcome/dependent variable.

It tells us there may be interesting factors at play in the relationship/data under investigation.

If we included covariates such as sex, we may have a better chance of detecting a significant effect.

20
Q

standardisation

A

Standardisation is the process of converting scores on different scales to a common scale. Once standardisation is done, all the features have a mean of zero and a standard deviation of one, and thus the same scale.
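
A short sketch of standardising one feature by hand (NumPy assumed; the scores are made up):

```python
import numpy as np

scores = np.array([12.0, 15.0, 9.0, 20.0, 14.0])

standardised = (scores - scores.mean()) / scores.std()   # subtract the mean, divide by the SD

print(standardised.mean())  # ~0  -> mean of zero after standardisation
print(standardised.std())   # 1.0 -> standard deviation of one
```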

21
Q

Null hypothesis testing

A

The null hypothesis is the baseline against which we test our hypothesis of interest: that is, what would we expect the data to look like if there were no effect? The null hypothesis always involves some kind of equality (e.g. two means being equal), while the alternative hypothesis involves the corresponding inequality.

Importantly, null hypothesis testing operates under the assumption that the null hypothesis is true unless the evidence shows otherwise.

  • Step 1: Formulate a hypothesis that embodies our prediction (before seeing the data).
  • Step 2: Specify the null and alternative/experimental hypotheses.
  • Step 3: Collect some data relevant to the hypothesis.
  • Step 4: Fit a model to the data that represents the alternative hypothesis and compute the test statistic.
  • Step 5: Compute the probability of the observed value of that statistic assuming that the null hypothesis is true.
  • Step 6: Assess the “statistical significance” (p value) of the result.

You should never make a decision about how to perform a hypothesis test once you have looked at the data, as this can introduce serious bias into the results.
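
As an illustration of steps 4–6 (not part of the original card), a sketch of a two-group comparison with SciPy, using invented group names and data:

```python
from scipy import stats

# Hypothetical memory scores for a sleep-deprived group and a control group
sleep_deprived = [55, 58, 52, 60, 57, 54, 59, 56]
control        = [63, 61, 66, 59, 65, 62, 64, 60]

# Step 4: fit the model (here, an independent-samples t-test) and get the test statistic
result = stats.ttest_ind(sleep_deprived, control)

# Steps 5-6: the p value is the probability of a statistic this extreme if the null were true
print(result.statistic, result.pvalue)
if result.pvalue < 0.05:                 # alpha chosen before the analysis
    print("Reject the null hypothesis")
```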

22
Q

The p value

A

Under the assumption that the null hypothesis is true, the p value is the probability of obtaining a sample as extreme as, or more extreme than, our own.
  • It is a probability.

  • It is not the probability of the null hypothesis being true.
  • It is not the probability that you are making the wrong decision.
  • It is not the probability that, if you ran the study again, you would obtain the same result that percentage of the time.
  • It does not mean you found an important effect.
  • It does not reflect the size of the effect.

We should reject the null hypothesis if the p value is less than our chosen alpha (conventionally 0.05).
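
To make “as or more extreme, assuming the null is true” concrete, a permutation-style sketch (NumPy assumed; the data and group labels are invented):

```python
import numpy as np

rng = np.random.default_rng(0)
group_a = np.array([55, 58, 52, 60, 57, 54, 59, 56])
group_b = np.array([63, 61, 66, 59, 65, 62, 64, 60])

observed = abs(group_a.mean() - group_b.mean())
pooled = np.concatenate([group_a, group_b])

# Under the null hypothesis the group labels are arbitrary, so shuffle them many times
count = 0
n_shuffles = 10_000
for _ in range(n_shuffles):
    shuffled = rng.permutation(pooled)
    diff = abs(shuffled[:8].mean() - shuffled[8:].mean())
    if diff >= observed:                 # a difference "as or more extreme" than the one observed
        count += 1

p_value = count / n_shuffles
print(p_value)
```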

23
Q

Statistical significance

A
  • A dichotomous assessment.
  • An arbitrary cut-off with a historical reference point.
  • Dependent on your alpha value and your p value.
  • Please never use terms such as “nearly significant”, “trend towards significance” or “close to significant”.
  • Please always report the actual p value of your model, not just p<.05 or p<.01.
  • Please never report p=.000; because p is a probability it cannot be exactly zero, so report p<.001 instead.
24
Q

The alpha value

A
  • This is predefined before you run your analyses.
  • Determines what you define as a “significant” or “not significant” effect, by comparison with the p value from your statistical model.
  • Typically alpha is set at 0.05, meaning that p<.05 is deemed statistically significant. But this is entirely arbitrary!
  • In setting your alpha value you are balancing Type I and Type II errors.
25
Q

Type I error

A

False positive: rejecting the null hypothesis when it is actually true.

26
Q

Type II error

A

False negative: failing to reject the null hypothesis when it is actually false.

27
Q

Effect size

A
  • As important as (arguably more important than) the p value, but often overlooked.
  • A measure of the strength of the effect.
  • For example, sleep can have a large effect on memory performance or a small effect.
  • This is really important if we are looking at interventions or findings that may inform policy.
  • There are different ways to measure effect size, e.g. r², Cohen’s d.
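
A sketch of one common effect size, Cohen’s d for two independent groups (NumPy assumed; the data are invented and the pooled-SD formula is a standard textbook version, not taken from the card):

```python
import numpy as np

group_a = np.array([55, 58, 52, 60, 57, 54, 59, 56])
group_b = np.array([63, 61, 66, 59, 65, 62, 64, 60])

n1, n2 = len(group_a), len(group_b)
pooled_sd = np.sqrt(((n1 - 1) * group_a.var(ddof=1) + (n2 - 1) * group_b.var(ddof=1)) / (n1 + n2 - 2))

cohens_d = (group_a.mean() - group_b.mean()) / pooled_sd
print(cohens_d)   # sign shows direction, magnitude shows the strength of the effect
```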
28
Q

Statistical power

A
  • Power = 1 − the probability of making a Type II error.
  • Three factors affect power:
  • Sample size: larger samples provide greater statistical power.
  • Effect size: a given design will always have greater power to find a large effect than a small effect (because finding large effects is easier).
  • Type I error rate: there is a relationship between Type I error and power such that (all else being equal) decreasing Type I error will also decrease power.
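
A simulation-style sketch of power (NumPy and SciPy assumed): the proportion of repeated hypothetical experiments, with a true effect present, in which p < alpha. The alpha, effect size and sample size are assumed values chosen for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, effect_size, n_per_group = 0.05, 0.5, 30    # assumed values for illustration

hits = 0
n_experiments = 2_000
for _ in range(n_experiments):
    a = rng.normal(0, 1, n_per_group)              # control group
    b = rng.normal(effect_size, 1, n_per_group)    # group with a true effect of 0.5 SD
    if stats.ttest_ind(a, b).pvalue < alpha:
        hits += 1

print(hits / n_experiments)   # estimated power; it grows with sample size and effect size
```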
29
Q

When running a statistical model, we need to consider the key ingredients…

A
  • Our approach of null hypothesis testing.
  • Setting our alpha value before running statistical models.
  • The effect size and p value of models.
  • Keeping in mind Type I and II errors, and our power to detect effects.
30
Q

We use the terms predictor variable and outcome variable in…

A

Observational research