Statistical models 1 Flashcards
Why summarise data
- We can make general statements beyond specific observations.
- Typically done using tables or graphs.
Summarising data using tables
- Frequency distributions
- Cumulative distributions
Summarising data using graphs
- Histograms
What is a distribution?
A description of the values a single variable takes and how frequently each value occurs.
Properties of distributions
* What the central tendency is (mean, median or mode).
* How symmetrical the data are on either side of the mean (skew).
* How variable the data is (e.g. data range, standard deviation and kurtosis).
* If it’s a “normal distribution”.
Central tendency (the average)
- Mean: (sum of values) divided by (number of values).
- Median: middle value in a list ordered from smallest to largest. 50th percentile.
- Mode: most frequently occurring value on the list.
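A minimal Python sketch of all three measures, using only the standard library (the score values are made up):

```python
import statistics

scores = [2, 3, 3, 5, 7, 8, 8, 8, 10]  # made-up scores, already ordered

print(statistics.mean(scores))    # (sum of values) / (number of values) -> 6.0
print(statistics.median(scores))  # middle value of the ordered list   -> 7
print(statistics.mode(scores))    # most frequently occurring value    -> 8
```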
Skew (symmetry of distribution)
Positive skew: the tail points to the right (the positive end of the scale).
Negative skew: the tail points to the left (the negative end).
A normal distribution is symmetrical (no skew).
Kurtosis
Positive kurtosis
Leptokurtic: a tall, sharp peak with heavy tails.
Negative kurtosis
Platykurtic: a low, flat peak with light tails.
Normal (zero excess) kurtosis
Mesokurtic: the normal bell curve.
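A minimal sketch of checking skew and kurtosis numerically, assuming NumPy and SciPy are available (the simulated sample is illustrative only):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
sample = rng.normal(loc=0.0, scale=1.0, size=10_000)  # simulated normal data

# For a normal (mesokurtic, symmetrical) distribution both are ~0.
print(stats.skew(sample))      # > 0 = positive skew, < 0 = negative skew
print(stats.kurtosis(sample))  # excess kurtosis: > 0 leptokurtic, < 0 platykurtic
```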
Normal distribution
A symmetrical bell curve in which the mean, median and mode coincide.
Variability
How spread out a set of data is.
Range
The range of a variable is the biggest value minus the smallest value. Vulnerable to extreme scores.
Interquartile range
The interquartile range (IQR) is like the range, but instead of taking the difference between the biggest and smallest values, it takes the difference between the 75th and 25th percentiles. Used a lot.
Mean absolute deviation
Mean absolute deviation is the mean of all the absolute deviation scores in a data set. An absolute deviation is the absolute value of the difference between a score and the mean. Used sometimes.
Variance
The variance is the mean of the squared deviations from the mean. Not used much on its own.
Standard deviation
The square root of the variance. Used the most.
In general, you should expect 68% of the data to fall within 1 standard deviation of the mean, 95% of the data to fall within 2 standard deviations of the mean, and 99.7% of the data to fall within 3 standard deviations of the mean.
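A minimal NumPy sketch of these variability measures (the scores are made up):

```python
import numpy as np

scores = np.array([4, 5, 5, 6, 7, 8, 9, 9, 10, 12], dtype=float)  # made-up data

data_range = scores.max() - scores.min()                     # range
iqr = np.percentile(scores, 75) - np.percentile(scores, 25)  # interquartile range
mad = np.mean(np.abs(scores - scores.mean()))                # mean absolute deviation
variance = np.mean((scores - scores.mean()) ** 2)            # mean squared deviation
sd = np.sqrt(variance)                                       # standard deviation

print(data_range, iqr, mad, variance, sd)
```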
Standard score
(raw score - mean) divided by the standard deviation.
z-score
The position of a raw score in terms of its distance from the mean, when measured in standard deviation units.
The z-score is positive if the value lies above the mean, and negative if it lies below the mean. A z-score of 1 means the score is 1 SD from the mean.
z-scores tell how an individual sits within a distribution.
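For instance, a minimal sketch of converting made-up raw scores to z-scores:

```python
import numpy as np

scores = np.array([55.0, 70.0, 85.0, 100.0])  # made-up raw scores
z = (scores - scores.mean()) / scores.std()   # (raw score - mean) / SD

print(z)  # positive above the mean, negative below; units are SDs
```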
What is a statistical model?
- A statistical model uses maths to summarise a dataset across multiple variables.
- A simple description of relationships in the dataset.
- Where descriptive statistics describe the data, inferential statistics use statistical models. These models enable you to make inferences about the data, e.g. you can decide whether two variables are associated or whether one group is bigger than the other.
- Data = model + error (residuals); see the sketch below.
- We can use models to predict values (but these predictions will always have error around them).
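A minimal sketch of the data = model + error idea, fitting a straight line with NumPy (all values are made up):

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6], dtype=float)   # made-up predictor
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])  # made-up outcome

slope, intercept = np.polyfit(x, y, deg=1)  # the "model": a straight line
predicted = slope * x + intercept           # the model's predictions
residuals = y - predicted                   # the "error": data minus model

print(slope, intercept)  # simple description of the relationship
print(residuals)         # small residuals -> good model fit
```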
What is a good statistical model?
- The smaller the error, the better the model "fit".
- But you can include so many variables that you "overfit" the model to the data, which is problematic in terms of generalisation.
Why do we care about error?
- It tells us that we don't fully understand our outcome/dependent variable.
- It tells us there may be interesting factors at play in the relationship/data under investigation.
- If we included covariates such as sex, we may have a better chance of detecting a significant effect.
Standardisation
Standardisation is the process of converting scores on different scales to a common scale. Once standardisation is done, all the features have a mean of zero and a standard deviation of one, and thus the same scale.
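A minimal sketch of standardising two made-up features measured on very different scales; after standardisation both have mean 0 and SD 1:

```python
import numpy as np

# Made-up features on very different scales.
height_cm = np.array([150.0, 160.0, 170.0, 180.0, 190.0])
income = np.array([20_000.0, 35_000.0, 50_000.0, 65_000.0, 80_000.0])

def standardise(x):
    """Convert scores to a common scale: mean 0, standard deviation 1."""
    return (x - x.mean()) / x.std()

for feature in (height_cm, income):
    z = standardise(feature)
    print(round(z.mean(), 10), z.std())  # ~0.0 and 1.0 for both features
```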
Null hypothesis testing
The null hypothesis is the baseline against which we test our hypothesis of interest: that is, what would we expect the data to look like if there were no effect? The null hypothesis always involves some kind of equality (e.g. no difference between groups); the alternative hypothesis involves an inequality.
Importantly, null hypothesis testing operates under the assumption that the null hypothesis is true unless the evidence shows otherwise.
- Step 1: Formulate a hypothesis that embodies our prediction (before seeing the data).
- Step 2: Specify null and alternative/experimental hypotheses.
- Step 3: Collect some data relevant to the hypothesis.
- Step 4: Fit a model to the data that represents the alternative hypothesis and compute the test statistic.
- Step 5: Compute the probability of the observed value of that statistic assuming that the null hypothesis is true.
- Step 6: Assess the "statistical significance" (p value) of the result.
You should never make a decision about how to perform a hypothesis test once you have looked at the data, as this can introduce serious bias into the results.
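A minimal sketch of steps 4-6 using an independent-samples t-test, assuming SciPy is available (the group scores are made up):

```python
from scipy import stats

# Made-up memory scores for two hypothetical groups (e.g. sleep vs no sleep).
group_a = [78, 82, 85, 88, 90, 84, 87]
group_b = [70, 75, 72, 80, 74, 77, 73]

# Step 4: fit the model and compute the test statistic.
result = stats.ttest_ind(group_a, group_b)

# Step 5: probability of a statistic this extreme if the null is true.
print(result.statistic, result.pvalue)

# Step 6: compare the p value against a pre-set alpha (e.g. 0.05).
print("significant" if result.pvalue < 0.05 else "not significant")
```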
The p value
Under the assumption that the null hypothesis is true, the p value is the probability of getting a sample as extreme as, or more extreme than, our own.
* Is a probability.
- It is not the probability of the null hypothesis being true.
- It is not the probability that you are making the wrong decision.
- It is not the probability that if you ran the study again, you would obtain the same result that percentage of the time.
- It does not mean you found an important effect.
- It does not reflect the size of the effect.
By convention, we reject the null hypothesis if the p value is less than our alpha value (typically 0.05).
Statistical significance
- Dichotomous assessment.
- Arbitrary with historical reference point.
- Dependent on your alpha value and your p value.
- Please never use terms such as nearly significant, trend for significance, or close to significant.
- Please always report the actual p value of your model, not just p<.05 or p<.01.
- Please never report p=.000; since p is a probability, this really means p<.001.
The alpha value
- This is predefined before you run your analyses.
- Determines what you define as a "significant" or "not significant" effect by comparison with the p value from your statistical model.
- Typically alpha is set at 0.05, meaning that p<.05 is deemed statistically significant. But this is entirely arbitrary!
- In setting your alpha value you are balancing Type I and Type II errors.
Type I error
False positive: rejecting the null hypothesis when it is actually true.
Type II error
False negative: failing to reject the null hypothesis when it is actually false.
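A minimal simulation sketch, assuming NumPy and SciPy: when the null hypothesis is true, roughly alpha (here 5%) of tests still come out "significant" by chance, i.e. Type I errors:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha = 0.05
n_experiments = 2_000
false_positives = 0

for _ in range(n_experiments):
    # Both groups come from the SAME population, so the null is true.
    a = rng.normal(0.0, 1.0, size=30)
    b = rng.normal(0.0, 1.0, size=30)
    if stats.ttest_ind(a, b).pvalue < alpha:
        false_positives += 1  # rejecting a true null = Type I error

print(false_positives / n_experiments)  # close to alpha, i.e. ~0.05
```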
Effect size
- As important as (arguably more important than) the p value, but often overlooked.
- A measure of the strength of the effect.
- For example, sleep can have a large effect on memory performance or a small effect.
- This is really important if we are looking at interventions or findings that may inform policy.
- There are different ways to measure effect size, e.g. r², Cohen's d (sketched below).
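A minimal sketch of Cohen's d for two independent groups, using the pooled-standard-deviation formula (the scores are made up):

```python
import numpy as np

def cohens_d(a, b):
    """Cohen's d: difference between group means in pooled-SD units."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

group_a = [78, 82, 85, 88, 90, 84, 87]  # made-up scores
group_b = [70, 75, 72, 80, 74, 77, 73]

# Cohen's benchmarks: ~0.2 small, ~0.5 medium, ~0.8 large.
print(cohens_d(group_a, group_b))
```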
Statistical power
- Power = 1 − the probability of making a Type II error
- Three factors affect power (see the sketch after this list):
- Sample size: larger samples provide greater statistical power.
- Effect size: a given design will always have greater power to find a large effect than a small effect (because finding large effects is easier).
- Type I error rate: there is a relationship between Type I error and power such that (all else being equal) decreasing Type I error will also decrease power.
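A minimal sketch of how these factors trade off, assuming the statsmodels package is available:

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Power to detect a medium effect (d = 0.5) with 30 per group at alpha = .05.
print(analysis.solve_power(effect_size=0.5, nobs1=30, alpha=0.05))

# Larger sample -> more power.
print(analysis.solve_power(effect_size=0.5, nobs1=100, alpha=0.05))

# Stricter (smaller) alpha -> less power, all else being equal.
print(analysis.solve_power(effect_size=0.5, nobs1=30, alpha=0.01))
```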
When running a statistical model, we need to consider the key ingredients…
- Our approach of null hypothesis testing.
- Setting our alpha value before running statistical models.
- The effect size and p value of models.
- Keeping in mind Type I and II errors, and our power to detect effects.
We use the terms predictor variable and outcome variable in…
Observational research