STAT 1: Basic Statistical Definitions Flashcards
1
Q
Define significance level
A
- the probability of a Type I error (false positive) that we are prepared to accept
- a Type I error is when you reject a null hypothesis that is actually true
- the probability you will make the mistake of rejecting the null hypothesis when it is true
- commonly 0.05
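A minimal simulation sketch of what the significance level means in practice. It assumes a hypothetical experiment (100 flips of a fair coin, so H0 is always true) and counts how often H0 is still rejected at 0.05. The helper names `two_sided_p_value` and `false_positive_rate` are illustrative, not from any library.

```python
import random
from math import comb


def two_sided_p_value(heads: int, flips: int) -> float:
    """Exact two-sided p-value for H0: the coin is fair."""
    observed = comb(flips, heads)
    # Add up every outcome that is no more likely than the observed one.
    extreme = sum(
        comb(flips, k) for k in range(flips + 1) if comb(flips, k) <= observed
    )
    return extreme / 2**flips


def false_positive_rate(alpha: float, flips: int = 100,
                        experiments: int = 2000, seed: int = 1) -> float:
    """Share of experiments on a truly fair coin that still reject H0."""
    rng = random.Random(seed)
    # Pre-compute the p-value for every possible number of heads.
    p_values = [two_sided_p_value(h, flips) for h in range(flips + 1)]
    rejections = 0
    for _ in range(experiments):
        heads = sum(rng.random() < 0.5 for _ in range(flips))
        if p_values[heads] < alpha:
            rejections += 1  # a Type I error: H0 is true but was rejected
    return rejections / experiments


rate = false_positive_rate(alpha=0.05)
print(f"H0 was true every time, yet it was rejected in {rate:.1%} of experiments")
```

Because the number of heads is discrete, the long-run rejection rate should sit at or somewhat below 0.05 rather than exactly on it.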
2
Q
What is a type I error?
A
- a false positive
- rejecting a null hypothesis that is actually true
- usually the error we try to limit, by choosing a small significance level
3
Q
Define type II error
A
- a false negative
- failing to reject a null hypothesis that is actually false
4
Q
What are p-values?
A
- the probability of getting a result at least as extreme as the one observed, if the null hypothesis were true
- a value between 0 and 1
- it tells us how surprising our experimental data would be if the null hypothesis is correct
- if the p-value is lower than the significance level, reject H0
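A worked sketch of a p-value calculation, assuming a hypothetical experiment where a coin lands heads 61 times in 100 flips and H0 is that the coin is fair. The function name `two_sided_p_value` and the data are illustrative assumptions.

```python
from math import comb


def two_sided_p_value(heads: int, flips: int) -> float:
    """Exact two-sided p-value for H0: the coin is fair."""
    observed = comb(flips, heads)
    # "At least as extreme" = every outcome no more likely than the observed one.
    extreme = sum(
        comb(flips, k) for k in range(flips + 1) if comb(flips, k) <= observed
    )
    return extreme / 2**flips


alpha = 0.05                        # significance level
p = two_sided_p_value(61, 100)      # hypothetical data: 61 heads in 100 flips
decision = "reject H0" if p < alpha else "fail to reject H0"
print(f"p-value = {p:.4f}, decision: {decision}")
```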
5
Q
Why do you reject the null hypothesis if the p-value is lower than the significance level?
A
- let’s say the p-value is 0.02 and the significance level is 0.05
- this says that, if H0 is true, the probability of getting a result at least as extreme as ours is 2%
- that is less likely than the 5% threshold we set, so the data are too unlikely under H0 and we reject it
6
Q
What is pseudoreplication?
A
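- treating non-independent measurements (e.g. repeated measurements of the same subject) as if they were independent replicates
- this artificially inflates the number of samples, so the degrees of freedom are over-inflated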
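A small sketch of how pseudoreplication inflates the degrees of freedom, assuming hypothetical data of 3 subjects measured 4 times each. Averaging within each subject is one common way to get back to one value per independent unit. It is shown here as an illustration, not as the only remedy.

```python
from statistics import mean

# Hypothetical data: 3 independent subjects, 4 repeated readings each.
measurements = {
    "subject_a": [5.1, 5.0, 5.2, 5.1],
    "subject_b": [6.3, 6.4, 6.2, 6.3],
    "subject_c": [4.8, 4.9, 4.7, 4.8],
}

# Pseudoreplicated: every reading is counted as if it were independent.
pooled = [x for readings in measurements.values() for x in readings]
inflated_df = len(pooled) - 1

# One value per subject: only the subjects are independent of each other.
subject_means = [mean(readings) for readings in measurements.values()]
honest_df = len(subject_means) - 1

print(f"df if all readings are treated as replicates: {inflated_df}")
print(f"df with one value per subject: {honest_df}")
```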
7
Q
What are degrees of freedom?
A
- the number of independent pieces of information that went into calculating an estimate
- e.g. n - 1 for the variance of a sample of n values
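A short sketch of where n - 1 shows up, using a hypothetical sample of three values. The deviations from the sample mean always sum to zero, so only n - 1 of them are free to vary, and the sample variance divides by n - 1.

```python
from statistics import mean, pvariance, variance

sample = [4.0, 6.0, 8.0]            # hypothetical sample, n = 3
n = len(sample)
centre = mean(sample)
squared_deviations = sum((x - centre) ** 2 for x in sample)

# Sample variance: divide by the degrees of freedom, n - 1.
sample_variance = squared_deviations / (n - 1)
# Dividing by n instead treats the data as the whole population.
population_variance = squared_deviations / n

print(sample_variance, variance(sample))        # both use n - 1
print(population_variance, pvariance(sample))   # both use n
```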
8
Q
Name the different types of variables
A
Continuous variables:
- can take any value
- numerical
Categorical (nominal) variables:
- fit into categories
- no intrinsic ordering
Discrete (numerical) variables:
- numerical, but can only take distinct values (e.g. counts)
Ordinal variable:
- similar to categorical but with a clear ordering of categories
- e.g. a scale running from strongly agree to strongly disagree
- the distances between categories are hard to measure
9
Q
Describe the shape of this distribution
A
- positive skew
- discrete data
10
Q
What is central tendency?
A
- the tendency of the observations to centre around a particular value
- there are three measures of central tendency:
- mean
- median
- mode
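A quick sketch of the three measures using Python's standard `statistics` module. The data are hypothetical and deliberately include one large value to show how the mean is pulled towards it while the median and mode are not.

```python
from statistics import mean, median, mode

data = [1, 2, 2, 3, 4, 7, 30]   # hypothetical data with one large value

print("mean:  ", mean(data))     # sum of the values divided by their count
print("median:", median(data))   # middle value once the data are sorted
print("mode:  ", mode(data))     # most frequent value
```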
11
Q
What is dispersion?
A
- how spread out the values in a distribution are
- e.g. standard deviation, IQR, variance
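A minimal sketch of these measures of dispersion with the standard `statistics` module, on hypothetical data. Note that `quantiles` uses its default "exclusive" method here, so other software may report slightly different quartiles.

```python
from statistics import quantiles, stdev, variance

values = [2, 4, 4, 4, 5, 5, 7, 9]     # hypothetical sample

var = variance(values)                 # sample variance (divides by n - 1)
sd = stdev(values)                     # square root of the sample variance
q1, q2, q3 = quantiles(values, n=4)    # quartile cut points
iqr = q3 - q1                          # spread of the middle half of the data

print(f"variance = {var:.3f}, standard deviation = {sd:.3f}, IQR = {iqr}")
```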
12
Q
What is standard deviation?
A
- roughly the typical amount that values deviate from the mean
- calculated as the square root of the variance
13
Q
What is the population?
A
- it refers to all the cases that we want our inferences to apply to
14
Q
What is our sample?
A
- the subset of the population that we actually measure, since it is rarely possible to measure the whole population
15
Q
What is a technical replicate?
A
- repeated measurements all taken from the same sample
- e.g. taking a blood sample from one person and measuring gene expression multiple times
- gives a more precise measurement of that one sample
- also tells us how repeatable our measurements are
- if the numbers are very different each time, we know not to trust any single measurement
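One way to put a number on "very different each time" is the coefficient of variation across technical replicates. This is a sketch under assumptions: the readings are hypothetical, and `coefficient_of_variation` is an illustrative helper, not a library function.

```python
from statistics import mean, stdev


def coefficient_of_variation(readings):
    """Standard deviation of the readings as a percentage of their mean."""
    return stdev(readings) / mean(readings) * 100


# Hypothetical repeated measurements of the same blood sample.
tight = [10.1, 9.9, 10.0, 10.2]    # replicates agree closely
noisy = [10.1, 14.0, 7.5, 11.9]    # replicates disagree a lot

cv_tight = coefficient_of_variation(tight)
cv_noisy = coefficient_of_variation(noisy)
print(f"CV of tight replicates: {cv_tight:.1f}%")
print(f"CV of noisy replicates: {cv_noisy:.1f}%")
```

A large value suggests the measurement itself is unreliable. It says nothing about variation between different people, which needs separate, independent samples.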