STAT 1: Basic Statistical Definitions Flashcards

1
Q

Define significance level

A
  • the threshold probability of a Type I error (false positive) that we are prepared to accept
  • a Type I error is rejecting a null hypothesis that is actually true
  • so it is the probability of making the mistake of rejecting the null hypothesis when it is true
  • commonly 0.05
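A rough simulation can make the 0.05 threshold concrete. The sketch below assumes a two-sided z-test with a known standard deviation of 1, which is a simplification chosen to keep the example short and is not stated on the card; the function name and its parameters are illustrative. Every sample is drawn with the null hypothesis true, so each rejection is a Type I error.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def type_i_error_rate(alpha=0.05, n=30, trials=2000, seed=0):
    """Fraction of true-null experiments that are (wrongly) rejected."""
    rng = random.Random(seed)
    # Two-sided critical value, about 1.96 for alpha = 0.05.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    false_positives = 0
    for _ in range(trials):
        # H0 is true here: the population mean really is 0 (sd assumed known = 1).
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = fmean(sample) * sqrt(n)
        if abs(z) > z_crit:
            false_positives += 1
    return false_positives / trials

rate = type_i_error_rate()
print(f"Observed Type I error rate: {rate:.3f}")
```

The observed rate should sit close to the chosen significance level, which is what "prepared to accept" means in practice.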
2
Q

What is a type I error?

A
  • a false positive
  • rejecting a null hypothesis that is actually true
  • usually what we are looking to reduce
3
Q

Define type II error

A
  • a false negative
  • failing to reject a null hypothesis that is actually false
4
Q

What are p-values?

A
  • the probability of getting a result at least as extreme as the one observed, if the null hypothesis were true
  • a value between 0 and 1
  • it tells us how likely data like ours would be if the null hypothesis is correct
  • if p-value is lower than significance level, reject H0
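As a small worked case, the sketch below computes an exact two-sided p-value for a hypothetical experiment: a coin flipped 10 times that lands heads 9 times, with H0 being that the coin is fair. The scenario and the function name are assumptions made for illustration only.

```python
from math import comb

def two_sided_binomial_p(heads: int, flips: int) -> float:
    """Exact two-sided p-value for H0: the coin is fair."""
    # Probability of every possible number of heads if H0 is true.
    probs = [comb(flips, k) / 2**flips for k in range(flips + 1)]
    observed = probs[heads]
    # Add up all outcomes at least as unlikely as the observed one
    # (the small tolerance guards against floating-point ties).
    return sum(p for p in probs if p <= observed * (1 + 1e-9))

alpha = 0.05  # significance level
p_value = two_sided_binomial_p(heads=9, flips=10)
reject_h0 = p_value < alpha

print(f"p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

Here the outcomes at least as extreme as 9 heads are 0, 1, 9 and 10 heads, giving a p-value of 22/1024, roughly 0.02, which is below 0.05.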
5
Q

Why do you reject the null hypothesis if the p-value is lower than the significance level?

A
  • let’s say p-value is 0.02 and the significance level is 0.05
  • this says that, if H0 is true, the probability of getting a result at least as extreme as ours is 2%
  • that is less likely than the 5% false-positive risk we agreed to accept, so the result is treated as too unlikely under H0 and H0 is rejected
6
Q

What is pseudoreplication?

A
  • treating non-independent measurements (e.g. repeated measurements of the same individual) as independent replicates, which artificially inflates the number of samples
  • the degrees of freedom are over-inflated
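A minimal sketch of how the inflation happens, using made-up readings for three individuals measured four times each. The data and variable names are hypothetical, and averaging per individual is only one simple way of getting back to independent units.

```python
from statistics import fmean

# Hypothetical data: 3 individuals, each measured 4 times.
measurements = {
    "individual_A": [5.1, 5.3, 5.0, 5.2],
    "individual_B": [6.8, 6.9, 7.1, 7.0],
    "individual_C": [4.2, 4.0, 4.1, 4.3],
}

# Pseudoreplicated analysis: all 12 readings treated as independent samples.
pooled = [x for readings in measurements.values() for x in readings]
naive_df = len(pooled) - 1

# One value per independent unit: average the repeats within each individual.
per_individual = [fmean(readings) for readings in measurements.values()]
correct_df = len(per_individual) - 1

print(f"naive df = {naive_df}, df with individuals as the unit = {correct_df}")
```

The pooled analysis claims 11 degrees of freedom, while only 3 independent individuals (2 degrees of freedom) were actually sampled.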
7
Q

What are degrees of freedom?

A
  • the number of independent pieces that went into calculating an estimate
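  • e.g. n - 1 for a sample variance or standard deviation, because one degree of freedom is used up estimating the mean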
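One way to see where n - 1 comes from: once the sample mean is fixed, the deviations from it must sum to zero, so the last deviation is determined by the others. The sketch below uses a small made-up sample to show this and then applies n - 1 in the sample variance.

```python
from statistics import fmean

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample, n = 8
mean = fmean(data)
deviations = [x - mean for x in data]

# Deviations from the mean always sum to zero, so the last one is
# fully determined by the other n - 1: only n - 1 are free to vary.
implied_last = -sum(deviations[:-1])

df = len(data) - 1
sample_variance = sum(d ** 2 for d in deviations) / df

print(f"last deviation = {deviations[-1]}, implied by the others = {implied_last}")
print(f"df = {df}, sample variance = {sample_variance:.3f}")
```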
8
Q

Name the different types of variables

A

Continuous variables:

  • numerical
  • can take any value within a range

Discrete variables:

  • numerical, but restricted to distinct values (e.g. counts)

Categorical (nominal) variables:

  • fit into categories
  • no intrinsic ordering

Ordinal variable:

  • similar to categorical but with a clear ordering of categories
  • e.g. survey responses ranging from strongly agree to strongly disagree
  • the distances between categories are hard to measure
9
Q

Describe the shape of this distribution

A
  • positive skew
  • discrete data
10
Q

What is central tendency?

A
  • the tendency of the observations to centre around a particular value
  • there are three common measures of central tendency:
  • mean
  • median
  • mode
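The three measures can be computed with Python's standard `statistics` module. The scores below are made up for illustration.

```python
from statistics import mean, median, mode

scores = [2, 3, 3, 5, 7]  # hypothetical observations

centre = {
    "mean": mean(scores),      # sum of values divided by their count
    "median": median(scores),  # middle value once sorted
    "mode": mode(scores),      # most frequent value
}
print(centre)
```

For this sample the mean is 4, while the median and mode are both 3.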
11
Q

What is dispersion?

A
  • how spread out the values are around the centre of the distribution
  • e.g. standard deviation, IQR, variance
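Two samples can share the same mean and still differ greatly in dispersion. The sketch below compares two made-up samples; the helper name `spread` is illustrative, and the quartiles use the default method of `statistics.quantiles`, so other software may report slightly different IQR values.

```python
from statistics import fmean, quantiles, stdev

def spread(values):
    """Range, interquartile range and sample standard deviation."""
    q1, _, q3 = quantiles(values, n=4)
    return {
        "range": max(values) - min(values),
        "iqr": q3 - q1,
        "sd": stdev(values),
    }

tight = [9, 10, 10, 10, 11]  # hypothetical, values close together
wide = [2, 6, 10, 14, 18]    # hypothetical, same mean but spread out

print(fmean(tight), spread(tight))
print(fmean(wide), spread(wide))
```

Both samples have a mean of 10, but every dispersion measure is larger for the second one.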
12
Q

What is standard deviation?

A
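  • roughly, the typical amount by which values deviate from the mean; formally, the square root of the variance (the average squared deviation from the mean)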
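The standard deviation is not quite the plain average distance from the mean, because deviations are squared before averaging. The sketch below, on a made-up sample, computes both quantities. It uses the population form (dividing by n) to keep the arithmetic simple; a sample standard deviation would divide by n - 1 instead.

```python
from math import sqrt
from statistics import fmean, pstdev

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values
mean = fmean(data)

# Plain average of the absolute distances from the mean.
mean_abs_dev = fmean([abs(x - mean) for x in data])

# Standard deviation: square the deviations, average them, take the root.
pop_sd = sqrt(fmean([(x - mean) ** 2 for x in data]))

print(f"mean absolute deviation = {mean_abs_dev}, standard deviation = {pop_sd}")
print(f"statistics.pstdev gives {pstdev(data)}")
```

Squaring gives extra weight to values far from the mean, so the standard deviation (2.0 here) is larger than the mean absolute deviation (1.5).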
13
Q

What is the population?

A
  • it refers to all the cases that we want our inferences to apply to
14
Q

What is our sample?

A
  • the subset of the population that is actually measured; since it is rarely possible to measure the whole population, you take a sample from it and use that to make inferences
15
Q

What is a technical replicate?

A
  • every measurement is taken from the same sample
  • e.g. taking a blood sample from one person and measuring gene expression multiple times
  • gives a more reliable measurement of that one sample
  • also tells us how precise the measurement method is
  • if numbers are very different each time, we know not to trust any single measurement
16
Q

What is a biological replicate?

A
  • each measurement comes from a different sample that comes from a different individual
  • e.g. measuring gene expression in several different people tells us about gene expression in the group
17
Q

What is the difference between biological and technical replicate?

A
  • technical replicates test the variability of the testing protocol
  • biological replicates test the variability between samples
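The two kinds of variability can be separated in a small table of readings. The sketch below uses made-up expression values for three people with three technical replicates each; the names and numbers are hypothetical.

```python
from statistics import fmean, stdev

# Hypothetical expression readings: 3 people, 3 technical replicates each.
readings = {
    "person_1": [10.1, 9.9, 10.0],
    "person_2": [14.2, 13.8, 14.0],
    "person_3": [7.9, 8.1, 8.0],
}

# Technical variability: spread of repeated measurements within each person.
technical_sd = {person: stdev(vals) for person, vals in readings.items()}

# Biological variability: spread of the per-person means across people.
person_means = [fmean(vals) for vals in readings.values()]
biological_sd = stdev(person_means)

print("within-person SDs:", technical_sd)
print("between-person SD:", biological_sd)
```

In this made-up data the within-person spread is small compared with the between-person spread, meaning the protocol is repeatable and most of the variation is between individuals.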
18
Q

What is standard error?

A
  • the standard deviation of the sample means
  • sample means = means from different samples
19
Q

What is the difference between standard error and standard deviation?

A
  • standard error quantifies the variation in the means from multiple sets of measurements
  • whereas standard deviation quantifies the variation within a single set of measurements
  • confusingly, standard error can also be estimated from a single set of measurements, as the standard deviation divided by the square root of the sample size
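Both routes to the standard error can be compared in a quick simulation. The sketch below assumes a normally distributed population with mean 50 and standard deviation 10 and a sample size of 25; these values are arbitrary choices for illustration.

```python
import random
from math import sqrt
from statistics import fmean, stdev

rng = random.Random(1)
n = 25  # size of each sample

def draw_sample():
    # Assumed population: normal, mean 50, standard deviation 10.
    return [rng.gauss(50.0, 10.0) for _ in range(n)]

# Route 1: take many samples and measure the spread of their means.
sample_means = [fmean(draw_sample()) for _ in range(2000)]
sd_of_means = stdev(sample_means)

# Route 2: estimate it from one sample as SD / sqrt(n).
single = draw_sample()
se_estimate = stdev(single) / sqrt(n)

print(f"SD of sample means: {sd_of_means:.2f}")
print(f"SE estimated from one sample: {se_estimate:.2f}")
```

Both numbers should be in the neighbourhood of 10 / sqrt(25) = 2, with the single-sample estimate being the noisier of the two.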
20
Q

How does increasing sample size affect standard error?

A
  • as sample size gets larger, the dispersion of the sample means gets smaller
  • this means a sample mean tends to be closer to the population mean
  • standard error is inversely proportional to the square root of the sample size
  • so as sample size increases, standard error decreases
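Using the usual estimate SE = SD / sqrt(n), the effect of sample size is easy to tabulate. The standard deviation of 10 below is an arbitrary value chosen for illustration.

```python
from math import sqrt

def standard_error(sd: float, n: int) -> float:
    """Standard error of the mean for a sample of size n."""
    return sd / sqrt(n)

sd = 10.0  # assumed standard deviation
for n in (25, 100, 400):
    print(f"n = {n:3d}  SE = {standard_error(sd, n)}")
```

Each fourfold increase in sample size halves the standard error (2.0, 1.0, 0.5), because the relationship is with the square root of n.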
21
Q

Define 95% confidence interval

A
  • if the experiment is repeated many times and a 95% confidence interval is calculated from each sample, about 95% of those intervals will contain the population mean
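This repeated-sampling definition can be checked by simulation. The sketch below assumes a normal population with a known standard deviation so that a simple z-interval applies; that assumption, the function name, and the parameter values are all illustrative.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def ci_coverage(mu=50.0, sigma=10.0, n=30, trials=2000, level=0.95, seed=2):
    """Fraction of confidence intervals that contain the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    half_width = z * sigma / sqrt(n)           # sigma assumed known
    hits = 0
    for _ in range(trials):
        sample_mean = fmean([rng.gauss(mu, sigma) for _ in range(n)])
        if sample_mean - half_width <= mu <= sample_mean + half_width:
            hits += 1
    return hits / trials

coverage = ci_coverage()
print(f"Intervals containing the population mean: {coverage:.1%}")
```

The printed proportion should be close to 95%; any single interval either contains the population mean or it does not.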
22
Q

What type of data is the mode used for?

A
  • used for categorical data (it is the most frequent category)
23
Q

What type of data is the mean used for and why?

A
  • used for continuous data
  • useful as it is fairly stable from one sample to another
24
Q

What type of data is the median used for and why?

A
  • continuous data
  • preferred for distributions with a few extreme values, because it is less affected by them than the mean
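A short example of why the median is preferred when there are extreme values: the sketch below adds a single outlier to a made-up sample and compares how the mean and median respond.

```python
from statistics import mean, median

values = [10, 11, 12, 12, 13]  # hypothetical measurements
with_outlier = values + [100]  # same data plus one extreme value

print(f"mean: {mean(values)} -> {mean(with_outlier):.1f}")
print(f"median: {median(values)} -> {median(with_outlier)}")
```

The mean more than doubles when the outlier is added, while the median stays at 12.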