Data and distributions Flashcards
Fundamental features of a dataset
Datapoints (n)
Measures of central tendency and spread (mean, variance, SD)
Type of distribution
Key properties of the normal distribution
Continuous data
Bell shaped curve
How do we define the normal distribution?
By the mean and SD
In normally distributed data, 95% of the data lies within…
1.96 SDs of the mean
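A quick check of the 1.96 figure, as a minimal sketch using Python's standard library. The helper name fraction_within is made up for illustration; the standard normal (mean 0, SD 1) is used because the fraction within k SDs is the same for any mean and SD.

```python
from statistics import NormalDist

# Standard normal: mean 0, SD 1. The fractions below hold for any mean/SD.
std_normal = NormalDist(mu=0.0, sigma=1.0)

def fraction_within(k: float) -> float:
    """Fraction of a normal distribution lying within k SDs of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

within_1_96 = fraction_within(1.96)  # very close to 0.95
within_2 = fraction_within(2.0)      # a little more than 0.95

print(f"within 1.96 SDs: {within_1_96:.4f}")
print(f"within 2 SDs:    {within_2:.4f}")
```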
Properties of the Poisson distribution
Discrete data
As the mean gets bigger, the distribution becomes more like the normal distribution
When the mean is > ___ you can approximate the Poisson distribution with a _____ distribution
5
Normal
Poisson distributions are right-skewed for ____ means
Low
As the mean gets _____ in a Poisson distribution, it gets more like the _____ distribution
Bigger
Normal
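One way to see the Poisson distribution drifting towards the normal as the mean grows is to compare the two curves directly. This is only a sketch: the means 1, 5 and 20 are arbitrary illustrative choices, and max_gap_to_normal is a hypothetical helper, not a standard function.

```python
from math import exp, factorial, sqrt
from statistics import NormalDist

def poisson_pmf(k: int, mean: float) -> float:
    """P(X = k) for a Poisson distribution with the given mean."""
    return exp(-mean) * mean**k / factorial(k)

def max_gap_to_normal(mean: float) -> float:
    """Largest gap between the Poisson PMF and the matching normal PDF."""
    # A Poisson distribution has variance equal to its mean, so SD = sqrt(mean).
    normal = NormalDist(mu=mean, sigma=sqrt(mean))
    upper = int(mean + 10 * sqrt(mean)) + 1  # far enough into the right tail
    return max(abs(poisson_pmf(k, mean) - normal.pdf(k)) for k in range(upper))

# Illustrative means only; the gap should shrink as the mean gets bigger.
gaps = {mean: max_gap_to_normal(mean) for mean in (1, 5, 20)}
for mean, gap in gaps.items():
    print(f"mean {mean:>2}: largest gap {gap:.4f}")
```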
Properties of the Binomial distribution
Discrete variables
Defined by the number of trials and the probability of one of two possible events happening
How do we define the Poisson distribution?
By the mean
Mean = variance
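The "mean = variance" property can be checked numerically from the probabilities themselves. A minimal sketch, assuming an arbitrary illustrative mean of 3.5 and summing over counts 0 to 99 (the tail beyond that is negligible for this mean):

```python
from math import exp, factorial

def poisson_pmf(k: int, mean: float) -> float:
    """P(X = k) for a Poisson distribution with the given mean."""
    return exp(-mean) * mean**k / factorial(k)

mean_param = 3.5   # illustrative value, not from the flashcards
ks = range(100)    # counts 0..99 cover essentially all the probability

# Mean and variance computed directly from the probability of each count.
dist_mean = sum(k * poisson_pmf(k, mean_param) for k in ks)
dist_var = sum((k - dist_mean) ** 2 * poisson_pmf(k, mean_param) for k in ks)

print(f"mean = {dist_mean:.6f}, variance = {dist_var:.6f}")
```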
How do we define the Binomial distribution?
Number of trials (n)
The probability of one of two events happening (p)
When p is low in the Binomial distribution, we see a _____ skewed distribution
Right
(Most of the time the event will not happen)
When p is very high in the Binomial distribution, we see a _______ skewed distribution
Left
(Most of the time the event will happen)
As p gets closer to 0.5 in the Binomial distribution, the distribution becomes more…
Bell shaped and symmetric (like the normal distribution)
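The skew cards above can be checked by computing skewness straight from the Binomial probabilities. This is a sketch under assumptions: n = 20 and the three p values are illustrative choices, and binomial_skewness is a hypothetical helper. Positive skewness means right-skewed, negative means left-skewed.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k successes in n trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binomial_skewness(n: int, p: float) -> float:
    """Third standardised moment, computed directly from the PMF."""
    ks = range(n + 1)
    mean = sum(k * binomial_pmf(k, n, p) for k in ks)
    var = sum((k - mean) ** 2 * binomial_pmf(k, n, p) for k in ks)
    third = sum((k - mean) ** 3 * binomial_pmf(k, n, p) for k in ks)
    return third / var**1.5

# Low p, p = 0.5, and very high p, all with 20 trials (illustrative values).
skews = {p: binomial_skewness(20, p) for p in (0.05, 0.5, 0.95)}
for p, skew in skews.items():
    print(f"p = {p}: skewness {skew:+.3f}")
```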
What sort of data is normally distributed?
Means of samples of random variables
Quantities that are the sum of many independent processes
Body temp
Brain size
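A small simulation can illustrate why sums of many independent processes look normal. This is a sketch only: each "process" is assumed to be a uniform random number between 0 and 1, twelve are summed per observation, and the seed 42 is arbitrary. None of these choices come from the flashcards.

```python
import random
from statistics import mean, stdev

rng = random.Random(42)  # fixed seed so the sketch is repeatable

def sum_of_processes(n_processes: int = 12) -> float:
    """One observation: the sum of n independent uniform(0, 1) values."""
    return sum(rng.random() for _ in range(n_processes))

totals = [sum_of_processes() for _ in range(10_000)]
centre = mean(totals)
spread = stdev(totals)

# If the totals are roughly normal, about 95% fall within 1.96 SDs of the mean.
share_within = sum(abs(t - centre) <= 1.96 * spread for t in totals) / len(totals)

print(f"mean {centre:.2f}, SD {spread:.2f}, share within 1.96 SDs {share_within:.3f}")
```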
What sort of data is Poisson distributed?
Counts of independently occurring events in homogeneous units
E.g. Quadrat sampling
What sort of data is Binomially distributed?
Counts (or proportions) of ‘successes’ vs ‘failures’ out of a fixed number of trials, where the probability of success is constant
Number of ‘heads’ in 10 coin flips
Proportion of a population with a particular mutation
Number of adult frogs developing from fixed initial number of tadpoles
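The coin-flip example can be worked through as a minimal sketch, assuming a fair coin (p = 0.5). The names n_flips and p_heads are made up for illustration.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k successes in n trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n_flips, p_heads = 10, 0.5  # fixed number of trials, constant probability

# Probability of each possible number of heads, from 0 to 10.
pmf = [binomial_pmf(k, n_flips, p_heads) for k in range(n_flips + 1)]
most_likely = max(range(n_flips + 1), key=lambda k: pmf[k])

for k, prob in enumerate(pmf):
    print(f"{k:>2} heads: {prob:.4f}")
```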
What is the 95% confidence interval for a mean?
An interval calculated from a sample so that, if the sampling were repeated many times, about 95% of the intervals would contain the true value of the mean
How to calculate standard error of the mean
standard deviation / root(n)
n = sample size
As the sample size increases, standard error gets _____
Smaller
As n increases, so does root n, so the standard deviation is divided by a larger number and the standard error gets smaller
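The standard error formula above can be sketched in a few lines. The sample values and the helper names standard_error and se_from_sd are hypothetical, chosen only to show the arithmetic.

```python
from math import sqrt
from statistics import stdev

def standard_error(sample: list[float]) -> float:
    """Standard error of the mean: sample SD divided by root(n)."""
    return stdev(sample) / sqrt(len(sample))

def se_from_sd(sd: float, n: int) -> float:
    """Same formula when the SD and sample size are already known."""
    return sd / sqrt(n)

# Hypothetical sample, for illustration only.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
se_sample = standard_error(sample)

# Same SD, growing n: quadrupling the sample size halves the standard error.
se_by_n = {n: se_from_sd(10.0, n) for n in (10, 40, 160)}

print(f"SE of sample: {se_sample:.3f}")
for n, se in se_by_n.items():
    print(f"n = {n:>3}: SE = {se:.3f}")
```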
The smaller the sample size, the _____ the confidence interval needs to be to ensure an x% chance of containing the true value of the mean
Wider
When we don’t know the standard deviation of the population precisely, we have to use ______ to calculate the confidence interval
A t-distribution
How to calculate a confidence interval
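CI = mean +/- tcrit x SE, where tcrit is the critical value of the t-distribution for the chosen confidence level and n - 1 degrees of freedom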
Approximate rule of thumb for 95% CI
mean +/- 2 x SE of the mean
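Both versions of the calculation can be compared on a small sample. This is a sketch under assumptions: the ten measurements are invented for illustration, and the critical value 2.262 is the standard t-table entry for a 95% interval with 9 degrees of freedom, hard-coded here rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical sample of 10 measurements (not from the flashcards).
sample = [36.6, 36.9, 37.1, 36.8, 37.0, 36.7, 37.2, 36.9, 36.8, 37.0]

# Assumption: two-tailed 95% critical value from a t-table, df = n - 1 = 9.
T_CRIT_95_DF9 = 2.262

def confidence_interval(values: list[float], crit: float) -> tuple[float, float]:
    """mean +/- crit x SE, where SE = SD / root(n)."""
    centre = mean(values)
    se = stdev(values) / sqrt(len(values))
    return centre - crit * se, centre + crit * se

t_low, t_high = confidence_interval(sample, T_CRIT_95_DF9)
rough_low, rough_high = confidence_interval(sample, 2.0)  # rule of thumb

print(f"t-based 95% CI:   {t_low:.3f} to {t_high:.3f}")
print(f"rule of thumb CI: {rough_low:.3f} to {rough_high:.3f}")
```

With a sample this small the t-based interval comes out slightly wider than the rule-of-thumb one, because the critical value is larger than 2.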
The 95% confidence interval is _____ than the 99% confidence interval
Narrower
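Why the 95% interval is narrower can be seen from the critical values alone. A minimal sketch, assuming a sample large enough that normal critical values can stand in for the t-distribution; z_crit is a hypothetical helper name.

```python
from statistics import NormalDist

def z_crit(confidence: float) -> float:
    """Two-tailed critical value of the standard normal distribution."""
    tail = (1 - confidence) / 2          # probability left in each tail
    return NormalDist().inv_cdf(1 - tail)

z95 = z_crit(0.95)
z99 = z_crit(0.99)

# The interval is mean +/- crit x SE, so its width scales with the critical value.
width_ratio = z99 / z95

print(f"95% critical value: {z95:.3f}")
print(f"99% critical value: {z99:.3f}")
print(f"99% CI is about {width_ratio:.2f} times as wide as the 95% CI")
```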
What is the null hypothesis?
There is no difference (no effect) between the populations we are looking at
What is the alternative hypothesis?
The hypothesis that there is an effect or a difference (a drug has an effect, two groups are different etc)
What is a two-tailed test?
A hypothesis test where we don’t have any reason to suppose that the effect will be in a particular direction
What is a one-tailed test?
A hypothesis test where we have reason to believe that the difference will be in a specific direction
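The practical difference between the two kinds of test shows up in the p-value. This is a sketch under assumptions: the test statistic is taken to follow a standard normal distribution (a z-test), the observed value 1.8 is invented, and the one-tailed test assumes the predicted direction is positive.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, SD 1

def two_tailed_p(z: float) -> float:
    """Probability of a result at least this extreme in either direction."""
    return 2 * (1 - std_normal.cdf(abs(z)))

def one_tailed_p(z: float) -> float:
    """Probability of a result at least this large in the predicted (positive) direction."""
    return 1 - std_normal.cdf(z)

z_observed = 1.8  # hypothetical test statistic
p_two = two_tailed_p(z_observed)
p_one = one_tailed_p(z_observed)

print(f"two-tailed p = {p_two:.4f}")
print(f"one-tailed p = {p_one:.4f}")
```

For a result in the predicted direction the one-tailed p-value is half the two-tailed one, so the same data can fall on different sides of a 0.05 threshold depending on which test was chosen in advance.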