Distributions, Sample & Populations Flashcards
What is a continuous variable?
A variable that can take any value within a range, including fractional values
e.g. temperature could be 4C, 10.34C or -0.0000513C
What is a discrete variable?
A numeric variable that can only take values from a fixed, countable set
e.g. number of cars shown
What is a histogram?
- A plot that visualises the distribution of a dataset
What do histograms for continuous data look like?
X-axis is split into “bins”
Each bin covers a set range
- Increasing the number of bins in the histogram gives more resolution
- Fewer bins are less noisy but tend to be less informative
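The effect of bin count can be sketched in a few lines of Python. This is only an illustration: the `histogram_counts` helper and the temperature data are made up for the example, and the bins are assumed to be equal width.

```python
import random

def histogram_counts(values, n_bins):
    """Count how many values fall into each of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value goes into the last bin rather than a new one.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts

random.seed(0)  # fixed seed so the sketch is repeatable
temperatures = [random.gauss(10.0, 3.0) for _ in range(200)]  # made-up data

coarse = histogram_counts(temperatures, 5)   # few bins: smooth but crude
fine = histogram_counts(temperatures, 40)    # many bins: detailed but noisy
print(coarse)
print(fine)
```

With 5 bins each count is large and the overall shape is easy to see; with 40 bins each count is small, so random variation between neighbouring bins becomes more visible.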
What are the 4 distribution shape metrics?
- Mean
- Variance
- Skewness
- Kurtosis
Describe the mean in terms of distribution shape
Shape stays the same but the centre of mass shifts
Describe variance in terms of distribution shape
Stretches or compresses the spread of the distribution around the mean
Describe skewness in terms of distribution shape
Negative skewness has a long tail on the left; positive skewness has a long tail on the right
Describe kurtosis in terms of distribution shape
Affects the peak and tails (high = sharp peak and heavier tails)
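As a rough sketch, the four shape metrics (mean, variance, skewness, kurtosis) can be computed from the central moments of the data. The `shape_metrics` helper and the two small datasets are hypothetical, and the population form (dividing by n) plus "excess" kurtosis (normal distribution = 0) are assumptions made for this example.

```python
def shape_metrics(values):
    """Return (mean, variance, skewness, excess kurtosis) of a dataset."""
    n = len(values)
    mean = sum(values) / n
    # Central moments, population form (dividing by n).
    m2 = sum((v - mean) ** 2 for v in values) / n  # variance
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0  # 0 for a normal distribution
    return mean, m2, skewness, excess_kurtosis

symmetric = [1, 2, 3, 4, 5]      # made-up data, no skew
left_tailed = [1, 8, 9, 9, 10]   # made-up data, long tail on the left

print(shape_metrics(symmetric))
print(shape_metrics(left_tailed))
```

The symmetric data gives a skewness of zero, while the left-tailed data gives a negative skewness.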
What do you use to test for normal distributions?
Shapiro Wilk W
Shapiro Wilk P
How does Shapiro Wilk W test for normal distributions?
- Testing the null hypothesis that our data is normally distributed
If the test is non-significant = no evidence against normality, so we treat the data as normal
If the test is significant = the distribution is significantly different from a normal distribution
So higher values of W (closer to 1) indicate more normal data
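What does the Shapiro Wilk P value tell you?
The probability of getting a W this low if the data really were normally distributed. A small value (typically p < 0.05) means the difference from normality is significant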
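A minimal sketch of running the test, assuming SciPy is installed and using its `scipy.stats.shapiro` function, which returns W and P together. The two samples are simulated for illustration, and the 0.05 cut-off is the conventional threshold rather than anything fixed by the test itself.

```python
import random
from scipy import stats

random.seed(1)  # fixed seed so the sketch is repeatable
normal_sample = [random.gauss(0.0, 1.0) for _ in range(200)]    # simulated
skewed_sample = [random.expovariate(1.0) for _ in range(200)]   # simulated

for name, sample in [("normal", normal_sample), ("skewed", skewed_sample)]:
    w, p = stats.shapiro(sample)  # W statistic and its p-value
    if p < 0.05:
        verdict = "significantly different from normal"
    else:
        verdict = "no evidence against normality"
    print(f"{name}: W = {w:.3f}, p = {p:.4f} -> {verdict}")
```

The strongly skewed sample should give a lower W and a very small P than the sample drawn from a normal distribution.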
How does sampling work?
- We can’t test everyone, so we can only take a sample from our population
- The issue is that our populations aren’t homogeneous. There will be lots of additional variability that we can’t control
How is the standard error of a mean used to make inferences about a sample of data?
It indicates how well we believe our sample mean approximates the population mean (smaller standard error = better estimate)
What is the calculation for the standard error of a mean?
Sample mean = (sum of all individual data points) divided by (total number of data points)
Sample standard deviation = the square root of [(the sum of the squared differences between the sample mean and each individual data point) divided by (total number of data points minus 1)]
Standard error of the mean = (sample standard deviation) divided by (the square root of the total number of data points)
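The calculation can be worked through step by step in Python. This is a sketch with made-up temperature values, and it assumes the common n - 1 form of the sample standard deviation.

```python
import math
import statistics

sample = [4.0, 10.34, 7.2, 5.5, 8.1]  # made-up temperatures in C

n = len(sample)
sample_mean = sum(sample) / n

# Sum of squared differences from the mean, divided by n - 1 (assumption:
# the usual sample form), then square-rooted.
sample_sd = math.sqrt(sum((x - sample_mean) ** 2 for x in sample) / (n - 1))

# Standard error of the mean: SD divided by the square root of n.
standard_error = sample_sd / math.sqrt(n)

print(sample_mean, sample_sd, standard_error)
# Cross-check the hand calculation against the standard library.
print(statistics.stdev(sample) / math.sqrt(n))
```

Because the standard error divides by the square root of n, a larger sample with the same spread gives a smaller standard error.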