Distributions, Sample & Populations Flashcards

Question 1

Q

What is a continuous variable?

Answer

A

A variable that can take any value within a range, including fractional values

e.g. temperature could be 4C, 10.34C or -0.0000513C

Question 2

Q

What is a discrete variable?

Answer

A

A numeric variable that can only take a fixed set of distinct values, usually whole-number counts

e.g. number of cars shown

Question 3

Q

What is a histogram?

Answer

A

A plot that visualises the distribution of a dataset by showing how many values fall into each range

Question 4

Q

What do histograms for continuous data look like?

Answer

A

The x-axis is split into “bins”
Each bin covers a set range of values, and the bar height shows how many data points fall in that range

Increasing the number of bins in the histogram gives more resolution but a noisier plot
Fewer bins are less noisy but tend to be less informative
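
The effect of the bin count can be sketched in a few lines of Python. This is only an illustration: the helper `bin_counts` and the simulated temperature data are made up for the example, and equal-width bins are assumed.

```python
import random

def bin_counts(values, n_bins):
    """Count how many values fall into each of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The largest value would index one past the end, so clamp it into the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts

random.seed(1)
# Hypothetical continuous data: 500 simulated temperature readings.
temperatures = [random.gauss(10.0, 3.0) for _ in range(500)]

coarse = bin_counts(temperatures, 5)   # few bins: smooth but not very detailed
fine = bin_counts(temperatures, 50)    # many bins: detailed but noisier

print("5 bins: ", coarse)
print("50 bins:", fine)
```

Both calls count the same 500 values; only the number of ranges they are divided into changes.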

Question 5

Q

What are the 4 distribution shape metrics?

Answer

A

Mean
Variance
Skewness
Kurtosis
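
As a rough sketch, all four metrics (mean, variance, skewness, kurtosis) can be computed from the central moments of the data. The function name `shape_metrics` is made up for this example, and it assumes the population (divide by n) form of each moment and reports excess kurtosis (kurtosis minus 3); other conventions exist.

```python
def shape_metrics(values):
    """Return (mean, variance, skewness, excess kurtosis) from population moments."""
    n = len(values)
    mean = sum(values) / n
    # Central moments: the average deviation from the mean raised to the 2nd, 3rd and 4th power.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0  # 0 for a normal distribution
    return mean, m2, skewness, excess_kurtosis

symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 1, 2, 10]  # one large value creates a long right tail

for name, data in (("symmetric", symmetric), ("right_tailed", right_tailed)):
    mean, var, skew, kurt = shape_metrics(data)
    print(f"{name}: mean={mean:.2f} variance={var:.2f} "
          f"skewness={skew:.2f} kurtosis={kurt:.2f}")
```

The symmetric data has zero skewness, while the data with a long right tail has positive skewness.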

Question 6

Q

Describe the mean in terms of distribution shape

Answer

A

Changing the mean shifts the centre of mass along the x-axis; the shape stays the same

Question 7

Q

Describe variance in terms of distribution shape

Answer

A

Changing the variance stretches or compresses the distribution around the mean (higher variance = wider spread)
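
A small sketch of the difference between the mean and the variance, using Python's standard `statistics` module. The numbers are invented for illustration: adding a constant moves the centre without changing the spread, while multiplying by a constant stretches the spread.

```python
import math
import statistics

data = [2.0, 4.0, 4.0, 5.0, 7.0]

shifted = [x + 10.0 for x in data]   # moves the centre, spread unchanged
stretched = [x * 2.0 for x in data]  # doubles every distance from zero, so variance is multiplied by 4

for name, values in (("original", data), ("shifted", shifted), ("stretched", stretched)):
    print(name, "mean =", statistics.mean(values),
          "variance =", statistics.variance(values))
```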

Question 8

Q

Describe skewness in terms of distribution shape

Answer

A

Skewness measures asymmetry: negative skewness gives a long tail on the left, positive skewness gives a long tail on the right

Question 9

Q

Describe kurtosis in terms of distribution shape

Answer

A

Affects the peak and the tails (high = sharp peak and heavier tails)

Question 10

Q

What do you use to test for normal distributions?

Answer

A

The Shapiro-Wilk test, which reports:
Shapiro-Wilk W (the test statistic)
Shapiro-Wilk p (the p-value)

Question 11

Q

How does the Shapiro-Wilk W test for normal distributions?

Answer

A

It tests the null hypothesis that our data is normally distributed

W lies between 0 and 1, and values closer to 1 indicate more normal data

If the test is non-significant, there is no evidence against normality, so we treat the data as normal

If the test is significant, the distribution is significantly different from a normal distribution

Question 12

Q

What does the Shapiro-Wilk p-value tell us?

Answer

A

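The probability of seeing a W this far from 1 if the data really were normally distributed; a small p (conventionally below 0.05) indicates a significant difference from normality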
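
One way to see W and p together is the sketch below. It assumes the third-party SciPy library is installed and uses its `scipy.stats.shapiro` function; the two datasets are simulated purely for illustration.

```python
import random
from scipy import stats  # third-party: assumes SciPy is installed

random.seed(42)
# Simulated data: one roughly normal sample and one strongly right-skewed sample.
normal_like = [random.gauss(0.0, 1.0) for _ in range(200)]
skewed = [random.expovariate(1.0) for _ in range(200)]

# shapiro() returns the W statistic and the p-value.
w_normal, p_normal = stats.shapiro(normal_like)
w_skewed, p_skewed = stats.shapiro(skewed)

print(f"normal-like: W={w_normal:.3f}, p={p_normal:.3f}")
print(f"skewed:      W={w_skewed:.3f}, p={p_skewed:.3g}")
```

The skewed sample should give a lower W and a very small p, so for that sample we would reject the null hypothesis of normality.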

Question 13

Q

How does sampling work?

Answer

A

We can’t test everyone, so we take a sample from our population
The issue is that our populations aren’t homogeneous. There will be lots of additional variability that we can’t control, so different samples give different results

Question 14

Q

How is the standard error of a mean used to make inferences about a sample of data?

Answer

A

It tells us how well the mean of our data sample is likely to approximate the mean of our population; a smaller standard error means a more precise estimate

Question 15

Q

What is the calculation for the standard error of a mean?

Answer

A

Sample mean = (sum of all individual data points) divided by (total number of data points)

Sample standard deviation = the square root of [(the sum of the squared differences between the sample mean and each individual data point) divided by (total number of data points minus 1)]

Standard error of the mean = (sample standard deviation) divided by (the square root of the total number of data points)
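
The calculation can be sketched with Python's standard library. Two assumptions to note: the standard deviation here is the sample form with an n - 1 denominator (which is what `statistics.stdev` computes), and the standard error is taken as that standard deviation divided by the square root of n. The function name `standard_error` and the data are made up for the example.

```python
import math
import statistics

def standard_error(values):
    """Standard error of the mean: sample SD divided by the square root of n."""
    n = len(values)
    sample_sd = statistics.stdev(values)  # sample SD, uses the n - 1 denominator
    return sample_sd / math.sqrt(n)

sample = [4, 8, 6, 5, 3, 7]  # hypothetical measurements

print("sample mean =", statistics.mean(sample))
print("sample SD   =", statistics.stdev(sample))
print("SE of mean  =", standard_error(sample))
```

Because n appears under a square root in the denominator, collecting more data points shrinks the standard error, but only slowly.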