S1 Theory Questions Flashcards

1
Q

How is the PMCC affected by linear coding?

A

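It is not affected.

The PMCC (correlation) is unchanged if you apply linear coding to the original data (a negative multiplier on just one variable reverses its sign but not its size)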
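As a quick check of this, here is a minimal sketch that computes the PMCC before and after coding. The data values and the codings p = (x - 10) / 2 and q = (y - 100) / 5 are made up for illustration.

```python
from math import sqrt

def pmcc(xs, ys):
    """Product moment correlation coefficient, r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data, invented for this example
x = [12, 15, 19, 22, 30]
y = [105, 118, 121, 140, 162]

# Linear coding with positive multipliers (assumed codings)
p = [(v - 10) / 2 for v in x]
q = [(v - 100) / 5 for v in y]

r_original = pmcc(x, y)
r_coded = pmcc(p, q)
print(r_original, r_coded)  # the two values agree up to rounding error
```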

2
Q

Name 3 properties of a normal distribution

A

1) The curve is a bell shape
2) The curve is symmetrical about the mean
3) mean=mode=median
4) Continuous

3
Q

What is the key feature of a histogram? And for what type of data would you use a histogram?

A

The key feature of a histogram is that the area of each block is proportional to the frequency.
We use a histogram for continuous data

4
Q

How do you know if 2 events are independent?

A

1) P(A ∩ B) = P(A) × P(B)

2) P(A | B) = P(A)
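Both tests can be checked on a small example. This is only a sketch: the fair die and the events A (even score) and B (score of 4 or less) are assumptions chosen for illustration.

```python
from fractions import Fraction

# Sample space for one roll of a fair die (assumed example)
outcomes = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & outcomes), len(outcomes))

A = {2, 4, 6}      # hypothetical event: score is even
B = {1, 2, 3, 4}   # hypothetical event: score is 4 or less

p_a_and_b = prob(A & B)            # P(A and B)
p_a_given_b = p_a_and_b / prob(B)  # P(A given B)

# Test 1: P(A and B) equals P(A) x P(B)
test_1 = p_a_and_b == prob(A) * prob(B)
# Test 2: P(A given B) equals P(A)
test_2 = p_a_given_b == prob(A)
print(test_1, test_2)
```

Either test on its own is enough; for these two events both hold, so A and B are independent.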

5
Q

Name 2 ways you can identify positive skew

A

1) The tail on a histogram is on the right / the majority of the values are bunched up on the left
2) The median is closer to the lower quartile than to the upper quartile (Q3 - Q2 > Q2 - Q1)
3) Pearson’s coefficient of skewness is positive
4) The median is less than the mean

6
Q

To find the lower quartile from a list of discrete data values, how do you work out which value to use?

A

Work out n/4
(If the answer is not a whole number, round up and use the value in that position)
(If the answer is a whole number, use the value half way between that position and the next one)
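The rule can be sketched in code as follows. The function name lower_quartile and the two data lists are hypothetical, and positions are counted from 1 in the ordered list.

```python
import math

def lower_quartile(values):
    """Lower quartile of discrete data using the n/4 rule described above."""
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("need at least one data value")
    if n % 4 != 0:
        # n/4 is not a whole number: round up and take that position
        k = math.ceil(n / 4)
        return data[k - 1]
    # n/4 is a whole number: half way between that position and the next
    k = n // 4
    return (data[k - 1] + data[k]) / 2

seven_values = [2, 4, 5, 7, 8, 9, 11]       # n/4 = 1.75, so use the 2nd value
eight_values = [2, 4, 5, 7, 8, 9, 11, 12]   # n/4 = 2, so average 2nd and 3rd

print(lower_quartile(seven_values))  # 4
print(lower_quartile(eight_values))  # 4.5
```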

7
Q

Give 2 reasons why we would use a statistical model

A

1) Used to simplify or represent a real world problem
2) Cheaper or quicker or easier or more easily modified
3) To improve understanding of the real world problem
4) Used to predict outcomes from a real world problem

8
Q

In probability, what is the addition rule?

A

P(A ∪ B) = P(A) + P(B) - P(A ∩ B)

9
Q

Name 2 advantages of using a box plot to display data

A

1) Shows outliers
2) Shows skewness
3) Shows the spread / range / IQR
4) Shows the maximum/minimum/median/quartiles
5) 2 box plots on the same scale allow us to do comparisons

10
Q

Define a statistical model

A

A statistical process devised to describe or make predictions about the expected behavior of a real-world problem

11
Q

Name the 7 steps of statistical modeling

A

1) Recognise a real world problem
2) Devise a statistical model
3) Use model to make predictions
4) Experimental data is collected
5) Comparisons made against devised model
6) Evaluation (statistical concepts used to test how well model describes real-world problem)
7) Refine model

12
Q

When might you use a back-to-back stem and leaf diagram?

A

To compare 2 sets of discrete data

13
Q

How do you calculate Pearson’s coefficient of skewness? And how do you interpret it?

A

3(mean - median) / standard deviation

positive = positive skew 
negative = negative skew
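
A minimal sketch of the calculation, assuming the standard deviation is the population form (dividing by n). The data list is invented for illustration.

```python
from statistics import mean, median, pstdev

def pearson_skewness(values):
    """3(mean - median) / standard deviation, with sd calculated by dividing by n."""
    return 3 * (mean(values) - median(values)) / pstdev(values)

# Hypothetical data with a long tail on the right
data = [1, 2, 2, 3, 3, 4, 15]
mirrored = [-v for v in data]  # same shape reflected, so the tail is on the left

print(pearson_skewness(data))      # positive value: positive skew
print(pearson_skewness(mirrored))  # negative value: negative skew
```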
14
Q

What does it mean if a distribution is skewed? And what should you do in this situation?

A

If a distribution is skewed then it is not symmetrical: it has a longer tail of extreme values on one side. In this case it is better to use the median and IQR to describe the data because they are not affected by outliers, unlike the mean and standard deviation

15
Q

How do you work out the variance of a data set?

A

The mean of the squares minus the square of the mean
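
As a worked sketch of that rule, with made-up data values:

```python
def variance(values):
    """Variance as (mean of the squares) - (square of the mean)."""
    n = len(values)
    mean_of_squares = sum(v * v for v in values) / n
    square_of_mean = (sum(values) / n) ** 2
    return mean_of_squares - square_of_mean

# Hypothetical data: sum = 40 and sum of squares = 232, with n = 8
data = [2, 4, 4, 4, 5, 5, 7, 9]

print(variance(data))  # 232/8 - (40/8)^2 = 29 - 25 = 4.0
```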

16
Q

In a discrete uniform distribution with x = 1, 2, 3, ..., n what are the formulae for E(X) and Var(X)?

A

E(X) = (n+1)/2

Var (X) = (n+1)(n-1) / 12
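
These formulae can be checked against the definitions E(X) = Σ x P(X = x) and Var(X) = E(X²) - [E(X)]². The sketch below assumes n = 6 (a fair die) purely as an example.

```python
from fractions import Fraction

n = 6                 # assumed example: a fair six-sided die
p = Fraction(1, n)    # each value 1, 2, ..., n has probability 1/n

# Straight from the definitions
expectation = sum(x * p for x in range(1, n + 1))
expectation_of_square = sum(x * x * p for x in range(1, n + 1))
variance = expectation_of_square - expectation ** 2

# From the formulae on this card
formula_expectation = Fraction(n + 1, 2)
formula_variance = Fraction((n + 1) * (n - 1), 12)

print(expectation, variance)  # 7/2 35/12
```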

17
Q

How does coding affect the mean and standard deviation?

A

Adding or subtracting a number to or from every data value adds or subtracts that number to or from the mean but does not affect the standard deviation.
Multiplying or dividing every data value by a positive number multiplies or divides both the mean and the standard deviation by that number.
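
A short sketch of both effects together, assuming the coding y = (x - 100) / 5 and some invented data:

```python
from statistics import mean, pstdev

# Hypothetical data and an assumed coding y = (x - 100) / 5
x = [110, 120, 125, 135, 160]
y = [(v - 100) / 5 for v in x]

mean_x, sd_x = mean(x), pstdev(x)
mean_y, sd_y = mean(y), pstdev(y)

# The mean follows the whole coding: subtract 100, then divide by 5
print(mean_y, (mean_x - 100) / 5)
# The standard deviation ignores the subtraction and is only divided by 5
print(sd_y, sd_x / 5)
```

To undo the coding, reverse the steps: mean of x = 5 × (mean of y) + 100 and sd of x = 5 × (sd of y).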

18
Q

In discrete random variables, what does F(2.4) mean?

A

It is the probability that X is less than or equal to 2.4, i.e. F(2.4) = P(X ≤ 2.4)
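
For example, with a made-up probability distribution on the values 1 to 4, F(2.4) just adds up the probabilities of every value not exceeding 2.4. The table below is an assumption for illustration only.

```python
from fractions import Fraction

# Hypothetical probability distribution for X
pmf = {
    1: Fraction(1, 10),
    2: Fraction(3, 10),
    3: Fraction(4, 10),
    4: Fraction(2, 10),
}

def cdf(x):
    """F(x) = P(X <= x): sum the probabilities of all values up to x."""
    return sum((p for value, p in pmf.items() if value <= x), Fraction(0))

print(cdf(2.4))  # 2/5, the same as F(2) because X takes no value between 2 and 2.4
```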

19
Q

Name a disadvantage of using a statistical model

A

A model will never be able to cater for all the eventualities of a real life problem

20
Q

When are you able to make predictions using a least squares regression line?

A

Predictions inside the range of data (interpolation) should be reliable, as long as there is a fairly strong linear relationship (correlation) between the 2 variables.
Extrapolation (estimating outside the range of data collected) is to be treated with caution as the linear relationship may not remain valid.
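
A rough sketch of how this might look in practice: fit y = a + bx with b = Sxy / Sxx and a = (mean of y) - b × (mean of x), then flag whether a prediction is interpolation or extrapolation. The function names and data are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the regression line y = a + b x of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def predict(xs, ys, x_new):
    """Return the predicted y and whether x_new lies inside the data range."""
    a, b = least_squares_line(xs, ys)
    is_interpolation = min(xs) <= x_new <= max(xs)
    return a + b * x_new, is_interpolation

# Hypothetical data, invented for this example
x = [10, 20, 30, 40, 50]
y = [25, 41, 62, 79, 103]

print(predict(x, y, 35))  # inside 10..50: interpolation
print(predict(x, y, 80))  # outside 10..50: extrapolation, treat with caution
```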

21
Q

How do you standardize a normal distribution?

A

Subtract the mean and divide by the standard deviation: Z = (X - μ) / σ
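
A small sketch of standardizing, assuming X ~ N(50, 4²) and the value x = 58 as made-up numbers. It uses NormalDist from Python's statistics module, which takes the standard deviation (not the variance) as its second argument.

```python
from statistics import NormalDist

mu, sigma = 50, 4   # assumed mean and standard deviation
x = 58              # assumed value of interest

# Standardize: subtract the mean, divide by the standard deviation
z = (x - mu) / sigma

# P(X < 58) found directly, and P(Z < 2) from the standard normal N(0, 1)
p_direct = NormalDist(mu, sigma).cdf(x)
p_standard = NormalDist().cdf(z)

print(z)                     # 2.0
print(p_direct, p_standard)  # the two probabilities agree
```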

22
Q

What is the formula for the probability of A given B ?

A

P(A | B) = P(A ∩ B) / P(B)

23
Q

How do you know if 2 events are mutually exclusive?

A

P(A ∩ B) = 0
or P(A ∪ B) = P(A) + P(B)

24
Q

On a scatter graph, which variable should go on the x-axis? And what do you call the variable that goes on the y-axis?

A

The x-axis is for the explanatory variable (independent variable)
The y-axis is for the response variable (dependent variable)

25
Q

Describe and give an example of a discrete uniform distribution

A

A discrete uniform distribution is a probability distribution in which every outcome has the same probability of occurring (e.g. rolling a fair die)

26
Q

How do you estimate the median from a table of grouped continuous data?

A

The median is the (n/2)th value. Calculate the cumulative frequencies to find which class the median lies in, then use linear interpolation within that class to estimate it. Make sure you use the correct class boundaries.
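
The interpolation step can be sketched like this. The function name grouped_median and the frequency table are made up, and each class is given as (lower boundary, upper boundary, frequency).

```python
def grouped_median(classes):
    """Estimate the median of grouped continuous data by linear interpolation."""
    n = sum(freq for _, _, freq in classes)
    if n == 0:
        raise ValueError("total frequency must be positive")
    target = n / 2          # the median is the (n/2)th value
    cumulative = 0          # cumulative frequency before the current class
    for lower, upper, freq in classes:
        if cumulative + freq >= target:
            # Median class found: go the right fraction of the way across it
            return lower + (target - cumulative) / freq * (upper - lower)
        cumulative += freq

# Hypothetical table: class boundaries and frequencies
table = [(0, 10, 5), (10, 20, 12), (20, 30, 8), (30, 40, 5)]

# n = 30, so the median is the 15th value, which lies in the 10-20 class:
# 10 + (15 - 5) / 12 * 10 = 18.33...
print(grouped_median(table))
```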