S1 Theory Questions Flashcards
How is the PMCC affected by linear coding?
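It’s not
The PMCC (correlation) is unchanged if you code the original data linearly, provided any multiplying factor is positive (multiplying one variable by a negative number reverses the sign of the PMCC)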
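As a quick check, this minimal Python sketch computes r = Sxy / sqrt(Sxx × Syy) for some made-up data, both before and after coding each variable with a positive scale factor. The function name `pmcc`, the data and the coding constants are illustrative assumptions only.

```python
import math

def pmcc(xs, ys):
    """Product moment correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, made up purely for illustration
x = [2, 4, 5, 7, 9]
y = [10, 13, 15, 20, 21]

# Code both variables: u = (x - 5) / 2 and v = (y - 10) / 5 (positive factors)
u = [(xi - 5) / 2 for xi in x]
v = [(yi - 10) / 5 for yi in y]

print(pmcc(x, y), pmcc(u, v))  # the two values agree (up to rounding)
```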
Name 3 properties of a normal distribution
1) The curve is a bell shape
2) The curve is symmetrical about the mean
3) mean=mode=median
4) Continuous
What is the key feature of a histogram? And for what type of data would you use a histogram?
The key feature of a histogram is that the area of each block is proportional to the frequency.
We use a histogram for continuous data
How do you know if 2 events are independent?
1) P(A ∩ B) = P(A) × P(B)
2) P(A | B) = P(A)
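A minimal Python sketch of both tests, assuming made-up probabilities. `Fraction` keeps the arithmetic exact so the equality checks are not spoiled by floating-point rounding. The function name `are_independent` is hypothetical.

```python
from fractions import Fraction

def are_independent(p_a, p_b, p_a_and_b):
    """Events A and B are independent when P(A ∩ B) = P(A) × P(B)."""
    return p_a_and_b == p_a * p_b

# Hypothetical probabilities
p_a = Fraction(1, 2)
p_b = Fraction(1, 3)
p_a_and_b = Fraction(1, 6)

print(are_independent(p_a, p_b, p_a_and_b))  # True
# Equivalent check: P(A | B) = P(A ∩ B) / P(B) should equal P(A)
print(p_a_and_b / p_b == p_a)                # True
```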
Name 2 ways you can identify positive skew
1) The tail on a histogram is on the right / the majority of the values are bunched up on the left
2) The median is closer to the lower quartile than to the upper quartile (Q3 - Q2 > Q2 - Q1)
3) Pearson’s coefficient of skewness is positive
4) The median is less than the mean
To find the lower quartile from a list of discrete data values, how do you work out which value to use?
Work out n/4, where n is the number of data values
(If the answer is a decimal, round up and use that value)
(If the answer is a whole number, use halfway between this value and the one after)
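The n/4 rule can be sketched in Python as follows, assuming the data are sorted and positions are counted from 1. The function name `lower_quartile` is hypothetical; other textbooks and software packages use slightly different quartile conventions.

```python
def lower_quartile(values):
    """Lower quartile using the n/4 rule.

    If n/4 is not a whole number, round up and take the value in that position.
    If n/4 is a whole number, take halfway between that value and the next one.
    Positions are counted from 1 in the sorted list.
    """
    data = sorted(values)
    n = len(data)
    if n % 4 == 0:
        k = n // 4                          # whole number: average k-th and (k+1)-th values
        return (data[k - 1] + data[k]) / 2
    k = n // 4 + 1                          # decimal: round up to the next position
    return data[k - 1]

print(lower_quartile([3, 5, 7, 8, 10, 12, 15]))       # n/4 = 1.75 -> 2nd value: 5
print(lower_quartile([2, 4, 6, 8, 10, 12, 14, 16]))   # n/4 = 2 -> (4 + 6) / 2 = 5.0
```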
Give 2 reasons why we would use a statistical model
1) Used to simplify or represent a real world problem
2) Cheaper or quicker or easier or more easily modified
3) To improve understanding of the real world problem
4) Used to predict outcomes from a real world problem
In probability, what is the addition rule?
P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
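A short worked example in Python, assuming a single card is drawn from a standard 52-card deck. The intersection is subtracted so the king of hearts is not counted twice. The function name `p_union` is hypothetical.

```python
from fractions import Fraction

def p_union(p_a, p_b, p_a_and_b):
    """Addition rule: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)."""
    return p_a + p_b - p_a_and_b

# A = "card is a heart" (13 cards), B = "card is a king" (4 cards)
# A ∩ B = "king of hearts" (1 card)
p = p_union(Fraction(13, 52), Fraction(4, 52), Fraction(1, 52))
print(p)  # 4/13
```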
Name 2 advantages of using a box plot to display data
1) Shows outliers
2) Shows skewness
3) Shows the spread / range / IQR
4) Shows the maximum/minimum/median/quartiles
5) 2 box plots on the same scale allow us to do comparisons
Define a statistical model
A statistical process used to describe, or make predictions about, the expected behavior of a real-world problem
Name the 7 steps of statistical modeling
1) Recognise a real world problem
2) Devise a statistical model
3) Use model to make predictions
4) Experimental data is collected
5) Comparisons are made between the experimental data and the model's predictions
6) Evaluation (statistical concepts used to test how well model describes real-world problem)
7) Refine model
When might you use a back-to-back stem and leaf diagram?
To compare 2 sets of discrete data
How do you calculate Pearson’s coefficient of skewness? And how do you interpret it?
3(mean - median) / standard deviation
Positive value = positive skew; negative value = negative skew (a value of 0 suggests the data are symmetrical)
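A minimal Python sketch of the coefficient and its interpretation, assuming made-up data. It uses the population standard deviation (divisor n) via `statistics.pstdev`; that choice is an assumption, so check which divisor your course uses. The names `pearson_skewness` and `describe_skew` are hypothetical.

```python
import statistics

def pearson_skewness(values):
    """Pearson's coefficient of skewness: 3(mean - median) / standard deviation."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    sd = statistics.pstdev(values)  # population SD (divisor n), an assumption
    return 3 * (mean - median) / sd

def describe_skew(coefficient):
    """Interpret the sign of the coefficient."""
    if coefficient > 0:
        return "positive skew"
    if coefficient < 0:
        return "negative skew"
    return "symmetrical"

data = [1, 2, 2, 3, 3, 3, 4, 10]  # hypothetical data with a long right tail
print(describe_skew(pearson_skewness(data)))  # positive skew
```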
What does it mean if a distribution is skewed? And what should you do in this situation?
If a distribution is skewed, it is not symmetrical: one tail is longer and contains extreme values. In this case it is better to use the median and IQR to describe the data, because they are not affected by extreme values (outliers), unlike the mean and standard deviation
How do you work out the variance of a data set?
The mean of the squares minus the square of the mean
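A minimal Python sketch of "mean of the squares minus the square of the mean", assuming made-up data and the divisor n. The function name `variance` is hypothetical.

```python
def variance(values):
    """Variance = mean of the squares minus the square of the mean (divisor n)."""
    n = len(values)
    mean = sum(values) / n
    mean_of_squares = sum(x * x for x in values) / n
    return mean_of_squares - mean ** 2

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
print(variance(data))  # 4.0  (232/8 - 5^2 = 29 - 25)
```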
For a discrete uniform distribution with x = 1, 2, 3, …, n, what are the formulae for E(X) and Var(X)?
E(X) = (n+1)/2
Var(X) = (n+1)(n-1) / 12
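The formulas can be checked in Python by working from the definitions E(X) = Σ x P(X = x) and Var(X) = E(X²) - [E(X)]², using exact fractions. This is a sketch for checking the formulas rather than a replacement for them. The function name `uniform_mean_var` is hypothetical.

```python
from fractions import Fraction

def uniform_mean_var(n):
    """E(X) and Var(X) for X uniform on 1, 2, ..., n, computed from the definitions."""
    values = range(1, n + 1)
    p = Fraction(1, n)                     # each value is equally likely
    e_x = sum(p * x for x in values)       # E(X) = sum of x * P(X = x)
    e_x2 = sum(p * x * x for x in values)  # E(X^2) = sum of x^2 * P(X = x)
    return e_x, e_x2 - e_x ** 2            # Var(X) = E(X^2) - [E(X)]^2

# Fair six-sided die (n = 6)
e_x, var_x = uniform_mean_var(6)
print(e_x, var_x)  # 7/2 35/12, matching (6+1)/2 and (6+1)(6-1)/12
```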
How does coding affect the mean and standard deviation?
Adding a number to (or subtracting it from) the data adds it to (or subtracts it from) the mean, but does not affect the standard deviation.
Multiplying or dividing the data by a positive number multiplies or divides both the mean and the standard deviation by that number.
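A minimal Python sketch showing the effect of the coding y = (x - 10) / 2 on made-up data. The data and coding constants are assumptions. `math.isclose` is used because the standard deviations are floating-point values.

```python
import math
import statistics

data = [12, 15, 17, 20, 26]             # hypothetical raw data
coded = [(x - 10) / 2 for x in data]    # code with y = (x - 10) / 2

mean_x, sd_x = statistics.mean(data), statistics.pstdev(data)
mean_y, sd_y = statistics.mean(coded), statistics.pstdev(coded)

# Subtracting 10 shifts the mean only; dividing by 2 halves both the mean and the SD
print(math.isclose(mean_y, (mean_x - 10) / 2))  # True
print(math.isclose(sd_y, sd_x / 2))             # True
```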
For a discrete random variable X, what does F(2.4) mean?
F is the cumulative distribution function, so F(2.4) = P(X ≤ 2.4): the probability that X is less than or equal to 2.4
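A minimal Python sketch of the cumulative distribution function for a made-up discrete distribution. The distribution and the function name `cdf` are assumptions. Because this X only takes the values 1, 2 and 3, F(2.4) is the same as F(2).

```python
from fractions import Fraction

# Hypothetical probability distribution for a discrete random variable X
distribution = {1: Fraction(1, 4), 2: Fraction(1, 4), 3: Fraction(1, 2)}

def cdf(x, dist):
    """Cumulative distribution function F(x) = P(X <= x)."""
    return sum((p for value, p in dist.items() if value <= x), Fraction(0))

print(cdf(2.4, distribution))  # 1/2
```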
Name a disadvantage of using a statistical model
A model will never be able to cater for all the eventualities of a real life problem
When are you able to make predictions using a least squares regression line?
Predictions inside the range of data (interpolation) should be accurate, as long as there is a fairly strong linear relationship (correlation) between the 2 variables.
Extrapolation (estimating outside the range of the data collected) should be treated with caution, as the linear relationship may not remain valid.
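A minimal Python sketch of fitting the least squares regression line y = a + bx (with b = Sxy / Sxx and a = ȳ - b x̄) and flagging whether a prediction is interpolation or extrapolation. The data and the function names `least_squares_line` and `predict` are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the regression line y = a + bx, using b = Sxy / Sxx."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def predict(xs, a, b, x):
    """Predict y at x and say whether this is interpolation or extrapolation."""
    if min(xs) <= x <= max(xs):
        kind = "interpolation"
    else:
        kind = "extrapolation (treat with caution)"
    return a + b * x, kind

xs = [1, 2, 3, 4, 5]    # hypothetical explanatory variable
ys = [3, 5, 7, 10, 11]  # hypothetical response variable
a, b = least_squares_line(xs, ys)  # roughly a = 0.9, b = 2.1

print(predict(xs, a, b, 3.5))  # inside the data range
print(predict(xs, a, b, 10))   # outside the data range
```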
How do you standardize a normal distribution?
Subtract the mean and divide by the standard deviation: Z = (X - μ) / σ
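A minimal Python sketch, assuming X ~ N(50, 4²) purely for illustration. It standardizes x = 56 and looks up P(Z < z) with `statistics.NormalDist` (Python 3.8+), then compares the result with the probability computed directly from the original distribution. The function name `standardize` is hypothetical.

```python
from statistics import NormalDist

def standardize(x, mu, sigma):
    """Z = (X - mu) / sigma turns X ~ N(mu, sigma^2) into Z ~ N(0, 1)."""
    return (x - mu) / sigma

# Hypothetical: X ~ N(50, 4^2); find P(X < 56)
z = standardize(56, 50, 4)                     # z = 1.5
p_from_z = NormalDist().cdf(z)                 # P(Z < 1.5) from the standard normal
p_direct = NormalDist(mu=50, sigma=4).cdf(56)  # same probability, without standardizing
print(z, round(p_from_z, 4))                   # 1.5 0.9332
```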
What is the formula for the probability of A given B?
P(A | B) = P(A ∩ B) / P(B)
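A short worked example in Python, assuming one roll of a fair die with A = "roll a 6" and B = "roll an even number". Since A ∩ B = "roll a 6", P(A ∩ B) = 1/6 and P(B) = 3/6. The function name `conditional` is hypothetical.

```python
from fractions import Fraction

def conditional(p_a_and_b, p_b):
    """P(A | B) = P(A ∩ B) / P(B), defined only when P(B) > 0."""
    if p_b == 0:
        raise ValueError("P(B) must be greater than 0")
    return p_a_and_b / p_b

print(conditional(Fraction(1, 6), Fraction(3, 6)))  # 1/3
```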
How do you know if 2 events are mutually exclusive?
P(A ∩ B) = 0, or equivalently P(A ∪ B) = P(A) + P(B)
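A minimal Python sketch that rearranges the addition rule to get P(A ∩ B) = P(A) + P(B) - P(A ∪ B) and then checks whether it is 0. It assumes one roll of a fair die. The function name `mutually_exclusive` is hypothetical.

```python
from fractions import Fraction

def mutually_exclusive(p_a, p_b, p_a_or_b):
    """A and B are mutually exclusive when P(A ∩ B) = 0, i.e. P(A ∪ B) = P(A) + P(B)."""
    p_a_and_b = p_a + p_b - p_a_or_b  # rearranged addition rule
    return p_a_and_b == 0

# A = "roll a 1", B = "roll a 6": they cannot both happen
print(mutually_exclusive(Fraction(1, 6), Fraction(1, 6), Fraction(2, 6)))  # True
# A = "roll an even number", B = "roll a 6": rolling a 6 is in both
print(mutually_exclusive(Fraction(1, 2), Fraction(1, 6), Fraction(1, 2)))  # False
```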
On a scatter graph, which variable should go on the x-axis? And what do you call the variable that goes on the y-axis?
The x-axis is for the explanatory variable (independent variable)
The y-axis is for the response variable (or dependent variable)