Topic 4: Statistics and Probability of Statistics Flashcards

Question 1

Q

What are the mean, median and mode (3)

Answer

A

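-The mean is the sum of all the values divided by the number of values
-The median is the middle value when the values are sorted (the average of the two middle values if n is even)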
-The mode is the value which occurs most frequently
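As a quick check of the three definitions, here is a minimal sketch using Python's standard `statistics` module. The sample values are made up for illustration.

```python
import statistics

data = [2, 3, 3, 5, 7]  # hypothetical sample, chosen only for illustration

mean_value = statistics.mean(data)      # sum of the values / number of values
median_value = statistics.median(data)  # middle value of the sorted data
mode_value = statistics.mode(data)      # value that occurs most frequently
```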

Question 2

Q

What is the variance (2)

Answer

A

-s² = ∑(x_i - x̄)²/(n-1)
-n-1 is the degrees of freedom: the number of observations x_i you can freely choose once the sample mean is fixed
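The formula can be worked through by hand and compared with `statistics.variance`, which also divides by n-1. The data below are an assumed toy sample, not taken from the flashcards.

```python
import statistics

data = [2, 4, 4, 6, 9]  # hypothetical sample
n = len(data)
x_bar = sum(data) / n   # sample mean: 25 / 5 = 5.0

# Sum of squared deviations from the mean, divided by n - 1
s2 = sum((x - x_bar) ** 2 for x in data) / (n - 1)

# The library function uses the same n - 1 divisor
s2_library = statistics.variance(data)
```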

Question 3

Q

What is an example of degrees of freedom ()

Answer

A

-Say n = 5, and we want a mean of x̄ = 16
-We can freely choose x₁ = 5, x₂ = 37, x₃ = 4, x₄ = 6
-However, to achieve a mean of 16 the values must sum to 5 × 16 = 80, so x₅ has to be 80 - 52 = 28
-Hence, we have 4 degrees of freedom, as we can freely choose only the first 4 values
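The same arithmetic as a short sketch: once the mean and the first four values are fixed, the fifth value is forced. The variable names are illustrative only.

```python
n = 5
target_mean = 16
free_choices = [5, 37, 4, 6]  # the four values we may pick freely

# The total must equal n * mean, so the last value is determined
x5 = n * target_mean - sum(free_choices)
sample = free_choices + [x5]
```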

Question 4

Q

What are some measures of spread (4)

Answer

A

-The standard deviation is s = √(s²), the square root of the variance
-The range is the difference between the largest and smallest values
-The IQR is the difference between the 75th and 25th percentiles
-The mean absolute deviation = ∑|x_i - x̄|/n (not used much, as the absolute value function is not differentiable at zero)
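A minimal sketch of the four measures on an assumed toy sample. Note that percentile conventions differ between textbooks and libraries. Here the quartiles come from `statistics.quantiles` with its default method, which may not match the convention used in your course.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7]  # hypothetical sample
n = len(data)
x_bar = statistics.mean(data)

std_dev = statistics.stdev(data)             # square root of the n-1 variance
data_range = max(data) - min(data)           # largest minus smallest value
q1, _, q3 = statistics.quantiles(data, n=4)  # quartile cut points (default method)
iqr = q3 - q1                                # 75th minus 25th percentile
mad = sum(abs(x - x_bar) for x in data) / n  # mean absolute deviation
```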

Question 5

Q

What is skewness + different values (2,3)

Answer

A

-Skewness gives a numerical measure of how asymmetric a distribution is
-Skewness = ∑(x_i - x̄)³/(n s³)

-A value of 0 means there is a symmetric distribution
-A positive value means the distribution is positively skewed (a longer tail to the right)
-A negative value means the distribution is negatively skewed (a longer tail to the left)
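The sign behaviour can be checked with a small sketch. It assumes the third-moment formula ∑(x_i - x̄)³/(n s³), with s taken as the n-1 sample standard deviation. Other texts use slightly different divisors. The three samples are made up.

```python
import statistics

def skewness(data):
    """Third-moment skewness: sum of cubed deviations / (n * s^3)."""
    n = len(data)
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)  # assumption: s uses the n-1 divisor
    return sum((x - x_bar) ** 3 for x in data) / (n * s ** 3)

symmetric = [1, 2, 3, 4, 5]               # mirror image around 3
right_tailed = [1, 1, 2, 2, 10]           # one large value stretches the right tail
left_tailed = [-x for x in right_tailed]  # mirrored, so the tail is on the left

skew_sym = skewness(symmetric)
skew_right = skewness(right_tailed)
skew_left = skewness(left_tailed)
```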

Question 6

Q

What is kurtosis + different values (2,3)

Answer

A

-Kurtosis gives a measure of the proportion of observations which lie in the tails of the distribution
-Kurtosis = ∑(x_i - x̄)⁴/(n s⁴)

-A value of 3 = the tails are like those of a normal distribution
-A value < 3 = the distribution is platykurtic (flatter, with lighter tails)
-A value > 3 = the distribution is leptokurtic (more peaked, with heavier tails)
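A rough simulation sketch of the benchmark value of 3. It assumes the fourth-moment formula ∑(x_i - x̄)⁴/(n s⁴), and uses simulated normal and uniform samples. The seed and sample size are arbitrary choices.

```python
import random
import statistics

def kurtosis(data):
    """Fourth-moment kurtosis: sum of deviations^4 / (n * s^4)."""
    n = len(data)
    x_bar = sum(data) / n
    s = statistics.stdev(data)  # assumption: s uses the n-1 divisor
    return sum((x - x_bar) ** 4 for x in data) / (n * s ** 4)

rng = random.Random(4)  # fixed seed so the run is repeatable
normal_sample = [rng.gauss(0.0, 1.0) for _ in range(20000)]
uniform_sample = [rng.uniform(-1.0, 1.0) for _ in range(20000)]

k_normal = kurtosis(normal_sample)    # should be close to 3
k_uniform = kurtosis(uniform_sample)  # flat with light tails, so below 3
```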

Question 7

Q

What is the formula for sample covariance ()

Answer

A

-s_XY = ∑((x_i - x̄)(y_i - ȳ))/(n-1) = (∑x_i y_i - n x̄ ȳ)/(n-1)
-The degrees of freedom is n-1, as the sum can be written using only one of the two means: ∑(x_i - x̄)(y_i - ȳ) = ∑(x_i - x̄)y_i
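Both forms of the formula can be checked against each other on a small assumed data set:

```python
x = [1, 2, 3, 4, 5]  # hypothetical paired observations
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Definition: sum of products of deviations, divided by n - 1
cov_definition = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)

# Shortcut: (sum of x*y minus n * x_bar * y_bar), divided by n - 1
cov_shortcut = (sum(a * b for a, b in zip(x, y)) - n * x_bar * y_bar) / (n - 1)
```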

Question 8

Q

What is Cov(aX, Y) equal to, and what does this mean scale wise (2)

Answer

A

-Cov(aX, Y) = a·Cov(X, Y)
-Covariance is hence not scale free: its value changes with the units in which X and Y are measured

Question 9

Q

What is the formula for correlation, and what does this mean scale wise (2)

Answer

A

-r_XY = Cov(X, Y)/(√V(X)·√V(Y)) = s_XY/(s_X·s_Y)
-This is scale free (scale invariant): multiplying X or Y by a positive constant leaves it unchanged
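A small sketch contrasting the two behaviours: rescaling x multiplies the covariance by the same factor but leaves the correlation unchanged. The data and the factor of 100 (think metres to centimetres) are assumptions for illustration.

```python
import math

def cov(x, y):
    """Sample covariance with the n-1 divisor."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)

def corr(x, y):
    """Sample correlation: covariance over the product of standard deviations."""
    return cov(x, y) / math.sqrt(cov(x, x) * cov(y, y))

x = [1, 2, 3, 4, 5]  # hypothetical paired observations
y = [2, 4, 5, 4, 5]
scale = 100          # e.g. converting x from metres to centimetres
x_scaled = [scale * v for v in x]

cov_ratio = cov(x_scaled, y) / cov(x, y)         # equals the scale factor
corr_change = corr(x_scaled, y) - corr(x, y)     # equals zero up to rounding
```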

Question 10

Q

What are some properties of a correlation coefficient (5)

Answer

A

-The coefficient is bounded: -1 ≤ r_XY ≤ 1
-r_XY = -1 means there is a perfect negative linear association
-r_XY = 1 means there is a perfect positive linear association
-r_XY = 0 means there is no linear association
-As |r_XY| increases towards 1, the linear association gets stronger

Question 11

Q

What is the difference between an estimator and estimate (2)

Answer

A

-An estimator (θ̂) of an unknown population parameter (θ) is a function of the sample data, and hence a random variable
-An estimate is a particular realisation of the estimator: the actual value computed from a specific sample of data points

Question 12

Q

What are the conditions for an estimator to be unbiased (3)

Answer

A

-An estimator (θ̂) is said to be unbiased if E(θ̂) = θ
-That is, the mean of the sampling distribution of the estimator is centred on the unknown parameter
-For example, x̄ and s² (with the n-1 divisor) are unbiased estimators of μ and σ²
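Unbiasedness can be illustrated by simulation: average the estimator over many samples and compare with the true parameter. The sketch below assumes a normal population with μ = 10 and σ = 2, samples of size 5, and an arbitrary seed. It also tracks the variance with divisor n, which comes out too small on average.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable
mu, sigma, n, reps = 10.0, 2.0, 5, 20000  # assumed population and design

means, var_n_minus_1, var_n = [], [], []
for _ in range(reps):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    x_bar = sum(sample) / n
    ss = sum((x - x_bar) ** 2 for x in sample)  # sum of squared deviations
    means.append(x_bar)
    var_n_minus_1.append(ss / (n - 1))  # unbiased for sigma^2
    var_n.append(ss / n)                # biased downwards

avg_mean = sum(means) / reps          # close to mu = 10
avg_s2 = sum(var_n_minus_1) / reps    # close to sigma^2 = 4
avg_biased = sum(var_n) / reps        # close to (n-1)/n * sigma^2 = 3.2
```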

Question 13

Q

What happens when you repeatedly take the sample mean of a sample (2)

Answer

A

-If you repeatedly draw samples and compute the sample mean each time, the sample means should average out to the population mean
-If E(x_i) = μ, then E(x̄) = E(∑x_i/n) = (1/n)E(∑x_i) = (1/n)(E(x₁) + E(x₂) + … + E(x_n)) = (1/n)(nμ) = μ

Question 14

Q

How do we say that one estimator is more efficient than the other (3)

Answer

A

-We say that θ̂₁ is more efficient than θ̂₂ if V(θ̂₁) < V(θ̂₂)
-One possible measure of this is relative efficiency, constructed as V(θ̂₁)/V(θ̂₂)
-In general, if we are choosing between two unbiased estimators, we choose the estimator with the smaller variance
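As an illustration, the sample mean and the sample median both estimate the centre of a symmetric distribution, but for normal data the mean has the smaller variance. This is a simulation sketch under assumed settings (standard normal data, n = 25, arbitrary seed), not a general result for every distribution.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable
n, reps = 25, 5000      # assumed sample size and number of repetitions

means, medians = [], []
for _ in range(reps):
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
    means.append(sum(sample) / n)
    medians.append(statistics.median(sample))

var_mean = statistics.pvariance(means)      # variance of estimator 1
var_median = statistics.pvariance(medians)  # variance of estimator 2
relative_efficiency = var_mean / var_median # below 1: the mean is more efficient
```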

Question 15

Q

What is the tradeoff between unbiased and efficient statistics (2,1)

Answer

A

-Unbiased = correct on average (E(θ̂) = θ)
-Efficient = small variance (e.g. V(x̄) = σ²/n for the sample mean)

-We usually pick the most efficient of the unbiased estimators

Question 16

Q

What is a maximum likelihood estimation + toss a coin example (1,3)

Answer

A

-Maximum likelihood estimation: given the data you observed and a model with unknown parameters, choose the parameter values which maximise the probability of observing that data

-Suppose the probabilities of heads and tails are p and 1-p, and you toss a coin 50 times and get 20 heads and 30 tails
-The joint probability of this happening is ⁵⁰C₂₀p²⁰(1-p)³⁰
-p̂ = 20/50 = 0.4 is the maximum likelihood estimate, as it is the value of p which maximises this probability
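The coin example can be checked numerically with a simple grid search over p. This sketch maximises the log of the likelihood, which has the same maximiser. The grid resolution of 0.001 is an arbitrary choice.

```python
import math

heads, tails = 20, 30

def log_likelihood(p):
    """Log of C(50, 20) * p^20 * (1 - p)^30."""
    return (math.log(math.comb(heads + tails, heads))
            + heads * math.log(p)
            + tails * math.log(1 - p))

# Candidate values 0.001, 0.002, ..., 0.999 (endpoints excluded to avoid log(0))
grid = [i / 1000 for i in range(1, 1000)]
p_hat = max(grid, key=log_likelihood)
```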

Question 17

Q

What is consistency (3)

Answer

A

-Suppose θ̂_n is an estimator of θ based on a sample X₁, …, X_n
-Then θ̂_n is a consistent estimator of θ if, for every ε > 0, P(|θ̂_n - θ| > ε) → 0 as n → ∞
-That is, the probability that the estimator differs from the parameter by more than any chosen margin ε goes to zero as n gets bigger
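A simulation sketch of the definition, using the sample mean of Uniform(0, 1) draws as the estimator of μ = 0.5. The margin ε = 0.05, the sample sizes and the seed are all assumptions. The estimated probability of missing by more than ε should shrink as n grows.

```python
import random

rng = random.Random(2)  # fixed seed so the run is repeatable
mu, eps, reps = 0.5, 0.05, 1000  # true mean, margin, repetitions per sample size

def miss_probability(n):
    """Estimate P(|x_bar_n - mu| > eps) from repeated samples of size n."""
    misses = 0
    for _ in range(reps):
        x_bar = sum(rng.random() for _ in range(n)) / n
        if abs(x_bar - mu) > eps:
            misses += 1
    return misses / reps

probs = {n: miss_probability(n) for n in (10, 100, 1000)}
```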

Question 18

Q

What is the central limit theorem (1)

Answer

A

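-The central limit theorem says that the sum (or mean) of a large number of independent, identically distributed observations with finite variance is approximately normally distributed, whatever the distribution of the individual observations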
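A simulation sketch: sums of 30 Uniform(0, 1) draws are standardised and compared with the normal benchmark that about 95% of values lie within ±1.96. The choice of distribution, n = 30 and the seed are assumptions for illustration.

```python
import math
import random

rng = random.Random(3)  # fixed seed so the run is repeatable
n, reps = 30, 10000     # terms per sum, number of sums

mean_sum = n * 0.5          # a Uniform(0, 1) draw has mean 1/2
sd_sum = math.sqrt(n / 12)  # and variance 1/12, so the sum has variance n/12

# Standardise each sum: subtract its mean and divide by its standard deviation
z_scores = [
    (sum(rng.random() for _ in range(n)) - mean_sum) / sd_sum
    for _ in range(reps)
]

# Share of standardised sums within +/- 1.96 (about 0.95 for a normal)
share_within = sum(abs(z) < 1.96 for z in z_scores) / reps
```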

Question 19

Q