Topic 4: Statistics and Probability of Statistics Flashcards

1
Q

What are the mean, median and mode? (3)

A

-The mean is the average of all the values
-The median is the middle value when the values are sorted
-The mode is the value which occurs most frequently
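
A minimal sketch of the three measures, using Python's standard `statistics` module. The sample `values` is made up for illustration.

```python
import statistics

# Hypothetical sample, chosen only for illustration.
values = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(values)      # sum of the values divided by their count
median = statistics.median(values)  # middle of the sorted values (average of the two middle ones here)
mode = statistics.mode(values)      # most frequently occurring value

print(mean, median, mode)
```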

2
Q

What is the variance (2)

A

-$s^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 / (n-1)$
-$n-1$ is the degrees of freedom: the number of observations $x_i$ you can freely choose once the mean $\bar{x}$ is fixed
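
As a rough check of the formula, the sketch below computes the sample variance by hand with the n-1 divisor and compares it with `statistics.variance`, which uses the same divisor. The data `x` are made up.

```python
import statistics

# Hypothetical sample, chosen only for illustration.
x = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(x)
x_bar = sum(x) / n

# Sum of squared deviations from the mean, divided by n - 1.
s2 = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)

print(s2, statistics.variance(x))
```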

3
Q

What is an example of degrees of freedom ()

A

-Say $n = 5$, and we want a mean of $\bar{x} = 16$
-We can freely choose $x_1 = 5$, $x_2 = 37$, $x_3 = 4$, $x_4 = 6$
-However, to achieve our mean of 16, $x_5$ must equal $5 \times 16 - (5 + 37 + 4 + 6) = 28$
-Hence, we have 4 degrees of freedom, as we can freely choose only the first 4 values
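
The arithmetic in this example can be sketched in a few lines: once the mean and the first four values are fixed, the fifth value is forced. The variable names are illustrative.

```python
n = 5
target_mean = 16

# x1..x4 can be chosen freely.
free_choices = [5, 37, 4, 6]

# The last value is pinned down by the constraint sum(x) = n * mean.
x5 = n * target_mean - sum(free_choices)

sample = free_choices + [x5]
print(x5, sum(sample) / n)
```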

4
Q

What are some measures of spread (4)

A

-The standard deviation is $s = \sqrt{s^2}$
-The range is the difference between the largest and smallest values
-The IQR is the difference between the 75th and 25th percentile points
-The mean absolute deviation is $\sum |x_i - \bar{x}| / n$ (rarely used, because the absolute value function is not differentiable at zero)
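
A small sketch of the four measures on made-up data. Quartile conventions differ between textbooks and software, so the IQR from `statistics.quantiles` (default "exclusive" method) may not match a hand calculation exactly.

```python
import statistics

# Hypothetical sample, chosen only for illustration.
x = [2, 4, 4, 4, 5, 5, 7, 9]

std_dev = statistics.stdev(x)            # square root of s^2 (n - 1 divisor)
data_range = max(x) - min(x)             # largest minus smallest value

q1, q2, q3 = statistics.quantiles(x, n=4)  # 25th, 50th, 75th percentile points
iqr = q3 - q1

x_bar = statistics.fmean(x)
mean_abs_dev = sum(abs(xi - x_bar) for xi in x) / len(x)

print(std_dev, data_range, iqr, mean_abs_dev)
```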

5
Q

What is skewness + different values (2,3)

A

-Skewness gives a numerical measure of how asymmetric a distribution is
-$\sum (x_i - \bar{x})^3 / (n s^3)$

-A value of 0 indicates a symmetric distribution
-A positive value means the distribution is positively skewed (longer right tail)
-A negative value means the distribution is negatively skewed (longer left tail)
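
The sign rules can be illustrated with a short sketch. The helper `skewness` is a hypothetical name, and it assumes $s$ is the sample standard deviation with the n-1 divisor; the three data sets are made up.

```python
def skewness(x):
    """Sum of cubed deviations divided by n * s^3 (s uses the n - 1 divisor)."""
    n = len(x)
    x_bar = sum(x) / n
    s = (sum((xi - x_bar) ** 2 for xi in x) / (n - 1)) ** 0.5
    return sum((xi - x_bar) ** 3 for xi in x) / (n * s ** 3)

symmetric = [1, 2, 3, 4, 5]
right_tail = [1, 1, 1, 2, 10]              # one large value pulls the tail right
left_tail = [-v for v in right_tail]       # mirror image: tail on the left

print(skewness(symmetric), skewness(right_tail), skewness(left_tail))
```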

6
Q

What is kurtosis + different values (2,3)

A

-Kurtosis gives a measure of the proportion of observations which lie in the tails of the distribution
-$\sum (x_i - \bar{x})^4 / (n s^4)$

-A value of 3 means the tails are like those of a normal distribution
-A value < 3 means the distribution is platykurtic (flatter, with thinner tails)
-A value > 3 means the distribution is leptokurtic (more peaked, with fatter tails)
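
A sketch comparing a flat data set with a peaked, heavy-tailed one. The helper `kurtosis` is a hypothetical name and assumes $s$ uses the n-1 divisor; both data sets are made up.

```python
def kurtosis(x):
    """Sum of fourth-power deviations divided by n * s^4 (s uses the n - 1 divisor)."""
    n = len(x)
    x_bar = sum(x) / n
    s2 = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
    return sum((xi - x_bar) ** 4 for xi in x) / (n * s2 ** 2)

flat = list(range(1, 102))         # evenly spread values: no heavy tails
peaked = [0] * 50 + [-10, 10]      # mass at the centre plus two extreme values

print(kurtosis(flat), kurtosis(peaked))
```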

7
Q

What is the formula for sample covariance ()

A

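-$s_{XY} = \sum (x_i - \bar{x})(y_i - \bar{y}) / (n-1) = (\sum x_i y_i - n \bar{x} \bar{y}) / (n-1)$
-The degrees of freedom are $n-1$, as only one of the two means is needed: $\sum (x_i - \bar{x})(y_i - \bar{y}) = \sum (x_i - \bar{x}) y_i$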
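
The two forms of the formula can be checked against each other numerically. This is a sketch on made-up data; the variable names are illustrative.

```python
# Hypothetical paired sample, chosen only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Definition: sum of products of deviations, divided by n - 1.
cov_definition = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

# Shortcut: (sum of x*y minus n * x_bar * y_bar), divided by n - 1.
cov_shortcut = (sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar) / (n - 1)

print(cov_definition, cov_shortcut)
```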

8
Q

What is Cov(aX, Y) equal to, and what does this mean scale wise (2)

A

-$\mathrm{Cov}(aX, Y) = a\,\mathrm{Cov}(X, Y)$
-Covariance is hence not scale free: its value changes with the units in which $X$ and $Y$ are measured
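
A sketch of the scaling rule with made-up data: converting heights from metres to centimetres (a = 100) multiplies the sample covariance by 100. `sample_cov` is a hypothetical helper.

```python
def sample_cov(x, y):
    """Sample covariance with the n - 1 divisor."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

# Hypothetical data, chosen only for illustration.
heights_m = [1.60, 1.70, 1.75, 1.80, 1.90]
weights_kg = [55, 65, 70, 80, 90]

a = 100  # metres to centimetres
heights_cm = [a * h for h in heights_m]

cov_m = sample_cov(heights_m, weights_kg)
cov_cm = sample_cov(heights_cm, weights_kg)

print(cov_m, cov_cm)
```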

9
Q

What is the formula for correlation, and what does this mean scale wise (2)

A

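-$r_{XY} = \mathrm{Cov}(X, Y) / (\sqrt{V(X)} \sqrt{V(Y)}) = s_{XY} / (s_X s_Y)$
-This is scale free, and hence scale invariant: rescaling $X$ or $Y$ by a positive constant leaves $r_{XY}$ unchanged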
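
To illustrate scale invariance, the sketch below computes the correlation of made-up height and weight data twice, with heights in metres and in centimetres. `sample_corr` is a hypothetical helper.

```python
import math

def sample_corr(x, y):
    """Sample correlation: covariance divided by the product of standard deviations."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    var_x = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
    var_y = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)
    return s_xy / math.sqrt(var_x * var_y)

# Hypothetical data, chosen only for illustration.
heights_m = [1.60, 1.70, 1.75, 1.80, 1.90]
weights_kg = [55, 65, 70, 80, 90]
heights_cm = [100 * h for h in heights_m]

r_m = sample_corr(heights_m, weights_kg)
r_cm = sample_corr(heights_cm, weights_kg)

print(r_m, r_cm)  # the two values agree up to rounding error
```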

10
Q

What are some properties of a correlation coefficient (5)

A

-$-1 \le r_{XY} \le 1$
-$r_{XY} = -1$ means there is a perfect negative linear association
-$r_{XY} = 1$ means there is a perfect positive linear association
-$r_{XY} = 0$ means there is no linear association
-As $|r_{XY}|$ increases, the linear association becomes stronger

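
These properties can be sketched with three made-up relationships: an increasing line, a decreasing line, and a parabola. The last case shows that a correlation of 0 rules out only a linear association, since y is fully determined by x. `sample_corr` is a hypothetical helper.

```python
import math

def sample_corr(x, y):
    """Sample correlation: covariance divided by the product of standard deviations."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    var_x = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
    var_y = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)
    return s_xy / math.sqrt(var_x * var_y)

x = [-2, -1, 0, 1, 2]

r_up = sample_corr(x, [2 * xi + 1 for xi in x])   # perfect positive line
r_down = sample_corr(x, [-3 * xi for xi in x])    # perfect negative line
r_curve = sample_corr(x, [xi ** 2 for xi in x])   # symmetric curve, no linear trend

print(r_up, r_down, r_curve)
```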
11
Q

What is the difference between an estimator and estimate (2)

A

-An estimator ($\hat{\theta}$) of an unknown population parameter ($\theta$) is a function of the sample data, and is therefore a random variable
-An estimate is a particular realisation of the estimator: the actual value computed from a specific sample of data points

12
Q

What are the conditions for an estimator to be unbiased (3)

A

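-An estimator ($\hat{\theta}$) is said to be unbiased if $E(\hat{\theta}) = \theta$
-That is, the mean of the sampling distribution of the estimator is centred on the unknown parameter
-For example, the sample mean $\bar{x}$ and the sample variance $s_x^2$ (with the $n-1$ divisor) are unbiased estimators of $\mu$ and $\sigma^2$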
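
Unbiasedness can be checked exactly on a toy case by enumerating every equally likely sample. The sketch below assumes i.i.d. draws from a fair six-sided die (a made-up population) and uses exact fractions, so no simulation error is involved. It also shows that dividing by n instead of n-1 gives a biased variance estimator.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4, 5, 6]   # hypothetical: outcomes of a fair die
n = 3                             # sample size

def mean_of(xs):
    return Fraction(sum(xs), len(xs))

def var_of(xs, divisor):
    m = mean_of(xs)
    return sum((x - m) ** 2 for x in xs) / divisor

mu = mean_of(population)                        # population mean
sigma2 = var_of(population, len(population))    # population variance

# Every ordered sample of size n is equally likely under i.i.d. draws.
samples = list(product(population, repeat=n))

expected_mean = sum(mean_of(s) for s in samples) / len(samples)
expected_s2 = sum(var_of(s, n - 1) for s in samples) / len(samples)
expected_biased = sum(var_of(s, n) for s in samples) / len(samples)

print(expected_mean, mu)        # equal: the sample mean is unbiased
print(expected_s2, sigma2)      # equal: s^2 with n - 1 is unbiased
print(expected_biased)          # smaller than sigma2: the n divisor is biased
```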

13
Q

What happens when you repeatedly take the sample mean of a sample (2)

A

-If you repeatedly draw samples and compute the sample mean of each, the average of those sample means should equal the population mean
-If $E(X_i) = \mu$, then $E(\bar{X}) = E(\sum X_i / n) = (1/n) E(\sum X_i) = (1/n)(E(X_1) + E(X_2) + \dots + E(X_n)) = (1/n)(n\mu) = \mu$
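
A simulation sketch of this idea, assuming a normal population with mean 10 and standard deviation 3 (made-up values) and a fixed random seed so the run is repeatable.

```python
import random

rng = random.Random(42)   # fixed seed for repeatability

mu, sigma = 10.0, 3.0     # hypothetical population mean and standard deviation
n, reps = 25, 2000        # sample size and number of repeated samples

sample_means = []
for _ in range(reps):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    sample_means.append(sum(sample) / n)

# Individual sample means vary, but their average sits close to mu.
average_of_means = sum(sample_means) / reps
print(average_of_means)
```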

14
Q

How do we say that one estimator is more efficient than the other (3)

A

-We say that $\hat{\theta}_1$ is more efficient than $\hat{\theta}_2$ if $V(\hat{\theta}_1) < V(\hat{\theta}_2)$
-One possible measure of this is relative efficiency, constructed as $V(\hat{\theta}_1) / V(\hat{\theta}_2)$
-In general, if we are choosing between two unbiased estimators, then we choose the estimator with the smaller variance
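
As an illustration, the sketch below compares two unbiased estimators of the mean of a fair die (a made-up population): the sample mean of n = 4 draws, and the first draw alone. Enumerating all samples with exact fractions gives their variances and the relative efficiency.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4, 5, 6]   # hypothetical: outcomes of a fair die
n = 4
samples = list(product(population, repeat=n))   # all equally likely i.i.d. samples

def sampling_mean_and_variance(estimator):
    """Exact mean and variance of an estimator over all samples."""
    estimates = [Fraction(estimator(s)) for s in samples]
    m = sum(estimates) / len(estimates)
    v = sum((e - m) ** 2 for e in estimates) / len(estimates)
    return m, v

def sample_mean(s):
    return Fraction(sum(s), len(s))

def first_observation(s):
    return s[0]

mean_1, var_1 = sampling_mean_and_variance(sample_mean)
mean_2, var_2 = sampling_mean_and_variance(first_observation)

relative_efficiency = var_1 / var_2
print(mean_1, mean_2)                   # both equal the population mean 7/2
print(var_1, var_2, relative_efficiency)
```

Both estimators are unbiased, but the sample mean has the smaller variance, so it is the more efficient of the two.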

15
Q

What is the tradeoff between unbiased and efficient statistics (2,1)

A

-Unbiased = correct on average ($E(\hat{\theta}) = \theta$)
-Efficient = small variance (e.g. $V(\bar{X}) = \sigma^2/n$ for the sample mean)

-In general, we pick the most efficient of the unbiased estimators

16
Q

What is a maximum likelihood estimation + toss a coin example (1,3)

A

-Maximum likelihood estimation is when you observe some data, have a model with unknown parameters, and choose the parameter values which maximise the probability of observing what you saw

-Suppose the probabilities of heads and tails are $p$ and $1-p$, and you toss a coin 50 times and get 20 heads and 30 tails
-The joint probability of this happening is $\binom{50}{20} p^{20} (1-p)^{30}$
-Maximising this over $p$ gives $p = 20/50 = 0.4$, which is the maximum likelihood estimate
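
The coin example can be checked numerically with a simple grid search over candidate values of p. This is a sketch only; the grid resolution of 0.001 is an arbitrary choice.

```python
import math

heads, tosses = 20, 50

def likelihood(p):
    """Binomial probability of seeing `heads` heads in `tosses` tosses."""
    return math.comb(tosses, heads) * p ** heads * (1 - p) ** (tosses - heads)

# Candidate values 0.001, 0.002, ..., 0.999.
grid = [i / 1000 for i in range(1, 1000)]

# The maximum likelihood estimate is the candidate with the highest likelihood.
p_hat = max(grid, key=likelihood)
print(p_hat, heads / tosses)
```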

17
Q

What is consistency (3)

A

-Suppose $\hat{\theta}_n$ is an estimator of $\theta$ for a sample $X_1, \dots, X_n$
-Then $\hat{\theta}_n$ is a consistent estimator of $\theta$ if, for every $\varepsilon > 0$, $P(|\hat{\theta}_n - \theta| > \varepsilon) \to 0$ as $n \to \infty$
-That is, the probability that the estimator differs from the parameter by more than any fixed tolerance $\varepsilon$ goes to zero as $n$ gets bigger
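
A simulation sketch of consistency for the sample mean, assuming Uniform(0, 1) data (true mean 0.5), a tolerance of 0.05, and a fixed seed. All of these are illustrative choices. The estimated probability of missing the true mean by more than the tolerance shrinks as n grows.

```python
import random

rng = random.Random(0)    # fixed seed for repeatability

mu = 0.5                  # true mean of Uniform(0, 1)
eps = 0.05                # tolerance
reps = 2000               # simulated samples per sample size

def miss_probability(n):
    """Estimate P(|sample mean - mu| > eps) for samples of size n."""
    misses = 0
    for _ in range(reps):
        x_bar = sum(rng.random() for _ in range(n)) / n
        if abs(x_bar - mu) > eps:
            misses += 1
    return misses / reps

probs = {n: miss_probability(n) for n in (10, 100, 1000)}
print(probs)
```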

18
Q

What is the central limit theorem (1)

A

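-The central limit theorem states that the sum (or mean) of a large number of independent, identically distributed random variables with finite variance is approximately normally distributed, whatever the distribution of the individual variables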
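
A simulation sketch of the theorem, assuming sums of 30 draws from an Exponential(1) distribution (mean 1, variance 1, strongly skewed) and a fixed seed. If the sums are roughly normal, about 68% of the standardised sums should fall within 1 standard deviation of the mean and about 95% within 2.

```python
import random

rng = random.Random(0)    # fixed seed for repeatability

n, reps = 30, 5000        # terms per sum, number of simulated sums
mean_1, var_1 = 1.0, 1.0  # mean and variance of a single Exponential(1) draw

sums = [sum(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

# Standardise: subtract the mean of the sum, divide by its standard deviation.
z = [(s - n * mean_1) / (n * var_1) ** 0.5 for s in sums]

within_1 = sum(abs(v) < 1 for v in z) / reps
within_2 = sum(abs(v) < 2 for v in z) / reps
print(within_1, within_2)
```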