Statistical Inference: Section 1&2 Flashcards

1
Q

Define a random sample.(2)

A

All members of the population have the same chance of being included in the sample.
All combinations of, say, n members have the same chance of being included in the sample.

2
Q

What does i.i.d stand for?(1)

A

Independent and identically distributed.

3
Q

Name common continuous distributions.(4)

A

Normal distribution
Standard normal distribution (special type of normal where mean=0,var=1)
Exponential
Uniform distribution.

4
Q

Name common discrete distributions. Difference between these?(3)

A

Binomial

Poisson (the main difference from the binomial is that there is no upper limit on the count, i.e. no fixed “out of n” as in the binomial)

5
Q

What is a Bernoulli trial?(1)

A

An experiment with only 2 possible outcomes (success or failure). The number of successes has a binomial distribution if the number of trials n is fixed, the success probability p is constant and the trials are independent.

6
Q

How would you generate a random sample of 10 from N(100, 15^2) in R? (1)
*Note: N = normally distributed.
The mean?
Variance? (3)

A

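1) normal_sample_1 = rnorm(10, 100, 15)
generates the sample and stores it as “normal_sample_1”. Note that the third argument of rnorm is the standard deviation (15), not the variance (15^2).
2) mean_1 = mean(normal_sample_1) computes the sample mean, storing it as “mean_1”.
3) var_1 = var(normal_sample_1) computes the sample variance, storing it as “var_1”.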

7
Q

What is simulation?(1)

A

Drawing repeated samples from a known population to check how well estimators perform (e.g. whether the mean of the estimates is close to the true value), so that when it comes to unknown populations we have greater confidence in these estimates.
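
A minimal sketch of this idea in R, assuming a known N(100, 15^2) population. The seed, the number of repetitions and the variable names are illustrative choices, not part of the course material.

```r
# Simulation sketch: repeatedly sample from a KNOWN population and
# see how the sample mean behaves as an estimator of the true mean.
set.seed(1)                      # illustrative seed, for reproducibility only
true_mean <- 100
true_sd   <- 15
n_reps    <- 2000                # number of simulated samples (assumption)

# Each replicate draws 10 observations and records their sample mean
sample_means <- replicate(n_reps, mean(rnorm(10, true_mean, true_sd)))

mean(sample_means)               # should be close to the true mean, 100
var(sample_means)                # should be close to 15^2 / 10 = 22.5
```

Because the population is known, the simulated estimates can be compared directly with the true values.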

8
Q

How would you generate a poisson sample of 20 from Po(4) on R?(1)
The mean?
Variance?(3)

A

poisson_sample = rpois(20, 4)
poisson_mean = mean(poisson_sample)
poisson_var = var(poisson_sample)

9
Q

Binomial distribution where the number of trials is equal to 100, and the success probability is equal to 0.5. Sample mean? (2)
Note this is equivalent to tossing a fair coin 100 times and counting the number of heads.

A

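> binomial_sample = rbinom(1, 100, 0.5)
> binomial_sample
[1] 46

This is a single draw, so the sample mean is just this value (46 here; the value will vary from run to run).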

10
Q

What is exploratory data analysis?(1)

A

Having a look at the data (numerical summaries and plots) before modelling it, in order to assess how well a particular probability distribution might work as a model for the data.

11
Q

What is a 5 number summary? How do you find this in R? What about if you wanted the mean too? (3)

A

Min, LQ, Median, UQ, Max
quantile(x) gives five number summary
summary(x) gives this and the mean.
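
A short illustration of the two functions, using made-up data (the vector x is purely hypothetical):

```r
x <- c(2, 4, 4, 5, 7, 9, 10)     # made-up data for illustration

quantile(x)   # 0%, 25%, 50%, 75%, 100%: min, LQ, median, UQ, max
summary(x)    # the same five numbers plus the mean
```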

12
Q

Define the sample mean.(1)

A

For a sample consisting of n observations x1, x2, …, xn, the sample mean \(\bar{x}\) is defined as the arithmetic mean of the observations, i.e. \(\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i\).

13
Q

What does W denote?(1)

A

W denotes the parameter space: the parameters θj, j = 1, …, p belong to a set of values W, and so X is a member of a family of distributions.

14
Q

Define sample variance.(1)

A

For a sample consisting of n observations x1, x2, …, xn, the sample variance s^2 is defined as \(s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2\).
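
As a quick check of the definition, the sketch below compares the formula with R's built-in var() on made-up data (the vector x is hypothetical):

```r
x <- c(2, 4, 4, 5, 7, 9, 10)     # made-up data for illustration
n <- length(x)

s2_manual <- sum((x - mean(x))^2) / (n - 1)   # formula from the definition
s2_var    <- var(x)                           # R's built-in sample variance

all.equal(s2_manual, s2_var)     # TRUE: var() uses the n - 1 divisor
```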

15
Q

Difference between estimator and estimate?(1)

A

An estimator is the process (a function of the random sample) that we can repeat on different samples; the value it gives for any particular sample is an estimate.

For example, the sample mean is an estimator of the population mean, and the number calculated from one observed sample is an estimate.

16
Q

What is a sample statistic?What is special about these?(2)

A

Any particular function defined on the random sample.

Note that a sample statistic is a function of random variables, and so is itself a random variable with its own distribution, an expectation and a variance.

17
Q

Important properties of estimators.(3)

A

1) Bias, defined as E[theta-hat] - theta, where theta-hat is the estimator. The smaller the bias the better; zero bias means the estimator is unbiased.
2) Variance: the smaller the better. An estimator whose variance achieves the theoretical lower limit is known as “efficient”.
3) Mean square error (MSE), E[(theta-hat - theta)^2]: the BEST MEASURE, as it combines both bias and variance. When it is not possible to minimise both bias and variance simultaneously, minimising the MSE is a good compromise.
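
How the MSE combines the other two properties can be written out as a short derivation. This is the standard decomposition, with \(\hat{\theta}\) standing for theta-hat:

```latex
\begin{aligned}
\mathrm{MSE}(\hat{\theta})
  &= E\big[(\hat{\theta} - \theta)^2\big] \\
  &= E\big[(\hat{\theta} - E[\hat{\theta}])^2\big]
     + \big(E[\hat{\theta}] - \theta\big)^2 \\
  &= \mathrm{Var}(\hat{\theta}) + \mathrm{Bias}(\hat{\theta})^2 .
\end{aligned}
```

The cross term vanishes because \(E\big[\hat{\theta} - E[\hat{\theta}]\big] = 0\).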

18
Q

How to work out probabilities from a normal distribution using R?(1)

A

Use the pnorm function:
pnorm(x, mean=, sd=, lower.tail=TRUE/FALSE)

lower.tail=TRUE (the default) gives Pr(X ≤ x)
lower.tail=FALSE gives Pr(X > x).
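
A small example, assuming X ~ N(100, 15^2) and the cut-off 115 (both chosen only for illustration):

```r
# Pr(X <= 115) and Pr(X > 115) for X ~ N(100, 15^2)
p_lower <- pnorm(115, mean = 100, sd = 15, lower.tail = TRUE)
p_upper <- pnorm(115, mean = 100, sd = 15, lower.tail = FALSE)

p_lower            # about 0.841, since 115 is one sd above the mean
p_lower + p_upper  # the two tails always sum to 1
```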

19
Q

How would you do the following question in r:

What proportion of trusts have N1 more than 20% higher than E1, ie more than 1.2×E1? (1)

A

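mean(hospitals$N1 > 1.2*hospitals$E1)

or, if there are no missing values, length(which(hospitals$N1 > 1.2*hospitals$E1)) / length(hospitals$N1). Note that length(which(...)) on its own gives the count, not the proportion.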

20
Q

How would you do the following question in r:
Suppose trusts are independent and a further sample of 10 trusts is selected randomly.
Let W be the number of new trusts that have N1 more than 20% higher than E1. Use
your answer to the previous question (0.16) to estimate
(a) Pr(W = 0), [3dp] ( 10 marks)
(b) Var(W).

A

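Binomial distribution, W ~ Bin(10, 0.16), as the trusts are INDEPENDENT and each has TWO outcomes.

a) dbinom(0,10,0.16), which equals (1-p)^n (pbinom(0,10,0.16) gives the same value, since W cannot be below 0)
b) Variance=np(1-p).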
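
The two answers can be evaluated as follows (a sketch; the variable names are illustrative):

```r
n <- 10
p <- 0.16                       # estimate from the previous question

pr_w0 <- dbinom(0, n, p)        # Pr(W = 0); equals (1 - p)^n
var_w <- n * p * (1 - p)        # Var(W) = np(1 - p)

round(pr_w0, 3)                 # 0.175
var_w                           # 1.344
```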

21
Q

What is the POPULATION expectation of an exponential random variable?(1)

A

1/lambda

22
Q

What would be the parameter space for the normal distribution?For exponential?(2)

A

Normal: W = {(μ, σ^2) : μ ∈ R, σ^2 > 0}.

Exponential: W = {λ : λ > 0}.

23
Q

What is a parameter?(1)

A

A numerical summary of a POPULATION/distribution, usually unknown. E.g. if the population were all cars in the US, the total number of red cars would be a parameter, as it is unknown and difficult to find out.

24
Q

What is a statistic?(1)

A

A numerical summary of the data, i.e. a function of the SAMPLE.

25
Q

What is a sample statistic?(1)

A

Any function of the random SAMPLE.
Note: A sample statistic is a function of random variables, and so itself is a random variable with its own distribution, an expectation and a variance.

26
Q

Trade off between bias and variance, give example.(1)

A

Can trade off some bias for a smaller variance. Think of 2 archers: one with no bias but lots of variance, and one with a little bias but no variance. The second archer would be better.

27
Q

For the following independent distributions X~Po(lambda) Y~Po(2lambda), classify the following as statistics, parameters or neither:

a) X+2Y
b) lambdahat=function of X&Y
c) lambdahat/lambda
d) Pr

A

a) A function of random variables so statistic
b) Statistic
c) Neither: it is a function of both a statistic (lambdahat) and the unknown parameter (lambda)
d) Parameter.

28
Q

Poisson distribution expected value, variance, parameter.(1)

A

All the same.

Eg X~Po(lambda) then E[X]=lambda=Var[X]

29
Q

Variance of the sum of INDEPENDENT random variables

Var(aX+bY)=

A

a^2Var(X)+b^2Var(Y).

30
Q

If you have an unbiased estimator, what is the MSE equal to?(1)

A

The variance.

31
Q

For the following independent distributions X~Po(lambda) Y~Po(2lambda) what estimator would you recommend?

See week 2 online for full example.

A

Ideally want an unbiased estimator, so E[w1X+w2Y]=lambda (the target), which requires w1+2w2=1.
Therefore MSE=variance.
Plug the constraint into the variance to get a quadratic in w2, then solve for the minimum to get the weights.
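
One way to fill in the algebra, assuming (as in the card) a linear estimator \(\hat{\lambda} = w_1 X + w_2 Y\). This is a sketch, so check it against the full week 2 example:

```latex
\begin{aligned}
E[\hat{\lambda}] &= w_1\lambda + 2 w_2 \lambda = \lambda
  \;\Rightarrow\; w_1 = 1 - 2 w_2, \\
\mathrm{Var}(\hat{\lambda}) &= w_1^2 \lambda + 2 w_2^2 \lambda
  = \lambda\,(1 - 4 w_2 + 6 w_2^2), \\
\frac{d}{d w_2}\mathrm{Var}(\hat{\lambda}) &= \lambda\,(-4 + 12 w_2) = 0
  \;\Rightarrow\; w_2 = \tfrac{1}{3},\; w_1 = \tfrac{1}{3}.
\end{aligned}
```

Under these assumptions the recommended estimator is \(\hat{\lambda} = (X + Y)/3\), with variance \(\lambda/3\).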

32
Q

What is the Central Limit Theorem?Importance?(4)

A

\(\bar{X}_n \xrightarrow{d} N(\mu, \sigma^2/n)\) as n→∞, read as: for large n, Xbar is approximately N(mu, sigma^2/n).
I.e. for a set of iid variables X1, X2, X3, … with mean mu and variance sigma^2, Xbar is defined as the mean of the first n terms (note this means Xbar is itself a random variable with its own distribution).

The Central Limit Theorem states that, irrespective of the distribution of the Xi, as n increases the distribution of Xbar tends to a normal distribution with mean mu and variance sigma^2/n.

OR
If you standardise Xbar (i.e. subtract its mean mu and divide by its sd sigma/sqrt(n)), then as n tends to infinity the result converges to N(0,1), i.e. normal with mean 0 and variance 1.

Importance: it lets us approximate the distribution of a sample mean for large n. If an exam question doesn’t mention the distribution, use this!
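
A simulation sketch of the theorem, assuming an Exp(1) population (chosen because it is clearly non-normal). The seed, sample size and number of repetitions are illustrative.

```r
set.seed(42)                    # illustrative seed
n      <- 50                    # sample size
n_reps <- 5000                  # number of simulated samples (assumption)
lambda <- 1                     # Exp(1): mean 1, variance 1

# Sample mean of n exponential observations, repeated n_reps times
xbar <- replicate(n_reps, mean(rexp(n, rate = lambda)))

mean(xbar)                      # close to mu = 1
sd(xbar)                        # close to sigma / sqrt(n) = 1 / sqrt(50)
# hist(xbar)                    # uncomment to see the roughly bell-shaped histogram
```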

33
Q

In the CLT what does the little d on the arrow mean in the formula?(1)

A

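It means convergence in distribution, i.e. the distribution of the (standardised) sample mean gets closer and closer to the normal distribution as n grows.
It also means the limit is random: it is a distribution, not a fixed number.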

34
Q

Large sample (asymptotic) properties of estimators.(3)

A

1) Asymptotic unbiasedness: \(E[\hat{\theta}] \to \theta\) as n→∞. Note that if \(\hat{\theta}\) is unbiased for all n, then \(E[\hat{\theta}] = \theta\), and it automatically follows that \(\hat{\theta}\) is asymptotically unbiased.
2) Consistency: guaranteed if both of the following hold:
a) \(\hat{\theta}\) is asymptotically unbiased
b) \(\mathrm{Var}[\hat{\theta}] \to 0\) as n→∞
3) Asymptotic efficiency: the variance converges to the theoretical lower limit as n tends to infinity, i.e. (actual variance / lower limit)→1 as n→∞.

35
Q

Incorrect question on assignment correction:

Suppose Y is an exponential random variable with expected value equal to the median of |N2 - E2|. What is the variance of Y? [2dp] ( 10 marks)

A

Answer correction:

The Exp(λ) distribution has expected value 1/λ, variance 1/λ^2 and cumulative distribution function

\( F(x) = \Pr(X \leq x) = \left\{ \begin{array}{ll} 0 & x < 0 \\ 1 - e^{-\lambda x} & x \geq 0 \end{array} \right. \)

> lambda = 1/median(tem)
> round(1/lambda^2, 2)
[1] 14.44

36
Q

When do you refer to pdf and when for pmf?(1)

A

Refer to the probability density function (pdf) for continuous distributions such as the uniform, exponential and normal, and to the probability mass function (pmf) for discrete distributions such as the Poisson and binomial. R has functions for all of these.

37
Q

When calculating confidence intervals in r how do you do it?(1)

A

For normally distributed data use the qnorm function to get the critical value, with half of the excluded probability in each tail, i.e.
for a 95% CI use qnorm(0.975) (note 2.5% is half of the 5% excluded)
for a 90% CI use qnorm(0.95)
for a 99% CI use qnorm(0.995)
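
A minimal sketch of a 95% CI for a mean, assuming the population sd is known to be 15. The data and the value of sigma are made up for illustration.

```r
x     <- c(96, 104, 110, 89, 101, 99, 107, 94)  # made-up data
sigma <- 15                                     # assumed known population sd
n     <- length(x)

z  <- qnorm(0.975)                              # about 1.96 for a 95% CI
ci <- mean(x) + c(-1, 1) * z * sigma / sqrt(n)  # lower and upper limits
ci
```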

38
Q

If asked to calculate a sample size given a required confidence interval, how would you do this?(1)

A

Set the required width equal to the difference between the upper and lower confidence limits, which winds up being 2 × the critical value × sd/sqrt(n). Rearrange to find n, rounding up so that the interval is no wider than required.
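
A worked sketch of the rearrangement, assuming a known sd of 15 and a required total width of 5 for a 95% CI (both values are hypothetical):

```r
sigma <- 15                      # assumed known population sd
width <- 5                       # required total width of the 95% CI (assumption)
z     <- qnorm(0.975)

# width = 2 * z * sigma / sqrt(n)  =>  n = (2 * z * sigma / width)^2
n_exact <- (2 * z * sigma / width)^2
n       <- ceiling(n_exact)      # round UP so the interval is no wider than required
n                                # 139
```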

39
Q

Variance for sample statistic mean and regular?(1)

A

sigma^2 is the regular variance, i.e. the variance of a single observation

sigma^2/n is the variance of the mean of a sample of size n.

40
Q

What is the pdf of an exponential random variable?The cdf?(2)

A

pdf: fX(x) = lambda*e^(-lambda*x) for x ≥ 0 (and 0 for x < 0)

cdf: FX(x) = 1-e^(-lambda*x) for x ≥ 0 (and 0 for x < 0)
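
These formulas can be checked against R's built-in dexp and pexp functions. The rate and the evaluation points below are illustrative.

```r
lambda <- 0.5                    # illustrative rate
x      <- c(0.5, 1, 2, 4)        # illustrative points, all >= 0

pdf_manual <- lambda * exp(-lambda * x)
cdf_manual <- 1 - exp(-lambda * x)

all.equal(pdf_manual, dexp(x, rate = lambda))   # TRUE
all.equal(cdf_manual, pexp(x, rate = lambda))   # TRUE
```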