Statistical Inference: Section 1&2 Flashcards
Define a random sample.(2)
all members of the population have the same chance of being included in the sample
all combinations of, say, n members have the same chance of being included in the sample.
What does i.i.d stand for?(1)
Independent and identically distributed.
Name common continuous distributions.(4)
Normal distribution
Standard normal distribution (special type of normal where mean=0,var=1)
Exponential
Uniform distribution.
Name common discrete distributions. Difference between these?(3)
Binomial
Poisson (the main difference from the binomial is that a Poisson count has no upper limit, whereas a binomial count is “out of” a fixed number of trials n)
What is a Bernoulli trial?(1)
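An experiment with only 2 possible outcomes (success or failure). The number of successes in n such trials has a binomial distribution if n (the number of trials) is fixed, the success probability p is constant, and the trials are independent.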
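A minimal R sketch of how Bernoulli trials build up a binomial count. The values n = 10 and p = 0.3 are illustrative assumptions, not from the card.

```r
# Assumed illustrative values (not from the original card)
n <- 10     # fixed number of trials
p <- 0.3    # constant success probability

set.seed(1)                               # for reproducibility only
trials <- rbinom(n, size = 1, prob = p)   # n independent Bernoulli(p) trials, each 0 or 1
successes <- sum(trials)                  # number of successes: one Binomial(n, p) value

# Equivalent single draw straight from the binomial distribution
one_draw <- rbinom(1, size = n, prob = p)
```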
How would you generate a random sample of 10 from N(100, 15^2) in R?(1)
*note N=normally distributed; rnorm takes the standard deviation (15), not the variance (15^2),
The mean?
Variance?(3)
1) normal_sample_1 = rnorm(10,100,15)
generates the sample and places it under “normal_sample_1”.
2) mean_1 = mean(normal_sample_1), storing it as “mean_1”
3) var_1 = var(normal_sample_1), storing as “var_1”.
What is simulation?(1)
Drawing repeated samples from a known population to check how well estimates (such as the sample mean) recover the known values; if they perform well there, we have greater confidence in them when the population is unknown.
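A hedged sketch of simulation in R, reusing the N(100, 15^2) population from the earlier card. The number of repetitions (1000) and the seed are assumptions chosen for illustration.

```r
set.seed(42)      # assumed seed, for reproducibility only
n_reps <- 1000    # assumed number of repeated samples

# Draw 1000 samples of size 10 from the known population and record each sample mean
sample_means <- replicate(n_reps, mean(rnorm(10, mean = 100, sd = 15)))

mean(sample_means)  # should be close to the known population mean, 100
var(sample_means)   # should be close to sigma^2 / n = 15^2 / 10 = 22.5
```

Because the population is known, the results can be compared directly with the true values.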
How would you generate a Poisson sample of 20 from Po(4) in R?(1)
The mean?
Variance?(3)
poisson_sample = rpois(20,4)
poisson_mean = mean(poisson_sample)
poisson_var = var(poisson_sample).
Binomial distribution where the number of trials is equal to 100, and the success probability is equal to 0.5. Sample mean? (2)
Note this is equivalent to tossing a fair coin 100 times and counting the number of heads.
> binomial_sample = rbinom(1,100,0.5)
> binomial_sample
[1] 46
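The call above draws a single value, so there is nothing to average. A sketch of getting a sample mean, assuming (this is not on the card) that 50 such values are drawn:

```r
set.seed(123)   # assumed seed, for reproducibility only

# 50 draws, each the number of heads in 100 tosses of a fair coin
binomial_sample_50 <- rbinom(50, size = 100, prob = 0.5)

# Sample mean; expected to be near the population mean n * p = 100 * 0.5 = 50
binomial_mean <- mean(binomial_sample_50)
```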
What is exploratory data analysis?(1)
Looking at the data, through numerical summaries and plots, to assess how well a particular probability distribution might work as a model for it.
What is a 5 number summary? How do you find this in R? What about if you wanted the mean too?(3)
Min, LQ, Median, UQ, Max
quantile(x) gives five number summary
summary(x) gives this and the mean.
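A small illustration with a made-up vector x (hypothetical data, not from the course):

```r
x <- c(2, 4, 4, 5, 7, 9, 10, 12, 15)   # hypothetical data

five_num <- quantile(x)    # Min, LQ, Median, UQ, Max: 2, 4, 7, 10, 15
with_mean <- summary(x)    # the same five numbers plus the mean

five_num
with_mean
```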
Define the sample mean.(1)
For a sample consisting of n observations x1,x2,…,xn, the sample mean $\bar{x}$ is defined as the arithmetic mean of the observations, i.e. $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
What does W denote?(1)
The θj, j = 1,…,p, will belong to a set of values W, called the parameter space, and so X is a member of a family of distributions.
Define sample variance.(1)
For a sample consisting of n observations x1,x2,…,xn, the sample variance s^2 is defined as $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$.
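The definitions of the sample mean and sample variance can be checked by hand against R's built-in mean() and var(). The vector x is a hypothetical example.

```r
x <- c(3, 5, 7, 9)   # hypothetical observations
n <- length(x)

x_bar <- sum(x) / n                      # sample mean: 24 / 4 = 6
s_sq <- sum((x - x_bar)^2) / (n - 1)     # sample variance: (9 + 1 + 1 + 9) / 3 = 20/3

# Both agree with the built-in functions (var() uses the n - 1 divisor)
all.equal(x_bar, mean(x))
all.equal(s_sq, var(x))
```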
Difference between estimator and estimate?(1)
An estimator is the process (a function of the random sample) used to estimate a population quantity; we can repeat it on any sample, and the value it gives for one particular sample is an estimate.
For example, the sample mean is an estimator for the population mean; the mean computed from one particular sample is an estimate.
What is a sample statistic? What is special about these?(2)
Any particular function defined on the random sample.
Note that a sample statistic is a function of random variables, and so is itself a random variable with its own distribution, an expectation and a variance.
Important properties of estimators.(3)
1) Bias, defined as E[theta-hat] - theta
theta-hat = estimator of theta; smaller bias (in absolute value) = better; 0 bias = unbiased estimator
2) Variance: the smaller the better; an estimator whose variance achieves the theoretical lower limit is known as “efficient”
3) BEST MEASURE, as it involves both bias and variance: mean square error (MSE), E[(theta-hat - theta)^2] = Var(theta-hat) + bias^2. When it is not possible to minimise both bias and variance simultaneously, minimising MSE is a good compromise.
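A simulation sketch of bias, variance and MSE for two estimators of a normal variance: the usual sample variance (divisor n - 1) and the divisor-n version. The population N(100, 15^2), n = 10, the 5000 repetitions and the helper summarise_estimator are all illustrative assumptions.

```r
set.seed(2024)       # assumed seed, for reproducibility only
sigma_sq <- 15^2     # true variance of the known population
n <- 10              # sample size
n_reps <- 5000       # assumed number of simulated samples

# Each column of `samples` is one sample of size n
samples <- replicate(n_reps, rnorm(n, mean = 100, sd = 15))
est_unbiased <- apply(samples, 2, var)       # divisor n - 1
est_biased <- est_unbiased * (n - 1) / n     # divisor n

# Hypothetical helper: simulated bias, variance and MSE of an estimator
summarise_estimator <- function(est, theta) {
  bias <- mean(est) - theta
  variance <- mean((est - mean(est))^2)
  mse <- mean((est - theta)^2)               # equals variance + bias^2
  c(bias = bias, variance = variance, mse = mse)
}

res_unbiased <- summarise_estimator(est_unbiased, sigma_sq)
res_biased <- summarise_estimator(est_biased, sigma_sq)
res_unbiased
res_biased
```

The divisor-n estimator should show a clearly negative bias but a smaller variance, which illustrates why MSE is used to weigh the two against each other.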
How to work out probabilities from a normal distribution using R?(1)
Use the pnorm function:
pnorm(x, mean=, sd=, lower.tail=TRUE/FALSE)
lower.tail=TRUE (the default) gives Pr(X <= x)
lower.tail=FALSE gives Pr(X > x).
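A short example of pnorm, assuming X ~ N(100, 15^2) as in the earlier cards and an illustrative cut-off of 115 (one standard deviation above the mean):

```r
# P(X <= 115): lower tail (the default)
p_below <- pnorm(115, mean = 100, sd = 15)

# P(X > 115): upper tail
p_above <- pnorm(115, mean = 100, sd = 15, lower.tail = FALSE)

# P(85 < X <= 115): difference of two lower-tail probabilities
p_between <- pnorm(115, mean = 100, sd = 15) - pnorm(85, mean = 100, sd = 15)

p_below    # about 0.841
p_above    # about 0.159
p_between  # about 0.683
```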
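How would you do the following question in R:
What proportion of trusts have N1 more than 20% higher than E1, ie more than 1.2×E1? (1)
mean(hospitals$N1>1.2*hospitals$E1)
(length(which(hospitals$N1>1.2*hospitals$E1)) gives the count of such trusts; divide it by nrow(hospitals) to get the proportion.)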
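A sketch of the count versus the proportion, using a small made-up data frame. The column names N1 and E1 come from the question; the five rows of values are hypothetical.

```r
# Hypothetical stand-in for the hospitals data (values invented for illustration)
hospitals <- data.frame(
  N1 = c(130, 90, 100, 150, 80),
  E1 = c(100, 100, 100, 100, 100)
)

exceeds <- hospitals$N1 > 1.2 * hospitals$E1   # TRUE where N1 is more than 20% above E1

count_exceeding <- sum(exceeds)    # number of such trusts: 2
prop_exceeding <- mean(exceeds)    # proportion of such trusts: 2 / 5 = 0.4
```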
How would you do the following question in R:
Suppose trusts are independent and a further sample of 10 trusts is selected randomly.
Let W be the number of new trusts that have N1 more than 20% higher than E1. Use
your answer to the previous question (0.16) to estimate
(a) Pr(W = 0), [3dp] ( 10 marks)
(b) Var(W).
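W ~ Binomial(10, 0.16), as trusts are INDEPENDENT and each has TWO outcomes
a) dbinom(0,10,0.16), or equivalently pbinom(0,10,0.16); this is 0.84^10 = 0.175 to 3dp
b) Variance = np(1-p) = 10 × 0.16 × 0.84 = 1.344.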
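The two parts can be checked in R as follows; the variable names are illustrative.

```r
n_trusts <- 10   # further sample size, from the question
p <- 0.16        # estimated probability, from the previous question

# (a) Pr(W = 0): dbinom gives the point probability; pbinom(0, ...) gives
# Pr(W <= 0), which is the same thing because W cannot be negative
pr_w_zero <- dbinom(0, size = n_trusts, prob = p)
pr_w_zero_cdf <- pbinom(0, size = n_trusts, prob = p)
round(pr_w_zero, 3)   # 0.175

# (b) Var(W) = n * p * (1 - p)
var_w <- n_trusts * p * (1 - p)   # 1.344
```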
What is the POPULATION expectation of an exponential random variable?(1)
1/lambda
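A quick simulation check of this result, assuming an illustrative rate lambda = 2 (so the population mean is 0.5):

```r
set.seed(7)     # assumed seed, for reproducibility only
lambda <- 2     # assumed rate parameter

exp_sample <- rexp(10000, rate = lambda)   # 10000 draws from Exp(lambda)
exp_mean <- mean(exp_sample)               # should be close to 1 / lambda = 0.5
```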
What would be the parameter space for the normal distribution? For exponential?(2)
$W = \{(\mu, \sigma^2) : \mu \in \mathbb{R},\ \sigma > 0\}$.
$W = \{\lambda : \lambda > 0\}$.
What is a parameter?(1)
A numerical summary of a POPULATION/distribution, usually unknown. E.g. if the population were all cars in the US, the total number of red cars would be a parameter: it is fixed but unknown and difficult to find out.
What is a statistic?(1)
A numerical summary of data: a function of the SAMPLE.