Statistics Flashcards
What two things must you do when making a stemplot?
- Use equal intervals.
- Give a key with an example to show how to read it.
What are the two definitions of outliers?
- More than 1.5 × the IQR above the upper quartile or below the lower quartile.
- 2 standard deviations above or below the mean.
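Both outlier rules can be sketched in Python. This is only an illustration under stated assumptions: the data set and function names are made up for the example, the quartiles are taken as the medians of the lower and upper halves of the sorted data (leaving out the median itself when n is odd), the population standard deviation is used for the second rule, and the data is assumed to have at least two points.

```python
from statistics import mean, median, pstdev

def quartiles(data):
    """Return (LQ, UQ) as the medians of the lower and upper halves."""
    xs = sorted(data)
    half = len(xs) // 2                      # points strictly below/above the median
    return median(xs[:half]), median(xs[len(xs) - half:])

def iqr_outliers(data):
    """Points more than 1.5 x IQR below the LQ or above the UQ."""
    lq, uq = quartiles(data)
    fence = 1.5 * (uq - lq)
    return [x for x in data if x < lq - fence or x > uq + fence]

def sd_outliers(data):
    """Points more than 2 standard deviations away from the mean."""
    m, s = mean(data), pstdev(data)
    return [x for x in data if abs(x - m) > 2 * s]

data = [2, 3, 4, 5, 6, 7, 8, 9, 30]          # hypothetical data set
print(quartiles(data), iqr_outliers(data), sd_outliers(data))
```

For this data set both rules flag the same point, but in general the two definitions can disagree.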
If there are n data points, which data point is the:
- median?
- UQ?
- LQ?
- [(n+1)/2]th point
Let m be the number of points strictly above or below the median
- [(n - m) + (m+1)/2]th point, which equals [(n+1)/2] + [(m+1)/2] when n is odd
- [(m+1)/2] th point
What are the formulae for:
- s.d. of a population?
- s.d. of a sample?
- σ = √(Sxx/n) = √([Σ(xi - x_bar)²] / n) = √([Σ(xi²) - n(x_bar)²] / n)
- s = √(Sxx/(n - 1)) = √([Σ(xi - x_bar)²] / (n - 1)) = √([Σ(xi²) - n(x_bar)²] / (n - 1))
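As a quick check, the sketch below computes both standard deviations as the square root of Sxx divided by n (population) or n - 1 (sample), and compares them with Python's `statistics.pstdev` and `statistics.stdev`. The data values and the helper name `sxx` are hypothetical.

```python
from math import sqrt
from statistics import pstdev, stdev

def sxx(data):
    """Sum of squared deviations from the mean, via sum(x^2) - n * x_bar^2."""
    n = len(data)
    x_bar = sum(data) / n
    return sum(x * x for x in data) - n * x_bar ** 2

data = [2, 4, 4, 4, 5, 5, 7, 9]      # hypothetical data
n = len(data)
sigma = sqrt(sxx(data) / n)          # population s.d.: divide by n
s = sqrt(sxx(data) / (n - 1))        # sample s.d.: divide by n - 1

print(sigma, pstdev(data))           # the two values should agree
print(s, stdev(data))                # the two values should agree
```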
What are the formulae for:
- mean
- Sxx
when using grouped data?
- x_bar = Σfixi/Σfi
- Sxx = Σfixi² - [(Σfixi)²/n], where n = Σfi
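A minimal sketch of the grouped-data formulae, assuming each class is represented by a single value xi (for example its midpoint) with frequency fi. The midpoints, frequencies and function name are invented for illustration.

```python
def grouped_mean_and_sxx(values, freqs):
    """Return (x_bar, Sxx) for values x_i occurring with frequencies f_i."""
    n = sum(freqs)                                          # n = sum of f_i
    sum_fx = sum(f * x for x, f in zip(values, freqs))      # sum of f_i * x_i
    sum_fx2 = sum(f * x * x for x, f in zip(values, freqs)) # sum of f_i * x_i^2
    x_bar = sum_fx / n
    s_xx = sum_fx2 - sum_fx ** 2 / n
    return x_bar, s_xx

midpoints = [5, 15, 25]      # hypothetical class midpoints
freqs = [2, 5, 3]            # hypothetical frequencies
x_bar, s_xx = grouped_mean_and_sxx(midpoints, freqs)
print(x_bar, s_xx)
```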
- Where is the tail for a positively skewed distribution?
- Where is the tail for a negatively skewed distribution?
- Away from the y-axis (the long tail points right).
- Close to the y-axis (the long tail points left).
What are the two defining features of a normal distribution?
- Symmetrical
- Bell-shaped
If a variable X is coded to a variable Y such that y = ax + b, what are the mean and s.d. of y in terms of the mean and s.d. of x?
- y_bar = a(x_bar) + b
- sy = |a|sx
If two data sets, X and Y, are combined, what are the new mean and variance?
- mean = (Σx + Σy) / (nx + ny)
- variance = [(Σx² + Σy²) / (nx + ny)] - (mean)²
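The combined mean and variance can be checked against a direct calculation on the pooled data. In this sketch the two data sets and the function name are hypothetical, and the variance is the population variance (divide by the total count).

```python
from statistics import mean, pvariance

def combined_mean_variance(sum_x, sum_x2, n_x, sum_y, sum_y2, n_y):
    """Mean and variance of two data sets pooled together, from summary totals."""
    n = n_x + n_y
    m = (sum_x + sum_y) / n
    var = (sum_x2 + sum_y2) / n - m ** 2
    return m, var

xs = [1, 2, 3]               # hypothetical first data set
ys = [4, 6, 8, 10]           # hypothetical second data set
m, var = combined_mean_variance(
    sum(xs), sum(x * x for x in xs), len(xs),
    sum(ys), sum(y * y for y in ys), len(ys),
)
print(m, mean(xs + ys))          # should agree
print(var, pvariance(xs + ys))   # should agree
```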
What are the formulae for:
- E(X)
- E(f(X))
- E(aX + b)
- E(f(X) + g(X))
- Var(X)
- Var(aX + b)
- Σ xi P(X = xi) = μ, summed over i = 1 to n
- Σ f(xi) P(X = xi), summed over i = 1 to n
- a E(X) + b
- E(f(X)) + E(g(X))
- E(X²) - μ²
- a² Var(X)
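These formulae can be verified exactly for a small distribution. The sketch below uses a fair six-sided die as an assumed example and Python fractions to avoid rounding; the function names and the constants a and b are illustrative only.

```python
from fractions import Fraction

def expectation(dist, f=lambda x: x):
    """E(f(X)) = sum of f(x_i) * P(X = x_i), for dist = {value: probability}."""
    return sum(f(x) * p for x, p in dist.items())

def variance(dist):
    """Var(X) = E(X^2) - mu^2."""
    mu = expectation(dist)
    return expectation(dist, lambda x: x * x) - mu ** 2

die = {x: Fraction(1, 6) for x in range(1, 7)}    # fair die (assumed example)
a, b = 3, 2
coded = {a * x + b: p for x, p in die.items()}    # distribution of aX + b

print(expectation(die), variance(die))            # 7/2 35/12
print(expectation(coded), variance(coded))        # 25/2 105/4
```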
What are the formulae for:
- Var(X + Y)
- E(aX + bY)
- Var(aX + bY)
- Var(X) + Var(Y), if X and Y are independent
- a E(X) + b E(Y)
- a² Var(X) + b² Var(Y), if X and Y are independent
What formulae are used to calculate expectation and variance:
- for the sum of n independent observations of the same variable?
- for one observation scaled by a factor n?
- n E(X) & n Var(X)
- n E(X) & n² Var(X)
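The difference between adding independent observations and scaling a single observation shows up in the variance. As a sketch, assuming a fair die and n = 2, the code below builds the exact distribution of X1 + X2 (two independent rolls) and of 2X (one roll doubled). All names are illustrative.

```python
from fractions import Fraction
from itertools import product

def mean_and_variance(dist):
    """Return (E(X), Var(X)) for dist = {value: probability}."""
    mu = sum(x * p for x, p in dist.items())
    var = sum(x * x * p for x, p in dist.items()) - mu ** 2
    return mu, var

die = {x: Fraction(1, 6) for x in range(1, 7)}   # fair die (assumed example)
n = 2

# X1 + X2: two independent observations, so joint probabilities multiply.
total = {}
for (x1, p1), (x2, p2) in product(die.items(), repeat=2):
    total[x1 + x2] = total.get(x1 + x2, 0) + p1 * p2

# 2X: a single observation scaled by n.
doubled = {n * x: p for x, p in die.items()}

print(mean_and_variance(total))     # mean n E(X), variance n Var(X)
print(mean_and_variance(doubled))   # mean n E(X), variance n^2 Var(X)
```

Both have the same mean, but the doubled roll is more spread out because the two "halves" of 2X always move together.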
What are the conditions for a discrete uniform distribution?
Each value is equally likely to occur, i.e. P(X = xi) = 1/n for i = 1, 2, 3, …, n
E(X) = a + (n + 1)/2, where the values are the consecutive integers a + 1, …, a + n (so a is one less than the lowest value in the distribution)
Var(X) = (n² - 1)/12
What are the conditions for a discrete geometric distribution?
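- Outcome is either success or failure
- Independent trials
- Prob. of success, p, is same for each trial
- X ~ Geo(p), where X is the number of trials up to and including the first success
- P(X = r) = q^(r-1) × p, where q = 1 - p
- Mode is always 1
- P(X ≤ x) = 1 - q^x
- E(X) = 1/p
- Var(X) = q/p²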
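A short numerical check of the geometric formulae, assuming X counts the trials up to and including the first success and taking p = 1/6 as an example. The function names are illustrative, and the mean and variance are approximated by truncating the infinite sums at a large r.

```python
from fractions import Fraction

def geo_pmf(r, p):
    """P(X = r): r - 1 failures followed by one success."""
    return (1 - p) ** (r - 1) * p

def geo_cdf(x, p):
    """P(X <= x) = 1 - q^x, i.e. not all of the first x trials fail."""
    return 1 - (1 - p) ** x

p = Fraction(1, 6)                       # example success probability
print(geo_pmf(3, p), geo_cdf(3, p))      # 25/216 91/216

# Approximate E(X) and Var(X) with floats by truncating the sums.
pf = 1 / 6
approx_mean = sum(r * geo_pmf(r, pf) for r in range(1, 2000))
approx_var = sum(r * r * geo_pmf(r, pf) for r in range(1, 2000)) - approx_mean ** 2
print(approx_mean, approx_var)           # close to 1/p = 6 and q/p^2 = 30
```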
What are the conditions for the binomial model?
- Finite number of trials, n
- Each trial is a success or failure
- Prob. of success, p, is same for each trial
- Trials are independent
- Discrete random variable X gives the number of successful outcomes in n trials
- X ~ Bin(n, p)
- P(X = r) = nCr × p^r × q^(n-r), where q = 1 - p
- E(X) = np
- Var(X) = npq
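The binomial probabilities, mean and variance can be checked numerically. This sketch assumes n = 10 and p = 0.3 purely as an example, and the function name is illustrative.

```python
from math import comb

def binom_pmf(r, n, p):
    """P(X = r) = nCr * p^r * q^(n-r) with q = 1 - p."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

n, p = 10, 0.3                                   # example parameters
pmf = [binom_pmf(r, n, p) for r in range(n + 1)]

total_prob = sum(pmf)                            # should be 1
mean = sum(r * pr for r, pr in enumerate(pmf))   # should equal n * p
var = sum(r * r * pr for r, pr in enumerate(pmf)) - mean ** 2   # should equal n * p * q
print(total_prob, mean, var)
```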
What is mutual exclusivity?
P(A ∩ B) = 0
P(A ∪ B) = P(A) + P(B)
In general, what is the formula for P(A ∪ B)?
P(A) + P(B) - P(A ∩ B)
How can you show that the probability of event A happening is independent of event B happening?
P(B | A) = P(B ∩ A) / P(A)
If A and B are independent, P(A ∩ B) = P(A) × P(B)
Then: P(B | A) = [P(A) × P(B)] / P(A) = P(B)
How to find μ & σ when given two probabilities?
Let the given probabilities be:
P(X > a) = p1 and P(X > b) = p2
- Standardise each value: z1 = (a - μ)/σ and z2 = (b - μ)/σ
- Write each z in terms of Φ⁻¹: z1 = Φ⁻¹(1 - p1) and z2 = Φ⁻¹(1 - p2)
- Equate the two expressions for each z to get two simultaneous equations in μ and σ.
- Eliminate μ or σ to solve for the other, then find the one you eliminated.
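The steps above can be sketched with `statistics.NormalDist` (Python 3.8+), whose `inv_cdf` method plays the role of the inverse standard normal function. The function name `solve_mu_sigma` and the numbers are assumptions for illustration; the example works backwards from a known distribution so the answer can be checked.

```python
from statistics import NormalDist

def solve_mu_sigma(a, p1, b, p2):
    """Given P(X > a) = p1 and P(X > b) = p2 for normal X, return (mu, sigma)."""
    std = NormalDist()                 # standard normal: mean 0, s.d. 1
    z_a = std.inv_cdf(1 - p1)          # z_a = (a - mu) / sigma
    z_b = std.inv_cdf(1 - p2)          # z_b = (b - mu) / sigma
    sigma = (a - b) / (z_a - z_b)      # subtracting the equations eliminates mu
    mu = a - z_a * sigma               # substitute back to recover mu
    return mu, sigma

# Round trip from a known distribution (hypothetical values).
known = NormalDist(mu=50, sigma=8)
a, b = 60, 45
p1 = 1 - known.cdf(a)                  # P(X > a)
p2 = 1 - known.cdf(b)                  # P(X > b)
mu, sigma = solve_mu_sigma(a, p1, b, p2)
print(mu, sigma)                       # should recover roughly 50 and 8
```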
How can we approximate the Binomial distribution with the Normal distribution and what are the conditions that make this a good approximation (including continuity corrections)?
If X~Bin(n, p), we can make the approximation:
X~N(np, npq)
Only if np>5 and nq>5
Continuity corrections:
If the boundary is inclusive, extend the normal range by 0.5: with Y ~ N(np, npq), P(X ≤ k) ≈ P(Y < k + 0.5) and P(X ≥ k) ≈ P(Y > k - 0.5).
If the boundary is exclusive, shrink the normal range by 0.5: with Y ~ N(np, npq), P(X < k) ≈ P(Y < k - 0.5) and P(X > k) ≈ P(Y > k + 0.5).
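To see the effect of the continuity correction, the sketch below compares exact binomial probabilities with the normal approximation. The parameters n = 40, p = 0.4 and k = 18 are chosen only as an example that satisfies np > 5 and nq > 5; the helper name is illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 40, 0.4                     # example parameters: np = 16, nq = 24
q = 1 - p
approx = NormalDist(mu=n * p, sigma=sqrt(n * p * q))

def binom_cdf(k):
    """Exact P(X <= k) for X ~ Bin(n, p)."""
    return sum(comb(n, r) * p ** r * q ** (n - r) for r in range(k + 1))

k = 18
exact_le = binom_cdf(k)            # P(X <= 18), inclusive boundary
approx_le = approx.cdf(k + 0.5)    # corrected: go 0.5 above the boundary
uncorrected = approx.cdf(k)        # no continuity correction

exact_lt = binom_cdf(k - 1)        # P(X < 18), exclusive boundary
approx_lt = approx.cdf(k - 0.5)    # corrected: stop 0.5 below the boundary

print(exact_le, approx_le, uncorrected)
print(exact_lt, approx_lt)
```

In this example the corrected value sits noticeably closer to the exact probability than the uncorrected one.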
What are the steps of a normal/binomial hypothesis test?
- Define the random variable, stating n but leaving p unknown, and state the assumptions under which the trials are independent.
- State hypotheses & distribution according to H0.
- State level (%) and type (one/two-tailed) as well as rejection criterion e.g. ‘The test value, x, will lie in the critical region if P(X >= x) < 5%.’
- Calculate the required probability and draw a conclusion.
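The steps above can be sketched for a one-tailed binomial test. Everything specific here is an assumed example: H0: p = 0.5 against H1: p > 0.5, n = 20 trials, an observed value of 15 and a 5% significance level. The function name is illustrative.

```python
from math import comb

def upper_tail(x, n, p):
    """P(X >= x) for X ~ Bin(n, p)."""
    return sum(comb(n, r) * p ** r * (1 - p) ** (n - r) for r in range(x, n + 1))

n, p0 = 20, 0.5          # distribution under H0: X ~ Bin(20, 0.5)
x_observed = 15          # hypothetical test value
level = 0.05             # 5% one-tailed test

p_value = upper_tail(x_observed, n, p0)   # P(X >= 15) assuming H0 is true
reject_h0 = p_value < level               # in the critical region if below 5%
print(p_value, reject_h0)
```

Here P(X ≥ 15) is about 2.1%, which is below 5%, so in this example there is sufficient evidence to reject H0.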
When is taking a census impractical and why?
When the pop. size is large.
Time consuming and expensive and difficult to do with accuracy
What are the advantages of taking a sample survey?
Can get data quickly and cheaply
Can give accurate indications if sample is representative
What are some sources of bias?
- Bad sampling frame
- Wrong sampling unit
- Non-response by some of the units
- Bias from person conducting survey