Statistics Flashcards

1
Q

What two things must you do when making a stemplot?

A
  1. Use equal intervals.
  2. Give a key with an example to show how to read it.
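A minimal Python sketch of both rules. It assumes non-negative integer data, with the tens digit as the stem and the units digit as the leaf (an illustrative choice; other stem widths are possible). Every stem in the range is listed, even empty ones, so the intervals stay equal. The key uses the smallest value as its example. The function name `stemplot` is hypothetical.

```python
def stemplot(data):
    """Stem = tens digit, leaf = units digit (non-negative integers only)."""
    stems = {}
    for x in sorted(data):
        stems.setdefault(x // 10, []).append(x % 10)
    lines = []
    # Equal intervals: list every stem in range, even those with no leaves
    for stem in range(min(stems), max(stems) + 1):
        leaves = " ".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    example = min(data)
    lines.append(f"Key: {example // 10} | {example % 10} means {example}")
    return "\n".join(lines)


print(stemplot([12, 15, 31, 34, 37]))
```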
2
Q

What are the two definitions of outliers?

A
  1. More than 1.5 × the IQR (interquartile range) above the upper quartile or below the lower quartile.
  2. More than 2 standard deviations above or below the mean.
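A sketch applying both definitions in Python. Two assumptions are not stated on the card. First, the quartiles are taken as the medians of the lower and upper halves, matching the next card. Second, the mean and standard deviation are the population versions (dividing by n). The names `outliers`, `k` and `z` are illustrative.

```python
def median_sorted(values):
    """Median of a list that is already sorted."""
    n = len(values)
    mid = n // 2
    if n % 2:
        return values[mid]
    return (values[mid - 1] + values[mid]) / 2


def outliers(data, k=1.5, z=2):
    """Return (outliers by the IQR rule, outliers by the s.d. rule)."""
    xs = sorted(data)
    n = len(xs)
    lq = median_sorted(xs[: n // 2])         # lower half: positions below the median
    uq = median_sorted(xs[(n + 1) // 2 :])   # upper half: positions above the median
    iqr = uq - lq
    by_iqr = [x for x in xs if x < lq - k * iqr or x > uq + k * iqr]

    mean = sum(xs) / n
    sd = (sum((x - mean) ** 2 for x in xs) / n) ** 0.5   # population s.d. (divide by n)
    by_sd = [x for x in xs if abs(x - mean) > z * sd]
    return by_iqr, by_sd


print(outliers([1, 2, 3, 4, 5, 6, 100]))
```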
3
Q

If there are n data points, which data point is the:

  1. median?
  2. UQ?
  3. LQ?
A
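  1. Median: the [(n + 1)/2]th point.

Let m be the number of points below the median's position (the same number lie above it).

  2. UQ: the [(m + 1)/2]th point of the upper half, i.e. the [(n − m) + (m + 1)/2]th point overall (for odd n this equals the [(n + 1)/2] + [(m + 1)/2]th point).
  3. LQ: the [(m + 1)/2]th point.

If a position ends in .5, take the mean of the two points either side of it.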
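The position rules can be checked with a short Python sketch. The function names are illustrative. Positions are 1-based. A position ending in .5 is taken as the mean of its two neighbours, which is the usual convention and is assumed here.

```python
def quartile_positions(n):
    """1-based positions (LQ, median, UQ) for n sorted data points."""
    m = n // 2                     # points below (and above) the median's position
    median_pos = (n + 1) / 2
    lq_pos = (m + 1) / 2
    uq_pos = (n - m) + (m + 1) / 2
    return lq_pos, median_pos, uq_pos


def value_at(sorted_data, pos):
    """Value at a 1-based position; a .5 position averages its two neighbours."""
    i = int(pos)
    if pos == i:
        return sorted_data[i - 1]
    return (sorted_data[i - 1] + sorted_data[i]) / 2


data = sorted([3, 1, 4, 1, 5, 9, 2, 6])
print([value_at(data, pos) for pos in quartile_positions(len(data))])
```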
4
Q

What are the formulae for:

  1. s.d. of a population?
  2. s.d. of a sample?
A
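  1. $\sigma = \sqrt{\dfrac{S_{xx}}{n}} = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n}} = \sqrt{\dfrac{\sum x_i^2 - n\bar{x}^2}{n}}$
  2. $s = \sqrt{\dfrac{S_{xx}}{n - 1}} = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{\dfrac{\sum x_i^2 - n\bar{x}^2}{n - 1}}$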
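A Python sketch of both formulae using the $\sum x_i^2 - n\bar{x}^2$ form of $S_{xx}$. The results are compared against the standard library's `statistics.pstdev` (divides by n) and `statistics.stdev` (divides by n − 1). The data and helper names are illustrative.

```python
import math
import statistics


def s_xx(data):
    """Sxx = sum of x_i^2 minus n times the mean squared."""
    n = len(data)
    mean = sum(data) / n
    return sum(x * x for x in data) - n * mean ** 2


def sd_population(data):
    return math.sqrt(s_xx(data) / len(data))          # sigma: divide by n


def sd_sample(data):
    return math.sqrt(s_xx(data) / (len(data) - 1))    # s: divide by n - 1


data = [2, 4, 4, 4, 5, 5, 7, 9]     # illustrative data
print(sd_population(data), statistics.pstdev(data))
print(sd_sample(data), statistics.stdev(data))
```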
5
Q

What are the formulae for:

  1. mean
  2. Sxx

when using grouped data?

A
  1. $\bar{x} = \dfrac{\sum f_i x_i}{\sum f_i}$
  2. $S_{xx} = \sum f_i x_i^2 - \dfrac{(\sum f_i x_i)^2}{n}$, where $n = \sum f_i$
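A minimal Python sketch of the grouped-data formulae. It assumes each $x_i$ is a class's representative value, i.e. the class midpoint when the data are grouped into classes. The frequency table and the function name are illustrative.

```python
def grouped_mean_sxx(values, freqs):
    """Mean and Sxx for a frequency table (values are class midpoints if grouped)."""
    n = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(values, freqs))
    sum_fx2 = sum(f * x * x for x, f in zip(values, freqs))
    mean = sum_fx / n
    sxx = sum_fx2 - sum_fx ** 2 / n
    return mean, sxx


# Illustrative table: classes 0-10, 10-20, 20-30 with midpoints 5, 15, 25
print(grouped_mean_sxx([5, 15, 25], [2, 3, 5]))
```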
6
Q
  1. Where is the tail of a positively skewed distribution?
  2. Where is the tail of a negatively skewed distribution?
A
  1. On the right, towards larger values (away from the y-axis).
  2. On the left, towards smaller values (towards the y-axis).
7
Q

What are the two defining features of a normal distribution?

A
  1. Symmetrical
  2. Bell-shaped
8
Q

If a variable X is coded to a variable Y such that y = ax + b, what are the mean and s.d. of y in terms of the mean and s.d. of x?

A
  1. $\bar{y} = a\bar{x} + b$
  2. $s_y = |a|\,s_x$
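A quick numerical check in Python, assuming illustrative data and coding constants a = −3, b = 7. A negative a is used on purpose to show why the s.d. scales by |a|. Population s.d. (`statistics.pstdev`) is used, though the same scaling holds for the sample s.d.

```python
import statistics

xs = [12, 15, 11, 18, 14]      # illustrative data
a, b = -3, 7                   # coding y = ax + b
ys = [a * x + b for x in xs]

mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
sd_x, sd_y = statistics.pstdev(xs), statistics.pstdev(ys)

print(mean_y, a * mean_x + b)      # the two means agree
print(sd_y, abs(a) * sd_x)         # the s.d. scales by |a|; b has no effect
```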
9
Q

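If two data sets, X (with $n_x$ values) and Y (with $n_y$ values), are combined into one, what are the mean and variance of the combined set?

A
  • mean = $\dfrac{\sum x + \sum y}{n_x + n_y}$
  • variance = $\dfrac{\sum x^2 + \sum y^2}{n_x + n_y} - (\text{mean})^2$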
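A Python sketch that computes the combined mean and variance using only the sums from each data set. It then checks them against the statistics of the merged list. The data sets are illustrative. The variance here divides by the total count, as in the card, which matches `statistics.pvariance`.

```python
import statistics


def combined_mean_var(xs, ys):
    n = len(xs) + len(ys)
    mean = (sum(xs) + sum(ys)) / n
    var = (sum(x * x for x in xs) + sum(y * y for y in ys)) / n - mean ** 2
    return mean, var


xs, ys = [2, 4, 6], [1, 3, 5, 7]     # illustrative data sets
print(combined_mean_var(xs, ys))
print(statistics.mean(xs + ys), statistics.pvariance(xs + ys))
```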
10
Q

What are the formulae for:

  • E(X)
  • E(f(X))
  • E(aX + b)
  • E(f(X) + g(X))
  • Var(X)
  • Var(aX + b)
A
  • $\sum_{i=1}^{n} x_i\,P(X = x_i) = \mu$
  • $\sum_{i=1}^{n} f(x_i)\,P(X = x_i)$
  • $aE(X) + b$
  • $E(f(X)) + E(g(X))$
  • $E(X^2) - \mu^2$
  • $a^2\,\mathrm{Var}(X)$
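A Python sketch that checks these results exactly using fractions. A fair six-sided die serves as an illustrative distribution. The dictionary representation and the helper names `expect` and `var` are assumptions made for the example.

```python
from fractions import Fraction

# Illustrative distribution: a fair six-sided die, P(X = x) = 1/6 for x = 1..6
die = {x: Fraction(1, 6) for x in range(1, 7)}


def expect(dist, f=lambda x: x):
    """E(f(X)) = sum of f(x_i) P(X = x_i)."""
    return sum(f(x) * p for x, p in dist.items())


def var(dist):
    """Var(X) = E(X^2) - mu^2."""
    mu = expect(dist)
    return expect(dist, lambda x: x * x) - mu ** 2


a, b = 3, 2
# Distribution of aX + b: each value x maps to ax + b with the same probability
shifted = {a * x + b: p for x, p in die.items()}

print(expect(die), var(die))                  # mu and Var(X)
print(expect(shifted), a * expect(die) + b)   # E(aX + b) two ways
print(var(shifted), a ** 2 * var(die))        # Var(aX + b) two ways
```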
11
Q

What are the formulae for:

  • Var(X + Y)
  • E(aX + bY)
  • Var(aX + bY)
A
  • $\mathrm{Var}(X) + \mathrm{Var}(Y)$ (X and Y independent)
  • $aE(X) + bE(Y)$
  • $a^2\,\mathrm{Var}(X) + b^2\,\mathrm{Var}(Y)$ (X and Y independent)
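A Python sketch that builds the exact distribution of aX + bY from two small, illustrative independent distributions. It then compares the mean and variance with the formulae. The helper `combine` is hypothetical and relies on independence to multiply probabilities.

```python
from fractions import Fraction
from itertools import product

# Illustrative independent distributions
X = {0: Fraction(1, 2), 1: Fraction(1, 2)}
Y = {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}


def mean(dist):
    return sum(x * p for x, p in dist.items())


def var(dist):
    mu = mean(dist)
    return sum(x * x * p for x, p in dist.items()) - mu ** 2


def combine(dist_x, dist_y, a, b):
    """Distribution of aX + bY, assuming X and Y are independent."""
    out = {}
    for (x, px), (y, py) in product(dist_x.items(), dist_y.items()):
        value = a * x + b * y
        # Independence: P(X = x and Y = y) = P(X = x) P(Y = y)
        out[value] = out.get(value, 0) + px * py
    return out


a, b = 2, -3
Z = combine(X, Y, a, b)
print(mean(Z), a * mean(X) + b * mean(Y))
print(var(Z), a ** 2 * var(X) + b ** 2 * var(Y))
```

Taking b = −3 also shows that the variances still add when the combination subtracts one variable, because b² is positive.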
12
Q

What formulae are used to calculate the expectation and variance:

  1. for the sum of n independent observations of the same variable, $X_1 + X_2 + \dots + X_n$?
  2. for one observation scaled by a factor n, $nX$?
A
  1. $nE(X)$ & $n\,\mathrm{Var}(X)$
  2. $nE(X)$ & $n^2\,\mathrm{Var}(X)$
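The difference between the two cases can be seen exactly with a fair die and n = 3, both illustrative choices. The sum of three independent rolls has a smaller variance than one roll tripled.

```python
from fractions import Fraction
from itertools import product

die = {x: Fraction(1, 6) for x in range(1, 7)}   # illustrative: one fair die roll


def mean(dist):
    return sum(x * p for x, p in dist.items())


def var(dist):
    mu = mean(dist)
    return sum(x * x * p for x, p in dist.items()) - mu ** 2


n = 3

# 1. Sum of n independent rolls X1 + X2 + X3
total = {}
for rolls in product(die, repeat=n):
    p = Fraction(1)
    for r in rolls:
        p *= die[r]
    total[sum(rolls)] = total.get(sum(rolls), 0) + p

# 2. One roll multiplied by n
scaled = {n * x: p for x, p in die.items()}

print(mean(total), var(total))      # n E(X), n Var(X)
print(mean(scaled), var(scaled))    # n E(X), n^2 Var(X)
```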
13
Q

What are the conditions for a discrete uniform distribution?

A

Each value is equally likely to occur, i.e. $P(X = x_i) = \dfrac{1}{n}$ for $i = 1, 2, 3, \dots, n$.

For n consecutive integers a + 1, a + 2, …, a + n (so a is one less than the lowest value included in the distribution):

$E(X) = a + \dfrac{n + 1}{2}$

$\mathrm{Var}(X) = \dfrac{n^2 - 1}{12}$
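An exact check in Python. It assumes, as the formulae require, that the values are the n consecutive integers a + 1, …, a + n. The function name is illustrative.

```python
from fractions import Fraction


def uniform_check(a, n):
    values = range(a + 1, a + n + 1)        # n consecutive integers a+1, ..., a+n
    p = Fraction(1, n)
    mean = sum(x * p for x in values)
    var = sum(x * x * p for x in values) - mean ** 2
    return mean, var


a, n = 4, 5                               # values 5, 6, 7, 8, 9
mean, var = uniform_check(a, n)
print(mean, a + Fraction(n + 1, 2))
print(var, Fraction(n * n - 1, 12))
```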

14
Q

What are the conditions for a discrete geometric distribution?

A
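  • Each trial is either a success or a failure
  • Trials are independent
  • Prob. of success, p, is the same for each trial
  • X is the number of trials up to and including the first success; X ~ Geo(p)
  • $P(X = r) = q^{r-1}p$, where $q = 1 - p$
  • Mode is always 1
  • $P(X \le x) = 1 - q^x$
  • $E(X) = \dfrac{1}{p}$
  • $\mathrm{Var}(X) = \dfrac{q}{p^2}$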
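A Python sketch checking these results for an illustrative p = 1/4. The cumulative formula is checked exactly with fractions. E(X) and Var(X) are infinite sums, so they are approximated here by truncating at r = 500, where the remaining terms are negligible for this p.

```python
import math
from fractions import Fraction

p = Fraction(1, 4)          # illustrative probability of success
q = 1 - p


def geo_pmf(r):
    """P(X = r) = q^(r-1) p, for r = 1, 2, 3, ..."""
    return q ** (r - 1) * p


# P(X <= x) = 1 - q^x holds exactly for each x checked
for x in range(1, 11):
    assert sum(geo_pmf(r) for r in range(1, x + 1)) == 1 - q ** x

# E(X) and Var(X) as truncated float sums; terms beyond r = 500 are negligible here
pf, qf = float(p), float(q)
mean = sum(r * qf ** (r - 1) * pf for r in range(1, 501))
ex2 = sum(r * r * qf ** (r - 1) * pf for r in range(1, 501))
print(mean, 1 / pf)
print(ex2 - mean ** 2, qf / pf ** 2)
```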
15
Q

What are the conditions for the binomial model?

A
  • Fixed, finite number of trials, n
  • Each trial is a success or failure
  • Trials are independent
  • Prob. of success, p, is the same for each trial
  • The discrete random variable X gives the number of successes in n trials
  • X ~ Bin(n, p)
  • $P(X = r) = \binom{n}{r} p^r q^{n-r}$, where $q = 1 - p$
  • E(X) = np
  • Var(X) = npq
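An exact check of the binomial results using `math.comb` and fractions. The values n = 10 and p = 3/10 are illustrative.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(r, n, p):
    """P(X = r) = nCr p^r q^(n - r)."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)


n, p = 10, Fraction(3, 10)     # illustrative
probs = [binomial_pmf(r, n, p) for r in range(n + 1)]
mean = sum(r * pr for r, pr in enumerate(probs))
var = sum(r * r * pr for r, pr in enumerate(probs)) - mean ** 2
print(mean, n * p)
print(var, n * p * (1 - p))
```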
16
Q

What is mutual exclusivity?

A

A and B are mutually exclusive if they cannot both happen, so:

$P(A \cap B) = 0$

$P(A \cup B) = P(A) + P(B)$

17
Q

In general, what is the formula for $P(A \cup B)$?

A

$P(A \cup B) = P(A) + P(B) - P(A \cap B)$
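Both the general addition rule and the mutually exclusive special case can be checked by counting outcomes. This sketch uses one roll of a fair die with illustrative events, represented as Python sets.

```python
from fractions import Fraction

omega = set(range(1, 7))          # sample space: one roll of a fair die (illustrative)


def P(event):
    """Probability of an event (a subset of omega) when outcomes are equally likely."""
    return Fraction(len(event), len(omega))


A = {2, 4, 6}   # even score
B = {4, 5, 6}   # score of at least 4
C = {1, 3}      # cannot happen together with A

print(P(A | B), P(A) + P(B) - P(A & B))   # general addition rule
print(P(A & C), P(A | C), P(A) + P(C))    # mutually exclusive: P(A and C) = 0
```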

18
Q

How can you show that the probability of event A happening is independent of event B happening?

A

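$P(B \mid A) = \dfrac{P(B \cap A)}{P(A)}$

If A and B are independent, $P(A \cap B) = P(A) \times P(B)$

Then: $P(B \mid A) = \dfrac{P(A) \times P(B)}{P(A)} = P(B)$, i.e. knowing that A has happened does not change the probability of B.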
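A counting check of the independence argument, using two fair dice as an illustrative sample space. Event A is "first die shows 6" and event B is "second die is even". The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Sample space: two fair dice, all 36 outcomes equally likely (illustrative)
omega = list(product(range(1, 7), repeat=2))


def P(event):
    return Fraction(sum(1 for w in omega if event(w)), len(omega))


def first_is_six(w):       # event A
    return w[0] == 6


def second_is_even(w):     # event B
    return w[1] % 2 == 0


def both(w):               # event A and B
    return first_is_six(w) and second_is_even(w)


p_b_given_a = P(both) / P(first_is_six)
print(P(both), P(first_is_six) * P(second_is_even))   # P(A and B) = P(A) x P(B)
print(p_b_given_a, P(second_is_even))                  # P(B | A) = P(B)
```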

19
Q

How do you find μ and σ when given two probabilities?

A

Let X ~ N(μ, σ²) and let the given probabilities be:

$P(X > a) = p_1$ and $P(X > b) = p_2$

  1. Standardise each value: $z_1 = \dfrac{a - \mu}{\sigma}$ and $z_2 = \dfrac{b - \mu}{\sigma}$.
  2. Write each z-value using the inverse normal function: $z_1 = \Phi^{-1}(1 - p_1)$ and $z_2 = \Phi^{-1}(1 - p_2)$.
  3. Equate the expressions from steps 1 and 2 to get two simultaneous equations in μ and σ.
  4. Eliminate μ or σ to solve for the other, then substitute back to find the one you eliminated.
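The four steps can be sketched in Python with `statistics.NormalDist().inv_cdf` as $\Phi^{-1}$. The function name and the example probabilities are illustrative. Here σ is found first by subtracting the two equations, which eliminates μ.

```python
from statistics import NormalDist


def solve_mu_sigma(a, p1, b, p2):
    """Find mu, sigma of X ~ N(mu, sigma^2) given P(X > a) = p1 and P(X > b) = p2."""
    z1 = NormalDist().inv_cdf(1 - p1)   # (a - mu) / sigma
    z2 = NormalDist().inv_cdf(1 - p2)   # (b - mu) / sigma
    # Subtracting (b - mu = sigma z2) from (a - mu = sigma z1) eliminates mu
    sigma = (a - b) / (z1 - z2)
    mu = a - sigma * z1
    return mu, sigma


print(solve_mu_sigma(60, 0.2, 40, 0.9))
```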
20
Q

How can we approximate the Binomial distribution with the Normal distribution and what are the conditions that make this a good approximation (including continuity corrections)?

A

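If X ~ Bin(n, p), we can approximate X by Y ~ N(np, npq).

This is a good approximation when np > 5 and nq > 5.

Continuity corrections:

If a boundary is inclusive, extend the normal range by 0.5 beyond it: 0.5 above an upper boundary or 0.5 below a lower boundary, e.g. P(X ≤ 8) ≈ P(Y < 8.5) and P(X ≥ 3) ≈ P(Y > 2.5).

If a boundary is exclusive, shrink the normal range by 0.5: 0.5 below an upper boundary or 0.5 above a lower boundary, e.g. P(X < 8) ≈ P(Y < 7.5) and P(X > 3) ≈ P(Y > 3.5).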
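A Python comparison of the exact binomial probability with the normal approximation, with and without a continuity correction. It assumes illustrative values n = 40 and p = 0.3 (so np = 12 and nq = 28). Note that `NormalDist` takes the standard deviation, so √(npq) is passed rather than the variance.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 40, 0.3                 # illustrative: np = 12 and nq = 28, both > 5
q = 1 - p

# Exact binomial P(X <= 8)
exact = sum(comb(n, r) * p ** r * q ** (n - r) for r in range(0, 9))

# Y ~ N(np, npq); NormalDist takes the standard deviation, so use sqrt(npq)
Y = NormalDist(mu=n * p, sigma=sqrt(n * p * q))
with_correction = Y.cdf(8.5)   # inclusive upper boundary 8 -> 8.5
without_correction = Y.cdf(8)

print(exact, with_correction, without_correction)
```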

21
Q

What are the steps of a normal/binomial hypothesis test?

A
  1. Define the random variable, stating n but leaving p as the unknown parameter, and state the assumptions that make the trials independent.
  2. State the hypotheses and the distribution under H0.
  3. State the significance level (%) and type of test (one- or two-tailed), and the rejection criterion, e.g. ‘The test value, x, lies in the critical region if P(X ≥ x) < 5%.’
  4. Calculate the required probability and make a conclusion.
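A Python sketch of steps 3 and 4 for a one-tailed binomial test. The scenario is hypothetical: H0: p = 0.25, H1: p > 0.25, n = 20 trials, x = 10 observed successes, 5% level. The helper name is illustrative.

```python
from math import comb


def binomial_upper_tail(n, p, x):
    """P(X >= x) for X ~ Bin(n, p)."""
    return sum(comb(n, r) * p ** r * (1 - p) ** (n - r) for r in range(x, n + 1))


# Hypothetical test: H0: p = 0.25, H1: p > 0.25 (one-tailed), 5% level,
# n = 20 trials with x = 10 observed successes
n, p0, x, alpha = 20, 0.25, 10, 0.05
tail = binomial_upper_tail(n, p0, x)
if tail < alpha:
    print(f"P(X >= {x}) = {tail:.4f} < {alpha}: reject H0")
else:
    print(f"P(X >= {x}) = {tail:.4f} >= {alpha}: do not reject H0")
```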
22
Q

When is taking a census impractical and why?

A

When the population is large.

It is then time-consuming, expensive and difficult to carry out accurately.

23
Q

What are the advantages of taking a sample survey?

A

Can get data quickly and cheaply

Can give accurate indications if sample is representative

24
Q

What are some sources of bias?

A
  • Bad sampling frame
  • Wrong sampling unit
  • Non-response by some of the units
  • Bias from person conducting survey
25
Q

What is simple random sampling?

A

Assigning a number to every unit in the sampling frame and picking numbers randomly, without replacement.
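A minimal Python sketch, assuming a hypothetical sampling frame of 100 units. `random.Random.sample` picks distinct numbers, which gives sampling without replacement. The seed only makes the run repeatable.

```python
import random

frame = [f"unit {i}" for i in range(1, 101)]           # hypothetical sampling frame of 100 units
rng = random.Random(2024)                               # seeded only so the run is repeatable
chosen = rng.sample(range(1, len(frame) + 1), k=10)     # 10 distinct numbers: no replacement
sample = [frame[i - 1] for i in chosen]
print(sample)
```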

26
Q

What is systematic sampling?

A

List the population in some order, pick a random starting point among the first k members, then choose every kth member after it.
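A Python sketch of systematic sampling. It assumes k is taken as the population size divided by the sample size, rounded down. The function name and the population are illustrative.

```python
import random


def systematic_sample(population, n, rng):
    k = len(population) // n          # sampling interval
    start = rng.randrange(k)          # random start among the first k members (0-based)
    return population[start::k][:n]   # then every kth member


print(systematic_sample(list(range(1, 101)), 10, random.Random(7)))
```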

27
Q

What is stratified sampling?

A

Split the population into strata (e.g. age groups), then take a simple random sample from each stratum, with each stratum's sample size proportional to its share of the population.
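A Python sketch of proportional stratified sampling over hypothetical age-group strata. Each stratum's share is rounded to a whole number. With other stratum sizes, this rounding can make the total differ slightly from the requested n, so a real allocation may need adjusting.

```python
import random


def stratified_sample(strata, n, rng):
    """Proportional stratified sample of about n units from a dict of strata."""
    N = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        size = round(n * len(members) / N)        # stratum's share of the sample
        sample[name] = rng.sample(members, size)  # simple random sample within the stratum
    return sample


# Hypothetical strata (age groups) of sizes 50, 30 and 20
strata = {
    "under 30": [f"A{i}" for i in range(50)],
    "30 to 59": [f"B{i}" for i in range(30)],
    "60+": [f"C{i}" for i in range(20)],
}
print(stratified_sample(strata, 10, random.Random(1)))
```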

28
Q

What is cluster sampling?

A

Population naturally split into clusters. Random sampling used to determine which clusters to sample, and then sample within each chosen cluster.

29
Q

What is quota sampling?

A

The population is split into subgroups and a set number (quota) is chosen from each subgroup, not necessarily at random. Used when no sampling frame is available.

30
Q

What are the conditions for a Poisson distribution?

A
  • Events occur singly and at random in a given interval of time or space.
  • λ, the mean number of occurrences in the given interval, is known and finite.
  • The number of occurrences in the given interval is X ~ Po(λ)
  • $P(X = x) = \dfrac{e^{-\lambda}\lambda^x}{x!}$
  • E(X) = λ & Var(X) = λ
  • If λ is an integer, there are two modes, λ − 1 & λ
  • If λ is not an integer, the mode is the greatest integer less than λ
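A Python sketch of the probability function, with numerical checks of the mean, variance and mode rules. The sums are truncated at x = 100, which is ample for the small illustrative λ values used here.

```python
from math import exp, factorial, isclose


def poisson_pmf(x, lam):
    """P(X = x) = e^(-lam) lam^x / x! for X ~ Po(lam)."""
    return exp(-lam) * lam ** x / factorial(x)


def modes(lam, upto=100):
    """Values of x (below upto) with the largest probability."""
    probs = [poisson_pmf(x, lam) for x in range(upto)]
    best = max(probs)
    return [x for x, p in enumerate(probs) if isclose(p, best)]


lam = 2.7                                            # illustrative rate
mean = sum(x * poisson_pmf(x, lam) for x in range(100))
ex2 = sum(x * x * poisson_pmf(x, lam) for x in range(100))
print(mean, ex2 - mean ** 2)        # both close to lam
print(modes(2.7), modes(3))         # [2] and [2, 3]
```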
31
Q

When and how can we approximate the Binomial distribution by a Poisson distribution?

A

When n is large (> 50) and p is small (< 0.1), X ~ Bin(n, p) can be approximated by Po(np).
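A Python comparison of Bin(n, p) with Po(np), assuming illustrative values n = 100 and p = 0.02 that meet the card's conditions.

```python
from math import comb, exp, factorial

n, p = 100, 0.02                 # illustrative: n > 50, p < 0.1
lam = n * p                      # Po(np)


def binomial_pmf(x):
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def poisson_pmf(x):
    return exp(-lam) * lam ** x / factorial(x)


for x in range(6):
    print(x, round(binomial_pmf(x), 4), round(poisson_pmf(x), 4))
```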

32
Q

If X~Po(λ) and Y~Po(μ), what is the distribution of X + Y (assuming X and Y are independent)?

A

X + Y ~ Po(λ + μ)
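A numerical check in Python, with illustrative means 1.5 and 2.5. Because X and Y are independent, P(X + Y = k) is the sum over i of P(X = i) P(Y = k − i). This sum should match the Po(λ + μ) probability.

```python
from math import exp, factorial, isclose


def poisson_pmf(x, lam):
    return exp(-lam) * lam ** x / factorial(x)


lam, mu = 1.5, 2.5        # illustrative means of independent X and Y

# P(X + Y = k) = sum over i of P(X = i) P(Y = k - i), using independence
for k in range(8):
    conv = sum(poisson_pmf(i, lam) * poisson_pmf(k - i, mu) for i in range(k + 1))
    print(k, conv, poisson_pmf(k, lam + mu))
```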

33
Q

When is the least squares regression line of x on y used instead of y on x?

A
  • When neither variable is controlled (i.e. neither is the independent variable) and you want to interpolate a value of x for a given value of y.
  • When y is the controlled (independent) variable, whether you want to interpolate x given y or y given x.
  • The line of y on x is used in the opposite situations.
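A Python sketch computing both least squares lines for illustrative data. It uses the standard gradients Sxy/Sxx (y on x) and Sxy/Syy (x on y). Both lines pass through the point of means.

```python
def regression_lines(xs, ys):
    """Return (a, b) for y = a + bx and (c, d) for x = c + dy."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx            # y on x: gradient Sxy / Sxx
    a = my - b * mx
    d = sxy / syy            # x on y: gradient Sxy / Syy
    c = mx - d * my
    return (a, b), (c, d)


xs = [1, 2, 3, 4, 5]          # illustrative data
ys = [2, 4, 5, 4, 5]
(a, b), (c, d) = regression_lines(xs, ys)
print(f"y on x: y = {a:.2f} + {b:.2f}x")
print(f"x on y: x = {c:.2f} + {d:.2f}y")
```

Unless the points lie exactly on a straight line, the two lines differ, which is why the choice between them matters.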
34
Q

How do you carry out a significance test for the PMCC?

A
  1. H0: ρ = 0. H1: for a one-tailed test, ρ > 0 (testing for positive correlation) or ρ < 0 (testing for negative correlation); for a two-tailed test, ρ ≠ 0.
  2. State the significance level and read the critical value for the sample size from tables.
  3. Reject H0 if r is beyond the critical value in the direction of H1 (r > cv for ρ > 0, r < −cv for ρ < 0, |r| > cv for ρ ≠ 0) and make a conclusion.
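A Python sketch that computes r and applies the rejection rule. The critical value is deliberately left as an input: it must still be read from tables for the sample size and significance level. The function names and the `tail` labels are illustrative.

```python
import math


def pmcc(xs, ys):
    """Product moment correlation coefficient r = Sxy / sqrt(Sxx Syy)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def reject_h0(r, critical_value, tail):
    """tail: 'positive' (H1: rho > 0), 'negative' (H1: rho < 0) or 'two' (H1: rho != 0)."""
    if tail == "positive":
        return r > critical_value
    if tail == "negative":
        return r < -critical_value
    return abs(r) > critical_value


print(pmcc([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))
```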
35
Q

How do you carry out a Spearman’s rank hypothesis test?

A
  1. State H0 ($\rho_s = 0$) and H1 ($\rho_s > 0$, $\rho_s < 0$ or $\rho_s \ne 0$).
  2. State the significance level and type of test.
  3. State the rejection criterion (e.g. the sample size is n and the critical value from tables is ‘a’, so reject H0 if $r_s > a$).
  4. Calculate $r_s$:
    • Rank all points by one variable, then by the other. If k values of a variable are tied, give each of them the mean of the ranks they would occupy, i.e. the lowest of those ranks + (k − 1)/2.
    • Square the difference between each point's two ranks.
    • Apply the formula in the booklet: $r_s = 1 - \dfrac{6\sum d^2}{n(n^2 - 1)}$.
  5. Make a conclusion.
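A Python sketch of step 4. It assumes rank 1 is given to the smallest value; ranking from the largest gives the same $r_s$, provided both variables are ranked the same way. Tied values share the mean of their ranks. The $1 - 6\sum d^2 / (n(n^2 - 1))$ formula is exact only when there are no ties, and with ties it is the usual approximation. The function names and data are illustrative.

```python
def ranks(values):
    """Rank 1 = smallest; tied values share the mean of the ranks they occupy."""
    order = sorted(values)
    rank_of = {}
    i = 0
    while i < len(order):
        j = i
        while j < len(order) and order[j] == order[i]:
            j += 1
        k = j - i                                  # number of tied values
        rank_of[order[i]] = (i + 1) + (k - 1) / 2  # lowest rank + (k - 1)/2
        i = j
    return [rank_of[v] for v in values]


def spearman(xs, ys):
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))  # sum of squared rank differences
    return 1 - 6 * d2 / (n * (n * n - 1))


print(spearman([56, 75, 45, 71, 61], [66, 70, 40, 60, 65]))
```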