Statistics Flashcards by Shrey Shah

What two things must you do when making a stemplot?

Use equal intervals.
Give a key with an example to show how to read it.

What are the two definitions of outliers?

More than 1.5 × the IQR below the lower quartile or above the upper quartile.
More than 2 standard deviations above or below the mean.
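
The two outlier rules can be sketched in Python as follows. This is only an illustration: the function names and the data set are made up, the quartiles are passed in by hand so that no particular quartile convention is assumed, and the population s.d. is used for the second rule.

```python
import statistics

def outliers_by_iqr(data, lq, uq):
    """Values more than 1.5 x IQR below the LQ or above the UQ."""
    iqr = uq - lq
    low, high = lq - 1.5 * iqr, uq + 1.5 * iqr
    return [x for x in data if x < low or x > high]

def outliers_by_sd(data):
    """Values more than 2 standard deviations from the mean."""
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)  # assumption: population s.d.
    return [x for x in data if abs(x - mean) > 2 * sd]

# Hypothetical data set with LQ = 3 and UQ = 5.5.
data = [2, 3, 3, 4, 5, 5, 6, 30]
iqr_result = outliers_by_iqr(data, lq=3, uq=5.5)
sd_result = outliers_by_sd(data)
```

For this data both rules flag only the value 30.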

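If there are n data points, which data point is the:

1. median?
2. UQ?
3. LQ?

1. The $\frac{n+1}{2}$th point.

Let m be the number of points strictly below (equivalently, strictly above) the median.

2. The $\left(\frac{n+1}{2} + \frac{m+1}{2}\right)$th point when n is odd; in general, the $\left(n - m + \frac{m+1}{2}\right)$th point.
3. The $\frac{m+1}{2}$th point.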
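
As a rough check of these positions, here is a small Python sketch. It assumes the convention that the quartiles are the medians of the m points below and the m points above the median, and that a position ending in .5 means the average of the two neighbouring points. The function names are illustrative only.

```python
def quartile_positions(n):
    """1-based positions (LQ, median, UQ) in sorted data of size n."""
    median = (n + 1) / 2
    m = n // 2                      # points strictly below the median
    lq = (m + 1) / 2                # median of the lower m points
    uq = (n - m) + (m + 1) / 2      # median of the upper m points
    return lq, median, uq

def value_at(sorted_data, position):
    """Value at a 1-based position; a .5 position averages two neighbours."""
    lower = sorted_data[int(position) - 1]
    upper = sorted_data[int(position + 0.5) - 1]
    return (lower + upper) / 2

odd_case = quartile_positions(7)
even_case = quartile_positions(8)
```

For n = 7 the positions are the 2nd, 4th and 6th points; for n = 8 they are the 2.5th, 4.5th and 6.5th.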

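What are the formulae for:

1. s.d. of a population?
2. s.d. of a sample?

1. $\sigma = \sqrt{\dfrac{S_{xx}}{n}} = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n}} = \sqrt{\dfrac{\sum x_i^2 - n\bar{x}^2}{n}}$
2. $s = \sqrt{\dfrac{S_{xx}}{n-1}} = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n-1}} = \sqrt{\dfrac{\sum x_i^2 - n\bar{x}^2}{n-1}}$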
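
A short Python sketch of the two standard deviations, checked against the standard library's `statistics.pstdev` (population) and `statistics.stdev` (sample). The data set is a made-up example.

```python
import math
import statistics

def s_xx(data):
    """Sum of squared deviations from the mean."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** 2 for x in data)

# Hypothetical data: mean 5, S_xx = 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)

sigma = math.sqrt(s_xx(data) / n)        # population s.d. (divide by n)
s = math.sqrt(s_xx(data) / (n - 1))      # sample s.d. (divide by n - 1)

# Shortcut form: S_xx = sum of squares - n * mean^2
mean = sum(data) / n
s_xx_shortcut = sum(x * x for x in data) - n * mean ** 2
```

The sample s.d. is always slightly larger than the population s.d. for the same data, because it divides by n - 1 rather than n.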

What are the formulae for:

1. mean
2. $S_{xx}$

when using grouped data?

1. $\bar{x} = \dfrac{\sum f_i x_i}{\sum f_i}$
2. $S_{xx} = \sum x_i^2 f_i - \dfrac{\left(\sum f_i x_i\right)^2}{n}$, where $n = \sum f_i$
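
The grouped-data formulae can be sketched in Python like this. The function name and the frequency table are illustrative; for class intervals, the x values would be the class midpoints.

```python
def grouped_mean_and_sxx(values, freqs):
    """Mean and S_xx from values x_i with frequencies f_i."""
    n = sum(freqs)                                            # n = sum of f_i
    sum_fx = sum(f * x for x, f in zip(values, freqs))        # sum of f_i x_i
    sum_fx2 = sum(f * x * x for x, f in zip(values, freqs))   # sum of f_i x_i^2
    mean = sum_fx / n
    sxx = sum_fx2 - sum_fx ** 2 / n
    return mean, sxx

# Hypothetical table: value 1 occurs twice, 2 five times, 3 three times.
mean, sxx = grouped_mean_and_sxx([1, 2, 3], [2, 5, 3])
```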

1. Where is the tail for a positively skewed distribution?
2. Where is the tail for a negatively skewed distribution?

1. On the right, towards higher values (away from the y-axis).
2. On the left, towards lower values (close to the y-axis).

What are the two defining features of a normal distribution?

Symmetrical
Bell-shaped

If a variable X is coded to a variable Y such that y = ax + b, what are the mean and s.d. of y in terms of the mean and s.d. of x?

$\bar{y} = a\bar{x} + b$
$s_y = |a| s_x$

If the data for two variables, X and Y, are combined into one set, what are the new mean and variance?

mean = $\dfrac{\sum x + \sum y}{n_x + n_y}$
variance = $\dfrac{\sum x^2 + \sum y^2}{n_x + n_y} - (\text{mean})^2$
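
A minimal Python sketch of combining two data sets from their summary totals. The function name and the numbers are illustrative, and the variance here is the population form (dividing by the combined size).

```python
import statistics

def combine(sum_x, sum_x2, n_x, sum_y, sum_y2, n_y):
    """Mean and variance of the pooled data from the two sets of totals."""
    n = n_x + n_y
    mean = (sum_x + sum_y) / n
    variance = (sum_x2 + sum_y2) / n - mean ** 2
    return mean, variance

# Hypothetical data sets.
xs, ys = [1, 2, 3], [4, 5]
mean, variance = combine(
    sum(xs), sum(x * x for x in xs), len(xs),
    sum(ys), sum(y * y for y in ys), len(ys),
)
```

The result agrees with computing the mean and variance of the pooled list `xs + ys` directly.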

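What are the formulae for:

1. E(X)
2. E(f(X))
3. E(aX + b)
4. E(f(X) + g(X))
5. Var(X)
6. Var(aX + b)

1. $E(X) = \sum_{i=1}^{n} x_i P(X = x_i) = \mu$
2. $E(f(X)) = \sum_{i=1}^{n} f(x_i) P(X = x_i)$
3. $E(aX + b) = a E(X) + b$
4. $E(f(X) + g(X)) = E(f(X)) + E(g(X))$
5. $Var(X) = E(X^2) - \mu^2$
6. $Var(aX + b) = a^2 Var(X)$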
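
These results can be checked numerically. The sketch below represents a discrete distribution as a `{value: probability}` dictionary (an assumption of this example) and uses exact fractions to avoid rounding; the fair die is a made-up illustration.

```python
from fractions import Fraction

def expectation(dist, f=lambda x: x):
    """E(f(X)) for a discrete distribution given as {value: probability}."""
    return sum(f(x) * p for x, p in dist.items())

def variance(dist):
    """Var(X) = E(X^2) - mu^2."""
    mu = expectation(dist)
    return expectation(dist, lambda x: x * x) - mu ** 2

# Hypothetical example: a fair six-sided die.
die = {x: Fraction(1, 6) for x in range(1, 7)}

# Distribution of 2X + 3: values are transformed, probabilities unchanged.
scaled = {2 * x + 3: p for x, p in die.items()}
```

Here E(X) = 7/2 and Var(X) = 35/12, while 2X + 3 has expectation 2(7/2) + 3 = 10 and variance 4(35/12) = 35/3. The added constant shifts the mean but leaves the variance unchanged.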

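What are the formulae for:

1. Var(X + Y)
2. E(aX + bY)
3. Var(aX + bY)

1. Var(X + Y) = Var(X) + Var(Y), provided X and Y are independent
2. E(aX + bY) = a E(X) + b E(Y)
3. Var(aX + bY) = a² Var(X) + b² Var(Y), provided X and Y are independent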

What formulae are used to calculate expectation and variance:

1. for the sum of n independent observations of the same variable, X₁ + X₂ + … + Xₙ?
2. for scaling one observation by a factor n, i.e. nX?

1. n E(X) & n Var(X)
2. n E(X) & n² Var(X)

What are the conditions for a discrete uniform distribution?

Each value is equally likely to occur, i.e. $P(X = x_i) = \frac{1}{n}$ for i = 1, 2, 3, …, n

$E(X) = a + \frac{n+1}{2}$, where the values are the consecutive integers a + 1, a + 2, …, a + n (so a is one less than the lowest value in the distribution)

$Var(X) = \frac{n^2 - 1}{12}$

What are the conditions for a discrete geometric distribution?

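Each trial's outcome is either success or failure
Trials are independent
Prob. of success, p, is the same for each trial (q = 1 - p)
Dis. rand. var. X gives the no. of trials up to and including the first success
X ~ Geo(p)
$P(X = r) = q^{r-1} p$ for r = 1, 2, 3, …
Mode is always 1
$P(X \le x) = 1 - q^x$
$E(X) = \frac{1}{p}$
$Var(X) = \frac{q}{p^2}$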
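
A small Python sketch of the geometric probabilities, where X counts the trials up to and including the first success. The function names and the value p = 0.25 are illustrative.

```python
def geo_pmf(r, p):
    """P(X = r): r - 1 failures followed by one success."""
    return (1 - p) ** (r - 1) * p

def geo_cdf(x, p):
    """P(X <= x) = 1 - q^x: at least one success in the first x trials."""
    return 1 - (1 - p) ** x

p = 0.25  # hypothetical probability of success
first_three = [geo_pmf(r, p) for r in (1, 2, 3)]
```

The probabilities decrease as r increases, which is why the mode is always 1, and summing the first x of them gives the same value as the closed form 1 - q^x.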

What are the conditions for the binomial model?

Finite number of trials, n
Each trial is a success or failure
Trials are independent
Prob. of success, p, is the same for each trial (q = 1 - p)
Dis. rand. var. X gives no. of successful outcomes in n trials
X ~ Bin(n, p)
$P(X = r) = \binom{n}{r} q^{n-r} p^r$
E(X) = np
Var(X) = npq
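
The binomial formulae can be checked with a short Python sketch using `math.comb` for nCr. The function name and the values n = 10, p = 0.3 are illustrative.

```python
from math import comb

def binom_pmf(r, n, p):
    """P(X = r) for X ~ Bin(n, p)."""
    q = 1 - p
    return comb(n, r) * q ** (n - r) * p ** r

n, p = 10, 0.3  # hypothetical parameters
pmf = [binom_pmf(r, n, p) for r in range(n + 1)]

total = sum(pmf)                                            # should be 1
mean = sum(r * pr for r, pr in enumerate(pmf))              # should be np
var = sum(r * r * pr for r, pr in enumerate(pmf)) - mean ** 2  # should be npq
```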

What is mutual exclusivity?

A and B are mutually exclusive if they cannot both occur:

$P(A \cap B) = 0$

so $P(A \cup B) = P(A) + P(B)$

In general, what is the formula for $P(A \cup B)$?

$P(A \cup B) = P(A) + P(B) - P(A \cap B)$

How can you show that the probability of event A happening is independent of event B happening?

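Show that $P(B \mid A) = P(B)$, or equivalently that $P(A \cap B) = P(A) \times P(B)$.

By definition, $P(B \mid A) = \dfrac{P(B \cap A)}{P(A)}$

If A and B are independent, $P(A \cap B) = P(A) \times P(B)$

Then: $P(B \mid A) = \dfrac{P(A) \times P(B)}{P(A)} = P(B)$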

How to find μ & σ when given two probabilities?

Let the given probabilities be:

$P(X > a) = p_1$ and $P(X > b) = p_2$

1. Standardise each boundary: $z_1 = \frac{a - \mu}{\sigma}$ and $z_2 = \frac{b - \mu}{\sigma}$.
2. Write each z in terms of $\Phi^{-1}$: since $P(Z > z_1) = p_1$, $z_1 = \Phi^{-1}(1 - p_1)$, and likewise $z_2 = \Phi^{-1}(1 - p_2)$.
3. Equate the expressions from the first two steps to get two simultaneous equations in μ and σ.
4. Eliminate μ or σ to solve for the other, then find the one you eliminated.
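
The steps above can be sketched in Python with the standard library's `statistics.NormalDist`, whose `inv_cdf` method plays the role of the inverse normal table. The function name and the test distribution N(50, 10²) are made up for illustration.

```python
from statistics import NormalDist

def solve_mu_sigma(a, p1, b, p2):
    """Given P(X > a) = p1 and P(X > b) = p2, return (mu, sigma)."""
    std = NormalDist()               # standard normal, Z ~ N(0, 1)
    z1 = std.inv_cdf(1 - p1)         # z-value with P(Z > z1) = p1
    z2 = std.inv_cdf(1 - p2)         # z-value with P(Z > z2) = p2
    # a = mu + z1*sigma and b = mu + z2*sigma; subtracting eliminates mu.
    sigma = (a - b) / (z1 - z2)
    mu = a - z1 * sigma
    return mu, sigma

# Hypothetical check: generate the two probabilities from N(50, 10^2).
true_dist = NormalDist(50, 10)
p1 = 1 - true_dist.cdf(60)
p2 = 1 - true_dist.cdf(45)
mu, sigma = solve_mu_sigma(60, p1, 45, p2)
```

The recovered values should match the original mean 50 and s.d. 10 up to rounding error.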

How can we approximate the Binomial distribution with the Normal distribution and what are the conditions that make this a good approximation (including continuity corrections)?

If X ~ Bin(n, p), we can make the approximation:

X ≈ Y, where Y ~ N(np, npq)

Only if np > 5 and nq > 5

Continuity corrections:

If the boundary is inclusive, extend the normal range by 0.5 beyond it: $P(X \le k) \approx P(Y < k + 0.5)$ and $P(X \ge k) \approx P(Y > k - 0.5)$.

If the boundary is exclusive, pull the normal range in by 0.5: $P(X < k) \approx P(Y < k - 0.5)$ and $P(X > k) \approx P(Y > k + 0.5)$.
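
A rough Python comparison of an exact binomial probability with its normal approximation, using a continuity correction on an inclusive upper boundary. The function names and the values n = 40, p = 0.4, k = 18 are illustrative (here np = 16 and nq = 24, so both exceed 5).

```python
from math import comb, sqrt
from statistics import NormalDist

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Bin(n, p)."""
    q = 1 - p
    return sum(comb(n, r) * p ** r * q ** (n - r) for r in range(k + 1))

def normal_approx_cdf(k, n, p):
    """Approximate P(X <= k) by P(Y < k + 0.5), Y ~ N(np, npq)."""
    q = 1 - p
    y = NormalDist(n * p, sqrt(n * p * q))  # NormalDist takes the s.d., not the variance
    return y.cdf(k + 0.5)

n, p, k = 40, 0.4, 18  # hypothetical values
exact = binom_cdf(k, n, p)
approx = normal_approx_cdf(k, n, p)
```

Note that `NormalDist` is parameterised by the standard deviation, so the variance npq has to be square-rooted first.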

What are the steps of a normal/binomial hypothesis test?

1. Define the variable, stating n but leaving p as an unknown parameter, and state the assumptions that make the trials independent.
2. State the hypotheses and the distribution under H₀.
3. State the significance level (%) and type (one/two-tailed), as well as the rejection criterion, e.g. ‘The test value, x, will lie in the critical region if P(X ≥ x) < 5%.’
4. Calculate the required probability and make a conclusion.
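
As an illustration of the calculation step, here is a minimal Python sketch of a one-tailed (upper) binomial test using the rejection criterion quoted above. The function names, the null value p = 0.5, and n = 20 are assumptions made for the example.

```python
from math import comb

def upper_tail(x, n, p):
    """P(X >= x) for X ~ Bin(n, p)."""
    q = 1 - p
    return sum(comb(n, r) * p ** r * q ** (n - r) for r in range(x, n + 1))

def reject_h0(x, n, p0, level=0.05):
    """Upper-tailed test of H0: p = p0 against H1: p > p0."""
    return upper_tail(x, n, p0) < level

# Hypothetical test: 20 trials, H0: p = 0.5, 5% level.
n, p0 = 20, 0.5
decision_15 = reject_h0(15, n, p0)   # P(X >= 15) is about 0.021
decision_14 = reject_h0(14, n, p0)   # P(X >= 14) is about 0.058
```

With these values, 15 successes lies in the critical region but 14 does not, so the critical region at the 5% level is X ≥ 15.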

When is taking a census impractical and why?

When the population size is large.

It is time-consuming, expensive and difficult to carry out accurately.

What are the advantages of taking a sample survey?

Can get data quickly and cheaply

Can give accurate indications if sample is representative

What are some sources of bias?

Bad sampling frame
Wrong sampling unit
Non-response by some of the units
Bias from person conducting survey

What is simple random sampling?

Assigning a number to every unit in the sampling frame and picking numbers randomly, without replacement.

What is systematic sampling?

List the population in some order and choose every k^th member after picking a random starting point.

What is stratified sampling?

Split the population into strata, e.g. age groups, and take a simple random sample within each stratum, with sample sizes proportional to the stratum sizes.

What is cluster sampling?

Population naturally split into clusters. Random sampling used to determine which clusters to sample, and then sample within each chosen cluster.

What is quota sampling?

The population is split into subgroups and a set number (quota) is chosen from each subgroup, not necessarily at random. Used if no sampling frame is available.

What are the conditions for a Poisson distribution?

* Events occur singly and at random in a given interval of time or space.
* λ, the mean number of occurrences in the given interval, is known and finite.
* The number of occurrences in the given interval is X ~ Po(λ).
* $P(X = x) = e^{-\lambda} \dfrac{\lambda^x}{x!}$
* E(X) = λ & Var(X) = λ
* If λ is an integer, there are two modes, λ - 1 & λ.
* If λ is not an integer, the mode is the integer below λ.

When and how can we approximate the Binomial distribution as a Poisson distribution?

When n is large (> 50) and p is small (< 0.1), X ~ Bin(n, p) can be approximated by Po(np).
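
A short Python sketch comparing binomial probabilities with their Poisson approximation. The function names and the values n = 100, p = 0.02 (so λ = np = 2) are illustrative.

```python
from math import comb, exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Po(lam)."""
    return exp(-lam) * lam ** x / factorial(x)

def binom_pmf(r, n, p):
    """P(X = r) for X ~ Bin(n, p)."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

n, p = 100, 0.02          # hypothetical: n large, p small
lam = n * p               # Poisson parameter, lambda = np

# Largest gap between the two models over x = 0, ..., 10.
max_gap = max(abs(binom_pmf(x, n, p) - poisson_pmf(x, lam)) for x in range(11))
```

Because λ = 2 is an integer here, the Poisson probabilities at x = 1 and x = 2 are equal, which matches the two modes λ - 1 and λ.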

If X~Po(λ) and Y~Po(μ), what is the distribution of X + Y (assuming X and Y are independent)?

X + Y ~ Po(λ + μ)

When is the least squares regression line x on y used instead of y on x?

* When neither variable is controlled (independent) and you want to interpolate a value of x for a given value of y.
* When y is the independent variable and you want to interpolate either x given y or y given x.
* y on x is used in the opposites of these situations.

How to carry out a significance test for PMCC?

1. H₀ : ρ = 0. H₁ : for a 1-tailed test, ρ > 0 or ρ < 0 depending on whether you are testing for a +ve or -ve correlation; for a 2-tailed test, ρ ≠ 0.
2. State the significance level and read the critical value from tables.
3. Reject H₀ if |r| is greater than the critical value (with r of the sign stated in H₁ for a 1-tailed test) and make a conclusion.

How to carry out a Spearman's Rank hypothesis test?

1. State H₀ (ρ_s = 0) & H₁ (ρ_s > 0, ρ_s < 0 or ρ_s ≠ 0).
2. State level and type of test.
3. State rejection criterion (sample size is n, from tables the critical value is 'a', so reject H₀ if |r_s| > 'a').
4. Calculate r_s:
   * Rank all points in terms of one metric and then in terms of the other metric. If k values of a metric are tied, give each the average of the ranks they occupy: the lowest of those ranks + (k - 1)/2.
   * Square the difference between each point's two ranks.
   * Apply the formula in the booklet.
5. Make conclusion.
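
The ranking and calculation steps can be sketched in Python as below. This assumes the booklet formula is the usual $r_s = 1 - \dfrac{6\sum d^2}{n(n^2 - 1)}$, which the card itself does not state; the function names and data are illustrative.

```python
def average_ranks(values):
    """Rank from 1 (smallest); tied values share the mean of their ranks."""
    ordered = sorted(values)
    ranks = {}
    for v in set(values):
        first = ordered.index(v) + 1    # lowest rank the tied group occupies
        k = ordered.count(v)            # number of tied values
        ranks[v] = first + (k - 1) / 2
    return [ranks[v] for v in values]

def spearman(xs, ys):
    """r_s = 1 - 6 * sum(d^2) / (n(n^2 - 1)), assumed booklet formula."""
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical data with one tie in the second metric.
r_s = spearman([1, 2, 3, 4, 5], [5, 6, 7, 8, 7])
```

In this example the two tied values of 7 would occupy ranks 3 and 4, so each is given 3.5, and the sum of squared differences is 3.5.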

Statistics Flashcards

(35 cards)