Probability and Statistics Flashcards

Question 1

Q

What is the binomial coefficient

Answer

A

(n k) = n! / (k! (n-k)!)
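A quick numeric check of the formula, as a minimal sketch in Python. The helper name binomial_coefficient is illustrative only; math.comb is the standard-library equivalent.

```python
import math

def binomial_coefficient(n: int, k: int) -> int:
    """Number of ways to choose k items from n: n! / (k! (n-k)!)."""
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

# The factorial formula agrees with the standard-library function.
print(binomial_coefficient(5, 2))  # 10
print(math.comb(5, 2))             # 10
```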

Question 2

Q

what is a Bernoulli trial

Answer

A

an experiment that has only 2 possible outcomes (success or failure)

Question 3

Q

rule for probability that 2 independent events both occur

Answer

A

‘and’ rule -> multiplication: P(A and B) = P(A) × P(B)

Question 4

Q

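rule for probability that one or another event occurs

Answer

A

‘or’ rule -> addition: P(A or B) = P(A) + P(B) - P(A and B)
(just P(A) + P(B) if the events are mutually exclusive)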

Question 5

Q

how to find probability that a occurs given that b occurs, P(a | b)

Answer

A

P(a and b) / P(b)
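A small worked example of this ratio, assuming one roll of a fair six-sided die (the events a and b below are hypothetical, chosen only for illustration).

```python
from fractions import Fraction

# Hypothetical setup: one roll of a fair six-sided die.
outcomes = range(1, 7)
a = {x for x in outcomes if x % 2 == 0}  # event a: roll is even -> {2, 4, 6}
b = {x for x in outcomes if x > 3}       # event b: roll is above 3 -> {4, 5, 6}

def prob(event):
    """Probability of an event when all 6 outcomes are equally likely."""
    return Fraction(len(event), 6)

# P(a given b) = P(a and b) / P(b)
p_a_given_b = prob(a & b) / prob(b)
print(p_a_given_b)  # 2/3
```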

Question 6

Q

pdf for binomial distribution
P(X=k) =

Answer

A

(n k) p^k (1-p)^(n-k)

binomial coefficient × probability of k successes × probability of n-k failures
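The three factors can be written out directly. This is a minimal sketch; the function name binomial_pmf and the coin-flip numbers are illustrative assumptions.

```python
import math

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for n trials with success probability p."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

# Probability of exactly 3 heads in 5 flips of a fair coin.
print(binomial_pmf(3, 5, 0.5))  # 0.3125
```

Summing binomial_pmf(k, n, p) over k = 0..n should give 1, which is a handy sanity check.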

Question 7

Q

definition of Expectation

Answer

A

the sum of all possible outcomes, weighted by their probabilities
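As an illustration of the weighted sum, here is a sketch for a fair six-sided die (a hypothetical example, not from the card).

```python
from fractions import Fraction

# Each face 1..6 of a fair die has probability 1/6.
distribution = {face: Fraction(1, 6) for face in range(1, 7)}

# Expectation: each outcome multiplied by its probability, then summed.
expectation = sum(outcome * prob for outcome, prob in distribution.items())
print(expectation)  # 7/2
```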

Question 8

Q

When can the Poisson distribution be used

Answer

A

large n
small p
(i.e. rare events)

Question 9

Q

formula for µ, the density parameter

Answer

A

µ = np (=E(x))
n = number of trials
p = probability of success

Question 10

Q

pdf for Poisson distribution
P(X=k) ≈

Answer

A

e^(-µ) µ^k / k!
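To see how close the Poisson approximation is to the binomial for large n and small p, a minimal comparison sketch follows. The values n = 1000 and p = 0.002 are arbitrary assumptions, and the function names are illustrative.

```python
import math

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Exact binomial probability of k successes in n trials."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k: int, mu: float) -> float:
    """Poisson probability e^(-mu) mu^k / k!."""
    return math.exp(-mu) * mu**k / math.factorial(k)

n, p = 1000, 0.002   # large n, small p (rare events)
mu = n * p           # Poisson parameter mu = np

for k in range(5):
    print(k, round(binomial_pmf(k, n, p), 4), round(poisson_pmf(k, mu), 4))
```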

Question 11

Q

E(x) for binomial distribution

Answer

A

np

Question 12

Q

E(x) for Poisson distribution

Answer

A

µ (= np)

Question 13

Q

what are the parameters for the geometric distribution

Answer

A

p, probability of success

Question 14

Q

pdf for geometric distribution
P(X = k) =

Answer

A

(1 - p)^(k-1) p

probability of k-1 failures, followed by the probability of the first success
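A minimal sketch of this formula, taking k as the trial on which the first success occurs. The function name geometric_pmf and the value p = 0.25 are assumptions for illustration.

```python
def geometric_pmf(k: int, p: float) -> float:
    """P(first success on trial k): k-1 failures, then one success."""
    return (1 - p) ** (k - 1) * p

p = 0.25
# Two failures (0.75 * 0.75) followed by one success (0.25).
print(geometric_pmf(3, p))  # 0.140625

# The probabilities over k = 1, 2, 3, ... should sum to (nearly) 1.
total = sum(geometric_pmf(k, p) for k in range(1, 200))
print(total)
```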

Question 15

Q

expectation E(x) for geometric distribution

Answer

A

1 / p

Question 16

Q

parameters for the exponential distribution

Answer

A

lambda = the rate parameter

Question 17

Q

what’s the difference between the exponential and geometric distribution

Answer

A

geometric = discrete
exponential = continuous
exponential distribution can be used to model the geometric when n gets large and p gets very small

Question 18

Q

pdf for exponential distribution
f(x) =

Answer

A

lambda e^(-lambda x)

Question 19

Q

cdf for exponential distribution
F(x) =

Answer

A

1 - e^(-lambda x)

(if you can’t remember it, integrate the pdf between 0 and x)
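The link between the pdf and the cdf can be checked numerically. This is a rough sketch using a simple midpoint rule; lambda = 0.5 and x = 3 are arbitrary assumptions, and the helper names are illustrative.

```python
import math

def exp_pdf(x: float, lam: float) -> float:
    """Exponential pdf: lambda e^(-lambda x)."""
    return lam * math.exp(-lam * x)

def exp_cdf(x: float, lam: float) -> float:
    """Exponential cdf: 1 - e^(-lambda x)."""
    return 1 - math.exp(-lam * x)

def integrate(f, a: float, b: float, steps: int = 10_000) -> float:
    """Midpoint-rule numerical integration of f over [a, b]."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

lam, x = 0.5, 3.0
numeric = integrate(lambda t: exp_pdf(t, lam), 0.0, x)
print(numeric, exp_cdf(x, lam))  # the two values should agree closely
```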

Question 20

Q

what does the cdf show

Answer

A

an expression that gives the probability that a random variable X takes a value less than or equal to x (for the exponential distribution, between 0 and x)

Question 21

Q

expected value of the exponential distribution

Answer

A

1 / lambda

Question 22

Q

parameters of the normal distribution

Answer

A

µ - the mean
sigma - std

Question 23

Q

expectation for normal distribution

Answer

A

µ = mean

Question 24

Q

what does the Z scale do (normal distribution)

Answer

A

measures how many stds a point lies from the mean of its parent distribution

normalises the data

Question 25

Q

formula for Z scale

Answer

A

Z = (Xi - µ) / std

Xi = point
µ = mean of parent distribution
std = std of parent distribution
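A one-line sketch of the conversion. The numbers (mean 100, std 15, point 130) are hypothetical.

```python
def z_score(x: float, mu: float, std: float) -> float:
    """How many standard deviations x lies from the mean mu."""
    return (x - mu) / std

# A point of 130 from a parent distribution with mean 100 and std 15.
print(z_score(130, 100, 15))  # 2.0
```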

Question 26

Q

critical value for 2 tailed standard normal at alpha=0.05

Answer

A

±1.96

µ ± 1.96*sigma for a normal distribution that is not normalised
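The 1.96 figure can be recovered from the standard normal itself. A minimal sketch using Python's statistics.NormalDist (available from Python 3.8); the variable names are illustrative.

```python
from statistics import NormalDist

alpha = 0.05
standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Two-tailed test: alpha / 2 in each tail, so look up the 97.5th percentile.
critical = standard_normal.inv_cdf(1 - alpha / 2)
print(round(critical, 2))  # 1.96
```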

Question 27

Q

when is the t distribution used

Answer

A

small sample size
don’t know the population std (it is estimated from the sample)

Question 28

Q

difference between t distribution and normal distribution

Answer

A

t has heavier tails, therefore has more extreme critical values for the same significance level

as the sample size in t increases, the t distribution tends to the normal

Question 29

Q

formula for t scale

Answer

A

( X - µ) / Sx

X = sample mean
µ = population mean (often unknown)
Sx = standard error of mean

Question 30

Q

what is standard error of mean (SEM)

Answer

A

Sx = s / root(n)

s = sample std
n = sample size
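A short sketch that computes the SEM and then uses it in the t formula (sample mean minus population mean, divided by SEM). The sample values and the hypothesised mean mu0 are made up for illustration.

```python
import math
import statistics

sample = [5.1, 4.9, 5.6, 5.8, 5.0, 5.4]  # hypothetical measurements
mu0 = 5.0                                 # hypothesised population mean (assumption)

mean = statistics.mean(sample)
s = statistics.stdev(sample)              # sample std (n - 1 in the denominator)
sem = s / math.sqrt(len(sample))          # standard error of the mean
t = (mean - mu0) / sem                    # t statistic

print(mean, sem, t)
```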

Question 31

Q

what is the p value

Answer

A

probability of observing a result equal to or more extreme than the one observed, assuming the null hypothesis is true

Question 32

Q

what is a type one error

Answer

A

rejecting the null when it’s true
‘False positive’

Question 33

Q

what is a type two error

Answer

A

failing to reject the null when it’s false
‘False negative’

Question 34

Q

what is alpha level

Answer

A

significance level at which we reject the null

probability of a type one error

Question 35

Q

why shouldn’t you use multiple t tests for multiple comparisons

Answer

A

the overall probability of at least one type 1 error gets large as the number of tests grows
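A sketch of how quickly the error rate grows, under the simplifying assumption that the tests are independent, so the chance of at least one false positive in m tests is 1 - (1 - alpha)^m. The function name is illustrative.

```python
def familywise_error(alpha: float, m: int) -> float:
    """Chance of at least one type 1 error across m independent tests."""
    return 1 - (1 - alpha) ** m

for m in (1, 3, 10):
    print(m, round(familywise_error(0.05, m), 3))
```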

Question 36

Q

what should you use instead of multiple t tests for comparisons

Answer

A

ANOVA

Question 37

Q

what is the within-group variance

Answer

A

comparing the distribution of replicates to their treatment mean

Question 38

Q

what is the among/between group variance

Answer

A

comparing the distribution of the treatment means to the grand mean

Question 39

Q

what is the F statistic in ANOVA

Answer

A

among-group variance / within-group variance

Question 40

Q

what are treatments in ANOVA

Answer

A

the different samples

Question 41

Q

what are replicates in ANOVA

Answer

A

sample units within treatments

Question 42

Q

formula for Chi-square test statistic

Answer

A

∑ (o - e)^2 / e
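A minimal sketch of the statistic for a made-up example: 120 rolls of a die tested against the expectation of 20 per face. The counts are hypothetical.

```python
def chi_square(observed, expected):
    """Sum of (o - e)^2 / e over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [18, 22, 20, 25, 15, 20]  # hypothetical counts per face
expected = [20] * 6                  # fair die: 120 rolls / 6 faces

statistic = chi_square(observed, expected)
print(statistic)
```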

Question 43

Q

formula for Pearson’s r test statistic

Answer

A

(use Z scale)
r = ∑(Zxi × Zyi) / (n-1)
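A sketch of r computed from the product of paired z-scores, divided by n - 1. The z-scores here use the sample std, and the data are invented so that the relationship is perfectly linear.

```python
import statistics

def pearson_r(xs, ys):
    """Pearson's r as the sum of paired z-score products over n - 1."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)  # sample stds
    zx = [(x - mx) / sx for x in xs]
    zy = [(y - my) / sy for y in ys]
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]     # y = 2x, so r should be very close to 1
print(pearson_r(xs, ys))
```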

Question 44

Q

formula for slope estimate, b of a regression line

Answer

A

b = ∑(Xi - X)(Yi - Y) / ∑(Xi - X)^2

Xi = x values
X = mean of x values
Yi = y values
Y = mean of y values
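A minimal sketch of the slope estimate, using points that lie exactly on y = 2x + 1 (a made-up data set), so the expected slope is 2.

```python
def slope(xs, ys):
    """Least-squares slope: sum of cross-deviations over sum of squared x deviations."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    return numerator / denominator

xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]      # points on y = 2x + 1
print(slope(xs, ys))   # 2.0
```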

Question 45

Q

residual formula

Answer

A

residual = Yi - Ŷi

observed y value minus the value of y predicted by the regression line
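A sketch of residuals for a fitted line. The intercept a = Y - bX is the standard least-squares result and is an addition here, not something stated on the cards. The data are invented.

```python
xs = [0, 1, 2, 3]
ys = [1.0, 2.5, 5.5, 7.0]   # hypothetical observations

x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

# Slope b and intercept a of the least-squares line (intercept is an assumption).
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
a = y_bar - b * x_bar

# Residual: observed y minus the y value on the regression line.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
print(residuals)
```

For a least-squares line with an intercept, the residuals sum to zero (up to rounding), which makes a convenient check.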

Question 46

Q

problems with regression analysis

Answer

A

induced correlations (i.e. values that sum to 100% or 1, such as mineral compositions, may falsely indicate correlation between variables)
correlation vs causation
pseudoreplication (data taken from a single area doesn’t represent all areas)

Question 47

Q

What is the t-test used for

Answer

A

test whether a sample is drawn from a population of specific mean
test if means of 2 samples differ

Question 48

Q

what is the ANOVA test used for

Answer

A

test whether ≥ 3 samples are drawn from populations with equal means
(like Student’s t, but for more than 2 samples)

Question 49

Q

what is the Chi-square test used for

Answer

A

test how well observed categorical data fits a given model/expected values

Question 50

Q

How to find within-group variance

Answer

A

s.s / d.f

s.s. = ∑(Xi - X)^2
-> distance from treatment means

d.f. = n-1 (for each treatment, then added together (ie total replicates - number of treatments))

Question 51

Q

how to find among (between) group variance

Answer

A

s.s / d.f

s.s. = ∑ (Xti - Xg)^2
-> squared distance of treatment means from grand mean (each term multiplied by the number of replicates in that treatment)

d.f. = number of treatments - 1

Question 52

Q

when do you reject ANOVA null hypothesis

Answer

A

when F statistic > table value, based on numerator and denominator degrees of freedom
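Putting the ANOVA pieces together, here is a from-scratch sketch of the F statistic for three invented treatments. It assumes the usual convention of weighting each treatment's squared distance from the grand mean by its number of replicates. The result would then be compared with the table value for the two degrees of freedom.

```python
# Hypothetical data: three treatments, three replicates each.
groups = {"A": [4, 5, 6], "B": [6, 7, 8], "C": [8, 9, 10]}

def mean(values):
    return sum(values) / len(values)

grand_mean = mean([x for g in groups.values() for x in g])

# Within-group: replicates compared to their own treatment mean.
ss_within = sum((x - mean(g)) ** 2 for g in groups.values() for x in g)
df_within = sum(len(g) - 1 for g in groups.values())  # total replicates - treatments

# Among-group: treatment means compared to the grand mean,
# weighted by the number of replicates (assumed convention).
ss_among = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
df_among = len(groups) - 1                            # treatments - 1

f_stat = (ss_among / df_among) / (ss_within / df_within)
print(f_stat)  # 12.0
```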

Question 53

Q

assumptions for t test

Answer

A

data from normally distributed populations
data from populations of equal variance
samples drawn at random from parent distributions

Question 54

Q

assumptions for ANOVA

Answer

A

data drawn from normally distributed populations
data from populations of equal variance
data independent of one another