Probability and Statistics Flashcards

1
Q

What is the binomial coefficient

A

(n k) = n! / (k! (n-k)!)
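
As a quick check of the factorial formula, here is a minimal Python sketch. The function name `binomial_coefficient` is illustrative only; `math.comb` is the standard-library equivalent and should give the same value.

```python
import math

def binomial_coefficient(n: int, k: int) -> int:
    """Number of ways to choose k items from n, using n! / (k! (n-k)!)."""
    if k < 0 or k > n:
        return 0  # no way to choose more items than exist
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

print(binomial_coefficient(5, 2))  # 10
print(math.comb(5, 2))             # 10, standard-library equivalent
```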

2
Q

what is a Bernoulli trial

A

a trial that has only 2 possible outcomes (success or failure)

3
Q

rule for probability that 2 independent events both occur

A

‘and’ rule -> multiplication: P(a and b) = P(a) P(b)

4
Q

rule for probability that one or another event occurs

A

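‘or’ rule -> addition: P(a or b) = P(a) + P(b) - P(a and b)
(the last term is 0 if the events are mutually exclusive)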

5
Q

how to find the probability that a occurs given that b occurs, P(a | b)

A

P(a and b) / P(b)
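
A small worked example may help here. The sketch below enumerates two fair dice and applies P(a and b) / P(b). The events chosen (a: the dice sum to 8, b: the first die is even) are made up for illustration.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event as an exact fraction of the outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def a(o):
    return o[0] + o[1] == 8   # illustrative event a: the dice sum to 8

def b(o):
    return o[0] % 2 == 0      # illustrative event b: the first die is even

p_a_and_b = prob(lambda o: a(o) and b(o))
p_b = prob(b)
p_a_given_b = p_a_and_b / p_b
print(p_a_given_b)  # 1/6
```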

6
Q

pmf (probability mass function, the discrete counterpart of a pdf) for binomial distribution
P(X=k) =

A

(n k) p^k (1-p)^(n-k)

binomial coefficient × probability of k successes × probability of n-k failures
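
The three factors can be sketched directly in Python. The function name `binomial_pmf` and the coin-toss numbers are illustrative assumptions, not part of the deck.

```python
import math

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k): ways to place k successes * p^k * (1-p)^(n-k)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

# Probability of exactly 3 heads in 5 fair coin tosses: 10 * 0.5^5
print(binomial_pmf(3, 5, 0.5))  # 0.3125
```

Summing `binomial_pmf(k, n, p)` over k = 0..n should give 1, which is a handy sanity check.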

7
Q

definition of Expectation

A

the sum of all possible outcomes, weighted by their probabilities
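
For a discrete variable this definition is E(X) = ∑ x P(X = x). A minimal sketch, using a fair six-sided die as an assumed example:

```python
def expectation(outcomes, probabilities):
    """Sum of each outcome weighted by its probability."""
    return sum(x * p for x, p in zip(outcomes, probabilities))

die_faces = [1, 2, 3, 4, 5, 6]
die_probs = [1 / 6] * 6          # fair die: every face equally likely
expected_roll = expectation(die_faces, die_probs)
print(expected_roll)             # approximately 3.5
```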

8
Q

When can the Poisson distribution be used

A

as an approximation to the binomial when:
large n
small p
(i.e. rare events)

9
Q

formula for µ, the density parameter

A

µ = np (=E(x))
n = number of trials
p = probability of success

10
Q

pmf for Poisson distribution P(X=k) ≈

A

e^(-µ) µ^k / k!
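
To see how close the approximation is, the sketch below compares the exact binomial probability with the standard Poisson form e^(-µ) µ^k / k! (note the negative exponent) for µ = np. The values n = 1000 and p = 0.002 are made up for illustration.

```python
import math

def poisson_pmf(k: int, mu: float) -> float:
    """Poisson probability of k events when mu events are expected."""
    return math.exp(-mu) * mu**k / math.factorial(k)

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Exact binomial probability of k successes in n trials."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 1000, 0.002   # large n, small p (illustrative values)
mu = n * p           # density parameter, here 2.0
for k in range(5):
    print(k, round(binomial_pmf(k, n, p), 4), round(poisson_pmf(k, mu), 4))
```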

11
Q

E(x) for binomial distribution

A

np

12
Q

E(x) for Poisson distribution

A

µ = np

13
Q

what are the parameters for the geometric distribution

A

p, probability of success

14
Q

pmf for geometric distribution
P(X = k) =

A

(1 - p)^(k-1) p

probability of the k-1 failures, times the probability of the one success that follows them
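
A minimal sketch of this formula, with k as the trial on which the first success occurs. It also checks numerically that the mean comes out near 1 / p, the expectation given elsewhere in this deck. The value p = 0.25 is an arbitrary choice.

```python
def geometric_pmf(k: int, p: float) -> float:
    """P(first success on trial k): k-1 failures followed by one success."""
    return (1 - p)**(k - 1) * p

p = 0.25
# Truncated sum of k * P(X = k); the neglected tail is vanishingly small.
approx_mean = sum(k * geometric_pmf(k, p) for k in range(1, 1000))
print(approx_mean)  # close to 1 / p = 4
```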

15
Q

expectation E(x) for geometric distribution

A

1 / p

16
Q

parameters for the exponential distribution

A

lambda = the rate parameter

17
Q

what’s the difference between the exponential and geometric distribution

A

geometric = discrete
exponential = continuous
exponential distribution can be used to approximate the geometric when the number of trials gets large and p gets very small

18
Q

pdf for exponential distribution
f(x) =

A

lambda e^(-lambda x), for x ≥ 0

19
Q

cdf for exponential distribution
F(x) =

A

1 - e^(-lambda x), for x ≥ 0

(if you can't remember it, integrate the pdf between 0 and x)
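
The integration hint can be checked numerically. This sketch integrates the pdf from 0 to x with a simple midpoint rule and compares the result with the closed-form cdf. The function names, lambda = 1.5 and x = 2 are illustrative assumptions.

```python
import math

def exponential_pdf(x: float, lam: float) -> float:
    """f(x) = lambda * e^(-lambda x) for x >= 0, else 0."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def exponential_cdf(x: float, lam: float) -> float:
    """F(x) = 1 - e^(-lambda x) for x >= 0, else 0."""
    return 1 - math.exp(-lam * x) if x >= 0 else 0.0

def integrate_pdf(x: float, lam: float, steps: int = 10_000) -> float:
    """Midpoint-rule integral of the pdf between 0 and x."""
    width = x / steps
    return sum(exponential_pdf((i + 0.5) * width, lam) for i in range(steps)) * width

print(integrate_pdf(2.0, 1.5), exponential_cdf(2.0, 1.5))  # the two should agree closely
```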

20
Q

what does the cdf show

A

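F(x) = P(X ≤ x): an expression that gives the probability that a random variable X takes a value less than or equal to x (for a non-negative variable such as the exponential, between 0 and x)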

21
Q

expected value of the exponential distribution

A

1 / lambda

22
Q

parameters of the normal distribution

A

µ - the mean
sigma - the standard deviation (std)

23
Q

expectation for normal distribution

A

µ = mean

24
Q

what does the Z scale do (normal distribution)

A

measures how many stds a point lies from the mean of its parent distribution

standardises the data (rescales it to mean 0 and std 1)

25
Q

formula for Z scale

A

Z = (Xi - µ) / std

Xi = point
µ = mean of parent distribution
std = std of parent distribution
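
A one-line sketch of the Z formula. The numbers (a point at 130 from a parent distribution with mean 100 and std 15) are invented for illustration.

```python
def z_score(x: float, mean: float, std: float) -> float:
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std

print(z_score(130, 100, 15))  # 2.0, i.e. two stds above the mean
```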

26
Q

critical value for 2 tailed standard normal at alpha=0.05

A

± 1.96

µ ± 1.96*sigma for data that has not been standardised
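
The 1.96 figure can be recovered from the inverse cdf: a two-tailed test at alpha = 0.05 leaves 0.025 in each tail, so the upper critical value sits at the 0.975 quantile. The sketch uses `statistics.NormalDist` from the Python standard library (3.8+); the mean of 100 and sigma of 15 are made-up values.

```python
from statistics import NormalDist

alpha = 0.05
# Upper critical value of the standard normal: the (1 - alpha/2) quantile.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
print(z_crit)  # about 1.96

# For a non-standardised normal, the same quantile is mean + z_crit * sigma.
mean, sigma = 100.0, 15.0
upper = NormalDist(mean, sigma).inv_cdf(1 - alpha / 2)
lower = NormalDist(mean, sigma).inv_cdf(alpha / 2)
print(lower, upper)
```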

27
Q

when is the t distribution used

A

small sample size
don't know the population std (it has to be estimated from the sample)

28
Q

difference between t distribution and normal distribution

A

t has longer (heavier) tails, therefore has more extreme critical values for the same significance level

as the sample size in t increases, the t distribution tends to the normal

29
Q

formula for t scale

A

t = (X - µ) / Sx

X = sample mean
µ = population mean (often unknown, so the value assumed under the null is used)
Sx = standard error of mean

30
Q

what is standard error of mean (SEM)

A

Sx = s / root(n)

s = sample std
n = sample size
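
Putting the SEM and the t scale together, a minimal sketch might look like this. The sample values and the hypothesised mean of 5.0 are invented; `statistics.stdev` is the standard-library sample std (n - 1 in the denominator).

```python
import math
import statistics

def standard_error(sample):
    """SEM: sample std divided by the square root of the sample size."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

def t_statistic(sample, mu):
    """Distance of the sample mean from mu, in units of the SEM."""
    return (statistics.mean(sample) - mu) / standard_error(sample)

sample = [4.8, 5.1, 5.3, 4.9, 5.4]   # made-up measurements
print(standard_error(sample))        # about 0.114
print(t_statistic(sample, 5.0))      # about 0.877
```

The resulting t would then be compared with the critical value for n - 1 degrees of freedom.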

31
Q

what is the p value

A

probability of observing a result equal to or more extreme than the actual outcome, assuming the null hypothesis is true

32
Q

what is a type one error

A

rejecting the null when it is true
‘False positive’

33
Q

what is a type two error

A

failing to reject the null when it is false
‘False negative’

34
Q

what is alpha level

A

significance level: the threshold the p value must fall below for us to reject the null

equal to the probability of a type one error when the null is true

35
Q

why shouldn’t you use multiple t tests for multiple comparisons

A

the probability of at least one type 1 error across all the tests gets large
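
To see how quickly the error rate grows, the sketch below computes the chance of at least one false positive across m tests, 1 - (1 - alpha)^m. This assumes the tests are independent, which pairwise t tests on shared samples are not exactly, so treat it as an approximation.

```python
def familywise_error_rate(alpha: float, m: int) -> float:
    """P(at least one type 1 error) in m independent tests at level alpha."""
    return 1 - (1 - alpha)**m

for m in (1, 3, 10):
    print(m, round(familywise_error_rate(0.05, m), 3))
```

With 10 tests at alpha = 0.05 the rate is already about 0.40.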

36
Q

what should you use instead of multiple t tests for comparisons

A

ANOVA

37
Q

what is the within-group variance

A

comparing the distribution of replicates to their treatment mean

38
Q

what is the among/between group variance

A

comparing the distribution of the treatment means to the grand mean

39
Q

what is the F statistic in ANOVA

A

F = among-group variance / within-group variance

40
Q

what are treatments in ANOVA

A

the different samples

41
Q

what are replicates in ANOVA

A

sample units within treatments

42
Q

formula for Chi-square test statistic

A

∑ ((o - e)^2 / e)

o = observed count
e = expected count
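
A minimal sketch of the statistic, with o as observed counts and e as expected counts. The die-roll counts are made up: 120 rolls of a die expected to be fair.

```python
def chi_square(observed, expected):
    """Sum over categories of (observed - expected)^2 / expected."""
    return sum((o - e)**2 / e for o, e in zip(observed, expected))

observed = [18, 22, 20, 25, 15, 20]   # made-up counts for faces 1..6
expected = [20] * 6                   # fair die: 120 rolls / 6 faces
statistic = chi_square(observed, expected)
print(statistic)  # approximately 2.9
```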

43
Q

formula for Pearsons r test statistic

A

(use Z scale)
r = ∑(Zxi * Zyi) / (n-1)
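
A sketch of Pearson's r built from Z scores: each x and y is standardised with its sample mean and sample std, the paired Z values are multiplied, and the products are summed and divided by n - 1. The data are invented.

```python
import statistics

def pearson_r(xs, ys):
    """Sum of products of paired z-scores, divided by n - 1."""
    n = len(xs)
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    std_x, std_y = statistics.stdev(xs), statistics.stdev(ys)  # sample stds
    return sum(((x - mean_x) / std_x) * ((y - mean_y) / std_y)
               for x, y in zip(xs, ys)) / (n - 1)

xs = [1, 2, 3, 4, 5]
print(pearson_r(xs, [2 * x + 1 for x in xs]))   # close to 1 (perfect positive)
print(pearson_r(xs, [10 - 3 * x for x in xs]))  # close to -1 (perfect negative)
```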

44
Q

formula for slope estimate, b of a regression line

A

b = ∑(Xi - X)(Yi - Y) / ∑(Xi - X)^2

Xi = x values
X = mean of x values
Yi = y values
Y = mean of y values

45
Q

residual formula

A

residual = Yi - Ŷi

observed y value minus the value of y predicted by the regression line
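
The slope and residual formulas can be sketched together. The intercept a = Y - bX is not on these cards; it is the usual least-squares choice and is assumed here so that predicted values can be computed. The data are made up.

```python
def fit_line(xs, ys):
    """Least-squares slope b and intercept a (a = mean_y - b * mean_x)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x)**2 for x in xs))
    a = mean_y - b * mean_x
    return a, b

xs = [1, 2, 3, 4]
ys = [2.1, 3.9, 6.2, 7.8]   # made-up observations
a, b = fit_line(xs, ys)
# Residual: observed y minus the y predicted by the fitted line.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
print(b, a)        # about 1.94 and 0.15
print(residuals)   # these sum to roughly zero
```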

46
Q

problems with regression analysis

A
  • induced correlations (i.e. values that sum to 100% or 1, such as mineral compositions, can falsely indicate correlation between variables)
  • correlation vs causation (a correlation does not show that one variable causes the other)
  • pseudoreplication (data all taken from a single area are not independent and don't represent the whole)
47
Q

What is the t-test used for

A
  • test whether a sample is drawn from a population of specific mean
  • test if means of 2 samples differ
48
Q

what is the ANOVA test used for

A
  • test whether ≥ 3 samples are drawn from populations with equal means
    (like Student's t, but for more than 2 samples)
49
Q

what is the Chi-square test used for

A
  • test how well observed categorical data fits a given model/expected values
50
Q

How to find within-group variance

A

s.s / d.f

s.s. = ∑(Xi - X)^2, summed over every replicate in every treatment
-> squared distance of each replicate from its own treatment mean

d.f. = n - 1 for each treatment, then added together (i.e. total replicates - number of treatments)

51
Q

how to find among (between) group variance

A

s.s / d.f

s.s. = ∑ nt (Xt - Xg)^2
-> squared distance of each treatment mean (Xt) from the grand mean (Xg), weighted by the number of replicates in that treatment (nt)

d.f. = number of treatments - 1
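
The two variances combine into the F statistic as sketched below. This assumes the usual one-way ANOVA convention of squaring each deviation and weighting each treatment mean's squared deviation by its number of replicates. The three treatments are made-up data.

```python
def one_way_anova_f(groups):
    """F = among-group variance / within-group variance for a list of treatments."""
    k = len(groups)                                   # number of treatments
    n_total = sum(len(g) for g in groups)             # total replicates
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]         # treatment means

    # Within: squared distance of each replicate from its own treatment mean.
    ss_within = sum((x - m)**2 for g, m in zip(groups, means) for x in g)
    # Among: squared distance of treatment means from the grand mean, weighted by size.
    ss_among = sum(len(g) * (m - grand_mean)**2 for g, m in zip(groups, means))

    var_within = ss_within / (n_total - k)            # d.f. = replicates - treatments
    var_among = ss_among / (k - 1)                    # d.f. = treatments - 1
    return var_among / var_within

treatments = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]        # made-up replicates
f_value = one_way_anova_f(treatments)
print(f_value)  # about 13
```

The F value would then be compared with the table value for 2 and 6 degrees of freedom.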

52
Q

when do you reject ANOVA null hypothesis

A

when F statistic > table value, based on numerator and denominator degrees of freedom

53
Q

assumptions for t test

A
  • data from normally distributed populations
  • data from populations of equal variance
  • samples drawn at random from parent distributions
54
Q

assumptions for ANOVA

A
  • data drawn from normally distributed populations
  • data from populations of equal variance
  • data independent of one another