Categorical Data Flashcards

1
Q

What form does the response data take for categorical data?

A
  • Binary (0s or 1s), denoting the presence or absence of some feature or event
  • Proportions (which are bounded by 0 and 1)
2
Q

What is the range for a probability?

A

Must lie between 0 and 1

3
Q

If an event will never occur, what is its probability?

A

Probability of 0

4
Q

If the probability of an event F occurring is Pr(F), what is the probability of its complement, Pr(F̄)?

A

Pr(F̄) = 1 - Pr(F)

5
Q

What does the probability mass function tell us?

A
  • Gives the probability that a discrete random variable is exactly equal to some value
6
Q

What does the expected value give us an idea about?

A

The centre or location of a probability distribution

7
Q

What does the variance give us an idea about?

A

The spread of a probability distribution
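
The expected value and variance cards above can be made concrete with a small sketch. It assumes a discrete distribution stored as a Python dict mapping each value x to Pr(X = x) (its probability mass function); the numbers and the helper names expected_value and variance are illustrative only.

```python
# A small discrete distribution given as its probability mass function:
# each key is a value x, each value is Pr(X = x). (Illustrative numbers.)
pmf = {0: 0.25, 1: 0.5, 2: 0.25}

def expected_value(pmf):
    """E(X) = sum over x of x * Pr(X = x): the centre of the distribution."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Var(X) = sum over x of (x - E(X))^2 * Pr(X = x): the spread about E(X)."""
    mu = expected_value(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

print(expected_value(pmf))  # 1.0
print(variance(pmf))        # 0.5
```

A larger variance here would mean more probability placed on values far from 1.0.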

8
Q

What happens if the variance is large?

A

The values of X tend to deviate a lot from the expected value

9
Q

What does the Binomial distribution characterize?

A

The number of successes in a fixed number of repeated trials, each with a binary (success/failure) outcome

10
Q

What are the two parameters that a Binomial distribution has?

A
  • fixed number of trials (n)
  • probability of a success (p)

11
Q

X is said to have a binomial distribution if:

A
  • there are only two possible outcomes on each trial
  • there is a fixed number of trials, n
  • p is constant for all trials
  • X is the total number of successes in the n trials
  • each trial is independent of the other trials
12
Q

What does the PMF of a binomial distribution give us?

A

The probability of seeing exactly x successes out of the n trials
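
The card does not spell out the formula; the standard binomial PMF is Pr(X = x) = C(n, x) p^x (1 - p)^(n - x). A minimal Python sketch of it, where binomial_pmf is a hypothetical helper name:

```python
from math import comb

def binomial_pmf(x, n, p):
    """Pr(X = x) for X ~ Binomial(n, p): C(n, x) * p^x * (1 - p)^(n - x)."""
    if not 0 <= x <= n:
        return 0.0  # impossible number of successes
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Probability of exactly 3 successes in 10 trials with p = 0.5
print(binomial_pmf(3, 10, 0.5))  # 0.1171875  (= 120 / 1024)
```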

13
Q

Describe the shape of the binomial distribution if p is low

A
  • small number of successes
  • distribution is skewed to the right

14
Q

Describe the shape of the binomial distribution if p is 0.5

A
  • on average, half of the trials are successes
  • distribution is symmetrical

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Describe the shape of the binomial distribution if p is high

A
  • large number of successes
  • distribution is skewed to the left
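
The three shape cards can be checked numerically with the standard skewness of a Binomial(n, p) distribution, (1 - 2p)/sqrt(np(1 - p)): positive (right-skewed) when p < 0.5, zero when p = 0.5 and negative (left-skewed) when p > 0.5. A short sketch, assuming n = 20 purely for illustration:

```python
from math import sqrt

def binomial_skewness(n, p):
    """Skewness of Binomial(n, p): (1 - 2p) / sqrt(n p (1 - p))."""
    return (1 - 2 * p) / sqrt(n * p * (1 - p))

for p in (0.1, 0.5, 0.9):
    s = binomial_skewness(20, p)
    # Positive skewness: long right tail; negative: long left tail
    shape = "right-skewed" if s > 0 else "left-skewed" if s < 0 else "symmetric"
    print(f"p = {p}: skewness = {s:+.3f} ({shape})")
```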

16
Q

What is the expected value of the binomial distribution?

A

np

17
Q

What is the variance of the binomial distribution?

A

np(1-p)
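
As a sanity check on the two formulas, the mean and variance can be computed directly from the binomial PMF and compared with np and np(1 - p). This sketch assumes n = 12 and p = 0.3 as arbitrary example values:

```python
from math import comb, isclose

def binomial_pmf(x, n, p):
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 12, 0.3
support = range(n + 1)

# Mean and variance computed directly from the PMF ...
mean = sum(x * binomial_pmf(x, n, p) for x in support)
var = sum((x - mean) ** 2 * binomial_pmf(x, n, p) for x in support)

# ... agree (up to floating-point rounding) with the closed forms np and np(1 - p)
print(isclose(mean, n * p), isclose(var, n * p * (1 - p)))  # True True
```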

18
Q

What kind of distribution do the sample proportions form about the true population proportion?

A

Approximately normal (when n is large enough)

19
Q

What is the standard deviation of the sample proportion?

A

sqrt(p(1-p)/n)
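
A minimal sketch of the standard deviation formula, with 40 successes out of 100 as a hypothetical sample. It also combines the formula with the normal approximation from the previous card to form a Wald 95% interval; that interval plugs in the estimate p_hat for the unknown p and is one common choice, not something the cards prescribe.

```python
from math import sqrt

def proportion_se(p, n):
    """Standard deviation of the sample proportion: sqrt(p(1 - p) / n)."""
    return sqrt(p * (1 - p) / n)

# Hypothetical example: 40 successes out of n = 100
n = 100
p_hat = 40 / n
se = proportion_se(p_hat, n)

# Normal-approximation (Wald) 95% interval, using p_hat in place of p
lower, upper = p_hat - 1.96 * se, p_hat + 1.96 * se
print(f"SE = {se:.4f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```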

20
Q

Describe sampling situation A

A

The proportions originate from independent samples

21
Q

Describe sampling situation B

A

The same sample gives rise to two (or more) proportions, and each individual can choose only one of the options

22
Q

Describe sampling situation C

A

The same sample gives rise to the proportions, but an individual can choose more than one category

23
Q

Describe the odds ratio

A

An odds ratio is a relative measure of effect, which allows, for example, the intervention group of a study to be compared with a control or placebo group

24
Q

What is the numerator in the odds ratio?

A

Odds in the intervention arm

25
Q

What is the denominator in the odds ratio?

A

Odds in the control arm
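
Putting the numerator and denominator together: a minimal sketch that computes the odds ratio from event and non-event counts in each arm. The function name odds_ratio and the trial counts are hypothetical.

```python
def odds_ratio(events_int, non_events_int, events_ctl, non_events_ctl):
    """Odds ratio = odds in the intervention arm / odds in the control arm."""
    odds_int = events_int / non_events_int   # numerator: intervention-arm odds
    odds_ctl = events_ctl / non_events_ctl   # denominator: control-arm odds
    return odds_int / odds_ctl

# Hypothetical trial: 30 of 100 improve on the intervention, 15 of 100 on control
print(odds_ratio(30, 70, 15, 85))  # roughly 2.43, i.e. (30/70) / (15/85)
```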

26
Q

What can you say if the OR is greater than 1?

A

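The odds of the outcome are higher in the intervention arm than in the control arm (so, when the outcome is a success, the intervention is better than the control)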

27
Q

What can you say if the OR is less than 1?

A

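The odds of the outcome are lower in the intervention arm than in the control arm (so, when the outcome is a success, the control is better than the intervention)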

28
Q

What are the odds of success defined to be?

A

P(success)/P(failure), so odds = p/(1 - p)

29
Q

What is the probability of success from the odds?

A

odds/(odds + 1)
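
The two conversions are inverses of each other, as this short sketch shows (helper names are hypothetical):

```python
def odds_from_prob(p):
    """Odds of success: p / (1 - p)."""
    return p / (1 - p)

def prob_from_odds(odds):
    """Probability of success recovered from the odds: odds / (odds + 1)."""
    return odds / (odds + 1)

print(odds_from_prob(0.75))  # 3.0  (success is 3 times as likely as failure)
print(prob_from_odds(3.0))   # 0.75
```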

30
Q

Why do we use the log of the odds ratio as the centre of the associated confidence interval?

A

The sampling distribution of the odds ratio is highly skewed

The sampling distribution of the log odds ratio tends to be much more symmetrical
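
A sketch of how the log scale is used in practice. It assumes Woolf's standard error for the log odds ratio, sqrt(1/a + 1/b + 1/c + 1/d), which is one common method and not necessarily the one used in your course; it also assumes no cell of the 2 x 2 table is zero. The interval is symmetric about log(OR) and becomes asymmetric about OR after exponentiating back.

```python
from math import exp, log, sqrt

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Approximate 95% CI for the odds ratio of the 2 x 2 table
    [[a, b], [c, d]] (rows: intervention, control; columns: event, no event).
    Assumes all four counts are positive."""
    log_or = log((a / b) / (c / d))
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # Woolf's standard error
    lo, hi = log_or - z * se_log_or, log_or + z * se_log_or
    # Back-transform: symmetric about log(OR), not about OR itself
    return exp(log_or), exp(lo), exp(hi)

# Hypothetical counts: 30/70 in the intervention arm, 15/85 in the control arm
or_, lo, hi = odds_ratio_ci(30, 70, 15, 85)
print(f"OR = {or_:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```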

31
Q

What is the chi-square distribution indexed by?

A

The degrees of freedom

32
Q

What is the null hypothesis for a chi-square goodness-of-fit test?

A

H0: each cell probability equals its specified value, so expected count = total × specified cell probability

33
Q

What are the degrees of freedom for the chi-square goodness-of-fit test?

A

number of categories - 1
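
A minimal sketch of the goodness-of-fit statistic, sum of (observed - expected)^2 / expected, with expected counts taken from H0 and degrees of freedom equal to the number of categories minus 1. The die-roll counts are hypothetical, and the p-value lookup against the chi-square distribution is left out.

```python
def chi_square_gof(observed, probs):
    """Chi-square goodness-of-fit statistic and degrees of freedom.

    Under H0, expected count in each category = total x specified cell probability."""
    total = sum(observed)
    expected = [total * p for p in probs]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # number of categories - 1
    return stat, df

# Hypothetical: 60 rolls of a die, H0: all six faces equally likely
stat, df = chi_square_gof([8, 12, 9, 11, 6, 14], [1 / 6] * 6)
print(stat, df)
```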

34
Q

What are the two types of chi-square tests?

A
  • Goodness of fit
  • Test for independence

35
Q

What is the expected count for the chi-square test for independence?

A

(row total × column total)/(grand total)

36
Q

What are the degrees of freedom for the chi-square test for independence?

A

(number of columns - 1) × (number of rows - 1)
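
The expected-count and degrees-of-freedom cards combine into the test for independence. A minimal sketch for an r x c table of observed counts, using a hypothetical 2 x 3 table; the function name is illustrative and the p-value step is omitted.

```python
def chi_square_independence(table):
    """Chi-square test of independence for an r x c table of observed counts.

    Returns (statistic, degrees of freedom, expected counts)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected count under H0 (independence): row total x column total / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(col_totals) - 1) * (len(row_totals) - 1)
    return stat, df, expected

# Hypothetical 2 x 3 table (rows: groups, columns: response categories)
stat, df, expected = chi_square_independence([[20, 30, 50], [30, 30, 40]])
print(round(stat, 3), df)
print(expected)
```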

37
Q

What are the rules of thumb for the chi-square test?

A
  • no more than 20% of the expected counts are less than 5
  • all expected counts are 1 or greater
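
Both rules of thumb can be checked mechanically on a flat list of expected counts. A small sketch; the function name and the example counts are hypothetical.

```python
def meets_rules_of_thumb(expected):
    """Check the chi-square rules of thumb on a flat list of expected counts:
    no more than 20% of them below 5, and all of them at least 1."""
    small = sum(1 for e in expected if e < 5)
    return small <= 0.2 * len(expected) and min(expected) >= 1

print(meets_rules_of_thumb([25, 30, 45, 25, 30, 45]))  # True
print(meets_rules_of_thumb([4.5, 3.0, 12, 20, 8]))     # False: 2 of 5 (40%) are below 5
```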