Topic 7: Chance Variability (The Box Model) Flashcards

1
Q

Describe the nature of chance processes.

A

Every time we carry out a chance process, chance variability causes the results to differ from one run to the next.

E.g. each time we toss a fair coin 10 times, we may get a different number of heads. This is due to chance variability.

2
Q

What is the equation for observed value?

A

Observed value = expected value + chance error

E.g. for a fair coin: number of heads = half of the number of tosses + chance error

3
Q

What is the Law of Averages / Law of Large Numbers?

A

As the number of repetitions of an experiment increases, the chance error tends to decrease as a percentage of the number of repetitions, while its absolute size tends to increase.

In other words, the percentage difference between the observed and expected values tends to shrink, whereas the absolute difference between them tends to grow.

As the number of repetitions increases, the observed proportion of the event converges to the theoretical/expected proportion.
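
A minimal simulation sketch of this card, assuming a fair coin and Python's standard `random` module. The helper name `count_heads`, the seed, and the toss counts are illustrative choices, not part of the course material. Across runs, the absolute chance error typically grows with the number of tosses while the percentage error shrinks.

```python
import random

def count_heads(n_tosses, rng):
    """Toss a fair coin n_tosses times and return the number of heads."""
    return sum(rng.random() < 0.5 for _ in range(n_tosses))

rng = random.Random(0)  # fixed seed so the run is repeatable
for n in (100, 10_000, 1_000_000):
    heads = count_heads(n, rng)
    chance_error = heads - n / 2            # observed value - expected value
    percent_error = 100 * chance_error / n  # chance error as a % of the tosses
    print(n, heads, chance_error, round(percent_error, 3))
```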

4
Q

What is a way to model/describe the chance processes that occur?

A

The Box model

5
Q

What information do we need to know for the box model?

A

Distinct numbers that go in a box (“tickets”)

The number of each kind of ticket in the box (this determines the probability of drawing each ticket)

The number of draws from the box (how many tickets we pull, i.e. the sample size)

6
Q

What are some important notes to keep in mind when looking at the box model?

A

Think of the box as a summary of the population; as a result, the SD of the box is calculated as a population SD

Take draws from the box to create the sample

Consider the sum (e.g. no. of heads) or the mean (e.g. proportion of heads) of the sample

Chance error = observed value - expected value; its likely size is given by the standard error (SE)

7
Q

What is the chance error equation?

A

Chance error = observed value - expected value

8
Q

What is the expected value?

A

The expected value is the expected sum/mean after a certain number of draws from a box

9
Q

What is the observed value?

A

The value experimentally obtained by sampling/repetition etc.

10
Q

What is the gambler’s fallacy?

A

The false belief that an independent random event is more or less likely to happen because of the results of a previous series of events
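
A small simulation sketch of why this belief is false, assuming independent tosses of a fair coin. The function name `heads_rate_after_streak`, the streak length of 3, and the seed are hypothetical choices made for illustration. The proportion of heads on tosses that follow a run of heads typically stays close to one half.

```python
import random

def heads_rate_after_streak(n_tosses, streak, rng):
    """Proportion of heads on tosses that immediately follow `streak` heads in a row."""
    run = 0          # current run of consecutive heads
    followed = 0     # tosses that came right after a long-enough run
    heads_after = 0  # how many of those tosses were heads
    for _ in range(n_tosses):
        is_head = rng.random() < 0.5
        if run >= streak:
            followed += 1
            heads_after += is_head
        run = run + 1 if is_head else 0
    return heads_after / followed

rate = heads_rate_after_streak(200_000, 3, random.Random(0))
print(round(rate, 3))  # typically close to 0.5: the streak does not change the odds
```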

11
Q

What is the equation for observed value (common across sum/mean of draws)?

A

Observed value = expected value + chance error

12
Q

What is the equation for expected value for the sum of draws?

A

number of draws x mean of box

n x mean

13
Q

What is the equation for standard error for the sum of draws?

A

Sqrt(number of draws) x SD of box

sqrt (n) x SD
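
The sum-of-draws formulas (EV = n x mean, SE = sqrt(n) x SD) can be sketched as below. This assumes draws are made with replacement and uses the coin box [0, 1] as an example; the function name `sum_of_draws` is hypothetical.

```python
import math
import statistics

def sum_of_draws(box, n_draws):
    """Return (EV, SE) for the sum of n_draws made with replacement from box."""
    ev = n_draws * statistics.mean(box)
    se = math.sqrt(n_draws) * statistics.pstdev(box)  # population SD of the box
    return ev, se

coin_box = [0, 1]  # 1 = head, 0 = tail
ev, se = sum_of_draws(coin_box, 100)
print(ev, se)  # EV of 50 heads, SE of 5 heads
```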

14
Q

What is the equation for expected value for the mean of draws?

A

mean of the box

mean

15
Q

What is the equation for standard error for the mean of draws?

A

sd of box / sqrt(number of draws)

SD / sqrt (n)
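
A short sketch of the mean-of-draws formulas, again assuming draws with replacement from the coin box [0, 1]. The function name `mean_of_draws` and the chosen numbers of draws are illustrative. The EV stays at the box mean while the SE shrinks as the number of draws grows.

```python
import math
import statistics

def mean_of_draws(box, n_draws):
    """Return (EV, SE) for the mean of n_draws made with replacement from box."""
    ev = statistics.mean(box)
    se = statistics.pstdev(box) / math.sqrt(n_draws)  # population SD / sqrt(n)
    return ev, se

coin_box = [0, 1]  # 1 = head, 0 = tail
for n in (25, 100, 400):
    ev, se = mean_of_draws(coin_box, n)
    print(n, ev, se)  # the SE halves each time n is multiplied by 4
```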

16
Q

Do we use sample or population to calculate sd of a box?

A

We use the population SD (popsd), because the box represents a population; the SD of the whole box is therefore the SD of the population
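
To illustrate the difference, the sketch below compares the two SDs using Python's standard `statistics` module. The die box [1, 2, 3, 4, 5, 6] is an assumed example. `pstdev` divides by N (population SD), whereas `stdev` divides by N - 1 (sample SD), so the two disagree on the same box.

```python
import statistics

die_box = [1, 2, 3, 4, 5, 6]           # one ticket per face of a fair die
pop_sd = statistics.pstdev(die_box)    # population SD: divides by N
sample_sd = statistics.stdev(die_box)  # sample SD: divides by N - 1
print(round(pop_sd, 4), round(sample_sd, 4))  # the population SD is the smaller one
```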

17
Q

What is the shortcut SD calculation for binary boxes (2 options)?

A

SD = (big score - small score) x sqrt(proportion of big x proportion of small)

where each proportion is the probability of drawing that kind of ticket
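
As a quick check, the shortcut can be compared with the population SD computed directly. The box [5, 5, 5, 2] and the function name `binary_box_sd` are assumptions made for this sketch.

```python
import math
import statistics

def binary_box_sd(big, small, prop_big):
    """Shortcut SD for a box holding only two kinds of ticket."""
    return (big - small) * math.sqrt(prop_big * (1 - prop_big))

box = [5, 5, 5, 2]                     # three "big" tickets, one "small" ticket
shortcut = binary_box_sd(5, 2, 3 / 4)  # proportion of big tickets is 3/4
direct = statistics.pstdev(box)        # population SD computed directly
print(shortcut, direct)                # the two agree up to rounding
```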

18
Q

Can the normal curve be used with the box model?

A

For a large number of draws (sample size) from the box, the observed value of the sum / mean often follows the normal curve

We can model the sum/mean with a normal curve whose mean is the EV and whose SD is the SE from the box model

19
Q

How do we calculate the mean of the normal curve using box model?

A

EV of sum or mean = mean of normal curve

20
Q

How do we calculate the SD of the normal curve using box model?

A

SE of sum or mean = SD of normal curve
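
A minimal sketch of using the EV and SE as the mean and SD of a normal curve, via `statistics.NormalDist` from the Python standard library. The scenario (the number of heads in 100 fair coin tosses, so EV = 50 and SE = 5) is an assumed example, and no continuity correction is applied here.

```python
from statistics import NormalDist

ev, se = 50, 5                        # sum of 100 draws from the box [0, 1]
curve = NormalDist(mu=ev, sigma=se)   # mean = EV, SD = SE

# Approximate chance that the sum lands within 1 SE of the EV.
chance = curve.cdf(55) - curve.cdf(45)
print(round(chance, 4))  # about 0.68
```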

21
Q

What trick can we use if we are just interested in 1 particular ‘ticket’ in the box?

A

We can set up the box so that there are only two options: 1 for the ticket that we want and 0 for any ticket that we don't want

E.g. roll a die 100 times and count the no. of 6's. The box would have one 1 (representing "6") and five 0's (representing "non 6's")
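
The example above can be worked through with the sum-of-draws formulas. This is a sketch that assumes a fair die and draws with replacement; the variable names are illustrative.

```python
import math
import statistics

six_box = [1, 0, 0, 0, 0, 0]  # 1 = rolled a 6, 0 = rolled anything else
n_rolls = 100

ev = n_rolls * statistics.mean(six_box)               # expected number of 6's
se = math.sqrt(n_rolls) * statistics.pstdev(six_box)  # SE for the number of 6's
print(round(ev, 2), round(se, 2))  # about 16.67 and 3.73
```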

22
Q

What are the 3 types of histograms?

A

Data histogram

Probability histogram

Simulation (empirical) histogram

23
Q

What is a data histogram?

A

Represents data by area

24
Q

What is a probability histogram?

A

Represents chance by area

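A probability histogram is not necessarily normal, but for the sum or mean of a large number of draws it closely follows the normal curve (see the central limit theorem)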

25
Q

What is a simulation (empirical) histogram?

A

Converges in shape to the probability histogram

Represents by area the observed proportions from a simulation of the chance process

As the number of replicates increases, the simulation histogram converges to the probability histogram

26
Q

What is the central limit theorem?

A

States that when drawing at random with replacement from a box, if the sample size for the sum (or average) of draws is sufficiently large, then the probability histogram for the sum or average will closely follow the normal curve, even if the contents of the box don't

Generally, the distribution for the sum or average will closely follow the normal curve

27
Q

What are the conditions for central limit theorem to be true?

A

The no. of draws must be reasonably large (especially if the histogram of the box differs massively from the normal curve)

How many draws we need depends on the initial shape of the histogram: if it's already close to normal, we need fewer; if it starts off asymmetrical, we need more

A common convention (for symmetric distributions with no obvious outliers) is that the no. of draws has to be larger than 30

28
Q

Why might we have to use continuity correction?

A

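When a normal curve is used to approximate a histogram of whole-number values, each bar extends 0.5 on either side of its value, so the area under the curve between the whole-number thresholds misses part of the end bars. To remedy this, we adjust each threshold by 0.5.

E.g. lower threshold = 6, which becomes 5.5
Upper threshold = 8, which becomes 8.5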

View slide 27

To work out whether we need to add or subtract 0.5, we draw a sketch of the histogram
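
A sketch comparing the normal approximation with and without the continuity correction, using the thresholds 6 and 8 from above. The scenario (the number of heads in 10 fair coin tosses) is an assumption chosen so the exact chance can be computed directly.

```python
import math
from statistics import NormalDist

n = 10  # assumed: 10 tosses of a fair coin, counting heads

# Exact chance of 6, 7 or 8 heads, by counting outcomes.
exact = sum(math.comb(n, k) for k in range(6, 9)) / 2 ** n

# Normal curve with mean = EV of the sum and SD = SE of the sum.
curve = NormalDist(mu=n * 0.5, sigma=math.sqrt(n) * 0.5)
uncorrected = curve.cdf(8) - curve.cdf(6)    # thresholds left at 6 and 8
corrected = curve.cdf(8.5) - curve.cdf(5.5)  # thresholds moved out by 0.5
print(round(exact, 4), round(uncorrected, 4), round(corrected, 4))
```

In this example the corrected value lands much closer to the exact chance than the uncorrected one.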

29
Q

What is the difference between sample size and replicates?

A

Sample size refers to how many tickets are drawn; the number of replicates is how many times the experiment with that sample size is repeated

30
Q

What are the effects of increasing replicates or sample sizes on the normal curve in a skewed box?

A

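In a skewed box, increasing the number of replicates makes the simulation histogram approach the probability histogram for that sample size (which stays skewed, like the box, when the sample size is small), not the normal curve. Increasing the sample size, however, makes the histogram of the sum/mean approach the normal curve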
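
A simulation sketch of this contrast, assuming a skewed box with nine 0's and one 1. The helper names `skewness` and `simulate_sums`, the seed, and the replicate counts are hypothetical. Skewness near 0 is used here as a rough indicator of a symmetric, normal-like shape.

```python
import random
import statistics

def skewness(values):
    """Average cubed z-score: near 0 for a symmetric shape, positive for a right skew."""
    m = statistics.mean(values)
    sd = statistics.pstdev(values)
    return statistics.mean([((v - m) / sd) ** 3 for v in values])

def simulate_sums(box, sample_size, replicates, rng):
    """Repeat `replicates` times: draw sample_size tickets with replacement and sum them."""
    return [sum(rng.choices(box, k=sample_size)) for _ in range(replicates)]

skewed_box = [0] * 9 + [1]
rng = random.Random(0)  # fixed seed so the run is repeatable

# Many replicates with a sample size of 1: the histogram stays skewed like the box.
small_n = simulate_sums(skewed_box, 1, 2000, rng)
# Same number of replicates with a large sample size: much closer to symmetric.
large_n = simulate_sums(skewed_box, 400, 2000, rng)
print(round(skewness(small_n), 2), round(skewness(large_n), 2))
```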