Topic 7: Chance Variability (The Box Model) Flashcards
Describe the nature of the chance processes we carry out.
Every chance process is subject to chance variability, so the results can differ each time we repeat it.
E.g. every time we toss a fair coin 10 times, we might get a different number of heads. This is due to chance variability.
What is the equation for observed value?
Observed value = expected value + chance error
E.g. number of heads = half the number of tosses + chance error
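The equation can be seen in a small simulation. This is only a sketch: the function name `count_heads`, the seed, and the choice of 10 tosses over 5 trials are assumptions made for illustration.

```python
import random

def count_heads(n_tosses, rng):
    """Toss a fair coin n_tosses times and return the number of heads."""
    return sum(rng.randint(0, 1) for _ in range(n_tosses))

rng = random.Random(1)   # fixed seed (arbitrary) so reruns give the same tosses
n_tosses = 10
expected = n_tosses / 2  # expected number of heads for a fair coin

for trial in range(1, 6):
    observed = count_heads(n_tosses, rng)
    chance_error = observed - expected
    # observed value = expected value + chance error holds by construction
    print(f"trial {trial}: observed={observed}, expected={expected}, "
          f"chance error={chance_error:+}")
```

The expected value stays the same in every trial; only the chance error changes from trial to trial.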
What is the Law of Averages / Law of Large Numbers?
As an experiment is repeated more times, the chance error is likely to get smaller as a percentage of the number of repetitions, but larger in absolute terms.
In other words, the percentage difference between the observed and expected values tends to decrease, whereas the absolute difference (i.e. the difference between the counts) tends to increase.
As the number of repetitions increases, the observed proportion of the event converges to the theoretical/expected proportion.
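A rough simulation of this, assuming a fair coin and comparing 100 tosses with 10,000 tosses. The helper `average_errors` and the number of replicates are illustrative choices, not part of the original notes.

```python
import random

def average_errors(n_tosses, replicates, rng):
    """Average absolute and percentage chance error over repeated runs
    of n_tosses fair-coin tosses."""
    total_abs = 0.0
    for _ in range(replicates):
        heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
        total_abs += abs(heads - n_tosses / 2)
    mean_abs = total_abs / replicates
    mean_pct = 100 * mean_abs / n_tosses
    return mean_abs, mean_pct

rng = random.Random(7)  # arbitrary seed for repeatability
for n in (100, 10_000):
    abs_err, pct_err = average_errors(n, replicates=100, rng=rng)
    print(f"{n:>6} tosses: average absolute error = {abs_err:.1f} heads, "
          f"average percentage error = {pct_err:.2f}%")
```

With more tosses the absolute error should come out larger while the percentage error comes out smaller.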
What is a way to model/describe the chance processes that occur?
The Box model
What information do we need to know for the box model?
Distinct numbers that go in a box (“tickets”)
The number of each kind of ticket in the box (this determines the probability of drawing each ticket)
The number of draws from the box (how many tickets we pull, i.e. the sample size)
What are some important notes to keep in mind when looking at the box model?
Think of the box as a summary of the population, so the SD of the box is calculated as a population SD
Take draws from the box to create the sample
Consider the sum (e.g. number of heads) or the mean (e.g. proportion of heads) of the sample
Chance error = observed value - expected value; its likely size is given by the standard error (SE)
What is the chance error equation?
Chance error = observed value - expected value
What is the expected value?
The expected value (EV) is the sum or mean we expect after a given number of draws from a box
What is the observed value?
The value actually obtained from the experiment, e.g. by sampling or repetition
What is the gambler’s fallacy?
The false belief that an independent random event is more or less likely to happen because of the results of previous events
E.g. believing that tails is “due” after a run of heads
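One way to check this belief is to simulate many fair-coin tosses and look only at the tosses that come straight after a run of heads. The function name `heads_rate_after_streak`, the streak length of 3, and the number of tosses are assumptions made for this sketch.

```python
import random

def heads_rate_after_streak(n_tosses, streak, rng):
    """Proportion of heads on the toss immediately after `streak` heads in a row."""
    tosses = [rng.random() < 0.5 for _ in range(n_tosses)]
    followers = [
        tosses[i]
        for i in range(streak, n_tosses)
        if all(tosses[i - streak:i])  # the previous `streak` tosses were all heads
    ]
    return sum(followers) / len(followers)

rng = random.Random(3)  # arbitrary seed for repeatability
rate = heads_rate_after_streak(200_000, streak=3, rng=rng)
print(f"proportion of heads right after 3 heads in a row: {rate:.3f}")
```

If the tosses are independent, the proportion should stay close to 0.5, so a run of heads does not make tails any more likely.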
What is the equation for the observed value (common to both the sum and the mean of draws)?
Observed value = expected value + chance error
What is the equation for expected value for the sum of draws?
number of draws x mean of box
n x mean
What is the equation for standard error for the sum of draws?
Sqrt(number of draws) x SD of box
sqrt (n) x SD
What is the equation for expected value for the mean of draws?
mean of the box
mean
What is the equation for standard error for the mean of draws?
SD of box / sqrt(number of draws)
SD / sqrt (n)
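The four formulas for the sum and the mean of draws can be sketched as two small functions. The names `ev_se_sum` and `ev_se_mean` and the fair-coin box `[0, 1]` with 100 draws are assumptions chosen for illustration.

```python
import math
import statistics

def ev_se_sum(box, n_draws):
    """EV and SE for the sum of n_draws draws (with replacement) from the box."""
    mean = statistics.fmean(box)
    sd = statistics.pstdev(box)  # population SD, since the box is the population
    return n_draws * mean, math.sqrt(n_draws) * sd

def ev_se_mean(box, n_draws):
    """EV and SE for the mean of n_draws draws (with replacement) from the box."""
    mean = statistics.fmean(box)
    sd = statistics.pstdev(box)
    return mean, sd / math.sqrt(n_draws)

box = [0, 1]   # hypothetical box for a fair coin: 1 = head, 0 = tail
n = 100
ev_s, se_s = ev_se_sum(box, n)
ev_m, se_m = ev_se_mean(box, n)
print(f"sum of draws:  EV = {ev_s}, SE = {se_s}")
print(f"mean of draws: EV = {ev_m}, SE = {se_m}")
```

Note that the SE of the sum grows with sqrt(n) while the SE of the mean shrinks with sqrt(n).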
Do we use the sample or the population formula to calculate the SD of a box?
We use the population SD (popsd), because the box represents the whole population, so the SD of the box is the SD of the population
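The difference between the two formulas can be seen in Python's standard library, where `statistics.pstdev` is the population SD and `statistics.stdev` is the sample SD. The box of die faces below is an assumed example.

```python
import statistics

box = [1, 2, 3, 4, 5, 6]  # hypothetical box: one ticket per face of a die

population_sd = statistics.pstdev(box)  # divides by N: use this for a box
sample_sd = statistics.stdev(box)       # divides by N - 1: not for a box

print(f"population SD = {population_sd:.4f}")
print(f"sample SD     = {sample_sd:.4f}")
```

The sample SD is always slightly larger than the population SD for the same numbers, so using it for a box would overstate the SE.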
What is the shortcut SD calculation for binary boxes (2 options)?
SD = (big score - small score) x sqrt(proportion of big x proportion of small)
where each proportion is the fraction of tickets in the box with that score, i.e. the probability of drawing it
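As a check, the shortcut can be compared with the population SD calculated directly from the tickets. The function name `binary_box_sd` and the box with one 5 and three 2's are assumptions made for this sketch.

```python
import math
import statistics

def binary_box_sd(big, small, n_big, n_small):
    """Shortcut SD for a box that holds only two kinds of ticket."""
    total = n_big + n_small
    p_big = n_big / total      # proportion of big tickets
    p_small = n_small / total  # proportion of small tickets
    return (big - small) * math.sqrt(p_big * p_small)

box = [5, 2, 2, 2]  # hypothetical binary box: one 5 and three 2's
shortcut = binary_box_sd(big=5, small=2, n_big=1, n_small=3)
direct = statistics.pstdev(box)  # population SD computed from the tickets

print(f"shortcut SD = {shortcut:.4f}")
print(f"direct SD   = {direct:.4f}")
```

Both calculations should give the same value.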
Can the normal curve be used with the box model?
For a large number of draws from the box (a large sample size), the observed sum or mean often follows the normal curve
We can then model the sum or mean with a normal curve whose centre and spread come from the EV and SE of the box model
How do we calculate the mean of the normal curve using box model?
EV of sum or mean = mean of normal curve
How do we calculate the SD of the normal curve using box model?
SE of sum or mean = SD of normal curve
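A short sketch of using the EV and SE as the mean and SD of a normal curve, here for the number of heads in 100 tosses of a fair coin (an assumed example). It uses `statistics.NormalDist` from the standard library and applies no continuity correction.

```python
import math
from statistics import NormalDist

box_mean, box_sd = 0.5, 0.5  # assumed box [0, 1] for a fair coin
n_draws = 100

ev_sum = n_draws * box_mean             # mean of the normal curve
se_sum = math.sqrt(n_draws) * box_sd    # SD of the normal curve
curve = NormalDist(mu=ev_sum, sigma=se_sum)

p_at_most_45 = curve.cdf(45)
p_between_45_and_55 = curve.cdf(55) - curve.cdf(45)
print(f"approx. chance of at most 45 heads: {p_at_most_45:.4f}")
print(f"approx. chance of 45 to 55 heads:   {p_between_45_and_55:.4f}")
```

Here 45 and 55 are each 1 SE from the EV, so the second result is roughly the familiar 68%.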
What trick can we use if we are only interested in one particular ‘ticket’ in the box?
We can reduce the box to two options: 1 for the ticket we want and 0 for any ticket we don’t want
E.g. roll a die 100 times and count the number of 6’s. The box would have one 1 (representing “6”) and five 0’s (representing “not 6”)
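The die example can be worked through with the 0/1 box. This is a sketch only: the variable names and the seed are assumptions, and the simulated count is a single run.

```python
import math
import random

box = [1, 0, 0, 0, 0, 0]  # 1 represents a six, 0 represents any other face
n_rolls = 100

p_six = sum(box) / len(box)                 # proportion of 1's in the box
box_sd = math.sqrt(p_six * (1 - p_six))     # binary shortcut: (1 - 0) x sqrt(p x (1 - p))
ev = n_rolls * p_six                        # expected number of sixes
se = math.sqrt(n_rolls) * box_sd            # likely size of the chance error

rng = random.Random(6)  # arbitrary seed for repeatability
observed = sum(rng.choice(box) for _ in range(n_rolls))  # sum of draws = number of sixes

print(f"EV = {ev:.2f}, SE = {se:.2f}, observed in one run = {observed}")
```

Because the tickets are 0 and 1, the sum of the draws is exactly the count of sixes.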
What are the 3 types of histograms?
Data histogram
Probability histogram
Simulation (empirical) histogram
What is a data histogram?
Represents data by area
What is a probability histogram?
Represents chance by area
The probability histogram is not always normal, but for the sum or mean of a large number of draws it closely follows the normal curve
What is a simulation (empirical) histogram?
Converges in shape to the probability histogram
Represents the observed results of a simulation of the chance process by area
As the number of replicates increases, the simulation histogram converges to the probability histogram (which is itself close to normal only when the number of draws is large)
What is the central limit theorem?
States that when drawing at random with replacement from a box, if the number of draws is sufficiently large, the probability histogram for the sum (or average) of the draws will closely follow the normal curve, even if the contents of the box do not
Generally, the distribution for the sum or average will closely follow the normal curve
What are the conditions for the central limit theorem to hold?
The number of draws must be reasonably large (especially if the histogram of the box differs greatly from the normal curve)
How many draws we need depends on the shape of the box's histogram: if it is already close to normal we need fewer, and if it is asymmetrical we need more
A common convention (for roughly symmetric distributions with no obvious outliers) is that the number of draws should be larger than 30
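A simulation can give a feel for this. A normal curve has about 68% of its area within 1 SD of its mean, so the sketch below checks what share of simulated sums land within 1 SE of the EV. The skewed box, the sample sizes, and the helper names are all assumptions chosen for illustration.

```python
import math
import random
import statistics

def simulate_sums(box, n_draws, replicates, rng):
    """Sum of n_draws draws with replacement, repeated `replicates` times."""
    return [sum(rng.choices(box, k=n_draws)) for _ in range(replicates)]

def share_within_one_se(box, n_draws, replicates, rng):
    """Proportion of simulated sums that fall within 1 SE of the EV."""
    ev = n_draws * statistics.fmean(box)
    se = math.sqrt(n_draws) * statistics.pstdev(box)
    sums = simulate_sums(box, n_draws, replicates, rng)
    return sum(abs(s - ev) <= se for s in sums) / replicates

box = [1, 1, 1, 2, 2, 3, 10]  # hypothetical right-skewed box
rng = random.Random(30)       # arbitrary seed for repeatability
for n in (2, 100):
    share = share_within_one_se(box, n, replicates=2000, rng=rng)
    print(f"{n:>3} draws: {100 * share:.1f}% of sums within 1 SE of the EV")
```

With only a few draws from this box the share is noticeably off 68%; with 100 draws it should be close.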
Why might we have to use continuity correction?
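The normal curve is continuous, but when the values are whole numbers the histogram it approximates has a bar centred on each value that extends 0.5 either side. Cutting the curve off exactly at a threshold misses part of the area of the end bars, so we adjust the thresholds by 0.5.
E.g. lower threshold = 6 –> 5.5
Upper threshold = 8 –> 8.5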
View slide 27
To work out whether to add or subtract 0.5, we draw a sketch of the histogram
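The effect of the correction can be checked against an exact calculation. The sketch below assumes the thresholds 6 and 8 refer to the number of heads in 10 tosses of a fair coin; that scenario is an assumption, not something stated in the notes.

```python
import math
from statistics import NormalDist

n_tosses = 10
ev = n_tosses * 0.5               # EV of the number of heads
se = math.sqrt(n_tosses) * 0.5    # SE, using the SD of the box [0, 1]
curve = NormalDist(mu=ev, sigma=se)

# Exact chance of 6, 7 or 8 heads from the binomial counts
exact = sum(math.comb(n_tosses, k) for k in range(6, 9)) / 2 ** n_tosses

# Normal approximation without and with the continuity correction
uncorrected = curve.cdf(8) - curve.cdf(6)
corrected = curve.cdf(8.5) - curve.cdf(5.5)

print(f"exact       = {exact:.4f}")
print(f"uncorrected = {uncorrected:.4f}")
print(f"corrected   = {corrected:.4f}")
```

The corrected value should sit much closer to the exact chance, because it includes the full width of the bars for 6 and 8.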
What is the difference between sample size and replicates?
Sample size is how many tickets are drawn in one run of the experiment; replicates is how many times the experiment with that sample size is repeated
What are the effects of increasing replicates or sample sizes on the normal curve in a skewed box?
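In a skewed box, increasing the replicates makes the simulation histogram approach the probability histogram for that sample size, which for a small sample size still resembles the skewed box rather than the normal curve. Increasing the sample size, however, makes it approach the normal curve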
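A rough way to see the difference is to measure the skewness of the simulated means, where a value near 0 suggests a symmetric, normal-like shape. The box, the helper names, and the particular sample sizes and replicate counts are assumptions chosen for this sketch.

```python
import random
import statistics

def skewness(values):
    """Average cubed standardised deviation; about 0 for a symmetric shape."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

def simulated_means(box, sample_size, replicates, rng):
    """Mean of sample_size draws with replacement, repeated `replicates` times."""
    return [statistics.fmean(rng.choices(box, k=sample_size))
            for _ in range(replicates)]

box = [1, 1, 1, 2, 2, 3, 10]  # hypothetical right-skewed box
rng = random.Random(89)       # arbitrary seed for repeatability
print(f"skewness of the box itself: {skewness(box):.2f}")
for sample_size, replicates in [(1, 100), (1, 5000), (50, 5000)]:
    means = simulated_means(box, sample_size, replicates, rng)
    print(f"sample size {sample_size:>2}, replicates {replicates:>4}: "
          f"skewness = {skewness(means):.2f}")
```

More replicates at sample size 1 should only settle the skewness near that of the box, while the larger sample size should pull it towards 0.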