Computing As Experiment Flashcards
Reasoning
Sometimes experimentation works better than exact mathematical analysis; machine learning is a good example.
Population
The set of items from which the objects we test are drawn. We can denote this as Ω.
Population examples
- the set of permutations of <1,2,3,…,n>
- the set of graphs with n nodes and m edges
- the set of students attending a school
We can also focus on certain subsets of these, such as only the male students, or only permutations of a particular kind.
Sampling
Taking a random subset of the population.
There can be many reasons for this, such as limiting the impact of anomalous results, or because the population is too large to test in full.
How do we then choose in an unbiased way?
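A minimal sketch of this in Python (the population of student IDs here is hypothetical): random.sample draws an unbiased sample without replacement.

```python
# Hypothetical population: 100 students identified by ID 1..100.
import random

population = list(range(1, 101))
sample = random.sample(population, k=10)  # 10 distinct students, chosen uniformly
print(sample)
```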
P[X] and Unbiased Selection
P[X] is the probability of each element X being chosen, with every element's probability satisfying:
0 <= P[X] <= 1
For an unbiased (uniform) selection, P[X] = 1/|Ω|.
Let's say we had 100 students, each given a probability of 0.01.
Any one student would then be chosen with probability 1/100.
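As a quick sketch (the student population is again hypothetical), we can check empirically that random.choice picks each of 100 elements with probability close to 0.01:

```python
import random
from collections import Counter

students = list(range(100))                                    # |Ω| = 100
counts = Counter(random.choice(students) for _ in range(100_000))
print(counts[0] / 100_000)  # ~0.01, i.e. P[X] = 1/|Ω|
```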
Random Variable
r : Ω -> R
Maps each member of the population to a real number.
This means every member of the population has some value assigned to it, and we can work with that value rather than just the probability of selection. The random variable for students, for example, could be a class test score.
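A minimal sketch, with made-up students and scores, treats r as a plain dictionary from population members to real values:

```python
import random

# r : Ω -> R as a dict; the students and scores are made up.
r = {"alice": 72.0, "bob": 55.0, "carol": 88.0}

student = random.choice(list(r))  # pick a member of the population
print(student, r[student])        # ...and read off its random variable
```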
Expected Value E[X]
Denoted as E[X], the expected value is:
E[X] = Σ P[x] × r(x)
Essentially, the sum over every member x of the population of its probability multiplied by its random variable. This tells us the value we expect to get when choosing a random entry from the population, or in other words, the average we expect over a large number of samples.
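A sketch of this sum, with made-up probabilities and values:

```python
# E[X] = sum of P[x] * r(x) over the population (numbers are illustrative).
P = {"a": 0.5, "b": 0.3, "c": 0.2}
r = {"a": 1.0, "b": 2.0, "c": 3.0}

expected = sum(P[x] * r[x] for x in P)
print(expected)  # 0.5*1 + 0.3*2 + 0.2*3 = 1.7
```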
E[X] of a fair die
We sum, over each face, the random variable times its probability:
1 × 1/6 (1 is the number on the die, the random variable; 1/6 is the probability of rolling it)
2 × 1/6
3 × 1/6
4 × 1/6
5 × 1/6
6 × 1/6
Summing all of these gives 21/6 = 3.5, which is the expected value of rolling a fair die.
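The same calculation as a short sketch:

```python
# Expected value of a fair die: each face has probability 1/6.
expected = sum(face * (1 / 6) for face in range(1, 7))
print(expected)  # 3.5
```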
Biased E[X] of an unfair die
What if the faces had different probabilities?
We simply incorporate these into our summation:
P[X] = 1/4 if X ∈ {1,2,3}
P[X] = 1/12 if X ∈ {4,5,6}
The summation over the first case gives (1 × 1/4) + (2 × 1/4) + (3 × 1/4) = 6/4 = 1.5.
The summation over the second gives (4 × 1/12) + (5 × 1/12) + (6 × 1/12) = 15/12 = 1.25.
Adding the two gives the expected value 1.5 + 1.25 = 2.75.
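A sketch reproducing this biased-die calculation:

```python
# Faces 1-3 have probability 1/4, faces 4-6 have probability 1/12.
P = {face: (1 / 4 if face <= 3 else 1 / 12) for face in range(1, 7)}
expected = sum(face * p for face, p in P.items())
print(expected)  # 1.5 + 1.25 = 2.75
```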
Problems with Mean, Median and Mode
Normally these three statistics sit close to one another. For example, take these scores from students:
- 10 students score 20%
- 35 students score 50%
- 25 students score 60%
- 30 students score 90%
The mode is 50%, the median is 60% and the mean is 61.5%.
However, skewed data can open a large gap between the mode/median and the mean:
Let's say we have 61 students who score 25%, and 39 who score 100%.
The mean is 54.25%, but the mode and median are far lower, at 25% for both.
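A sketch reproducing the skewed example with Python's statistics module:

```python
import statistics

scores = [25] * 61 + [100] * 39
print(statistics.mean(scores))    # 54.25
print(statistics.median(scores))  # 25
print(statistics.mode(scores))    # 25
```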
Variance
Variance, or Var(X), is the mean of the squared deviations of the random variable from the expected value. Essentially:
Var(X) = (Σ (r(x_i) - E[X])^2) / N
The higher the variance, the more spread out the data is and the more the random variable tends to deviate from the expected value, which shows how misleading the mean alone can be.
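A sketch of this definition, using the fair-die faces as the data:

```python
# Variance as the mean squared deviation from E[X].
values = [1, 2, 3, 4, 5, 6]
mean = sum(values) / len(values)                          # E[X] = 3.5
var = sum((v - mean) ** 2 for v in values) / len(values)
print(var)  # 35/12 ≈ 2.92
```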
Exact Standard Deviation
This is the square root of the variance, and it quantifies how much the individual data points in a dataset deviate, on average, from the expected value of the dataset.
Estimated Standard Deviation
Since the summation for the variance/exact SD can involve a lot of variables and expected values, we can instead take a sample <y_1, y_2, …, y_N>, whose values correspond to the random variables of <x_1, x_2, …, x_N> from the original population. This gives us the equation:
sqrt( (Σ (r(y_i) - E[y])^2) / N )
This seems hard to read, so let's break it down.
We first take the random variable of y_1 and subtract the expected value of Y from it (the expected value does not change, since it is computed over the whole sample).
We square this answer, then repeat for y_2, y_3 and so on…
We then take the total and divide it by N (changed later to N-1 in Bessel's Correction). Dividing averages the squared deviations, so a single anomalous result does not dominate; we want a representative measure of the deviation, not a chaotic number.
Finally, we take all of this and square root it, just as in the exact standard deviation.
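A sketch of the whole recipe, on a hypothetical four-value sample:

```python
import math

sample = [72.0, 55.0, 88.0, 61.0]         # hypothetical sample values
mean = sum(sample) / len(sample)          # E[y]
sd = math.sqrt(sum((y - mean) ** 2 for y in sample) / len(sample))
print(sd)
```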
Bessel’s Correction
Essentially, instead of dividing by N, we divide by N-1. This corrects for the fact that a sample tends to underestimate the spread of the whole population. It is explained further in data science.
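A sketch contrasting the two divisors via Python's statistics module: pstdev divides by N, while stdev applies Bessel's correction and divides by N-1.

```python
import statistics

sample = [72.0, 55.0, 88.0, 61.0]  # hypothetical sample values
print(statistics.pstdev(sample))   # divides by N
print(statistics.stdev(sample))    # divides by N-1 (slightly larger)
```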