new Flashcards
(38 cards)
Associative Law for Union and Intersection
(E ∪ F) ∪ G = E ∪ (F ∪ G)
(E or F) or G = E or (F or G)
(EF)G = E(FG)
(E and F) and G = E and (F and G)
Chebyshev's inequality
To find the minimum percentage of a dataset that lies within x̄ ± ks, where s = sample standard deviation and k = some number greater than 1:
min % = 100(1 - 1/k²)
This requires k > 1; for k ≤ 1 the bound guarantees nothing (the minimum is 0%).
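A minimal Python sketch of the bound, checked against a small hypothetical sample (the values are made up for illustration). It uses `statistics.stdev`, which divides by n - 1 and so matches the sample standard deviation s. Chebyshev only gives a floor: the actual percentage within x̄ ± ks should never be lower than the bound.

```python
import statistics

def chebyshev_min_percent(k):
    """Minimum percent of data guaranteed to lie within x̄ ± k*s."""
    if k <= 1:
        return 0.0                      # the bound guarantees nothing for k <= 1
    return 100 * (1 - 1 / k**2)

def percent_within(data, k):
    """Actual percent of data points with |x - x̄| < k*s (s uses n - 1)."""
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)
    inside = [x for x in data if abs(x - x_bar) < k * s]
    return 100 * len(inside) / len(data)

data = [2, 4, 4, 4, 5, 5, 7, 9, 30]     # hypothetical sample with one outlier
for k in (1.5, 2, 3):
    print(k, round(chebyshev_min_percent(k), 1), round(percent_within(data, k), 1))
```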
Commutative law for union/intersection
E ∪ F = F ∪ E
Event E or F = Event F or E
EF = FE
Event E and F = Event F and E
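The associative and commutative laws can be checked directly with Python sets, where `|` is union and `&` is intersection. The events below are hypothetical subsets of the outcomes of one die roll, chosen only to illustrate the laws.

```python
# Hypothetical events, as sets of outcomes from one roll of a die
E = {1, 2, 3}
F = {2, 4, 6}
G = {3, 6}

# Associative law: grouping does not matter
assert (E | F) | G == E | (F | G)   # union
assert (E & F) & G == E & (F & G)   # intersection: (EF)G = E(FG)

# Commutative law: order does not matter
assert E | F == F | E
assert E & F == F & E
print("associative and commutative laws hold for these events")
```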
Complement
If we have an event E within the sample space S, then Eᶜ, the complement of E, contains every outcome in S that is not in E.
Correlation Coefficient
r = [Σ (xi - x̄)(yi - ȳ)] / [(n - 1)sx sy] = [Σ (xi - x̄)(yi - ȳ)] / √[Σ (xi - x̄)² · Σ (yi - ȳ)²]
For a paired dataset (xi, yi), where x̄ and ȳ are the means and sx and sy are the sample standard deviations, r measures how closely the pairs follow a straight line y = mx + b. It always lies between -1 and 1.
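A small Python sketch that computes r with both forms of the formula, to show they agree: (n - 1)sx sy equals √[Σ (xi - x̄)² · Σ (yi - ȳ)²] once sx and sy are written out. The data pairs are hypothetical.

```python
import math

def correlation_two_ways(xs, ys):
    """Return r computed with both forms of the formula."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)

    # Form 1: divide by (n - 1) * sx * sy (sample standard deviations)
    sx = math.sqrt(sxx / (n - 1))
    sy = math.sqrt(syy / (n - 1))
    r1 = sxy / ((n - 1) * sx * sy)

    # Form 2: divide by sqrt(Σ(xi - x̄)² * Σ(yi - ȳ)²)
    r2 = sxy / math.sqrt(sxx * syy)
    return r1, r2

print(correlation_two_ways([1, 2, 3, 4], [3, 5, 7, 9]))        # points on a line: r is 1 (up to rounding)
print(correlation_two_ways([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))  # 0.8
```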
Cumulative Frequency
This shows, for each bin, the running total of the frequencies of that bin and every bin below it.
Plots of cumulative frequency are also called ogives.
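A short Python sketch of turning bin frequencies into cumulative frequencies with `itertools.accumulate`, which keeps a running sum. The bins and frequencies are hypothetical; the last cumulative value always equals the total count n.

```python
from itertools import accumulate

bins = ["0-9", "10-19", "20-29", "30-39"]   # hypothetical bins
freqs = [3, 7, 5, 2]                         # hypothetical frequencies

# Running total: each bin's frequency plus all bins before it
cumulative = list(accumulate(freqs))         # [3, 10, 15, 17]

for label, f, c in zip(bins, freqs, cumulative):
    print(f"{label:>6}  freq={f:<3} cumulative={c}")
```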
De Morgan's Laws
(E ∪ F)ᶜ = EᶜFᶜ
Neither E nor F occurs = E does not occur and F does not occur
(EF)ᶜ = Eᶜ ∪ Fᶜ
E and F do not both occur = E does not occur or F does not occur
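Both laws can be verified with Python sets, taking the complement relative to a sample space S. The die-roll sample space and the events E and F below are hypothetical examples.

```python
S = set(range(1, 7))   # sample space: one roll of a die
E = {1, 2}             # hypothetical event
F = {2, 3, 4}          # hypothetical event

def complement(A):
    """Everything in the sample space S that is not in A."""
    return S - A

# (E ∪ F)ᶜ = EᶜFᶜ
assert complement(E | F) == complement(E) & complement(F)
# (EF)ᶜ = Eᶜ ∪ Fᶜ
assert complement(E & F) == complement(E) | complement(F)
print(complement(E | F), complement(E & F))
```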
Gini Coefficient
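The Gini coefficient (G) is based on the area between the line of perfect equality, L(p) = p, and the Lorenz curve. Because the area under the equality line is .5, G = (area between the curves)/.5, so G ranges from 0 (perfect equality) to 1 (maximal inequality), while the area between the curves itself ranges from 0 to .5.
Equivalently, G = 1 - 2B, where B = area under the Lorenz curve, L(p)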
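A minimal Python sketch of G = 1 - 2B, assuming each income belongs to one person with equal population weight: sort the incomes, build the Lorenz curve points (p, L(p)), estimate B with the trapezoid rule, and return 1 - 2B. The incomes are hypothetical.

```python
def lorenz_points(incomes):
    """Lorenz curve points (p, L(p)) with each person given equal weight."""
    xs = sorted(incomes)
    n, total = len(xs), sum(xs)
    points, running = [(0.0, 0.0)], 0
    for i, x in enumerate(xs, start=1):
        running += x
        points.append((i / n, running / total))
    return points

def gini(incomes):
    """G = 1 - 2B, with B (area under the Lorenz curve) from the trapezoid rule."""
    pts = lorenz_points(incomes)
    B = sum((p1 - p0) * (l0 + l1) / 2
            for (p0, l0), (p1, l1) in zip(pts, pts[1:]))
    return 1 - 2 * B

print(gini([5, 5, 5, 5]))    # equal incomes: G is 0
print(gini([1, 3]))          # 0.25
print(gini([0, 0, 0, 10]))   # one person has everything: 0.75
```

With only n people, this discrete version tops out at 1 - 1/n (the last example), and approaches 1 as n grows.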
How to use the weighted probability of E (Bayes' formula)?
To find P(F|E), where E is the second event (its probability depends on whether F occurred):
P(F|E) = P(FE)/P(E)
where P(FE) = P(F)P(E|F) and P(E) = P(F)P(E|F) + P(Fᶜ)P(E|Fᶜ)
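A short Python sketch of the formula with hypothetical numbers: F is having a condition with P(F) = 0.01, and E is a positive test with P(E|F) = 0.95 and P(E|Fᶜ) = 0.05. None of these values come from real data; the function name `posterior` is just an illustrative choice.

```python
def posterior(p_f, p_e_given_f, p_e_given_not_f):
    """P(F|E) = P(FE) / P(E), with P(E) expanded over F and Fᶜ."""
    p_fe = p_f * p_e_given_f                    # P(FE) = P(F)P(E|F)
    p_e = p_fe + (1 - p_f) * p_e_given_not_f    # P(E) = P(F)P(E|F) + P(Fᶜ)P(E|Fᶜ)
    return p_fe / p_e

p = posterior(0.01, 0.95, 0.05)
print(f"P(F|E) = {p:.4f}")
```

Even with a fairly accurate test, P(F|E) comes out near 0.16 here, because F is rare and most positives come from the large Fᶜ group.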
Independent events
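If P(E|F) = P(E), then E and F are independent: knowing that F occurred does not change the probability of E. Equivalently, P(EF) = P(E)P(F).
Three events E, F, and G are independent if P(EFG) = P(E)P(F)P(G) and each pair is also independent (P(EF) = P(E)P(F), P(EG) = P(E)P(G), P(FG) = P(F)P(G)).
Independence does not depend on whether events happen at the same time; it holds when the outcome of one event has no influence on the other, as with events in separate, unrelated locations.
Ex: Covid case counts in Mexico and Greenland on a given day (approximately independent), or two balls drawn from a jar with replacement. Drawing two balls without replacement, whether one after the other or both at once, gives dependent events, because the first ball changes what remains in the jar.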
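An exact enumeration in Python that compares P(second ball red) with P(second ball red | first ball red) for a hypothetical jar of 2 red and 3 blue balls. Equal values mean the draws are independent. Drawing both balls at once behaves like the without-replacement case, since labeling one ball "first" does not change the counts.

```python
from fractions import Fraction
from itertools import permutations, product

balls = ["R", "R", "B", "B", "B"]   # hypothetical jar: 2 red, 3 blue

def second_red_probs(draws):
    """Return (P(2nd red), P(2nd red | 1st red)) over equally likely ordered draws."""
    draws = list(draws)
    second_red = [d for d in draws if balls[d[1]] == "R"]
    first_red = [d for d in draws if balls[d[0]] == "R"]
    both_red = [d for d in first_red if balls[d[1]] == "R"]
    return (Fraction(len(second_red), len(draws)),
            Fraction(len(both_red), len(first_red)))

# With replacement: the same ball can be drawn twice
with_replacement = second_red_probs(product(range(len(balls)), repeat=2))
# Without replacement: ordered pairs of two different balls
without_replacement = second_red_probs(permutations(range(len(balls)), 2))

print(with_replacement)      # equal values -> independent
print(without_replacement)   # unequal values -> dependent
```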
Lorenz Curve
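This is a cumulative curve showing the income distribution: L(p) is the share of total income held by the lowest-earning fraction p of the population. Under perfect equality it is the straight line L(p) = p.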
Mean
x̄ = Σx/n = Σ v*f/n
where v = bin value and f = frequency (for grouped data, n = Σf)
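A small Python check that the grouped formula Σ v*f/n gives the same mean as averaging the raw observations. The bin values and frequencies are hypothetical.

```python
values = [2, 3, 4, 5]   # hypothetical bin values v
freqs = [1, 4, 3, 2]    # hypothetical frequencies f (n = 10)

# Expand the grouped data back into raw observations and average them
raw = [v for v, f in zip(values, freqs) for _ in range(f)]
mean_raw = sum(raw) / len(raw)

# Same mean directly from the grouped data: Σ v*f / n
n = sum(freqs)
mean_grouped = sum(v * f for v, f in zip(values, freqs)) / n

print(mean_raw, mean_grouped)   # 3.6 3.6
```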
Mean vs. Median: when to use each
Generally, the mean gives a better description of the dataset as a whole because it uses every value. The median should be used when the question is about order or position within a group (for example, the value half of the group falls below), and it is less affected by extreme values.
Ex: Housing. The mean income is best for estimating what the average person in an area can spend on a home, but if we want to design housing that 50% of the population can afford (P(affordable) = .5), the median income is more useful.
Median
This is the middle value of a sample when the data are arranged from least to greatest.
If n is odd, the median is the value in position (n + 1)/2.
If n is even, the median is the average of the values in positions n/2 and (n/2) + 1.
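A direct Python translation of the position rules, using 1-indexed positions in the comments to match the card (Python lists are 0-indexed, hence the `- 1`).

```python
def median(data):
    """Middle value of the sorted data (positions in comments are 1-indexed)."""
    xs = sorted(data)
    n = len(xs)
    if n % 2 == 1:
        return xs[(n + 1) // 2 - 1]           # position (n + 1)/2
    return (xs[n // 2 - 1] + xs[n // 2]) / 2  # average of positions n/2 and n/2 + 1

print(median([7, 1, 3]))      # 3
print(median([7, 1, 3, 10]))  # 5.0
```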
Number of unique groups in a set (combinations)
If we have a collection of n objects and want to know how many unique groups of size r can be chosen when order does NOT matter:
C(n, r) = n!/[(n - r)!r!]
This is the number of distinct ways to choose r elements from n. If every group is equally likely to be chosen, it gives the number of possible outcomes in the sample space.
To find the probability that a chosen group has a particular makeup (e.g., r1 objects from subgroup 1 and r2 from subgroup 2), use (# of combinations from subgroup 1)*(# of combinations from subgroup 2)/(total # of combinations).
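A Python sketch using `math.comb`, which computes n!/[(n - r)!r!] directly. The second part applies the subgroup rule to a hypothetical example: choosing 5 people from 6 men and 9 women, and asking for the probability of exactly 3 men and 2 women.

```python
from fractions import Fraction
from math import comb, factorial

n, r = 5, 2
# math.comb(n, r) computes n! / [(n - r)! r!]
assert comb(n, r) == factorial(n) // (factorial(n - r) * factorial(r))
print(comb(n, r))   # 10

# Hypothetical example: choose 5 people from 6 men and 9 women.
# P(exactly 3 men and 2 women) = C(6,3) * C(9,2) / C(15,5)
p = Fraction(comb(6, 3) * comb(9, 2), comb(15, 5))
print(p, float(p))
```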
P(E) = ? as a weighted average
P(E) = P(E|F)P(F) + P(E|Fᶜ)(1 - P(F))
P(E) is a weighted average of the probability of E given that F occurred and the probability of E given that F did not occur, weighted by P(F) and P(Fᶜ) = 1 - P(F).
This is useful when the probability of E depends on whether F occurs.
P(E) Expansion using complements
P(E) = P(EF) + P(EFᶜ)
Probability of E = probability of E and F + probability of E and not F
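A quick Python check of the expansion on one roll of a fair die, using exact fractions. The events are hypothetical: E = "roll is even" and F = "roll is greater than 3". EF and EFᶜ split E into two pieces that do not overlap, so their probabilities add up to P(E).

```python
from fractions import Fraction

S = set(range(1, 7))   # one roll of a fair die
E = {2, 4, 6}          # hypothetical: roll is even
F = {4, 5, 6}          # hypothetical: roll is greater than 3

def P(A):
    """Probability of A when all outcomes in S are equally likely."""
    return Fraction(len(A), len(S))

# EF and EFᶜ split E into two non-overlapping pieces
p_ef = P(E & F)          # {4, 6}
p_efc = P(E & (S - F))   # {2}
print(P(E), p_ef + p_efc)   # 1/2 1/2
```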
P(E|F) = ?
P(E|F) = P(EF)/P(F)
Probability of E occurring given F has occurred = the probability E and F both occur divided by the probability F occurs (defined only when P(F) > 0)
P(E|Fᶜ) = ? (expand)
P(E|Fᶜ) = P(EFᶜ)/P(Fᶜ)
This says that the probability of E occurring given that F does NOT occur equals the probability of E occurring and F not occurring divided by the probability F does not occur.
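A Python sketch of both conditional probabilities on one roll of a fair die, using exact fractions. The events are hypothetical (E = "even", F = "greater than 3"), and `P_given` is an illustrative helper name. The final check confirms that the two conditionals, weighted by P(F) and P(Fᶜ), recombine into P(E).

```python
from fractions import Fraction

S = set(range(1, 7))     # one roll of a fair die
E = {2, 4, 6}            # hypothetical: roll is even
F = {4, 5, 6}            # hypothetical: roll is greater than 3
Fc = S - F               # complement of F

def P(A):
    """Probability of event A when all outcomes in S are equally likely."""
    return Fraction(len(A), len(S))

def P_given(A, B):
    """Conditional probability P(A|B) = P(AB)/P(B); requires P(B) > 0."""
    return P(A & B) / P(B)

p_e_given_f = P_given(E, F)     # P(EF)/P(F)
p_e_given_fc = P_given(E, Fc)   # P(EFᶜ)/P(Fᶜ)
print(p_e_given_f, p_e_given_fc)   # 2/3 1/3

# The two conditionals recombine into P(E) as a weighted average
assert P(E) == p_e_given_f * P(F) + p_e_given_fc * P(Fc)
```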
Permutations
A permutation is a specific ordering of a set of objects. A set of n distinct objects has n! permutations (n choices for the first position, n - 1 for the second, and so on).
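A short Python check that listing every ordering with `itertools.permutations` gives n! arrangements. The four objects are a hypothetical example.

```python
from itertools import permutations
from math import factorial

objects = ["a", "b", "c", "d"]   # hypothetical set of n = 4 distinct objects

arrangements = list(permutations(objects))   # every ordering of all 4 objects
print(len(arrangements), factorial(len(objects)))   # 24 24
print(arrangements[:3])
```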
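Meaning of r
If the slope relating y and x is < 0, then r < 0; if the slope is > 0, then r > 0. The absolute value of r indicates how linear the relationship is (|r| = 1 means the points lie exactly on a line).
If r is for (x, y) and w = a + bx, z = c + dy, where b and d have the same sign, then
r(x,y) = r(w,z)
(If b and d have opposite signs, r(w,z) = -r(x,y).)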
Sample 100p percentile
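The value such that at least 100p% of the data are less than or equal to it and at least 100(1 - p)% of the data are greater than or equal to it. The value itself counts toward both groups.
p = proportion as a decimal (0 < p < 1)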
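One common way to find this value, and one that reproduces the median rule at p = .5, is: if n*p is not an integer, take the value in position ceil(n*p) of the sorted data; if n*p is an integer, average the values in positions n*p and n*p + 1. The sketch below assumes that convention; other textbooks and software packages use different interpolation rules and can give slightly different answers. The data are hypothetical.

```python
import math
from fractions import Fraction

def sample_percentile(data, p):
    """Sample 100p percentile (0 < p < 1) using the n*p position rule."""
    xs = sorted(data)
    n = len(xs)
    np_exact = n * Fraction(str(p))     # exact n*p, avoids float rounding
    if np_exact.denominator != 1:
        # n*p not an integer: take the value in position ceil(n*p) (1-indexed)
        return xs[math.ceil(np_exact) - 1]
    k = int(np_exact)
    # n*p an integer: average the values in positions n*p and n*p + 1
    return (xs[k - 1] + xs[k]) / 2

data = list(range(1, 11))   # hypothetical sample: 1, 2, ..., 10
print(sample_percentile(data, 0.25), sample_percentile(data, 0.5))   # 3 5.5
```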
Sample Space
S = sample space = the set of all possible outcomes of some experiment. The outcomes can be numerical (discrete) or nominal. Any subset of the sample space is an event.
ex: an experiment recording whether a child is a girl or a boy
S = {g,b} and E={g} F={b}
Sample spaces with equally likely outcomes
This refers to a sample space where each outcome has an equal probability of occurring, i.e., no outcome is weighted more than another.
In this scenario, if S contains N outcomes, each outcome has probability 1/N, and an event E has P(E) = (# of outcomes in E)/N.
If the experiment is arranging n distinct objects in order, the sample space has n! equally likely outcomes, one for each permutation. By the basic counting principle, if one experiment has m possible outcomes and another has n, together they have m*n outcomes; m experiments with n outcomes each have n^m outcomes.
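A Python sketch that builds an equally likely sample space with `itertools.product` and counts outcomes. The example (rolling a fair die) is a hypothetical illustration: two rolls give 6*6 outcomes, the event "sum is 7" gives P(E) = (# outcomes in E)/N, and three rolls give 6^3 outcomes.

```python
from fractions import Fraction
from itertools import product

die = range(1, 7)

# Two rolls of a fair die: 6 outcomes * 6 outcomes, all equally likely
S = list(product(die, repeat=2))
assert len(S) == 6 * 6

# P(E) = (# outcomes in E) / N, here E = "the two rolls sum to 7"
E = [outcome for outcome in S if sum(outcome) == 7]
p_E = Fraction(len(E), len(S))
print(p_E)   # 1/6

# m = 3 experiments with n = 6 outcomes each: n**m outcomes
assert len(list(product(die, repeat=3))) == 6 ** 3
```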