new Flashcards
Associative Law for Union and Intersection
(E ∪ F) ∪ G = E ∪ (F ∪ G)
(E or F) or G = E or (F or G)
(EF)G = E(FG)
(E and F) and G = E and (F and G)
Chebyshev's inequality
If we want to know how much of a dataset lies within x̄ ± ks, where s = standard deviation and k = some number greater than 1, then
% min = 100(1 - 1/k^2)
This requires k > 1; for k ≤ 1 the formula gives 0% or less, so it guarantees nothing
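The bound above can be sketched in a few lines of Python. The function name chebyshev_min_percent is made up for illustration, and returning 0 for k ≤ 1 is an assumption about how to report a bound that says nothing.

```python
def chebyshev_min_percent(k: float) -> float:
    """Minimum percent of data within k standard deviations of the mean."""
    if k <= 1:
        # Assumption: report 0 because the formula is 0 or negative here,
        # which means the bound guarantees nothing.
        return 0.0
    return 100.0 * (1.0 - 1.0 / k**2)


print(chebyshev_min_percent(2))  # 75.0
print(chebyshev_min_percent(3))
```

So at least 75% of any dataset lies within 2 standard deviations of its mean, whatever the shape of the distribution.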
Commutative law for union/intersection
E ∪ F = F ∪ E
Event E or F = Event F or E
EF = FE
Event E and F = Event F and E
Complement
If we have an event E within the sample space S, then E^c = complement of E and includes every outcome in S that is not in E
Correlation Coefficient
r = [Σ (xi - x̄)(yi - ȳ)] / [(n-1) sx sy] = [Σ (xi - x̄)(yi - ȳ)] / [Σ (xi - x̄)^2 · Σ (yi - ȳ)^2]^0.5
For a paired dataset (xi, yi) with means x̄ and ȳ, this statistic measures how closely the pairs follow a straight line y = mx + b. r ranges from -1 to 1: values near ±1 indicate a strong linear relationship (the sign of r matches the sign of the slope m), and values near 0 indicate little or no linear relationship
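As a rough check of the formula, here is a minimal sketch that computes r from the sums of deviations. The function name correlation and the sample data are hypothetical; the y values were chosen to lie exactly on y = 2x + 1.

```python
import math


def correlation(xs, ys):
    """Correlation coefficient r for paired data (xi, yi)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of products of deviations (numerator of r).
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Sums of squared deviations for x and y, computed separately.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]  # hypothetical data on the line y = 2x + 1
r = correlation(xs, ys)
print(r)  # 1.0
```

Reversing the y values gives a perfectly decreasing line, and r comes out at -1.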
Cumulative Frequency
This shows, for each bin, the running total of the frequencies of that bin and all bins before it.
A plot of cumulative frequency is also called an ogive
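A small sketch of the running total, using itertools.accumulate from the Python standard library. The bin frequencies are made-up numbers.

```python
from itertools import accumulate

frequencies = [2, 5, 8, 4, 1]  # hypothetical counts per bin, lowest bin first
cumulative = list(accumulate(frequencies))
print(cumulative)  # [2, 7, 15, 19, 20]
```

The last entry always equals the total number of observations, n.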
De Morgan's Laws
(E ∪ F)^c = E^c F^c
Neither E nor F occurs = E does not occur and F does not occur
(EF)^c = E^c ∪ F^c
E and F do not both occur = E does not occur or F does not occur
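Both laws can be checked on a concrete example with Python sets, where | is union, & is intersection, and the complement is taken relative to the sample space S. The sets S, E, and F below are made up for illustration.

```python
S = set(range(1, 11))  # hypothetical sample space {1, ..., 10}
E = {1, 2, 3, 4}
F = {3, 4, 5, 6}


def complement(A):
    """Everything in the sample space S that is not in A."""
    return S - A


# (E or F)^c equals E^c and F^c
first_law = complement(E | F) == complement(E) & complement(F)
# (E and F)^c equals E^c or F^c
second_law = complement(E & F) == complement(E) | complement(F)
print(first_law, second_law)  # True True
```

This checks one example only; it illustrates the laws rather than proving them.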
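Gini Coefficient
The Gini coefficient (G) is the area between the line of perfect equality, L(p) = p, and the Lorenz curve, divided by the total area under the line of equality (0.5). That area has a maximum value of 0.5 and a minimum value of 0, so G ranges from 0 (perfect equality) to 1 (maximum inequality)
G = 1 - 2B where B = area under the Lorenz curve, L(p)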
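One way to apply G = 1 - 2B is to estimate B with the trapezoid rule from a few points on the Lorenz curve. This is a sketch under that assumption; the function name gini_from_lorenz and the sample points are hypothetical.

```python
def gini_from_lorenz(p, L):
    """Estimate G = 1 - 2B, with B approximated by the trapezoid rule.

    p: cumulative population shares, increasing from 0 to 1
    L: Lorenz curve values L(p) at those points
    """
    B = sum(
        (p[i + 1] - p[i]) * (L[i] + L[i + 1]) / 2
        for i in range(len(p) - 1)
    )
    return 1 - 2 * B


# Hypothetical Lorenz curve: the poorest 50% hold 20% of the income, etc.
p = [0.0, 0.25, 0.5, 0.75, 1.0]
L = [0.0, 0.05, 0.2, 0.5, 1.0]
print(gini_from_lorenz(p, L))
```

For these points B is about 0.3125, so G comes out near 0.375. With more points on the curve the trapezoid estimate of B gets closer to the true area.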
How to use the weighted probability of E?
If tasked with finding P(F|E), where E is the second event and F is the first,
then P(F|E) = P(FE)/P(E)
where P(FE) = P(F)P(E|F) and P(E) = P(F)P(E|F) + P(F^c)P(E|F^c), so P(E) is the average of P(E|F) and P(E|F^c) weighted by P(F) and P(F^c)
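The two steps (build P(E) as a weighted sum, then divide) can be sketched as a small function. The name posterior and the input probabilities are made up for illustration.

```python
def posterior(p_f, p_e_given_f, p_e_given_fc):
    """P(F|E) from P(F), P(E|F), and P(E|F^c)."""
    p_fe = p_f * p_e_given_f               # P(FE) = P(F)P(E|F)
    p_e = p_fe + (1 - p_f) * p_e_given_fc  # P(E) = P(FE) + P(F^c)P(E|F^c)
    return p_fe / p_e


# Hypothetical numbers: P(F) = 0.01, P(E|F) = 0.9, P(E|F^c) = 0.05
print(posterior(0.01, 0.9, 0.05))
```

Here P(FE) = 0.009 and P(E) = 0.0585, so P(F|E) is roughly 0.154 even though P(E|F) is 0.9.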
Independent events
If P(E|F) = P(E) then E and F are independent: knowing that F occurred does not change the probability of E. Equivalently, P(EF) = P(E)P(F)
For three mutually independent events, P(EFG) = P(E)P(F)P(G)
Independence typically arises when the events have no influence on each other, for example because they happen in separate, unconnected locales.
Ex: Covid cases in Mexico and Greenland on some day; two balls selected from a jar with replacement (without replacement, including drawing both at once, the second ball's chances depend on the first)
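Independence can be tested directly by comparing P(EF) with P(E)P(F). The sketch below uses two fair dice, which is an example not taken from the cards above; the events E, F, and G and the helper prob are hypothetical names.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))


def prob(event):
    """Probability of an event given as a true/false test on an outcome."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))


def E(o):
    return o[0] == 6          # first die shows 6


def F(o):
    return o[1] % 2 == 0      # second die is even


def G(o):
    return o[0] + o[1] == 12  # the two dice sum to 12


p_ef = prob(lambda o: E(o) and F(o))
p_eg = prob(lambda o: E(o) and G(o))
independent_ef = p_ef == prob(E) * prob(F)
independent_eg = p_eg == prob(E) * prob(G)
print(independent_ef, independent_eg)  # True False
```

E and F involve different dice, so they are independent. G depends on the first die, so E and G are not.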
Lorenz Curve
This is a cumulative curve showing the income distribution: L(p) is the share of total income held by the poorest fraction p of the population
Mean
x̄ = Σx/n = Σ(v·f)/n
where v = bin value, f = frequency, and n = Σf
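A quick sketch of the binned form of the mean, with made-up bin values and frequencies. It assumes n is the total count, that is, the sum of the frequencies.

```python
values = [10, 20, 30]  # hypothetical bin values v
freqs = [2, 5, 3]      # hypothetical frequencies f

n = sum(freqs)  # assumption: n is the total number of observations
mean = sum(v * f for v, f in zip(values, freqs)) / n
print(mean)  # 21.0
```

This matches the plain Σx/n form applied to the expanded list 10, 10, 20, 20, 20, 20, 20, 30, 30, 30.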
Mean vs. Median when to use
Generally the mean gives a better description of the dataset as a whole. The median should be used when probabilities are involved and/or the value is being used to understand the ordering of a group.
Ex: Housing. The mean income is best for determining what the average person in an area can spend on a home, but if we want to design housing that 50% of the population can afford (P(Affordable) = 0.5) then the median is more useful.
Median
This is the middle value of a sample when the data are arranged from least to greatest
If n is odd then the median is the value at position (n+1)/2
If n is even then the median is the average of the values at positions n/2 and (n/2)+1
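The odd and even cases can be sketched as below. Positions in the rule are 1-based while Python lists are 0-based, so position (n+1)/2 becomes index n // 2. The function name median is just for illustration; the standard library also offers statistics.median.

```python
def median(data):
    """Middle value of the sorted data (average of the two middle values if n is even)."""
    s = sorted(data)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: 1-based position (n+1)/2 is 0-based index n // 2.
        return s[mid]
    # Even n: 1-based positions n/2 and n/2 + 1 are indices mid - 1 and mid.
    return (s[mid - 1] + s[mid]) / 2


print(median([7, 1, 3]))     # 3
print(median([4, 1, 3, 2]))  # 2.5
```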
Number of unique groups in a set
If we have a collection of n objects and we want to know how many unique combinations of size r can be made when order does not matter
= n!/[(n-r)! r!]
This says that from a set of n objects we can choose r elements in this many unique ways. For a sample space built this way, it gives the number of possible outcomes.
If we want to know the probability of a particular mix of subsets occurring within a group, it is given by (# of combinations in subset 1)*(# of combinations in subset 2)/(total # of combinations)
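Python's math.comb(n, r) evaluates n!/[(n-r)! r!] directly, so both the count and the subset ratio are easy to sketch. The jar of 5 red and 3 blue balls is a made-up example.

```python
from fractions import Fraction
from math import comb

# Number of ways to choose 3 objects out of 8.
total = comb(8, 3)
print(total)  # 56

# Hypothetical jar: 5 red and 3 blue balls, draw 3 without replacement.
# Probability of exactly 2 red and 1 blue:
# (ways to pick 2 of 5 red) * (ways to pick 1 of 3 blue) / (ways to pick 3 of 8)
p = Fraction(comb(5, 2) * comb(3, 1), total)
print(p)  # 15/28
```

Fraction keeps the answer exact; float(p) gives the decimal value if that is more convenient.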