new Flashcards
Associative Law for Union and Intersection
(E U F) U G = E U (F U G)
(E or F) or G = E or (F or G)
(EF)G = E(FG)
(E and F) and G = E and (F and G)
Chebyshev's inequality
If we are trying to identify how much of a dataset lies within x̄ ± ks, where s = standard deviation and k = some number, then
% min = 100(1 - 1/k²)
The bound only applies for k > 1; for k ≤ 1 the guaranteed minimum is 0 and the inequality gives no information.
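A minimal Python sketch (the dataset is made up purely for illustration) comparing the Chebyshev minimum with the actual fraction of data inside x̄ ± ks:

```python
import statistics

# Hypothetical dataset, for illustration only
data = [12, 15, 9, 22, 18, 14, 30, 11, 16, 13]
xbar = statistics.mean(data)
s = statistics.stdev(data)          # sample standard deviation
k = 2

# Chebyshev guarantees at least 100*(1 - 1/k^2) % of data within xbar +/- k*s (k > 1)
bound_pct = 100 * (1 - 1 / k**2)

# Actual fraction of the data inside the interval
inside = sum(1 for x in data if xbar - k * s <= x <= xbar + k * s)
actual_pct = 100 * inside / len(data)

print(f"Chebyshev minimum: {bound_pct:.1f}%  actual: {actual_pct:.1f}%")
```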
Commutative law for union/intersection
E U F = F U E
Event E or F = Event F or E
EF=FE
Event E and F = Events F and E
Complement
If we have an event, E, within the sample space, S, then Ec = the complement of E, which includes every outcome in S that is not in E
Correlation Coefficient
r = [Σ (xi - x̄)(yi - ȳ)]/[(n-1)·sx·sy] = [Σ (xi - x̄)(yi - ȳ)]/[Σ (xi - x̄)² · Σ (yi - ȳ)²]^0.5
For a paired dataset (xi, yi) with respective means x̄ and ȳ, this statistic measures how closely the pairs follow a linear relationship y = mx + b
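A short sketch with toy numbers (assumed for illustration) computing r both ways and checking that the two forms agree:

```python
import math

# Toy paired data, assumed for illustration
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
n = len(x)

xbar = sum(x) / n
ybar = sum(y) / n
sx = math.sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))
sy = math.sqrt(sum((yi - ybar) ** 2 for yi in y) / (n - 1))

cross = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

# Form 1: divide by (n - 1) * sx * sy
r1 = cross / ((n - 1) * sx * sy)
# Form 2: divide by sqrt(sum of squared x-deviations * sum of squared y-deviations)
r2 = cross / math.sqrt(sum((xi - xbar) ** 2 for xi in x)
                       * sum((yi - ybar) ** 2 for yi in y))

print(round(r1, 6), round(r2, 6))   # the two forms agree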
Cumulative Frequency
This plots, for each bin, the running (cumulative) total of frequencies up through that bin.
These plots are also called ogives
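A minimal sketch (bin limits and frequencies are made up) of turning a frequency table into the cumulative totals an ogive plots:

```python
from itertools import accumulate

# Hypothetical frequency table: bin upper limits and their frequencies
bins = [10, 20, 30, 40, 50]
freq = [3, 7, 12, 5, 2]

cum_freq = list(accumulate(freq))   # running total of frequencies

for upper, cf in zip(bins, cum_freq):
    print(f"<= {upper}: {cf}")
```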
De Morgan's Laws
(E U F)c = EcFc
Neither E nor F occurs = E does not occur and F does not occur
(EF)c = Ec U Fc
E and F do not both occur = E does not occur or F does not occur
Gini Coefficient
The Gini coefficient (G) is a scaled measure of the area between the line of perfect equality, L(p) = p, and the Lorenz curve. It has a minimum value of 0 (perfect equality) and a maximum value of 1 (complete inequality)
G = 1 - 2B, where B = area under the Lorenz curve, L(p)
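A rough numerical sketch, assuming the Lorenz curve is only known at a grid of points (the curve below is hypothetical); B is approximated with the trapezoid rule and then G = 1 - 2B:

```python
# Hypothetical Lorenz curve sampled at evenly spaced p values
p = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
L = [0.0, 0.05, 0.15, 0.32, 0.58, 1.0]   # L(p): cumulative income share

# Trapezoid-rule approximation of B, the area under the Lorenz curve
B = sum((L[i] + L[i + 1]) / 2 * (p[i + 1] - p[i]) for i in range(len(p) - 1))

G = 1 - 2 * B          # Gini coefficient: 0 = perfect equality, 1 = total inequality
print(round(G, 3))
```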
How to use the weighted probability of E?
If tasked with finding P(F|E), where E is the event observed second,
then P(F|E) = P(FE)/P(E)
where P(FE) = P(F)P(E|F) and P(E) = P(F)P(E|F) + P(Fc)P(E|Fc)
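A minimal sketch of this computation with made-up numbers (a hypothetical screening-test scenario): F = "condition present", E = "test positive".

```python
# Hypothetical probabilities, for illustration only
P_F = 0.01            # P(F): prior probability the condition is present
P_E_given_F = 0.95    # P(E|F): test positive given the condition
P_E_given_Fc = 0.08   # P(E|Fc): false-positive rate

# P(FE) = P(F) * P(E|F)
P_FE = P_F * P_E_given_F
# Weighted probability of E: P(E) = P(F)P(E|F) + P(Fc)P(E|Fc)
P_E = P_F * P_E_given_F + (1 - P_F) * P_E_given_Fc

# P(F|E) = P(FE) / P(E)
P_F_given_E = P_FE / P_E
print(round(P_F_given_E, 4))
```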
Independent events
If P(E|F) = P(E) then E and F are independent: knowing that F occurred gives no information about E
For independent events, P(EF) = P(E)P(F), and for three mutually independent events, P(EFG) = P(E)P(F)P(G)
Independence typically arises when events come from unrelated processes or separate places.
Ex: COVID cases in Mexico and in Greenland on some day; two balls drawn from a jar with replacement (without replacement, the second draw depends on the first, so the draws are not independent)
Lorenz Curve
The Lorenz curve L(p) is a cumulative curve of the income distribution: it plots the share of total income earned by the lowest-earning fraction p of the population.
mean
x̄ = Σx/n = Σ(v·f)/n
where v = bin value and f = frequency
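A small sketch (values and frequencies assumed for illustration) showing that the two forms of the mean agree:

```python
# Hypothetical binned data: value v with frequency f
v = [1, 2, 3, 4]
f = [5, 10, 3, 2]

n = sum(f)
mean_grouped = sum(vi * fi for vi, fi in zip(v, f)) / n   # Σ(v·f)/n

# Equivalent: expand the table into raw observations and use Σx/n
raw = [vi for vi, fi in zip(v, f) for _ in range(fi)]
mean_raw = sum(raw) / len(raw)

print(mean_grouped, mean_raw)   # identical
```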
Mean vs. Median when to use
Generally the mean gives a better overall description of the dataset. The median is more useful when probabilities are involved and/or the value is being used to understand the order (rank) within a group.
Ex: Housing. The mean income is best for estimating what the average person in an area can spend on a home, but if we want to design housing that 50% of the population could be expected to afford (P(affordable) = .5), then the median is more useful.
Median
This is the middle value of a sample when data is arranged from least to greatest
If n is odd, the median is the value in position (n+1)/2
If n is even, the median is the average of the values in positions n/2 and (n/2)+1
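A short sketch of the odd/even rule on made-up data (positions are 1-based, matching the (n+1)/2 formula):

```python
def median(values):
    """Median using the odd/even position rule (1-based positions)."""
    data = sorted(values)
    n = len(data)
    if n % 2 == 1:
        return data[(n + 1) // 2 - 1]             # position (n+1)/2
    return (data[n // 2 - 1] + data[n // 2]) / 2  # average of positions n/2 and n/2 + 1

print(median([7, 1, 5, 3, 9]))      # odd n -> middle value: 5
print(median([7, 1, 5, 3, 9, 11]))  # even n -> average of the two middle values: 6.0
```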
number of unique groups in a set
If we have a collection of n objects and we want to know how many unique groups of size r can be made when order does NOT matter
= n!/[(n-r)!r!]
This says that from a sample of size n we can choose r elements in this many distinct ways. If we have a sample space, this defines the number of potential outcomes that are possible.
If we want the probability that a particular kind of subset occurs within a group, it is given by (# of combinations in subset 1)·(# of combinations in subset 2)/(total # of combinations), as in the sketch below.
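A minimal sketch using Python's math.comb (the numbers n and r are arbitrary), checking it against the factorial formula:

```python
import math

n, r = 10, 3
# Number of size-r groups from n objects when order does NOT matter: n!/[(n-r)! r!]
print(math.comb(n, r))                                                    # 120
print(math.factorial(n) // (math.factorial(n - r) * math.factorial(r)))   # same: 120
```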
P(E) = ? as a weighted average
P(E) = P(E|F)P(F) + P(E|Fc)(1-P(F))
P(E) is the weighted average of the probability that E occurs given F has occurred and the probability that E occurs given F has not occurred, weighted by how likely F is.
This is useful where E occurs as a consequence of F.
P(E) Expansion using complements
P(E) = P(EF) + P(EFc)
Probability of E = Prob of E and F + Probability of E and not F
P(E|F) = ?
P(E|F) = P(EF)/P(F)
Probability of E occurring given F has occurred = the probability E and F occur divided by the probability F occurs
P(E|Fc) = ? (expand)
P(E|Fc) = P(EFc)/P(Fc)
This says that the probability of E occurring given that F does NOT occur equals the probability of E occurring and F not occurring, divided by the probability that F does not occur.
Permutations
A permutation is a specific ordering of a set of objects. A set of n distinct objects can be arranged in n! different orders.
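A tiny sketch confirming the n! count by brute force (the set of objects is arbitrary):

```python
import math
from itertools import permutations

items = ["a", "b", "c", "d"]           # n = 4 arbitrary objects
n = len(items)

print(math.factorial(n))               # 24 possible orderings
print(len(list(permutations(items))))  # brute-force count: also 24
```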
r meaning
If the slope relating y and x is < 0 then r < 0, and vice versa. The absolute value of r indicates how close the relationship is to linear.
If r is computed for (x, y), and w = a + bx and z = c + dy with b, d > 0, then
r(x, y) = r(w, z)
Sample 100p percentile
The data value such that at least 100p% of the data is less than or equal to it and at least 100(1-p)% is greater than or equal to it (the data point itself is included)
p=probability as a decimal
Sample Space
S = sample space = all possible outcomes of some experiment. The outcomes can be numerical or categorical (nominal). A subset of the sample space is an event
Ex: an experiment predicting the gender of a child:
S = {g, b}, with events E = {g} and F = {b}
Sample spaces with equally likely outcomes
This refers to a sample space where each outcome has an equal probability of occurring, aka there is no weight to a particular outcome
In this scenario each outcome has probability 1/N, so P(E) = (number of outcomes in E)/N
If the sample space consists of all orderings of n objects, it has n! outcomes, the number of unique orderings of those objects. If one experiment has m outcomes and a second has n outcomes, the combined experiment has m·n possible outcomes.
Three axioms of Probability
1: 0 ≤ P(E) ≤ 1
2: P(S) = 1
3: P(E1 U E2 U … U En) = Σ P(Ei) = P(E1) + P(E2) + … + P(En)
This assumes the events Ei are mutually exclusive, and says that in that case the probability that one of them occurs equals the sum of the individual probabilities.
Union
E U F = E or F, which means any outcome in either event E or event F (or both) counts.
When is conditional probability particularly useful?
It is used when there is limited information within a problem (you are attempting to derive the probability of an event based on other events)
or
It is the easiest way to find the probability of a cause or input to an event with new information (backwards reasoning)
P(E or Ec) = ?
P(E or Ec) = P(S) = 1
P(E or F) if E and F are not mutually exclusive
P(E or F) = P(E) + P(F) -P(EF)
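A quick sketch checking the formula on a single die roll, with E = "even" and F = "greater than 3" (events chosen arbitrarily):

```python
from fractions import Fraction

outcomes = set(range(1, 7))               # fair six-sided die
E = {x for x in outcomes if x % 2 == 0}   # "even": {2, 4, 6}
F = {x for x in outcomes if x > 3}        # "greater than 3": {4, 5, 6}

def P(event):
    return Fraction(len(event), len(outcomes))

lhs = P(E | F)                 # P(E or F) counted directly
rhs = P(E) + P(F) - P(E & F)   # inclusion-exclusion
print(lhs, rhs)                # both 2/3
```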
Odds of an event
The odds of an event occurring is the ratio of the probability of the event occurring to the probability of it not occurring
odds = P(E)/P(Ec)
Mass density function
This is the probability of a random variable being less than or equal to a given value; hence it is the integral of the density from the lower bound of the variable up to that value.
density function
This is a function describing the relative likelihood that a random variable takes a specified value; probabilities come from integrating it over an interval. The integral of the density function is the mass (cumulative) function.
If given a joint density function and asked to find P(x>y) what is the procedure?
P(x > y) = ∫ ∫ f(x, y) dy dx, where the inner integral runs over y from its lower limit up to x and the outer integral runs over the full range of x
This says that the probability that y is less than x is obtained by integrating the joint density over the region y < x, with x running from negative to positive infinity (or over the support of the density).
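A numerical sketch, assuming (for illustration only) that X and Y are independent unit exponentials with joint density f(x, y) = e^(-x-y) for x, y > 0, so P(X > Y) should come out to 1/2; scipy's dblquad takes y from 0 up to x and then x over its full range:

```python
import numpy as np
from scipy.integrate import dblquad

# Hypothetical joint density: independent unit exponentials, f(x, y) = exp(-x - y), x, y > 0
def f(y, x):                   # dblquad passes the inner variable (y) first
    return np.exp(-x - y)

# Outer integral: x from 0 to infinity; inner integral: y from 0 up to x
prob, err = dblquad(f, 0, np.inf, lambda x: 0, lambda x: x)
print(round(prob, 4))          # ~0.5, as expected for this symmetric choice
```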
Procedure for finding the joint probability function for discrete events?
- ID variables
- ID if independent (if independent then the probabilities can be separately evaluated and then multiplied using “and” statements)
- Logic through the first few probabilities
- Try to identify a pattern that can be used for combination/permutation notation
- If possible create an equation. If not, work through each probability (a sketch of this process follows below).
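A tiny sketch of the "independent, so multiply the marginals" step, using two hypothetical fair dice:

```python
from fractions import Fraction

# Marginal pmfs for two hypothetical independent fair dice
p_die = {face: Fraction(1, 6) for face in range(1, 7)}

# Since the dice are independent, the joint pmf is the product of the marginals
joint = {(a, b): p_die[a] * p_die[b] for a in p_die for b in p_die}

print(joint[(3, 5)])        # 1/36
print(sum(joint.values()))  # 1, a quick consistency check
```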
How to use combinations to find the discrete chance of an event occurring?
The probability of the event equals the number of combinations that produce the anticipated outcome divided by the total number of equally likely combinations.
In other words: count the unique combinations that meet the criterion, count all possible combinations, and divide the first by the second; the result is the fraction of outcomes meeting the criterion, i.e., the probability.
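A concrete sketch of this counting recipe with made-up numbers: the probability of drawing exactly 2 defective items when 4 are drawn (without replacement) from a hypothetical lot of 12 containing 5 defectives.

```python
import math

N, defective, drawn, want = 12, 5, 4, 2   # hypothetical lot

# Favorable combinations: choose 2 of the 5 defectives AND 2 of the 7 good items
favorable = math.comb(defective, want) * math.comb(N - defective, drawn - want)
total = math.comb(N, drawn)               # all equally likely ways to draw 4 of 12

print(favorable / total)                  # 210/495 ≈ 0.424
```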
Random Variable
A random variable does not describe a single outcome directly; it is a numerical value assigned to (derived from) the result of an experiment.
Expectation formula
E[X] = Σ x·P(X = x) for a discrete variable (or ∫ x·f(x) dx for a continuous variable with density f)
This is the weighted average of the variable's possible values, weighted by their probabilities
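A minimal sketch of the weighted-average idea for a discrete variable (the pmf below is made up); for a continuous variable you would integrate x·f(x) instead:

```python
# Hypothetical pmf of a discrete random variable X
pmf = {0: 0.1, 1: 0.4, 2: 0.3, 3: 0.2}

# E[X] = sum of x * P(X = x): a weighted average of the possible values
expectation = sum(x * p for x, p in pmf.items())
print(expectation)   # 1.6
```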