Econometrics Final Flashcards
Measures of Central Tendency + Advantages+Limitations
Info on the center/average of the data values
Mean (Most Commonly Used Unless Outliers Exist): arithmetic average, sum of the values divided by their number; affected by extreme values (outliers)
* For a population of N values: mean = (x_1+x_2+…+x_N)/N = Sum of Population Values/Population Size
* For a sample of size n: mean = (x_1+x_2+…+x_n)/n = Sum of Observed Values/Sample Size
* population mean & sample mean generally aren't equal, since the sample mean varies from sample to sample
Median: midpoint of ranked values, 50% above & below, not affected by extreme values (outliers)
Median and mode together are useful for visualizing the distribution (ex. skewed)
Mode: most frequently observed value; not affected by extreme values (outliers); used for discrete, numerical or categorical data (there may be no mode, one mode, or many)
Sample Size vs Population Size
A sample is a subset of the population used to generalize to the entire population. Data drawn from the sample alone, without info on random assignment and the size of the sample, is not enough to understand the certainty of the statistics.
Population size accounts for every single person in population.
Skew of graph if Mean < Median & Mean > Median
Mean < Median → left skewed; Mean > Median → right skewed
Geometric Mean Vs Geometric Mean Rate of Return
GM=(X_1 x X_2 x … x X_n)^(1/n)
GMRR=(X_1 x X_2 x … x X_n)^(1/n)-1
Suppose you invested $100 in stocks and, after 5 years, the value of stocks becomes $125 worth. What is the average annual compound rate of returns?
$100(1+r)^5=125
r=4.6%
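A quick numeric check of the compound-rate answer (a minimal Python sketch using only the card's numbers):

```python
# Solve 100*(1+r)^5 = 125 for r: the geometric mean rate of return.
initial, final, years = 100, 125, 5
r = (final / initial) ** (1 / years) - 1
print(f"r = {r:.1%}")  # r = 4.6%
```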
Summation Operator + 5 Properties
Σ_{i=1}^n x_i = x_1+x_2+…+x_n, the sum of a sequence of numbers {x_1,x_2,…,x_n}
- Common factor/coefficient can be factored out: Σ_{i=1}^n c·x_i = cx_1+cx_2+…+cx_n = c(x_1+x_2+…+x_n) = c·Σ_{i=1}^n x_i
- If x_i = 1 for all i, then Σ_{i=1}^n c = c·Σ_{i=1}^n 1 = c(1+1+…+1) = cn
- Addition/subtraction inside the sum can be split into individual summations: Σ_{i=1}^n (x_i+y_i) = (x_1+y_1)+(x_2+y_2)+…+(x_n+y_n) = (x_1+x_2+…+x_n)+(y_1+y_2+…+y_n) = Σ_{i=1}^n x_i + Σ_{i=1}^n y_i
- Double Summations: Σ_{i=1}^n Σ_{j=1}^m x_i·y_j = (Σ_{i=1}^n x_i)(Σ_{j=1}^m y_j)
Ex. Σ_{i=1}^2 Σ_{j=1}^2 x_i·y_j = Σ_{i=1}^2 (x_i·y_1+x_i·y_2) = (x_1y_1+x_1y_2)+(x_2y_1+x_2y_2)
- Sum of a sequence minus its mean is zero: Σ_{i=1}^n (x_i-x̄) = 0
(x_1-x̄)+(x_2-x̄)+(x_3-x̄)+…+(x_n-x̄)
= (x_1+x_2+x_3+…+x_n) - n·x̄
= Σ_{i=1}^n x_i - n·(Σ_{i=1}^n x_i)/n = Σ_{i=1}^n x_i - Σ_{i=1}^n x_i = 0
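The summation-operator properties can be verified numerically (a sketch; the sample values are arbitrary):

```python
# Check the summation-operator properties on a small sample.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
c, n = 3.0, len(x)

assert sum(c * xi for xi in x) == c * sum(x)                    # factor out a constant
assert sum(c for _ in range(n)) == c * n                        # sum of a constant is c*n
assert sum(xi + yi for xi, yi in zip(x, y)) == sum(x) + sum(y)  # split addition
assert sum(xi * yj for xi in x for yj in y) == sum(x) * sum(y)  # double summation factors
x_bar = sum(x) / n
assert sum(xi - x_bar for xi in x) == 0                         # deviations sum to zero
```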
What increases the Certainty/Confidence/Accuracy of Statistical Test
Size: larger sample size, more representative of population distribution
Random Assignment: no systematic confounding variable, more representative of population distribution
2-Sample T-Test
Rejecting the null hypothesis and supporting the alternative happen simultaneously in a statistical test comparing two sample means
Variability + Ways to Measure
Info on the spread/variability/distribution of the data values
1. Range: difference between the largest & smallest observations = largest x - smallest x
* D: Ignores distribution & sensitive to outliers
2. Interquartile Range: midspread, middle 50%, difference between the 75th and 25th percentiles x_75%-x_25%
3. Variance: dispersion of data points from the mean on average; weighted average of squared distances b/w data points & the mean: σ² = Σ(x_i-E[X])²/N for a population vs s² = Σ(x_i-x̄)²/(n-1) for a sample
* A: every value in the data set is accounted for with its weight; squaring avoids -ve and +ve deviations cancelling out
* D: units uninterpretable
4. Standard Deviation: variation about the mean with same units as original data, most common = square root of variance
* D: hard to compare 2+ datasets with different units; gives no sense of spread relative to the mean
5. Coefficient of Variation measures variation relative to mean to compare 2+ sets of data in different units as they cancel out and becomes a unit free measure = (standard deviation/mean) x 100%
6. Empirical Rule: without plotting, gives lots of info on where the majority of the data distribution is; if the data distribution is approximated by a normal distribution, then the interval:
* E[X] +/- 1 standard deviation contains 68% of the values in the data set
* E[X] +/- 2 standard deviations contains 95% of the values in the data set
* E[X] +/- 3 standard deviations contains 99.7% of the values in the data set
7. Weighted Mean: x̄ = Σ_{i=1}^n w_i·x_i = w_1x_1+w_2x_2+…+w_nx_n, w_i = weight of the ith observation; for data paired into n classes, all weights sum to 100% = 1
8. Covariance: how dependent the two variables are on each other; direction of the linear relationship b/w 2 variables; the sign matters: 0 means linearly unrelated, +ve means they move in the same direction, -ve means they move in opposite directions. Weighted average of the products of x & y deviations from their respective means, range (-inf,+inf). Cov(x,y) = σ_xy = Σ_{i=1}^N (x_i-x̄)(y_i-ȳ)/N (population) or /(n-1) (sample)
* D: units are meaningless, uninterpretable
9. Coefficient of Correlation: relative strength and direction of linear relationship b/w 2 variables with different units, unit free, deviation from y=x, stronger correlation means data points are close to the line
Sign depends on the covariance, since standard deviations are always positive; range [-1, 1]
= Cov(x,y)/(S_x·S_y)
Compare coefficient of Variation:
Stock A: Avg Price=$50, SD=$5
Stock B: Avg Price=$100, SD=$5
A: CV = (5/50) × 100% = 10%
B: CV = (5/100) × 100% = 5%
Both have the same standard deviation; however, Stock B is less variable relative to its price.
Avg stock price $800, standard deviation $100, what interval will 95% of stock price be in?
mean +/- 2 standard deviations contains 95% of the values in the data set
(800-2(100),800+2(100))
(600,1000)
Calculate final grade given Exam(45%)=70%, Participation(30%)=90%, Iclicker(5%)=0, Quiz(20%)=100%
Final Grade=
Σ_{i=1}^4 w_i·x_i = 31.5+27+0+20 = 78.5
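The grade calculation can be sketched in Python (the card's weights expressed as decimals):

```python
# Weighted mean: final grade = sum of w_i * x_i, with the weights summing to 1.
weights = [0.45, 0.30, 0.05, 0.20]   # Exam, Participation, Iclicker, Quiz
grades = [70, 90, 0, 100]
assert abs(sum(weights) - 1.0) < 1e-12
final = sum(w * g for w, g in zip(weights, grades))
print(f"{final:.1f}")  # 78.5
```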
Illustrate a correlation of r= -1, -0.6, 0, 1, 0.3
A1: Given (x_1,y_1)=(11,52), (x_2,y_2)=(13,72), (x_3,y_3)=(15,62), calculate a) Sample Variance b) Sample Covariance c) Sample Correlation Coefficient
a.4
b.10
c.1/2
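These answers can be verified with a short sketch (n-1 denominators for the sample statistics):

```python
# Sample variance, covariance, and correlation for the three (x, y) pairs.
x = [11, 13, 15]
y = [52, 72, 62]
n = len(x)
mx, my = sum(x) / n, sum(y) / n

var_x = sum((xi - mx) ** 2 for xi in x) / (n - 1)
var_y = sum((yi - my) ** 2 for yi in y) / (n - 1)
cov_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
corr = cov_xy / (var_x ** 0.5 * var_y ** 0.5)

print(var_x, cov_xy, corr)  # 4.0 10.0 0.5
```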
A1: What will be the price range of 95% of stock given avg price=$650 & standard deviation=$100
450-850
A1: n=5, x_i{1,2,3,4,5} a)sum b)mean c)variance d)sum of x_i-x mean
a.15
b.3
c.2.5
d.0
A1: prove using summation operator a)Eax=aEx b)E(x+y)=Ex+Ey
c)E(ax+by)=aEx+bEy d)EEabxy=abExEy
See doc
Probability
A set of outcomes whose likelihood is defined by a function, relative frequency of an outcome occurring when random experiment repeated infinitely many times (Formal Definition: a function from the space of sets to the space of real values between 0 and 1)
RV + Basic Outcome + Sample Space + Event
Random Experiment: a process leading to an uncertain outcome (ex.Dice, coin flip)
Basic Outcome: a possible outcome of a random experiment (ex. 1,2,3,4,5,6)
Sample Space: collection of all possible outcomes of a random experiment (ex. S={1,2,3,4,5,6})
Event: any subset of basic outcomes from the sample space (ex. Let A be the event “Number rolled even”, then A={2,4,6}), if outcome of experiment is in A, then event A has occurred
Outcomes of rolling 2 dice. a)Identical Dice b)Diff dice
a) 21 (dice indistinguishable: unordered pairs)
b) 36 (dice distinguishable: 6 × 6 ordered pairs)
6 Types of Probability Set Relationships + Draw
- **Empty Set** (∅): set with no elements in it; defines **Mutually Exclusive**: A∩B = ∅
- **Subset** (A⊂B): any element of A is also in B, so A∪B = B and A∩B = A
- **Intersection of Events** (∩): set of all outcomes that belong to both A & B in S, if A & B are events in a sample space S
- **Union of Events** (∪): set of all outcomes that belong to A or B (or both) in S, if A & B are events in a sample space S
- **Complement** (Ā): set of all basic outcomes that don't belong to A in S, so that S = A∪Ā
- **Collectively Exhaustive**: the collection of events completely covers S: A∪B = S
4 Properties of Set Operations + Draw
- Commutative (Order): AuB=BuA
- Associative (Grouping): (A∪B)∪C = A∪(B∪C)
- Distributive Law: An(BuC)=(AnB)u(AnC)
- De Morgan's Law: complement of (A∪B) = Ā∩B̄, and complement of (A∩B) = Ā∪B̄
Given S={1,2,3,4,5,6} A={2,4,6},B={4,5,6},C={4,5} find complements, intersections, unions, subset
See doc
Probability as Relative Frequency
P(A) = lim_{n→∞} n_A/n = # of events in the population that satisfy A / total # of events in the population
Repeating the experiment as n approaches infinity and counting the number of times event A occurs (n_A) gives the relative frequency of event A occurring
Factorial & Combination & Permutation Formula
Factorial Formula: n!, number of ways to order n objects
- How many ways to order n=8 runners in a sequence
Combination Formula: C(n,k) = n!/(k!(n-k)!), number of unordered ways in which k objects can be selected from n objects
- How many ways to pick 3 (k=3) medal winners out of 8 (n=8) runners, regardless of which medal each gets
- A true "combination" lock would accept 1-2-3, 2-1-3, and 3-2-1 as the same code
- Has fewer outcomes: counts groupings only, not the orders within each grouping
Permutation Formula: P(n,k) = n!/(n-k)!, n = total objects, k = limited spots; counts ordered selections = # of groupings × # of orders within each grouping
- How many ways to award 1st, 2nd, 3rd place (k=3), where who gets which medal matters
- A true "permutation" lock only accepts 1-2-3
- Has more outcomes: orders within each grouping × groupings > groupings alone
Q. 5 Candidates, 2 positions, 3 men, 2 women, every candidate likely to be chosen, probability that no women will be hired:
- Total # of combinations: C(5,2) = 5!/(2!(5-2)!) = 10
- Total combinations where only men are hired: C(3,2) = 3!/(2!(3-2)!) = 3
- Probability=# of events in population that satisfy A/total # of events in population=3/10=30%
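The counting argument can be checked with `math.comb`:

```python
from math import comb

# P(no women hired) = all-male pairs over all possible pairs of hires.
total = comb(5, 2)       # C(5,2) = 10
men_only = comb(3, 2)    # C(3,2) = 3
print(men_only / total)  # 0.3
```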
Probability as a Set Function + 3 Properties
Probability as a Set Function: real-valued set function P that assigns to each event A in the sample space S, a number P(A) that satisfies the following 3 properties:
1. Always non-negative: P(A) ≥ 0
2. P(S)=100%=1=probability of all outcomes
3. Mutually exclusive events' probabilities sum (addition rule): P(A_1∪A_2∪…∪A_k) = P(A_1)+P(A_2)+…+P(A_k), or P(A∪B) = P(A)+P(B)
5 Probability Rules + Draw
- Complement Rule: P(Ā) = 1-P(A), since 1 = P(A)+P(Ā)
- Addition Rule: P(A∪B) = P(A)+P(B)-P(A∩B), draw diagram see notes
Partition of A: P(A) = P((A∩B)∪(A∩B̄)) = P(A∩B)+P(A∩B̄)
- Mutually Exclusive Addition Rule: P(A∪B) = P(A)+P(B), and P(∅) = 0
- If A⊂B, then P(A) ≤ P(B)
- For any A & B: P(A∩B) ≥ P(A)+P(B)-1
- P(A∪Ā) = 1
- P(Ā|Ā) = 1
Draw Probability Table + Table of Cards AcevsnonAce + Table of P(A)=P(AnB)+P(AnhatB)
See doc
Conditional Probability & Multiplication Rule
Conditional Probability: probability of one event A given another event B is true/has occurred; B becomes the new total sample space within which A must be contained
P(A|B) = P(A∩B)/P(B) = # of outcomes in A∩B / total # of outcomes in B
P(B|A) = P(A∩B)/P(A) = # of outcomes in A∩B / total # of outcomes in A
**Multiplication Rule**: rearranging conditional probability: P(A∩B) = P(A|B)P(B) or P(A∩B) = P(B|A)P(A)
Outcome is an even number. What is the probability of having rolled a 6
S={1,2,3,4,5,6}, A={6}, B={2,4,6}; P(A|B) = P(A∩B)/P(B) = (1/6)/(3/6) = 1/3
Probability that at least one die is equal to 2 when the sum of two numbers is less than or equal to 3
S = all 36 ordered pairs {(1,1),…,(6,6)}; A = {at least one die shows 2}, B = {(1,1),(1,2),(2,1)}
A∩B = {(1,2),(2,1)}, so P(A|B) = P(A∩B)/P(B) = (2/36)/(3/36) = 2/3
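Enumerating all 36 outcomes confirms the conditional probability (a sketch using exact fractions):

```python
from fractions import Fraction

# P(at least one die shows 2 | sum of the two dice <= 3).
outcomes = [(i, j) for i in range(1, 7) for j in range(1, 7)]
B = [o for o in outcomes if sum(o) <= 3]    # {(1,1), (1,2), (2,1)}
A_and_B = [o for o in B if 2 in o]          # {(1,2), (2,1)}
p = Fraction(len(A_and_B), len(B))          # P(A|B) = P(A∩B)/P(B)
print(p)  # 2/3
```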
Probability of getting a red ace using multiplication rule.
Does P(A) = P(A∩B)+P(A∩B̄) = P(A|B)P(B)+P(A|B̄)P(B̄)? Yes, by the total law of probability
P(Red∩Ace) = P(Red|Ace)P(Ace) = (2/4)(4/52) = 2/52 = 1/26
Statistically Independent:
Events are independent if any of the conditions below holds; each one's probability is unaffected by the other (ex. shape of a coin vs flipping heads)
P(AnB)=P(A)P(B)
P(A|B)=P(A) because the condition of B has no effect on probability of A
P(B|A)=P(B) because the condition of A has no effect on probability of B
A{2,4,6}, B{1,2,3,4} Statistically independent?
Yes, because
P(A∩B) = P(A)P(B)
2/6 = (3/6)(4/6) = 1/3
Bivariate Probabilities & Joint Distribution of X & Y & Marginal Probabilities
Draw Table + Diagram
Bivariate Probabilities: probabilities that events A & B occur together when there are two random variables in your scenario
Joint Distribution of X{xi} & Y{yi}: described by bivariate probabilities
Marginal Probabilities: the probability of a single event occurring, independent of other events
(A∩B_i) pieces are mutually exclusive; if the B_i are collectively exhaustive: P(A) = P(A∩B_1)+P(A∩B_2)+…+P(A∩B_k)
See doc for table & diagram
Difference b/w Joint Probability, Marginal Probability & Conditional Probability
Joint probability is the probability of two events occurring simultaneously.
Marginal probability is the probability of an event irrespective of the outcome of another variable.
Conditional probability is the probability of one event occurring given that a second event has occurred.
Total Law of Probability + Draw
Total Law of Probability: mutually exclusive & collectively exhaustive events B_i partition A into k mutually exclusive pieces such that
A = (A∩B_1)∪(A∩B_2)∪…∪(A∩B_k); therefore, using the addition rule: P(A) = P(A∩B_1)+P(A∩B_2)+…+P(A∩B_k) = Σ_{i=1}^k P(A∩B_i)
If the B_i are mutually exclusive & collectively exhaustive (B_i∩B_j = ∅, S = B_1∪B_2∪…∪B_k), subbing in the multiplication rule gives P(A) = Σ_{i=1}^k P(A|B_i)P(B_i) for any A
Bayes’ Theorem + Proof
Bayes' Theorem: combines all previous concepts into one expression; how old info (the prior) combined with new info changes the probability
P(B|A) = P(A∩B)/P(A) = P(A|B)P(B)/P(A) = P(A|B)P(B) / (P(A|B)P(B)+P(A|B̄)P(B̄))
Proof:
1. Conditional Probability: P(B|A) = P(A∩B)/P(A)
2. Sub in Multiplication Rule P(A∩B) = P(A|B)P(B): P(B|A) = P(A|B)P(B)/P(A)
3. B & B̄ are mutually exclusive & collectively exhaustive, so A = (A∩B)∪(A∩B̄) and P(A) = P(A∩B)+P(A∩B̄):
P(B|A) = P(A|B)P(B) / (P(A∩B)+P(A∩B̄))
4. Sub in Multiplication Rule P(A∩B) = P(A|B)P(B) & P(A∩B̄) = P(A|B̄)P(B̄)
General Theorem (see doc): P(B_i|A) = P(A|B_i)P(B_i) / (P(A|B_1)P(B_1)+…+P(A|B_k)P(B_k)) = P(A|B_i)P(B_i) / Σ_{i=1}^k P(A|B_i)P(B_i)
1. Conditional Probability: P(B_i|A) = P(A∩B_i)/P(A)
2. Total Law of Probability: P(A) = Σ_{i=1}^k P(A|B_i)P(B_i), so P(B_i|A) = P(A∩B_i) / Σ_{i=1}^k P(A|B_i)P(B_i)
3. Multiplication Rule: sub in P(A∩B_i) = P(A|B_i)P(B_i)
P(B_i|A) = P(A|B_i)P(B_i) / Σ_{i=1}^k P(A|B_i)P(B_i)
Your probability of having the covid antibody (B) if 10% of the population has the covid antibody (P(B)=10%) & your test is positive (A is true)
True Positive P(A|B)=97.5%: if you have the antibody (B) → probability of a positive test (A) is 97.5%
False Positive P(A|B̄)=12.5%: if you don't have it (B̄) → probability of a positive test (A) is 12.5%
P(B|A)=?
P(A|B)=97.5%
P(A|B̄)=12.5%
P(B)=10%
P(B̄)=90%
P(B|A) = P(A|B)P(B) / (P(A|B)P(B)+P(A|B̄)P(B̄)) = (97.5%)(10%) / ((97.5%)(10%)+(12.5%)(90%)) = 46.4%
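The same computation as a short Python sketch (numbers from the card):

```python
# Bayes' theorem: P(antibody | positive test).
p_B = 0.10               # P(B): has the antibody
p_A_given_B = 0.975      # true positive rate P(A|B)
p_A_given_notB = 0.125   # false positive rate P(A|B-bar)

# Total law of probability for the denominator P(A):
p_A = p_A_given_B * p_B + p_A_given_notB * (1 - p_B)
p_B_given_A = p_A_given_B * p_B / p_A
print(f"{p_B_given_A:.1%}")  # 46.4%
```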
A2:
a)Prove that AcB, then P(A)<=P(B)
b) For any A & B, P(AnB)>=P(A)+P(B)-1
a) Since A⊂B, write B = A∪(Ā∩B) (mutually exclusive, and A∩B = A); by the addition rule and the first property of probability P(Ā∩B) ≥ 0, P(B) = P(A)+P(Ā∩B) ≥ P(A)
b) Start from the addition rule, use P(A∪B) ≤ 1, and rearrange: P(A∩B) = P(A)+P(B)-P(A∪B) ≥ P(A)+P(B)-1
See doc
A2: Given P(A)=0.3, P(B|A)=0.6, P(B|Ā)=0.6, find P(Ā|B̄)
Find the elements of P(Ā|B̄) = P(Ā∩B̄)/P(B̄)
or, with P(B) = P(B|A)P(A)+P(B|Ā)P(Ā) = 0.6(0.3)+0.6(0.7) = 0.6:
P(Ā|B̄) = P(B̄|Ā)P(Ā)/P(B̄) = (1-P(B|Ā))(1-P(A))/(1-P(B)) = (1-0.6)(1-0.3)/(1-0.6) = 0.7 (see doc)
0.7
A2: 8 candidates, 2 jobs, 4 women, 4 men, 1 set of brothers
a) Total combinations where only men are hired
b) Total combinations of where brothers are hired
c) Total combinations of where only men and only brothers are hired
a) C(4,2)/C(8,2) = only-men combos/total combos = 6/28 = 3/14
b) C(2,2)/C(8,2) = 1/28
c) 1/28, since the brothers-hired event is a subset of the only-men event (B⊂A, so A∩B = B)
A2: See 5 6 7 in doc
See doc
Random Variable
Random Variable (X): a function which maps an outcome s of an experiment to a number X(s); represents a possible numerical value from a random experiment
Discrete: with limited countable outcomes (Dice, coin)
P(X∈A) = Σ_{x∈A} P(X=x) = P(X=0)+P(X=1)+…+P(X=n), X = RV, x = a constant value
Continuous: with infinite outcomes (Height)
Space of X: S_X = {x : X(s)=x, s∈S}
Probability Mass Function vs Cumulative Distribution Function
a) Flip 2 coins, X = # of heads
b) Roll a die, X = number shown on the die
Probability Mass Function f_x(x)=P(X=x)
Discrete Properties:
1. Always non-negative, b/w 0 and 1: 0 ≤ f_X(x) ≤ 1
2. Σ_{x∈S_X} f_X(x) = 1 = 100%
3. Since distinct outcomes are mutually exclusive, probabilities can be summed: P(X∈A) = Σ_{x∈A} f_X(x)
Ex. two coins, X = # of heads:
f_X(x) = 1/4 if x=0
1/2 if x=1
1/4 if x=2
Cumulative Distribution Function: F(x_0) = P(X ≤ x_0) = Σ_{x ≤ x_0} f_X(x)
Ex. one die:
F(x_0) = 1/6 if x_0=1
2/6 if x_0=2
…
6/6 if x_0=6
Expected Value
Q. 2 Coins + Rolling dice
E(X) = Σ_x x·f_X(x)
Two Coins Expected Value: E(X) = 0(1/4)+1(1/2)+2(1/4) = 1
Rolling Die Expected Value: f_X(i) = P(X=i) = 1/6
E(X) = Σ_{i=1}^6 i·(1/6) = 1(1/6)+2(1/6)+3(1/6)+4(1/6)+5(1/6)+6(1/6) = 3.5
Variance + Standard Deviation
Q. 2 Coins + Rolling dice
Variance: σ² = E[(X-E[X])²] = Σ_x (x-E[X])²·f_X(x), measure of spread/squared distance from the mean; uninterpretable units, but squaring prevents deviations cancelling
Standard Deviation: σ = √σ² = √(Σ_x (x-μ)²·f_X(x)), measure of spread/distance from the mean in the original, interpretable units
2 Coins: σ² = (0-1)²(1/4)+(1-1)²(1/2)+(2-1)²(1/4) = 0.5, so σ = √0.5 ≈ 0.707
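The two-coin numbers can be reproduced from the PMF directly (a sketch):

```python
# E[X], Var(X), and SD for X = number of heads in two fair coin flips.
pmf = {0: 0.25, 1: 0.5, 2: 0.25}

mean = sum(x * p for x, p in pmf.items())               # E[X] = 1.0
var = sum((x - mean) ** 2 * p for x, p in pmf.items())  # Var(X) = 0.5
sd = var ** 0.5                                         # about 0.707
print(mean, var, round(sd, 3))
```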
Functions of Discrete Random Variables
Q. 1 coin X=1 heads, X=0 tail, g(1)=100, g(0)=0 Find expected value
E[g(X)] = Σ_x g(x)·f_X(x)
E[g(X)] = g(0)(0.5)+g(1)(0.5) = 0+50 = 50
Bernoulli Probability Distribution vs Binomial Probability DIstribution
Bernoulli Probability Distribution: random variables with only 2 possibilities
Binomial Distribution: sequence of n independent Bernoulli Random Variables/multiple sets of 2 possibility random variables
Y = Σ_{i=1}^n X_i; P(Y=y) = probability of y = # of successes in n = sample size trials, with p = probability of success on each trial
Bernoulli PMF, Cases, Mean, Variance, SD
Probability Mass Function: f(x) = p^x·(1-p)^(1-x); X has a Bernoulli distribution
X=0,1
P(X=x)=P(X=1)+P(X=0)=1
P(X=1)=p
P(X=0)=1-p
Cases: Options^Sets = 2³ = 8 (2 options {0,1}, 3 sets/votes)
The powers of p & (1-p) tell how many successes & failures occur; the exponents sum to the total # of trials
Mean: μ = p = E(X) = Σ_{x=0,1} x·p^x(1-p)^(1-x) = 0(1-p)+1(p)
If X=0, weight = P(0) = 1-p
If X=1, weight = P(1) = p
Variance: σ² = p(1-p) = E[(X-μ)²] = Σ_{x=0,1} (x-μ)²·p^x(1-p)^(1-x) = (0-p)²(1-p)+(1-p)²p
If X=0, squared distance from mean = (0-p)², weight = P(0) = 1-p
If X=1, squared distance from mean = (1-p)², weight = P(1) = p
Standard Deviation: σ = √(p(1-p)) = √Var[X]
Y=X_1+X_2, Independent RVs
a) Bernoulli Probability Mass Function of Y?
b) Expectation & Variance of Y
c) Expectation of Y conditional on X1
d) Expectation of X1 conditional on Y
a. P(Y=y) = C(2,y)·p^y(1-p)^(2-y): P(Y=0) = (1-p)², P(Y=1) = 2p(1-p), P(Y=2) = p²
b. E(Y) = 2p, Var(Y) = 2p(1-p)
c. E(Y|X_1) = X_1+p (so E(Y|X_1=1) = 1+p)
d. E(X_1|Y) = Y/2
Binomial Distribution PMF+derivation, Mean, Variance, SD
Average Bernoulli PMF, Mean+Proof, Variance+Proof, SD
- Probability Mass Function of Binomial Distribution: P(Y=y) = [n!/(y!(n-y)!)]·p^y(1-p)^(n-y)
- Mean: μ = E(Y) = E(Σ_{i=1}^n X_i) = Σ_{i=1}^n E(X_i) = E(X_1)+E(X_2)+…+E(X_n) = p+p+…+p = np
- Variance: σ² = Var(Y) = Var(X_1)+Var(X_2)+…+Var(X_n) = p(1-p)+…+p(1-p) = np(1-p)
- Standard Deviation: σ = √(np(1-p))
- Average of n Independent Bernoulli Random Variables: X̄ = (1/n)·Σ_{i=1}^n X_i = Y/n = (1/n)(X_1+X_2+…+X_n)
- Expected Value of the average = good estimator of the population fraction: E(X̄) = E(Y/n) = np/n = p
- Variance of the average: Var(X̄) = Var(Y)/n² = np(1-p)/n² = p(1-p)/n
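A sketch verifying the binomial PMF, mean, and variance by enumeration (n and p here are arbitrary illustration values):

```python
from math import comb

# Binomial PMF: P(Y=y) = C(n,y) * p^y * (1-p)^(n-y).
def binom_pmf(y, n, p):
    return comb(n, y) * p**y * (1 - p) ** (n - y)

n, p = 10, 0.3
pmf = [binom_pmf(y, n, p) for y in range(n + 1)]

mean = sum(y * f for y, f in enumerate(pmf))
var = sum((y - mean) ** 2 * f for y, f in enumerate(pmf))

assert abs(sum(pmf) - 1) < 1e-12          # PMF sums to 1
assert abs(mean - n * p) < 1e-9           # E[Y] = np
assert abs(var - n * p * (1 - p)) < 1e-9  # Var[Y] = np(1-p)
```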
Prove |ρ(X, a+bX)| = 1
See doc
Prove ρ(X, (X-E[X])/√Var(X)) = 1
See doc
Prove E[(X-E[X])/√Var(X)] = 0 and Var[(X-E[X])/√Var(X)] = 1
See doc
Prove Prove X & Y are uncorrelated & mean independent if they are stochastically independent
- f(x,y) = f_X(x)·f_Y(y), or p_ij^XY = p_i^X·p_j^Y
- E_{Y|X}[Y|X] = Σ_y y·f(x,y)/f_X(x) = Σ_y y·f_X(x)f_Y(y)/f_X(x) = Σ_y y·f_Y(y) = E_Y[Y]
- Cov(X,Y) = 0 → Corr(X,Y) = 0/(σ_X·σ_Y) = 0
Prove using summation operator Cov(X,Y)=E[XY]-E[X]E[Y]
See doc
Prove Law of Iterated Expectation
See doc
Var[(X1+X2)/2] if X1 & X2 are Stochastically Independent
=p(1-p)/2
Prove Var(aX+bY) = a²Var(X)+b²Var(Y)+2abCov(X,Y)
See doc
Stochastic vs Mean vs Uncorrelatedness Independence
- Stochastically Independent: strongest form; one variable occurring has no impact on any feature of the other's distribution, since the joint PMF factors into the marginals: f(x,y) = f_X(x)·f_Y(y)
- Mean Independent: captures conditional dependency based on the mean only; does one thing occurring have an impact on the other's mean: E_{X|Y}[X|Y] = E_X[X] or E_{Y|X}[Y|X] = E_Y[Y]
- Uncorrelated: captures linear relations (direction+spread) only: Cov(X,Y) = σ_XY = 0
Stochastic independence ⇒ mean independence ⇒ uncorrelatedness, but not the reverse
Prove that when X & Y are independent, for any function g(x) and h(y) Cov(g(X),h(Y))=0 always holds
See doc
Prove Var[a+bX]=b^2Var[X]
see doc
Prove Cov(a_1+b_1X,a_2+b_2Y)=b_1b_2Cov(X,Y)
see doc
Prove E[a+bg(X)]=a+bE[g(X)]
see doc
Flip coin 4 times Y=# of heads. P(Y=2), n=4, p=0.5. Probability of getting 2 heads.
P(Y=2) = C(4,2)·(0.5)²(1-0.5)^(4-2) = 6·(0.5)⁴ = 6/16 = 3/8
Winning one game p=0.5, Y=# of games won,
* a) Probability of winning all 5 games
* b) Probability of winning majority of games
* c) If won first game, probability they will win majority of the five games=win 2 out of 4 games left
- a. P(Y=5) = C(5,5)(0.5)⁵(1-0.5)⁰ = (0.5)⁵ = 1/32
- b. P(Y≥3) = P(Y=3)+P(Y=4)+P(Y=5) = C(5,3)(0.5)⁵+C(5,4)(0.5)⁵+C(5,5)(0.5)⁵ = (10+5+1)/32 = 1/2
- c. Let W = wins in the remaining 4 games: P(W≥2) = P(W=2)+P(W=3)+P(W=4) = C(4,2)(0.5)⁴+C(4,3)(0.5)⁴+C(4,4)(0.5)⁴ = (6+4+1)/16 = 11/16
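The three answers follow from the binomial PMF with `math.comb`:

```python
from math import comb

p = 0.5  # probability of winning each game
# With p = 0.5, p^y * (1-p)^(n-y) = p^n for every y, which shortens the sums.

p_all = p ** 5                                             # a) win all 5 = 1/32
p_majority = sum(comb(5, y) * p**5 for y in (3, 4, 5))     # b) (10+5+1)/32
p_after_first = sum(comb(4, w) * p**4 for w in (2, 3, 4))  # c) (6+4+1)/16
print(p_all, p_majority, p_after_first)  # 0.03125 0.5 0.6875
```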
Joint Probability + Mass Function
Marginal Probabilities + Stochastic Independence + Law of Iterated Expectations
Draw Table
- Joint Probability Mass Function: f(x,y) = P(X=x, Y=y), a function that expresses the probability that X=x & simultaneously Y=y
- Marginal Probabilities: function that expresses the probability of an event irrespective of the outcome of the other variable, summed over all possible values of the other variable(s): P(X=x) = f_X(x) = Σ_y f(x,y), P(Y=y) = f_Y(y) = Σ_x f(x,y)
- Stochastic Independence: all pairs of x & y must satisfy f(x,y) = f_X(x)f_Y(y), or all random variables must satisfy f(x_1,x_2,…,x_k) = f_X1(x_1)f_X2(x_2)…f_Xk(x_k). Derived from statistical independence P(A∩B) = P(A)P(B)
- Law of Iterated Expectations: E_X[E_{Y|X}[Y|X]] = E_Y[Y]; averaging the conditional mean of Y over all cases of X gives the overall mean of Y
Page 24 Table Questions
See doc
Conditional PMF, Conditional Mean, Conditional Variance
- Conditional Probability Mass Function: expresses the distribution of one variable given that the other has taken a value X=x or Y=y:
f_{Y|X}(y) = f(x,y)/f_X(x), or f_{X|Y}(x) = f(x,y)/f_Y(y)
Derived from conditional probability: P(A|B) = P(A∩B)/P(B) or P(B|A) = P(A∩B)/P(A)
- Conditional Mean: a random variable, because it depends on the realization of the conditioning variable; essentially there is an input & an uncertain output:
μ_{Y|X=x} = E_{Y|X}[Y|X=x] = Σ_y y·f_{Y|X}(y), or μ_{X|Y=y} = E_{X|Y}[X|Y=y] = Σ_x x·f_{X|Y}(x)
(weight each value by the conditional PMF: the conditional expectation function)
- Conditional Variance: σ²_{Y|X=x} = E_{Y|X}[(Y-μ_{Y|X=x})²|X=x] = Σ_y (y-μ_{Y|X=x})²·f_{Y|X}(y), and similarly σ²_{X|Y=y} = Σ_x (x-μ_{X|Y=y})²·f_{X|Y}(x)
Covariance vs Correlation
Covariance: average product=direction (+/-) of relation b/w 2 random variables, expected value of the product of the spread of X & Y from their mean (magnitude doesn’t tell you anything, unit is uninterpretable)
* Cov(X,Y) = E[(X-μ_X)(Y-μ_Y)] = Σ_x Σ_y (x-μ_X)(y-μ_Y)·f(x,y)
* Cov = 0: no linear relation
* Cov > 0: positive linear relation
* Cov < 0: negative linear relation
Correlation: unitless measurement of the strength (spread+direction) of the linear relation b/w X & Y (-1 to 1), covariance divided by the product of X & Y standard deviation (weaker/large spread/denominator → closer to 0 vs stronger/small spread/denominator → closer to -1/1)
* ρ = Corr(X,Y) = Cov(X,Y)/(σ_X·σ_Y)
* ρ = 0: no linear relation
* ρ > 0: positive linear relation (1 = perfect positive linear dependency)
* ρ < 0: negative linear relation (-1 = perfect negative linear dependency)
Applying Covariance to Investment Table
See Doc
A4: Prove Cov(g(X),Y)=0
See doc
A4: Prove E[(x-b)^2]=E[X^2]-2bE[X]+b^2
See doc
A4: Prove Corr(X,Z)=1 & Corr(X, a+bX) = +/-1
See doc
Prove E[X+Y]=E[X]+E[Y]
See doc
Prove Cov(X,c)=0
See doc
Prove Cov(X,X)=Var(X)
See doc
Prove Cov(X,Y)=E[XY]-E[X]E[Y]
See doc
Prove Σ(x_i-x̄) = 0 and simplify Σ(x_i-x̄)(y_i-ȳ)
See doc
Is E[g(X)]=g(E[X]) always true?
Only equal if g(x) is linear
A3: 3 Tosses of Coin a)PMF, b)Cumulative Function c)E[Y] d)Var(Y)
See doc
A3: Derive Bernoulli a)E(X) & Var(Y) b)PMF c)Stochastically Independent d)E(Y|X_1=1)
A3: 100 Tosses of fair coin binomial PMF function
See doc
A3: 3,5,6 see doc
See doc
Discrete v. Continuous Random Variables
- CDF
- Expectation
- Variance
Continuous Random Variable: variable that assumes any value in an infinite interval/outcomes depending on ability to measure accurately (ex.Thickness, time, height)
- Probability Density Function: the derivative of the CDF; probabilities are areas under it: P(a ≤ X ≤ b) = ∫_a^b f_X(t)dt = F_X(b)-F_X(a), with f_X(x) = dF_X(x)/dx
- Cumulative Distribution Function: probability that the outcome X doesn't exceed the value x, integrating the PDF: F_X(x) = P(X ≤ x) = ∫_{-∞}^x f_X(t)dt
- Expectation: μ_X = E(X) = ∫ x·f_X(x)dx
- Variance: σ_X² = E[(X-μ_X)²] = ∫ (x-μ_X)²·f_X(x)dx
Discrete Random Variables: variable assumes any value within limited countable outcomes (Dice, coin)
- Probability Mass Function: P(X=x)
- Cumulative Distribution Function: F(x_0) = P(X ≤ x_0) = Σ_{x ≤ x_0} f_X(x)
- Expectation: E(X) = Σ_x x·P(X=x)
- Variance: Var(X) = Σ_x (x-E(X))²·P(X=x)
Distributions
- When to use
- Expectation
- Variance
- Sample –> Unstandardized
- Confidence Interval
Uniform Distribution: probability distribution with equal probabilities for all possible outcomes of the random variable, uniformly distributed on [a,b]: X ~ U[a,b]
Normal Distribution: approximates the probability distributions of a wide range of RVs in empirical applications: X ~ N(μ,σ²)
* Bernoulli
* CLT n big/Normal and population variance known
T-Distribution:
* n too small, population variance unknown, can’t be bernoulli
Chi-Square:
* Estimating population variance
Transitioning from Z ←→ X
X = μ + Z·σ
Z = (X-μ)/σ
Z ~ N(0,1)
X ~ N(μ,σ²)
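Standardizing and unstandardizing can be sketched with `statistics.NormalDist` (numbers borrowed from the stock-price card above):

```python
from statistics import NormalDist

# Z = (X - mu)/sigma maps X ~ N(mu, sigma^2) to Z ~ N(0,1); X = mu + Z*sigma maps back.
mu, sigma = 800, 100
x = 650
z = (x - mu) / sigma
assert mu + z * sigma == x  # round trip

# The probability is the same on either scale:
assert abs(NormalDist(mu, sigma).cdf(x) - NormalDist(0, 1).cdf(z)) < 1e-12
print(z)  # -1.5
```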
Random Variables + Linear Combinations
- Covariance: Cov(X,Y) = E[(X-μ_X)(Y-μ_Y)] = E(XY)-μ_X·μ_Y = 0 if X & Y are independent
- Correlation: Corr(X,Y) = Cov(X,Y)/(σ_X·σ_Y)
- Expectation: E[X_1+X_2+X_3+…+X_n] = μ_1+μ_2+μ_3+…+μ_n
- Variance: Var[X_1+…+X_n] = σ_1²+…+σ_n²+2Cov(X_1,X_2)+…+2Cov(X_{n-1},X_n) = σ_1²+…+σ_n² if independent
- Jointly Normally Distributed (independent with identical mean & variance): X̄ ~ N(μ, σ²/n)
- W = aX+bY
- Expectation: E[aX+bY] = aμ_X+bμ_Y
- Variance: Var[aX+bY] = a²σ_X²+b²σ_Y²+2abCov(X,Y); the covariance term = 0 if independent
- Jointly Normally Distributed: aX+bY ~ N(μ_W, σ_W²) = N(aμ_X+bμ_Y, a²σ_X²+b²σ_Y²+2abCov(X,Y))
Population + Sample + Inferential Stats (Types) + Sampling Distribution
Population: set of all items/individuals/variables of interest
Sample: subset of population observed, less time consuming & costly than census of entire population
Random Sampling: every item in population has equal chance of being selected and are selected independently (thrown back into population and could be drawn again)
Inferential Statistics: making statements about population parameters (unknown) by examining sample results (known)
* Estimation: make a claim about population mean using sample mean/evidence
* Hypothesis Testing: test a claim about population using sample mean/evidence
Sampling Distribution: plots the frequency of all possible sample means, each sample has n=# of items/people, the larger the n the smaller the variance=more accurate to the population mean
Sample Mean, Expectation, Variance, Standard Deviation
See ipad
Central Limit Theorem
If the population is not normal, apply the Central Limit Theorem: as n increases the distribution of the sample mean converges to a normal distribution; sample means will be approximately normal as long as the sample size is large enough: √n(X̄_n-μ) →d N(0,σ²)
Q. n=14, Var(p)=16, upperlimit? So that probability of exceeding limit is less than 0.05
27.52 See doc
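A small simulation illustrating the CLT: sample means of a non-normal (discrete uniform die) population cluster around μ with SD ≈ σ/√n (a sketch; n and the rep count are arbitrary):

```python
import random
from statistics import mean, stdev

random.seed(0)  # reproducible sketch
n, reps = 50, 2000
# 2000 sample means, each from n = 50 rolls of a fair die (mu = 3.5, sigma^2 = 35/12)
sample_means = [mean(random.randint(1, 6) for _ in range(n)) for _ in range(reps)]

print(round(mean(sample_means), 2))   # close to 3.5
print(round(stdev(sample_means), 2))  # close to (35/12)**0.5 / 50**0.5, about 0.24
```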
Point Estimator v. Estimate
Point Estimator θ̂ of a population parameter is a random variable, a function of the random sample; the realized value of the point estimator (random variable) is the point estimate
θ̂ = f(X_1,…,X_n)
Unbiased Estimator + Efficient + Consistent with Graph + Words + Example
Unbiased Estimator: E(θ̂) = θ; the mean of the estimator equals the true parameter (bias = E(θ̂)-θ = 0)
Efficiency: the spread/variance of the estimator; smaller is preferred = more efficient: Var(θ̂_1) < Var(θ̂_2)
Consistency: the point estimator converges in probability to the true parameter as the sample size increases, by the law of large numbers
Point & Interval Estimates + Draw
Point Estimate: a single number; Interval Estimate: a range of values providing info about variability, based on observations from 1 sample, with limits that are functions of the sample: P(L[X_1,…,X_n] ≤ θ ≤ U[X_1,…,X_n]) = 0.95
Confidence Interval, Level, Significance Level
1. Formulas
2. Meaning
Confidence Level (1-alpha, between 0 & 1): the percentage of intervals, built from repeated samples, that contain the true population parameter: 95% of the time the interval will contain the true value; however, 5% of the time the true value won't be in the interval
P(Point Estimate-Reliability Factor(Standard Error)<True Value<Point Estimate+Reliability Factor(Standard Error))=Confidence Interval
Confidence Interval (θ̂-ME, θ̂+ME): range of values that holds the unknown population parameter (1-alpha)% of the time
Significance Level (alpha): probability of making an error when the null hypothesis is true
Margin of Error + Formula + How to Reduce
The uncertainty/amount of random sampling error in the results: ME = z_{alpha/2}·σ/√n
Reduce ME: reduce the population standard deviation σ, reduce the confidence level (1-alpha), or increase the sample size n
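A sketch of the margin-of-error formula (σ=6 and n=64 are illustrative values):

```python
from statistics import NormalDist

# ME = z_{alpha/2} * sigma / sqrt(n)
alpha, sigma, n = 0.05, 6, 64
z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95% confidence
me = z * sigma / n ** 0.5
print(round(me, 2))  # about 1.47
```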
Population difference Confidence Interval + Derivation <>= 0 & #
See doc
Difference in Differences Confidence Interval
See doc
Hypothesis: types & Alt + Null
Hypothesis (μ or p or σ²): claim about a population parameter
* Population Mean μ
* Population Proportion p
* Population Variance σ²
Null Hypothesis/Counterfactual (H0): assumption to be tested in population parameter, status quo
Alternative Hypothesis: hypothesis researcher is trying to support, challenges status quo
Hypothesis Testing + Methods/Process
Testing: assume the null is true (=, ≤, ≥) (innocent until proven guilty); where does the realized sample fall within the null's probability distribution?
1) Find distribution
2) Choose technique depending on info given and parameter of interest
* Z-Test: normal/CLT + known population variance
* T-test: n small or unknown population variance
* Chi-Square: estimating population variance
3) Choose upper/lower/double tail rejection region, compare realized sample with:
* Significance level or Critical value
* P-value
* Confidence Interval
Rejection Region, Critical Value, Significance Level
- Significance Level (alpha = %)
- Critical Value (C = x̄_c): determined by the significance level
- Rejection Region: the unstandardized range of values, ex. [μ_0+z_α·σ/√n, ∞) for an upper-tail test
P-Value + Info Required + Process
P-Value/Observed Level of Significance: probability of getting a test statistic more extreme than the realized sample under the H0-true probability distribution; the smallest value of α at which H0 can be rejected
Required Information To Calculate: a realized sample and a distribution even if n=small or distribution is not normal
P(Z ≥ z_x̄) = p-value, where z_x̄ = (x̄-μ_0)/(σ/√n)
Ex. P(Z ≥ 1.645) = 5% (one tail), P(|Z| ≥ 1.96) = 5% (two tails)
Process:
1. Convert the sample into a test statistic
2. Use the Z-table to find P(Z ≥ z_x̄) = p-value
3. Compare the p-value & significance level:
p-value ≥ α → do not reject; sample is outside the rejection region
p-value < α → reject; sample is within the rejection region
2 Types of Errors + Graph + Process
Type 1 Error (α) - False Positive: rejecting a true H0, which we can never know given just a realized sample; therefore there is always a probability of it, equal to the level of significance α (ex. 1%, 5%, 10%)
Guilty before innocent = serious, convicting an innocent person
Calculating Type 1 Error: it is simply the chosen significance level %
P(reject H0 | true H0) = α
Type 2 Error (β) - False Negative: failing to reject false H0 with probability β
P(fail to reject H0|false H0)=β
Innocent before guilty = less serious, letting a guilty person go
Calculating Type 2 Error: β = P(X̄ > x̄_c | μ = μ*), ex. n=64, σ=6, α=0.05, H0: μ ≥ 52, H1: μ < 52, true μ* = 50
1. Calculate the critical value
2. Standardize the critical value in terms of the true distribution
3. Using the Z-table, find the probability/integral/area under the curve
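The worked example can be computed directly (a sketch with `statistics.NormalDist` instead of a Z-table):

```python
from statistics import NormalDist

# n=64, sigma=6, alpha=0.05, H0: mu >= 52 vs H1: mu < 52, true mu* = 50 (lower-tail test).
n, sigma, alpha = 64, 6, 0.05
mu0, mu_true = 52, 50
se = sigma / n ** 0.5                      # standard error = 0.75

x_crit = mu0 + NormalDist().inv_cdf(alpha) * se   # reject H0 if sample mean < x_crit
beta = 1 - NormalDist(mu_true, se).cdf(x_crit)    # P(fail to reject H0 | mu = 50)
power = 1 - beta
print(round(x_crit, 2), round(beta, 3), round(power, 3))  # about 50.77, 0.153, 0.847
```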
Error/Power Tradeoff + How to Reduce
Type 1 & 2 Tradeoff: moving rejection region alters the size of error, however cannot decrease both errors at same time, decreasing one will increase the other as Type 1 Error occurs when H0 is true, Type 2 Error occurs when H0 is false
Smaller Rejection Region → smaller type 1 error → larger type 2 error → smaller power
Larger Rejection Region → larger type 1 error → smaller type 2 error → Larger power
n Sample Size Increase (variance of the sample mean decreases): smaller type 2 error, type 1 error unchanged, larger power (more evident what the true rejection is)
Power + Calculate
Power (1-β) - True Positive: probability of successfully rejecting a false H0
P(reject H0 | false H0) = 1-P(fail to reject H0 | false H0) = 1-β, where β = type 2 error
As the sample size increases, the power of the test increases
Calculate:
1. Find the critical value & standardize it into a Z score under the true distribution
2. Use the Z-table to find the probability of the rejection area created by the critical value in the true distribution
A4-A7 + Worksheets
lol goodluck bro