Probability And Statistics Equations Flashcards
Axioms of Probability
- P(A) >= 0 for every event A in F
- P(Omega) = 1
- If A1, A2, … in F are mutually exclusive, then P(A1 U A2 U …) = P(A1) + P(A2) + …
Complement rule
P(A^c) = 1 - P(A)
Probability of the Union of Two Events Rule
P(A U B) = P(A) + P(B) - P(A ∩ B)
Bounds on Probabilities Rule
P(A U B) <= P(A) + P(B)
Logical Consequence Rule
If B logically entails A, then P(A)>= P(B)
Conditional Probability
P(A|B) = P(A ∩ B)/P(B)
Axioms revised based on conditional probability
- 0 <= P(A|B) <= 1
- P(B|B) = 1
- If A1, A2, … are mutually exclusive given B, then P(A1 U A2 U … |B) = P(A1|B) + P(A2|B) + …
Law of Total Probability
P(A) = P(A ∩ B1) + P(A ∩ B2) = P(A|B1)P(B1) + P(A|B2)P(B2), where B1 and B2 partition the sample space
Two events are independent if
P(A ∩ B) = P(A)P(B), or equivalently P(A|B) = P(A)
Bayes’ Rule
P(A|B) = P(B|A)P(A)/P(B)
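A minimal Python sketch of Bayes' rule, with the denominator P(B) expanded via the law of total probability; the disease-testing numbers are hypothetical:

```python
# Hypothetical numbers: P(A) = P(disease) = 0.01,
# P(B|A) = P(positive | disease) = 0.95, P(B|A^c) = 0.05.
p_a = 0.01
p_b_given_a = 0.95
p_b_given_not_a = 0.05

# Law of total probability: P(B) = P(B|A)P(A) + P(B|A^c)P(A^c)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' rule: P(A|B) = P(B|A)P(A)/P(B)
print(p_b_given_a * p_a / p_b)  # ≈ 0.161
```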
PMF definition
- f(x) = P(X=x)
- f(x) >= 0
- SUM_x f(x) = 1 (summing over all possible x)
PMF of the number of successes in N independent Bernoulli (coin toss) trials with parameter p
f(x) = K p^x (1-p)^(N-x)
K = N!/(x!(N-x)!) is the binomial coefficient; it depends on N and x but not on p
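A sketch of this PMF (the binomial), computing K with Python's built-in math.comb:

```python
from math import comb

# Binomial PMF: probability of x successes in N independent
# Bernoulli trials, each with success probability p.
def binom_pmf(x, N, p):
    K = comb(N, x)                      # binomial coefficient C(N, x)
    return K * p**x * (1 - p)**(N - x)

print(binom_pmf(3, 10, 0.5))  # ≈ 0.117
```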
PDF definition
f(x) = lim(h->0) P(x <= X <= x+h)/h
1. f(x) >= 0
2. integral f(x) dx = 1 (over the whole support)
Uniform distribution U(a,b) PDF
If a <= x <= b, f(x) = 1/(b - a)
Otherwise f(x) = 0
Normal Distribution PDF
f(x) = (1/(sigma sqrt(2 pi))) exp(-(x - mu)^2/(2 sigma^2))
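A sketch evaluating this density directly from the formula (defaults mu = 0, sigma = 1 chosen for illustration):

```python
import math

# Normal(mu, sigma^2) density, computed straight from the formula.
def normal_pdf(x, mu=0.0, sigma=1.0):
    coef = 1.0 / (sigma * math.sqrt(2 * math.pi))
    return coef * math.exp(-0.5 * ((x - mu) / sigma) ** 2)

print(normal_pdf(0.0))  # ≈ 0.3989, the standard normal peak
```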
CDF Definition
F(x) = P(X <= x) for all x in the support of X
1. 0 <= F(x) <= 1
2. F(x) is a non-decreasing function of x
3. P(X > x) = 1 - P(X <= x) = 1 - F(x)
4. P(a < X <= b) = F(b) - F(a)
Uniform Distribution CDF
If x < a, F(x) = 0
If x is in [a, b], F(x) = (x - a)/(b - a)
If b < x, F(x) = 1
Connection of CDF and PDF
f(x)= dF(x)/dx
Standardization N(0,1) Standard Normal Distribution
z = (X - mu)/sigma
P(Z <= z) from the standard normal (SN) table
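A quick check of the standardization identity, assuming scipy is available (norm.cdf plays the role of the SN table):

```python
from scipy.stats import norm

# P(X <= 75) for X ~ N(mu=60, sigma=10), via standardization.
mu, sigma, x = 60.0, 10.0, 75.0
z = (x - mu) / sigma          # standardize: Z ~ N(0, 1)
print(norm.cdf(z))            # ≈ 0.933, same as norm.cdf(x, mu, sigma)
```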
Joint CDF
F(x, y) = P(X <= x, Y <= y)
Joint PDF/PMF
Discrete: f(x, y) = P(X = x, Y = y)
Continuous: f(x, y) = ∂^2 F(x, y)/(∂x ∂y)
Marginal PDF (of x)
f(x) = integral f(x, y) dy (continuous) or f(x) = SUM_y f(x, y) (discrete) - the joint is integrated or summed over all values of y
If X and Y are independent (relationship between joint PDF and product of marginals)
f(x,y) = f(x)g(y) - joint PDF = product of marginals
F(x,y)= F(x)G(y) - joint CDF = product of marginals
Conditional PDF, CDF
Slices of the joint distribution: f(y|x) = f(x, y)/f(x) shows how the distribution of one variable changes with the value of the other
Expected value E[X]
Discrete E[X] = SUM_i x_i f(x_i)
Continuous E[X] = integral x f(x) dx
Probability-weighted sum of the possible values of X
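A minimal sketch of the discrete formula, using a fair die as the example:

```python
import numpy as np

# E[X] for a discrete variable: probability-weighted sum of values.
x = np.array([1, 2, 3, 4, 5, 6])   # faces of a fair die
f = np.full(6, 1 / 6)              # PMF: each face has probability 1/6
print(np.sum(x * f))               # 3.5
```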
Uniform Expected value - U(a,b)
E[X]=(a+b)/2
Normal - Expected Value N(mu, sigma^2)
E[X] = mu
Expected value of a function of a variable
Discrete E[g(X)] = SUMi g(xi)f(xi)
Continuous E[g(X)] = integral g(x)f(x)dx
Affine functions (E[a+bX])
E[a+bX]=a+bE[X]
Addition of expectations (g(X) and h(Y) are functions of the same or different variables)
E[g(X) + h(Y)] = E[g(X)] + E[h(Y)]
Multiplication of expectations IF INDEPENDENT
E[g(X)h(Y)] = E[g(X)]E[h(Y)]
Jensen’s inequality
If f is concave, E[f(X)] <= f(E[X])
If f is convex, E[f(X)] >= f(E[X])
Variance
Var(X) = E[(X - E[X])^2] = E[X^2] - (E[X])^2
Variance with mu (the expected value of the variable)
X discrete: Var(X) = SUM_i (x_i - mu)^2 f(x_i)
X continuous: Var(X) = integral (x - mu)^2 f(x) dx
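A quick numerical check that the two variance formulas above agree, again for a fair die:

```python
import numpy as np

# Check Var(X) = E[(X - mu)^2] = E[X^2] - (E[X])^2 for a fair die.
x = np.array([1, 2, 3, 4, 5, 6])
f = np.full(6, 1 / 6)
mu = np.sum(x * f)                      # E[X] = 3.5
var = np.sum((x - mu) ** 2 * f)         # definition form
print(var, np.sum(x**2 * f) - mu**2)    # both ≈ 2.9167
```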
Standard Deviation
SD(X) = sqrt(Var(X))
Variance of Uniform
Var(X) = (b-a)^2/12
Variance of Normal
Var(X) = sigma^2
Variance of Binary Variable
Var(X)=p(1-p)
Conditional Expectation
Y discrete: E(Y|X=x) = SUM_i y_i f(y_i|x)
Y continuous: E(Y|X=x) = integral y f(y|x) dy
E[h(X)Y|X] =
Conditioning on X -> h(X) can be treated as if it were a constant (pulled out like a linear transformation)
E[h(X)Y|X] = h(X)E[Y|X]
E[Y|X] =
If X and Y are independent
E[Y|X] = E[Y]
Law of Iterated Expectations
E[Y] = E(E[Y|X])
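A simulation sketch of the law, with an assumed toy model Y = 2X + noise (so E[Y|X] = 2X is known exactly):

```python
import numpy as np

rng = np.random.default_rng(0)

# Law of iterated expectations: E[Y] = E[E[Y|X]].
x = rng.integers(0, 2, size=100_000)      # X ~ Bernoulli(0.5)
y = 2.0 * x + rng.normal(size=x.size)     # Y = 2X + noise

e_y_given_x = 2.0 * x                     # the known conditional mean
print(y.mean(), e_y_given_x.mean())       # both ≈ 1.0
```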
Covariance
Cov(X,Y) = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]E[Y]
Cov(X,X)
Cov(X,X)=Var(X)
Cov(X,Y)
Cov(Y,X)
Cov(X,a)
0
Cov(aX,Y)
aCov(X, Y)
Cov(X,Y+Z)=
Cov(X, Y) + Cov(X, Z)
Var(aX ± bY)
a^2 Var(X) + b^2 Var(Y) ± 2ab Cov(X,Y)
If X and Y are independent Cov(X, Y)= E[XY] - E[X]E[Y] =
0
because independence implies E[XY] = E[X]E[Y]
Correlation
Corr(X, Y) =
Cov(X, Y)/(sqrt(Var(X)) sqrt(Var(Y))) =
Cov((X - E[X])/sqrt(Var(X)), (Y - E[Y])/sqrt(Var(Y))) - the covariance of the standardized variables
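A sketch verifying the correlation formula against numpy's built-in, on simulated data:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=10_000)
y = 0.5 * x + rng.normal(size=x.size)       # correlated with x by construction

# Corr(X, Y) = Cov(X, Y) / (sqrt(Var(X)) * sqrt(Var(Y)))
cov = np.cov(x, y)[0, 1]
corr = cov / (np.sqrt(np.var(x, ddof=1)) * np.sqrt(np.var(y, ddof=1)))
print(corr, np.corrcoef(x, y)[0, 1])        # the two agree (≈ 0.45 here)
```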
Population
Complete enumeration of the set of units of interest
Sample
Subset of a population
Sample frame
Source material or device from which a sample is drawn
Simple Random Sampling (SRS)
Selects a pre-determined number of respondents to be interviewed from a target population, with each potential respondent having an equal, non-zero chance of being selected
“Representative” sample
If the sampling procedure is repeated many times, the features of the sample would on average (across all the samples) match those of the population
Quota sampling
Fixed quotas of certain types of respondents to be interviewed such that the resulting sample characteristics resemble those of the population
Random variable
A deterministic function that assigns a number to each uncertain event generated by random sampling
Independently and Identically Distributed (iid)
Sample values are drawn independently of one another and all come from the same population, so they share the same distribution
Parameter
Numerical measure that describes a specific characteristic of a population
Statistic
Numerical measure that describes a specific characteristic of a sample. Formally, a statistic is a function of the sample data and is therefore itself a random variable (subject to sampling variation)
Estimand
Parameter in the population which is to be estimated in a statistical analysis
Estimator
A rule (function) for calculating an estimate of a given population parameter from randomly sampled data. Because different random samples yield different values, estimators are themselves random variables and therefore have distributions, expected values, etc.
Estimate
An estimate is the numerical value of the estimator once a specific sample is drawn; it is a nonrandom number (e.g. the sample mean computed from that sample)
Sampling distribution
Distribution of the estimator
Standard error
A measure of variation in the sampling distribution; it is equal to the square root of the variance of the statistic
Standard Deviation
A measure of variation in data it is equal to the square root of the variance of the data
Law of Large Numbers
As the sample size grows the sample mean converges, in a certain sense, to the population mean
Central Limit Theorem
As the sample size grows the sampling distribution of the standardised sample mean converges to a standard normal N(0,1)
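A simulation sketch of both ideas above (the distributions are assumed toy examples):

```python
import numpy as np

rng = np.random.default_rng(2)

# LLN: the sample mean of uniform draws approaches E[X] = 0.5.
for n in (10, 1_000, 100_000):
    print(n, rng.uniform(size=n).mean())

# CLT: standardized sample means of a skewed (exponential, mu = sigma = 1)
# population are approximately N(0, 1) for large n.
n, reps = 500, 10_000
means = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
z = (means - 1.0) / (1.0 / np.sqrt(n))   # (xbar - mu)/(sigma/sqrt(n))
print(z.mean(), z.std())                 # ≈ 0 and ≈ 1
```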
Nuisance parameter
Any parameter which is not of immediate interest but which must be accounted for in the analysis of those parameters which are of interest
General form of confidence interval
“Sample statistic +- a number of std errors * std error of the statistic”
Number of standard errors for probability 0.90
+-1.645
Number of standard errors for probability 0.95
+- 1.96
Number of standard errors for probability 0.99
+-2.58
Interpreting confidence intervals in terms of probabilistic behavior of the interval
“Before the sample is drawn, there is a 95% probability that the (random) interval [a, b] will contain the population parameter” - once a specific interval is computed, it either contains the parameter or it does not
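A minimal sketch of the general CI form above, using the 1.96 multiplier for a 95% interval (the data here are hypothetical normal draws):

```python
import numpy as np

rng = np.random.default_rng(3)
sample = rng.normal(loc=5.0, scale=2.0, size=100)   # hypothetical data

mean = sample.mean()
se = sample.std(ddof=1) / np.sqrt(sample.size)      # SE of the sample mean
lo, hi = mean - 1.96 * se, mean + 1.96 * se         # 95% CI
print(f"[{lo:.2f}, {hi:.2f}]")
```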
Hypothesis
Statement that some population parameter is equal to a particular value or lies in some set of values
Type 1 error
Rejecting the null hypothesis when in fact it is true
Type 2 error
Failing to reject the null when it is in fact false
5 steps of hypothesis test
- State the hypotheses - the Null and Alternative
- Construct a ‘test’ statistic:
Z = ((sample statistic) - (hypothesised population parameter))/(SE of the sample statistic)
- State the sampling distribution of the test statistic under the provisional assumption that the Null is true: Z ~ N(0,1)
- Use the SN distribution to control the probability of a Type 1 error
- Make a Decision - reject or fail to reject the Null
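A sketch of steps 2-5 as a two-sided z-test on hypothetical data (H0: mu = 5), assuming scipy is available:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
sample = rng.normal(loc=5.3, scale=2.0, size=100)   # hypothetical data

# Test statistic under H0: mu = 5
z = (sample.mean() - 5.0) / (sample.std(ddof=1) / np.sqrt(sample.size))
p_value = 2 * (1 - norm.cdf(abs(z)))                # two-sided, Z ~ N(0,1)
print(z, p_value, "reject H0" if p_value < 0.05 else "fail to reject H0")
```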
Time series data
Sequence of data points recorded in chronological order. Observations are often taken at equally-spaced points in time
Aims: (1) to provide a simple model of the evolution of a variable as an aid to understanding, and (2) to provide a basis for forecasting/prediction
Data are not independent
Example of how to linearize time series data
Exponential growth —> take their logarithm
Breaks
Variations that occur due to sudden causes and are usually ex ante unpredictable
Seasonality
Predictable periodic pattern that reoccurs or repeats over regular intervals
Cycles
A series follows an up and down pattern that is not seasonal
Deterministic time series
One which can be expressed explicitly by an analytic expression, it has no probabilistic or random aspects
Stochastic time series
A non-deterministic time series is one which cannot be described by an analytic expression
Reasons for randomness: (1) all the information necessary to describe it explicitly is not available, although it might be in principle, or (2) the nature of the generating process is inherently random
Stationarity
The idea that there is nothing statistically special about the segment of history that you observed in the sense that the statistical properties of the process generating the data are invariant to shifts in the window of observation
Strong stationarity
All statistical features of a distribution are invariant to time-shifts
Weak stationarity
E[Xt] and Var(Xt) do not vary with time
Cov(Xt, Xt-h) and Corr(Xt, Xt-h) depend only on the lag h, not on time t
Transforming a time series to stationarity
1) differencing the data
2) trend -> fitting a curve to the data
3) non-constant variance -> taking the logarithm or square root of the series
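A short sketch of transforms 1) and 3) on a hypothetical exponentially growing series:

```python
import numpy as np

rng = np.random.default_rng(5)
t = np.arange(100)
series = np.exp(0.05 * t) * (1 + 0.1 * rng.normal(size=t.size))  # exponential growth

log_series = np.log(series)     # log-transform linearizes the growth
growth = np.diff(log_series)    # first difference ≈ period growth rate
print(growth.mean())            # ≈ 0.05
```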
Causal effect
Difference between potential outcomes
Observed difference between groups=
ATT (average treatment effect on the treated) + selection bias
Selection bias
Average difference in the no-treatment potential outcome between the treated and un-treated groups
Reflects the idea that this bias arises if individuals are selected into treatment on the basis of their potential outcomes
Randomisation makes treatment independent of potential outcomes
The mean potential outcomes are identical for the treated and untreated groups
Selection bias is therefore zero
Internal validity
Findings for the sample are credible
External validity
Its findings can be credibly extrapolated to the population or Real World policy of interest
Threats to internal validity
1) Contamination: People in the control group access the treatment anyway
2) Non-compliance: individuals who are offered the treatment refuse to take it
3) Hawthorne Effect: a phenomenon in which participants alter their behaviour as a result of being part of an experiment or study
4) Placebo effect: outcomes change because participants believe they are receiving treatment, not because of the treatment itself
Threats to external validity
1) small/local nature of RCT (geographic area, institutional environment, demographic group)
2) spillover effects
3) short durations —> don’t know long term impact
Conditional Independence Assumption
Assignment to treatment is independent of potential outcomes conditional on covariates
Problems with conditional independence
1) credibility of the CIA itself (must condition on all relevant factors)
2) the common support problem/curse of dimensionality - few or no observations for certain groups
3) “bad controls” - controlling for variables which are themselves outcomes