Misc econometrics Flashcards
p-value
(In statistical hypothesis testing) The probability of obtaining test results at least as extreme as the results observed, assuming that the null hypothesis is correct.
In other words, the p-value is the smallest significance level at which we could carry out our test and reject H₀ (at any significance level smaller than the p-value, we would fail to reject H₀)
In still other words, it is the probability, computed under the distribution implied by H₀, of obtaining a test statistic (e.g. a Z-statistic) at least as extreme as the one calculated from our observed value
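As a minimal sketch of this calculation (assuming Python with scipy is available; the observed Z-statistic of 1.96 and the two-sided set-up are purely illustrative):

    from scipy.stats import norm

    z_observed = 1.96          # illustrative value, not from any real data
    # Two-sided p-value: probability of a Z at least this extreme under H0,
    # where Z ~ N(0, 1) when H0 is true.
    p_value = 2 * norm.sf(abs(z_observed))   # sf = 1 - cdf (upper tail)
    print(p_value)             # roughly 0.05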
Z-statistic
Z-statistic (or Z-score or Standard score) is a number representing how many standard deviations an observed value (raw score) is away from the mean (of what is being observed).
Raw scores above the mean have positive standard scores, while those below the mean have negative standard scores.
The Z-statistic is distributed normally with mean = 0 and variance = 1 (ie, it has a Standard Normal Distribution)
Z ~ N(0, 1)
So Z = (Observed Value - Assumed Population Mean) / Standard Deviation of the distribution the observed value is drawn from
(Note: in hypothesis tests, the observed value is often the mean observed from our sample - in other words, we are testing a hypothesis about the population mean, and the relevant standard deviation is that of the sampling distribution of the sample mean. The Z-statistic may also be used to estimate the probability that X could take a certain value (the observed value, x), given the assumed population mean value.)
(Note: Calculating z using this formula requires the population mean and the population standard deviation, not the sample mean or sample deviation. But knowing the true mean and standard deviation of a population is often unrealistic except in cases such as standardized testing, where the entire population is measured.
When the population mean and the population standard deviation are unknown, the standard score may be calculated using the sample mean and sample standard deviation as estimates of the population values.)
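A small worked example of the z-score calculation (the population mean, standard deviation, and observed value below are purely hypothetical):

    # Hypothetical population values and a single observed raw score
    mu = 100        # assumed population mean
    sigma = 15      # assumed population standard deviation
    x = 130         # observed value

    # Z-score: how many standard deviations x lies from the mean
    z = (x - mu) / sigma
    print(z)        # 2.0 -> x is 2 standard deviations above the mean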
Z-statistic (conversion)
If X ~ N (μ, σ²) then
Z = (X − μ)/σ ~ N(0, 1)
In other words: For any continuous, normally distributed variable X with mean μ and variance σ² (X ~ N (μ, σ²)), all probabilities can be converted to the Standard Normal Distribution using the Z Normal (0, 1) transformation: Z = (X − μ)/σ
The Z-statistic is distributed normally with mean = 0 and variance = 1 (ie, it has a Standard Normal Distribution)
Therefore the Z Normal transformation, Z = (X − μ)/σ, converts the variable X into a Standard Normal distribution. We can thus use standard normal tables to find relevant probabilities such as P(X ≤ x).
Note: the Z Normal (0, 1) transformation is so called because it transforms the distribution of X into a normal distribution centred on 0 with a variance of 1: subtracting μ shifts the distribution leftward by μ units (so that its mean sits at 0), and dividing by σ compresses it horizontally (or stretches it, if σ² < 1) so that its variance becomes 1
Thus, Z ~ N(0, 1)
So, to convert any value of X to its corresponding Z value, subtract the value of the mean and divide by the standard deviation.
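A quick sketch of the conversion (assuming Python with scipy; the values of μ, σ, and x are hypothetical), checking that the standardised calculation agrees with computing P(X ≤ x) directly:

    from scipy.stats import norm

    mu, sigma = 50, 10      # hypothetical X ~ N(50, 100)
    x = 65

    # Standardise, then use the standard normal CDF (the "Z table")
    z = (x - mu) / sigma
    p_via_z = norm.cdf(z)                        # P(Z <= z)

    # Same probability computed directly from X's own distribution
    p_direct = norm.cdf(x, loc=mu, scale=sigma)  # P(X <= x)

    print(p_via_z, p_direct)   # both roughly 0.9332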
Z-statistic (hypothesis tests)
A number representing how many standard deviations (of the sampling distribution) the observed sample value is away from the assumed population mean.
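A minimal sketch of this calculation (all numbers below are hypothetical, and the population standard deviation is assumed known):

    import math

    # Hypothetical hypothesis test about a population mean
    mu_0 = 20            # assumed population mean under H0
    sigma = 4            # known population standard deviation
    n = 25               # sample size
    y_bar = 21.5         # observed sample mean

    # Standard deviation of the sampling distribution of the sample mean
    se = sigma / math.sqrt(n)        # = 0.8

    # Z: how many of those standard deviations y_bar lies from mu_0
    z = (y_bar - mu_0) / se
    print(z)                         # 1.875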
Random Sample
A sample of n observations of a RV Y, denoted Y₁, Y₂, …, Yₙ, is said to be a random sample if the n observations are drawn independently from the same population and each element in the population is equally likely to be selected
Random Sample as ‘A set of Independently and Identically Distributed (IID) RVs’
We describe such a sample as being a set of Independent and Identically distributed (IID) Random Variables (RVs)
So, if a random sample of n elements is taken,
the sample elements constitute a set of IID RVs, Y₁, Y₂, …, Yₙ, each of which have the same PDF as that of Y
The random nature of Y₁, Y₂, …, Yₙ reflects the fact that many different outcomes are possible before the sampling is actually carried out. In other words, each element of the sample is an IID RV with the same PDF as Y (the population) because the elements are randomly and independently selected from the population, so each element of the sample follows a PDF identical to that of the population
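A minimal way to simulate drawing such a sample (assuming Python with numpy; the population parameters, sample size, and seed are hypothetical choices):

    import numpy as np

    rng = np.random.default_rng(42)      # seed chosen arbitrarily

    # Y ~ N(5, 4): hypothetical population distribution
    mu, sigma, n = 5, 2, 10

    # n independent draws from the same distribution -> an IID random sample
    sample = rng.normal(loc=mu, scale=sigma, size=n)
    print(sample)    # one possible realisation y_1, ..., y_n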
Sample data
Once the sample is obtained, we have a set of numbers, say y₁, y₂, …, yₙ which constitute the data we work with.
There are different types of data:
• Cross-sectional data
• Time-series data
• Panel data
Sample Statistics
A sample statistic is any quantity computed from values in a sample that is used for a statistical purpose.
(Statistical purposes include estimating a population parameter, describing a sample, or evaluating a hypothesis)
The two most often used sample statistics are the sample mean, denoted by Y̅, and the sample variance, denoted by S².
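A short sketch of computing both statistics (assuming Python with numpy; the data values are made up):

    import numpy as np

    # Hypothetical sample data y_1, ..., y_n
    y = np.array([4.1, 5.3, 6.0, 4.8, 5.6])

    y_bar = y.mean()           # sample mean
    s2 = y.var(ddof=1)         # sample variance (divides by n - 1)
    print(y_bar, s2)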
Sampling Distribution
A sample statistic (eg the sample mean) will have its own probability distribution called the sampling distribution.
Since each observation in a random sample is itself a RV, any statistic calculated from a sample, called a sample statistic, is also a RV.
And since the sample statistic is a RV, it will have its own probability distribution
The sampling distribution reflects the fact that a random sample (of size n) drawn from the population could materialise in many different ways, each with a corresponding probability. It is this probability distribution, over all the possible samples of size n that we could draw from the population, that we call the sampling distribution (and we will see shortly that, for the sample mean, it is normal with mean μ and variance σ²/n)
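A rough simulation of this idea (assuming Python with numpy; the population parameters, sample size, and number of repeated samples are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(0)
    mu, sigma, n = 5, 2, 25          # hypothetical population and sample size
    n_samples = 50_000               # number of repeated samples to draw

    # Draw many samples of size n and record each sample's mean
    samples = rng.normal(mu, sigma, size=(n_samples, n))
    sample_means = samples.mean(axis=1)

    # The spread of these means is the sampling distribution of Y-bar
    print(sample_means.mean())       # close to mu = 5
    print(sample_means.var())        # close to sigma^2 / n = 0.16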
Population > Sampling Distributions
So, to tie sampling distributions in with their wider context,
- There is a POPULATION (of size N)
- Y is a RV representing this population, with a PDF
- θ is an unknown population parameter (such as the expected value E(Y) = μ or the variance V(Y) = σ²)
- Note: these population parameters are unknown, fixed values
- A random sample (of n observations) of the RV Y is drawn, denoted Y₁, Y₂, …, Yₙ
- (Once the sample is obtained, we have a set of numbers, say y₁, y₂, …, yₙ, which constitute the data we work with)
- Each Yᵢ has a PDF (identical to the PDF of Y)
- From the sample we can calculate sample statistics
- (Two sample statistics of interest: sample mean, Y̅ and the sample variance, S²)
- Note: these sample statistics are RVs, with their own probability distribution, the SAMPLING DISTRIBUTIONS
The sampling distribution of the sample mean (Y̅)
Suppose Y ~ N (μ, σ²) and we have an IID sample of n observations from it: {Y₁, Y₂, …, Yₙ},
Then we say that Yᵢ ~ IIDN (μ, σ²)
In other words, each element of the sample is a RV with the same PDF as Y.
From these observations we can calculate the
sample mean, Y̅, as: Y̅ = (1/n) Σᵢ Yᵢ
Since Y̅ is a RV itself, it has a probability distribution.
It turns out that the sampling distribution of the sample mean is: Y̅ ~ N (μ, σ²/n)
(We’ll break this down in the next three cards)
The mean (or expected value) of the sampling distribution of Y̅
The mean of the sampling distribution of Y̅ is:
E[Y̅] = μ
Interpretation:
If samples of n random and independent observations are repeatedly and independently drawn from a population, then as the number of samples becomes very large (approaches infinity), the average of the sample means (the Y̅s) approaches the population mean μ
The variance of the sampling distribution of Y̅
The variance of the sampling distribution of Y̅ is:
V[Y̅] = σ²/n
Interpretation:
As the sample size (n) increases, the variance of Y̅ decreases. So the sampling distribution of the sample mean will have lower variance the larger the sample size.
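A rough check of this by simulation (assuming Python with numpy; population values and sample sizes are arbitrary):

    import numpy as np

    rng = np.random.default_rng(1)
    mu, sigma = 5, 2                     # hypothetical population values
    n_samples = 50_000

    for n in (5, 25, 100):
        means = rng.normal(mu, sigma, size=(n_samples, n)).mean(axis=1)
        # Simulated variance of Y-bar vs the theoretical sigma^2 / n
        print(n, means.var(), sigma**2 / n)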
The Sampling Distribution (of Y̅ ~ )
Thus, if we assume that the sample observations are drawn from a normal RV, Y, we can deduce that:
Y̅ ~ N (μ, σ²/n)
Standardisation of Y̅
We can standardise Y̅ and use the standard normal distribution to calculate probabilities:
Z = [ ( Y̅ - μ ) / ( σ/√n ) ] ~ N(0, 1)
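A minimal sketch of using this standardisation to find a probability for Y̅ (assuming Python with scipy; all values are hypothetical):

    import math
    from scipy.stats import norm

    mu, sigma, n = 5, 2, 25     # hypothetical population values and sample size
    y_bar_threshold = 5.5

    # Standardise the sample mean using sigma / sqrt(n)
    z = (y_bar_threshold - mu) / (sigma / math.sqrt(n))   # = 1.25

    # P(Y-bar <= 5.5) from the standard normal CDF
    print(norm.cdf(z))          # roughly 0.894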