Misc econometrics Flashcards
p-value
(In statistical hypothesis testing) The probability of obtaining test results at least as extreme as the results observed, assuming that the null hypothesis is correct.
In other words, the p-value is the smallest significance level at which we would reject H₀ (at any significance level below the p-value, we fail to reject H₀)
In still other words, it is the probability, computed under the distribution implied by H₀, of obtaining a test statistic (e.g. the Z-statistic corresponding to our observed value) at least as extreme as the one we calculated
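A minimal Python sketch of the calculation (the Z value here is hypothetical, chosen only for illustration):

```python
# A minimal sketch (hypothetical numbers): two-sided p-value for a Z-test.
from scipy.stats import norm

z = 2.1                                # hypothetical calculated Z-statistic
p_value = 2 * (1 - norm.cdf(abs(z)))   # P(|Z| >= |z|), assuming H0 is true
print(p_value)                         # ~0.036: reject at the 5% level, fail to reject at 1%
```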
Z-statistic
Z-statistic (or Z-score or Standard score) is a number representing how many standard deviations an observed value (raw score) is away from the mean (of what is being observed).
Raw scores above the mean have positive standard scores, while those below the mean have negative standard scores.
The Z-statistic is distributed normally with mean = 0 and variance = 1 (ie, it has a Standard Normal Distribution)
Z ~ N(0, 1)
So Z = (Observed Sample Value − Assumed Population Mean) / Standard Deviation of the Sampling Distribution
(Note: in hypothesis tests, the observed value is often the mean observed from our sample; in other words, we are testing whether the sample mean is consistent with the assumed population mean. The Z-statistic may also be used to estimate the probability that X could take a value at least as extreme as the observed value, x, given the assumed population mean.)
(Note: Calculating z using this formula requires the population mean and the population standard deviation, not the sample mean or sample standard deviation. But knowing the true mean and standard deviation of a population is often unrealistic except in cases such as standardized testing, where the entire population is measured.
When the population mean and the population standard deviation are unknown, the standard score may be calculated using the sample mean and sample standard deviation as estimates of the population values.)
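A minimal sketch of this formula in Python (all numbers are hypothetical):

```python
# A minimal sketch (hypothetical numbers): z-score of a single raw score.
mu = 100.0     # assumed population mean
sigma = 15.0   # assumed population standard deviation
x = 130.0      # observed raw score

z = (x - mu) / sigma
print(z)       # 2.0: the raw score lies 2 standard deviations above the mean
```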
Z-statistic (conversion)
If X ~ N (μ, σ²) then
Z = (X − μ)/σ ~ N(0, 1)
In other words: for any normally distributed random variable X with mean μ and variance σ² (X ~ N(μ, σ²)), all probabilities can be converted to the Standard Normal Distribution using the Z Normal(0, 1) transformation: Z = (X − μ)/σ
The Z-statistic is distributed normally with mean = 0 and variance = 1 (ie, it has a Standard Normal Distribution)
Therefore the Z Normal transformation, Z = (X − μ)/σ, converts the variable X into a Standard Normal distribution. We can thus use standard normal tables to find relevant probabilities such as P(X ≤ x).
Note: the Z Normal(0, 1) transformation is so called because it transforms the distribution of X into a normal distribution centred on 0 with a variance of 1, by shifting the distribution leftward by μ units (centring the mean on 0) and compressing it horizontally by a factor of σ (stretching, if σ < 1), which sets the variance to 1
Thus, Z ~ N(0, 1)
So, to convert any value of X to its corresponding Z value, subtract the value of the mean and divide by the standard deviation.
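A minimal sketch of the conversion in Python (μ, σ and x are hypothetical); the second call shows the same probability obtained without converting:

```python
# A minimal sketch (hypothetical numbers): P(X <= x) via the Z transformation.
from scipy.stats import norm

mu, sigma, x = 50.0, 10.0, 65.0
z = (x - mu) / sigma                      # convert x to its Z value: 1.5
print(norm.cdf(z))                        # P(Z <= 1.5) ~ 0.9332 (standard normal table)
print(norm.cdf(x, loc=mu, scale=sigma))   # same probability, without converting
```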
Z-statistic (hypothesis tests)
A number representing how many (of the sampling distribution's) standard deviations the observed (sample) value is away from the assumed (population) mean.
Random Sample
A sample of n observations of a RV Y, denoted Y₁, Y₂, …, Yₙ, is said to be a random sample if the n observations are drawn independently from the same population and each element in the population is equally likely to be selected
Random Sample as ‘A set of Independently and Identically Distributed (IID) RVs’
We describe such a sample as being a set of Independent and Identically distributed (IID) Random Variables (RVs)
So, if a random sample of n elements is taken,
the sample elements constitute a set of IID RVs, Y₁, Y₂, …, Yₙ, each of which has the same PDF as that of Y
The random nature of Y₁, Y₂, …, Yₙ reflects the fact that many different outcomes are possible before the sampling is actually carried out. In other words, each element of the sample is an IID RV with the same PDF as Y (the population) precisely because it is randomly and independently selected from that population.
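A minimal sketch of drawing such a sample in Python (the population parameters are hypothetical):

```python
# A minimal sketch (hypothetical parameters): drawing an IID sample of n
# observations from a population modelled as Y ~ N(mu, sigma^2).
import numpy as np

rng = np.random.default_rng(0)
n, mu, sigma = 10, 5.0, 2.0
sample = rng.normal(mu, sigma, size=n)   # Y1, ..., Yn: IID, same PDF as Y
print(sample)
```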
Sample data
Once the sample is obtained, we have a set of numbers, say y₁, y₂, …, yₙ which constitute the data we work with.
There are different types of data:
• Cross-sectional data
• Time-series data
• Panel data
Sample Statistics
A sample statistic is any quantity computed from values in a sample that is used for a statistical purpose.
(Statistical purposes include estimating a population parameter, describing a sample, or evaluating a hypothesis)
The two most often used sample statistics are the sample mean, denoted by Y̅, and the sample variance, denoted by S².
Sampling Distribution
A sample statistic (eg the sample mean) will have its own probability distribution called the sampling distribution.
Since each observation in a random sample is itself a RV, any statistic calculated from the sample, called a sample statistic, is also a RV.
And since the sample statistic is a RV, it has its own probability distribution
The sampling distribution reflects the fact that a random sample (of size n) drawn from the population could materialise in a range of different ways, each with a corresponding probability. It is this probability distribution, over all the possible samples of size n that we could draw from the population, that we call the sampling distribution (and we will see shortly that, for the sample mean, it is normal with mean μ and variance σ²/n)
Population > Sampling Distributions
So, to tie sampling distributions in with their wider context,
- There is a POPULATION (of size N)
- Y is a RV representing this population, with a PDF
- θ is an unknown population parameter (such as the expected value E(Y) = μ or the variance V(Y) = σ²)
- Note: these population parameters are unknown, fixed values
- A random sample (of n observations) of the RV Y is drawn, denoted Y₁, Y₂, …, Yₙ
- (Once the sample is obtained, we have a set of numbers, say y₁, y₂, …, yₙ, which constitute the data we work with)
- Each Yᵢ has a PDF (identical to the PDF of Y)
- From the sample we can calculate sample statistics
- (Two sample statistics of interest: sample mean, Y̅ and the sample variance, S²)
- Note: these sample statistics are RVs, with their own probability distribution, the SAMPLING DISTRIBUTIONS
The sampling distribution of the sample mean (Y̅)
Suppose Y ~ N (μ, σ²) and we have an IID sample of n observations from it: {Y₁, Y₂, …, Yₙ},
Then we say that Yᵢ ~ IIDN (μ, σ²)
In other words, each element of the sample is a RV with the same PDF as Y.
From these observations we can calculate the sample mean, Y̅, as: Y̅ = (1/n) Σ Yᵢ
Since Y̅ is a RV itself, it has a probability distribution.
It turns out that the sampling distribution of the sample mean is: Y̅ ~ N (μ, σ²/n)
(We’ll break this down in the next three cards)
The mean (or expected value) of the sampling distribution of Y̅
The mean of the sampling distribution of Y̅ is:
E[Y̅] = μ
Interpretation:
If random samples of n independent observations are repeatedly and independently drawn from a population, then as the number of samples becomes very large (approaches infinity), the mean of the sample means (Y̅) approaches the population mean, μ
The variance of the sampling distribution of Y̅
The variance of the sampling distribution of Y̅ is:
V[Y̅] = σ²/n
Interpretation:
As the sample size (n) increases, the variance of Y̅ decreases. So the sampling distribution of the sample mean will have lower variance the larger the sample size.
The Sampling Distribution of Y̅
Thus, if we assume that the samples are taken from a normal RV, Y, we can deduce that:
Y̅ ~ N (μ, σ²/n)
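A minimal simulation sketch of this result in Python (population parameters, sample size and number of replications are all hypothetical):

```python
# A minimal sketch (hypothetical parameters): simulating the sampling
# distribution of the sample mean and checking Ybar ~ N(mu, sigma^2/n).
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 5.0, 2.0, 25, 100_000

# draw many independent samples of size n; keep each sample's mean
means = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)

print(means.mean())   # ~5.00 = mu
print(means.var())    # ~0.16 = sigma^2 / n = 4 / 25
```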
Standardisation of Y̅
We can standardise Y̅ and use the standard normal distribution to calculate probabilities:
Z = [ ( Y̅ - μ ) / ( σ/√n ) ] ~ N(0, 1)
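A minimal sketch of using this standardisation in Python (all numbers hypothetical):

```python
# A minimal sketch (hypothetical numbers): a probability about Ybar
# via standardisation.
from math import sqrt
from scipy.stats import norm

mu, sigma, n = 5.0, 2.0, 25
ybar = 5.6
z = (ybar - mu) / (sigma / sqrt(n))   # standard error = 2/5 = 0.4, so z = 1.5
print(1 - norm.cdf(z))                # P(Ybar >= 5.6) ~ 0.0668
```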
The Central Limit Theorem
What about the shape of the sampling distribution of Y̅ if the population from which it is constructed is not normally distributed?
Use the Central Limit Theorem (CLT): as the sample size gets large enough, the sampling distribution of Y̅ can be approximated by the normal distribution even if the population itself is not normal.
Therefore, given the CLT, we can apply rules about normal distribution to the sampling distribution of the sample mean even when the population is not distributed normally
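A minimal simulation sketch of the CLT in Python, using a deliberately skewed population (the exponential distribution; all parameters hypothetical):

```python
# A minimal sketch of the CLT: sample means from a skewed (exponential)
# population are already roughly normal for moderate n.
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)
samples = rng.exponential(scale=1.0, size=(100_000, 50))  # n = 50 per sample

means = samples.mean(axis=1)
print(skew(means))                 # ~0.28, far below the population skewness of 2
print(means.mean(), means.var())   # ~1.0 and ~0.02 = sigma^2/n (here sigma^2 = 1)
```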
Population variance/sample variance unknown
It thus follows that we can make inferences about the population mean based on the sample mean using the standard normal distribution (Z-statistic)
Z = [ ( Y̅ - μ ) / ( σ/√n ) ] ~ N(0, 1)
However, notice how the distribution of the sample mean depends on the population mean but also on the population variance (divided by the sample size).
It is quite likely that we will not know the population variance
If this is the case we can use the sample variance, S², as an approximation, and it can be shown that:
T = [ ( Y̅ - μ ) / ( S/√n ) ] ~ t (n - 1)
Thus, we can use the sample variance and the tables from the t distribution to make inferences when σ² is unknown.
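A minimal sketch of this in Python (the data and the H₀ value are hypothetical):

```python
# A minimal sketch (hypothetical data): a t-test of H0: mu = 4 when
# sigma^2 is unknown, using the sample variance S^2.
import numpy as np
from scipy.stats import t

sample = np.array([4.1, 5.3, 4.8, 5.9, 4.4, 5.1])   # hypothetical data
n = len(sample)
ybar, s = sample.mean(), sample.std(ddof=1)         # ddof=1 gives the sample std dev

mu0 = 4.0                                           # value of mu under H0
T = (ybar - mu0) / (s / np.sqrt(n))
p_value = 2 * (1 - t.cdf(abs(T), df=n - 1))         # two-sided p-value, t(n-1)
print(T, p_value)
```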
Estimator
A sample statistic which is constructed to provide information about the unknown population parameters of a probability distribution is called an estimator and we denote it by θ̂ (theta-hat)
(To place in context:)
Let Y be a RV representing a population with a PDF, f (y; θ), which depends on the unknown population parameter θ
Example: if Y ~ N(μ, σ²), then θ = (μ, σ²)
Note: we will generally assume that there is only one parameter
If we can obtain certain random samples, then we can learn something about θ
(Refer to first point)
So a sample statistic used in this way is an estimator, and the probability distribution of the estimator is its sampling distribution
Estimator as a rule
More generally, an estimator θ̂ (theta-hat) of a population parameter θ can be expressed as a mathematical formula (rule):
θ̂ = g(Y₁, Y₂, …, Yₙ)
In other words, regardless of the outcome of the RVs (the sample that happens to be drawn from the population), we apply this same rule to estimate the population parameter
An estimator of θ is a rule that assigns each possible outcome of the sample a value of θ̂ (an estimate of θ)
(remember any sample drawn is one manifestation of the many possible samples that could have been drawn from the population (with corresponding probabilities))
For example: a natural estimator of µ (population mean) is Y̅ (sample mean)
where Y̅ = (1/n) Σ Yᵢ
- Given any outcome of the RVs {Y₁, Y₂, … , Yₙ } (ie: the sample drawn) the rule to estimate the population mean is the same: we simply take the average of {Y₁, Y₂, … , Yₙ }
- For a particular outcome of the RVs {y₁, y₂, …, yₙ}, the estimate is just the average of the sample: y̅ = (1/n) Σ yᵢ
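A minimal sketch of the rule-vs-estimate distinction in Python (population parameters hypothetical):

```python
# A minimal sketch (hypothetical parameters): the estimator is the rule;
# each sample outcome yields a different estimate from the same rule.
import numpy as np

rng = np.random.default_rng(0)

def estimator(ys):          # the rule g(Y1, ..., Yn) for estimating mu
    return ys.mean()

for _ in range(3):
    sample = rng.normal(5.0, 2.0, size=20)   # a new outcome of the RVs
    print(estimator(sample))                 # a different estimate each draw
```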
Quality of Estimate vs Quality of Estimator
Question:
Suppose that we want to estimate the average salary of university graduates in the UK. Suppose that we take one sample from the population and use the sample mean to estimate the average population salary. Suppose that we find that the sample mean is y̅ = £15,000. How close is this value (estimate) to the true population mean, µ?
Answer:
We don’t know, as µ is unknown!
➢ Instead of asking about the quality of the estimate, we should ask about the quality of the estimation procedure or estimator!
➢ ie How good is the sample mean as an estimator of the population mean?
➢ What are some (desirable) properties that an estimator may (or may not) possess?
Such properties are most often divided into:
• small sample (or finite) properties - desirable properties for when the sample size is finite
• large sample (or asymptotic) properties - desirable properties as the sample size tends to infinity
We will briefly consider the two main properties for estimators of ‘Finite or Small Samples’:
1) Unbiasedness
2) Minimum variance
Unbiasedness
An estimator is unbiased if:
E[θ̂] = θ
So if the mean of the sampling distribution of the estimator (which reflects all the different possible values that the sample statistic could take when the estimation procedure is applied to whatever sample happens to be drawn, with corresponding probabilities) is equal to the population parameter θ, then the estimator is unbiased.
In other words, if you independently draw a large number of random samples from the population, compute the sample statistic for each, and then take the mean of these sample statistics, for an unbiased estimator this mean will (as the number of samples grows) equal the population parameter.
(Part 1, Topic 5 shows a really clear graph demonstrating this)
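A minimal simulation sketch of unbiasedness in Python (all parameters hypothetical); the last line shows, for contrast, a biased variance estimator:

```python
# A minimal sketch (hypothetical parameters): checking unbiasedness by
# simulation; averaging the estimator over many samples should recover theta.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 5.0, 2.0, 10, 200_000
samples = rng.normal(mu, sigma, size=(reps, n))

print(samples.mean(axis=1).mean())           # ~5.0: Ybar is unbiased for mu
print(samples.var(axis=1, ddof=1).mean())    # ~4.0: S^2 (ddof=1) is unbiased for sigma^2
print(samples.var(axis=1, ddof=0).mean())    # ~3.6: the ddof=0 version is biased downward
```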
Minimum Variance Unbiased Estimator
Consider the set of all possible unbiased estimators for θ, which we will label θ̂₁, θ̂₂, …, θ̂ₖ. One of these, θ̂ⱼ, is said to be the Minimum Variance Unbiased Estimator (MVUE) if:
V(θ̂ⱼ) ≤ V(θ̂ᵢ)
for all i = 1, …, k with i ≠ j
(Part 1, Topic 5 shows a really clear graph demonstrating this)
Efficient
If an estimator is unbiased AND has minimum variance, we say it is efficient (or 'best')
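A minimal simulation sketch of relative efficiency in Python (all parameters hypothetical):

```python
# A minimal sketch (hypothetical parameters): for normal data both the sample
# mean and the sample median are unbiased for mu, but the mean has the
# smaller variance, so it is the more efficient estimator here.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 0.0, 1.0, 25, 100_000
samples = rng.normal(mu, sigma, size=(reps, n))

print(np.var(samples.mean(axis=1)))        # ~0.040 = sigma^2/n
print(np.var(np.median(samples, axis=1)))  # ~0.063 ~ (pi/2) * sigma^2/n
```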
How to construct estimators with good properties for unknown parameters?
There are various approaches based on observed samples. Three common methods are:
• Least Squares
• Method of Moments
• Maximum Likelihood
➢ In this course we will focus on Least Squares estimation.