Week 2: Ordinary least squares (OLS) and bivariate regression Flashcards
The 6 steps of hypothesis testing
- Ensure that assumptions are met
- Formulate hypotheses
- Determine the critical area from the appropriate sampling distribution
- Calculate the test statistic
- Make decision
- State conclusions
Step 1: Ensure that assumptions are met
- Random samples
- Independent samples
- Interval-ratio level of measurement
- Sampling distribution is normally distributed
What is the probability that we would observe a particular sample statistic given this population? How unusual is this? Does this suggest the null hypothesis is false?
Step 2: Formulate hypotheses
- Null is always no difference, no (positive/negative) effect etc.
- Alternative is a difference OR a particular difference (greater than, less than, etc.)
Step 3: Determine the critical area from the appropriate sampling distribution
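Decision rule: Reject H0 if the obtained Z-score (Z*) falls in the critical area. For a two-tailed test this means the absolute value of Z* is greater than the critical z-value; for a one-tailed test, Z* must lie beyond the critical z-value in the direction stated by the alternative hypothesis (e.g. below the negative critical z-value for a "less than" alternative).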
Critical value of z
A critical value of z (Z-score) is used when the sampling distribution is normal, or close to normal. Z-scores are used when the population standard deviation is known or when you have larger sample sizes.
How to find critical value
Step 1: Subtract the confidence level from 100% to find the α level: 100% – 90% = 10%.
Step 2: Convert Step 1 to a decimal: 10% = 0.10.
Step 3: Divide Step 2 by 2 (this is called “α/2”):
0.10 / 2 = 0.05. This is the area in each tail.
Step 4: Subtract Step 3 from 1 (because we want the area in the middle, not the area in the tail):
1 – 0.05 = .95.
Step 5: Look up the area from Step 4 in the z-table. An area of 0.95 corresponds to z = 1.645. This is your critical value for a confidence level of 90%.
The example above is for a 90% confidence level with a two-tailed test. For a one-tailed test, Step 3 can be skipped, because the whole α lies in a single tail.
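The table lookup can also be done in code. A minimal sketch in Python, assuming the standard library's statistics.NormalDist (Python 3.8+) stands in for the z-table; the function name critical_z is made up for this illustration:

```python
from statistics import NormalDist

def critical_z(confidence_level, two_tailed=True):
    """Return the critical z-value for a confidence level given as a decimal (e.g. 0.90)."""
    alpha = 1 - confidence_level                     # Steps 1-2: alpha as a decimal
    tail_area = alpha / 2 if two_tailed else alpha   # Step 3 (skipped for a one-tailed test)
    return NormalDist().inv_cdf(1 - tail_area)       # Steps 4-5: area to the left -> z

z_two = critical_z(0.90)                    # about 1.645
z_one = critical_z(0.90, two_tailed=False)  # about 1.282
```

For a one-tailed test in the lower tail, the critical value is the negative of the returned number.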
Step 4(1): Calculate the test statistic
- For every sample statistic, there is a formula for its test statistic (not so important)
- The test statistic allows us to make probability statements in terms of the “standard” distribution
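As one concrete case, the test statistic for a single sample mean with a known population standard deviation is the Z-score of that mean. A minimal sketch in Python; the function name z_statistic and all numbers are hypothetical, chosen only to illustrate the calculation:

```python
from math import sqrt

def z_statistic(sample_mean, pop_mean, pop_sd, n):
    """One-sample z statistic: how many standard errors the sample mean
    lies from the mean stated in the null hypothesis."""
    standard_error = pop_sd / sqrt(n)   # spread of the sampling distribution
    return (sample_mean - pop_mean) / standard_error

# Hypothetical numbers, for illustration only
z = z_statistic(sample_mean=103.0, pop_mean=100.0, pop_sd=15.0, n=100)
# z is 2.0: the sample mean lies two standard errors above the hypothesized mean
```

Expressing the result in standard-error units is what allows it to be compared with the critical z-value.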
Step 4(2): Using P-values
Reject H0 if p < 0.05 (no critical value is needed)
P-value
Definition
The probability under the assumption of no effect or no difference (null hypothesis), of obtaining a result equal to or more extreme than what was actually observed.
A p-value below 0.05 is generally considered statistically significant
The P stands for probability: it measures how likely it is that a difference at least as large as the one observed would occur by chance alone, that is, if the null hypothesis were true
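The definition can be turned into a small calculation for a Z-score. A minimal sketch in Python, assuming a normal sampling distribution and using the standard library's statistics.NormalDist; the function name p_value_from_z and the example Z-score of 2.0 are hypothetical:

```python
from statistics import NormalDist

def p_value_from_z(z, two_tailed=True):
    """Probability, under H0, of a Z-score at least as extreme as the one observed."""
    tail = 1 - NormalDist().cdf(abs(z))    # area beyond |z| in one tail
    # The one-tailed result assumes the observed effect lies in the
    # direction stated by the alternative hypothesis.
    return 2 * tail if two_tailed else tail

p = p_value_from_z(2.0)    # about 0.0455 for a two-tailed test
reject_h0 = p < 0.05       # decision rule with alpha = 0.05
```

Comparing p with α gives the same decision as comparing the Z-score with the critical z-value.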
Step 5 and Step 6: Decision and conclusion
- Decision: H0 can/cannot be rejected
- Conclusion: With 95% certainty… (this only applies if an alpha level of 0.05 was chosen)
Regression equation
Y = a + bX + e
- a = intercept
- b = slope
- X = the value of the independent variable
- Y = the value of the dependent variable
- e = error term; the error in predicting the value of Y, given the value of X
The regression line minimizes the squared vertical distances between the points and the line
Ordinary Least Squares Estimation
- Logic: Minimize the sum of the squared residuals
- Residual is the difference between the actual value of Y and the predicted value of Y
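For the bivariate case Y = a + bX + e, the values of a and b that minimize the sum of squared residuals have a standard closed form: b is the sum of the products of the deviations of X and Y from their means, divided by the sum of the squared deviations of X, and a is the mean of Y minus b times the mean of X. A minimal sketch in Python; the function name ols_fit and the data are made up for illustration:

```python
def ols_fit(x, y):
    """Return intercept a and slope b of the least-squares line Y = a + bX."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Sum of cross-products and sum of squares of deviations from the means
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b = sxy / sxx              # slope
    a = y_mean - b * x_mean    # intercept: the line passes through the means
    return a, b

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = ols_fit(x, y)           # a is about 2.2, b is about 0.6

# Residual = actual Y minus predicted Y
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
ssr = sum(e ** 2 for e in residuals)   # the quantity OLS minimizes
```

Any other choice of a and b would give a larger sum of squared residuals for these data.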
Limits of the Ordinary Least Squares regression
- Collinearity between X-variables leads to misinterpretation of the coefficients
- More observations than X-variables are required
- Only one Y-variable can be modeled