Lecture 2 Flashcards
How are the intercept and slope coefficients calculated in univariate cross-sectional ordinary least squares regression?
Intercept Beta_0 is where the line hits the y-axis. To find it, imagine the line shifting up or down until it best fits the data. The intercept is where it crosses the y-axis.
(formula)
Slope Beta_1 is how steep the line is. To find it, think about the line tilting more or less to fit the data. The slope measures this tilt (leaning the line to match the overall trend of the data).
(formula)
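The formula images for this card did not survive, so what follows is the standard textbook closed form and not necessarily the lecture's exact notation. It assumes the model y_i = Beta_0 + Beta_1 x_i + u_i, with sample means x-bar and y-bar over n observations:

```latex
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
```

In words: the slope is the sample covariance of x and y divided by the sample variance of x, and the intercept is chosen so that the line passes through the point of means (x-bar, y-bar).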
Explain the intuition behind ordinary least squares (OLS) estimation
Imagine you have a bunch of dots on a graph. Each dot has an x-value and y-value. Now, think of drawing a straight line through these dots. OLS is like finding the line that makes the sum of the squared vertical distances between the dots and the line as small as possible.
Ordinary Least Squares (OLS) estimation aims to find the straight line that minimizes the sum of squared vertical distances between the observed data points and the predicted values on the line, providing the best fit for the data.
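A minimal sketch of this idea in plain Python. The data (hours studied and exam score) are made up for illustration, and the helper names `ols_fit` and `ssr` are hypothetical. It fits the line with the usual closed-form estimators and then checks that shifting or tilting the line makes the sum of squared vertical distances larger.

```python
def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def ssr(x, y, intercept, slope):
    """Sum of squared vertical distances between the dots and the line."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


# Made-up illustrative data: hours studied (x) and exam score (y).
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 68]

b0, b1 = ols_fit(hours, score)
best = ssr(hours, score, b0, b1)

# Any other line, for example one shifted up or tilted a little, fits worse.
shifted = ssr(hours, score, b0 + 1.0, b1)
tilted = ssr(hours, score, b0, b1 + 0.5)

print(f"OLS line: score = {b0:.2f} + {b1:.2f} * hours, SSR = {best:.2f}")
print(f"shifted up: SSR = {shifted:.2f}; tilted: SSR = {tilted:.2f}")
```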
Explain what a p-value is used for in regression analysis
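A p-value helps us determine whether a variable is statistically significant. The p-value is the probability, assuming there is no true relationship (the null hypothesis), of seeing an estimate at least as extreme as the one in our sample. Rejecting the null hypothesis only when the p-value is below a chosen significance level (for example 5%) limits the probability of committing a Type 1 error to that level.
Want it to be as low as possible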
p-value: like a measure of evidence. It tells you if the relationship you see between your independent variable and the dependent variable is likely real or just a coincidence.
Small value: it suggests that the relationship you are seeing is unlikely to be due to luck
Large value: it suggests that the observed relationship could be just a fluke
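One way to make "real or just a coincidence" concrete is a permutation test. Note that this is a swapped-in technique for illustration: regression software normally reports a p-value from a t-test, not this one. The sketch below, on made-up data and with hypothetical helper names, shuffles y many times to see how often chance alone produces a slope at least as extreme as the observed one.

```python
import random


def slope(x, y):
    """OLS slope of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Share of shuffled datasets with a slope at least as extreme as observed.

    Shuffling y breaks any link between x and y, so the shuffled slopes show
    what "just a coincidence" looks like.
    """
    rng = random.Random(seed)
    observed = abs(slope(x, y))
    y_shuffled = list(y)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(y_shuffled)
        if abs(slope(x, y_shuffled)) >= observed:
            extreme += 1
    # The +1 counts the observed data itself as one possible arrangement.
    return (extreme + 1) / (n_perm + 1)


x = list(range(1, 11))
# Made-up data with a clear upward trend.
y_trend = [2.3, 3.8, 6.1, 7.6, 10.2, 12.0, 13.9, 16.3, 17.7, 20.1]
# Made-up data whose fitted slope is exactly zero.
y_flat = [0, 5, 0, 5, 0, 0, 5, 0, 5, 0]

p_trend = permutation_p_value(x, y_trend)
p_flat = permutation_p_value(x, y_flat)
print(f"p-value with a clear trend: {p_trend:.4f}")
print(f"p-value with a zero slope:  {p_flat:.4f}")
```

The trending data gets a very small p-value because almost no shuffle reproduces such a steep slope. The flat data gets a p-value of 1, since every shuffle is at least as extreme as a slope of zero.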
Explain the use and interpretation of R-sqr in regression analysis
R-sqr is the most common measure of goodness of fit. The statistic is the fraction of the sample variance of y_i explained by X_i. That is, R-sqr is the ratio of the sample variance of ^y_i (the fitted values) to the sample variance of y_i.
The measure ranges from 0 to 1; the higher the value, the more of the variation in the data is explained by the model.
Want the value to be as high as possible
R-sqr: measure that tells you how well the independent variable(s) explain the variation in the dependent variable.
R-sqr = 0: zero percent of the variation in the data is explained by the model.
R-sqr = 1: one hundred percent of the variation in the data is explained by the model.
To calculate R-sqr we need to know three intermediate measures:
- Total sum of squares (TSS)
- Explained sum of squares (ESS)
- Sum of squared residuals (SSR)
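The three measures can be sketched as follows, again on made-up study-time data and with a hypothetical helper name. For an OLS regression that includes an intercept, TSS = ESS + SSR, so R-sqr can be computed either as ESS / TSS or as 1 - SSR / TSS.

```python
def sums_of_squares(x, y):
    """Return (TSS, ESS, SSR) for a simple OLS regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    fitted = [intercept + slope * xi for xi in x]

    tss = sum((yi - y_bar) ** 2 for yi in y)               # total variation in y
    ess = sum((fi - y_bar) ** 2 for fi in fitted)          # variation explained by the line
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # variation left in the residuals
    return tss, ess, ssr


# Made-up illustrative data: hours studied (x) and exam score (y).
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 68]

tss, ess, ssr = sums_of_squares(hours, score)
r_sqr = ess / tss
print(f"TSS = {tss:.2f}, ESS = {ess:.2f}, SSR = {ssr:.2f}, R-sqr = {r_sqr:.4f}")
```

Some textbooks use the abbreviation SSR for the regression (explained) sum of squares. Here it follows the list above and means the sum of squared residuals.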
Discuss whether R-sqr is a good measure of fit in regression analysis
Good side: a high R-sqr value is generally good because it means your model does a good job of explaining or predicting the outcome.
Caution: R-sqr does not tell you if your model is the right one or if your predictions are accurate in an absolute sense. You could have a high R-sqr and still make inaccurate predictions.
So, it is useful, but don’t rely on R-sqr alone. Always check the context and other measures to make sure your model is doing what you want it to do.
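A small constructed example of the caution above: the data follow a curve (y = x squared), but a straight line is fitted anyway. R-sqr comes out high even though the model is the wrong one, and the residuals show a clear pattern. The data and names are illustrative assumptions.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


x = list(range(1, 11))
y = [xi ** 2 for xi in x]  # the true relationship is curved, not linear

b0, b1 = fit_line(x, y)
fitted = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

y_bar = sum(y) / len(y)
tss = sum((yi - y_bar) ** 2 for yi in y)
ssr = sum(r ** 2 for r in residuals)
r_sqr = 1 - ssr / tss

print(f"R-sqr of the straight line: {r_sqr:.3f}")
# Residuals are positive at both ends and negative in the middle,
# a sign that a straight line is the wrong shape for these data.
print("residuals:", [round(r, 1) for r in residuals])
```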
What is the difference between independent and dependent variables in regression analysis?
Independent variable: the thing you think is causing a change in something else. It is like the “cause” or the “input” that you are testing. For example, if you are looking at how studying time affects exam scores, studying time is the independent variable.
Dependent variable: the thing you are trying to explain or predict. It is the “outcome” or “result” that you are interested in. For example, exam scores would be the dependent variable because you believe they depend on how much time you study.
The independent variable is what you change or manipulate, and the dependent variable is what you observe or measure to see if the change had an effect.
What is omitted variable bias?
Omitted variable bias is an endogeneity issue that causes the population orthogonality condition to be violated.
(Omitted variable bias refers to an error in an analysis where we have left out, or failed to account for, an important variable that affects the outcome. When this happens, the condition that the included variables are unrelated to the error term (the orthogonality condition) is violated, which can lead to incorrect results and conclusions in our study.
Example: we want to examine whether study time affects exam results. We include variables such as the number of hours of reading per day and participation in activities, but we forget to include an important variable, such as sleep quality.)
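The study-time example can be simulated to show the bias. Everything in this sketch is an assumption made for illustration: the true effect of study time is set to 1.0, the effect of sleep quality to 2.0, and sleep quality is made to correlate with study time. To control for sleep without a matrix library, the sketch uses the Frisch-Waugh-Lovell result: regressing the score on the part of study time not explained by sleep gives the same slope as a regression that includes both variables.

```python
import random


def mean(values):
    return sum(values) / len(values)


def slope(x, y):
    """OLS slope of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx


def residuals(x, y):
    """Part of y that is left over after a simple regression of y on x."""
    b1 = slope(x, y)
    b0 = mean(y) - b1 * mean(x)
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


rng = random.Random(42)
n = 5000
sleep = [rng.gauss(0, 1) for _ in range(n)]    # the variable we forget
study = [s + rng.gauss(0, 1) for s in sleep]   # correlated with sleep
# Assumed true model: score = 1.0 * study + 2.0 * sleep + noise
score = [1.0 * st + 2.0 * sl + rng.gauss(0, 1) for st, sl in zip(study, sleep)]

# Sleep omitted: study time also picks up the effect of sleep.
short_slope = slope(study, score)

# Sleep controlled for: use only the variation in study time unrelated to sleep.
study_net_of_sleep = residuals(sleep, study)
long_slope = slope(study_net_of_sleep, score)

print(f"slope with sleep omitted:  {short_slope:.2f}")
print(f"slope with sleep included: {long_slope:.2f} (assumed true effect is 1.0)")
```

With these assumed numbers the omitted-variable slope lands near 2 instead of 1, because study time gets credit for the effect of the sleep quality it is correlated with.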
What is data mining or p-value hacking in the context of regression analysis?
Data mining and p-value hacking result from wanting the R-sqr to be as high as possible and the p-value to be as low as possible, and mindlessly trying variables and specifications until this is achieved. As a result, we have a garbage model that looks nice in terms of R-sqr and p-value.
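A minimal simulation of why this produces garbage, under assumed settings (20 observations, 200 candidate regressors, all pure noise). None of the candidates has any relationship with y by construction, yet picking the best-looking one gives an R-sqr far above what a typical candidate achieves.

```python
import random


def r_squared(x, y):
    """R-sqr of a simple regression of y on x (the squared sample correlation)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy ** 2 / (s_xx * s_yy)


rng = random.Random(1)
n_obs = 20
n_candidates = 200

y = [rng.gauss(0, 1) for _ in range(n_obs)]

# Every candidate regressor is random noise, unrelated to y by construction.
r2_values = []
for _ in range(n_candidates):
    x = [rng.gauss(0, 1) for _ in range(n_obs)]
    r2_values.append(r_squared(x, y))

best_r2 = max(r2_values)
average_r2 = sum(r2_values) / len(r2_values)

print(f"average R-sqr across candidates: {average_r2:.3f}")
print(f"R-sqr of the hand-picked 'best' candidate: {best_r2:.3f}")
```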
What is the population orthogonality condition and why is it important?
The population orthogonality condition is the most important assumption for OLS estimation.
It is crucial for the validity of regression analysis: it is what makes the estimated coefficients reliable. Violating this condition compromises the accuracy and interpretation of the regression coefficients.
(It says that the error term in a regression model, i.e. the part the model cannot explain, must not be systematically related to the independent variables. If this condition is violated, the analysis can give incorrect and unreliable results. So it is important to make sure that the independent variables in the analysis are not tied to the errors in a systematic way. If they are, it affects how well your analysis works.)
Issues that violate this condition are called endogeneity issues. There are three main types of endogeneity issues:
- Omitted variable bias
- Systematic measurement error
- Incorrect specification of causality
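For reference, the condition on this card is commonly written as below for the univariate model. The notation (u_i for the error term, scalar x_i) is an assumption, since the card does not state the condition formally, and the lecture may use a vector form instead.

```latex
y_i = \beta_0 + \beta_1 x_i + u_i,
\qquad
E[u_i] = 0,
\qquad
E[x_i u_i] = 0
```

Together the two moment conditions imply Cov(x_i, u_i) = 0. Each of the three endogeneity issues listed above is a way for x_i and u_i to become correlated.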
Outline the econometric process
Econometric process: entails using statistical methods to analyze data, estimate parameters, and draw conclusions about economic relationships based on a formulated question.
(Image)