Regression Flashcards
regression analysis
a statistical method for examining relationships among variables
linear regression
a statistical model that assumes a linear relationship between two variables
population linear regression model
describes the relationship that holds between Y and X in the population
X
the independent variable or regressor
Y
the dependent variable
Beta 0
the intercept. It is the point at which the regression line crosses the Y axis, i.e. the value of Y on the line when X = 0.
Beta 1
the slope of the regression line. It measures the difference in Y associated with a one-unit change in X.
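u_i
the regression error term: all factors other than X that determine Y for observation i. Its sample counterpart, computed from the fitted line, is the residual.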
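The symbols on the cards above fit together in a single equation. As a sketch, assuming n observations indexed by i (the index range is an assumption, not stated on the cards), the population linear regression model can be written as:

```latex
Y_i = \beta_0 + \beta_1 X_i + u_i, \qquad i = 1, \dots, n
```

Here \beta_0 + \beta_1 X_i is the population regression line and u_i collects everything about Y_i that the line does not capture.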
Prediction
using the observed values of a given variable to predict the value of another variable
causal inference
to determine whether and to what extent a cause-and-effect relationship exists between variables
causality
an action is said to cause an outcome if the outcome is the direct consequence of that action
treatment group
the units that receive the treatment
control group
the units that do not receive the treatment; they stand in for the counterfactual
observational data
data obtained by observing actual behavior outside an experimental setting, e.g. surveys, administrative records, financial reports
cross-sectional data
- data collected at a single point in time for different entities
- reflects a snapshot of variables at that point
- we can use this data to study differences across entities in a single time period
panel data
- data collected for multiple entities at multiple points in time
- captures the dynamics of change over time
- allows for the analysis of temporal effects across entities
time series
- data collected for a single entity at multiple time points
- allows for the analysis of temporal effects and forecasting
ordinary least squares (OLS)
it identifies the parameters that minimize the sum of the squared residuals
residual
the vertical distance between an observed value of Y and the regression line (observed Y minus predicted Y)
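A minimal sketch of OLS for a single regressor, using the standard closed-form solution that minimizes the sum of squared residuals. The data and the helper name `ols_fit` are hypothetical, invented only for illustration.

```python
# Hypothetical toy data, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]


def ols_fit(x, y):
    """Return (beta0_hat, beta1_hat) minimizing the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sample co-movement of X and Y, and sample variation of X.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta1_hat = s_xy / s_xx               # slope
    beta0_hat = y_bar - beta1_hat * x_bar  # intercept
    return beta0_hat, beta1_hat


beta0_hat, beta1_hat = ols_fit(x, y)

# Residual = observed Y minus the value predicted by the fitted line.
residuals = [yi - (beta0_hat + beta1_hat * xi) for xi, yi in zip(x, y)]
ssr = sum(e ** 2 for e in residuals)  # the quantity OLS minimizes

print(beta0_hat, beta1_hat, ssr)
```

With an intercept in the model, the residuals sum to zero (up to floating-point rounding), which is a quick sanity check on any fit.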
the sign (±) on Beta 1 for an independent variable
the direction of its association with the dependent variable
Central limit theorem
when the sample is large and properly drawn, the sample mean is approximately normally distributed around the true mean
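The theorem can be illustrated with a small simulation. This is only a sketch: the uniform population, the sample size, the number of replications, and the seed are all arbitrary assumptions chosen for illustration.

```python
import random
import statistics

random.seed(0)  # arbitrary seed so the run is reproducible

SAMPLE_SIZE = 50      # assumed size of each sample
REPLICATIONS = 2000   # assumed number of repeated samples

# Population: uniform on [0, 1), which is clearly not normal.
TRUE_MEAN = 0.5
TRUE_SD = (1 / 12) ** 0.5
se_of_mean = TRUE_SD / SAMPLE_SIZE ** 0.5  # theoretical SD of the sample mean

sample_means = []
for _ in range(REPLICATIONS):
    sample = [random.random() for _ in range(SAMPLE_SIZE)]
    sample_means.append(statistics.fmean(sample))

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)

# Under normality, about 95% of sample means lie within 1.96 standard errors.
share_within = sum(
    abs(m - TRUE_MEAN) <= 1.96 * se_of_mean for m in sample_means
) / REPLICATIONS

print(mean_of_means, sd_of_means, share_within)
```

The sample means should cluster around 0.5 with a spread close to `se_of_mean`, even though the underlying population is flat rather than bell-shaped.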
standard error
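for a regression, the standard error of the regression represents the typical distance that the observed values fall from the regression line; more generally, a standard error is the estimated standard deviation of an estimator's sampling distribution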
t-statistic
the ratio of the departure of the estimated value of a parameter from its hypothesized value to its standard error
95% confidence interval
an interval that is a function of the data that contains the true parameter value 95% of the time in repeated samples
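The t-statistic and the 95% confidence interval can be computed directly from an estimate and its standard error. A minimal sketch, assuming a large sample so that the normal critical value 1.96 applies; the numbers and the function names are hypothetical.

```python
def t_statistic(estimate, hypothesized, std_error):
    """Departure of the estimate from the hypothesized value, in standard errors."""
    return (estimate - hypothesized) / std_error


def confidence_interval_95(estimate, std_error, critical_value=1.96):
    """Large-sample 95% interval: estimate +/- 1.96 standard errors (assumed normal)."""
    margin = critical_value * std_error
    return estimate - margin, estimate + margin


# Hypothetical regression output, invented for illustration only.
beta1_hat = 0.5
se_beta1 = 0.25

t = t_statistic(beta1_hat, hypothesized=0.0, std_error=se_beta1)
low, high = confidence_interval_95(beta1_hat, se_beta1)

# Reject H0: Beta 1 = 0 at the 5% level when |t| exceeds 1.96,
# which is the same as 0 lying outside the 95% confidence interval.
reject_null = abs(t) > 1.96

print(t, (low, high), reject_null)
```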
omitted variable bias (OVB)
the bias in the OLS estimator that occurs when an omitted variable (Z) is both a determinant of Y and correlated with the included regressor X
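The size and direction of the bias can be written down explicitly. As a sketch, assume the true model includes Z with coefficient \beta_2 (a symbol introduced here, not on the cards) and that u_i is uncorrelated with both X_i and Z_i. Regressing Y on X alone then gives, in large samples:

```latex
Y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i + u_i
\quad\Longrightarrow\quad
\hat{\beta}_1 \;\xrightarrow{\;p\;}\; \beta_1 + \beta_2 \,\frac{\operatorname{Cov}(X_i, Z_i)}{\operatorname{Var}(X_i)}
```

The bias vanishes only if Z does not affect Y (\beta_2 = 0) or Z is uncorrelated with X; otherwise its sign is the sign of \beta_2 times the sign of the covariance.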
statistical inference entails
- estimation of the coefficients of interest
- hypothesis testing and confidence intervals
R-squared
a measure of the regression model fit: the fraction of the variance of Y that is explained by the regressors
R-squared value of 1
perfect explanation of variance
R-squared value of 0
the model explains none of the variance
adjusted R-squared
a modified version of the R-squared that does not necessarily increase when a regressor is added to the regression
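Both fit measures follow from the residuals. A minimal sketch using the standard formulas, where k is the number of regressors excluding the intercept; the data, the fitted values, and the function names are hypothetical.

```python
def r_squared(y, y_hat):
    """Fraction of the variation in y explained by the fitted values."""
    y_bar = sum(y) / len(y)
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # sum of squared residuals
    tss = sum((yi - y_bar) ** 2 for yi in y)               # total sum of squares
    return 1 - ssr / tss


def adjusted_r_squared(y, y_hat, k):
    """R-squared with a degrees-of-freedom penalty for k regressors."""
    n = len(y)
    return 1 - (n - 1) / (n - k - 1) * (1 - r_squared(y, y_hat))


# Hypothetical observed values and fitted values from a one-regressor model.
y = [2.0, 4.0, 5.0, 4.0, 5.0]
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]

r2 = r_squared(y, y_hat)
adj_r2 = adjusted_r_squared(y, y_hat, k=1)

print(r2, adj_r2)
```

Because the factor (n - 1) / (n - k - 1) grows with k, adding a regressor raises the adjusted value only if it reduces the sum of squared residuals by enough to offset the penalty.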
control variable
a variable that controls for an omitted causal factor in the regression but does not itself necessarily have a causal effect on Y
endogeneity
a situation where the explanatory variable is correlated with the error term
omitted variable bias (endogeneity)
when a model fails to include one or more relevant variables that influence the dependent variable and are correlated with an included regressor
selection bias (endogeneity)
the data sample is not randomly selected
reverse causality or simultaneity (endogeneity)
two-way causation exists between independent and dependent variables
measurement error (endogeneity)
arises when an independent variable is measured with error
difference-in-differences (DiD)
a method used to assess the causal effect of an event by comparing the change over time in the units where the event happened (treatment group) with the change over time in the units where the event did not happen (control group).
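In its simplest two-group, two-period form, the estimate is the change in the treatment group minus the change in the control group. A minimal sketch with hypothetical outcome data; the function name `did_estimate` is invented for illustration, and the result is only causal under the usual assumption that both groups would have followed parallel trends without the event.

```python
def mean(values):
    return sum(values) / len(values)


def did_estimate(treat_before, treat_after, control_before, control_after):
    """Change in the treatment group minus change in the control group."""
    treat_change = mean(treat_after) - mean(treat_before)
    control_change = mean(control_after) - mean(control_before)
    return treat_change - control_change


# Hypothetical outcomes, invented for illustration only.
treat_before = [10.0, 12.0]
treat_after = [15.0, 17.0]
control_before = [9.0, 11.0]
control_after = [11.0, 13.0]

effect = did_estimate(treat_before, treat_after, control_before, control_after)
print(effect)
```

Subtracting the control group's change removes the part of the treatment group's change that would have happened anyway.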
knowledge diffusion
disclosure of technical knowledge in the patent document