2B Flashcards
what is an estimation method
a method that you (or spss) can use to estimate the parameters of the model
parameters = b0 and b1
the two criteria for a good estimation method
unbiased
the parameters are not systematically estimated too small or too large
In other words, there is a lack of bias
the two criteria for a good estimation method
efficient
- the parameter estimates are as precise as possible
- the estimation method makes optimal use of the available information
- the standard errors (SEs) are as small as possible
In other words, there is a lack of error
estimation methods broadly fall in two categories
estimation methods
Algebraic solutions that only use formulas (e.g., ordinary least squares)
Methods that additionally use 'trial and error'
examples of estimation methods
for this course
ordinary least squares (linear regression)
maximum likelihood (logistic regression)
Ordinary Least Squares (OLS)
- we use the method of least squares to estimate the parameters (b) of the regression model so that the sum of squared residuals is as small as it can be
- If all assumptions are satisfied, this means that the points in the figure are as close as possible to the regression line
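The card above can be sketched in a few lines of Python (the x and y data here are invented for illustration): OLS has a closed-form algebraic solution, so b0 and b1 come straight out of the formulas, no trial and error needed.

```python
# Minimal OLS sketch with invented data.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n

# Closed-form least-squares solution:
# b1 = sum((x - x_mean)(y - y_mean)) / sum((x - x_mean)^2)
b1 = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / \
     sum((xi - x_mean) ** 2 for xi in x)
b0 = y_mean - b1 * x_mean  # the fitted line passes through (x_mean, y_mean)

# Residuals (observed minus predicted) and their sum of squares (SSR):
# no other choice of b0, b1 gives a smaller SSR.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
ssr = sum(r ** 2 for r in residuals)
print(b0, b1, ssr)
```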
Strength OLS
Simple to understand
Small computational load, so results are basically instant even with huge datasets
Requires fewer assumptions than some more complex methods
Limitations OLS
Unable to estimate anything beyond linear regression models (e.g., logistic regression, multilevel regression, etc.)
Sensitive to outliers (extreme values in the data)
Only cases with complete observations on all variables in the model are included in the analysis (listwise deletion)
Conclusions you can draw from the formula for estimating the standard error with OLS
- As the dispersion around the regression line (SSR, the sum of squared residuals) increases, the standard error increases (this is logical because more variation between your predicted and observed values means the model has a larger error).
- As the sample size (n) increases, the standard error decreases (this is logical because if you divide by a larger number, the answer you get from dividing the SSR by n-2 becomes smaller).
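Both conclusions can be checked numerically. A sketch with invented data, using the standard formula for the SE of the slope, SE(b1) = sqrt(SSR / (n - 2)) / sqrt(sum((x - x_mean)^2)):

```python
import math

def slope_se(x, y):
    # Fit OLS, then compute the standard error of the slope.
    n = len(x)
    xm, ym = sum(x) / n, sum(y) / n
    sxx = sum((a - xm) ** 2 for a in x)
    b1 = sum((a - xm) * (b - ym) for a, b in zip(x, y)) / sxx
    b0 = ym - b1 * xm
    ssr = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(ssr / (n - 2)) / math.sqrt(sxx)

x = [1, 2, 3, 4, 5]
y_tight = [2.0, 4.1, 5.9, 8.1, 9.9]  # points close to the line
y_wide  = [1.0, 5.1, 4.9, 9.1, 8.9]  # same trend, more dispersion

# More dispersion around the line (larger SSR) -> larger SE.
print(slope_se(x, y_tight), slope_se(x, y_wide))

# Larger sample with the same dispersion -> smaller SE.
x_big, y_big = x * 4, y_tight * 4
print(slope_se(x_big, y_big))
```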
The t-value becomes larger and the p-value becomes smaller when…
- The regression coefficient is larger.
The larger the coefficient, the less likely it is that you would coincidentally find this coefficient in your sample if there is no association in the population.
- The residuals are smaller (e.g., points are closer to the line).
If the residuals are smaller, the p-value becomes smaller. If your model gives a very precise prediction of all the points, it is also less likely you would coincidentally find these residuals if there is no association in the population.
- The sample size is larger.
The larger the sample, the smaller the p-value. It is less likely you would find something in the sample by coincidence if the sample size is large; a larger sample is more reliable.
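All three effects on the t-value (t = b1 / SE(b1)) can be demonstrated with a small sketch; the data are invented for illustration:

```python
import math

def slope_t(x, y):
    # Fit OLS and return the t-value for the slope: t = b1 / SE(b1).
    n = len(x)
    xm, ym = sum(x) / n, sum(y) / n
    sxx = sum((a - xm) ** 2 for a in x)
    b1 = sum((a - xm) * (b - ym) for a, b in zip(x, y)) / sxx
    b0 = ym - b1 * xm
    ssr = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(ssr / (n - 2)) / math.sqrt(sxx)
    return b1 / se

x = [1, 2, 3, 4, 5]
y = [2.0, 4.1, 5.9, 8.1, 9.9]
t_base = slope_t(x, y)

# 1. Larger coefficient, same residuals: adding 2*x to y steepens the slope
#    while leaving every residual unchanged -> larger t.
t_steeper = slope_t(x, [yi + 2 * xi for xi, yi in zip(x, y)])

# 2. Smaller residuals: same slope, points closer to the line -> larger t.
t_precise = slope_t(x, [2.0, 4.05, 5.95, 8.05, 9.95])

# 3. Larger sample: the same pattern observed four times over -> larger t.
t_bigger_n = slope_t(x * 4, y * 4)

print(t_base, t_steeper, t_precise, t_bigger_n)
```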