Summa Week 7 Flashcards
regression
What is regression?
a way of predicting the value of one variable from another
Regression is a _____ model of the relationship between ____ variables
hypothetical
two
The regression model is a ____ one.
Linear or curvilinear?
linear
We describe the relationship of a regression using the equation of a ________ _____
straight line
_______ association can be summarized with a line of best fit
bivariate
Bivariate association can be summarized with a ______________________
line of best fit
The _____________ has the smallest amount of error of any regression line
line of best fit
What do we also call the “line of best fit”?
the regression line
What else do we also call the “line of best fit”?
the prediction line
What is the formula for a best fit line?
Yi = b0 + b1Xi + εi
or, with population parameters,
Yi = β0 + β1Xi + εi
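Outside SPSS, the line-of-best-fit coefficients can be computed directly from the data. A minimal sketch in plain Python, using made-up data:

```python
# Sketch: fitting Yi = b0 + b1*Xi by least squares, with made-up data.
X = [1, 2, 3, 4, 5]
Y = [3, 5, 7, 9, 11]

x_bar = sum(X) / len(X)   # mean of the predictor
y_bar = sum(Y) / len(Y)   # mean of the outcome

# slope: b1 = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
      / sum((x - x_bar) ** 2 for x in X))
# intercept: the least-squares line passes through the point of means
b0 = y_bar - b1 * x_bar

print(b0, b1)   # → 1.0 2.0 for these data
```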
What is b1 in regression?
the regression coefficient for the predictor
what is the predictor?
the variable on the horizontal axis (X) of a scatterplot used to find a regression line
what is another name of the gradient of the regression line?
slope
what is another name of the slope of the regression line?
gradient
What is the slope symbolized by?
b1
What does b1 suggest regarding the relationship of a regression line?
the direction and/or strength of the relationship
What does b0 mean in a regression line?
the intercept (value of Y when X = 0)
When using b0 in a regression line, the value of Y is determined by X = ?
0
What also is b0?
the point at which the regression line crosses the Y-axis
What is another name of the point at which the regression line crosses the Y-axis?
the ordinate
When the regression line is properly fitted, the error sum of squares is ____ than that which would obtain with any other straight line.
smaller
When the regression line is properly fitted, the error sum of squares is smaller than that which would obtain with any other straight line. What is this describing?
the least squares criterion for determining the line of best fit/regression
What is the least squares approach?
the least squares line has the smallest sum of errors (SE) and sum of squared errors (SSE) of all straight-line models
What does SE signify?
sum of errors in a least squares line
What does SSE refer to?
the sum of Squared errors in the least squares line approach
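To see the least squares criterion in action outside SPSS, compare the SSE of the fitted line against any rival straight line. A quick sketch with made-up data (for these points, the least-squares line works out to Y′ = 0.5 + 1.4X):

```python
# Sketch: the least-squares line has a smaller SSE than any other straight line.
X = [1, 2, 3, 4]
Y = [2, 3, 5, 6]   # made-up data; least-squares line is Y' = 0.5 + 1.4X

def sse(b0, b1):
    """Sum of squared errors for the line Y' = b0 + b1*X."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(X, Y))

best = sse(0.5, 1.4)    # SSE of the least-squares line
rival = sse(0.0, 1.5)   # SSE of a rival straight line
print(best, rival)      # best is smaller
```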
How good is the least squares line model?
only as good as the data given
do we need to test how well the least squares model fits the observed data in a regression?
hell yeah
What is another way of understanding regression (and by that token, ANOVA)?
total variation = explained variation + unexplained variation
What is the formula for this partition of variation?
Σ(Y − Ȳ)² = Σ(Y′ − Ȳ)² + Σ(Y − Y′)²
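The partition of variation can be verified numerically outside SPSS. A sketch with made-up data, where Y′ are the values predicted by the least-squares line (here Y′ = 0.5 + 1.4X):

```python
# Sketch: total variation = explained variation + unexplained variation.
X = [1, 2, 3, 4]
Y = [2, 3, 5, 6]                       # made-up data
Y_pred = [0.5 + 1.4 * x for x in X]    # least-squares predictions Y'
y_bar = sum(Y) / len(Y)

ss_total = sum((y - y_bar) ** 2 for y in Y)                  # Σ(Y − Ȳ)²
ss_model = sum((yp - y_bar) ** 2 for yp in Y_pred)           # Σ(Y′ − Ȳ)²
ss_resid = sum((y - yp) ** 2 for y, yp in zip(Y, Y_pred))    # Σ(Y − Y′)²

print(ss_total, ss_model + ss_resid)   # the two sides match
```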
What is the ratio of the explained sum of squares to the total sum of squares?
the proportion of variance accounted for by the regression model
the proportion of variance accounted for by the regression model
the ratio of the explained sum of squares to the total sum of squares
What is the symbol for the proportion of variance accounted for?
r^2
What is r^2?
the Pearson Correlation Coefficient Squared
What is the formula for the Pearson Correlation Coefficient Squared / proportion of variance accounted for by the regression model / r^2?
r² = Σ(Y′ − Ȳ)² / Σ(Y − Ȳ)²
= Explained Variation / Total Variation
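r² can be checked two ways outside SPSS: as explained over total variation, and as the square of Pearson's r. A sketch with made-up data:

```python
# Sketch: r^2 = explained variation / total variation = (Pearson r)^2.
X = [1, 2, 3, 4]
Y = [2, 3, 5, 6]                       # made-up data
Y_pred = [0.5 + 1.4 * x for x in X]    # least-squares predictions Y'
x_bar, y_bar = sum(X) / len(X), sum(Y) / len(Y)

r_sq_from_ss = (sum((yp - y_bar) ** 2 for yp in Y_pred)
                / sum((y - y_bar) ** 2 for y in Y))

# Pearson r = Σ(x − x̄)(y − ȳ) / sqrt(Σ(x − x̄)² · Σ(y − ȳ)²)
r = (sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
     / (sum((x - x_bar) ** 2 for x in X)
        * sum((y - y_bar) ** 2 for y in Y)) ** 0.5)

print(r_sq_from_ss, r ** 2)   # both ≈ 0.98 for these data
```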
A regression allows you to predict Y values given a set of X values, however it does not allow you to attribute causality to the relationship. T or F?
True
The variability in Y is caused by X. T or F for Pearson's correlation coefficient squared?
It’s false!
The variability can be accounted for by the variability in X, but NOT necessarily caused by X
A regression allows you to predict __ values given a set of __ values, however it does not allow you to attribute _________ to the relationship
Y
X
causality
What are two methods of identifying extreme outliers?
- using a boxplot
- determining using z-scores
How do you find extreme outliers in a boxplot?
SPSS: Graphs - Legacy Dialogs - Boxplot
right click on individual * (extreme outlier), and select “Clear”
How do you identify extreme outliers using z-scores?
SPSS: analyze - descriptives - descriptives and select the “Save standardized values as variables” option
Eliminate cases with z-scores beyond ±3 (more than 3 SD from the mean)
What do z-scores beyond ±3 refer to?
extreme outliers more than 3 SD away from the mean
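Outside SPSS, the same z-score screen can be sketched in a few lines of Python (made-up data):

```python
# Sketch: flag extreme outliers as cases with |z| > 3.
data = [10] * 10 + [12] * 9 + [100]    # made-up data; 100 is the outlier

n = len(data)
mean = sum(data) / n
# sample standard deviation (n - 1 denominator, matching SPSS)
sd = (sum((x - mean) ** 2 for x in data) / (n - 1)) ** 0.5

outliers = [x for x in data if abs((x - mean) / sd) > 3]
print(outliers)   # → [100]
```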
Why are extreme outliers important in regression?
they could influence the entire results of the study away from the estimated population parameters
What are residuals?
the differences between the values of the outcome predicted by the model and the values of the outcome observed in the sample
What do we call cases with unusually large residuals?
influential cases, or extreme outliers
Influential cases are what?
those with an absolute standardized residual greater than 3
What are standardized residuals?
residuals divided by an ESTIMATE of their standard deviation
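A simple version of that standardization can be sketched outside SPSS: divide each residual by an estimate of the residual standard deviation (made-up residuals, and ignoring the leverage adjustment SPSS applies):

```python
# Sketch: standardize residuals and flag |z| > 3 as influential cases.
residuals = [0.5, -1.0, 0.3, 2.2, -0.8, -1.2]   # made-up residuals

n = len(residuals)
mean_r = sum(residuals) / n
sd_r = (sum((e - mean_r) ** 2 for e in residuals) / (n - 1)) ** 0.5

standardized = [e / sd_r for e in residuals]    # crude estimate, no leverage term
flagged = [z for z in standardized if abs(z) > 3]
print(flagged)   # no case exceeds |3| in this example
```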
What in SPSS flags extreme cases in a linear regression?
Casewise diagnostics (in the Linear Regression dialog)
Other methods to identifying influential cases in SPSS include:
areas under Distances and Influence Statistics in the Linear Regression dialog of SPSS
Is the linearity assumption robust in regression analysis?
hell naw — if the relationship isn't linear the model misrepresents it, so check the scatterplot
Can we simply assume errors are independent in a regression analysis?
nahhhhhhh. a third or fourth variable could induce correlated errors, so it has to be checked
Do we assume errors are normally distributed?
Yeah, as long as the sample size is large enough
Do we assume homoscedasticity in regression analysis?
the residuals at each level of the predictor should have the same variance, but not as big of a deal if violated
What is homogeneity of variance in arrays?
the variance of Y for each value of X is constant in the population
What is normality in arrays?
in the population, the values of Y corresponding for any specified value of X are normally distributed around the predicted Y
spooled^2 formula?
spooled² = (df1/dftotal)·s1² + (df2/dftotal)·s2²
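That formula in a quick sketch outside SPSS (made-up group sizes and variances):

```python
# Sketch: pooled variance as a df-weighted average of group variances.
n1, n2 = 10, 15            # made-up group sizes
s1_sq, s2_sq = 4.0, 6.0    # made-up sample variances

df1, df2 = n1 - 1, n2 - 1
df_total = df1 + df2
s_pooled_sq = (df1 / df_total) * s1_sq + (df2 / df_total) * s2_sq
print(s_pooled_sq)   # falls between 4.0 and 6.0, weighted toward the larger group
```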
What are variable types for regression analysis?
the predictor variable must be quantitative or categorical, and the outcome variable must be quantitative, continuous and unbounded
What is non-zero variance?
the predictor should have some variation in value
What are predictors that are uncorrelated with “external variables”?
external variables are variables that haven’t been included in the regression model which influence the outcome variable
What is the minimum sample size for regression analysis?
10 or 15 cases per predictor variable
How do you visually inspect the linearity through the scatterplot of the predictor and the outcome variable?
SPSS: Graphs - Legacy Dialogs - Scatter/Dot - Simple Scatter; the x-axis is the predictor, the y-axis is the outcome variable. Add the line of best fit to assist in checking linearity. If the scatterplot follows a linear pattern (versus a curvilinear pattern), then the assumption is met
What shape does the scatterplot of a regression analysis need to be for the assumption of linearity to be met?
it needs to be a line, rather than a curvilinear pattern
How do you check the assumption of independent errors?
using the Durbin-Watson test for serial correlations between errors
how can the Durbin-Watson test vary?
the test stat can vary between 0 and 4 with a value of 2 meaning that the residuals are uncorrelated
Values less than __ or ___ than ___ are definitely cause for concern; however, values closer to 2 may still be problematic depending on your sample and model
less than 1 or greater than 3 (though values closer to 2 may still be problematic)
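The Durbin-Watson statistic itself is easy to sketch outside SPSS from a series of residuals (made-up values):

```python
# Sketch: Durbin-Watson statistic, d = Σ(e_t − e_{t−1})² / Σ e_t².
residuals = [0.5, -0.3, 0.2, -0.4, 0.1, 0.3, -0.2]   # made-up residuals

num = sum((residuals[t] - residuals[t - 1]) ** 2
          for t in range(1, len(residuals)))
den = sum(e ** 2 for e in residuals)
dw = num / den
print(dw)   # always between 0 and 4; ≈ 2 means uncorrelated residuals
```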
Normally distributed errors are checked in a regression analysis by
visually inspect the normality through the Q-Q plot of the residuals
statistically inspect the normality: conduct z tests on skew and kurtosis of the residuals
How do you visually inspect a scatterplot of the standardized residuals?
ZRESID, versus the standardized predicted values, ZPRED
The standardized residuals ____ vary as a function of the standardized predicted values: the trend is centered around zero but also that the variance around _____ is _____ uniformly and randomly
MUST NOT; zero, scattered
What is the difference between homoscedasticity and heteroscedasticity?
a sequence or a vector of random variables is homoscedastic /ˌhoʊmoʊskəˈdæstɪk/ if all random variables in the sequence or vector have the same finite variance. This is also known as homogeneity of variance. The complementary notion is called heteroscedasticity
How do you get a regression analysis in SPSS?
Analyze - Regression - Linear - predicted as DV and predictor as IV, OK
What columns are the line of best fit from?
y-intercept b0 = the unstandardized coefficient B for the (Constant) row
y = b0 + b1X
b1 = the unstandardized coefficient B for the second row (the IV)
Where is the bivariate correlation found in regression output?
under Standardized Coefficients (Beta), in the IV row
What is Beta?
the standardized coefficient for the predictor variable; in simple regression it equals the Pearson correlation r
How to test if a sample b is different from the hypothesized b* (b*=0) use df = N -2 for the formula…
t = (b − b*) / sb
If H0 is rejected it means that in the population the ______ ______ is significantly different from zero
regression slope
It can be shown that b is normally distributed about b* with a standard error approximated by the formula
sb = sY·X / (sX · √(N − 1))
CI(b*) =
b ± t(α/2) · [sY·X / (sX · √(N − 1))], with df = N − 2
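Both the t test and the confidence interval for b can be sketched outside SPSS from the same quantities. Made-up data with N = 4, so df = 2; the critical value t(.025) ≈ 4.303 at df = 2 is taken from a t table:

```python
# Sketch: t test and CI for the slope b, testing b* = 0.
X = [1, 2, 3, 4]
Y = [2, 3, 5, 6]             # made-up data; least-squares line is Y' = 0.5 + 1.4X
N = len(X)
b = 1.4                      # fitted slope

Y_pred = [0.5 + 1.4 * x for x in X]
ss_resid = sum((y - yp) ** 2 for y, yp in zip(Y, Y_pred))
s_yx = (ss_resid / (N - 2)) ** 0.5             # standard error of estimate
x_bar = sum(X) / N
s_x = (sum((x - x_bar) ** 2 for x in X) / (N - 1)) ** 0.5

s_b = s_yx / (s_x * (N - 1) ** 0.5)            # standard error of b
t = (b - 0) / s_b                              # test statistic, df = N - 2

t_crit = 4.303                                 # t(.025) at df = 2 (table value)
ci = (b - t_crit * s_b, b + t_crit * s_b)
print(t, ci)
```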
How do you find the adjusted R ^ 2?
under the Model Summary, adjusted R squared
Adjusted R² = .33, F(1, 198) = 99.59, p < .001
(N = 200)
What does adjusted R^2 refer to?
approximately (adjusted R^2) of the variance of the DV was accounted for by its linear relationship with the IV
SSt
total variability between scores and the mean (how the individual scores vary from the sample mean)
SSr
residual/error variability between the regression model and the actual data (how the individual scores vary from the regression line)
SSm
model variability between the model and the mean (how the regression line differs from the mean value of Y)
What is the purpose of the sums of squares?
they partition the differences between the observed data and the mean value of Y into model and residual components
If the model results in better prediction than using the mean, then we expect SSm to be much ______ than SSr
greater!
Mean squared error is…
a sum of squares divided by its degrees of freedom (sums of squares on their own are total values)
mean squared error can be expressed as…
averages
Mean squared errors are called
mean squares, MS
What is the formula for F-stat for regression analysis?
MSm / MSr
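Putting the pieces together, the F statistic for a simple regression can be sketched end to end outside SPSS (made-up data; in simple regression the model has 1 df and the residual has N − 2):

```python
# Sketch: F = MSm / MSr for a simple regression.
X = [1, 2, 3, 4]
Y = [2, 3, 5, 6]                       # made-up data
Y_pred = [0.5 + 1.4 * x for x in X]    # least-squares predictions
y_bar = sum(Y) / len(Y)
N = len(Y)

ss_model = sum((yp - y_bar) ** 2 for yp in Y_pred)
ss_resid = sum((y - yp) ** 2 for y, yp in zip(Y, Y_pred))

ms_model = ss_model / 1          # model df = number of predictors = 1
ms_resid = ss_resid / (N - 2)    # residual df = N - 2
F = ms_model / ms_resid
print(F)   # ≈ 98 for these data
```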