Statistics: Regressions and associations Flashcards
What does the b in the line equation ŷ = a + bx represent?
The slope equals the amount that y changes when x increases by one unit.
What is the function of the absolute value of the slope?
It describes the magnitude of the change in ŷ for a one-unit change in x. The larger the absolute value, the steeper the line.
What is meant by prediction error? Give another word for prediction errors.
The difference between the actual y value and the predicted y value. These are also known as residuals.
When is a residual positive?
When the actual y is larger than the predicted y
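As a small illustration of residuals and their signs, the sketch below uses made-up data and a made-up fitted line (the values of `a`, `b`, `xs` and `ys` are assumptions chosen only for the example):

```python
# Hypothetical fitted line y_hat = a + b*x and hypothetical observations.
a, b = 1.0, 2.0
xs = [1, 2, 3]
ys = [3.5, 4.0, 7.0]

# Predicted values from the line.
predicted = [a + b * x for x in xs]

# Residual = actual y minus predicted y.
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]

for x, y, y_hat, e in zip(xs, ys, predicted, residuals):
    sign = "positive" if e > 0 else "negative" if e < 0 else "zero"
    print(f"x={x}: actual={y}, predicted={y_hat}, residual={e} ({sign})")
```

The first point lies above the line (positive residual), the second below it (negative residual), and the third exactly on it.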
What is meant by the least squares method?
It chooses the regression line with the smallest residual sum of squares, that is, the smallest sum of the squared residuals.
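A minimal sketch of the least squares idea, using the standard closed-form formulas for a single explanatory variable. The data and the helper names `least_squares` and `rss` are hypothetical, introduced only for this example:

```python
def least_squares(xs, ys):
    """Return (a, b) for the least squares line y_hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept
    return a, b


def rss(xs, ys, a, b):
    """Residual sum of squares for the line y_hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Made-up data for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares(xs, ys)
best = rss(xs, ys, a, b)

# Any other line should have a larger residual sum of squares.
nudged_slope = rss(xs, ys, a, b + 0.1)
nudged_intercept = rss(xs, ys, a + 0.1, b)
print(a, b, best, nudged_slope, nudged_intercept)
```

Nudging either the slope or the intercept away from the fitted values increases the residual sum of squares, which is what "least squares" means.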
Apart from making the errors as small as possible, name two characteristics of the regression line
- It has some positive and some negative residuals, and the residuals sum to zero
- It passes through the point of means (x̄, ȳ)
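Both characteristics can be checked numerically. The sketch below fits a least squares line to made-up data (the values are assumptions for illustration) and then inspects the residuals and the prediction at the mean of x:

```python
# Made-up data for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least squares slope and intercept.
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
a = y_bar - b * x_bar

residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# Some residuals are positive, some negative, and they sum to
# (approximately, because of floating point) zero.
residual_sum = sum(residuals)

# The predicted y at x_bar equals y_bar, so the line passes
# through the point of means.
prediction_at_mean = a + b * x_bar
print(residuals, residual_sum, prediction_at_mean, y_bar)
```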
Why can’t we just use the slope to measure the strength of an association?
The slope depends on the units of measurement of the variables, so its size says nothing about how strong the association is.
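To see the effect of units, the sketch below fits the same made-up data twice, once with x in metres and once in centimetres. The data and the helper name `slope_and_r` are hypothetical:

```python
from math import sqrt


def slope_and_r(xs, ys):
    """Return (least squares slope, correlation r) for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sxx, sxy / sqrt(sxx * syy)


# Made-up heights and weights for illustration.
heights_m = [1.5, 1.6, 1.7, 1.8]
weights_kg = [50, 58, 63, 72]
heights_cm = [100 * h for h in heights_m]

b_m, r_m = slope_and_r(heights_m, weights_kg)    # slope in kg per metre
b_cm, r_cm = slope_and_r(heights_cm, weights_kg)  # slope in kg per centimetre

print(b_m, b_cm)  # the slope shrinks by a factor of 100
print(r_m, r_cm)  # the correlation is unchanged
```

The association is exactly as strong in both versions, yet the slope differs by a factor of 100, so the slope alone cannot measure strength.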
Name two connections between correlation and regression
- They are both appropriate when the relationship between two quantitative variables can be approximated by a straight line
- The correlation and the slope of the regression line have the same sign. If one is positive, so is the other one.
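The shared sign follows from the standard relationship between the least squares slope b and the correlation r, where s_x and s_y denote the sample standard deviations of x and y (notation introduced here, not taken from the cards):

```latex
\[
  b = r \, \frac{s_y}{s_x}
\]
```

Because s_x and s_y are both positive, b is positive exactly when r is positive and negative exactly when r is negative.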
Name three key differences between regression and correlation
- With regression you must identify which variable is explanatory and which is the response, because swapping them gives a different regression line. Correlation is the same whichever way round the variables are.
- The regression slope depends on the units of measurement. The correlation does not.
- The correlation always falls between -1 and 1, while the slope can take any value.
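The point about identifying explanatory and response variables can be checked with a small sketch. The data are made up, and `slope` and `correlation` are hypothetical helper names:

```python
from math import sqrt


def slope(explanatory, response):
    """Least squares slope for predicting response from explanatory."""
    n = len(explanatory)
    e_bar = sum(explanatory) / n
    r_bar = sum(response) / n
    num = sum((e - e_bar) * (r - r_bar) for e, r in zip(explanatory, response))
    den = sum((e - e_bar) ** 2 for e in explanatory)
    return num / den


def correlation(us, vs):
    """Sample correlation r between two lists of equal length."""
    n = len(us)
    u_bar = sum(us) / n
    v_bar = sum(vs) / n
    suv = sum((u - u_bar) * (v - v_bar) for u, v in zip(us, vs))
    suu = sum((u - u_bar) ** 2 for u in us)
    svv = sum((v - v_bar) ** 2 for v in vs)
    return suv / sqrt(suu * svv)


# Made-up data for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b_y_on_x = slope(xs, ys)   # predicting y from x
b_x_on_y = slope(ys, xs)   # predicting x from y: a different line

r_xy = correlation(xs, ys)
r_yx = correlation(ys, xs)  # same value either way round
print(b_y_on_x, b_x_on_y, r_xy, r_yx)
```

Note that the second slope is not simply the reciprocal of the first, while the correlation is identical in both directions.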
What is the typical way to interpret r^2?
The proportion of the variation in the y-values that is accounted for by the linear relationship of y with x.
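One way to make this interpretation concrete is to compute r^2 as one minus the ratio of the residual sum of squares to the total sum of squares about the mean of y. The data below are made up for illustration:

```python
# Made-up data for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least squares line y_hat = a + b*x.
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
a = y_bar - b * x_bar

# Total variation in y about its mean.
total_ss = sum((y - y_bar) ** 2 for y in ys)

# Variation left over after using the line to predict y.
residual_ss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Proportion of the variation in y accounted for by the line.
r_squared = 1 - residual_ss / total_ss
print(r_squared)  # about 0.6 for this data
```

Here roughly 60% of the variation in the y-values is accounted for by the linear relationship with x.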
Name 4 pitfalls of association analysis
- Extrapolation (using a regression line to predict y values for x values outside the range of the data)
- Influential outliers
- Inferring causation from an association
- Lurking variables
What two conditions are required for an outlier to be influential?
- Its x value is relatively low or high compared to the rest of the data
- Falls far from the trend that the rest of the data follows (regression outlier)
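The sketch below illustrates why both conditions matter. It uses made-up points and a hypothetical helper `slope`, and compares a regression outlier at a typical x value with one at an extreme x value:

```python
def slope(points):
    """Least squares slope for a list of (x, y) pairs."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    return sxy / sxx


# Made-up points lying exactly on the line y = x.
base = [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5)]

# Far from the trend, but x is in the middle of the data.
mid_x_outlier = base + [(3, 8)]

# Far from the trend AND x is much higher than the rest.
extreme_x_outlier = base + [(15, 0)]

slope_base = slope(base)
slope_mid = slope(mid_x_outlier)
slope_extreme = slope(extreme_x_outlier)
print(slope_base, slope_mid, slope_extreme)
```

In this example the outlier at a typical x leaves the slope at about 1, while the outlier at an extreme x pulls the slope below zero.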
What are predictions of the future using time series data called?
Forecasts
What does it mean for a statistical measure to be non-resistant?
It is prone to distortion by outliers. The correlation and the regression line are both non-resistant.
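A simple way to see the difference between a non-resistant and a resistant measure is to compare the mean and the median on made-up data, using Python's standard `statistics` module:

```python
from statistics import mean, median

# Made-up data, then the same data with one value replaced by an outlier.
clean = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 100]

# The mean is non-resistant: a single outlier shifts it a lot.
mean_clean = mean(clean)
mean_outlier = mean(with_outlier)

# The median is resistant: it stays the same here.
median_clean = median(clean)
median_outlier = median(with_outlier)

print(mean_clean, mean_outlier)      # mean moves from 3 to 22
print(median_clean, median_outlier)  # median stays at 3
```

The correlation and the least squares line behave like the mean in this respect, because they are built from sums over all the data points.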
What is meant by a lurking variable?
A variable, usually unobserved, that influences the association between the two variables being studied