Regression Flashcards
what is regression analysis used for in statistics?
regression analysis is used to explore relationships between variables, allowing for the prediction of one variable based on another
what is a residual in regression analysis?
a residual is the difference between the observed value and the predicted value for a given data point
what does the line of best fit represent in regression analysis?
the line of best fit represents the linear relationship between the dependent and independent variables, minimising the sum of squared residuals
what is the method of least squares?
the method of least squares is a technique used in regression to find the line that minimises the sum of the squared residuals
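worked example (a minimal sketch, not part of the original cards): the code below fits a least-squares line using the standard closed-form formulas for slope and intercept, then computes the residuals and their sum of squares. the data values and the function names `least_squares`, `residuals` and `sum_squared` are made up for illustration.

```python
# illustrative data, invented for this sketch
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]


def least_squares(xs, ys):
    """Return (b0, b1) for the line y = b0 + b1*x that minimises
    the sum of squared residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # intercept
    return b0, b1


def residuals(xs, ys, b0, b1):
    """Residual = observed y minus predicted y, for each data point."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


def sum_squared(values):
    return sum(v * v for v in values)


b0, b1 = least_squares(xs, ys)
res = residuals(xs, ys, b0, b1)
ssr = sum_squared(res)

# any other line, e.g. one with a slightly different slope, has a larger sum
ssr_other = sum_squared(residuals(xs, ys, b0, b1 + 0.1))

print(round(b0, 3), round(b1, 3), round(ssr, 3), round(ssr_other, 3))
```

for this data the fitted line is roughly y = 2.2 + 0.6x. the first residual is negative (about -0.8), meaning the line predicts a value above the observed one at x = 1.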
how can you interpret the regression equation y = b0 + b1x?
in the equation y = b0 + b1x, b0 is the y-intercept, representing the value of y when x = 0, and b1 is the slope, indicating how much y changes for a one-unit increase in x.
what does a negative residual indicate?
a negative residual indicates that the predicted value is higher than the observed value for that data point
what is the significance of the sum of squared residuals in regression analysis?
the sum of squared residuals is minimised in the method of least squares to find the best-fitting line, ensuring the model’s predictions are as close as possible to the observed data
what is extrapolation, and why is it dangerous in regression analysis?
extrapolation involves using a regression line to predict y values for x values outside the observed data range. it is risky because the trend might change, leading to poor predictions
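worked example (a sketch under stated assumptions): suppose the true relationship is y = x², which is assumed here purely for illustration, but data is only observed for x from 0 to 4. a line fitted to that range predicts reasonably inside it and badly outside it. the helper name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for y = b0 + b1*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


# observed range: x = 0..4, generated from the assumed curve y = x**2
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [x ** 2 for x in xs]

b0, b1 = fit_line(xs, ys)


def predict(x):
    return b0 + b1 * x


# error inside the observed range (x = 3) versus far outside it (x = 10)
error_inside = abs(predict(3.0) - 3.0 ** 2)
error_outside = abs(predict(10.0) - 10.0 ** 2)

print(predict(3.0), predict(10.0), error_inside, error_outside)
```

the fitted line is y = -2 + 4x, so it predicts 10 at x = 3 (true value 9) but 38 at x = 10 (true value 100). the trend seen in the observed range does not continue outside it.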
what are forecasts, and what assumption is made when using a regression line to predict future values?
forecasts are predictions about the future using time series data. the assumption made is that the past trend will remain the same in the future, which can be risky
what is an influential outlier in regression analysis, and how does it affect results?
an influential outlier is a data point that significantly impacts the regression line and correlation, especially when the point is both far from the trend and has an extreme x value
what is the difference between an outlier and a regression outlier?
an outlier is a point far from others in terms of x and y values, but a regression outlier is a point that is far from the overall trend, even if not an outlier on its own x or y values
what should you do when you encounter an influential regression outlier?
investigate the observation to see if it was recorded incorrectly or if it is genuinely different. it may be useful to refit the regression line without the outlier to check its impact on results
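worked example (a sketch with invented data): refitting the line with and without a suspect point shows how much that point drives the result. the last point below has an extreme x value and falls far from the trend of the others. the helper name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for y = b0 + b1*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


# five points on a clear upward trend, plus one suspect point at x = 15
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 15.0]
ys = [1.0, 2.0, 3.0, 4.0, 5.0, 0.0]

# fit using every observation
b0_all, slope_all = fit_line(xs, ys)

# refit after dropping the last (suspect) observation
b0_trim, slope_trim = fit_line(xs[:-1], ys[:-1])

print(round(slope_all, 3), round(slope_trim, 3))
```

with the suspect point the slope is negative (about -0.15); without it the slope is 1. a change this large marks the point as influential, which is a reason to investigate it, not a licence to delete it automatically.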
does correlation imply causation? why or why not?
no, correlation does not imply causation. an association between two variables may be due to a third variable, or there could be other explanations for the observed relationship.
what is a lurking variable?
a lurking variable is an unobserved variable that influences the association between the response and explanatory variables
what is simpson’s paradox?
simpson’s paradox occurs when the direction of an association between two variables changes after including a third variable and analysing data at separate levels of that variable
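worked example (a sketch with invented data): the two groups below stand for two levels of a third variable. within each group the slope of y on x is positive, but pooling the groups gives a negative slope. the group labels and the helper name `slope` are made up for illustration.

```python
def slope(xs, ys):
    """Least-squares slope of y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


# two levels of a third variable, each with its own (x, y) observations
group_a = ([1.0, 2.0, 3.0], [8.0, 9.0, 10.0])
group_b = ([6.0, 7.0, 8.0], [2.0, 3.0, 4.0])

slope_a = slope(*group_a)
slope_b = slope(*group_b)

# ignore the third variable and pool all observations together
pooled_xs = group_a[0] + group_b[0]
pooled_ys = group_a[1] + group_b[1]
slope_pooled = slope(pooled_xs, pooled_ys)

print(slope_a, slope_b, round(slope_pooled, 3))
```

each group has slope 1, while the pooled slope is about -0.99. the direction of the association reverses once the data is analysed separately at each level of the third variable.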
how can lurking variables affect the interpretation of correlations?
lurking variables can create spurious associations or distort the apparent relationship between two variables, making it seem as if one causes the other
what is confounding in statistics?
confounding occurs when two explanatory variables are associated with both the response variable and each other, making it difficult to determine which variable is causing the observed effect
what is the difference between a lurking variable and a confounding variable?
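a lurking variable is unmeasured and affects the relationship between the explanatory and response variables. a confounding variable is measured in the study and is associated with both the response variable and another explanatory variable, so their effects are hard to separate.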
how can confounding affect the interpretation of a study’s results?
confounding can distort the apparent association between variables, making it seem as though one causes the other when, in fact, a third variable is influencing the results.
what role do statistical methods play in analysing confounding variables?
statistical methods can adjust for confounding variables, isolating the effect of the explanatory variable, but there’s always a risk of omitting important confounders
what is the response variable in regression analysis?
the response variable (y) is the variable you want to predict