week 5 & 6 Regression Flashcards
What is the primary difference between ANOVA and regression in terms of the types of studies they are used for?
ANOVA is for experimental studies while regression is for observational studies.
What does regression fundamentally begin with?
Regression begins with correlation.
What is the relationship between correlation and causation?
Correlation does not imply causation.
What is the common measure of effect used in both correlation and regression?
The correlation coefficient (r²) which represents the proportion of variance explained.
What can restriction of range on the independent variable lead to regarding the relationship?
It can underestimate the relationship.
How can extreme cases or outliers affect linear models in regression?
Outliers can skew the correlation, inflating or deflating results.
What is a potential issue with using poor or proxy measures in correlation?
It may underestimate the correlation.
What does the strength of effect in regression correspond to in terms of r² values?
r² values indicate the proportion of variance explained, with higher values indicating stronger relationships
A statistical method for predicting the value of one variable from another, using one or more predictors.
Regression
A measure that quantifies the direction and strength of a linear relationship between two variables.
Correlation Coefficient (r)
The portion of variability in a dependent variable that can be attributed to the independent variable(s) in a regression model.
Variance Explained
What is the difference between simple regression and multiple regression?
Simple regression uses one predictor (independent variable), while multiple regression uses two or more predictors.
A data point that differs significantly from other observations and can substantially affect the results of statistical analysis.
Outlier
The variable that is manipulated or varied in an experiment or regression analysis to assess its impact on the dependent variable.
Independent Variable (IV)
Why is it beneficial to use regression over simple correlation?
Regression allows for prediction of the outcome variable while accounting for multiple predictors, enhancing the understanding of variable relationships.
What does a linear relationship express in the context of regression?
A linear relationship is expressed as a straight line.
What is the relationship between a line and a model in regression?
Your line is your model.
What are the two fundamental features that all lines possess in regression analysis?
All lines have a slope and an intercept.
What does the term ‘error’ refer to in the context of regression?
Error refers to the difference between your modeled line and the actual data points.
What does b1 represent in regression analysis?
b1 is the regression coefficient for the predictor and represents the gradient (slope) of the regression line, indicating the direction and strength of the relationship.
What is represented by b0 in a regression equation?
b0 is the intercept, which is the value of Y when X = 0, marking the point where the regression line crosses the Y-axis.
How can one estimate the outcome using multiple predictors in regression?
By entering the value of the predictor, multiplied by the coefficient, and adding the intercept, one can estimate the outcome.
A statistical process for estimating the relationships among variables, allowing for the prediction of one variable based on the values of others.
Regression
The slope of a regression line, represented by b1, indicates the direction and strength of the relationship between the independent and dependent variables.
Slope
The intercept of a regression line, represented by b0, is the value of the dependent variable when all independent variables are set to zero.
Intercept
The discrepancy between predicted values from the regression model and the actual observed values.
Error
A variable that is used in a regression model to predict the outcome of another variable.
Predictor
What is the method of least squares used for?
It is used to find the line of best fit for a set of data by minimizing the sum of the squares of the residuals.
What is a residual in the context of least squares?
A residual is the difference between the observed data and the predicted values generated by the model.
A straight line that best represents the data points in a scatter plot, minimizing the sum of the squares of the vertical distances of the points from the line.
Line of best fit
A statistical method used to measure the total variability in a dataset, often decomposed into different components such as total variability, model variability, and residual variability.
Sums of Squares
How do you assess the quality of a regression model?
By analyzing how well the model fits the observed data through metrics like Sums of Squares, ANOVA output, and Mean Squared Error
Total Sum of Squares, representing total variability in the data, calculated as the variability between individual scores and the mean.
SST
Sum of Squares for Residuals, indicating the variability between the actual data and the values predicted by the regression model.
SSR
Sum of Squares for the Model, measuring the improvement in variability explained by fitting the regression model compared to the mean.
SSM
What does a higher SSM compared to SSR indicate
It suggests that the model provides better predictions than simply using the mean of the observed data
What does ANOVA stand for in the context of regression analysis?
ANOVA stands for Analysis of Variance, which tests the differences between the means of several groups and is used to evaluate the performance of the regression model.
A metric that quantifies the average squared difference between the observed values and the values predicted by the model, reflecting the error of the model.
Mean Squared Error (MSE)
What does the F-ratio represent in regression analysis?
The F-ratio is a statistic calculated to compare the mean of the sums of squares from the model to the mean of the sums of squares from the residuals, indicating whether the model effectively explains the variance.
A measure that represents the proportion of variance accounted for by the regression model, indicating the strength of the relationship between the predictors and outcome variable.
r² (R-squared)
How is r² similar to Pearson Correlation
r² is similar to squaring the r value obtained from Pearson Correlation, as it provides an understanding of the proportion of variance explained, but it can include multiple predictors in regression.
The measure of how much the values in a dataset differ from the mean of that dataset, reflecting the spread of the data points.
Variance
What role does the residual play in evaluating the fit of the regression model?
The residual indicates how well the model predicts the observed data; smaller residuals signify a better fit.
A line that best describes the relationship between independent and dependent variables, derived from the least squares method.
Regression Line
What factor determines whether a regression model is preferred over the mean?
If the model results in significantly lower error and better predictions compared to the mean, it is considered preferred.
The process of using a regression model to estimate future values of the dependent variable based on new values of independent variables
Prediction
What does a regression analysis output typically include?
It includes statistics such as R-squared, F-ratio, coefficients of the predictors, p-values, and sums of squares (SST, SSR, SSM) to evaluate model performance
Write up the regression equation:
Yi = b0 + b1Xi
outcome = intercept + slope * predictor
What is multiple regression?
Multiple regression is a model to predict the value of one variable from multiple predictors, extending simple regression to include several variables.
How does multiple regression relate to simple regression?
Multiple regression is a natural extension of the simple regression model, which involves predicting a variable from just one predictor.
What type of relationship does multiple regression model?
Multiple regression models the hypothetical relationship between several variables.
What is the structure of a multiple regression equation?
The equation of multiple regression is similar to that of simple regression but includes additional predictors, forming a straight line
In a regression equation, what does b0 represent?
In a regression equation, b0 represents the intercept, which is the value of the Y variable when all X variables are equal to zero.
What role do regression coefficients play in a multiple regression equation?
Regression coefficients (bi) quantify the effect of each predictor (Xi) on the outcome variable (Y).
Can a multiple regression model have more than one predictor?
Yes, a multiple regression model can include multiple predictors.