Week 5 & 6 Regression Flashcards

1
Q

What is the primary difference between ANOVA and regression in terms of the types of studies they are used for?

A

ANOVA is typically used for experimental studies, while regression is typically used for observational studies.

2
Q

What does regression fundamentally begin with?

A

Regression begins with correlation.

3
Q

What is the relationship between correlation and causation?

A

Correlation does not imply causation.

4
Q

What is the common measure of effect used in both correlation and regression?

A

The coefficient of determination (r², the squared correlation coefficient), which represents the proportion of variance explained.
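
As an illustration, r and r² can be computed by hand. This is a minimal sketch using made-up data; the names `hours` and `score` are assumptions for the example, not part of the course material.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

hours = [1, 2, 3, 4, 5]  # hypothetical predictor values
score = [2, 4, 5, 4, 5]  # hypothetical outcome values

r = pearson_r(hours, score)  # direction and strength of the linear relationship
r_squared = r ** 2           # proportion of variance explained
```

For these values r is about 0.77 and r² is 0.6, so the predictor accounts for 60% of the variance in the outcome.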

5
Q

What can restriction of range on the independent variable lead to regarding the relationship?

A

It can underestimate the relationship.

6
Q

How can extreme cases or outliers affect linear models in regression?

A

Outliers can skew the correlation, inflating or deflating results.
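
A small sketch of both effects, using made-up data (all values are illustrative assumptions). One extreme case is added to a dataset with a clear trend, and another to a dataset with no trend.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Deflating: a clear positive trend, then one extreme low score is added.
x_clean, y_clean = [1, 2, 3, 4, 5], [2, 4, 5, 4, 5]
r_clean = pearson_r(x_clean, y_clean)
r_deflated = pearson_r(x_clean + [6], y_clean + [0])

# Inflating: no linear trend, then one extreme case far from the rest is added.
x_flat, y_flat = [1, 2, 3, 4, 5], [3, 1, 3, 1, 3]
r_flat = pearson_r(x_flat, y_flat)
r_inflated = pearson_r(x_flat + [20], y_flat + [20])
```

In this example a single added case turns a positive correlation negative, and another single case turns a correlation of zero into one above 0.9.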

7
Q

What is a potential issue with using poor or proxy measures in correlation?

A

It may underestimate the correlation.

8
Q

What does the strength of effect in regression correspond to in terms of r² values?

A

r² values indicate the proportion of variance explained, with higher values indicating stronger relationships.

9
Q

A statistical method for predicting the value of one variable from another, using one or more predictors.

A

Regression

10
Q

A measure that quantifies the direction and strength of a linear relationship between two variables.

A

Correlation Coefficient (r)

11
Q

The portion of variability in a dependent variable that can be attributed to the independent variable(s) in a regression model.

A

Variance Explained

12
Q

What is the difference between simple regression and multiple regression?

A

Simple regression uses one predictor (independent variable), while multiple regression uses two or more predictors.

13
Q

A data point that differs significantly from other observations and can substantially affect the results of statistical analysis.

A

Outlier

14
Q

The variable that is manipulated or varied in an experiment or regression analysis to assess its impact on the dependent variable.

A

Independent Variable (IV)

15
Q

Why is it beneficial to use regression over simple correlation?

A

Regression allows for prediction of the outcome variable while accounting for multiple predictors, enhancing the understanding of variable relationships.

16
Q

What does a linear relationship express in the context of regression?

A

A linear relationship is expressed as a straight line.

17
Q

What is the relationship between a line and a model in regression?

A

Your line is your model.

18
Q

What are the two fundamental features that all lines possess in regression analysis?

A

All lines have a slope and an intercept.

19
Q

What does the term ‘error’ refer to in the context of regression?

A

Error refers to the difference between your modeled line and the actual data points.

20
Q

What does b1 represent in regression analysis?

A

b1 is the regression coefficient for the predictor and represents the gradient (slope) of the regression line, indicating the direction and strength of the relationship.

21
Q

What is represented by b0 in a regression equation?

A

b0 is the intercept, which is the value of Y when X = 0, marking the point where the regression line crosses the Y-axis.

22
Q

How can one estimate the outcome using multiple predictors in regression?

A

By multiplying the value of each predictor by its coefficient, summing these products, and adding the intercept, one can estimate the outcome.
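
A minimal sketch of that calculation. The function name `predict` and the coefficient values are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def predict(intercept, coefficients, values):
    """Estimate the outcome: intercept plus the sum of coefficient * predictor value."""
    return intercept + sum(b * x for b, x in zip(coefficients, values))

b0 = 10.0         # hypothetical intercept
b = [2.0, -0.5]   # hypothetical coefficients b1 and b2
case = [3.0, 4.0] # predictor values X1 and X2 for one case

y_hat = predict(b0, b, case)  # 10 + (2 * 3) + (-0.5 * 4) = 14
```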

23
Q

A statistical process for estimating the relationships among variables, allowing for the prediction of one variable based on the values of others.

A

Regression

24
Q

The slope of a regression line, represented by b1, indicates the direction and strength of the relationship between the independent and dependent variables.

A

Slope

25
Q

The intercept of a regression line, represented by b0, is the value of the dependent variable when all independent variables are set to zero.

A

Intercept

26
Q

The discrepancy between predicted values from the regression model and the actual observed values.

A

Error

27
Q

A variable that is used in a regression model to predict the outcome of another variable.

A

Predictor

28
Q

What is the method of least squares used for?

A

It is used to find the line of best fit for a set of data by minimizing the sum of the squares of the residuals.
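
For simple regression, the least-squares line has a closed-form solution. This is a sketch under that assumption, using made-up data; the function name `least_squares` is illustrative.

```python
def least_squares(x, y):
    """Return (b0, b1) for the line that minimizes the sum of squared residuals."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    b1 = sxy / sxx              # slope
    b0 = mean_y - b1 * mean_x   # intercept: the line passes through the means
    return b0, b1

x = [1, 2, 3, 4, 5]  # hypothetical predictor values
y = [2, 4, 5, 4, 5]  # hypothetical outcome values
b0, b1 = least_squares(x, y)
```

For these values the slope is 0.6 and the intercept is 2.2.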

29
Q

What is a residual in the context of least squares?

A

A residual is the difference between the observed data and the predicted values generated by the model.

30
Q

A straight line that best represents the data points in a scatter plot, minimizing the sum of the squares of the vertical distances of the points from the line.

A

Line of best fit

31
Q

A measure of variability in a dataset, computed by summing squared deviations. In regression it is decomposed into total variability, model variability, and residual variability.

A

Sums of Squares

32
Q

How do you assess the quality of a regression model?

A

By analyzing how well the model fits the observed data through metrics like Sums of Squares, ANOVA output, and Mean Squared Error.

33
Q

Total Sum of Squares, representing total variability in the data, calculated as the variability between individual scores and the mean.

A

SST

34
Q

Sum of Squares for Residuals, indicating the variability between the actual data and the values predicted by the regression model.

A

SSR

35
Q

Sum of Squares for the Model, measuring the improvement in variability explained by fitting the regression model compared to the mean.

A

SSM

36
Q

What does a higher SSM compared to SSR indicate?

A

It suggests that the model provides better predictions than simply using the mean of the observed data.
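
The three sums of squares can be sketched as follows. The data are made up, and the coefficients passed in are assumed to be the least-squares estimates for those data (SST = SSM + SSR holds only for the least-squares line).

```python
def sums_of_squares(x, y, b0, b1):
    """Return (SST, SSR, SSM) for the line y_hat = b0 + b1 * x."""
    mean_y = sum(y) / len(y)
    predicted = [b0 + b1 * xi for xi in x]
    sst = sum((yi - mean_y) ** 2 for yi in y)                # data vs. mean
    ssr = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))  # data vs. model
    ssm = sum((pi - mean_y) ** 2 for pi in predicted)        # model vs. mean
    return sst, ssr, ssm

x = [1, 2, 3, 4, 5]  # hypothetical predictor values
y = [2, 4, 5, 4, 5]  # hypothetical outcome values
sst, ssr, ssm = sums_of_squares(x, y, b0=2.2, b1=0.6)
r_squared = ssm / sst  # proportion of variance explained by the model
```

Here SST is 6, SSR is 2.4, and SSM is 3.6, so SSM exceeds SSR and r² is 0.6.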

37
Q

What does ANOVA stand for in the context of regression analysis?

A

ANOVA stands for Analysis of Variance. It is best known as a test of differences between the means of several groups; in regression output, the ANOVA table tests whether the model predicts the outcome significantly better than the mean alone.

38
Q

A metric that quantifies the average squared difference between the observed values and the values predicted by the model, reflecting the error of the model.

A

Mean Squared Error (MSE)

39
Q

What does the F-ratio represent in regression analysis?

A

The F-ratio is the mean square for the model (SSM divided by its degrees of freedom) divided by the mean square for the residuals (SSR divided by its degrees of freedom). A large F indicates that the model explains substantially more variance than it leaves unexplained.
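
A minimal sketch of the calculation, assuming the usual degrees of freedom: k for the model (k = number of predictors) and n - k - 1 for the residuals. The function name and the input values are hypothetical.

```python
def f_ratio(ssm, ssr, n_cases, n_predictors):
    """F = mean square for the model / mean square for the residuals."""
    ms_model = ssm / n_predictors
    ms_residual = ssr / (n_cases - n_predictors - 1)
    return ms_model / ms_residual

# Hypothetical sums of squares from a simple regression on 5 cases.
f = f_ratio(ssm=3.6, ssr=2.4, n_cases=5, n_predictors=1)  # 3.6 / 0.8
```

With these inputs F is 4.5. Whether that is significant depends on the F distribution with (1, 3) degrees of freedom, which this sketch does not compute.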

40
Q

A measure that represents the proportion of variance accounted for by the regression model, indicating the strength of the relationship between the predictors and outcome variable.

A

r² (R-squared)

41
Q

How is r² similar to Pearson Correlation?

A

In simple regression, r² equals the square of the Pearson correlation (r) between the predictor and the outcome, giving the proportion of variance explained. In multiple regression, R² extends this idea to several predictors.

42
Q

The measure of how much the values in a dataset differ from the mean of that dataset, reflecting the spread of the data points.

A

Variance

43
Q

What role does the residual play in evaluating the fit of the regression model?

A

The residual indicates how well the model predicts the observed data; smaller residuals signify a better fit.

44
Q

A line that best describes the relationship between independent and dependent variables, derived from the least squares method.

A

Regression Line

45
Q

What factor determines whether a regression model is preferred over the mean?

A

If the model results in significantly lower error and better predictions compared to the mean, it is considered preferred.

46
Q

The process of using a regression model to estimate future values of the dependent variable based on new values of independent variables.

A

Prediction

47
Q

What does a regression analysis output typically include?

A

It includes statistics such as R-squared, F-ratio, coefficients of the predictors, p-values, and sums of squares (SST, SSR, SSM) to evaluate model performance.

48
Q

Write up the regression equation:

A

Yi = b0 + b1Xi
outcome = intercept + slope * predictor

49
Q

What is multiple regression?

A

Multiple regression is a model to predict the value of one variable from multiple predictors, extending simple regression to include several variables.

50
Q

How does multiple regression relate to simple regression?

A

Multiple regression is a natural extension of the simple regression model, which involves predicting a variable from just one predictor.

51
Q

What type of relationship does multiple regression model?

A

Multiple regression models the hypothetical relationship between several variables.

52
Q

What is the structure of a multiple regression equation?

A

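The multiple regression equation has the same form as the simple regression equation but adds one coefficient per additional predictor: Yi = b0 + b1X1i + b2X2i + ... + bnXni. The model remains linear, although with two predictors it describes a plane rather than a straight line.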

53
Q

In a regression equation, what does b0 represent?

A

In a regression equation, b0 represents the intercept, which is the value of the Y variable when all X variables are equal to zero.

54
Q

What role do regression coefficients play in a multiple regression equation?

A

Regression coefficients (bi) quantify the effect of each predictor (Xi) on the outcome variable (Y).

55
Q

Can a multiple regression model have more than one predictor?

A

Yes, a multiple regression model can include multiple predictors.

56
Q

What is the significance of the intercept in regression analysis?

A

The intercept indicates the expected value of the dependent variable when all independent variables are zero.

57
Q

General Linear Model

A

A statistical framework that encapsulates multiple regression, involving the relationship between a dependent variable and multiple independent variables.

58
Q

Multiple Regression

A

A statistical technique used to model the relationship between one dependent variable and two or more independent variables.

59
Q

A value that represents the effect of a unit change in a predictor variable on the dependent variable in a regression analysis.

A

Regression Coefficient

60
Q

The point at which the regression line crosses the Y-axis, indicating the expected value of the dependent variable when all predictors are zero.

A

Intercept

61
Q

What dimensionality does multiple regression introduce?

A

Multiple regression introduces additional dimensions as it includes multiple predictors, visualized as a regression plane.

62
Q

The outcome variable that a researcher aims to predict or explain in a regression analysis.

A

Dependent Variable

63
Q

A variable that is hypothesized to influence or predict the dependent variable in a regression analysis.

A

Independent Variable

64
Q

How is the regression plane defined in a multiple regression model?

A

The regression plane in a multiple regression model is defined by the intercept and regression coefficients of the predictors.

65
Q

Predictors

A

Variables that are used in a regression analysis to predict the outcome of the dependent variable.

66
Q

What remains constant in the structure of multiple regression?

A

Despite the number of predictors, the model’s structure remains linear.

67
Q

A statistical method that models the relationship between two variables by fitting a linear equation to observed data.

A

Linear Regression

68
Q

Why is the General Linear Model considered a foundational aspect of statistical analysis?

A

The General Linear Model provides a comprehensive framework for analyzing and understanding relationships between variables in various types of data.

69
Q

What are the three main methods of regression?

A

Hierarchical, Forced Entry, Stepwise.

70
Q

A regression method where all predictors are entered into the model simultaneously, relying on strong theoretical reasons for variable inclusion.

A

Forced Entry Regression

71
Q

What is the purpose of Hierarchical Regression?

A

Hierarchical Regression allows the experimenter to decide the order in which known predictors are entered, thereby assessing their unique contributions to the outcome.

72
Q

A data-driven approach where predictors are selected based on their semi-partial correlation with the outcome, using mathematical criteria rather than theory.

A

Stepwise Regression

73
Q

Why is Hierarchical Regression considered the best method?

A

It is based on theory testing and allows the unique predictive influence of new variables to be assessed while holding known predictors constant.

74
Q

Variables that can take on nominal values, such as different species in a study, which can be recoded for mathematical analysis in regression.

A

Categorical Predictors

75
Q

What is a major drawback of Stepwise Regression?

A

Variable selection can hinge on slight numerical differences in semi-partial correlations, so trivial differences in the sample can produce models with very different theoretical implications.

76
Q

A statistical measure that indicates the unique contribution of a predictor to the outcome variable after controlling for other predictors.

A

Semi-partial Correlation

77
Q

How does a researcher determine the order of variables in Hierarchical Regression?

A

The researcher uses their theoretical understanding based on past research to decide which predictors to enter first.

78
Q

The portion of variance in the outcome variable that is accounted for by a specific predictor after considering other predictors in the model.

A

Unique Variance Explained

79
Q

What is the first step in Stepwise Regression when using SPSS?

A

SPSS looks for the predictor that can explain the most variance in the outcome variable.

80
Q

A variable that has two possible levels, such as having a characteristic versus not having it, which can be used in regression analysis.

A

Dichotomous Variable

81
Q

What is a critical consideration when using Forced Entry Regression?

A

The importance of having strong theoretical justifications for the inclusion of specific variables in the model.

82
Q

The process of using statistical methods, like Stepwise Regression, primarily to discover relationships in the data without strong theoretical backing.

A

Data Exploration

83
Q

What does it mean to recode categorical data for regression?

A

It involves transforming nominal variables into numerical form that can be analyzed mathematically in regression analyses.

84
Q

What is the main goal of generalizing a sample model in multiple regression?

A

The main goal is to generalize the findings from the sample model to the entire target population.

85
Q

The process of applying findings from a sample model to the entire population, provided that certain assumptions are met.

A

Generalisation

86
Q

What type of outcome variable must be present in multiple regression analysis?

A

The outcome must be continuous.

87
Q

A condition where predictors in a regression model must display variance; otherwise, no estimation can occur.

A

Non-Zero Variance

88
Q

What is the assumption regarding the linearity in multiple regression?

A

The modeled relationship must be linear; it should not be curvilinear.

89
Q

The assumption that all values of the outcome variable should come from different individuals.

A

Independence

90
Q

What does multicollinearity refer to in multiple regression?

A

Multicollinearity exists when predictors are highly correlated with each other.

91
Q

The assumption that the variance of the error term should remain constant for each value of the predictors.

A

Homoscedasticity

92
Q

What does Cook’s Distance measure in the context of multiple regression?

A

Cook’s Distance measures the influence of a single case on the overall model.

93
Q

The residuals that have been transformed into Z-scores; typically, 95% should lie between ±2 in a normal distribution.

A

Standardised Residuals

94
Q

How can we identify potential outliers in standardized residuals?

A

Any case with an absolute standardized residual value of 3 or more is considered likely to be an outlier.
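
A rough sketch of this check. As a simplifying assumption, each residual is divided by the residual standard deviation, sqrt(SSR / (n - k - 1)); statistical packages may compute standardized residuals slightly differently. The residual values are made up.

```python
from math import sqrt

def standardized_residuals(residuals, n_predictors):
    """Divide each residual by the residual standard deviation."""
    df = len(residuals) - n_predictors - 1
    residual_sd = sqrt(sum(e ** 2 for e in residuals) / df)
    return [e / residual_sd for e in residuals]

# Hypothetical residuals: sixteen small ones and one very large one.
residuals = [-0.5] * 16 + [8.0]
z = standardized_residuals(residuals, n_predictors=1)

# Indices of cases with an absolute standardized residual of 3 or more.
flagged = [i for i, zi in enumerate(z) if abs(zi) >= 3]
```

Only the last case is flagged here; its standardized residual is roughly 3.8.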

95
Q

Cases that have a disproportionate impact on the overall fit of the regression model, as measured by metrics like Cook’s Distance.

A

Influential Cases

96
Q

What should you check regarding the multivariate outliers in regression?

A

Look for extreme combinations of scores across multiple variables, not just single variable outliers.

97
Q

A measure used to identify multivariate outliers based on the distance of a case from the mean of a distribution, considering the covariance of the variables.

A

Mahalanobis Distance

98
Q

What is the typical threshold for a case’s value in Cook’s Distance to determine influence?

A

A value greater than 1 indicates that a case is likely to be influential on the model.

99
Q

The differences between observed values and the values predicted by the regression model; they represent the error in prediction.

A

Residuals

100
Q

What does it mean if predictors exhibit high correlation in a regression model?

A

It indicates the presence of multicollinearity, which can obstruct the estimation of individual predictor effects.

101
Q

An assumption that the error terms in a regression analysis should be uncorrelated for any pair of observations.

A

Independent Errors

102
Q

Why is it essential to check for no multicollinearity in multiple regression?

A

High multicollinearity can inflate standard errors and make it difficult to assess the individual contribution of predictors.

103
Q

What is regression analysis used for?

A

To examine the relationship between one dependent variable and one or more independent variables.

104
Q

What is the difference between simple and multiple regression?

A

Simple regression involves one independent variable, while multiple regression includes two or more independent variables.

105
Q

What does the regression coefficient (B) represent?

A

The amount of change in the dependent variable for a one-unit change in the independent variable, holding other variables constant.

106
Q

What is the R-squared (R²) value?

A

It indicates the proportion of variance in the dependent variable explained by the independent variables.

107
Q

What is multicollinearity, and why is it problematic in regression?

A

Multicollinearity occurs when independent variables are highly correlated, making it difficult to determine their individual effects on the dependent variable.

108
Q

What assumptions must be met for linear regression?

A

Linear relationship between variables.
Homoscedasticity (constant variance of errors).
No multicollinearity.
Normally distributed residuals.
Independence of observations.

109
Q

What is the purpose of standardizing variables in regression?

A

To compare variables with different units and scales by converting them to z-scores, making the regression coefficients comparable.
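
A short sketch of the idea, using made-up data: the slope is computed once on the raw scores and once on z-scores. The helper names are illustrative. In simple regression the standardized slope equals Pearson's r.

```python
from statistics import mean, stdev

def z_scores(values):
    """Convert raw scores to z-scores (mean 0, standard deviation 1)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def slope(x, y):
    """Least-squares slope of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

hours = [1, 2, 3, 4, 5]  # hypothetical predictor values
score = [2, 4, 5, 4, 5]  # hypothetical outcome values

b = slope(hours, score)                         # unstandardized: outcome units per predictor unit
beta = slope(z_scores(hours), z_scores(score))  # standardized: SDs of outcome per SD of predictor
```

Here b is 0.6 and beta is about 0.77.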

110
Q

What does a p-value in regression output indicate?

A

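The probability of obtaining a coefficient at least as extreme as the one observed if there were truly no relationship. A small p-value (conventionally below .05) is taken to mean that the relationship between the independent variable and the dependent variable is statistically significant.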

111
Q

What is the intercept in a regression equation?

A

The predicted value of the dependent variable when all independent variables are zero.

112
Q

How is a regression line calculated?

A

Using the equation Y = a + bX + ϵ, where Y is the dependent variable, a is the intercept, b is the regression coefficient, X is the independent variable, and ϵ is the error term.

113
Q

What is a residual in regression analysis?

A

The difference between the observed and predicted values of the dependent variable.

114
Q

What is adjusted R-squared, and how does it differ from R-squared?

A

Adjusted R-squared corrects R-squared for the number of predictors in the model. Unlike R-squared, it does not automatically increase when a predictor is added, so it gives a more conservative estimate of model fit.
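
One common formula for the adjustment is 1 - (1 - R²)(n - 1) / (n - k - 1), where n is the number of cases and k the number of predictors. A sketch with hypothetical inputs:

```python
def adjusted_r_squared(r_squared, n_cases, n_predictors):
    """Penalize R-squared for the number of predictors in the model."""
    return 1 - (1 - r_squared) * (n_cases - 1) / (n_cases - n_predictors - 1)

# Hypothetical model: R-squared of 0.6 from 5 cases and 1 predictor.
adj = adjusted_r_squared(0.6, n_cases=5, n_predictors=1)

# The same R-squared with a second predictor is penalized more heavily.
adj_two = adjusted_r_squared(0.6, n_cases=5, n_predictors=2)
```

With so few cases the penalty is large: the adjusted value is about 0.47 with one predictor and 0.2 with two.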

115
Q

What diagnostic tests can be used to check regression assumptions?

A

Scatterplots for linearity.
Residual plots for homoscedasticity.
Variance Inflation Factor (VIF) for multicollinearity.
Histogram or Q-Q plot for normality of residuals.

116
Q

What is the purpose of hierarchical regression?

A

To determine the incremental variance explained by adding new predictors to the model.

117
Q

What is a dummy variable in regression?

A

A variable coded as 0 or 1 to represent categorical data.
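
A minimal sketch of dummy coding for a predictor with more than two categories. It assumes the usual scheme of one 0/1 variable per category except a chosen reference category; the function name and the data are illustrative.

```python
def dummy_code(values, reference):
    """Create one 0/1 variable per category, omitting the reference category."""
    categories = sorted(set(values) - {reference})
    return {c: [1 if v == c else 0 for v in values] for c in categories}

species = ["cat", "dog", "bird", "dog"]  # hypothetical nominal predictor
dummies = dummy_code(species, reference="cat")
# {"bird": [0, 0, 1, 0], "dog": [0, 1, 0, 1]}
```

Cases in the reference category score 0 on every dummy variable, so each coefficient is interpreted relative to that category.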

118
Q

What is the difference between standardized and unstandardized coefficients in regression?

A

Unstandardized coefficients (B): Represent the change in the dependent variable for a one-unit change in the independent variable, measured in the original units.
Standardized coefficients (β): Represent the change in the dependent variable, in standard deviations, for a one-standard-deviation change in the independent variable, making them comparable across variables.

119
Q

What is stepwise regression?

A

A method of adding or removing predictors in a regression model based on statistical criteria, such as p-values or changes in R-squared.

120
Q

What is a regression interaction term?

A

A variable created by multiplying two predictors to test whether the effect of one predictor on the dependent variable depends on the level of another predictor.
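
A small sketch of what the product term does, assuming a model of the form Y = b0 + b1*X1 + b2*X2 + b3*(X1*X2). The function names and coefficient values are hypothetical.

```python
def add_interaction(x1, x2):
    """Build the interaction variable as the product of two predictors."""
    return [a * b for a, b in zip(x1, x2)]

def simple_slope(b1, b3, x2_value):
    """Slope of X1 at a given level of X2 in a model with an X1*X2 term."""
    return b1 + b3 * x2_value

interaction = add_interaction([1, 2, 3], [0, 1, 2])  # [0, 2, 6]

# With b1 = 2 and b3 = 0.5, the effect of X1 grows as X2 increases.
slope_low = simple_slope(2.0, 0.5, x2_value=0.0)   # 2.0
slope_high = simple_slope(2.0, 0.5, x2_value=4.0)  # 4.0
```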

121
Q

How do you test for multicollinearity in regression?

A

Use the Variance Inflation Factor (VIF):

VIF > 10 suggests high multicollinearity.
Tolerance < 0.1 also indicates multicollinearity.
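
A sketch for the simplest case of two predictors, where tolerance is 1 - r² and VIF is 1 / tolerance, with r the correlation between the two predictors. With more predictors, r² is replaced by the R² from regressing each predictor on all the others, which this sketch does not cover. The data are made up.

```python
def vif_two_predictors(x1, x2):
    """Return (tolerance, VIF) for a model with exactly two predictors."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    r_squared = s12 ** 2 / (s11 * s22)  # squared correlation between predictors
    tolerance = 1 - r_squared
    return tolerance, 1 / tolerance

x1 = [1, 2, 3, 4, 5]  # hypothetical predictor
x2 = [2, 4, 5, 4, 5]  # hypothetical predictor
tolerance, vif = vif_two_predictors(x1, x2)
```

Here tolerance is 0.4 and VIF is 2.5, well below the VIF > 10 threshold.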

122
Q

What is logistic regression, and how is it different from linear regression?

A

Logistic regression: Used for a binary dependent variable (e.g., yes/no, 0/1). The outcome is predicted as a probability using a logistic function.
Linear regression: Used for a continuous dependent variable, which is predicted with a straight line.

123
Q

What is the Akaike Information Criterion (AIC) in regression?

A

A metric used to compare models, balancing goodness-of-fit and model complexity. Lower AIC values indicate a better model.

124
Q

What is overfitting in regression models?

A

When a model performs well on the training data but poorly on new, unseen data due to being overly complex.

125
Q

How do you interpret an odds ratio in logistic regression?

A

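The odds ratio is the factor by which the odds of the outcome are multiplied for a one-unit increase in the predictor variable. Values above 1 indicate increasing odds, values below 1 indicate decreasing odds, and a value of 1 indicates no change.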
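
A minimal sketch linking the logistic function to the odds ratio. The odds ratio for a predictor is exp(b1), and the coefficients here are hypothetical values chosen so that the ratio comes out at 2.

```python
from math import exp, log

def probability(b0, b1, x):
    """Predicted probability from a one-predictor logistic model."""
    return 1 / (1 + exp(-(b0 + b1 * x)))

def odds(p):
    """Convert a probability to odds."""
    return p / (1 - p)

b0, b1 = -2.0, log(2)  # hypothetical coefficients
odds_ratio = exp(b1)   # 2: the odds double per one-unit increase

# Check: the odds at x = 3 divided by the odds at x = 2 equal the odds ratio.
ratio = odds(probability(b0, b1, 3)) / odds(probability(b0, b1, 2))
```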

126
Q

What is the difference between hierarchical and stepwise regression?

A

Hierarchical regression: Predictors are added in a specific, theory-driven order.
Stepwise regression: Predictors are added/removed based on statistical criteria without theoretical guidance.

127
Q

What is cross-validation in regression?

A

A technique to evaluate the predictive performance of a model by splitting the dataset into training and testing subsets.
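
One common variant is k-fold cross-validation, in which each case is used for testing exactly once. The sketch below only generates the index splits; fitting and scoring a model on each split is left out, and the function name is illustrative.

```python
def k_fold_indices(n_cases, k):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    # Deal case indices into k folds: 0, k, 2k, ... then 1, k+1, ... and so on.
    folds = [list(range(start, n_cases, k)) for start in range(k)]
    for i, test in enumerate(folds):
        train = sorted(
            j for f_idx, fold in enumerate(folds) if f_idx != i for j in fold
        )
        yield train, test

splits = list(k_fold_indices(6, 3))
# First split: train on cases [1, 2, 4, 5], test on cases [0, 3]
```

In practice the cases are usually shuffled before being assigned to folds; that step is omitted here to keep the output deterministic.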

128
Q

How do you interpret a negative coefficient in regression?

A

A negative coefficient indicates that as the predictor variable increases, the predicted value of the dependent variable decreases (holding any other predictors constant).

129
Q

What is the Durbin-Watson statistic?

A

A test statistic used to detect autocorrelation in the residuals of a regression model, that is, a violation of the independent-errors assumption. It ranges from 0 to 4; values close to 2 indicate no autocorrelation.
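
The statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. A sketch with made-up residuals, assumed to be in their original (for example, time) order:

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals in their original order."""
    diffs = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    return diffs / sum(e ** 2 for e in residuals)

# Residuals that flip sign every case: negative autocorrelation, d above 2.
d_alternating = durbin_watson([1, -1, 1, -1])  # 12 / 4 = 3.0

# Residuals that stay on one side for a while: positive autocorrelation, d below 2.
d_clustered = durbin_watson([1, 1, -1, -1])    # 4 / 4 = 1.0
```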

130
Q

What is ridge regression?

A

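A type of regression that adds a penalty on the size of the coefficients (the sum of their squares) to the least-squares criterion, which shrinks the estimates, stabilizes them under multicollinearity, and helps prevent overfitting.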

131
Q

When would you use a polynomial regression model?

A

When the relationship between the independent and dependent variables is non-linear.

132
Q

What is bootstrapping in regression analysis?

A

A resampling method used to estimate the sampling distribution of a statistic and to improve the reliability of regression estimates.
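
A rough sketch of a percentile bootstrap for a regression slope. Cases are resampled with replacement, the slope is recomputed each time, and the middle 95% of the resulting slopes is taken as an interval. The data, the function names, the seed, and the number of resamples are all illustrative assumptions.

```python
import random

def slope(pairs):
    """Least-squares slope for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    return sxy / sxx

def bootstrap_slope_ci(pairs, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the slope."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    slopes = []
    while len(slopes) < n_resamples:
        sample = [rng.choice(pairs) for _ in pairs]  # resample cases with replacement
        if len({x for x, _ in sample}) > 1:          # skip samples with no spread in x
            slopes.append(slope(sample))
    slopes.sort()
    lower = slopes[int((alpha / 2) * n_resamples)]
    upper = slopes[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper

# Hypothetical data: ten (predictor, outcome) pairs.
pairs = list(zip(range(1, 11), [2, 4, 5, 4, 5, 7, 8, 9, 10, 12]))
lower, upper = bootstrap_slope_ci(pairs)
```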

133
Q

What is the purpose of interaction plots in regression?

A

To visualize how the relationship between one predictor and the dependent variable changes at different levels of another predictor.

134
Q

What is the F-statistic in regression?

A

A measure of the overall significance of the regression model, testing whether the predictors collectively explain a significant portion of the variance in the dependent variable.

135
Q

What does a confidence interval for a regression coefficient tell you?

A

The range within which the true value of the coefficient is likely to fall, with a specified level of confidence (e.g., 95%).

136
Q

What is heteroscedasticity, and how can it be detected?

A

Unequal variance of residuals across levels of a predictor. It can be detected using residual plots or statistical tests like the Breusch-Pagan test.

137
Q

How do you handle missing data in regression analysis?

A

Listwise deletion (removing rows with missing data).
Imputation (filling in missing values with mean, median, or model-based estimates).
Multiple imputation (generating multiple plausible values for missing data).
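
The first two approaches can be sketched in a few lines. Missing values are represented by `None`, and the rows are made-up data. Multiple imputation is more involved and is not shown.

```python
def listwise_delete(rows):
    """Drop every row that has at least one missing value."""
    return [row for row in rows if None not in row]

def mean_impute(rows):
    """Replace each missing value with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    means = []
    for c in range(n_cols):
        observed = [row[c] for row in rows if row[c] is not None]
        means.append(sum(observed) / len(observed))
    return [
        tuple(means[c] if v is None else v for c, v in enumerate(row))
        for row in rows
    ]

rows = [(1.0, 2.0), (2.0, None), (3.0, 6.0)]  # hypothetical (X, Y) cases

complete_cases = listwise_delete(rows)  # [(1.0, 2.0), (3.0, 6.0)]
imputed = mean_impute(rows)             # [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
```

Listwise deletion reduces the sample size, while mean imputation keeps every case but shrinks the variance of the imputed variable.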
