Week 5 & 6 Regression Flashcards

1
Q

What is the primary difference between ANOVA and regression in terms of the types of studies they are used for?

A

ANOVA is typically used for experimental studies, while regression is typically used for observational studies.

2
Q

What does regression fundamentally begin with?

A

Regression begins with correlation.

3
Q

What is the relationship between correlation and causation?

A

Correlation does not imply causation.

4
Q

What is the common measure of effect used in both correlation and regression?

A

The coefficient of determination (r²), i.e. the squared correlation coefficient, which represents the proportion of variance explained.

5
Q

What can restriction of range on the independent variable lead to regarding the relationship?

A

It can underestimate the relationship.

6
Q

How can extreme cases or outliers affect linear models in regression?

A

Outliers can skew the correlation, inflating or deflating results.

7
Q

What is a potential issue with using poor or proxy measures in correlation?

A

Unreliable or proxy measures add measurement error, which may lead to an underestimate of the correlation.

8
Q

What does the strength of effect in regression correspond to in terms of r² values?

A

r² values indicate the proportion of variance explained, with higher values indicating stronger relationships.

9
Q

A statistical method for predicting the value of one variable from another, using one or more predictors.

A

Regression

10
Q

A measure that quantifies the direction and strength of a linear relationship between two variables.

A

Correlation Coefficient (r)
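
As a rough illustration of this definition, the sketch below computes Pearson's r from deviations around each mean and then squares it to get the proportion of variance explained. The data values and the function name `pearson_r` are made up for the example.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the two means
    sp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Sums of squared deviations for each variable
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    return sp / math.sqrt(ss_x * ss_y)

hours = [1, 2, 3, 4, 5]   # hypothetical predictor scores
score = [2, 4, 5, 4, 5]   # hypothetical outcome scores

r = pearson_r(hours, score)   # direction and strength of the linear relationship
r_squared = r ** 2            # proportion of variance explained
```

The sign of `r` gives the direction of the relationship, while `r_squared` is the quantity the earlier cards describe as variance explained.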

11
Q

The portion of variability in a dependent variable that can be attributed to the independent variable(s) in a regression model.

A

Variance Explained

12
Q

What is the difference between simple regression and multiple regression?

A

Simple regression uses one predictor (independent variable), while multiple regression uses two or more predictors.

13
Q

A data point that differs significantly from other observations and can substantially affect the results of statistical analysis.

A

Outlier

14
Q

The variable that is manipulated or varied in an experiment or regression analysis to assess its impact on the dependent variable.

A

Independent Variable (IV)

15
Q

Why is it beneficial to use regression over simple correlation?

A

Regression allows for prediction of the outcome variable while accounting for multiple predictors, enhancing the understanding of variable relationships.

16
Q

What does a linear relationship express in the context of regression?

A

A linear relationship is expressed as a straight line.

17
Q

What is the relationship between a line and a model in regression?

A

Your line is your model.

18
Q

What are the two fundamental features that all lines possess in regression analysis?

A

All lines have a slope and an intercept.

19
Q

What does the term ‘error’ refer to in the context of regression?

A

Error refers to the difference between your modeled line and the actual data points.

20
Q

What does b1 represent in regression analysis?

A

b1 is the regression coefficient for the predictor and represents the gradient (slope) of the regression line. Its sign gives the direction of the relationship, and its size gives the expected change in the outcome for a one-unit change in the predictor.

21
Q

What is represented by b0 in a regression equation?

A

b0 is the intercept, which is the value of Y when X = 0, marking the point where the regression line crosses the Y-axis.

22
Q

How can one estimate the outcome using multiple predictors in regression?

A

By multiplying each predictor's value by its regression coefficient, summing these products, and adding the intercept, one can estimate the outcome.
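
A minimal sketch of that calculation, assuming the intercept and coefficients have already been estimated. The function name `predict` and all numbers are hypothetical.

```python
def predict(intercept, coefficients, values):
    """Return intercept + sum(b_i * x_i) for a single case."""
    if len(coefficients) != len(values):
        raise ValueError("need exactly one predictor value per coefficient")
    return intercept + sum(b * x for b, x in zip(coefficients, values))

# Hypothetical model: b0 = 1.0, b1 = 2.0, b2 = 0.5
# Hypothetical case:  X1 = 3.0, X2 = 4.0
estimate = predict(1.0, [2.0, 0.5], [3.0, 4.0])  # 1.0 + 2.0*3.0 + 0.5*4.0
```

With a single coefficient in the list, the same function reduces to the simple regression case.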

23
Q

A statistical process for estimating the relationships among variables, allowing for the prediction of one variable based on the values of others.

A

Regression

24
Q

The gradient of a regression line, represented by b1, which indicates the direction of the relationship and how much the dependent variable changes for a one-unit change in the independent variable.

A

Slope

25
Q

The term in a regression equation, represented by b0, that gives the value of the dependent variable when all independent variables are set to zero.

A

Intercept

26
Q

The discrepancy between predicted values from the regression model and the actual observed values.

A

Error

27
Q

A variable that is used in a regression model to predict the outcome of another variable.

A

Predictor

28
Q

What is the method of least squares used for?

A

It is used to find the line of best fit for a set of data by minimizing the sum of the squares of the residuals.
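
For a single predictor, the least-squares line has a well-known closed-form solution. The sketch below applies it to made-up data; the function name `least_squares` and the data are assumptions for illustration only.

```python
def least_squares(x, y):
    """Return (b0, b1) minimising the sum of squared residuals for y = b0 + b1*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Cross-products and squared deviations around the means
    sp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sp / ss_x              # slope
    b0 = mean_y - b1 * mean_x   # intercept: the line passes through both means
    return b0, b1

x = [1, 2, 3, 4, 5]   # hypothetical predictor
y = [2, 4, 5, 4, 5]   # hypothetical outcome
b0, b1 = least_squares(x, y)
```

Any other straight line through these points would give a larger sum of squared residuals than the one defined by `b0` and `b1`.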

29
Q

What is a residual in the context of least squares?

A

A residual is the difference between the observed data and the predicted values generated by the model.

30
Q

A straight line that best represents the data points in a scatter plot, minimizing the sum of the squares of the vertical distances of the points from the line.

A

Line of best fit

31
Q

A statistical method used to measure the total variability in a dataset, often decomposed into different components such as total variability, model variability, and residual variability.

A

Sums of Squares

32
Q

How do you assess the quality of a regression model?

A

By analyzing how well the model fits the observed data through metrics like Sums of Squares, ANOVA output, and Mean Squared Error.

33
Q

Total Sum of Squares, representing total variability in the data, calculated as the variability between individual scores and the mean.

A

SST

34
Q

Sum of Squares for Residuals, indicating the variability between the actual data and the values predicted by the regression model.

A

SSR

35
Q

Sum of Squares for the Model, measuring the improvement in variability explained by fitting the regression model compared to the mean.

A

SSM

36
Q

What does a higher SSM compared to SSR indicate?

A

It suggests that the model provides better predictions than simply using the mean of the observed data.
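
The three sums of squares can be computed directly from their definitions. This is a sketch on made-up data; the coefficients 2.2 and 0.6 are assumed to be the least-squares estimates for these particular values.

```python
x = [1, 2, 3, 4, 5]   # hypothetical predictor
y = [2, 4, 5, 4, 5]   # hypothetical outcome
b0, b1 = 2.2, 0.6     # assumed least-squares estimates for these data

mean_y = sum(y) / len(y)
predicted = [b0 + b1 * xi for xi in x]

# SST: observed scores vs. the mean (total variability)
sst = sum((yi - mean_y) ** 2 for yi in y)
# SSR: observed scores vs. the model's predictions (residual variability)
ssr = sum((yi - yhat) ** 2 for yi, yhat in zip(y, predicted))
# SSM: the model's predictions vs. the mean (improvement due to the model)
ssm = sum((yhat - mean_y) ** 2 for yhat in predicted)

r_squared = ssm / sst  # proportion of total variability explained by the model
```

For a least-squares fit, `sst` equals `ssm + ssr`, so a larger `ssm` relative to `ssr` means the model improves more on the mean.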

37
Q

What does ANOVA stand for in the context of regression analysis?

A

ANOVA stands for Analysis of Variance. It is best known for testing differences between the means of several groups; in regression output it tests whether the model explains significantly more variance than using the mean alone.

38
Q

A metric that quantifies the average squared difference between the observed values and the values predicted by the model, reflecting the error of the model.

A

Mean Squared Error (MSE)

39
Q

What does the F-ratio represent in regression analysis?

A

The F-ratio is the mean square for the model (SSM divided by its degrees of freedom) divided by the mean square for the residuals (SSR divided by its degrees of freedom). A large F indicates that the model explains much more variance than it leaves unexplained.
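
A small sketch of the calculation, assuming the usual degrees of freedom: the number of predictors for the model, and the number of cases minus the number of predictors minus one for the residuals. The function name `f_ratio` and the input numbers are hypothetical.

```python
def f_ratio(ssm, ssr, n_cases, n_predictors):
    """F = mean square for the model / mean square for the residuals."""
    df_model = n_predictors
    df_residual = n_cases - n_predictors - 1
    msm = ssm / df_model      # mean square for the model
    msr = ssr / df_residual   # mean square for the residuals
    return msm / msr

# Hypothetical values: SSM = 3.6, SSR = 2.4, 5 cases, 1 predictor
f = f_ratio(ssm=3.6, ssr=2.4, n_cases=5, n_predictors=1)
```

Whether a given F is statistically significant still depends on comparing it against the F distribution with these degrees of freedom, which this sketch does not do.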

40
Q

A measure that represents the proportion of variance accounted for by the regression model, indicating the strength of the relationship between the predictors and outcome variable.

A

r² (R-squared)

41
Q

How is r² similar to Pearson Correlation?

A

In simple regression, r² is exactly the square of the Pearson correlation (r) between predictor and outcome, so both describe the proportion of variance explained. In multiple regression, R² extends this to several predictors at once.

42
Q

The measure of how much the values in a dataset differ from the mean of that dataset, reflecting the spread of the data points.

A

Variance

43
Q

What role does the residual play in evaluating the fit of the regression model?

A

The residual indicates how well the model predicts the observed data; smaller residuals signify a better fit.

44
Q

A line that best describes the relationship between independent and dependent variables, derived from the least squares method.

A

Regression Line

45
Q

What factor determines whether a regression model is preferred over the mean?

A

If the model results in significantly lower error and better predictions compared to the mean, it is considered preferred.

46
Q

The process of using a regression model to estimate future values of the dependent variable based on new values of independent variables.

A

Prediction

47
Q

What does a regression analysis output typically include?

A

It includes statistics such as R-squared, F-ratio, coefficients of the predictors, p-values, and sums of squares (SST, SSR, SSM) to evaluate model performance.

48
Q

Write up the regression equation:

A

Yi = b0 + b1Xi
outcome = intercept + slope * predictor

49
Q

What is multiple regression?

A

Multiple regression is a model to predict the value of one variable from multiple predictors, extending simple regression to include several variables.

50
Q

How does multiple regression relate to simple regression?

A

Multiple regression is a natural extension of the simple regression model, which involves predicting a variable from just one predictor.

51
Q

What type of relationship does multiple regression model?

A

Multiple regression models the hypothetical relationship between several variables.

52
Q

What is the structure of a multiple regression equation?

A

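The multiple regression equation has the same form as the simple regression equation, with one added term per extra predictor: Yi = b0 + b1X1i + b2X2i + ... + bnXni. The model is still linear, but it describes a plane (or hyperplane) rather than a single straight line.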

53
Q

In a regression equation, what does b0 represent?

A

In a regression equation, b0 represents the intercept, which is the value of the Y variable when all X variables are equal to zero.

54
Q

What role do regression coefficients play in a multiple regression equation?

A

Each regression coefficient (bi) quantifies the effect of its predictor (Xi) on the outcome variable (Y) while the other predictors are held constant.

55
Q

Can a multiple regression model have more than one predictor?

A

Yes, a multiple regression model can include multiple predictors.

56
Q

What is the significance of the intercept in regression analysis?

A

The intercept indicates the expected value of the dependent variable when all independent variables are zero.

57
Q

General Linear Model

A

A statistical framework that encapsulates multiple regression, involving the relationship between a dependent variable and multiple independent variables.

58
Q

Multiple Regression

A

A statistical technique used to model the relationship between one dependent variable and two or more independent variables.

59
Q

A value that represents the effect of a unit change in a predictor variable on the dependent variable in a regression analysis.

A

Regression Coefficient

60
Q

The point at which the regression line crosses the Y-axis, indicating the expected value of the dependent variable when all predictors are zero.

A

Intercept

61
Q

What dimensionality does multiple regression introduce?

A

Multiple regression introduces additional dimensions as it includes multiple predictors, visualized as a regression plane.

62
Q

The outcome variable that a researcher aims to predict or explain in a regression analysis.

A

Dependent Variable

63
Q

A variable that is hypothesized to influence or predict the dependent variable in a regression analysis.

A

Independent Variable

64
Q

How is the regression plane defined in a multiple regression model?

A

The regression plane in a multiple regression model is defined by the intercept and regression coefficients of the predictors.

65
Q

Predictors

A

Variables that are used in a regression analysis to predict the outcome of the dependent variable.

66
Q

What remains constant in the structure of multiple regression?

A

Despite the number of predictors, the model’s structure remains linear.

67
Q

A statistical method that models the relationship between two variables by fitting a linear equation to observed data.

A

Linear Regression

68
Q

Why is the General Linear Model considered a foundational aspect of statistical analysis?

A

The General Linear Model provides a comprehensive framework for analyzing and understanding relationships between variables in various types of data.

69
Q

What are the three main methods of regression?

A

Hierarchical, Forced Entry, Stepwise.

70
Q

A regression method where all predictors are entered into the model simultaneously, relying on strong theoretical reasons for variable inclusion.

A

Forced Entry Regression

71
Q

What is the purpose of Hierarchical Regression?

A

Hierarchical Regression allows the experimenter to decide the order in which known predictors are entered, thereby assessing their unique contributions to the outcome.

72
Q

A data-driven approach where predictors are selected based on their semi-partial correlation with the outcome, using mathematical criteria rather than theory.

A

Stepwise Regression

73
Q

Why is Hierarchical Regression considered the best method?

A

It is based on theory testing and allows the unique predictive influence of new variables to be assessed while holding known predictors constant.

74
Q

Variables that can take on nominal values, such as different species in a study, which can be recoded for mathematical analysis in regression.

A

Categorical Predictors

75
Q

What is a major drawback of Stepwise Regression?

A

Which predictors are selected may depend on slight numerical differences in semi-partial correlations, yet those small differences can lead to models with very different theoretical implications.

76
Q

A statistical measure that indicates the unique contribution of a predictor to the outcome variable after controlling for other predictors.

A

Semi-partial Correlation

77
Q

How does a researcher determine the order of variables in Hierarchical Regression?

A

The researcher uses their theoretical understanding based on past research to decide which predictors to enter first.

78
Q

The portion of variance in the outcome variable that is accounted for by a specific predictor after considering other predictors in the model.

A

Unique Variance Explained

79
Q

What is the first step in Stepwise Regression when using SPSS?

A

SPSS looks for the predictor that can explain the most variance in the outcome variable.

80
Q

A variable that has two possible levels, such as having a characteristic versus not having it, which can be used in regression analysis.

A

Dichotomous Variable

81
Q

What is a critical consideration when using Forced Entry Regression?

A

The importance of having strong theoretical justifications for the inclusion of specific variables in the model.

82
Q

The process of using statistical methods, like Stepwise Regression, primarily to discover relationships in the data without strong theoretical backing.

A

Data Exploration

83
Q

What does it mean to recode categorical data for regression?

A

It involves transforming nominal variables into numerical form that can be analyzed mathematically in regression analyses.
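
One common way to do this is dummy (0/1) coding against a reference category. The sketch below assumes that scheme; the function name `dummy_code`, the species labels, and the choice of reference category are all hypothetical.

```python
def dummy_code(values, reference):
    """Return {category: list of 0/1} for every category except the reference."""
    categories = sorted(set(values) - {reference})
    return {cat: [1 if v == cat else 0 for v in values] for cat in categories}

species = ["cat", "dog", "bird", "dog"]        # hypothetical nominal variable
coded = dummy_code(species, reference="cat")   # "cat" becomes the baseline
# coded["bird"] and coded["dog"] can now be entered as numeric predictors
```

A variable with k categories yields k - 1 dummy predictors, and cases in the reference category score 0 on all of them.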

84
Q

What is the main goal of generalizing a sample model in multiple regression?

A

The main goal is to generalize the findings from the sample model to the entire target population.

85
Q

The process of applying findings from a sample model to the entire population, provided that certain assumptions are met.

A

Generalisation

86
Q

What type of outcome variable must be present in multiple regression analysis?

A

The outcome must be continuous.

87
Q

A condition where predictors in a regression model must display variance; otherwise, no estimation can occur.

A

Non-Zero Variance

88
Q

What is the assumption regarding the linearity in multiple regression?

A

The modeled relationship must be linear; it should not be curvilinear.

89
Q

The assumption that all values of the outcome variable should come from different individuals.

A

Independence

90
Q

What does multicollinearity refer to in multiple regression?

A

Multicollinearity exists when predictors are highly correlated with each other.

91
Q

The assumption that the variance of the error term should remain constant for each value of the predictors.

A

Homoscedasticity

92
Q

What does Cook’s Distance measure in the context of multiple regression?

A

Cook’s Distance measures the influence of a single case on the overall model.

93
Q

Residuals that have been converted into z-scores; if they are normally distributed, about 95% should lie between -2 and +2.

A

Standardised Residuals

94
Q

How can we identify potential outliers in standardized residuals?

A

Any case with an absolute standardized residual value of 3 or more is considered likely to be an outlier.
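
A rough sketch of that check, assuming residuals are standardised by dividing by the residual standard deviation, sqrt(SSR / (n - k - 1)), which is one common definition. The function name and the residual values are made up.

```python
import math

def standardised_residuals(residuals, n_predictors):
    """Divide each residual by the residual standard deviation."""
    n = len(residuals)
    ss_residual = sum(e ** 2 for e in residuals)
    sd = math.sqrt(ss_residual / (n - n_predictors - 1))
    return [e / sd for e in residuals]

residuals = [-0.8, 0.6, 1.0, -0.6, -0.2]   # hypothetical raw residuals
z = standardised_residuals(residuals, n_predictors=1)

# Indices of cases whose absolute standardised residual is 3 or more
flagged = [i for i, zi in enumerate(z) if abs(zi) >= 3]
```

In this toy example no case reaches the cut-off, so `flagged` is empty.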

95
Q

Cases that have a disproportionate impact on the overall fit of the regression model, as measured by metrics like Cook’s Distance.

A

Influential Cases

96
Q

What should you check regarding the multivariate outliers in regression?

A

Look for extreme combinations of scores across multiple variables, not just single variable outliers.

97
Q

A measure used to identify multivariate outliers based on the distance of a case from the mean of a distribution, considering the covariance of the variables.

A

Mahalanobis Distance

98
Q

What is the typical threshold for a case’s value in Cook’s Distance to determine influence?

A

A value greater than 1 indicates that a case is likely to be influential on the model.
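
To make the threshold concrete, here is a sketch of Cook's Distance for a simple (one-predictor) regression, using the standard formula D = (e² / (p × MSE)) × h / (1 − h)², where h is the case's leverage and p the number of estimated parameters. The function name and data are hypothetical, and statistical packages compute this for you.

```python
def cooks_distances(x, y):
    """Cook's Distance for each case in a one-predictor least-squares fit."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / ss_x
    b0 = mean_y - b1 * mean_x
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    n_params = 2  # intercept and slope
    mse = sum(e ** 2 for e in residuals) / (n - n_params)
    distances = []
    for xi, e in zip(x, residuals):
        # Leverage: how far this case's predictor value is from the mean
        leverage = 1 / n + (xi - mean_x) ** 2 / ss_x
        distances.append((e ** 2 / (n_params * mse)) * leverage / (1 - leverage) ** 2)
    return distances

x = [1, 2, 3, 4, 5]   # hypothetical predictor
y = [2, 4, 5, 4, 5]   # hypothetical outcome
d = cooks_distances(x, y)
influential = [i for i, di in enumerate(d) if di > 1]  # indices above the threshold
```

Here only the first case exceeds 1, because it combines a sizeable residual with an extreme predictor value.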

99
Q

The differences between observed values and the values predicted by the regression model; they represent the error in prediction.

A

Residuals

100
Q

What does it mean if predictors exhibit high correlation in a regression model?

A

It indicates the presence of multicollinearity, which can obstruct the estimation of individual predictor effects.

101
Q

An assumption that the error terms in a regression analysis should be uncorrelated for any pair of observations.

A

Independent Errors

102
Q

Why is it essential to check for no multicollinearity in multiple regression?

A

High multicollinearity can inflate standard errors and make it difficult to assess the individual contribution of predictors.
