Unit 7 Flashcards

1
Q

What is a model in the context of data analysis?

A

A model is a theoretical and simplified approximation of reality that allows it to be explained, controlled, and predicted.

2
Q

What is the General Linear Model?

A

The General Linear Model is a set of parametric analyses that aim to predict a variable based on one or more variables, assuming a linear relationship between them.

3
Q

What are some common statistical methods that are based on the General Linear Model?

A

Correlation, Student’s t-tests, ANOVA, and Linear regression are all variations of the General Linear Model.

4
Q

What is the least squares method?

A

The least squares method is a technique for estimating model parameters by minimizing the sum of the squared differences between observed and predicted values (the residuals).
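
As a rough illustration of the idea, the sketch below compares the sum of squared residuals for two candidate lines on a small made-up dataset. The data, the function name `sum_squared_residuals`, and the second candidate line are assumptions for illustration only; the first line (b0 = 2.2, b1 = 0.6) is the least squares solution for this particular data.

```python
def sum_squared_residuals(xs, ys, b0, b1):
    """Sum of squared differences between observed y and predicted b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, chosen only to keep the arithmetic easy to follow.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Least squares line for this data versus a nearby alternative line.
sse_fit = sum_squared_residuals(xs, ys, b0=2.2, b1=0.6)
sse_other = sum_squared_residuals(xs, ys, b0=2.0, b1=0.7)

print(sse_fit, sse_other)
```

The least squares line gives the smaller total (about 2.4 versus about 2.55 here); no other straight line produces a smaller sum for this data.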

5
Q

What is a residual in the context of linear regression?

A

A residual is the difference between an observed value and the value predicted by the model (ε = Y − Ŷ).

6
Q

What is the difference between a statistical model and a mathematical one?

A

A statistical model includes a term for the error (residual) that arises when making a prediction, while a mathematical model is deterministic and has no error term.

7
Q

What is the goal of linear regression?

A

Linear regression aims to predict changes in a dependent variable (Y) based on changes in an independent variable (X).

8
Q

What is the minimum number of quantitative variables required for linear regression?

A

Linear regression requires at least two quantitative variables that are linearly associated.

9
Q

What is the difference between a simple regression model and a multiple regression model?

A

A simple regression model has one predictor variable, while a multiple regression model has more than one.

10
Q

What is the equation of the linear regression model?

A

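The equation of the linear regression model is Yi = β0 + β1 · Xi + εi, and the fitted regression line is Ŷi = b0 + b1 · Xi. The error term ε belongs to the model for the observed Y, not to the predicted value Ŷ.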

11
Q

What does Y’ represent in the equation of the regression line?

A

Y’ (or Ŷ) represents the predicted value of the outcome (dependent) variable.

12
Q

What does b0 represent in the regression line equation?

A

b0 represents the intercept (the constant at the origin), which is the predicted value of Y when X is zero. It is the part of Y that does not depend on X.

13
Q

What does b1 represent in the regression line equation?

A

b1 represents the slope (regression coefficient), which indicates how much Y changes when X changes by one unit.

14
Q

What is the formula to calculate the slope (b1)?

A

The slope (b1) is calculated as b1 = S_XY / S²_X, that is, the covariance of X and Y divided by the variance of X.

15
Q

How is the intercept (b0) calculated?

A

The intercept (b0) is calculated as b0 = Ȳ − b1 · X̄, where Ȳ and X̄ are the sample means of Y and X.
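
The slope and intercept formulas can be sketched in a few lines of plain Python. The function name `fit_line` and the data are assumptions made up for illustration; the covariance and variance both use the n − 1 denominator, which cancels in the ratio.

```python
def fit_line(xs, ys):
    """Return (b0, b1) for the least squares line through paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample covariance of X and Y, and sample variance of X.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    s2_x = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    b1 = s_xy / s2_x            # slope: covariance / variance of X
    b0 = mean_y - b1 * mean_x   # intercept: mean of Y minus slope * mean of X
    return b0, b1

# Hypothetical data for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = fit_line(xs, ys)
print(b0, b1)
```

For this data the covariance is 1.5 and the variance of X is 2.5, so the slope is 0.6 and the intercept is 4 − 0.6 · 3 = 2.2 (up to floating-point rounding).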

16
Q

What is the process of fitting the regression line?

A

Fitting the regression line means estimating its slope and intercept from the data; those two values define the line.

17
Q

How can you use a regression equation to predict values?

A

To predict a value of Y, substitute the value of the independent variable (X) into the equation and compute the predicted value (Ŷ).
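
A minimal sketch of that substitution, assuming a line has already been fitted. The coefficients (b0 = 2.2, b1 = 0.6) and the function name `predict` are hypothetical values chosen for illustration.

```python
def predict(x, b0, b1):
    """Return the predicted value Y-hat = b0 + b1 * x."""
    return b0 + b1 * x

# Hypothetical fitted coefficients.
b0, b1 = 2.2, 0.6

# Substitute X = 4 into the equation to obtain the predicted Y.
y_hat = predict(4, b0, b1)
print(y_hat)
```

Here the prediction is 2.2 + 0.6 · 4, or about 4.6.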

18
Q

What does the coefficient of determination (R²) indicate?

A

The coefficient of determination (R²) indicates the proportion of the variability in Y that can be explained by the variability in X.

19
Q

How is the coefficient of determination (R²) calculated?

A

The coefficient of determination (R²) is calculated by squaring the correlation coefficient (r): R² = r²_XY. This shortcut applies to simple regression with a single predictor.
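
To see why squaring r works in simple regression, the sketch below computes R² two ways on made-up data: from the correlation coefficient, and as one minus the ratio of residual to total sum of squares. The data and variable names are assumptions for illustration.

```python
import math

# Hypothetical data for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)  # total sum of squares of Y

# Route 1: square the Pearson correlation coefficient.
r = sxy / math.sqrt(sxx * syy)
r_squared_from_r = r ** 2

# Route 2: fit the line, then compare residual and total variability.
b1 = sxy / sxx
b0 = mean_y - b1 * mean_x
ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
r_squared_from_fit = 1 - ss_res / syy

print(r_squared_from_r, r_squared_from_fit)
```

Both routes give about 0.6 for this data, meaning the line accounts for roughly 60% of the variability in Y.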

20
Q

What is the range of values for the coefficient of determination (R²)?

A

The coefficient of determination (R²) ranges from 0 to 1.

21
Q

What does an R² value of 0 indicate?

A

An R² value of 0 means that the model explains none of the variability in Y.

22
Q

What does an R² value of 1 indicate?

A

An R² value of 1 means that the model explains 100% of the variability in Y.

23
Q

What does the value (1 – R²) represent?

A

The value (1 – R²) represents the proportion of the variability in Y that the model does not explain; it is attributed to variables not included in the model and to random error.

24
Q

Which test statistic is used in hypothesis testing in a regression model?

A

The F statistic (Snedecor’s F) from the ANOVA is used as the test statistic in hypothesis testing in a regression model.

25
Q

What is the null hypothesis (H0) in hypothesis testing in the regression model?

A

The null hypothesis (H0) is that the model does not fit the data (i.e., all slopes are equal to zero). H0: β1 = β2 = ⋯ = βk = 0.

26
Q

What is the alternative hypothesis (H1) in hypothesis testing in the regression model?

A

The alternative hypothesis (H1) is that the model fits the data (i.e., at least one slope is not equal to zero). H1: βj ≠ 0 for at least one j.
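
As a sketch of how these hypotheses are tested, the overall F statistic can be computed from R² with the standard formula F = (R² / k) / ((1 − R²) / (n − k − 1)), where k is the number of predictors and n the sample size. The cards do not state this formula; the function name `f_statistic` and the input values are assumptions for illustration.

```python
def f_statistic(r_squared, n, k):
    """Overall F statistic for a regression with k predictors and n cases."""
    explained = r_squared / k                      # explained share per predictor
    unexplained = (1 - r_squared) / (n - k - 1)    # unexplained share per residual df
    return explained / unexplained

# Hypothetical values: R² = 0.6 from one predictor and five observations.
f_value = f_statistic(0.6, n=5, k=1)
print(f_value)
```

The result (about 4.5 here) is compared with the critical value of the F distribution with k and n − k − 1 degrees of freedom; H0 is rejected when the statistic exceeds that critical value.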

27
Q

Which statistical test is used to test the impact of X on Y?

A

Student’s t statistic is used to test whether the slope for an individual X differs from zero, that is, whether X has an impact on Y.
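
A minimal sketch of that t statistic for simple regression, using the standard formula t = b1 / SE(b1) with SE(b1) = sqrt(MSE / Σ(X − X̄)²) and MSE = SSE / (n − 2). The cards do not give these formulas; the data and the function name `slope_t_statistic` are assumptions for illustration.

```python
import math

def slope_t_statistic(xs, ys):
    """t statistic for H0: slope = 0 in a simple linear regression."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    # Mean squared error of the residuals, with n - 2 degrees of freedom.
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    mse = sse / (n - 2)
    se_b1 = math.sqrt(mse / sxx)   # standard error of the slope
    return b1 / se_b1

# Hypothetical data for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
t_value = slope_t_statistic(xs, ys)
print(t_value)
```

The statistic is compared with a t distribution with n − 2 degrees of freedom. With a single predictor, t² equals the overall F statistic, so both tests lead to the same conclusion.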

28
Q

What is the difference between simple regression and multiple regression?

A

Simple regression predicts Y from one X, while multiple regression predicts Y from two or more X variables.

29
Q

What are some of the requirements/assumptions of regression models?

A

Regression models assume a) homoscedasticity, b) normal distribution of residuals, c) independence of errors, and d) absence of multicollinearity in multiple regression.

30
Q

What does multicollinearity refer to?

A

Multicollinearity refers to the presence of a strong relationship among the predictor (independent) variables.
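
One simple way to screen for multicollinearity is to check the correlations among the predictors before fitting the model. The sketch below is an illustration under assumptions: the three predictors are made-up data, and `pearson_r` is a hypothetical helper, not a library function.

```python
import math

def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical predictors: x2 is nearly a multiple of x1, x3 is not.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 4, 6, 8, 11]
x3 = [3, 1, 4, 1, 5]

r_12 = pearson_r(x1, x2)  # close to 1: x1 and x2 carry almost the same information
r_13 = pearson_r(x1, x3)  # much weaker relationship
print(r_12, r_13)
```

A correlation near ±1 between two predictors, as with x1 and x2 here, signals that including both would violate the absence-of-multicollinearity assumption.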