Unit 7 Flashcards
What is a model in the context of data analysis?
A model is a theoretical and simplified approximation of reality that allows it to be explained, controlled, and predicted.
What is the General Linear Model?
The General Linear Model is a family of parametric analyses that predict an outcome (dependent) variable from one or more predictor (independent) variables, assuming a linear relationship between them.
What are some common statistical methods that are based on the General Linear Model?
Correlation, Student’s t-tests, ANOVA, and linear regression are all special cases of the General Linear Model.
What is the least squares method?
The least squares method is a technique for finding the parameter estimates (intercept and slope) that minimize the sum of the squared differences between observed and predicted values (the squared residuals).
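A minimal sketch of the least-squares idea, assuming a small made-up dataset and a brute-force grid search that is used here only for illustration (in practice the estimates come from closed-form formulas). It scans candidate intercepts and slopes and keeps the pair with the smallest sum of squared residuals.

```python
# Hypothetical toy data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]


def sum_squared_residuals(b0, b1, xs, ys):
    """Sum of (observed - predicted)^2 for the line y' = b0 + b1 * x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(xs, ys))


# Brute-force search over a coarse grid of candidate intercepts and slopes
best = None
for i in range(0, 41):          # candidate b0 = 0.0, 0.1, ..., 4.0
    for j in range(0, 21):      # candidate b1 = 0.0, 0.1, ..., 2.0
        b0, b1 = i / 10, j / 10
        sse = sum_squared_residuals(b0, b1, x, y)
        if best is None or sse < best[0]:
            best = (sse, b0, b1)

min_sse, best_b0, best_b1 = best
print(f"b0 = {best_b0}, b1 = {best_b1}, SSE = {min_sse:.2f}")
```

For this toy data the search lands on b0 = 2.2 and b1 = 0.6, the same values the closed-form least-squares formulas give, because the true minimum happens to sit on the grid.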
What is a residual in the context of linear regression?
A residual is the difference between the observed and predicted values (ε = Y − Ŷ).
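A short sketch of computing residuals, assuming the same hypothetical toy data and an assumed fitted line Ŷ = 2.2 + 0.6 · X (the least-squares line for that data):

```python
# Hypothetical observed data and an assumed fitted line y' = 2.2 + 0.6 * x
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = 2.2, 0.6

predicted = [b0 + b1 * xi for xi in x]
# Residual = observed minus predicted (epsilon = Y - Y')
residuals = [yi - y_hat for yi, y_hat in zip(y, predicted)]

print([round(e, 2) for e in residuals])
```

Positive residuals mark points above the line and negative ones points below it. For a least-squares line with an intercept, the residuals sum to zero (up to floating-point rounding).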
What is the difference between a statistical model and a mathematical one?
A statistical model includes an error (residual) term that represents the discrepancy between observed and predicted values, while a mathematical model is deterministic and has no such term.
What is the goal of linear regression?
Linear regression aims to predict changes in a dependent variable (Y) based on changes in an independent variable (X).
What is the minimum number of quantitative variables required for linear regression?
Linear regression requires at least two quantitative variables that are linearly associated.
What is the difference between a simple regression model and a multiple regression model?
A simple regression model has one predictor variable, while a multiple regression model has more than one.
What is the equation of the linear regression model?
The equation of the linear regression model is Yi = β0 + β1 · Xi + εi.
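To make the error term concrete, here is a sketch that simulates data from this model. The parameter values (β0 = 1, β1 = 2, σ = 1), the sample size of 200, and the normally distributed errors are assumptions chosen only for illustration.

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# Assumed "true" parameters, chosen only for illustration
beta0, beta1, sigma = 1.0, 2.0, 1.0

x = [random.uniform(0, 10) for _ in range(200)]
# Deterministic part of the model: beta0 + beta1 * X
y_line = [beta0 + beta1 * xi for xi in x]
# Random error term epsilon_i ~ N(0, sigma^2)
errors = [random.gauss(0, sigma) for _ in x]
# Statistical model: Y_i = beta0 + beta1 * X_i + epsilon_i
y = [yl + e for yl, e in zip(y_line, errors)]
```

Dropping `errors` would turn this into a purely mathematical model in which every point lies exactly on the line.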
What does Y’ represent in the equation of the regression line?
Y’ (or Ŷ) represents the predicted value of the outcome (dependent) variable.
What does b0 represent in the regression line equation?
b0 represents the intercept (the constant), which is the predicted value of Y when X is zero. It can also be read as the part of Y that does not depend on X.
What does b1 represent in the regression line equation?
b1 represents the slope (regression coefficient), which indicates how much the predicted value of Y changes when X increases by one unit.
What is the formula to calculate the slope (b1)?
The slope (b1) is calculated as b1 = SXY / S²X, where SXY is the covariance between X and Y and S²X is the variance of X.
How is the intercept (b0) calculated?
The intercept (b0) is calculated as b0 = Ȳ − b1 · X̄.
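Both formulas can be sketched in Python as follows. The function name `fit_line` and the toy data are hypothetical; the sketch assumes the sample covariance and variance both use the n − 1 denominator, which cancels in the ratio, so dividing both by n instead gives the same slope.

```python
from statistics import mean


def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line y' = b0 + b1 * x."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    # Sample covariance of X and Y (S_XY) and sample variance of X (S²_X)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys)) / (n - 1)
    s2_x = sum((xi - x_bar) ** 2 for xi in xs) / (n - 1)
    b1 = s_xy / s2_x            # slope
    b0 = y_bar - b1 * x_bar     # intercept: the line passes through (x̄, ȳ)
    return b0, b1


b0, b1 = fit_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(b0, b1)
```

For this toy data the sketch returns b1 = 0.6 and b0 ≈ 2.2 (up to floating-point rounding). On Python 3.10+, `statistics.linear_regression(x, y)` offers a built-in cross-check.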
What is the process of fitting the regression line?
Fitting the regression line means determining the line that best describes the data, that is, calculating its slope and intercept (usually by the least squares method).
How can you use a regression equation to predict values?
To predict a value of Y, substitute the value of the independent variable (X) into the equation and compute the predicted value of Y (Ŷ).
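A minimal prediction sketch, assuming the coefficients b0 = 2.2 and b1 = 0.6 from a previously fitted toy line; the helper name `predict` is hypothetical.

```python
def predict(b0, b1, x_new):
    """Predicted value Y' = b0 + b1 * X for a new value of X."""
    return b0 + b1 * x_new


# Assumed coefficients from a previously fitted toy line y' = 2.2 + 0.6x
b0, b1 = 2.2, 0.6
y_pred = predict(b0, b1, 3.5)
print(round(y_pred, 2))
```

Here X = 3.5 gives Ŷ = 2.2 + 0.6 · 3.5 = 4.3. Predicting for X values outside the range used to fit the line is extrapolation, which the linear model may not support.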
What does the coefficient of determination (R²) indicate?
The coefficient of determination (R²) indicates the proportion of the variability in Y that can be explained by the variability in X.
How is the coefficient of determination (R²) calculated?
In simple linear regression, the coefficient of determination (R²) is calculated by squaring the Pearson correlation coefficient between X and Y: R² = r²xy.
What is the range of values for the coefficient of determination (R²)?
The coefficient of determination (R²) ranges from 0 to 1.
What does an R² value of 0 indicate?
An R² value of 0 means that the model explains none of the variability in Y.
What does an R² value of 1 indicate?
An R² value of 1 means that the model explains 100% of the variability in Y (every observation lies exactly on the regression line).
What does the value (1 – R²) represent?
The value (1 – R²) represents the proportion of the variability in Y that the model does not explain, which is attributable to variables not included in the model and to random error.
Which test statistic is used to test the overall regression model?
The F statistic from an ANOVA (Snedecor’s F) is used as the test statistic for the overall significance of the regression model.
What is the null hypothesis (H0) in hypothesis testing in the regression model?
The null hypothesis (H0) is that the model does not fit the data (i.e., all slopes are equal to zero): H0: β1 = β2 = ⋯ = βk = 0.
What is the alternative hypothesis (H1) in hypothesis testing in the regression model?
The alternative hypothesis (H1) is that the model fits the data (i.e., at least one slope is not equal to zero): H1: βj ≠ 0 for at least one j.
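A sketch of the overall F test for a simple regression (k = 1 predictor), assuming the same hypothetical toy data and its least-squares coefficients. F is the ratio of the mean square explained by the model to the residual mean square: F = (SSregression / k) / (SSresidual / (n − k − 1)).

```python
from statistics import mean

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = 2.2, 0.6   # least-squares fit for this toy data
k = 1               # number of predictors
n = len(y)

y_bar = mean(y)
predicted = [b0 + b1 * xi for xi in x]
ss_regression = sum((yh - y_bar) ** 2 for yh in predicted)         # explained
ss_residual = sum((yi - yh) ** 2 for yi, yh in zip(y, predicted))  # unexplained

ms_regression = ss_regression / k
ms_residual = ss_residual / (n - k - 1)
f_stat = ms_regression / ms_residual
print(f"F({k}, {n - k - 1}) = {f_stat:.2f}")
```

For this data F(1, 3) = 4.50. The statistic is compared with the critical F value for (k, n − k − 1) degrees of freedom; if SciPy is available, `scipy.stats.f.sf(f_stat, k, n - k - 1)` returns the corresponding p-value.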
Which test statistic is used to test the effect of an individual predictor X on Y?
Student’s t statistic is used to test whether the slope of X differs from zero, that is, whether X has a significant effect on Y.
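A sketch of the t test for the slope in simple regression, assuming the same hypothetical toy data and coefficients. It uses t = b1 / SE(b1), with SE(b1) = √(MSresidual / Σ(X − X̄)²) and n − 2 degrees of freedom.

```python
import math
from statistics import mean

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = 2.2, 0.6   # least-squares fit for this toy data
n = len(x)

x_bar = mean(x)
ss_x = sum((xi - x_bar) ** 2 for xi in x)
ss_residual = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
ms_residual = ss_residual / (n - 2)       # residual variance estimate, df = n - 2

se_b1 = math.sqrt(ms_residual / ss_x)     # standard error of the slope
t_stat = (b1 - 0) / se_b1                 # tests H0: beta1 = 0
print(f"t({n - 2}) = {t_stat:.3f}")
```

For this data t(3) ≈ 2.121. With a single predictor, t² equals the F statistic of the overall model, so both tests lead to the same conclusion.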
What is the difference between simple regression and multiple regression?
Simple regression predicts Y from a single X variable, while multiple regression predicts Y from two or more X variables.
What are some of the requirements/assumptions of regression models?
Regression models assume a) homoscedasticity (constant variance of the residuals across values of X), b) normally distributed residuals, c) independence of errors, and d) absence of multicollinearity (in multiple regression).
What does multicollinearity refer to?
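Multicollinearity refers to a strong linear relationship (high correlation) among the predictor (independent) variables. Regression models assume it is absent, because highly correlated predictors make it difficult to separate each one's effect on Y.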