Biostat | Prefinal - Regression Analysis Flashcards
A graph that shows the relationship between two variables.
Scatter Plot
Also called a regression line; a straight line that best represents the data on a scatter plot.
LINE OF BEST FIT
REGRESSION EQUATION:
Y = bx + a
The single variable being explained by the regression model - criterion
DEPENDENT VARIABLE (Y)
The explanatory variables used to predict the dependent variable - predictors
INDEPENDENT VARIABLE (X)
The values computed by the regression tool, reflecting the relationship between each explanatory variable and the dependent variable
COEFFICIENTS (b)
The portion of the dependent variable that isn’t explained by the model.
RESIDUALS
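The cards above (regression equation, coefficients, residuals) can be tied together with a small worked example. This is a minimal sketch using the usual least-squares formulas; the data and the helper name `fit_line` are hypothetical, chosen only to keep the arithmetic easy to check.

```python
# Hypothetical data, not taken from the flashcards.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]


def fit_line(x, y):
    """Return (b, a) for the least-squares line Y = bX + a."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: co-variation of X and Y divided by the variation of X.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    # Intercept: forces the line through the point (x_bar, y_bar).
    a = y_bar - b * x_bar
    return b, a


b, a = fit_line(x, y)
predicted = [b * xi + a for xi in x]
# Residual = observed Y minus predicted Y (the part the model does not explain).
residuals = [yi - pi for yi, pi in zip(y, predicted)]

print(f"Y = {b:.2f}X + {a:.2f}")  # Y = 0.60X + 2.20
print([round(r, 2) for r in residuals])
```

With an intercept in the model, the residuals sum to zero (up to rounding), which is why individual errors can be positive or negative.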
METHODS OF REGRESSION ANALYSIS:
> Straight-line relationship
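> Form: y = mx + b, where m is the slope and b is the intercept (the same line as Y = bx + a, written with different letters)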
Linear Regression
METHODS OF REGRESSION ANALYSIS:
> Implies curved relationship
> Logarithmic relationships
Non-Linear
METHODS OF REGRESSION ANALYSIS:
> Data gathered from the same time period
Cross-Sectional
METHODS OF REGRESSION ANALYSIS:
> Involves data observed over equally spaced points in time.
Time series
> Only one independent variable, x
Relationship between x and y is described by a linear function.
Changes in y are assumed to be caused by changes in x.
SIMPLE LINEAR REGRESSION MODEL
Regression variability that is explained by the relationship between X and Y
SSR
Unexplained variability, due to factors other than the regression
SSE
CORRELATION COEFFICIENT:
> Measures the strength and direction of the linear relationship between the X and Y variables
r
Total variability about the mean
SST
COEFFICIENT OF DETERMINATION:
> Proportion of explained variation
r squared (r^2)
Standard deviation of the errors around the regression line
Standard Error
Significance of the Regression Model
TEST FOR LINEARITY
Variation of Model
Variation of Model
Errors may be positive or negative.
VARIABILITY
- Measures the total variability in Y
Sum of Squares Total (SST)
- Less than SST because the regression line reduces the unexplained variability
Sum of Squared Errors (SSE)
- Indicates how much of the total variability is explained by the regression model.
Sum of Squares due to Regression (SSR)
The proportion of the variability in Y that is explained by the regression equation.
COEFFICIENT OF DETERMINATION
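The three sums of squares and the coefficient of determination can be checked by hand on a tiny dataset. The sketch below assumes a simple linear regression fitted by least squares; the data are hypothetical.

```python
# Hypothetical data, not taken from the flashcards.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope (b) and intercept (a) for Y = bX + a.
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
a = y_bar - b * x_bar
predicted = [b * xi + a for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)                    # total variability about the mean
sse = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))   # unexplained variability
ssr = sum((pi - y_bar) ** 2 for pi in predicted)            # variability explained by the line
r_squared = ssr / sst                                       # proportion of variability explained

print(f"SST={sst:.2f} SSE={sse:.2f} SSR={ssr:.2f} r^2={r_squared:.2f}")
# SST=6.00 SSE=2.40 SSR=3.60 r^2=0.60
```

Here SST = SSR + SSE (6.00 = 3.60 + 2.40), so the regression line explains 60% of the variability in Y.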
TEST FOR LINEARITY:
If the significance level (p-value) for the F test is low...
reject the null hypothesis and conclude there is a linear relationship.
An F test is used to statistically test the null hypothesis that there is no linear relationship between the X and Y variables.
TEST FOR LINEARITY
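A rough illustration of the F test for a simple linear regression (k = 1) follows. Instead of computing a p-value, this sketch compares the F statistic with a critical value, which leads to the same decision. The data are hypothetical, and the critical value 10.13 is an assumption read from a standard F table for 1 and 3 degrees of freedom at the 0.05 level; it is not computed by the code.

```python
# Hypothetical data, not taken from the flashcards.
x = [1, 2, 3, 4, 5]
y = [1, 2, 3, 4, 6]

n = len(x)
k = 1  # number of independent variables
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope (b) and intercept (a) for Y = bX + a.
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
a = y_bar - b * x_bar
predicted = [b * xi + a for xi in x]

ssr = sum((pi - y_bar) ** 2 for pi in predicted)
sse = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))

msr = ssr / k              # mean square due to regression
mse = sse / (n - k - 1)    # mean squared error
f_stat = msr / mse

# Table value for df = (1, 3), alpha = 0.05 (looked up, not computed here).
F_CRITICAL = 10.13
reject_null = f_stat > F_CRITICAL

print(f"F = {f_stat:.1f}, reject H0 (no linear relationship): {reject_null}")
```

For these numbers F is about 108, well above the critical value, so the null hypothesis of no linear relationship would be rejected.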
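The mean squared error (MSE) is the estimate of the error variance of the regression equation; the standard error s is its square root.
s^2 = MSE = SSE / (n - k - 1), where n is the number of observations and k is the number of independent variables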
STANDARD ERROR
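As a quick numeric check of the formula on this card, the sketch below computes MSE and the standard error for a simple regression (k = 1). The data are hypothetical.

```python
import math

# Hypothetical data, not taken from the flashcards.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
k = 1  # number of independent variables
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope (b) and intercept (a) for Y = bX + a.
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
a = y_bar - b * x_bar

sse = sum((yi - (b * xi + a)) ** 2 for xi, yi in zip(x, y))
mse = sse / (n - k - 1)        # s^2, the estimate of the error variance
std_error = math.sqrt(mse)     # s, the standard deviation of the errors

print(f"MSE = {mse:.2f}, standard error = {std_error:.3f}")
# MSE = 0.80, standard error = 0.894
```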
ASSUMPTIONS OF THE REGRESSION MODEL
Errors are independent
Errors are normally distributed
Errors have a mean of zero
Errors have a constant variance
Special variables that are created for qualitative data
The number of dummy variables must equal one less than the number of categories of the qualitative variable.
BINARY/DUMMY VARIABLES
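The "one less than the number of categories" rule can be seen in a small encoding sketch. The helper `dummy_encode` and the smoking-status values are hypothetical; the category left out serves as the baseline, represented by all zeros.

```python
def dummy_encode(values, baseline):
    """Return (kept_categories, rows) with one 0/1 column per non-baseline category."""
    categories = sorted(set(values))
    # Drop the baseline so that the number of dummies is (categories - 1).
    kept = [c for c in categories if c != baseline]
    rows = [[1 if v == c else 0 for c in kept] for v in values]
    return kept, rows


# Hypothetical qualitative variable with three categories.
smoking_status = ["never", "former", "current", "never"]
kept, rows = dummy_encode(smoking_status, baseline="never")

print(kept)  # ['current', 'former']
print(rows)  # [[0, 0], [0, 1], [1, 0], [0, 0]]
```

Three categories produce two dummy variables; adding a third column for the baseline would make the columns perfectly collinear with the intercept.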