Chapter 3 Flashcards
Linear Regression Predicts
value of a variable based on the value of another variable
How do you know you're dealing with linear regression?
- Outcome variable
- Predictor variable
multiple regression
two or more predictor variables
Linear Equation
Y = bX + a
In linear equations,
X and Y are variables
a and b are fixed constants
The regression analysis in a linear equation is how we get
the values of the fixed constants a and b
In a linear equation, b is
the slope: how much Y changes when X is increased by 1 point
In a linear equation, a is
the Y-intercept: the value of Y when X = 0
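As a quick sketch, the linear equation can be written as a one-line Python function; the slope and intercept values here are hypothetical, chosen only for illustration:

```python
# Hypothetical constants for Y = bX + a
b = 2.0   # slope: Y changes by 2 when X increases by 1 point
a = 5.0   # Y-intercept: the value of Y when X = 0

def predict(x):
    """Return the Y value on the line for a given X."""
    return b * x + a

print(predict(0))   # 5.0 -> the Y-intercept
print(predict(3))   # 11.0
```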
Regression is
a method of finding an equation describing the best-fitting line for a set of data.
The best fit line for the actual data is one that
minimizes prediction errors
y-hat is
the value of Y predicted by the regression equation
(Y- Y hat) is
Error of prediction
Minimizing the sum of the squared errors, Σ(Y − Ŷ)², is a method called
the least-squared-error solution
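A minimal sketch of the least-squared-error solution on hypothetical data, using the standard simple-regression formulas b = SP / SS_X and a = M_Y − b·M_X:

```python
# Hypothetical predictor (X) and outcome (Y) scores
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

mx = sum(xs) / len(xs)                                  # mean of X
my = sum(ys) / len(ys)                                  # mean of Y
sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))   # sum of products
ss_x = sum((x - mx) ** 2 for x in xs)                   # sum of squares of X

b = sp / ss_x       # slope of the best-fitting line
a = my - b * mx     # Y-intercept of the best-fitting line
print(round(b, 2), round(a, 2))   # 0.6 2.2 for this sample
```

These are the values of b and a that minimize the total squared prediction error Σ(Y − Ŷ)².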
Using Regression for Prediction
be cautious when interpreting predicted values
When using regression for prediction
do not use the regression equation to make predictions outside the existing range of X values
When it comes to using regression for prediction, you can only
predict within the existing range of X values; the regression equation may change outside that range
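One way to enforce that caution in code is a guard against extrapolation; the slope, intercept, and X range below are hypothetical values for illustration:

```python
# Hypothetical regression results and the observed range of X
b, a = 0.6, 2.2
x_min, x_max = 1, 5

def predict_safe(x):
    """Predict Y, refusing to extrapolate outside the observed X range."""
    if not (x_min <= x <= x_max):
        raise ValueError("x is outside the observed range of X values; "
                         "the regression equation may change out there")
    return b * x + a

print(round(predict_safe(3), 2))   # 4.0 -> inside the range, safe to predict
```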
To test the regression significance, you use
Analysis of regression
Analysis of Regression
H0: the slope of the regression line (b) is zero
H1: at least one predictor has a slope (b) significantly different from zero.
The ANOVA table tells you whether the regression equation (model) is significant.
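For a simple one-predictor model, the analysis of regression can be sketched from sums of squares; the data are hypothetical, and the F-ratio is MS_regression / MS_residual with df = (1, n − 2):

```python
# Hypothetical predictor (X) and outcome (Y) scores
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

mx, my = sum(xs) / n, sum(ys) / n
sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
ss_x = sum((x - mx) ** 2 for x in xs)
ss_y = sum((y - my) ** 2 for y in ys)

r_squared = sp ** 2 / (ss_x * ss_y)     # proportion of explained variability
ss_reg = r_squared * ss_y               # SS explained by the regression
ss_res = ss_y - ss_reg                  # SS left unexplained (residual)
f = (ss_reg / 1) / (ss_res / (n - 2))   # F-ratio, df = (1, n - 2)
print(round(f, 2))                      # compare with the critical F to test H0
```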
Multiple Regression Assumptions
- Must be a linear relationship between the outcome and each predictor
- Homoscedasticity
- Residuals (errors) of the regression line approximately normally distributed
- No multicollinearity
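The multicollinearity assumption can be screened by correlating the predictors with each other; the data and the rough |r| > .80 warning threshold below are illustrative assumptions, not from the chapter:

```python
import math

# Hypothetical scores on two predictor variables
x1 = [2, 4, 6, 8, 10]
x2 = [1, 2, 2, 3, 4]

def pearson_r(xs, ys):
    """Pearson r: covariability of X and Y over their separate variability."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

r = pearson_r(x1, x2)
print(round(r, 2))   # a very high |r| between predictors flags multicollinearity
```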
To check for a linear relationship, use
a scatter plot matrix
In SPSS: Graphs → Legacy Dialogs → Scatter/Dot…
When interpreting results for a multiple linear regression, report
- Type of test (multiple linear regression)
- Predictor & Outcome variables
- Regression line equation; whether the model was statistically significant (report F-test/ANOVA)
- Which predictors were significant (slope/beta/B)
- R^2
Linear regression is the follow-up to correlation that
predicts the value of a variable based on the value of another variable
- Outcome variable
- Predictor variable
It uses the equation Y = bX + a; with two or more predictors, it becomes multiple regression
Introduction to Correlation
Measures and describes the relationship between two variables
- no manipulation
- must be measured on interval/ratio scale
Characteristics of relationship
- Direction (negative or positive; indicated by the sign, + or – of the correlation coefficient)
- Form (linear is most common)
- Strength or consistency (varies from 0 to 1)
In a Correlation, the direction can be
positive: both variables move in the same direction
negative: variables move in opposite directions
In a Correlation, the strength can be
- The closer to 0, the weaker the correlation
- The closer to −1 or +1, the stronger the correlation
Examples of Correlations
- The relationship between income and happiness.
- The relationship between stress levels and hours worked per week.
- The relationship between family income and student GPA.
What does the Pearson Correlation do?
Measures the degree and the direction of the linear relationship between two variables
- Not appropriate for curvilinear relationships
Perfect linear relationship
- Every change in X has a corresponding change in Y
- Correlation will be –1.00 or +1.00
The Pearson Correlation equation
r = (covariability of X and Y) / (variability of X and Y separately)
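That ratio can be computed directly on hypothetical scores, with SP as the covariability term and SS_X, SS_Y as the separate variability terms:

```python
import math

# Hypothetical paired scores
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))   # covariability of X and Y
ss_x = sum((x - mx) ** 2 for x in xs)                   # variability of X
ss_y = sum((y - my) ** 2 for y in ys)                   # variability of Y

r = sp / math.sqrt(ss_x * ss_y)
print(round(r, 2))   # 0.77 -> a fairly strong positive correlation
```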
Correlations used for:
- Prediction
- Validity
- Reliability
- Theory verification
Interpreting Correlations
- Correlation does NOT equal causation
- Establishing causation requires an experiment where one variable is manipulated and others carefully controlled.
- Beware the dangers of interpreting a correlation as causation.
Correlations & Restricted Range of Scores
- Correlation value is affected by the range of scores in the data
- A severely restricted range may provide a very different correlation than a broader range of scores
- Floor & ceiling effects
- Never generalize a correlation beyond the sample range of data
Correlations and Outliers
An outlier is a score much larger (or smaller) than the other scores in the sample
Outliers have a disproportionately large impact on the correlation coefficient
Clearly recognizable in a scatter plot
Correlation and Strength of Relationships
A correlation coefficient measures the degree of relationship on a scale from 0 to ±1.00
- ±1.00 means 100% predictability
It is easy to mistakenly interpret this decimal number as a percent or proportion
- Correlation is not a proportion
The squared correlation (r²)
may be interpreted as the proportion of shared variability
and is called the coefficient of determination
Coefficient of determination measures
proportion of variability in one variable that can be determined from the relationship with the other variable (shared variability)
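A tiny sketch of why r itself is not a proportion, using a hypothetical correlation value:

```python
r = 0.50            # hypothetical correlation coefficient
r_squared = r ** 2  # coefficient of determination
print(r_squared)    # 0.25 -> 25% shared variability, not 50%
```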
Hypothesis Tests With Pearson Correlation
Pearson's r can be used in hypothesis testing to determine whether r is statistically significant.
H0: There is no relationship between the two variables.
H1: There is a relationship between the two variables. (nondirectional/two-tailed)
H1: There is a negative relationship between the two variables. (directional/one-tailed)
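The significance test for r is commonly carried out by converting r to a t statistic, t = r·√(n − 2) / √(1 − r²); the r and n values below are hypothetical:

```python
import math

r, n = 0.77, 30   # hypothetical sample correlation and sample size
df = n - 2        # degrees of freedom for the test

# t statistic for H0: no relationship between the two variables
t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
print(df, round(t, 2))   # compare t with the critical value at df = n - 2
```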
Interpreting results for a Correlations test
Report
- Statistical significance
- Effect size (weak, moderate, strong)
- Practical Significance (discussion)
Test results
- Type of test and variables
- Value of correlation (sign & value)
- df (n-2)
- p-value/significance level