Correlation and Regression Flashcards
relationship between variables
correlation
r = 0 to +1
positive correlation
r = 0 to -1
negative correlation
height = 50.75 + 0.9741 (femur)
what is the b
0.9741
height = 50.75 + 0.9741 (femur)
what is the a
50.75
height = 50.75 + 0.9741 (femur)
what is the x
femur
height = 50.75 + 0.9741 (femur)
what is the y
height
height = 50.75 + 0.9741 (femur)
what does the slope tell us
the model predicts that each additional 1-unit increase in femur length is associated with a 0.9741-unit increase in height
height = 50.75 + 0.9741 (femur)
what is the y intercept
50.75
height = 50.75 + 0.9741 (femur)
what does 50.75 mean
when the femur length is 0, the model predicts a height of 50.75
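The femur cards above can be checked with a short Python sketch. The function name `predict_height` and the femur length of 40 are illustrative assumptions, not values from the cards, and the cards do not state units.

```python
# Coefficients copied from the card: height = 50.75 + 0.9741 (femur)
INTERCEPT = 50.75   # predicted height when femur length is 0
SLOPE = 0.9741      # change in predicted height per 1-unit change in femur

def predict_height(femur):
    """Return the height predicted by the fitted line for a femur length."""
    return INTERCEPT + SLOPE * femur

# Hypothetical femur length, for illustration only
example = predict_height(40)

# Moving femur length up by one unit changes the prediction by the slope
step = predict_height(41) - predict_height(40)
```

The difference `step` equals the slope no matter which starting femur length is chosen, which is what the slope card describes.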
A measure of association between two numerical variables.
correlation
Typically, in the summer as the temperature increases people are thirstier.
what type of correlation
positive
measures the direction and the strength of the linear association between two numerical paired variables.
Pearson’s sample correlation coefficient r
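As a rough illustration of how r can be computed, here is a minimal Python sketch using the usual sums-of-deviations formula. The function name `pearson_r` and the data are made up for this example.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient for paired numeric data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    # Undefined (division by zero) if either variable is constant
    return sxy / math.sqrt(sxx * syy)

# Made-up paired data, for illustration only
temps = [60, 70, 80, 90]
rises_with_x = [10, 20, 30, 40]
falls_with_x = [40, 30, 20, 10]

r_up = pearson_r(temps, rises_with_x)      # perfectly linear, increasing
r_down = pearson_r(temps, falls_with_x)    # perfectly linear, decreasing
```

The sign of the result gives the direction of the linear association and its distance from 0 gives the strength.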
as the x variable increases so does the y variable
positive correlation
as the x variable increases, the y variable decreases.
negative correlation
As the price of an item increases, the number of items sold decreases.
what kind of correlation
negative
r value interpretation
1
perfect positive linear relationship
r value interpretation
0
no linear relationship
r value interpretation
-1
perfect negative linear relationship
The strength of the linear association is measured by the
sample correlation coefficient r
r value of
0.9
strong association
r value of
0.5
moderate association
r value of
0.25
weak association
Specific statistical methods for finding the “line of best fit” for one response (dependent) numerical variable based on one or more explanatory (independent) variables.
regression
Includes using statistical methods to assess the “goodness of fit” of the model. (ex. Correlation Coefficient)
regression
3 main purposes of regression
to describe
to predict
to control
model a set of data with one dependent variable and one (or more) independent variables
what purpose of regression
to describe
estimate the values of the dependent variable based on given value(s) of the independent variable(s).
what function of regression
to predict
administer standards from a useable statistical relationship
what purpose of regression
to control
Statistical method for finding
the “line of best fit”
for one response (dependent) numerical variable
based on one explanatory (independent) variable.
simple linear regression
what is b
y = a + bx
slope
what is a
y = a + bx
y intercept
what is r
y = a + bx
correlation coefficient
what is r^2
y = a + bx
coefficient of determination
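For the y = a + bx cards above, a minimal sketch of how a and b could be estimated by ordinary least squares. The function name `fit_line` and the data are assumptions made for this example; statistical software would normally do this step.

```python
def fit_line(xs, ys):
    """Least-squares estimates (a, b) for the line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx                # slope
    a = mean_y - b * mean_x      # y-intercept
    return a, b

# Made-up data lying exactly on y = 1 + 2x
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
a, b = fit_line(xs, ys)
```

Because the sample points fall exactly on a line, the fit recovers that line's intercept and slope.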
y=1.5*x - 96.9
1.5 oz of water drunk
1 degree F increase in temp
what is the slope
for each 1 degree F increase in temperature, you expect an increase of 1.5 ounces of water drunk.
y=1.5*x - 96.9
1.5 oz of water drunk
1 degree F increase in temp
what is the y-intercept
when the temperature is 0 degrees F, the model predicts that the person would drink about -97 oz of water, which is not physically meaningful
y=1.5*x - 96.9
1.5 oz of water drunk
1 degree F increase in temp
predict the amount of water when the temp is 95
45.6 oz
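The water cards above reduce to plugging a temperature into the fitted line. A small Python sketch, where the function name `predict_water_oz` is an assumption made for this example:

```python
def predict_water_oz(temp_f):
    """Predicted ounces of water from the card's line y = 1.5*x - 96.9."""
    return 1.5 * temp_f - 96.9

at_95 = predict_water_oz(95)   # the prediction asked for on the card
at_0 = predict_water_oz(0)     # the y-intercept
```

Here `at_95` works out to about 45.6 oz, matching the card, and `at_0` reproduces the negative intercept.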
tells the percent of the variation in the response variable that is explained (determined) by the model and the explanatory variable.
coefficient of determination
coefficient of determination tells us
the percent of the variation in the response variable that is explained (determined) by the model and the explanatory variable.
r^2 = 92.7%
what does it mean?
About 93% of the variability in the amount of water consumed is explained by the outside temperature using this model
Therefore, 7% of the variation in the amount of water consumed is not explained by this model using temperature
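One way to see where r^2 comes from is to compare the leftover (residual) variation with the total variation in y. The sketch below is an illustration only; the function name `r_squared` and the data are made up and are not the water data from the cards.

```python
def r_squared(xs, ys):
    """Share of the variation in ys explained by a least-squares line on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Variation left over after fitting the line
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    # Total variation of y around its mean
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1 - sse / sst

# Made-up data, for illustration only
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 9]
explained = r_squared(xs, ys)
unexplained = 1 - explained
```

The two shares always add up to 1, which is why an r^2 near 93% leaves about 7% of the variation unexplained.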
application of regression
predicting solar maximum
estimating seasonal sales
Predicting Student Grades Based on Time Spent Studying
for a regression /correlation problem, first thing to do is to:
check for normality
descriptives (Shapiro-Wilk)
Amount of rainfall in area - 0.968
Quantity of air pollution removed - 0.607
which data is normal
both are normal (both values are greater than 0.05)
hypothesis for the Amount of rainfall (x) and quantity of air pollution removed (y)
Ho: there is no significant relationship between the amount of rainfall in an area and the quantity of air pollution removed.
Ha: there is a significant relationship between the amount of rainfall in an area and the quantity of air pollution removed.
the Amount of rainfall (x) and quantity of air pollution removed (y)
correlation matrix p value is <.001
interpret
since the p value of correlation matrix is less than 0.05, we reject the Ho
Ho of correlation matrix
there is NO correlation between x and y
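The decision rule used in the p value card above can be written as a tiny Python sketch. The function name `decide` and the example p values are assumptions for illustration; the cards only report that p is below .001.

```python
ALPHA = 0.05  # significance level used on the card

def decide(p_value, alpha=ALPHA):
    """Return the test decision for a p value at the given significance level."""
    return "reject Ho" if p_value < alpha else "fail to reject Ho"

# Hypothetical p values, for illustration only
small_p = decide(0.0005)
large_p = decide(0.30)
```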
the Amount of rainfall (x) and quantity of air pollution removed (y)
pearson’s r value is -0.979
interpret
there is a strong, negative, and significant relationship between the amount of rainfall in an area and quantity of air pollution removed
the Amount of rainfall (x) and quantity of air pollution removed (y)
r^2 is = 0.958
interpret
95.8% of the variability in the quantity of air pollution removed is explained by the variability in the amount of rainfall in an area
the Amount of rainfall (x) and quantity of air pollution removed (y)
omnibus anova test p value = <.001
interpret
the model is significant
the Amount of rainfall (x) and quantity of air pollution removed (y)
intercept 153.175
amount of rainfall in an area -6.324
create the regression equation
y = 153.175 - 6.324(amount of rainfall)
the Amount of rainfall (x) and quantity of air pollution removed (y)
y = 153.175 - 6.324(amount of rainfall)
interpret a
153.175 is the quantity of air pollution removed if the amount of rainfall in an area is zero
the Amount of rainfall (x) and quantity of air pollution removed (y)
y = 153.175 - 6.324(amount of rainfall)
interpret b
for every 1-unit increase in the amount of rainfall in an area, there is a 6.324-unit decrease in the predicted quantity of air pollution removed
the Amount of rainfall (x) and quantity of air pollution removed (y)
y = 153.175 - 6.324(amount of rainfall)
how much pollution is removed if the amount of rainfall is 5.0?
y = 121.56 quantity of air pollution removed
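The prediction on the card above can be reproduced with the unrounded intercept from the output (153.175). A short Python sketch, where the function name `predict_pollution_removed` is an assumption made for this example:

```python
INTERCEPT = 153.175   # intercept reported on the card
SLOPE = -6.324        # coefficient for amount of rainfall reported on the card

def predict_pollution_removed(rainfall):
    """Predicted quantity of air pollution removed for a rainfall amount."""
    return INTERCEPT + SLOPE * rainfall

at_5 = predict_pollution_removed(5.0)
at_0 = predict_pollution_removed(0.0)
```

With a rainfall of 5.0 the result is 121.555, which the card reports as 121.56. Using the rounded intercept 153 instead would give 121.38.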
Correlation analysis is a measure of causal relationship between two variables
True
Neither true nor false
False
Sometimes true
False
If the correlation coefficient is a positive value, then the slope of regression line must be
Either positive or negative
Negative
Neither negative nor positive
positive
positive
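Why the slope and r share a sign can be seen from the formulas: both have the same numerator (the sum of cross-deviations) and strictly positive denominators. A minimal Python sketch with made-up data; the function name `slope_and_r` is an assumption.

```python
import math

def slope_and_r(xs, ys):
    """Return (least-squares slope, Pearson r) for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    # Same numerator sxy, positive denominators, so the signs agree
    return sxy / sxx, sxy / math.sqrt(sxx * syy)

# Made-up data, for illustration only
xs = [1, 2, 3, 4]
b_up, r_up = slope_and_r(xs, [2, 3, 5, 8])       # y tends to rise with x
b_down, r_down = slope_and_r(xs, [9, 6, 4, 1])   # y tends to fall with x
```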
If there exists a strong negative correlation between variables X and Y, then we can conclude that
The increase in X causes Y to decrease
The increase in X causes Y to increase
As the value of X increases, the value of Y decreases
As the value of X increases, the value of Y also increases
As the value of X increases, the value of Y decreases
The correlation coefficient is used to determine
A specific value of the x-variable given a specific value of the y-variable
The strength of linear relationship between the x and y variables
A specific value of the y-variable given a specific value of the x-variable
The difference between the direction y-variable and x-variable
The strength of linear relationship between the x and y variables
In regression, the equation that describes how the response variable (y) is related to the explanatory variable (x) is:
The correlation model
Used to compute the correlation coefficient
The regression model
The coefficient of determination model
regression model
In regression analysis, if the independent variable is measured in kilograms, the dependent variable
Must also be in kilograms
Cannot be in kilograms
Must be in some unit of weight
Can be any units
can be any units
In regression analysis, the variable being predicted is the
Intervening variable
Independent variable
Response variable
Predictor variable
response variable
The correlation coefficient is 0.8, and the percentage of variation in the response variable explained by the variation of the explanatory variable is
0.64%
64%
0.80%
80%
64%
Which of the following values of correlation coefficient r show strong correlation
-0.91
0.525
0.01
1.0
-0.91
If the coefficient of determination is 0.81, the correlation coefficient is
0.9 or -0.9
-0.651
90%
0.6561
0.9 or -0.9
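The last two cards convert between r and r^2. A small Python sketch of both directions; the function names are assumptions made for this example. Going from r^2 back to r gives two candidates, because squaring discards the sign.

```python
import math

def r_squared_from_r(r):
    """Coefficient of determination for a simple linear regression."""
    return r * r

def possible_r_from_r_squared(r2):
    """Both correlation coefficients consistent with a given r^2."""
    root = math.sqrt(r2)
    return root, -root

share = r_squared_from_r(0.8)                  # as a proportion, not a percent
candidates = possible_r_from_r_squared(0.81)   # positive and negative roots
```

Multiplying `share` by 100 gives the percentage asked for on the card. Which of the two `candidates` applies depends on the sign of the regression slope.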
The study “Determinants of Board exam results in engineering” specifically aims to determine the linear relationship between Board Exam Score (BScore) and Entrance Exam Score in College (EScore). Correlation and regression analyses were used in the study and yielded the following results:
Correlation analysis:
r= 0.924
Simple linear regression:
a= 25.17
b= 0.667
What is the estimate of the regression line?
EScore (y) = 25.17 + 0.667 BScore (x)
EScore (y) = 0.667 + 25.17 BScore (x)
BScore (y) = 0.667 + 25.17 EScore (x)
BScore (y) = 25.17 + 0.667 EScore (x)
BScore (y) = 25.17 + 0.667 EScore (x)
A regression analysis between sales (in P1000) and price (in peso) resulted in the following equation: y(sales) = 50,000 - 8x(price). The above equation implies that an
Increase of P1 in price is associated with a decrease of P8,000 in sales
Increase of P8 in price is associated with an increase of P8,000 in sales
Increase of P1 in price is associated with a decrease of P8 in sales
Increase of P1 in price is associated with a decrease of P42,000 in sales
Increase of P1 in price is associated with a decrease of P8,000 in sales
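The key step in the sales card above is the unit conversion: sales are recorded in thousands of pesos, so a slope of -8 means 8 thousand pesos. A short Python sketch; the function name and the two prices are assumptions for illustration.

```python
def predicted_sales_thousands(price):
    """Predicted sales in units of P1000 from y = 50,000 - 8x."""
    return 50_000 - 8 * price

# Hypothetical prices one peso apart, for illustration only
change_thousands = predicted_sales_thousands(101) - predicted_sales_thousands(100)

# Convert from thousands of pesos to pesos
change_pesos = change_thousands * 1000
```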
The study “Determinants of Board exam results in engineering” specifically aims to determine the linear relationship between Board Exam Score (BScore) and Entrance Exam Score in College (EScore). Correlation and regression analyses were used in the study and yielded the following results:
Correlation analysis:
r= 0.924
Simple linear regression:
a= 25.17
b= 0.667
Which of the given statements best describes the correlation coefficient
There is a positive correlation between Escore and Bscore
There is a very strong correlation between Escore and Bscore
There is a very strong positive correlation between Escore and Bscore
There is a very strong linear relationship between Escore and Bscore
There is a very strong positive correlation between Escore and Bscore
The study “Determinants of Board exam results in engineering” specifically aims to determine the linear relationship between Board Exam Score (BScore) and Entrance Exam Score in College (EScore). Correlation and regression analyses were used in the study and yielded the following results:
Correlation analysis:
r= 0.924
Simple linear regression:
a= 25.17
b= 0.667
Which of the following best describes the slope of the regression line
The slope of the regression line suggests that for a 1-unit increase in BScore there is a 25.17-unit increase in EScore
The slope of the regression line suggests that for a 1-unit increase in EScore there is a 25.17-unit increase in BScore
The slope of the regression line suggests that for a 1-unit increase in BScore there is a 0.667-unit increase in EScore
The slope of the regression line suggests that for a 1-unit increase in EScore there is a 0.667-unit increase in BScore
The slope of the regression line suggests that for a 1-unit increase in EScore there is a 0.667-unit increase in BScore
If there is a very strong correlation between two variables then the correlation coefficient must be
Much smaller than 0, if the correlation is negative
Much larger than 0, regardless whether the correlation is negative or positive
Very near to zero if the correlation is positive
Any value larger than 1
Much smaller than 0, if the correlation is negative
Which of the following values of correlation coefficient r show weak correlation?
-1.0
0.11
0.89
-0.54
0.11
Regression modeling is a statistical framework for developing a mathematical equation that describes how:
one response and one or more explanatory variables are related
one explanatory and one or more response variables are related
one response and one explanatory variables are related
several explanatory and several response variables are related
one response and one or more explanatory variables are related