Regression Flashcards
A scatter plot
uses Cartesian coordinates to display the values for two variables so we can visualise the relationship between them.
Correlation
measures the relationship between two continuous variables.
The Pearson correlation coefficient
describes the strength of (linear) association between them.
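For two samples x and y, the coefficient is usually computed as r = Σ(xi − x̄)(yi − ȳ) / √( Σ(xi − x̄)² · Σ(yi − ȳ)² ), i.e. the covariance of x and y standardised by their standard deviations.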
Hypothesis test for the correlation between two samples using Pearson's correlation coefficient
The correlation coefficient always takes a value between -1 and 1, where:
-1: Perfectly negative linear correlation between two variables.
0: No linear correlation between two variables.
1: Perfectly positive linear correlation between two variables.
How do you determine whether a correlation coefficient is statistically significant?
You can calculate the corresponding t-score and p-value.
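The R output below appears to come from a call to cor.test(); a minimal sketch that would produce output of this form, assuming Var_1 and Var_2 (the names shown in the output) are numeric vectors of equal length:

# Pearson correlation test between two numeric vectors
cor.test(Var_1, Var_2, method = "pearson")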
##
## data: Var_1 and Var_2
## t = 7.6064, df = 2, p-value = 0.01685
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.4004041 0.9996629
## sample estimates:
## cor
## 0.9831516
How can we interpret the R output?
Since the correlation coefficient is positive, there is a positive linear relationship between the two variables.
Since the p-value of the correlation coefficient is less than 0.05, the correlation is statistically significant.
difference between Pearson’s correlation coefficient and Spearman’s rank correlation
Pearson's correlation coefficient assumes normality for the two samples. However, Spearman's rank correlation does not (as it is non-parametric).
What data is Spearman's rank correlation appropriate for?
Both continuous and discrete variables.
Spearman's rank correlation, rather than assuming a linear relationship, measures the strength of a monotone relationship (i.e. the extent to which, if one variable increases, the other one systematically increases/decreases).
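As a sketch, Spearman's rank correlation can be requested from the same cor.test() function by changing the method argument (Var_1 and Var_2 as above):

# Spearman's rank correlation (non-parametric, based on ranks)
cor.test(Var_1, Var_2, method = "spearman")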
When is a regression analysis used?
To try to predict the value of a dependent variable from one or more independent variables.
Linear model
The simplest form of regression model: the response (or dependent) variable is y and the continuous explanatory (or independent) variable is x.
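In its standard form, this model can be written as y = β0 + β1·x + ε, where β0 is the intercept, β1 is the slope, and ε is the random error term.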
What can we observe from the graph?
By modelling the relationship between these two variables (which we will now call 'fitting a model'), we can use this model to predict the value of the dependent variable.
Linear regression makes a series of assumptions:
The observations are independent.
The residuals should not be predictable from the fitted values in any way.
If any feature of the residuals can be predicted once the fitted value is known, this assumption is violated.
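The output below appears to come from fitting a simple linear model with lm(); a minimal sketch that would produce output of this form, assuming tannin_data is a data frame with columns growth and tannin:

# Fit a simple linear regression of growth on tannin
model <- lm(tannin_data$growth ~ tannin_data$tannin)
model
# Quick visual check that the residuals show no pattern against the fitted values
plot(fitted(model), resid(model))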
## lm(formula = tannin_data$growth ~ tannin_data$tannin)
##
## Coefficients:
## (Intercept) tannin_data$tannin
## 11.756 -1.217
How is this interpreted?
β0=11.756 (intercept) and β1=-1.217 (slope)
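These estimates give the fitted line growth = 11.756 − 1.217 × tannin. As a sketch, a prediction at a hypothetical new tannin value (say tannin = 5) can be obtained directly from the stored coefficients:

# Predicted growth when tannin = 5: 11.756 - 1.217 * 5 ≈ 5.67
coef(model)[1] + coef(model)[2] * 5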