Chapter 6: Simple Regression And Correlation Analysis Flashcards
What is a multivariate data set?
It is a set that consists of observations on two or more variables
What is a bivariate data set?
It is a set that consists of observations on two variables; the paired observations are denoted by (x1, y1), …, (xn, yn)
What is a dependent variable?
It is a variable whose values can be partially explained by the independent variable. In this sense, the dependent variable depends on the behaviour of the independent variable
What do we assume about changes in the values of y and x?
If it is assumed that changes in the value of y are explained by changes in the value of x, x is referred to as the independent variable and y is referred to as the dependent variable.
The independent variable is sometimes also called the predictor variable or the explanatory variable.
The dependent variable is also referred to as the response variable.
What is a scatter plot?
A scatter plot is a graphical representation of the values of the dependent variable plotted against the values of the independent variable
The dependent variable is displayed on the vertical axis and the independent variable on the horizontal axis. Every (x, y) pair is plotted as a point on the graph
What is interpolation?
It estimates a y-value for a given x-value inside the interval of observed x-values
What is extrapolation?
It estimates a y-value for a given x-value outside the interval of observed x-values
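A minimal Python sketch of the difference, assuming predictions come from a least-squares line ŷ = a + bx. The helper names `fit_line` and `predict` and the data values are hypothetical, for illustration only; a prediction is labelled interpolation when the new x lies within [min x, max x] and extrapolation otherwise.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y_hat = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sp_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    b = sp_xy / ss_xx
    return y_bar - b * x_bar, b


def predict(xs, ys, x_new):
    """Return (y_hat, kind); kind says whether x_new lies inside the observed x-range."""
    a, b = fit_line(xs, ys)
    inside = min(xs) <= x_new <= max(xs)
    kind = "interpolation" if inside else "extrapolation"
    return a + b * x_new, kind


# Hypothetical data: x was only observed between 1 and 5.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
print(predict(xs, ys, 3.5))  # x = 3.5 is inside [1, 5]: interpolation
print(predict(xs, ys, 10))   # x = 10 is outside [1, 5]: extrapolation
```

Extrapolated values are generally less reliable, because the data give no evidence that the linear pattern continues beyond the observed x-values.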
What is the correlation coefficient?
The correlation coefficient is a measure of the strength of the linear relationship between two variables
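The card gives no formula; one standard way to compute the (Pearson) sample correlation coefficient is r = Σ(x − x̄)(y − ȳ) / √[Σ(x − x̄)² · Σ(y − ȳ)²]. A minimal Python sketch of that formula, with hypothetical data (the name `pearson_r` is illustrative):

```python
import math


def pearson_r(xs, ys):
    """Sample (Pearson) correlation coefficient of paired observations."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sp_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))  # sum of cross-products
    ss_xx = sum((x - x_bar) ** 2 for x in xs)                       # sum of squares of x
    ss_yy = sum((y - y_bar) ** 2 for y in ys)                       # sum of squares of y
    return sp_xy / math.sqrt(ss_xx * ss_yy)


xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
print(round(pearson_r(xs, ys), 4))  # close to 1: a strong positive linear relationship
```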
What is the least squares curve?
It is the curve that makes the sum of the squared residuals as small as possible
What is a time series?
A time series is a set of observations made at successive points in time, with an equal interval between consecutive observations.
Estimating the long-term trend in the time series enables the investigator to make predictions into the future
Irregular variation is present in a time series when there are unpredictable movements in the series; as a result, it is difficult to describe mathematically
Regression methods (such as the method of least squares) can be used to capture the long-term trend in a time series
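A minimal sketch of a least-squares linear trend ŷ = a + bt, assuming the time periods are coded t = 1, 2, …, n (one common convention, assumed here) and using hypothetical quarterly sales figures. The function names `linear_trend` and `forecast` are illustrative.

```python
def linear_trend(series):
    """Least-squares trend y_hat = a + b*t, with periods coded t = 1, 2, ..., n."""
    n = len(series)
    ts = range(1, n + 1)
    t_bar = (n + 1) / 2                 # mean of 1, 2, ..., n
    y_bar = sum(series) / n
    sp_ty = sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, series))
    ss_tt = sum((t - t_bar) ** 2 for t in ts)
    b = sp_ty / ss_tt                   # average change per period
    a = y_bar - b * t_bar
    return a, b


def forecast(series, periods_ahead):
    """Extend the fitted trend line to periods n+1, ..., n+periods_ahead."""
    a, b = linear_trend(series)
    n = len(series)
    return [a + b * (n + k) for k in range(1, periods_ahead + 1)]


# Hypothetical quarterly sales figures.
sales = [12, 14, 13, 16, 18, 17, 20, 22]
print(linear_trend(sales))
print(forecast(sales, 2))
```

Forecasting beyond the last observed period is extrapolation, so such forecasts assume the long-term trend continues; irregular variation is not captured by the line.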
What is the relationship when y increases as x increases?
Positive linear relationship
What is the relationship when y increases as x increases, but with greater variation?
Weaker positive linear relationship
What is the relationship when y decreases as x increases, but with greater variation?
Weaker negative linear relationship
What is the relationship when y decreases as x increases?
Negative linear relationship
What is the relationship when, as x increases, y increases over some ranges and decreases over others?
Non-linear relationship
What are the properties of the correlation coefficient?
- −1 ≤ r ≤ 1: The correlation coefficient always lies between −1 and 1
- r > 0: There is a positive linear relationship
- r < 0: There is a negative linear relationship
- r = 1: Perfect positive linear relationship
- r = −1: Perfect negative linear relationship
- r = 0: No linear relationship between x and y (a non-linear relationship may still exist)
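These properties can be checked numerically with a short Python sketch on hypothetical data: points on a line with positive slope give r = 1, a negative slope gives r = −1, and the points y = x² on a symmetric x-range give r = 0 even though y is an exact (non-linear) function of x. The helper `pearson_r` is an illustrative implementation of the usual sample formula.

```python
import math
import random


def pearson_r(xs, ys):
    """Sample (Pearson) correlation coefficient of paired observations."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sp_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_yy = sum((y - y_bar) ** 2 for y in ys)
    return sp_xy / math.sqrt(ss_xx * ss_yy)


xs = [-2, -1, 0, 1, 2]
print(pearson_r(xs, [2 * x + 3 for x in xs]))     # perfect positive linear relationship
print(pearson_r(xs, [-0.5 * x + 1 for x in xs]))  # perfect negative linear relationship
print(pearson_r(xs, [x ** 2 for x in xs]))        # curved pattern, but no linear relationship

# Random data: r always stays within [-1, 1].
random.seed(0)
for _ in range(1000):
    us = [random.random() for _ in range(10)]
    vs = [random.random() for _ in range(10)]
    assert -1.0 <= pearson_r(us, vs) <= 1.0
```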
What is regression used for?
To forecast (predict) values of y, denoted ŷ, for given x-values
In the regression equation ŷ = a + bx, what do the estimates of a and b represent?
a denotes the y-intercept (the value of ŷ when x = 0)
b represents the gradient of the straight line (in other words, for each one-unit increase in x, the y-value changes on average by b units: an increase if b > 0 and a decrease if b < 0)
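For the least-squares line, the estimates are commonly computed as b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and a = ȳ − b·x̄. A minimal Python sketch with hypothetical data (`fit_line` is an illustrative name):

```python
def fit_line(xs, ys):
    """Least-squares estimates (a, b) for the line y_hat = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sp_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    b = sp_xy / ss_xx          # gradient: average change in y per one-unit increase in x
    a = y_bar - b * x_bar      # y-intercept: value of y_hat when x = 0
    return a, b


xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = fit_line(xs, ys)
print(f"y_hat = {a:.2f} + {b:.2f}x")
```

Here b is about 1.99, so each one-unit increase in x raises the predicted y by about 1.99 units on average.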
What does the symbol ŷ (y-hat) indicate?
It indicates the straight line fitted through the data points in the scatter plot in such a way that it best describes the overall linear relationship between the variables
NB! ŷ also represents the estimated mean y-value for a given x-value
What are residuals?
A residual is the difference between the observed value y of the dependent variable and the value ŷ given by the estimated regression line: residual = y − ŷ
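A minimal Python sketch computing the residuals y − ŷ for a least-squares line on hypothetical data. A positive residual means the point lies above the line, a negative one below it; for a least-squares line the residuals also sum (and therefore average) to zero. The helper names are illustrative.

```python
def fit_line(xs, ys):
    """Least-squares estimates (a, b) for y_hat = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sp_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    b = sp_xy / ss_xx
    return y_bar - b * x_bar, b


def residuals(xs, ys):
    """Residual y - y_hat for each observation, using the least-squares line."""
    a, b = fit_line(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]


xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
res = residuals(xs, ys)
print([round(e, 2) for e in res])  # positive: above the line; negative: below it
```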
What is the least squares line?
It is the line obtained by minimising the sum of the squares of the vertical distances between the observed points and the corresponding points on the line
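The minimising property can be illustrated (a numerical check, not a proof) with a short Python sketch on hypothetical data: nudging the least-squares intercept or slope in any direction gives a larger sum of squared vertical distances (SSE). The helpers `fit_line` and `sse` and the nudge sizes are illustrative choices.

```python
def fit_line(xs, ys):
    """Least-squares estimates (a, b) for y_hat = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sp_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    b = sp_xy / ss_xx
    return y_bar - b * x_bar, b


def sse(xs, ys, a, b):
    """Sum of squared vertical distances from the points to the line a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = fit_line(xs, ys)
best = sse(xs, ys, a, b)

# Shift the intercept and/or slope slightly: each alternative line has a larger SSE.
nudges = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05), (0.1, -0.05)]
for da, db in nudges:
    alt = sse(xs, ys, a + da, b + db)
    print(f"a{da:+.2f}, b{db:+.2f}: SSE = {alt:.4f} (least squares: {best:.4f})")
```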
What are the properties of the coefficient of determination?
0 ≤ R² ≤ 1
- if R² = 1: The curve fits the observed data perfectly
- if R² = 0: The curve explains none of the variation in y, so it does not fit the observed data
- For a straight line fitted by least squares, R² = r² (the square of the correlation coefficient)
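A minimal Python sketch, assuming the usual definition R² = 1 − SSE/SST (unexplained over total variation in y), that checks R² = r² for a least-squares straight line. The data are hypothetical and chosen with a negative relationship, so r is negative while R² is not; the helper names are illustrative.

```python
import math


def fit_line(xs, ys):
    """Least-squares estimates (a, b) for y_hat = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sp_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    b = sp_xy / ss_xx
    return y_bar - b * x_bar, b


def pearson_r(xs, ys):
    """Sample (Pearson) correlation coefficient."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sp_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_yy = sum((y - y_bar) ** 2 for y in ys)
    return sp_xy / math.sqrt(ss_xx * ss_yy)


def r_squared(xs, ys):
    """Coefficient of determination R^2 = 1 - SSE/SST for the least-squares line."""
    a, b = fit_line(xs, ys)
    y_bar = sum(ys) / len(ys)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained variation
    sst = sum((y - y_bar) ** 2 for y in ys)                     # total variation in y
    return 1 - sse / sst


xs = [1, 2, 3, 4, 5]
ys = [9.8, 8.1, 6.2, 3.9, 2.0]  # hypothetical negative relationship
r = pearson_r(xs, ys)
print(r, r ** 2, r_squared(xs, ys))  # R^2 equals r^2, while r itself is negative
```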
What symbol denotes the covariance between x and y?
Sxy
What symbols denote the standard deviations of x and y?
Sx and Sy
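These quantities combine into the correlation coefficient, r = Sxy / (Sx·Sy), provided all three use the same divisor (n − 1 for sample statistics). A minimal Python sketch on hypothetical data, using the standard-library `statistics.mean` and `statistics.stdev` (sample standard deviation); `sample_covariance` is a hypothetical helper written out by hand.

```python
import statistics


def sample_covariance(xs, ys):
    """Sample covariance Sxy, using the divisor n - 1."""
    n = len(xs)
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
s_xy = sample_covariance(xs, ys)
s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)  # sample standard deviations
r = s_xy / (s_x * s_y)                                 # correlation coefficient
print(s_xy, s_x, s_y, r)
```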
What is the average of the residuals?
0 (the residuals from a least-squares line always sum to zero)
What is a residual plot?
It is a scatter plot of the residuals (on the vertical axis) against the values of the independent variable x (on the horizontal axis)
What is an outlier?
It is an observation outside the general pattern of the other observations
Is r dependent on, or independent of, the units in which the observations are measured?
Independent
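Changing units multiplies a variable by a positive constant (and, for scales such as °C to °F, also adds a constant), and such changes leave r unchanged. A minimal Python sketch with hypothetical heights and masses converted from centimetres to inches and from kilograms to pounds (the pound factor is approximate):

```python
import math


def pearson_r(xs, ys):
    """Sample (Pearson) correlation coefficient."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sp_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_yy = sum((y - y_bar) ** 2 for y in ys)
    return sp_xy / math.sqrt(ss_xx * ss_yy)


# Hypothetical heights (cm) and masses (kg) of five people.
heights_cm = [150, 160, 170, 180, 190]
masses_kg = [52, 60, 63, 75, 82]

heights_in = [h / 2.54 for h in heights_cm]   # centimetres -> inches
masses_lb = [m * 2.20462 for m in masses_kg]  # kilograms -> pounds (approximate factor)

print(pearson_r(heights_cm, masses_kg))
print(pearson_r(heights_in, masses_lb))  # same r, up to floating-point rounding
```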