Correlation flashcards
bivariate data
pairs of values for two variables
independent variable
x-axis (explanatory variable)
dependent variable
y-axis (response variable)
types of correlation
strong negative, weak negative, strong positive, weak positive
causal relationship
when a change in one variable causes a change in the other
comment on the claim that hotter countries have less rainfall
the graph does not support the claim that hotter countries have less rainfall
“describe and interpret the correlation between 2 variables”
there is a positive/negative correlation since, as ……. increases/decreases, ……. increases/decreases
there is a weak negative correlation between internet speed and house value. Danyal draws a conclusion from this; suggest why he may be wrong
there may be a third variable that influences both house value and internet speed, e.g. distance from built-up areas
outlier formula
upper: Q3 + 1.5 × IQR; lower: Q1 - 1.5 × IQR
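A minimal sketch of the outlier formula in Python, assuming the quartiles have already been found. The function names and the sample values are made up for illustration.

```python
def outlier_fences(q1, q3):
    """Return (lower, upper) limits: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(data, q1, q3):
    """List the values that fall outside the two limits."""
    lower, upper = outlier_fences(q1, q3)
    return [value for value in data if value < lower or value > upper]


# Made-up data with quartiles assumed to be Q1 = 4 and Q3 = 8.
sample = [2, 4, 5, 6, 7, 8, 9, 30]
print(outlier_fences(4, 8))         # limits are -2.0 and 14.0
print(find_outliers(sample, 4, 8))  # only 30 lies outside the limits
```

Quartiles are passed in rather than calculated because different courses and calculators use slightly different quartile conventions.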
give a reason why you might exclude an anomaly, and a reason why you might include it
exclude: the anomaly is an outlier and not representative of the data
include: the “anomaly” is a genuine part of the data, so it should be kept
what kind of correlation is this,
and what does it show?

weak negative (overall downward trend)
a causal relationship between two variables
type of line of best fit
least squares regression line
the regression line that minimises the sum of the squares of the vertical distances from each data point to the line
D = vertical distance from a point on the graph to the line
minimises the value of D1^2 + D2^2 + D3^2 + …
(x,y)
formula for regression line
y = a + bx
the order of the variables is important
the regression line of y on x is different from the regression line of x on y
coefficient (b) = change in y for each unit increase in x; for example, if b is negative, the data are negatively correlated
vice versa
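A rough sketch of how a least squares line y = a + bx could be calculated, and of why the order of the variables matters. The data values and the function name are invented for illustration; they are not taken from any exam question.

```python
def regression_line(xs, ys):
    """Least squares line of ys on xs. Returns (a, b) for y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sxy and Sxx: sums of products and squares of deviations from the means.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    b = s_xy / s_xx          # gradient
    a = mean_y - b * mean_x  # intercept (line passes through the mean point)
    return a, b


# Made-up bivariate data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = regression_line(xs, ys)  # y on x: y = a + b*x
c, d = regression_line(ys, xs)  # x on y: x = c + d*y

print(f"y on x: y = {a:.2f} + {b:.2f}x")
print(f"x on y: x = {c:.2f} + {d:.2f}y")
# Rearranging the y on x line for x gives a different gradient from d.
print(f"y on x rearranged: x = {-a / b:.2f} + {1 / b:.2f}y")
```

For this data the y on x line is y = 2.2 + 0.6x, while the x on y line is x = -1 + y. Rearranging the first line does not give the second, which is why the order of the variables matters.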
w = windspeed (knots)
g = gust (knots)
give an interpretation of the value of the gradient of this regression line

just say what the gradient does…
if the windspeed increases by 10 knots (example), the daily maximum gust increases by 18 knots
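A small sketch of the idea that the gradient converts a change in x into a change in y. The gradient of 1.8 is an assumption, inferred from the 10 knots and 18 knots on the card; the actual regression equation is not shown here.

```python
def change_in_y(gradient, change_in_x):
    """Predicted change in the dependent variable for a given change in x."""
    return gradient * change_in_x


# Assumed gradient (18 knots of gust per 10 knots of windspeed).
assumed_gradient = 1.8
print(round(change_in_y(assumed_gradient, 10), 1))  # gust rises by about 18 knots
print(round(change_in_y(assumed_gradient, 1), 1))   # about 1.8 knots per extra knot
```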
justify the use of a linear regression line
because the correlation suggests a linear relationship
when should you use a linear regression line?
to estimate values of the dependent variable within the range of the given data (interpolation)
the value is inside the range of the data, so linear regression is used
the value is within the range of the data, so linear regression is more likely to be accurate
if the value is outside the range of the data (extrapolation), linear regression is less likely to be accurate
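One way to check whether an estimate is interpolation or extrapolation is to compare the value against the smallest and largest x values in the data. This is only a sketch; the function name and numbers are made up.

```python
def classify_estimate(x_new, xs):
    """Say whether estimating at x_new is interpolation or extrapolation."""
    if min(xs) <= x_new <= max(xs):
        return "interpolation"  # inside the data range: more likely reliable
    return "extrapolation"      # outside the data range: less likely reliable


observed_x = [3, 5, 8, 12, 15]  # made-up x values from a data set
print(classify_estimate(10, observed_x))  # interpolation
print(classify_estimate(25, observed_x))  # extrapolation
```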
regression equation y = 2 + 6x
a man wants to estimate x from y (x is the independent variable, y is the dependent variable)
suggest why this is a bad idea
x is the independent variable, and you should only make predictions for the dependent variable, so you should not use this model to predict a value of x for a given value of y
instead, you need to use the regression line of x on y
common regression line question
comment on the reliability of the values x and y (y is outside the range of the data)
x is reliable as it is within the range of the data
y is not reliable as it is outside the range of the data

There are two key problems with Helen’s statement: First, 10 coats of paint is very far outside our range of given data, and we cannot assume that this linear relationship continues as we extrapolate, so using the regression line is not necessarily valid. Second, even if we accept the extrapolation as valid, a gradient of 1.45 means that, for every extra coat of paint, the protection will increase by 1.45 years. Therefore, if 10 coats of paint are applied, the protection will be 14.5 years longer than if no paint were applied. Helen has, however, forgotten to include the constant 2.93 years, which is the weather resistance if no paint were applied. After 10 coats of paint the protection will last approximately 2.93 + 14.5 = 17.43 years.
comment that the value is outside the range of the data, so extrapolating with this regression equation may not be valid
comment on the fact that the constant has not been included
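The arithmetic in the paint example can be checked with a short sketch. The intercept 2.93 and gradient 1.45 come from the worked answer above; the function name is made up, and the result is still an extrapolation, so it may not be reliable.

```python
INTERCEPT = 2.93  # years of protection with no coats of paint
GRADIENT = 1.45   # extra years of protection per coat of paint


def predicted_protection(coats):
    """Predicted years of protection from the regression line."""
    return INTERCEPT + GRADIENT * coats


increase_only = GRADIENT * 10     # what you get if the constant is forgotten
full_prediction = predicted_protection(10)

print(round(increase_only, 2))    # 14.5 extra years compared with no paint
print(round(full_prediction, 2))  # 17.43 years in total
```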
note
if a weak negative correlation is present, the gradient of the regression line should be negative
the equation for the line of regression for houses…
y = 900 + 5x
x = number of bedrooms
a person says that if there are no bedrooms, the price of the house will be 900
why is this unreasonable?
This is not a reasonable statement as there are unlikely to be any houses with no bedrooms, so she is extrapolating outside of the range of data, where the linear relationship is unlikely to continue.
a regression equation should be used to give a value for…
v = h + 100x
v given h
(this is an example)