Exam 2 - Regression Flashcards

0
Q

Y-hat = a + bx

A
Y-hat reminds us that we have deviations about the line and that the values of y specified by the line are PREDICTIONS
a - intercept
b - slope
Y-hat - predicted value of y for a given x
1
Q

Statistical model

A

An equation that fits the pattern between a response variable and possible explanatory variables, accounting for deviations from the model. A regression line is one example.

2
Q

What does the y-intercept tell us?

A

The value of y when x=0

3
Q

What does slope tell us?

A

The change in y for every one-unit increase in x, on average!

4
Q

As x increases by one unit, what happens to y when the slope is negative?

A

Y decreases

5
Q

As x increases by one unit, what happens to y when the slope is positive?

A

Y increases by rise/run units

6
Q

b=

A

Rise(y)/run(x)

7
Q

Interpretation of slope: rise/run

A

For every one-inch increase in height at age 4, height at age 18 increases by 1.15 inches ON AVERAGE

8
Q

Interpretation of y-intercept

A

Males who are zero inches tall at age 4 are predicted to be 23 inches tall at age 18

The intercept is the value of y when x = 0
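
A small worked sketch of the two interpretations above, using the slope (1.15) and intercept (23) quoted on these cards. The age-4 heights passed in (40 and 41 inches) are made-up inputs for illustration only.

```python
# Slope and intercept are the values quoted on the cards; inputs are hypothetical.
INTERCEPT = 23.0   # a: predicted age-18 height (inches) when age-4 height is 0
SLOPE = 1.15       # b: average change in age-18 height per extra inch at age 4

def predict_height_at_18(height_at_4):
    """Evaluate y-hat = a + b*x for one age-4 height in inches."""
    return INTERCEPT + SLOPE * height_at_4

pred_40 = predict_height_at_18(40)   # hypothetical child, 40 inches at age 4
pred_41 = predict_height_at_18(41)   # one inch taller at age 4
difference = pred_41 - pred_40       # equals the slope, up to rounding
```

The prediction at x = 0 is just the intercept, which is why the literal reading of the intercept sounds odd: no child is zero inches tall.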

9
Q

How to predict

A
  • collect data
  • plot data
  • predict
  • fit the data with a straight line equation
  • evaluate the equation
10
Q

Residuals

A

Vertical distance between the observed y value and the line, or

The difference between the observed y value and y-hat, the value predicted by the regression line
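
As a rough illustration of this definition, the sketch below computes residuals for a small made-up data set. The data and the line (a = 2.2, b = 0.6) are assumptions chosen for the example, not values from the course.

```python
# Hypothetical data and line, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6  # intercept and slope of the line y-hat = a + b*x

def residuals(xs, ys, a, b):
    """Return observed y minus predicted y-hat for each point."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

res = residuals(xs, ys, a, b)
above = [r for r in res if r > 0]   # positive residuals: points above the line
below = [r for r in res if r < 0]   # negative residuals: points below the line
```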

11
Q

Squared prediction error, (residual)^2

A

(Observed y - predicted y)^2 = (y - y-hat)^2

Residuals are squared because, for the least-squares line, the residuals themselves always sum to zero (the negative residuals below the line cancel the positive residuals above it)
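
A minimal check of why squaring is needed, assuming a made-up data set whose least-squares line is y-hat = 2.2 + 0.6x: the raw residuals cancel, while their squares do not.

```python
xs = [1, 2, 3, 4, 5]          # hypothetical data
ys = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6               # least-squares line for this particular data

residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
sum_of_residuals = sum(residuals)                  # cancels to (about) zero
sum_of_squares = sum(r ** 2 for r in residuals)    # does not cancel
```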

12
Q

Positive residuals

A

Points above the line

13
Q

Negative residuals

A

Points below the line

14
Q

The least-squares regression line is

A

The line with the smallest sum of squared errors (denoted SSE)
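
To make "smallest SSE" concrete, the sketch below compares the SSE of a least-squares line against a few other candidate lines. The data and the alternative lines are hypothetical and picked arbitrarily; this is a spot check, not a proof.

```python
def sse(xs, ys, a, b):
    """Sum of squared errors for the line y-hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4, 5]   # hypothetical data
ys = [2, 4, 5, 4, 5]

best = sse(xs, ys, 2.2, 0.6)   # least-squares line for this data
# A few other candidate (a, b) pairs, picked arbitrarily for comparison.
candidates = [(2.0, 0.6), (2.2, 0.7), (1.0, 1.0), (4.0, 0.0)]
others = [sse(xs, ys, a, b) for a, b in candidates]
```

Every entry in others is larger than best, which is what the least-squares criterion promises for any line other than the fitted one.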

15
Q

Sum of Squared Deviations (residuals, errors) (SSE) represents

A
The variation in the observed values of y about the regression line (the unexplained variation)
SSE = sum of (residual)^2 = sum of (y - y-hat)^2
16
Q

Least-squares equation

A

Y-hat = a + bx

17
Q

Formula for a (intercept)

A

a = y-bar - b(x-bar)

where y-bar and x-bar are the respective means of y and x

18
Q

Formula for b (slope)

A

Slope is a rate of change: the amount of change in y-hat when x increases by 1

b = r(Sy/Sx)
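
A minimal sketch of the slope formula, together with the intercept formula a = y-bar - b(x-bar), on made-up data. It assumes sample standard deviations (dividing by n - 1) and uses only Python's standard statistics module; the data values are hypothetical.

```python
from statistics import mean, stdev

xs = [1, 2, 3, 4, 5]   # hypothetical data
ys = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(xs), mean(ys)
s_x, s_y = stdev(xs), stdev(ys)          # sample standard deviations

# Correlation r: sum of products of standardized x and y, divided by n - 1.
n = len(xs)
r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)) / (n - 1)

b = r * s_y / s_x        # slope
a = y_bar - b * x_bar    # intercept
```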

19
Q

Least-squares regression line facts

A
  • makes the distances of the data points from the line small ONLY in the y direction
  • if we reverse the roles of the two variables, we get a different least-squares regression line
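
The second fact can be checked with a short sketch: fit y on x, then x on y, and compare the two lines in the same (x, y) plane. The data and the helper name fit_line are assumptions made for this example.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for predicting ys from xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b

xs = [1, 2, 3, 4, 5]   # hypothetical data
ys = [2, 4, 5, 4, 5]

a_yx, b_yx = fit_line(xs, ys)   # regression of y on x
a_xy, b_xy = fit_line(ys, xs)   # regression of x on y (roles reversed)

# Rewrite x-hat = a_xy + b_xy*y as a line in the (x, y) plane: y = (x - a_xy) / b_xy.
slope_of_reversed_line = 1 / b_xy
```

For this data the y-on-x slope is about 0.6, while the reversed line has slope 1 when drawn on the same axes, so the two lines differ.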
20
Q

What is the connection between the correlation r and the slope b of the least-squares line?

A
Slope and r have the same sign
b = r when Sy = Sx
Both r and b tell us the direction
If r = 0, b = 0
If r < 0, b < 0
If r > 0, b > 0
If we know the sign of r we know the sign of b, and vice versa
21
Q

What do b and r have in common?

A

They always have the same sign

A change of 1 standard deviation in x corresponds to a change of r standard deviations in y-hat.

Measured in standard deviations, the change in y-hat is never larger than the change in x, because r is between -1 and 1
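
A short derivation of the standard-deviation statement, assuming the slope formula b = r(Sy/Sx) from the earlier card. The symbols s_x and s_y stand for Sx and Sy, and the Delta notation for "change in" is introduced here only for the sketch.

```latex
\[
\hat{y} = a + bx, \qquad b = r\,\frac{s_y}{s_x}
\]
\[
\Delta x = s_x \;\Longrightarrow\; \Delta\hat{y} = b\,\Delta x = r\,\frac{s_y}{s_x}\,s_x = r\,s_y
\]
\[
|r| \le 1 \;\Longrightarrow\; |\Delta\hat{y}| \le s_y
\]
```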

22
Q

The least-squares regression line always passes

A

Through the point (x-bar, y-bar)

23
Q

Correlation r describes

A

The straight-line relationship between x and y

24
Q

The square of the correlation, r^2, gives us

A

The percentage of variation in the values of y that is explained by the least-squares regression line

On the chart, R-sq = 0.6937, or 69.37%

25
Q

Regression line

A

Is a straight line that describes how a response variable y changes as an explanatory variable x changes

26
Q

The least-squares line is a mathematical model used to predict

A

The value of y for a given x

Y-hat = a + bx

27
Q

Least squares regression line requires that we have

A

Explanatory and response variables, both quantitative

28
Q

The least squares regression line of y on x is the line that makes

A

The sum of the squares of the vertical distances of the data points from the line as small as possible

29
Q

The least-squares regression line, like any line, has

A

Slope and intercept
Change of y into y-hat

Slope b = r(Sy/Sx), where r is the correlation and Sx and Sy are the standard deviations of x and y

30
Q

When r^2 is close to 0, the regression line

A

Is not a good model for the data; the scatterplot has a hamburger shape, and almost none of the variation in y is explained by the regression line

31
Q

When r^2 is close to 1

A

The regression line fits the data well: almost 100% of the variation in y is explained by x

32
Q

The coefficient of determination, r^2

A

Represents the fraction (%) of the variation in the values of y that is explained by the least-squares regression of y on x.

33
Q

Regression is a common statistical setting, and least-squares regression is the most common method for

A

Fitting a regression line to data

34
Q

Least squares regression line always passes through

A

The point (x-bar, y-bar)

35
Q

Residual

A

The difference between an observed value of the response variable y and the value predicted by the regression line, y-hat
Residual = observed y - predicted y = y - y-hat

36
Q

The residuals show

A

How far the data fall from the regression line and how well the line describes the data.

37
Q

The mean of the least-squares residuals is

A

Always zero!

38
Q

A residual plot (diagnostic plot)

A

Is a scatterplot of the residuals versus the observed x values (or the y-hats, which lie on the regression line)

39
Q

If the residual plot shows uniform scatter of the points about the fitted line

A

Above and below with no unusual observations or systematic pattern, then the regression line captures the overall relationship well

40
Q

Residual plot - curved pattern

A

Relationship is not linear

41
Q

Residual plot - megaphone

A

Increasing or decreasing spread about the line as x increases; when the spread increases, predictions of y will be LESS accurate for larger x’s

42
Q

Individual points with large residuals are

A

Outliers in the vertical direction

43
Q

Influential observation

A

Is an outlier in either the x or y direction which, if removed, would markedly change the value of the slope and y-intercept
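
A rough sketch of an influential observation: the same line is fitted with and without one extra point that sits far out in the x direction. The data, the extra point (15, 2), and the helper name fit_line are all made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b

xs = [1, 2, 3, 4, 5]      # hypothetical data
ys = [2, 4, 5, 4, 5]
a_without, b_without = fit_line(xs, ys)

# Add one made-up point far out in the x direction and refit.
a_with, b_with = fit_line(xs + [15], ys + [2])
```

In this example the slope changes from positive to negative when the single point is included, and the intercept shifts as well.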

44
Q

Outlier

A

An observation that lies outside the overall pattern of the other observations

45
Q

Ecological correlation

A

A correlation based on group averages (means) rather than on individuals.
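
To illustrate, the sketch below computes a correlation on individuals and then on group means for three made-up groups. All numbers are hypothetical and were chosen so that the group-level correlation is stronger than the individual-level one.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation r between two equal-length lists."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Three hypothetical groups of (x, y) measurements on individuals.
groups = [
    [(1, 3), (2, 2), (3, 1)],
    [(3, 5), (4, 4), (5, 3)],
    [(5, 7), (6, 6), (7, 5)],
]

individuals = [pt for g in groups for pt in g]
r_individuals = correlation([x for x, _ in individuals], [y for _, y in individuals])

# Ecological correlation: one (mean x, mean y) point per group.
mean_x = [sum(x for x, _ in g) / len(g) for g in groups]
mean_y = [sum(y for _, y in g) / len(g) for g in groups]
r_groups = correlation(mean_x, mean_y)
```

Averaging hides the scatter inside each group, so r_groups overstates how tightly x and y are related for individuals.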

46
Q

Correlation measures

A

Direction and strength of the linear relationship between quantitative variables x and y

47
Q

Regression models

A

The linear relationship between x and y and can be used to predict a value for the response variable y for a specific value of the explanatory variable x

48
Q

What is total variation?

A

Sum of squared deviations about y-bar

49
Q

What is unexplained variation?

A

Sum of squared residuals, or the variation not explained by the regression line
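
A minimal sketch of total versus unexplained variation, and of how the two combine as R-sq = 1 - unexplained/total. The data are made up, and the line y-hat = 2.2 + 0.6x is assumed to be the least-squares line for that data.

```python
xs = [1, 2, 3, 4, 5]   # hypothetical data
ys = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6        # least-squares line for this particular data

y_bar = sum(ys) / len(ys)

# Total variation: squared deviations of y about y-bar.
total_variation = sum((y - y_bar) ** 2 for y in ys)

# Unexplained variation: squared residuals about the regression line.
unexplained_variation = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

r_squared = 1 - unexplained_variation / total_variation
```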

50
Q

Regression assumptions:

A

The relationship between x and y can be modeled by a straight line (residuals show randomness around the line)
The variation in the y’s about the line does not depend on the value of x (residuals are similar in size for all x’s)

51
Q

If the residual conditions (assumptions) are met

A

The residual plot has a shoe box shape: there is no pattern in the residuals

52
Q

A smile or frown pattern in a residual plot indicates

A

Non-linear relationship - violation of conditions (assumptions)

53
Q

Megaphone pattern in residual plot indicates

A

Non-constant variation (the variation in y depends on x)

54
Q

Shoe box residual plot with a point outside indicates

A

Outlier in either x or y direction

55
Q

An estimated statistical model

A

Regression equation

56
Q

Regression equation is an

A

Estimated statistical model

57
Q

r^2 is a measure of how

A

Successfully the regression explains the variation in the response, y

58
Q

The sum of squared residuals measures …… variation

A

The unexplained

59
Q

R-sq is a measure of the fraction of variation in y that is ….
R-sq = 1 - unexplained var/total var

A

Explained by x

60
Q

Residual plots help us magnify the residuals and identify ….. Sometimes we can see ….. observations and ….. which are much more visible on the residual plot.

A

Problems.
Unusual observations
Patterns

61
Q

A residual plot is a ….. of the residuals plotted against the x-values

A

Scatterplot

62
Q

Correlations based on ….. rather than on ….. can be misleading if they are interpreted to be about individuals

A

Averages ….. individuals

63
Q

Removing an influential point from the data set will change …

A

Slope and y-intercept