Chapter 6: Simple Regression And Correlation Analysis Flashcards

1
Q

What is a multivariate dataset

A

It is a set that consists of observations on two or more variables

2
Q

What is a bivariate data set

A

It is a data set that consists of paired observations on two variables; the pairs are denoted by (x1, y1), …, (xn, yn)

3
Q

What is a dependent variable

A

It is a variable which can be partially explained by the independent variable. Therefore, the dependent variable is dependent on the behaviour of the independent variable

4
Q

What is the assumption of changes in values of y and x

A

If it is assumed that changes in the value of y are explained by changes in the value of x, x is referred to as the independent variable and y is referred to as the dependent variable.

The independent variable is sometimes also called the predictor variable or the explanatory variable

The dependent variable is occasionally also called the response variable

5
Q

What is a scatter plot

A

A scatter plot is a graphical representation of the values of the dependent variable plotted against the values of the independent variable

The dependent variable is displayed on the vertical axis and the independent variable on the horizontal axis. Every pair of (x, y) values is plotted on the graph

6
Q

What is interpolation

A

Estimates a y-value for a given x-value inside the interval of observed x-values

7
Q

What is extrapolation

A

Estimates a y-value for a given x-value outside the interval of observed x-values
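
The distinction between interpolation and extrapolation comes down to whether the new x-value falls inside the range of observed x-values. A minimal sketch, assuming a hypothetical helper `classify_estimate` and made-up observations:

```python
def classify_estimate(x_new, observed_x):
    """Label an estimate at x_new as interpolation or extrapolation."""
    lo, hi = min(observed_x), max(observed_x)
    # Inside (or on the edge of) the observed interval: interpolation.
    if lo <= x_new <= hi:
        return "interpolation"
    # Outside the observed interval: extrapolation.
    return "extrapolation"


observed_x = [1, 2, 3, 4, 5]  # illustrative values only
print(classify_estimate(3.5, observed_x))
print(classify_estimate(8, observed_x))
```

Extrapolated estimates are generally less trustworthy, because nothing in the data confirms that the fitted relationship continues beyond the observed range.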

8
Q

What is the correlation coefficient

A

The correlation coefficient is a measure of the strength of the linear relationship between two variables

9
Q

What is the least squares curve

A

It makes the sum of the squared residuals as small as possible

10
Q

What is a time series

A

A time series is the set of observations made at different points in time with equal duration between each observation.

Estimating the long-term trend in the time series enables the investigator to make predictions into the future

Irregular variation is present in a time series when there are unpredictable movements in the series; as a result, it is difficult to describe mathematically

Regression methods (such as the method of least-squares) can be used to capture the long-term trend in a time series

11
Q

What is the relationship as x increases and y increases

A

Positive linear relationship

12
Q

What is the relationship as x increases and y increases but with greater variation

A

Weaker positive relationship

13
Q

What is the relationship as x increases and y decreases but with greater variation

A

Weaker negative relationship

14
Q

What is the relationship as x increases and y decreases

A

Negative relationship

15
Q

What is the relationship as x increases and y increases and decreases

A

Non-linear relationship

16
Q

What are the properties of the correlation coefficient

A

-1 ≤ r ≤ 1: the correlation coefficient always lies between -1 and 1

For the relationship between x and y:

  • r > 0 There is a positive linear relationship
  • r < 0 There is a negative linear relationship
  • r = 1 Perfect positive linear relationship
  • r = -1 Perfect negative linear relationship
  • r = 0 No linear relationship (a non-linear relationship may still exist)

17
Q

What is regression used for?

A

To forecast ŷ (y hat), the estimated y-value, for given x-values

18
Q

In the regression equation what does an estimate of a represent and what does an estimate of b represent

A

a denotes the y-intercept (the estimated value of y when x = 0)
b represents the gradient of the straight line (in other words, for each unit that x increases, the y value will, on average, increase by b units if b is positive or decrease by |b| units if b is negative)

19
Q

What does the y hat (ŷ) symbol indicate

A

It indicates the straight line fitted through the data points in the scatterplot in such a way that it best describes the overall linear relationship between the variables

NB!! ŷ also represents the estimated mean y value for a given x value

20
Q

What are residuals

A

A residual is the difference between the observed value (y) of the dependent variable and the value (ŷ) given by the estimated regression line
(y - ŷ)

21
Q

What is the least squares line

A

It is obtained by minimising the sum of the squares of the vertical distance between the observed points and the corresponding points on the line
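
As a rough illustration of this minimisation, the sketch below uses the standard closed-form estimates for a straight line, b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and a = ȳ - b·x̄. The function names and the data are made up for the example.

```python
def fit_least_squares(x, y):
    """Return (a, b) for the line y_hat = a + b*x that minimises
    the sum of squared vertical distances to the observed points."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx          # gradient
    a = y_bar - b * x_bar    # y-intercept
    return a, b


def sum_squared_residuals(x, y, a, b):
    """Sum of squared vertical distances between points and the line."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


x = [1, 2, 3, 4, 5]  # illustrative values only
y = [2, 4, 5, 4, 5]
a, b = fit_least_squares(x, y)
best = sum_squared_residuals(x, y, a, b)
# Nudging the gradient away from the fitted value increases the sum.
worse = sum_squared_residuals(x, y, a, b + 0.1)
print(a, b, best, worse)
```

Any other choice of a and b gives a sum of squared residuals at least as large as the fitted one, which is what "least squares" refers to.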

22
Q

What are the properties of the coefficient of determination

A

0 ≤ R² ≤ 1
- if R² = 1 It is a perfect fit of the curve to the observed data
- if R² = 0 The curve does not fit the observed data
- For a straight line it is true that r² = R²
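
One common way to compute the coefficient of determination is R² = 1 - SSE/SST, where SSE is the sum of squared residuals and SST is the total sum of squares around the mean of y. The sketch below, with made-up data and hypothetical function names, also checks that R² matches the squared correlation coefficient for a straight-line fit.

```python
import math


def r_squared_and_r(x, y):
    """Return (R squared, r) for a least-squares straight line through (x, y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)  # total sum of squares (SST)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    # Sum of squared residuals (SSE) around the fitted line.
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    r2 = 1 - sse / s_yy
    r = s_xy / math.sqrt(s_xx * s_yy)
    return r2, r


x = [1, 2, 3, 4, 5]  # illustrative values only
y = [2, 4, 5, 4, 5]
r2, r = r_squared_and_r(x, y)
print(r2, r ** 2)
```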

23
Q

What denotes the covariance between x and y

A

Sxy

24
Q

What are the standard deviations of x and y

A

Sx and Sy
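
The covariance and the two standard deviations combine into the correlation coefficient through the standard formula r = Sxy / (Sx · Sy). A minimal sketch, assuming sample statistics with an n - 1 divisor and made-up data:

```python
import math


def sample_covariance(x, y):
    """Sxy: sample covariance of x and y (n - 1 divisor)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)


def sample_std(values):
    """Sx: sample standard deviation, the square root of the variance."""
    return math.sqrt(sample_covariance(values, values))


def correlation(x, y):
    """r = Sxy / (Sx * Sy)."""
    return sample_covariance(x, y) / (sample_std(x) * sample_std(y))


x = [1, 2, 3, 4, 5]  # illustrative values only
y = [2, 4, 5, 4, 5]
print(sample_covariance(x, y), sample_std(x), sample_std(y), correlation(x, y))
```

The choice of divisor (n or n - 1) does not affect r, because it cancels between the numerator and the denominator.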

25
Q

What is the average of residuals

A

0
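
This property holds for a least-squares line that includes an intercept, and it can be checked numerically. The sketch below fits a line to made-up data and averages the residuals; the names are illustrative only.

```python
def residuals_of_fit(x, y):
    """Fit y_hat = a + b*x by least squares and return the residuals y - y_hat."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    a = y_bar - b * x_bar
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


x = [1, 2, 3, 4, 5]  # illustrative values only
y = [2, 4, 5, 4, 5]
residuals = residuals_of_fit(x, y)
mean_residual = sum(residuals) / len(residuals)
# The mean is zero up to floating-point rounding error.
print(residuals, mean_residual)
```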

26
Q

What is a residual plot

A

It is a scatterplot of the residuals (on the y-axis) against the x-values (on the x-axis)

27
Q

What is an outlier

A

It is an observation outside the general pattern of the other observations

28
Q

Is r dependent or independent of the units used when the observations are measured

A

Independent
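
A quick numerical check of this independence: rescaling either variable by a positive constant (for example, centimetres to metres) leaves r unchanged. The data and names below are made up for illustration.

```python
import math


def correlation(x, y):
    """Correlation coefficient r between paired observations x and y."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)


height_cm = [150, 160, 170, 180, 190]  # illustrative values only
weight_kg = [52, 61, 66, 77, 83]

height_m = [h / 100 for h in height_cm]   # same heights in metres
weight_g = [w * 1000 for w in weight_kg]  # same weights in grams

r_original = correlation(height_cm, weight_kg)
r_rescaled = correlation(height_m, weight_g)
print(r_original, r_rescaled)
```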