Regression Flashcards

Question 1

Q

Regression

Answer

A

Regression is a statistical method, widely used in machine learning, that models a target value
based on one or more independent predictors. It estimates the relationship between a dependent
variable and the independent variables. It is used for forecasting and for exploring possible
cause-and-effect relationships between variables.

Question 2

Q

Regression techniques differ based on:

Answer

A

The number of independent variables
The type of relationship between the independent and dependent variable

Question 3

Q

data used

Answer

A

Regression is performed when the dependent variable is of a continuous data type. The
independent variables, however, can be of any data type: continuous, nominal/categorical, etc.

Question 4

Q

regression methods do..

Answer

A

Regression methods find the line that describes the relationship between the dependent
variable and the predictors with the least error. In regression, the dependent variable is a function of the
independent variables, their coefficients and an error term.

Question 5

Q

Correlation

Answer

A

Correlation is a measure of the strength of a linear relationship between two quantitative variables
(e.g. price, sales).

Correlation is positive when the values increase together
Correlation is negative when one value decreases as the other increases

Question 6

Q

Correlation can have a value

Answer

A

1 is a perfect positive correlation
0 is no correlation (the values don’t seem linked at all)
-1 is a perfect negative correlation

Question 7

Q

cross tabs

Answer

A

Cross tabs help us establish a relationship between two variables. This relationship is exhibited in a tabular form
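
As a rough illustration of how a cross tab is built, the sketch below counts category pairs using only the standard library. The survey records and the category names (regions, products) are made up for the example.

```python
from collections import Counter

# Hypothetical survey records: (region, preferred product)
records = [
    ("North", "Tea"), ("North", "Coffee"), ("North", "Coffee"),
    ("South", "Tea"), ("South", "Tea"), ("South", "Coffee"),
]

# Count how often each (row, column) combination occurs
counts = Counter(records)

rows = sorted({r for r, _ in records})
cols = sorted({c for _, c in records})

# Build the table as a nested dict: table[row][column] -> count
table = {r: {c: counts[(r, c)] for c in cols} for r in rows}

# Print the counts in tabular form
print("Region".ljust(8) + "".join(c.ljust(8) for c in cols))
for r in rows:
    print(r.ljust(8) + "".join(str(table[r][c]).ljust(8) for c in cols))
```

Each cell shows how many records fall into that combination of the two variables.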

Question 8

Q

Column percentages

Answer

A

These are percentages within the columns, so that each column’s
percentages add up to 100%.
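
A minimal sketch of the calculation, assuming a small made-up table of counts: each cell is divided by its column total, so every column sums to 100%.

```python
# Hypothetical cross tab of counts: table[row][column]
table = {
    "North": {"Coffee": 2, "Tea": 1},
    "South": {"Coffee": 1, "Tea": 2},
}
cols = ["Coffee", "Tea"]

# Total count in each column
col_totals = {c: sum(row[c] for row in table.values()) for c in cols}

# Divide every cell by its column total to get column percentages
col_pct = {
    r: {c: 100 * table[r][c] / col_totals[c] for c in cols}
    for r in table
}

for c in cols:
    column = {r: round(col_pct[r][c], 1) for r in table}
    total = sum(col_pct[r][c] for r in table)
    print(c, column, "column sum:", round(total, 1))
```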

Question 9

Q

in cross tabs when the variables are not ordered..

Answer

A

When neither variable is ordered, we can simply refer to the strength of the
association without discussing its direction.

Question 10

Q

Scatterplots

Answer

A

A scatter plot (aka scatter chart, scatter graph) uses dots to represent values for two different numeric
variables.
The position of each dot on the horizontal and vertical axis indicates values for an individual
data point.
Scatter plots are used to observe relationships between variables.

Question 11

Q

What type of correlation is shown here?

Answer

A

This is a negative correlation. As we move along the x-axis toward the greater numbers,
the points move down which means the y-values are decreasing, making this a negative correlation.

Question 12

Q

Pearson’s r

Answer

A

The Pearson correlation coefficient is used to measure the strength of a linear association between
two variables. A value of r = 1 means a perfect positive correlation and a value of r = -1 means a
perfect negative correlation.

Question 13

Q

Requirements for Pearson’s correlation coefficient

Answer

A

Scale of measurement should be interval or ratio
Variables should be approximately normally distributed
The association should be linear
There should be no outliers in the data

Question 14

Q

What does this test do?
Pearson’s r

Answer

A

The Pearson product-moment correlation coefficient (or Pearson correlation coefficient, for short) is a
measure of the strength of a linear association between two variables and is denoted by ‘r’.
Basically, a Pearson product-moment correlation attempts to draw a line of best fit through the data of two
variables, and the Pearson correlation coefficient, r, indicates how far away all these data points are from
this line of best fit (i.e., how well the data points fit this line of best fit).
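
The coefficient can be computed directly from the deviations of each variable about its mean. The sketch below uses the usual textbook formula; the price and sales figures are invented for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: co-variation of x and y scaled by the spread of each."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: as price rises, sales fall
price = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [10.0, 8.0, 7.0, 4.0, 3.0]

r = pearson_r(price, sales)
print(round(r, 3))  # negative and close to -1 for this data
```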

Question 15

Q

What values can the Pearson correlation coefficient take?

Answer

A

The Pearson correlation coefficient, r, can take a range of values from +1 to -1.
A value of 0 indicates that there is no linear association between the two variables.
A value greater than 0 indicates a positive association; that is, as the value of one variable
increases, so does the value of the other variable.
A value less than 0 indicates a negative association; that is, as the value of one variable increases, the
value of the other variable decreases.

Question 16

Q

How can we determine the strength of association based on the Pearson correlation coefficient?

Answer

A

The stronger the association of the two variables, the closer the Pearson correlation coefficient, r, will
be to either +1 or -1 depending on whether the relationship is positive or negative, respectively.
Achieving a value of +1 or -1 means that all your data points are included on the line of best fit – there
are no data points that show any variation away from this line.
Values for r between +1 and -1 (for
example, r = 0.8 or -0.4) indicate that there is variation around the line of best fit. The closer the value
of r to 0 the greater the variation around the line of best fit.
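
To see this on concrete numbers, the sketch below compares three made-up data sets: one lying exactly on a line, one with some variation around a rising trend, and one with little linear pattern. The helper function is the standard deviation-based formula for r.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the means."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
y_on_line = [2 * v + 1 for v in x]   # every point lies exactly on y = 1 + 2x
y_close = [3, 6, 6, 10, 10]          # some variation around a rising line
y_scattered = [5, 1, 4, 2, 5]        # little linear pattern

for label, y in [("on line", y_on_line), ("close", y_close), ("scattered", y_scattered)]:
    print(label, round(pearson_r(x, y), 3))
```

The more the points vary around the line, the closer the printed value is to 0.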

Question 17

Q

If we use a simple linear regression model where y depends on x, then the regression line
of y on x is:

Answer

A

y = a + bx
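
A minimal sketch of how a and b might be estimated from data, assuming the usual least-squares formulas (slope from the co-variation of x and y, intercept from the means). The function name `fit_line` and the data are illustrative only.

```python
def fit_line(xs, ys):
    """Least-squares estimates of a (intercept) and b (slope) in y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept: the line passes through the means
    return a, b

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]      # made-up points lying exactly on y = 1 + 2x

a, b = fit_line(xs, ys)
print(a, b)

# Use the fitted line to predict y for a new x
print(a + b * 6.0)
```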

Question 18

Q

regression constant

Answer

A

The two constants a and b are the regression parameters: a is the intercept and b is the slope.
We also write b as b_yx and call it the regression coefficient of y on x.
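
The regression coefficient of y on x is tied to the correlation coefficient by the standard identity b_yx = r * Sy / Sx. The sketch below checks this numerically on made-up data; the variable names are illustrative.

```python
import math

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]   # hypothetical observations
n = len(xs)

mean_x, mean_y = sum(xs) / n, sum(ys) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)

b_yx = sxy / sxx                 # regression coefficient of y on x (slope)
a = mean_y - b_yx * mean_x       # intercept

r = sxy / math.sqrt(sxx * syy)   # Pearson correlation coefficient
s_x = math.sqrt(sxx / n)         # standard deviation of x
s_y = math.sqrt(syy / n)         # standard deviation of y

print(b_yx, r * s_y / s_x)       # the two values agree up to rounding
```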

Question 19

Q

least square method is suitable for

Answer

A

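The regression line of y on x can be called the line of best fit because
it is obtained by the method of least squares, which minimises the sum of squared vertical deviations of the points from the line.
This method is the most suitable for estimating the value
of y from x, i.e. the value of a dependent variable from an independent variable.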
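
One way to see why the least-squares line is the "best fit": any other line through the same data gives a larger sum of squared errors. The sketch below, on made-up data, fits the line and then nudges the intercept and slope to compare.

```python
def sse(a, b, xs, ys):
    """Sum of squared vertical distances between the points and y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]   # hypothetical observations
n = len(xs)

# Least-squares estimates of slope and intercept
mean_x, mean_y = sum(xs) / n, sum(ys) / n
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
a = mean_y - b * mean_x

best = sse(a, b, xs, ys)
print("least-squares line:", round(best, 3))

# Any nudged line has a larger error than the least-squares line
nudges = [(0.5, 0.0), (-0.5, 0.0), (0.0, 0.2), (0.0, -0.2)]
for da, db in nudges:
    print("nudged by", (da, db), "->", round(sse(a + da, b + db, xs, ys), 3))
```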

Question 20

Q

The standard form of the regression equation of variable x on y is:

Answer

A

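(x - x̄)/Sx = r (y - ȳ)/Sy

Here x̄ and ȳ are the means of x and y, Sx and Sy are their standard deviations, and r is the Pearson correlation coefficient.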
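
Rearranged, the standard form gives x = x̄ + r (Sx/Sy)(y - ȳ), which can be used to estimate x from a given y. A minimal sketch on made-up data (the function name `predict_x` is illustrative):

```python
import math

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]   # hypothetical observations
n = len(xs)

mean_x, mean_y = sum(xs) / n, sum(ys) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)

r = sxy / math.sqrt(sxx * syy)   # Pearson correlation coefficient
s_x = math.sqrt(sxx / n)         # standard deviation of x
s_y = math.sqrt(syy / n)         # standard deviation of y

def predict_x(y):
    """Regression of x on y: (x - mean_x)/s_x = r * (y - mean_y)/s_y, solved for x."""
    return mean_x + r * (s_x / s_y) * (y - mean_y)

print(predict_x(5.0))
print(predict_x(mean_y))         # the line passes through the point of means
```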

Question 21

Q

a regression line

Answer

A

In statistics, a regression line is a line that best describes the behaviour of a set of data. In other
words, it’s a line that best fits the trend of a given data set.

Question 22

Q

The regression line formula is like the following:

Answer

A

Y = a + bX + u

Question 23

Q

The multiple regression formula looks like this

Answer

A

Y = a + b1X1 + b2X2 + b3X3 + … + btXt + u

u is the regression residual (the error term).
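
To make the roles of the terms concrete, the sketch below evaluates the formula for one observation. The coefficients and values are invented, not fitted from data; the residual u is what is left over after the fitted part is subtracted from the observed Y.

```python
def predict(a, bs, xs):
    """Fitted part of the multiple regression: a + b1*X1 + b2*X2 + ..."""
    return a + sum(b * x for b, x in zip(bs, xs))

# Hypothetical intercept and coefficients b1, b2, b3
a = 1.0
bs = [2.0, -0.5, 3.0]

# One hypothetical observation: predictor values X1, X2, X3 and the observed Y
x_obs = [1.0, 4.0, 2.0]
y_obs = 8.0

fitted = predict(a, bs, x_obs)   # 1 + 2*1 - 0.5*4 + 3*2
u = y_obs - fitted               # residual for this observation

print(fitted, u)
```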

Question 24

Q

purpose of regression line

Answer

A

Regression lines are very useful for forecasting.
The purpose of the line is to describe the
relationship between a dependent variable (Y variable) and one or more independent variables (X variables).
By using the equation obtained from the regression line, an analyst can forecast the future behaviour of the dependent
variable by inputting different values for the independent ones. Regression lines are widely used in the financial
sector and in business in general.
Financial analysts employ linear regressions to forecast stock prices and commodity prices and to perform
valuations for many different securities.
Companies employ regressions for the purpose of
forecasting sales, inventories and many other variables that are crucial for strategy and planning.

Question 25

Q

Correlation

Answer

A

Correlation is a statistical technique which tells us how strongly a pair of variables are linearly related and
change together.
- It does not tell us the why and how behind the relationship; it only says that the relationship exists.
- Example: Correlation between ice cream sales and sunglasses sold.

Question 26

Q

Causation

Answer

A

Causation indicates that one event is the result of the occurrence of the other event; i.e. there is a
causal relationship between the two events. This is also referred to as cause and effect.