Regression Flashcards

Question 1

Q

what is regression?

Answer

A

used to estimate equations that describe the relationship between variables, such as demand and cost functions, in the form Y = f(X) (or Y = f(X1, X2 … XN))

we use it to understand the impact of the regressor X, or the independent causal variable, on any one specific regressand Y, or the dependent affected variable

There will always be a SET of causal variables (X1, X2 … XN) impacting the dependent variable (Y), but for this class you only need to understand how to study the impact of any ONE X on Y at a time.

converts data clumps into comprehensible information.

Question 2

Q

differentiate between X and Y

Answer

A

1) independent VS dependent variable
2) regressor VS regressand
3) predetermined VS determined variable
4) serves as cause/stimulant VS effect/consequence

Question 3

Q

define ‘stochastic variable’. why is Y also known as this?

Answer

A

stochastic = random variable that can assume more than one value due to chance, AKA probabilistic variable

the value of Y is not fixed and depends on the outcome of a random process, e.g. rolling a die or random selection using a computer-generated number

Question 4

Q

describing the functional relationship Y = f(X)

Answer

A

either positive/direct where Y is directly proportional to X (use correct notation)
or negative/indirect/inverse where Y is inversely proportional to X (Y is proportional to 1/X)
positive has positive slope and upward sloping trajectory; negative has negative slope and downward sloping trajectory

Question 5

Q

converting regression data into a linear equation

Answer

A

the equation will indicate the curve’s TRAJECTORY (sign of gradient) and VALUE OF SLOPE (value of gradient)
Y = mX + h (population regression function PRF) OR Y^ = m^X + h^ (sample regression function SRF)

Question 6

Q

significance of h

Answer

A

h is the Y intercept: the value of Y when X is zero, i.e. the part of Y that is independent of X and remains in the absence of X

Question 7

Q

definition of ceteris paribus

Answer

A

with all else in a state of dormancy/passivity, NOT constancy (these variables can change; you are only holding them constant for the sake of observation)

Question 8

Q

what is the best way to collect data, and why?

Answer

A

randomisation, because it ensures all elements in the population have an equal chance of being selected and thus minimises selection bias and improves both representativeness and generalisability

Question 9

Q

format of functional fractions

Answer

A

effect / cause

this is done to examine the proportion of an effect RELATIVE to a specific cause

link to dy/dx - we want to see how much y changes after x changes by a certain amount
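
A minimal sketch of the effect / cause fraction as a ratio of changes. The two observations (output X going from 10 to 20, cost Y going from 100 to 130) and the function name are hypothetical, chosen only for illustration:

```python
def effect_over_cause(x1, y1, x2, y2):
    """Change in Y (effect) per unit change in X (cause) between two points."""
    delta_y = y2 - y1  # how much the effect moved
    delta_x = x2 - x1  # how much the cause moved
    return delta_y / delta_x


ratio = effect_over_cause(10, 100, 20, 130)
print(ratio)  # 3.0
```

Read as: each extra unit of X goes with 3 extra units of Y over this range. dy/dx is the same fraction taken over a very small change in X.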

Question 10

Q

why do we call it the X and Y axes?

Answer

A

X axis measures ONLY cause, cause is represented by the variable X

opposite for Y

Question 11

Q

trajectory VS slope

Answer

A

trajectory = direction and has NO VALUE; represented by the sign accompanying the value of the slope

slope = angle or gradient and has VALUE

slopedness

Question 12

Q

relationship between elasticity and slope of a curve

Answer

A

elasticity is reflected in the slope of the curve - the steeper the curve (the greater the absolute value of the slope), the more price INELASTIC

Question 13

Q

n VS N

Answer

A

the sample (of size n) is a SUBSET of the population (of size N), so n can never exceed N

the use of n will give you a result that is only an ESTIMATE

as n tends to N, estimation tends to certainty

Question 14

Q

conditions for n to ensure quality of statistical analysis

Answer

A

1) n>30
2) n must make up at LEAST 10% of N

Question 15

Q

hat notation

Question 16

Q

definition of goodness of fit

Answer

A

how well the sample data fits with the population data; measured by R^2 (coefficient of determination of goodness of fit)
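
A minimal sketch of how R^2 could be computed, assuming the common formula R^2 = 1 - (sum of squared gaps around the line) / (sum of squared gaps around the mean of Y). The data, the line Y = 0.6X + 2.2 and the function name are illustrative assumptions, not figures from class:

```python
def r_squared(xs, ys, m, h):
    """Share of the variation in Y that the line Y = mX + h accounts for."""
    y_bar = sum(ys) / len(ys)
    # Squared gaps between each observed Y and the line's prediction
    ss_res = sum((y - (m * x + h)) ** 2 for x, y in zip(xs, ys))
    # Squared gaps between each observed Y and the mean of Y
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r2 = r_squared(xs, ys, m=0.6, h=2.2)
print(round(r2, 3))  # 0.6
```

An R^2 of 1 would mean every point sits on the line; here the line accounts for about 60% of the variation in Y.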

Question 17

Q

3 methods of regression

Answer

A

1) OLS
2) Time-series data
3) Exponential/LOGIT

Question 18

Q

OLS method

Answer

A

Ordinary Least Squares

is used to find the line of best fit (or OLS regression line), i.e. the line for which the sum of the ‘squared gaps’ between the points on your scatter plot and the line is as small as possible
line closest to all the points in a fair way!
measure each gap, square the gaps (so that negative and positive gaps cannot cancel each other out) and add all the squared gaps to get a ‘total gap score’. You adjust the line until this score is minimised, which means you have found the line of best fit
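
The 'total gap score' idea can be sketched in a few lines of Python. This is a minimal illustration using the standard closed-form least-squares formulas for m and h rather than trial-and-error adjustment; the data points and function names are hypothetical:

```python
def ols_fit(xs, ys):
    """Return (m, h) for the least-squares line Y = mX + h."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Co-movement of X and Y, and spread of X, around their means
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    h = y_bar - m * x_bar
    return m, h


def total_gap_score(xs, ys, m, h):
    """Sum of squared vertical gaps between the points and the line."""
    return sum((y - (m * x + h)) ** 2 for x, y in zip(xs, ys))


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
m, h = ols_fit(xs, ys)
print(round(m, 3), round(h, 3))  # 0.6 2.2
```

Any other line through the same points, for example one with a slightly larger slope, gives a larger total gap score than the fitted one.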

Question 19

Q

time-series regression

Answer

A

using past patterns to estimate future data

Question 20

Q

for time-series regression, why is using the median year convenient? (for examples used in class, ONLY)

Answer

A

in the cases done in class, sigma X = 0 which results in several parts of m’s formula cancelling out –> simplifies calculation
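
A minimal sketch of why centring on the median year helps, assuming an odd number of consecutive years and the usual least-squares formulas. The years and sales figures are made up for illustration. With sigma X = 0, the slope reduces to m = sigma XY / sigma X^2 and the intercept to h = sigma Y / n:

```python
from statistics import median

years = [2019, 2020, 2021, 2022, 2023]
sales = [10, 12, 15, 19, 24]  # hypothetical Y values

mid = median(years)  # 2021, the median year
xs = [year - mid for year in years]  # coded X: -2, -1, 0, 1, 2

# The coded X values cancel out, so sigma X = 0
assert sum(xs) == 0

# With sigma X = 0 the formulas collapse to these short forms
m = sum(x * y for x, y in zip(xs, sales)) / sum(x * x for x in xs)
h = sum(sales) / len(sales)

# Forecast one year ahead: 2024 is coded as X = 3
forecast_2024 = h + m * (2024 - mid)
print(m, h, forecast_2024)  # 3.5 16.0 26.5
```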

Question 21

Q

for time-series regression, forecast estimates for how many years into the future are considered unreliable?

Answer

A

more than 5 years into the future - we DO NOT KNOW what will happen by that time!

no similar issue for precasts because we already know what has happened in the past and can use this knowledge to explain our results and adjust for various factors

Question 22

Q

exponential regression

Answer

A

used when the data changes (increases or decreases) exponentially rather than linearly, i.e. Y is multiplied by a roughly constant factor for each unit increase in X, instead of changing by the same amount each time
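
One common way to fit such data, sketched here as an assumption rather than as the method taught in class, is to take logs: if Y = a * b^X then ln Y = ln a + X ln b, which is a straight line in X and can be fitted by least squares. The data and function name are hypothetical, and note that this minimises squared gaps in ln Y rather than in Y itself:

```python
import math


def fit_exponential(xs, ys):
    """Fit Y = a * b**X by a straight-line fit on ln(Y). Needs every Y > 0."""
    logs = [math.log(y) for y in ys]
    n = len(xs)
    x_bar = sum(xs) / n
    l_bar = sum(logs) / n
    slope = sum((x - x_bar) * (l - l_bar) for x, l in zip(xs, logs)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = l_bar - slope * x_bar
    # Undo the logs: intercept is ln(a), slope is ln(b)
    return math.exp(intercept), math.exp(slope)


# Y doubles for every extra unit of X, starting from 3
a, b = fit_exponential([0, 1, 2, 3], [3, 6, 12, 24])
print(round(a, 6), round(b, 6))  # 3.0 2.0
```

Here b = 2 is the growth factor per unit of X; a value of b between 0 and 1 would describe exponential decay.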

Question 23

Q

how to describe concave to x upwards, downwards and convex to x upwards, downwards

Answer

A

1) concave upwards - for more of X, we get less and less of Y (Y increases at a decreasing rate)

2) concave downwards - for more of X, we give up more and more of Y (Y decreases at an increasing rate)

3) convex upwards - for more of X, we get more and more of Y (Y increases at increasing rate)

4) convex downwards - for more of X, we give up less and less of Y (decreases at decreasing rate)

the change in Y (delta Y) increases/decreases at an increasing/decreasing rate for the same increase in X
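
The four cases can be told apart by looking at delta Y for equal steps in X. A minimal sketch, assuming Y is observed at equally spaced, increasing X values; the function name and sample numbers are hypothetical:

```python
def describe_shape(ys):
    """Classify Y values observed at equally spaced, increasing X."""
    if len(ys) < 3:
        return "need at least three points"
    # delta Y for each equal step in X
    steps = [b - a for a, b in zip(ys, ys[1:])]
    # how delta Y itself changes from one step to the next
    changes = [b - a for a, b in zip(steps, steps[1:])]
    rising = all(s > 0 for s in steps)
    falling = all(s < 0 for s in steps)
    bending_down = all(c < 0 for c in changes)
    bending_up = all(c > 0 for c in changes)
    if rising and bending_down:
        return "concave upwards"    # increases at a decreasing rate
    if falling and bending_down:
        return "concave downwards"  # decreases at an increasing rate
    if rising and bending_up:
        return "convex upwards"     # increases at an increasing rate
    if falling and bending_up:
        return "convex downwards"   # decreases at a decreasing rate
    return "none of the four shapes"


print(describe_shape([0, 3, 5, 6]))    # concave upwards
print(describe_shape([10, 9, 7, 4]))   # concave downwards
print(describe_shape([0, 1, 3, 6]))    # convex upwards
print(describe_shape([10, 7, 5, 4]))   # convex downwards
```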

Question 24

Q

what is a learning curve?

Answer

A

the time it takes for an employee to learn the systems involved in the production process
visual representation of the relationship between task proficiency and experience in that job
based on the premise that individuals need time to become proficient at something new
therefore businesses need to invest in training to obtain a certain target output - over time, the trainee learns, becomes more efficient and therefore more productive
applied to business, the learning curve represents the relationship between cost and output

Question 25

Q

describe a learning curve

Answer

A

X - number of attempts at learning
Y - performance measure
- slow-paced, fast-paced and plateau phase

Question 26

Q

types of learning curves

Answer

A

1) diminishing returns - illustrates tasks quick to learn and early to plateau; manual tasks

2) increasing returns - tasks difficult to learn at first but rate of return is significant after some time; operating sophisticated instrument

3) sigmoid - AKA increasing-decreasing return curve, represents tasks difficult to learn initially, but begins to plateau once proficiency is obtained

e.g.: salesperson initially struggles, then rapidly improves as they gain experience, but eventually plateaus as they reach market saturation.

4) complex - learning trajectory traced over a long time period where the individual may experience a temporary belief of mastery, only to discover there is more to learn

e.g.: Machine learning models improve in bursts, sometimes requiring major adjustments or new data to overcome plateaus before further progress is made.

Question 27

Q

what is inflexion?

Answer

A

the point at which the rate of change changes and the trajectory stays the same. The curve is neither concave up nor concave down at that point and the CONCAVITY MIGHT BE CHANGING

Question 28

Q

second derivatives and inflexion

Answer

A

f''(x)>0 at a stationary point means minima (concave up - smile)
f''(x)<0 at a stationary point means maxima (concave down - frown)
f''(x)=0 means inflexion may occur at that value of x (f''(x) must change sign, from positive to negative or from negative to positive, as you pass through this point. If there is no change in sign, it is NOT an inflexion point and merely a stationary/flat region)
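
A small worked check of these rules, using two functions chosen purely for illustration: f(x) = x^3 - 3x, whose second derivative is 6x and whose stationary points are x = 1 and x = -1, and g(x) = x^4, whose second derivative is 12x^2. The helper name and step size are assumptions of this sketch:

```python
def f2(x):
    """Second derivative of f(x) = x**3 - 3*x."""
    return 6 * x


def g2(x):
    """Second derivative of g(x) = x**4."""
    return 12 * x ** 2


def classify(second, x, step=0.01):
    """Apply the second-derivative rules at x."""
    value = second(x)
    if value > 0:
        return "concave up (minimum if stationary)"
    if value < 0:
        return "concave down (maximum if stationary)"
    # Second derivative is zero: look just left and right for a sign change
    left, right = second(x - step), second(x + step)
    if left * right < 0:
        return "inflexion"
    return "no sign change, not an inflexion"


print(classify(f2, 1))   # concave up (minimum if stationary)
print(classify(f2, -1))  # concave down (maximum if stationary)
print(classify(f2, 0))   # inflexion
print(classify(g2, 0))   # no sign change, not an inflexion
```

The last line shows why the sign-change check matters: the second derivative of x^4 is zero at x = 0, yet the curve is concave up on both sides.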

Question 29

Q

why don’t parabolas have inflexion points?

Answer

A

inflexion occurs where the curve changes its concavity, but parabolas are always either fully concave up or fully concave down
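
This can be checked with the second derivative. A short worked derivation for a general parabola, where a, b and c are the usual coefficients and a is not zero:

```latex
f(x) = ax^2 + bx + c \quad (a \neq 0), \qquad
f'(x) = 2ax + b, \qquad
f''(x) = 2a
```

Since f''(x) = 2a is the same constant for every x, it never equals zero and never changes sign: the parabola is concave up everywhere if a > 0 and concave down everywhere if a < 0.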

Question 30

Q

define minima and maxima points

Answer

A

maxima - highest point on curve FOLLOWING WHICH there is a fall

minima - lowest point on curve FOLLOWING WHICH there is an increase