Regression Flashcards
what is regression?
used to estimate equations that describe the relationship between variables, such as demand and cost functions, in the form Y = f(X) (or Y = f(X1, X2, …, XN))
we use it to understand the impact of the regressor X (the independent, causal variable) on one specific regressand Y (the dependent, affected variable)
There will always be a SET of causal variables (X1, X2, …, XN) impacting the dependent variable Y, but for this class you only need to understand how to study the impact of ONE X on Y at a time.
it converts clumps of raw data into comprehensible information.
differentiate between X and Y
1) independent VS dependent variable
2) regressor VS regressand
3) predetermined VS determined variable
4) serves as cause/stimulant VS effect/consequence
define ‘stochastic variable’. why is Y also known as this?
stochastic = a random variable that can take more than one value due to chance, AKA a probabilistic variable
- the value of Y is not fixed and depends on the outcome of a random process, e.g. rolling a die or random selection using a computer-generated number
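A minimal simulation, assuming a made-up "true" relationship Y = 2X + 5 plus a normally distributed random error (the numbers and the error term are illustrative assumptions, not from the card): even with X held at 3, repeated draws of Y come out different, which is what makes Y stochastic.

```python
import random

# Hypothetical true relationship Y = 2X + 5; these values are invented for illustration.
M_TRUE, H_TRUE = 2.0, 5.0

def draw_y(x: float, rng: random.Random) -> float:
    """Return one observation of Y for a given X; the random error makes Y stochastic."""
    error = rng.gauss(0.0, 1.0)  # assumed random disturbance: mean 0, standard deviation 1
    return M_TRUE * x + H_TRUE + error

rng = random.Random(42)  # fixed seed so the run is reproducible
draws = [draw_y(3.0, rng) for _ in range(5)]
print(draws)  # five different values of Y for the same X = 3
```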
describing the functional relationship Y = f(X)
- either positive/direct, where Y is directly proportional to X (Y ∝ X)
- or negative/indirect/inverse, where Y is inversely proportional to X (Y ∝ 1/X)
- positive has positive slope and upward sloping trajectory; negative has negative slope and downward sloping trajectory
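A small numerical illustration of the two cases, assuming an arbitrary constant of proportionality k = 10: doubling X doubles Y when Y ∝ X, and halves Y when Y ∝ 1/X.

```python
# k = 10 is an arbitrary constant of proportionality chosen for illustration.
K = 10.0

def direct(x: float) -> float:
    return K * x      # Y ∝ X: Y rises as X rises (positive trajectory)

def inverse(x: float) -> float:
    return K / x      # Y ∝ 1/X: Y falls as X rises (negative trajectory)

for x in [1.0, 2.0, 4.0]:
    print(x, direct(x), inverse(x))
```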
converting regression data into a linear equation
- the equation will indicate the curve’s TRAJECTORY (sign of gradient) and VALUE OF SLOPE (value of gradient)
- Y = mX + h (population regression function, PRF) OR Ŷ = m̂X + ĥ (sample regression function, SRF)
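A sketch of how the sample estimates m̂ and ĥ of the SRF could be computed from data. The cards do not name an estimation method, so ordinary least squares (OLS) is an assumption here, and the price/quantity sample is made up.

```python
def fit_srf(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Estimate the SRF  Y^ = m^ X + h^  by ordinary least squares."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # m^ = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("X must vary to estimate a slope")
    m_hat = sxy / sxx
    h_hat = y_bar - m_hat * x_bar  # the fitted line passes through (x_bar, y_bar)
    return m_hat, h_hat

# Made-up sample: price (X, the cause) and quantity demanded (Y, the effect)
prices = [1.0, 2.0, 3.0, 4.0, 5.0]
quantities = [9.0, 7.0, 5.0, 3.0, 1.0]
m_hat, h_hat = fit_srf(prices, quantities)
print(m_hat, h_hat)  # -2.0 11.0
```

Here the negative sign of m̂ gives the trajectory (downward sloping) and 2 gives the value of the slope.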
significance of h
h is the Y-intercept: it tells us the value of Y in the absence of X (the part of Y that does not depend on X), i.e. the value of Y when X = 0
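Substituting X = 0 into the regression functions shows directly why the intercept is the value of Y in the absence of X:

```latex
Y = mX + h \;\Rightarrow\; Y\big|_{X=0} = m \cdot 0 + h = h,
\qquad
\hat{Y} = \hat{m}X + \hat{h} \;\Rightarrow\; \hat{Y}\big|_{X=0} = \hat{h}
```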
definition of ceteris paribus
"all other things being equal": with all else in a state of dormancy/passivity, NOT constancy (these variables can change; you are only holding them constant for the sake of observation)
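A minimal sketch of holding other variables fixed for the sake of observation, using a hypothetical demand function Q = 100 - 4P + 0.5I (P = price, I = income; the coefficients are invented). Income is held at one value while only price changes, so the change in Q is attributed to P alone.

```python
def quantity_demanded(price: float, income: float) -> float:
    """Hypothetical demand function; the coefficients are invented for illustration."""
    return 100.0 - 4.0 * price + 0.5 * income

income_held = 60.0  # income could change; it is only held fixed for this observation
q_before = quantity_demanded(price=5.0, income=income_held)
q_after = quantity_demanded(price=6.0, income=income_held)
effect_of_price = q_after - q_before  # change in Q caused by P, ceteris paribus
print(q_before, q_after, effect_of_price)  # 110.0 106.0 -4.0
```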
what is the best way to collect data, and why?
randomisation, because it ensures all elements in the population have an equal chance of being selected and thus minimises selection bias and improves both representativeness and generalisability
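A sketch of simple random sampling without replacement, where every unit in the population has the same chance of being chosen. The population of 1000 labelled units and the sample size of 100 are made up; the seed is fixed only so the run is reproducible.

```python
import random

population = list(range(1, 1001))       # hypothetical population of N = 1000 labelled units
rng = random.Random(7)                  # seeded only for reproducibility
sample = rng.sample(population, k=100)  # simple random sample of n = 100, no repeats

print(len(sample), len(set(sample)))    # 100 100
```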
format of functional fractions
effect / cause
this is done to examine the proportion of an effect RELATIVE to a specific cause
link to dY/dX - we want to see how much Y changes when X changes by a given amount
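A small numerical sketch of the effect/cause fraction, assuming a hypothetical relationship Y = X² chosen only for illustration: as the change in X shrinks, ΔY/ΔX at X = 3 gets closer to the derivative dY/dX = 2X = 6.

```python
def f(x: float) -> float:
    """Hypothetical nonlinear relationship Y = X^2, chosen only for illustration."""
    return x ** 2

x0 = 3.0
ratios = []
for dx in [1.0, 0.1, 0.01, 0.001]:
    ratio = (f(x0 + dx) - f(x0)) / dx  # effect / cause = ΔY / ΔX
    ratios.append(ratio)
    print(dx, ratio)
# the ratios approach dY/dX = 2 * 3 = 6 as ΔX shrinks
```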
why do we call it the X and Y axes?
- the X axis measures ONLY the cause, which is represented by the variable X
- the Y axis measures ONLY the effect, which is represented by the variable Y
trajectory VS slope
trajectory = direction and has NO VALUE; represented by the sign accompanying the value of the slope
slope = angle or gradient and has VALUE
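A sketch that splits a gradient into its two parts: the trajectory (the sign only) and the value of the slope (the magnitude), plus the slope expressed as an angle to the X axis. The sample gradients are arbitrary.

```python
import math

def describe_slope(m: float) -> tuple[str, float]:
    """Split a gradient into its trajectory (sign only) and its value (magnitude)."""
    if m > 0:
        trajectory = "positive (upward sloping)"
    elif m < 0:
        trajectory = "negative (downward sloping)"
    else:
        trajectory = "flat (no linear effect of X on Y)"
    return trajectory, abs(m)

for m in [2.0, -2.0, -0.5]:
    trajectory, value = describe_slope(m)
    angle = math.degrees(math.atan(m))  # slope as an angle to the X axis
    print(m, trajectory, value, round(angle, 1))
```

Note that 2.0 and -2.0 share the same value of slope but have opposite trajectories.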
slopedness
relationship between elasticity and slope of a curve
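elasticity is closely linked to the slope of the curve - with price on the vertical axis (the usual demand diagram), the steeper the curve, the more price INELASTIC demand is at a given point; strictly, point elasticity = (dQ/dP) × (P/Q), so it also depends on where on the curve you are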
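A minimal sketch, assuming a hypothetical linear demand curve Q = 100 - 4P: its slope dQ/dP = -4 is the same at every point, yet the point elasticity (dQ/dP) × (P/Q) changes along the curve, so slope is a guide to elasticity rather than the whole story.

```python
def point_elasticity(price: float, a: float = 100.0, b: float = 4.0) -> float:
    """Point price elasticity of demand for the hypothetical linear demand Q = a - b*P.

    elasticity = (dQ/dP) * (P / Q), where dQ/dP = -b at every price.
    """
    q = a - b * price
    if q <= 0:
        raise ValueError("price too high: quantity demanded must be positive")
    return -b * price / q

for p in [5.0, 12.5, 20.0]:
    print(p, point_elasticity(p))  # -0.25 (inelastic), -1.0 (unit elastic), -4.0 (elastic)
```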
n VS N
n (sample size) is a SUBSET of N (population size)
the use of n will give you a result that is only an ESTIMATE
as n tends to N, estimation tends to certainty
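A sketch of the idea that a sample gives only an estimate, and that the estimate becomes exact once the sample is the whole population. The population of 1000 uniform values is made up and seeded for reproducibility; for small n the error is random, so it only tends to shrink as n grows.

```python
import random
import statistics

rng = random.Random(1)
# Hypothetical population of N = 1000 values
population = [rng.uniform(0, 100) for _ in range(1000)]
mu = statistics.mean(population)  # population parameter

errors = {}
for n in [10, 100, 500, 1000]:
    sample = rng.sample(population, k=n)
    errors[n] = abs(statistics.mean(sample) - mu)  # error of the sample estimate
    print(n, round(errors[n], 4))
# at n = N the "sample" is the whole population, so the error is zero
```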
conditions for n to ensure quality of statistical analysis
1) n>30
2) n must make up at LEAST 10% of N
hat notation