Regression Analysis, Time Series Analysis Flashcards

Question 1

Q

How is the correlation coefficient notated?

Answer

A

If measured for a population, called ρ (rho)

If estimated from a sample, called r; i.e. r estimates ρ
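
A minimal Python sketch of how r is computed from a sample. The x and y values are made up for illustration and the function name is hypothetical.

```python
from math import sqrt

def sample_correlation(x, y):
    """Sample correlation coefficient r, the estimate of rho."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products about the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

# Illustrative sample (assumed values)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = sample_correlation(x, y)
print(round(r, 4))  # 0.7746
```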

Question 2

Q

What is true of the correlation coefficient?

Answer

A

Correlation and covariance are appropriate for use with continuous variables whose distributions have the same shape (e.g. both normally distributed)
If these assumptions are not met, r will be ‘deflated’ and will underestimate ρ.

Question 3

Q

What are the data in a regression analysis?

Answer

A

One continuous response variable (called y - dependent variable, response variable)
One or more continuous explanatory variables (called x - independent variable, explanatory variable, predictor variable, regressor variable)

Question 4

Q

What is true of εi?

Answer

A

Mean of εi will be zero

Question 5

Q

What does E(Y|X=x) = β0 + β1x mean?

Answer

A

The expected value of Y when X = x is β0 + β1x

Question 6

Q

What is the true/population regression line?

Answer

A

Yi = β0 + β1xi + εi
β0 and β1 are constants to be estimated
εi is a random variable with mean = 0 if our line is going through the middle of our data

Question 7

Q

How is the population regression line estimated?

Answer

A

ŷ = b0 + b1x
- b0 and b1 are estimated values
- ŷ is a fitted value of the response

Question 8

Q

What is a residual?

Answer

A

vertical distance between observed response and fitted value of response

Question 9

Q

How are residuals estimated?

Answer

A

ri estimates εi, the error variable

Question 10

Q

What is SSE?

Answer

A

The error sum of squares

SSE = Σ (i = 1 to n) (yi - ŷi)^2
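
A short Python sketch of the error sum of squares for a fitted line ŷ = b0 + b1x. The data are made up, and `fit_line` is a hypothetical helper using the usual least squares formulas for one predictor.

```python
def fit_line(x, y):
    """Least squares estimates b0 and b1 for y-hat = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Illustrative sample (assumed values)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = fit_line(x, y)
fitted = [b0 + b1 * xi for xi in x]
# Residual = observed response minus fitted response
residuals = [yi - fi for yi, fi in zip(y, fitted)]
sse = sum(res ** 2 for res in residuals)
print(round(b0, 2), round(b1, 2), round(sse, 2))  # 2.2 0.6 2.4
```

The residuals from a least squares line with an intercept sum to zero (up to rounding), matching the mean-zero error assumption.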

Question 11

Q

What error assumptions do we make in regression analysis?

Answer

A

In our fitting we assume the errors have a particular distribution - that is, ε ~ N(0, σε^2)
- Normal distribution
- Mean = 0
- Constant variance = σε^2
- Errors associated with any two y values are independent

Question 12

Q

What is sε?

Answer

A

sε = standard error of the estimate
Interpretation - standard deviation of residuals; standard error in predicting Y from the regression equation - best definition: standard deviation around prediction line

Question 13

Q

What are the t stats in regression analysis output?

Answer

A

T = test statistic for the null hypothesis that the population intercept/slope = 0 against a two-sided alternative, compared to a t distribution with n-2 degrees of freedom. If this gives P ≈ 0, conclude the intercept/slope is not 0
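
A hedged Python sketch of where the T values come from in simple linear regression. The data are made up. The standard error formulas are the usual textbook ones for one predictor and are not stated on the cards. The P-value step is left out because it needs a t table or a statistics package.

```python
from math import sqrt

# Illustrative sample (assumed values)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = mean_y - b1 * mean_x

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s_e = sqrt(sse / (n - 2))  # standard error of the estimate

# Standard errors of the coefficients
se_b1 = s_e / sqrt(sxx)
se_b0 = s_e * sqrt(1 / n + mean_x ** 2 / sxx)

# T = estimate / standard error, compared to t with n - 2 degrees of freedom
t_b0 = b0 / se_b0
t_b1 = b1 / se_b1
print(round(t_b0, 3), round(t_b1, 3))  # 2.345 2.121
```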

Question 14

Q

What is S in regression analysis output?

Answer

A

Standard Error of the Regression (S) = average distance that values fall from regression line

Question 15

Q

What is R^2?

Answer

A

Determine the strength and significance of association
coefficient of determination
measures proportion of total variation explained, i.e.
= explained variation / total variation = SSreg / SSy =(correlation coefficient)^2
Will be between 0 and 1; a value close to 1 indicates most of the variation in y is explained by the regression equation
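
A small Python check, on made-up data, that explained variation / total variation equals the squared correlation coefficient when there is one predictor. Variable names are illustrative.

```python
# Illustrative sample (assumed values)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
ss_y = sum((yi - mean_y) ** 2 for yi in y)  # total variation

b1 = sxy / sxx
b0 = mean_y - b1 * mean_x
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

ss_reg = ss_y - sse           # explained variation
r_squared = ss_reg / ss_y
r = sxy / (sxx * ss_y) ** 0.5  # sample correlation coefficient
print(round(r_squared, 3), round(r ** 2, 3))  # 0.6 0.6
```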

Question 16

Q

What is important about r?

Answer

A

r = ± √(R^2), taking the same sign as the slope b1

Question 17

Q

What is Homoscedasticity?

Answer

A

If variation is constant (residuals show constant spread around zero), called homoscedastic

Question 18

Q

What is Heteroscedasticity?

Answer

A

If variation is non-constant (residuals show varying spread around zero), called heteroscedastic

Question 19

Q

What is true about Large Standardised Residuals?

Answer

A

Minitab flags “Large Standardised Residuals” with an R - about 5% of observations should be flagged if the residuals are normally distributed

Question 20

Q

What must be true to make predictions from a regression analysis?

Answer

A

High R-sq, small std error of estimate
All assumptions appear valid
Predictions should only be made for values inside the observed limits
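
One way to enforce the last rule in code, as a sketch only. `predict` is a hypothetical helper, and the coefficients 2.2 and 0.6 are assumed to come from an earlier fit.

```python
def predict(x_new, x_observed, b0, b1):
    """Fitted value at x_new, refusing to extrapolate beyond the observed x range."""
    lowest, highest = min(x_observed), max(x_observed)
    if not lowest <= x_new <= highest:
        raise ValueError(
            f"x = {x_new} is outside the observed limits [{lowest}, {highest}]"
        )
    return b0 + b1 * x_new

# Observed x values and assumed fitted coefficients
x_observed = [1, 2, 3, 4, 5]
inside = predict(3.5, x_observed, b0=2.2, b1=0.6)
print(round(inside, 2))  # 4.3
```

Calling `predict(10, x_observed, 2.2, 0.6)` raises a `ValueError` instead of returning an extrapolated value.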

Question 21

Q

What does β1 represent in a multiple regression with 2 predictors?

Answer

A

β1 represents the expected change in Y when X1 is increased by one unit, but X2 is held constant or otherwise controlled

Question 22

Q

What is meant by additive effects of multiple regression?

Answer

A

Combined effects of X1 and X2 are additive - if both X1 and X2 are increased by one unit, expected change in Y would be ( β1 + β2 )
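
A tiny Python illustration of additivity with hypothetical coefficients (β0 = 1, β1 = 2, β2 = 3 are assumed values, not estimates from any data).

```python
def expected_y(x1, x2, b0=1.0, b1=2.0, b2=3.0):
    """Expected response for a two-predictor model with assumed coefficients."""
    return b0 + b1 * x1 + b2 * x2

base = expected_y(4, 10)
change_x1_only = expected_y(5, 10) - base  # X2 held constant: equals b1
change_both = expected_y(5, 11) - base     # both increased by one: equals b1 + b2
print(change_x1_only, change_both)  # 2.0 5.0
```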

Question 23

Q

What must be true for us to find a Least Squares solution for a multiple regression?

Answer

A

- Number of predictors is less than number of observations
- None of the independent variables are perfectly correlated with each other
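
A sketch of why perfect correlation between predictors breaks least squares: the matrix X'X in the normal equations becomes singular (determinant 0), so there is no unique solution. The data and helper names are made up for illustration.

```python
def gram_matrix(columns):
    """X'X for a design matrix supplied as a list of columns."""
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in columns] for ci in columns]

def det3(m):
    """Determinant of a 3 x 3 matrix."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

ones = [1, 1, 1, 1]           # intercept column
x1 = [1, 2, 3, 4]
x2_ok = [2, 1, 4, 3]          # not perfectly correlated with x1
x2_collinear = [2, 4, 6, 8]   # exactly 2 * x1

det_ok = det3(gram_matrix([ones, x1, x2_ok]))
det_collinear = det3(gram_matrix([ones, x1, x2_collinear]))
print(det_ok, det_collinear)  # 64 0
```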

Question 24

Q

What is true of the coefficient of multiple determination?

Answer

A

Will go up as we add more explanatory terms to the model whether they are important or not
Often we use adjusted R-sq - compensates for adding more variables, so it is lower than R-Sq when variables are not “important”
So, if comparing models with differing numbers of predictors, use Adjusted R-Sq to compare how much variation in response is explained by model
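
A sketch of the usual adjusted R-sq formula, 1 - (1 - R-sq)(n - 1)/(n - k - 1), where n is the number of observations and k the number of predictors. The formula is the standard one and is not given on the card. The R-sq values below are hypothetical.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R-sq for n observations and k predictors."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

# Hypothetical models fitted to the same 20 observations
adj_small = adjusted_r_squared(0.60, n=20, k=1)  # one predictor
adj_large = adjusted_r_squared(0.61, n=20, k=4)  # three unimportant predictors added
print(round(adj_small, 3), round(adj_large, 3))  # 0.578 0.506
```

R-sq rose from 0.60 to 0.61, but adjusted R-sq fell, so the smaller model would be preferred here.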

Question 25

Q

What are the rules of dummy variable regression?

Answer

A

Can code any discrete variable with k categories into (k-1) distinct dummy variables
Usually only used when variables have 2 (sometimes 3) categories/levels
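
A minimal Python sketch of coding k categories into (k-1) dummy variables. The first category in sorted order is assumed to be the baseline (all dummies 0). `dummy_code` and the region labels are hypothetical.

```python
def dummy_code(values):
    """Code a categorical variable with k levels into k - 1 dummy (0/1) variables."""
    levels = sorted(set(values))
    baseline, others = levels[0], levels[1:]
    # One row per observation, one 0/1 column per non-baseline level
    rows = [[1 if value == level else 0 for level in others] for value in values]
    return baseline, others, rows

baseline, dummy_names, rows = dummy_code(["north", "south", "west", "south"])
print(baseline, dummy_names)  # north ['south', 'west']
print(rows)                   # [[0, 0], [1, 0], [0, 1], [1, 0]]
```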

Question 26

Q

What is a polynomial regression?

Answer

A

Y = β0 + β1X + β2X^2 + β3X^3 + … + ε
Equivalent to fitting a multiple regression where
- X1 = x
- X2 = x^2
- Xk = x^k
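
A short sketch of this equivalence: the powers of x are built as separate predictor columns, which can then be passed to any multiple regression routine. `polynomial_columns` is a hypothetical helper.

```python
def polynomial_columns(x, k):
    """Predictor columns X1 = x, X2 = x^2, ..., Xk = x^k."""
    return [[xi ** power for xi in x] for power in range(1, k + 1)]

columns = polynomial_columns([1, 2, 3], k=3)
print(columns)  # [[1, 2, 3], [1, 4, 9], [1, 8, 27]]
```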

Question 27

Q

What are completeness and interaction terms in polynomial regression?

Answer

A

Called “complete” if all lower order terms of x are included
- A third order polynomial regression with only x and x^3 would be incomplete
Interaction Term
- This is needed if the level of X1 affects the relationship between X2 and Y
e.g. Second order model with interaction
Y = β0 + β1X1 + β2X1^2 + β3X2 + β4X2^2 + β5X1X2 + ε
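
A sketch of what the interaction term does, using hypothetical coefficients for the second order model: the effect on Y of raising X2 by one unit depends on the level of X1, and the gap between the two effects is β5 times the difference in X1.

```python
def expected_y(x1, x2, coefficients=(1.0, 2.0, 0.5, 3.0, 0.25, 1.5)):
    """Second order model with interaction; coefficients are assumed values."""
    b0, b1, b2, b3, b4, b5 = coefficients
    return b0 + b1 * x1 + b2 * x1 ** 2 + b3 * x2 + b4 * x2 ** 2 + b5 * x1 * x2

def effect_of_x2(x1, x2):
    """Change in expected Y when X2 goes from x2 to x2 + 1 at a fixed X1."""
    return expected_y(x1, x2 + 1) - expected_y(x1, x2)

effect_low = effect_of_x2(x1=0, x2=1)
effect_high = effect_of_x2(x1=2, x2=1)
print(effect_low, effect_high)  # 3.75 6.75
```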

Question 28

Q

What is overparameterisation?

Answer

A

Polynomial regression
Because we’re fitting so many predictors (parameters) to so few observations, the regression may fit the data too well
Meaning that it might not predict the population accurately
Model doesn’t generalise
High R-sq
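
An extreme illustration in Python, on made-up data. With 5 observations and a fourth order polynomial (5 parameters), the least squares fit passes through every point, so SSE = 0 and R-sq = 1. The sketch evaluates that polynomial by Lagrange interpolation rather than by running a regression. The straight-line coefficients 2.2 and 0.6 are the least squares fit to the same data.

```python
def interpolate(x, y, x_new):
    """Value at x_new of the degree n-1 polynomial through all n points."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(x, y)):
        weight = 1.0
        for j, xj in enumerate(x):
            if j != i:
                weight *= (x_new - xj) / (xi - xj)
        total += yi * weight
    return total

# Illustrative sample (assumed values)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# The overparameterised model reproduces the observed data exactly
sse = sum((yi - interpolate(x, y, xi)) ** 2 for xi, yi in zip(x, y))

# One step beyond the data the two models disagree sharply
poly_at_6 = interpolate(x, y, 6)
line_at_6 = 2.2 + 0.6 * 6
print(sse, round(poly_at_6, 2), round(line_at_6, 2))  # 0.0 17.0 5.8
```

The perfect fit says nothing about how well the model predicts new observations.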

Question 29

Q

What are the components of a time series?

Answer

A

Long term trend
Cyclical variation
Seasonal variation
Random variation

Question 30

Q

What is long term trend?

Answer

A

Also called secular trend
Relatively smooth pattern or direction
Can be linear or non-linear

Question 31

Q

What is cyclical variation?

Answer

A

Wave-like pattern describing long term trend apparent over a number of years - cyclical effect
Recurrence period over 1 year (definition)
e.g. Business cycles
Rare to find cyclical patterns that are consistent and predictable

Question 32

Q

What is seasonal variation?

Answer

A

Cycles that occur over short repetitive calendar periods
Duration less than one year (definition)
“seasonal” may mean 4 seasons, or systematic patterns over a month/week/day
e.g. restaurant demand features “seasonal” variation throughout the day, monthly traffic volume

Question 33

Q

What is random variation?

Answer

A

Irregular, unpredictable changes
Not caused by other components (trend, cyclical, seasonal variation)
Often referred to as “noise”
Can mask the existence of other components
Exists in all time series
Goal of most time series analysis is to reduce impact of random variation on forecasting or interpretation
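
One common smoothing technique, not named on the cards, is a moving average. The sketch below uses a made-up series with a linear trend plus a seasonal pattern of period 4. The random component is left out so the output is reproducible. Averaging over one full seasonal period cancels the seasonal pattern and leaves the trend.

```python
def moving_average(series, window):
    """Simple moving average over consecutive blocks of `window` observations."""
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]

trend = list(range(8))          # long term trend: rises by 1 each period
seasonal = [1, -1, 2, -2] * 2   # seasonal pattern of period 4, sums to zero
series = [t + s for t, s in zip(trend, seasonal)]

smoothed = moving_average(series, window=4)
print(series)    # [1, 0, 4, 1, 5, 4, 8, 5]
print(smoothed)  # [1.5, 2.5, 3.5, 4.5, 5.5]
```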