Regression Analysis, Time Series Analysis Flashcards

1
Q

How is the correlation coefficient notated?

A

If measured for a population, it is called ρ (rho)

If estimated from a sample, it is called r; i.e. r estimates ρ

2
Q

What is true of the correlation coefficient?

A
  • Correlation and covariance are appropriate for use with continuous variables whose distributions have the same shape (e.g. both normally distributed)
  • If these assumptions are not met, r will be ‘deflated’ and will underestimate ρ.
3
Q

What are the data in a regression analysis?

A
  • One continuous response variable (called y - dependent variable, response variable)
  • One or more continuous explanatory variables (called x - independent variable, explanatory variable, predictor variable, regressor variable)
4
Q

What is true of εi?

A

Mean of εi will be zero

5
Q

What does E(Y|X=x) = β0 + β1x mean?

A

The expected value of Y when X = x is β0 + β1x

6
Q

What is the true/population regression line?

A
  • Yi = β0 + β1xi + εi
  • β0 and β1 are constants to be estimated
  • εi is a random variable with mean = 0 if our line is going through the middle of our data
7
Q

How is the population regression line estimated?

A
  • ŷ = b0 + b1x
    • b0 and b1 are estimated values
    • ŷ is a fitted value of the response
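The estimates b0 and b1 come from least squares. A minimal sketch in Python, using made-up data (all values are hypothetical):

```python
# Least-squares estimates for a simple linear regression (toy data, hypothetical values)
x = [1, 2, 3, 4, 5]
y = [2.1, 4.0, 6.2, 7.9, 10.1]

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n

# b1 = Sxy / Sxx, b0 = ybar - b1 * xbar (the fitted line passes through (xbar, ybar))
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sxy / sxx
b0 = ybar - b1 * xbar

y_hat = [b0 + b1 * xi for xi in x]  # fitted values ŷ
```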
8
Q

What is a residual?

A

vertical distance between observed response and fitted value of response

9
Q

How are residuals estimated?

A

ri estimates εi, the error variable

10
Q

What is SSE?

A

The error sum of squares

SSE = Σ (yi − ŷi)², summed from i = 1 to n
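Computed directly from the residuals, here with toy numbers and a hypothetical fitted line ŷ = 0.09 + 1.99x:

```python
# SSE for toy data under a hypothetical fitted line ŷ = 0.09 + 1.99x
x = [1, 2, 3, 4, 5]
y = [2.1, 4.0, 6.2, 7.9, 10.1]
y_hat = [0.09 + 1.99 * xi for xi in x]

residuals = [yi - yh for yi, yh in zip(y, y_hat)]  # each ri estimates εi
sse = sum(r ** 2 for r in residuals)               # error sum of squares
```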

11
Q

What error assumptions do we make in regression analysis?

A
  • In our fitting we assume the errors have a particular distribution - that is, ε ~ N(0, σε²)
    • Normal distribution
    • Mean = 0
    • Constant variance = σε²
    • Errors associated with any two y values are independent
12
Q

What is sε?

A
  • sε = standard error of the estimate
  • Interpretation: the standard deviation of the residuals; the standard error in predicting Y from the regression equation. Best definition: the standard deviation around the prediction line
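For a simple regression, sε = √(SSE / (n − 2)). A sketch with hypothetical residuals:

```python
import math

# Standard error of the estimate from hypothetical residuals (simple regression, n = 5)
residuals = [0.02, -0.07, 0.14, -0.15, 0.06]
n = len(residuals)
sse = sum(r ** 2 for r in residuals)
s_e = math.sqrt(sse / (n - 2))  # divide by n - 2: two parameters (b0, b1) were estimated
```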
13
Q

What are the t stats in regression analysis output?

A

T = test statistic (testing that the population intercept/slope = 0 against a two-sided alternative), compared to a t distribution with n − 2 degrees of freedom; a P-value near 0 means the intercept/slope is not 0

14
Q

What is S in regression analysis output?

A

Standard Error of the Regression (S) = the average distance that observed values fall from the regression line

15
Q

What is R^2?

A
  • Determines the strength and significance of the association
  • coefficient of determination
  • measures proportion of total variation explained, i.e.
  • = explained variation / total variation = SSreg / SSy = (correlation coefficient)²
  • Will be between 0 and 1; a value close to 1 indicates most of the variation in y is explained by the regression equation
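The ratio can be computed either as SSreg / SSy or equivalently as 1 − SSE / SSy. A sketch with hypothetical observed and fitted values:

```python
# R² = explained variation / total variation = 1 - SSE / SSy (toy numbers, hypothetical)
y = [2.1, 4.0, 6.2, 7.9, 10.1]
y_hat = [2.08, 4.07, 6.06, 8.05, 10.04]

ybar = sum(y) / len(y)
ss_y = sum((yi - ybar) ** 2 for yi in y)               # total variation
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained variation
r_sq = 1 - sse / ss_y                                  # close to 1: most variation explained
```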
16
Q

What is important about R?

A

r = ±√R²; the sign of r matches the sign of the slope b1
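A small sketch of recovering r from R², with hypothetical values; the square root gives the magnitude, and the slope supplies the sign:

```python
import math

# Recover r from R² (hypothetical values)
r_sq = 0.9987
b1 = -1.99  # a negative slope implies a negative correlation
r = math.copysign(math.sqrt(r_sq), b1)
```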

17
Q

What is Homoscedasticity?

A

If variation is constant (residuals show constant spread around zero), called homoscedastic

18
Q

What is Heteroscedasticity?

A

If variation is non-constant (residuals show varying spread around zero), called heteroscedastic

19
Q

What is true about Large Standardised Residuals?

A

Minitab flags “Large Standardised Residuals” with an R. About 5% of observations should be flagged if the residuals are normally distributed

20
Q

What must be true to make predictions from a regression analysis?

A
  • High R-sq, small std error of estimate
  • All assumptions appear valid
  • Predictions should only be made for values inside the observed limits
21
Q

What does β1 represent in a multiple regression with 2 predictors?

A

β1 represents the expected change in Y when X1 is increased by one unit, but X2 is held constant or otherwise controlled

22
Q

What is meant by additive effects of multiple regression?

A

Combined effects of X1 and X2 are additive - if both X1 and X2 are increased by one unit, expected change in Y would be ( β1 + β2 )
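Both interpretations can be checked numerically. A sketch with a hypothetical fitted model ŷ = 5 + 2·x1 + 3·x2 (all coefficients made up):

```python
# Additive effects in a hypothetical fitted model: ŷ = 5 + 2*x1 + 3*x2
b0, b1, b2 = 5.0, 2.0, 3.0

def y_hat(x1, x2):
    return b0 + b1 * x1 + b2 * x2

# x1 up by one unit, x2 held constant: expected change = b1
change_x1 = y_hat(2, 4) - y_hat(1, 4)
# both up by one unit: expected change = b1 + b2
change_both = y_hat(2, 5) - y_hat(1, 4)
```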

23
Q

What must be true for us to find a Least Squares solution for a multiple regression?

A
  • Number of predictors is less than the number of observations
  • None of the independent variables are perfectly correlated with each other

24
Q

What is true of the coefficient of multiple determination?

A
  • Will go up as we add more explanatory terms to the model whether they are important or not
  • Often we use adjusted R-sq - it compensates for adding more variables, so it is lower than R-Sq when the added variables are not “important”
  • So, if comparing models with differing numbers of predictors, use Adjusted R-Sq to compare how much variation in response is explained by model
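The standard adjustment formula shows the penalty directly; the numbers below are hypothetical:

```python
# Adjusted R² penalises extra predictors (standard formula; numbers are hypothetical)
def adj_r_sq(r_sq, n, k):
    # n = number of observations, k = number of predictors
    return 1 - (1 - r_sq) * (n - 1) / (n - k - 1)

# Same raw R² = 0.90 from n = 20 observations:
with_1_predictor = adj_r_sq(0.90, 20, 1)
with_5_predictors = adj_r_sq(0.90, 20, 5)  # lower: the extra terms are penalised
```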
25
Q

What are the rules of dummy variable regression?

A
  • Can code any discrete variable with k categories into (k-1) distinct dummy variables
  • Usually only used when variables have 2 (sometimes 3) categories/levels
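The (k − 1) coding can be sketched as follows; the category names are made up, with the first level as the reference/baseline:

```python
# Coding a k = 3 category variable into k - 1 = 2 dummy variables
# ("low" is the hypothetical reference/baseline level)
levels = ["low", "medium", "high"]

def dummies(value):
    # one 0/1 indicator per non-baseline level
    return [1 if value == lvl else 0 for lvl in levels[1:]]
```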
26
Q

What is a polynomial regression?

A
  • Y = β0 + β1X + β2X^2 + β3X^3 + … + ε
  • Equivalent to fitting a multiple regression where
    • X1 = x
    • X2 = x^2
    • Xk = x^k
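The equivalence amounts to building one predictor per power of x. A hypothetical helper sketching this:

```python
# Polynomial regression as multiple regression: each power of x is its own predictor
# (hypothetical helper; degree 3 gives X1 = x, X2 = x^2, X3 = x^3)
def poly_predictors(x, degree):
    return [x ** p for p in range(1, degree + 1)]
```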
27
Q

What is completeness and interactions terms in polynomial regression?

A
  • Called “complete” if all lower order terms of x are significant
    • A model with only x and x^3 would be an incomplete third-order polynomial regression
  • Interaction Term
    • This is needed if the level of X1 affects the relationship
      between X2 and Y
  • e.g. Second order model with interaction
  • Y = β0 + β1X1 + β2X1^2 + β3X2 + β4X2^2 + β5X1X2 + ε
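The interaction term is what makes the X2 slope depend on X1. A sketch with hypothetical coefficients:

```python
# Second-order model with interaction: the X1*X2 term lets the level of X1
# change the relationship between X2 and Y (all coefficients are hypothetical)
def y_hat(x1, x2, b=(1.0, 2.0, 0.5, 3.0, 0.25, 4.0)):
    b0, b1, b2, b3, b4, b5 = b
    return b0 + b1*x1 + b2*x1**2 + b3*x2 + b4*x2**2 + b5*x1*x2

# Effect of a one-unit increase in x2 differs with x1 because of the interaction term
slope_at_x1_0 = y_hat(0, 2) - y_hat(0, 1)
slope_at_x1_1 = y_hat(1, 2) - y_hat(1, 1)
```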
28
Q

What is overparameterisation?

A
  • Polynomial regression
  • Because we’re fitting so many predictors (parameters) to so few observations, the regression may fit the data too well
  • Meaning that it might not predict the population accurately
  • Model doesn’t generalise
  • High R-sq
29
Q

What are the components of a time series?

A
  • Long term trend
  • Cyclical variation
  • Seasonal variation
  • Random variation
30
Q

What is long term trend?

A
  • Also called secular trend
  • Relatively smooth pattern or direction
  • Can be linear or non-linear
31
Q

What is cyclical variation?

A
  • Wave-like pattern around the long term trend, apparent over a number of years - the cyclical effect
  • Recurrence period over 1 year (definition)
  • e.g. Business cycles
  • Rare to find cyclical patterns that are consistent and predictable
32
Q

What is seasonal variation?

A
  • Cycles that occur over short repetitive calendar periods
  • Duration less than one year (definition)
  • “seasonal” may mean 4 seasons, or systematic patterns over a month/week/day
  • e.g. restaurant demand features “seasonal” variation throughout the day, monthly traffic volume
33
Q

What is random variation?

A
  • Irregular, unpredictable changes
  • Not caused by other components (trend, cyclical, seasonal variation)
  • Often referred to as “noise”
  • Can mask the existence of other components
  • Exists in all time series
  • Goal of most time series analysis is to reduce impact of random variation on forecasting or interpretation