Quantitative methods Flashcards

1
Q

Multiple regression model assumptions

A
  • Linearity: the relationship between the dependent and independent variables is linear.
  • Homoskedasticity –> the variance of the residuals is constant.
  • Independence of errors –> the residuals are not serially correlated.
  • Normality –> the error term is normally distributed (evaluated with a Q-Q plot).
  • Independence of the independent variables –> no linear relationships among the independent variables.
2
Q

MSR

A

MSR = RSS/k

3
Q

MSE

A

MSE = SSE/(n−k−1)

4
Q

SST

A

SST = RSS + SSE (total variation = explained + unexplained variation)

5
Q

R2

A

R² = RSS/SST
or
(SST − SSE)/SST
or
(total variation − unexplained variation)/total variation

Indicates how much of the variation in the dependent variable the independent variables can explain.
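The ANOVA pieces above (SST, RSS, SSE, MSR, MSE, R²) can be sketched with a toy numpy regression; the data and variable names here are illustrative, not from the curriculum:

```python
import numpy as np

# Illustrative data: y depends roughly linearly on two regressors.
X = np.column_stack([
    np.ones(8),                  # intercept
    [1, 2, 3, 4, 5, 6, 7, 8],    # x1
    [2, 1, 4, 3, 6, 5, 8, 7],    # x2
])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9, 14.2, 15.8])

beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # OLS fit
y_hat = X @ beta
n, k = len(y), X.shape[1] - 1                 # k = number of slope coefficients

SST = np.sum((y - y.mean()) ** 2)        # total variation
RSS = np.sum((y_hat - y.mean()) ** 2)    # explained (regression) variation
SSE = np.sum((y - y_hat) ** 2)           # unexplained variation

MSR = RSS / k                  # mean square regression
MSE = SSE / (n - k - 1)        # mean square error
R2 = RSS / SST                 # equals (SST - SSE) / SST
```

With an intercept in the model, SST = RSS + SSE holds by construction, which is why the two R² formulas agree.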

6
Q

Breusch–Pagan

A

BP = n × R², where R² is from a regression of the squared residuals on the independent variables; chi-square distributed with k degrees of freedom
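A minimal numpy/scipy sketch of the Breusch–Pagan statistic; the simulated series, seed, and single-regressor setup are invented for illustration:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(42)
n = 200
x = rng.normal(size=n)
# Simulate residuals whose spread rises with |x| (conditional heteroskedasticity).
resid = rng.normal(size=n) * (1 + np.abs(x))

# Auxiliary regression: squared residuals on the independent variable(s).
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, resid ** 2, rcond=None)
fitted = X @ beta
ss_res = np.sum((resid ** 2 - fitted) ** 2)
ss_tot = np.sum((resid ** 2 - np.mean(resid ** 2)) ** 2)
r2_aux = 1 - ss_res / ss_tot

bp_stat = n * r2_aux               # BP = n * R^2 of the auxiliary regression
critical = chi2.ppf(0.95, df=1)    # chi-square critical value, df = k = 1
```

Reject the null of homoskedasticity when bp_stat exceeds the chi-square critical value.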

7
Q

Adjusted R2

A

1 − ((n−1)/(n−k−1)) × (1−R²)

o A measure of goodness of fit that adjusts for the number of independent variables
o Adjusted R² < R²
o Decreases when an added independent variable adds little value to the regression model
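The formula above as a one-line helper (the n = 50 example values are made up to show the penalty growing with k):

```python
def adjusted_r2(r2, n, k):
    """Goodness of fit adjusted for the number of independent variables k."""
    return 1 - ((n - 1) / (n - k - 1)) * (1 - r2)

# With n = 50 observations, the same R^2 is penalized more as k grows,
# so adjusted R^2 falls below R^2 and shrinks further with extra regressors.
```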

8
Q

Cook’s D

A

If Di > 2√(k/n) –> influential point

9
Q

Odds
Prob given odds

A

Odds = e^coefficient
Probability from odds = odds/(1 + odds)
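Both conversions as stdlib helpers (function names are mine, not curriculum notation):

```python
import math

def odds_from_coefficient(coef):
    """In a logistic regression, e^coefficient gives the odds."""
    return math.exp(coef)

def prob_from_odds(odds):
    """Convert odds back to a probability: odds / (1 + odds)."""
    return odds / (1 + odds)

# A log-odds (coefficient) of 0 means even odds, i.e. probability 0.5.
```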

10
Q

F statistic

A

F = ((SSEr − SSEu)/q) / (SSEu/(n−k−1))
  = MSR/MSE, with k and n−k−1 degrees of freedom

H0: all slope coefficients are zero
Reject H0 if F (test statistic) > Fc (critical value)
Tests whether at least one coefficient is significant
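A numpy sketch showing the two F formulations agree when the restricted model is intercept-only (so SSEr = SST and q = k); the data are made up:

```python
import numpy as np

t = np.arange(10.0)
X = np.column_stack([np.ones(10), t, t ** 2])   # intercept + 2 slope terms
y = np.array([1.0, 2.1, 4.2, 7.9, 13.1, 20.2, 28.8, 39.1, 51.2, 64.8])

n, k = len(y), X.shape[1] - 1
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
SSEu = np.sum((y - X @ beta) ** 2)              # unrestricted SSE
SST = np.sum((y - y.mean()) ** 2)
SSEr, q = SST, k                                # restricted model: intercept only

F_general = ((SSEr - SSEu) / q) / (SSEu / (n - k - 1))

RSS = np.sum((X @ beta - y.mean()) ** 2)        # explained variation
F_anova = (RSS / k) / (SSEu / (n - k - 1))      # MSR / MSE
```

The two values coincide because SST = RSS + SSE, so SSEr − SSEu = RSS when the restriction drops every slope.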

11
Q

Conditional Heteroskedasticity

A

Residual variance is related to the level of the independent variables.

  • Coefficients are consistent.
  • Standard errors are underestimated.
  • Type I errors.

DETECTION
* Breusch–Pagan chi-square test
* p-value < 5% –> heteroskedasticity
* p-value > 5% –> no heteroskedasticity

CORRECTION
Robust (White-corrected) standard errors

12
Q

Serial Correlation

A

Residuals are correlated with each other.

  • Coefficients are consistent.
  • Standard errors are underestimated.
  • Type I errors (with positive serial correlation).

DETECTION
* Breusch–Godfrey (BG) F-test
* Durbin–Watson (DW)
* DW < 2 –> positive serial correlation

CORRECTION
Use robust or Newey–West corrected standard errors
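A minimal sketch of the Durbin–Watson statistic (values near 2 mean no serial correlation); the residual vectors in the test are invented to show the two extremes:

```python
import numpy as np

def durbin_watson(residuals):
    """DW = sum of squared successive differences / sum of squared residuals.
    ~2: no serial correlation; <2: positive; >2: negative."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)
```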

13
Q

Multicollinearity

A

Two or more independent variables are highly correlated.

  • Coefficients are consistent (but unreliable).
  • Standard errors are overestimated.
  • Type II errors.

DETECTION
* Conflicting t- and F-statistics
* Variance inflation factors (VIF)
* VIF > 5 (or 10) signals a problem

CORRECTION
* Drop one of the correlated variables
* Use a different proxy for an included independent variable
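A numpy sketch of variance inflation factors; x2 is deliberately built as an almost exact multiple of x1 so its VIF blows up (all data invented):

```python
import numpy as np

def vif(X, j):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
    column j on the remaining columns (plus an intercept)."""
    n = X.shape[0]
    others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
    beta, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
    resid = X[:, j] - others @ beta
    r2 = 1 - np.sum(resid ** 2) / np.sum((X[:, j] - X[:, j].mean()) ** 2)
    return 1 / (1 - r2)

x1 = np.arange(1.0, 11.0)
noise = np.array([0.01, -0.02, 0.03, -0.01, 0.02,
                  -0.03, 0.01, -0.02, 0.03, -0.01])
x2 = 2 * x1 + noise              # nearly collinear with x1
X = np.column_stack([x1, x2])
```

Both columns exceed the VIF > 5 (or 10) rule of thumb, flagging multicollinearity.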

14
Q

MISSPECIFICATIONS

A

Omission of important independent variable(s) –> may lead to serial correlation or heteroskedasticity in the residuals

Inappropriate transformation / variable form –> may lead to heteroskedasticity in the residuals

Inappropriate scaling –> may lead to heteroskedasticity in the residuals or multicollinearity

Data improperly pooled –> may lead to heteroskedasticity or serial correlation in the residuals; solve it by running separate regressions for each period

16
Q

Autoregressive (AR) Model

A
  • The dependent variable is regressed against previous values of itself (a single lag in an AR(1) model).
  • No distinction between the dependent and independent variables (i.e., x is the only variable).
  • Use a t-test to determine whether the correlations between residuals at any lag are statistically significant; if so, add one lag at a time and re-estimate.
  • If the series is not covariance stationary, correct by first differencing.
  • Example: modelling a currency using its historical prices.
  • Chain-rule forecasting for multi-period forecasts.
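An AR(1) fit is just OLS of x_t on its own first lag. In this sketch the path is built deterministically (no error term), so the regression recovers b0 and b1 exactly; the parameter values are invented:

```python
import numpy as np

b0_true, b1_true = 2.0, 0.5
x = [0.0]
for _ in range(20):
    x.append(b0_true + b1_true * x[-1])   # x_t = b0 + b1 * x_{t-1}
x = np.array(x)

# Regress x_t on a constant and its first lag.
X = np.column_stack([np.ones(len(x) - 1), x[:-1]])
(b0_hat, b1_hat), *_ = np.linalg.lstsq(X, x[1:], rcond=None)
```

Chain-rule forecasting then iterates the fitted equation: the two-period-ahead forecast plugs the one-period-ahead forecast back in as the lag.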
17
Q

Covariance Stationary

A
  • A time series must be covariance stationary for AR model estimates to be valid:
    o Constant and finite mean: E(xt) = E(xt−1). Note: the mean has no growth rate (no trend).
    o Constant and finite variance.
    o Constant and finite covariance with leading or lagged values.
  • Determine covariance stationarity with the Dickey–Fuller test.
18
Q

Mean Reversion

A

A time series is mean reverting if it tends towards its mean over time.
Mean-reverting level = b0/(1 − b1)

If b1 = 1, the mean-reverting level is undefined (b0/0).
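The mean-reverting level and its b1 = 1 failure mode as a tiny helper (raising an error for the unit-root case is my choice, not curriculum convention):

```python
def mean_reverting_level(b0, b1):
    """AR(1) mean-reverting level x* = b0 / (1 - b1).
    Undefined when b1 = 1 (a unit root): division by zero."""
    if b1 == 1:
        raise ValueError("b1 = 1: unit root, mean-reverting level undefined")
    return b0 / (1 - b1)
```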

19
Q

Unit Root = Random walk

A
  • b1 = 1 –> first-difference the data.
  • Undefined mean-reverting level –> not covariance stationary.
20
Q

Random Walk

A
  • Random walk: the value in one period equals the value in the previous period plus a random error.
  • Random walk without a drift: xt = xt−1 + εt (b0 = 0 and b1 = 1)
  • Random walk with a drift (b0 ≠ 0): xt = b0 + xt−1 + εt (b1 = 1)
21
Q

Seasonality

A
  • Corrected by adding more than one lag (a seasonal lag):
    o Quarterly data: the seasonal lag is 4.
    o Monthly data: the seasonal lag is 12.
22
Q

Root Mean Squared Error (RMSE)

A

Used to assess the accuracy of autoregressive models.
* Lower RMSE = better.
* Computed on out-of-sample forecasts.
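RMSE as a stdlib helper; in practice `actual` would be the out-of-sample observations and `forecast` the model's predictions (the numbers in the test are invented):

```python
import math

def rmse(actual, forecast):
    """Root mean squared error; lower values mean better forecast accuracy."""
    errors = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(errors) / len(errors))
```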

23
Q
Structural change
A

significant shift in the plotted data at a point in time that seems to divide the data into two distinct patterns

24
Q
Cointegration
A

Two time series are economically linked (affected by the same macro variables) or follow the same trend, and that relationship is not expected to change.

25
Q

Autoregressive Conditional Heteroskedasticity (ARCH)

A

The variance of the residuals in one period is dependent on the variance of the residuals in a previous period –> the standard errors of the coefficients and the hypothesis tests are invalid.

26
Q
Penalized regression
A

A regression technique that is useful when there are many features; it reduces overfitting by imposing a penalty that shrinks the coefficients of nonperforming features.

27
Q
Support vector machine
A

Classification; separates the data into one of two possible classes based on a model-defined hyperplane.

28
Q
K-nearest neighbor
A

classification based on nearness to the observations in the training sample
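A minimal 1-D sketch of the idea: classify a query point by majority vote among its k nearest training observations (the toy data and labels are invented):

```python
from collections import Counter

def knn_predict(train, query, k=3):
    """Majority vote among the k training observations nearest to `query`.
    `train` holds (feature, label) pairs with one-dimensional features."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train = [(1.0, "low"), (1.2, "low"), (1.4, "low"),
         (5.0, "high"), (5.2, "high"), (5.4, "high")]
```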

29
Q
Classification and regression tree (CART)
A

Classification of target variables.
* Used when there are significant nonlinear relationships among variables.
* Binary classification (categorical data).
* Provides a visual explanation.

30
Q
Ensemble learning
A

This combines predictions from multiple models, resulting in a lower average error rate.

31
Q
Random forest
A

A variant of the classification tree in which a large number of classification trees are trained using data bagged from the same data set; a solution for overfitting.

32
Q
Dimension reduction = principal components analysis (PCA)
A

summarizes info into smaller set of uncorrelated factors called eigenvectors.

33
Q
K-means clustering
A

Splits observations into k non-overlapping clusters; a centroid is associated with each cluster.
* Hyperparameter = a parameter set before the analysis begins (e.g., k = 20 groups).

34
Q
Hierarchical clustering
A

hierarchy of clusters without any predefined number of clusters

35
Q
Neural networks
A

o Input layer
o Hidden layers (which process the input)
   The nodes in a hidden layer (neurons) apply a summation operator (a weighted average) and an activation function (a nonlinear function).
o Output layer

o Good for speech recognition and natural language processing
o Good for modelling complex interactions among many features

36
Q
Deep learning nets
A

Neural networks with many hidden layers (often more than 20); useful for pattern, speech, and image recognition.

37
Q
Reinforcement learning
A

Algorithms that seek to learn from their own errors by maximizing a defined reward.

38
Q
Precision (P)
A

true positives / (false positives + true positives)

39
Q

recall

A

= true positives / (true positives + false negatives)

40
Q
Accuracy
A

= (true positives + true negatives) / (all positives and negatives)

41
Q
F1 score
A

= (2 × P × R) / (P + R)
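The four evaluation metrics from the cards above in one helper, computed from confusion-matrix counts (the counts in the test are invented):

```python
def classification_metrics(tp, fp, tn, fn):
    """Precision, recall, accuracy, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = (2 * precision * recall) / (precision + recall)
    return precision, recall, accuracy, f1
```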

43
Q

Logistic model

A

dependent variable is binary

44
Q

steps in a data analysis project

A

1- conceptualization of the modeling task,
2- data collection,
3- data preparation and wrangling,
4- data exploration,
5- model training.