Quantitative methods Flashcards
Multiple regression model assumptions
- linearity
- homoskedasticity –> variance of residuals is constant
- independence of errors –> residuals are not serially correlated
- normality –> error term is normally distributed; evaluated with a QQ plot
- independence of independent variables –> no linear relationships between independent variables
MSR
MSR = RSS/k
MSE
MSE = SSE/(n−k−1)
SST
RSS+SSE
R2
RSS/SST
or
(SST − SSE)/SST
or
(total variation − unexplained variation)/total variation
indicates how much of the variation in the dependent variable the independent variables can explain
Breusch pagan
BP statistic = n × R² (from a regression of the squared residuals on the independent variables), chi-square with k df
Adjusted R2
1-((n-1)/(n-k-1))*(1-R^2)
o measure of goodness of fit that adjusts for the number of independent variables
o adj R2<R2
o decreases when the added independent variable adds little value to regression model
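The R² and adjusted R² formulas above can be sketched in Python; the ANOVA values (RSS, SSE, n, k) are assumed for illustration only:

```python
# Hypothetical ANOVA numbers, assumed for illustration only.
RSS, SSE = 80.0, 20.0   # explained and unexplained sums of squares
n, k = 50, 3            # observations and independent variables

SST = RSS + SSE                                  # total variation
R2 = RSS / SST                                   # = (SST - SSE) / SST
adj_R2 = 1 - ((n - 1) / (n - k - 1)) * (1 - R2)  # penalizes extra variables

MSR = RSS / k            # mean square regression
MSE = SSE / (n - k - 1)  # mean square error

print(R2)       # 0.8
print(adj_R2)   # always < R2 when k > 0
```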
Cook’s D
If Cook's D > √(k/n) –> influential observation
Odds
Prob given odds
Odds= e^coefficient
Prob with odds = odds/(1+odds)
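A minimal sketch of the odds-to-probability conversion; the coefficient value is assumed:

```python
import math

# Convert a logistic-regression coefficient into odds and a probability.
# The coefficient value 0.75 is assumed for illustration.
coefficient = 0.75
odds = math.exp(coefficient)   # odds = e^coefficient
prob = odds / (1 + odds)       # P = odds / (1 + odds)

print(round(prob, 4))
```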
F statistic
((SSEr-SSEu)/q) / (SSEu/(n-k-1))
= MSR/MSE with k and n−k−1 df
H0: all slope coefficients equal zero
Reject H0 if F (test statistic) > Fc (critical value)
Tests whether at least one slope coefficient is significant
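The overall F-test can be sketched as follows; the ANOVA values and the critical value are assumed for illustration:

```python
# Overall F-test: MSR/MSE with k and n-k-1 degrees of freedom.
# All numbers below are assumed for illustration only.
RSS, SSE = 80.0, 20.0
n, k = 50, 3

MSR = RSS / k              # numerator: k df
MSE = SSE / (n - k - 1)    # denominator: n - k - 1 df
F = MSR / MSE

F_critical = 2.81          # assumed 5% critical value for (3, 46) df
reject_H0 = F > F_critical # at least one slope coefficient is nonzero

print(round(F, 2), reject_H0)
```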
Conditional Heteroskedasticity
Residual variance is related to level of independent variables
- Coefficients consistent.
- St. errors underestimated
- Type I errors
DETECTION
* Breusch–Pagan chi-square test
* p-value < 5% –> heteroskedasticity
* p-value > 5% –> no heteroskedasticity
CORRECTION
robust or White-corrected standard errors
Serial Correlation
Residuals are correlated with each other
- Coefficients consistent
- St errors underestimated
- Type I errors (positive correlation)
DETECTION
* Breusch–Godfrey (BG) F-test
* Durbin Watson (DW)
* DW < 2 –> positive serial correlation
CORRECTION
Use robust or Newey–West corrected standard errors
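A sketch of the Durbin–Watson statistic, computed on an assumed residual series:

```python
# Durbin-Watson: sum of squared first differences of residuals
# divided by the sum of squared residuals. Residuals are assumed values.
residuals = [0.5, 0.4, 0.6, 0.3, 0.5, 0.2, 0.4]

num = sum((residuals[t] - residuals[t - 1]) ** 2
          for t in range(1, len(residuals)))
den = sum(e ** 2 for e in residuals)
DW = num / den

# DW is approximately 2(1 - r): values well below 2 suggest
# positive serial correlation.
print(round(DW, 3))
```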
Multicollinearity
Two or more independent variables are highly correlated
- Coefficients are consistent (but unreliable).
- St errors are overestimated
- Type II errors
DETECTION
* Conflicting t and F-statistics
* variance inflation factors (VIF)
* VIF > 5 (or 10) signals a problem
CORRECTION
* Drop one of the correlated variables
* use a different proxy for an included independent variable
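A minimal sketch of the VIF rule, assuming an R² from regressing one independent variable on the others:

```python
# VIF_j = 1 / (1 - R²_j), where R²_j comes from regressing variable j
# on the remaining independent variables. R²_j is assumed here.
R2_j = 0.85
VIF = 1 / (1 - R2_j)

print(round(VIF, 2))   # above 5 -> multicollinearity concern
```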
MISSPECIFICATIONS
Omission of important independent variable(s)–>May lead to serial correlation or heteroskedasticity in the residuals
Inappropriate transformation / variable form–> May lead to heteroskedasticity in the residuals
Inappropriate scaling–>May lead to heteroskedasticity in the residuals or multicollinearity
Data improperly pooled –> May lead to heteroskedasticity or serial correlation in the residuals
Solve it by running separate regressions for each subperiod
Autoregressive (AR) Model
- AR(1): only 1 lag –> dependent variable is regressed against previous values of itself
- no distinction between the dependent and independent variables (i.e., x is the only variable).
- USE t-tests to determine whether the correlations between residuals at any lag are statistically significant; if so, add one lag at a time
- if not covariance stationary –> correct with first differencing
- Ex: pattern of a currency using historical prices
- Forecast one period ahead, then feed that forecast back in –> chain rule forecasting
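Chain rule forecasting can be sketched with an AR(1) model; the coefficients and last observation are assumed:

```python
# Chain rule forecasting with an AR(1) model: x_t = b0 + b1 * x_{t-1}.
# Coefficients and the last observed value are assumed for illustration.
b0, b1 = 1.0, 0.6
x_last = 5.0

x1 = b0 + b1 * x_last   # one-period-ahead forecast
x2 = b0 + b1 * x1       # two-period-ahead: reuse the first forecast

print(x1, x2)
```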
Covariance Stationary
- Dickey–Fuller test statistic significant (reject the unit root) = covariance stationary
o Constant and finite mean. E(xt) = E(xt−1); NOTE: no growth trend in the mean
o Constant and finite variance.
o Constant and finite covariance with leading or lagged values
- To determine covariance stationarity –> Dickey–Fuller test
Mean Reversion
A time series is mean reverting if it tends towards its mean over time
Mean-reverting level = b0/(1−b1)
If b1 = 1 –> the mean-reverting level is undefined because of b0/0
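A one-line sketch of the mean-reverting level, with assumed AR(1) coefficients:

```python
# Mean-reverting level of an AR(1) model: b0 / (1 - b1).
# Coefficients are assumed for illustration.
b0, b1 = 1.0, 0.6
mean_reverting_level = b0 / (1 - b1)

print(mean_reverting_level)   # undefined when b1 = 1 (division by zero)
```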
Unit Root = Random walk
- b1 = 1 –> must first-difference the data
- Undefined mean-reverting level –> not covariance stationary
Random Walk
- random walk = value in one period equals the value in the previous period, plus a random error.
- Random walk without a drift: xt = xt−1 + εt (b0 = 0 and b1 = 1)
- Random walk with a drift (b0 ≠ 0): xt = b0 + xt−1 + εt (b1 = 1)
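A random walk with drift can be simulated and first-differenced as follows (drift and seed are assumed):

```python
import random

# A simulated random walk with drift is not covariance stationary,
# but its first difference y_t = x_t - x_{t-1} = b0 + e_t is.
# Drift b0 and the seed are assumed for illustration.
random.seed(42)
b0 = 0.1
x = [0.0]
for _ in range(100):
    x.append(b0 + x[-1] + random.gauss(0, 1))   # x_t = b0 + x_{t-1} + e_t

diffs = [x[t] - x[t - 1] for t in range(1, len(x))]  # first differencing
print(len(diffs))
```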
Seasonality
- More than 1 lag
o quarterly data = seasonal lag is 4;
o monthly data = seasonal lag is 12.
Root Mean Squared Error (RMSE)
to assess accuracy of autoregressive models.
* lower RMSE = better
* Out-of-sample forecasts
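RMSE on hypothetical out-of-sample forecasts can be sketched as:

```python
import math

# RMSE: square root of the mean squared forecast error.
# Actual and forecast values are assumed for illustration.
actual   = [2.0, 2.5, 3.0, 2.8]
forecast = [2.2, 2.4, 2.7, 3.0]

rmse = math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast))
                 / len(actual))
print(round(rmse, 4))
```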
- structural change
significant shift in the plotted data at a point in time that seems to divide the data into two distinct patterns
- Cointegration:
two time series are economically linked (same macro variables) or follow the same trend and that relationship is not expected to change
Autoregressive Conditional Heteroskedasticity (ARCH)
variance of residuals in one period is dependent on the variance of residuals in a previous period –> standard errors of the coefficients and the hypothesis tests are invalid.
- Penalized regression
Regression technique –> useful when there are many features; reduces overfitting by imposing a penalty that shrinks the coefficients of nonperforming features.
- Support vector machine
classification; separates the data into one of two possible classifiers based on a model-defined hyperplane.
- K-nearest neighbor
classification based on nearness to the observations in the training sample
- Classification and regression tree
Classification of target variables
* when there are significant nonlinear relationships among variables.
* Binary classification (categorical data)
* Provides a visual explanation
- Ensemble learning
This combines predictions from multiple models, resulting in a lower average error rate.
- Random forest
This is a variant of the classification tree whereby a large number of classification trees are trained using data bagged from the same data set; solution for overfitting
- Dimension reduction=Principal components analysis
summarizes info into smaller set of uncorrelated factors called eigenvectors.
- K-means clustering.
split observations into k non-overlapping clusters; a centroid is associated with each cluster
* Hyperparameter = parameter set before analysis begins ex. 20 groups
- Hierarchical clustering
hierarchy of clusters without any predefined number of clusters
- Neural networks
o input layer
o hidden layers (which process the input)
Nodes in the hidden layers = neurons –> each applies a summation operator (calculates a weighted average) and an activation function (a nonlinear function).
o output layer.
o Good for speech recognition and natural language processing
o Good for modelling complex interactions among many features
- Deep learning nets
many hidden layers (more than 20) useful for pattern, speech, and image recognition
- Reinforcement learning
agents learn from their own errors by maximizing a defined reward.
- precision (P)
true positives / (false positives + true positives)
- recall (R)
= true positives / (true positives + false negatives)
- accuracy
(true positives + true negatives) / (all positives and negatives)
- F1 score
(2 × P × R) / (P + R)
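The four metrics above can be computed from an assumed confusion matrix:

```python
# Classification metrics from an assumed confusion matrix.
TP, FP, TN, FN = 80, 10, 95, 15

precision = TP / (TP + FP)
recall    = TP / (TP + FN)
accuracy  = (TP + TN) / (TP + FP + TN + FN)
f1        = (2 * precision * recall) / (precision + recall)

print(round(precision, 3), round(recall, 3),
      round(accuracy, 3), round(f1, 3))
```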
Logistic model
dependent variable is binary
steps in a data analysis project
1-conceptualization of the modeling task,
2- data collection,
3- data preparation and wrangling,
4- data exploration,
5- model training.