Quantitative Methods Flashcards
What are the types of trend model?
a) Linear trend model: appropriate if the data points are evenly distributed above and below the regression line.
b) Log-linear trend model: appropriate when the data grow at a constant rate (exponential growth). If a linear trend is fitted to such nonlinear data, the residuals will be persistently positive or persistently negative for long periods.
What does a Durbin-Watson statistic close to the following values mean?
a) 2
b) 0
c) 4
a) no serial correlation
b) positive serial correlation
c) negative serial correlation
DW ≈ 2(1 - r), where r is the correlation between consecutive residuals.
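The DW statistic is the sum of squared differences between consecutive residuals divided by the sum of squared residuals. A minimal Python sketch, assuming the residuals are already available as a list; the two residual series are made-up illustrations, not real data:

```python
def durbin_watson(residuals):
    """Sum of squared changes in consecutive residuals / sum of squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residuals that flip sign every period (negative serial correlation)
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
# Hypothetical residuals that change slowly (positive serial correlation)
persistent = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5]

print(round(durbin_watson(alternating), 2))  # 3.33, close to 4
print(round(durbin_watson(persistent), 2))   # 0.01, close to 0
```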
What is an autoregressive (AR) model?
An AR model regresses a time series on its own past values: lagged values of the dependent variable are used to estimate its current value. An AR(1) model is x_t = b0 + b1x_(t-1) + ε_t.
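An AR(1) model can be estimated by ordinary least squares, using each value as the dependent variable and the previous value as the independent variable. A minimal sketch, assuming a plain list of observations; the series is hypothetical and was generated exactly by x_t = 2 + 0.5x_(t-1), so the fit should recover those coefficients:

```python
def fit_ar1(series):
    """Estimate x_t = b0 + b1 * x_(t-1) + e_t by ordinary least squares."""
    y = series[1:]   # current values x_t
    x = series[:-1]  # lagged values x_(t-1)
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Hypothetical series generated by x_t = 2 + 0.5 * x_(t-1), starting at 0
series = [0.0, 2.0, 3.0, 3.5, 3.75, 3.875]
b0, b1 = fit_ar1(series)
print(round(b0, 4), round(b1, 4))  # 2.0 0.5
```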
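What rule is used to forecast out-of-sample values in time series analysis?
The chain rule of forecasting: the one-period-ahead forecast is calculated first and then used as the input for the two-period-ahead forecast, and so on. For a covariance-stationary AR model, multi-period forecasts move toward the mean-reverting level.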
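The chain rule can be sketched for an AR(1) model as a loop in which each forecast becomes the input for the next one. The coefficients b0 = 2.0, b1 = 0.5 and the last observed value 10.0 are hypothetical:

```python
def chain_rule_forecast(b0, b1, last_value, horizon):
    """Multi-period AR(1) forecasts: each forecast is the input for the next one."""
    forecasts = []
    current = last_value
    for _ in range(horizon):
        current = b0 + b1 * current
        forecasts.append(current)
    return forecasts


print(chain_rule_forecast(2.0, 0.5, last_value=10.0, horizon=5))
# [7.0, 5.5, 4.75, 4.375, 4.1875]  (moving toward b0 / (1 - b1) = 4.0)
```

The forecasts shrink toward 4.0 step by step, which is the mean-reverting behaviour described in the answer.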
When is an autoregressive model used?
When the dependent variable (the time series) is covariance stationary, so its values fluctuate around a constant mean instead of drifting without limit.
What is covariance stationarity?
A time series is covariance stationary when its mean, its variance, and its covariance with lagged and leading values are constant and finite over time.
When will an AR model have a finite mean-reverting level?
When the absolute value of the lag coefficient is less than 1 (|b1| < 1). For an AR(1) model, the mean-reverting level is b0 / (1 - b1).
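A small sketch of the mean-reverting level b0 / (1 - b1), which rejects coefficients with |b1| ≥ 1. The coefficients are hypothetical; the second print checks that a series sitting at its mean-reverting level is forecast to stay there:

```python
def mean_reverting_level(b0, b1):
    """Mean-reverting level b0 / (1 - b1) of an AR(1) model; finite only if |b1| < 1."""
    if abs(b1) >= 1:
        raise ValueError("|b1| must be less than 1 for a finite mean-reverting level")
    return b0 / (1 - b1)


level = mean_reverting_level(2.0, 0.5)  # hypothetical coefficients
print(level)                             # 4.0
# At the mean-reverting level, the one-step forecast equals the current value
print(2.0 + 0.5 * level)                 # 4.0
```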
What is root mean squared error (RMSE)?
RMSE is the square root of the average squared forecast error. It is used to compare the accuracy of AR models in forecasting out-of-sample values.
The lower the RMSE, the better the predictive power.
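A minimal RMSE sketch comparing two competing models on the same out-of-sample values. All numbers are hypothetical:

```python
import math


def rmse(actual, forecast):
    """Square root of the mean squared forecast error."""
    errors = [a - f for a, f in zip(actual, forecast)]
    return math.sqrt(sum(e ** 2 for e in errors) / len(errors))


# Hypothetical out-of-sample values and forecasts from two AR models
actual = [1.0, 2.0, 3.0, 4.0]
model_a = [1.5, 2.5, 2.5, 3.5]
model_b = [2.0, 3.0, 2.0, 5.0]
print(rmse(actual, model_a))  # 0.5
print(rmse(actual, model_b))  # 1.0, so model A has better predictive power
```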
True or False
Out-of-sample performance is the most important indicator of a model's real-world forecasting ability.
True
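Which test is used to check for serial correlation in an AR model?
We do not use the Durbin-Watson test, which is invalid when the regressors include lagged values of the dependent variable; we use a t-test on the residual autocorrelations instead.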
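How do we check for serial correlation?
H0: r = 0
Ha: r ≠ 0
where r is the autocorrelation of the residuals at a given lag.
a) Calculate the t-statistic for each residual autocorrelation: t = r / (1/√n), where n is the number of observations
b) Calculate the critical values with df = n - k
Conclusion: if we fail to reject H0 at every lag tested, there is no serial correlation.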
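The t-statistics in step a) can be sketched as follows: each sample autocorrelation of the residuals is divided by its standard error 1/√n. The residuals are hypothetical, and the critical value for step b) still has to come from a t-table, so the sketch only prints the statistics:

```python
import math


def residual_autocorrelation(residuals, lag):
    """Sample autocorrelation of the residuals at the given lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    denominator = sum((e - mean) ** 2 for e in residuals)
    numerator = sum(
        (residuals[t] - mean) * (residuals[t - lag] - mean) for t in range(lag, n)
    )
    return numerator / denominator


def autocorrelation_t_stats(residuals, max_lag):
    """t = autocorrelation / (1 / sqrt(n)) for lags 1..max_lag."""
    standard_error = 1 / math.sqrt(len(residuals))
    return {
        lag: residual_autocorrelation(residuals, lag) / standard_error
        for lag in range(1, max_lag + 1)
    }


# Hypothetical residuals from an AR(1) model
residuals = [0.5, -0.3, 0.2, 0.4, -0.6, 0.1, -0.2, 0.3, -0.1, -0.3]
for lag, t in autocorrelation_t_stats(residuals, max_lag=4).items():
    print(lag, round(t, 2))  # compare |t| with the critical t-value
```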
When can we not use autoregressive models?
When the time series has a unit root, so it is not covariance stationary and its values may drift outside any fixed range. Many economic and financial time series have unit roots.
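What does a value of b1 equal to the following mean?
a) 1
b) less than 1 in absolute value
c) greater than 1 in absolute value
a) unit root (not covariance stationary)
b) covariance stationary and mean reverting (stable)
c) explosive, not covariance stationary (unstable)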
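How do we use an AR model when the series has a unit root?
Transform the data by first differencing: y_t = x_t - x_(t-1). Then model the differenced series, which is covariance stationary if the original series is a random walk.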
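First differencing is a one-line transformation. A sketch with a hypothetical price series; note that the differenced series has one fewer observation:

```python
def first_difference(series):
    """y_t = x_t - x_(t-1); the result has one fewer observation than the input."""
    return [series[t] - series[t - 1] for t in range(1, len(series))]


# Hypothetical random-walk-like price series
prices = [100.0, 102.0, 101.0, 104.0, 106.0]
print(first_difference(prices))  # [2.0, -1.0, 3.0, 2.0]
```

The AR model is then estimated on the differenced values rather than on the original levels.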
What do we mean by:
a) Random walk with no drift
b) Random walk with drift
Both are AR(1) models x_t = b0 + b1x_(t-1) + ε_t with b1 = 1 (a unit root):
a) b0 = 0
b) b0 ≠ 0
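The two cases can be simulated with the same random shocks so that the only difference is b0. A sketch assuming normally distributed shocks and a hypothetical drift of 0.5; the fixed seed only makes the run reproducible:

```python
import random


def random_walk(shocks, b0=0.0, start=0.0):
    """x_t = b0 + 1 * x_(t-1) + e_t: an AR(1) model with b1 = 1 (unit root)."""
    path = [start]
    for e in shocks:
        path.append(b0 + path[-1] + e)
    return path


rng = random.Random(42)
shocks = [rng.gauss(0.0, 1.0) for _ in range(100)]
no_drift = random_walk(shocks)            # b0 = 0
with_drift = random_walk(shocks, b0=0.5)  # b0 != 0 (hypothetical drift)
# With identical shocks, the drift adds b0 per period: 0.5 * 100 = 50
print(round(with_drift[-1] - no_drift[-1], 6))  # 50.0
```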
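How do we test for a unit root?
Dickey-Fuller test: subtract x_(t-1) from both sides of the AR(1) model and estimate x_t - x_(t-1) = b0 + g1x_(t-1) + ε_t, where g1 = b1 - 1.
H0: g1 = 0 (b1 = 1, unit root)
Ha: g1 < 0 (b1 < 1)
Critical values: use the Dickey-Fuller critical values, which are larger in absolute value than conventional t-test critical values.
Fail to reject H0: the series has a unit root.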
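The Dickey-Fuller regression can be sketched with plain OLS: regress the first differences on the lagged levels and compute the t-statistic of g1. The series below is a hypothetical stationary AR(1) with b1 = 0.2 (so g1 should be near -0.8). This sketch covers only the basic regression with an intercept; the resulting t-statistic must be compared with a Dickey-Fuller critical value from a DF table, not with the usual t-table:

```python
import math
import random


def dickey_fuller_stat(series):
    """OLS of x_t - x_(t-1) = b0 + g1 * x_(t-1) + e_t; returns (g1, t-stat of g1)."""
    y = [series[t] - series[t - 1] for t in range(1, len(series))]  # first differences
    x = series[:-1]                                                 # lagged levels
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    g1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / s_xx
    b0 = mean_y - g1 * mean_x
    sse = sum((yi - b0 - g1 * xi) ** 2 for xi, yi in zip(x, y))
    se_g1 = math.sqrt(sse / (n - 2) / s_xx)
    return g1, g1 / se_g1


# Hypothetical stationary AR(1) series with b1 = 0.2
rng = random.Random(0)
series = [0.0]
for _ in range(500):
    series.append(0.2 * series[-1] + rng.gauss(0.0, 1.0))
g1, t_stat = dickey_fuller_stat(series)
print(round(g1, 3), round(t_stat, 2))
```

A t-statistic more negative than the Dickey-Fuller critical value rejects H0, i.e. no unit root.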
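How do we check for seasonality in the data?
Calculate the autocorrelations of the residuals from the AR model.
A statistically significant residual autocorrelation at the lag corresponding to the periodicity of the data (e.g., lag 4 for quarterly data, lag 12 for monthly data) indicates seasonality.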
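A sketch of the seasonal check for quarterly data: compute the residual autocorrelation at lag 4 and its t-statistic (autocorrelation divided by 1/√n). The residuals are hypothetical and deliberately repeat every four quarters:

```python
import math


def lag_autocorrelation(residuals, lag):
    """Sample autocorrelation of residuals at one lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    denominator = sum((e - mean) ** 2 for e in residuals)
    numerator = sum(
        (residuals[t] - mean) * (residuals[t - lag] - mean) for t in range(lag, n)
    )
    return numerator / denominator


# Hypothetical quarterly residuals with the same pattern every 4 quarters
residuals = [1.0, -0.2, -0.5, -0.3] * 10
n = len(residuals)
seasonal_lag = 4  # periodicity of quarterly data
t_stat = lag_autocorrelation(residuals, seasonal_lag) * math.sqrt(n)
print(round(t_stat, 2))  # 5.69, compare with the critical t-value
```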
What is an ARCH model?
Autoregressive conditional heteroskedasticity (ARCH) model.
ARCH is present if the variance of the residuals from an AR model depends on the variance of the lagged residuals.
How do we test for ARCH?
Regress the squared residuals on the lagged squared residuals:
ε_t² = a0 + a1ε_(t-1)² + μ_t
If a1 is statistically significant, ARCH(1) errors are present.
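The ARCH(1) test regression can be sketched with plain OLS on the squared residuals. The residuals below are hypothetical and built to show volatility clustering (calm and turbulent spells alternate), so the lagged squared residual should have a positive, significant coefficient:

```python
import math


def arch1_test(residuals):
    """OLS of e_t^2 = a0 + a1 * e_(t-1)^2 + u_t; returns (a0, a1, t-stat of a1)."""
    squared = [e ** 2 for e in residuals]
    y = squared[1:]   # e_t^2
    x = squared[:-1]  # e_(t-1)^2
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    a1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / s_xx
    a0 = mean_y - a1 * mean_x
    sse = sum((yi - a0 - a1 * xi) ** 2 for xi, yi in zip(x, y))
    t_a1 = a1 / math.sqrt(sse / (n - 2) / s_xx)
    return a0, a1, t_a1


# Hypothetical residuals with volatility clustering
residuals = [3.0, -3.0, 3.0, -3.0, 0.1, -0.1, 0.1, -0.1] * 10
a0, a1, t_a1 = arch1_test(residuals)
print(round(a1, 2), round(t_a1, 2))  # a1 > 0 and significant: ARCH(1) present
```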
Can a regression of one time series on another be used if:
a) one series is covariance stationary and the other has a unit root
b) both series have unit roots and are cointegrated
c) both series have unit roots and are not cointegrated
a) No
b) Yes
c) No
By which regression is the b0 coefficient computed?
Ordinary least squares (OLS) regression
When will the nonlinear (exponential) trend model show a convex or a concave curve?
Positive exponential growth means that the random variable (i.e., the time series) tends to increase at some constant rate of growth. If we plot the data, the observations will form a convex curve. Negative exponential growth means that the data tends to decrease at some constant rate of decay, and the plotted time series will be a concave curve.
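The log-linear trend model fits ln(y_t) = b0 + b1·t, so the constant growth rate per period is e^b1 - 1. A minimal sketch, assuming strictly positive values and t = 1, 2, ..., n; the series is hypothetical and grows at exactly 5% per period, so b1 > 0 (positive exponential growth):

```python
import math


def fit_log_linear_trend(values):
    """Estimate ln(y_t) = b0 + b1 * t by OLS, with t = 1, 2, ..., n."""
    t = list(range(1, len(values) + 1))
    log_y = [math.log(v) for v in values]  # values must be positive
    n = len(values)
    mean_t, mean_ly = sum(t) / n, sum(log_y) / n
    s_tt = sum((ti - mean_t) ** 2 for ti in t)
    b1 = sum((ti - mean_t) * (li - mean_ly) for ti, li in zip(t, log_y)) / s_tt
    b0 = mean_ly - b1 * mean_t
    return b0, b1


# Hypothetical series growing at exactly 5% per period
values = [100 * 1.05 ** t for t in range(1, 11)]
b0, b1 = fit_log_linear_trend(values)
print(round(math.exp(b1) - 1, 4))  # 0.05, the constant growth rate
```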
True or False
If a time series is at its mean-reverting level, the model predicts that the next value of the time series will be the same as its current value.
True
Which regression corrects for heteroskedasticity?
Generalized least squares
Which models work best with nonlinear relationships?
Machine learning models
What are the target variable and features?
The target variable is the dependent variable (Y); features are the independent variables (X).
What is:
a) supervised learning
b) unsupervised learning
c) deep learning?
a) Uses labeled data: both the target and the features are defined. Used for regression and classification problems. E.g., multiple regression.
b) Does not use labeled data: only features are provided and there is no target, so the problem is not framed as predicting a continuous or categorical outcome. E.g., clustering.
c) Uses neural networks with many hidden layers for complex tasks such as image recognition. Can handle continuous and categorical data.
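What are overfitting and underfitting?
Overfitting: the model fits the training data too closely, treating noise as if it were a true pattern. In-sample fit (e.g., R^2) is high, but the model cannot generalize the pattern to new data.
Underfitting: the model is too simple to recognize the pattern in the data, so its predictive power is low.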
Which type of data takes up most of the analyst's time?
Training data
It is subject to in-sample error.
How do we solve the problem of overfitting?
Complexity reduction: reduce the number of features (independent variables).
Cross-validation: use k-fold cross-validation to estimate the out-of-sample error.
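In k-fold cross-validation, the data are split into k folds; the model is trained on k - 1 folds and validated on the remaining one, and this is repeated k times so that every fold is used once for validation. A minimal sketch of the splitting step only, assuming the observations are already in random order; `k_fold_splits` is a hypothetical helper name:

```python
def k_fold_splits(n_observations, k):
    """Split indices 0..n-1 into k folds; each fold is the validation set once."""
    fold_sizes = [
        n_observations // k + (1 if i < n_observations % k else 0) for i in range(k)
    ]
    splits = []
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        training = [i for i in range(n_observations) if i < start or i >= start + size]
        splits.append((training, validation))
        start += size
    return splits


for training, validation in k_fold_splits(10, k=3):
    print("train:", training, "validate:", validation)
```

Averaging the k validation errors gives the estimate of out-of-sample error.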
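What is penalised regression?
It reduces the problem of overfitting and makes the model parsimonious. It minimizes the sum of squared residuals plus a penalty term that grows with the number and size of the included coefficients. Technique: LASSO (least absolute shrinkage and selection operator), a form of regularisation.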
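The LASSO objective is the sum of squared residuals plus λ times the sum of the absolute values of the slope coefficients. The sketch below only evaluates that objective for two hand-picked coefficient sets (it is not a LASSO solver); the data and the penalty strength λ = 0.1 are hypothetical:

```python
def lasso_objective(X, y, intercept, coefs, lam):
    """Sum of squared residuals plus the LASSO penalty lam * sum(|b_k|)."""
    sse = 0.0
    for row, target in zip(X, y):
        prediction = intercept + sum(b * x for b, x in zip(coefs, row))
        sse += (target - prediction) ** 2
    penalty = lam * sum(abs(b) for b in coefs)
    return sse + penalty


# Hypothetical data: y depends on the first feature; the second feature is noise
X = [[1.0, 0.1], [2.0, -0.1], [3.0, 0.2], [4.0, -0.2]]
y = [1.0, 2.0, 3.0, 4.0]
lam = 0.1  # hypothetical penalty strength

parsimonious = lasso_objective(X, y, 0.0, [1.0, 0.0], lam)  # noise feature dropped
complex_fit = lasso_objective(X, y, 0.0, [1.0, 0.5], lam)   # noise feature kept
print(round(parsimonious, 3), round(complex_fit, 3))  # 0.1 0.175
```

The penalty makes the parsimonious model score better even though adding the noise feature does not reduce the fit error here.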
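What is a support vector machine (SVM)?
A linear classifier used when we want to predict one of two possible outcomes. It finds the boundary that separates the two classes with the widest possible margin.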
What is soft margin classification?
A technique that allows some observations to be misclassified or to fall inside the margin, which helps in handling outliers in the data set.
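What is the k-nearest neighbour (KNN) technique, and what happens when:
a) k is too low
b) k is too high
c) k is even
KNN classifies a new observation based on the labels of its k nearest observations in the existing data. E.g., predicting bankruptcy.
a) High error rate (the prediction depends on too few, possibly noisy, neighbours)
b) Dilution of results (too many dissimilar neighbours are included)
c) There may be a tie, so no clear winner in a binary classification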
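A minimal KNN classifier: measure Euclidean distance from the new observation to every labeled observation, take the k closest, and return the majority label. The firms, their two features, and the labels are hypothetical. In practice features are usually standardized first because distances depend on scale, and an odd k avoids ties in binary classification:

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k):
    """Majority label among the k training observations closest to the query."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical firms described by (debt-to-assets, interest coverage)
train_X = [(0.9, 0.5), (0.8, 0.7), (0.85, 0.4), (0.2, 5.0), (0.3, 4.0), (0.25, 6.0)]
train_y = ["bankrupt", "bankrupt", "bankrupt", "solvent", "solvent", "solvent"]
print(knn_predict(train_X, train_y, query=(0.8, 0.6), k=3))  # bankrupt
```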
What is a classification and regression tree (CART)?
CART is often described as a black box due to its opacity.
Classification tree: the target variable is binary or categorical; it can be used when the data are nonlinear. Logit and probit models can also predict a binary target, but they assume a linear relationship.
Regression tree: used when the target variable is continuous.
True or false
As we move down the CART tree, prediction error decreases.
True
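What are ensemble learning and random forests?
Ensemble learning: combines the predictions of multiple models so that the errors of one model are offset by the others.
Types: aggregation of heterogeneous learners (different algorithms) and aggregation of homogeneous learners (the same algorithm trained on different samples of the data).
Random forest: a collection of many CART decision trees, each trained on a bootstrapped sample of the data using a random subset of features; their predictions are combined by majority vote. This reduces overfitting and increases the signal-to-noise ratio.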
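The combining step of an ensemble classifier can be sketched as a majority vote taken observation by observation. The three models' 0/1 predictions are hypothetical (they could come from three trees in a forest):

```python
from collections import Counter


def majority_vote(predictions_by_model):
    """Combine class predictions from several models by majority vote, per observation."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions_by_model)]


# Hypothetical predictions from three models for five observations
model_1 = [1, 0, 1, 1, 0]
model_2 = [1, 1, 0, 1, 0]
model_3 = [0, 0, 1, 1, 1]
print(majority_vote([model_1, model_2, model_3]))  # [1, 0, 1, 1, 0]
```

Each individual model is wrong on some observation where the other two agree, and the vote overrides that single error.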
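What is an eigenvector in principal component analysis?
Eigenvectors define new, mutually uncorrelated composite variables (principal components), each a linear combination of the original features. Many features that carry little information on their own are summarized by a small number of these components.
What are scree plots in principal component analysis?
A scree plot charts the proportion of the total variance in the data explained by each principal component (the eigenvalue of each eigenvector). It is used when there are many eigenvalues, to decide how many principal components to keep.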
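The numbers behind a scree plot are each eigenvalue divided by the sum of all eigenvalues, sorted from largest to smallest, plus their running total. A sketch with hypothetical eigenvalues for a 5-feature data set, drawing a crude text bar for each component:

```python
def explained_variance(eigenvalues):
    """Proportion and cumulative proportion of total variance for each component."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    proportions = [value / total for value in ordered]
    cumulative = []
    running = 0.0
    for p in proportions:
        running += p
        cumulative.append(running)
    return proportions, cumulative


# Hypothetical eigenvalues of a 5-feature covariance matrix
proportions, cumulative = explained_variance([4.0, 2.0, 1.0, 0.5, 0.5])
for i, (p, c) in enumerate(zip(proportions, cumulative), start=1):
    # Text "scree plot": roughly one '#' per 5% of variance explained
    print(f"PC{i}: {p:.4f} cumulative {c:.4f} " + "#" * round(p * 20))
```

Here the first two components already explain 75% of the variance, which is the kind of cut-off a scree plot helps identify.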