Quantitative Methods Flashcards

1
Q

What are the types of trend model?

A

a) Linear trend model: appropriate when the data points are evenly distributed above and below the regression line.
b) Log-linear trend model: appropriate when the data grow at a roughly constant rate (exponentially). Fitting a linear trend to such data leaves residuals that stay positive or negative for extended periods.

2
Q

What does a Durbin-Watson statistic close to -
a) 2
b) 0
c) 4 mean?

A

a) No serial correlation
b) Positive serial correlation
c) Negative serial correlation
DW ≈ 2(1 - r), where r is the correlation between consecutive residuals
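
The statistic itself can be computed directly from a list of regression residuals. The sketch below is a minimal illustration; the function name and the two sample residual series are made up for the example.

```python
def durbin_watson(residuals):
    """Sum of squared changes in the residuals divided by the sum of squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Residuals that flip sign every period: negative serial correlation, DW above 2.
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
# Residuals that keep their sign for a while: positive serial correlation, DW below 2.
persistent = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]

print(durbin_watson(alternating))
print(durbin_watson(persistent))
```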

3
Q

What is an autoregressive (AR) model?

A

An AR model is a time series regressed on its own past values: past values of the dependent variable are used to estimate its current value.

4
Q

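What rule is used to forecast out-of-sample values in time series analysis?

A

The chain rule of forecasting: each forecast is fed back into the model to produce the next one. For a covariance stationary series, multi-period forecasts converge to the mean-reverting level.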
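
A minimal sketch of chained forecasting for an AR(1) model, x(t) = b0 + b1 x(t-1). The function name and the coefficient values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def ar1_forecasts(b0, b1, x_last, steps):
    """Chain AR(1) forecasts: each forecast is the input for the next one."""
    forecasts = []
    x = x_last
    for _ in range(steps):
        x = b0 + b1 * x
        forecasts.append(x)
    return forecasts


# Assumed model: x(t) = 1.0 + 0.5 * x(t-1), last observed value 4.0.
two_steps = ar1_forecasts(1.0, 0.5, 4.0, 2)
many_steps = ar1_forecasts(1.0, 0.5, 4.0, 60)

print(two_steps)       # one-step and two-step-ahead forecasts
print(many_steps[-1])  # long-horizon forecast settles near 1.0 / (1 - 0.5)
```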

5
Q

When is an autoregressive model used?

A

When the time series is covariance stationary, i.e. the dependent variable fluctuates within a confined range around a constant mean.

6
Q

What is covariance stationarity?

A

A series is covariance stationary when its mean, variance and covariance with lagged and leading values do not change over time.

7
Q

When will an AR model have a finite mean-reverting level?

A

When the absolute value of the lag coefficient is less than 1 (|b1| < 1).
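
For an AR(1) model, the mean-reverting level follows from setting the next value equal to the current one. A short derivation, writing b_0 for the intercept and b_1 for the lag coefficient (notation assumed here to match the other cards):

```latex
\begin{aligned}
x_t &= b_0 + b_1 x_{t-1} \\
x^{*} &= b_0 + b_1 x^{*} \\
x^{*} &= \frac{b_0}{1 - b_1}, \qquad |b_1| < 1
\end{aligned}
```

With b_1 = 1 the denominator is zero, so no finite mean-reverting level exists.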

8
Q

What is root mean squared error?

A

RMSE is the square root of the average squared forecast error. It is used to compare the accuracy of AR models in forecasting out-of-sample values.
The lower the RMSE, the better the predictive power.
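
A minimal sketch of the RMSE calculation on out-of-sample data. The function name and the sample numbers are illustrative only.

```python
import math


def rmse(actual, forecast):
    """Square root of the mean squared difference between actual and forecast values."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Two hypothetical models forecasting the same out-of-sample values.
actual = [10.0, 12.0, 11.0, 13.0]
model_a = [10.5, 11.5, 11.0, 12.5]
model_b = [9.0, 13.5, 12.0, 11.0]

# The model with the lower RMSE is preferred.
print(rmse(actual, model_a), rmse(actual, model_b))
```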

9
Q

True or False
Out-of-sample performance is the most important indicator of a model's real-world forecasting ability.

A

True

10
Q

Which test is used to check for serial correlation in an AR model?

A

The Durbin-Watson test is not appropriate for AR models. Instead, use a t-test on the residual autocorrelations.

11
Q

How do we check for serial correlation?

A

a) Calculate the t-statistic for each residual autocorrelation: t = r / (1/√n)
b) Find the critical value using df = n - k
H0: r = 0
Ha: r ≠ 0
Conclusion: if we fail to reject H0, there is no serial correlation.
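
A minimal sketch of the test statistic, assuming the standard error of each residual autocorrelation is approximated by 1/√n, where n is the number of observations. The function names are hypothetical, and the critical value still has to be looked up in a t-table.

```python
import math


def autocorrelation(residuals, lag):
    """Sample autocorrelation of the residuals at the given lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    numerator = sum(
        (residuals[t] - mean) * (residuals[t - lag] - mean) for t in range(lag, n)
    )
    denominator = sum((e - mean) ** 2 for e in residuals)
    return numerator / denominator


def autocorr_t_stat(r, n):
    """t-statistic for H0: r = 0, using a standard error of 1 / sqrt(n)."""
    return r / (1 / math.sqrt(n))


# Example: an autocorrelation of 0.2 estimated from 100 observations.
print(autocorr_t_stat(0.2, 100))
```

If the absolute t-statistic exceeds the critical value, H0 is rejected and the residuals are serially correlated at that lag.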

12
Q

When can we not use autoregressive models?

A

When the series has a unit root, i.e. it is not covariance stationary and its values can drift outside any confined range. Most economic and financial time series have unit roots.

13
Q

What does a value of b1 -
a) equal to 1
b) less than 1
c) greater than 1 mean?

A

a) Unit root
b) Stable trend (mean reverting)
c) Unstable trend (explosive)

14
Q

How can an AR model be used when the series has a unit root?

A

Transform the data by first differencing, y(t) = x(t) - x(t-1), and fit the AR model to the differenced series.
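
First differencing replaces each value with its change from the previous period. A minimal sketch with made-up prices:

```python
def first_difference(series):
    """Return the period-to-period changes; the result is one element shorter."""
    return [series[t] - series[t - 1] for t in range(1, len(series))]


# Hypothetical price levels that wander like a random walk.
prices = [100, 102, 101, 105]
changes = first_difference(prices)
print(changes)
```

The AR model is then estimated on `changes` rather than on `prices`.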

15
Q

What do we mean by -
a) Random walk with no drift
b) Random walk with drift

A

a) b0 = 0 (with b1 = 1)
b) b0 ≠ 0 (with b1 = 1)

16
Q

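How do we test for a unit root?

A

Dickey-Fuller test
H0: b1 = 1 (unit root)
Ha: b1 < 1 (no unit root)
Compare the test statistic with the Dickey-Fuller critical values rather than the standard t-table.
Fail to reject H0: the series has a unit root.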
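
The test is usually run on a transformed AR(1) equation, obtained by subtracting x_{t-1} from both sides. A short sketch of that step, with g introduced here as shorthand for b_1 - 1:

```latex
\begin{aligned}
x_t &= b_0 + b_1 x_{t-1} + \varepsilon_t \\
x_t - x_{t-1} &= b_0 + (b_1 - 1)\, x_{t-1} + \varepsilon_t \\
              &= b_0 + g\, x_{t-1} + \varepsilon_t, \qquad H_0\colon g = 0
\end{aligned}
```

Testing g = 0 is equivalent to testing b_1 = 1.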

17
Q

How do we check for seasonality in the data?

A

Seasonality is tested by calculating the autocorrelations of the error term.
A statistically significant autocorrelation at the lag corresponding to the periodicity of the data (e.g. lag 4 for quarterly data) indicates seasonality.

18
Q

What is an ARCH model?

A

Autoregressive conditional heteroskedasticity model.
ARCH is present if the variance of the residuals in one period depends on the variance of the residuals in previous periods.

19
Q

How do we check for ARCH?

A

Regress the squared residuals on the lagged squared residuals:
e(t)^2 = a0 + a1 e(t-1)^2 + u(t)
If a1 is statistically significantly different from zero, the series exhibits ARCH(1).
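
A minimal sketch of the ARCH(1) regression: squared residuals are regressed on their own first lag with a one-variable least-squares fit. The names a0 and a1 for the intercept and slope, the helper functions and the sample residuals are all assumptions made for this example; judging the significance of a1 would additionally need its standard error.

```python
def ols_simple(x, y):
    """One-variable least squares: returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def arch1_regression(residuals):
    """Regress squared residuals on lagged squared residuals; returns (a0, a1)."""
    squared = [e ** 2 for e in residuals]
    return ols_simple(squared[:-1], squared[1:])


# Residuals constructed so that e(t)^2 = 1 + 0.5 * e(t-1)^2 holds exactly.
residuals = [2.0, 3 ** 0.5, 2.5 ** 0.5, 1.5, 2.125 ** 0.5]
a0, a1 = arch1_regression(residuals)
print(a0, a1)
```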

20
Q

Can the regression be used if -
a) One series is stationary and the other has a unit root
b) Both series have unit roots and are cointegrated
c) Both series have unit roots and are not cointegrated

A

a) No
b) Yes
c) No

21
Q

By which regression method is the b0 coefficient computed?

A

Ordinary least squares (OLS) regression

22
Q

When will the nonlinear trend model show a convex curve and when a concave curve?

A

Positive exponential growth means that the random variable (i.e., the time series) tends to increase at some constant rate of growth. If we plot the data, the observations will form a convex curve. Negative exponential growth means that the data tends to decrease at some constant rate of decay, and the plotted time series will be a concave curve.

23
Q

True or False
If a time series is at its mean-reverting level, the model predicts that the next value of the time series will be the same as its current value.

A

True

24
Q

Which regression corrects for heteroskedasticity?

A

Generalized least squares

25
Q

Which type of model works best with nonlinear relationships?

A

Machine learning

26
Q

What are the target variable and features?

A

The target variable is the dependent variable; features are the independent variables.

27
Q

What is -
a) Supervised learning
b) unsupervised learning
c) Deep learning?

A

a) Uses labeled data: the target and features are both defined. Covers regression (continuous target, e.g. multiple regression) and classification (categorical target, e.g. binary classification).
b) Does not use labeled data: only features are supplied, so there is no continuous or categorical target to predict. E.g. clustering.
c) Uses neural networks with many hidden layers, e.g. for image recognition. Handles both continuous and categorical data.

28
Q

What are overfitting and underfitting?

A

An overfit model has a high in-sample R^2 because it treats noise as signal, so it cannot generalise the pattern to new data.
An underfit model fails to recognise the pattern in the data, so its predictive power is low.

29
Q

Which type of data takes the most analyst time?

A

Training data.
It is subject to in-sample error.

30
Q

How to solve problem of overfitting?

A

Complexity reduction: reduce the number of independent variables (features).
Cross-validation: use k-fold cross-validation.

31
Q

What is penalised regression?

A

Penalised regression reduces overfitting by making the model parsimonious. It seeks to minimise the sum of squared errors plus a penalty that grows with the number of features retained. Technique: LASSO (a form of regularisation).

32
Q

What is support vector machine?

A

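A supervised classifier used when we want to predict one of two possible outcomes. It finds the linear boundary that separates the two classes with the widest possible margin.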

33
Q

What is soft margin classification?

A

A technique that helps a support vector machine handle outliers in the data set by allowing some observations to be misclassified.

34
Q

What is the k-nearest neighbour technique, and what happens when -
a) k is too low
b) k is too high
c) k is even

A

Classifies an observation on the basis of its nearness to other observations. E.g. predicting bankruptcy.
a) High error rate
b) Dilution of results
c) Possibly no clear winner (tied votes)
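
A minimal sketch of k-nearest neighbour classification using Euclidean distance and a majority vote. The function name, the labels and the two-feature data points are made up for illustration.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Label the query point by majority vote among its k nearest training points."""
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical training data: (features, label) pairs.
train = [
    ((0.0, 0.0), "solvent"),
    ((0.0, 1.0), "solvent"),
    ((5.0, 5.0), "bankrupt"),
    ((5.0, 6.0), "bankrupt"),
    ((6.0, 5.0), "bankrupt"),
]

# An odd k avoids tied votes when there are two classes.
print(knn_predict(train, (0.2, 0.1), 3))
print(knn_predict(train, (5.0, 5.5), 3))
```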

35
Q

What is a classification and regression tree (CART)?

A

CART gives a visual explanation of its predictions, unlike many algorithms that are described as black boxes because of their opacity.
Classification tree: used when the target variable is binary or categorical; it can be used when relationships are nonlinear. Logit and probit models also predict a binary target but assume a linear relationship.
Regression tree: used when the target variable is continuous.

36
Q

True or false
As we move down the CART tree, prediction error decreases.

A

True

37
Q

What are ensemble learning and random forest?

A

Ensemble learning: combines the predictions of multiple models so that the errors of one model are offset by the others.
Types: aggregation of heterogeneous learners, aggregation of homogeneous learners.
Random forest: a collection of decision trees, each built like CART but on a random sample of the data with a random subset of features; the trees' predictions are combined into a single prediction. It increases the signal-to-noise ratio.

38
Q

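What is an eigenvector in principal component analysis?

A

PCA combines a number of correlated features into a smaller set of uncorrelated composite variables. Each composite variable is defined by an eigenvector; its eigenvalue gives the share of total variance it explains.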

39
Q

What are scree plots in principal component analysis?

A

When there are many eigenvalues, we chart them in a scree plot. It shows the proportion of total variance explained by each principal component (eigenvector).

40
Q

What is clustering? What are the techniques?

A

Grouping observations with similar features into one cluster. Cohesion is commonly measured using Euclidean distance. Clustering helps uncover hidden structures or similarities in complex data sets.
Techniques-
a) K-means clustering
b) Hierarchical clustering - agglomerative, divisive

41
Q

What are deep learning networks?

A

Neural networks with many hidden layers (often more than 20). They are used in image recognition and natural language processing.

42
Q

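What do the summation operator and activation function do in a neural network?

A

Summation operator - multiplies each input by its weight and sums the results to give the total net input.
Activation function - transforms the total net input into the node's output, usually through a nonlinear function.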
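
A minimal sketch of a single node, assuming a sigmoid activation function (one common choice; the card does not name a specific one). The function name, weights and inputs are hypothetical.

```python
import math


def neuron_output(inputs, weights, bias=0.0):
    """Weighted sum of the inputs (summation operator) passed through a sigmoid."""
    net_input = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-net_input))  # sigmoid activation, output in (0, 1)


# Net input = 0.5 * 1.0 + (-0.25) * 2.0 = 0.0, and sigmoid(0) = 0.5.
print(neuron_output([1.0, 2.0], [0.5, -0.25]))
```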

43
Q

What are forward propagation and backward propagation in a neural network?

A

Forward propagation: the output of one node is passed forward to the nodes in the next layer, where it is processed again.
Backward propagation: the prediction error is fed back through the network and the weights used by the summation operators are adjusted to reduce it.

44
Q

What is reinforcement learning?

A

It does not rely on labeled data with defined inputs and outputs. The machine (agent) learns by itself through trial and error, seeking to maximise a reward.

45
Q

A machine learning technique that can be applied to predict either a categorical target variable or a continuous target variable is most likely to describe a ___.

A

Classification and regression tree (CART).

46
Q

What are the 4 Vs of big data?

A

Volume
Veracity - validity
Variety
Velocity - speed

47
Q

Which step in processing of structured data takes the most time?

A

Data preparation and wrangling

48
Q

What is an application programming interface (API)?

A

A set of protocols through which data can be retrieved from external sources such as data vendors.

49
Q

What is metadata?

A

Data that describes the data set (its contents, structure and source). It guides the filtering of data during data preparation and wrangling.

50
Q

What are the 2 parts of structured data wrangling?

A

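a) Data transformation - includes dealing with outliers.
To handle outliers - 1) Trimming - remove extreme values 2) Winsorization - replace extreme values with the maximum (or minimum) value of the non-outlier observations
b) Data scaling - convert features into a common unit of measurement.
1) Normalisation - (X - min) / (max - min); sensitive to outliers 2) Standardisation - (X - mean) / standard deviation; less sensitive to outliers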
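
A minimal sketch of the two scaling methods, assuming the usual definitions: normalisation rescales to the range 0 to 1 using the minimum and maximum, and standardisation centres on the mean and divides by the sample standard deviation. Function names and sample values are illustrative.

```python
import statistics


def normalize(values):
    """Rescale to [0, 1]: (x - min) / (max - min)."""
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]


def standardize(values):
    """Centre and scale: (x - mean) / sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


feature = [10, 20, 30]
print(normalize(feature))
print(standardize(feature))

# One extreme value squeezes the other normalised values towards 0.
print(normalize([10, 20, 30, 1000]))
```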

51
Q

On which steps does model performance depend for structured data?

A

Data preparation and wrangling
Data exploration

52
Q

What is feature engineering? What are its techniques?

A

Creating new features by changing or combining existing ones, e.g. merging 2 or more independent variables. Technique: one-hot encoding, which converts categorical variables into binary dummy variables so that the model trains faster and more easily. Feature engineering tends to prevent model underfitting.
Techniques for text data-
a) N gram
b) Numbers
c) Name entity recognition
d) Parts of speech

53
Q

What are stemming and lemmatization?

A

Stemming is the text wrangling process in which all variations of a word are converted to a common stem, e.g. integrate and integrated are both converted to integrat.
Lemmatization is a more advanced version of stemming that converts words to their dictionary root (lemma).

54
Q

What are a token and tokenization?

A

A token is a single unit of text, typically a word.
Tokenization is the process of splitting sentences into tokens.

55
Q

What is bag of words?

A

The collection of all distinct tokens in the text, without regard to their order.

56
Q

What is an N-gram?

A

A sequence of n words that are used together and treated as a single token, e.g. a bigram joins 2 words.
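
A minimal sketch that builds n-gram tokens from a list of word tokens. Joining the words with an underscore is a convention chosen for this example, and the sample sentence is made up.

```python
def ngrams(tokens, n):
    """Join every run of n consecutive tokens into a single token."""
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "stock market crash".split()  # simple whitespace tokenization
print(ngrams(tokens, 2))
```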

57
Q

If the class distribution is unequal, which is the better measure of model performance from the confusion matrix?
a) Accuracy
b) F1 score

A

F1 score
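
A minimal sketch of the confusion-matrix metrics, assuming the standard definitions of precision, recall, accuracy and F1 (the harmonic mean of precision and recall). The function name and the counts are hypothetical and chosen to show an imbalanced data set.

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, accuracy and F1 score from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "accuracy": accuracy, "f1": f1}


# Imbalanced sample: only 10 of 100 observations are actual positives.
metrics = classification_metrics(tp=5, fp=5, fn=5, tn=85)
print(metrics)
```

Here accuracy looks high because true negatives dominate, while the F1 score reflects the weak performance on the positive class.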

58
Q

What does an area under the receiver operating characteristic (ROC) curve
a) Close to 1
b) Close to 0.5 mean?

A

a) High accuracy
b) Random guessing

59
Q

What is Root Mean Squared Errors?

A

RMSE gives an idea of the typical size (volatility) of the error term. It is useful when the target variable is continuous.

60
Q

According to model tuning in big data, what happens to fitting curve when complexity is increased?

A

Increasing complexity decreases bias error but increases variance error.

61
Q

What is grid search?

A

It is an automated process of selecting the best combination of hyperparameters.

62
Q

What indicates slight regularization when tuning a big data model?

A

The prediction error on the training dataset is small while the prediction error on the cross-validation dataset is significantly larger (overfitting).

63
Q

What is ceiling analysis of big data?

A

An evaluation of what tuning is doing that identifies weak links, which in turn guides further tuning.

64
Q

True or False
README files associated with a database usually contain information about how, what, and where data is stored

A

True

65
Q

True or False
N-gram implementation will affect the normalization of the BOW.

A

True
N-gram implementation will affect the normalization of the BOW because stop words will not be removed.

66
Q

What is bias error and variance error?

A

Bias error is the prediction error in the training data resulting from underfit models. Variance error is the prediction error in the validation sample resulting from overfitting models that do not generalize well.

67
Q

What type of error does a
a) False positive
b) False negative signify?

A

a) Type 1 error
b) Type 2 error.

68
Q

True or False
High precision is valued when the cost of a type I error is large, while high recall is valued when the cost of a type II error is large.

A

True

69
Q

What is covariance?

A

It indicates the direction of the linear relationship between 2 variables, but not its strength. Range = -∞ to +∞

70
Q

What is the standard form of a regression line?

A

Y = b0 + b1X + ε

71
Q

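Can a stock have a negative beta?

A

Yes, although it is rare. Beta is negative when the stock's returns have a negative covariance with the market's returns.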

72
Q

Why is R^2 or coefficient of determination used?

A

To check the reliability of the regression.
The higher the R^2, the greater the share of variation in the dependent variable that is explained.

73
Q

What is standard error of estimate or standard error of regression?

A

It measures the degree of variability of the actual Y values relative to estimated Y values from regression.
It gauges the fit of regression line

74
Q

What is F test Statistic?

A

Assesses how well a set of independent variables as a group explains variation in dependent variable.
It is always a one tail test.

75
Q

What is p value?

A

The smallest level of significance at which null can be rejected.

76
Q

What are the 2 types of error?

A

a) Type 1 error- reject the null when it is true.
Probability = significance level
b) Type 2 error - fail to reject the null when it is false.

77
Q

What happens when -
a) significance level>p value
b) significance level<p value

A

a) reject null
b) Fail to reject the null

78
Q

When is a regression said to be unbiased?

A

a) High R^2
b) Intercept = 0
c) Slope = 1

79
Q

True or False,
Adjusted R^2 will always be less than or equal to R^2

A

True
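
The relationship can be checked numerically with the usual adjusted R^2 formula, 1 - [(n - 1) / (n - k - 1)] (1 - R^2), where n is the number of observations and k the number of independent variables. The function name and the inputs below are illustrative.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k independent variables."""
    return 1 - (n - 1) / (n - k - 1) * (1 - r2)


# Hypothetical regression: R^2 of 0.80, 21 observations, 4 independent variables.
print(adjusted_r2(0.80, 21, 4))
```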

80
Q

Which test is used to evaluate whether the slope coefficients of 5 independent variables are all equal to 0?

A

F test

81
Q

What are dummy variables?

A

These arise when an independent variable takes the form of either “Yes” or “No”. They take a value of 0 or 1 and are often used to quantify the impact of qualitative events.

82
Q

What is Akaike Information criteria and Bayesian Information criteria?

A

AIC is used to compare models when the goal is better forecasting. Lower is better.
BIC is used to compare models when the goal is goodness of fit. It penalises a model more heavily for being too complex. Lower is better.

83
Q

What are nested models?

A

These are models such that one model (i.e. full or unrestricted model) has a higher number of independent variables while another model (restricted model) has only a subset of independent variables.
Evaluated by f test.

84
Q

What is heteroskedasticity? What are the types?

A

When variance of residuals is not the same across all observations in sample.
Types -
a) Unconditional - when the error variance is not correlated with the IVs.
b) Conditional - when the error variance is correlated with the IVs. Creates major problems.

85
Q

What are the effects of heteroskedasticity?

A

a) Coefficient standard errors are underestimated, so t-statistics are too high, leading to too many Type 1 errors.

86
Q

How to detect heteroskedasticity?

A

a) Scatter plot
b) BP chi square test

87
Q

What is BP chi square test?

A

Requires regressing the squared residuals from the original regression on the independent variables of that regression.
If in the 2nd regression-
a) R^2 is high - presence of conditional heteroskedasticity, since the error variance and the IVs are correlated.
b) R^2 is low - absence of conditional heteroskedasticity.

88
Q

How to correct heteroskedasticity?

A

Do not use the standard errors from the original regression; use the (usually higher) White-corrected standard errors, also known as heteroskedasticity-consistent standard errors.

89
Q

What is serial correlation?

A

Serial correlation (autocorrelation) is the situation in which the residual terms are correlated with each other.
a) Positive
b) Negative

90
Q

How to detect serial correlation?

A

a) Graphically
b) Durbin-Watson: DW ≈ 2(1 - r), where r is the correlation between residuals from one period and those from the previous period.
c) Breusch-Godfrey (BG) test

91
Q

How to correct serial correlation?

A

Adjust the coefficient standard errors using the Hansen method.

92
Q

What is multi collinearity?

A

When 2 or more IVs are highly correlated with each other.

93
Q

What are the effects of multi collinearity?

A

Coefficient estimates are consistent but unreliable. Standard errors are inflated, so t-statistics are too low, leading to too many Type 2 errors.

94
Q

How to detect multi collinearity?

A

a) The F-test and t-tests give conflicting answers (significant F, insignificant t) while R^2 is high.
b) Variance Inflation Factor

95
Q

How to correct multi collinearity?

A

Drop one of the correlated variables.

96
Q

What is Variance Inflation Factor?

A

We start by regressing one of the IVs against the remaining IVs and calculate the R^2. Then we calculate VIF as 1/(1-R^2).
The higher the VIF, the higher the chance of multicollinearity.
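
A minimal sketch of the VIF formula, taking as input the R^2 from regressing one IV on the others. The function name and the sample R^2 values are illustrative.

```python
def variance_inflation_factor(r_squared):
    """VIF = 1 / (1 - R^2), with R^2 from regressing one IV on the remaining IVs."""
    if not 0 <= r_squared < 1:
        raise ValueError("R^2 must be in the range [0, 1)")
    return 1 / (1 - r_squared)


# An IV that is unrelated to the others versus one that is largely explained by them.
print(variance_inflation_factor(0.0))
print(variance_inflation_factor(0.9))
```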

97
Q

What is BG test to detect serial correlation?

A

It regresses the residuals against the original set of IVs plus one or more lagged residuals. Checked using an F-test.

98
Q

Which regression models are used for qualitative (dummy) dependent variables?

A

a) Probit model - based on the normal distribution.
b) Logit model - based on the logistic distribution, which has fatter tails;
uses log odds as the dependent variable.
c) Discriminant model - e.g. Altman Z-score.
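
A minimal sketch of the link between a probability and its log odds, which is what a logit model fits as a linear function of the IVs. Function names are hypothetical.

```python
import math


def log_odds(p):
    """Natural log of the odds p / (1 - p)."""
    return math.log(p / (1 - p))


def probability(z):
    """Inverse of log_odds: converts a fitted log-odds value back to a probability."""
    return 1 / (1 + math.exp(-z))


print(log_odds(0.5))               # even odds give log odds of zero
print(probability(log_odds(0.8)))  # round trip recovers the probability
```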

99
Q

True or False-
A learning curve plots the accuracy rate in the validation or test sample versus the size of the training sample.

A

True