Quantitative methods Flashcards
Durbin-Watson tests for?
Value 0 =
Value 2 =
DW lower =
DW middle =
DW upper =
serial correlation
Value 0 = perfect positive serial correlation (4 = perfect negative)
Value 2 = no serial correlation
DW below lower bound = reject the null of no serial correlation
DW between the bounds = inconclusive
DW above upper bound = do not reject the null
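The DW statistic can be sketched in a few lines of numpy; the residual series below are hypothetical, purely to show the 0 / 2 / 4 behaviour.

```python
import numpy as np

def durbin_watson(resid):
    """DW = sum of squared successive differences of the residuals,
    divided by the sum of squared residuals. Ranges from 0 (perfect
    positive serial correlation) through 2 (none) to 4 (perfect negative)."""
    resid = np.asarray(resid, dtype=float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

# Uncorrelated (white-noise) residuals give a statistic near 2.
rng = np.random.default_rng(0)
dw = durbin_watson(rng.standard_normal(1000))
```

Constant residuals (perfect positive correlation) give 0, and a long alternating series approaches 4.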
T stat =
t stat = (Coefficient - hypothesised value) / SE
The hypothesised value is often zero
R^2 coefficient of determination =
Adjusted R^2 =
Correlation coefficient =
SST =
MSE =
SEE =
R^2 = RSS / SST
Adjusted R^2 = 1 - [(n-1)/(n-k-1)] x (1 - R^2), lower than R^2
Correlation coefficient = square root of R^2 (simple regression only)
SST = SSE + RSS
Total variation = SSE (unexplained) + RSS (explained)
MSE = SSE / (n-k-1)
SEE = square root of MSE
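The whole decomposition can be checked numerically. A minimal sketch with numpy, assuming a hypothetical simple regression (k = 1) fitted by OLS so that SST = SSE + RSS holds exactly:

```python
import numpy as np

# Hypothetical data: simple regression of y on x with an intercept (k = 1).
x = np.arange(6.0)
y = np.array([3.1, 4.9, 7.2, 8.8, 11.3, 12.7])
n, k = len(y), 1

b1, b0 = np.polyfit(x, y, 1)           # OLS slope and intercept
y_hat = b0 + b1 * x

sse = np.sum((y - y_hat) ** 2)         # unexplained variation
rss = np.sum((y_hat - y.mean()) ** 2)  # explained variation
sst = np.sum((y - y.mean()) ** 2)      # total variation; SST = SSE + RSS for OLS

r2 = rss / sst
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
see = np.sqrt(sse / (n - k - 1))       # SEE = sqrt(MSE)
```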
What is multicollinearity?
Signs of multicollinearity
Problems caused by it?
Correcting multicollinearity?
What is multicollinearity?
Two or more independent variables are highly correlated.
Signs of multicollinearity? High R^2 and a significant F-stat but low t-stats; detected using a correlation matrix.
Problems caused by it? Estimates of regression coefficients become unreliable.
Correcting multicollinearity? 1. Remove one or more of the correlated variables 2. Re-run the model
What is serial correlation?
Test
Correcting for serial correlation?
When the error terms of a time-series regression are correlated across periods (e.g. when past values are used to predict security price changes). Use the DW upper and lower limits to test: a DW statistic between the upper bound and 2 indicates no serial correlation.
Test = Durbin-Watson
Correcting for serial correlation? The Hansen method adjusts the standard errors of the regression coefficients (typically upwards) so that inference remains valid.
Which model assigns a 1 or 0 to the value of the dependent variable?
Discriminant analysis models
Hansen method adjusts what?
Adjusts standard errors for both conditional heteroskedasticity and serial correlation
What do probit models test for?
How do they estimate the value of the dependant variable?
What sort of variables can probit test?
Probit models are based on the normal distribution.
They estimate the probability that the dependent variable equals 1.
They test qualitative (binary) dependent variables.
What is homoskedasticity?
What is conditional heteroskedasticity?
What is unconditional heteroskedasticity?
Variance of the error term is constant across all observations
Variance of the error terms changes in a systematic manner that is correlated with the values of the independent variables
Variance of the error term changes in an unsystematic way that is not correlated with the independent variables.
Dickey-Fuller tests for?
Durbin-Watson tests for?
Breusch-Pagan tests for?
Dickey-Fuller tests for non-stationarity (a unit root)
Durbin-Watson tests for serial correlation
Breusch-Pagan tests for conditional heteroskedasticity using a chi-squared statistic.
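The Breusch-Pagan statistic is just n x R^2 from regressing the squared residuals on the independent variables. A hand-rolled sketch with numpy; the data, seed, and heteroskedasticity pattern are hypothetical:

```python
import numpy as np

def breusch_pagan(x, resid):
    """BP statistic = n * R^2 from regressing squared residuals on x.
    Compare against a chi-squared critical value with k degrees of freedom."""
    n = len(resid)
    Z = np.column_stack([np.ones(n), x])          # add an intercept
    e2 = np.asarray(resid, dtype=float) ** 2
    coef, *_ = np.linalg.lstsq(Z, e2, rcond=None)
    fitted = Z @ coef
    r2 = np.sum((fitted - e2.mean()) ** 2) / np.sum((e2 - e2.mean()) ** 2)
    return n * r2

# Hypothetical illustration: error variance that rises with x.
rng = np.random.default_rng(1)
x = rng.standard_normal(200)
scale = x - x.min() + 0.5                          # increases with x
homo_bp = breusch_pagan(x, rng.standard_normal(200))
hetero_bp = breusch_pagan(x, rng.standard_normal(200) * scale)
```

With one regressor the 5% chi-squared critical value is 3.84; the heteroskedastic series should comfortably exceed it.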
Y = b0 + b1X1 + b2X2 + error term
b0 =
b1X1 =
b2X2 =
Y = what you are forecasting (dependent variable)
b0 = intercept
b1X1 = regression coefficient x first independent variable
b2X2 = regression coefficient x second independent variable
T-stat =
(Coefficient - hypothesised value) / SE
The coefficient could be on advertising or hours worked, for example
F Stat =
F Table =
F stat > F table
F Stat = (RSS/k) / (SSE/(n-k-1)) = MSR / MSE
F Table = critical value with k and n-k-1 degrees of freedom
F stat > F table means at least one independent variable significantly explains variance in the dependent variable
k = number of independent variables (e.g. advertising and hours worked, k = 2); n = number of observations, usually years
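A quick worked example of the F-stat formula, with hypothetical sums of squares and sample sizes:

```python
# Hypothetical: k = 2 independent variables, n = 20 observations.
n, k = 20, 2
rss, sse = 80.0, 40.0                     # explained and unexplained variation

f_stat = (rss / k) / (sse / (n - k - 1))  # MSR / MSE
# Compare f_stat against the F-table critical value with (k, n-k-1)
# degrees of freedom; here that is (2, 17).
```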
How many dummy variables needed for 4 quarters?
The number of dummy variables is ONE less than the number of categories. So 3.
Mean reverting level =
MRL = b0 / (1 - b1)
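A one-line worked example, using hypothetical AR(1) coefficients:

```python
# Hypothetical AR(1): x_t = 0.6 + 0.7 * x_{t-1} + e_t
b0, b1 = 0.6, 0.7
mrl = b0 / (1 - b1)  # mean-reverting level: 0.6 / 0.3 = 2.0
```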
AR Model with 101 observations, SE =
What test is used and what level shows AR model is not correctly specified? =
Less than this level shows?
SE = 1 / square root of n = 1 / square root of 100 = 0.1
The t-distribution is used at the 5% level; a residual autocorrelation with a t-stat above about 2 means the model is mis-specified.
A t-stat below this level shows the model is specified correctly.
Random walk =
Random walk unit root lag coefficient b1 =
Mean Reverting =
Random walk = test statistic not significant (cannot reject the unit root)
lag coefficient b1 = 1
Mean reverting = test statistic significant (unit root rejected), |b1| < 1
ARCH Errors =
The error terms are heteroskedastic and the SEs of the regression coefficients are incorrect.
Signs include the coefficient on the lagged squared residual being significantly different from zero in the model.
Dickey-Fuller
a) has a problem with the unit root
b) does not have a problem with the unit root
Dickey-Fuller
a) has a problem with the unit root = fail to reject the null
b) does not have a problem with the unit root = reject the null; the series are cointegrated.
First differenced random walk =
Yt = b0 + error term (b0 = 0 for a random walk without drift)
Where Yt = Xt - Xt-1
A lag coefficient > 1 concludes
Lag coefficient > 1 = The model has an explosive root.
First differencing =
Transforms a random walk into a covariance-stationary series; if the differenced series is stationary with b0 = b1 = 0, the analyst can conclude the original time series is a random walk.
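First differencing is a one-liner in numpy. A sketch using a hypothetical simulated random walk:

```python
import numpy as np

# Hypothetical random walk x_t = x_{t-1} + e_t; its first difference is just e_t.
rng = np.random.default_rng(0)
e = rng.standard_normal(500)
x = np.cumsum(e)   # random walk (non-stationary)
y = np.diff(x)     # first-differenced series, y_t = x_t - x_{t-1} (stationary)
```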
Dendrogram hierarchical clustering with short dendrites indicates?
Dendrites are the vertical lines, and shorter lines indicate more similar clusters of data.
Supervised learning
vs
Unsupervised learning
Supervised learning uses pre-labelled data, such as transactions labelled as fraudulent
vs
Unsupervised learning does not use pre-labelled data and algorithms try to describe the data.
Bagging (bootstrap aggregating) samples original data or old data? What type of samples are used?
Bagging uses the original data and reduces the incidence of overfitting. New data bags are produced by random sampling with replacement.
What is divisive hierarchical clustering?
Supervised or unsupervised?
Begins with one cluster divided into smaller clusters.
Top down process until each cluster has only one observation
Unsupervised learning
What is dimension reduction?
Supervised or Unsupervised?
Identifying major correlated data factors and reducing them into fewer uncorrelated variables, a form of unsupervised learning.
Penalised regression =
Adds a penalty term that increases with the number of included variables (non-zero coefficients), e.g. LASSO.
What is a classification and Regression Tree? CART
What does it minimise?
Supervised or Unsupervised?
Splitting data into two categories using decision trees and binary branching to classify observations. It makes no assumptions about data sets.
They are used to minimise classification errors.
CART is a form of supervised machine learning
Ensemble learning
Ensemble learning results in more accurate and more stable models. Ensemble learning can aggregate both heterogeneous and homogeneous learners.
Base error
Bias error
Variance error
Base error arises from randomness in the data
Bias error arises when a model does not fit the training data well (underfitting)
Variance error arises when the model fits the training data too well, picking up noise (overfitting)
What is overfitting?
What is underfitting?
Which is associated with linear and which with non-linear functions?
When a machine learning model learns the input and target data set too precisely. Overfitting is a non-linear function error.
Underfitting is the opposite and is susceptible to linear function errors.
Centroids k-means clustering?
What does it require?
Centroids k-means clustering is when the algorithm iterates until no observations are moved to new clusters.
It requires a pre-defined number of clusters, 'k', chosen before running the algorithm.
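The iterate-until-no-observation-moves idea can be sketched in numpy; the data points and seed below are hypothetical:

```python
import numpy as np

def k_means(data, k, iters=100, seed=0):
    """Minimal k-means sketch: assign points to the nearest centroid,
    recompute centroids, stop when no observation changes cluster."""
    rng = np.random.default_rng(seed)
    centroids = data[rng.choice(len(data), size=k, replace=False)].astype(float)
    labels = np.full(len(data), -1)
    for _ in range(iters):
        dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                          # no observation moved: converged
        labels = new_labels
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = data[labels == j].mean(axis=0)
    return labels, centroids

# Two well-separated 2-D clusters (hypothetical data).
pts = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9]])
labels, centroids = k_means(pts, k=2)
```

Note that k is passed in up front: the algorithm cannot choose the number of clusters itself.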
What is an eigenvalue?
The proportion of total variance explained by an eigenvector from the initial data.
Agglomerative clustering =
Divisive clustering =
How many observations in the final cluster for each?
Agglomerative clustering is bottom up: it starts with each observation as its own cluster and merges them, so the final cluster contains all items/observations. Clusters increase in size.
Divisive clustering = top down; the final clusters each contain one observation
k-fold cross validation gives an estimate of?
k-fold cross validation gives an estimate of 'out-of-sample' error
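The fold-splitting step can be sketched with numpy; the sample size, fold count, and function name are illustrative assumptions:

```python
import numpy as np

def k_fold_indices(n, k, seed=0):
    """Shuffle n sample indices and split them into k roughly equal folds.
    Each fold serves once as the validation set; training on the remaining
    folds and averaging the k validation errors estimates out-of-sample error."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    return np.array_split(idx, k)

folds = k_fold_indices(10, 5)  # 5 folds of 2 indices each
```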
Random forest classifier?
A collection of classification trees whose outputs are combined, reducing the problem of overfitting.
ML Regression problem target is:
ML classification problem target data is:
ML Regression problem target is continuous
ML classification problem target data is either categorical or ordinal
Which neural layer does learning take place?
Which element of a neural network increases/decreases the strength of an input?
Hidden layer
The activation function increases/decreases strength of input.
Deep learning nets aka:
supervised or unsupervised?
What separates them?
Deep learning nets aka Artificial Neural Networks
BOTH supervised and unsupervised!
What separates them is the many hidden layers (at least three)
Activation of a neural network =
A non-linear function which adjusts the strength of an input.
Number of nodes in a Deep Learning Net DLN is determined how?
The number of nodes is determined by the number of dimensions in a feature set.
Training a neural network is forward or backward?
Both forward propagation and backward propagation (backpropagation) are used to train a neural network.
Different between reinforcement and supervised learning?
Reinforcement learning neither uses labelled data nor gives instantaneous feedback; it learns from delayed rewards.
Data curation stage includes
Data exploration includes
Data curation includes web spidering, which gathers raw data.
Data exploration includes feature selection (choosing which features to keep) and feature engineering (optimising the selected features).
Mutual information: a token such as 'dollar' appearing in all classes of text is assigned a:
A token appearing in only one class of text is assigned a:
Mutual information: a token appearing in all classes of text is assigned a value of 0
A token appearing in only one class of text is assigned a value of 1
Stages of data exploration =
- Exploratory data analysis (EDA)
- Feature selection
- Feature engineering
Structured or Unstructured for:
Standard ML models
Text ML models
Standard ML models = Structured data
Text ML models = Unstructured data
ML Iterative process
Step 1
Step 2
ML Iterative process
Step 1 = Conceptualization
Step 2 = Reconceptualization
4 V’s of big data =
Volume (quantity of data)
Variety (array of data)
Velocity (speed of data creation)
Veracity (reliability/credibility of data)
Trimming
vs
Filtration
vs
Winsorisation
Trimming removes outliers at both extremes
vs
Filtration removes unrequired data
vs
Winsorisation is a data-wrangling step (preparing data for the ML model) where high and low outliers are replaced with the nearest retained values.
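The difference is easy to see in numpy. A sketch using hypothetical data and hypothetical 10th/90th percentile cut-offs:

```python
import numpy as np

# Hypothetical data with one low and one high outlier.
data = np.array([-50.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 90.0])

lo, hi = np.percentile(data, [10, 90])
trimmed = data[(data >= lo) & (data <= hi)]  # trimming: drop the outliers
winsorised = np.clip(data, lo, hi)           # winsorisation: replace them
```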
Precision of data formula =
Recall data formula =
Accuracy formula =
FP =
FN =
Given True Positive TP, False Positive FP, False Negative FN, and True Negative TN.
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
Accuracy = (TP + TN) / (TP + FP + TN + FN)
FP = Type I error
FN = Type II error
Area Under Curve (AUC) value indicating random guessing:
AUC showing predictive ability and higher convexity =
Area Under Curve value for random guessing: 0.5
AUC showing predictive ability and higher convexity = above 0.5, e.g. 0.67
3 stages of model training =
Small data set more likely of:
3 stages of model training =
- Method selection
- Performance evaluation
- Tuning
Underfitting more likely from small data sets
Main advantage of a simulation model over a decision tree?
3 data types of simulation data =
Simulations provide a FULL distribution of outcomes in addition to expected values.
3 data types of simulation data =
1 Historical data
2. Cross sectional data
3. Adopting a statistical distribution.
Steps in simulations
What is a probabilistic variable?
- Determine probabilistic variables
- Define probability distributions for them
- Check for correlations across variables
A probabilistic variable is an uncertain input modelled with a probability distribution; choosing how many to include is a trade-off between the number of variables and the complexity of the simulation.
Which model below better copes with sequential risk and concurrent risk?
Simulations =
Decision trees =
Scenario analysis =
Simulations = accommodate both sequential and concurrent risk
Decision trees = better accommodate sequential risk
Scenario analysis = better accommodates concurrent risk
Random walk signs
Test for a random walk on an AR(1) model =
Slope coefficient is close to 1
An AR(1) model is tested for a random walk (unit root) using the Dickey-Fuller test.
confidence interval =
90%
95%
99%
90% = 1.65, 95% = 1.96, 99% = 2.58
Coefficient +/- critical value x SE
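A quick worked example of the interval formula, with a hypothetical coefficient and standard error:

```python
# Hypothetical regression coefficient and standard error.
coef, se = 0.50, 0.10
z = 1.96  # 95% critical value (1.65 for 90%, 2.58 for 99%)

lower, upper = coef - z * se, coef + z * se  # 95% CI: (0.304, 0.696)
```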
High bias error
High variance error
Under or over fitted?
High bias error indicates underfitting a dataset
High variance error indicates overfitting a dataset
Precision =
Recall =
Accuracy =
F1 =
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
Accuracy = (TP + TN) / (TP + TN + FP + FN)
F1 = 2 x P x R / (P + R)
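All four formulas can be packaged into one small function; the function name and confusion-matrix counts below are hypothetical:

```python
def confusion_metrics(tp, fp, fn, tn):
    """Precision, recall, accuracy, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, accuracy, f1

# Hypothetical counts: a balanced classifier.
p, r, a, f1 = confusion_metrics(tp=8, fp=2, fn=2, tn=8)
```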
What is data wrangling?
Text wrangling (preprocessing) can be essential in making sure you have the best data to work with. It requires performing normalisation, which involves the following:
- lowercasing
- removing stop words such as "the" and "a" because of their many occurrences
- stemming: cutting down a token to its root stem
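The three steps above can be sketched in plain Python. The tiny stop-word list and the strip-a-trailing-s "stemmer" are deliberately crude assumptions, only to show the pipeline shape:

```python
def normalise(text, stop_words=("the", "a", "an")):
    """Lowercase, drop stop words, then crudely stem plural tokens
    by stripping a trailing 's' (a toy stand-in for a real stemmer)."""
    tokens = [t.lower() for t in text.split()]
    tokens = [t for t in tokens if t not in stop_words]
    return [t[:-1] if t.endswith("s") and len(t) > 3 else t for t in tokens]

tokens = normalise("The Analysts Review the Reports")
```

A real pipeline would use a proper stemmer (e.g. Porter) rather than this toy rule.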