More Quant Stuff Flashcards

1
Q

Linear Regression

A

Linear regression is a statistical model that estimates the linear relationship between a scalar response and one or more explanatory variables (also known as the dependent and independent variables). The case of one explanatory variable is called simple linear regression; for more than one, the process is called multiple linear regression.

2
Q

Geometric Interpretation of Linear Regression

A

A line in 2D space (one explanatory variable), a plane in 3D space (two), and in general a hyperplane, depending on how many variables are interacting

3
Q

Under what assumptions is Linear Regression unbiased?

A

Linearity, No Autocorrelation, Multivariate Normality, Homoscedasticity, No/low Multicollinearity

4
Q

Hypothesis testing of coefficients

A
  1. Set the hypotheses (e.g. H0: coefficient = 0 vs. H1: coefficient ≠ 0)
  2. Set the significance level, the criterion for a decision
  3. Compute the test statistic
  4. Make a decision
    Can test using manual feature elimination (e.g. build a model with all the features, drop the features that have a high p-value, drop redundant features using correlations and VIF) and automated techniques (e.g. RFE and regularization); a sketch follows below
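
A minimal sketch of steps 3-4 using statsmodels (assuming it is installed; the data here is a synthetic placeholder):

import numpy as np
import statsmodels.api as sm

# Toy data: y depends on the first feature but not the second
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 2.0 * X[:, 0] + rng.normal(size=100)

# OLS with an intercept; the summary reports a t-statistic and
# p-value for each coefficient under H0: coefficient = 0
model = sm.OLS(y, sm.add_constant(X)).fit()
print(model.summary())
print(model.pvalues)  # drop features whose p-value exceeds the chosen level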
5
Q

Outlier detection

A

Z-score/Extreme Value Analysis, Probabilistic and Statistical Modeling, Linear Regression Models, Information Theory Models, High Dimensional Outlier Detection Methods

6
Q

Cook's distance

A

Cook’s distance is the scaled change in fitted values, which is useful for identifying outliers in the X values (observations for predictor variables). Cook’s distance shows the influence of each observation on the fitted response values. An observation with Cook’s distance larger than three times the mean Cook’s distance might be an outlier.
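
A brief sketch of that rule of thumb with statsmodels (synthetic data; get_influence() exposes the Cook's distances of a fitted OLS model):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
X = sm.add_constant(rng.normal(size=(50, 1)))
y = X @ np.array([1.0, 2.0]) + rng.normal(size=50)

results = sm.OLS(y, X).fit()
cooks_d, _ = results.get_influence().cooks_distance

# Flag observations whose distance exceeds 3x the mean Cook's distance
print(np.where(cooks_d > 3 * cooks_d.mean())[0])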

7
Q

Leverage Point

A

A leverage point is a point whose x-value is an outlier while its y-value lies near the fitted line (the y-value is not an outlier). Such a point therefore goes undetected by y-outlier detection statistics

8
Q

p-value

A

The probability of obtaining test results at least as extreme as the result actually observed, under the assumption that the null hypothesis is correct. A very small p-value means that such an extreme observed outcome would be very unlikely under the null hypothesis. A common convention is to reject the null hypothesis when p < 0.05

9
Q

t-statistic

A

The ratio of the difference between a parameter's estimated value and its hypothesized value to its standard error: t = (estimate − hypothesized value) / SE(estimate)

10
Q

Maximum Likelihood Estimation

A

A method of estimating the parameters of an assumed probability distribution, given some observed data. This is achieved by maximizing a likelihood function so that, under the assumed statistical model, the observed data is most probable
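
A small illustration, assuming numpy/scipy are available: numerically maximizing the Gaussian log-likelihood recovers the closed-form answers (the sample mean, and the variance with a 1/N divisor):

import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(2)
data = rng.normal(loc=3.0, scale=1.5, size=1000)

# Negative log-likelihood of N(mu, sigma^2); abs() keeps sigma positive
def nll(params):
    mu, sigma = params
    return -np.sum(stats.norm.logpdf(data, loc=mu, scale=abs(sigma)))

res = optimize.minimize(nll, x0=[0.0, 1.0], method="Nelder-Mead")
print(res.x)                    # numeric MLE of (mu, sigma)
print(data.mean(), data.std())  # closed-form MLE (std uses the 1/N divisor)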

11
Q

Estimating the mean of a Gaussian

A

Σx_i/N

12
Q

Variance of Gaussian

A

σ̂² = (1/(n−1)) Σ(x_i − x̄)², the unbiased sample variance (the maximum-likelihood estimate divides by n instead of n − 1)

13
Q

Multivariate Gaussian

A

A generalization of the one-dimensional (univariate) normal distribution to higher dimensions. One definition is that a random vector is said to be k-variate normally distributed if every linear combination of its k components has a univariate normal distribution

14
Q

If X and Y are joint Gaussians, how do you compute E(X|Y)?

A

E(X|Y) = E(X) + cov(X, Y) cov(Y)⁻¹ (Y − E(Y))
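
As a worked instance (parameters assumed for illustration, chosen to reproduce the 1 + Y/2 form this card originally referenced): if E(X) = 1, E(Y) = 0, cov(X, Y) = 1 and var(Y) = 2, then E(X|Y) = 1 + (1/2)(Y − 0) = 1 + Y/2.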

15
Q

Basic Time Series Models

A

Autoregressive (AR), Integrated (I), Moving-Average (MA), Autoregressive Moving Average (ARMA), Autoregressive Integrated Moving Average (ARIMA), Autoregressive Fractionally Integrated Moving Average (ARFIMA); vector-valued versions add an initial V (e.g. VAR, VARMA); Autoregressive Conditional Heteroskedasticity (ARCH) and its relatives (GARCH, TARCH, EGARCH, FIGARCH, CGARCH, etc.); Markov Switching Multifractal (MSMF) for modeling volatility evolution; Hidden Markov Model (HMM) → many of them are in the sktime package
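
For concreteness, a hedged sketch of fitting one of these with statsmodels (sktime offers similar interfaces; the data here is a simulated AR(1) series):

import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Simulate an AR(1) series as toy data
rng = np.random.default_rng(3)
x = np.zeros(500)
for t in range(1, 500):
    x[t] = 0.7 * x[t - 1] + rng.normal()

# order=(p, d, q): AR lags, differencing order, MA lags
fit = ARIMA(x, order=(1, 0, 0)).fit()
print(fit.params)             # estimated constant and AR coefficient
print(fit.forecast(steps=5))  # out-of-sample forecasts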

16
Q

AR(1)

A

An autoregressive model of order 1: X_t = Σ φ_i X_(t−i) + ε_t, summing from i = 1 to p, with p = 1 in this case (so X_t = φ_1 X_(t−1) + ε_t)

17
Q

MA(1)

A

Moving Average model of order 1:
X_t = μ + Σ θ_i ε_(t−i) + ε_t, summing from i = 1 to q, with q = 1 in this case (so X_t = μ + θ_1 ε_(t−1) + ε_t)

18
Q

ARMA

A

Given a time series of data X_t, the ARMA model is a tool for understanding and, perhaps, predicting future values in this series. The AR part involves regressing the variable on its own lagged (i.e., past) values. The MA part involves modeling the error term as a linear combination of error terms occurring contemporaneously and at various times in the past.

19
Q

Lagrange optimization

A

A strategy for finding local maxima and minima of a function f(x) subject to a constraint g(x) = 0, by optimizing the Lagrangian L(x, λ) = f(x) + λg(x) (sign conventions vary): the partial derivatives with respect to x and λ should equal zero.
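
A tiny worked example using sympy (assumed available): maximize f(x, y) = xy subject to x + y = 1; solving the stationarity conditions gives x = y = 1/2.

import sympy as sp

x, y, lam = sp.symbols("x y lam")
f = x * y          # objective
g = x + y - 1      # constraint g(x, y) = 0
L = f + lam * g    # Lagrangian, matching the card's sign convention

# Stationary point: all partial derivatives vanish
sols = sp.solve([sp.diff(L, v) for v in (x, y, lam)], [x, y, lam], dict=True)
print(sols)  # [{x: 1/2, y: 1/2, lam: -1/2}]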

20
Q

Calculating standard errors of fitted model coefficients

A

Calculate the residuals (observed − predicted); compute the sum of squared residuals, SSE; compute the mean squared error MSE = SSE/(n − k − 1), where n is the number of observations and k the number of independent variables; compute the variance-covariance matrix of the regression coefficients, V = (X′X)⁻¹ · MSE; take the square root of each corresponding diagonal element of V. A numpy sketch follows below
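
The same recipe in numpy (synthetic data; n observations, k features plus an intercept):

import numpy as np

rng = np.random.default_rng(4)
n, k = 200, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])  # intercept + k features
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # OLS fit
resid = y - X @ beta_hat                       # observed - predicted
mse = resid @ resid / (n - k - 1)              # SSE / (n - k - 1)
V = np.linalg.inv(X.T @ X) * mse               # variance-covariance of coefficients
print(np.sqrt(np.diag(V)))                     # standard errors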

21
Q

Standard error of the sample mean

A

σ / sqrt(n)

22
Q

Central Limit Theorem

A

Under appropriate conditions, the distribution of a normalized version of the sample mean converges to a standard normal distribution. This holds even if the original variables themselves are not normally distributed.

23
Q

Bootstrapping

A

Any test or metric that uses random sampling with replacement (e.g. mimicking the sampling process), and falls under the broader class of resampling methods. Bootstrapping assigns measures of accuracy (bias, variance, confidence intervals, prediction error, etc.) to sample estimates. It’s very simple and it can check the stability of the results, but it depends heavily on the estimator used.
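
A minimal numpy sketch (synthetic, deliberately non-normal sample; the statistic here is the mean, but any estimator can be plugged in):

import numpy as np

rng = np.random.default_rng(5)
data = rng.exponential(scale=2.0, size=100)

# Resample with replacement and recompute the statistic each time
boot_means = np.array([
    rng.choice(data, size=data.size, replace=True).mean()
    for _ in range(10_000)
])

print(boot_means.std())                        # bootstrap SE of the mean
print(np.percentile(boot_means, [2.5, 97.5]))  # 95% percentile CI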

24
Q

Lasso Regression

A

Least Absolute Shrinkage and Selection Operator; a regression analysis method that performs both variable selection and regularization in order to enhance the prediction accuracy and interpretability of the resulting statistical model.

25
Q

Ridge Regression

A

A method of estimating the coefficients of multiple-regression models in scenarios where the independent variables are highly correlated.
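
A short sklearn sketch contrasting the two penalties from this card and the previous one, on synthetic data (the alpha values are arbitrary illustrations):

import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(6)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=200)  # only 2 features matter

lasso = Lasso(alpha=0.1).fit(X, y)  # L1 penalty: drives coefficients toward exactly 0
ridge = Ridge(alpha=1.0).fit(X, y)  # L2 penalty: shrinks, but keeps all features

print(lasso.coef_)  # sparse: irrelevant features zeroed out (variable selection)
print(ridge.coef_)  # dense: small but nonzero everywhere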

26
Q

Regression Trees

A

Decision tree learning where the target variable takes continuous values (as opposed to classification trees, whose targets are a discrete set of class labels). The leaves hold predicted values (typically the mean of the training targets reaching that leaf) and branches represent conjunctions of feature conditions that lead to those predictions.

27
Q

Logistic Regression

A

When you have a binary set of outcomes, it models the log odds of the event as a linear combination of one or more independent variables: log(p/(1 − p)) = β_0 + β_1x_1 + … + β_kx_k.

28
Q

k-Means Clustering

A

A method of vector quantization, originally from signal processing, that aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean (cluster centers or cluster centroid), serving as a prototype of the cluster.
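
A compact sklearn illustration on two synthetic blobs (n_clusters is assumed known here):

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(7)
X = np.vstack([rng.normal(0, 1, size=(50, 2)),   # blob around (0, 0)
               rng.normal(5, 1, size=(50, 2))])  # blob around (5, 5)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)  # the two centroids (cluster means)
print(km.labels_[:5])       # cluster assignment of each observation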

29
Q

Boosting

A

An ensemble meta-algorithm primarily for reducing bias (and also variance) in supervised learning; a family of machine learning algorithms that convert weak learners into strong ones

30
Q

Linearity assumption

A

The target and each independent variable have a linear relationship

31
Q

No (or little) autocorrelation assumption

A

The residuals are independent of one another (no serial correlation between residuals)

32
Q

Multivariate Normality assumption

A

The residuals should be normally distributed with an average of zero (on a normal Q-Q plot, normally distributed residuals fall along a straight line)

33
Q

Homoscedasticity assumption

A

The variance of the error term is the same across all values of the independent variables (if you plot the residuals vs. the predicted values, there is no discernible pattern)

34
Q

No (or low) Multicollinearity assumption

A

The independent variables are not (or are only weakly) correlated with one another

35
Q

Z-score

A

A metric that indicates how many standard deviations a data point is from the sample’s mean, assuming a Gaussian distribution. This makes the z-score a parametric method.
z = (x-μ)/σ
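
A quick numpy/scipy illustration (synthetic sample with one planted outlier; the |z| > 3 cutoff is one common convention):

import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
data = np.append(rng.normal(size=100), [8.0])  # plant one obvious outlier

z = stats.zscore(data)             # (x - mean) / std over the sample
print(np.where(np.abs(z) > 3)[0])  # indices flagged as outliers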

36
Q

DBSCAN

A

Density-Based Spatial Clustering of Applications with Noise: a clustering algorithm that can also identify outliers (points assigned to no cluster are labeled as noise); see the sketch below
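
A minimal sklearn sketch (eps and min_samples are illustrative values; DBSCAN labels noise points −1):

import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(9)
X = np.vstack([rng.normal(0, 0.3, size=(100, 2)), [[5.0, 5.0]]])  # one far point

labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)
print(np.where(labels == -1)[0])  # points labeled -1 are noise/outliers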

37
Q

Isolation Forest

A

Isolation Forest’s basic principle is that outliers are few and far from the rest of the observations. To build a tree (training), the algorithm randomly picks a feature from the feature space and a random split value between that feature’s minimum and maximum. This is done for all the observations in the training set. To build the forest, a tree ensemble is made by averaging all the trees in the forest.

Then for prediction, it compares an observation against the splitting value at a node; that node has two child nodes on which further random comparisons are made. The number of “splittings” made by the algorithm for an instance is called its “path length”. As expected, outliers will have shorter path lengths than the rest of the observations.

s(x, n) = 2^(−E(h(x))/c(n)), where h(x) is the path length and c(n) is the average path length of an unsuccessful search in a binary search tree of n nodes (a normalizing factor)
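
A short sklearn sketch (synthetic data with one planted anomaly; default parameters assumed reasonable):

import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(10)
X = np.vstack([rng.normal(size=(200, 2)), [[6.0, 6.0]]])

iso = IsolationForest(random_state=0).fit(X)
print(iso.predict(X)[-5:])       # -1 = outlier, 1 = inlier
print(iso.score_samples(X)[-1])  # lower score = shorter path = more anomalous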