More Quant Stuff Flashcards
Linear Regression
In statistics, linear regression is a statistical model which estimates the linear relationship between a scalar response and one or more explanatory variables (also known as dependent and independent variables). The case of one explanatory variable is called simple linear regression; for more than one, the process is called multiple linear regression.
Geometric Interpretation of Linear Regression
A line in 2D space, a plane in 3D space, or more generally a hyperplane, depending on how many explanatory variables are in the model
Under what assumptions is Linear Regression unbiased?
Linearity, No Autocorrelation, Multivariate Normality, Homoscedasticity, No/low Multicollinearity
Hypothesis testing of coefficients
- Set the hypothesis
- Set the significance level, criteria for a decision
- Compute the test statistics
- Make a decision
Can test using manual feature elimination (e.g. build a model with all the features, drop the features that have a high p-value, drop redundant features using correlations and VIF) and automated (e.g. RFE and Regularization) techniques
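A minimal sketch (not part of the original cards) of the manual route, assuming statsmodels is available: fit OLS, read coefficient t-statistics and p-values from the summary, and compute VIFs to spot redundant features. The data and column names are made up for illustration.

```python
# Hypothetical data: x3 is nearly collinear with x1, so VIF should flag it.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "x1": rng.normal(size=200),
    "x2": rng.normal(size=200),
})
X["x3"] = X["x1"] * 0.9 + rng.normal(scale=0.1, size=200)  # nearly collinear with x1
y = 2.0 * X["x1"] - 1.0 * X["x2"] + rng.normal(size=200)

Xc = sm.add_constant(X)            # add intercept column
model = sm.OLS(y, Xc).fit()
print(model.summary())             # t-statistics and p-values for each coefficient

# VIF for each non-constant regressor; values well above 5-10 suggest multicollinearity
for i, col in enumerate(Xc.columns[1:], start=1):
    print(col, variance_inflation_factor(Xc.values, i))
```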
Outlier detection
Z-score/Extreme Value Analysis, Probabilistic and Statistical Modeling, Linear Regression Models, Information Theory Models, High Dimensional Outlier Detection Methods
Cook's distance
Cook’s distance is the scaled change in fitted values, which is useful for identifying outliers in the X values (observations for predictor variables). Cook’s distance shows the influence of each observation on the fitted response values. An observation with Cook’s distance larger than three times the mean Cook’s distance might be an outlier.
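A minimal sketch, assuming statsmodels, of computing Cook's distance for each observation and applying the three-times-the-mean rule of thumb from the card. The data is synthetic, with one influential point injected.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 1.5 * x + rng.normal(size=100)
y[0] += 8.0                                        # inject one influential observation

fit = sm.OLS(y, sm.add_constant(x)).fit()
cooks_d = fit.get_influence().cooks_distance[0]    # Cook's distance per observation
flagged = np.where(cooks_d > 3 * cooks_d.mean())[0]
print("possible outliers at indices:", flagged)
```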
Leverage Point
A leverage point is a point whose x-value is an outlier while its y-value lies on (or near) the fitted line, so the y-value is not an outlier. Such a point therefore goes undetected by y-outlier detection statistics.
p-value
The probability of obtaining test results at least as extreme as the result actually observed, under the assumption that the null hypothesis is correct. A very small p-value means that such an extreme observed outcome would be very unlikely under the null hypothesis. By convention, reject the null hypothesis at p < 0.05.
t-statistic
The ratio of the difference between a parameter's estimated value and its hypothesized value to its standard error.
Maximum Likelihood Estimation
A method of estimating the parameters of an assumed probability distribution, given some observed data. This is achieved by maximizing a likelihood function so that, under the assumed statistical model, the observed data is most probable
Estimation of the mean of a Gaussian
Σx_i/N
Variance of Gaussian
σ² = (1/(n-1))Σ(x_i - mean of x)², the unbiased sample variance (the maximum likelihood estimate divides by n rather than n - 1)
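A minimal sketch (synthetic data) showing that numerically maximizing the Gaussian log-likelihood recovers the closed-form estimates above: the sample mean, and a variance that divides by n (the MLE) rather than the n - 1 used in the sample variance on the card.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(2)
x = rng.normal(loc=3.0, scale=2.0, size=500)

def neg_log_lik(params):
    # parametrize by log(sigma) so the optimizer never tries sigma <= 0
    mu, log_sigma = params
    return -norm.logpdf(x, loc=mu, scale=np.exp(log_sigma)).sum()

res = minimize(neg_log_lik, x0=[0.0, 0.0])
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
print(mu_hat, x.mean())                            # both ~ the sample mean
print(sigma_hat**2, x.var(ddof=0), x.var(ddof=1))  # MLE matches the 1/n variance
```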
Multivariate Gaussian
A generalization of the one-dimensional (univariate) normal distribution to higher dimensions. One definition is that a random vector is said to be k-variate normally distributed if every linear combination of its k components has a univariate normal distribution
If X and Y are joint Gaussians, how do you compute E(X|Y)?
E(X|Y) = E(X) + Cov(X,Y) Cov(Y)^-1 (Y - E(Y))
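A minimal simulation sketch checking the formula: sample (X, Y) from a made-up joint Gaussian, and compare the empirical mean of X near a fixed Y value with the formula's prediction.

```python
import numpy as np

rng = np.random.default_rng(3)
mean = np.array([1.0, 2.0])
cov = np.array([[2.0, 0.8],
                [0.8, 1.0]])
samples = rng.multivariate_normal(mean, cov, size=200_000)
X, Y = samples[:, 0], samples[:, 1]

y0 = 2.5                                   # condition on Y near y0
band = np.abs(Y - y0) < 0.05
empirical = X[band].mean()
theoretical = mean[0] + cov[0, 1] / cov[1, 1] * (y0 - mean[1])
print(empirical, theoretical)              # should be close
```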
Basic Time Series Models
Autoregressive (AR), Integrated (I), Moving-Average (MA), Autoregressive Moving Average (ARMA), Autoregressive Integrated Moving Average (ARIMA), Autoregressive Fractionally Integrated Moving Average (ARFIMA); these can use vector-valued data, with a V prefixed to the name (e.g. VAR, VARMA); Autoregressive Conditional Heteroskedasticity (ARCH) and its relatives (GARCH, TARCH, EGARCH, FIGARCH, CGARCH, etc.); Markov Switching Multifractal (MSMF) for modeling volatility evolution; Hidden Markov Model (HMM) → many of them are in the sktime package
AR(1)
An autoregressive model of order 1: X_t = Σ φ_i X_(t-i) + ε_t, summed from i = 1 to p (p = 1 here), i.e. X_t = φ_1 X_(t-1) + ε_t
MA(1)
Moving Average model of order 1:
X_t = μ + Σ θ_i ε_(t-i) + ε_t, summed from i = 1 to q (q = 1 here), i.e. X_t = μ + θ_1 ε_(t-1) + ε_t
ARMA
Given a time series of data X_t, the ARMA model is a tool for understanding and, perhaps, predicting future values in this series. The AR part involves regressing the variable on its own lagged (i.e., past) values. The MA part involves modeling the error term as a linear combination of error terms occurring contemporaneously and at various times in the past.
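A minimal sketch, assuming statsmodels, that simulates an ARMA(1, 1) process and recovers its coefficients by fitting ARIMA with order (1, 0, 1). The parameter values are made up.

```python
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.tsa.arima.model import ARIMA

ar = np.array([1, -0.6])      # AR lag polynomial 1 - 0.6L, i.e. phi_1 = 0.6
ma = np.array([1, 0.4])       # MA lag polynomial 1 + 0.4L, i.e. theta_1 = 0.4
y = ArmaProcess(ar, ma).generate_sample(nsample=1000)

fit = ARIMA(y, order=(1, 0, 1)).fit()
print(fit.params)             # includes ar.L1 and ma.L1 near 0.6 and 0.4
print(fit.forecast(steps=5))  # forecasts of the next 5 values
```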
Lagrange optimization
A strategy for finding local maxima and minima of a function f(x) subject to an equality constraint g(x) = 0: optimize the Lagrangian L(x, Λ) = f(x) + Λg(x) by setting its partial derivatives with respect to x and Λ equal to zero.
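A minimal worked example with a made-up objective and constraint (maximize f(x, y) = xy subject to x + y = 10), solving the stationarity conditions of the Lagrangian symbolically with sympy.

```python
import sympy as sp

x, y, lam = sp.symbols("x y lam")
f = x * y                 # objective
g = x + y - 10            # constraint g(x, y) = 0
L = f + lam * g           # Lagrangian

# Set all partial derivatives of L to zero and solve
sols = sp.solve([sp.diff(L, x), sp.diff(L, y), sp.diff(L, lam)], [x, y, lam])
print(sols)               # x = 5, y = 5, lam = -5: the constrained optimum
```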
Standard errors of fitted coefficients of models calculation
Calculate the residuals (observed - predicted), calculate the sum of squared residuals (SSE), compute the mean squared error MSE = SSE/(n - k - 1) where n is the number of observations and k is the number of independent variables, compute the variance-covariance matrix of the regression coefficients V = (X'X)^-1 * MSE, and take the square root of each corresponding diagonal element of V.
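A minimal numpy sketch (synthetic data) of the recipe above: residuals, SSE, MSE = SSE/(n - k - 1), V = MSE (X'X)^-1, and standard errors from the square roots of the diagonal of V.

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 200, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])  # intercept + k regressors
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # OLS coefficients
resid = y - X @ beta_hat                       # observed - predicted
sse = resid @ resid                            # sum of squared residuals
mse = sse / (n - k - 1)                        # mean squared error
V = mse * np.linalg.inv(X.T @ X)               # variance-covariance of coefficients
std_errors = np.sqrt(np.diag(V))
print(beta_hat, std_errors)
```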
Standard errors of fitted coefficients of sample means calculation
σ / sqrt(n)
Central Limit Theorem
Under appropriate conditions, the distribution of a normalized version of the sample mean converges to a standard normal distribution. This holds even if the original variables themselves are not normally distributed.
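A minimal simulation sketch: standardized means of samples drawn from a decidedly non-normal (exponential) distribution behave approximately like a standard normal. Sample sizes are chosen arbitrarily.

```python
import numpy as np

rng = np.random.default_rng(5)
n, trials = 50, 10_000
samples = rng.exponential(scale=1.0, size=(trials, n))   # mean 1, variance 1
z = (samples.mean(axis=1) - 1.0) / (1.0 / np.sqrt(n))    # normalized sample means

print(z.mean(), z.std())          # ~0 and ~1
print((np.abs(z) < 1.96).mean())  # ~0.95, as for a standard normal
```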
Bootstrapping
Any test or metric that uses random sampling with replacement (e.g. mimicking the sampling process), and falls under the broader class of resampling methods. Bootstrapping assigns measures of accuracy (bias, variance, confidence intervals, prediction error, etc.) to sample estimates. It’s very simple and it can check the stability of the results, but it depends heavily on the estimator used.
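A minimal sketch of bootstrapping a 95% confidence interval for a sample median; the data and the number of resamples are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(6)
data = rng.lognormal(mean=0.0, sigma=1.0, size=300)

boot_medians = np.array([
    np.median(rng.choice(data, size=data.size, replace=True))  # resample with replacement
    for _ in range(2000)
])
lo, hi = np.percentile(boot_medians, [2.5, 97.5])
print(np.median(data), (lo, hi))   # point estimate and 95% bootstrap interval
```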
Lasso Regression
Least Absolute Shrinkage and Selection Operator; a regression analysis method that performs both variable selection and regularization in order to enhance the prediction accuracy and interpretability of the resulting statistical model.
Ridge Regression
A method of estimating the coefficients of multiple-regression models in scenarios where the independent variables are highly correlated.
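A minimal scikit-learn sketch contrasting the two: Lasso (L1 penalty) drives some coefficients exactly to zero, performing variable selection, while Ridge (L2 penalty) shrinks coefficients and splits weight across highly correlated features. The alpha values are arbitrary.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(7)
X = rng.normal(size=(200, 5))
X[:, 4] = X[:, 0] + rng.normal(scale=0.01, size=200)   # make x4 highly correlated with x0
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=200)

print(Lasso(alpha=0.1).fit(X, y).coef_)   # sparse: some coefficients are exactly 0
print(Ridge(alpha=1.0).fit(X, y).coef_)   # dense but shrunk, weight shared across x0/x4
```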
Regression Trees
Decision tree learning where the target variable takes continuous (numerical) values. The leaves hold predicted values (typically the mean of the training observations that reach them) and branches represent conjunctions of features that lead to those predictions. (When the target takes a discrete set of values, the trees are called classification trees instead.)
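A minimal scikit-learn sketch of a regression tree, whose leaves predict the mean target of the training points that reach them; the data is synthetic.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(8)
X = rng.uniform(0, 10, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=300)

tree = DecisionTreeRegressor(max_depth=3).fit(X, y)
print(tree.predict([[2.0], [8.0]]))   # piecewise-constant predictions from leaf means
```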
Logistic Regression
When you have a binary set of outcomes, it models the log odds of an event as a linear combination of one or more independent variables.
k-Means Clustering
A method of vector quantization, originally from signal processing, that aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean (cluster centers or cluster centroid), serving as a prototype of the cluster.
Boosting
An ensemble meta-algorithm used in supervised learning, primarily for reducing bias (and also variance); a family of machine learning algorithms that convert weak learners into strong ones.
Linearity assumption
The target and each independent variable have a linear relationship
No (or little) autocorrelation assumption
The residuals are not dependent on (correlated with) one another.
Multivariate Normality assumption
The residuals should be normally distributed, with an average of zero (on a Q-Q plot, normally distributed residuals fall along a straight line).
Homoscedasticity assumption
The variance of the error term is the same across all values of the independent variables (if you plot the residual values vs. predicted values, there is no discernible pattern).
No (or low) Multicollinearity assumption
The independent variables are not (or only weakly) correlated with one another.
Z-score
A metric that indicates how many standard deviations a data point is from the sample's mean, assuming a Gaussian distribution. This makes the z-score a parametric method.
z = (x-μ)/σ
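A minimal numpy sketch of z-score outlier flagging; the |z| > 3 threshold is the usual convention, chosen here for illustration.

```python
import numpy as np

rng = np.random.default_rng(9)
x = rng.normal(loc=10.0, scale=2.0, size=1000)
x[0] = 40.0                                # inject an outlier

z = (x - x.mean()) / x.std()               # z-score of each point
print(np.where(np.abs(z) > 3)[0])          # indices of flagged points
```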
DBSCAN
Density-Based Spatial Clustering of Applications with Noise: a clustering algorithm that can identify outliers
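A minimal scikit-learn sketch: DBSCAN labels points belonging to no dense cluster as -1, which can be treated as outliers. The eps and min_samples values are made up.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(10)
X = np.vstack([rng.normal(0, 0.3, size=(100, 2)),
               rng.normal(5, 0.3, size=(100, 2)),
               [[2.5, 2.5]]])              # one isolated point between the clusters

labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)
print(np.where(labels == -1)[0])           # noise points flagged as outliers
```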
Isolation Forest
Isolation Forest's basic principle is that outliers are few and far from the rest of the observations. To build a tree (training), the algorithm randomly picks a feature from the feature space and a random split value between that feature's minimum and maximum, and repeats this over the observations in the training set. To build the forest, an ensemble of such trees is made and their results are averaged.
For prediction, an observation is compared against the splitting value at a node; that node has two child nodes, at which further random comparisons are made. The number of splits the algorithm makes for an instance is called its "path length". As expected, outliers have shorter path lengths than the rest of the observations.
s(x, n) = 2^(-E(h(x))/c(n)), where h(x) is the path length of observation x and c(n) is the average path length of an unsuccessful search in a binary search tree of n nodes.
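A minimal scikit-learn sketch of an Isolation Forest flagging the few far-away observations; the contamination level is an assumption for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(11)
X = rng.normal(size=(500, 2))
X[:5] += 6.0                                # five far-away observations

iso = IsolationForest(n_estimators=100, contamination=0.01, random_state=0).fit(X)
print(np.where(iso.predict(X) == -1)[0])    # indices predicted to be outliers
```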