Computational Statistics Flashcards
Validation set method
Split the data into a training set and a validation set, fit the model on the training set, and compute the MSE on the validation set.
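A minimal sketch of the validation set method, using illustrative synthetic data and a simple degree-1 polynomial fit (both are assumptions for the example, not part of the card):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data: y = 2x + noise
x = rng.uniform(0, 1, 100)
y = 2 * x + rng.normal(0, 0.1, 100)

# Randomly split into a training half and a validation half
idx = rng.permutation(100)
train, val = idx[:50], idx[50:]

# Fit on the training set only
coef = np.polyfit(x[train], y[train], deg=1)

# Validation MSE on the held-out half
pred = np.polyval(coef, x[val])
mse = np.mean((y[val] - pred) ** 2)
```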
Holdout Method
Perform the validation set method several times and choose the model with the best validation error
Validation error
The prediction error calculated on a held-out set
Validation set disadvantages
- Validation set is unreliable without much data
- Validation error is highly dependent on the initial random split into training and validation samples
LOOCV
Leave one out Cross Validation
Leave one out Cross Validation
- Train the model n times, each time leaving one point out
- Compute each model's test error on its left-out point
- Report the mean error
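The three steps above can be sketched as follows (synthetic data and a degree-1 polynomial fit are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 30)
y = 2 * x + rng.normal(0, 0.1, 30)
n = len(x)

errors = []
for i in range(n):
    mask = np.arange(n) != i                  # leave point i out
    coef = np.polyfit(x[mask], y[mask], deg=1)
    pred = np.polyval(coef, np.array([x[i]]))[0]
    errors.append((y[i] - pred) ** 2)         # test error on the left-out point

loocv_mse = np.mean(errors)                   # report the mean error
```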
Validation set method cost
Cheap
LOOCV cost
Expensive
K-Fold Cross validation
- Divide the data into k folds; each fold serves once as the validation set
- Train the model on each of the k training sets and measure the error on the corresponding validation set
- Report the average MSE
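A compact sketch of K-fold cross-validation; the function name and the polynomial model are illustrative assumptions:

```python
import numpy as np

def k_fold_mse(x, y, k=5, deg=1, seed=0):
    """Average validation MSE over k folds for a polynomial fit."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    folds = np.array_split(idx, k)            # k disjoint validation sets
    mses = []
    for fold in folds:
        train = np.setdiff1d(idx, fold)       # everything except this fold
        coef = np.polyfit(x[train], y[train], deg)
        pred = np.polyval(coef, x[fold])
        mses.append(np.mean((y[fold] - pred) ** 2))
    return float(np.mean(mses))               # report the average MSE
```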
Bias of K-Fold validation error
- The validation error of K-Fold is too optimistic (because the model with the best validation error is selected)
Nested K-Fold Validation
Select the model via K-Fold and report the selected model's error on a separate test set
Temporal Data
Be careful not to include data from any point later than what the model should predict
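One common way to respect this rule is an expanding-window split, where each validation block lies strictly after its training window. This sketch (function name and fold sizing are illustrative assumptions) never lets future data leak into training:

```python
import numpy as np

def expanding_window_splits(n, n_splits=3):
    """Yield (train_idx, val_idx) pairs where the validation block
    always comes strictly after the training window."""
    fold = n // (n_splits + 1)
    for k in range(1, n_splits + 1):
        yield np.arange(0, k * fold), np.arange(k * fold, (k + 1) * fold)
```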
Subset selection
Try different subsets of features and select the subset with the best validation error
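An exhaustive (best-subset) version of this idea can be sketched as follows; `best_subset` and the pluggable `eval_mse` validation routine are illustrative assumptions, and exhaustive search is only feasible for small feature counts:

```python
import numpy as np
from itertools import combinations

def best_subset(X, y, eval_mse):
    """Evaluate every non-empty feature subset with a user-supplied
    validation routine and return the one with the lowest error."""
    p = X.shape[1]
    best, best_mse = None, np.inf
    for r in range(1, p + 1):
        for subset in combinations(range(p), r):
            mse = eval_mse(X[:, subset], y)
            if mse < best_mse:
                best, best_mse = subset, mse
    return best, best_mse
```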
Feature
Input variables
Dimensional Reduction
Transform the features into a smaller feature space
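A standard instance of dimensionality reduction is PCA; this minimal sketch (function name is an illustrative assumption) projects centred data onto its top-d principal components via the SVD:

```python
import numpy as np

def pca_reduce(X, d):
    """Project centred data onto its top-d principal components."""
    Xc = X - X.mean(axis=0)                       # centre each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:d].T                          # d-dimensional representation
```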
Regularization
Add a penalty term for large coefficients
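Ridge regression is the classic example of such a penalty; this sketch (function name is an illustrative assumption) uses the closed-form solution with an L2 penalty that shrinks large coefficients toward zero:

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Least squares plus an L2 penalty lam * ||w||^2:
    solve (X^T X + lam * I) w = X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
```

Larger `lam` punishes large coefficients more, so the fitted weight vector shrinks as `lam` grows.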
Target variable
Y variable the model should predict
(x1, y1), (x2, y2), …, (xn, yn)
data points
x1, x2, … ,xn
feature vector
y1, y2, …, yn
target values: the outputs the model should predict
Hyperplane
In a p-dimensional space, a hyperplane is a flat affine subspace of dimension p-1
Separating Hyperplanes
- A hyperplane is used to divide the feature space into two sides (one for each class)
- Predict a new point's class depending on which side of the hyperplane it lies
classifier margin
The width by which the separating hyperplane's slab could be widened without hitting a data point
Maximum margin classifier
The separating hyperplane with the largest possible margin
Soft margin classifier
Allow a budget B of total margin violations, trading some misclassifications for a larger margin
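A soft-margin linear classifier can be sketched via subgradient descent on the hinge loss; this is one possible implementation, not the canonical formulation, and the function name and learning-rate schedule are illustrative assumptions. Here a larger `C` punishes margin violations more (i.e., a smaller violation budget):

```python
import numpy as np

def soft_margin_svm(X, y, C=1.0, lr=0.01, epochs=500):
    """Linear soft-margin classifier fit by subgradient descent on the
    hinge loss.  Labels y must be in {-1, +1}."""
    n, p = X.shape
    w, b = np.zeros(p), 0.0
    for _ in range(epochs):
        for i in range(n):
            if y[i] * (X[i] @ w + b) < 1:       # point violates the margin
                w += lr * (C * y[i] * X[i] - w / n)
                b += lr * C * y[i]
            else:                                # only shrink w (regularization)
                w -= lr * w / n
    return w, b
```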
Support vectors
Points that are "wrong": they lie inside the margin, on its boundary, or on the wrong side of it