Computational Statistics Flashcards

1
Q

Validation set method

A

Split the data into a training set and a validation set, fit the model on the training set, and calculate the MSE on the validation set.
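
A minimal sketch of the validation set method, assuming scikit-learn, a linear regression model, and synthetic data (all illustrative choices, not prescribed by the card):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Toy data (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.3, size=200)

# Split into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# Fit on the training set, compute MSE on the validation set
model = LinearRegression().fit(X_train, y_train)
val_mse = mean_squared_error(y_val, model.predict(X_val))
print(f"Validation MSE: {val_mse:.4f}")
```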

2
Q

Holdout Method

A

Perform the validation set method several times and choose the model with the best validation error

3
Q

Validation error

A

The prediction error calculated on the held-out validation set

4
Q

Validation set disadvantages

A
  • Validation set is unreliable without much data
  • Validation error depends heavily on the initial random split of the validation sample

5
Q

LOOCV

A

Leave-One-Out Cross-Validation

6
Q

Leave-One-Out Cross-Validation

A
  • Train the model n times, each time leaving one data point out
  • Calculate each model's test error on the left-out point
  • Report the mean error (a sketch follows below)
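
A minimal LOOCV sketch, assuming scikit-learn's LeaveOneOut splitter and a linear model on synthetic data (illustrative choices only):

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = X @ np.array([2.0, -1.0]) + rng.normal(scale=0.2, size=50)

errors = []
for train_idx, test_idx in LeaveOneOut().split(X):
    # Train on n-1 points, test on the single left-out point
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    errors.append(mean_squared_error(y[test_idx], model.predict(X[test_idx])))

print(f"LOOCV error (mean over n fits): {np.mean(errors):.4f}")
```
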
7
Q

Validation set method cost

A

Cheap (the model is fit only once)

8
Q

LOOCV cost

A

Expensive (the model must be fit n times, once per left-out point)

9
Q

K-Fold Cross validation

A
  • Divide the data into k folds, each fold in turn being left out as the validation set
  • Train the model on each of the k training sets and measure the error on its held-out fold
  • Report the average MSE (a sketch follows below)
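
A minimal 5-fold sketch, again assuming scikit-learn's KFold splitter and a linear model on synthetic data (illustrative only):

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.3, size=200)

fold_mse = []
for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    # Train on k-1 folds, validate on the held-out fold
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    fold_mse.append(mean_squared_error(y[val_idx], model.predict(X[val_idx])))

print(f"5-fold CV MSE: {np.mean(fold_mse):.4f}")
```
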
10
Q

Bias of K-Fold validation error

A
  • The K-fold validation error is too optimistic (because the model with the best error is selected)
11
Q

Nested K-Fold Validation

A

Select the model with K-fold cross-validation and report the error of the selected model on a separate test set
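
One way to sketch this in code is scikit-learn's GridSearchCV inside cross_val_score (the tooling and the Ridge/alpha grid are assumptions for illustration): the inner K-fold selects the model, and the outer folds act as the untouched test set.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 0.0]) + rng.normal(scale=0.3, size=200)

# Inner loop: K-fold selects the regularization strength
inner = GridSearchCV(Ridge(), {"alpha": [0.01, 0.1, 1.0, 10.0]}, cv=5,
                     scoring="neg_mean_squared_error")

# Outer loop: report the selected model's error on data never seen during selection
outer_scores = cross_val_score(inner, X, y, cv=5, scoring="neg_mean_squared_error")
print(f"Nested CV MSE: {-outer_scores.mean():.4f}")
```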

12
Q

Temporal Data

A

Be careful not to include data from any time point later than the one the model should predict

13
Q

Subset selection

A

Try different subsets of the features and select the subset with the best validation error
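
A minimal exhaustive best-subset sketch on a single validation split, assuming scikit-learn and four synthetic features (illustrative only):

```python
import numpy as np
from itertools import combinations
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.3, size=200)

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

best_subset, best_mse = None, np.inf
# Try every non-empty subset of the features and keep the best validation MSE
for k in range(1, X.shape[1] + 1):
    for subset in combinations(range(X.shape[1]), k):
        cols = list(subset)
        model = LinearRegression().fit(X_train[:, cols], y_train)
        mse = mean_squared_error(y_val, model.predict(X_val[:, cols]))
        if mse < best_mse:
            best_subset, best_mse = subset, mse

print(f"Best subset: {best_subset}, validation MSE: {best_mse:.4f}")
```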

14
Q

Feature

A

Input variables

15
Q

Dimensionality Reduction

A

Transform the features into a lower-dimensional feature space
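
A minimal sketch using PCA from scikit-learn as one such transformation (PCA is one example of dimensionality reduction, not the only option):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))

# Project the 10 original features onto 2 principal components
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)                 # (200, 2)
print(pca.explained_variance_ratio_)   # variance captured by each component
```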

16
Q

Regularization

A

Add a penalty term that punishes large coefficients
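
A minimal sketch assuming ridge regression from scikit-learn, where the alpha parameter controls how strongly large coefficients are penalized (lasso would work analogously):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
y = X[:, 0] + rng.normal(scale=0.5, size=50)

# Ridge minimizes RSS + alpha * sum(coef**2); larger alpha shrinks the coefficients
ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)
print("OLS coefficient norm:  ", np.linalg.norm(ols.coef_))
print("Ridge coefficient norm:", np.linalg.norm(ridge.coef_))
```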

17
Q

Target variable

A

The Y variable that the model should predict

18
Q

(x1, y1), (x2, y2), …, (xn, yn)

A

data points

19
Q

x1, x2, …, xn

A

feature vector

20
Q

y1, y2, …, yn

A

target variable: output of the model

21
Q

Hyperplane

A

In a p-dimensional space, a hyperplane is a flat affine subspace of dimension p-1
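
For example, a hyperplane can be written as the set of points X satisfying β0 + β1X1 + … + βpXp = 0; in p = 2 dimensions this is simply a line, and in p = 3 dimensions an ordinary plane.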

22
Q

Separating Hyperplanes

A
  • A hyperplane is used to divide the feature space into two sides (one for each class)
  • Predict a new point's class depending on which side of the hyperplane it falls on
23
Q

classifier margin

A

The width by which the separating hyperplane could be widened without hitting a data point

24
Q

Maximum margin classifier

A

The separating hyperplane with the largest possible margin

25
Q

Soft margin classifier

A

Allow a budget B of total margin violations in order to increase the margin of the classifier, at the cost of some misclassifications
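
A minimal sketch assuming scikit-learn's linear SVC, whose C parameter plays a role comparable to the budget B but inverted: a small C tolerates more margin violations (roughly a large budget), a large C tolerates fewer.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two overlapping classes in 2D (illustrative data)
X = np.vstack([rng.normal(loc=-1.0, size=(50, 2)), rng.normal(loc=1.0, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

# Small C: wider margin, more violations allowed; large C: narrower margin, fewer violations
for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(f"C={C:>6}: {clf.support_vectors_.shape[0]} support vectors")
```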

26
Q

Support vectors

A

Points that are “wrong”: they lie inside the margin, on its border, or on the wrong side of the margin.