Machine Learning Flashcards

Question

What is a biased model?

Answer 1

A model that fails to capture salient features of the underlying pattern (i.e. it underfits), and therefore produces poor predictions

Answer 2

Unwilling to spend. In the context of machine learning, this is used to describe more conservative and simpler models

Answer 3

- limit the number of terminal nodes by increasing the minimum RSS reduction required for a split (this is not optimal: a split with a small RSS reduction may enable a large reduction further down, so useful partitions can be excluded)
- penalize the model for producing predictions that fit the data too closely (therefore Penalized Obj = Obj + Penalty)
  * we are still trying to minimize this new objective function

Answer 4

Penalized obj. = Shannon entropy + alpha|J|
* where |J| is the number of terminal nodes in the tree and alpha is a scaling quantity that we choose. Alpha is therefore a hyperparameter
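A minimal R sketch of this penalized objective, assuming the entropy term is the size-weighted sum of the leaf entropies (base-2 logs). The helper names `shannon_entropy` and `penalized_objective` and all the numbers are hypothetical, chosen so the larger tree is a genuine refinement of the smaller one.

```r
# Shannon entropy of one node, given its class proportions p
shannon_entropy <- function(p) {
  p <- p[p > 0]                       # treat 0 * log(0) as 0
  -sum(p * log2(p))
}

# Hypothetical helper: size-weighted leaf entropy + alpha * |J|
# leaf_props: list of class-proportion vectors, one per terminal node
# leaf_n:     number of training observations in each terminal node
penalized_objective <- function(leaf_props, leaf_n, alpha) {
  weights <- leaf_n / sum(leaf_n)
  total_entropy <- sum(weights * sapply(leaf_props, shannon_entropy))
  total_entropy + alpha * length(leaf_props)   # |J| = number of terminal nodes
}

# Made-up trees: each leaf of the 2-leaf tree is split in two in the 4-leaf tree
small_leaves <- list(c(0.80, 0.20), c(0.30, 0.70)); small_n <- c(50, 50)
big_leaves   <- list(c(0.96, 0.04), c(0.64, 0.36), c(0.20, 0.80), c(0.40, 0.60))
big_n        <- c(25, 25, 25, 25)

penalized_objective(small_leaves, small_n, alpha = 0)    # unpenalized: bigger tree wins
penalized_objective(big_leaves,   big_n,   alpha = 0)
penalized_objective(small_leaves, small_n, alpha = 0.1)  # penalized: smaller tree wins
penalized_objective(big_leaves,   big_n,   alpha = 0.1)
```

With alpha = 0 the deeper tree always looks at least as good; a positive alpha makes each extra terminal node pay for itself.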

Answer 5

The process of cutting a large tree back to a smaller sub-tree to avoid overfitting

Answer 6

The process of reducing the complexity of a tree by using an adjusted cost function which penalizes the number of terminal nodes (and hence overfitting)

Answer 7

An optimal sub-tree which is sufficiently generalizable for the prediction problem at hand

Answer 8

We select the simplest tree whose validation (e.g. cross-validated) error overlaps, within one standard error, with that of the best-performing tree
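One common form of this selection is the one-standard-error rule. A minimal R sketch, assuming we already have a mean validation error and its standard error for each candidate tree, ordered from simplest to most complex; all numbers below are made up.

```r
# Hypothetical cross-validation results, simplest tree first
n_leaves <- c(2, 4, 6, 8, 10)
cv_mean  <- c(0.30, 0.22, 0.20, 0.19, 0.195)     # mean validation error per tree
cv_se    <- c(0.020, 0.015, 0.015, 0.014, 0.016) # standard error of that mean

best      <- which.min(cv_mean)                  # best-performing tree
threshold <- cv_mean[best] + cv_se[best]         # one standard error above it
chosen    <- min(which(cv_mean <= threshold))    # simplest tree inside the band
n_leaves[chosen]                                 # 6
```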

Answer 9

We divide the data into:
- a training set, which will be used to train the model
- a validation set, which will be used to tweak the model and validate its performance
- a test set, which we only use once, when reporting the true performance of the model
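A minimal R sketch of a random three-way split. The 60/20/20 proportions and the placeholder data frame `dat` are assumptions for illustration only.

```r
set.seed(1)
n   <- 1000
dat <- data.frame(x = rnorm(n), y = rnorm(n))   # placeholder data

# Shuffle the row indices once, then cut them into three disjoint blocks
idx     <- sample(n)
n_train <- floor(0.6 * n)
n_valid <- floor(0.2 * n)

train_set <- dat[idx[1:n_train], ]
valid_set <- dat[idx[(n_train + 1):(n_train + n_valid)], ]
test_set  <- dat[idx[(n_train + n_valid + 1):n], ]   # touch only at reporting time
```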

Answer 10

- We split the data into K partitions (folds)
- K-1 folds are used to train the model and the remaining fold is used for validation
- the folds are then cycled until every fold has been used once for validation
  * the test set is left untouched until reporting time
  * we use more of our data for training
  * we obtain a small sample of K validation errors, from which we can calculate a mean and variance
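A minimal R sketch of K-fold cross-validation with K = 5, using simulated data and a plain linear model `lm(Y ~ X)` as a stand-in for whatever model is being validated; the data and model choice are assumptions.

```r
set.seed(42)
n   <- 200
dat <- data.frame(X = runif(n, 0, 10))
dat$Y <- 3 + 2 * dat$X + rnorm(n, sd = 1)        # simulated data

K     <- 5
folds <- sample(rep(1:K, length.out = n))        # random fold label for each row

cv_errors <- numeric(K)
for (k in 1:K) {
  train_k <- dat[folds != k, ]                   # K-1 folds for training
  valid_k <- dat[folds == k, ]                   # held-out fold for validation
  fit  <- lm(Y ~ X, data = train_k)
  pred <- predict(fit, newdata = valid_k)
  cv_errors[k] <- mean((pred - valid_k$Y)^2)     # validation MSE on fold k
}

mean(cv_errors)   # average validation error
var(cv_errors)    # variability across the K folds
```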

Answer 11

A type of non-linear model which uses interconnected nodes and mathematical functions to produce responses. Neural networks are primarily used for supervised learning but can also be adapted to unsupervised learning tasks (e.g. autoencoders)

Answer 12

An activation (or activation function)

Answer 13

A weight (or parameter)

Answer 14

The nodes in this layer do not correspond to any observed quantity (unlike the inputs and outputs), so their values never appear in the data; hence they are "hidden"

Answer 15

The number of hidden layers + 1

Answer 16

The jth node in the lth layer of the network

Answer 17

The number of nodes in the lth layer of the network

Answer 18

The weight parameter linking the kth node in layer l-1 and the jth node in layer l

Answer 19

The jth bias in layer l
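Using this notation, a forward pass computes each node as an activation of a weighted sum, z_j^l = sum over k of w_jk^l a_k^(l-1) + b_j^l. A minimal R sketch, assuming a hypothetical 2-3-1 network with logistic hidden activations and an identity output; the weights are random placeholders.

```r
logistic <- function(z) 1 / (1 + exp(-z))

# W[[l]][j, k] = weight linking node k in layer l-1 to node j in layer l
# b[[l]][j]    = bias of node j in layer l
set.seed(7)
W <- list(matrix(rnorm(3 * 2), nrow = 3, ncol = 2),   # inputs (2) -> hidden (3)
          matrix(rnorm(1 * 3), nrow = 1, ncol = 3))   # hidden (3) -> output (1)
b <- list(rnorm(3), rnorm(1))

forward <- function(x, W, b, act = logistic) {
  a <- x                                    # layer 0 = the inputs
  for (l in seq_along(W)) {
    z <- W[[l]] %*% a + b[[l]]              # z_j = sum_k w_jk * a_k + b_j
    a <- if (l < length(W)) act(z) else z   # identity on the output layer (assumed)
  }
  a
}

forward(c(0.5, -1.2), W, b)
```

Here `length(W)` is 2: one hidden layer plus one, matching the layer count above.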

Answer 20

> runif(n, a, b)

Answer 21

> data.frame(Y=someVector, X=someMatrix/Vector)

Answer 22

> library(neuralnet)
> neuralNetModel = neuralnet(Y ~ X, hidden = c(n, n-1), data = yourData)
> plot(neuralNetModel)

Answer 23

Data generating process (i.e. the true underlying function which is producing the data)
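A minimal R sketch that simulates data from a known data generating process, combining `runif` and `data.frame` as in the cards above. The choice of f(x) = sin(2*pi*x) with Gaussian noise is purely an assumption for illustration; in practice the true function is unknown.

```r
set.seed(123)
n <- 100

X      <- runif(n, 0, 1)                    # n draws from Uniform(0, 1)
f_true <- function(x) sin(2 * pi * x)       # hypothetical "true" function
Y      <- f_true(X) + rnorm(n, mean = 0, sd = 0.1)   # add observation noise

sim_data <- data.frame(Y = Y, X = X)
head(sim_data)
```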

Answer 24

The gradient descent algorithm. It assesses the surface defined by the cost function and updates the parameters using the partial derivatives (the slope of the surface), moving downhill towards a local minimum: x(i+1) = x(i) - lambda * g(x(i))
* where g() represents the partial derivatives (the gradient) of the cost function
* where lambda represents the learning rate

Answer 25

Either
- after some predefined number of steps, or
- when the step size falls below some predefined tolerance (i.e. the gradient levels out)
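A minimal R sketch of gradient descent with both stopping rules. Because we are minimizing, each update steps against the gradient: x(i+1) = x(i) - lambda * g(x(i)). The test function f(x1, x2) = (x1 - 3)^2 + 2(x2 + 1)^2 and the default `lambda`, `max_steps` and `tol` values are assumptions.

```r
# g(x): partial derivatives of f(x1, x2) = (x1 - 3)^2 + 2 * (x2 + 1)^2
grad_f <- function(x) c(2 * (x[1] - 3), 4 * (x[2] + 1))

gradient_descent <- function(grad, x0, lambda = 0.1, max_steps = 1000, tol = 1e-8) {
  x <- x0
  for (i in 1:max_steps) {                 # stopping rule 1: fixed step budget
    step <- lambda * grad(x)
    x <- x - step                          # move downhill, against the gradient
    if (sqrt(sum(step^2)) < tol) break     # stopping rule 2: step below tolerance
  }
  list(x = x, steps = i)
}

res <- gradient_descent(grad_f, x0 = c(0, 0))
res$x       # close to the true minimum at (3, -1)
res$steps   # stopped by the tolerance well before max_steps
```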

Answer 26

Mean Square Error

Answer 27

MSE = (1/N) * sum over i = 1..N of (prediction_i - actual_i)²
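The formula translates directly into a one-line R function; the function name `mse` and the example vectors are illustrative.

```r
mse <- function(prediction, actual) mean((prediction - actual)^2)

mse(c(2.5, 0.0, 2.0, 8.0), c(3.0, -0.5, 2.0, 7.0))   # returns 0.375
```

The squared errors here are 0.25, 0.25, 0 and 1, whose mean is 0.375.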

Answer 28

- the logistic function [output between 0 and 1]
- the rectified linear unit function (ReLU) [output between 0 and inf]
- the hyperbolic tangent function (tanh) [output between -1 and 1]
- the identity function [output between -inf and inf]
  * typically used for regression problems
* refer to notes for their equations
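The four activation functions can be written in base R as follows; the function names are illustrative, and `tanh_act` simply wraps base R's `tanh`.

```r
logistic     <- function(z) 1 / (1 + exp(-z))   # output in (0, 1)
relu         <- function(z) pmax(0, z)          # output in [0, inf)
tanh_act     <- function(z) tanh(z)             # output in (-1, 1)
identity_act <- function(z) z                   # output in (-inf, inf)

z <- c(-2, 0, 2)
logistic(z)
relu(z)        # 0 0 2
tanh_act(z)
identity_act(z)
```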

Answer 29

A function which we are trying to maximize or minimize

Answer 30

- MSE for regression problems
- Cross-Entropy Error for classification problems
  * refer to notes for the cross-entropy equation
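A minimal R sketch of the binary (two-class) case of cross-entropy error, using natural logs and clipping probabilities away from 0 and 1 so `log` never returns -Inf. The clipping constant `eps` is an assumption; the multi-class form sums over classes instead.

```r
# y: observed labels in {0, 1}; p: predicted probability that y = 1
cross_entropy <- function(y, p, eps = 1e-15) {
  p <- pmin(pmax(p, eps), 1 - eps)          # clip to avoid log(0)
  -mean(y * log(p) + (1 - y) * log(1 - p))
}

cross_entropy(c(1, 0, 1), c(0.9, 0.1, 0.8))   # about 0.145
```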

Answer 31

A method of penalizing overfitting

Answer 32

Adding a penalty term to the objective function which penalizes the model for overfitting

Answer 33

1) Start with an over-specified, unconstrained model
2) Find a value for the L2 regularization parameter which optimizes validation performance
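A minimal R sketch of these two steps, assuming a linear model with no intercept and using the closed-form ridge (L2-penalized least squares) solution beta = (X'X + lambda I)^(-1) X'y. The simulated data, the 40/20 train/validation split and the grid of `lambdas` are all assumptions.

```r
set.seed(2024)
n <- 60; p <- 20
X <- matrix(rnorm(n * p), n, p)
beta_true <- c(2, -1, 0.5, rep(0, p - 3))   # only 3 useful predictors
y <- drop(X %*% beta_true + rnorm(n))

train_idx <- 1:40; valid_idx <- 41:60

# Step 1: over-specified model (all 20 predictors); lambda = 0 is unconstrained
ridge_fit <- function(X, y, lambda) {
  solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, y))
}

# Step 2: pick the L2 penalty with the lowest validation MSE
lambdas <- c(0, 0.1, 1, 10, 100)
valid_mse <- sapply(lambdas, function(lam) {
  beta <- ridge_fit(X[train_idx, ], y[train_idx], lam)
  mean((y[valid_idx] - X[valid_idx, ] %*% beta)^2)
})

best_lambda <- lambdas[which.min(valid_mse)]
best_lambda
```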

Answer 34

The number of complete passes through the entire training set that the machine learning algorithm has made. If the batch size is the entire training set, then the number of epochs equals the number of iterations.
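The relationship between epochs, batch size and iterations (parameter updates) is simple arithmetic. A minimal R sketch, assuming a final partial batch is kept and counts as one update; the function name `iterations` is hypothetical.

```r
# Number of parameter updates performed over a given number of epochs
iterations <- function(n_train, batch_size, epochs) {
  epochs * ceiling(n_train / batch_size)   # a final partial batch counts as one update
}

iterations(n_train = 1000, batch_size = 1000, epochs = 20)   # full batch: 20
iterations(n_train = 1000, batch_size = 100,  epochs = 20)   # mini-batches: 200
```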


(58 cards)