Practice Machine Learning Flashcards

1
Q

2019-01-16

caret package in R

A

The caret package (short for Classification And Regression Training) contains numerous tools for developing predictive models using the rich set of models available in R. Its functions streamline the model training process for complex regression and classification problems.
The package utilizes a number of other R packages but tries not to load them all at package start-up (by removing formal package dependencies, the package startup time can be greatly decreased). The package “Suggests” field includes 30 packages; caret loads them as needed and assumes that they are installed.
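A minimal sketch of a typical caret workflow (assumes the caret and rpart packages are installed; the iris data and the choice of a decision tree are illustrative):

```r
library(caret)

data(iris)
set.seed(1)
# train() gives a uniform interface to many model types; caret
# loads the underlying package (here rpart) only when needed
fit <- train(Species ~ ., data = iris, method = "rpart")
print(fit)
```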

2
Q

function createDataPartition

A

Creates stratified random splits of a data set.
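A minimal sketch of a stratified 75/25 split (assumes caret is installed; the iris data is illustrative):

```r
library(caret)

data(iris)
set.seed(1)
# Sampling is stratified within each level of Species
inTrain  <- createDataPartition(iris$Species, p = 0.75, list = FALSE)
training <- iris[inTrain, ]
testing  <- iris[-inTrain, ]
table(training$Species)  # class proportions mirror the full data
```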

3
Q

stratification

A

Ensures that the random sampling is done in a way that guarantees each class is properly represented in both the training and test sets.

4
Q

partition

A
  1. a wall, screen, or piece of glass used to separate one area from another in a room or vehicle
  2. the process of dividing a country into two or more separate countries
5
Q

nR04

A

The variable nR04 is the number of 4-membered rings in a compound.

6
Q

zero-variance predictor

A

A predictor that takes only a single unique value and so carries no information. For example, a simple split of the data into a test and training set can cause some descriptors to have a single unique value in the training set.

7
Q

function nearZeroVar

A

The function nearZeroVar can be used to identify near zero-variance predictors in a dataset: those with very few unique values relative to the number of samples, and a large ratio of the frequency of the most common value to the frequency of the second most common value. It returns an index of the column numbers that violate these two conditions.
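A minimal sketch (assumes caret is installed; the toy columns are illustrative):

```r
library(caret)

df <- data.frame(
  x1 = rnorm(100),         # ordinary predictor, many unique values
  x2 = rep(1, 100),        # zero variance: a single unique value
  x3 = c(rep(0, 99), 1)    # near zero variance: dominated by one value
)
nzv <- nearZeroVar(df)     # indices of offending columns
df_filtered <- df[, -nzv]  # drop them before modeling
```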

8
Q

multicollinearity

A

In statistics, multicollinearity (also collinearity) is a phenomenon in which one predictor variable in a multiple regression model can be linearly predicted from the others with a substantial degree of accuracy.

9
Q

predictor

A

An independent variable, sometimes called an experimental or predictor variable, is a variable that is being manipulated in an experiment in order to observe the effect on a dependent variable, sometimes called an outcome variable.

10
Q

VIF

A

Variance inflation factor. In linear models, the traditional method for reducing multicollinearity is to identify the offending predictors using the VIF.
For each variable, this statistic measures the increase in the variance of the model parameter estimate in comparison to the optimal situation (i.e., an orthogonal design).
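A minimal sketch computing VIF by hand in base R, using the definition VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the others (the simulated data is illustrative):

```r
set.seed(1)
x1 <- rnorm(100)
x2 <- x1 + rnorm(100, sd = 0.1)  # nearly collinear with x1
x3 <- rnorm(100)                 # independent predictor
X  <- data.frame(x1, x2, x3)

vif <- sapply(names(X), function(v) {
  # Regress predictor v on all the other predictors
  r2 <- summary(lm(reformulate(setdiff(names(X), v), v), data = X))$r.squared
  1 / (1 - r2)
})
round(vif, 1)  # x1 and x2 show large VIFs; x3 stays near 1
```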

11
Q

PCA

A

Principal component analysis is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. It can reduce the number of variables while maintaining accuracy.
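A minimal sketch with base R's prcomp (no extra packages needed; the iris data is illustrative, and scaling is a common choice when predictors have different units):

```r
data(iris)
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)
summary(pca)            # proportion of variance explained per component
scores <- pca$x[, 1:2]  # first two components: new, uncorrelated predictors
```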

12
Q

orthogonal

A

relating to or composed of right angles

13
Q

simple regression model

A

The idea is that we fit a line to a set of data. That line consists of a set of coefficients, one multiplied by each of the predictors. For new data, we multiply the new covariate values by the coefficients estimated with our prediction model, which gives us a prediction for the new value.
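A minimal sketch with base R's lm() (the mtcars data and predictors are illustrative):

```r
data(mtcars)
fit <- lm(mpg ~ wt, data = mtcars)  # estimate the coefficients
coef(fit)                           # intercept and slope of the fitted line
newdata <- data.frame(wt = c(2.5, 3.5))
predict(fit, newdata)               # coefficients multiplied by new values
```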

14
Q

RMSE

A

Root mean squared error
The root-mean-square deviation or root-mean-square error is a frequently used measure of the differences between values predicted by a model or an estimator and the values observed.
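A minimal worked example in base R (the observed and predicted values are made up for illustration):

```r
# RMSE = sqrt(mean((predicted - observed)^2))
observed  <- c(3.0, 5.0, 2.5, 7.0)
predicted <- c(2.5, 5.0, 3.0, 8.0)
rmse <- sqrt(mean((predicted - observed)^2))
rmse  # about 0.612 here
```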

15
Q

Feature selection with caret package in R

A

Three methods:

  1. Remove redundant features: findCorrelation
  2. Rank features by importance: Learning Vector Quantization (LVQ), decision tree
  3. Select features: Recursive Feature Elimination (RFE), random forest
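A minimal sketch of method 1 above, dropping highly correlated predictors with findCorrelation (assumes caret is installed; the iris data and the 0.75 cutoff are illustrative):

```r
library(caret)

data(iris)
corMat   <- cor(iris[, 1:4])
tooHigh  <- findCorrelation(corMat, cutoff = 0.75)  # columns to remove
filtered <- iris[, 1:4][, -tooHigh]                 # reduced predictor set
```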
16
Q

set.seed( )

A

Sets the seed of R's random number generator, so the random numbers are the same every run, and they would continue to be the same no matter how far out in the sequence we went.
Tip: use the set.seed function when running simulations to ensure all results, figures, etc. are reproducible.
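A minimal demonstration in base R:

```r
set.seed(42)
a <- rnorm(3)   # three random draws
set.seed(42)
b <- rnorm(3)   # same seed, so the same three draws
identical(a, b) # TRUE
```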