Practice Machine Learning Flashcards
2019-01-16
caret package in R
The caret package (short for Classification And Regression Training) contains numerous functions that streamline the model training process for complex regression and classification problems, drawing on the rich set of models available in R.
The package utilizes a number of R packages but tries not to load them all at package start-up (by removing formal package dependencies, the package startup time can be greatly decreased). The package “suggests” field includes 30 packages. caret loads packages as needed and assumes that they are installed.
function createDataPartition
Creates stratified random splits of a data set.
stratification
Ensures that the random sampling is done in a way that each class is properly represented in both the training and test sets.
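A base-R sketch of what a stratified split looks like, using the built-in iris data (this mimics the behaviour of createDataPartition rather than calling it, so it runs without caret installed):

```r
# Stratified 80/20 split: sample row indices within each class so
# that every Species keeps the same share in the training set.
set.seed(1)
p <- 0.8  # fraction of each class that goes into training

train_idx <- unlist(lapply(split(seq_len(nrow(iris)), iris$Species),
                           function(idx) sample(idx, floor(p * length(idx)))))

training <- iris[train_idx, ]
testing  <- iris[-train_idx, ]

table(training$Species)  # 40 of each class: proportions preserved
```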
partition
- a wall, screen, or piece of glass used to separate one area from another in a room or vehicle
- the process of dividing a country into two or more separate countries
nR04
the variable nR04 is the number of 4-membered rings in a compound
zero-variance predictor
A predictor that takes a single unique value across the samples and therefore carries no information. For example, a simple split of the data into a training and test set caused three descriptors to have a single unique value in the training set.
function nearZeroVar
The function nearZeroVar can be used to identify near zero-variance predictors in a data set: predictors with a large frequency ratio (count of the most common value divided by the count of the second most common) and a low percentage of unique values. It returns the column numbers of the predictors that violate these two conditions.
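A base-R sketch of those two conditions (the near_zero_var helper below is hypothetical, written to illustrate the idea; its defaults are chosen to match nearZeroVar's documented freqCut = 95/5 and uniqueCut = 10):

```r
# Flag columns whose most common value dominates (high frequency
# ratio) and which take few distinct values (low percent unique).
near_zero_var <- function(df, freq_cut = 95 / 5, unique_cut = 10) {
  which(vapply(df, function(x) {
    tab <- sort(table(x), decreasing = TRUE)
    freq_ratio <- if (length(tab) > 1) tab[1] / tab[2] else Inf
    pct_unique <- 100 * length(unique(x)) / length(x)
    freq_ratio > freq_cut && pct_unique < unique_cut
  }, logical(1)))
}

# Toy data: x2 is nearly constant, x1 is not
set.seed(1)
d <- data.frame(x1 = rnorm(100), x2 = c(rep(0, 99), 1))
near_zero_var(d)  # flags column 2 only
```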
multicollinearity
In statistics, multicollinearity (also collinearity) is a phenomenon in which one predictor variable in a multiple regression model can be linearly predicted from the others with a substantial degree of accuracy.
predictor
An independent variable, sometimes called an experimental or predictor variable, is a variable that is being manipulated in an experiment in order to observe the effect on a dependent variable, sometimes called an outcome variable.
VIF (variance inflation factor)
In linear models, the traditional method for reducing multicollinearity is to identify the offending predictors. For each variable, this statistic measures the increase in the variance of the model parameter estimate in comparison to the optimal situation (i.e., an orthogonal design).
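A hand-rolled sketch of the statistic (the car package's vif() is the usual tool; this version just illustrates the definition VIF_j = 1 / (1 − R²_j), where R²_j comes from regressing predictor j on all the others):

```r
# VIF for each column of a data frame of predictors: regress the
# column on the remaining columns and invert 1 - R^2.
vif <- function(X) {
  vapply(seq_len(ncol(X)), function(j) {
    r2 <- summary(lm(X[, j] ~ ., data = X[, -j, drop = FALSE]))$r.squared
    1 / (1 - r2)
  }, numeric(1))
}

set.seed(1)
x1 <- rnorm(200)
X <- data.frame(x1 = x1,
                x2 = x1 + rnorm(200, sd = 0.1),  # nearly collinear with x1
                x3 = rnorm(200))                 # unrelated to the others
v <- vif(X)
round(v, 1)  # x1 and x2 are heavily inflated; x3 sits near the optimum of 1
```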
PCA
Principal component analysis can reduce the number of variables while maintaining accuracy.
Principal component analysis is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components.
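In base R this transformation is prcomp; the components it returns are uncorrelated by construction:

```r
# PCA on the four numeric iris measurements: center and scale,
# then rotate into orthogonal principal components.
pc <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

summary(pc)            # proportion of variance explained per component
scores <- pc$x[, 1:2]  # keep the first two components as new predictors
round(cor(scores), 3)  # off-diagonal correlation is zero
```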
orthogonal
Relating to or composed of right angles; in statistics, orthogonal variables are uncorrelated.
simple regression model
The idea is to fit a line to a set of data. The fitted line consists of a set of estimated coefficients, one for each predictor. For a new observation, we multiply its covariate values by the coefficients estimated with our prediction model, and the result is the prediction for that new value.
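In R that whole recipe is lm() plus predict(); a minimal example on the built-in mtcars data:

```r
# Fit a line (mpg as a function of weight), then apply the
# estimated coefficients to a new predictor value.
fit <- lm(mpg ~ wt, data = mtcars)
coef(fit)  # intercept and slope

new_car <- data.frame(wt = 3.0)
predict(fit, newdata = new_car)

# Same prediction done by hand: intercept + slope * new value
sum(coef(fit) * c(1, 3.0))
```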
RMSE
Root mean squared error
The root-mean-square deviation or root-mean-square error is a frequently used measure of the differences between values predicted by a model or an estimator and the values observed.
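Computed directly in base R it is a one-liner:

```r
# RMSE: square root of the mean squared difference between
# observed values and model predictions.
rmse <- function(observed, predicted) sqrt(mean((observed - predicted)^2))

fit <- lm(mpg ~ wt, data = mtcars)
rmse(mtcars$mpg, predict(fit))  # in the same units as mpg
```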
Feature selection with caret package in R
Three methods:
- Remove redundant features: findCorrelation
- Rank features by importance: Learning Vector Quantization (LVQ), decision trees
- Select features: Recursive Feature Elimination (RFE), random forest
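A base-R sketch of the first method (the drop_correlated helper below is hypothetical; caret's findCorrelation is smarter about which member of a correlated pair to drop, considering mean absolute correlations):

```r
# Drop one predictor from every pair whose absolute pairwise
# correlation exceeds a cutoff, keeping the earlier column.
drop_correlated <- function(df, cutoff = 0.9) {
  cors <- abs(cor(df))
  drop <- c()
  for (j in seq_len(ncol(cors))) {
    for (i in seq_len(j - 1)) {
      if (cors[i, j] > cutoff && !(i %in% drop) && !(j %in% drop)) {
        drop <- c(drop, j)
      }
    }
  }
  drop
}

set.seed(1)
x1 <- rnorm(100)
d <- data.frame(x1 = x1,
                x2 = x1 + rnorm(100, sd = 0.01),  # redundant copy of x1
                x3 = rnorm(100))
drop_correlated(d)  # drops column 2, which is nearly identical to x1
```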