Machine Learning - Reading 7 Flashcards

Question 1

Q

What are target variables?

Answer

A

The target variable is the dependent variable; it can be continuous, categorical, or ordinal.

Question 2

Q

What are features?

Answer

A

Features are the independent variables (the inputs the model uses to predict the target).

Question 3

Q

What is a training data set?

Answer

A

The training data set is the sample used to fit the model.

Question 4

Q

What is a hyperparameter?

Answer

A

A hyperparameter is a model input specified by the researcher rather than estimated from the data.

Question 5

Q

What is unsupervised learning?

Answer

A

The ML program is not given labeled training data; instead, inputs are provided without any conclusions about those inputs, and the algorithm must find structure in the data on its own.

Question 6

Q

What is deep learning?

Answer

A

Deep learning algorithms are used for complex tasks such as image recognition, natural language processing, and so on.

Question 7

Q

What is supervised learning?

Answer

A

Supervised learning uses labeled training data to guide the ML algorithm toward superior forecasting accuracy.

Question 8

Q

What is overfitting?

Answer

A

Overfitting is an issue with supervised ML that results when a large number of features are included in the data sample. Overfitting has occurred when the noise in the target variable seems to improve the model fit, that is, when random noise is mistaken for a pattern. An overfit model will be less accurate when forecasting on other (out-of-sample) data.

Question 9

Q

What is bias error?

Answer

A

Bias error is the in-sample error resulting from a model with a poor fit (underfitting).

Question 10

Q

What is variance error?

Answer

A

Variance error is the out-of-sample error resulting from overfit models that do not generalize well.
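To make bias and variance error concrete, here is a small sketch on hypothetical toy data (the target is roughly 2 times x plus a little noise). The `memorizer` model is a deliberately overfit, made-up example: it simply returns the training target of the closest training x, so its in-sample error is zero but it carries the training noise into its forecasts. A simple least-squares line has some in-sample error yet generalizes better.

```python
# Hypothetical toy data: the target is roughly 2 * x plus a little noise.
train_x = [1, 2, 3, 4, 5]
train_y = [2.1, 3.9, 6.2, 7.8, 10.1]
test_x = [1.4, 2.6, 3.4, 4.6]
test_y = [2.8, 5.2, 6.8, 9.2]


def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def memorizer(x):
    """Overfit model: returns the training target of the closest training x,
    so it reproduces the training noise exactly."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


# Simpler model: ordinary least squares line fitted to the training data.
mean_x = sum(train_x) / len(train_x)
mean_y = sum(train_y) / len(train_y)
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(train_x, train_y)) / sum(
    (x - mean_x) ** 2 for x in train_x
)
intercept = mean_y - slope * mean_x


def line(x):
    return intercept + slope * x


for name, model in [("memorizer", memorizer), ("linear", line)]:
    in_sample = mse(train_y, [model(x) for x in train_x])
    out_of_sample = mse(test_y, [model(x) for x in test_x])
    print(f"{name}: in-sample MSE={in_sample:.3f}, out-of-sample MSE={out_of_sample:.3f}")
```

The memorizer has no in-sample error but a much larger out-of-sample error (variance error); the line accepts a little in-sample error in exchange for better generalization.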

Question 11

Q

What is base error?

Answer

A

Base errors are residual errors due to random noise; they remain no matter how well the model is specified.

Question 12

Q

What will a graph of a robust, well-generalizing model show?

Answer

A

A robust, well-generalizing model will show an improving accuracy rate as the sample size is increased, and the in-sample and out-of-sample error rates will converge toward a desired accuracy level.

Question 13

Q

What is penalized regression?

Answer

A

Penalized regression models reduce the problem of overfitting by imposing a penalty based on the number of features used in the model.

Question 14

Q

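What is LASSO?

Answer

A

LASSO (least absolute shrinkage and selection operator) is a penalized regression that minimizes the sum of squared residuals plus a penalty equal to a hyperparameter lambda (λ) times the sum of the absolute values of the slope coefficients.

*Because this penalty can shrink coefficients to exactly zero, LASSO automatically eliminates the least predictive features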
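A minimal sketch of the LASSO objective described above, fitted by cyclic coordinate descent (one common fitting method, chosen here for illustration; it is not the only one). The function names, the `lam` parameter (standing in for λ) and the toy data are all hypothetical. The soft-threshold step is what sets a weak coefficient to exactly zero.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return exactly 0 inside [-threshold, threshold]."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso(X, y, lam, n_iter=100):
    """Fit LASSO by cyclic coordinate descent.

    Minimizes: sum of squared residuals + lam * sum(|slope coefficients|).
    X is a list of rows (one list of feature values per observation).
    The intercept is not penalized; it is handled by centering the data.
    """
    n, p = len(X), len(X[0])
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]
    yc = [yi - y_mean for yi in y]
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual: leave feature j out of the current fit.
            partial = [
                yc[i] - sum(Xc[i][k] * beta[k] for k in range(p) if k != j)
                for i in range(n)
            ]
            rho = sum(Xc[i][j] * partial[i] for i in range(n))
            z = sum(Xc[i][j] ** 2 for i in range(n))
            # lam / 2 because the squared-error term is not halved.
            beta[j] = soft_threshold(rho, lam / 2) / z
    intercept = y_mean - sum(b * m for b, m in zip(beta, x_means))
    return intercept, beta


# Hypothetical data generated as y = 1 + 2 * x1 + 0.5 * x2 (x2 is a weak feature).
X = [[1, 2], [2, 1], [3, 0], [4, 1], [5, 2]]
y = [4.0, 5.5, 7.0, 9.5, 12.0]

print(lasso(X, y, lam=0.0))  # no penalty: ordinary least squares
print(lasso(X, y, lam=4.0))  # penalty large enough to drop x2
```

With no penalty the fit recovers both slopes; with `lam=4.0` the weak feature's coefficient is set to exactly zero while the strong feature's slope is shrunk.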

Question 15

Q

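What is a support vector machine?

Answer

A

A support vector machine (SVM) is a linear classification algorithm that separates the data into one of two possible classes. It chooses the separating boundary (hyperplane) that maximizes the margin between the two classes; the observations closest to the boundary are the support vectors.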
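The sketch below shows only how a linear SVM classifies once it is trained; the training step is not shown. The hyperplane `w`, `b` is a hypothetical one assumed to come from an already-fitted SVM, and the helper names are illustrative. The hinge loss is the quantity an SVM minimizes during training: it is zero for points on the correct side and outside the margin.

```python
import math


def decision_value(w, b, x):
    """Signed score w . x + b; its sign gives the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(w, b, x):
    """Assign the observation to class +1 or -1 depending on its side of the hyperplane."""
    return 1 if decision_value(w, b, x) >= 0 else -1


def hinge_loss(w, b, x, label):
    """Zero when the point is on the correct side and outside the margin."""
    return max(0.0, 1.0 - label * decision_value(w, b, x))


def margin_width(w):
    """Distance between the margin boundaries w . x + b = +1 and w . x + b = -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


# Hypothetical hyperplane x1 + x2 - 3 = 0, assumed to come from an already-trained SVM.
w, b = [1.0, 1.0], -3.0
print(predict(w, b, [1, 1]), predict(w, b, [3, 2]), margin_width(w))
```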

Question 16

Q

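What is k-nearest neighbor?

Answer

A

k-nearest neighbor (KNN) classifies a new observation based on its nearness to the observations in the training sample: the observation is assigned to the class most common among its k nearest neighbors.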

Question 17

Q

What is the tradeoff in the specification of k in KNN?

Answer

A

When k is too small, the result is sensitive to noise and the error rate is high; when k is too large, the result is diluted by averaging across too many outcomes.
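A minimal KNN classifier illustrating both KNN cards, assuming Euclidean distance and a simple majority vote (common defaults, not the only choices). The training points are hypothetical; `(2, 3)` plays the role of a noisy observation. With k = 1 the prediction follows that single noisy point, and with k equal to the whole sample the overall majority class wins regardless of where the query lies.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Classify x by majority vote among its k nearest training observations."""
    # Sort training observations by Euclidean distance to x.
    ranked = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training sample: five "A" points and four "B" points,
# where (2, 3) is a noisy "B" point sitting among the "A" points.
train_X = [(1, 1), (1, 2), (2, 1), (2, 2), (1.5, 1.5),
           (6, 6), (6, 7), (7, 6), (2, 3)]
train_y = ["A"] * 5 + ["B"] * 4

print(knn_predict(train_X, train_y, (2, 2.6), k=1))    # prints B (follows the noisy point)
print(knn_predict(train_X, train_y, (2, 2.6), k=3))    # prints A
print(knn_predict(train_X, train_y, (6.5, 6.5), k=3))  # prints B
print(knn_predict(train_X, train_y, (6.5, 6.5), k=9))  # prints A (diluted by the whole sample)
```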

Question 18

Q

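What is the CART method?

Answer

A

Classification and regression trees (CART): classification trees are appropriate when the target variable is categorical and are typically used when the target is binary (regression trees handle a continuous target). At each node, a classification tree assigns observations to one of two possible classifications.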
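The sketch below shows how a single node of a classification tree might choose its binary split on one feature. It uses Gini impurity, the criterion commonly used by CART for classification; the function names and data are hypothetical. A full tree would apply this search recursively to each child node (and across all features).

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0 for a pure node, higher when classes are mixed."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_binary_split(xs, ys):
    """Find the threshold on one feature that gives the purest two child nodes.

    Returns (threshold, weighted_gini); observations with x <= threshold go left.
    """
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue  # a split must send observations to both branches
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if best is None or weighted < best[1]:
            best = (t, weighted)
    return best


print(best_binary_split([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1]))  # prints (3, 0.0)
```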

Question 19

Q

What are the ensemble and random forest methods?

Answer

A

Ensemble learning is the technique of combining predictions from multiple models rather than a single model.
**The ensemble method results in a lower average error rate because the errors (noise) of the different models tend to cancel out.
Random forest is a variant of classification trees whereby a large number of classification trees are trained using data bagged from the same data set, with a random subset of features considered at each split.
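Why combining models lowers the error rate can be shown with a little arithmetic, under a strong simplifying assumption: each model is correct with the same probability p and the models' errors are independent. Real models' errors are usually correlated, so actual gains are smaller. The function name is hypothetical.

```python
from math import comb


def majority_vote_accuracy(n_models, p):
    """Probability that a majority of n_models independent classifiers,
    each correct with probability p, is correct (n_models assumed odd)."""
    needed = n_models // 2 + 1
    return sum(
        comb(n_models, k) * p ** k * (1 - p) ** (n_models - k)
        for k in range(needed, n_models + 1)
    )


for n in (1, 5, 25):
    print(n, round(majority_vote_accuracy(n, 0.6), 4))
```

With p = 0.6, one model is right 60% of the time, while a majority of five such independent models is right about 68% of the time, and the accuracy keeps rising as more models are added.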

Question 20

Q

What is the ensemble method of aggregation of heterogeneous learners?

Answer

A

Different algorithms are combined together via a voting classifier. Each algorithm gets a vote, and the answer with the most votes is selected. Ideally, the models selected will have sufficient diversity in approach, resulting in a greater level of confidence in the predictions.
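A minimal hard-voting classifier along the lines described above. The three "models" are hypothetical, hand-written rules on two made-up features `(x1, x2)`; in practice each would be a different trained algorithm. An odd number of voters avoids ties between two classes.

```python
from collections import Counter


def hard_vote(models, x):
    """Each model casts one vote; the most common predicted class wins."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


# Three hypothetical, deliberately different rules predicting class 1 or 0.
def rule_threshold(x):
    return 1 if x[0] > 0.4 else 0


def rule_count(x):
    return 1 if x[1] >= 2 else 0


def rule_weighted(x):
    return 1 if 0.5 * x[0] + 0.1 * x[1] > 0.3 else 0


models = [rule_threshold, rule_count, rule_weighted]
print(hard_vote(models, (0.45, 0)))  # prints 0: two of the three rules vote 0
print(hard_vote(models, (0.5, 3)))   # prints 1: all three rules vote 1
```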

Question 21

Q

What is the ensemble method of aggregation of homogeneous learners?

Answer

A

The same algorithm is used, but on different training data. The different training data samples can be derived by bootstrapping (random sampling with replacement), a technique known as bagging.
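A sketch of bagging: one algorithm, many bootstrap samples, majority vote. The base learner here is a hypothetical one-feature "stump" that splits halfway between the two class means; it stands in for any algorithm (a random forest would use classification trees instead). The data, seed and function names are assumptions for illustration.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) observations with replacement."""
    return [rng.choice(data) for _ in data]


def _mean(values):
    return sum(values) / len(values)


def fit_stump(sample):
    """Hypothetical base learner for (x, label) pairs with labels 0/1:
    split halfway between the two class means."""
    labels = {label for _, label in sample}
    if len(labels) == 1:  # a bootstrap sample may contain only one class
        only = labels.pop()
        return lambda x: only
    mean0 = _mean([x for x, lab in sample if lab == 0])
    mean1 = _mean([x for x, lab in sample if lab == 1])
    threshold = (mean0 + mean1) / 2
    high = 1 if mean1 > mean0 else 0  # class on the high side of the threshold
    return lambda x: high if x > threshold else 1 - high


def bagging(data, n_models, seed=0):
    """Train the same learner on n_models bootstrap samples; predict by majority vote."""
    rng = random.Random(seed)
    models = [fit_stump(bootstrap_sample(data, rng)) for _ in range(n_models)]

    def predict(x):
        votes = Counter(model(x) for model in models)
        return votes.most_common(1)[0][0]

    return predict


# Hypothetical data: class 0 for x in 1..5, class 1 for x in 6..10.
data = [(x, 0) for x in range(1, 6)] + [(x, 1) for x in range(6, 11)]
predict = bagging(data, n_models=25)
print(predict(2), predict(9))
```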

Question 22

Q

What is principal component analysis? What are eigenvectors and eigenvalues?

Answer

A

**Dimension reduction

PCA: Summarizes the information in a large number of correlated factors into a much smaller set of uncorrelated factors

Eigenvectors: These uncorrelated factors, which are linear combinations of the original features

Eigenvalues: Measure how much of the total variance in the data set each eigenvector explains; an eigenvalue divided by the sum of all eigenvalues gives that eigenvector's proportion of total variance
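For two features, PCA can be worked out by hand: the eigenvectors and eigenvalues of the 2x2 covariance matrix have a closed form. This sketch (hypothetical function names, 2-D only, data assumed not to be constant) returns each eigenvalue, its unit eigenvector, and its share of total variance. Points lying exactly on a line put all the variance in the first component.

```python
import math


def covariance_matrix(points):
    """Sample covariance entries (var_x, cov, var_y) of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    var_x = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    var_y = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    cov = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return var_x, cov, var_y


def pca_2d(points):
    """Return [(eigenvalue, unit eigenvector, share of total variance), ...],
    largest eigenvalue first."""
    a, b, c = covariance_matrix(points)
    total = a + c  # total variance = sum of the eigenvalues
    if b == 0:
        # Features already uncorrelated: the eigenvectors are the coordinate axes.
        pairs = sorted([(a, (1.0, 0.0)), (c, (0.0, 1.0))], reverse=True)
    else:
        mid = (a + c) / 2
        radius = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
        pairs = []
        for lam in (mid + radius, mid - radius):
            # (b, lam - a) solves (Cov - lam * I) v = 0 for a 2x2 symmetric matrix.
            vx, vy = b, lam - a
            norm = math.hypot(vx, vy)
            pairs.append((lam, (vx / norm, vy / norm)))
    return [(lam, vec, lam / total) for lam, vec in pairs]


for eigenvalue, vector, share in pca_2d([(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]):
    print(eigenvalue, vector, share)
```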

Question 23

Q

What is clustering?

Answer

A

Clustering is the process of grouping observations into categories based on similarities in their attributes (called cohesion).

Question 24

Q

What is k-means clustering?

Answer

A

k-means clustering partitions observations into k nonoverlapping clusters, where k is a hyperparameter. Each cluster has a centroid, and each new observation is assigned to the cluster whose centroid is nearest; the centroids are then recalculated and the process repeats until no observation changes cluster.
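A minimal k-means sketch (Lloyd's iteration: assign each point to the nearest centroid, recompute centroids, repeat until nothing changes). The starting centroids are picked by hand here as a simplifying assumption; in practice they are usually chosen randomly or with a scheme such as k-means++. Function names and data are hypothetical.

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def k_means(points, initial_centroids, max_iter=100):
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iter):
        # Assignment step: each observation joins its nearest centroid's cluster.
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = [
            tuple(sum(coords) / len(members) for coords in zip(*members)) if members else c
            for c, members in zip(centroids, clusters)
        ]
        if new_centroids == centroids:  # no change: converged
            break
        centroids = new_centroids
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
print(k_means(points, initial_centroids=[(1, 1), (9, 9)]))  # k = 2
```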

Question 25

Q

What is hierarchical clustering?

Answer

A

Hierarchical clustering builds a hierarchy of clusters without any predefined number of clusters.

In agglomerative (bottom-up) clustering, we start with each observation as its own cluster and add other similar observations to that group, or form another nonoverlapping cluster; the two closest clusters are merged repeatedly until all observations form a single cluster.

A divisive (top-down) algorithm starts with one giant cluster and then partitions that cluster into smaller and smaller clusters.
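A sketch of the agglomerative approach only (the divisive approach is not shown). It uses single linkage, meaning the distance between two clusters is the distance between their closest members; this is one of several possible linkage choices and is an assumption here. The function records each merge and its distance, which is the information a dendrogram displays.

```python
import math


def agglomerative(points):
    """Merge the two closest clusters until one remains.

    Returns a list of (members of the merged cluster, merge distance).
    """
    clusters = [[p] for p in points]  # every observation starts as its own cluster
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(math.dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merged = clusters[i] + clusters[j]
        merges.append((sorted(merged), d))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


for members, distance in agglomerative([(1,), (2,), (10,), (12,), (30,)]):
    print(distance, members)
```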

Question 26

Q

What are neural networks?

Answer

A

Neural networks are constructed as nodes connected by links. The input layer consists of nodes with values for the features. These values are scaled so that the information from multiple nodes is comparable.

Question 27

Q

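What are neurons in neural networks?

Answer

A

Neurons are the nodes in the hidden layers that follow the input layer; each neuron has a summation operator and an activation function.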

Question 28

Q

What is a summation operator in neural networks?

Answer

A

The summation operator collates the information (a weighted sum of the neuron's inputs) and passes it on to an activation function.

Question 29

Q

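What is an activation function in neural networks?

Answer

A

The activation function generates the neuron's output value from the input value it receives from the summation operator, typically by applying a nonlinear transformation.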

Question 30

Q

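What is forward propagation in neural networks?

Answer

A

Forward propagation is the passing of the value generated by a neuron's activation function to other neurons in the next hidden layer (and, eventually, to the output layer).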
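The neuron cards above (summation operator, activation function, forward propagation) fit together as in this sketch. It assumes a sigmoid activation, one common choice among several, and a hypothetical network with 2 inputs, 2 hidden neurons and 1 output neuron; all weights are made up.

```python
import math


def sigmoid(z):
    """Activation function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias):
    # Summation operator: weighted sum of the incoming values plus a bias.
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Activation function turns that sum into the neuron's output.
    return sigmoid(total)


def forward(inputs, layers):
    """Forward propagation: each layer's outputs become the next layer's inputs.

    layers is a list of layers; each layer is a list of (weights, bias) per neuron.
    """
    values = inputs
    for layer in layers:
        values = [neuron(values, weights, bias) for weights, bias in layer]
    return values


# Hypothetical network: 2 inputs -> 2 hidden neurons -> 1 output neuron.
layers = [
    [([0.5, -0.5], 0.0), ([0.25, 0.25], -0.5)],  # hidden layer
    [([1.0, -1.0], 0.0)],                        # output layer
]
print(forward([1.0, 2.0], layers))
```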

Question 31

Q

What is backward propagation in neural networks?

Answer

A

Backward propagation is the process employed to revise the weights in the summation operator, based on the errors in the network's output, as the network learns.
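A minimal sketch of one backward-propagation update for a single sigmoid neuron, assuming a halved squared-error loss and plain gradient descent (the learning rate is a hyperparameter). In a multi-layer network the same chain-rule step is repeated layer by layer, from the output back toward the inputs. All names and numbers are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Summation operator followed by a sigmoid activation."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def loss(weights, bias, x, target):
    """Squared error, halved so its derivative is simpler."""
    return 0.5 * (predict(weights, bias, x) - target) ** 2


def backprop_step(weights, bias, x, target, learning_rate):
    """One gradient-descent update of a single neuron's weights and bias."""
    a = predict(weights, bias, x)
    # Chain rule: dLoss/dz = (a - target) * sigmoid'(z), and sigmoid'(z) = a * (1 - a).
    delta = (a - target) * a * (1 - a)
    new_weights = [w - learning_rate * delta * xi for w, xi in zip(weights, x)]
    new_bias = bias - learning_rate * delta
    return new_weights, new_bias


w, b = backprop_step([0.0, 0.0], 0.0, [1.0, 1.0], target=1.0, learning_rate=1.0)
print(w, b)  # prints [0.125, 0.125] 0.125
```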

Question 32

Q

What is deep learning?

Answer

A

Deep learning uses neural networks with many hidden layers.

Question 33

Q

What is reinforcement learning?

Answer

A

Reinforcement learning algorithms have an agent that seeks to maximize a defined reward given defined constraints, learning by trial and error from the rewards its actions produce.
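One well-known reinforcement learning algorithm is Q-learning; the sketch below applies it to a hypothetical environment: a four-state corridor where the agent can move left or right, the walls are the constraints, and reaching the right end pays a reward of 1. The learning rate, discount factor, exploration rate and episode count are assumed values. After training, the agent's greedy policy should be to move right in every state.

```python
import random

N_STATES = 4      # states 0..3; state 3 is terminal and pays the reward
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Hypothetical environment: a short corridor with the reward at the right end."""
    next_state = max(state - 1, 0) if action == 0 else state + 1
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def q_learning(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # estimated value of each (state, action)
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on episode length
            if state == N_STATES - 1:
                break
            # Epsilon-greedy: usually exploit the best-known action, sometimes explore.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward = step(state, action)
            future = 0.0 if next_state == N_STATES - 1 else max(q[next_state])
            # Move the estimate toward reward + discounted value of the next state.
            q[state][action] += alpha * (reward + gamma * future - q[state][action])
            state = next_state
    return q


q = q_learning()
policy = ["right" if q[s][1] > q[s][0] else "left" for s in range(N_STATES - 1)]
print(policy)
```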