Machine Learning - Reading 7 Flashcards

1
Q

What are target variables?

A

The target variable is the dependent variable; it can be continuous, categorical, or ordinal.

2
Q

What are features?

A

Features are the independent variables.

3
Q

What is a training data set?

A

The training data set is the sample used to fit the model.

4
Q

What is a hyperparameter?

A

A hyperparameter is a model input specified by the researcher.

5
Q

What is unsupervised learning?

A

The ML program is not given labeled training data; instead, inputs are provided without any conclusions about those inputs.

6
Q

What is deep learning?

A

Deep learning algorithms are used for complex tasks such as image recognition and natural language processing.

7
Q

What is supervised learning?

A

Supervised learning uses labeled training data to guide the ML algorithm toward superior forecasting accuracy.

8
Q

What is overfitting?

A

Overfitting is an issue with supervised ML that results when a large number of features are included in the data sample. Overfitting has occurred when the noise in the target variable seems to improve the model fit. An overfit model produces less accurate forecasts on other (out-of-sample) data.

9
Q

What is bias error?

A

Bias error is the in-sample error resulting from a model with a poor fit.

10
Q

What is variance error?

A

Variance error is the out-of-sample error resulting from overfit models that do not generalize well.

11
Q

What is base error?

A

Base errors are residual errors due to random noise.

12
Q

What will a graph of a robust, well-generalized model show?

A

A robust, well-generalizing model will show an improving accuracy rate as the sample size increases, and the in-sample and out-of-sample error rates will converge toward a desired accuracy level.

13
Q

What is penalized regression?

A

Penalized regression models reduce overfitting by imposing a penalty based on the number of features used in the model.

14
Q

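What is LASSO?

A

LASSO (least absolute shrinkage and selection operator) is a penalized regression whose penalty term is based on the sum of the absolute values of the slope coefficients.

*Automatically eliminates the least predictive features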
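
A minimal sketch of the penalty idea, assuming the usual form "sum of squared residuals plus a hyperparameter times the sum of absolute slope coefficients". The function name `lasso_objective`, the hyperparameter name `lam`, and the data are hypothetical, chosen only for illustration.

```python
def lasso_objective(X, y, intercept, slopes, lam):
    """Sum of squared residuals plus lam * sum of |slope coefficients|."""
    sse = 0.0
    for row, target in zip(X, y):
        prediction = intercept + sum(b * x for b, x in zip(slopes, row))
        sse += (target - prediction) ** 2
    penalty = lam * sum(abs(b) for b in slopes)
    return sse + penalty


# Hypothetical data: y depends only on the first feature; the second is noise.
X = [[1.0, 0.5], [2.0, 0.1], [3.0, 0.9]]
y = [2.0, 4.0, 6.0]

# Giving the noise feature a nonzero slope raises the objective,
# so the minimizer is pushed toward setting that slope to zero.
with_noise_feature = lasso_objective(X, y, 0.0, [2.0, 0.1], lam=1.0)
without_noise_feature = lasso_objective(X, y, 0.0, [2.0, 0.0], lam=1.0)
print(with_noise_feature > without_noise_feature)
```

The sketch only evaluates the objective for two candidate coefficient sets; it does not perform the minimization itself.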

15
Q

What is a support vector machine?

A

A support vector machine is a linear classification algorithm that separates the data into one of two possible classes.

16
Q

What is k-nearest neighbor?

A

K-nearest neighbor (KNN) classifies an observation based on its nearness to observations in the training sample.

17
Q

What is the tradeoff in the specification of k in KNN?

A

When k is too small, the error rate is high; when k is too large, the result is diluted by averaging across too many outcomes.
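
A small sketch of the tradeoff, assuming squared Euclidean distance and a simple majority vote. The function `knn_classify` and the toy data (including one deliberately noisy point) are hypothetical.

```python
from collections import Counter


def knn_classify(train, query, k):
    """train: list of ((x1, x2), label). Returns the majority label of the k nearest."""
    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    nearest = sorted(train, key=lambda item: sq_dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


train = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
    ((3.0, 3.0), "B"), ((3.2, 2.9), "B"), ((2.8, 3.1), "B"), ((3.1, 3.3), "B"),
    ((1.4, 1.3), "B"),  # noisy point sitting inside the "A" region
]
query = (1.3, 1.2)

print(knn_classify(train, query, k=1))  # B: follows the single noisy neighbor
print(knn_classify(train, query, k=3))  # A: the local neighborhood outvotes the noise
print(knn_classify(train, query, k=8))  # B: every point votes, so the larger class wins
```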

18
Q

What is the CART method?

A

Classification trees are appropriate when the target variable is categorical, and are typically used when the target is binary. Classification trees assign observations to one of two possible classifications at each node.

19
Q

What are the ensemble and random forest methods?

A

Ensemble learning is the technique of combining predictions from multiple models rather than relying on a single model.
**The ensemble method results in a lower average error rate because the different models cancel out noise.
Random forest is a variant of classification trees whereby a large number of classification trees are trained using data from the same data set.

20
Q

What is the ensemble method of aggregation of heterogeneous learners?

A

Different algorithms are combined together via a voting classifier. The different algorithms each get a vote, and then we go with whichever answer gets the most votes. Ideally, the models selected will have sufficient diversity in approach, resulting in a greater level of confidence in the predictions.
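
A rough sketch of a voting classifier, assuming each learner is just a function that returns a label. The three rule functions and the feature names are hypothetical stand-ins for genuinely different algorithms.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label that received the most votes."""
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical learners; in practice these would be different fitted algorithms.
def rule_a(obs):
    return "up" if obs["f1"] > 0 else "down"


def rule_b(obs):
    return "up" if obs["f2"] > 0 else "down"


def rule_c(obs):
    return "up" if obs["f3"] < 15 else "down"


def voting_classifier(models, observation):
    """Each model gets one vote; the most common answer wins."""
    return majority_vote([model(observation) for model in models])


observation = {"f1": 0.4, "f2": -0.1, "f3": 12}
print(voting_classifier([rule_a, rule_b, rule_c], observation))  # up (2 votes to 1)
```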

21
Q

What is the ensemble method of aggregation of homogeneous learners?

A

The same algorithm is used, but on different training data. The different training data samples can be derived by bootstrapping.
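
A minimal sketch of bootstrapping, assuming it means drawing observations with replacement until the sample is the same size as the original data. The function names and the seed are illustrative choices.

```python
import random


def bootstrap_sample(data, rng):
    """Draw len(data) observations from data with replacement."""
    return [rng.choice(data) for _ in data]


def bootstrap_training_sets(data, n_samples, seed=0):
    """Build several resampled training sets, one per learner."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [bootstrap_sample(data, rng) for _ in range(n_samples)]


data = [10, 20, 30, 40, 50]
for sample in bootstrap_training_sets(data, n_samples=3):
    print(sample)  # same length as data; values may repeat or be missing
```

Each resampled set would then be used to fit one copy of the same algorithm, and the copies' predictions would be combined.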

22
Q

What is principal component analysis? What are eigenvectors and eigenvalues?

A

**dimension reduction

PCA: Summarizes the information in a large number of correlated factors into a much smaller set of uncorrelated factors

Eigenvectors: These uncorrelated factors, which are linear combinations of the original features

Eigenvalue: The proportion of total variance in the data set explained by the eigenvector
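
A sketch of PCA for just two correlated features, assuming the sample covariance matrix and the closed-form eigenvalues of a symmetric 2x2 matrix. The function `pca_2d` and the data are hypothetical; the proportion of variance is reported as each eigenvalue divided by the sum of the eigenvalues.

```python
import math


def pca_2d(points):
    """Return [(eigenvalue, unit_eigenvector, proportion_of_variance), ...]."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Sample covariance matrix [[a, b], [b, c]]
    a = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    c = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    b = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    if b == 0:
        raise ValueError("features are already uncorrelated")
    # Closed-form eigenvalues of a symmetric 2x2 matrix, largest first
    mid = (a + c) / 2
    half_gap = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    total = a + c  # total variance equals the sum of the eigenvalues
    components = []
    for lam in (mid + half_gap, mid - half_gap):
        vx, vy = b, lam - a  # solves (C - lam * I) v = 0 when b != 0
        norm = math.hypot(vx, vy)
        components.append((lam, (vx / norm, vy / norm), lam / total))
    return components


points = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2), (5.0, 9.8)]
for eigenvalue, eigenvector, proportion in pca_2d(points):
    print(eigenvalue, eigenvector, proportion)
```

With strongly correlated features like these, nearly all of the variance falls on the first component, which is why the second can be dropped.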

23
Q

What is clustering?

A

Clustering is the process of grouping observations into categories based on similarities in their attributes (called cohesion).

24
Q

What is k-means clustering?

A

K-means clustering partitions observations into k non-overlapping clusters, where k is a hyperparameter. Each cluster has a centroid, and each new observation is assigned to a cluster based on its proximity to the centroid.
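
A minimal one-dimensional sketch, assuming the standard loop of assigning each value to its nearest centroid and then recomputing centroids as cluster means. The function names, starting centroids, and data are hypothetical.

```python
def nearest_centroid(value, centroids):
    """Index of the centroid closest to value."""
    return min(range(len(centroids)), key=lambda i: abs(value - centroids[i]))


def k_means_1d(values, centroids, max_iter=100):
    """k is implied by the number of starting centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for v in values:
            clusters[nearest_centroid(v, centroids)].append(v)
        # An empty cluster keeps its previous centroid
        new_centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments have stopped changing
        centroids = new_centroids
    return centroids, clusters


values = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]
centroids, clusters = k_means_1d(values, centroids=[0.0, 5.0])
print(centroids)                         # [1.5, 10.0]
print(nearest_centroid(8.0, centroids))  # 1: a new observation joins the second cluster
```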

25
Q

What is hierarchical clustering?

A

Builds a hierarchy of clusters without any predefined number of clusters.

In agglomerative clustering, we start with each observation as its own cluster and then add other similar observations to that group, or form another non-overlapping cluster.

A divisive algorithm starts with one giant cluster and then partitions that cluster into smaller and smaller clusters.
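
A sketch of the agglomerative (bottom-up) direction on one-dimensional data, assuming single linkage, where the distance between two clusters is the gap between their closest members. The function name and data are hypothetical.

```python
def agglomerative_1d(values):
    """Return every level of the hierarchy, from singletons to one cluster."""
    clusters = [[v] for v in sorted(values)]
    history = [[list(c) for c in clusters]]
    while len(clusters) > 1:
        # On sorted 1D data the two closest clusters are always adjacent
        gaps = [clusters[i + 1][0] - clusters[i][-1] for i in range(len(clusters) - 1)]
        i = gaps.index(min(gaps))
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]  # merge the closest pair
        history.append([list(c) for c in clusters])
    return history


for level in agglomerative_1d([5.0, 1.0, 9.0, 1.2, 5.3]):
    print(level)
```

No cluster count is fixed in advance; the caller can pick whichever level of the printed hierarchy is useful. A divisive algorithm would produce a hierarchy in the opposite order, starting from the single large cluster.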

26
Q

What are neural networks?

A

Neural networks are constructed as nodes connected by links. The input layer consists of nodes with values for the features. These values are scaled so that the information from multiple nodes is comparable.

27
Q

What are neurons in neural networks?

A

Neurons are the nodes that follow the input variables.

28
Q

What is a summation operator in neural networks?

A

The summation operator collates the information and passes it on to an activation function.

29
Q

What is an activation function in neural networks?

A

The activation function generates an output value from the input value it receives.

30
Q

What is forward propagation in neural networks?

A

In forward propagation, the value generated by a neuron is passed on to neurons in other hidden layers.
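
The summation operator, activation function, and forward propagation can be sketched together. This assumes a weighted sum plus a bias term and a sigmoid activation; the weights, the bias values, and the layer layout are hypothetical.

```python
import math


def summation(inputs, weights, bias):
    """Summation operator: weighted sum of the incoming values."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


def sigmoid(z):
    """One common activation function (an assumption; others exist)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias):
    return sigmoid(summation(inputs, weights, bias))


def forward(inputs, layers):
    """layers: list of layers; each layer is a list of (weights, bias) neurons."""
    values = inputs
    for layer in layers:
        # Every neuron's output becomes an input to the next layer
        values = [neuron(values, weights, bias) for weights, bias in layer]
    return values


hidden_layer = [([0.5, -0.25], 0.0), ([0.1, 0.2], -0.5)]
output_layer = [([1.0, -1.0], 0.0)]
print(forward([1.0, 2.0], [hidden_layer, output_layer]))
```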

31
Q

What is backward propagation in neural networks?

A

Backward propagation is the process employed to revise the weights in the summation operator.
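
A sketch of one weight revision for a single neuron, assuming a sigmoid activation, a squared-error loss, and plain gradient descent. The learning rate, starting weights, and data are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_step(inputs, target, weights, bias, learning_rate):
    """Return revised weights, revised bias, and the loss before the revision."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    output = sigmoid(z)
    error = output - target
    # Chain rule for loss = 0.5 * error**2 with a sigmoid activation
    delta = error * output * (1.0 - output)
    new_weights = [w - learning_rate * delta * x for w, x in zip(weights, inputs)]
    new_bias = bias - learning_rate * delta
    return new_weights, new_bias, 0.5 * error ** 2


weights, bias = [0.0, 0.0], 0.0
for step in range(3):
    weights, bias, loss = train_step([1.0, 2.0], 1.0, weights, bias, learning_rate=0.5)
    print(step, loss)  # the loss shrinks as the weights are revised
```

In a network with hidden layers the same chain-rule idea is applied layer by layer, working backward from the output.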

32
Q

What is deep learning?

A

Deep learning networks are neural networks with many hidden layers.

33
Q

What is reinforcement learning?

A

Reinforcement learning algorithms have an agent that seeks to maximize a defined reward given defined constraints.