Different Machine Learning Flashcards

1
Q

What is penalized regression?

A
  • Similar in spirit to maximizing adjusted R-squared.
  • Dimension reduction.
  • Reduces/minimizes overfitting.
  • Regression coefficients are chosen to minimize the sum of squared errors, plus a penalty term that increases with the number of included features.
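
A minimal sketch of penalized (LASSO) regression, assuming scikit-learn; the toy data and the penalty weight alpha are illustrative assumptions.

```python
# Minimal LASSO sketch; data and alpha=0.1 are illustrative assumptions.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                         # 10 candidate features
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=200)   # only 2 actually matter

# The L1 penalty (scaled by alpha) is added to the sum of squared errors,
# shrinking the coefficients of uninformative features toward zero.
model = Lasso(alpha=0.1).fit(X, y)
print(model.coef_)  # most coefficients should be (near) zero
```
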
2
Q

Support Vector Machine (SVM)

A

Support Vector Machine

Used for classification, regression, and outlier detection.
Best suited to classifying data that is not complex or non-linear.

A linear classifier that determines the hyperplane that optimally separates the observations into two sets of data points.

Does not require any hyperparameters.

Maximizes the probability of making a correct prediction by determining the boundary that is furthest from all observations.

Outliers do not affect either the support vectors or the discriminant boundary.
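
A minimal sketch of a linear SVM, assuming scikit-learn; the two-cluster toy data is an illustrative assumption.

```python
# Minimal linear SVM sketch; the toy data is an illustrative assumption.
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated clusters: the "not complex, linearly separable" case
X, y = make_blobs(n_samples=100, centers=2, random_state=0)

# A linear kernel finds the separating hyperplane with the widest margin
clf = SVC(kernel="linear").fit(X, y)
print(clf.support_vectors_)  # only the observations on the margin matter
print(clf.predict(X[:5]))
```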

3
Q

K-Nearest Neighbor

A

Classification

Classifies new observations by finding similarities in the existing data.

Makes no assumption about the distribution of the data.
It is non-parametric.

KNN results can be sensitive to the inclusion of irrelevant or correlated features, so it may be necessary to select features manually, thereby removing less relevant information.
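
A minimal sketch of KNN classification, assuming scikit-learn; the Iris data and k = 5 are illustrative assumptions.

```python
# Minimal KNN sketch; the data split and k=5 are illustrative assumptions.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each new observation is labeled by a vote of its 5 nearest neighbors;
# no distributional assumptions are made (non-parametric).
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
print(knn.score(X_test, y_test))
```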

4
Q

Classification and Regression Trees (CART)

A

Classification and Regression Trees

Part of supervised ML

Typically applied when the target is binary.

If the goal is regression, the prediction would be the mean of the values of the terminal node.

Makes no assumptions about the characteristics of the training data, so if left unconstrained, it can potentially learn the training data perfectly.

To avoid overfitting, regularization parameters can be added, such as the maximum depth of the tree.
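
A minimal sketch of a CART-style decision tree, assuming scikit-learn; max_depth=3 is an illustrative regularization choice.

```python
# Minimal CART sketch; max_depth=3 is an illustrative regularization choice.
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)  # binary target

# Capping the tree depth keeps it from memorizing the training data
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(tree.score(X, y))  # training accuracy; unconstrained trees reach 1.0
```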

5
Q

Ensemble Learning

A

A technique of combining the predictions from a collection of models to achieve a more accurate prediction

The method of combining multiple learning algorithms is known as an ensemble method.
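
A minimal sketch of ensemble learning via majority voting, assuming scikit-learn; the choice of base learners is an illustrative assumption.

```python
# Minimal ensemble sketch; the base models chosen here are illustrative.
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Combine the predictions of several different learners by majority vote
ensemble = VotingClassifier(estimators=[
    ("lr", LogisticRegression(max_iter=1000)),
    ("knn", KNeighborsClassifier()),
    ("tree", DecisionTreeClassifier(random_state=0)),
]).fit(X, y)
print(ensemble.score(X, y))
```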

6
Q

Random Forest

A

A random forest classifier is a collection of a large number of decision trees trained via a bagging method.

For example, a CART algorithm would be trained using each of the n independent datasets (from the bagging process) to generate the multitude of different decision trees that make up the random forest classifier.

Each tree is trained on a random subset of the features (e.g., 4 of them).

Protects against overfitting.
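
A minimal sketch of a random forest classifier, assuming scikit-learn; n_estimators=100 and max_features=4 are illustrative assumptions.

```python
# Minimal random forest sketch; n_estimators and max_features are illustrative.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)

# Bagging: each of the 100 trees sees a bootstrap sample of the data and
# a random subset of 4 features, which protects against overfitting.
forest = RandomForestClassifier(
    n_estimators=100, max_features=4, random_state=0
).fit(X, y)
print(forest.score(X, y))
```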

7
Q

Principal component analysis (PCA)

A

It is part of unsupervised ML.
Dimension reduction.

Used to reduce highly correlated features of the data into a few main uncorrelated composite variables.
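
A minimal sketch of PCA for dimension reduction, assuming scikit-learn; keeping two components is an illustrative choice.

```python
# Minimal PCA sketch; n_components=2 is an illustrative choice.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)  # 4 correlated features

# Project the correlated features onto 2 uncorrelated composite variables
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)
print(pca.explained_variance_ratio_)  # variance captured per component
```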

8
Q

K-means clustering

A

K-means partitions observations into a fixed number, k, of non-overlapping clusters.

Each cluster is characterized by its centroid, and each observation is assigned by the algorithm to the cluster with the centroid to which that observation is closest.
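
A minimal sketch of k-means, assuming scikit-learn; k = 3 and the toy data are illustrative assumptions.

```python
# Minimal k-means sketch; k=3 and the toy data are illustrative assumptions.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=150, centers=3, random_state=0)

# Partition the observations into k=3 non-overlapping clusters
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.cluster_centers_)  # each cluster's centroid
print(kmeans.labels_[:10])      # each observation's assigned cluster
```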

9
Q

Hierarchical Clustering

A

An iterative procedure used to build a hierarchy of clusters.

In k-means clustering, the algorithm segments the data into a predetermined number of clusters; there is no defined relationship among the resulting clusters.

In hierarchical clustering, however, the algorithms create intermediate rounds of clusters of increasing (in “agglomerative”) or decreasing (in “divisive”) size until a final clustering is reached.

The process creates relationships among the rounds of clusters, as the word “hierarchical” suggests.

Although more computationally intensive than k-means clustering, hierarchical clustering has the advantage of allowing the investment analyst to examine alternative segmentations of data of different granularity before deciding which one to use.
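
A minimal sketch of agglomerative (bottom-up) hierarchical clustering, assuming scikit-learn; the toy data and cluster count are illustrative assumptions.

```python
# Minimal agglomerative clustering sketch; data and n_clusters are illustrative.
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=150, centers=3, random_state=0)

# Start with each observation as its own cluster and merge the closest
# pair in each round until 3 clusters remain.
agg = AgglomerativeClustering(n_clusters=3).fit(X)
print(agg.labels_[:10])
```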
