Machine Learning Flashcards

1
Q

Machine learning

A

Algorithms that can learn from observational data and make predictions from it

2
Q

Unsupervised learning

A

An algorithm finds structure in a data set without being given any correct answers (labels) to learn from

3
Q

Latent variable

A

A hidden (not directly observed) factor in the data, which unsupervised learning can help uncover

4
Q

Supervised learning

A

An algorithm learns from a data set plus the correct “answers”
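A minimal sketch of supervised learning, assuming a toy one-feature pass/fail data set: the labels play the role of the correct "answers", and the learner picks whichever rule agrees with them best. The learner here is a one-threshold "decision stump", chosen only for illustration; the data and the `fit_stump` name are hypothetical.

```python
def fit_stump(xs, labels):
    """Learn a threshold t for the rule "predict 1 if x >= t, else 0"
    by choosing the candidate t that misclassifies the fewest labeled examples."""
    best_t, best_errors = None, None
    for t in sorted(set(xs)):  # candidate thresholds: the observed x values
        errors = sum((1 if x >= t else 0) != y for x, y in zip(xs, labels))
        if best_errors is None or errors < best_errors:
            best_t, best_errors = t, errors
    return best_t


# Hypothetical labeled data: hours studied (input) and pass = 1 / fail = 0 (the "answer")
hours = [1, 2, 3, 4, 5, 6]
passed = [0, 0, 0, 1, 1, 1]

threshold = fit_stump(hours, passed)
print(threshold)                                   # 4
print([int(h >= threshold) for h in [2.5, 4.5]])   # [0, 1]
```

Once learned from the labeled examples, the threshold can predict labels for inputs it never saw.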

5
Q

Training/testing sets

A

A model is trained using a training set of data, then tested on a similar but disjoint set of data to measure its accuracy.

6
Q

What are practical considerations for training/testing sets?

A
  1. Both sets must be large enough to capture the data's variations and outliers.
  2. Both sets must be randomly selected from the source data pool.
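A minimal sketch of a random, disjoint train/test split, assuming the data fits in a Python list. The helper name `split_train_test` and the 20% default test fraction are hypothetical choices for illustration (this is not scikit-learn's `train_test_split`).

```python
import random


def split_train_test(data, test_fraction=0.2, seed=None):
    """Shuffle a copy of `data` and cut it into disjoint training and test lists."""
    items = list(data)
    random.Random(seed).shuffle(items)       # random selection from the source pool
    n_test = int(len(items) * test_fraction)
    return items[n_test:], items[:n_test]    # (train, test): no item lands in both


train, test = split_train_test(range(100), test_fraction=0.2, seed=42)
print(len(train), len(test))         # 80 20
print(set(train).isdisjoint(test))   # True
```

Shuffling before cutting is what makes the selection random; slicing one shuffled list guarantees the two sets never overlap.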
7
Q

Why is train/test useful?

A

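It can guard against overfitting: a model that merely memorizes its training data scores well on the training set but poorly on the unseen test set.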
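A minimal sketch of how a held-out test set exposes overfitting, assuming hypothetical data whose true relationship is y = 2x with alternating +/-1 noise on the training targets. The "memorizer" (a nearest-neighbor lookup of training targets) stands in for an overfit model and a least-squares line for a simpler one; all names here are illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx


def mse(predict, xs, ys):
    """Mean squared error of a prediction function on (xs, ys)."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data: true relationship y = 2x; training targets carry +/-1 noise.
train_x = list(range(10))
train_y = [2 * x + (1 if x % 2 == 0 else -1) for x in train_x]
test_x = [k + 0.4 for k in range(9)]
test_y = [2 * x for x in test_x]


def memorizer(x):
    """Overfit model: returns the training target of the nearest training x."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


a, b = fit_line(train_x, train_y)


def line(x):
    return a * x + b


for name, model in [("memorizer", memorizer), ("line", line)]:
    print(name, round(mse(model, train_x, train_y), 3), round(mse(model, test_x, test_y), 3))
```

The memorizer has zero training error because it reproduces the noise, yet it does worse than the line on the test points; judged on training data alone, it would look like the better model.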

8
Q

K-fold cross-validation

A
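  1. Split data randomly into K segments.
  2. Take one segment as the test set.
  3. Train on the remaining K - 1 segments and evaluate the model on the test segment.
  4. Repeat steps 2 and 3 so that each segment serves once as the test set.
  5. Average the resulting r-squared values.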
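The steps above can be sketched in plain Python, assuming a one-feature linear regression as the model being validated (the card does not name a model) and r-squared as the score. `fit_line`, `r_squared` and `k_fold_r2` are hypothetical helper names.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx


def r_squared(xs, ys, a, b):
    """Coefficient of determination of the line y = a*x + b on (xs, ys)."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def k_fold_r2(xs, ys, k=5, seed=0):
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)           # 1. random split ...
    folds = [idx[i::k] for i in range(k)]      #    ... into K segments
    scores = []
    for test_idx in folds:                     # 2./4. each segment takes one turn as the test set
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        a, b = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])  # 3. train on the rest
        scores.append(r_squared([xs[i] for i in test_idx], [ys[i] for i in test_idx], a, b))
    return sum(scores) / len(scores)           # 5. average the r-squared values


xs = list(range(20))
ys = [3 * x - 2 for x in xs]   # hypothetical, perfectly linear data
print(k_fold_r2(xs, ys, k=5))  # close to 1.0
```

On perfectly linear data every fold scores an r-squared of about 1; with noisy or nonlinear data the averaged score drops accordingly.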
9
Q

K-means clustering

A
  1. Randomly pick K centroids.
  2. Assign each data point to the closest centroid.
  3. Recompute each centroid as the average position of the data points assigned to it.
  4. Repeat steps 2 and 3 until points stop moving between clusters.
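A minimal sketch of these steps for 2-D points, assuming the initial centroids are K randomly sampled data points and that an emptied cluster simply keeps its old centroid (both are choices made here, not stated on the card). `k_means` is a hypothetical name; it requires Python 3.8+ for `math.dist`.

```python
import math
import random


def k_means(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # 1. pick K data points as initial centroids
    assignment = None
    for _ in range(max_iter):
        # 2. assign each point to its closest centroid
        new_assignment = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_assignment == assignment:       # 4. stop once no point changes cluster
            break
        assignment = new_assignment
        # 3. move each centroid to the mean position of its points
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:                        # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centroids, assignment


# Hypothetical data: two well-separated groups of points
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = k_means(points, k=2)
print(labels)
```

The returned labels are plain cluster indices; which group gets 0 and which gets 1 depends on the random initialization.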
10
Q

What is a major caveat with K-means clustering?

A

The algorithm does not assign names or titles to clusters.

11
Q

Entropy (data science)

A

A measure of the disorder of a data set

Zero if all data points are the same (for example, all belong to one class).
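A common way to compute this is Shannon entropy over the class proportions p_i, H = -Σ p_i log2(p_i), measured in bits. The card does not name a specific formula, so this is one standard choice; the `entropy` helper below is a hypothetical name.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of the class distribution in `values`."""
    n = len(values)
    counts = Counter(values)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


print(entropy(["cat", "cat", "cat", "cat"]))  # 0.0 (all the same)
print(entropy(["cat", "dog"]))                # 1.0 (evenly split between two classes)
```

Entropy is zero when every point has the same value and grows as the values spread more evenly over more classes.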
