Week 7 Flashcards
Real uses of unsupervised learning
Customer segmentation (single parents, young party-goers)
Identifying fraud (bank transactions, GPS logs, bots on social media)
Identifying new animal species
Creating the classes needed for a classification algorithm
How does K-means work?
Assigns each point to the nearest of K centroids, then recomputes each centroid as the mean of its assigned points, repeating until the assignments stop changing; K is given by the user.
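A minimal sketch with scikit-learn (the toy data is made up for illustration):
```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up 2-D data: two loose blobs.
X = np.array([[1.0, 1.1], [0.9, 1.0], [1.2, 0.8],
              [5.0, 5.2], [5.1, 4.9], [4.8, 5.0]])

# K (n_clusters) is given by the user; fitting alternates between
# assigning points to the nearest centroid and recomputing centroids.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)           # cluster index of each point
print(km.cluster_centers_)  # final centroid positions
```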
How does DBSCAN work?
It finds core points in regions of high density (points with enough neighbours within a small radius) and expands clusters outward from them; points in no dense region are labelled noise.
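A minimal sketch with scikit-learn, using its built-in make_moons toy data (the eps and min_samples values here are illustrative choices):
```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Two interleaved half-moons: dense, non-convex shapes.
X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)

# eps is the neighbourhood radius; min_samples is the density
# threshold for a point to count as a core point.
db = DBSCAN(eps=0.2, min_samples=5).fit(X)
print(set(db.labels_))  # cluster labels; -1 marks noise points
```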
What is Hierarchical clustering?
Can be agglomerative (bottom-up merging) or divisive (top-down splitting), as long as you produce a hierarchy of clusters; see the sketch after the steps below.
Something like:
1. Split all points into clusters A and B
2. Split cluster A into clusters A1 and A2
3. Split cluster B into clusters B1 and B2
4. Split cluster A1 into …
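The steps above are the divisive variant; common libraries implement the agglomerative one. A minimal sketch with SciPy, on made-up points:
```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Made-up 2-D points.
X = np.array([[0.0, 0.0], [0.1, 0.2], [4.0, 4.1], [4.2, 3.9], [9.0, 9.1]])

# Agglomerative: start with every point as its own cluster and
# repeatedly merge the two closest clusters (Ward linkage here).
Z = linkage(X, method="ward")
# Each row of Z records the two clusters merged, the distance at
# which they merged, and the size of the new cluster.
print(Z)
```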
Hard vs Soft clustering
Hard: each object belongs to exactly one cluster, similar to how a perceptron performs classification
Soft: each object is assigned to multiple clusters with corresponding probabilities, similar to how logistic regression performs classification.
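A minimal sketch contrasting the two with scikit-learn (K-means as the hard clusterer, a Gaussian mixture as the soft one; the 1-D data is made up):
```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

# Made-up 1-D data: two groups plus one ambiguous point at 2.5.
X = np.array([[0.0], [0.2], [0.1], [5.0], [5.2], [4.9], [2.5]])

# Hard: K-means puts every point in exactly one cluster.
print(KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X))

# Soft: a Gaussian mixture returns a probability per cluster
# for every point instead of a single label.
gm = GaussianMixture(n_components=2, random_state=0).fit(X)
print(gm.predict_proba(X).round(2))
```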
What did DBSCAN work best at compared to others?
Identifying rings (non-convex clusters), which centroid-based methods like K-means cannot separate.
What is a key ingredient for clustering?
What data is represented and HOW
The similarity metric/distance metric
(L1 or L2 norm, Jaccard Similarity)
What is Jaccard Similarity
|A ∩ B| / |A ∪ B| (the size of the intersection of A and B divided by the size of their union)
What is Jaccard distance
1 - Jaccard Similarity
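Both translate directly into Python (the example sets are made up):
```python
def jaccard_similarity(a: set, b: set) -> float:
    # |A ∩ B| / |A ∪ B|; treated as 0 when both sets are empty.
    union = a | b
    return len(a & b) / len(union) if union else 0.0

A = {"red", "green", "blue"}
B = {"green", "blue", "yellow"}
sim = jaccard_similarity(A, B)   # 2 / 4 = 0.5
dist = 1 - sim                   # Jaccard distance = 0.5
print(sim, dist)
```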
What do we do in Dimensionality Reduction
Remove noise from the data
Focus on the features (or combinations of features) that are actually important
Less number-crunching = more efficient
What are the two types of Dimensionality Reduction
Feature selection + extraction
3 types of feature selection
Filter methods
Wrapper methods
Embedded methods
Filter method examples
Information gain
Correlation with target
Pairwise correlation
Variance threshold
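A sketch of the correlation-with-target filter on synthetic data (the 0.3 threshold is an arbitrary choice):
```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
# The target depends only on feature 0.
y = 2 * X[:, 0] + rng.normal(scale=0.1, size=100)

# Keep features whose |correlation| with the target clears a threshold.
corrs = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
keep = np.abs(corrs) > 0.3
print(corrs.round(2), keep)
```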
Wrapper method examples
Recursive feature elimination
Sequential feature selection
Permutation importance
Embedded method examples
L1 (Lasso) regularization
Decision tree
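A minimal Lasso sketch on synthetic data (the alpha value is an arbitrary choice):
```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
# Only features 0 and 1 actually matter.
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=100)

# The L1 penalty drives irrelevant coefficients toward exactly zero,
# so feature selection falls out of training the model itself.
lasso = Lasso(alpha=0.1).fit(X, y)
print(lasso.coef_.round(2))
```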
What is Variance thresholding
Filter method: low-variance features contain less information
Calculate variance of each feature, drop features with variance below threshold.
FEATURES MUST BE ON THE SAME SCALE!
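A minimal sketch with scikit-learn's VarianceThreshold (the data and threshold are made up):
```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# The middle feature is nearly constant; all features need to be
# on the same scale for a single threshold to be meaningful.
X = np.array([[1.0, 0.01, 3.0],
              [2.0, 0.02, 1.0],
              [3.0, 0.01, 2.0],
              [4.0, 0.02, 4.0]])

sel = VarianceThreshold(threshold=0.05)
print(sel.fit_transform(X))  # the low-variance column is dropped
```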
What is Forward Search
Wrapper method: train n models with one feature each and select the best one. Then train n−1 models, each adding one more feature, and select the best one. Proceed until you have m features.
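scikit-learn implements this as SequentialFeatureSelector with direction="forward"; a sketch on its built-in diabetes data (m = 3 here):
```python
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

X, y = load_diabetes(return_X_y=True)

# direction="forward": start with no features and add the single
# best feature at each step until m features are selected.
sfs = SequentialFeatureSelector(LinearRegression(),
                                n_features_to_select=3,
                                direction="forward").fit(X, y)
print(sfs.get_support())  # boolean mask of the selected features
```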
What is recursive feature elimination
Wrapper method: train n models with n−1 features each (each leaving out a different feature) and select the best one. Then train n−1 models, removing one more feature each time, and select the best one. Proceed until m features have been removed.
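scikit-learn's RFE is a cheaper variant of the above: at each step it refits the model once and drops the feature the model ranks least important, rather than trying every possible removal. A sketch on the built-in diabetes data:
```python
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

X, y = load_diabetes(return_X_y=True)

# Repeatedly fit the model and drop the feature it ranks least
# important, until only n_features_to_select remain.
rfe = RFE(LinearRegression(), n_features_to_select=3).fit(X, y)
print(rfe.support_)   # mask of the surviving features
print(rfe.ranking_)   # 1 = kept; higher = eliminated earlier
```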
What is a Decision Tree
Embedded method: splits the data into a tree. Splits can be chosen by Gini impurity (a measure of how mixed a node is), information gain, or variance reduction.
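A minimal sketch: the fitted tree's importances double as a feature ranking (iris data, default Gini criterion):
```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Splits are chosen to reduce Gini impurity; the features the tree
# actually splits on get the highest importance scores.
tree = DecisionTreeClassifier(criterion="gini", random_state=0).fit(X, y)
print(tree.feature_importances_.round(2))
```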
What is feature extraction
Linear and nonlinear methods that extract useful combinations of features from the data.
What is PCA
Linear method of feature extraction
Find an orthogonal coordinate transformation (a rotation, or a rotation plus a reflection) such that every new coordinate is maximally informative.
How does PCA work
Rotate the points so that each new coordinate is maximally informative and orthogonal to the others. Imagine rotating a cube: we take a slice where x and y describe most of the data (and z only a little).
What are the new variables from PCA called
Principal components; they are linear combinations of the original coordinates.
The y_i are uncorrelated and are ordered by the fraction of the total variance each retains: PC1 (y_1) is most informative, then PC2, etc.
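A minimal sketch with scikit-learn (standardising first, since PCA is scale-sensitive; iris data for illustration):
```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)
X = StandardScaler().fit_transform(X)  # PCA is scale-sensitive

pca = PCA(n_components=2)
Y = pca.fit_transform(X)  # the principal components y_1, y_2

# Fraction of the total variance each PC retains, in decreasing order.
print(pca.explained_variance_ratio_.round(2))
```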
When does PCA work best vs worst
BEST: high correlation between features
WORST: all variables are equally important and uncorrelated. PCA is uninformative.
PCA input and output
Input high dimensional data
Output low dimensional data
What is t-SNE
Non-linear method of dimensionality reduction
Take the distribution D of pairwise distances between the N points in the dataset.
Scatter N points randomly in 2 or 3 dimensions.
Move those N points around until the distribution of distances between them resembles D.
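A minimal sketch with scikit-learn (perplexity is the main knob; 30 is just a common default):
```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, _ = load_digits(return_X_y=True)  # 64-dimensional inputs

# Embed into 2-D; perplexity roughly sets the neighbourhood size
# used when matching the two distance distributions.
emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
print(emb.shape)  # (1797, 2)
```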
Why t-SNE
Very useful for visualising high-D data
What is UMAP
Similar to t-SNE, but slightly different at every step
Runs faster and uses less memory
No problem embedding into >3 dimensions
Can preserve both local and global structure.
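A minimal sketch, assuming the third-party umap-learn package:
```python
import umap  # third-party package: pip install umap-learn
from sklearn.datasets import load_digits

X, _ = load_digits(return_X_y=True)

# n_neighbors trades off local vs global structure; n_components
# is not limited to 2 or 3.
emb = umap.UMAP(n_neighbors=15, n_components=2,
                random_state=0).fit_transform(X)
print(emb.shape)
```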
Problems with t-SNE and UMAP
Both depend a lot on their hyperparameters
Cluster sizes and distances between clusters mean nothing
The x and y axes are basically impossible to interpret.