L11 - Unsupervised Learning Flashcards

Question 1

Q

What is the goal of unsupervised learning?

Answer

A

To identify patterns and structure in unlabelled data, without predefined target labels.

Question 2

Q

Give some examples of objectives that can be achieved with unsupervised learning…

Answer

A

Identifying new animal species, segmenting customers, and detecting fraudulent activity.

Question 3

Q

Unsupervised learning is used for clustering tasks. Explain how this is done…

Answer

A

Compute a distance (or similarity) between pairs of data points using a chosen metric. Points that are close to one another are then grouped into the same cluster.
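
As a rough illustration of grouping by distance, the sketch below links any two points whose Euclidean distance is at most a threshold and treats each connected group as a cluster. The function name `threshold_clusters`, the `max_dist` parameter, and the sample points are assumptions made for this example, not part of the lecture material.

```python
import math

def threshold_clusters(points, max_dist):
    """Group points whose pairwise Euclidean distance is at most max_dist.

    Points linked directly, or through a chain of close neighbours,
    receive the same cluster label.
    """
    n = len(points)
    labels = [None] * n
    current = 0
    for start in range(n):
        if labels[start] is not None:
            continue  # already placed in a cluster
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                close = math.dist(points[i], points[j]) <= max_dist
                if labels[j] is None and close:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

# Hypothetical 2D data: three points near the origin, two near (5, 5).
points = [(0.0, 0.0), (0.5, 0.2), (0.3, 0.6), (5.0, 5.0), (5.2, 4.9)]
print(threshold_clusters(points, max_dist=1.0))  # [0, 0, 0, 1, 1]
```

The result depends heavily on the chosen threshold: a larger `max_dist` merges groups, a smaller one splits them.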

Question 4

Q

Unsupervised learning is used for community detection. Explain what this is and how it’s done…

Answer

A

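A community is a group of densely interconnected nodes in a graph. Nodes that share more connections have a higher connection strength, and community detection groups together nodes that are more strongly connected to each other than to the rest of the graph. E.g. a community of school friends on Facebook will be strong due to many mutual friendships.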
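
The idea of connection strength can be sketched by counting mutual friends. This is only an illustration of the strength measure, not a full community detection algorithm; the graph, the names in it, and the function `connection_strength` are all hypothetical.

```python
# Hypothetical friendship graph: each person maps to the set of their friends.
friends = {
    "ana": {"ben", "cara", "dev"},
    "ben": {"ana", "cara", "dev"},
    "cara": {"ana", "ben", "dev"},
    "dev": {"ana", "ben", "cara", "eli"},
    "eli": {"dev", "fay"},
    "fay": {"eli"},
}

def connection_strength(graph, a, b):
    """Count the neighbours that a and b have in common (mutual friends)."""
    return len(graph[a] & graph[b])

print(connection_strength(friends, "ana", "ben"))  # 2 (cara and dev)
print(connection_strength(friends, "ana", "fay"))  # 0 (no mutual friends)
```

Here ana, ben, cara and dev form a tightly connected group, while fay sits outside it.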

Question 5

Q

Unsupervised learning is used for topic modelling. Explain what this is and how it’s done…

Answer

A

Topic modelling identifies topics and common themes in a collection of text. This can be done through methods such as word embeddings built on lemmatised or stemmed words.

Question 6

Q

Give some examples of clustering algorithms…

Answer

A

K-means -> Assigns each point to the nearest of K centroids, where K is a hyperparameter chosen by the user.

DBSCAN -> Density-Based Spatial Clustering of Applications with Noise. Finds high-density regions and creates clusters by expanding outwards from them.

Hierarchical Clustering -> Builds a hierarchy of clusters, either by repeatedly dividing clusters into sub-clusters (divisive) or by repeatedly merging the closest clusters (agglomerative).
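
To make the K-means description concrete, here is a minimal sketch of its two alternating steps in plain Python. It assumes the starting centroids are supplied by the caller (so K is the number of centroids passed in) and runs a fixed number of iterations; the function name `k_means` and the sample data are illustrative assumptions.

```python
import math

def k_means(points, centroids, iterations=10):
    """Minimal K-means sketch: K is the number of starting centroids."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:  # keep the old centroid if nothing was assigned to it
                centroids[k] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return labels, centroids

# Hypothetical data: two well-separated groups, K = 2.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
labels, centroids = k_means(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

In practice the starting centroids are usually chosen at random or with a seeding heuristic, and the result can vary with that choice.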

Question 7

Q

What are the 2 types of clustering algorithms? Define each…

Answer

A

Hard Clustering -> Each data point belongs to exactly 1 cluster. Used when we want to make a definite decision on the data, i.e. a data point can’t belong to multiple clusters, e.g. it is in A or B or C.

Soft Clustering -> Each data point can be assigned to multiple clusters, typically with a degree of membership for each.
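
The difference can be sketched with two assignment functions against the same centroids. The `exp(-distance)` weighting used for the soft version is an assumption chosen purely for illustration, as are the function names and sample values.

```python
import math

def hard_assign(point, centroids):
    """Hard clustering: return the index of the single nearest centroid."""
    return min(range(len(centroids)), key=lambda k: math.dist(point, centroids[k]))

def soft_assign(point, centroids):
    """Soft clustering: return one membership weight per centroid, summing to 1.

    The exp(-distance) weighting is an illustrative assumption.
    """
    scores = [math.exp(-math.dist(point, c)) for c in centroids]
    total = sum(scores)
    return [s / total for s in scores]

centroids = [(0.0, 0.0), (4.0, 0.0)]  # hypothetical cluster centres
point = (1.0, 0.0)

print(hard_assign(point, centroids))  # 0: the point belongs to cluster 0 only
print(soft_assign(point, centroids))  # two weights, the larger one for cluster 0
```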

Question 8

Q

What is a common similarity / distance metric used for clustering?

Answer

A

Euclidean distance (L2 norm)
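
For reference, the standard definition for two n-dimensional points, followed by a small worked example (the points (0, 0) and (3, 4) are chosen here for illustration):

```latex
d(\mathbf{x}, \mathbf{y}) = \lVert \mathbf{x} - \mathbf{y} \rVert_2
  = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}

% Worked example with x = (0, 0) and y = (3, 4):
d(\mathbf{x}, \mathbf{y}) = \sqrt{(0 - 3)^2 + (0 - 4)^2} = \sqrt{9 + 16} = 5
```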

Question 9

Q

When do we use Jaccard Similarity? How is it calculated?

Answer

A

We use Jaccard Similarity when we want to establish the similarity between 2 sets. It’s calculated as the number of elements in the intersection of the sets divided by the number of elements in their union.

The Jaccard Distance = 1 - Jaccard Similarity.
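
A short sketch of both quantities using Python sets. The function names and the sample sets are illustrative, and returning 1.0 for two empty sets is an assumption, since the ratio is otherwise undefined in that case.

```python
def jaccard_similarity(a, b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # assumption: two empty sets are treated as identical
    return len(a & b) / len(union)

def jaccard_distance(a, b):
    """Jaccard Distance = 1 - Jaccard Similarity."""
    return 1.0 - jaccard_similarity(a, b)

a = {"apple", "banana", "cherry"}
b = {"banana", "cherry", "date", "elderberry"}

# Intersection has 2 elements, union has 5.
print(jaccard_similarity(a, b))  # 0.4
print(jaccard_distance(a, b))    # 0.6
```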

Question 10

Q

How do we calculate Jaccard Distance?

Answer

A

Jaccard Distance = 1 - Jaccard Similarity.

(10 cards)