Clustering & classification Flashcards
What is the concept of distance when it comes to classification, why is it necessary?
What
A distance defines the dissimilarity between two points.
Why
Classification and clustering methods need a notion of distance because observations are grouped or assigned according to how dissimilar they are; in practice this means computing a distance (dissimilarity) matrix between observations.
Methods
Two of the most common distance measures are (sketched in code after this list):
- Pearson correlation distance
- Eisen cosine correlation distance
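A minimal sketch of these two measures in Python/NumPy, assuming the usual definitions (Pearson distance = 1 − correlation; Eisen cosine distance = 1 − |uncentred cosine similarity|). The profile values are made up for illustration.

```python
import numpy as np

def pearson_distance(x, y):
    # Pearson correlation distance: 1 - corr(x, y).
    # 0 = perfectly correlated profiles, 2 = perfectly anti-correlated.
    return 1.0 - np.corrcoef(x, y)[0, 1]

def eisen_cosine_distance(x, y):
    # Eisen cosine correlation distance: 1 - |uncentred correlation|,
    # i.e. 1 minus the absolute cosine similarity.
    return 1.0 - abs(np.dot(x, y)) / np.sqrt(np.dot(x, x) * np.dot(y, y))

# Toy example: two expression-like profiles (illustrative values only).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.5])
print(pearson_distance(x, y))       # close to 0: profiles nearly perfectly correlated
print(eisen_cosine_distance(x, y))  # also close to 0
```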
What types of hierarchical clustering methods are there? What are the pros and cons of these methods?
Types
- Agglomerative: each observation starts as its own cluster (a leaf). Iteratively, the most similar clusters are merged until a single cluster (the root) remains
- Divisive: the inverse of the agglomerative approach. It begins with a single root cluster and repeatedly splits the most heterogeneous cluster until each observation forms its own cluster
- Visualisation is based on a tree representation known as a dendrogram
Pros
- No a priori information about the number of clusters is required
- Easy to implement
- Very replicable
Cons
- Not very efficient: O(n² log n)
- Based on a dissimilarity matrix, which has to be chosen in advance
- No objective function is directly minimised
- The dendrogram is not the best tool to choose the optimum number of clusters
- Hard to handle non-convex cluster shapes
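A minimal agglomerative-clustering sketch using SciPy; the library choice, toy data and average linkage are assumptions for illustration, not part of the card.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
# Toy data: two loose groups of 2-D points (illustrative only).
X = np.vstack([rng.normal(0, 0.5, (10, 2)), rng.normal(3, 0.5, (10, 2))])

# 1. Dissimilarity matrix chosen in advance (here Euclidean, in condensed form).
D = pdist(X, metric="euclidean")

# 2. Agglomerative clustering: each point starts as a leaf, the two closest
#    clusters are merged repeatedly until only the root remains.
Z = linkage(D, method="average")

# 3. Cut the tree to get flat cluster labels (k chosen by inspecting the dendrogram).
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)

# The tree itself; plot it with matplotlib to decide where to cut.
tree = dendrogram(Z, no_plot=True)
```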
What types of partitioning clustering methods are there? What are their pros and cons?
Types
- k-means: each cluster is represented by the center of the cluster
- k-medoids or PAM: each cluster is represented by one of the points in that cluster
- CLARA (Clustering LARge Applications): suitable when large datasets are analysed
Pros
- k-means is relatively efficient: O(tkn), where t is the number of iterations and k the number of clusters, with k, t << n
- Simple approach, easy to implement and understand
- Totally replicable (given the same initialisation)
Cons
- PAM does not scale well for large datasets
- Applicable only when a mean is defined (e.g. not for categorical data)
- Need to specify k in advance
- k-means is unable to handle noisy data and outliers. PAM does better
- Not suitable to discover clusters with non-convex shapes
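A minimal k-means sketch using scikit-learn; the library and the toy blobs are assumptions for illustration. It shows that k must be given up front and that each cluster is summarised by its centroid.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Toy data: two well-separated blobs in 2-D (illustrative only).
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(4, 0.5, (50, 2))])

# k must be specified in advance; each cluster is represented by its centre (mean).
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

print(km.cluster_centers_)  # the cluster centres
print(km.labels_[:10])      # cluster assignment of the first 10 points
print(km.inertia_)          # within-cluster sum of squares that k-means minimises
```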
What is the Naïve Bayes approach?
Naïve Bayes is a probabilistic machine learning algorithm used, for example, for spam filtering and document classification.
It relies on two concepts:
- Conditional probability
- Bayes rule
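Bayes' rule combines them: P(class | data) = P(data | class) · P(class) / P(data), and the "naïve" assumption is that the features (e.g. words) are conditionally independent given the class. A minimal spam-filtering sketch using scikit-learn; the library and the tiny made-up training texts are assumptions for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny made-up training set: 1 = spam, 0 = not spam (illustrative only).
texts = ["win money now", "cheap money offer", "meeting at noon",
         "lunch tomorrow?", "win a free offer", "project meeting notes"]
labels = [1, 1, 0, 0, 1, 0]

# Bag-of-words counts; the naive assumption lets P(text | class)
# factorise over the individual word counts.
vec = CountVectorizer()
X = vec.fit_transform(texts)

clf = MultinomialNB().fit(X, labels)

new = vec.transform(["free money offer", "notes from the meeting"])
print(clf.predict(new))        # e.g. [1 0]
print(clf.predict_proba(new))  # posterior probabilities from Bayes' rule
```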