Machine Learning Algorithms Flashcards

1
Q

Logistic regression

A
  • Supervised, binary Y/N
  • Credit risk, medical conditions, whether a person will perform an action
  • A sigmoid function is fitted to the data (an ‘S’-shaped curve between the Y/N classes); see the sketch below
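A minimal sketch of fitting a logistic regression with scikit-learn; the synthetic dataset is only a stand-in for something like credit-risk features:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# toy binary-label data standing in for e.g. credit-risk features
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

model = LogisticRegression().fit(X, y)
print(model.predict(X[:5]))        # hard Y/N class predictions
print(model.predict_proba(X[:5]))  # sigmoid outputs: probability of each class
```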
2
Q

Linear regression

A
  • Supervised, numeric response variable
  • Economic/financial forecasts, marketing effectiveness, risk valuation
  • Line of best fit, found by minimising the sum of squared residuals (least squares)
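A minimal least-squares fit with scikit-learn; the synthetic data is only a placeholder for something like an economic forecast:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

# toy numeric-response data standing in for e.g. an economic forecast
X, y = make_regression(n_samples=200, n_features=3, noise=10, random_state=0)

model = LinearRegression().fit(X, y)    # ordinary least-squares fit
print(model.coef_, model.intercept_)    # slope(s) and intercept of the line of best fit
print(model.predict(X[:5]))             # predicted numeric responses
```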
3
Q

Support vector machine

A
  • Supervised, multi-classification
  • Customer classification (low, medium, high), genomic identification
  • Separates classes by choosing certain “support” data points (support vectors) to define the margins, then drawing a hyperplane between the two margins
  • The non-linear version uses a distance function called a “kernel” to map the learning task into a higher-dimensional space, then applies a linear SVM classifier in that space
  • SVMs are not memory efficient because they have to store the support vectors, which can grow in number with the training data
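A minimal sketch of a non-linear (kernel) SVM with scikit-learn; the three synthetic classes simply stand in for low/medium/high customer segments:

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# three-class toy data standing in for low/medium/high customer segments
X, y = make_classification(n_samples=300, n_features=6, n_informative=3,
                           n_classes=3, random_state=0)

clf = SVC(kernel="rbf").fit(X, y)    # non-linear SVM via the RBF kernel
print(clf.support_vectors_.shape)    # stored support vectors (the memory cost noted above)
print(clf.predict(X[:5]))
```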
4
Q

Decision trees

A
  • Supervised, binary/multi-class/numeric response
  • Customer analysis, medical conditions
  • Start at the ‘root node’; each internal node takes an input, tests it, and passes the result on, and a leaf node gives the final decision
  • Decisions can be binary, numeric (i.e. boundary), or multi-class
  • Adv: less need for feature transformations prior to running the model
  • Disadv: very susceptible to overfitting, so the tree must be “pruned”
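A minimal decision-tree fit with scikit-learn; `max_depth` and `ccp_alpha` are shown only as examples of the pruning/regularisation mentioned above:

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=6, random_state=0)

# max_depth limits how far the tree can grow; ccp_alpha applies
# cost-complexity pruning - both help control overfitting
tree = DecisionTreeClassifier(max_depth=4, ccp_alpha=0.01, random_state=0).fit(X, y)
print(tree.get_depth(), tree.get_n_leaves())
print(tree.predict(X[:5]))
```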
5
Q

Decision trees: deciding on structure

A
  • How do we decide on the root node? Choose the feature that correlates most with the label
  • We can sometimes get to the leaf nodes without the use of all features (faster training)
  • Nodes are split based on the feature that has the largest information gain (IG) between parent node and its split nodes
  • One metric to quantify IG is to compare entropy before and after splitting
  • Training (i.e. building the tree) works by choosing the splits that maximise IG (i.e. the impurity of the split sets is lower); see the sketch below
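A minimal sketch of the entropy-based information-gain calculation described above, in plain NumPy with a hand-made perfect split as the example:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (bits) of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, left, right):
    """Entropy of the parent node minus the weighted entropy of its split sets."""
    n = len(parent)
    weighted = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(parent) - weighted

parent = np.array([0, 0, 0, 0, 1, 1, 1, 1])
left, right = parent[:4], parent[4:]           # a perfectly pure split
print(information_gain(parent, left, right))   # 1.0 -> maximal information gain
```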
6
Q

Random forests

A
  • Supervised, binary/multi-class/numeric response
  • A collection (ensemble) of decision trees; single decision trees can be inaccurate, whereas random forests are much more accurate and reduce the overfitting seen in decision trees
  • Randomly select a subset of features from the data, find the feature with the highest correlation to the label and use it as the root node; then repeat, excluding the feature previously used as the root node
  • This is performed numerous times, so the individual trees may produce different predictions
  • We take a ‘survey’/vote of the results, and the final result is the summary of those votes (e.g. mean, mode)
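A minimal random-forest fit with scikit-learn; `max_features="sqrt"` is one way to get the random feature subsetting described above, and the class prediction is a vote across the trees:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# each of the 100 trees is built on a random sample of rows and a random
# subset of features; predictions are a vote across all trees
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                                random_state=0).fit(X, y)
print(forest.predict(X[:5]))
```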
7
Q

k-means clustering

A
  • Unsupervised, multi-classification
  • Use cases: data exploration, customer categorisation
  • “k” groups of classes
  • Start with ‘k’ random points, which become the centroids; each data point is assigned to its nearest centroid by comparing distances
  • Each centroid then moves to the centre (mean) of the data points assigned to it
  • Continue to iterate until we reach ‘equilibrium’, where the total variation between each centroid and its corresponding data points is at its lowest
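A minimal k-means run with scikit-learn on synthetic blob data; the fitted `cluster_centers_` are the centroids after the iterations settle:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)   # final centroid positions after convergence
print(km.labels_[:10])       # cluster assignment for the first 10 points
```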
8
Q

k-means clustering: finding the optimum ‘k’

A

Plot ‘reduction in variation’ against ‘k’ and find the “elbow point”: the value of k beyond which the variation stops changing considerably (see the sketch below).
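A minimal sketch of the elbow plot, assuming scikit-learn's `inertia_` (total within-cluster variation) as the variation measure:

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

# total within-cluster variation (inertia) for a range of candidate k values
ks = range(1, 10)
inertias = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_ for k in ks]

plt.plot(ks, inertias, marker="o")
plt.xlabel("k")
plt.ylabel("total within-cluster variation")
plt.show()   # the 'elbow' is where the curve stops dropping sharply
```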

9
Q

k-nearest neighbour

A
  • Supervised, multi-classification
  • Uses: recommendation engines, similar articles, objects
  • ‘k’ is the number of neighbours taken into account when classifying a new point (the neighbours vote on its class)
  • ‘k’ is often decided by the business case. Should be:
    1. Large enough to reduce the influence of outliers
    2. Small enough that classes with a small sample size don’t lose influence
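A minimal k-NN classifier with scikit-learn; k = 5 here is an arbitrary example value, chosen in practice by the business-case trade-off above:

```python
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, n_features=6, n_informative=3,
                           n_classes=3, random_state=0)

# the 5 nearest neighbours vote on the class of each new point
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)
print(knn.predict(X[:5]))
```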
10
Q

Latent Dirichlet allocation (LDA)

A
  • Unsupervised, multi-classification
  • Uses: topic discovery, sentiment analysis, automated document tagging
    1. Perform standard text preprocessing (e.g. stopword removal, stemming, tokenisation), then choose ‘k’, the number of topics we want LDA to classify the data into
    2. Count words by topic, then by document
    3. Take the product of the word-topic and topic-document counts for each word in each document
    4. Reallocate each word to the topic with the highest value
    5. Repeat for all words
    6. Use the resulting structure to analyse and classify a new document
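A minimal sketch of topic discovery with scikit-learn's LatentDirichletAllocation; stopword removal via the vectoriser stands in for the fuller preprocessing in step 1, and the four documents are made-up examples:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "the cat sat on the mat",
    "dogs and cats are friendly pets",
    "stocks fell as markets reacted to interest rates",
    "the bank raised interest rates again",
]

vec = CountVectorizer(stop_words="english")   # basic preprocessing
X = vec.fit_transform(docs)                   # document-word count matrix

lda = LatentDirichletAllocation(n_components=2, random_state=0)   # k = 2 topics
doc_topics = lda.fit_transform(X)   # per-document topic proportions
print(doc_topics)
print(lda.components_.shape)        # (topics, words) word-topic weights
```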
11
Q

Factorisation Machines Algorithm (FMA)

A
  • General-purpose supervised learning algorithm that can be used for both classification and regression
  • Extension of the linear model that is designed to capture interactions between features within high dimensional sparse data sets
  • Good choice for tasks dealing with high-dimensional sparse data sets, such as click prediction and item recommendation
  • E.g. a click-prediction system capturing patterns observed when ads from a certain category are placed on pages from a certain page category (see the sketch below)
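A minimal NumPy sketch of the factorisation-machine prediction itself (not any particular library implementation): a global bias, a linear term, and pairwise feature interactions modelled through latent factor vectors. All weights here are random placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_factors = 4, 2
w0 = 0.1                                      # global bias
w = rng.normal(size=n_features)               # linear weights
V = rng.normal(size=(n_features, n_factors))  # one latent factor vector per feature

def fm_predict(x):
    """FM score: bias + linear term + pairwise interactions <v_i, v_j> x_i x_j."""
    linear = w0 + w @ x
    # pairwise term via the standard O(k*n) identity
    pairwise = 0.5 * np.sum((V.T @ x) ** 2 - (V.T ** 2) @ (x ** 2))
    return linear + pairwise

x = np.array([1.0, 0.0, 1.0, 0.0])   # sparse, one-hot style input row
print(fm_predict(x))
```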
12
Q

Entropy

A

A relative measure of disorder in the data source

  • In classification we are trying to reduce the entropy
  • Disorder is present when the split between two or more distinct groups is not pure (e.g. if all 100 observations are 1, then disorder = 0)
  • For a group of 100 observations where 50% are 0 and 50% are 1, the entropy is at its maximum (lots of disorder in the data - a 50% chance of being classified either way)
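The two cases in this card, computed directly; a small NumPy sketch of Shannon entropy in bits:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (bits): -sum(p * log2(p)) over the class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

print(entropy(np.ones(100)))                      # all 100 observations are 1 -> zero entropy (may print as -0.0)
print(entropy(np.r_[np.zeros(50), np.ones(50)]))  # 50/50 split -> 1.0 (maximum disorder)
```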