Machine Learning Flashcards
1
Q
Logistic Regression
A
- Classification (despite the name); input: numerical/categorical/ordinal
- Uses the logistic function to predict the probability (in (0, 1)) of the input belonging to a certain class (e.g. the input being True or False)
- Input can be separated into two classes by a linear boundary
- Coefficients → Maximum Likelihood Estimators
- Positive and negative class
- Linear score large and positive? → P in (0.5, 1)
- Linear score large and negative? → P in (0, 0.5)
- Input lies on the linear boundary? → P = 0.5
- Pros: Low variance
- Cons: High bias
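A minimal sketch of the idea, assuming Python with NumPy and scikit-learn (the toy 1-D dataset is invented for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def sigmoid(z):
    """Logistic function: maps the linear score z = w.x + b into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical 1-D data: negative class on the left, positive on the right
X = np.array([[-3.0], [-2.0], [-1.5], [1.5], [2.0], [3.0]])
y = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression()  # coefficients are fit by maximum likelihood
model.fit(X, y)

# P = 0.5 on the boundary, close to 1 for large positive scores,
# close to 0 for large negative scores
print(model.predict_proba([[0.0], [4.0], [-4.0]])[:, 1])

# The same probabilities recovered manually through the logistic function
z = X @ model.coef_.T + model.intercept_
print(sigmoid(z).ravel())
```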
2
Q
Hierarchical Clustering
A
- Start by assigning every data point of the input to its own cluster
- Repeatedly merge the 2 closest clusters with each other
- Stop when one giant cluster remains; re-create any number of clusters you want by undoing merges (cutting the dendrogram)
- Pros: the number of clusters does not need to be selected beforehand
- Cons: more time-consuming
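A minimal sketch using SciPy's agglomerative clustering (the 2-D points are invented for illustration):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy 2-D points: two loose groups
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 5.2], [5.2, 4.9]])

# Agglomerative: every point starts as its own cluster,
# then the two closest clusters are repeatedly merged (Ward linkage here)
Z = linkage(X, method="ward")

# "Undo" merges afterwards: cut the dendrogram into any k you want
for k in (2, 3):
    print(k, fcluster(Z, t=k, criterion="maxclust"))
```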
3
Q
Random forest
A
- Non-parametric algorithm
- Create multiple Decision Trees with a bootstrapped dataset
- Uses a randomly selected subset of features at each split
- Use this ‘Forest’ of Decision Trees (majority vote or averaging) to get more accurate predictions
- Pros: Reduced variance compared to single DT (more accurate)
- Cons: Not very explainable
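A minimal sketch with scikit-learn, using a synthetic dataset as a stand-in for real data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic classification data (stand-in for a real dataset)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Each tree sees a bootstrap sample of the data
# and a random subset of features at every split
forest = RandomForestClassifier(
    n_estimators=100,     # number of trees in the 'forest'
    max_features="sqrt",  # features considered at each split
    bootstrap=True,
    random_state=0,
)
forest.fit(X_tr, y_tr)
print("test accuracy:", forest.score(X_te, y_te))
```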
4
Q
Decision trees
A
- Recursively split the data by testing values of attributes (= divide and conquer)
- Non-parametric algorithm
- Builds classification and regression models in the form of a tree
- Takes a set of data and breaks it down into smaller homogenous subsets
- This process continues until a decision/prediction is reached
- Start with the best predictor variable as the Root node (the one with the lowest Gini impurity & entropy)
- The variable with the lowest Gini impurity & entropy predicts best
- Continue with internal nodes (decision points), repeatedly selecting the variable that splits the remaining data most purely
- End with Leaf nodes (the classifications)
- Entropy = uncertainty in the data. High entropy (messy/mixed), low entropy (pure)
- Pros: Very explainable; Works well for categorical variables
- Cons: Overfits easily (not very accurate when predicting on unseen data)
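A minimal sketch of the two impurity measures used to pick splits (assuming Python with NumPy):

```python
import numpy as np

def entropy(labels):
    """Uncertainty in a set of class labels: 0 when pure, high when mixed."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gini(labels):
    """Gini impurity: chance of mislabelling a randomly drawn sample."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

pure = np.array([1, 1, 1, 1])    # pure subset: both measures ~0
mixed = np.array([0, 1, 0, 1])   # maximal mess for 2 classes
print(entropy(pure), gini(pure))
print(entropy(mixed), gini(mixed))  # 1.0 and 0.5
```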
5
Q
Neural Networks
A
- Classification, Regression & Clustering; input: vectors
- Parametric model (the weights & biases are its parameters)
- Predictive model; consists of artificial neurones, which contain weights & biases
- Assumes normalised inputs & one-hot-encoded categorical features
- Takes vectors as input
- Produces vectors as output
- Uses a function of the inputs, weights & bias to calculate the activation of each neuron, and eventually the output
- Back-propagation: optimises the weights so that the NN can accurately map the inputs to the outputs
- Heavily used in Deep Learning, good for solving problems like handwriting recognition and face detection
- Pros: Can solve very complicated tasks; Very accurate, improves itself with back-propagation
- Cons: Not very explainable (internal workings are not clear; black-box algorithm); Does not work properly when its assumptions are not met
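A minimal from-scratch sketch of the forward pass and back-propagation on a tiny network, assuming Python with NumPy (XOR is a stand-in task; how well it converges depends on the random initialisation):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy task: learn XOR with one hidden layer of 4 neurons
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # input -> hidden
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # hidden -> output
lr = 1.0

for _ in range(5000):
    # Forward pass: each neuron applies sigmoid(weights . inputs + bias)
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Back-propagation: push the output error back through the layers
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0)

print(out.round(3).ravel())  # should approach [0, 1, 1, 0]
```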