Machine Learning Flashcards

1
Q

Logistic Regression

A
  • Classification (despite the 'regression' in the name); input: numerical/categorical/ordinal
  • Uses the logistic function to predict the probability (in (0, 1)) that the input belongs to a certain class (e.g. True vs. False)
  • Assumes the input can be separated into two classes by a linear boundary
  • Coefficients are found via Maximum Likelihood Estimation
  • Distinguishes a positive and a negative class
  • Input has a large positive value (logit)? —> P close to 1
  • Input has a large negative value? —> P close to 0
  • Input lies on the linear boundary? —> P = 0.5 (see the sketch below)
  • Pros: Low variance
  • Cons: High bias
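A minimal sketch of the boundary behaviour above; the coefficients w and b are made-up values, since in practice they come from Maximum Likelihood Estimation on training data:

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients; a real model estimates these by
# Maximum Likelihood Estimation.
w, b = np.array([2.0, -1.0]), 0.5

def predict_proba(x):
    """P(positive class | x); the linear boundary is w.x + b = 0."""
    return sigmoid(np.dot(w, x) + b)

print(predict_proba(np.array([3.0, 0.0])))   # large positive logit -> P near 1
print(predict_proba(np.array([-3.0, 0.0])))  # large negative logit -> P near 0
print(predict_proba(np.array([0.25, 1.0])))  # on the boundary -> P = 0.5
```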
2
Q

Hierarchical Clustering

A
  • Start by assigning every datapoint of the input to its own cluster
  • Repeatedly merge the 2 closest clusters with each other; stop when one giant cluster remains
  • Re-create any number of clusters you want by undoing merges, i.e. cutting the dendrogram at the desired level (see the sketch below)
  • Pros: The number of clusters does not need to be selected beforehand
  • Cons: More time-consuming than flat methods such as k-means
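A minimal sketch, assuming SciPy (the card names no library): linkage records every merge up to one giant cluster, and fcluster re-creates any cluster count by cutting the dendrogram afterwards.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy 2-D points; in practice this is your feature matrix
X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9], [9.0, 0.0]])

# Agglomerative clustering: every point starts as its own cluster,
# then the 2 closest clusters are merged repeatedly. Z is the full
# merge history (the dendrogram).
Z = linkage(X, method='ward')

# 'Undo' merges by cutting the dendrogram at any number of clusters
print(fcluster(Z, t=2, criterion='maxclust'))
print(fcluster(Z, t=3, criterion='maxclust'))
```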
3
Q

Random Forest

A
  • Non-parametric algorithm
  • Creates multiple Decision Trees, each on a bootstrapped sample of the dataset
  • Each split considers only a randomly selected subset of the features
  • Uses this ‘Forest’ of Decision Trees (majority vote / averaging) to get more accurate predictions (see the sketch below)
  • Pros: Reduced variance compared to a single DT (more accurate)
  • Cons: Not very explainable
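A minimal sketch, assuming scikit-learn and its bundled iris dataset (neither is part of the card): bootstrapped rows per tree plus a random feature subset per split, aggregated by majority vote.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# bootstrap=True resamples the rows for each tree; max_features='sqrt'
# draws a random feature subset at every split.
forest = RandomForestClassifier(n_estimators=100, max_features='sqrt',
                                bootstrap=True, random_state=0)
forest.fit(X_train, y_train)
print(forest.score(X_test, y_test))  # majority vote over 100 trees
```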
4
Q

Decision Trees

A
  • Recursively split the data by testing values of attributes (= divide and conquer)
  • Non-parametric algorithm
  • Builds classification and regression models in the form of a tree
  • Takes a set of data and breaks it down into smaller, more homogeneous subsets
  • This process continues until a decision/prediction is reached
  • Start with the best predictor variable as the Root node (the split with the lowest Gini impurity / entropy)
  • The variable with the lowest Gini impurity / entropy predicts best
  • Continue with internal nodes (decision points), repeatedly selecting the most informative variable
  • End with Leaf nodes (the classifications)
  • Entropy = uncertainty in the data: high entropy (mixed classes), low entropy (pure); see the sketch below
  • Pros: Very explainable; works well for categorical variables
  • Cons: Overfits easily (not very accurate when predicting an unseen set)
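The two impurity measures the card relies on, as a small sketch: entropy H = -Σ p·log2(p) and Gini impurity G = 1 - Σ p²; the split that minimises them separates the classes best.

```python
import numpy as np

def entropy(labels):
    """Uncertainty of a subset: 0 when pure, 1 bit for a 50/50 mix."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return np.sum(p * np.log2(1.0 / p))

def gini(labels):
    """Gini impurity: chance of mislabelling a randomly drawn sample."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

pure, mixed = np.array([1, 1, 1, 1]), np.array([1, 0, 1, 0])
print(entropy(pure), gini(pure))    # 0.0 0.0  (low -> good split)
print(entropy(mixed), gini(mixed))  # 1.0 0.5  (high -> bad split)
```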
5
Q

Neural Networks

A
  • Classification, Regression & Clustering; input: vectors
  • Non-parametric algorithm
  • Predictive model; consists of artificial neurons, which carry weights & biases
  • Assumes normalised inputs & one-hot-encoded categorical variables
  • Takes vectors as input
  • Produces vectors as output
  • Each neuron applies a function to its weighted inputs plus a bias to calculate its activation, and eventually the outcome (see the sketch below)
  • Back-propagation: optimises the weights so that the NN accurately maps the inputs to the outputs
  • Heavily used in Deep Learning; good for solving problems like handwriting recognition and face detection
  • Pros: Can solve very complicated tasks; very accurate, improves with back-propagation
  • Cons: Not well explainable (internal workings unclear; a black-box algorithm); does not work properly when its assumptions are not met
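A minimal forward-pass sketch for one layer of neurons; the weights and biases here are random placeholders that back-propagation would tune during training.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One layer of 3 artificial neurons on a 4-dimensional input vector.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))  # one weight row per neuron
b = rng.normal(size=3)       # one bias per neuron

def forward(x):
    """Each neuron's activation: f(w . x + bias), here with a sigmoid f."""
    return sigmoid(W @ x + b)

x = np.array([0.5, -1.0, 0.25, 2.0])  # input vector (assumed normalised)
print(forward(x))                     # output vector of activations
```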