Week 7 + 8: Machine Learning Flashcards by Alex Carruthers

What is machine learning?

learning from data without being explicitly programmed
used to discover hidden patterns and trends
enables data-driven decisions

4 categories of machine learning

Classification
Regression
Clustering
Association analysis

Classification is used to predict a…

category

Regression is used to predict a…

numeric value

Cluster analysis is used to…

organise similar items into groups, e.g. customers

Association analysis is used to…

capture associations between items or events

Name 2 supervised machine learning techniques

Classification
Regression

Name 2 unsupervised machine learning techniques

Clustering
Association analysis

What is supervised machine learning?

Where you have input variables and an output variable, and use an algorithm to learn the mapping function from the input to the output

4 examples of supervised machine learning algorithms

KNN
Decision tree
Linear Regression
SVM (Support vector machines)

What is unsupervised machine learning? It is where you only have…

input data and no corresponding output variables

What is the goal of unsupervised machine learning?

To model the underlying structure or distribution in the data

2 examples of unsupervised machine learning algorithms

k-means clustering
apriori for association analysis

kNN is used to

classify a sample based on its neighbors

What is k in kNN? The value of k determines…

the number of nearest neighbors to consider
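
A minimal sketch of the idea, assuming Euclidean distance and a simple majority vote among the k nearest neighbors. The function name `knn_classify` and the toy data are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """train is a list of (features, label) pairs; query is a feature tuple."""
    # Sort the stored samples by their Euclidean distance to the query.
    by_distance = sorted(train, key=lambda pair: math.dist(pair[0], query))
    # Take a majority vote over the labels of the k nearest samples.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Toy data (made up): two well-separated groups labelled "A" and "B".
train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
         ((5.0, 5.0), "B"), ((5.2, 4.9), "B"), ((4.8, 5.1), "B")]

print(knn_classify(train, (1.1, 1.0), k=3))  # prints "A"
```

Note that nothing is fitted in advance: all the work happens when a query arrives.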

kNN 4 distance metrics

Euclidean Distance
City Block Distance
Chi square distance
Cosine distance
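
The four metrics can be sketched in plain Python as below. The function names are hypothetical, and the chi-square form shown (half the sum of squared differences over sums, for non-negative vectors) is one common convention; other definitions exist.

```python
import math

def euclidean(a, b):
    # Straight-line distance.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def city_block(a, b):
    # Sum of absolute coordinate differences (also called Manhattan distance).
    return sum(abs(x - y) for x, y in zip(a, b))

def chi_square(a, b):
    # Assumes non-negative values (e.g. histograms); terms with x + y == 0 are skipped.
    return 0.5 * sum((x - y) ** 2 / (x + y) for x, y in zip(a, b) if x + y != 0)

def cosine_distance(a, b):
    # 1 minus the cosine of the angle between the vectors; assumes neither is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

a, b = (1, 2), (4, 6)
print(euclidean(a, b))   # 5.0
print(city_block(a, b))  # 7
```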

2 pros of kNN

No separate training phase
Can generate complex decision boundaries

2 cons of kNN

Can be susceptible to noise
Can be slow, since distances to the stored samples are recalculated for every new prediction

What is a decision tree? It is a…

hierarchical structure with nodes and directed edges

Name 3 parts of a decision tree

Root node - node at the top
Internal nodes - in between
Leaf nodes - nodes at the bottom

A decision tree classification decision is made by

traversing the decision tree from the root node
the answer to the test condition at each node determines the branch to follow
when a leaf node is reached, the category at the leaf node is the classification
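
The traversal can be sketched as follows, assuming a tree stored as nested dictionaries. The features, thresholds and labels are invented for illustration.

```python
# Hypothetical tree: internal nodes hold a test condition, leaf nodes hold a category.
tree = {
    "feature": "humidity", "threshold": 70,
    "yes": {"label": "rain"},                 # leaf node
    "no": {                                   # internal node
        "feature": "wind", "threshold": 20,
        "yes": {"label": "cloudy"},           # leaf node
        "no": {"label": "sunny"},             # leaf node
    },
}

def classify(node, sample):
    # Start at the root and follow branches until a leaf is reached.
    while "label" not in node:
        # The test condition here is "feature value > threshold".
        passed = sample[node["feature"]] > node["threshold"]
        node = node["yes"] if passed else node["no"]
    # The category stored at the leaf is the classification.
    return node["label"]

print(classify(tree, {"humidity": 50, "wind": 30}))  # prints "cloudy"
```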

What is decision tree - depth of a node

number of edges from the root node to that node

What is decision tree - depth of a decision tree

number of edges in the longest path from the root node to the leaf node

What is decision tree - size of a decision tree

the number of nodes in the tree
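
Both measures can be computed recursively. This is a sketch assuming each node is a dictionary with an optional `children` list; the representation and node names are hypothetical.

```python
def depth(node):
    # Number of edges on the longest path from this node down to a leaf.
    children = node.get("children", [])
    if not children:
        return 0
    return 1 + max(depth(child) for child in children)

def size(node):
    # Count this node plus every node beneath it.
    return 1 + sum(size(child) for child in node.get("children", []))

tree = {"name": "root", "children": [
    {"name": "leaf A"},
    {"name": "internal", "children": [{"name": "leaf B"}, {"name": "leaf C"}]},
]}

print(depth(tree))  # 2
print(size(tree))   # 5
```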

When to stop splitting a node?

* All samples in the node have the same class label
* Max tree depth is reached
* The change in impurity from splitting falls below a threshold
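
A hedged sketch of these stopping checks. Gini impurity is used here as the impurity measure, which is an assumption (the cards do not name one), and `max_depth` and `min_drop` are hypothetical parameters.

```python
from collections import Counter

def gini(labels):
    # Gini impurity: 0.0 for a pure node, larger for mixed nodes.
    # Assumes labels is non-empty.
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def should_stop(labels, depth, best_impurity_drop, max_depth=5, min_drop=0.01):
    if len(set(labels)) <= 1:
        return True   # all samples share one class label
    if depth >= max_depth:
        return True   # maximum tree depth reached
    # Stop when the best available split barely reduces impurity.
    return best_impurity_drop < min_drop

print(gini(["a", "b"]))  # 0.5
```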

2 pros of decision tree

* resulting tree is easy to interpret
* induction is computationally inexpensive

2 cons of decision tree

* greedy approach does not guarantee the best solution
* rectilinear decision boundaries

What is linear regression?

a statistical method that allows us to summarise the relationship between two continuous variables

In linear regression what 3 names can the X variable be called?

1. predictor
2. explanatory variable
3. independent variable

In linear regression what 3 names can the Y variable be called?

1. response
2. outcome
3. dependent variable
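
As an illustration, a straight line relating a predictor X to a response Y can be fitted by ordinary least squares. This is a minimal sketch; the function name `fit_line` and the data are made up.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums; assumes the xs are not all identical.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [1, 2, 3, 4]   # predictor (X)
ys = [3, 5, 7, 9]   # response (Y), generated from y = 2x + 1
slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # 2.0 1.0
```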

In one dimension a hyperplane is called a

point

In two dimensions a hyperplane is called a

line

In three dimensions a hyperplane is called a

plane

In 4 or more dimensions a hyperplane is called a

hyperplane

The goal of a SVM is to find the...

optimal separating hyperplane which maximizes the margin of the training data
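
To make "margin" concrete, the sketch below measures the margin of a candidate hyperplane w·x + b = 0 as the distance to the closest training point. It does not train an SVM; the function name and the data are hypothetical.

```python
import math

def margin(w, b, points):
    # Distance from a point x to the hyperplane is |w.x + b| / ||w||;
    # the margin is that distance for the closest point.
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm for x in points)

# Two classes on a line: (0,0), (1,0) versus (3,0), (4,0).
points = [(0, 0), (1, 0), (3, 0), (4, 0)]

print(margin((1, 0), -2.0, points))  # hyperplane x = 2   -> 1.0
print(margin((1, 0), -1.5, points))  # hyperplane x = 1.5 -> 0.5
```

Both hyperplanes separate the two classes, but the first has the larger margin, so it is the one an SVM would prefer.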

3 pros of SVM

1. works well with a clear margin of separation
2. effective in high-dimensional spaces
3. effective in cases where the number of dimensions is greater than the number of samples

3 cons of SVM

1. doesn't work well with large data sets
2. doesn't perform very well with noisy data
3. doesn't directly provide probability estimates

feed forward NN indicates there are

no loops in the network

feedback NN

is also known as a recurrent neural network

4 pros of NN

* can be trained directly on data with thousands of input variables
* once trained, predictions are fast
* good for complex problems (e.g. image recognition)
* out-performs other models when given high-quality labelled data

4 cons of NN

* black box
* training is computationally expensive
* suffers from interference, where learning new data causes the network to forget old data
* often abused where simpler solutions such as linear regression would be best

What distance calculation does K-means clustering use?

Euclidean distance

What does the K in K-Means represent?

The number of clusters to divide the data into

What type of data does K-means clustering work with?

Continuous

What does K-means clustering try to improve?

The intra-group (within-cluster) similarity, while keeping the groups as far as possible from each other
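
A minimal sketch of the K-means loop, assuming Euclidean distance and starting centroids passed in explicitly (in practice they are usually chosen at random). K is the number of starting centroids; the function name and data are hypothetical.

```python
import math

def k_means(points, centroids, iterations=10):
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = k_means(points, centroids=[(0, 0), (10, 10)])
print(clusters[0])  # [(1, 1), (1, 2), (2, 1)]
```

Because centroids are means of coordinates, this only makes sense for continuous data.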

(46 cards)