Week 7 + 8: Machine Learning Flashcards

1
Q

What is machine learning?

A

learning from data without being explicitly programmed; used to discover hidden patterns and trends, enabling data-driven decisions

2
Q

4 categories of machine learning

A
  1. Classification
  2. Regression
  3. Clustering
  4. Association analysis
3
Q

Classification is used to predict a…

A

category

4
Q

Regression is used to predict a…

A

numeric value

5
Q

Cluster analysis is used to

A

organise similar items into groups, e.g. customer segments

6
Q

Association analysis is used to…

A

capture associations between items or events (e.g. items that are frequently bought together)
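Association rules are commonly scored by support and confidence. The sketch below computes both for a small, hypothetical set of shopping baskets; the data and the function names `support` and `confidence` are illustrative assumptions, not part of the course material.

```python
# Hypothetical toy basket data: each set is one transaction
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) = support(A and C) / support(A)."""
    both = support(antecedent | consequent, transactions)
    return both / support(antecedent, transactions)

print(support({"bread", "milk"}, transactions))                  # 0.5
print(round(confidence({"bread"}, {"milk"}, transactions), 3))   # 0.667
```

Read the second result as "of the baskets containing bread, about two thirds also contain milk". Algorithms such as Apriori search for itemsets whose support exceeds a chosen threshold.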

7
Q

Name 2 supervised machine learning techniques

A
  1. Classification
  2. Regression
8
Q

Name 2 unsupervised machine learning techniques

A
  1. Clustering
  2. Association analysis
9
Q

What is supervised machine learning?

A

Where you have input variables (X) and an output variable (Y), and use an algorithm to learn the mapping function from the input to the output

10
Q

4 examples of supervised machine learning algorithms

A
  1. KNN
  2. Decision tree
  3. Linear Regression
  4. SVM (Support vector machines)
11
Q

What is unsupervised machine learning? It is where you only have…

A

input data and no corresponding output variables

12
Q

What is the goal of unsupervised machine learning?

A

To model the underlying structure or distribution in the data

13
Q

2 examples of unsupervised machine learning algorithms

A
  1. k-means clustering
  2. Apriori for association analysis
14
Q

kNN is used to

A

classify a sample based on the classes of its nearest neighbors (usually by majority vote)

15
Q

What is k in kNN? The value of k determines…

A

the number of nearest neighbors to consider
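A minimal kNN classifier sketch, assuming Euclidean distance and a simple majority vote among the k nearest training samples. The function name `knn_predict` and the toy data are hypothetical. Note that there is no training step: the training data is simply kept and scanned at prediction time.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples."""
    # Euclidean distance from the query to every stored training sample
    dists = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    dists.sort(key=lambda pair: pair[0])
    k_labels = [label for _, label in dists[:k]]
    # most_common(1) returns [(label, count)]; ties go to the label seen first
    return Counter(k_labels).most_common(1)[0][0]

train_X = [(1.0, 1.0), (1.5, 2.0), (5.0, 5.0), (6.0, 5.5), (5.5, 6.0)]
train_y = ["A", "A", "B", "B", "B"]
print(knn_predict(train_X, train_y, (1.2, 1.4), k=3))  # A
```

Here the 3 nearest neighbors are two "A" points and one "B" point, so the vote gives "A". Changing k changes how many neighbors vote.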

16
Q

kNN 4 distance metrics

A
  1. Euclidean Distance
  2. City Block Distance
  3. Chi-square distance
  4. Cosine distance
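The four metrics can be sketched in plain Python as below. The chi-square formula used here (half the sum of squared differences divided by the sums) is one common form for non-negative features such as histograms; other variants exist, so treat it as an assumption.

```python
import math

def euclidean(x, y):
    # Straight-line distance
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def city_block(x, y):
    # Also called Manhattan or L1 distance: sum of absolute differences
    return sum(abs(a - b) for a, b in zip(x, y))

def chi_square(x, y):
    # One common form for non-negative features (assumption: other variants exist);
    # terms where a + b == 0 are skipped to avoid division by zero
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(x, y) if a + b != 0)

def cosine_distance(x, y):
    # 1 - cosine similarity: compares direction only, ignores vector length.
    # Assumes neither vector is all zeros.
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return 1 - dot / norm

x, y = (1, 2), (4, 6)
print(euclidean(x, y))   # 5.0
print(city_block(x, y))  # 7
```

The choice of metric changes which samples count as "nearest", so it can change kNN's predictions on the same data.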
17
Q

2 pros of kNN

A
  1. No separate training phase
  2. Can generate complex decision boundaries
18
Q

2 cons of kNN

A
  1. Can be susceptible to noise
  2. Can be slow, since distances to all training samples are recalculated for every new prediction
19
Q

What is a decision tree? It is a…

A

hierarchical structure with nodes and directed edges

20
Q

Name 3 parts of a decision tree

A
  1. Root node - node at the top
  2. Internal nodes - in between
  3. Leaf nodes - nodes at the bottom
21
Q

A decision tree classification decision is made by

A
  • traversing the decision tree from the root node
  • at each internal node, the answer to the test condition determines which branch to follow, until a leaf node is reached
  • the category at the leaf node is the classification
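The traversal can be sketched with a tree stored as nested dictionaries: internal nodes hold a test, leaf nodes hold a category. The weather-style features, thresholds and the `classify` helper are hypothetical and only illustrate the root-to-leaf walk.

```python
# Hypothetical tree: internal nodes hold a test, leaf nodes hold a category
tree = {
    "test": lambda s: s["outlook"] == "sunny",
    "yes": {
        "test": lambda s: s["humidity"] > 70,
        "yes": {"label": "don't play"},
        "no": {"label": "play"},
    },
    "no": {"label": "play"},
}

def classify(node, sample):
    # Start at the root and follow the branch chosen by each test condition
    while "label" not in node:
        node = node["yes"] if node["test"](sample) else node["no"]
    # The category stored at the leaf is the classification
    return node["label"]

print(classify(tree, {"outlook": "sunny", "humidity": 85}))  # don't play
print(classify(tree, {"outlook": "rainy", "humidity": 85}))  # play
```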
22
Q

What is decision tree - depth of a node

A

number of edges from the root node to that node

23
Q

What is decision tree - depth of a decision tree

A

number of edges in the longest path from the root node to a leaf node

24
Q

What is decision tree - size of a decision tree

A

the number of nodes in the tree
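Size and depth can both be computed recursively. This sketch assumes a hypothetical representation where each node is a dictionary with a `children` list (empty for leaves); it counts nodes for size and edges on the longest root-to-leaf path for depth.

```python
# Hypothetical tree: root with two children, the first of which has two leaves
tree = {"children": [
    {"children": [{"children": []}, {"children": []}]},
    {"children": []},
]}

def size(node):
    """Number of nodes in the tree rooted at `node`."""
    return 1 + sum(size(child) for child in node["children"])

def depth(node):
    """Edges on the longest path from `node` down to a leaf (a lone leaf has depth 0)."""
    if not node["children"]:
        return 0
    return 1 + max(depth(child) for child in node["children"])

print(size(tree))   # 5
print(depth(tree))  # 2
```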

25
Q

When to stop splitting a node?

A
  • All samples in the node have the same class label
  • Max tree depth is reached
  • The decrease in impurity from the best split falls below a threshold
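The three stopping rules can be sketched as below. The cards do not name an impurity measure, so this assumes Gini impurity (entropy is another common choice); the thresholds `max_depth=3` and `min_decrease=0.01` are hypothetical defaults.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions (0 means the node is pure)."""
    n = len(labels)
    return 1 - sum((count / n) ** 2 for count in Counter(labels).values())

def impurity_decrease(parent, left, right):
    """Parent impurity minus the size-weighted impurity of the two child nodes."""
    n = len(parent)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(parent) - weighted

def should_stop(labels, depth, max_depth=3, best_decrease=None, min_decrease=0.01):
    if len(set(labels)) == 1:      # all samples share the same class label
        return True
    if depth >= max_depth:         # maximum tree depth reached
        return True
    if best_decrease is not None and best_decrease < min_decrease:
        return True                # best split barely reduces impurity
    return False

print(gini(["A", "A", "B", "B"]))                                          # 0.5
print(impurity_decrease(["A", "A", "B", "B"], ["A", "A"], ["B", "B"]))     # 0.5
```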
26
Q

2 pros of decision tree

A
  • resulting tree is easy to interpret
  • induction is computationally inexpensive
27
Q

2 cons of decision tree

A
  • greedy approach does not guarantee best solution
  • rectilinear decision boundaries
28
Q

What is linear regression?

A

a statistical method that allows us to summarise the relationship between two continuous variables
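For one predictor X and one response Y, the ordinary least squares line y = b0 + b1·x has a closed-form solution. A minimal sketch, where the function name `fit_line` and the toy data are illustrative:

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = b0 + b1 * x for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: sum of cross-deviations divided by sum of squared x-deviations
    # (equivalently, covariance of x and y over variance of x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    # Intercept: the fitted line passes through (mean_x, mean_y)
    b0 = mean_y - b1 * mean_x
    return b0, b1

xs = [1, 2, 3, 4]   # predictor / explanatory / independent variable
ys = [3, 5, 7, 9]   # response / outcome / dependent variable (exactly y = 1 + 2x)
b0, b1 = fit_line(xs, ys)
print(b0, b1)  # 1.0 2.0
```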

29
Q

In linear regression what 3 names can the X variable be called?

A
  1. predictor
  2. explanatory
  3. independent variable
30
Q

In linear regression what 3 names can the Y variable be called?

A
  1. response
  2. outcome
  3. dependent variable
31
Q

In one dimension a hyperplane is called a

A

point

32
Q

In two dimensions a hyperplane is called a

A

line

33
Q

In three dimensions a hyperplane is called a

A

plane

34
Q

In 4 or more dimensions a hyperplane is called a

A

hyperplane

35
Q

The goal of a SVM is to find the…

A

optimal separating hyperplane which maximizes the margin of the training data
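A trained linear SVM classifies a point by which side of the hyperplane w·x + b = 0 it falls on, and in the usual canonical scaling the margin width is 2 / ||w||, so maximizing the margin means minimizing ||w||. The sketch below uses a hand-picked, hypothetical w and b rather than solving the SVM optimization problem, which a library would normally do.

```python
import math

def decision(w, b, x):
    """Side of the hyperplane w.x + b = 0 that point x falls on (+1 or -1)."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

def margin_width(w):
    """Margin width 2 / ||w|| for a hard-margin SVM in canonical form."""
    return 2 / math.sqrt(sum(wi * wi for wi in w))

# Hypothetical hyperplane in 2 dimensions (so it is a line): x1 + x2 - 3 = 0
w, b = (1.0, 1.0), -3.0
print(decision(w, b, (3.0, 2.0)))   # 1
print(decision(w, b, (0.5, 0.5)))   # -1
print(margin_width(w))              # 2 / sqrt(2), about 1.414
```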

36
Q

3 pros of SVM

A
  1. works well with a clear margin of separation
  2. effective in high-dimensional spaces
  3. effective in cases where number of dimensions is greater than the number of samples
37
Q

3 cons of SVM

A
  1. doesn’t work well with large data sets
  2. doesn’t perform very well with noisy data
  3. doesn’t provide probability estimates
38
Q

A feed-forward NN means there are

A

no loops in the network
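A forward pass through a small feed-forward network shows the "no loops" property: each layer's output feeds only the next layer, never an earlier one. The layer sizes, weights and sigmoid activation below are hypothetical choices for illustration, not trained values.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def layer(inputs, weights, biases):
    """Fully connected layer: each neuron applies sigmoid to a weighted sum of all inputs."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def forward(x, layers):
    # Data flows strictly input -> hidden -> output; no output is fed back
    for weights, biases in layers:
        x = layer(x, weights, biases)
    return x

# Hypothetical weights: 2 inputs -> 2 hidden neurons -> 1 output neuron
net = [
    ([[0.5, -0.4], [0.3, 0.8]], [0.0, 0.1]),
    ([[1.2, -0.7]], [0.05]),
]
print(forward([1.0, 2.0], net))
```

A feedback (recurrent) network would instead carry some of its outputs forward as extra inputs at the next time step.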

39
Q

feedback NN

A

is also known as a recurrent neural network

40
Q

4 pros of NN

A
  • can be trained directly on data with thousands of input variables
  • once trained predictions are fast
  • good for complex problems (image recognition)
  • out-performs other models with high quality labelled data
41
Q

4 cons of NN

A
  • black box
  • training is computationally expensive
  • suffers from interference, where training on new data causes the network to forget old data
  • often overused where a simpler solution such as linear regression would be better
42
Q

What distance calculation does K-means clustering use?

A

Euclidean distance

43
Q

What does the K in K-Means represent?

A

The number of clusters to divide the data into

44
Q

What type of data does K-means clustering work with?

A

Continuous

45
Q

What does K-means clustering try to improve?

A

The intra-group similarity (how similar items within a cluster are), while keeping the groups as far as possible from each other
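A minimal sketch of the standard K-means procedure (Lloyd's algorithm) with Euclidean distance: assign each point to its nearest centroid, move each centroid to the mean of its cluster, and repeat until nothing changes. Initializing from k randomly chosen data points, the `seed` parameter and the toy points are assumptions made for illustration; real libraries use smarter initialization.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Cluster `points` (tuples of numbers) into k groups using Lloyd's algorithm."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)        # start from k random data points
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of the points assigned to it
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                new_centroids.append(tuple(sum(c) / len(cluster) for c in zip(*cluster)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break                                   # assignments have stabilised
        centroids = new_centroids
    return centroids, clusters

points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
```

Each iteration can only keep or reduce the total squared distance from points to their centroids, which is how K-means makes the groups internally tight. Results can depend on the starting centroids, so it is often run several times with different seeds.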

46
Q
A