Week 7 + 8: Machine Learning Flashcards

1
Q

What is machine learning?

A

Learning from data without being explicitly programmed; used to discover hidden patterns/trends and enable data-driven decisions

2
Q

4 categories of machine learning

A
  1. Classification
  2. Regression
  3. Clustering
  4. Association analysis
3
Q

Classification is used to predict a…

A

category

4
Q

Regression is used to predict a…

A

numeric value

5
Q

Cluster analysis is used to…

A

organise similar items into groups, e.g. customers

6
Q

Association analysis is used to…

A

capture associations between items or events

7
Q

Name 2 supervised machine learning techniques

A
  1. Classification
  2. Regression
8
Q

Name 2 unsupervised machine learning techniques

A
  1. Clustering
  2. Association analysis
9
Q

What is supervised machine learning?

A

Where you have input variables and an output variable, and use an algorithm to learn the mapping function from the input to the output

10
Q

4 examples of supervised machine learning algorithms

A
  1. KNN
  2. Decision tree
  3. Linear Regression
  4. SVM (Support vector machines)
11
Q

What is unsupervised machine learning? It is where you only have…

A

input data and not corresponding output variables

12
Q

What is the goal of unsupervised machine learning?

A

To model the underlying structure or distribution in the data

13
Q

2 examples of unsupervised machine learning algorithms

A
  1. k-means clustering
  2. apriori for association analysis
14
Q

kNN is used to…

A

classify a sample based on its neighbors

15
Q

What is k in kNN? The value of k determines…

A

the number of nearest neighbors to consider

16
Q

4 distance metrics used by kNN

A
  1. Euclidean Distance
  2. City Block Distance
  3. Chi square distance
  4. Cosine distance
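The four metrics above can be sketched in plain Python. The chi-square form below is one common variant, assumed here since the cards don't give the formula:

```python
import math

def euclidean(a, b):
    # straight-line distance: sqrt of the sum of squared differences
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def city_block(a, b):
    # Manhattan distance: sum of absolute differences
    return sum(abs(x - y) for x, y in zip(a, b))

def chi_square(a, b):
    # one common form: sum of (x - y)^2 / (x + y), skipping zero bins
    return sum((x - y) ** 2 / (x + y) for x, y in zip(a, b) if x + y != 0)

def cosine_distance(a, b):
    # 1 minus the cosine similarity of the two vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1 - dot / norm
```

Note that cosine distance ignores vector length: `[2, 2]` and `[1, 1]` are at distance 0.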
17
Q

2 pros of kNN

A
  1. No separate training phase
  2. Can generate complex decision boundaries
18
Q

2 cons of kNN

A
  1. Can be susceptible to noise
  2. Can be slow, since distance is recalculated each time
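A minimal kNN sketch, assuming Euclidean distance and a simple majority vote (both common choices). Both the pro (no separate training phase) and the con (distances recalculated for every query) are visible in the code:

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples.
    `train` is a list of (features, label) pairs."""
    # no training phase: just sort all samples by distance to the query
    neighbours = sorted(train, key=lambda s: math.dist(s[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# tiny 2-D example: two well-separated groups
train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
         ((8, 8), "B"), ((8, 9), "B"), ((9, 8), "B")]
```

A query near (2, 2) lands in class "A"; one near (8.5, 8.5) in class "B".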
19
Q

What is a decision tree? It is a…

A

hierarchical structure with nodes and directed edges

20
Q

Name 3 parts of decision tree

A
  1. Root node - node at the top
  2. Internal nodes - in between
  3. Leaf nodes - nodes at the bottom
21
Q

A decision tree classification decision is made by

A
  • traversing the decision tree from the root node
  • the answer to each test condition determines which branch to follow, until a leaf node is reached
  • the category at the leaf node is the classification
22
Q

In a decision tree, what is the depth of a node?

A

number of edges from the root node to that node

23
Q

What is the depth of a decision tree?

A

number of edges in the longest path from the root node to a leaf node

24
Q

What is the size of a decision tree?

A

the number of nodes in the tree

25
Q

When to stop splitting a node?

A
  • All samples in the node have the same class label
  • Max tree depth is reached
  • The change in impurity falls below a threshold
26
Q

2 pros of decision tree

A
  1. Resulting tree is easy to interpret
  2. Induction is computationally inexpensive
27
Q

2 cons of decision tree

A
  1. Greedy approach does not guarantee the best solution
  2. Rectilinear decision boundaries
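The greedy, impurity-driven splitting in the decision tree cards above can be sketched as follows. This assumes the Gini index as the impurity measure, one common choice that the cards don't name:

```python
def gini(labels):
    # Gini impurity: 1 minus the sum of squared class proportions
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Greedily pick the threshold on one numeric feature that minimises
    the weighted impurity of the two child nodes (greedy: each split is
    chosen locally, so the overall tree is not guaranteed to be optimal)."""
    best = None
    pairs = sorted(zip(values, labels))
    for i in range(1, len(pairs)):
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [l for v, l in pairs if v <= threshold]
        right = [l for v, l in pairs if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best
```

For example, values `[1, 2, 8, 9]` with labels `["A", "A", "B", "B"]` split cleanly at threshold 5.0 with weighted impurity 0 (a pure node is one stopping condition from card 25).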
28
Q

What is linear regression?

A

A statistical method that allows us to summarise the relationship between two continuous variables
29
Q

In linear regression, what 3 names can the X variable be called?

A
  1. Predictor
  2. Explanatory
  3. Independent variable
30
Q

In linear regression, what 3 names can the Y variable be called?

A
  1. Response
  2. Outcome
  3. Dependent variable
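A minimal sketch of simple linear regression (one X, one Y), using the closed-form least-squares estimates for the slope and intercept:

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x); intercept puts the line
    # through the point of means (mean_x, mean_y)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

Points lying exactly on y = 2x + 1 recover slope 2 and intercept 1.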
31
Q

In one dimension a hyperplane is called a…

A

point
32
Q

In two dimensions a hyperplane is called a…

A

line
33
Q

In three dimensions a hyperplane is called a…

A

plane
34
Q

In 4 or more dimensions a hyperplane is called a…

A

hyperplane
35
Q

The goal of a SVM is to find the…

A

optimal separating hyperplane which maximises the margin of the training data
36
Q

3 pros of SVM

A
  1. Works well with a clear margin of separation
  2. Effective in high-dimensional spaces
  3. Effective in cases where the number of dimensions is greater than the number of samples
37
Q

3 cons of SVM

A
  1. Doesn't work well with large data sets
  2. Doesn't perform very well with noisy data
  3. Doesn't provide probability estimates
38
Q

Feed-forward NN indicates there are…

A

no loops in the network
39
Q

Feedback NN…

A

is also known as a recurrent neural network
40
Q

4 pros of NN

A
  • Can be trained directly on data with thousands of input variables
  • Once trained, predictions are fast
  • Good for complex problems (e.g. image recognition)
  • Out-performs other models with high-quality labelled data
41
Q

4 cons of NN

A
  • Black box
  • Training is computationally expensive
  • Suffers from interference, where new data causes the network to forget old data
  • Often abused where simpler solutions such as linear regression would be best
42
Q

What distance calculation does K-means clustering use?

A

Euclidean distance
43
Q

What does the K in K-means represent?

A

The number of clusters to divide the data into
44
Q

What type of data does K-means clustering work with?

A

Continuous
45
Q

What does K-means clustering try to improve?

A

The intra-group (within-cluster) similarity, while keeping the groups as far apart as possible
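The K-means loop described in the cards above can be sketched in plain Python: assign each point to its nearest centroid by Euclidean distance, then move each centroid to the mean of its cluster, and repeat.

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means on a list of coordinate tuples (continuous data)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k random points
    for _ in range(iters):
        # assignment step: each point joins its nearest centroid's cluster
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # update step: move each centroid to the mean of its cluster
        for i, cluster in enumerate(clusters):
            if cluster:  # keep the old centroid if a cluster emptied out
                centroids[i] = tuple(sum(c) / len(cluster)
                                     for c in zip(*cluster))
    return centroids, clusters
```

On two well-separated groups of three points each, the loop converges to one centroid per group, with the centroids at the group means.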