Machine learning Flashcards

1
Q

What is machine learning?

A

A computer system or piece of software learns from data by itself, building models and training them to predict future outputs

2
Q

What is supervised learning?

A

Infers a function from labelled training data

3
Q

What is unsupervised learning?

A

Infers hidden structure or patterns from unlabelled data

4
Q

What is reinforcement learning?

A

Learns over time via trial and error, using feedback
- Reward from actions

5
Q

Give some examples of supervised learning

A
  • Linear regression
  • Decision tree
  • Artificial neural networks
6
Q

Give some examples of unsupervised learning

A
  • Clustering
  • Association rules
7
Q

What is top-down machine learning?

A

Model different functions and wire them together
- Deduction

8
Q

What is bottom-up machine learning?

A

Give the system lots of data so it can discover the concepts by itself
- Induction

9
Q

How does supervised learning work?

A
  1. Data pre-processing
  2. Partition data into training and testing sets
  3. Train model on the training set
  4. Evaluate model on the testing set
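
The steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the data is made up, the pre-processing step is reduced to dropping rows with a missing label, the model is a least-squares straight line, and the helper names (`train_test_split`, `fit_line`, `mean_squared_error`) are hypothetical.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows and partition them into training and testing sets."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def fit_line(rows):
    """Train: least-squares fit of y = slope * x + intercept."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mean_squared_error(model, rows):
    """Average squared gap between predictions and true labels."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in rows) / len(rows)

# Made-up labelled data: y = 2x + 1, plus one row with a missing label.
raw = [(x, 2.0 * x + 1.0) for x in range(20)] + [(99, None)]

# 1. Data pre-processing (here: drop rows without a label)
clean = [(x, y) for x, y in raw if y is not None]

# 2. Partition data into training and testing
train_rows, test_rows = train_test_split(clean)

# 3. Train model
model = fit_line(train_rows)

# Check the trained model on the held-out testing rows
test_error = mean_squared_error(model, test_rows)
```

The testing rows are never shown to `fit_line`, so `test_error` measures how well the model handles data it has not seen.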
10
Q

How does unsupervised learning work?

A
  1. Data pre-processing
  2. Clustering or association technique
11
Q

How does clustering work?

A
  1. Choose number of clusters K
  2. Initialise K cluster centroids randomly
    Repeat steps 3 and 4 until the assignments stop changing
  3. Assign each data point to the nearest cluster centroid
  4. Update each centroid by computing the mean of all the data points assigned to it
  5. Output final cluster assignments and centroids
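
The procedure above is K-means clustering. A minimal sketch in plain Python follows, assuming squared Euclidean distance, a stopping rule of "assignments no longer change", and made-up 2D points; the function names and the `seed` and `max_iters` parameters are hypothetical.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def k_means(points, k, seed=0, max_iters=100):
    rng = random.Random(seed)
    # Step 2: initialise K centroids randomly (here: K distinct data points)
    centroids = rng.sample(points, k)
    assignments = None
    for _ in range(max_iters):
        # Step 3: assign each data point to the nearest centroid
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # nothing moved, so the clustering has converged
        assignments = new_assignments
        # Step 4: move each centroid to the mean of its assigned points
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    # Step 5: output final cluster assignments and centroids
    return assignments, centroids

# Step 1: choose the number of clusters K (two obvious groups in this toy data)
points = [(1, 1), (1, 2), (2, 1), (9, 9), (9, 10), (10, 9)]
assignments, centroids = k_means(points, 2)
```

Which cluster gets label 0 and which gets label 1 depends on the random initialisation, so only the grouping is meaningful, not the label values.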
12
Q

How does association work?

A
  1. Discover correlation between two or more variables
  2. Produce dependency rules to predict occurrence of x with y
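
As a rough illustration of a dependency rule "x predicts y", the sketch below scores one rule on made-up shopping baskets. It assumes the usual support and confidence measures from association rule mining, which the card itself does not name; the function names and data are hypothetical.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Made-up transactions: each basket is a set of purchased items.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
]

# Rule: bread -> butter
rule_support = support(baskets, {"bread", "butter"})
rule_confidence = confidence(baskets, {"bread"}, {"butter"})
```

Here bread appears in three baskets and butter accompanies it in two of them, so the rule holds for two thirds of the bread baskets.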
13
Q

What are the three pillars that machine learning is built from?

A
  1. Models and algorithms
  2. Powerful and cheap computation
  3. Massive data warehouses
14
Q

What is data mining?

A

The exploration and analysis of large quantities of data to discover valid, novel, useful and understandable patterns in data

15
Q

What is the difference between machine learning and data mining?

A
  • Machine learning predicts using models
  • Data mining explains patterns
16
Q

What is regression?

A

A model of the relationship between a dependent variable Y and an independent variable X

17
Q

How do we describe a linear regression model?

A

  • An underfitted model
  • A good model
  • An overfitted model

18
Q

What is meant by underfitting a model?

A

A model which doesn’t capture the underlying logic of the dataset
- High loss
- Low accuracy

19
Q

What is meant by a good model?

A

Captures the underlying logic of the dataset
- low loss
- high accuracy

20
Q

What is meant by an overfitted model?

A

Captures all the noise, so “misses the point”. Overly complex, with lots of parameters
- low loss on the training data
- low accuracy on unseen data
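
The three kinds of model can be compared on a toy dataset. This is a sketch under assumptions: the data is made up (true relationship y = 2x, with noise of plus or minus 2 on the training labels only), loss is mean squared error, and the overfitted model is stood in for by a lookup that memorises the nearest training point.

```python
def mse(predict, rows):
    """Mean squared error (loss) of a prediction function on (x, y) rows."""
    return sum((predict(x) - y) ** 2 for x, y in rows) / len(rows)

# Made-up data: the underlying logic is y = 2x.
train = [(0, 2), (2, 2), (4, 10), (6, 10), (8, 18), (10, 18)]  # noisy labels
test = [(1, 2), (3, 6), (5, 10), (7, 14), (9, 18)]             # unseen, noise-free

mean_x = sum(x for x, _ in train) / len(train)
mean_y = sum(y for _, y in train) / len(train)

def underfit(x):
    """Too simple: ignores x and always predicts the mean training label."""
    return mean_y

slope = (sum((x - mean_x) * (y - mean_y) for x, y in train)
         / sum((x - mean_x) ** 2 for x, _ in train))

def good(x):
    """A straight line fitted to the training data by least squares."""
    return mean_y + slope * (x - mean_x)

def overfit(x):
    """Memorises every training point, noise included (nearest-point lookup)."""
    return min(train, key=lambda row: abs(row[0] - x))[1]

for name, model in [("underfit", underfit), ("good", good), ("overfit", overfit)]:
    print(name, "train loss:", round(mse(model, train), 2),
          "test loss:", round(mse(model, test), 2))
```

The memorising model reaches zero loss on the training data but does worse than the straight line on the unseen points, while the constant model has high loss on both.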

21
Q

How may overfitting occur?

A

Training data size is too small
-> take more samples (could use deep learning GANs to generate them)

22
Q

How may underfitting occur?

A

Model is too simple, with too few parameters
-> more training time or input features

23
Q

What are the advantages of regression?

A
  • Short training time
  • Easy to interpret
  • Easy to implement
24
Q

What are the disadvantages of regression?

A
  • Sensitive to noise and outliers (overfitting)
  • Cannot handle complicated relationships (linear only)
25
Q

What two data types can a label be?

A
  • Categorical label
  • Continuous label
26
Q

What is categorical data?

A

Data which can be sorted into groups/categories
- Classification
- good or bad

27
Q

What is continuous data?

A

Data which can take any numeric value within a range
- Regression
- probability

28
Q

What are the components of a decision tree?

A
  • Internal nodes
    Features (decision variables, inputs)
  • Branches
    Course of decision or action
  • Leaf nodes
    A predicted class label (output)
29
Q

How do you train a decision tree?

A

Iteratively partition the decision space
- What values to split on?
- What features to split on?
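
One partitioning step can be sketched as a search over every feature and every candidate value for the split that leaves the purest groups. The sketch assumes Gini impurity as the purity measure, which the card does not specify; the function names and data are hypothetical.

```python
def gini(labels):
    """Gini impurity: 0.0 when every label in the group is the same."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(rows, labels):
    """Return (weighted impurity, feature index, threshold) of the best split."""
    best = None
    for f in range(len(rows[0])):                       # what feature to split on?
        for threshold in sorted({r[f] for r in rows}):  # what value to split on?
            left = [y for r, y in zip(rows, labels) if r[f] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[f] > threshold]
            if not left or not right:
                continue  # a split must put something on both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, threshold)
    return best

# Made-up data: two numeric features per row, one class label per row.
rows = [(1, 7), (2, 3), (3, 8), (4, 2)]
labels = ["bad", "bad", "good", "good"]
split = best_split(rows, labels)
```

A full tree would apply `best_split` again to each side of the chosen split until the groups are pure or some stopping rule is met.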

30
Q

What are the advantages of decision trees?

A
  • Reasonable training time
  • Easy to interpret
  • Easy to implement
  • Can handle large number of features
31
Q

What are the disadvantages of decision trees?

A
  • Over-complex trees lead to overfitting
  • Cannot handle complicated relationships
  • Only simple decision boundaries
  • Problems occur when there is lots of missing data
32
Q

What is a neural network?

A

A set of neurons connected by directed, weighted edges
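
A tiny forward pass shows how values flow along the directed, weighted edges. This is a sketch under assumptions the card does not state: neurons are arranged in layers, each neuron adds a bias and applies a sigmoid activation, and the weights are made up.

```python
import math

def sigmoid(z):
    """Squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(inputs, layers):
    """Push inputs through the network one layer at a time.

    Each layer is a list of neurons; each neuron is (weights, bias), with one
    weight per incoming edge from the previous layer.
    """
    activations = inputs
    for layer in layers:
        activations = [
            sigmoid(sum(w * a for w, a in zip(weights, activations)) + bias)
            for weights, bias in layer
        ]
    return activations

# Made-up network: 2 inputs -> 2 hidden neurons -> 1 output neuron.
network = [
    [([0.5, -0.5], 0.0), ([1.0, 1.0], -1.0)],  # hidden layer
    [([1.0, -1.0], 0.0)],                       # output layer
]
output = forward([1.0, 1.0], network)
```

Training a network means adjusting the weights and biases so that outputs like this one move closer to the labels; that part is not shown here.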

33
Q

What are the advantages of a neural network?

A
  • Can handle a large number of features
  • Can be more accurate
  • Can learn more complicated class boundaries
34
Q

What are the disadvantages of a neural network?

A
  • Overfitting of data
  • Hard to implement
  • Slow training time
  • Hard to interpret
35
Q

Give a practical example of overfitting

A

Develop a model to recognise letters from people's handwriting. The model is trained on a small sample of people, so it cannot recognise other people's handwriting

36
Q

Give a practical example of underfitting

A

Develop a model to predict housing costs from the number of bedrooms alone. Many other factors matter, such as the number of bathrooms and location, so the model would not predict correctly.