Machine learning Flashcards by Hana Mahmoud

What is machine learning?

A computer system or program learns from data by itself: it builds models and trains them to predict future outputs

What is supervised learning?

Infers a function from labelled training data

What is unsupervised learning?

Infers structure (such as groups or associations) from unlabelled data

What is reinforcement learning?

Learns over time via trial and error using feedback
- Reward from actions
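
As a rough illustration of trial and error with reward feedback, here is a minimal sketch of an epsilon-greedy agent choosing between actions. The function name `run_bandit`, the fixed reward table, and the values of `epsilon` and `steps` are assumptions made for this example, not part of the card.

```python
import random

def run_bandit(rewards, steps=200, epsilon=0.1, seed=0):
    """Learn which action pays best by trying actions and observing rewards.

    `rewards` is a hypothetical fixed reward per action (an assumption that
    keeps the example simple; real environments are usually noisy).
    """
    rng = random.Random(seed)
    n_actions = len(rewards)
    values = [0.0] * n_actions   # running estimate of each action's reward
    counts = [0] * n_actions     # how often each action has been tried
    for step in range(steps):
        if step < n_actions:
            action = step                          # try every action once
        elif rng.random() < epsilon:
            action = rng.randrange(n_actions)      # trial: explore at random
        else:
            # exploit: pick the action with the best estimate so far
            action = max(range(n_actions), key=lambda a: values[a])
        reward = rewards[action]                   # feedback from the action
        counts[action] += 1
        # incremental mean: move the estimate towards the observed reward
        values[action] += (reward - values[action]) / counts[action]
    return values, counts

values, counts = run_bandit([0.0, 1.0])
print(values, counts)
```

Over time the agent spends most of its steps on the action that has earned the higher reward, while still occasionally exploring the other one.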

Give some examples of supervised learning

Linear regression
Decision tree
Artificial neural networks

Give some examples of unsupervised learning

Clustering
Association rules

What is top-down machine learning?

Model different functions and wire them together
- Deduction

What is bottom-up machine learning?

Give the system lots of data so it can discover the concepts by itself
- Induction

How does supervised learning work?

Data pre-processing
Partition data into training and testing
Train the model on the training set, then evaluate it on the testing set
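
The workflow above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the helper names (`min_max_scale`, `train_test_split`, `train_line`), the toy data, and the choice of a straight-line model are all hypothetical and not taken from the card.

```python
import random

def min_max_scale(values):
    """Pre-processing: rescale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Partition the rows into a training set and a testing set."""
    rows = rows[:]                       # copy so the caller's list is untouched
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]

def train_line(rows):
    """Train: fit y = a*x + b by ordinary least squares."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    a = sxy / sxx
    return a, mean_y - a * mean_x

# Hypothetical labelled data: every label follows y = 3x + 1 exactly.
xs = list(range(20))
ys = [3 * x + 1 for x in xs]

rows = list(zip(min_max_scale(xs), ys))   # 1. pre-process the inputs
train, test = train_test_split(rows)      # 2. partition the data
a, b = train_line(train)                  # 3. train on the training set only

# Evaluate on data the model has never seen.
test_error = max(abs((a * x + b) - y) for x, y in test)
print(a, b, test_error)
```

Keeping the testing rows out of training is what makes the final error a fair estimate of how the model behaves on new data.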

How does unsupervised learning work?

Data pre-processing
Clustering or association technique

How does clustering work?

1. Choose the number of clusters K
2. Initialise K cluster centroids randomly
3. Assign each data point to the nearest centroid
4. Update each centroid to the mean of all the data points assigned to it
5. Repeat steps 3 and 4 until the assignments stop changing
6. Output the final cluster assignments and centroids
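
The procedure described here is commonly known as k-means. A minimal sketch in plain Python follows; the function names, the `seed` and `max_iterations` parameters, and the toy points are assumptions made for illustration.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def k_means(points, k, seed=0, max_iterations=100):
    rng = random.Random(seed)
    # Initialise: pick K distinct data points as the starting centroids.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iterations):
        # Assign each data point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break                         # assignments stopped changing
        labels = new_labels
        # Update each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:                   # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids

# Hypothetical data: two well-separated groups.
points = [(1, 1), (2, 2), (3, 3), (101, 101), (102, 102), (103, 103)]
labels, centroids = k_means(points, k=2)
print(labels, centroids)
```

Starting from randomly chosen data points is one simple way to initialise; other schemes exist, and the result can depend on the starting centroids when the groups are less clearly separated.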

How does association work?

Discover correlations between two or more variables
Produce dependency rules to predict the occurrence of x with y
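
One common way to score such a rule is with support and confidence. These two measures are not named on the card, so treat them as an assumption; the `baskets` data below is likewise made up for illustration.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(transactions, x, y):
    """How often y occurs in the transactions that contain x (rule x -> y)."""
    return support(transactions, set(x) | set(y)) / support(transactions, x)

# Hypothetical shopping baskets.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
]

# Rule "bread -> butter": of the baskets with bread, how many also have butter?
rule_support = support(baskets, {"bread", "butter"})
rule_confidence = confidence(baskets, {"bread"}, {"butter"})
print(rule_support, rule_confidence)
```

A rule is usually kept only if both numbers clear chosen thresholds, so that it is neither rare nor unreliable.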

What are the three pillars that machine learning is built on?

Models and algorithms
Powerful and cheap computation
Massive data warehouses

What is data mining?

The exploration and analysis of large quantities of data to discover valid, novel, useful and understandable patterns in data

What is the difference between machine learning and data mining?

Machine learning predicts using models
Data mining explains patterns

What is regression?

A model of the relationship between a dependent variable Y and an independent variable X

How do we describe a linear regression model?

An underfitted model
A good model
An overfitted model

What is meant by underfitting a model?

A model which doesn’t capture the underlying logic of the data
- High loss
- Low accuracy

What is meant by a good model?

Captures the underlying logic of the dataset
- low loss
- high accuracy

What is meant by an overfitted model?

Captures all the noise, so “misses the point”. Overly complex, with lots of parameters
- low loss (on the training data)
- low accuracy (on new data)
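
The underfitted, good and overfitted cases can be compared on a toy dataset. In this sketch the loss is mean squared error, the underfitted model always predicts the mean, the good model is a fitted line, and the overfitted model simply memorises the nearest training point. All three model choices and the data are assumptions chosen to keep the example small.

```python
def mse(model, xs, ys):
    """Loss: mean squared error of the model's predictions."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def fit_mean(xs, ys):
    """Too simple: ignores x and always predicts the average label."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_line(xs, ys):
    """Matches the underlying trend: least-squares straight line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    a = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    b = mean_y - a * mean_x
    return lambda x: a * x + b

def fit_memoriser(xs, ys):
    """Too flexible: returns the label of the nearest training point, noise included."""
    def predict(x):
        nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x))
        return ys[nearest]
    return predict

train_x = [0, 1, 2, 3, 4, 5]
train_y = [1, 1, 5, 5, 9, 9]            # y = 2x plus alternating noise of +1 / -1
test_x = [0.4, 1.4, 2.4, 3.4, 4.4]
test_y = [0.8, 2.8, 4.8, 6.8, 8.8]      # y = 2x without noise

losses = {}
for name, fit in [("underfitted", fit_mean), ("good", fit_line), ("overfitted", fit_memoriser)]:
    model = fit(train_x, train_y)
    losses[name] = (mse(model, train_x, train_y), mse(model, test_x, test_y))
    print(name, losses[name])
```

The memorising model reaches zero loss on the training data yet does worse than the line on the unseen points, while the mean-only model does poorly on both.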

How may overfitting occur?

Training data size is too small
-> take more samples (deep learning GANs could be used to generate them)

How may underfitting occur?

Model is too simple, with too few parameters
-> more training time or more input features

What are the advantages of regression?

Short training time
Easy to interpret
Easy to implement

What are the disadvantages of regression?

Sensitive to noise and outliers (overfitting)
Cannot handle complicated relationships (linear only)

What two data types can a label be?

- Categorical label
- Continuous label

What is categorical data?

Data which can be sorted into groups or categories
- Used for classification
- e.g. good or bad

What is continuous data?

Data which can take any value within a range
- Used for regression
- e.g. a probability

What are the components of a decision tree?

- Internal nodes: features (decision variables, inputs)
- Branches: course of decision or action
- Leaf nodes: a predicted class label (output)

How do you train a decision tree?

Iteratively partition the decision space
- What values to split on?
- What features to split on?
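
A single partitioning step can be sketched as a search over every feature and every candidate value, keeping the split that leaves the purest groups. The card does not say how to score a split; Gini impurity is used here as an assumption, and the function names and toy data are hypothetical.

```python
def gini(labels):
    """Gini impurity: 0.0 when every label in the group is the same."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(rows, labels):
    """Return (feature index, threshold, weighted impurity) of the best split."""
    best = None
    for feature in range(len(rows[0])):                          # which feature?
        for threshold in sorted({row[feature] for row in rows}):  # which value?
            left = [label for row, label in zip(rows, labels) if row[feature] <= threshold]
            right = [label for row, label in zip(rows, labels) if row[feature] > threshold]
            if not left or not right:
                continue                                          # not a real partition
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[2]:
                best = (feature, threshold, score)
    return best

# Hypothetical data: feature 0 separates the classes, feature 1 does not.
rows = [(2, 7), (3, 1), (6, 8), (7, 2)]
labels = ["bad", "bad", "good", "good"]
split = best_split(rows, labels)
print(split)
```

Training a full tree would repeat this step on the left and right groups until the groups are pure or a stopping rule is reached.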

What are the advantages of decision trees?

- Reasonable training time
- Easy to interpret
- Easy to implement
- Can handle a large number of features

What are the disadvantages of decision trees?

- Over-complex trees lead to overfitting
- Cannot handle complicated relationships
- Only simple decision boundaries
- Problems occur when there is lots of missing data

What is a neural network?

A set of neurons connected by directed, weighted edges
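
To make the definition concrete, here is a minimal sketch of a forward pass through such a network: each neuron sums its weighted incoming edges, adds a bias, and applies an activation. The sigmoid activation, the bias terms, and the example weights are assumptions not stated on the card.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """One layer: weights[j][i] is the edge weight from input i to neuron j."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def forward(inputs, layers):
    """Pass the inputs through each layer in turn (edges are directed forward)."""
    for weights, biases in layers:
        inputs = layer(inputs, weights, biases)
    return inputs

# Hypothetical network: 2 inputs -> 2 hidden neurons -> 1 output neuron.
network = [
    ([[0.5, -0.2], [0.1, 0.4]], [0.0, 0.1]),   # hidden layer weights and biases
    ([[1.0, -1.0]], [0.0]),                    # output layer weights and biases
]
output = forward([1.0, 2.0], network)
print(output)
```

Training would adjust the weights and biases to reduce a loss; this sketch only shows how a fixed network turns inputs into an output.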

What are the advantages of a neural network?

- Can handle a large number of features
- Can be more accurate
- Can learn more complicated class boundaries

What are the disadvantages of a neural network?

- Overfitting of data
- Hard to implement
- Slow training time
- Hard to interpret

Give a practical example of overfitting

Develop a model to recognise letters from people's handwriting. The model is trained on a small sample of people, so it cannot recognise other people's handwriting.

Give a practical example of underfitting

Develop a model to predict housing costs from the number of bedrooms alone. Many other factors matter, such as the number of bathrooms and the location, so the model would not predict correctly.

Machine learning Flashcards

(36 cards)