Week 6 - Machine Learning I Flashcards
Define Artificial Intelligence (AI)
AI is the science and engineering of making intelligent machines
Define Machine Learning (ML)
Machine learning: a computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.
Define Deep Learning
Deep Learning uses computational structures known as ‘neural networks’ to automatically recognise patterns in data and provide a suitable output, such as a prediction or evidence for a decision
Define Natural Language Processing (NLP)
NLP ranges from tasks as simple as counting word frequencies to compare different writing styles to ‘understanding’ complete human utterances
What are the types of machine learning?
- Supervised learning
- Unsupervised learning
- Reinforcement learning
What is supervised machine learning?
Supervised machine learning takes a set of input features (predictors) and output variables (i.e., labelled) and learns a mapping function between input and output
What is unsupervised machine learning?
Unsupervised machine learning learns patterns from unlabeled data (without the explicit output variable)
What is reinforcement machine learning?
The agent (learning system) learns to perform a task by interacting with an unknown environment and learns a policy that maximises the reward. Dialogue generation and machine translation are application areas in NLP for reinforcement learning.
What is the typical process of Machine Learning?
- Obtain the data
- Preprocess the data
- Extract the input features
- Define the output (labels)
- Apply the machine learning algorithm
What are examples of preprocessing?
Preprocessing includes:
1. Converting text to lower case
2. Removing special characters and stopwords
3. Removing URLs, numbers, punctuation and extra whitespace, and applying stemming
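The preprocessing steps above can be sketched in Python using only the standard library. The stopword list and the example sentence are hypothetical, and stemming is left out because it would normally rely on a library (for example, NLTK's PorterStemmer).

```python
import re
import string

# Hypothetical, tiny stopword list for illustration only.
STOPWORDS = {"the", "a", "an", "is", "and", "to", "of", "at"}

def preprocess(text):
    """Lower-case, strip URLs, numbers and punctuation, drop stopwords."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # remove URLs
    text = re.sub(r"\d+", " ", text)                    # remove numbers
    # Delete every punctuation character.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # split() with no arguments also collapses repeated whitespace.
    return [token for token in text.split() if token not in STOPWORDS]

print(preprocess("Visit https://example.com at 10 AM: the BEST offer!!"))
# ['visit', 'am', 'best', 'offer']
```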
What are examples of input features?
Input features would include:
1. Bag of Words
2. TF-IDF
3. TF-IDF with n-grams
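A minimal sketch of these three feature types in plain Python. The toy documents are hypothetical, and the TF-IDF weighting shown (term frequency times log(N / document frequency)) is only one common variant; libraries often apply smoothing or normalisation on top.

```python
import math
from collections import Counter

# Hypothetical, already-tokenised documents.
docs = [
    ["free", "prize", "win", "free"],
    ["meeting", "at", "noon"],
    ["win", "a", "free", "meeting"],
]

def bag_of_words(tokens):
    """Bag of Words: how often each term occurs in one document."""
    return Counter(tokens)

def tf_idf(docs):
    """TF-IDF per document: (count / doc length) * log(N / document frequency)."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

def ngrams(tokens, n=2):
    """Contiguous n-grams, which can be fed to tf_idf in place of single words."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

weights = tf_idf(docs)
print(bag_of_words(docs[0])["free"])   # 2
print(round(weights[0]["prize"], 3))   # rare term, higher weight than "free"
print(ngrams(docs[1]))                 # ['meeting at', 'at noon']
```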
What are examples of output features?
Output features would include:
1. Recruitment and not-recruitment
2. Predatory and non-predatory
What are examples of Machine Learning algorithms?
- Naive Bayes
- Support vector machine
- Logistic regression
- Neural networks
How many sub-classes of supervised learning are there?
There are two.
Classification - uses a training dataset with input features and a discrete output (target or class variable). The goal of classification is to learn the mapping function from the input to the discrete output
Regression - predicts the real or continuous value of the output (target variable) given a set of input features called predictors
What classifies as binary classification and multiclass classification?
Binary classification - deals with two possible classes (e.g., spam or not spam, fraudulent or legitimate, hate or not-hate, hate-speech or normal-speech)
Multiclass classification - deals with three or more classes (e.g., fake-news, partially fake, true, unknown // identity theft, cyberstalking, fraudulent sales, legitimate)
What’s the key challenge of supervised learning?
- Learning from the data (training data)
- Performing well on ‘unseen’ data (generalisation)
What is an optimal model in ML terms and what are its key challenges?
- Underfitting - the model is too simple and fails to capture the relationship between input and output variables. High error on the training data and high test error on unseen (new) data
- Overfitting - low training error and high test error
- Optimal fitting - low training error and low test error
What is the process of applied machine learning in practice 1?
- Problem statement - identify the appropriate task for ML
- Data acquisition
- Clean and pre-process data
What is the process of applied machine learning in practice 2?
- Feature engineering and selection of features
- Select appropriate machine learning algorithms
- Train the model
- Model validation and evaluation: tune the hyperparameters and evaluate the performance
- Interpret the results so you can deploy the model
How can you choose the best algorithm for the task?
You can use an algorithm cheat-sheet, which points you to a family of algorithms based on the type of task: classification, clustering, regression or dimensionality reduction
What is Naive Bayes?
Naive Bayes is a simple and fast supervised machine learning algorithm used for classification (binary and multiclass).
It is a probabilistic classifier that applies Bayes’ theorem, ‘naively’ assuming that the features are independent of each other given the class.
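A minimal sketch of a multinomial Naive Bayes text classifier in plain Python. The training documents, the labels and the use of add-one (Laplace) smoothing are assumptions made for illustration, not details from the flashcards.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    """Count documents per class and words per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
    vocab = {word for tokens in docs for word in tokens}
    return class_counts, word_counts, vocab

def predict_nb(model, tokens):
    """Return the class with the highest log prior + sum of log likelihoods."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)             # log P(class)
        total_words = sum(word_counts[label].values())
        for word in tokens:
            if word in vocab:                             # skip unseen words
                # Add-one smoothing avoids zero probabilities.
                likelihood = (word_counts[label][word] + 1) / (total_words + len(vocab))
                score += math.log(likelihood)             # log P(word | class)
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical training data.
docs = [["win", "free", "prize"], ["free", "money", "now"],
        ["meeting", "at", "noon"], ["lunch", "meeting", "today"]]
labels = ["spam", "spam", "ham", "ham"]

model = train_nb(docs, labels)
print(predict_nb(model, ["free", "prize", "today"]))  # spam
print(predict_nb(model, ["meeting", "today"]))        # ham
```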
What is Support Vector Machine (SVM)?
SVM is used for classification, regression and outlier detection (one-class SVMs). It performs well on complex (high-dimensional) feature spaces and on small to medium-sized datasets
What is the goal of SVM?
SVM’s goal is to find the maximum-margin hyperplane, the one with the largest distance to the nearest observations of the two classes. New observations are classified according to the side of the hyperplane they fall on
How to test if the model in R performs well on unseen (new) data?
Split your dataset into training and test sets, train the model on the training set, then measure its performance on the test set, which the model has not seen.
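A minimal sketch of how a trained linear SVM classifies a new observation. The weights w and bias b below are hypothetical; in practice they are learned from the training data. The margin width 2 / ||w|| assumes the usual scaling in which the closest training points satisfy |w·x + b| = 1.

```python
import math

# Hypothetical hyperplane parameters (normally learned by the SVM).
w = [1.0, -1.0]
b = 0.0

def decision_value(x):
    """Signed score w·x + b: positive on one side of the hyperplane, negative on the other."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict(x):
    """Assign the class according to the side of the hyperplane."""
    return 1 if decision_value(x) >= 0 else -1

margin_width = 2 / math.sqrt(sum(wi * wi for wi in w))

print(predict([3.0, 1.0]))     # 1
print(predict([1.0, 3.0]))     # -1
print(round(margin_width, 3))  # 1.414
```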
How do you split the dataset into train and test sets?
With a large dataset (e.g., 500,000 observations) we can hold out a smaller proportion for the test set, but with smaller datasets we may need to hold out a larger proportion
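A minimal sketch of a random train/test split, written in Python for illustration. The function name, the 20% test fraction and the seed are assumptions, not values from the flashcards.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle the rows reproducibly and hold out a fraction for testing."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = round(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]   # (train, test)

train, test = train_test_split(range(100), test_fraction=0.2)
print(len(train), len(test))  # 80 20
```

A smaller test_fraction (say 0.05) can still leave plenty of test rows when the dataset is very large.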
Why is it important to test the model?
Testing the model on held-out test data shows how well it generalises to new cases
What are the issues with single-train test splits?
- Model performance can depend heavily on which observations end up in the training set
- A single split gives only one estimate of performance, so a more careful approach is needed
- Cross-validation techniques can be used to validate the generalisation ability of machine learning models
- Cross-validation can be used on binary and multiclass classification tasks
What is k-fold cross validation?
K-fold cross validation is one of the most widely used cross validation techniques for classification models.
It splits the data randomly into k folds and, at each iteration (from 1 to k), uses k-1 folds for training and the remaining fold for testing
Normally this is done on the training data, with a final evaluation made on an unseen test set
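A minimal sketch of how k-fold cross-validation partitions the data, in plain Python. The function name, k = 5 and the seed are assumptions. The model training and scoring that would happen inside the loop are omitted.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]   # k roughly equal folds
    for i in range(k):
        test_idx = folds[i]
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx

splits = list(k_fold_indices(10, k=5))
for train_idx, test_idx in splits:
    print(len(train_idx), len(test_idx))  # 8 2 on every iteration
```

Each observation is used for testing exactly once, and the k scores are usually averaged.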
How can you compare the performance of multiple classifiers?
You can use the following performance metrics:
1. Precision/recall trade-off
2. F1 Score (F-score/F-measure)
3. Recall metric
4. Precision metric
5. Accuracy metric
6. Confusion matrix
7. ROC (receiver operating characteristics)
8. AUC (area under the curve)
What are performance metrics?
The metrics used for measuring and evaluating the performance of machine learning models
What is the confusion matrix?
The confusion matrix is a table that helps to better visualize the performance of classifiers. More concise metrics can be computed from the confusion matrix (accuracy, precision, recall and F1 score).
Look at nb for matrix
What is the accuracy metric?
Using the confusion matrix, you can calculate the accuracy of a classifier as:
accuracy = (TP + TN)/ (TP + FP + FN + TN)
This is a widely used metric but can be misleading if used on an imbalanced dataset (a common situation for real-world problems)
Note: avoid using the accuracy metric alone for classification problems
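A minimal sketch of why accuracy can mislead on imbalanced data. The labels are hypothetical: 95 negatives, 5 positives, and a classifier that always predicts the negative class.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn

def accuracy(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)

y_true = [0] * 95 + [1] * 5   # imbalanced: only 5% positives
y_pred = [0] * 100            # always predicts the negative class

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(tp, fp, fn, tn)               # 0 0 5 95
print(accuracy(tp, fp, fn, tn))     # 0.95, yet no positive case is ever detected
```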
What is the precision metric?
The ratio of correctly classified positive observations to the total observations predicted to be positive. It emphasises the accuracy of positive predictions.
precision = TP / (TP + FP)
What is the recall metric?
The recall metric is the ratio of correctly classified positive observations to the total observations that are in fact positive. This is also known as sensitivity.
recall = TP / (TP + FN)
What is F1 Score (F-score/F-measure)?
F1 score incorporates both recall and precision. It is the harmonic mean of precision and recall, which (1) gives more weight to lower values than the arithmetic mean does, so (2) F1 is high only if both recall and precision are high
F1 score = 2 * (precision * recall) / (precision + recall)
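A minimal sketch of precision, recall and F1 computed from confusion-matrix counts. The counts are hypothetical, and returning 0.0 when a denominator is zero is a convention chosen here, not part of the definitions.

```python
def precision(tp, fp):
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    return tp / (tp + fn) if (tp + fn) else 0.0

def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0

tp, fp, fn = 8, 2, 8          # hypothetical counts
p = precision(tp, fp)         # 0.8
r = recall(tp, fn)            # 0.5
f1 = f1_score(p, r)
arithmetic_mean = (p + r) / 2

print(p, r)                   # 0.8 0.5
print(round(f1, 3))           # 0.615
print(arithmetic_mean)        # 0.65
```

F1 lands below the arithmetic mean because the harmonic mean is pulled towards the lower of the two values.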
What is the precision/recall trade-off?
Increasing precision typically reduces recall and vice versa; this is known as the precision/recall trade-off.
For example, two classifiers can sit at opposite ends of the trade-off:
Classifier A: low recall, higher precision
Classifier B: high recall, lower precision
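A minimal sketch of the trade-off, assuming a classifier that outputs a score per instance and a decision threshold that can be moved. The labels, scores and thresholds are hypothetical. A strict threshold behaves like Classifier A and a lenient one like Classifier B.

```python
# Hypothetical true labels and classifier scores.
y_true = [1, 1, 1, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]

def precision_recall_at(threshold):
    """Predict positive when score >= threshold, then compute both metrics."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

strict = precision_recall_at(0.65)    # few, confident positive predictions
lenient = precision_recall_at(0.35)   # many positive predictions

print(strict)    # (1.0, 0.75): higher precision, lower recall
print(lenient)   # (0.8, 1.0): lower precision, higher recall
```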
What is Receiver Operating Characteristics (ROC)?
ROC is used to show the performance of a binary classifier. It displays the trade-off between the True Positive Rate (TPR) and the False Positive Rate (FPR) at various thresholds
Define and formulate: TPR and FPR
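TPR (sensitivity, recall): ratio of the positive class that was classified correctly
TPR = sensitivity = TP / (TP + FN)
FPR (1 - specificity): ratio of the negative class that was incorrectly classified as positive; specificity is the ratio of the negative class that was classified correctly
specificity = TN / (TN + FP)
FPR = 1 - specificity = FP / (FP + TN)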
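A minimal sketch of computing TPR and FPR at several thresholds to obtain points on a ROC curve. The labels, scores and thresholds are hypothetical, and the function assumes both classes are present (otherwise a denominator would be zero).

```python
def tpr_fpr(y_true, scores, threshold):
    """Return (TPR, FPR) when predicting positive for score >= threshold."""
    tp = fn = fp = tn = 0
    for t, s in zip(y_true, scores):
        predicted_positive = s >= threshold
        if t == 1 and predicted_positive:
            tp += 1
        elif t == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), fp / (fp + tn)

# Hypothetical true labels and classifier scores.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.7, 0.6, 0.4, 0.3, 0.1]

roc_points = [tpr_fpr(y_true, scores, th) for th in (1.0, 0.65, 0.5, 0.35, 0.0)]
for tpr, fpr in roc_points:
    print(round(fpr, 2), round(tpr, 2))   # x = FPR, y = TPR
```

The strictest threshold gives the point (0, 0) and the most lenient gives (1, 1); the curve is traced between them.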
What are class probabilities?
A class probability is the score (between 0 and 1) that the classifier assigns to an instance for a given class. A threshold is applied to these probabilities to assign instances to classes, and varying the threshold produces the points of the ROC curve.
A visual representation such as the ROC curve is useful because it shows performance across all possible threshold values.
Which classifier is better? Do this in terms of AUC.
Measure the area under the ROC curve (AUC), which is a value between 0 and 1; the classifier with the larger AUC is better.
A perfect classifier will have an AUC = 1
A purely random classifier will have an AUC = 0.5
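A minimal sketch of computing AUC without plotting. It uses the equivalent ranking view: AUC is the fraction of (positive, negative) pairs in which the positive instance gets the higher score, with ties counted as half. The labels and scores are hypothetical.

```python
def auc(y_true, scores):
    """AUC as the share of positive/negative pairs ranked correctly."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5       # a tie counts as half
    return wins / (len(positives) * len(negatives))

perfect = auc([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1])
uninformative = auc([1, 1, 0, 0], [0.5, 0.5, 0.5, 0.5])
mixed = auc([1, 1, 0, 1, 0, 0], [0.9, 0.7, 0.6, 0.4, 0.3, 0.1])

print(perfect)           # 1.0
print(uninformative)     # 0.5
print(round(mixed, 3))   # 0.889
```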