Machine Learning Basics Flashcards

Taken from various sources inc: https://d2wvfoqc9gyqzf.cloudfront.net/content/uploads/2018/09/Ng-MLY01-13.pdf

1
Q

What is Machine Learning?

A

The art and science of giving computers the ability to learn to make decisions from data (improve at a task based on experience) without being explicitly programmed.

2
Q

Name the three main categories of Machine Learning

A

Supervised Learning
Unsupervised Learning
Reinforcement Learning

3
Q

What is Unsupervised Learning?

A

Uncovering hidden patterns from unlabelled data, e.g. grouping customers into distinct categories (clusters) that were unknown beforehand and are hopefully meaningful

4
Q

What is Reinforcement Learning?

A

Software agents interact with an environment and try to find the most efficient pathway to a goal, learning how to optimise their behaviour.

They are given a system of rewards and punishments: successful routes are rewarded, and failures are restarted.

5
Q

What is Supervised Learning?

A

The Machine Learning model trains on labelled training data (sets of variables/features plus a known target variable), then predicts the target variable for subsequent, unseen test datasets, often through multiple iterations.
Also called Predictive Data Analytics.

6
Q

What is Exploratory Data Analysis (EDA)?

A

An initial exploration of data, using mostly graphical techniques, to gain insight into its nature and structure: which variables are important, and where the outliers are.

7
Q

Who codified EDA practice?

A

John Tukey in the 1970s

8
Q

Complete the following …“Science does not begin with a tidy question …” (EDA)

A

“… nor does it end with a tidy answer”

9
Q

What did Tukey refer to EDA as?

A

Detective work

10
Q

Name some EDA techniques

A

Plot the shape of distributions and measures of central tendency: mode, median, mean

Measure the range and spread of distributions: standard deviation, percentiles, quartiles

Examine relationships between variables/features in datasets/observations

Investigate trends for variables over time.
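The central-tendency and spread measures above can be sketched with Python's standard statistics module (a minimal illustration; the helper name summarise is just for this example):

```python
import statistics

def summarise(values):
    """Quick numeric EDA summary: central tendency and spread."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }
```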

11
Q

Describe Data Wrangling

A

A process that occurs during the Data Preparation stage.

Take messy, incomplete, or overly complex data and simplify and/or clean it so that it's usable for analysis:

Remove or impute missing values
Convert categorical to numeric
Standardise/Normalise data
Clean data
Join data together
Generate new fields

Overlaps with Feature Engineering.

12
Q

What is Feature Engineering?

A

Taking whatever information you have about your problem and turning it into a usable numeric format that you can use to build your feature matrix.

13
Q

How does Machine Learning work?

A

Use data to form a hypothesis; new data exposes errors in the hypothesis, so the error gap is measured and the hypothesis adjusted to fit. The aim is to make the error gap as small as possible.

14
Q

Name some types of Feature Engineering.

A

Converting categorical features to numeric, e.g. using one-hot encoding
Encoding images as pixel representations
Imputing missing data, e.g. filling NaN with the mean of the column
Building a feature pipeline to chain the above tasks together
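Two of these tasks can be sketched in pure Python (illustrative helper names, not any particular library's API):

```python
def one_hot(values):
    """One-hot encode a categorical column: one binary indicator per category."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]

def impute_mean(column):
    """Fill missing entries (None) with the mean of the observed values."""
    observed = [x for x in column if x is not None]
    mean = sum(observed) / len(observed)
    return [mean if x is None else x for x in column]
```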

15
Q

What do Machine Learning Algorithms do?

A

Algorithms learn a pattern inherent in existing data. These patterns can be used to make predictions about data that has not yet been analysed. This pattern, or model, is much smaller than the training data.

16
Q

Describe the Machine Learning lifecycle.

A

Derive pattern/data model using training data and algorithm
Check model using test data
Use formal process to check accuracy of model
Apply model to new data

17
Q

What is Dimensional Reduction in terms of Feature Engineering?

A

If the number of dimensions is too high, it takes too long to process the data and produce a model, and some dimensions may not be useful.

You can either discard dimensions using intuition, or employ dimension-reduction techniques such as Decision Trees or Principal Component Analysis (PCA).

18
Q

What is Principal Component Analysis (PCA)?

A

PCA is a feature extraction technique for reducing the dimension of a feature space (countering the curse of dimensionality), so that there are fewer relationships between features to consider and the model is less likely to overfit.

19
Q

What is Clustering in terms of Unsupervised learning?

A

Finding islands of similarity in complex data sets.
Uniting singular points into distinct groups or clusters.
Examining data and assembling data points into clusters based on a measure of distance.

20
Q

Describe the K-Means Clustering algorithm

A

Unsupervised method utilising clustering.

  1. Choose the number of clusters (K) to be used by the algorithm (e.g. via a scree plot)
  2. Randomly plot K cluster centre points as starting positions
  3. Assign each data point to its nearest centroid
  4. Update the position of each centroid to the new centre/average location of its data points
  5. Repeat 3 and 4 until no new assignments occur.
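The steps above can be sketched in pure Python for 2-D points (a toy illustration; real use would rely on a library implementation):

```python
import random

def k_means(points, k, iterations=100, seed=0):
    """Toy K-Means on 2-D points: assign to nearest centroid, then re-average."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)               # step 2: random start positions
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:                            # step 3: nearest centroid
            dists = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        new = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl))
            if cl else centroids[i]                 # keep centroid if cluster empty
            for i, cl in enumerate(clusters)
        ]                                           # step 4: move centroids
        if new == centroids:                        # step 5: stop when stable
            break
        centroids = new
    return centroids
```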
21
Q

Describe Association Rules algorithm?

A

Unsupervised method which uncovers how items are associated with each other, e.g. shopping patterns: a parent buys healthy ingredients for the family; single males buy beer and crisps. Understanding these patterns helps to increase sales.
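A rule's strength is commonly measured by its confidence; a minimal sketch (an illustrative helper, not a full mining algorithm such as Apriori):

```python
def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent: of the transactions
    containing the antecedent, what fraction also contain the consequent?"""
    with_antecedent = [t for t in transactions if antecedent <= t]
    with_both = [t for t in with_antecedent if consequent <= t]
    return len(with_both) / len(with_antecedent)
```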

22
Q

What are the two main types of Supervised Learning?

A

Regression - target (predicted) variable is continuous

Classification - target (predicted) variable consists of categories

23
Q

Describe K-Nearest Neighbours (K-NN) algorithm

A

Supervised Classification method that classifies a data point based on the classifications of its neighbours. K indicates the number of nearest-neighbour data points to include in the majority-voting process.
Choosing K is parameter tuning.
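The voting process can be sketched in pure Python for 2-D points (illustrative, not a library API):

```python
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.
    `train` is a list of ((x, y), label) pairs."""
    nearest = sorted(
        train,
        key=lambda item: (item[0][0] - query[0]) ** 2 + (item[0][1] - query[1]) ** 2,
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```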

24
Q

Describe Linear Regression algorithm

A

Supervised Regression method based on Linear Algebra: the line of best fit.

Used to predict continuous values.
Regression implies that one of the variables is dependent on the other, which makes it different from correlation.
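The line of best fit can be derived with ordinary least squares; a minimal sketch for a single predictor (simple linear regression):

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    return slope, mean_y - slope * mean_x
```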

25
Q

Describe Support Vector Machine (SVM) algorithm

A

Supervised Classification method to derive optimal boundaries separating groups.
Identifies peripheral data points (support vectors) located closest to points from the other group, then draws the optimal boundary down the middle between them.
A fast method: less computation time is required because only the support vectors are used to derive the boundary.
However, it is sensitive to the position of the support vectors.

26
Q

What is Parameter Tuning in context of Machine Learning?

A

Parameters are options used to tweak an algorithm's settings. A model's accuracy suffers when it is not sufficiently tuned.
Overfitting: mistaking random variations for a persistent pattern.
Underfitting: overlooking underlying patterns.

27
Q

Describe Decision Tree algorithm

A

Supervised method that can be used for both Classification and Regression, so sometimes referred to as CART (Classification and Regression Tree).

A decision tree could predict survival chance through a series of binary (Yes/No) questions: start at the top question (the root node) and move through the tree's branches as guided by each response until you reach a leaf node that indicates the survival chance.
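As an illustration, here is a hand-built tree in that style; the questions and thresholds are hypothetical, chosen only to show the structure:

```python
def predict_survival(is_male, age, num_siblings):
    """Walk a toy decision tree: each if/else is a binary question (a node);
    each return is a leaf giving the predicted outcome.
    The features and thresholds here are illustrative, not learned."""
    if is_male:                                  # root node question
        if age > 9.5:                            # branch question
            return "died"
        return "survived" if num_siblings <= 2 else "died"
    return "survived"
```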

28
Q

Why is Evaluation required in Machine Learning?

A

To ensure that any model generated can do the job it has been built for.

  1. Determine that a model built for a particular task is the one most suited to that task
  2. Estimate how the model will perform when deployed
  3. Convince the business for whom model is being developed that it will meet their needs
29
Q

Is there more to evaluation than just measuring model performance?

A

Yes. For a model to be successfully deployed, you must consider issues such as:

how quickly model makes predictions
how easy it is for human analysts to understand predictions
how easy it is to retrain a model should it go stale

30
Q

How should we define ‘best’ performance when evaluating a model?

A

It depends on the context.
No model will ever be 100% perfect, but there is a range of ways in which models can be incorrect. For medical diagnosis we require very accurate results and must not predict a sick patient as healthy.
A model predicting which customers would respond to an online ad may not be held to the same strict criteria.

31
Q

When evaluating models what is the most important rule?

A

Don’t use the same data sample to both train a predictive model and then to evaluate/test it

32
Q

When designing evaluation experiments what is hold-out sampling?

A

A hold-out set is used to evaluate the performance of a model, ensuring it was not used in training. Data is randomly allocated to a train and a test set, and the performance measured on the test set should be a good indicator of how the model will perform on future unseen data. Generally a 70/30 split.
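A minimal pure-Python sketch of a 70/30 hold-out split (a real workflow would typically use a library helper):

```python
import random

def holdout_split(rows, test_fraction=0.3, seed=42):
    """Shuffle the data, then hold out test_fraction of it for evaluation."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]   # (train, test)
```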

33
Q

When designing evaluation experiments what is K-Fold Cross Validation?

A

K-Fold cross-validation is a re-sampling procedure used for limited data samples. The available data is split into K equal-sized folds, and K separate evaluation experiments are performed: the first time, the data in the 1st fold is the test set and the remaining folds are used for training; the second time, the 2nd fold is the test set, and so on.
The performance metrics from each run are aggregated to give a final score.
K=10 has been found to be a good value.
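The fold bookkeeping can be sketched in pure Python (here folds interleave indices round-robin; the important property is that every row appears in exactly one test set):

```python
def k_fold_indices(n, k):
    """Yield (train, test) index lists: each of the k folds is the test set once."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```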

34
Q

What is a Confusion Matrix?

A

A table used to understand performance of a classification model (classifier) on a set of test data for which true values are known. Used as the basis for calculating performance measures.

It records the frequency of each possible model prediction outcome on the test set.
For a prediction problem with a binary target feature there are four possible outcomes: TP, TN, FP, FN.
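Tallying the four outcomes from paired actual/predicted labels is a one-pass count; a minimal sketch:

```python
def confusion_counts(actual, predicted, positive=1):
    """Count the four binary prediction outcomes: TP, TN, FP, FN."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1
        elif a != positive and p != positive:
            tn += 1
        elif p == positive:
            fp += 1          # predicted positive, actually negative
        else:
            fn += 1          # predicted negative, actually positive
    return tp, tn, fp, fn
```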

35
Q

What is a True Positive?

A

Instance in test set that had a positive target feature value, and predicted to have a positive target feature value.

36
Q

What is a True Negative?

A

Instance in test set that had a negative target feature value, and predicted to have a negative target feature value.

37
Q

What is a False Positive?

A

Instance in test set that had a negative target feature value, and predicted to have a positive target feature value.

38
Q

What is a False Negative?

A

Instance in test set that had a positive target feature value, and predicted to have a negative target feature value.

39
Q

What is the order of a Confusion Matrix?

A

                Predicted
                pos    neg
Actual   pos    TP     FN
         neg    FP     TN

FP = Type I error (predicted positive, actually negative)
FN = Type II error (predicted negative, actually positive)
40
Q

In Evaluation what are accuracy and misclassification rates (Error Rate)?

A

Accuracy: overall, how often is the classifier correct?
Misclassification: overall, how often is it wrong? This is also called the Error Rate.

Accuracy = (TP + TN) / Total
Misclassification = 1 - Accuracy
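In code form, working directly from the four counts (a minimal sketch):

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that were correct: (TP + TN) / total."""
    return (tp + tn) / (tp + tn + fp + fn)

def error_rate(tp, tn, fp, fn):
    """Misclassification rate: 1 - accuracy."""
    return 1 - accuracy(tp, tn, fp, fn)
```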

41
Q

In Evaluation what are the problems with relying on the accuracy rate?

A

The accuracy paradox: sometimes accuracy is not a good measure for classifiers in predictive analytics. If there are far more instances of one category than another, say 99%, then predicting that every element is in that category will have an accuracy of 99%.

This is known as class imbalance. If used for fraud prediction where only 1% of transactions are fraudulent, a model that is 99% accurate simply by predicting non-fraud every time is useless.

42
Q

In Evaluation what is the Precision performance measure?

A

A percentage calculated from the confusion matrix that tells us how confident we can be that an instance predicted to have a positive target level actually has one: when the model predicts positive, how often is it correct?

A higher percentage indicates better performance.

Precision = TP / (TP + FP)

43
Q

In Evaluation what is the Recall performance measure?

A

A percentage calculated from the confusion matrix that tells us how confident we can be that all instances with a positive target level have been found by the model: when an instance is actually positive, how often does the model predict positive?

Also called the True Positive Rate (TPR) or sensitivity.

A higher percentage indicates better performance.

Recall = TP / (TP + FN)

44
Q

In Evaluation what is the F1 performance measure?

A

A useful alternative to the simple accuracy rate: Precision and Recall collapsed into a single performance measure, the harmonic mean of the two.
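The three measures from the confusion-matrix counts, as a minimal sketch:

```python
def precision(tp, fp):
    """When the model predicts positive, how often is it correct?"""
    return tp / (tp + fp)

def recall(tp, fn):
    """When an instance is actually positive, how often is it predicted positive?"""
    return tp / (tp + fn)

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)
```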

45
Q

What is a classifier?

A

Another name for a classification model used in Supervised Learning

46
Q

Should you use the terms True Positive/True Negative etc. when there are more than two classes in a classifier?

A

Not recommended, as it can be confusing.

Multi-class (> 2) problems can still have confusion matrices and performance metrics; it is just better not to use the TP/TN labels.

47
Q

In Evaluation what is the True Negative Rate (TNR)?

A

Tells us how confident we can be that all instances with a negative target level have been found by the model: when an instance is actually negative, how often does the model predict negative?

TNR = TN / (TN + FP)

48
Q

What is a ROC curve?

A

Receiver Operating Characteristic Curve - ROC Curve.
Commonly used way to visualise performance of a binary classifier.

The curve is plotted using the prediction measurements of TPR (y axis) and FPR (x axis), with performance plotted for each value of the threshold used to separate the classes.
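Each candidate threshold yields one (FPR, TPR) point; a minimal sketch of generating the points to plot:

```python
def roc_points(actual, scores):
    """One (FPR, TPR) point per candidate threshold, for plotting a ROC curve."""
    pos = sum(1 for a in actual if a == 1)
    neg = len(actual) - pos
    points = []
    for t in sorted(set(scores), reverse=True):   # try each score as a threshold
        predicted = [1 if s >= t else 0 for s in scores]
        tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
        fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
        points.append((fp / neg, tp / pos))
    return points
```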

49
Q

What constitutes a good ROC curve?

A

A curve that hugs the upper left hand corner.

Imagine a diagonal line bisecting the plot. A curve close to this line is poor and no better than random guessing.

50
Q

What is AUC?

A

Area Under the Curve (AUC) is a single-value metric that measures the area under the ROC curve and can be used to compare models. A higher value is better.

51
Q

What is the threshold used in classifier predictions?

A

Different classification models produce a prediction score for each target prediction, and a threshold is used to convert that score into the predicted classification target value. This threshold can be altered to change the results of the prediction.
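A minimal sketch of applying a decision threshold to prediction scores:

```python
def classify(scores, threshold=0.5):
    """Convert raw prediction scores into binary class labels."""
    return [1 if s >= threshold else 0 for s in scores]
```

Raising the threshold predicts fewer positives, which typically trades recall for precision; lowering it does the reverse.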