Lecture 5 - Machine Learning Flashcards
What is machine learning?
Making decisions or predictions from patterns observed in data gives “computers the ability to
learn without being explicitly programmed.”
[Arthur Samuel, 1959]
People will often talk about the generated code as…
being a “model”, a “classifier”, or a “regressor”.
Machine learning is…
Machine learning is applying an algorithm to data to produce a model that makes predictions.
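As a rough illustration of "algorithm + data -> model -> predictions", here is a minimal sketch in plain Python. The choice of ordinary least squares as the algorithm, the function name fit_line, and the toy data are all assumptions made for this example; the lecture does not name a specific algorithm.

```python
def fit_line(xs, ys):
    """The 'algorithm': ordinary least squares for a single input feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x

    def model(x):
        """The 'model': the learned line, used to predict y for a new x."""
        return slope * x + intercept

    return model


# Hypothetical training data that happens to follow y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

model = fit_line(xs, ys)   # apply the algorithm to the data
prediction = model(5.0)    # use the resulting model on an unseen input
```

The point is the shape of the workflow rather than the particular algorithm: the data goes in, a reusable prediction function comes out.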
What is supervised learning?
Where we know the answer we want for a large number of samples and want to learn to predict the answer for new ones.
What is unsupervised learning?
Where we don’t have any particular answers in mind (much more useful for exploratory analysis)
What is the attribute we want to learn or predict?
This is called the target variable or often just y.
What are the other variables called in a machine learning algorithm?
All the other columns are often called X.
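A small sketch of how a table might be split into X and y. The column names and values below are made up for illustration (loosely based on the house-price example in this deck), and rows are stored as plain dictionaries to keep the example dependency-free.

```python
# Hypothetical labelled data: each row is one sample.
rows = [
    {"area_m2": 50, "bedrooms": 1, "price": 150_000},
    {"area_m2": 80, "bedrooms": 2, "price": 240_000},
    {"area_m2": 120, "bedrooms": 3, "price": 360_000},
]

target = "price"  # the attribute we want to learn to predict

# y: the target variable, one value per sample.
y = [row[target] for row in rows]

# X: all the other columns, one list of feature values per sample.
X = [[value for name, value in row.items() if name != target] for row in rows]
```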
Data with an included target are called?
“labelled data” (sometimes also “ground truth”)
Data without an included target are called?
“unlabelled data”, which are mostly useless for supervised learning
In supervised learning we’ll use the labelled training data to?
learn from
We define a “cost function” which
helps us define how “wrong” an answer might be in order to decide how much to “correct” later predictions
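One way to make "how wrong" concrete is mean squared error, sketched below. The lecture does not say which cost function it has in mind, so treat MSE as an assumed, commonly used example; the numbers are invented.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [3.0, 5.0, 7.0]            # known answers from labelled data
close_guesses = [3.0, 5.0, 8.0]     # off by 1 on one sample
poor_guesses = [0.0, 9.0, 2.0]      # off by 3, 4 and 5

cost_close = mean_squared_error(y_true, close_guesses)
cost_poor = mean_squared_error(y_true, poor_guesses)
```

The worse set of predictions gets the larger cost, which is the signal a learning algorithm uses to decide how strongly to adjust.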
Examples of supervised machine learning models
Given an email as input, classify it as spam or not-spam
Given data about a house, predict its sale price
Given a date, predict that day’s rainfall
Given a website visitor, predict if they’re likely to sign up
Given a network traffic stream, predict if it’s normal or malicious
Two major types of problems we solve with ML?
Classification - Given two (or more) classes, which class does each sample belong to?
Regression - Given an input, predict a continuous output variable (e.g. temperature, house price).
A “decision boundary”…
Threshold value or tipping point above which we will classify values into class 1 and below which we classify values into class 2
A line as a decision boundary in
2-dimensions
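The two decision-boundary cards can be sketched in a few lines of Python. The function names, the threshold, and the line x1 + x2 - 1 = 0 are hypothetical choices for illustration; the class numbering (1 above, 2 below) follows the card.

```python
def classify_by_threshold(value, threshold):
    """One dimension: the boundary is a single tipping point."""
    return 1 if value > threshold else 2


def classify_by_line(point, weights, bias):
    """Two dimensions: the boundary is the line w1*x1 + w2*x2 + bias = 0."""
    x1, x2 = point
    score = weights[0] * x1 + weights[1] * x2 + bias
    return 1 if score > 0 else 2


# Threshold example with an assumed tipping point of 0.5.
above = classify_by_threshold(0.8, threshold=0.5)
below = classify_by_threshold(0.2, threshold=0.5)

# Line example using the assumed boundary x1 + x2 - 1 = 0.
one_side = classify_by_line((1.0, 1.0), weights=(1.0, 1.0), bias=-1.0)
other_side = classify_by_line((0.0, 0.0), weights=(1.0, 1.0), bias=-1.0)
```

In both cases the class depends only on which side of the boundary the sample falls.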