Lecture 1 Flashcards
Machine Learning
the field of study that gives computers the ability to learn without being explicitly programmed
Supervised Learning
An algorithm learns to map inputs to outputs from example input-output pairs in the training data, and then predicts the output for new inputs.
Unsupervised Learning
Only the input data is known; no output data (labels) is given to the algorithm.
Accuracy
the fraction of inputs for which the right output was predicted
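A minimal sketch in plain Python (no ML library assumed): accuracy compares predicted and true labels position by position. The function name `accuracy` and the example labels are made up for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# 3 of the 4 predictions are right
print(accuracy(["yes", "no", "yes", "no"], ["yes", "no", "no", "no"]))  # 0.75
```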
Training Data
Data used to build a machine learning model
Test Data
Data used to assess how well the model works
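A small sketch of how one dataset can be split into training data and test data. The helper `split_train_test`, the 25% test fraction and the fixed seed are assumptions for illustration; it is written from scratch and is not scikit-learn's API.

```python
import random


def split_train_test(samples, test_fraction=0.25, seed=0):
    """Shuffle a copy of the samples and split it into (train, test)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split reproducible
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


train, test = split_train_test(range(20))
print(len(train), len(test))  # 15 5
```

The model is built only on `train`; `test` is held back so the evaluation uses data the model has never seen.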
How is a z-score computed?
Subtracting the mean and dividing by the standard deviation
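The same recipe as a formula, using the population standard deviation (dividing by n), with a small made-up worked example:

```latex
z = \frac{x - \mu}{\sigma}, \qquad
\mu = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2}

% Example: x = (2, 4, 4, 4, 5, 5, 7, 9)
% \mu = 40 / 8 = 5, \quad \sigma = \sqrt{32 / 8} = 2
% z(9) = (9 - 5) / 2 = 2
```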
Reinforcement Learning
Involves reasoning under uncertainty: an agent learns which actions to take in an environment in order to maximize its (cumulative) reward
Semi-supervised Learning
Uses a small number of labeled examples and a large number of unlabeled examples, from which a model must learn to make predictions on new examples
Active Learning
A learning algorithm can interactively query a user to label new data points with the desired outputs
Model
An equation that links the values of some features to the predicted value of the target variable
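As one concrete instance, assuming a linear model (the card itself does not restrict the model to this form), the equation links p features x_1, ..., x_p to the predicted target through coefficients w_0, ..., w_p; finding these coefficients is what Model Building refers to:

```latex
\hat{y} = w_0 + w_1 x_1 + w_2 x_2 + \dots + w_p x_p
```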
Score functions/Fit statistics/Score metrics
measures of how well the model fits the data
Feature selection
reducing the number of predictors by selecting the important ones (dimensionality reduction)
Feature extraction
reducing the number of predictors by means of a mathematical operation (e.g., PCA)
Model Building
finding the equation of the model and the coefficients in it
What are two typical tasks for Machine Learning?
1. Prediction (supervised learning)
2. Learning something previously unknown (unsupervised learning)
What are the two main types of Supervised Learning?
Classification and Regression
Classification
A discrete output such as color, gender, yes/no, class membership
question example: “Will you pass this course?”
Regression
A continuous output like temperature, age, distance, salary
question example: “How many points will you get in the exam?”
Preprocessing
Cleaning and/or transforming the data
When do machine learning algorithms not perform well?
When the numerical input attributes have very different scales
Standard Scaler
Converts each feature to z-scores (standard scores), so that its mean is 0 and its standard deviation is 1
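A plain-Python sketch of the idea, applied column by column (one column per feature). It only mirrors what a standard scaler does; the function name `standard_scale` and the sample data are hypothetical, and mapping a constant column to 0 is our own choice to avoid dividing by zero.

```python
from statistics import fmean, pstdev


def standard_scale(rows):
    """Standardize each column (feature) to mean 0 and standard deviation 1."""
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    stds = [pstdev(col) for col in columns]  # population std, divides by n
    return [
        # constant column (std 0): map to 0.0 instead of dividing by zero
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)]
        for row in rows
    ]


data = [[2, 10], [4, 20], [6, 30]]
print(standard_scale(data))
```

Both columns end up on the same scale even though the second one was ten times larger.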
What type of data does the Standard Scaler work well with?
Roughly normally distributed (non-skewed) data without strong outliers, since the mean and standard deviation are sensitive to outliers
Robust Scaler
Subtracts the median and divides by the interquartile range (IQR), so the median becomes 0 and the IQR becomes 1
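A minimal sketch for a single feature using only the standard library. The function name `robust_scale` is hypothetical, and computing the quartiles with `statistics.quantiles(..., method="inclusive")` (linear interpolation) is one of several possible quartile conventions.

```python
from statistics import median, quantiles


def robust_scale(values):
    """Center one feature on its median and scale it by its interquartile range."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    med = median(values)
    # constant-ish feature (IQR 0): map to 0.0 instead of dividing by zero
    return [(x - med) / iqr if iqr > 0 else 0.0 for x in values]


# The median (3) and IQR (2) do not depend on how extreme the outlier 100 is
print(robust_scale([1, 2, 3, 4, 100]))  # [-1.0, -0.5, 0.0, 0.5, 48.5]
```

The four ordinary values keep a sensible spread around 0, while the outlier simply stays far away instead of squeezing everything else together.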
What type of data does the Robust Scaler work well with?
Skewed data or data with outliers, because the median and interquartile range are much less affected by outliers than the mean and standard deviation
MinMax Scaler
Rescales each feature to the interval [0, 1] using the feature’s minimum (Xmin) and maximum (Xmax).
Formula:
Xnew = (x - Xmin) / (Xmax - Xmin)
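The formula translated directly into Python for one feature; the function name `min_max_scale` and the sample values are made up, and returning zeros for a constant feature is our own guard against division by zero.

```python
def min_max_scale(values):
    """Rescale one feature to [0, 1] with x_new = (x - x_min) / (x_max - x_min)."""
    x_min, x_max = min(values), max(values)
    span = x_max - x_min
    if span == 0:  # constant feature: avoid division by zero
        return [0.0 for _ in values]
    return [(x - x_min) / span for x in values]


print(min_max_scale([10, 15, 20, 30]))  # [0.0, 0.25, 0.5, 1.0]
```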
What happens when you log scale your data?
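Large values are compressed and right-skewed distributions become more symmetric, which often (not always) improves prediction accuracy, e.g., for linear models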
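A small sketch of log scaling on made-up, right-skewed values. Using `math.log1p` (log(1 + x)) instead of a plain log is an assumption here: it keeps zeros defined but requires non-negative inputs.

```python
import math


def log_scale(values):
    """Apply log(1 + x) to non-negative values; large values shrink far more than small ones."""
    return [math.log1p(x) for x in values]


incomes = [20_000, 35_000, 50_000, 1_000_000]  # hypothetical, one very large value
logged = log_scale(incomes)
print(max(incomes) / min(incomes))  # 50.0
print(max(logged) / min(logged))    # much smaller ratio after log scaling
```

The order of the values is preserved, but the extreme value no longer dominates the range.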
Normalizer
Each row (sample) of the data is rescaled so that its norm (e.g., the Euclidean/L2 norm) becomes 1. It works per sample, not per feature (column), and is only used when the direction of the data matters, not its magnitude
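A plain-Python sketch assuming the L2 norm; the function name `normalize_rows` is hypothetical, and leaving an all-zero row unchanged is our own choice since it has no direction.

```python
import math


def normalize_rows(rows):
    """Rescale each row (sample) to unit Euclidean (L2) norm; columns are not scaled separately."""
    result = []
    for row in rows:
        norm = math.sqrt(sum(x * x for x in row))
        result.append([x / norm for x in row] if norm > 0 else list(row))
    return result


print(normalize_rows([[3, 4], [1, 0], [6, 8]]))  # [[0.6, 0.8], [1.0, 0.0], [0.6, 0.8]]
```

Rows [3, 4] and [6, 8] point in the same direction, so they become identical: only the direction survives, the magnitude is discarded.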
What type of graph is a normalizer helpful for?
Histograms
Binning
Separating the feature values into n categories. You can replace all the values within each category with a single value like their mean
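A minimal sketch of binning one feature. It assumes equal-width bins (equal-frequency bins are another common choice) and replaces each value with the mean of its bin; the function name `bin_by_mean` and the data are hypothetical.

```python
def bin_by_mean(values, n_bins):
    """Equal-width binning: replace each value with the mean of the values in its bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # constant feature: everything falls in bin 0
    # bin index in 0..n_bins-1 (the maximum value goes into the last bin)
    idx = [min(int((x - lo) / width), n_bins - 1) for x in values]
    bins = {}
    for i, x in zip(idx, values):
        bins.setdefault(i, []).append(x)
    means = {i: sum(xs) / len(xs) for i, xs in bins.items()}
    return [means[i] for i in idx]


print(bin_by_mean([1, 2, 3, 7, 8, 9], n_bins=2))  # [2.0, 2.0, 2.0, 8.0, 8.0, 8.0]
```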
What is Binning effective for?
Models with few parameters, such as linear regression models
What is Binning not effective for?
Models with many parameters like decision trees
Cross-validation
Evaluates the model’s ability to predict new data by repeatedly training on part of the data and testing on the held-out rest; helps detect overfitting or selection bias
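A sketch of k-fold cross-validation under simplifying assumptions: contiguous folds without shuffling, a trivial "predict the training mean" regressor standing in for a real model, and mean absolute error as the score. All names and data are hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous, nearly equal folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def cross_val_mae(y, k):
    """Score a 'predict the training mean' regressor with k-fold cross-validation."""
    scores = []
    for train, test in k_fold_indices(len(y), k):
        prediction = sum(y[i] for i in train) / len(train)  # "fit" on the training folds only
        errors = [abs(y[i] - prediction) for i in test]      # evaluate on the held-out fold
        scores.append(sum(errors) / len(errors))
    return scores


print(cross_val_mae([3.0, 5.0, 4.0, 6.0, 5.0, 7.0], k=3))  # [1.5, 1.0, 1.5]
```

Every sample is used for testing exactly once; a large gap between training error and these held-out scores would hint at overfitting.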
Feature
properties that describe data points
Sample/Instance
a data point; each entity or row in the data
Pipeline
The end-to-end construct that orchestrates the flow of data into, and output from, a machine learning model
Clustering
A type of unsupervised learning where the algorithm finds natural groups or clusters in data
Feature Vector
A vector listing all the feature values of a data point
Feature Value
The value of a property or feature of the data point/instance, e.g. white, 66, yes
Features
An individual measurable property or characteristic of a data point, e.g. color, age, is rich
True or false: Classification can only be used to predict two discrete-valued outputs, such as 0 and 1.
False
Classification can be used for an arbitrary number of classes, not just 2.
Which scaling method results in a range between 0 and 1?
MinMax Scaler
The standard scaler uses z-scores. How do you compute z-scores?
Subtracting the mean and dividing by the standard deviation