Lecture 1 Flashcards by Andy Rice

Machine Learning

the field of study that gives computers the ability to learn without being explicitly programmed

Supervised Learning

An algorithm learns from example input-output pairs in the training data and uses what it learned to map a new input to an output.

Unsupervised Learning

Only the input data is known, and no known output data is given to the algorithm

Accuracy

the fraction of inputs for which the right output was predicted
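
As a minimal sketch of the definition above, accuracy can be computed by counting matching predictions. The function name `accuracy` and the example labels are made up for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of inputs for which the predicted output equals the true output."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("accuracy is undefined for an empty data set")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Hypothetical labels: three of the four predictions match.
acc = accuracy(["yes", "no", "yes", "yes"], ["yes", "no", "no", "yes"])
```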

Training Data

Data used to build a machine learning model

Test Data

Data used to assess how well the model works
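
One common way to obtain training and test data is to shuffle the samples and hold a fraction back for testing. The sketch below assumes a 25% test fraction and a fixed random seed; `split_train_test` is a hypothetical helper, not a library function.

```python
import random

def split_train_test(samples, test_fraction=0.25, seed=0):
    """Shuffle the samples and hold out a fraction of them as test data."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    # The first n_test shuffled samples become test data, the rest training data.
    return shuffled[n_test:], shuffled[:n_test]

train, test = split_train_test(range(20), test_fraction=0.25)
```

The model is built only from `train`; `test` is used afterwards to assess how well it works on data it has not seen.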

How is a z-score computed?

Subtract the mean and divide by the standard deviation: z = (x - mean) / standard deviation
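
A small sketch of this computation using only the standard library. It assumes the population standard deviation (`statistics.pstdev`); the card does not say whether the population or the sample version is meant.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Return (x - mean) / standard deviation for every value."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        raise ValueError("z-scores are undefined when all values are equal")
    return [(x - mu) / sigma for x in values]

# Example values with mean 5 and population standard deviation 2.
scores = z_scores([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
```

After the transformation the values have mean 0 and standard deviation 1.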

Reinforcement Learning

Involves reasoning under uncertainty and how agents take actions to maximize their reward

Semi-supervised Learning

Involves a small portion of labeled examples and a large number of unlabeled examples from which a model must learn and make predictions on new examples

Active Learning

A learning algorithm can interactively query a user to label new data points with the desired outputs

Model

An equation that links the values of some features to the predicted value of the target variable

Score functions/Fit statistics/Score metrics

measures of how well the model fits the data

Feature selection

reducing the number of predictors by selecting the important ones (dimensionality reduction)

Feature extraction

reducing the number of predictors by means of a mathematical operation (e.g., PCA)

Model Building

finding the equation of the model and the coefficients in it

What are two typical tasks for Machine Learning?

1. Prediction (supervised learning)

2. To learn something previously unknown (unsupervised learning)

What are the two main types of Supervised Learning?

Classification and Regression

Classification

A discrete output such as color, gender, yes/no, class membership

question example: “Will you pass this course?”

Regression

A continuous output like temperature, age, distance, salary

question example: “How many points will you get in the exam?”

Preprocessing

Cleaning and/or transforming the data

When do machine learning algorithms not perform well?

When the input numerical attributes have a very different scale

Standard Scaler

Rescales each feature to z-scores (standard scores), so that its mean is 0 and its standard deviation is 1

What type of data does the Standard Scaler work well with?

Non-skewed (roughly symmetric) data without extreme outliers, because the mean and standard deviation it relies on are sensitive to outliers

Robust Scaler

Subtracts the median and divides by the interquartile range, so that the median is 0 and the interquartile range is 1

What type of data does the Robust Scaler work well with?

It's better for skewed data because it deals better with outliers
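
The Robust Scaler cards above can be illustrated with a short sketch. It assumes quartiles are computed with linear interpolation (`method="inclusive"` in the standard library); `robust_scale` and the example data are made up for illustration.

```python
from statistics import median, quantiles

def robust_scale(values):
    """Subtract the median and divide by the interquartile range (IQR)."""
    # Assumption: quartiles via linear interpolation between data points.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("robust scaling is undefined when the IQR is 0")
    med = median(values)
    return [(x - med) / iqr for x in values]

data = [1, 2, 3, 4, 5, 6, 7, 8, 100]  # 100 is a deliberate outlier
scaled = robust_scale(data)
```

The outlier changes neither the median (5) nor the IQR (4) here, so the other eight values stay within -1 and 1 while only the outlier itself lands far away.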

MinMax Scaler

Rescales the data to the interval [0, 1] using the feature's minimum (Xmin) and maximum (Xmax). Formula: Xnew = (X - Xmin) / (Xmax - Xmin)
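
The formula above, applied to a single feature, might look like this. `min_max_scale` is a hypothetical helper and the input values are arbitrary.

```python
def min_max_scale(values):
    """Apply Xnew = (x - Xmin) / (Xmax - Xmin) to every value of one feature."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        raise ValueError("min-max scaling is undefined when all values are equal")
    return [(x - x_min) / (x_max - x_min) for x in values]

scaled = min_max_scale([10, 20, 25, 50])
```

The smallest value maps to 0 and the largest to 1, with everything else in between.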

What happens when you log scale your data?

Large values are compressed and right-skewed data becomes more symmetric, which often (but not always) improves prediction accuracy
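
A minimal sketch of log scaling, assuming base 10 and strictly positive values. Whether accuracy actually improves depends on the data and the model; the example only shows how the transformation compresses large values. The `incomes` list is made up.

```python
import math

def log_scale(values):
    """Apply log10 to each value; all values must be strictly positive."""
    if any(x <= 0 for x in values):
        raise ValueError("log scaling needs strictly positive values")
    return [math.log10(x) for x in values]

incomes = [1_000, 10_000, 100_000, 1_000_000]  # made-up, heavily skewed values
logged = log_scale(incomes)
```

The original values span three orders of magnitude, while the transformed values are evenly spaced one unit apart.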

Normalizer

Each row of the data is rescaled so that its norm becomes 1. It works per sample (row), not per feature (column), and is used only when the direction of the data matters rather than its magnitude
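
A sketch of row-wise normalization, assuming the Euclidean (L2) norm; the card does not say which norm is meant. `normalize_rows` is a hypothetical helper, and leaving all-zero rows unchanged is a choice made here.

```python
import math

def normalize_rows(rows):
    """Rescale every row (sample) so that its Euclidean norm becomes 1."""
    result = []
    for row in rows:
        norm = math.sqrt(sum(x * x for x in row))
        # A row of zeros has no direction, so it is left unchanged.
        result.append([x / norm for x in row] if norm > 0 else list(row))
    return result

unit = normalize_rows([[3.0, 4.0], [30.0, 40.0], [0.0, 0.0]])
```

The first two rows point in the same direction and therefore end up identical after normalization, even though their magnitudes differ by a factor of 10.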

What type of graph is a normalizer helpful for?

histograms

Binning

Separating the feature values into n categories. You can replace all the values within each category with a single value like their mean
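
The description above can be sketched as follows, assuming equal-width bins (other schemes, such as equal-frequency bins, are also possible). `bin_and_smooth` and the `ages` list are made up for illustration.

```python
from statistics import mean

def bin_and_smooth(values, n_bins):
    """Assign each value to one of n_bins equal-width bins, then replace it
    with the mean of the values in its bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        raise ValueError("binning needs at least two distinct values")
    # The maximum value would fall just past the last bin, so clamp it.
    indices = [min(int((x - lo) / width), n_bins - 1) for x in values]
    bin_means = {
        b: mean(v for v, i in zip(values, indices) if i == b)
        for b in set(indices)
    }
    return indices, [bin_means[i] for i in indices]

ages = [18, 22, 25, 31, 38, 44, 52, 60]
bins, smoothed = bin_and_smooth(ages, 3)
```

Here `bins` holds the category index of each value and `smoothed` holds the per-bin mean that replaces it.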

What is Binning effective for?

Models with few parameters, such as linear regression models

What is Binning not effective for?

Models with many parameters like decision trees

Cross-validation

Evaluates the model's ability to predict new data; detects overfitting or selection bias
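
The card does not name a specific scheme; k-fold cross-validation is one common variant. The sketch below only generates the index splits, assuming unshuffled, contiguous folds. `k_fold_indices` is a hypothetical helper.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

folds = list(k_fold_indices(10, 3))
```

Each sample is used for testing exactly once. A model is trained on `train_idx` and scored on `test_idx` in every round, and a training score that is much better than the test scores is a sign of overfitting.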

Feature

properties that describe data points

Sample/Instance

a data point; each entity or row in the data

Pipeline

The end-to-end construct that orchestrates the flow of data into, and output from, a machine learning model

Clustering

A type of unsupervised learning where the algorithm finds natural groups or clusters in data

Feature Vector

A vector listing all the feature values

Feature Value

The value of a property or feature of the data point/instance, e.g. white, 66, yes

Features

An individual measurable property or characteristic of a data point, e.g. color, age, is rich

True or false: Classification problems can be used to predict only two discrete-valued outputs, such as 0 and 1.

False | Classification can be used for an arbitrary number of classes, not just 2.

Which scaling method results in a range between 0 and 1?

Min-Max scaler

The standard scaler uses z-scores. How do you compute z-scores?

Subtract the mean and divide by the standard deviation

(43 cards)