Machine Learning Flashcards

Question 1

Q

Nominal Data (1/4 types of data)

Answer

A

Data whose categories are mutually exclusive but have no order (e.g., eye color, sex, type of car, zip codes).

Question 2

Q

Ordinal Data (1/4 types of data)

Answer

A

Corresponds to categories where order matters but the difference between values does not. E.g., letter grades, movie ratings, pain level, cold-warm-hot of a coffee cup.

Question 3

Q

BNN

Answer

A

Biological Neural Network

Question 4

Q

ANN

Answer

A

Artificial Neural Network

Question 5

Q

Typical Neural Network

Answer

A

[input pattern] → Input Layer → Hidden Layers → Output Layer → [Output pattern]

Input pattern is presented to the input layer. Then the output pattern is returned from the output layer. What happens between the input and output layers is a black box.

Question 6

Q

Sigmoid Activation Function

Answer

A

An S-shaped curve whose output ranges from 0 to 1

Question 7

Q

Hyperbolic Tangent Activation Function

Answer

A

An S-shaped curve whose output ranges from -1 to 1
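
Both this curve and the sigmoid from the previous card can be checked numerically. The following is a minimal sketch using only Python's standard library; the function name `sigmoid` and the sample inputs are chosen here for illustration and are not part of the original cards.

```python
import math

def sigmoid(x):
    """Logistic sigmoid: squashes any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

for x in (-5.0, 0.0, 5.0):
    # math.tanh squashes into (-1, 1); it is a rescaled sigmoid: 2 * sigmoid(2x) - 1.
    print(f"x={x:+.1f}  sigmoid={sigmoid(x):.4f}  tanh={math.tanh(x):.4f}")
```

Note that this simple form of `sigmoid` can overflow for very large negative inputs; it is only meant to show the shape of the two curves.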

Question 8

Q

Ways to Normalize Nominal Values

Answer

A

1. One-of-n Normalization
2. Equilateral Normalization

Question 9

Q

One-of-n Normalization (aka one-hot encoding)

Answer

A

One way of normalizing nominal observations. You have one output neuron for each class; the neuron for the observed class is set high and all the others low.

The other way to normalize Nominal Observations is Equilateral Encoding
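
As a small illustration of one-of-n encoding, here is a sketch in plain Python. The helper name `one_hot` and the color classes are made up for this example.

```python
def one_hot(label, classes):
    """Return a list with 1.0 at the position of `label` and 0.0 elsewhere."""
    return [1.0 if c == label else 0.0 for c in classes]

classes = ["red", "green", "blue"]  # hypothetical nominal classes
print(one_hot("green", classes))    # [0.0, 1.0, 0.0]
```

Each position in the list corresponds to one output neuron, so three classes need three neurons.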

Question 10

Q

Equilateral Encoding

How it works
Neurons needed

Answer

A

A way of normalizing Nominal Observations.

A vector of floating-point numbers is created for each class, placed at a uniform (equilateral) distance from every other class's vector. This lets all output neurons play a part in encoding each class, so an error affects more neurons than it does with one-of-n encoding (the other way to normalize nominal observations).

Requires one fewer output neuron than one-of-n normalization.
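
One way to produce such vectors is to place the classes on the corners of a regular simplex, which needs n - 1 coordinates for n classes. The sketch below is an illustration under that assumption, not necessarily the exact procedure the card's source uses; the function name `equilateral` is hypothetical, and the output lies in the range -1 to 1 (rescaling to a particular activation range is not shown).

```python
import math

def equilateral(n):
    """Return n vectors of length n - 1 that are all the same distance apart."""
    if n < 2:
        raise ValueError("need at least two classes")
    # Start with two classes on a line, at -1 and +1.
    points = [[-1.0] + [0.0] * (n - 2), [1.0] + [0.0] * (n - 2)]
    for k in range(2, n):
        # Shrink the existing points so they stay on the unit sphere ...
        scale = math.sqrt(k * k - 1) / k
        for p in points:
            for j in range(k - 1):
                p[j] *= scale
            # ... and shift them slightly backwards along the new axis.
            p[k - 1] = -1.0 / k
        # The new class sits at the tip of the new axis.
        new_point = [0.0] * (n - 1)
        new_point[k - 1] = 1.0
        points.append(new_point)
    return points

for vector in equilateral(4):
    print([round(v, 3) for v in vector])
```

With four classes this yields four vectors of three numbers each, compared with four neurons per class for one-of-n encoding.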

Question 11

Q

Row of a dataset (3)

Answer

A

An Entity
An observation
Instance

Question 12

Q

Group of input variables

Answer

A

Input Vector

Question 13

Q

Columns of a dataset (2)

Answer

A

1. Features
2. Attributes of the observation

Question 14

Q

Models vs Algorithms

Answer

A

Model = Algorithm(Data)

Question 15

Q

Field of machine learning that focuses on making predictions

Answer

A

Predictive Modeling - Learning a target function “f” that best maps input variables “X” to an output variable “Y”. There is also an irreducible error “e”:

Y=f(X) + e

We are trying to learn the shape of “f”. Different machine learning algorithms make different assumptions on the shape of “f”. This is why we must try different ML algorithms

Question 16

Q

Parametric ML Algorithms

Answer

A

Parametric algorithms make assumptions about the shape of “f” in Y=f(X) + e

Examples (linear ML algorithms):
Logistic Regression
Linear Discriminant Analysis
Perceptron

Advantages: parametric algorithms are simpler, faster, and require less data to train. Disadvantages: they are constrained to the assumed form, have limited complexity, and fit poorly when the true shape of “f” does not match the assumption.

Question 17

Q

Non-Parametric ML Algorithms

Answer

A

Do not make assumptions about the shape of the target function.

They are good when you have lots of data and don’t want to worry about choosing all the right features

Examples:
Decision Trees, k-Nearest Neighbors, Support Vector Machines (Naive Bayes and simple neural networks are more often classed as parametric)

Question 18

Q

(Dis)Advantages of Non-Parametric ML Algorithms

Answer

A

Advantages

Flexibility - may fit a large number of target functions
Power - no assumptions about the target function
Performance - can give higher prediction performance

Disadvantages:

More data needed
Slower to train
Overfitting - more likely to overfit the training data

Question 19

Q

4 common types of Data Modeling problems

Answer

A

Data Classification
Regression Analysis
Clustering
Time Series

Question 20

Q

Data Classification

Answer

A

Try to determine the class the data falls into, using supervised learning. A class is usually a non-numerical data attribute.

Question 21

Q

Regression Analysis

Answer

A

A predictive modeling technique that investigates the relationship between a dependent (target) variable and one or more independent variables. A regression problem is one where the output variable is a real value, such as “dollars” or “weight.”

Question 22

Q

Clustering

Answer

A

Clustering algorithms take input data and place it into clusters. The programmer usually specifies the number of clusters to be created before training the algorithm. Because there is no expected output, clustering is considered unsupervised training. If the number of clusters changes, the clustering machine learning method will need to be retrained.

Question 23

Q

Temporal Algorithm

Answer

A

An algorithm that accepts input values that vary over time. These algorithms often use a sliding input window and a prediction window.
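
The sliding input window and prediction window can be sketched as follows. The function name `make_windows`, its parameters, and the sample series are assumptions made for this illustration.

```python
def make_windows(series, input_size, predict_size):
    """Slide over `series`, pairing each input window with the values that follow it."""
    pairs = []
    last_start = len(series) - input_size - predict_size
    for start in range(last_start + 1):
        split = start + input_size
        pairs.append((series[start:split], series[split:split + predict_size]))
    return pairs

series = [10, 11, 13, 16, 20, 25]  # made-up readings ordered in time
for inputs, targets in make_windows(series, input_size=3, predict_size=1):
    print(inputs, "->", targets)
# [10, 11, 13] -> [16]
# [11, 13, 16] -> [20]
# [13, 16, 20] -> [25]
```

Each pair can then be treated as one training row: the input window is the input vector and the prediction window is the expected output.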

Question 24

Q

Deterministic Training vs Stochastic Training

Answer

A

Deterministic Training Algorithms always perform the exact same way given the same initial state. No random numbers are used.

Stochastic training uses random numbers to train, so the algorithm trains differently each time

Question 25

Q

Interval Data (1/4 types of data)

Answer

A

Data where the difference between two values is meaningful but the value of zero is arbitrary. E.g., temperature (in °F or °C), calendar year

Question 26

Q

Ratio Data (1/4 types of data)

Answer

A

It has properties of interval data but a clear concept of zero.

eg. Age, speed, length, width, volume, mass

Question 27

Q

Supervised Learning - Definition + Types of Problems solved

Answer

A

Your training data has the input and output variables and you are using an algorithm to learn the mapping function f

Y=f(X)

Problems solved: 1) Regression 2) Classification

Ex: Linear Regression, Random Forest, SVM

Question 28

Q

Unsupervised Learning - Definition and Types of Problems solved

Answer

A

You have input data X and no corresponding output variables. The goal is to model the underlying structure of the data in order to learn more about it. Problems solved: 1. Clustering (grouping of data) 2. Association (rules which describe portions of your data)

Algorithms: k-means for clustering; the Apriori algorithm for association rule learning

Question 29

Q

Semi-Supervised Learning - Definition and Types of Problems solved

Answer

A

Some data is labeled but most is unlabeled, so a mixture of supervised and unsupervised techniques is used

Question 30

Q

Types of ML Error (3) - Definition

Answer

A

Bias Error - Error from the simplifying assumptions an algorithm makes so that the target function is easier to learn
Variance Error - Sensitivity of the model to changes in the training data
Irreducible Error - Error that no algorithm can remove, e.g. from unknown variables influencing the mapping of input to output

Question 31

Q

Power calculations

Answer

A

Help determine the amount of data required for training, given the expected accuracy/reliability

Question 32

Q

Reinforcement Learning

Answer

A

A computer program interacts with a dynamic environment in which it must perform certain tasks, learning through trial and error as it seeks to achieve its goal

Question 33

Q

Linear and Polynomial regression

Answer

A

Regression is concerned with modeling the relationship between numerical variables; the model is iteratively refined using a measure of the error in its predictions. The basic assumption is that the output variable (a numeric value) can be expressed as a combination (weighted sum) of numeric input variables. Polynomial regression keeps the weighted sum but adds powers of the inputs as extra terms.
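
The weighted-sum assumption can be written out directly. This is a minimal sketch with hypothetical, already-chosen coefficients (no fitting is shown); the helper names `predict` and `polynomial_features` are made up for this example. In the polynomial case the model is still a weighted sum, but the inputs are powers of x.

```python
def predict(weights, bias, inputs):
    """Output = bias + weighted sum of the numeric inputs."""
    return bias + sum(w * x for w, x in zip(weights, inputs))

def polynomial_features(x, degree):
    """Expand a single input x into [x, x**2, ..., x**degree]."""
    return [x ** d for d in range(1, degree + 1)]

# Linear: y = 1 + 2x, evaluated at x = 3
print(predict([2.0], 1.0, [3.0]))                               # 7.0

# Polynomial (degree 2): y = 1 + 2x + 0.5x^2, evaluated at x = 3
print(predict([2.0, 0.5], 1.0, polynomial_features(3.0, 2)))    # 11.5
```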

Question 34

Q

Neural Networks - 1) Definition 2) Types of Problems

Answer

A

A large number of highly interconnected processing elements work in unison to solve specific problems, usually classification or pattern-matching problems. Each neuron ‘votes’ on the decision outcome, which might trigger other neurons to vote, and the votes are tallied, creating a ranking of the outcomes according to the support each has received.

Question 35

Q

Decision Trees - 1) Definition 2) Types of Problems

Answer

A

Tree-like flowcharts that use branching to illustrate every possible outcome of a decision. Most decision trees use binary branching (two options) based on actual values or attributes of the data.

Types of Problems: 1. Classification 2. Regression

Question 36

Q

Overfitting - 1. Definition 2. Solution

Answer

A

The ML model learns the detail and the noise in the training data too well, at the expense of generalizing to new data.

If we train too long, the error rate on the training data keeps dropping but the error rate on test data goes up!

Solution: Resampling methods (k-fold cross validation) and held-back validation (hold some data back until the very end, if you have enough)

Question 37

Q

Underfitting 1. Definition 2. Solution

Answer

A

Definition: Failing to learn the problem from the training data sufficiently.

Solution: Try different ML algorithms
Advice: You want to be in the middle, between overfitting and underfitting

Question 38

Q

Generalization

Answer

A

How well the concepts learned by the model apply to specific examples not seen by the model when it was learning

Question 39

Q

Goodness of Fit

Answer

A

Measures used in statistics to estimate how well the approximation of the function matches the target function

Question 40

Q

K-fold cross validation

Answer

A

A cross validation technique used to evaluate a model on unseen data

1. Shuffle the dataset randomly.
2. Split the dataset into k groups
3. For each unique group:
3a. Take the group as a hold out or test data set
3b. Take the remaining groups as a training data set
3c. Fit a model on the training set and evaluate it on the test set
3d. Retain the evaluation score and discard the model
4. Summarize the skill of the model using the sample of model evaluation scores
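
The procedure above can be sketched in plain Python. Everything named here is an assumption for illustration: `k_fold_mean_score`, the `fit` and `evaluate` callables, the toy "model" that just predicts the mean output, and the made-up data. The summary used is the mean score; other summaries are possible.

```python
import random

def k_fold_mean_score(data, k, fit, evaluate, seed=0):
    rows = list(data)
    random.Random(seed).shuffle(rows)            # shuffle the dataset
    folds = [rows[i::k] for i in range(k)]       # split it into k groups
    scores = []
    for i, test in enumerate(folds):             # each group is the test set once
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        model = fit(train)                       # fit on the remaining groups
        scores.append(evaluate(model, test))     # keep the score, drop the model
    return sum(scores) / len(scores)             # summarize with the mean score

data = [(x, 2.0 * x) for x in range(10)]         # made-up (input, output) pairs

def fit_mean(train):
    """Toy 'model': always predict the mean training output."""
    return sum(y for _, y in train) / len(train)

def mean_abs_error(model, test):
    return sum(abs(y - model) for _, y in test) / len(test)

print(k_fold_mean_score(data, k=5, fit=fit_mean, evaluate=mean_abs_error))
```

Fixing `seed` makes the shuffle repeatable, which is convenient when comparing algorithms on the same folds.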

Question 41

Q

Cross Validation

Answer

A

Cross-validation is a RESAMPLING PROCEDURE used to EVALUATE ML models on a limited data sample. It is primarily used in applied machine learning to ESTIMATE the SKILL of a machine learning model on UNSEEN DATA.

Question 42

Q

Gradient Descent - Definition + Types(2)

Answer

A

An OPTIMIZATION algorithm which can be used with many ML problems. It is used to find the values of the parameters (coefficients) of a function (f) that minimize a cost function. Best used when the parameters cannot be calculated analytically (e.g., using linear algebra).

Types: Batch and Stochastic

Question 43

Q

Gradient Descent Steps

Answer

A

1. Choose random coefficients or set them to zero
2. Compute cost: cost = evaluate(f(coefficient))
3. Find derivative of cost: delta = derivative(cost)
4. Update coefficient: coefficient = coefficient - (learning_rate * delta)
5. Go back to step 2 for a new iteration; repeat until the cost stops improving
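
A minimal sketch of these steps, assuming a one-coefficient model y = coefficient * x and mean squared error as the cost function (neither is specified by the card). The function name, learning rate, iteration count, and data are all made up for illustration.

```python
def gradient_descent(xs, ys, learning_rate=0.01, iterations=1000):
    coefficient = 0.0                            # start from zero
    n = len(xs)
    for _ in range(iterations):
        # Cost is the mean of (coefficient * x - y)^2 over the whole dataset.
        # Only its derivative with respect to the coefficient is needed:
        delta = sum(2 * (coefficient * x - y) * x for x, y in zip(xs, ys)) / n
        coefficient -= learning_rate * delta     # step against the gradient
    return coefficient

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]                       # made-up data following y = 3x
print(round(gradient_descent(xs, ys), 4))        # close to 3.0
```

Because every update looks at the entire dataset, this particular sketch is the batch variant.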

Question 44

Q

Batch Gradient Descent

Answer

A

Cost is calculated by looking at entire dataset before updating the coefficients (for each iteration of the algorithm)

Question 45

Q

Stochastic Gradient Descent

Answer

A

Used when the dataset is too large for batch gradient descent to be practical.

The derivative of the cost is computed for each training instance and the coefficients are updated immediately, rather than after a pass over the entire dataset
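
A minimal sketch of the per-instance update, assuming a one-coefficient model y = coefficient * x with squared error as the cost. The function name, learning rate, epoch count, seed, and data are assumptions for this illustration.

```python
import random

def stochastic_gradient_descent(xs, ys, learning_rate=0.01, epochs=50, seed=0):
    rng = random.Random(seed)
    coefficient = 0.0
    rows = list(zip(xs, ys))
    for _ in range(epochs):
        rng.shuffle(rows)                            # visit instances in random order
        for x, y in rows:
            delta = 2 * (coefficient * x - y) * x    # derivative for ONE instance
            coefficient -= learning_rate * delta     # update immediately
    return coefficient

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]                           # made-up data following y = 3x
print(round(stochastic_gradient_descent(xs, ys), 4)) # close to 3.0
```

The coefficient changes after every single instance, so useful progress is made long before a full pass over a large dataset would finish.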

Question 46

Q

Elements of a Decision

Answer

A

Data (input, training, feedback)
prediction
judgement - determine rewards and penalties for each possible outcome
action
Outcome

As prediction becomes cheap due to ML, human prediction will decline in value

The value of judgement will go up

Question 47

Q

Define: Feature Scaling / Normalization

Common Types:

Answer

A

The goal of normalization is to transform features to be on a similar scale.

Scaling to a Range - rescale values to a fixed range, usually 0 to 1
Clipping - cap extreme outliers at a min/max value (e.g., limit values to ±3σ)
Log Scaling - compute the log of values to compress a wide range into a narrow range
Z-Score - rescale each value to the number of standard deviations it lies from the mean
Box-Cox - power transform that makes skewed data more normally distributed
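
The first four techniques can be sketched with the standard library alone (Box-Cox is left out because it needs a fitted parameter). The function names and the sample values are made up for this illustration, and the z-score here uses the population standard deviation, which is one of two common choices.

```python
import math
import statistics

def scale_to_range(values):
    """Min-max scaling into [0, 1]; assumes the values are not all equal."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def clip(values, lo, hi):
    """Cap every value at the given minimum and maximum."""
    return [min(max(v, lo), hi) for v in values]

def log_scale(values):
    """Natural log of each value; requires strictly positive values."""
    return [math.log(v) for v in values]

def z_score(values):
    """Number of (population) standard deviations each value lies from the mean."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

values = [1.0, 2.0, 3.0, 4.0, 100.0]  # made-up feature with one outlier
print(scale_to_range(values))
print(clip(values, 0.0, 10.0))
print(log_scale(values))
print(z_score(values))
```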

Question 48

Q

Define: Bucketing / Binning + Types (2)

Answer

A

Transforming numeric features into categorical features, using a set of thresholds, is called bucketing (or binning). It is needed when there is no linear relationship between the feature's numeric value and the target (e.g., zip codes)

Equal Buckets - Buckets span equal ranges of values
Quantile Buckets - Buckets hold an equal number of points
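
The difference between the two bucket types shows up clearly on skewed data. The sketch below is one simple way to compute the thresholds; the function names and sample values are assumptions made for this illustration.

```python
from bisect import bisect_right

def equal_width_edges(values, n_buckets):
    """Thresholds that cut the value range into buckets of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_buckets
    return [lo + width * i for i in range(1, n_buckets)]

def quantile_edges(values, n_buckets):
    """Thresholds chosen so each bucket holds roughly the same number of points."""
    ordered = sorted(values)
    return [ordered[len(ordered) * i // n_buckets] for i in range(1, n_buckets)]

def bucket_index(value, edges):
    """Index of the bucket `value` falls into, given ascending thresholds."""
    return bisect_right(edges, value)

values = [1, 2, 3, 4, 5, 6, 7, 100]  # made-up feature with one large outlier
for make_edges in (equal_width_edges, quantile_edges):
    edges = make_edges(values, 4)
    print(make_edges.__name__, [bucket_index(v, edges) for v in values])
# equal_width_edges [0, 0, 0, 0, 0, 0, 0, 3]
# quantile_edges [0, 0, 1, 1, 2, 2, 3, 3]
```

With equal-width buckets the outlier leaves two buckets empty, while quantile buckets spread the points evenly.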

Question 49

Q

Feature Vocabulary

Answer

A

A numerical index assigned to each unique item (value) of a categorical feature

Question 50

Q

Out of Vocab (OOV)

Answer

A

A catch-all category for rare values of a categorical feature (those with little training data), so that the machine won’t waste time training on each of those values
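
A feature vocabulary with an OOV bucket can be sketched as a dictionary lookup. The helper names, the `min_count` cutoff, the choice of index 0 for OOV, and the sample colors are all assumptions made for this illustration.

```python
from collections import Counter

def build_vocabulary(items, min_count=2):
    """Map each sufficiently frequent item to an index; index 0 is kept for OOV."""
    counts = Counter(items)
    frequent = sorted(item for item, n in counts.items() if n >= min_count)
    return {item: index for index, item in enumerate(frequent, start=1)}

def lookup(vocabulary, item):
    """Return the item's index, or 0 (the OOV bucket) if it is rare or unseen."""
    return vocabulary.get(item, 0)

colors = ["red", "blue", "red", "green", "blue", "teal"]  # made-up categorical data
vocab = build_vocabulary(colors)
print(vocab)                   # {'blue': 1, 'red': 2}
print(lookup(vocab, "teal"))   # 0
```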

Question 51

Q

Rectangular Data

Definition
Another term

Answer

A

Data arranged in rows and columns, like a spreadsheet or database table

Also called a Data Frame

Question 52

Q

Logistic Regression 1) Types of Problems 2) Algorithm/Process to estimate coefficient

Answer

A

A LINEAR algorithm for two-class (BINARY) classification problems. It predicts the probability of an instance belonging to the default class, which can be snapped to 0 or 1. Coefficients are estimated using a process called MAXIMUM LIKELIHOOD Estimation
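
The prediction step can be sketched as a weighted sum passed through the sigmoid and then snapped to a class. The coefficients below are hypothetical and assumed to be already estimated; the maximum likelihood fitting itself is not shown, and the function names and the 0.5 threshold are choices made for this illustration.

```python
import math

def predict_probability(coefficients, intercept, inputs):
    """Probability of the default class: sigmoid of a weighted sum of the inputs."""
    z = intercept + sum(b * x for b, x in zip(coefficients, inputs))
    return 1.0 / (1.0 + math.exp(-z))

def predict_class(probability, threshold=0.5):
    """Snap a probability to class 0 or class 1."""
    return 1 if probability >= threshold else 0

p = predict_probability([0.8, -0.4], -0.2, [2.0, 1.0])  # hypothetical coefficients
print(round(p, 4), predict_class(p))                    # 0.7311 1
```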

Question 53

Q

Linear Discriminant Analysis 1) Types of Problems

Answer

A

A LINEAR algorithm for classifying data into multiple classes.

LDA makes predictions by estimating the probability that a new set of inputs belongs to each class, using Bayes' Theorem. It uses statistical properties of your data (the mean for each class, and the variance across the dataset) to make predictions.

Question 54

Q

CART 1) Type of Problems 2) How it’s constructed

Answer

A

Classification and Regression Trees (Decision Trees)

A decision tree is constructed by lining up all the values of each input variable and trying different split points; each candidate split is tested with a cost function and the best one is kept.
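
Trying split points on a single numeric input can be sketched as follows. The card does not say how splits are scored; Gini impurity is assumed here as a common choice for classification trees, and the function names and data are made up for this illustration.

```python
def gini(labels):
    """Gini impurity: 0.0 when every label in the group is the same."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Try a threshold between each pair of adjacent sorted values; keep the purest."""
    rows = sorted(zip(values, labels))
    best_threshold, best_cost = None, float("inf")
    for i in range(1, len(rows)):
        if rows[i - 1][0] == rows[i][0]:
            continue  # no threshold fits between two equal values
        threshold = (rows[i - 1][0] + rows[i][0]) / 2
        left = [label for _, label in rows[:i]]
        right = [label for _, label in rows[i:]]
        # Weighted average impurity of the two groups produced by this split.
        cost = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
        if cost < best_cost:
            best_threshold, best_cost = threshold, cost
    return best_threshold, best_cost

values = [1, 2, 3, 10, 11, 12]            # made-up numeric input
labels = ["a", "a", "a", "b", "b", "b"]   # made-up class labels
print(best_split(values, labels))         # (6.5, 0.0)
```

A full tree would repeat this search over every input variable and then recurse into each side of the chosen split.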

Question 55

Q

Naive Bayes 1) Types of Problems 2) How it is constructed

Answer

A

Classification problems only
Makes a “naive” assumption that the features in the dataset are independent of one another (uncorrelated) given the class.
Uses Bayes' theorem

Advantages: 1) Little training data needed 2) Training is very fast because there are no coefficient optimization steps

Disadvantages: 1) Expects a normal distribution for numerical data 2) Bad estimator of probabilities 3) Assumes independent, uncorrelated features

Question 56

Q

k-Nearest Neighbors

1) Types of problems
2) how it works
3) Unique factors

Answer

A

1. Classification and Regression
2a. A prediction is made by finding the k instances in the training data with the shortest distance to the new instance, which means comparing it to all of the data in the training dataset
2b. The output is then the mean (or median) of those neighbors' outputs for regression, or the mode (most common class) for classification
3. No model is trained
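
A minimal sketch of the prediction step, assuming Euclidean distance and training rows stored as (inputs, output) pairs. The function names and data are made up for this illustration; regression here averages the neighbors' outputs, and a median could be substituted in the same place.

```python
import math
from collections import Counter

def k_nearest(train, query, k):
    """Return the outputs of the k training rows closest to `query`."""
    by_distance = sorted(train, key=lambda row: math.dist(row[0], query))
    return [output for _, output in by_distance[:k]]

def knn_classify(train, query, k=3):
    """Most common class among the k nearest neighbors."""
    return Counter(k_nearest(train, query, k)).most_common(1)[0][0]

def knn_regress(train, query, k=3):
    """Mean output of the k nearest neighbors."""
    outputs = k_nearest(train, query, k)
    return sum(outputs) / len(outputs)

labelled = [((1.0, 1.0), "a"), ((1.5, 2.0), "a"), ((1.2, 0.8), "a"),
            ((5.0, 5.0), "b"), ((6.0, 5.5), "b")]          # made-up points
numeric = [((1.0,), 10.0), ((2.0,), 20.0), ((3.0,), 30.0), ((10.0,), 100.0)]

print(knn_classify(labelled, (1.1, 1.0)))   # a
print(knn_regress(numeric, (2.0,)))         # 20.0
```

There is no training function at all: the raw training rows are the model, and every prediction scans the whole list.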

Question 57

Q

k-Nearest Neighbors

Advantages
Disadvantages

Answer

A

Advantages

Lazy Learning / No model needs to be prepared
Non-parametric
“Instance-Based Learning” - Raw training instances are used to make predictions

Disadvantages

Only suitable for low dimensions (few inputs)
Only suitable for small datasets
During prediction, the distance to every instance in the training dataset needs to be computed