Machine Learning Flashcards

1
Q

Nominal Data (1/4 types of data)

A

Data that falls into mutually exclusive categories with no order (e.g. eye color, sex, type of car, zip codes)

2
Q

Ordinal Data (1/4 types of data)

A

Corresponds to categories where order matters but not the difference between values. E.g. letter grades, movie ratings, pain level, cold-warm-hot of a coffee cup

3
Q

BNN

A

Biological Neural Network

4
Q

ANN

A

Artificial Neural Network

5
Q

Typical Neural Network

A

[input pattern] → Input Layer → Hidden Layers → Output Layer → [Output pattern]

Input pattern is presented to the input layer. Then the output pattern is returned from the output layer. What happens between the input and output layers is a black box.

6
Q

Sigmoid Activation Function

A

An S curve from 0 to 1
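
A minimal sketch of the curve, assuming the standard logistic form 1 / (1 + e^-x). The function name `sigmoid` is just an illustrative choice.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid: squashes any real number into the range 0 to 1."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, use an equivalent form so exp() never overflows.
    z = math.exp(x)
    return z / (1.0 + z)


# Large negative inputs approach 0, large positive inputs approach 1,
# and an input of 0 sits exactly in the middle of the S curve.
low, middle, high = sigmoid(-6.0), sigmoid(0.0), sigmoid(6.0)
```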

7
Q

Hyperbolic Tangent Activation Function

A

an S curve from -1 to 1
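
A small sketch showing that tanh is a stretched and shifted sigmoid, using the identity tanh(x) = 2 * sigmoid(2x) - 1. The helper names are illustrative; Python's own `math.tanh` is the function to use in practice.

```python
import math


def sigmoid(x: float) -> float:
    """Plain logistic sigmoid (fine for the small inputs used here)."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh_via_sigmoid(x: float) -> float:
    """Rescale the 0..1 sigmoid curve so that it spans -1..1."""
    return 2.0 * sigmoid(2.0 * x) - 1.0


# The rescaled sigmoid agrees with the built-in tanh at every sample point.
samples = [-3.0, -1.0, 0.0, 1.0, 3.0]
differences = [abs(tanh_via_sigmoid(x) - math.tanh(x)) for x in samples]
```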

8
Q

Ways to Normalize Nominal Values

A
  1. One-of-n Normalization
  2. Equilateral Normalization
9
Q

One-of-n Normalization (aka One-hot encoding )

A

One way of normalizing Nominal Observations. You have one neuron for each output class.

The other way to normalize Nominal Observations is Equilateral Encoding
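
A minimal sketch of one-of-n encoding. The class list (eye colors) and the function name `one_hot` are illustrative assumptions.

```python
def one_hot(label, classes):
    """Return one value per class: 1.0 for the matching class, 0.0 elsewhere."""
    if label not in classes:
        raise ValueError(f"unknown class: {label!r}")
    return [1.0 if c == label else 0.0 for c in classes]


# Three classes need three output neurons.
classes = ["blue", "brown", "green"]
encoded = one_hot("brown", classes)
```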

10
Q

Equilateral Encoding

  1. How it works
  2. Neurons needed
A

A way of normalizing Nominal Observations.

Floating point numbers are created for each class item with uniform equilateral distance to the other class data items. This allows all output neurons to play a part in each class item and causes an error to affect more neurons than one-of-n encoding (the other way to normalize nominal observations).

Requires one fewer output neuron than One-of-N normalization.
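
A hedged sketch of one way to build such codes: place n classes on the corners of a regular simplex in n - 1 dimensions, so every pair of classes is the same distance apart. The function name and the `low`/`high` output range are assumptions, and this is only one possible construction.

```python
import math


def equilateral(n_classes, low=-1.0, high=1.0):
    """Return n_classes code vectors of length n_classes - 1, all equidistant."""
    if n_classes < 2:
        raise ValueError("need at least two classes")
    n = n_classes
    pts = [[0.0] * (n - 1) for _ in range(n)]
    pts[0][0], pts[1][0] = -1.0, 1.0          # two classes: ends of a line
    for k in range(2, n):
        shrink = math.sqrt(k * k - 1.0) / k   # pull existing points inward
        for i in range(k):
            for j in range(k - 1):
                pts[i][j] *= shrink
            pts[i][k - 1] = -1.0 / k          # push them down the new axis
        pts[k][k - 1] = 1.0                   # new point sits on the new axis
    # Linearly rescale every coordinate from [-1, 1] to [low, high].
    return [[(v + 1.0) / 2.0 * (high - low) + low for v in row] for row in pts]


codes = equilateral(4)   # 4 classes -> 3 output neurons
```

Every pair of rows in `codes` is the same Euclidean distance apart, which is what lets an error on one class spread across all output neurons.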

11
Q

Row of a dataset (3)

A
  1. An Entity
  2. An observation
  3. Instance
12
Q

Group of input variables

A

Input Vector

13
Q

Columns of a dataset (2)

A
  1. Features
  2. Attributes of the Observation

14
Q

Models vs Algorithms

A

Model = Algorithm(Data)

15
Q

Field of machine learning that focuses on making predictions

A

Predictive Modeling - A target function “f” that best maps input variable “X” to output variable “Y”. There is an irreducible error “e”

Y=f(X) + e

We are trying to learn the shape of “f”. Different machine learning algorithms make different assumptions on the shape of “f”. This is why we must try different ML algorithms

16
Q

Parametric ML Algorithms

A

Parametric Functions make assumptions on the shape of “f” in Y=f(X) + e

  • Linear ML Algorithms
  • Logistic Regression
  • Linear Discriminant Analysis
  • Perceptron

Advantages: parametric algorithms are simpler, faster, and require less data to train. Disadvantages: they are constrained, have limited complexity, and may be a poor fit for the true shape of “f”

17
Q

Non-Parametric ML Algorithms

A

Do not make assumptions on the shape of the target function.

They are good when you have lots of data and don’t want to worry about choosing all the right features

Examples:
Decision Tree, Neural Networks, Naive Bayes, Support Vector Machines

18
Q

(Dis)Advantages of Non-Parametric ML Algorithms

A

Advantages

  • Flexibility - may fit a large number of target functions
  • Power - no assumptions
  • performance - Higher prediction performance

Disadvantages:

  • More data needed
  • slower
  • overfitting - more likely to overfit
19
Q

4 common types of Data Modeling problems

A
  1. Data Classification
  2. Regression Analysis
  3. Clustering
  4. Time Series
20
Q

Data Classification

A

Try to determine the class the data falls into using Supervised Learning. A class is usually a non-numerical data attribute

21
Q

Regression Analysis

A

A predictive modeling technique which investigates the relationship between a dependent (target) variable and one or more independent variables. A regression problem is one where the output variable is a real value, such as “dollars” or “weight.”

22
Q

Clustering

A

Clustering algorithms take input data and place it into clusters. The programmer usually specifies the number of clusters to be created before training the algorithm. Because there is no expected output, clustering is considered unsupervised training. If the number of clusters changes, the clustering machine learning method will need to be retrained

23
Q

Temporal Algorithm

A

Algorithm that accepts input for values that range over time. Algorithms often use a sliding input window and a prediction window.
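
A minimal sketch of turning a time series into training samples with a sliding input window and a prediction window. The function and parameter names are illustrative assumptions.

```python
def sliding_windows(series, input_size, predict_size):
    """Pair each run of input_size values with the predict_size values after it."""
    samples = []
    last_start = len(series) - input_size - predict_size
    for start in range(last_start + 1):
        split = start + input_size
        inputs = series[start:split]
        targets = series[split:split + predict_size]
        samples.append((inputs, targets))
    return samples


# Use the previous 3 readings to predict the next 1.
samples = sliding_windows([1, 2, 3, 4, 5, 6], input_size=3, predict_size=1)
```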

24
Q

Deterministic Training vs Stochastic Training

A

Deterministic Training Algorithms always perform the exact same way given the same initial state. No random numbers are used.

Stochastic training uses random numbers to train, so the algorithm trains differently each time

25
Q

Interval Data (1/4 types of data)

A

Data where the difference between two values is meaningful but the value of zero is arbitrary. Eg. Temperature (in F or C), year

26
Q

Ratio Data (1/4 types of data)

A

It has properties of interval data but a clear concept of zero.

eg. Age, speed, length, width, volume, mass

27
Q

Supervised Learning - Definition + Types of Problems solved

A

Your training data has the input and output variables and you are using an algorithm to learn the mapping function f

Y=f(X)

Problems solved: 1) Regression 2) Classification

Ex: Linear Regression ; Random Forest, SVM

28
Q

Unsupervised Learning - Definition and Types of Problems solved

A

You have input data X and no corresponding output variables. The goal is to model the underlying structure to learn more about the data. Problems solved: 1. Clustering (grouping of data) 2. Association (rules which describe portions of your data)

Algorithms: k-means for clustering ; Apriori algorithms for association rule learning

29
Q

Semi-Supervised Learning - Definition and Types of Problems solved

A

Some data is labeled but most is unlabeled, so a mixture of supervised and unsupervised techniques is used

30
Q

Types of ML Error (3) - Definition

A
  1. Bias Error- Simplifying Assumptions made by algorithm to make it easier to solve
  2. Variance Error - Sensitivity of the model to changes in training data
  3. Irreducible Error - Unknown variables influencing the mapping of input to output
31
Q

Power calculations

A

Helps determine amount of data required for training given expected accuracy/reliability

32
Q

Reinforcement Learning

A

A computer program interacts with a dynamic environment in which it must perform certain tasks, learning through trial and error as it seeks to achieve its goal

33
Q

Linear and Polynomial regression

A

Regression is concerned with modeling the relationship between numerical variables, and the model is iteratively refined using a measure of error in its predictions. The basic assumption is that the output variable (a numeric value) can be expressed as a combination (weighted sum) of numeric input variables

34
Q

Neural Networks - 1) Definition 2) Types of Problems

A

A large number of highly interconnected processing elements work in unison to solve specific problems, usually classification or pattern-matching problems. Each neuron ‘votes’ on the decision outcome, which might trigger other neurons to vote, and the votes are tallied, creating a ranking of the outcomes depending on the support each has received.

35
Q

Decision Trees - 1) Definition 2) Types of Problems

A

Tree-like flowcharts use branching to illustrate every possible outcome of a decision. Most decision trees use binary branching (two options) based on actual values or attributes of the data.

Types of Problems: 1. Classification 2. Regression

36
Q

Overfitting - 1. Definition 2. Solution

A

ML model learns both the details and the noise too well at the expense of not generalizing to new data.

If we train too long, the error rate on the training data keeps dropping but the error rate on test data goes up!

Solution: Resampling methods (k-fold cross validation) and held-back validation (hold data back to the very end, if you have enough)

37
Q

Underfitting 1. Definition 2. Solution

A

Definition: Failing to learn the problem from the training data sufficiently.

Solution: Try different ML algorithms
Advice: You want to be in middle of overfitting and underfitting

38
Q

Generalization

A

How well the concepts learned by the model apply to specific examples not seen by the model when it was learning

39
Q

Goodness of Fit

A

measures used in statistics to estimate how well the approximation of the function matches the target function

40
Q

K-fold cross validation

A

A cross validation technique used to evaluate model on unseen data

  1. Shuffle the dataset randomly.
  2. Split the dataset into k groups
  3. For each unique group:
    3a. Take the group as a hold out or test data set
    3b. Take the remaining groups as a training data set
    3c. Fit a model on the training set and evaluate it on the test set
  4. Retain the evaluation score and discard the model
  5. Summarize the skill of the model using the sample of model evaluation scores
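
The steps above can be sketched as follows. This is a minimal illustration, not a library API: `fit` and `evaluate` are hypothetical callables supplied by the caller, and the toy "model" (predict the training mean) is only there to make the example run.

```python
import random


def k_fold_scores(data, k, fit, evaluate, seed=0):
    """Return (mean score, per-fold scores) for k-fold cross validation."""
    rows = list(data)
    if not 2 <= k <= len(rows):
        raise ValueError("k must be between 2 and the number of rows")
    random.Random(seed).shuffle(rows)              # 1. shuffle
    folds = [rows[i::k] for i in range(k)]         # 2. split into k groups
    scores = []
    for i, test in enumerate(folds):               # 3. each group is held out once
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        model = fit(train)
        scores.append(evaluate(model, test))       # 4. keep score, drop model
    return sum(scores) / len(scores), scores       # 5. summarize


def fit_mean(train):
    """Toy model: always predict the mean of the training values."""
    return sum(train) / len(train)


def mean_abs_error(model, test):
    return sum(abs(y - model) for y in test) / len(test)


average, per_fold = k_fold_scores(range(10), k=5, fit=fit_mean, evaluate=mean_abs_error)
```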
41
Q

Cross Validation

A

Cross-validation is a RESAMPLING PROCEDURE used to EVALUATE ML models on a limited data sample. It is primarily used in applied machine learning to ESTIMATE the SKILL of a machine learning model on UNSEEN DATA.

42
Q

Gradient Descent - Definition + Types(2)

A

An OPTIMIZATION algorithm which can be used with many ML problems. It is used to find the values of the parameters (coefficients) of a function (f) that minimize a cost function. Best used when the parameters cannot be estimated analytically (e.g. with linear algebra).

Types: Batch and Stochastic

43
Q

Gradient Descent Steps

A
  1. Choose Random Coefficients or set to zero
  2. Compute Cost: cost = evaluate(f(coefficient))
  3. Find derivative of cost: delta = derivative(cost)
  4. Change coefficient: coefficient = coefficient - (learning_rate * delta)
  5. Go back to step 2; new iteration
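
The steps above can be sketched for a single coefficient. The cost function (c - 3)^2, the learning rate, and the iteration count are all illustrative assumptions.

```python
def cost(coefficient):
    """Toy cost function with its minimum at coefficient = 3."""
    return (coefficient - 3.0) ** 2


def cost_derivative(coefficient):
    """Derivative of the toy cost function above."""
    return 2.0 * (coefficient - 3.0)


def gradient_descent(start=0.0, learning_rate=0.1, iterations=100):
    coefficient = start                              # 1. start at zero
    history = []
    for _ in range(iterations):                      # 5. repeat
        history.append(cost(coefficient))            # 2. compute cost
        delta = cost_derivative(coefficient)         # 3. derivative of cost
        coefficient = coefficient - learning_rate * delta  # 4. update
    return coefficient, history


best, history = gradient_descent()
```

With these settings the cost shrinks on every iteration and `best` ends up very close to 3. A learning rate that is too large would overshoot instead.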
44
Q

Batch Gradient Descent

A

Cost is calculated by looking at entire dataset before updating the coefficients (for each iteration of the algorithm)

45
Q

Stochastic Gradient Descent

A

Used in situations in which you have too much data.

Cost is calculated by taking the derivative from each training data instance and calculating the update immediately
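
A minimal sketch contrasting the two update styles, assuming a one-coefficient model y = w * x with squared error. The data, learning rate, and function names are made up for illustration.

```python
import random


def batch_epoch(w, data, lr):
    """Batch: average the gradient over the whole dataset, then update once."""
    grad = sum(2.0 * (w * x - y) * x for x, y in data) / len(data)
    return w - lr * grad


def stochastic_epoch(w, data, lr, rng):
    """Stochastic: update immediately after each (shuffled) training instance."""
    rows = list(data)
    rng.shuffle(rows)
    for x, y in rows:
        w -= lr * 2.0 * (w * x - y) * x
    return w


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]   # true w is 2
rng = random.Random(0)
w_batch = w_sgd = 0.0
for _ in range(200):
    w_batch = batch_epoch(w_batch, data, lr=0.01)
    w_sgd = stochastic_epoch(w_sgd, data, lr=0.01, rng=rng)
```

One pass over the data gives batch descent a single update but gives stochastic descent one update per row, which is why it suits very large datasets.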

46
Q

Elements of a Decision

A
  1. Data (input, training, feedback)
  2. prediction
  3. judgement - determine rewards and penalties for each possible outcome
  4. action
  5. Outcome

As prediction becomes cheap due to ML, human prediction will decline in value

Value of Judgement will go up

47
Q

Define: Feature Scaling / Normalization

Common Types:

A

The goal of normalization is to transform features to be on a similar scale.

  1. Scaling to Range - convert values to a fixed range, usually 0 to 1
  2. Clipping - Capping extreme outliers to a min/max value (i.e. limit values to ±3σ)
  3. Log Scaling - Compute log of values to compress wide range to a narrow range
  4. Z-Score - Scaling that represents number of standard deviations away from mean
  5. BoxCox
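
A minimal sketch of the first four techniques using only the standard library. Function names are illustrative, the z-score here uses the population standard deviation, and Box-Cox is not shown.

```python
import math
import statistics


def scale_to_range(values):
    """Min-max scaling: smallest value becomes 0, largest becomes 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; range is zero")
    return [(v - lo) / (hi - lo) for v in values]


def clip(values, lo, hi):
    """Cap every value to the interval [lo, hi]."""
    return [min(max(v, lo), hi) for v in values]


def log_scale(values):
    """Base-10 log; only defined for strictly positive values."""
    return [math.log10(v) for v in values]


def z_score(values):
    """Number of standard deviations each value lies from the mean."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


scaled = scale_to_range([10, 20, 30])
clipped = clip([-5, 0, 5], -3, 3)
logged = log_scale([1.0, 10.0, 100.0])
standardized = z_score([1, 2, 3])
```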
48
Q

Define: Bucketing / Binning + Types (2)

A

Transforming numeric features into categorical features, using a set of thresholds, is called bucketing (or binning). Needed when there is no linear relationship between the numeric value and the label (e.g. zip code)

Equal Buckets - Buckets are of equal range
Quantile Buckets - Buckets with an equal number of points
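
A minimal sketch of both bucket types, assuming buckets are described by a sorted list of threshold edges. The function names and sample data are illustrative.

```python
import bisect


def equal_width_edges(values, n_buckets):
    """Thresholds that split the value range into equally wide buckets."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_buckets
    return [lo + width * i for i in range(1, n_buckets)]


def quantile_edges(values, n_buckets):
    """Thresholds chosen so each bucket holds about the same number of points."""
    ordered = sorted(values)
    return [ordered[len(ordered) * i // n_buckets] for i in range(1, n_buckets)]


def bucket_index(value, edges):
    """Index of the bucket a value falls into, given sorted threshold edges."""
    return bisect.bisect_right(edges, value)


values = [1, 2, 3, 4, 5, 6, 7, 100]          # one extreme outlier
by_width = [bucket_index(v, equal_width_edges(values, 4)) for v in values]
by_quantile = [bucket_index(v, quantile_edges(values, 4)) for v in values]
```

With the outlier present, equal-width buckets put almost every point in the first bucket, while the quantile version spreads the points evenly.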

49
Q

Feature Vocabulary

A

Numerical index given to items(unique features) in a category

50
Q

Out of Vocab (OOV)

A

A catch-all category for rare values of a categorical feature (low training data) so that the machine won’t waste time training on each of those categories
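
A minimal sketch of a feature vocabulary with an OOV bucket. The `min_count` cutoff, the choice of index 0 for OOV, and the sample colors are illustrative assumptions.

```python
from collections import Counter

OOV_INDEX = 0  # every rare or unseen item shares this index


def build_vocabulary(items, min_count=2):
    """Give each sufficiently frequent item its own index, starting at 1."""
    counts = Counter(items)
    frequent = sorted(item for item, n in counts.items() if n >= min_count)
    return {item: index + 1 for index, item in enumerate(frequent)}


def encode(item, vocabulary):
    """Look up an item's index, falling back to the OOV bucket."""
    return vocabulary.get(item, OOV_INDEX)


colors = ["red", "blue", "red", "green", "blue", "teal"]
vocabulary = build_vocabulary(colors)
encoded = [encode(c, vocabulary) for c in colors]
```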

51
Q

Rectangular Data

  1. Definition
  2. Another term
A

A rectangular data object like a spreadsheet or data table

Also called a Data Frame

52
Q

Logistic Regression 1) Types of Problems 2) Algorithm/Process to estimate coefficient

A

A LINEAR algorithm for a two-class (BINARY) classification problem. It will predict the probability of an instance belonging to the default class, which can be snapped to 0 or 1. Coefficients are estimated using a process called MAXIMUM LIKELIHOOD Estimation
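
A minimal sketch of the prediction step only. The coefficients and intercept below are made-up numbers, not values fitted by maximum likelihood, and the 0.5 threshold is an assumed default.

```python
import math


def predict_probability(inputs, coefficients, intercept):
    """Linear combination of the inputs passed through the logistic function."""
    z = intercept + sum(c * x for c, x in zip(coefficients, inputs))
    # Note: math.exp overflows for extremely negative z; fine for a sketch.
    return 1.0 / (1.0 + math.exp(-z))


def predict_class(inputs, coefficients, intercept, threshold=0.5):
    """Snap the probability to class 1 or class 0."""
    p = predict_probability(inputs, coefficients, intercept)
    return 1 if p >= threshold else 0


coefficients, intercept = [1.5], -1.0      # hypothetical fitted values
p_high = predict_probability([2.0], coefficients, intercept)
p_low = predict_probability([0.0], coefficients, intercept)
```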

53
Q

Linear Discriminant Analysis 1) Types of Problems

A

A LINEAR algorithm for classifying data in multiple classes.

LDA makes prediction by estimating the probability that a new set of inputs belongs to each class using Bayes Theorem. It uses statistical properties of your data(mean for each class, and variance for dataset) to make predictions.

54
Q

CART 1) Type of Problems 2) How it’s constructed

A

Classification and Regression Trees (Decision Trees)

A decision tree is constructed by lining up all values and then trying and testing different split points.

55
Q

Naive Bayes 1) Types of Problems 2) How it is constructed

A
  1. Classification Problems only
  2. Makes a “naive” assumption that the features in the dataset are not correlated.
  3. Uses Bayes theorem

Advantages: 1) Low training data needed 2) Training is super fast because there are no coefficient optimization steps

Disadvantages: 1) Expects normal distribution for numerical data 2) Bad estimator of probabilities 3) Assumption of independent uncorrelated features

56
Q

k-Nearest Neighbors

1) Types of problems
2) how it works
3) Unique factors

A
  1. Classification and Regression
    2a. Prediction is made by finding the k instances in the training data with the shortest distance to the new instance, comparing it against all of the data in the training dataset
    2b. Then take the mean (regression) or the mode (classification) of those instances' outputs as the prediction
  3. No model is trained
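
A minimal sketch of the procedure, assuming Euclidean distance, a majority vote for classification, and an average for regression. The function names and data points are illustrative.

```python
import math
from collections import Counter


def nearest(train, query, k):
    """The k training rows closest to the query; each row is (features, output)."""
    return sorted(train, key=lambda row: math.dist(row[0], query))[:k]


def knn_classify(train, query, k):
    """Most common label among the k nearest neighbours."""
    votes = Counter(label for _, label in nearest(train, query, k))
    return votes.most_common(1)[0][0]


def knn_regress(train, query, k):
    """Average output of the k nearest neighbours."""
    neighbours = nearest(train, query, k)
    return sum(target for _, target in neighbours) / len(neighbours)


labelled = [((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
            ((5.0, 5.0), "b"), ((5.5, 4.5), "b")]
numeric = [((1.0,), 10.0), ((2.0,), 20.0), ((3.0,), 30.0), ((10.0,), 100.0)]

label = knn_classify(labelled, (1.0, 1.0), k=3)
value = knn_regress(numeric, (2.0,), k=3)
```

There is no training step: every prediction scans the whole training set, which is why prediction gets slow as the dataset grows.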
57
Q

k-Nearest Neighbor

  1. Advantages
  2. Disadvantages
A

Advantages

  1. Lazy Learning / No model needs to be prepared
  2. Non-parametric
  3. “Instance-Based Learning” - Raw training instances are used to make predictions

Disadvantages

  1. Only suitable for low dimensions (few inputs)
  2. Only suitable for small datasets
  3. During prediction, the distance on the entire training dataset needs to be computed