Machine Learning Flashcards
Nominal Data (1/4 types of data)
Data whose categories are mutually exclusive but not ordered (e.g., eye color, sex, type of car, zip codes)
Ordinal Data (1/4 types of data)
Corresponds to categories where order matters but the difference between values does not (e.g., letter grades, movie ratings, pain level, cold-warm-hot of a coffee cup)
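A minimal sketch of how the nominal/ordinal distinction might show up in code. The category names and the helper functions `is_worse` and `same_category` are illustrative assumptions, not part of any library.

```python
# Ordinal categories: the order is meaningful, so each one gets a rank.
# The gaps between ranks carry no meaning, only the ordering does.
PAIN_LEVELS = ["none", "mild", "moderate", "severe"]  # illustrative categories
pain_rank = {level: i for i, level in enumerate(PAIN_LEVELS)}

def is_worse(a, b):
    """Order comparisons are valid for ordinal data."""
    return pain_rank[a] > pain_rank[b]

def same_category(a, b):
    """Nominal categories support only equality checks, so no rank is used."""
    return a == b

print(is_worse("severe", "mild"))
print(same_category("blue", "brown"))
```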
BNN
Biological Neural Network
ANN
Artificial Neural Network
Typical Neural Network
[input pattern] → Input Layer → Hidden Layers → Output Layer → [Output pattern]
Input pattern is presented to the input layer. Then the output pattern is returned from the output layer. What happens between the input and output layers is a black box.
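The flow above can be sketched as a tiny forward pass. This is only an illustration: the layer sizes, the tanh activation, and the untrained weights below are assumptions chosen to keep the example short.

```python
import math

def layer(inputs, weights, biases):
    """One fully connected layer with tanh activation."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

# Illustrative, untrained weights: 2 inputs -> 3 hidden neurons -> 1 output.
HIDDEN_W = [[0.5, -0.2], [0.1, 0.8], [-0.3, 0.4]]
HIDDEN_B = [0.0, 0.1, -0.1]
OUTPUT_W = [[0.7, -0.5, 0.2]]
OUTPUT_B = [0.05]

def forward(input_pattern):
    """Input pattern -> hidden layer -> output layer -> output pattern."""
    hidden = layer(input_pattern, HIDDEN_W, HIDDEN_B)
    return layer(hidden, OUTPUT_W, OUTPUT_B)

output_pattern = forward([1.0, 0.0])
print(output_pattern)
```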
Sigmoid Activation Function
An S-shaped curve whose output ranges from 0 to 1
Hyperbolic Tangent Activation Function
An S-shaped curve whose output ranges from -1 to 1
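Both activation functions can be written in a few lines. This sketch uses only the standard `math` module and is not hardened against overflow for very large negative inputs.

```python
import math

def sigmoid(x):
    """Logistic sigmoid, 1 / (1 + e^-x): output lies between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    """Hyperbolic tangent: output lies between -1 and 1."""
    return math.tanh(x)

for x in (-5.0, 0.0, 5.0):
    print(x, round(sigmoid(x), 4), round(tanh(x), 4))
```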
Ways to Normalize Nominal Values
- One-of-n Normalization
- Equilateral Normalization
One-of-n Normalization (aka One-hot encoding)
One way of normalizing nominal observations. You have one output neuron for each class; the neuron for the observed class is set high and all the others low.
The other way to normalize nominal observations is Equilateral Encoding.
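A minimal sketch of one-of-n encoding. The function name `one_of_n`, the `high`/`low` defaults, and the class list are assumptions made for illustration.

```python
def one_of_n(classes, value, high=1.0, low=0.0):
    """Return one output value per class; only the matching class is high."""
    if value not in classes:
        raise ValueError(f"unknown class: {value!r}")
    return [high if c == value else low for c in classes]

COLORS = ["red", "green", "blue"]  # illustrative classes
print(one_of_n(COLORS, "green"))
```

Each class needs its own output value, so three classes require three output neurons.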
Equilateral Encoding
- How it works
- Neurons needed
A way of normalizing nominal observations.
A vector of floating-point numbers is created for each class, placed at a uniform (equilateral) distance from the vector of every other class. This allows all output neurons to play a part in each class and causes an error to affect more neurons than in one-of-n encoding (the other way to normalize nominal observations).
Requires one fewer output neuron than one-of-n normalization.
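One way to build such codes is to place the classes at the corners of a regular simplex, adding one dimension per extra class. The sketch below is an assumption about how this could be implemented, not necessarily the construction the original source had in mind; `equilateral` and `distance` are hypothetical helper names.

```python
import math

def equilateral(n, low=-1.0, high=1.0):
    """Return n vectors of length n - 1 that are all equally far apart."""
    if n < 2:
        raise ValueError("need at least two classes")
    # Start with two points on a line, then add one dimension per new class.
    rows = [[0.0] * (n - 1) for _ in range(n)]
    rows[0][0], rows[1][0] = -1.0, 1.0
    for k in range(2, n):
        shrink = math.sqrt(k * k - 1.0) / k
        for i in range(k):
            for j in range(k - 1):
                rows[i][j] *= shrink      # pull the existing points inward
            rows[i][k - 1] = -1.0 / k     # and shift them along the new axis
        rows[k][k - 1] = 1.0              # the new class sits on the new axis
    # Rescale every coordinate from [-1, 1] to [low, high].
    return [[(v + 1.0) / 2.0 * (high - low) + low for v in row] for row in rows]

def distance(a, b):
    """Euclidean distance between two code vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

codes = equilateral(4)
for row in codes:
    print([round(v, 3) for v in row])
```

Four classes yield four vectors of length three, which is the "one fewer output neuron" saving, and every pair of vectors is the same distance apart.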
Row of a dataset (3)
- An Entity
- An observation
- Instance
Group of input variables
Input Vector
Columns of a dataset (2)
- Features
- Attributes of the Observation
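A small sketch of how rows, columns, and input vectors relate. The feature names and values are made up purely for illustration.

```python
# Illustrative dataset: each row is one observation, each column one feature.
FEATURES = ["height_cm", "weight_kg"]   # columns (attributes)
ROWS = [
    [170.0, 65.0],   # one observation / instance / entity
    [182.0, 80.5],
    [158.0, 52.3],
]

input_vector = ROWS[0]                     # the input variables of one row
height_column = [row[0] for row in ROWS]   # one feature across all rows

print(dict(zip(FEATURES, input_vector)))
print(height_column)
```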
Models vs Algorithms
Model = Algorithm(Data)
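The relationship can be sketched literally as a function that takes data and returns a model. Ordinary least squares for a straight line is used here only as a convenient example algorithm; the data points are illustrative.

```python
def least_squares_line(data):
    """Algorithm: fit y = slope * x + intercept to (x, y) pairs."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    def model(x):
        """Model: the learned function, usable for prediction."""
        return slope * x + intercept

    return model

data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # illustrative, y = 2x + 1
model = least_squares_line(data)   # Model = Algorithm(Data)
print(model(4.0))
```

Running the same algorithm on different data would return a different model.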
Field of machine learning that focuses on making predictions
Predictive Modeling - Learning a target function “f” that best maps input variables “X” to an output variable “Y”. There is also an irreducible error “e”.
Y=f(X) + e
We are trying to learn the shape of “f”. Different machine learning algorithms make different assumptions about the shape of “f”, which is why we must try different ML algorithms.
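A short simulation can make the irreducible error concrete. The target function, the noise level, and the sample size below are all assumptions picked for the demonstration; in practice “f” is unknown.

```python
import random

def f(x):
    """The true target function (unknown in practice; assumed here)."""
    return 3.0 * x + 2.0

rng = random.Random(0)
NOISE_STD = 0.5  # standard deviation of the irreducible error e

xs = [rng.uniform(0.0, 10.0) for _ in range(2000)]
ys = [f(x) + rng.gauss(0.0, NOISE_STD) for x in xs]   # Y = f(X) + e

# Even the true f cannot predict e, so its mean squared error stays near
# NOISE_STD ** 2 instead of reaching zero.
mse_true_f = sum((y - f(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)
print(round(mse_true_f, 3))
```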
Parametric ML Algorithms
Parametric algorithms make assumptions about the shape of “f” in Y=f(X) + e
- Linear ML Algorithms
- Logistic Regression
- Linear Discriminant Analysis
- Perceptron
Advantages: parametric algorithms are simpler, faster, and require less data to train. Disadvantages: they are constrained to the assumed form, have limited complexity, and can be a poor fit when the true shape of “f” does not match the assumption.
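As an illustration of a parametric algorithm, here is a minimal perceptron sketch trained on the AND gate. The learning rate, epoch count, and training set are assumptions; the point is that the model is just a fixed number of parameters (two weights and a bias) no matter how much data it sees.

```python
def train_perceptron(samples, epochs=20, lr=1.0):
    """Learn weights and a bias; the parameter count is fixed up front."""
    n_inputs = len(samples[0][0])
    weights = [0.0] * n_inputs
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            activation = sum(w * x for w, x in zip(weights, inputs)) + bias
            error = target - (1 if activation > 0 else 0)
            # Perceptron rule: nudge the parameters only when the guess is wrong.
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias

def predict(weights, bias, inputs):
    """Output 1 if the weighted sum is positive, otherwise 0."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias > 0 else 0

AND_GATE = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(AND_GATE)
print(weights, bias)
print([predict(weights, bias, x) for x, _ in AND_GATE])
```

Because the perceptron assumes a linear decision boundary, it would not be able to learn a target such as XOR, which is the kind of constraint the disadvantages above refer to.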
Non-Parametric ML Algorithms
Do not make strong assumptions about the shape of the target function.
They are good when you have lots of data and don’t want to worry about choosing all the right features.
Examples:
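Decision Trees, Support Vector Machines. Neural Networks and Naive Bayes are often listed here as well, but with a fixed architecture or a fixed distribution family they are usually classed as parametric.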
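For contrast with a fixed-parameter model, here is a minimal sketch of k-nearest neighbors, a commonly cited non-parametric method chosen here as the example. The training points, labels, and the function name `knn_predict` are illustrative assumptions.

```python
from collections import Counter

def knn_predict(training_data, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    nearest = sorted(training_data, key=lambda item: sq_dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Illustrative data: the "model" is simply the stored training set.
TRAINING = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.3), "a"),
    ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.3), "b"),
]
print(knn_predict(TRAINING, (1.1, 1.0)))
print(knn_predict(TRAINING, (5.1, 5.1)))
```

No functional form is assumed: the prediction depends directly on the stored examples, so the model grows as more data is added.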
(Dis)Advantages of Non-Parametric ML Algorithms
Advantages:
- Flexibility - may fit a large number of target functions
- Power - no (or weak) assumptions about the target function
- Performance - can give higher prediction performance
Disadvantages:
- More data needed
- Slower to train
- Overfitting - more likely to overfit the training data
4 common types of Data Modeling problems
- Data Classification
- Regression Analysis
- Clustering
- Time Series
Data Classification
Determine which class the data falls into, using supervised learning. A class is usually a non-numerical data attribute.
Regression Analysis
A predictive modeling technique that investigates the relationship between a dependent (target) variable and one or more independent variables. A regression problem is one where the output variable is a real value, such as “dollars” or “weight.”
Clustering
Clustering algorithms take input data and place it into clusters. The programmer usually specifies the number of clusters to be created before training the algorithm. Because there is no expected output, clustering is considered unsupervised training. If the number of clusters changes, the clustering method will need to be retrained.
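As a sketch of clustering with a preset number of clusters, here is a bare-bones k-means loop. K-means is one common choice and is used here as an assumption, since the card does not name an algorithm; the points are illustrative, and seeding the centroids with the first k points is a simplification.

```python
def k_means(points, k, iterations=10):
    """Group points into k clusters; returns (labels, centroids)."""
    centroids = [list(p) for p in points[:k]]  # simplification: first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        for i, p in enumerate(points):
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            labels[i] = dists.index(min(dists))
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

POINTS = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
labels, centroids = k_means(POINTS, k=2)
print(labels)
```

No labels are supplied, only the number of clusters. Calling `k_means` again with a different `k` reruns the whole procedure, which mirrors the retraining requirement described above.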