Business Analytics Midterm 2 Flashcards
Model
A simplified representation of reality created to serve a purpose
Predictive model
A formula for estimating the unknown value of interest (the target). The formula can be mathematical, a logical statement, etc.
Prediction
An estimate of an unknown value (the target)
Instance/example
Represents a fact or data point. Described by a set of attributes (fields, columns, variables, or features)
Model induction
The creation of models from data
Training data
The input data for the induction algorithm
Beta estimates
“Weights” used to calculate a prediction.
Intercept: 1.5
Age: -0.3
Height: 1.2
What is the equation to predict the result for a person who is 65 inches tall and 38 years old?
y = 1.5 + (-0.3)(38) + (1.2)(65) = 68.1
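The calculation above can be sketched in Python as a weighted sum. This is only an illustration: the function name `predict` and the dictionary layout are assumptions, not part of the course material.

```python
def predict(intercept, betas, features):
    """Return intercept + sum of (beta * feature value) for each attribute."""
    return intercept + sum(betas[name] * features[name] for name in betas)

# Beta estimates from the card above
betas = {"Age": -0.3, "Height": 1.2}
person = {"Age": 38, "Height": 65}

y = predict(1.5, betas, person)
print(round(y, 1))  # 68.1
```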
Information gain measures…
The change in entropy due to new information being added (e.g., splitting on an attribute). Calculated by subtracting the weighted sum of the children's entropies from the entropy of the parent. Each child's weight is its share of the parent's instances.
Entropy
Measures the general disorder (impurity) of a dataset. Ex. A bag with 5 white chips and 5 black chips has an entropy of 1. A bag with 10 black chips has an entropy of 0.
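A minimal Python sketch of entropy and information gain, assuming base-2 logarithms (which is what gives the 5 white / 5 black bag an entropy of 1). The function names and the example split are illustrative assumptions.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy in bits: sum over classes of p * log2(1/p)."""
    total = len(labels)
    return sum((c / total) * log2(total / c) for c in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the weighted sum of the children's entropies."""
    weighted = sum(len(child) / len(parent) * entropy(child) for child in children)
    return entropy(parent) - weighted

mixed_bag = ["white"] * 5 + ["black"] * 5
black_bag = ["black"] * 10
print(entropy(mixed_bag))  # 1.0
print(entropy(black_bag))  # 0.0

# Hypothetical split of the mixed bag into two perfectly pure children
pure_split = [["white"] * 5, ["black"] * 5]
print(information_gain(mixed_bag, pure_split))  # 1.0
```

A split that leaves the children as mixed as the parent has a gain of 0. A split into pure children recovers all of the parent's entropy.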
Why is Laplace correction used?
Laplace correction smooths probability estimates that are based on small sample sizes. Ex. 6 samples, 4 are positive. Without correction, the chance that the next person is positive is 4/6 = 0.6667. With Laplace correction, the chance is (4 + 1)/(6 + 2) = 5/8 = 0.625. The estimate is pulled toward 0.5 to be conservative!
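The Laplace example can be checked with a short Python sketch. It assumes the common form (positives + 1) / (total + number of classes), with two classes. The function name `laplace_estimate` is hypothetical.

```python
def laplace_estimate(positives, total, num_classes=2):
    """Smoothed probability: add 1 to the count and num_classes to the total."""
    return (positives + 1) / (total + num_classes)

raw = 4 / 6
corrected = laplace_estimate(4, 6)
print(round(raw, 4))  # 0.6667
print(corrected)      # 0.625

# With more data at the same ratio, the correction matters less
print(round(laplace_estimate(400, 600), 4))  # 0.6661
```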
Two types of classification problems when creating a model
- Target values are discrete with no order. Ex. Single, Married, Divorced, Widowed.
- Target values are binary (0 and 1)
Classifier model (solution to classification)
Model predicts one of the same discrete values that appear in the data. Ex. For binary data, model output is 0 or 1
Ranking (solution to classification)
Model predicts a score where a higher score means the model thinks the example is more likely to be in one class.
Probability estimation
Model predicts a score between 0 and 1 that is meant to be the probability of being in that class. Ex. Titanic data.
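The three kinds of output (class label, ranking, probability estimate) can be related in a small Python sketch. The probability values and the 0.5 cutoff are made-up assumptions for illustration, not results from any dataset.

```python
# Hypothetical probability estimates for three examples
probabilities = {"a": 0.91, "b": 0.35, "c": 0.62}

# Ranking: order examples from most to least likely to be in the class
ranking = sorted(probabilities, key=probabilities.get, reverse=True)

# Classifier: turn each probability into 0 or 1 with an assumed 0.5 cutoff
labels = {name: int(p >= 0.5) for name, p in probabilities.items()}

print(ranking)  # ['a', 'c', 'b']
print(labels)   # {'a': 1, 'b': 0, 'c': 1}
```

A probability estimate carries the most information: it can be sorted to give a ranking or cut at a threshold to give a class label, but not the other way around.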