Business Analytics Midterm 2 Flashcards

1
Q

Model

A

A simplified representation of reality created to serve a purpose

2
Q

Predictive model

A

A formula for estimating the unknown value of interest, called the target. The formula can be mathematical, a logical statement, etc.

3
Q

Prediction

A

Estimating an unknown value (the target)

4
Q

Instance/example

A

Represents a fact or data point. Described by a set of attributes (fields, columns, variables, or features)

5
Q

Model induction

A

The creation of models from data

6
Q

Training data

A

The input data for the induction algorithm

7
Q

Beta estimates

A

The estimated coefficients (“weights”) that multiply each attribute's value to calculate a prediction.

8
Q

Intercept: 1.5
Age: -0.3
Height: 1.2

What is the equation to predict the result for a person who is 65 inches tall and 38 years old?

A

y = 1.5 + (-0.3)(38) + (1.2)(65)
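The equation above can be checked with a minimal sketch. The coefficients come from the card; the names `INTERCEPT`, `BETAS`, and `predict` are illustrative assumptions, not part of any course material.

```python
# Coefficients (beta estimates) copied from the flashcard.
INTERCEPT = 1.5
BETAS = {"age": -0.3, "height": 1.2}


def predict(instance):
    """Return intercept + sum of (beta * attribute value)."""
    return INTERCEPT + sum(BETAS[name] * instance[name] for name in BETAS)


person = {"age": 38, "height": 65}
prediction = predict(person)
print(round(prediction, 1))  # 1.5 - 11.4 + 78.0 = 68.1
```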

9
Q

Information gain measures…

A

The reduction in entropy that results from splitting the data on new information. Calculated by subtracting the weighted sum of the children's entropies from the entropy of the parent (each child is weighted by its share of the parent's instances).
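The calculation can be sketched roughly as follows. The split shown is a made-up example, and the function names `entropy` and `information_gain` are assumptions chosen for illustration.

```python
from math import log2


def entropy(counts):
    """Shannon entropy (base 2) of a group given its per-class counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)


def information_gain(parent, children):
    """Parent entropy minus the weighted sum of the children's entropies."""
    total = sum(parent)
    weighted = sum(sum(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


# Hypothetical split: 10 instances (5 per class) divided into two children.
parent = [5, 5]
children = [[4, 1], [1, 4]]
gain = information_gain(parent, children)
print(round(gain, 3))  # 0.278
```

A split that separates the classes perfectly, such as `[[5, 0], [0, 5]]`, would give the maximum gain of 1 for this parent.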

10
Q

Entropy

A

Measures the general disorder (impurity) of a dataset. Ex. a bag with 5 white chips and 5 black chips has an entropy of 1. A bag with 10 black chips has an entropy of 0.
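The two chip examples can be worked through with the usual base-2 entropy formula, where $p_i$ is the proportion of class $i$ in the set $S$. The card does not state the formula, so the symbols here are assumptions.

```latex
H(S) = -\sum_i p_i \log_2 p_i

% 5 white chips, 5 black chips: p_white = p_black = 0.5
H = -\left(0.5 \log_2 0.5 + 0.5 \log_2 0.5\right) = -(-0.5 - 0.5) = 1

% 10 black chips: p_black = 1
H = -\left(1 \cdot \log_2 1\right) = 0
```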

11
Q

Why is Laplace correction used?

A

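Laplace correction smooths probability estimates that are based on small samples by adding 1 to the count of each class, which pulls the estimate toward 0.5 when there are two classes. Ex. 6 samples, 4 are positive. The uncorrected chance for the next person is 4/6 = 0.6667. With Laplace correction the chance is (4 + 1)/(6 + 2) = 5/8 = 0.625. Here the probability decreases, which is more conservative!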
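A minimal sketch of the correction, assuming the common form (positives + 1) / (total + number of classes). The function name `laplace_estimate` is hypothetical.

```python
def laplace_estimate(positives, total, num_classes=2):
    """Laplace-corrected probability: (positives + 1) / (total + num_classes)."""
    return (positives + 1) / (total + num_classes)


raw = 4 / 6
corrected = laplace_estimate(4, 6)
print(round(raw, 4), corrected)  # 0.6667 0.625

# With few positives the estimate moves up instead: 1/6 becomes 2/8.
low_corrected = laplace_estimate(1, 6)
print(low_corrected)  # 0.25
```

As the sample grows, the added counts matter less and the corrected estimate approaches the raw proportion.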

12
Q

Two classification problems in creating a model

A
  1. Target values are discrete with no order. Ex. Single, Married, Divorced, Widowed.
  2. Target values are binary (0 and 1)
13
Q

Classifier model (solution to classification)

A

The model predicts one of the same discrete values that appear in the data. Ex. for binary data, the model output is 0 or 1.

14
Q

Ranking (solution to classification)

A

The model predicts a score, where a higher score means the model considers the example more likely to be in a given class.

15
Q

Probability estimation

A

Model predicts a score between 0 and 1 that is meant to be the probability of being in that class. Ex. Titanic data.

16
Q

Order the three classification solutions from least informative to most

A

  1. Classifier model: Don’t use it
  2. Ranking
  3. Probability estimation: You can always rank or classify if you have probabilities

You can always go backwards (to a less informative method) but not forwards
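Going "backwards" from probabilities can be sketched as below. The probability values are invented for illustration, and the 0.5 threshold is an assumption, not a rule from the course.

```python
# Hypothetical probability estimates for four instances (illustrative values only).
probabilities = {"a": 0.91, "b": 0.15, "c": 0.62, "d": 0.48}

# Ranking: sort instances by score, highest first.
ranking = sorted(probabilities, key=probabilities.get, reverse=True)

# Classification: apply a threshold (0.5 here is an assumption, not a rule).
THRESHOLD = 0.5
classes = {name: int(p >= THRESHOLD) for name, p in probabilities.items()}

print(ranking)  # ['a', 'c', 'd', 'b']
print(classes)  # {'a': 1, 'b': 0, 'c': 1, 'd': 0}
```

The reverse is not possible: the 0/1 labels alone cannot recover the ordering of "a" and "c", and a ranking alone cannot recover the probabilities.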

17
Q

Pruning

A

Simplifies a decision tree to prevent over-fitting

18
Q

Pre-pruning vs. post-pruning

A

Pre-pruning: Stops growing a branch when information becomes unreliable.
Post-pruning: Takes fully-grown tree and discards unreliable parts.

Post-pruning is preferred!