Week 4 - RFM (Logistic Regression) Flashcards
Examples of Large Databases
- Online transactions (e-commerce)
  - Amazon: 300 million customer accounts
- Web browsing / click-stream data
- Purchases at department / grocery / convenience stores
  - Albert Heijn: 16 million transactions per week
- Subscription data
  - Netflix: 200 million-plus subscribers
Why Data Mining?
- Lots of data being collected
- Computers and technology cheaper and more powerful now
- Gain a competitive edge
- Discover “hidden” info in the data
Data in the real world is dirty. Why?
Missing values
Errors
Discrepancies
Major Tasks in Data Preprocessing
Data cleaning: dealing with missing values, inconsistencies
Data integration: integration of multiple databases
Data transformation (e.g., date and time formats)
Data reduction: reduced volume, same result
What is data mining for?
- Pattern Discovery
  - finding new, useful patterns in datasets
- Relationship Analysis
  - uncovering unexpected relationships and summarizing them
Examples of Database Marketing Applications
Predicting customer response
* Likelihood of future purchase
* Likelihood of churn
* Marketing effectiveness
Market Basket Analysis
Click-stream Analytics
what does RFM stand for?
Recency = Time passed since last purchase
Frequency = Frequency of purchase in a given period
Monetary value = Amount spent on average in a given period
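As a minimal sketch of the three definitions above (the field names, dates, and amounts are made up for illustration, not from the course material), the scores can be computed from a transaction log like this:

```python
from datetime import date

# Hypothetical transaction log: (customer_id, purchase_date, amount)
transactions = [
    ("c1", date(2024, 1, 5), 20.0),
    ("c1", date(2024, 3, 1), 35.0),
    ("c2", date(2024, 2, 10), 50.0),
]
today = date(2024, 3, 15)  # assumed analysis date

def rfm(customer_id):
    rows = [t for t in transactions if t[0] == customer_id]
    recency = (today - max(r[1] for r in rows)).days   # days since last purchase
    frequency = len(rows)                              # purchases in the period
    monetary = sum(r[2] for r in rows) / len(rows)     # average amount spent
    return recency, frequency, monetary

# rfm("c1") -> (14, 2, 27.5)
```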
RFM + limitations (3)
a segmentation technique
- accurate
- easy to use
- can be computed for any database

limitations:
- does not take other factors into account
- predicts the next period only
- past behaviour may be due to PAST marketing activities
What is Logistic Regression + Types of Logistic Regression (2)
Predicts a categorical (non-metric) outcome with two or more categories, e.g. purchased = yes (1) / no (0)
if two categories = Binary Logistic Regression
if more than two = Multinomial Logistic Regression (not covered in this course!)
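The binary model turns a linear combination of the predictors into a 0-1 probability via the logistic function. A sketch with made-up coefficients (b0, bR, bF, bM are illustrative, not estimated from any data):

```python
import math

# Illustrative coefficients (in practice these are estimated from data)
b0, bR, bF, bM = -1.0, -0.02, 0.30, 0.01

def purchase_probability(recency, frequency, monetary):
    # Linear predictor, then logistic transform to a probability in (0, 1)
    z = b0 + bR * recency + bF * frequency + bM * monetary
    return 1 / (1 + math.exp(-z))

p = purchase_probability(recency=10, frequency=5, monetary=40.0)
```

Note the signs: here recency carries a negative coefficient (a longer time since the last purchase lowers the predicted probability), while frequency and monetary value raise it.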
Objectives of Logistic Regression
Identify
- finds which factors (RFM) influence the likelihood of an event. (purchasing)
Predict
- if a customer will buy based on their RFM scores
Logistic Regression Assumptions
No specific distribution of the predictors required
No equal variance (homoscedasticity) needed
Multicollinearity still matters (highly correlated IVs distort the estimates)
Omnibus Test
Is our model a better fit than Block 0? (baseline with no IVs)
sig. result = it's better to use this model than the Block 0 model with no IVs
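The omnibus test is a likelihood-ratio chi-square comparing the fitted model to Block 0. A sketch with made-up log-likelihood values (not real output):

```python
# Likelihood-ratio (omnibus) chi-square: -2 * (LL0 - LLm)
# Illustrative log-likelihoods, not from real output
ll_block0 = -120.0   # Block 0: model with no IVs
ll_model = -100.0    # model with the RFM predictors added
chi_square = -2 * (ll_block0 - ll_model)
# chi_square = 40.0, compared against a chi-square distribution
# with df = number of IVs added (here 3) to get the p-value
```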
Cox and Snell R square / Nagelkerke R square
similar to R square in linear regression
- usefulness of the model
- between (Cox & Snell no.) and (Nagelkerke no.) of the variability in the DV is explained by this set of IVs
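Both pseudo-R-square measures can be written in terms of the log-likelihoods of the null model and the fitted model. A sketch with illustrative values (the log-likelihoods and sample size are made up):

```python
import math

def pseudo_r2(ll_null, ll_model, n):
    # Cox & Snell: 1 - (L_null / L_model)^(2/n), written with log-likelihoods
    cox_snell = 1 - math.exp(2 * (ll_null - ll_model) / n)
    # Nagelkerke rescales Cox & Snell so its maximum possible value is 1
    nagelkerke = cox_snell / (1 - math.exp(2 * ll_null / n))
    return cox_snell, nagelkerke

# Illustrative values, not from real output
cs, nk = pseudo_r2(ll_null=-120.0, ll_model=-100.0, n=200)
```

Nagelkerke is always at least as large as Cox & Snell, which is why the output is read as a range ("between ... and ... of the variability is explained").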
Hosmer and Lemeshow Test
How well the predicted values match the actual observed values of the DV
A non-significant p-value (greater than 0.05) is what you want.
- It means there is no significant difference between the predicted and actual values, indicating the model is a good fit
Classification Accuracy of the model (Predicted vs Observed table)
How well the model predicts whether a purchase is made or not.
(how accurate are the predictions)
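Accuracy comes straight from the predicted-vs-observed (classification) table: correct predictions on the diagonal, divided by the total. A sketch with made-up cell counts:

```python
# Hypothetical classification table: keys are (observed, predicted)
table = {(0, 0): 70, (0, 1): 10,   # observed no-purchase
         (1, 0): 15, (1, 1): 55}   # observed purchase

correct = table[(0, 0)] + table[(1, 1)]   # diagonal cells: 70 + 55 = 125
total = sum(table.values())               # all cases: 150
accuracy = correct / total                # 125 / 150, about 83.3%
```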
Exponentiated coefficients Exp(B)
shows the magnitude and direction of the effect of each IV on DV
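Exp(B) is just e raised to the raw coefficient B, and reads as an odds ratio: values above 1 increase the odds of the event, values below 1 decrease them. A sketch with an illustrative coefficient (not real output):

```python
import math

b_frequency = 0.30                  # illustrative coefficient for Frequency
odds_ratio = math.exp(b_frequency)  # about 1.35
# Each additional purchase multiplies the odds of buying by about 1.35,
# i.e. a roughly 35% increase in the odds per unit of the IV
```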
Wald
similar to the t-test in linear regression: computed as (B/SE)², it tests whether each individual coefficient differs significantly from zero