Stats SA3 Flashcards
What does the data say will happen?
Predictive Analytics
What has happened or what is happening now?
Descriptive Analytics
Why did it happen?
Diagnostic Analytics
What will likely happen?
Predictive Analytics
Predictive Analytics Process Order
Project Design, Data Sampling, Data Exploration, Data Modification, Model Development, Model Validation
Kickoff meeting
Understand modeling objective
Define acceptance criteria
Document data and deployment requirements
Project Design
Data extraction
Apply filters and exclusions
Identify external data sources
Data Sampling
Exploratory data analysis
Identify data dependencies and correlations
Identify trends or anomalies in the data
Data Exploration
Data Cleaning
Data augmentation and transformation
Feature selection
Data Modification
Model performance review
Feedback based on business knowledge and inputs from subject matter experts (SMEs)
Model Validation
Apply different modeling techniques and select final methodology
Model Development
Linear Regression Analysis Formula
y = βx + α + ε
Dependent Variable (Value to be predicted)
y
Beta coefficient (Rate multiplied to X)
β
Independent variable (Value driving prediction)
x
Alpha intercept (Baseline figure for y)
α
Error term (Balancing figure)
ε
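As a minimal sketch of how the pieces of the formula fit together, the snippet below estimates α and β by ordinary least squares and treats the residuals as the estimated error term. The data values and the function name fit_simple_regression are made up for illustration and are not part of the flashcards.

```python
from statistics import mean


def fit_simple_regression(x, y):
    """Return (alpha, beta) for y = beta * x + alpha using ordinary least squares."""
    x_bar, y_bar = mean(x), mean(y)
    # Slope: co-variation of x and y divided by the variation of x
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta = sxy / sxx
    # Intercept: baseline value of y when x is zero
    alpha = y_bar - beta * x_bar
    return alpha, beta


# Hypothetical sample data (assumption, for illustration only)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

alpha, beta = fit_simple_regression(x, y)

# Residuals play the role of the error term: what the fitted line leaves unexplained
residuals = [yi - (alpha + beta * xi) for xi, yi in zip(x, y)]

print(f"alpha = {alpha:.2f}, beta = {beta:.2f}")
print("residuals:", [round(r, 2) for r in residuals])
```

With an intercept in the model, the least-squares residuals sum to zero, which is one way to read the "balancing figure" description of the error term.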
Reason for Including the Error Term (1):
To account for unexplained variability in the dependent variable caused by other relevant independent variables that may not have been included in the model
Reason for Including the Error Term (2):
To capture measurement error in both the dependent and independent variables
You can have more than one predictor variable (x1 through xn)
Multiple Linear Regression
You still need to investigate the model’s _______
goodness-of-fit
You need to test whether your predictors are _______
significant
The _________, R^2, is a goodness-of-fit measure
coefficient of multiple determination
_____ is a figure of merit; the higher the ____, the better the model explains the variation in the response using the set of predictors
R^2
R^2 is normally expressed as a percentage and is interpreted as the amount of _____ in the response explained by the independent variables
variability
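A rough sketch of the R^2 calculation for a one-predictor model, assuming the usual definition R^2 = 1 - SS(residual) / SS(total). The data are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
from statistics import mean

# Hypothetical sample data (assumption, for illustration only)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Fit y = beta * x + alpha by ordinary least squares
x_bar, y_bar = mean(x), mean(y)
beta = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
alpha = y_bar - beta * x_bar
fitted = [alpha + beta * xi for xi in x]

# Total variation in the response, and the part the model leaves unexplained
ss_total = sum((yi - y_bar) ** 2 for yi in y)
ss_residual = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

# Share of the variability in y explained by the predictor
r_squared = 1 - ss_residual / ss_total
print(f"R^2 = {r_squared:.1%}")
```

Here the predictor explains about 60% of the variability in the response; the remaining 40% is left to the error term.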
_____ is a decomposition of the total variation in the response into explained (pattern) and unexplained (error) parts
Analysis of Variance (ANOVA)
The ______ variability is the amount of variation in the response variable that may be attributed to the predictors explicitly stated in the model
explained
The ______ variability is the amount of variation attributed to random error
unexplained
SS refers to _____
Sum of Squares
The df column refers to the _____
degrees of freedom
The df for _____ is always the number of regression parameters minus one
Regression
The df for ______ is the sample size minus the number of regression parameters
Residual
The total df is the ___ of those two degrees of freedom
sum
MS refers to _____
Mean Squares
The values in this column are the ratios of each sum of squares to its respective degrees of freedom
Mean Squares
_______ have no physical meaning but are instrumental in computing the F-statistic
Mean squares
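The SS, df, and MS columns can be sketched for a one-predictor model as follows. This is an illustrative construction under stated assumptions, not the layout of any particular statistics package: the data are made up, the function name anova_table is hypothetical, and the model has two regression parameters (intercept and slope).

```python
from statistics import mean


def anova_table(x, y, n_params=2):
    """Build SS, df, and MS rows for a simple linear regression (intercept + slope)."""
    n = len(y)
    x_bar, y_bar = mean(x), mean(y)
    beta = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
        (xi - x_bar) ** 2 for xi in x
    )
    alpha = y_bar - beta * x_bar
    fitted = [alpha + beta * xi for xi in x]

    # Sum of squares: explained (regression) and unexplained (residual) parts
    ss_reg = sum((fi - y_bar) ** 2 for fi in fitted)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

    # Degrees of freedom as described on the cards
    df_reg = n_params - 1   # number of regression parameters minus one
    df_res = n - n_params   # sample size minus number of regression parameters

    # Mean squares: each sum of squares divided by its degrees of freedom
    ms_reg = ss_reg / df_reg
    ms_res = ss_res / df_res

    return {
        "Regression": {"SS": ss_reg, "df": df_reg, "MS": ms_reg},
        "Residual": {"SS": ss_res, "df": df_res, "MS": ms_res},
        "Total": {"SS": ss_reg + ss_res, "df": df_reg + df_res},
        "F": ms_reg / ms_res,  # F-statistic: ratio of the two mean squares
    }


# Hypothetical sample data (assumption, for illustration only)
table = anova_table([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
for row in ("Regression", "Residual", "Total"):
    print(row, {k: round(v, 2) for k, v in table[row].items()})
print("F =", round(table["F"], 2))
```

The total row is simply the sum of the regression and residual rows, which mirrors the decomposition of total variation into explained and unexplained parts.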
The _____ determines if regression is meaningful for the data at hand
F-test
When the ____ is small, it means that there is at least one significant predictor in the analysis
p-value
When _ is low, __ must go
p, H₀ (the null hypothesis)
The p-value is low if it is less than the ____
α significance level
The _____ helps in assessing if an individual predictor is significant
t-test
In a t-test, if p < 0.05, ____
Significant Predictor
In a t-test, if p > 0.05, ____
Insignificant Predictor
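A small sketch of the t-test for a single slope, using only the standard library. Because the standard library has no t-distribution, the example compares |t| with the two-sided 5% critical value from a standard t-table (3.182 for 3 degrees of freedom) instead of computing a p-value. The two decisions are equivalent: |t| above the critical value corresponds to p < 0.05. The data are hypothetical.

```python
from math import sqrt
from statistics import mean

# Hypothetical sample data (assumption, for illustration only)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(y)

# Fit y = beta * x + alpha by ordinary least squares
x_bar, y_bar = mean(x), mean(y)
sxx = sum((xi - x_bar) ** 2 for xi in x)
beta = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
alpha = y_bar - beta * x_bar

# Residual mean square (MSE) with n - 2 degrees of freedom
ss_res = sum((yi - (alpha + beta * xi)) ** 2 for xi, yi in zip(x, y))
df_res = n - 2
mse = ss_res / df_res

# t-statistic: estimated slope divided by its standard error
se_beta = sqrt(mse / sxx)
t_stat = beta / se_beta

# Two-sided critical value at the 0.05 level for 3 df, taken from a standard t-table
T_CRITICAL_DF3 = 3.182
significant = abs(t_stat) > T_CRITICAL_DF3

print(f"t = {t_stat:.3f}, df = {df_res}")
print("Significant Predictor" if significant else "Insignificant Predictor")
```

With only five observations, the slope here does not reach significance at the 0.05 level, even though the fitted line explains a fair share of the variation. For a one-predictor model the squared t-statistic equals the F-statistic from the ANOVA table.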