FA4 + M4 - Sheet1 Flashcards
Which of the following activities are associated with Data Exploration?
Data cleaning
Data augmentation and transformation
Exploratory data analysis
Feature selection
Identify data dependencies and correlations
Identify trends or anomalies in the data
Exploratory data analysis
Identify data dependencies and correlations
Identify trends or anomalies in the data
Which of the following activities are associated with Data Modification?
Group of answer choices
Data cleaning
Data augmentation and transformation
Exploratory data analysis
Feature selection
Identify data dependencies and correlations
Identify trends or anomalies in the data
Data cleaning
Data augmentation and transformation
Feature selection
(Corrected: the third answer is Feature selection, not "Identify trends or anomalies in the data", which belongs to Data Exploration.)
Which activity involves adding new data points or modifying existing ones to improve the dataset?
Group of answer choices
Data augmentation
Data cleaning
Exploratory data analysis
Feature selection
Data augmentation
Which of the following is NOT typically a part of Data Exploration?
Group of answer choices
Cleaning the data
Identifying data dependencies
Identifying trends in the data
Exploratory data analysis
Cleaning the data
Which activity is crucial for understanding the relationships between different variables in a dataset?
Group of answer choices
Identifying data dependencies and correlations
Data cleaning
Data augmentation
Feature selection
Identifying data dependencies and correlations
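One common way to quantify a dependency between two numeric variables is the Pearson correlation coefficient. The sketch below is a minimal illustration, assuming toy data; the names `pearson_r`, `ad_spend`, and `sales` are hypothetical and do not come from the cards.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical toy data, made up for illustration only
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [2.1, 3.9, 6.2, 7.8, 10.1]
r = pearson_r(ad_spend, sales)
print(f"correlation: {r:.3f}")
```

A value near +1 or -1 suggests a strong linear relationship; a value near 0 suggests little linear relationship.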
What does the data say will happen?
Predictive Analytics
What has happened or what is happening now?
Descriptive Analytics
Why did it happen?
Diagnostic Analytics
What will likely happen?
Predictive Analytics
Predictive Analytics Process:
Project Design
Data Sampling
Data Exploration
Data Modification
Model Development
Model Validation
Project Design
Project Design:
Kickoff meeting
Understand modeling objective
Define acceptance criteria
Document data and deployment requirements
Data Sampling
Data extraction
Apply filters and exclusions
Identify external data sources
Data Exploration
Exploratory data analysis
Identify data dependencies and correlations
Identify trends or anomalies in the data
Data Modification
Data Cleaning
Data augmentation and transformation
Feature selection
Model Validation
Model performance review
Feedback based on business knowledge and inputs from subject matter experts (SMEs)
Model Development
Apply different modeling techniques and select final methodology
Dependent Variable (Value to be predicted)
y
Beta coefficient (Rate multiplied by x)
β
Independent variable (Value driving prediction)
x
Alpha intercept (Baseline figure for y)
α
Error term (Balancing figure)
ε
To account for unexplained variability in the dependent variable caused by other relevant independent variables that may not have been included in the model
Reason for including the error term
To capture measurement error in both the dependent and independent variables
Reason for including the error term
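The cards above name the pieces of the simple linear regression equation y = α + βx + ε. As a minimal sketch, assuming ordinary least squares is the fitting method (the cards do not say which method is used) and using made-up data, the estimates and the residuals (the observed counterpart of ε) can be computed like this. The function name `fit_simple_regression` is hypothetical.

```python
def fit_simple_regression(xs, ys):
    """Least-squares estimates (alpha, beta) for y = alpha + beta * x + error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    beta = sxy / sxx                  # rate multiplied by x
    alpha = mean_y - beta * mean_x    # baseline figure for y
    return alpha, beta

# Hypothetical toy data, made up for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
alpha, beta = fit_simple_regression(xs, ys)

# Residuals: the part of each y that the fitted line does not explain
residuals = [y - (alpha + beta * x) for x, y in zip(xs, ys)]
print(alpha, beta, residuals)
```

With an intercept in the model, least-squares residuals sum to approximately zero, which matches the "balancing figure" description of the error term.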
You can have more than one predictor variable (x1 - xn)
Multiple Linear Regression
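With more than one predictor, the regression equation carries one coefficient per variable. Written with the symbols used in these cards (the subscripts on β are notation assumed here, not taken from the cards):

```latex
y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n + \varepsilon
```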
Training vs. Validation vs. Test Data
Splitting the Dataset
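A minimal sketch of a three-way split follows. The 60/20/20 proportions, the fixed seed, and the name `split_dataset` are assumptions for illustration; the cards do not specify any ratio.

```python
import random

def split_dataset(rows, train_frac=0.6, valid_frac=0.2, seed=0):
    """Shuffle rows and split them into training, validation, and test sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)   # seeded so the split is reproducible
    n_train = round(len(rows) * train_frac)
    n_valid = round(len(rows) * valid_frac)
    train = rows[:n_train]
    valid = rows[n_train:n_train + n_valid]
    test = rows[n_train + n_valid:]     # whatever remains after the first two
    return train, valid, test

# Hypothetical dataset of 100 row indices
train, valid, test = split_dataset(range(100))
print(len(train), len(valid), len(test))
```

The training set is used to fit the model, the validation set to compare candidate models, and the test set for a final check on data the model has not seen.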
Can I use the model already for prediction purposes?
You still need to investigate the model’s ______
You need to prove if your predictors are ____
goodness-of-fit.
significant
The ________ is a goodness-of-fit measure
coefficient of multiple determination, R^2
___ is a figure of merit
R^2
The ____ the R^2, the more successful the model is in explaining the variation in the response using the set of predictors
higher
___ is normally expressed as a percentage and is interpreted as the amount of variability in the response explained by the independent variables
R^2
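As a sketch of how R^2 is computed, the snippet below uses the common definition R^2 = 1 - (residual sum of squares / total sum of squares). The data and the fitted values are hypothetical (they correspond to an assumed fitted line y = 0.05 + 1.99x), and `r_squared` is an illustrative name.

```python
def r_squared(ys, predictions):
    """Share of the variation in ys explained by the predictions."""
    mean_y = sum(ys) / len(ys)
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    ss_resid = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    return 1 - ss_resid / ss_total

# Hypothetical observed values and fitted values from an assumed line
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
predictions = [2.04, 4.03, 6.02, 8.01, 10.0]
r2 = r_squared(ys, predictions)
print(f"R^2 = {r2:.1%}")  # formatted as a percentage
```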
The _____ is a decomposition of the total variation in the response into explained (pattern) and unexplained (error) parts
ANOVA
ANOVA meaning:
Analysis of Variance
The ____ variability is the amount of variation in the response variable that may be attributed to the predictors explicitly stated in the model
explained
The _____ variability is the amount of variation attributed to random error
unexplained
SS refers to
Sum of Squares
There is good fit if the Regression Sum of Squares is ____ than the Residual Sum of Squares
much larger
The df column refers to the ____
degrees of freedom
The df for Regression is always the ________
number of regression parameters minus one
The df for Residual is the sample size minus the _____
number of regression parameters
The total df is the _____
sum of those two degrees of freedom
MS refers to _____.
Mean Squares
The values in this column are the ratio of each sum of squares to its respective degrees of freedom.
Mean Squares
have no physical meaning but are instrumental in computing the F-statistic
Mean Squares
Mean squares have no physical meaning but are instrumental in computing the _____
F-statistic
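The SS, df, MS, and F columns described in the cards above can be sketched in a few lines. This is an illustration under assumptions: the data and fitted values are hypothetical, `n_params` counts the intercept as a parameter, and `anova_table` is a made-up helper name.

```python
def anova_table(ys, predictions, n_params):
    """Build the regression ANOVA quantities: SS, df, MS, and F."""
    n = len(ys)
    mean_y = sum(ys) / n
    # Explained (regression) and unexplained (residual) sums of squares
    ss_reg = sum((p - mean_y) ** 2 for p in predictions)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    df_reg = n_params - 1      # number of regression parameters minus one
    df_res = n - n_params      # sample size minus number of parameters
    ms_reg = ss_reg / df_reg   # mean square = SS / df
    ms_res = ss_res / df_res
    return {
        "ss_reg": ss_reg, "ss_res": ss_res,
        "df_reg": df_reg, "df_res": df_res, "df_total": df_reg + df_res,
        "ms_reg": ms_reg, "ms_res": ms_res,
        "F": ms_reg / ms_res,
    }

# Hypothetical data; predictions come from an assumed least-squares line
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
predictions = [2.04, 4.03, 6.02, 8.01, 10.0]
table = anova_table(ys, predictions, n_params=2)  # intercept + one slope
print(table)
```

Here the regression sum of squares is much larger than the residual sum of squares, so the F value is large, which is the pattern the cards describe as a good fit.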
The ____ determines if regression is meaningful for the data at hand
F-test
When the ____ is small, it means that there is at least one significant predictor in the analysis
p-value
When the p-value is _____, it means that there is at least one significant predictor in the analysis
small
When p is ___, H0 must ___
low, go
The p-value is _____ than the significance level α
low if it is less
The ___ helps in assessing if an individual predictor is significant
t-test
If p < 0.05:
significant predictor
If p > 0.05:
insignificant predictor
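A rough sketch of the individual-predictor check follows, assuming a simple regression with one predictor and made-up data. It computes only the t-statistic for the slope; turning that into a p-value needs the t distribution, which is not in the Python standard library, so the p-values passed to the decision helper below are hypothetical inputs. The names `slope_t_statistic` and `is_significant` are illustrative.

```python
import math

def slope_t_statistic(xs, ys):
    """t-statistic for the slope in y = alpha + beta * x + error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    ss_res = sum((y - (alpha + beta * x)) ** 2 for x, y in zip(xs, ys))
    ms_res = ss_res / (n - 2)            # residual mean square, df = n - 2
    se_beta = math.sqrt(ms_res / sxx)    # standard error of the slope
    return beta / se_beta

def is_significant(p_value, significance_level=0.05):
    """Decision rule from the cards: significant when p is below the level."""
    return p_value < significance_level

# Hypothetical toy data, made up for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
t_stat = slope_t_statistic(xs, ys)
print(t_stat, is_significant(0.01), is_significant(0.20))
```

With a single predictor, the square of this t-statistic equals the F-statistic from the ANOVA table, so the two tests agree in that case.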