C3 Data Science Methodology Flashcards
Define Methodology
A system of methods used in a particular domain.
10 stages of Data Science Methodology?
Get Business understanding.
Determine Analytic approach.
Data requirements.
Data collection.
Data understanding.
Data preparation.
Modeling.
Evaluation.
Deployment.
Feedback.
(1.) Business Understanding asks?
What is the problem that you’re trying to solve?
An (2.) Analytic Approach asks?
How can I use data to answer the question?
or,
What patterns address the question most effectively?
Evaluating (3.) Data Requirements asks?
What data do you need to answer the question?
Planning (4.) Data collection asks?
Where will I get the data that I need, and how will I ingest it?
(5.) Data Understanding asks?
Does the data I’ve collected represent the problem to be solved?
During (6.) Data Preparation ask?
What additional work is required to manipulate and work with the data?
When (7.) Modeling, ask?
When you apply your data visualizations, do you see answers that address the business problem?
During (8.) Evaluation ask?
Does the data model answer the initial business question or must you adjust the data?
During (9.) Deployment ask?
Can you put the model into practice?
When seeking (10.) Feedback, ask?
Can I get constructive feedback from the data and the stakeholder to answer the business question?
What is a cohort?
A group that shares a common characteristic.
What is CRISP-DM?
Cross-Industry Standard Process for Data Mining.
Predictive models do what?
Tell us the probability of a future event based on historical data.
In broad terms, Descriptive Models do what?
Summarize data without making predictions.
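As a minimal sketch of what "summarizing without predicting" can look like, the snippet below computes a few summary statistics with Python's standard library. The `daily_orders` values are made up for illustration and are not from the course material.

```python
import statistics

# Hypothetical sample: daily order counts (invented for illustration).
daily_orders = [12, 15, 11, 19, 15, 22, 15]

# Each entry describes the data that already exists; nothing is forecast.
summary = {
    "count": len(daily_orders),
    "mean": statistics.mean(daily_orders),
    "median": statistics.median(daily_orders),
    "mode": statistics.mode(daily_orders),
    "min": min(daily_orders),
    "max": max(daily_orders),
}
print(summary)
```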
What is a Feature?
A characteristic or attribute developed within the data that helps in solving the problem.
What are Pairwise Correlations?
An analysis to determine the relationships and correlations between different variables.
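One way to make this concrete, assuming the Pearson coefficient is the correlation measure of interest (the card does not name one), is to compute it for every pair of columns. The column names and values below are hypothetical.

```python
from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical columns (invented for illustration).
data = {
    "hours_studied": [1, 2, 3, 4, 5],
    "exam_score": [52, 58, 65, 70, 78],
    "hours_slept": [9, 8, 7, 6, 5],
}

# One coefficient per unordered pair of columns.
pairwise = {
    (a, b): pearson(data[a], data[b])
    for a, b in combinations(data, 2)
}
for pair, r in pairwise.items():
    print(pair, round(r, 3))
```

Values near +1 or -1 suggest a strong linear relationship; values near 0 suggest little linear relationship.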
What is Text Analysis?
Analyzing and manipulating textual data, & extracting meaningful information and patterns.
Descriptive Analytics tells us?
What happened. Hindsight.
Low value & low difficulty.
Provides information.
Diagnostic Analytics tells us?
Why it happened. Insight.
Middling value & difficulty.
Provides information.
Predictive Analytics tells us?
What will happen. Foresight.
High value & high difficulty.
Helps us optimize.
Prescriptive Analytics tells us?
How we can make it happen. Greater foresight.
Highest value & difficulty.
Helps us optimize.
What does ROC curve stand for?
Receiver Operating Characteristic Curve.
First developed during World War II to detect enemy aircraft on radar.
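An ROC curve plots the true positive rate against the false positive rate as the decision threshold varies. The sketch below computes those points by hand; `roc_points`, the labels, and the scores are hypothetical names and data chosen for illustration, and the function assumes both classes are present.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) pairs, starting at (0, 0), one per distinct threshold.

    labels: 1 for positive, 0 for negative.
    scores: higher score means "more likely positive".
    Assumes at least one positive and one negative label.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    # Lower the threshold step by step; each step predicts more positives.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

# Hypothetical classifier output (invented for illustration).
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(labels, scores)
for fpr, tpr in points:
    print(round(fpr, 2), round(tpr, 2))
```

A curve that rises quickly toward the top-left corner indicates a classifier that separates the two classes well.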
Sensitivity formula?
Sensitivity = TP / P = TP / (TP + FN)
Also known as:
Hit Rate
True Positive Rate
Recall
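A quick worked example of the formula, using made-up counts:

```python
def sensitivity(tp, fn):
    """True positive rate: TP / (TP + FN)."""
    return tp / (tp + fn)

# Hypothetical counts: 50 actual positives, 40 of them detected.
print(sensitivity(tp=40, fn=10))
```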
Specificity formula?
Specificity = TN / N = TN / (TN + FP)
Also known as:
Selectivity
True Negative Rate
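A short sketch with made-up counts. It also shows the false positive rate, which is the complement of specificity (FPR = FP / (FP + TN) = 1 - specificity).

```python
def specificity(tn, fp):
    """True negative rate: TN / (TN + FP)."""
    return tn / (tn + fp)

def false_positive_rate(tn, fp):
    """FP / (FP + TN), the complement of specificity."""
    return fp / (fp + tn)

# Hypothetical counts: 100 actual negatives, 90 correctly rejected.
spec = specificity(tn=90, fp=10)
fpr = false_positive_rate(tn=90, fp=10)
print(spec, fpr)
```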
What is PPV?
Positive Predictive Value.
What is NPV?
Negative Predictive Value.
PPV formula?
PPV = TP / (TP + FP)
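As an illustration, PPV can be computed directly from paired labels: of everything predicted positive, what fraction is actually positive? The function name and the data below are hypothetical, and the sketch assumes at least one positive prediction.

```python
def positive_predictive_value(actual, predicted):
    """TP / (TP + FP), with 1 = positive and 0 = negative."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    return tp / (tp + fp)

# Hypothetical labels (invented for illustration).
actual    = [1, 1, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0]

# Three positive predictions, two of them correct.
ppv = positive_predictive_value(actual, predicted)
print(ppv)
```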
What is a Confusion Matrix?
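A table of counts comparing actual classes (rows) with predicted classes (columns):
         | Pred. + | Pred. - |
Actual + |   TP    |   FN    |
Actual - |   FP    |   TN    |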
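A minimal sketch of tallying the four cells from paired labels, keeping the same layout (actual positives in the first row, actual negatives in the second). The function name and sample labels are hypothetical.

```python
def confusion_matrix(actual, predicted):
    """Return [[TP, FN], [FP, TN]] for binary labels (1 = positive, 0 = negative)."""
    tp = fn = fp = tn = 0
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            tp += 1
        elif a == 1 and p == 0:
            fn += 1
        elif a == 0 and p == 1:
            fp += 1
        else:
            tn += 1
    return [[tp, fn], [fp, tn]]

# Hypothetical labels (invented for illustration).
actual    = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]

matrix = confusion_matrix(actual, predicted)
for row in matrix:
    print(row)
```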
NPV formula?
NPV = TN / (TN + FN)
What is a Type I Error?
A False Positive.
What is a Type II Error?
A False Negative.
List the CRISP-DM steps.
Business Understanding.
Data Understanding.
Data Preparation.
Modeling.
Evaluation.
Deployment.