What is Data Science Flashcards
What is ‘Hypothesis-Driven Data Science’?
When you start with a hypothesis and then try to prove or disprove it with the data and evidence you find.
What is ‘Problem-Driven Data Science’?
When you start with a hypothesis or problem and then try to prove or disprove it with the data and evidence you find.
What is ‘Data-Driven Data Science’?
We have a data set and want to find new insights from that data.
What does it mean to ‘Frame the problem’?
To translate the ambiguous problem into a well-defined problem.
And to identify the business priorities and strategy decisions that will influence your work.
What are the steps of processing the data?
Examine the data at a high level:
- Understand every column; Identify errors, missing values & corrupt records.
Clean the data:
- Throw away, replace, and/or filter corrupt, faulty, and missing values.
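A minimal sketch of the examine and clean steps above, using only the Python standard library. The sample data, the column names, and the helper names (load_rows, examine, clean) are made up for illustration, and the cleaning rules (drop rows with a bad age, fill in a missing city) are just one possible choice.

```python
import csv
import io

# Hypothetical raw data: one missing age, one corrupt age, one missing city.
RAW_CSV = """name,age,city
Ana,34,Oslo
Ben,,Bergen
Cy,abc,Oslo
Dee,29,
"""

def load_rows(text):
    """Parse CSV text into a list of dicts, one per record."""
    return list(csv.DictReader(io.StringIO(text)))

def examine(rows):
    """Count empty values per column to get a high-level view of the data."""
    missing = {column: 0 for column in rows[0]}
    for row in rows:
        for column, value in row.items():
            if value is None or value.strip() == "":
                missing[column] += 1
    return missing

def clean(rows, default_city="unknown"):
    """Drop rows whose age is missing or not a number; fill in missing cities."""
    cleaned = []
    for row in rows:
        age = (row["age"] or "").strip()
        if not age.isdigit():
            continue  # throw away records with a corrupt or missing age
        city = (row["city"] or "").strip() or default_city  # replace a missing city
        cleaned.append({"name": row["name"], "age": int(age), "city": city})
    return cleaned

rows = load_rows(RAW_CSV)
missing_counts = examine(rows)
clean_rows = clean(rows)
print(missing_counts)
print(clean_rows)
```

Whether a faulty value should be thrown away or replaced depends on the column and on the problem being solved, so the rules in clean would normally be revisited for each data set.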
What does the step of performing In-Depth Analysis on the data involve?
In-Depth Analysis
Create a predictive model
Evaluate and refine the model
- If needed, return to the steps of examining, cleaning, and exploring the data
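A small sketch of creating and evaluating a predictive model, assuming a straight-line model fitted by ordinary least squares. The data (hours studied vs. exam score), the function names, and the choice of mean absolute error as the evaluation metric are all assumptions made for this example.

```python
# Hypothetical data: (hours studied, exam score) pairs.
data = [(1, 52), (2, 55), (3, 61), (4, 64), (5, 70), (6, 73), (7, 79), (8, 82)]

def fit_line(points):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mean_absolute_error(model, points):
    """Average absolute gap between predicted and actual values."""
    slope, intercept = model
    return sum(abs(y - (slope * x + intercept)) for x, y in points) / len(points)

# Hold out every fourth point so the model is evaluated on data it has not seen.
train = [p for i, p in enumerate(data) if i % 4 != 3]
holdout = [p for i, p in enumerate(data) if i % 4 == 3]

model = fit_line(train)
error = mean_absolute_error(model, holdout)
print(f"slope={model[0]:.2f}, intercept={model[1]:.2f}, holdout MAE={error:.2f}")
```

If the error on the held-out points is too large, that is the signal to refine the model or to go back and re-examine the data.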
Definition of data science
Data Science is an application of scientific methods and principles to data processing. - Henrik Strøm
Framing the problem is the most important step in Data Science. How do you do it accurately?
First of all, make sure you are asking a relevant question.
Put your question into a relevant context.
Ask whether the question can be answered with the resources you have.
What is the process of collecting raw data?
Identify the available datasets and then
extract them to a usable format, e.g. .csv
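As an illustration of extracting raw data to a usable format, here is a sketch that converts a JSON array of records into CSV text with the standard library. The raw data, the column list, and the function name json_records_to_csv are hypothetical; real sources (databases, APIs, files) will each need their own extraction step.

```python
import csv
import io
import json

# Hypothetical raw export, e.g. the body of an API response.
RAW_JSON = '[{"id": 1, "name": "Ana", "city": "Oslo"}, {"id": 2, "name": "Ben"}]'

def json_records_to_csv(raw_json, fieldnames):
    """Convert a JSON array of objects into CSV text with a fixed column order."""
    records = json.loads(raw_json)
    buffer = io.StringIO()
    # restval="" writes an empty cell when a record lacks one of the columns.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()

csv_text = json_records_to_csv(RAW_JSON, ["id", "name", "city"])
print(csv_text)
```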
What does the process of exploring the data involve?
Play around with the data:
- Split, segment and plot data in different ways.
Identify patterns and Extract features:
- Use statistics to identify & test significant variables.
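A rough sketch of the exploration steps above: segment the data, compare the segments, and measure how strongly one variable moves with another. The records, the segment names, and the use of the Pearson correlation coefficient are assumptions for this example; a correlation is only a first look, and a proper significance test would be the next step.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical records: (segment, ad spend, sales).
records = [
    ("north", 10, 25), ("north", 20, 45), ("north", 30, 65),
    ("south", 10, 30), ("south", 20, 35), ("south", 30, 55),
]

def mean_by_segment(rows):
    """Split the rows by segment and return the mean sales for each segment."""
    groups = defaultdict(list)
    for segment, _, sales in rows:
        groups[segment].append(sales)
    return {segment: sum(values) / len(values) for segment, values in groups.items()}

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

segment_means = mean_by_segment(records)
spend = [row[1] for row in records]
sales = [row[2] for row in records]
r = pearson(spend, sales)
print(segment_means)
print(f"correlation between spend and sales: {r:.2f}")
```

A variable that correlates strongly with the outcome is a candidate feature for the later modelling step.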
What does the process of communicating the results involve?
Identify Business Insights
- Return to the business problem
Visualize your findings
- Keep it simple and priority-driven
Tell a clear and actionable story
Effectively communicate to non-technical audiences