Data analytics lifecycle Flashcards
What are the main reasons to use frameworks?
efficient use of time
nothing gets forgotten
scale projects
why use frameworks in data science?
acts as a guide
ensures the focus is on data science (DS), not business intelligence (BI)
needs a collaborative approach
what are the 2 key project roles that get a sponsor presentation?
Business user
project sponsor
what are the 2 key project roles that get the code and technical documents?
data engineer
data scientist
what are the 2 key project roles that get an analyst presentation?
BI analyst
Database administrator
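As a revision aid, the three cards above can be sketched as one lookup table. This is only an illustrative structure: the `DELIVERABLES` dict and the `deliverable_for` helper are hypothetical names, and the role-to-deliverable pairs are copied from the cards as written.

```python
# Deliverable -> the two roles that receive it, as listed on the cards above.
DELIVERABLES = {
    "sponsor presentation": ["business user", "project sponsor"],
    "analyst presentation": ["BI analyst", "database administrator"],
    "code and technical documents": ["data engineer", "data scientist"],
}


def deliverable_for(role: str) -> str:
    """Look up which deliverable a role receives (case-insensitive)."""
    wanted = role.lower()
    for deliverable, roles in DELIVERABLES.items():
        if wanted in (r.lower() for r in roles):
            return deliverable
    raise KeyError(f"no deliverable recorded for role: {role}")


print(deliverable_for("Data Engineer"))  # code and technical documents
```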
what are the 7 key project roles?
business user, project sponsor, project manager, BI analyst, data engineer, database administrator, data scientist (DS)
what is the data lifecycle? (6 phases)
discovery, data prep, model planning, model building, communicate results, operationalise
In discovery what are the seven main areas?
learn business domain, learn from the past, resources, frame the problem, interviewing, formulate initial hypothesis, identify data sources
In discovery (learn the domain), what do you not need to do?
A) determine amount of domain knowledge
B) determine general analytic problem
C) decide what technique to use
D) if you have no idea, conduct research
C) decide what technique to use
In discovery (learn from the past), what do you need to do?
have there been any previous attempts?
why did they fail?
who is a business user?
someone who benefits from end results
who is the project sponsor?
person responsible for genesis of the project
who is a project manager?
ensures key milestones are met
who is the BI analyst?
business domain expert
who is the data engineer?
deep technical skills
who is the DBA?
provisions and configures database
who is the DS?
subject matter expert (SME) in analytical techniques; ensures the overall analytic objectives are met
what is CRISP-DM?
Cross-Industry Standard Process for Data Mining
what are the 6 phases of CRISP-DM?
business understanding, data understanding, data prep, modeling, evaluation, deployment
In discovery (resources), what do you need to assess?
available tech
data
people
time
In discovery (frame the problem), what are the objectives?
Identify the goal
Identify the failure criteria
Identify the success criteria
In discovery (formulate initial hypotheses), what do you need to do? (2)
gather and assess hypotheses
data exploration to inform discussions
In discovery identify data sources what do you need to do? (4)
aggregate sources
review the raw data
determine the structures and tools
scope the kind of data needed
How big should an analytical sandbox be?
about 10x the size of the original data sets
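A rough worked example of the 10x rule, assuming the multiplier is applied to the size of the original data sets. The function name and the 200 GB figure are illustrative only, not from the cards.

```python
def sandbox_size_gb(original_data_gb: float, multiplier: float = 10.0) -> float:
    """Estimate sandbox storage as a multiple of the original data size."""
    return original_data_gb * multiplier


# 200 GB of source data -> plan for roughly 2000 GB of sandbox space.
print(sandbox_size_gb(200))  # 2000.0
```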
In data prep what are the phases? (5)
prepare sandbox, perform ELT, familiarise with the data, data conditioning, survey and visualise
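A minimal sketch of how these data prep steps might fit together, using an in-memory SQLite database as a stand-in for the sandbox. The table names, column names, and sample rows are all made up for illustration. The point is the ELT order: load the raw data first, then transform it inside the sandbox.

```python
import sqlite3

# Hypothetical raw extract: (customer_id, spend) with messy values.
raw_rows = [("c1", " 120.50 "), ("c2", ""), ("c3", "80"), ("c1", " 120.50 ")]

conn = sqlite3.connect(":memory:")  # stands in for the analytic sandbox

# Extract + Load: copy the raw data into the sandbox unchanged.
conn.execute("CREATE TABLE raw_spend (customer_id TEXT, spend TEXT)")
conn.executemany("INSERT INTO raw_spend VALUES (?, ?)", raw_rows)

# Transform (data conditioning) inside the sandbox:
# trim whitespace, drop blank values, remove duplicates, cast to a number.
conn.execute(
    """
    CREATE TABLE clean_spend AS
    SELECT DISTINCT customer_id, CAST(TRIM(spend) AS REAL) AS spend
    FROM raw_spend
    WHERE TRIM(spend) <> ''
    """
)

# Survey: quick summary statistics to get familiar with the data.
summary = conn.execute(
    "SELECT COUNT(*), MIN(spend), MAX(spend), AVG(spend) FROM clean_spend"
).fetchone()
print(summary)
```

Keeping the untouched `raw_spend` table alongside the conditioned one means the original values stay available if a cleaning rule turns out to be wrong.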
in model planning what are the phases? (6)
determine methods, techniques and workflow, data exploration, variable selection, model selection, test & train
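The "test & train" step can be sketched as a simple random split. This is a minimal illustration using only the standard library; the 80/20 ratio, the seed, and the function name are assumptions, not something the cards specify.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(10))  # stand-in for 10 observations
train, test = train_test_split(rows)
print(len(train), len(test))  # 8 2
```

The model is fitted on `train` only, and `test` is held back to check how well it generalises.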
how much time is spent in data prep?
a) 50%
b) 60%
c) 70%
70%
what should you do in communicate results?
make recommendations
compare results
identify key findings
what should you do in operationalise?
run a pilot
assess benefits
implement model
why run a pilot?
make sure the model is robust
what type of tools can be used for phase 2 (data prep)?
SQL
Hadoop
MapReduce
what type of tools can be used for phase 4 (model building)?
R
SQL
SAS