Data analytics lifecycle Flashcards

1
Q

What are the main reasons to use frameworks?

A

efficient use of time
nothing gets forgotten
scale projects

2
Q

why use frameworks in data science?

A

acts as a guide
ensures the focus is on data science (DS), not business intelligence (BI)
data science needs a collaborative approach

3
Q

what are the 2 key project roles that get a sponsor presentation?

A

Business user

project sponsor

4
Q

what are the 2 key project roles that get the code and technical documents?

A

data engineer

data scientist

5
Q

what are the 2 key project roles that get an analyst presentation?

A

BI analyst

Database administrator

6
Q

what are the 7 key project roles?

A
business user
project sponsor
project manager
BI analyst
data engineer
database administrator
data scientist
7
Q

what is the data lifecycle? (6 phases)

A
discovery
data prep
model planning
model building 
communicate results
operationalise
8
Q

In discovery what are the seven main areas?

A
learn business domain
learn from the past
resources 
frame the problem
interviewing 
formulate initial hypothesis
identify data sources
9
Q
In discovery (learn the business domain), what do you not need to do?
A) determine the amount of domain knowledge needed
B) determine the general analytic problem
C) decide what technique to use
D) if you have no idea, conduct research
A

C) decide what technique to use

10
Q

In discovery (learn from the past), what do you need to find out?

A

have there been any previous attempts?

why did they fail?

11
Q

who is a business user?

A

someone who benefits from end results

12
Q

who is the project sponsor?

A

person responsible for genesis of the project

13
Q

who is a project manager?

A

person who ensures key milestones are met

14
Q

who is the BI analyst?

A

business domain expert

15
Q

who is the data engineer?

A

someone with deep technical skills

16
Q

who is the DBA?

A

provisions and configures the database

17
Q

who is the DS?

A

subject matter expert (SME) in analytic techniques who ensures the overall analytic objectives are met

18
Q

what is CRISP-DM?

A

Cross-Industry Standard Process for Data Mining

19
Q

what are the 6 phases of CRISP-DM?

A
business understanding
data understanding
data prep
modeling
evaluation
deployment
20
Q

In discovery (resources), what do you need to assess?

A

available tech
data
people
time

21
Q

In discovery (frame the problem), what are the objectives?

A

What is the goal?
What are the failure criteria?
What are the success criteria?

22
Q

In discovery (formulate initial hypotheses), what do you need to do? (2)

A

gather and assess hypotheses

data exploration to inform discussions

23
Q

In discovery (identify data sources), what do you need to do? (4)

A

aggregate sources
review the raw data
determine the structures and tools
scope the kind of data needed

24
Q

How big is an analytical sandbox?

A

roughly 10x the size of the original datasets
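
As a rough illustration of the 10x rule of thumb, the sketch below sizes a sandbox from a list of source dataset sizes. The function name, the default multiplier, and the sizes are hypothetical and only show the arithmetic.

```python
def sandbox_size_gb(source_sizes_gb, multiplier=10):
    """Estimate sandbox capacity as a multiple of the total source data size."""
    return multiplier * sum(source_sizes_gb)


# Hypothetical source datasets, in gigabytes.
sizes = [20, 5, 75]
estimate = sandbox_size_gb(sizes)
print(f"Plan for about {estimate} GB of sandbox space")
```

The extra room is there because the sandbox holds copies of the raw data plus the intermediate and transformed tables created during analysis.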

25
Q

In data prep, what are the phases? (5)

A
prepare sandbox
perform ELT
familiarise with the data
data conditioning
survey and visualise
26
Q

in model planning what are the phases? (6)

A
determine methods
techniques and workflow
data exploration 
variable selection 
model selection 
test & train
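
The "test & train" step above can be sketched as a simple random split of the data into a training set and a held-out test set. This is a minimal sketch using only the standard library; the function name, the 80/20 ratio, and the seed are assumptions for illustration, not part of the deck.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical data: ten record ids.
train, test = train_test_split(range(10))
print(len(train), len(test))
```

The model is fitted on the training rows only, and the test rows are kept aside to check how well it generalises.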
27
Q

how much time is spent in data prep?

a) 50%
b) 60%
c) 70%

A

c) 70%

28
Q

what should you do in communicate results?

A

make recommendations
compare results
identify key findings

29
Q

what should you do in operationalise?

A

run a pilot
assess benefits
implement model

30
Q

why run a pilot?

A

make sure the model is robust

31
Q

what type of tools can be used for phase 2 (data prep)?

A

SQL
Hadoop
MapReduce

32
Q

what type of tools can be used for phase 4 (model building)?

A

R
SQL
SAS