Data Analyst Concept Flashcards

1
Q

State the five sub-categories of the routine steps in data analysis

A

1) Discovering the problem
2) Data preparation
3) Fitting models to the data
4) Understanding the results
5) Sharing your work

2
Q

Describe the two elements of a problem hypothesis

A

Problem statement - outlines the problem in simple real world terms.
Hypothesis - a simple, testable theory that addresses the problem

3
Q

Name 5 requirements for a good hypothesis

A

1) Must be testable and written in unambiguous language
2) Must at least partly answer the problem statement
3) Must make at least one clear prediction
4) Must be based on relevant and reliable information
5) Must contain a dependent and an independent variable

4
Q

Explain the difference between a dependent and an independent variable

A

The independent variable is the one that changes on its own or that we deliberately vary, such as time, and we record it as it changes. The dependent variable is the outcome we measure, and its value depends on the independent variable.

5
Q

Explain the difference between a hypothesis and a prediction

A

A hypothesis suggests a broad trend, e.g. the more time a customer spends on an online shop, the more likely they are to buy an item. A prediction states a specific, measurable trend, e.g. for every 2 extra minutes spent, the customer is 5% more likely to buy an item.

6
Q

What does ETL stand for?

A

Extract, transform, load
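
As a rough illustration of the three stages, here is a minimal sketch that uses only the Python standard library. The CSV text, column names, and the `visits` table are hypothetical, and dropping rows with a missing value is just one possible transform rule.

```python
import csv
import io
import sqlite3

# Hypothetical raw data; in practice this might come from a file or an API.
RAW_CSV = """customer,minutes_on_site,purchased
alice,12.5,yes
bob,,no
carol,3.0,no
"""

def extract(text):
    """Extract: read the raw CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop rows with missing minutes and convert the types."""
    cleaned = []
    for row in rows:
        if not row["minutes_on_site"]:
            continue  # assumed rule: skip incomplete rows
        cleaned.append(
            (row["customer"], float(row["minutes_on_site"]), row["purchased"] == "yes")
        )
    return cleaned

def load(records, conn):
    """Load: write the cleaned records into a database table."""
    conn.execute("CREATE TABLE visits (customer TEXT, minutes REAL, purchased INTEGER)")
    conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute(
    "SELECT customer, minutes, purchased FROM visits ORDER BY customer"
).fetchall()
print(loaded)
```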

7
Q

Define a ‘mathematical model’

A

A model that describes some features of the data using equations and parameters.
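
One simple example of such a model is a straight line, y = slope * x + intercept, where the slope and intercept are the parameters. The sketch below fits those two parameters by least squares; the data values and the `fit_line` helper are hypothetical and only meant to illustrate the idea.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical observations, invented for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]

slope, intercept = fit_line(xs, ys)
print(f"y = {slope:.2f} * x + {intercept:.2f}")
```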

8
Q

State the four steps involved in Cross-Validation

A

1) Randomly split the original dataset into two sets: Training Data and Testing Data
2) Fit the model (trend line) to the Training Data
3) Plot the Testing Data against the trendline
4) Assess how well the trendline predicts the Testing Data
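
The four steps above can be sketched as follows for a single random split. The synthetic dataset, the 75%/25% split ratio, and the use of mean squared error as the measure of fit are all assumptions made for this illustration; instead of plotting, the sketch compares predictions with the Testing Data numerically.

```python
import random

def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical dataset: y is roughly 2x + 1 plus random noise.
data = [(x, 2 * x + 1 + rng.gauss(0, 0.5)) for x in range(40)]

# Step 1: randomly split into Training and Testing Data (75% / 25% assumed).
rng.shuffle(data)
split = int(0.75 * len(data))
train, test = data[:split], data[split:]

# Step 2: fit the trend line to the Training Data only.
slope, intercept = fit_line([x for x, _ in train], [y for _, y in train])

# Steps 3 and 4: compare the trend line's predictions with the Testing Data.
test_mse = sum((y - (slope * x + intercept)) ** 2 for x, y in test) / len(test)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, test MSE={test_mse:.3f}")
```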

9
Q

How can you identify a ‘good’ mathematical model?

A

The model will accurately predict the testing data, even when fitted to a small amount of training data

It will look similar even if you fit it to different randomly chosen samples of training data
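
One way to check the second point is to refit the model on several random training samples and see how much the fitted parameter moves. This is only a sketch: the synthetic data, the sample size of 10, and the 20 repetitions are arbitrary assumptions.

```python
import random

def fit_slope(points):
    """Least-squares slope of a straight line through (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical dataset: y is roughly 2x + 1 plus random noise.
data = [(x, 2 * x + 1 + rng.gauss(0, 0.5)) for x in range(50)]

# Fit the same model to 20 different random training samples of 10 points.
slopes = [fit_slope(rng.sample(data, 10)) for _ in range(20)]

# A small spread suggests the model looks similar whichever sample is used.
spread = max(slopes) - min(slopes)
print(f"slopes range from {min(slopes):.2f} to {max(slopes):.2f}")
```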

10
Q

How can you identify a ‘bad’ mathematical model?

A

The model's predictions for the testing data will be unreliable

It will look very different each time you fit it to a new randomly selected sample of training data

11
Q

Define ‘null hypothesis’

A

The null hypothesis is the theory that whatever relationship you are studying is not due to a real effect, but is observed only because of random sampling
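
One common way to test a null hypothesis is a permutation test: shuffle the group labels many times and count how often chance alone produces a difference at least as large as the one observed. The sketch below assumes two hypothetical groups of measurements and uses the difference in means as the test statistic; other tests and statistics are equally valid choices.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_p_value(a, b, n_permutations=5000, seed=0):
    """Estimate how often random relabelling gives a mean difference
    at least as large as the observed one (two-sided)."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            extreme += 1
    return extreme / n_permutations

# Hypothetical measurements for two groups, invented for illustration.
group_a = [12, 15, 14, 16, 13, 17, 15, 14]
group_b = [9, 10, 11, 8, 10, 12, 9, 11]

p_value = permutation_p_value(group_a, group_b)
print(f"estimated p-value: {p_value:.4f}")
```

A small p-value means the observed difference would rarely arise from random sampling alone, which counts as evidence against the null hypothesis.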

12
Q

Define ‘alternative hypothesis’

A

The opposite of the null hypothesis: the observed relationship is due to a real effect rather than random sampling
