Data Analyst Concept Flashcards

Question 1

Q

State the five sub-categories of the routine steps in data analysis

Answer

A

1) Discovering the problem
2) Data preparation
3) Fitting models to the data
4) Understanding the results
5) Sharing your work

Question 2

Q

Describe the two elements of a problem hypothesis

Answer

A

Problem statement - outlines the problem in simple, real-world terms.
Hypothesis - a simple, testable theory that addresses the problem.

Question 3

Q

Name 5 requirements for a good hypothesis

Answer

A

1) Must be testable and written in non-ambiguous language
2) Must at least partly answer the problem statement
3) Must make at least one clear prediction
4) Must be based on relevant and reliable information
5) Must contain a dependent and an independent variable

Question 4

Q

Explain the difference between a dependent and an independent variable

Answer

A

The independent variable is the one that changes on its own or that we deliberately vary, such as time, and it is treated as the cause. The dependent variable is the outcome we measure, and its value depends on the independent variable.

Question 5

Q

Explain the difference between a hypothesis and a prediction

Answer

A

A hypothesis suggests a broad trend, e.g. the more time a customer spends on an online shop, the more likely they are to buy an item, whereas a prediction states a specific, measurable outcome, e.g. for every extra 2 minutes spent on the shop, the customer is 5% more likely to buy an item.

Question 6

Q

What does ETL stand for?

Answer

A

Extract, transform, load
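
As a rough illustration of the three stages, here is a minimal sketch using only the Python standard library. The CSV text, the column names, and the `visits` table are hypothetical, invented for this example; a real pipeline would read from and write to actual systems.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in practice this would come from a file or an API.
RAW_CSV = """customer,minutes_on_site,purchased
alice, 12.5 ,yes
bob,3,no
carol,,yes
"""

def extract(text):
    """Extract: read the raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop incomplete rows, trim whitespace, convert types."""
    cleaned = []
    for row in rows:
        minutes = row["minutes_on_site"].strip()
        if not minutes:
            continue  # skip records with a missing measurement
        purchased = int(row["purchased"].strip() == "yes")
        cleaned.append((row["customer"].strip(), float(minutes), purchased))
    return cleaned

def load(records, conn):
    """Load: write the cleaned records into the destination table."""
    conn.execute(
        "CREATE TABLE visits (customer TEXT, minutes REAL, purchased INTEGER)"
    )
    conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
load(transform(extract(RAW_CSV)), conn)
row_count = conn.execute("SELECT COUNT(*) FROM visits").fetchone()[0]
```

Keeping the stages as separate functions means each one can be checked on its own before the data reaches the destination.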

Question 7

Q

Define a ‘mathematical model’

Answer

A

A model that describes some features of the data using equations and parameters.
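
A straight trend line is one of the simplest examples. In the sketch below the equation is y = slope * x + intercept, and the two parameters are `slope` and `intercept`. The function name and the parameter values are assumptions chosen only for illustration.

```python
def linear_model(x, slope, intercept):
    """The model's equation: y = slope * x + intercept."""
    return slope * x + intercept

# Hypothetical parameter values; in practice they are estimated from the data.
params = {"slope": 2.0, "intercept": 1.0}

predicted = [linear_model(x, **params) for x in [0, 1, 2, 3]]
print(predicted)  # [1.0, 3.0, 5.0, 7.0]
```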

Question 8

Q

State the four steps involved in Cross-Validation

Answer

A

1) Randomly split the original dataset into two non-overlapping sets - Training Data and Testing Data
2) Fit the model (trend line) to the Training Data
3) Plot the Testing Data against the trend line
4) Assess how well the trend line predicts the Testing Data
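
The four steps can be sketched in plain Python as follows. This is a minimal illustration on synthetic data (generated here as y = 3x + 2 plus noise, which is an assumption of the example), using a least-squares straight line as the model. Plotting is left out, so step 3 is replaced by comparing predictions with the Testing Data numerically. Note that this shows a single random split; cross-validation in the stricter sense repeats the procedure over several different splits.

```python
import random

def fit_line(points):
    """Ordinary least-squares straight line; returns (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mean_squared_error(points, slope, intercept):
    """Average squared gap between the line's predictions and the data."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in points) / len(points)

rng = random.Random(0)  # fixed seed so the example is repeatable
# Synthetic data, assumed for illustration: y = 3x + 2 plus random noise.
data = [(x, 3 * x + 2 + rng.gauss(0, 1)) for x in range(40)]

# Step 1: randomly split into Training Data and Testing Data.
shuffled = data[:]
rng.shuffle(shuffled)
training, testing = shuffled[:30], shuffled[30:]

# Step 2: fit the trend line to the Training Data only.
slope, intercept = fit_line(training)

# Steps 3 and 4: compare the line with the Testing Data and score the fit.
test_error = mean_squared_error(testing, slope, intercept)
print(f"slope={slope:.2f} intercept={intercept:.2f} test MSE={test_error:.2f}")
```

A low error on the Testing Data, which the model never saw during fitting, is the evidence that the trend line generalises.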

Question 9

Q

How can you identify a ‘good’ mathematical model?

Answer

A

The model will predict the testing data well, even when fitted to a small amount of training data

It will look similar even if you randomly pick different samples of training data

Question 10

Q

How can you identify a ‘bad’ mathematical model?

Answer

A

The model will predict the testing data poorly or erratically

It will look very different each time you fit it to randomly selected training data
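
One way to check both signs is to refit the model on several random training samples and look at how much the fitted parameters and the testing error move around. The sketch below does this for a straight-line model on synthetic data; the data, the helper names, and the sample sizes are all assumptions made for illustration.

```python
import random

def fit_line(points):
    """Ordinary least-squares straight line; returns (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mean_squared_error(points, slope, intercept):
    """Average squared gap between the line's predictions and the data."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in points) / len(points)

def refit_on_random_splits(data, n_train, repeats, seed=0):
    """Fit on several random training samples; score each fit on the rest."""
    rng = random.Random(seed)
    results = []
    for _ in range(repeats):
        shuffled = data[:]
        rng.shuffle(shuffled)
        training, testing = shuffled[:n_train], shuffled[n_train:]
        slope, intercept = fit_line(training)
        error = mean_squared_error(testing, slope, intercept)
        results.append((slope, intercept, error))
    return results

rng = random.Random(1)
# Synthetic data, assumed for illustration: y = 3x + 2 plus random noise.
data = [(x, 3 * x + 2 + rng.gauss(0, 1)) for x in range(40)]

# Deliberately small training samples: 10 of the 40 points each time.
results = refit_on_random_splits(data, n_train=10, repeats=20)
slopes = [slope for slope, _, _ in results]
errors = [error for _, _, error in results]

slope_spread = max(slopes) - min(slopes)  # small spread: the model looks similar each time
worst_error = max(errors)                 # low worst case: predictions stay reliable
print(f"slope spread={slope_spread:.3f} worst test MSE={worst_error:.2f}")
```

A small spread and a consistently low testing error point towards a good model; a large spread or erratic errors point towards a bad one.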

Question 11

Q

Define ‘null hypothesis’

Answer

A

The null hypothesis is the theory that whatever relationship you are studying is not due to a real effect, but is observed only because of random sampling (chance).

Question 12

Q

Define ‘alternative hypothesis’

Answer

A

The opposite of the null hypothesis: the observed relationship is due to a real effect rather than random sampling.
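
A permutation test is one common way to weigh the null hypothesis against the alternative. Shuffling one variable breaks any real link between the two, so the shuffled data show how strong a relationship chance alone tends to produce. The sketch below is a minimal illustration, not a full statistical workflow; the `minutes` and `spend` values are synthetic and the function names are assumptions of this example.

```python
import random

def correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def permutation_p_value(xs, ys, shuffles=2000, seed=0):
    """Share of random shuffles whose correlation is at least as strong
    as the observed one. A small value is evidence against the null."""
    rng = random.Random(seed)
    observed = abs(correlation(xs, ys))
    shuffled = list(ys)
    extreme = 0
    for _ in range(shuffles):
        rng.shuffle(shuffled)  # destroys any real x-y relationship
        if abs(correlation(xs, shuffled)) >= observed:
            extreme += 1
    # The +1 terms count the observed data as one of the arrangements.
    return (extreme + 1) / (shuffles + 1)

rng = random.Random(2)
# Synthetic data, assumed for illustration: spend rises with minutes on site.
minutes = [rng.uniform(1, 30) for _ in range(50)]
spend = [2 * m + rng.gauss(0, 5) for m in minutes]

p_value = permutation_p_value(minutes, spend)
print(f"p-value = {p_value:.4f}")
```

If almost no shuffle matches the observed correlation, chance is an unlikely explanation and the alternative hypothesis becomes more plausible.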

(12 cards)