Data Analysis Pipeline Flashcards

1
Q

What are the 5 steps in a typical data analysis pipeline?

A
  1. Figure out the question.
  2. Find/acquire relevant data.
  3. Clean & prepare the data.
  4. Analyze the data.
  5. Interpret & present results.
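
As a rough illustration, the five steps can be sketched as one small function per step. The question, the in-memory CSV, and all function names here are hypothetical stand-ins chosen for the example, not part of the original card.

```python
import csv
import io
import statistics

# Step 1: the question (hypothetical): what is the average temperature?

# Step 2: acquire data. An in-memory CSV stands in for a real file here.
RAW_CSV = "city,temp_c\nA,20.5\nB,\nC,22.5\n"

def acquire(text):
    # Parse the CSV text into a list of row dictionaries.
    return list(csv.DictReader(io.StringIO(text)))

def clean(rows):
    # Step 3: drop rows with a missing reading and convert strings to floats.
    return [float(r["temp_c"]) for r in rows if r["temp_c"].strip()]

def analyze(values):
    # Step 4: compute the statistic that answers the question.
    return statistics.mean(values)

def present(result):
    # Step 5: report the result in a readable form.
    return f"Average temperature: {result:.1f} C"

summary = present(analyze(clean(acquire(RAW_CSV))))
print(summary)
```

Keeping each step in its own function makes it easier to rerun or test one stage without touching the others.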
2
Q

It’s not unusual to spend most of your time _______ ______ when processing data.

A

cleaning data

3
Q

What are some different ways to get data for the question you're attempting to answer?

A
  • Files (CSV, Excel, XML, etc.)
  • API
  • DB
  • A sensor
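
A minimal sketch of three of these sources using only the Python standard library. The sample data, table name, and field names are made up for illustration. The "API" part parses a JSON string that stands in for a response body, so no network request is made.

```python
import csv
import io
import json
import sqlite3

# File: parse CSV text (an in-memory string stands in for a file on disk).
csv_rows = list(csv.DictReader(io.StringIO("id,value\n1,10\n2,20\n")))

# API: web APIs commonly return JSON; this string stands in for a response body.
api_rows = json.loads('[{"id": 3, "value": 30}]')

# DB: query an in-memory SQLite database (hypothetical "readings" table).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER, value INTEGER)")
conn.execute("INSERT INTO readings VALUES (4, 40)")
db_rows = conn.execute("SELECT id, value FROM readings").fetchall()
conn.close()

print(csv_rows, api_rows, db_rows)
```

Note that the CSV reader returns every field as a string, while JSON and SQLite preserve numeric types, which is one reason a cleaning step usually follows.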
4
Q

When first retrieving data, it may have some inconsistencies that need to be cleaned such as:

  • Irrelevant things
  • Different ____ for similar values
  • Different _______ in files
  • _____ wrong
A

units
formats
shaped
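
A small sketch of cleaning these kinds of inconsistencies, assuming hypothetical records where heights arrive in either centimetres or inches and dates arrive in two formats. The field names and the month-first reading of slash-separated dates are assumptions made for the example.

```python
from datetime import datetime

# Hypothetical raw records: mixed units, mixed date formats, one irrelevant field.
records = [
    {"date": "2021-03-01", "height": "180", "unit": "cm", "note": "n/a"},
    {"date": "03/02/2021", "height": "70", "unit": "in", "note": "n/a"},
]

# Assumption: slash-separated dates are month/day/year.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")

def parse_date(text):
    # Try each known format in turn until one matches.
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")

def to_cm(value, unit):
    # Convert inches to centimetres; centimetre values pass through.
    return float(value) * 2.54 if unit == "in" else float(value)

# Keep only the relevant fields, in one unit and one date type.
cleaned = [
    {"date": parse_date(r["date"]),
     "height_cm": round(to_cm(r["height"], r["unit"]), 1)}
    for r in records
]
print(cleaned)
```

Raising an error on an unknown date format, rather than guessing, keeps unexpected inputs visible.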

5
Q

When first retrieving data, it may contain values that are simply incorrect. What are some kinds of incorrect values, and what causes them?

A
  • Missing values (failed sensor, incomplete collection, etc.)
  • Outliers (data entry errors, etc.)
  • Noise
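
One possible way to detect the first two problems, sketched with the standard library. The readings are invented, missing values are assumed to be stored as None, and the outlier rule (more than 5 median absolute deviations from the median) is an arbitrary threshold chosen for illustration, not a method prescribed by the card.

```python
import statistics

# Hypothetical sensor readings; None marks a missing value.
readings = [10.1, 9.8, None, 10.3, 55.0, 9.9, 10.0]

# Missing values: separate the readings that are actually present.
present = [r for r in readings if r is not None]
missing_count = len(readings) - len(present)

# Outliers: compare each reading's distance from the median with the
# median absolute deviation (MAD), which is robust to extreme values.
median = statistics.median(present)
mad = statistics.median([abs(r - median) for r in present])
THRESHOLD = 5  # assumption: flag anything beyond 5 MADs
outliers = [r for r in present if abs(r - median) > THRESHOLD * mad]

print(missing_count, outliers)
```

Noise is harder to isolate this way, since it affects every reading slightly rather than a few readings badly.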
6
Q

The full data analysis pipeline is often broken into steps. What are some reasons for this?

A
  • Don’t want to spam an API every time we process data
  • Test runs might take too long
  • Intermediate results might be meaningful
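
A minimal sketch of the first two reasons: save the raw result of an expensive step to a file so later runs can reuse it. The function fetch_from_api is a hypothetical stand-in for a real API request, and the cache file name is arbitrary.

```python
import json
import tempfile
from pathlib import Path

fetch_log = []  # records each simulated API call

def fetch_from_api():
    # Hypothetical stand-in for a slow or rate-limited API request.
    fetch_log.append("call")
    return [{"id": 1, "value": 10}, {"id": 2, "value": 20}]

def load_raw(cache_path):
    # Reuse the saved intermediate file if it exists; otherwise fetch and save.
    if cache_path.exists():
        return json.loads(cache_path.read_text())
    data = fetch_from_api()
    cache_path.write_text(json.dumps(data))
    return data

with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "raw.json"
    first = load_raw(cache)   # calls the API and writes the cache
    second = load_raw(cache)  # reads the cache, no API call

print(len(fetch_log), first == second)
```

The saved file is also an intermediate result that can be inspected on its own when a later step misbehaves.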
7
Q

The full data pipeline is not always obvious. In the end you may need to run ______ programs. You should always _______ your code so you know how things should be done.

A

multiple, document

8
Q

What are some reasons you might have manual steps in your pipeline?

A
  • Easier to do by hand than to automate
  • Most cases can be handled automatically, but some outliers may be left for manual intervention
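
A sketch of mixing automatic and manual handling: values that match a known mapping are resolved automatically, and anything else is set aside for a person to review. The KNOWN mapping and the sample names are hypothetical.

```python
# Hypothetical mapping from known spellings to a canonical name.
KNOWN = {"vancouver": "Vancouver", "van": "Vancouver", "burnaby": "Burnaby"}

def split_for_review(raw_names):
    # Resolve recognized names automatically; queue the rest for manual review.
    resolved, needs_review = [], []
    for name in raw_names:
        key = name.strip().lower()
        if key in KNOWN:
            resolved.append(KNOWN[key])
        else:
            needs_review.append(name)
    return resolved, needs_review

resolved, needs_review = split_for_review(
    [" Vancouver", "VAN", "Burnaby", "Vancuver"]
)
print(resolved, needs_review)
```

Writing the review queue to a file keeps the manual step small and documented instead of hidden inside the code.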
