COURSERA - Getting and cleaning data Flashcards

1
Q

THE COURSE GOAL

A

Raw data -> Processing script -> tidy data -> data analysis -> data communication

2
Q

DEFINING DATA

A

Start with a SET OF ITEMS (the population)

Determine VARIABLES that need to be measured

Determine which type of value each VARIABLE takes: QUALITATIVE or QUANTITATIVE

QUALITATIVE: sex, country of origin, etc.

QUANTITATIVE: height, weight, blood pressure, etc.
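
As a rough illustration of the steps above, the sketch below takes a small made-up population and labels each variable by the type of value it holds. The records, the field names and the `variable_kind` helper are all hypothetical. The rule used (numeric means QUANTITATIVE) is only a heuristic: numeric codes such as postal codes are still qualitative.

```python
from numbers import Number

# Hypothetical population: each item is one person, each key is a variable.
population = [
    {"sex": "F", "country": "Norway", "height_cm": 168.0, "weight_kg": 61.5},
    {"sex": "M", "country": "Chile", "height_cm": 181.0, "weight_kg": 79.0},
    {"sex": "F", "country": "Japan", "height_cm": 159.5, "weight_kg": 52.3},
]


def variable_kind(values):
    """Heuristic: all-numeric values are QUANTITATIVE, anything else QUALITATIVE."""
    numeric = all(
        isinstance(v, Number) and not isinstance(v, bool) for v in values
    )
    return "QUANTITATIVE" if numeric else "QUALITATIVE"


# Collect the values of each variable across the population and classify them.
kinds = {
    name: variable_kind([item[name] for item in population])
    for name in population[0]
}

for name, kind in kinds.items():
    print(f"{name}: {kind}")
```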

3
Q

RAW vs PROCESSED DATA

A

Data is deemed RAW or PROCESSED depending on the analysis required.

RAW data is still in its original format and needs processing before the planned analysis can be run.

Processing data involves operations such as merging, subsetting, transforming, etc.

Processing steps need to be recorded and transmitted to the analysis stage.

PROCESSED data is ready for the planned analysis.
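
A minimal sketch of the three operations named above (merging, subsetting, transforming), with each step written to a log so it can be handed to the analysis stage. The two raw tables, the field names, the 170 cm cutoff and the BMI transform are made up for illustration.

```python
# Hypothetical raw inputs: two tables that share an "id" variable.
raw_people = [
    {"id": 1, "name": "Ana", "height_cm": 170},
    {"id": 2, "name": "Ben", "height_cm": 182},
    {"id": 3, "name": "Cy", "height_cm": 165},
]
raw_weights = [
    {"id": 1, "weight_kg": 65.0},
    {"id": 2, "weight_kg": 90.0},
]

steps = []  # record of every processing step, passed on with the data

# Merging: keep only people that also have a weight (an inner join on id).
weight_by_id = {row["id"]: row["weight_kg"] for row in raw_weights}
merged = [
    {**person, "weight_kg": weight_by_id[person["id"]]}
    for person in raw_people
    if person["id"] in weight_by_id
]
steps.append("merged people with weights on id (inner join)")

# Subsetting: keep rows that meet a condition.
subset = [row for row in merged if row["height_cm"] >= 170]
steps.append("kept rows with height_cm >= 170")

# Transforming: derive a new variable from existing ones.
processed = [
    {**row, "bmi": round(row["weight_kg"] / (row["height_cm"] / 100) ** 2, 1)}
    for row in subset
]
steps.append("added bmi = weight_kg / (height_cm / 100)^2, rounded to 1 decimal")

for step in steps:
    print(step)
print(processed)
```

Keeping the log next to the data is one simple way to make the processing reproducible; a processing script kept under version control serves the same purpose.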

4
Q

DATA PROCESSING PIPELINE

A

A pipeline is a set of data processing elements connected in series, where the output of one element is the input of the next one. The elements of a pipeline are often executed in parallel or in time-sliced fashion; in that case, some amount of buffer storage is often inserted between elements.
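
The definition above can be sketched with Python generators, where each element consumes the output of the previous one. The stage names and the sample input are hypothetical. Generators are lazy, so values flow through the stages one at a time, which loosely resembles the interleaved execution described; no real parallelism or explicit buffering is shown here.

```python
def strip_blank(lines):
    """Stage 1: trim whitespace and drop empty lines."""
    for line in lines:
        line = line.strip()
        if line:
            yield line


def to_numbers(lines):
    """Stage 2: convert each cleaned line to a float."""
    for line in lines:
        yield float(line)


def scale(values, factor):
    """Stage 3: multiply every value by a constant factor."""
    for value in values:
        yield value * factor


# Hypothetical raw input, as it might arrive from a text file.
raw = [" 1.5 ", "", "2.5", "   ", "4"]

# Connect the elements in series: each stage's output feeds the next stage.
pipeline = scale(to_numbers(strip_blank(raw)), factor=2)

result = list(pipeline)
print(result)
```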
