Data Cleaning Flashcards

1
Q

Dirty Data

A

Data that’s incomplete, incorrect, or irrelevant to the problem you’re trying to solve.

2
Q

Clean Data

A

Data that’s complete, correct, and relevant to the problem you’re trying to solve.

3
Q

Data Engineers

A

Transform data into a useful format for analysis and give it a reliable infrastructure.

4
Q

Data Warehousing Specialists

A

Develop processes and procedures to effectively store and organise data.

5
Q

Null

A

An indication that a value does not exist in a dataset. A null is not the same as a zero.
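A minimal Python sketch of the difference, assuming hypothetical sales figures in which `None` stands in for a null:

```python
# Hypothetical daily sales; None marks a day with no recorded value (a null).
daily_sales = [120, 0, None, 95]

# A zero is a real measurement; a null is a missing one.
recorded = [v for v in daily_sales if v is not None]
null_count = len(daily_sales) - len(recorded)

# Averaging only the recorded values gives a different result
# from treating the null as if it were zero.
mean_ignoring_nulls = sum(recorded) / len(recorded)
mean_nulls_as_zero = sum(v or 0 for v in daily_sales) / len(daily_sales)
```

The two means differ, which is why a null should not be silently replaced with zero.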

6
Q

Field

A

A single piece of information from a row or a column of a spreadsheet.

7
Q

Field Length

A

A tool for determining how many characters can be keyed into a field.
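As a rough illustration, a field-length check can be sketched in Python. The two-character limit and the sample entries are assumptions made up for this example:

```python
# Hypothetical limit: a field that accepts at most 2 characters (e.g. a state code).
MAX_LENGTH = 2

def fits_field_length(value: str, max_length: int = MAX_LENGTH) -> bool:
    """Return True if the value is within the field's character limit."""
    return len(value) <= max_length

entries = ["NY", "CA", "Texas"]

# Collect the entries that exceed the limit so they can be reviewed.
too_long = [e for e in entries if not fits_field_length(e)]
```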

8
Q

Data Validation

A

A tool for checking the accuracy and quality of data before adding or importing it.
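One way to picture this is a small Python check run on rows before they are imported. The rules here (a non-empty name, an integer age from 0 to 120) and the function name `is_valid_row` are hypothetical:

```python
def is_valid_row(row: dict) -> bool:
    """Hypothetical rules: name is non-empty, age is an integer from 0 to 120."""
    name = row.get("name")
    age = row.get("age")
    if not isinstance(name, str) or not name.strip():
        return False
    return isinstance(age, int) and 0 <= age <= 120

incoming = [
    {"name": "Ana", "age": 34},
    {"name": "", "age": 28},      # fails: empty name
    {"name": "Lee", "age": 250},  # fails: age out of range
]

# Only rows that pass validation are added; the rest are set aside for review.
accepted = [r for r in incoming if is_valid_row(r)]
rejected = [r for r in incoming if not is_valid_row(r)]
```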

9
Q

Data-cleaning tools and techniques

A
  • Remove duplicates
  • Remove irrelevant data
  • Remove extra spaces and blanks
  • Fix misspellings
  • Fix inconsistent capitalisation
  • Fix incorrect punctuation and other typos
  • Remove formatting (most spreadsheet apps have a ‘clear format’ tool in the toolbar)
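
Several of the techniques above (extra spaces, blanks, inconsistent capitalisation, duplicates) can be sketched in a few lines of Python. The sample names and the `clean_name` helper are hypothetical:

```python
raw_names = ["  alice  SMITH", "Bob Jones ", "alice smith", "", "bob  jones"]

def clean_name(value: str) -> str:
    """Collapse extra spaces and apply consistent capitalisation."""
    return " ".join(value.split()).title()

cleaned = []
for name in raw_names:
    name = clean_name(name)
    # Skip blanks and values already seen (duplicates).
    if name and name not in cleaned:
        cleaned.append(name)
```

Note that duplicates only become detectable after spacing and capitalisation are standardised, so the order of the steps matters.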
10
Q

Data merging

A

The process of combining two or more datasets into a single dataset.
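A minimal sketch in plain Python, assuming two hypothetical datasets that share an `id` field:

```python
# Hypothetical datasets that share an "id" field.
customers = [{"id": 1, "name": "Ana"}, {"id": 2, "name": "Lee"}]
orders = [{"id": 1, "total": 40.0}, {"id": 2, "total": 15.5}]

# Index one dataset by the shared key, then combine matching rows.
totals_by_id = {o["id"]: o["total"] for o in orders}
merged = [{**c, "total": totals_by_id.get(c["id"])} for c in customers]
```

A customer with no matching order would get a `total` of `None` here, which is a null and not a zero.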

11
Q

Compatibility

A

How well two or more datasets are able to work together.

12
Q

Compatibility Questions

A
  • Do I have all the data I need?
  • Does the data I need exist within these datasets?
  • Do the datasets need to be cleaned or are they ready for me to use?
  • Are the datasets cleaned to the same standard?
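
The first two questions can be partly checked in code. This is only a sketch, assuming hypothetical datasets and a made-up set of needed fields:

```python
# Hypothetical: the fields the analysis needs.
needed = {"id", "name", "total"}

dataset_a = [{"id": 1, "name": "Ana"}]
dataset_b = [{"id": 1, "total": 40.0}]

columns_a = set(dataset_a[0])
columns_b = set(dataset_b[0])

# Fields that neither dataset provides (empty means all needed data exists).
missing = needed - (columns_a | columns_b)

# Fields both datasets share, which could serve as a key for combining them.
shared_keys = columns_a & columns_b
```

Whether the datasets are cleaned to the same standard still needs a manual look at the values themselves.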