Data Cleaning Flashcards
Dirty Data
Data that’s incomplete, incorrect, or irrelevant to the problem you’re trying to solve.
Clean Data
Data that’s complete, correct, and relevant to the problem you’re trying to solve.
Data Engineers
Transform data into a useful format for analysis and give it a reliable infrastructure.
Data Warehousing Specialists
Develop processes and procedures to effectively store and organise data.
Null
An indication that a value does not exist in a dataset. A null is not the same as a zero.
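The difference between a null and a zero shows up as soon as you compute with the data. A minimal sketch, assuming a made-up list of sales figures in which Python's `None` stands in for a null:

```python
# Hypothetical daily sales figures; None marks a day with no recorded value.
sales = [10, 0, None, 20]

# Treat nulls as missing: skip them when averaging.
known = [value for value in sales if value is not None]
mean_skipping_nulls = sum(known) / len(known)

# Treat nulls as zero: this silently changes the result.
as_zero = [0 if value is None else value for value in sales]
mean_nulls_as_zero = sum(as_zero) / len(as_zero)

print(mean_skipping_nulls)  # 10.0
print(mean_nulls_as_zero)   # 7.5
```

The genuine zero is kept in both calculations; only the null is handled differently.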
Field
A single piece of information from a row or a column of a spreadsheet.
Field Length
A tool for determining how many characters can be keyed into a field.
Data Validation
A tool for checking the accuracy and quality of data before adding or importing it.
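Field length and data validation can both be expressed as simple checks run before a row is accepted. This is only a sketch: the field names (`name`, `age`), the 20-character limit, and the 0 to 120 age range are assumptions chosen for illustration.

```python
MAX_NAME_LENGTH = 20  # hypothetical field-length limit


def validate_row(row):
    """Return a list of problems found in one row (an empty list means valid)."""
    problems = []

    name = row.get("name")
    if name is None or name.strip() == "":
        problems.append("name is missing")
    elif len(name) > MAX_NAME_LENGTH:
        problems.append("name exceeds field length")

    age = row.get("age")
    if not isinstance(age, int) or not 0 <= age <= 120:
        problems.append("age is not a whole number between 0 and 120")

    return problems


print(validate_row({"name": "Ada", "age": 36}))  # []
print(validate_row({"name": "", "age": 200}))
```

Rows that return an empty list could be imported; the others would be sent back for correction.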
Data-Cleaning Tools and Techniques
- Remove duplicates
- Remove irrelevant data
- Remove extra spaces and blanks
- Fix misspellings
- Fix inconsistent capitalisation
- Fix incorrect punctuation and other typos
- Remove formatting (most spreadsheet apps have a ‘clear formatting’ tool in the toolbar)
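Several of the techniques above can be combined in a few lines. The sketch below is one possible approach, not a standard recipe: it removes extra spaces and blanks, fixes inconsistent capitalisation, and removes duplicates from a made-up list of names. The function name `clean_names` is hypothetical.

```python
def clean_names(names):
    """Return tidy, de-duplicated names in their original order."""
    cleaned = []
    seen = set()
    for name in names:
        # Remove leading/trailing spaces and collapse repeated internal spaces.
        tidy = " ".join(name.split())
        if not tidy:
            continue  # drop blank entries
        tidy = tidy.title()  # apply consistent capitalisation
        if tidy not in seen:  # skip duplicates
            seen.add(tidy)
            cleaned.append(tidy)
    return cleaned


raw = ["  ada lovelace", "Ada  Lovelace ", "", "GRACE HOPPER"]
print(clean_names(raw))  # ['Ada Lovelace', 'Grace Hopper']
```

Note that `str.title()` is a blunt tool: it capitalises the letter after an apostrophe, so names such as "O'neil" would need separate handling.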
Data Merging
The process of combining two or more datasets into a single dataset.
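As a rough illustration of merging, the sketch below combines two small made-up datasets on a shared `id` field. The datasets, the field names, and the helper `merge_on_id` are all assumptions; real merges often need more care over conflicting values.

```python
customers = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]
orders = [{"id": 1, "total": 30}, {"id": 3, "total": 12}]


def merge_on_id(left, right):
    """Combine rows that share an id; keep rows found in only one dataset."""
    merged = {}
    for row in left + right:
        # Start a new combined row for an unseen id, then add this row's fields.
        merged.setdefault(row["id"], {}).update(row)
    return [merged[key] for key in sorted(merged)]


combined = merge_on_id(customers, orders)
for row in combined:
    print(row)
```

Rows 2 and 3 appear in only one source, so their combined rows are missing a field. Those gaps are nulls, not zeros.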
Compatibility
How well two or more datasets are able to work together.
Compatibility Questions
- Do I have all the data I need?
- Does the data I need exist within these datasets?
- Do the datasets need to be cleaned or are they ready for me to use?
- Are the datasets cleaned to the same standard?
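Part of answering these questions is checking which fields the datasets actually share. A minimal sketch, assuming each dataset is a list of dictionaries; the helper name `compare_columns` and the sample data are hypothetical.

```python
def compare_columns(first, second):
    """Report which column names two datasets share and which are unique."""
    first_cols = set()
    for row in first:
        first_cols.update(row.keys())
    second_cols = set()
    for row in second:
        second_cols.update(row.keys())
    return {
        "shared": sorted(first_cols & second_cols),
        "only_first": sorted(first_cols - second_cols),
        "only_second": sorted(second_cols - first_cols),
    }


survey_a = [{"id": 1, "city": "Leeds", "age": 41}]
survey_b = [{"id": 7, "city": "york", "income": 30000}]
report = compare_columns(survey_a, survey_b)
print(report)
```

Shared column names are only a starting point: here `city` exists in both datasets but is capitalised differently, which suggests they were not cleaned to the same standard.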