Course 4: Module 2 Flashcards
Dirty data
Data that is incomplete, incorrect, or irrelevant to the problem you’re trying to solve
Clean data
Data that is complete, correct, and relevant to the problem you’re trying to solve
Null
An indication that a value does not exist in a dataset
Duplicate data
Any data record that shows up more than once
Outdated data
Any data that is old and should be replaced with newer, more accurate information
Incomplete data
Any data that is missing important fields
Incorrect/Inaccurate data
Any data that is complete but inaccurate
Inconsistent data
Any data that uses different formats to represent the same thing
Field
A single piece of information from a row or column of a spreadsheet
Data validation
A tool for checking the accuracy and quality of data before adding or importing it
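The idea of checking data before it is added can be sketched in Python. This is only an illustration: the field names `name` and `age`, the 0 to 120 range, and the helper `validate_row` are assumptions made up for the example, not part of any spreadsheet tool.

```python
def validate_row(row):
    """Return a list of problems found in one record (an empty list means valid)."""
    problems = []
    # Hypothetical rule: the name field must contain visible text.
    if not row.get("name", "").strip():
        problems.append("name is missing")
    # Hypothetical rule: age must be a whole number in a plausible range.
    age = row.get("age")
    if not isinstance(age, int) or not (0 <= age <= 120):
        problems.append("age must be a whole number from 0 to 120")
    return problems


good_problems = validate_row({"name": "Ana", "age": 34})
bad_problems = validate_row({"name": "  ", "age": 250})
```

Rows that come back with an empty list can be imported; the others can be sent back for correction.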
Data merging
The process of combining two or more datasets into a single dataset
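As a rough illustration, merging can be sketched in Python by matching records on a shared key. The datasets, the `id` key, and the helper `merge_on_key` are hypothetical and exist only for this example.

```python
customers = [{"id": 1, "name": "Ana"}, {"id": 2, "name": "Ben"}]
orders = [{"id": 1, "total": 40}, {"id": 2, "total": 15}]


def merge_on_key(left, right, key):
    """Combine records from two datasets that share the same key value."""
    right_by_key = {record[key]: record for record in right}
    merged = []
    for record in left:
        # Records with no match on the right are kept unchanged.
        match = right_by_key.get(record[key], {})
        merged.append({**record, **match})
    return merged


merged = merge_on_key(customers, orders, "id")
```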
Compatibility
How well two or more datasets are able to work together
Common mistakes to avoid
- Not checking for spelling errors
- Forgetting to document errors
- Not checking for misfielded values
- Overlooking missing values
- Only looking at a subset of the data
- Not fixing the source of the error
- Not analyzing the system prior to data cleaning
- Not backing up your data prior to data cleaning
- Not accounting for data cleaning in your deadlines/process
Conditional formatting
A spreadsheet tool that changes how cells appear when values meet specific conditions
Remove duplicates
A tool that automatically searches for and eliminates duplicate entries from a spreadsheet
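A minimal Python sketch of the same idea, assuming each row is a list of cell values and that two rows count as duplicates only when every cell matches. The helper name `remove_duplicates` and the sample rows are made up for the example.

```python
def remove_duplicates(rows):
    """Keep the first occurrence of each row and drop later exact repeats."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)  # lists are not hashable, so compare rows as tuples
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


rows = [
    ["Ana", "ana@example.com"],
    ["Ben", "ben@example.com"],
    ["Ana", "ana@example.com"],
]
deduped = remove_duplicates(rows)
```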
Text string
A group of characters within a cell, most often composed of letters
Split
A tool that divides text around a specified character and puts each fragment into a new, separate cell
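For comparison, Python's built-in `str.split` does something similar. The sample value and the choice of `", "` as the delimiter are assumptions for the example.

```python
full_name = "Lovelace, Ada"

# Divide the text around the delimiter; each fragment becomes its own value.
last_name, first_name = full_name.split(", ")
```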
CONCATENATE
A function that joins multiple text strings into a single string
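A rough Python equivalent, using made-up sample values: strings can be joined with `+` or, when a separator is needed between several pieces, with `str.join`.

```python
first_name = "Ada"
last_name = "Lovelace"

# Join two strings with a space between them.
full_name = first_name + " " + last_name

# Join several strings with the same separator.
date_text = "-".join(["2024", "01", "15"])
```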
COUNTIF
A function that returns the number of cells that match a specified value
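The same kind of count can be sketched in Python, assuming the cells are held in a list. The sample values are made up for the example.

```python
statuses = ["paid", "unpaid", "paid", "refunded", "paid"]

# Count the entries that equal a specified value.
paid_count = statuses.count("paid")

# Count the entries that meet a condition rather than an exact value.
amounts = [120, 45, 300, 99]
large_count = sum(1 for amount in amounts if amount > 100)
```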
Syntax
A predetermined structure that includes all required information and its proper placement
LEN
A function that tells you the length of a text string by counting the number of characters it contains
LEFT
A function that gives you a set number of characters from the left side of a text string
RIGHT
A function that gives you a set number of characters from the right side of a text string
MID
A function that gives you a segment from the middle of a text string
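LEN, LEFT, RIGHT, and MID can be imitated in Python with `len` and string slicing. The helpers `left`, `right`, and `mid` below are hypothetical stand-ins written for this example. `mid` takes a 1-based starting position, as the spreadsheet function does, while Python slices count from 0.

```python
def left(text, count):
    """Return the first `count` characters of `text`."""
    return text[:count]


def right(text, count):
    """Return the last `count` characters of `text`."""
    return text[-count:] if count > 0 else ""


def mid(text, start, count):
    """Return `count` characters beginning at 1-based position `start`."""
    return text[start - 1:start - 1 + count]


code = "CA-90210-USA"
length = len(code)        # 12
state = left(code, 2)     # "CA"
country = right(code, 3)  # "USA"
zip_code = mid(code, 4, 5)  # "90210"
```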
TRIM
A function that removes leading, trailing, and repeated spaces in data
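A minimal Python sketch of the same cleanup, assuming only ordinary space characters need handling. The helper name `trim` and the sample text are made up for the example.

```python
def trim(text):
    """Remove leading and trailing spaces and collapse repeated spaces to one."""
    # Splitting on a single space leaves empty strings where spaces repeat;
    # dropping those and rejoining leaves exactly one space between words.
    return " ".join(part for part in text.split(" ") if part)


cleaned = trim("  Ada   Lovelace ")
```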
Pivot table
A data summarization tool used in data processing to sort, reorganize, group, count, total, or average data stored in a dataset
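One common kind of summary is a total per group. The Python sketch below assumes hypothetical sales records of (region, quarter, amount) and adds up the amounts for each region and quarter.

```python
from collections import defaultdict

sales = [
    ("East", "Q1", 100),
    ("West", "Q1", 80),
    ("East", "Q2", 120),
    ("West", "Q1", 20),
]

# Group the rows by (region, quarter) and total the amounts in each group.
totals = defaultdict(int)
for region, quarter, amount in sales:
    totals[(region, quarter)] += amount
```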
VLOOKUP
A function that searches vertically for a certain value in the first column of a range and returns a corresponding piece of information from the same row
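The lookup can be sketched in Python, assuming the table is a list of rows and only exact matches count. The helper `vlookup`, its 1-based `col_index` argument, and the product data are hypothetical and written for this example.

```python
def vlookup(value, table, col_index):
    """Find `value` in the first column and return the cell at 1-based `col_index`."""
    for row in table:
        if row[0] == value:
            return row[col_index - 1]
    return None  # no matching row was found


products = [
    ["P-100", "Keyboard", 25.0],
    ["P-200", "Mouse", 12.5],
    ["P-300", "Monitor", 180.0],
]

name = vlookup("P-200", products, 2)
price = vlookup("P-300", products, 3)
missing = vlookup("P-999", products, 2)
```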
Data mapping
The process of matching fields from one data source to another
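A rough Python illustration, assuming a source record whose field names differ from those of the destination. The field names and the helper `map_fields` are made up for the example.

```python
# Hypothetical mapping from source field names to destination field names.
field_map = {"cust_name": "name", "zip": "postal_code"}


def map_fields(record, field_map):
    """Rename the fields of one record; unmapped fields keep their names."""
    return {field_map.get(field, field): value for field, value in record.items()}


source_record = {"cust_name": "Ana", "zip": "90210", "city": "Beverly Hills"}
mapped = map_fields(source_record, field_map)
```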
Schema
A way of describing how something is organized