Process Data from Dirty to Clean (Terms) Flashcards
A range of values that conveys how likely a statistical estimate is to reflect the population
Confidence interval
A character that indicates the beginning or end of a data item
Delimiter
A data value that cannot be left blank or empty
Mandatory
A file containing a chronologically ordered list of modifications made to a project
Changelog
A function that removes leading, trailing, and repeated spaces in data
TRIM
A function that returns a segment from the middle of a text string
MID
A function that returns a set number of characters from the left side of a text string
LEFT
A function that returns a set number of characters from the right side of a text string
RIGHT
A function that returns the length of a text string by counting the number of characters it contains
LEN
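The five text functions above (TRIM, MID, LEFT, RIGHT, LEN) can be approximated in Python. This is only a rough sketch: the helper names are hypothetical, and MID is assumed to use a 1-based start position, as spreadsheets do.

```python
def trim(text: str) -> str:
    # Drop leading/trailing spaces and collapse repeated spaces to one.
    return " ".join(part for part in text.split(" ") if part)

def left(text: str, count: int) -> str:
    # First `count` characters of the string.
    return text[:count]

def right(text: str, count: int) -> str:
    # Last `count` characters of the string (empty if count is 0).
    return text[-count:] if count > 0 else ""

def mid(text: str, start: int, count: int) -> str:
    # `count` characters beginning at 1-based position `start`.
    return text[start - 1:start - 1 + count]

def length(text: str) -> int:
    # Number of characters in the string, like LEN.
    return len(text)

code = "SKU-1234"
cleaned = trim("  data   cleaning  ")
```

With these helpers, left(code, 3) gives the prefix, right(code, 4) gives the numeric suffix, and mid(code, 5, 2) gives the two characters starting at the fifth position.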
A group of characters within a cell, most often composed of letters
Text string
A keyword that is added to a SQL SELECT statement to retrieve only non-duplicate entries
DISTINCT
A number that contains a decimal point
Float
A process that ensures certain conditions for multiple data fields are satisfied
Cross-field validation
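As an illustration of cross-field validation, the sketch below checks conditions that span more than one field of a record. The field names (start_date, end_date, full_time_pct, part_time_pct) and the rules themselves are assumptions made up for this example.

```python
from datetime import date

def validate_record(record: dict) -> list:
    # Return a list of problems found by comparing fields with each other.
    errors = []
    # Rule 1 (assumed): a period cannot end before it starts.
    if record["end_date"] < record["start_date"]:
        errors.append("end_date is earlier than start_date")
    # Rule 2 (assumed): the two percentage fields must add up to 100.
    if record["full_time_pct"] + record["part_time_pct"] != 100:
        errors.append("percentages do not sum to 100")
    return errors

good = {"start_date": date(2024, 1, 5), "end_date": date(2024, 1, 10),
        "full_time_pct": 60, "part_time_pct": 40}
bad = {"start_date": date(2024, 1, 10), "end_date": date(2024, 1, 5),
       "full_time_pct": 60, "part_time_pct": 30}

good_errors = validate_record(good)
bad_errors = validate_record(bad)
```

Each field in the bad record could look valid on its own; the problems only appear when the fields are checked against each other.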
A process to confirm that a data-cleaning effort was well executed and the resulting data is accurate and reliable
Verification
A process to determine if a survey or experiment has meaningful results
Hypothesis testing
A professional who develops processes and procedures to effectively store and organize data
Data warehousing specialist
A professional who transforms data into a useful format for analysis and gives it a reliable infrastructure
Data engineer
A rule that says the values in a table must match a prescribed pattern
Regular expression (RegEx)
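A minimal sketch of using a regular expression as a validation rule, with Python's built-in re module. The pattern (exactly five digits, as in a US ZIP code) and the sample values are assumptions chosen for illustration.

```python
import re

# Assumed rule: a value is valid only if it is exactly five digits.
ZIP_PATTERN = re.compile(r"\d{5}")

def matches_pattern(value: str) -> bool:
    # fullmatch requires the whole value to fit the pattern, not just part of it.
    return ZIP_PATTERN.fullmatch(value) is not None

values = ["30301", "3030", "3030A", "94105"]
invalid = [v for v in values if not matches_pattern(v)]
```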
A spreadsheet function that calculates the number of days, months, or years between two dates
DATEDIF
A spreadsheet function that counts the total number of values (non-empty cells) within a specified range
COUNTA
A spreadsheet function that divides text around a specified character and puts each fragment into a new, separate cell
SPLIT
A spreadsheet function that joins together two or more text strings
CONCATENATE
A spreadsheet function that returns the number of cells in a range that match a specified value
COUNTIF
A spreadsheet function that vertically searches for a certain value in a column to return a corresponding piece of information
VLOOKUP
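To make the counting and lookup functions concrete, here is a rough Python sketch of COUNTA, COUNTIF, and an exact-match VLOOKUP. The helper names and the sample table are hypothetical, and the lookup returns None where a spreadsheet would show an error.

```python
def counta(values):
    # Count entries that are not empty (None or empty string), like COUNTA.
    return sum(1 for v in values if v is not None and v != "")

def countif(values, target):
    # Count entries equal to the target value, like COUNTIF with a plain value.
    return sum(1 for v in values if v == target)

def vlookup(key, table, col_index):
    # Search the first column top to bottom; return the value from the
    # 1-based column `col_index` of the first matching row.
    for row in table:
        if row[0] == key:
            return row[col_index - 1]
    return None  # a spreadsheet would report an error here instead

products = [
    ["P1", "lamp", 19.5],
    ["P2", "desk", 120.0],
]
colors = ["red", "blue", "red"]
```

For example, vlookup("P2", products, 2) searches the first column for "P2" and returns the product name from the second column.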
A spreadsheet tool that automatically searches for and eliminates duplicate entries from a spreadsheet
Remove duplicates
A spreadsheet tool that changes how cells appear when values meet specific conditions
Conditional formatting
A SQL function that adds strings together to create new text strings that can be used as unique keys
CONCAT
A SQL function that converts data from one datatype to another
CAST
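The SQL terms DISTINCT, CONCAT, and CAST can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions: the orders table and its rows are invented, and the || operator stands in for CONCAT because older SQLite versions do not provide a CONCAT function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# price is stored as TEXT on purpose so that CAST has something to convert.
conn.execute("CREATE TABLE orders (customer TEXT, product TEXT, price TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("ana", "lamp", "19.50"), ("ana", "lamp", "19.50"), ("ben", "desk", "120")],
)

# DISTINCT: return each customer only once.
distinct_customers = [
    row[0]
    for row in conn.execute("SELECT DISTINCT customer FROM orders ORDER BY customer")
]

# Concatenation: build a combined key from two columns.
keys = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT customer || '-' || product FROM orders ORDER BY 1"
    )
]

# CAST: convert the text prices to numbers before summing them.
total = conn.execute("SELECT SUM(CAST(price AS REAL)) FROM orders").fetchone()[0]
conn.close()
```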