All Glossary Terms Flashcards
The data wrangling step in which errors in the raw data are corrected.
Cleaning
Documentation of characteristics of the wrangled data such as names and definitions of the fields, units of measure used in the fields, the source(s) of the raw data, relationship(s) of the wrangled data with other data, and other attributes.
Data dictionary
The process of cleaning, transforming, and managing data so it is more reliable and can be more easily accessed and used for analysis.
Data wrangling
A tag or marker that separates structured data into various fields.
Delimiters
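As a small illustration of how a delimiter splits a record into fields, the sketch below parses the same two records once with a comma delimiter and once with a tab delimiter. The sample text and the helper name `split_into_fields` are made up for this example; `csv.reader` and its `delimiter` argument come from Python's standard library.

```python
import csv
import io

# Hypothetical raw text: the same records, comma-delimited and tab-delimited.
comma_text = "name,units,price\nwidget,3,2.50\ngadget,10,1.25\n"
tab_text = "name\tunits\tprice\nwidget\t3\t2.50\ngadget\t10\t1.25\n"


def split_into_fields(text, delimiter):
    """Parse delimited text into a list of rows, each a list of field values."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


comma_rows = split_into_fields(comma_text, ",")
tab_rows = split_into_fields(tab_text, "\t")

# Both parses yield the same fields; only the delimiter differed.
print(comma_rows == tab_rows)
```

Reading a file with the wrong delimiter typically leaves each record as one long field, which is often the first sign that the delimiter was misidentified.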
The data wrangling step in which the analyst becomes familiar with the data in order to conceptualize how it might be used and potentially discovers issues that will need to be addressed later in the data wrangling process.
Discovery
A field that takes a value of 0 or 1 to indicate the absence or presence of some categorical effect.
Dummy variable
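A dummy variable can be built from a categorical field with a one-line comparison. The following is a minimal sketch on made-up records; the field name `region`, the category `north`, and the helper `add_dummy` are assumptions for illustration only.

```python
# Hypothetical records with a categorical field named "region".
records = [
    {"id": 1, "region": "north"},
    {"id": 2, "region": "south"},
    {"id": 3, "region": "north"},
]


def add_dummy(rows, field, category):
    """Return copies of rows with a 0/1 field marking whether `field` equals `category`."""
    dummy_name = f"{field}_{category}"
    return [{**row, dummy_name: 1 if row[field] == category else 0} for row in rows]


with_dummy = add_dummy(records, "region", "north")
for row in with_dummy:
    print(row["id"], row["region_north"])
```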
The data wrangling step in which the raw data are augmented by incorporating values from other data sets and/or applying transformations to portions of the existing data to ensure that all data that will be required for the ensuing analyses will be included in the resulting data set.
Enriching
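The two parts of enriching, incorporating values from another data set and applying a transformation to existing fields, might look like the sketch below. The order records, the `store_regions` lookup, and the derived `revenue` field are all hypothetical.

```python
# Hypothetical raw data: one record per order.
orders = [
    {"order_id": 1, "store": "A", "units": 3, "unit_price": 2.5},
    {"order_id": 2, "store": "B", "units": 10, "unit_price": 1.25},
    {"order_id": 3, "store": "C", "units": 1, "unit_price": 4.0},
]

# Hypothetical second data set: maps each store to a sales region.
store_regions = {"A": "north", "B": "south"}


def enrich(rows, regions):
    """Add a region looked up from another data set and a derived revenue field."""
    return [
        {
            **row,
            "region": regions.get(row["store"]),  # None when the store is not in the lookup
            "revenue": row["units"] * row["unit_price"],  # transformation of existing fields
        }
        for row in rows
    ]


enriched = enrich(orders, store_regions)
for row in enriched:
    print(row["order_id"], row["region"], row["revenue"])
```

Store "C" has no entry in the lookup, so its region comes back as `None`; enriching can itself introduce missing values that need attention later.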
A characteristic of the observations in a data set.
Field
A data file in which structured data are arrayed as a rectangle, with each row representing an observation or record, and each column representing a unique variable or field.
Flat file
Instances for which there is no appropriate reason for the value of a field to be missing.
Illegitimately missing data
Instances for which there is an appropriate reason for the value of a field to be missing.
Legitimately missing data
Systematic replacement of missing values with values that seem reasonable.
Imputation
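One common and simple form of imputation replaces each missing value with the mean of the observed values. The sketch below assumes missing values are represented as `None`; the sample list and the helper `impute_mean` are made up for illustration, and other replacement rules (median, a value predicted from other fields) follow the same pattern.

```python
from statistics import mean

# Hypothetical field with two missing values, represented as None.
ages = [30, None, 20, 40, None]


def impute_mean(values):
    """Replace each None with the mean of the observed (non-None) values."""
    observed = [v for v in values if v is not None]
    fill_value = mean(observed)
    return [fill_value if v is None else v for v in values]


imputed_ages = impute_mean(ages)
print(imputed_ages)
```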
Instances for which the tendency for a record to be missing a value of some field is related to the value of some other field(s) in the record.
Missing at random
Instances for which the tendency for a record to be missing a value of some field is entirely random.
Missing completely at random
Instances for which the tendency for a record to be missing a value of some field is related to the missing value itself.
Missing not at random
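The three missingness mechanisms can be contrasted by simulating each on the same complete data. This is only a sketch: the `age` and `income` fields, the cutoffs (age 60, income 100), and the drop probabilities are arbitrary assumptions chosen to make the differences visible.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical complete records: both fields are known before any masking.
complete = [
    {"age": rng.randint(20, 80), "income": rng.randint(20, 150)}
    for _ in range(200)
]


def mask_income(records, should_drop):
    """Return copies of records with income set to None wherever should_drop(record) is True."""
    return [{**r, "income": None} if should_drop(r) else dict(r) for r in records]


# Missing completely at random: every record has the same chance of losing income.
mcar = mask_income(complete, lambda r: rng.random() < 0.3)

# Missing at random: the chance depends on another field (age), not on income.
mar = mask_income(complete, lambda r: r["age"] >= 60 and rng.random() < 0.6)

# Missing not at random: the chance depends on the income value that goes missing.
mnar = mask_income(complete, lambda r: r["income"] >= 100 and rng.random() < 0.6)

for name, data in [("MCAR", mcar), ("MAR", mar), ("MNAR", mnar)]:
    missing = sum(1 for r in data if r["income"] is None)
    print(name, "records missing income:", missing)
```

In real data the complete values are not available for comparison, so the mechanism usually has to be argued from knowledge of how the data were collected.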