All Glossary Terms Flashcards by Jack Jordahl

The data wrangling step in which errors in the raw data are corrected.

Cleaning

Documentation of characteristics of the wrangled data such as names and definitions of the fields, units of measure used in the fields, the source(s) of the raw data, relationship(s) of the wrangled data with other data, and other attributes.

Data dictionary

The process of cleaning, transforming, and managing data so it is more reliable and can be more easily accessed and used for analysis.

Data wrangling

A tag or marker that separates structured data into various fields.

Delimiters
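
As a rough illustration of the definition above, the sketch below parses the same two records once with a comma delimiter and once with a tab delimiter. The sample text and the helper name `parse_delimited` are hypothetical; only Python's standard `csv` module is assumed.

```python
import csv
import io

# Hypothetical sample: the same records, comma-delimited and tab-delimited.
comma_text = "name,city\nAda,London\nLin,Taipei\n"
tab_text = "name\tcity\nAda\tLondon\nLin\tTaipei\n"

def parse_delimited(text, delimiter):
    """Split delimited text into records (dicts keyed by field name)."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [dict(row) for row in reader]

comma_records = parse_delimited(comma_text, ",")
tab_records = parse_delimited(tab_text, "\t")
```

Both calls should yield the same records, since only the marker that separates the fields differs.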

The data wrangling step in which the analyst becomes familiar with the data in order to conceptualize how it might be used and potentially discovers issues that will need to be addressed later in the data wrangling process.

Discovery

A field that takes a value of 0 or 1 to indicate the absence or presence of some categorical effect.

Dummy variable
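
A minimal sketch of how a dummy variable might be built from a categorical field. The records, the `region` field, and the helper `add_dummy` are all made up for illustration.

```python
# Hypothetical records with a categorical field, "region".
records = [
    {"id": 1, "region": "north"},
    {"id": 2, "region": "south"},
    {"id": 3, "region": "north"},
]

def add_dummy(rows, field, category):
    """Return copies of rows with a 0/1 field marking whether field == category."""
    name = f"{field}_{category}"
    return [{**row, name: 1 if row[field] == category else 0} for row in rows]

with_dummy = add_dummy(records, "region", "north")
```

Each record gains a `region_north` field that is 1 when the region is "north" and 0 otherwise.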

The data wrangling step in which the raw data are augmented by incorporating values from other data sets and/or applying transformations to portions of the existing data to ensure that all data that will be required for the ensuing analyses will be included in the resulting data set.

Enriching

A characteristic of the observations in a data set.

Field

A data file in which structured data are arrayed as a rectangle, with each row representing an observation or record, and each column representing a unique variable or field.

Flat file

Instances for which there is no appropriate reason for the value of a field to be missing.

Illegitimately missing data

Instances for which there is an appropriate reason for the value of a field to be missing.

Legitimately missing data

Systematic replacement of missing values with values that seem reasonable.

Imputation
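
One common way to choose a "reasonable" replacement value is the median of the observed values. The sketch below assumes that choice and uses `None` to mark a missing value; the `ages` list and the helper `impute_median` are hypothetical.

```python
from statistics import median

# Hypothetical field with two missing values, marked as None.
ages = [34, None, 29, 41, None, 38]

def impute_median(values):
    """Replace each missing value with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

imputed = impute_median(ages)
```

Median imputation is only one option; the mean, the mode, or a model-based prediction could be substituted in the same way.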

Instances for which the tendency for a record to be missing a value of some field is related to the value of some other field(s) in the record.

Missing at random

Instances for which the tendency for a record to be missing a value of some field is entirely random.

Missing completely at random

Instances for which the tendency for a record to be missing a value of some field is related to the missing value.

Missing not at random
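
The three missingness mechanisms (missing completely at random, missing at random, missing not at random) can be contrasted with a small simulation. This is only a sketch: the `age` and `income` fields, the thresholds, and the probabilities are arbitrary assumptions chosen for illustration.

```python
import random

rng = random.Random(0)  # fixed seed so the simulation is repeatable

# Hypothetical records with two fields.
people = [
    {"age": rng.randint(20, 80), "income": rng.randint(20, 200)}
    for _ in range(1000)
]

def mask_income(rows, prob_missing):
    """Return the income values, replaced by None with a per-record probability."""
    return [None if rng.random() < prob_missing(r) else r["income"] for r in rows]

# Missing completely at random: the same chance for every record.
mcar = mask_income(people, lambda r: 0.2)
# Missing at random: the chance depends on another field (age).
mar = mask_income(people, lambda r: 0.4 if r["age"] >= 60 else 0.05)
# Missing not at random: the chance depends on the missing value itself (income).
mnar = mask_income(people, lambda r: 0.4 if r["income"] >= 150 else 0.05)
```

In practice the mechanism is rarely known in advance; the simulation only shows what distinguishes the three definitions.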

Data that are stored in a manner that allows mathematical operations to be performed on them. Data of this type generally represent a count or measurement.

Numeric data

Combining multiple data sets that each have different data for individual records, when each record occurs no more than once in each data set.

One-to-one merger

Combining multiple data sets that each have different data for individual records, when at least one record occurs more than once in at least one of the data sets.

One-to-many merger
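
The difference between a one-to-one and a one-to-many merger can be sketched with plain lists of dicts. The data sets, the `customer_id` key, and the helper `merge` (a simple inner join) are hypothetical.

```python
# Hypothetical data sets that share the key field "customer_id".
customers = [{"customer_id": 1, "name": "Ada"}, {"customer_id": 2, "name": "Lin"}]
addresses = [{"customer_id": 1, "city": "London"}, {"customer_id": 2, "city": "Taipei"}]
orders = [
    {"customer_id": 1, "item": "pen"},
    {"customer_id": 1, "item": "ink"},
    {"customer_id": 2, "item": "pad"},
]

def merge(left, right, key):
    """Inner-join two lists of dicts on key; one output row per matching pair."""
    index = {}
    for row in left:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for r in right for l in index.get(r[key], [])]

# One-to-one: each customer appears at most once in both data sets.
one_to_one = merge(customers, addresses, "customer_id")
# One-to-many: customer 1 appears twice in orders, so the name repeats.
one_to_many = merge(customers, orders, "customer_id")
```

The one-to-one result has one row per customer, while the one-to-many result has one row per order.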

The data wrangling step in which a file containing the wrangled data and documentation of the file’s contents are made available to its intended users in a format they can use.

Publishing

Data that have not been processed or prepared for analysis.

Raw data

A grouping of characteristics for a particular observation in a data set.

Record

Data that do not have the same level of organization as structured data, but that allow for isolation of some elements of the raw data when they are imported.

Semi-structured data

Data organized so that the values for each variable are stored in a single field.

Stacked data

Data sets that are arrayed in a predetermined pattern that makes them easy to manage and search.

Structured data

The data wrangling step in which the raw data file is arranged so it can be more readily analyzed in the intended manner.

Structuring

The extraction of fields and records that will be useful for ensuing analyses from data.

Subsetting
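
Subsetting can be thought of as two selections at once: which records to keep and which fields to keep. A minimal sketch, with made-up records and a hypothetical helper `subset`:

```python
# Hypothetical records; only some fields and some records are needed.
records = [
    {"id": 1, "region": "north", "sales": 120, "notes": "call back"},
    {"id": 2, "region": "south", "sales": 80, "notes": ""},
    {"id": 3, "region": "north", "sales": 95, "notes": "new"},
]

def subset(rows, fields, keep):
    """Keep only the records where keep(row) is true, and only the named fields."""
    return [{f: row[f] for f in fields} for row in rows if keep(row)]

north_sales = subset(records, ["id", "sales"], lambda r: r["region"] == "north")
```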

Data that are words, phrases, sentences, and paragraphs.

Text data

Data organized so that the values for one variable correspond to separate fields, each of which contains observations corresponding to these respective fields.

Unstacked data
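
To make the stacked versus unstacked distinction concrete, the sketch below converts a small unstacked data set (one field per quarter) into stacked form (a single `sales` field). The store data, the field names, and the helper `stack` are hypothetical.

```python
# Unstacked: the values of one variable (sales) are spread across a field per quarter.
unstacked = [
    {"store": "A", "Q1": 10, "Q2": 12},
    {"store": "B", "Q1": 7, "Q2": 9},
]

def stack(rows, id_field, value_fields, var_name, value_name):
    """Move the values in value_fields into a single field, one record per value."""
    return [
        {id_field: row[id_field], var_name: f, value_name: row[f]}
        for row in rows
        for f in value_fields
    ]

# Stacked: every sales value sits in the single field "sales".
stacked = stack(unstacked, "store", ["Q1", "Q2"], "quarter", "sales")
```

Two unstacked records with two quarters each become four stacked records.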

Data that are not arranged in a predetermined pattern, and therefore in their raw form cannot be stored in a manner similar to a flat file.

Unstructured data

The data wrangling step in which the accuracy and reliability of the wrangled data are confirmed so they are ready for the ensuing analyses.

Validating
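
Validation checks depend on the data set, but a minimal sketch might look like the following. The two rules used here (unique `id`, `age` between 0 and 120) are assumptions chosen for illustration, as are the records and the helper `validate`.

```python
# Hypothetical wrangled records to be checked before analysis.
records = [
    {"id": 1, "age": 34},
    {"id": 2, "age": -5},
    {"id": 2, "age": 41},
]

def validate(rows):
    """Return a list of (id, problem) pairs; an empty list means no problems found."""
    problems = []
    seen = set()
    for row in rows:
        if row["id"] in seen:
            problems.append((row["id"], "duplicate id"))
        seen.add(row["id"])
        if not 0 <= row["age"] <= 120:
            problems.append((row["id"], "age out of range"))
    return problems

problems = validate(records)
```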

All Glossary Terms Flashcards

(30 cards)