All Glossary Terms Flashcards

1
Q

The data wrangling step in which errors in the raw data are corrected.

A

Cleaning

2
Q

Documentation of characteristics of the wrangled data such as names and definitions of the fields, units of measure used in the fields, the source(s) of the raw data, relationship(s) of the wrangled data with other data, and other attributes.

A

Data dictionary

3
Q

The process of cleaning, transforming, and managing data so it is more reliable and can be more easily accessed and used for analysis.

A

Data wrangling

4
Q

A tag or marker that separates structured data into various fields.

A

Delimiters
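
As a minimal sketch of how a delimiter splits a record into fields, the example below parses a small pipe-delimited text with Python's standard `csv` module. The sample text and the choice of `|` as the delimiter are assumptions made for illustration only.

```python
import csv
import io

# Hypothetical raw text: one record per line, fields separated by "|".
raw_text = "id|name|city\n1|Ana|Lima\n2|Bo|Oslo\n"

# csv.reader splits each line into fields wherever the delimiter occurs.
rows = list(csv.reader(io.StringIO(raw_text), delimiter="|"))
header, records = rows[0], rows[1:]

print(header)
print(records)
```

Changing the `delimiter` argument (for example to `","` or `"\t"`) is all that is needed to read comma- or tab-delimited text the same way.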

5
Q

The data wrangling step in which the analyst becomes familiar with the data in order to conceptualize how it might be used and potentially discovers issues that will need to be addressed later in the data wrangling process.

A

Discovery

6
Q

A field that takes a value of 0 or 1 to indicate the absence or presence of some categorical effect.

A

Dummy variable
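
One way to picture a dummy variable is as a new 0/1 field derived from a categorical field. The sketch below assumes a hypothetical `region` field and a hypothetical helper `add_dummy`; neither comes from the glossary.

```python
# Hypothetical records with a categorical field named "region".
records = [
    {"id": 1, "region": "East"},
    {"id": 2, "region": "West"},
    {"id": 3, "region": "East"},
]

def add_dummy(rows, field, category):
    """Return copies of rows with a 0/1 field marking whether field == category."""
    name = f"{field}_{category}"
    return [{**row, name: 1 if row[field] == category else 0} for row in rows]

with_dummy = add_dummy(records, "region", "East")
for row in with_dummy:
    print(row)
```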

7
Q

The data wrangling step in which the raw data are augmented by incorporating values from other data sets and/or applying transformations to portions of the existing data to ensure that all data that will be required for the ensuing analyses will be included in the resulting data set.

A

Enriching

8
Q

A characteristic of the observations in a data set.

A

Field

8
Q

A data file in which structured data are arrayed as a rectangle, with each row representing an observation or record, and each column representing a unique variable or field.

A

Flat file

8
Q

Instances for which there is no appropriate reason for the value of a field to be missing.

A

Illegitimately missing data

8
Q

Instances for which there is an appropriate reason for the value of a field to be missing.

A

Legitimately missing data

9
Q

Systematic replacement of missing values with values that seem reasonable.

A

Imputation
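
A minimal sketch of one common imputation rule, replacing each missing value with the mean of the observed values. The sample values, the use of `None` to mark missing entries, and the helper name `impute_mean` are assumptions for illustration; other rules (median, mode, model-based) follow the same pattern.

```python
from statistics import mean

# Hypothetical field with missing values marked as None.
ages = [30, None, 28, None, 41]

def impute_mean(values):
    """Replace each None with the mean of the non-missing values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

imputed = impute_mean(ages)
print(imputed)
```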

10
Q

Instances for which the tendency for a record to be missing a value of some field is related to the value of some other field(s) in the record.

A

Missing at random

11
Q

Instances for which the tendency for a record to be missing a value of some field is entirely random.

A

Missing completely at random

12
Q

Instances for which the tendency for a record to be missing a value of some field is related to the missing value.

A

Missing not at random

13
Q

Data that are stored in a manner that allows mathematical operations to be performed on them. Data of this type generally represent a count or measurement.

A

Numeric data

14
Q

Combining multiple data sets that each have different data for individual records, when each record occurs no more than once in each data set.

A

One-to-one merger
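
As a rough sketch, a one-to-one merger can be pictured as joining two data sets on a shared key when each key appears at most once in each. The two dictionaries and their field names below are hypothetical.

```python
# Two hypothetical data sets keyed by employee ID; each ID occurs once in each.
names = {101: {"name": "Ana"}, 102: {"name": "Bo"}}
salaries = {101: {"salary": 52000}, 102: {"salary": 61000}}

# Combine the fields of records that share a key.
merged_one_to_one = {
    key: {**names[key], **salaries[key]}
    for key in names.keys() & salaries.keys()
}

print(merged_one_to_one)
```

The result has one merged record per key, so the merged data set is no longer than either input.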

15
Q

Combining multiple data sets that each have different data for individual records, when at least one record occurs more than once in at least one of the data sets.

A

One-to-many merger
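
A minimal sketch of a one-to-many merger, assuming a hypothetical customer table (each customer once) and a hypothetical order table (a customer may appear several times). The customer's data are repeated on every matching order.

```python
# Each customer occurs once here...
customers = {1: "Ana", 2: "Bo"}

# ...but may occur more than once here.
orders = [
    {"customer_id": 1, "amount": 20},
    {"customer_id": 1, "amount": 35},
    {"customer_id": 2, "amount": 15},
]

# Attach the customer's name to every order belonging to that customer.
merged_one_to_many = [
    {**order, "customer_name": customers[order["customer_id"]]}
    for order in orders
]

for row in merged_one_to_many:
    print(row)
```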

16
Q

The data wrangling step in which a file containing the wrangled data and documentation of the file’s contents are made available to its intended users in a format they can use.

A

Publishing

17
Q

Data that have not been processed or prepared for analysis.

A

Raw data

18
Q

A grouping of characteristics for a particular observation in a data set.

A

Record

19
Q

Data that do not have the same level of organization as structured data, but that allow for isolation of some elements of the raw data when they are imported.

A

Semi-structured data

20
Q

Data organized so that the values for each variable are stored in a single field.

A

Stacked data

21
Q

Data sets that are arrayed in a predetermined pattern that makes them easy to manage and search.

A

Structured data

22
Q

The data wrangling step in which the raw data file is arranged so it can be more readily analyzed in the intended manner.

A

Structuring

23
Q

The extraction from data of the fields and records that will be useful for ensuing analyses.

A

Subsetting
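
Subsetting can be sketched as two filters applied together: keep only the records that meet a condition, and keep only the fields of interest. The sample records, the condition, and the field list below are assumptions for illustration.

```python
# Hypothetical data set: one dictionary per record.
sales_records = [
    {"id": 1, "region": "East", "sales": 120, "notes": "call back"},
    {"id": 2, "region": "West", "sales": 95, "notes": ""},
    {"id": 3, "region": "East", "sales": 143, "notes": "new client"},
]

keep_fields = ["id", "sales"]

# Keep only East-region records, and only the fields listed in keep_fields.
subset = [
    {field: record[field] for field in keep_fields}
    for record in sales_records
    if record["region"] == "East"
]

print(subset)
```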

24
Q

Data that are words, phrases, sentences, and paragraphs.

A

Text data

25
Q

Data organized so that the values for one variable correspond to separate fields, each of which contains observations corresponding to these respective fields.

A

Unstacked data
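
To contrast the two layouts, the sketch below starts from stacked data (all values of one variable in a single field, with a second field naming the group) and rearranges it into unstacked form (one field per group). The region names and values are hypothetical.

```python
# Stacked: every sales value sits in one field, labeled by region.
stacked = [("East", 10), ("West", 7), ("East", 12), ("West", 9)]

# Unstacked: one field per region, each holding that region's values.
unstacked = {}
for region, value in stacked:
    unstacked.setdefault(region, []).append(value)

print(unstacked)
```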

26
Q

Databases that are not arranged in a predetermined pattern, and therefore in their raw form cannot be stored in a manner similar to a flat file.

A

Unstructured data

27
Q

The data wrangling step in which the accuracy and reliability of the wrangled data are confirmed so they are ready for the ensuing analyses.

A

Validating