Process Data from Dirty to Clean (Terms) Flashcards

1
Q

A range of values that conveys how likely it is that a statistical estimate reflects the population

A

Confidence interval

2
Q

A character that indicates the beginning or end of a data item

A

Delimiter

3
Q

A data value that cannot be left blank or empty

A

Mandatory

4
Q

A file containing a chronologically ordered list of modifications made to a project

A

Changelog

5
Q

A function that removes leading, trailing, and repeated spaces in data

A

TRIM

6
Q

A function that returns a segment from the middle of a text string

A

MID

7
Q

A function that returns a set number of characters from the left side of a text string

A

LEFT

8
Q

A function that returns a set number of characters from the right side of a text string

A

RIGHT

9
Q

A function that returns the length of a text string by counting the number of characters it contains

A

LEN
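
The text functions on the cards above (TRIM, MID, LEFT, RIGHT, LEN) can be approximated in Python as a rough sketch. The helper names `trim`, `mid`, `left`, and `right` are hypothetical, and this `trim` collapses any whitespace rather than only space characters, which is an assumption and not exact spreadsheet behavior.

```python
def trim(text: str) -> str:
    # Drop leading/trailing whitespace and collapse repeated whitespace to one space.
    return " ".join(text.split())


def left(text: str, count: int) -> str:
    # First `count` characters of the string.
    return text[:count]


def right(text: str, count: int) -> str:
    # Last `count` characters of the string (empty when count is 0).
    return text[-count:] if count > 0 else ""


def mid(text: str, start: int, count: int) -> str:
    # `start` is 1-based, as in spreadsheet formulas.
    return text[start - 1:start - 1 + count]


word = "Analyst"
cleaned = trim("  data   cleaning ")   # "data cleaning"
length = len(word)                      # Python's built-in len plays the role of LEN
```

For example, `left(word, 3)` gives `"Ana"`, `right(word, 3)` gives `"yst"`, and `mid(word, 2, 3)` gives `"nal"`.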

10
Q

A group of characters within a cell, most often composed of letters

A

Text string

11
Q

A keyword that is added to a SQL SELECT statement to retrieve only non-duplicate entries

A

DISTINCT
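
A small sketch of DISTINCT, run through Python's built-in `sqlite3` module. The `customers` table and its rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("Ana", "Lima"), ("Ben", "Oslo"), ("Cy", "Lima")],
)

# Without DISTINCT, "Lima" would come back twice.
cities = [
    row[0]
    for row in conn.execute("SELECT DISTINCT city FROM customers ORDER BY city")
]
```

Here `cities` holds each city once: `["Lima", "Oslo"]`.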

12
Q

A number that contains a decimal

A

Float

13
Q

A process that ensures certain conditions for multiple data fields are satisfied

A

Cross-field validation
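
One way to picture cross-field validation is a check that compares several fields of the same record. The sketch below is illustrative only: the `validate_order` function and the field names are hypothetical, and amounts are kept in whole cents to avoid floating-point comparisons.

```python
from datetime import date


def validate_order(record: dict) -> list[str]:
    """Return a list of cross-field problems found in one order record."""
    errors = []
    # Condition 1: an order cannot ship before it was placed.
    if record["ship_date"] < record["order_date"]:
        errors.append("ship_date is before order_date")
    # Condition 2: the total must equal quantity times unit price.
    if record["quantity"] * record["unit_price_cents"] != record["total_cents"]:
        errors.append("total_cents does not match quantity * unit_price_cents")
    return errors


good = {
    "order_date": date(2024, 1, 5),
    "ship_date": date(2024, 1, 7),
    "quantity": 3,
    "unit_price_cents": 250,
    "total_cents": 750,
}
bad = {**good, "ship_date": date(2024, 1, 1), "total_cents": 700}
```

`validate_order(good)` returns an empty list, while `validate_order(bad)` reports both problems.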

14
Q

A process to confirm that a data-cleaning effort was well executed and the resulting data is accurate and reliable

A

Verification

15
Q

A process to determine if a survey or experiment has meaningful results

A

Hypothesis testing

16
Q

A professional who develops processes and procedures to effectively store and organize data

A

Data warehousing specialist

17
Q

A professional who transforms data into a useful format for analysis and gives it a reliable infrastructure

A

Data engineer

18
Q

A rule that says the values in a table must match a prescribed pattern

A

Regular expression (RegEx)
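
As a minimal sketch of matching values against a prescribed pattern, the example below uses Python's `re` module. The US ZIP code pattern and the `is_valid_zip` helper are assumptions chosen for illustration.

```python
import re

# Five digits, optionally followed by a hyphen and four more digits.
ZIP_PATTERN = re.compile(r"\d{5}(-\d{4})?")


def is_valid_zip(value: str) -> bool:
    # fullmatch requires the whole value to fit the pattern, not just a part of it.
    return ZIP_PATTERN.fullmatch(value) is not None


values = ["12345", "12345-6789", "1234", "12345-67"]
valid = [v for v in values if is_valid_zip(v)]
```

Only the first two values fit the pattern, so `valid` is `["12345", "12345-6789"]`.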

19
Q

A spreadsheet function that calculates the number of days, months, or years between two dates

A

DATEDIF
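
The same idea as DATEDIF can be sketched with Python's `datetime` module. The helpers `days_between` and `years_between` are hypothetical names, and only the day and whole-year cases are covered here.

```python
from datetime import date


def days_between(start: date, end: date) -> int:
    # Subtracting two dates gives a timedelta; .days is the whole-day difference.
    return (end - start).days


def years_between(start: date, end: date) -> int:
    # Count completed years: subtract one if the anniversary has not been reached yet.
    not_yet_reached = (end.month, end.day) < (start.month, start.day)
    return end.year - start.year - int(not_yet_reached)


days = days_between(date(2024, 1, 1), date(2024, 3, 1))      # 2024 is a leap year
years = years_between(date(2020, 5, 15), date(2024, 5, 14))  # one day short of 4 years
```

With these inputs, `days` is 60 and `years` is 3.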

20
Q

A spreadsheet function that counts the total number of values within a specified range

A

COUNTA

21
Q

A spreadsheet function that divides text around a specified character and puts each fragment into a new, separate cell

A

SPLIT

22
Q

A spreadsheet function that joins together two or more text strings

A

CONCATENATE

23
Q

A spreadsheet function that returns the number of cells in a range that match a specified value

A

COUNTIF
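
To make the difference between COUNTA and COUNTIF concrete, here is a rough Python sketch. The `counta` and `countif` helpers are hypothetical, a list stands in for a spreadsheet range, and empty cells are modeled as `""` or `None`, which is an assumption.

```python
def counta(cells: list) -> int:
    # Count every cell that holds a value (anything that is not empty).
    return sum(1 for cell in cells if cell not in ("", None))


def countif(cells: list, value) -> int:
    # Count only the cells that equal the specified value.
    return sum(1 for cell in cells if cell == value)


responses = ["yes", "no", "", "yes", None, "maybe"]
filled = counta(responses)
yes_count = countif(responses, "yes")
```

Four of the six cells hold a value, and two of them match `"yes"`.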

24
Q

A spreadsheet function that vertically searches for a certain value in a column to return a corresponding piece of information

A

VLOOKUP
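
The lookup described on this card can be sketched in Python as a scan down the first column of a table. The `vlookup` helper below is hypothetical and handles exact matches only; it returns `None` when nothing matches, which is a simplification of spreadsheet behavior.

```python
def vlookup(search_key, table: list[list], col_index: int):
    # Search the first column top to bottom; col_index is 1-based like in a spreadsheet.
    for row in table:
        if row[0] == search_key:
            return row[col_index - 1]
    return None  # no exact match found


products = [
    ["A100", "Notebook", 3],
    ["B200", "Pen", 1],
    ["C300", "Backpack", 25],
]
name = vlookup("B200", products, 2)
price = vlookup("C300", products, 3)
```

Here `name` is `"Pen"` and `price` is `25`.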

25
Q

A spreadsheet tool that automatically searches for and eliminates duplicate entries from a spreadsheet

A

Remove duplicates

26
Q

A spreadsheet tool that changes how cells appear when values meet specific conditions

A

Conditional formatting

27
Q

A SQL function that adds strings together to create new text strings that can be used as unique keys

A

CONCAT

28
Q

A SQL function that converts data from one datatype to another

A

CAST

29
Q

A SQL function that extracts a substring from a string variable

A

SUBSTR

30
Q

A SQL function that returns the first non-null value in a list

A

COALESCE

31
Q

A SQL statement that returns records that meet conditions by including an if/then statement in a query

A

CASE
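
The SQL cards above (CAST, SUBSTR, COALESCE, CASE) can be tried together in one query. This is a sketch using Python's built-in `sqlite3` module; the `orders` table, its columns, and the price threshold are invented for illustration, and other SQL dialects may differ in details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id TEXT, price TEXT, nickname TEXT, name TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        ("US-1001", "19.50", None, "Dana"),
        ("CA-1002", "5.00", "Lee", "Leonard"),
    ],
)

rows = conn.execute(
    """
    SELECT
        SUBSTR(order_id, 1, 2)          AS country,       -- first two characters
        CAST(price AS REAL)             AS price_num,     -- text converted to a number
        COALESCE(nickname, name)        AS display_name,  -- first non-null of the two
        CASE WHEN CAST(price AS REAL) >= 10
             THEN 'high' ELSE 'low' END AS tier           -- if/then inside the query
    FROM orders
    ORDER BY order_id
    """
).fetchall()
```

The result is `[("CA", 5.0, "Lee", "low"), ("US", 19.5, "Dana", "high")]`: the nickname is used when present, and the name fills in when the nickname is null.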

32
Q

A subset of a text string

A

Substring

33
Q

A tool for checking the accuracy and quality of data

A

Data validation

34
Q

A tool for determining how many characters can be keyed into a spreadsheet field

A

Field length

35
Q

A tool that finds a specified search term and replaces it with something else

A

Find and replace

36
Q

A value that can’t have a duplicate

A

Unique

37
Q

A way of selecting a sample from a population so that every possible type of the sample has an equal chance of being chosen

A

Random sampling
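
A minimal sketch of drawing a random sample with Python's standard `random` module. The population of 100 numbered members and the sample size of 10 are arbitrary, and the seed is fixed only so that repeated runs draw the same sample.

```python
import random

population = list(range(1, 101))  # 100 numbered members of a made-up population

rng = random.Random(42)           # seeded generator, for repeatable draws
# sample() draws without replacement, so no member can be picked twice.
sample = rng.sample(population, k=10)
```

Each member of `population` has the same chance of ending up in `sample`.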

38
Q

An agreement that unites two organizations into a single new one

A

Merger

39
Q

An indication that a value does not exist in a dataset

A

Null

40
Q

Any data that has been superseded by newer and more accurate information

A

Outdated data

41
Q

Any record that inadvertently shares data with another record

A

Duplicate data

42
Q

Converting data from one type to another

A

Typecasting
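
Typecasting in Python, as a brief sketch. The `to_float` helper is hypothetical; it shows one way to handle values that cannot be converted, which comes up often when cleaning data.

```python
def to_float(value):
    # Try the conversion; return None for values that are not numeric.
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


count = int("42")          # text to integer
label = str(42)            # integer to text
price = to_float("3.5")    # text to float
missing = to_float("n/a")  # cannot be converted, so None
```

Returning `None` instead of raising an error lets a cleaning step flag bad values and keep going.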

43
Q

Data that is complete but inaccurate

A

Incorrect/inaccurate data

44
Q

Data that uses different formats to represent the same thing

A

Inconsistent data

45
Q

Data that is complete, correct, and relevant to the problem being solved

A

Clean data

46
Q

Data that is incomplete, incorrect, or irrelevant to the problem to be solved

A

Dirty data

47
Q

Data that is missing important fields

A

Incomplete data

48
Q

How well two or more datasets are able to work together

A

Compatibility

49
Q

Nontechnical traits and behaviors that relate to how people work

A

Soft skills

50
Q

Numerical values that fall between predefined maximum and minimum values

A

Data range

51
Q

Skills and qualities that can transfer from one job or industry to another

A

Transferable skills

52
Q

The accuracy, completeness, consistency, and trustworthiness of data throughout its life cycle

A

Data integrity

53
Q

The average number of people who typically complete a survey

A

Estimated response rate

54
Q

The criteria that determine whether a piece of data is clean and valid

A

Data constraints

55
Q

The degree to which data conforms to constraints when it is input, collected, or created

A

Validity

56
Q

The degree to which data conforms to the actual entity being measured or described

A

Accuracy

57
Q

The degree to which data contains all desired components or measures

A

Completeness

58
Q

The degree to which data is repeatable from different points of entry or collection

A

Consistency

59
Q

The maximum amount that sample results are expected to differ from those of the actual population

A

Margin of error
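
The card gives no formula, so the sketch below assumes the common normal-approximation formula for a sample proportion, z * sqrt(p * (1 - p) / n). The function name `margin_of_error` is hypothetical, and z = 1.96 corresponds to a 95% confidence level.

```python
import math


def margin_of_error(proportion: float, sample_size: int, z: float = 1.96) -> float:
    # Standard error of a proportion, scaled by the z-score for the confidence level.
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


moe = margin_of_error(0.5, 1000)
```

For a sample of 1,000 and an observed proportion of 0.5, `moe` comes out at roughly 0.031, or about 3.1 percentage points.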

60
Q

The number of characters in a text string

A

Length

61
Q

The predetermined structure of a language that includes all required words, symbols, and punctuation, as well as their proper placement

A

Syntax

62
Q

The probability that a sample accurately reflects the greater population

A

Confidence level

63
Q

The probability that a test of significance will recognize an effect that is present

A

Statistical power

64
Q

The probability that sample results are not due to random chance

A

Statistical significance

65
Q

The process of changing data to make it more organized and easier to read

A

Data manipulation

66
Q

The process of combining two or more datasets into a single dataset

A

Data merging

67
Q

The process of copying data from a storage device to computer memory or from one computer to another

A

Data transfer

68
Q

The process of matching fields from one data source to another

A

Data mapping
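
Data mapping can be pictured as a lookup from source field names to target field names. In this sketch, `FIELD_MAP`, `map_record`, and all field names are hypothetical.

```python
# Source field name -> field name used by the destination dataset.
FIELD_MAP = {"cust_name": "customer_name", "zip": "postal_code"}


def map_record(record: dict, field_map: dict) -> dict:
    # Rename mapped fields; fields without a mapping keep their original name.
    return {field_map.get(key, key): value for key, value in record.items()}


source = {"cust_name": "Dana", "zip": "12345", "city": "Lima"}
mapped = map_record(source, FIELD_MAP)
```

`mapped` becomes `{"customer_name": "Dana", "postal_code": "12345", "city": "Lima"}`.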

69
Q

The process of storing data in multiple locations

A

Data replication

70
Q

The process of testing two variations of the same web page to determine which page is more successful at attracting user traffic and generating revenue

A

A/B testing