Course 4: Module 2 Flashcards

1
Q

Dirty data

A

Data that is incomplete, incorrect, or irrelevant to the problem you’re trying to solve

2
Q

Clean data

A

Data that is complete, correct, and relevant to the problem you’re trying to solve

3
Q

Null

A

An indication that a value does not exist in a dataset

4
Q

Duplicate data

A

Any data record that shows up more than once

5
Q

Outdated data

A

Any data that is old and should be replaced with newer, more accurate information

6
Q

Incomplete data

A

Any data that is missing important fields

7
Q

Incorrect/Inaccurate data

A

Any data that is complete but inaccurate

8
Q

Inconsistent data

A

Any data that uses different formats to represent the same thing

9
Q

Field

A

A single piece of information from a row or column of a spreadsheet

10
Q

Data validation

A

A tool for checking the accuracy and quality of data before adding or importing it
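
Example: a minimal Python sketch of the same idea, checking rows before they are added to a dataset. The field names (`name`, `age`), the 0 to 120 range, and the `validate_row` helper are assumptions made up for illustration; a spreadsheet's built-in data validation tool works through menus rather than code.

```python
def validate_row(row):
    """Return a list of problems found in one row; an empty list means it passes."""
    problems = []
    # Hypothetical rule: the name field must not be blank
    if not str(row.get("name") or "").strip():
        problems.append("name is missing")
    # Hypothetical rule: age must be a whole number in a plausible range
    age = row.get("age")
    if not isinstance(age, int) or not 0 <= age <= 120:
        problems.append("age must be a whole number from 0 to 120")
    return problems


incoming = [{"name": "Ana", "age": 34}, {"name": "", "age": 250}]

# Only rows with no problems are allowed into the dataset
accepted = [row for row in incoming if not validate_row(row)]
rejected = [row for row in incoming if validate_row(row)]
```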

11
Q

Data merging

A

The process of combining two or more datasets into a single dataset
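
Example: one possible way to sketch a merge in Python, assuming both datasets share an `id` field. The sample records and the `merge_on_key` helper are hypothetical.

```python
customers = [{"id": 1, "name": "Ana"}, {"id": 2, "name": "Ben"}]
orders = [{"id": 1, "total": 40.0}, {"id": 3, "total": 15.5}]


def merge_on_key(left, right, key):
    """Combine two lists of records into one, joining rows that share a key."""
    merged = {}
    for row in left + right:
        # Rows with the same key value are folded into a single record
        merged.setdefault(row[key], {}).update(row)
    return list(merged.values())


combined = merge_on_key(customers, orders, "id")
```

Records that appear in only one source come out with fields missing (here, ids 2 and 3), which is one way a merge can produce incomplete data.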

12
Q

Compatibility

A

How well two or more datasets are able to work together

13
Q

Common mistakes to avoid

A
  • Not checking for spelling errors
  • Forgetting to document errors
  • Not checking for misfielded values
  • Overlooking missing values
  • Only looking at a subset of the data
  • Not fixing the source of the error
  • Not analyzing the system prior to data cleaning
  • Not backing up your data prior to data cleaning
  • Not accounting for data cleaning in your deadlines/process
14
Q

Conditional formatting

A

A spreadsheet tool that changes how cells appear when values meet specific conditions

15
Q

Remove duplicates

A

A tool that automatically searches for and eliminates duplicate entries from a spreadsheet
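
Example: a rough Python equivalent, assuming each row is a list of cell values and that a duplicate means every cell matches. The `remove_duplicates` function name and sample rows are illustrative only.

```python
def remove_duplicates(rows):
    """Keep the first occurrence of each row and drop exact repeats."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)  # lists are not hashable, so compare rows as tuples
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


rows = [
    ["Ana", "ana@example.com"],
    ["Ben", "ben@example.com"],
    ["Ana", "ana@example.com"],  # exact repeat of the first row
]
cleaned = remove_duplicates(rows)
```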

16
Q

Text string

A

A group of characters within a cell, most often composed of letters

17
Q

Split

A

A tool that divides text around a specified character and puts each fragment into a new, separate cell

18
Q

CONCATENATE

A

A function that joins multiple text strings into a single string
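
Example: Split and CONCATENATE are opposites, which a short Python sketch can show. The sample value and the comma delimiter are assumptions chosen for illustration.

```python
full_name = "Ana,Lopez"

# Split: divide the text around a specified character (here, a comma)
parts = full_name.split(",")

# Concatenate: join text strings back into a single string
joined = parts[0] + " " + parts[1]
```

Here `parts` holds the two fragments that a spreadsheet would place in separate cells, and `joined` is the single recombined string.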

19
Q

COUNTIF

A

A function that returns the number of cells that match a specified value
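
Example: a simplified Python sketch of the counting idea. The `count_if` helper and the sample cells are hypothetical, and this version only handles exact matches, not criteria expressions such as ">5".

```python
def count_if(cells, value):
    """Count how many cells are exactly equal to the given value."""
    return sum(1 for cell in cells if cell == value)


cells = ["paid", "unpaid", "paid", "", "paid"]
paid_count = count_if(cells, "paid")
```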

20
Q

Syntax

A

A predetermined structure that includes all required information and its proper placement

21
Q

LEN

A

A function that tells you the length of a text string by counting the number of characters it contains

22
Q

LEFT

A

A function that gives you a set number of characters from the left side of a text string

23
Q

RIGHT

A

A function that gives you a set number of characters from the right side of a text string

25
Q

MID

A

A function that gives you a segment from the middle of a text string
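
Example: LEN, LEFT, RIGHT, and MID can be approximated with Python string slicing. The helper names and the sample value are assumptions for illustration; note that spreadsheet positions start at 1 while Python indexes start at 0.

```python
def left(text, count):
    """First `count` characters, like LEFT(text, count)."""
    return text[:count]


def right(text, count):
    """Last `count` characters, like RIGHT(text, count)."""
    return text[-count:] if count > 0 else ""


def mid(text, start, count):
    """`count` characters beginning at 1-based position `start`, like MID."""
    return text[start - 1:start - 1 + count]


code = "CA-94043-US"
length = len(code)        # like LEN(code)
state = left(code, 2)
country = right(code, 2)
postal = mid(code, 4, 5)
```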

26
Q

TRIM

A

A function that removes leading, trailing, and repeated spaces in data
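
Example: a small Python sketch of the behavior described above. The `trim` helper is hypothetical and handles only ordinary space characters, not tabs or line breaks.

```python
import re


def trim(text):
    """Remove leading and trailing spaces and collapse repeated spaces to one."""
    return re.sub(r" {2,}", " ", text.strip(" "))


messy = "  Ana   Lopez "
clean = trim(messy)
```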

27
Q

Pivot table

A

A data summarization tool used in data processing to sort, reorganize, group, count, total, or average data

28
Q

VLOOKUP

A

A function that searches for a certain value in a column to return a corresponding piece of information
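
Example: a simplified Python sketch of the lookup idea, assuming an exact match on the first column of the table. The `vlookup` helper and the sample table are illustrative, and this version returns `None` where a spreadsheet would show an error for a missing key.

```python
def vlookup(search_key, table, index):
    """Find search_key in the first column; return the value from column `index` (1-based)."""
    for row in table:
        if row[0] == search_key:
            return row[index - 1]
    return None  # no matching row found


table = [
    ["SKU-1", "Pen", 1.50],
    ["SKU-2", "Notebook", 4.25],
]
price = vlookup("SKU-2", table, 3)
```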

29
Q

Data mapping

A

The process of matching fields from one data source to another
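
Example: a minimal Python sketch of a field mapping, assuming the source uses `cust_name` and `cust_email` while the destination expects `name` and `email`. All field names and the `map_fields` helper are hypothetical.

```python
# Source field name -> destination field name
field_map = {"cust_name": "name", "cust_email": "email"}


def map_fields(record, field_map):
    """Rename a record's fields using the mapping; unmapped fields keep their names."""
    return {field_map.get(field, field): value for field, value in record.items()}


source_record = {"cust_name": "Ana", "cust_email": "ana@example.com", "id": 1}
mapped = map_fields(source_record, field_map)
```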

31
Q

Schema

A

A way of describing how something is organized