Module 2 Dirty to Clean Data Flashcards

1
Q

Data Mapping

A

The process of matching fields from one data source to another.

Data mapping also shows how data has changed and been transformed between one database and another. It is integral to the success of data migration and integration.
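
As a rough illustration, field matching can be sketched in Python with a dictionary that pairs source field names with target field names. The field names and the map_record helper below are hypothetical and only stand in for whatever two data sources are being mapped.

```python
# Hypothetical mapping from source field names to target field names.
FIELD_MAP = {
    "cust_name": "full_name",
    "cust_email": "email",
    "zip": "postal_code",
}

def map_record(record, field_map):
    """Return a copy of record with its keys renamed according to field_map.

    Fields that have no entry in field_map are left out of the result.
    """
    return {field_map[key]: value for key, value in record.items() if key in field_map}

source_row = {
    "cust_name": "Ana Silva",
    "cust_email": "ana@example.com",
    "zip": "10001",
    "notes": "n/a",
}
target_row = map_record(source_row, FIELD_MAP)
print(target_row)
```

Dropping unmapped fields is a design choice made for this sketch; a real migration might instead flag them for review.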

2
Q

Compatibility

A

Describes how well two or more datasets are able to work together.

3
Q

Schema

A

A way of describing how something is organised.

4
Q

Primary key

A

An identifier that references a column in which each value is unique.

5
Q

Foreign Key

A

A field within a table that is a primary key in another table.
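
The relationship between the two kinds of key can be sketched with two small in-memory tables. The table names, column names, and values below are hypothetical and chosen only for illustration.

```python
# Hypothetical "customers" table: customer_id is the primary key (each value is unique).
customers = [
    {"customer_id": 1, "name": "Ana"},
    {"customer_id": 2, "name": "Ben"},
]

# Hypothetical "orders" table: order_id is its primary key,
# and customer_id is a foreign key pointing at customers.customer_id.
orders = [
    {"order_id": 101, "customer_id": 2, "total": 40.0},
    {"order_id": 102, "customer_id": 1, "total": 15.5},
    {"order_id": 103, "customer_id": 2, "total": 9.0},
]

# Index customers by primary key so each foreign key can be looked up directly.
customers_by_id = {row["customer_id"]: row for row in customers}

# A primary key must be unique, so the index has one entry per customer row.
assert len(customers_by_id) == len(customers)

# Follow each order's foreign key to find the matching customer name.
order_names = [(o["order_id"], customers_by_id[o["customer_id"]]["name"]) for o in orders]
print(order_names)
```

Note that the foreign key value 2 appears twice in orders, while it appears only once as a primary key in customers.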

6
Q

CONCATENATE

A

A function that joins together two or more text strings.
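
For comparison, the same idea of joining text strings can be sketched in Python. The variable names and values are made up for this example, and the spreadsheet formula in the comment assumes the names sit in cells A2 and B2.

```python
first_name = "Ada"
last_name = "Lovelace"

# Similar in spirit to =CONCATENATE(A2, " ", B2) in a spreadsheet.
full_name = first_name + " " + last_name

# str.join does the same for any number of strings.
product_code = "".join(["SKU", "-", "0042"])
print(full_name, product_code)
```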

7
Q

.csv files

A

CSV (comma-separated values) files are plain text files with an organised table structure of rows and columns. The values in each row are separated by commas. This table structure makes them easy to understand, edit, manipulate, and use for data analysis.
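
A minimal sketch of reading this structure with Python's standard csv module follows. The CSV content is invented for the example and is held in a string rather than a file so the snippet stays self-contained.

```python
import csv
import io

# Hypothetical CSV content; in practice this would come from a .csv file on disk.
csv_text = "name,city,age\nAna,Lisbon,34\nBen,Leeds,29\n"

# csv.DictReader uses the first row as column names and yields one dict per row.
rows = list(csv.DictReader(io.StringIO(csv_text)))

print(rows[0]["city"])
print(len(rows))
```

Every value comes back as text, including age, so numeric columns need converting before analysis.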

8
Q

COUNTIF

A

A function that returns the number of cells that match a specified condition.
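
The counting logic can be approximated in Python as below. The count_if helper and the status values are hypothetical, and the spreadsheet formula in the comment assumes the statuses sit in cells A2 to A6.

```python
# Hypothetical column of order statuses.
statuses = ["shipped", "pending", "shipped", "cancelled", "shipped"]

def count_if(values, condition):
    """Count the values for which condition(value) is true."""
    return sum(1 for value in values if condition(value))

# Comparable to =COUNTIF(A2:A6, "shipped") in a spreadsheet.
shipped_count = count_if(statuses, lambda status: status == "shipped")
print(shipped_count)
```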

9
Q

SPLIT

A

The SPLIT function divides text around a specified character or string and puts each fragment of text into a separate cell in the row.

=SPLIT(F2, "-")
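
A rough Python counterpart is the built-in str.split method. The cell value below is an assumption made for the example, since the card does not say what F2 contains.

```python
# Hypothetical cell value, standing in for the contents of F2.
cell_value = "2023-08-15"

# str.split divides the text at each "-" and returns the fragments in order.
fragments = cell_value.split("-")
print(fragments)
```

Each element of the resulting list corresponds to one of the separate cells the spreadsheet function would fill.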

10
Q

Your cleaning checklist

A
  • Determine the size of the dataset
  • Determine the number of categories or labels
  • Identify missing data
  • Identify unformatted data
  • Explore the different data types
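
The checklist above can be sketched as a small profiling routine in Python. The sample rows, the column names, the profile helper, and the use of None to mark a missing value are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical dataset: a list of rows, where None marks a missing value.
rows = [
    {"region": "North", "sales": 120.0, "date": "2023-01-05"},
    {"region": "South", "sales": None, "date": "2023-01-06"},
    {"region": "north", "sales": 95.5, "date": "06/01/2023"},
]

def profile(rows):
    """Summarise size, categories, missing values, and data types per column."""
    columns = list(rows[0].keys())
    report = {"row_count": len(rows), "column_count": len(columns), "columns": {}}
    for col in columns:
        values = [row[col] for row in rows]
        present = [v for v in values if v is not None]
        report["columns"][col] = {
            "distinct": len(set(present)),
            "missing": len(values) - len(present),
            "types": dict(Counter(type(v).__name__ for v in present)),
        }
    return report

report = profile(rows)
print(report["row_count"], report["columns"]["sales"])
```

In this sample the region column reports three distinct values where two were probably intended, because "North" and "north" differ only in capitalisation. That kind of signal points to unformatted data.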
11
Q

Data cleaning - Determine size of dataset

A

Large datasets may have more data quality issues and take longer to process. This may impact your choice of data cleaning techniques and how much time to allocate to the project.

12
Q

Determine number of categories or labels

A

By understanding the number and nature of categories and labels in a dataset, you can better understand the diversity of the dataset. This understanding also helps inform data merging and migration strategies.

13
Q

Identify missing data

A

Recognizing missing data helps you understand data quality so you can take appropriate steps to remediate the problem. Data integrity is important for accurate and unbiased analysis.

14
Q

Identify unformatted data

A

Identifying improperly or inconsistently formatted data helps analysts ensure data uniformity. This is essential for accurate analysis and visualization.

15
Q

Explore the different data types

A

Understanding the types of data in your dataset (for instance, numerical, categorical, text) helps you select appropriate cleaning methods and apply relevant data analysis techniques.

16
Q

Clean Data

A

Data that is complete, correct, and relevant to the problem being solved.

17
Q

Conditional Formatting

A

A spreadsheet tool that changes how cells appear when values meet specific conditions.

18
Q

Data Merging

A

The process of combining two or more datasets into a single dataset.

19
Q

Data Validation

A

A tool for checking the accuracy and quality of data.

20
Q

Delimiter

A

A character that indicates the beginning or end of a data item.

21
Q

Dirty Data

A

Data that is incomplete, incorrect, or irrelevant to the problem to be solved.

22
Q

Duplicate Data

A

Any record that inadvertently shares data with another record.
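
One simple way to remove exact duplicates, sketched in Python, is to keep the first occurrence of each record. The records below are hypothetical, and this approach only catches records that match on every field.

```python
# Hypothetical records; the first and third rows are exact duplicates.
records = [
    ("ana@example.com", "Ana", "Lisbon"),
    ("ben@example.com", "Ben", "Leeds"),
    ("ana@example.com", "Ana", "Lisbon"),
]

seen = set()
unique_records = []
for record in records:
    # Keep only the first occurrence of each record, preserving order.
    if record not in seen:
        seen.add(record)
        unique_records.append(record)

print(unique_records)
```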

23
Q

Field Length

A

A tool for determining how many characters can be keyed into a spreadsheet field.

24
Q

Incomplete Data

A

Data that is missing important fields.

25
Q

Inconsistent data

A

Data that uses different formats to represent the same thing.

26
Q

Incorrect/Inaccurate Data

A

Data that is complete but inaccurate.

27
Q

TRIM

A

A function that removes leading, trailing, and repeated spaces in data.
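
A similar clean-up can be approximated in Python as below. The trim helper is hypothetical, and unlike a spreadsheet function that targets space characters, this sketch treats tabs and line breaks as spaces too.

```python
def trim(text):
    """Strip leading and trailing whitespace and collapse repeated whitespace to one space."""
    # str.split() with no arguments splits on runs of whitespace and drops empty pieces.
    return " ".join(text.split())

cleaned = trim("   Data   cleaning  basics ")
print(repr(cleaned))
```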

28
Q

Unique

A

A value that can’t have a duplicate.
