ETL Flashcards

1
Q

What is Data?

A

is anything we can quantify whether it be figures, numbers, texts, pictures, values, dates, etc. (Researching & absorbing)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What is Information?

A

Is any meaningful insight we can gather from the data. Any conclusions we can make by reviewing the data (doing)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

What is Knowledge?

A

Gathering of Information to understand what is happening in the Data (interacting)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

What is Wisdom?

A

Using that knowledge to make future decisions (reflecting)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

What is Structured Data?

A

Eazy to analyze and turn into information

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What is Unstructured Data?

A

Difficult or tedious to extract the data into information

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

What is Semi-Structured Data?

A

Some structure on how the data is presented but may have tough parts to analyze or difficult to extract

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What are the Data Quality Traits?

A

Consistent, Relevant, Complete, Accurate, Granular

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Describe Consistent Data

A

Data doesn’t contradict or conflict with itself

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Describe Relevant Data

A

Data is up-to-date ( Is this data that we still need )

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Describe Complete Data

A

No missing records or fields (when using that data in comparison to other data. The data should have same fields)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Describe Accurate Data

A

Represents reality ( The collected data should be realistic and not skewed)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

Describe Granular Data

A

How specific is our data ( We need to determine how specific our data is. What kind of conclusion can we draw with how specific our data is )

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

What is Enterprise Data?

A

Data (Data that is collected across the aspect of our business)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

What are the different types of Enterprise Data?

A

Transactional Data, Analytical Data, Master Data, Reference Data, Reporting Data, Meta Data

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

What is transactional Data?

A

data that is used in on-going operations within the company.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

What is analytical Data?

A

numerical values or metrics that provide business intelligence and help make business decisions.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

What is Master Data?

A

Data that is persistent and define the primary business entities within the company. This could include

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

What is Reference Data?

A

Subset of master data. Any information that is usually standardized by some other party.

20
Q

What is Reporting Data?

A

Aggregated data compiled for the purpose of analytics

21
Q

What is Meta Data?

A

Data about the data.

22
Q

What is OLTP (Online Transaction Processing) ?

A

A data system that is based off transactions. Is optimized to run as many concurrent transactions as possible. Focuses on very fast write and update capabilities. Highly normalized and relational. For Client Facing Applications

23
Q

What is OLAP (Online Analysis Processing) ?

A

A data system for business facing applications. The databases are optimized for reads/querying. Is a denormalized database. Easier to store

24
Q

What is ETL?

A

EXTRACT TRANSFORM LOAD - A procedure of copying data from one or more sources into a new location called a data target.

25
Q

What is Extract? And the different types

A

taking data from a data source’

Full Extraction - Taking all data from a data source

Incremental Extraction - taking only new data from a data source

26
Q

What is transformation? And the different types

A

Converting/modifying extracted data to match a desired data target.

Cleaning(Making data Consistent), DeDuplication, Format Revision, Key Restructing

Advanced: Validation, filtering, derivation

27
Q

What is load? And the different types

A

Adding the data into the data target, such as an ODS, Staging area, Data Warehouse, or Data Mart.

  • Initial Load - populating all the data warehouse tables
  • Incremental Load - Adding in new incremenetal data periodically as needed.
  • Full Refresh - Removing all data from one or more tables and reloading all fresh data. (kind of truncating and reloading)
28
Q

What is a data source?

A
  • Anywhere that we are gathering our data from ( could be data gathered from structured/unstructured/semistructured)
  • Client facing applications
  • External Data
29
Q

What is a data warehouse?

A

Single central hub for the entire company’s dataset (All of the Enterprise Data). Everything extracted from all data sources.

30
Q

What is a staging area?

A

is a temporary holding location for data that has just been extracted (or potentially through the ETL process)

31
Q

What is an Operational Data Store (ODS)

A

interim area used before a Data Warehouse

Provides a snapshop of the latest data from multiple data sources

32
Q

What is a Data Mart?

A

A custom designed database for business intelligence for a specific department / aspect of your company that you wish to draw conclusions from Dimensional Database

33
Q

What is a Dimensional Cube?

A

A visual representation of the relationship between fact tables and dimensions

34
Q

What is Data Mining?

A

Process of collecting the data, and modeling the data to find relationships / patterns, and making business decisions based on these models

35
Q

What is Data Purging?

A

Process of deleting data from a data warehouse. The frequency of a data purge depend entirely on the business requirements. Data is often kept
for 5-10 years before it is purged

36
Q

What is ETL Testing?

A

The process of validating, verifying, and qualifying data while preventing duplicate records and data loss

37
Q

What is the ETL Testing Lifecycle?

A

Identify business requirements, Validate our data sources, Design Test cases, ETL, Summary Report, Test Closure

38
Q

What are some different types of ETL tests?

A

Source to Target Testing, Data Transformation Testing, Data Integration Testing

39
Q

How would you do a source to target Testing with sql?

A

Can select count(*) from one table and ensure that count equals our target and source

40
Q

How would you do data transformation test with sql?

A

Changing null values to 0 - SELECT ISNULL(myColumn, 0 ) FROM myTable

Transforming with replace - Update , Set (Replace), Where

41
Q

What is Data Migration Testing?

A

Process of verifying that data migrates from source to target with no loss of information/data

42
Q

What is data validation?

A

Method for checking the accuracy and quality of your data

43
Q

How would you do some data validation testing with sql?

A

isnull function, update, replace

44
Q

Some ETL Testing Scenarios :

A

Validation, Null validation, Duplicate Validation, Data Cleaning

45
Q

What is business Reporting?

A

Process of gathering data by utilizing different software and tools to extract relavent information.

Can use google data studios to create this visual representation

46
Q

How would you check for duplicates?

A

SELECT name, COUNT(email)
FROM users
GROUP BY email
HAVING COUNT(email) > 1

You can select by what you to output with the count of what duplicates you’re looking for from the table you want and group by whatever duplicate you’re looking for with having count greater than 1