Combining Data Flashcards

1
Q

What are the Benefits of Combining Data?

A
  1. Complete & comprehensive view of a situation = better understanding & accurate analysis πŸͺŸ
  2. Broader range of data so organisations can make more informed decisions 🏌️
  3. Processes streamlined as data accessed from one place 🏞️
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What are the risks of Combining Data?

A
  1. DQ = sources have different formats = inconsistencies πŸ’Ύ
  2. Different terms for the same concept leading to inaccuracies πŸ“–
  3. Risk of data breaches ⚠️
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

What is a Full Outer Join?

A

includes all rows from both tables, regardless of whether there is a match, populating non-matches with nulls

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Why did you use a Full Outer Join? What are the Benefits?

A
  1. I needed to identify rows present in one table but not another (using NULLS)
  2. Ensured no data is lost from both tables
  3. A left join would only keep alfabet apps and see if they were logged in Collibra (matches), but I needed both to raise to stakeholders
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

What are the risks of a Full Outer Join?

A
  1. Creates a large data set (storage/processing issues) - not an issue for my Project as datasets were small
  2. Introducing NULL values can be challenging to process (changed to a cross to help this)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What is a Right Join?

A

displays rows from the right table along with any rows matching from the left table, showing null values when there is no match

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

What is a right join good for?

A

focusing on data from the right table while adding insights from the left table

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What is a left join?

A

all rows from the left table are included with any matching rows from the right table. NULL values used if there is no match found in the right table

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

What is a disadvantage of a left join?

A

And other types!
* Impact on performance when scanning for matching rows - time consuming
* Compare apps added to Collibra but not Alfabet to alert stakeholders outside the project
* Therefore Left Join was not suitable

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

What is an inner join?

A
  • Combines rows from multiple tables
  • Connecting them through a related column
  • Only including rows with matching values
  • E.g. apps found in both Alfabet and Collibra
  • Rows without matching ID’s would have been excluded
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

What is a benefit of an inner join?

A
  1. Simple βž– straight forward πŸ“
  2. Great for large data sets, filtering out matching rows πŸ”πŸƒπŸƒto improve analysis efficiency
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

What is a disadvantage of an inner join?

A
  1. data loss risk since
  2. Did not bring back all rows from both tables, so would not work for my use case
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

What is a join?

A

combines 2 or more datasets based on a common column or key allowing data to be retrieved and analysed together

How well did you know this?
1
Not at all
2
3
4
5
Perfectly