Combining Data Flashcards
What are the Benefits of Combining Data?
- Complete & comprehensive view of a situation = better understanding & accurate analysis πͺ
- Broader range of data so organisations can make more informed decisions ποΈ
- Processes streamlined as data accessed from one place ποΈ
What are the risks of Combining Data?
- DQ = sources have different formats = inconsistencies πΎ
- Different terms for the same concept leading to inaccuracies π
- Risk of data breaches β οΈ
What is a Full Outer Join?
includes all rows from both tables, regardless of whether there is a match, populating non-matches with nulls
Why did you use a Full Outer Join? What are the Benefits?
- I needed to identify rows present in one table but not another (using NULLS)
- Ensured no data is lost from both tables
- A left join would only keep alfabet apps and see if they were logged in Collibra (matches), but I needed both to raise to stakeholders
What are the risks of a Full Outer Join?
- Creates a large data set (storage/processing issues) - not an issue for my Project as datasets were small
- Introducing NULL values can be challenging to process (changed to a cross to help this)
What is a Right Join?
displays rows from the right table along with any rows matching from the left table, showing null values when there is no match
What is a right join good for?
focusing on data from the right table while adding insights from the left table
What is a left join?
all rows from the left table are included with any matching rows from the right table. NULL values used if there is no match found in the right table
What is a disadvantage of a left join?
And other types!
* Impact on performance when scanning for matching rows - time consuming
* Compare apps added to Collibra but not Alfabet to alert stakeholders outside the project
* Therefore Left Join was not suitable
What is an inner join?
- Combines rows from multiple tables
- Connecting them through a related column
- Only including rows with matching values
- E.g. apps found in both Alfabet and Collibra
- Rows without matching IDβs would have been excluded
What is a benefit of an inner join?
- Simple β straight forward π
- Great for large data sets, filtering out matching rows πππto improve analysis efficiency
What is a disadvantage of an inner join?
- data loss risk since
- Did not bring back all rows from both tables, so would not work for my use case
What is a join?
combines 2 or more datasets based on a common column or key allowing data to be retrieved and analysed together