WEEK 4: Verifying and reporting results Flashcards
Verification
A process to confirm that a data-cleaning effort was well executed and that the resulting data is accurate and reliable.
Verifying your data is important because it lets you double-check that the work you did to clean your data was thorough and accurate.
Verification lets you catch mistakes before you begin analysis. Without it, any insights you gain from analysis can’t be trusted for decision-making.
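As a rough illustration of catching mistakes before analysis, the sketch below checks cleaned rows for two common leftovers: blank required fields and duplicate keys. The function name `verify_cleaned_rows`, the field names, and the sample rows are all made up for this example; real verification would cover whatever problems your cleaning was meant to fix.

```python
def verify_cleaned_rows(rows, required_fields, key_field):
    """Return a list of human-readable problems found in the cleaned rows."""
    problems = []
    seen_keys = set()
    for index, row in enumerate(rows):
        # A required field counts as missing if it is absent, None, or blank.
        for field in required_fields:
            value = row.get(field)
            if value is None or str(value).strip() == "":
                problems.append(f"row {index}: missing {field}")
        # The key field should be unique across the cleaned data.
        key = row.get(key_field)
        if key in seen_keys:
            problems.append(f"row {index}: duplicate {key_field} {key!r}")
        seen_keys.add(key)
    return problems


# Hypothetical sample data, for illustration only.
cleaned = [
    {"customer_id": 1, "email": "a@example.com"},
    {"customer_id": 2, "email": ""},
    {"customer_id": 1, "email": "c@example.com"},
]
problems = verify_cleaned_rows(cleaned, ["customer_id", "email"], "customer_id")
for problem in problems:
    print(problem)
```

An empty list would suggest the data passed these particular checks; it does not prove the data is correct in every other respect.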
Reports
Reports are a super effective way to show your team that you’re being 100 percent transparent about your data cleaning.
Reporting is also a great opportunity to show stakeholders that you’re accountable, build trust with your team, and make sure you’re all on the same page about important project details.
Changelog
A document used to record the notable changes made to a project over its lifetime across all of its tasks.
Capturing cleaning changes
This involves documentation, which is the process of tracking the changes, additions, deletions, and errors involved in your data-cleaning effort.
Having a record of how a data set evolved does three very important things. First, it lets us recover from data-cleaning errors.
It’s also a good idea to create a clean table rather than overwriting your existing table.
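One way to keep the original table intact is to write the cleaned rows into a new table. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the table names `customers_raw` and `customers_clean` and the sample rows are assumptions made for this example, and other database systems may use slightly different syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers_raw (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO customers_raw VALUES (?, ?)",
    [(1, "  Ana "), (2, "Ben"), (2, "Ben"), (3, None)],
)

# Write the cleaned rows to a NEW table; customers_raw is left untouched,
# so a cleaning mistake can be undone by rebuilding customers_clean.
conn.execute(
    """
    CREATE TABLE customers_clean AS
    SELECT DISTINCT id, TRIM(name) AS name
    FROM customers_raw
    WHERE name IS NOT NULL
    """
)

raw_count = conn.execute("SELECT COUNT(*) FROM customers_raw").fetchone()[0]
clean_rows = conn.execute(
    "SELECT id, name FROM customers_clean ORDER BY id"
).fetchall()
print(raw_count, clean_rows)
```

Because the raw table still holds all of its original rows, you can compare the two tables at any time to see exactly what the cleaning removed or changed.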
Second, documentation gives you a way to inform other users of changes you’ve made.
Third, documentation helps you to determine the quality of the data to be used in analysis.
The way you create and view a changelog with SQL depends on the software program you’re using.
Essentially, all you have to do is specify exactly what you did and why when you commit a query to the repository as a new and improved query. This allows the company to revert to a previous version if something you’ve done crashes the system.
Another option is to just add comments as you go while you’re cleaning data in SQL.
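A minimal sketch of commenting as you go: the SQL below carries `--` comments that say what the step does and why, and is run through Python's built-in `sqlite3` module. The table `orders_clean`, its columns, and the data are hypothetical, chosen only to make the example runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders_clean (order_id INTEGER, country TEXT)")
conn.executemany(
    "INSERT INTO orders_clean VALUES (?, ?)",
    [(1, "us"), (2, " US "), (3, "ca")],
)

# The comments inside the SQL travel with the query wherever it is saved.
CLEANING_SCRIPT = """
-- What: trim whitespace and upper-case the country column.
-- Why: free-text entry produced 'us' and ' US ' for the same country.
UPDATE orders_clean
SET country = UPPER(TRIM(country));
"""
conn.executescript(CLEANING_SCRIPT)

countries = [
    row[0]
    for row in conn.execute("SELECT country FROM orders_clean ORDER BY order_id")
]
print(countries)
```

Comments like these do not replace a changelog, but they keep the reasoning next to the query for whoever reads it later.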
Difference between the Sheets version history feature and changelogs
Version histories record what was done in a data change for a project, but don’t tell us why. Changelogs are super useful for helping us understand the reasons changes have been made.
A changelog records these types of information
Data, file, formula, query, or any other component that changed
Description of what changed
Date of the change
Person who made the change
Person who approved the change
Version number
Reason for the change
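The items listed above could be captured as one record per change. The sketch below is only one possible layout: the class name `ChangelogEntry`, its field names, and every value in the sample entry are placeholders invented for this example.

```python
from dataclasses import dataclass, fields
from datetime import date


@dataclass(frozen=True)
class ChangelogEntry:
    component: str      # data, file, formula, query, or other component that changed
    description: str    # what changed
    changed_on: date    # date of the change
    changed_by: str     # person who made the change
    approved_by: str    # person who approved the change
    version: str        # version number
    reason: str         # why the change was made


# Placeholder values, for illustration only.
entry = ChangelogEntry(
    component="orders_clean.country",
    description="Trimmed whitespace and upper-cased country codes",
    changed_on=date(2024, 1, 15),
    changed_by="Analyst A",
    approved_by="Manager B",
    version="1.1.0",
    reason="Free-text entry produced inconsistent country codes",
)
field_names = [f.name for f in fields(ChangelogEntry)]
print(field_names)
```

Making every field required means an entry cannot be created without a reason, which is the piece a version history alone does not record.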
A good changelog
Changelogs are for humans, not machines, so write legibly.
Every version should have its own entry.
Each change should have its own line.
Group the same types of changes. For example, Fixed should be grouped separately from Added.
Versions should be ordered chronologically starting with the latest.
The release date of each version should be noted.
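To make these guidelines concrete, here is a small sketch that prints a changelog with one entry per version, one line per change, changes grouped by type, and the latest version first. The function `render_changelog`, the dictionary layout, and the sample releases are assumptions for this example, not a standard format.

```python
from collections import defaultdict


def render_changelog(releases):
    """Render releases as plain text, newest version first.

    Each release is a dict with 'version', 'date' (ISO string, YYYY-MM-DD),
    and 'changes' (a list of (change_type, text) pairs).
    """
    lines = []
    # ISO dates sort correctly as strings, so reverse order puts the latest first.
    for release in sorted(releases, key=lambda r: r["date"], reverse=True):
        lines.append(f"Version {release['version']} ({release['date']})")
        grouped = defaultdict(list)
        for change_type, text in release["changes"]:
            grouped[change_type].append(text)
        # Same types of changes are listed together, one change per line.
        for change_type in sorted(grouped):
            lines.append(f"  {change_type}")
            for text in grouped[change_type]:
                lines.append(f"    - {text}")
    return "\n".join(lines)


# Hypothetical releases, for illustration only.
releases = [
    {"version": "1.0.0", "date": "2024-01-10",
     "changes": [("Added", "Initial import of survey responses")]},
    {"version": "1.1.0", "date": "2024-02-02",
     "changes": [("Fixed", "Corrected misspelled state names"),
                 ("Added", "Column for response language"),
                 ("Fixed", "Removed duplicate rows")]},
]
changelog_text = render_changelog(releases)
print(changelog_text)
```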
Feedback and cleaning
Clean data is important to the task at hand. But the data-cleaning process itself can reveal insights that are helpful to a business. The feedback we get when we report on our cleaning can transform data collection processes, and ultimately business development.
With consistent documentation and reporting, we can uncover error patterns in data collection and entry procedures and use the feedback we get to make sure common errors aren’t repeated.
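If cleaning fixes are logged consistently, spotting a recurring error can be as simple as counting. The sketch below tallies logged fixes by data source and error type; the log structure, the source names, and the error labels are all invented for this example.

```python
from collections import Counter

# Hypothetical cleaning log: one record per fix made during cleaning.
cleaning_log = [
    {"source": "web_form", "error": "misspelled state"},
    {"source": "web_form", "error": "misspelled state"},
    {"source": "call_center", "error": "missing phone number"},
    {"source": "web_form", "error": "misspelled state"},
    {"source": "call_center", "error": "duplicate record"},
]

# Count how often each (source, error) combination was fixed.
pattern_counts = Counter(
    (record["source"], record["error"]) for record in cleaning_log
)
most_common_pattern, occurrences = pattern_counts.most_common(1)[0]
print(most_common_pattern, occurrences)
```

A combination that keeps topping the count is a candidate for feedback to whoever owns that collection step, for example by replacing a free-text field with a drop-down list.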
In more extreme cases, the feedback we get can even send us back to the drawing board to rethink expectations and possibly update quality control procedures. Sometimes it’s useful to schedule a meeting with a data engineer or data owner to make sure the data is brought in properly and doesn’t require constant cleaning.