Term Glossary (Topic 1.4-1.6) Flashcards
State the three stages of data profiling
1) Create simple summary statistics (Counts, means, min/max)
2) Check data quality
3) Identify problems for future data integrations (mislabelled columns, inconsistent data formats)
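Stage 1 can be sketched in a few lines of Python. This is a minimal illustration only: the function name profile_column and the sample ages list are hypothetical, and None is assumed to mark a missing value.

```python
from statistics import mean

def profile_column(values):
    """Return simple summary statistics for one numeric column.

    Assumption for this sketch: None marks a missing value.
    """
    present = [v for v in values if v is not None]
    return {
        "count": len(values),                      # total entries
        "missing": len(values) - len(present),     # entries with no value
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "mean": mean(present) if present else None,
    }

ages = [34, 29, None, 41, 29]  # made-up sample data
summary = profile_column(ages)
print(summary)
```

A high "missing" count or an implausible min/max is the kind of result that feeds into stage 2 (checking data quality).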
What are the four options when an error is found?
1) Accept it - Keep the error (perhaps flag it or offer an explanation)
2) Reject the data entry (remove the entry)
3) Correct the error (if possible, identify and amend the data entry)
4) Create default value (replace the error with a set value to help with data consistency)
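The four options above could be sketched as a single function. The name handle_error, its parameters and the "flagged" key are all hypothetical choices for illustration, not part of any standard tool.

```python
def handle_error(record, field, option, corrected=None, default=None):
    """Apply one of the four options to a record whose `field` is in error.

    Returns the resulting record, or None if the record is rejected.
    """
    result = dict(record)  # copy, so the original record is left untouched
    if option == "accept":
        result["flagged"] = True      # keep the value but flag it
    elif option == "reject":
        return None                   # remove the entry
    elif option == "correct":
        result[field] = corrected     # amend the entry with the right value
    elif option == "default":
        result[field] = default       # replace the error with a set value
    else:
        raise ValueError(f"unknown option: {option}")
    return result

bad = {"id": 7, "age": -3}  # made-up record with an impossible age
print(handle_error(bad, "age", "accept"))
print(handle_error(bad, "age", "reject"))
print(handle_error(bad, "age", "correct", corrected=30))
print(handle_error(bad, "age", "default", default=0))
```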
What does CUVCAT stand for?
C - Completeness, U - Uniqueness, V - Validity, C - Consistency, A - Accuracy, T - Timeliness
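As a rough illustration, the first three dimensions can be measured as simple ratios. The function names, the sample emails list and the "contains @" validity rule are assumptions made for this sketch. The remaining three dimensions usually need a second dataset, a trusted reference or timestamps, so they are not shown.

```python
def completeness(values):
    """Share of values that are present (not None)."""
    return sum(v is not None for v in values) / len(values)

def uniqueness(values):
    """Share of present values that are distinct."""
    present = [v for v in values if v is not None]
    return len(set(present)) / len(present)

def validity(values, is_valid):
    """Share of present values that pass a caller-supplied rule."""
    present = [v for v in values if v is not None]
    return sum(is_valid(v) for v in present) / len(present)

# Made-up sample column: one missing value, one duplicate, one malformed entry.
emails = ["a@x.com", "b@x.com", None, "a@x.com", "not-an-email"]

print(completeness(emails))
print(uniqueness(emails))
print(validity(emails, lambda v: "@" in v))
```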
Describe ‘Data Migration’
The physical movement of data from one source to a destination.
Explain ‘Master Data Management (MDM)’
This describes identifying, protecting and properly handling data that is core to business operations. It is important to identify the specific datasets that are critically sensitive.
State the three elements that should be considered within ‘Integration Design’
1) Rules and Requirements
2) Objectives and Deliverables
3) Support models and SLAs
Describe ‘Rules and Requirements’ with relation to Integration Design
An organisation will likely have a set of rules and requirements that govern how its data is integrated, so that it remains legally compliant, maintains security, retains performance, etc.
Describe ‘Support models and SLAs’ with relation to Integration Design
Database models (recap from Data analysis concepts) should be set up to support easy data integration. Remember that even dashboards linked to multiple tables are examples of data integration. SLAs can define the level of output required from the data integration system.
State the three elements that should be considered within ‘Data Integration Tools’
1) Future Scalability
2) Implementation
3) Support Costs
Describe ‘Future Scalability’ & state three requirements for future scalability with relation to Data Integration Tools
When setting up databases, it is important to account for the possibility that further tables may need to be added in future. To allow for this, make sure that:
All tables have primary keys (even tables that do not connect to other tables).
Field names are consistent (so that columns can be matched with equivalent columns in other tables).
Data formats are consistent.
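Two of these requirements can be checked automatically. The sketch below assumes each table is described by a small dictionary. That layout, the function name check_schemas and the example tables are hypothetical. It reports tables with no primary key and same-named fields stored in different formats. It cannot spot equivalent columns that have been given different names.

```python
def check_schemas(schemas):
    """Return a list of problems found across table schemas.

    `schemas` maps table name -> {"primary_key": str or None,
                                  "fields": {field_name: type_name}}.
    """
    problems = []
    seen = {}  # field name -> (table, type) where it was first seen
    for table, schema in schemas.items():
        if not schema.get("primary_key"):
            problems.append(f"{table}: no primary key")
        for field, type_name in schema["fields"].items():
            if field in seen and seen[field][1] != type_name:
                first_table, first_type = seen[field]
                problems.append(
                    f"{field}: {first_type} in {first_table}, "
                    f"{type_name} in {table}"
                )
            seen.setdefault(field, (table, type_name))
    return problems

# Made-up schemas: "orders" has no key and stores customer_id as text.
schemas = {
    "customers": {"primary_key": "customer_id",
                  "fields": {"customer_id": "int", "joined": "date"}},
    "orders": {"primary_key": None,
               "fields": {"order_id": "int", "customer_id": "str"}},
}
problems = check_schemas(schemas)
for problem in problems:
    print(problem)
```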
Describe ‘Implementation’ with relation to Data Integration Tools
A major issue is combining data that was previously measured or recorded in different ways. This would lead to inconsistent data stores, so it has to be resolved before integration.
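For example, two sources might record the same measurement in different units. The sketch below converts everything to one unit before combining. The sources, the to_cm helper and the choice of centimetres as the common unit are assumptions for illustration.

```python
CM_PER_INCH = 2.54

def to_cm(value, unit):
    """Convert a length to centimetres; only 'cm' and 'in' are handled here."""
    if unit == "cm":
        return float(value)
    if unit == "in":
        return value * CM_PER_INCH
    raise ValueError(f"unknown unit: {unit}")

source_a = [(170, "cm"), (182, "cm")]  # made-up heights recorded in centimetres
source_b = [(70, "in")]                # made-up height recorded in inches

# Convert first, then combine, so the integrated data uses one unit throughout.
combined = [round(to_cm(value, unit), 1) for value, unit in source_a + source_b]
print(combined)
```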
Describe ‘Support costs’ with relation to Data Integration Tools
Integrating large amounts of data efficiently involves significant expense, primarily from the man-hours needed and the new hardware/software required.
Define ‘Data Synchronisation’
This is a form of data integration that aims to keep the records stored in one location consistent with the records stored in another location through a continuous updating process. By contrast, data integration is the process of connecting data sets together, often in single events. There are a number of reasons for synchronising data.
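A single pass of a one-way synchronisation might look like the sketch below. It assumes both locations can be represented as dictionaries keyed by record ID. The function name synchronise and the sample records are hypothetical. In practice the pass would be repeated on a schedule or triggered by changes, which is what makes the process continuous.

```python
def synchronise(source, target):
    """Make `target` match `source` and return the changes applied.

    Both arguments are dicts keyed by record ID. This is a one-way sync:
    `source` is treated as correct and `target` is modified in place.
    """
    changes = {"added": [], "updated": [], "removed": []}
    for key, record in source.items():
        if key not in target:
            changes["added"].append(key)
        elif target[key] != record:
            changes["updated"].append(key)
        else:
            continue  # already consistent, nothing to do
        target[key] = dict(record)
    for key in list(target):
        if key not in source:
            del target[key]  # record no longer exists at the source
            changes["removed"].append(key)
    return changes

source = {1: {"name": "Ada"}, 2: {"name": "Grace"}}       # made-up records
target = {2: {"name": "Grace H."}, 3: {"name": "Old"}}    # out-of-date copy
changes = synchronise(source, target)
print(changes)
print(target == source)
```

Running the function a second time reports no changes, because the two locations are already consistent.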
Define ‘Technical Acceptance Testing (TAT)’
A set of tests designed to determine whether a piece of software (such as a dashboard) has satisfied its technical requirements. It is often done just before, or simultaneously with, User Acceptance Testing (UAT). TAT is often done if the non-functional requirements of a system change.