Data to create and preserve value for organisations Flashcards
Data Engineering
Practice of designing and building systems for collecting, storing and analysing large sets of data.
Extraction, Transformation & Loading (ETL)
Three stages in blending data from multiple sources into a destination system, for example a data warehouse.
Extraction
Process of harvesting data from source databases. Prior to extraction, the data needs to be analysed to understand its content, format and structure; this analysis is known as data profiling.
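A minimal sketch of data profiling, assuming the source data has already been read into a list of dictionaries. The rows, column names and the `profile` helper are hypothetical and only illustrate the kind of checks made on content and structure before extraction.

```python
# Hypothetical source rows, standing in for records read from a source database.
source_rows = [
    {"customer_id": "1", "country": "UK", "spend": "120.50"},
    {"customer_id": "2", "country": "uk", "spend": ""},
    {"customer_id": "3", "country": "FR", "spend": "80"},
]


def profile(rows):
    """Summarise each column: count missing values and distinct non-empty values."""
    report = {}
    for col in rows[0].keys():
        values = [row[col] for row in rows]
        present = [v for v in values if v != ""]
        report[col] = {
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
        }
    return report


report = profile(source_rows)
for column, stats in report.items():
    print(column, stats)
```

A report like this can flag issues such as the missing spend value and the inconsistent casing of country codes, which the transformation stage would then need to handle.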
Transformation
Takes extracted data and changes it into a format suitable for the destination database and its ultimate intended use. Done using code and rules that interrogate the source data before converting it to the new format.
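As an illustration, transformation rules could be sketched as a function applied to each extracted row. The field names and the three rules shown here (trim and upper-case the country code, convert the identifier to an integer, convert spend to a number or leave it empty) are assumptions chosen for the example, not a prescribed rule set.

```python
def transform(row):
    """Apply illustrative rules to one extracted row and return the cleaned row."""
    spend_text = row["spend"].strip()
    return {
        "customer_id": int(row["customer_id"]),
        "country": row["country"].strip().upper(),
        # An empty spend field becomes None rather than a misleading zero.
        "spend": float(spend_text) if spend_text else None,
    }


# Hypothetical extracted rows, all values still held as text.
extracted = [
    {"customer_id": "1", "country": " uk ", "spend": "120.50"},
    {"customer_id": "2", "country": "FR", "spend": ""},
]

cleaned = [transform(row) for row in extracted]
print(cleaned)
```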
Loading
Stage in which the newly cleaned and prepared data is uploaded into the destination database, ready for use.
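A minimal sketch of the loading stage, using an in-memory SQLite database from the Python standard library as a stand-in for the destination. The table name `customer_spend` and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical rows that have already been cleaned and prepared.
cleaned = [(1, "UK", 120.5), (2, "FR", None)]

# An in-memory database stands in for the destination system.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_spend ("
    "customer_id INTEGER PRIMARY KEY, country TEXT NOT NULL, spend REAL)"
)

# Using the connection as a context manager commits the inserts on success
# and rolls them back if an error occurs.
with conn:
    conn.executemany("INSERT INTO customer_spend VALUES (?, ?, ?)", cleaned)

loaded_count = conn.execute("SELECT COUNT(*) FROM customer_spend").fetchone()[0]
print(loaded_count)
```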
Data Warehouse
Store for data that has been loaded by the ETL process. Data is held in a systematic and logical way, ready for further interrogation and analysis by the business intelligence function.
Business Intelligence (BI)
Technology-driven process of analysing business data to create insightful and actionable information that helps improve the operations or products of a business.
Data Mining
Important component of BI. The process of uncovering patterns and other valuable information from large sets of data in the data warehouse.
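One simple example of uncovering a pattern is counting which products are bought together. The sketch below uses plain co-occurrence counting on a hypothetical list of shopping baskets; it is only one of many data mining techniques and is not suggested as the standard one.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions, standing in for records held in the data warehouse.
baskets = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"bread", "butter", "milk"},
]

# Count how often each pair of products appears in the same basket.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

top_pair, top_count = pair_counts.most_common(1)[0]
print(top_pair, top_count)
```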
Challenges of ETL
Rate of growth of data volumes
Variety of data types and sources
New technologies
Data Model
Considers the data of an organisation in a systematic way, allowing it to be stored and retrieved efficiently and effectively.
Advantages of Data Modelling
Foundation for handling data
Enforces business rules and helps achieve compliance
Consistency
Quality of data is enhanced
Three levels of a data modelling process
Conceptual - Business-oriented and practical, considering the business data and its requirements.
Logical - Begins to develop a technical map of rules and data structures, defining how data will be held and used.
Physical - Considers how defined system requirements will be implemented using a specific database management system (DBMS).
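The three levels can be sketched with a small example. The customer and order entities, their attributes and the "amount must be positive" rule are assumptions made for illustration, and SQLite is used only as a convenient example of a specific DBMS.

```python
import sqlite3
from dataclasses import dataclass

# Conceptual level (business view): a customer places orders.


# Logical level: entities, attributes and types, independent of any DBMS.
@dataclass
class Customer:
    customer_id: int
    name: str


@dataclass
class Order:
    order_id: int
    customer_id: int  # each order belongs to one customer
    amount: float     # assumed business rule: must be positive


# Physical level: the same model implemented in one specific DBMS (SQLite).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE customer_order (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    amount REAL NOT NULL CHECK (amount > 0)
);
""")

customer = Customer(1, "Acme Ltd")
order = Order(10, 1, 250.0)
conn.execute("INSERT INTO customer VALUES (?, ?)", (customer.customer_id, customer.name))
conn.execute(
    "INSERT INTO customer_order VALUES (?, ?, ?)",
    (order.order_id, order.customer_id, order.amount),
)

# The CHECK constraint enforces the business rule inside the database itself.
try:
    conn.execute("INSERT INTO customer_order VALUES (11, 1, -5.0)")
    rule_enforced = False
except sqlite3.IntegrityError:
    rule_enforced = True
print(rule_enforced)
```

Placing the rule in the physical model means every application writing to the table is held to it, which is one way a data model supports consistency and data quality.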
Data Manipulation
Process of changing data to make it easier to read.
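For instance, sorting records and formatting their values changes how the data is presented without changing what it says. The product names and figures below are hypothetical.

```python
# Hypothetical raw records in arbitrary order.
raw = [("widget", 1200.5), ("gadget", 310.0), ("doohickey", 45999.99)]

# Sort alphabetically by name so items are easier to find.
ordered = sorted(raw, key=lambda item: item[0])

# Align the columns and add thousands separators so values are easier to read.
lines = [f"{name:<10} {value:>12,.2f}" for name, value in ordered]
for line in lines:
    print(line)
```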
Data Analysis
Process of examining, transforming and arranging a given data set in specific ways in order to study its individual parts and extract useful information.
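A small sketch of this idea: arranging a data set by group and then studying each part. The sales records and region names are hypothetical, and totals and averages are just one possible choice of summary.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sales records.
sales = [
    {"region": "North", "amount": 100.0},
    {"region": "South", "amount": 300.0},
    {"region": "North", "amount": 200.0},
]

# Arrange the data by region so each part can be studied separately.
by_region = defaultdict(list)
for row in sales:
    by_region[row["region"]].append(row["amount"])

# Extract summary information for each region.
summary = {
    region: {"total": sum(amounts), "average": mean(amounts)}
    for region, amounts in by_region.items()
}
print(summary)
```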
Data Strategy
A coherent approach for organising, governing, analysing and deploying an organisation’s information assets.