DAT Data Scale Flashcards
The Vs of Big Data?
Variety (many types)
Volume
Velocity
Veracity (trustworthiness, given many sources)
Value
Define Big Data
Data whose volume, velocity or variety is beyond the capacity of traditional processing technologies.
What are the different layers of Data Structuring?
Structured - Well defined model
Semi-structured - Definition embedded within the data, e.g. XML
Quasi-structured - Erratic structure, e.g. web click streams
Unstructured - No predefined structure, e.g. images, free text
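The four layers can be illustrated with a small sketch. All the sample values below (the customer record, the XML snippet, the click-stream line) are made up for illustration, and the regular expression is just one assumed way of pulling fields out of an erratic log line.

```python
import re
import xml.etree.ElementTree as ET

# Structured: every record follows a fixed, predefined model.
structured_row = {"customer_id": 42, "name": "Ada", "country": "UK"}

# Semi-structured: the tags inside the data describe its own structure.
xml_doc = "<customer id='42'><name>Ada</name><country>UK</country></customer>"
root = ET.fromstring(xml_doc)
semi_structured = {"customer_id": int(root.get("id")), "name": root.findtext("name")}

# Quasi-structured: a pattern exists but is erratic, so parsing takes effort.
click = "2024-01-05T10:15:00 GET /products?id=7&ref=home"
match = re.match(r"(\S+) (\w+) (\S+)", click)
timestamp, method, url = match.groups()

# Unstructured: no inherent model; meaning has to be inferred.
unstructured = "Ada called to say the delivery arrived late."
```

The further down the list, the more work is needed before the data can be queried like a table.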
What are the Challenges of Scale?
Infrastructure
Architectural complexities
Security
Data quality
Ethics
How to deal with Scale?
Two options:
Apache Hadoop - Open source framework for distributed storage and processing. Runs on clusters of non-specialised (commodity) computers.
Cloud - On-demand computing capacity, e.g. AWS, Azure.
What is Master Data Management?
Organisations often capture data about the same real-world items in several places, with slight differences in data structure and values.
MDM designates one dataset as the canonical gold standard, the single source of truth that the other datasets refer to.
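A minimal sketch of the idea, assuming a shared identifier is available to link records. The system names, field names and the `resolve` helper are hypothetical and not part of any particular MDM product.

```python
# Hypothetical golden (master) records, keyed by a shared identifier.
golden = {"C001": {"name": "Acme Ltd", "country": "GB"}}

# The same real-world customer as captured by two other systems,
# with slight differences in structure and value.
crm_record = {"id": "C001", "name": "ACME Limited", "country": "UK"}
billing_record = {"cust_ref": "C001", "customer_name": "Acme Ltd."}

def resolve(record_id, golden_records):
    """Return the canonical values for an identifier, or None if unknown."""
    return golden_records.get(record_id)

# Both systems resolve to the same reference values.
crm_master = resolve(crm_record["id"], golden)
billing_master = resolve(billing_record["cust_ref"], golden)
```

In practice records rarely share a clean key, so matching them to the golden record is usually the hard part.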
What are Controlled Vocabularies?
They provide a reference system for data and terms, promoting consistency.
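One way to picture this is a lookup that maps free-text values to a preferred term. The vocabulary below and the `to_preferred_term` function are illustrative assumptions, not a standard.

```python
# Hypothetical controlled vocabulary: preferred term -> accepted variants.
COUNTRY_VOCAB = {
    "United Kingdom": {"uk", "u.k.", "great britain", "united kingdom"},
    "United States": {"us", "usa", "u.s.a.", "united states"},
}

def to_preferred_term(value):
    """Map a free-text value to its preferred term, or None if not in the vocabulary."""
    cleaned = value.strip().lower()
    for preferred, variants in COUNTRY_VOCAB.items():
        if cleaned in variants:
            return preferred
    return None
```

Values that return None are not in the vocabulary and can be flagged for review rather than stored as-is.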
What are the stages of Data Migration?
Data migration is project based (a one-off effort), not an ongoing process. The stages are:
Selection
Preparation
Extraction
Transformation
Deposition
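The five stages can be sketched as a chain of small functions. What each stage does here is an assumed interpretation for illustration, and `legacy_db`, `target_db` and the field names are hypothetical.

```python
# Hypothetical legacy source and empty target store.
legacy_db = {
    "customers": [{"ID": 1, "NAME": " ada "}, {"ID": 2, "NAME": "Grace"}],
    "temp_cache": [{"k": "x"}],
}
target_db = {}

def select(source):
    # Selection: decide which datasets are in scope for the migration.
    return [name for name in source if not name.startswith("temp_")]

def prepare(source, tables):
    # Preparation: check the chosen data is fit to move (here: every row has an ID).
    for table in tables:
        if not all("ID" in row for row in source[table]):
            raise ValueError(f"{table} has rows without an ID")

def extract(source, tables):
    # Extraction: copy the rows out of the source system.
    return {table: [dict(row) for row in source[table]] for table in tables}

def transform(extracted):
    # Transformation: reshape rows to the target's structure and conventions.
    return {
        table: [{"id": r["ID"], "name": r["NAME"].strip().title()} for r in rows]
        for table, rows in extracted.items()
    }

def deposit(transformed, target):
    # Deposition: write the transformed rows into the target system.
    target.update(transformed)

tables = select(legacy_db)
prepare(legacy_db, tables)
deposit(transform(extract(legacy_db, tables)), target_db)
```

Because migration is a one-off project, a script like this would be run once and retired, unlike a synchronisation job that runs continuously.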
What are some types of Data Migration?
Database Migration
Application Migration
Business Process Migration
What are the selection criteria for Data Integration Tools?
Future Scalability
Implementation
Support Costs
What is Data Synchronisation?
Ensuring consistency when an organisation has multiple copies of data.
A data steward oversees the day-to-day management of the dataset and sets validation rules. The data owner approves these rules and sets the framework for managing the dataset.
To think about: ownership, updates, format, security, data quality, performance, maintenance.
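As a rough sketch of keeping multiple copies consistent, the example below detects records that differ between a primary copy and a replica, then copies them across. It assumes a simple one-way sync where the primary always wins. The function names are hypothetical, and real tools also have to handle deletions, conflicts and the ownership questions listed above.

```python
import hashlib
import json

def fingerprint(record):
    """Stable hash of a record, independent of key order."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def out_of_sync(primary, replica):
    """Return keys whose records are missing from, or different in, the replica."""
    return sorted(
        key for key, record in primary.items()
        if key not in replica or fingerprint(replica[key]) != fingerprint(record)
    )

def synchronise(primary, replica):
    """One-way sync: copy stale or missing records from primary to replica."""
    for key in out_of_sync(primary, replica):
        replica[key] = dict(primary[key])
```

Comparing fingerprints instead of full records keeps the check cheap when the copies live on different machines.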