Data Platforms Flashcards
What is Data-Driven Innovation?
The use of data and analytics to improve or foster new products, processes, organizational methods, and markets
What is Analytics?
A catch-all term for different business intelligence (BI) and application-related initiatives
What is Advanced Analytics?
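The autonomous or semi-autonomous examination of data, using techniques that typically go beyond traditional BI, to discover deeper insights, make predictions, or generate recommendations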
What is Augmented Analytics?
The use of technologies such as machine learning and AI to assist with data preparation, insight generation, and insight explanation
What is a Data Platform?
A centralized infrastructure facilitating the ingestion, storage, management, and analysis of data
What is the main challenge with raw data in a Data Platform?
Raw data is difficult to obtain
What is a Database?
A structured and persistent collection of information about some aspect of the real world
What is a Data Warehouse (DWH)?
A collection of data that supports decision-making processes; it is subject-oriented, integrated, time-variant, and non-volatile
What is OLTP?
Online Transaction Processing
What is OLAP?
Online Analytical Processing
What is the difference between OLTP and OLAP?
OLTP involves constant short transactions and short-term data retention; OLAP involves complex analytical queries over large volumes of historical data
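The OLTP/OLAP contrast can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The `orders` table and its rows are hypothetical, and a real deployment would normally run the two workloads on separate systems rather than one in-memory database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")

# OLTP-style work: many small transactions, each touching a few rows.
for region, amount in [("north", 10.0), ("south", 25.0), ("north", 5.0)]:
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO orders (region, amount) VALUES (?, ?)", (region, amount)
        )

# OLAP-style work: one read-only query that scans and aggregates many rows.
totals = dict(conn.execute("SELECT region, SUM(amount) FROM orders GROUP BY region"))
print(totals)
```

The inserts change the current state one row at a time, while the aggregate query reads across the accumulated rows to answer an analytical question.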
What is a Schemaless Database?
A type of database with no predefined schema
What is a Data Lake?
A central repository for storing large volumes of structured, semi-structured, and unstructured data in its raw, native format
What are the differences between Data Warehouses and Data Lakes?
Data Warehouses are schema-on-write and hold curated data; Data Lakes are schema-on-read and hold raw data in its native format
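A minimal sketch of schema-on-write next to schema-on-read, the approach usually associated with data lakes. The `SCHEMA` dictionary, the function names, and the sample records are all hypothetical and chosen only for illustration.

```python
import json

SCHEMA = {"id": int, "name": str}  # hypothetical warehouse table schema


def write_to_warehouse(table, record):
    """Schema-on-write: validate before storing and reject bad records."""
    if set(record) != set(SCHEMA):
        raise ValueError("unexpected or missing fields")
    for field, expected in SCHEMA.items():
        if not isinstance(record[field], expected):
            raise ValueError(f"{field} must be {expected.__name__}")
    table.append(record)


def read_from_lake(raw_lines):
    """Schema-on-read: data is stored raw; structure is applied when reading."""
    for line in raw_lines:
        doc = json.loads(line)
        yield {"id": int(doc["id"]), "name": str(doc.get("name", ""))}


warehouse = []
write_to_warehouse(warehouse, {"id": 1, "name": "Ada"})
try:
    write_to_warehouse(warehouse, {"id": "2", "name": "Bob"})  # id is a string
    rejected = False
except ValueError:
    rejected = True

lake = ['{"id": "2", "name": "Bob", "extra": true}']  # stored exactly as received
parsed = list(read_from_lake(lake))
```

The warehouse refuses the malformed record at write time, whereas the lake accepts it unchanged and leaves the type conversion to whoever reads it.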
What is a Data Lakehouse?
A data management architecture combining the flexibility of data lakes with the management and ACID transactions of data warehouses
What is Data Provenance?
The description of the origins of data and the process by which it arrived at the database
What is the role of a Data Steward?
Ensures that data governance processes are followed
What is Data Versioning?
Managing changes to data collections with revision/version numbers
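As a rough illustration of version numbers on a data collection, the sketch below stores a full snapshot per revision. The `VersionedDataset` class is hypothetical; real tools typically store deltas rather than whole copies.

```python
class VersionedDataset:
    """Keeps every revision of a collection under an increasing version number."""

    def __init__(self):
        self._versions = []  # index i holds version i + 1

    def commit(self, rows):
        """Store an immutable snapshot and return its version number."""
        self._versions.append(tuple(rows))
        return len(self._versions)

    def get(self, version=None):
        """Return the rows of a given version (latest when omitted)."""
        if version is None:
            version = len(self._versions)
        if not 1 <= version <= len(self._versions):
            raise KeyError(f"unknown version {version}")
        return list(self._versions[version - 1])


ds = VersionedDataset()
v1 = ds.commit(["a", "b"])
v2 = ds.commit(["a", "b", "c"])
old_rows = ds.get(v1)
latest_rows = ds.get()
```

Because earlier snapshots are never overwritten, any past state of the collection can be read back by its version number.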
What is Data Compression?
The process of encoding information using fewer bits. Lossless compression removes redundancy without losing data; lossy compression removes less important information.
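The lossless/lossy distinction can be shown in a few lines. This sketch uses Python's standard zlib module for the lossless case; the rounding step is only a stand-in for lossy compression, and the sample readings are made up.

```python
import zlib

# Lossless: redundancy is removed and the original is recovered exactly.
text = b"abc" * 1000
packed = zlib.compress(text)
restored = zlib.decompress(packed)
print(len(text), len(packed), restored == text)

# Lossy (illustrative): rounding keeps the coarse value and discards detail.
readings = [20.113, 20.127, 20.141]
quantized = [round(r, 1) for r in readings]
print(quantized)
```

After rounding, the three readings are indistinguishable, so the original values cannot be reconstructed from the quantized list.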
What is Data Profiling?
Methods to analyze data sets to derive metadata such as data types
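A minimal profiling sketch, assuming rows arrive as dictionaries with hashable values. The `profile` function and the sample rows are hypothetical; it derives observed data types, null counts, and distinct-value counts per column.

```python
def profile(rows):
    """Derive simple per-column metadata from a list of dict rows."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            stats = columns.setdefault(
                name, {"types": set(), "nulls": 0, "distinct": set()}
            )
            if value is None:
                stats["nulls"] += 1
            else:
                stats["types"].add(type(value).__name__)
                stats["distinct"].add(value)
    return {
        name: {
            "types": sorted(s["types"]),
            "nulls": s["nulls"],
            "distinct": len(s["distinct"]),
        }
        for name, s in columns.items()
    }


rows = [
    {"id": 1, "city": "Rome"},
    {"id": 2, "city": None},
    {"id": 3, "city": "Rome"},
]
report = profile(rows)
print(report)
```

A column whose distinct count equals the row count with no nulls, like `id` here, is a candidate key, which is the kind of metadata profiling is meant to surface.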
What is Entity Resolution?
Finding records that refer to the same entity across different data sources to ensure consistency and avoid duplication.
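A minimal sketch of entity resolution, assuming that a normalized name plus email is enough to identify a person. The field names, the `match_key` and `resolve` helpers, and the sample records are hypothetical; production systems usually rely on fuzzy or probabilistic matching rather than exact keys.

```python
import re
from collections import defaultdict


def match_key(record):
    """Normalize the fields used to decide whether two records match."""
    name = re.sub(r"[^a-z0-9 ]", "", record["name"].lower())
    name = " ".join(name.split())  # collapse repeated whitespace
    return (name, record["email"].strip().lower())


def resolve(records):
    """Group records that share the same normalized key."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record)
    return list(groups.values())


crm = [{"name": "Ada  Lovelace", "email": "ADA@example.com"}]
billing = [
    {"name": "ada lovelace.", "email": "ada@example.com "},
    {"name": "Alan Turing", "email": "alan@example.com"},
]
entities = resolve(crm + billing)
print(len(entities))
```

The two Ada Lovelace records differ in case, spacing, and punctuation, yet they fall into the same group once normalized, while the Alan Turing record stays separate.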
What is a Data Catalog?
An organized inventory of data at a metadata level
What is Data Fabric?
A design concept that connects different clouds (private and public) and on-premises environments into a single, consistently managed data layer
What is Data Mesh?
A distributed data architecture with domain-oriented data ownership, data treated as a product, a self-serve data platform, and federated governance
What is the difference between Data Mesh and Data Fabric?
Data Mesh focuses on decentralization and organizational change; Data Fabric focuses on a technology-driven, metadata-based layer that integrates distributed data