Lecture Three - Flashcards
Digital Transformation - Definition
Integration of digital technologies in various sectors, transforming traditional business models and processes.
Digital Transformation Key Drivers - World Wide Web/Internet
Foundation for global connectivity and information dissemination.
Digital Transformation Key Drivers - Cloud Computing
On-demand computing resources offering scalability and flexibility.
Digital Transformation Key Drivers - Smartphones
Ubiquitous mobile devices that facilitate connectivity and application access.
Digital Transformation Key Drivers - Internet of Things
Interconnected devices generating real-time data for analysis and automation.
Digital Transformation Key Drivers - 5G Networks
Advanced mobile communication with high-speed data transfer and low latency.
Digital Transformation Sectoral Impacts - E-commerce, FinTech, E-government
Revolutionizing business and governance with digital platforms.
Digital Transformation Sectoral Impacts - Industry 4.0/5.0
Advancing manufacturing through automation and data exchange (Cyber-Physical Systems).
Digital Transformation Sectoral Impacts - Circular Economy
Enhancing resource efficiency and sustainability through data-driven asset management.
Digital Transformation Sectoral Impacts - Smart Cities
Integrating infrastructure and services for increased operational efficiency and improved quality of life.
New Emerging Paradigms - Industry 4.0/5.0
Automation and Data Exchange: Leveraging Cyber-Physical Systems to enhance manufacturing processes and productivity.
Impact: Streamlined operations, increased production efficiency, and improved product quality.
New Emerging Paradigms - Circular Economy
Data Utilization: Tracking and managing assets to maximize value and minimize waste through continuous resource upcycling.
Impact: Promotes environmental sustainability by optimizing resource usage and lifecycle management.
New Emerging Paradigms - Smart Cities
Infrastructure Integration: Virtualization and integration of urban services and infrastructure for improved efficiency.
Impact: Enhances urban living through innovative solutions and operational insights.
New Emerging Paradigms - Digital Health
Efficient Healthcare Delivery: Application of digital technologies in medicine for developing new treatments and improving patient care.
Impact: Facilitates aged and assisted living, supports chronic disease management, and enhances patient outcomes.
Big Data - Concept
Refers to large, complex datasets that are challenging to process using traditional data processing methods.
Big Data - Attributes (The 5 Vs)
Velocity: Speed at which data is generated, collected, and processed, often in real-time.
Volume: Massive amounts of data generated from diverse sources, measured in petabytes or exabytes.
Value: Economic and strategic benefits derived from analyzing and utilizing data effectively.
Variety: Diversity of data formats and sources, including structured, semi-structured, and unstructured data.
Veracity: Accuracy, reliability, and trustworthiness of data, which can be affected by factors like data quality and biases.
Big Data - Definition
Large-scale datasets characterized by high complexity and volume, requiring advanced technologies for management and analysis.
Big Data - Key Features
High Throughput Processing: Ability to manage and analyze vast volumes of data efficiently.
Diverse Sources: Data from social media, IoT devices, transaction systems, and more, contributing to a rich but complex data ecosystem.
Big Data - Challenges
Data Management: Storing and organizing data to facilitate easy retrieval and analysis.
Data Quality: Ensuring accuracy and consistency across diverse datasets.
Analytics: Developing methodologies and tools to derive meaningful insights and drive business intelligence.
Data Storage & Processing Over Time - Historical Evolution
c. 3000 BC: Ancient Egypt’s use of written records for crop storage management, marking the early use of data.
c. 1450 AD: The printing press revolutionized data dissemination through mass-produced written materials.
1940s: Advent of digital computers; magnetic tape storage, which requires sequential reading, followed in the early 1950s.
1950s: Introduction of more affordable commercial computers and the first data management systems, broadening access to data processing.
1960s: Development of specialized Database Management Systems (DBMS) to enhance data organization.
1970s: Emergence of relational databases offering data independence by separating physical and logical data representations.
1980s: Geographic expansion of businesses led to increased data sources and complexity.
1990s onwards: Big Data emerged as a strategic asset, providing a competitive advantage through advanced analytics.
Data Lake - Definition
A vast repository for raw, unprocessed data without a predefined purpose.
Data Lake - Usage
Ideal for storing diverse data types until specific processing and analysis needs are identified.
Data Warehouse - Definition
A centralized repository organized in a unified data model, designed to aggregate and curate data from multiple sources.
Data Warehouse - Usage
Supports business operations by providing clean, organized data ready for analysis and reporting.
Data Lake and Data Warehouse - Key Distinctions
Data Lakes: Focus on data ingestion and flexibility, allowing for experimentation and discovery.
Data Warehouses: Emphasize data quality, consistency, and integration to support business intelligence activities.
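Example - Lake vs Warehouse (illustrative sketch)
A minimal sketch of the distinction above, assuming a toy setup: the lake keeps every raw record unchanged, while the warehouse loads only records that fit a fixed model and pass a quality check. The event fields and the names `raw_events`, `lake`, and `load_into_warehouse` are hypothetical, chosen only for illustration.

```python
import json

# Hypothetical raw events arriving from different sources (assumed data).
raw_events = [
    '{"type": "sale", "amount": "19.99", "store": "A"}',
    '{"type": "sensor", "temp_c": 21.5}',
    '{"type": "sale", "amount": "oops", "store": "B"}',
]

# Data lake: ingest everything unchanged; structure is decided later.
lake = list(raw_events)


def load_into_warehouse(events):
    """Keep only records that fit the (assumed) unified sales model."""
    rows = []
    for line in events:
        record = json.loads(line)
        if record.get("type") != "sale":
            continue  # not part of the sales model
        try:
            row = {"store": record["store"], "amount": float(record["amount"])}
        except (KeyError, ValueError):
            continue  # fails the quality check, so it is not loaded
        rows.append(row)
    return rows


warehouse = load_into_warehouse(lake)
print(len(lake), "raw records kept in the lake")
print(warehouse)
```

The malformed sale stays available in the lake for later investigation, but never reaches the curated warehouse rows.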
Why Data Warehouse?
Data Redundancy: Consolidates duplicated data across multiple systems and departments, ensuring consistency.
Data Consistency: Provides standardized definitions and formats for uniform data interpretation.
Heterogeneous Data Sources: Integrates data from various sources, including relational DBMS, OLTP systems, and unstructured files.
Data Warehouse - Benefits
Strategic Decision Support: Facilitates comprehensive data analysis to inform business strategy and operations.
Enhanced Data Quality: Ensures data accuracy by addressing issues like missing data and varying formats.
Cross-Functional Analysis: Enables analysis across business functions by providing a unified view of data.
Data Warehouses - Operational Support
Curates data for specific operational systems, such as accounting and billing, ensuring relevant and accurate data is available.
Supports historical analysis by retaining data over time, even as operational systems update or delete records.
Data Warehouses - Data Stability
Data remains stable and non-volatile within the warehouse, providing a consistent historical record.
Allows for longitudinal studies and trend analysis without the risk of data loss from operational changes.
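Example - Data Stability (illustrative sketch)
A minimal sketch of non-volatile storage, assuming an in-memory SQLite database: the operational table is updated and deleted in place, while the warehouse table is only ever appended to. The table names, the `snapshot` helper, and the dates are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE customer_history (id INTEGER, city TEXT, loaded_on TEXT);
""")


def snapshot(loaded_on):
    """Append the current operational state to the warehouse history."""
    conn.execute(
        "INSERT INTO customer_history SELECT id, city, ? FROM customer",
        (loaded_on,),
    )


conn.execute("INSERT INTO customer VALUES (1, 'Leeds')")
snapshot("2024-01-01")

# The operational system overwrites the record in place...
conn.execute("UPDATE customer SET city = 'York' WHERE id = 1")
snapshot("2024-02-01")

# ...and later deletes it entirely.
conn.execute("DELETE FROM customer WHERE id = 1")

history = conn.execute(
    "SELECT city, loaded_on FROM customer_history ORDER BY loaded_on"
).fetchall()
print(history)
```

Both earlier states survive in `customer_history` even though the operational row no longer exists, which is what makes trend analysis over time possible.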
Data Warehouse - Purpose
Supports strategic decision-making and advanced analytics.
Facilitates ad-hoc queries and report generation for business intelligence.
Enables data mining to uncover hidden patterns, correlations, and trends.
Data Warehouses: Data Organisation - Key Characteristics
Subject-Oriented: Focuses on specific business subjects or domains rather than operational processes.
Integrated: Combines heterogeneous data from different sources into a coherent, unified format.
Time-Variant: Maintains historical data, allowing users to analyze changes and trends over time.
Stable: Data is non-volatile, ensuring a consistent and reliable view of business operations.
Decision Support: Structured to facilitate complex queries and analyses for informed decision-making.
Data Warehouses: Architectural Properties - Key Properties
Separation: Distinction between analytical and transactional processing, minimizing interference and maximizing efficiency.
Scalability: Architecture can easily accommodate growth in data volume and complexity through hardware and software upgrades.
Extensibility: Supports the integration of new applications and technologies without extensive system redesign.
Security: Protects sensitive strategic data with robust access controls and monitoring.
Administrability: Designed for efficient management and maintenance, ensuring operational continuity.
Complementary Concepts in Data Warehousing - Data Mart
Definition: A specialized, smaller subset of a data warehouse, focusing on specific business lines or departments.
Purpose: Provides targeted data for specific analyses, improving query performance and relevance.
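Example - Data Mart (illustrative sketch)
One way to picture a data mart, as a sketch under assumptions: a department-specific view over a warehouse table in SQLite. The table `warehouse_sales`, the view `marketing_mart`, and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE warehouse_sales (department TEXT, product TEXT, amount REAL);
    INSERT INTO warehouse_sales VALUES
        ('marketing', 'ads', 120.0),
        ('finance', 'audit', 300.0),
        ('marketing', 'events', 80.0);
    CREATE VIEW marketing_mart AS
        SELECT product, amount FROM warehouse_sales
        WHERE department = 'marketing';
""")

# Analysts in marketing query only their own subset of the warehouse.
mart_rows = conn.execute(
    "SELECT product, amount FROM marketing_mart ORDER BY product"
).fetchall()
print(mart_rows)
```

A view is used here only to keep the sketch short; a mart is often stored as its own tables so that queries against it run faster.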
Complementary Concepts in Data Warehousing - ETL (Extract, Transform, Load)
Extract: Selecting and exporting data from source systems.
Transform: Reformatting data to match the destination system’s requirements, including cleaning and integration.
Load: Importing transformed data into the destination system, such as a data warehouse.
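Example - ETL (illustrative sketch)
The three steps above can be sketched as three small functions. This is a toy pipeline under assumptions, not a production tool: the CSV export, its column names, and the `orders` destination table are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical export from a source system (assumed data).
SOURCE_CSV = """order_id,customer,amount
1, alice ,10.50
2,BOB,
3,carol,7.25
"""


def extract(text):
    """Extract: read rows out of the source export."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: clean names, convert types, drop rows missing an amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # missing value: excluded from the load
        cleaned.append(
            (int(row["order_id"]), row["customer"].strip().title(), float(amount))
        )
    return cleaned


def load(rows, conn):
    """Load: insert the transformed rows into the destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(SOURCE_CSV)), conn)
loaded = conn.execute("SELECT * FROM orders ORDER BY order_id").fetchall()
print(loaded)
```

Keeping each step in its own function mirrors the card: cleaning rules live only in `transform`, so the source and destination can change independently.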
Complementary Concepts: OLTP
OLTP (Online Transaction Processing):
Manages transaction-oriented applications focused on day-to-day operations, such as sales and inventory management.
Characteristics include fast query processing and data integrity that is maintained in multi-access environments.
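Example - OLTP (illustrative sketch)
A minimal sketch of a transaction-oriented operation, assuming SQLite and a hypothetical `inventory`/`sales` schema: a sale and its stock update either both succeed or both roll back, which is how integrity is preserved. The `sell` helper is an illustration, not a standard API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory (item TEXT PRIMARY KEY, qty INTEGER CHECK (qty >= 0))"
)
conn.execute("CREATE TABLE sales (item TEXT, qty INTEGER)")
conn.execute("INSERT INTO inventory VALUES ('widget', 5)")
conn.commit()


def sell(item, qty):
    """Record a sale and decrement stock as one atomic transaction."""
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute("INSERT INTO sales VALUES (?, ?)", (item, qty))
            conn.execute(
                "UPDATE inventory SET qty = qty - ? WHERE item = ?", (qty, item)
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK constraint rejected a negative stock level


first = sell("widget", 3)   # enough stock
second = sell("widget", 4)  # would drive stock below zero
print(first, second)
```

The rejected second sale leaves no row in `sales`, because the insert is rolled back together with the failed stock update.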
Complementary Concepts: OLAP
OLAP (Online Analytical Processing):
Enables complex analytical queries, supporting multi-dimensional data analysis and business intelligence.
Facilitates decision-making by providing insights into historical performance, trends, and projections.
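Example - OLAP (illustrative sketch)
Multi-dimensional analysis can be pictured as summing a measure over different combinations of dimensions. The sketch below assumes a tiny in-memory fact table; the dimension names, the figures, and the `roll_up` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, quarter, product, revenue)
facts = [
    ("north", "Q1", "laptop", 100),
    ("north", "Q2", "laptop", 150),
    ("south", "Q1", "phone", 80),
    ("south", "Q1", "laptop", 70),
    ("north", "Q1", "phone", 50),
]
DIMENSIONS = ("region", "quarter", "product")


def roll_up(rows, *dims):
    """Sum revenue grouped by the chosen dimensions."""
    idx = [DIMENSIONS.index(d) for d in dims]
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in idx)
        totals[key] += row[-1]  # revenue is the last field
    return dict(totals)


by_region = roll_up(facts, "region")
by_region_quarter = roll_up(facts, "region", "quarter")
print(by_region)
print(by_region_quarter)
```

The same facts answer different questions depending on which dimensions are kept, which is the core idea behind slicing data for business intelligence.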
3-Layer Data Warehouse Architecture - Architecture Layers
Heterogeneous Sources: Diverse data inputs from operational databases and external sources.
ETL Tools: Perform data staging, including redundancy removal, consistency checks, and data normalization.
Reconciled Data: Processed and integrated data stored in a centralized repository for consistency and accessibility.
3-Layer Data Warehouse Architecture - Outputs
Data Marts: Customized subsets of the warehouse for specific business functions.
Analytical Tools: Enable data mining, visualization, and advanced analytics to extract value from stored data.
History of Software Deployment
Evolution Stages:
Monolithic Applications on Physical Machines: Traditional, single-unit applications with limited scalability.
Virtual Machine Abstraction: Enabled resource virtualization, improving flexibility and resource utilization.
Stateless & Horizontally Scalable Apps: Applications designed for distributed systems, enhancing scalability and fault tolerance.
Microservices & Containers: Modern architecture focused on modular, portable, and independent services.
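Example - Stateless Apps (illustrative sketch)
A rough sketch of why stateless applications scale horizontally, under simplifying assumptions: the handler keeps no data of its own, so any replica can serve any request as long as state lives in a shared store. Here a plain dictionary stands in for that external store, and the names `shared_store` and `handle_request` are hypothetical.

```python
# Stand-in for an external store (e.g. a database) shared by all replicas.
shared_store = {}


def handle_request(store, user, item):
    """Stateless handler: everything it needs arrives as arguments."""
    cart = store.setdefault(user, [])
    cart.append(item)
    return list(cart)


# Two "replicas" are simply two references to the same stateless function,
# so a load balancer could route each request to either one.
replica_a = handle_request
replica_b = handle_request

replica_a(shared_store, "ana", "book")
result = replica_b(shared_store, "ana", "pen")
print(result)
```

Because no replica remembers anything between calls, adding or removing replicas does not lose user data.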
Virtual Machines (VMs):
Each VM includes its own guest OS, application, binaries, and libraries, providing isolation but with high overhead.
Provide isolated environments for multiple applications, each with its own OS instance, on a single shared hardware platform, offering strong security and resource management.
Suitable for legacy applications and environments needing strong isolation.