Technical Architecture Flashcards
Technical Architecture
describes the technologies in the various layers of the architecture
hence, probably better called technology architecture
4 main layers of technical architecture
how has the technical architecture become more complicated
- Data variety, volume and velocity have increased significantly
- increased need to include external data (customers, suppliers, etc.)
Technologies in Data Integration Layer
Sophisticated data integration tool suites are used
- Real-time or near real-time updates
A few years back, tooling started with ETL (Extract, Transform, Load)
- Take data from SORs, transform it, load it into DW
- Initially done daily, overnight, in batch mode
ETL architecture
Uses Data Integration Server in Transform step
- ETL extracts data from source systems
- Transform it into the BI schema using Data Integration processes running on a Data Integration Server
- Load it into Data Warehouse
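The three ETL steps can be sketched roughly as below. This is a minimal illustration, not a vendor's implementation: the sample rows, the `fact_sales` table and the function names are all assumptions, and an in-memory SQLite database stands in for the Data Warehouse. The point to notice is that the transform runs in application code, outside the warehouse, which is the role a Data Integration Server plays.

```python
import sqlite3

# Hypothetical raw rows standing in for a source system (SOR).
SOURCE_ROWS = [
    ("2024-01-05", "north", "1,200.50"),
    ("2024-01-05", "south", "980.00"),
    ("2024-01-06", "north", "410.25"),
]


def extract():
    """Extract: read the raw rows from the source system."""
    return list(SOURCE_ROWS)


def transform(rows):
    """Transform: clean the rows outside the warehouse (the DI server's job)."""
    return [
        (day, region.upper(), float(amount.replace(",", "")))
        for day, region, amount in rows
    ]


def load(conn, rows):
    """Load: write the already-transformed rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_sales (day TEXT, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", rows)
    conn.commit()


# In-memory SQLite stands in for the Data Warehouse (an assumption).
etl_warehouse = sqlite3.connect(":memory:")
load(etl_warehouse, transform(extract()))

etl_north_total = etl_warehouse.execute(
    "SELECT SUM(amount) FROM fact_sales WHERE region = 'NORTH'"
).fetchone()[0]
```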
(There are many ETL vendors about; beware simplistic promises from ETL vendors; real life is more complicated.)
ELT architecture (not ETL)
Runs integration services on source or target
- Extract Data from Source Systems
- Load it into Data Warehouse
- After loading, transform it into BI schema using Data Integration processes
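A rough sketch of the ELT ordering, assuming an in-memory SQLite database as the target and hypothetical table names (`stg_sales`, `fact_sales`). The raw data is loaded unchanged first, and the transform is then executed as SQL by the target database itself, so no separate integration server is involved.

```python
import sqlite3

# Hypothetical raw rows extracted from a source system.
RAW_ROWS = [
    ("2024-01-05", "north", "1,200.50"),
    ("2024-01-05", "south", "980.00"),
]

# In-memory SQLite stands in for the target Data Warehouse (an assumption).
elt_warehouse = sqlite3.connect(":memory:")

# Extract + Load: land the data as-is in a staging table.
elt_warehouse.execute("CREATE TABLE stg_sales (day TEXT, region TEXT, amount TEXT)")
elt_warehouse.executemany("INSERT INTO stg_sales VALUES (?, ?, ?)", RAW_ROWS)

# Transform: performed after loading, by the target database's own SQL engine.
elt_warehouse.execute(
    """
    CREATE TABLE fact_sales AS
    SELECT day,
           UPPER(region) AS region,
           CAST(REPLACE(amount, ',', '') AS REAL) AS amount
    FROM stg_sales
    """
)

elt_rows = elt_warehouse.execute(
    "SELECT region, amount FROM fact_sales ORDER BY region"
).fetchall()
```

The transform here uses the target's CPU, which is the performance trade-off the ELT approach accepts.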
ELT vs. ETL
Trade-offs in capability, performance vs cost, complexity
ETL
- needs dedicated Data Integration (DI) server
ELT
- Lower Total Cost of Ownership: No need for dedicated DI server
- Uses integration services of databases at source or target
- Less powerful DI capability
- Performance penalties: uses the CPU of the source or target
Data sources layer
- internal - from front office and back office
- external - customers, partners, Facebook, etc.
- structured, unstructured, semi-structured
- volume, velocity, variety
another overview of technical architecture
Legend:
Information Access and Data Integration:
- tools used to query, gather, integrate, cleanse and transform data into information
Data Warehousing
- the “classic” DW + other databases where data has been transformed for analytics or integrated
Note Master Data Management in here
Business Intelligence and Analytics
Online and Mobile reports
- BI applications originally built by the IT Department, who produced reports for the business users
- Eventually these reports were put online and onto mobile.
Dashboards and Ad-hoc Analysis
- giving business people the tools to write and run their own queries
OLAP (Online Analytical Processing)
- tools enabling users to analyze multidimensional data interactively from multiple perspectives
- OLAP consists of three basic analytical operations: consolidation (roll-up), drill-down, and slicing and dicing
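The three operations can be illustrated on a tiny in-memory cube. This is only a sketch in plain Python: the fact rows, dimension names and helper functions are made up for illustration, and a real OLAP tool would run these against a multidimensional store.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, quarter, region, product, sales)
FACTS = [
    (2023, "Q1", "EU", "laptop", 100),
    (2023, "Q2", "EU", "laptop", 120),
    (2023, "Q1", "US", "phone", 80),
    (2024, "Q1", "EU", "phone", 90),
    (2024, "Q1", "US", "laptop", 150),
]
DIMS = {"year": 0, "quarter": 1, "region": 2, "product": 3}


def aggregate(rows, by):
    """Sum sales grouped by the named dimensions."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[DIMS[d]] for d in by)
        totals[key] += row[4]
    return dict(totals)


def select(rows, **criteria):
    """Keep rows matching every criterion (a single value or a set of values)."""
    def matches(row):
        for dim, wanted in criteria.items():
            allowed = wanted if isinstance(wanted, (set, list, tuple)) else {wanted}
            if row[DIMS[dim]] not in allowed:
                return False
        return True
    return [row for row in rows if matches(row)]


# Consolidation (roll-up): summarise at the coarser year level.
roll_up = aggregate(FACTS, by=["year"])

# Drill-down: the same totals broken out into year and quarter.
drill_down = aggregate(FACTS, by=["year", "quarter"])

# Slice: fix one dimension to a single value.
slice_eu = select(FACTS, region="EU")

# Dice: pick a sub-cube by constraining several dimensions at once.
dice = select(FACTS, region={"EU", "US"}, product="laptop", year=2023)
```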
Excel
- integration of spreadsheets into BI
Emerging Tools
- Predictive Analytics, Data Discovery, Data Visualization, In-Memory Analytics, BI appliances, Big Data Analytics
Describe
- BI targets
- data access APIs
- integration services
- integration applications
Targets
BI is no longer targeted at just business people; business processes and applications are increasingly important
Data Access Application Programming Interfaces (APIs)
often used in information access and data integration
Integration Services
for when BI applications may need to integrate and transform data to complete a business analysis
Integration Applications
the domain of the application developer, who may need to deploy one or more of the integration applications above
Technology architecture - Databases
- much unstructured data from sources such as emails, social media, medical records and legal documents
- the Internet of Things, with networked devices monitoring, measuring and transmitting data about all sorts of things, adds huge amounts of data
- the choice of data storage systems is much more complicated; factors include the type of analytics, data capture, data integration and data storage
Alternative technologies in the data layer
RDBMS still predominates, but there are alternatives used in particular parts of the architecture
- OLAP databases
- Massively Parallel Processing (MPP) databases
- Data Virtualization
- In-database analytics
- In-memory analytics (e.g. SAP HANA)
- Cloud-based BI, DW, or data integration
- BI appliances
- NoSQL databases
MPP Databases + how we got there
- CPU = Central Processing Unit
- PU = Processing Unit
- Core = the instruction execution components of the CPU
- I/O = input/output
Initially a mainframe with one CPU, connected through an I/O subsystem to disks – a uniprocessor
Later added second CPU, sharing the operating system (which was modified to run across more than one CPU) – a multiprocessor.
- The CPUs tended to be identical, so it was called a Symmetrical Multiprocessor.
Nowadays, computers have more than two PUs
In a cluster, there is a shared database and the servers in the cluster work together. They use some sort of heartbeat mechanism to detect whether another server has failed. If so, the surviving server may take over the workload of the failed server.
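One possible shape for a heartbeat check is sketched below. The `Node` class, the 3-second timeout and the simulated timestamps are assumptions made for illustration; real cluster software uses its own protocol and real clocks.

```python
class Node:
    """A cluster member with a workload and the time of its last heartbeat."""

    def __init__(self, name, workload):
        self.name = name
        self.workload = list(workload)
        self.last_heartbeat = 0.0

    def beat(self, now):
        """Record that this node signalled it is alive at time `now`."""
        self.last_heartbeat = now


def check_and_fail_over(survivor, peer, now, timeout=3.0):
    """If the peer's last heartbeat is older than `timeout`, take over its workload."""
    if now - peer.last_heartbeat > timeout:
        survivor.workload.extend(peer.workload)
        peer.workload.clear()
        return True
    return False


# Simulated timeline (seconds are plain numbers, not wall-clock time).
node_a = Node("a", ["orders"])
node_b = Node("b", ["billing"])
node_a.beat(0.0)
node_b.beat(0.0)

node_a.beat(2.0)  # node_b stops sending heartbeats after t=0
failed_over_at_2 = check_and_fail_over(node_a, node_b, now=2.0)  # still within timeout

node_a.beat(4.0)
failed_over_at_4 = check_and_fail_over(node_a, node_b, now=4.0)  # timeout exceeded
```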
Massively Parallel Processing
- each server operates independently
- connected by a network
- software splits processing and coordinates communication across servers
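The split-and-coordinate idea can be sketched as follows. This is an analogy only: threads in one process stand in for independent servers, and the round-robin partitioning and function names are assumptions. In a real MPP database the partitions live on separate machines and the partial results travel over the network.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n_nodes):
    """Coordinator: spread the rows across the nodes (round-robin here)."""
    parts = [[] for _ in range(n_nodes)]
    for i, row in enumerate(rows):
        parts[i % n_nodes].append(row)
    return parts


def local_sum(part):
    """Each node works independently, seeing only its own partition."""
    return sum(value for _, value in part)


def mpp_sum(rows, n_nodes=3):
    """Split the work, run it on every node in parallel, then combine the partial results."""
    parts = partition(rows, n_nodes)
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(local_sum, parts))
    return sum(partials)


# Hypothetical (key, value) rows to aggregate.
SALES = [("order-%d" % i, i * 10) for i in range(1, 11)]
mpp_total = mpp_sum(SALES, n_nodes=3)
```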
Data Virtualisation
a.k.a. Enterprise Information Integration
Where an application can retrieve and manipulate data without requiring technical details about the data, such as how it is formatted or where it is physically located.
Data remains in place
- unlike the traditional “ETL” process
- real-time access is given to the source system for the data, which reduces the risk of data errors and avoids the workload of moving data that may never be used
Abstraction techniques used
- To resolve differences in source and consumer formats and semantics, various abstraction and transformation techniques are used. This concept and software is a subset of data integration.
- Unlike a federated database system, it does not attempt to impose a single data model on the data (heterogeneous data).
- The technology also supports the writing of transaction data updates back to the source systems.
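A minimal sketch of the idea, under stated assumptions: the two sources (an in-memory SQLite table and a CSV export), their column names and the `VirtualCustomerView` class are all hypothetical. The consumer asks for a customer record in one shape; the view reads each source in place at request time and translates the differing formats, without copying the data into a warehouse.

```python
import csv
import io
import sqlite3

# Source 1 (assumed): a relational CRM database.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER, full_name TEXT)")
crm.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Ada Lovelace"), (2, "Alan Turing")],
)

# Source 2 (assumed): a billing export in CSV with different column names.
BILLING_CSV = "cust_no,balance_eur\n1,120.50\n2,0.00\n"


class VirtualCustomerView:
    """Presents one record shape; the data stays in the sources and is read on request."""

    def __init__(self, db, csv_text):
        self._db = db
        self._csv_text = csv_text

    def get(self, customer_id):
        row = self._db.execute(
            "SELECT full_name FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()
        if row is None:
            return None
        balance = None
        for rec in csv.DictReader(io.StringIO(self._csv_text)):
            if int(rec["cust_no"]) == customer_id:
                balance = float(rec["balance_eur"])
        # The consumer never sees the source formats or locations.
        return {"id": customer_id, "name": row[0], "balance": balance}


view = VirtualCustomerView(crm, BILLING_CSV)
record = view.get(1)
```

Write-back to the source systems is left out of this sketch.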