Technical Architecture Flashcards
Technical Architecture
describes the technologies in the various layers of the architecture
hence, probably better called technology architecture
4 main layers of technical architecture
- Data Sources, Data Integration, Data Warehousing, and Business Intelligence & Analytics

how has the technical architecture become more complicated
- Data variety, volume and velocity have increased significantly
- increased need to include external data (customers, suppliers, etc.)

Technologies in Data Integration Layer
Sophisticated data integration tool suites are used
- Real-time or near real-time updates
A few years back, tooling started with ETL (Extract, Transform, Load)
- Take data from SORs, transform it, load it into DW
- Initially done daily, overnight, in batch mode

ETL architecture
Uses Data Integration Server in Transform step
- ETL extracts data from source systems
- Transform it into the BI schema using Data Integration processes running on a Data Integration Server
- Load it into Data Warehouse
(There are many ETL vendors; beware simplistic vendor promises, as real life is more complicated.)
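A minimal ETL sketch in Python, using SQLite in-memory databases to stand in for a source system and the warehouse; the table and column names are invented for illustration:

```python
# Minimal ETL sketch: extract from a source system, transform in a separate
# integration step, then load into the warehouse.
import sqlite3

source = sqlite3.connect(":memory:")   # stands in for a System of Record
source.execute("CREATE TABLE orders (id INTEGER, amount TEXT, country TEXT)")
source.execute("INSERT INTO orders VALUES (1, '19.99', 'ie'), (2, '5.00', 'IE')")

warehouse = sqlite3.connect(":memory:")  # stands in for the Data Warehouse
warehouse.execute("CREATE TABLE fact_sales (order_id INTEGER, amount REAL, country TEXT)")

# Extract
rows = source.execute("SELECT id, amount, country FROM orders").fetchall()

# Transform (done outside the warehouse, on the DI server in a real ETL suite)
clean = [(oid, float(amount), country.upper()) for oid, amount, country in rows]

# Load
warehouse.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", clean)
warehouse.commit()
print(warehouse.execute("SELECT * FROM fact_sales").fetchall())
```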

ELT architecture (not ETL)
Runs integration services on source or target
- Extract Data from Source Systems
- Load it into Data Warehouse
- After loading, transform it into the BI schema using Data Integration processes
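The same toy example reworked as ELT: raw rows are loaded into a staging table first, and the transformation runs as SQL inside the warehouse itself (names are again illustrative):

```python
# Minimal ELT sketch: load raw data into the warehouse first, then run the
# transformation using the target database's own engine (no separate DI server).
import sqlite3

warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE staging_orders (id INTEGER, amount TEXT, country TEXT)")
warehouse.execute("CREATE TABLE fact_sales (order_id INTEGER, amount REAL, country TEXT)")

# Extract + Load: raw rows land in a staging table untouched
warehouse.executemany("INSERT INTO staging_orders VALUES (?, ?, ?)",
                      [(1, '19.99', 'ie'), (2, '5.00', 'IE')])

# Transform: pushed down into the target database as SQL
warehouse.execute("""
    INSERT INTO fact_sales (order_id, amount, country)
    SELECT id, CAST(amount AS REAL), UPPER(country) FROM staging_orders
""")
print(warehouse.execute("SELECT * FROM fact_sales").fetchall())
```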

ELT vs. ETL
Trade-offs in capability and performance vs. cost and complexity
ETL
- needs dedicated Data Integration (DI) server
ELT
- Lower Total Cost of Ownership: No need for dedicated DI server
- Uses integration services of databases at source or target
- Less powerful DI capability
- Performance penalties from using the CPU of the source or target
Data sources layer
- internal - from front office and back office
- external - customers, partners, Facebook, etc.
- structured, unstructured, semi-structured
- volume, velocity, variety

another overview of technical architecture
Legend:
Information Access and Data Integration:
- tools used to query, gather, integrate, cleanse and transform data into information
Data Warehousing
- the “classic” DW + other databases where data has been transformed for analytics or integrated
Note Master Data Management in here

Business Intelligence and Analytics
Online and Mobile reports
- BI applications originally built by the IT Department, who produced reports for the business users
- Eventually these reports were put online and onto mobile.
Dashboards and Ad-hoc Analysis
- giving business people the tools to write and run their own queries
OLAP (Online Analytical Processing)
- tools enabling users to analyze multidimensional data interactively from multiple perspectives
- OLAP consists of three basic analytical operations: consolidation (roll-up), drill-down, and slicing and dicing (see the sketch after this list)
Excel
- integration of spreadsheets to BI
Emerging Tools
- Predictive Analytics, Data Discovery, Data Visualization, in-Memory Analytics, BI appliances, Big Data Analytics
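A small sketch of the three OLAP operations using pandas on a made-up sales cube:

```python
# Roll-up, drill-down, and slice/dice on a tiny in-memory "cube".
# The sales data here is invented for illustration.
import pandas as pd

sales = pd.DataFrame({
    "year":    [2023, 2023, 2024, 2024],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "region":  ["EU", "EU", "US", "US"],
    "revenue": [100, 120, 90, 140],
})

# Consolidation (roll-up): aggregate from quarter level up to year level
rollup = sales.groupby("year")["revenue"].sum()

# Drill-down: go back to the finer quarter level within each year
drilldown = sales.groupby(["year", "quarter"])["revenue"].sum()

# Slice and dice: fix one dimension (region == "EU") and look at the rest
eu_slice = sales[sales["region"] == "EU"]

print(rollup, drilldown, eu_slice, sep="\n\n")
```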

Describe
- BI targets
- data access APIs
- integration services
- integration applications
Targets
BI is no longer targeted at just business people; business processes and applications are increasingly important
Data Access Application Programming Interfaces (APIs)
often used in information access and data integration (a small example follows this list)
Integration Services
for when BI applications may need to integrate and transform data to complete a business analysis
Integration Applications
the domain of the application developer, who may need to deploy one or more of the integration applications above
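As a small example of a data access API, Python's standard DB-API (sqlite3 here) lets an application run queries without dealing with a vendor-specific wire protocol; the customers table below is invented for illustration:

```python
# Sketch of a data access API: the application issues parameterised queries
# through a standard interface rather than a proprietary protocol.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, segment TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'SME'), (2, 'Enterprise')")

# Parameterised query through the standard API
for row in conn.execute("SELECT id, segment FROM customers WHERE segment = ?",
                        ("SME",)):
    print(row)
```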

Technology architecture - Databases
- much unstructured data from sources such as emails, social media, medical records and legal documents
- Internet of Things, with networked devices monitoring, measuring and transmitting data about all sorts of things adds humongous amounts of data
- the choice of data storage systems is much more complicated; factors include type of analytics, data capture, data integration and data storage

Alternative technologies in the data layer
RDBMS still predominates, but there are alternatives used in particular parts of the architecture
- OLAP databases
- Massively Parallel Processing (MPP) databases
- Data Virtualization
- In-database analytics
- In-memory analytics (e.g. SAP HANA)
- Cloud-based BI, DW, or data integration
- BI appliances
- NoSQL databases
MPP Databases + how we got there
- CPU = Central Processing Unit
- PU = Processing Unit
- Core = the instruction execution components of the CPU
- I/O = input/output
Initially a mainframe with one CPU, connected through an I/O subsystem to disks – a uniprocessor
Later added second CPU, sharing the operating system (which was modified to run across more than one CPU) – a multiprocessor.
- The CPUs tended to be identical, so it was called a Symmetrical Multiprocessor.
Nowadays, computers have more than two PUs
In a cluster, there is a shared database and the servers in the cluster work together. They use a heartbeat mechanism to detect whether another server has failed; if so, a surviving server may take over the failed server's workload.
Massively Parallel Processing
- each server operates independently
- connected by a network
- software splits processing and coordinates communication across servers
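A toy illustration of the MPP idea, using worker processes in place of independent servers: each worker aggregates only its own partition of the data, and a coordinator combines the partial results.

```python
# Each "server" (here a worker process) computes a partial aggregate over its
# own shard, with no shared memory or shared disk; the coordinator merges them.
from multiprocessing import Pool

def partial_sum(partition):
    # Runs independently on one node
    return sum(partition)

if __name__ == "__main__":
    data = list(range(1_000_000))
    partitions = [data[i::4] for i in range(4)]   # shard the data 4 ways
    with Pool(4) as pool:
        partials = pool.map(partial_sum, partitions)
    print(sum(partials))   # coordinator merges the partial aggregates
```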

Data Virtualisation
aka. Enterprise Information Integration
Where an application can retrieve and manipulate data without requiring technical details about the data, such as how it is formatted or where it is physically located.
Data remains in place
- unlike the traditional “ETL” process
- real-time access to the data is given via the source system, reducing the risk of data errors and the workload of moving data around that may never be used
Abstraction techniques used
- To resolve differences in source and consumer formats and semantics, various abstraction and transformation techniques are used. This concept and software is a subset of data integration.
- Unlike a federated database system, it does not attempt to impose a single data model on the data (heterogeneous data).
- The technology also supports the writing of transaction data updates back to the source systems.
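A very small sketch of the data-virtualisation idea: the consumer calls one function and never sees where the data lives; the data stays in the two source systems and is only combined at query time. The two databases and the customer_revenue view are invented for illustration.

```python
# Federated "virtual view" over two live sources; nothing is copied into a DW.
import sqlite3

crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
crm.execute("INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex')")

billing = sqlite3.connect(":memory:")
billing.execute("CREATE TABLE invoices (customer_id INTEGER, amount REAL)")
billing.execute("INSERT INTO invoices VALUES (1, 250.0), (1, 99.0), (2, 40.0)")

def customer_revenue():
    """Virtual view: joins the two sources at query time, data stays in place."""
    totals = dict(billing.execute(
        "SELECT customer_id, SUM(amount) FROM invoices GROUP BY customer_id"))
    return [(name, totals.get(cid, 0.0))
            for cid, name in crm.execute("SELECT id, name FROM customers")]

print(customer_revenue())
```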
In-database analytics
- Database vendors adding BI and analytics
- Do analytics directly on the database
- compute-intensive analytical processing is moved directly into a DW built on top of an analytical database
- Reduces setup and data retrieval times
- Faster analytics performance
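A sketch of the in-database idea: the aggregation runs inside the database engine and only the small result set travels back to the BI application (table and data are made up):

```python
# Push the analytics down into the database instead of pulling every row out.
import sqlite3

dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE fact_sales (region TEXT, amount REAL)")
dw.executemany("INSERT INTO fact_sales VALUES (?, ?)",
               [("EU", 100.0), ("EU", 120.0), ("US", 90.0)])

# Only one summary row per region comes back to the application
for region, total, avg in dw.execute(
        "SELECT region, SUM(amount), AVG(amount) FROM fact_sales GROUP BY region"):
    print(region, total, avg)
```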
In-memory analytics
- Enabled by 64-bit architectures
- Allow for 16 exabytes of addressable memory
- Hold most or all of database in memory
- Rather than on slower disk
- Balance of cost vs. speed
- SAP HANA is a fully in-memory solution
Cloud-based BI / DW / Data Integration
- Cloud vendors provide and manage shared resources
- On-demand, fast provisioning and de-provisioning
- Apparently unlimited resources available as needed
- Flexible pricing; good security
BI Appliances: Data Warehouse Appliances
- Designed for high performance big data analytics
- Delivered as an easy-to-use packaged solution
- Hardware and Software
- integrated set of servers, storage, OS, and DBMS
- Example: IBM Netezza
Why use NoSQL Databases
- Massive sizes of data
- Ease of programming
- Map-Reduce, Spark etc.
NoSQL (not only SQL) – distributed databases, with “eventual consistency” and a different programming model
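A toy word count in the map-reduce programming model mentioned above (real systems such as Hadoop or Spark run the same shape of code distributed across many nodes):

```python
# Map emits (key, value) pairs, shuffle groups them by key, reduce folds the
# values for each key together.
from collections import defaultdict

docs = ["big data big analytics", "big data warehouse"]

# Map phase: each document is processed independently
mapped = [(word, 1) for doc in docs for word in doc.split()]

# Shuffle: group the pairs by key
grouped = defaultdict(list)
for word, count in mapped:
    grouped[word].append(count)

# Reduce phase: combine the values for each key
counts = {word: sum(vals) for word, vals in grouped.items()}
print(counts)   # {'big': 3, 'data': 2, 'analytics': 1, 'warehouse': 1}
```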
NoSQL database categories
Four categories: key-value, document, column-family (wide-column), and graph databases
Why do we have this relatively new class of databases?
- A NoSQL data model may make application programming easier (see the sketch after this block)
- To handle very large amounts of data; many NoSQL databases are a better fit for Big Data
Most of businesses’ valuable data is stored in Relational Databases
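A small sketch of why a document-style NoSQL model can make programming easier: the whole order, with its nested line items, is stored and read back as one document instead of being split across several normalised tables (the order structure is invented for illustration):

```python
# A document store would persist this nested structure as-is; here JSON
# serialisation stands in for the database.
import json

order = {
    "order_id": 42,
    "customer": {"id": 7, "name": "Acme"},
    "lines": [
        {"sku": "A1", "qty": 2, "price": 9.99},
        {"sku": "B5", "qty": 1, "price": 25.00},
    ],
}

stored = json.dumps(order)
loaded = json.loads(stored)
print(loaded["lines"][0]["sku"])   # the application works with the nested shape directly
```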

Product Architecture
defines
- the products
- their configurations
- how they implement the technology requirements of the BI architecture
Decide what the business wants
- Power users – go deep into analytics
- Managers, executives – need a higher level view to support management decisions
- Operational users – information and analytics to support day-to-day operations
Offer a portfolio of analytical styles
- the data and technology architectures lead to product selection
- Build the product portfolio iteratively
- Add or change based on changing needs
- Will return to this in a later lecture

How do we define requirements and priorities in BI architecture?
How do we create and implement stuff in BI architecture?
define requirements and priorities top-down
create and implement bottom-up
