Integration and Interoperability Flashcards
Define integration and interoperability
Managing the movement and consolidation of data within and between applications and organisations
Deliverables of integration and interoperability
- DII architecture
- Data Exchange Specifications
- Data Access Agreements
- Data Services
- Complex Event Processing (Thresholds and Alerts)
Metrics of integration and interoperability
- volumes
- latency
- value delivered
- costs and times
Integration
movement and consolidation of data into consistent forms
interoperability
providing the mechanisms for multiple systems to process data
goals of integration and interoperability
- consolidate the data and make available
- lower the cost and complexity of managing solutions
- identify meaningful events and support business analytics
Four common use cases of integration and interoperability
- INTEGRATION of data between data stores
- CONSOLIDATION of data stores, including application consolidation, data hub management, mergers and acquisitions
- DISTRIBUTING data across data stores and data centres
- Moving data into ARCHIVES, and updating data from one archive technology to another to enable future use
RAID
redundant array of inexpensive disks
Styles of integration and interoperability approaches
point to point
hub distribution
message synchronisation
Bus distribution
ETL/ELT/CDC
Abstraction / Virtual Consolidation (API)
point to point integration approach
systems linked directly to each other, with communication flowing each way between them
Point to point advantages
- fast
- good when small number of devices
point to point disadvantages
- lots of detailed code (takes a long time to code, support may be difficult when people leave)
- run time issues with many systems
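The "lots of detailed code" disadvantage can be made concrete with a little arithmetic (a hypothetical illustration, not from the source): point to point needs one interface per pair of systems, whereas a hub needs only one interface per system.

```python
# Interfaces needed for n systems: point-to-point vs hub distribution.
# Hypothetical illustration of why point-to-point scales poorly.

def point_to_point_links(n: int) -> int:
    """Each pair of systems needs its own interface: n*(n-1)/2."""
    return n * (n - 1) // 2

def hub_links(n: int) -> int:
    """Each system connects once to the central hub: n links."""
    return n

for n in (3, 10, 50):
    print(n, point_to_point_links(n), hub_links(n))
```

With 10 systems the point-to-point approach already needs 45 separate interfaces against the hub's 10, which is why it is only "good when small number of devices".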
hub distribution interoperability approach
changes are submitted to a central hub, which routes them to the systems authorised to receive them
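The hub's routing role can be sketched in a few lines (a hypothetical toy model; the class and system names are illustrative): the hub holds an authorisation table and forwards each submitted change only to the systems allowed to receive it.

```python
# Minimal hub-distribution sketch: a central hub routes each change
# only to the systems authorised to receive data from that source.
from collections import defaultdict

class Hub:
    def __init__(self):
        self.authorised = defaultdict(set)   # source -> set of receivers
        self.inboxes = defaultdict(list)     # receiver -> delivered changes

    def authorise(self, source, receiver):
        self.authorised[source].add(receiver)

    def submit(self, source, change):
        # The source talks only to the hub; the hub fans the change out.
        for receiver in self.authorised[source]:
            self.inboxes[receiver].append((source, change))

hub = Hub()
hub.authorise("CRM", "Billing")
hub.authorise("CRM", "Reporting")
hub.submit("CRM", {"customer_id": 1, "email": "a@example.com"})
```

Each spoke system maintains one interface (to the hub), which is the scalability gain over point to point.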
Messaging Bus interoperability approach
an application makes a change, and a bus pushes the data to a central service, which then pushes it to the systems authorised to receive it
Service Oriented Architecture (SOA)
based on bus distribution
Hub compared to bus model
Bus is more scalable
Concerns of hub and bus model
single point of failure
Integration is…
database to database
ETL acronym
extract transform load
CDC acronym
change data capture (drip)
ETL, ELT
Batch distribution for the mass movement of data collected over time from the source data structure
CDC
event driven distrubution
ETL vs ELT
ETL - higher quality, longer time
ELT - lower quality, quicker time (good for data scientists etc., but raw loaded data risks being forgotten about)
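The ETL sequence above can be sketched end to end (a minimal toy, with hypothetical source fields and an in-memory "target"): extract rows from the source, transform them into a consistent shape, then load them into the target store.

```python
# Minimal ETL sketch: Extract -> Transform -> Load.
# Source fields and target store are hypothetical illustrations.

source_rows = [
    {"NAME": " Ada Lovelace ", "dob": "1815-12-10"},
    {"NAME": "Alan Turing", "dob": "1912-06-23"},
]

def extract():
    return list(source_rows)                      # read from the source

def transform(rows):
    # Standardise into a consistent form BEFORE loading (the ETL order).
    return [{"name": r["NAME"].strip().lower(),
             "birth_year": int(r["dob"][:4])} for r in rows]

target_store = []

def load(rows):
    target_store.extend(rows)                     # write to the target

load(transform(extract()))
```

In ELT the same `transform` step would run after `load`, inside the target system, which is quicker to set up but leaves raw data in the target.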
Message Synchronisation and propagation integration approach
code is added to applications for application-to-application integration
event driven
Tight coupling
both applications know of each other (direct communication)
Loose coupling
applications remain anonymous and communicate via an API (indirect)
Integration vs interoperability
integration is data to data
interoperability is application to application
EII
Enterprise Information Integration (data to application, virtualisation)
Virtualisation approach
transforms on the fly to present the data to the consuming application as though it were native to its own application
3 major components of virtualisation
- access layer
- transformation layer
- virtualisation layer
Canonical data model
data model that aims to present data entities and relationships in the simplest possible form to integrate processes across various systems and databases.
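A canonical model can be illustrated with a small sketch (the source schemas and field names are hypothetical): each system maps its own schema into one shared shape, so adding a system needs only one mapping to the canonical form rather than a mapping to every other system.

```python
# Canonical data model sketch: two source schemas, one canonical shape.
# Field names ("cust_no", "account", etc.) are hypothetical.

def from_crm(record):
    """Map a CRM record into the canonical customer form."""
    return {"id": record["cust_no"], "name": record["full_name"]}

def from_billing(record):
    """Map a billing record into the same canonical form."""
    return {"id": record["account"], "name": record["holder"]}

crm_customer = from_crm({"cust_no": 7, "full_name": "Ada Lovelace"})
billing_customer = from_billing({"account": 7, "holder": "Ada Lovelace"})
```

Both mappings produce the same simple entity, which is the "simplest possible form" the definition refers to.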
Mapping requirements and rules for moving data from source to target enable
transformation
When integrating two data stores using batch or real-time synchronous approaches the result is
latency
If two data stores are able to be inconsistent during normal operations then the integration approach is
asynchronous
A content distribution network supporting a multinational website is likely to use
a replication solution
Functions of the enterprise service bus (ESB)
Support near real time data integration
Act as an intermediary passing messages between systems
Continuously “polling” connected applications, looking for new data they’re subscribed to
Allow data integration solutions to execute more frequently than batch processing otherwise allows
Why combine data?
consolidate data from multiple sources to make it easier to understand and analyse
ETL
extract transform load
Change data capture
filters out only the data that changed, saving resources
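The "filters out only the data that changed" idea can be shown as a snapshot diff (one of several CDC techniques; the data here is hypothetical): only rows that are new or modified since the last snapshot are propagated downstream.

```python
# Change-data-capture sketch via snapshot comparison.
# Keys are row identifiers; the values are hypothetical row contents.

previous = {1: {"email": "old@example.com"},
            2: {"email": "b@example.com"}}
current  = {1: {"email": "new@example.com"},
            2: {"email": "b@example.com"},
            3: {"email": "c@example.com"}}

def capture_changes(prev, curr):
    # Keep only rows that are new or whose values differ from last time.
    return {key: row for key, row in curr.items()
            if prev.get(key) != row}

changes = capture_changes(previous, current)
```

Row 2 is unchanged, so it is filtered out; only rows 1 (modified) and 3 (new) move downstream, which is the resource saving CDC provides over full reloads.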
point to point model
systems connect directly to exchange data efficiently
gets complicated when many systems are involved
hub and spoke model
centralises data in the hub, and systems can access from here. reduces the number of interfaces.
publish and subscribe model
systems that push out data and others subscribing to receive it (consistent delivery)
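Publish/subscribe can be sketched with a tiny in-memory broker (hypothetical topic names; real systems use a message broker): publishers and subscribers never reference each other directly, which is the loose coupling described above.

```python
# Publish/subscribe sketch: a broker delivers each message to every
# subscriber of its topic. Topics and payloads are hypothetical.
from collections import defaultdict

subscribers = defaultdict(list)   # topic -> list of callback functions

def subscribe(topic, callback):
    subscribers[topic].append(callback)

def publish(topic, message):
    # The publisher knows only the topic, never the receivers.
    for callback in subscribers[topic]:
        callback(message)

received = []
subscribe("orders", received.append)
publish("orders", {"order_id": 42})
publish("shipments", {"order_id": 42})   # no subscribers: not delivered
```

Every subscriber to a topic gets the same message, which gives the consistent delivery the card mentions.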
Enterprise service bus
act as an intermediary, connecting systems and passing messages enabling loosely coupled and flexible data sharing
Service oriented architectures
uses well-defined service calls between applications promoting application independence and the ability to replace systems with minimal disruption
ESB vs ETL
ETL is bulk data
ESB is smaller real time communications (application integration)
Managing the availability of data throughout the data lifecycle is…
not a goal of data integration and interoperability
In an ETL process, Lookups and Mappings are part of which step?
Transform
Activities in the planning and analysing stage of data integration processing
- Define data integration and lifecycle requirements
- Perform data discovery
- Perform data profiling
- Document data lineage