ETL Data Pipelines Flashcards
What is Process Intelligence ETL?
The data ingestion component. It automates data extraction and transformation from external source systems and loads the data directly into SAP Signavio Process Intelligence.
How is this set up in Process Intelligence?
There is no need to configure a staging environment. However, if you're extracting from an on-premise system, additional setup is required on the system side.
[creation of ETL Data Pipelines] What happens during the Extract phase?
- configure data sources
- configure integrations
- click extract
[creation of ETL Data Pipelines] What happens during the Transform phase?
- configure business objects [SQL scripts]
- preview the SQL scripts
[creation of ETL Data Pipelines] What happens during the Load phase?
- select or create a process
- click run
What does SAP Signavio ETL use to carry out ETL?
It uses standard connectors and provides an interface to extract, transform and load data. All interaction stays within the system.
What are the 3 main components of PI ETL?
1. data source management
2. integration management
3. data model management
What do the 3 main components do?
They are the integrated features of ETL which, together, set up the data pipelines.
What is data source management?
The framework to manage online data sources. It includes credential management and scheduling.
What does a data source establish?
A connection to the source system.
What is integration management?
The framework to define what, how and when to extract data. It includes pseudonymisation (techniques that replace, remove or transform information that identifies individuals, keeping that information separate) and partitioning schemas.
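For intuition, here is a minimal sketch of what pseudonymisation amounts to at the SQL level, assuming a dialect with a SHA2 function; in SAP Signavio it is configured as part of the integration rather than hand-written, and the table/column names (VBAK, VBELN, ERNAM) are illustrative SAP ECC examples:

```sql
-- Pseudonymisation sketch: replace the identifying user name with a one-way hash.
SELECT
    VBELN,                               -- sales document number (not personal data)
    SHA2(ERNAM, 256) AS ERNAM_PSEUDONYM  -- creator's user name, replaced by a hash
FROM VBAK
```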
How do you extract when the system is on-premise?
On-premises extractors are needed. The extraction can then be set up under integrations, where the specific tables and the schedules for continuous loads are defined.
What are the two options for integration?
- simple method
- intricate method
What is the simple method?
Select the tables and fields through a graphical interface.
You can add a partitioning strategy for large tables and define a field as the delta criterion for continuous loads.
What is the intricate method?
Write your own extraction scripts in SQL.
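A minimal sketch of such a script, assuming an SAP ECC source; the table and field names (VBAK, VBELN, ERDAT, ERNAM, AUART) are illustrative, and the exact SQL dialect and delta-placeholder syntax depend on the connector:

```sql
-- Custom extraction script sketch: pull sales order headers from SAP ECC.
SELECT
    VBELN,  -- sales document number (candidate case ID)
    ERDAT,  -- creation date
    ERNAM,  -- creator's user name
    AUART   -- sales document type
FROM VBAK
WHERE ERDAT >= '2024-01-01'  -- stands in for the delta criterion of a continuous load
```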
When integrating, what is the default extraction time and why?
Midnight (although the schedule for continuous loads can be customised),
because data extraction should run when it has the least impact on the source system's load.
What is data model management?
The framework to transform your data into an event log, starting from the connected integrations and extractions.
This is also where you connect your data to an investigation to start the process analysis.
It includes process-oriented data modelling, SQL editors for the transformations, and live previews of the transformed data.
What do you do in a data model?
Define how the ETL data pipeline extracts and transforms data, and where to load the data.
5 steps to creating and using a data model
1. create a new data model
2. select the source system
3. select the data model template
4. select the integration
5. select the configured data source
A data model in SAP Signavio ETL has different sections - what is shown in the extraction section?
- the connector: how the data on the source system is accessed, e.g. SAP ECC
- the integration: what data was extracted from the source system; new data can be added or a preselected model template used
A data model in SAP Signavio ETL has different sections - what is shown in the transformation section?
[It looks like a BPMN model.]
1. process events: representing the main events
2. business objects: the transformation rules for the case attributes and events
What is an event collector?
The SQL script that defines how the records for one event type are generated for the event log.
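As a hedged sketch, an event collector's script typically emits one row per event occurrence with a case ID, event name and timestamp; the output column names below are assumptions based on a generic event log, not Signavio's exact schema:

```sql
-- Event collector sketch: derive "Create Sales Order" events from VBAK.
SELECT
    VBELN                AS case_id,     -- the case the event belongs to
    'Create Sales Order' AS event_name,  -- fixed name of this event type
    ERDAT                AS event_time   -- when the event happened
FROM VBAK
```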
How can you adjust transformation rules?
Using SQL.
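For example, a rule could be adjusted to add a case attribute. The sketch below assumes a hypothetical cases table produced by earlier transformation steps; all names are illustrative:

```sql
-- Adjusted transformation rule sketch: attach the sales document type
-- as a case attribute by joining the extracted VBAK rows onto the cases.
SELECT
    c.case_id,
    v.AUART AS document_type  -- new case attribute
FROM cases AS c
JOIN VBAK  AS v
    ON v.VBELN = c.case_id
```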