ETL Data Pipelines Flashcards
What is Process Intelligence ETL?
The data ingestion component. It automates the extraction and transformation of data from external source systems and loads it directly into SAP Signavio Process Intelligence
How is this set up in Process Intelligence?
There is no need to configure a staging environment. However, extracting from an on-premises system requires additional setup on the source system side
[creation of ETL Data Pipelines] What happens during the Extract phase?
- configure data sources
- configure integrations
- click extract
[creation of ETL Data Pipelines] What happens during the Transform phase?
- configure business objects [SQL scripts]
- preview the SQL scripts
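The SQL script behind a business object typically reshapes raw source rows into an event log (case ID, activity, timestamp), which is the structure process mining needs. A minimal sketch using SQLite; the table and column names here are illustrative assumptions, not Signavio's actual schema:

```python
import sqlite3

# Hypothetical raw source table (names are illustrative, not Signavio's schema)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_orders (order_id TEXT, status TEXT, changed_at TEXT)")
conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", [
    ("O1", "Created",  "2024-01-01T09:00:00"),
    ("O1", "Approved", "2024-01-02T10:30:00"),
    ("O2", "Created",  "2024-01-03T08:15:00"),
])

# A transform script shaping raw status changes into event-log rows:
# one row per (case, activity, timestamp)
transform_sql = """
SELECT order_id   AS case_id,
       status     AS activity,
       changed_at AS event_time
FROM raw_orders
ORDER BY case_id, event_time
"""

# "Previewing" the script's output, as the Transform phase allows
preview = conn.execute(transform_sql).fetchall()
for row in preview:
    print(row)
```

The preview step matters because it lets you validate the event-log shape before anything is loaded into a process.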
[creation of ETL Data Pipelines] What happens during the Load phase?
- select or create a process
- click run
What does SAP Signavio ETL use to carry out ETL?
It uses standard connectors and provides an interface to extract, transform and load data. All interaction stays within the system.
What are the 3 main components of PI ETL?
1. data source management
2. integration management
3. data model management
[image 6]
What do the 3 main components do?
They are the integrated features of ETL which, together, set up data pipelines
What is data source management?
The framework for managing online data sources. It includes credential management and scheduling
What does a data source establish?
A connection to the source system
What is integration management?
The framework that defines what, how and when to extract data. It includes pseudonymisation (techniques that replace, remove or transform information that identifies individuals, keeping that information stored separately) and partitioning schemas
How do you extract when the source system is on-premises?
On-premises extractors are needed. They can then be set up under integrations, where the specific tables and schedules for continuous loads can be defined
What are the two options for integration?
- simple method
- intricate method
What is the simple method?
Select the tables and fields through a graphical interface.
For large tables you can add a partition strategy, and you can define a field as a delta criterion for incremental loads
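A delta criterion boils down to a filter on a change-tracking field: only rows modified since the last successful extraction are pulled. A minimal SQLite sketch of that idea (table and column names are illustrative assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id TEXT, changed_at TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [
    ("O1", "2024-01-01"),
    ("O2", "2024-02-01"),
    ("O3", "2024-03-01"),
])

# Delta criterion: extract only rows changed after the last successful run.
# The watermark would normally be persisted by the scheduler between runs.
last_extracted = "2024-01-15"
delta_sql = "SELECT * FROM orders WHERE changed_at > ?"
delta_rows = conn.execute(delta_sql, (last_extracted,)).fetchall()
```

On each scheduled run the watermark advances, so continuous loads move only new and changed rows instead of re-extracting the whole table.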
What is the intricate method?
Write your own extraction scripts in SQL
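A custom extraction script usually joins source tables and selects only the fields needed downstream, rather than pulling whole tables. A sketch using SAP's sales order tables VBAK (headers) and VBAP (items) as a familiar example; the schema here is heavily simplified and run against SQLite purely for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vbak (vbeln TEXT, erdat TEXT);             -- sales order headers
CREATE TABLE vbap (vbeln TEXT, posnr TEXT, matnr TEXT); -- sales order items
INSERT INTO vbak VALUES ('0001', '2024-01-01');
INSERT INTO vbap VALUES ('0001', '10', 'MAT-A'), ('0001', '20', 'MAT-B');
""")

# Custom extraction script: join headers to items and keep only
# the fields the transform step will actually use
extraction_sql = """
SELECT h.vbeln, h.erdat, i.posnr, i.matnr
FROM vbak h
JOIN vbap i ON i.vbeln = h.vbeln
ORDER BY i.posnr
"""
rows = conn.execute(extraction_sql).fetchall()
```

Compared with the simple method's table picker, scripting gives full control over joins, filters and field selection at the cost of maintaining the SQL yourself.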