Introduction to Standalone Connector Flashcards
What is the standalone connector?
It handles the communication between the source system and SAP Signavio Process Intelligence.
When can the standalone connector be used?
When the source system is not covered by one of the standard connectors in Process Intelligence (for example, a third-party system).
What does the standalone connector do?
It extracts data from the source system, transforms it into event log format, and uploads the result to Process Intelligence for analysis.
How do the ETL scripts run?
They run externally (outside of Process Intelligence) and use the API to push the data to a process within the system.
Which 4 components does the standalone connector use?
- A collection of extraction and transformation SQL scripts
- A configuration file in YAML format
- A SQLite database that tracks what was loaded previously, so regular loads pick up the correct data each time
- A Java application for triggering the actual extraction, transformation, and loading
[a hypothetical directory layout follows below]
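To make these four components concrete, here is a hypothetical layout of a connector working directory; all file and folder names are illustrative assumptions, not the exact names shipped by SAP Signavio:

    connector/
      connector.jar            # Java application triggering extraction, transformation, loading
      config.yaml              # YAML configuration file
      sql/
        orders_extract.sql     # extraction SQL script (typically one per table)
        eventlog.sql           # transformation script that builds the event log
      sync-state.sqlite        # SQLite database tracking what was loaded in earlier runs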
Where does the connector pull and store data?
It uses an SAP technical user to pull data from the source system and stores it in an S3 bucket.
How does the connector download the files?
From the transformed data in the S3 bucket, it uses Athena to generate an event log file and then downloads that file.
What does the connector do with the event log file?
It uploads the event log file to the Process Intelligence API.
How do you get an automated ETL to work?
Set up a virtual machine to host the environment.
[setting up virtual machine] 1. What are the specifications for the host?
- 10 GB RAM
- single-core CPU
- 10 GB free disk space
- network connectivity to both Signavio and the source system
[sizing may vary depending on the use case]
[setting up virtual machine] 2. What do you install?
JDK 8 or higher (for example, the Oracle JDK)
[setting up virtual machine] 3. If the connector is running in a Windows environment, what do you need to install as well?
Microsoft Visual C++ 2010 Redistributable Package
[setting up virtual machine] 4. What do you download last?
Download and unpack the standalone connector provided by SAP Signavio.
Why might you have a dedicated staging environment?
In most cases it is faster and better suited for process mining. It also enables you to combine multiple source systems.
When setting up the staging environment, what else do you need in the case of AWS?
An AWS account with access to both S3 (for data storage) and Athena (for running the transformation scripts).
What happens once the environment setup is finished?
The connector needs to be configured to fit the specific use case. This is done in the config.yaml file provided by SAP.
What does the config.yaml file contain?
It defines the actions required by the connector: the connection configurations, the table extractions, and the event collector configuration.
[a skeleton example follows below]
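As rough orientation, a config.yaml skeleton might look as follows. Only the leaf keys named in the cards below (apiToken, url, tableSyncConfigurations, and so on) come from this deck; the top-level grouping is an assumption for illustration:

    apiToken: "<token>"              # API access to Process Intelligence, see card 1
    url: "<upload endpoint>"
    staging:                         # AWS staging area (S3 + Athena), see card 2
      bucket: "<s3 bucket>"
    connection:                      # source system access, see card 3
      source: "<source name>"
    tableSyncConfigurations: []      # one entry per extracted table
    eventCollector: {}               # builds and uploads the event log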
[connector configuration - config.yaml] 1. You require an apiToken and a url; where do you get these?
- apiToken: in Process Intelligence, under Process settings > API > New Token
- url: the Upload Endpoint, found in the same settings as the API token
[example snippet below]
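A minimal sketch of how these two values might appear in config.yaml; the key names come from the card, while their top-level placement and the values are assumptions:

    apiToken: "a1b2c3d4e5"                       # generated via Process settings > API > New Token
    url: "https://<workspace>.signavio.com/..."  # the Upload Endpoint from the same settings page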
[connector configuration - config.yaml] 2. In the staging area, what inputs do you need? (can be provided by your admin)
- accessKey & secretKey: access credentials provided by the account creator
- athenaWorkgroup: name of the Athena workgroup, if multiple exist
- region: the region your AWS instance runs in
- bucket: name of the S3 bucket the data will be stored in
- athenaDatabase: name of your database in Athena
[example snippet below]
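A hedged example of the staging inputs with made-up values; the key names follow the card, while the nesting under a staging section is an assumption:

    staging:
      accessKey: "AKIAEXAMPLE123"       # provided by the account creator
      secretKey: "example-secret"       # keep out of version control where possible
      athenaWorkgroup: "primary"        # only needed if multiple workgroups exist
      region: "eu-central-1"            # region your AWS instance runs in
      bucket: "pi-connector-staging"    # S3 bucket the data will be stored in
      athenaDatabase: "pi_staging"      # Athena database for the transformation scripts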
[connector configuration - config.yaml] 3. What input is typically required to establish a connection?
- source: name of the source system
- user: a user in the system with the necessary rights
- password: the password for that user
- logfile: name of the extraction log file
- verbosity: amount of detail stored in the log file
[example snippet below]
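A sketch of a connection entry with hypothetical values; the keys mirror the card, while the surrounding connection section is an assumption:

    connection:
      source: "erp-prod"           # name of the source system
      user: "PI_TECH_USER"         # technical user with the necessary rights
      password: "change-me"        # password for that user (better injected than hard-coded)
      logfile: "extraction.log"    # name of the extraction log file
      verbosity: "INFO"            # amount of detail stored in the log file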
What's next after establishing the connection?
Extractor configuration, i.e. defining the extraction and the necessary data via the parameters under tableSyncConfigurations, including those for delta loads.
[extractor configuration] 1. What are the general parameters for each table that should be extracted?
- name: name for the table in the staging area
- dataSource: specify it in case of multiple source systems
- sqlFilePath: name of the SQL file used for the extraction
- keyColumn: the column of the table used to merge multiple rows in staging
- mostRecentRowColumn: used to identify the newest row with the most recent information
[example snippet below]
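Putting the general parameters together, a table entry might look like this. The SAP sales order header table VBAK and its columns VBELN and AEDAT are illustrative examples, and the exact YAML shape is an assumption:

    tableSyncConfigurations:
      - name: "VBAK"                          # table name in the staging area
        dataSource: "erp-prod"                # only needed with multiple source systems
        sqlFilePath: "sql/vbak_extract.sql"   # SQL file used for the extraction
        keyColumn: "VBELN"                    # key column used to merge rows in staging
        mostRecentRowColumn: "AEDAT"          # identifies the newest row per key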
[extractor configuration] What are the additional parameters for?
For scheduled data loads, in case new data needs to be extracted on a regular basis.
[extractor configuration] 2. What are some additional parameters?
- name: the table column that distinguishes delta loads, e.g. creation or change date
- initial: the initial date or ID at which extraction starts
- date case: format, e.g. YYYY-MM-dd, and type, e.g. date
- ID case: format of the ID column and type, e.g. id
[example snippet below]
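A hedged sketch of both delta-load variants as they might appear inside a table entry; the parameter names name, initial, format, and type follow the cards, while the delta grouping and the example values are assumptions:

    # Date case: delta loads keyed on a creation or change date
    delta:
      name: "AEDAT"            # column that distinguishes delta loads
      initial: "2020-01-01"    # date at which the first extraction starts
      format: "YYYY-MM-dd"     # format of the date column
      type: "date"

    # ID case: delta loads keyed on an increasing document ID
    delta:
      name: "BELNR"            # hypothetical ID column
      initial: "0"             # ID to start the extraction from
      format: "numeric"        # format of the ID column
      type: "id"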