Introduction to Standalone Connector Flashcards
What is the standalone connector?
It handles the communication between the source system and SAP Signavio Process Intelligence.
When can the standalone connector be used?
When the source system is not covered by one of the standard connectors in Process Intelligence (for example, a third-party system).
What does the standalone connector do?
It extracts data from the source system, transforms it into event log format, and uploads the result to Process Intelligence for analysis.
How do the ETL scripts run?
They run externally (outside of Process Intelligence) and use the API to push the data to a process within the system.
Which 4 components does the standalone connector use?
- A collection of extraction and transformation SQL scripts
- A configuration file in YAML format
- A SQLite database that tracks what was loaded previously, so regular loads pick up the correct data each time
- A Java application for triggering the actual extraction, transformation, and loading
[a hypothetical directory layout follows below]
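To make these four components concrete, here is a hypothetical layout of a connector working directory; all file and folder names are illustrative assumptions, not the exact names shipped by SAP Signavio:

    connector/
      connector.jar            # Java application triggering extraction, transformation, loading
      config.yaml              # YAML configuration file
      sql/
        orders_extract.sql     # extraction SQL script (typically one per table)
        eventlog.sql           # transformation script that builds the event log
      sync-state.sqlite        # SQLite database tracking what was loaded in earlier runs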
Where does the connector pull and store data?
It uses an SAP technical user to pull data from the source system and stores it in an S3 bucket.
How does the connector download the files?
From the transformed data in the S3 bucket, it uses Athena to generate an event log file and then downloads that file.
What does the connector do with the event log file?
It uploads the event log file to the Process Intelligence API.
How do you get an automated ETL to work?
Set up a virtual machine to host the environment.
[setting up virtual machine] 1. What are the specifications for the host?
- 10 GB RAM
- single-core CPU
- 10 GB free disk space
- network connectivity to both Signavio and the source system
[sizing may vary depending on the use case]
[setting up virtual machine] 2. What do you install?
JDK 8 or higher (for example, the Oracle JDK)
[setting up virtual machine] 3. If the connector is running in a Windows environment, what do you need to install as well?
Microsoft Visual C++ 2010 Redistributable Package
[setting up virtual machine] 4. What do you download last?
Download and unpack the standalone connector provided by SAP Signavio.
Why might you have a dedicated staging environment?
In most cases it is faster and better suited for process mining. It also enables you to combine multiple source systems.
When setting up the staging environment, what else do you need in the case of AWS?
An AWS account with access to both S3 (for data storage) and Athena (for running the transformation scripts).
What happens once the environment setup is finished?
The connector needs to be configured to fit the specific use case. This is done in the config.yaml file provided by SAP.
What does the config.yaml file contain?
It defines the actions required by the connector: the connection configurations, the table extractions, and the event collector configuration.
[a skeleton example follows below]
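As rough orientation, a config.yaml skeleton might look as follows. Only the leaf keys named in the cards below (apiToken, url, tableSyncConfigurations, and so on) come from this deck; the top-level grouping is an assumption for illustration:

    apiToken: "<token>"              # API access to Process Intelligence, see card 1
    url: "<upload endpoint>"
    staging:                         # AWS staging area (S3 + Athena), see card 2
      bucket: "<s3 bucket>"
    connection:                      # source system access, see card 3
      source: "<source name>"
    tableSyncConfigurations: []      # one entry per extracted table
    eventCollector: {}               # builds and uploads the event log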
[connector configuration - config.yaml] 1. You require an apiToken and a url; where do you get these?
- apiToken: in Process Intelligence, under Process settings > API > New Token
- url: the Upload Endpoint, found in the same settings as the API token
[example snippet below]
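A minimal sketch of how these two values might appear in config.yaml; the key names come from the card, while their top-level placement and the values are assumptions:

    apiToken: "a1b2c3d4e5"                       # generated via Process settings > API > New Token
    url: "https://<workspace>.signavio.com/..."  # the Upload Endpoint from the same settings page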
[connector configuration - config.yaml] 2. In the staging area, what inputs do you need? (can be provided by your admin)
- accessKey & secretKey: access credentials provided by the account creator
- athenaWorkgroup: name of the Athena workgroup, if multiple exist
- region: the region your AWS instance runs in
- bucket: name of the S3 bucket the data will be stored in
- athenaDatabase: name of your database in Athena
[example snippet below]
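A hedged example of the staging inputs with made-up values; the key names follow the card, while the nesting under a staging section is an assumption:

    staging:
      accessKey: "AKIAEXAMPLE123"       # provided by the account creator
      secretKey: "example-secret"       # keep out of version control where possible
      athenaWorkgroup: "primary"        # only needed if multiple workgroups exist
      region: "eu-central-1"            # region your AWS instance runs in
      bucket: "pi-connector-staging"    # S3 bucket the data will be stored in
      athenaDatabase: "pi_staging"      # Athena database for the transformation scripts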
[connector configuration - config.yaml] 3. What input is typically required to establish a connection?
- source: name of the source system
- user: a user in the system with the necessary rights
- password: the password for that user
- logfile: name of the extraction log file
- verbosity: amount of detail stored in the log file
[example snippet below]
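A sketch of a connection entry with hypothetical values; the keys mirror the card, while the surrounding connection section is an assumption:

    connection:
      source: "erp-prod"           # name of the source system
      user: "PI_TECH_USER"         # technical user with the necessary rights
      password: "change-me"        # password for that user (better injected than hard-coded)
      logfile: "extraction.log"    # name of the extraction log file
      verbosity: "INFO"            # amount of detail stored in the log file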
What's next after establishing the connection?
Extractor configuration, i.e. defining the extraction and the necessary data via the parameters under tableSyncConfigurations, including those for delta loads.
[extractor configuration] 1. What are the general parameters for each table that should be extracted?
- name: name for the table in the staging area
- dataSource: specify it in case of multiple source systems
- sqlFilePath: name of the SQL file used for the extraction
- keyColumn: the column of the table used to merge multiple rows in staging
- mostRecentRowColumn: used to identify the newest row with the most recent information
[example snippet below]
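Putting the general parameters together, a table entry might look like this. The SAP sales order header table VBAK and its columns VBELN and AEDAT are illustrative examples, and the exact YAML shape is an assumption:

    tableSyncConfigurations:
      - name: "VBAK"                          # table name in the staging area
        dataSource: "erp-prod"                # only needed with multiple source systems
        sqlFilePath: "sql/vbak_extract.sql"   # SQL file used for the extraction
        keyColumn: "VBELN"                    # key column used to merge rows in staging
        mostRecentRowColumn: "AEDAT"          # identifies the newest row per key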
[extractor configuration] What are the additional parameters for?
For scheduled data loads, in case new data needs to be extracted on a regular basis.
[extractor configuration] 2. What are some additional parameters?
- name: the table column that distinguishes delta loads, e.g. creation or change date
- initial: the initial date or ID at which extraction starts
- date case: format, e.g. YYYY-MM-dd, and type, e.g. date
- ID case: format of the ID column and type, e.g. id
[example snippet below]
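A hedged sketch of both delta-load variants as they might appear inside a table entry; the parameter names name, initial, format, and type follow the cards, while the delta grouping and the example values are assumptions:

    # Date case: delta loads keyed on a creation or change date
    delta:
      name: "AEDAT"            # column that distinguishes delta loads
      initial: "2020-01-01"    # date at which the first extraction starts
      format: "YYYY-MM-dd"     # format of the date column
      type: "date"

    # ID case: delta loads keyed on an increasing document ID
    delta:
      name: "BELNR"            # hypothetical ID column
      initial: "0"             # ID to start the extraction from
      format: "numeric"        # format of the ID column
      type: "id"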