Introduction to Standalone Connector Flashcards

1
Q

What is the standalone connector?

A

It handles the communication between the source system and SAP Signavio Process Intelligence

2
Q

When can the standalone connector be used?

A

When the source system is not covered by one of the standard connectors in Process Intelligence (this includes any other third-party systems)

3
Q

What does standalone connector do?

A

It extracts data from the source system, transforms it into event log format, and uploads it to Process Intelligence to be analyzed

4
Q

How do the ETL scripts run?

A

Externally (outside of Process Intelligence), but they use the API to push the data to a process within the system.

5
Q

What four components does the standalone connector use?

A

– A collection of extraction and transformation SQL scripts
– A configuration file in YAML format
– A SQLite database to ensure the correct data is loaded each time in case of regular loads
– A Java application for triggering the actual extraction, transformation, and loading

6
Q

Where does the connector pull and store data?

A

It uses an SAP technical user to pull data from the source system and stores it in an S3 bucket

7
Q

How does the connector download the files?

A

From the transformed tables in the S3 bucket, it uses Athena to generate an event log file and downloads that file

8
Q

What does the connector do to the event log files?

A

uploads the event log file to the Process Intelligence API

9
Q

How do you get an automated ETL to work?

A

Set up a virtual machine (to host the environment)

10
Q

[setting up virtual machine] 1. What are the specifications for the host?

A

10 GB RAM
single-core CPU
10 GB free disk space
network connectivity to both Signavio and the source system

[size may vary depending on use case]

11
Q

[setting up virtual machine] 2. What do you install?

A

JDK 8 or higher, e.g. the Oracle JDK

12
Q

[setting up virtual machine] 3. If the connector is running in a Windows environment, what do you need to install as well?

A

The Microsoft Visual C++ 2010 Redistributable Package

13
Q

[setting up virtual machine] 4. What do you download last?

A

Download and unpack the Signavio connector provided by SAP Signavio

14
Q

Why might you have a dedicated staging environment?

A

In most cases it may be faster and better-suited for process mining. It also enables you to use multiple source systems.

15
Q

When setting up the staging environment, what else will you need in the case of AWS?

A

An account with access to both S3 (for data storage) and Athena (for running the transformation scripts) is required

16
Q

What happens once the environment setup is finished?

A

The connector needs to be configured to fit the specific use case. This is done in the config.yaml file provided by SAP

17
Q

What does the config.yaml file contain?

A

It defines the actions required by the connector: the connection configurations, table extractions, and event collector configurations

18
Q

[connector configuration - config.yaml] 1. You will require an apiToken and a url; where do you get these?

A
  1. apiToken: in Process Intelligence, under process settings > API > New Token
  2. url: in the same settings as the API token, under Upload Endpoint
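A minimal sketch of how these two values might appear in config.yaml; only the apiToken and url keys come from the card, while the values and any surrounding nesting are assumptions:

```yaml
# hypothetical excerpt - nesting may differ in the file SAP provides
apiToken: "<token created under process settings > API > New Token>"
url: "<the Upload Endpoint shown in the same API settings>"
```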
19
Q

[connector configuration - config.yaml] 2. In the staging area, what inputs do you need? (These can be provided by your admin.)

A
  1. accessKey & secretKey: access information provided by the account creator
  2. athenaWorkgroup: name of the Athena workgroup, if multiple exist
  3. region: the region your AWS instance runs in
  4. bucket: name of the S3 bucket the data will be stored in
  5. athenaDatabase: name of your database in Athena
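A sketch of what this staging section could look like; the key spellings and flat layout are assumptions, and all values are placeholders:

```yaml
# hypothetical staging configuration
accessKey: "AKIA..."           # provided by the AWS account creator
secretKey: "..."               # provided by the AWS account creator
athenaWorkgroup: "primary"     # only needed if multiple workgroups exist
region: "eu-central-1"         # region your AWS instance runs in
bucket: "pi-staging-bucket"    # S3 bucket the extracted data is stored in
athenaDatabase: "staging_db"   # Athena database used for the transformations
```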
20
Q

[connector configuration - config.yaml] 3. What are the typical inputs required to establish a connection?

A
  1. source - name of the source system
  2. user - a user in the system with the necessary rights
  3. password - the password for that user
  4. logfile - name of the extraction log file
  5. verbosity - amount of detail stored in the log file
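A sketch of a connection entry built from these five parameters; the structure around them is an assumption:

```yaml
# hypothetical connection configuration
source: "MY_ERP"             # name of the source system
user: "TECH_USER"            # user with the necessary rights
password: "<password>"       # password for that user
logfile: "extraction.log"    # name of the extraction log file
verbosity: "INFO"            # amount of detail stored in the log file
```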
21
Q

What's next after establishing the connection?

A

Extractor configuration, i.e. defining the extraction and the necessary data by looking at the parameters for delta loads under tableSyncConfigurations

22
Q

[extractor configuration] 1. What are the general parameters for each table that should be extracted?

A
  1. name - name for the table in the staging area
  2. dataSource - specify it in case of multiple source systems
  3. sqlFilePath - name of the SQL file for the extraction
  4. keyColumn - the column of the table used to merge multiple rows in staging
  5. mostRecentRowColumn - used to identify the newest row with the most recent information
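A sketch of a table entry under tableSyncConfigurations using the five parameters above; the list layout and all values are assumptions:

```yaml
# hypothetical table extraction entry
tableSyncConfigurations:
  - name: "SALES_ORDERS"                 # name for the table in the staging area
    dataSource: "MY_ERP"                 # only needed with multiple source systems
    sqlFilePath: "sql/sales_orders.sql"  # SQL file used for the extraction
    keyColumn: "ORDER_ID"                # column used to merge multiple rows in staging
    mostRecentRowColumn: "CHANGED_AT"    # identifies the newest row per key
```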
23
Q

[extractor configuration] What are the additional parameters for?

A

For scheduled data loads, in case new data needs to be extracted on a regular basis

24
Q

[extractor configuration] 2. What are some additional parameters?

A
  1. name - the table column that distinguishes different delta loads, e.g. a creation or change date
  2. initial - the initial date or ID to start the extraction from
  3. date case - the format, e.g. YYYY-MM-dd, and the type, e.g. date
  4. ID case - idFormat, i.e. the format of the column, and the type, e.g. id
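A sketch of how these delta-load parameters might sit on a table entry; the wrapping key and exact spellings are assumptions:

```yaml
# hypothetical delta-load parameters (date case)
deltaLoad:                   # wrapping key is an assumption
  name: "CHANGED_AT"         # column that distinguishes different delta loads
  initial: "2020-01-01"      # date (or ID) to start the first extraction from
  format: "YYYY-MM-dd"       # date case: format of the column
  type: "date"               # "date" here, "id" for the ID case
```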
25
Q

[extractor configuration] What do extraction scripts need to ensure no data is missed or loaded twice during multiple data loads?

A

A condition, which usually makes use of a left and a right boundary (normally dates), to extract only the data between these values.
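For example, a hypothetical extraction script might end in a condition like WHERE changed_at > '<left boundary>' AND changed_at <= '<right boundary>', with the connector filling in the boundary values on each load (the column and placeholder names here are illustrative).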
26
Q

[extractor configuration] 3. What are the left and right boundaries?

A

left - the initial value for the first load, then the values from the previous load; these are saved in the previously mentioned log files on each load
right - usually the current date or the newest ID
27
Q

After we have our source system and extraction information, what is the next step?

A

Transformation of our source data into the event log format.
28
Q

To transform our data into the event log format, what do we need?

A

Three columns (case ID, event name, timestamp), defined under eventCollectorConfigurations.
29
Q

[Transformation Configuration] 1. The event log format needs three columns; what does the configuration include?

A
  1. name - distinguishes the different events during transformation
  2. collectFrom - where the source tables should be queried
  3. sqlFilePath - name of the transformation script for this event
  4. mapping - maps the query results to the necessary event log columns, usually case ID, timestamp, and activity name
  5. attributes - for date attributes the format needs to be specified: name (of the event), type (i.e. date), and date format
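A sketch of an entry under eventCollectorConfigurations using these five keys; the mapping and attribute field names, as well as all values, are assumptions:

```yaml
# hypothetical event collector entry
eventCollectorConfigurations:
  - name: "Order Created"                 # distinguishes this event during transformation
    collectFrom: "staging"                # where the source tables are queried
    sqlFilePath: "sql/order_created.sql"  # transformation script for this event
    mapping:                              # map query results to the event log columns
      caseId: "ORDER_ID"
      timestamp: "CREATED_AT"
      activityName: "Order Created"
    attributes:
      - name: "CREATED_AT"                # date attributes need their format specified
        type: "date"
        dateFormat: "YYYY-MM-dd"
```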
30
Q

How can the connector be started?

A

It can be started as a Java application. First, go to the source directory of the connector, then execute 'java -jar signavio-connector.jar' to begin.
31
Q

Based on the tableSyncConfiguration, what is the 'extract' command?

A

Takes the raw data from the source system, as defined in the extraction scripts, and uploads it to the staging area, where it is saved as raw tables.
32
Q

Based on the tableSyncConfiguration, what is the 'createschema' command?

A

Generates the schema for the raw tables.
33
Q

Based on the tableSyncConfiguration, what is the 'transform' command?

A

Optimizes the raw table schema and merges row updates in case of changes; updates to the extracted data are recognised based on the keyColumn and mostRecentRowColumn parameters.
34
Q

Based on the eventCollectorConfiguration, what is the 'eventlog' command?

A

Creates the event log out of the staging system based on the transformation scripts and uploads it to Process Intelligence.
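Taken together, a full run would presumably invoke these commands in sequence after the jar; the exact invocation syntax is an assumption based on the command slot left after 'java -jar signavio-connector.jar':

  java -jar signavio-connector.jar extract
  java -jar signavio-connector.jar createschema
  java -jar signavio-connector.jar transform
  java -jar signavio-connector.jar eventlog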