Introduction to Standalone Connector Flashcards

1
Q

What is the standalone connector?

A

It handles the communication between the source system and SAP Signavio Process Intelligence

2
Q

When can the standalone connector be used?

A

When the source system is not covered by one of the standard connectors in Process Intelligence (including third-party systems)

3
Q

What does the standalone connector do?

A

It extracts data from the source system, transforms it into event log format, and uploads the result to Process Intelligence to be analyzed

4
Q

How do the ETL scripts run?

A

They run externally (outside of Process Intelligence) but use the API to push the data to a process within the system.

5
Q

What 4 components does the standalone connector use?

A

– A collection of extraction and transformation SQL scripts
– A configuration file in YAML format
– A SQLite database to ensure the correct data is loaded each time in case of regular loads
– A Java application for triggering the actual extraction, transformation and loading

6
Q

Where does the connector pull and store data?

A

It uses an SAP technical user to pull data from the source system and stores it in an S3 bucket

7
Q

How does the connector download the files?

A

From the transformed data in the S3 bucket, it uses Athena to generate an event log file and downloads this file

8
Q

What does the connector do to the event log files?

A

uploads the event log file to the Process Intelligence API

9
Q

How do you get an automated ETL to work?

A

Set up the virtual machine (to set up the environment)

10
Q

[setting up virtual machine] 1. What are the specifications for the host?

A

10 GB RAM
single-core CPU
10 GB free disk space
network connectivity to both Signavio and the source system

[size may vary depending on the use case]

11
Q

[setting up virtual machine] 2. What do you install?

A

JDK 8 or higher (e.g. Oracle JDK)

12
Q

[setting up virtual machine] 3. If the connector is running in a Windows environment, what do you need to install as well?

A

Microsoft Visual C++ 2010 Redistributable Package

13
Q

[setting up virtual machine] 4. What do you download last?

A

Download and unpack the Signavio connector provided by SAP Signavio

14
Q

Why may you have a dedicated staging environment?

A

In most cases it may be faster and better-suited for process mining. It also enables you to use multiple source systems.

15
Q

When setting up the staging environment, what else will you need in the case of AWS?

A

An AWS account is required with both S3 for data storage and Athena for running the transformation scripts

16
Q

What happens once the environment setup is finished?

A

the connector needs to be configured to fit the specific use case. This is done in the config.yaml file provided by SAP

17
Q

What does the config.yaml file contain?

A

It defines the actions required by the connector, i.e. the connection configurations, table extractions, and event collector configurations

18
Q

[connector configuration - config.yaml] 1. You will require the apiToken and url. Where do you get these?

A
  1. apiToken: in Process Intelligence, under Process settings > API > New Token
  2. url: in the same settings as the API token, under Upload Endpoint
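
As a rough illustration, these two values might sit in config.yaml like the sketch below. Only the key names apiToken and url come from the card; the surrounding section name and the placeholder values are assumptions, not SAP's exact template.

```yaml
# Hypothetical snippet: the section name and example values are assumptions;
# apiToken and url are the keys referenced in the card above.
signavio:
  apiToken: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"        # created under Process settings > API > New Token
  url: "https://my-workspace.example.com/upload-endpoint" # the Upload Endpoint shown in the same settings
```
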
19
Q

[connector configuration - config.yaml] 2. In the staging area, what inputs do you need? (These can be provided by your admin.)

A
  1. access key & secret key : access information provided by the account creator
  2. athenaWorkgroup : name of the Athena workgroup, if multiple exist
  3. region : the region your AWS instance runs in
  4. bucket : name of the S3 bucket the data will be stored in
  5. athenaDatabase : name of your database in Athena
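
A minimal sketch of how these staging inputs might appear in config.yaml. The five keys mirror the list above, but the section name, the camelCase spelling of accessKey/secretKey, and all example values are assumptions rather than SAP's exact template.

```yaml
# Hypothetical staging section: key names follow the inputs listed in the card;
# section name, exact key spelling, and values are illustrative assumptions.
staging:
  accessKey: "AKIAXXXXXXXXXXXXXXXX"    # access information provided by the account creator
  secretKey: "****************"        # access information provided by the account creator
  athenaWorkgroup: "primary"           # only needed if multiple Athena workgroups exist
  region: "eu-central-1"               # region your AWS instance runs in
  bucket: "pi-connector-staging"       # S3 bucket the extracted data is stored in
  athenaDatabase: "connector_staging"  # name of your database in Athena
```
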
20
Q

[connector configuration - config.yaml] 3. What typical input information is required to establish a connection?

A
  1. source - name of the source system
  2. user - a user in the system with the necessary rights
  3. password - the password for that user
  4. logfile - name of the extraction log file
  5. verbosity - amount of detail stored in the log file
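
A sketch of a source-system connection block built from these inputs. The nesting under a connections list and the example values are assumptions; only the five key names come from the card.

```yaml
# Hypothetical connection entry: the five keys come from the card above,
# the list nesting and example values are assumptions.
connections:
  - source: "my_erp"           # name of the source system
    user: "EXTRACT_USER"       # user in the system with the necessary rights
    password: "********"       # password for that user
    logfile: "extraction.log"  # name of the extraction log file
    verbosity: "INFO"          # amount of detail stored in the log file
```
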
21
Q

What’s next after establishing the connection?

A

Extractor configuration, i.e. defining the extraction and the necessary data by looking at the parameters for delta load under tableSyncConfigurations

22
Q

[extractor configuration] 1. What are the general parameters for each table that should be extracted?

A
  1. name - name for the table in the staging area
  2. dataSource - specify it in case of multiple source systems
  3. sqlFilePath - name of the SQL file for the extraction
  4. keyColumn - the column of the table used to merge multiple rows in staging
  5. mostRecentRowColumn - used to identify the newest row with the most recent information
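
A sketch of one table entry under tableSyncConfigurations using these parameters. The table, file, and column names are made up for illustration, and the exact camelCase spelling of dataSource is an assumption.

```yaml
# Hypothetical tableSyncConfigurations entry: parameter names follow the card
# above; table, file, and column names are illustrative only.
tableSyncConfigurations:
  - name: "orders"                         # name for the table in the staging area
    dataSource: "my_erp"                   # only needed with multiple source systems
    sqlFilePath: "sql/extract_orders.sql"  # SQL file used for the extraction
    keyColumn: "ORDER_ID"                  # column used to merge multiple rows in staging
    mostRecentRowColumn: "CHANGED_AT"      # identifies the newest row per key
```
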
23
Q

[extractor configuration] What are the additional parameters for?

A

for scheduled data loads in case new data needs to be extracted on a regular basis

24
Q

[extractor configuration] 2. What are some additional parameters?

A
  1. name - the table column that distinguishes different delta loads, e.g. the creation or change date
  2. initial - the initial date or ID to start the extraction from
  3. date case - the format, e.g. YYYY-MM-dd, and the type, e.g. date
  4. ID case - the idFormat, i.e. the format of the column, and the type, e.g. id
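
Continuing the sketch above, the delta-load parameters for a table might be nested like this. The key names delta, dateFormat, and idFormat and their placement are assumptions based on the parameters listed in the card.

```yaml
# Hypothetical delta-load block: only name, initial, and the date/ID types and
# formats come from the card; nesting and key spelling are assumptions.
tableSyncConfigurations:
  - name: "orders"
    sqlFilePath: "sql/extract_orders.sql"
    delta:
      name: "CHANGED_AT"        # column that distinguishes the delta loads (creation or change date)
      initial: "2020-01-01"     # initial date (or ID) the extraction starts from
      type: "date"              # date case: type "date" ...
      dateFormat: "YYYY-MM-dd"  # ... together with its format
      # ID case instead: type "id" plus an idFormat describing the column's format
```
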
25
Q

[extractor configuration] What do the extraction scripts contain to ensure no data is missed or loaded twice during multiple data loads?

A

a condition, which usually makes use of a left and right boundary (normally dates) to extract data between these values

26
Q

[extractor configuration] 3. What are the left and right boundaries?

A

left - the initial value for the first load, and the value from the previous load for subsequent loads; these are saved in the previously mentioned log files on each load

right - usually the current date or the newest ID

27
Q

After we have our source system and extraction information, what is the next step?

A

transformation of our source data into the event log format

28
Q

To transform our data into event log format, what do we need?

A

3 columns (case ID, event name, timestamp) under eventCollectorConfigurations

29
Q

[Transformation Configuration] 1. The event log format needs 3 columns; what does each event collector configuration include?

A

– name : distinguishes different events during transformation
– collectFrom : where the source tables should be queried from
– sqlFilePath : name of the transformation script for this event
– mapping : maps the query results to the necessary event log columns - usually the case ID, timestamp and activity name
– attributes : for date attributes the format needs to be specified - name (of the event), type (e.g. date) and the date format
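
A sketch of one event entry under eventCollectorConfigurations built from these parameters. The event, file, and column names, and the exact mapping keys for case ID, activity, and timestamp, are illustrative assumptions.

```yaml
# Hypothetical eventCollectorConfigurations entry: parameter names follow the
# card above; event, file, column names and mapping keys are illustrative.
eventCollectorConfigurations:
  - name: "Order Created"                       # distinguishes this event during transformation
    collectFrom: "orders"                       # staging table(s) the event is queried from
    sqlFilePath: "sql/event_order_created.sql"  # transformation script for this event
    mapping:
      caseId: "ORDER_ID"       # case ID column of the event log
      activity: "EVENT_NAME"   # activity name column
      timestamp: "CREATED_AT"  # timestamp column
    attributes:
      - name: "CREATED_AT"     # date attributes need name, type and date format
        type: "date"
        dateFormat: "YYYY-MM-dd HH:mm:ss"
```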

30
Q

How can the connector be started?

A

It can be started as a Java application.

First, go to the source directory of the connector, then execute ‘java -jar signavio-connector.jar <command>’ to begin

31
Q

Based on the tableSyncConfiguration, what does the ‘extract’ command do?

A

It takes the raw data from the source system, as defined in the extraction scripts, and uploads it to the staging area, where it is saved as raw tables

32
Q

Based on the tableSyncConfiguration, what does the ‘createschema’ command do?

A

generates the schema for the raw tables

33
Q

Based on the tableSyncConfiguration, what does the ‘transform’ command do?

A

It optimizes the raw table schema and merges row updates in case of changes; updates to the extracted data are recognised based on the keyColumn and mostRecentRowColumn parameters

34
Q

Based on the eventCollectorConfiguration, what does the ‘eventlog’ command do?

A

It creates the event log from the staging system based on the transformation scripts and uploads it to Process Intelligence