Data Requirements and Setup Flashcards
What is data transformation?
changing the format, structure, or values of raw data from ERP systems so that it can be uploaded as a file to the process mining system
What is the file uploaded to the process mining system called?
event log
What does the event log contain?
all recorded events with their timestamps, each assigned to a specific case ID
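A minimal sketch of what such an event log might look like, assuming three fields per event (case ID, activity name, timestamp). The field names, case IDs, and activities are invented for illustration and do not come from any particular process mining system.

```python
from datetime import datetime

# Hypothetical event log: one row per recorded event.
event_log = [
    {"case_id": "PO-1", "activity": "Create Purchase Order", "timestamp": datetime(2024, 1, 5, 9, 0)},
    {"case_id": "PO-2", "activity": "Create Purchase Order", "timestamp": datetime(2024, 1, 6, 11, 15)},
    {"case_id": "PO-1", "activity": "Receive Goods", "timestamp": datetime(2024, 1, 9, 14, 30)},
    {"case_id": "PO-1", "activity": "Post Invoice", "timestamp": datetime(2024, 1, 12, 8, 45)},
]


def trace(log, case_id):
    """Return the activities of one case, ordered by timestamp."""
    events = [e for e in log if e["case_id"] == case_id]
    events.sort(key=lambda e: e["timestamp"])
    return [e["activity"] for e in events]


print(trace(event_log, "PO-1"))
# ['Create Purchase Order', 'Receive Goods', 'Post Invoice']
```

Grouping by case ID and sorting by timestamp is what turns a flat list of events into the path each case took through the process.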
Examples of how data transformation is typically achieved?
- Translation and mapping
- Filtering, aggregation, and summarization
- Enrichment and imputation
- Indexing and ordering
- Anonymization and encryption
- Modeling, typecasting, formatting, and renaming
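A few of the techniques listed above (filtering, renaming, typecasting, ordering, and a simple stand-in for anonymization) could be sketched roughly as follows. The raw column names and values are hypothetical, and hashing a user name is only a basic pseudonymization, not a complete privacy measure.

```python
import hashlib
from datetime import datetime

# Hypothetical raw extract; column names and values are invented.
raw_rows = [
    {"doc_no": "4711", "step": "Receive Goods", "ts": "2024-01-09 14:30", "user": "alice"},
    {"doc_no": "4711", "step": "Create Purchase Order", "ts": "2024-01-05 09:00", "user": "bob"},
    {"doc_no": "", "step": "Create Purchase Order", "ts": "2024-01-06 11:15", "user": "carol"},
]

# Mapping from source column names to the names used in the event log.
RENAME = {"doc_no": "case_id", "step": "activity", "ts": "timestamp", "user": "user"}


def transform(rows):
    out = []
    for row in rows:
        if not row["doc_no"]:
            continue  # filtering: drop events that cannot be linked to a case
        new = {RENAME[key]: value for key, value in row.items()}  # renaming
        # typecasting: text timestamp -> datetime object
        new["timestamp"] = datetime.strptime(new["timestamp"], "%Y-%m-%d %H:%M")
        # pseudonymization: replace the user name with a short hash
        new["user"] = hashlib.sha256(new["user"].encode("utf-8")).hexdigest()[:12]
        out.append(new)
    out.sort(key=lambda r: (r["case_id"], r["timestamp"]))  # ordering
    return out


clean_rows = transform(raw_rows)
```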
What happens to every step in a system?
it is recorded and leaves behind a trace
What happens to all changes and transactions referring to business objects/steps?
they’re stored in a database
What does process intelligence do with the stored data objects/steps?
it explores, extracts, and transforms those details in a way that allows all the steps to be traced back
Where are the recreated steps stored?
in the event log
For analysis purposes, what format should the data be in?
uniform and standardised; there can be differences, especially if the data comes from different source systems
What are the steps for data transformation?
- Definition of the target format
- Conversion of the extracted data
- Saving the converted data into a new file
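The three steps above might look like this in a small script. This is only a sketch: the target columns, the source field names (`doc_no`, `step`, `date`), the source date format, and the file name `event_log.csv` are all assumptions made for the example.

```python
import csv
import os
import tempfile
from datetime import datetime

# Step 1: define the target format (hypothetical column names).
TARGET_COLUMNS = ["case_id", "activity", "timestamp"]


def convert(row):
    """Step 2: convert one extracted row into the target format."""
    ts = datetime.strptime(row["date"], "%d.%m.%Y %H:%M")
    return {
        "case_id": str(row["doc_no"]),
        "activity": row["step"].strip(),
        "timestamp": ts.isoformat(),
    }


def save(rows, path):
    """Step 3: save the converted rows into a new CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=TARGET_COLUMNS)
        writer.writeheader()
        writer.writerows(rows)


# Hypothetical extracted row with a source-specific date format.
extracted = [{"doc_no": 4711, "step": " Create Purchase Order ", "date": "05.01.2024 09:00"}]
converted = [convert(r) for r in extracted]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "event_log.csv")
    save(converted, path)
    # Read the new file back to check what was written.
    with open(path, newline="", encoding="utf-8") as f:
        saved = list(csv.DictReader(f))
```

Writing to a new file rather than overwriting the extract keeps the raw data available if the conversion needs to be rerun.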
Why is data transformation necessary?
the data is stored across different tables, so we need to ensure the extracted data is linked to specific cases
e.g. the system should know that Order ID 123 in the order table and Invoice ID 456 in the invoice table belong to the same case
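A rough sketch of the Order ID 123 / Invoice ID 456 example, assuming the invoice table carries a reference back to the order it bills. That reference column, the other column names, and the dates are hypothetical; real source systems may link documents differently.

```python
# Hypothetical source tables; dates are ISO strings, so they sort chronologically.
orders = [{"order_id": 123, "created": "2024-01-05"}]
invoices = [{"invoice_id": 456, "order_id": 123, "posted": "2024-01-20"}]


def build_event_log(orders, invoices):
    """Link rows from both tables to one case, using the order ID as case ID."""
    events = []
    for order in orders:
        events.append(
            {"case_id": order["order_id"], "activity": "Create Order", "timestamp": order["created"]}
        )
    for invoice in invoices:
        # The invoice is assigned to the case of the order it references.
        events.append(
            {"case_id": invoice["order_id"], "activity": "Post Invoice", "timestamp": invoice["posted"]}
        )
    return sorted(events, key=lambda e: (e["case_id"], e["timestamp"]))


linked = build_event_log(orders, invoices)
```

Both events end up under case ID 123, even though the invoice has its own ID of 456 in its source table.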
What does the case ID define?
the scope of the process. It determines where the process starts and ends
e.g. in the procurement process, the case ID would be the purchase document ID, and every single request is a new case
What is the last part of ETL?
data load phase
What does the data load phase cover?
the tasks required to upload the transformed data into the process mining system
What 3 questions need to be addressed for your data load?
- which upload method is required?
could be manual, e.g. CSV, or automatic, e.g. API
- should the existing data be replaced, or should new data be appended?
depends on the scenario; appending new data to existing data might be useful, e.g. an annual data set built up from extractions at the end of each quarter
- how often should new data be uploaded?
depends on the availability of data; could be hourly if live
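The replace-or-append question can be sketched with a small helper. The function `load`, its `mode` parameter, and the quarterly rows are hypothetical and only illustrate the two behaviours; an actual process mining system would expose its own upload options.

```python
def load(existing, new_rows, mode):
    """Return the data set after a load in 'replace' or 'append' mode."""
    if mode == "replace":
        return list(new_rows)  # existing data is discarded
    if mode == "append":
        return list(existing) + list(new_rows)  # new data is added at the end
    raise ValueError(f"unknown load mode: {mode!r}")


# Hypothetical quarterly extracts, one event each to keep the example short.
quarters = {
    "Q1": [{"case_id": "PO-1", "activity": "Create Purchase Order", "timestamp": "2024-02-01"}],
    "Q2": [{"case_id": "PO-2", "activity": "Create Purchase Order", "timestamp": "2024-05-01"}],
    "Q3": [{"case_id": "PO-3", "activity": "Create Purchase Order", "timestamp": "2024-08-01"}],
    "Q4": [{"case_id": "PO-4", "activity": "Create Purchase Order", "timestamp": "2024-11-01"}],
}

# Appending each quarter's extract builds up the annual data set.
annual = []
for name in ("Q1", "Q2", "Q3", "Q4"):
    annual = load(annual, quarters[name], mode="append")

# Replacing keeps only the most recent upload.
latest_only = load(annual, quarters["Q4"], mode="replace")
```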