Introduction to ETL and Data Transformation Flashcards

1
Q

Where is process data hidden, and what needs to happen to it?

A

In ERP systems, which capture data in tables. It must first be extracted and then transformed into a specific format before it can be analysed in a mining tool like process intelligence.

2
Q

Explain the concept of ETL

A

Preparing transactional data for process mining.

3
Q

What does ETL stand for?

A

[Data] Extraction
[Data] Transformation
[Data] Load
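
The three stages can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the source rows, the column names (order_no, status, changed_at), and the event_log target table are all hypothetical, and an in-memory SQLite database stands in for the mining tool's data store.

```python
import sqlite3

# Hypothetical rows as they might come out of an ERP table (assumed layout).
erp_rows = [
    {"order_no": "1001", "status": "CREATED", "changed_at": "2023-01-05 09:00:00"},
    {"order_no": "1001", "status": "SHIPPED", "changed_at": "2023-01-07 14:30:00"},
]

def extract(rows):
    """Extraction: retrieve the raw records from the source system."""
    return list(rows)

def transform(rows):
    """Transformation: map source columns to case ID, event name, timestamp."""
    return [(r["order_no"], r["status"].title(), r["changed_at"]) for r in rows]

def load(events, conn):
    """Load: write the event log to the target store (assumed table name)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS event_log (case_id TEXT, event_name TEXT, ts TEXT)"
    )
    conn.executemany("INSERT INTO event_log VALUES (?, ?, ?)", events)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(erp_rows)), conn)
loaded = conn.execute(
    "SELECT case_id, event_name, ts FROM event_log ORDER BY ts"
).fetchall()
```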

4
Q

In terms of business processes, what is data extraction?

A

Retrieval of all business-related data within the system that is to be used for process mining.

5
Q

What questions surrounding ETL should we ask?

A

What data is required and where is it stored?
What process is it?
Which IT systems are used?
What is the timeframe?
Does every recorded activity have a timestamp?

6
Q

What are the minimum key requirements for ETL?

A

A valid case with a case ID, plus an event name identifier and a timestamp for each event.
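
A check for these minimum requirements might look like the sketch below. The field names (case_id, event_name, timestamp) and the sample events are assumptions for illustration; real extracts will use whatever column names the source system provides.

```python
from datetime import datetime

# Assumed field names for the three minimum requirements.
REQUIRED_FIELDS = ("case_id", "event_name", "timestamp")

def find_invalid_events(events):
    """Return indexes of events missing a case ID, event name, or parsable timestamp."""
    invalid = []
    for i, event in enumerate(events):
        # An empty or absent value counts as missing.
        if any(not event.get(f) for f in REQUIRED_FIELDS):
            invalid.append(i)
            continue
        try:
            datetime.fromisoformat(event["timestamp"])
        except ValueError:
            invalid.append(i)
    return invalid

# Hypothetical sample data.
sample_events = [
    {"case_id": "A1", "event_name": "Create Order", "timestamp": "2023-03-01T08:15:00"},
    {"case_id": "A1", "event_name": "Approve Order", "timestamp": ""},
    {"case_id": "", "event_name": "Ship Order", "timestamp": "2023-03-02T10:00:00"},
]
invalid_indexes = find_invalid_events(sample_events)
```

Here the second event has no timestamp and the third has no case ID, so both would be flagged before loading.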

7
Q

What would the time frame be?

A

Ideally all records, but as this is a lot of data, it's usually a smaller window, such as 1 year.

8
Q

What is the issue with a smaller time frame?

A

Not all cases have been fully executed within the time frame, so you need to consider whether these incomplete cases should be removed from the extraction.
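
One possible way to handle this is to keep only cases whose start and end events both fall inside the extraction window. The sketch below assumes a hypothetical process that starts with "Create Order" and ends with "Close Order"; the event names, case IDs, and dates are illustrative only.

```python
from collections import defaultdict
from datetime import datetime

def complete_cases(events, start_event, end_event, window_start, window_end):
    """Return IDs of cases whose start and end events both fall inside the window."""
    seen = defaultdict(set)
    for case_id, name, ts in events:
        if window_start <= datetime.fromisoformat(ts) < window_end:
            seen[case_id].add(name)
    return sorted(c for c, names in seen.items() if {start_event, end_event} <= names)

# Hypothetical events: (case ID, event name, timestamp).
events = [
    ("C1", "Create Order", "2023-02-01T09:00:00"),
    ("C1", "Close Order", "2023-02-10T09:00:00"),
    ("C2", "Create Order", "2023-12-20T09:00:00"),
    ("C2", "Close Order", "2024-01-05T09:00:00"),   # ends after the window
    ("C3", "Create Order", "2022-12-28T09:00:00"),  # starts before the window
    ("C3", "Close Order", "2023-01-03T09:00:00"),
]
kept = complete_cases(
    events, "Create Order", "Close Order",
    datetime(2023, 1, 1), datetime(2024, 1, 1),
)
```

Only C1 is kept for a 2023 window; C2 and C3 are cut off by the window boundaries. Whether to drop or keep such cases depends on the analysis question.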

9
Q

What are the 3 steps of data extraction?

A
  1. Define the process scope [where it starts and ends, what events are important]
  2. Identify the relevant business objects [the state transitions of these objects allow you to track progress]
  3. Identify the required systems and tables [for each event, identify the ID, event name, and timestamp]
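
The three steps above could be captured in a small planning structure like the one below. This is a sketch only: the class names, the table name orders_header, and the column names are hypothetical, not taken from any specific ERP system.

```python
from dataclasses import dataclass, field

@dataclass
class EventSource:
    """Step 3: where one event's ID, name, and timestamp come from."""
    event_name: str
    table: str             # hypothetical table name
    id_column: str
    timestamp_column: str

@dataclass
class ExtractionPlan:
    start_event: str       # Step 1: process scope
    end_event: str
    business_objects: list  # Step 2: objects whose state transitions are tracked
    sources: list = field(default_factory=list)

    def unmapped_events(self):
        """Scope events (start/end) that have no source table assigned yet."""
        mapped = {s.event_name for s in self.sources}
        return [e for e in (self.start_event, self.end_event) if e not in mapped]

plan = ExtractionPlan(
    start_event="Create Order",
    end_event="Close Order",
    business_objects=["sales order", "invoice"],
    sources=[EventSource("Create Order", "orders_header", "order_no", "created_at")],
)
missing = plan.unmapped_events()
```

In this example the end event still lacks a source table, which signals that step 3 is not finished.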
10
Q

When you’re extracting data from multiple systems what is the recommendation?

A

Start small by extracting the data from one system to get your first results; you can then expand the process with more data in the next iteration.

11
Q

What can you do if the data is difficult to extract (external systems) or there is no unique identifier?

A

You can combine 2 values, e.g. order value and order time, to create an identifier, or reduce the timeframe if an ID can't be created.
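
Combining two values into a surrogate case ID could be sketched as follows. The field names order_value and order_time and the sample records are assumptions; note that such a combined key is not guaranteed to be unique, so collisions should be checked on real data.

```python
def composite_case_id(record, fields=("order_value", "order_time")):
    """Join the chosen field values into one surrogate case ID."""
    return "|".join(str(record[f]) for f in fields)

# Hypothetical records without a native unique identifier.
records = [
    {"order_value": "250.00", "order_time": "2023-05-01T10:00:00", "event": "Create Order"},
    {"order_value": "250.00", "order_time": "2023-05-01T10:00:00", "event": "Ship Order"},
    {"order_value": "250.00", "order_time": "2023-05-02T11:30:00", "event": "Create Order"},
]
case_ids = [composite_case_id(r) for r in records]
```

The first two records share the same order value and time and so end up in the same case, while the third forms a separate case.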
