Hard cards Flashcards

1
Q

The constraint that all primary keys must have non-null values is referred to as the

A

Entity integrity rule

1
Q

What are the four data requirements for relational databases?

A
  1. Every column must be single-valued
  2. Entity integrity rule
  3. Referential integrity
  4. All non-key attributes must describe a characteristic of the entity identified by the primary key.
2
Q

What is Entity Relationship Modelling (ERM)?

A

ERM is a modelling notation for describing the characteristics of data and the relationships between data entities. It is used to formalize and visualize the structure of data before a database is implemented.

3
Q

In text mining, stemming is the process of:

A

Reducing inflected or derived forms of a word to a common base or root (the stem).
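
As a rough illustration, the sketch below strips a few common English suffixes. The `SUFFIXES` rule set and the `naive_stem` function are hypothetical and far simpler than an established stemmer such as Porter's algorithm.

```python
# Naive suffix-stripping stemmer: illustrative only, not the Porter algorithm.
SUFFIXES = ("ing", "ed", "s")  # hypothetical rule set, checked in this order


def naive_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


words = ["connected", "connecting", "connects", "connect"]
stems = {naive_stem(w) for w in words}  # all four forms share one stem
```

All four word forms collapse to the single stem `connect`, which is what shrinks the vocabulary in a text mining task.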

4
Q

What does pivot do?

A

Swaps the attributes on the horizontal and vertical axes.

5
Q

What does a slice do?

A

Filter on one item from a dimension, such as one product, one life cycle, or one client name.
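
A minimal sketch of a slice, assuming a toy sales cube stored as a dictionary keyed by `(product, region, year)`. The data and the `slice_cube` helper are made up for illustration.

```python
# Toy sales cube: (product, region, year) -> sales. All values are made up.
cube = {
    ("laptop", "EU", 2023): 120,
    ("laptop", "US", 2023): 200,
    ("phone", "EU", 2023): 300,
    ("phone", "US", 2024): 150,
}


def slice_cube(cube, product):
    """Keep only the cells for one product; the product dimension drops out."""
    return {
        (region, year): value
        for (p, region, year), value in cube.items()
        if p == product
    }


laptop_slice = slice_cube(cube, "laptop")
```

The result is keyed by `(region, year)` only: fixing one member of the product dimension turns the three-dimensional cube into a two-dimensional table.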

6
Q

What can be done with a roll-up?

A

Aggregate the data to a higher level, for example to insert a KPI diagram showing total sales.

7
Q

How is the accuracy calculated?

A

(TP + TN) / (TP + TN + FP + FN)
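
A worked example may help here. The sketch below computes accuracy from the four confusion-matrix counts; the function name and the counts are illustrative.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Share of all predictions that were correct."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total


# Made-up counts: 85 of 100 predictions are correct.
acc = accuracy(tp=40, tn=45, fp=5, fn=10)
```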

8
Q

How is the precision calculated?

A

TP / (TP + FP)

9
Q

How is the true positive rate calculated?

A

TP / (TP + FN)
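
Precision and the true positive rate share the numerator TP and differ only in the denominator. The sketch below contrasts the two on the same made-up counts; the function names are illustrative.

```python
def precision(tp: int, fp: int) -> float:
    """Of everything predicted positive, how much really was positive."""
    return tp / (tp + fp)


def true_positive_rate(tp: int, fn: int) -> float:
    """Of everything that really was positive, how much was found."""
    return tp / (tp + fn)


# Made-up counts: 40 true positives, 5 false positives, 10 false negatives.
prec = precision(tp=40, fp=5)
tpr = true_positive_rate(tp=40, fn=10)
```

With these counts precision is 40/45 and the true positive rate is 40/50, so the same classifier can score differently on each measure.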

10
Q

What does the information about coverage in Celonis mean?

A

The share of cases that are represented by the displayed model.

11
Q

What does it mean when the start node has multiple arrows going to different places?

A

The process has different start activities depending on the different cases that are represented.

12
Q

What does a misfitting model result in for auditors?

A

Misfitting model –> false negative audit results (compliance violations are not detected)

13
Q

What does an imprecise model result in for auditors?

A

Imprecise model –> false positive audit results (compliance violations are indicated that did not occur in reality).

14
Q

What does the tokenize operator do?

A

Splits text into tokens, often by treating non-letter characters as separators and discarding them.

It breaks the text up into individual parts that can then be analyzed.
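
A minimal sketch of this behavior, assuming tokens are runs of ASCII letters and everything else is a separator. The `tokenize` function is illustrative and is not the operator's actual implementation.

```python
import re


def tokenize(text: str) -> list[str]:
    """Split on any run of non-letter characters and drop empty pieces."""
    return [token for token in re.split(r"[^A-Za-z]+", text) if token]


tokens = tokenize("Process mining, in 2024: it's useful!")
```

Digits and punctuation disappear, and the apostrophe splits `it's` into `it` and `s`, which shows why later filtering steps are usually needed.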

15
Q

When is the convergence criterion met?

A

The convergence criterion is usually met when the assignment of points to clusters becomes stable.
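
The stopping rule can be sketched with a small one-dimensional k-means loop. This is a simplified illustration under assumed inputs (the points, the initial centroids and the `k_means_1d` name are made up), not a production implementation.

```python
def k_means_1d(points, initial_centroids, max_iter=100):
    """Return (assignments, centroids, iterations) for 1-D k-means."""
    centroids = list(initial_centroids)  # copy so the caller's list is untouched
    assignments = None
    for iteration in range(1, max_iter + 1):
        # Assignment step: index of the nearest centroid for each point.
        new_assignments = [
            min(range(len(centroids)), key=lambda c: abs(p - centroids[c]))
            for p in points
        ]
        # Convergence criterion: no point changed cluster since the last pass.
        if new_assignments == assignments:
            return assignments, centroids, iteration
        assignments = new_assignments
        # Update step: move each centroid to the mean of its points.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = sum(members) / len(members)
    return assignments, centroids, max_iter


labels, centers, n_iter = k_means_1d([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], [1.0, 2.0])
```

The loop ends in the pass where the assignments match those of the previous pass, even though the initial centroids were poorly chosen.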

16
Q

What is escalation of commitment?

A

A pattern of behavior in which an individual or group will continue to rationalize their decisions, actions and investments when faced with increasingly negative outcomes rather than alter their course.

Many things come together here: sunk cost fallacy, status quo bias, omission bias, confirmation bias, etc.

17
Q

What is the formula for the fitness?

A

Fitness = Re-playable cases / total cases
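
A small sketch of the formula, assuming for simplicity that the model is given as a finite set of allowed traces. Real process models such as Petri nets require an actual replay; the data and the `fitness` helper are illustrative.

```python
def fitness(event_log, model_traces):
    """Share of cases whose trace can be replayed by the model."""
    if not event_log:
        raise ValueError("event log is empty")
    replayable = sum(1 for trace in event_log if trace in model_traces)
    return replayable / len(event_log)


# Assumption: the model allows exactly these two traces.
model_traces = {("order", "pay", "ship"), ("order", "ship", "pay")}
# One tuple per case; the last case cannot be replayed by the model.
event_log = [
    ("order", "pay", "ship"),
    ("order", "pay", "ship"),
    ("order", "ship", "pay"),
    ("order", "cancel"),
]
fit = fitness(event_log, model_traces)
```

Three of the four cases can be replayed, so the fitness is 3/4.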

18
Q

What does process mining encompass?

A

Techniques, tools and methods to discover, monitor and improve real processes by extracting knowledge from event logs.

19
Q

What are some of the use cases of clustering?

A
  • Identify natural groupings of customers;
  • Identify rules for assigning new cases to classes for targeting/diagnostic purposes;
  • Provide characterization, definition, labeling of populations;
  • Decrease the size and complexity of problems for other data mining methods;
  • Identify outliers in a specific domain.
20
Q

What can cluster analysis be used for?

A

Cluster analysis can be used for automatic identification of natural groupings of things

21
Q

What does link analysis try to achieve?

A

Find patterns in how items relate to each other.

22
Q

How does clustering work?

A

It works by learning the clusters of things from past data, then assigning new instances to those clusters.

23
Q

In text mining, what are 3 methods used to reduce the size of a sparse matrix?

A
  • Using a domain expert
  • Using singular value decomposition (SVD)
  • Eliminating rarely occurring terms.
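
The third method can be sketched as follows, assuming each document is already a list of tokens and that "rare" means appearing in fewer than `min_doc_freq` documents. The threshold, the data and the `drop_rare_terms` helper are illustrative.

```python
from collections import Counter


def drop_rare_terms(docs, min_doc_freq=2):
    """Keep only terms that appear in at least `min_doc_freq` documents."""
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vocabulary = sorted(t for t, n in doc_freq.items() if n >= min_doc_freq)
    # One row per document, one column per surviving term.
    matrix = [[doc.count(term) for term in vocabulary] for doc in docs]
    return vocabulary, matrix


docs = [
    ["fraud", "audit", "risk"],
    ["audit", "risk", "risk"],
    ["fraud", "invoice"],
]
vocab, tdm = drop_rare_terms(docs)
```

The term `invoice` occurs in only one document, so its column is dropped and the matrix shrinks from four columns to three.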
24
Q

What is the true positive rate also called?

A

Sensitivity or recall

25
Q

What is a concept in text mining?

A

Concept refers to words that might relate to a similar topic. A concept could be crime, which has multiple words related to it.

26
Q

What are terms in text mining?

A

These are individual words or small groups of words that belong together.

27
Q

What is prescriptive analytics?

A

Prescribes what I should do and why I should do it.
Enables optimization, simulation, decision modelling and expert systems.
Outcomes: best possible business decisions and transactions

28
Q

What is a dice?

A

A slice on two or more dimensions
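
A minimal sketch of a dice, assuming a toy sales cube stored as a dictionary keyed by `(product, region, year)`. The data and the `dice` helper are made up for illustration.

```python
# Toy sales cube: (product, region, year) -> sales. All values are made up.
cube = {
    ("laptop", "EU", 2023): 120,
    ("laptop", "US", 2023): 200,
    ("phone", "EU", 2023): 300,
    ("phone", "US", 2024): 150,
    ("tablet", "EU", 2023): 80,
}


def dice(cube, products, years):
    """Restrict two dimensions at once; the result is still a 3-D sub-cube."""
    return {
        key: value
        for key, value in cube.items()
        if key[0] in products and key[2] in years
    }


sub_cube = dice(cube, products={"laptop", "phone"}, years={2023})
```

Unlike a slice on a single member, the dice keeps all three dimensions in the keys and only narrows the range of members on each restricted dimension.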

29
Q

What is a slice?

A

A subset of a multidimensional array.

A selection on one dimension of a 3-dimensional cube, resulting in a 2-dimensional slice.

30
Q

What is roll up?

A

Aggregation on a data cube

Individual temperatures –> cold/mild/hot
Location cities –> location countries
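
The city-to-country example can be sketched as below. The hierarchy mapping, the sales figures and the `roll_up` helper are made up for illustration.

```python
from collections import defaultdict

# Hypothetical dimension hierarchy: city -> country.
CITY_TO_COUNTRY = {"Berlin": "Germany", "Munich": "Germany", "Paris": "France"}
sales_by_city = {"Berlin": 100, "Munich": 50, "Paris": 70}


def roll_up(values, hierarchy):
    """Aggregate values from a lower level to the next level of the hierarchy."""
    totals = defaultdict(int)
    for member, value in values.items():
        totals[hierarchy[member]] += value
    return dict(totals)


sales_by_country = roll_up(sales_by_city, CITY_TO_COUNTRY)
```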

31
Q

What is a pivot?

A

Also called a rotation. Rotates the axes in the view to provide an alternative presentation of the data.

32
Q

What is referential integrity?

A

Every foreign key value must match an existing primary key value in the referenced table (or be null).
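
A short sketch of the rule being enforced, using Python's built-in `sqlite3` module with an in-memory database. The `customer` and `invoice` tables are hypothetical; note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY NOT NULL, name TEXT)")
conn.execute(
    "CREATE TABLE invoice ("
    " id INTEGER PRIMARY KEY NOT NULL,"
    " customer_id INTEGER REFERENCES customer(id))"
)
conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Acme')")
conn.execute("INSERT INTO invoice (id, customer_id) VALUES (10, 1)")  # customer 1 exists

try:
    # No customer with id 99 exists, so this row would break referential integrity.
    conn.execute("INSERT INTO invoice (id, customer_id) VALUES (11, 99)")
    violation_rejected = False
except sqlite3.IntegrityError:
    violation_rejected = True
```

The second insert is rejected by the database, so only the invoice that points to an existing customer is stored.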

33
Q

What is the entity integrity rule?

A

All primary keys must have non-null values

34
Q

How is the true negative rate measured?

A

TN / (TN+FP)

35
Q

What types of techniques are part of prediction? Is this supervised or unsupervised?

A

Classification and regression
Supervised

36
Q

What type of techniques are part of clustering?

A

Outlier analysis

37
Q

What type of techniques are part of association?

A

Link analysis and sequence analysis

38
Q

What does the accuracy of data mining refer to?

A

Its ability to predict the outcome of a previously unknown data set accurately.

39
Q

What is a process instance?

A

A single execution of a business process (Every time someone makes a cake using a recipe).

40
Q

What does the initial path (first variant) display?

A

The initial path shows the most frequent ‘as is’ process flow across all process patterns.

41
Q

What does high precision mean in process mining?

A

High precision means that the model does not produce too many “false positives” or additional traces that were never seen in the actual logs, ensuring that the model is not overly general.

42
Q

What is a process model?

A

An abstraction from the real business process. It is a graphical representation of activities that need to be executed collectively for realizing a specific business objective.

43
Q

How is the precision calculated in process mining?

A

The number of re-playable traces observed in the event log divided by the number of traces playable by the model.
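
A small sketch of this ratio, assuming the model is given as a finite set of allowed traces. Models with loops allow infinitely many traces and need a different precision measure; the data and the `trace_precision` helper are illustrative.

```python
def trace_precision(event_log, model_traces):
    """Share of the model's allowed traces that were actually observed."""
    if not model_traces:
        raise ValueError("model allows no traces")
    observed = set(event_log) & model_traces  # distinct re-playable traces
    return len(observed) / len(model_traces)


# Assumption: the model allows exactly these four traces.
model_traces = {
    ("order", "pay", "ship"),
    ("order", "ship", "pay"),
    ("order", "pay", "pay", "ship"),
    ("order", "ship"),
}
event_log = [
    ("order", "pay", "ship"),
    ("order", "pay", "ship"),
    ("order", "ship", "pay"),
    ("order", "cancel"),
]
prec = trace_precision(event_log, model_traces)
```

Only two of the four traces the model allows appear in the log, so the precision is 0.5: the model permits behavior that was never observed.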

44
Q

What are process variants?

A

Process executions of a specific process that represent identical traces.
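
Variants can be derived by grouping cases with identical traces. The sketch below assumes an event log of `(case_id, timestamp, activity)` rows; the data and the `variants` helper are made up for illustration.

```python
from collections import Counter, defaultdict

# Made-up event log rows: (case_id, timestamp, activity).
events = [
    ("c1", 1, "order"), ("c1", 2, "pay"), ("c1", 3, "ship"),
    ("c2", 1, "order"), ("c2", 2, "pay"), ("c2", 3, "ship"),
    ("c3", 1, "order"), ("c3", 2, "cancel"),
]


def variants(events):
    """Count how many cases follow each distinct trace."""
    traces = defaultdict(list)
    # Order events by case and time so each trace is in execution order.
    for case_id, _timestamp, activity in sorted(events, key=lambda e: (e[0], e[1])):
        traces[case_id].append(activity)
    return Counter(tuple(trace) for trace in traces.values())


variant_counts = variants(events)
most_common_variant, case_count = variant_counts.most_common(1)[0]
```

Cases `c1` and `c2` share one trace and therefore form a single variant, while `c3` forms a second variant on its own.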

45
Q

What is the result of an unfitting model?

A

False negative audit results, compliance violations are not detected.

46
Q

What is precision in process mining?

A

Measures how much behavior allowed by the model is actually observed in the event log. A model should not allow additional behavior very different from the behavior recorded in the event log.

47
Q

What is generalization?

A

Evaluates how well the model can generalize to unseen instances of the process. A process model should not be restricted to displaying only the possibly limited record of behavior observed in the event log.

48
Q

What is a trace?

A

The sequence of recorded events in a case.

49
Q
A