Hard cards Flashcards
The constraint that all primary keys must have non-null values is referred to as
Entity integrity rule
What are the 4 data requirements for relational databases?
- Every column must be single valued
- Entity integrity rule
- Referential integrity
- All non-key attributes must describe a characteristic of the entity identified by the primary key.
What is Entity Relationship Modelling (ERM)?
ERM is a modelling notation used to model the characteristics and relationships of data. It is useful for formalizing and visualizing the structure of data before a database is implemented.
In text mining, stemming is the process of:
Reducing multiple words to their base or root.
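To make the idea concrete, here is a deliberately naive suffix stripper. It is only a sketch: the suffix list and the minimum stem length are assumptions made for illustration, and real stemmers (such as the Porter stemmer) apply far more rules.

```python
# Hypothetical suffix list, chosen only for this illustration.
SUFFIXES = ("ing", "ed", "s")

def naive_stem(word):
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# All four word forms collapse to the same root.
stems = {naive_stem(w) for w in ["connect", "connected", "connecting", "connects"]}
```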
What does pivot do?
Swap the attributes in the horizontal and vertical fields.
What does a slice do?
Filter on one item from a dimension, such as one product, one life cycle, or one client name.
What can be done with a roll-up?
Insert a KPI diagram that shows the total sales.
How is the accuracy calculated?
(TP + TN) / (TP + TN + FP + FN), i.e. correct predictions divided by all cases
How is the precision calculated?
TP / (TP + FP)
How is the true positive rate calculated?
TP / (TP + FN)
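The three ratios above can be checked with a small worked example. This is a minimal sketch; the confusion-matrix counts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Return accuracy, precision and true positive rate from raw counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "true_positive_rate": tp / (tp + fn),
    }

# Hypothetical counts: 100 cases in total.
metrics = confusion_metrics(tp=40, tn=45, fp=5, fn=10)
```

With these counts the accuracy is 85/100, the precision is 40/45 and the true positive rate is 40/50.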
What does the information about coverage in Celonis mean?
The amount of cases which are represented by the shown model.
What does it mean when the start node has multiple arrows going to different places?
The process has different start activities depending on the different cases that are represented.
What does a misfitting model result in for auditors?
Misfitting model –> false negative audit results (compliance violations are not detected)
What does an imprecise model result in for auditors?
Imprecise model –> false positive audit results (compliance violations are indicated that did not occur in reality).
What does the tokenize operator do?
Splits text into tokens, often removing non letters.
Break up the text in individual parts which we then can analyze.
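A rough sketch of what tokenizing on non-letters could look like. This is not the operator's actual implementation; lowercasing the text first is an extra assumption made here.

```python
import re

def tokenize(text):
    """Split text into lowercase tokens that consist of letters only."""
    return re.findall(r"[a-z]+", text.lower())

# Digits and punctuation act as separators and are dropped.
tokens = tokenize("Order #42 was shipped, then re-opened!")
```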
When is the convergence criterion met?
The convergence criterion is usually met when the assignment of points to clusters becomes stable.
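The stable-assignment criterion can be seen in a tiny one-dimensional k-means loop. This is a sketch under assumptions: the data points, the starting centers and the `max_iter` safeguard are all made up for illustration.

```python
def kmeans_1d(points, centers, max_iter=100):
    """Tiny 1-D k-means that stops once the cluster assignment is stable."""
    centers = list(centers)  # work on a copy
    assignment = None
    for _ in range(max_iter):
        # Assign every point to its nearest center.
        new_assignment = [
            min(range(len(centers)), key=lambda k: abs(p - centers[k]))
            for p in points
        ]
        if new_assignment == assignment:  # convergence criterion met
            break
        assignment = new_assignment
        # Move each center to the mean of the points assigned to it.
        for k in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                centers[k] = sum(members) / len(members)
    return centers, assignment

# Hypothetical data with two obvious groups and poorly chosen start centers.
final_centers, final_assignment = kmeans_1d(
    [1.0, 2.0, 3.0, 10.0, 11.0, 12.0], centers=[1.0, 2.0]
)
```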
What is escalation of commitment?
A pattern of behavior in which an individual or group will continue to rationalize their decisions, actions and investments when faced with increasingly negative outcomes rather than alter their course.
Many things come together here: sunk cost fallacy, status quo bias, omission bias, confirmation bias, etc.
What is the formula for the fitness?
Fitness = Re-playable cases / total cases
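A minimal sketch of this ratio, assuming the model is simplified to a finite set of allowed traces (real process models are usually graphs, and replay is more involved). The activity names are hypothetical.

```python
def fitness(event_log, model_traces):
    """Share of cases in the log whose trace the model can replay."""
    replayable = sum(1 for trace in event_log if trace in model_traces)
    return replayable / len(event_log)

# Hypothetical log: one trace per case.
log = [
    ("register", "check", "pay"),
    ("register", "check", "pay"),
    ("register", "pay"),  # not allowed by the model below
    ("register", "check", "reject"),
]
model = {("register", "check", "pay"), ("register", "check", "reject")}

score = fitness(log, model)  # 3 of the 4 cases are replayable
```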
What does process mining encompass?
Techniques, tools and methods to discover, monitor and improve real processes by extracting knowledge from event logs.
What are some of the use cases of clustering?
- Identify natural groupings of customers;
- Identify rules for assigning new cases to classes for targeting/diagnostic purposes;
- Provide characterization, definition, labeling of populations;
- Decrease the size and complexity of problems for other data mining methods;
- Identify outliers in a specific domain.
What can cluster analysis be used for?
Cluster analysis can be used for automatic identification of natural groupings of things
What does link analysis try to achieve?
Find patterns in how items relate to each other.
How does clustering work?
It works by learning the clusters of things from past data, then assigning new instances.
In text mining, what are 3 methods used to reduce the size of a sparse matrix?
- Using a domain expert
- Using singular value decomposition (SVD)
- Eliminating rarely occurring terms.
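The last method in the list can be sketched as follows: terms that occur in too few documents are dropped before the term-document matrix is built. The documents and the `min_docs` threshold are hypothetical.

```python
from collections import Counter

def prune_rare_terms(documents, min_docs=2):
    """Keep only terms that occur in at least `min_docs` documents."""
    doc_freq = Counter(term for doc in documents for term in set(doc))
    kept = sorted(term for term, n in doc_freq.items() if n >= min_docs)
    # One row per document, one column per kept term, cells hold term counts.
    matrix = [[doc.count(term) for term in kept] for doc in documents]
    return kept, matrix

# Hypothetical tokenized documents.
docs = [
    ["fraud", "claim", "car"],
    ["claim", "house", "fraud"],
    ["claim", "boat"],
]
terms, matrix = prune_rare_terms(docs)
```

Five distinct terms shrink to two columns, because "car", "house" and "boat" each appear in only one document.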
What is the true positive rate also called?
Sensitivity or recall
What is a concept in text mining?
Concept refers to words that might relate to a similar topic. A concept could be crime, which has multiple words related to it.
What are terms in text mining?
These are individual words or small groups of words that belong together.
What is prescriptive analysis?
Prescribes what I should do and why I should do it.
Enables optimization, simulation, decision modelling and expert systems.
Outcomes: best possible business decisions and transactions
What is a dice?
A slice on two or more dimensions
What is a slice?
Subset of a multidimensional array
Selection on one dimension of a 3-dimensional cube, resulting in a 2-dimensional slice.
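Slice and dice can be compared side by side on a toy cube. This is a sketch only: the cube is stored as a plain dictionary, and the dimension names and sales figures are hypothetical.

```python
DIMENSIONS = ("product", "region", "year")

# Hypothetical 3-dimensional cube: coordinates -> sales.
cube = {
    ("laptop", "EU", 2023): 10,
    ("laptop", "US", 2023): 12,
    ("phone", "EU", 2023): 20,
    ("phone", "US", 2024): 25,
    ("laptop", "EU", 2024): 14,
}

def slice_cube(cube, dim, value):
    """Fix one dimension to a single value; that dimension disappears."""
    i = DIMENSIONS.index(dim)
    return {c[:i] + c[i + 1:]: v for c, v in cube.items() if c[i] == value}

def dice_cube(cube, **selections):
    """Keep cells whose coordinates lie in the allowed values per dimension."""
    def keep(coords):
        return all(
            coords[DIMENSIONS.index(dim)] in allowed
            for dim, allowed in selections.items()
        )
    return {c: v for c, v in cube.items() if keep(c)}

sales_2023 = slice_cube(cube, "year", 2023)  # 2-dimensional result
eu_laptops = dice_cube(cube, product={"laptop"}, region={"EU"})  # sub-cube
```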
What is roll up?
Aggregation on a data cube
Individual temperatures –> cold/mild/hot
Location cities –> location countries
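The city-to-country example can be sketched as an aggregation along a hierarchy. The cities, the mapping and the sales figures are hypothetical.

```python
from collections import defaultdict

# Hypothetical hierarchy and facts.
CITY_TO_COUNTRY = {
    "Amsterdam": "Netherlands",
    "Rotterdam": "Netherlands",
    "Berlin": "Germany",
}
sales_by_city = {"Amsterdam": 120, "Rotterdam": 80, "Berlin": 150}

def roll_up(facts, hierarchy):
    """Aggregate values from a detailed level to the next level up."""
    totals = defaultdict(int)
    for member, value in facts.items():
        totals[hierarchy[member]] += value
    return dict(totals)

sales_by_country = roll_up(sales_by_city, CITY_TO_COUNTRY)
```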
What is a pivot?
Also called a rotation. Rotate the axes in view to provide an alternative presentation of data.
What is referential integrity?
Every foreign key value must match an existing primary key value in the referenced table (or be null, where nulls are allowed).
What is the entity integrity rule?
All primary keys must have non-null values
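Both integrity rules can be checked mechanically. The sketch below uses plain lists of dictionaries instead of a real database; the table and column names are hypothetical, and treating a null foreign key as acceptable is an assumption.

```python
# Hypothetical tables.
customers = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Grace"},
]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 3},  # no customer with id 3 exists
]

def entity_integrity_violations(rows, pk):
    """Rows whose primary key is missing or null."""
    return [row for row in rows if row.get(pk) is None]

def referential_integrity_violations(child_rows, fk, parent_rows, pk):
    """Child rows whose non-null foreign key matches no parent primary key."""
    parent_keys = {row[pk] for row in parent_rows}
    return [
        row for row in child_rows
        if row[fk] is not None and row[fk] not in parent_keys
    ]

bad_customers = entity_integrity_violations(customers, "customer_id")
bad_orders = referential_integrity_violations(
    orders, "customer_id", customers, "customer_id"
)
```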
How is the true negative rate measured?
TN / (TN+FP)
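A quick numeric check of this ratio, using hypothetical counts:

```python
def true_negative_rate(tn, fp):
    """Share of actual negatives that were predicted as negative."""
    return tn / (tn + fp)

# Hypothetical counts: 50 actual negatives, 45 of them recognized as such.
tnr = true_negative_rate(tn=45, fp=5)
```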
What types of techniques are part of prediction? Is this supervised or unsupervised?
Classification and regression
Supervised
What type of techniques are part of clustering?
Outlier analysis
What type of techniques are part of association?
Link analysis and sequence analysis
What does the accuracy of data mining refer to?
Its ability to predict the outcome of a previously unknown data set accurately.
What is a process instance?
A single execution of a business process (Every time someone makes a cake using a recipe).
What does the initial path (first variant) display?
The initial path shows the most frequent ‘as is’ process flow across all process patterns.
What does high precision mean in process mining?
High precision means that the model does not produce too many “false positives” or additional traces that were never seen in the actual logs, ensuring that the model is not overly general.
What is a process model?
An abstraction from the real business process. It’s a graphical representation of activities that need to be executed collectively for realizing a specific business objective.
How is the precision calculated in process mining?
Number of distinct traces observed in the event log that the model can replay, divided by the number of traces the model allows.
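A minimal sketch of the ratio described above, assuming the model is simplified to a finite set of allowed traces. Models with loops allow infinitely many traces, so real tools approximate this measure differently. The activity names are hypothetical.

```python
def precision(event_log, model_traces):
    """Share of the model's allowed traces that actually occur in the log."""
    observed = set(event_log)
    return len(observed & model_traces) / len(model_traces)

# Hypothetical log: one trace per case.
log = [
    ("register", "check", "pay"),
    ("register", "check", "pay"),
    ("register", "check", "reject"),
]
# Hypothetical model that allows four traces, two of which were never observed.
model = {
    ("register", "check", "pay"),
    ("register", "check", "reject"),
    ("register", "pay"),
    ("register", "reject"),
}

score = precision(log, model)  # 2 observed traces out of 4 allowed
```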
What are process variants?
Process executions of a specific process that represent identical traces.
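Variants can be derived by grouping cases that share the same trace. This is a sketch with a hypothetical event log of `(case_id, timestamp, activity)` tuples.

```python
from collections import Counter, defaultdict

# Hypothetical event log; the events of case c3 are recorded out of order.
events = [
    ("c1", 1, "register"), ("c1", 2, "check"), ("c1", 3, "pay"),
    ("c2", 1, "register"), ("c2", 2, "pay"),
    ("c3", 1, "register"), ("c3", 3, "pay"), ("c3", 2, "check"),
]

def traces_per_case(events):
    """Order each case's events by timestamp and return its activity sequence."""
    by_case = defaultdict(list)
    for case_id, _, activity in sorted(events, key=lambda e: (e[0], e[1])):
        by_case[case_id].append(activity)
    return {case: tuple(activities) for case, activities in by_case.items()}

traces = traces_per_case(events)
variants = Counter(traces.values())  # variant -> number of cases
most_frequent_variant = variants.most_common(1)[0][0]
```

Cases c1 and c3 follow the same trace and therefore belong to one variant, while c2 forms a second variant.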
What is the result of an unfitting model?
False negative audit results, compliance violations are not detected.
What is precision in process mining?
Measures how much behavior allowed by the model is actually observed in the event log. A model should not allow additional behavior very different from the behavior recorded in the event log.
What is generalization?
Evaluates how well the model generalizes to unseen instances of the process. A process model should not be restricted to the possibly limited behavior recorded in the event log.
What is a trace?
The sequence of recorded events in a case.