Midterm Flashcards

1
Q

A -> B
B -> C
so link A -> C

A

Transitive Closure
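
A minimal sketch of transitive closure over pairwise links, assuming the links arrive as simple pairs of reference IDs. The function name `transitive_closure` and the union-find approach are illustrative choices, not something the card prescribes.

```python
def transitive_closure(links):
    """Group items so that any chain of links (A-B, B-C) lands in one cluster."""
    parent = {}

    def find(x):
        # Follow parent pointers to the cluster representative.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps chains short
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters = {}
    for item in list(parent):
        clusters.setdefault(find(item), set()).add(item)
    return list(clusters.values())


# A-B and B-C are linked directly, so A and C end up in the same cluster.
clusters = transitive_closure([("A", "B"), ("B", "C"), ("D", "E")])
```

Here `clusters` holds two sets, one with A, B, and C and one with D and E.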

2
Q

After an ER process links a cluster of references, only one reference is retained

A

Survivor Record MDM

3
Q

Survivor EIS where you select one ‘best’ record from the cluster to represent the identity

A

Best Reference Style

4
Q

Survivor EIS where you create a new record from the best parts of records in the cluster

A

Exemplar Reference Style
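
A rough sketch contrasting the Best Reference and Exemplar Reference styles. The selection rules are assumptions made for illustration: "best" is taken to mean the record with the most filled-in attributes, and the exemplar takes the most common non-null value per attribute. Real systems may use other rules; the records and function names are hypothetical.

```python
from collections import Counter


def best_reference(cluster):
    """Best Reference style: keep one existing record (assumed rule: most non-null fields)."""
    return max(cluster, key=lambda rec: sum(v is not None for v in rec.values()))


def exemplar_reference(cluster):
    """Exemplar style: build a new record (assumed rule: most common non-null value)."""
    exemplar = {}
    attributes = {attr for rec in cluster for attr in rec}
    for attr in attributes:
        values = [rec[attr] for rec in cluster if rec.get(attr) is not None]
        exemplar[attr] = Counter(values).most_common(1)[0][0] if values else None
    return exemplar


# Hypothetical cluster of three equivalent references.
cluster = [
    {"name": "John Smith", "phone": None, "city": None},
    {"name": "Jon Smith", "phone": "555-0100", "city": "Little Rock"},
    {"name": "John Smith", "phone": None, "city": "Little Rock"},
]

best = best_reference(cluster)          # the second record, unchanged
exemplar = exemplar_reference(cluster)  # a record that matches no single input
```

In this example the best reference keeps the misspelled name because it comes with the most complete record, while the exemplar combines the majority name with the only known phone number.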

5
Q
  • Master identifier
  • Identity attributes
  • Application information
A

Components of MDM architecture

6
Q

What MDM architecture is good for cybersecurity?

A

External reference

7
Q

What MDM architecture stores no identity information in the IKB?

A

External reference

8
Q

What MDM architecture components are stored in the IKB in External Reference?

A

Master identifier

9
Q

What MDM architecture components are stored in the IKB in Registry?

A

Master identifier and Identity attributes

10
Q

What MDM architecture is the registry schematic with the cross walk added?

A

Reconciliation Engine

11
Q

What MDM architecture stores all components in the IKB?

A

Transaction Hub

12
Q

What MDM architecture components are stored in the IKB in Transaction Hub?

A

Master identifier, identity attributes, and application information

13
Q

What are the methods of updating IKB in MDM?

A

Automatic and manual

14
Q

What is the output of Entity Resolution?

A

A link index

15
Q

What is the #1 cause of data quality issues?

A

Multiple sources of the same information

16
Q

Three parts of entity-based data integration

A
  1. Standardization
  2. Entity Resolution
  3. Rationalization
17
Q

References to the same entity are called…

A

Equivalent References

18
Q

Every entity reference in an information system is created with the intention to reference one, and only one, real-world entity.

A

Unique Reference Assumption

19
Q

The higher the degree of similarity between two entity references, the higher the probability the references are equivalent, and the less similar, the less likely they are equivalent.

A

Reference Similarity Assumption

20
Q

Precision Formula

A

P=TP/(TP+FP)

21
Q

Recall Formula

A

R=TP/(TP+FN)

22
Q

F-Measure Formula

A

F= (2 x P x R)/(P + R)

23
Q

Accuracy Formula

A

(TP+TN)/(TP+FP+TN+FN) = (TP+TN)/D
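
The precision, recall, F-measure, and accuracy formulas from the last four cards can be checked with a small worked example. This is a sketch that assumes the four counts are already known and that no denominator is zero; the function name and the sample counts are made up for illustration.

```python
def er_metrics(tp, fp, tn, fn):
    """Return (precision, recall, f_measure, accuracy) from pair-level counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = (2 * precision * recall) / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + tn + fn)  # the denominator is D on the card
    return precision, recall, f_measure, accuracy


# Hypothetical counts over D = 100 pairs.
p, r, f, acc = er_metrics(tp=8, fp=2, tn=85, fn=5)
# p = 8/10 = 0.8, r = 8/13, f = 16/23, acc = 93/100 = 0.93
```

Accuracy looks high here mostly because true negatives dominate, which is typical when most pairs of references are not equivalent.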

24
Q

How do you calculate unordered pairs?

A

N*(N-1)/2
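
A quick check of the N*(N-1)/2 count against an explicit enumeration, using the standard library's `itertools.combinations`. The reference IDs are placeholders.

```python
from itertools import combinations


def pair_count(n):
    """Number of unordered pairs among n references."""
    return n * (n - 1) // 2


refs = ["r1", "r2", "r3", "r4"]
pairs = list(combinations(refs, 2))  # every unordered pair exactly once

# 4 references give 6 pairs; 1,000 references already give 499,500.
```

The quadratic growth of this count is why pairwise comparison of every reference becomes impractical on large inputs.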

25
Q

CSRUD Model

A

• Capture of Entity Identity Information
• Store and Share Entity Identity Information
• Resolve and Retrieve Entity Identifiers
• Update Entity Identity Information
• Dispose (Retire) Entity Identity Information

26
Q

Capture Phase Activities

A
  1. Assess the data quality of each identity source and plan the cleansing and standardization processes
  2. Profile and select candidates for primary and supporting identity attributes
  3. Set up entity identity integrity assessment methods
  4. Craft the matching rules or build an ML model
  5. Evaluate and refine the rules/model to acceptable levels of false positive and false negative error
  6. Develop a blocking strategy
27
Q

Biggest enemies of ER

A

Inconsistent representation of values and missing values

28
Q

Measures of discrimination power

A

• Attribute Uniqueness
• Attribute Entropy
• Attribute Weight

29
Q

Attribute Uniqueness Formula

A

Count of Unique Non-Null Values / Count of Non-Null Values
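
A small sketch of the uniqueness ratio. It assumes that `None` and the empty string both count as null, which is a choice made for this example; the sample column is made up.

```python
def attribute_uniqueness(values):
    """Unique non-null values divided by all non-null values."""
    non_null = [v for v in values if v is not None and v != ""]
    if not non_null:
        return 0.0  # assumed convention for an all-null attribute
    return len(set(non_null)) / len(non_null)


last_names = ["Smith", "Smith", "Jones", None, "Lee", ""]
uniqueness = attribute_uniqueness(last_names)  # 3 unique / 4 non-null = 0.75
```

A value near 1 means almost every non-null value is distinct, so the attribute separates entities well; a value near 0 means the attribute is dominated by repeats.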

30
Q

What are the three levels of ER matching?

A
  1. Attribute level
  2. Record level
  3. Cluster level
31
Q

What do you call an algorithm that compares attribute values?

A

Comparator

32
Q

What are the three types of similarity functions for string values?

A
  1. Approximate Syntactic Match (ASM)
  2. Approximate Semantic Match
  3. Phonetic Match
33
Q

The minimum number of single-character changes that will transform one string into the other

A

Levenshtein Edit Distance
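
A minimal dynamic-programming sketch of Levenshtein distance, where the allowed single-character changes are insertion, deletion, and substitution. The function name is arbitrary.

```python
def levenshtein(s, t):
    """Minimum number of insertions, deletions, and substitutions turning s into t."""
    previous = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, cs in enumerate(s, start=1):
        current = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            current.append(min(
                previous[j] + 1,         # delete cs
                current[j - 1] + 1,      # insert ct
                previous[j - 1] + cost,  # substitute (or keep when equal)
            ))
        previous = current
    return previous[-1]


distance = levenshtein("kitten", "sitting")  # 3
swapped = levenshtein("SMITH", "SMTIH")      # 2: a swap costs two edits here
```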

34
Q

Same as Levenshtein, but allows one additional string manipulation: transposing adjacent characters

A

Damerau-Levenshtein Edit Distance
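
A sketch of the transposition-aware distance. Note the assumption: this is the restricted variant often called optimal string alignment, in which no substring is edited more than once. It agrees with the unrestricted Damerau-Levenshtein distance on typical typo cases but can differ on some inputs.

```python
def osa_distance(s, t):
    """Edit distance with insert, delete, substitute, and adjacent transposition."""
    rows, cols = len(s) + 1, len(t) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution
            )
            # Adjacent transposition: "IT" versus "TI".
            if i > 1 and j > 1 and s[i - 1] == t[j - 2] and s[i - 2] == t[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


swapped = osa_distance("SMITH", "SMTIH")  # 1: a single transposition
```

The same pair costs two edits under plain Levenshtein, which is the practical difference for keyboard typos.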

35
Q

Based on the number of characters in common and the number of transpositions between two strings S1 and S2

A

Jaro String Comparator

36
Q

Modification of Jaro Comparator which gives added weight to the first four prefix characters

A

Jaro-Winkler
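
A sketch of the Jaro comparator and the Jaro-Winkler prefix adjustment, following the commonly published definitions. The scaling factor `p=0.1` is the usual default and is an assumption here, since the cards do not state it.

```python
def jaro(s1, s2):
    """Jaro similarity in [0, 1] from common characters and transpositions."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as common only if they are within this distance.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i in range(len1):
        lo, hi = max(0, i - window), min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s1[i] == s2[j]:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Walk the matched characters in order; out-of-order pairs are half-transpositions.
    k = 0
    half_transpositions = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                half_transpositions += 1
            k += 1
    t = half_transpositions // 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1, s2, p=0.1):
    """Boost the Jaro score for a shared prefix of up to four characters."""
    score = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return score + prefix * p * (1 - score)


j = jaro("MARTHA", "MARHTA")           # about 0.944
jw = jaro_winkler("MARTHA", "MARHTA")  # about 0.961, boosted by the prefix "MAR"
```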

37
Q

After the first letter, replace ‘A’, ‘E’, ‘I’, ‘O’, ‘U’, ‘H’, ‘W’, ‘Y’ with “0” (zero) and change the remaining letters to digits

A

Soundex Algorithm
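
A simplified Soundex sketch that follows the card's rule of coding vowels, H, W, and Y as zero. The letter-to-digit table is the standard Soundex table rather than something stated on the card, and the official American Soundex treats H and W slightly differently, so results can differ for names such as "Ashcraft".

```python
CODES = {}
for letters, digit in [
    ("AEIOUHWY", "0"), ("BFPV", "1"), ("CGJKQSXZ", "2"),
    ("DT", "3"), ("L", "4"), ("MN", "5"), ("R", "6"),
]:
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return a four-character code: first letter plus three digits."""
    letters = [c for c in name.upper() if c in CODES]
    if not letters:
        return ""
    digits = [CODES[c] for c in letters]
    # Collapse runs of the same digit into one.
    collapsed = [digits[0]]
    for d in digits[1:]:
        if d != collapsed[-1]:
            collapsed.append(d)
    # Keep the first letter itself, drop its digit, and remove the zeros.
    tail = [d for d in collapsed[1:] if d != "0"]
    return (letters[0] + "".join(tail) + "000")[:4]


code_a = soundex("Robert")  # R163
code_b = soundex("Rupert")  # R163, so the two names match phonetically
```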

38
Q

Tries to measure similarity according to linguistic meaning rather than by character structure.

A

Approximate Semantic Match

39
Q

Two types of supervised MDM

A
  1. Bring-Your-Own-Identifier MDM
  2. Once-and-Done MDM
40
Q

Two types of unsupervised MDM

A
  1. Survivor Record MDM
  2. Full-Context MDM
41
Q

Two types of record updates in MDM

A
  1. Automated (Unsupervised)
  2. Manual (Supervised)
42
Q

Two types of assertions

A
  1. Correction assertion
  2. Confirmation assertion
43
Q

Type of assertion to correct the error that two structures are false negatives of each other

A

Structure-to-Structure Assertion

44
Q

Type of assertion to correct the error that a structure has references to more than one identity

A

Structure-split Assertion

45
Q

Type of assertion which corrects both a false positive and a false negative in one operation

A

Reference-transfer Assertion

46
Q

Asserts that a structure has been reviewed and found to be a true positive

A

True Positive Assertion

47
Q

Asserts two or more structures have been reviewed and found to be true negatives

A

True Negative Assertion

48
Q

Which type of assertion creates a cluster from a specific set of input references?

A

Reference-to-Reference Assertion

49
Q

Which type of assertion adds a specific set of input references to a specific structure?

A

Reference-to-Structure Assertion