Chapter 15 Flashcards

Question 1

Q

What are the phases of information integration?

Answer

A

Analysis; discovery; planning; deployment; runtime

Question 2

Q

What are the design approaches to information integration?

Answer

A

Bottom-up design (when the available data sources are well known); top-down design (when the available data sources are not known a priori); hybrid design (based on requirements)

Question 3

Q

What is schema matching?

Answer

A

An automatic process to obtain a mapping. A mapping is a set of correspondences between two schemas.
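A minimal sketch of a purely name-based matcher. It assumes schemas are given as lists of attribute names and uses `difflib.SequenceMatcher` as a stand-in similarity measure. The threshold of 0.6 and the example attributes are hypothetical. The result is a mapping in the sense above: a set of (source, target) correspondences.

```python
from difflib import SequenceMatcher

def name_similarity(a: str, b: str) -> float:
    """Similarity of two attribute names in [0, 1], ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_schemas(source: list[str], target: list[str],
                  threshold: float = 0.6) -> set[tuple[str, str]]:
    """Return a mapping: every (source, target) pair of attribute names
    whose similarity reaches the threshold becomes a correspondence."""
    return {(s, t) for s in source for t in target
            if name_similarity(s, t) >= threshold}

# Hypothetical example schemas
source = ["CustName", "CustAddress", "Phone"]
target = ["CustomerName", "Address", "PhoneNumber"]
print(sorted(match_schemas(source, target)))
# [('CustAddress', 'Address'), ('CustName', 'CustomerName'), ('Phone', 'PhoneNumber')]
```

A correspondence is only proposed when the score reaches the threshold. Lowering it finds more candidate pairs but also more wrong ones.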

Question 4

Q

What is the difference between individual and combining matchers?

Answer

A

Individual matchers exploit only one kind of information to identify matches, whereas combining matchers use several kinds (as hybrid or composite matchers).
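A minimal sketch of a composite matcher. Two individual matchers (one using only names, one using only data types) run independently, and their scores are combined with a weighted average. The weights, the type-compatibility scores and the example attributes are assumptions for illustration.

```python
from difflib import SequenceMatcher

Attribute = tuple[str, str]  # (name, data type)

def name_matcher(a: Attribute, b: Attribute) -> float:
    """Individual matcher: looks only at the attribute names."""
    return SequenceMatcher(None, a[0].lower(), b[0].lower()).ratio()

def type_matcher(a: Attribute, b: Attribute) -> float:
    """Individual matcher: looks only at the data types
    (hypothetical compatibility scores)."""
    if a[1] == b[1]:
        return 1.0
    numeric = {"int", "float", "decimal"}
    return 0.5 if a[1] in numeric and b[1] in numeric else 0.0

def composite_matcher(a: Attribute, b: Attribute,
                      weights: tuple[float, float] = (0.7, 0.3)) -> float:
    """Composite matcher: runs the individual matchers independently and
    aggregates their scores with a weighted average."""
    scores = (name_matcher(a, b), type_matcher(a, b))
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)

print(round(composite_matcher(("Price", "decimal"), ("UnitPrice", "float")), 2))
# 0.65
```

A hybrid matcher would instead evaluate several criteria inside one algorithm rather than combining the outputs of separate matchers.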

Question 5

Q

What is the difference between schema-only and instance-based matching?

Answer

A

Schema-only techniques operate solely on metadata, whereas instance-based techniques also consider properties of the data (they use statistical information on the data values).
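A minimal sketch of an instance-based matcher. It assumes each column is available as a non-empty list of string values. The two statistics (average value length and share of all-digit values) and their equal weighting are illustrative assumptions. Column names are never looked at, only the data values.

```python
from statistics import mean

def profile(values: list[str]) -> tuple[float, float]:
    """Statistics over a column's values: average length and the share
    of values that consist only of digits."""
    avg_len = mean(len(v) for v in values)
    digit_share = sum(v.isdigit() for v in values) / len(values)
    return avg_len, digit_share

def instance_similarity(col_a: list[str], col_b: list[str]) -> float:
    """Compare two columns using only their data values, never their names."""
    len_a, dig_a = profile(col_a)
    len_b, dig_b = profile(col_b)
    longest = max(len_a, len_b)
    len_sim = min(len_a, len_b) / longest if longest > 0 else 1.0
    dig_sim = 1.0 - abs(dig_a - dig_b)
    return (len_sim + dig_sim) / 2

# Hypothetical columns: "PLZ" and "ZipCode" share no name, but their values look alike
plz = ["10115", "80331", "20095"]
zip_code = ["94105", "10001"]
city = ["Berlin", "Munich"]
print(instance_similarity(plz, zip_code))  # 1.0
print(instance_similarity(plz, city))      # about 0.42
```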

Question 6

Q

Give examples of structural schema matching approaches.

Answer

A

Cupid and Similarity Flooding
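A simplified sketch of the Similarity Flooding idea. It assumes schemas are given as labelled edge lists. It builds the pairwise connectivity graph, lets similarity flow along its edges in both directions, and iterates the basic fixpoint formula sigma_{i+1} = normalize(sigma_i + phi(sigma_i)). The propagation coefficients (for each node and label, outgoing weights sum to 1), the uniform initial similarity of 1.0 and the example graphs are simplifying assumptions.

```python
from collections import defaultdict
from itertools import product

Edge = tuple[str, str, str]  # (source node, edge label, target node)

def similarity_flooding(g1: list[Edge], g2: list[Edge],
                        iterations: int = 30) -> dict[tuple[str, str], float]:
    # Pairwise connectivity graph: (a, b) -l-> (a2, b2) whenever
    # a -l-> a2 in g1 and b -l-> b2 in g2.
    pcg = [((a, b), l1, (a2, b2))
           for (a, l1, a2), (b, l2, b2) in product(g1, g2) if l1 == l2]
    # Propagation graph: similarity flows along each PCG edge in both
    # directions; for each node and label the outgoing coefficients sum to 1.
    neighbours = defaultdict(list)
    for src, label, dst in pcg:
        neighbours[(src, label)].append(dst)
        neighbours[(dst, label)].append(src)
    coeff = defaultdict(float)
    for (node, _label), targets in neighbours.items():
        for t in targets:
            coeff[(node, t)] += 1.0 / len(targets)
    nodes = {n for pair in coeff for n in pair}
    if not nodes:
        return {}
    # Fixpoint iteration: sigma_{i+1} = normalize(sigma_i + phi(sigma_i)),
    # starting from a uniform initial similarity of 1.0 (assumption).
    sigma = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        nxt = dict(sigma)
        for (src, dst), w in coeff.items():
            nxt[dst] += sigma[src] * w
        top = max(nxt.values())
        sigma = {n: v / top for n, v in nxt.items()}
    return sigma

# Hypothetical schema graphs: tables with attributes, one attribute with a type
g1 = [("Person", "attr", "Name"), ("Person", "attr", "Born"), ("Born", "type", "date")]
g2 = [("Emp", "attr", "EName"), ("Emp", "attr", "BirthDate"), ("BirthDate", "type", "date")]
sim = similarity_flooding(g1, g2)
for pair, score in sorted(sim.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 2))
```

Here (Born, BirthDate) ends up above the other attribute pairs because its neighbours, the two date nodes, also match. That is the structural effect the technique exploits. In the published algorithm, the initial similarities come from a string matcher rather than being uniform.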

Question 7

Q

Which approach does Cupid use?

Answer

A

A hybrid approach that combines element-based and structure-based matching

Question 8

Q

What are the phases of Cupid?

Answer

A

Linguistic matching; structural matching; mapping generation (creation of the matches)
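A heavily simplified sketch of the three phases on two-level schemas (element with typed leaves), roughly following Cupid's weighted similarity wsim = w_struct · ssim + (1 − w_struct) · lsim. Assumptions: plain string similarity replaces Cupid's linguistic analysis, data-type equality serves as leaf structural similarity, and the weight, threshold and schemas are hypothetical. The real algorithm works recursively on schema trees and also adjusts leaf similarities.

```python
from difflib import SequenceMatcher

W_STRUCT = 0.6    # weight of structural vs linguistic similarity (assumed value)
TH_ACCEPT = 0.7   # acceptance threshold / "strong link" threshold (assumed value)

def lsim(a: str, b: str) -> float:
    """Phase 1, linguistic matching. Stand-in: plain string similarity
    instead of Cupid's tokenisation, categorisation and thesaurus lookup."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def leaf_wsim(a: tuple[str, str], b: tuple[str, str]) -> float:
    """Weighted similarity of two leaves (name, type); data-type equality
    serves as the leaves' structural similarity."""
    type_sim = 1.0 if a[1] == b[1] else 0.0
    return W_STRUCT * type_sim + (1 - W_STRUCT) * lsim(a[0], b[0])

def structural_sim(leaves_a: list, leaves_b: list) -> float:
    """Phase 2, structural matching of two inner elements: the share of
    their leaves that have a strong link to some leaf on the other side."""
    strong_a = sum(any(leaf_wsim(a, b) >= TH_ACCEPT for b in leaves_b) for a in leaves_a)
    strong_b = sum(any(leaf_wsim(a, b) >= TH_ACCEPT for a in leaves_a) for b in leaves_b)
    return (strong_a + strong_b) / (len(leaves_a) + len(leaves_b))

def cupid(schema_a: dict, schema_b: dict) -> list[tuple[str, str, float]]:
    """Phase 3, mapping generation: keep the element pairs whose weighted
    similarity reaches the acceptance threshold."""
    mapping = []
    for ea, leaves_a in schema_a.items():
        for eb, leaves_b in schema_b.items():
            wsim = W_STRUCT * structural_sim(leaves_a, leaves_b) + (1 - W_STRUCT) * lsim(ea, eb)
            if wsim >= TH_ACCEPT:
                mapping.append((ea, eb, round(wsim, 2)))
    return mapping

# Hypothetical two-level schemas: element -> list of (leaf name, data type)
a = {"Customer": [("Name", "string"), ("Phone", "string")],
     "Order": [("OrderDate", "date"), ("Total", "decimal")]}
b = {"Client": [("ClientName", "string"), ("PhoneNo", "string")],
     "Purchase": [("PurchaseDate", "date"), ("TotalAmount", "decimal")]}
print(cupid(a, b))
# [('Customer', 'Client', 0.71), ('Order', 'Purchase', 0.72)]
```

Customer and Client have dissimilar names, but all their leaves match, so the structural part lifts the pair above the threshold.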

Question 9

Q

What is the goal of schema integration?

Answer

A

To create, from a set of source schemas S, an integrated schema T that is complete, minimal, correct, and intelligible.

Question 10

Q

What are the four phases of schema integration?

Answer

A

Preintegration; comparing the schemas; conforming the schemas; merging and restructuring the schemas

Question 11

Q

What is the goal of integration planning?

Answer

A

Creation of an executable mapping

Question 12

Q

Name four single-source data-level problems.

Answer

A

Typos; dummy values; wrong values; deprecated values; cryptic values; wrong references; duplicates
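A minimal sketch that flags two of these problems, dummy values and duplicates, in one source. The set of placeholder values and the example records are assumptions. Duplicates here are only records that become identical after trimming and lower-casing; practical duplicate detection uses similarity measures instead of equality.

```python
from collections import defaultdict

DUMMY_VALUES = {"", "n/a", "unknown", "999-999-9999"}  # assumed placeholder values

def find_problems(records: list[dict[str, str]]) -> tuple[list[tuple[int, str]], list[list[int]]]:
    """Flag dummy values and duplicate records within a single source."""
    dummies = [(i, field) for i, rec in enumerate(records)
               for field, value in rec.items()
               if value.strip().lower() in DUMMY_VALUES]
    # Duplicates: records that are equal after trimming and lower-casing
    groups = defaultdict(list)
    for i, rec in enumerate(records):
        key = tuple(sorted((f, v.strip().lower()) for f, v in rec.items()))
        groups[key].append(i)
    duplicates = [idx for idx in groups.values() if len(idx) > 1]
    return dummies, duplicates

records = [  # hypothetical customer records
    {"name": "Ann Lee", "phone": "555-0101"},
    {"name": "Bob Ray", "phone": "999-999-9999"},
    {"name": "ann lee ", "phone": "555-0101"},
]
print(find_problems(records))
# ([(1, 'phone')], [[0, 2]])
```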

Question 13

Q

Name four multi-source data-level problems.

Answer

A

Contradictory values; differing representations; different physical units; different precision; different levels of detail
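A minimal sketch of how different physical units hide or reveal contradictory values. Readings from two sources are first converted to a common unit and only then compared. The unit table, the 1 % tolerance (which also absorbs differences in precision) and the readings are assumptions; 1 lb = 0.45359237 kg is the exact definition of the international pound.

```python
import math

# Conversion factors to kilograms (1 lb = 0.45359237 kg by definition)
TO_KG = {"kg": 1.0, "g": 0.001, "lb": 0.45359237}

def to_kg(value: float, unit: str) -> float:
    """Bring a weight reported in any known unit to a common unit."""
    return value * TO_KG[unit]

def contradictory(a: tuple[float, str], b: tuple[float, str],
                  rel_tol: float = 0.01) -> bool:
    """Two sources contradict each other if their values still differ
    (beyond rel_tol) after conversion to the same unit."""
    return not math.isclose(to_kg(*a), to_kg(*b), rel_tol=rel_tol)

# Hypothetical readings of one product's weight from two sources
print(contradictory((2.0, "kg"), (4.41, "lb")))  # False: same weight, different units
print(contradictory((2.0, "kg"), (5.0, "lb")))   # True: the values conflict
```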

Question 14

Q

How can data quality problems be handled?

Answer

A

In two phases: first on individual records, then across multiple records.
In the first phase (individual records), data is normalised, conversions are executed, and outliers are removed. In the second phase (multiple records), duplicate entries are detected and data fusion is performed.
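A minimal sketch of the two phases on hypothetical person records. Assumptions: an age outside 0 to 120 counts as an outlier and is removed (marked as missing), records with the same normalised name count as duplicates, and fusion keeps the first non-missing value per field. Real pipelines use more careful duplicate detection and conflict-resolution strategies.

```python
from collections import defaultdict
from typing import Optional

def clean_record(rec: dict) -> dict:
    """Phase 1 (individual records): normalise, convert, handle outliers."""
    name = " ".join((rec.get("name") or "").split()).title()  # normalise spacing and case
    age: Optional[int] = int(rec["age"]) if rec.get("age") else None  # text -> number
    if age is not None and not 0 <= age <= 120:  # implausible value treated as outlier
        age = None                                # removed, i.e. marked as missing
    return {"name": name, "age": age, "city": rec.get("city") or None}

def fuse(records: list[dict]) -> list[dict]:
    """Phase 2 (multiple records): detect duplicates and fuse them.
    Duplicates = same normalised name (a simplification); fusion keeps,
    per field, the first value that is not missing."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["name"]].append(rec)
    return [{field: next((r[field] for r in group if r[field] is not None), None)
             for field in group[0]}
            for group in groups.values()]

raw = [  # hypothetical input records
    {"name": "  ann   LEE", "age": "34", "city": None},
    {"name": "Ann Lee", "age": "340", "city": "Berlin"},
    {"name": "Bob Ray", "age": "51", "city": "Munich"},
]
print(fuse([clean_record(r) for r in raw]))
# [{'name': 'Ann Lee', 'age': 34, 'city': 'Berlin'}, {'name': 'Bob Ray', 'age': 51, 'city': 'Munich'}]
```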
