Chapter 15 Flashcards

1
Q

What are the phases of information integration?

A

Analysis; discovery; planning; deployment; runtime

2
Q

What are the approaches of information integration?

A

Bottom-up design (when the available data sources are well known); top-down design (when the available data sources are not known a priori); hybrid design (based on requirements)

3
Q

What is schema matching?

A

An automatic process to obtain a mapping. A mapping is a set of correspondences between two schemas.
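As an illustration of what "a set of correspondences" can look like, here is a minimal sketch of a name-based matcher. The function names, the example attribute lists, and the 0.6 threshold are all assumptions made for this example; real matchers use far richer evidence than string similarity.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two attribute names, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_schemas(source, target, threshold=0.6):
    """Return a mapping: a list of (source_attr, target_attr, score) correspondences."""
    mapping = []
    for s in source:
        # Pick the best-scoring target attribute for each source attribute.
        best = max(target, key=lambda t: name_similarity(s, t))
        score = name_similarity(s, best)
        # Keep the correspondence only if it clears the (assumed) threshold.
        if score >= threshold:
            mapping.append((s, best, round(score, 2)))
    return mapping


# Hypothetical schemas, each given as a list of attribute names.
customers = ["cust_name", "phone", "zip"]
clients = ["customer_name", "phone_no", "postal_code"]

mapping = match_schemas(customers, clients)
for source_attr, target_attr, score in mapping:
    print(source_attr, "->", target_attr, score)
```

In this sketch `zip` finds no partner, because its name shares almost nothing with `postal_code`. Names alone are often not enough evidence for a match.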

4
Q

What is the difference between individual and combining matchers?

A

Individual matchers exploit only one kind of information to identify matches, whereas combining matchers (hybrid and composite) use several.
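The composite style of combining can be sketched as two individual matchers that run independently and whose scores are then merged. The attribute dictionaries, the two matchers, and the 0.7/0.3 weights below are assumptions chosen for illustration, not a prescribed scheme.

```python
from difflib import SequenceMatcher


def name_matcher(a, b):
    """Individual matcher 1: uses only the attribute names."""
    return SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()


def type_matcher(a, b):
    """Individual matcher 2: uses only the declared data types."""
    return 1.0 if a["type"] == b["type"] else 0.0


def composite_matcher(a, b, weights=(0.7, 0.3)):
    """Combine the independent matcher scores with a weighted sum."""
    scores = (name_matcher(a, b), type_matcher(a, b))
    return sum(w * s for w, s in zip(weights, scores))


# Hypothetical attributes described by name and type.
price_a = {"name": "price", "type": "decimal"}
price_b = {"name": "price", "type": "decimal"}
price_c = {"name": "price", "type": "string"}

same_type_score = composite_matcher(price_a, price_b)
diff_type_score = composite_matcher(price_a, price_c)
print(same_type_score, diff_type_score)
```

Identical names with different types score lower than identical names with identical types, which a name-only matcher could not distinguish.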

5
Q

What is the difference between schema-only and instance-based matching?

A

Schema-only techniques operate solely on metadata, whereas instance-based techniques also consider properties of the data (for example, statistical information on data values).
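A minimal sketch of the instance-based idea: compare the values stored in two columns rather than their names. Jaccard overlap of the value sets is used here as one simple stand-in for "statistical information"; the column names and sample values are invented for the example.

```python
def jaccard(values_a, values_b):
    """Overlap of two columns' distinct values, in [0, 1]."""
    a, b = set(values_a), set(values_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical sample instances for one source column and two target columns.
zip_values = ["10115", "80331", "20095"]
target_columns = {
    "postal_code": ["80331", "10115", "50667"],
    "phone_no": ["030-1234", "089-5678", "040-9012"],
}

# Score every target column against the source column using data values only.
scores = {name: jaccard(zip_values, vals) for name, vals in target_columns.items()}
best_match = max(scores, key=scores.get)
print(best_match, scores[best_match])
```

Here `zip` is paired with `postal_code` purely because the two columns share values, even though the attribute names have almost nothing in common.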

6
Q

What are examples of structural schema matching?

A

Cupid and Similarity Flooding

7
Q

Which approach does Cupid use?

A

Hybrid approach: element-based and structure-based

8
Q

What are the phases of Cupid?

A

Linguistic matching; structure matching; creation of mapping/matches

9
Q

What is the goal of schema integration?

A

To create, from a set of schemas S, an integrated schema T that is complete, minimal, correct, and intelligible.

10
Q

What are the four phases of schema integration?

A

Preintegration; comparing the schemas; conforming the schemas; schema merging and restructuring

11
Q

What is the goal of integration planning?

A

Creation of an executable mapping

12
Q

Name four single-source data-level problems:

A

Typos, dummy values, wrong values, deprecated values, cryptic values, wrong references, duplicates

13
Q

Name four multi-source data-level problems:

A

Contradictory values, differing representations, different physical units, different precision, different levels of detail

14
Q

How are data quality problems handled?

A

In two phases: individual records, then multiple records.
In the first phase (individual records) we normalise the data, perform conversions, and remove outliers. In the second phase (multiple records) we detect duplicate entries and perform data fusion.
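The two phases can be sketched roughly as follows. Everything specific here is an assumption made for the example: the record fields, the rule that heights above 3 are centimetres, treating an identical normalised name as a duplicate, and a "first non-missing value wins" fusion strategy. Outlier removal is left out to keep the sketch short.

```python
def normalise(record):
    """Phase 1: clean a single record in isolation."""
    # Collapse whitespace and unify capitalisation of the name.
    name = " ".join(record["name"].split()).title()
    height = record.get("height")
    # Assumed convention: heights above 3 are in centimetres; convert to metres.
    if height is not None and height > 3:
        height = height / 100
    return {"name": name, "height": height, "city": record.get("city")}


def fuse(records):
    """Phase 2: treat records with the same name as duplicates and merge them."""
    fused = {}
    for rec in records:
        # The first record seen for a name becomes the base of the merged record.
        merged = fused.setdefault(rec["name"], dict(rec))
        for key, value in rec.items():
            # Fill gaps in the merged record with the first non-missing value.
            if merged.get(key) is None and value is not None:
                merged[key] = value
    return list(fused.values())


# Hypothetical raw input with a messy duplicate and mixed units.
raw = [
    {"name": "  ada  lovelace", "height": 165, "city": None},
    {"name": "Ada Lovelace", "height": None, "city": "London"},
    {"name": "alan turing", "height": 1.78, "city": "London"},
]

clean = fuse([normalise(r) for r in raw])
for row in clean:
    print(row)
```

Running phase 1 first matters in this sketch: the two Ada Lovelace rows are only recognised as duplicates once their names have been normalised.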
