Lec 6 (done) Flashcards

Question 1

Q

Data Integration:

Answer

A

Combines data from multiple sources into a coherent store.

Careful integration can help reduce and avoid redundancies and inconsistencies in the resulting data set.
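The definition above can be pictured as merging per-entity records from several sources into one store. This minimal sketch assumes the hypothetical sources `crm` and `billing` already use the same key (`customer_id`) and the same attribute names. When the sources disagree on a value, the first value is kept and the disagreement is reported instead of being silently overwritten.

```python
def integrate(*sources):
    """Merge lists of dict records keyed by 'customer_id' into one store.

    Returns (store, conflicts): store maps customer_id -> merged record;
    conflicts lists (customer_id, attribute, kept_value, other_value) tuples
    where two sources disagree on an attribute value.
    """
    store = {}
    conflicts = []
    for source in sources:
        for record in source:
            key = record["customer_id"]
            merged = store.setdefault(key, {})
            for attr, value in record.items():
                if attr in merged and merged[attr] != value:
                    # Keep the first value seen but record the disagreement.
                    conflicts.append((key, attr, merged[attr], value))
                else:
                    merged[attr] = value
    return store, conflicts


# Hypothetical example sources.
crm = [{"customer_id": 1, "name": "Alice", "city": "Boston"}]
billing = [{"customer_id": 1, "city": "Cambridge", "balance": 120.0},
           {"customer_id": 2, "name": "Bob", "balance": 0.0}]

store, conflicts = integrate(crm, billing)
print(store)
print(conflicts)
```

Reporting conflicts rather than overwriting keeps the store from quietly becoming inconsistent; deciding which value should win is a separate conflict-resolution step.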

Question 2

Q

Data Integration Issues:

Answer

A

1-Entity identification problem
2-Redundancy and correlation analysis
3-Tuple duplication
4-Data value conflict detection and resolution

Question 3

Q

Give an example of schema integration and object matching.

Answer

A

For example, customer_id in one database and cust_number in another may refer to the same attribute.
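A minimal sketch of resolving this example with an explicit attribute mapping. `SCHEMA_MAP`, `to_unified`, and the source names `db_a` and `db_b` are hypothetical; in practice the mapping would be worked out from each database's metadata.

```python
# Hypothetical mapping from each source's attribute names to a unified schema.
SCHEMA_MAP = {
    "db_a": {"customer_id": "customer_id", "name": "name"},
    "db_b": {"cust_number": "customer_id", "cust_name": "name"},
}


def to_unified(record, source):
    """Rename a record's attributes into the unified schema for `source`."""
    mapping = SCHEMA_MAP[source]
    return {mapping[attr]: value for attr, value in record.items()}


a = to_unified({"customer_id": 42, "name": "Alice"}, "db_a")
b = to_unified({"cust_number": 42, "cust_name": "Alice"}, "db_b")
print(a == b)  # True: both records now use customer_id and name
```

An attribute missing from the mapping raises `KeyError`, so unmapped columns fail loudly instead of slipping into the unified store.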

Question 4

Q

What is metadata?

Answer

A

A set of data that describes and gives information about other data.

For each attribute, metadata includes its name, meaning, data type, range of permitted values, and null rules (how blank, zero, or null values are handled).
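The metadata items listed above can be written down as one record per attribute and then used to check incoming values. `AttributeMetadata` and its fields are hypothetical names chosen for this sketch, not a standard API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AttributeMetadata:
    """Describes one attribute: name, meaning, data type, value range, null rule."""
    name: str
    meaning: str
    dtype: type
    min_value: Optional[float] = None
    max_value: Optional[float] = None
    nullable: bool = True

    def validate(self, value):
        """Return a list of problems found when checking `value` against this metadata."""
        if value is None:
            # Null rule: nulls are fine only if the attribute allows them.
            return [] if self.nullable else [f"{self.name}: null not allowed"]
        if not isinstance(value, self.dtype):
            return [f"{self.name}: expected {self.dtype.__name__}"]
        problems = []
        if self.min_value is not None and value < self.min_value:
            problems.append(f"{self.name}: below {self.min_value}")
        if self.max_value is not None and value > self.max_value:
            problems.append(f"{self.name}: above {self.max_value}")
        return problems


age = AttributeMetadata("age", "customer age in years", int, 0, 120, nullable=False)
print(age.validate(35))    # []
print(age.validate(-4))    # ['age: below 0']
print(age.validate(None))  # ['age: null not allowed']
```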

Question 5

Q

How can we avoid errors in schema integration?

Answer

A

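By using metadata: the metadata for each attribute helps decide which attributes in different sources correspond to each other.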
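One way to use metadata during schema integration is to pair attributes whose documented meaning and data type agree, even when their names differ. This sketch assumes each hypothetical schema stores metadata as a `(meaning, data type)` pair and that equivalent meanings are written identically; real metadata matching is usually fuzzier and reviewed by a person.

```python
# Hypothetical metadata for two schemas: attribute name -> (meaning, data type).
schema_a = {
    "customer_id": ("unique customer identifier", "int"),
    "dob": ("date of birth", "date"),
}
schema_b = {
    "cust_number": ("unique customer identifier", "int"),
    "birth_date": ("date of birth", "date"),
    "phone": ("contact phone number", "str"),
}


def match_by_metadata(a, b):
    """Pair attributes whose documented meaning and data type agree, regardless of name."""
    # Index schema b by its metadata; if two attributes share metadata, the last one wins.
    index = {meta: name for name, meta in b.items()}
    return {name: index[meta] for name, meta in a.items() if meta in index}


print(match_by_metadata(schema_a, schema_b))
# {'customer_id': 'cust_number', 'dob': 'birth_date'}
```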

Question 6

Q

Redundant data often occurs when integrating multiple databases, due to:

Answer

A

Dimension naming: The same attribute may have different names in different databases.

Derivable data: The same attribute can be “derived” from another attribute or set of attributes, e.g., annual revenue.
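Derivable data can be tested directly when the derivation rule is known. This sketch assumes, for illustration only, that `annual_revenue` is the sum of twelve hypothetical monthly columns `rev_01` through `rev_12`; if the rule holds for every row, the stored attribute is redundant.

```python
def is_derivable(records, derived, sources, tol=1e-6):
    """Return True if `derived` equals the sum of the `sources` attributes in every record."""
    return all(abs(r[derived] - sum(r[s] for s in sources)) <= tol for r in records)


months = [f"rev_{m:02d}" for m in range(1, 13)]
# Hypothetical rows: monthly revenue plus a stored annual total.
rows = [
    {**{m: 100.0 for m in months}, "annual_revenue": 1200.0},
    {**{m: 50.0 for m in months}, "annual_revenue": 600.0},
]
print(is_derivable(rows, "annual_revenue", months))  # True, so annual_revenue is redundant
```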

Question 7

Q

Redundant attributes can be detected by

Answer

A

correlation analysis

Question 8

Q

Explain why correlation does not imply causality.

Answer

A

If A and B are correlated, this does not necessarily imply that A causes B or that B causes A.

For example, the number of hospitals and the number of car thefts in a city are correlated, but both are causally linked to a third variable: population.

Question 9

Q

For a correlation test on nominal data, we use:

Answer

A

χ² (chi-square) test

Question 10

Q

χ² (chi-square) test

Answer

A

See slides 7-11.
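The χ² statistic sums, over every cell of the contingency table, (o_ij − e_ij)² / e_ij, where o_ij is the observed count and e_ij = (row total × column total) / n is the count expected if the two attributes were independent. A minimal plain-Python sketch follows; the 2×2 counts are hypothetical, and it assumes every row and column total is non-zero.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a table of observed counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two attributes were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical 2x2 counts: rows = likes science fiction (yes/no),
# columns = plays chess (yes/no).
observed = [[250, 200],
            [50, 1000]]
stat, dof = chi_square(observed)
print(round(stat, 2), dof)  # 507.94 1
```

With (2 − 1) × (2 − 1) = 1 degree of freedom, the χ² critical value at the 0.001 significance level is 10.828. Since 507.94 is far larger, the hypothesis that the two attributes are independent is rejected, so they are correlated in this hypothetical data.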

Question 11

Q

For a correlation test on numeric data, we use:

Answer

A

Correlation coefficient (also called Pearson's product-moment coefficient)
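Pearson's coefficient for attributes A and B over n tuples is r(A,B) = Σ (a_i − Ā)(b_i − B̄) / (n σ_A σ_B), with σ_A and σ_B the population standard deviations; equivalently, the covariance divided by the square root of the product of the two variances. A plain-Python sketch, assuming neither input is constant (a zero variance would leave r undefined):

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n
    var_a = sum((x - mean_a) ** 2 for x in a) / n
    var_b = sum((y - mean_b) ** 2 for y in b) / n
    return cov / math.sqrt(var_a * var_b)


print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0 (perfect positive correlation)
print(pearson([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0 (perfect negative correlation)
```

Values near +1 or −1 suggest one attribute may be redundant given the other; values near 0 indicate no linear correlation.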

Question 12

Q

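Correlation Coefficient

Answer

A

r(A,B) = Σ (a_i − Ā)(b_i − B̄) / (n · σ_A · σ_B), where n is the number of tuples, Ā and B̄ are the means of A and B, and σ_A and σ_B are their standard deviations. The value ranges from −1 to +1.

r(A,B) > 0: A and B are positively correlated (values of A increase as values of B increase); the higher the value, the stronger the correlation.
r(A,B) = 0: no linear correlation.
r(A,B) < 0: A and B are negatively correlated.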

Question 13

Q

Scatter plots can be used to

Answer

A

view correlations between attributes

Question 14

Q

Tuple Duplication

Answer

A

Two or more identical tuples for a given data entry
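Exact tuple duplicates can be found by counting identical tuples. The purchase tuples below are hypothetical; `dict.fromkeys` is used to drop repeats while keeping the first occurrence of each tuple in its original order.

```python
from collections import Counter


def find_duplicate_tuples(rows):
    """Return {tuple: count} for every tuple that appears more than once."""
    counts = Counter(rows)
    return {row: c for row, c in counts.items() if c > 1}


def drop_duplicate_tuples(rows):
    """Keep the first occurrence of each tuple, preserving order."""
    return list(dict.fromkeys(rows))


purchases = [  # hypothetical (purchase_id, purchaser, item) tuples
    (1, "Alice", "laptop"),
    (2, "Bob", "mouse"),
    (1, "Alice", "laptop"),
]
print(find_duplicate_tuples(purchases))  # {(1, 'Alice', 'laptop'): 2}
print(drop_duplicate_tuples(purchases))  # [(1, 'Alice', 'laptop'), (2, 'Bob', 'mouse')]
```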

Question 15

Q

Inconsistencies often arise with tuple duplicates, due to what?

Answer

A

due to updating some but not all data occurrences.

For example, if a database contains three duplicate purchase tuples and the purchaser’s name is updated in only one or two of them, the copies become inconsistent.
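The partial-update problem can be detected by grouping copies that share a key and checking whether they still agree. This sketch assumes the first field (a hypothetical `purchase_id`) identifies the real-world purchase; the data reproduces the scenario above, with the purchaser's name updated in only one of three copies.

```python
from collections import defaultdict


def inconsistent_duplicates(rows, key_index=0):
    """Group tuples by the field at `key_index`; report keys whose copies disagree."""
    groups = defaultdict(set)
    for row in rows:
        groups[row[key_index]].add(row)
    # More than one distinct version under the same key means the copies diverged.
    return {key: sorted(versions) for key, versions in groups.items() if len(versions) > 1}


# Three copies of purchase 7; the purchaser's name was updated in only one copy.
purchases = [
    (7, "J. Smith", "printer"),
    (7, "John Smith", "printer"),
    (7, "J. Smith", "printer"),
]
print(inconsistent_duplicates(purchases))
# {7: [(7, 'J. Smith', 'printer'), (7, 'John Smith', 'printer')]}
```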

Question 16

Q

Data Value Conflict Detection and Resolution

Answer


A

See slide 15.
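A data value conflict arises when the same real-world entity has different attribute values in different sources, for example because of different representations or scales such as metric versus imperial units. This sketch normalizes hypothetical weight readings to kilograms before comparing them; the supported units and the 0.01 kg tolerance are assumptions made for illustration.

```python
LB_TO_KG = 0.45359237  # exact definition of the international avoirdupois pound


def to_kg(value, unit):
    """Convert a weight to kilograms; only 'kg' and 'lb' are handled in this sketch."""
    if unit == "kg":
        return value
    if unit == "lb":
        return value * LB_TO_KG
    raise ValueError(f"unknown unit: {unit}")


def resolve_weight(readings, tol=0.01):
    """Normalize (value, unit) readings for one entity; return the kg value if they agree."""
    kg = [to_kg(v, u) for v, u in readings]
    if max(kg) - min(kg) > tol:
        raise ValueError(f"genuine conflict: {kg}")
    return round(sum(kg) / len(kg), 2)


# Hypothetical: the same product's weight as stored by two sources.
print(resolve_weight([(2.0, "kg"), (4.409, "lb")]))  # 2.0
```

Readings that still disagree after normalization raise an error, since they point to a genuine conflict that needs a resolution rule, such as trusting the more reliable source.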


(16 cards)