Lec 6 (done) Flashcards

1
Q

Data integration:

A

Combines data from multiple sources into a coherent data store.

Careful integration can help reduce and avoid redundancies and inconsistencies in the resulting data set.

2
Q

Data Integration Issues:

A

1-Entity identification problem
2-Redundancy and correlation analysis
3-Tuple duplication
4-Data value conflict detection and resolution

3
Q

Example of schema integration and object matching

A

e.g., customer_id in one database and cust_number in another may refer to the same attribute, so the two must be recognized as matching.
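A minimal Python sketch of object matching, assuming we already know (for example, from metadata) that customer_id and cust_number denote the same attribute. The records, the SCHEMA_MAP mapping, and the rename/integrate helpers are hypothetical illustrations, not part of the lecture.

```python
# Hypothetical records from two sources that name the same key differently.
db_a = [
    {"customer_id": 101, "name": "Alice"},
    {"customer_id": 102, "name": "Bob"},
]
db_b = [
    {"cust_number": 101, "city": "Leeds"},
    {"cust_number": 102, "city": "York"},
]

# Hypothetical mapping from a source attribute name to the unified name.
# Written by hand here; in practice it would be derived from metadata.
SCHEMA_MAP = {"cust_number": "customer_id"}


def rename(record, mapping):
    """Return a copy of record with attribute names translated via mapping."""
    return {mapping.get(k, k): v for k, v in record.items()}


def integrate(a, b, key="customer_id"):
    """Merge two record lists on the unified key after renaming attributes."""
    merged = {}
    for rec in a + b:
        rec = rename(rec, SCHEMA_MAP)
        merged.setdefault(rec[key], {}).update(rec)
    return [merged[k] for k in sorted(merged)]


if __name__ == "__main__":
    for row in integrate(db_a, db_b):
        print(row)
```

Without the mapping, the merge would treat customer_id and cust_number as two unrelated attributes and fail to join the records.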

4
Q

What is metadata?

A

A set of data that describes and gives information about other data.

For each attribute, this includes its name, meaning, data type, range of allowed values, and null rules.

5
Q

How can we avoid errors in schema integration?

A

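By using the metadata of each attribute (name, meaning, data type, range of values, null rules) to check whether attributes from different sources really correspond.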
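A minimal Python sketch of using metadata to decide whether two differently named attributes may be the same attribute. The AttrMeta record and the may_match rule (exact agreement on everything except the name) are simplifying assumptions for illustration; real schema matching is usually fuzzier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttrMeta:
    """Metadata for one attribute, using the fields listed on the metadata card."""
    name: str
    meaning: str
    dtype: str
    value_range: tuple  # (min, max) of allowed values
    nullable: bool


def may_match(a: AttrMeta, b: AttrMeta) -> bool:
    """Simplified rule (an assumption): two attributes may be the same
    real-world attribute if all metadata except the name agrees."""
    return (a.meaning == b.meaning
            and a.dtype == b.dtype
            and a.value_range == b.value_range
            and a.nullable == b.nullable)


customer_id = AttrMeta("customer_id", "customer identifier", "int", (1, 99999), False)
cust_number = AttrMeta("cust_number", "customer identifier", "int", (1, 99999), False)
age = AttrMeta("age", "customer age in years", "int", (0, 120), True)

if __name__ == "__main__":
    print(may_match(customer_id, cust_number))  # True: only the names differ
    print(may_match(customer_id, age))          # False: meaning and range differ
```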

6
Q

Redundant data often occurs when multiple databases are integrated, due to:

A

Dimension naming: The same attribute may have different names in different databases.

Derivable data: One attribute may be “derived” from another attribute or set of attributes, e.g., annual revenue.
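A minimal Python sketch of spotting a derivable attribute, assuming (hypothetically) that annual revenue is stored alongside twelve monthly revenue values. If the annual figure always equals the monthly sum, it carries no extra information and is redundant. The table and the annual_is_derivable helper are illustrative, not from the lecture.

```python
# Hypothetical table: 12 monthly revenue values per store plus an annual figure.
rows = [
    {"store": "S1", "monthly": [10.0] * 12, "annual": 120.0},
    {"store": "S2", "monthly": [5.0, 7.0] * 6, "annual": 72.0},
]


def annual_is_derivable(rows, tol=1e-9):
    """True if 'annual' equals the sum of 'monthly' in every row, meaning
    'annual' adds no information and is a redundant (derivable) attribute."""
    return all(abs(sum(r["monthly"]) - r["annual"]) <= tol for r in rows)


if __name__ == "__main__":
    print(annual_is_derivable(rows))  # True for the data above
```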

7
Q

Redundant attributes can be detected by

A

Correlation analysis

8
Q

Explain: correlation does not imply causality.

A

If A and B are correlated, this does not necessarily imply that A causes B or that B causes A.

  • The number of hospitals and the number of car thefts in a city are correlated.
  • Both are causally linked to a third variable, population: larger cities tend to have more of each.
9
Q

Which correlation test do we use for nominal data?

A

The χ² (chi-square) test

10
Q

χ² (chi-square) test

A

See slides 7-11.
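The χ² statistic sums, over every cell (i, j) of the contingency table, (o_ij - e_ij)² / e_ij, where o_ij is the observed count and the expected count is e_ij = (row i total × column j total) / n. A minimal Python sketch follows; the 2x2 counts are illustrative and not taken from the slides.

```python
def chi_square(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows.
    Expected count of cell (i, j) = row_i total * col_j total / n."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def degrees_of_freedom(table):
    """(rows - 1) * (columns - 1)."""
    return (len(table) - 1) * (len(table[0]) - 1)


# Illustrative 2x2 counts: rows = two values of attribute A, columns = two values of B.
table = [[250, 200],
         [50, 1000]]

if __name__ == "__main__":
    print(round(chi_square(table), 2))  # 507.94
    print(degrees_of_freedom(table))    # 1
```

The statistic is then compared with the critical χ² value for the computed degrees of freedom at a chosen significance level; a value above it leads us to reject the hypothesis that the two attributes are independent, i.e., they are correlated.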

11
Q

Which correlation test do we use for numeric data?

A

The correlation coefficient (also called Pearson's product-moment coefficient)

12
Q

Correlation Coefficient

A

See slide 12.
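Pearson's coefficient for attributes A and B over n tuples is r = Σ (a_i - mean_A)(b_i - mean_B) / (n · σ_A · σ_B), where σ is the (population) standard deviation; r lies between -1 and 1. A minimal Python sketch of that formula, with pearson_r as an illustrative name:

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two sequences of the same length >= 2")
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n
    std_a = math.sqrt(sum((x - mean_a) ** 2 for x in a) / n)
    std_b = math.sqrt(sum((y - mean_b) ** 2 for y in b) / n)
    if std_a == 0 or std_b == 0:
        raise ValueError("correlation is undefined for a constant attribute")
    return cov / (std_a * std_b)


if __name__ == "__main__":
    print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))   # close to 1.0: positive
    print(pearson_r([1, 2, 3], [3, 2, 1]))         # close to -1.0: negative
    print(pearson_r([1, 2, 3, 4], [1, -1, -1, 1])) # 0.0: no linear correlation
```

r > 0 means the attributes are positively correlated, r < 0 negatively correlated, and r = 0 means no linear correlation; a value of |r| close to 1 suggests one attribute may be redundant.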

13
Q

Scatter plots can be used to

A

Visualize correlations between numeric attributes.

14
Q

Tuple Duplication

A

Two or more identical tuples for the same data entry.
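A minimal Python sketch of detecting and removing duplicate tuples. The purchase tuples and the helper names find_duplicates and deduplicate are hypothetical; the sketch treats tuples as duplicates only when every field matches exactly.

```python
from collections import Counter

# Hypothetical purchase tuples: (purchase_id, purchaser, item, quantity).
purchases = [
    (1, "Ali", "pen", 2),
    (2, "Sara", "book", 1),
    (1, "Ali", "pen", 2),  # identical copy of the first tuple
]


def find_duplicates(tuples):
    """Map each tuple that occurs more than once to its number of occurrences."""
    return {t: c for t, c in Counter(tuples).items() if c > 1}


def deduplicate(tuples):
    """Keep the first occurrence of each tuple, preserving the original order."""
    return list(dict.fromkeys(tuples))


if __name__ == "__main__":
    print(find_duplicates(purchases))  # {(1, 'Ali', 'pen', 2): 2}
    print(deduplicate(purchases))
```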

15
Q

Why do inconsistencies often arise with duplicate tuples?

A

Because some, but not all, occurrences of the data are updated.

For example, if a database contains three duplicate purchase tuples and the purchaser’s name is updated in only one or two of them, the copies no longer agree, which causes inconsistencies.
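A minimal Python sketch of finding such inconsistencies: group the duplicate records by their key and report keys whose copies disagree on an attribute. The records and the find_inconsistent helper are hypothetical illustrations of the purchase example.

```python
from collections import defaultdict

# Hypothetical duplicate purchase records; purchase 7's purchaser name
# was updated in only one of its three copies.
records = [
    {"purchase_id": 7, "purchaser": "Mona Adel"},
    {"purchase_id": 7, "purchaser": "Mona A. Adel"},
    {"purchase_id": 7, "purchaser": "Mona Adel"},
    {"purchase_id": 8, "purchaser": "Omar Ali"},
]


def find_inconsistent(records, key, attr):
    """Group records by key and return the keys whose copies disagree on attr,
    each mapped to the sorted list of conflicting values."""
    values = defaultdict(set)
    for r in records:
        values[r[key]].add(r[attr])
    return {k: sorted(v) for k, v in values.items() if len(v) > 1}


if __name__ == "__main__":
    print(find_inconsistent(records, "purchase_id", "purchaser"))
    # {7: ['Mona A. Adel', 'Mona Adel']}
```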

16
Q

Data Value Conflict Detection and Resolution

A

See slide 15.
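One common cause of data value conflicts is that sources store the same quantity in different units or scales (e.g., metric vs. imperial). A minimal Python sketch, assuming two sources report a weight in kg and lb: convert both to one unit, then treat them as agreeing if they match within a relative tolerance. The helpers to_kg and resolve, the 1% tolerance, and the rule of keeping source A's value are all hypothetical choices.

```python
LB_TO_KG = 0.45359237  # exact definition of the international avoirdupois pound


def to_kg(value, unit):
    """Convert a weight to kilograms; only 'kg' and 'lb' are supported here."""
    if unit == "kg":
        return value
    if unit == "lb":
        return value * LB_TO_KG
    raise ValueError(f"unsupported unit: {unit}")


def resolve(value_a, unit_a, value_b, unit_b, rel_tol=0.01):
    """Return source A's value in kg if both readings agree within rel_tol
    after conversion; return None if a genuine conflict remains."""
    a = to_kg(value_a, unit_a)
    b = to_kg(value_b, unit_b)
    if abs(a - b) <= rel_tol * max(abs(a), abs(b)):
        return a
    return None


if __name__ == "__main__":
    print(resolve(10, "kg", 22.05, "lb"))  # 10: only the units differed
    print(resolve(10, "kg", 10, "lb"))     # None: a real conflict
```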