Week 4: Integration & Processing Pipeline Flashcards

Question 1

Q

What is a data lake?

Answer

A

Question 2

Q

Data swamp

Answer

A

Highly disorganised data repository

Question 3

Q

Data lake (RUDEAS)

Answer

A

Question 4

Q

Data Warehouse (DDRUSS)

Answer

A

Question 5

Q

3 Techniques of big data integration (DRS)

Answer

A

Data fusion, record linkage, schema mapping

Question 6

Q

Schema Mapping

Answer

A

Create a mediated global schema that is relevant to the business

Identify mappings between the mediated schema and each data source
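
The two steps above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the sources ("crm", "billing"), their field names, and the mediated schema are all hypothetical.

```python
# Hypothetical mediated (global) schema chosen for the business view.
GLOBAL_SCHEMA = ("customer_id", "full_name", "email")

# Hypothetical mappings: global attribute -> source-specific field name.
MAPPINGS = {
    "crm": {"customer_id": "id", "full_name": "name", "email": "email_addr"},
    "billing": {"customer_id": "cust_no", "full_name": "customer", "email": "mail"},
}


def to_global(source, record):
    """Translate one source record into the mediated schema."""
    mapping = MAPPINGS[source]
    return {attr: record.get(mapping[attr]) for attr in GLOBAL_SCHEMA}


# Invented sample rows, one per source.
crm_row = {"id": 7, "name": "Ada Lovelace", "email_addr": "ada@example.com"}
billing_row = {"cust_no": 7, "customer": "A. Lovelace", "mail": "ada@example.com"}

unified = [to_global("crm", crm_row), to_global("billing", billing_row)]
```

After mapping, both rows share the same attribute names, so later steps can compare them directly.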

Question 7

Q

Record Linkage

Answer

A

Identify records that refer to the same logical entity across different data sources
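
As a rough sketch of the idea, the snippet below compares every record in one source with every record in another. The matching rule (equal email, or name similarity above a threshold of 0.8) and the sample records are assumptions made for illustration only.

```python
from difflib import SequenceMatcher


def normalise(text):
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(text.lower().split())


def same_entity(a, b, threshold=0.8):
    """Assumed rule: equal emails, or names similar above a hypothetical threshold."""
    if normalise(a["email"]) == normalise(b["email"]):
        return True
    ratio = SequenceMatcher(None, normalise(a["name"]), normalise(b["name"])).ratio()
    return ratio >= threshold


# Invented sample records from two sources.
source_a = [{"name": "Ada Lovelace", "email": "ada@example.com"}]
source_b = [
    {"name": "ada  lovelace", "email": "ADA@example.com"},
    {"name": "Alan Turing", "email": "alan@example.com"},
]

# Index pairs (i, j) where source_a[i] and source_b[j] appear to be the same entity.
links = [
    (i, j)
    for i, a in enumerate(source_a)
    for j, b in enumerate(source_b)
    if same_entity(a, b)
]
```

Comparing all pairs grows quadratically with the number of records, so this naive form only suits small inputs.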

Question 8

Q

3 Record Linkage techniques (PCB)

Answer

A

Question 9

Q

Data Fusion

Answer

A

A combination of techniques that resolves conflicts among a collection of sources to find the true values
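
One simple way to resolve such conflicts is majority voting per attribute. The sketch below assumes that strategy purely for illustration; the source names and claimed values are invented, and ties fall to whichever value was seen first.

```python
from collections import Counter


def fuse_by_vote(claims):
    """Pick, per attribute, the value reported by the most sources."""
    fused = {}
    attributes = {attr for claim in claims.values() for attr in claim}
    for attr in sorted(attributes):
        values = [claim[attr] for claim in claims.values() if attr in claim]
        fused[attr] = Counter(values).most_common(1)[0][0]
    return fused


# Hypothetical conflicting claims about one entity from three sources.
claims = {
    "source_a": {"city": "London", "born": 1815},
    "source_b": {"city": "London", "born": 1816},
    "source_c": {"city": "Londn", "born": 1815},
}

truth = fuse_by_vote(claims)
```

Here each attribute has a two-to-one majority, so the fused record keeps "London" and 1815.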

Question 10

Q

Data Fusion (techniques) (CSV)

Answer

A

Question 11

Q

Big Data Processing Pipeline (3 steps)

Answer

A