Chapter 20 Flashcards
What are the types of data duplication caused by a non-unique primary key?
- Duplicate identification numbers
- Householding
- Individualization
What are the 2 subtypes of duplicate identification numbers?
- Multiple customer numbers
- Multiple employee numbers
What is householding (data duplication problem)?
Multiple people living in the same house. For example, several people or families share one home and all of them have bank accounts. When the bank sends advertising to its account holders, it mails multiple brochures to the same house.
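A minimal sketch of how the householding problem could be handled, assuming each account is a (name, address) pair and that lowercasing plus whitespace cleanup is enough to match addresses. The function name `group_by_household` and the sample data are hypothetical, and real address matching is usually much harder.

```python
from collections import defaultdict

def group_by_household(accounts):
    """Group account holders by a normalized address string."""
    households = defaultdict(list)
    for name, address in accounts:
        # Assumed normalization: lowercase and collapse repeated spaces.
        key = " ".join(address.lower().split())
        households[key].append(name)
    return dict(households)

# Hypothetical account holders; two of them share one house.
accounts = [
    ("A. Khan", "12 Park Road"),
    ("S. Khan", "12  park road"),
    ("R. Malik", "7 Lake View"),
]

households = group_by_household(accounts)
# One brochure per household instead of one per account.
mailing_list = sorted(households)
```

With this grouping the bank would mail 2 brochures instead of 3.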
What is individualization (data duplication problem)?
One person appears as more than one record because the name is written differently (e.g. Mr. Ahad and Major Ahad).
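A rough sketch of how the two spellings in the example could be recognized as one person, assuming the only difference is a title in front of the name. The `TITLES` set and both function names are hypothetical. A matching name only suggests the same person, so other attributes would normally be checked too.

```python
# Hypothetical list of titles to ignore; a real list would be longer.
TITLES = {"mr", "mrs", "ms", "dr", "major"}

def normalize_name(raw):
    """Lowercase the name, drop periods, and remove known titles."""
    tokens = raw.lower().replace(".", " ").split()
    return " ".join(t for t in tokens if t not in TITLES)

def same_person(a, b):
    """Treat two names as the same person if they match after normalization."""
    return normalize_name(a) == normalize_name(b)

is_duplicate = same_person("Mr. Ahad", "Major Ahad")
```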
What is purge?
Removing duplicate records.
What do 0 and 1 represent in degree of similarity?
1 - near (the records are very similar)
0 - far (the records are very different)
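One way to get a score on this 0 to 1 scale is Python's standard `difflib.SequenceMatcher`, whose `ratio()` returns a float between 0 and 1. This is only an illustration of the scale; the source does not say which similarity measure is meant, and the helper name `similarity` is hypothetical.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Return a score in [0, 1]: 1 means near (identical), 0 means far."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

near = similarity("ahad", "ahad")      # identical strings
far = similarity("abc", "xyz")         # no characters in common
between = similarity("ahad", "ahmad")  # similar but not identical
```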
What is the BSN method?
The Basic Sorted Neighborhood (BSN) method is a method for removing duplicate records. It has 3 steps.
1- Identify the attributes of a record that will make up the key of each record.
2- Sort the records on the basis of that key.
3- Organize the sorted data into sets; each set is called a window. Records that fall in the same window are compared with each other to identify duplicates.
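The three steps can be sketched as follows. This is a simplified illustration, not a definitive implementation: the record fields (`id`, `first`, `last`), the key recipe in `make_key`, the use of `difflib` for comparison, and the 0.8 threshold are all assumptions.

```python
from difflib import SequenceMatcher

def make_key(record):
    # Step 1 (assumed recipe): first 3 letters of last name + first letter of first name.
    return (record["last"][:3] + record["first"][:1]).lower()

def similar(r1, r2, threshold=0.8):
    # Assumed comparison: string similarity of the full names.
    a = f'{r1["first"]} {r1["last"]}'.lower()
    b = f'{r2["first"]} {r2["last"]}'.lower()
    return SequenceMatcher(None, a, b).ratio() >= threshold

def sorted_neighborhood(records, window=3):
    # Step 2: sort the records on the key.
    ordered = sorted(records, key=make_key)
    pairs = []
    # Step 3: compare each record only with the window - 1 records before it.
    for i, rec in enumerate(ordered):
        for j in range(max(0, i - window + 1), i):
            if similar(ordered[j], rec):
                pairs.append(tuple(sorted((ordered[j]["id"], rec["id"]))))
    return pairs

# Hypothetical records; 1 and 3 differ only by a typo in the last name.
records = [
    {"id": 1, "first": "Ahad", "last": "Khan"},
    {"id": 2, "first": "Sara", "last": "Malik"},
    {"id": 3, "first": "Ahad", "last": "Kahn"},
]
duplicate_pairs = sorted_neighborhood(records, window=2)
```

Sorting puts records 1 and 3 next to each other, so even the smallest window of 2 flags them as a likely duplicate pair.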
What is a BSN window?
The sorted data is organized into sets, and each set is called a window. Records within the same window are compared with each other to identify duplicates. The window size is at least 2 and at most the entire list.
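To see what the two size limits mean, the sketch below counts which pairs get compared for a given window size. It assumes the keys are already sorted; `candidate_pairs` and the sample keys are hypothetical.

```python
def candidate_pairs(sorted_keys, window):
    """List the pairs of sorted keys that fall inside the same window."""
    if not 2 <= window <= len(sorted_keys):
        raise ValueError("window must be between 2 and the list length")
    pairs = []
    for i in range(len(sorted_keys)):
        # Pair each key with the window - 1 keys just before it.
        for j in range(max(0, i - window + 1), i):
            pairs.append((sorted_keys[j], sorted_keys[i]))
    return pairs

keys = ["ahad", "ahmad", "malik", "sara"]
smallest = candidate_pairs(keys, 2)          # only neighbors are compared
largest = candidate_pairs(keys, len(keys))   # every pair is compared
```

A window of 2 gives 3 comparisons for these 4 keys, while a window covering the entire list gives all 6 possible pairs. A larger window catches more duplicates but costs more comparisons.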
What is selection of keys in the BSN method?
We choose which parts of a record's attributes to use, and those parts are combined to make the key. The records are then sorted on that key before BSN is applied.
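A small sketch of building a key from parts of several attributes. The field names and the recipe (3 letters of the last name, 1 letter of the first name, last 4 phone digits) are assumptions chosen for illustration; the source does not prescribe which parts to use.

```python
def build_key(record):
    """Combine parts of several attributes into one sort key."""
    last = record["last_name"].upper()[:3]
    first = record["first_name"].upper()[:1]
    # Keep only digits so "555-0142" and "5550142" give the same key part.
    digits = "".join(ch for ch in record["phone"] if ch.isdigit())[-4:]
    return last + first + digits

# Hypothetical records; the first and third describe the same person.
records = [
    {"first_name": "Ahad", "last_name": "Khan", "phone": "555-0142"},
    {"first_name": "Sara", "last_name": "Malik", "phone": "555-0199"},
    {"first_name": "A.", "last_name": "Khan", "phone": "5550142"},
]
sorted_keys = sorted(build_key(r) for r in records)
```

A good key places likely duplicates next to each other after sorting, which is what lets a small window find them.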
What is matching candidates in the BSN method?
Merging duplicate records is a complex inferential process.