6. Organisation of Data Flashcards

Question 1

Q

What is a file?

Answer

A

A collection of related records or data handled as a single unit. It has a filename, which users can use to access the data at a later time.

Question 2

Q

What is a record?

Answer

A

A collection of related fields. A field is a single piece of data about an entity, e.g. surname is a field that could be in customer data.

Question 3

Q

What is a fixed length record?

Answer

A

A record whose length is set in advance and cannot be changed. If data is too large to fit, it is truncated.

Examples of fields suited to a fixed length:
- DOB
- gender
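The truncation behaviour can be sketched in Python with the standard `struct` module. The layout (10 bytes for surname, 8 for DOB, 1 for gender) and the function names are assumptions made for illustration only.

```python
import struct

# Assumed layout: 10-byte surname, 8-byte DOB (YYYYMMDD), 1-byte gender.
# "<" disables padding between fields, so every record is exactly 19 bytes.
RECORD_FORMAT = "<10s8s1s"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)


def pack_record(surname: str, dob: str, gender: str) -> bytes:
    """Pack three fields into one fixed-length record."""
    # struct truncates values that are too long and pads short ones with zero bytes.
    return struct.pack(
        RECORD_FORMAT,
        surname.encode("ascii"),
        dob.encode("ascii"),
        gender.encode("ascii"),
    )


def unpack_record(raw: bytes) -> tuple:
    """Recover the fields, stripping the zero-byte padding."""
    fields = struct.unpack(RECORD_FORMAT, raw)
    return tuple(field.rstrip(b"\x00").decode("ascii") for field in fields)
```

A surname longer than 10 characters loses its tail when packed, while a short surname leaves unused (wasted) bytes in the record.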

Question 4

Q

What is a variable length record?

Answer

A

A record whose length can change depending on what data needs to be stored.

Examples of fields suited to a variable length:
- name
- address
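One common way to store variable length data is to write each field's length in front of it. The sketch below assumes a 2-byte length prefix per field; that choice and the function names are illustrative, not part of the original notes.

```python
import struct


def pack_variable(fields: list) -> bytes:
    """Store each field as a 2-byte length followed by the field's bytes."""
    out = bytearray()
    for field in fields:
        data = field.encode("utf-8")
        out += struct.pack("<H", len(data))  # assumed 2-byte length prefix
        out += data
    return bytes(out)


def unpack_variable(raw: bytes) -> list:
    """Read the fields back by following the length prefixes."""
    fields = []
    pos = 0
    while pos < len(raw):
        (length,) = struct.unpack_from("<H", raw, pos)
        pos += 2
        fields.append(raw[pos:pos + length].decode("utf-8"))
        pos += length
    return fields
```

No padding is stored, but the reader has to walk the length prefixes to find each field, which is part of why variable length records are harder to program and slower to process.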

Question 5

Q

How do fixed length and variable length records compare?

Answer

A

fixed length:
- each field takes up the same number of bytes in every record
- easier to program as it's easier to calculate how much storage is required
- quicker to process
- fields with blank space waste storage

variable length:
- a field can take up a different number of bytes in each record
- harder to program
- slower to process
- no blank space, so only the necessary amount of storage is taken up

Question 6

Q

What is a master file?

Answer

A

A master file stores records of everything that has ever happened and is updated with batches of new information to keep it up to date. Because of this, master files are large and accessed infrequently. Records are stored in a logical, sequential order.

Question 7

Q

What is a transaction file?

Answer

A

A transaction file stores day-to-day data, which is copied to the master file at the end of the day. Data is stored serially (in no fixed order) for a set period of time.

Question 8

Q

How is the master file updated?

Answer

A

The transaction file is sorted in order of primary key. For each record in the sorted transaction file, the matching record in the master file is found and updated. Once every transaction has been processed, a new master file is produced that contains the new data. Error reports and printed reports for use within the company (e.g. gas bills) are also produced.
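The update can be sketched as a single pass over both files. This is a minimal illustration, assuming records are `(key, amount)` pairs held in lists and that a transaction with no matching master record counts as an error; the function name is hypothetical.

```python
def update_master(master, transactions):
    """Return (new_master, errors).

    master: list of (key, balance) pairs, already sorted by key.
    transactions: list of (key, amount) pairs in any order.
    """
    # Step 1: sort the transaction file by primary key.
    sorted_tx = sorted(transactions, key=lambda tx: tx[0])

    new_master = []
    errors = []
    i = 0
    for key, balance in master:
        # Transactions with a key below the current master key match nothing.
        while i < len(sorted_tx) and sorted_tx[i][0] < key:
            errors.append(sorted_tx[i])
            i += 1
        # Apply every transaction that matches this master record.
        while i < len(sorted_tx) and sorted_tx[i][0] == key:
            balance += sorted_tx[i][1]
            i += 1
        new_master.append((key, balance))

    # Anything left over has a key higher than every master record.
    errors.extend(sorted_tx[i:])
    return new_master, errors
```

Because both files are in key order, each one is read only once from start to finish.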

Question 9

Q

What is serial file access?

Answer

A

Serial files store records in the order in which they are added. They are used when no particular order is required. Searching is slow because a linear search must be used; however, adding a record is very quick as it is simply appended to the end of the file.
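A minimal sketch of both operations, using a Python list to stand in for the file and dictionaries with an assumed `id` field as records:

```python
def append_record(serial_file: list, record: dict) -> None:
    """Adding is quick: the record simply goes on the end."""
    serial_file.append(record)


def linear_search(serial_file: list, key):
    """Check every record in turn; return its position, or None if absent."""
    for position, record in enumerate(serial_file):
        if record["id"] == key:
            return position
    return None
```

In the worst case the search has to look at every record, which is why serial files are slow to search.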

Question 10

Q

What is sequential file access?

Answer

A

Sequential files store records in order of record key. When a new record is added, records from the old file are copied to a new file up to the point where the new record needs to be inserted. The new record is then written, and the rest of the records from the old file are copied across. Records in sequential files can be searched for using a binary search.
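The copy-and-insert step and the binary search can be sketched as follows, assuming records are `(key, data)` pairs in lists that stand in for the old and new files:

```python
def insert_sequential(old_file: list, new_record: tuple) -> list:
    """Copy old_file into a new file, slotting new_record in at the right place."""
    new_file = []
    inserted = False
    for record in old_file:
        if not inserted and new_record[0] < record[0]:
            new_file.append(new_record)
            inserted = True
        new_file.append(record)
    if not inserted:
        # The new key is higher than every existing key.
        new_file.append(new_record)
    return new_file


def binary_search(records: list, key):
    """Return the position of key in the sorted records, or None if absent."""
    low, high = 0, len(records) - 1
    while low <= high:
        mid = (low + high) // 2
        if records[mid][0] == key:
            return mid
        if records[mid][0] < key:
            low = mid + 1
        else:
            high = mid - 1
    return None
```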

Question 11

Q

How are records deleted from files?

Answer

A

A new copy of the file is made containing every record except the one being deleted. The original file is then deleted and the new file remains.
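A minimal sketch of this approach, assuming a text file with one comma-separated record per line and the key as the first field (the file layout and function name are assumptions):

```python
import os


def delete_record(path: str, key: str) -> None:
    """Copy every record except the one with the given key, then swap the files."""
    new_path = path + ".new"
    with open(path, "r", encoding="utf-8") as old_file, \
            open(new_path, "w", encoding="utf-8") as new_file:
        for line in old_file:
            # The key is assumed to be the text before the first comma.
            if line.split(",", 1)[0] != key:
                new_file.write(line)
    # Replace the original file with the new copy.
    os.replace(new_path, path)
```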

Question 12

Q

What is an indexed sequential file?

Answer

A

Records within indexed sequential files are split into two components: the index, and the bulk of the record. The index holds the record key and a pointer to where the rest of the record is stored on the disk. When copying the file to add or delete a record, only the indexes need to be copied, and the rest of the record can remain where it is.
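The split between index and record body can be sketched like this. The class and its method names are hypothetical; a list position stands in for a disk address.

```python
import bisect


class IndexedFile:
    """Sketch: a sorted index of (key, position) pairs plus a data area."""

    def __init__(self):
        self.data = []   # record bodies; these never move once written
        self.index = []  # (key, position in self.data), kept sorted by key

    def add(self, key, body):
        self.data.append(body)
        # Only the small index entry has to be placed in key order.
        bisect.insort(self.index, (key, len(self.data) - 1))

    def find(self, key):
        # Positions are never negative, so (key, -1) sorts before any real entry.
        i = bisect.bisect_left(self.index, (key, -1))
        if i < len(self.index) and self.index[i][0] == key:
            return self.data[self.index[i][1]]
        return None

    def delete(self, key):
        # Rebuild the index without the key; the record body is left in place.
        self.index = [entry for entry in self.index if entry[0] != key]
```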

Question 13

Q

What is multilevel indexing?

Answer

A

The top-level index points to a second-level index, which contains pointers to the rest of the record. As many levels as needed can be used. This is useful when different clusters of records are stored in different locations.
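A two-level lookup can be sketched as below. The key ranges, the location strings, and the name `locate` are all made up for illustration.

```python
import bisect

# Second-level indexes: each maps record keys to a record location.
second_level = [
    {101: "disk A, slot 0", 150: "disk A, slot 1"},
    {205: "disk B, slot 0", 290: "disk B, slot 1"},
]

# Top-level index: (highest key covered, position of the second-level index).
top_level = [(199, 0), (299, 1)]


def locate(key: int):
    """Follow the top-level index to a second-level index, then to the record."""
    highs = [high for high, _ in top_level]
    i = bisect.bisect_left(highs, key)
    if i == len(top_level):
        return None  # key is above every range in the top-level index
    return second_level[top_level[i][1]].get(key)
```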

Question 14

Q

What is data validation?

Answer

A

Validation checks data entered by a user before it is committed to storage. It is used to ensure that the input is possible and/or sensible.

Question 15

Q

What are the types of validation?

Answer

A

type check: ensures inputs are of the correct data type
range check: ensures data falls within a specified range
presence check: ensures data has been entered into a field
format check: ensures data conforms to a certain format e.g. dates as dd/mm/yy
length check: ensures inputs are within a certain length
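Each check can be sketched as a small Python function. The function names, the choice of integer for the type check, and the dd/mm/yy pattern are illustrative assumptions.

```python
import re


def type_check(value: str) -> bool:
    """Type check: can the input be read as an integer?"""
    try:
        int(value)
        return True
    except ValueError:
        return False


def range_check(value: int, low: int, high: int) -> bool:
    """Range check: is the value between low and high inclusive?"""
    return low <= value <= high


def presence_check(value: str) -> bool:
    """Presence check: has something other than spaces been entered?"""
    return value.strip() != ""


def format_check(value: str) -> bool:
    """Format check: does the input look like dd/mm/yy?"""
    return re.fullmatch(r"\d{2}/\d{2}/\d{2}", value) is not None


def length_check(value: str, min_len: int, max_len: int) -> bool:
    """Length check: is the input between min_len and max_len characters?"""
    return min_len <= len(value) <= max_len
```

Note that `format_check("99/99/99")` passes: the input has the right shape even though it is not a real date, which is why several checks are usually combined.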

Question 16

Q

What is data verification and what are the two types?

Answer

A

Verification checks that data has been entered consistently, i.e. that what was typed matches what was intended.
Double entry: data is entered twice and the two entries are compared to ensure accuracy e.g. entering a password twice
Proofreading: reading over the input to manually check it
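Double entry can be sketched as a function that reads the value twice and only accepts it if both entries match. The `read_input` parameter is a hypothetical callable supplying each entry (for example, the built-in `input`).

```python
def double_entry(read_input):
    """Ask for the same value twice; return it only if both entries match."""
    first = read_input()
    second = read_input()
    if first != second:
        return None  # mismatch: the user should be asked to try again
    return first
```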

Question 17

Q

What is an archive file?

Answer

A

When files are not used frequently but still need to be kept to be used again in the future, they are archived. This frees up space on the master file and helps speed up the main system, while the archived files can still be accessed if needed.

Question 18

Q

What are some file security methods?

Answer

A

firewalls
encryption
backups
access rights
passwords
transaction logs (incremental backup)

Question 19

Q

What is hashing?

Answer

A

A hashing algorithm converts a record's key into an index, which determines where the record is stored. Each record is placed into a block, which can hold a fixed number of records. The blocks start off empty and are filled as data is entered.
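A minimal sketch, assuming integer keys and a simple modulo hashing algorithm. The number of blocks and the block size are arbitrary values chosen for illustration.

```python
NUM_BLOCKS = 5  # assumed number of blocks in the file
BLOCK_SIZE = 2  # assumed fixed number of records per block

# Every block starts off empty.
blocks = [[] for _ in range(NUM_BLOCKS)]


def hash_key(key: int) -> int:
    """Convert a record key into a block index."""
    return key % NUM_BLOCKS


def store(key: int, record: str) -> bool:
    """Place the record in its block; return False if the block is already full."""
    block = blocks[hash_key(key)]
    if len(block) >= BLOCK_SIZE:
        return False
    block.append((key, record))
    return True
```

Keys 12, 17 and 22 all hash to block 2 here, so the third one no longer fits.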

Question 20

Q

What is random access in terms of searching for a record?

Answer

A

When a computer system jumps to a record directly without having to search through numerous records. The physical location is calculated using a hashing algorithm. Data collisions can occur when two data items are hashed to the same location.

Question 21

Q

How can data collisions be dealt with?

Answer

A

Overflow: when two items are hashed to the same location, one of them is put in the overflow block. The overflow block is searched using a linear search, which is inefficient, so searching becomes slow when it holds many items; at that point a new hashing algorithm is required. However, overflow is the cheapest method.

Creating new files: records are rehashed to a new file which contains more blocks. However, this does not eliminate the possibility of overflow and can be more expensive.
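Both methods can be sketched in one small class. The class name, the modulo hashing algorithm, and the use of integer keys are assumptions for illustration.

```python
class HashedFile:
    """Sketch of a hashed file with a single shared overflow block."""

    def __init__(self, num_blocks: int, block_size: int):
        self.num_blocks = num_blocks
        self.block_size = block_size
        self.blocks = [[] for _ in range(num_blocks)]
        self.overflow = []

    def _hash(self, key: int) -> int:
        return key % self.num_blocks

    def insert(self, key: int, record) -> None:
        block = self.blocks[self._hash(key)]
        if len(block) < self.block_size:
            block.append((key, record))
        else:
            # Collision with a full block: use the overflow block instead.
            self.overflow.append((key, record))

    def find(self, key: int):
        # Jump straight to the block given by the hashing algorithm.
        for stored_key, record in self.blocks[self._hash(key)]:
            if stored_key == key:
                return record
        # Not in its block, so fall back to a linear search of the overflow.
        for stored_key, record in self.overflow:
            if stored_key == key:
                return record
        return None

    def rehash(self, num_blocks: int) -> "HashedFile":
        """Create a new file with more blocks and rehash every record into it."""
        bigger = HashedFile(num_blocks, self.block_size)
        for block in self.blocks:
            for key, record in block:
                bigger.insert(key, record)
        for key, record in self.overflow:
            bigger.insert(key, record)
        return bigger
```

With 3 blocks of size 1, keys 1, 4 and 7 all hash to block 1, so two of them land in overflow; rehashing into 10 blocks gives each its own block, though other keys could still collide.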