Topic 6: Database systems Flashcards
Describe a relational database.
A relational database stores data in tables linked via relationships by primary and foreign keys.
Give 2 advantages of a relational database over a flat-file spreadsheet approach.
Data will be stored in one place and not be redundant. For example the customer’s phone number will only be stored in one field of one table.
Data will be independent of the database structure. For example, you could create an application for a shop for entering their sales using the same data tables they already use for marketing.
Data will be consistent from input to processing to output. For example, the date should always remain in the format DD/MM/YY.
Data is more secure because you can set different access levels for different users. A receptionist would see some data, whereas the manager can see all of it.
What is data redundancy?
Data is redundant when it is duplicated unnecessarily. For example, a customer’s address should only be stored in one place, so it can be updated easily.
What is data consistency?
Data must be consistent as it moves from input to processing to output. For example, if the designer decides the date should be in the format 16/03/17, it shouldn’t sometimes appear as 16th March 2017.
What is data integrity?
Data integrity is the correctness of the data. If a value is updated in one place it should be the same updated value that appears in all the applications built on top of that database.
What is data independence?
Data should be independent of the underlying database structure. For example, if a business has an application for tracking sales, you should be able to build another application for them without having to change structure or relationships between the data tables.
What is data normalisation?
A staged mathematical process that removes repeated groups of data.
A staged mathematical process that removes data duplication and inconsistencies.
What is a data warehouse?
A large collection of archived data from multiple sources used for decision making. Huge quantities of data stored in a consistent order to make interrogation more productive. A huge database specifically structured for information access and reporting.
What is data mining?
Data mining is the ‘interrogation’ of a data warehouse to help the organisation make decisions. It involves drilling down into the structure of the data to discover meaningful patterns. Users can identify trends over time and can test theories for accuracy.
Give two advantages to a company of using a data warehouse (note, these are quite similar to the advantages given for data mining)
To store information about every sale. To see who has bought what items and when. To find the most popular product. To target customers with special offers. To plan future changes or developments in their business. To use data mining…
Give two advantages of data mining to an organisation.
Identify a list of customers likely to buy a certain product, which they can then use to target with a mail shot. Make comparisons with competitors. Run ‘what if’ queries for modelling exercises. Predict future sales. Find the best locations for new shops. Analyse sales patterns. Test analysis of results for plausibility. Create reports to help decision making.
A hospital unit uses a relational database. A patient is allocated to a ward and to a physiotherapist. One table in this database is structured as follows: tblWard (WardID, WardName, Capacity, FreeBeds) Give two other tables you could expect to see in this database.
Something like:
tblPatient (PatientID, FirstName, Surname, DateOfBirth, BloodType, Address, NextOfKin, WardID*, PhysioID*)
tblPhysio (PhysioID, FirstName, Surname, Phone)
Define a distributed database.
A distributed database has data stored on a number of computers at different locations, but it appears as one logical database to the user. Users access the data as if it were held centrally in a single source.
What are the advantages to a large company of having a distributed database?
Less network traffic. Data is held closer to where is it processed. This means less chance of bottlenecks in the network and data queries will run more quickly.
No ‘single point of failure’. If a network connection is lost, it is likely that data will still be available to users.
Improved security because data is not all held in one place that could be vulnerable to attack or accidental loss.
Resilience - some data is replicated, so applications will continue to run if one copy of the data is lost.
What are the disadvantages of using a distributed
database?
They are more complex so require experienced staff to manage and maintain.
Data must be synchronised across multiple locations. This can lead to corruption, especially when trying to restore after a crash.
Transfer of data across networks could be vulnerable to attack or loss, unless encryption is used.
Security passwords must be enforced at all locations. If an employee leaves, their account must be deleted on all systems.
All locations must be protected from viruses with anti-virus software and employee code of conduct.
Data may become inaccessible if a critical server fails.