2.5 Databases Flashcards
What is meant by Big Data
Refers to data sets so large and complex that they become difficult to process using standard database techniques
Data mining
The analysis of a large amount of data in a data warehouse to discover patterns and relationships
Predictive analysis and Example
Consists of a variety of statistical techniques, including modelling, machine learning and data mining.
Example: In business, predictive models analyse patterns found in historical and transactional data to identify risks and opportunities.
Data normalisation (1NF,2NF,3NF)
Moving from:
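· unnormalised data -> 1NF involves ensuring there are no repeating attributes and that every attribute is atomic
· 1NF -> 2NF involves ensuring there are no partial dependencies (no non-key attribute depends on only part of a composite primary key)
· 2NF -> 3NF involves ensuring there are no non-key (transitive) dependencies, so all data items depend on nothing but the primary key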
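The first two steps can be sketched in plain Python. This is only an illustration: the student and course data are made up, and the composite key (student_id, course) is an assumption chosen to show a partial dependency.

```python
# Hypothetical unnormalised records: "courses" is a repeating, non-atomic attribute.
unnormalised = [
    {"student_id": 1, "name": "Ann", "courses": "Maths,Physics"},
    {"student_id": 2, "name": "Bob", "courses": "Maths"},
]

# 1NF: one atomic value per field, so each course gets its own row.
first_nf = [
    {"student_id": r["student_id"], "name": r["name"], "course": c}
    for r in unnormalised
    for c in r["courses"].split(",")
]

# 2NF: with the composite key (student_id, course), "name" depends only on
# student_id (a partial dependency), so it moves into its own table.
students = {r["student_id"]: r["name"] for r in first_nf}
enrolments = [(r["student_id"], r["course"]) for r in first_nf]
```

After the split, each student name is stored once, so it cannot become inconsistent between rows.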
Primary key and foreign key
PK - a field that uniquely identifies a record in a table
FK - a field in a table which links to a primary key in another table
This enables data in different tables to be linked together
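A minimal sketch of the idea using Python's built-in sqlite3 module. The table and field names (course, student, course_id) are hypothetical, chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
conn.execute("CREATE TABLE course (course_id INTEGER PRIMARY KEY, title TEXT)")
conn.execute(
    "CREATE TABLE student ("
    " student_id INTEGER PRIMARY KEY,"
    " name TEXT,"
    " course_id INTEGER REFERENCES course(course_id))"  # foreign key
)
conn.execute("INSERT INTO course VALUES (1, 'Computer Science')")
conn.execute("INSERT INTO student VALUES (10, 'Ann', 1)")

# The foreign key lets the two tables be linked in a query.
rows = conn.execute(
    "SELECT student.name, course.title FROM student"
    " JOIN course ON student.course_id = course.course_id"
).fetchall()

# A student pointing at a course that does not exist is rejected.
try:
    conn.execute("INSERT INTO student VALUES (11, 'Bob', 99)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
```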
Indexes definition
An index is a list of key fields to improve access times to records and sort the records
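As a rough illustration, an index can be modelled as a sorted list of (key, position) pairs kept alongside the unsorted records. The records and the find helper are hypothetical; a real DBMS would normally use a more elaborate structure such as a B-tree.

```python
import bisect

# Hypothetical records stored in arrival order (position = list index).
records = [
    {"id": 42, "name": "Ann"},
    {"id": 7, "name": "Bob"},
    {"id": 19, "name": "Cat"},
]

# The index: a sorted list of (key, position) pairs.
index = sorted((rec["id"], pos) for pos, rec in enumerate(records))
keys = [key for key, _ in index]

def find(key):
    """Binary-search the index, then jump straight to the record."""
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return records[index[i][1]]
    return None

# Reading the records through the index yields them in key order.
in_key_order = [records[pos]["name"] for _, pos in index]
```

The lookup avoids scanning every record, and the same index gives the sorted order without moving the records themselves.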
Data consistency definition
For data to be consistent, it must be added only if it satisfies the rules of the database.
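One way a database can enforce such rules is with constraints. The sketch below uses sqlite3 with a made-up rule (age between 11 and 19) purely as an example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    " student_id INTEGER PRIMARY KEY,"
    " age INTEGER NOT NULL CHECK (age BETWEEN 11 AND 19))"  # hypothetical rule
)

conn.execute("INSERT INTO student VALUES (1, 16)")  # satisfies the rule

try:
    conn.execute("INSERT INTO student VALUES (2, 150)")  # breaks the rule
except sqlite3.IntegrityError:
    pass  # the database refuses to add the inconsistent record

stored_ids = [row[0] for row in conn.execute("SELECT student_id FROM student")]
```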
Different views of the data
3 points
Allow users to access/read/write to/amend/delete only part of the DB
Allow database users to access only certain records or certain fields
May link tables together so that the user's view is as if there were only one table
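The points above can be sketched with an SQL view in sqlite3. The tables and the view name class_list are hypothetical; the view hides the address field and presents two linked tables as one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE course (course_id INTEGER PRIMARY KEY, title TEXT)")
conn.execute(
    "CREATE TABLE student ("
    " student_id INTEGER PRIMARY KEY, name TEXT, address TEXT, course_id INTEGER)"
)
conn.execute("INSERT INTO course VALUES (1, 'Computer Science')")
conn.execute("INSERT INTO student VALUES (10, 'Ann', '1 High St', 1)")

# The view exposes only some fields and joins the tables behind the scenes.
conn.execute(
    "CREATE VIEW class_list AS"
    " SELECT student.name AS name, course.title AS course"
    " FROM student JOIN course ON student.course_id = course.course_id"
)

cursor = conn.execute("SELECT * FROM class_list")
view_columns = [col[0] for col in cursor.description]
view_rows = cursor.fetchall()
```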
Describe three problems associated with paper-based systems and how computerised databases could solve these problems
- Difficult and time-consuming to amend / easy to make mistakes
S. It is easy to amend / update data in a database to minimise errors
- Difficult to encrypt so accessible if stolen
S. Easy to encrypt so not compromised if stolen
- Difficult for multiple persons to look at the same record
S. Many people can view the same record (only one can update)
Describe four benefits (for the college) of using a computerised database system.
· Database would be easy and quick to search for a student or course details
· Easy to back up student or course details in a computerised database
· It is easy to overwrite / amend / update student or course details in a database
· Database allows different access rights for different college staff
Verification and its purpose
Verification checks are carried out when data is being entered and when data is being transferred from one place to another
The purpose is to ensure that data are consistent and have not been corrupted
Double entry verification
Ask the customer to type the password twice and compare both inputs to check that they are the same
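A minimal sketch of the comparison step. The function name and the sample passwords are made up for illustration.

```python
def double_entry_verified(first_entry: str, second_entry: str) -> bool:
    """Return True only when both typed entries are identical."""
    return first_entry == second_entry

accepted = double_entry_verified("s3cret!", "s3cret!")  # entries match
rejected = double_entry_verified("s3cret!", "s3crte!")  # typing slip in second entry
```

Note that this only confirms the two entries agree. It does not check that the value itself is sensible, which is the job of validation.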
Data validation techniques
…
Data dictionary definition
A list of information about all the fields used in a database
It will usually include the table names, fields, primary keys and the field validation
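As an illustration, part of a data dictionary can be read back from the database itself. This sketch uses SQLite's PRAGMA table_info on a hypothetical student table; the dictionary layout (table, field, type, required, primary_key) is an assumption, not a standard format.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)

# PRAGMA table_info returns one row per field:
# (cid, name, type, notnull, default value, pk)
data_dictionary = [
    {
        "table": "student",
        "field": name,
        "type": col_type,
        "required": bool(notnull),
        "primary_key": bool(pk),
    }
    for _cid, name, col_type, notnull, _default, pk in conn.execute(
        "PRAGMA table_info(student)"
    )
]
```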
Outline the role of a database administrator
The person in a company who is responsible for the structure, security and management of the database system and the data in it
Describe why distributed databases are often used and identify one difficulty associated with using distributed databases. Explain what is actually distributed in a distributed database.
It is often more efficient to store data on a number of different computers to maximise performance.
It is difficult to ensure that all the data in all the computers is always up-to-date / maintain integrity.
Both processing and data are distributed across the different computers on which the data is stored
How a random access file operates
6 Marks
· Physical location for new record is calculated from the key field
· A hashing algorithm is used for this calculation to find the location
· If a collision occurs (the location is already occupied), the record is stored in an overflow area instead
· Data in the overflow area is normally stored and searched in a linear manner
· File may need reorganising if overflow becomes too large
· Existing records are accessed in the same way.
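The steps above can be sketched in memory as follows. The table size, the modulo hashing algorithm and the sample keys are assumptions chosen for illustration; a real file would map slot numbers to byte positions on disk, and this sketch does not handle updating an existing key.

```python
TABLE_SIZE = 7  # hypothetical number of record slots in the main file area

slots = [None] * TABLE_SIZE  # main area, addressed directly by slot number
overflow = []                # overflow area, stored and searched linearly

def hash_key(key: int) -> int:
    """Hashing algorithm: calculate the location from the key field."""
    return key % TABLE_SIZE

def store(key: int, data: str) -> None:
    pos = hash_key(key)
    if slots[pos] is None:
        slots[pos] = (key, data)
    else:
        overflow.append((key, data))  # collision: location already occupied

def fetch(key: int):
    """Existing records are found by the same calculation."""
    pos = hash_key(key)
    if slots[pos] is not None and slots[pos][0] == key:
        return slots[pos][1]
    for stored_key, data in overflow:  # linear search of the overflow area
        if stored_key == key:
            return data
    return None

store(10, "Ann")  # 10 % 7 == 3, slot 3 is free
store(17, "Bob")  # 17 % 7 == 3, collision, goes to overflow
store(5, "Cat")   # 5 % 7 == 5, slot 5 is free
```

As the overflow list grows, more lookups fall back to the slow linear search, which is why the file may need reorganising.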
Explain what is meant by data normalisation in a relational database. With benefits
Normalisation:
· is a way of structuring data according to theoretical rules
· normalising data usually reduces data duplication/redundancy
· avoids danger of inconsistency / maintains integrity
· avoids danger of data being lost during update
· avoids wasting processing time
· probably enables easier maintenance of the database
· allows different views of the data
Describe the difference between flat file and relational database systems
A flat file system may contain a number of single tables with no links between them, whereas a relational database normally contains a number of linked tables
Advantages of using a distributed database
5 Marks
· Resilient. A problem in one site will not stop other sites from working.
· Security. Staff access can be limited to only their portion of the database.
· Network traffic is reduced so reducing bandwidth costs.
· A single site database still works even if the connection between sites is temporarily broken
· Expense. May be either cheaper or more expensive, but the answer has to be properly qualified (justified).
Advantages of a relational over a flat file
Redundancy is reduced
Risk of inconsistent data is reduced
Data independence allows different views of the same data
Allows easy extension to the structure of the database