1.3.2 Flashcards
Database
An organized collection of data.
Organizing data in a database allows for easy:
- Adding/ modification/ deletion/searching
Tables
In a table:
- The data is stored in a table (entity/file)
- The table contains records (rows/tuples)
- The table contains fields (columns/attributes)
Relational databases
A database needs an actual link to infer this logic, this link = relationship.
An entity is an item of interest about which information is stored.
A relational database is a database which recognises the differences between entities by creating different tables for each entity.
relational database adv and dis
Data is only stored once- No multiple record changes needed, More efficient storage, Simple to delete or modify details
Better security. By splitting data into tables, certain tables can be made confidential.
Avoids inconsistent records
Easier to change data format
Cost- quite expensive. there must be separate software which needs to be purchased.
physical memory since it is with rows and columns.
large organizations use more number of database systems with more tables. These information can be used to be transferred from one system to another. This could pose a risk of data loss.
Flat file
This database consists of a single file/table.
It can be created very easily using either database/spreadsheet software and is often saved as a comma-separated value file (CSV).
Flat file adv and dis
- Very simple
- Require little to set up
- Suitable for storing small amounts of data
- Take up unnecessary space
- Be slow to query
- Become difficult to maintain
Harder to change data format.
Potential duplication. As more and more records are added to the database it becomes difficult to avoid duplicate records.
Flat file uses
Typical uses:
- Storing contact details
- Small product database
- Maintaining a game or music collection
The flat file will most likely be based around a single entity and its attributes.
Flat files are typically written out in the following way:
Entity1(Attribute1, Attribute2, Attribute3 …)
1:1
1:1
One student has one planner, and one student planner belongs to one student.
1:M
1:M
One tutor group contains many students, or many students belong to one tutor group.
M:M
M:M
One student can be taught many teachers or one teacher can teach many students.
Primary key
This is a unique identifier for each record in the table.
The primary key is shown by underlining it.
Why would a primary key not be a suitable
It is not unique
Foreign key
This is the attribute which links two tables together.
The foreign key will exist in one table as the primary key and act as the foreign key in another.
The foreign key is shown using an asterisk.
Indexes
Database automatically maintains an index of primary keys so that a specific primary key can be located quickly and easily.
It provides a position of each record according to its primary key, makes it quicker.
Secondary key
It allows a database to be searched quickly.
Therefore, a secondary index (secondary key) is set up on the surname attribute. This makes it possible to order and search by surname which makes it easier to find specific students in the database.
why would it be suitable secondary key
It is likely to be used to search for / index
Handling databases
Handling databases goes in four ways:
- Capturing
- Selecting
- Managing
- Exchanging
Capturing
Paper based forms.
A large amount of data is still captured using paper-based data capture forms.
Data input is mainly manual: human reading the from and typing the info into a computer-based system.
To help reduce the chance of the person inputting the data incorrectly:
- Every part of the form clearly labelled
- Instruction to complete the form in a black pen
- Instruction to complete form in capital letters
- Use of tick boxes
- Squares for entering each latter separately
Optical character recognition
CD Often used in post office and road cameras.
To speed up + automate data input, we use OCR.
It automatically reads text by interpreting the shape of the letters, works better with printed text than handwriting.
Optical mark recognition
CD Often used for multiple choice texts and lottery tickets.
The fast and efficient way of collecting data and inputting it in a database while significantly reducing the possibility of human error.
Selecting data SQL
Once captured the info in a DB, we need an efficient way to retrieve it.
Structured query language
SQL allows for fast and efficient retrieval, selection and manipulation of data using a simple set of commands: SELECT/FROM/WHERE
Query by example
SD First graphical query language, making use of visual tables where the user would enter commands and conditions.
Once a query is built using QBE, its converted into statements that can be executed against the database.
By utilizing QBE, the user doesn’t have to remember the finer details of SQL syntax.
DBMS
MD Provide a layer of abstraction for the user and programmer.
Hides the underlying structure of the data and ensures it remains integral by:
- Preventing the creation of duplicate primary keys
- Enforcing validation rules
- Providing secure access
- Providing encryption
- Providing program data independence
- Managing multiple users
Exchanging data
Last of all, we need to make sure we can exchange data between different databases/other systems/applications such as:
- Spreadsheet software
- Accounting programs
To do this, we need to consider: Common formats used for exchanging data/ manual methods + of automatic exchange data exchange
Common formats
XML and JSON, both human readable, open formats for structuring data. Both designed for strong and transporting data.
One system exports data in, another system can accept the same data and be sure of the format it will arrive in.
CSV is another format, each record is stored on a separate line in the file, and each field is separated by a comma.
The structure is fixed and know, so import routines can be written to extract the data from a CSV file.
Once the format for data exchange has bene agreed, we still need to exchange the data and consider how data will move from one system to another.
Manual methods
- Memory stick
- Optical media
- Removable hard disk
- Paper-based
Automatic methods
Two databases interface with one another so they can read and write directly to and from each other tables: a live connection method is known as electronic data interchange (EDI) which is a protocol between 2 systems to facilitate the exchange of data.
- Increases the speed of data transmission and the efficiency of processes for users.
- This process can be entirely automated without any human involvement.
normalising
A way of structuring the data in a relational database according to formal rules, in order to avoid problems of efficiency and security in accessing and maintain the data
0NF
flat file before normalisation
1NF
All field names must be unique
Values in fields must be the same domain (same type of data)
Values in fields must be atomic (one item in each field)
No two records can be identical (remove duplicates)
The table has a primary key (composite is easiest to move to 2NF)
2NF
Be in 1NF
Partial dependencies removed
(move fields to new tables to remove duplicate fields)”
Must remove any duplicates created to continue to be in 1NF
Primary/foreign keys must be added to be in 1NF & 2NF
Linking tables used to remove M:M relationships
3NF
Be in 2NF
Remove transient dependencies
(move or remove fields dependent on non-key fields)
problems that arise from
transaction processing, and how these can be overcome
record locking prevents the overriding of data, and understand how record locking takes place
why would a form be considered 1NF
-No Repeating fields/data
-Data is atomic
-Has a primary Key
why would a form be considered 2NF
Is in First Normal Form
-Every field is dependent on the primary key.
methods/format to provide data
CSV/Comma Separated Value (file)…
- A (text) file/format with values separated by
commas (or some other delimiter)
- XML/eXstensible Markup Language…
- …A markup language that uses tags to denote
data. - SQL/Structured Query Language…
- …A language for creating/querying databases
SQL
A declarative language, meaning it expresses what needs to be achieved as opposed to a procedural language, which would express the logic of how something is achieved, used to manipulate databases.
It enables efficient retrieval, deletion and manipulation of data held in RDB.
Scope of SQL allows us to:
- Query data
- Manipulate data
- Define data
- Control data access
SQL select
SELECT statement is used to collect fields from a given
table and can be paired with the FROM statement to specify
which table(s) the information will come from.
SQL order
The ORDER BY part of the code specifies whether you want it in ascending or descending
order. Values are automatically placed in ascending order and adding ‘Desc’ to the end of
statement will cause values to be displayed in descending order
SQL insert
SQL delete
SQL update
referential integrity
Referential integrity is a data quality concept that ensures that when you make changes to data in one place, those changes are reflected in other related records.
This is done by enforcing a rule that says that the foreign key in one table can only refer to the primary key of another table. This means that if you change the information in one column, it will automatically be updated in all other related columns.
transaction processing
ACID
Atomicity Consistency Isolation Durability.
A relational database that fails to meet any of these four goals cannot be considered reliable.
atomicty
Atomicity guarantees that there are only two possible outcomes from a task (known as a transaction) that involves changing multiple sets of interdependent data: either every step in the task completes successfully and can be committed to the database as a single unit or, if one step should fail, all other steps can be undone (“rolled back”), leaving the global state of the database unchanged.
consistency
Consistency is the capability of the database engine to protect a database from any change of state that could leave any set of data in out-of-synch with any other set.
Referential integrity is the process of ensuring consistency. This ensures that information is not removed if it is required elsewhere in a linked database.
The practical implementation of such dependencies is by means of declarative referential integrity constraints (“foreign keys”) enforced by automatically-generated triggers.
isolation
Isolation refers to the capability of the database to allow each user (or transaction) to operate as though it were the only user (or transaction).
The isolation mechanism must be capable of keeping each transaction’s view of database state consistent as long as that transaction is running, regardless of any changes that are performed by other transactions.
durability
Durability ensures that changes made to the database (transactions) that are successfully committed will survive permanently, even in the case of system failures. They will be committed to permeant storage. This ensures that the data within the database will not be corrupted by:
Service outages
Crashes
Other cases of failure
Durability is achieved through the use of changelogs that are referenced when databases (or portions of the database) are restarted.
record lock
The process of preventing simultaneous access to records in a database is called record locking and it is used in order to prevent inconsistencies or a loss of updates. While one person is editing a record, this ‘locks’ the record so others cannot access the same record.
Redundancy
There is some information that people and companies cannot afford to lose. Redundancy is the process of having one or more copies of the data in physically different locations. This means that if there is any damage to one copy the others can be recovered.