Exchanging Data Flashcards
Lossy Compression
Actual data is removed from the file in order to reduce its size. An algorithm is used to strip out the least important data. The original file cannot be restored as data is physically removed.
Lossless Compression
Reduces the file size but allows the original data to be perfectly reconstructed from the compressed data. No information is lost: redundant data is encoded more compactly, in such a way that the original can be recreated exactly.
Less effective at reducing file size than lossy compression.
Run Length Encoding
Run length encoding is used when data contains runs of repeated values. It stores each value once, together with a count of how many times it is repeated in a row.
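A minimal sketch of run length encoding on a string, assuming each run is stored as a (character, count) pair. The function names rle_encode and rle_decode are illustrative, not part of any standard.

```python
from itertools import groupby

def rle_encode(text):
    """Collapse each run of repeated characters into a (character, count) pair."""
    return [(char, len(list(run))) for char, run in groupby(text)]

def rle_decode(pairs):
    """Rebuild the original string by repeating each character 'count' times."""
    return "".join(char * count for char, count in pairs)

encoded = rle_encode("AAAABBBCCD")
print(encoded)              # [('A', 4), ('B', 3), ('C', 2), ('D', 1)]
print(rle_decode(encoded))  # AAAABBBCCD
```

Because decoding gives back exactly the original string, this is a lossless technique. It only saves space when the data really does contain long runs.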
Dictionary Coding
Dictionary coding builds a dictionary in which each distinct data item in the file is recorded against an index. The compressed file consists of the dictionary together with the sequence of index references needed to recreate the original file.
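A minimal sketch of dictionary coding, assuming the "data items" are the words of a piece of text separated by single spaces. The function names are illustrative.

```python
def dictionary_encode(text):
    """Replace each word with its index in a dictionary of unique words."""
    dictionary = []   # unique words, in order of first appearance
    positions = {}    # word -> its index in the dictionary
    sequence = []     # the original text as a list of indexes
    for word in text.split(" "):
        if word not in positions:
            positions[word] = len(dictionary)
            dictionary.append(word)
        sequence.append(positions[word])
    return dictionary, sequence

def dictionary_decode(dictionary, sequence):
    """Look each index up in the dictionary to recreate the original text."""
    return " ".join(dictionary[i] for i in sequence)

dictionary, sequence = dictionary_encode("the cat sat on the mat the end")
print(dictionary)  # ['the', 'cat', 'sat', 'on', 'mat', 'end']
print(sequence)    # [0, 1, 2, 3, 0, 4, 0, 5]
```

The repeated word "the" is stored once in the dictionary and referred to three times by its index, which is where the saving comes from.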
Purpose of Compression
- Reduce download times
- Reduce requirements on file storage
- Make best use of bandwidth
uses of lossy
multimedia files e.g. MP3, JPEG, MPEG
uses of lossless
text files and computer programs
Encryption
Encryption is the process of encoding a message so that it can be read only by the sender and the intended recipient.
Symmetric Encryption
Symmetric encryption is when the same key is used to encrypt and decrypt the message. Both parties must know the key and keep it secret.
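A toy illustration of the "same key both ways" idea, using a repeating-key XOR. This is an assumption chosen for simplicity and is not secure; real systems use algorithms such as AES. The key value is hypothetical.

```python
def xor_cipher(data: bytes, key: bytes) -> bytes:
    """XOR each byte with the repeating key. Applying it twice restores the data."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

shared_key = b"secret"  # hypothetical key known to both parties
ciphertext = xor_cipher(b"meet at noon", shared_key)
plaintext = xor_cipher(ciphertext, shared_key)  # the same key decrypts
print(plaintext)  # b'meet at noon'
```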
Negatives of Symmetric encryption
There is a security risk as the key may be intercepted or the process of creating the key may be duplicated meaning the data can be decrypted by a third party.
Asymmetric Encryption
Asymmetric encryption (public key cryptography) is when a public key and a private key are used. The public key is used to encrypt the data and the private key is used to decrypt the data. If you use person X’s public key to encrypt the data, only person X’s private key will be able to decrypt it.
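A toy sketch of the public/private key idea using RSA with very small primes. The numbers are hypothetical and far too small to be secure, and real RSA also adds padding; the point is only that one key encrypts and a different key decrypts.

```python
# Toy RSA key pair built from two small primes (illustrative values only).
p, q = 61, 53
n = p * q                  # 3233, shared by both keys
phi = (p - 1) * (q - 1)    # 3120
e = 17                     # public exponent, chosen coprime to phi
d = pow(e, -1, phi)        # private exponent: modular inverse (Python 3.8+)

def encrypt(message: int, public_key) -> int:
    exponent, modulus = public_key
    return pow(message, exponent, modulus)

def decrypt(ciphertext: int, private_key) -> int:
    exponent, modulus = private_key
    return pow(ciphertext, exponent, modulus)

public_key, private_key = (e, n), (d, n)
ciphertext = encrypt(65, public_key)      # anyone can do this with the public key
print(decrypt(ciphertext, private_key))   # 65, only the private key recovers it
```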
Hashing
Hashing is the process used to transform a data item into something different. A hashing function provides a mapping between an arbitrary length input and a fixed length output. It is a one-way transformation meaning you cannot get back to the original form.
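A short illustration of the fixed-length property, assuming SHA-256 from Python's standard hashlib module as the hashing function. The helper name sha256_hex is illustrative.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA-256 hash of the text as a hexadecimal string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Inputs of very different lengths give outputs of the same length.
for text in ["a", "a much longer input string than the first one"]:
    print(len(sha256_hex(text)), sha256_hex(text))
```

Both lines printed start with 64, because a SHA-256 hash is always 256 bits (64 hexadecimal characters) regardless of the input size.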
Uses of Hashing
- generating disk address
- storing and checking passwords
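A sketch of how password storage and checking might work: only a salted hash is stored, and a login attempt is hashed in the same way and compared. The use of PBKDF2, the iteration count and the function names are assumptions for illustration.

```python
import hashlib
import hmac
import os

def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a hash from the password and a salt using PBKDF2 (SHA-256)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)

def check_password(attempt: str, salt: bytes, stored_hash: bytes) -> bool:
    """Hash the attempt the same way and compare it with the stored hash."""
    return hmac.compare_digest(hash_password(attempt, salt), stored_hash)

salt = os.urandom(16)                              # random salt, stored with the hash
stored_hash = hash_password("correct horse", salt)  # the password itself is never stored

print(check_password("correct horse", salt, stored_hash))  # True
print(check_password("wrong guess", salt, stored_hash))    # False
```

Because hashing is one-way, someone who steals the stored hash cannot simply reverse it to read the password.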
digital signature
Data is encrypted using the sender's private key. If the receiver can decrypt it using the sender's public key, they know that the message is authentic.
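A toy sketch of signing and verifying, assuming a tiny RSA key pair (hypothetical values, not secure) and assuming that a hash of the message is signed rather than the whole message, which is the common practice.

```python
import hashlib

# Toy RSA key pair (n = 61 * 53). Illustrative values only.
n, e, d = 3233, 17, 2753   # (e, n) is public, (d, n) is private

def digest(message: bytes) -> int:
    """Hash the message and reduce it to a number smaller than n."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n

def sign(message: bytes) -> int:
    """The sender transforms the digest with their private key."""
    return pow(digest(message), d, n)

def verify(message: bytes, signature: int) -> bool:
    """The receiver reverses it with the sender's public key and compares digests."""
    return pow(signature, e, n) == digest(message)

signature = sign(b"pay Bob 10")
print(verify(b"pay Bob 10", signature))  # True
```

With a key this small, a tampered message could occasionally pass by coincidence; real keys are thousands of bits long so that this is not feasible.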
Uses of asymmetric encryption
Used for transferring data, e.g. online shopping
Uses of symmetric encryption
Used when the same person is accessing and saving data e.g. for backing up
benefits of symmetric
- encrypted very quickly
- simple and easy
Benefits of asymmetric
- no movement of keys, more secure
- can be used for digital signatures
drawbacks of asymmetric
- not as fast
Database
Databases are structured, persistent collections of data.
Flat File
single table database. It is inefficient as it is difficult to query and leads to data redundancy which can cause errors.
Relational Database
A relational database has more than one related table. It is more efficient than a flat file database because data duplication is minimised, which makes the data easier to maintain and query.
Entity Relationship Modelling
when the relationships between tables are shown in abstracted view.
relationships
one-to-one
one-to-many
many-to-many
primary key
A field that uniquely identifies each record
Foreign key
A foreign key is a field in one table that refers to the primary key of another table, linking the two tables.
composite key
A composite key is a combination of two or more columns in a table that can be used to uniquely identify a row.
Secondary key
any field in a database which is not a primary, candidate or foreign key. They are used to order queries.
Method of capturing data (Automated)
barcode readers, scanners, sensors, magnetic ink character recognition, smart card readers
Method of capturing data (Manual)
Paper data capture form (read by OCR & OMR or typed in manually)
Data redundancy
The unnecessary duplication of data in a database
Variable length fields
an element may use a different number of bytes to another element for example a different number of characters.
Means it only uses the necessary amount of storage
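A small sketch comparing the storage used by fixed and variable length fields. The 20-byte field width and the one-byte length prefix are assumptions chosen for illustration.

```python
names = ["Al", "Beatrice", "Christopher"]

# Fixed length: every name is padded with spaces to 20 bytes.
FIXED_WIDTH = 20
fixed = b"".join(name.encode("ascii").ljust(FIXED_WIDTH) for name in names)

# Variable length: each name is stored as a 1-byte length followed by its characters.
variable = b"".join(bytes([len(name)]) + name.encode("ascii") for name in names)

print(len(fixed))     # 60
print(len(variable))  # 24
```

The variable length version uses less storage, but a program must read each length byte to find where the next field starts, whereas fixed length fields can be located by simple arithmetic.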
Hashing Databases
transforms a string of characters in a record into a shortened form that can be used as a disk address
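A minimal sketch of a hashing function that turns a key into a disk address, assuming a simple "sum of character codes modulo the number of slots" scheme. The function name and the slot count of 100 are illustrative.

```python
def disk_address(key: str, num_slots: int = 100) -> int:
    """Add up the character codes of the key and wrap the total into the slot range."""
    return sum(ord(char) for char in key) % num_slots

print(disk_address("CAT"))  # 16
print(disk_address("ACT"))  # 16, a collision: two keys map to the same address
```

The collision between "CAT" and "ACT" shows why a good hashing algorithm should spread keys out so that clashes are rare.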
Attribute/field
the columns (describes the characteristics of each record)
Record/Tuple
the rows (a data set that applies to one item)
Normalisation
Organising the attributes and relations of a relational database to minimise redundancy
Atomic
data is in its lowest level of detail and cannot be split into separate attributes
partial key dependency
When an attribute depends on only part of a composite primary key, rather than on the whole key.
non key dependency
where the value of an attribute is determined by the value of another attribute which is not part of the key.
all attributes are dependent on..
the key, the whole key, and nothing but the key
first normal form
- Eliminate duplicate columns
- Get rid of any groups of repeating data
- Identify the primary key
- Separate out any attributes which are not atomic into separate attributes
second normal form
- Check the data is already in 1NF
- Remove any partial dependencies
- ‘fix’ any many to many relationships you discover
third normal form
- Check the data is already in 2NF
- Check there are no non-key dependencies
SQL
SQL is a language which allows for fast, efficient querying and reporting of vast amounts of data held in a relational database. It is a very high-level language.
create a new table SQL
CREATE TABLE tblName
Insert data into a table SQL
INSERT INTO tblName
Query SQL
SELECT *
FROM tblName
WHERE conditions
Order Queried data SQL
ORDER BY
update attribute SQL
UPDATE tblName SET attribute = value WHERE conditions
Condition between two values SQL
WHERE x BETWEEN a AND b
Check if an attribute is similar to a given value SQL
LIKE
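The SQL commands on the cards above can be tried out with Python's built-in sqlite3 module. This is a sketch only: the table tblStudent and its columns are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # temporary in-memory database

# CREATE TABLE and INSERT INTO
conn.execute("CREATE TABLE tblStudent (id INTEGER PRIMARY KEY, name TEXT, mark INTEGER)")
conn.executemany(
    "INSERT INTO tblStudent (id, name, mark) VALUES (?, ?, ?)",
    [(1, "Ada", 72), (2, "Alan", 55), (3, "Grace", 64)],
)

# UPDATE ... SET changes an attribute of the matching records
conn.execute("UPDATE tblStudent SET mark = 60 WHERE name = 'Alan'")

# SELECT ... WHERE ... BETWEEN ... ORDER BY
in_range = conn.execute(
    "SELECT name, mark FROM tblStudent WHERE mark BETWEEN 60 AND 70 ORDER BY name"
).fetchall()
print(in_range)  # [('Alan', 60), ('Grace', 64)]

# LIKE with the % pattern matches names starting with A
a_names = conn.execute(
    "SELECT name FROM tblStudent WHERE name LIKE 'A%' ORDER BY name"
).fetchall()
print(a_names)   # [('Ada',), ('Alan',)]
```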
Data Integrity
Data integrity is the maintenance and consistency of data in a data store.
The data store must reflect the reality that it represents.
Referential Integrity
Referential Integrity is where table relationships must always be consistent.
Any foreign key field must agree with the primary key that is referenced by the foreign key. Thus, any changes to the primary key field must be applied to all foreign keys in another table, or not at all.
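A sketch of a DBMS enforcing referential integrity, using Python's sqlite3 module. The tables tblCustomer and tblOrder are hypothetical, and SQLite only checks foreign keys once the PRAGMA shown is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce foreign keys
conn.execute("CREATE TABLE tblCustomer (customerID INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE tblOrder ("
    "orderID INTEGER PRIMARY KEY, "
    "customerID INTEGER REFERENCES tblCustomer(customerID))"
)
conn.execute("INSERT INTO tblCustomer VALUES (1, 'Ada')")
conn.execute("INSERT INTO tblOrder VALUES (100, 1)")  # fine: customer 1 exists

try:
    conn.execute("INSERT INTO tblOrder VALUES (101, 99)")  # customer 99 does not exist
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rejected)  # True
```

The second order is refused because its foreign key does not match any primary key in tblCustomer.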
Transaction Processing
Any information processing which is divided into individual, indivisible operations. Each operation must succeed or fail as a complete unit.
ACID
Atomicity
Consistency
Isolation
Durability
Atomicity
a change to a database is either completely performed or not performed at all. A half-completed change MUST NOT be saved back to the database.
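A sketch of atomicity using Python's sqlite3 module: a transfer between two accounts either happens completely or not at all. The table tblAccount, the rule that balances cannot go negative, and the transfer function are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblAccount (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO tblAccount VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(conn, source, target, amount):
    """Move money between accounts as one transaction. Returns True if it committed."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE tblAccount SET balance = balance + ? WHERE name = ?", (amount, target)
            )
            conn.execute(
                "UPDATE tblAccount SET balance = balance - ? WHERE name = ?", (amount, source)
            )
        return True
    except sqlite3.IntegrityError:
        return False

print(transfer(conn, "A", "B", 500))  # False: A cannot afford it, so B's credit is undone
print(dict(conn.execute("SELECT name, balance FROM tblAccount")))  # {'A': 100, 'B': 50}
print(transfer(conn, "A", "B", 30))   # True
print(dict(conn.execute("SELECT name, balance FROM tblAccount")))  # {'A': 70, 'B': 80}
```

In the failed transfer the first update had already credited B, but the rollback removes that half-completed change.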
Consistency
Any change to the database must take it from one valid state to another, obeying all of the database's rules.
Isolation
A transaction must not be able to be interrupted by another transaction. The transaction must occur in isolation so that other users or processes cannot have access to the data concerned.
Durability
Once a change has been made to a database it must not be lost due to a system failure.
Record Locking
Record locking prevents simultaneous access to objects in a database in order to prevent updates being lost or inconsistencies in the data arising.
Database Management System
software that handles the data that is stored in secondary storage.
What does the DBMS provide
- Security
- Backups
- Index updating
- Enforcement of referential integrity
- Facilities to update and query the database
Why Do RISC processors result in increased battery life
- smaller instruction set
- fewer transistors/ less complex circuitry
- less power required
Composite Key
A unique identifier made from different fields
Secondary Key
Any field which isn’t a primary/composite key which can help make a query more efficient
Methods of sending data
- Electronic Data Interchange (EDI)
- Application Programming Interface (API)
- A URL to a file
EDI
Is the computer-to-computer exchange of business documents in a standard electronic format between business partners.
API
A prewritten set of subroutines that provides access to the company's data
Formats of data
- CSV (Comma Separated Value)
- XML (eXtensible Markup Language)
- SQL (Structured Query Language)
CSV (Comma Separated Value)
A text file/format with values separated by commas
XML (eXtensible Markup Language)
A markup language that uses tags to denote data
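A sketch showing the same two records in CSV and in XML, read with Python's standard csv and xml.etree.ElementTree modules. The field names and tag names are made up for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# CSV: one record per line, values separated by commas, first line names the fields.
csv_text = "id,name\n1,Ada\n2,Alan\n"
csv_rows = list(csv.DictReader(io.StringIO(csv_text)))

# XML: tags surround each value and describe what it is.
xml_text = (
    "<students>"
    "<student><id>1</id><name>Ada</name></student>"
    "<student><id>2</id><name>Alan</name></student>"
    "</students>"
)
root = ET.fromstring(xml_text)
xml_rows = [{field.tag: field.text for field in student} for student in root]

print(xml_rows)              # [{'id': '1', 'name': 'Ada'}, {'id': '2', 'name': 'Alan'}]
print(csv_rows == xml_rows)  # True
```

Both formats carry the same data. CSV is more compact, while XML labels every value with a tag.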
SQL (Structured Query Language)
A language for creating/querying databases
SQL query
SELECT
FROM
WHERE
SQL similar elements
LIKE
delete rows of a table
DELETE
FROM
WHERE
insert items into a table
INSERT INTO table_name (column….)
VALUES (value…)
delete full tables SQL
DROP TABLE table_name;
combine rows from multiple tables
JOIN
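A sketch of JOIN using Python's sqlite3 module, matching each order to its customer through the shared customerID field. The tables and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblCustomer (customerID INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE tblOrder (orderID INTEGER PRIMARY KEY, customerID INTEGER, item TEXT)")
conn.executemany("INSERT INTO tblCustomer VALUES (?, ?)", [(1, "Ada"), (2, "Alan")])
conn.executemany(
    "INSERT INTO tblOrder VALUES (?, ?, ?)",
    [(100, 1, "Book"), (101, 2, "Pen"), (102, 1, "Lamp")],
)

# Combine rows from both tables where the foreign key matches the primary key.
joined = conn.execute(
    "SELECT tblCustomer.name, tblOrder.item "
    "FROM tblOrder JOIN tblCustomer ON tblOrder.customerID = tblCustomer.customerID "
    "ORDER BY tblOrder.orderID"
).fetchall()
print(joined)  # [('Ada', 'Book'), ('Alan', 'Pen'), ('Ada', 'Lamp')]
```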
wildcard
*, selects all the columns of a table
positives of the DBMS?
- provide security
- provide automatic backup
- enforce data integrity rules
- control data redundancy
- provide users with controlled access to data they need
What does security keep data safe from?
- accidental or deliberate loss
- malicious access
Data integrity is the state of data being
as intended and accurate
persistent
the data remains for as long as it is required
structured
the data is organised in a logical way
indexing
used to quickly access and locate data in a table
Data redundancy
the unnecessary duplication of data in a database. When updates occur, all the instances of a data item must be changed. This leads to errors and also wastes storage space.
relational databases
multiple tables linked together
deadlock
When two users each hold a lock on data that the other needs, so each waits for the other to release it and neither can proceed.
Timestamp ordering
Every object has a read timestamp and a write timestamp. If the timestamps are not the same as when the transaction started, the DBMS knows another user has accessed the same object.
how is redundancy used usefully
organisations build in redundancy e.g. duplicate hardware so that if the main system fails the backup can take over
Characteristics to look for in a hashing algorithm
- quick to calculate
- minimises collisions
- provides a smaller output than input
EDI
Electronic data interchange is the computer-to-computer exchange of documents. All documents must be in a standard format.
Serialisation techniques
Timestamp ordering
Commitment ordering
Serialisation
The process of ensuring transactions do not overlap in time and therefore cannot interfere with each other
why is removing many-to-many relationships good?
- takes the table to 3NF
- reduces data inconsistencies/ redundancies
Uses of sequential files
address/ telephone books
Uses of serial files
storing transactions of a shop
structured
Data arranged in a logical and consistent pattern
Benefits of fixed length fields
- easier to write the software
- easier to search
- file size can be estimated or planned
what order are records stored in a database
No particular order; records are stored in the order they are entered.
Why are primary keys necessary?
- to identify a row unambiguously
- To link to the foreign keys of other tables
Difference between data redundancy and data integrity
- Data redundancy is unnecessary repetition of data
- Data integrity is accurate data which reflects reality