Section Four: Exchanging data Flashcards
Chapter 15 – Compression, encryption and hashing
What is compression?
Compression is a technique that reduces file size. Some compression techniques are referred to as lossy, because they cause data to be lost during the compression process. Other techniques are referred to as lossless, because they allow the original data to be exactly reconstructed from the compressed data
Chapter 15 – Compression, encryption and hashing
What is lossy compression?
Lossy compression is a term used to describe compression techniques that always result in a loss of data, i.e. data is permanently removed from the file. When the file is decompressed, the data that was removed (in the compression process) is re-created from the data that remains in the file.
Chapter 15 – Compression, encryption and hashing
What is lossless compression?
Lossless compression restores and rebuilds file data in its original form after the file is decompressed. For example, when a picture’s file size is compressed, its quality remains the same. The file can be decompressed to its original quality without any loss of data.
Chapter 15 – Compression, encryption and hashing
What is a Run Length Encoding (RLE)?
Run length encoding (RLE) is a lossless compression technique. It works by finding runs of repeated binary patterns and replacing them with a single instance of the pattern and a number that specifies how many times the pattern is repeated.
Chapter 15 – Compression, encryption and hashing
Dictionary-based compression techniques
Dictionary-based compression is another lossless technique that relies on finding patterns in data. However, it does not look for repeated runs of characters, as the repetitions do not need to be adjacent to each other. Instead of storing the pattern itself, a shorter code is substituted.
Chapter 15 – Compression, encryption and hashing
Encryption
Encryption is the transformation of data from one form to another to prevent an unauthorised third party from being able to understand it. The original data or message is known as plaintext. The encrypted data is known as ciphertext. The encryption method or algorithm is known as the cipher, and the secret information to lock or unlock the message is known as a key.
Chapter 15 – Compression, encryption and hashing
The Caesar cipher
The Caesar cipher (also known as a shift cipher) is a type of substitution cipher and works by shifting the letters of the alphabet along by a given number of characters; this parameter being the key. Below is an example of a shift cipher using a key of 5.
Chapter 15 – Compression, encryption and hashing
The Vernam cipher
The Vernam cipher, invented in 1917 by the American scientist Gilbert Vernam, is the only cipher still proven to be unbreakable. All others are based on computational security and are theoretically discoverable given enough time, ciphertext and computational power.
Chapter 15 – Compression, encryption and hashing
One-time pad
The encryption key or one-time pad must be equal to or longer in characters than the plaintext, be truly random and be used only once. One-time pads are used in pairs where the sender and recipient are both party to the key. Both must meet in person to securely share the key and destroy it after encryption or decryption. Since the key is random, so will be the distribution of the characters meaning that no amount of cryptanalysis will produce any meaningful results.
Chapter 15 – Compression, encryption and hashing
The bitwise exclusive or XOR
An XOR operation is carried out between the binary character value of the first character of the plaintext and the first character of the one-time pad.
Chapter 15 – Compression, encryption and hashing
Cryptanalysis and perfect security
Other ciphers that use non-random keys are open to a cryptanalytic attack and can be solved given enough time and resources. Even ciphers that use a computer generated random key can be broken since mathematically generated random numbers are not actually random; they just appear to be random.
A truly random sequence must be collected from a physical and unpredictable phenomenon such as white noise, the timing of a hard disk read/write head or radioactive decay. Only a truly random key can be used with a Vernam cipher to ensure it is mathematically impossible to break.
Chapter 15 – Compression, encryption and hashing
Symmetric (private key) encryption
Symmetric encryption, also known as private key encryption, uses the same key to encrypt and decrypt data. This means that the key must also be transferred (known as key exchange) to the same destination as the ciphertext, which causes obvious security problems. The key can be intercepted as easily as the ciphertext message to decrypt the data. For this reason asymmetric encryption can be used instead.
Chapter 15 – Compression, encryption and hashing
Asymmetric (public key) encryption
Asymmetric encryption uses two separate, but related keys. One key, known as the public key, is made public so that others wishing to send you data can use this to encrypt the data. This public key cannot decrypt data. Another private key is known only by you and only this can be used to decrypt the data.
It is virtually impossible to deduce the private key from the public key. It is possible that a message could be encrypted using your own public key and sent to you by a malicious third party impersonating a trusted individual. To prevent this, a message can be digitally ‘signed’ to authenticate the sender.
Chapter 15 – Compression, encryption and hashing
Hashing
A hashing function provides a mapping between an arbitrary length input and a usually fixed length or smaller output. Unlike the encryption techniques described above, it is one-way; you cannot get back to the original. This is useful for storing encrypted PINs and passwords so that they cannot be read by a hacker. To verify a user’s password, the software applies the hash function to the user input and compares it with the one stored.
Chapter 15 – Compression, encryption and hashing
Cryptographic hash functions
A hash total is a mathematical value calculated from unencrypted message data. This value is also referred to as a checksum or digest. The process is irreversible and impossible to crack other than by trying all of the possible inputs until a match is found. Since the hash total is generated from the entire message, even the slightest change in the message will produce a different total.
Chapter 15 – Compression, encryption and hashing
Digital signatures
A digital signature is a way to ensure that an electronic message or document is authentic.
The signature is created when the message is sent, using a private encryption key.
This is the opposite to normal PKE.
The signature is then paired with a public key and sent with the message. When the message run through the public key the result should match the signature.
If they don’t match then the message has been altered en route. This shows that the message has been intercepted and compromised.
Chapter 15 – Compression, encryption and hashing
Digital certificates
While digital signatures verify the trustworthiness of message content, a digital certificate is issued by official Certificate Authorities (CAs) such as Symantec or Verisign and verifies the trustworthiness of a message sender or website. This certificate allows the holder to use the Public Key Infrastructure or PKI. The certificate contains the certificate’s serial number, the expiry date, the name of the holder, a copy of their public
Chapter 16 – Database concepts
Modelling data requirements
When a systems designer begins work on a new proposed computer system, one of the first things they need to do is to examine the data that needs to be input, processed and stored and determine what the data entities are.
Definition: An entity is a category of object, person, event or thing of interest to an organisation about which data is to be recorded.
Examples of entities are: Employee, Film, Actor, Product, Recipe, Ingredient. Each entity in a database system has attributes.
Chapter 16 – Database concepts
A flat file database
A flat file database consists of a single file. It might be a suitable structure to hold the names and addresses of all members of a sports club, or information about all the DVDs in your personal collection.
Most databases, however, are concerned with more than one entity, and the relationships between the entities.
Chapter 16 – Database concepts
Entity identifier and primary key
Each entity needs to have an identifier which uniquely identifies the entity. In a relational database, the entity identifier is known as the primary key and it will be referred to as such in this section. Clearly none of the attributes so far identified for Dentist and Patient is suitable as a primary key. A numeric or string ID such as D13649 could be used. In the entity description, the primary key is underlined.
Chapter 16 – Database concepts
Secondary key
A database needs to be set up so that it can be searched quickly. An index of all the primary keys in the database, and where the record is held, is automatically maintained by the database software.
However, more than one index may be needed. If for example a patient rings up to make an appointment with the dentist, they are unlikely to know their patient ID, A secondary index on surname is likely to be held.
Chapter 16 – Database concepts
Relationships between entities
The different entities in a system may be linked in some way, and the two entities are said to be related.
There are only three different ‘degrees’ of relationship between two entities.
A relationship may be
1. One-to-one
Examples of such a relationship include the relationship between Husband and Wife, Country and Prime Minister.
- One-to-many
Examples include the relationship between Mother and Child, Customer and Order, Borrower and Library Book. - Many-to-many
Examples include the relationship between Student and Course, Stock Item and Supplier, Film and Actor.
Chapter 16 – Database concepts
Entity relationship modelling
The relationship between entities can be modelled graphically. An entity relationship diagram is a diagrammatic way of representing the relationships between the entities in a database. To show the relationship between two entities, both the degree and the name of the relationship need to be specified.
Chapter 16 – Database concepts
The concept of a relational database
In a relational database, a separate table is created for each entity identified in the system. Where a relationship exists between entities, an extra field called a foreign key links the two tables.
Chapter 16 – Database concepts
Foreign key
A foreign key is an attribute that creates a join between two tables. It is the attribute that is common to both tables, and the primary key in one table is the foreign key in the table to which it is linked.
Example: In the one-to-many relationship between Dentist and Patient, the entity on the ‘many’ side of the relationship will have DentistID as an extra attribute. .
Chapter 16 – Database concepts
Linking tables in a many-to-many relationship
When there is a many-to-many relationship between two entities, tables cannot be directly linked in this way.
For example, consider the relationship between Student and Course. A student takes many courses, and the same course is taken by many students.
In this case, an extra table is needed to link the Student and Course tables. We could call this StudentCourse, or Enrolment, for example.
The three tables will now have attributes something like those shown below:
Student (StudentID, Name, Address)
Enrolment (StudentID, CourseID)
Course (CourseID, Subject, Level)
Chapter 16 – Database concepts
Composite key
In this data model, the table linking Student and Course has two foreign keys, each linking to one of the two main tables. The two foreign keys also act as the primary key of this table. A primary key which consists of more than one attribute is called a composite primary key.
Chapter 16 – Database concepts
Drawing an entity relationship diagram
A database system will frequently involve many different entities linked to each other, and an entity relationship diagram can be drawn to show all the relationships.