Compression, Encryption, Hashing Flashcards
Run length encoding
Finds runs of repeated binary patterns and replaces them with a single instance of the pattern and the number of how many times it is repeated.
Well suited to image data as these often have repeated blocks of colour.
Lossless.
Dictionary coding
Algorithm allocates a code for each different word and store the code instead of the word. Table stores which word each code represents.
Lossless
Needs enough bits for all the unique words.
Lossy compression
Compression techniques that always result in a loss of data.
The recreated data will never be an exact copy of the original - it will be an approximation.
Used in audio files, JPEG (photos), MPEG-1 and MPEG-2 (video files)
Lossless compression
Compression techniques where no data is lost.
When decompressed, the contents are identical to the original.
Used when data cannot be approximated eg text files.
Common formats: PNG (high quality pictures), GIF and ZIP files.
Encryption
The process of converting a plain text message into cipher text using a cipher and a key.
Cipher
A set of instructions (algorithm) for encrypting plain text. The same algorithm can be used with many different key values.
Key
Additional information used by the cipher algorithm. Also needed for the decryption process.
Symmetric encryption
The same key is used to encrypt and decrypt so the key must be shared between the sender and receiver.
The Diffie-Hellman key exchange allows both parties to generate a shared key using a set of public and private keys. This means there is no exchange of keys, so the encryption is more secure.
Generates the key over the Transport Layer Security at the start of a session.
Asymmetric encryption
Different keys (mathematically linked) are used to encrypt and decrypt. This means keys do not need to be shared.
The RSA algorithm (Rivest, Shamir, Adleman) generates several components which are mathematically linked so that a message encrypted using one set can be decrypted using the other set.
These components are known collectively as a public and private key pair.
Public key is widely shared but private key is only known to its owner.
Sender authentication
Encryption: sender’s private key + receiver’s public key
Decryption: receiver’s private key + sender’s public key
If it was not the correct sender, the receiver would not be able to decrypt it.
Hash table
Must be an array - have to access each position directly by specifying the index
Positions are sometimes called buckets - each is used to store data. Can be single data or more complex eg a record
Size must be big enough to store everything but not so big is wastes space. Must have some free space.
Load factor = n/k
k = number of buckets
n = number of occupied buckets
Load factor of about 0.75 is optimal
Hash functions
Always produce the same hash value for the same hash key
Provide a uniform distribution of hash values
Minimise clustering - when many different keys produce the same hash value
Be very fast to compute
Inserting data into a hash table
The hash function generates the index of the position in the array that will be used to store the data.
The key must be stored as part of the data
Retrieving values from a hash table
The hash function is applied to the key to generate the index.
Always O(1) unless collisions have occured
If no data in that index, the data is not in the table.
Collisions
When two or more keys produce the same hash value.
Linear probing: the second data is placed in the next available position. To retrieve, mush check the index to see if it matches the key. If not, linear search until match is found.
Chaining: a linked list is created at that index. To retrieve search along the linked list until the key is found.