L16 - Data Compression 2, LZW Coding Flashcards
What is data compression?
The removal of redundancy from data packets being transmitted. This improves reliability by decreasing the extent to which noise can interfere with the data.
What is Source Coding?
Another word for data compression.
What are the 3 models that compression algorithms can be based on? Define each…
Static Model: Applies the same compression technique for all text in the file.
Pro -> Fast
Con -> Non-optimal since different text have different statistical properties.
E.g. -> ASCII, Morse Code.
Adaptive Model: Compression models that progressively learn and update the model based on incoming text.
Pro -> Accuracy → Leads to better compression of data.
Con -> Decoding must start from the beginning of the data.
E.g. LZW Coding
Dynamic Model:
- Learn the statistics of the text, and generates a model based on what’s learnt.
- A preliminary test is required to generate the model.
- Model must be transmitted to the receiver.
- E.g Huffman Code
What is LZW coding?
- An adaptive dictionary coding scheme for lossless data compression.
- Widely used for UNIX file compression and GIF image format compression.
LZW Coding uses Dictionary Coding, explain how dictionary coding works…
- Doesn’t use statistical knowledge of data.
-
Encoding → Reads input data stream and searches for its pattern in the dictionary.
- Found → Replaces the input data stream with the index of the dictionary entry that the pattern matches.
- Not found → Inserts the input data pattern into the dictionary.
- Decoding → Inverts the encoding process by following the encoded indices to reconstruct the original data.
Give pros and cons of LZW coding…
Pros:
- Efficiently compresses data with redundant data or repeating sequences.
Cons:
- If data is already compressed or is randomised, compression wont be as effective.
Walk through the steps of how the LZW algorithm compresses (encodes) data…
- Have an initial dictionary to build off, usually ASCII.
- From the first 2 elements in the input message, create the compression dictionary…
- If the substring is a key in the dictionary, it has been visited before. Thus, move the right pointer 1 element to the left to create a new, larger substring and check again.
- If substring is not a key in the dictionary, it hasn’t been visited before. Thus, this new substring becomes a key with an associated unique value.
- Repeat until all unique substrings in the data have associated key-value pairs in the dictionary.
- Using the dictionary, the input data can be compressed.
Explain what LZW encoding is and how it works…
- A dictionary based compression algorithm.
- Builds off an initial dictionary, usually ASCII codes.
- Works by creating a dictionary item for each unique sub-string encountered in the input message.
- A specific dictionary value can be used whenever that item is encountered.
- For example, if the substring ‘ought’ has 3 instances in the input text, the same code will be used for each to represent that substring in the compressed output.
What through the steps of how the LZW algorithm decompresses (decodes) data…
- Start with the same base dictionary as used in encoding.
- For each encoded element in the compressed data, check if it’s in the dictionary.
- If yes, move right pointer to the next encoded element, append the codes together to create a new code. Then add the substring and code to the dictionary.
- Due to the above, any time an encoded value re-encountered, the same substring is decoded.