Unit 2 Text and Image Processing Flashcards
Types of text:
- Unformatted text (plain text, ASCII character code)
- Formatted text
- Hypertext
Text file formats
File formats are application-specific
TXT - plain text document with little or no formatting
DOC - binary format created by document-editing software such as Microsoft Word
RTF - Rich Text Format, developed by Microsoft
PDF - Portable Document Format (binary format)
PS - PostScript (a programming language that defines the look of a printed page)
What is meant by text compression?
Reducing the size of a text document stored on disk without changing its contents
Components of a compression technique:
Encoder
Storage or network
Decoder
What is the compression ratio?
The ratio of the total number of bits required before compression to the total number of bits required after compression
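A minimal sketch of the encoder, storage and decoder pipeline and of the compression ratio, assuming Python's standard zlib module (DEFLATE) as a stand-in for "some compression technique"; compression_ratio is a hypothetical helper name.

```python
import zlib

def compression_ratio(original: bytes, compressed: bytes) -> float:
    """Bits required before compression / bits required after compression."""
    return (len(original) * 8) / (len(compressed) * 8)

# Repetitive text compresses well; zlib plays the role of the encoder.
text = ("the quick brown fox jumps over the lazy dog. " * 50).encode("ascii")
stored = zlib.compress(text)          # encoder output: what goes to disk or the network
restored = zlib.decompress(stored)    # decoder output

assert restored == text               # lossless: contents are unchanged
ratio = compression_ratio(text, stored)
print(f"{len(text)} bytes -> {len(stored)} bytes, ratio {ratio:.1f}:1")
```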
What is Huffman coding?
Instead of fixed-size codewords, variable-length codewords are derived so that the shortest codewords are assigned to the most frequently occurring characters
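A minimal sketch of Huffman code construction using Python's heapq: the two least frequent subtrees are merged repeatedly, and each merge prepends a 0 or 1 to the codewords beneath it. The function names are hypothetical. Other implementations may break ties differently, so individual codewords can differ while the total encoded length stays the same.

```python
import heapq
from collections import Counter

def huffman_codes(text: str) -> dict:
    """Build {symbol: codeword}; frequent symbols get shorter codewords."""
    freq = Counter(text)
    if len(freq) == 1:                        # a lone symbol still needs one bit
        return {next(iter(freq)): "0"}
    # Heap entries: (frequency, tie-breaker, {symbol: partial codeword})
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)     # two least frequent subtrees
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, counter, merged))
        counter += 1
    return heap[0][2]

def huffman_encode(text, codes):
    return "".join(codes[ch] for ch in text)

def huffman_decode(bits, codes):
    reverse = {c: s for s, c in codes.items()}
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in reverse:    # prefix property: the first match is unambiguous
            out.append(reverse[cur])
            cur = ""
    return "".join(out)

message = "aaaaaaabbbccd"
codes = huffman_codes(message)
bits = huffman_encode(message, codes)
print(codes)
print(len(bits), "bits vs", 8 * len(message), "bits in 8-bit ASCII")
```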
What is the requirement for Huffman coding?
It is used in applications where the text to be compressed has known characteristics in terms of the characters it contains and their relative frequencies of occurrence
Applications of Huffman coding
Fax machines, JPEG, MPEG
Image file formats
BMP - lossless, developed by Microsoft
TIFF - lossless file format, high quality, large size
JPEG - lossy, small size
GIF - lossless compression, larger size than JPEG, limited color range (256 colors)
PNG - lossless, supports over 16 million colors
Run-Length Encoding (RLE)
- Applied across a sequence of characters
- The sequence contains repeated characters (runs)
- Main idea: replace each run of a repeated character with that character and a count of how many times it occurs
- Easily encodes binary (black-and-white only) images that have continuous regions
- Can be applied to two-dimensional image data one row at a time
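A minimal sketch of RLE, assuming runs are stored as (symbol, count) pairs; real file formats pack the counts into bytes or bit codes in their own ways. The function names are hypothetical. The second half applies the same encoder row by row to a small binary image.

```python
def rle_encode(seq):
    """Replace each run of a repeated symbol with a (symbol, count) pair."""
    runs = []
    for sym in seq:
        if runs and runs[-1][0] == sym:
            runs[-1][1] += 1          # same symbol: extend the current run
        else:
            runs.append([sym, 1])     # new symbol: start a new run
    return [(sym, count) for sym, count in runs]

def rle_decode(runs):
    """Expand (symbol, count) pairs back into the original sequence."""
    out = []
    for sym, count in runs:
        out.extend([sym] * count)
    return out

print(rle_encode("AAAABBBCCD"))   # [('A', 4), ('B', 3), ('C', 2), ('D', 1)]

# A binary image (1 = black, 0 = white) encoded one row at a time.
image = [
    [0, 0, 0, 1, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 1, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
]
encoded_rows = [rle_encode(row) for row in image]
decoded = [rle_decode(runs) for runs in encoded_rows]
assert decoded == image
```

Long uniform runs (like the all-black third row, which becomes a single pair) are where RLE saves the most; a row with no repeats would actually grow.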
Shannon-Fano Algorithm
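- It is a variable-length coding (VLC) method: the generated codewords vary in length
- Uses a top-down approach: symbols sorted by frequency are repeatedly split into two groups of roughly equal total frequency
- Can be implemented using a binary tree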
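A minimal sketch of the top-down Shannon-Fano split, assuming the input is already sorted by descending frequency. Each split appends 0 to the upper group and 1 to the lower group, which corresponds to the left and right branches of a binary tree. The function name is hypothetical.

```python
from collections import Counter

def shannon_fano(symbols):
    """symbols: list of (symbol, frequency) pairs sorted by descending frequency.
    Returns {symbol: codeword}, built top-down by recursive splitting."""
    if len(symbols) == 1:
        return {symbols[0][0]: "0"}        # a lone symbol still needs one bit
    codes = {sym: "" for sym, _ in symbols}

    def split(group):
        if len(group) <= 1:
            return                         # leaf of the binary tree
        total = sum(f for _, f in group)
        # Choose the split point that makes the two halves' totals closest.
        running, best_i, best_diff = 0, 1, float("inf")
        for i in range(1, len(group)):
            running += group[i - 1][1]
            diff = abs(total - 2 * running)
            if diff < best_diff:
                best_i, best_diff = i, diff
        upper, lower = group[:best_i], group[best_i:]
        for sym, _ in upper:
            codes[sym] += "0"              # left branch
        for sym, _ in lower:
            codes[sym] += "1"              # right branch
        split(upper)
        split(lower)

    split(symbols)
    return codes

freq = Counter("aaaaaaabbbccd")
ordered = sorted(freq.items(), key=lambda kv: kv[1], reverse=True)
codes = shannon_fano(ordered)
print(codes)   # {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
```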
Arithmetic Coding
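- Modern coding method that generally outperforms Huffman coding: the whole message is encoded as a single number in the interval [0, 1), so each symbol is not limited to a whole number of bits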
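A minimal sketch of arithmetic coding with exact fractions (Python's fractions.Fraction), assuming a fixed, known probability table and a decoder that is told the message length. Each symbol narrows the current interval in proportion to its probability, so the final width equals the product of the symbol probabilities (1/64 for "abac" below, about 6 bits). Practical coders use fixed-precision integer arithmetic with renormalization instead of unbounded fractions; the function names here are hypothetical.

```python
from fractions import Fraction

def build_ranges(probs):
    """Give each symbol a sub-interval [low, high) of [0, 1)."""
    ranges, low = {}, Fraction(0)
    for sym, p in probs.items():
        ranges[sym] = (low, low + p)
        low += p
    return ranges

def arith_encode(message, probs):
    ranges = build_ranges(probs)
    low, high = Fraction(0), Fraction(1)
    for sym in message:
        width = high - low
        s_low, s_high = ranges[sym]
        # Narrow [low, high) to this symbol's share of the current interval.
        low, high = low + width * s_low, low + width * s_high
    return low, high   # any number in [low, high) identifies the message

def arith_decode(value, length, probs):
    ranges = build_ranges(probs)
    out = []
    for _ in range(length):
        for sym, (s_low, s_high) in ranges.items():
            if s_low <= value < s_high:
                out.append(sym)
                value = (value - s_low) / (s_high - s_low)   # rescale to [0, 1)
                break
    return "".join(out)

probs = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
low, high = arith_encode("abac", probs)
print(low, high)
print(arith_decode(low, 4, probs))
```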
Vector Quantization
- Lossy compression
- Algorithm:
1. Vectorisation
2. Codebook generation
3. Encoding
4. Decoding
- Key points: variable bitrate, vector size and complexity, codebook design
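A minimal sketch of the four vector quantization steps in plain Python. The codebook is generated here with k-means (Lloyd's algorithm), which is an assumption: it is the idea behind common codebook designs, but not necessarily the method the course uses. It also assumes the image dimensions are multiples of the block size; all function names are hypothetical.

```python
import random

def vectorise(image, bs):
    """Step 1: cut the image into bs x bs blocks, each flattened into a vector."""
    return [[image[r + i][c + j] for i in range(bs) for j in range(bs)]
            for r in range(0, len(image), bs)
            for c in range(0, len(image[0]), bs)]

def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def nearest(vec, codebook):
    """Index of the codevector closest to vec (squared Euclidean distance)."""
    return min(range(len(codebook)), key=lambda k: sq_dist(vec, codebook[k]))

def generate_codebook(vectors, k, iters=20, seed=0):
    """Step 2: k-means picks k representative codevectors."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            clusters[nearest(v, codebook)].append(v)
        for idx, members in enumerate(clusters):
            if members:  # an empty cluster keeps its old codevector
                codebook[idx] = [sum(col) / len(members) for col in zip(*members)]
    return codebook

def vq_encode(vectors, codebook):
    """Step 3: each block is replaced by the index of its nearest codevector."""
    return [nearest(v, codebook) for v in vectors]

def vq_decode(indices, codebook):
    """Step 4: the decoder looks each index up in the same codebook."""
    return [codebook[i] for i in indices]

image = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [12, 12, 198, 198],
    [12, 12, 198, 198],
]
vectors = vectorise(image, 2)              # four 2x2 blocks -> four 4-D vectors
codebook = generate_codebook(vectors, k=2)
indices = vq_encode(vectors, codebook)     # one small index per block
decoded = vq_decode(indices, codebook)
print(indices, codebook)
```

The loss comes from replacing similar blocks (values 10 and 12, or 198 and 200) with one shared codevector; a larger codebook or smaller vectors reduce that error at the cost of more bits per index.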
Fractal Compression technique
- Based on the concept of fractals: parts of an image resemble transformed copies of other parts (self-similarity)
- Algorithm:
1. Partition (image divided into smaller non-overlapping blocks or regions)
2. Iterative Matching (affine transformations)
3. Error calculation (differences in pixel values)
4. Encoding (parameters of transformation and errors)
5. Decoding (the stored transformations are applied repeatedly, starting from any initial image, until the result converges)
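A minimal one-dimensional sketch of the idea, assuming a signal whose length is a multiple of the range-block size: the signal is partitioned into non-overlapping range blocks, each is matched against larger (shrunk) domain blocks with an affine map s*d + o, and the map with the smallest squared error is kept. This sketch stores only the map parameters and uses the error just to pick the best match. Decoding starts from an all-zero signal and applies the stored maps repeatedly; clamping |s| below 1 keeps them contractive so the iteration converges. Real fractal image coders work on 2-D blocks and also try rotations and flips; all names here are hypothetical.

```python
def downsample(block):
    """Average neighbouring pairs so a domain block shrinks to range-block size."""
    return [(block[i] + block[i + 1]) / 2 for i in range(0, len(block), 2)]

def fit(domain, range_block, max_scale=0.9):
    """Least-squares scale s and offset o so that s*domain + o approximates
    range_block; |s| is clamped below 1 so the map stays contractive."""
    n = len(domain)
    md, mr = sum(domain) / n, sum(range_block) / n
    var = sum((d - md) ** 2 for d in domain)
    cov = sum((d - md) * (r - mr) for d, r in zip(domain, range_block))
    s = cov / var if var else 0.0
    s = max(-max_scale, min(max_scale, s))
    o = mr - s * md
    err = sum((s * d + o - r) ** 2 for d, r in zip(domain, range_block))
    return s, o, err

def fractal_encode(signal, rsize=2):
    """Return one (domain start, scale, offset) triple per range block."""
    dsize = 2 * rsize
    # Domain blocks are twice as long as range blocks and may overlap.
    domains = [(start, downsample(signal[start:start + dsize]))
               for start in range(0, len(signal) - dsize + 1, rsize)]
    code = []
    # Step 1: partition into non-overlapping range blocks.
    for r0 in range(0, len(signal), rsize):
        range_block = signal[r0:r0 + rsize]
        # Steps 2-3: try every domain, keep the affine map with the least error.
        best = min(((start,) + fit(dom, range_block) for start, dom in domains),
                   key=lambda t: t[3])
        code.append(best[:3])   # Step 4: this sketch stores only the map parameters
    return code

def fractal_decode(code, length, rsize=2, iterations=40):
    """Step 5: apply the stored maps repeatedly, starting from a zero signal."""
    x = [0.0] * length
    for _ in range(iterations):
        new = []
        for start, s, o in code:
            dom = downsample(x[start:start + 2 * rsize])
            new.extend(s * d + o for d in dom)
        x = new
    return x

ramp = [float(v) for v in range(8)]   # a ramp is exactly self-similar
code = fractal_encode(ramp)
restored = fractal_decode(code, len(ramp))
print(code)
print([round(v, 3) for v in restored])
```

Because a linear ramp is exactly self-similar, the decoder converges back to it; for a general signal the result is only an approximation whose quality depends on how well the best matches fit.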
Transform Coding
- Lossy compression technique
- Algorithm:
1. Dividing image into blocks
2. Apply a transform such as the DCT (discrete cosine transform) to each block to generate transform coefficients
3. Quantization
4. Encoding (any symbol encoding algorithm is used)
5. Decoding (the entire process above, reversed)
- Used for JPEG compression
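A minimal sketch of steps 2 and 3 and the decoding side on a single 8x8 block, using a direct (slow, O(N^4)) orthonormal 2-D DCT-II written out in plain Python. A single uniform quantization step q is an assumption made for simplicity; JPEG instead uses a table with a different step per coefficient. The quantized levels are what step 4 would pass to a symbol encoder. All names are hypothetical.

```python
import math

N = 8  # block size; JPEG also works on 8x8 blocks

def scale(k):
    """Orthonormal DCT-II normalisation factor for frequency index k."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

def basis(pos, freq):
    """Cosine basis value for sample position pos at frequency freq."""
    return math.cos((2 * pos + 1) * freq * math.pi / (2 * N))

def dct2(block):
    """Step 2: 2-D DCT-II of an N x N block -> N x N transform coefficients."""
    return [[scale(u) * scale(v) * sum(block[x][y] * basis(x, u) * basis(y, v)
                                       for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]

def idct2(coeffs):
    """Decoding: inverse 2-D DCT (DCT-III) back to pixel values."""
    return [[sum(scale(u) * scale(v) * coeffs[u][v] * basis(x, u) * basis(y, v)
                 for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]

def quantize(coeffs, q):
    """Step 3: uniform quantization; the rounding here is where data is lost."""
    return [[round(c / q) for c in row] for row in coeffs]

def dequantize(levels, q):
    return [[lvl * q for lvl in row] for row in levels]

block = [[50 + 8 * (x + y) for y in range(N)] for x in range(N)]  # smooth gradient
q = 16
levels = quantize(dct2(block), q)          # step 4 would entropy-code these
restored = idct2(dequantize(levels, q))
nonzero = sum(1 for row in levels for lvl in row if lvl != 0)
print(nonzero, "of", N * N, "quantized coefficients are non-zero")
```

For a smooth block, the energy collects in a few low-frequency coefficients, so most quantized levels are zero; long runs of zeros are what make the following symbol-encoding step effective.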