B2P2 - Information theory and lossless compression Flashcards
ASCII coding
a standard 7-bit code for repping a source alphabet of 128 symbols, comprising the letters A-Z (upper and lower case), the numerals 0-9 and various other symbols and control characters;
ASCII = American Standard Code for Information Interchange;
binary-coded decimal (BCD)
a code for binary rep of decimal numbers in which each of the decimal digits 0-9 is repped by a four-bit code word that is the binary equivalent of the decimal value;
code tree (decision tree)
a graphical rep of a procedure for the generation and decoding of instantaneous codes;
[e.g. a Huffman code tree?]
compression
in the context of source coding, the reduction of the number of bits required to rep a source;
compression may be either lossy or lossless;
entropy
in information theory, the entropy of an information source is the theoretical minimum number of bits per symbol needed to rep the transmitted symbols;
it is a measure of the rate at which information is being transmitted from the source;
Huffman coding
a procedure in which source symbols with the highest probabilities are allocated systematically to the shortest code words;
Huffman coding results in a maximally efficient source code;
information theory
a body of theory that addresses fundamental performance limits of communications systems;
instantaneously decodable code
a code that is not only uniquely decodable, but for which each received code word can be decoded immediately after it has been received;
it is not necessary to wait for any later code words;
Lempel-Ziv-Welch (LZW)
a method of compression based on detecting patterns in data, assigning short codes to them, and then using those codes when the same patterns appear again;
[is a generalisation of RLE];
lossless compression
any source coding technique that enables an exact reconstruction of the original source data from the compressed rep;
lossy compression
any source coding technique where there is an irreversible loss of information, and only an approximate version of the original source data can be reconstructed from the compressed data;
message
in the context of information theory as formulated by Claude Shannon, an information source generates a sequence of messages to be sent over a comms channel;
each message consists of a source symbol that is selected from the source alphabet;
prefix
in the context of source coding, a code that also forms the first part of another code;
e.g. 11 would be a prefix code for the code 1101;
pseudocode
a semi-formal description of an algorithm using the conventions of computer programming but intended for a human reader;
run-length encoding (RLE)
a method fo data compression based on sending numbers that indicate the lengths of runs of symbols of the same type;