2) Databases and Representation of data Flashcards
What is data compression?
Compression is the process of reducing the number of bits required to represent data.
Different compression techniques are used depending on the type of data being compressed.
Why is compression needed?
Compression is important because machines have a finite amount of storage space, and that space fills quickly now that high-quality videos and images have become more prevalent.
Smaller files also transfer faster: the more effectively the data is compressed, the less time it takes to download the same content over the internet.
Lossy Compression
Lossy compression is where compression is achieved by permanently removing data judged unnecessary (for example, detail the eye or ear is unlikely to notice). Once removed, this data can never be retrieved.
Formats include JPG and MP3.
Advantages:
- Significantly decreases the size of the file compared with the original (a reduction of around 60-70%).
- Removes data that is unlikely to be missed.
Disadvantages:
- Image quality decreases.
- The change is irreversible: once data has been removed it is permanently gone.
Lossless Compression
Lossless compression is where compression is achieved without any data being lost, so the original file can be restored exactly.
Formats are PNG and RAW.
Advantages:
- The original data can be reconstructed exactly.
- Produces a reduced file size with image quality remaining the same.
Disadvantages:
- The reduction in file size is much smaller than with lossy compression, so the resultant file is not as small.
Run Length Encoding (RLE)
RLE is a method of compressing data by replacing each run of repeated data with a single value and a count of how many times it repeats.
For example, if an image has large blocks of one colour, it is highly inefficient to hold 32 bits for every pixel. Instead, the colour and a repetition value can be stored once for each run.
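The idea of storing a colour and a repetition value can be sketched in Python. This is a minimal illustration, not a real image format: the function names rle_encode and rle_decode and the sample row of hex colours are assumptions made for the example.

```python
from itertools import groupby

def rle_encode(pixels):
    """Collapse each run of identical values into a (value, run_length) pair."""
    return [(value, len(list(run))) for value, run in groupby(pixels)]

def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original sequence."""
    out = []
    for value, count in pairs:
        out.extend([value] * count)
    return out

# Hypothetical row of ten pixels, each written as a hex colour string.
row = ["75a248"] * 5 + ["ffffff"] * 3 + ["75a248"] * 2
encoded = rle_encode(row)
print(encoded)  # [('75a248', 5), ('ffffff', 3), ('75a248', 2)]
assert rle_decode(encoded) == row  # nothing is lost, so RLE is lossless
```

Ten stored colours shrink to three pairs here. If no pixel repeats, every run has length 1 and the encoded form is larger than the original, which is why RLE suits data with long runs.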
Dictionary Based Compression
This form of compression is used to compress a text file as letters and words can be repeated. For each word in a file, a corresponding code is given.
For this compression, the dictionary of codes must be saved along with the compressed file. This overhead makes dictionary methods impractical for small files.
Example:
In : 00000
Have : 00011
Can : 01100
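A word-to-code table like the one above could be built as in the following Python sketch. The function names, the sample sentence and the rule for choosing the code width are assumptions made for illustration; real dictionary coders work differently in detail.

```python
def build_dictionary(text):
    """Give each distinct word a fixed-width binary code, in order of first appearance."""
    unique = list(dict.fromkeys(text.split()))  # removes repeats, keeps first-seen order
    width = max(1, (len(unique) - 1).bit_length())  # bits needed to number every word
    return {word: format(i, f"0{width}b") for i, word in enumerate(unique)}

def compress(text, dictionary):
    """Replace every word with its code."""
    return [dictionary[word] for word in text.split()]

def decompress(codes, dictionary):
    """Look each code up in the reversed dictionary to get the words back."""
    reverse = {code: word for word, code in dictionary.items()}
    return " ".join(reverse[code] for code in codes)

text = "can you can a can as a canner can can a can"
dictionary = build_dictionary(text)
codes = compress(text, dictionary)
print(dictionary)  # {'can': '000', 'you': '001', 'a': '010', 'as': '011', 'canner': '100'}
assert decompress(codes, dictionary) == text
```

Both the codes and the dictionary are needed to rebuild the text, which is the storage overhead described above.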
How does Lossless Compression Work?
Works by recording patterns in the data rather than the data itself. Using the pattern information, the original file can be recreated exactly without any loss of data.
Lossy Compression- MP3
Lossy compression removes the sounds that are outside the range of human hearing. Quieter notes that are played at the same time as louder sounds are also removed.
RLE of sound
A sound recording can have thousands of samples taken per second. The same sound played for a second can result in hundreds of identical samples.
RLE records one example of the sample and how many times it should be repeated.
Caesar Cipher
The Caesar cipher is the most basic type of encryption, as all letters of the alphabet are shifted by a consistent amount. Spaces are often removed to mask the word length.
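To decrypt a message without the key, you could use frequency analysis and assume that the letters occurring most frequently in the ciphertext stand for the most common letters in English. Alternatively, a brute-force attack simply tries all 25 possible shifts.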
E is the most common letter followed by T, A, O, I, N, S, R, H.
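The shift, and both ways of attacking it, can be sketched in Python. The function names and the sample message are assumptions for illustration. The frequency guess only works when E really is the most common letter of the plaintext, which is likely for long English messages but not guaranteed for short ones.

```python
import string
from collections import Counter

ALPHABET = string.ascii_uppercase

def caesar_shift(text, shift):
    """Shift every letter by `shift` places, wrapping around; spaces and punctuation are dropped."""
    letters = [c for c in text.upper() if c in ALPHABET]
    return "".join(ALPHABET[(ALPHABET.index(c) + shift) % 26] for c in letters)

def brute_force(ciphertext):
    """Undo every possible shift and return the candidates keyed by shift."""
    return {shift: caesar_shift(ciphertext, -shift) for shift in range(26)}

def guess_shift_by_frequency(ciphertext):
    """Assume the most common ciphertext letter stands for E."""
    counts = Counter(c for c in ciphertext.upper() if c in ALPHABET)
    most_common = counts.most_common(1)[0][0]
    return (ALPHABET.index(most_common) - ALPHABET.index("E")) % 26

ciphertext = caesar_shift("MEET ME HERE", 3)
print(ciphertext)                            # PHHWPHKHUH
print(guess_shift_by_frequency(ciphertext))  # 3
print(brute_force(ciphertext)[3])            # MEETMEHERE
```

With brute force, a person (or a word list) still has to pick out which of the candidates reads as sensible text.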
Vernam Cipher
The Vernam cipher is a method of encryption that uses a one-time pad to create a ciphertext that is mathematically impossible to break.
The one-time pad is a key used once to encrypt and decrypt a message before being discarded.
The one-time pad must be a random sequence and only ever used once.
It must be shared with the recipient by hand and destroyed immediately after use.
Decoding
Encryption and decryption of the message is performed bit by bit using an exclusive or (XOR) operation with a shared key.
L = 01001100
XOR
c = 01100011
00101111 = /
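The XOR step above can be reproduced in Python. This sketch assumes L is the plaintext character and c is the key character. The function name vernam is hypothetical, and secrets.token_bytes is used here only as a stand-in source of a random pad.

```python
import secrets

def vernam(data: bytes, pad: bytes) -> bytes:
    """XOR each byte of the data with the matching byte of the pad."""
    if len(pad) != len(data):
        raise ValueError("the pad must be exactly as long as the message")
    return bytes(d ^ p for d, p in zip(data, pad))

# The single-character example from the text: 'L' XOR 'c' gives '/'.
cipher = vernam(b"L", b"c")
print(format(cipher[0], "08b"), cipher)  # 00101111 b'/'

# Applying the same pad a second time recovers the plaintext.
message = b"HELLO"
pad = secrets.token_bytes(len(message))  # random pad, to be used once only
assert vernam(vernam(message, pad), pad) == message
```

Because XOR undoes itself, the same function serves for both encryption and decryption.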
Capturing an image
A digital camera breaks up the light through its lens into a grid of pixels.
A light sensor measures the intensity of colour in each pixel. Each measurement is converted into binary code using an analogue-to-digital converter.
The number of pixels recorded in the grid affects the number of bits used and therefore, the size of the file created.
Bitmapped Graphics
Bitmapped graphics use a grid of pixels, with each pixel given a specific colour value. E.g. the pixel at (X=21, Y=3) has the colour value 75a248.
Number of pixels
Resolution = number of pixels wide x the number of pixels high
2438 by 2124 = 5,178,312 pixels ≈ 5.2 megapixels
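The arithmetic above, and its effect on file size, can be checked with a short Python sketch. The colour depth of 24 bits per pixel is an assumption chosen for illustration (it matches a six-digit hex colour value), and the function names are hypothetical.

```python
def pixel_count(width, height):
    """Number of pixels = pixels wide x pixels high."""
    return width * height

def uncompressed_size_bytes(width, height, bits_per_pixel=24):
    """Size of the raw pixel data, ignoring any file header or compression."""
    return pixel_count(width, height) * bits_per_pixel // 8

pixels = pixel_count(2438, 2124)
print(pixels)                               # 5178312
print(round(pixels / 1_000_000, 1))         # 5.2 (megapixels)
print(uncompressed_size_bytes(2438, 2124))  # 15534936 bytes at 24 bits per pixel
```

Doubling both the width and the height gives four times as many pixels, and so four times the uncompressed size.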
Resolution
Resolution is expressed either as width x height in pixels or as pixels per inch (PPI).
If an image is made bigger or smaller, the size of each pixel grows or shrinks to maintain the required resolution. This is why there is a deterioration in quality when a bitmap is resized.