Block 2 Part 3 Flashcards
Perceptual redundancy
- information contained in audio or visual signal that can be removed without affecting recipient's experience of signal
Compression level (coding efficiency)
- this is how far you can compress a file
- there is a trade off between how far you can compress a file and keeping enough of the original signal
Permissible distortion
- once a digital source representation has been acquired, need to represent it using smallest number of bits possible for permissible distortion
Coding source into fewest possible number of bits
- allows either lower bit rate (bandwidth) to be used for transmitting compressed data
- or transmission to be completed faster
Rate distortion (RD)
- in all source coding algorithms, relationship between compression level achieved and resulting distortion formalised by RD
- every source coding algorithm has RD
Pulse code modulation (PCM)
- digitising analogue signal normally done by PCM
- analogue signal first subjected to sampling to create pulse amplitude modulation (PAM) signal
- each sample assigned to one of finite number of possible discrete values in process called quantising
- resulting bitstream goes through further lossless encoding to minimise final bit rate
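The sampling and quantising steps above can be sketched in Python. This is a minimal illustration with a mid-rise uniform quantiser; the function names and parameters are mine, not from any PCM standard:

```python
import math

def pcm_encode(samples, n_bits, v_max=1.0):
    """Uniformly quantise analogue sample values to n_bits binary codes."""
    levels = 2 ** n_bits
    step = 2 * v_max / levels
    codes = []
    for s in samples:
        # clamp to the quantiser range, then map to a level index 0..levels-1
        s = max(-v_max, min(v_max - 1e-12, s))
        codes.append(int((s + v_max) / step))
    return codes

def pcm_decode(codes, n_bits, v_max=1.0):
    levels = 2 ** n_bits
    step = 2 * v_max / levels
    # reconstruct each sample at the centre of its quantisation interval
    return [-v_max + (c + 0.5) * step for c in codes]

# 8 samples of a sine wave, quantised to 3 bits (8 levels)
samples = [math.sin(2 * math.pi * k / 8) for k in range(8)]
codes = pcm_encode(samples, 3)
recon = pcm_decode(codes, 3)
# quantisation noise = difference between original and digital signal;
# for a uniform quantiser it is bounded by half the step size (0.125 here)
errors = [abs(a - b) for a, b in zip(samples, recon)]
```

The bound on `errors` is why adding one bit (doubling the number of levels) halves the worst-case quantisation noise.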
Aliasing
- occurs when too few samples are taken (sampling below the Nyquist rate, twice the highest frequency in the signal)
- the sampled values then also fit a lower-frequency wave, an alias of the original
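A quick numeric illustration (frequencies chosen purely for convenience): a 7 Hz cosine sampled at only 8 Hz, well below the 14 Hz Nyquist rate it would need, produces exactly the same sample values as a 1 Hz cosine, so the two are indistinguishable after sampling:

```python
import math

fs = 8  # sampling rate: 8 samples per second
high = [math.cos(2 * math.pi * 7 * k / fs) for k in range(fs)]  # 7 Hz tone
low  = [math.cos(2 * math.pi * 1 * k / fs) for k in range(fs)]  # 1 Hz alias
# undersampled, the 7 Hz tone is indistinguishable from its 1 Hz alias
assert all(abs(a - b) < 1e-9 for a, b in zip(high, low))
```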
Analogue-to-digital converter (ADC)
- combined process of sampling and quantising usually performed by ADC
Quantisation noise (quantisation error)
- difference between original and digital signals
Differential pulse-code modulation (DPCM)
- variant of PCM that also converts source analogue signal to digital representation
- able to achieve lower bit rate by including sample prediction in its coding
Advantages of DPCM over PCM
- successive samples not very different from each other
- encoder and decoder predict next sample will be same as current one
- transmitted difference value is then error in prediction
- difference values also known as prediction errors
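The advantage is easy to see in a sketch (function names are mine; real DPCM also quantises the differences, which this toy version skips). Using "next sample equals the current one" as the predictor, only the small prediction errors need transmitting:

```python
def dpcm_encode(samples):
    """Transmit only prediction errors: predict each sample equals the last."""
    prediction = 0  # encoder and decoder start from the same initial prediction
    diffs = []
    for s in samples:
        diffs.append(s - prediction)  # prediction error (difference value)
        prediction = s
    return diffs

def dpcm_decode(diffs):
    prediction = 0
    out = []
    for d in diffs:
        prediction += d  # correct the prediction with the received error
        out.append(prediction)
    return out

signal = [10, 11, 11, 12, 14, 13]
diffs = dpcm_encode(signal)   # [10, 1, 0, 1, 2, -1]
assert dpcm_decode(diffs) == signal
```

Because successive samples are similar, the differences cluster near zero and can be coded with fewer bits than the raw samples, which is where DPCM's lower bit rate comes from.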
MPEG-1
- mainly used for efficient storage of moving pictures for multimedia on CD-ROM
MPEG-2
- toolbox of optimised compression techniques for DTV systems to support both SD and HD picture resolutions
MPEG-4
- intended to provide high compression rates, allowing for transmission of moving pictures at bit rates below 64 kbit s⁻¹
MPEG-7
- specifies way multimedia can be indexed, and thus searched for in variety of ways relating to specific medium
MPEG-21
- extends MPEG-7's indexing notion further by including digital rights management (DRM) in MPEG systems
Objective of JPEG and MPEG coding
- removal of as much statistical and perceptual redundancy as possible, to achieve highest compression
- this is achieved in two stages: spatial compression and temporal compression
Spatial compression
- exploits fact that in many real pictures considerable similarity (correlation) exists between neighbouring areas of image
Spatial compression - intra-frame compression
- each individual picture able to be compressed
- basis of JPEG image compression standard
Temporal compression
- exploits fact that in most sequences, very little changes between consecutive frames
Temporal compression - inter-frame compression
- high correlation between frames offers further lossy compression opportunities, by removing detail without perceptible loss of quality
JPEG coding
- de facto lossy compression standard for colour and greyscale images
- though known as lossy, it does have a lossless mode
JPEG limitations
- no interactive functionality, cannot compress region of interest at different bit rate from remainder of image
- not optimised for either natural images or synthetic computer generated images
- poor compression of compound documents containing both images and text
- degraded performance in noisy channel conditions
JPEG2000
- low-bit rate image compression standard
- offers interactive, multi-resolution and scalable functionality
- superior coding performance with fewer visually perceptible artefacts
JPEG2000 bitstream scalability
- image can change its representation to satisfy requirements of application or receiver
JPEG2000 discrete wavelet transform (DWT)
- decomposes image into four sub images, each having different resolution corresponding to different frequency band
- original image can be reconstructed by combining four images
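A minimal sketch of a one-level 2D Haar transform, the simplest wavelet (JPEG2000 actually uses longer wavelet filters, so this is illustrative only). It produces the four half-resolution sub-images and shows that combining them reconstructs the original exactly:

```python
def haar_dwt_2d(img):
    """One-level 2D Haar DWT: split image into LL, HL, LH, HH sub-images."""
    h, w = len(img), len(img[0])
    LL, HL, LH, HH = [], [], [], []
    for i in range(0, h, 2):
        ll, hl, lh, hh = [], [], [], []
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            ll.append((a + b + c + d) / 4)  # average: low-frequency approximation
            hl.append((a - b + c - d) / 4)  # difference across columns
            lh.append((a + b - c - d) / 4)  # difference across rows
            hh.append((a - b - c + d) / 4)  # diagonal detail
        LL.append(ll); HL.append(hl); LH.append(lh); HH.append(hh)
    return LL, HL, LH, HH

def haar_idwt_2d(LL, HL, LH, HH):
    """Reconstruct the original image by recombining the four sub-images."""
    h, w = len(LL), len(LL[0])
    img = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            ll, hl, lh, hh = LL[i][j], HL[i][j], LH[i][j], HH[i][j]
            img[2 * i][2 * j]         = ll + hl + lh + hh
            img[2 * i][2 * j + 1]     = ll - hl + lh - hh
            img[2 * i + 1][2 * j]     = ll + hl - lh - hh
            img[2 * i + 1][2 * j + 1] = ll - hl - lh + hh
    return img

img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12],
       [13, 14, 15, 16]]
LL, HL, LH, HH = haar_dwt_2d(img)
# each sub-image has half the resolution of the original
assert len(LL) == 2 and len(LL[0]) == 2
# combining the four sub-images reconstructs the original exactly
assert haar_idwt_2d(LL, HL, LH, HH) == img
```

Repeating the transform on the LL sub-image gives the multi-resolution pyramid that JPEG2000's scalability is built on.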
Region of interest
- used when a certain area of an image must have better quality than the surrounding area
- lossless compression is applied to this area to ensure it is of a high standard
- the rest of the image can be coded at a much lower resolution
Motion JPEG (M-JPEG)
- allows moving images to be compressed
- uses only intra frame compression
Motion vectors
- describe the displacement of moving areas between two frames in a movie or video
Motion prediction
- idea is to predict current frame from previous frame by calculating set of MVs then determine motion prediction error
- this prediction can then be compensated for at the decoder
Three main picture types supported by MPEG
- I-frames
- P-frames
- B-frames
I-frames
- (intra frame) are JPEG-coded and used as reference for random access in MPEG bitstreams
- coded independently without reference to other picture types
- don’t use motion vectors
- achieve only low compression
- used any time shot changes from one sequence to another
P-frames
- (prediction) use motion prediction and compensation to achieve higher compression than I-frames
- used as reference for both future and past predictions
- don’t offer random access capability within coded bitstream
B-frames
- (bidirectional prediction) interpolated frames between I- and P-frames in both forward and backward directions
- not used as reference but fill in missing frames
- provide highest compression and don’t propagate coding errors
Correcting prediction
- find best prediction using block-matching algorithm to determine set of motion vectors
- calculate prediction error between estimated and actual object positions, transmit alongside motion vectors
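The two steps above can be sketched with an exhaustive-search block matcher; real encoders use faster search strategies and larger blocks, and all names and the window size here are mine:

```python
def best_match(ref, cur_block, top, left, search=2):
    """Exhaustive block matching: find the motion vector minimising the
    sum of absolute differences (SAD) within a small search window."""
    n = len(cur_block)
    best, best_sad = (0, 0), float('inf')
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + n > len(ref) or x + n > len(ref[0]):
                continue  # candidate block falls outside the reference frame
            sad = sum(abs(cur_block[i][j] - ref[y + i][x + j])
                      for i in range(n) for j in range(n))
            if sad < best_sad:
                best_sad, best = sad, (dy, dx)
    return best, best_sad

# reference frame: zeros except a 2x2 object at row 2, column 3
ref = [[0] * 6 for _ in range(6)]
ref[2][3], ref[2][4], ref[3][3], ref[3][4] = 9, 8, 7, 6
# the same object sits at row 1, column 2 in the current frame
cur_block = [[9, 8], [7, 6]]
mv, sad = best_match(ref, cur_block, 1, 2)
# motion vector (dy, dx) = (1, 1) points back to the object's old position;
# SAD = 0 means the prediction error to transmit is zero for this block
```

In a real encoder the residual `cur_block - predicted block` is what gets transmitted alongside `mv`, and the decoder uses both to compensate the prediction.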
Group of pictures
- used by MPEG to refer to particular combination of frames that represent sequence
- always start with reference I-frame
- defined by two parameters: N, the total number of frames in the GOP, and M, the number of adjacent B-frames plus one
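Assuming the common convention that N is the GOP length and M the spacing between reference frames (adjacent B-frames plus one), the frame pattern can be generated like this (a sketch, not any standard's reference code):

```python
def gop_pattern(n, m):
    """Build a GOP frame pattern from N (total frames) and M (B-frames
    between reference frames, plus one). Always starts with an I-frame."""
    frames = []
    for i in range(n):
        if i == 0:
            frames.append('I')        # reference I-frame opens the GOP
        elif i % m == 0:
            frames.append('P')        # every M-th frame is a reference
        else:
            frames.append('B')        # the rest are interpolated B-frames
    return ''.join(frames)

# a classic broadcast GOP: N = 12, M = 3
assert gop_pattern(12, 3) == 'IBBPBBPBBPBB'
```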
H.264/AVC
- supports high quality delivery of audio and video
- also low bit rate IP based streaming applications
- offers range of profiles
Switching P and I-frames (known as SP and SI)
- incorporated into GOP format
- designed to support efficient switching between bitstreams
Perceptual masking
- composition of sound can alter ear’s ability to perceive specific frequencies at specific amplitudes
- two types of masking: frequency masking and temporal masking
- together referred to as noise masking
Frequency masking
- arises because of inherent property of ear
- relatively loud sound at particular frequency reduces sensitivity to neighbouring frequencies
Temporal masking
- refers to fact perceptual hearing sensitivity to sounds in narrow frequency range reduced for short period
Speech coding methods
- waveform encoders: process the source data using either time- or frequency-domain techniques
- vocoders (voice encoders): formulate a mathematical model of the voice production process that can be represented by a small number of parameters
Linear predictive coding (LPC)
- estimates key speech production parameters relating to acoustics of vocal tract for both voiced and unvoiced signals
Code-excited linear prediction(CELP)
- not a coding algorithm per se
- grouping of low-bit rate speech-coding solutions that employ LPC as core compression model
- constructs a codebook of quantised excitation vectors, known as code words
- transmits model coefficients and gain to decoder and sends index pointer to one codebook entry as best excitation
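A toy illustration of the codebook-search idea. In real CELP each code word is filtered through the LPC synthesis model before comparison (analysis-by-synthesis); this sketch matches raw vectors only, and all names are mine:

```python
def best_codebook_entry(codebook, target):
    """Search the codebook for the excitation vector and gain that best
    match the target; only the index and gain need transmitting."""
    best_idx, best_gain, best_err = 0, 0.0, float('inf')
    for idx, cw in enumerate(codebook):
        energy = sum(x * x for x in cw)
        if energy == 0:
            continue
        # least-squares optimal gain for this code word
        gain = sum(x * t for x, t in zip(cw, target)) / energy
        err = sum((t - gain * x) ** 2 for x, t in zip(cw, target))
        if err < best_err:
            best_idx, best_gain, best_err = idx, gain, err
    return best_idx, best_gain

codebook = [[1, 0, 1, 0], [1, -1, 1, -1], [0, 1, 0, 1]]
target = [2, -2, 2, -2]           # twice the second code word
idx, gain = best_codebook_entry(codebook, target)
assert idx == 1 and abs(gain - 2.0) < 1e-9
```

The decoder holds the same codebook, so transmitting one index plus a gain replaces sending the whole excitation waveform, which is the source of CELP's low bit rate.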