Lesson 11: Applications - Video Flashcards
Compare the bit rate for video, photos, and audio.
Video is the highest: 100 kbps to over 3 Mbps
Audio: 128 kbps
Photos: 320 kbps
What are the characteristics of streaming stored video?
- it starts playing shortly after data begins arriving, before the whole video has downloaded
- interactive, as users expect to control the playback (pause, skip, rewind)
- continuous playback: once playout begins, it should not stall
What are the characteristics of streaming live audio and video?
- many simultaneous users
- delay-sensitive
What are the characteristics of conversational voice and video over IP?
- delay-sensitive
- loss-tolerant: occasional packet loss causes only minor audio/video glitches
How does the encoding of analog audio work (in simple terms)?
Audio is encoded by taking many (as in, thousands of) samples per second, and then rounding each sample’s value to a discrete number within a particular range. (This “rounding” to a discrete number is called quantization.)
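A minimal sketch of this sampling-and-quantization idea; the 8 kHz sample rate and 8-bit depth are the classic telephony PCM parameters, which together give the familiar 64 kbps rate:

```python
import math

SAMPLE_RATE = 8000   # samples per second (classic telephony rate)
BITS = 8             # quantization bit depth, giving 2**8 = 256 discrete levels
LEVELS = 2 ** BITS

def quantize(sample):
    """Round an analog sample in [-1.0, 1.0] to one of LEVELS discrete values."""
    # Map [-1.0, 1.0] onto integer levels 0..LEVELS-1, rounding to the nearest level.
    return round((sample + 1.0) / 2.0 * (LEVELS - 1))

# Take one second's worth of samples of a 440 Hz tone and quantize each.
encoded = [quantize(math.sin(2 * math.pi * 440 * t / SAMPLE_RATE))
           for t in range(SAMPLE_RATE)]

# Resulting bit rate: 8000 samples/s * 8 bits = 64,000 bps (64 kbps PCM).
print(f"bit rate: {SAMPLE_RATE * BITS} bps")
```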
What are the three major categories of VoIP encoding schemes?
narrowband, broadband and multimode
What are the functions that signaling protocols are responsible for?
1) User location - the caller locating where the callee is.
2) Session establishment - handling the callee accepting, rejecting, or redirecting a call.
3) Session negotiation - the endpoints synchronizing with each other on a set of properties for the session.
4) Call participation management - handling endpoints joining or leaving an existing session.
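SIP is the canonical example of such a signaling protocol. The sketch below is a hypothetical toy model (not a real protocol implementation) showing where each of the four functions fits:

```python
class SignalingSession:
    """Toy model of the four signaling functions (illustrative only)."""

    def __init__(self, registrar):
        self.registrar = registrar   # maps user -> current network address
        self.participants = set()
        self.codec = None

    def locate(self, callee):
        # 1) User location: resolve where the callee can currently be reached.
        return self.registrar[callee]

    def establish(self, caller, callee, accepted):
        # 2) Session establishment: callee accepts, rejects, or redirects the call.
        if accepted:
            self.participants.update({caller, callee})
        return accepted

    def negotiate(self, offered_codecs, supported_codecs):
        # 3) Session negotiation: endpoints agree on session properties (e.g., codec).
        self.codec = next((c for c in offered_codecs if c in supported_codecs), None)

    def leave(self, user):
        # 4) Call participation management: endpoints join or leave the session.
        self.participants.discard(user)
```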
What are three QoS VoIP metrics?
- end-to-end delay
- jitter
- packet loss
What kind of delays are included in “end-to-end delay”?
- the time it takes to encode the audio (which we discussed earlier),
- the time it takes to put it in packets,
- all the normal sources of network delay that network traffic encounters, such as queueing, propagation, and transmission delays,
- “playback delay,” which comes from the receiver’s playback buffer (which is a mitigation technique for delay jitter, which we’ll be discussing next),
- and decoding delay, which is the time it takes to reconstruct the signal.
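For example (all numbers invented for illustration): 20 ms encoding + 20 ms packetization + 80 ms network delay + 60 ms playout buffering + 10 ms decoding = 190 ms end-to-end delay.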
How does “delay jitter” occur?
Between all the different buffer sizes and queueing delays and network congestion levels that a packet might experience, different voice packets can end up with different amounts of delay. One voice packet may be delayed by 100ms, and another by 300ms. We call this phenomenon “jitter,” “packet jitter,” or “delay jitter.”
What are the mitigation techniques for delay jitter?
The main VoIP application mechanism for mitigating jitter is maintaining a buffer, called the “jitter buffer” or the “play-out buffer.”
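A minimal sketch of such a buffer, assuming a fixed playout delay and packets that carry a sender timestamp; the field names and the 150 ms delay are invented for illustration:

```python
import heapq

PLAYOUT_DELAY = 0.150  # seconds; fixed playout delay chosen to absorb jitter

class JitterBuffer:
    """Hold arriving voice packets and release each at timestamp + PLAYOUT_DELAY."""

    def __init__(self):
        self._heap = []  # min-heap ordered by scheduled playout time

    def on_arrival(self, packet):
        # Schedule each packet relative to when it was *generated*, not when it
        # arrived, so variable network delay (jitter) is smoothed out.
        playout_time = packet["timestamp"] + PLAYOUT_DELAY
        heapq.heappush(self._heap, (playout_time, packet["seq"], packet))

    def pop_due(self, now):
        # Release every packet whose playout time has been reached. A packet
        # arriving after its scheduled playout time is effectively lost.
        due = []
        while self._heap and self._heap[0][0] <= now:
            due.append(heapq.heappop(self._heap)[2])
        return due
```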
Compare the three major methods for dealing with packet loss in VoIP protocols.
- FEC (Forward Error Correction): works by transmitting redundant data alongside the main transmission, which allows the receiver to replace lost data with the redundant data
- Interleaving: Interleaving works by mixing chunks of audio together so that if one set of chunks is lost, the lost chunks aren’t consecutive. The idea is that many smaller audio gaps are preferable to one large audio gap.
- Error concealment: basically “guessing” what the lost audio packet might be.
How does FEC (Forward Error Correction) deal with the packet loss in VoIP? What are the tradeoffs of FEC?
works by transmitting redundant data alongside the main transmission, which allows the receiver to replace lost data with the redundant data
Tradeoffs: higher bandwidth usage due to transmitting redundant data. Also, some FEC techniques require the receiving end to collect more chunks before playing out the audio, which increases playout delay.
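One simple FEC scheme sends, after every group of n chunks, a redundant chunk equal to the XOR of the group; if any single chunk in the group is lost, the receiver can rebuild it, at the cost of 1/n extra bandwidth. A minimal sketch:

```python
def xor_chunks(chunks):
    """XOR equal-length byte chunks together."""
    out = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, b in enumerate(chunk):
            out[i] ^= b
    return bytes(out)

def send_group(chunks):
    # Transmit the n data chunks plus one redundant XOR chunk (n + 1 total).
    return list(chunks) + [xor_chunks(chunks)]

def recover(surviving):
    # XOR of all n surviving chunks in the group (out of n + 1 sent)
    # reconstructs the single missing chunk.
    return xor_chunks(surviving)

group = send_group([b"aaaa", b"bbbb", b"cccc"])  # 3 data chunks + 1 XOR chunk
lost = group.pop(1)                              # the network drops one chunk
assert recover(group) == lost                    # the receiver rebuilds it
```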
How does interleaving deal with the packet loss in VoIP/streaming stored audio? What are the tradeoffs of interleaving?
Interleaving works by mixing chunks of audio together so that if one set of chunks is lost, the lost chunks aren’t consecutive. The idea is that many smaller audio gaps are preferable to one large audio gap.
The tradeoff for interleaving is that the receiving side has to wait longer to receive consecutive chunks of audio, and that increases latency. Unfortunately, that means this technique is limited in usefulness for VoIP, although it can have good performance for streaming stored audio.
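A minimal sketch, assuming each packet carries several short audio units taken at a fixed stride so that consecutive units never share a packet; the unit and packet sizes are invented for illustration:

```python
UNITS_PER_PACKET = 4  # e.g., four 5 ms units per 20 ms packet

def interleave(units):
    """Spread consecutive audio units across packets with a fixed stride."""
    assert len(units) % UNITS_PER_PACKET == 0
    stride = len(units) // UNITS_PER_PACKET
    # Packet k carries units k, k+stride, k+2*stride, ... so consecutive
    # units never travel together.
    return [[units[k + j * stride] for j in range(UNITS_PER_PACKET)]
            for k in range(stride)]

def deinterleave(packets):
    """Invert interleave(); a lost packet (None) becomes scattered short gaps."""
    stride = len(packets)
    return [packets[k][j] if packets[k] is not None else None
            for j in range(UNITS_PER_PACKET) for k in range(stride)]

packets = interleave(list(range(16)))
packets[1] = None  # lose one packet: units 1, 5, 9, 13 vanish, non-consecutively
print(deinterleave(packets))
```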
How does the error concealment technique deal with the packet loss in VoIP?
basically “guessing” what the lost audio packet might be.
Tradeoffs: the replacement strategy (repeating the previous packet) is computationally cheap, while interpolation (estimating the lost audio from the surrounding packets) is more expensive but performs better.
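Minimal sketches of both strategies, assuming audio chunks are lists of numeric samples and the lost chunk's neighbors arrived intact:

```python
def conceal_by_repetition(chunks, lost):
    # Cheap strategy: replay the most recent chunk that did arrive.
    return chunks[lost - 1]

def conceal_by_interpolation(chunks, lost):
    # Costlier strategy: estimate the lost chunk from its neighbors,
    # here by simply averaging the surrounding samples pairwise.
    before, after = chunks[lost - 1], chunks[lost + 1]
    return [(a + b) / 2 for a, b in zip(before, after)]
```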
What developments lead to the popularity of consuming media content over the Internet?
- the bandwidth of both the core network and last-mile access links has increased tremendously over the years.
- video compression technologies have become more efficient, which makes it possible to stream high-quality video without using a lot of bandwidth.
- the development of Digital Rights Management (DRM) culture has encouraged content providers to put their content on the Internet.
Provide a high-level overview of adaptive video streaming.
- content is created
- compressed using an encoding algorithm
- secured using DRM
- hosted on a server
- streamed to clients, which adapt the quality they request to their network conditions (see the sketch below)
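On the playback side, a client typically measures its recent download throughput and picks the highest encoding it can sustain. A minimal sketch of throughput-based bitrate selection; the bitrate ladder and safety factor are invented for illustration:

```python
# Illustrative bitrate ladder (kbps); real services publish these in a manifest.
LADDER = [300, 750, 1500, 3000]

def pick_bitrate(measured_throughput_kbps, safety=0.8):
    """Throughput-based adaptation: pick the highest rung the link can sustain.

    `safety` leaves headroom so throughput dips don't immediately stall playback.
    """
    budget = measured_throughput_kbps * safety
    candidates = [rate for rate in LADDER if rate <= budget]
    return candidates[-1] if candidates else LADDER[0]

# e.g., measured 2000 kbps -> budget 1600 kbps -> choose the 1500 kbps encoding
print(pick_bitrate(2000))
```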
(Optional) What are two ways to achieve efficient video compression?
i) within an image – pixels that are nearby in a picture tend to be similar – known as spatial redundancy, and
ii) across images – in a continuous scene, consecutive pictures are similar – known as temporal redundancy.
(Optional) What are the four steps of JPEG compression?
- Transform the image into luminance and chrominance (color) components.
- Separate the components into matrices and apply the discrete cosine transform (DCT) to move them into the frequency domain.
- Compress the matrices by quantizing the DCT coefficients using a pre-defined quantization table (the lossy step).
- Losslessly encode the quantized coefficients for storage.
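A minimal sketch of steps 2-3 on a single 8x8 block using SciPy's DCT. The uniform quantization table here is a stand-in (real JPEG uses a standard non-uniform luminance table with larger steps at higher frequencies), so treat the numbers as illustrative:

```python
import numpy as np
from scipy.fft import dctn, idctn

# Stand-in quantization table: uniform steps keep the sketch short.
Q = np.full((8, 8), 16)

def jpeg_block_roundtrip(block):
    """Steps 2-3 of JPEG on one 8x8 block: DCT, then lossy quantization."""
    coeffs = dctn(block - 128, norm="ortho")      # zero-center, then 2-D DCT
    quantized = np.round(coeffs / Q).astype(int)  # the lossy step: round away detail
    # (Step 4 would entropy-code `quantized` losslessly; here we just invert.)
    return idctn(quantized * Q, norm="ortho") + 128

block = np.random.randint(0, 256, (8, 8)).astype(float)
# Reconstruction error is small relative to the 0-255 pixel range.
print(np.abs(jpeg_block_roundtrip(block) - block).max())
```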
(Optional) Explain video compression and temporal redundancy using I-, B-, and P-frames.
Consecutive frames in a video are similar to each other. Thus, instead of encoding every image as a JPEG, we encode the first (known as an I-frame) as a JPEG and then encode only the difference between consecutive frames. Such an in-between, difference-encoded frame is known as a Predicted or P-frame. An additional way to improve encoding efficiency is to encode a frame as a function of both past and future I- (or P-) frames. Such a frame is known as a Bi-directional or B-frame. (A toy sketch follows.)
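A toy sketch of the I-/P-frame idea, storing the first frame whole and each later frame as a raw difference from its predecessor (real codecs use motion-compensated prediction, not plain subtraction); frames are assumed to be NumPy arrays:

```python
import numpy as np

def encode(frames):
    """First frame stored whole (I-frame); later frames stored as diffs (P-frames)."""
    stream = [("I", frames[0])]
    for prev, cur in zip(frames, frames[1:]):
        stream.append(("P", cur - prev))  # mostly zeros when the scene barely changes
    return stream

def decode(stream):
    frames, last = [], None
    for kind, data in stream:
        last = data if kind == "I" else last + data
        frames.append(last)
    return frames

frames = [np.zeros((4, 4), int), np.ones((4, 4), int), np.ones((4, 4), int)]
assert all((a == b).all() for a, b in zip(decode(encode(frames)), frames))
```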
(Optional) Why is video compression unable to use P-frames all the time?
- when a scene changes, the frame becomes drastically different from the previous frame, and even if we encode only the difference, it is almost like encoding a fresh image. This can lead to a loss in image quality.
- it also increases decoding time when a user skips ahead in the video, since the player must download and decode every frame between the last I-frame and the current frame.
(Optional) What is the difference between constant bitrate encoding and variable bitrate encoding (CBR vs VBR)?
CBR: the output bit rate of the video is fixed over time
VBR: the output bit rate stays the same on average, but varies from moment to moment based on the complexity of the underlying scene
Which protocol is preferred for video content delivery - UDP or TCP? Why?
TCP is preferred for video delivery, as it provides reliability: lost packets are retransmitted rather than causing permanent glitches.
What was the original vision of the application-level protocol for video content delivery and why was HTTP chosen eventually?
The original vision was to have specialized video servers that remembered the state of the clients. These servers would control the sending rate to the client. If the client paused the video, it would send a signal to the server, and the server would stop sending video. Thus, all the intelligence would live at a centralized point, and the clients, which can be quite diverse, would have to do a minimal amount of work. However, all this required content providers to buy specialized hardware.
HTTP was eventually chosen instead. A major advantage of HTTP is that content providers could use the already existing CDN infrastructure. Moreover, it also made traversing middleboxes and firewalls easier, as they already understood HTTP.