Warning: This is a development version. The latest stable version is Version 7.0.0.

Content-Aware Coding

One of the key goals of Rely is to provide low-latency loss recovery using ECC/FEC algorithms for latency-sensitive applications. To achieve this goal in the best way, the ECC/FEC algorithm sometimes needs to be “aware” of the structure of the content it’s protecting.

Background

Let’s first consider how Rely works in the general case, i.e., without knowledge of the content structure. Rely is based on a sliding window ECC/FEC algorithm. The following figure illustrates how this works for an input stream of Source symbols (could be, e.g., a video stream).

../_images/rely_content_aware_background.svg

As can be seen, the coding window moves as packets are generated. Exactly how the coding window moves depends on the desired packet loss protection, the maximum latency allowed, etc. In the figure above we see that for every four source symbols a single repair packet is generated. That gives a repair rate of one out of every five symbols: 1 / 5 = 20%. The basic promise of the ECC/FEC algorithm is then that as long as no more than 20% of the symbols are lost during transmission, the original source symbols can be decoded - without the need for retransmissions.

Note

This is not fully accurate (we need to account for, e.g., the loss distribution, and the size of the coding window), but as a general rule-of-thumb we should be able to recover from most of the observed packet loss in this way.
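The repair rate arithmetic above can be written as a small helper. This is only a sketch: repair_rate() is a hypothetical free function, not part of the Rely API, and its parameter names simply mirror the repair_interval and repair_target settings used later on this page.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not a Rely function): the fraction of all transmitted
// symbols that are repair symbols when `repair_target` repair symbols are
// sent for every `repair_interval` source symbols.
double repair_rate(std::size_t repair_interval, std::size_t repair_target)
{
    assert(repair_interval > 0);
    return static_cast<double>(repair_target) /
           static_cast<double>(repair_target + repair_interval);
}
```

With the figure's configuration, repair_rate(4, 1) evaluates to 0.2, i.e., the 20% computed above.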

This works well for general traffic like IP packets where we just want to protect a generic network stream with packets coming from several independent applications (an example of this would be using Rely to increase reliability of a VPN type service).

However, let’s consider what can happen when we want to protect the traffic coming from a latency-sensitive application where the structure of the content is important.

Example: Video Streaming

Here we will use video streaming as an example, but in general this should hold true for most other types of content delivery where boundaries in the content are important.

The Problem

Let us consider a live video streaming service using H264 video compression. The typical frame rate for such an application is 30 fps (frames per second). This means that a frame is generated every 1/30 s ≈ 33 ms. The frames will have different sizes and therefore translate into a variable number of network packets. The reason H264 produces different sized frames is that the compression depends on the amount of activity in the video. If a lot is happening, the compression will be less effective and the frames will be larger. If very little changes from frame to frame, the compression can be very efficient and the frames become very small. In addition, H264 can be configured to regularly output what is called an I-frame, which contains all the information needed for a receiver to start decoding the video. I-frames are required for on-the-fly joining or to catch up after a loss of data, and they tend to be quite large.

Let’s look at an example output from an H264 video encoder:

../_images/video_encoder_symbols.svg

As can be seen, each source symbol belongs to a specific frame (indicated by its color).

Let’s imagine that we configure Rely to produce one repair packet for every five input packets. This would translate into the following network flow:

../_images/rely_video_encoding_problem.svg

The problem we can observe here is that the repair for frame 0 is not scheduled until after frame 1 is produced by the H264 video encoder. This means that if one of the packets from frame 0 is lost, we have to wait roughly 33 ms (one frame period) to get repair for it.

Of course, this problem gets even worse if the repair_interval is larger and the frame sizes are smaller. An example of this problem is illustrated here:

../_images/worse_rely_video_encoding_problem.svg

In addition, we may observe a related problem: since repair generation is unaware of the content structure, repair is sometimes generated that covers only part of a frame. Again, this means that if losses occur in unfortunate locations, we may have to wait for additional frames before a specific frame can be repaired.
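The scheduling effect described above can be reproduced with a small simulation. This is a sketch under stated assumptions, not Rely code: frames_waited() and not_repaired are hypothetical names, a repair symbol is assumed to be sent after every repair_interval source symbols regardless of frame boundaries, and every frame is assumed to contain at least one source symbol.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Marker for frames whose repair is not sent within the simulated input.
constexpr std::size_t not_repaired = static_cast<std::size_t>(-1);

// For each frame, returns how many later frames must be produced before the
// first repair symbol following the frame's last source symbol is sent.
// 0 means the repair is sent directly after the frame's last source symbol.
std::vector<std::size_t> frames_waited(const std::vector<std::size_t>& frame_sizes,
                                       std::size_t repair_interval)
{
    assert(repair_interval > 0);

    // last_symbol[f] is the 1-based index of the last source symbol of frame f
    std::vector<std::size_t> last_symbol;
    std::size_t total = 0;
    for (std::size_t size : frame_sizes)
    {
        assert(size > 0);
        total += size;
        last_symbol.push_back(total);
    }

    std::vector<std::size_t> waited;
    for (std::size_t f = 0; f < frame_sizes.size(); ++f)
    {
        // Index of the source symbol after which the next repair is sent:
        // the frame's last symbol index rounded up to a multiple of the interval
        std::size_t repair_at =
            ((last_symbol[f] + repair_interval - 1) / repair_interval) * repair_interval;

        // Find the frame that contains that source symbol
        std::size_t g = f;
        while (g < frame_sizes.size() && last_symbol[g] < repair_at)
        {
            ++g;
        }
        waited.push_back(g < frame_sizes.size() ? g - f : not_repaired);
    }
    return waited;
}
```

For the made-up frame sizes {3, 4, 2, 6} and a repair_interval of 5, the result is {1, 2, 1, 0}: the second frame has to wait two frame periods (about 66 ms at 30 fps) before a repair symbol is sent after its last source symbol.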

This is in fact not a new problem; we will often see it as well when using traditional block codes such as Reed-Solomon.

Rely’s Solutions

In order to avoid this problem using Rely we have two options to choose from:

  1. If we know the structure of the content, e.g., which network packets belong to which video frames, we can use “manual repair” via the rely::encoder::generate_repair() function.
  2. If we do not know the structure we can use the rely::encoder::flush_repair() approach.

Solution: Content Structure Known

If the content structure is known, it is possible to generate repair directly after, e.g., each video frame using the rely::encoder::generate_repair() function.

The following basic pseudo code shows what this could look like:

// Keep track of how many source symbols we will transmit - fragmentation
// may cause source_symbols > payloads

std::size_t source_symbols = 0

// First frame consists of 5 video packets
source_symbols += encoder.consume(payload #1)
source_symbols += encoder.consume(payload #2)
source_symbols += encoder.consume(payload #3)
source_symbols += encoder.consume(payload #4)
source_symbols += encoder.consume(payload #5)

// Produce two repair symbols
encoder.generate_repair(2)

In real applications we would typically calculate the amount of repair needed based on the number of source symbols and the desired level of protection. Using this approach we may also differentiate between video frame types, i.e., we may choose to protect more important video frames with more repair.
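One possible way to compute the argument for rely::encoder::generate_repair() is sketched below. Everything in it is an assumption made for illustration: frame_type, protection, protection_for() and repair_symbols() are hypothetical application-side helpers rather than Rely types, and the protection levels are arbitrary. The rounding rule is the integer form of ceil(source_symbols / (1 - repair_rate)) - source_symbols with repair_rate = repair_target / (repair_target + repair_interval), which simplifies to ceil(source_symbols * repair_target / repair_interval).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical frame classification kept by the application.
enum class frame_type { key_frame, delta_frame };

// Desired protection: `repair_target` repair symbols for every
// `repair_interval` source symbols.
struct protection
{
    std::size_t repair_interval;
    std::size_t repair_target;
};

// Illustrative policy only: key frames get twice the repair of delta frames.
protection protection_for(frame_type type)
{
    return type == frame_type::key_frame ? protection{4, 2} : protection{4, 1};
}

// Number of repair symbols needed for `source_symbols` source symbols,
// rounded up so that a partially filled interval is still protected.
std::size_t repair_symbols(std::size_t source_symbols, protection p)
{
    assert(p.repair_interval > 0);
    return (source_symbols * p.repair_target + p.repair_interval - 1) /
           p.repair_interval;
}
```

For the five source symbols in the pseudo code above, repair_symbols(5, protection_for(frame_type::delta_frame)) returns 2, which matches the value passed to generate_repair() there. The same frame treated as a key frame would get 3 repair symbols.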

Solution: Content Structure Unknown

If the content structure is unknown it will be very difficult to generate repair exactly at the content boundaries. Instead we may use the “automatic repair” mode combined with the rely::encoder::flush_repair() functionality to ensure that repair is generated for any unprotected source symbols currently stored within the encoder.

Note

We call a source symbol unprotected if it has not yet been part of any repair symbols.

The basic idea would be to combine rely::encoder::flush_repair() with a short idle timeout to avoid having unprotected symbols waiting in the encoder.

The following basic pseudo code shows what this could look like:

// We consume some payloads
encoder.consume(payload #1)
encoder.consume(payload #2)
encoder.consume(payload #3)
encoder.consume(payload #4)
encoder.consume(payload #5)

// If no other data arrives we may have a timer fire after, e.g.,
// 5 ms of idle time and call flush_repair to ensure that any
// unprotected symbols have repair generated for them
encoder.flush_repair()
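The idle timeout itself is left to the application. A minimal sketch of the bookkeeping is shown below, assuming a polling design where the caller passes in the current time. idle_flush_timer and its member functions are hypothetical names and not part of Rely. The class only decides when flushing is due; the actual calls to consume() and flush_repair() stay with the caller.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper (not part of Rely): remembers when data was last
// handed to the encoder and reports when the idle timeout has expired.
class idle_flush_timer
{
public:
    using clock = std::chrono::steady_clock;

    explicit idle_flush_timer(std::chrono::milliseconds timeout) :
        m_timeout(timeout)
    {
    }

    // Call right after every encoder.consume(...)
    void on_consume(clock::time_point now)
    {
        m_last_consume = now;
        m_pending = true;
    }

    // Call periodically. Returns true at most once per idle period; the
    // caller would then invoke encoder.flush_repair().
    bool should_flush(clock::time_point now)
    {
        if (!m_pending || now - m_last_consume < m_timeout)
        {
            return false;
        }
        m_pending = false;
        return true;
    }

private:
    std::chrono::milliseconds m_timeout;
    clock::time_point m_last_consume{};
    bool m_pending = false;
};
```

Passing the time in explicitly keeps the helper independent of any particular event loop and makes it easy to test. In an application the caller would typically pass idle_flush_timer::clock::now().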

Let’s look at the effect of using this mechanism on the repair_rate and therefore also on the bandwidth consumption.

Content smaller than the repair_interval

Let’s look at an example where we always call rely::encoder::flush_repair() after consuming a video frame:

repair_interval = 4
repair_target = 1

repair_rate = 1 / (1 + 4) = 20%

If the number of source symbols in the video frame equals 3, then the repair generated would be:

repair_generated = ceil(3 / (1 - 0.2)) - 3 = 1

The actual repair rate would in this case be:

1 / (1 + 3) = 25%.

This is slightly larger than our target of 20%.

Content Larger or Equal to the repair_interval

If more symbols are added than the size of the repair_interval, we may encode as usual, but use rely::encoder::flush_repair() to ensure that repair is also generated when the number of source symbols is not a multiple of the repair_interval.

Let’s again take an example with the same setup as before:

repair_interval = 4
repair_target = 1

repair_rate = 1 / (1 + 4) = 20%

If our input video frame covers five source symbols we will generate repair twice:

  1. First time when the repair_interval is filled.
  2. Second time after consuming the last source symbol from the video frame.

The first time we generate:

repair_generated = ceil(4 / (1 - 0.2)) - 4 = 1

The second time we generate:

repair_generated = ceil(1 / (1 - 0.2)) - 1 = 1

For the entire frame we have two repair packets generated for the five source symbols, so our actual repair_rate will be:

2 / (2 + 5) ≈ 28.6%.
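The arithmetic of both examples can be captured in one function, which also makes it easy to see how the overhead shrinks for larger frames. This sketch models the calculations on this page and is not Rely's implementation: flushed_frame() and repair_stats are hypothetical names, and each frame is assumed to start on a fresh repair_interval because the previous frame ended with a flush.

```cpp
#include <cassert>
#include <cstddef>

// Result of protecting a single frame (hypothetical type).
struct repair_stats
{
    std::size_t repair_symbols;
    double repair_rate;
};

// Repair produced for one frame when flush_repair() is called after the
// frame's last source symbol.
repair_stats flushed_frame(std::size_t frame_symbols,
                           std::size_t repair_interval,
                           std::size_t repair_target)
{
    assert(frame_symbols > 0 && repair_interval > 0);

    // Every completely filled interval yields `repair_target` repair symbols
    std::size_t full_intervals = frame_symbols / repair_interval;
    std::size_t remainder = frame_symbols % repair_interval;
    std::size_t repair = full_intervals * repair_target;

    // The flush covers the remainder: ceil(remainder * target / interval),
    // the integer form of ceil(remainder / (1 - rate)) - remainder
    repair += (remainder * repair_target + repair_interval - 1) / repair_interval;

    double rate = static_cast<double>(repair) /
                  static_cast<double>(repair + frame_symbols);
    return {repair, rate};
}
```

With repair_interval = 4 and repair_target = 1, a frame of 3 symbols gives 1 repair symbol (25%), a frame of 5 symbols gives 2 (about 28.6%), and a frame of 41 symbols gives 11 (about 21.2%), which is already close to the 20% target.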

Considerations and Observations

As seen in the two examples above, we may avoid “waiting” for data by using Rely’s rely::encoder::generate_repair() or rely::encoder::flush_repair() functionality. However, as also shown, this may cause the bandwidth usage of the application to increase.

So before considering this as a solution, a few observations and considerations should be made.

  1. If latency is more important than bandwidth usage (and bandwidth is available), this may be a good solution.
  2. If the video bitrate is high, i.e., the source symbols of a single video frame span multiple repair_intervals, then the overhead of flushing repair on the last interval is amortized, and this may therefore be a good solution.
  3. If the incoming rate of source symbols is high (e.g., due to multiplexing of several streams), flush repair will only be needed infrequently, and it may therefore be a good solution for cases where the input rate temporarily drops.