One of the key goals of Rely is to provide low-latency loss recovery using ECC/FEC algorithms for latency-sensitive applications. To achieve this goal, we sometimes need the ECC/FEC algorithm to be "aware" of the structure of the content we are protecting.
Before we get to that part, though, let's first consider how Rely works in the general case. Rely is based on a sliding-window ECC/FEC algorithm. The following figure illustrates how this works for an input stream of source symbols (which could be, e.g., a video stream).
As can be seen, the coding window moves as packets are being generated. Exactly how the coding window moves depends on the desired packet-loss protection, the maximum latency allowed, etc. In the figure above we see that for every four source symbols a single repair packet is generated. That gives a repair rate of one out of every five symbols: 1 / 5 = 20%. The basic promise of the ECC/FEC algorithm is then that as long as no more than 20% of the symbols are lost during transmission, we will be able to decode the original source symbols - without the need for retransmissions. This is not totally accurate (we need to account for the loss distribution and the size of the coding window), but as a general rule of thumb we should be able to recover from most of the observed packet loss in this way.
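The relationship between the repair interval and the resulting repair rate can be sketched in a few lines. Note that `repair_rate` below is a hypothetical helper written for this illustration, not part of the Rely API; it only mirrors the arithmetic described above.

```python
# Hypothetical helper (not the Rely API): the nominal repair rate implied by
# generating `repair_target` repair packets per `repair_interval` source packets.
def repair_rate(repair_interval: int, repair_target: int) -> float:
    """Fraction of all transmitted packets that are repair packets."""
    return repair_target / (repair_target + repair_interval)

# One repair packet for every four source packets, as in the figure:
print(repair_rate(4, 1))  # 0.2, i.e. 20%
```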
This works well for general traffic, such as IP packets, where we just want to protect a generic network stream with packets coming from several independent applications (an example of this would be using Rely to increase the reliability of a VPN-type service).
However, let's consider what can happen when we want to protect the traffic coming from a latency-sensitive application where the structure of the content is important.
Here we will use video streaming as an example, but in general this should hold true for most other types of content delivery where boundaries in the content are important.
Let us consider a live video streaming service using H264 video compression. The typical frame rate for such an application is 30 fps (frames per second). This means that a frame is generated every 1/30 s ≈ 33 ms. The frames will have different sizes and therefore translate into a variable number of network packets. The reason H264 produces different-sized frames is that the compression depends on the amount of activity in the video. If lots of stuff is happening, the compression will be lower and the frames will be larger; if only very little is changing from frame to frame, the compression can be very efficient and the frames very small. In addition, H264 can be configured to regularly output what is called an I-frame, which contains all the information needed for a receiver to start decoding the video and is required for on-the-fly joining or for catching up after a loss of data. These frames tend to be quite large.
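The frame period above follows directly from the frame rate; a quick check of the arithmetic (assuming a fixed 30 fps):

```python
# At 30 fps a new frame is produced every 1000 / 30 ms, i.e. roughly 33 ms.
fps = 30
frame_period_ms = 1000 / fps
print(round(frame_period_ms))  # 33
```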
Let's look at an example output from an H264 video encoder:
As can be seen, each source symbol belongs to a specific frame (indicated by its color).
Let's imagine that we configure Rely to produce one repair packet for every five input packets. This would translate into the following network flow:
The problem we can observe here is that the repair for frame 0 is not scheduled until after frame 1 is produced by the H264 video encoder. This means that if one of the packets from frame 0 was lost, we would have to wait 33 ms to get repair for it.
Of course, this problem gets even worse if the repair_interval is large and the frame sizes are small. An example of this problem is illustrated here:
In addition, we may observe a related problem: since repair is unaware of the content structure, sometimes repair is generated which only covers part of the content. Again, this means that if losses occur in unfortunate locations, we may have to wait additional frames in order to repair a specific frame.
This is in fact not a new problem, but one that we will also often see when using traditional block codes such as Reed-Solomon.
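The scheduling problem can be reproduced with a simplified model of interval-based repair. This is an illustration only, not the Rely implementation: `transmit` is an invented helper that emits one repair packet after every `repair_interval` source packets, regardless of frame boundaries.

```python
def transmit(frame_sizes, repair_interval):
    """Model of content-unaware repair scheduling: returns the transmitted
    packet sequence, with 'S<frame>' for source packets and 'R' for a
    repair packet emitted after every `repair_interval` source packets."""
    stream, since_repair = [], 0
    for frame, size in enumerate(frame_sizes):
        for _ in range(size):
            stream.append(f"S{frame}")
            since_repair += 1
            if since_repair == repair_interval:
                stream.append("R")
                since_repair = 0
    return stream

# Two frames of 4 packets each, one repair per 5 source packets.
# The first repair covering frame 0 only goes out after frame 1 has
# started, i.e. one frame period (~33 ms at 30 fps) later:
print(transmit([4, 4], 5))
# ['S0', 'S0', 'S0', 'S0', 'S1', 'R', 'S1', 'S1', 'S1']
```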
In Rely this problem can mostly be addressed by specifying a suitable timeout for the symbols in the encoder; see rely::encoder::configure(). In that case, rely::encoder::upcoming_flush() will indicate that the encoder has source symbols that are about to expire and that we therefore need to run rely::encoder::flush().
To avoid this problem, Rely lets us ensure that repair is generated for any unprotected source symbols currently stored within the encoder.
Note
We call a source symbol unprotected if it has not yet been part of any repair symbols.
This is done by calling rely::encoder::flush_repair().
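To illustrate the idea, here is a toy model of the unprotected-symbol bookkeeping. `ToyEncoder` and its methods are invented for this sketch and only mimic the behavior described above; they are not the Rely API.

```python
class ToyEncoder:
    """Toy model (not the Rely API): tracks symbols that have not yet been
    covered by any repair symbol, and lets us flush repair for them."""

    def __init__(self, repair_interval):
        self.repair_interval = repair_interval
        self.unprotected = 0  # symbols not yet part of any repair symbol

    def consume(self, symbol_count):
        """Add source symbols, emitting interval-based repair as usual.
        Returns the number of repair packets generated."""
        repairs = 0
        for _ in range(symbol_count):
            self.unprotected += 1
            if self.unprotected == self.repair_interval:
                repairs += 1
                self.unprotected = 0
        return repairs

    def flush_repair(self):
        """Emit repair covering any remaining unprotected symbols.
        Returns the number of repair packets generated (0 or 1 here)."""
        if self.unprotected > 0:
            self.unprotected = 0
            return 1
        return 0

enc = ToyEncoder(repair_interval=4)
print(enc.consume(3))      # 0  (interval not yet filled, 3 symbols unprotected)
print(enc.flush_repair())  # 1  (repair generated for the 3 unprotected symbols)
```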
Let's look at the effect of using this mechanism on the repair_rate and therefore also on the bandwidth consumption.
Fewer symbols than the repair_interval
Let's look at an example where we always call rely::encoder::flush_repair() after consuming a video frame:
repair_interval = 4
repair_target = 1
repair_rate = 1 / (1 + 4) = 20%
If the number of source symbols in the video frame equals 3, then the repair generated would be:
repair_generated = ceil(3 / (1 - 0.2)) - 3 = 1
The actual repair rate would then in this case be:
1 / (1 + 3) = 25%
which is slightly larger than our target of 20%.
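The arithmetic in this example can be checked directly; the lines below simply mirror the formulas given above in plain Python:

```python
import math

# Target: repair_interval = 4, repair_target = 1, so a 20% repair rate.
target_rate = 1 / (1 + 4)                                   # 0.2

# Flushing a 3-symbol frame forces one repair packet:
repair_generated = math.ceil(3 / (1 - target_rate)) - 3     # 1

# The actual repair rate for this frame:
actual_rate = repair_generated / (repair_generated + 3)     # 0.25
print(repair_generated, actual_rate)  # 1 0.25
```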
More symbols than the repair_interval
If more symbols are added than the size of the repair_interval, we may encode as usual, but use rely::encoder::flush_repair() to ensure that repair is also generated even if the number of source symbols does not evenly divide the repair_interval.
Let’s take again an example with the same setup as before:
repair_interval = 4
repair_target = 1
repair_rate = 1 / (1 + 4) = 20%
If our input video frame covers five source symbols, we will generate repair twice: once when the repair_interval is filled, and once when we flush at the end of the frame.
The first time we generate:
repair_generated = ceil(4 / (1 - 0.2)) - 4 = 1
The second time we generate:
repair_generated = ceil(1 / (1 - 0.2)) - 1 = 1
For the entire frame we have two repair packets generated for the five source symbols, so our actual repair_rate will be:
2 / (2 + 5) ≈ 28%
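As before, the arithmetic can be checked directly; the code below mirrors the two repair computations and the resulting rate:

```python
import math

# Target: repair_interval = 4, repair_target = 1, so a 20% repair rate.
target_rate = 1 / (1 + 4)                         # 0.2

# First repair: generated when the 4-symbol repair_interval is filled.
first = math.ceil(4 / (1 - target_rate)) - 4      # 1

# Second repair: flushing the one remaining symbol of the 5-symbol frame.
second = math.ceil(1 / (1 - target_rate)) - 1     # 1

# Actual repair rate for the whole frame: 2 repair packets per 5 source packets.
actual = (first + second) / (first + second + 5)
print(first, second, round(actual, 3))  # 1 1 0.286
```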
As seen in the two examples above, we may avoid "waiting" for data by using Rely's rely::encoder::flush_repair() functionality. However, as is also shown, this may cause the bandwidth usage of the application to increase.
So before considering this as a solution, a few observations and considerations should be made.
If the video frames are large compared to the repair_interval, the overhead of flushing repair on the last interval may be amortized and therefore present a good solution.
From an implementation point of view, there are multiple ways we can use rely::encoder::flush_repair():