Warning: This is a development version. The latest stable version is Version 7.0.0.
Rely aims to provide an FEC/ECC component suitable for low-latency applications. To achieve this, time has to be handled at both the encoder and the decoder. In the following, we walk through how Rely does this:
Jump straight to Timeout Summary (TL;DR) for a summary and basic suggestions for the timeout configuration.
On the encoder side, the following questions must be answered:
On the decoder side, the questions are a bit different.
Many different answers can be given to these seemingly simple questions. To avoid complex solutions, such as time synchronization between devices and network latency estimation, Rely’s timeout mechanism is based purely on the arrival time of source symbols.
Let’s look at what this means at both the encoder and the decoder.
At the encoder, received payloads may be fragmented into one or more source symbols. Each source symbol is time-stamped with the current time. The current time is provided via the now parameter in
Internally, this builds a timeline in which each source symbol has a timestamp. Together with the timeout value passed to rely::encoder::configure(), this allows Rely to answer both questions 1 and 2 above.
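As a rough illustration of this timeline, the sketch below stamps each symbol with a time in milliseconds and derives its expiry as timestamp plus timeout. `SourceSymbol`, `expires_at` and `is_expired` are hypothetical names, not Rely's API, and treating a symbol as expired once `now` reaches its expiry time is an assumption.

```python
from dataclasses import dataclass


@dataclass
class SourceSymbol:
    """A source symbol time-stamped (in ms) when the encoder created it."""

    index: int
    timestamp_ms: int

    def expires_at(self, timeout_ms: int) -> int:
        # The symbol stops being useful `timeout_ms` after its timestamp.
        return self.timestamp_ms + timeout_ms

    def is_expired(self, now_ms: int, timeout_ms: int) -> bool:
        # Assumption: a symbol counts as expired once `now` reaches its expiry time.
        return now_ms >= self.expires_at(timeout_ms)


# Usage: six symbols arriving 10 ms apart, with a hypothetical 100 ms timeout.
timeline = [SourceSymbol(index=i, timestamp_ms=10 * (i - 1)) for i in range(1, 7)]
print([s.index for s in timeline if not s.is_expired(now_ms=105, timeout_ms=100)])
# [2, 3, 4, 5, 6]
```

Only the arrival time at the encoder enters the calculation; no clock synchronization or latency estimate is needed.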
Let's look at an example where the Rely encoder receives six source symbols.
In this case, repair_interval = 4 and repair_target = 1, i.e., Rely will produce one repair packet for every four source symbols.
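The effect of these two parameters can be mimicked with a small simulation. This is a simplified model, not Rely's scheduler: `protection_state` is a hypothetical helper, and it only tracks which symbols count as protected, not which symbols each repair packet actually covers.

```python
def protection_state(num_sources, repair_interval, repair_target):
    """Simulate interval-based repair and report what is still unprotected.

    After every `repair_interval` source symbols the encoder emits
    `repair_target` repair packets; at that point every symbol seen so far
    counts as protected. Symbols after the last full interval stay unprotected.
    """
    repairs = []
    unprotected = []
    for i in range(1, num_sources + 1):
        unprotected.append(f"s{i}")
        if i % repair_interval == 0:
            for _ in range(repair_target):
                repairs.append(f"r{len(repairs) + 1}")
            unprotected.clear()
    return repairs, unprotected


print(protection_state(6, repair_interval=4, repair_target=1))
# (['r1'], ['s5', 's6'])
```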
If you are using “manual repair” (see Rely’s Two Modes), flushing on the encoder is not enabled. However, the timeout calculation, i.e., the expiry time of individual source symbols, remains the same.
As shown, the first four source symbols are protected by r1, whereas the last two symbols, s5 and s6, are still unprotected.
To ensure that no symbols expire without being protected, Rely calculates a flush time based on the oldest unprotected source symbol, in this case s5.
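The flush deadline can be sketched as follows, assuming it is simply the expiry time of the oldest unprotected symbol. `flush_time` is a hypothetical helper and the timestamps in the usage line are made up for illustration.

```python
from typing import Optional, Sequence


def flush_time(unprotected_timestamps_ms: Sequence[int], timeout_ms: int) -> Optional[int]:
    """Deadline by which repair must be generated for the unprotected symbols.

    The deadline is set by the oldest unprotected symbol: once it expires,
    repair for it no longer helps. Returns None when nothing is unprotected.
    """
    if not unprotected_timestamps_ms:
        return None
    return min(unprotected_timestamps_ms) + timeout_ms


# Hypothetical timestamps for s5 and s6 (40 ms and 50 ms) and a 100 ms timeout.
print(flush_time([40, 50], timeout_ms=100))  # 140
```

Taking the minimum timestamp means a later symbol such as s6 never pushes the deadline out; the oldest symbol always dictates it.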
Three things can happen after this point:
Case 1: No more source symbols are added.
In this case, the Rely flush timeout should trigger and generate repair for the unprotected source symbols. This ensures that the repair requirement is met before the source symbols expire.
Notice that r2 will only include
s6 if all prior symbols will
Case 2: More source symbols are added.
In this case Rely will update the timeline with the new symbols and produce both source symbols and repair as needed.
As can be seen, when generating r2, Rely includes all symbols that have not yet expired. In our simple example, this includes
A new flush time is calculated since
s9 repair requirement remains
Case 3: Trigger a flush (low-latency repair). To avoid waiting for the flush timeout before generating repair for unprotected symbols, we can call rely::encoder::flush_repair() directly. This ensures that repair is generated for the unprotected symbols immediately. This is sometimes needed for applications such as low-latency video; you can read more about this in Content-Aware Coding.
As in Case 2, the repair r2 will include all source symbols at the encoder that have not yet expired. Notice that using this approach will increase Rely’s
Rather than flushing immediately, it is also possible to build a solution where rely::encoder::flush_repair() is called after a small timeout if no new source symbols arrive, essentially creating a hybrid of Case 2 and Case 3.
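One way to structure such a hybrid on the application side is an idle timer, sketched below. `IdleFlushPolicy` is a hypothetical helper, not part of Rely, and it assumes the application knows whether unprotected symbols remain. When `should_flush` returns True, the application would call rely::encoder::flush_repair().

```python
from typing import Optional


class IdleFlushPolicy:
    """Application-side rule: flush repair after a short quiet period.

    Rely's own flush timeout remains the upper bound; this only makes the
    flush happen earlier when the source goes idle.
    """

    def __init__(self, idle_timeout_ms: int):
        self.idle_timeout_ms = idle_timeout_ms
        self.last_arrival_ms: Optional[int] = None

    def on_source(self, now_ms: int) -> None:
        # Call whenever a new payload is handed to the encoder.
        self.last_arrival_ms = now_ms

    def should_flush(self, now_ms: int, has_unprotected: bool) -> bool:
        # Only flush if something is unprotected and no source arrived recently.
        if not has_unprotected or self.last_arrival_ms is None:
            return False
        return now_ms - self.last_arrival_ms >= self.idle_timeout_ms


policy = IdleFlushPolicy(idle_timeout_ms=5)
policy.on_source(now_ms=100)
print(policy.should_flush(now_ms=103, has_unprotected=True))  # False
print(policy.should_flush(now_ms=106, has_unprotected=True))  # True
```

A short idle timeout behaves like Case 3 during pauses, while a steady stream of new payloads keeps resetting the timer, as in Case 2.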
The following API controls the flush behaviour of the encoder:
The flush mechanism is primarily useful in applications where the traffic pattern is unpredictable, i.e., where data is received in bursts or similar. If the incoming rate of packets is high, it is unlikely that unprotected symbols will remain in the encoder for long before repair is generated.
Using the flush mechanism, Rely ensures that no symbols are left unprotected in the encoder for longer than the specified timeout.
Before diving into how Rely’s decoder deals with time, it may be useful to quickly discuss why this is important.
Latency builds up in the decoder when losses occur. This happens because Rely always tries to release decoded payloads in order: if a loss occurs, Rely cannot release any subsequent payloads until the loss has been handled, which with high probability happens when the next repair symbol arrives. If the decoder is unable to recover a loss, it eventually has to give up and expose the loss to the application. How long Rely waits before giving up is controlled by the timeout parameter given to rely::decoder::configure().
Rely uses two principles in order to control the release of source symbols and the latency of the decoder.
Principle #1: Source symbols have zero latency.
A Rely encoder is expected to produce source symbols immediately and without delay. This means that all latency associated with coding is added on the decoder side.
Principle #2: The encoder’s coding window is the truth.
All source and repair symbols sent by a Rely encoder contain the size of the current coding window. Using the size of the coding window, the decoder can determine whether a specific source symbol has expired or is still valid.
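A minimal sketch of such a check, assuming each packet lets the decoder know both the window size and the index of the newest symbol in the window. Rely's actual header layout is not described here, and `has_left_window` is a hypothetical name.

```python
def has_left_window(symbol_index: int, window_end: int, window_size: int) -> bool:
    """True if `symbol_index` is no longer in the encoder's coding window.

    Assumes the window is the `window_size` most recent source symbols ending
    at `window_end` (inclusive): [window_end - window_size + 1, window_end].
    """
    window_start = window_end - window_size + 1
    return symbol_index < window_start


# A packet reports a window of 3 symbols ending at s5, i.e. s3..s5.
print(has_left_window(2, window_end=5, window_size=3))  # True
print(has_left_window(3, window_end=5, window_size=3))  # False
```

Because the encoder alone decides the window, the decoder never has to guess whether a missing symbol can still be recovered.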
Let’s explain how Rely utilizes this by looking at a simple example.
Initially, the decoder receives the source symbol s1. Since this is the next expected symbol, it is immediately released and forwarded to the application.
The next source symbol received is s3, which triggers loss detection for s2.
In this case, the release of s3 is blocked until s2 is either dropped or decoded.
This can happen in one of three ways:
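1. The timeout for s3 triggers. In this case, a flush on the decoder will drop s2 and release s3 to the application.
2. A received symbol indicates that s2 has been dropped from the encoder’s coding window. In this case, the decoder will also drop s2 and release s3 to the application.
3. A repair symbol arrives that allows the decoder to decode s2. In this case, both s2 and s3 are released to the application.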
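The in-order release rule can be modelled with a small buffer, shown below as a sketch. `InOrderReleaser` is a hypothetical class, not Rely's decoder; a released entry with payload `None` stands for a loss exposed to the application.

```python
class InOrderReleaser:
    """Release symbols strictly in order; a gap blocks everything after it."""

    def __init__(self):
        self.next_index = 1   # next symbol the application expects
        self.pending = {}     # received (or decoded) but blocked symbols
        self.released = []    # (index, payload), payload None = loss exposed

    def _drain(self):
        # Release the longest run of consecutive symbols starting at next_index.
        while self.next_index in self.pending:
            self.released.append((self.next_index, self.pending.pop(self.next_index)))
            self.next_index += 1

    def on_symbol(self, index, payload):
        # A received source symbol, or one recovered from repair.
        if index >= self.next_index:
            self.pending[index] = payload
            self._drain()

    def drop_missing(self):
        # Give up on the first missing symbol (timeout, or it left the
        # coding window) and unblock whatever follows it.
        if self.pending and self.next_index not in self.pending:
            self.released.append((self.next_index, None))
            self.next_index += 1
            self._drain()


decoder = InOrderReleaser()
decoder.on_symbol(1, "s1")   # released immediately
decoder.on_symbol(3, "s3")   # blocked: s2 is missing
decoder.drop_missing()       # e.g. timeout: s2 dropped, s3 released
print(decoder.released)      # [(1, 's1'), (2, None), (3, 's3')]
```

Calling `decoder.on_symbol(2, "s2")` instead of `drop_missing()` models the decode case: s2 and s3 are then released together.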
As with the encoder, flushing on the decoder side is necessary to ensure that packets are released in a timely fashion even when no more data arrives.
The following API controls the flush behavior of the decoder:
In this section we will discuss how to pick a timeout configuration on both the encoder and decoder.
For many real-time or low-latency applications, we can work with an overall latency budget. For video conferencing, for example, the overall latency budget may be 150 ms, after which the user experience degrades.
This overall budget can then be broken down into individual elements, for example:
This leaves 150 - (35 + 30 + 15) = 70 ms, which can then be used, e.g., for Rely.
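The same subtraction as a tiny helper, mainly to make the sign explicit: every other contributor is subtracted from the total. `rely_budget_ms` is a hypothetical name, and raising an error when nothing is left is a design choice of this sketch.

```python
from typing import List


def rely_budget_ms(total_budget_ms: int, other_components_ms: List[int]) -> int:
    """Latency left for Rely after subtracting every other contributor."""
    remaining = total_budget_ms - sum(other_components_ms)
    if remaining <= 0:
        raise ValueError("no latency budget left for Rely")
    return remaining


# 35, 30 and 15 ms are the example figures from the text.
print(rely_budget_ms(150, [35, 30, 15]))  # 70
```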
How does this affect the timeout configuration on the encoder and decoder? An intuitive idea would be to split the 70 ms between the encoder and decoder. However, that would in fact not be correct.
To understand why, we consider two things:
Recall from the previous sections that the timeouts on both the encoder and decoder side are relative to the arrival time of the source symbols on the respective side.
Let’s look at a simple example to see what this means:
In this case, the encoder uses a flush to generate the repair symbol r1, since s1 is about to expire. Whether r1 is useful when received at the decoder depends only on the arrival time of s1 at the decoder.
This makes the timeout mechanism agnostic to the network latency, i.e., it does
not matter whether the average network latency is
50 ms or
Source symbols are not delayed
Finally, notice that the encoder sends source symbols immediately and without delay. This means that, from a latency budget perspective, the decoder is the latency-inducing element.
The correct way to configure the timeout is therefore to assign both the encoder and the decoder 70 ms as their timeout parameter.
Although the average network latency does not affect the timeout settings of the encoder and decoder, we do have to take the network jitter into account. To illustrate this, let’s look at an example:
In the above example, the encoder produces a repair packet in response to a timeout. Although it is produced and sent in time, variance in the network latency means that the packet arrives too late to be useful.
This example illustrates that the timeout for the encoder needs to account for the jitter, i.e., the variance, of the network latency. This is primarily an issue when generating repair in response to a source symbol expiring, as the repair is then not generated until the last possible moment.
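Both observations, that a constant network latency cancels out while extra delay on the repair packet does not, can be checked with a small model. It assumes s1 and r1 see the same base one-way latency and that the decoder's deadline is s1's arrival time plus the decoder timeout; `repair_is_useful` is a hypothetical helper.

```python
def repair_is_useful(send_s1_ms, send_r1_ms, latency_ms, decoder_timeout_ms, jitter_r1_ms=0):
    """Does r1 reach the decoder before the decoder gives up on s1's window?

    Both packets see the same base one-way latency; `jitter_r1_ms` is extra
    delay experienced by r1 only.
    """
    arrival_s1 = send_s1_ms + latency_ms
    arrival_r1 = send_r1_ms + latency_ms + jitter_r1_ms
    # The decoder's deadline is relative to when s1 arrived at the decoder.
    return arrival_r1 <= arrival_s1 + decoder_timeout_ms


# Repair flushed at the last moment (70 ms after s1), decoder timeout 70 ms.
for latency in (50, 200):
    print(latency, repair_is_useful(0, 70, latency, 70),
          repair_is_useful(0, 70, latency, 70, jitter_r1_ms=5))
# 50 True False
# 200 True False
```

`latency_ms` appears on both sides of the comparison and drops out, whereas a few milliseconds of jitter on a last-moment repair is enough to make it useless.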
There are roughly two ways to ensure that jitter on the network can be accounted for:
X ms before a symbol is about to expire.
In Rely, we recommend using option (2) for the following reason:
The flush threshold can be specified for the encoder using rely::encoder::set_flush_threshold(). The flush threshold is used when calculating when a flush needs to happen. Let's take an example:
With no flush threshold: Say we consumed a payload at time 0 ms with a timeout of 100 ms. At time 70 ms, rely::encoder::upcoming_flush() should tell us that a flush is pending in 30 ms, since that is when the source symbol expires.
With a flush threshold: With a flush threshold of, e.g., 5 ms, rely::encoder::upcoming_flush() should tell us that a flush is pending in 25 ms. This ensures that the flush happens before the symbol expires, leaving room for jitter on the network.
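The arithmetic behind both cases fits in one line: the flush is scheduled the threshold's worth of time before expiry. `ms_until_flush` is a hypothetical helper that reproduces the numbers above; it is not how rely::encoder::upcoming_flush() is implemented, and clamping at zero is an assumption.

```python
def ms_until_flush(consumed_at_ms, timeout_ms, now_ms, flush_threshold_ms=0):
    """Time until a flush is due for a symbol consumed at `consumed_at_ms`.

    The flush is scheduled `flush_threshold_ms` before the symbol expires,
    leaving that much slack for network jitter. Never returns a negative value.
    """
    flush_at = consumed_at_ms + timeout_ms - flush_threshold_ms
    return max(0, flush_at - now_ms)


print(ms_until_flush(0, 100, now_ms=70))                         # 30
print(ms_until_flush(0, 100, now_ms=70, flush_threshold_ms=5))   # 25
```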
Overall, this ensures that generated repair can be delayed somewhat relative to the other packets sent without arriving too late at the decoder. If you are using Content-Aware Coding, you may also have to take into account, e.g., video frame delay.
If you are using Content-Aware Coding, it is unlikely that you will ever need to call rely::encoder::flush() to generate repair for unprotected source symbols. However, you may still have a case where one piece of content (e.g., a video frame) is just about to expire when repair is scheduled due to the arrival of more content. In this case, we still need to take jitter into account, as illustrated below:
In the above example, repair is generated after each frame. As can be seen, the blue frame (s1 to s3) is just about to expire as the yellow frame (s7 to s9) arrives. The repair packet therefore contains all symbols from all three frames. However, due to jitter, the repair arrives too late to be useful at the decoder.
The following questions should help you pick a reasonable timeout value for both the encoder and decoder.
What is the latency budget for Rely, i.e., how much time can Rely spend in trying to fix packet loss before it should give up?
What is the jitter on the network, i.e., how much does the latency vary?
Then we specify:
Example (using the numbers from The Latency Budget):
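Following the reasoning on this page (give both sides the full Rely budget, and use the encoder flush threshold to absorb jitter), the two answers could be mapped to settings as in the sketch below. `TimeoutConfig` and `pick_timeouts` are hypothetical, the 10 ms jitter is a made-up value, and rejecting jitter that exceeds the budget is this sketch's own choice.

```python
from dataclasses import dataclass


@dataclass
class TimeoutConfig:
    encoder_timeout_ms: int
    decoder_timeout_ms: int
    encoder_flush_threshold_ms: int


def pick_timeouts(rely_budget_ms: int, network_jitter_ms: int) -> TimeoutConfig:
    if network_jitter_ms >= rely_budget_ms:
        raise ValueError("jitter exceeds the budget; repair could never arrive in time")
    return TimeoutConfig(
        encoder_timeout_ms=rely_budget_ms,             # full budget, not split
        decoder_timeout_ms=rely_budget_ms,             # full budget, not split
        encoder_flush_threshold_ms=network_jitter_ms,  # flush this early before expiry
    )


print(pick_timeouts(70, network_jitter_ms=10))
# TimeoutConfig(encoder_timeout_ms=70, decoder_timeout_ms=70, encoder_flush_threshold_ms=10)
```

The values would then go to rely::encoder::configure(), rely::decoder::configure() and rely::encoder::set_flush_threshold() respectively; their exact signatures are not shown here.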