Warning: This is a development version. The latest stable version is Version 7.0.0.

Timeout Configuration

Rely aims to provide an FEC/ECC component suitable for low-latency applications. To achieve this goal, time has to be handled at both the encoder and the decoder. In the following, we walk through how Rely does this.

Note

Jump straight to Timeout Summary (TL;DR) for a summary and basic suggestions for timeout configuration.

Timeout Considerations

On the encoder side, the following questions must be answered:

  1. For how long should source symbols be included (i.e., encoded) in the repair packets?
  2. When should repair be “flushed” to ensure that all source symbols get the required amount of protection before timing out?

On the decoder side, the questions are a bit different:

  1. For how long should we continue to try to repair a specific lost source symbol?
  2. At what point do we need to give up on decoding a specific source symbol and expose packet loss to the receiving application?

Many different answers can be given to these seemingly simple questions. To avoid complex solutions, such as time synchronization between devices and network latency estimation, Rely’s timeout mechanism is based purely on the arrival time of source symbols.

Let’s look at what this means at both the encoder and the decoder.

Encoder Timeout Mechanism

At the encoder, received payloads may be fragmented into one or more source symbols. Each source symbol is timestamped with the current time, which is provided via the now parameter of rely::encoder::consume().

Internally, this builds a timeline in which every source symbol carries a timestamp. Together with the timeout value passed to rely::encoder::configure(), this timeline allows Rely to answer both questions 1 and 2 above.
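The bookkeeping behind such a timeline can be sketched in a few lines of C++. This is not Rely’s implementation or API: the Timeline and SourceSymbol types are hypothetical, times are plain millisecond counters, now is assumed never to decrease, and we assume a source symbol expires once the configured timeout has elapsed since its timestamp.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Hypothetical sketch of an encoder-side timeline; not Rely's API.
struct SourceSymbol {
    std::uint64_t index;       // position in the stream (s1 -> 1, s2 -> 2, ...)
    std::int64_t timestamp_ms; // the "now" value when the symbol was consumed
};

class Timeline {
public:
    explicit Timeline(std::int64_t timeout_ms) : timeout_ms_(timeout_ms) {
        assert(timeout_ms > 0);
    }

    // Record a new source symbol stamped with the current time.
    void consume(std::int64_t now_ms) {
        symbols_.push_back({next_index_++, now_ms});
    }

    // Assumption: a symbol expires timeout_ms after its timestamp.
    std::int64_t expiry(const SourceSymbol& s) const {
        return s.timestamp_ms + timeout_ms_;
    }

    // Drop every symbol whose expiry time has been reached. Symbols are
    // stored in arrival order with a fixed timeout, so expired symbols are
    // always at the front of the deque.
    void remove_expired(std::int64_t now_ms) {
        while (!symbols_.empty() && expiry(symbols_.front()) <= now_ms) {
            symbols_.pop_front();
        }
    }

    const std::deque<SourceSymbol>& symbols() const { return symbols_; }

private:
    std::int64_t timeout_ms_;
    std::uint64_t next_index_ = 1;
    std::deque<SourceSymbol> symbols_;
};
```

With a 100 ms timeout and symbols consumed at 0, 10 and 50 ms, calling remove_expired(105) drops only s1; s2 remains valid until 110 ms.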

Let’s look at an example where the Rely encoder receives 6 source symbols. In this case repair_interval = 4 and repair_target = 1, i.e., Rely produces one repair packet for every four source symbols.

Note

If you are using “manual repair” (see Rely’s Two Modes), flushing on the encoder is not enabled. However, the timeout calculation, i.e., the expiry time of individual source symbols, is still the same.

[Figure: encoder timeline in which s5 and s6 are unprotected and a flush is needed]

As shown, the first four source symbols are protected by r1, whereas the last two symbols, s5 and s6, are still unprotected. To ensure that no symbol expires without being protected, Rely calculates a flush time based on the oldest unprotected source symbol, in this case s5.
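A minimal model of how the flush time in this example could be derived, assuming a simplified scheduler in which one repair packet is produced every repair_interval source symbols (matching repair_target = 1) and protects all symbols consumed so far. The FlushModel class is hypothetical, and the flush time is taken to be the expiry of the oldest unprotected symbol, without any flush threshold.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical model of the example: one repair packet per repair_interval
// source symbols. Not Rely's API or its actual scheduling logic.
class FlushModel {
public:
    FlushModel(std::size_t repair_interval, std::int64_t timeout_ms)
        : repair_interval_(repair_interval), timeout_ms_(timeout_ms) {
        assert(repair_interval > 0 && timeout_ms > 0);
    }

    void consume(std::int64_t now_ms) {
        unprotected_.push_back(now_ms);
        if (unprotected_.size() == repair_interval_) {
            // A regular repair packet protects all pending symbols.
            unprotected_.clear();
            ++repairs_;
        }
    }

    // Flush time = expiry of the oldest unprotected symbol, if there is one.
    std::optional<std::int64_t> flush_time() const {
        if (unprotected_.empty()) return std::nullopt;
        return unprotected_.front() + timeout_ms_;
    }

    std::size_t repairs() const { return repairs_; }
    std::size_t unprotected() const { return unprotected_.size(); }

private:
    std::size_t repair_interval_;
    std::int64_t timeout_ms_;
    std::vector<std::int64_t> unprotected_; // timestamps, oldest first
    std::size_t repairs_ = 0;
};
```

With six symbols consumed 10 ms apart and a 100 ms timeout, r1 covers s1 to s4 and the flush time becomes 140 ms, i.e., the expiry of s5, which was consumed at 40 ms.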

Three things can happen after this point:

Case 1: No more source symbols are added.

In this case, Rely’s flush timeout triggers and generates repair for the unprotected source symbols s5 and s6. This ensures that their repair requirement is met before the source symbols expire.

[Figure: Case 1, the flush generates r2 for s5 and s6]

Notice that r2 will include only s5 and s6 if all prior symbols have already expired.

Case 2: More source symbols are added.

In this case Rely will update the timeline with the new symbols and produce both source symbols and repair as needed.

[Figure: Case 2, more source symbols are added]

As can be seen, when generating r2, Rely includes all symbols that have not yet expired; in our simple example, this includes s3 and s4. A new flush time is calculated because s9 is still unprotected.
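The rule used in Case 1 and Case 2, namely that a repair packet covers every source symbol that has not yet expired, can be sketched as a simple filter over the timeline. The repair_coverage function is hypothetical and assumes that a symbol expires timeout_ms after its timestamp; the actual selection happens inside Rely.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Symbol {
    int index;                 // s1 -> 1, s2 -> 2, ...
    std::int64_t timestamp_ms; // when the symbol was consumed
};

// Hypothetical: indices of all symbols still valid at now_ms, i.e. whose
// expiry (timestamp + timeout) lies strictly in the future.
std::vector<int> repair_coverage(const std::vector<Symbol>& timeline,
                                 std::int64_t now_ms, std::int64_t timeout_ms) {
    assert(timeout_ms > 0);
    std::vector<int> covered;
    for (const Symbol& s : timeline) {
        if (s.timestamp_ms + timeout_ms > now_ms) {
            covered.push_back(s.index);
        }
    }
    return covered;
}
```

For six symbols consumed 10 ms apart with a 100 ms timeout, a repair generated at 135 ms covers only s5 and s6 (the Case 1 situation where all prior symbols have expired), while one generated at 115 ms also covers s3 and s4.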

Case 3: Trigger a flush (low-latency repair).

To avoid waiting for the flush before generating repair for unprotected symbols, we can call rely::encoder::flush_repair() directly. This ensures that repair is generated for the unprotected symbols immediately. This is sometimes needed for applications such as low-latency video; you can read more about this in Content-Aware Coding.

[Figure: Case 3, repair is flushed immediately with flush_repair()]

As in Case 2, the repair r2 includes all source symbols at the encoder that have not yet expired. Notice that this approach increases Rely’s repair rate.

Rather than flushing immediately, it is also possible to call rely::encoder::flush_repair() after a short timeout if no new source symbols arrive, essentially creating a hybrid of Case 2 and Case 3.

[Figure: flush_repair() called after a short timeout when no new source symbols arrive]
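One way to build this hybrid is a small idle timer in the application: remember when the last source symbol was consumed and, if nothing new has arrived after a short idle period while unprotected symbols remain, call rely::encoder::flush_repair(). The IdleFlushTimer class and its idle_ms value are hypothetical; the sketch only decides when to flush and leaves the actual Rely call to the application.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical helper; it does not call into Rely itself.
class IdleFlushTimer {
public:
    explicit IdleFlushTimer(std::int64_t idle_ms) : idle_ms_(idle_ms) {
        assert(idle_ms >= 0);
    }

    // Call whenever a payload is handed to the encoder. has_unprotected is
    // whatever the application learns from the encoder after consuming.
    void on_consume(std::int64_t now_ms, bool has_unprotected) {
        last_consume_ms_ = now_ms;
        pending_ = has_unprotected;
    }

    // Call whenever repair has been generated (regularly or by a flush).
    void on_repair() { pending_ = false; }

    // True if the application should call flush_repair() now.
    bool should_flush(std::int64_t now_ms) const {
        return pending_ && last_consume_ms_ &&
               now_ms - *last_consume_ms_ >= idle_ms_;
    }

private:
    std::int64_t idle_ms_;
    std::optional<std::int64_t> last_consume_ms_;
    bool pending_ = false;
};
```

Because regular repair clears the pending flag, steady traffic never triggers the idle flush, while a pause longer than idle_ms does. A smaller idle_ms moves the behavior toward Case 3 (lower latency, more repair); a larger one moves it toward Case 2.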

Encoder Implementation

The following API controls the flush behaviour of the encoder:

  1. rely::encoder::flush_repair(): Manually generates repair for unprotected symbols.
  2. rely::encoder::has_upcoming_flush(): Checks whether a flush is needed, i.e., whether the encoder has unprotected source symbols.
  3. rely::encoder::upcoming_flush(): If a flush is needed, returns the time until the flush.
  4. rely::encoder::flush(): Generates the needed repair and removes any expired symbols.
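These calls are typically combined in the application’s event loop: after handling incoming data, ask the encoder whether a flush is pending and either flush now or sleep until it is due. The sketch below shows that control flow against a hypothetical StandInEncoder whose method names mirror the list above. Its signatures, in particular passing now_ms to upcoming_flush() and flush(), are assumptions, so check the Rely API reference before adapting it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in; the real Rely signatures may differ.
struct StandInEncoder {
    std::int64_t flush_at_ms = -1; // -1 means: no unprotected symbols
    int flushes = 0;

    bool has_upcoming_flush() const { return flush_at_ms >= 0; }
    std::int64_t upcoming_flush(std::int64_t now_ms) const {
        return std::max<std::int64_t>(0, flush_at_ms - now_ms);
    }
    void flush(std::int64_t /*now_ms*/) {
        flush_at_ms = -1; // repair generated, nothing left unprotected
        ++flushes;
    }
};

// Service the encoder and return how long the application may sleep,
// capped at max_sleep_ms (e.g. until the next expected input).
template <typename Encoder>
std::int64_t service_encoder(Encoder& encoder, std::int64_t now_ms,
                             std::int64_t max_sleep_ms) {
    if (encoder.has_upcoming_flush()) {
        const std::int64_t until_flush = encoder.upcoming_flush(now_ms);
        if (until_flush == 0) {
            encoder.flush(now_ms); // flush is due: generate repair now
        } else {
            return std::min(until_flush, max_sleep_ms);
        }
    }
    return max_sleep_ms;
}
```

The returned value can serve as the timeout for whatever the application waits on (a socket poll or a timer), so the encoder is serviced no later than its next flush.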

Note

The flush mechanism is primarily useful in applications where the traffic pattern is unpredictable, i.e., where data is received in bursts or similar. If the incoming rate of packets is high, it is unlikely that unprotected symbols will remain in the encoder for long before repair is generated.

Using the flush mechanism, Rely ensures that no symbol is left unprotected in the encoder for longer than the specified timeout.

Decoder Timeout Mechanism

Before diving into how Rely’s decoder deals with time, it may be useful to quickly discuss why this is important.

Latency builds up in the decoder when losses occur, because Rely always tries to release decoded payloads in order. If a loss occurs, Rely cannot release any subsequent payloads until the loss has been handled, which with high probability happens when the next repair symbol arrives. If the decoder is unable to recover a loss, it eventually has to give up and expose the loss to the application. How long Rely waits before it gives up is controlled by the timeout parameter given in rely::decoder::configure().

Rely uses two principles in order to control the release of source symbols and the latency of the decoder.

Principle #1: Source symbols have zero latency.

A Rely encoder is expected to produce source symbols immediately and without delay. This means that all latency associated with coding is added on the decoder side.

Principle #2: The encoder’s coding window is the truth.

All source and repair symbols sent by a Rely encoder contain the size of the current coding window. Using the size of the coding window, the decoder is able to know whether a specific source symbol has expired or is still valid.
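To illustrate Principle #2, assume hypothetically that each packet carries the index of the newest source symbol in the encoder’s coding window together with the window size. The decoder can then compute the oldest index still in the window and treat anything older as expired. Rely’s actual packet header layout is not described here, so WindowInfo and its fields are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical header fields; not Rely's wire format.
struct WindowInfo {
    std::uint64_t newest_index; // newest source symbol in the coding window
    std::uint64_t window_size;  // number of source symbols in the window
};

// Oldest source symbol index the encoder can still include in repair.
std::uint64_t oldest_valid_index(const WindowInfo& w) {
    assert(w.window_size > 0 && w.window_size <= w.newest_index + 1);
    return w.newest_index + 1 - w.window_size;
}

// A missing symbol older than the window can no longer be repaired.
bool has_expired(std::uint64_t index, const WindowInfo& w) {
    return index < oldest_valid_index(w);
}
```

For example, if a packet reports newest_index = 5 and window_size = 3, only s3 to s5 are still in the window, so a missing s2 can be dropped right away instead of waiting for the decoder timeout.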

Let’s explain how Rely utilizes this by looking at a simple example.

[Figure: decoder example in which s2 is lost and the release of s3 is blocked]

Initially, the decoder receives the source symbol s1. Since this is the next expected symbol, it is immediately released and forwarded to the application. The next source symbol received is s3, which triggers loss detection for s2. The release of s3 is now blocked until s2 is either dropped or decoded.

This can happen in one of three ways:

  1. The timeout for s3 triggers. In this case a flush on the decoder will drop s2 and release s3 to the application.
  2. A new repair or source symbol is received that signals that s2 has been dropped from the encoder’s coding window. In this case the decoder will also drop s2 and release s3 to the application.
  3. A repair symbol arrives that triggers the decoding of s2. In this case both s2 and s3 are released to the application.
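The three outcomes can be captured by a small in-order release buffer. This is a simplified, hypothetical model rather than Rely’s decoder: it tracks the next expected index, buffers out-of-order symbols, and releases them once the gap is either filled (outcome 3) or given up on (outcomes 1 and 2, both modeled by a single drop call).

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical in-order release logic; not Rely's decoder.
class InOrderRelease {
public:
    explicit InOrderRelease(std::uint64_t first_index = 1) : next_(first_index) {}

    // A source symbol arrived directly or was decoded from repair.
    // Returns the indices that can now be released, in order.
    std::vector<std::uint64_t> on_symbol(std::uint64_t index) {
        if (index >= next_) pending_[index] = true; // true: deliver
        return release();
    }

    // Give up on a missing symbol (timeout, or it left the encoder's window).
    std::vector<std::uint64_t> drop(std::uint64_t index) {
        if (index >= next_) pending_.emplace(index, false); // false: lost
        return release();
    }

private:
    // Walk forward from next_ while entries exist; lost entries are skipped.
    std::vector<std::uint64_t> release() {
        std::vector<std::uint64_t> out;
        while (true) {
            auto it = pending_.find(next_);
            if (it == pending_.end()) break;
            if (it->second) out.push_back(next_);
            pending_.erase(it);
            ++next_;
        }
        return out;
    }

    std::uint64_t next_;
    std::map<std::uint64_t, bool> pending_;
};
```

Replaying the example: s1 is released immediately, s3 is held back, and s3 is released either together with a decoded s2 or alone once s2 is dropped.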

As with the encoder, flushing on the decoder side is necessary to ensure that packets are released in a timely fashion even if no more data arrives.

Decoder Implementation

The following API controls the flush behavior of the decoder:

The Latency Budget

In this section we will discuss how to pick a timeout configuration on both the encoder and decoder.

Many real-time or low-latency applications work with an overall latency budget. In video conferencing, for example, the overall latency budget may be 150 ms, after which the user experience degrades.

This overall budget can then be broken down into individual elements, for example:

  • Video encoding 35 ms
  • Network latency 30 ms
  • Video decoding 15 ms

The remaining 150 - (35 + 30 + 15) = 70 ms can then be used, e.g., by Rely. How does this affect the timeout configuration of the encoder and decoder? An intuitive idea would be to split the 70 ms between the encoder and the decoder. However, that would in fact not be correct.
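The same subtraction as a hypothetical helper, which makes explicit the assumption that the individual latency contributions simply add up:

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Hypothetical helper: budget left after the other pipeline stages.
std::int64_t remaining_budget_ms(std::int64_t total_ms,
                                 std::initializer_list<std::int64_t> stages_ms) {
    std::int64_t used = 0;
    for (std::int64_t s : stages_ms) used += s;
    assert(used <= total_ms && "stages exceed the overall budget");
    return total_ms - used;
}
```

With the numbers above, remaining_budget_ms(150, {35, 30, 15}) yields the 70 ms available to Rely.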

To understand why, we consider two things:

Independent timelines

Recall from our previous descriptions that the timeouts on both the encoder and decoder side are relative to the arrival time of the source symbols on either side.

Let’s look at a simple example to see what this means:

[Figure: independent encoder and decoder timelines for s1 and the flushed repair r1]

In this case, the encoder uses a flush to generate the repair symbol r1 since s1 is about to expire. Whether r1 is useful when it reaches the decoder depends only on the arrival time of s1 at the decoder. This makes the timeout mechanism agnostic to the network latency, i.e., it does not matter whether the average network latency is 50 ms or 100 ms.
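The claim that the average network latency cancels out can be checked with a small model. For illustration only, we assume that the decoder accepts repair for s1 until s1’s arrival time plus the decoder timeout, that s1 is sent as soon as it is consumed (Principle #1), and that s1 and r1 see the same constant one-way latency. The repair_useful function is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: r1 is useful if it reaches the decoder before s1
// times out there. Both packets see the same constant latency (no jitter).
bool repair_useful(std::int64_t s1_sent_ms, std::int64_t r1_sent_ms,
                   std::int64_t latency_ms, std::int64_t decoder_timeout_ms) {
    assert(latency_ms >= 0 && r1_sent_ms >= s1_sent_ms);
    const std::int64_t s1_arrival = s1_sent_ms + latency_ms;
    const std::int64_t r1_arrival = r1_sent_ms + latency_ms;
    return r1_arrival <= s1_arrival + decoder_timeout_ms;
}
```

Since latency_ms is added to both arrival times, it cancels in the comparison: with a 70 ms decoder timeout, r1 sent 65 ms after s1 is useful whether the latency is 50 ms or 100 ms, and r1 sent 75 ms after s1 is useless in both cases.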

Source symbols are not delayed

Finally, notice that the encoder sends source symbols immediately and without delay. This means that, from a latency budget perspective, the decoder is the latency-inducing element.

The correct way to configure the timeout is therefore to assign both the encoder and the decoder 70 ms as their timeout parameter.

Taking Jitter Into Account

Although the average network latency does not affect the timeout settings of the encoder and decoder, we do have to take the network jitter into account. To illustrate this, let’s look at an example:

[Figure: a repair packet produced on timeout arrives too late because of network jitter]

In the above example, the encoder produces a repair packet in response to a timeout. Although the packet is produced and sent in time, variance in the network latency causes it to arrive too late to be useful.

This example illustrates that the timeout for the encoder needs to take into account the jitter/variance of the network latency. This is primarily an issue when generating repair in response to a source symbol expiring, as this means that the repair will not be generated until the last possible moment.
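The effect of jitter can be captured by comparing when the repair reaches the decoder with when the decoder gives up on s1. This hypothetical model assumes the repair is sent repair_sent_after_s1_ms after s1 and may experience up to extra_delay_ms more network delay than s1 did; it also assumes equal encoder and decoder timeouts and ignores everything else about the network.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: the decoder gives up on s1 decoder_timeout_ms after
// s1 arrives; the repair may be delayed extra_delay_ms more than s1 was.
bool repair_arrives_in_time(std::int64_t repair_sent_after_s1_ms,
                            std::int64_t extra_delay_ms,
                            std::int64_t decoder_timeout_ms) {
    assert(repair_sent_after_s1_ms >= 0 && extra_delay_ms >= 0);
    return repair_sent_after_s1_ms + extra_delay_ms <= decoder_timeout_ms;
}
```

With a 70 ms timeout on both sides, a repair sent exactly when s1 expires (70 ms) only arrives in time if it suffers no extra delay, whereas a repair sent 10 ms earlier tolerates 10 ms of jitter.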

There are roughly two ways to ensure that jitter on the network can be accounted for:

  1. We can use a smaller timeout value on the encoder compared to the decoder.
  2. We can specify a flush threshold to ensure that flush happens X ms before a symbol is about to expire.

In Rely, we recommend using option (2) for the following reason:

  • In cases where the jitter may be large, say 10 ms, option (1) has the drawback that it drops source symbols 10 ms before they actually expire, potentially reducing the efficiency of the repair traffic. Option (2), in contrast, only ensures that those source symbols are protected; if more data arrives later, the source symbols can still participate in repair packets.

The flush threshold can be specified for the encoder using rely::encoder::set_flush_threshold() and is used when calculating when a flush needs to happen. Let’s take an example:

With no flush threshold: Say we have consumed a payload that expires at 100 ms (e.g., consumed at 0 ms with a 100 ms timeout) and the time is now 70 ms. Then rely::encoder::upcoming_flush() should tell us that a flush is pending in 30 ms, since that is when the payload expires.

With a flush threshold of, e.g., 5 ms: rely::encoder::upcoming_flush() should tell us that a flush is pending in 25 ms. This ensures that the flush happens before the symbol expires and leaves more room for jitter on the network.
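Both numbers follow from subtracting the threshold from the expiry time of the oldest unprotected symbol. The function below reproduces them; its name and the clamping to zero (flush immediately once the flush point has passed) are assumptions about reasonable behavior, not a description of how rely::encoder::upcoming_flush() is implemented.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical: time until the next flush, given the expiry time of the
// oldest unprotected symbol and a flush threshold, clamped at zero.
std::int64_t time_until_flush(std::int64_t expiry_ms, std::int64_t now_ms,
                              std::int64_t flush_threshold_ms) {
    assert(flush_threshold_ms >= 0);
    return std::max<std::int64_t>(0, expiry_ms - flush_threshold_ms - now_ms);
}
```

time_until_flush(100, 70, 0) gives the 30 ms of the first case, and time_until_flush(100, 70, 5) gives the 25 ms of the second.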

Overall, this ensures that the generated repair can be delayed somewhat relative to the other packets sent without arriving too late at the decoder. If you are doing Content-Aware Coding, you may also have to take into account, e.g., video frame delay.

Content Based Timeout

If you are doing Content-Aware Coding, it is unlikely that you will ever need to call rely::encoder::flush() to generate repair for unprotected source symbols. However, you may still encounter a case where one piece of content (e.g., a video frame) is just about to expire when repair is scheduled due to the arrival of more content. In this case, we still need to take jitter into account, as illustrated below:

[Figure: content-aware coding example in which r3 arrives too late for the blue frame because of jitter]

In the above example, repair is generated after each frame. As can be seen, the blue frame (s1, s2 and s3) is just about to expire as the yellow frame (s6, s7, s8 and s9) arrives. The repair packet r3 therefore contains all symbols from all three frames. However, due to jitter, the repair arrives too late to be useful at the decoder.
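When repair is scheduled per frame, one way to reason about this is to check, for each frame in the coding window, whether a repair sent now can still reach the decoder before that frame expires under worst-case jitter. The Frame type and the frames_helped function are hypothetical and only restate the timing argument; the times used below are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

struct Frame {
    std::string name;        // e.g. "blue" or "yellow"
    std::int64_t arrival_ms; // when the frame's source symbols were consumed
};

// Hypothetical: frames that a repair sent at now_ms can still help, assuming
// the repair may arrive up to jitter_ms later than the frame's own symbols
// would (relative to send time) and frames expire timeout_ms after arrival.
std::vector<std::string> frames_helped(const std::vector<Frame>& window,
                                       std::int64_t now_ms,
                                       std::int64_t timeout_ms,
                                       std::int64_t jitter_ms) {
    assert(timeout_ms > 0 && jitter_ms >= 0);
    std::vector<std::string> helped;
    for (const Frame& f : window) {
        if (now_ms + jitter_ms <= f.arrival_ms + timeout_ms) {
            helped.push_back(f.name);
        }
    }
    return helped;
}
```

With a 70 ms timeout, 10 ms of jitter, and r3 sent at 68 ms, a blue frame consumed at 0 ms is no longer helped (78 ms is past its 70 ms expiry), while later frames still are. This matches the situation shown in the figure.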

Timeout Summary (TL;DR)

The following questions should allow you to pick a reasonable timeout value for both the encoder and the decoder:

What is the latency budget for Rely, i.e., how much time can Rely spend in trying to fix packet loss before it should give up?

What is the jitter on the network, i.e., how much does the latency fluctuate?

Then we specify:

  • Encoder timeout: latency budget
  • Decoder timeout: latency budget
  • Encoder flush threshold: network jitter (see Taking Jitter Into Account)

Example (using the numbers from The Latency Budget):

  • Overall latency budget: 150 ms
  • Rely’s budget: 70 ms
  • Network jitter: 10 ms
  • Encoder timeout: 70 ms
  • Decoder timeout: 70 ms
  • Encoder flush threshold: 10 ms
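The recipe can be written as a small helper that derives the settings from the two inputs. TimeoutSettings and derive_settings are hypothetical. Setting the flush threshold equal to the network jitter is an assumption based on the recommendation in Taking Jitter Into Account and should be tuned to your measured jitter. Applying the values is left to rely::encoder::configure(), rely::decoder::configure() and rely::encoder::set_flush_threshold(), whose exact signatures are not shown here.

```cpp
#include <cassert>
#include <cstdint>

struct TimeoutSettings {
    std::int64_t encoder_timeout_ms;
    std::int64_t decoder_timeout_ms;
    std::int64_t encoder_flush_threshold_ms;
};

// Both sides get the full Rely budget (the timelines are independent and
// source symbols are not delayed); jitter is absorbed by flushing early.
TimeoutSettings derive_settings(std::int64_t rely_budget_ms,
                                std::int64_t network_jitter_ms) {
    assert(rely_budget_ms > 0 && network_jitter_ms >= 0);
    assert(network_jitter_ms < rely_budget_ms);
    return {rely_budget_ms, rely_budget_ms, network_jitter_ms};
}
```

For the example above, derive_settings(70, 10) yields 70 ms for both timeouts and a 10 ms encoder flush threshold.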