How does Fluentd/Fluent Bit handle backpressure?

Hi Fluentd community, my senior wanted to verify that Fluentd can handle backpressure similarly to how Logstash (log aggregator) can let Filebeat (log collector/forwarder) know that it's busy and that Filebeat should retry writing logs to Logstash after some time.

I think the buffer section in the documentation explains this to some extent, but he says it applies only to regular network disconnects (which I also suspect) and is not a "true" backpressure handling solution. So, my questions to confirm are these:

  1. Does a Fluentd/Fluent Bit aggregator instance let the Fluentd/Fluent Bit forwarding instances (on the app servers) know "Hey, I'm busy, don't send logs right now," so that the forwarding instances know to throttle their attempts to send logs? Or, if the aggregator instance is busy and cannot accept logs, does it simply stop listening on the port?

  2. Either way, how and what does the aggregator instance communicate to the forwarding instances? Is there a description of the algorithm that aggregator instances use to let the forwarding instances know the aggregator's health/load?

  3. Does the aggregator Fluentd instance send the forwarding Fluentd/Fluent Bit instance any confirmation that a forwarded log message was successfully received and recorded? How can the forwarding instance be sure the aggregator received the log successfully?

If someone can point me to documentation describing backpressure handling between forwarding and aggregator instances, that would be great. If it is not documented anywhere, can someone point me to the relevant code in the GitHub repository? (I could not find it myself.)

Thanks for this question! There are quite a few benefits to using Fluentd + Fluent Bit in a forwarder/aggregator pattern.

  1. Backpressure handling. Fluentd and Fluent Bit handle acknowledgements by waiting for data to be buffered before sending the ack response. This gives reliability in cases where an aggregator goes down. Additionally, when a buffer on the aggregator is full, the acknowledgement is not sent, which kicks off a few options on the forwarder side. For the tail plugin, backpressure causes Fluent Bit to pause reading. For the TCP/UDP plugins, backpressure requires you to choose which data to prioritize, since that data is short-lived and cannot be re-read. (There is a config sketch of the ack flow after this list.)

  2. The main algorithm at work in the forwarder/aggregator pattern is the phi accrual failure detector used by Fluentd's forward plugin (see the plugin documentation, "forward - Fluentd"). It is especially useful when running multiple aggregators for high availability and failover detection. Note that the parameters of the algorithm can be customized as well; there is a sketch of those settings after this list.

  3. Backpressure is mainly handled on the input side today. We are looking at more adaptive handling of pressure throughout the system in the next few months, specifically when using Fluent Bit as the forwarder.
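
To make point 1 concrete, here is a minimal sketch of the ack-based flow, assuming a Fluent Bit forwarder tailing files and a Fluentd aggregator listening on the default forward port 24224. The hostnames and paths are hypothetical; the option names (`Mem_Buf_Limit`, `Require_ack_response`, `overflow_action`) are documented settings of the tail input, forward output, and Fluentd buffer section:

```
# Fluent Bit forwarder (sketch)
[INPUT]
    Name          tail
    Path          /var/log/app/*.log        # hypothetical path
    Mem_Buf_Limit 5MB                       # tail pauses reading when this fills

[OUTPUT]
    Name                 forward
    Match                *
    Host                 aggregator.example.com   # hypothetical host
    Port                 24224
    Require_ack_response true                # resend chunks that are never acked
```

```
# Fluentd aggregator (sketch): in_forward acks a chunk only after it is buffered
<source>
  @type forward
  port 24224
</source>

<match **>
  @type file                      # hypothetical downstream; any output works
  path /var/log/aggregated/app
  <buffer>
    @type file
    total_limit_size 8GB
    overflow_action block         # or drop_oldest_chunk / throw_exception
  </buffer>
</match>
```

With the default `overflow_action throw_exception`, a full buffer means the incoming chunk is rejected and never acked, so the forwarder keeps its copy and retries later.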
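
For point 2, the phi accrual parameters live in the out_forward configuration on the Fluentd forwarder. A sketch with two aggregators and tunable detector settings (the hostnames are hypothetical; the parameter names and values shown are the plugin's documented defaults):

```
# Fluentd forwarder (sketch): phi accrual failure detection across aggregators
<match **>
  @type forward
  heartbeat_type transport      # send heartbeats over the data transport
  heartbeat_interval 1s         # how often heartbeats go out
  phi_threshold 16              # phi value above which a node is marked down
  recover_wait 10s              # wait this long before reusing a recovered node

  <server>
    host aggregator-1.example.com
    port 24224
  </server>
  <server>
    host aggregator-2.example.com
    port 24224
    standby                     # only used when aggregator-1 is detected down
  </server>
</match>
```

The detector computes a suspicion level (phi) from the observed inter-arrival times of heartbeats, so a slow-but-alive aggregator is distinguished from a dead one; raising `phi_threshold` makes failover less trigger-happy.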

Hope that helps! Let me know if there are any questions.

Thanks @agup006. Can you explain a bit more about the first point: how would we prioritize data for the TCP/UDP plugins? We mainly intend to use the tail plugin for filesystem logs, but could still use TCP for some tasks.