read_bytes_limit_per_second feature in Fluentd

I have a question about the “throttling bytes per second” feature in the in_tail plugin (PR: Add log throttling per file (revised) by cosmo0920 · Pull Request #3185 · fluent/fluentd · GitHub).

Let’s say you’re reading from a file with throttling enabled. After some time the file is rotated, so a new tail watcher object is created for the same path to read the newly incoming logs. The original (older) tail watcher is handed to a timer_execute call that detaches it from the event loop after waiting rotate_wait seconds and ensuring that all remaining logs from that file have been read (EOF reached). (Link: fluentd/in_tail.rb at master · fluent/fluentd · GitHub)
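For reference, the rotation-handling pattern described above looks roughly like the following simplified Ruby sketch (illustrative only, not a verbatim copy of in_tail.rb; the method and variable names are assumed to mirror the linked source):

# Simplified sketch: keep the old tail watcher alive for rotate_wait seconds
# so that any remaining lines in the rotated file can still be read to EOF,
# then detach it from the event loop.
def detach_watcher_after_rotate_wait(tail_watcher, ino)
  timer_execute(:in_tail_close_watcher, @rotate_wait, repeat: false) do
    detach_watcher(tail_watcher, ino)
  end
end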

My question is: what if reading is slow and the file rotation rate is so fast that Fluentd creates new tail watchers even before the previous tail watchers are closed (ignoring missed log rotations)? Say there are now n tail watcher objects for a file: n-1 of them will be waiting to be closed (via the detach_watcher_after_rotate_wait method) while the last one reads the incoming log lines.

Will this scenario cause excessive CPU usage and slow down Fluentd? Also, is this scenario even possible for Fluentd? Please help me clarify my understanding of the in_tail plugin.
Thanks!

Probably it doesn’t, because the reading pace is limited by read_bytes_limit_per_second. The limit is per <source>, not per file.
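As a minimal illustration of that point (the paths and tag below are placeholders), a single <source> such as the following sketch gives every file matched by the glob one shared budget of 8192 bytes/sec, rather than 8192 bytes/sec per file:

<source>
  @type tail
  path /var/log/containers/*.log         # every matched file shares one budget
  pos_file /var/log/containers.log.pos
  read_bytes_limit_per_second 8192       # applies to this <source> as a whole
  tag containers.*
  <parse>
    @type none
  </parse>
</source>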

It’s possible (multiple tail watchers waiting to be closed can exist).

Figure 1 shows the CPU usage of the Fluentd pod. It steadily increases to 100% before the pod is thrown into an evicted state (i.e. the pod is shut down due to excessive memory usage). Once the evicted pod has been removed, a new Fluentd pod is created to restart the log collection process.

Figure 2 shows the rate of log lines collected, as reported by a Prometheus counter. There is a sharp decline in the log collection rate right after the pod is evicted.

@ashie If we tweak the value of read_bytes_limit_per_second, we can reproduce this erroneous situation.

Steps to reproduce:

  1. Set BYTES_TO_READ to 1000 (instead of 8192). This is done so that we reach the point where Fluentd exceeds its allocated CPU limit more quickly.

  2. We deploy our applications with the following workloads:

    1. 8 workloads - each generating logs at the rate of 1000 lines/sec
    2. 4 workloads - each generating logs at the rate of 2000 lines/sec
  3. Configuration File:

<source>
  @type tail
  @id container-input
  @log_level debug
  path "/var/log/containers/low-log-stress*.log,/var/log/containers/heavy-log-stress*.log"
  pos_file "/var/log/containers.log.pos"
  read_bytes_limit_per_second 1000
  path_key path
  refresh_interval 1s

  rotate_wait 5s
  tag kubernetes.*
  @label @MEASURE
  <parse>
    @type multi_format
    <pattern>
      format json
      time_format '%Y-%m-%dT%H:%M:%S.%N%Z'
      keep_time_key true
    </pattern>
    <pattern>
      format regexp
      expression /^(?<time>[^\s]+) (?<stream>stdout|stderr)( (?<logtag>.))? (?<log>.*)$/
      time_format '%Y-%m-%dT%H:%M:%S.%N%:z'
      keep_time_key true
    </pattern>
  </parse>
</source>


<source>
  @type prometheus
</source>

<label @MEASURE>
  <filter **>
    @type prometheus
    <metric> 
      name fluentd_input_status_num_records_total
      type counter
      desc The total number of incoming records
      <labels>
        tag ${tag}
        hostname ${hostname}
      </labels>
    </metric>
  </filter>

  <match kubernetes.var.log.containers.low-log-stress**>
    @type file
    @id low-log
    @log_level debug
    path /var/log/containers/fluentd.low.stresslog
    <buffer time>
      @type memory
      overflow_action throw_exception
      timekey 10s
      timekey_wait 1s
      flush_thread_interval 10s
      total_limit_size 1073741824 # 1 GB buffer size
    </buffer>
  </match>

  <match kubernetes.var.log.containers.heavy-log-stress**>
    @type file
    @id heavy-log
    @log_level debug
    path /var/log/containers/fluentd.heavy.stresslog
    <buffer time>
      @type memory
      overflow_action throw_exception
      timekey 10s
      timekey_wait 1s
      flush_thread_interval 10s
      total_limit_size 1073741824 # 1 GB buffer size
    </buffer>
  </match>
</label>

My Observation:
Since the throttling limit is very low compared to the log generation rate of the workloads (16,000 lines/sec in total), I am finding that a lot of tail watcher objects get created to read lines from the logs. After some time, the CPU usage also exceeds the provided limit and the pod gets evicted.

Total time taken to reach this evicted state: ~3 hours

Specifying a value lower than 8192 is meaningless, since the minimum value is 8192.
The in_tail plugin forcibly resets it to 8192 when the specified value is lower than that.
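In other words, the effective limit in the test above was 8192 bytes/sec rather than 1000. A tiny illustrative Ruby sketch of that clamping behaviour (not the actual in_tail.rb code; the constant name here is made up for the example):

MIN_READ_BYTES_PER_SECOND = 8192  # hypothetical constant name, for illustration only

def effective_read_bytes_limit(configured)
  return configured if configured < 0  # a negative value (the default) disables throttling
  configured < MIN_READ_BYTES_PER_SECOND ? MIN_READ_BYTES_PER_SECOND : configured
end

effective_read_bytes_limit(1000)  # => 8192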


How many files are you watching, and how often are the files rotated?