I have incoming logs with invalid encodings. This is normal and expected: the logs contain, verbatim, illegal input from users, for example a mail log recording an extremely bogus username, or a server log listing a Postgres error message that rejects invalid UTF-8.
However, this causes a lot of pain for fluentd: mainly I get unfriendly errors from Elasticsearch, and fluentd keeps trying to send the record indefinitely (or, maybe in newer versions, actually loses the log line!).
I have tried - and failed - to sanitize it like this:
<filter>
  @type record_modifier
  # try to replace invalid encoding
  char_encoding utf-8:utf-8
</filter>
but it doesn’t like my idea at all.
Does anyone have an idea how to [forcibly] replace illegally encoded bytes with U+FFFD (�) or even a question mark?
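
To illustrate the behaviour I am after, here is a minimal plain-Ruby sketch (fluentd plugins are written in Ruby). The helper name `sanitize_utf8` is hypothetical and is not a fluentd API; it relies only on the standard `String#force_encoding` and `String#scrub`. How to wire something like this into a fluentd filter is the open part of the question.

```ruby
# Hypothetical helper (not part of fluentd): replace every invalid UTF-8
# byte sequence in a string with a replacement string.
def sanitize_utf8(value, replacement = "\uFFFD")
  # Reinterpret the raw bytes as UTF-8 without transcoding them, then
  # swap each invalid sequence for the replacement.
  value.dup.force_encoding(Encoding::UTF_8).scrub(replacement)
end

# Example: a log field containing a byte that is never valid in UTF-8.
bogus = "user\xFFname".b

puts sanitize_utf8(bogus)       # invalid byte becomes U+FFFD
puts sanitize_utf8(bogus, "?")  # invalid byte becomes a question mark
```

Valid input passes through unchanged, so applying this to every string field of a record should be harmless for well-formed log lines.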