Unable to stream Fluent Bit-parsed nginx ingress logs into BigQuery

Hi,

I'm unable to stream parsed nginx ingress logs into Google BigQuery.

**** Fluent Bit configuration file.

apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: logging
  labels:
    k8s-app: fluent-bit
data:
  # Configuration files: server, input, filters and output
  # ======================================================
  fluent-bit.conf: |
    [SERVICE]
        Flush         1
        Log_Level     info
        Daemon        off
        Parsers_File  parsers.conf
        Plugins_File  /fluent-bit/etc/plugins.conf
        HTTP_Server   On
        HTTP_Listen   0.0.0.0
        HTTP_Port     2020

    @INCLUDE input-kubernetes.conf
    @INCLUDE filter-kubernetes.conf
    @INCLUDE output-bigquery.conf

  input-kubernetes.conf: |
    [INPUT]
        Name              tail
        Path              /var/log/containers/nginx-ingress*.log
        Tag               kube.*
        Parser            nginx
        DB                /var/log/flb_kube.db
        Mem_Buf_Limit     50MB
        Skip_Long_Lines   On
        Refresh_Interval  10
        # Control the log line length
        Buffer_Chunk_Size 256k
        Buffer_Max_Size   10240k
        # Use docker mode to deal with multiline messages emitted by docker
        Docker_Mode       On
  replace_info.lua: |
    function replace_sensitive_info(tag, timestamp, record)
      -- mask social security number
      record["log"] = string.gsub(record["log"], "%d%d%d%-%d%d%-%d%d%d%d", "xxx-xx-xxxx")
      -- mask credit card number
      record["log"] = string.gsub(record["log"], "%d%d%d%d *%d%d%d%d *%d%d%d%d *%d%d%d%d", "xxxx xxxx xxxx xxxx")
      -- mask email address
      record["log"] = string.gsub(record["log"], "[%w+%.%-_]+@[%w+%.%-_]+%.%a%a+", "user@email.tld")
      return 1, timestamp, record
    end
  filter-kubernetes.conf: |
    [FILTER]
        Name                kubernetes
        Match               kube.*
        Kube_URL            https://kubernetes.default.svc:443
        Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token
        Kube_Tag_Prefix     kube.var.log.containers.
        # Try to merge the log messages
        Merge_Log           On
        Merge_Log_Key       log_processed
        K8S-Logging.Parser  On
        #K8S-Logging.Exclude On

    ### sample log scrubbing filters
    #[FILTER]
    #    Name                lua
    #    Match               kube.*
    #    # lua script to redact sensitive data in log messages
    #    script              replace_info.lua
    #    call                replace_sensitive_info
    ### end sample log scrubbing

  output-bigquery.conf: |
    [OUTPUT]
        # Write the log records that still have the 'kube.*' tags.
        Name                        bigquery
        Match                       kube.*
        # The following fields are necessary; they allow filtering
        # based on resource types. Change them according to your setup.
        google_service_credentials  /etc/bigquery-volume/bigquery.json
        project_id                  a8platformdev
        dataset_id                  fluentbit
        table_id                    nginx-ingress
        fetch_schema                true
  parsers.conf: |
    [PARSER]
        Name        k8s-nginx-ingress
        Format      regex
        Regex       ^(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?
        Time_Key    time
        Time_Format %d/%b/%Y:%H:%M:%S %z

    [PARSER]
        Name        nginx
        Format      regex
        Regex       ^(?<message>(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")? (?<request_time>[^ ]*) (?<upstream_time>[^ ]*) (?<pipe>[^ ]*))
        Time_Key    time
        Time_Format %d/%b/%Y:%H:%M:%S %z

    [PARSER]
        Name        json
        Format      json
        Time_Key    time
        Time_Format %d/%b/%Y:%H:%M:%S %z

    [PARSER]
        Name        docker
        Format      json
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%L
        Time_Keep   On
        Decode_Field_As escaped log

    [PARSER]
        Name        syslog
        Format      regex
        Regex       ^\<(?<pri>[0-9]+)\>(?<time>[^ ]* {1,2}[^ ]* [^ ]*) (?<host>[^ ]*) (?<ident>[a-zA-Z0-9_\/\.\-]*)(?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$
        Time_Key    time
        Time_Format %b %d %H:%M:%S
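As a side note on the parsers above: a quick way to sanity-check the nginx regex outside Fluent Bit is to translate it into Python's re module (Fluent Bit uses the Onigmo engine, so this is only an approximation, and named groups are spelled (?P<name>...) in Python). The access line below is made up for illustration:

import re

# Approximate Python translation of the 'nginx' parser's core groups;
# Onigmo's (?<name>...) becomes (?P<name>...) in Python's re module.
NGINX_RE = re.compile(
    r'^(?P<remote>[^ ]*) (?P<host>[^ ]*) (?P<user>[^ ]*) \[(?P<time>[^\]]*)\] '
    r'"(?P<method>\S+)(?: +(?P<path>[^"]*?)(?: +\S*)?)?" '
    r'(?P<code>[^ ]*) (?P<size>[^ ]*)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)

# A made-up combined-format line, which is the shape this parser expects:
line = ('10.0.0.1 example.com alice [16/Mar/2021:11:16:39 +0000] '
        '"GET /health HTTP/1.1" 200 512 "-" "curl/7.68.0"')

m = NGINX_RE.match(line)
print(m.groupdict() if m else "no match: the line is not in combined format")

Running the same check against a real line from /var/log/containers/ shows immediately whether the tail input's Parser nginx can extract anything from what nginx is actually emitting.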

**** Fluent Bit logs
Fluent Bit v1.7.2
* Copyright (C) 2019-2021 The Fluent Bit Authors
* Copyright (C) 2015-2018 Treasure Data
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2021/03/16 11:16:39] [ info] [engine] started (pid=1)
[2021/03/16 11:16:39] [ info] [storage] version=1.1.1, initializing...
[2021/03/16 11:16:39] [ info] [storage] in-memory
[2021/03/16 11:16:39] [ info] [storage] normal synchronization mode, checksum disabled, max_chunks_up=128
[2021/03/16 11:16:39] [ info] [filter:kubernetes:kubernetes.0] https=1 host=kubernetes.default.svc port=443
[2021/03/16 11:16:39] [ info] [filter:kubernetes:kubernetes.0] local POD info OK
[2021/03/16 11:16:39] [ info] [filter:kubernetes:kubernetes.0] testing connectivity with API server...
[2021/03/16 11:16:39] [ info] [filter:kubernetes:kubernetes.0] connectivity OK
[2021/03/16 11:16:39] [ info] [output:bigquery:bigquery.0] project='a8platformdev' dataset='fluentbit' table='nginx-access'
[2021/03/16 11:16:39] [ info] [oauth2] HTTP Status=200
[2021/03/16 11:16:39] [ info] [oauth2] access token from 'www.googleapis.com:443' retrieved
[2021/03/16 11:16:40] [ info] [http_server] listen iface=0.0.0.0 tcp_port=2020
[2021/03/16 11:16:40] [ info] [sp] stream processor started
[2021/03/16 11:16:40] [ info] [input:tail:tail.0] inotify_fs_add(): inode=1827846 watch_fd=1 name=/var/log/containers/nginx-ingress-7dbd96b965-5d4vp_nginx-ingress_nginx-ingress-e6a31fa8f51c9556e5a78a052b8b5967f640d2beb24e6adcc0bc9045a1b2cd93.log
[2021/03/16 11:27:46] [ info] [input:tail:tail.0] inode=1827846 handle rotation(): /var/log/containers/nginx-ingress-7dbd96b965-5d4vp_nginx-ingress_nginx-ingress-e6a31fa8f51c9556e5a78a052b8b5967f640d2beb24e6adcc0bc9045a1b2cd93.log => /var/lib/docker/containers/e6a31fa8f51c9556e5a78a052b8b5967f640d2beb24e6adcc0bc9045a1b2cd93/e6a31fa8f51c9556e5a78a052b8b5967f640d2beb24e6adcc0bc9045a1b2cd93-json.log.1
[2021/03/16 11:27:46] [ info] [input:tail:tail.0] inotify_fs_remove(): inode=1827846 watch_fd=1
[2021/03/16 11:27:46] [ info] [input:tail:tail.0] inotify_fs_add(): inode=1827846 watch_fd=2 name=/var/lib/docker/containers/e6a31fa8f51c9556e5a78a052b8b5967f640d2beb24e6adcc0bc9045a1b2cd93/e6a31fa8f51c9556e5a78a052b8b5967f640d2beb24e6adcc0bc9045a1b2cd93-json.log.1
[2021/03/16 11:27:46] [ info] [input:tail:tail.0] inotify_fs_add(): inode=1827845 watch_fd=3 name=/var/log/containers/nginx-ingress-7dbd96b965-5d4vp_nginx-ingress_nginx-ingress-e6a31fa8f51c9556e5a78a052b8b5967f640d2beb24e6adcc0bc9045a1b2cd93.log
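Nothing in the log above reports an error. Since HTTP_Server is On in the SERVICE section, Fluent Bit's monitoring endpoint can add one more data point: per-output delivery counters. A minimal sketch, assuming the pod's port 2020 has been forwarded to localhost (e.g. with kubectl port-forward):

import json
import urllib.request

# Fluent Bit's built-in HTTP server exposes runtime metrics as JSON.
# Assumes port 2020 of the fluent-bit pod is reachable on localhost.
with urllib.request.urlopen("http://127.0.0.1:2020/api/v1/metrics") as resp:
    metrics = json.load(resp)

# Non-zero "errors" or "retries_failed" under bigquery.0 would indicate
# that records are reaching the output but BigQuery is rejecting them.
print(json.dumps(metrics.get("output", {}), indent=2))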

**** Nginx log format (nginx.conf).

log_format json_combined escape=json
  '{ "httpRequest": {'
    '"requestMethod": "$request_method", '
    '"requestUrl": "$request_uri", '
    '"responseSize": "$bytes_sent", '
    '"status": "$status", '
    '"userAgent": "$http_user_agent", '
    '"remoteIp": "$remote_addr", '
    '"referer": "$http_referer", '
    '"host": "$host", '
    '"requestTime": "$request_time", '
    '"upstreamResponseTime": "$upstream_response_time" }, '
  '"time": "$time_local" }';

access_log /var/log/nginx/access.log json_combined;
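Two details of this format are easy to miss. First, the Docker json-file driver wraps each of these lines in its own JSON envelope under a log key before they land in /var/log/containers/. Second, because every variable above is surrounded by double quotes, json_combined emits every value as a JSON string. A sketch of the double decoding, with a made-up sample line:

import json

# Hypothetical line as written by the Docker json-file driver: the nginx
# json_combined record is an escaped string inside the "log" field.
raw_line = (
    '{"log":"{ \\"httpRequest\\": { \\"requestMethod\\": \\"GET\\", '
    '\\"status\\": \\"200\\", \\"responseSize\\": \\"512\\" }, '
    '\\"time\\": \\"16/Mar/2021:11:16:39 +0000\\" }\\n",'
    '"stream":"stdout","time":"2021-03-16T11:16:39.000000000Z"}'
)

envelope = json.loads(raw_line)        # outer Docker envelope
payload = json.loads(envelope["log"])  # inner nginx json_combined record

# Every value is a string, since log_format quotes each variable:
print(type(payload["httpRequest"]["status"]))  # <class 'str'>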

**** BigQuery table schema.

Field name                        Type     Mode
--------------------------------  -------  --------
httpRequest                       RECORD   NULLABLE
httpRequest.requestMethod         STRING   NULLABLE
httpRequest.requestUrl            STRING   NULLABLE
httpRequest.responseSize          INTEGER  NULLABLE
httpRequest.status                INTEGER  NULLABLE
httpRequest.userAgent             STRING   NULLABLE
httpRequest.remoteIp              STRING   NULLABLE
httpRequest.referer               FLOAT    NULLABLE
httpRequest.host                  STRING   NULLABLE
httpRequest.requestTime           FLOAT    NULLABLE
httpRequest.upstreamResponseTime  FLOAT    NULLABLE
time                              STRING   NULLABLE
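One way to rule out the credentials and schema side independently of Fluent Bit is to push a row into the same table with the BigQuery Python client. A minimal sketch, assuming the google-cloud-bigquery package and the same service-account JSON the output plugin uses:

import os
from google.cloud import bigquery

# Reuse the service account that Fluent Bit is configured with.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/etc/bigquery-volume/bigquery.json"

client = bigquery.Client(project="a8platformdev")
table = client.get_table("a8platformdev.fluentbit.nginx-ingress")
print([(field.name, field.field_type) for field in table.schema])

# Streaming insert shaped like the schema above; a non-empty result means
# BigQuery rejected the row and says why.
errors = client.insert_rows_json(table, [{
    "httpRequest": {"requestMethod": "GET", "status": 200, "responseSize": 512},
    "time": "16/Mar/2021:11:16:39 +0000",
}])
print(errors or "row accepted")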

Could you help me get these logs streaming into BigQuery?

Thanks,
Sai