Set Up Graylog in K8s and Istio

Background

This is the second post in the series. It walks through setting up Graylog on K8s as a demo environment, and configuring Istio ingress to make it accessible from the internet.

Main topics:

  • Graylog basics
  • Creating a K8s cluster with Istio on GKE
  • Deploying Graylog and its dependencies: MongoDB and Elasticsearch
  • Running a demo app and confirming that logs reach Graylog

Introduction to Graylog

Graylog, an open-source all-in-one logging solution, is really the Graylog stack, consisting of:

  • Graylog itself: UI and config management
  • MongoDB: stores Graylog metadata (not log data)
  • Elasticsearch: stores the logs; the search engine

Graylog Inputs

Inputs are very flexible; the common formats are all supported:

  • Ingest syslog
  • Ingest journald
  • Ingest Raw/Plaintext
  • Ingest GELF: TCP, UDP, HTTP, Kafka
  • Ingest from files
  • Ingest JSON path from HTTP API
  • AWS logs

Graylog can be configured with multiple inputs, and each input accepts messages independently.

For GELF TCP, for example, you configure the bind address, port, TLS, etc.; once configured, Graylog listens on that address and port.
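Inputs are usually created in the web UI, but they can also be created through Graylog's REST API. The following is a rough sketch only (endpoint path, type string, and field names as I recall them from Graylog 3.x/4.x; verify against your version, and note admin:admin and 127.0.0.1:9000 are demo assumptions):

# Sketch: create a GELF TCP input via the REST API.
# Graylog requires the X-Requested-By header on mutating API calls.
curl -u admin:admin -H 'Content-Type: application/json' -H 'X-Requested-By: cli' \
  -X POST http://127.0.0.1:9000/api/system/inputs -d '{
    "title": "gelf-tcp-demo",
    "type": "org.graylog2.inputs.gelf.tcp.GELFTCPInput",
    "global": true,
    "configuration": { "bind_address": "0.0.0.0", "port": 12201 }
  }'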

(Screenshot: configuring a GELF TCP input)

Graylog Streams

When storing messages, Graylog can separate them into different streams (each stream gets its own Elasticsearch index).

This is divide and conquer: log messages are grouped by user-defined rules into logical groups, e.g. HTTP500 and HTTP200; a search can then target just HTTP500 instead of scanning everything every time.

Note:

  • Graylog can be configured with multiple streams, and streams are independent of each other
  • Every incoming message is routed to specific streams according to the routing rules
  • A message can be routed to more than one stream

For example, given the following messages:

message: INSERT failed (out of disk space)
level: 3 (error)
source: database-host-1

message: Added user 'foo'.
level: 6 (informational)
source: database-host-2

message: smtp ERR: remote closed the connection
level: 3 (error)
source: application-x

To see only database errors, create a stream with these rules:

  • Field level must be smaller than 4
  • Field source must match regular expression ^database-host-\d+

Under the hood, stream matching adds a field to each message: an array-typed field, streams, holding the IDs of the matched streams. Elasticsearch can then index on this field.
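To illustrate (a sketch only; Graylog issues such queries for you, and the stream ID below is a placeholder), a search scoped to one stream boils down to an Elasticsearch term filter on that field:

# Hypothetical: search only messages routed to stream <STREAM_ID>
# ("graylog_*" matches Graylog's default index naming scheme).
curl -s 'http://elasticsearch:9200/graylog_*/_search' \
  -H 'Content-Type: application/json' -d '{
    "query": { "term": { "streams": "<STREAM_ID>" } }
  }'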

(Screenshot: stream rule matching)

Graylog doc: Streams

Demo versions

  • Kubernetes: 1.19
  • Graylog: 3.0
  • MongoDB: 3
  • Elasticsearch: 6.7.2

All demo code can be downloaded from GitHub.

Graylog in Docker

The official documentation provides a docker-compose file for Graylog:

version: '3'
services:
  # MongoDB: https://hub.docker.com/_/mongo/
  mongo:
    image: mongo:4.2
    networks:
      - graylog
  # Elasticsearch: https://www.elastic.co/guide/en/elasticsearch/reference/7.10/docker.html
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch-oss:7.10.2
    environment:
      - http.host=0.0.0.0
      - transport.host=localhost
      - network.host=0.0.0.0
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    deploy:
      resources:
        limits:
          memory: 1g
    networks:
      - graylog
  # Graylog: https://hub.docker.com/r/graylog/graylog/
  graylog:
    image: graylog/graylog:4.0
    environment:
      # CHANGE ME (must be at least 16 characters)!
      - GRAYLOG_PASSWORD_SECRET=somepasswordpepper
      # Password: admin
      - GRAYLOG_ROOT_PASSWORD_SHA2=8c6976e5b5410415bde908bd4dee15dfb167a9c873fc4bb8a81f6f2ab448a918
      - GRAYLOG_HTTP_EXTERNAL_URI=http://127.0.0.1:9000/
    entrypoint: /usr/bin/tini -- wait-for-it elasticsearch:9200 -- /docker-entrypoint.sh
    networks:
      - graylog
    restart: always
    depends_on:
      - mongo
      - elasticsearch
    ports:
      # Graylog web interface and REST API
      - 9000:9000
      # Syslog TCP
      - 1514:1514
      # Syslog UDP
      - 1514:1514/udp
      # GELF TCP
      - 12201:12201
      # GELF UDP
      - 12201:12201/udp
networks:
  graylog:
    driver: bridge

Note:

  • The URL used in the browser must match GRAYLOG_HTTP_EXTERNAL_URI; with the setting above, only http://127.0.0.1:9000/ works, not localhost
  • Initial ID/password: admin/admin
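To try it locally (a minimal sketch; assumes the file above is saved as docker-compose.yml in the current directory):

docker-compose up -d
# follow the graylog container's logs until the server reports it is up,
# then browse to http://127.0.0.1:9000/ and log in as admin/admin
docker-compose logs -f graylog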

Create Kubernetes on GKE

# create cluster, v1.17
gcloud container clusters create ${CLUSTER_NAME} --cluster-version=1.17 --num-nodes=3
gcloud container clusters get-credentials ${CLUSTER_NAME}
# install istio 1.7
istioctl install --set profile=demo
kubectl create namespace istioinaction
kubectl config set-context $(kubectl config current-context) --namespace=istioinaction
# rename current context
kubectl ctx istio_test=.
# get ingress's IP addr
export URL=$(kubectl -n istio-system get svc istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')

Note:

  • Requires gcloud, istioctl 1.7, and two kubectl plugins: ctx and ns
  • The default Istio install spins up a LoadBalancer as the cluster's ingress point; record its IP, since we will need it later to configure Graylog
  • The script sets the current context to the newly created cluster and renames the context to istio_test
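A quick way to confirm the install before moving on:

# istiod and the ingress gateway should be Running, and the
# istio-ingressgateway Service should have an external IP assigned
kubectl -n istio-system get pods
kubectl -n istio-system get svc istio-ingressgateway
echo $URL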

Deploy Graylog in K8s

First, create the demo namespace: k create ns graylog-demo (k is an alias for kubectl).

Deploy Graylog's dependent components; the code can be downloaded here:

k apply -f mongo-deploy.yaml
k apply -f es-deploy.yaml

Edit graylog-deploy.yaml and set GRAYLOG_HTTP_EXTERNAL_URI to the ingress controller's IP address, i.e. the $URL obtained in the previous step, e.g.:

- name: GRAYLOG_HTTP_EXTERNAL_URI
  value: http://34.116.94.91/

Deploy Graylog: k apply -f graylog-deploy.yaml

Confirm the deployment is healthy: k get pods -w

NAME                              READY   STATUS    RESTARTS   AGE
es-deploy-86c7dfcb7b-7684m        1/1     Running   0          7m17s
es-deploy-86c7dfcb7b-g4ks5        1/1     Running   0          7m17s
graylog-deploy-6866cc494d-sc4tm   1/1     Running   0          4s
mongo-deploy-5864d85d5b-cx7jt     1/1     Running   0          7m42s
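As a spot check, verify that the external URI actually landed in the Graylog pod (the deployment name is taken from the pod listing above):

# print the env var inside the running pod
k -n graylog-demo exec deploy/graylog-deploy -- env | grep GRAYLOG_HTTP_EXTERNAL_URI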

Config Istio Ingress

The default Istio installation ships an ingress controller; we need to configure routes to direct HTTP requests to Graylog:

apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: myk-ingress-gateway
  namespace: graylog-demo
spec:
  selector:
    istio: ingressgateway # use istio default controller
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "*"
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: graylog-virtualservice
  namespace: graylog-demo
spec:
  hosts:
  - "*"
  gateways:
  - myk-ingress-gateway
  http:
  - route:
    - destination:
        host: graylog3
        port:
          number: 80
  • The Gateway sets the ingress controller's listen port: here it listens on port 80
  • The VirtualService sets the routing rule for that listen port: route to the K8s Service graylog3 (a K8s Service maps to an internal DNS name)
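After applying the two resources, a quick smoke test through the ingress (the manifest filename below is an assumption; use whatever file holds the Gateway/VirtualService):

k apply -f ingress.yaml
# expect an HTTP 200 from the Graylog UI via the ingress IP recorded earlier
curl -s -o /dev/null -w '%{http_code}\n' "http://${URL}/"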

Recall the definition of graylog3:

apiVersion: v1
kind: Service
metadata:
  name: graylog3
  namespace: graylog-demo
spec:
  selector:
    service: graylog-deploy
  ports:
  - name: http-dashboard
    port: 80
    targetPort: 9000
  - name: tcp-input
    port: 12201
    targetPort: 12201

Port 80 routes to pod port 9000, i.e. Graylog's web UI.

Browse to http://34.116.94.91 to see the Graylog dashboard; ID/password: admin/admin.

(Screenshot: Graylog dashboard)

This completes the Graylog demo deployment. A production environment would additionally need persistent storage attached to the Graylog stack, i.e. PVCs.
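For illustration, a minimal PVC sketch for the Elasticsearch data directory (the name and size are assumptions, not part of the demo repo; it would still need to be mounted in es-deploy.yaml):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: es-data          # hypothetical name
  namespace: graylog-demo
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi      # size is an assumption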

Structured log and GELF

It is generally recommended to emit logs as JSON and to use structured logging: separate the key information into individual fields instead of mixing everything into one field.

Not recommended:

log.Errorf("requestID %s failed with HTTP code %d", requestID, httpCode)

Better:

log.WithField("requestID", %s).
WithField("HTTP code", %d).
Error("request failed")

GELF is the log-field "convention" recommended by Graylog: it follows structured logging and standardizes a set of fields, and it is intended to replace the previously popular syslog standard:

  • A GELF message is a JSON string
  • GELF has built-in data types; logs must follow the type conventions, otherwise Graylog reports an error at parse time
  • Mandatory fields:
    • version: type string (UTF-8), GELF spec version, e.g. 1.1
    • host: type string (UTF-8), name of the host, source, or application
    • short_message: type string (UTF-8), a short descriptive message
  • Optional GELF fields:
    • full_message: type string (UTF-8), a long message, e.g. a backtrace
    • timestamp: type number, seconds since the UNIX epoch, with optional decimal milliseconds
    • level: type number, standard syslog levels, default 1 (ALERT)
    • _[additional field]: application-defined custom fields, type string or number; the logging library must prefix them with an underscore (_)

Example GELF message payload:

{
  "version": "1.1",
  "host": "example.org",
  "short_message": "A short message that helps you identify what is going on",
  "full_message": "Backtrace here\n\nmore stuff",
  "timestamp": 1385053862.3072,
  "level": 1,
  "_user_id": 9001,
  "_some_info": "foo",
  "_some_env_var": "bar"
}

The syslog severity levels:

  • 0: Emergency
  • 1: Alert
  • 2: Critical
  • 3: Error
  • 4: Warning
  • 5: Notice
  • 6: Informational
  • 7: Debug

This demo configures Graylog to accept GELF TCP, i.e. log messages carried over TCP, which can be tested directly with a raw TCP message:

echo -n -e '{ "version": "1.1", "host": "example.org", "short_message": "A short message", "level": 5, "_some_info": "foo" }'"\0" | nc -w0 graylog.example.com 12201

Official doc: GELF

Send logs from the K8s cluster to Graylog

Config Graylog Input

Set up two inputs: GELF HTTP and GELF TCP.

GELF HTTP, listening on port 12201:

(Screenshot: GELF HTTP input configuration)
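A one-off pod can smoke-test this input from inside the cluster (a sketch; the pod name and image are arbitrary choices):

# POST a minimal GELF message to the service's 12201 port, then clean up the pod
kubectl -n graylog-demo run gelf-test --rm -it --restart=Never --image=curlimages/curl -- \
  curl -sS -X POST http://graylog3:12201/gelf \
  -d '{"version":"1.1","host":"gelf-test","short_message":"input smoke test"}'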

Send logs directly via a CronJob

We set up a CronJob that sends logs to Graylog over HTTP, using the GELF format:

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: curl-cron-job
spec:
  schedule: "* * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: curl-job
            image: alpine:3.9.4
            args:
            - /bin/sh
            - -c
            - apk add --no-cache curl; while true; do curl -XPOST http://graylog3:12201/gelf -d '{"short_message":"Hello there, i am your corn job ;)", "host":"alpine-k8s.org", "facility":"test", "_foo":"bar"}'; sleep 1s; done
          restartPolicy: OnFailure

Create the CronJob: k apply -f log_generate_cronjob.yaml

It sends logs to Graylog in a loop from inside the cluster, via curl -XPOST http://graylog3:12201/gelf:

{
  "short_message": "Hello there, i am your corn job ;)",
  "host": "alpine-k8s.org",
  "facility": "test",
  "_foo": "bar"
}

The demo log message contains the GELF mandatory fields short_message and host (it seems messages are accepted even without version).

The log messages show up on the Graylog dashboard:

(Screenshot: the CronJob's log messages in the Graylog dashboard)

Config Fluentd to send log messages

Sending log messages directly to Graylog works; next we add Fluentd:

  • Containers write logs to stdout; the Docker logging driver captures them and writes them to the host's /var/log/containers folder
  • Fluentd watches /var/log and sends the logs to Graylog, via GELF TCP on port 12201

First, clean up:

k delete -f log_generate_cronjob.yaml

Delete the Graylog GELF HTTP input and create a GELF TCP input.

Deploy the Fluentd DaemonSet; the key config:

containers:
- name: fluentd
  image: fluent/fluentd-kubernetes-daemonset:v1-debian-graylog
  imagePullPolicy: IfNotPresent
  env:
  - name: FLUENT_GRAYLOG_HOST
    value: "graylog3.graylog-demo.svc.cluster.local"
  - name: FLUENT_GRAYLOG_PORT
    value: "12201"
  - name: FLUENT_GRAYLOG_PROTOCOL
    value: "tcp"
  volumeMounts:
  - name: varlog
    mountPath: /var/log
  - name: varlibdockercontainers
    mountPath: /var/lib/docker/containers
    readOnly: true

This makes Fluentd send log messages to the Graylog server over TCP port 12201; otherwise the default Fluentd configuration is used.
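Under the hood, the daemonset image wires those env vars into a GELF output. The effective config is roughly equivalent to the following sketch (plugin name and parameters as shipped in fluent/fluentd-kubernetes-daemonset; verify against the image you actually run):

# Roughly what the -graylog daemonset image configures (a sketch, not the exact file)
<match **>
  @type gelf
  host graylog3.graylog-demo.svc.cluster.local
  port 12201
  protocol tcp
</match>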

Create daemonset: k apply -f fluentd_daemonset.yaml

Watching Graylog, we see new log entries arriving.

Compare them with the log message we sent:

{
  "time": "2021-04-30 00:21:24.383 +00:00",
  "message": "frank debug",
  "severity": "info",
  "level": 6
}
(Screenshot: the message as received in Graylog)

Some interesting observations:

  • Extra fields appear, carrying Docker and K8s metadata:
    • docker
    • kubernetes
    • source
    • stream
    • tag: the tag added by Fluentd
  • Our JSON log message ends up as a string inside the message field. This is expected: the Docker logging driver converts every line read from STDOUT, whatever its format, into a string

We need to unpack our fields from the logs recorded by the Docker logging driver, which requires further Fluentd configuration; see the sketch below.
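For instance, Fluentd's built-in parser filter can re-parse the log field as JSON so the inner fields become top-level fields again (a sketch; the match pattern depends on how the daemonset tags container logs):

# Unpack the JSON string in the "log" field into real fields
<filter kubernetes.**>
  @type parser
  key_name log        # the field that holds the raw container line
  reserve_data true   # keep the existing docker/kubernetes metadata fields
  <parse>
    @type json
  </parse>
</filter>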

You can also SSH into the node and check the log file contents, confirming that they have indeed been uniformly wrapped by the Docker logging driver:

sudo tail -f /var/log/containers/graylog-deploy-6866cc494d-mbvmx_graylog-demo_graylog3-25783c1d79760b91a6c6d0650524aa6631d02fde25b6c7a5fa63691d79339afe.log
{"log":"{\"time\": \"2021-04-30 00:21:24.383 +00:00\", \"message\": \"frank debug\", severity: \"info\", level: 6}\n","stream":"stdout","time":"2021-05-19T23:21:00.604887883Z"}
{"log":"{\"time\": \"2021-04-30 00:21:24.383 +00:00\", \"message\": \"frank debug\", severity: \"info\", level: 6}\n","stream":"stdout","time":"2021-05-19T23:21:05.606245351Z"}

Use GKE's built-in logging solution

GKE uses fluent-bit as its default logging collector and provides a logging dashboard; see Customizing Cloud Logging logs for Google Kubernetes Engine with Fluentd.

Reference

Graylog With Kubernetes in GKE