1. EFK vs LPG
Architecture and components:
- Loki: Loki itself is an open-source, horizontally scalable log aggregation system; the LPG stack built around it consists of Promtail, Loki, and Grafana.
- EFK: EFK is an integrated solution composed of Elasticsearch, Fluentd, and Kibana.
Storage and querying:
- Loki: Loki uses a log-stream-based storage model, writing log data into compressible chunk files, which yields a high compression ratio.
- EFK: EFK uses Elasticsearch as the centralized log storage and indexing engine.
Scalability and resource consumption:
- Loki: Loki scales horizontally very well and can handle large volumes of log data.
- EFK: Elasticsearch is a highly scalable distributed storage system, but it is demanding on hardware, especially when storing logs at scale.
Configuration and deployment complexity:
- Loki: Loki is relatively simple to configure and deploy: collect logs with Promtail, query and visualize with Grafana, and you are up and running quickly.
- EFK: EFK takes more work to set up: Fluentd's input, filter, and output plugins must be configured, along with the Elasticsearch and Kibana cluster settings.
2. LPG Overview
Grafana Loki:https://grafana.com/docs/loki/latest/
Github Loki:https://github.com/grafana/helm-charts/tree/main/charts/loki-stack
2.1 Loki Architecture
- Promtail (agent): Loki's default client; collects logs and ships them to Loki.
- Distributor: the entry point of Loki's write path; receives log data from clients and distributes it across the ingester nodes.
- Ingester: receives log data from distributors and persists it; writes the data to local storage and sends index-related metadata to the index component.
- Index: manages and maintains Loki's index data structures.
- Chunks: the physical storage form of log data in Loki.
- Querier: the component that executes queries against the log data stored in Loki.
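The write path above (client to distributor to ingester) can be exercised by hand against Loki's push API. Below is a minimal sketch of building the push payload; the in-cluster service name `loki` and port 3100 are assumptions matching the deployment later in this document, and timestamps must be Unix nanoseconds.

```shell
# Build a Loki push-API payload by hand (write path: distributor -> ingester).
# The timestamp must be a string of Unix nanoseconds.
ts="$(date +%s)000000000"
payload=$(printf '{"streams":[{"stream":{"app":"demo"},"values":[["%s","hello loki"]]}]}' "$ts")
echo "$payload"
# Inside the cluster, send it to the distributor's entry point:
#   curl -s -H 'Content-Type: application/json' -X POST \
#     http://loki:3100/loki/api/v1/push --data "$payload"
```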

2.2 Log Collection
The Promtail client collects log data, which is indexed and stored in the persistent storage backend.
Users filter and retrieve specific log records with the LogQL query language and visualize the results through the Grafana integration.
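The query path can also be driven over HTTP without Grafana, via Loki's `query_range` endpoint. A sketch, assuming the in-cluster `loki` service on port 3100 (curl takes care of URL-encoding the LogQL expression):

```shell
# LogQL queries go to /loki/api/v1/query_range; start/end are Unix nanoseconds.
base="http://loki:3100"
end=$(( $(date +%s) * 1000000000 ))
start=$(( end - 3600 * 1000000000 ))   # the last hour
echo "$start $end"
# Inside the cluster:
#   curl -sG "$base/loki/api/v1/query_range" \
#     --data-urlencode 'query={namespace="logging"} |= "INFO"' \
#     --data-urlencode "start=$start" --data-urlencode "end=$end" | jq .
```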

3. Deployment and Configuration
3.1 Chart Configuration
Add Grafana's chart repository (which hosts loki-stack):
[root@master01 9]# helm repo add grafana https://grafana.github.io/helm-charts
[root@master01 9]# helm repo update
Fetch and unpack the loki-stack chart:
[root@master01 9]# helm search repo loki
[root@master01 9]# helm pull grafana/loki-stack --untar --version 2.9.10
Edit values.yaml as needed:
# Target configuration
[root@master01 9]# cd loki-stack/
[root@master01 loki-stack]# vim values.yaml
test_pod:
  enabled: true
  image: bats/bats:1.8.2
  pullPolicy: IfNotPresent
...
loki:
  enabled: true
  persistence:
    enabled: true
    storageClassName: nfs-storage
    accessModes:
      - ReadWriteOnce
    size: 30Gi
  isDefault: true
  url: http://{{(include "loki.serviceName" .)}}:{{ .Values.loki.service.port }}
...
promtail:
  enabled: true
  config:
    logLevel: info
    serverPort: 3101
    clients:
      - url: http://{{ .Release.Name }}:3100/loki/api/v1/push
    limits_config:
      ingestion_rate_strategy: local
      ingestion_rate_mb: 15
      ingestion_burst_size_mb: 20
...
grafana:
  enabled: true
  persistence:
    enabled: true
    storageClassName: nfs-storage
    accessModes:
      - ReadWriteOnce
    size: 10Gi
# Concrete edits
## Line 3: replace the image
  image: registry.cn-hangzhou.aliyuncs.com/abroad_images/bats:1.8.2
## Insert below line 7 (inside the loki section)
  persistence:
    enabled: true
    storageClassName: nfs-storage
    accessModes:
      - ReadWriteOnce
    size: 30Gi
## Insert below line 37 (inside the promtail config section)
    limits_config:
      ingestion_rate_strategy: local
      ingestion_rate_mb: 15
      ingestion_burst_size_mb: 20
## Line 47: change (grafana section)
  enabled: true
## Insert below line 47
  persistence:
    enabled: true
    storageClassName: nfs-storage
    accessModes:
      - ReadWriteOnce
    size: 10Gi
---
# Edit promtail's chart values
[root@master01 loki-stack]# vim charts/promtail/values.yaml
## Change line 50
50   registry: registry.cn-hangzhou.aliyuncs.com
## Change line 52
52   repository: abroad_images/promtail
## Change line 54
54   tag: 2.7.4
---
# Edit grafana's chart values
[root@master01 loki-stack]# vim charts/grafana/values.yaml
## Change line 78
78   repository: registry.cn-hangzhou.aliyuncs.com/abroad_images/grafana
## Change line 80
80   tag: "8.3.5"
---
# Edit loki's chart values
[root@master01 loki-stack]# vim charts/loki/values.yaml
## Change line 2
2   repository: registry.cn-hangzhou.aliyuncs.com/abroad_images/loki
The complete loki chart values file after the edits above:
[root@master01 loki-stack]# egrep -v "#|^$" charts/loki/values.yaml
image:
  repository: registry.cn-hangzhou.aliyuncs.com/abroad_images/loki
  tag: 2.6.1
  pullPolicy: IfNotPresent
ingress:
  enabled: false
  annotations: {}
  hosts:
    - host: chart-example.local
      paths: []
  tls: []
affinity: {}
annotations: {}
tracing:
  jaegerAgentHost:
config:
  auth_enabled: false
  memberlist:
    join_members:
      - '{{ include "loki.fullname" . }}-memberlist'
  ingester:
    chunk_idle_period: 3m
    chunk_block_size: 262144
    chunk_retain_period: 1m
    max_transfer_retries: 0
    wal:
      dir: /data/loki/wal
    lifecycler:
      ring:
        replication_factor: 1
  limits_config:
    enforce_metric_name: false
    reject_old_samples: true
    reject_old_samples_max_age: 168h
    max_entries_limit_per_query: 5000
  schema_config:
    configs:
      - from: 2020-10-24
        store: boltdb-shipper
        object_store: filesystem
        schema: v11
        index:
          prefix: index_
          period: 24h
  server:
    http_listen_port: 3100
    grpc_listen_port: 9095
  storage_config:
    boltdb_shipper:
      active_index_directory: /data/loki/boltdb-shipper-active
      cache_location: /data/loki/boltdb-shipper-cache
      shared_store: filesystem
    filesystem:
      directory: /data/loki/chunks
  chunk_store_config:
    max_look_back_period: 0s
  table_manager:
    retention_deletes_enabled: false
    retention_period: 0s
  compactor:
    working_directory: /data/loki/boltdb-shipper-compactor
    shared_store: filesystem
extraArgs: {}
extraEnvFrom: []
livenessProbe:
  httpGet:
    path: /ready
    port: http-metrics
  initialDelaySeconds: 45
networkPolicy:
  enabled: false
client: {}
nodeSelector: {}
persistence:
  enabled: false
  accessModes:
    - ReadWriteOnce
  size: 10Gi
  labels: {}
  annotations: {}
podLabels: {}
podAnnotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "http-metrics"
podManagementPolicy: OrderedReady
rbac:
  create: true
  pspEnabled: true
readinessProbe:
  httpGet:
    path: /ready
    port: http-metrics
  initialDelaySeconds: 45
replicas: 1
resources: {}
securityContext:
  fsGroup: 10001
  runAsGroup: 10001
  runAsNonRoot: true
  runAsUser: 10001
containerSecurityContext:
  readOnlyRootFilesystem: true
service:
  type: ClusterIP
  nodePort:
  port: 3100
  annotations: {}
  labels: {}
  targetPort: http-metrics
serviceAccount:
  create: true
  name:
  annotations: {}
  automountServiceAccountToken: true
terminationGracePeriodSeconds: 4800
tolerations: []
topologySpreadConstraints:
  enabled: false
podDisruptionBudget: {}
updateStrategy:
  type: RollingUpdate
serviceMonitor:
  enabled: false
  interval: ""
  additionalLabels: {}
  annotations: {}
  scheme: null
  tlsConfig: {}
  prometheusRule:
    enabled: false
    additionalLabels: {}
    rules: []
initContainers: []
extraContainers: []
extraVolumes: []
extraVolumeMounts: []
extraPorts: []
env: []
alerting_groups: []
useExistingAlertingGroup:
  enabled: false
  configmapName: ""
# Complete grafana chart values file
[root@master01 loki-stack]# egrep -v "#|^$" charts/grafana/values.yaml
rbac:
  create: true
  pspEnabled: true
  pspUseAppArmor: true
  namespaced: false
  extraRoleRules: []
  extraClusterRoleRules: []
serviceAccount:
  create: true
  name:
  nameTest:
  labels: {}
  autoMount: true
replicas: 1
headlessService: false
autoscaling:
  enabled: false
podDisruptionBudget: {}
deploymentStrategy:
  type: RollingUpdate
readinessProbe:
  httpGet:
    path: /api/health
    port: 3000
livenessProbe:
  httpGet:
    path: /api/health
    port: 3000
  initialDelaySeconds: 60
  timeoutSeconds: 30
  failureThreshold: 10
image:
  repository: registry.cn-hangzhou.aliyuncs.com/abroad_images/grafana
  tag: "8.3.5"
  sha: ""
  pullPolicy: IfNotPresent
testFramework:
  enabled: true
  image: "bats/bats"
  tag: "v1.4.1"
  imagePullPolicy: IfNotPresent
  securityContext: {}
securityContext:
  runAsUser: 472
  runAsGroup: 472
  fsGroup: 472
containerSecurityContext:
  {}
createConfigmap: true
extraConfigmapMounts: []
extraEmptyDirMounts: []
extraLabels: {}
downloadDashboardsImage:
  repository: curlimages/curl
  tag: 7.85.0
  sha: ""
  pullPolicy: IfNotPresent
downloadDashboards:
  env: {}
  envFromSecret: ""
  resources: {}
  securityContext: {}
podPortName: grafana
service:
  enabled: true
  type: ClusterIP
  port: 80
  targetPort: 3000
  annotations: {}
  labels: {}
  portName: service
  appProtocol: ""
serviceMonitor:
  enabled: false
  path: /metrics
  labels: {}
  interval: 1m
  scheme: http
  tlsConfig: {}
  scrapeTimeout: 30s
  relabelings: []
extraExposePorts: []
hostAliases: []
ingress:
  enabled: false
  annotations: {}
  labels: {}
  path: /
  pathType: Prefix
  hosts:
    - chart-example.local
  extraPaths: []
  tls: []
resources: {}
nodeSelector: {}
tolerations: []
affinity: {}
topologySpreadConstraints: []
extraInitContainers: []
extraContainers: ""
extraContainerVolumes: []
persistence:
  type: pvc
  enabled: false
  accessModes:
    - ReadWriteOnce
  size: 10Gi
  finalizers:
    - kubernetes.io/pvc-protection
  extraPvcLabels: {}
  inMemory:
    enabled: false
initChownData:
  enabled: true
  image:
    repository: busybox
    tag: "1.31.1"
    sha: ""
    pullPolicy: IfNotPresent
  resources: {}
  securityContext:
    runAsNonRoot: false
    runAsUser: 0
adminUser: admin
admin:
  existingSecret: ""
  userKey: admin-user
  passwordKey: admin-password
env: {}
envValueFrom: {}
envFromSecret: ""
envRenderSecret: {}
envFromSecrets: []
envFromConfigMaps: []
enableServiceLinks: true
extraSecretMounts: []
extraVolumeMounts: []
lifecycleHooks: {}
plugins: []
datasources: {}
alerting: {}
notifiers: {}
dashboardProviders: {}
dashboards: {}
dashboardsConfigMaps: {}
grafana.ini:
  paths:
    data: /var/lib/grafana/
    logs: /var/log/grafana
    plugins: /var/lib/grafana/plugins
    provisioning: /etc/grafana/provisioning
  analytics:
    check_for_updates: true
  log:
    mode: console
  grafana_net:
    url: https://grafana.net
  server:
    domain: "{{ if (and .Values.ingress.enabled .Values.ingress.hosts) }}{{ .Values.ingress.hosts | first }}{{ else }}''{{ end }}"
ldap:
  enabled: false
  existingSecret: ""
  config: ""
smtp:
  existingSecret: ""
  userKey: "user"
  passwordKey: "password"
sidecar:
  image:
    repository: quay.io/kiwigrid/k8s-sidecar
    tag: 1.19.2
    sha: ""
  imagePullPolicy: IfNotPresent
  resources: {}
  securityContext: {}
  enableUniqueFilenames: false
  readinessProbe: {}
  livenessProbe: {}
  alerts:
    enabled: false
    env: {}
    label: grafana_alert
    labelValue: ""
    searchNamespace: null
    watchMethod: WATCH
    resource: both
    reloadURL: "http://localhost:3000/api/admin/provisioning/alerting/reload"
    script: null
    skipReload: false
    sizeLimit: {}
  dashboards:
    enabled: false
    env: {}
    SCProvider: true
    label: grafana_dashboard
    labelValue: ""
    folder: /tmp/dashboards
    defaultFolderName: null
    searchNamespace: null
    watchMethod: WATCH
    resource: both
    folderAnnotation: null
    script: null
    provider:
      name: sidecarProvider
      orgid: 1
      folder: ''
      type: file
      disableDelete: false
      allowUiUpdates: false
      foldersFromFilesStructure: false
    extraMounts: []
    sizeLimit: {}
  datasources:
    enabled: false
    env: {}
    label: grafana_datasource
    labelValue: ""
    searchNamespace: null
    watchMethod: WATCH
    resource: both
    reloadURL: "http://localhost:3000/api/admin/provisioning/datasources/reload"
    script: null
    skipReload: false
    initDatasources: false
    sizeLimit: {}
  plugins:
    enabled: false
    env: {}
    label: grafana_plugin
    labelValue: ""
    searchNamespace: null
    watchMethod: WATCH
    resource: both
    reloadURL: "http://localhost:3000/api/admin/provisioning/plugins/reload"
    script: null
    skipReload: false
    initPlugins: false
    sizeLimit: {}
  notifiers:
    enabled: false
    env: {}
    label: grafana_notifier
    labelValue: ""
    searchNamespace: null
    watchMethod: WATCH
    resource: both
    reloadURL: "http://localhost:3000/api/admin/provisioning/notifications/reload"
    script: null
    skipReload: false
    initNotifiers: false
    sizeLimit: {}
namespaceOverride: ""
revisionHistoryLimit: 10
imageRenderer:
  deploymentStrategy: {}
  enabled: false
  replicas: 1
  image:
    repository: grafana/grafana-image-renderer
    tag: latest
    sha: ""
    pullPolicy: Always
  env:
    HTTP_HOST: "0.0.0.0"
  serviceAccountName: ""
  securityContext: {}
  containerSecurityContext:
    capabilities:
      drop: ['ALL']
    allowPrivilegeEscalation: false
    readOnlyRootFilesystem: true
  hostAliases: []
  priorityClassName: ''
  service:
    enabled: true
    portName: 'http'
    port: 8081
    targetPort: 8081
    appProtocol: ""
  grafanaProtocol: http
  grafanaSubPath: ""
  podPortName: http
  revisionHistoryLimit: 10
  networkPolicy:
    limitIngress: true
    limitEgress: false
  resources: {}
  nodeSelector: {}
  tolerations: []
  affinity: {}
networkPolicy:
  enabled: false
  ingress: true
  allowExternal: true
  explicitNamespacesSelector: {}
  egress:
    enabled: false
    ports: []
enableKubeBackwardCompatibility: false
useStatefulSet: false
extraObjects: []
# Complete promtail chart values file
[root@master01 loki-stack]# egrep -v "#|^$" charts/promtail/values.yaml
nameOverride: null
fullnameOverride: null
daemonset:
  enabled: true
deployment:
  enabled: false
  replicaCount: 1
  autoscaling:
    enabled: false
    minReplicas: 1
    maxReplicas: 10
    targetCPUUtilizationPercentage: 80
    targetMemoryUtilizationPercentage:
secret:
  labels: {}
  annotations: {}
configmap:
  enabled: false
initContainer: []
image:
  registry: registry.cn-hangzhou.aliyuncs.com
  repository: abroad_images/promtail
  tag: 2.7.4
  pullPolicy: IfNotPresent
imagePullSecrets: []
annotations: {}
updateStrategy: {}
podLabels: {}
podAnnotations: {}
priorityClassName: null
livenessProbe: {}
readinessProbe:
  failureThreshold: 5
  httpGet:
    path: "{{ printf `%s/ready` .Values.httpPathPrefix }}"
    port: http-metrics
  initialDelaySeconds: 10
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 1
resources: {}
podSecurityContext:
  runAsUser: 0
  runAsGroup: 0
containerSecurityContext:
  readOnlyRootFilesystem: true
  capabilities:
    drop:
      - ALL
  allowPrivilegeEscalation: false
rbac:
  create: true
  pspEnabled: false
namespace: null
serviceAccount:
  create: true
  name: null
  imagePullSecrets: []
  annotations: {}
nodeSelector: {}
affinity: {}
tolerations:
  - key: node-role.kubernetes.io/master
    operator: Exists
    effect: NoSchedule
  - key: node-role.kubernetes.io/control-plane
    operator: Exists
    effect: NoSchedule
defaultVolumes:
  - name: run
    hostPath:
      path: /run/promtail
  - name: containers
    hostPath:
      path: /var/lib/docker/containers
  - name: pods
    hostPath:
      path: /var/log/pods
defaultVolumeMounts:
  - name: run
    mountPath: /run/promtail
  - name: containers
    mountPath: /var/lib/docker/containers
    readOnly: true
  - name: pods
    mountPath: /var/log/pods
    readOnly: true
extraVolumes: []
extraVolumeMounts: []
extraArgs: []
extraEnv: []
extraEnvFrom: []
enableServiceLinks: true
serviceMonitor:
  enabled: false
  namespace: null
  namespaceSelector: {}
  annotations: {}
  labels: {}
  interval: null
  scrapeTimeout: null
  relabelings: []
  metricRelabelings: []
  targetLabels: []
  scheme: http
  tlsConfig: null
  prometheusRule:
    enabled: false
    additionalLabels: {}
    rules: []
extraContainers: {}
extraPorts: {}
podSecurityPolicy:
  privileged: true
  allowPrivilegeEscalation: true
  volumes:
    - 'secret'
    - 'hostPath'
    - 'downwardAPI'
  hostNetwork: false
  hostIPC: false
  hostPID: false
  runAsUser:
    rule: 'RunAsAny'
  seLinux:
    rule: 'RunAsAny'
  supplementalGroups:
    rule: 'RunAsAny'
  fsGroup:
    rule: 'RunAsAny'
  readOnlyRootFilesystem: true
  requiredDropCapabilities:
    - ALL
config:
  logLevel: info
  serverPort: 3101
  clients:
    - url: http://loki-gateway/loki/api/v1/push
  snippets:
    pipelineStages:
      - cri: {}
    common:
      - action: replace
        source_labels:
          - __meta_kubernetes_pod_node_name
        target_label: node_name
      - action: replace
        source_labels:
          - __meta_kubernetes_namespace
        target_label: namespace
      - action: replace
        replacement: $1
        separator: /
        source_labels:
          - namespace
          - app
        target_label: job
      - action: replace
        source_labels:
          - __meta_kubernetes_pod_name
        target_label: pod
      - action: replace
        source_labels:
          - __meta_kubernetes_pod_container_name
        target_label: container
      - action: replace
        replacement: /var/log/pods/*$1/*.log
        separator: /
        source_labels:
          - __meta_kubernetes_pod_uid
          - __meta_kubernetes_pod_container_name
        target_label: __path__
      - action: replace
        replacement: /var/log/pods/*$1/*.log
        regex: true/(.*)
        separator: /
        source_labels:
          - __meta_kubernetes_pod_annotationpresent_kubernetes_io_config_hash
          - __meta_kubernetes_pod_annotation_kubernetes_io_config_hash
          - __meta_kubernetes_pod_container_name
        target_label: __path__
    addScrapeJobLabel: false
    extraLimitsConfig: ""
    extraServerConfigs: ""
    extraScrapeConfigs: ""
    extraRelabelConfigs: []
    scrapeConfigs: |
      - job_name: kubernetes-pods
        pipeline_stages:
          {{- toYaml .Values.config.snippets.pipelineStages | nindent 4 }}
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          - source_labels:
              - __meta_kubernetes_pod_controller_name
            regex: ([0-9a-z-.]+?)(-[0-9a-f]{8,10})?
            action: replace
            target_label: __tmp_controller_name
          - source_labels:
              - __meta_kubernetes_pod_label_app_kubernetes_io_name
              - __meta_kubernetes_pod_label_app
              - __tmp_controller_name
              - __meta_kubernetes_pod_name
            regex: ^;*([^;]+)(;.*)?$
            action: replace
            target_label: app
          - source_labels:
              - __meta_kubernetes_pod_label_app_kubernetes_io_instance
              - __meta_kubernetes_pod_label_release
            regex: ^;*([^;]+)(;.*)?$
            action: replace
            target_label: instance
          - source_labels:
              - __meta_kubernetes_pod_label_app_kubernetes_io_component
              - __meta_kubernetes_pod_label_component
            regex: ^;*([^;]+)(;.*)?$
            action: replace
            target_label: component
        {{- if .Values.config.snippets.addScrapeJobLabel }}
          - replacement: kubernetes-pods
            target_label: scrape_job
        {{- end }}
        {{- toYaml .Values.config.snippets.common | nindent 4 }}
        {{- with .Values.config.snippets.extraRelabelConfigs }}
        {{- toYaml . | nindent 4 }}
        {{- end }}
  file: |
    server:
      log_level: {{ .Values.config.logLevel }}
      http_listen_port: {{ .Values.config.serverPort }}
      {{- with .Values.httpPathPrefix }}
      http_path_prefix: {{ . }}
      {{- end }}
      {{- tpl .Values.config.snippets.extraServerConfigs . | nindent 2 }}
    clients:
      {{- tpl (toYaml .Values.config.clients) . | nindent 2 }}
    positions:
      filename: /run/promtail/positions.yaml
    scrape_configs:
      {{- tpl .Values.config.snippets.scrapeConfigs . | nindent 2 }}
      {{- tpl .Values.config.snippets.extraScrapeConfigs . | nindent 2 }}
    limits_config:
      {{- tpl .Values.config.snippets.extraLimitsConfig . | nindent 2 }}
networkPolicy:
  enabled: false
  metrics:
    podSelector: {}
    namespaceSelector: {}
    cidrs: []
  k8sApi:
    port: 8443
    cidrs: []
httpPathPrefix: ""
sidecar:
  configReloader:
    enabled: false
    image:
      registry: docker.io
      repository: jimmidyson/configmap-reload
      tag: v0.8.0
      pullPolicy: IfNotPresent
    extraArgs: []
    extraEnv: []
    extraEnvFrom: []
    containerSecurityContext:
      readOnlyRootFilesystem: true
      capabilities:
        drop:
          - ALL
      allowPrivilegeEscalation: false
    readinessProbe: {}
    livenessProbe: {}
    resources: {}
    config:
      serverPort: 9533
    serviceMonitor:
      enabled: true
extraObjects: []
# Complete top-level values.yaml
[root@master01 loki-stack]# egrep -v "#|^$" values.yaml
test_pod:
  enabled: true
  image: registry.cn-hangzhou.aliyuncs.com/abroad_images/bats:1.8.2
  pullPolicy: IfNotPresent
loki:
  enabled: true
  persistence:
    enabled: true
    storageClassName: nfs-storage
    accessModes:
      - ReadWriteOnce
    size: 30Gi
  isDefault: true
  url: http://{{(include "loki.serviceName" .)}}:{{ .Values.loki.service.port }}
  readinessProbe:
    httpGet:
      path: /ready
      port: http-metrics
    initialDelaySeconds: 45
  livenessProbe:
    httpGet:
      path: /ready
      port: http-metrics
    initialDelaySeconds: 45
  datasource:
    jsonData: "{}"
    uid: ""
promtail:
  enabled: true
  config:
    logLevel: info
    serverPort: 3101
    clients:
      - url: http://{{ .Release.Name }}:3100/loki/api/v1/push
    limits_config:
      ingestion_rate_strategy: local
      ingestion_rate_mb: 15
      ingestion_burst_size_mb: 20
fluent-bit:
  enabled: false
grafana:
  enabled: true
  persistence:
    enabled: true
    storageClassName: nfs-storage
    accessModes:
      - ReadWriteOnce
    size: 10Gi
  sidecar:
    datasources:
      label: ""
      labelValue: ""
      enabled: true
      maxLines: 1000
  image:
    tag: 8.3.5
prometheus:
  enabled: false
  isDefault: false
  url: http://{{ include "prometheus.fullname" .}}:{{ .Values.prometheus.server.service.servicePort }}{{ .Values.prometheus.server.prefixURL }}
  datasource:
    jsonData: "{}"
filebeat:
  enabled: false
  filebeatConfig:
    filebeat.yml: |
      filebeat.inputs:
      - type: container
        paths:
          - /var/log/containers/*.log
        processors:
        - add_kubernetes_metadata:
            host: ${NODE_NAME}
            matchers:
            - logs_path:
                logs_path: "/var/log/containers/"
      output.logstash:
        hosts: ["logstash-loki:5044"]
logstash:
  enabled: false
  image: grafana/logstash-output-loki
  imageTag: 1.0.1
  filters:
    main: |-
      filter {
        if [kubernetes] {
          mutate {
            add_field => {
              "container_name" => "%{[kubernetes][container][name]}"
              "namespace" => "%{[kubernetes][namespace]}"
              "pod" => "%{[kubernetes][pod][name]}"
            }
            replace => { "host" => "%{[kubernetes][node][name]}"}
          }
        }
        mutate {
          remove_field => ["tags"]
        }
      }
  outputs:
    main: |-
      output {
        loki {
          url => "http://loki:3100/loki/api/v1/push"
        }
      }
proxy:
  http_proxy: ""
  https_proxy: ""
  no_proxy: ""
3.2 Deploy and Verify
[root@master01 loki-stack]# kubectl create ns logging
[root@master01 loki-stack]# helm upgrade --install loki -n logging -f values.yaml .
# To uninstall later:
$ helm uninstall loki -n logging
Verify:
# Check the pods
[root@master01 loki-stack]# kubectl get pods -n logging |grep loki
loki-0 1/1 Running 0 11m
loki-grafana-8667dc7b46-cnh8d 2/2 Running 0 11m
loki-promtail-4qpgf 1/1 Running 0 11m
loki-promtail-8d25j 1/1 Running 0 11m
loki-promtail-l9msz 1/1 Running 0 11m
loki-promtail-s67t8 1/1 Running 0 11m
loki-promtail-t728x 1/1 Running 0 11m
# Check the services
[root@master01 loki-stack]# kubectl -n logging get svc |grep loki
loki ClusterIP 192.168.9.209 <none> 3100/TCP 11m
loki-grafana ClusterIP 192.168.252.13 <none> 80/TCP 11m
loki-headless ClusterIP None <none> 3100/TCP 11m
loki-memberlist ClusterIP None <none> 7946/TCP 11m
Get the Grafana admin password:
[root@master01 loki-stack]# kubectl get secret --namespace logging loki-grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo
# password
21hubL5ZXNVG6ZPvfigKeWV9FBfYGxYAEseT1YZy
Create an Ingress for Grafana:
[root@master01 loki-stack]# vim grafana-ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  namespace: logging
  name: grafana-ingress
spec:
  ingressClassName: nginx
  rules:
    - host: grafana-logging.zhang-qing.com
      http:
        paths:
          - pathType: Prefix
            backend:
              service:
                name: loki-grafana
                port:
                  number: 80
            path: /
# Apply it (kaf is an alias for kubectl apply -f)
[root@master01 loki-stack]# kaf grafana-ingress.yaml
Test:
[root@master01 loki-stack]# curl grafana-logging.zhang-qing.com -i
HTTP/1.1 302 Found
Date: Wed, 16 Apr 2025 00:31:35 GMT
Content-Type: text/html; charset=utf-8
Content-Length: 29
Connection: keep-alive
Cache-Control: no-cache
Expires: -1
Location: /login
Pragma: no-cache
Set-Cookie: redirect_to=%2F; Path=/; HttpOnly; SameSite=Lax
X-Content-Type-Options: nosniff
X-Frame-Options: deny
X-Xss-Protection: 1; mode=block
<a href="/login">Found</a>.
Log in to Grafana with the username admin and the password obtained above.
Because the Helm chart has already provisioned the Loki data source in Grafana, log data is available right away.
Click the Explore entry in the left-hand menu to filter and browse Loki's log data:



The Promtail installed via Helm comes preconfigured and already tuned for Kubernetes; we can inspect its configuration:
# Install jq
[root@master01 loki-stack]# yum install -y epel-release
[root@master01 loki-stack]# yum install -y jq
# Inspect the rendered configuration
[root@master01 loki-stack]# kubectl get secret loki-promtail -n logging -o json | jq -r '.data."promtail.yaml"' | base64 -d
server:
  log_level: info
  http_listen_port: 3101
clients:
  - url: http://loki:3100/loki/api/v1/push
positions:
  filename: /run/promtail/positions.yaml
scrape_configs:
  # See also https://github.com/grafana/loki/blob/master/production/ksonnet/promtail/scrape_config.libsonnet for reference
  - job_name: kubernetes-pods
    pipeline_stages:
      - cri: {}
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels:
          - __meta_kubernetes_pod_controller_name
        regex: ([0-9a-z-.]+?)(-[0-9a-f]{8,10})?
        action: replace
        target_label: __tmp_controller_name
      - source_labels:
          - __meta_kubernetes_pod_label_app_kubernetes_io_name
          - __meta_kubernetes_pod_label_app
          - __tmp_controller_name
          - __meta_kubernetes_pod_name
        regex: ^;*([^;]+)(;.*)?$
        action: replace
        target_label: app
      - source_labels:
          - __meta_kubernetes_pod_label_app_kubernetes_io_instance
          - __meta_kubernetes_pod_label_release
        regex: ^;*([^;]+)(;.*)?$
        action: replace
        target_label: instance
      - source_labels:
          - __meta_kubernetes_pod_label_app_kubernetes_io_component
          - __meta_kubernetes_pod_label_component
        regex: ^;*([^;]+)(;.*)?$
        action: replace
        target_label: component
      - action: replace
        source_labels:
          - __meta_kubernetes_pod_node_name
        target_label: node_name
      - action: replace
        source_labels:
          - __meta_kubernetes_namespace
        target_label: namespace
      - action: replace
        replacement: $1
        separator: /
        source_labels:
          - namespace
          - app
        target_label: job
      - action: replace
        source_labels:
          - __meta_kubernetes_pod_name
        target_label: pod
      - action: replace
        source_labels:
          - __meta_kubernetes_pod_container_name
        target_label: container
      - action: replace
        replacement: /var/log/pods/*$1/*.log
        separator: /
        source_labels:
          - __meta_kubernetes_pod_uid
          - __meta_kubernetes_pod_container_name
        target_label: __path__
      - action: replace
        regex: true/(.*)
        replacement: /var/log/pods/*$1/*.log
        separator: /
        source_labels:
          - __meta_kubernetes_pod_annotationpresent_kubernetes_io_config_hash
          - __meta_kubernetes_pod_annotation_kubernetes_io_config_hash
          - __meta_kubernetes_pod_container_name
        target_label: __path__
limits_config:
4. LogQL Query Examples
4.1 Log Stream Selectors
For the label part of a query expression, wrap it in curly braces {} and select labels with key=value syntax; separate multiple label expressions with commas:
=  : exactly equal.
!= : not equal.
=~ : matches the regular expression.
!~ : does not match the regular expression.
# Query logs by job/app name
{app="ingress-nginx"}
{job="devops/metallb"}
{namespace="default",app="podstdr2"}
{namespace="default",app="counterlog"}
{app=~"kube-state-metrics|prometheus|zookeeper"}
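Before writing selectors like these, it helps to see which labels and label values actually exist; Loki's label-discovery API returns them. A sketch, where the service address `loki:3100` is an assumption matching the deployment above:

```shell
# Loki's label-discovery endpoints, useful when building stream selectors.
base="http://loki:3100/loki/api/v1"
echo "$base/labels"                   # all label names
echo "$base/label/namespace/values"   # all values of the "namespace" label
# In-cluster: curl -s "$base/labels" | jq -r '.data[]'
```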
4.2 Line Filters
After writing a log stream selector, you can filter the results further with a search expression:
|= : line contains the string.
!= : line does not contain the string.
|~ : line matches the regular expression.
!~ : line does not match the regular expression.
Regular expressions use RE2 syntax. Matches are case-sensitive by default; prefix the regex with (?i) to make it case-insensitive.
1. Exact search: logs from the zookeeper container in the logging namespace that contain the keyword INFO
{namespace="logging",container="zookeeper"} |= "INFO"
2. Regex search (backticks keep the backslashes literal)
{job="huohua/svc-huohua-batch"} |~ `(duration|latency)\s*(=|is|of)\s*[\d.]+`
3. Contains one string but not another
{job="mysql"} |= "error" != "timeout"
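The regex in example 2 can be sanity-checked locally before running it in LogQL; grep -E accepts an equivalent POSIX form of the pattern, with [[:space:]] standing in for \s (LogQL itself uses RE2, so this is only an approximation for quick testing):

```shell
# Dry-run the duration/latency pattern against sample log lines with grep -E.
printf 'request duration=0.532s\nhello world\nlatency is 12ms\n' \
  | grep -E '(duration|latency)[[:space:]]*(=|is|of)[[:space:]]*[0-9.]+'
# Matches the first and third lines; "hello world" is filtered out.
```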
5. Common Issues
5.1 Issue 1: log files under /var/log/pods cannot be tailed
Promtail reports that log files under /var/log/pods cannot be found, so it cannot tail them:
level=error ts=2023-07-17T03:22:11.682802445Z caller=filetarget.go:307 msg="failed to tail file, stat failed" error="stat /var/log/pods/kube-system_kube-apiserver-master3_a8daf137c2a2ea7ef925aaef1e82ac16/kube-apiserver/13.log: no such file or directory" filename=/var/log/pods/kube-system_kube-apiserver-master3_a8daf137c2a2ea7ef925aaef1e82ac16/kube-apiserver/13.log
level=error ts=2023-07-17T03:22:11.682823944Z caller=filetarget.go:307 msg="failed to tail file, stat failed" error="stat /var/log/pods/kube-system_kube-scheduler-master3_bdef86673f60f833d12eb8a3ad337fac/kube-scheduler/1.log: no such file or directory" filename=/var/log/pods/kube-system_kube-scheduler-master3_bdef86673f60f833d12eb8a3ad337fac/kube-scheduler/1.log
First exec into the promtail container, check whether the file exists at that path, and cat it to see whether any log content is readable.
By default, promtail mounts the host directories /var/log/pods and /var/lib/docker/containers into the container as volumes.
If Docker and Kubernetes were both installed with default settings, logs should be readable without problems.
{
  "name": "docker",
  "hostPath": {
    "path": "/var/lib/docker/containers",
    "type": ""
  }
},
{
  "name": "pods",
  "hostPath": {
    "path": "/var/log/pods",
    "type": ""
  }
}
In our real production environment, however, Docker's data directory lives on a disk mounted at /data, so the default volumes configuration must be changed.
Steps:
$ vim values.yaml
promtail:
  enabled: true
  extraVolumes:
    - name: docker
      hostPath:
        path: /data/docker/containers
  extraVolumeMounts:
    - name: docker
      mountPath: /data/docker/containers
      readOnly: true
  config:
    logLevel: info
    serverPort: 3101
    clients:
      - url: http://{{ .Release.Name }}:3100/loki/api/v1/push
Both the volume and the volume mount must be changed, because the log files under /var/log/pods are really symlinks pointing at files under docker/containers.
If only the volume is changed, the promtail container can find the log file, but reading it yields nothing: it is just a symlink whose target is not mounted.
[root@node1 log]# ll /var/log/pods/monitoring_promtail-bs5cs_5bc5bc90-bac9-480d-b291-4caadeff2236/promtail/
total 4
lrwxrwxrwx 1 root root 162 Dec 17 14:04 0.log -> /data/docker/containers/db45d5118e9508817e1a2efa3c9da68cfe969a2b0a3ed42619ff61a29cc64e5f/db45d5118e9508817e1a2efa3c9da68cfe969a2b0a3ed42619ff61a29cc64e5f-json.log
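The symlink behavior is easy to reproduce outside the cluster: a link is only readable if its target is also reachable. A self-contained illustration using a temporary directory:

```shell
# Simulate /var/log/pods: a symlink whose target must also be accessible.
tmp=$(mktemp -d)
echo "log line" > "$tmp/real.log"
ln -s "$tmp/real.log" "$tmp/0.log"
target=$(readlink -f "$tmp/0.log")   # the real path promtail must be able to open
content=$(cat "$tmp/0.log")          # works only because the target exists here
echo "$target"
echo "$content"
rm -rf "$tmp"
```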
5.2 Issue 2: Loki returns HTTP 429 while ingesting logs
The Loki stack reports 429 errors when collecting logs:
level=warn ts=2023-07-17T03:42:34.456086325Z caller=client.go:369 component=client host=loki:3100 msg="error sending batch, will retry" status=429 error="server returned HTTP status 429 Too Many Requests (429): Ingestion rate limit exceeded for user fake (limit: 4194304 bytes/sec) while attempting to ingest '5381' lines totaling '1048504' bytes, reduce log volume or contact your Loki administrator to see if the limit can be increased"
level=warn ts=2023-07-17T03:42:35.144739805Z caller=client.go:369 component=client host=loki:3100 msg="error sending batch, will retry" status=429 error="server returned HTTP status 429 Too Many Requests (429): Ingestion rate limit exceeded for user fake (limit: 4194304 bytes/sec) while attempting to ingest '5381' lines totaling '1048504' bytes, reduce log volume or contact your Loki administrator to see if the limit can be increased"
The volume of logs being pushed exceeds Loki's ingestion limit, hence the 429 errors. To raise the limit, adjust the limits configuration in values.yaml:
promtail:
  enabled: true
  extraVolumes:
    - name: docker
      hostPath:
        path: /data/docker/containers
  extraVolumeMounts:
    - name: docker
      mountPath: /data/docker/containers
      readOnly: true
  config:
    logLevel: info
    serverPort: 3101
    clients:
      - url: http://{{ .Release.Name }}:3100/loki/api/v1/push
    limits_config:
      # Enforce the ingestion limits locally on each Loki instance
      ingestion_rate_strategy: local
      # Per-tenant ingestion rate limit, in MB per second
      ingestion_rate_mb: 15
      # Per-tenant allowed ingestion burst size, in MB
      ingestion_burst_size_mb: 20
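The "limit: 4194304 bytes/sec" in the error message is simply Loki's default ingestion_rate_mb of 4 expressed in bytes, so raising it to 15 as above lifts the cap to roughly 15 MB/s. (Note that the ingestion_rate_* settings are enforced by the Loki server; in the loki-stack chart they are normally placed under loki.config.limits_config.) The arithmetic:

```shell
# Default Loki ingestion limit: 4 MB/s, matching "limit: 4194304" in the error.
echo $((4 * 1024 * 1024))
# After raising ingestion_rate_mb to 15:
echo $((15 * 1024 * 1024))
```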