misterli's Blog.

Connecting KubeSphere 3.1.1 to a Custom Prometheus

Word count: 2k | Reading time: 10 min
2021/07/16


At long last, KubeSphere 3.1.1 can connect to a custom Prometheus. Earlier versions also supported integrating your own Prometheus, but let's look at the steps that required before KubeSphere 3.1.1:

  1. Uninstall KubeSphere's customized Prometheus stack
  2. Install your own Prometheus stack
  3. Install the KubeSphere custom components into your Prometheus stack
  4. Change KubeSphere's monitoring endpoint

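My only impression of this procedure is that it is needlessly roundabout and not elegant at all. Next, let's try connecting a custom Prometheus directly with the latest KubeSphere 3.1.1.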

First, prometheus-operator is already installed in this Kubernetes cluster, in the monitoring namespace.
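Before going further, it can help to confirm that the operator-managed Prometheus service is actually present, since its address is what KubeSphere will be pointed at later. The following is a minimal sketch, assuming the service is named prometheus-operated and listens on port 9090 (as in the endpoint used later in this post). The helper names prom_endpoint and check_prometheus are hypothetical and not part of any tool.

```shell
# Hypothetical helper: build the in-cluster URL of the Prometheus service
# for a given namespace (defaults to "monitoring").
prom_endpoint() {
  ns="${1:-monitoring}"
  printf 'http://prometheus-operated.%s.svc:9090\n' "$ns"
}

# Hypothetical helper: report whether the prometheus-operated service exists
# in the namespace. Requires kubectl and access to the cluster.
check_prometheus() {
  ns="${1:-monitoring}"
  if kubectl get svc prometheus-operated -n "$ns" >/dev/null 2>&1; then
    echo "found: $(prom_endpoint "$ns")"
  else
    echo "prometheus-operated not found in namespace $ns"
  fi
}
```

Running check_prometheus monitoring against the cluster should print the endpoint if the service is there. If your Prometheus lives in another namespace, pass that namespace instead.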

image-20210716003034486

Minimal KubeSphere deployment

Download the configuration files

wget https://github.com/kubesphere/ks-installer/releases/download/v3.1.1/kubesphere-installer.yaml
wget https://github.com/kubesphere/ks-installer/releases/download/v3.1.1/cluster-configuration.yaml

The cluster-configuration.yaml provided officially is a minimal-installation configuration file by default. Below is the modified file:

---
apiVersion: installer.kubesphere.io/v1alpha1
kind: ClusterConfiguration
metadata:
  name: ks-installer
  namespace: kubesphere-system
  labels:
    version: v3.1.1
spec:
  persistence:
    storageClass: "longhorn" # The StorageClass to use
  authentication:
    jwtSecret: ""
  local_registry: "" # Add your private registry address if it is needed.
  dev_tag: "" # Add your kubesphere image tag you want to install, by default it's same as ks-install release version.
  etcd:
    monitoring: false # Enable or disable etcd monitoring dashboard installation. You have to create a Secret for etcd before you enable it.
    endpointIps: localhost # etcd cluster EndpointIps. It can be a bunch of IPs here.
    port: 2379 # etcd port.
    tlsEnable: true
  common:
    redis:
      enabled: false
    openldap:
      enabled: false
    minioVolumeSize: 1Gi # Minio PVC size.
    openldapVolumeSize: 1Gi # openldap PVC size.
    redisVolumSize: 1Gi # Redis PVC size.
    monitoring:
      type: external # Use the custom (external) Prometheus
      endpoint: http://prometheus-operated.monitoring.svc:9090 # Prometheus address
    es: # Storage backend for logging, events and auditing.
      # elasticsearchMasterReplicas: 1 # The total number of master nodes. Even numbers are not allowed.
      # elasticsearchDataReplicas: 1 # The total number of data nodes.
      elasticsearchMasterVolumeSize: 4Gi # The volume size of Elasticsearch master nodes.
      elasticsearchDataVolumeSize: 20Gi # The volume size of Elasticsearch data nodes.
      logMaxAge: 7 # Log retention time in built-in Elasticsearch. It is 7 days by default.
      elkPrefix: logstash # The string making up index names. The index name will be formatted as ks-<elk_prefix>-log.
      basicAuth:
        enabled: false
        username: ""
        password: ""
      externalElasticsearchUrl: ""
      externalElasticsearchPort: ""
  console:
    enableMultiLogin: true # Enable or disable simultaneous logins. It allows different users to log in with the same account at the same time.
    port: 30880
    type: NodePort
  alerting: # (CPU: 0.1 Core, Memory: 100 MiB) It enables users to customize alerting policies to send messages to receivers in time with different time intervals and alerting levels to choose from.
    enabled: true # Enable or disable the KubeSphere Alerting System.
    # thanosruler:
    #   replicas: 1
    #   resources: {}
  auditing: # Provide a security-relevant chronological set of records, recording the sequence of activities happening on the platform, initiated by different tenants.
    enabled: false # Enable or disable the KubeSphere Auditing Log System.
  devops: # (CPU: 0.47 Core, Memory: 8.6 G) Provide an out-of-the-box CI/CD system based on Jenkins, and automated workflow tools including Source-to-Image & Binary-to-Image.
    enabled: false # Enable or disable the KubeSphere DevOps System.
    jenkinsMemoryLim: 2Gi # Jenkins memory limit.
    jenkinsMemoryReq: 1500Mi # Jenkins memory request.
    jenkinsVolumeSize: 8Gi # Jenkins volume size.
    jenkinsJavaOpts_Xms: 512m # The following three fields are JVM parameters.
    jenkinsJavaOpts_Xmx: 512m
    jenkinsJavaOpts_MaxRAM: 2g
  events: # Provide a graphical web console for Kubernetes Events exporting, filtering and alerting in multi-tenant Kubernetes clusters.
    enabled: false # Enable or disable the KubeSphere Events System.
    ruler:
      enabled: true
      replicas: 2
  logging: # (CPU: 57 m, Memory: 2.76 G) Flexible logging functions are provided for log query, collection and management in a unified console. Additional log collectors can be added, such as Elasticsearch, Kafka and Fluentd.
    enabled: false # Enable or disable the KubeSphere Logging System.
    logsidecar:
      enabled: true
      replicas: 2
  metrics_server: # (CPU: 56 m, Memory: 44.35 MiB) It enables HPA (Horizontal Pod Autoscaler).
    enabled: false # Enable or disable metrics-server.
  monitoring:
    storageClass: "" # If there is an independent StorageClass you need for Prometheus, you can specify it here. The default StorageClass is used by default.
    # prometheusReplicas: 1 # Prometheus replicas are responsible for monitoring different segments of data source and providing high availability.
    prometheusMemoryRequest: 400Mi # Prometheus request memory.
    prometheusVolumeSize: 20Gi # Prometheus PVC size.
    # alertmanagerReplicas: 1 # AlertManager Replicas.
  multicluster:
    clusterRole: none # host | member | none # You can install a solo cluster, or specify it as the Host or Member Cluster.
  network:
    networkpolicy: # Network policies allow network isolation within the same cluster, which means firewalls can be set up between certain instances (Pods).
      # Make sure that the CNI network plugin used by the cluster supports NetworkPolicy. There are a number of CNI network plugins that support NetworkPolicy, including Calico, Cilium, Kube-router, Romana and Weave Net.
      enabled: false # Enable or disable network policies.
    ippool: # Use Pod IP Pools to manage the Pod network address space. Pods to be created can be assigned IP addresses from a Pod IP Pool.
      type: none # Specify "calico" for this field if Calico is used as your CNI plugin. "none" means that Pod IP Pools are disabled.
    topology: # Use Service Topology to view Service-to-Service communication based on Weave Scope.
      type: none # Specify "weave-scope" for this field to enable Service Topology. "none" means that Service Topology is disabled.
  openpitrix: # An App Store that is accessible to all platform tenants. You can use it to manage apps across their entire lifecycle.
    store:
      enabled: false # Enable or disable the KubeSphere App Store.
  servicemesh: # (0.3 Core, 300 MiB) Provide fine-grained traffic management, observability and tracing, and visualized traffic topology.
    enabled: false # Base component (pilot). Enable or disable KubeSphere Service Mesh (Istio-based).
  kubeedge: # Add edge nodes to your cluster and deploy workloads on edge nodes.
    enabled: false # Enable or disable KubeEdge.
    cloudCore:
      nodeSelector: {"node-role.kubernetes.io/worker": ""}
      tolerations: []
      cloudhubPort: "10000"
      cloudhubQuicPort: "10001"
      cloudhubHttpsPort: "10002"
      cloudstreamPort: "10003"
      tunnelPort: "10004"
      cloudHub:
        advertiseAddress: # At least a public IP address or an IP address which can be accessed by edge nodes must be provided.
          - "" # Note that once KubeEdge is enabled, CloudCore will malfunction if the address is not provided.
        nodeLimit: "100"
      service:
        cloudhubNodePort: "30000"
        cloudhubQuicNodePort: "30001"
        cloudhubHttpsNodePort: "30002"
        cloudstreamNodePort: "30003"
        tunnelNodePort: "30004"
    edgeWatcher:
      nodeSelector: {"node-role.kubernetes.io/worker": ""}
      tolerations: []
      edgeWatcherAgent:
        nodeSelector: {"node-role.kubernetes.io/worker": ""}
        tolerations: []

Note that the storageClass defined above must be filled in according to your own environment.
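Since the only changes that matter for this walkthrough are the monitoring type and endpoint under spec.common.monitoring, a quick check of the edited file before applying it may save a reinstall. The sketch below is an assumption-laden convenience, not an official tool: show_monitoring_settings is a hypothetical helper that simply greps the file for those two lines.

```shell
# Hypothetical helper: print the external-Prometheus settings found in a
# cluster configuration file (defaults to cluster-configuration.yaml).
# It only matches lines that start with "type: external" or "endpoint:",
# so commented-out entries are not reported.
show_monitoring_settings() {
  file="${1:-cluster-configuration.yaml}"
  grep -E '^[[:space:]]*(type: external|endpoint:)' "$file"
}
```

If the function prints nothing for the type line, the setting is probably still commented out. The StorageClass names available in the cluster can be listed with kubectl get storageclass.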

Apply the two files

kubectl apply -f kubesphere-installer.yaml
kubectl apply -f cluster-configuration.yaml

We can check the installation log with the following command:

kubectl logs -n kubesphere-system $(kubectl get pod -n kubesphere-system -l app=ks-install -o jsonpath='{.items[0].metadata.name}') -f

image-20210716010631751

The following error appears at the end of the installation log. It can be ignored for now and is covered later:

image-20210716011208355

The installation deploys a Service of type NodePort as the access entry point:

[root@master-01 kubesphere]# kubectl get pod,svc -n kubesphere-system
NAME                                         READY   STATUS      RESTARTS   AGE
pod/ks-apiserver-6f79d49f66-4p88s            1/1     Running     0          105s
pod/ks-console-74cf8b9487-56gxm              1/1     Running     0          3m
pod/ks-controller-manager-668f5fd585-zzd8h   1/1     Running     0          105s
pod/ks-installer-7bd6b699df-pjjmf            1/1     Running     0          6m50s
pod/minio-597cb64f44-stwz6                   1/1     Running     0          4m16s
pod/openpitrix-import-job-2dk99              0/1     Completed   0          2m12s

NAME                            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)        AGE
service/ks-apiserver            ClusterIP   10.105.13.140    <none>        80/TCP         3h50m
service/ks-console              NodePort    10.105.229.139   <none>        80:30880/TCP   3h50m
service/ks-controller-manager   ClusterIP   10.106.17.200    <none>        443/TCP        3h50m
service/minio                   ClusterIP   10.102.76.218    <none>        9000/TCP       4m16s

We can access the console via nodeip:nodeport or by creating our own ingress. Here a Traefik IngressRoute defines the entry point:

apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: kubesphere
  namespace: kubesphere-system
spec:
  entryPoints:
    - web
  routes:
    - kind: Rule
      match: Host(`kubesphere.lishuai.fun`)
      services:
        - kind: Service
          name: ks-console
          port: 80
---
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: kubesphere-https
  namespace: kubesphere-system
spec:
  entryPoints:
    - websecure
  routes:
    - kind: Rule
      match: Host(`kubesphere.lishuai.fun`)
      services:
        - kind: Service
          name: ks-console
          port: 80
  tls:
    certResolver: myresolver

We can log in to the web console with the default account and password (admin/P@88w0rd).

Installing the KubeSphere custom components into your Prometheus stack

KubeSphere 3.0 uses Prometheus Operator to manage the configuration and lifecycle of Prometheus/Alertmanager, ServiceMonitor (for managing scrape configuration), and PrometheusRule (for managing Prometheus recording/alerting rules).

image-20210716011553759

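If we open the web console at this point, none of the pages that display monitoring data render correctly. To show monitoring data properly, we need to deploy the prometheus-rules.yaml and prometheus-rulesEtcd.yaml files provided by KubeSphere: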

git clone https://github.com/kubesphere/kube-prometheus.git
cd kube-prometheus/kustomize
sed -i 's/kubesphere-monitoring-system/monitoring/g' prometheus-rulesEtcd.yaml
sed -i 's/kubesphere-monitoring-system/monitoring/g' prometheus-rules.yaml
kubectl apply -f prometheus-rules.yaml -f prometheus-rulesEtcd.yaml
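The two sed commands above only retarget the rule manifests from KubeSphere's default monitoring namespace to the namespace where the existing Prometheus runs. As a small illustration of that substitution, here is a sketch that applies it to standard input; retarget_namespace is a hypothetical helper name, and the default target monitoring is simply the namespace used in this post.

```shell
# Hypothetical helper: rewrite the default KubeSphere monitoring namespace
# to a target namespace (defaults to "monitoring"), reading from stdin.
# This is the same substitution as the sed -i commands, without editing files.
retarget_namespace() {
  target="${1:-monitoring}"
  sed "s/kubesphere-monitoring-system/${target}/g"
}
```

For example, retarget_namespace < prometheus-rules.yaml previews the rewritten manifest without modifying it. After applying the files, kubectl get prometheusrule -n monitoring should list the newly created rule objects.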

After applying them, wait a minute or two and the pages display monitoring data normally.

image-20210716024442008

image-20210716023922622

image-20210716023959464

Known issues

Issue 1

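Clicking the Custom Monitoring page under Monitoring & Alerting reports that the API cannot be found. This is related to the error at the end of the installation log seen earlier.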

image-20210716020743998

Deploy the missing CRD resources with the following commands:

kubectl apply -f https://raw.githubusercontent.com/kubesphere/monitoring-dashboard/master/config/crd/bases/monitoring.kubesphere.io_clusterdashboards.yaml
kubectl apply -f https://raw.githubusercontent.com/kubesphere/monitoring-dashboard/master/config/crd/bases/monitoring.kubesphere.io_dashboards.yaml

After deployment, run the following command to restart ks-apiserver:

kubectl -n kubesphere-system rollout restart deploy/ks-apiserver
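It may be useful to confirm that both CRDs registered and that the restart has finished before reloading the console. The sketch below assumes the CRD names follow the usual plural.group pattern implied by the two manifest file names (clusterdashboards.monitoring.kubesphere.io and dashboards.monitoring.kubesphere.io). Both function names are hypothetical.

```shell
# CRD names inferred from the manifest file names above (an assumption).
dashboard_crds="clusterdashboards.monitoring.kubesphere.io dashboards.monitoring.kubesphere.io"

# Hypothetical helper: print one status line per expected CRD.
check_dashboard_crds() {
  for crd in $dashboard_crds; do
    if kubectl get crd "$crd" >/dev/null 2>&1; then
      echo "ok: $crd"
    else
      echo "missing: $crd"
    fi
  done
}

# Hypothetical helper: block until the restarted ks-apiserver Deployment
# has rolled out, or give up after two minutes.
wait_for_apiserver() {
  kubectl -n kubesphere-system rollout status deploy/ks-apiserver --timeout=120s
}
```

Calling check_dashboard_crds followed by wait_for_apiserver gives a quick signal that the Custom Monitoring page should now be able to find its API.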

After the restart completes, the page looks like this:

image-20210716021427587

Click Create. The system has several built-in default templates; here we add a Redis dashboard.

image-20210716021543542

image-20210716021557850

For custom monitoring templates, see:

Issue 2

Custom policies cannot be created under Alerting Policies and Alerting Messages in Monitoring & Alerting.

image-20210716023858088

Reference:

https://github.com/kubesphere/kubesphere/issues/3880
