Connecting KubeSphere 3.1.1 to a custom Prometheus


 2021/07/16 


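After a long wait, KubeSphere 3.1.1 can finally connect to a custom Prometheus. Earlier versions also supported integrating your own Prometheus, but let's look at the steps that were required before KubeSphere 3.1.1:

Uninstall KubeSphere's customized Prometheus stack
Install your own Prometheus stack
Install KubeSphere's custom components into your Prometheus stack
Change KubeSphere's monitoring endpoint

To me this procedure feels needlessly roundabout and not at all elegant. Next, let's try connecting a custom Prometheus directly with the latest KubeSphere 3.1.1.

To start with, prometheus-operator is already installed in this Kubernetes cluster, in the monitoring namespace.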
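
Before going further, it may be worth confirming that the Service the operator creates for Prometheus is actually there, since the configuration later in this post points KubeSphere at it. A minimal sketch, assuming kubectl is already configured for this cluster; the helper name check_prometheus_svc is made up for illustration, and its defaults are the Service name and namespace used in this post:

```shell
#!/bin/sh
# Hypothetical helper (not part of KubeSphere or prometheus-operator):
# look up a Service in a namespace.
# Defaults match this post: Service "prometheus-operated" in namespace "monitoring".
check_prometheus_svc() {
  svc="${1:-prometheus-operated}"
  ns="${2:-monitoring}"
  kubectl get svc "$svc" -n "$ns"
}

# Usage (needs access to the cluster):
#   check_prometheus_svc
#   check_prometheus_svc prometheus-operated monitoring
```

If the Service is missing, or your Prometheus lives in a different namespace, the endpoint used later has to be adjusted to match.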

Minimal deployment of KubeSphere

Download the configuration files


wget https://github.com/kubesphere/ks-installer/releases/download/v3.1.1/kubesphere-installer.yaml
wget https://github.com/kubesphere/ks-installer/releases/download/v3.1.1/cluster-configuration.yaml

The cluster-configuration.yaml that the project provides by default is a minimal-installation configuration file. Below is the modified file:

---
apiVersion: installer.kubesphere.io/v1alpha1
kind: ClusterConfiguration
metadata:
  name: ks-installer
  namespace: kubesphere-system
  labels:
    version: v3.1.1
spec:
  persistence:
    storageClass: "longhorn"        # Storage class to use
  authentication:
    jwtSecret: ""           
  local_registry: ""        # Add your private registry address if it is needed.
  dev_tag: ""               # Add your kubesphere image tag you want to install, by default it's same as ks-install release version.
  etcd:
    monitoring: false       # Enable or disable etcd monitoring dashboard installation. You have to create a Secret for etcd before you enable it.
    endpointIps: localhost  # etcd cluster EndpointIps. It can be a bunch of IPs here.
    port: 2379              # etcd port.
    tlsEnable: true
  common:
    redis:
      enabled: false
    openldap:
      enabled: false
    minioVolumeSize: 1Gi # Minio PVC size.
    openldapVolumeSize: 1Gi   # openldap PVC size.
    redisVolumSize: 1Gi # Redis PVC size.
    monitoring:
      type: external   # Use a custom (external) Prometheus
      endpoint: http://prometheus-operated.monitoring.svc:9090 # Prometheus address
    es:   # Storage backend for logging, events and auditing.
      # elasticsearchMasterReplicas: 1   # The total number of master nodes. Even numbers are not allowed.
      # elasticsearchDataReplicas: 1     # The total number of data nodes.
      elasticsearchMasterVolumeSize: 4Gi   # The volume size of Elasticsearch master nodes.
      elasticsearchDataVolumeSize: 20Gi    # The volume size of Elasticsearch data nodes.
      logMaxAge: 7                     # Log retention time in built-in Elasticsearch. It is 7 days by default.
      elkPrefix: logstash              # The string making up index names. The index name will be formatted as ks-<elk_prefix>-log.
      basicAuth:
        enabled: false
        username: ""
        password: ""
      externalElasticsearchUrl: ""
      externalElasticsearchPort: ""
  console:
    enableMultiLogin: true  # Enable or disable simultaneous logins. It allows different users to log in with the same account at the same time.
    port: 30880
    type: NodePort
  alerting:                # (CPU: 0.1 Core, Memory: 100 MiB) It enables users to customize alerting policies to send messages to receivers in time with different time intervals and alerting levels to choose from.
    enabled: true         # Enable or disable the KubeSphere Alerting System.
    # thanosruler:
    #   replicas: 1
    #   resources: {}
  auditing:                # Provide a security-relevant chronological set of records, recording the sequence of activities happening on the platform, initiated by different tenants.
    enabled: false         # Enable or disable the KubeSphere Auditing Log System. 
  devops:                  # (CPU: 0.47 Core, Memory: 8.6 G) Provide an out-of-the-box CI/CD system based on Jenkins, and automated workflow tools including Source-to-Image & Binary-to-Image.
    enabled: false             # Enable or disable the KubeSphere DevOps System.
    jenkinsMemoryLim: 2Gi      # Jenkins memory limit.
    jenkinsMemoryReq: 1500Mi   # Jenkins memory request.
    jenkinsVolumeSize: 8Gi     # Jenkins volume size.
    jenkinsJavaOpts_Xms: 512m  # The following three fields are JVM parameters.
    jenkinsJavaOpts_Xmx: 512m
    jenkinsJavaOpts_MaxRAM: 2g
  events:                  # Provide a graphical web console for Kubernetes Events exporting, filtering and alerting in multi-tenant Kubernetes clusters.
    enabled: false         # Enable or disable the KubeSphere Events System.
    ruler:
      enabled: true
      replicas: 2
  logging:                 # (CPU: 57 m, Memory: 2.76 G) Flexible logging functions are provided for log query, collection and management in a unified console. Additional log collectors can be added, such as Elasticsearch, Kafka and Fluentd.
    enabled: false         # Enable or disable the KubeSphere Logging System.
    logsidecar:
      enabled: true
      replicas: 2
  metrics_server:                    # (CPU: 56 m, Memory: 44.35 MiB) It enables HPA (Horizontal Pod Autoscaler).
    enabled: false                   # Enable or disable metrics-server.
  monitoring:
    storageClass: ""                 # If there is an independent StorageClass you need for Prometheus, you can specify it here. The default StorageClass is used by default.
    # prometheusReplicas: 1          # Prometheus replicas are responsible for monitoring different segments of data source and providing high availability.
    prometheusMemoryRequest: 400Mi   # Prometheus request memory.
    prometheusVolumeSize: 20Gi       # Prometheus PVC size.
    # alertmanagerReplicas: 1          # AlertManager Replicas.
  multicluster:
    clusterRole: none  # host | member | none  # You can install a solo cluster, or specify it as the Host or Member Cluster.
  network:
    networkpolicy: # Network policies allow network isolation within the same cluster, which means firewalls can be set up between certain instances (Pods).
      # Make sure that the CNI network plugin used by the cluster supports NetworkPolicy. There are a number of CNI network plugins that support NetworkPolicy, including Calico, Cilium, Kube-router, Romana and Weave Net.
      enabled: false # Enable or disable network policies.
    ippool: # Use Pod IP Pools to manage the Pod network address space. Pods to be created can be assigned IP addresses from a Pod IP Pool.
      type: none # Specify "calico" for this field if Calico is used as your CNI plugin. "none" means that Pod IP Pools are disabled.
    topology: # Use Service Topology to view Service-to-Service communication based on Weave Scope.
      type: none # Specify "weave-scope" for this field to enable Service Topology. "none" means that Service Topology is disabled.
  openpitrix: # An App Store that is accessible to all platform tenants. You can use it to manage apps across their entire lifecycle.
    store:
      enabled: false # Enable or disable the KubeSphere App Store.
  servicemesh:         # (0.3 Core, 300 MiB) Provide fine-grained traffic management, observability and tracing, and visualized traffic topology.
    enabled: false     # Base component (pilot). Enable or disable KubeSphere Service Mesh (Istio-based).
  kubeedge:          # Add edge nodes to your cluster and deploy workloads on edge nodes.
    enabled: false   # Enable or disable KubeEdge.
    cloudCore:
      nodeSelector: {"node-role.kubernetes.io/worker": ""}
      tolerations: []
      cloudhubPort: "10000"
      cloudhubQuicPort: "10001"
      cloudhubHttpsPort: "10002"
      cloudstreamPort: "10003"
      tunnelPort: "10004"
      cloudHub:
        advertiseAddress: # At least a public IP address or an IP address which can be accessed by edge nodes must be provided.
          - ""            # Note that once KubeEdge is enabled, CloudCore will malfunction if the address is not provided.
        nodeLimit: "100"
      service:
        cloudhubNodePort: "30000"
        cloudhubQuicNodePort: "30001"
        cloudhubHttpsNodePort: "30002"
        cloudstreamNodePort: "30003"
        tunnelNodePort: "30004"
    edgeWatcher:
      nodeSelector: {"node-role.kubernetes.io/worker": ""}
      tolerations: []
      edgeWatcherAgent:
        nodeSelector: {"node-role.kubernetes.io/worker": ""}
        tolerations: []

Note that the storageClass defined above must be set to match your own environment.
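
The same goes for the endpoint under common.monitoring. The value above follows the in-cluster form http://<service>.<namespace>.svc:<port>, so if your Prometheus runs elsewhere the URL changes accordingly. A small sketch of how the value is put together; prom_endpoint is a hypothetical helper name used only for this illustration:

```shell
#!/bin/sh
# Hypothetical helper: build an in-cluster URL of the form
#   http://<service>.<namespace>.svc:<port>
prom_endpoint() {
  printf 'http://%s.%s.svc:%s\n' "$1" "$2" "$3"
}

# With the values used in this post:
prom_endpoint prometheus-operated monitoring 9090
```

With these arguments the helper prints http://prometheus-operated.monitoring.svc:9090, which is the endpoint set in the file above.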

Deploy the two files

kubectl apply -f kubesphere-installer.yaml
kubectl apply -f cluster-configuration.yaml

We can check the installation log with the following command:

kubectl logs -n kubesphere-system $(kubectl get pod -n kubesphere-system -l app=ks-install -o jsonpath='{.items[0].metadata.name}') -f

An error appears at the end of the installation log. It can be ignored for now and is covered later.

The installation deploys a Service of type NodePort as the access entry point:

[root@master-01 kubesphere]# kubectl get pod,svc -n kubesphere-system 
NAME                                         READY   STATUS      RESTARTS   AGE
pod/ks-apiserver-6f79d49f66-4p88s            1/1     Running     0          105s
pod/ks-console-74cf8b9487-56gxm              1/1     Running     0          3m
pod/ks-controller-manager-668f5fd585-zzd8h   1/1     Running     0          105s
pod/ks-installer-7bd6b699df-pjjmf            1/1     Running     0          6m50s
pod/minio-597cb64f44-stwz6                   1/1     Running     0          4m16s
pod/openpitrix-import-job-2dk99              0/1     Completed   0          2m12s

NAME                            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)        AGE
service/ks-apiserver            ClusterIP   10.105.13.140    <none>        80/TCP         3h50m
service/ks-console              NodePort    10.105.229.139   <none>        80:30880/TCP   3h50m
service/ks-controller-manager   ClusterIP   10.106.17.200    <none>        443/TCP        3h50m
service/minio                   ClusterIP   10.102.76.218    <none>        9000/TCP       4m16s

We can access it via nodeip:nodeport or by creating our own ingress. Here a Traefik IngressRoute defines the access entry point:

apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: kubesphere
  namespace: kubesphere-system
spec:
  entryPoints:
  - web
  routes:
  - kind: Rule
    match: Host(`kubesphere.lishuai.fun`)
    services:
    - kind: Service
      name: ks-console
      port: 80
---
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: kubesphere-https
  namespace: kubesphere-system
spec:
  entryPoints:
    - websecure
  routes:
  - kind: Rule
    match: Host(`kubesphere.lishuai.fun`)
    services:
    - kind: Service
      name: ks-console
      port: 80
  tls:
    certResolver: myresolver

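We can log in to the web console with the default account and password (admin/P@88w0rd).

Install KubeSphere's custom components into your Prometheus stack

KubeSphere 3.0 uses Prometheus Operator to manage Prometheus/Alertmanager configuration and lifecycle, ServiceMonitor (for managing scrape configuration), and PrometheusRule (for managing Prometheus recording/alerting rules).

At this point, the pages in the web console that display monitoring data do not render correctly. To show monitoring data properly, we need to deploy the prometheus-rules.yaml and prometheus-rulesEtcd.yaml provided by KubeSphere: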

git clone https://github.com/kubesphere/kube-prometheus.git
cd kube-prometheus/kustomize
sed -i 's/kubesphere-monitoring-system/monitoring/g' prometheus-rulesEtcd.yaml
sed -i 's/kubesphere-monitoring-system/monitoring/g' prometheus-rules.yaml
kubectl apply -f prometheus-rules.yaml -f prometheus-rulesEtcd.yaml

After deploying, wait a minute or two and the pages should display monitoring data normally.
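
The two sed commands above only rewrite the namespace so that the rules land next to our own Prometheus. To illustrate, here is the same substitution applied to a made-up manifest fragment; rewrite_namespace is a hypothetical wrapper and example-rules is not a real object name:

```shell
#!/bin/sh
# Hypothetical helper: apply the same substitution as the sed commands above,
# reading a manifest on stdin and writing the rewritten manifest to stdout.
rewrite_namespace() {
  sed 's/kubesphere-monitoring-system/monitoring/g'
}

# Made-up manifest fragment, only to illustrate the rewrite:
printf 'metadata:\n  name: example-rules\n  namespace: kubesphere-monitoring-system\n' \
  | rewrite_namespace
```

The last line of the output becomes "namespace: monitoring". If your Prometheus runs in another namespace, replace monitoring in the substitution. Once the rules are applied, kubectl get prometheusrule -n monitoring should list them.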

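Known issues

Issue 1

Clicking the Custom Monitoring page under Monitoring & Alerting reports that the API cannot be found. This is related to the error at the end of the installation log mentioned earlier.

Deploy the missing CRD resources with the following commands: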


kubectl apply -f https://raw.githubusercontent.com/kubesphere/monitoring-dashboard/master/config/crd/bases/monitoring.kubesphere.io_clusterdashboards.yaml
kubectl apply -f https://raw.githubusercontent.com/kubesphere/monitoring-dashboard/master/config/crd/bases/monitoring.kubesphere.io_dashboards.yaml
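
To confirm that both CRDs were registered, their names can be derived from the manifest file names. This sketch assumes the files follow the usual <group>_<plural>.yaml naming, so the CRD is called <plural>.<group>; crd_name_from_file is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: derive a CRD name from a manifest file name of the
# form <group>_<plural>.yaml, giving <plural>.<group>.
crd_name_from_file() {
  base="${1##*/}"        # strip any directory or URL prefix
  base="${base%.yaml}"   # strip the extension
  group="${base%%_*}"    # text before the first underscore
  plural="${base#*_}"    # text after the first underscore
  printf '%s.%s\n' "$plural" "$group"
}

crd_name_from_file monitoring.kubesphere.io_clusterdashboards.yaml
crd_name_from_file monitoring.kubesphere.io_dashboards.yaml

# With cluster access, each printed name can be checked with:
#   kubectl get crd "$(crd_name_from_file monitoring.kubesphere.io_dashboards.yaml)"
```

The two calls print clusterdashboards.monitoring.kubesphere.io and dashboards.monitoring.kubesphere.io.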

After deployment, run the following command to restart ks-apiserver: