Monitoring of Data Workload Operators

By Chi Fujii

Introduction

Monitoring is a critical aspect of running data workloads in Kubernetes. As we develop the plugin ecosystem for OpenEverest, we are researching how various operators handle monitoring so that our integrations follow industry best practices. Operators take different approaches to exposing metrics and integrating with monitoring stacks. This blog post explores how several of them implement monitoring and observability for their respective data workloads. We focus on metrics collection and monitoring integration; distributed tracing may be explored in a future post.

Monitoring Integration Patterns

Many Kubernetes data workload operators follow similar patterns for monitoring:

  • Metrics Exporters: Dedicated containers or sidecars that expose metrics
  • Prometheus Integration: The de facto standard for metrics collection in Kubernetes
  • Service Discovery: Automatic discovery of monitoring endpoints using Kubernetes service discovery
  • Grafana Dashboards: Pre-built dashboards for visualizing metrics
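To make the exporter pattern concrete, here is a minimal sketch of what a metrics exporter does: it renders current values in the Prometheus text exposition format and serves them over HTTP at /metrics. The metric names, the sample values, and port 9100 are hypothetical and chosen only for illustration. Real exporters, such as the ones the operators below ship, query the workload for live values.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_metrics(samples):
    """Render a {metric_name: value} dict as Prometheus text exposition lines."""
    lines = []
    for name, value in sorted(samples.items()):
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    # Hypothetical static values; a real exporter would query the workload here.
    samples = {"demo_connections": 3, "demo_up": 1}

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(self.samples).encode("utf-8")
        self.send_response(200)
        # Content type used by the Prometheus text exposition format.
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=9100):
    """Serve /metrics until interrupted (port is an assumption)."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling serve() starts the endpoint. In a sidecar setup, this process would run in its own container next to the database container, and Prometheus would scrape the sidecar's port.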

Operator Comparison

The following table summarizes monitoring capabilities across different operators:

Operator                  Metrics Exposure          Monitoring                    Dashboard
------------------------  ------------------------  ----------------------------  ----------------
ClickHouse Operator       Built-in                  Prometheus                    Grafana
Milvus Operator           Built-in                  Prometheus                    Grafana
Kafka Operator (Strimzi)  JMX Exporter              Prometheus                    Grafana
Redis Operator            Redis Exporter (sidecar)  Prometheus                    Grafana
CloudNativePG Operator    Built-in                  Prometheus                    Grafana
TiDB Operator             Built-in                  Prometheus / VictoriaMetrics  Grafana + Custom

Understanding Prometheus Operator Custom Resources

The Prometheus Operator introduces Custom Resources (CRs) that simplify the configuration of Prometheus monitoring in Kubernetes. Two key resources are ServiceMonitor and PodMonitor. Both CRs provide automatic service discovery, eliminating the need to manually update Prometheus configuration files when pods or services are added or removed.

ServiceMonitor

ServiceMonitor is a CR that declaratively specifies how groups of Kubernetes services should be monitored. Instead of manually configuring Prometheus scrape targets, you define a ServiceMonitor that references services using label selectors.

ServiceMonitor is ideal when:

  • Metrics are exposed via Kubernetes Services
  • You want to monitor all pods behind a service uniformly
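As a rough illustration, the sketch below builds a ServiceMonitor manifest as a Python dictionary and prints it as JSON, which kubectl accepts alongside YAML. The resource name, namespace, label, port name, and scrape interval are all hypothetical. The field names (selector, endpoints, port, path, interval) follow the Prometheus Operator's monitoring.coreos.com/v1 API.

```python
import json


def service_monitor(name, namespace, match_labels,
                    port="metrics", path="/metrics", interval="30s"):
    """Build a ServiceMonitor that scrapes Services matching the given labels."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "ServiceMonitor",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Selects Services (not pods) by label.
            "selector": {"matchLabels": match_labels},
            # "port" refers to a named port on the selected Service.
            "endpoints": [{"port": port, "path": path, "interval": interval}],
        },
    }


# Hypothetical names, for illustration only.
manifest = service_monitor("demo-db", "databases", {"app": "demo-db"})
print(json.dumps(manifest, indent=2))
```

The printed JSON could be saved to a file and applied with kubectl apply -f, assuming the Prometheus Operator CRDs are installed and the Prometheus instance is configured to select this ServiceMonitor.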

PodMonitor

PodMonitor is similar to ServiceMonitor but directly targets pods instead of services. This is useful when you need to scrape metrics from pods that don’t have a corresponding service, or when you need more granular control over individual pod monitoring.

PodMonitor is ideal when:

  • Pods expose metrics without going through a service
  • Metrics endpoints are pod-specific (e.g., individual database instances)
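A comparable sketch for PodMonitor is shown below, again building the manifest as a Python dictionary and printing it as JSON. The names, labels, and port are hypothetical. The main structural difference from ServiceMonitor is that the selector matches pod labels and the scrape targets are listed under podMetricsEndpoints.

```python
import json


def pod_monitor(name, namespace, match_labels,
                port="metrics", path="/metrics", interval="30s"):
    """Build a PodMonitor that scrapes pods matching the given labels."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "PodMonitor",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Selects pods directly; no Service is required.
            "selector": {"matchLabels": match_labels},
            # "port" refers to a named container port on the selected pods.
            "podMetricsEndpoints": [
                {"port": port, "path": path, "interval": interval}
            ],
        },
    }


# Hypothetical names, for illustration only.
pod_manifest = pod_monitor("demo-db-pods", "databases", {"app": "demo-db"})
print(json.dumps(pod_manifest, indent=2))
```

Because each matching pod becomes its own scrape target, this shape suits per-instance metrics such as those of individual database replicas.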

Details of Operators

ClickHouse Operator

The ClickHouse Operator exposes metrics directly from ClickHouse pods. It integrates with Prometheus Operator using Kubernetes service discovery and supports Grafana for visualization.

  • Metrics exposure: Built-in metrics
  • Monitoring: Prometheus Operator; config template
  • Dashboards: Set up using Grafana Operator

Milvus Operator

The Milvus Operator exposes metrics from each Milvus component. It integrates with Prometheus Operator using ServiceMonitor CR for component discovery.

  • Metrics exposure: Built-in metrics
  • Monitoring: Prometheus Operator using ServiceMonitor CR; docs
  • Dashboards: Visualize metrics using Grafana

Kafka Operator (Strimzi)

The Strimzi Kafka Operator exports metrics via JMX Exporter. It uses ServiceMonitor CR for Prometheus discovery and provides example Grafana dashboards.

  • Metrics exposure: JMX Exporter (Java agent)
  • Monitoring: Prometheus Operator using ServiceMonitor CR; docs
  • Dashboards: Example Grafana dashboards

Redis Operator

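The Redis Operator by Opstree Solutions uses a sidecar exporter. It integrates with Prometheus Operator via PodMonitor CR. Metrics can be visualized in Grafana.

  • Metrics exposure: Redis Exporter (sidecar)
  • Monitoring: Prometheus Operator using PodMonitor CR
  • Dashboards: Visualize metrics using Grafana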

CloudNativePG Operator

CloudNativePG exposes metrics from each PostgreSQL instance. It works with Prometheus Operator using PodMonitor CR.

  • Metrics exposure: Built-in
  • Monitoring: Prometheus Operator + PodMonitor; docs
  • Dashboards: Set up a Grafana dashboard to monitor CloudNativePG

TiDB Operator

The TiDB Operator exposes metrics from each component. It supports both Prometheus Operator and VictoriaMetrics Operator for flexible monitoring backend selection.

  • Metrics exposure: Built-in
  • Monitoring: Prometheus Operator or VictoriaMetrics via custom resources; docs
  • Dashboards: TiDB Dashboard and Grafana

Best Practices

When implementing monitoring for operators, consider these best practices:

  1. Enable Service Discovery: Automatic endpoint discovery reduces manual configuration
  2. Deploy Grafana Dashboards: Pre-built dashboards provide immediate visibility

Conclusion

Kubernetes operators for data workloads have converged on Prometheus as the standard for metrics collection, with many providing native integration through Prometheus Operator. The use of service discovery, pre-built exporters, and Grafana dashboards makes it easy to achieve comprehensive observability for data workloads running in Kubernetes.

By understanding the monitoring capabilities of each operator, you can make informed decisions about which solution best fits your observability requirements and existing monitoring infrastructure.