Monitoring of Data Workload Operators

Introduction

Monitoring is a critical aspect of running data workloads in Kubernetes. As we develop the plugin ecosystem for OpenEverest, we are researching how various operators handle monitoring so that our integrations follow industry best practices. Operators take different approaches to exposing metrics and integrating with monitoring stacks. This blog post explores how several of them implement monitoring and observability for their data workloads. We focus specifically on metrics collection and monitoring integration; distributed tracing may be explored in a future post.

Monitoring Integration Patterns

Many Kubernetes data workload operators follow similar patterns for monitoring:

Metrics Exporters: Dedicated containers or sidecars that expose metrics
Prometheus Integration: The de facto standard for metrics collection in Kubernetes
Service Discovery: Automatic discovery of monitoring endpoints using Kubernetes service discovery
Grafana Dashboards: Pre-built dashboards for visualizing metrics
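
To make the exporter pattern concrete, here is a minimal Python sketch of what a metrics exporter does: it renders samples in the Prometheus text exposition format and serves them over HTTP. The names render_metrics, MetricsHandler, serve, and the db_connections metric are hypothetical and chosen for illustration only; real exporters such as Redis Exporter or JMX Exporter are far more complete.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_metrics(samples):
    """Render samples in the Prometheus text exposition format.

    Each sample is a tuple: (name, help_text, metric_type, labels, value).
    """
    lines = []
    for name, help_text, metric_type, labels, value in samples:
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} {metric_type}")
        # Labels are rendered as key="value" pairs inside curly braces.
        label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        series = f"{name}{{{label_str}}}" if label_str else name
        lines.append(f"{series} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical sample data; a real exporter would query the database here.
SAMPLES = [
    ("db_connections", "Open client connections", "gauge", {"instance": "db-0"}, 5),
]


class MetricsHandler(BaseHTTPRequestHandler):
    """Serve the rendered metrics on /metrics, the path Prometheus scrapes by default."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(SAMPLES).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=9100):
    """Start the exporter; the port number is an arbitrary choice for this sketch."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling serve() in a sidecar container would expose the endpoint that Prometheus then discovers and scrapes.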

Operator Comparison

The following table summarizes monitoring capabilities across different operators:

Operator	Metrics Exposure	Monitoring Backend	Dashboards
ClickHouse Operator	Built-in	Prometheus	Grafana
Milvus Operator	Built-in	Prometheus	Grafana
Kafka Operator (Strimzi)	JMX Exporter	Prometheus	Grafana
Redis Operator	Redis Exporter (sidecar)	Prometheus	Grafana
CloudNativePG Operator	Built-in	Prometheus	Grafana
TiDB Operator	Built-in	Prometheus / VictoriaMetrics	Grafana + TiDB Dashboard

Understanding Prometheus Operator Custom Resources

The Prometheus Operator introduces Custom Resources (CRs) that simplify the configuration of Prometheus monitoring in Kubernetes. Two key resources are ServiceMonitor and PodMonitor. Both CRs provide automatic service discovery, eliminating the need to manually update Prometheus configuration files when pods or services are added or removed.

ServiceMonitor

ServiceMonitor is a CR that declaratively specifies how groups of Kubernetes services should be monitored. Instead of manually configuring Prometheus scrape targets, you define a ServiceMonitor that references services using label selectors.

ServiceMonitor is ideal when:

Metrics are exposed via Kubernetes Services
You want to monitor all pods behind a service uniformly
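
As a rough illustration, the Python sketch below builds a ServiceMonitor manifest as a plain dictionary and prints it as JSON, which kubectl accepts alongside YAML. The helper names service_monitor and selects, and the example values (kafka-metrics, the data namespace, the app=kafka label, a service port named metrics) are assumptions made for this example; the apiVersion, kind, selector, and endpoints fields follow the Prometheus Operator CRD.

```python
import json


def service_monitor(name, namespace, match_labels, port, path="/metrics", interval="30s"):
    """Build a ServiceMonitor manifest (hypothetical helper for illustration)."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "ServiceMonitor",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Services carrying all of these labels are selected.
            "selector": {"matchLabels": match_labels},
            # "port" refers to the *name* of a port on the selected Service.
            "endpoints": [{"port": port, "path": path, "interval": interval}],
        },
    }


def selects(monitor, service_labels):
    """Mimic matchLabels semantics: every key/value pair must be present."""
    wanted = monitor["spec"]["selector"]["matchLabels"]
    return all(service_labels.get(k) == v for k, v in wanted.items())


# Example values are assumptions, not taken from any specific operator.
manifest = service_monitor("kafka-metrics", "data", {"app": "kafka"}, "metrics")
print(json.dumps(manifest, indent=2))
```

The selects helper only mirrors the label-matching rule so the behavior is easy to reason about; the actual matching is done by the Prometheus Operator.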

PodMonitor

PodMonitor is similar to ServiceMonitor but directly targets pods instead of services. This is useful when you need to scrape metrics from pods that don’t have a corresponding service, or when you need more granular control over individual pod monitoring.

PodMonitor is ideal when:

Pods expose metrics without going through a service
Metrics endpoints are pod-specific (e.g., individual database instances)
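
A PodMonitor can be sketched the same way. The main structural difference from a ServiceMonitor is that scrape endpoints are listed under podMetricsEndpoints and the port name refers to a named container port on the pod. The helper name pod_monitor and the example values (pg-instances, the data namespace, the app=postgres label, a container port named metrics) are assumptions for illustration.

```python
import json


def pod_monitor(name, namespace, match_labels, port, path="/metrics", interval="30s"):
    """Build a PodMonitor manifest (hypothetical helper for illustration)."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "PodMonitor",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Pods carrying all of these labels are scraped directly.
            "selector": {"matchLabels": match_labels},
            # "port" refers to the *name* of a container port on the pod.
            "podMetricsEndpoints": [
                {"port": port, "path": path, "interval": interval}
            ],
        },
    }


# Example values are assumptions, not taken from any specific operator.
manifest = pod_monitor("pg-instances", "data", {"app": "postgres"}, "metrics")
print(json.dumps(manifest, indent=2))
```

Because each pod is a separate scrape target, this shape suits cases where every database instance reports its own metrics.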

Details of Operators

ClickHouse Operator

The ClickHouse Operator exposes metrics directly from ClickHouse pods. It integrates with Prometheus Operator using Kubernetes service discovery and supports Grafana for visualization.

Metrics exposure: Built-in metrics
Monitoring: Prometheus Operator; config template
Dashboards: Set up using Grafana Operator

Milvus Operator

The Milvus Operator exposes metrics from each Milvus component. It integrates with Prometheus Operator using ServiceMonitor CR for component discovery.

Metrics exposure: Built-in metrics
Monitoring: Prometheus Operator using ServiceMonitor CR; docs
Dashboards: Visualize metrics using Grafana

Kafka Operator (Strimzi)

The Strimzi Kafka Operator exports metrics via JMX Exporter. It uses ServiceMonitor CR for Prometheus discovery and provides example Grafana dashboards.

Metrics exposure: JMX Exporter (Java agent)
Monitoring: Prometheus Operator using ServiceMonitor CR; docs
Dashboards: Example Grafana dashboards

Redis Operator

The Redis Operator by Opstree Solutions runs Redis Exporter as a sidecar container. It integrates with Prometheus Operator via PodMonitor CR, and the metrics can be visualized in Grafana.

Metrics exposure: Redis Exporter sidecar
Monitoring: Prometheus Operator + PodMonitor; docs
Dashboards: Grafana dashboards

CloudNativePG Operator

CloudNativePG exposes metrics from each PostgreSQL instance. It works with Prometheus Operator using PodMonitor CR.

Metrics exposure: Built-in
Monitoring: Prometheus Operator + PodMonitor; docs
Dashboards: Set up a Grafana dashboard to monitor CloudNativePG

TiDB Operator

The TiDB Operator exposes metrics from each component. It supports both Prometheus Operator and VictoriaMetrics Operator for flexible monitoring backend selection.

Metrics exposure: Built-in
Monitoring: Prometheus Operator or VictoriaMetrics via custom resources; docs
Dashboards: TiDB Dashboard and Grafana

Best Practices

When implementing monitoring for operators, consider these best practices:

Enable Service Discovery: Automatic endpoint discovery reduces manual configuration
Deploy Grafana Dashboards: Pre-built dashboards provide immediate visibility

Conclusion

Kubernetes operators for data workloads have converged on Prometheus as the standard for metrics collection, with many providing native integration through Prometheus Operator. The combination of service discovery, pre-built exporters, and Grafana dashboards lowers the effort needed to get observability for data workloads running in Kubernetes.

By understanding the monitoring capabilities of each operator, you can make informed decisions about which solution best fits your observability requirements and existing monitoring infrastructure.