Apache Druid On Kubernetes

Introduction

Let me start with a quick personal introduction before moving to the topic at hand. I have been a core contributor to the Druid operator since it was open sourced by Himanshu Gupta (author and Apache Druid PMC member). I have also been a committer to various Kubernetes operators and plugins, and I have deployed Apache Druid on Kubernetes at scale for several companies. With a lot of learnings in this space, I thought it would be useful for the community if I penned down my experiences.

My core focus in writing this series is to evangelise the correct way to run Apache Druid on Kubernetes, and to bust myths around the Druid operator, Helm charts and autoscaling in Druid.

All the observations below have been battle tested at Rill Data, where I was working as a contractor. The Druid operator was nurtured and built for scale at Rill. Several other companies, such as Apple, Splunk and Confluent, have been using the Druid operator at scale as well.

Helm vs Operator for Apache Druid?

I would like to start off with a basic software principle: separation of concerns. Helm is a configuration management tool designed for templating Kubernetes manifests. Its responsibility is to template out the manifests and apply them. That's it. After helm upgrade --install of the chart, Helm's job is done; it is now the responsibility of the underlying controllers to reconcile the state and get the desired state up and running.

When Kubernetes orchestrates an application, it sees it as a pod in the most basic sense. It is not aware of any application-specific logic; it does not know whether it is running a Kafka pod or a Druid pod. Big data applications were not designed to run on Kubernetes; most of them run on the JVM and are legacy in design. Jamming legacy code bases into containers can still run your data pipelines on Kubernetes, but the user experience and management aren't seamless.

That's where the operator pattern comes in. CRDs, i.e. custom resource definitions, allow us to define a custom Kubernetes group, version and kind. Operators are simply controllers that watch specific custom resources (CRs) and reconcile their state. These custom controllers have all the application knowledge built into them and act as a bridge between your application and Kubernetes.
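To make group, version and kind concrete, here is a minimal skeleton of a custom resource for the Druid operator. Treat it as a sketch: the name and namespace are hypothetical, and the spec is deliberately left out, so this is not a deployable manifest on its own.

```yaml
# Skeleton only: a real Druid CR needs a full spec (omitted here).
apiVersion: druid.apache.org/v1alpha1   # <group>/<version>
kind: Druid                             # <kind>
metadata:
  name: example-cluster                 # hypothetical name
  namespace: druid                      # hypothetical namespace
```

The operator watches objects of this kind and reconciles the Kubernetes resources behind them.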

The Druid operator has all the logic built in to install, upgrade and manage the state of the Druid nodes. A single Druid operator can run and manage multiple Apache Druid clusters on Kubernetes. The operator can handle the underlying network and storage objects needed for Druid to run. It also has features built in to understand the state of the Druid nodes.

Use Helm with the Operator

In the ideal scenario, don't choose between Helm and the operator. Use both. Use Helm to template out custom resources and take advantage of its functionality such as rollbacks and versioning, and let the operator reconcile the state. Helm keeps a record of each applied version and performs a three-way merge (desired state, original state and modified state). The Druid operator itself can also be deployed using Helm.
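As a rough sketch of this split, a chart of your own could template the Druid custom resource while the operator does the reconciling. The file path and the druid.image value name are hypothetical, and only a single spec field is shown; the rest of the spec is omitted.

```yaml
# templates/druid.yaml in a hypothetical chart of your own
apiVersion: druid.apache.org/v1alpha1
kind: Druid
metadata:
  name: {{ .Release.Name }}
  namespace: {{ .Release.Namespace }}
spec:
  # druid.image is a hypothetical key in this chart's values.yaml
  image: {{ .Values.druid.image | quote }}
  # ... remaining spec fields omitted in this sketch
```

With a layout like this, helm upgrade --install renders and applies the CR, helm rollback restores an earlier revision of it, and in both cases the operator reconciles the cluster toward whatever the CR now says.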

Scoping the Druid operator

The Druid operator is a custom controller, and custom controllers are responsible for reconciling custom resources. Custom resources are namespace scoped, as defined in the custom resource definition. The Druid operator itself can be scoped at the cluster level or the namespace level. It supports the following modes:

Watch and Reconcile ALL Namespaces
Watch and Reconcile SINGLE Namespace
Watch and Reconcile ALL Namespaces but DENY certain Namespaces
Watch and Reconcile MULTIPLE Namespaces

The DENY LIST implementation is a predicate that filters out namespaces, so that events from them are never enqueued in the work queue.

All of these options can be set using environment variables in the deployment file or through Helm values.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: druid-operator
spec:
  replicas: 1
  selector:
    matchLabels:
      name: druid-operator
  template:
    metadata:
      labels:
        name: druid-operator
    spec:
      serviceAccountName: druid-operator
      containers:
        - name: druid-operator
          # Replace this with the built image name
          image: druidio/druid-operator:latest
          command:
          - /manager
          imagePullPolicy: Always
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8081
            initialDelaySeconds: 15
            periodSeconds: 20
          readinessProbe:
            httpGet:
              path: /readyz
              port: 8081
            initialDelaySeconds: 5
            periodSeconds: 10
          env:
            - name: WATCH_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            # To watch multiple namespaces, replace the WATCH_NAMESPACE entry above with:
            # - name: WATCH_NAMESPACE
            #   value: "namespaceone, namespacetwo"
            # The following namespaces will not be reconciled by the operator, regardless of scope
            # - name: DENY_LIST
            #   value: kube-system, default
            # Default reconcile time is set to 10s
            # - name: RECONCILE_WAIT
            #   value: 30s
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: OPERATOR_NAME
              value: "druid-operator"
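If you install the operator with Helm instead of a raw Deployment, the same settings would live in the chart values. The sketch below assumes the chart exposes an env map that is passed through to the container as environment variables; check the values.yaml of the chart version you use for the exact keys and defaults.

```yaml
# values.yaml overrides (assumed layout: a flat env map)
env:
  # Comma-separated namespaces to watch
  WATCH_NAMESPACE: "namespaceone,namespacetwo"
  # Namespaces that are never reconciled, regardless of scope
  DENY_LIST: "kube-system,default"
  # Reconcile wait time; the default is 10s
  RECONCILE_WAIT: "30s"
```

Keeping these in a values file means the operator's scope is versioned and rolled back together with the rest of the release.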

Once the Druid operator is deployed and running, the next step is to deploy the Druid custom resource.

In the next blog I'll cover the details of the Druid custom resource (CR) design and its various implementations.