DomainComponentUnreadyReplicas
Meaning
Domain component has unready replicas.
Full context
The Domain resource has replicas that were declared unready. The domain component impacted by this alert is the NuoDB Admin Process (AP). For example, a domain is expected to have 3 AP replicas, but it has had fewer than that for a noticeable period of time.
Symptom
To manually evaluate the conditions for this alert, follow the steps below.
A domain that has a component with unready replicas will have the Ready status condition set to False.
List all unready domains.
JSONPATH='{range .items[*]}{@.metadata.name}:{range @.status.conditions[?(@.type=="Ready")]}{@.type}={@.status}{"\n"}{end}{end}'
kubectl get domain -o jsonpath="$JSONPATH" | grep "Ready=False"
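Each line of output has the form <domain-name>:Ready=<status>, so the grep keeps only the domains whose Ready condition is False.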
Inspect the domain component status and compare the replicas and readyReplicas fields.
kubectl get domain <name> -o jsonpath='{.status.components.admins}' | jq
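To surface only the component entries whose counts diverge, a jq filter can be appended. This is a minimal sketch based on the fields shown in the example output later in this runbook:
kubectl get domain <name> -o jsonpath='{.status.components.admins}' | \
  jq '.[] | select(.readyReplicas < .replicas)'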
Impact
Service degradation or unavailability.
The NuoDB domain is fault-tolerant and remains available even if a certain number of APs are down. However, if half or more of the APs go down unexpectedly, the domain loses its Raft majority, which impacts the ability to commit Raft commands such as performing domain configuration changes and starting database processes. For example, in a domain with 3 APs, at least 2 must remain available to preserve the majority.
The APs perform load balancing for SQL connections to Transaction Engines (TEs) that are not in the UNKNOWN state. Unavailable APs might impact obtaining new SQL connections for all databases in the domain.
Note: For more information on NuoDB Admin quorum, see Admin Process (AP) Quorum and Admin Scale-down with Kubernetes Aware Admin.
Diagnosis
- Check the domain state using kubectl describe domain <name>.
- Check the domain component state and message.
- Check how many replicas are declared for this component.
- List and check the status of all pods associated with the domain’s Helm release.
- Check if there are issues with provisioning or attaching disks to pods.
- Check if the cluster-autoscaler is able to create new nodes.
- Check pod logs and identify issues during AP startup.
- Check the NuoDB process state.
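A minimal sketch of these checks; the release label and releaseName field are the ones used in the example below, and <admin-pod> is a placeholder:
kubectl describe domain <name>
RELEASE_NAME=$(kubectl get domain <name> -o jsonpath='{.spec.template.releaseName}')
kubectl get pods,pvc -l release=$RELEASE_NAME
kubectl get events --sort-by=.lastTimestamp
kubectl logs <admin-pod> --tail=200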
Kubernetes readiness probes require that the APs are in Connected state and caught up with the Raft leader.
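These conditions can also be evaluated manually from inside a running AP pod. This is a sketch, assuming the nuocmd version in use supports the check servers flags:
kubectl exec -ti <admin-pod> -- nuocmd check servers \
    --check-connected --check-converged --timeout 30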
Scenarios
Scenario 1: Pod in Pending status for a long time
Possible causes for a Pod not being scheduled:
- A container on the Pod requests a resource not available in the cluster
- The Pod has affinity rules that do not match any available worker node
- One of the containers mounts a volume provisioned in the availability zone (AZ) where no Kubernetes worker is available
- A Persistent volume claim (PVC) created for this Pod has a storage class that may be misconfigured or unusable
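The scheduler records the specific reason as events on the Pod. A quick way to check (the pod name is a placeholder):
kubectl describe pod <pod-name>
kubectl get events --field-selector involvedObject.name=<pod-name> --sort-by=.lastTimestamp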
Scenario 2: AP fails to join the domain
Upon startup, the AP communicates with its peers to join the domain and receives the domain state from the Raft leader. For more information, see Admin Process Peering.
Possible causes for unsuccessful startup during this phase are:
- Network issues prevent communication between the AP and its peers
- Incorrect initial domain membership or peer configuration
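Peering failures are typically visible in the AP logs. A sketch, with a placeholder pod name and illustrative grep patterns:
kubectl logs <admin-pod> --tail=200 | grep -i -e peer -e raft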
Example
Get the domain name and its namespace from the alert’s labels. Inspect the domain state in the Kubernetes cluster.
kubectl get domain acme-messaging -n nuodb-cp-system
Notice that the READY status condition is False, which means that the domain is in a degraded state.
NAME TIER VERSION READY SYNCED DISABLED AGE
acme-messaging n0.small 6.0.2 False True False 46h
Inspect the domain components state.
kubectl get domain acme-messaging -o jsonpath='{.status.components}' | jq
The output below indicates issues with scheduling the acme-messaging-fc4bwd8-2 Pod because the ephemeral volume controller has not created the persistent volume claim for the acme-messaging-fc4bwd8-2-eph-volume volume.
The mismatch between replicas and readyReplicas for this component triggers this alert.
{
"admins": [
{
"kind": "StatefulSet",
"message": "pod/acme-messaging-fc4bwd8-2: 0/1 nodes are available: waiting for ephemeral volume controller to create the persistentvolumeclaim \"acme-messaging-fc4bwd8-2-eph-volume\". preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.",
"name": "acme-messaging-fc4bwd8",
"readyReplicas": 2,
"replicas": 3,
"state": "NotReady",
"version": "v1"
}
],
"lastUpdateTime": "2025-06-06T14:14:57Z"
}
If needed, drill down to the Pod and PVC resources associated with the domain by using the commands below.
RELEASE_NAME=$(kubectl get domain acme-messaging -o jsonpath='{.spec.template.releaseName}')
kubectl get pods,pvc -l release=$RELEASE_NAME
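If one of the PVCs is stuck in Pending, its events explain why provisioning failed. The PVC name below is taken from the component message shown above:
kubectl describe pvc acme-messaging-fc4bwd8-2-eph-volume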
Obtain the NuoDB domain state by running nuocmd show domain inside any NuoDB Admin pod that is in Running status.
ADMIN_POD=$(kubectl get pod \
-l release=${RELEASE_NAME},component=admin \
--field-selector=status.phase==Running \
-o jsonpath='{.items[0].metadata.name}')
kubectl exec -ti $ADMIN_POD -- nuocmd show domain
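To list each AP with its connection state, nuocmd get servers can be run from the same pod (a sketch; the output format may vary by NuoDB version):
kubectl exec -ti $ADMIN_POD -- nuocmd get servers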