This section describes steps to troubleshoot and resolve some errors encountered while using OpenEBS Persistent Volumes (PVs). The procedures and commands used in this document are mostly generic and apply to any common Linux platform or Kubernetes environment.
Application pod is stuck in ContainerCreating state after deployment
Troubleshooting the issue and workaround:
Obtain the output of kubectl describe pod <application_pod> and check the events.
If the events show the error message "executable not found in $PATH", check whether the iSCSI initiator utilities are installed on the node, or inside the kubelet container on distributions such as RancherOS and CoreOS. If they are not, install them and retry the deployment.
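A quick check for the iSCSI initiator binaries can be sketched as follows. It assumes the relevant binaries are iscsiadm and iscsid and that the check runs as root on the node (or inside the kubelet container), since these binaries usually live in /sbin or /usr/sbin. The function name is hypothetical.

```shell
# Hypothetical helper: print every named binary that cannot be found in PATH.
missing_iscsi_bins() {
  # Default to the open-iscsi admin tool and daemon when no names are given.
  if [ "$#" -eq 0 ]; then
    set -- iscsiadm iscsid
  fi
  for bin in "$@"; do
    # command -v prints nothing and fails when the name is not found.
    command -v "$bin" >/dev/null 2>&1 || echo "$bin"
  done
}

# Example usage on the node:
#   missing_iscsi_bins
# If anything is printed, install the initiator package, for example:
#   Debian/Ubuntu: apt-get install open-iscsi
#   RHEL/CentOS:   yum install iscsi-initiator-utils
```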
If the warning message "FailedMount: Unable to mount volumes for pod <>: timeout expired waiting for volumes to attach/mount" persists, use the following procedure.
Check whether the Persistent Volume Claim/Persistent Volume (PVC/PV) are created successfully and the OpenEBS controller and replica pods are running. These can be verified using the kubectl get pvc,pv and kubectl get pods commands.
If the OpenEBS volume pods are not created and the PVC is in Pending state, check whether the storage class referenced by the application PVC is available. This can be confirmed using the kubectl get sc command. If this storage class is not created, or was created without the appropriate attributes, recreate it and redeploy the application.
Note: Ensure that the older PVC objects are deleted before re-deployment.
If the PV is created (in Bound state) but the replicas are not running or are in Pending state, run kubectl describe pod <replica_pod> and check the events. If the events indicate FailedScheduling due to Insufficient cpu, NodeUnschedulable, or MatchInterPodAffinity and PodToleratesNodeTaints, check the following:
- the replica count is less than or equal to the number of available schedulable nodes
- there are enough resources on the nodes to run the replica pods
- whether the nodes are tainted and, if so, whether the taints are tolerated by the OpenEBS replica pods
Ensure that the above conditions are met and that the replica rollout is successful. The application then enters the Running state.
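The replica-count condition can be checked with a small helper. This is a sketch that assumes the STATUS column of kubectl get nodes --no-headers is the second field and treats only nodes reported exactly as Ready as schedulable. It does not evaluate taints or free CPU, which still need kubectl describe node. The function name and the replica count in the usage comment are hypothetical.

```shell
# Hypothetical helper: compare a replica count with the number of schedulable nodes.
# stdin: output of `kubectl get nodes --no-headers` (column 2 is STATUS).
# Only nodes whose STATUS is exactly "Ready" are counted; "Ready,SchedulingDisabled"
# (cordoned) and "NotReady" nodes are skipped. Taints and free resources are NOT checked.
check_replica_fit() {
  replicas="$1"
  ready=$(awk '$2 == "Ready" { n++ } END { print n + 0 }')
  if [ "$replicas" -le "$ready" ]; then
    echo "ok: $replicas replica(s), $ready schedulable node(s)"
  else
    echo "insufficient: $replicas replica(s), $ready schedulable node(s)"
    return 1
  fi
}

# Example usage (3 is a placeholder for the volume's configured replica count):
#   kubectl get nodes --no-headers | check_replica_fit 3
```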
If the PV is created and the OpenEBS pods are running, use the iscsiadm -m session command on the node where the application pod is scheduled to identify whether the OpenEBS iSCSI volume has been attached (logged in). If not, verify network connectivity between the nodes.
If the session is present, identify the SCSI device associated with the session using the command iscsiadm -m session -P 3. Once it is confirmed that the iSCSI device is available (check the output of fdisk -l for the mapped SCSI device), check the kubelet and system logs, including iscsid and kernel (syslog) messages, for information on the state of this iSCSI device. If inconsistencies are observed, run a filesystem check on the device using fsck -y /dev/sd<>. Once the filesystem is consistent, the volume can be mounted on the node.
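To map each OpenEBS target to its SCSI device, the iscsiadm -m session -P 3 output can be filtered with awk. This sketch assumes the usual open-iscsi layout, where a "Target:" line precedes the "Attached scsi disk" lines for that session; confirm it against your node's output, as the layout can vary between versions. The function name is hypothetical.

```shell
# Hypothetical helper: print "<target IQN> /dev/<disk>" pairs from `iscsiadm -m session -P 3`.
iscsi_target_disks() {
  awk '
    # Remember the IQN of the session currently being printed.
    $1 == "Target:" { target = $2 }
    # Lines such as "Attached scsi disk sdb   State: running" name the block device.
    $1 == "Attached" && $2 == "scsi" && $3 == "disk" { print target, "/dev/" $4 }
  '
}

# Example usage on the node where the application pod is scheduled:
#   iscsiadm -m session -P 3 | iscsi_target_disks
```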
- In OpenShift deployments, you may face this issue with the OpenEBS replica pods restarting continuously, that is, they are in CrashLoopBackOff state. This is due to the default "restricted" security context constraint (SCC) settings. Edit the following settings using oc edit scc restricted to get the application pod running:
  - allowHostDirVolumePlugin: true
  - runAsUser: type: RunAsAny
Application pod enters CrashLoopBackOff state
This issue is caused by failed application operations in the container, typically failed writes on the mounted PV. To confirm this, check the status of the PV mount inside the application pod.
Troubleshooting the issue:
- Run kubectl exec -it <app> -- bash (or any available shell) on the application pod and attempt writes on the volume mount. The volume mount path can be obtained either from the application specification ("volumeMounts" in the container spec) or by running df -h in the application pod's shell (the OpenEBS iSCSI device is mapped to the volume mount).
- The writes can be attempted using a simple command like echo abc > t.out on the mount. If the writes fail with Read-only file system errors, the iSCSI connections to the OpenEBS volumes have been lost. You can confirm this by checking the node's system logs, including iscsid, kernel (syslog), and kubelet logs (journalctl -xe, kubelet.log).
- iSCSI connections usually fail due to the following:
  - flaky networks (which can be confirmed by ping RTTs, packet loss, etc.) or failed networks between:
    - OpenEBS PV controller and replica pods
    - application and controller pods
  - node failures
  - OpenEBS volume replica crashes or restarts due to software bugs
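The write test above can be wrapped in a small function to paste into the shell opened with kubectl exec. This is a sketch; the probe file name and function name are hypothetical, and the probe file is removed again when the write succeeds.

```shell
# Hypothetical helper: attempt a small write on a mount, like "echo abc > t.out".
probe_write() {
  mnt="$1"
  probe="$mnt/.openebs-write-probe"   # hypothetical probe file name
  # The subshell lets 2>/dev/null also hide the shell's own redirection error.
  if ( echo abc > "$probe" ) 2>/dev/null; then
    rm -f "$probe"
    echo "rw: $mnt accepts writes"
  else
    echo "write failed on $mnt: check for Read-only file system errors"
    return 1
  fi
}

# Example usage inside the application pod (after pasting the function):
#   probe_write /path/to/volume/mount
```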
In all the above cases, losing the device for longer than the node's iSCSI initiator timeout causes the volumes to be remounted as read-only (RO).
In certain cases, the node/replica loss can lead to the replica quorum not being met (i.e., less than 51% of replicas available) for an extended period of time, causing the OpenEBS volume to be presented as a RO device.
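As a worked example of the quorum rule, assuming the 51% threshold is applied literally, the check can be done with integer arithmetic: available / total >= 0.51 is equivalent to available * 100 >= total * 51. With three replicas, two available gives 200 >= 153 (quorum met), while one available gives 100 < 153 (quorum lost). The helper name is hypothetical.

```shell
# Hypothetical helper: apply the "at least 51% of replicas available" rule.
# available/total >= 0.51  is rewritten as  available*100 >= total*51  (integers only).
has_quorum() {
  available="$1"
  total="$2"
  if [ $((available * 100)) -ge $((total * 51)) ]; then
    echo "quorum met: $available of $total replicas available"
  else
    echo "quorum lost: $available of $total replicas available (volume may turn RO)"
    return 1
  fi
}

# has_quorum 2 3   -> quorum met   (200 >= 153)
# has_quorum 1 3   -> quorum lost  (100 <  153)
```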
The procedure to ensure application recovery in the above cases is as follows:
Resolve the system issues which caused the iSCSI disruption/RO device condition. Depending on the cause, the resolution steps may include recovering the failed nodes, ensuring replicas are brought back on the same nodes as earlier, fixing the network problems and so on.
Ensure that the OpenEBS volume controller and replica pods are running successfully with all replicas in RW mode. Use the command curl http://<ctrl ip>:9501/v1/replicas | grep createTypes to confirm.
If any of the replicas are still in RO mode, wait for the synchronization to complete. If all the replicas are in RO mode (this may occur when all replicas re-register with the controller within short intervals), you must restart the OpenEBS volume controller using the kubectl delete pod <pvc-ctrl> command. Since the controller is managed by a Kubernetes deployment, the controller pod is recreated automatically. Once done, verify that all replicas transition into RW mode.
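The replica modes can also be summarized directly from the controller response. This sketch assumes each replica entry in the JSON returned by /v1/replicas carries a "mode" field with values such as RW or RO; inspect the raw response first, as the format may differ between OpenEBS releases. It uses only grep, sed, sort, and awk so that no JSON tool is needed on the node. The function name is hypothetical.

```shell
# Hypothetical helper: count replicas per mode, printing lines such as "RW=2".
# stdin: JSON body from the controller's /v1/replicas endpoint (assumed "mode" field).
replica_modes() {
  grep -o '"mode"[[:space:]]*:[[:space:]]*"[A-Z]*"' |   # one "mode":"XX" match per line
    sed 's/.*"\([A-Z]*\)"$/\1/' |                        # keep only the mode value
    sort | uniq -c |                                     # count identical modes
    awk '{ print $2 "=" $1 }'
}

# Example usage (controller IP is a placeholder you set first):
#   curl -s "http://$CTRL_IP:9501/v1/replicas" | replica_modes
```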
Un-mount the stale iscsi device mounts on the application node. Typically, these devices are mounted in the
Identify whether the iSCSI session is re-established after the failure. This can be verified using iscsiadm -m session, with the device mapping established using iscsiadm -m session -P 3 and fdisk -l. Note: Stale device nodes (SCSI device names) have sometimes been observed on the Kubernetes node. Unless the logs confirm that a re-login occurred after the system issues were resolved, it is recommended to perform the following step after purging (logging out of) the existing session using iscsiadm -m node -T <iqn> -u.
If the device is not logged in again, ensure that the network issues, failed nodes, or failed replicas are resolved, then discover the device and re-establish the session using the commands iscsiadm -m discovery -t st -p <ctrl svc IP>:3260 and iscsiadm -m node -T <iqn> -l respectively.
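The logout, discovery, and login sequence can be generated for review before running it. This sketch only prints the commands from the steps above; the function name and the IQN_VALUE and CTRL_SVC_IP variables in the usage comment are hypothetical.

```shell
# Hypothetical helper: print (not run) the re-login sequence for one OpenEBS target.
iscsi_relogin_plan() {
  iqn="$1"         # target IQN, as shown by `iscsiadm -m session`
  portal_ip="$2"   # OpenEBS controller service IP
  echo "iscsiadm -m node -T $iqn -u"                     # purge/log out of the stale session
  echo "iscsiadm -m discovery -t st -p $portal_ip:3260"  # rediscover targets on the controller
  echo "iscsiadm -m node -T $iqn -l"                     # log in again
  echo "iscsiadm -m session -P 3"                        # confirm the session and new SCSI device
}

# Example usage: review the plan, then run it as root on the node.
#   iscsi_relogin_plan "$IQN_VALUE" "$CTRL_SVC_IP"
#   iscsi_relogin_plan "$IQN_VALUE" "$CTRL_SVC_IP" | sh
```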
Identify the new SCSI device name corresponding to the iSCSI session (the device name may or may not be the same as before).
Re-mount the new disk on the mountpoint mentioned earlier using the mount -o rw,relatime,data=ordered /dev/sd<> <mountpoint> command. If the re-mount fails due to inconsistencies on the device (unclean filesystem), run a filesystem check using fsck -y /dev/sd<>.
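The re-mount with an fsck fallback might look like the following sketch. It assumes the device and mount point found in the previous steps and runs each command through an optional RUN prefix (RUN=echo prints instead of executing); the function name and the RUN convention are hypothetical. Note that fsck exits with status 1 when it has corrected errors, so the sketch treats only exit statuses of 4 and above as failures rather than chaining fsck && mount.

```shell
# Hypothetical helper: re-mount the new iSCSI disk, repairing the filesystem if needed.
# Set RUN=echo to print the commands instead of executing them.
remount_volume() {
  dev="$1"     # the new /dev/sd<> device identified earlier
  mnt="$2"     # the mount point the stale device was unmounted from
  run="${RUN:-}"
  opts="rw,relatime,data=ordered"   # options from the step above; data=ordered is an ext3/ext4 option

  # First attempt: plain mount.
  if $run mount -o "$opts" "$dev" "$mnt"; then
    return 0
  fi

  # Mount failed, most likely an unclean filesystem: check and repair it.
  rc=0
  $run fsck -y "$dev" || rc=$?
  # fsck exit status is a bitmask: 1 = errors corrected, 2 = reboot advised,
  # 4 or more = errors left uncorrected or fsck itself failed.
  if [ "$rc" -ge 4 ]; then
    echo "fsck could not repair $dev (exit status $rc)" >&2
    return "$rc"
  fi

  # Second and final attempt.
  $run mount -o "$opts" "$dev" "$mnt"
}

# Example usage (DEV and MOUNTPOINT are placeholders you set first):
#   RUN=echo remount_volume "$DEV" "$MOUNTPOINT"   # review the commands
#   remount_volume "$DEV" "$MOUNTPOINT"            # run them as root
```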
Ensure that the application uses the newly mounted disk by forcing it to restart on the same node. Run docker stop <id> against the application container on the node. Kubernetes will automatically restart the container to restore the desired state.
This step is often unnecessary, since the application is already being restarted periodically as part of the CrashLoopBackOff cycle, but it is useful when the application pod's next restart has been delayed by the exponential back-off.
The above procedure works for applications that are either standalone pods or deployments/statefulsets. For the latter, the application pod can be restarted (that is, deleted) after step-4 (iSCSI logout), as the deployment/statefulset controller will reschedule the application with the volume on the same or a different node.
In environments where the kubelet runs in a container, perform the following steps as part of the recovery procedure for a read-only volume issue.
- Confirm that the OpenEBS controller does not present the target as a read-only device and that all replicas are in read/write mode.
- Un-mount the iSCSI volume from the node in which the application pod is scheduled.
- Perform the iSCSI logout, discovery, and login operations described above from inside the kubelet container.
- Re-mount the iSCSI device (may appear with a new SCSI device name) on the node.
- Verify that the application pod can start and write to the newly mounted device.
- Once the application is back in the Running state after recovery (following steps 1-9), if existing data is not visible (that is, the application comes up as a fresh instance), the application pod may be using the docker container filesystem instead of the actual PV. This has sometimes been observed when Kubernetes' reconciliation attempts bring the pod to the desired state while the iSCSI disk is not mounted.
This can be checked by running the mount command inside the application pod. The output should show the SCSI device /dev/sd* mounted on the specified mount point. If it does not, force the application pod to use the PV by restarting it (deployment/statefulset) or by running docker stop on the application container on the node (standalone pod).
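The mount check can be automated by reading /proc/mounts from the pod. This sketch assumes the /proc/mounts layout (device, mount point, filesystem type, options) and treats any device other than /dev/sd* as suspect; the function names and the APP_POD and MOUNT_PATH variables are hypothetical.

```shell
# Hypothetical helpers: find which device backs a mount point and flag non-/dev/sd* sources.
# stdin: /proc/mounts-style lines "<device> <mountpoint> <fstype> <options> <dump> <pass>".
mount_source() {
  # Keep the last matching entry, since a later mount hides earlier ones on the same path.
  awk -v m="$1" '$2 == m { dev = $1 } END { if (dev == "") exit 1; print dev }'
}

check_pv_mount() {
  dev=$(mount_source "$1") || { echo "not mounted: $1"; return 1; }
  case "$dev" in
    /dev/sd*) echo "ok: $1 is backed by $dev" ;;
    *) echo "suspect: $1 is backed by $dev, not a /dev/sd* device"; return 1 ;;
  esac
}

# Example usage (APP_POD and MOUNT_PATH are placeholders you set first):
#   kubectl exec "$APP_POD" -- cat /proc/mounts | check_pv_mount "$MOUNT_PATH"
```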
Stale data seen post application pod reschedule on other nodes
- Sometimes, stale application data is seen on the OpenEBS volume mounts after application pod reschedule. Typically, these applications are Kubernetes deployments, with the reschedule to other nodes occurring due to rolling updates.
- This occurs because the iSCSI volume mounts and sessions stay alive on the nodes even after the pod terminates. This behavior has been observed on some versions of GKE clusters (1.7.x).
- Ideally, the kubelet (iSCSI volume plugin) should bring down mounts and iscsi sessions once the application has been deleted on the node. If not, it can result in data being read off the node's page (mount) cache whenever the application is re-scheduled onto it, even though the volume is being updated while on a different node.
- Un-mount the device and logout from the existing iSCSI session on stale (non-owning) node.
- Re-login and re-mount the volume on the current/scheduled (owning) node.
- Ensure that the application pod uses the new mount by restarting its container using docker stop.
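On the stale node, the sessions still pointing at the volume can be listed from iscsiadm -m session output and turned into logout commands. This sketch assumes the common one-line format "tcp: [n] <portal>,<tpgt> <iqn> ..." with the IQN in the fourth field, and that the IQN contains the PV name. It prints commands without running them, and the device should be unmounted first. The function name, the PV_NAME variable, and the sample IQN in the comment are hypothetical.

```shell
# Hypothetical helper: print logout commands for sessions that still reference a PV.
# stdin: output of `iscsiadm -m session`, one session per line, e.g.
#   tcp: [1] 10.0.0.5:3260,1 iqn.2016-09.com.openebs.jiva:pvc-123 (non-flash)
stale_session_logouts() {
  pv="$1"
  [ -n "$pv" ] || { echo "usage: stale_session_logouts PV_NAME" >&2; return 2; }
  # index() is non-zero when the IQN (field 4) contains the PV name.
  awk -v pv="$pv" 'index($4, pv) > 0 { print "iscsiadm -m node -T " $4 " -u" }'
}

# Example usage on the stale (non-owning) node, after unmounting the device:
#   iscsiadm -m session | stale_session_logouts "$PV_NAME"
```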
Application and OpenEBS pods terminate/restart under heavy I/O load
This is caused by a lack of resources on the Kubernetes nodes: under heavy load the node becomes unresponsive and its pods are evicted. The pods transition from the Running state to Unknown and then to Terminating before restarting.
Troubleshooting the issue:
The above cause can be confirmed from the output of kubectl describe pod, which displays the termination reason as NodeControllerEviction. You can get more information from kube-controller-manager.log on the Kubernetes master.
You can resolve this issue by upgrading the Kubernetes cluster infrastructure resources (Memory, CPU).
Delete did not re-claim the disk size
Deleting an OpenEBS Persistent Volume and Persistent Volume Claim does not free the corresponding disk space on the node.
Currently, to reclaim the space you must manually delete (rm -rf) the files of the deleted volume under /var/openebs (or whichever path the storage pool is created on).
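To find what can be removed, a sketch like the following lists directories in the pool path whose names no longer match an existing PV. It assumes each Jiva volume lives in a directory named after its PV (pvc-<uid>), as described in the next section, and that the existing PV names are supplied on stdin. It deletes nothing. The function name and the pv-names.txt file are hypothetical.

```shell
# Hypothetical helper: list pool directories that do not belong to any existing PV.
# $1: pool path (defaults to /var/openebs); stdin: existing PV names, one per line, e.g. from
#   kubectl get pv --no-headers -o custom-columns=NAME:.metadata.name > pv-names.txt
orphaned_volume_dirs() {
  pool="${1:-/var/openebs}"
  known=$(cat)                                   # existing PV names
  for dir in "$pool"/pvc-*; do
    [ -d "$dir" ] || continue                    # skip when the glob matched nothing
    name=$(basename "$dir")
    # -x: whole-line match, so pvc-1 does not match pvc-10.
    if ! printf '%s\n' "$known" | grep -qx -- "$name"; then
      echo "$dir"
    fi
  done
}

# Example usage on a storage node (copy pv-names.txt there first):
#   orphaned_volume_dirs /var/openebs < pv-names.txt
```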
Recover data from Jiva replica
This document contains notes on how to recover data from backed-up replica files. OpenEBS Jiva volumes save data in /var/openebs/pvc-id/, and all the replicas contain identical data. Before performing a cluster rebuild, it suffices to back up the data from one replica's /var/openebs/pvc-id/ path.
The following procedure helps recover data in the scenario where replicas get scheduled on nodes where the data does not exist.
Step 1: Run a sample application that generates some data.
In the following example, executing the busybox.yaml file brings up the busybox pod that saves hostname and date into the mounted OpenEBS volume.
kubectl apply -f https://raw.githubusercontent.com/kmova/bootstrap/master/gke-openebs/jiva-recovery/busybox.yaml
Wait for the busybox application to run and exec into it to check if data is generated. You can add some additional content if required.
Note the following details:
- PV id: kubectl get pv (say source-pv-id).
- nodes on which the PV replica pods are running: kubectl get pods -o wide | grep source-pv-id (say replica-hostname).
Delete the busybox pod using the kubectl delete -f https://raw.githubusercontent.com/kmova/bootstrap/master/gke-openebs/jiva-recovery/busybox.yaml command. Note that the data folders remain on the nodes even though the pod and PVs are deleted.
Step 2: Setup a Recovery PVC
Deploy a Recovery PVC with a single replica using the following command.
kubectl apply -f https://raw.githubusercontent.com/kmova/bootstrap/master/gke-openebs/jiva-recovery/recovery-pvc.yaml
Note the following details:
- PV id: kubectl get pv (say recovery-pv-id).
- PV replica deployment name: kubectl get deploy | grep recovery-pv-id (say recovery-replica-deploy).
- Recovery PVC namespace, if you have changed it to something other than default (say recovery-replica-ns).
Step 3: Patch the Recovery PV Replica to stick to the replica-hostname
Replace the replica hostname in patch-replica-dep-nodename.json with the replica-hostname obtained in Step 1. This is the node where the source (backed-up) replica data is available. If the backup data is on a remote machine, you can set the hostname to the node where the replica is currently running.
wget https://raw.githubusercontent.com/kmova/bootstrap/master/gke-openebs/jiva-recovery/patch-replica-dep-nodename.json
kubectl patch deploy -n <recovery-replica-ns> <recovery-replica-deploy> -p "$(cat patch-replica-dep-nodename.json)"
After the patch is applied, the replica pod restarts on the specified hostname. Because the replica deployment was patched, an orphaned replica set remains, which you can see with kubectl get rs. You can go ahead and delete it.
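If you prefer not to download the patch file, a comparable patch can be built inline. This is a sketch that assumes the downloaded file pins the pod with a nodeSelector on the standard kubernetes.io/hostname label; its exact contents may differ, so compare before substituting. If the deployment already has node affinity rules, both must be satisfiable on the chosen node. The function name and the shell variables in the usage comment are hypothetical.

```shell
# Hypothetical helper: emit a strategic-merge patch that pins a deployment's pods to one node.
# kubernetes.io/hostname is the well-known per-node label (usually equal to the node name).
node_pin_patch() {
  printf '{"spec":{"template":{"spec":{"nodeSelector":{"kubernetes.io/hostname":"%s"}}}}}\n' "$1"
}

# Example usage (placeholders from the steps above, supplied via shell variables):
#   kubectl patch deploy -n "$RECOVERY_REPLICA_NS" "$RECOVERY_REPLICA_DEPLOY" \
#     -p "$(node_pin_patch "$REPLICA_HOSTNAME")"
```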
Step 4: Copy the backup data into Recovery Replica
Execute the following commands.
a. ssh into the node (replica-hostname).
b. cd /var/openebs/recovery-pv-id/ (/var/openebs if you are using the default pool).
c. sudo rm -rf *
d. Copy the contents of the earlier volume (/var/openebs/source-pv-id or from the remote server) into /var/openebs/recovery-pv-id/.
e. You will see peer.details, revision.counter, volume.meta and a number of .img and .meta files.
f. Edit peer.details to set ReplicaCount=1.
g. exit
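Steps c to f can be sketched as a single function, run with sudo on replica-hostname. It assumes peer.details stores the count as a JSON-style "ReplicaCount":<n> entry (check the file first, since the format may differ between OpenEBS releases) and uses sed with a temporary file rather than sed -i for portability. The function name is hypothetical.

```shell
# Hypothetical helper for Step 4 c-f: replace the recovery replica's files with the backup
# and set ReplicaCount to 1.
prepare_recovery_replica() {
  src="$1"    # backed-up replica directory, e.g. /var/openebs/<source-pv-id>
  dst="$2"    # recovery replica directory, e.g. /var/openebs/<recovery-pv-id>
  [ -d "$src" ] && [ -d "$dst" ] || { echo "missing source or destination directory" >&2; return 1; }

  rm -rf "${dst:?}"/*          # step c: clear the recovery replica's own files
  cp -a "$src"/. "$dst"/       # step d: copy the backup, preserving attributes

  peer="$dst/peer.details"
  [ -f "$peer" ] || { echo "no peer.details found in $src" >&2; return 1; }

  # Step f: "ReplicaCount": <n>  ->  "ReplicaCount": 1
  # (\1 re-inserts the key and colon; the following 1 is the literal new value).
  sed 's/\("ReplicaCount"[[:space:]]*:[[:space:]]*\)[0-9][0-9]*/\11/' "$peer" > "$peer.tmp" &&
    mv "$peer.tmp" "$peer"
}

# Example usage (SOURCE_PV_ID and RECOVERY_PV_ID are placeholders you set first):
#   prepare_recovery_replica "/var/openebs/$SOURCE_PV_ID" "/var/openebs/$RECOVERY_PV_ID"
```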
Step 5: Restart the Volume Pods
Delete the replica pod using kubectl delete pod <replica-pod>. Note that it gets rescheduled on the same node (replica-hostname).
Delete the controller pod using kubectl delete pod <controller-pod>. Wait for these pods to get back to the Running state.
Step 6: Use the recovery volume to retrieve the data.
You can either launch the source application or a recovery application that uses this recovery volume. In this example, the busybox-recovery pod displays file content that is the same as the content generated by the source application.
kubectl apply -f https://raw.githubusercontent.com/kmova/bootstrap/master/gke-openebs/jiva-recovery/busybox-recover.yaml
You can also exec into this application to check the content or retrieve the files.