Hello!
We have a DOKS cluster with 2 nodes. One node runs many CronJobs. The pods on that node are now stuck in the Pending state, and there is no more disk space in /var/lib/containerd.
We had a similar problem in January (in 2 clusters at the same time). They recreated the node and installed doks-debug. Now we have the same problem again and again. Support says: increase cluster resources or enable autoscaling, adjust CronJob frequency, etc…
Are old containers and logs maybe not being cleaned up automatically? Is garbage collection not enabled? This is a PaaS system.
This cluster is running in ams3 on 1.30.2-do.0. (We can update it, but will that help?)
What can we do?
Thanks for helping!
Hey there,
Yep, as you mentioned, since this is DigitalOcean Managed Kubernetes Service, you don’t have direct access to node-level garbage collection, so the best approach is to optimize what you can control and work with DigitalOcean support for the rest.
If your CronJobs are filling up storage, the first thing to tweak is their history retention. Lowering `successfulJobsHistoryLimit` and `failedJobsHistoryLimit` keeps fewer finished Jobs (and their Pods) around, and setting `concurrencyPolicy: Forbid` prevents overlapping runs, which can reduce unnecessary storage usage.
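Here is a minimal sketch of what that could look like (the name, schedule, and image are placeholders; `ttlSecondsAfterFinished` is an optional extra that has the TTL controller delete finished Jobs automatically):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: example-cron              # placeholder name
spec:
  schedule: "*/15 * * * *"        # placeholder schedule
  concurrencyPolicy: Forbid       # skip a run if the previous one is still going
  successfulJobsHistoryLimit: 1   # keep only the most recent successful Job
  failedJobsHistoryLimit: 1       # keep only the most recent failed Job
  jobTemplate:
    spec:
      ttlSecondsAfterFinished: 3600  # optional: auto-delete finished Jobs after 1 hour
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: task
              image: busybox:1.36      # placeholder image
              command: ["sh", "-c", "echo hello"]
```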
Another thing to check is completed and failed pods; you might want to clean those up manually. If I am not mistaken, the `kubectl delete pods --field-selector=status.phase=Succeeded` and `kubectl delete pods --field-selector=status.phase=Failed` commands should help with that.
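Note that these delete the Pod objects one namespace at a time, after which the kubelet can garbage-collect the underlying containers. As a sketch (`<namespace>` is a placeholder):

```bash
# Remove finished pods in a specific namespace (replace <namespace>)
kubectl delete pods --field-selector=status.phase=Succeeded -n <namespace>
kubectl delete pods --field-selector=status.phase=Failed -n <namespace>
```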
As you mentioned, since you can’t manage node storage directly, scaling up might be necessary. As the support team suggested, increasing the node size, enabling autoscaling, or using multiple node pools to separate CronJobs from other workloads can help distribute storage usage more efficiently.
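If you go the separate-pool route, here is a minimal sketch of pinning the CronJob Pods to a dedicated pool (assuming a hypothetical pool named `cron-pool`; DOKS labels each node with its pool name under `doks.digitalocean.com/node-pool`):

```yaml
# Inside the CronJob's jobTemplate:
jobTemplate:
  spec:
    template:
      spec:
        nodeSelector:
          doks.digitalocean.com/node-pool: cron-pool  # hypothetical pool name
```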
It’s possible that DOKS’s default garbage collection settings aren’t aggressive enough for your workload. If that’s the case, reducing job history retention, spreading CronJobs across nodes, and using smaller container images are good ways to work around it, and you can continue working with DigitalOcean support to investigate whether that is actually the case.
Upgrading to a newer Kubernetes version is also worth considering. I believe that Kubernetes 1.30.x will likely be deprecated soon, and newer versions might handle this better.
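If you do decide to upgrade, you can do it from the control panel or, as a sketch with `doctl` (the cluster name and version slug here are placeholders; `doctl kubernetes options versions` lists what is currently offered):

```bash
# See which Kubernetes versions DOKS currently offers
doctl kubernetes options versions

# Upgrade the cluster to a newer release
doctl kubernetes cluster upgrade <cluster-name> --version <version-slug>
```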
If this keeps happening even after these changes, it’s best to keep the communication with the DigitalOcean support team open and see if they can provide more insights or help you with this issue.
- Bobby.