Question

Kubernetes (DOKS) node ran out of disk space in /var/lib/containerd

Hello!

We have a DOKS cluster with 2 nodes. One node runs many CronJobs. Now the pods on that node are stuck in the Pending state, because there is no more disk space left in /var/lib/containerd.

We had a similar problem in January (in 2 clusters at the same time). They recreated the node and installed doks-debug. Now we have the same problem again and again. Support says: increase cluster resources or enable autoscaling, adjust CronJob frequency, etc.

Could it be that old containers and logs are not being cleaned up automatically? Is garbage collection not enabled? This is a managed (PaaS) system.

The cluster is running in ams3 on 1.30.2-do.0. (We can upgrade it, but will that help?)

What can we do?

Thanks for helping!



Bobby Iliev
Site Moderator
February 9, 2025

Hey there,

Yep, as you mentioned, since this is DigitalOcean Managed Kubernetes Service, you don’t have direct access to node-level garbage collection, so the best approach is to optimize what you can control and work with DigitalOcean support for the rest.

If your CronJobs are filling up storage, what you could do is indeed tweak their history retention. For example, you can lower successfulJobsHistoryLimit and failedJobsHistoryLimit to keep fewer old jobs. Setting concurrencyPolicy: Forbid will also prevent overlapping runs, which can reduce unnecessary storage usage.
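For reference, here is a rough sketch of what those settings could look like on one of your CronJob manifests (the name, schedule, and image are just placeholders):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: example-cronjob            # placeholder name
spec:
  schedule: "*/15 * * * *"         # keep your existing schedule
  concurrencyPolicy: Forbid        # don't start a new run while the previous one is still going
  successfulJobsHistoryLimit: 1    # keep only the most recent successful Job and its pod
  failedJobsHistoryLimit: 1        # keep only the most recent failed Job for debugging
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: job
              image: your-image:tag   # placeholder image
```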

Another thing to check is completed and failed pods. You might want to clean them up manually. If I am not mistaken, the `kubectl delete pods --field-selector=status.phase=Succeeded` and `kubectl delete pods --field-selector=status.phase=Failed` commands should help with that.
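A minimal sketch of that clean-up, assuming the jobs run in the default namespace (adjust `-n` to wherever your CronJobs actually live):

```bash
# Delete pods that finished successfully
kubectl delete pods -n default --field-selector=status.phase=Succeeded

# Delete pods that ended up in a failed state
kubectl delete pods -n default --field-selector=status.phase=Failed
```

Deleting the finished pods should also let the kubelet clean up the matching dead containers and their logs on the node.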

As you mentioned, since you can’t manually manage node storage directly, scaling up might be necessary. As the support team mentioned, increasing the node size, enabling autoscaling, or using multiple node pools to separate CronJobs from other workloads can help distribute storage usage more efficiently.
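If you go the separate-node-pool route, a sketch of pinning the CronJobs to a dedicated pool could look like the manifest below. It assumes a pool named `cron-pool`; DOKS normally labels nodes with `doks.digitalocean.com/node-pool`, but double-check the exact label on your nodes with `kubectl get nodes --show-labels`:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: example-cronjob
spec:
  schedule: "*/15 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          nodeSelector:
            doks.digitalocean.com/node-pool: cron-pool   # assumed pool name
          restartPolicy: OnFailure
          containers:
            - name: job
              image: your-image:tag
```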

It’s possible that DOKS’s default garbage collection settings aren’t aggressive enough for your workload. If that’s the case, reducing job history retention, spreading CronJobs across nodes, and using smaller container images are good ways to work around it. It’s also worth continuing to work with DigitalOcean support to investigate whether that is actually what’s happening.
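If you want to see whether the node is actually hitting the kubelet’s disk thresholds, checking the DiskPressure condition is a quick first step (replace `<node-name>` with the affected node):

```bash
# True means the node has crossed the kubelet's disk eviction threshold
kubectl get node <node-name> -o jsonpath='{.status.conditions[?(@.type=="DiskPressure")].status}{"\n"}'

# The Conditions and Events sections here also show recent evictions
# and image garbage collection activity on the node
kubectl describe node <node-name>
```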

Upgrading to a newer Kubernetes version is also worth considering. I believe that Kubernetes 1.30.x will likely be deprecated soon, and newer versions might handle this better.

If this keeps happening even after these changes, it’s best to keep the communication with the DigitalOcean support team open and see if they can provide more insights or help you with this issue.

- Bobby.
