Question

What exactly does the Max Tokens setting on an Agent do?

The Max Tokens setting says: “The maximum number of tokens the model will intake to send to the LLM, and then generate to return a response.”

Is this the user's query only, or the user's query plus the context that will be sent as input tokens?

It doesn’t appear to actually work. I tried a 2,500-token query with a 2,048 Max Token limit. The Agent accepted it, and the message info says:

Prompt tokens: 20449

Response tokens: 1580

Total tokens: 22029

I assume I will get billed for 22,029 tokens. What is the point of a “Max Tokens” setting if it doesn’t actually work?

No matter what I set this to, and no matter how short the user query is, input tokens are usually around 8,000-10,000 minimum. I understand that this is because a large number of input tokens have to be sent to the LLM for RAG to work and to generate coherent answers. What I don’t understand is what “Max Tokens” actually limits, if it limits anything at all. Is it possible it’s not working? I would expect it to limit either the size of the user query or the total input tokens. If it were limiting only the size of the user query, I would expect an error message like “Please send a new query <= XXXX tokens (or characters).”



Bobby Iliev
Site Moderator
April 28, 2025

Hi there,

Good question! From what I understand, the Max Tokens setting controls how much the model is allowed to generate in the response, not how large the input (user query + context) can be.

As far as I know, there’s no built-in limit on input size enforced by the Max Tokens setting; it mainly caps the response length.
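As a rough illustration (a sketch assuming a generic chat-completions-style API; the exact field names, the `build_request` helper, and the model name are placeholders, not DigitalOcean’s actual API), `max_tokens` typically sits on the request as a cap on the *generated* output, while the input side (system prompt + RAG context + query) is only bounded by the model’s context window:

```python
# Hypothetical request builder for a generic chat-completions-style API.
# "max_tokens" caps only the response; nothing here limits input size.
def build_request(system_prompt: str, rag_context: str, user_query: str,
                  max_tokens: int = 2048) -> dict:
    # Input side: system prompt + retrieved RAG context + user query.
    # This is why prompt tokens can be ~8,000-10,000 even for a short query.
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f"{rag_context}\n\n{user_query}"},
    ]
    return {
        "model": "example-model",   # placeholder model name
        "messages": messages,
        "max_tokens": max_tokens,   # caps generated *output* tokens only
    }

req = build_request("You are a helpful agent.", "<retrieved docs>", "What is X?")
```

So a 2,048 Max Tokens setting would explain why the response above stayed at 1,580 tokens even though the prompt was over 20,000.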

If you want strict limits on total input tokens, that might need to be handled separately at the app level. Could also be worth flagging this to DigitalOcean Support just to double-check if anything’s changed:

https://do.co/support
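If you do want to reject oversized queries before they reach the agent, a minimal app-level guard could look like this (a sketch; the 4-characters-per-token figure is just a rough heuristic for English text, and a real tokenizer would give exact counts):

```python
# App-level input guard (sketch). Token count is approximated as
# len(text) // 4, a common rule of thumb; use the model's actual
# tokenizer for precise counts.
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def check_query(query: str, max_input_tokens: int = 2048) -> str:
    est = estimate_tokens(query)
    if est > max_input_tokens:
        # Mirrors the kind of error message the poster expected.
        raise ValueError(
            f"Please send a new query <= {max_input_tokens} tokens "
            f"(estimated {est})."
        )
    return query
```

This only bounds the user query, of course; the RAG context the platform adds on top would still count toward your billed input tokens.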

- Bobby
