Question

What exactly does the Max Tokens setting on an Agent do?

The Max Tokens setting says: “The maximum number of tokens the model will intake to send to the LLM, and then generate to return a response.”

Is this the user's query only, or the user's query plus the context that will be sent as input tokens?

It doesn’t appear to actually work. I tried a 2,500-token query with a 2,048 Max Token limit. The Agent accepted it, and the message info says:

Prompt tokens: 20449

Response tokens: 1580

Total tokens: 22029

I assume I will get billed for 22,029 tokens. What is the point of a “Max Tokens” setting if it doesn’t actually work?

No matter what I set this to, and no matter how short the user query is, input tokens are usually around 8,000-10,000 minimum. I understand that this is because a large number of input tokens have to be sent to the LLM for RAG to work and to generate coherent answers. What I don’t understand is what “Max Tokens” actually limits, if it limits anything at all. Is it possible it’s not working? I would expect it to limit either the size of the user query or the total input tokens. If it were limiting only the size of the user query, I would expect an error message like “Please send a new query <= XXXX tokens (or characters).”



Bobby Iliev
Site Moderator
April 28, 2025

Hi there,

Good question! From what I understand, the Max Tokens setting controls how much the model is allowed to generate in the response, not how large the input (user query + context) can be.

As far as I know, there’s no built-in limit on input size enforced by the Max Tokens setting; it mainly caps the response length.
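As a rough illustration (a sketch assuming a generic chat-completions-style API; the exact field names, the `build_request` helper, and the model name are placeholders, not DigitalOcean’s actual API), `max_tokens` typically sits on the request as a cap on the *generated* output, while the input side (system prompt + RAG context + query) is only bounded by the model’s context window:

```python
# Hypothetical request builder for a generic chat-completions-style API.
# "max_tokens" caps only the response; nothing here limits input size.
def build_request(system_prompt: str, rag_context: str, user_query: str,
                  max_tokens: int = 2048) -> dict:
    # Input side: system prompt + retrieved RAG context + user query.
    # This is why prompt tokens can be ~8,000-10,000 even for a short query.
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f"{rag_context}\n\n{user_query}"},
    ]
    return {
        "model": "example-model",   # placeholder model name
        "messages": messages,
        "max_tokens": max_tokens,   # caps generated *output* tokens only
    }

req = build_request("You are a helpful agent.", "<retrieved docs>", "What is X?")
```

So a 2,048 Max Tokens setting would explain why the response above stayed at 1,580 tokens even though the prompt was over 20,000.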

If you want strict limits on total input tokens, that might need to be handled separately at the app level. Could also be worth flagging this to DigitalOcean Support just to double-check if anything’s changed:

https://do.co/support
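If you do want to reject oversized queries before they reach the agent, a minimal app-level guard could look like this (a sketch; the 4-characters-per-token figure is just a rough heuristic for English text, and a real tokenizer would give exact counts):

```python
# App-level input guard (sketch). Token count is approximated as
# len(text) // 4, a common rule of thumb; use the model's actual
# tokenizer for precise counts.
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def check_query(query: str, max_input_tokens: int = 2048) -> str:
    est = estimate_tokens(query)
    if est > max_input_tokens:
        # Mirrors the kind of error message the poster expected.
        raise ValueError(
            f"Please send a new query <= {max_input_tokens} tokens "
            f"(estimated {est})."
        )
    return query
```

This only bounds the user query, of course; the RAG context the platform adds on top would still count toward your billed input tokens.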

- Bobby
