[LiteLLM] Gemma models via Ollama enter infinite tool-calling loop due to wrong tool message role #5650

@jfrometa-tlsi

Description

When using Gemma models (2B/4B) via Ollama through the LiteLLM adapter, agents with tools enter an infinite tool-calling loop and never produce a final response.

Root cause: _content_to_message_param serializes tool result messages with role="tool" (the OpenAI-compatible default), but Gemma's chat template expects role="tool_responses" (per the documentation: https://ai.google.dev/gemma/docs/core/prompt-formatting-gemma4).
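For illustration, here is a minimal sketch of the mismatch. The function names below are hypothetical stand-ins, not the actual LiteLLM internals; they only mimic the two serializations being compared.

```python
def serialize_tool_result_openai(tool_call_id: str, result: str) -> dict:
    """Mimics the OpenAI-compatible default: tool results use role="tool"."""
    return {"role": "tool", "tool_call_id": tool_call_id, "content": result}


def serialize_tool_result_gemma(result: str) -> dict:
    """What Gemma's chat template expects for a tool result turn."""
    return {"role": "tool_responses", "content": result}


emitted = serialize_tool_result_openai("call_1", '{"temp_c": 21}')
expected = serialize_tool_result_gemma('{"temp_c": 21}')
print(emitted["role"], "vs", expected["role"])  # tool vs tool_responses
```

Because the template has no rendering rule for role="tool", the result text is treated as an ordinary new turn, so the model answers it by issuing another tool call.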

This mismatch causes the model to misinterpret the tool result as a new turn instead of a response to its own tool call.

This is not a hardware or quantization issue — the same behavior occurs on high-end GPUs.
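A possible direction for a fix (sketched here as a standalone helper, since the actual hook point in the adapter may differ) is to remap the role on tool result messages before the request is sent to Ollama:

```python
def remap_tool_roles(messages: list[dict]) -> list[dict]:
    """Rewrite OpenAI-style role="tool" messages to Gemma's expected
    role="tool_responses". Returns new dicts; the input list is not mutated."""
    remapped = []
    for msg in messages:
        if msg.get("role") == "tool":
            msg = {**msg, "role": "tool_responses"}
        remapped.append(msg)
    return remapped
```

Such a remap would need to be gated on the target model (Gemma only), since other Ollama models accept the OpenAI-compatible role="tool" as-is.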

Metadata

Labels

models: [Component] Issues related to model support
request clarification: [Status] The maintainer needs clarification or more information from the author
tools: [Component] This issue is related to tools
