feat: add gemma3 ollama model support #3120
base: main
Conversation
Adds support for locally-running Gemma3 models exposed via Ollama. This enables developers to run fully-local agent workflows using the two larger Gemma3 models. Functionality is achieved by extending LiteLlm models and using a custom prompt template with request/response pre/post processing. As part of this work, the existing Gemma 3 support (via the Gemini API) is refactored to clarify functionality and support broader reuse across both modes of interacting with the LLM. A separate `hello_world_gemma3_ollama` example is provided to highlight local usage.

NOTE: Adds an optional dependency on `instructor` for finding and parsing JSON in Gemma3 response blocks.

Testing

Test Plan

- add and run integration and unit tests
- manual run of both the `hello_world_gemma` and `hello_world_gemma3_ollama` agents
- manual run of `multi_tool_agent` from the quickstart using the new `Gemma3Ollama` LLM

Automated Tests

| Test Command | Results |
|--------------|---------|
| `pytest ./tests/unittests` | 2779 passed, 2387 warnings in 63.03s |
| `pytest ./tests/unittests/models/test_gemma_llm.py` | 15 passed in 4.06s |
| `pytest ./tests/integration/models/test_gemma_llm.py` | 1 passed in 33.22s |

Manual Tests

Log of running `multi_tool_agent` with a locally-built wheel:

```
[user]: what is the weather in new york?
15:12:24 - LiteLLM:INFO: utils.py:3373 - LiteLLM completion() model= gemma3:12b; provider = ollama
15:12:28 - LiteLLM:INFO: utils.py:3373 - LiteLLM completion() model= gemma3:12b; provider = ollama
[weather_time_agent]: The weather in New York is sunny with a temperature of 25 degrees Celsius (77 degrees Fahrenheit).
[user]: what is the time in new york?
15:12:43 - LiteLLM:INFO: utils.py:3373 - LiteLLM completion() model= gemma3:12b; provider = ollama
15:12:48 - LiteLLM:INFO: utils.py:3373 - LiteLLM completion() model= gemma3:12b; provider = ollama
[weather_time_agent]: The current time in New York is 2025-10-08 18:12:48 EDT-0400.
```

`DEBUG` log snippet of an agent run:

```
2025-10-08 15:32:33,322 - DEBUG - lite_llm.py:810 - LLM Request:
-----------------------------------------------------------
System Instruction:
You roll dice and answer questions about the outcome of the dice rolls.
...
You are an agent. Your internal name is "data_processing_agent".
...
-----------------------------------------------------------
Contents:
{"parts":[{"text":"Hi, introduce yourself."}],"role":"user"}
{"parts":[{"text":"I am data_processing_agent, a hello world agent that can roll a dice of 8 sides and check prime numbers."}],"role":"model"}
{"parts":[{"text":"Roll a die with 100 sides and check if it is prime"}],"role":"user"}
{"parts":[{"text":"{\"args\":{\"sides\":100},\"name\":\"roll_die\"}"}],"role":"model"}
{"parts":[{"text":"Invoking tool `roll_die` produced: `{\"result\": 26}`."}],"role":"user"}
{"parts":[{"text":"{\"args\":{\"nums\":[26]},\"name\":\"check_prime\"}"}],"role":"model"}
{"parts":[{"text":"Invoking tool `check_prime` produced: `{\"result\": \"No prime numbers found.\"}`."}],"role":"user"}
{"parts":[{"text":"Okay, the roll was 26, and it is not a prime number."}],"role":"model"}
{"parts":[{"text":"Roll it again."}],"role":"user"}
{"parts":[{"text":"{\"args\":{\"sides\":100},\"name\":\"roll_die\"}"}],"role":"model"}
{"parts":[{"text":"Invoking tool `roll_die` produced: `{\"result\": 69}`."}],"role":"user"}
{"parts":[{"text":"{\"args\":{\"nums\":[69]},\"name\":\"check_prime\"}"}],"role":"model"}
{"parts":[{"text":"Invoking tool `check_prime` produced: `{\"result\": \"No prime numbers found.\"}`."}],"role":"user"}
{"parts":[{"text":"The roll was 69, and it is not a prime number."}],"role":"model"}
{"parts":[{"text":"What numbers did I get?"}],"role":"user"}
-----------------------------------------------------------
Functions:
-----------------------------------------------------------
```
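For orientation, here is a minimal sketch of how the new model might be wired into an agent, based on this PR's description. The `Gemma3Ollama` class name comes from the PR, but the import path and constructor arguments are assumptions; the `hello_world_gemma3_ollama` example in this PR is the authoritative reference.

```python
# Hypothetical usage sketch -- module path and constructor signature are
# assumptions, not confirmed by this PR's diff.
from google.adk.agents import Agent
from google.adk.models.gemma_llm import Gemma3Ollama  # assumed import path

root_agent = Agent(
    name="weather_time_agent",
    # Gemma3Ollama extends LiteLlm, so it presumably accepts an Ollama model
    # tag such as the one shown in the manual test logs above.
    model=Gemma3Ollama(model="gemma3:12b"),
    instruction="Answer questions about the weather and time in a city.",
)
```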
Summary of Changes

Hello @douglas-reid, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed: this pull request introduces support for Gemma3 models running locally via Ollama, refactors the existing Gemma3 Gemini API integration, and adds a new example.
Response from ADK Triaging Agent

Hello @douglas-reid, thank you for your contribution! To help us track this new feature, could you please create a GitHub issue and link it to this PR? According to our contribution guidelines, all new features should have an associated issue. Thanks for your help!
Code Review
This is a great feature addition that enables local agent workflows with Gemma3 models via Ollama. The refactoring of the existing Gemma support to share logic between the Gemini API and Ollama integrations is well-executed and improves code maintainability.
I've identified a few areas for improvement, including a critical issue in the tests, better handling of an optional dependency, and some minor enhancements for clarity in the new example code. Please see my detailed comments below.
```python
async def test_gemma_gemini_preprocess_request_with_tools(
    llm_request_with_tools,
):
```
The `@pytest.mark.asyncio` decorator is missing from this async test function. Without it, pytest will not run the test as a coroutine: the coroutine is never awaited, which typically produces a `RuntimeWarning`, and the test's assertions are never executed.
Suggested change:

```diff
+@pytest.mark.asyncio
 async def test_gemma_gemini_preprocess_request_with_tools(
     llm_request_with_tools,
 ):
```
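As an alternative to decorating each test individually, a module-level `pytestmark` applies the marker to every test in the file. This is a sketch, assuming the suite uses pytest-asyncio:

```python
import pytest

# Applies @pytest.mark.asyncio to all tests in this module, so individual
# async test functions no longer need the decorator.
pytestmark = pytest.mark.asyncio
```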
```python
try:
  import instructor

  json_candidate = instructor.utils.extract_json_from_codeblock(response_text)

  if not json_candidate:
    return

  function_call_parsed = Gemma3FunctionCallModel.model_validate_json(
      json_candidate
  )
  function_call = types.FunctionCall(
      name=function_call_parsed.name,
      args=function_call_parsed.parameters,
  )
  function_call_part = Part(function_call=function_call)
  llm_response.content.parts = [function_call_part]
except (json.JSONDecodeError, ValidationError) as e:
  logger.debug(
      f"Error attempting to parse JSON into function call. Leaving as text"
      f" response. %s",
      e,
  )
except Exception as e:
  logger.warning("Error processing Gemma function call response: %s", e)
```
The current implementation uses a broad `except Exception`, which can catch an `ImportError` when the optional `instructor` package isn't installed, producing a generic and potentially confusing log message. To provide a better user experience, handle `ImportError` explicitly with a clear message that explains the optional dependency is missing and how to install it.
```python
try:
  import instructor
except ImportError:
  logger.warning(
      "The 'instructor' package is required for Gemma3 function calling but"
      " is not installed. Text response will be returned. To enable function"
      " calling, run: pip install \"google-adk[extensions]\""
  )
  return

try:
  json_candidate = instructor.utils.extract_json_from_codeblock(response_text)
  if not json_candidate:
    return
  function_call_parsed = Gemma3FunctionCallModel.model_validate_json(
      json_candidate
  )
  function_call = types.FunctionCall(
      name=function_call_parsed.name,
      args=function_call_parsed.parameters,
  )
  function_call_part = Part(function_call=function_call)
  llm_response.content.parts = [function_call_part]
except (json.JSONDecodeError, ValidationError) as e:
  logger.debug(
      f"Error attempting to parse JSON into function call. Leaving as text"
      f" response. %s",
      e,
  )
except Exception as e:
  logger.warning("Error processing Gemma function call response: %s", e)
```
```
You roll dice and answer questions about the outcome of the dice rolls.
You can roll dice of different sizes.
You can use multiple tools in parallel by calling functions in parallel(in one request and in one round).
It is ok to discuss previous dice roles, and comment on the dice rolls.
```
There are a couple of typos in the instruction string that could affect model behavior or confuse readers:

- `parallel(in` should be `parallel (in`
- `dice roles` should be `dice rolls`
Suggested change:

```diff
 You roll dice and answer questions about the outcome of the dice rolls.
 You can roll dice of different sizes.
-You can use multiple tools in parallel by calling functions in parallel(in one request and in one round).
-It is ok to discuss previous dice roles, and comment on the dice rolls.
+You can use multiple tools in parallel by calling functions in parallel (in one request and in one round).
+It is ok to discuss previous dice rolls, and comment on the dice rolls.
```
```python
session_11 = await session_service.create_session(
    app_name=app_name, user_id=user_id_1
)
```
For better readability and consistency with `user_id_1`, consider renaming `session_11` to `session_1`. This change should also be applied to its usages on lines 64, 66, 68, and 69.
Suggested change:

```diff
-session_11 = await session_service.create_session(
+session_1 = await session_service.create_session(
     app_name=app_name, user_id=user_id_1
 )
```