What Do You See in this Image?


Gemma is a lightweight, family of models from Google built on Gemini technology. The Gemma 3 models are multimodalāprocessing text and imagesāand feature a 128K context window with support for over 140 languages. Available in 1B, 4B, 12B, and 27B parameter sizes, they excel in tasks like question answering, summarization, and reasoning, while their compact design allows deployment on resource-limited devices.
ollama.com
Testing Gemma3 27b
The simplest way to test Gemma 3 locally is with either Ollama or LMStudio. I like to use Open WebUI with Ollama but for this example I used LMStudio. To test the model I used the above image and a simple initial prompt then a follow up prompt.
Gemma 3 Vision Prompts
Question One
Explain everything you see how many people are in the photo? How many are Male and how many are Female? What is the approximate time of day? What season is it? Explain your reasoning?

Follow Up:
Given the information can you give a more precise location?

This is just a quick initial test but so far I am very impressed with the vision capabilities of the Gemma 3 model.
Leave a Reply