Gemma 3: Open Multimodal AI With Increased Context Window
Introduction
What is Gemma 3?
Model Variants
The models come in four sizes: 1 billion (1B), 4 billion (4B), 12 billion (12B), and 27 billion (27B) parameters. Together they span a range of capabilities and are designed for varying hardware limitations and performance requirements. Gemma 3 models are available in both base (pre-trained) and instruction-tuned versions, making them suitable for a broad range of use cases, from fine-tuning for highly specialized tasks to serving as general-purpose conversational agents that follow instructions well.
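As a rough way to reason about which variant fits a given machine, the published parameter counts can be turned into approximate weight-memory estimates. The sketch below is illustrative only: the byte-per-parameter figures are generic precision assumptions (fp16 = 2 bytes, int4 = 0.5 bytes), not numbers from the Gemma 3 report, and real deployments also need memory for the KV cache and activations.

```python
# Illustrative sketch: estimating raw weight memory for each Gemma 3
# variant at a given precision. Parameter counts are the published sizes;
# bytes-per-parameter values are generic assumptions, not official figures.

GEMMA3_VARIANTS = {"1B": 1e9, "4B": 4e9, "12B": 12e9, "27B": 27e9}

BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_memory_gb(variant: str, precision: str = "fp16") -> float:
    """Approximate GiB needed for the model weights alone
    (excludes KV cache, activations, and framework overhead)."""
    params = GEMMA3_VARIANTS[variant]
    return params * BYTES_PER_PARAM[precision] / 2**30

def largest_variant_that_fits(budget_gb: float, precision: str = "fp16") -> str:
    """Pick the biggest variant whose weights fit within budget_gb GiB."""
    fitting = [v for v in GEMMA3_VARIANTS
               if weight_memory_gb(v, precision) <= budget_gb]
    return max(fitting, key=lambda v: GEMMA3_VARIANTS[v]) if fitting else ""
```

By this estimate, the 27B variant needs roughly 50 GiB of weights at fp16 but closer to 13 GiB at int4, which is why quantization matters so much for resource-constrained deployment.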
Gemma 3 offers a powerful array of features that make it stand out and enhance its capabilities, and this power also paves the way for a host of exciting future use cases.
To process long sequences efficiently, the architecture interleaves local sliding-window attention layers, which keep nearby token ranges in focus, with global attention layers that cover the whole context.
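The interleaving described above can be sketched in a few lines. The ratio of five local layers per global layer and the 1024-token sliding window below follow the Gemma 3 technical report, but treat both as report-level details that may vary by variant; the mask construction itself is a generic illustration.

```python
import numpy as np

# Sketch of interleaved local/global attention, assuming 5 local
# (sliding-window) layers per 1 global layer and a 1024-token window,
# as described in the Gemma 3 report.

LOCAL_TO_GLOBAL = 5   # five local layers for every global layer
WINDOW = 1024         # sliding-window span for local layers

def layer_kind(layer_idx: int) -> str:
    """Every (LOCAL_TO_GLOBAL + 1)-th layer attends globally."""
    return "global" if (layer_idx + 1) % (LOCAL_TO_GLOBAL + 1) == 0 else "local"

def attention_mask(seq_len: int, kind: str, window: int = WINDOW) -> np.ndarray:
    """Boolean [seq_len, seq_len] mask: True where query i may attend key j.
    Both kinds are causal; 'local' additionally restricts attention to the
    most recent `window` tokens."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    causal = j <= i
    if kind == "global":
        return causal
    return causal & (i - j < window)
```

The payoff is that only one layer in six pays the quadratic cost of attending over the full context; the rest attend over a fixed-size window, which keeps the KV cache and compute for long sequences manageable.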
The vision encoder produces image embeddings, which the language model maps into soft tokens. Different attention mechanisms are then applied to each modality: text tokens use one-way causal attention, while image tokens get the benefit of full bidirectional attention, so all parts of an image can be analyzed at once.
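This mixed attention rule can be made concrete with a small mask-building sketch. The token layout below (text tokens interleaved with image-token spans) is a toy assumption for illustration, not the actual Gemma 3 tokenization.

```python
import numpy as np

# Minimal sketch of the mixed attention rule described above: text tokens
# use one-way causal attention, while tokens belonging to the same image
# attend to each other bidirectionally. Token-type layout is a toy
# assumption, not the real Gemma 3 tokenization.

def mixed_attention_mask(token_types: list) -> np.ndarray:
    """token_types[k] is None for a text token, or an image id (int) for an
    image token. Returns a boolean mask where True means query i may
    attend to key j."""
    n = len(token_types)
    i = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    causal = j <= i                                   # default: causal
    types = np.array([-1 if t is None else t for t in token_types])
    same_image = (types[:, None] == types[None, :]) & (types[:, None] >= 0)
    return causal | same_image                        # image spans: bidirectional
```

For a sequence like `[text, img, img, img, text]`, the first image token can attend forward to later tokens of the same image, while text tokens still only see what came before them.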
Performance Evaluation
One of the most important ways the abilities of Gemma 3 are measured is its showing in human preference tests, for example on the LMSys Chatbot Arena, as illustrated in the table below. In this arena, language models compete against each other in blind side-by-side evaluations judged by human evaluators. The resulting Elo scores act as a direct measure of user preference, and Gemma 3 27B IT has achieved a very competitive ranking compared to a variety of other well-known models.
source - https://storage.googleapis.com/deepmind-media/gemma/Gemma3Report.pdf
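Elo scores are easy to interpret because a rating gap maps directly to an expected head-to-head win rate via the standard logistic Elo formula. The ratings in the example below are placeholders, not figures from the report.

```python
# Standard Elo expected-score formula, useful for reading Arena ratings:
# the expected probability that model A beats model B in a pairwise vote.
# Example ratings are placeholders, not numbers from the Gemma 3 report.

def elo_win_probability(rating_a: float, rating_b: float) -> float:
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

# A 100-point Elo gap corresponds to roughly a 64% expected win rate:
p = elo_win_probability(1350, 1250)
```

So when two models sit within a few Elo points of each other on the leaderboard, human raters are choosing between them close to 50/50.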
Beyond explicit human preference, Gemma 3's abilities are also stringently tested on a range of standard academic benchmarks, as illustrated in the table below. These benchmarks cover a wide-ranging set of competencies, from language comprehension and code writing to mathematical reasoning and question answering. Comparing the Gemma 3 instruction-tuned (IT) models against earlier versions of Gemma and Google's Gemini models makes clear that the newest generation performs well across these varied tasks; direct numerical comparisons are best left to the fine-grained tables in the technical report.
source - https://storage.googleapis.com/deepmind-media/gemma/Gemma3Report.pdf
In addition, Gemma 3 is evaluated on other vital areas such as long-context handling, where benchmarks like RULER and MRCR measure performance at longer sequence lengths. The models are also tested on multiple multilingual tasks to confirm their competence across many languages. Furthermore, stringent safety evaluations are performed to understand and avoid possible harms, including measurements of policy violation rates and knowledge of sensitive areas. Lastly, the models' memorization is tested to gauge how much they replicate training data. Together, these varied tests present a detailed picture of Gemma 3's strengths and areas for improvement.
One area for future work, although already a strong point of Gemma 3, is further optimization of performance and memory usage. Such optimization would be particularly helpful for the multimodal models, with the goal of supporting even more resource-constrained environments. And while the Pan & Scan technique works around the vision encoder's fixed inference input resolution to a certain degree, further enhancements could make the models more robust to varying image aspect ratios and resolutions. Continued development along these lines is a likely course of action.
Conclusion
Blog: https://blog.google/technology/developers/gemma-3/
Developer: https://developers.googleblog.com/en/introducing-gemma3/
Disclaimer - This article is intended purely for informational purposes. It is not sponsored or endorsed by any company or
organization, nor does it serve as an advertisement or promotion for any product or service. All information presented is based
on publicly available resources and is subject to change. Readers are encouraged to conduct their own research and due
diligence.