I would be very happy to have it in the library; however, I have some concerns. It's an instance/panoptic segmentation model, and it would be the first of its kind here. Adding it with training support may not be straightforward: a matcher and a loss need to be defined, and the training architecture differs somewhat from the inference one.
That said, if anyone is up for the challenge, I would greatly appreciate it and will help with the integration!
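To make the matcher/loss concern concrete, here is a minimal sketch of the DETR/Mask2Former-style bipartite matching step such training would need. All names, cost weights, and the Dice-based mask cost are my own assumptions for illustration, not taken from the EoMT code base.

```python
# Hypothetical sketch of the bipartite matching used by mask-classification
# training (DETR/Mask2Former style). Cost weights are illustrative.
import numpy as np
from scipy.optimize import linear_sum_assignment


def dice_cost(pred_masks, gt_masks, eps=1e-6):
    """Pairwise (num_queries, num_gt) soft-Dice cost between two mask sets."""
    p = pred_masks.reshape(pred_masks.shape[0], -1)  # (Q, H*W)
    g = gt_masks.reshape(gt_masks.shape[0], -1)      # (G, H*W)
    inter = p @ g.T                                  # (Q, G) overlaps
    denom = p.sum(1, keepdims=True) + g.sum(1, keepdims=True).T
    return 1.0 - (2.0 * inter + eps) / (denom + eps)


def hungarian_match(pred_logits, pred_masks, gt_labels, gt_masks,
                    w_cls=2.0, w_mask=5.0):
    """Return (query_idx, gt_idx) minimizing a class + mask matching cost."""
    probs = np.exp(pred_logits) / np.exp(pred_logits).sum(-1, keepdims=True)
    cost_cls = -probs[:, gt_labels]                  # (Q, G): -p(correct class)
    cost_mask = dice_cost(pred_masks, gt_masks)
    cost = w_cls * cost_cls + w_mask * cost_mask
    return linear_sum_assignment(cost)               # optimal 1-to-1 assignment
```

The loss would then be computed only between matched query/ground-truth pairs, with unmatched queries pushed toward a "no object" class.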
Hi, I'm here to share a new image segmentation paper using a plain ViT!
Paper : https://arxiv.org/abs/2503.19108
Code : https://github.com/tue-mps/eomt
This paper reaches near-SOTA results with a considerably less complex architecture (a vision transformer only), provided the ViT is well pretrained. EoMT uses only the plain ViT architecture plus a few extra learned queries and a small mask prediction module. It performs on par with ViT-Adapter + Mask2Former while being much simpler.
It would be interesting to have in this library!
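To give a feel for how small the addition is, here is a hypothetical numpy sketch of the idea described above: patch tokens go through the first ViT blocks alone, learned queries are appended for the last few blocks, and masks are read out as dot products between query tokens and patch tokens. Dimensions, block counts, and the toy attention are my own illustrative choices, not the released code.

```python
# Hypothetical sketch of an EoMT-style forward pass (not the official code).
import numpy as np

rng = np.random.default_rng(0)


def attention(x, d):
    """Toy single-head self-attention with a residual; real ViT blocks
    also have projections, MLPs, and LayerNorm (omitted for brevity)."""
    scores = x @ x.T / np.sqrt(d)
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)
    return x + w @ x


D, H, W = 64, 8, 8                     # embed dim, patch grid (illustrative)
num_queries, num_classes = 5, 10
n_blocks, n_query_blocks = 8, 2        # queries join for the last 2 blocks

patches = rng.normal(size=(H * W, D))          # output of patch embedding
queries = rng.normal(size=(num_queries, D))    # extra learned query tokens
w_cls = rng.normal(size=(D, num_classes + 1))  # +1 for a "no object" class

x = patches
for i in range(n_blocks):
    if i == n_blocks - n_query_blocks:
        x = np.concatenate([queries, x])       # queries enter late
    x = attention(x, D)

q, feats = x[:num_queries], x[num_queries:]
mask_logits = (q @ feats.T).reshape(num_queries, H, W)  # one mask per query
class_logits = q @ w_cls                                # one label per query
```

The point of the sketch is that, unlike Mask2Former, there is no separate pixel decoder or transformer decoder: the queries are processed by the same ViT blocks as the patches.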