I would be very happy to have it in the library; however, I have some concerns. It's an instance/panoptic segmentation model, and it would be the first of its kind here. Adding it with training support may not be straightforward: a matcher and a loss need to be defined, and the training architecture differs somewhat from the inference one.
That said, if anyone is up for the challenge, I would greatly appreciate it and will help with the integration!
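To make the matcher/loss concern concrete, here is a minimal sketch of the DETR/Mask2Former-style bipartite matching step such training would need. All names, cost weights, and the Dice-based mask cost are my own assumptions for illustration, not taken from the EoMT code base.

```python
# Hypothetical sketch of the bipartite matching used by mask-classification
# training (DETR/Mask2Former style). Cost weights are illustrative.
import numpy as np
from scipy.optimize import linear_sum_assignment


def dice_cost(pred_masks, gt_masks, eps=1e-6):
    """Pairwise (num_queries, num_gt) soft-Dice cost between two mask sets."""
    p = pred_masks.reshape(pred_masks.shape[0], -1)  # (Q, H*W)
    g = gt_masks.reshape(gt_masks.shape[0], -1)      # (G, H*W)
    inter = p @ g.T                                  # (Q, G) overlaps
    denom = p.sum(1, keepdims=True) + g.sum(1, keepdims=True).T
    return 1.0 - (2.0 * inter + eps) / (denom + eps)


def hungarian_match(pred_logits, pred_masks, gt_labels, gt_masks,
                    w_cls=2.0, w_mask=5.0):
    """Return (query_idx, gt_idx) minimizing a class + mask matching cost."""
    probs = np.exp(pred_logits) / np.exp(pred_logits).sum(-1, keepdims=True)
    cost_cls = -probs[:, gt_labels]                  # (Q, G): -p(correct class)
    cost_mask = dice_cost(pred_masks, gt_masks)
    cost = w_cls * cost_cls + w_mask * cost_mask
    return linear_sum_assignment(cost)               # optimal 1-to-1 assignment
```

The loss would then be computed only between matched query/ground-truth pairs, with unmatched queries pushed toward a "no object" class.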
Hi, I'm here to share a new image segmentation paper using a plain ViT!
Paper : https://arxiv.org/abs/2503.19108
Code : https://github.com/tue-mps/eomt
This paper reaches near-SOTA results with a considerably less complex architecture (a vision transformer only), provided the ViT is well pretrained. EoMT uses only the plain ViT architecture plus a few extra learned queries and a small mask prediction module. It performs on par with ViT-Adapter + Mask2Former while being much simpler.
It would be interesting to have in this library!
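To give a feel for how small the addition is, here is a hypothetical numpy sketch of the idea described above: patch tokens go through the first ViT blocks alone, learned queries are appended for the last few blocks, and masks are read out as dot products between query tokens and patch tokens. Dimensions, block counts, and the toy attention are my own illustrative choices, not the released code.

```python
# Hypothetical sketch of an EoMT-style forward pass (not the official code).
import numpy as np

rng = np.random.default_rng(0)


def attention(x, d):
    """Toy single-head self-attention with a residual; real ViT blocks
    also have projections, MLPs, and LayerNorm (omitted for brevity)."""
    scores = x @ x.T / np.sqrt(d)
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)
    return x + w @ x


D, H, W = 64, 8, 8                     # embed dim, patch grid (illustrative)
num_queries, num_classes = 5, 10
n_blocks, n_query_blocks = 8, 2        # queries join for the last 2 blocks

patches = rng.normal(size=(H * W, D))          # output of patch embedding
queries = rng.normal(size=(num_queries, D))    # extra learned query tokens
w_cls = rng.normal(size=(D, num_classes + 1))  # +1 for a "no object" class

x = patches
for i in range(n_blocks):
    if i == n_blocks - n_query_blocks:
        x = np.concatenate([queries, x])       # queries enter late
    x = attention(x, D)

q, feats = x[:num_queries], x[num_queries:]
mask_logits = (q @ feats.T).reshape(num_queries, H, W)  # one mask per query
class_logits = q @ w_cls                                # one label per query
```

The point of the sketch is that, unlike Mask2Former, there is no separate pixel decoder or transformer decoder: the queries are processed by the same ViT blocks as the patches.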