🔁 Hugging Face retweeted
Lucas Beyer (bl16) @giffmana
I like the Encoder-only Mask Transformer (EoMT): basically removing all the bells and whistles, and doing panoptic segmentation with an almost vanilla ViT.
You're sliiiiightly worse for the same encoder size, but it's a lot simpler/faster and (likely) more scalable. I wish they

