Semantic Segmentation with 171 Extended Object/Stuff Categories via COCO-Stuff Variant

1

MS COCO (Common Objects in Context) · Dataset · 60/100

via “semantic segmentation with 171 extended object/stuff categories via coco-stuff variant”

330K images with object detection, segmentation, and captions.

Unique: A 171-category taxonomy combining 80 object (thing) categories with 91 stuff categories supports panoptic-style segmentation in a single dataset; pixel-level masks for stuff enable dense scene understanding without instance boundaries

vs others: More comprehensive than ADE20K (150 categories) and larger in scale than Cityscapes (5K images); unified instance+stuff annotation enables panoptic evaluation, unlike separate semantic and instance datasets

2

MMDetection · Repository · 58/100

via “panoptic segmentation with stuff and thing fusion”

OpenMMLab detection toolbox with 300+ models.

Unique: Implements panoptic segmentation by combining instance segmentation (Mask R-CNN) for things with semantic segmentation for stuff, then fusing predictions with a learned fusion module that resolves overlaps and assigns consistent instance IDs across both prediction types

vs others: More comprehensive than instance-only segmentation because it captures both countable objects and scene context; more efficient than running separate instance and semantic models because it shares backbone features; better integrated than post-hoc fusion approaches because fusion is learned end-to-end
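The thing/stuff fusion step described in this entry can be illustrated with a small sketch. This is not MMDetection's code and not a learned module: it swaps in a simple score-ordered heuristic, and the function name `fuse_panoptic`, its arguments, and the `(class_id, instance_id)` output encoding are all assumptions made for illustration.

```python
def fuse_panoptic(instances, semantic_map, stuff_ids, void=(0, 0)):
    """Heuristic thing/stuff fusion (hypothetical helper, not MMDetection code).

    instances: list of (class_id, score, mask), where mask is a 2D list of 0/1.
    semantic_map: 2D list of per-pixel class ids from the semantic branch.
    Returns a 2D list of (class_id, instance_id); stuff pixels get instance_id 0.
    """
    height, width = len(semantic_map), len(semantic_map[0])
    panoptic = [[void] * width for _ in range(height)]
    next_instance_id = 1

    # Paste instances in descending score order; the first mask to claim a pixel keeps it.
    for class_id, _score, mask in sorted(instances, key=lambda inst: inst[1], reverse=True):
        claimed_any = False
        for y in range(height):
            for x in range(width):
                if mask[y][x] and panoptic[y][x] == void:
                    panoptic[y][x] = (class_id, next_instance_id)
                    claimed_any = True
        if claimed_any:
            next_instance_id += 1

    # Fill pixels that no instance claimed with the semantic branch's stuff label.
    for y in range(height):
        for x in range(width):
            if panoptic[y][x] == void and semantic_map[y][x] in stuff_ids:
                panoptic[y][x] = (semantic_map[y][x], 0)
    return panoptic
```

Overlaps are resolved in favor of the higher-scoring instance, and each surviving instance receives a unique ID, which is the behavior the entry attributes to the fusion stage.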

3

segformer-b0-finetuned-ade-512-512 · Fine-tune · 47/100

via “ade20k-scene-category-prediction-with-class-mapping”

image-segmentation model. 313,332 downloads.

Unique: Provides direct mapping to 150 ADE20K scene categories with official color palette and hierarchical groupings, enabling interpretable scene understanding without post-hoc label engineering — most generic segmentation models require manual class mapping and visualization setup

vs others: Pre-trained on diverse indoor/outdoor scenes (ADE20K) with a comprehensive 150-class taxonomy covering furniture, building parts, and natural elements, providing richer scene understanding than generic COCO panoptic segmentation (133 classes) or Cityscapes (19 classes), which focus on specific domains
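A class mapping with a color palette, as described in this entry, amounts to two lookups on the predicted label map. The sketch below assumes the model output has already been reduced to a 2D grid of class IDs; the five-entry `ID2LABEL` and `PALETTE` tables are an illustrative subset with assumed values, not the full 150-class mapping shipped with the model.

```python
from collections import Counter

# Illustrative subset only (assumed values); the real mapping has 150 entries.
ID2LABEL = {0: "wall", 1: "building", 2: "sky", 3: "floor", 4: "tree"}
PALETTE = {
    0: (120, 120, 120),
    1: (180, 120, 120),
    2: (6, 230, 230),
    3: (80, 50, 50),
    4: (4, 200, 3),
}


def colorize(label_map, palette=PALETTE, unknown=(0, 0, 0)):
    """Turn a 2D grid of class ids into a 2D grid of RGB tuples."""
    return [[palette.get(class_id, unknown) for class_id in row] for row in label_map]


def label_histogram(label_map, id2label=ID2LABEL):
    """Count pixels per class name; ids missing from the table get a generic name."""
    counts = Counter(class_id for row in label_map for class_id in row)
    return {id2label.get(class_id, f"class_{class_id}"): n for class_id, n in counts.items()}
```

With a full mapping in place, the same two helpers are enough to render an overlay and to report which scene categories dominate an image.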

4

oneformer_ade20k_swin_tiny · Model · 46/100

via “ade20k-scene-parsing-with-150-class-taxonomy”

image-segmentation model. 248,429 downloads.

Unique: Trained specifically on ADE20K's 150-class taxonomy with dense pixel-level annotations for indoor and outdoor scenes, providing fine-grained scene understanding (room types, furniture, architectural elements) that general-purpose segmentation models (e.g., COCO-trained models with 80 classes) cannot match. Achieves 48.5% mIoU on the ADE20K validation set through task-conditioned learning.

vs others: Achieves higher accuracy on ADE20K benchmarks than task-specific models (e.g., Mask2Former, DeepLabV3+) due to unified task learning; provides 150 semantic classes vs 80 for COCO-trained models, enabling richer scene understanding for indoor applications.

5

segformer-b0-finetuned-ade-512-512 · Fine-tune · 45/100

via “ade20k-scene-class-prediction-with-150-categories”

image-segmentation model. 508,692 downloads.

Unique: Integrates ADE20K's 150-class ontology with hierarchical scene understanding — classes are organized by spatial context (indoor vs outdoor, furniture vs architecture) enabling downstream filtering and reasoning without custom label mapping

vs others: More granular than COCO segmentation (80 classes) for indoor scene understanding, and includes scene-context labels (wall, floor, ceiling) that generic object detectors omit
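Downstream filtering by spatial context, as mentioned in this entry, can be sketched as a group lookup over the predicted label map. The `GROUPS` table and the `mask_for_group` helper are hypothetical: the grouping is an assumption made for illustration, not something the model configuration is known to provide.

```python
# Hypothetical coarse groups over class names (assumed for illustration).
GROUPS = {
    "architecture": {"wall", "floor", "ceiling"},
    "furniture": {"bed", "chair", "table"},
}


def mask_for_group(label_map, id2label, group, groups=GROUPS):
    """Return a 2D boolean grid marking pixels whose class belongs to `group`.

    label_map: 2D list of class ids; id2label: dict from class id to class name.
    Ids that are missing from id2label never match any group.
    """
    wanted = groups[group]
    return [[id2label.get(class_id) in wanted for class_id in row] for row in label_map]
```

A caller would pass the model's own id-to-name mapping as `id2label` and use the resulting mask to keep, for example, only structural surfaces.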

6

mask2former-swin-large-ade-semantic · Model · 44/100

via “ade20k 150-class semantic taxonomy mapping”

image-segmentation model. 119,949 downloads.

Unique: Leverages ADE20K's diverse 150-class taxonomy that balances thing and stuff classes, enabling both instance-level and semantic-level understanding in a single model. Unlike COCO (80 classes, mostly things) or Cityscapes (19 classes, driving-focused), ADE20K covers diverse indoor/outdoor scenes with fine-grained distinctions.

vs others: ADE20K taxonomy provides 2-3x more semantic granularity than Cityscapes for indoor scenes and 1.5-2x more than COCO for stuff classes, enabling richer scene understanding at the cost of lower per-class accuracy on common categories like 'person' or 'car'.

7

segformer-b4-finetuned-ade-512-512 · Fine-tune · 43/100

via “ade20k-scene-parsing-with-150-semantic-classes”

image-segmentation model. 104,510 downloads.

Unique: Fine-tuned specifically on ADE20K's 150-class taxonomy covering both common and rare scene elements, achieving 50.3% mIoU through domain-specific optimization. Unlike generic segmentation models (COCO, Cityscapes), this model prioritizes scene understanding over object detection, with classes representing spatial regions and architectural elements rather than discrete objects.

vs others: Achieves 8-12% higher mIoU on ADE20K than Cityscapes-trained models and 15-20% higher than COCO-trained models due to domain-specific fine-tuning, making it the standard choice for scene parsing benchmarks.
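The mIoU figures quoted in these entries are means of per-class intersection-over-union. A minimal sketch of the metric for one prediction/ground-truth pair follows; the function name `mean_iou` and the `ignore_index=255` default are assumptions, predictions are assumed to lie in `range(num_classes)`, and benchmark numbers are normally accumulated over a whole validation set rather than computed per image.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over classes that appear in the prediction or the ground truth.

    pred, target: 2D lists of class ids with the same shape.
    Pixels whose ground-truth label equals ignore_index are skipped.
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if t == ignore_index:
                continue
            if p == t:
                intersection[t] += 1
                union[t] += 1
            else:
                # A mismatched pixel enlarges the union of both classes involved.
                union[p] += 1
                union[t] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")
```

Classes that occur in neither map are left out of the average, so an absent class does not drag the score toward zero.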

8

segformer-b1-finetuned-ade-512-512 · Fine-tune · 43/100

via “ade20k-150-class-semantic-taxonomy-prediction”

image-segmentation model. 177,465 downloads.

Unique: Trained on ADE20K's hierarchical scene taxonomy (150 fine-grained classes) rather than generic COCO or Cityscapes, capturing scene-specific semantics like 'wall', 'ceiling', 'floor', and furniture types. Optimized for indoor/outdoor scene understanding rather than autonomous driving or panoptic segmentation.

vs others: Richer semantic granularity than Cityscapes (19 classes) for scene understanding; more scene-focused than COCO panoptic segmentation; better suited for interior robotics and spatial understanding than generic object detectors.

9

segformer-b2-finetuned-ade-512-512 · Fine-tune · 42/100

via “ade20k-scene-category-classification-with-150-classes”

image-segmentation model. 63,104 downloads.

Unique: Trained on ADE20K's 150-class taxonomy which includes fine-grained scene elements (architectural details, furniture types, vegetation species) rather than generic object categories — enables detailed scene understanding beyond basic object detection. Hierarchical class structure allows both coarse (e.g., 'furniture') and fine-grained (e.g., 'chair', 'table') predictions.

vs others: More comprehensive scene understanding than COCO panoptic (133 classes) or Cityscapes (19 classes) for indoor/outdoor scenes, but less specialized than domain-specific models (medical, satellite); best for general-purpose scene parsing.

10

yolov10s · Model · 42/100

via “coco dataset-aligned class prediction with 80-class taxonomy”

object-detection model. 223,706 downloads.

Unique: Pre-trained on COCO with YOLOv10's improved training recipe (including anchor-free loss functions and dynamic label assignment), achieving higher mAP than prior YOLO versions on the same 80-class taxonomy without architectural changes to the classifier.

vs others: More accurate on COCO classes than YOLOv8s due to improved training dynamics; simpler class handling than open-vocabulary models (CLIP-based) which require additional inference steps but offer flexibility beyond 80 classes.

11

mask2former-swin-tiny-coco-instance · Model · 41/100

via “coco-pretrained 80-class object recognition with transfer learning”

image-segmentation model. 63,563 downloads.

Unique: Weights trained on COCO instance segmentation task (not just classification), meaning features encode both semantic and spatial information about object boundaries. This differs from ImageNet-pretrained backbones which optimize for classification only; COCO pretraining provides better initialization for segmentation tasks.

vs others: Outperforms ImageNet-pretrained backbones by 3-5 mAP on segmentation tasks due to instance-aware training; requires more computational resources than lightweight classification models but provides better transfer to dense prediction tasks.

12

rtdetr_r101vd_coco_o365 · Model · 40/100

via “multi-domain object detection with coco+objects365 pretraining”

object-detection model. 121,720 downloads.

Unique: Combines COCO (80 classes, high-quality annotations) with Objects365 (365 classes, broader coverage) in a unified detection framework using class-agnostic bounding box regression, enabling detection across 365+ object categories with a single model rather than ensemble or multi-task approaches

vs others: Broader category coverage than COCO-only models (365 vs 80 classes) with better generalization than Objects365-only training due to COCO's higher annotation quality, outperforming single-dataset detectors on diverse real-world images

13

oneformer_coco_swin_large · Model · 39/100

via “coco-dataset-pretraining-with-133-class-vocabulary”

image-segmentation model. 54,407 downloads.

Unique: Pre-trained jointly on semantic, instance, and panoptic segmentation tasks using a unified architecture, enabling transfer learning across all three tasks simultaneously. Unlike task-specific pre-training, this approach learns shared representations that benefit all downstream tasks.

vs others: Achieves 45.1 mIoU on COCO panoptic segmentation with a single model, competitive with specialized panoptic models while maintaining flexibility for semantic and instance tasks without retraining.
