Resnet Based Feature Extraction For Textline Images

1

detr-resnet-50 (Model, 44/100)

via “resnet-50 cnn feature extraction with imagenet pretraining”

object-detection model. 239,063 downloads.

Unique: Uses ResNet-50 weights pretrained on ImageNet-1k (about 1.28 million natural images), either frozen or fine-tuned during DETR training, as a stable and widely reused feature extractor

vs others: More computationally efficient than Vision Transformer backbones while maintaining competitive accuracy; better established than EfficientNet for detection tasks due to widespread adoption in DETR implementations

2

detr-doc-table-detection (Model, 44/100)

via “resnet-50 backbone feature extraction with transformer refinement”

object-detection model. 204,862 downloads.

Unique: Combines ImageNet-pretrained ResNet-50 CNN backbone with DETR transformer encoder-decoder, enabling both transfer learning from general vision tasks and document-specific spatial reasoning via attention, rather than using either CNN-only (Faster R-CNN) or transformer-only (ViT) approaches

vs others: More accurate on document tables than a CNN-only detector built on ResNet-50, because transformer attention captures long-range dependencies between table elements. More efficient than pure vision transformers, because the ResNet-50 backbone supplies a strong inductive bias for local features and hands the transformer a heavily downsampled feature map, which reduces attention compute
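The compute argument can be made concrete with some rough arithmetic. The sketch below counts how many tokens the transformer has to attend over when it sits on top of a ResNet-50 final stage (total stride 32) versus a patch-based vision transformer. The 800x1216 input size and the patch size of 16 are illustrative assumptions, not values taken from this model's configuration.

```python
import math

def token_count(height, width, stride):
    """Spatial positions left after downsampling by `stride` (ceil division)."""
    return math.ceil(height / stride) * math.ceil(width / stride)

def attention_pairs(tokens):
    """Self-attention scores every token against every token."""
    return tokens * tokens

# Hypothetical input resolution, chosen only for illustration.
H, W = 800, 1216

# ResNet-50's last stage has a total stride of 32.
backbone_tokens = token_count(H, W, 32)

# Assumed ViT-style patch size of 16 for comparison.
patch_tokens = token_count(H, W, 16)

# Attention cost grows with the square of the token count.
ratio = attention_pairs(patch_tokens) / attention_pairs(backbone_tokens)

print(backbone_tokens, patch_tokens, ratio)
```

Halving the stride quadruples the token count and multiplies the number of attention pairs by sixteen, which is the gap a convolutional backbone closes before the transformer runs.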

3

en_PP-OCRv5_mobile_rec (Model, 41/100)

via “resnet-based feature extraction for textline images”

image-to-text model. 339,341 downloads.

Unique: Uses depthwise separable convolutions throughout the ResNet backbone to reduce parameters by ~70% compared to standard ResNet, while concatenating features from multiple scales (stride 4, 8, 16) to preserve fine-grained character details. This hybrid approach balances mobile efficiency with multi-scale robustness.

vs others: More parameter-efficient than the heavier ResNet- or VGG-style CNN feature extractors used in CRNN-type recognizers such as EasyOCR; trades some capacity for mobile deployability.
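A minimal sketch of the two ideas in this entry: the per-layer parameter saving from depthwise separable convolutions, and the feature map sizes that would be concatenated at strides 4, 8 and 16. The 256-channel layer and the 48x320 textline input are assumptions chosen for illustration, and biases are ignored. The per-layer figure is not the same thing as the whole-network reduction quoted above, which also depends on layers that are not 3x3 convolutions.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard kernel x kernel convolution (no bias)."""
    return kernel * kernel * c_in * c_out

def depthwise_separable_params(kernel, c_in, c_out):
    """Depthwise kernel x kernel conv followed by a 1x1 pointwise conv (no bias)."""
    depthwise = kernel * kernel * c_in   # one spatial filter per input channel
    pointwise = c_in * c_out             # 1x1 conv that mixes channels
    return depthwise + pointwise

def feature_map_size(height, width, stride):
    """Spatial size after downsampling by `stride` (floor division)."""
    return (height // stride, width // stride)

# Hypothetical 3x3 layer with 256 input and 256 output channels.
std = standard_conv_params(3, 256, 256)
sep = depthwise_separable_params(3, 256, 256)
saving = 1 - sep / std

# Hypothetical 48x320 textline image at the three strides named above.
sizes = {s: feature_map_size(48, 320, s) for s in (4, 8, 16)}

print(std, sep, round(saving, 3))
print(sizes)
```

Because the three maps differ in resolution, they would need to be resized to a common grid before concatenation along the channel axis; how this particular model does that is not stated in the listing.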

4

test_resnet.r160_in1k (Model, 41/100)

via “feature extraction and embedding generation from images”

image-classification model. 622,682 downloads.

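Unique: Uses a ResNet-style residual architecture to produce hierarchical multi-scale features. Note that in timm model names the r160 tag denotes a 160x160 input resolution, not a 160-layer network. timm exposes intermediate stage outputs through its feature-extraction wrappers (for example, features_only=True in timm.create_model), avoiding manual model surgery.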

vs others: Produces more semantically rich embeddings than shallow CNNs and faster inference than Vision Transformers for feature extraction, with well-established benchmarks on standard image retrieval datasets.
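Once a model like this has produced embeddings, image retrieval usually comes down to ranking a gallery by similarity to a query vector. The sketch below shows that step with cosine similarity in plain Python. The three-dimensional vectors and the names img_a, img_b and img_c are made up for illustration; real embeddings from a ResNet would have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_by_similarity(query, gallery):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in gallery.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Toy embeddings standing in for model outputs (hypothetical values).
gallery = {
    "img_a": [1.0, 0.0, 0.0],
    "img_b": [0.6, 0.8, 0.0],
    "img_c": [0.0, 0.0, 1.0],
}
query = [1.0, 0.1, 0.0]

ranking = rank_by_similarity(query, gallery)
for name, score in ranking:
    print(name, round(score, 3))
```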
