ONNX and OpenVINO Model Export for Edge Deployment

1

all-MiniLM-L6-v2 (Model, 57/100)

via “multi-format-model-export-and-inference”

sentence-similarity model. 233,518,673 downloads.

Unique: Export support is distributed across multiple ecosystem projects (sentence-transformers for PyTorch, the ONNX community for format conversion, the OpenVINO toolkit for Intel optimization) rather than a single unified export pipeline; this enables best-in-class optimization per format but requires manual orchestration

vs others: More deployment flexibility than proprietary embedding APIs (OpenAI, Cohere) which lock you into their inference infrastructure; more mature ONNX support than newer models due to wide adoption in sentence-transformers ecosystem

2

Piper TTS (Repository, 55/100)

via “onnx model export and optimization for edge deployment”

Fast local neural TTS optimized for Raspberry Pi and edge devices.

Unique: Implements ONNX export with built-in quantization and operator fusion specifically tuned for VITS architecture, enabling 50-70% model size reduction with minimal quality loss vs. generic ONNX converters

vs others: More optimized for TTS than generic ONNX export tools; supports quantization strategies specific to VITS; produces models 2-3x smaller than unoptimized exports while maintaining quality

3

xlm-roberta-base (Model, 54/100)

via “onnx model export and optimized inference”

fill-mask model. 18,165,674 downloads.

Unique: Provides native ONNX export support via Hugging Face Transformers, enabling single-command conversion to a hardware-agnostic format with built-in optimization profiles for CPU, GPU, and mobile inference, unlike manual ONNX conversion, which requires deep knowledge of ONNX IR and operator semantics

vs others: Reduces deployment complexity and inference latency compared to PyTorch/TensorFlow serving by eliminating framework dependencies and applying ONNX Runtime's operator fusion and memory optimizations, which leave model outputs unchanged; quantization and pruning can reduce size and latency further, at some cost in accuracy

4

bge-m3 (Model, 54/100)

via “onnx model export for edge and serverless deployment”

sentence-similarity model. 20,474,507 downloads.

Unique: Pre-optimized ONNX export with native quantization support and operator fusion for CPU inference, reducing deployment complexity compared to manual PyTorch-to-ONNX conversion while maintaining embedding quality through careful quantization calibration

vs others: Simpler than custom ONNX conversion pipelines and includes pre-tuned quantization profiles, whereas generic PyTorch-to-ONNX export requires manual optimization; reduces cold-start latency by 60-80% vs PyTorch Lambda deployments

5

all-MiniLM-L12-v2 (Model, 54/100)

via “multi-format-model-export-and-deployment”

sentence-similarity model. 2,825,304 downloads.

Unique: Provides native export to four distinct inference formats with automatic tokenizer serialization and config preservation, enabling single-command deployment across CPU, GPU, mobile, and edge hardware without manual format conversion or architecture reimplementation; SafeTensors format ensures secure deserialization preventing arbitrary code execution

vs others: More deployment-flexible than OpenAI embeddings (API-only); simpler than custom ONNX conversion pipelines; safer than pickle-based PyTorch exports due to SafeTensors format

6

bge-base-en-v1.5 (Model, 53/100)

via “onnx-export-and-cpu-inference”

feature-extraction model. 8,155,394 downloads.

Unique: BGE-base-en-v1.5 provides official ONNX exports with a graph structure optimized for inference runtimes, enabling sub-100ms CPU inference on modern processors and deployment on edge devices without PyTorch or a GPU

vs others: Faster CPU inference than PyTorch eager execution and more portable than TorchScript for cross-platform deployment; enables embedding generation on edge devices where PyTorch is too heavy

7

multi-qa-mpnet-base-dot-v1 (Model, 52/100)

via “onnx-and-openvino-export-for-edge-deployment”

sentence-similarity model. 2,530,482 downloads.

Unique: Provides native ONNX and OpenVINO export support with quantization-friendly architecture (no custom ops). Enables deployment on edge devices and CPU-only infrastructure with minimal code changes, supporting both float32 and int8 quantized inference.

vs others: Faster edge deployment than PyTorch models because ONNX Runtime and OpenVINO are inference engines with hardware-specific optimizations; int8 quantization reduces model size by about 4x (8-bit instead of 32-bit weights) and latency by 2-3x compared to full-precision models.
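The 4x size figure follows from storing each weight in 8 bits instead of 32. The following is a minimal sketch of symmetric per-tensor int8 quantization in plain Python, for illustration only: the helper names (`quantize_int8`, `dequantize`) are hypothetical, the weights are random stand-ins rather than real model parameters, and production toolchains such as ONNX Runtime and OpenVINO use more elaborate schemes. It says nothing about the latency figure.

```python
import random


def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # One shared scale for the whole tensor; guard against an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from the quantized integers."""
    return [v * scale for v in q]


random.seed(0)
weights = [random.uniform(-1.0, 1.0) for _ in range(1024)]  # stand-in weights

q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

fp32_bytes = len(weights) * 4   # 32-bit floats
int8_bytes = len(q) * 1         # 8-bit integers (ignoring the single scale value)
size_ratio = fp32_bytes / int8_bytes
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(f"size ratio: {size_ratio:.1f}x, max round-trip error: {max_error:.6f}")
```

The round-trip error per weight is bounded by half the scale, which is why embedding quality usually degrades only slightly when the weight range is well behaved.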

8

multilingual-e5-small (Model, 52/100)

sentence-similarity model. 7,032,108 downloads.

Unique: Provides pre-optimized ONNX and OpenVINO representations of multilingual-e5-small, enabling single-model deployment across diverse hardware (CPUs, mobile, edge) without language-specific optimizations. OpenVINO export includes graph-level optimizations (operator fusion, constant folding) and quantization-aware training compatibility, reducing inference latency by 2-4x on Intel CPUs.

vs others: Smaller and faster than PyTorch deployment for edge use cases; more portable than TensorFlow Lite (where transformer operator coverage can be a limiting factor); enables privacy-preserving on-device inference without cloud dependencies.

9

table-transformer-detection (Model, 52/100)

via “onnx model export for edge deployment and inference optimization”

object-detection model. 3,394,499 downloads.

Unique: Provides transformer-aware ONNX export that preserves attention mechanism semantics while enabling quantization-friendly operator fusion. The export pipeline includes automatic calibration for INT8 quantization using representative document images, reducing manual tuning overhead.

vs others: More portable than TensorFlow Lite or CoreML because ONNX Runtime runs on Windows, Linux, macOS, iOS, and Android with identical inference results; achieves better accuracy-latency tradeoffs than naive INT8 quantization due to transformer-specific calibration strategies.

10

multilingual-e5-base (Model, 51/100)

sentence-similarity model. 3,660,082 downloads.

Unique: Supports three inference backends (PyTorch, ONNX Runtime, OpenVINO) from a single model artifact, with automatic optimization for each target platform — ONNX for cross-platform compatibility, OpenVINO for Intel hardware, PyTorch for development

vs others: More portable than PyTorch-only deployment and faster than unoptimized ONNX due to OpenVINO's graph-level optimizations; enables 2-4x latency reduction on CPU compared to PyTorch inference

11

ChatTTS (Agent, 51/100)

via “onnx export for cross-platform deployment”

A generative speech model for daily dialogue.

Unique: Provides ONNX export capability for all major pipeline components (GPT, DVAE, Vocos), enabling end-to-end deployment without PyTorch. The export process includes optimization and quantization options, enabling deployment on resource-constrained devices.

vs others: More flexible than PyTorch-only deployment because ONNX enables use of alternative inference runtimes (ONNX Runtime, TensorRT, CoreML). More portable than TorchScript because ONNX is a standard format with broad ecosystem support.

12

e5-base-v2 (Model, 49/100)

via “onnx and openvino model export for edge and on-premise deployment”

sentence-similarity model. 1,778,169 downloads.

Unique: Provides native ONNX and OpenVINO export through sentence-transformers' built-in conversion utilities, supporting both full-precision and quantized models without custom export code. The export process preserves the tokenizer and preprocessing logic, enabling end-to-end inference without reimplementing text preprocessing.

vs others: One-command export to multiple formats (ONNX, OpenVINO) with quantization support, whereas most models require separate conversion pipelines and manual tokenizer integration for edge deployment.

13

UAE-Large-V1 (Model, 49/100)

via “onnx and openvino quantized inference for edge deployment”

feature-extraction model. 1,337,383 downloads.

Unique: Provides both ONNX and OpenVINO export formats with INT8 quantization pre-applied, enabling plug-and-play edge deployment without requiring custom quantization pipelines. Maintains <2% accuracy loss through careful calibration on representative text samples, unlike generic quantization approaches that often degrade embedding quality.

vs others: Faster edge inference than Sentence-BERT's standard PyTorch format (2-4x speedup via INT8) and more accessible than framework-specific edge formats such as TensorFlow Lite, with no vendor lock-in.

14

bert-base-NER (Model, 49/100)

via “onnx export for edge deployment and inference optimization”

token-classification model. 1,811,113 downloads.

Unique: Supports ONNX export via transformers' built-in export utilities, enabling deployment on ONNX Runtime which provides hardware-specific optimizations (graph fusion, operator fusion, quantization) without retraining. ONNX models are framework-agnostic and can run on CPU, GPU, or specialized accelerators (NPU, TPU) via different ONNX Runtime backends.

vs others: Faster and smaller than PyTorch checkpoints due to graph optimization, and more portable than TensorFlow SavedModel, but requires additional conversion step and validation compared to native PyTorch deployment.

15

mDeBERTa-v3-base-xnli-multilingual-nli-2mil7 (Model, 47/100)

via “onnx-model-export-and-inference”

zero-shot-classification model. 303,704 downloads.

Unique: Enables ONNX export of the DeBERTa-v3-base architecture with full transformer semantics preserved, supporting dynamic batch sizes and sequence lengths without reexport. Unlike simple PyTorch-to-ONNX conversion, this approach maintains cross-lingual capabilities and NLI reasoning patterns across different runtime environments.

vs others: Provides hardware-agnostic inference without PyTorch dependency, enabling 2-5x faster startup and lower memory overhead than PyTorch on CPU, and supports quantization for 4x model size reduction with minimal accuracy loss vs full-precision models.

16

mask2former-swin-large-cityscapes-semantic (Model, 46/100)

via “model export to onnx and torchscript formats”

image-segmentation model. 155,904 downloads.

Unique: Supports export to both ONNX and TorchScript, enabling deployment across diverse inference engines (ONNX Runtime, TensorRT, CoreML), though deformable attention may require custom ONNX operators that are not available in the standard opset

vs others: Enables multi-platform deployment vs PyTorch-only inference, though export complexity and potential operator compatibility issues add deployment friction

17

sat-12l-sm (Model, 41/100)

via “onnx-optimized inference export for production deployment”

token-classification model. 307,609 downloads.

Unique: Provides pre-exported ONNX weights alongside safetensors format, eliminating conversion overhead and enabling immediate deployment to ONNX Runtime without requiring PyTorch/TensorFlow toolchains on target systems

vs others: Faster deployment than converting from PyTorch at runtime; ONNX format is hardware-agnostic unlike TensorRT (NVIDIA-only) or CoreML (Apple-only), enabling single export for multi-platform deployment

18

Anzhcs_YOLOs (Model, 39/100)

via “model export to multiple inference frameworks and hardware targets”

object-detection model. 86,897 downloads.

Unique: Ultralytics provides a one-line export API (model.export(format='onnx')) that handles all conversion complexity internally, including dynamic shape handling and optimization. Supports 13+ export formats from a single codebase without manual graph surgery or format-specific code.

vs others: Simpler export workflow than ONNX Model Zoo or TensorFlow's conversion tools; automatic optimization for each target (TensorRT graph fusion, CoreML neural engine tuning) without manual tuning per format.
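As a rough illustration of the one-line export mentioned above, the sketch below wraps `model.export(format='onnx')` in a small helper. The helper name `export_yolo`, the local-file check, and the choice to pass `dynamic=True` are assumptions made for this example, not part of the listing; it requires the `ultralytics` package and a local weights file.

```python
from pathlib import Path


def export_yolo(weights_path: str, fmt: str = "onnx", dynamic: bool = True) -> str:
    """Export a YOLO checkpoint and return the path of the exported model."""
    path = Path(weights_path)
    if not path.is_file():
        # Fail early, before loading any heavy dependency.
        raise FileNotFoundError(f"weights not found: {path}")

    # Imported lazily so this module loads even without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(str(path))
    # The one-line export; dynamic=True requests dynamic input shapes for ONNX.
    return str(model.export(format=fmt, dynamic=dynamic))
```

A call such as `export_yolo("best.pt")` (with `best.pt` standing in for your own checkpoint) would then produce an ONNX file that can be loaded by ONNX Runtime.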

19

optimum (Framework, 32/100)

via “hardware-agnostic model export to optimized formats”

Optimum Library is an extension of the Hugging Face Transformers library, providing a framework to integrate third-party libraries from Hardware Partners and interface with their specific functionality.

Unique: Uses a composition of TasksManager (task-type detection), NormalizedConfig (architecture-agnostic config standardization), and ExporterConfig subclass hierarchy to decouple export logic from model architecture, enabling new format support without modifying core export pipeline. Dummy input generation system automatically constructs valid inputs based on model signatures rather than requiring manual specification.

vs others: Unified export API across 40+ architectures and 8+ formats with automatic task detection, whereas alternatives like ONNX's converter scripts require format-specific code per architecture and manual input specification.
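One way to drive this export from a script is through the `optimum-cli export onnx` command. The sketch below only builds and runs that command line; the helper names (`onnx_export_command`, `run_export`) are hypothetical, and it assumes Optimum is installed with its ONNX exporter extras. When `task` is omitted, task detection is left to Optimum.

```python
import subprocess


def onnx_export_command(model_id, output_dir, task=None):
    """Build the argv list for `optimum-cli export onnx`."""
    cmd = ["optimum-cli", "export", "onnx", "--model", model_id]
    if task is not None:
        # Override automatic task detection, e.g. "fill-mask".
        cmd += ["--task", task]
    cmd.append(output_dir)
    return cmd


def run_export(model_id, output_dir, task=None):
    """Run the export; raises CalledProcessError if the CLI exits non-zero."""
    subprocess.run(onnx_export_command(model_id, output_dir, task), check=True)
```

For example, `run_export("xlm-roberta-base", "xlm-roberta-onnx", task="fill-mask")` would write the exported model and its configuration files into `xlm-roberta-onnx`.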

20

sentence-transformers (Repository, 28/100)

via “model-export-to-onnx-and-openvino-backends”

Embeddings, Retrieval, and Reranking

Unique: Exports models to ONNX and OpenVINO formats with optional quantization, enabling CPU-only and edge device deployment without PyTorch runtime — more deployment-flexible than PyTorch-only alternatives

vs others: Enables deployment on resource-constrained devices because ONNX and OpenVINO runtimes are lighter than a full PyTorch installation and their optimized, optionally quantized models run faster on CPU, whereas PyTorch-only libraries require the full runtime
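A minimal sketch of selecting an inference backend when loading a model, assuming a recent sentence-transformers release that accepts the `backend` argument and has the matching ONNX or OpenVINO extras installed. The helper name `load_embedder` and the `SUPPORTED_BACKENDS` tuple are introduced here for illustration only.

```python
SUPPORTED_BACKENDS = ("torch", "onnx", "openvino")


def load_embedder(model_name: str, backend: str = "onnx"):
    """Load a SentenceTransformer model on the requested inference backend."""
    if backend not in SUPPORTED_BACKENDS:
        raise ValueError(
            f"unknown backend {backend!r}; expected one of {SUPPORTED_BACKENDS}"
        )

    # Imported lazily so the check above works without the library installed.
    from sentence_transformers import SentenceTransformer

    # With backend="onnx" or "openvino", inference runs on that runtime
    # instead of PyTorch eager execution.
    return SentenceTransformer(model_name, backend=backend)
```

Usage would look like `load_embedder("sentence-transformers/all-MiniLM-L6-v2", backend="openvino").encode(["edge deployment"])`, which returns the embeddings as usual; only the runtime underneath changes.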
