Capability
9 artifacts provide this capability.
via “onnx model export and optimization for edge deployment”
Fast local neural TTS optimized for Raspberry Pi and edge devices.
Unique: Implements ONNX export with built-in quantization and operator fusion tuned specifically for the VITS architecture, reducing model size by 50-70% with minimal quality loss compared with generic ONNX converters.
vs others: More optimized for TTS than generic ONNX export tools; supports quantization strategies specific to VITS; produces models 2-3x smaller than unoptimized exports while maintaining quality.
via “onnx-and-openvino-export-for-edge-deployment”
sentence-similarity model. 2,530,482 downloads.
Unique: Provides native ONNX and OpenVINO export support with quantization-friendly architecture (no custom ops). Enables deployment on edge devices and CPU-only infrastructure with minimal code changes, supporting both float32 and int8 quantized inference.
vs others: Faster edge deployment than PyTorch models because ONNX Runtime and OpenVINO use optimized inference engines with hardware-specific optimizations, and quantization support reduces model size by 4x and latency by 2-3x compared to full-precision models.
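The 4x size figure follows from storing each weight in one byte (int8) instead of four (float32). The sketch below illustrates that arithmetic with symmetric per-tensor quantization, using only the Python standard library. The weight values are made up for illustration, and real ONNX or OpenVINO toolchains use their own quantization schemes, which this does not reproduce.

```python
from array import array


def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: w is approximated by q * scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    # Round to the nearest integer step and clamp to the int8 range.
    q = array("b", (max(-127, min(127, round(w / scale))) for w in weights))
    return q, scale


def dequantize(q, scale):
    """Map int8 values back to approximate float weights."""
    return [v * scale for v in q]


# Hypothetical weights, for illustration only.
weights = [0.5, -1.0, 0.25, 0.75, -0.125, 0.0, 1.0, -0.5]
fp32 = array("f", weights)  # 4 bytes per value
q, scale = quantize_int8(weights)  # 1 byte per value

size_ratio = (len(fp32) * fp32.itemsize) / (len(q) * q.itemsize)
max_error = max(abs(a - b) for a, b in zip(weights, dequantize(q, scale)))
print(f"size ratio: {size_ratio:.0f}x, max abs error: {max_error:.4f}")
```

The rounding error per weight is bounded by half a quantization step (`scale / 2`), which is why the size saving usually comes with only a small loss in precision.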
via “onnx and openvino model export for edge deployment”
sentence-similarity model. 7,032,108 downloads.
Unique: Provides pre-optimized ONNX and OpenVINO representations of multilingual-e5-small, enabling single-model deployment across diverse hardware (CPUs, mobile, edge) without language-specific optimizations. OpenVINO export includes graph-level optimizations (operator fusion, constant folding) and quantization-aware training compatibility, reducing inference latency by 2-4x on Intel CPUs.
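To make "constant folding" concrete, here is a toy sketch on a tiny expression graph. It is not OpenVINO's implementation; the `Const`, `Input`, and `Op` classes are hypothetical stand-ins for graph nodes. The idea shown is that any sub-expression whose inputs are all constants can be computed once at export time rather than on every inference call.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Const:
    value: float


@dataclass(frozen=True)
class Input:
    name: str


@dataclass(frozen=True)
class Op:
    kind: str  # "add" or "mul" in this toy graph
    left: "Node"
    right: "Node"


Node = Union[Const, Input, Op]


def fold_constants(node):
    """Replace sub-expressions whose operands are all constants with one Const."""
    if not isinstance(node, Op):
        return node
    left = fold_constants(node.left)
    right = fold_constants(node.right)
    if isinstance(left, Const) and isinstance(right, Const):
        if node.kind == "add":
            return Const(left.value + right.value)
        if node.kind == "mul":
            return Const(left.value * right.value)
    return Op(node.kind, left, right)


def count_ops(node):
    """Count the operations that would still run at inference time."""
    if not isinstance(node, Op):
        return 0
    return 1 + count_ops(node.left) + count_ops(node.right)


# x * (2 * 0.5) + (1 + 3): two of the four ops depend only on constants.
graph = Op(
    "add",
    Op("mul", Input("x"), Op("mul", Const(2.0), Const(0.5))),
    Op("add", Const(1.0), Const(3.0)),
)
folded = fold_constants(graph)
print(count_ops(graph), "ops before,", count_ops(folded), "ops after")
```

Operator fusion works in a similar spirit, merging adjacent operations into one kernel, but it depends on the target hardware and is not shown here.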
vs others: Smaller and faster than PyTorch deployment for edge use cases; more portable than TensorFlow Lite (which lacks transformer support); enables privacy-preserving on-device inference without cloud dependencies.
via “onnx export for cross-platform deployment”
A generative speech model for daily dialogue.
Unique: Provides ONNX export capability for all major pipeline components (GPT, DVAE, Vocos), enabling end-to-end deployment without PyTorch. The export process includes optimization and quantization options, enabling deployment on resource-constrained devices.
vs others: More flexible than PyTorch-only deployment because ONNX enables use of alternative inference runtimes (ONNX Runtime, TensorRT, CoreML). More portable than TorchScript because ONNX is a standard format with broad ecosystem support.
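One way to target different runtimes from a single ONNX file is through ONNX Runtime's execution providers, which include TensorRT and CoreML backends alongside the default CPU one. The sketch below is an assumption about how such a deployment might be wired up, not this project's own code; `pick_providers`, `open_session`, and `model_path` are hypothetical names.

```python
def pick_providers(preferred, available):
    """Keep preferred providers that are installed, always ending with a CPU fallback."""
    chosen = [p for p in preferred if p in available]
    if "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")
    return chosen


def open_session(model_path, preferred):
    """Open an ONNX Runtime session; model_path is a hypothetical exported file."""
    import onnxruntime as ort  # imported lazily so pick_providers works without it

    providers = pick_providers(preferred, ort.get_available_providers())
    return ort.InferenceSession(model_path, providers=providers)


# On a machine with only the CPU build installed, the preference list
# degrades gracefully to the CPU provider.
print(pick_providers(
    ["TensorrtExecutionProvider", "CoreMLExecutionProvider"],
    ["CPUExecutionProvider"],
))
```

Providers are tried in list order, so putting the CPU provider last keeps the same script usable on machines without an accelerator.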
via “onnx and openvino model export for edge deployment”
sentence-similarity model. 3,660,082 downloads.
Unique: Supports three inference backends (PyTorch, ONNX Runtime, OpenVINO) from a single model artifact, with automatic optimization for each target platform: ONNX for cross-platform compatibility, OpenVINO for Intel hardware, and PyTorch for development.
vs others: More portable than PyTorch-only deployment and faster than unoptimized ONNX due to OpenVINO's graph-level optimizations; enables 2-4x latency reduction on CPU compared to PyTorch inference.
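Assuming the model is loaded through the sentence-transformers library (the listing does not say so for this entry), backend selection might look like the sketch below. Recent sentence-transformers releases accept a `backend` argument on `SentenceTransformer`; the `load_encoder` wrapper and the model id passed to it are hypothetical.

```python
SUPPORTED_BACKENDS = ("torch", "onnx", "openvino")


def load_encoder(model_id, backend="torch"):
    """Load one model artifact with the chosen inference backend.

    model_id is a placeholder; substitute the actual repository id.
    """
    if backend not in SUPPORTED_BACKENDS:
        raise ValueError(
            f"unknown backend {backend!r}; expected one of {SUPPORTED_BACKENDS}"
        )
    # Imported lazily so the validation above works without the library.
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(model_id, backend=backend)
```

A call such as `load_encoder(model_id, backend="openvino")` would then be the only line that changes between a development machine and an Intel CPU target. The ONNX and OpenVINO backends need the corresponding optional dependencies installed.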
via “onnx and openvino model export for edge and on-premise deployment”
sentence-similarity model. 1,778,169 downloads.
Unique: Provides native ONNX and OpenVINO export through sentence-transformers' built-in conversion utilities, supporting both full-precision and quantized models without custom export code. The export process preserves the tokenizer and preprocessing logic, enabling end-to-end inference without reimplementing text preprocessing.
vs others: One-command export to multiple formats (ONNX, OpenVINO) with quantization support, whereas most models require separate conversion pipelines and manual tokenizer integration for edge deployment.
via “onnx export for edge deployment and inference optimization”
token-classification model. 1,811,113 downloads.
Unique: Supports ONNX export via transformers' built-in export utilities, enabling deployment on ONNX Runtime which provides hardware-specific optimizations (graph fusion, operator fusion, quantization) without retraining. ONNX models are framework-agnostic and can run on CPU, GPU, or specialized accelerators (NPU, TPU) via different ONNX Runtime backends.
vs others: Faster and smaller than PyTorch checkpoints due to graph optimization, and more portable than TensorFlow SavedModel, but requires additional conversion step and validation compared to native PyTorch deployment.
via “onnx and openvino quantized inference for edge deployment”
feature-extraction model. 1,337,383 downloads.
Unique: Provides both ONNX and OpenVINO export formats with INT8 quantization pre-applied, enabling plug-and-play edge deployment without requiring custom quantization pipelines. Maintains <2% accuracy loss through careful calibration on representative text samples, unlike generic quantization approaches that often degrade embedding quality.
vs others: Faster edge inference than Sentence-BERT's standard PyTorch format (2-4x speedup via INT8) and more accessible than proprietary edge models like TensorFlow Lite, with no vendor lock-in.
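Calibration on representative samples can be pictured as choosing the quantization range from values observed on real inputs, then checking how far the quantized embedding drifts from the original. The sketch below uses max-magnitude calibration and cosine similarity as a quality proxy. Both are illustrative choices rather than the method behind the figure quoted above, and all vectors are made up.

```python
import math


def calibrate_scale(samples):
    """Pick one int8 scale from the largest magnitude seen in calibration samples."""
    max_abs = max(abs(v) for sample in samples for v in sample)
    return max_abs / 127 if max_abs else 1.0


def fake_quantize(vec, scale):
    """Round to the int8 grid and map back to floats, simulating INT8 precision."""
    return [max(-127, min(127, round(v / scale))) * scale for v in vec]


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical activations from representative text samples.
calibration = [[0.12, -0.80, 0.33, 0.05], [0.40, 0.22, -0.61, 0.90]]
scale = calibrate_scale(calibration)

# Hypothetical embedding for a new input.
embedding = [0.31, -0.47, 0.18, 0.66]
similarity = cosine(embedding, fake_quantize(embedding, scale))
print(f"cosine similarity after simulated INT8: {similarity:.5f}")
```

If the calibration samples do not cover the value range seen in production, inputs get clamped and similarity drops, which is why representative samples matter.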
via “model-export-to-onnx-and-openvino-backends”
Embeddings, Retrieval, and Reranking
Unique: Exports models to ONNX and OpenVINO formats with optional quantization, enabling deployment on CPU-only and edge devices without the PyTorch runtime. This makes it more flexible to deploy than PyTorch-only alternatives.
vs others: Enables deployment on resource-constrained devices because ONNX/OpenVINO models are smaller and faster than PyTorch models, whereas PyTorch-only libraries require a full runtime installation.
© 2026 Unfragile. The platform for software for agents.