ONNX Runtime — Framework — 44/100
via “graph-level optimization with operator fusion and memory planning”
Cross-platform ML inference accelerator — runs ONNX models across diverse hardware backends through pluggable execution providers, applying graph-level optimizations along the way.
Unique: Implements a modular optimizer pipeline (onnxruntime/core/optimizer/graph_transformer.h) in which each optimization pass (constant folding, fusion, layout optimization) is a separate transformer class, allowing passes to be selectively enabled, disabled, and composed. The memory planner (onnxruntime/core/framework/allocation_planner.cc) uses a graph-coloring-style algorithm to compute tensor lifetimes and maximize buffer reuse across the entire computation graph.
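The lifetime-based buffer reuse described above can be sketched as a greedy interval-assignment pass. This is a minimal illustration, not ONNX Runtime's actual allocation_planner.cc logic; the function name `plan_buffers` and the `(first_use, last_use)` lifetime representation are assumptions for the example.

```python
def plan_buffers(lifetimes):
    """Greedily assign tensors to physical buffers so that tensors
    with non-overlapping lifetimes share a buffer.

    lifetimes: dict mapping tensor name -> (first_use_step, last_use_step).
    Returns (assignment, buffer_count), where assignment maps each
    tensor to a buffer index.
    """
    buffers = []       # buffers[i] = step through which buffer i is occupied
    assignment = {}
    # Process tensors in order of first use, as a topological schedule would.
    for name, (start, end) in sorted(lifetimes.items(), key=lambda kv: kv[1][0]):
        for i, busy_until in enumerate(buffers):
            if busy_until < start:          # previous occupant is dead: reuse
                buffers[i] = end
                assignment[name] = i
                break
        else:                                # no free buffer: allocate a new one
            buffers.append(end)
            assignment[name] = len(buffers) - 1
    return assignment, len(buffers)


# Four tensors, but overlapping lifetimes only ever need two buffers:
assignment, n = plan_buffers({"A": (0, 2), "B": (1, 3), "C": (3, 5), "D": (4, 6)})
print(assignment, n)  # two physical buffers suffice for four tensors
```

Because the plan is computed once over the whole graph, peak memory is known before the first inference runs — the property that makes this approach attractive on embedded targets.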
vs others: Applies more aggressive fusion than TensorFlow's graph optimization, fusing across operator boundaries (including whole attention patterns). It also performs explicit ahead-of-time memory planning, in contrast to PyTorch's dynamic allocation, which makes memory usage predictable on embedded devices.