Model Compression Through Pruning And Distillation

1. TensorFlow Lite (Framework, 58/100)

Matched capability: “model compression through pruning and structured sparsity support”

Lightweight ML inference for mobile and edge devices.

Unique: Runtime support for pruned and sparsified models. The runtime can skip zero-valued weights and store tensors in sparse formats, which enables compression beyond quantization for models trained with sparsity constraints.

vs others: Complementary to quantization, giving additional compression on top of it; however, it requires sparsity to be introduced at training time, and the sparse tensor formats it relies on are not fully standardized or documented.
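To show why skipping zero-valued weights saves storage and compute, here is a minimal sketch assuming a CSR (compressed sparse row) layout. It is a generic illustration, not TensorFlow Lite's internal sparse format or API; `to_csr` and `csr_matvec` are hypothetical helper names.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def to_csr(dense: Matrix) -> Tuple[List[float], List[int], List[int]]:
    """Convert a dense weight matrix to CSR: nonzero values, their column indices, row pointers.

    Hypothetical helper; illustrates a generic CSR layout, not TFLite's on-disk format.
    """
    values: List[float] = []
    col_idx: List[int] = []
    row_ptr: List[int] = [0]
    for row in dense:
        for j, w in enumerate(row):
            if w != 0.0:  # zero-valued weights are simply not stored
                values.append(w)
                col_idx.append(j)
        row_ptr.append(len(values))  # row i spans values[row_ptr[i]:row_ptr[i + 1]]
    return values, col_idx, row_ptr


def csr_matvec(values: List[float], col_idx: List[int], row_ptr: List[int],
               x: List[float]) -> List[float]:
    """Compute y = W @ x, touching only the stored (nonzero) weights."""
    y: List[float] = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


if __name__ == "__main__":
    W = [
        [0.0, 2.0, 0.0, 0.0],
        [1.5, 0.0, 0.0, -1.0],
        [0.0, 0.0, 0.0, 0.0],
    ]
    vals, cols, ptr = to_csr(W)
    print(len(vals), "of", sum(len(r) for r in W), "weights stored")
    print(csr_matvec(vals, cols, ptr, [1.0, 2.0, 3.0, 4.0]))
```

The work done is proportional to the number of nonzeros rather than the full matrix size, which is why this only pays off for models trained to be highly sparse.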

2. DeepSpeed (Framework, 57/100)

Microsoft's distributed training library, featuring the ZeRO optimizer, trillion-parameter-scale training, and RLHF support.

Unique: Combines structured pruning with knowledge distillation, and supports both unstructured and structured sparsity patterns with automatic fine-tuning to recover accuracy.

vs others: More integrated than separate pruning and distillation tools; automatic fine-tuning reduces manual tuning effort.
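The two techniques this entry combines can be sketched in a framework-agnostic way. This does not use DeepSpeed's API. It assumes magnitude-based criteria: unstructured pruning zeroes the individually smallest weights, while structured pruning removes whole rows (for example, output neurons) ranked by L2 norm. The distillation loss is the common temperature-scaled KL divergence between teacher and student softmax outputs, multiplied by T². All function names are hypothetical, and the fine-tuning loop that recovers accuracy is omitted.

```python
import math
from typing import List

Matrix = List[List[float]]


def prune_unstructured(w: Matrix, sparsity: float) -> Matrix:
    """Zero the int(sparsity * n) smallest-magnitude individual weights (assumed criterion)."""
    ranked = sorted((abs(v), i, j) for i, row in enumerate(w) for j, v in enumerate(row))
    k = int(sparsity * len(ranked))  # number of weights to remove
    out = [row[:] for row in w]
    for _, i, j in ranked[:k]:
        out[i][j] = 0.0
    return out


def prune_structured_rows(w: Matrix, n_keep: int) -> Matrix:
    """Keep the n_keep rows with the largest L2 norm and zero every other row entirely."""
    norms = [math.sqrt(sum(v * v for v in row)) for row in w]
    keep = set(sorted(range(len(w)), key=lambda i: norms[i], reverse=True)[:n_keep])
    return [row[:] if i in keep else [0.0] * len(row) for i, row in enumerate(w)]


def softmax(logits: List[float], temperature: float) -> List[float]:
    """Numerically stable softmax of logits divided by the temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: List[float], teacher_logits: List[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) on temperature-softened distributions, scaled by T^2."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s) if pt > 0)
    return kl * temperature ** 2


if __name__ == "__main__":
    W = [[0.1, -0.9, 0.05], [0.4, 0.0, -0.3], [1.2, 0.2, -0.6]]
    print(prune_unstructured(W, 0.5))  # scattered zeros, irregular pattern
    print(prune_structured_rows(W, 2))  # one whole row removed
    print(distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0]))
```

Unstructured pruning can reach higher sparsity for the same accuracy, but its scattered zeros need sparse kernels to yield speedups. Structured pruning removes whole rows, so the layer can be shrunk into a smaller dense matrix, which is why fine-tuning (often with a distillation loss like the one above) is typically used to recover the accuracy lost.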

3. OPT (Model, 23/100)

Matched capability: “model distillation and compression for deployment”

Open Pre-trained Transformers (OPT), released by Meta AI (formerly Facebook AI), is a suite of decoder-only pre-trained transformers. [Announcement](https://ai.meta.com/blog/democratizing-access-to-large-scale-language-models-with-opt-175b/).
