Capability
Activation Checkpointing With Selective Layer Recomputation
2 artifacts provide this capability.
Top Matches
DeepSpeed: Microsoft's distributed training library, with the ZeRO optimizer, trillion-parameter scale, and RLHF support.
Unique: Selective layer-wise checkpointing that recomputes only the expensive layers (attention, MLP) while keeping normalization activations stored, achieving 30-50% memory reduction at under 10% extra compute; it integrates transparently through the gradient-checkpointing API (see the sketch below).
vs others: Finer-grained than full-model checkpointing; lower overhead than storing all activations.
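A minimal PyTorch sketch of the selective pattern, not DeepSpeed's actual implementation: the `SelectiveCheckpointBlock` class, its dimensions, and the `_attn_fn` helper are illustrative. It wraps only the attention and MLP sublayers in `torch.utils.checkpoint.checkpoint`, so their activations are recomputed on backward, while the cheap LayerNorm activations stay stored.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class SelectiveCheckpointBlock(nn.Module):
    """Transformer block that recomputes only its expensive sublayers.

    Attention and MLP activations are discarded in the forward pass and
    recomputed during backward; the small LayerNorm activations are kept.
    Names and sizes here are illustrative, not DeepSpeed internals.
    """

    def __init__(self, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def _attn_fn(self, x: torch.Tensor) -> torch.Tensor:
        # Expensive sublayer: recomputed on backward instead of stored.
        out, _ = self.attn(x, x, x, need_weights=False)
        return out

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # LayerNorm runs outside checkpoint(): its activations are stored.
        h = self.norm1(x)
        # Attention runs inside checkpoint(): activations are recomputed.
        x = x + checkpoint(self._attn_fn, h, use_reentrant=False)
        h = self.norm2(x)
        # Same treatment for the MLP sublayer.
        x = x + checkpoint(self.mlp, h, use_reentrant=False)
        return x


if __name__ == "__main__":
    block = SelectiveCheckpointBlock()
    x = torch.randn(2, 16, 512, requires_grad=True)
    block(x).sum().backward()  # triggers recomputation of attn and MLP
```

The design choice this illustrates: the checkpoint boundary is drawn per sublayer rather than per block, so only the memory-dominant tensors (attention and MLP intermediates) are traded for recompute, which is where the quoted 30-50% savings at under 10% compute cost come from.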