Capability
Activation Checkpointing With Selective Layer Recomputation
2 artifacts provide this capability.
Top Matches
DeepSpeed: Microsoft's distributed training library, with the ZeRO optimizer, trillion-parameter scale, and RLHF support.
Unique: Selective layer-wise checkpointing that recomputes only the expensive layers (attention, MLP) while keeping normalization activations stored, achieving 30-50% memory reduction at under 10% extra compute; it integrates transparently through the gradient-checkpointing API (see the sketch below).
vs others: Finer-grained than full-model checkpointing; lower overhead than storing all activations.
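A minimal PyTorch sketch of the selective pattern, not DeepSpeed's actual implementation: the `SelectiveCheckpointBlock` class, its dimensions, and the `_attn_fn` helper are illustrative. It wraps only the attention and MLP sublayers in `torch.utils.checkpoint.checkpoint`, so their activations are recomputed on backward, while the cheap LayerNorm activations stay stored.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class SelectiveCheckpointBlock(nn.Module):
    """Transformer block that recomputes only its expensive sublayers.

    Attention and MLP activations are discarded in the forward pass and
    recomputed during backward; the small LayerNorm activations are kept.
    Names and sizes here are illustrative, not DeepSpeed internals.
    """

    def __init__(self, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def _attn_fn(self, x: torch.Tensor) -> torch.Tensor:
        # Expensive sublayer: recomputed on backward instead of stored.
        out, _ = self.attn(x, x, x, need_weights=False)
        return out

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # LayerNorm runs outside checkpoint(): its activations are stored.
        h = self.norm1(x)
        # Attention runs inside checkpoint(): activations are recomputed.
        x = x + checkpoint(self._attn_fn, h, use_reentrant=False)
        h = self.norm2(x)
        # Same treatment for the MLP sublayer.
        x = x + checkpoint(self.mlp, h, use_reentrant=False)
        return x


if __name__ == "__main__":
    block = SelectiveCheckpointBlock()
    x = torch.randn(2, 16, 512, requires_grad=True)
    block(x).sum().backward()  # triggers recomputation of attn and MLP
```

The design choice this illustrates: the checkpoint boundary is drawn per sublayer rather than per block, so only the memory-dominant tensors (attention and MLP intermediates) are traded for recompute, which is where the quoted 30-50% savings at under 10% compute cost come from.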