Capability
Distributed Model Training With Data Parallelism
20 artifacts provide this capability.
Top Matches
via “distributed training with ddp and fsdp for multi-gpu scaling”
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer
Unique: Implements both DDP and FSDP strategies, selecting between them automatically based on model size and hardware configuration; integrated checkpoint management serializes distributed state and converts it back to single-GPU format (see the sketches after this entry).
vs others: Provides flexible distributed training with both replicated data parallelism (DDP) and fully sharded data parallelism (FSDP), enabling scaling from 2 GPUs to 100+ GPUs without code changes.
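For a rough picture of how automatic strategy selection between DDP and FSDP can work, here is a minimal PyTorch sketch. The `wrap_model` helper and its memory heuristic are illustrative assumptions, not SANA's actual code:

```python
import torch
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def wrap_model(model: torch.nn.Module) -> torch.nn.Module:
    """Hypothetical helper: pick DDP or FSDP from model size vs. GPU memory."""
    device = torch.cuda.current_device()
    n_params = sum(p.numel() for p in model.parameters())
    # Rough fp32 training footprint: params + grads + Adam moments ~ 16 B/param.
    est_bytes = n_params * 16
    gpu_bytes = torch.cuda.get_device_properties(device).total_memory
    model = model.cuda(device)
    if est_bytes > 0.5 * gpu_bytes:
        # Model state won't comfortably fit on one GPU: shard it across ranks.
        return FSDP(model)
    # Otherwise replicate the model on every rank and all-reduce gradients.
    return DDP(model, device_ids=[device])
```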
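Converting sharded FSDP state back to a plain single-GPU checkpoint can be done with PyTorch's full-state-dict APIs. A minimal sketch, assuming `model` is already FSDP-wrapped and a process group is initialized; the output path is illustrative:

```python
import torch
import torch.distributed as dist
from torch.distributed.fsdp import (
    FullyShardedDataParallel as FSDP,
    StateDictType,
    FullStateDictConfig,
)

# Gather the sharded parameters into one ordinary state dict on rank 0,
# offloading to CPU so the full model need not fit on a single GPU.
cfg = FullStateDictConfig(offload_to_cpu=True, rank0_only=True)
with FSDP.state_dict_type(model, StateDictType.FULL_STATE_DICT, cfg):
    full_state = model.state_dict()

if dist.get_rank() == 0:
    # This checkpoint loads into an unwrapped model on a single GPU.
    torch.save(full_state, "checkpoint_single_gpu.pt")  # illustrative path
```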