Learning to Shard: RL for Co-optimizing the Parallelism Degrees and Per-operator Sharding Dimensions in Distributed LLM Inference

arXiv – cs.LG Original
