Diffusion LLMs Can Do Faster-Than-AR Inference via Discrete Diffusion Forcing
Related Articles
arXiv – cs.LG • Efficient Long-Context Inference: Write-Gated KV Reduces Memory Requirements by up to 57%
arXiv – cs.LG • New Technique: Backward-on-Entropy Steering Optimizes Masked Diffusion Models
PyTorch – Blog • Hybrid Models as First-Class Citizens in vLLM
arXiv – cs.AI • Saber: An Efficient Sampling with Adaptive Acceleration and Backtracking Enhanced Remasking for Diffusion Language Model
arXiv – cs.LG • TokenFlow: Responsive LLM Text Streaming Serving under Request Burst via Preemptive Scheduling
arXiv – cs.LG • Inpainting-Guided Policy Optimization for Diffusion Large Language Models