d2: Improved Techniques for Training Reasoning Diffusion Language Models
Similar articles
arXiv – cs.LG
•
Aligning Diffusion Language Models via Unpaired Preference Optimization
arXiv – cs.LG
•
Nash Policy Gradient: A Policy Gradient Method with Iteratively Refined Regularization for Finding Nash Equilibria
arXiv – cs.AI
•
Probabilistic Modeling of Intentions in Socially Intelligent LLM Agents
arXiv – cs.AI
•
Leading the Follower: LLM Agents Master Social Deduction Through Persuasive Communication
arXiv – cs.LG
•
Fine-Tuning Diffusion Models via Intermediate Distribution Shaping
arXiv – cs.AI
•
Robix: A Unified Model for Robot Interaction, Reasoning and Planning