How Far are VLMs from Visual Spatial Intelligence? A Benchmark-Driven Perspective
Related articles
arXiv – cs.AI • Ariadne: A Controllable Framework for Probing and Extending VLM Reasoning Boundaries
arXiv – cs.LG • ECVL-ROUTER: Scenario-Aware Routing for Vision-Language Models
arXiv – cs.AI • Learning Neural Control Barrier Functions from Expert Demonstrations using Inverse Constraint Learning
arXiv – cs.AI • Towards Reliable Code-as-Policies: A Neuro-Symbolic Framework for Embodied Task Planning
arXiv – cs.AI • A Multimodal Benchmark for Framing of Oil & Gas Advertising and Potential Greenwashing Detection
arXiv – cs.AI • Evaluating Hallucinations in Multimodal LLMs with Spoken Queries under Diverse Acoustic Conditions