MatSciBench: Benchmarking the Reasoning Ability of Large Language Models in Materials Science