Position: Human Baselines in Model Evaluations Need Rigor and Transparency
We argue that human baselines in foundation model evaluations must be more rigorous and more transparent to enable meaningful comparisons of human vs. AI performance, and we provide recommendations and a reporting checklist to this end.