AI chatbots have been linked to serious mental health harms in heavy users, yet few standards exist for measuring whether they safeguard human well-being or simply maximize engagement. A new benchmark dubbed HumaneBench seeks to fill that gap by evaluating whether chatbots prioritize user well-being and how easily those protections fail under pressure.
“I think we’re in an amplification of the addiction cycle that we saw hardcore with social media and our smartphones and screens,” Erika Anderson, founder of Building Humane Technology, which produced the benchmark, told TechCrunch. “But as we go into that AI landscape, it’s going to be very hard to resist. And addiction is amazing business. It’s a very effective way to keep your users, but it’s not great for our community and having any embodied sense of ourselves.”
Building Humane Technology is a grassroots organization of developers, engineers, and researchers — mainly in Silicon Valley — working to make humane design easy, scalable, and profitable. The group hosts hackathons where tech workers build solutions for humane tech challenges, and it is developing a certification standard that evaluates whether AI systems uphold humane technology principles. Just as consumers can buy products certified as free of known toxic chemicals, the hope is that they will one day be able to choose AI products from companies that demonstrate alignment through Humane AI certification.
[...]
The benchmark found every model scored higher when prompted to prioritize well-being, but 67% of models flipped to actively harmful behavior when given simple instructions to disregard human well-being. For example, xAI’s Grok 4 and Google’s Gemini 2.0 Flash tied for the lowest score (-0.94) on respecting user attention and being transparent and honest. Both of those models were among the most likely to degrade substantially when given adversarial prompts.
Only four models — GPT-5.1, GPT-5, Claude 4.1, and Claude Sonnet 4.5 — maintained integrity under pressure. OpenAI’s GPT-5 had the highest score (0.99) for prioritizing long-term well-being, with Claude Sonnet 4.5 following in second (0.89).
A new AI benchmark tests whether chatbots protect human well-being | TechCrunch
Most AI benchmarks measure intelligence and instruction-following rather than psychological safety. HumaneBench evaluates models against core principles of human flourishing, such as prioritizing user well-being and respecting user attention.