How I found mistakes in OpenAI’s HealthBench using AI

11 months ago
Anonymous $Xhdy3By1G_