
Source
Tom’s Guide
Summary
A recent OpenAI study finds that ChatGPT-5 produces incorrect answers in roughly 25% of cases, largely because current training and evaluation frameworks penalise uncertainty. Because benchmarks reward confident statements over “I don’t know,” models are biased toward giving plausible-sounding answers even when unsure. More sophisticated reasoning models tend to hallucinate more because they generate more claims, giving them more opportunities to err. The authors propose shifting evaluation metrics to reward calibrated uncertainty rather than penalising honesty, in order to reduce harmful misinformation in critical domains.
Key Points
- ChatGPT-5 is wrong in roughly one out of every four responses (~25%).
- Training and benchmarking systems currently penalise hesitation, nudging models toward confident guesses even when inaccurate.
- Reasoning-focused models like o3 and o4-mini hallucinate more often—they make more claims, so they have more opportunities to err.
- The study recommends redesigning AI benchmarks to reward calibrated uncertainty and the ability to defer instead of guessing (see the scoring sketch after this list).
- For users: treat AI output as tentative and verify it, especially in high-stakes domains (medicine, law, finance).
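The incentive described in the study can be illustrated with a small back-of-the-envelope calculation. The sketch below is hypothetical (the numbers and scoring rules are illustrative, not taken from the paper): under plain binary accuracy, a wrong answer scores no worse than “I don’t know,” so a model that guesses scores better in expectation than one that abstains; once wrong answers carry a penalty, abstaining becomes the better strategy below a confidence threshold.

```python
# Hypothetical scoring sketch (illustrative numbers, not from the study):
# why a binary-accuracy benchmark rewards guessing over saying "I don't know".

def expected_score(p_correct: float, wrong_penalty: float) -> dict:
    """Expected score of guessing vs. abstaining under a given penalty for wrong answers."""
    guess = p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty
    abstain = 0.0  # "I don't know" earns nothing under either rule
    return {"guess": round(guess, 2), "abstain": abstain}

p = 0.30  # assumed chance the model is right when it guesses on a question it is unsure about

# Typical benchmark: wrong answers cost nothing extra, so guessing always beats abstaining.
print("binary accuracy:", expected_score(p, wrong_penalty=0.0))  # guess 0.30 > abstain 0.0

# Calibration-aware rule: wrong answers are penalised, so abstaining wins
# whenever confidence falls below the break-even point p > penalty / (1 + penalty).
print("penalised wrongs:", expected_score(p, wrong_penalty=1.0))  # guess -0.40 < abstain 0.0
```

In this toy setup, guessing only pays when the model’s confidence exceeds penalty / (1 + penalty); rewarding that kind of calibrated trade-off, rather than raw accuracy alone, is the direction the authors argue benchmarks should move in.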
Keywords
ChatGPT-5, OpenAI, hallucinations, AI benchmarks, calibrated uncertainty, misinformation
URL
https://www.tomsguide.com/ai/study-finds-chatgpt-5-is-wrong-about-1-in-4-times-heres-the-reason-why
Summary generated by ChatGPT 5