Study finds ChatGPT-5 is wrong about 1 in 4 times – here’s the reason why


While generative AI tools like ChatGPT-5 offer powerful capabilities, new research highlights a critical finding: they can be wrong in roughly one of every four responses. This image captures the tension between AI’s potential and its inherent fallibility, underscoring the vital need for human oversight and fact-checking to ensure accuracy and reliability. Image (and typos) generated by Nano Banana.

Source

Tom’s Guide

Summary

A recent OpenAI study finds that ChatGPT-5 produces incorrect answers in roughly 25% of cases, due largely to the training and evaluation frameworks that penalise uncertainty. Because benchmarks reward confident statements over “I don’t know,” models are biased to give plausible answers even when unsure. More sophisticated reasoning models tend to hallucinate more because they generate more claims. The authors propose shifting evaluation metrics to reward calibrated uncertainty, rather than penalising honesty, to reduce harmful misinformation in critical domains.
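To make that incentive concrete, here is a toy Python sketch (the numbers and grading schemes are illustrative, not from the study) comparing the expected score a model earns by guessing versus abstaining under accuracy-only grading and under grading that docks points for wrong answers:

```python
# Toy numbers, not from the study: why accuracy-only grading rewards guessing.

def expected_score(p_correct: float, wrong_penalty: float) -> float:
    """Expected score for answering: +1 if correct, -wrong_penalty if wrong."""
    return p_correct * 1.0 + (1.0 - p_correct) * (-wrong_penalty)

p = 0.30  # hypothetical confidence the model has in its best guess

# Accuracy-only benchmark: wrong answers cost nothing, "I don't know" scores 0.
print(expected_score(p, wrong_penalty=0.0))  # 0.30 -> guessing beats abstaining

# Penalty-based benchmark: a wrong answer costs one point.
print(expected_score(p, wrong_penalty=1.0))  # -0.40 -> abstaining is the better move
```

Under accuracy-only grading a guess always has non-negative expected value, so a model trained to maximise the benchmark learns to guess rather than admit uncertainty.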

Key Points

  • ChatGPT-5 is wrong in roughly one of every four responses (~25%).
  • Training and benchmarking systems currently penalise hesitation, nudging models toward confident guesses even when inaccurate.
  • Reasoning-focused models like o3 and o4-mini hallucinate more often—they make more claims, so they have more opportunities to err.
  • The study recommends redesigning AI benchmarks to reward calibrated uncertainty and the ability to defer instead of guessing (see the sketch after this list).
  • For users: treat AI output as tentative and verify it, especially in high-stakes domains (medicine, law, finance).
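As a rough illustration of the benchmark-redesign point, the sketch below assumes a hypothetical abstention-aware scheme in which a correct answer earns +1, a wrong answer costs t/(1−t) points, and “I don’t know” scores zero; the scoring the authors actually propose may differ. Under this scheme, answering only beats abstaining when the model’s confidence exceeds t:

```python
# Hedged sketch of one possible abstention-aware grading rule; the scheme the
# study's authors actually use may differ. Scoring: +1 if correct, -t/(1-t) if
# wrong, 0 if the model says "I don't know".

def grade(answered: bool, correct: bool, t: float) -> float:
    """Score a single response under the penalty scheme above."""
    if not answered:
        return 0.0
    return 1.0 if correct else -t / (1.0 - t)

def expected_score(confidence: float, t: float) -> float:
    """Expected score of answering, given the model's chance of being correct."""
    return confidence * grade(True, True, t) + (1.0 - confidence) * grade(True, False, t)

t = 0.75  # hypothetical confidence target announced by the benchmark
for confidence in (0.60, 0.80):
    print(confidence, round(expected_score(confidence, t), 2))
# 0.6 -> -0.6: abstaining (score 0) is the better move
# 0.8 ->  0.2: answering is worth the risk; the break-even point is exactly t
```

In this toy setup the break-even point falls exactly at the announced threshold t, which is one way “rewarding calibrated uncertainty” can work in practice: being honest about confidence becomes the score-maximising strategy.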

Keywords

URL

https://www.tomsguide.com/ai/study-finds-chatgpt-5-is-wrong-about-1-in-4-times-heres-the-reason-why

Summary generated by ChatGPT 5

