Hacker News

Yes, this is quite shocking. They could have just had o3 fact-check the slides and it would have noticed...


I thought so too, but I gave it a screenshot with the prompt:

> good plot for my presentation?

and it didn't pick up on the issue. Part of its response was:

> Clear metric: Y-axis (“Accuracy (%), pass @1”) and numeric labels make the performance gaps explicit.

I think visual reasoning is still pretty far from text-only reasoning.


o3 did fact-check the slides, and it fixed its lower score.