Those don't look like rewards, or at least don't get processed as such for many ...

IshKebab · on April 8, 2025

In RL, rewards can be anything you want. They don't have to be things that humans like.

TeMPOraL · on April 8, 2025

Fair enough!

I guess you can always find some well-specified, measurable goal/reward, but then that choice limits the performance of your model. It's fine when you're building a very specialized system; it gets more difficult the more general you're trying to be.

For a general system meant to operate in a human environment, the goal ends up approaching "things that humans like". Case in point, that's what the overall LLM goal function is: continuations that make sense to humans, in the fully general sense of that phrase.