By their very nature they only "know" what they have written down and must infer the final answer from that token by token.
They fundamentally can't do certain things, such as complex iteration or backtracking.
When you ask for chain-of-thought reasoning, you allow the LLM to create a "buffer space" and break the task down into more manageable substeps, thereby improving the quality of the results.
The Bing LM, or rather the service, did have "inner monologue" in the sense of text that it would generate, but not show to the user, and treat as "thoughts" to guide the generation of an actual reply that the user would see.
We know this because it happily told us, including the JSON format it uses internally.
No, but the reconstructed examples have "im_start" and "im_end", which strongly implies that it is, if not verbatim, then a close enough restatement of the real deal. Take a look:
First you wrap the user query with "the user asked you: ... . What are the reasoning steps you need?" and then you prompt with "considering `<previous answer>` now answer `<user prompt>`".
This is clearly hackable as written, so it would need improvements.
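For what it's worth, the two-pass idea can be sketched in a few lines of Python. This is only a rough illustration, not how any real service does it: `Model`, `plan_prompt`, `answer_prompt`, `two_pass_answer` and `echo_model` are all made-up names, and the model is just any function from prompt string to completion string, so you would swap in your own API call.

```python
from typing import Callable

# Hypothetical type: any function that maps a prompt string to a completion string.
Model = Callable[[str], str]


def plan_prompt(user_query: str) -> str:
    # First pass: ask the model which reasoning steps it needs.
    return f"The user asked you: {user_query}. What are the reasoning steps you need?"


def answer_prompt(steps: str, user_query: str) -> str:
    # Second pass: feed the model's own steps back in along with the query.
    return f"Considering {steps} now answer {user_query}"


def two_pass_answer(model: Model, user_query: str) -> str:
    # The intermediate steps are never shown to the user; they only act as
    # "buffer space" that the final answer is conditioned on.
    steps = model(plan_prompt(user_query))
    return model(answer_prompt(steps, user_query))


def echo_model(prompt: str) -> str:
    # Stand-in model for trying the plumbing without a real LLM (assumption).
    return f"[{prompt}]"


if __name__ == "__main__":
    print(two_pass_answer(echo_model, "2+2?"))
```

Note that the user query is pasted straight into both prompts, which is exactly why this is hackable: anything the user types can pose as instructions in either pass.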