> This is like lamenting that a person who has a doctoral degree in, say, mathematics or physics often has no more than a basic knowledge of, for example, medicine or pharmacy.
It was word problems, not rocket science. That says a lot about human intelligence. We're much less smart than we imagine, and most of our intelligence is based on book learning, not original discovery. Causal reasoning is based on learning rules and checking their exceptions. Truly novel ideation is actually rare.
We spent years implementing transformers in a naive way until someone figured out you can do it with much less memory (FlashAttention). That was such a facepalm: a trivial idea that thousands of PhDs missed. And the code is just 3 for loops, with a multiplication, a sum and an exponential. An algorithm that fits on a napkin in its abstract form.
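For anyone who wants the napkin version spelled out, here is a minimal sketch of the streaming ("online") softmax recurrence that the memory saving rests on. It is not the actual FlashAttention kernel: there is no tiling, no GPU code, and the usual 1/sqrt(d) score scaling is omitted. The function names `naive_attention` and `streaming_attention` are made up for illustration.

```python
import math

def naive_attention(Q, K, V):
    """Reference version: builds the full list of scores for each query."""
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in K]
        m = max(scores)                            # subtract max for stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * v[d] for w, v in zip(weights, V)) / total
            for d in range(len(V[0]))
        ])
    return out

def streaming_attention(Q, K, V):
    """Same result, keeping only a running max, running sum and running output."""
    out = []
    for q in Q:                                    # loop 1: queries
        m = -math.inf                              # running max of the scores
        l = 0.0                                    # running softmax denominator
        acc = [0.0] * len(V[0])                    # running unnormalized output
        for k, v in zip(K, V):                     # loop 2: keys and values
            s = 0.0
            for qi, ki in zip(q, k):               # loop 3: dot product
                s += qi * ki                       # the multiplication and the sum
            m_new = max(m, s)
            scale = math.exp(m - m_new)            # rescales the old state (0.0 on the first key)
            p = math.exp(s - m_new)                # the exponential
            l = l * scale + p
            acc = [a * scale + p * vd for a, vd in zip(acc, v)]
            m = m_new
        out.append([a / l for a in acc])
    return out
```

The naive version holds one score per key before it can normalize; the streaming version only holds state proportional to the value dimension per query, which is where the memory saving comes from. The real thing applies the same rescaling trick to blocks of keys at a time inside a fused GPU kernel.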
Doesn't this lead you to, perhaps, question the category and measure of "intelligence" in general, especially how it is mobilized in this kind of context? Like this very angle does a lot to point out the contradictions in some speculative metaphysical category of "intelligence" or "being smart," but then you just seem to accept it in this particular kind of fatalism.
Why not take away from this that "intelligence" is a word that only means something relative to a particular society, namely, one which values some kinds of behavior and speech over others? "Intelligence" is something important to society; it's the individual who negotiates (or not) the way they think and learn with what this particular signifier connects with at a given place and time.
Like, I assume you don't agree, but just perhaps if we use our "intelligence" we could maybe come to some different conclusions here! Everyone is just dying to be a mid-20th-century behaviorist now, I just don't understand!
Yes, I think intelligence is social and we kind of write off the social part and prefer to think in heroic terms, like "Einstein was so smart!"
I prefer to use the concept of search instead; it is better defined, in terms of a search space and a goal space. It doesn't hide the environment, the external part of intelligence, or the learning process.
> And the code is just 3 for loops, with a multiplication, a sum and an exponential.
All invented/discovered and formalized by humans. That we found so much (unexpected) power in such simple abstractions is not a failure but a testament to the absolute ingenuity of human pursuit of knowledge.
The mistake is that we’re overestimating isolated discoveries and underestimating their second-order effects.
> a testament to the absolute ingenuity of human pursuit of knowledge
I think it is more like searching and stumbling onto some great idea than pure-brain-ingenuity. That is why searching and social collaboration are essential, and why I say we're not that smart individually, but we search together. It's slow, it took us years to get to the Flash version of attention, but we get there; someone finds their way to a major discovery eventually.
It took humanity 200k years to accumulate our current level of understanding, and if we lost it, it would take us another 200k years. Not even a whole human generation is that smart. It's also why I don't fault LLMs for mass-learning from human text. We do the same thing: 99% is inherited knowledge. The whole process of knowledge discovery moves slowly, and over large populations.
It’s a failure in that for decades we thought we had to theorize in circles about all kinds of made-up things for consciousness to exist, rather than just leverage a bit of looping evolution like the universe did.