Hacker News | metalspot's comments

This is obviously political and the entire narrative is fabrication.

David Sacks is publicly gloating about it: https://x.com/DavidSacks/status/2065853007619588171

I can't really say that Anthropic didn't get what they deserved. They exploited security threats to sell their product and play political games, and now their rivals are rubbing it in their faces.


> This is obviously political and the entire narrative is fabrication.

I agree with this

> David Sacks is publicly gloating about it: https://x.com/DavidSacks/status/2065853007619588171

I do not like David Sacks, but how can you say this is gloating about it?

Again, I do believe this is political, but Sacks is saying "you said this is dangerous and wanted regulation, and we believe you. Fix this because it's dangerous and we'll let it out again".

How is this gloating?


> How is this gloating?

he is emphasizing that they used their own words against them. everyone knows the security threat is a pretext. the message is that he is smart and they are stupid and he won, which is what I call gloating.

> "Those trying to misdirect and tie this action to the prior DoW/Anthropic issues are wrong."

an obvious lie, which is inserted to emphasize that it is a lie. when you purposefully lie, not to deceive, but with the intent that the counter-party knows you are lying and must accept the lie, that is an assertion of power.


"everyone knows the security threat is a pretext." On what planet? Anthropic itself made a big stink about Mythos being able to hack every app out there, and very dangerous as a result. Many reports have confirmed this.

Sacks works for the government now. Everything is political. This happening on the day SpaceX IPO’d? It’s a flex, and a message.

what's the flex? what's the message? how does it relate to the IPO?

It is gloating in the context of it being the exact same form of dangerous as all the other frontier models out there?

anthropic would see a crazy boost to its ipo for releasing a "so good that we had to ban it" model.

i don't see how it affects them negatively at all given their opus models are already on par with or exceed any other model out there.


I wouldn't use anything but open weights models for developing open source software. This is just training OpenAI and Anthropic to steal your work for their proprietary models.

If your license is open then Anthropic and OpenAI are using your work anyway.

the user prompts and harness used for development are much more valuable for training than the final source code.

my approach to open source development with AI now is to include all of the agent sessions used in development in the repository, which makes this data freely available for training for both proprietary and open weights models, but that is just my own approach. every open source developer ultimately has to make their own judgement on the best way to integrate AI in accordance with their values.


The crux of open source: by definition it is open for the public to use.

I see it as a chance. Many open source projects themselves offer LLM-readable websites and docs.

This way the project at least not only gets ingested but also receives referential treatment.

Some sort of collaboration. Ingested it will be, anyway.


> I see it as a chance

absolutely. AI is the same as any other software, and open source has to integrate, adapt, and lead to make sure that open source values continue to propagate.

my personal approach is to focus on developing with open weights models, so that my work is optimized for them, and leads to their development. proprietary labs are free to copy, but they have a structural cost disadvantage. my objective is that open weights models remain competitive on capability but lead on capability/cost.


corporations and governments fund most linux development. for hardware companies software cost is a tax that decreases their revenue and profit, so Nvidia and AMD have strong incentives to support open source models, which they do, very actively.

The key thing here is that effective intelligence = model capability / cost. If you drive down the cost of inference you can have higher effective capability even with a technically less capable model. There is nothing in Anthropic's or OpenAI's general reasoning capabilities that can't be easily done much better with a purpose-built harness for a domain-specific task.
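A rough sketch of that ratio in Python. The function name and the capability and cost numbers are made up for illustration; they are not measurements of any real model.

```python
def effective_intelligence(capability: float, cost: float) -> float:
    """Capability per unit cost, following the ratio stated above."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return capability / cost


# Hypothetical numbers, for illustration only: a stronger but pricier model
# versus a somewhat weaker model that is ten times cheaper to run.
big_model = effective_intelligence(capability=100.0, cost=10.0)
small_model = effective_intelligence(capability=80.0, cost=1.0)

print(f"big model:   {big_model}")
print(f"small model: {small_model}")
```

Under these assumed numbers the less capable model comes out ahead on the ratio, which is the point being made about inference cost.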

> Fable was the strongest model on the market

based on Anthropic's own self promotion. no reason to think that Chinese models are not just as good or better. the key thing here is training on machine code and disassembled binaries and the Chinese have a complete data set of pirated software, with no limitations on how they use it. I seriously doubt they are actually behind.

> only if you're not a US citizen, but in practice, even if you are

the issue here is that Anthropic needs a legal opinion that their mechanisms for detecting foreign users in the US are compliant, which is technically hard to do, and a complex intersection of technical details and national security law, so getting a legal opinion can't happen overnight. it will be back.


i have been using deepseek-v4-flash since it came out. i use a highly structured harness and spec/test driven workflow running through opencode, and so far there has been nothing it can't do.

i have run through a bunch of tests: re-writing vvenc with assembly kernels, creating the first generation agent harness integration with opencode, porting TS npm modules to C++, porting an entire TS server app to C++, creating a new pure io_uring http server with zero-copy (325K RPS single core), creating a second generation agent from the ground up in C++, setting up a dev environment for custom kernel development on tenstorrent accelerators using tt-metal and ttsim.

i consistently get 98.5% input cache hit ratio. i do see noticeable degradation in performance in the 400-500K context range, so i always try to wrap up sessions by 500K max.
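To show why the hit ratio matters for cost, here is a small sketch of the blended input price as a weighted average. Only the 98.5% figure comes from the comment above; the two per-token prices are hypothetical placeholders, not actual DeepSeek pricing.

```python
def blended_input_price(hit_ratio: float, cached_price: float,
                        uncached_price: float) -> float:
    """Average price per input token, weighted by the cache hit ratio."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cached_price + (1.0 - hit_ratio) * uncached_price


# Hypothetical relative prices: a cached token costs a tenth of an uncached one.
CACHED_PRICE = 0.1
UNCACHED_PRICE = 1.0

price = blended_input_price(0.985, CACHED_PRICE, UNCACHED_PRICE)
print(f"blended price per input token: {price:.4f}")
```

With these assumed prices, nearly all of the input cost is paid at the cached rate, so the blended price sits close to the cached price.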

a non-intuitive thing is that the model is very good at low-level systems engineering. i suspect this is because they are internally using it to port their stack to huawei hardware. it can churn out exceptionally complex low level C++ stuff that blows your mind, and then completely choke and run in circles on other seemingly simple tasks.

i only use flash and not pro because i want my tooling to be portable to open weights models that are practical to run. i use deepseek platform and not the open weights models for development, because it is subsidized, and based on observation, i think it is highly likely that they are running some proprietary features on the platform which are not in the open weights model.

it will be very interesting to see what their next point release looks like. the compounding effect of optimizing inference cost and then feeding back inference into training should lead to rapid and accelerating improvement, but only time will tell.


Thanks for the details. What's a second generation agent?

You mentioned the workflow is heavy on specs and tests. The smaller models seem to be really good at following instructions now. (Well, some of them!)

So that's probably part of why you're seeing good results. It has a very clear target.

Whereas with more open ended instructions they seem to struggle more. I think common sense is the main thing you get with model size.

When I'm working with the big models I feel like I don't have to spell things out so much. The gap is closing, but I'm assuming there is some fundamental limit there based on the size.

Of course the ideal would be Mythos, running for free, in my house, at 1,000 tok/s ;) Someday...


> What's a second generation agent?

i meant that i initially developed an agent harness as a set of skills integrated with opencode and now i am in the process of using that to write a new agent from scratch to replace opencode.

> probably part of why you're seeing good results

yes. i think tests and setting up feedback loops for diagnosing errors (logs, debugging, etc) are the most important things. in my experience deepseek-v4-flash tends to ignore instructions to use these tools and defaults to churning through code and guessing the cause of errors, which is often wrong. it requires occasionally stepping in when it has been grinding fruitlessly for a while and reminding it. this is probably due to context length and sparse attention forgetting instructions that are put in context at the beginning of a session.
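A minimal sketch of that kind of feedback loop, assuming a harness that reruns the checks each round, hands the log back to the agent, and periodically repeats the tool instructions. Every name here (`run_checks`, `feedback_loop`, `REMINDER`, `agent_step`) is hypothetical; this is not the actual harness or any opencode API.

```python
import subprocess
from typing import Callable, List, Tuple

# Hypothetical reminder text, re-sent periodically in case it was forgotten.
REMINDER = "Reminder: read the test log and use the debugger before editing code."


def run_checks(cmd: List[str]) -> Tuple[bool, str]:
    """Run a test command and return (passed, combined output)."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr


def feedback_loop(check: Callable[[], Tuple[bool, str]],
                  agent_step: Callable[[str], None],
                  max_rounds: int = 5,
                  remind_every: int = 2) -> int:
    """Return the round in which the checks passed, or -1 if they never did."""
    for round_no in range(1, max_rounds + 1):
        passed, log = check()
        if passed:
            return round_no
        prompt = log
        # Every few rounds, put the original instructions back in front of the model.
        if round_no % remind_every == 0:
            prompt = REMINDER + "\n" + log
        agent_step(prompt)
    return -1
```

A caller could pass something like `lambda: run_checks(["pytest", "-q"])` as `check`, with `agent_step` wrapping whatever sends a message to the model.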


Thank you a lot for such an insightful comment. The low level stuff part, including porting entire codebases using DV4Flash, came as a genuine surprise to me. I did not expect it to be this good.

When you say "i use a highly structured harness" ... can you please tell me what it is exactly?



Thanks..

> glorified autocomplete machine

It is a next token prediction function and it is important to understand the technology accurately based on what it actually is.

What is unique about a next token prediction function though is that every computer program is just a string of instructions. At the theoretical limit a next token prediction function can generate the entire instruction stream (boot loader, OS, application) so a next token prediction function can theoretically generate any computer program, which means that it is a universal predictor for anything that a computer can simulate. Still not AGI/ASI in the woo-woo non-technical interpretations of those terms, but incredibly powerful.
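A toy illustration of what "next token prediction function" means mechanically: a bigram counter over a made-up instruction stream, generating one token at a time. The instruction names and training stream are invented, and a real LLM is a neural network over a far larger context, so this only shows the shape of the loop.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count which token follows which in the training stream."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_next(counts, token):
    """Return the most frequent follower of `token`, or None if unseen."""
    followers = counts.get(token)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def generate(counts, start, max_tokens):
    """Feed each predicted token back in as the next input."""
    out = [start]
    while len(out) < max_tokens:
        nxt = predict_next(counts, out[-1])
        if nxt is None:
            break
        out.append(nxt)
    return out


# Made-up "instruction stream" used as training data.
program = ["LOAD", "ADD", "STORE", "LOAD", "ADD", "STORE",
           "LOAD", "ADD", "STORE", "HALT"]
model = train_bigram(program)
stream = generate(model, "LOAD", 6)
print(stream)
```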


What you’re saying is correct if the model is trained with all the knowledge humanity had, has, and will ever produce. But at the moment the next token prediction is quite limited to the training data.

Things could change if the model supports reinforcement learning. That way the LLM would change the weights in real time based on a feedback loop, but again that could vastly improve the quality of the token prediction or completely degrade it as well.


The distinction I would make here is that computer code is logical transformations on arbitrary data, not the actual data itself. An LLM can learn the entire space of logical transformation patterns from existing code, and can hallucinate new logical transformations, using a computer as a validator for the logic, so an LLM can create new logic as well as repeat existing patterns, and that logic can be applied to novel input data that the LLM has never seen before.
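A small sketch of the propose-and-validate idea, with the computer acting as the validator. The `propose` function here just enumerates compositions of three made-up primitives; it is a stand-in for the model, not a claim about how an LLM generates code.

```python
from itertools import product

# Made-up primitive transformations.
PRIMITIVES = {
    "double": lambda x: x * 2,
    "inc": lambda x: x + 1,
    "square": lambda x: x * x,
}


def propose(max_len):
    """Stand-in for the model: yield candidate compositions of primitives."""
    for n in range(1, max_len + 1):
        for names in product(PRIMITIVES, repeat=n):
            yield names


def run(names, x):
    """Apply the named transformations to x in order."""
    for name in names:
        x = PRIMITIVES[name](x)
    return x


def validate(names, examples):
    """The computer as validator: execute the candidate on known cases."""
    return all(run(names, x) == y for x, y in examples)


def search(examples, max_len=3):
    """Return the first candidate that passes validation, or None."""
    for names in propose(max_len):
        if validate(names, examples):
            return names
    return None


# Known input/output pairs for an unnamed target transformation.
examples = [(1, 8), (2, 18), (3, 32)]
found = search(examples)
print(found, run(found, 10))
```

Once a candidate passes validation it can be run on an input that was not among the examples, here `10`.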

That’s not how LLMs work at the moment as far as I understand. An LLM would not hallucinate any new logical transformation, rather just predict a transformation from its training data.

I understand that there can be many different combinations for all the logical transformations in the training data. But still the number of combinations is finite and I would assume that a large number of those combinations would not result in any meaningful outcome.

The best outcome is that it just predicts a new pattern we haven’t discovered (the LLM randomly connected the correct dots). One example is protein folding.


> If you deploy 10x faster, than me as business owner need less of you for the same amount of work

An important consideration here is that velocity is not zero sum. If you are delivering in weeks what used to take months you are creating an entirely new realm of what is possible to do with software within a corporation.

In the real world, I have never worked for a company that doesn't have a huge backlog (either tracked or in engineers' heads) of work that would never be done because it wasn't economic under the old model. This tends to apply to the internal work of engineers (developer tooling, infrastructure, tech debt, etc) more than anything else. 10X faster doesn't necessarily mean shipping 10X product code. You can use that productivity boost to accelerate prototyping, ship betas faster, move the iteration loop faster, all while shipping higher quality code with less tech debt and having the time to continuously improve the engineering side of things that the business never sees.


Most companies aren't software houses.

If you fulfill your delivery contract in half the time, great for me, you now need to track down another customer.

Or put another way, an agency now only needs a third of previous team sizes to deliver the same amount of work.

The other two thirds might be lucky to have another project assigned, or get to sit on the bench, and depending on the world region (offshoring shops) get their salary halved, before being fired if sitting too long on the bench.


> people are driven by different things

This is important to understand. I have been coding since I was 11 when I got my first C64, and it is a genuine passion for me, but I also love working with LLM tooling.

One of the biggest things for me is that after decades of sitting in front of a computer I have chronic back and wrist pain that makes it impossible to do the long deep focus sessions that were normal when I was younger. Using AI tooling to handle all of the procedural tasks (running tests, debugging, managing git, etc) dramatically reduces the physical strain of programming, and allows for a much healthier workflow, with regular short breaks.


That's awesome, I'm happy for you finding such great value in it!

Not sure if it's important or not, but for the sake of OP's discussion I note that your value is not necessarily tied to "speed of execution".


Calling it AI is where the problem lies. An LLM is not AI. It is a next token prediction function. That is a very powerful function but just one function out of millions in the overall stack. As an engineer you still have to have the right framework to call that function with the right inputs at the right places and validate the results. But if you focus on the technical details and not the marketing hype you can get amazing results in the areas where it works.
