The secret is to not run out of quota. Instead have Claude know when to offload ...

BuildTheRobots · 2026-02-05T08:12:43 1770279163

I don't suppose you could point me to any resources on getting started? I have an M2 with 64GB of unified memory, and it'd be nice to put it to work rather than burning GitHub credits.

EagnaIonat · 2026-02-05T08:36:15 1770280575

https://ollama.com

Although I'm starting to like LM Studio more, as it has features that Ollama is missing.

https://lmstudio.ai

You can then get Claude to create an MCP server that talks to either one. After that, add a CLAUDE.md that tells it to list the models you have downloaded, work out what each one is good for, and decide when to offload to them. Claude will write all of that for you as well.
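
For anyone who wants to see the "read the models you have downloaded" step before handing it to Claude, here is a rough sketch. It assumes Ollama is running at its default local address and uses its `GET /api/tags` endpoint, which lists installed models with their size in bytes. The helper names (`fetch_tags`, `summarize_models`, `to_markdown`) are made up for illustration, and this is not the MCP server itself, just the lookup it would probably wrap.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434"


def fetch_tags(base_url=OLLAMA_URL):
    """Return the raw JSON from Ollama's GET /api/tags endpoint."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)


def summarize_models(tags):
    """Turn the /api/tags payload into (name, size in GB) pairs, smallest first."""
    rows = [(m["name"], round(m["size"] / 1e9, 1)) for m in tags.get("models", [])]
    return sorted(rows, key=lambda row: row[1])


def to_markdown(rows):
    """Render the model list as bullet lines that could be pasted into CLAUDE.md."""
    return "\n".join(f"- {name} ({gb} GB)" for name, gb in rows)
```

With Ollama running, `print(to_markdown(summarize_models(fetch_tags())))` should give a list you can drop into CLAUDE.md and annotate with what each model is for.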

shen · 2026-02-05T16:39:15 1770309555

Which local models are you using on the 32GB MacBooks?

EagnaIonat · 2026-02-06T04:23:57 1770351837

Mainly gpt-oss-20b, as the thinking mode is really good. I occasionally use granite4 because it is a very fast model. But any model around 4GB should run easily.

eek2121 · 2026-02-05T15:58:05 1770307085

LM Studio is fantastic for playing with local models.

kilroy123 · 2026-02-05T13:00:48 1770296448

I strongly think you're on to something here. I wish Apple would invest heavily in something like this.

The big powerful models think about tasks, then offload some stuff to a drastically cheaper cloud model or the model running on your hardware.
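
To make that concrete, a toy sketch of the routing decision might look like the following. Everything here is hypothetical: the `Task` fields, the tier names, and the 4000-token cutoff are placeholders, not anything a real product exposes.

```python
from dataclasses import dataclass


@dataclass
class Task:
    description: str
    needs_reasoning: bool  # hypothetical flag set by the big planning model
    est_tokens: int        # hypothetical rough estimate of the work involved


def route(task, local_limit=4000):
    """Pick a tier: keep hard tasks on the big model, offload the rest."""
    if task.needs_reasoning:
        return "frontier"      # the big model does the thinking itself
    if task.est_tokens <= local_limit:
        return "local"         # small enough for the model on your hardware
    return "cheap-cloud"       # too big for local, too simple for the big model
```

For example, `route(Task("rename variables", False, 800))` returns `"local"`, while anything flagged as needing reasoning stays on the big model.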