

anuarsh 8 months ago | on: Show HN: Run Qwen3-Next-80B on 8GB GPU at 1tok/2s ...

Thanks! I don't have much experience with diffusion models, but technically any multi-layer model could benefit from loading its weights one layer at a time.
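
To make the idea concrete, here is a minimal sketch of a forward pass that keeps only one layer's weights in memory at a time. It is a toy in plain Python, not the project's actual code: the JSON file layout, the dense-plus-ReLU layer, and the names `save_layers`, `apply_layer`, `forward_streaming`, and `demo` are all assumptions made up for illustration.

```python
import json
import os
import tempfile


def save_layers(layers, directory):
    """Write each layer's weights to its own file; return the paths in order."""
    paths = []
    for i, layer in enumerate(layers):
        path = os.path.join(directory, f"layer_{i}.json")
        with open(path, "w") as f:
            json.dump(layer, f)
        paths.append(path)
    return paths


def apply_layer(layer, x):
    """Toy dense layer: y = relu(W @ x + b), using plain Python lists."""
    out = []
    for row, bias in zip(layer["W"], layer["b"]):
        total = sum(w * v for w, v in zip(row, x)) + bias
        out.append(max(0.0, total))
    return out


def forward_streaming(paths, x):
    """Run the forward pass while holding only one layer's weights at a time."""
    for path in paths:
        with open(path) as f:
            layer = json.load(f)  # load just this layer from disk
        x = apply_layer(layer, x)
        del layer  # drop the reference before loading the next layer
    return x


def demo():
    # Two hypothetical layers: 2 -> 2 units, then 2 -> 1 unit.
    layers = [
        {"W": [[1.0, 0.0], [0.0, 1.0]], "b": [1.0, -5.0]},
        {"W": [[2.0, 3.0]], "b": [0.5]},
    ]
    with tempfile.TemporaryDirectory() as d:
        paths = save_layers(layers, d)
        return forward_streaming(paths, [2.0, 3.0])


if __name__ == "__main__":
    print(demo())
```

Peak memory here is roughly one layer plus the activations, at the cost of a disk read per layer on every pass. That trade-off does not depend on the architecture, only on the model being a stack of layers applied in sequence.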
