
The original model, aside from its programming mistakes, also misremembered the doubling formula. I hoped to see that fixed, which it was, and perhaps also a more general performance boost from recovering some of the distillation loss.

