
Well, for multiplication, complexity is defined directly in terms of the number of digits/bits. For attention, complexity is defined in terms of the number of input vectors, which are all at fixed precision. I don't understand what happens to the method proposed in the paper at higher precision (since I don't understand the paper), but in reality it doesn't matter, since there is no value in anything over float16 for machine learning.
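
To make the two notions of "input size" concrete, here is a rough sketch. The function names are made up for illustration and nothing here comes from the paper; it assumes the naive schoolbook algorithm for multiplication and the naive all-pairs dot-product step for attention.

```python
def schoolbook_digit_mults(a: int, b: int) -> int:
    """Single-digit multiplications in schoolbook multiplication of a and b.

    The cost is driven by how many digits the operands have.
    """
    return len(str(abs(a))) * len(str(abs(b)))


def attention_score_mults(n_vectors: int, dim: int) -> int:
    """Scalar multiplications to form all n x n query-key dot products.

    The cost is driven by how many vectors there are; each scalar is
    assumed to have a fixed width (e.g. float16), so precision does not
    appear as a parameter.
    """
    return n_vectors * n_vectors * dim


# Doubling the digit count quadruples the multiplication cost.
print(schoolbook_digit_mults(1234, 5678))          # 4 digits each
print(schoolbook_digit_mults(12345678, 87654321))  # 8 digits each

# Doubling the number of vectors quadruples the attention cost,
# while the per-scalar precision never enters the count.
print(attention_score_mults(128, 64))
print(attention_score_mults(256, 64))
```

Both are quadratic, but in different variables: digits in one case, number of fixed-precision vectors in the other.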

