If a string contains no code points that require surrogate pairs, it is far easier to work with UTF-16 than with UTF-8 at the byte level, because every character is exactly one 16-bit code unit. In UTF-8 that fixed width only holds for ASCII. Characters that need surrogate pairs are extraordinarily rare, while non-ASCII characters are extremely common. So my programs check at their entry points that each string is UCS-2 compatible, and all subsequent string manipulation is then far less complex than it would be with UTF-8.
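A minimal sketch of that approach, assuming "UCS-2 compatible" means every code point is in the Basic Multilingual Plane and none is a lone surrogate. The function names are hypothetical, and because Python's `str` is not UTF-16 internally, the sketch works on explicitly encoded byte buffers to show the fixed-width versus variable-width difference.

```python
def is_ucs2_compatible(s: str) -> bool:
    # Every code point must fit in one 16-bit unit and must not be a surrogate.
    return all(ord(c) <= 0xFFFF and not (0xD800 <= ord(c) <= 0xDFFF) for c in s)


def to_ucs2_buffer(s: str) -> bytes:
    # Hypothetical entry-point check: reject anything that would need a surrogate pair.
    if not is_ucs2_compatible(s):
        raise ValueError("string contains characters outside the BMP")
    return s.encode("utf-16-le")


def utf16_char_at(buf: bytes, i: int) -> str:
    # Constant time: once validated, character i is always bytes [2i, 2i + 2).
    return buf[2 * i : 2 * i + 2].decode("utf-16-le")


def _utf8_seq_len(lead: int) -> int:
    # The lead byte of a UTF-8 sequence encodes how many bytes the character uses.
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    raise ValueError("invalid UTF-8 lead byte")


def utf8_char_at(buf: bytes, i: int) -> str:
    # Linear time: must walk every preceding sequence to find where character i starts.
    pos = 0
    for _ in range(i):
        pos += _utf8_seq_len(buf[pos])
    return buf[pos : pos + _utf8_seq_len(buf[pos])].decode("utf-8")
```

The validation cost is paid once at the boundary; after that, offsets into the UTF-16 buffer are plain arithmetic, whereas the UTF-8 lookup has to decode lead bytes for every character before the target.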

