
Anna’s Archive has files specifically for training LLMs. But I’d guess the big players secured their share beforehand by scraping those sites. I have zero proof; it’s just a guess.

