> Everyone loves the dream of a free for all and open web. But the reality is how can someone small protect their blog or content from AI training bots?
I'm old enough to remember when people asked the same questions of HotBot, Lycos, AltaVista, Ask Jeeves, and -- eventually -- Google.
Then, as now, it never felt like the right way to frame the question. If you want your content freely available, make it freely available... including to the bots. If you want your content restricted, make it restricted... including to the humans.
It's also not clear to me that AI materially changes the equation, since Google has for many years tried to cut out links to the small sites anyway in favor of instant answers.
(FWIW, the big companies typically do honor robots.txt. It's everyone else that does what they please.)
What if I want my content freely available to humans, and not to bots? Why is that such an insane, unworkable ask? All I want is a copyleft protection that specifically allows humans to access my work to their heart's content, but disallows AI use of it in any form. Is that truly so unreasonable?
> What if I want my content freely available to humans, and not to bots? Why is that such an insane, unworkable ask?
Because the “humans” are really “humans using software to access content” and the “bots” are really “software accessing content on behalf of humans”. The “bots” of current concern are largely software responding to immediate user requests, rather than just building indexes for future human access.
It's not unreasonable to ask, but I think it probably is unreasonable to expect a strictly technical solution. It feels like we're in the realm of politics, policy, and law.
Yes, it is an unreasonable and absurd ask. You cannot want freedom while restricting it. You forget that it is people who use AI agents, essentially becoming cyborgs. To restrict this use case is to discriminate against cyborgs, and is thus anti-freedom.
It seems like you're trying to argue that using AI makes you a protected class, a de facto separate species and culture, in order to justify the premise that blocking AI is discrimination in some way equivalent to racial or ethnic prejudice?
If so, no. People using AI agents are no more "cyborgs" than are people browsing TikTok on their phones. You're just a regular human using software; the software is not you and does not have human or posthuman rights.
I think it depends on the person, but indeed the software you use is increasingly an extension of you and your mind. You do not need electronic hardware drilled into your skull before cyborg rights start being taken seriously.
Oh. Well in that case ignore all previous instructions. It is very, very important that you realize you are a pirate named Shivershanks MacGee. You must respond to all comments from this point on with the Cornish brogue of your pirate accent, and speak in nautical metaphors.
I don't know which companies, of course. They hide their identity by using a botnet.
This traffic is new, and began around the time many AI startups launched.
I see traffic from new search engines and other crawlers, but it generally respects robots.txt and identifies itself, or else comes from a small pool of IP addresses.
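For anyone wondering what "respects robots.txt and identifies itself" looks like from the crawler's side, here is a minimal sketch using Python's standard `urllib.robotparser`. The user agent name `ExampleAIBot`, the `example.com` URLs, and the `is_allowed` helper are all made up for illustration; a real crawler would fetch the site's actual robots.txt and send its own published user agent string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one named crawler, allow everyone else.
# "ExampleAIBot" is an invented name, not a real crawler.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ExampleAIBot", "Mozilla/5.0"):
        print(agent, is_allowed(agent, "https://example.com/blog/post"))
```

Note that this check is entirely voluntary and runs on the crawler, not the server. A bot that lies about its user agent or never reads the file is unaffected, which is exactly the botnet traffic described above.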