I really appreciate you giving a shit. Not sarcastically -- it seems like you're actually doing everything right, and it makes a difference.
Gating robots.txt might be a mistake, but it also might be a quick way to deal with crawlers who mine robots.txt for pages that are more interesting. It's also a page that's never visited by humans. So if you make it a tarpit, you both refuse to give the bot more information and slow it down.
It's crap that it's affecting your work, but a website owner isn't likely to care about the distinction when they're pissed off at having to deal with bad actors that they should never have to care about.
> It's also a page that's never visited by humans.
Never is a strong word. I have definitely visited robots.txt of various websites for a variety of random reasons.
- remembering the format
- seeing what they might have tried to "hide"
- using it like a site's directory
- testing if the website is working if their main dashboard/index is offline
Yes. I have checked many checkboxes that say "Verify You Are a Human" and they have always confirmed that I am.
In fairness, however, my daughters ask me that question all the time and it is possible that the verification checkboxes are lying to me as part of some grand conspiracy to make me think I am a human when I am not.
I usually hit robots.txt when I want to make fetch requests to a domain from the console without running into CORS or CSP issues. Since it's just a static file, there's no client-side code interfering, which makes it nice for testing. If you're hunting for vulnerabilities it's also worth probing (especially with crawler UAs), since it can leak hidden endpoints or framework-specific paths that devs didn't expect anyone to notice.
Gating robots.txt might be a mistake, but it also might be a quick way to deal with crawlers who mine robots.txt for pages that are more interesting. It's also a page that's never visited by humans. So if you make it a tarpit, you both refuse to give the bot more information and slow it down.
It's crap that it's affecting your work, but a website owner isn't likely to care about the distinction when they're pissed off at having to deal with bad actors that they should never have to care about.