This is now the standard for astroturfing online. Build up a profile over time with varied interactions, sometimes over years, and then sell it for a few hundred dollars via BlackHatWorld. I've not seen HN accounts listed, but Reddit definitely follows this pattern.
If you think the IPs are normal, you can check whether people are proxying by looking at the IP that connects to your DNS server (they may not have proxied UDP), SIMD score (server CPUs cluster differently from consumer ones), residential proxy lists (there are a bunch of these), invalid WebGPU setups, etc. Maybe this kind of detection is against the HN way of doing things, but I've definitely seen reCAPTCHA on the login before and it employs a bunch of these checks. Happy to help!
Adding to this, some of those proxied connections will be HTTP/1.1 and not HTTP/2 like normal clients. Sometimes the MSS of their TCP SYN packets will be just a little lower than 1460. Some of them are also missing the sec-fetch-mode request header. Blocking HTTP/1.1 on the non-API port/URL should slow down some of the nonsense. Leave the API endpoint alone, since many API clients still use HTTP/1.1.
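To make those three signals concrete, here is a minimal Python sketch of how they might be combined. `RequestSignals` and `suspicion_reasons` are hypothetical names, and I'm assuming your edge or load balancer can hand you the protocol version and the SYN MSS. Treat each one as a weak hint rather than proof: a real user on PPPoE or a VPN will also show an MSS below 1460, and older browsers don't send sec-fetch-mode.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class RequestSignals:
    """Per-request signals; all field names are illustrative assumptions."""
    http_version: str                      # e.g. "HTTP/1.1" or "HTTP/2"
    is_api_path: bool                      # True for the API port/URL
    headers: Dict[str, str] = field(default_factory=dict)
    tcp_syn_mss: Optional[int] = None      # None if the edge cannot expose it


def suspicion_reasons(sig: RequestSignals) -> List[str]:
    """Return the weak proxy/bot hints that apply to a non-API request."""
    if sig.is_api_path:
        # Many legitimate API clients still speak HTTP/1.1, so skip them.
        return []
    reasons = []
    if sig.http_version.upper() == "HTTP/1.1":
        reasons.append("http/1.1 on browser path")
    if sig.tcp_syn_mss is not None and sig.tcp_syn_mss < 1460:
        reasons.append("mss below 1460")
    header_names = {name.lower() for name in sig.headers}
    if "sec-fetch-mode" not in header_names:
        reasons.append("missing sec-fetch-mode")
    return reasons
```

One reasonable policy is to challenge only when two or more reasons come back, rather than blocking on any single one.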
In nginx, as an example, in the location block for the non-API URL:
So far it's cost me $2.27 to submit a contact form 3 times - why is this better than a captcha solver with human solves at 1000 per $2?
On your automation, your tool fed back to me as follows after 3 submissions:
> The CAPTCHA is persistently blocking now — Prosopo's widget appears to have flagged the session/IP due to the repeated submissions. The checkbox won't reset this time. This is expected behavior from their bot protection product. To submit again, you'd likely need to wait a while for the rate limit to clear, or submit manually from your own browser.
The cost is the AI cost of using the agent, not the captcha cost. Usually, you would write the project once and then call it via the API, instead of asking the agent to perform the action more than once. Consider using the web task API for this use case.
In combination with other signals, JA4s are useful. You learn to spot obviously incorrect ones because Chrome always looks different from Safari, which looks different from Firefox. Captcha solvers have their own unique JA4s based on whatever scripting language they're using (Python / Rust / Node). As another commenter pointed out, browsers have unique sets of headers like priority and DNT. So yes, it won't stop dedicated attackers, but it is worth implementing as a coarse filter.
If someone invests time/money in using a captcha solver, they're already dedicated enough and will easily get around a JA4 signature block.
Maybe there's some one-off exercise where this is useful, but it's very rare and I've seen people waste so much time with the whack a mole JA4 block just because they like the intellectual challenge.
It's not hard to set up JA4 monitoring and I think it's valid as a coarse filter. There are various plugins for nginx/node.
> I've seen people waste so much time with the whack a mole JA4 block just because they like the intellectual challenge
You just store the JA4 on each request and build a catalogue of known JA4s over time using statistics. Outlier JA4s you treat with suspicion by default and challenge. It shouldn't be manual.
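As a rough Python sketch of what "build a catalogue using statistics" could look like: count every JA4 you see and flag the ones whose share of traffic falls under a threshold. `Ja4Catalogue`, `min_share` and `min_total` are names and defaults I made up for illustration, and the fingerprint strings in the usage note are placeholders, not real JA4 values.

```python
from collections import Counter


class Ja4Catalogue:
    """Frequency catalogue of observed JA4 fingerprints (illustrative only)."""

    def __init__(self, min_share: float = 0.001, min_total: int = 1000):
        self.counts = Counter()
        self.total = 0
        self.min_share = min_share   # assumed cut-off: share of all traffic
        self.min_total = min_total   # do not judge until enough data exists

    def observe(self, ja4: str) -> None:
        """Record one request carrying this fingerprint."""
        self.counts[ja4] += 1
        self.total += 1

    def is_outlier(self, ja4: str) -> bool:
        """True when the fingerprint is rare enough to deserve a challenge."""
        if self.total < self.min_total:
            return False  # too little traffic seen to call anything rare
        return self.counts[ja4] / self.total < self.min_share
```

Usage would be something like calling `observe(ja4)` on every request and `is_outlier(ja4)` before deciding whether to show a challenge. In practice you would probably want time decay as well, so a new browser release is not treated as an outlier forever.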
> If someone invests time/money in using a captcha solver, they're already dedicated enough and will easily get around a JA4 signature block.
Obviously, not for the regular user but captcha solvers are also blockable:
- proxy detection
- detection by running DNS server and capturing real IP over UDP request
- abnormal TLS handshake latency
- repeat behaviour at scale
- rendering captcha on a fake origin instead of in the real page
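One possible reading of the "abnormal TLS handshake latency" item, sketched in Python: when a client sits behind a proxy, the TCP round trip only reaches the proxy, while the TLS handshake has to travel all the way to the real client, so the TLS round trip comes out much larger. The function name and both thresholds are assumptions of mine and would need tuning against real traffic.

```python
def looks_proxied(tcp_rtt_ms: float, tls_rtt_ms: float,
                  ratio_threshold: float = 3.0,
                  min_gap_ms: float = 50.0) -> bool:
    """Flag connections whose TLS round trip is far slower than the TCP one.

    Both thresholds are illustrative guesses, not measured values.
    """
    if tcp_rtt_ms <= 0 or tls_rtt_ms <= 0:
        return False  # missing or bogus measurements: make no claim
    gap_ms = tls_rtt_ms - tcp_rtt_ms
    # Require an absolute gap as well as a ratio, so tiny RTTs on a LAN
    # do not trigger on noise alone.
    return gap_ms >= min_gap_ms and tls_rtt_ms / tcp_rtt_ms >= ratio_threshold
```

A slow client CPU or a lossy mobile link can also inflate the TLS side, so this belongs in a score next to the other items in the list, not in a hard block.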
At the time, reCAPTCHA was the alternative and it was effectively working as a giant ad-targeting data collection tool. I'm pretty sure Google have now backtracked from this.
WebGL fingerprinting is just one of many things you need to do if you actually want to stop automation. There is no way around it other than requiring ID of some sort.
I'm no CF advocate, but those random APIs are literally what differentiates people running Chrome on their own computer from a bot operation with a load of containers. Kubernetes clusters typically don't have GPUs. This is why it's used in bot detection (I use Brave with no hardware acceleration and I get captchas everywhere).
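For illustration, a server-side Python sketch of the GPU check: the page reports the WebGL renderer string (browsers expose it through the WEBGL_debug_renderer_info extension) and the server looks for well-known software rasterizers. The marker list is a partial one from memory and `classify_renderer` is a hypothetical name. As my own Brave setup shows, a "software" result also hits real users, so it should raise a score rather than block outright.

```python
from typing import Optional

# Substrings that commonly show up when WebGL is rendered on the CPU.
# Partial list, assumed for illustration.
SOFTWARE_RENDERER_MARKERS = (
    "swiftshader",
    "llvmpipe",
    "softpipe",
    "microsoft basic render driver",
)


def classify_renderer(renderer: Optional[str]) -> str:
    """Return "missing", "software" or "hardware" for a renderer string."""
    if renderer is None or not renderer.strip():
        return "missing"  # blocked, unsupported, or stripped by the client
    lowered = renderer.lower()
    if any(marker in lowered for marker in SOFTWARE_RENDERER_MARKERS):
        return "software"
    return "hardware"
```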
- behavioural fingerprinting
- JA4
- IP rep
- queue mechanism
- card country to IP country checks
- app attestation
- custom metrics based on knowledge of past scalpers
It's hard but it's not impossible. You can make it very inconvenient for scalpers. They need to poll at volume, so their behaviour is very much detectable. A hard stance is required on IP rep, especially for more in-demand concerts.
It's either that or you tie tickets to government ID like in France. If the arbitrage opportunity is more than the cost of automation then someone will exploit it.
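Two of the checks above are simple enough to sketch in Python: spotting clients that poll at volume with a sliding window, and comparing card country to IP country. `PollingDetector`, `country_mismatch` and the default limits are all hypothetical. The country codes are assumed to come from your payment provider and a GeoIP lookup, neither of which is shown here.

```python
from collections import defaultdict, deque


class PollingDetector:
    """Sliding-window request counter per client (illustrative defaults)."""

    def __init__(self, max_requests: int = 30, window_s: float = 60.0):
        self.max_requests = max_requests
        self.window_s = window_s
        self.hits = defaultdict(deque)  # client_id -> recent timestamps

    def record(self, client_id: str, ts: float) -> bool:
        """Record a request at time ts; return True if the client is over the limit."""
        window = self.hits[client_id]
        window.append(ts)
        # Drop timestamps that have fallen out of the window.
        while window and ts - window[0] > self.window_s:
            window.popleft()
        return len(window) > self.max_requests


def country_mismatch(card_country: str, ip_country: str) -> bool:
    """Compare ISO country codes case-insensitively; True means they differ."""
    return card_country.strip().upper() != ip_country.strip().upper()
```

A mismatch on its own is common for travellers and VPN users, so it makes more sense as an input to a queue priority or a challenge than as an outright rejection.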