arbol · 2026-06-13T17:01:35 1781370095

AI generated drivel

arbol · 2026-06-09T11:19:55 1781003995

> coming from normal-user IP addresses

This is now the standard for astroturfing online. Build up a profile over time with varied interactions, sometimes over years, and then sell it for a few hundred dollars via blackhatworld. I've not seen HN accounts listed, but Reddit definitely follows this pattern.

If you think the IPs are normal, you can check whether people are proxying by looking at the IP that connects to your DNS server (they may not have proxied UDP), SIMD score (server CPUs cluster differently from consumer ones), residential proxy lists (there are a bunch of these), invalid WebGPU setups, etc. Maybe this kind of detection is against the HN way of doing things, but I've definitely seen reCAPTCHA on the login before and it employs a bunch of these checks. Happy to help!

Bender · 2026-06-09T14:01:18 1781013678

Adding to this, some of those proxied connections will be HTTP/1.1 and not HTTP/2.0 like normal clients. Sometimes the MSS of their TCP SYN packets will be just a little lower than 1460. Some of them are also missing the sec-fetch-mode request header. Blocking HTTP/1.1 to the non-API port/URL should slow down some of the nonsense. Leave the API alone, since many API clients still use HTTP/1.1.

In nginx, as an example, in the location block for the non-API URL:

    if ($server_protocol != HTTP/2.0) { return 403 'Browser Error.'; }

    if ($http_sec_fetch_mode !~ (cors|no-cors|navigate) ) { return 403 'Error: Flux Capacitor Under-Current.'; }
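For anyone doing this in application code rather than in nginx, here is a minimal Python sketch of the same three signals. It roughly mirrors the two rules above and adds the MSS check. The function name, the lowercase header keys, and the idea of passing in the SYN MSS (which you would have to capture separately, e.g. at the load balancer) are all assumptions for illustration.

```python
# Hypothetical helper: names and argument shapes are illustrative only.
ALLOWED_FETCH_MODES = {"cors", "no-cors", "navigate"}

def suspicious_signals(protocol, headers, syn_mss=None):
    """Return the list of signals that fired; an empty list means none did.

    protocol: e.g. "HTTP/1.1" or "HTTP/2.0"
    headers:  dict of request headers with lowercase keys (assumed)
    syn_mss:  MSS seen in the client's TCP SYN, if it was captured
    """
    reasons = []
    if protocol != "HTTP/2.0":
        reasons.append("not HTTP/2.0")
    if headers.get("sec-fetch-mode") not in ALLOWED_FETCH_MODES:
        reasons.append("missing or unexpected sec-fetch-mode")
    if syn_mss is not None and syn_mss < 1460:
        reasons.append("MSS below 1460")
    return reasons
```

Returning the reasons instead of a hard 403 lets you log them or combine them with other signals first, since none of these is conclusive on its own.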

arbol · 2026-06-08T16:16:30 1780935390

So far it's cost me $2.27 to submit a contact form 3 times - why is this better than a captcha solver with human solves at 1000 per $2?
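To put the two prices on the same per-action basis, here is the arithmetic using only the figures quoted above (nothing here is measured independently):

```python
# Figures taken from the comment; variable names are illustrative.
agent_total_usd = 2.27     # total spent on the agent so far
agent_submissions = 3      # contact form submissions made
solver_price_usd = 2.00    # quoted price for a batch of human solves
solver_batch = 1000        # solves per batch

agent_cost_each = agent_total_usd / agent_submissions
solver_cost_each = solver_price_usd / solver_batch
ratio = agent_cost_each / solver_cost_each

print(f"agent:  ${agent_cost_each:.3f} per submission")
print(f"solver: ${solver_cost_each:.3f} per solve")
print(f"ratio:  {ratio:.0f}x")
```

That works out to roughly $0.757 per submission against $0.002 per solve, a difference of about 378x.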

On your automation, your tool fed back to me as follows after 3 submissions:

> The CAPTCHA is persistently blocking now — Prosopo's widget appears to have flagged the session/IP due to the repeated submissions. The checkbox won't reset this time. This is expected behavior from their bot protection product. To submit again, you'd likely need to wait a while for the rate limit to clear, or submit manually from your own browser.

fkilaiwi · 2026-06-08T16:29:19 1780936159

The cost is the AI cost of using the agent, not the captcha cost. Usually you would write the project once and then call it via the API, instead of asking the agent to perform the action more than once. Consider using the web task API for this use case.

arbol · 2026-06-08T13:50:45 1780926645

In combination with other signals, JA4s are useful. You learn to spot obviously incorrect ones because Chrome always looks different from Safari, which looks different from Firefox. Captcha solvers have their own unique JA4s based on whatever scripting language they're using (Python / Rust / Node). As another commenter pointed out, browsers have unique sets of headers like priority and DNT. So yes, it won't stop dedicated attackers, but it is worth implementing as a coarse filter.

mmarian · 2026-06-08T14:09:17 1780927757

If someone invests time/money in using a captcha solver, they're already dedicated enough and will easily get around a JA4 signature block.

Maybe there's some one-off exercise where this is useful, but it's very rare and I've seen people waste so much time with the whack a mole JA4 block just because they like the intellectual challenge.

arbol · 2026-06-08T15:29:07 1780932547

It's not hard to set up JA4 monitoring and I think it's valid as a coarse filter. There are various plugins for nginx/node.

> I've seen people waste so much time with the whack a mole JA4 block just because they like the intellectual challenge

You just store the JA4 on each request and build a catalogue of known JA4s over time using statistics. Outlier JA4s you treat with suspicion by default and challenge. It shouldn't be manual.
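A minimal sketch of that catalogue idea, assuming a simple "share of all traffic" rule for what counts as an outlier. The class name, the thresholds, and the rule itself are assumptions for illustration; a real deployment would likely persist counts and decay them over time.

```python
from collections import Counter

class Ja4Catalogue:
    """Counts JA4 fingerprints seen on requests and flags rare ones."""

    def __init__(self, min_share=0.01, min_total=1000):
        self.counts = Counter()
        self.total = 0
        self.min_share = min_share  # assumed cut-off: below this share is an outlier
        self.min_total = min_total  # don't judge until enough traffic has been seen

    def record(self, ja4):
        """Store the fingerprint observed on one request."""
        self.counts[ja4] += 1
        self.total += 1

    def is_outlier(self, ja4):
        """True if this fingerprint is rare (or unseen) relative to all traffic."""
        if self.total < self.min_total:
            return False
        return self.counts[ja4] / self.total < self.min_share
```

Usage would be along the lines of calling `record()` on every request and serving a challenge when `is_outlier()` returns true, so nobody has to maintain a blocklist by hand.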

> If someone invests time/money in using a captcha solver, they're already dedicated enough and will easily get around a JA4 signature block.

Obviously, not for the regular user but captcha solvers are also blockable:

- proxy detection
- detection by running DNS server and capturing real IP over UDP request
- abnormal TLS handshake latency
- repeat behaviour at scale
- rendering captcha on a fake origin instead of in the real page

arbol · 2026-06-03T18:50:53 1780512653

Is it not just a case of most of their clients being US based?

arbol · 2026-06-01T09:33:47 1780306427

At the time, reCAPTCHA was the alternative and it was effectively working as a giant ad-targeting data collection tool. I'm pretty sure Google have now backtracked from this.

WebGL fingerprinting is just one of many things you need to do if you actually want to stop automation. There is no way round it other than requiring ID of some sort.

arbol · 2026-05-31T20:14:41 1780258481

You literally can't get rid of it without introducing government issued ID to buy any scarce freely accessible items

raincole · 2026-05-31T21:00:20 1780261220

Which is why it's very likely to happen, especially in the EU.

arbol · 2026-05-31T20:12:58 1780258378

I'm no CF advocate, but those random APIs are literally what differentiates people running Chrome on their computer from a bot operation with a load of containers. Kubernetes clusters don't have GPUs. This is why it's used in bot detection (I use Brave with no hardware acceleration and I get captchas everywhere).

arbol · 2026-05-31T20:09:09 1780258149

Yeah, this doesn't even begin to cut it

arbol · 2026-05-31T20:08:34 1780258114

- behavioural fingerprinting
- ja4
- IP rep
- queue mechanism
- card country to IP country checks
- app attestation
- custom metrics based on knowledge of past scalpers

It's hard but it's not impossible. You can make it very inconvenient for scalpers. They need to poll at volume so their behaviour is very much detectable. A hard stance is required on IP rep, especially for more in demand concerts.
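As a rough illustration of two of the items above, here is a Python sketch of a sliding-window polling check and a card-country versus IP-country check. The class and function names, the window size, and the request limit are all made-up values for the example, and how you identify a client or resolve an IP to a country is left out.

```python
from collections import defaultdict, deque

class PollingMonitor:
    """Flags clients that make too many requests inside a sliding time window."""

    def __init__(self, window_seconds=60.0, max_requests=30):
        self.window_seconds = window_seconds  # assumed window
        self.max_requests = max_requests      # assumed limit
        self.hits = defaultdict(deque)        # client id -> recent request times

    def record(self, client_id, now):
        """Record a request at time `now` (seconds); return True if over the limit."""
        q = self.hits[client_id]
        q.append(now)
        # Drop timestamps that have fallen out of the window.
        while q and q[0] <= now - self.window_seconds:
            q.popleft()
        return len(q) > self.max_requests

def country_mismatch(card_country, ip_country):
    """True if the card's issuing country differs from the IP's country (ISO codes)."""
    return card_country.upper() != ip_country.upper()
```

Neither check is proof on its own (people do travel and use foreign cards), so these would feed a score or trigger a challenge instead of an outright block.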

Wowfunhappy · 2026-05-31T22:31:39 1780266699

I don't know, a lot of this seems just as invasive as WebGL fingerprinting, if not more invasive.

arbol · 2026-06-01T07:13:15 1780297995

It's either that or you tie tickets to government ID like in France. If the arbitrage opportunity is more than the cost of automation then someone will exploit it.