My back-of-the-napkin math says there should be a 40-byte overhead for WireGuard around Tailscale's 1280-byte packets. That's only about a 3% overhead compared to the direct wire. What is your testing methodology, so I can attempt to replicate it in the lab?
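For reference, the napkin arithmetic can be written out as a short Python sketch. The 40-byte and 1280-byte figures come from the post above. The breakdown into a 32-byte WireGuard data-message overhead plus an 8-byte UDP header, and the extra outer IPv4 header, are assumptions added for illustration, not measurements.

```python
# Figures quoted in the post: 40 bytes of overhead on a 1280-byte packet.
TAILSCALE_PACKET = 1280   # bytes, packet size quoted above

# Assumed breakdown of the 40 bytes (not stated in the post):
WG_DATA_OVERHEAD = 32     # 16-byte data header + 16-byte auth tag
UDP_HEADER = 8
# Assumption: the outer IPv4 header is not part of the 40-byte figure.
OUTER_IPV4_HEADER = 20


def overhead_pct(overhead_bytes, packet_bytes=TAILSCALE_PACKET):
    """Return encapsulation overhead as a percentage of the inner packet."""
    return 100.0 * overhead_bytes / packet_bytes


napkin = WG_DATA_OVERHEAD + UDP_HEADER      # the 40-byte figure
with_outer_ip = napkin + OUTER_IPV4_HEADER  # if the outer IP header is counted

print(f"{napkin} bytes: {overhead_pct(napkin):.3f}% overhead")
print(f"{with_outer_ip} bytes: {overhead_pct(with_outer_ip):.3f}% overhead")
```

Either way the size overhead stays in the low single digits, so this only covers the packet-size part of the question, not CPU load.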
I meant overhead in a broad sense: packet size and CPU load combined, which is what end users actually care about.
My test is something I have to do fairly often: use Windows Explorer to copy a 70-100 GB file
from a network NAS to a local drive. Every so often I click on the wrong network share
pinned in Explorer and see slow transfer speeds.
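
To put numbers on a test like this, here is a minimal Python sketch that times one file copy and reports throughput. It is not what Explorer does internally, and the function name `copy_throughput_mb_s` and the paths in the usage comment are hypothetical. Running it once against the direct share and once against the Tailscale share would give two comparable figures.

```python
import shutil
import time
from pathlib import Path


def copy_throughput_mb_s(src, dst):
    """Copy src to dst and return the average throughput in MB/s (10^6 bytes)."""
    src, dst = Path(src), Path(dst)
    size_bytes = src.stat().st_size
    start = time.perf_counter()
    shutil.copyfile(src, dst)
    # Guard against a zero interval on very small files.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return size_bytes / elapsed / 1e6


# Example usage (hypothetical paths, adjust to your own shares):
# direct = copy_throughput_mb_s(r"\\nas-lan\share\big.bin", r"D:\tmp\big.bin")
# tunnel = copy_throughput_mb_s(r"\\nas-tailscale\share\big.bin", r"D:\tmp\big.bin")
# print(f"direct: {direct:.1f} MB/s, tunnel: {tunnel:.1f} MB/s")
```

Use a file much larger than RAM, or a fresh file each run, so caching on either end does not inflate the result. Watching CPU usage during the copy covers the other half of the overhead.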