SG-1000 Very Poor Performance
-
Hmm, I've potentially fixed it. Will monitor over the next few days to make sure it holds up.
I (of course) have Disable TSO set in Advanced->Networking. However, it seems that there is a default system tunable that sets net.inet.tcp.tso to 1, ignoring this value (and any value you may set in /boot/loader.conf.local as per the tuning guide). Set to 0 and verified using sysctl on the command line. I'm really hoping this kills the last of my issues with the pfsense install on this end too (on a massively overpowered 2x X5650 HP server that still acts like it's hitting bottlenecks sometimes).
Seems to me like this is not the proper intention, and that this should be taken out of future versions, or at least documented to keep folks like me from pulling their hair out too much :P
-
@mkernalcon Looks like you can safely ignore the tunable:
https://forum.netgate.com/topic/106131/disable-hardware-tcp-segmentation-offload
-
@thenarc ...well that's confusing. Where exactly can I verify TSO per-interface then?
And regardless of theory, in practice on this SG-1000, with Disable TSO checked, and the tunables at their default state, I get about 5x worse throughput on speedtest than if I change the value of that tunable (or use sysctl to do it). It's an instant difference too. Maybe it's something weird about the cpsw interfaces?
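For the per-interface question, FreeBSD's `ifconfig <interface>` prints an `options=...<...>` line, and TSO shows up there as flags such as `TSO4` or `TSO6` when it is enabled on that interface. Below is a minimal sketch, assuming that output format, of a check that reads the flags out of captured `ifconfig` text. The helper names and the sample text are hypothetical and were not captured from a real SG-1000.

```python
import re


def interface_options(ifconfig_text):
    """Return the set of flags from the options=...<...> line of ifconfig output."""
    match = re.search(r"options=[0-9a-fA-F]+<([^>]*)>", ifconfig_text)
    if match is None:
        return set()
    return {flag for flag in match.group(1).split(",") if flag}


def tso_enabled(ifconfig_text):
    """True if any TSO flag (e.g. TSO4, TSO6) is listed for the interface."""
    # Only flags that start with "TSO" count; VLAN_HWTSO is a separate capability.
    return any(flag.startswith("TSO") for flag in interface_options(ifconfig_text))


# Hypothetical sample text for illustration only (assumption, not real output).
SAMPLE = (
    "cpsw0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500\n"
    "\toptions=8000b<RXCSUM,TXCSUM,VLAN_MTU,LINKSTATE>\n"
)

if __name__ == "__main__":
    print("TSO enabled:", tso_enabled(SAMPLE))
```

Comparing this per-interface result against `sysctl net.inet.tcp.tso` before and after toggling the tunable would show whether the two settings actually disagree on a given box.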
-
@mkernalcon That is certainly interesting. I have no experience with the SG-1000 so I can't really comment on it specifically. I just thought that other posting might be relevant if you were just wondering whether you needed to set that tunable. But if your testing has unequivocally shown that you get poor performance until you set it to 0, then I can't explain it in the context of that other post. It also claims that the tunable flips back to a 1 on a reboot, which will be a problem for you if that's the only workaround you have. Maybe you could use the shellcmd package to set it back to 0 after every reboot. Note that I don't know enough to say whether that's a good idea, and hopefully someone who knows more about this will stop by :)
-
Alright, this one is still most definitely unsolved.
Today I ran all the same tests as yesterday, and it is now performing a bit better with TSO enabled. However, it does seem like that's not the whole story, like there's another, more important factor governing these slowdowns (running tests to the same speedtest server gives VASTLY variable results even with no change - anywhere from less than 1Mbit/s to over 20, and ping varies from about 80ms to over 2 seconds).
Where else can I look for this issue?
-
@mkernalcon When connecting from behind the SG-1000, is he double-NATed? Or is DHCP turned off on the SG-1000? I have a Technicolor modem/router as well that I've put in bridged mode. It's not the same modem as his, but hopefully his can be put in bridged mode as well. I don't know if that would be acceptable for his setup, since then everything would need to connect via the SG-1000, but it may be another test worth running. Note that, while generally speaking double-NATing isn't an ideal situation, I don't have any specific theory for how it could be causing the issues you're observing. It's just another thought for something to try.
-
@thenarc Yes, his setup is a double NAT (modem gives 192.168.0.0/24, sg-1000 gives 192.168.28.0/24, and 192.168.0.0/24 shows up nowhere else in my entire setup).
And unfortunately I seem to not have access to the modem configuration - plus I don't want to tank his other connections because this box isn't routing right - oh and the modem is his only wireless access point. So no, I don't have the ability to de-NAT the WAN interface unfortunately. I have a rule to allow all ipv4 traffic with source 192.168.0.0/24 on WAN. I can't really imagine why the double-NAT would cause a performance issue, so hopefully it's unrelated to that.
-
So you are using the wifi on the upstream router? That's hardly ideal. (Yes it's an upstream router+modem, since it has wifi). So if a wireless client decides to pull 50Mbps, then the SG1000 is going to have to deal with a drop in its WAN bandwidth. That situation could easily be causing you problems, and the solution is to get another WAP and disable the one on the upstream router. What else is running off that upstream router?
As for the 100Mbps connection, that's basically the effective limit of the SG1000. I wouldn't expect much more than that, especially if you're trying to run other services such as OpenVPN as well.
Lastly, the double-NAT also isn't an ideal scenario, especially for VoIP. For OpenVPN it should be OK though.
-
Yes, he's using the upstream wifi, but pretty much when he's working (i.e. when he expects reasonable performance on the SG-1000 side), there is little to no usage of the upstream side. I'll talk to him about that, but I don't expect that's the problem.
Although I'd be extremely happy with 100mbps out of this thing, I don't need that (and I don't expect that - this is a big reason I don't want to put him entirely behind this router). However, I expect it to be able to route better than a few mbps, and it is not doing that consistently.
Double-NAT is never ideal, I know that. If it helps, the phone is connected through the openvpn tunnel (it connects using H.323 to the IP Office that is sitting local here, using its local address which routes through the tunnel - the phone has no knowledge of the vpn, and doesn't hit the internet at all).
-
Here's another symptom: bad latency to the modem (again, there is a direct connection here over about 4' of cat5e). I expect pings of less than 1ms, but here are my statistics: round-trip min/avg/max/stddev = 1.145/2.539/53.022/3.831 ms. There are frequent 10+ms pings in this.
Over the same time period, a ping to a host on the LAN side (through a gigabit switch) gives: round-trip min/avg/max/stddev = 0.229/0.412/9.816/0.888 ms - it was only one singular ping that ventured above 0.7, almost all were about 0.3
I'm beginning to think the modem is doing something very strange to cause these issues.
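To put the two ping summaries above side by side, here is a minimal sketch that parses the `round-trip min/avg/max/stddev` line and prints how much worse each WAN-side figure is than the LAN-side one. The function name is hypothetical, and the regex assumes the summary format shown above.

```python
import re


def parse_ping_summary(line):
    """Parse a 'round-trip min/avg/max/stddev = a/b/c/d ms' line into floats (ms)."""
    match = re.search(r"=\s*([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+)\s*ms", line)
    if match is None:
        raise ValueError("not a ping summary line: %r" % line)
    keys = ("min", "avg", "max", "stddev")
    return dict(zip(keys, (float(value) for value in match.groups())))


# The two summary lines quoted in the post above.
modem = parse_ping_summary("round-trip min/avg/max/stddev = 1.145/2.539/53.022/3.831 ms")
lan = parse_ping_summary("round-trip min/avg/max/stddev = 0.229/0.412/9.816/0.888 ms")

if __name__ == "__main__":
    for key in ("min", "avg", "max", "stddev"):
        ratio = modem[key] / lan[key]
        print("%-6s modem=%8.3f ms  lan=%8.3f ms  ratio=%.1fx" % (key, modem[key], lan[key], ratio))
```

A summary line hides how often the spikes happen, so the per-packet times from the full ping output are still worth looking at.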
-
@mkernalcon I can't tell with 100% certainty, but I'm pretty sure the DPC3848VE is a Puma 6 modem. https://badmodems.com/Forum/app.php/badmodems Note the Cisco 3848V entry, and it seems that Cisco sold this model to Technicolor. This thread further suggests that the Technicolor 3848VE is also a Puma 6 modem. If so, it's hot garbage and he needs to get his ISP to replace it. He can run a test too: http://www.dslreports.com/tools/puma6
-
I apologise for my earlier post - it came across very snarky! I understand you're working with what you've got so far. Definitely pushing everything through the SG1000 is the most-controlled - thus the most ideal - scenario.
Don't expect too much accuracy below 1ms timings. ~1 is fine. Don't sweat the small stuff, it's likely variances in timings and interrupts. It does sound like that upstream router/modem should [at minimum] be replaced.
-
Thanks @TheNarc, I had a hunch that the modem was doing something nasty - I just never think about leased hardware being bad by design. Glad to know I wasn't barking up the wrong tree on this side. I've advised him to get a replacement from his provider or consider buying his own.
And yes, he has always had mild connection issues - he noticed occasional poor call quality on the phone even when it was SW->IPSEC->SW (behind the same modem) before I touched anything, but the sg-1000 seems to have much more trouble with it.
And @moikerz - no worries; you'd probably yell at me worse if I told you how many VLANS I am successfully running off of how small a LAGG on the main office LAN (and that's probably not even my worst sin on this setup). Turns out that the IT culture at a construction company doesn't ask for much and is willing to put up with a lot. Of course what they do ask for tends to be either trivially easy or impossible shy of some hacks. Try telling some of the hard-hat side of the company that they are behind too many layers of NAT and let me know how that goes :P
And the only reason I mentioned the pings is because I saw SEVERAL pings over 10ms on a single cable. I do recognize that ping is a very poor indicator of connection quality.