UPG 2.1 -> 2.1.1: extremely high latency & packet loss with Intel IGB
-
Why you run 435GB disk??? :D
Because yes we can :P
( ;D)
It's a WD disk, WIFE picked it. She found out it used about the same energy as an SSD and was way, way cheaper than an SSD. Not for the life of me do I dare dispute WIFE (whom I dearly love btw, for more than two decades), so now I have a 500 GB disk which is 0% full ;D ;D ;D
To return the question: looking at the pic in your sig: why you own the internets :o
( ;D)
-
HAHAHAHAHAHAHHAA :D
-
@Hollander:
Why you run 435GB disk??? :D
Because yes we can :P
( ;D)
It's a WD disk, WIFE picked it. She found out it used about the same energy as an SSD and was way, way cheaper than an SSD. Not for the life of me do I dare dispute WIFE (whom I dearly love btw, for more than two decades), so now I have a 500 GB disk which is 0% full ;D ;D ;D
To return the question: looking at the pic in your sig: why you own the internets :o
( ;D)
I too have the same drive in my firewall. I knew it was way overkill, but a price difference of $10 from 250 GB to 500 GB was a no-brainer. Eventually I'll get an SSD for it, but for now it works.
-
Ugh, that's a PRO/1000 VT NIC. Those things are horrible. I actually lit one on fire once because I couldn't get it to work correctly in vSphere.
EDIT: Before anyone asks, I was angry and I do strange things when sleep-deprived.
-
Ugh, that's a PRO/1000 VT NIC. Those things are horrible. I actually lit one on fire once because I couldn't get it to work correctly in vSphere.
EDIT: Before anyone asks, I was angry and I do strange things when sleep-deprived.
I finally got it to work in my Dell, and I am now testing it in my mini-ITX. So far so good. But I can still return it if need be. I don't think I will do anything with vSphere; my boxes are dedicated to pfSense. But of course, given the horrors of getting this card to work: am I to expect more trouble with this card? Because I can still return it. But what then? The IBM Intel quad I had even crashed the Dell before pfSense booted, so that was no choice either.
I got these Dells used; new official Intel NICs easily cost 300 EUR or more, which is too expensive for my budget. If you could recommend me something better than what I have now, then please do: I don't want to run into new problems (if any) in a month or so, when I can no longer send them back :D
-
The problems I've had are specific to the VT NICs; no issues here with other Intel parts.
-
If I may disturb you all one more time ;D
I moved the Dell YT674 Intel quad NIC to my first machine, my Intel mini-ITX. I am experiencing rather high RTTs (as shown in the dashboard GUI), in the area of 130-150 ms, fluctuating back down to no lower than around 40 ms for VDSL and 27 ms for cable.
The problem is: there are so many things running at the same time that it is hard to pinpoint what the cause might be. Is it a (tweakable?) problem with this Dell NIC, is it a general (tweakable) pfSense problem, or is it the fault of my ISPs? I am running out of ideas on how to find out what is going on.
I do know that before all the advanced features I have now (OpenVPN to PIA, Traffic Shaper, VLANs, RADIUS Enterprise (certificates), Snort), my dual WAN showed reasonable RTTs of around 20 ms for VDSL and around 7 ms for cable.
So I started over:
- Fresh reinstall of pfSense 2.1.2. No tweaking except for the previously mentioned settings in /boot/loader.conf.local:
[code]
#for intel nic
kern.ipc.nmbclusters="131072"
hw.igb.num_queues=1
hw.igb.rxd=4096
hw.igb.txd=4096
[/code]
- Set up single WAN and LAN with my two internal Intel mobo NICs.
- Set up dual WAN using the first port of the Dell/Intel quad NIC.
- Install the other packages: dual-WAN failover, PIA VPN, Traffic Shaper.
Experience the high RTT. Next:
- Disable traffic shaper. No result.
- Try different combinations of putting the WAN- and LAN-cables into the fixed mobo-nic's and the Dell nic; no result.
I am wondering what I should do next :-[
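(One way to narrow down NIC vs. ISP from the pfSense shell, as a sketch: ping the same target out of each WAN directly by binding the source address, so the LAN side, shaper, and VPN are out of the path. 8.8.8.8 is just an example target, and the -S addresses below are placeholders for the actual WAN IPs.)
[code]
# FreeBSD ping: -c = count, -S = source address to bind
ping -c 10 -S 192.0.2.10 8.8.8.8     # out of the on-board NIC's WAN
ping -c 10 -S 198.51.100.10 8.8.8.8  # out of the quad NIC's WAN2
[/code]
If both paths show the same inflated RTT, the quad NIC is unlikely to be the cause.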
A traceroute to google.com from within Windows 7 shows:
[code]
Tracing route to google.com [173.194.70.138]
over a maximum of 30 hops:

1 90 ms 78 ms 69 ms x.x-x-x.adsl-dyn.isp.belgacom.be [x.x.x.x]
2 69 ms 30 ms 30 ms lag-71-100.iarmar2.isp.belgacom.be [91.183.241.208]
3 48 ms 35 ms 64 ms ae-25-1000.iarstr2.isp.belgacom.be [91.183.246.108]
4 * * * Request timed out.
5 86 ms 103 ms 66 ms 94.102.160.3
6 49 ms 41 ms 62 ms 94.102.162.204
7 133 ms 42 ms 71 ms 74.125.50.21
8 89 ms 80 ms 73 ms 209.85.244.184
9 48 ms 41 ms 65 ms 209.85.253.94
10 106 ms 100 ms 81 ms 209.85.246.152
11 91 ms 97 ms 92 ms 209.85.240.143
12 116 ms 112 ms 85 ms 209.85.254.118
13 * * * Request timed out.
14 59 ms 52 ms 47 ms fa-in-f138.1e100.net [173.194.70.138]
[/code]
The same one from within the pfSense CLI shows:
[code]
traceroute google.com
traceroute: Warning: google.com has multiple addresses; using 173.194.112.230
traceroute to google.com (173.194.112.230), 64 hops max, 52 byte packets
1 10.192.1.1 (10.192.1.1) 43.377 ms 46.749 ms 44.185 ms
2 hosted.by.leaseweb.com (37.x.x.x) 46.697 ms 45.143 ms 50.235 ms
3 46.x.x.x (x.x.x.x) 43.744 ms 48.999 ms xe-1-1-3.peering-inx.fra.leaseweb.net (46.x.x.x) 65.264 ms
4 de-cix10.net.google.com (80.81.192.108) 111.169 ms 106.334 ms google.dus.ecix.net (194.146.118.88) 53.519 ms
5 209.85.251.150 (209.85.251.150) 59.617 ms 209.85.240.64 (209.85.240.64) 51.161 ms 64.431 ms
6 209.85.242.51 (209.85.242.51) 71.756 ms 56.882 ms 62.243 ms
7 fra02s18-in-f6.1e100.net (173.194.112.230) 78.484 ms 70.885 ms 69.981 ms
[/code]
(For some strange reason this appears to be going through PIA/OpenVPN, although in the LAN firewall rules I only arranged for one server on my LAN to go through the VPN, and that server is not my pfSense.)
So I am lost :-[
[b]Most important for me is to know whether this could be a problem with my Dell/Intel quad NIC (as I can still return it to the shop now), or whether this is another problem.[/b]
Would anybody be able (and willing ;D) to suggest some next steps to find the root cause of this problem? Is it the NIC, or is the NIC fine?
Thank you in advance once again very much :P
Bye,
-
Check System: Routing: Gateways: what is the system default gateway?
The pfSense box itself will always use the default gateway. Since the VPN gateway was probably added most recently it may have become the default.
Steve
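(As a quick sketch of how to check this from the shell rather than the GUI: the routing table shows which gateway currently holds the default route.)
[code]
# show the current default route
netstat -rn | grep default
# ask the kernel which route/gateway a given destination would use
route -n get 8.8.8.8
[/code]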
-
@Hollander:
Why you run 435GB disk??? :D
Because yes we can :P
( ;D)
It's a WD disk, WIFE picked it. She found out it used about the same energy as an SSD and was way, way cheaper than an SSD. Not for the life of me do I dare dispute WIFE (whom I dearly love btw, for more than two decades), so now I have a 500 GB disk which is 0% full ;D ;D ;D
To return the question: looking at the pic in your sig: why you own the internets :o
( ;D)
No way a mechanical HD uses the same power as an SSD, except in some corner cases, and that's with the SSD's peak power draw occurring at 500MB/s bi-directional, so 1000MB/s of throughput. Looking at the low-power notebook hard drives from WD, their lowest power usage is around 0.13 watts, but that's fully powered off with just the controller running, which is the same power draw as the Samsung 840 EVO. Once the HD un-parks the head and turns back on, the mechanical HD is about 4x the idle power draw of the EVO and about 6x when under load. Except the EVO at 100% load is a heck of a lot faster: I was able to upgrade (2.1 -> 2.1.2) and reboot in under 1 minute, including a full back-up.
Seeing that my pfSense box's HD light blinks every ~30 seconds, my guess is your HD is always on, with no time to shut down and park to save power.
The main reason I went with a SSD is they have about 1/2 the 30 day return failure rate and about 1/4 the warranty failure rate of mechanical drives. Not to mention I don't need to worry about bumping my box or heat issues.
A hybrid drive could technically reach nearly the same power usage for a zero-write-few-read load, but I think pfSense is writing during those 30-second blinks, which requires writing to the platters.
Be careful with any mechanical HD power-down settings on an appliance-type computer, like a firewall or HTPC. IO patterns on appliance devices tend to bring out pathological cases, and mechanical HDs have a rated lifetime maximum number of spin-ups.
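(If you want to see how many spin-ups a drive has actually accumulated, here's a sketch using smartmontools, assuming the disk shows up as ada0:)
[code]
# SMART attributes 4/12/193 track spin-ups, power cycles and head load cycles
smartctl -A /dev/ada0 | egrep -i 'start_stop|power_cycle|load_cycle'
[/code]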
-
Check System: Routing: Gateways: what is the system default gateway?
The pfSense box itself will always use the default gateway. Since the VPN gateway was probably added most recently it may have become the default.
Steve
Thank you once again Steve ;D
But no, that wasn't the problem: WAN(1, VDSL) is the default GW, as the screenshot shows.
Would you know any other means of testing where the problem might lie / whether this NIC is the cause (if it is, I have to return it this week at the latest)?
-
No way a mechanical HD uses the same power as an SSD, except in some corner cases, and that's with the SSD's peak power draw occurring at 500MB/s bi-directional, so 1000MB/s of throughput. Looking at the low-power notebook hard drives from WD, their lowest power usage is around 0.13 watts, but that's fully powered off with just the controller running, which is the same power draw as the Samsung 840 EVO. Once the HD un-parks the head and turns back on, the mechanical HD is about 4x the idle power draw of the EVO and about 6x when under load. Except the EVO at 100% load is a heck of a lot faster: I was able to upgrade (2.1 -> 2.1.2) and reboot in under 1 minute, including a full back-up.
Seeing that my pfSense box's HD light blinks every ~30 seconds, my guess is your HD is always on, with no time to shut down and park to save power.
The main reason I went with a SSD is they have about 1/2 the 30 day return failure rate and about 1/4 the warranty failure rate of mechanical drives. Not to mention I don't need to worry about bumping my box or heat issues.
A hybrid drive could technically reach nearly the same power usage for a zero-write-few-read load, but I think pfSense is writing during those 30-second blinks, which requires writing to the platters.
Be careful with any mechanical HD power-down settings on an appliance-type computer, like a firewall or HTPC. IO patterns on appliance devices tend to bring out pathological cases, and mechanical HDs have a rated lifetime maximum number of spin-ups.
Thanks, I didn't know this (of course not :P ) ;D
Then again, last year, when I installed this system, I read on this forum many times that SSDs are ruined by running pfSense on them. And as they were rather expensive, I stuck with the mechanical drives. So SSDs are now safe to use, I assume?
-
I doubt that your new NIC has anything to do with the fact that your ping traffic is going via the VPN. You may have an option to redirect all traffic via the VPN in the setup.
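(If the goal is to keep the box's own traffic off the tunnel unless a policy-routing rule explicitly sends it there, one common approach, assuming a fairly standard OpenVPN client setup, is to stop the client from installing the routes the server pushes:)
[code]
# in the OpenVPN client's custom options:
# ignore server-pushed routes so only firewall policy routing
# (e.g. the one LAN server's rule) uses the tunnel
route-nopull
[/code]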
I seem to remember discussing the HD when you were thinking about buying it and coming to the conclusion that any saving made in power consumption was more than offset by the cost of an SSD. Like Harvy said, I doubt it is spinning down, but you don't want it spinning down frequently anyway.
Early consumer-level SSDs were bad. The wear-levelling systems used were not up to the job. Worse, some drives actually had bad firmware that would brick the drive well before it was worn out. Hence SSDs got a bad reputation. Current SSDs are much better. If you look online there are a number of reviews where people have tried to kill SSDs by writing to them continuously and failed. Some are still going after many hundreds of TB! One remaining issue is that of data corruption in the event of power loss (not a problem for you as you have a UPS), but there are drives now that address this by having on-board energy storage to allow them to write out any cached data.
Steve
-
@Hollander:
Then again, last year, when I installed this system, I read on this forum many times that SSDs are ruined by running pfSense on them. And as they were rather expensive, I stuck with the mechanical drives. So SSDs are now safe to use, I assume?
That was BS to start with.
-
;D
Yes, there was (is) a tremendous amount of FUD in that thread. ::)
Steve
-
To be honest, I don't care about the HDD. It is the high latency that is bothering me. Is there anything else I can do to try to find out whether the NIC is the cause, or something else?
-
Could nobody help me find out the cause? :-[
I need to find out whether the Intel quad NIC is the cause, because I can return it no later than this week.
I attached a new pic :-\
WAN_PPPoE is now on the internal mobo NIC, WAN2 is on the Intel quad NIC. PIA VPN goes over WAN_PPPoE.
Is this perhaps a 2.1.2 thing? Because before, on 2.1, I didn't have this. Or does the VPN cause this? Some strange 'interaction' between the Intel quad NIC and the two Intel onboard NICs?
How can I debug the cause?
Peep :-[
Thank you for any help :D
-
Remove any loader.conf.local changes you've made for Intel NICs. The newer drivers in 2.1.1+ don't require the mods that used to be recommended for some systems.
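(A sketch of one way to do that from the shell without losing the old settings:)
[code]
# set the tweaks aside rather than deleting them, then reboot cleanly
mv /boot/loader.conf.local /boot/loader.conf.local.bak
reboot
[/code]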
If that doesn't fix it then light the card on fire and buy an i350. The fewer VT cards that remain in this world the better.
-
Those ping times all look high to me, but clearly 1.3 s is ridiculous. I did see some change in ping times when I upgraded to 2.1.2, but because the PPP sessions are restarted that's not unusual. I'm not seeing anything like that.
Did you not re-install completely at one point? Did you immediately see high latency? Was that with some loader tweaks?
Steve
-
Thank you to both of you for replying ;D
@Jason: the i350 cards are 300 EUR each, and I would need two (two boxes). That is a load of money for a home user. I bought the current cards for 70 EUR each, about 25% of that.
Steve: yes, this was a fresh install of 2.1.2, as upgrading never worked for me. I have removed, per Jason's suggestion, the /boot/loader.conf.local tweaks (and of course rebooted), and I have disabled VPN, the traffic shaper, and Snort, so basically nothing is running except for the core system.
WAN1 is on the internal Intel NICs of this Intel mini-ITX machine (an extremely kind man once recommended this board to me ;D ); WAN2, cable, is using the Dell/Intel quad NIC.
These were the tweaks in /boot/loader.conf.local:
[code]
kern.cam.boot_delay=3000

#for intel nics
##kern.ipc.nmbclusters="131072"
##kern.ipc.nmbclusters=131072
##hw.igb.num_queues=1
#hw.igb.num_queues="1"
##hw.igb.rxd=4096
##hw.igb.txd=4096
#hw.igb.enable_msix="0"

#intel acknowledge license
legal.intel_ipw.license_ack=1

#for squid
#kern.ipc.nmbclusters="131072"
##kern.maxfiles=65536
##kern.maxfilesperproc=32768
##net.inet.ip.portrange.last=65535
[/code]
I will now shut down this number 1 and remove the quad NIC. Then it will be as it was when I started with pfSense, and there should be no reason for any high latency, as I never had it before when I only used the two internal NICs.
-
:( >:( >:( - :'( :'( :'( - :o :o :o
::) This is getting weirder by the second ::)
Recap (please see hardware in my sig):
1. I have my first pfSense, the mini-ITX (=pfSense1), and my backup pfSense, the Dell R200 (=pfSense2).
2. I had bought an IBM Intel quad NIC I couldn't get to work. The other Intel quad NIC, the Dell, I could get to work in the Dell R200 (pfSense2).
2.a. I did a fresh pfSense install on pfSense2 with the Dell/Intel quad NIC in it (the installer assigned igb0-3 to the ports), and after that was working I put this Dell card into pfSense1. My thinking was: my backup pfSense is complete; all I have to do when my first pfSense goes wrong is switch the WAN/LAN cables from pfSense1 to pfSense2, power it on, and we are ready again.
3. I put that same Dell NIC in my pfSense1, and I am having problems with that very same Dell NIC in this pfSense1.
What I did next:
4. I removed that very same Dell NIC from my pfSense1 and put LAN and WAN on the internal Intel NICs of pfSense1. The high RTT stays exactly the same, [b]so even without the Dell quad NIC[/b] the problem remains.
5. I then put the very same Dell NIC in my pfSense2, the Dell R200 (where it previously was working fine). Now it gets even weirder:
5.a. On booting up, pfSense had forgotten all NIC assignments; during boot, I had to answer the questions about interface assignment. However:
5.b. Whereas when I set up the Dell R200 with the Dell quad NIC it had correctly attached the igb driver to the NICs, now, during the interface assignment, it suddenly assigns the em driver to them.
5.c. With this Dell NIC in the Dell R200, the latencies/RTTs are normal. That is: the way they have always been for me ever since pfSense 2.0 (screenshots). This suggests the Dell Intel quad NIC is not the problem. At least not in the Dell R200.
5.d. The Dell R200 crashed twice; on the LCD physically connected to the Dell I saw all kinds of weird signs on the screen, and nothing responded anymore. I don't know how to fetch crash logs :-[ I suspect this has to do with the em driver being assigned where it clearly should have been igb. I have no clue how to change the driver for the card. As a remark: on first install, when I set up this R200, the pfSense installer correctly selected the igb driver. Only after switching the cables and re-inserting the Dell NIC in the R200 did it, for some reason or other, assign the em driver, and I assume this is why it has now crashed twice. As the screenshot shows, there is also still an igb 'orphan' (albeit only 1 out of 4) somewhere in the configuration. I also don't know how to fix this.
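(For what it's worth, em and igb are separate FreeBSD drivers for different Intel silicon, so the same card should not normally flip between them. A sketch of how to see which driver actually attached, from the shell:)
[code]
# list PCI devices together with the driver that claimed each one
pciconf -lv | grep -B4 -i ethernet
# show which em/igb interfaces the kernel attached at boot
dmesg | egrep '^(em|igb)[0-9]'
[/code]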
[u]So, where am I now:[/u]
6. pfSense1 (Intel mini-ITX) on 2.0 was working correctly with the two internal Intel NICs. However, I can't get it to work correctly with the Dell/Intel quad NIC on 2.1.2. Moreover: the high latency remains when I remove the quad NIC. Which suggests the Dell quad NIC might not be the problem in pfSense1, but 2.1.2 is.
7. pfSense2 (the Dell R200) on 2.1.2 was working correctly with the Dell/Intel quad NIC the first time I freshly installed pfSense on it, when it assigned the igb driver correctly. On re-inserting this Dell NIC in pfSense2 (the backup), the installer had lost the previous NIC assignments, and moreover, where it previously had correctly assigned the igb driver, it now assigned the em driver. Hence, probably, the two crashes. However: in the short time (minutes) that it was up, the RTTs/latencies for both VDSL and cable were normal (screenshot). Which also suggests the Dell NIC might not be the problem (at least not in the R200); for this R200, only the wrongly assigned driver (em instead of igb) is. Which I don't know how to fix.
So, is the Dell NIC the problem?
- Perhaps not in the pfSense2 (the Dell R200);
- And perhaps also not in the pfSense1 (the Intel mini-ITX), as this machine keeps the same high RTT/latency with the Dell quad NIC removed. So it might be that 2.1.2 is the problem for my pfSense1.
–
So, this is all a stupid economist like me can make of this ( :P ). How can I proceed? There are a number of problems:
A. Is 2.1.2 the problem for pfSense1, whereas it isn't for pfSense2? How can I determine this, and how might I fix this?
B. How can I re-assign the correct igb driver on pfSense2 (the Dell R200), which suddenly decided, on re-plugging in the cables, that the driver is em instead of igb, probably causing the double crash?
I will now reply below separately with screenshots, so it stays understandable ;D
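(On question B: the interface-to-device assignments live in pfSense's XML config, so a sketch of how to check what is currently assigned, assuming console or SSH access:)
[code]
# show which device names (em0, igb0, ...) the interfaces are mapped to
grep '<if>' /conf/config.xml
[/code]
The driver itself (em vs. igb) is chosen by the kernel from the card's PCI ID, not by this file, so a changed name here usually means the hardware was re-detected differently.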
EDIT: I forgot: the /boot/loader.conf.local settings were identical.