UPG 2.1 -> 2.1.1: extremely high latency & packet loss Intel IGB
-
No way a mechanical HD uses the same power as an SSD, except in some corner cases; and peak power draw for an SSD comes at 500MB/s bi-directional, i.e. 1000MB/s of total throughput. Looking at the low-power notebook hard drives from WD, their lowest power usage is around 0.13 watts, but that's with the platters fully powered off and just the controller running, which is about the same draw as a Samsung 840 EVO at idle. Once the HD un-parks the heads and spins back up, the mechanical HD draws about 4x the idle power of the EVO and about 6x under load, except the EVO at 100% load is a heck of a lot faster. I was able to upgrade (2.1 -> 2.1.2) and reboot in under 1 minute, including a full backup.
Seeing that my pfSense box's HD light blinks every ~30 seconds, my guess is your HD is always on, with no time to shut down and park to save power. (You can verify this from the shell; see below.)
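If you want to check that write pattern rather than watch the LED, a minimal sketch, assuming a full install with console access; the device name ad0 is a guess (could be ada0, check dmesg):

# Per-device I/O statistics refreshed every second; writes appearing
# every ~30s mean the drive never gets a chance to spin down.
iostat -x -w 1 ad0
# Or a live GEOM view filtered to that device:
gstat -f '^ad0$'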
The main reason I went with an SSD is they have about 1/2 the 30-day return failure rate and about 1/4 the warranty failure rate of mechanical drives. Not to mention I don't need to worry about bumping my box, or about heat issues.
A hybrid drive could technically reach nearly the same power usage for a zero-write, few-read load, but I think pfSense is writing during those 30-second blinks, which requires writing to the platters.
Be careful with any mechanical HD's power-down settings on an appliance-type computer like a firewall or HTPC. IO patterns on appliance devices tend to bring out pathological cases, and mechanical HDs have a rated lifetime maximum number of spin-ups. A hedged example of setting a long standby timer is below.
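If you do want spin-down anyway, the timer can at least be set long; a sketch only, assuming a CAM-attached disk named ada0 and a drive that honours the ATA standby timer (check camcontrol(8) first):

# Set the idle-to-standby timer to 60 minutes so periodic 30-second
# writes don't turn into constant spin-up/spin-down cycles.
camcontrol standby ada0 -t 3600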
Thanks, I didn't know this (of course not :P ) ;D
Then again, last year, when I installed this system, I read on this forum many times that SSDs are ruined when running pfSense on them. And as they were rather expensive, I stuck with the mechanical drives. So SSDs are now safe to use, I assume?
-
I doubt that your new NIC has anything to do with the fact that your ping traffic is going via the VPN. You may have an option to redirect all traffic via the VPN in the setup.
I seem to remember discussing the HD when you were thinking about buying it, and coming to the conclusion that any saving made in power consumption was more than offset by the cost of an SSD. Like Harvy said, I doubt it is spinning down, but you don't want it spinning down frequently anyway.
Early consumer-level SSDs were bad. The wear-levelling systems used were not up to the job. Worse, some drives actually had bad firmware that would brick the drive well before it was worn out. Hence SSDs got a bad reputation. Current SSDs are much better. If you look online there are a number of reviews where people have tried to kill SSDs by writing to them continuously and failed; some are still going after many hundreds of TB!
One remaining issue is that of data corruption in the event of power loss (not a problem for you, as you have a UPS), but there are drives now that address this by having on-board energy storage to allow them to write out any cached data.
Steve
-
@Hollander:
Then again, last year, when I installed this system, I read on this forum many times that SSDs are ruined when running pfSense on them. And as they were rather expensive, I stuck with the mechanical drives. So SSDs are now safe to use, I assume?
That was BS to start with.
-
;D
Yes, there was (is) a tremendous amount of FUD in that thread. ::)
Steve
-
To be honest, I don't care about the HDD. It is the high latency that is bothering me. Is there anything else I can do to find out whether the NIC is the cause, or something else?
-
Nobody could help me find out the cause? :-[
I need to find out if the Intel quad NIC is the cause, because I can return it no later than this week.
I attached a new pic :-\
WAN_PPPoE is now on the internal mobo NIC, WAN2 is on the Intel quad NIC. PIA VPN goes over WAN_PPPoE.
Is this perhaps a 2.1.2 thing? Because before, on 2.1, I didn't have this. Or does the VPN cause this? Some strange 'interaction' between the Intel quad NIC and the two Intel onboard NICs?
How can I debug the cause?
Peep :-[
Thank you for any help :D
-
Remove any loader.conf.local changes you've made for Intel NICs. The newer drivers in 2.1.1+ don't require the mods that used to be recommended for some systems. (A sketch of what to comment out is below.)
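Something like this, assuming the usual suspects were what you added; comment out or delete whatever is actually in your /boot/loader.conf.local, then reboot:

#kern.ipc.nmbclusters="131072"
#hw.igb.num_queues=1
#hw.igb.rxd=4096
#hw.igb.txd=4096
#hw.igb.enable_msix=0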
If that doesn't fix it, then light the card on fire and buy an i350. The fewer VT cards that remain in this world, the better.
-
Those ping times all look high to me, but clearly 1.3s is ridiculous. I did see some change in ping times when I upgraded to 2.1.2, but because the PPP sessions are restarted that's not unusual. I'm not seeing anything like that.
Did you not re-install completely at one point? Did you immediately see high latency? Was that with some loader tweaks?
Steve
-
Thank you to both of you for replying ;D
@Jason: the I350 cards are 300 EUR each, and I would need two (two boxes). That is a load of money for a home user. I bought the current cards for 70 EUR each, about 25% of that.
Steve: yes, this was a fresh install of 2.1.2, as upgrading never worked for me. I have removed, per Jason's suggestion, the /boot/loader.conf.local tweaks (and of course rebooted), and I have disabled the VPN, traffic shaper and Snort, so basically nothing is running except the core system.
WAN1 is on the internal Intel NICs of this Intel mini-ITX machine (an extremely kind man once recommended this board to me ;D ); WAN2, cable, is using the Dell/Intel quad NIC.
These were the tweaks in /boot/loader.conf.local:
kern.cam.boot_delay=3000
#for intel nics
##kern.ipc.nmbclusters="131072"
##kern.ipc.nmbclusters=131072
##hw.igb.num_queues=1
#hw.igb.num_queues="1"
##hw.igb.rxd=4096
##hw.igb.txd=4096
#hw.igb.enable_msix="0"
#intel acknowledge license
legal.intel_ipw.license_ack=1
#for squid
#kern.ipc.nmbclusters="131072"
##kern.maxfiles=65536
##kern.maxfilesperproc=32768
##net.inet.ip.portrange.last=65535
I will now shut down number 1 and remove the quad NIC. Then it will be as it was when I started with pfSense, and there should be no reason for any high latency, as I never had this before when I used only the two internal NICs.
-
:( >:( >:( - :'( :'( :'( - :o :o :o
::) This is getting weirder by the second ::)
Recap (please see hardware in my sig):
1. I have my first pfSense, the mini-ITX (=pfSense1), and my backup pfSense, the Dell R200 (=pfSense2).
2. I had bought an IBM Intel quad NIC I couldn't get to work. The other Intel quad NIC, the Dell, I could get to work in the Dell R200 (pfSense2).
2.a. I did a fresh pfSense install on pfSense2 with the Dell/Intel quad NIC in it (the installer assigned igb0-3 to the ports), and after that was working I put this Dell card into pfSense1. My thinking was: my backup pfSense is completed; all I have to do when my first pfSense goes wrong is switch the WAN/LAN cables from pfSense1 to pfSense2, power it on, and we are ready again.
3. I put that same Dell NIC in my pfSense1, and I am having problems with that very same Dell NIC in this pfSense1.
What I did now:
4. I removed that very same Dell NIC from pfSense1 and put LAN and WAN on the internal Intel NICs of pfSense1. The high RTT stays exactly the same, so even without the Dell quad NIC the problem remains.
5. I then put that very Dell NIC in pfSense2, the Dell R200 (where it previously was working fine). Now it gets even weirder:
5.a. On booting up, pfSense had forgotten all NIC assignments; during boot-up I had to answer the questions about interface assignment. However:
5.b. Whereas when I set up the Dell R200 with the Dell quad NIC it had correctly named the NICs igb, now, during the interface assignment, it suddenly assigned the em driver to them.
5.c. With this Dell NIC in the Dell R200, the latencies/RTT are normal; that is, the way they have always been for me ever since pfSense 2.0 (screenshots). This suggests the Dell Intel quad NIC is not the problem, at least not in the Dell R200.
5.d. The Dell R200 crashed twice; on the LCD physically connected to the Dell I saw all kinds of weird signs on the screen, and nothing responded anymore. I don't know how to fetch crash logs :-[ (my best guess is at the end of this post). I suspect this has to do with the em driver being assigned where it clearly should have been igb. I have no clue how I can change the drivers for the card. As a remark: on first install, when I set up this R200, the pfSense installer correctly selected the igb drivers. Only after switching the cables and re-installing the Dell NIC in the R200 did it for some reason assign the em driver, and I assume this is why it has now crashed twice. As the screenshot shows, there also still is an igb 'orphan' (albeit only 1 out of 4) somewhere in the configuration. I also don't know how to fix this.
So, where am I now:
6. pfSense1 (Intel mini-ITX) on 2.0 was working correctly with the two internal Intel NICs; however, I can't get it to work correctly with the Dell/Intel quad NIC on 2.1.2. Moreover, the high latency remains when I remove the quad NIC, which suggests the Dell quad NIC might not be the problem in pfSense1, but 2.1.2 is.
7. pfSense2 (the Dell R200) on 2.1.2 was working correctly with the Dell/Intel quad NIC the first time I freshly installed pfSense on it, when it assigned the igb drivers correctly. On re-inserting this Dell NIC in pfSense2 (the backup), the installer had lost the previous NIC assignments, and moreover, where it previously had correctly assigned the igb drivers, it now assigned the em drivers. Hence, probably, why it crashed twice. However, in the short time (minutes) that it was up, the RTT/latencies for both VDSL and cable were normal (screenshot). This also suggests the Dell NIC might not be the problem (at least not in the R200); only the wrongly assigned drivers (em instead of igb) are, which I don't know how to fix.
So, is the Dell NIC the problem?
- Perhaps not in the pfSense2 (the Dell R200);
- And perhaps also not in the pfSense1 (the Intel mini-ITX), as this machine keeps the same high RTT/latency with the Dell quad NIC removed. So it might be that 2.1.2 is the problem for my pfSense1.
–
So, this is all a stupid economist like me can make of this ( :P ). How can I proceed? There are a number of problems:
A. Is 2.1.2 the problem for pfSense1, whereas it isn't for pfSense2? How can I determine this, and how might I fix it?
B. How can I re-assign the correct igb drivers on pfSense2 (the Dell R200), which suddenly decided, on re-plugging in the cables, that the driver is em instead of igb, probably causing the double crash?
I will now reply below separately with screenshots, so it stays understandable ;D
EDIT: I forgot: the /boot/loader.conf.local settings were identical.
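(Regarding the crash logs in 5.d: the only pointer I could find is that on a full install, crash data, if any was saved, should end up under /var/crash; is that right? The file names are my guess:

# From a shell or Diagnostics > Command: list any captured crash data.
ls -l /var/crash
# If an info file exists, it should summarise the panic:
cat /var/crash/info.0

If the directory only contains a 'minfree' file, I assume nothing was captured.)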
-
So first I removed the Dell/Intel quad NIC from pfSense1 (Intel mini-ITX). What remains are the two internal NICs of this Intel mobo. I plugged my HP switch into one of these ports, and then, one after another, both the VDSL and the cable into the other. The high latency/RTT remains without the Dell quad NIC.
[screenshots attached]
-
Next, I moved the Dell quad NIC to my pfSense2, the backup machine, the Dell R200. RTT/latency is how it has always been for me ever since my first installation of pfSense. So, for me, normal.
-
And, finally, the weird situation I now have on the Dell R200 (pfSense2, the backup machine). Suddenly the drivers are emX, but at the same time there is some weird 'orphan' igbX left over.
-
As so many times already, I am indebted for help in this complicated matter. Thank you very much for helping me out ;D
-
And this is all using a 2.1.2 64-bit full install?
Weird doesn't cut it; utterly bizarre is more like it. How can the em driver attach to the Pro/1000 VT NIC? ???
Could you give us the output of:
pciconf -lv | grep 20000
Steve
-
Hi Steve ;D
Yes, this is a fresh install of 2.1 AMD64 (as said, upgrades have never worked for me), then upgraded via the GUI to 2.1.1 and subsequently to 2.1.2, in the short time frame in which they became available.
Your command on the Dell, with the mysterious assignment of the NICs, gave:
[2.1.2-RELEASE][root@dell.workgroup]/root(1): pciconf -lv | grep 20000
em0@pci0:4:0:0:  class=0x020000 card=0x11bc8086 chip=0x10bc8086 rev=0x06 hdr=0x00
em1@pci0:4:0:1:  class=0x020000 card=0x11bc8086 chip=0x10bc8086 rev=0x06 hdr=0x00
em2@pci0:5:0:0:  class=0x020000 card=0x11bc8086 chip=0x10bc8086 rev=0x06 hdr=0x00
em3@pci0:5:0:1:  class=0x020000 card=0x11bc8086 chip=0x10bc8086 rev=0x06 hdr=0x00
bge0@pci0:6:0:0: class=0x020000 card=0x023c1028 chip=0x165914e4 rev=0x21 hdr=0x00
bge1@pci0:7:0:0: class=0x020000 card=0x023c1028 chip=0x165914e4 rev=0x21 hdr=0x00
[2.1.2-RELEASE][root@dell.workgroup]/root(2):
The same command on the pfSense1, the mini-ITX, which still has the high latencies despite the Dell NIC now being removed, gives:
[2.1.2-RELEASE][root@ids.workgroup]/root(1): pciconf -lv | grep 20000
em0@pci0:0:25:0: class=0x020000 card=0x20368086 chip=0x15028086 rev=0x04 hdr=0x00
em1@pci0:2:0:0:  class=0x020000 card=0x20368086 chip=0x10d38086 rev=0x00 hdr=0x00
[2.1.2-RELEASE][root@ids.workgroup]/root(2):
(Note: this pfSense1, the mini-ITX, also is a fresh install of 2.1 -> 2.1.1 -> 2.1.2. I had used it for over a year, but given that upgrades never worked, I freshly installed 2.1 after having been on 2.0 for a year. So there are no leftovers from previous trials and errors.)
So, do you agree with me that the still-remaining high RTT/latency on pfSense1 appears to be a problem of 2.1.2 with my hardware (after all, the Dell NIC is not inside it anymore, yet the high latency remains)?
And of course: is there a simple way for me to tell pfSense2 (the R200) to use the igb drivers it originally correctly installed? Or will I have to do a fresh install every time I move the switch and VDSL/cable Ethernet cables from pfSense1 to pfSense2? (That would be horrible.)
Thank you again Steve; in debt as always ;D
-
Yep, I agree that the latency remains without the Dell card in place.
So the interfaces on the Dell card appear as PCI Vendor ID: 8086 (Intel), PCI Device ID: 10BC. Consulting the list of hardware supported by Intel Gigabit drivers from FreeBSD 8.3 we see that this is:
http://svnweb.freebsd.org/base/release/8.3.0/sys/dev/e1000/e1000_hw.h?revision=234063&view=markup
#define E1000_DEV_ID_82571EB_QUAD_COPPER_LP 0x10BC
O.K., so that looks right; we know it's a quad copper NIC. However, that chip is supported, in FreeBSD 8.3, by the em(4) driver, not igb. Checking the other source files, it's also supported by the em driver in 8.1 (2.0.X) and in 10 (2.2). The actual driver used in 2.1.2 is not the FreeBSD 8.3 release version but a backport. I haven't got around to signing up for tools repo access yet, so I can't check exactly, but since it hasn't changed in 10 I think it's very unlikely to be anything other than em.
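For anyone wanting to repeat the lookup, a sketch, assuming a checked-out FreeBSD source tree; the attach tables that name each supported device ID live in the per-driver sources:

# Which driver claims the 82571EB quad-port device ID?
grep -n E1000_DEV_ID_82571EB_QUAD_COPPER_LP \
    sys/dev/e1000/if_em.c sys/dev/e1000/if_igb.c

A hit in if_em.c but not in if_igb.c means em(4) attaches to it.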
The same is true for the two on-board NICs in system 1.
Why were those NICs ever attached to the igb(4) driver? :-\ Were they perhaps returning a different PCI device ID before, for some reason? It doesn't seem to be a Pro/1000 VT card, as those use 82575GB controllers.
Anyway, since they appear to be correctly using the em driver, I suggest you try some of the loader variables with em instead of igb (a sketch below).
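A sketch only of the em(4) equivalents, using the stock FreeBSD 8.x tunable names; I can't vouch for every name against the backported driver, so add them to /boot/loader.conf.local one at a time and reboot between changes:

# Larger descriptor rings, analogous to the old hw.igb.rxd/txd tweaks.
hw.em.rxd=4096
hw.em.txd=4096
# Fall back from MSI-X only if interrupts look like the culprit.
#hw.em.enable_msix=0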
I'll try and re-read this thread because it seems like I may have misunderstood/forgotten something.
Steve
-
That is some real British Sherlock Holmes work you have done, Steve; thank you very much ;D
So now it appears not to be the VT? But then what is it?
I don't know what has happened, because:
1. The IBM card didn't work. So I had only one other card to try, the Dell.
2. This one single card from Dell I installed in both machines; first in the R200, then in the mini-ITX.
3. When I had finished installing the R200 (to be stashed away as a backup machine) I took note of which of the 6 ports was for what (WAN, WAN2, VLAN, LAN), removed the cables and shut down the Dell, to go work on my fresh re-install of the mini-ITX, into which I also put the Dell quad NIC.
4. Both machines assigned the igb driver to the Dell card on their first install (the mini-ITX still has all the igb interfaces after I removed the card yesterday).
5. The Dell worked perfectly on the igb driver the whole week I tested it, before shutting it down and storing it as a backup. So: with the igb driver.
6. Only yesterday did it suddenly decide to assign the em drivers to them, and it started crashing immediately.
I will of course put the em variables in /boot/loader.conf.local as you suggest, but point 5 strikes me.
I can also reinstall the Dell, but I am sure it will assign the igb again (I recall it doing that every time, since I had to reinstall the Dell 3 times before I got it to work).
So, while I will do the em-variables, my questions are:
1. What could I do about the mini-ITX? (This has somewhat higher priority than the Dell, since I cannot return the cards after this week, should I need to do so.)
2. Suppose I add the em variables on the Dell (and perhaps also on the mini-ITX): aren't things bound to become a mess, since I am sure references to igb are spread through different parts of the system's configuration?
Thank you once again for your help, Sir Steve ;D
-
I assume that the mini-ITX board is using the em driver for its on-board interfaces? Do you have any em tunables loading? Try playing with them if not. Check for errors in Status: Interfaces and in the logs. The latency could be some sort of excessive buffering or a huge error rate (are you seeing packet loss?). A sketch of some shell-side checks is below.
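If the GUI shows nothing, a few shell-side checks are worth a look; the ping target is just an example:

# Per-interface error and collision counters:
netstat -i
# Interrupt rates; a storm on the NIC's IRQ can explain high latency:
vmstat -i
# Loss and latency measured from the firewall itself:
ping -c 50 8.8.8.8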
If you put the card back in some other box and it appears as igb interfaces, run the pciconf command again and see if it's reporting a different device ID. There are only a couple of things I could imagine changing the device ID, e.g. the card firmware having been updated. I would expect that to be a manual process with multiple 'are you sure?'s ;) but I could just about imagine the Dell box talking to their card differently, or an Intel board updating the firmware somehow. Very odd.
Also, if you reinstall for any reason, I suggest you use a 2.1.2 CD directly, to eliminate any upgrade issues you may have coming from 2.1.
Steve
-
Thanks Steve ;D
Update: adding the em settings to the mini-ITX does nothing good, but one thing bad: I can no longer ping it or access the GUI. I had to go over to the console and manually edit these settings out again:
kern.ipc.nmbclusters="131072"
hw.em.num_queues=1
#hw.em.rxd=4096
#hw.em.txd=4096
The mini-ITX's onboard NICs indeed are em0 and em1. Status: Interfaces is clean: no in/out errors and no collisions on any of the interfaces. Packet loss occasionally happens in the GUI, especially when the RTT goes above around 100 ms.
I did not update any firmware, neither on the NIC nor on the mobo of the Dell, nor on the mini-ITX. I am assuming these boards don't update themselves over the internets without me knowing anything about it ( :-[ ).
I will now add the em settings to the Dell and see if that crashes again within minutes.
Thanks for your help Steve ;D