System load on WAN interface on two pfSense VMs

voodooless

We are running two pfSense 1.2.3 VMs in a vSphere 4 environment. Both are on the same WAN subnet. One of the pfSense machines has a multi-WAN config. Both machines are basically running fine.

Now we have a problem: every few hours we see a lot of NetBIOS traffic (NetBIOS name query broadcasts) on the WAN subnet coming from both machines. It goes on for hours and results in a huge load on the system process of the virtual NIC (Intel E1000).

When I disconnect one of the two interfaces, it stops for a few hours and then starts again… I have no clue why.

The source address of the packets sent from both machines is the same: 169.254.21.51, and the destination is the broadcast address 169.254.255.255.

jimp

That is the link-local (APIPA) range, 169.254.0.0/16, which a Windows box (or some other devices, like a Slingbox, IIRC) assigns itself when it does not get an IP by DHCP.

Not sure why the pfSense boxes are interacting with that traffic, unless they're just passing it through.
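For anyone sorting through captured addresses, here is a minimal sketch of how to tell a self-assigned link-local address apart from a private or public one, using Python's standard `ipaddress` module. The function name `classify` is made up for illustration, and the 192.168.x and 8.8.8.8 addresses in the usage note are just examples.

```python
import ipaddress


def classify(addr: str) -> str:
    """Roughly classify an IPv4 address seen in a capture.

    `classify` is a hypothetical helper, not part of any pfSense tooling.
    """
    ip = ipaddress.ip_address(addr)
    # Check link-local first: 169.254.0.0/16 is also reported as "private"
    # by the ipaddress module, so the order of the tests matters.
    if ip.is_link_local:
        return "link-local (self-assigned, no DHCP lease)"
    # Note: is_private covers RFC 1918 space plus some other reserved ranges.
    if ip.is_private:
        return "private or otherwise reserved"
    return "public"
```

With this, `classify("169.254.21.51")` reports link-local, while something like `classify("192.168.1.10")` falls into the private bucket and `classify("8.8.8.8")` comes back as public.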

voodooless

I don't know why the traffic would be forwarded… Just to be sure, I blocked NetBIOS traffic on the WAN interfaces. Now all seems to work just fine.

voodooless

Well, it still is not working correctly. It looks like pfSense forwards NetBIOS traffic from one of the LAN interfaces to WAN2. From there it keeps going back and forth between the two pfSense machines. I already block RFC 1918 networks on the WAN2 interface…

This time the NetBIOS message is from another IP range (192.168.1.x)...

jimp

pfSense does not forward broadcast traffic like that between segments. If you are seeing traffic from LAN on the WAN2 segment, the usual cause is that they are both plugged into the same ~~physical~~ switch, VLAN, etc.

Edit: forgot you were doing this all on VMs. Could still be the same vSwitch/virtual segment.

voodooless

No, those are separate virtual networks, with two different physical VLANs on the physical switches.

The problem just popped up again, despite setting the firewall to block any NetBIOS traffic on the WAN interface… I really don't get it.

jimp

Can you verify by packet capture/tcpdump if this traffic is actually flowing through the firewall?

It really sounds like it's bleeding through at the layer 2 level somehow, not in the firewall.

voodooless

Well, let's assume pfSense is not bleeding traffic to the WAN interface. Then:

- Why are the NetBIOS packets being sent from both pfSense machines?
- Why are these NetBIOS requests generating so much load? If it runs for several hours, pfSense goes completely dead!

jimp

@voodooless:

Well, let's assume pfSense is not bleeding traffic to the WAN interface. Then:

- Why are the NetBIOS packets being sent from both pfSense machines?
- Why are these NetBIOS requests generating so much load? If it runs for several hours, pfSense goes completely dead!

Your assumption and point #1 are contradictory. It's impossible to speculate without more information. Like I said, packet captures at a minimum, and the output of top -SH at the time would also be good.

High load that builds up over time because of weird traffic that should not be where you're seeing it, to me, screams "layer 2 loop".
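To illustrate why a loop produces load that never goes away, here is a toy sketch of broadcast flooding between bridges that have no loop protection. It is not a model of the actual network in this thread; the function `flood` and the bridge names A, B and C are hypothetical. Each bridge simply re-sends a broadcast frame out of every link except the one it arrived on.

```python
from collections import defaultdict


def flood(links, start, rounds):
    """Simulate broadcast flooding with no spanning tree or other loop protection.

    links  -- list of (a, b) undirected links between bridges (hypothetical names)
    start  -- bridge where a single broadcast frame is injected
    rounds -- number of forwarding steps to simulate
    Returns the number of frame copies in flight after each round.
    """
    neighbours = defaultdict(set)
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)

    # Each frame copy is tracked as (bridge it came from, bridge it is at now).
    in_flight = [(None, start)]
    history = []
    for _ in range(rounds):
        next_hop = []
        for came_from, node in in_flight:
            for n in neighbours[node]:
                if n != came_from:  # flood everywhere except the ingress link
                    next_hop.append((node, n))
        in_flight = next_hop
        history.append(len(in_flight))
    return history
```

On a loop-free chain, `flood([("A", "B"), ("B", "C")], "A", 5)` gives `[1, 1, 0, 0, 0]`: the frame dies out at the end of the chain. Close the loop with a third link and `flood([("A", "B"), ("B", "C"), ("C", "A")], "A", 5)` gives `[2, 2, 2, 2, 2]`: copies of that one frame keep circulating, and every further broadcast adds more on top.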

voodooless

Okay… I'll try doing a packet capture. The problem is that capturing while it is going wrong is not very interesting; the interesting part is capturing the moment it starts. Although that is not easy, I'll give it a try.

Top output is easy: about 50% to 90% system load on the affected network interface process. The rest is normal.

I also disabled the load balancer on the multi-WAN machine. Maybe this helps. For now I'm waiting for the disaster to happen again ;)

jjponce

Hey, I know this post is old, sorry about that, but I am looking for the same answer.

I also have 2 pfSense boxes running 2 RC1. I have noticed that when CARP is enabled, after approximately 20 minutes a NetBIOS broadcast starts to appear on the WAN.

The message that appears with tcpdump on the WAN is:

7:56:45.903707 IP 169.254.25.213.netbios-ns > 169.254.255.255.netbios-ns: NBT UDP PACKET(137): REGISTRATION; REQUEST; BROADCAST

If I shut down the secondary box, the broadcast disappears.

I think that is weird :-
Thanks for your help.
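In case it helps anyone going through a long capture, here is a rough sketch that counts NetBIOS name service packets per source address from tcpdump text output in the one-line format shown in this post. The function name `count_netbios_sources` and the regular expression are my own assumptions; they only handle plain IPv4 lines of this shape, with the port shown either as `netbios-ns` or as `137`.

```python
import re
from collections import Counter

# Matches lines like:
# 7:56:45.903707 IP 169.254.25.213.netbios-ns > 169.254.255.255.netbios-ns: NBT ...
LINE_RE = re.compile(
    r"^(?P<ts>\d{1,2}:\d{2}:\d{2}\.\d+) IP "
    r"(?P<src>\d+\.\d+\.\d+\.\d+)\.(?P<sport>[\w-]+) > "
    r"(?P<dst>\d+\.\d+\.\d+\.\d+)\.(?P<dport>[\w-]+): (?P<info>.*)$"
)


def count_netbios_sources(lines):
    """Count packets to the NetBIOS name service port, grouped by source IP.

    Hypothetical helper: lines that do not match the expected format are skipped.
    """
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and m.group("dport") in ("netbios-ns", "137"):
            counts[m.group("src")] += 1
    return counts
```

Feeding it the saved text of a capture, for example `count_netbios_sources(open("wan.txt"))` where `wan.txt` is a placeholder file name, shows which source addresses dominate and how many packets each one sent.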

jjponce

SOLVED

I think I got it, and yes, it's a layer 2 loop.

Both WAN interfaces are connected to one switch, and another link connects the two pfSense boxes to each other. So when enabling CARP with a dedicated link for sync, you have to create a rule on both firewalls that permits only the pfsync protocol and blocks everything else.

Saludos!

Itwerx

Just ran into this for the first time here as well, with two physical boxes running v1.2.1 w/CARP and separate switches for each private/public subnet. Interestingly, the only difference between this network and quite a few others we've worked on with pfSense was the presence of a number of new Windows 7 machines which had bad keys and all suddenly started looking for KMS servers at the same time (with lots of NBT broadcasts in the process). Confoundingly, it's a multi-WAN setup and we had just added the second link (on a separate switch, of course), so we thought maybe we had configured something wrong in the LB by accident. Fortunately we have an identically configured setup at another location, and after doing a line-by-line comparison of all the configs we determined it had to be a bug in pfSense.

A quick search came across this thread and we have implemented jjponce's suggestion of restricting traffic between the two WAN interfaces to pfsync and nothing else. (This was only done on the public-facing NICs, as they were the only interfaces exhibiting the problem; his reference to CARP interfaces may still apply under some circumstances.)

To clarify a bit further for anyone else seeing this: the traffic only appears on the public side and completely saturates the external NICs. If you do a packet capture, all the packets are NetBIOS addressed from/to 169.254.x.x (the actual IP varies, of course) and run up to the maximum bandwidth of the WAN link.
To reiterate that last point, the bandwidth utilization we observed was at the physical limits of the dedicated lines coming in, NOT the limits of the local hardware. This implies that pfSense is routing the broadcast packets out and they are getting reflected back by upstream devices(?). The multi-WAN in this case has lines coming from two different ISPs, both lines having bandwidth caps set by the ISPs, one at 35Mb and the other at 100Mb. All local hardware is Gb, but the traffic load was never more than what the lines were (externally) capped at.

We'll post again if jjponce's solution does not help, otherwise consider it the answer for now.