Traffic just stops

techie_g33k

At random intervals (sometimes hours after it last happened, other times a week or two), my pfSense box will just stop passing traffic from the WAN -> LAN and back out. I can SSH (or web) into the box w/o any problems and can ping inside/outside.
To fix it I just have to go to Interfaces -> WAN and click Save, and it comes back by the time the script reloads.

I have checked, and every time this happens the last entry in the Logs is always under Firewall and looks similar to this:

pf: 1. 772402 rule 332/0(match): block in on an0: (tos 0x0, ttl 255, id 35270, offset 0, flags [none], proto: UDP (17), length: 109) 64.187.xx.xxx.5353 > 224.0.0.251.5353: 0[|domain]

There are lots of those all throughout the Firewall logs, but I have nothing else to go by. I also get a "kernel: an0: xmit failed" on the "System" tab, but I have read that's caused by the firmware of the Cisco 350 PCI card I am using (I will fix it when I have time to pull the card and flash it back to an older firmware). This is not the last entry before traffic stops, though.

Below is my hardware (in case that helps):

pfSense 1.0 RC2
Dell Dimension 2400 (see hardware below):
-6GB Seagate IDE hard drive
-onboard 10/100 ethernet (bfe0 - LAN)
-P4 2.4GHz
-128MB DDR233 RAM
-Cisco 350 PCI (an0 - WAN)
-Cisco AIR-CB21AG-?-K9 PCI (ath0 - WLAN)
I have the LAN connected to a Switch to allow for a constant Ethernet LINK even though all computers in my network connect via the wireless.

A co-worker with completely different hardware (other than they have a Cisco 350 PCI card also for WAN) has the same issue with the same fix. I don't know what he shows in his logs.

Any help would be great, otherwise I'll just hold out to see if it goes away in RC3.

Ohh ya, just FYI it was also seen in RC1.

Thanks!
Logan

hoba

What type of WAN do you use? DHCP, PPPoE, …? Is there anything in the logs concerning unsuccessful renewal of DHCP WAN for example? Do you have the possibility to try with another NIC for WAN (not an)? Maybe it's just a hardware issue with this kind of NIC.

techie_g33k

I am running Static (so that if the DHCP server crashes I never lose my IP).
I talked with the System Admin and he runs pfSense Beta4 (Beta3 wouldn't allow Static IPs to be set and still pass traffic) and never sees this issue that my co-worker and I are seeing on RC1/2. So for now he is going to stay on Beta4 and avoid possibly becoming #3 in our group w/ this odd and random issue.

Is there a way I can get a more in-depth output of what's failing? Maybe look from SSH for a certain process that should be there but isn't, or something like that?
I have also tried this w/ and w/o any Packet Filtering rules set up and had the same effect.

I can see if we have another type of card we can use and let you know. Remember though that the pfSense box (SSH/web) can ping out on both the LAN and WAN side; it seems that just the rules that allow the traffic from my LAN to pass out the WAN stop working. You can also SSH/web in from the WAN side.

hoba

Atm I would try swapping the nics as you get an error in the logs concerning the nic and this is the only thing that is common between these systems. I have not heard from anybody else with this problem yet.

techie_g33k

I will have the techs at my office ensure that I take a Cisco 350 card with old enough FW that it doesn't error (they know which version works, I can't remember) and try it out and let you know :)

Thanks for working with me so much :)

wacko

I'm reopening this topic, since I'm seeing exactly the same behaviour with the current snapshot (1.2 BETA).

Was there any solution back then?

So again the behaviour I'm seeing is: after some random time pfsense simply stops forwarding (new) packets and I get many firewall blocks of the form:


Jun  1 15:55:50 192.168.6.1 pf: 000017 rule 167/0(match): block in on em2: x.y.z.v.3727 > 85.75.180.33.11206: UDP, length 108
Jun  1 15:55:51 192.168.6.1 pf: 1. 205282 rule 167/0(match): block in on em2: x.y.z.v.3727 > 89.79.20.136.17147: UDP, length 24
Jun  1 15:55:51 192.168.6.1 pf: 000018 rule 167/0(match): block in on em2: x.y.z.v.3727 > 84.252.27.137.9767: UDP, length 24
Jun  1 15:55:51 192.168.6.1 pf: 000019 rule 167/0(match): block in on em2: x.y.z.v.3727 > 83.226.157.41.8833: UDP, length 24
Jun  1 15:55:52 192.168.6.1 pf: 831579 rule 167/0(match): block in on em2: x.y.z.v.3727 > 85.75.180.33.11206: UDP, length 108

where x.y.z.v is my internal IP, em2 is the WAN interface, and the rule that suddenly applies is the default "block all" rule.
It seems that my packets can pass out, but the related packets which are being sent back are not recognized as related, and hence blocked.
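To see at a glance which rule and interface account for the blocks, the log lines above can be tallied. This is only a minimal sketch in Python: the regular expression is derived from the line format quoted in this thread, and the function name `tally_pf_log` is made up for illustration. Other pfSense versions may log in a different format.

```python
import re
from collections import Counter

# Matches the "rule N/M(match): ACTION DIRECTION on IFACE:" part of the
# pf log lines quoted in this thread (format assumed from those samples).
PF_RULE_RE = re.compile(r"rule (\d+)/\d+\(match\): (\w+) (in|out) on (\w+):")


def tally_pf_log(lines):
    """Count log entries per (rule number, action, direction, interface)."""
    counts = Counter()
    for line in lines:
        match = PF_RULE_RE.search(line)
        if match is None:
            continue  # not a pf rule-match line, e.g. a kernel message
        rule, action, direction, iface = match.groups()
        counts[(int(rule), action, direction, iface)] += 1
    return counts
```

Feeding it the lines of a saved syslog file and printing `most_common()` on the result would show whether a single rule (here, the default block rule on the WAN interface) is responsible for nearly all entries.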

Any already established connections STAY ALIVE; it is only new connections that cannot be established. Additionally, I do not see any hardware error before this happens. The NICs are all Intel.
After some minutes (say 5-10) everything is back to normal, and new connections are again possible. I looked through all the logs, and cannot find any hint why this happens.

At first I thought this might be happening because I use a bridge between LAN and WAN, but seeing this topic makes me think it is something else.

I'm running out of ideas for what else I can test to get this working stably.

Regards
Arno

sullrich

Increase your state table size.

wacko

Thanks for the quick reply.

That was exactly what I was suspecting myself, since all the symptoms (I made a few more tests) point this way. However, the RRD graph shows a maximum peak of about 2k states with an average around 200-600. I used the default 10k states, and thought that this would be OK since the usage is way under the maximum limit.

Anyway, I increased the states to 100k (the machine has 2GB RAM), and so far the effect has not appeared again. However, it's also Friday evening and nobody is working anymore, so the real test will be on Monday. But I've got the feeling this solves it. I'll keep you informed.

best regards
Arno

wacko

.. unfortunately increasing the states does not seem to do the trick: today the effect reappeared, and I was blocked for about 10 minutes from the outside. I could still check the webinterface and saw that I had something around 50 states… :( - I was completely alone in the whole office, hence the low state number.

I updated now to the newest snapshot from today around 13:00 with the hope that the problem magically disappears :).

However, having looked at the changes made, I doubt that. Hence, can somebody advise me what to check when the situation occurs again? What kind of commands would help to identify the problem? (I checked the interfaces: they were all up.) I have remote syslogging running; however, nothing in it looks suspicious to me when it occurs. Again, is there something I should look for?

Best regards
Arno

sullrich

Sounds like hardware glitches of some sort. Maybe try replacing your interfaces with Intel nics if they are not already. Turn off all unneeded options in the bios. Ensure the bios is up to date. Turn off plug and play support in the bios.

wacko

Thanks for the quick reply..

however.. been there, done that: all NICs are Intel. The BIOS is stripped down to only what is necessary. The machine was actually purchased specifically as a pfsense firewall and is brand new.

Actually I doubt it is hardware.. since from the webinterface I can still access all connected networks… (ping). It's just that everything which goes through the filter is suddenly not allowed anymore (remember: active connections STAY ALIVE).

I even tried - when the effect happens - to deactivate the filtering bridge: no effect (besides that, of course, now the active connections broke too). Also a "reset states" did not make any change. I still kept getting the "block default rule" messages in the syslog - at the same time it is possible to log on via ssh and ping in any direction. So to me, the states (i.e. pf itself) look much more like the guilty party.

Hmmm.. if I change the maximum number of states do I have to reboot? (The webinterface does not say anything, so I believe it is changed dynamically.)

In order to check whether pf is still working correctly when it happens - is there a command I can run, so we can draw conclusions later?
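One way to capture evidence while the problem is happening is to save the output of a few diagnostic commands with a timestamp. The sketch below is only an illustration: `pfctl -s info`, `pfctl -s memory` and `netstat -i` are standard FreeBSD commands, but choosing these three is a guess, the function name `snapshot` is made up, and it assumes Python 3.7+ is available on the box (otherwise the same commands can simply be run by hand over SSH).

```python
import subprocess
import time

# Standard FreeBSD/pf commands; which ones are actually useful here is a guess.
DEFAULT_COMMANDS = [
    ["pfctl", "-s", "info"],    # state table counters, including current entries
    ["pfctl", "-s", "memory"],  # configured hard limits, including the state limit
    ["netstat", "-i"],          # per-interface packet and error counters
]


def snapshot(commands=DEFAULT_COMMANDS):
    """Run each command and return one timestamped text report."""
    parts = [time.strftime("=== %Y-%m-%d %H:%M:%S ===")]
    for cmd in commands:
        parts.append("$ " + " ".join(cmd))
        try:
            result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
            parts.append(result.stdout + result.stderr)
        except (OSError, subprocess.TimeoutExpired) as exc:
            # Command missing or hung: note it and continue with the rest.
            parts.append(f"(could not run: {exc})")
    return "\n".join(parts)
```

Appending the returned report to a file once while things work and once during an outage would allow the current state count and the configured limit to be compared afterwards.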

Best regards
Arno

cmb

What you're seeing with that blocked traffic is normal out-of-state dropped traffic. You'll always see it; you're wrongly associating that with a problem.

Need more info. What kind of Internet connection? What's your WAN config, static, DHCP, PPPoE, …? When it happens, can you access the Internet from pfsense itself (ping google.com or something)?

wacko

Hi there,

first the good news: the problem really seems to be gone - at least it has not occurred again since updating to the snapshot mentioned in my last post.

However, the reason why that is so, is still in the dark.

I changed since then only two things:

updated the states to 500000
updated the image.

Regarding states, I see even in heaviest times like 1000-2000 states - this is still very, very far away from even the standard 10000.

Again here my setup:

Internet routing x.y.z.w/25, GW x.y.z.129 ---- x.y.z.130 (WAN-IF)---pfsense (transparent bridge mode)-----x.y.z.135 (LAN_IF)------ clients in the range x.y.z.w/25 via DHCP (without the used ones)

I agree that the blocked traffic looks like out-of-state; however, when the situation occurred the GUI showed me far fewer states than had been configured.

Anyway.. for now the problem is solved and I hope it stays like that. If it recurs, I'll try to give even more details.

Best regards and keep up the good work!
Arno