Strange Reboot-Problems on NexCom 1085L (Oversized Packets?)



  • Hello!

    I've just got a demo device from NexCom with 4 Gbit and 4 FE ports (NSA 1085L) and set up pfSense on it (1.0.1-SNAPSHOT-02-21-2007).

    At first everything seemed to work perfectly, but then this happened, and it keeps happening from time to time:
    The FW becomes unresponsive, reboots or simply locks up.

    I then started to log everything and found these entries:

    
    2007-03-08 10:47:04	Kernel.Critical	80.190.152.2	Mar  8 10:46:44 kernel: em1: discard oversize frame (ether type 800 flags 3 len 16444 > max 1514)
    2007-03-08 10:47:05	Kernel.Critical	80.190.152.2	Mar  8 10:46:45 kernel: em1: discard oversize frame (ether type 0 flags 3 len 65531 > max 1514)
    2007-03-08 10:47:05	Kernel.Critical	80.190.152.2	Mar  8 10:46:45 kernel: em1: discard oversize frame (ether type 0 flags 3 len 51451 > max 1514)
    2007-03-08 10:47:05	Kernel.Critical	80.190.152.2	Mar  8 10:46:45 kernel: em1: discard oversize frame (ether type 6e flags 3 len 10236 > max 1514)
    2007-03-08 10:47:05	Kernel.Critical	80.190.152.2	Mar  8 10:46:45 kernel: em1: discard oversize frame (ether type 5245 flags 10003 len 17226 > max 1514)
    2007-03-08 10:47:05	Kernel.Critical	80.190.152.2	Mar  8 10:46:45 kernel: em1: discard oversize frame (ether type e20a flags 3 len 15612 > max 1514)
    2007-03-08 10:47:06	Kernel.Critical	80.190.152.2	Mar  8 10:46:45 kernel: em1: discard oversize frame (ether type 2064 flags 10003 len 29250 > max 1514)
    2007-03-08 10:47:06	Kernel.Critical	80.190.152.2	Mar  8 10:46:45 kernel: em1: discard oversize frame (ether type cec0 flags 10003 len 62815 > max 1514)
    2007-03-08 10:47:06	Kernel.Critical	80.190.152.2	Mar  8 10:46:46 kernel: em1: discard oversize frame (ether type 302e flags 10003 len 24944 > max 1514)
    2007-03-08 10:48:30	Kernel.Critical	80.190.152.2	Mar  8 10:48:07 kernel: em1: watchdog timeout -- resetting
    2007-03-08 10:48:30	Kernel.Notice	80.190.152.2	Mar  8 10:48:07 kernel: em1: link state changed to DOWN
    2007-03-08 10:48:30	Kernel.Notice	80.190.152.2	Mar  8 10:48:10 kernel: em1: link state changed to UP
    2007-03-08 10:48:31	User.Notice	80.190.152.2	Mar  8 10:48:11 check_reload_status: rc.linkup starting
    2007-03-08 10:48:31	User.Warning	80.190.152.2	Mar  8 10:48:11 php: : Arguments passed rc.linkup.  ''
    2007-03-08 10:48:31	User.Warning	80.190.152.2	Mar  8 10:48:11 php: : Incorrect number of arguments passed rc.linkup...exiting.
    2007-03-08 10:50:30	Daemon.Info	80.190.152.2	Mar  8 10:50:12 dnsmasq[451]: read /etc/hosts - 2 addresses
    2007-03-08 10:50:30	Daemon.Info	80.190.152.2	Mar  8 10:50:12 dnsmasq[451]: reading /etc/resolv.conf
    2007-03-08 10:50:30	Daemon.Info	80.190.152.2	Mar  8 10:50:12 dnsmasq[451]: using nameserver 212.123.96.110#53
    2007-03-08 10:50:30	Daemon.Info	80.190.152.2	Mar  8 10:50:12 dhcpd: Internet Systems Consortium DHCP Server V3.0.5
    2007-03-08 10:50:30	Daemon.Info	80.190.152.2	Mar  8 10:50:12 dhcpd: Copyright 2004-2006 Internet Systems Consortium.
    2007-03-08 10:50:30	Daemon.Info	80.190.152.2	Mar  8 10:50:12 dhcpd: All rights reserved.
    2007-03-08 10:50:30	Daemon.Info	80.190.152.2	Mar  8 10:50:12 dhcpd: For info, please visit http://www.isc.org/sw/dhcp/
    2007-03-08 10:50:30	Local7.Info	80.190.152.2	Mar  8 10:50:12 dhcpd: Internet Systems Consortium DHCP Server V3.0.5
    2007-03-08 10:50:30	Local7.Info	80.190.152.2	Mar  8 10:50:12 dhcpd: Copyright 2004-2006 Internet Systems Consortium.
    2007-03-08 10:50:30	Local7.Info	80.190.152.2	Mar  8 10:50:12 dhcpd: All rights reserved.
    2007-03-08 10:50:30	Local7.Info	80.190.152.2	Mar  8 10:50:12 dhcpd: For info, please visit http://www.isc.org/sw/dhcp/
    2007-03-08 10:50:30	Local7.Info	80.190.152.2	Mar  8 10:50:12 dhcpd: Wrote 0 leases to leases file.
    2007-03-08 10:50:30	Local7.Info	80.190.152.2	Mar  8 10:50:12 dhcpd: Listening on BPF/em0/00:10:f3:05:ee:d9/10/8
    2007-03-08 10:50:30	Local7.Info	80.190.152.2	Mar  8 10:50:12 dhcpd: Sending on   BPF/em0/00:10:f3:05:ee:d9/10/8
    2007-03-08 10:50:30	Local7.Info	80.190.152.2	Mar  8 10:50:12 dhcpd: Sending on   Socket/fallback/fallback-net
    2007-03-08 10:50:30	Daemon.Info	80.190.152.2	Mar  8 10:50:13 mpd: mpd: pid 483, version 3.18 (root@builder6.pfsense.com 13:56 13-Feb-2007)
    2007-03-08 10:50:30	Daemon.Info	80.190.152.2	Mar  8 10:50:13 mpd: [pt0] ppp node is "mpd483-pt0"
    2007-03-08 10:50:30	Daemon.Info	80.190.152.2	Mar  8 10:50:13 mpd: mpd: local IP address for PPTP is 0.0.0.0
    2007-03-08 10:50:30	Daemon.Info	80.190.152.2	Mar  8 10:50:13 mpd: [pt0] using interface ng1
    
    

    As you can see, there were some oversized frames which made the FW unresponsive and then made it reboot :-(
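
    To get an overview of such entries, a small script can pull the interface, ether type and length out of each "discard oversize frame" line. This is only a rough sketch: the regular expression is modelled on the log lines quoted above, and the names `parse_oversize`, `summarize` and `SAMPLE` are made up for this example.

```python
import re
from collections import Counter

# Pattern modelled on the "discard oversize frame" lines in the log above.
OVERSIZE_RE = re.compile(
    r"(?P<nic>\w+): discard oversize frame "
    r"\(ether type (?P<etype>[0-9a-fA-F]+) flags (?P<flags>[0-9a-fA-F]+) "
    r"len (?P<length>\d+) > max (?P<max>\d+)\)"
)


def parse_oversize(lines):
    """Return one dict per oversize-frame message found in the given lines."""
    frames = []
    for line in lines:
        m = OVERSIZE_RE.search(line)
        if m:
            frames.append({
                "nic": m.group("nic"),
                "ether_type": int(m.group("etype"), 16),  # logged in hex
                "length": int(m.group("length")),
                "max": int(m.group("max")),
            })
    return frames


def summarize(frames):
    """Count frames per interface and ether type, and find the largest length."""
    return {
        "per_nic": Counter(f["nic"] for f in frames),
        "per_ether_type": Counter(hex(f["ether_type"]) for f in frames),
        "max_length": max((f["length"] for f in frames), default=0),
    }


# Three lines taken from the log above (syslog server prefix removed).
SAMPLE = [
    "Mar  8 10:46:44 kernel: em1: discard oversize frame (ether type 800 flags 3 len 16444 > max 1514)",
    "Mar  8 10:46:45 kernel: em1: discard oversize frame (ether type 0 flags 3 len 65531 > max 1514)",
    "Mar  8 10:48:07 kernel: em1: watchdog timeout -- resetting",
]

if __name__ == "__main__":
    print(summarize(parse_oversize(SAMPLE)))
```

    Fed with the full log, this would show at a glance that all bad frames arrive on em1 and that the ether types are mostly values that are not valid.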

    I've never ever had that before…

    Please, does anybody know how to solve this?

    Thanks a lot for your help!

    Best regards,

    Chris



  • Just a weird thought (don't laugh): Are you sure that the CPU cooler and fan are installed properly and the unit is not getting too hot? Have a look at the BIOS hardware monitor page (I guess the unit should have one) and check whether the temperature is running high. I had this once with a unit that was installed in a server rack, so you couldn't hear that the fan wasn't working due to all the other noise. There doesn't seem to be a single event that triggers the reboots (from looking at the logs). Checking/changing the RAM might be worth a shot too.



  • Hey, Hoba!

    I am currently at the datacenter again and I think all the other reboots also had to do with those oversized packets… the errors didn't show up in syslog, but when I accessed the device via my TFT, I could again see the oversized packet error, which leads to a non-functioning WAN port :-(

    So the problem is those packets...

    Do you know a trick I could try while I am here to kill those packets? (Where do they come from ?!)

    Thanks!



  • With all the hardware that you have already tried, and now with this unusual traffic at WAN, I really think you have some layer 1/2 issues at WAN or some hardware at WAN that is acting up.



  • Hey, Hoba!

    I have now shortened the thread a little bit as I was able to watch on my TFT what happens here…
    And the problem is definitely those "Oversized Packets" ...

    As soon as they come in, WAN gets killed and sometimes pfSense/FreeBSD notices that and reboots, sometimes it doesn't and simply stays there...

    Where could those packets come from?
    How could I successfully block them?

    Two servers don't lock up with those packets but have problems with some packets getting delayed; one other server (the one we are talking about) locks up.... :'( :'( :'(

    Everything worked fine with the SonicWall we tested, but it only has 100 Mbit ports and we'd like to have Gbit ports...
    But I am slowly getting the idea that pfSense might for some reason simply not work here :-(

    I am seriously depressed...

    Any more ideas/clues?

    Thanks!

    Chris



  • I have had exactly the same problems with these oversized packets (using rl and vr network cards). It seems to be a FreeBSD problem. There are questions on the FreeBSD mailing lists but no real answers - well, maybe I don't know enough about BSD.

    If there is a router in front of your pfSense you might try to block these packets with that.

    I agree with Hoba that the packets probably come from malfunctioning hardware on the WAN network. Other customers on the ISP?

    My problem was solved when the ISP changed my IP address to a different pool (fingers crossed - it's been about 2 weeks, I think, with no bad packets). However, my machine never rebooted - it just lost WAN access. I had to reboot the modem and/or the firewall or the WAN interface to bring it back up.

    The packets you are getting are HUGE! Mine were 1508 or 1530 (MTU = 1500)



  • Well, there are lots of other customers behind the huge Cisco core router…

    I really don't know what else to do now...

    If I cannot find a fix somehow before Monday, I am afraid we will buy a SonicWall...
    But I'd really like to stick to pfSense... Why is pfSense running on FreeBSD and not Linux?
    Wouldn't the hardware support on Linux kernels be much better?



  • Well, complaining to the data center might help. They might be able to help stop it at the switch or router level.

    Linux vs FreeBSD is a bit OT just now :-\

    I had the feeling that my problem was related to getting my ip address by DHCP, but that does not seem to be the case here.

    I definitely do not think that the problem is related to hardware as it seems to strike all kinds of ethernet cards. I have the same machine and modem so it was not that either.



  • Well, they have a really huge Cisco router there, so I really doubt that the problem lies there…
    And with a SonicWall everything worked perfectly...

    As that seems to be a bug in FreeBSD, as you say, we can only hope that it gets fixed soon... :-\



  • The bug seems to have been there for some time. Not too many people get it, and it's difficult to reproduce. Hard to get something like that fixed.

    It's a bit sad that Linux works fine where FreeBSD falls down…

    I am not saying that there is a problem with the Cisco router, but that the packets could be stopped there. Make a noise, say that you are getting these huge packets that are messing up your machine. They might help out. That's what I did. It wasn't the ISP's fault, but they did what they could even though I am not a huge customer.



  • Just a thought… if you are using gigabit Ethernet cards, can you not increase the MTU significantly? Might help...



  • Thanks for your help  :)

    But wouldn't an increase of the MTU affect all my servers and those on the other side of pfSense?
    The packets would have to be fragmented down to the 1500-byte Ethernet MTU anyway, wouldn't they?
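
    To put rough numbers on that fragmentation question, here is a small sketch of how one large IPv4 packet would be split when it has to cross a link with a 1500-byte MTU. It assumes plain IPv4 with a 20-byte header, no options, and the "don't fragment" bit not set; the function name `ipv4_fragments` is made up for this example.

```python
def ipv4_fragments(total_len, mtu=1500, header_len=20):
    """Return the IP lengths (header + payload) of the fragments of one packet.

    Assumes IPv4 with a fixed header of header_len bytes and DF not set.
    """
    if total_len <= mtu:
        return [total_len]  # fits as-is, no fragmentation needed

    payload = total_len - header_len
    # Every fragment except the last must carry a multiple of 8 payload bytes.
    per_fragment = (mtu - header_len) // 8 * 8

    sizes = []
    while payload > 0:
        chunk = min(per_fragment, payload)
        sizes.append(chunk + header_len)
        payload -= chunk
    return sizes


if __name__ == "__main__":
    # A 9000-byte packet (a common jumbo-frame MTU) crossing a 1500-byte link.
    print(ipv4_fragments(9000))
```

    Under these assumptions a 9000-byte packet becomes six 1500-byte fragments plus one of 120 bytes, so a larger MTU on one side only pays off if every hop in between supports it.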

    I've now taken the NexCom back to the office and will try to set up a test environment on Sunday, where I will
    attach two PCs and hammer pfSense with packets using "iperf" …
    Perhaps I can then make that error appear again and find a solution for it ;-)



  • I've used iperf to test the performance of m0n0-based machines but never saw any problems. The problem packets are malformed - undefined ethertypes. You won't see that kind of output from iperf.

    Let us know what you find out.



  • Btw, even if pfSense ignored these packets and didn't crash, they would probably still hurt performance and throughput. They consume bandwidth on the line, which you usually don't want, and if you are billed by volume it's even worse, as this traffic counts too. I guess you want to get rid of this traffic in any case.



  • Hm.. how could I manually create those malformed packets in my test-scenario?

    Any ideas?

    Because I somehow have to trick pfSense into producing these errors to see if changes really work ;-)
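
    One possible starting point, as a rough sketch only: build a raw Ethernet frame with an arbitrary ether type and an oversized payload, and send it from a Linux test PC through an `AF_PACKET` socket. The helper names (`mac_bytes`, `build_frame`, `send_frame`), the MAC addresses and the interface name are placeholders. Sending needs root, and the sending interface will usually refuse frames above its own MTU, so that MTU would have to be raised first (which the NIC and any switch in between must support). Nothing in this thread shows that such a frame reproduces the lock-up; it only produces a frame that is longer than 1514 bytes with an undefined ether type.

```python
import socket
import struct


def mac_bytes(text):
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    raw = bytes.fromhex(text.replace(":", ""))
    if len(raw) != 6:
        raise ValueError("MAC address must have 6 bytes")
    return raw


def build_frame(dst_mac, src_mac, ether_type, payload_len):
    """Build an Ethernet frame: 14-byte header plus a zero-filled payload."""
    header = struct.pack("!6s6sH", dst_mac, src_mac, ether_type)
    return header + bytes(payload_len)


def send_frame(ifname, frame):
    """Send one raw frame on ifname (Linux only, needs root)."""
    with socket.socket(socket.AF_PACKET, socket.SOCK_RAW) as s:
        s.bind((ifname, 0))
        return s.send(frame)


if __name__ == "__main__":
    # Placeholder addresses; 0x5245 is one of the odd ether types from the log.
    frame = build_frame(
        mac_bytes("ff:ff:ff:ff:ff:ff"),   # broadcast destination
        mac_bytes("02:00:00:00:00:01"),   # made-up source address
        0x5245,
        3000,                             # payload well above 1500 bytes
    )
    print(len(frame))
    # send_frame("eth0", frame)  # uncomment on the test PC; "eth0" is a guess
```

    Building the frame by hand keeps the example free of extra packages; a packet-crafting tool would do the same job with less code.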

    Thanks :-)



  • Could the FreeBSD gurus tell me which FreeBSD mailing list to post this problem on? I have tried freebsd-questions@freebsd.org some weeks ago but no progress there.



  • @sai:

    Could the FreeBSD gurus tell me which FreeBSD mailing list to post this problem on? I have tried freebsd-questions@freebsd.org some weeks ago but no progress there.

    Start at questions, then make your way to freebsd-net@ and if that finally doesn't work try freebsd-current@

