Bad hdr length messages in logs (Dell R200 Machines)



  • We are running two pfSense firewalls on brand-new Dell R200 machines, and are seeing a lot of these messages:

    
    2008-10-31 01:28:14	Local0.Info	gateway.b-net.local	Oct 31 01:28:32 pf: 002502 rule 158/0(match): block in on bge0: (tos 0x0, ttl 52, id 41657, offset 0, flags [DF], proto TCP (6), length 60) 80.168.182.233.4903 > our.ip.here.com.23:  tcp 28 [bad hdr length 12 - too short, < 20]
    
    
    
    2008-10-31 01:28:08	Local0.Info	gateway.b-net.local	Oct 31 01:28:26 pf: 020594 rule 158/0(match): block in on bge0: (tos 0x0, ttl 52, id 28948, offset 0, flags [DF], proto TCP (6), length 60) 80.168.182.233.4903 > our.ip.here.com.23:  tcp 32 [bad hdr length 8 - too short, < 20]
    
    
    
    10-31-2008	17:57:55	Local0.Info	Oct 31 17:58:15 pf: 240896 rule 222/0(match): pass in on bge1: (tos 0x0, ttl 128, id 64895, offset 0, flags [DF], proto TCP (6), length 48) 10.10.1.36.3533 > 172.16.108.150.110:  tcp 20 [bad hdr length 8 - too short, < 20]
    
    

    This can't be good. It is happening on both the em and bge interfaces (so it's not the NICs).

    It shows up on the WAN interface and on DMZ -> LAN traffic.

    It's happening on both Dell machines, so I wouldn't say it's the memory. I ran a memory test (memtest) to make sure, and no errors came back. The machines are brand new anyway.

    Each Dell unit has two built-in Broadcom gigabit NICs and an additional dual-port Intel gigabit PCIe card.

    I've tried allowing ICMP traffic from the WAN (echo reply was disabled at first), and I've tried disabling the hardware checksum option under Advanced.

    Still getting a lot of these errors.

    One of our clients has complained that they cannot reach their download portal here (and the logs show that they also get a hdr error), but otherwise we have had no problems for the last 5 weeks (we just got these Dell machines).

    Anybody know what's going on? The 1.2 stable version doesn't run on our Dell R200s; 1.2 worked perfectly on a Pentium 4 setup with the same switches/servers.

    Is this an issue between the Dell hardware and pfSense 1.2.1, or is it a bug in 1.2.1? Is there a way of running 1.2 stable on our Dell machines?

    R200 Specs :

    210-20785 PE R200 Dual Core Xeon E3110, 3.0GHz, 6M, 1333FSB 2 S (single cpu) and installed selecting multi cpu
    330-10069 PCI-E Riser Card (1xPCI-E x8 slot, 1x PCI-E x4 slot) 2 S
    340-14917 English R200 Ship and Power Cord 2 S
    350-10205 R200 Front Bezel 2 S
    370-13063 4GB (2x2GB Dual Rank DIMMs) 800MHz 2 S
    400-14022 160GB Serial ATA 7.2k 3.5" HD Non Hot Plug 4 S   (IN RAID 1 with the SAS 6i RAID PCI card)
    403-10261 SAS 6i/R Internal Controller RAID PCIe 2 S
    429-13037 CD-RW/DVD-ROM Drive SATA 2 S  (used external dvd drive to get pfsense installed)
    540-10372 Intel PRO 1000PT Dual Port Server Adapter, Gigabit NIC, Cu, PCIe x4 2 S  Using both Intel and broadcom, both have the same problem

    We have two of these units, and both show the same errors. I'm also seeing these messages in the syslog; not sure if that's bad or not, but this never happened with pfSense 1.2.



  • Anybody?? We were thinking of buying a support contract for pfSense, but if we can't even get these R200s stable with pfSense.. :(?

    Nobody else running the Dell R200 with the 1.2.1 RC snapshot?



  • Try disabling TSO on the cards with: ifconfig $interface -tso
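A minimal sketch of applying that to several interfaces in one go. It assumes the interface names that appear in this thread's logs (bge0, bge1, em1; em0 is a guess), and the helper names are made up for illustration. By default it only prints the commands; running them for real needs root on the firewall.

```python
import subprocess

# Interface names taken from the log excerpts in this thread; em0 is an
# assumption (only em1 appears in the logs). Adjust to your own system.
INTERFACES = ["bge0", "bge1", "em0", "em1"]


def tso_off_command(interface):
    """Build the argv for `ifconfig <interface> -tso` (hypothetical helper)."""
    return ["ifconfig", interface, "-tso"]


def disable_tso(interfaces, dry_run=True):
    """Print each command; execute it only when dry_run is False (needs root)."""
    commands = [tso_off_command(name) for name in interfaces]
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)
    return commands


if __name__ == "__main__":
    disable_tso(INTERFACES)  # dry run: prints the commands, changes nothing
```

A setting changed with ifconfig this way normally does not survive a reboot, so it is only good for a quick test.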



  • OK, turned off TSO for all interfaces. Will check the logs and report back tomorrow.

    Thanks.



  • Nope.. bad news.. even with TSO turned off :

    
    2008-11-03 17:25:27	Local0.Info	gateway.b-net.local	Nov  3 17:25:56 pf: 002873 rule 218/0(match): pass in on em1: (tos 0x0, ttl 128, id 17773, offset 0, flags [DF], proto TCP (6), length 48) 172.16.108.150.38987 > 10.10.1.201.389:  tcp 24 [bad hdr length 4 - too short, < 20]
    2008-11-03 17:25:27	Local0.Info	gateway.b-net.local	Nov  3 17:25:56 pf: 023616 rule 218/0(match): pass in on em1: (tos 0x0, ttl 32, id 7957, offset 0, flags [none], proto ICMP (1), length 60) berlin.b-net.local > brussel.b-net.local: ICMP echo request, id 512, seq 24710, length 40
    2008-11-03 17:25:27	Local0.Info	gateway.b-net.local	Nov  3 17:25:56 pf: 001123 rule 218/0(match): pass in on em1: (tos 0x0, ttl 128, id 42001, offset 0, flags [DF], proto TCP (6), length 48) 172.16.108.150.38988 > 10.10.1.201.445:  tcp 24 [bad hdr length 4 - too short, < 20]
    2008-11-03 17:25:27	Local0.Info	gateway.b-net.local	Nov  3 17:25:56 pf: 020610 rule 218/0(match): pass in on em1: (tos 0x0, ttl 128, id 15915, offset 0, flags [DF], proto TCP (6), length 48) 172.16.108.150.38990 > 10.10.1.204.389:  tcp 24 [bad hdr length 4 - too short, < 20]
    2008-11-03 17:25:27	Local0.Info	gateway.b-net.local	Nov  3 17:25:56 pf: 003785 rule 218/0(match): pass in on em1: (tos 0x0, ttl 128, id 9527, offset 0, flags [DF], proto TCP (6), length 48) 172.16.108.150.38991 > 10.10.1.204.445:  tcp 28 [bad hdr length 0 - too short, < 20]
    2008-11-03 17:25:32	Local0.Info	gateway.b-net.local	Nov  3 17:26:01 pf: 554429 rule 222/0(match): pass in on bge1: (tos 0x0, ttl 128, id 59173, offset 0, flags [none], proto UDP (17), length 78) 10.0.1.43.137 > 10.255.255.255.137: [|SMB]
    2008-11-03 17:25:32	Local0.Info	gateway.b-net.local	Nov  3 17:26:01 pf: 025805 rule 222/0(match): pass in on bge1: (tos 0x0, ttl 128, id 535, offset 0, flags [none], proto UDP (17), length 293) 10.10.1.201.137 > 192.168.38.1.137: [|SMB]
    2008-11-03 17:25:33	Local0.Info	gateway.b-net.local	Nov  3 17:26:02 pf: 326012 rule 222/0(match): pass in on bge1: (tos 0x0, ttl 128, id 36693, offset 0, flags [none], proto UDP (17), length 229) 10.0.1.104.138 > 10.255.255.255.138: NBT UDP PACKET(138)
    2008-11-03 17:25:33	Local0.Info	gateway.b-net.local	Nov  3 17:26:02 pf: 011779 rule 222/0(match): pass in on bge1: (tos 0x0, ttl 128, id 12834, offset 0, flags [DF], proto TCP (6), length 48) 10.0.1.69.3181 > 172.16.108.150.1133:  tcp 24 [bad hdr length 4 - too short, < 20]
    2008-11-03 17:25:33	Local0.Info	gateway.b-net.local	Nov  3 17:26:02 pf: 000312 rule 222/0(match): pass in on bge1: (tos 0x0, ttl 128, id 4120, offset 0, flags [DF], proto TCP (6), length 48) 10.0.1.69.3182 > 172.16.108.150.1133:  tcp 24 [bad hdr length 4 - too short, < 20]
    2008-11-03 17:25:34	Local0.Info	gateway.b-net.local	Nov  3 17:26:03 pf: 112739 rule 231/0(match): block in on bge1: (tos 0x0, ttl 128, id 64837, offset 0, flags [none], proto TCP (6), length 48) 10.10.1.40.139 > 192.168.17.1.3182:  tcp 24 [bad hdr length 4 - too short, < 20]
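To see at a glance which interfaces and rule actions are affected, the excerpt above can be tallied with a short script. This is only a sketch: the regular expressions are written against the line format posted in this thread, the function name is made up, and the sample lines are copied from the excerpt (syslog prefix dropped).

```python
import re
from collections import Counter

# Matches the pf part of a line as posted in this thread, e.g.
# "rule 218/0(match): pass in on em1: (...)"
RULE_RE = re.compile(r"rule (\d+)/\d+\(match\): (pass|block) (in|out) on (\w+):")
BAD_HDR_RE = re.compile(r"\[bad hdr length (\d+) - too short")

# Lines copied from the excerpt above, without the syslog prefix.
SAMPLE_LINES = [
    "pf: 002873 rule 218/0(match): pass in on em1: (tos 0x0, ttl 128, id 17773, offset 0, flags [DF], proto TCP (6), length 48) 172.16.108.150.38987 > 10.10.1.201.389:  tcp 24 [bad hdr length 4 - too short, < 20]",
    "pf: 023616 rule 218/0(match): pass in on em1: (tos 0x0, ttl 32, id 7957, offset 0, flags [none], proto ICMP (1), length 60) berlin.b-net.local > brussel.b-net.local: ICMP echo request, id 512, seq 24710, length 40",
    "pf: 011779 rule 222/0(match): pass in on bge1: (tos 0x0, ttl 128, id 12834, offset 0, flags [DF], proto TCP (6), length 48) 10.0.1.69.3181 > 172.16.108.150.1133:  tcp 24 [bad hdr length 4 - too short, < 20]",
    "pf: 112739 rule 231/0(match): block in on bge1: (tos 0x0, ttl 128, id 64837, offset 0, flags [none], proto TCP (6), length 48) 10.10.1.40.139 > 192.168.17.1.3182:  tcp 24 [bad hdr length 4 - too short, < 20]",
]


def tally_bad_hdr(lines):
    """Count 'bad hdr length' lines per (interface, action) pair."""
    counts = Counter()
    for line in lines:
        rule = RULE_RE.search(line)
        if rule and BAD_HDR_RE.search(line):
            counts[(rule.group(4), rule.group(2))] += 1
    return counts


if __name__ == "__main__":
    for (interface, action), count in sorted(tally_bad_hdr(SAMPLE_LINES).items()):
        print(interface, action, count)
```

Fed with a full syslog export instead of the sample list, the same tally makes it easy to compare the counts before and after a change such as turning off TSO.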
    
    


  • Anybody have any other ideas? This is really frustrating.. we would love to work with pfSense, even buy a (commercial) support contract, but if pfSense fails to work on both our R200 systems.. ? :(

    Is it worth trying the FreeBSD 6.3 based version instead of the 7.0 one (1.2.1 RC)? Can anybody tell me where I can find that ISO?



  • Things I would try:
    • Update the BIOS if possible.
    • Use pfSense version 2.0: http://snapshots.pfsense.org/FreeBSD7/RELENG_1/
    • Remove the riser cards and use the onboard NICs with VLANs.



  • Version 2.0 in a production environment wouldn't be a good idea.. even 1.2.1 is not considered "stable".

    I could try a BIOS update, but are these errors related to the NICs only? The strange thing is that it happens on both the Intel and the Broadcom cards.

    Maybe hardware checking on the nics?



  • This isn't hardware related. R200s work fine with 1.2.1.



  • I have two of the exact same machines, and both have the same problem, so I'm curious why you are so sure this is not a hardware problem :)

    Before the R200s were installed, a previous machine was running 1.2 stable fine, with the same network/switches etc.

    I'm going to the datacenter in a minute to connect the pfSense machine's disk directly to the onboard SATA instead of using the RAID controller.

    I'm also going to try a BIOS update and a single-threaded base OS instead of the multithreaded one.



  • Arg.. spent 2 hours trying to fix it in the data center :

    • Checked BIOS version (latest was already installed)
    • Did a complete reinstall of the Dell R200, selecting uniprocessor instead of multi
    • Changed disk setup from RAID to plain SATA (single drive)
    • Turned off tso for all interfaces
    • Switched lan/wan interface to expansion card
    • Tested both Dell R200 units
    • Did memtest on both units

    Still getting the hdr length messages..

    
    11-04-2008	23:18:07	Local0.Info	Nov  4 23:18:40 pf: 211054 rule 145/0(match): block in on bge0: (tos 0x0, ttl 54, id 22968, offset 0, flags [DF], proto TCP (6), length 60) 80.85.189.226.2325 > x.x.130.130.23:  tcp 24 [bad hdr length 16 - too short, < 20]
    11-04-2008	23:18:04	Local0.Info	Nov  4 23:18:37 pf: 1. 009550 rule 145/0(match): block in on bge0: (tos 0x0, ttl 54, id 20379, offset 0, flags [DF], proto TCP (6), length 60) 80.85.189.226.2325 > x.x.130.130.23:  tcp 28 [bad hdr length 12 - too short, < 20]
    
    

    To be safe I connected the old firewall (P4 2.4GHz machine, pfSense 1.2) to the exact same network (just swapped network cables) and the bad hdr length messages are gone (or are they just not displayed in syslog on 1.2?).

    I'm running out of ideas.. anybody know how to solve this issue?



  • Is the firewall not performing as expected, or are the errors in the log the only problem?
    This seems to suggest that it may be a cosmetic issue due to the default snaplength of tcpdump:
    (http://kerneltrap.org/mailarchive/freebsd-pf/2008/10/28/3840344)
    > In some of these lines, there is mention of "[bad hdr length 0 - too
    > short, < 20]" BUT NOT IN ALL.

    That's because you're using tcpdump against a pflog interface.  You need
    to increase the snaplen from 68 bytes to something larger; try -s 256
    and that message will go away.  It's harmless.

    This is from the tcpdump man page:
    If the snapshot was small enough that tcpdump didn't capture the full
    TCP header, it interprets as much of the header as it can and then
    reports ``[|tcp]'' to indicate the remainder could not be interpreted.
    If the header contains a bogus option (one with a length that's either
    too small or beyond the end of the header), tcpdump reports it as
    ``[bad opt]'' and does not interpret any further options (since it's
    impossible to tell where they start). If the header length indicates
    options are present but the IP datagram length is not long enough for
    the options to actually be there, tcpdump reports it as ``[bad hdr
    length]''.
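To make the "too short, < 20" part concrete: a TCP header stores its own length in the data-offset field (the upper 4 bits of byte 12, counted in 32-bit words), and 20 bytes is the minimum for a header without options. The sketch below is not tcpdump's code, just an illustration of that check. The helper names and the message wording are mine, modeled on the log lines in this thread, and the ports are borrowed from one of them.

```python
import struct

MIN_TCP_HEADER = 20  # bytes; a TCP header without options is 5 32-bit words


def tcp_header_length(tcp_bytes):
    """Return the header length in bytes encoded in the TCP data-offset field.

    The data offset is the upper 4 bits of byte 12 of the TCP header and is
    counted in 32-bit words (RFC 793).
    """
    if len(tcp_bytes) < 13:
        raise ValueError("need at least 13 bytes to read the data offset")
    return (tcp_bytes[12] >> 4) * 4


def build_tcp_header(src_port, dst_port, data_offset_words):
    """Pack a 20-byte TCP header with a chosen data offset (illustration only)."""
    offset_and_flags = (data_offset_words << 12) | 0x002  # SYN flag set
    # Fields: ports, sequence, ack, offset+flags, window, checksum, urgent ptr
    return struct.pack("!HHIIHHHH", src_port, dst_port, 0, 0,
                       offset_and_flags, 65535, 0, 0)


def describe(tcp_bytes):
    """Mimic the wording of the log message for a too-small header length."""
    hlen = tcp_header_length(tcp_bytes)
    if hlen < MIN_TCP_HEADER:
        return "bad hdr length %d - too short, < %d" % (hlen, MIN_TCP_HEADER)
    return "hdr length %d ok" % hlen


if __name__ == "__main__":
    print(describe(build_tcp_header(38987, 389, 1)))  # data offset of 1 word
    print(describe(build_tcp_header(38987, 389, 7)))  # 7 words, i.e. with options
```

A data offset of 1 word decodes to 4 bytes, which is the kind of value that shows up in the flagged lines in this thread, while 7 words (28 bytes) is an ordinary header with options.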



  • Is there any way to verify/check that it's just cosmetic?

    Is there a way to modify the tcpdump output that goes to syslog? I have a rule that allows everything and is set to log; after that I have my block rules (this way I can log all traffic with a syslog daemon).



  • This was caused by a FreeBSD regression, which we have worked around now. Update to a new 1.2.1 snapshot and it should be gone. Let us know how it goes.



  • Thanks CMB, I will update the firewall tonight (it's in production) and have it rebooted.

    I will verify the logs tomorrow morning and report back.

    Right now the firewall is installed with a singlethreaded base OS (there is a xeon with 2 cores in the system) and setup without raid.

    I did this to troubleshoot the firewall, is it safe to put it back on the SAS 6i/R Internal Controller RAID PCIe? Also reinstall with a multithreaded base instead of single?

    Or should I leave it connected without raid 1 directly to the motherboard sata ports?

    edit: I assume the snapshot is also available as an iso? pfSense-20081105-1030.iso.gz ?



  • Still no hdr messages in the logs!! So far so good!! Thanks a bunch guys!!

    I'm still running in "safe mode" though, meaning I'm not using the SAS RAID controller or the multithreaded base OS.

    Not sure if I should reinstall with the multithreaded OS and the RAID controller.



  • hey AudiAddict,

    any new developments over the weekend?  we were getting ready to pull the trigger on a pair of R200s w/ the SAS6iR controllers and the onboard nics specifically for pfsense.  it sounds like you've come to some resolution but i didn't know if you were out of "safe mode" yet and whether or not you're on the road to a dell/pfsense utopia, etc.

    just curious,

    -dp



  • Hey Plunger,

    Right now I'm running stable on a non-RAID R200 (onboard SATA) with 1.2.1 RC.

    With the following settings :

    • One 7200RPM Disk (Western Digital 160GB)
    • Directly connected to SATA Port 1
    • Install done with external DVD burner (dvd drive in the r200 didn't work properly)
    • Intel VT turned off in bios
    • Xeon 2.0GHZ cpu with 2 cores enabled in bios
    • Running the single-threaded pfSense install, not the multithreaded one

    This seems to be rock stable. I will try the SAS 6i/R RAID 1 setup + multithreaded pfSense version this weekend.

    I'm indeed a dell guru ;-) I manage about 70 dell servers  8)



  • Hi AudiAddict,

    Have you tried the multithreaded pfSense install yet?



  • I'm running on a single-threaded OS, which has been running perfectly for 7 days now.

    I've thrown a major DDoS at it and it had no problems handling that single-threaded, so I'm going to leave it for now.

    I've been to the datacenter and reinstalled too often to try another round of multithreaded etc.

