Netgate Discussion Forum

    pfsense 2.4.4 Rel.2 checksum error / after reboot fine for 20 sec

    General pfSense Questions
    18 Posts 3 Posters 1.6k Views
    • F
      fgro79
      last edited by fgro79

      Hello,

      I am facing the problem that my pfSense throws checksum errors even though TCP offloading is disabled. Right after I reboot the box I can reach the services behind the pfSense (INTERNAL => DMZ) without any problems, but after 20 seconds it gets stuck again. All services can be reached from outside the LAN without any issues at all 😉

      The problem is that not all services show errors. Services like ssh or ping are doing fine.

      So I am asking here whether anyone else has noticed this behavior?

      thanks

      • stephenw10S
        stephenw10 Netgate Administrator
        last edited by

        Works but then fails after 20s sounds like asymmetric routing:
        https://docs.netgate.com/pfsense/en/latest/firewall/troubleshooting-blocked-log-entries-due-to-asymmetric-routing.html

        Do you see blocked traffic in the firewall log when it fails?

        Steve

        • F
          fgro79
          last edited by fgro79

          Gateway set when it should not be set

          If a gateway is set on an internal interface, such as LAN, it can cause problematic behavior. Setting a gateway on an internal interface will tag that interface’s outbound rules with route-to, and inbound rules with reply-to which will cause packets to be forwarded to the defined gateway rather than following their natural path. For WANs this is typically a good thing! For LANs it is not. Among other ill effects, it can lead to a loop of sorts where packets bounce between the firewall and the defined gateway, eventually being blocked or dropped when their TTL expires.

          I don't think this is the case, since TCP on port 22 passes to the host without issues. HTTPS on port 445 (a non-nginx Apache service) also passes the firewall to the dedicated host. Only HTTPS to the nginx server gets lost.

          So if this were a "sloppy state" problem, why are other packets / protocols / ports not affected?

          Addition: neither interface has a gateway set, since they are real subnet interfaces:

          VLAN_DMZ => 192.168.13.2
          INTERNAL => 192.168.11.2
          

          So in general the pfSense box takes care of routing between the specified LANs, and the firewall rules are in place to allow communication between both networks.

          • stephenw10S
            stephenw10 Netgate Administrator
            last edited by

            I would expect all TCP traffic to be affected. There would need to be some other route involved too, so something else routing between Internal and DMZ.

            Do you see blocked traffic in the firewall log though?

            Steve

            • F
              fgro79
              last edited by fgro79

              No, I don't see any blocked traffic; if that were the case I would open the port. So the firewall rule logs are empty. I made a pcap of the traffic to the specified service from the VLAN_DMZ:

              21:36:00.307570 0c:c4:7a:ad:28:85 > 00:0c:29:dd:25:02, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
                  192.168.13.2.21281 > 192.168.13.17.443: Flags [S], cksum 0x769e (correct), seq 3168589996, win 65228, options [mss 1460,nop,wscale 7,sackOK,TS val 871614 ecr 0], length 0
              21:36:00.307757 00:0c:29:dd:25:02 > 0c:c4:7a:ad:28:85, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
                  192.168.13.17.443 > 192.168.13.2.21281: Flags [S.], cksum 0xcb12 (correct), seq 4180594802, ack 3168589997, win 28960, options [mss 1460,sackOK,TS val 5861164 ecr 871614,nop,wscale 7], length 0
              21:36:00.307797 0c:c4:7a:ad:28:85 > 00:0c:29:dd:25:02, ethertype IPv4 (0x0800), length 66: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 52)
                  192.168.13.2.21281 > 192.168.13.17.443: Flags [.], cksum 0x68fe (correct), seq 1, ack 1, win 513, options [nop,nop,TS val 871614 ecr 5861164], length 0
              21:36:00.308000 0c:c4:7a:ad:28:85 > 00:0c:29:dd:25:02, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
                  192.168.13.2.21285 > 192.168.13.17.443: Flags [S], cksum 0x30a5 (correct), seq 3470265510, win 65228, options [mss 1460,nop,wscale 7,sackOK,TS val 871614 ecr 0], length 0
              21:36:00.308105 00:0c:29:dd:25:02 > 0c:c4:7a:ad:28:85, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
                  192.168.13.17.443 > 192.168.13.2.21285: Flags [S.], cksum 0x9d81 (correct), seq 782795409, ack 3470265511, win 28960, options [mss 1460,sackOK,TS val 5861164 ecr 871614,nop,wscale 7], length 0
              

              This is from the Client:

              21:38:49.994816 60:f8:1d:cb:cb:9c > 0c:c4:7a:ad:28:86, ethertype IPv4 (0x0800), length 78: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 64)
                  192.168.11.81.63884 > 192.168.13.17.443: Flags [S], cksum 0xbf2a (correct), seq 1173459999, win 65535, options [mss 1460,nop,wscale 6,nop,nop,TS val 119996831 ecr 0,sackOK,eol], length 0
              

              wireshark

              routing table

              Here is a pcap of the SSH port to the IP:

              22:15:36.684419 0c:c4:7a:ad:28:85 > 00:0c:29:dd:25:02, ethertype IPv4 (0x0800), length 102: (tos 0x48, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 88)
                  192.168.11.81.65076 > 192.168.13.17.22: Flags [P.], cksum 0x68ed (correct), seq 920942514:920942550, ack 1142215424, win 2048, options [nop,nop,TS val 122192186 ecr 6446297], length 36
              22:15:36.684770 00:0c:29:dd:25:02 > 0c:c4:7a:ad:28:85, ethertype IPv4 (0x0800), length 102: (tos 0x10, ttl 64, id 36928, offset 0, flags [DF], proto TCP (6), length 88)
              
              • stephenw10S
                stephenw10 Netgate Administrator
                last edited by

                The only thing that looks like an issue there is that one TCP retransmission packet. The checksums all look good there too.

                Need to see more of the failing traffic. Or a pcap covering the traffic at the point where it fails.

                Steve
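
One way to narrow down the failing traffic in a larger capture is to list the SYNs that never receive a SYN-ACK. The following is only a rough sketch: it assumes `tcpdump` text output in the format pasted above, `unanswered_syns` is a made-up helper name, and the sample mixes lines from the two captures purely for illustration.

```python
import re

# Matches the "src.port > dst.port: Flags [..]" part of a tcpdump text line.
LINE = re.compile(
    r"(\d+\.\d+\.\d+\.\d+)\.(\d+) > (\d+\.\d+\.\d+\.\d+)\.(\d+): Flags \[([^\]]+)\]"
)

def unanswered_syns(text):
    """Return (src, sport, dst, dport) for every SYN without a matching SYN-ACK."""
    syns = set()
    answered = set()
    for m in LINE.finditer(text):
        src, sport, dst, dport, flags = m.groups()
        if flags == "S":          # initial SYN
            syns.add((src, sport, dst, dport))
        elif flags == "S.":       # SYN-ACK: record the flow it answers
            answered.add((dst, dport, src, sport))
    return sorted(syns - answered)

# Sample lines taken from the captures in this thread (illustration only).
SAMPLE = """
192.168.13.2.21281 > 192.168.13.17.443: Flags [S], cksum 0x769e (correct), length 0
192.168.13.17.443 > 192.168.13.2.21281: Flags [S.], cksum 0xcb12 (correct), length 0
192.168.13.2.21281 > 192.168.13.17.443: Flags [.], cksum 0x68fe (correct), length 0
192.168.11.81.63884 > 192.168.13.17.443: Flags [S], cksum 0xbf2a (correct), length 0
"""

for flow in unanswered_syns(SAMPLE):
    print("no SYN-ACK seen for", flow)
```

On the sample, only the client's SYN from 192.168.11.81 is reported; the connection from 192.168.13.2 completes its handshake.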

                • F
                  fgro79
                  last edited by fgro79

                  Revert: still no success. I reinstalled the whole pfSense and still can't access port 443 through the pfSense. This is still port 443 related. 🐛

                  I have uploaded the pcap here, if you would like to view the scenario.

                  The filter needs to be set to: ip.addr eq 192.168.13.17 and tcp.port==443 to view the problem from the source host.

                  I am still debugging this error, and yes, it works while the pfSense reboots. So I don't know what the pfSense does in the first 20 seconds.

                  The routing table looks like:

                  Destination        Gateway            Flags     Netif Expire
                  default            4x.237.xx.1       UGS        igb0
                  1.1.1.1            4x.237.xx.1       UGHS       igb0
                  4.2.2.2            4x.237.xx.1       UGHS       igb0
                  8.8.4.4            4x.237.xx.1       UGHS       igb0
                  4x.237.xx.0/26    link#1             U          igb0
                  4x.237.xx.8       link#1             UHS         lo0
                  4x.237.xx.10      link#4             UHS         lo0
                  4x.237.xx.11      link#2             UHS         lo0
                  4x.237.xx.12      link#3             UHS         lo0
                  82.212.30.1        4x.237.xx.1       UGHS       igb0
                  95.xx.xx.178       4x.237.xx.1       UGHS       igb0
                  127.0.0.1          link#10            UH          lo0
                  192.168.10.0/24    link#8             U          igb7
                  192.168.10.2       link#8             UHS         lo0
                  192.168.11.0/24    link#7             U          igb6
                  192.168.11.2       link#7             UHS         lo0
                  192.168.13.0/24    link#6             U          igb5
                  192.168.13.2       link#6             UHS         lo0
                  192.168.177.0/24   link#5             U          igb4
                  192.168.177.10     link#5             UHS         lo0
                  

                  All of a sudden SSH now gets lost as well. So there might be a problem between pfSense and the network.

                  What my setup looks like: I have a single interface for each subnet. DMZ uses its own and INTERNAL as well. There are no manual routing table entries, since I thought pfSense would take care of it.

                  • DMZ (igb5) 192.168.13.2 (pfsense)
                  • INTERNAL (igb6) 192.168.11.2 (pfsense)
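
With only connected routes for the internal subnets, a plain longest-prefix lookup should already send INTERNAL-to-DMZ traffic out of igb5 without any extra route. Below is a minimal sketch of that lookup using the connected networks from the routing table above; the `lookup` helper is hypothetical, it ignores the host routes, and it simply falls back to igb0 for the default route.

```python
import ipaddress

# Connected networks copied from the routing table above (destination, interface).
ROUTES = [
    ("192.168.10.0/24", "igb7"),
    ("192.168.11.0/24", "igb6"),
    ("192.168.13.0/24", "igb5"),
    ("192.168.177.0/24", "igb4"),
]

def lookup(dst, routes=ROUTES, default_if="igb0"):
    """Return the interface of the most specific route that contains dst."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, netif in routes:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, netif)
    # Anything not covered by a connected network follows the default route.
    return best[1] if best else default_if

print(lookup("192.168.13.17"))  # DMZ web server
print(lookup("192.168.11.81"))  # INTERNAL client
print(lookup("8.8.8.8"))        # follows the default route
```

This only models the routing table; policy routing rules in the firewall can override it, which the sketch does not cover.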
                  • F
                    fgro79
                    last edited by fgro79

                    OK, I need some more help; I believe I have figured out the problem. I have multiple WAN interfaces. To get the right gateway I added different rules. The problem might reside there.

                    • stephenw10S
                      stephenw10 Netgate Administrator
                      last edited by

                      Just based on the time it takes it's almost certainly some state timeout.

                      If you have policy-based rules on the source interface for WAN failover, you would need to have a rule above them to reach the other interface(s) without being policy routed.

                      It's possible that rule did not apply before the WANs came up at boot, so some traffic was initially allowed.

                      Steve
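
The effect of rule order with policy routing can be pictured as a simple first-match walk over the rule list. This is only a toy model, not how pf is implemented: the `Rule` class, the rule names, and the `WAN_FAILOVER` gateway group are all invented for illustration.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional

@dataclass
class Rule:
    name: str
    dst: str                 # destination network in CIDR form
    gateway: Optional[str]   # None means "use the normal routing table"

def first_match(rules, dst_ip):
    """Walk the rules top down and return the first one matching dst_ip."""
    addr = ipaddress.ip_address(dst_ip)
    for rule in rules:
        if addr in ipaddress.ip_network(rule.dst):
            return rule
    return None

# Hypothetical INTERNAL rule set: a bypass rule above the policy routing rule.
WITH_BYPASS = [
    Rule("pass INTERNAL -> DMZ", "192.168.13.0/24", None),
    Rule("pass INTERNAL -> any via failover group", "0.0.0.0/0", "WAN_FAILOVER"),
]
# The same rule set without the bypass rule.
ONLY_POLICY = WITH_BYPASS[1:]

print(first_match(WITH_BYPASS, "192.168.13.17").gateway)  # not policy routed
print(first_match(ONLY_POLICY, "192.168.13.17").gateway)  # forced out via the WAN group
```

In the second case the DMZ-bound packet is handed to the WAN gateway group instead of being routed to the DMZ interface, which is the situation the bypass rule is meant to avoid.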

                      • F
                        fgro79
                        last edited by fgro79

                        OK, so let's get this straight: the docs mention that:

                        On Firewall > Rules, visit the tab for the internal interface to be used with the gateway group, either edit the existing pass rules and add the gateway setting, choosing the desired gateway, or add a new rule to match only certain traffic to direct into the gateway group. Remember that rules are processed from the top down, and once a rule is matched, processing stops.
                        

                        So for the DMZ I would add a rule which states:

                        SOURCE: VLAN_DMZ => DEST: !RFC_NETWORK => GW: GW1
                        

                        On the INTERNAL net I would use a different rule, since I want to make sure INTERNAL outbound traffic goes to GW3:

                        SOURCE: INTERNAL => DEST: !RFC_NETWORK => GW: GW3
                        

                        Would I add all of them as the first rule? Until now I had them as the last rule.

                        But I still get this state:

                        LAN 	tcp 	192.168.11.62:60664 -> 192.168.13.17:443 	CLOSED:SYN_SENT 	2 / 0 	104 B / 0 B 	
                        VLAN_DMZ 	tcp 	192.168.11.62:60664 -> 192.168.13.17:443 	SYN_SENT:CLOSED 	2 / 0 	104 B / 0 B
                        

                        and no connection :/

                        • stephenw10S
                          stephenw10 Netgate Administrator
                          last edited by

                          Yes, those rules should not catch that traffic as long as the RFC_NETWORK alias has been set correctly.

                          That just looks like 192.168.13.17 is not responding. 2 packets to it going over both interfaces, 0 packets back.

                          Steve
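
The same reading of the state table can be done mechanically for a longer dump. A small sketch, assuming state lines shaped like the two pasted above and assuming the first packet counter is the originator's direction and the second the reply direction; `one_way_states` is a made-up helper name.

```python
import re

# interface, protocol, "src -> dst", state, then "sent / received" packet counters.
STATE = re.compile(
    r"^\s*(?P<iface>\S+)\s+(?P<proto>\S+)\s+(?P<src>\S+) -> (?P<dst>\S+)\s+"
    r"(?P<state>\S+)\s+(?P<sent>\d+) / (?P<recv>\d+)"
)

def one_way_states(lines):
    """Return (interface, destination, state) for states with no reply packets."""
    result = []
    for line in lines:
        m = STATE.match(line)
        if m and int(m["sent"]) > 0 and int(m["recv"]) == 0:
            result.append((m["iface"], m["dst"], m["state"]))
    return result

# The two state entries quoted in the previous post.
SAMPLE = [
    "LAN tcp 192.168.11.62:60664 -> 192.168.13.17:443 CLOSED:SYN_SENT 2 / 0 104 B / 0 B",
    "VLAN_DMZ tcp 192.168.11.62:60664 -> 192.168.13.17:443 SYN_SENT:CLOSED 2 / 0 104 B / 0 B",
]

for iface, dst, state in one_way_states(SAMPLE):
    print(f"{iface}: nothing came back from {dst} ({state})")
```

Both entries are flagged, which matches the observation that the SYNs leave on the DMZ side but no reply is seen by the firewall.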

                          • F
                            fgro79
                            last edited by fgro79

                            So I have set this up properly, but why is it not working for port 443? I added HAProxy in TCP mode on the WAN interface.

                            WAN => 443 => HAPROXY (tcp/ssl mode) => tcp rule to matching ssl name to the specified server
                            

                            Since I added this I am not able to communicate back to the desired server over port 443. If I add the SSH port to HAProxy's TCP mode, I believe SSH would stop working on the internal net as well.

                            This drives me crazy: no solution and no failures to be found.

                            • stephenw10S
                              stephenw10 Netgate Administrator
                              last edited by

                              Wait, you have HAProxy in there running on those ports?

                              If you disable HAProxy does it all work as expected?

                              Steve

                              • F
                                fgro79
                                last edited by

                                Nope, HAProxy runs on the external interfaces (igb0/igb3) and forwards port 443 from there to the internal server. igb5/6 are standalone.

                                • stephenw10S
                                  stephenw10 Netgate Administrator
                                  last edited by

                                  Right that's what it's configured to do but you just said:

                                  "Since I added this I am not able to communicate back to the desired server over port 443."

                                  So before you added HAProxy it was working?

                                  Steve

                                  • F
                                    fgro79
                                    last edited by

                                    Yes, that's what I am saying, but I don't know if the two are related. Before I brought HAProxy into play (fresh install) I was able to browse internally to the dedicated HTTPS port. So I would assume something is broken within HAProxy while it's redirecting ....

                                    • P
                                      PiBa @fgro79
                                      last edited by

                                      @fgro79
                                      Sorry for late reply, but anyhow.

                                      I suspect if you disable the haproxy transparent-client-ip feature on the backend your troubles of reaching the webserver directly are gone. That feature comes with a warning message for a reason..

                                      • F
                                        fgro79
                                        last edited by

                                        I noticed. Sorry, I cannot update my configs yet, since I am facing the issue described here.

                                        So I need to wait until I can modify or downgrade the system to safely remove the transparent-client-ip feature. I was going to use this feature for an internal SMTP server, to forward the original IP.

                                        Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.