Netgate Discussion Forum

    Any known issues with HAproxy on 2.5.2?

    General pfSense Questions
    40 Posts 3 Posters 2.4k Views
    • stephenw10 (Netgate Administrator)

      Yes, multiple gateways on one network segment opens numerous possibilities to fail!

      Using HAProxy, you could use a backend that was not using the firewall as its default route. That would fail if you then tried to use a port forward instead.
      Since that's not what you're seeing here, you are not hitting that particular failure mode, but you need to be aware of it when using a network setup like that.
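The failure mode above can be sketched with a simplified model of how a LAN host chooses the next hop for its reply. This is a minimal illustration under stated assumptions, not a description of pfSense internals: it assumes the backend has only its connected route plus a default gateway (no static or policy routes), that a port forward rewrites only the destination and keeps the client's source address, and that HAProxy connects to backends from its own LAN address (its default, non-transparent mode). All addresses below are hypothetical.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Host:
    """A LAN host: its interface address/prefix and its default gateway."""
    interface: ipaddress.IPv4Interface
    default_gateway: ipaddress.IPv4Address


def reply_next_hop(host: Host, client: ipaddress.IPv4Address) -> ipaddress.IPv4Address:
    """Return where the host sends its reply to `client`.

    Assumption: on-link clients are answered directly, everything else goes
    to the default gateway (no static or policy routes in this model).
    """
    if client in host.interface.network:
        return client
    return host.default_gateway


def is_asymmetric(host: Host, client: ipaddress.IPv4Address,
                  ingress_router: ipaddress.IPv4Address) -> bool:
    """True if the reply leaves via a different hop than the request came in on."""
    return reply_next_hop(host, client) != ingress_router


# Hypothetical addresses: backend on 10.0.0.50/24 using 10.0.0.1 as its gateway,
# a second firewall on the same segment at 10.0.0.254, an Internet client.
backend = Host(ipaddress.IPv4Interface("10.0.0.50/24"), ipaddress.IPv4Address("10.0.0.1"))
second_fw = ipaddress.IPv4Address("10.0.0.254")
client = ipaddress.IPv4Address("203.0.113.10")

# Port forward on the second firewall: the client's source address is kept,
# so the backend replies via its default gateway 10.0.0.1.
print(is_asymmetric(backend, client, second_fw))     # True
# HAProxy on the second firewall: the backend sees the proxy's LAN address
# as the source, so the reply goes straight back to 10.0.0.254.
print(is_asymmetric(backend, second_fw, second_fw))  # False
```

In the port-forward case the reply bypasses the firewall that performed the NAT, so it is never translated back and the connection fails; with HAProxy the backend simply answers the proxy, which is why that combination can work.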

      Steve

      • lewis

        I guess it means there's something I'm not understanding :).

        I've always had devices on different subnets communicating together with and without firewalls.

        Devices using 172.16.x.x talk with others on 172.16.x.x, 10.0.0.x talk with devices on 10.0.0.x and 192.168.0.x talk with others on the same.
        Some use a gateway, some don't depending on what their tasks are.

        In this case, it's that there is only one physical cable to get a LAN from one point to another but on that cable, there are now two firewalls, 10.0.0.x and 10.1.1.x. Only one firewall has DHCP, the other doesn't.

        Devices on the first firewall have that one as their gw and communicate with other devices on the same network subnet.
        Devices on the second firewall have that one as their gw and communicate with other devices on the same network subnet.

        This conversation keeps drifting away from the HAProxy problem, but it sounds like you think HAProxy may not be working well because of the network layout above.

        I've not seen any problems so based on your input, there must be something I am missing.

        Devices communicate with their own gw. The only time it was weird was while ARP was cached all over and one left over rule was overlooked.

        I've not seen any problems since, other than this HAProxy issue and not being able to update the firewall.

        > Using HAProxy you could use a backend that was not using the firewall as its default route.

        This firewall is only working with devices that have the same network which is 10.0.0.x/24.
        The back end servers are all on the same 10.0.0.x/24 and have the above as their gw.

        > That would fail if you then tried to use a port forward instead.

        I think you are asking whether I used the 10.0.0.1 firewall with HAProxy and sent traffic from it to 10.1.1.x/24 devices? I'm definitely not doing that :).

        It would not work anyway, since the devices on 10.1.1.x use 10.1.1.1 as their gw, so traffic would not reach them without some funky config using VIPs or something, and their outgoing traffic would want to go out through the 10.1.1.1 gw.

        > Since that's not what you're seeing here, you are not hitting that particular failure mode, but you need to be aware of it when using a network setup like that.

        Ok, I think you're just warning me not to do stuff like that. I agree, I won't be doing that.

        I believe you helped me when I was setting all this up, and with some other problems, and I've learned quite a lot, even if I don't remember it all just yet.

        • stephenw10 (Netgate Administrator)

          Yes, just be aware it would be very easy to introduce asymmetry and I've seen that bite people many, many times!

          If you really are seeing an issue in HAProxy then a pcap should prove it.

          I would expect to see something logged though.

          Was this working in the old network setup?

          Steve

          • lewis

            We've had the proxy running for roughly the past couple of years.
            During that time, we've had lots of complaints about 500/504 errors but always blamed our own resources, never once thinking it could be the proxy.

            So to answer your question, there is really no way to know; it was only around the time I posted this that we realized what was happening.

            We had taken the proxy out of the mix to do some testing so it was off for maybe a week. Then when we re-enabled it, the timeout complaints started again which got me wondering what was going on. That's when I disabled it again and since then, the complaints stopped and we too were no longer getting them.

            We know one problem was on the backend: there was an issue with the database and it wasn't responding fast enough, causing 504s, but we were aware of those and could see them in the logs.

            • stephenw10 (Netgate Administrator)

              If something is responding with a 500 or 504 error, that will be logged somewhere. That's not just a failure to respond at all. If it's HAProxy responding with that, then I'd expect to see some other errors logged.
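Whether a 500/504 came from the backend or from HAProxy itself can usually be read from HAProxy's own log lines. The sketch below assumes HAProxy's default HTTP log format (`option httplog`; a custom `log-format` would need a different pattern) and scans for 5xx responses, using the four-character termination-state field as a heuristic: a state starting with `-` means the session ended normally, so the status came from the server, while anything else means HAProxy cut the session short. Per the HAProxy documentation, `sH` means the server timeout expired before the server returned response headers, which HAProxy reports as a 504. The sample lines are made up for illustration.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# Fields of HAProxy's default HTTP log format, from the accept date up to the
# termination state. Assumes the default "option httplog" layout.
HTTPLOG_RE = re.compile(
    r"\[\d{2}/\w{3}/\d{4}:[\d:.]+\]\s+"   # accept date, e.g. [16/May/2022:21:31:06.123]
    r"(?P<frontend>\S+)\s+"
    r"(?P<backend>[^/\s]+)/(?P<server>\S+)\s+"
    r"(?P<timers>\S+)\s+"                 # TR/Tw/Tc/Tr/Ta timers
    r"(?P<status>-?\d+)\s+"
    r"\+?\d+\s+"                          # bytes read
    r"\S+\s+\S+\s+"                       # captured request/response cookies
    r"(?P<term>\S{4})"                    # termination state, e.g. ---- or sH--
)


class ErrorEntry(NamedTuple):
    backend: str
    server: str
    status: int
    term: str
    origin: str  # "backend" or "haproxy" (heuristic guess)


def server_errors(lines: Iterable[str]) -> Iterator[ErrorEntry]:
    """Yield every 5xx response with a rough guess at who produced it."""
    for line in lines:
        m = HTTPLOG_RE.search(line)
        if not m:
            continue  # not an httplog line
        status = int(m["status"])
        if 500 <= status <= 599:
            term = m["term"]
            # '-' first: normal termination, so the status came from the server.
            origin = "backend" if term[0] == "-" else "haproxy"
            yield ErrorEntry(m["backend"], m["server"], status, term, origin)


sample = [  # hypothetical log lines
    'haproxy[1234]: 198.51.100.7:51000 [16/May/2022:21:31:06.123] web~ app/srv1 '
    '0/0/1/-1/30001 504 194 - - sH-- 5/5/1/1/0 0/0 "GET / HTTP/1.1"',
    'haproxy[1234]: 198.51.100.7:51001 [16/May/2022:21:31:07.000] web~ app/srv2 '
    '0/0/1/12/13 500 812 - - ---- 5/5/1/1/0 0/0 "GET /x HTTP/1.1"',
]
for entry in server_errors(sample):
    print(entry)
```

If user-reported errors show up here with an `sH`-style state, the proxy is timing out waiting for the backend; if they do not show up at all, the requests likely never reached HAProxy.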

              • lewis

                What I mean is that we know about the 500/504 errors because we see them on the LAN side when we have problems.

                However, when users got them because they couldn't reach the site, we have no logged errors, because we simply didn't think it was the load balancer.

                We would have to set up some kind of test to see if we can log but that will take a little time. We ended up upgrading a bunch of things, adding hardware, the multi-firewall thing and so on. Since we could not find the problem, we simply blamed ourselves after weeks of searching.
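One way to set up such a test is a small external probe that requests the site periodically and records the status code, latency and any connection-level failure. A minimal sketch using only the Python standard library; the URLs you point it at are your own and hypothetical here (for example one through HAProxy and one directly to a backend from inside the LAN):

```python
import time
import urllib.error
import urllib.request
from typing import NamedTuple, Optional


class ProbeResult(NamedTuple):
    url: str
    status: Optional[int]  # HTTP status, or None if no HTTP response arrived
    elapsed: float         # seconds
    error: Optional[str]


def probe(url: str, timeout: float = 10.0) -> ProbeResult:
    """Fetch `url` once and record the status, latency and any failure."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
            return ProbeResult(url, resp.status, time.monotonic() - start, None)
    except urllib.error.HTTPError as exc:
        # 4xx/5xx: a server (backend or proxy) did send a response.
        return ProbeResult(url, exc.code, time.monotonic() - start, str(exc))
    except (urllib.error.URLError, OSError) as exc:
        # Refused, reset or timed out: no HTTP response at all.
        return ProbeResult(url, None, time.monotonic() - start, repr(exc))
```

Calling `probe()` on a schedule for both URLs and logging every result whose `status` is `None` or 5xx would show whether failures only appear on the proxied path. A `status` of `None` is a different failure from a 5xx: nothing answered at the HTTP level.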

                It all got better yesterday when I removed the last server from the proxy.

                Now I'm more concerned about this segfault thing I'm seeing and not being able to upgrade. That feels like imminent failure to me.

                • stephenw10 (Netgate Administrator)

                  There's nothing in the system log following the upgrade attempt?

                    • lewis

                      Ah, here we are.

                      May 16 21:31:06 sshd 79909 Accepted keyboard-interactive/pam for root from x.x.x.x port xxx ssh2
                      May 16 21:31:20 kernel pid 59117 (pkg-static), jid 0, uid 0: exited on signal 11 (core dumped)
                      May 16 21:31:26 kernel pid 87625 (pkg-static), jid 0, uid 0: exited on signal 11 (core dumped)

                      Reboot will be required!!
                      Proceed with upgrade? (y/N) y
                      >>> Removing vital flag from php74... done.
                      >>> Downloading upgrade packages...
                      Updating pfSense-core repository catalogue...
                      pfSense-core repository is up to date.
                      Updating pfSense repository catalogue...
                      pfSense repository is up to date.
                      All repositories are up to date.
                      Checking for upgrades (201 candidates): ....
                      Child process pid=87625 terminated abnormally: Segmentation fault
                      pfSense - Netgate Device ID: xxx
                      
                      

                      Unrelated?
                      May 13 08:00:32 php-fpm 72646 /services_dhcp_edit.php: The command '/usr/sbin/arp -d '10.0.0.100'' returned exit code '1', the output was 'arp: writing to routing socket: No such file or directory'

                      I'm feeling a little nervous that this firewall is going to crash at some point.

                      • stephenw10 (Netgate Administrator)

                        The arp log is unrelated, something trying to remove an ARP entry that's already been removed.
                        That is unusual though. Do you have static ARP entries set?

                        Do you have Zabbix Agent installed? Specifically the obsolete 5_2 version?
                        If so you are probably hitting this: https://redmine.pfsense.org/issues/12796

                        Removing that before the upgrade should allow it.

                        Steve

                        • lewis

                          I showed the ARP log entry because the system is having trouble doing that, which makes me nervous that the OS might be getting messed up or something.

                          I do have static MAC/IP entries in the DHCP server. It's how I keep track of all the equipment. Each device first gets a dynamic DHCP address, which is how I identify it on the network, and then I enter a static mapping for it in the DHCP server.

                          Yes, Zabbix 5.2 was installed on this firewall. I've removed it.
                          The HAProxy package was a little out of date, so that's updated now.

                          I'll try running the upgrade later today and see how it goes.

                          • stephenw10 (Netgate Administrator)

                            Static DHCP mappings are not the same as static ARP entries. You can enable static ARP on static DHCP mappings, but it's almost always unnecessary and can cause problems.

                            https://docs.netgate.com/pfsense/en/latest/services/dhcp/ipv4.html#static-mappings

                            Steve

                            • lewis @stephenw10

                              @stephenw10

                              Understood. Just saying I don't have any static ARP, just DHCP mappings I maintain.

                              I hope to try the upgrade again tonight.

                              • lewis

                                Well, that worked; thanks so much. It feels better seeing it upgrade successfully.

                                No idea how I'm going to test the proxy as I've decided to do something different. Have not gone back to it since finding the problem.

                                Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.