Any known issues with HAProxy on 2.5.2?
-
I guess it means there's something I'm not understanding :).
I've always had devices on different subnets communicating together with and without firewalls.
Devices using 172.16.x.x talk with others on 172.16.x.x, 10.0.0.x talk with devices on 10.0.0.x and 192.168.0.x talk with others on the same.
Some use a gateway, some don't depending on what their tasks are.
In this case, it's that there is only one physical cable to get a LAN from one point to another but on that cable, there are now two firewalls, 10.0.0.x and 10.1.1.x. Only one firewall has DHCP, the other doesn't.
Devices on the first firewall have that one as their gw and communicate with other devices on the same network subnet.
Devices on the second firewall have that one as their gw and communicate with other devices on the same network subnet.
This conversation keeps diluting the problem with haproxy, but you think there is a possibility that haproxy is not working well because of the above network.
I've not seen any problems so based on your input, there must be something I am missing.
Devices communicate with their own gw. The only time it was weird was while ARP was cached all over and one left over rule was overlooked.
I've not seen any problems since other than this haproxy and not being able to update the firewall.
> Using HAProxy you could use a backend that was not using the firewall as its default route.
This firewall is only working with devices that have the same network which is 10.0.0.x/24.
The back end servers are all on the same 10.0.0.x/24 and have the above as their gw.
> That would fail if you then tried to use a port forward instead.
I think you are saying if I used 10.0.0.1 firewall with haproxy and sent traffic from that to 10.1.1.x/24 devices? Not doing that for sure :).
It would not work anyhow since the devices on 10.1.1.x have their gw as 10.1.1.1, so traffic would not get to them without a funky config using VIPs or something, and their outgoing traffic would want to go out the 10.1.1.1 gw.
> Since that's not what you're seeing here you are not hitting that particular failure mode but you need to be aware of it when using a network setup like that.
Ok, I think you're just warning me not to do stuff like that. I agree, I won't be doing that.
I believe you helped me when I was setting all this up and with some other problems and I've learned quite a lot, even if I don't remember it all just yet.
-
Yes, just be aware it would be very easy to introduce asymmetry and I've seen that bite people many, many times!
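The asymmetry risk mentioned here can be illustrated with a small sketch. It only models one simplified rule: a backend replies directly to sources on its own subnet and sends everything else to its default gateway. The backend address 10.0.0.50, the alternate gateway 10.0.0.254 and the client 203.0.113.10 are hypothetical values chosen for illustration; only the 10.0.0.x/24 network and the 10.0.0.1 firewall come from the thread.

```python
import ipaddress


def reply_next_hop(source_ip, backend_ip, backend_gateway, prefix_len=24):
    """Where a backend sends its reply, under a simplified routing model.

    Sources on the backend's own subnet are answered directly;
    everything else is handed to the backend's default gateway.
    """
    backend_net = ipaddress.ip_network(f"{backend_ip}/{prefix_len}", strict=False)
    if ipaddress.ip_address(source_ip) in backend_net:
        return source_ip
    return backend_gateway


def is_symmetric(firewall_ip, source_seen_by_backend, backend_ip, backend_gateway):
    """True if the reply goes back through the firewall that forwarded the request."""
    return reply_next_hop(source_seen_by_backend, backend_ip, backend_gateway) == firewall_ip


FIREWALL = "10.0.0.1"        # from the thread
BACKEND = "10.0.0.50"        # hypothetical backend on 10.0.0.x/24
OTHER_GW = "10.0.0.254"      # hypothetical gateway that is NOT the firewall
CLIENT = "203.0.113.10"      # hypothetical external client

# HAProxy connects from the firewall's own address, so the backend's
# default route does not matter for the reply.
haproxy_ok = is_symmetric(FIREWALL, FIREWALL, BACKEND, OTHER_GW)

# A port forward keeps the client's address as the source, so the reply
# follows the backend's default route and bypasses the firewall.
port_forward_ok = is_symmetric(FIREWALL, CLIENT, BACKEND, OTHER_GW)

# With the firewall as default gateway, the port forward is symmetric again.
port_forward_via_fw_ok = is_symmetric(FIREWALL, CLIENT, BACKEND, FIREWALL)
```

This is only a model of the routing decision, not a diagnostic tool; a packet capture on the firewall is still the way to confirm what actually happens on the wire.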
If you really are seeing an issue in HAProxy then a pcap should prove it.
I would expect to see something logged though.
Was this working in the old network setup?
Steve
-
We had the proxy going for the past couple of years approximately.
During that time, we've had lots of complaints about 500/504 but always blamed our own resources, never once thinking it could be the proxy.
So to answer your question, there is really no way to know other than when I posted this, that was around the time we realized what was happening.
We had taken the proxy out of the mix to do some testing so it was off for maybe a week. Then when we re-enabled it, the timeout complaints started again which got me wondering what was going on. That's when I disabled it again and since then, the complaints stopped and we too were no longer getting them.
We know one problem was on the back end: there was an issue with the database and it wasn't responding fast enough, causing 504s, but we were aware of those and could see them in the logs.
-
If something is responding with a 500 or 504 error that will be logged somewhere. That's not just a failure to respond at all. If that's HAProxy responding with that then I'd expect to see some other errors logged.
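As a rough way to check the logs for this, the sketch below scans HAProxy log lines and separates 5xx responses that look like they came from the backend from those where HAProxy itself ended the session. It assumes the frontend uses the standard `option httplog` format, where the status code is followed by a four-character termination state; a state starting with `--` means the session completed normally, while something like `sH` indicates a server timeout while waiting for response headers. The sample lines, frontend and server names are hypothetical, not taken from this firewall.

```python
import re
from collections import Counter

# Matches "backend/server", the five slash-separated timers, the status code,
# the byte count, two cookie fields and the 4-character termination state.
HTTPLOG_RE = re.compile(
    r"\s(?P<backend>\S+)/(?P<server>\S+)\s"
    r"(?P<timers>[-+\d]+(?:/[-+\d]+){4})\s"
    r"(?P<status>\d{3})\s\S+\s\S+\s\S+\s(?P<term>\S{4})\s"
)


def classify(line):
    """Return (status, origin, termination_state) for 5xx lines, else None."""
    match = HTTPLOG_RE.search(line)
    if match is None:
        return None
    status = int(match["status"])
    if status < 500:
        return None
    # "--" in the first two positions: no abnormal event ended the session,
    # so the status most likely came from the backend server itself.
    origin = "backend" if match["term"].startswith("--") else "haproxy"
    return status, origin, match["term"]


def summarize(lines):
    """Count 5xx responses by (status, origin)."""
    counts = Counter()
    for line in lines:
        result = classify(line)
        if result is not None:
            status, origin, _ = result
            counts[(status, origin)] += 1
    return counts


# Hypothetical sample lines in httplog format, for illustration only.
SAMPLE_LINES = [
    'May 16 21:00:01 fw haproxy[1234]: 203.0.113.10:51000 [16/May/2022:21:00:01.000] '
    'www web/srv1 0/0/1/-1/30001 504 194 - - sH-- 2/2/1/1/0 0/0 "GET / HTTP/1.1"',
    'May 16 21:00:05 fw haproxy[1234]: 203.0.113.11:51001 [16/May/2022:21:00:05.000] '
    'www web/srv2 0/0/1/250/251 500 512 - - ---- 2/2/1/1/0 0/0 "GET /report HTTP/1.1"',
    'May 16 21:00:09 fw haproxy[1234]: 203.0.113.12:51002 [16/May/2022:21:00:09.000] '
    'www web/srv1 0/0/1/20/21 200 2750 - - ---- 2/2/1/1/0 0/0 "GET /index.html HTTP/1.1"',
]

SUMMARY = summarize(SAMPLE_LINES)
```

If most of the 504s turn out to carry a non-normal termination state, that points at timeouts between HAProxy and the servers rather than errors generated by the application.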
-
What I mean is that we know about the 500/504 errors because we see them on the LAN side when we have problems.
However, when users get them because they cannot reach the site, there aren't any errors that we've logged because we simply didn't think it was the load balancer.
We would have to set up some kind of test to see if we can log but that will take a little time. We ended up upgrading a bunch of things, adding hardware, the multi-firewall thing and so on. Since we could not find the problem, we simply blamed ourselves after weeks of searching.
It all got better yesterday when I removed the last server from the proxy.
Now I'm more concerned about this segfault thing I'm seeing and not being able to upgrade. That feels like imminent failure to me.
-
There's nothing in the system log following the upgrade attempt?
-
Ah, here we are.
May 16 21:31:06 sshd 79909 Accepted keyboard-interactive/pam for root from x.x.x.x port xxx ssh2
May 16 21:31:20 kernel pid 59117 (pkg-static), jid 0, uid 0: exited on signal 11 (core dumped)
May 16 21:31:26 kernel pid 87625 (pkg-static), jid 0, uid 0: exited on signal 11 (core dumped)
Reboot will be required!!
Proceed with upgrade? (y/N) y
>>> Removing vital flag from php74... done.
>>> Downloading upgrade packages...
Updating pfSense-core repository catalogue...
pfSense-core repository is up to date.
Updating pfSense repository catalogue...
pfSense repository is up to date.
All repositories are up to date.
Checking for upgrades (201 candidates): ....
Child process pid=87625 terminated abnormally: Segmentation fault
pfSense - Netgate Device ID: xxx
Unrelated?
May 13 08:00:32 php-fpm 72646 /services_dhcp_edit.php: The command '/usr/sbin/arp -d '10.0.0.100'' returned exit code '1', the output was 'arp: writing to routing socket: No such file or directory'
I'm feeling a little nervous that this firewall is going to crash at some point.
-
The arp log is unrelated, something trying to remove an ARP entry that's already been removed.
That is unusual though. Do you have static ARP entries set?
Do you have Zabbix Agent installed? Specifically the obsolete 5_2 version?
If so you are probably hitting this: https://redmine.pfsense.org/issues/12796
Removing that before the upgrade should allow it.
Steve
-
I showed the arp log part because it's having a problem doing that, which makes me nervous that the OS might be getting messed up or something.
I do have static MAC/IP entries in the DHCP server. It's how I keep track of all the equipment. A device first gets a DHCP IP, which is how I identify it on the network, and then I enter a static entry for it into the DHCP server.
Yes, zabbix 5.2 is installed on this firewall. Removed.
The haproxy was a little out of date so that's updated now.
I'll try running the upgrade later today and see how it goes.
-
Static DHCP mappings are not the same as static ARP entries. You can enable static ARP on static dhcp mappings but it's almost always unnecessary and can cause problems.
https://docs.netgate.com/pfsense/en/latest/services/dhcp/ipv4.html#static-mappings
Steve
-
Understood. Just saying I don't have any static ARP, just DHCP mappings I maintain.
I'll try the upgrade again tonight I hope.
-
Well, that worked, thanks so much. Feels a bit better seeing the upgrade run and complete.
No idea how I'm going to test the proxy as I've decided to do something different. Have not gone back to it since finding the problem.