Issues when HAProxy started

jokabo

Hello,

For the past few days (possibly since my upgrade from Proxmox 6 to 7, though that may be unrelated) I have had problems with ports used by HAProxy backends.

It is hard for me to explain because it does not really make sense, but I will try my best.

I have two Netgate appliances with the latest pfSense+ version installed. HAProxy was installed via the package manager and has been running very stably for more than a year.

Setup:
This is only an example. I have the same issue on the same setup with Proxmox Mail Gateway (internal port 26) and an nginx cluster (ports 80/443).

I have a Galera database cluster with two MaxScale instances in front of it, and in front of MaxScale is HAProxy, running on pfSense.

So:

HAProxy frontend with 2 backend nodes
Both backends are MaxScale on port 3306
MaxScale does the balancing to the Galera data nodes

Problem:
When I now try to connect directly to a backend server, I get a timeout. Example:

HAProxy Frontend: 10.0.10.1
Backend:

Maxscale 1: 10.0.10.5
Maxscale 2: 10.0.10.6

From my computer (10.100.2.10):
telnet 10.0.10.1 3306 => OK
telnet 10.0.10.5 3306 => Fail
telnet 10.0.10.6 3306 => Fail

(The IP addresses are made up, just for illustration.)
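
The telnet checks above can also be scripted so they are easy to repeat after each port change. This is only a minimal sketch: the function name `probe` and the 3-second timeout are my own choices, and the addresses are the same made-up ones as in the example. It distinguishes a timeout (no answer at all, which is what I see) from an actively refused connection.

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt one TCP connect and classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No SYN-ACK (or RST) came back within the timeout.
        return "timeout"
    except ConnectionRefusedError:
        # The host answered with a RST: nothing is listening on that port.
        return "refused"
    except OSError as exc:
        # Anything else, e.g. no route to host.
        return f"error: {exc}"


if __name__ == "__main__":
    # Made-up example addresses: HAProxy frontend first, then both backends.
    targets = [("10.0.10.1", 3306), ("10.0.10.5", 3306), ("10.0.10.6", 3306)]
    for host, port in targets:
        print(f"{host}:{port} => {probe(host, port)}")
```

In my case the frontend would report "open" and both backends "timeout" when this is run from the remote subnet.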

In the pfSense firewall log I can see that the traffic is allowed.

With tcpdump I can see only a single entry:

16:17:43.639539 IP 10.100.2.10.47040 > <maxscale-host>.mysql: Flags [S], seq 617035827, win 64240, options [mss 1328,sackOK,TS val 3064508424 ecr 0,nop,wscale 7], length 0
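
To check a longer capture for whether any SYN is ever answered, the tcpdump text output could be summarized with something like the following. This is a rough sketch under assumptions: it relies on tcpdump's usual `Flags [...]` notation (`S` for SYN, `S.` for SYN+ACK, `R` or `R.` for resets), and the names `summarize_handshake` and `LABELS` are made up for illustration.

```python
import re
from collections import Counter

# Matches the flag field in tcpdump's default text output, e.g. "Flags [S.]".
FLAGS_RE = re.compile(r"Flags \[([^\]]+)\]")

# "S" is a bare SYN, "S." is SYN+ACK, "R" and "R." are resets.
LABELS = {"S": "syn", "S.": "syn-ack", "R": "rst", "R.": "rst"}


def summarize_handshake(lines):
    """Count SYN, SYN-ACK and RST packets in tcpdump text lines."""
    counts = Counter()
    for line in lines:
        match = FLAGS_RE.search(line)
        if match:
            counts[LABELS.get(match.group(1), "other")] += 1
    return counts


if __name__ == "__main__":
    capture = [
        "16:17:43.639539 IP 10.100.2.10.47040 > <maxscale-host>.mysql: "
        "Flags [S], seq 617035827, win 64240, options [mss 1328,sackOK,"
        "TS val 3064508424 ecr 0,nop,wscale 7], length 0",
    ]
    print(dict(summarize_handshake(capture)))
```

For the single line quoted above it counts one SYN and no SYN-ACK, which matches what I see: the request shows up, but no reply does.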

When I telnet from MaxScale 2 or from localhost => working

And now the really strange part begins:
I changed the port in MaxScale from 3306 to 3307 and ran telnet again.
It works. HAProxy marked the node as offline (because it still has the backend on 3306).

Then I changed the port in HAProxy from 3306 to 3307 for that one member, and what happened? Once again I can't access port 3307.

When I switch MaxScale back to port 3306, access to that port from my PC works. When I switch HAProxy from 3307 back to 3306, I again can't access the port from a remote subnet (for example, from my PC).

It is the same game with Proxmox Mail Gateway and port 26. I switched to ports 27 and 28 and it worked immediately. But as soon as I changed the backend port in HAProxy to match, I was unable to access that port.

Any ideas?

My first thought was hanging connections, but as far as I can see there are not many (about 10 connections to the ports at most).
There is no local firewall and no fail2ban.
HAProxy can still serve traffic to and from the ports. Only direct access from external subnets fails. It feels as if HAProxy reserves the whole port (I don't think that is possible, but that is what it looks like).

This is my generated HAProxy config. I just removed all other backends / frontends and sensitive names so we can focus on the two cases:

# Automaticaly generated, dont edit manually.
# Generated on: 2022-06-18 17:19
global
	maxconn			10000
	stats socket /tmp/haproxy.socket level admin  expose-fd listeners
	gid			80
	nbproc			1
	nbthread			1
	hard-stop-after		15m
	chroot				/tmp/haproxy_chroot
	daemon
	server-state-file /tmp/haproxy_server_state

listen HAProxyLocalStats
	bind 127.0.0.1:2200 name localstats
	mode http
	stats enable
	stats admin if TRUE
	stats show-legends
	stats uri /haproxy/haproxy_stats.php?haproxystats=1
	timeout client 5000
	timeout connect 5000
	timeout server 5000

frontend prod_maxscale
	bind			10.0.48.34:3306 name 10.0.48.34:3306   
	mode			tcp
	log			global
	timeout client		3000
	default_backend prod_maxscale_ipv4

frontend staging_mailgateway_out
	bind			10.0.33.140:26 name 10.0.33.140:26   
	mode			tcp
	log			global
	timeout client		30000
	default_backend staging_mailgateway_out_ipv4

backend prod_maxscale_ipv4
	mode			tcp
	id			10109
	log			global
	option			log-health-checks
	timeout connect		30000
	timeout server		30000
	retries			3
	source ipv4@ usesrc clientip
	server			fl0-mariadb-cluster-lb01.prod.mydomain 10.0.48.41:3307 id 10103 check inter 1000  weight 2 
	server			fl0-mariadb-cluster-lb02.prod.mydomain 10.0.48.42:3306 id 10104 check inter 1000 backup weight 1 

backend staging_mailgateway_out_ipv4
	mode			tcp
	id			10114
	log			global
	balance			roundrobin
	timeout connect		30000
	timeout server		30000
	retries			3
	source ipv4@ usesrc clientip
	option			smtpchk HELO 
	server			fra1-pmgc01-m01.prod.mydomain 10.0.33.141:26 id 10107 check inter 10000  
	server			fra1-pmgc01-m02.prod.mydomain 10.0.33.142:26 id 10108 check inter 10000

Thanks!