HAProxy: 503 errors on 2 domains

oguruma

I just moved to a pfSense box from a Ubiquiti Dream Machine. With the UBNT, I had HAProxy set up in a VM, and I forwarded all the port 80/443 traffic to that. I had the HAProxy VM handle the Let's Encrypt SSL. It worked as intended with all 3 of the domains I host.

Now, with pfSense (2.7.X), I get Connection Refused errors on 2 of the 3 domains, even though the HAProxy configuration looks to be the same. The web server that does work is Vaultwarden, hosted by my TrueNAS box. The two that don't work are an ERPNext VM and Nextcloud, both hosted by the TrueNAS box.

The 3 backends are all identical: Port 80 (on their respective IPs), No SSL, No Encrypt. HTTP method for health check.

I have a single shared frontend with ANY IPV4:80 and ANY IPV4:443 with SSL offloading enabled for the 443 listener.

In the frontend, I added all of the ACME certs (which are hosted on pfSense as well).

My (working) config from the HAProxy VM I was using was:

frontend www
bind *:80
bind *:443 ssl crt /etc/letsencrypt/live/...

http-request redirect scheme https unless { ssl_fc }

acl ACL_domain1.com hdr(host) -i domain1.com
use_backend backend1 if ACL_domain1.com

acl ACL_domain2.com hdr(host) -i domain2.com
use_backend backend2 if ACL_domain2.com

acl ACL_domain3.com hdr(host) -i domain3.com
use_backend backend3 if ACL_domain3.com

backend backend1
 server srvr1 10.1.1.5:80

backend backend2
 server srvr2 192.168.2.1:9002

backend backend3
 server srvr3 192.168.2.1:9010

...

HAProxy in pfSense gives me an error message telling me backend2 and 3 are down, and I get connection refused messages from both of those. domain1.com works as intended with SSL.

kiokoman

@oguruma
For the moment, delete the backend and create it again with health check -> none (changing only the health check method does not work; at least for me it never has),
and see if it works.

Post the config from pfSense: at the bottom of Settings there is a "SHOW" link for the automatically generated configuration.

Are all domains hosted on the same TrueNAS? Same IP?

oguruma

2 of the domains are TrueNAS apps (Kubernetes), and one of the domains is a web server installed in a Debian VM (which is hosted by TrueNAS). Note: the config below doesn't list one of the domains, but that third domain (also a K8s app on TrueNAS) doesn't work either.

# Automaticaly generated, dont edit manually.
# Generated on: 2023-12-18 12:41
global
	maxconn			100
	stats socket /tmp/haproxy.socket level admin  expose-fd listeners
	uid			80
	gid			80
	nbthread			1
	hard-stop-after		15m
	chroot				/tmp/haproxy_chroot
	daemon
	server-state-file /tmp/haproxy_server_state

frontend http-shared
	bind			0.0.0.0:80 name 0.0.0.0:80   
	bind			0.0.0.0:443 name 0.0.0.0:443   ssl crt-list /var/etc/haproxy/http-shared.crt_list  
	bind /tmp/haproxy_chroot/http-shared.socket name unixsocket uid 80 accept-proxy   ssl crt-list /var/etc/haproxy/http-shared.crt_list 
	mode			http
	log			global
	option			httplog
	option			http-keep-alive
	timeout client		30000
	acl			starts-domain1	var(txn.txnhost) -m beg -i domain1.com
	acl			starts-domain2	var(txn.txnhost) -m beg -i domain2.com
	acl			aclcrt_http-shared	var(txn.txnhost) -m reg -i ^domain1(:([0-9]){1,5})?$
	acl			aclcrt_http-shared	var(txn.txnhost) -m reg -i ^domain2(:([0-9]){1,5})?$
	http-request set-var(txn.txnhost) hdr(host)
	use_backend domain1_ipvANY  if  starts-domain1 aclcrt_http-shared
	use_backend domain2_ipvANY  if  starts-domain2 aclcrt_http-shared

backend domain1_ipvANY
	mode			http
	id			102
	log			global
	timeout connect		30000
	timeout server		30000
	retries			3
	load-server-state-from-file	global
	server			domain1server /http-shared.socket send-proxy-v2-ssl-cn id 103  

backend domain2_ipvANY
	mode			http
	id			100
	log			global
	http-check		send meth OPTIONS
	timeout connect		30000
	timeout server		30000
	retries			3
	load-server-state-from-file	global
	option			httpchk
	server			domain2server 192.168.2.1:9002 id 101 check inter 1000

kiokoman

@oguruma
domain1 -> debian -> work
domain2 -> kubernetes -> not working
domain3 -> kubernetes -> not working

right?

Did you try removing httpchk?
Can you try curl from another machine to 192.168.2.1:9002 and see if you receive an answer?

I have 30 pods running behind my HAProxy, but there aren't any particular settings apart from a couple that needed "timeout tunnel 3600s".
What services are they running? Apache/nginx or some other kind of service?
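
For comparison with the generated config posted above, here is a minimal sketch of the domain2_ipvANY backend with the health check removed. It is an assumption about what the package would write with the health check method set to none, not output copied from a pfSense box. Without "option httpchk", the "http-check" line, and the "check inter 1000" keywords on the server line, HAProxy does not probe the server and treats it as up.

```haproxy
backend domain2_ipvANY
	mode			http
	id			100
	log			global
	timeout connect		30000
	timeout server		30000
	retries			3
	load-server-state-from-file	global
	# No "option httpchk" and no "http-check" line: nothing probes the server.
	# No "check inter 1000" on the server line: HAProxy assumes the server is up.
	server			domain2server 192.168.2.1:9002 id 101
```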

oguruma

@kiokoman

domain1 -> debian -> not working
domain2 -> kubernetes -> working
domain3 -> kubernetes -> not working

If I use DNS Resolver to send domain1.com to 10.1.1.5 (bypassing HAProxy), it loads just fine, albeit without SSL of course.

The Debian VM runs ERPNext/Frappe, which is basically Python and nginx.

domain3 is Nextcloud, which is PHP via Apache.

kiokoman

@oguruma
HAProxy returns "503 Service Unavailable: No server is available to handle this request" when the HTTP check fails for some reason, even if the service is up and running,

like in this post: https://serverfault.com/a/886319

You need to adjust that option so that it receives a valid response from the server, or disable httpchk.
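
As a sketch of what adjusting the check could look like, the backend below swaps the OPTIONS probe for a GET with an explicit Host header. With "option httpchk", HAProxy counts 2xx and 3xx responses as healthy by default, so a server that answers the OPTIONS probe with a 4xx or 5xx status would be marked down even though it serves pages normally. The URI "/" and the Host value "domain2.com" are assumptions for illustration; use whatever path and hostname the application actually answers on.

```haproxy
backend domain2_ipvANY
	mode			http
	option			httpchk
	# GET instead of OPTIONS; "/" and "domain2.com" are placeholder assumptions.
	http-check		send meth GET uri / ver HTTP/1.1 hdr Host domain2.com
	# Spell out the default rule: any 2xx or 3xx status counts as healthy.
	http-check		expect rstatus ^[23]
	timeout connect		30000
	timeout server		30000
	server			domain2server 192.168.2.1:9002 id 101 check inter 1000
```

If the application only answers correctly for a known hostname, the Host header is likely the part that matters; if it simply does not implement OPTIONS, changing the method may be enough on its own.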

oguruma

Thanks again for the help. I got it working by deleting both the frontends and the backends for the not-working domains and recreating them, making sure to disable health checks from the outset when creating the backends.

One thing that is curious is that I re-installed ERPNext on a separate, vanilla VM and pointed the backend to that new VM with the health check enabled, and it worked fine...