Dnsmasq dying regularly after upgrade from 2.2.2 to 2.2.3
-
I just upgraded my office firewalls from 2.2.2 to 2.2.3, mainly for the openssl fixes. The upgrade itself went through without issue, but dnsmasq is now dying very frequently, so much so that I had to write a restart script that checks for its PID every 2 seconds and restarts it if it is gone. Here's what I'm seeing in my dmesg (note this is from a restart roughly 2 hours ago):
carp: VHID 2@em1_vlan1: BACKUP -> MASTER (preempting a slower master)
ovpns1: link state changed to DOWN
ovpns1: link state changed to UP
pid 68760 (dnsmasq), uid 65534: exited on signal 11
pid 58617 (dnsmasq), uid 65534: exited on signal 11
pid 61503 (dnsmasq), uid 65534: exited on signal 11
arp: 192.168.1.114 moved from 70:56:81:9a:94:25 to b8:78:2e:57:a6:bc on em1_vlan1
arp: 192.168.1.114 moved from 70:56:81:9a:94:25 to b8:78:2e:57:a6:bc on em1_vlan1
pid 65112 (dnsmasq), uid 65534: exited on signal 11
pid 61131 (dnsmasq), uid 65534: exited on signal 11
pid 64404 (dnsmasq), uid 65534: exited on signal 11
pid 6542 (dnsmasq), uid 65534: exited on signal 11
pid 9089 (dnsmasq), uid 65534: exited on signal 11
pid 67017 (dnsmasq), uid 65534: exited on signal 11
pid 70543 (dnsmasq), uid 65534: exited on signal 11
pid 38794 (dnsmasq), uid 65534: exited on signal 11
pid 41698 (dnsmasq), uid 65534: exited on signal 11
pid 78629 (dnsmasq), uid 65534: exited on signal 11
pid 82340 (dnsmasq), uid 65534: exited on signal 11
pid 86055 (dnsmasq), uid 65534: exited on signal 11
pid 30145 (dnsmasq), uid 65534: exited on signal 11
pid 33429 (dnsmasq), uid 65534: exited on signal 11
pid 36217 (dnsmasq), uid 65534: exited on signal 11
pid 88993 (dnsmasq), uid 65534: exited on signal 11
pid 92712 (dnsmasq), uid 65534: exited on signal 11
pid 61158 (dnsmasq), uid 65534: exited on signal 11
pid 64693 (dnsmasq), uid 65534: exited on signal 11
pid 68395 (dnsmasq), uid 65534: exited on signal 11
pid 71601 (dnsmasq), uid 65534: exited on signal 11
pid 3580 (dnsmasq), uid 65534: exited on signal 11
pid 51100 (dnsmasq), uid 65534: exited on signal 11
pid 54020 (dnsmasq), uid 65534: exited on signal 11
pid 58547 (dnsmasq), uid 65534: exited on signal 11
pid 8416 (dnsmasq), uid 65534: exited on signal 11
pid 12152 (dnsmasq), uid 65534: exited on signal 11
pid 79594 (dnsmasq), uid 65534: exited on signal 11
pid 83020 (dnsmasq), uid 65534: exited on signal 11
pid 86625 (dnsmasq), uid 65534: exited on signal 11
pid 88256 (dnsmasq), uid 65534: exited on signal 11
pid 91213 (dnsmasq), uid 65534: exited on signal 11
pid 69497 (dnsmasq), uid 65534: exited on signal 11
pid 71816 (dnsmasq), uid 65534: exited on signal 11
pid 75650 (dnsmasq), uid 65534: exited on signal 11
pid 18822 (dnsmasq), uid 65534: exited on signal 11
pid 22268 (dnsmasq), uid 65534: exited on signal 11
pid 98940 (dnsmasq), uid 65534: exited on signal 11
pid 1493 (dnsmasq), uid 65534: exited on signal 11
pid 6053 (dnsmasq), uid 65534: exited on signal 11
pid 12206 (dnsmasq), uid 65534: exited on signal 11
pid 14151 (dnsmasq), uid 65534: exited on signal 11
pid 16539 (dnsmasq), uid 65534: exited on signal 11
pid 94270 (dnsmasq), uid 65534: exited on signal 11
pid 96935 (dnsmasq), uid 65534: exited on signal 11
pid 40958 (dnsmasq), uid 65534: exited on signal 11
pid 44402 (dnsmasq), uid 65534: exited on signal 11
pid 95615 (dnsmasq), uid 65534: exited on signal 11
pid 25564 (dnsmasq), uid 65534: exited on signal 11
pid 28003 (dnsmasq), uid 65534: exited on signal 11
pid 35455 (dnsmasq), uid 65534: exited on signal 11
pid 37584 (dnsmasq), uid 65534: exited on signal 11
pid 14473 (dnsmasq), uid 65534: exited on signal 11
pid 17819 (dnsmasq), uid 65534: exited on signal 11
pid 21431 (dnsmasq), uid 65534: exited on signal 11
pid 35032 (dnsmasq), uid 65534: exited on signal 11
pid 68048 (dnsmasq), uid 65534: exited on signal 11
pid 17880 (dnsmasq), uid 65534: exited on signal 11
pid 40663 (dnsmasq), uid 65534: exited on signal 11
pid 50012 (dnsmasq), uid 65534: exited on signal 11
pid 52857 (dnsmasq), uid 65534: exited on signal 11
pid 56719 (dnsmasq), uid 65534: exited on signal 11
pid 25754 (dnsmasq), uid 65534: exited on signal 11
pid 30742 (dnsmasq), uid 65534: exited on signal 11
pid 44637 (dnsmasq), uid 65534: exited on signal 11
pid 47968 (dnsmasq), uid 65534: exited on signal 11
pid 29731 (dnsmasq), uid 65534: exited on signal 11
pid 33589 (dnsmasq), uid 65534: exited on signal 11
pid 35866 (dnsmasq), uid 65534: exited on signal 11
pid 64803 (dnsmasq), uid 65534: exited on signal 11
pid 67341 (dnsmasq), uid 65534: exited on signal 11
pid 71247 (dnsmasq), uid 65534: exited on signal 11
pid 40116 (dnsmasq), uid 65534: exited on signal 11
pid 42881 (dnsmasq), uid 65534: exited on signal 11
pid 47026 (dnsmasq), uid 65534: exited on signal 11
pid 59087 (dnsmasq), uid 65534: exited on signal 11
pid 62523 (dnsmasq), uid 65534: exited on signal 11
pid 42330 (dnsmasq), uid 65534: exited on signal 11
pid 46109 (dnsmasq), uid 65534: exited on signal 11
pid 48776 (dnsmasq), uid 65534: exited on signal 11
pid 85096 (dnsmasq), uid 65534: exited on signal 11
pid 88684 (dnsmasq), uid 65534: exited on signal 11
pid 92489 (dnsmasq), uid 65534: exited on signal 11
pid 62969 (dnsmasq), uid 65534: exited on signal 11
pid 66189 (dnsmasq), uid 65534: exited on signal 11
pid 77844 (dnsmasq), uid 65534: exited on signal 11
pid 81265 (dnsmasq), uid 65534: exited on signal 11
pid 69485 (dnsmasq), uid 65534: exited on signal 11
pid 71999 (dnsmasq), uid 65534: exited on signal 11
pid 9548 (dnsmasq), uid 65534: exited on signal 11
pid 11804 (dnsmasq), uid 65534: exited on signal 11
pid 13966 (dnsmasq), uid 65534: exited on signal 11
pid 54987 (dnsmasq), uid 65534: exited on signal 11
pid 87116 (dnsmasq), uid 65534: exited on signal 11
pid 90712 (dnsmasq), uid 65534: exited on signal 11
pid 99818 (dnsmasq), uid 65534: exited on signal 11
pid 2930 (dnsmasq), uid 65534: exited on signal 11
pid 82728 (dnsmasq), uid 65534: exited on signal 11
pid 85402 (dnsmasq), uid 65534: exited on signal 11
pid 22866 (dnsmasq), uid 65534: exited on signal 11
pid 25727 (dnsmasq), uid 65534: exited on signal 11
pid 68449 (dnsmasq), uid 65534: exited on signal 11
pid 71363 (dnsmasq), uid 65534: exited on signal 11
pid 4146 (dnsmasq), uid 65534: exited on signal 11
pid 10773 (dnsmasq), uid 65534: exited on signal 11
pid 13907 (dnsmasq), uid 65534: exited on signal 11
pid 96777 (dnsmasq), uid 65534: exited on signal 11
pid 502 (dnsmasq), uid 65534: exited on signal 11
pid 42592 (dnsmasq), uid 65534: exited on signal 11
pid 47377 (dnsmasq), uid 65534: exited on signal 11
pid 49599 (dnsmasq), uid 65534: exited on signal 11
pid 99488 (dnsmasq), uid 65534: exited on signal 11
pid 3353 (dnsmasq), uid 65534: exited on signal 11
pid 23311 (dnsmasq), uid 65534: exited on signal 11
pid 40213 (dnsmasq), uid 65534: exited on signal 11
pid 44069 (dnsmasq), uid 65534: exited on signal 11
pid 46908 (dnsmasq), uid 65534: exited on signal 11
pid 49554 (dnsmasq), uid 65534: exited on signal 11
pid 25695 (dnsmasq), uid 65534: exited on signal 11
pid 28149 (dnsmasq), uid 65534: exited on signal 11
pid 32890 (dnsmasq), uid 65534: exited on signal 11
pid 66124 (dnsmasq), uid 65534: exited on signal 11
pid 72324 (dnsmasq), uid 65534: exited on signal 11
pid 77280 (dnsmasq), uid 65534: exited on signal 11
pid 81724 (dnsmasq), uid 65534: exited on signal 11
pid 22626 (dnsmasq), uid 65534: exited on signal 11
pid 26410 (dnsmasq), uid 65534: exited on signal 11
pid 63678 (dnsmasq), uid 65534: exited on signal 11
pid 66056 (dnsmasq), uid 65534: exited on signal 11
pid 68401 (dnsmasq), uid 65534: exited on signal 11
pid 50773 (dnsmasq), uid 65534: exited on signal 11
My restart script logs the time after each restart, and what I'm seeing ranges from several restarts a minute to one every few minutes:
Wed Jul 8 23:08:14 UTC 2015
Wed Jul 8 23:08:18 UTC 2015
Wed Jul 8 23:08:24 UTC 2015
Wed Jul 8 23:12:12 UTC 2015
Wed Jul 8 23:12:16 UTC 2015
Wed Jul 8 23:14:56 UTC 2015
Wed Jul 8 23:15:00 UTC 2015
Wed Jul 8 23:15:47 UTC 2015
Wed Jul 8 23:15:51 UTC 2015
Wed Jul 8 23:18:29 UTC 2015
Wed Jul 8 23:18:34 UTC 2015
Wed Jul 8 23:22:20 UTC 2015
Wed Jul 8 23:22:26 UTC 2015
Wed Jul 8 23:22:31 UTC 2015
Wed Jul 8 23:25:05 UTC 2015
Wed Jul 8 23:25:12 UTC 2015
Wed Jul 8 23:25:16 UTC 2015
Wed Jul 8 23:25:58 UTC 2015
Wed Jul 8 23:26:02 UTC 2015
Wed Jul 8 23:28:38 UTC 2015
Wed Jul 8 23:28:44 UTC 2015
Wed Jul 8 23:28:48 UTC 2015
Wed Jul 8 23:32:35 UTC 2015
Wed Jul 8 23:32:41 UTC 2015
Wed Jul 8 23:35:22 UTC 2015
Wed Jul 8 23:35:26 UTC 2015
Wed Jul 8 23:35:32 UTC 2015
Wed Jul 8 23:36:07 UTC 2015
Wed Jul 8 23:36:13 UTC 2015
Wed Jul 8 23:38:55 UTC 2015
Wed Jul 8 23:38:59 UTC 2015
Wed Jul 8 23:39:05 UTC 2015
Wed Jul 8 23:42:46 UTC 2015
Wed Jul 8 23:42:50 UTC 2015
Wed Jul 8 23:45:37 UTC 2015
Wed Jul 8 23:45:41 UTC 2015
Wed Jul 8 23:45:47 UTC 2015
Wed Jul 8 23:46:17 UTC 2015
Wed Jul 8 23:46:21 UTC 2015
Wed Jul 8 23:49:10 UTC 2015
Wed Jul 8 23:49:14 UTC 2015
Wed Jul 8 23:49:20 UTC 2015
Wed Jul 8 23:52:57 UTC 2015
Wed Jul 8 23:53:01 UTC 2015
Wed Jul 8 23:53:05 UTC 2015
Wed Jul 8 23:55:51 UTC 2015
Wed Jul 8 23:55:56 UTC 2015
Wed Jul 8 23:56:26 UTC 2015
Wed Jul 8 23:56:32 UTC 2015
Wed Jul 8 23:59:23 UTC 2015
Wed Jul 8 23:59:29 UTC 2015
Wed Jul 8 23:59:33 UTC 2015
Thu Jul 9 00:03:12 UTC 2015
Thu Jul 9 00:03:16 UTC 2015
Thu Jul 9 00:06:02 UTC 2015
Thu Jul 9 00:06:06 UTC 2015
Thu Jul 9 00:06:11 UTC 2015
Thu Jul 9 00:06:37 UTC 2015
Thu Jul 9 00:06:43 UTC 2015
Thu Jul 9 00:09:40 UTC 2015
Thu Jul 9 00:09:44 UTC 2015
Thu Jul 9 00:13:21 UTC 2015
Thu Jul 9 00:13:25 UTC 2015
Thu Jul 9 00:13:32 UTC 2015
Thu Jul 9 00:16:16 UTC 2015
Thu Jul 9 00:16:22 UTC 2015
Thu Jul 9 00:16:48 UTC 2015
Thu Jul 9 00:16:52 UTC 2015
Thu Jul 9 00:19:49 UTC 2015
Thu Jul 9 00:19:55 UTC 2015
Thu Jul 9 00:19:59 UTC 2015
Thu Jul 9 00:23:36 UTC 2015
Thu Jul 9 00:23:40 UTC 2015
Thu Jul 9 00:23:46 UTC 2015
Thu Jul 9 00:26:26 UTC 2015
Thu Jul 9 00:26:30 UTC 2015
Thu Jul 9 00:26:36 UTC 2015
Thu Jul 9 00:26:56 UTC 2015
Thu Jul 9 00:27:03 UTC 2015
Thu Jul 9 00:30:03 UTC 2015
Thu Jul 9 00:30:09 UTC 2015
Thu Jul 9 00:30:13 UTC 2015
Thu Jul 9 00:33:52 UTC 2015
Thu Jul 9 00:33:56 UTC 2015
Thu Jul 9 00:34:02 UTC 2015
Thu Jul 9 00:36:40 UTC 2015
Thu Jul 9 00:36:47 UTC 2015
Thu Jul 9 00:37:07 UTC 2015
Thu Jul 9 00:37:13 UTC 2015
Thu Jul 9 00:40:20 UTC 2015
Thu Jul 9 00:40:24 UTC 2015
Thu Jul 9 00:44:07 UTC 2015
Thu Jul 9 00:44:11 UTC 2015
Thu Jul 9 00:44:15 UTC 2015
Thu Jul 9 00:46:52 UTC 2015
Thu Jul 9 00:46:56 UTC 2015
Thu Jul 9 00:47:02 UTC 2015
Thu Jul 9 00:47:18 UTC 2015
Thu Jul 9 00:47:22 UTC 2015
Thu Jul 9 00:50:29 UTC 2015
Thu Jul 9 00:50:33 UTC 2015
Thu Jul 9 00:54:22 UTC 2015
Thu Jul 9 00:54:26 UTC 2015
Thu Jul 9 00:57:07 UTC 2015
Thu Jul 9 00:57:11 UTC 2015
Thu Jul 9 00:57:15 UTC 2015
Thu Jul 9 00:57:27 UTC 2015
Thu Jul 9 00:57:31 UTC 2015
Thu Jul 9 01:00:38 UTC 2015
Thu Jul 9 01:00:44 UTC 2015
Thu Jul 9 01:04:30 UTC 2015
Thu Jul 9 01:04:36 UTC 2015
Thu Jul 9 01:04:40 UTC 2015
Thu Jul 9 01:07:20 UTC 2015
Thu Jul 9 01:07:27 UTC 2015
Thu Jul 9 01:07:31 UTC 2015
Thu Jul 9 01:07:37 UTC 2015
Thu Jul 9 01:07:43 UTC 2015
Thu Jul 9 01:07:47 UTC 2015
Thu Jul 9 01:07:51 UTC 2015
Thu Jul 9 01:10:50 UTC 2015
Thu Jul 9 01:10:54 UTC 2015
Thu Jul 9 01:10:58 UTC 2015
Thu Jul 9 01:14:45 UTC 2015
Thu Jul 9 01:14:51 UTC 2015
Thu Jul 9 01:14:55 UTC 2015
Thu Jul 9 01:15:02 UTC 2015
Thu Jul 9 01:17:35 UTC 2015
Thu Jul 9 01:17:42 UTC 2015
Thu Jul 9 01:17:58 UTC 2015
Thu Jul 9 01:18:02 UTC 2015
Thu Jul 9 01:18:06 UTC 2015
Thu Jul 9 01:21:04 UTC 2015
Thu Jul 9 01:21:09 UTC 2015
Thu Jul 9 01:25:05 UTC 2015
Thu Jul 9 01:25:11 UTC 2015
Thu Jul 9 01:27:48 UTC 2015
Thu Jul 9 01:27:52 UTC 2015
Thu Jul 9 01:27:56 UTC 2015
Thu Jul 9 01:28:12 UTC 2015
Has anyone else seen this? Is there anything I can do besides continuing to run this script and living with the outages? I'm happy to provide any debug info I can.
Of note, this firewall is in an HA pair and the secondary has had only 3 restarts in this same time frame. The main difference is that the secondary is second in the DNS search order, so the primary receives DNS requests much more frequently.
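For anyone who wants to put a number on the restart rate, here is a minimal sketch that tallies restarts per hour. It assumes the log holds one default-format `date` line per restart, which is what the restart script in this post writes to /tmp/dnsrestart. The function name restarts_per_hour is made up for illustration.

```shell
#!/bin/sh
# Tally restarts per hour from a log of default-format `date` lines,
# e.g. "Wed Jul 8 23:08:14 UTC 2015".
# restarts_per_hour is a hypothetical helper name, not part of pfSense.
restarts_per_hour() {
    awk 'NF >= 6 {
        split($4, t, ":")                   # $4 is HH:MM:SS
        key = $2 " " $3 " " t[1] ":00"      # month, day, hour bucket
        if (!(key in count)) order[++n] = key
        count[key]++
    }
    END {
        # Print the buckets in the order they first appeared in the log.
        for (i = 1; i <= n; i++) print order[i], count[order[i]]
    }' "$@"
}
```

Run it as `restarts_per_hour /tmp/dnsrestart` on each firewall to compare the primary and the secondary side by side. With no argument it reads from standard input.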
Help?!?
For reference, my restart script is below:
#!/bin/sh
# Full dnsmasq command line as run on this firewall.
DNSCMDLINE="/usr/local/sbin/dnsmasq --all-servers --server=/10.in-addr.arpa/ --server=/168.192.in-addr.arpa/ --server=/16.172.in-addr.arpa/ --server=/17.172.in-addr.arpa/ --server=/18.172.in-addr.arpa/ --server=/19.172.in-addr.arpa/ --server=/20.172.in-addr.arpa/ --server=/21.172.in-addr.arpa/ --server=/22.172.in-addr.arpa/ --server=/23.172.in-addr.arpa/ --server=/24.172.in-addr.arpa/ --server=/25.172.in-addr.arpa/ --server=/26.172.in-addr.arpa/ --server=/27.172.in-addr.arpa/ --server=/28.172.in-addr.arpa/ --server=/29.172.in-addr.arpa/ --server=/30.172.in-addr.arpa/ --server=/31.172.in-addr.arpa/ --dns-forward-max=5000 --cache-size=10000 --local-ttl=1"
while true; do
    # Look for a running dnsmasq that was started with this command line.
    DNSPID=$(ps axww | fgrep '/usr/local/sbin/dnsmasq --all-servers' | grep -v grep | awk '{print $1}')
    # Not running: start it again and log the restart time.
    if [ -z "$DNSPID" ]; then $DNSCMDLINE; date >>/tmp/dnsrestart; fi
    sleep 2
done
-
Alternatively, can someone send me the AMD64 binary of dnsmasq from 2.2.2 so I can try that? I can try to extract from the install media when I'm at work tomorrow but this might save me a bit of time.
Thanks!
-
So I had some time to my lonesome at the local watering hole and was able to extract the 2.2.2 dnsmasq binary from the install media via a loop mount (THANK YOU PFSENSE FOR USING A REGULAR DIRECTORY STRUCTURE!!) and have swapped that in for the time being.
In the last 10 minutes things are looking much better (no restarts). I don't know what changed with the recent dnsmasq build but something, at least in my case, is a bit funky. I still have the old binary saved should someone want to delve further but the 2.2.2 build seems to be working much better with the 2.2.3 release.
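If you swap binaries like this, it helps to confirm which build is actually in place. Below is a minimal sketch that compares the MD5 digests of two files. It assumes FreeBSD's md5 utility (with its -q flag) is available and falls back to GNU md5sum elsewhere. The function names file_md5 and same_binary are made up for illustration.

```shell
#!/bin/sh
# Print the MD5 digest of a file.
# Uses FreeBSD md5(1) when present, otherwise GNU md5sum.
# file_md5 and same_binary are hypothetical helper names.
file_md5() {
    if command -v md5 >/dev/null 2>&1; then
        md5 -q "$1"
    else
        md5sum "$1" | awk '{print $1}'
    fi
}

# Succeed (exit status 0) only if both files have the same digest.
same_binary() {
    [ "$(file_md5 "$1")" = "$(file_md5 "$2")" ]
}
```

For example, `same_binary /usr/local/sbin/dnsmasq /usr/local/sbin/dnsmasq.old || echo "binaries differ"` uses the two paths from the MD5 listing in this post.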
MD5 sigs for the dnsmasq binaries (for those interested):
AMD64 dnsmasq from 2.2.2 (GOOD) -> MD5 (/usr/local/sbin/dnsmasq) = 8e9eb7759989bd2c04c0f7bf6c5bf303
AMD64 dnsmasq from 2.2.3 (ISSUES) -> MD5 (/usr/local/sbin/dnsmasq.old) = 65408562620b5ae48202f28e241706d3
-
em1_vlan1
What is that? Seems the untagged, default VLAN should be em1, not em1_vlan1.
-
em1_vlan1
What is that? Seems the untagged, default VLAN should be em1, not em1_vlan1.
I have VLANs on my internal interface and VLAN1 is my workstation VLAN, so em1_vlan1 is correct (I have several additional VLANs on that interface as well for other development networks, vlan10, vlan20, etc…). There is no traffic over the untagged interface internally.
-
Lots of hardware doesn't support tagging VLAN 1 ... that's what (I think) Derelict is referring to.
-
In my case both of my Netgear switches support it and have through several iterations of pfSense (2.0 to now). In fact one of the Netgear switches requires management via VLAN1 so I'm somewhat stuck there.
In any case, I did try to have dnsmasq listen on all interfaces as well as specific VLAN interfaces when it was flapping yesterday. In both instances I saw the same flapping behavior.
I am happy to note that since reverting the dnsmasq binary back to 2.2.2 I haven't seen a single signal 11 crash. It has stayed remarkably stable on both my primary and secondary firewall.