Dnsmasq dying regularly after upgrade from 2.2.2 to 2.2.3



  • I just upgraded my office firewalls from 2.2.2 to 2.2.3, mainly for the OpenSSL fixes. The upgrade itself went through without issue, but dnsmasq now dies so frequently that I had to write a restart script that checks for its PID every 2 seconds and restarts it if it is missing. Here's what I'm seeing in my dmesg (note this is from a restart roughly 2 hours ago):

    carp: VHID 2@em1_vlan1: BACKUP -> MASTER (preempting a slower master)
    ovpns1: link state changed to DOWN
    ovpns1: link state changed to UP
    pid 68760 (dnsmasq), uid 65534: exited on signal 11
    pid 58617 (dnsmasq), uid 65534: exited on signal 11
    pid 61503 (dnsmasq), uid 65534: exited on signal 11
    arp: 192.168.1.114 moved from 70:56:81:9a:94:25 to b8:78:2e:57:a6:bc on em1_vlan1
    arp: 192.168.1.114 moved from 70:56:81:9a:94:25 to b8:78:2e:57:a6:bc on em1_vlan1
    pid 65112 (dnsmasq), uid 65534: exited on signal 11
    pid 61131 (dnsmasq), uid 65534: exited on signal 11
    pid 64404 (dnsmasq), uid 65534: exited on signal 11
    pid 6542 (dnsmasq), uid 65534: exited on signal 11
    pid 9089 (dnsmasq), uid 65534: exited on signal 11
    pid 67017 (dnsmasq), uid 65534: exited on signal 11
    pid 70543 (dnsmasq), uid 65534: exited on signal 11
    pid 38794 (dnsmasq), uid 65534: exited on signal 11
    pid 41698 (dnsmasq), uid 65534: exited on signal 11
    pid 78629 (dnsmasq), uid 65534: exited on signal 11
    pid 82340 (dnsmasq), uid 65534: exited on signal 11
    pid 86055 (dnsmasq), uid 65534: exited on signal 11
    pid 30145 (dnsmasq), uid 65534: exited on signal 11
    pid 33429 (dnsmasq), uid 65534: exited on signal 11
    pid 36217 (dnsmasq), uid 65534: exited on signal 11
    pid 88993 (dnsmasq), uid 65534: exited on signal 11
    pid 92712 (dnsmasq), uid 65534: exited on signal 11
    pid 61158 (dnsmasq), uid 65534: exited on signal 11
    pid 64693 (dnsmasq), uid 65534: exited on signal 11
    pid 68395 (dnsmasq), uid 65534: exited on signal 11
    pid 71601 (dnsmasq), uid 65534: exited on signal 11
    pid 3580 (dnsmasq), uid 65534: exited on signal 11
    pid 51100 (dnsmasq), uid 65534: exited on signal 11
    pid 54020 (dnsmasq), uid 65534: exited on signal 11
    pid 58547 (dnsmasq), uid 65534: exited on signal 11
    pid 8416 (dnsmasq), uid 65534: exited on signal 11
    pid 12152 (dnsmasq), uid 65534: exited on signal 11
    pid 79594 (dnsmasq), uid 65534: exited on signal 11
    pid 83020 (dnsmasq), uid 65534: exited on signal 11
    pid 86625 (dnsmasq), uid 65534: exited on signal 11
    pid 88256 (dnsmasq), uid 65534: exited on signal 11
    pid 91213 (dnsmasq), uid 65534: exited on signal 11
    pid 69497 (dnsmasq), uid 65534: exited on signal 11
    pid 71816 (dnsmasq), uid 65534: exited on signal 11
    pid 75650 (dnsmasq), uid 65534: exited on signal 11
    pid 18822 (dnsmasq), uid 65534: exited on signal 11
    pid 22268 (dnsmasq), uid 65534: exited on signal 11
    pid 98940 (dnsmasq), uid 65534: exited on signal 11
    pid 1493 (dnsmasq), uid 65534: exited on signal 11
    pid 6053 (dnsmasq), uid 65534: exited on signal 11
    pid 12206 (dnsmasq), uid 65534: exited on signal 11
    pid 14151 (dnsmasq), uid 65534: exited on signal 11
    pid 16539 (dnsmasq), uid 65534: exited on signal 11
    pid 94270 (dnsmasq), uid 65534: exited on signal 11
    pid 96935 (dnsmasq), uid 65534: exited on signal 11
    pid 40958 (dnsmasq), uid 65534: exited on signal 11
    pid 44402 (dnsmasq), uid 65534: exited on signal 11
    pid 95615 (dnsmasq), uid 65534: exited on signal 11
    pid 25564 (dnsmasq), uid 65534: exited on signal 11
    pid 28003 (dnsmasq), uid 65534: exited on signal 11
    pid 35455 (dnsmasq), uid 65534: exited on signal 11
    pid 37584 (dnsmasq), uid 65534: exited on signal 11
    pid 14473 (dnsmasq), uid 65534: exited on signal 11
    pid 17819 (dnsmasq), uid 65534: exited on signal 11
    pid 21431 (dnsmasq), uid 65534: exited on signal 11
    pid 35032 (dnsmasq), uid 65534: exited on signal 11
    pid 68048 (dnsmasq), uid 65534: exited on signal 11
    pid 17880 (dnsmasq), uid 65534: exited on signal 11
    pid 40663 (dnsmasq), uid 65534: exited on signal 11
    pid 50012 (dnsmasq), uid 65534: exited on signal 11
    pid 52857 (dnsmasq), uid 65534: exited on signal 11
    pid 56719 (dnsmasq), uid 65534: exited on signal 11
    pid 25754 (dnsmasq), uid 65534: exited on signal 11
    pid 30742 (dnsmasq), uid 65534: exited on signal 11
    pid 44637 (dnsmasq), uid 65534: exited on signal 11
    pid 47968 (dnsmasq), uid 65534: exited on signal 11
    pid 29731 (dnsmasq), uid 65534: exited on signal 11
    pid 33589 (dnsmasq), uid 65534: exited on signal 11
    pid 35866 (dnsmasq), uid 65534: exited on signal 11
    pid 64803 (dnsmasq), uid 65534: exited on signal 11
    pid 67341 (dnsmasq), uid 65534: exited on signal 11
    pid 71247 (dnsmasq), uid 65534: exited on signal 11
    pid 40116 (dnsmasq), uid 65534: exited on signal 11
    pid 42881 (dnsmasq), uid 65534: exited on signal 11
    pid 47026 (dnsmasq), uid 65534: exited on signal 11
    pid 59087 (dnsmasq), uid 65534: exited on signal 11
    pid 62523 (dnsmasq), uid 65534: exited on signal 11
    pid 42330 (dnsmasq), uid 65534: exited on signal 11
    pid 46109 (dnsmasq), uid 65534: exited on signal 11
    pid 48776 (dnsmasq), uid 65534: exited on signal 11
    pid 85096 (dnsmasq), uid 65534: exited on signal 11
    pid 88684 (dnsmasq), uid 65534: exited on signal 11
    pid 92489 (dnsmasq), uid 65534: exited on signal 11
    pid 62969 (dnsmasq), uid 65534: exited on signal 11
    pid 66189 (dnsmasq), uid 65534: exited on signal 11
    pid 77844 (dnsmasq), uid 65534: exited on signal 11
    pid 81265 (dnsmasq), uid 65534: exited on signal 11
    pid 69485 (dnsmasq), uid 65534: exited on signal 11
    pid 71999 (dnsmasq), uid 65534: exited on signal 11
    pid 9548 (dnsmasq), uid 65534: exited on signal 11
    pid 11804 (dnsmasq), uid 65534: exited on signal 11
    pid 13966 (dnsmasq), uid 65534: exited on signal 11
    pid 54987 (dnsmasq), uid 65534: exited on signal 11
    pid 87116 (dnsmasq), uid 65534: exited on signal 11
    pid 90712 (dnsmasq), uid 65534: exited on signal 11
    pid 99818 (dnsmasq), uid 65534: exited on signal 11
    pid 2930 (dnsmasq), uid 65534: exited on signal 11
    pid 82728 (dnsmasq), uid 65534: exited on signal 11
    pid 85402 (dnsmasq), uid 65534: exited on signal 11
    pid 22866 (dnsmasq), uid 65534: exited on signal 11
    pid 25727 (dnsmasq), uid 65534: exited on signal 11
    pid 68449 (dnsmasq), uid 65534: exited on signal 11
    pid 71363 (dnsmasq), uid 65534: exited on signal 11
    pid 4146 (dnsmasq), uid 65534: exited on signal 11
    pid 10773 (dnsmasq), uid 65534: exited on signal 11
    pid 13907 (dnsmasq), uid 65534: exited on signal 11
    pid 96777 (dnsmasq), uid 65534: exited on signal 11
    pid 502 (dnsmasq), uid 65534: exited on signal 11
    pid 42592 (dnsmasq), uid 65534: exited on signal 11
    pid 47377 (dnsmasq), uid 65534: exited on signal 11
    pid 49599 (dnsmasq), uid 65534: exited on signal 11
    pid 99488 (dnsmasq), uid 65534: exited on signal 11
    pid 3353 (dnsmasq), uid 65534: exited on signal 11
    pid 23311 (dnsmasq), uid 65534: exited on signal 11
    pid 40213 (dnsmasq), uid 65534: exited on signal 11
    pid 44069 (dnsmasq), uid 65534: exited on signal 11
    pid 46908 (dnsmasq), uid 65534: exited on signal 11
    pid 49554 (dnsmasq), uid 65534: exited on signal 11
    pid 25695 (dnsmasq), uid 65534: exited on signal 11
    pid 28149 (dnsmasq), uid 65534: exited on signal 11
    pid 32890 (dnsmasq), uid 65534: exited on signal 11
    pid 66124 (dnsmasq), uid 65534: exited on signal 11
    pid 72324 (dnsmasq), uid 65534: exited on signal 11
    pid 77280 (dnsmasq), uid 65534: exited on signal 11
    pid 81724 (dnsmasq), uid 65534: exited on signal 11
    pid 22626 (dnsmasq), uid 65534: exited on signal 11
    pid 26410 (dnsmasq), uid 65534: exited on signal 11
    pid 63678 (dnsmasq), uid 65534: exited on signal 11
    pid 66056 (dnsmasq), uid 65534: exited on signal 11
    pid 68401 (dnsmasq), uid 65534: exited on signal 11
    pid 50773 (dnsmasq), uid 65534: exited on signal 11
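
    To put a number on an excerpt like this, a minimal sketch follows. The helper name count_dnsmasq_segv is made up for illustration, and it assumes only the kernel message format shown in the log above.

```sh
#!/bin/sh
# Count dnsmasq "exited on signal 11" lines in kernel log text read from stdin.
# count_dnsmasq_segv is a hypothetical helper name, not part of pfSense.
count_dnsmasq_segv() {
    # grep -c prints the number of matching lines; "|| true" keeps the
    # exit status at 0 when there are no matches (grep returns 1 then).
    grep -c '(dnsmasq), uid [0-9]*: exited on signal 11' || true
}
```

    It could be fed from the live kernel buffer, for example with: dmesg | count_dnsmasq_segv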

    My restart script logs the time after each restart, and what I'm seeing ranges from several restarts a minute to one every few minutes:

    Wed Jul  8 23:08:14 UTC 2015
    Wed Jul  8 23:08:18 UTC 2015
    Wed Jul  8 23:08:24 UTC 2015
    Wed Jul  8 23:12:12 UTC 2015
    Wed Jul  8 23:12:16 UTC 2015
    Wed Jul  8 23:14:56 UTC 2015
    Wed Jul  8 23:15:00 UTC 2015
    Wed Jul  8 23:15:47 UTC 2015
    Wed Jul  8 23:15:51 UTC 2015
    Wed Jul  8 23:18:29 UTC 2015
    Wed Jul  8 23:18:34 UTC 2015
    Wed Jul  8 23:22:20 UTC 2015
    Wed Jul  8 23:22:26 UTC 2015
    Wed Jul  8 23:22:31 UTC 2015
    Wed Jul  8 23:25:05 UTC 2015
    Wed Jul  8 23:25:12 UTC 2015
    Wed Jul  8 23:25:16 UTC 2015
    Wed Jul  8 23:25:58 UTC 2015
    Wed Jul  8 23:26:02 UTC 2015
    Wed Jul  8 23:28:38 UTC 2015
    Wed Jul  8 23:28:44 UTC 2015
    Wed Jul  8 23:28:48 UTC 2015
    Wed Jul  8 23:32:35 UTC 2015
    Wed Jul  8 23:32:41 UTC 2015
    Wed Jul  8 23:35:22 UTC 2015
    Wed Jul  8 23:35:26 UTC 2015
    Wed Jul  8 23:35:32 UTC 2015
    Wed Jul  8 23:36:07 UTC 2015
    Wed Jul  8 23:36:13 UTC 2015
    Wed Jul  8 23:38:55 UTC 2015
    Wed Jul  8 23:38:59 UTC 2015
    Wed Jul  8 23:39:05 UTC 2015
    Wed Jul  8 23:42:46 UTC 2015
    Wed Jul  8 23:42:50 UTC 2015
    Wed Jul  8 23:45:37 UTC 2015
    Wed Jul  8 23:45:41 UTC 2015
    Wed Jul  8 23:45:47 UTC 2015
    Wed Jul  8 23:46:17 UTC 2015
    Wed Jul  8 23:46:21 UTC 2015
    Wed Jul  8 23:49:10 UTC 2015
    Wed Jul  8 23:49:14 UTC 2015
    Wed Jul  8 23:49:20 UTC 2015
    Wed Jul  8 23:52:57 UTC 2015
    Wed Jul  8 23:53:01 UTC 2015
    Wed Jul  8 23:53:05 UTC 2015
    Wed Jul  8 23:55:51 UTC 2015
    Wed Jul  8 23:55:56 UTC 2015
    Wed Jul  8 23:56:26 UTC 2015
    Wed Jul  8 23:56:32 UTC 2015
    Wed Jul  8 23:59:23 UTC 2015
    Wed Jul  8 23:59:29 UTC 2015
    Wed Jul  8 23:59:33 UTC 2015
    Thu Jul  9 00:03:12 UTC 2015
    Thu Jul  9 00:03:16 UTC 2015
    Thu Jul  9 00:06:02 UTC 2015
    Thu Jul  9 00:06:06 UTC 2015
    Thu Jul  9 00:06:11 UTC 2015
    Thu Jul  9 00:06:37 UTC 2015
    Thu Jul  9 00:06:43 UTC 2015
    Thu Jul  9 00:09:40 UTC 2015
    Thu Jul  9 00:09:44 UTC 2015
    Thu Jul  9 00:13:21 UTC 2015
    Thu Jul  9 00:13:25 UTC 2015
    Thu Jul  9 00:13:32 UTC 2015
    Thu Jul  9 00:16:16 UTC 2015
    Thu Jul  9 00:16:22 UTC 2015
    Thu Jul  9 00:16:48 UTC 2015
    Thu Jul  9 00:16:52 UTC 2015
    Thu Jul  9 00:19:49 UTC 2015
    Thu Jul  9 00:19:55 UTC 2015
    Thu Jul  9 00:19:59 UTC 2015
    Thu Jul  9 00:23:36 UTC 2015
    Thu Jul  9 00:23:40 UTC 2015
    Thu Jul  9 00:23:46 UTC 2015
    Thu Jul  9 00:26:26 UTC 2015
    Thu Jul  9 00:26:30 UTC 2015
    Thu Jul  9 00:26:36 UTC 2015
    Thu Jul  9 00:26:56 UTC 2015
    Thu Jul  9 00:27:03 UTC 2015
    Thu Jul  9 00:30:03 UTC 2015
    Thu Jul  9 00:30:09 UTC 2015
    Thu Jul  9 00:30:13 UTC 2015
    Thu Jul  9 00:33:52 UTC 2015
    Thu Jul  9 00:33:56 UTC 2015
    Thu Jul  9 00:34:02 UTC 2015
    Thu Jul  9 00:36:40 UTC 2015
    Thu Jul  9 00:36:47 UTC 2015
    Thu Jul  9 00:37:07 UTC 2015
    Thu Jul  9 00:37:13 UTC 2015
    Thu Jul  9 00:40:20 UTC 2015
    Thu Jul  9 00:40:24 UTC 2015
    Thu Jul  9 00:44:07 UTC 2015
    Thu Jul  9 00:44:11 UTC 2015
    Thu Jul  9 00:44:15 UTC 2015
    Thu Jul  9 00:46:52 UTC 2015
    Thu Jul  9 00:46:56 UTC 2015
    Thu Jul  9 00:47:02 UTC 2015
    Thu Jul  9 00:47:18 UTC 2015
    Thu Jul  9 00:47:22 UTC 2015
    Thu Jul  9 00:50:29 UTC 2015
    Thu Jul  9 00:50:33 UTC 2015
    Thu Jul  9 00:54:22 UTC 2015
    Thu Jul  9 00:54:26 UTC 2015
    Thu Jul  9 00:57:07 UTC 2015
    Thu Jul  9 00:57:11 UTC 2015
    Thu Jul  9 00:57:15 UTC 2015
    Thu Jul  9 00:57:27 UTC 2015
    Thu Jul  9 00:57:31 UTC 2015
    Thu Jul  9 01:00:38 UTC 2015
    Thu Jul  9 01:00:44 UTC 2015
    Thu Jul  9 01:04:30 UTC 2015
    Thu Jul  9 01:04:36 UTC 2015
    Thu Jul  9 01:04:40 UTC 2015
    Thu Jul  9 01:07:20 UTC 2015
    Thu Jul  9 01:07:27 UTC 2015
    Thu Jul  9 01:07:31 UTC 2015
    Thu Jul  9 01:07:37 UTC 2015
    Thu Jul  9 01:07:43 UTC 2015
    Thu Jul  9 01:07:47 UTC 2015
    Thu Jul  9 01:07:51 UTC 2015
    Thu Jul  9 01:10:50 UTC 2015
    Thu Jul  9 01:10:54 UTC 2015
    Thu Jul  9 01:10:58 UTC 2015
    Thu Jul  9 01:14:45 UTC 2015
    Thu Jul  9 01:14:51 UTC 2015
    Thu Jul  9 01:14:55 UTC 2015
    Thu Jul  9 01:15:02 UTC 2015
    Thu Jul  9 01:17:35 UTC 2015
    Thu Jul  9 01:17:42 UTC 2015
    Thu Jul  9 01:17:58 UTC 2015
    Thu Jul  9 01:18:02 UTC 2015
    Thu Jul  9 01:18:06 UTC 2015
    Thu Jul  9 01:21:04 UTC 2015
    Thu Jul  9 01:21:09 UTC 2015
    Thu Jul  9 01:25:05 UTC 2015
    Thu Jul  9 01:25:11 UTC 2015
    Thu Jul  9 01:27:48 UTC 2015
    Thu Jul  9 01:27:52 UTC 2015
    Thu Jul  9 01:27:56 UTC 2015
    Thu Jul  9 01:28:12 UTC 2015
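
    A rough way to summarize a log like this is to count restarts per hour. The sketch below assumes the default date(1) output format shown above (weekday, month, day, time, zone, year) and that the lines are in chronological order. The function name restarts_per_hour is invented for the example.

```sh
#!/bin/sh
# Summarize restart timestamps (default date(1) format) read from stdin as
# "<month> <day> <hour>:00 <count>" lines, one per hour with restarts.
# restarts_per_hour is a hypothetical helper name.
restarts_per_hour() {
    awk 'NF >= 6 {
        # $2 = month, $3 = day of month, $4 = HH:MM:SS
        key = $2 " " $3 " " substr($4, 1, 2) ":00"
        if (key != prev) {
            # A new hour has started: emit the total for the previous one.
            if (prev != "") print prev, count
            prev = key
            count = 0
        }
        count++
    }
    END { if (prev != "") print prev, count }'
}
```

    With the restart script from this post, it would be run as: restarts_per_hour < /tmp/dnsrestart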

    Has anyone else seen this? Is there anything I can do besides continue running this script and deal with the outage? I'm happy to provide any debug info I can offer.

    Of note, this firewall is in an HA pair and the secondary has only had 3 restarts in the same time frame. The main difference is that the secondary is second in the DNS search order, so the primary receives far more DNS requests.

    Help?!?

    For reference, my restart script is below:

    #!/bin/sh

    # Full dnsmasq command line used to (re)start the daemon.
    DNSCMDLINE="/usr/local/sbin/dnsmasq --all-servers --server=/10.in-addr.arpa/ --server=/168.192.in-addr.arpa/ --server=/16.172.in-addr.arpa/ --server=/17.172.in-addr.arpa/ --server=/18.172.in-addr.arpa/ --server=/19.172.in-addr.arpa/ --server=/20.172.in-addr.arpa/ --server=/21.172.in-addr.arpa/ --server=/22.172.in-addr.arpa/ --server=/23.172.in-addr.arpa/ --server=/24.172.in-addr.arpa/ --server=/25.172.in-addr.arpa/ --server=/26.172.in-addr.arpa/ --server=/27.172.in-addr.arpa/ --server=/28.172.in-addr.arpa/ --server=/29.172.in-addr.arpa/ --server=/30.172.in-addr.arpa/ --server=/31.172.in-addr.arpa/ --dns-forward-max=5000 --cache-size=10000 --local-ttl=1"

    while true; do
        # PID of the running dnsmasq; empty if it has died.
        DNSPID=$(ps axww | fgrep '/usr/local/sbin/dnsmasq --all-servers' | grep -v grep | awk '{print $1}')

        # No PID found: restart dnsmasq and log the time.
        if [ -z "$DNSPID" ]; then $DNSCMDLINE; date >>/tmp/dnsrestart; fi

        sleep 2
    done



  • Alternatively, can someone send me the AMD64 binary of dnsmasq from 2.2.2 so I can try that? I can try to extract from the install media when I'm at work tomorrow but this might save me a bit of time.

    Thanks!



  • So I had some time to my lonesome at the local watering hole and was able to extract the 2.2.2 dnsmasq binary from the install media via a loop mount (THANK YOU PFSENSE FOR USING A REGULAR DIRECTORY STRUCTURE!!) and have swapped that in for the time being.

    In the last 10 minutes things are looking much better (no restarts). I don't know what changed with the recent dnsmasq build but something, at least in my case, is a bit funky. I still have the old binary saved should someone want to delve further but the 2.2.2 build seems to be working much better with the 2.2.3 release.

    MD5 sigs for the dnsmasq binaries (for those interested):

    AMD64 dnsmasq from 2.2.2 (GOOD) -> MD5 (/usr/local/sbin/dnsmasq) = 8e9eb7759989bd2c04c0f7bf6c5bf303
    AMD64 dnsmasq from 2.2.3 (ISSUES) -> MD5 (/usr/local/sbin/dnsmasq.old) = 65408562620b5ae48202f28e241706d3
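
    For anyone who wants to check which of the two builds is installed on their own box, here is a minimal sketch. The two hashes are the ones listed above. The function names file_md5 and which_dnsmasq are invented for the example, and the md5sum fallback is only there so the same snippet also runs on systems without FreeBSD's md5.

```sh
#!/bin/sh
# Hashes quoted from the post above.
GOOD_MD5="8e9eb7759989bd2c04c0f7bf6c5bf303"   # AMD64 dnsmasq from 2.2.2
BAD_MD5="65408562620b5ae48202f28e241706d3"    # AMD64 dnsmasq from 2.2.3

# Print only the MD5 digest of a file (hypothetical helper).
file_md5() {
    if command -v md5 >/dev/null 2>&1; then
        md5 -q "$1"                      # FreeBSD: -q prints the digest only
    else
        md5sum "$1" | awk '{print $1}'   # fallback for systems with md5sum
    fi
}

# Report which known build a binary matches (hypothetical helper).
which_dnsmasq() {
    case "$(file_md5 "$1")" in
        "$GOOD_MD5") echo "2.2.2 build (good)" ;;
        "$BAD_MD5")  echo "2.2.3 build (issues)" ;;
        *)           echo "unknown build" ;;
    esac
}
```

    Example: which_dnsmasq /usr/local/sbin/dnsmasq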


  • em1_vlan1

    What is that?  Seems the untagged, default VLAN should be em1, not em1_vlan1.



  • @Derelict:

    em1_vlan1

    What is that?  Seems the untagged, default VLAN should be em1, not em1_vlan1.

    I have VLANs on my internal interface and VLAN1 is my workstation VLAN, so em1_vlan1 is correct (I have several additional VLANs on that interface as well for other development networks, vlan10, vlan20, etc…). There is no traffic over the untagged interface internally.



  • Lots of hardware doesn't support tagging VLAN 1 … I think that's what Derelict is referring to.



  • In my case both of my Netgear switches support it and have through several iterations of pfSense (2.0 to now). In fact one of the Netgear switches requires management via VLAN1 so I'm somewhat stuck there.

    In any case, I did try to have dnsmasq listen on all interfaces as well as specific VLAN interfaces when it was flapping yesterday. In both instances I saw the same flapping behavior.

    I am happy to note that since reverting the dnsmasq binary back to 2.2.2 I haven't seen a single signal 11 crash; it has stayed remarkably stable on both my primary and secondary firewalls.