Update 2.1.5 to 2.2 fails on APU
-
Hello All,
After upgrading to pfSense 2.2 on an APU-based system, it reboots and keeps on rebooting.
Last boot messages:
Configuring firewall......done.
Starting SNMP daemon... done.
Generating RRD graphs...Fatal trap 9: general protection fault while in kernel mode
cpuid = 0; apic id = 00
instruction pointer = 0x20:0xffffffff80b6d4e5
stack pointer = 0x28:0xfffffe0036095840
frame pointer = 0x28:0xfffffe0036095850
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 37401 (bsnmpd)
[ thread pid 37401 tid 100104 ]
Stopped at strlcpy+0x25: movb (%rax),%dl
db:0:kdb.enter.default> textdump set
textdump set
db:0:kdb.enter.default> capture on
db:0:kdb.enter.default> run lockinfo
db:1:lockinfo> show locks
No such command
db:1:locks> show alllocks
No such command
db:1:alllocks> show lockedvnods
Locked vnodes
db:0:kdb.enter.default> show pcpu
cpuid = 0
dynamic pcpu = 0x637700
curthread = 0xfffff8000ab7f490: pid 37401 "bsnmpd"
curpcb = 0xfffffe0036095cc0
fpcurthread = none
idlethread = 0xfffff8000320f000: tid 100003 "idle: cpu0"
curpmap = 0xfffff80003216678
tssp = 0xffffffff8218d010
commontssp = 0xffffffff8218d010
rsp0 = 0xfffffe0036095cc0
gs32p = 0xffffffff8218ea68
ldt = 0xffffffff8218eaa8
tss = 0xffffffff8218ea98
db:0:kdb.enter.default> bt
Tracing pid 37401 tid 100104 td 0xfffff8000ab7f490
strlcpy() at strlcpy+0x25/frame 0xfffffe0036095850
sysctl_rman() at sysctl_rman+0x1e1/frame 0xfffffe0036095930
sysctl_root() at sysctl_root+0x232/frame 0xfffffe0036095980
userland_sysctl() at userland_sysctl+0x1d8/frame 0xfffffe0036095a30
sys___sysctl() at sys___sysctl+0x74/frame 0xfffffe0036095ae0
amd64_syscall() at amd64_syscall+0x351/frame 0xfffffe0036095bf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe0036095bf0
--- syscall (202, FreeBSD ELF64, sys___sysctl),
Any idea how to solve this?
-
Are you running embedded or full install? SD card, mSATA, or HD?
Were you previously using SNMP?
Steve
-
I am running a full install on mSATA with serial console (I believe it uses the embedded kernel).
Yes, SNMP was active in 2.1.5.
-
I did a clean install, all was good. Restored the old configuration file. Same issue again.
I am now going to try to disable snmpd during boot.
-
Hmm, interesting. Looking back there's nothing recent but previous SNMP issues seem to be related to attempting to read a large amount of data at one time.
Steve
Edit: Can't type.
-
So the workaround is:
- Boot into single user mode
- Run fsck on your boot disk
- Mount it read/write
- Rename the bsnmpd file
- Reboot
=> All fine, but SNMP no longer works,
which is critical because I want to use it for monitoring the device.
-
Not seeing that. Do you know if just enabling SNMP is enough, or do you have to poll it before it breaks? Either way, it's not something I can replicate by just enabling it, nor by snmpwalking it repeatedly. What is your monitoring polling on it? Maybe if I hit some specific OID repeatedly the way you're doing, it'll be replicable.
-
At the moment, if I enable bsnmpd the system hits the general protection fault immediately.
I have nothing polling SNMP at the moment. 1 of my next projects is investigating ZenOSS Core monitoring, which requires SNMP; because of this I called it a blocker.
If you test on a clean install, you do not have the issue. I only had the issue when upgrading from 2.1.5 with SNMP active, and when installing 2.2 clean and importing the backup file from 2.1.5 with SNMP enabled.
Disabling it solves the issue, but the crash happens so fast that you first need to do the rename trick in single user mode.
It is not the only issue I have had:
- Unbound had a corrupt root.key file. I was using Unbound on 2.1.5.
- The SSH host keys were corrupt; I had to recreate them.
-
Same problem on my side:
I upgraded an ALIX 2D3 to 2.2-RELEASE (i386) without any issue, but when I tried to upgrade my APU to 2.2-RELEASE (amd64), a kernel crash occurred at boot, apparently in bsnmpd:
     ___
 ___/ f \
/ p \___/ Sense
\___/   \
    \___/
Welcome to pfSense 2.2-RELEASE ...
Creating symlinks......ELF ldconfig path: /lib /usr/lib /usr/lib/compat /usr/local/lib
32-bit compatibility ldconfig path: /usr/lib32
done.
External config loader 1.0 is now starting...
Launching the init system... done.
Initializing...................... done.
Starting device manager (devd)...done.
Loading configuration......done.
Updating configuration...done.
Cleaning backup cache........done.
Setting up extended sysctls...done.
Setting timezone...done.
Configuring loopback interface...done.
Starting syslog...done.
Starting Secure Shell Services...done.
Setting up polling defaults...done.
Setting up interfaces microcode...done.
Configuring loopback interface...done.
Creating wireless clone interfaces...done.
Configuring LAGG interfaces...done.
Configuring VLAN interfaces...done.
Configuring QinQ interfaces...done.
Configuring INTERNET interface...done.
Configuring DMZ interface...done.
Configuring LAN interface...done.
Configuring CARP settings...done.
Syncing OpenVPN settings...done.
Configuring firewall......done.
Starting PFLOG...done.
Setting up gateway monitors...done.
Synchronizing user settings...done.
Starting webConfigurator...done.
Configuring CRON...done.
Starting DNS forwarder...done.
Starting NTP time client...done.
Starting DHCP service...done.
Configuring firewall......done.
Configuring IPsec VPN... done
Starting SNMP daemon... done.
Generating RRD graphs...
Fatal trap 9: general protection fault while in kernel mode
cpuid = 1; apic id = 01
instruction pointer = 0x20:0xffffffff80b6d4e5
stack pointer = 0x28:0xfffffe0094bdd840
frame pointer = 0x28:0xfffffe0094bdd850
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 32455 (bsnmpd)
[ thread pid 32455 tid 100120 ]
Stopped at strlcpy+0x25: movb (%rax),%dl
db:0:kdb.enter.default> textdump set
textdump set
db:0:kdb.enter.default> capture on
db:0:kdb.enter.default> run lockinfo
db:1:lockinfo> show locks
No such command
db:1:locks> show alllocks
No such command
db:1:alllocks> show lockedvnods
Locked vnodes
db:0:kdb.enter.default> show pcpu
cpuid = 1
dynamic pcpu = 0xfffffe00f795a700
curthread = 0xfffff8000cde3000: pid 32455 "bsnmpd"
curpcb = 0xfffffe0094bddcc0
fpcurthread = none
idlethread = 0xfffff80003504920: tid 100004 "idle: cpu1"
curpmap = 0xfffff8000c2a1f38
tssp = 0xffffffff8218d078
commontssp = 0xffffffff8218d078
rsp0 = 0xfffffe0094bddcc0
gs32p = 0xffffffff8218ead0
ldt = 0xffffffff8218eb10
tss = 0xffffffff8218eb00
db:0:kdb.enter.default> bt
Tracing pid 32455 tid 100120 td 0xfffff8000cde3000
strlcpy() at strlcpy+0x25/frame 0xfffffe0094bdd850
sysctl_rman() at sysctl_rman+0x1e1/frame 0xfffffe0094bdd930
sysctl_root() at sysctl_root+0x232/frame 0xfffffe0094bdd980
userland_sysctl() at userland_sysctl+0x1d8/frame 0xfffffe0094bdda30
sys___sysctl() at sys___sysctl+0x74/frame 0xfffffe0094bddae0
amd64_syscall() at amd64_syscall+0x351/frame 0xfffffe0094bddbf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe0094bddbf0
--- syscall (202, FreeBSD ELF64, sys___sysctl), rip = 0x800fb598a, rsp = 0x7fffffffa3c8, rbp = 0x7fffffffa400 ---
db:0:kdb.enter.default> ps
pid ppid pgrp uid state wmesg wchan cmd
32757 307 21 0 R+ CPU 0 bsdtar
32455 1 32455 0 Rs CPU 1 bsnmpd
31844 31745 31844 0 Ss (threaded) charon
100137 S uwait 0xfffff8000c232700 charon
100136 S uwait 0xfffff8000c232800 charon
100135 S uwait 0xfffff8000c232900 charon
100134 S uwait 0xfffff8000c232a00 charon
100133 S uwait 0xfffff8000c233680 charon
100132 S uwait 0xfffff8000c20f000 charon
100131 S uwait 0xfffff8000c1aee80 charon
100130 S uwait 0xfffff8000c769f00 charon
100129 S uwait 0xfffff8000c210200 charon
100128 S select 0xfffff8000cd749c0 charon
100127 S uwait 0xfffff8000c210800 charon
100126 S select 0xfffff8000cd74ac0 charon
100125 S uwait 0xfffff8000c210a00 charon
100124 S accept 0xfffff8000ce015d6 charon
100123 S uwait 0xfffff8000cc5b600 charon
100122 S uwait 0xfffff8000cc5ba80 charon
100113 S sigwait 0xfffff8000c5bc000 charon
31745 1 31745 0 Ss select 0xfffff8001825ec40 starter
28327 21809 21 0 S+ kqread 0xfffff8000c5fa200 ntpdate
27985 1 27985 1002 Ss select 0xfffff8000c1aee40 dhcpd
26844 1 26774 65534 S select 0xfffff8000cd77940 dnsmasq
21809 1 21 0 S+ wait 0xfffff8000cde14c0 sh
20182 1 20094 0 S kqread 0xfffff8000c283900 lighttpd
15922 15629 15629 0 S piperd 0xfffff8000c30d000 rrdtool
15629 1 15629 0 Ss select 0xfffff8000cc71d40 apinger
13334 5924 13334 0 Ss (threaded) sshlockout_pf
100108 S nanslp 0xffffffff81f5df70 sshlockout_pf
100106 S piperd 0xfffff8000c3072e8 sshlockout_pf
12980 1 12980 0 Ss select 0xfffff8000c769ac0 sshd
12902 1 12902 0 Ss select 0xfffff8000c769cc0 inetd
11647 287 287 0 S accept 0xfffff8000c53c066 php-fpm
10063 1 10063 0 Ss select 0xfffff8000c76a140 openvpn
9779 1 9779 0 Ss bpf 0xfffff8000c221800 filterlog
5924 1 5924 0 Ss select 0xfffff8000c76a8c0 syslogd
316 1 316 0 Ss select 0xfffff8000c212dc0 devd
307 21 21 0 S+ piperd 0xfffff8000c3068b8 php
304 302 302 0 S kqread 0xfffff8000c1eed00 check_reload_status
302 1 302 0 Ss kqread 0xfffff8000c248e00 check_reload_status
287 1 287 0 Ss kqread 0xfffff8000c213100 php-fpm
59 0 0 0 DL mdwait 0xfffff8000a0c6800 [md1]
54 0 0 0 DL mdwait 0xfffff8000c209000 [md0]
21 1 21 0 Ss+ pause 0xfffff8000c2b4a28 sh
20 0 0 0 DL syncer 0xffffffff81faef08 [syncer]
19 0 0 0 DL vlruwt 0xfffff8000c224980 [vnlru]
18 0 0 0 DL psleep 0xffffffff81fae104 [bufdaemon]
17 0 0 0 DL pgzero 0xffffffff82100e8c [pagezero]
9 0 0 0 DL pollid 0xffffffff81f5c8f0 [idlepoll]
8 0 0 0 DL psleep 0xffffffff821005c0 [vmdaemon]
7 0 0 0 DL psleep 0xffffffff8218c384 [pagedaemon]
6 0 0 0 DL waiting_ 0xffffffff8217cdf0 [sctp_iterator]
5 0 0 0 DL pftm 0xffffffff80cff710 [pf purge]
16 0 0 0 DL (threaded) [usb]
100063 D - 0xfffffe0000976e18 [usbus6]
100062 D - 0xfffffe0000976dc0 [usbus6]
100061 D - 0xfffffe0000976d68 [usbus6]
100060 D - 0xfffffe0000976d10 [usbus6]
100059 D - 0xfffffe0000981460 [usbus5]
100058 D - 0xfffffe0000981408 [usbus5]
100057 D - 0xfffffe00009813b0 [usbus5]
100056 D - 0xfffffe0000981358 [usbus5]
100055 D - 0xfffffe000096d460 [usbus4]
100054 D - 0xfffffe000096d408 [usbus4]
100053 D - 0xfffffe000096d3b0 [usbus4]
100052 D - 0xfffffe000096d358 [usbus4]
100049 D - 0xfffffe0000962e18 [usbus3]
100048 D - 0xfffffe0000962dc0 [usbus3]
100047 D - 0xfffffe0000962d68 [usbus3]
100046 D - 0xfffffe0000962d10 [usbus3]
100045 D - 0xfffffe0000959460 [usbus2]
100044 D - 0xfffffe0000959408 [usbus2]
100043 D - 0xfffffe00009593b0 [usbus2]
100042 D - 0xfffffe0000959358 [usbus2]
100041 D - 0xfffffe000092ce18 [usbus1]
100040 D - 0xfffffe000092cdc0 [usbus1]
100039 D - 0xfffffe000092cd68 [usbus1]
100038 D - 0xfffffe000092cd10 [usbus1]
100036 D - 0xfffffe0000923460 [usbus0]
100035 D - 0xfffffe0000923408 [usbus0]
100034 D - 0xfffffe00009233b0 [usbus0]
100033 D - 0xfffffe0000923358 [usbus0]
4 0 0 0 DL (threaded) [cam]
100071 D - 0xffffffff81e96ac0 [scanner]
100027 D - 0xffffffff81e96c80 [doneq0]
3 0 0 0 DL crypto_r 0xffffffff820fea90 [crypto returns]
2 0 0 0 DL crypto_w 0xffffffff820fe938 [crypto]
15 0 0 0 DL - 0xffffffff81eb4180 [rand_harvestq]
14 0 0 0 DL (threaded) [geom]
100013 D - 0xffffffff82171560 [g_down]
100012 D - 0xffffffff82171558 [g_up]
100011 D - 0xffffffff82171550 [g_event]
13 0 0 0 DL (threaded) [ng_queue]
100010 D sleep 0xffffffff81e54fc8 [ng_queue1]
100009 D sleep 0xffffffff81e54fc8 [ng_queue0]
12 0 0 0 WL (threaded) [intr]
100079 I [swi1: netisr 1]
100069 I [swi1: pfsync]
100067 I [swi1: pf send]
100064 I [swi0: uart uart]
100051 I [irq15: ata1]
100050 I [irq14: ata0]
100037 I [irq17: ehci0 ehci1+]
100032 I [irq18: ohci0 ohci1*]
100031 I [irq19: ahci0]
100030 I [irq261: re2]
100029 I [irq260: re1]
100028 I [irq259: re0]
100025 I [swi5: fast taskq]
100023 I [swi6: Giant taskq]
100021 I [swi6: task queue]
100008 I [swi3: vm]
100007 I [swi4: clock]
100006 I [swi4: clock]
100005 I [swi1: netisr 0]
11 0 0 0 RL (threaded) [idle]
100004 CanRun [idle: cpu1]
100003 CanRun [idle: cpu0]
1 0 1 0 SLs wait 0xfffff800034fe4c0 [init]
10 0 0 0 DL audit_wo 0xffffffff82183970 [audit]
0 0 0 0 DLs (threaded) [kernel]
100070 D - 0xfffff800035a9000 [CAM taskq]
100065 D - 0xfffff8000a150900 [mca taskq]
100026 D - 0xfffff800035a9200 [kqueue taskq]
100024 D - 0xfffff800035a9700 [thread]
100022 D - 0xfffff800035a9c00 [ffs_trim taskq]
100020 D - 0xfffff800035aa400 [acpi_task_2]
100019 D - 0xfffff800035aa400 [acpi_task_1]
100018 D - 0xfffff800035aa400 [acpi_task_0]
100014 D - 0xfffff800034f2500 [firmware taskq]
100000 D swapin 0xffffffff82171658 [swapper]
-
bsnmp is something that should be replaced with net-snmp. Already done in PCBSD, FreeNAS… for good reasons.
1 of my next projects is investigating ZenOSS Core
That's the monster Java-based bloatware that eats 8+GB of RAM to get barely running? Good luck. I'd rather waste my time with something else.
-
Hello!
I have the same issue.
I'm not so familiar with the command line. Would it be possible to explain the workaround a little bit more? Maybe it would be possible to post the commands?
Thanks a lot!
-
bsnmp is something that should be replaced with net-snmp. Already done in PCBSD, FreeNAS… for good reasons.
1 of my next projects is investigating ZenOSS Core
That's the monster Java-based bloatware that eats 8+GB of RAM to get barely running? Good luck. I'd rather waste my time with something else.
I know, but the customer wants to have it, so if I say no I do not earn my money.
-
@a_r:
Hello!
I have the same issue.
I'm not so familiar with the command line. Would it be possible to explain the workaround a little bit more? Maybe it would be possible to post the commands?
Thanks a lot!
I will try, but some parts differ depending on the hardware used.
- Single user mode boot: select single user in the selection menu.
- Accept the shell presented at the prompt
- Run command: mount
- Look for a line like this: /dev/ada0s1a on / (ufs, local)
- Remember the /dev/xxxxx part
- Run command: fsck -y /dev/xxxxx
- Run the same command again
- Run command: mount -o rw /
- Run command: mv /usr/sbin/bsnmpd /usr/sbin/bsnmpd.old
- Run command: reboot
Now the system will start normally. Do not forget to disable SNMP after the boot.
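The steps above can be sketched as a small script. This is only an illustration of the commands listed in this post, not a tested procedure: the helper names `root_device` and `workaround_commands` are made up for the example, and the script only prints the commands instead of running them. Full paths are used because, as reported later in this thread, /sbin may not be in PATH in the single user shell.

```shell
#!/bin/sh
# Sketch of the single user mode workaround from this thread.
# Nothing here modifies the system: the commands are only printed.

# root_device (hypothetical helper): read `mount` output on stdin and
# print the device that is mounted on /.
root_device() {
    awk '$2 == "on" && $3 == "/" { print $1; exit }'
}

# workaround_commands (hypothetical helper): print the commands from the
# post for the given root device, with full paths.
workaround_commands() {
    dev="$1"
    echo "/sbin/fsck -y $dev"
    echo "/sbin/fsck -y $dev"      # the post runs fsck twice
    echo "/sbin/mount -o rw /"
    echo "/bin/mv /usr/sbin/bsnmpd /usr/sbin/bsnmpd.old"
    echo "/sbin/reboot"
}

# Demo with the sample line from the post. On the real system the input
# would come from: /sbin/mount | root_device
dev=$(echo "/dev/ada0s1a on / (ufs, local)" | root_device)
workaround_commands "$dev"
```

Printing the commands first makes it easy to check that the right device was picked before typing them at the single user prompt.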
-
Thanks for the answer, but I have one problem. If I boot the device over the serial console (Windows / PuTTY), I cannot see the selection menu for single user mode. Does anyone have an idea? Do you see this screen?
-
What serial speed are you using? The BIOS might be coming out at one speed (e.g. 38400) and then FreeBSD/pfSense at another (e.g. 115200). My APU console spits out the menu at 115200.
If it is some other issue, then the menu is started with:
/etc/rc.initial
But I am guessing your issue is just console speed?
-
I use 115200. I also tried 38400, but no luck…
-
Finally I restored my pfsense. Here are the steps I tried:
After I had no chance to boot into single user mode, I used a USB stick with a pfSense live environment. There I was able to boot into single user mode. Then I mounted the original drive and renamed /usr/sbin/bsnmpd. After that I was able to boot, but the web interface always showed a 500 Internal Server Error and no traffic passed through pfSense. In the end I decided to reinstall 2.1.5 and restore my configuration. So I will wait for a fixed 2.2…
-
I got only a "command not found" when running mount, fsck or reboot.
I noticed that /sbin/ was not in the PATH variable, so I had to use the full path for mount, fsck and reboot: /sbin/mount, /sbin/fsck and /sbin/reboot
-
I also tried to rename the bsnmpd.conf - without luck.
I tried different things and in the end, it's working now …
1. I installed 2.2 (memstick-serial-release) using a USB stick, with default values during the installation process.
2. I booted up the pfsense and configured the necessary VLAN, the WAN and the LAN interface via serial console.
3. I imported the config.xml (copied away from the "old" failed installation via usb drive) over the webinterface - NOPE, after a reboot, I had the same problems again
4. I started over (steps 1 & 2) and imported only parts (areas) from the old config.xml via the webinterface - SUCCESS. Just without OpenVPN certificates yet ...
5. I exported the current configuration via the webinterface, inserted ONLY the <ca>- and <cert>-section of the old config.xml into the new config.xml. Then I restored the modified configuration via the webinterface (areas: "all") and rebooted - NOPE, same problems ...
6. I started over (steps 1, 2, 4). First I thought there might be something wrong with the certificates, but: I exported the configuration via the webinterface, and restored EXACTLY THE SAME configuration via the webinterface (areas: "all") - NOPE, same problems ...
7. In the end, I started over (steps 1, 2, 4), downloaded the current configuration via webinterface, inserted the <ca>- and <cert>-sections, copied this config to a USB drive, and replaced the file /cf/conf/config.xml with this file. After a reboot, everything is working, even OpenVPN.
I'm wondering if the WebGUI-Restore-Functionality is broken in 2.2 ... ?
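The file replacement in step 7 could look roughly like the sketch below. Only the target path /cf/conf/config.xml comes from the post; the helper name `replace_config` and the USB mount point in the usage comment are assumptions for illustration, and the backup step is an addition that the post does not mention.

```shell
#!/bin/sh
# replace_config SRC DST (hypothetical helper): keep a backup of DST,
# then overwrite DST with SRC.
replace_config() {
    src="$1"
    dst="$2"
    if [ ! -f "$src" ]; then
        echo "missing source file: $src" >&2
        return 1
    fi
    # Keep the previous file so the change can be undone by hand.
    if [ -f "$dst" ]; then
        cp -p "$dst" "$dst.bak" || return 1
    fi
    cp "$src" "$dst"
}

# Possible usage on the firewall. ASSUMPTION: the USB drive is mounted
# at /mnt/usb; adjust to wherever the edited file really is.
#   replace_config /mnt/usb/config.xml /cf/conf/config.xml && /sbin/reboot
```

Keeping the `.bak` copy means the previous configuration can be put back from the console if the edited file turns out to be broken.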
Export seems to work without problems, as I used it for my current configuration (step 7).
Greetz
lousek
-
If the problem is only related to bsnmpd and for those who can live without snmp, isn't it enough to simply disable snmp before upgrading and then upgrade to 2.2 ?
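For anyone who wants to try that, here is a rough way to check a saved configuration for an enabled SNMP service before upgrading or restoring it. It is a sketch under assumptions that this thread does not confirm: that config.xml holds the SNMP settings in a `<snmpd>` section, that an `<enable>` tag inside it means the service is on, and that each tag sits on its own line. The function name `snmp_enabled` is made up for the example.

```shell
#!/bin/sh
# snmp_enabled FILE (hypothetical helper): succeed if the <snmpd> section
# of FILE contains an <enable> tag.
# ASSUMPTION: this section/tag layout is not confirmed in the thread.
# Limitation: plain text matching, so it expects one tag per line.
snmp_enabled() {
    sed -n '/<snmpd>/,/<\/snmpd>/p' "$1" | grep -q '<enable'
}

# Check the given file, or the live configuration if no file is given.
cfg="${1:-/cf/conf/config.xml}"
if [ -f "$cfg" ] && snmp_enabled "$cfg"; then
    echo "SNMP looks enabled in $cfg; consider disabling it before the upgrade"
fi
```

This only reads the file. Turning SNMP off is still done in the web interface, or by renaming bsnmpd as described earlier in the thread.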