Updated today to latest snap - very slow

phil.davis

Alix 2D13, 2G CF card
2.1-DEVELOPMENT (i386)
built on Mon Apr 23 08:11:16 EDT 2012
FreeBSD 8.3-RELEASE

I just flashed this current build to a CF card.
"mount -uw /" and "mount -ur /" are both <<1 second straight after a basic wizard setup.
After installing cron, pfblocker, squid and squidguard, "mount -ur /" is 10 to 15 seconds - I guess it is doing some extra thinking about all the files that are open on "/" before switching it to read-only.

I am also wondering about the mount state of "/" immediately after boot. I had assumed it would be mounted read-only, but if I use the webGUI Diagnostics, Command Prompt to do "mount -v" then this is what is displayed:

$ mount -v
/dev/ufs/pfsense0 on / (ufs, local, noatime, synchronous, fsid c050954f24ebab91)
devfs on /dev (devfs, local, fsid 00ff000303000000)
/dev/md0 on /tmp (ufs, local, fsid 8843964f53e8126e)
/dev/md1 on /var (ufs, local, fsid 8843964f5d6ec7d7)
/dev/ufs/cf on /cf (ufs, local, noatime, synchronous, fsid 0758954f63bcd04b)
devfs on /var/dhcpd/dev (devfs, local, fsid 01ff000303000000)

No read-only to be seen.

Same thing if a access the console, option 8 "Shell" and "mount -v".

Is this expected? Maybe the code that does the commands or drops to the shell enables write before giving control to the user command?

markky

@markky:

@markky:

@jimp:

Things are smooth/snappy on my ALIX with a snapshot from today, I've not seen this slowness myself. Saving does take about 6-7 seconds but it always has on ALIX.

Using HTTPS to get to the GUI, everything seems happy.

I have a soekris 4801 (and a couple of 4501's, but not enough ram). Works fine with 2.0.1, but every 2.1 snapshot I've tried in the last week or two exhibits the stall problem.

A number of people have mentioned shell commands like 'ls' hanging etc. If you type ^T (ctrl T) during one of these stalls you'll see that the process is waiting on suspfs, which is one of two functions in kern/vfs_vnops.c (both vn_start_write and vn_start_secondary_write call msleep with the same wmesg string. usually the wait message is unique so that there is no confusion which [tm]sleep we're waiting in. I think this is a bug in itself and the msleep in vn_start_secondary_write should be changed to "suspf2" or something similar to differentiate it ).

Anyway, seeing that processes are stalling in either of these functions leads you back to the mount updates being the cause. If you run /etc/rc.conf_mount_ro from a shell, followed by /etc/rc.conf_mount_rw I see a stall on the second command everytime. ^T reports that the mount process is waiting on biowr.

I put some logging into /etc/inc/config.lib.php (where conf_mount_ro and rw function reside) to determine which of the 4 mount updates was stalling (syslog conveniently timestamps the messages) and it's the read only mount update of /. On my system it's a stall of between 107 and 110 seconds.

I can't reproduce the behaviour using the same image under virtualbox.

The CF card in this 4801 is a cheap card with a brand I had not heard of, I'm going to try a sandisk card when I can find it.

Mark

I forgot to say that this is likely a race condition, probably triggered by the longer flash write time. It sounds to me very similar to
http://www.freebsd.org/cgi/query-pr.cgi?pr=149022
which apparently was seen on normal drives under heavy load. The bug wasn't resolved and was closed due to lack of feedback.
There was a query about whether a fix to the soft-update code fixed it.
Soft-updates aren't enabled on the flash device. However I did try enabling soft-updates just to see if a change in the write timing might work around the issue I was seeing.

Would be much easier to debug if I could get a 2.1 build environment working, but that's the topic of another post…

"show mount" from the kernel debugger would be very interesting.

Mark

Well after way too much messing around, and finally abandoning the prebuilt developer ova virtual image, starting with a fresh install of freebsd 8.3 worked far simpler.

I upgraded my soekris 4801 to my first self built upgrade test image (from build_nano.sh). That worked fine, but still exhibits the stall condition.
Then I tweaked build_nano.sh so that I had a ddb enabled kernel (i.e. call build_embedded_dev_kernel instead of build_embedded_kernel).
This kernel sees the same issues. show mount is indeed very interesting. There are way way more secondary writes happening than I'd expect on a quiescent filesystem. I haven't had time to look at this in depth, but it kind of looks like every open vnode is resulting in an update write of the inode, despite the fact noasync is set… It's been a bit problematic investigating as show mount root produces so much output that it uncovered a bug in the usb serial interface driver I use to to connect to the 4801 (generic prolific usb device, the osx driver seems to not handle overruns very well).

After poking around a little in the kernel debugger, I finally got around to trying the exact same image on a sandisk flash card (by dd'ing from the cheapo card to the sandisk card). Interestingly enough, there were no stalls going from read only to read write when I was using the (higher spec) sandisk flash. I speculated earlier that this seemed like a race condition, this seems to confirm at least that timing is an important factor.

Many have also suggested that there's a difference in behaviour between an upgrade an a fresh install of the flash card. I doubt it actually has any impact, but I'll run that test tomorrow on the cheapo card to eliminate/confirm it.

Mark

jimp

I only had the problem with a full fresh image. I haven't tried a new one yet, but I can write out the exact same CF I had problems with.

@phil.davis:

Alix 2D13, 2G CF card
2.1-DEVELOPMENT (i386)
built on Mon Apr 23 08:11:16 EDT 2012
FreeBSD 8.3-RELEASE

I just flashed this current build to a CF card.
"mount -uw /" and "mount -ur /" are both <<1 second straight after a basic wizard setup.
After installing cron, pfblocker, squid and squidguard, "mount -ur /" is 10 to 15 seconds - I guess it is doing some extra thinking about all the files that are open on "/" before switching it to read-only.

So you're saying that, on the same CF card that was slow with previous images, it's now fast with a new fresh image?

@phil.davis:

I am also wondering about the mount state of "/" immediately after boot. I had assumed it would be mounted read-only, but if I use the webGUI Diagnostics, Command Prompt to do "mount -v" then this is what is displayed:

No read-only to be seen.

Same thing if a access the console, option 8 "Shell" and "mount -v".

Is this expected? Maybe the code that does the commands or drops to the shell enables write before giving control to the user command?

Diag > Command does switch to rw. The shell does not. However, some packages can monkey with what happens there… In particular, after a firmware update that reinstalls packages, often you have to reboot after the update before it goes to ro properly.

markky

@markky:

…

Many have also suggested that there's a difference in behaviour between an upgrade an a fresh install of the flash card. I doubt it actually has any impact, but I'll run that test tomorrow on the cheapo card to eliminate/confirm it.

Mark

As expected, the slow CF card still exhibits stalls in mount rw -> ro irrespective of whether the image is an upgrade or a complete reimage of the flash.

As far as I can remember, my slow CF card can be written at just under 4 MB/s from the dev virtual machine. The sandisk card by comparison writes at over 5MB/s.

Mark

phil.davis

2.1-DEVELOPMENT (i386)
built on Wed Apr 25 15:47:29 EDT 2012
FreeBSD 8.3-RELEASE

Here is some testing to observe what happens with read-only on a fresh 2.1 nanobsd system:
After initial boot, from console shell (option 8), mount -v - "/" and "/cf" are read-only - good.
During the wizard they remain read-only, after clicking "reload" to implement the wizard settings, they go to read-write, but then they stay that way. I tried logging off from the webGUI, still read-write - unexpected but most people will do more configuration at this point and probably reboot anyway.
After another reboot they are read-only again - good.
After defining an alias and applying, they are read-only - good. So a simple change like defining an alias is putting "/cf" back to read-only.

"mount -ur /" and "mount -uw /" are really quick (<<1sec) on this test box with a simple configuration.

phil.davis

I installed cron and pfblocker - the installation of those packages correctly switches to read-write then back to read-only when the installation is done.
Then I installed squid - at the end of the installation, "/" and "/cf" are both left read-write.
After a reboot, "/" and "/cf" are still both read-write.
So squid does something to the system that negates the read-only filesystems.

tix

I would like to add that I have the slow response problem as well on my Firebox x700 (non "e") that will last for about 20 - 30 mins after booting. After that the webif responds a what consider a normal speed.

I've also noticed that on every boot that all filesystems are RW and remain that way unless I remount them RO. However, if I force /cf to be RO then the web interface returns to 'normal' speeds.

I do have the LCDProc-dev package and the OpenVPN Client Export Utility installed but I'm going to uninstall them to see if that corrects the RW filesystem problem at least.

stephenw10

Try testing with the /etc/rc.conf_mount_rw and /etc/rc.conf_mount_ro commands instead of calling mount directly since this is what the package system uses.

Steve

eweri

Hello!

Just installed pfSense-2.1-DEVELOPMENT-4g-i386-nanobsd-20120425-2326.img fresh (no update) on my 8GB Kingston CF with my ALIX board and it is still very slow.
Well it is more like it is sleeping for minutes than does its job until it is sleeping for minutes again.

For example: I am connected to the serial console and the first sleep happens when it tries to configure the WAN interface - boot process hangs for at least 2 minutes. After pfsense is completely up and running I select "Assign interfaces" from the menu and try to assign the interfaces in the right order. Well when I tried to detect interfaces automatically it takes very long, until I get any response.
After every interface is assigned it takes very long "writing configuration" but after that my WAN got no IP-address (should use DHCP).
Than I selected "Reboot system" and once again it takes ages ;-) until the system reboots.

But one thing is better than the snapshots before - now after reboot my WAN-interface got an IP-address from my DHCP server.

Hope this helps to solve this problem, bye,
eweri

moullas

re-flashed today to latest snap… still extremely slow, when re-mounting as ro

Btw, my CF is a Lexar 4GB if it matters.. are there any logs that may shed some light?

jimp

To add a couple more data points:

Enabling TRIM on the slice - no effect, still slow
Enabling softupdates on the slice - no effect, still slow
Running sync;sync;sync; before mounting ro - no effect, still slow
The mount process sticks in biowr state, which is disk i/o
While mount is stuck, the system is somewhat unresponsive - can't run a process or connect with ssh, but does appear to continue routing.

jimp

I went ahead and opened a redmine ticket for this:
https://redmine.pfsense.org/issues/2401

markky

@markky:

@markky:

…

Many have also suggested that there's a difference in behaviour between an upgrade an a fresh install of the flash card. I doubt it actually has any impact, but I'll run that test tomorrow on the cheapo card to eliminate/confirm it.

Mark

As expected, the slow CF card still exhibits stalls in mount rw -> ro irrespective of whether the image is an upgrade or a complete reimage of the flash.

As far as I can remember, my slow CF card can be written at just under 4 MB/s from the dev virtual machine. The sandisk card by comparison writes at over 5MB/s.

Mark

Within half an hour of putting the 4801 in production with a sandisk 8G card instead of the previous slower 4G card, the system started exhibiting "stall" conditions.

Unfortunately, my production placement makes serial access a pain (and hence kernel debugger) so I haven't garnered much since.

I'm at the point where I'm thinking I'll just rebuild the kernel and use unionfs to reduce flash wear and tear and do the occasional sync to either underlaying fs or a persistent overlay.

Mark

markky

@jimp:

I went ahead and opened a redmine ticket for this:
https://redmine.pfsense.org/issues/2401

Here's a bit more detail for what you described. The fact that flushfiles is flushing out so much on a quiescent filesystem is rather suspect (as a counter point, run lsof and look for writeable files).

The next line is a tty interrogation with ^T
load: 0.22 cmd: mount 2123 [biowr] 5.82r 0.02u 0.08s 7% 1044k

This is the alternate ddb entry, "~^b"

~KDB: enter: Break sequence on console

moullas

So…. has this been abandoned or still actively looked into?

Weekend coming up, have time for testing :-D

jimp

It hasn't been forgotten, but I haven't had any new leads to try. Ermal thought it might be something that was recently fixed on FreeBSD 8-STABLE, but I wasn't able to make usable patches out of the commits I found so for me that was a dead end (someone else might give it a try)

Beerman

Any (good) News? :)

databeestje

None yet

tritron

It seems that the bug still exist interface is very slow unless I execute /etc/rc.conf_mount_rw in ssh. So workaround is to run /etc/rc.conf_mount_rw before making any changes

jimp

We're still aware it's an issue, there is an open bug in redmine for it.