Updated today to latest snap - very slow

phil.davis

My test system with 8.3-RELEASE also had Squid and SquidGuard packages running. For some reason, between "Configuring WAN interface" and "Configuring LAN interface", Squid processes would start. Then later the package startup would also start Squid. There is some sort of double-startup of Squid with the current 2.1-DEVELOPMENT and whatever is happening with the PBI installation process - I will look at that later and start another thread for it.

I removed SquidGuard and Squid.

Now "mount -uw /" is instant (as always) and "mount -ur /" is about 2 seconds.

My 8.3-RC2 snapshot didn't have any packages installed, so that seems to be why it is fast to mount read-only.

After adding packages (at least PBIs like Squid/SquidGuard) the time to mount read-only increases substantially.

jimp

The freshly imaged card was indeed much slower, taking ~30 seconds to mount ro for me.

I don't think it's FreeBSD 8.3 in general though, there were other changes made to how the nanobsd images are generated which could be contributing.

markky

@jimp:

Things are smooth/snappy on my ALIX with a snapshot from today, I've not seen this slowness myself. Saving does take about 6-7 seconds but it always has on ALIX.

Using HTTPS to get to the GUI, everything seems happy.

I have a soekris 4801 (and a couple of 4501's, but not enough ram). Works fine with 2.0.1, but every 2.1 snapshot I've tried in the last week or two exhibits the stall problem.

A number of people have mentioned shell commands like 'ls' hanging etc. If you type ^T (ctrl T) during one of these stalls you'll see that the process is waiting on suspfs, which is one of two functions in kern/vfs_vnops.c (both vn_start_write and vn_start_secondary_write call msleep with the same wmesg string. usually the wait message is unique so that there is no confusion which [tm]sleep we're waiting in. I think this is a bug in itself and the msleep in vn_start_secondary_write should be changed to "suspf2" or something similar to differentiate it ).

Anyway, seeing that processes are stalling in either of these functions leads you back to the mount updates being the cause. If you run /etc/rc.conf_mount_ro from a shell, followed by /etc/rc.conf_mount_rw I see a stall on the second command everytime. ^T reports that the mount process is waiting on biowr.

I put some logging into /etc/inc/config.lib.php (where conf_mount_ro and rw function reside) to determine which of the 4 mount updates was stalling (syslog conveniently timestamps the messages) and it's the read only mount update of /. On my system it's a stall of between 107 and 110 seconds.

I can't reproduce the behaviour using the same image under virtualbox.

The CF card in this 4801 is a cheap card with a brand I had not heard of, I'm going to try a sandisk card when I can find it.

Mark

jimp

Recently we changed the builder to use conv=sparse when making image files to save time/space on the builders. Not sure if that's related, but I backed it out today. We'll have to wait and see if the snapshots from tomorrow, when freshly imaged, work better.

I never did have a problem with my card that I had running 2.0.1 and firmware upgraded to 2.1. Only with a fresh image.

moullas

jimp, I guess that means tomorrow when new images are created a full re-flash is in order right?

Can you let us know when we can test? (weekend coming up, good opportunity) :)

thanks !

bardelot

@jimp:

I never did have a problem with my card that I had running 2.0.1 and firmware upgraded to 2.1. Only with a fresh image.

On my system it's exactly the other way around. The fresh image (same snapshot) is much faster. Although it might also be the CF card which I replaced from 1G to 4G. The datarate while downloading a snapshot increased from 20kbps up to 500kbps, and any saving operation only takes a few seconds again.

markky

@markky:

@jimp:

Things are smooth/snappy on my ALIX with a snapshot from today, I've not seen this slowness myself. Saving does take about 6-7 seconds but it always has on ALIX.

Using HTTPS to get to the GUI, everything seems happy.

I have a soekris 4801 (and a couple of 4501's, but not enough ram). Works fine with 2.0.1, but every 2.1 snapshot I've tried in the last week or two exhibits the stall problem.

A number of people have mentioned shell commands like 'ls' hanging etc. If you type ^T (ctrl T) during one of these stalls you'll see that the process is waiting on suspfs, which is one of two functions in kern/vfs_vnops.c (both vn_start_write and vn_start_secondary_write call msleep with the same wmesg string. usually the wait message is unique so that there is no confusion which [tm]sleep we're waiting in. I think this is a bug in itself and the msleep in vn_start_secondary_write should be changed to "suspf2" or something similar to differentiate it ).

Anyway, seeing that processes are stalling in either of these functions leads you back to the mount updates being the cause. If you run /etc/rc.conf_mount_ro from a shell, followed by /etc/rc.conf_mount_rw I see a stall on the second command everytime. ^T reports that the mount process is waiting on biowr.

I put some logging into /etc/inc/config.lib.php (where conf_mount_ro and rw function reside) to determine which of the 4 mount updates was stalling (syslog conveniently timestamps the messages) and it's the read only mount update of /. On my system it's a stall of between 107 and 110 seconds.

I can't reproduce the behaviour using the same image under virtualbox.

The CF card in this 4801 is a cheap card with a brand I had not heard of, I'm going to try a sandisk card when I can find it.

Mark

I forgot to say that this is likely a race condition, probably triggered by the longer flash write time. It sounds to me very similar to
http://www.freebsd.org/cgi/query-pr.cgi?pr=149022
which apparently was seen on normal drives under heavy load. The bug wasn't resolved and was closed due to lack of feedback.
There was a query about whether a fix to the soft-update code fixed it.
Soft-updates aren't enabled on the flash device. However I did try enabling soft-updates just to see if a change in the write timing might work around the issue I was seeing.

Would be much easier to debug if I could get a 2.1 build environment working, but that's the topic of another post…

"show mount" from the kernel debugger would be very interesting.

Mark

bao

@jimp:

Recently we changed the builder to use conv=sparse when making image files to save time/space on the builders. Not sure if that's related, but I backed it out today. We'll have to wait and see if the snapshots from tomorrow, when freshly imaged, work better.

I noticed something different, which may be related.

In the create_nanobsd_diskimage (), a change was made to the zeroing of the nanobsd image:

dd if=/dev/zero of=${IMG} bs=${NANO_SECTS}b
count=0 seek=`expr ${NANO_MEDIASIZE} / ${NANO_SECTS}

That did not really do anything.

Later on, the dd command was used to write the image, with "conv=sparse". I don't think it matters at this time, since the image probably contains a lot of garbage.

We changed it back for our purposes, since we build 8G and 16GB nanobsd images and we can't compress them to reasonable sizes.

dd if=/dev/zero of=${IMG} bs=${NANO_SECTS}b
count=`expr ${NANO_MEDIASIZE} / ${NANO_SECTS}

On another note, the last patch from ermalluci broke the build process in the last couple of days. We made changes by hands to the headers to get it going again, i.e.

From
….....
diff --git a/sys/netinet/ip_carp.c b/sys/netinet/ip_carp.c
index a4890dd..5b5fb19 100644
--- a/sys/netinet/ip_carp.c
+++ b/sys/netinet/ip_carp.c
.........
To
.........
Index: sys/netinet/ip_carp.c

--- sys/netinet/ip_carp.c
+++ sys/netinet/ip_carp.c
.........

databeestje

the count=0 and seek=number means we don't actually write the image out with zeros to disk.

When reading the file it will have the correct size and everything will be returned as 0.

But you don't actually write an entire 4-16GB file. So it's much faster.

podilarius

On another note, the last patch from ermalluci broke the build process in the last couple of days. We made changes by hands to the headers to get it going again, i.e.

The latest updates seems to have fixed the issue. I ran through the patches and they seem to be fine now. I am going to complete a build tonight and I am guessing the build servers will also produce a snapshot sometime later tonight.

moullas

So.. I've updated to snapshot 2.1-DEVELOPMENT (i386) built on Fri Apr 20 22:46:42 EDT 2012, however I did not re-flash, just update from the Webif

Speed is still the same (slow).

Should I do a full re-image to see if this got fixed?

podilarius

I ran mine as an update, but, I had just completed a clean install. I would backup and re-image to to make sure there is nothing holding you back.

phil.davis

Alix 2D13, 2G CF card
2.1-DEVELOPMENT (i386)
built on Mon Apr 23 08:11:16 EDT 2012
FreeBSD 8.3-RELEASE

I just flashed this current build to a CF card.
"mount -uw /" and "mount -ur /" are both <<1 second straight after a basic wizard setup.
After installing cron, pfblocker, squid and squidguard, "mount -ur /" is 10 to 15 seconds - I guess it is doing some extra thinking about all the files that are open on "/" before switching it to read-only.

I am also wondering about the mount state of "/" immediately after boot. I had assumed it would be mounted read-only, but if I use the webGUI Diagnostics, Command Prompt to do "mount -v" then this is what is displayed:

$ mount -v
/dev/ufs/pfsense0 on / (ufs, local, noatime, synchronous, fsid c050954f24ebab91)
devfs on /dev (devfs, local, fsid 00ff000303000000)
/dev/md0 on /tmp (ufs, local, fsid 8843964f53e8126e)
/dev/md1 on /var (ufs, local, fsid 8843964f5d6ec7d7)
/dev/ufs/cf on /cf (ufs, local, noatime, synchronous, fsid 0758954f63bcd04b)
devfs on /var/dhcpd/dev (devfs, local, fsid 01ff000303000000)

No read-only to be seen.

Same thing if a access the console, option 8 "Shell" and "mount -v".

Is this expected? Maybe the code that does the commands or drops to the shell enables write before giving control to the user command?

markky

@markky:

@markky:

@jimp:

Things are smooth/snappy on my ALIX with a snapshot from today, I've not seen this slowness myself. Saving does take about 6-7 seconds but it always has on ALIX.

Using HTTPS to get to the GUI, everything seems happy.

I have a soekris 4801 (and a couple of 4501's, but not enough ram). Works fine with 2.0.1, but every 2.1 snapshot I've tried in the last week or two exhibits the stall problem.

A number of people have mentioned shell commands like 'ls' hanging etc. If you type ^T (ctrl T) during one of these stalls you'll see that the process is waiting on suspfs, which is one of two functions in kern/vfs_vnops.c (both vn_start_write and vn_start_secondary_write call msleep with the same wmesg string. usually the wait message is unique so that there is no confusion which [tm]sleep we're waiting in. I think this is a bug in itself and the msleep in vn_start_secondary_write should be changed to "suspf2" or something similar to differentiate it ).

Anyway, seeing that processes are stalling in either of these functions leads you back to the mount updates being the cause. If you run /etc/rc.conf_mount_ro from a shell, followed by /etc/rc.conf_mount_rw I see a stall on the second command everytime. ^T reports that the mount process is waiting on biowr.

I put some logging into /etc/inc/config.lib.php (where conf_mount_ro and rw function reside) to determine which of the 4 mount updates was stalling (syslog conveniently timestamps the messages) and it's the read only mount update of /. On my system it's a stall of between 107 and 110 seconds.

I can't reproduce the behaviour using the same image under virtualbox.

The CF card in this 4801 is a cheap card with a brand I had not heard of, I'm going to try a sandisk card when I can find it.

Mark

I forgot to say that this is likely a race condition, probably triggered by the longer flash write time. It sounds to me very similar to
http://www.freebsd.org/cgi/query-pr.cgi?pr=149022
which apparently was seen on normal drives under heavy load. The bug wasn't resolved and was closed due to lack of feedback.
There was a query about whether a fix to the soft-update code fixed it.
Soft-updates aren't enabled on the flash device. However I did try enabling soft-updates just to see if a change in the write timing might work around the issue I was seeing.

Would be much easier to debug if I could get a 2.1 build environment working, but that's the topic of another post…

"show mount" from the kernel debugger would be very interesting.

Mark

Well after way too much messing around, and finally abandoning the prebuilt developer ova virtual image, starting with a fresh install of freebsd 8.3 worked far simpler.

I upgraded my soekris 4801 to my first self built upgrade test image (from build_nano.sh). That worked fine, but still exhibits the stall condition.
Then I tweaked build_nano.sh so that I had a ddb enabled kernel (i.e. call build_embedded_dev_kernel instead of build_embedded_kernel).
This kernel sees the same issues. show mount is indeed very interesting. There are way way more secondary writes happening than I'd expect on a quiescent filesystem. I haven't had time to look at this in depth, but it kind of looks like every open vnode is resulting in an update write of the inode, despite the fact noasync is set… It's been a bit problematic investigating as show mount root produces so much output that it uncovered a bug in the usb serial interface driver I use to to connect to the 4801 (generic prolific usb device, the osx driver seems to not handle overruns very well).

After poking around a little in the kernel debugger, I finally got around to trying the exact same image on a sandisk flash card (by dd'ing from the cheapo card to the sandisk card). Interestingly enough, there were no stalls going from read only to read write when I was using the (higher spec) sandisk flash. I speculated earlier that this seemed like a race condition, this seems to confirm at least that timing is an important factor.

Many have also suggested that there's a difference in behaviour between an upgrade an a fresh install of the flash card. I doubt it actually has any impact, but I'll run that test tomorrow on the cheapo card to eliminate/confirm it.

Mark

jimp

I only had the problem with a full fresh image. I haven't tried a new one yet, but I can write out the exact same CF I had problems with.

@phil.davis:

Alix 2D13, 2G CF card
2.1-DEVELOPMENT (i386)
built on Mon Apr 23 08:11:16 EDT 2012
FreeBSD 8.3-RELEASE

I just flashed this current build to a CF card.
"mount -uw /" and "mount -ur /" are both <<1 second straight after a basic wizard setup.
After installing cron, pfblocker, squid and squidguard, "mount -ur /" is 10 to 15 seconds - I guess it is doing some extra thinking about all the files that are open on "/" before switching it to read-only.

So you're saying that, on the same CF card that was slow with previous images, it's now fast with a new fresh image?

@phil.davis:

I am also wondering about the mount state of "/" immediately after boot. I had assumed it would be mounted read-only, but if I use the webGUI Diagnostics, Command Prompt to do "mount -v" then this is what is displayed:

No read-only to be seen.

Same thing if a access the console, option 8 "Shell" and "mount -v".

Is this expected? Maybe the code that does the commands or drops to the shell enables write before giving control to the user command?

Diag > Command does switch to rw. The shell does not. However, some packages can monkey with what happens there… In particular, after a firmware update that reinstalls packages, often you have to reboot after the update before it goes to ro properly.

markky

@markky:

…

Many have also suggested that there's a difference in behaviour between an upgrade an a fresh install of the flash card. I doubt it actually has any impact, but I'll run that test tomorrow on the cheapo card to eliminate/confirm it.

Mark

As expected, the slow CF card still exhibits stalls in mount rw -> ro irrespective of whether the image is an upgrade or a complete reimage of the flash.

As far as I can remember, my slow CF card can be written at just under 4 MB/s from the dev virtual machine. The sandisk card by comparison writes at over 5MB/s.

Mark

phil.davis

2.1-DEVELOPMENT (i386)
built on Wed Apr 25 15:47:29 EDT 2012
FreeBSD 8.3-RELEASE

Here is some testing to observe what happens with read-only on a fresh 2.1 nanobsd system:
After initial boot, from console shell (option 8), mount -v - "/" and "/cf" are read-only - good.
During the wizard they remain read-only, after clicking "reload" to implement the wizard settings, they go to read-write, but then they stay that way. I tried logging off from the webGUI, still read-write - unexpected but most people will do more configuration at this point and probably reboot anyway.
After another reboot they are read-only again - good.
After defining an alias and applying, they are read-only - good. So a simple change like defining an alias is putting "/cf" back to read-only.

"mount -ur /" and "mount -uw /" are really quick (<<1sec) on this test box with a simple configuration.

phil.davis

I installed cron and pfblocker - the installation of those packages correctly switches to read-write then back to read-only when the installation is done.
Then I installed squid - at the end of the installation, "/" and "/cf" are both left read-write.
After a reboot, "/" and "/cf" are still both read-write.
So squid does something to the system that negates the read-only filesystems.

tix

I would like to add that I have the slow response problem as well on my Firebox x700 (non "e") that will last for about 20 - 30 mins after booting. After that the webif responds a what consider a normal speed.

I've also noticed that on every boot that all filesystems are RW and remain that way unless I remount them RO. However, if I force /cf to be RO then the web interface returns to 'normal' speeds.

I do have the LCDProc-dev package and the OpenVPN Client Export Utility installed but I'm going to uninstall them to see if that corrects the RW filesystem problem at least.

stephenw10

Try testing with the /etc/rc.conf_mount_rw and /etc/rc.conf_mount_ro commands instead of calling mount directly since this is what the package system uses.

Steve

Updated today to latest snap - very slow

From …..... diff --git a/sys/netinet/ip_carp.c b/sys/netinet/ip_carp.c index a4890dd..5b5fb19 100644 --- a/sys/netinet/ip_carp.c +++ b/sys/netinet/ip_carp.c ......... To ......... Index: sys/netinet/ip_carp.c

From
….....
diff --git a/sys/netinet/ip_carp.c b/sys/netinet/ip_carp.c
index a4890dd..5b5fb19 100644
--- a/sys/netinet/ip_carp.c
+++ b/sys/netinet/ip_carp.c
.........
To
.........
Index: sys/netinet/ip_carp.c