CIFS: Pathetic performance across pfSense
-
I am experiencing severe performance problems with CIFS traffic traversing pfSense. iSCSI traffic is unaffected, as is CIFS traffic that stays on the same subnet.
This is a CIFS performance example on the same subnet:
(root@vm4srvp01:/mnt/win/Images)# dd bs=64k count=1000 if=/dev/zero of=test conv=fdatasync
1000+0 records in
1000+0 records out
65536000 bytes (66 MB) copied, 1.11347 seconds, 58.9 MB/s
(root@vm4srvp01:/mnt/win/Images)# dd bs=64k count=1000 if=/dev/zero of=test conv=fdatasync
1000+0 records in
1000+0 records out
65536000 bytes (66 MB) copied, 1.11088 seconds, 59.0 MB/s
(root@vm4srvp01:/mnt/win/Images)# dd bs=64k count=1000 if=/dev/zero of=test conv=fdatasync
1000+0 records in
1000+0 records out
65536000 bytes (66 MB) copied, 1.13448 seconds, 57.8 MB/s
This is an iSCSI performance example on the same subnet:
(root@vm4srvp01:/vmware)# dd bs=64k count=1000 if=/dev/zero of=test conv=fdatasync
1000+0 records in
1000+0 records out
65536000 bytes (66 MB) copied, 0.937938 seconds, 69.9 MB/s
(root@vm4srvp01:/vmware)# dd bs=64k count=1000 if=/dev/zero of=test conv=fdatasync
1000+0 records in
1000+0 records out
65536000 bytes (66 MB) copied, 0.929954 seconds, 70.5 MB/s
(root@vm4srvp01:/vmware)# dd bs=64k count=1000 if=/dev/zero of=test conv=fdatasync
1000+0 records in
1000+0 records out
65536000 bytes (66 MB) copied, 0.931392 seconds, 70.4 MB/s
This is an iSCSI performance example traversing pfSense:
(root@my1mdbp01:/mnt/db)$ dd bs=64k count=1000 if=/dev/zero of=test conv=fdatasync
1000+0 records in
1000+0 records out
65536000 bytes (66 MB) copied, 0.863001 s, 75.9 MB/s
(root@my1mdbp01:/mnt/db)$ dd bs=64k count=1000 if=/dev/zero of=test conv=fdatasync
1000+0 records in
1000+0 records out
65536000 bytes (66 MB) copied, 0.752081 s, 87.1 MB/s
(root@my1mdbp01:/mnt/db)$ dd bs=64k count=1000 if=/dev/zero of=test conv=fdatasync
1000+0 records in
1000+0 records out
65536000 bytes (66 MB) copied, 0.720176 s, 91.0 MB/s
This is an example of the CIFS issue when traversing pfSense:
(root@my1mdbp01:/mnt/win/Images)$ dd bs=64k count=1000 if=/dev/zero of=test conv=fdatasync
1000+0 records in
1000+0 records out
65536000 bytes (66 MB) copied, 55.074 s, 1.2 MB/s
(root@my1mdbp01:/mnt/win/Images)$ dd bs=64k count=1000 if=/dev/zero of=test conv=fdatasync
1000+0 records in
1000+0 records out
65536000 bytes (66 MB) copied, 55.0722 s, 1.2 MB/s
(root@my1mdbp01:/mnt/win/Images)$ dd bs=64k count=1000 if=/dev/zero of=test conv=fdatasync
1000+0 records in
1000+0 records out
65536000 bytes (66 MB) copied, 55.0715 s, 1.2 MB/s
So this appears to be a protocol problem rather than an infrastructure issue, since there is no bandwidth limitation. The issue was first noticed on two different Windows 7 clients before being tested on the Linux box, so it is not specific to a particular OS or flavor. About the only thing these clients have in common is the pfSense box.
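For what it's worth, a raw TCP throughput test between the two hosts rules out a link-level bottleneck independently of any storage protocol. A minimal sketch using iperf (assuming it's installed on both boxes; hostnames are just mine):

# On the far side of pfSense (server):
iperf -s

# On the near side (client), run for 30 seconds:
iperf -c vm4srvp01 -t 30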
Any clue what might be causing this?
-
So how exactly are you mounting it via CIFS? And is it really CIFS, or SMB, or SMB2?
I would like to duplicate your testing.
-
This is the /etc/fstab entry:
\\MyServer\MyShare /mnt/win cifs user,uid=500,rw,suid,username=MyUser,password=MyPasswd 0 0
I believe it's actually SMB (over port 445). Of course, Windows is far less transparent regarding the protocol it's negotiating.
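One way to confirm the negotiated dialect on the Linux side rather than guessing: the cifs kernel module exposes its active sessions, and a recent enough module also accepts an explicit vers= mount option. A sketch (vers= support depends on your kernel; 1.0 is classic CIFS/SMB1, 2.0 and 2.1 are SMB2 dialects):

# Show what the cifs module negotiated for the current mounts:
cat /proc/fs/cifs/DebugData

# Or pin a dialect at mount time and compare throughput:
mount -t cifs //MyServer/MyShare /mnt/win -o username=MyUser,vers=1.0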
Wireshark reports the following SMB Service Response Time Statistics:
SMB Commands
Index  Procedure        Calls  Min SRT   Max SRT   Avg SRT
47     Write AndX        1000  0.007591  0.058489  0.016606
50     Trans2               4  0.000328  0.025732  0.006738
4      Close                1  0.000463  0.000463  0.000463
5      Flush                1  0.000230  0.000230  0.000230

Transaction 2 Sub-Commands
Index  Procedure        Calls  Min SRT   Max SRT   Avg SRT
8      SET_FILE_INFO        2  0.000328  0.000410  0.000369
5      QUERY_PATH_INFO      1  0.000481  0.000481  0.000481
6      SET_PATH_INFO        1  0.025732  0.025732  0.025732
It also gives the following overall statistics:
Avg. packets/sec   1201.671
Avg. packet size   995 bytes
Bytes              70666546
Avg. bytes/sec     1195960.346
Avg. Mbit/sec      9.5868
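Those tables came from Wireshark's SMB SRT statistics; the same summary can be pulled from a saved capture on the command line with tshark's equivalent tap, if that's easier for anyone reproducing this (capture.pcap is just a placeholder filename):

# SMB Service Response Time statistics from a capture file:
tshark -q -z smb,srt -r capture.pcap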
-
It just occurred to me that I had a traffic shaper enabled, specifically CODELQ. I tried to delete those queues, but after applying the changes I lost all connectivity to the box. I used the console to restore a configuration from before the deletion and then restarted the box. Once I had control back I deleted the queues again, and this time they are gone and the box is still running.
I repeated the CIFS test and the performance problem appears to be resolved. But now the question becomes: why would the traffic shaper cause that?
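In case it helps anyone debugging something similar: with ALTQ shaping active (CODELQ is an ALTQ discipline on pfSense), the queue counters are visible from a shell on the firewall, and the dropped-packet counters show whether the shaper is discarding traffic. A quick check, run while the slow transfer is in progress:

# Verbose ALTQ queue statistics; rising "dropped" counters point at the shaper:
pfctl -v -s queue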