Strange WRITE_DMA errors when switching on network port
-
Hi Guys,
I have two pfsense boxes on which CARP was working fine. However I have now changed my switches from two Cisco 2950s to two Cisco 3750Gs that are stacked.
I have one interface that carries all our VLANs. The first firewall and first switch run fine; the port is a dot1q trunk.
However, as soon as I turn on the switchport on sw2 (connected to fw2), I see the following errors:
Jun 15 18:18:04 kernel: ad0: FAILURE - WRITE_DMA status=51 <ready,dsc,error>error=4 <aborted>dma=0x06 LBA=1129359
Jun 15 18:18:04 kernel: g_vfs_done():ad0s1a[WRITE(offset=578191360, length=16384)]error = 5
Jun 15 18:18:04 kernel: ad0: FAILURE - WRITE_DMA status=51 <ready,dsc,error>error=4 <aborted>dma=0x06 LBA=1505711
Jun 15 18:18:04 kernel: g_vfs_done():ad0s1a[WRITE(offset=770883584, length=16384)]error = 5
Jun 15 18:18:05 kernel: ad0: FAILURE - WRITE_DMA status=51 <ready,dsc,error>error=4 <aborted>dma=0x06 LBA=2258575
Jun 15 18:18:05 kernel: g_vfs_done():ad0s1a[WRITE(offset=1156349952, length=16384)]error = 5
Jun 15 18:18:11 kernel: ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=3387471
Jun 15 18:18:11 kernel: ad0: FAILURE - WRITE_DMA status=51 <ready,dsc,error>error=4 <aborted>dma=0x06 LBA=5645583
Jun 15 18:18:11 kernel: g_vfs_done():ad0s1a[WRITE(offset=2890498048, length=16384)]error = 5
Jun 15 18:18:11 kernel: ad0: FAILURE - WRITE_DMA status=51 <ready,dsc,error>error=4 <aborted>dma=0x06 LBA=6398287
Jun 15 18:18:11 kernel: g_vfs_done():ad0s1a[WRITE(offset=3275882496, length=16384)]error = 5
Jun 15 18:18:11 kernel: ad0: FAILURE - WRITE_DMA status=51 <ready,dsc,error>error=4 <aborted>dma=0x06 LBA=7151599
Jun 15 18:18:11 kernel: g_vfs_done():ad0s1a[WRITE(offset=3661578240, length=16384)]error = 5
Jun 15 18:18:12 kernel: ad0: FAILURE - WRITE_DMA status=51 <ready,dsc,error>error=4 <aborted>dma=0x06 LBA=7527343
Jun 15 18:18:12 kernel: g_vfs_done():ad0s1a[WRITE(offset=3853959168, length=16384)]error = 5
Jun 15 18:18:18 kernel: ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=376655
Once I turn the switchport off the errors disappear, but obviously I can't access my VLANs.
I have tried everything I can think of, including reinstalling pfsense, and even creating a whole new config.
Any ideas what is causing this?
Many Thanks,
-
Your HDD controller might be sharing an IRQ with that port. You can check with:
vmstat -i
Run it at a shell prompt or under Diagnostics > Command.
You might have to change some options in the BIOS to fix that, or shut off DMA for the hard drive.
Usually that error is indicative of a hard drive, cable, or controller error (typically one of them is faulty) but if it only happens when you enable something else, there may be hope.
-
Hi Jimp,
Thanks for that information, I have disabled DMA and the exact same thing is happening. When I enable the switchport the errors appear.
I don't think it is sharing an IRQ either. The HDD is a 4GB CF card connected with a SATA-CF converter, and it had been working fine until we upgraded to 1.2.3 and changed our switches.
Do you have any idea what else could be causing the problem?
Output from vmstat -i:
$ vmstat -i
interrupt                 total       rate
irq1: atkbd0                 12          0
irq14: ata0                2539         10
irq16: re3 uhci3             35          0
irq18: re1 uhci2            521          2
irq19: re2 uhci1           3634         14
irq23: uhci0 ehci0            1          0
cpu0: timer              500965       1995
irq256: re0                5466         21
cpu1: timer              500908       1995
cpu3: timer              500908       1995
cpu2: timer              500909       1995
Total                   2015898       8031
Many Thanks,
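A quick way to scan vmstat -i output for shared IRQs is a small script like the one below. This is only a rough sketch: it assumes the layout shown above (device names after the irqN: label, followed by the total and rate columns), and shared_irqs is a made-up helper name, not something that ships with pfSense or FreeBSD.

```python
import re

# Assumed line layout: "irqN: dev [dev ...] total rate"
IRQ_LINE = re.compile(r"^(irq\d+):\s+(.*?)\s+(\d+)\s+(\d+)\s*$")


def shared_irqs(vmstat_output):
    """Return {irq: [devices]} for every IRQ line that lists more than one device."""
    shared = {}
    for line in vmstat_output.splitlines():
        match = IRQ_LINE.match(line.strip())
        if not match:
            continue  # header, cpu timer lines, Total line, etc.
        devices = match.group(2).split()
        if len(devices) > 1:
            shared[match.group(1)] = devices
    return shared


# Sample taken from the vmstat -i output posted in this thread.
SAMPLE = """\
interrupt total rate
irq1: atkbd0 12 0
irq14: ata0 2539 10
irq16: re3 uhci3 35 0
irq18: re1 uhci2 521 2
irq19: re2 uhci1 3634 14
irq23: uhci0 ehci0 1 0
cpu0: timer 500965 1995
irq256: re0 5466 21
Total 2015898 8031
"""

if __name__ == "__main__":
    for irq, devices in shared_irqs(SAMPLE).items():
        print(irq, "is shared by", ", ".join(devices))
```

On the output above it flags irq16, irq18, irq19 and irq23 (each a re NIC or USB controller pairing), while ata0 sits alone on irq14, so the disk controller does not appear to share an interrupt with any network port.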
-
Try editing /boot/loader.conf and adding this line:
hw.ata.ata_dma=0
And then reboot
CF converters are not known for their great DMA compatibility…
-
Hi Jimp,
I tried that. It did reduce the errors, but they were still there. As a last-ditch attempt I put in a 160GB SATA disk I had lying around, and that worked perfectly. So it must have been something strange with the converter.
The strange thing is, I have the exact same setup on my primary firewall, with a 4GB CF card and converter. I upgraded that to 1.2.3 and it worked without any problems. So I am not sure why I had issues with the backup firewall; it would be a very strange coincidence if there was a hardware failure at the same time as the software upgrade.
Either way things are back up and running, thanks for your help, much appreciated.