Strange WRITE_DMA errors when switching on network port
Hi Guys,
I have two pfSense boxes on which CARP was working fine. However, I have now changed my switches from two Cisco 2950s to two stacked Cisco 3750Gs.
I have one interface that we run all VLANs on. The first firewall and first switch run fine; the port is a dot1q trunk.
However, as soon as I turn on the switchport on sw2 (connected to fw2), I see the following errors:
Jun 15 18:18:04 kernel: ad0: FAILURE - WRITE_DMA status=51<READY,DSC,ERROR> error=4<ABORTED> dma=0x06 LBA=1129359
Jun 15 18:18:04 kernel: g_vfs_done():ad0s1a[WRITE(offset=578191360, length=16384)]error = 5
Jun 15 18:18:04 kernel: ad0: FAILURE - WRITE_DMA status=51<READY,DSC,ERROR> error=4<ABORTED> dma=0x06 LBA=1505711
Jun 15 18:18:04 kernel: g_vfs_done():ad0s1a[WRITE(offset=770883584, length=16384)]error = 5
Jun 15 18:18:05 kernel: ad0: FAILURE - WRITE_DMA status=51<READY,DSC,ERROR> error=4<ABORTED> dma=0x06 LBA=2258575
Jun 15 18:18:05 kernel: g_vfs_done():ad0s1a[WRITE(offset=1156349952, length=16384)]error = 5
Jun 15 18:18:11 kernel: ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=3387471
Jun 15 18:18:11 kernel: ad0: FAILURE - WRITE_DMA status=51<READY,DSC,ERROR> error=4<ABORTED> dma=0x06 LBA=5645583
Jun 15 18:18:11 kernel: g_vfs_done():ad0s1a[WRITE(offset=2890498048, length=16384)]error = 5
Jun 15 18:18:11 kernel: ad0: FAILURE - WRITE_DMA status=51<READY,DSC,ERROR> error=4<ABORTED> dma=0x06 LBA=6398287
Jun 15 18:18:11 kernel: g_vfs_done():ad0s1a[WRITE(offset=3275882496, length=16384)]error = 5
Jun 15 18:18:11 kernel: ad0: FAILURE - WRITE_DMA status=51<READY,DSC,ERROR> error=4<ABORTED> dma=0x06 LBA=7151599
Jun 15 18:18:11 kernel: g_vfs_done():ad0s1a[WRITE(offset=3661578240, length=16384)]error = 5
Jun 15 18:18:12 kernel: ad0: FAILURE - WRITE_DMA status=51<READY,DSC,ERROR> error=4<ABORTED> dma=0x06 LBA=7527343
Jun 15 18:18:12 kernel: g_vfs_done():ad0s1a[WRITE(offset=3853959168, length=16384)]error = 5
Jun 15 18:18:18 kernel: ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=376655
Once I turn the switchport off the errors disappear, but obviously I can't access my VLANs.
I have tried everything I can think of, including reinstalling pfSense and even creating a whole new config.
Any ideas what is causing this?
Many Thanks,
Your HDD controller might be sharing an IRQ with that port. You can check by running:
vmstat -i
at a shell prompt or under Diagnostics > Command.
You might have to change some options in the BIOS to fix that, or shut off DMA for the hard drive.
Usually that error is indicative of a hard drive, cable, or controller error (typically one of them is faulty) but if it only happens when you enable something else, there may be hope.
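To make the IRQ check a bit easier to read, here is a rough sketch of a shell helper that prints only the interrupt lines carrying more than one device. It assumes the vmstat -i layout of this FreeBSD generation (an irqN: label, one or more device names, then the total and rate columns); other releases may format the output differently. The function name shared_irqs is made up for illustration.

```sh
# Hypothetical helper: list IRQs that are shared by more than one device.
# Reads `vmstat -i` output on stdin. Assumes each interrupt line looks like
#   irqN: dev [dev ...] total rate
# so a line with more than 4 whitespace-separated fields names 2+ devices.
shared_irqs() {
    awk '/^irq[0-9]+:/ && NF > 4 {
        # Print the irq label and device names, dropping total and rate.
        printf "%s", $1
        for (i = 2; i <= NF - 2; i++) printf " %s", $i
        printf "\n"
    }'
}
```

It would be run as: vmstat -i | shared_irqs. If the line for the disk controller (ata0, for example) shows up in the result together with a network interface, the two share an interrupt.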
Hi Jimp,
Thanks for that information, I have disabled DMA and the exact same thing is happening. When I enable the switchport the errors appear.
I don't think it is sharing an IRQ either. The HDD is a 4GB CF card connected through a SATA-CF converter, and it had been working fine until we upgraded to 1.2.3 and changed our switches.
Do you have any idea what else could be causing the problem?
Output from vmstat -i
$ vmstat -i
interrupt                          total       rate
irq1: atkbd0                          12          0
irq14: ata0                         2539         10
irq16: re3 uhci3                      35          0
irq18: re1 uhci2                     521          2
irq19: re2 uhci1                    3634         14
irq23: uhci0 ehci0                     1          0
cpu0: timer                       500965       1995
irq256: re0                         5466         21
cpu1: timer                       500908       1995
cpu3: timer                       500908       1995
cpu2: timer                       500909       1995
Total                            2015898       8031
Many Thanks,
Try editing /boot/loader.conf and adding this line:
And then reboot
CF converters are not known for their great DMA compatibility…
Hi Jimp,
I tried that. It did reduce the errors, but they were still there. As a last-ditch attempt I put in a 160GB SATA disk I had lying around, and that worked perfectly. So it must have been something strange with the converter.
Strange thing is, I have the exact same setup on my primary firewall, with a 4GB CF card and converter. I upgraded that to 1.2.3 and it worked without any problems, so I am not sure why I had issues with the backup firewall. It would be a very strange coincidence if there was a hardware failure at the same time as the software upgrade.
Either way things are back up and running, thanks for your help, much appreciated.