RRD Graphs showing RX traffic on OPT2 interface when none should pass through!



  • Hi PF'ers - first post here.

    We have set up a pfSense firewall on a Soekris net5501. We have WAN, LAN, OPT1 and OPT2. OPT2 is a DMZ LAN that we are using only for iSCSI storage. Apart from DHCP, NTP time sync requests, and the odd software update check for the SAN software, we should not be seeing any traffic actually passing through this port.

    However, every day we are seeing SAN-like traffic, around 30Mb/s, logged in the RRD graphs 'on' OPT2 (selecting OPT2 from the 'Traffic' tab).

    So naturally I think - uh oh - perhaps there is a system on the LAN trying to route iSCSI through the LAN gw to the iSCSI gw @ 100Mb - that's not good. I checked the state table, and no such crossing traffic shows up there. I then did a packet trace. Traffic is detected, but it's all iSCSI traffic local to the iSCSI LAN, so nothing wrong with that.

    When traffic shows up on an interface in the RRD graphs (RX only), is that traffic passing THROUGH the interface and on to somewhere else, or is the interface just 'sniffing' traffic on the LAN somehow? Surely if it were passing through, there would be both TX and RX, no?

    Cheers,

    JD


  • Rebel Alliance Developer Netgate

    It would be traffic that is seen on the interface. Is the traffic the same speed in and out? Or is it only seen in one direction?

    I think, but can't remember 100%, that broadcast traffic is also included in that.

    What kind of switch do you have on OPT2? It might be showing all traffic to all ports like an old-style hub and not really switching. I've heard rumors about some lower-end gigabit gear doing this but haven't seen one myself.



  • Hey thanks for the post…

    The weird thing is that the traffic is only RX - there is nothing TX at all!

    It's almost like this interface is in promiscuous mode, sucking up traffic that is just 'passing' the interface, but not actually crossing it.

    If Broadcast traffic was included, surely that would show up in the packet trace? I am only seeing communication from one IP to another on the same subnet.

    The switch is quite a capable Allied Telesis x600 (Stack). Not low end at all. But I guess it could be a switching configuration thing. It's a very simple config, port based vlans - no tagging, not even layer 3 routes - just layer 2.

    It's not a problem with the RRD graphs, as our SNMP system is also picking up traffic 'on' this interface.

    Any ideas how I might dig a little deeper on this?

    JD


  • Rebel Alliance Developer Netgate

    You are right, if it were broadcast, you'd see it in the packet capture.

    First thing, if you don't see the traffic going anywhere else away from the router (tcpdump/packet capture each other interface), it's probably just being exposed to pfSense on that port.

    If traffic from one local host to another is being exposed to a port that has neither of those IPs (and it isn't broadcast) I'd look at the switch. You might check to see if you enabled something like Private VLANs which might be causing one port to think it needs to send the traffic via its gateway instead of directly.
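
    The check described above (is a captured packet broadcast, multicast, addressed to the firewall, or unicast between two other hosts?) can be sketched as follows. This is only an illustration: the firewall address 192.168.200.1 and the /24 mask are assumptions, since the thread does not state either.

```python
import ipaddress

def classify_destination(dst_ip, local_ips, network):
    """Say why a packet with this destination IP could appear on the port."""
    dst = ipaddress.ip_address(dst_ip)
    net = ipaddress.ip_network(network)
    if dst.is_multicast:
        return "multicast"
    if dst == net.broadcast_address or str(dst) == "255.255.255.255":
        return "broadcast"
    if dst_ip in local_ips:
        return "addressed to this host"
    # Unicast between two other hosts: a switch should not deliver this here.
    return "foreign unicast (check the switch)"

# Hypothetical firewall address and subnet mask (not given in the thread).
firewall_ips = {"192.168.200.1"}
subnet = "192.168.200.0/24"

print(classify_destination("224.0.0.18", firewall_ips, subnet))
print(classify_destination("192.168.200.36", firewall_ips, subnet))
```

    Anything reported as foreign unicast is traffic the switch would normally deliver only to the destination host's port.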



  • Ah - I forgot to mention…

    I have 8 CARP VIPs on this setup - do you think that might be it?

    1. There is a ton of multicast traffic being detected on the switch
    2. On the firewall I see (after only a few seconds of packet capture):

    19:48:40.044593 IP 192.168.200.2 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 0, authtype none, intvl 1s, length 36
    19:48:41.054594 IP 192.168.200.2 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 0, authtype none, intvl 1s, length 36
    19:48:42.064731 IP 192.168.200.2 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 0, authtype none, intvl 1s, length 36
    19:48:43.074625 IP 192.168.200.2 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 0, authtype none, intvl 1s, length 36
    19:48:44.084625 IP 192.168.200.2 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 0, authtype none, intvl 1s, length 36
    19:48:45.094641 IP 192.168.200.2 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 0, authtype none, intvl 1s, length 36
    19:48:46.104905 IP 192.168.200.2 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 0, authtype none, intvl 1s, length 36
    19:48:47.114649 IP 192.168.200.2 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 0, authtype none, intvl 1s, length 36
    19:48:48.124657 IP 192.168.200.2 > 224.0.0.18: VRRPv2, Advertisement, vrid 1, prio 0, authtype none, intvl 1s, length 36

    This is all RX traffic not TX.

    So can we conclude that all this traffic is multicast and CARP related?

    Is this amount of traffic normal?

    JD


  • Rebel Alliance Developer Netgate

    CARP traffic should not reach 30Mb/s. It does a heartbeat on the line but it's a very small packet and only once per second.
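
    A rough back-of-the-envelope check of that claim, as a sketch. It assumes the "length 36" in the capture is the advertisement payload, a 20-byte IPv4 header, a 14-byte untagged Ethernet header (FCS not counted), and that each of the 8 VIPs sends one advertisement per second.

```python
VRRP_PAYLOAD_BYTES = 36      # "length 36" in the capture above
IP_HEADER_BYTES = 20         # assumption: IPv4 header without options
ETHERNET_HEADER_BYTES = 14   # assumption: untagged Ethernet II, FCS ignored

def carp_bits_per_second(vips, adverts_per_second=1):
    """Estimate the bit rate of CARP advertisements for a number of VIPs."""
    frame_bytes = VRRP_PAYLOAD_BYTES + IP_HEADER_BYTES + ETHERNET_HEADER_BYTES
    return vips * adverts_per_second * frame_bytes * 8

rate = carp_bits_per_second(8)
print(rate)  # 4480 bits per second
print(rate / 30_000_000)  # fraction of the observed 30 Mb/s
```

    Under these assumptions the heartbeats account for a few kilobits per second, several orders of magnitude below the 30Mb/s on the graph.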



    OK - this issue is officially completely WEIRD! I have lots more info on this one now:

    1. RRD graphs show that this rogue bandwidth event is cyclical and regular: 408 seconds of 'quiet' where there is only VRRP, then 180 seconds of SAN traffic. This is why I wasn't picking anything up in the packet trace.

    2. When I run the packet trace, the same two devices show up with the data passing in the same direction between them. One server is the SAN (HP server), the other is a web server (Supermicro). There are other servers on this iSCSI LAN, but for some reason the only one that ever shows up in the traces is the web server. The traces look like this:

    11:35:16.620787 IP (tos 0x0, ttl 64, id 64490, offset 0, flags [DF], proto TCP (6), length 40) 192.168.200.100.3260 > 192.168.200.36.1025: ., cksum 0x4152 (correct), ack 4212662283 win 65520
    11:35:16.620847 IP (tos 0x0, ttl 64, id 64491, offset 0, flags [DF], proto TCP (6), length 40) 192.168.200.100.3260 > 192.168.200.36.1025: ., cksum 0x4122 (correct), ack 49 win 65520
    11:35:16.620907 IP (tos 0x0, ttl 64, id 64492, offset 0, flags [DF], proto TCP (6), length 40) 192.168.200.100.3260 > 192.168.200.36.1025: ., cksum 0x40f2 (correct), ack 97 win 65520

    Where 192.168.200.100 is the iSCSI NIC on the SAN, and 192.168.200.36 is the iSCSI NIC on the Supermicro.
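
    To confirm that only one direction between these two hosts ever appears, the capture text can be tallied per source/destination pair. A minimal sketch, assuming tcpdump text output in the format shown above; the helper name `tally_flows` is made up for illustration, and lines without port numbers (such as the VRRP advertisements) are simply skipped.

```python
import re
from collections import Counter

# Matches "a.b.c.d.port > e.f.g.h.port:" as printed by tcpdump for TCP/UDP.
FLOW_RE = re.compile(
    r"(\d+\.\d+\.\d+\.\d+)\.(\d+) > (\d+\.\d+\.\d+\.\d+)\.(\d+):"
)

def tally_flows(lines):
    """Count packets per (source IP, destination IP) pair."""
    flows = Counter()
    for line in lines:
        match = FLOW_RE.search(line)
        if match:
            src, _sport, dst, _dport = match.groups()
            flows[(src, dst)] += 1
    return flows

sample = [
    "11:35:16.620787 IP (tos 0x0, ttl 64, id 64490, offset 0, flags [DF], "
    "proto TCP (6), length 40) 192.168.200.100.3260 > 192.168.200.36.1025: "
    "., cksum 0x4152 (correct), ack 4212662283 win 65520",
]
for (src, dst), count in tally_flows(sample).items():
    print(src, "->", dst, count)
```

    If the reverse pair (192.168.200.36 to 192.168.200.100) never shows up in the tally, the port is only seeing one half of the conversation.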

    Opening the capture in Wireshark, it all looks OK to me. I see the MAC of the HP as the source, and the MAC of the Supermicro as the destination. But like I said, never the other way around, and never with any other servers!

    In the three minute data window I am seeing full on iSCSI traffic - all the traffic that is destined for the Webserver, that has a gigabit NIC. So we have 80Mb/s showing at one point.
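
    The cyclical pattern mentioned earlier (about 408 seconds of quiet, then about 180 seconds of traffic) could be pulled out of rate samples with something like the sketch below. The sample format and the threshold are assumptions; the data shown is synthetic, not taken from the actual graphs.

```python
def find_bursts(samples, threshold_bps):
    """Return (start, end) timestamps for runs of samples above the threshold.

    samples: (timestamp_seconds, bits_per_second) pairs, sorted by time.
    """
    bursts = []
    start = None
    last = None
    for timestamp, rate in samples:
        if rate > threshold_bps:
            if start is None:
                start = timestamp
            last = timestamp
        elif start is not None:
            bursts.append((start, last))
            start = None
    if start is not None:
        bursts.append((start, last))
    return bursts

# Synthetic samples, one per minute, for illustration only.
samples = [(0, 100), (60, 100), (120, 30e6), (180, 30e6), (240, 100), (300, 30e6)]
print(find_bursts(samples, threshold_bps=1e6))  # [(120, 180), (300, 300)]
```

    The spacing between successive burst start times gives the period, which can then be compared against timers on the switch or scheduled jobs on the servers.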

    I am going to take a look at the switch port that this server is plugged in to, and compare with another one that doesn't have this issue.

    If you / anyone else has any other suggestions please feed them over!

    JD


  • Rebel Alliance Developer Netgate

    That is definitely weird, but still points to the switch being the culprit somehow, as that traffic should not be exposed to any other port.



  • Yeah that's a switch problem for sure. Make sure your firewall's port isn't set up as a span/monitor port. Aside from that scenario, that traffic should never appear like you're seeing it.



  • Thanks for the reply.

    The weirdest thing to happen with this issue is that suddenly, a few days ago, the behaviour just stopped!

    There was no switch reconfiguration done and nothing changed on the firewall. The only 'incident' that lines up with it is a J2EE application restart on the server in question, at around the time this weirdness stopped.

    I will keep monitoring and have enabled syslog on the device to see if I can catch any events if/when the issue returns….

    Cheers,

    JD

