Netgate Discussion Forum

    Repeated ENA TX Timeout on AWS pfSense Instances (Affecting Multiple Firewalls Randomly)

    General pfSense Questions
      rattle007_beat

      Hello everyone,

      We're running multiple pfSense Plus firewalls (24.11-RELEASE, amd64) on AWS EC2, mostly m6i.xlarge and m6i.large instances, and have been seeing recurring network interruptions caused by ENA driver TX timeouts, even during periods of low or moderate traffic.

      At random times (not tied to peak load), the pfSense network interface drops out. After a reboot the system comes back up fine, but the logs consistently show messages like:

      ena0: Found a Tx that wasn't completed on time, qid 2, index 501. 18980 msecs have passed since last cleanup. Missing Tx timeout value 5000 msecs.
      ena0: Found a Tx that wasn't completed on time, qid 2, index 506. 18980 msecs have passed since last cleanup. Missing Tx timeout value 5000 msecs.
      ena0: Found a Tx that wasn't completed on time, qid 2, index 508. 18980 msecs have passed since last cleanup. Missing Tx timeout value 5000 msecs.
      ena0: Found a Tx that wasn't completed on time, qid 2, index 510. 18980 msecs have passed since last cleanup. Missing Tx timeout value 5000 msecs.
      ena0: The number of lost tx completion is above the threshold (244 > 128). Reset the device
      ena0: ena_com_validate_version() [TID:100038]: ENA device version: 0.10
      ena0: ena_com_validate_version() [TID:100038]: ENA controller version: 0.0.1 implementation version 1
      ena0: Trigger reset is on
      ena0: device is going DOWN
      

      Every time this happened, our monitoring tool showed the affected CPU core pegged at maximum utilization.
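To tie the log excerpt above to a specific core, it can help to count the missed Tx completions per queue. The following is a minimal Python sketch, not an official tool: the function name `summarize_missed_tx` and the regular expression are illustrative assumptions built only from the message format shown in the excerpt, and other driver versions may word these lines differently.

```python
import re
from collections import Counter

# Pattern derived from the "Found a Tx that wasn't completed on time" lines
# in the excerpt above (assumption: the message wording is stable).
MISSED_TX = re.compile(
    r"(?P<dev>ena\d+): Found a Tx that wasn't completed on time, "
    r"qid (?P<qid>\d+), index (?P<index>\d+)\. "
    r"(?P<elapsed>\d+) msecs have passed since last cleanup\. "
    r"Missing Tx timeout value (?P<timeout>\d+) msecs\."
)


def summarize_missed_tx(lines):
    """Count missed Tx completions per (device, queue) and record the
    longest time since last cleanup seen for each queue."""
    counts = Counter()
    worst = {}
    for line in lines:
        m = MISSED_TX.search(line)
        if m is None:
            continue  # not a missed-Tx line (e.g. the reset messages)
        key = (m["dev"], int(m["qid"]))
        counts[key] += 1
        worst[key] = max(worst.get(key, 0), int(m["elapsed"]))
    return counts, worst


# Sample input copied from the log excerpt in this post.
sample = [
    "ena0: Found a Tx that wasn't completed on time, qid 2, index 501. 18980 msecs have passed since last cleanup. Missing Tx timeout value 5000 msecs.",
    "ena0: Found a Tx that wasn't completed on time, qid 2, index 506. 18980 msecs have passed since last cleanup. Missing Tx timeout value 5000 msecs.",
    "ena0: Found a Tx that wasn't completed on time, qid 2, index 508. 18980 msecs have passed since last cleanup. Missing Tx timeout value 5000 msecs.",
    "ena0: Found a Tx that wasn't completed on time, qid 2, index 510. 18980 msecs have passed since last cleanup. Missing Tx timeout value 5000 msecs.",
    "ena0: The number of lost tx completion is above the threshold (244 > 128). Reset the device",
]

counts, worst = summarize_missed_tx(sample)
for (dev, qid), n in sorted(counts.items()):
    print(f"{dev} qid {qid}: {n} missed completions, worst delay {worst[(dev, qid)]} ms")
```

If the misses cluster on a single queue, as they do in the excerpt (all on qid 2), that suggests looking at whichever core services that queue. How queues map to cores on a given instance is something to verify on the system itself rather than assume.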

      AWS support confirmed that there are no EC2 or hypervisor-level issues (status checks OK).

      We still don't know what is causing the CPU to spike, or whether the ENA itself is the cause.

      ENA Driver version:
      ena0: Elastic Network Adapter (ENA)ena v2.8.0
      ena0: ena_com_validate_version() [TID:100000]: ENA device version: 0.10
      ena0: ena_com_validate_version() [TID:100000]: ENA controller version: 0.0.1 implementation version 1
      

      Has anyone seen similar ENA TX timeout or “Found a Tx that wasn’t completed” issues on AWS-based pfSense instances?
      Any best practices for interrupt balancing / RSS tuning on AWS instances with ENA?

        marcosm Netgate

        That's expected behavior on AWS when the CPU is maxed. See https://docs.netgate.com/pfsense/en/latest/solutions/aws-vpn-appliance/instance-type-and-sizing.html

        You'll need to find the cause of the core maxing out.

        Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.