Netgate Discussion Forum

    [Solved] Postfix timeout caused by lost packets

    General pfSense Questions
    3 Posts 2 Posters 1.5k Views
    stenio

      Hi,

      I just discovered a lot of Postfix timeouts, apparently caused by some odd network error.
      The firewall has two WAN interfaces, WAN1 and WAN2, and the mail server is on a DMZ interface.
      WAN1 is used by the mail server and for DNS queries, while WAN2 (which is connected to the default gateway) is used for everything else.
      There is a rule that forces all outgoing traffic from the DMZ to the gateway of WAN1.

      Here is an excerpt from Wireshark of what seems to create the network problem:

      No.    Time          Source                Destination          Protocol Length Info
          168 1.340890      213.205.33.215        192.168.1.2          TLSv1.2  1376  [TCP Previous segment not captured] Ignored Unknown Record

      Frame 168: 1376 bytes on wire (11008 bits), 1376 bytes captured (11008 bits)
      Ethernet II, Src: CiscoInc_a4:fa:fc (00:13:80:a4:fa:fc), Dst: Fabiatec_07:94:78 (00:04:a7:07:94:78)
      Internet Protocol Version 4, Src: 213.205.33.215, Dst: 192.168.1.2
      Transmission Control Protocol, Src Port: 38411 (38411), Dst Port: 25 (25), Seq: 137865, Ack: 2149, Len: 1322
      Secure Sockets Layer

      No.    Time          Source                Destination          Protocol Length Info
          169 1.341198      192.168.1.2          213.205.33.215        TCP      60    [TCP Dup ACK 167#1] 25 → 38411 [ACK] Seq=2149 Ack=122001 Win=63456 Len=0

      Frame 169: 60 bytes on wire (480 bits), 60 bytes captured (480 bits)
      Ethernet II, Src: Fabiatec_07:94:78 (00:04:a7:07:94:78), Dst: CiscoInc_a4:fa:fc (00:13:80:a4:fa:fc)
      Internet Protocol Version 4, Src: 192.168.1.2, Dst: 213.205.33.215
      Transmission Control Protocol, Src Port: 25 (25), Dst Port: 38411 (38411), Seq: 2149, Ack: 122001, Len: 0

      It seems that some TCP segments were lost and that the peers were not able to recover.
      I googled a lot and found that the problem could be related to path MTU discovery. I already tried lowering the MTU and permitting ICMP traffic, but it hasn't worked.
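
For reference, the size of the gap in the two frames above can be estimated from the sequence numbers alone. This is a minimal sketch using only the values shown in frames 168 and 169; the function name `unacked_gap` is hypothetical, and it assumes no sequence-number wraparound. It only bounds the gap: some of those bytes may have arrived and simply not be acknowledged yet.

```python
def unacked_gap(arrived_seq: int, expected_ack: int) -> int:
    """Bytes between what the receiver still expects and the segment that just arrived."""
    return arrived_seq - expected_ack

# Frame 168 arrives with Seq 137865; frame 169 still acknowledges 122001.
gap = unacked_gap(137865, 122001)

# Frame 168 carries 1322 bytes of payload, so express the gap in segments of that size.
segments, remainder = divmod(gap, 1322)

print(f"gap: {gap} bytes = {segments} segments of 1322 bytes, remainder {remainder}")
```

Under these assumptions the receiver is waiting on a hole of up to 12 full-size segments, which would fit a burst of loss rather than a single dropped packet.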

      I've attached the full decoded tcpdump.

      What can it be?

      Thanks,
      Stenio

      Edit:

      It seems that the problem was the provider's router. After a reboot no more packets were lost.

      capture.txt

        AEITS_Inc

        Not sure if it's related, but if you are using DKIM to sign your Postfix email, Cisco equipment has a habit of corrupting the packets and then dropping them as malformed.

        http://www.arschkrebs.de/postfix/postfix_cisco_pix_bugs.shtml

        Steve

          stenio

          Hi Steve,

          No, I'm not using DKIM.
          The problem seems to be related to TLS and to the length of the email message: the bigger the email, the more likely the network problem and hence the timeout.
          The "distance" between the servers also seems to have an influence, probably because more hops mean more time and more chances to lose segments.

          A lot of messages come from Google's servers (209.85.128.0/17, 74.125.0.0/16).
          I tried decreasing the MTU of the server's interface from 1500 to 1362, and this had a positive effect. I'll try lowering it further.
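
To make the effect of the MTU change concrete, here is a small sketch of the arithmetic. It assumes plain IPv4 and TCP headers of 20 bytes each with no options, and a 14-byte Ethernet header; the helper names are hypothetical.

```python
IPV4_HEADER = 20      # bytes, assuming no IP options
TCP_HEADER = 20       # bytes, assuming no TCP options (e.g. no timestamps)
ETHERNET_HEADER = 14  # bytes, as counted in the captured frame length

def tcp_mss(mtu: int) -> int:
    """Largest TCP payload that fits in one IPv4 packet of the given MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER

def frame_length(mtu: int) -> int:
    """On-wire length of a full-size packet as a capture would report it."""
    return mtu + ETHERNET_HEADER

for mtu in (1500, 1362):
    print(f"MTU {mtu}: MSS {tcp_mss(mtu)}, full frame {frame_length(mtu)} bytes")
```

With an MTU of 1362 this gives an MSS of 1322 and a 1376-byte frame, which appears consistent with the Len: 1322 and "1376 bytes on wire" shown for frame 168 in the first post.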

          Thanks,
          Stenio

          Copyright 2025 Rubicon Communications LLC (Netgate). All rights reserved.