add new Congestion-Control Algorithms

yon 0

@jimp
The current pfSense version cannot load the cdg module successfully.
So what I want to know is: can all modules supported by FreeBSD be loaded successfully?

kldload cc_cdg
kldload: an error occurred while loading module cc_cdg. Please check dmesg(8) for more details

yon 0

@mrancier

I copied cc_cdg.ko from FreeBSD 12.0 to /boot/kernel and set it to chmod 0555, but kldload cc_cdg can't load it.

mrancier

cc_cdg.ko depends on h_ertt, so both modules need to be there; otherwise cdg will fail to kldload.
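
A quick way to confirm that both files are in place before trying kldload is to check the module directory. The snippet below is only a minimal sketch: the file names follow the post above (h_ertt.ko and cc_cdg.ko), `missing_modules` is a hypothetical helper, and the demo uses a temporary directory standing in for /boot/kernel so it runs anywhere.

```python
import tempfile
from pathlib import Path

# Files named in this thread; cc_cdg.ko will not load without h_ertt.ko.
REQUIRED = ["h_ertt.ko", "cc_cdg.ko"]


def missing_modules(module_dir, required=REQUIRED):
    """Return the required module files that are absent from module_dir."""
    directory = Path(module_dir)
    return [name for name in required if not (directory / name).is_file()]


# Demo: a temporary directory stands in for /boot/kernel, with only
# cc_cdg.ko copied over (the situation described earlier in the thread).
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "cc_cdg.ko").touch()
    demo_missing = missing_modules(tmp)

print(demo_missing)  # ['h_ertt.ko']
```

On the firewall itself the call would be `missing_modules("/boot/kernel")`; an empty list means both files are present.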

yon 0

@mrancier said in add new Congestion-Control Algorithms:

cc_cdg.ko depends on h_ertt, so both modules need to be there; otherwise cdg will fail to kldload.

Yeah, this works. My local ISP has packet loss due to QoS. Which algorithm is suitable for a lossy, long-distance network environment?

yon 0

@mrancier said in add new Congestion-Control Algorithms:

cdg has a handful of other tunables that might be handy. I am using cubic myself.

What optimization parameters and tunables can be used?

mrancier

You can take a look at net.inet.tcp.cc.cdg.smoothing_factor as well as net.inet.tcp.cc.cdg.alpha_inc for window resizing. As for the advantage over newreno, cdg will act exactly like newreno until a congestion event of the specific type triggers it. Its design seems geared toward the situation you describe; however, the only way to gauge its efficacy is to put it in place and observe.

yon 0

@mrancier said in add new Congestion-Control Algorithms:

You can take a look at net.inet.tcp.cc.cdg.smoothing_factor as well as net.inet.tcp.cc.cdg.alpha_inc for window resizing. As for the advantage over newreno, cdg will act exactly like newreno until a congestion event of the specific type triggers it. Its design seems geared toward the situation you describe; however, the only way to gauge its efficacy is to put it in place and observe.

I tried setting this up, but how do I check that it is set properly?
net.inet.tcp.cc.cdg.alpha_inc=1
net.inet.tcp.cc.cdg.smoothing_factor=10

mrancier

https://www.freebsd.org/cgi/man.cgi?query=cc_cdg

Information is very sparse when it comes to testing the performance of congestion control algorithms. All you can really do is compare behavior before and after activating the desired method. Check that the sysctls are applied after reboot. If you don't see them, pfSense is likely ignoring them and they need to be in loader.conf.local or even in rc.conf.local (Netgate uses a wacky conf structure).

yon 0

@mrancier said in add new Congestion-Control Algorithms:

https://www.freebsd.org/cgi/man.cgi?query=cc_cdg

Information is very sparse when it comes to testing the performance of congestion control algorithms. All you can really do is compare behavior before and after activating the desired method. Check that the sysctls are applied after reboot. If you don't see them, pfSense is likely ignoring them and they need to be in loader.conf.local or even in rc.conf.local (Netgate uses a wacky conf structure).

After adding these two options, sysctl shows them. I haven't noticed any improvement yet.

sysctl net.inet.tcp.cc.cdg.smoothing_factor
net.inet.tcp.cc.cdg.smoothing_factor: 10
sysctl net.inet.tcp.cc.cdg.alpha_inc
net.inet.tcp.cc.cdg.alpha_inc: 1
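
The output above can also be checked by script rather than by eye. A minimal sketch, assuming the `name: value` format that sysctl printed above; `parse_sysctl` and `mismatches` are hypothetical helper names, and the expected values are simply the ones set earlier in this thread.

```python
# Values set earlier in the thread; adjust to whatever you configured.
EXPECTED = {
    "net.inet.tcp.cc.cdg.alpha_inc": "1",
    "net.inet.tcp.cc.cdg.smoothing_factor": "10",
}

# Text as printed by the two sysctl commands above.
SAMPLE = """\
net.inet.tcp.cc.cdg.smoothing_factor: 10
net.inet.tcp.cc.cdg.alpha_inc: 1
"""


def parse_sysctl(text):
    """Turn 'name: value' lines into a dict; lines without ':' are skipped."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            values[name.strip()] = value.strip()
    return values


def mismatches(text, expected=EXPECTED):
    """Return {name: actual_value} for every expected sysctl that differs
    or is missing (a missing sysctl is reported as None)."""
    actual = parse_sysctl(text)
    return {k: actual.get(k) for k, v in expected.items() if actual.get(k) != v}


print(mismatches(SAMPLE))  # {}
```

Running the same check on the sysctl output captured after a reboot shows whether the values survived.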

I also found the option below. Adding it through sysctl does not work; maybe it should be added to loader.conf.local?

http://caia.swin.edu.au/urp/newtcp/tools/cc_cdg-0.1.readme
net.inet.tcp.cc.cdg.wif

This allows cdg to more aggressively increase its window to better
utilise high BDP links. It is an additive increase mechanism where the
amount of additive increase can be increased after wif RTTs with no
congestion. (w = w + i, where i is incremented every wif RTTs
or i=1 if wif = 0) Default is 0 (Reno like behaviour).
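
To make the w = w + i rule concrete, here is a small simulation of congestion-free window growth. It is only one reading of the readme text, not the actual cc_cdg code: the window is counted in segments, the starting window of 1 is arbitrary, and the assumption is that i is bumped right after every wif-th congestion-free RTT. `window_growth` is a hypothetical name.

```python
def window_growth(rtts, wif, w=1):
    """Simulate congestion-free growth: each RTT does w = w + i.

    With wif == 0 the increment i stays at 1 (Reno-like behaviour).
    Otherwise i is incremented after every wif RTTs (assumed timing).
    """
    i = 1
    history = []
    for rtt in range(1, rtts + 1):
        w += i
        history.append(w)
        if wif and rtt % wif == 0:
            i += 1
    return history


print(window_growth(6, wif=0))  # [2, 3, 4, 5, 6, 7]
print(window_growth(6, wif=2))  # [2, 3, 5, 7, 10, 13]
```

With wif = 2 the window reaches 13 segments after six RTTs instead of 7, which is the more aggressive growth on high-BDP links that the readme describes.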

mrancier

I would put it in loader.conf.local and see if pfSense will load it at boot then. If you see no change, then perhaps the reason for your dropped packets is not congestion. It could be buffer overrun. Have you tweaked your TCP buffers and receive/send spaces?

jimp

If you are testing from a local client on LAN, that setting will make no difference on the firewall. The only thing affected by that will be TCP connections to/from the firewall itself. It will not affect traffic passing through the firewall.

You probably want to enable something like that on your clients and workstations, not the firewall.

yon 0

@mrancier said in add new Congestion-Control Algorithms:

I would put it in loader.conf.local and see if pfSense will load it at boot then. If you see no change, then perhaps the reason for your dropped packets is not congestion. It could be buffer overrun. Have you tweaked your TCP buffers and receive/send spaces?

My WAN routing shows packet loss.

ip:
163146 total packets received
114504 packets for this host
38010 packets forwarded
450 packets not forwardable
113767 packets sent from this host
393 output packets discarded due to no route

netstat -in
Name     Mtu Network       Address            Ipkts Ierrs Idrop  Opkts Oerrs Coll
igb0    1500 <Link#1>      a0:36:9f:83:90:40    636     0     0  10217     0    0
igb1    1500 <Link#2>      a0:36:9f:83:90:40  25993     0     0  20120     0    0
igb2    1508 <Link#3>      24:fd:52:3f:40:56  39502     0     0  35663     0    0
igb2       - fe80::%igb2/6 fe80::26fd:52ff:f      0     -     -      2     -    -
igb3    1508 <Link#4>      44:6e:e5:1d:b1:a7  39065     0     0  38053     0    0
igb3       - fe80::%igb3/6 fe80::466e:e5ff:f      0     -     -      1     -    -
re0*    1500 <Link#5>      bc:5f:f4:7b:28:6d      0     0     0      0     0    0
enc0*   1536 <Link#6>      enc0                   0     0     0      0     0    0
lo0    16384 <Link#7>      lo0                   21     0     0     21     0    0
lo0        - ::1/128       ::1                   21     -     -     21     -    -
lo0        - fe80::%lo0/64 fe80::1%lo0            0     -     -      0     -    -
lo0        - 127.0.0.0/8   127.0.0.1              0     -     -      0     -    -
pflog  33160 <Link#8>      pflog0                 0     0     0   1086     0    0
pfsyn   1500 <Link#9>      pfsync0                0     0     0      0     0    0
lagg0   1500 <Link#10>     a0:36:9f:83:90:40  26706     0     0  30337    19    0
lagg0      - fe80::%lagg0/ fe80::a236:9fff:f     12     -     -    438     -    -
lagg0      - 185.230.191.0 185.230.191.1       1714     -     -   4988     -    -
lagg0      - 2602:fed5:702 2602:fed5:7021::f     65     -     -     90     -    -
lagg0      - 185.230.191.0 185.230.191.2         70     -     -      0     -    -
lagg0   1500 <Link#11>     a0:36:9f:83:90:40      0     0     0    376     8    0
lagg0      - fe80::%lagg0. fe80::a236:9fff:f      0     -     -      2     -    -
lagg0      - 192.168.101.0 192.168.101.254       28     -     -      0     -    -
pppoe   1492 <Link#12>     pppoe0             39439     0     0  35601     0    0

mrancier

It appears that most errors are being logged on your LAGG port. I am going to assume that the same errors are not showing in the stats for the LAGG members? If that is the case, the only time I've seen this behavior is when you have ports LAGGed but not all of them have active links. Say you have igb0 through igb3 in a LAGG configuration, but igb1 is actually disconnected (physically). Other than that, I am unsure what the issue might be. Sorry.
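
The comparison between the LAGG and its members can be pulled out of the netstat -in output with a few lines of script. A minimal sketch, assuming the ten-column layout shown in the post above (Name, Mtu, Network, Address, Ipkts, Ierrs, Idrop, Opkts, Oerrs, Coll); only the link-level rows carry error counters, and `interfaces_with_errors` is a hypothetical helper. The sample rows are copied from that output.

```python
SAMPLE = """\
igb0 1500 <Link#1> a0:36:9f:83:90:40 636 0 0 10217 0 0
igb1 1500 <Link#2> a0:36:9f:83:90:40 25993 0 0 20120 0 0
lagg0 1500 <Link#10> a0:36:9f:83:90:40 26706 0 0 30337 19 0
lagg0 - fe80::%lagg0/ fe80::a236:9fff:f 12 - - 438 - -
lagg0 1500 <Link#11> a0:36:9f:83:90:40 0 0 0 376 8 0
"""


def interfaces_with_errors(text):
    """Return (name, link, ierrs, oerrs) for link-level rows with errors."""
    hits = []
    for line in text.splitlines():
        fields = line.split()
        # Address rows show '-' instead of counters, so keep <Link#N> rows only.
        if len(fields) != 10 or not fields[2].startswith("<Link#"):
            continue
        ierrs, oerrs = int(fields[5]), int(fields[8])
        if ierrs or oerrs:
            hits.append((fields[0], fields[2], ierrs, oerrs))
    return hits


for row in interfaces_with_errors(SAMPLE):
    print(row)
# ('lagg0', '<Link#10>', 0, 19)
# ('lagg0', '<Link#11>', 0, 8)
```

In this sample only the two lagg0 link rows report output errors; igb0 and igb1 report none.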

yon 0

@mrancier said in add new Congestion-Control Algorithms:

It appears that most errors are being logged on your LAGG port. I am going to assume that the same errors are not showing in the stats for the LAGG members? If that is the case, the only time I've seen this behavior is when you have ports LAGGed but not all of them have active links. Say you have igb0 through igb3 in a LAGG configuration, but igb1 is actually disconnected (physically). Other than that, I am unsure what the issue might be. Sorry.

Thanks. What should I check, or how can I determine the problem? I am using two Intel I350-T2 NICs: the two interfaces of one NIC are for the PPPoE WAN, and the other NIC is lagged to the LAN switch.

igb0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=e500bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
ether a0:36:9f:83:90:40
media: Ethernet autoselect (1000baseT <full-duplex>)
status: active
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
igb1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=e500bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
ether a0:36:9f:83:90:40
hwaddr a0:36:9f:83:90:41
media: Ethernet autoselect (1000baseT <full-duplex>)
status: active
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
igb2: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1508
options=e520bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
ether 24:fd:52:3f:40:56
hwaddr a0:36:9f:83:8b:5c
inet6 fe80::26fd:52ff:fe3f:4656%igb2 prefixlen 64 scopeid 0x3
media: Ethernet autoselect (1000baseT <full-duplex>)
status: active
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
igb3: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1508
options=e500bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
ether 44:6e:e5:1d:b0:a7
hwaddr a0:36:9f:83:8b:5d
inet6 fe80::466e:e5ff:fe1d:b4a7%igb3 prefixlen 64 scopeid 0x4
media: Ethernet autoselect (1000baseT <full-duplex>)
status: active
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
re0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=8209b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC,LINKSTATE>
ether bc:5f:f4:7b:29:6d
media: Ethernet autoselect (none)
status: no carrier
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
enc0: flags=0<> metric 0 mtu 1536
groups: enc
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
inet6 ::1 prefixlen 128
inet6 fe80::1%lo0 prefixlen 64 scopeid 0x7
inet 127.0.0.1 netmask 0xff000000
groups: lo
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
pfsync0: flags=0<> metric 0 mtu 1500
syncpeer: 224.0.0.240 maxupd: 128 defer: on
syncok: 1
groups: pfsync
pflog0: flags=100<PROMISC> metric 0 mtu 33160
groups: pflog
lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=e500bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
ether a0:36:9f:83:90:40
inet6 fe80::a236:9fff:fe83:9940%lagg0 prefixlen 64 scopeid 0xa
inet6 2602:fed5:7021::face prefixlen 48
inet 185.230.191.1 netmask 0xffffffe0 broadcast 185.230.191.31
inet 185.230.191.2 netmask 0xffffff00 broadcast 185.230.191.255
laggproto lacp lagghash l2,l3,l4
laggport: igb0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
laggport: igb1 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
groups: lagg
media: Ethernet autoselect
status: active
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>

LADVD Devices
Capability Codes:
r - Repeater, B - Bridge, H - Host, R - Router, S - Switch,
W - WLAN Access Point, C - DOCSIS Device, T - Telephone, O - Other

Device ID Local Intf Proto Hold-time Capability Port ID
H3C igb1 LLDP 119 BR Gi1/0/5 In
H3C igb0 LLDP 119 BR Gi1/0/7 In
LADVD Detailed decode
Chassis id: 04:d7:a5:db:db:16
Port id: 04:d7:a5:db:db:1c
Time remaining: 119 seconds
Port Description: GigabitEthernet1/0/5 Interface
System Name: H3C
System Description:
H3C Switch S1850-10P Software Version 5.20.99, Release 1102
Copyright(c)2004-2017 New H3C Technologies Co., Ltd. All rights reserved.
System Capabilities: BR
Enabled Capabilities: BR
Management Address IPv4: 192.168.0.233
Port VLAN ID: 1

Chassis id: 04:d7:a5:db:db:16
Port id: 04:d7:a5:db:db:1e
Time remaining: 119 seconds
Port Description: GigabitEthernet1/0/7 Interface
System Name: H3C
System Description:
H3C Switch S1850-10P Software Version 5.20.99, Release 1102
Copyright(c)2004-2017 New H3C Technologies Co., Ltd. All rights reserved.
System Capabilities: BR
Enabled Capabilities: BR
Management Address IPv4: 192.168.0.233
Port VLAN ID: 1

yon 0

I tried deleting the lagg and changing to a normal LAN interface.

netstat -in now shows everything as normal.

But netstat -s still shows 2929 packets not forwardable, so the routing still shows packet loss.
ip:
110688 total packets received
0 bad header checksums
0 with size smaller than minimum
0 with data size < data length
0 with ip length > max ip packet size
0 with header length < data size
0 with data length < header length
0 with bad options
0 with incorrect version number
0 fragments received
0 fragments dropped (dup or out of space)
0 fragments dropped after timeout
0 packets reassembled ok
62783 packets for this host
0 packets for unknown/unsupported protocol
35269 packets forwarded (0 packets fast forwarded)
2929 packets not forwardable
0 packets received for unknown multicast group
0 redirects sent
67139 packets sent from this host
0 packets sent with fabricated ip header
0 output packets dropped due to no bufs, etc.
2742 output packets discarded due to no route
0 output datagrams fragmented
0 fragments created
0 datagrams that can't be fragmented
0 tunneling packets that can't find gif
0 datagrams with bad address in header

tcp:
29618 packets sent
1843 data packets (2075087 bytes)
1 data packet (126 bytes) retransmitted
2 data packets unnecessarily retransmitted
0 resends initiated by MTU discovery
27749 ack-only packets (0 delayed)
0 URG only packets
0 window probe packets
10 window update packets
16 control packets
28200 packets received
433 acks (for 2074414 bytes)
49 duplicate acks
0 acks for unsent data
22516 packets (27777099 bytes) received in-sequence
11 completely duplicate packets (8678 bytes)
0 old duplicate packets
0 packets with some dup. data (0 bytes duped)
5214 out-of-order packets (6964840 bytes)
0 packets (0 bytes) of data after window
0 window probes
8 window update packets
2 packets received after close
0 discarded for bad checksums
0 discarded for bad header offset fields
0 discarded because packet too short
0 discarded due to memory problems
10 connection requests
3 connection accepts
1 bad connection attempt
0 listen queue overflows
0 ignored RSTs in the windows
12 connections established (including accepts)
0 times used RTT from hostcache
0 times used RTT variance from hostcache
0 times used slow-start threshold from hostcache
46 connections closed (including 3 drops)
0 connections updated cached RTT on close
0 connections updated cached RTT variance on close
0 connections updated cached ssthresh on close
0 embryonic connections dropped
372 segments updated rtt (of 379 attempts)
4 retransmit timeouts
0 connections dropped by rexmit timeout
0 persist timeouts
0 connections dropped by persist timeout
0 Connections (fin_wait_2) dropped because of timeout
29 keepalive timeouts
29 keepalive probes sent
0 connections dropped by keepalive
350 correct ACK header predictions
22475 correct data packet header predictions
3 syncache entries added
0 retransmitted
0 dupsyn
0 dropped
3 completed
0 bucket overflow
0 cache overflow
0 reset
0 stale
0 aborted
0 badack
0 unreach
0 zone failures
1 cookie sent
0 cookies received
0 hostcache entries added
0 bucket overflow
0 SACK recovery episodes
0 segment rexmits in SACK recovery episodes
0 byte rexmits in SACK recovery episodes
0 SACK options (SACK blocks) received
4748 SACK options (SACK blocks) sent
0 SACK scoreboard overflow
0 packets with ECN CE bit set
26630 packets with ECN ECT(0) bit set
0 packets with ECN ECT(1) bit set
9 successful ECN handshakes
1 time ECN reduced the congestion window
0 packets with matching signature received
0 packets with bad signature received
0 times failed to make signature due to no SA
0 times unexpected signature received
0 times no signature provided by segment
0 Path MTU discovery black hole detection activations
0 Path MTU discovery black hole detection min MSS activations
0 Path MTU discovery black hole detection failures
TCP connection count by state:
0 connections in CLOSED state
15 connections in LISTEN state
0 connections in SYN_SENT state
0 connections in SYN_RCVD state
6 connections in ESTABLISHED state
0 connections in CLOSE_WAIT state
0 connections in FIN_WAIT_1 state
0 connections in CLOSING state
0 connections in LAST_ACK state
0 connections in FIN_WAIT_2 state
0 connections in TIME_WAIT state
udp:
38109 datagrams received
0 with incomplete header
0 with bad data length field
0 with bad checksum
0 with no checksum
26 dropped due to no socket
458 broadcast/multicast datagrams undelivered
0 dropped due to full socket buffers
0 not for hashed pcb
37625 delivered
37845 datagrams output
0 times multicast source filter matched
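
To put the ip counters above in proportion, the share of unforwardable packets can be computed directly from the netstat -s text. A minimal sketch, assuming the `<number> <label>` line format shown above; `counter` is a hypothetical helper, and the sample lines are copied from the ip section of this post.

```python
import re

SAMPLE = """\
110688 total packets received
62783 packets for this host
35269 packets forwarded (0 packets fast forwarded)
2929 packets not forwardable
2742 output packets discarded due to no route
"""


def counter(text, label):
    """Return the integer in front of an exact netstat -s label, or None."""
    pattern = r"^\s*(\d+)\s+" + re.escape(label) + r"\s*$"
    match = re.search(pattern, text, re.MULTILINE)
    return int(match.group(1)) if match else None


total = counter(SAMPLE, "total packets received")
not_forwardable = counter(SAMPLE, "packets not forwardable")
no_route = counter(SAMPLE, "output packets discarded due to no route")
share = 100 * not_forwardable / total

print(f"{not_forwardable} of {total} received packets not forwardable ({share:.2f}%)")
print(f"{no_route} output packets discarded due to no route")
```

Taking the same two counters a few minutes apart and comparing the difference tells you whether the number is still growing or is left over from boot.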

mrancier

Typically when you see counters for packets not forwardable in netstat, it is because you are missing a route to localhost.

For reference:

https://docs.oracle.com/cd/E19253-01/816-4555/ppp.trouble-108/index.html

yon 0

@mrancier said in add new Congestion-Control Algorithms:

Typically when you see counters for packets not forwardable in netstat, it is because you are missing a route to localhost

What rules or routes do I need to add?

mrancier

@yon-0 Hard to say without a visual, but I will say this: if the issue does happen on 2.4.4p3, there is a possibility, albeit small, that there may be a bug in the PPPoE client.

yon 0

@mrancier I am using the newest pfSense 2.5 version.

yon 0

ip6:
1301746 total packets received
0 with size smaller than minimum
0 with data size < data length
0 with bad options
18 with incorrect version number
0 fragments received
0 fragments dropped (dup or out of space)
0 fragments dropped after timeout
0 fragments that exceeded limit
0 packets reassembled ok
416553 packets for this host
855286 packets forwarded
2615 packets not forwardable
0 redirects sent
500503 packets sent from this host
0 packets sent with fabricated ip header
0 output packets dropped due to no bufs, etc.
1225 output packets discarded due to no route
0 output datagrams fragmented
0 fragments created
0 datagrams that can't be fragmented
0 packets that violated scope rules
325 multicast packets which we don't join
Input histogram:
hop by hop: 235
TCP: 568585
UDP: 597299
ICMP6: 135609
Mbuf statistics:
557747 one mbuf
two or more mbuf:
lo0= 2398
741601 one ext mbuf
0 two or more ext mbuf
0 packets whose headers are not contiguous
0 tunneling packets that can't find gif
0 packets discarded because of too many headers
450 failures of source address selection
source addresses on an outgoing I/F
8986 link-locals
11506 globals
source addresses on a non-outgoing I/F
2 globals
450 addresses scope=0xf
source addresses of same scope
8984 link-locals
11508 globals
source addresses of a different scope
2 link-locals
Source addresses selection rule applied:
20494 first candidate
450 same address
8290 appropriate scope
3216 outgoing interface
885 longest match