What is the biggest attack in GBPS you stopped

doktornotor · May 3, 2015, 6:34 AM

Come on. Conspiracy theory is self proved and self destructive by its very nature. My sole point was that babbling about it here is in no way helpful to the original topic. Please stop.

+1. This topic and the way of "disclosing" the issue is already shitty enough – even without this conspiracy junk.

firewalluser · May 3, 2015, 7:42 AM

@Harvy66:

@firewalluser:

@Harvy66:

System Activity or "ps" will tell you total CPU time consumed. Just remember, a quad core can consume 4 CPU seconds per second.

Not always, you need to understand how the L2 cache works, ie its shared between cores on Intel, but AMD tend to have a cache amount per core, ie AMD would be less prone to cache collisions unlike Intel cpu's.

Cache misses count as CPU time. If it takes an extra 250 cycles because of a cache miss, well, that's counting against you. CPU time is the amount of time a process has been scheduled. What it does during that time is irrelevant from the scheduler's standpoint.

@Harvy66:

System Activity or "ps" will tell you total CPU time consumed. Just remember, a quad core can consume 4 CPU seconds per second.

Yes

Yes & No

If no cache collisions occur then yes your "4 CPU seconds per second" would be right but when a cache collision occurs then its a matter of debate whether the cpu is giving you any cpu time useful to the task being asked of it by said software because a cache collision by definition is a failure of the cpu/core depending on where the cache collision occurs ie L1,2,3 which means no cpu processing useful to the task being asked of it as it backs out and resolves the cache collision.

To then make it a little more complicated or simpler depending on perspective, if the cache collision occurs on cache shared across all the cores then no you dont get your 4 cpu seconds per second as the CPU backs out and resolves the cache collision which holds up one or more other cores.

If the cache collision occurs on cache available only to a single core like L1 and some L2 (L2 on some chips is shared and on others it's a small % of the total L2 but unique to each core), then you could consider it in your 4 CPU seconds per second statement, but then there is still the matter of whether the CPU is giving you any "useful" processing time whilst it resolves the collision. Technically the time spent/clock cycles filling the cache, having a collision and then resolving the collision is time wasted, but it could still show as 100% core or CPU activity depending on the cache affected. So yes, when you see CPU activity at 100%, that would be correct, but it's not the whole picture as it's hiding the cock-ups of the CPU cache and the bus waits that are occurring.

Now even if we dont have any cache collisions, on a multi core cpu, time is then further spent wasted as the individual cores spend time waiting to access ram or the disk depending on bus architecture.

I've got software here which I have written which can run multithreaded and multi cored, but it's also capable of running on a single thread on a single core or x threads on a single core or x threads on x cores.

Guess which one runs the fastest?

The single threaded single core version.

Why is this?

Its because there is no time wasted handshaking between threads at the OS level and cores at the HW to access the ram and disk. Disk activity shows this up the most as disk/permanent storage is an order of magnitude slower to access even SSD's when compared to ram.

In some respects, even though ARM chips are RISC, i.e. they don't have as many of the common tasks normally carried out by OS software functions built into the CPU architecture (unlike, say, Intel's AES-NI, to pick a relevant example of a software function that has made it into the CPU: http://en.wikipedia.org/wiki/AES_instruction_set ), such built-in functions generally, but not always, tend to speed up the software. All of this ultimately depends on how the software is written and, to a lesser extent, on the language and compiler used, as optimising compilers, like cache, can work for you and against you depending on the chip used to run the software.

This is why I suggested right back at the beginning to try a 1.x version of pfsense. Considering the new features and improvements to functionality made to OS's over time, not only can code be compared easily, it will be possible to workout by elimination and some observations where the problem lies. I suspect knowing how HW drivers used to be for printers especially HP printers in the Win3.1,W95,W98, NT3.5, NT4 days that the drivers have not been updated enough to keep pace with OS developments, hence why I agree with KOM and suspect its a NIC hook issue in the OS, but it will also be compounded by the multi core's seen in cpu's today which is why I also suggested for those running it virtualised like on ESXI, to restrict the core's available to 1.

Apologies if this is teaching you to suck eggs, but due to limited data, i.e. not knowing you or your past, I don't know how much you know or don't know, hence the explanation above. :)

kejianshi · May 3, 2015, 8:26 AM

Its not that I don't believe hardware is intercepted and modified for some people or that state level agencies don't hack and compromise target systems. Its just that unless there is some reason I doubt seriously they are doing it to you.

Are you someone worth targeting?

firewalluser · May 3, 2015, 9:51 AM

@kejianshi:

Its not that I don't believe hardware is intercepted and modified for some people or that state level agencies don't hack and compromise target systems. Its just that unless there is some reason I doubt seriously they are doing it to you.

Are you someone worth targeting?

I dont think so, but thats a matter of opinion even when people have a thirst for knowledge which reminds me of the saying curiosity killed the cat. The saying is like a warning to not be educated.

Here in the UK whilst there is a saying, no knowledge of the law is no defence from the law http://en.wikipedia.org/wiki/Ignorantia_juris_non_excusat considering we are born into a world where we are not even taught the laws of the land some of which go back in time before many of us were born like this http://www.channel4.com/news/1946-agreement-nsa-read-your-email-prism-data and the state acts in a duplicitous, secretive manner like this http://www.channel4.com/news/nsa-edward-snowden-america-britain-tony-blair how can the state be trusted on so many matters to act in any of our [edit - or all of our] best interests?

I'll come back and add more but got to sort something out.

Edit.

In light of the previous comments about thread drift and the fact that your question "Are you someone worth targeting?" has many parallels with religion, maths, biology, physics, quantum physics, philosophy, law both UK, international and foreign country laws, perhaps best summed up as the meaning of life, it would perhaps best be continued in off topic?

Supermule · May 3, 2015, 11:34 AM

Thank you :D

SO I will test the VM with only 1 core available and see how it fares.

1st picture with 1 CPU idle.
2nd is under DoS.

Some notes to this. With only one, it did A LOT better lasting 35 seconds before it lost connection to the outside world compared to 5 seconds using 8 CORES.

It did crash the Webgui as well and lost all contact to the system activity page showing no connection in the browser.

Youtube Video

Supermule · May 3, 2015, 11:57 AM

MBUF change to "1.000.000"

Youtube Video

Testing of 2 CORES on the way.

Youtube Video

Fares a lot better than 1 core

Supermule · May 3, 2015, 12:11 PM

A little breakthrough in regards to responsiveness!

http://youtu.be/bzFHBOshmlY

Changed the KERN.IPC.NMBUF setting to "65536". Dont know what 10.1 has as std. setting but it made the damn thing much more responsive.

Going 4 CORE testing….

Supermule · May 3, 2015, 12:22 PM

4 Cores were not the improvement I had been hoping for.

Youtube Video

It actually did worse than the 2 core test.

Upping to 8 cores.

Supermule · May 3, 2015, 12:38 PM

8 cores

http://youtu.be/-xTtzLEQx08

Not as good as hoped but not running 100% CPU like all the others. It seems that the response on the WAN graph are related to the PING on WAN.

It seems that the 2 CORE setup is the one that performs best in beginning until around 35 seconds into the attack. Then crash. 4 and 8 cores keep the GUI online.

doktornotor · May 3, 2015, 12:50 PM

@Supermule:

Testing of 2 CORES on the way.
Fares a lot better than 1 core

@Supermule:

It seems that the 2 CORE setup is the one that performs best in beginning until around 35 seconds into the attack. Then crash. 4 and 8 cores keep the GUI online.

Hmmm, so more cores provide more CPU performance to use for other purposes beyond handling the packet filtering (like, running the webserver). Amazing discovery.

Supermule · May 3, 2015, 1:02 PM

Thanks man.

I really dig your positive attitude.

Should more cores not equal better performance instead of JUST keeping the webserver online?

The SYN script shouldnt even TOUCH the GUI and make it unresponsive….

What about this on an ALIX board or whatever low performance ATOM?

Go get laid and come back with a more positive attitude. ;)

On another note, then pls. tell me HOW you would like me to test the systems?

WHAT do you recommend doing to get to the bottom of this other than handing over the script causing it?

Pls. use a bullet list to point out the obvious....

doktornotor · May 3, 2015, 1:03 PM

Positive attitude to what? More junky YT videos? Sigh. The last 3 pages are filled with clueless guesses, YT junk, OT noise and spiced with a bit of conspiracy idiocy, so pardon me for not following this amazingly "useful" thread in detail. Did someone here at least provide the traffic captures to the guys who know what they are doing?

@Supermule:

What about this on an ALIX board or whatever low performance ATOM?

Why the fsck would I or anyone else waste my time with testing a DoS on Alix? Yeah it does not handle it. SIGDOUBLEDUH!

Supermule · May 3, 2015, 1:05 PM

Yes….I did a packetcapture free to DL for everyone.

Nullity · May 3, 2015, 1:15 PM

doc may be a bit blunt but you cannot be surprised by the attitudes in this thread.

You (Supermule) joined the thread and instantly declared pfSense a sub-par OS, and shared no supporting facts or theories. That was pure trolling. Now, I see it was perhaps inadvertent, but damn did you make a bad first impression.

My ignorant and honest opinion is that a good admin does not constantly focus on the ways his tools fail him, he figures out how to achieve his goal through other ways. Er… You seem hell-bent on proving pfSense sucks, how about employing some positive attitude and figure out the ways it does not suck.

Supermule · May 3, 2015, 1:20 PM

I have spent the last 2-3 mths together with lowprofile to search for something that can improve it.

We have sent numerous mails to the dev's and not much response.

We wanted to have the dev's setup a test rig so they could see for themselves how it fares and work together somehow on creating a solution for this or maybe point out what specific issues the base OS has handling the packets.

Not much has come back….if anything at all.

Lowprofile is looking at other products to handle his scenario since he is pretty disappointed in the whole "package" and especially in the lack of response on a matter this important.

firewalluser · May 3, 2015, 1:29 PM

@Supermule:

Should more cores not equal better performance instead of JUST keeping the webserver online?

The OS is multi tasking and as such different services/programs/daemons will always be running so if you had just the core OS plus 7 services/programs/daemons running a single thread each, they would be distributed across all 8 cores and thus load shared within the constraints of the CPU hw.

You can think of a thread in a program as an instance of part of the main program, i.e. you might load a program with a menu that opens other windows. The menu may let you open the same menu option and child window multiple times. In this instance you will probably have a multithreaded app/program running, which means each new instance of the child window will likely be running on its own thread and the OS will distribute those threads across the available cores as well. It's a type of recursion in some respects.

In a modern OS theres lots running in the background and they will have different requirements like what priority they should run at, some will take up the time slice of a cpu more frequently than others due to the nature of the program. You can see this in windows in the task manager by right mouse clicking a running process and seeing the options for Set Priority in the popup menu. However dont go changing the priority of running programs & services as it can hog the CPU or make it unresponsive, all in all making the system unstable.

The thing to bear in mind with computers they are nothing more than a simple clockwork logic machine with some registers/buffers/disks/memory/place holders of sorts to handle moving data which is really just binary around. Over time they have shrunk in size, got faster and have had more software functions moved to various components like the CPU itself or some functions moved onto graphic cards, nics or disk controllers. Once you overcome the awesomeness of them, they are not really anything special imo. :)

With regard to your discovery this looks relevant now.

http://serverfault.com/questions/335461/pfsense-mbuf-full-what-to-do

And for more info on mbufs this is also relevant.
https://doc.pfsense.org/index.php/What_are_mbufs

So it looks like a buffer overflow of sorts which is just another aspect of managing data within an OS or program. In most walks of life like IT, Law or Medicine to name but a few, once you have overcome the terminology it becomes simpler. For example, in Law & Medicine Latin is common, in IT we have our jargon/terminology like Bits, Bytes, Ram, Firewall, & different OS's use different names to describe the same thing, eg in Windows you have services, in Linux you have daemons, even within different programming languages you will see the same, similar or completely different words used to describe the same action or outcome, plus in some languages you can also harness recursion like in C++ you have templates, but thats not to say you cant harness recursion in databases as well.

The use of jargon is designed to protect the knowledge we amass which can help to maintain domains/fiefdoms/income.

Overcome or learn the jargon/terminology and life becomes a lot simpler. ;)

Nullity · May 3, 2015, 1:32 PM

@Supermule:

I have spent the last 2-3 mths together with lowprofile to search for something that can improve it.

We have sent numerous mails to the dev's and not much response.

We wanted to have the dev's setup a test rig so they could see for themselves how it fares and work together somehow on creating a solution for this or maybe point out what specific issues the base OS has handling the packets.

Not much has come back….if anything at all.

Lowprofile is looking at other products to handle his scenario since he is pretty disappointed in the whole "package" and especially in the lack of response on a matter this important.

Right, you seem like a good guy, just like most of us are.

What is stopping us from working together? I shared why I prematurely thought you were an egotistical troll… perhaps some others share that perspective?

Or maybe we are all assholes. :)

firewalluser · May 3, 2015, 1:37 PM

@Nullity:

What is stopping us from working together? I shared why I prematurely thought you were an egotistical troll… perhaps some others share that perspective?
Or maybe we are all assholes. :)

We are all chemically motivated and biased by the data we have learnt over the long and short term, throw in the absence of body language for this medium http://en.wikipedia.org/wiki/Body_language and we will fill the body language void with our own current emotions, sometimes known as projecting (which can sometimes be illuminating based on what is written above), and thus we can arrive at the wrong conclusions about someone. Emoticons/emojis sometimes help but not always, as some prefer to not use them as they can still be interpreted incorrectly.

Nullity · May 3, 2015, 1:43 PM

Indeed. I personally hate emoticons too, but I have personally seen how a negative, or lack of positive, focus can send a whole thread into a negative, hateful tone of adversarial confrontations instead of people realizing they actually all have a common goal to solve the friggen problem and learn something.

firewalluser · May 3, 2015, 1:55 PM

That would be the butterfly effect to use a mathematical reference, or the emotions fear and anger in a biological sense which is driven by excessive dopamine levels derived from a variety of inputs namely music, caffeine, alcohol & drugs. Dopamine gets broken down into the stress hormones (andrenaline and epherine aka speed the amphetamine), they can be cleared within 3-4hrs in smokers but can take over twice as long in non smokers, but they do help increase spatial intelligence and I'm digressing.

Edit, you could also add in some Asch conformity & Milgrams obedience to authority from a psychological perspective as well as things are more complicated in general when dealing with biological lifeforms compared to Artificial Intelligences.

Supermule · May 3, 2015, 2:00 PM

Thanks.

Already changed that in system -> tunables and it made quite a difference on the low core tests.

doktornotor · May 3, 2015, 2:04 PM

@Nullity:

Indeed. I personally hate emoticons too, but I have personally seen how a negative, or lack of positive, focus can send a whole thread into a negative, hateful tone of adversarial confrontations instead of people realizing they actually all have a common goal to solve the friggen problem and learn something.

Because neither screaming "oh noes, it suxxx, we're all doomed, use Windows Firewall instead", nor this YT testing is how you handle a perceived security issue.

https://www.freebsd.org/security/reporting.html

@firewalluser:

That would be the butterfly effect… or the emotions fear and anger in a biological sense which is driven by excessive dopamine levels derived from a variety of inputs ...

Supermule · May 3, 2015, 2:07 PM

See.

I havent stated that people should use Windows Firewall instead.

I have stated that its not affected.

Not the same really…..

firewalluser · May 3, 2015, 2:09 PM

Could be useful.
https://wiki.freebsd.org/NetworkPerformanceTuning

firewalluser · May 3, 2015, 2:10 PM

@doktornotor:

@Nullity:

Indeed. I personally hate emoticons too, but I have personally seen how a negative, or lack of positive, focus can send a whole thread into a negative, hateful tone of adversarial confrontations instead of people realizing they actually all have a common goal to solve the friggen problem and learn something.

Because neither screaming "oh noes, it suxxx, we're all doomed, use Windows Firewall instead", nor this YT testing is how you handle a perceived security issue.

https://www.freebsd.org/security/reporting.html

@firewalluser:

That would be the butterfly effect… or the emotions fear and anger in a biological sense which is driven by excessive dopamine levels derived from a variety of inputs ...

Where do we draw the line at being educational?

Supermule · May 3, 2015, 2:16 PM

Already implemented under system -> tunables for what I use and the network MTU.

@firewalluser:

Could be useful.
https://wiki.freebsd.org/NetworkPerformanceTuning

Supermule · May 3, 2015, 2:20 PM

By the way. Tested 1.2.3 and I got blown out of the water instantly using 4GB RAM and 4 CPUs.

So the new OS' is deffo an improvement.

firewalluser · May 3, 2015, 2:23 PM

Thanks for letting us know, its been educational. ;D

Supermule · May 3, 2015, 3:13 PM

Opnsense 4core/8GB test.

http://youtu.be/dH4ih76b_Ik

Harvy66 · May 3, 2015, 3:32 PM

@Supermule:

8 cores

http://youtu.be/-xTtzLEQx08

Not as good as hoped but not running 100% CPU like all the others. It seems that the response on the WAN graph are related to the PING on WAN.

It seems that the 2 CORE setup is the one that performs best in beginning until around 35 seconds into the attack. Then crash. 4 and 8 cores keep the GUI online.

You may be at 100% CPU, but according to the dashboard, you're running at 311 MHz even when at 100%.

Harvy66 · May 3, 2015, 3:40 PM

@firewalluser:

@Harvy66:

@firewalluser:

@Harvy66:

System Activity or "ps" will tell you total CPU time consumed. Just remember, a quad core can consume 4 CPU seconds per second.

Not always, you need to understand how the L2 cache works, ie its shared between cores on Intel, but AMD tend to have a cache amount per core, ie AMD would be less prone to cache collisions unlike Intel cpu's.

Cache misses count as CPU time. If it takes an extra 250 cycles because of a cache miss, well, that's counting against you. CPU time is the amount of time a process has been scheduled. What it does during that time is irrelevant from the scheduler's standpoint.

@Harvy66:

System Activity or "ps" will tell you total CPU time consumed. Just remember, a quad core can consume 4 CPU seconds per second.

Yes

Yes & No

If no cache collisions occur then yes your "4 CPU seconds per second" would be right but when a cache collision occurs then its a matter of debate whether the cpu is giving you any cpu time useful to the task being asked of it by said software because a cache collision by definition is a failure of the cpu/core depending on where the cache collision occurs ie L1,2,3 which means no cpu processing useful to the task being asked of it as it backs out and resolves the cache collision.

To then make it a little more complicated or simpler depending on perspective, if the cache collision occurs on cache shared across all the cores then no you dont get your 4 cpu seconds per second as the CPU backs out and resolves the cache collision which holds up one or more other cores.

If the cache collision occurs on cache available only to a single core like L1 and some L2 (L2 on some chips is shared and on others it's a small % of the total L2 but unique to each core), then you could consider it in your 4 CPU seconds per second statement, but then there is still the matter of whether the CPU is giving you any "useful" processing time whilst it resolves the collision. Technically the time spent/clock cycles filling the cache, having a collision and then resolving the collision is time wasted, but it could still show as 100% core or CPU activity depending on the cache affected. So yes, when you see CPU activity at 100%, that would be correct, but it's not the whole picture as it's hiding the cock-ups of the CPU cache and the bus waits that are occurring.

Now even if we dont have any cache collisions, on a multi core cpu, time is then further spent wasted as the individual cores spend time waiting to access ram or the disk depending on bus architecture.

I've got software here which I have written which can run multithreaded and multi cored, but it's also capable of running on a single thread on a single core or x threads on a single core or x threads on x cores.

Guess which one runs the fastest?

The single threaded single core version.

Why is this?

Its because there is no time wasted handshaking between threads at the OS level and cores at the HW to access the ram and disk. Disk activity shows this up the most as disk/permanent storage is an order of magnitude slower to access even SSD's when compared to ram.

In some respects, even though ARM chips are RISC, i.e. they don't have as many of the common tasks normally carried out by OS software functions built into the CPU architecture (unlike, say, Intel's AES-NI, to pick a relevant example of a software function that has made it into the CPU: http://en.wikipedia.org/wiki/AES_instruction_set ), such built-in functions generally, but not always, tend to speed up the software. All of this ultimately depends on how the software is written and, to a lesser extent, on the language and compiler used, as optimising compilers, like cache, can work for you and against you depending on the chip used to run the software.

This is why I suggested right back at the beginning to try a 1.x version of pfsense. Considering the new features and improvements to functionality made to OS's over time, not only can code be compared easily, it will be possible to workout by elimination and some observations where the problem lies. I suspect knowing how HW drivers used to be for printers especially HP printers in the Win3.1,W95,W98, NT3.5, NT4 days that the drivers have not been updated enough to keep pace with OS developments, hence why I agree with KOM and suspect its a NIC hook issue in the OS, but it will also be compounded by the multi core's seen in cpu's today which is why I also suggested for those running it virtualised like on ESXI, to restrict the core's available to 1.

Apologies if this is teaching you to suck eggs, but due to limited data, i.e. not knowing you or your past, I don't know how much you know or don't know, hence the explanation above. :)

You are correct, but I was not incorrect either. All I was saying was that you can see CPU time spent. All CPU time is the amount of time spent in a given context. Yes, AMD's new arch has a much greater chance of cache line collisions, especially given the size of their L2 caches and the limited n-way associativity, but that reduces the amount of work done per unit of time, not the amount of time spent. I do agree that AMD can take more time to get the same amount of work done, but "CPU time" is still wall-clock time spent in a context.
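To put the "4 CPU seconds per second" point in numbers, here is a minimal sketch. The function name `cpu_util_pct` and the figures fed to it are made up for illustration; nothing here was measured in this thread. It treats CPU time purely as scheduled time, so total capacity is wall-clock seconds multiplied by the number of cores.

```sh
#!/bin/sh
# cpu_util_pct CPU_SECONDS WALL_SECONDS CORES
# Prints consumed CPU time as a whole-number percentage of the total
# capacity (wall-clock seconds x cores). Hypothetical helper.
cpu_util_pct() {
    awk -v cpu="$1" -v wall="$2" -v cores="$3" \
        'BEGIN { printf "%d\n", (cpu * 100) / (wall * cores) }'
}

# 8 CPU-seconds consumed in 2 wall-clock seconds on a quad core: fully busy
cpu_util_pct 8 2 4    # prints 100

# 2 CPU-seconds in the same 2 seconds: one core's worth of work
cpu_util_pct 2 2 4    # prints 25
```

On a real box the CPU-seconds figure would come from the TIME column of `ps` or from System Activity. A cache miss reduces the work done within that time, not the time that gets counted.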

Nice to know other people share my affection for understanding computers :-)

Harvy66 · May 3, 2015, 4:09 PM

What is "KERN.IPC.NMBUF"? I can't find anything about it?

Supermule · May 3, 2015, 4:30 PM

Kernel buffers.

https://www.google.dk/search?q=KERN.IPC.NMBUF&ie=UTF-8
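For anyone searching later: "KERN.IPC.NMBUF" may not be an exact tunable name. A hedged guess is that it refers to the mbuf cluster limit, `kern.ipc.nmbclusters`, which is the tunable the pfSense mbuf page linked earlier in the thread discusses. Assuming that is the one meant, a loader tunable using the 1,000,000 figure mentioned earlier would look roughly like this. The file location is an assumption based on the usual pfSense convention, and the value is only the one quoted in this thread, not a recommendation.

```
# /boot/loader.conf.local  (assumed location; the thread also mentions system -> tunables)
# Upper limit on mbuf clusters; 1000000 is the figure quoted in this thread
kern.ipc.nmbclusters="1000000"
```

Current usage against that limit shows up in the "mbuf clusters in use" line of `netstat -m`.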

Supermule · May 3, 2015, 4:31 PM

It goes down so fast you dont see the utilization…

@Harvy66:

@Supermule:

8 cores

http://youtu.be/-xTtzLEQx08

Not as good as hoped but not running 100% CPU like all the others. It seems that the response on the WAN graph are related to the PING on WAN.

It seems that the 2 CORE setup is the one that performs best in beginning until around 35 seconds into the attack. Then crash. 4 and 8 cores keep the GUI online.

You may be at 100% CPU, but according to the dashboard, you're running at 311 MHz even when at 100%.

Supermule · May 3, 2015, 5:23 PM

4 Mbps attack and 40% packet loss.

netstat -L doesn't see any exhaustion of queues.

Anybody know how to change the backlog to 1024??

Just to see if it matters.

Supermule · May 3, 2015, 5:48 PM

Here is the output of vmstat -z

Anybody find something unusual in this?

![pfsense.22tv - Diagnostics_ Execute command_Page_1.png](/public/imported_attachments/1/pfsense.22tv - Diagnostics_ Execute command_Page_1.png)
![pfsense.22tv - Diagnostics_ Execute command_Page_2.png](/public/imported_attachments/1/pfsense.22tv - Diagnostics_ Execute command_Page_2.png)

dennypage · May 3, 2015, 9:36 PM

Guys, you need to be much more rigorous in collecting data. You are trying to diagnose a network packet processing problem. Using the web interface to execute shell commands will not produce a consistent and reliable result. Not only is the web interface heavy weight, it is lower priority than kernel packet processing. And most importantly, your diagnostic data collection is dependent upon the behavior of the system you are trying to diagnose.

Let's assume you don't want to build a custom kernel…

You need to shed as many variables as possible and get as close to real data as you can. Turn Snort off for crying out loud. And anything else optional that might interfere with metrics. If you want to use command line tools, execute them outside of network processing. This means using the console, not ssh. Create a shell script that collects information on a periodic basis. Elevate the priority of the script to ensure timely execution. And save the output for every run.

Here is a sample script:

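#!/bin/sh
# Snapshot the full process list once at start
ps -axuwww
# Then log a timestamp and the mbuf statistics every 2 seconds
while true
do
    /bin/date
    /usr/bin/netstat -m
    sleep 2
done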

Here is a sample execution:

/usr/bin/nice -n -19 myscript
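If the script's output is saved to a file, the mbuf cluster line can be boiled down to one percentage per sample. A rough sketch follows. The function name `mbuf_cluster_pct` is hypothetical, the sample line is invented purely to show the current/cache/total/max layout that `netstat -m` prints, and the sketch assumes that layout is the same on your version.

```sh
#!/bin/sh
# mbuf_cluster_pct: read `netstat -m` output on stdin and print, for each
# "mbuf clusters in use" line, current usage as a percentage of the max.
mbuf_cluster_pct() {
    awk '/mbuf clusters in use/ {
        # First field looks like current/cache/total/max
        split($1, f, "/")
        if (f[4] > 0) printf "%d\n", (f[1] * 100) / f[4]
    }'
}

# Illustrative input only, not taken from the thread
printf '%s\n' '512/1024/1536/2048 mbuf clusters in use (current/cache/total/max)' \
    | mbuf_cluster_pct    # prints 25
```

On the firewall itself this would be fed live data, e.g. `/usr/bin/netstat -m | mbuf_cluster_pct`, or the saved log from the script above. A value creeping towards 100 during the attack would point at mbuf exhaustion rather than CPU.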

Supermule · May 4, 2015, 3:34 AM

Done it at the console and no useful output was generated for people to see.

I stopped Snort running and here is the output from the DoS.

First 2 is idle and next 2 is under DoS.

Supermule · May 4, 2015, 7:05 AM

Done some more testing this morning.

2-3 Mbps is all it takes. I have downscaled the mbufs and state max a little.

http://youtu.be/NPtDnM8ixXs

Dennypage. Thanks for the info. Want to help diagnose then contact me on PM.

tim.mcmanus · May 6, 2015, 5:09 PM

This link is probably important to note the differences between versions: https://doc.pfsense.org/index.php/Does_pfSense_support_SMP_(multi-processor_and/or_core)_systems

2.1 was single-threaded and 2.2 is multi-threaded. That's why you're seeing an impact/performance difference between the two; it's not hard to extrapolate how and why.

I think what you're trying to determine, and this is based on my review of the thread, is which part of pf is choking. In order to determine this you need to debug each component in the chain from the NIC to the CPU and back out as well as the code. I'm not entirely sure you know programmatically where and which networking event triggers the issue inside pf, only that a large volume of data of a specific type starts the event.

You've moved beyond evaluating pf from a networking perspective and more into evaluating the codebase. This requires a different kind of data collection and troubleshooting. It also takes an excruciatingly long time to identify and resolve these kinds of issues. It's a lot more than just tweaking a setting in some cases.

Best of luck in determining the root cause and solution to this issue.