BGP server overload - please help!
-
Hi! Please help me. I have a server with 4 BGP sessions (2 to UA-IX, 2 to World), plus NAT, VLANs and more. At one point the CPU cores became overloaded, and this lasted 4 hours. Traffic was as usual. The only change was a sharp increase in the packet rate on one of the World sessions, e.g. from 20K/s to 80K/s, at which point the load average rose to 5.5 and above (in normal operation, even under the evening peak, it is usually no more than 0.9). The dump showed an abnormal number of TCP Out-Of-Order and TCP Retransmission packets. When I deactivate the VLAN carrying this session, the server works fine; when I bring it back up, it all starts again.
Right now, with one of the World sessions down and 880 Mbit/s of total traffic, the load is just 0.68:
last pid: 30796; load averages: 0.75, 0.73, 0.74 up 1+03:28:52 20:10:42
113 processes: 8 running, 84 sleeping, 21 waiting
Mem: 248M Active, 312M Inact, 672M Wired, 1116K Cache, 796M Buf, 6375M Free
Swap: 16G Total, 16G Free
PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
12 root -68 - 0K 384K WAIT 0 371:51 41.80% {irq256: igb0:que}
12 root -68 - 0K 384K CPU1 1 345:27 38.77% {irq257: igb0:que}
12 root -68 - 0K 384K WAIT 3 352:09 35.99% {irq259: igb0:que}
11 root 171 ki31 0K 64K RUN 2 949:35 35.06% {idle: cpu2}
11 root 171 ki31 0K 64K RUN 1 973:51 33.98% {idle: cpu1}
11 root 171 ki31 0K 64K RUN 0 954:06 33.59% {idle: cpu0}
12 root -68 - 0K 384K WAIT 2 368:28 33.50% {irq258: igb0:que}
12 root -64 - 0K 384K CPU3 3 582:11 27.29% {irq19: atapci0+}
12 root -68 - 0K 384K WAIT 3 194:05 22.56% {irq264: igb1:que}
12 root -68 - 0K 384K WAIT 0 210:41 20.90% {irq261: igb1:que}
12 root -68 - 0K 384K WAIT 2 203:53 20.56% {irq263: igb1:que}
12 root -68 - 0K 384K RUN 1 193:47 18.80% {irq262: igb1:que}
11 root 171 ki31 0K 64K RUN 3 439:59 13.96% {idle: cpu3}
16205 root 76 0 108M 30892K piperd 3 1:14 3.86% php
12102 root 76 0 106M 30300K accept 2 1:15 2.69% php
0 root -68 0 0K 240K - 3 46:31 1.56% {igb0 que}
0 root -68 0 0K 240K - 2 50:19 1.46% {igb0 que}
0 root -68 0 0K 240K - 2 43:58 1.27% {igb0 que}
With the problem:
last pid: 63626; load averages: 5.79, 3.20, 2.04 up 0+23:06:03 21:59:45
113 processes: 16 running, 79 sleeping, 18 waiting
Mem: 497M Active, 114M Inact, 959M Wired, 592M Buf, 6040M Free
Swap: 16G Total, 16G Free
PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND
0 root -68 0 0K 240K CPU0 1 106:38 68.99% {igb0 que}
0 root -68 0 0K 240K CPU3 0 76:25 57.96% {igb0 que}
0 root -68 0 0K 240K - 1 72:17 55.76% {igb0 que}
0 root -68 0 0K 240K RUN 1 63:56 48.78% {igb0 que}
0 root -68 0 0K 240K - 2 30:19 17.87% {igb1 que}
0 root -68 0 0K 240K - 1 22:39 17.77% {igb1 que}
0 root -68 0 0K 240K - 2 21:29 15.28% {igb1 que}
12 root -68 - 0K 384K WAIT 1 247:44 13.48% {irq257: igb0:que}
0 root -68 0 0K 240K CPU2 2 19:06 13.38% {igb1 que}
56079 root 118 20 258M 133M RUN 1 0:04 11.96% pfctl
12 root -68 - 0K 384K RUN 2 257:03 10.16% {irq258: igb0:que}
12 root -68 - 0K 384K WAIT 1 103:36 9.86% {irq262: igb1:que}
12 root -68 - 0K 384K RUN 2 111:38 9.47% {irq263: igb1:que}
12 root -68 - 0K 384K WAIT 3 243:36 9.18% {irq259: igb0:que}
12 root -68 - 0K 384K RUN 3 103:01 7.28% {irq264: igb1:que}
12 root -68 - 0K 384K RUN 0 108:18 6.69% {irq261: igb1:que}
12 root -64 - 0K 384K RUN 3 526:02 6.30% {irq19: atapci0+}
12 root -68 - 0K 384K WAIT 0 241:02 4.79% {irq256: igb0:que}
Packet analysis with Wireshark:
During normal operation of the session:
==================================================================================================================================
Packet Lengths:
Topic / Item Count Average Min val Max val Rate (ms) Percent Burst rate Burst start
---------------------------------------------------------------------------------------------------------------------------------
Packet Lengths 914000 1017.92 54 1514 19.1630 100% 28.3800 1.125
0-19 0 - - - 0.0000 0.00% - -
20-39 0 - - - 0.0000 0.00% - -
40-79 184937 64.52 54 79 3.8774 20.23% 5.7700 41.130
80-159 68551 112.74 80 159 1.4372 7.50% 2.2000 32.125
160-319 21184 213.47 160 319 0.4441 2.32% 0.9200 6.185
320-639 25390 464.15 320 639 0.5323 2.78% 1.0500 36.661
640-1279 27067 992.71 640 1279 0.5675 2.96% 1.0000 1.711
1280-2559 586871 1478.25 1280 1514 12.3044 64.21% 20.5300 1.125
2560-5119 0 - - - 0.0000 0.00% - -
5120 and greater 0 - - - 0.0000 0.00% - -
During the problem on the session:
Packet Lengths:
Topic / Item Count Average Min val Max val Rate (ms) Percent Burst rate Burst start
Packet Lengths 914123 197.63 42 1514 20.0752 100% 214.3700 39.910
0-19 0 - - - 0.0000 0.00% - -
20-39 0 - - - 0.0000 0.00% - -
40-79 702546 63.80 42 79 15.4287 76.85% 177.1200 39.910
80-159 118968 139.58 80 159 2.6127 13.01% 32.9200 39.925
160-319 9580 192.61 160 319 0.2104 1.05% 2.4900 39.910
320-639 4297 451.82 320 639 0.0944 0.47% 0.9200 28.500
640-1279 3440 991.35 640 1279 0.0755 0.38% 0.6500 37.525
1280-2559 75292 1488.02 1280 1514 1.6535 8.24% 14.8900 45.165
2560-5119 0 - - - 0.0000 0.00% - -
5120 and greater 0 - - - 0.0000 0.00% - -
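A quick sanity check on the two tables: multiplying the overall rate by the mean packet length gives the throughput each capture represents. The sketch below copies the bucket counts and averages from the tables; it assumes the "Rate (ms)" column is packets per millisecond over the whole capture and that the lengths are on-the-wire frame sizes. `summarize` and the constant names are illustrative, not part of Wireshark.

```python
# Bucket figures copied from the two Wireshark "Packet Lengths" tables:
# (bucket label, packet count, average length in bytes). Empty buckets omitted.
NORMAL = [
    ("40-79", 184937, 64.52),
    ("80-159", 68551, 112.74),
    ("160-319", 21184, 213.47),
    ("320-639", 25390, 464.15),
    ("640-1279", 27067, 992.71),
    ("1280-2559", 586871, 1478.25),
]
PROBLEM = [
    ("40-79", 702546, 63.80),
    ("80-159", 118968, 139.58),
    ("160-319", 9580, 192.61),
    ("320-639", 4297, 451.82),
    ("640-1279", 3440, 991.35),
    ("1280-2559", 75292, 1488.02),
]
# Overall "Rate (ms)" values, assumed to be packets per millisecond.
NORMAL_RATE_PER_MS = 19.1630
PROBLEM_RATE_PER_MS = 20.0752


def summarize(buckets, rate_per_ms):
    """Recompute mean frame size, small-packet share and implied throughput."""
    total = sum(count for _, count, _ in buckets)
    total_bytes = sum(count * avg for _, count, avg in buckets)
    mean_len = total_bytes / total
    # Frames under 160 bytes: the 40-79 and 80-159 buckets.
    small = sum(count for label, count, _ in buckets if label in ("40-79", "80-159"))
    pps = rate_per_ms * 1000
    return {
        "packets": total,
        "mean_len": mean_len,
        "small_share": small / total,
        "pps": pps,
        "mbit_s": pps * mean_len * 8 / 1e6,
    }


for name, buckets, rate in (("normal", NORMAL, NORMAL_RATE_PER_MS),
                            ("problem", PROBLEM, PROBLEM_RATE_PER_MS)):
    s = summarize(buckets, rate)
    print(f"{name}: {s['packets']} pkts, mean {s['mean_len']:.1f} B, "
          f"{s['small_share']:.1%} under 160 B, ~{s['pps']:.0f} pkt/s, "
          f"~{s['mbit_s']:.1f} Mbit/s")
```

With these numbers the captured packet rate is about the same in both cases (roughly 19-20K packets/s), but the mean frame shrinks from about 1018 to about 198 bytes, so the implied throughput drops from roughly 156 Mbit/s to roughly 32 Mbit/s, while frames under 160 bytes go from about 28% to about 90% of the total. The box handles about as many packets for a fraction of the data, which fits a flood of small packets rather than a bandwidth spike, though these figures alone do not say what those packets are.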
My server:
2.0.1-RELEASE (amd64)
Intel(R) Core(TM) i5-3570 CPU @ 3.40GHz
8 GB DDR3
Intel dual port adapter ET (IGB)
What could be the problem? This has been driving me crazy for 4 days ((
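The two top snapshots in this post can also be compared by summing WCPU per thread group. This is a hypothetical helper (`group_key` and `sum_wcpu` are my names) that assumes FreeBSD `top -SH`-style lines like the ones pasted here, where interrupt threads appear as `{irqNNN: igbX:que}` and the driver's queue threads under PID 0 as `{igbX que}`. It only pattern-matches these lines and is not a general `top` parser.

```python
import re
from collections import defaultdict

# Thread lines pasted from the two top snapshots (normal, then with the problem).
NORMAL_TOP = """
12 root -68 - 0K 384K WAIT 0 371:51 41.80% {irq256: igb0:que}
12 root -68 - 0K 384K CPU1 1 345:27 38.77% {irq257: igb0:que}
12 root -68 - 0K 384K WAIT 3 352:09 35.99% {irq259: igb0:que}
11 root 171 ki31 0K 64K RUN 2 949:35 35.06% {idle: cpu2}
11 root 171 ki31 0K 64K RUN 1 973:51 33.98% {idle: cpu1}
11 root 171 ki31 0K 64K RUN 0 954:06 33.59% {idle: cpu0}
12 root -68 - 0K 384K WAIT 2 368:28 33.50% {irq258: igb0:que}
12 root -64 - 0K 384K CPU3 3 582:11 27.29% {irq19: atapci0+}
12 root -68 - 0K 384K WAIT 3 194:05 22.56% {irq264: igb1:que}
12 root -68 - 0K 384K WAIT 0 210:41 20.90% {irq261: igb1:que}
12 root -68 - 0K 384K WAIT 2 203:53 20.56% {irq263: igb1:que}
12 root -68 - 0K 384K RUN 1 193:47 18.80% {irq262: igb1:que}
11 root 171 ki31 0K 64K RUN 3 439:59 13.96% {idle: cpu3}
16205 root 76 0 108M 30892K piperd 3 1:14 3.86% php
12102 root 76 0 106M 30300K accept 2 1:15 2.69% php
0 root -68 0 0K 240K - 3 46:31 1.56% {igb0 que}
0 root -68 0 0K 240K - 2 50:19 1.46% {igb0 que}
0 root -68 0 0K 240K - 2 43:58 1.27% {igb0 que}
"""
PROBLEM_TOP = """
0 root -68 0 0K 240K CPU0 1 106:38 68.99% {igb0 que}
0 root -68 0 0K 240K CPU3 0 76:25 57.96% {igb0 que}
0 root -68 0 0K 240K - 1 72:17 55.76% {igb0 que}
0 root -68 0 0K 240K RUN 1 63:56 48.78% {igb0 que}
0 root -68 0 0K 240K - 2 30:19 17.87% {igb1 que}
0 root -68 0 0K 240K - 1 22:39 17.77% {igb1 que}
0 root -68 0 0K 240K - 2 21:29 15.28% {igb1 que}
12 root -68 - 0K 384K WAIT 1 247:44 13.48% {irq257: igb0:que}
0 root -68 0 0K 240K CPU2 2 19:06 13.38% {igb1 que}
56079 root 118 20 258M 133M RUN 1 0:04 11.96% pfctl
12 root -68 - 0K 384K RUN 2 257:03 10.16% {irq258: igb0:que}
12 root -68 - 0K 384K WAIT 1 103:36 9.86% {irq262: igb1:que}
12 root -68 - 0K 384K RUN 2 111:38 9.47% {irq263: igb1:que}
12 root -68 - 0K 384K WAIT 3 243:36 9.18% {irq259: igb0:que}
12 root -68 - 0K 384K RUN 3 103:01 7.28% {irq264: igb1:que}
12 root -68 - 0K 384K RUN 0 108:18 6.69% {irq261: igb1:que}
12 root -64 - 0K 384K RUN 3 526:02 6.30% {irq19: atapci0+}
12 root -68 - 0K 384K WAIT 0 241:02 4.79% {irq256: igb0:que}
"""

# WCPU is the first "NN.NN%" field; everything after it is the command/thread name.
LINE_RE = re.compile(r"(\d+(?:\.\d+)?)%\s+(.+)$")


def group_key(command):
    """Collapse per-queue thread names into one bucket per device and thread type."""
    name = command.strip("{}")
    if name.startswith("irq"):
        # "irq256: igb0:que" -> "irq igb0"; "irq19: atapci0+" -> "irq atapci0+"
        return "irq " + name.split(": ", 1)[1].split(":")[0]
    if name.endswith(" que"):
        # "igb0 que" -> "taskq igb0" (queue thread listed under PID 0)
        return "taskq " + name.split()[0]
    if name.startswith("idle"):
        return "idle"
    return name


def sum_wcpu(snapshot):
    """Sum the WCPU percentages of each thread group in a pasted snapshot."""
    totals = defaultdict(float)
    for line in snapshot.strip().splitlines():
        match = LINE_RE.search(line)
        if match:
            totals[group_key(match.group(2))] += float(match.group(1))
    return dict(totals)


for label, snapshot in (("normal", NORMAL_TOP), ("problem", PROBLEM_TOP)):
    print(label)
    for key, pct in sorted(sum_wcpu(snapshot).items(), key=lambda kv: -kv[1]):
        print(f"  {key:<16}{pct:7.2f}%")
```

Summed across the four cores, the `{igb0 que}` threads go from about 4% to about 231%, while the igb0 interrupt threads fall from about 150% to about 38%, and no idle threads make the list at all. So during the problem most of the igb0 packet processing shows up in those queue threads rather than in the interrupt threads; which traffic causes that is a question for the capture.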
-
Well, first of all, you are running the 2.0.3 release, which is from April 2013. Hardly anyone still has systems running that version to test against and help you. Update, but read the upgrade guide first: https://doc.pfsense.org/index.php/Upgrade_Guide
It looks like CPU usage is pretty high even when everything is normal (probably interrupts, by the looks of it).
Sometimes the things mentioned in the NIC tuning guide help: https://doc.pfsense.org/index.php/Tuning_and_Troubleshooting_Network_Cards
-
Unfortunately, upgrading to the new version right now is very difficult: there is no other machine to debug it all on, and earlier attempts did not go smoothly. The configuration would have to be completely rewritten, since it is not compatible with the new versions. As for NIC tuning, I tried it, to no avail. The question is why the load rises so quickly, and only in the last 4 days. Could this be a DDoS?
-
That's possible. I'm not experienced enough to analyze a packet capture and tell whether it's a DoS attack. Some of the members or developers here might be able to help you out.
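On the DDoS question, one cheap first check is whether the small packets come from a handful of sources. The sketch below is hypothetical (`small_packet_talkers` is my name): it reads a classic libpcap file, such as one written by tcpdump, handles Ethernet frames with at most one 802.1Q VLAN tag and IPv4 only, and counts frames shorter than 80 bytes per source address. It does not handle pcapng or nanosecond-resolution pcap files, and a spoofed-source flood spreads across many addresses, so a flat result does not rule an attack out.

```python
import struct
import sys
from collections import Counter

SMALL = 80  # on-the-wire frame size threshold; matches the 40-79 bucket boundary


def small_packet_talkers(data, top=5):
    """Count small IPv4 frames per source address in a classic pcap capture."""
    magic = data[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # file written on a little-endian host
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"
    else:
        raise ValueError("not a classic microsecond pcap file")
    linktype = struct.unpack(endian + "I", data[20:24])[0]
    if linktype != 1:
        raise ValueError("only Ethernet (linktype 1) captures are handled")

    counts = Counter()
    offset = 24  # skip the 24-byte global header
    while offset + 16 <= len(data):
        # Per-record header: ts_sec, ts_usec, captured length, original length.
        _, _, incl_len, orig_len = struct.unpack(endian + "IIII", data[offset:offset + 16])
        frame = data[offset + 16:offset + 16 + incl_len]
        offset += 16 + incl_len
        if orig_len >= SMALL or len(frame) < 14:
            continue
        ethertype = struct.unpack("!H", frame[12:14])[0]
        ip_start = 14
        if ethertype == 0x8100 and len(frame) >= 18:  # one 802.1Q tag
            ethertype = struct.unpack("!H", frame[16:18])[0]
            ip_start = 18
        if ethertype != 0x0800 or len(frame) < ip_start + 20:
            continue  # not IPv4, or truncated header
        src = frame[ip_start + 12:ip_start + 16]
        counts[".".join(str(b) for b in src)] += 1
    return counts.most_common(top)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        for src, n in small_packet_talkers(f.read(), top=10):
            print(f"{src}\t{n}")
```

Run it as `python3 script.py capture.pcap` on a capture taken while the problem is happening. Reading the whole file into memory keeps the sketch short; for multi-gigabyte captures it would need to stream records instead.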