Comments (10)
I had not noticed at first that you were using maxsslrate. Pretty interesting. Maybe you're facing a race condition that prevents it from properly recovering when the limit is met. That's something reasonably easy to try to reproduce on our side by setting a lower limit. At least your "show fd" shows the listener is active (thus not disabled) in the poller. Thanks for these, we'll need a bit of time to analyse it deeper now.
from haproxy.
I checked. At first glance it seems similar, but I doubt it is related, especially because here the listeners don't seem to have been limited when the issue occurred.
from haproxy.
Very strange, I've never heard about any similar report even on the other versions you mention. Was the "show info" above produced when the problem was happening ? Or can't you connect anymore to the stats socket when the problem is happening ? Does it recover only by restarting haproxy ? If you're able to connect to the stats socket, sending a "show stat" and a "show fd" could help.
What I suspect could be related to the size of the backlog: I'm seeing a sessrate of 446 in your "show info" output, which indicates what connection rate is acceptable with SSL negotiation. Let's assume your server can deal with 2k sessions/s including SSL etc. If you receive an attack with more than that, the accept queue will fill up. It will then take 40s to process the last entry in the queue at 2k/s, and by then the client will have aborted but there's no way to know, so it costs a handshake calculation for nothing. In such a case, an approach can be to limit the backlog to a much lower value via the "backlog" keyword on the "bind" line. This way during an attack, you won't be accumulating connections that users gave up, and the recovery can be much faster. Just set that to 2-3x the max rate you can accept so that users don't needlessly wait more than 2-3s before getting an error.
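The arithmetic behind this advice can be sketched as follows. This is only an illustration: the function names are hypothetical, and the 80,000-entry queue is inferred from the "40s at 2k/s" figure above rather than stated in the thread.

```python
def drain_seconds(queued: int, accept_rate: int) -> float:
    """Seconds until the last queued connection is accepted,
    assuming the process accepts at a steady accept_rate per second."""
    return queued / accept_rate


def suggested_backlog(accept_rate: int, max_wait_s: float = 2.5) -> int:
    """Backlog sized so that a queued client waits at most max_wait_s
    (the 2-3x rule of thumb from the comment above)."""
    return int(accept_rate * max_wait_s)


rate = 2000  # sessions/s the server can sustain, as assumed in the comment

# Inferred queue depth: 40 s * 2000/s = 80,000 pending connections.
print(drain_seconds(80_000, rate))   # 40.0
print(suggested_backlog(rate, 2.0))  # 4000
print(suggested_backlog(rate, 3.0))  # 6000
```

With these numbers, a value in the 4000-6000 range would be passed to the "backlog" keyword on the "bind" line; the right figure depends on the rate the host can really sustain.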
from haproxy.
Very strange, I've never heard about any similar report even on the other versions you mention. Was the "show info" above produced when the problem was happening ?
Yes
Or can't you connect anymore to the stats socket when the problem is happening ?
Nope, connecting to the socket is fine
Does it recover only by restarting haproxy ?
That is correct
If you're able to connect to the stats socket, sending a "show stat" and a "show fd" could help.
Will do as soon as I can identify a host that has the issue again
What I suspect could be related to the size of the backlog: I'm seeing a sessrate of 446 in your "show info" output, which indicates what connection rate is acceptable with SSL negotiation. Let's assume your server can deal with 2k sessions/s including SSL etc. If you receive an attack with more than that, the accept queue will fill up. It will then take 40s to process the last entry in the queue at 2k/s, and by then the client will have aborted but there's no way to know, so it costs a handshake calculation for nothing.
We set a max SSL rate of 3k sessions (in fact the limits are dynamic: we set maxsslrate to $(( $(nproc) * 1500 )) and maxconn to $(( $(nproc) * 40000 ))). We've benchmarked it and it works fine for us (<100% CPU/network/memory).
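For reference, the per-core scaling described here works out as below. This is a sketch, not the poster's provisioning script: the constant and function names are made up for illustration, and `os.cpu_count()` can differ from `nproc` when CPU affinity or cgroup limits apply.

```python
import os

# Per-core multipliers quoted in the comment above.
SSL_RATE_PER_CORE = 1500
MAXCONN_PER_CORE = 40000


def limits_for(cores: int) -> dict:
    """Return the maxsslrate and maxconn values for a given core count."""
    return {
        "maxsslrate": cores * SSL_RATE_PER_CORE,
        "maxconn": cores * MAXCONN_PER_CORE,
    }


print(limits_for(2))  # {'maxsslrate': 3000, 'maxconn': 80000}
print(limits_for(4))  # {'maxsslrate': 6000, 'maxconn': 160000}
print(limits_for(os.cpu_count() or 1))  # values for the current machine
```

The 2-core result matches the "3k sessions" figure, and the 4-core result matches the 160000 maxconn and the SslRateLimit of 6000 reported for the second host.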
In such a case, an approach can be to limit the backlog to a much lower value via the "backlog" keyword on the "bind" line. This way during an attack, you won't be accumulating connections that users gave up, and the recovery can be much faster. Just set that to 2-3x the max rate you can accept so that users don't needlessly wait more than 2-3s before getting an error.
Ok I will try tinkering with backlog
from haproxy.
Ok, so on another host with 4 cores, and therefore our dynamic maxconn set to 160000, we get this
$ ss -ltnup 'sport = :443'
Netid State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
tcp LISTEN 0 90000 0.0.0.0:443 0.0.0.0:*
tcp LISTEN 0 90000 0.0.0.0:443 0.0.0.0:*
tcp LISTEN 90001 90000 0.0.0.0:443 0.0.0.0:*
tcp LISTEN 90001 90000 0.0.0.0:443 0.0.0.0:*
This time we hit the non-dynamic limit of net.core.somaxconn = 90000 (which I realise should really be calculated as well).
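For listening sockets, `ss` reports the current accept-queue length in Recv-Q and the configured backlog in Send-Q, so a full queue shows up as Recv-Q at or above Send-Q. A small sketch for spotting this automatically, using the output pasted above as sample input (the function name and the parsing approach are illustrative assumptions):

```python
SS_OUTPUT = """\
Netid State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
tcp LISTEN 0 90000 0.0.0.0:443 0.0.0.0:*
tcp LISTEN 0 90000 0.0.0.0:443 0.0.0.0:*
tcp LISTEN 90001 90000 0.0.0.0:443 0.0.0.0:*
tcp LISTEN 90001 90000 0.0.0.0:443 0.0.0.0:*
"""


def full_listeners(ss_text: str) -> list:
    """Return (local_address, recv_q, send_q) for every LISTEN socket
    whose accept queue has reached its backlog."""
    full = []
    for line in ss_text.splitlines():
        fields = line.split()
        # Skip the header and anything that is not a listening socket.
        if len(fields) < 5 or fields[1] != "LISTEN":
            continue
        recv_q, send_q = int(fields[2]), int(fields[3])
        if recv_q >= send_q:
            full.append((fields[4], recv_q, send_q))
    return full


for addr, recv_q, send_q in full_listeners(SS_OUTPUT):
    print(f"{addr}: accept queue {recv_q} / backlog {send_q}")
```

On the sample above this flags two of the four listeners on port 443, the ones stuck at 90001.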
$ echo show info | socat UNIX-CONNECT:/var/lib/haproxy/stats,connect-timeout=2 stdio
Name: HAProxy
Version: 2.9.6-9eafce5
Release_date: 2024/02/26
Nbthread: 4
Nbproc: 1
Process_num: 1
Pid: 2003
Uptime: 2d 5h33m21s
Uptime_sec: 192801
Memmax_MB: 0
PoolAlloc_MB: 6
PoolUsed_MB: 6
PoolFailed: 0
Ulimit-n: 2000043
Maxsock: 2000043
Maxconn: 1000000
Hard_maxconn: 1000000
CurrConns: 2208
CumConns: 8298281
CumReq: 4057092995
MaxSslConns: 0
CurrSslConns: 2207
CumSslConns: 8191837
Maxpipes: 0
PipesUsed: 0
PipesFree: 0
ConnRate: 3
ConnRateLimit: 0
MaxConnRate: 6002
SessRate: 3
SessRateLimit: 0
MaxSessRate: 6002
SslRate: 3
SslRateLimit: 6000
MaxSslRate: 6001
SslFrontendKeyRate: 4
SslFrontendMaxKeyRate: 7755
SslFrontendSessionReuse_pct: 0
SslBackendKeyRate: 0
SslBackendMaxKeyRate: 0
SslCacheLookups: 40
SslCacheMisses: 40
CompressBpsIn: 0
CompressBpsOut: 0
CompressBpsRateLim: 0
Tasks: 3644
Run_queue: 0
Idle_pct: 71
node: ip-10-2-32-176.ap-southeast-1.compute.internal
Stopping: 0
Jobs: 2221
Unstoppable Jobs: 1
Listeners: 11
ActivePeers: 0
ConnectedPeers: 0
DroppedLogs: 0
BusyPolling: 0
FailedResolutions: 0
TotalBytesOut: 8706499242217
TotalSplicedBytesOut: 0
BytesOutRate: 47007488
DebugCommandsIssued: 0
CumRecvLogs: 0
Build info: 2.9.6-9eafce5
Memmax_bytes: 0
PoolAlloc_bytes: 6503920
PoolUsed_bytes: 6503920
Start_time_sec: 1712051025
Tainted: 0
TotalWarnings: 76
MaxconnReached: 0
BootTime_ms: 156
Niced_tasks: .
$ cat /proc/net/sockstat
sockets: used 183355
TCP: inuse 28965 orphan 0 tw 392 alloc 183185 mem 76212
UDP: inuse 11 mem 0
UDPLITE: inuse 0
RAW: inuse 0
FRAG: inuse 0 memory 0
$ uptime
15:17:47 up 2 days, 5:34, 1 user, load average: 1.40, 1.50, 1.67
$ echo show stat | socat UNIX-CONNECT:/var/lib/haproxy/stats,connect-timeout=2 stdio
# pxname,svname,qcur,qmax,scur,smax,slim,stot,bin,bout,dreq,dresp,ereq,econ,eresp,wretr,wredis,status,weight,act,bck,chkfail,chkdown,lastchg,downtime,qlimit,pid,iid,sid,throttle,lbtot,tracked,type,rate,rate_lim,rate_max,check_status,check_code,check_duration,hrsp_1xx,hrsp_2xx,hrsp_3xx,hrsp_4xx,hrsp_5xx,hrsp_other,hanafail,req_rate,req_rate_max,req_tot,cli_abrt,srv_abrt,comp_in,comp_out,comp_byp,comp_rsp,lastsess,last_chk,last_agt,qtime,ctime,rtime,ttime,agent_status,agent_code,agent_duration,check_desc,agent_desc,check_rise,check_fall,check_health,agent_rise,agent_fall,agent_health,addr,cookie,mode,algo,conn_rate,conn_rate_max,conn_tot,intercepted,dcon,dses,wrew,connect,reuse,cache_lookups,cache_hits,srv_icur,src_ilim,qtime_max,ctime_max,rtime_max,ttime_max,eint,idle_conn_cur,safe_conn_cur,used_conn_cur,need_conn_est,uweight,agg_server_status,agg_server_check_status,agg_check_status,srid,sess_other,h1sess,h2sess,h3sess,req_other,h1req,h2req,h3req,proto,-,ssl_sess,ssl_reused_sess,ssl_failed_handshake,h2_headers_rcvd,h2_data_rcvd,h2_settings_rcvd,h2_rst_stream_rcvd,h2_goaway_rcvd,h2_detected_conn_protocol_errors,h2_detected_strm_protocol_errors,h2_rst_stream_resp,h2_goaway_resp,h2_open_connections,h2_backend_open_streams,h2_total_connections,h2_backend_total_streams,h1_open_connections,h1_open_streams,h1_total_connections,h1_total_streams,h1_bytes_in,h1_bytes_out,h1_spliced_bytes_in,h1_spliced_bytes_out,
www_ssl,FRONTEND,,,2365,66164,160000,2674547,7364640673982,1069039835706,0,0,1290,,,,,OPEN,,,,,,,,,1,2,0,,,,0,1,0,7755,,,,0,4056045898,0,1658,288043,32,,22701,38582,4056375550,,,0,0,0,0,,,,,,,,,,,,,,,,,,,,,http,,1,6001,8191793,0,0,0,0,,,0,0,,,,,,,0,,,,,,,,,,0,2674547,0,0,0,4056375550,0,0,,-,2674555,0,5439614,0,0,0,0,0,0,0,0,0,0,0,0,0,2365,346,2674547,4056374288,8412490505427,1068762896456,0,0,
www,FRONTEND,,,1,5,160000,3466,248130,593027,0,0,37,,,,,OPEN,,,,,,,,,1,3,0,,,,0,0,0,13,,,,0,3234,0,251,3,0,,0,13,3488,,,0,0,0,0,,,,,,,,,,,,,,,,,,,,,http,,0,13,3466,3234,0,0,0,,,0,0,,,,,,,0,,,,,,,,,,0,3466,1,0,0,3488,0,0,,-,0,0,0,0,0,3,0,1,0,0,0,0,0,0,1,0,1,0,3466,3475,296944,596122,0,0,
app,i-0ed07def403d51d04,0,0,1,33537,,5427241,9726848390,1385133631,,0,,33849,271,0,0,UP,1,1,0,5,2,18405,100,,1,4,9,,5427241,,2,89,,9419,L7OK,200,5,0,5393176,0,0,0,0,,,,5393176,15172,25,,,,,0,,,0,0,17,107,,,,Layer7 check passed,,1,3,3,,,,10.2.37.170:8080,,http,,,,,,,,0,60210,5367031,,,1,,0,7226,14244,60170,0,0,1,1,2,1,,,,3,,,,,,,,,,-,0,0,0,,,,,,,,,,,,,,,,,,,,,,
app,i-0f3070f2d8b7b1d0d,0,0,173,14884,,298538885,535274006840,77726797767,,0,,45294,43248,0,0,UP,256,1,0,4,2,18399,150,,1,4,10,,298538885,,2,10902,,20046,L7OK,200,0,0,298466900,0,6,0,0,,,,298466906,370418,14205,,,,,0,,,0,0,17,63,,,,Layer7 check passed,,1,3,3,,,,10.2.34.206:8080,,http,,,,,,,,0,839598,297699287,,,398,,0,7280,17522,60525,0,0,398,173,571,256,,,,3,,,,,,,,,,-,0,0,0,,,,,,,,,,,,,,,,,,,,,,
app,i-07c3394790ff363ff,0,0,172,14882,,296034911,530763867149,77021383861,,0,,50914,39210,0,0,UP,256,1,0,5,2,18418,100,,1,4,11,,296034911,,2,11475,,19980,L7OK,200,3,0,295959262,0,20,0,0,,,,295959282,372788,12903,,,,,0,,,0,0,18,67,,,,Layer7 check passed,,1,3,3,,,,10.2.35.181:8080,,http,,,,,,,,0,847203,295187708,,,399,,0,7248,18999,60953,0,0,399,172,571,256,,,,3,,,,,,,,,,-,0,0,0,,,,,,,,,,,,,,,,,,,,,,
app,i-0541d9d428440e383,0,0,1,225,,3966656,7115916553,1026978140,,0,,0,0,0,0,UP,1,1,0,0,1,18377,73,,1,4,1,,3966656,,2,46,,4191,L7OK,200,1,0,3966647,0,0,0,0,,,,3966647,10625,0,,,,,0,,,0,0,17,86,,,,Layer7 check passed,,1,3,3,,,,10.2.42.23:8080,,http,,,,,,,,0,20316,3946340,,,1,,0,11,1502,60198,0,0,1,1,2,1,,,,4,,,,,,,,,,-,0,0,0,,,,,,,,,,,,,,,,,,,,,,
app,BACKEND,0,0,347,38700,32000,4056374503,7364640718370,1069039943633,0,0,,221932,169328,0,0,UP,514,4,0,,5,18418,31,,1,4,0,,4056373077,,1,22702,,38582,,,,0,4056045898,0,607,288046,32,,,,4056334583,823043,59015,0,0,0,0,0,,,0,0,17,59,,,,,,,,,,,,,,http,leastconn,,,,,,,0,3975617,4052397460,0,0,,,0,7280,18999,60953,0,,,,,514,0,0,0,,,,,,,,,,,-,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1146,347,4052868,4056450328,1063974567729,8696536936083,0,0,
stats,FRONTEND,,,0,3,1000000,12851,1696332,928968202,0,0,0,,,,,OPEN,,,,,,,,,1,5,0,,,,0,0,0,1,,,,0,12851,0,0,0,0,,0,1,12851,,,0,0,0,0,,,,,,,,,,,,,,,,,,,,,http,,0,1,12851,12851,0,0,0,,,0,0,,,,,,,0,,,,,,,,,,0,12851,0,0,0,12851,0,0,,-,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,12851,12851,2017607,929508480,0,0,
stats,BACKEND,0,0,0,0,100000,0,1696332,928968202,0,0,,0,0,0,0,UP,0,0,0,,0,192772,,,1,5,0,,0,,1,0,,0,,,,0,0,0,0,0,0,,,,0,0,0,0,0,0,0,15,,,0,0,1,2,,,,,,,,,,,,,,http,roundrobin,,,,,,,0,0,0,0,0,,,0,0,4,224,0,,,,,0,0,0,0,,,,,,,,,,,-,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
And the result of show fd: 2519.fds.txt
Maybe what these two instances have in common is that they both hit their maxsslrate limit at some point in the past.
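One way to check this across hosts is to compare MaxSslRate against SslRateLimit in the "show info" output. The sketch below uses only the three relevant fields from the dump above; the helper names are hypothetical, and whether reaching the limit actually relates to the stall remains a hypothesis.

```python
SHOW_INFO = """\
SslRate: 3
SslRateLimit: 6000
MaxSslRate: 6001
"""


def parse_show_info(text: str) -> dict:
    """Parse 'Key: value' lines from 'show info' into a dict of strings."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def ssl_limit_was_hit(info: dict) -> bool:
    """True if a limit is configured and the observed peak reached it."""
    limit = int(info.get("SslRateLimit", 0))
    return limit > 0 and int(info.get("MaxSslRate", 0)) >= limit


info = parse_show_info(SHOW_INFO)
print(ssl_limit_was_hit(info))  # True
```

The text fed to `parse_show_info` would come from the same `echo show info | socat ...` command used above.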
from haproxy.
Could it be related to #2476 then?
from haproxy.
However, the fix was backported, thus it can be tested.
from haproxy.
[EDIT: previous version had typo 2.6.9 -> 2.9.6]
As mentioned in the bug description, the present issue happens with "haproxy-next 2.9.6, with patches up to c788ce33af85a28fa66f591cb65a7ea6c0f92007" which includes BUG/MINOR: listener: Wake proxy's mngmt task up if necessary on session release from #2476
from haproxy.
Ha! Now that I re-read my message, I realise there is a typo: it's 2.9.6, not 2.6.9. This applies to my last comment as well, so I'm adding an [EDIT] note.
from haproxy.
Thanks for the confirmation :) So it is indeed another issue.
from haproxy.