
Comments (8)

vkssv avatar vkssv commented on June 11, 2024

Hello, and thanks a lot for your investigation!

The problem of two FIX messages being logged together is very probably related to this content-based rule:

tcp-request content set-var(txn.payload) req.payload(0,0),regsub(\x01,|,g),regsub(^,"Initial Request: <"),concat(">")

Could you please try it without the concat converter at the end and share the log?

tcp-request content set-var(txn.payload) req.payload(0,0),regsub(\x01,|,g),regsub(^,"Initial Request: <")

Also, to debug a little more, could you please retry with the simplified log format and payload rule below and share the logs?

log-format "${TCP_LOG} %[var(txn.payload)]"
tcp-request content set-var(txn.payload) req.payload(0,0)

Thanks,


jeremysprofile avatar jeremysprofile commented on June 11, 2024

Unfortunately, I'm not sure further investigation is possible on my side. We have only seen this issue this one time in the past 12 months, and we do not have the packet capture necessary to replay and retest.

As far as HAProxy routing goes, we only need the source IP and SenderCompID. Given that txn.payload / req.payload(0,0) only appears to be combining messages in the same session, this does not really affect routing to the appropriate backend, as long as you believe this is only a logging artifact and that nothing is causing HAProxy to hold that first message until the second message arrives.

It seemed strange to me that we managed to log two FIX messages given the documentation about HAProxy only inspecting the first message in a session, and I wanted to report it. I admit that our logging setup is a little strange, and am willing to believe this is due entirely to that. I am very bad at networking, and do not understand how req.payload(0,0) gets determined, or how it could hold multiple FIX messages, but as long as HAProxy is in fact sending the traffic to the backend as soon as it reads the first message, then I can handle some infrequent extraneous information in the logs.


vkssv avatar vkssv commented on June 11, 2024

There are two hypotheses:

  1. This problem may be related to an exceeded tune.vars.txn-max-size limit.

https://www.haproxy.com/documentation/haproxy-configuration-manual/2-8r1/#tune.vars.txn-max-size

That is why it would be interesting to see the unmodified txn.payload variable value via these simplified settings:

frontend fix_listener
      ...
      log-format "${TCP_LOG} %[var(txn.payload)]"
      tcp-request content set-var(txn.payload) req.payload(0,0),hex
      ...
  2. In the case where the client logs in with the "Initial Request" (LOGON) message and then immediately logs out ("Logon not received"), the txn variable, under some strange circumstances, contains both request messages.

Thus req.payload(0,0),regsub(\x01,|,g),regsub(^,"Initial Request: <"),concat(">") produces a line like the one below:

...Initial Request: 
<8=FIX.4.4|9=77|35=A|49=YYYXXXINIT|56=ZZZZZ|34=1|52=20240226-12:01:00.968|98=0|108=30|141=Y|10=014|8=FIX.4.4|9=81|35=5|49=YYYXXXINIT|56=ZZZZZ|34=2|52=20240226-12:01:10.921|58=Logon not received|10=048|>

So, again, it would be interesting to see the unmodified txn.payload in the log.

I'm trying to reproduce this. Could you please provide info on which FIX client/server you are using and their versions (probably quickfixgo?), and also the tune settings below, if they are present in your global section (see the example after the list)?
Meanwhile, I will check what the defaults are for them.

tune.vars.global-max-size
tune.vars.proc-max-size
tune.vars.reqres-max-size
tune.vars.sess-max-size
tune.vars.txn-max-size
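
For reference, if any of these were set explicitly, they would appear in the global section roughly like this (the values below are purely illustrative, not recommendations):

global
      ...
      tune.vars.global-max-size 1048576
      tune.vars.proc-max-size   65536
      tune.vars.reqres-max-size 4096
      tune.vars.sess-max-size   4096
      tune.vars.txn-max-size    4096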

Regards,


jeremysprofile avatar jeremysprofile commented on June 11, 2024

We're using quickfixJ 2.3.0 (https://mvnrepository.com/artifact/org.quickfixj/quickfixj-all/2.3.0) for the server we route to - I don't know what the client is using when they connect to us.

We do not currently set any tune.* anywhere, so we're getting the default values.


vkssv avatar vkssv commented on June 11, 2024

Hello, and thanks a lot for the information!

A few updates from our side:

  1. This problem is not related to the tune.vars.txn-max-size default limit, as the modified LOGON message is only about 150 characters. So, if there are no explicit tune.* settings in your global section, the internal buffer is large enough to hold the modified txn.payload value.

  2. The line below has the right syntax; with it I reproduced a well-formatted LOGON message, the same as in your logs.

tcp-request content set-var(txn.payload) req.payload(0,0),regsub(\x01,|,g),regsub(^,"Initial Request: <"),concat(">")

CLIENT GOOD 127.0.0.1 SERVER Initial Request: <8=FIX.4.4|9=76|35=A|34=1|49=CLIENT|52=20240306-17:37:59.008|56=SERVER|98=0|108=43200|141=Y|10=101|>
  3. To reply to your question: req.payload(0,0) lets you fetch the whole FIX message; see the explanation of the syntax here: https://www.haproxy.com/documentation/haproxy-configuration-manual/2-8r1/#7.3.5-req.payload

Then the first regexp, regsub(\x01,|,g), is applied to the whole payload to replace ASCII code 0x01 (the SOH delimiter) with "|". After that, the second regexp, regsub(^,"Initial Request: <"), is applied to the resulting string, adding the "Initial Request: <" prefix at the beginning. Finally, concat(">") appends ">" to the end of the resulting string (see the worked example below).
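
To illustrate the chain step by step, here is a hypothetical walk-through on a shortened LOGON payload (the field values are placeholders, not taken from the logs above):

raw payload from req.payload(0,0), SOH written as \x01:   8=FIX.4.4\x0135=A\x0149=CLIENT\x0156=SERVER\x01...\x01
after regsub(\x01,|,g):                                   8=FIX.4.4|35=A|49=CLIENT|56=SERVER|...|
after regsub(^,"Initial Request: <"):                     Initial Request: <8=FIX.4.4|35=A|49=CLIENT|56=SERVER|...|
after concat(">"):                                        Initial Request: <8=FIX.4.4|35=A|49=CLIENT|56=SERVER|...|>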

  4. I still haven't reproduced this issue on our side, so it would be nice to have some other details:

Could you please dump:

ethtool -k <network_device_name>  => the name of network interface on which you receive FIX traffic 
ethtool -S <network_device_name>

It would be really interesting to have uptime values with the CPU load average, or any CPU usage stats, corresponding to this moment in the logs. Maybe some system monitoring logs corresponding to this date were archived and are thus still available?

For the future, we are interested in the counters that may be produced by:

mpstat -P ALL  # or 
ps -eo pcpu,pid,user,args | sort -k 1 -r | head -5 

I suppose that this problem may be related to GRO or LRO being enabled: due to congestion somewhere on the datapath, the client LOGON and LOGOUT messages arrived at almost the same moment in the network driver code and then in the kernel networking stack. The timestamps that we see in the "merged" payload correspond to the time when the client sent these messages, not to the exact time when they arrived at the haproxy host. To track the arrival time we only have the TCP timestamp in the haproxy log, but by then the message already has the "merged" payload :/

As the messages are very short, LRO or GRO may well have performed this "merge" (in the driver code or the Linux networking code), especially if the system load was high enough at that moment. So, instead of two separate packets, haproxy saw only one with the squashed payload.
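
A quick way to check whether these offloads are currently enabled (a sketch; eth0 is a placeholder for the interface that receives the FIX traffic):

ethtool -k eth0 | grep -E 'generic-receive-offload|large-receive-offload'
# each feature is reported on its own line, e.g. generic-receive-offload: on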

Thanks in advance
Kind regards,


vkssv avatar vkssv commented on June 11, 2024

I also see from the haproxy -vv output that the binary is linked with musl libc and that you are using the PROXY protocol.

  1. Is haproxy run in a container (probably based on Alpine Linux)? Is this container, in turn, run in a VM or directly on the host machine?
    If it is in a VM, we also need a dump of ethtool -k and ethtool -S for the VM's netdev.

  2. Which software are you using to add the PROXY protocol header, and which version of the protocol? It would be nice to have that config as well (a minimal sketch of a typical two-haproxy setup is shown below, for reference).
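
For illustration only, here is a minimal hypothetical sketch of how a PROXY protocol header is typically added by an edge haproxy and accepted by the FIX listener; the names, port and address below are placeholders, not taken from your setup:

# edge proxy: prepends a PROXY protocol v2 header towards the next hop
frontend fix_edge
      bind :9876
      mode tcp
      default_backend fix_inner

backend fix_inner
      mode tcp
      server inner_haproxy 10.0.0.10:9876 send-proxy-v2

# inner haproxy (the FIX listener): parses the PROXY protocol header on incoming connections
frontend fix_listener
      bind :9876 accept-proxy
      mode tcp
      ...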

The most probable scenario is that the TCP segments with the client LOGON and LOGOUT messages arrived either on the port of an intermediate proxy (which adds the PROXY protocol header) or on the port of the haproxy host/VM, and that this happened at almost the same moment due to network congestion (you also mentioned before that you had high latencies at that moment).

As the payload of the segments is small (98 bytes), they were merged by LRO or, most probably, by GRO, especially if haproxy is in a VM. So the payload that haproxy received already contained the LOGON concatenated with the LOGOUT, delivered within one TCP segment.

With the PROXY protocol enabled the size will be a little bigger, but LRO/GRO could still perform this merge, especially if the server is shared with other apps that do networking.

If you have a chance to catch this issue again, could you please provide a short description of your network topology:

  • how many hops are there before the Internet gateway?
  • where is the intermediate proxy that adds the PROXY protocol header placed, and where is haproxy placed (just behind it, or are there other services in between)?

Also useful would be: haproxy logs, the intermediate proxy logs and its full configuration files corresponding to this moment, system logs (to see the whole picture and better analyze the load), and CPU usage stats, which could be produced with vmstat 1 or the commands from my previous message.
In the CPU usage stats, the most interesting parts are:

  • the number of system interrupts (in);
  • context switches (cs);
  • percentage of cpu time on non-kernel processes (us);
  • percentage of cpu time on kernel processes (sy);
  • percentage of idle CPU time (id);
  • percentage of time spent on waiting for input/output (wa);
  • percentage of time stolen from the virtual machine (st).

To conclude, with all the info that I have at the moment, this does not seem to be a haproxy bug, but rather a lack of NIC/TCP stack offload tuning or some misconfiguration on the intermediate proxy side.

Hope this helps,
Thanks in advance,
Regards


jeremysprofile avatar jeremysprofile commented on June 11, 2024

Could you please dump:

ethtool -k <network_device_name>  => the name of network interface on which you receive FIX traffic 
ethtool -S <network_device_name>

Sorry, I am more ignorant than you think I am. I do not know which network device I am receiving FIX traffic on, or how to figure out that information.

Is haproxy run in a container (probably based on Alpine Linux)? Is this container, in turn, run in a VM or directly on the host machine?
If it is in a VM, we also need a dump of ethtool -k and ethtool -S for the VM's netdev.

We run HAProxy in a container in managed Kubernetes in AWS, so yes, container -> VM -> host.

Which software are you using to add the PROXY protocol header, and which version of the protocol? It would be nice to have that config as well.

It's another HAProxy instance, though that one is "directly" on an EC2 instance, so it's just VM -> host.

As the payload of the segments is small (98 bytes), they were merged by LRO or, most probably, by GRO, especially if haproxy is in a VM. So the payload that haproxy received already contained the LOGON concatenated with the LOGOUT, delivered within one TCP segment.

That makes sense to me, thank you for explaining!

With the PROXY protocol enabled the size will be a little bigger, but LRO/GRO could still perform this merge, especially if the server is shared with other apps that do networking.

The host machine probably does many things, since we're only given part of the machine with the VM.

It would be really interesting to have uptime values with the CPU load average, or any CPU usage stats, corresponding to this moment in the logs

For the HAProxy that generated these logs, I only have
[attached screenshot]
(Note: times are shown in America/Denver timezone - 05:00 corresponds to the Feb 26 12:00 UTC that these events occurred.)

I realize this is not at all helpful.

If we see this issue occur again, I will push for gathering more detailed metrics on CPU and network load.
For now, I am very comfortable accepting your hypothesis that the messages were combined into a single TCP segment.
Apologies for taking so much of your time on this - I really appreciate you explaining things!


vkssv avatar vkssv commented on June 11, 2024

Hi! Thanks a lot for sharing your setup. This will be useful for us.

Yes, an LRO/GRO payload merge is a very common issue when data-consuming applications run in a VM; it can produce such strange, random effects, as packets from the same flow must arrive at almost the same moment on the NIC or in the stack.

Here is how it works in more detail (be aware!): https://lwn.net/Articles/358910/

I was unclear: CPU load does not play an explicit role here; if these offload features are enabled in the driver code and/or in the stack, they always do their job. But high load, especially at the hypervisor, will increase latencies in kernel and driver code, so there are more chances that packets destined for the VM will arrive already merged by GRO in the hypervisor's stack, or that they will arrive as-is but at the same moment in the network queue of the vNIC in the VM.

General recommendation: LRO (NIC driver) is more aggressive than GRO in terms of merging, so it is worth starting the tuning by disabling LRO first and seeing if this helps. Normally GRO squashes TCP segments from the same flow only when they arrive at the same moment in the kernel stack, so it is less probable.

Be aware: if you disable GRO and LRO in the VM, it is also recommended to disable checksum offloads, otherwise the VM's stack and vNIC driver will spend their allocated CPU cycles calculating checksums for small packets. In this case TX and RX checksum offloads should be enabled only on the physical NIC of the hypervisor; this gives better performance both in the VM and in the hypervisor (a sketch of the relevant ethtool commands follows).
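
As a minimal sketch (eth0 is a placeholder for the VM's actual interface name, and the commands are run as root), the offloads discussed above can be toggled with ethtool:

ethtool -K eth0 lro off          # disable LRO first, as recommended above
ethtool -K eth0 gro off          # then, if needed, disable GRO as well
ethtool -K eth0 rx off tx off    # optionally disable RX/TX checksum offloads in the VM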

I've just looked at the ENA driver code, as it is very probable that this is what you have as the vNIC in your VMs (you can check with ethtool -i <netdev_name>). At the VM level, in the driver code, you can only disable TSO (offloads on the egress path):
https://elixir.bootlin.com/linux/v6.1.3/source/drivers/net/ethernet/amazon/ena/ena_netdev.c#L4011
It does not support LRO. So GRO needs to be checked in the VM network stack, and it would be worth asking the AWS network people directly what offload settings they have at the hypervisor level.

It is very common that, for less ubiquitous protocols such as FIX, GRO/LRO at the hypervisor level produces surprises.

Kind regards,

