Hey! We do a lot of network flow work. We have a sort of issue using "source" and "d

Top level: "client" and "server" (elastic/ecs)

Comments (31)

ave19 commented on June 10, 2024

@webmat

Who's the client and who's the server, if the flow event comes from an agent that's sitting in between? :-)

Whoever receives the first SYN is the server. Generally, that's the lower port.

[edit: our pcap drop rates are really low, but not zero, so we might miss that SYN. See @robcowart's comment. Also, with UDP you don't even get that. For a UDP service, unless you do protocol inspection, you can't really know whether the packet you saw was the request or the answer.]

The agent in the middle may not be able to tell, though. When we map source and destination to client and server, we don't delete the source and destination bits, those are the ones we're sure about!
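A minimal sketch of the rule above, for illustration only: trust the initial SYN when the capture saw it, otherwise fall back to the lower-port guess. The `Flow` record and its `syn_from_source` field are hypothetical (not Packetbeat or ECS fields), and as the edit note says, the fallback is only a guess and says nothing reliable about UDP.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Flow:
    source_port: int
    destination_port: int
    # Hypothetical field: True if the capture saw the initial SYN (no ACK)
    # travel source -> destination, False if it travelled the other way,
    # None if the handshake was not captured (dropped packet, or UDP).
    syn_from_source: Optional[bool] = None


def guess_server_side(flow: Flow) -> str:
    """Return "destination" or "source": the side believed to be the server."""
    if flow.syn_from_source is not None:
        # Authoritative case: whoever received the initial SYN is the server.
        return "destination" if flow.syn_from_source else "source"
    # Fallback guess: the lower port is assumed to be the listening service.
    # Ties default to "destination".
    if flow.destination_port <= flow.source_port:
        return "destination"
    return "source"


# Usage: a captured handshake wins; otherwise the lower port is the guess.
print(guess_server_side(Flow(51000, 443, syn_from_source=True)))  # destination
print(guess_server_side(Flow(443, 51000)))                        # source
```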

from ecs.

ave19 commented on June 10, 2024

@willemdh I hear you, but think about UDP. Or think about DNS in particular. One system sends a request to another, and the other answers it. Do you swap source and destination on both sides of that? UDP is stateless, so you'd almost have to. That means the DNS server is in half the events on both sides. How do you pie chart where your requests are coming from in that scenario? Filter it in post? It is Elasticsearch I guess. It's not that doing source and destination is wrong in this scenario, it's that it's less easy to work with the data collected.
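To make the pie-chart problem concrete, here is a small sketch with made-up DNS packet events. It assumes the collector did protocol inspection and recorded whether each packet was a query or a response (the hypothetical `qr` key). Counting by source puts the resolver in half of the events; counting by client leaves only the two hosts that actually asked.

```python
from collections import Counter

# Hypothetical packet-level events for two DNS lookups against one resolver.
# "qr" stands in for the DNS header's query/response bit, which is only
# known if the collector inspects the DNS payload.
packets = [
    {"source": ("10.0.0.5", 51514), "destination": ("10.0.0.53", 53), "qr": "query"},
    {"source": ("10.0.0.53", 53), "destination": ("10.0.0.5", 51514), "qr": "response"},
    {"source": ("10.0.0.6", 40000), "destination": ("10.0.0.53", 53), "qr": "query"},
    {"source": ("10.0.0.53", 53), "destination": ("10.0.0.6", 40000), "qr": "response"},
]


def to_client_server(pkt: dict) -> dict:
    """Map a DNS packet's source/destination onto client/server."""
    if pkt["qr"] == "query":
        client, server = pkt["source"], pkt["destination"]
    else:  # a response travels server -> client
        client, server = pkt["destination"], pkt["source"]
    return {"client": client, "server": server}


# "Where are requests coming from?" by source IP: the resolver shows up
# in half of the events.
by_source = Counter(p["source"][0] for p in packets)
# The same question by client IP: only the real requesters appear.
by_client = Counter(to_client_server(p)["client"][0] for p in packets)
```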

dcode commented on June 10, 2024

I like where this is going, but to throw a wrench in it: I'm a huge user of Bro data (and also Suricata). I like the connection top-level object concept, but Bro tracks "client" and "server" a little differently, as does Suricata. Bro calls whoever initiates the TCP/IP connection the "originator" and the other system in the conversation the "responder". Going a layer deeper, Bro will analyze the protocol, and for something like HTTP it will record the "originator" and "responder" of that protocol. In most common protocols, the originator is the same at the TCP/IP layer and the application layer. In several protocols, like SMTP or FTP, that's not guaranteed. In those protocols, it's entirely possible that the "responder" of the TCP/IP connection initiates the protocol as the "originator".

All that said, I think it makes sense to manage "connection" data only at the TCP/IP layer (or the equivalent transport protocol). If there's protocol-specific information that confirms the direction of the application protocol, that can be recorded in a protocol-specific subobject (e.g. network.smtp.originator, network.dns.responder).

Note that I'm not trying to start a religious war over client/server vs. originator/responder. I think for the purposes of ECS, they're equivalent.

Also of note, Suricata uses src and dst for IP addresses, but tracks byte counts as bytes_toserver and bytes_toclient, using semantics similar to Bro's. That is, the initial assessment is based on who sent the first packet (regardless of TCP, UDP, ICMP, etc.) and is confirmed if a more specific protocol analyzer is used.

With all that in mind, I'm in favor of the following (where packets/bytes mean that endpoint sent them):

connection.client.ip: 1.2.3.4
connection.client.port: 12367
connection.client.packets: 180
connection.client.bytes: 1234
connection.server.ip: 6.7.8.9
connection.server.port: 965
connection.server.packets: 150
connection.server.bytes: 1234
connection.protocol: tcp
connection.service: smtp
network.smtp.client.ip: 6.7.8.9

In any case, if I'm receiving packets via a tap or span port, I have no idea which direction they're going (inbound vs. outbound).

EDIT: Added example data
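A rough sketch of how a Suricata EVE `flow` record could be flattened into the layout proposed above. It assumes Suricata's `src_*` side is the client (per the note about who sent the first packet) and that each count means "sent by that endpoint"; the output keys are the names proposed in this comment, not final ECS fields.

```python
def suricata_flow_to_connection(eve: dict) -> dict:
    """Flatten a Suricata EVE "flow" record into the proposed connection.* keys.

    Assumption: the src_* side is the client, and pkts/bytes_toserver were
    therefore sent by the client (toclient by the server).
    """
    flow = eve["flow"]
    return {
        "connection.client.ip": eve["src_ip"],
        "connection.client.port": eve["src_port"],
        "connection.client.packets": flow["pkts_toserver"],  # sent by client
        "connection.client.bytes": flow["bytes_toserver"],
        "connection.server.ip": eve["dest_ip"],
        "connection.server.port": eve["dest_port"],
        "connection.server.packets": flow["pkts_toclient"],  # sent by server
        "connection.server.bytes": flow["bytes_toclient"],
        "connection.protocol": eve["proto"].lower(),
        # Fall back to "unknown" if the record carries no app_proto.
        "connection.service": eve.get("app_proto", "unknown"),
    }


# Sample EVE record built from the example values above.
sample = {
    "event_type": "flow",
    "src_ip": "1.2.3.4", "src_port": 12367,
    "dest_ip": "6.7.8.9", "dest_port": 965,
    "proto": "TCP", "app_proto": "smtp",
    "flow": {"pkts_toserver": 180, "pkts_toclient": 150,
             "bytes_toserver": 1234, "bytes_toclient": 1234},
}
print(suricata_flow_to_connection(sample)["connection.protocol"])  # tcp
```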

dcode commented on June 10, 2024

So, pre 1.0-beta, I implemented as many ECS and ECS-friendly items as I could for RockNSM. In the absence of a firm decision, I went with network.source, network.destination, network.client, and network.server. Since the prevailing votes for source/destination are for top-level fields, I have to re-cast my vote for top-level client/server fields, because, as @ave19 noted, semantics matter.

Now, understandably, having IPs in different fields makes it more difficult to build dashboards and such. In my final Logstash enrichment for generic ECS data, I added an additional field called network.community_id, which is a deterministic hash of a 5-tuple. This is inspired by some work in the Zeek and Suricata communities. It lets us keep the direction context for the logs that support it, keep the client/server context for the logs that support it, and pivot across both types of logs.

I'm not proposing we make community_id core ECS, but it addresses the problem while retaining the best of both worlds. In the meantime, I'll be renaming my fields to use the top-level names source, destination, client, and server.
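For readers unfamiliar with it, the sketch below shows the general shape of a Community ID v1 hash for TCP/UDP, as I understand the public spec: order the two endpoints so both directions of a flow produce the same input, then SHA-1 over seed, addresses, protocol, a padding byte and ports, base64-encoded with a `1:` prefix. ICMP handling is omitted, and this is not the Zeek or Suricata implementation; check it against the reference implementation before relying on cross-tool matches.

```python
import base64
import hashlib
import ipaddress
import struct


def community_id_v1(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
                    proto: int, seed: int = 0) -> str:
    """Sketch of a Community ID v1 hash for TCP/UDP flows (ICMP not handled)."""
    saddr = ipaddress.ip_address(src_ip).packed
    daddr = ipaddress.ip_address(dst_ip).packed
    # Order the endpoints so both directions of a flow hash identically.
    if (saddr, src_port) > (daddr, dst_port):
        saddr, daddr = daddr, saddr
        src_port, dst_port = dst_port, src_port
    data = (
        struct.pack("!H", seed)           # 2-byte seed, network byte order
        + saddr + daddr                   # raw address bytes
        + struct.pack("!BB", proto, 0)    # IP protocol number + 1 pad byte
        + struct.pack("!HH", src_port, dst_port)
    )
    digest = hashlib.sha1(data).digest()
    return "1:" + base64.b64encode(digest).decode("ascii")


# Both directions of the same TCP (protocol 6) conversation share one ID.
a = community_id_v1("1.2.3.4", "6.7.8.9", 12367, 965, 6)
b = community_id_v1("6.7.8.9", "1.2.3.4", 965, 12367, 6)
assert a == b
```

Because the ID ignores direction, it can join a source/destination event and a client/server event describing the same conversation, which is the pivot described above.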

webmat commented on June 10, 2024

Indeed, we've been pondering that already, after @robcowart's comment in this thread.

I wonder, with this strategy, which side is called the "server" in cases where the infrastructure under management is calling out (e.g. calling an external API, or triggering webhooks to arbitrary customer endpoints).

In common parlance, my node generating the event would be the "client", and the remote (which I may or may not manage) would be the "server".

In parallel, it may be worth mentioning that, for other reasons, we're starting to discuss classifying IPs (local, private, public, multicast), which may help figure out which side is which.
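One possible way to bucket addresses into those categories, sketched with Python's standard `ipaddress` module. Mapping "local" to loopback and link-local is an assumption on my part; the thread doesn't define the categories precisely.

```python
import ipaddress


def classify_ip(addr: str) -> str:
    """Bucket an address into local / private / public / multicast."""
    ip = ipaddress.ip_address(addr)
    # Order matters: loopback and link-local ranges are also flagged
    # is_private by the ipaddress module, so test them first.
    if ip.is_multicast:
        return "multicast"
    if ip.is_loopback or ip.is_link_local:
        return "local"
    if ip.is_private:
        return "private"
    if ip.is_global:
        return "public"
    return "reserved"  # e.g. the 100.64.0.0/10 shared address space
```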

webmat commented on June 10, 2024

Another note about this, more applicable for security, but also when monitoring network gear.

Who's the client and who's the server, if the flow event comes from an agent that's sitting in between? :-)

robcowart commented on June 10, 2024

Basically a server provides a service (a port or group of ports). Clients connect to those services. A server will only respond to a client. It doesn't initiate conversations. Conversely a client only listens for responses. It doesn't listen for arbitrary connection requests.

The determination of client and server can be quite tricky if you don't have a record of the initial packet transmitted (such as the SYN packet sent to initiate the TCP handshake). 20 years ago you could be >90% accurate simply by assuming the lower port value is the server and the larger value is the client. However, with so many applications now listening on higher ports (e.g. ES 9200, LS 9600, Kafka 9092, etc.), you will get at best about 65% accuracy with this method. Many log sources are a bit more authoritative in this regard than flow records.

Basically there isn't a single method that works. A combination of data source specific methods that arrive at a consensus is usually necessary. With the solution we provide to our paying customers, we find that we are about 95% accurate out-of-the-box. With some tuning (it can be customized) 98-99% is possible.

@webmat can you provide a more specific example of what you are referring to regarding an "agent"?
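As an illustration of the "consensus" idea (not the method used in the commercial solution mentioned above, which isn't described here), a toy vote over three signals: the initial SYN if seen, a table of known service ports, and the lower-port rule. `KNOWN_SERVICE_PORTS` and the weights are made-up assumptions.

```python
from collections import Counter
from typing import Optional

# Hypothetical table of ports treated as server-side listeners. 9092, 9200
# and 9600 come from the comment above; the rest are common examples.
KNOWN_SERVICE_PORTS = {22, 53, 80, 443, 9092, 9200, 9600}


def vote_server_side(src_port: int, dst_port: int,
                     syn_from_source: Optional[bool] = None) -> str:
    """Combine weak signals into "source", "destination" or "unknown".

    The weights are arbitrary illustration values, not tuned numbers.
    """
    votes: Counter = Counter()
    # Strongest signal: the initial SYN, when it was captured.
    if syn_from_source is not None:
        votes["destination" if syn_from_source else "source"] += 3
    # Medium signal: exactly one side uses a known service port.
    src_known = src_port in KNOWN_SERVICE_PORTS
    dst_known = dst_port in KNOWN_SERVICE_PORTS
    if dst_known and not src_known:
        votes["destination"] += 2
    elif src_known and not dst_known:
        votes["source"] += 2
    # Weakest signal: the old "lower port is the server" rule of thumb.
    if dst_port < src_port:
        votes["destination"] += 1
    elif src_port < dst_port:
        votes["source"] += 1
    if votes["source"] == votes["destination"]:
        return "unknown"
    return "source" if votes["source"] > votes["destination"] else "destination"
```

For example, a client on ephemeral port 1050 talking to Elasticsearch on 9200 fools the lower-port rule on its own, but the known-port signal outvotes it.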

robcowart commented on June 10, 2024

I will also add that local/private/public isn't much help when determining client/server. However reserved multicast and broadcast IP and MAC addresses will always be associated with the server end of the conversation. This is one input for the "consensus" method we use.
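A small sketch of that input: if either endpoint is a multicast/broadcast IP or MAC address, report that side as the server end. The helper names are hypothetical. Subnet-directed broadcasts aren't detected, because that needs the netmask.

```python
import ipaddress
from typing import Optional

IPV4_LIMITED_BROADCAST = ipaddress.IPv4Address("255.255.255.255")


def is_group_ip(addr: str) -> bool:
    """Multicast, or the IPv4 limited-broadcast address."""
    ip = ipaddress.ip_address(addr)
    # Subnet-directed broadcasts (e.g. 192.168.1.255 in a /24) need the
    # netmask, which a flow record usually lacks, so they are not detected.
    return ip.is_multicast or ip == IPV4_LIMITED_BROADCAST


def is_group_mac(mac: str) -> bool:
    """The I/G bit (lowest bit of the first octet) marks multicast/broadcast MACs."""
    first_octet = int(mac.replace("-", ":").split(":")[0], 16)
    return bool(first_octet & 0x01)


def group_address_hint(src_ip: str, dst_ip: str,
                       src_mac: Optional[str] = None,
                       dst_mac: Optional[str] = None) -> Optional[str]:
    """Return the side holding a group address (the server end), or None."""
    for side, ip, mac in (("source", src_ip, src_mac),
                          ("destination", dst_ip, dst_mac)):
        if is_group_ip(ip) or (mac is not None and is_group_mac(mac)):
            return side
    return None
```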

robcowart commented on June 10, 2024

The last point I will make is that it is not an either/or situation. While client/server is the preferred perspective for most use-cases, src/dst is needed for some types of threat detection.

Consider a few security related analytics scenarios...

A port scan will be from client to server.
However, an amplification attack is analyzed from sources to destinations, where the source port is that of a well-known UDP service (e.g. 53 for DNS).

So, depending on what we are looking for, our analytics configuration will sometimes use src/dst and sometimes use client/server.
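A sketch of the two perspectives side by side, over hypothetical flat events keyed by ECS-style names. The scan check groups by client and server; the amplification check looks at the source port. The thresholds and the NTP port (123) are illustrative assumptions, not tuned values.

```python
from collections import defaultdict
from typing import Iterable

# Hypothetical flat events keyed by ECS-style field names.
Event = dict


def port_scan_suspects(events: Iterable[Event], min_ports: int = 100) -> set:
    """Client/server view: (client, server) pairs touching many server ports."""
    ports = defaultdict(set)
    for e in events:
        ports[(e["client.ip"], e["server.ip"])].add(e["server.port"])
    return {pair for pair, seen in ports.items() if len(seen) >= min_ports}


def amplification_candidates(events: Iterable[Event],
                             service_ports=frozenset({53, 123})) -> list:
    """Source/destination view: UDP packets whose *source* port is a
    well-known UDP service (53 is DNS; 123, NTP, is an added example)."""
    return [e for e in events
            if e["network.transport"] == "udp"
            and e["source.port"] in service_ports]
```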

ave19 commented on June 10, 2024

@webmat

in cases where the infrastructure under management is calling to the outside (e.g. calling an external API, triggering webhooks to arbitrary customer endpoints).

The host running the API service is also generating logs. From that service's perspective, it's the server (running a service) and your caller is the client.

In the events coming from your server, it's the server, and things that connect to it are clients.

ave19 commented on June 10, 2024

@robcowart

The determination of client and server can be quite tricky

You are so right!

ave19 commented on June 10, 2024

Honestly, I don't expect a flow's interpretation of client and server to be 100% accurate for all the reasons @robcowart points out.

Most of the time we're going to use those tags, we're applying them to logs coming from things like web servers. From inside a web server's event feed, source and destination don't really apply. And if you're one of ten servers running on a host, you might have a server.ip different from the others, each of those could differ from the host.ip of the machine you run on, and from the agent.ip or device.ip where your logs get sent, or what have you.

It just makes a little space.

ave19 commented on June 10, 2024

And for cyber reasons, having all of your servers (on whatever boxes) call all of their clients "client" allows us to more easily track one or more IPs that might be up to something, by correlating logs from all the services running on all the hosts.
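A minimal sketch of that correlation, assuming events from different hosts and services all carry `client.ip`, `host.name` and `service.name` (the latter two names are assumptions for illustration). It lists clients seen by several distinct host/service pairs.

```python
from collections import defaultdict


def clients_across_services(events, min_services: int = 3) -> dict:
    """Map client.ip -> number of distinct (host, service) pairs it touched,
    keeping only clients seen by at least min_services of them."""
    seen = defaultdict(set)
    for e in events:
        seen[e["client.ip"]].add((e["host.name"], e["service.name"]))
    return {ip: len(s) for ip, s in seen.items() if len(s) >= min_services}


# Hypothetical events from three services on three hosts.
events = [
    {"client.ip": "203.0.113.7", "host.name": "web1", "service.name": "nginx"},
    {"client.ip": "203.0.113.7", "host.name": "web2", "service.name": "nginx"},
    {"client.ip": "203.0.113.7", "host.name": "mail1", "service.name": "postfix"},
    {"client.ip": "198.51.100.2", "host.name": "web1", "service.name": "nginx"},
]
```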

robcowart commented on June 10, 2024

I agree with you @ave19, for some data sources the client and server are clear. We still set source and destination fields, but will also set something like "[metadata][isServer]" => "destination". When the event hits the client/server determination logic, this flag will cause the more complicated logic to be bypassed, and the client/server fields to be set with a simple assignment.
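A sketch of that flow in Python rather than Logstash: the `isServer` flag short-circuits to a simple assignment, a slower `guess` heuristic runs only when the flag is missing, and source/destination are kept alongside the new client/server objects. Representing `[metadata][isServer]` as a nested `metadata` dict is an assumption.

```python
from typing import Callable, Optional


def add_client_server(event: dict,
                      guess: Optional[Callable[[dict], str]] = None) -> dict:
    """Add client/server objects while keeping source/destination.

    event["metadata"]["isServer"] ("source" or "destination") stands in for
    the [metadata][isServer] flag; `guess` is any slower heuristic that is
    only called when the flag is absent.
    """
    server_side = event.get("metadata", {}).get("isServer")
    if server_side is None and guess is not None:
        server_side = guess(event)
    if server_side not in ("source", "destination"):
        return event  # undetermined: keep only source/destination
    client_side = "source" if server_side == "destination" else "destination"
    enriched = dict(event)  # shallow copy; the input event is not mutated
    enriched["server"] = dict(event[server_side])
    enriched["client"] = dict(event[client_side])
    return enriched


evt = {
    "source": {"ip": "10.0.0.5", "port": 51000},
    "destination": {"ip": "10.0.0.80", "port": 443},
    "metadata": {"isServer": "destination"},
}
print(add_client_server(evt)["server"])  # {'ip': '10.0.0.80', 'port': 443}
```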

ave19 commented on June 10, 2024

@robcowart Interesting... we have lots of different kinds of feeds, so lots of different parsing logic. Most of the time, we can go straight to the server.ip form.

willemdh commented on June 10, 2024

Although I can definitely understand your points @ave19, for me source and destination are clearer and less confusing/ambiguous than client and server. When two applications are exchanging data through an ESB, I'd really prefer to be able to use source and destination objects in the ESB logs. But then again, I'm a system engineer, not a network engineer.

willemdh commented on June 10, 2024

@ave19 OK, I can definitely use the client/server approach for F5 / Palo Alto use cases.
If I need 'non-connection'-related source/destination info,
I can always create my own (private) source and destination objects.

So, looking at #51, this is where ECS would go then:

Field	Description	Type
connection.server.host.ip	IP address of the server. Can be one or multiple IPv4 or IPv6 addresses.	ip
connection.server.host.name	Hostname of the server.	keyword
connection.server.host.port	Port of the server.	long
connection.server.host.mac	MAC address of the server.	keyword
connection.server.host.domain	Server domain.	keyword
connection.server.host.subdomain	Server subdomain.	keyword
connection.client.host.ip	IP address of the client. Can be one or multiple IPv4 or IPv6 addresses.	ip
connection.client.host.name	Hostname of the client.	keyword
connection.client.host.port	Port of the client.	long
connection.client.host.mac	MAC address of the client.	keyword
connection.client.host.domain	Client domain.	keyword
connection.client.host.subdomain	Client subdomain.	keyword
connection.direction	Direction of the network traffic. Recommended values are: inbound, outbound, unknown.	keyword
connection.forwarded_ip	Host IP address when the client IP address is the proxy.	ip

Shouldn't we move network.session_id to the connection object too then? See #37

ruflin commented on June 10, 2024

Thanks for all the discussion above. My takeaway so far is that server/client would not necessarily replace source/destination; both can exist at the same time and complement each other.

What if we have all four? I personally like adding server and client, as they feel more intuitive, especially for web server logs.

ave19 commented on June 10, 2024

Heh, um, at the risk of scuttling my own topic: I was poking this today and decided that service might be better than server since I can pack more than one service into a single box. But that means that I can use host instead of server, and if I put things like the (possibly virtual) ip info into service.ip and service.port, I might be able to make that part work with existing top level fields. A service could be in a docker container on a host and so forth. I think calling it service makes it clear it's not necessarily a box. Thoughts about that part?

To be clear, this is mostly about logs coming from that running instance of the service (i.e. Apache). That service will report that a client connected to it, so I think I still want client as a top level. Things like service.state (with a value like running) still apply.

The logs from that service will leave artifacts that allow me to collect information about the host and agent along the way. I think this is enough to let me trace that log event back to the origin.

webmat commented on June 10, 2024

@robcowart What I meant by "agent" was simply a monitoring agent like Packetbeat. Perhaps a misnomer, because in some cases, the event source will be a device itself being poked from the outside. But I just meant whatever was collecting the traffic event data.

Given the current consensus of how tricky it can be to reliably determine who's the server & who's the client, I think we don't have a choice but to keep source & destination. Then in cases where we can reliably determine server/client, we can add the appropriate fields.

Or were you actually removing src/dst whenever you got to reliably determine srv/cli?

robcowart commented on June 10, 2024

As I mentioned above, both can be valuable, depending on what you are trying to determine.

I mentioned in another issue that I would prefer to have src/dst and then a flag field like isServer, which would be either src or dst. This would avoid A LOT of duplicate data. Unfortunately, Kibana doesn't work like that. You could possibly get away with scripted fields for some things, but not all viz types support scripted fields (e.g. Timelion and TSVB).

Until there is more flexibility I will continue to tell myself "disks are cheap" and will value functionality and great user-experience over a few extra HDDs.

ave19 commented on June 10, 2024

I agree, both are valuable. I also agree disks are cheap!

In a network flow monitoring situation, the only thing you can really reliably know is source and destination. However, for cyber, that means doing extra work to figure out which end of that connection is the machine you're trying to defend. (Maybe it's the low port, etc.) After you do that work, you can map source and destination into server and client (or service) as appropriate. But, we would never give up the fields we know so we will end up with all four fields. (If your use case doesn't care about who the server was, then sorting them out is optional.)

When will we get per document field aliasing? 😄 That would be the best scenario.

In a service monitoring situation, if the service has an open port and responds to queries, it's a straight client and server (or service) case, and since those terms are more descriptive (for our cyber mission anyway) we'd use those over source or destination. If the service is a pusher and makes logs, it's still the service but it might be appropriate to call the other end the server. See, I like service now. 😄 I can pin a lot of things to it!

robcowart commented on June 10, 2024

@dcode I use client/server determination with Suricata data here...
https://github.com/koiossian/synesis_lite_suricata

I would appreciate hearing your feedback on how it is handled, and whether you see any issues.

webmat commented on June 10, 2024

This is not feedback, this is just a more precise pointer ;-) Client vs Server code starts at line 601 here

See also various places between lines 209 and 468 for the traffic locality determination.

webmat commented on June 10, 2024

One thing I like about it is that it's entirely based on information taken from the event itself (including some fast translate-based enrichment).

It doesn't depend on doing an Elasticsearch search per event.

webmat commented on June 10, 2024

@ave19 To answer your question on aliasing, here's the progress so far. The concept of alias is available in recent builds, but still incomplete (in my opinion) for what we're trying to achieve.

So if you use a recent build of Elasticsearch, you can search in Kibana -- and even leverage the new auto-complete -- based on your "original" field just as much as your alias.

What's still missing is the ability to display based on the alias' name. Your visualizations and API results will only contain the original field. I haven't checked yet if the response includes a mapping of the aliases, so clients could handle this however they want. I suspect the alias mapping is not returned yet either.

ave19 commented on June 10, 2024

Sounds like progress! Thanks

On Mon, Aug 6, 2018, 11:14 AM Mathieu Martin wrote: @ave19 To answer your question on aliasing, here's the progress so far. The concept of alias is available in recent builds, but still incomplete (in my opinion) for what we're trying to achieve. So if you use a recent build of ElasticSearch, you can search in Kibana -- and even leverage the new auto-complete -- based on your "original" field just as much as your alias. What's still missing is the ability to display based on the alias' name. Your visualizations and API results will only contain the original field. I haven't checked yet if the response includes a mapping of the aliases, so clients could handle this however they want. I suspect the alias mapping is not returned yet either.

-ave

ruflin commented on June 10, 2024

This discussion triggered a more general question on my end about what our "standard" is for reusing/composing objects. I opened a separate issue for it so as not to mix it into this discussion: #71

vbohata commented on June 10, 2024

+1 for having server, client, source and destination. I can imagine some application logs may require all of them (DHCP, for example). Also, web application logs contain client and server (source and destination would be an odd fit there).

webmat commented on June 10, 2024

I love the idea of supporting community_id in ECS eventually, thanks for bringing this up.

webmat commented on June 10, 2024

@dcode We're introducing network.community.id. Check out #208.
