
Distributable (jmespath issue, closed)

kintyre commented on June 12, 2024
Distributable

from jmespath.

Comments (7)

lowell80 commented on June 12, 2024

Yes. I'm unsure if you mean distributed search (where the app is distributed to the indexers via bundle replication), or deployed to multiple search heads. But in either case, it should work fine.

I have seen some issues with distributed search in Splunk Cloud where certain bits weren't replicated correctly, but I was able to reproduce the issue with other custom search commands (distributed by apps written by Splunk), so I concluded that the issue was outside my control and handed it over to Splunk Support.

Have you tried something that hasn't worked as expected? Or is this just a general question?


mwisnicki commented on June 12, 2024

This is the definition of distributable I've had in mind: https://docs.splunk.com/Documentation/Splunk/7.3.1/SearchReference/Commandsbytype#Streaming_commands

It's more of a general question. I'm not that deep into Splunk, but I was warned that since this is a centralized streaming command, I should avoid it in dashboards and reports for performance reasons.

From what I was able to gather, a distributable command would have distributed = true in commands.conf, which this project does not.


lowell80 commented on June 12, 2024

Ah, yes, the command is streamable. The jmespath command looks at a single event at a time and doesn't let information cross that boundary, so the command can be distributed and run either on the indexers or on the search head, depending on how a search is constructed.

Performance isn't dictated simply by the classification of the search type (streaming vs. transforming, stateful vs. non-stateful, ...). It ends up being much more complicated than that, and there are plenty of good resources on search optimization. The most basic rule is to get your base search correct (eliminate unwanted data as early as possible in the process). If you're new to Splunk, then I'd suggest not worrying about that yet. Aim for functionality first, and once you get the hang of things, then look at optimizations; often it's not a big deal, but it's very use-case dependent. That being said, there is, and always will be, some extra overhead for external search commands like jmespath. For example, if the built-in spath command does everything you need, then it will perform much more quickly than jmespath, which launches an external Python process.

BTW, I've written a bit about which one to choose here:
https://github.com/Kintyre/jmespath/wiki/Command-Reference-jmespath#when-to-use-jmespath-vs-spath

I'm not sure about the distributed = true setting in commands.conf. I think the option you are looking for is streaming = true, which is set for jmespath. At some point I'm going to upgrade the interface to use the Splunk Python SDK, which uses the newer-style "chunked" interface rather than the old internal (non-published) splunk.Intersplunk library, but none of this will change the streaming behavior.
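Editor's illustration: the point above, that a command which looks at one event at a time can run on the indexers or on the search head with the same result, can be sketched in plain Python. This is not the jmespath command's code and uses no Splunk API; `extract_field`, the sample events, and the two-way "indexer" split are all hypothetical, and simple key navigation stands in for a real JMESPath expression.

```python
import json


def extract_field(events, path):
    """Streaming step: handle one event at a time and keep no state between events."""
    for raw in events:
        value = json.loads(raw)
        for key in path:
            # Walk one level down; anything that is not a dict yields None.
            value = value.get(key) if isinstance(value, dict) else None
        yield {"_raw": raw, "output": value}


# Hypothetical sample events (one JSON document per event).
events = [
    '{"user": {"name": "alice"}}',
    '{"user": {"name": "bob"}}',
    '{"user": {"name": "carol"}}',
    '{"host": "web01"}',
]
path = ["user", "name"]

# Centralized: every event is processed in one place (think: the search head).
central = list(extract_field(events, path))

# Distributed: each "indexer" processes only its own slice; results are merged afterwards.
indexer_a = list(extract_field(events[:2], path))
indexer_b = list(extract_field(events[2:], path))
distributed = indexer_a + indexer_b

# Because no information crosses event boundaries, both approaches agree.
print(central == distributed)
```

A command that needed to see all events at once (a sort or a running total, for example) would not pass this check when split across slices, which is why only per-event commands can be pushed down this way.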


mwisnicki commented on June 12, 2024

Thanks for the explanation. I actually found the information about distributed = true being needed in splunklib in this repo:

    distributed = ConfigurationSetting(value=True, doc='''
        :const:`True`, if this command should be distributed to indexers.
        Under SCP 1 you must either specify `local = False` or include this line in commands.conf_, if this command
        should be distributed to indexers.
        ..code:
            local = true
        Default: :const:`True`
        Supported by: SCP 2
        .. commands.conf_: http://docs.splunk.com/Documentation/Splunk/latest/Admin/Commandsconf
        ''')

Do you know what the performance difference is for simple property navigation with jmespath compared to an equivalent spath query?

Do you think it's possible to write a Splunk extension (in whatever language) that would match or get close to the performance of a built-in command?


lowell80 commented on June 12, 2024

Okay, that's in the Splunk Python SDK, which is already in use for the jsonformat command and will eventually be used for jmespath (hopefully before the 2.0 release). Under the covers, the mode of operation for "chunked" search commands is negotiated at runtime between Splunk and the external search command, and therefore very little definition ends up in commands.conf. Note that distributed defaults to True.

I haven't done any official performance comparisons. For most of the searches where I use it, it's because spath can't get the job done, or it would take a half-dozen SPL commands to do the equivalent of what a single jmespath expression can do. In those cases, any performance hit becomes effectively irrelevant to me. Of course performance can't suck. In practice, I haven't seen jmespath become the performance bottleneck.

Can you clarify: is this an academic concern, or have you tried it and run into performance issues?
Do you have very demanding performance requirements?
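Editor's illustration: for readers who want a rough feel for the external-process overhead mentioned earlier in the thread, the sketch below times the same trivial JSON lookup done in-process and done by launching a fresh Python interpreter. It is not a benchmark of jmespath or spath and says nothing about how Splunk itself schedules external commands; the function names and the sample event are hypothetical, and the numbers will vary by machine.

```python
import json
import subprocess
import sys
import time

# Hypothetical sample event.
EVENT = '{"user": {"name": "alice"}}'


def in_process(raw):
    """Do the lookup inside the current interpreter."""
    return json.loads(raw)["user"]["name"]


def external_process(raw):
    """Do the same lookup in a freshly launched Python interpreter."""
    code = "import json, sys; print(json.loads(sys.stdin.read())['user']['name'])"
    result = subprocess.run(
        [sys.executable, "-c", code],
        input=raw,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def timed(func, raw):
    """Return the function's result together with its wall-clock duration in seconds."""
    start = time.perf_counter()
    value = func(raw)
    return value, time.perf_counter() - start


inner_value, inner_seconds = timed(in_process, EVENT)
outer_value, outer_seconds = timed(external_process, EVENT)

print(f"in-process:       {inner_seconds:.6f} s -> {inner_value}")
print(f"external process: {outer_seconds:.6f} s -> {outer_value}")
```

The external variant is dominated by interpreter start-up, a fixed cost that matters less as the amount of work done per launch grows.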


mwisnicki commented on June 12, 2024

Right now I'm just trying to satisfy my curiosity :)


lowell80 commented on June 12, 2024

My experience has been that it's typically fast enough. Use cases where super high performance is necessary typically aren't well suited for Splunk (or any other big data platform) in the first place.

If you use it and find that things are slower than you'd expect, please reach back out, and we'll see what can be done. Most often, there are ways to restructure a search to make it much faster, because jmespath usually doesn't end up being the bottleneck. It's typically orders of magnitude faster than, say, the time it takes to pull the raw events off disk.

I'm going to go ahead and close this. BTW, a great resource for general (or highly specific) Splunk performance questions is the Splunk User Slack channel or Splunk Answers. (I can be found on both.)

