Comments (7)
Yes. I'm unsure if you mean distributed search (where the app is distributed to the indexers via bundle replication), or deployed to multiple search heads. But in either case, it should work fine.
I have seen some issues with distributed search on Splunk Cloud where certain bits apparently weren't replicated correctly, but I was able to reproduce the issue with other custom search commands (distributed by apps written by Splunk), so I concluded that the issue was outside my control and handed it over to Splunk Support.
Have you tried something that hasn't worked as expected, or is this just a general question?
from jmespath.
This is the definition of distributable I've had in mind: https://docs.splunk.com/Documentation/Splunk/7.3.1/SearchReference/Commandsbytype#Streaming_commands
It's more of a general question. I'm not that deep into Splunk, but I was warned that since this is a centralized streaming command, I should avoid it in dashboards and reports for performance reasons.
From what I was able to gather, a distributable command would have distributed = true in commands.conf, which this project does not.
from jmespath.
Ah, yes, the command is streamable. The jmespath command looks at a single event at a time and doesn't let information cross that boundary. Therefore the command can be distributed, so it can run either on the indexers or on the search head, depending on how a search is constructed.
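To illustrate why a command that only ever sees one event at a time can be distributed, here is a minimal sketch in plain Python. It is not the jmespath command's actual code: the dotted-path lookup is a simplified stand-in for a JMESPath expression, and `extract_field`, `stream`, and the two "indexer" lists are hypothetical names made up for this example.

```python
import json


def extract_field(event, path):
    """Walk a dotted path through one event's JSON; nothing is shared between calls."""
    node = json.loads(event["_raw"])
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def stream(events, path, output="result"):
    """Process events one at a time, the way a streaming command does."""
    for event in events:
        yield {**event, output: extract_field(event, path)}


events = [{"_raw": json.dumps({"user": {"id": n}})} for n in range(6)]

# Run everything in one place (think: the search head) ...
central = list(stream(events, "user.id"))

# ... or split the events across two hypothetical "indexers" and concatenate.
indexer_a, indexer_b = events[:3], events[3:]
distributed = list(stream(indexer_a, "user.id")) + list(stream(indexer_b, "user.id"))
```

Because no state crosses the event boundary, both runs produce the same results, which is what makes it safe to run the work wherever the events happen to be.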
Performance isn't dictated simply by the classification of the search type (streaming vs transforming, stateful/non-stateful, ...). It ends up being much more complicated than that, and there are plenty of good resources on search optimization. The most basic rule is to get your base search correct (eliminate unwanted data as early as possible in the process). If you're new to Splunk, then I'd suggest not worrying about that yet. Aim for functionality first, and once you get the hang of things, then look at optimizations; often it's not a big deal, but it's very use-case dependent. That being said, there is, and always will be, some extra overhead for external search commands like jmespath. For example, if the built-in spath command does everything you need, then that will perform much more quickly than jmespath, which launches an external Python process.
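As a rough feel for where that fixed overhead comes from, the sketch below compares doing a trivial JSON lookup in-process with doing the same lookup by launching a fresh Python interpreter. This is only an illustration under stated assumptions: it does not measure Splunk, spath, or jmespath, Splunk does not launch a process per event, and the helper names (`in_process`, `external_process`, `timed`) are hypothetical.

```python
import json
import subprocess
import sys
import time

EVENT = json.dumps({"user": {"id": 42}})


def in_process(raw):
    """Do the lookup inside the current interpreter (stand-in for a built-in command)."""
    return json.loads(raw)["user"]["id"]


def external_process(raw):
    """Launch a fresh interpreter, feed the event on stdin, read the answer from stdout."""
    script = "import json, sys; print(json.load(sys.stdin)['user']['id'])"
    done = subprocess.run(
        [sys.executable, "-c", script],
        input=raw, capture_output=True, text=True, check=True,
    )
    return int(done.stdout)


def timed(func, raw):
    """Return the function's result together with its wall-clock duration in seconds."""
    start = time.perf_counter()
    value = func(raw)
    return value, time.perf_counter() - start


builtin_value, builtin_secs = timed(in_process, EVENT)
external_value, external_secs = timed(external_process, EVENT)
print(f"in-process: {builtin_secs:.6f}s  external: {external_secs:.6f}s")
```

The absolute numbers depend entirely on the machine; the point is only that the external path pays a startup cost that the in-process path does not.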
BTW, I've written a bit about which one to choose here:
https://github.com/Kintyre/jmespath/wiki/Command-Reference-jmespath#when-to-use-jmespath-vs-spath
I'm not sure about the distributed = true setting in commands.conf. I think the option you are looking for is streaming = true, which is set for jmespath. At some point I'm going to upgrade the interface to use the Splunk Python SDK, which uses the newer-style "chunked" interface rather than the old internal (non-published) splunk.Intersplunk library, but none of this will change the streaming behavior.
from jmespath.
Thanks for the explanation. I actually found the information about distributed = true being needed in splunklib in this repo:
jmespath/bin/splunklib/searchcommands/streaming_command.py
Lines 124 to 139 in d3c1f0c
Do you know what the performance difference is for simple property navigation with jmespath compared to the equivalent spath query?
Do you think it's possible to write a Splunk extension (in whatever language) that would match or get close to the performance of a built-in command?
from jmespath.
Okay, that's in the Splunk Python SDK, which is already in use for the jsonformat command and eventually will be used for jmespath (hopefully before the 2.0 release). Under the covers, the mode of operation for "chunked" search commands is negotiated at runtime (between Splunk and the external search command), and therefore very little definition ends up in commands.conf. Note that distributed defaults to True.
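To make the runtime negotiation a bit more concrete, here is a minimal sketch of what a command's reply to Splunk's initial "getinfo" request might look like. Treat the details as assumptions based on my reading of the SDK rather than a protocol reference: the `chunked 1.0,<metadata length>,<body length>` header, the `type` field, and the mapping of distributed to "streaming" versus "stateful" are how I understand it, and `getinfo_reply` and `parse_frame` are hypothetical helpers, not SDK functions.

```python
import json


def getinfo_reply(distributed=True):
    """Build a hypothetical getinfo reply frame.

    Assumption: a streaming command reports type 'streaming' when it may be
    distributed to indexers, and 'stateful' when it may not.
    """
    metadata = json.dumps(
        {"type": "streaming" if distributed else "stateful"}
    ).encode("utf-8")
    body = b""  # a getinfo reply carries no event data in this sketch
    header = f"chunked 1.0,{len(metadata)},{len(body)}\n".encode("utf-8")
    return header + metadata + body


def parse_frame(frame):
    """Split a frame back into its JSON metadata and raw body."""
    header, _, rest = frame.partition(b"\n")
    _, meta_len, body_len = header.decode("utf-8").split(",")
    meta_len, body_len = int(meta_len), int(body_len)
    metadata = json.loads(rest[:meta_len])
    body = rest[meta_len:meta_len + body_len]
    return metadata, body


metadata, body = parse_frame(getinfo_reply())
print(metadata)
```

If this reading is right, it explains why commands.conf stays small: the command itself tells Splunk how it wants to be run.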
I haven't done any official performance comparisons. For most of the searches where I use it, it's because spath can't get the job done, or it would take a half-dozen SPL commands to do the equivalent of what a single jmespath expression can do. In those cases, any performance hit becomes effectively irrelevant to me. Of course performance can't suck. In practice, I haven't seen jmespath become the performance bottleneck.
Can you clarify: is this an academic concern, or have you tried it and run into performance issues?
Do you have very demanding performance requirements?
from jmespath.
Right now I'm just trying to satisfy my curiosity :)
from jmespath.
My experience has been that it's typically fast enough. Use cases where super high performance is necessary typically aren't well suited for Splunk (or any other big data platform) in the first place.
If you use it and find that things are slower than you'd expect, please reach back out, and we'll see what can be done. Most often, there are ways to restructure a search to make it much faster, because jmespath usually doesn't end up being the bottleneck. It's typically orders of magnitude faster than, say, the time it takes to pull the raw events off disk.
I'm going to go ahead and close this. BTW, a great resource for general (or highly specific) Splunk performance questions is the Splunk User Slack channel or Splunk Answers. (I can be found on both.)
from jmespath.