kintyre / jmespath

JMESPath app for Splunk

Home Page: https://splunkbase.splunk.com/app/3237/

Python 91.85% Shell 8.15%

splunk-application splunk jmespath json

jmespath's Issues

Make jmespath arguments mirror spath (not xpath)

Make the arguments more closely mirror the spath arguments instead of the xpath arguments. They simply make more sense:

input: the input field. Defaults to _raw.
output: the output field name. Defaults to expanding all output variables in the top-level field namespace.
path: the path or expression to extract. The path= term doesn't need to be literally present; the expression works just as well with no name. (This may be difficult to get right using the new Splunk SDK / chunked Python interface, where argument parsing is different; not sure if raw mode is available.)

Return null as empty string

Return a non-object as an empty string rather than as the Python-centric "None" string. This is in better alignment with how spath handles empty results.

Splunk python SDK no longer supported in cloud

Can you please update the Splunk Python SDK to 1.6.16 or higher for Cloud compatibility?

Add a wiki page on custom functions

Support unrolling lists of objects in the output

A list of objects should result in unrolling the list into multiple output events. Essentially, this replaces the need for a secondary mvexpand command.

So if the jmespath expression results in

[
   { "A" : 4, "B": 9 },
   { "A" : 1, "B": 2 }
]

Then the input result/event should be cloned and returned twice: the first event with A=4, B=9 and the second with A=1, B=2.

Without this, a list of JSON strings could be returned, which then requires mvexpand followed by another spath.
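A minimal sketch of the unrolling idea, assuming events are plain dicts of field to value. The function name `unroll_result` and the pass-through behaviour for non-list results are assumptions made for illustration, not the app's actual code.

```python
def unroll_result(event, result):
    """Yield one output event per object when `result` is a list of objects.

    `event` is the input record and `result` is whatever the JMESPath
    expression returned. Both names are hypothetical.
    """
    is_object_list = (
        isinstance(result, list)
        and len(result) > 0
        and all(isinstance(obj, dict) for obj in result)
    )
    if is_object_list:
        for obj in result:
            clone = dict(event)   # shallow copy of the input event
            clone.update(obj)     # add A, B, ... as top-level fields
            yield clone
    else:
        # Assumption: anything else passes through as a single event.
        yield dict(event)


events = list(unroll_result({"host": "web01"}, [{"A": 4, "B": 9}, {"A": 1, "B": 2}]))
print(events)
```

The two resulting events keep the original fields and differ only in A and B, which is what an mvexpand followed by a second spath would otherwise produce.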

Expand docs - How to extract fields with varying names (pattern)

Add an example to the docs that highlights this search pattern:

 | jmespath output=NewDisplayName "ExtendedProperties[?Name=='targetUpdatedProperties'].Value | from_string(@[0]) | [?ends_with(Name,'.DisplayName')].NewValue"

Thanks to @jpapasadora

MS O365 Management.

Search something like: sourcetype=ms:o365:management member (group OR role)

Implement custom function for parsing embedded JSON (nested JSON docs)

Come up with a mechanism that allows a nested JSON document to be parsed as a custom function within the JMESPath language.

Essentially this would just call json.loads().

If this fails, then just assume the object wasn't a JSON string and return the original value unmodified. That seems like a nice safety mechanism in case there's a mixture of JSON strings and other strings that have to be processed at the same time.
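A sketch of the parsing logic described above, written as a plain Python function. How it gets registered as a JMESPath custom function is left out here; only the json.loads() call and the fall-back behaviour come from the description.

```python
import json


def from_string(value):
    """Parse `value` as JSON; if that fails, return it unmodified."""
    if not isinstance(value, str):
        return value
    try:
        return json.loads(value)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return value


print(from_string('{"UserType": "Member"}'))  # nested document becomes a dict
print(from_string("MSODS."))                  # not JSON, returned as-is
```

One side effect worth knowing: a string such as "2" is valid JSON on its own, so it would come back as a number rather than a string.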

Add integration tests

Add integration tests in the form of either (1) wrapper calls to the SPL command executable (at the "chunked" protocol level), or (2) a full end-to-end test in the form of a run-anywhere SPL command with an expected output.

Simulate splunk interface

PRO: Better "unittest" like concept; modular.
PRO: Allows for easier automated testing with the unittest framework and Travis.
It may be possible to avoid low-level chunked encoding stuff by using mock.

Test it in Splunk

PRO: A full end-to-end test may catch additional corner cases.
CON: More overhead
CON: Requires a fully running Splunk instance, and authentication to connect via the SDK (REST endpoints)
Alternatively, this could be implemented as a dashboard or something similar where all the tests run automatically whenever the dashboard is loaded. This could be painful to troubleshoot, but it would allow the entire test suite to be shipped with the product so that end users could confirm behavior. (Seems complicated.)

Either way, it should be possible to borrow from the existing run-anywhere commands I put together in the docs.

I'm thinking option 1 (simulating the Splunk interface) may be easier.

Support quote on the output field

Support a command like this:

sourcetype="ms:o365:management" AzureActiveDirectory | jmespath output="Actor.Type[*]" "unroll(Actor, 'Type', 'ID')"

Right now this works, but it creates a field with double quotes around it. DOH! We also want to be able to support fields that could contain a whitespace character (even though that's bad form).

The current workaround is to follow this up with an extra rename operation, which shouldn't be necessary.

Add unittest for all custom functions

Distributable

Can jmespath be distributed across search heads?

Add support for custom functions

Use json output not python repr format

Ensure that any structured output is returned in real JSON format. Right now some stuff leaks through in Python representation format (probably due to explicit str() calls). While the Python format is nearly JSON, it's different enough to cause real pain for any subsequent processing.

Single values can and should be returned as-is in their raw value (string/number/bool). Ensure there's no extra quoting imposed (unicode to str conversion).
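A rough sketch of an output formatter along these lines. The function name `to_field_value` is hypothetical, and rendering booleans with their JSON spelling (true/false) and nulls as an empty string are assumptions about the desired behaviour.

```python
import json


def to_field_value(value):
    """Convert a JMESPath result into a value suitable for a Splunk field."""
    if value is None:
        return ""                 # empty string instead of the string "None"
    if isinstance(value, (dict, list, bool)):
        return json.dumps(value)  # real JSON, never Python repr()
    return value                  # strings and numbers pass through untouched


print(to_field_value({"State": 1, "RelyingParty": "*"}))
print(to_field_value("Success"))
```

Routing every structured value through json.dumps() instead of str() is what keeps the `u'...'` style output from leaking into results.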

Use the Splunk Python SDK command interface

Switch to using the new "chunked encoding" interface and re-implement the search commands in a more forward-looking way. Note that this will remove support for versions prior to Splunk 6.3, which probably isn't a big deal at this point.

One concern with this is how dynamic fields are dealt with in the StreamingCommand, for example. If a field doesn't exist on the first iteration, then it's omitted from the output entirely. I'm not sure what causes this, but it doesn't seem to be a problem with the old-school Intersplunk approach.

Add support for building new JSON events with jmespath

Consider allowing input= to be a wildcarded field as well. All of the field(s) that match would be passed in as top-level keys.

So if you had a single event with fields like this:

fields     value
_raw       ....
source     ...
rec.name   Joe
rec.kids   Janet
           Greg
           Bob
rec.age    45

You could run a command like so:

... | jmespath input=rec.* output=rec "{Name:name, Children:kids, Demographics:{age:to_number(age)}}"

And the output value for rec would look something like:

{ "Name" : "Joe",
  "Children" : [ "Janet", "Greg", "Bob" ],
  "Demographics": { "age": 45 }
}

If one of the rec.* fields already contains a JSON string, then the from_string() function can be used to convert and, if necessary, further manipulate the record.

BTW: I'm not even sure the syntax of the JMESPath example is legit. Good luck future self!
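To make the wildcard idea concrete, here is a small sketch of how matching fields might be gathered into one input document before the expression runs. The helper name `collect_input` is hypothetical, and stripping the literal prefix so that rec.name becomes name is an assumption inferred from the example expression. It only handles patterns with a trailing wildcard.

```python
from fnmatch import fnmatchcase


def collect_input(record, pattern):
    """Gather fields matching `pattern` into a dict keyed by the wildcard part."""
    prefix = pattern.split("*", 1)[0]   # "rec.*" -> "rec."
    doc = {}
    for field, value in record.items():
        if fnmatchcase(field, pattern):
            doc[field[len(prefix):]] = value
    return doc


record = {
    "_raw": "....",
    "source": "...",
    "rec.name": "Joe",
    "rec.kids": ["Janet", "Greg", "Bob"],   # multivalue field
    "rec.age": "45",
}
doc = collect_input(record, "rec.*")
print(doc)
```

The resulting dict has the keys name, kids, and age, which is the shape the multiselect expression in the example expects.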

Add minimize feature to jsonformat

I'd like jsonformat to also be able to provide minimized JSON output (more like an unformat, but a new command doesn't seem necessary). Additionally, it may be helpful if the output could be escaped, so it's usable as a run-anywhere command ready to be dropped into a makeresults | eval _raw="...." search.
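A sketch of what the minimize and escape steps could look like using only the standard library. Both function names are hypothetical, and the escaping shown (backslashes, then double quotes) is an assumption about what an eval string literal needs.

```python
import json


def minimize_json(text):
    """Re-serialize JSON text with no insignificant whitespace."""
    return json.dumps(json.loads(text), separators=(",", ":"))


def escape_for_eval(text):
    """Escape backslashes and double quotes for use inside an SPL "..." string."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


pretty = '{\n  "Name": "Joe",\n  "Children": ["Janet", "Greg"]\n}'
compact = minimize_json(pretty)
search = '| makeresults | eval _raw="{}"'.format(escape_for_eval(compact))
print(compact)
print(search)
```

Backslashes have to be escaped before quotes; doing it in the other order would double the backslashes that were just added in front of each quote.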

Support best-effort JSON parser

Add best-effort JSON parsing support to the nested JSON parsing function. It looks like the ijson package may do this, but I haven't tested it. The use case for this functionality is the fact that some nested JSON payloads appear to be subjected to size constraints.

This happens in MS Office 365 Management logs, for example. We should parse whatever we can (best effort) so that some of the data can be retrieved, instead of discarding all of it. I think this lines up with spath's general best-effort approach.

jsonformat: Only set the 'linecount' if already present

Don't update/set the linecount when updating _raw unless the field already contains a numeric value. If not, don't create it. (Avoid creating the

For example, if you have the search:

index=* b635172e-2cd9-45b7-adc8-4efb14d21901 | table _time _raw sourcetype source | jsonformat

We shouldn't have to explicitly remove the linecount field afterwards. In other words, it makes sense to keep it when it's present, but otherwise don't get in the way.
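The rule above might be sketched like this, assuming records are plain dicts and field values arrive as strings. The function name `update_raw` is hypothetical.

```python
def update_raw(record, new_raw):
    """Replace _raw, refreshing linecount only if it already holds a number."""
    record["_raw"] = new_raw
    current = record.get("linecount")
    if current is not None and str(current).strip().isdigit():
        record["linecount"] = str(len(new_raw.splitlines()))
    return record


formatted = '{\n  "a": 1\n}'
print(update_raw({"_raw": '{"a": 1}', "linecount": "1"}, formatted))
print(update_raw({"_raw": '{"a": 1}'}, formatted))
```

In the second call the record has no linecount field, so none is created.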

jmespath path syntax error sometimes results in error message on "_raw"

Doh! I can't seem to reproduce this. But I ran into this issue while passing an incorrect data type into a function (the unroll() function, if I'm not mistaken.) Not sure how _raw got modified in the process, but it seemed weird.

Keeping this as a placeholder for more investigation, for now.

jsonformat fails with inverse time order

This fails

... 
| eval _time=if(sourcetype like "amal%", strftime(time, "%Y-%m-%dT%T.%6N%Z"), _time)
| sort - _time
| jsonformat raw order=sort indent=2

with the error message:

Error in 'jsonformat' command: The external search command 'jsonformat' did not return events in descending time order, as expected.

Most likely just the wrong configuration options.

Add multiline text to array functionality (Make large text blobs more readable)

Add ability to replace text strings containing newlines (aka \n) with an array of individual lines.

This could be added to jmespath as a simple split() function to complement the existing join() function. But I think the most helpful bits would be in the jsonformat command. This would make multiline text blobs much more readable; the purpose is less about programmatic manipulation.
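For the jsonformat side, one possible shape is a recursive pass over the parsed document that swaps multiline strings for arrays of lines before pretty-printing. This is only a sketch; the function name `split_multiline` is hypothetical.

```python
import json


def split_multiline(value):
    """Recursively replace strings containing newlines with lists of lines."""
    if isinstance(value, str):
        return value.splitlines() if "\n" in value else value
    if isinstance(value, list):
        return [split_multiline(item) for item in value]
    if isinstance(value, dict):
        return {key: split_multiline(item) for key, item in value.items()}
    return value


doc = {"message": "line 1\nline 2\nline 3", "count": 3}
print(json.dumps(split_multiline(doc), indent=2))
```

Strings without a newline are left alone, so ordinary fields keep their original type.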

Python repr() encoding still sometimes slips out!

Still seeing output like this:

{u'RememberDevicesNotIssuedBefore': u'2018-11-08T19:37:42.7363619Z', u'State': 1, u'RelyingParty': u'*'}

when that should have produced a proper JSON output.

Example search

| makeresults | eval _raw="{\"Target\":[{\"ID\":\"User_147b1c0b-5066-46e1-a0fe-12e7ca52494c\",\"Type\":2},{\"ID\":\"147b1c0b-5066-46e1-a0fe-12e7ca52494c\",\"Type\":2},{\"ID\":\"User\",\"Type\":2},{\"ID\":\"[email protected]\",\"Type\":5},{\"ID\":\"10037FFEAE8FE859\",\"Type\":3}],\"OrganizationId\":\"aca24088-bb0e-42c4-a9a0-0099c3f962f0\",\"RecordType\":8,\"ActorContextId\":\"aca24088-bb0e-42c4-a9a0-0099c3f962f0\",\"Workload\":\"AzureActiveDirectory\",\"ExtendedProperties\":[{\"Value\":\"Success\",\"Name\":\"resultType\"},{\"Value\":\"UserManagement\",\"Name\":\"auditEventCategory\"},{\"Value\":\"<null>\",\"Name\":\"nCloud\"},{\"Value\":\"aca24088-bb0e-42c4-a9a0-0099c3f962f0\",\"Name\":\"actorContextId\"},{\"Value\":\"b3a0e779-9348-492e-8864-3dffd59e1be8\",\"Name\":\"actorObjectId\"},{\"Value\":\"User\",\"Name\":\"actorObjectClass\"},{\"Value\":\"[email protected]\",\"Name\":\"actorUPN\"},{\"Value\":\"1003BFFDAEA5811F\",\"Name\":\"actorPUID\"},{\"Value\":\"MSODS.\",\"Name\":\"teamName\"},{\"Value\":\"aca24088-bb0e-42c4-a9a0-0099c3f962f0\",\"Name\":\"targetContextId\"},{\"Value\":\"147b1c0b-5066-46e1-a0fe-12e7ca52494c\",\"Name\":\"targetObjectId\"},{\"Value\":\"User\",\"Name\":\"extendedAuditEventCategory\"},{\"Value\":\"[email protected]\",\"Name\":\"targetUPN\"},{\"Value\":\"10037FFEAE8FE859\",\"Name\":\"targetPUID\"},{\"Value\":\"[\\\"StrongAuthenticationRequirement\\\",\\\"TargetId.UserType\\\"]\",\"Name\":\"targetIncludedUpdatedProperties\"},{\"Value\":\"[{\\\"Name\\\":\\\"StrongAuthenticationRequirement\\\",\\\"OldValue\\\":[],\\\"NewValue\\\":[{\\\"RelyingParty\\\":\\\"*\\\",\\\"State\\\":1,\\\"RememberDevicesNotIssuedBefore\\\":\\\"2018-11-08T19:37:42.7363619Z\\\"}]},{\\\"Name\\\":\\\"Included Updated 
Properties\\\",\\\"OldValue\\\":null,\\\"NewValue\\\":\\\"StrongAuthenticationRequirement\\\"},{\\\"Name\\\":\\\"TargetId.UserType\\\",\\\"OldValue\\\":null,\\\"NewValue\\\":\\\"Member\\\"}]\",\"Name\":\"targetUpdatedProperties\"},{\"Value\":\"e180c62a-db41-43b9-ae6c-de0857210130\",\"Name\":\"correlationId\"},{\"Value\":\"2\",\"Name\":\"version\"},{\"Value\":\"{\\\"UserType\\\":\\\"Member\\\"}\",\"Name\":\"additionalDetails\"},{\"Value\":\"2.1\",\"Name\":\"env_ver\"},{\"Value\":\"#Ifx.AuditSchema#IfxMsods.AuditCommonEvent\",\"Name\":\"env_name\"},{\"Value\":\"2018-11-08T19:37:43.1158911Z\",\"Name\":\"env_time\"},{\"Value\":\"32YYA\",\"Name\":\"env_epoch\"},{\"Value\":\"61578266\",\"Name\":\"env_seqNum\"},{\"Value\":\"0\",\"Name\":\"env_popSample\"},{\"Value\":\"ikey\",\"Name\":\"env_iKey\"},{\"Value\":\"257\",\"Name\":\"env_flags\"},{\"Value\":\"##00000000-0000-0000-0000-000000000000_00000000-0000-0000-0000-000000000000_00000000-0000-0000-0000-000000000000\",\"Name\":\"env_cv\"},{\"Value\":\"<null>\",\"Name\":\"env_os\"},{\"Value\":\"<null>\",\"Name\":\"env_osVer\"},{\"Value\":\"becwebservice\",\"Name\":\"env_appId\"},{\"Value\":\"1.0.10571.2\",\"Name\":\"env_appVer\"},{\"Value\":\"1.0\",\"Name\":\"env_cloud_ver\"},{\"Value\":\"MSO-BL2\",\"Name\":\"env_cloud_name\"},{\"Value\":\"becwebservice\",\"Name\":\"env_cloud_role\"},{\"Value\":\"1.0.10571.2\",\"Name\":\"env_cloud_roleVer\"},{\"Value\":\"BL2BWSR573\",\"Name\":\"env_cloud_roleInstance\"},{\"Value\":\"PROD\",\"Name\":\"env_cloud_environment\"},{\"Value\":\"R5\",\"Name\":\"env_cloud_deploymentUnit\"}],\"UserId\":\"[email protected]\",\"UserType\":0,\"TargetContextId\":\"aca24088-bb0e-42c4-a9a0-0099c3f962f0\",\"ResultStatus\":\"Success\",\"ObjectId\":\"[email protected]\",\"Version\":1,\"ActorIpAddress\":\"<null>\",\"AzureActiveDirectoryEventType\":1,\"Operation\":\"Update user.\",\"Actor\":[{\"ID\":\"[email 
protected]\",\"Type\":5},{\"ID\":\"1003BFFDAEA5811F\",\"Type\":3},{\"ID\":\"User_b3a0e779-9348-492e-8864-3dffd59e1be8\",\"Type\":2},{\"ID\":\"b3a0e779-9348-492e-8864-3dffd59e1be8\",\"Type\":2},{\"ID\":\"User\",\"Type\":2}],\"Id\":\"28a59b0b-1792-4bee-bcc7-a48ad68302a3\",\"ClientIP\":\"<null>\",\"CreationTime\":\"2018-11-08T19:37:43\",\"UserKey\":\"[email protected]\"}"
| jmespath output=ExtendedProperties.targetUpdatedProperties.* "ExtendedProperties[?Name=='targetUpdatedProperties'].Value|from_string(@[0])|[0]"
| table ExtendedProperties.targetUpdatedProperties.NewValue

Implement custom function for unrolling list of Name/Value pair objects

Unroll lists of Name/Value pairs and turn them back into dicts/objects. (Rather than using xyseries key Name Value or eval {Name}=Value tricks.)

Take data that looks like this:

{
    "ExtendedProperties": [
        {
            "Name": "resultType",
            "Value": "Success"
        },
        {
            "Name": "auditEventCategory",
            "Value": "ApplicationManagement"
        },
        {
            "Name": "nCloud",
            "Value": "<null>"
        },
        {
            "Name": "actorContextId",
            "Value": "acaxxxxx-bb0e-42c4-xxxx-xxxxxxxxxxxx"
        },
        {
            "Name": "actorObjectId",
            "Value": "81bxxx08-123e-4a45-xxxx-xxxxxxxxxxxx"
        },
        {
            "Name": "actorObjectClass",
            "Value": "User"
        },
        {
            "Name": "actorUPN",
            "Value": "[email protected]"
        }
    ]
}

And we want to instead map that to an object that looks like this:

{
    "resultType" : "Success",
    "auditEventCategory":  "ApplicationManagement",
    "nCloud" : "<null>",
    "actorContextId" : "acaxxxxx-bb0e-42c4-xxxx-xxxxxxxxxxxx",
    "actorObjectId": "81bxxx08-123e-4a45-xxxx-xxxxxxxxxxxx",
    "actorObjectClass": "User",
    "actorUPN" : "[email protected]"
}

And from a splunk side of things, it would be helpful if we could assign these all to a common prefix.
So for example, if the prefix given was ExtProp. then the field names in Splunk would be ExtProp.resultType, ExtProp.auditEventCategory, ExtProp.nCloud, ... and so on.

May need to add some kind of field name sanitization to this as well, to prevent spaces and other weird characters from slipping through.

This entire approach would hide values if the same Name was given twice, but I think that's an acceptable risk. Know your data.
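A sketch of the mapping described above. The function name `unroll_pairs`, the `prefix` parameter, and the sanitization rule (anything other than letters, digits, underscore, or dot becomes an underscore) are all assumptions for illustration.

```python
import re


def unroll_pairs(items, name_key="Name", value_key="Value", prefix=""):
    """Turn a list of Name/Value objects into a single dict."""
    result = {}
    for item in items:
        # Hypothetical sanitization: keep letters, digits, "_" and "."
        name = re.sub(r"[^A-Za-z0-9_.]", "_", str(item[name_key]))
        result[prefix + name] = item[value_key]  # a repeated Name overwrites
    return result


props = [
    {"Name": "resultType", "Value": "Success"},
    {"Name": "auditEventCategory", "Value": "ApplicationManagement"},
    {"Name": "Included Updated Properties", "Value": "x"},
]
print(unroll_pairs(props, prefix="ExtProp."))
```

Because later entries overwrite earlier ones, a Name that appears twice keeps only its last Value, which is the hiding behaviour accepted above.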

Get Splunk Cloud supported

Figure out what steps are necessary to get Cloud vetted.

Enable more dynamic output for objects

Just like spath, if no output is given, any object output should be expanded into the current result. So if the jmespath expression results in { "A" : 1, "B": 2 }, then new fields named A and B should be added and populated as fields.

If the user specifies the output variable, then any structured output should be dumped as a JSON string.
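The two cases could be sketched as follows, assuming records are plain dicts. The function name `apply_result` is hypothetical, and leaving the record untouched when no output is given and the result is not an object is an assumption, since that case isn't covered above.

```python
import json


def apply_result(record, result, output=None):
    """Write a JMESPath result into `record`, spath-style."""
    if output is None:
        if isinstance(result, dict):
            record.update(result)   # expand A, B, ... into the event
        # Assumption: non-object results with no output name are ignored.
    elif isinstance(result, (dict, list)):
        record[output] = json.dumps(result)   # structured -> JSON string
    else:
        record[output] = result
    return record


print(apply_result({"host": "web01"}, {"A": 1, "B": 2}))
print(apply_result({"host": "web01"}, {"A": 1, "B": 2}, output="result"))
```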

Build with Travis CI

Results of function items(@) cannot be manipulated further

Description

Results of function items(@) cannot be manipulated further: the output is of the wrong type.

Example

The search

| makeresults
| eval json="{\"key_1\": {\"key_a\": 1, \"key_b\": 1}, \"key_2\": {\"key_a\": 2, \"key_b\": 2}}"
| jmespath input=json output=result "items(@)[*][1].key_a"

should create the multivalue field result with values 1 and 2. Instead, the field is null.

Cause

The implementation of items(@) returns a list of tuples, which JMESPath does not recognize as arrays. See jmespath/jmespath.py#179.

Fix

Replace the return expression in file bin/jpath.py, line 22 with [list(item) for item in h.items()].
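The difference between the two return shapes can be seen without Splunk or JMESPath at all. In this sketch `items_as_tuples` is a hypothetical stand-in for the pre-fix behaviour (the original function body isn't shown here), while `items_as_lists` uses the expression from the proposed fix.

```python
def items_as_tuples(h):
    # Stand-in for the behaviour described under "Cause": a list of tuples.
    return list(h.items())


def items_as_lists(h):
    # The proposed fix: every key/value pair becomes a real list.
    return [list(item) for item in h.items()]


data = {"key_1": {"key_a": 1, "key_b": 1}, "key_2": {"key_a": 2, "key_b": 2}}

before = items_as_tuples(data)
after = items_as_lists(data)

print(all(isinstance(pair, list) for pair in before))
print(all(isinstance(pair, list) for pair in after))

# Manual equivalent of the projection [*][1].key_a on the fixed output.
values = [pair[1]["key_a"] for pair in after]
print(values)
```

With every pair being a list, an index expression like `[1]` has an array to work on, which is what the example search relies on.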

Run code analysis

Run some code analysis tools. I'm sure there are a bunch of typos, unneeded imports, and so on.