json-sanitizer's Introduction

Given JSON-like content, the JSON Sanitizer converts it to valid JSON.

This can be attached at either end of a data-pipeline to help satisfy Postel's principle:

be conservative in what you do, be liberal in what you accept from others

Applied to JSON-like content from others, it will produce well-formed JSON that should satisfy any parser you use.

Applied to your output before you send, it will coerce minor mistakes in encoding and make it easier to embed your JSON in HTML and XML.

Motivation

Many applications have large amounts of code that uses ad-hoc methods to generate JSON outputs.

Frequently these outputs all pass through a small amount of framework code before being sent over the network. That framework code can use this library to make sure that the ad-hoc outputs are standards compliant and safe to pass to (overly) powerful deserializers like JavaScript's eval operator.

Applications also often have web service APIs that receive JSON from a variety of sources. When this JSON is created using ad-hoc methods, this library can massage it into a form that is easy to parse.

By hooking this library into the code that sends and receives requests and responses, this library can help software architects ensure system-wide security and well-formedness guarantees.

Input

The sanitizer takes JSON-like content and interprets it as JS eval would. Specifically, it deals with these non-standard constructs:

Construct      Policy
'...'          Single quoted strings are converted to JSON strings.
\xAB           Hex escapes are converted to JSON unicode escapes.
\012           Octal escapes are converted to JSON unicode escapes.
0xAB           Hex integer literals are converted to JSON decimal numbers.
012            Octal integer literals are converted to JSON decimal numbers.
+.5            Decimal numbers are coerced to JSON's stricter format.
[0,,2]         Elisions in arrays are filled with null.
[1,2,3,]       Trailing commas are removed.
{foo:"bar"}    Unquoted property names are quoted.
//comments     JS style line and block comments are removed.
(...)          Grouping parentheses are removed.

The sanitizer fixes missing punctuation, end quotes, and mismatched or missing close brackets. If an input contains only white-space then the valid JSON string null is substituted.
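
The numeric policies listed above (hex literals, octal literals, and the "+.5" form) can be illustrated with a small standalone sketch. The class and method names here (NumberLiteralSketch, toJsonInteger, toJsonDecimal) are hypothetical and not part of the library, and the code is not the library's implementation. It assumes each input is a single, well-formed numeric literal.

```java
import java.math.BigInteger;

final class NumberLiteralSketch {

    /**
     * Rewrites a JS-style hex (0xAB) or legacy octal (012) integer literal
     * as a decimal string. Anything else is returned unchanged.
     * Assumption: hex literals contain only valid hex digits after "0x".
     */
    static String toJsonInteger(String literal) {
        if (literal.length() > 2
                && (literal.startsWith("0x") || literal.startsWith("0X"))) {
            return new BigInteger(literal.substring(2), 16).toString();
        }
        boolean looksOctal = literal.length() > 1
                && literal.charAt(0) == '0'
                && literal.chars().allMatch(c -> c >= '0' && c <= '7');
        if (looksOctal) {
            return new BigInteger(literal, 8).toString();
        }
        return literal;
    }

    /**
     * Coerces a lenient decimal such as "+.5" into JSON's stricter shape:
     * no leading '+', and at least one digit on each side of the '.'.
     */
    static String toJsonDecimal(String literal) {
        String s = literal.startsWith("+") ? literal.substring(1) : literal;
        boolean negative = s.startsWith("-");
        String body = negative ? s.substring(1) : s;
        if (body.startsWith(".")) {
            body = "0" + body;      // ".5" becomes "0.5"
        }
        if (body.endsWith(".")) {
            body = body + "0";      // "1." becomes "1.0"
        }
        return (negative ? "-" : "") + body;
    }
}
```

With these helpers, toJsonInteger("0xAB") yields "171", toJsonInteger("012") yields "10", and toJsonDecimal("+.5") yields "0.5".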

Output

The output is well-formed JSON as defined by RFC 4627. The output satisfies these additional properties:

  • The output will not contain the substrings (case-insensitively) "<script", "</script" or "<!--" and can thus be embedded inside an HTML script element without further encoding.
  • The output will not contain the substring "]]>" and can thus be embedded inside an XML CDATA section without further encoding.
  • The output is a valid JavaScript expression, so it can be parsed by JavaScript's eval builtin (after being wrapped in parentheses) or by JSON.parse. Specifically, the output will not contain any string literals with embedded JS newlines (U+2028 Line separator or U+2029 Paragraph separator).
  • The output contains only valid Unicode scalar values (no isolated UTF-16 surrogates) that are allowed in XML unescaped.
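
To make the embedding properties concrete, here is a minimal sketch of how such sequences could be neutralized in JSON that is already well-formed. It is not the library's code: EmbedSafeSketch and escapeForEmbedding are hypothetical names, and the choice of which character to escape in each sequence is an assumption. The sketch relies on the fact that in well-formed JSON the characters '<', '>', U+2028, and U+2029 can only occur inside string literals, so replacing them with unicode escapes keeps the document valid.

```java
final class EmbedSafeSketch {

    /**
     * Escapes "<script", "</script", "<!--", "]]>", U+2028 and U+2029 in
     * well-formed JSON. Returns the same String instance when nothing
     * needs to change, so clean input costs no extra allocation.
     */
    static String escapeForEmbedding(String json) {
        StringBuilder out = null;  // created lazily on the first replacement
        int copied = 0;            // input before this index is already in out
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            String replacement = null;
            if (c == '<'
                    && (json.regionMatches(true, i, "<script", 0, 7)
                        || json.regionMatches(true, i, "</script", 0, 8)
                        || json.startsWith("<!--", i))) {
                replacement = "\\u003c";
            } else if (c == '>' && i >= 2 && json.startsWith("]]>", i - 2)) {
                replacement = "\\u003e";
            } else if (c == 0x2028) {
                replacement = "\\u2028";
            } else if (c == 0x2029) {
                replacement = "\\u2029";
            }
            if (replacement != null) {
                if (out == null) {
                    out = new StringBuilder(json.length() + 16);
                }
                out.append(json, copied, i).append(replacement);
                copied = i + 1;
            }
        }
        if (out == null) {
            return json;
        }
        return out.append(json, copied, json.length()).toString();
    }
}
```

Note that the escaped text still decodes to the original characters once a JSON parser reads it. The escaping only makes the serialized form safe to embed; it does not remove markup from the data.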

Security

Since the output is well-formed JSON, passing it to eval has no side effects and resolves no free variables, so it is neither a code-injection vector nor a vector for exfiltration of secrets.

This library only ensures that the JSON string → JavaScript object phase has no side effects and resolves no free variables. It cannot control how other client-side code later interprets the resulting JavaScript object. So if client-side code takes a part of the parsed data that is controlled by an attacker and passes it back through a powerful interpreter like eval or innerHTML, then that client-side code might suffer unintended side effects.

var myValue = eval('(' + sanitizedJsonString + ')');  // safe
var myEmbeddedValue = eval(myValue.foo);  // possibly unsafe

Additionally, sanitizing JSON cannot protect an application from Confused Deputy attacks:

var myValue = JSON.parse(sanitizedJsonString);
addToAdministratorsGroup(myValue.propertyFromUntrustedSource);

Performance

The sanitize method will return the input string without allocating a new buffer when the input is already valid JSON that satisfies the properties above. Thus, if used on input that is usually well formed, it has minimal memory overhead.

The sanitize method takes O(n) time where n is the length of the input in UTF-16 code-units.

json-sanitizer's People

Contributors

dependabot[bot], fabianhenneke, fmeum, fsiddiqi, hazendaz, jmanico, mikesamuel, nowheremanmail, roland-ewald

json-sanitizer's Issues

If I have a JS comment all content is destroyed

I had been looking for a way to repair my JSON files for a few days when I found this library. While testing it, I was alarmed to see that if the input contains a JS-style comment such as // Hello, I am a comment, everything after it disappears. Is that normal behavior?

For example, my input JSON:

[
    {
        "id": {
            "key": "value",
            "key": "value",
            "key": "value",
            "key": "value",		
			//Ey what's up
        },		
            "key": "value",
            "key": "value",
            "key": "value",
            "key": "value",
    },


... More keys below....

Output json:

[
    {
        "id": {
            "key": "value",
            "key": "value",
            "key": "value",
            "key": "value",
        }
    }
]

Everything below the comment disappears.

Edit: I am using the latest version, 1.2.1

Issue with documentation/running

I am looking for exactly this tool. However, my background is ruby, not js.  

Could anyone provide at least what libraries are in here? My interpreter can't even load a main class.

What steps will reproduce the problem?
1. Be newb to js
2. try to use this in any way
3. end up with Error: Could not find or load main class json-sanitizer-2012-10-17.jar

Please provide any additional information below.
Sorry to be a pain, but with nothing telling me what the classes are, it is quite difficult to call them.

Original issue reported on code.google.com by [email protected] on 16 Sep 2013 at 3:17

  • Merged into: #1

sanitizer strips part of the value when the value contains a forward slash

What steps will reproduce the problem?
1. Add the following line to JsonSanitizerTest: assertSanitized("dev/comment", "dev/comment");
2. Run the test.


What is the expected output? What do you see instead?
Instead of the expected output "dev/comment", we get "dev". 

What version of the product are you using? On what operating system?
latest git version, Windows 7


Original issue reported on code.google.com by [email protected] on 30 Jan 2015 at 10:14

Allow JSON with greater nesting depth than 64

Currently, the nesting depth for arrays and maps combined is limited to 64. RFC 4627 allows the maximum nesting depth to be implementation-specific, so this is not a bug.

In some cases, however, the current limit may not be enough, so the maximum depth should be configurable.
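
To check whether a given input would exceed a depth limit before it is sanitized, a caller could measure the nesting depth up front. The following is a minimal sketch. NestingDepthSketch and maxDepth are hypothetical names rather than library API, and the scan assumes that strings are double-quoted, as in standard JSON.

```java
final class NestingDepthSketch {

    /**
     * Returns the deepest combined nesting of arrays and objects,
     * ignoring brackets that appear inside double-quoted strings.
     */
    static int maxDepth(String json) {
        int depth = 0;
        int max = 0;
        boolean inString = false;
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (inString) {
                if (c == '\\') {
                    i++;                 // skip the escaped character
                } else if (c == '"') {
                    inString = false;
                }
            } else if (c == '"') {
                inString = true;
            } else if (c == '[' || c == '{') {
                depth++;
                max = Math.max(max, depth);
            } else if ((c == ']' || c == '}') && depth > 0) {
                depth--;
            }
        }
        return max;
    }
}
```

A caller could compare the result against 64 (the limit mentioned above) and decide whether to reject the input or process it differently.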

Should the reserved keywords expand for JDK8 or later?

JDK 8 and later versions have more keywords, such as 'try-with-resources', 'goto', and 'volatile'.

private static final String[][] RESERVED_KEYWORDS = {
    {},
    {},
    {"do", "if", "in", },
    {"for", "let", "new", "try", "var"},
    {"case", "else", "enum", "eval", "null", "this", "true", "void", "with"},
    {"catch", "class", "const", "false", "super", "throw", "while", "yield"},
    {"delete", "export", "import", "return", "switch", "static", "typeof"},
    {"default", "extends", "public", "private"},
    {"continue", "function"},
    {"arguments"},
    {"implements", "instanceof"}
  };

Reference: JDK 8 keywords (https://docs.oracle.com/javase/tutorial/java/nutsandbolts/_keywords.html)

JsonSanitizer removes more than expected because of strict decimal format

I've added JsonSanitizer to my project. I am trying to parse a file name made up of '.', '-', digits, and text. The sanitizer truncates it far more than I intended; for instance, 2.0-237.0 becomes 2.0.

Is there any way to fix this, or to make the sanitizer handle such input as I expect? (I assume the sanitizer "thinks" it's a decimal number of some sort.)

Thank you!
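
One possible workaround, assuming the file name is meant to be a JSON string value rather than a number, is to quote it as a JSON string before it reaches the sanitizer, so that it is no longer read as a numeric literal. The helper below is a hypothetical sketch (JsonStringQuoteSketch.quote is not part of the library) that handles the escapes required inside a JSON string.

```java
final class JsonStringQuoteSketch {

    /** Wraps raw text in double quotes and escapes it as a JSON string. */
    static String quote(String raw) {
        StringBuilder sb = new StringBuilder(raw.length() + 2);
        sb.append('"');
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters need a unicode escape.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }
}
```

For the example above, quote("2.0-237.0") produces the JSON string "2.0-237.0" including the surrounding double quotes, which is already valid JSON.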

Is the project still alive and, if so, can this be fixed?

There haven't been any commits for a long time, so I am not sure whether this project is still alive. If it is, the JSR305 dependency needs to be pinned to a specific version instead of attempting to look for a newer version. That lookup is bad practice and it causes occasional failures due to network issues. I'm more than willing to fix it, as it rarely ever changes, but I wanted to find out if this project is still alive, as I'd want to do a lot of Maven cleanup on top of that and get a release out soon.

JsonSanitizer.sanitize(jsonString) usage

I am confused about how this method is used. If jsonString is invalid JSON, will this method throw a RuntimeException? From the source code, it seems not. So what is returned for invalid JSON? Another question: can this API prevent JSON injection?

Thanks

JsonSanitizer throws exceptions and errors on malformed input

What steps will reproduce the problem?

{{{
JsonSanitizer.sanitize("[{{},ä")

java.lang.StringIndexOutOfBoundsException: String index out of range: 7
  at java.lang.String.charAt(String.java:658)
  at com.google.json.JsonSanitizer.elideTrailingComma(JsonSanitizer.java:650)
  at com.google.json.JsonSanitizer.sanitize(JsonSanitizer.java:398)
  at com.google.json.JsonSanitizer.sanitize(JsonSanitizer.java:96)
}}}

{{{
JsonSanitizer.sanitize("[{{ää},ä");

java.lang.AssertionError: ä
  at com.google.json.JsonSanitizer.elideTrailingComma(JsonSanitizer.java:656)
  at com.google.json.JsonSanitizer.sanitize(JsonSanitizer.java:398)
  at com.google.json.JsonSanitizer.sanitize(JsonSanitizer.java:96)
}}}

What version of the product are you using?

* The one from 2012-10-17.

Original issue reported on code.google.com by [email protected] on 12 Jun 2014 at 10:23

Support java.io.InputStream

It is a bit limiting to force the JSON to be passed as a String, especially in cases where the message size or volume may be large. The JAX-RS MessageBodyReader API provides an InputStream. Is there a plan to extend the API to accept and return an InputStream so that we can maintain optimised access to the data as a stream?
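
For illustration, a caller-side adapter along these lines is possible today, although it buffers the whole message in memory and therefore does not provide the streaming behaviour requested here. This is a sketch under stated assumptions: StreamAdapterSketch and sanitizeStream are hypothetical names, the payload is assumed to be UTF-8, and the sanitizer is passed in as a function so that the example does not depend on the library.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.function.UnaryOperator;

final class StreamAdapterSketch {

    /**
     * Reads the entire stream as UTF-8 text, applies the given
     * String-to-String sanitizer, and exposes the result as a new stream.
     * Requires Java 9+ for InputStream.readAllBytes().
     */
    static InputStream sanitizeStream(InputStream in,
                                      UnaryOperator<String> sanitizer) {
        try {
            String raw = new String(in.readAllBytes(), StandardCharsets.UTF_8);
            String clean = sanitizer.apply(raw);
            return new ByteArrayInputStream(
                    clean.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In an application, the second argument would be the library's String-based sanitize method; any String-to-String function works for testing.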

pound symbol is eliminating the closing brackets

I have a response object with a description field. When there is a pound symbol in the description, an equal number of characters is removed from the end of the response. As a result, the closing curly braces are removed, which distorts the JSON structure.
This happens when we use the json-sanitizer 1.1 jar.

Any suggestions to avoid this?

Sanitizer not removing <script> from input

Hi guys,

the readme page mentions that all <script> tags should be removed from the input JSON, but this does not happen in my test:

My input is the following JSON: {"name":"Fotofeld","data":{"fieldName":"<script>alert(1);</script>"}}
It gets converted into {"name":"Fotofeld","data":{"fieldName":"\u003cscript>alert(1);\u003c/script>"}}, which in my case gets saved to the database. When I then return the "fieldName", the value comes back as if it had not been sanitized.
The problem is that the result of the sanitization gets flagged by OWASP ZAP.
Screenshot from debugger included

Support CharSequence Sanitisation

So that other JSON parsing libraries can be used, support sanitisation of both whole JSON documents and individual values via java.lang.CharSequence.

Examples:

javax.json.JsonString node = ...
CharSequence sanitised = JsonSanitizer.sanitiseValue(node.getChars());

For situations such as a StringBuilder or StringBuffer it could also have the option to not create a new instance and edit in place to avoid GC churn related to String creation.

Two examples - shouldn't it sanitize this?

When we run these examples through the jsonsanitizer this is what happens. I was under the impression this sanitizer should eliminate these types of XSS attacks? Or do you presume the JSON is broken down into key/value pairs and input validated/output encoded on a field by field basis? That would not scale very well for performance so was hoping jsonsanitizer would work.

public void testSVGAttack() throws Exception {
    // Inner double quotes must be escaped for this to compile as Java.
    String json = "{test: \"<svg/onload=alert(/XSS Owned/)>\"}";
    String clean = JsonSanitizer.sanitize(json);
    System.out.println(clean);
}

input: {"test": "<svg/onload=alert(/XSS Owned/)>"}

output: {"test": "<svg/onload=alert(/XSS Owned/)>"}

and then there is this........

String json = "{\"test\": \"MDM%3c%73%43%72%49%70%54%20%74%59%70%45%3d%74%45%78%54%2f%76%42%73%43%72%49%70%54%3e%4d%73%67%42%6f%78%28%31%32%38%31%35%29%3c%2f%73%43%72%49%70%54%3e\"}";

input === output

unencoded that is

{"test": "MDM<sCrIpT tYpE=tExT/vBsCrIpT>MsgBox(12815)</sCrIpT>"}
