Comments (5)

gtback commented on August 20, 2024

This issue refers to ways to encode data that is not valid in XML, such as raw binary data or data that uses certain low-value control-code Unicode code points. In general, this should probably be Base64-encoded and enclosed in CDATA blocks [UPDATE: the CDATA doesn't matter if it has been Base64-encoded]. The solution for this may also solve #25.

Issue #134 is a distinct, but somewhat related issue.

EDIT: a CDATA block is not necessary for Base64-encoded data.
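As a rough illustration of why Base64 makes CDATA unnecessary: XML 1.0 only permits tab, newline, carriage return and code points from U+0020 upward (excluding surrogates, U+FFFE and U+FFFF), while Base64 output uses only ASCII letters, digits, `+`, `/` and `=`, none of which needs escaping. The helper names below (`is_xml_char`, `needs_encoding`, `to_xml_safe`) are hypothetical and not part of any CybOX API; this is a minimal sketch, not a proposed implementation.

```python
import base64

def is_xml_char(cp: int) -> bool:
    """True if code point `cp` matches the XML 1.0 `Char` production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )

def needs_encoding(value: str) -> bool:
    """True if `value` contains any character that XML 1.0 cannot carry."""
    return not all(is_xml_char(ord(ch)) for ch in value)

def to_xml_safe(data: bytes) -> str:
    """Base64-encode raw bytes (hypothetical helper).

    The output alphabet (A-Z, a-z, 0-9, '+', '/', '=') contains only legal
    XML characters and nothing that must be escaped, so no CDATA is needed.
    """
    return base64.b64encode(data).decode("ascii")

print(needs_encoding("abc\x01"))     # True: U+0001 is not a legal XML 1.0 char
print(needs_encoding("tab\there"))   # False: tab is allowed
print(to_xml_safe(b"\x00\x01\xff"))  # AAH/
```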

from schemas.

gtback commented on August 20, 2024

The more I think about this, the more I am of the opinion that no solution will cover 100% of edge cases. Unicode string data (regardless of encoding) is a proper subset of binary data, and in certain cases we want to support storing non-Unicode binary data in a StringObjectPropertyType. Though this is not "allowed" by many standards/protocols, it is exactly these violations of expectations that would be valuable to model in CybOX.

It may be useful to relax the "xs:string" restriction on StringObjectPropertyType, and also allow xs:hexBinary or xs:base64Binary, which would allow reuse of the "datatype" attribute to indicate how the "binary" string data is being represented.
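To make the idea concrete, the sketch below renders the same bytes in the lexical forms of `xs:hexBinary` and `xs:base64Binary` and uses a `datatype` value to pick the decoder on the way back. It assumes, as the comment proposes, that the `datatype` attribute could carry the values `hexBinary` or `base64Binary` on a string property; the function names are hypothetical.

```python
import base64

def encode_binary(data: bytes, datatype: str) -> str:
    """Render bytes in the lexical form of the given XSD binary type."""
    if datatype == "hexBinary":
        return data.hex().upper()  # two hex digits per byte
    if datatype == "base64Binary":
        return base64.b64encode(data).decode("ascii")
    raise ValueError(f"unsupported datatype: {datatype}")

def decode_binary(text: str, datatype: str) -> bytes:
    """Inverse of encode_binary; the datatype value selects the decoder."""
    if datatype == "hexBinary":
        return bytes.fromhex(text)
    if datatype == "base64Binary":
        return base64.b64decode(text, validate=True)
    raise ValueError(f"unsupported datatype: {datatype}")

raw = b"\x00\xff"
for dt in ("hexBinary", "base64Binary"):
    text = encode_binary(raw, dt)
    print(dt, text, decode_binary(text, dt) == raw)
# hexBinary 00FF True
# base64Binary AP8= True
```

Hex is simpler to eyeball but doubles the size; Base64 grows by roughly a third, which is one reason a consumer would need the `datatype` value to know which decoder to apply.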

Alternatively, we could recommend as a best practice that data which is "allegedly" UTF-8 or UTF-16, but is actually an invalid sequence in those encodings, instead be base64 encoded, and set "is_defanged" to true, with a "defanging_algorithm_ref" of "https://tools.ietf.org/html/rfc4648#section-4".

In either case, programs which consume CybOX content will not be able to reliably process this data using the string/unicode data types in those languages, even if we can "accurately" encode the data in XML. In my opinion the best solution is to "defang" the binary data into base64, setting the appropriate attributes. This requires a documentation change or addition to a "best practices" document, but no change to the schema itself.
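A minimal sketch of that best practice, assuming a producer that receives bytes which are *supposed* to be UTF-8: if they decode cleanly and contain only XML-legal characters they are stored as text, otherwise they are Base64-encoded and flagged with `is_defanged` and `defanging_algorithm_ref` pointing at RFC 4648 section 4. The element tag and the helper names (`make_property`, `read_property`) are hypothetical placeholders, not real CybOX element names or python-cybox APIs.

```python
import base64
import xml.etree.ElementTree as ET

# URI proposed for defanging_algorithm_ref (RFC 4648 section 4, standard Base64).
BASE64_REF = "https://tools.ietf.org/html/rfc4648#section-4"

def _is_xml_char(cp: int) -> bool:
    """True if code point `cp` matches the XML 1.0 `Char` production."""
    return (cp in (0x9, 0xA, 0xD) or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD or 0x10000 <= cp <= 0x10FFFF)

def make_property(tag: str, data: bytes) -> ET.Element:
    """Store "allegedly UTF-8" bytes, defanging them to Base64 if needed."""
    elem = ET.Element(tag)  # hypothetical tag, not a real CybOX element name
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        text = None  # invalid UTF-8 sequence
    if text is not None and all(_is_xml_char(ord(c)) for c in text):
        elem.text = text
    else:
        elem.set("is_defanged", "true")
        elem.set("defanging_algorithm_ref", BASE64_REF)
        elem.text = base64.b64encode(data).decode("ascii")
    return elem

def read_property(elem: ET.Element) -> bytes:
    """Recover the original bytes, undoing Base64 defanging if flagged."""
    text = elem.text or ""
    if (elem.get("is_defanged") == "true"
            and elem.get("defanging_algorithm_ref") == BASE64_REF):
        return base64.b64decode(text)
    return text.encode("utf-8")

bad = b"\xff\xfeabc"  # not valid UTF-8
elem = make_property("Value", bad)
print(ET.tostring(elem, encoding="unicode"))
print(read_property(elem) == bad)  # True
```

The consumer side shows the trade-off described above: the program gets bytes back, not a string, so any further string handling remains its own decision.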

ikiril01 commented on August 20, 2024

Relaxing the "xs:string" restriction on StringObjectPropertyType might be a possible workaround, but it probably has a host of other implications (like determining what the actual/default datatype is when processing CybOX content) that aren't desirable.

I too like the defanging approach: it allows us to reuse the schema constructs that were already added for a very similar purpose. Besides adding documentation that describes how to use the defanging attributes for this use case, maybe we should also update the annotation on is_defanged to state that it can refer to defanged OR re-encoded data. Currently it reads:

This field is optional and conveys whether the associated Object property has been defanged (representation changed to prevent malicious effects of handling/processing).

Perhaps we can change it to something like the following (and update defanging_algorithm_ref accordingly):

This field is optional and conveys whether the associated Object property has been defanged (representation changed to prevent malicious effects of handling/processing) or otherwise re-encoded from its original representation.

gtback commented on August 20, 2024

Agreed. Relaxing the restriction is not a change I would consider for the 2.1 release. I like updating the annotation, though I think we can simply remove "malicious" or change it to "unintended".

gtback commented on August 20, 2024

After thinking more about this, changing "fixed" to "default" might be exactly what we need to do. There's a new proposal for this:

https://github.com/CybOXProject/schemas/wiki/Proposal:-Allow-different-datatypes-on-*ObjectPropertyTypes