
Comments (12)

jessek commented on May 5, 2024

These malware samples are indeed huge, tens of megabytes each. Our current code can't handle creating a JSON blob that large and is failing.

from threatexchange.

theCatWisel commented on May 5, 2024

Just an idea for a temporary workaround:

How about automatically removing the sample field from the response (on the server side) if the sample is larger than XX MB?
I mean just until we find a way to deliver larger samples.
This would keep clients from receiving HTTP 500 errors.

jessek commented on May 5, 2024

We could remove the field, or replace it with an error message. For example, for a request with

fields=status,sample

We could return either:

{
    "status": "MALICIOUS",
    "sample": "This sample is too large to return",
    "id": "1234456789"
}

or

{
    "status": "MALICIOUS",
    "id": "1234456789"
}

Personally, I would favor the former. My concern is that people will assume the field is present in the data being returned. Silently omitting it could lead to problems. What do you think?
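
To make the trade-off concrete, here is a minimal Python sketch of how client code might behave under each option. The two dictionaries mirror the example responses above; the helper names `read_sample_strict` and `read_sample_tolerant` are hypothetical and are not part of any ThreatExchange client.

```python
# Hypothetical responses mirroring the two options discussed above.
WITH_MESSAGE = {
    "status": "MALICIOUS",
    "sample": "This sample is too large to return",
    "id": "1234456789",
}
WITHOUT_FIELD = {
    "status": "MALICIOUS",
    "id": "1234456789",
}


def read_sample_strict(response):
    """Client code that assumes the 'sample' field is always present."""
    return response["sample"]


def read_sample_tolerant(response):
    """Client code that treats a missing 'sample' field as 'no sample'."""
    return response.get("sample")


if __name__ == "__main__":
    # Option 1: the strict client keeps working, but it receives a
    # message string where it expected sample data.
    print(read_sample_strict(WITH_MESSAGE))

    # Option 2: the tolerant client gets None; the strict client would
    # raise KeyError on the same response.
    print(read_sample_tolerant(WITHOUT_FIELD))
```

Either way the client has to handle the "too large" case explicitly: with the message, by recognizing that the value is not a sample; with the omission, by checking that the key exists.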

mgoffin commented on May 5, 2024

I would vote for not returning the sample field at all if a sample's size is above a certain threshold. If there's no way to download it, an error message is about as useful as the sample just not being available from a programmatic perspective.

Currently there's no "size" field returned by malware objects, but if there were one, it would at least provide some context as to how large the sample is. It would even allow you to note in the documentation that samples above a specific size will not be available for download (at least until there's a way to do so?).

jessek commented on May 5, 2024

We can certainly add a field for the sample size. Should this be the size of the ZIP file you'll download, or the actual file size? The former will be used for the cutoff of what can be downloaded, but I suspect the latter would be more useful during analysis. Thoughts?

mgoffin commented on May 5, 2024

Hmm, that's a really good question. Both would be useful, the latter probably more so from the perspective of an analyst or someone looking to pull down metadata about the sample. I'd probably vote for the latter myself and maybe just document the ZIP size threshold so people are aware. There's no reliable way to relate uncompressed size to compressed size, given the different data types and content.

jessek commented on May 5, 2024

How about a compromise? Fields for sample_size and sample_size_compressed? We'd put in the documentation "If the compressed sample size is larger than 25MB, the sample field will be omitted."
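
The proposed rule could be sketched on the server side roughly as follows. This is only an illustration in Python, not the actual ThreatExchange implementation: the function `build_response`, the shape of `record`, and reading "25MB" as 25 × 1024 × 1024 bytes are all assumptions.

```python
# Assumption: "25MB" is read as 25 MiB; the real cutoff may be defined differently.
MAX_COMPRESSED_BYTES = 25 * 1024 * 1024


def build_response(record, requested_fields):
    """Return the requested fields, omitting 'sample' when the ZIP is too large.

    `record` is a hypothetical dict holding every stored field for one
    malware object, including 'sample_size_compressed'.
    """
    # The example responses in this thread always include the id.
    response = {"id": record["id"]}
    for field in requested_fields:
        too_large = record["sample_size_compressed"] > MAX_COMPRESSED_BYTES
        if field == "sample" and too_large:
            continue  # omit the payload, keep the size fields
        if field in record:
            response[field] = record[field]
    return response


if __name__ == "__main__":
    record = {
        "id": "1234456789",
        "status": "MALICIOUS",
        "sample": "<zip payload placeholder>",
        "sample_size": 71777777,
        "sample_size_compressed": 71755555,
    }
    fields = ["status", "sample", "sample_size", "sample_size_compressed"]
    print(build_response(record, fields))
```

Because the size fields are still returned, a client can tell that a sample exists and how large it is even when the payload itself is left out.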

mgoffin commented on May 5, 2024

That would work for me!

theCatWisel commented on May 5, 2024

Sounds like a good solution and would work for me too. :-)

jessek commented on May 5, 2024

The compromise solution is now live! Using the example from the top of the issue, I ran a query just now for:

/1068651733168127?fields=md5,sample,sample_size,sample_size_compressed

and got something like the following (actual values obfuscated):

{
    "md5": "3269e9fde81f7ea4e538ba595f77f52f",
    "sample_size": 71777777,
    "sample_size_compressed": 71755555,
    "id": "1068651733168127"
}

If this works for you, please close out the issue.
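
On the client side, a response like the one above could be handled along these lines. This is a hedged Python sketch: `sample_status` is a hypothetical helper, and it assumes the caller requested both `sample` and `sample_size_compressed`, so a missing `sample` next to a present size means the payload was omitted.

```python
import json

# The example response from this comment (values obfuscated in the original).
RESPONSE_TEXT = """
{
    "md5": "3269e9fde81f7ea4e538ba595f77f52f",
    "sample_size": 71777777,
    "sample_size_compressed": 71755555,
    "id": "1068651733168127"
}
"""


def sample_status(obj):
    """Classify a malware object as (state, compressed_size_or_None).

    Assumes 'sample' and 'sample_size_compressed' were both requested.
    """
    if "sample" in obj:
        return ("available", obj.get("sample_size_compressed"))
    if "sample_size_compressed" in obj:
        # Size is known but the payload is missing: it was omitted.
        return ("omitted", obj["sample_size_compressed"])
    return ("unknown", None)


if __name__ == "__main__":
    state, size = sample_status(json.loads(RESPONSE_TEXT))
    print(state, size)
```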

mgoffin commented on May 5, 2024

Nice! I'll add that to pytx :)

(Sent by email in reply to Jesse Kornblum's comment above, dated Wednesday, December 30, 2015.)

theCatWisel commented on May 5, 2024

Nice! Thx a lot!
