Requesting "huge" samples causes an HTTP 500 - Internal Error.

HTTP 500 Error - Internal Error for huge samples (threatexchange, closed)

facebook commented on July 27, 2024

HTTP 500 Error - Internal Error for huge samples

from threatexchange.

Comments (12)

jessek commented on July 27, 2024

These malware samples are indeed huge, tens of megabytes each. Our current code can't handle creating a JSON blob that large and is failing.

theCatWisel commented on July 27, 2024

Just an idea as a temporary measure:

How about automatically removing the sample field from the response (on the server side) if the sample size is over XX MB?
I mean just until we find a way to deliver bigger samples.
This would prevent clients from receiving HTTP 500 errors.

jessek commented on July 27, 2024

We could remove the field, or replace it with an error message. For example, for a request with

fields=status,sample

We could return either:

{
"status": "MALICIOUS",
"sample": "This sample is too large to return",
"id": "1234456789"
}

or:

{
"status": "MALICIOUS",
"id": "1234456789"
}

Personally, I would favor the former. My concern is that people will assume the field is present in the data being returned. Silently omitting it could lead to problems. What do you think?
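
To make the two options concrete, here is a minimal sketch of how a server could shape the response either way. The names `shape_response`, `max_bytes`, and `placeholder` are hypothetical and not part of ThreatExchange, and the size limit is a parameter because no threshold has been fixed at this point in the thread.

```python
# Hypothetical sketch: not the actual ThreatExchange server code.
SAMPLE_TOO_LARGE = "This sample is too large to return"


def shape_response(record, requested_fields, max_bytes, placeholder=True):
    """Return the requested fields of `record`, guarding the sample field.

    If the sample is larger than `max_bytes`, either substitute an error
    message (placeholder=True) or leave the field out (placeholder=False).
    """
    response = {"id": record["id"]}
    for field in requested_fields:
        if field == "sample" and len(record.get("sample", "")) > max_bytes:
            if placeholder:
                response["sample"] = SAMPLE_TOO_LARGE
            continue  # never copy the oversized sample itself
        if field in record:
            response[field] = record[field]
    return response


record = {"id": "1234456789", "status": "MALICIOUS", "sample": "A" * 100}
print(shape_response(record, ["status", "sample"], max_bytes=10))
print(shape_response(record, ["status", "sample"], max_bytes=10, placeholder=False))
```

With the placeholder, a client that reads `sample` unconditionally gets a string rather than a missing key, but it then has to tell the message apart from real sample data.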

mgoffin commented on July 27, 2024

I would vote for not returning the sample field at all if a sample's size is above a certain threshold. If there's no way to download it, an error message is about as useful as the sample just not being available from a programmatic perspective.

Currently there's no "size" field returned by malware objects, but if there was one it would at least provide some context as to how large the sample is. It would even allow you to note in the documentation that samples above a specific size will not be available for download (at least until there's a way to do so?).

jessek commented on July 27, 2024

We can certainly add a field for the sample size. Should this be the size of the ZIP file which you'll download, or the actual file size? The former will be used for the cutoff of what can be downloaded, but I suspect the latter would be more useful during analysis. Thoughts?

mgoffin commented on July 27, 2024

Hmm, that's a really good question. Both of those would be useful, the latter probably more so from the perspective of an analyst or someone looking to pull down metadata about the sample. I'd probably vote for the latter myself and maybe just document the ZIP size threshold so people are aware. There's no reliable way to derive the compressed size from the uncompressed size, since the ratio depends on the data type and content.
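
A quick way to see why the two sizes can't be derived from each other: the sketch below compresses two inputs of identical length with Python's `zlib` (DEFLATE, used here as a stand-in for ZIP compression, which is an assumption about how the samples are packaged). The helper name `compressed_size` is just for illustration.

```python
import os
import zlib


def compressed_size(data: bytes) -> int:
    """Return the number of bytes `data` occupies after DEFLATE compression."""
    return len(zlib.compress(data))


size = 1_000_000
zeros = bytes(size)       # highly repetitive content, compresses very well
noise = os.urandom(size)  # random content, essentially incompressible

print("uncompressed:", size)
print("zeros compressed:", compressed_size(zeros))
print("noise compressed:", compressed_size(noise))
```

Both inputs have the same uncompressed size, yet their compressed sizes differ by orders of magnitude.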

jessek commented on July 27, 2024

How about a compromise? Fields for sample_size and sample_size_compressed? We'd put in the documentation "If the compressed sample size is larger than 25MB, the sample field will be omitted."

mgoffin commented on July 27, 2024

That would work for me!

theCatWisel commented on July 27, 2024

Sounds like a good solution and would work for me too. :-)

jessek commented on July 27, 2024

The compromise solution is now live! Using the example from the top of the issue, I ran a query just now for:

/1068651733168127?fields=md5,sample,sample_size,sample_size_compressed

and got something like the following (actual values obfuscated):

{
    "md5": "3269e9fde81f7ea4e538ba595f77f52f",
    "sample_size": 71777777,
    "sample_size_compressed": 71755555,
    "id": "1068651733168127"
}

If this works for you, please close out the issue.
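
On the client side, a caller might handle the omitted field along these lines. This is only a sketch: `sample_status` is a hypothetical helper (not part of pytx), and reading "25MB" as 25,000,000 bytes is an assumption, since the thread does not say whether decimal or binary megabytes are meant.

```python
# Assumption: "25MB" means decimal megabytes; the thread does not specify.
MAX_COMPRESSED_BYTES = 25 * 1000 * 1000


def sample_status(response):
    """Classify a malware response as 'available', 'too_large', or 'missing'."""
    if "sample" in response:
        return "available"
    compressed = response.get("sample_size_compressed")
    if compressed is not None and compressed > MAX_COMPRESSED_BYTES:
        return "too_large"  # omitted because of the documented size cutoff
    return "missing"        # absent for some other reason (e.g. not requested)


response = {
    "md5": "3269e9fde81f7ea4e538ba595f77f52f",
    "sample_size": 71777777,
    "sample_size_compressed": 71755555,
    "id": "1068651733168127",
}
print(sample_status(response))  # too_large
```

Requesting `sample_size_compressed` alongside `sample` is what lets the caller distinguish an oversized sample from a field that is absent for another reason.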

mgoffin commented on July 27, 2024

Nice! I'll add that to pytx :)

(Sent as an email reply on Wednesday, December 30, 2015, quoting Jesse Kornblum's comment above.)

theCatWisel commented on July 27, 2024

Nice! Thx a lot!
