
Comments (7)

mgoffin avatar mgoffin commented on May 6, 2024

When I set a limit of 1000, I consistently get back the same number of results. When I set a limit of 25, behavior similar to the above starts happening. In those cases, either next is missing from the pager (which usually indicates you've reached the end of your results) or the pager section is missing entirely (I need to dig into this in case a weird result is being returned).
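To illustrate why a missing next or a missing pager section shows up as a short result count, here is a minimal sketch of a client-side paging loop. It is not the pytx implementation. The response shape (a "data" list plus a "paging" section with a "next" link) is an assumption based on the description above, and iter_results and make_fake_fetch are hypothetical names that page over an in-memory list instead of calling the API.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

Page = Dict[str, Any]


def iter_results(fetch: Callable[[Optional[str]], Page]) -> Iterator[Any]:
    """Yield every item across pages; stop when the pager has no 'next'."""
    url: Optional[str] = None
    while True:
        page = fetch(url)
        for item in page.get("data", []):
            yield item
        # A missing "paging" section is treated the same as a missing "next":
        # the loop cannot tell "end of results" from "pager was dropped".
        paging = page.get("paging") or {}
        url = paging.get("next")
        if not url:
            break


def make_fake_fetch(items: List[Any], limit: int) -> Callable[[Optional[str]], Page]:
    """Hypothetical stand-in for the API: pages over an in-memory list."""
    def fetch(url: Optional[str]) -> Page:
        start = int(url) if url else 0
        page: Page = {"data": items[start:start + limit]}
        if start + limit < len(items):
            # The "next" value here is just the next offset, not a real URL.
            page["paging"] = {"next": str(start + limit)}
        return page
    return fetch


results = list(iter_results(make_fake_fetch(list(range(60)), 25)))
print(len(results))
```

With a loop like this, any page that comes back without next ends the iteration early, so a smaller limit (more pages) gives more chances for the total to come up short.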

from threatexchange.

RyPeck avatar RyPeck commented on May 6, 2024

What happens if you specify the since and until parameters?


puhley avatar puhley commented on May 6, 2024

I have been testing the ThreatExchange APIs using both the /threat_indicators/get_indicators.py scripts and the pytx modules. Tonight, if I run "./get_indicators.py --type IP_ADDRESS -o foo.txt", I get no results. If I run "./get_indicators.py --type IP_ADDRESS --limit 1000 -o foo.txt", I get thousands of results.

I am seeing the same behavior with pytx. A "ThreatIndicator.objects(type_='IP_ADDRESS', dict_generator=True)" call does not produce any results; if I add a limit parameter to the call, it does. Using a different indicator type, such as swapping DOMAIN for IP_ADDRESS, does not change the outcome: DOMAIN searches also need a limit parameter in order to return results.

Since both the pytx and get_indicators.py approaches make similar GET requests and get similar results, my assumption is that this may be on the Facebook side. The behavior has varied over the last few days. When I first tested both approaches on the 30th and the 1st, I got some results from both without specifying a limit value, although, similar to your experience, the number of results varied between sequential runs.


jessek avatar jessek commented on May 6, 2024

We found and fixed a bug today which was impacting the number of search results on ThreatExchange endpoints. Some results were being hidden. The fix will go live this weekend and should be on 100% of the production servers on Monday morning.

Can you please try these tests again and see if the problem still exists? I'm happy to keep digging if it is, but I'm hoping we nailed it.


mgoffin avatar mgoffin commented on May 6, 2024

Will do!


mgoffin avatar mgoffin commented on May 6, 2024

I apologize for this taking so long, but here are the results using pytx!

# Looping over ThreatDescriptor objects with the text "facebook.com"

# Four runs setting "limit" to 1000
('time: ', 8.108134984970093)
('count: ', 4678)

('time: ', 7.476968050003052)
('count: ', 4678)

('time: ', 8.544885873794556)
('count: ', 4678)

('time: ', 8.227381944656372)
('count: ', 4678)

# Four runs not setting limit
('time: ', 127.16218400001526)
('count: ', 4678)

('time: ', 91.82453489303589)
('count: ', 4678)

('time: ', 101.06061697006226)
('count: ', 4678)

('time: ', 96.52747583389282)
('count: ', 4678)

The consistent result count is perfect (although I have no way to vet that it's the actual count I should be seeing). The runtime with no limit is more variable (roughly 92 to 127 seconds, versus about 8 seconds with a limit of 1000), but it agrees with our previous conclusion that the higher the limit we set, the better the performance will be, due to fewer API calls.
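As a rough back-of-the-envelope check on the "fewer API calls" point, the sketch below counts how many paged requests it would take to walk 4678 results at a few limit values. It assumes one request per page and nothing else; requests_needed is a hypothetical helper, and 25 is used only because it is the smaller limit mentioned earlier in this thread (the limit applied when none is set is not stated here).

```python
def requests_needed(total_results: int, limit: int) -> int:
    """Number of paged requests needed, assuming one request per page."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    # Integer ceiling division; even an empty result set costs one request.
    return max(1, -(-total_results // limit))


TOTAL = 4678  # result count reported in the runs above

for limit in (25, 100, 500, 1000):
    print(limit, requests_needed(TOTAL, limit))
```

Under those assumptions, a limit of 25 needs 188 requests and a limit of 1000 needs 5, which is the same direction as the timing gap above, although the timings will also depend on per-request latency and server-side work.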

I don't know if anyone else is seeing the same improvement but I think the fix worked! Thanks for all of the hard work :)


jessek avatar jessek commented on May 6, 2024

Hooray! I still think we can improve the timing, but I think we can close this issue for now. If you start seeing this behavior again, please reopen it and let me know.

