Comments (7)
When I set a limit of 1000, I can consistently get back the same number of results. If I set a limit of 25, then stuff similar to the above starts happening. In these cases I'm seeing situations where "next" is missing from the pager (usually indicating you've reached the end of your results) or the pager section is missing as well (need to dig into this in case there's a weird result being returned).
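For anyone trying to reproduce this, here is a minimal sketch of the paging loop being described. It assumes a Graph-API-style response with a "data" list and a "paging" section holding a "next" link. The function name, the fetch callable, and the fake pages are hypothetical stand-ins so the sketch runs without network access or credentials.

```python
from typing import Callable, Dict, Iterator, Optional


def iter_results(first_url: str, fetch: Callable[[str], Dict]) -> Iterator[Dict]:
    """Yield items page by page, following paging.next until it is absent."""
    url: Optional[str] = first_url
    while url:
        page = fetch(url)
        for item in page.get("data", []):
            yield item
        # A missing "paging" section and a missing "next" link both end the loop,
        # which is why either one silently truncates the result set.
        url = (page.get("paging") or {}).get("next")


# Hypothetical canned responses standing in for real API pages.
FAKE_PAGES = {
    "page1": {"data": [{"id": 1}, {"id": 2}], "paging": {"next": "page2"}},
    "page2": {"data": [{"id": 3}], "paging": {}},
}

results = list(iter_results("page1", FAKE_PAGES.__getitem__))
print(len(results))
```

If a page comes back without "next" before the real end of the results, a loop like this stops early, which would look like a varying result count between runs.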
from threatexchange.
What happens if you specify the since and until parameters?
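As a rough illustration, the query string for such a request might be built like this. The parameter names since, until, type, and limit come from this thread; passing the time window as Unix timestamps, the helper name, and the example dates are assumptions made for the sketch, not something confirmed here.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def build_query(indicator_type: str, since: datetime, until: datetime, limit: int) -> str:
    """Encode a time-bounded search as a URL query string (hypothetical helper)."""
    params = {
        "type": indicator_type,
        # Assumption: the API accepts Unix timestamps for the time window.
        "since": int(since.timestamp()),
        "until": int(until.timestamp()),
        "limit": limit,
    }
    return urlencode(params)


# Placeholder one-day window; the dates are arbitrary.
window_start = datetime(2015, 7, 1, tzinfo=timezone.utc)
window_end = datetime(2015, 7, 2, tzinfo=timezone.utc)
query = build_query("IP_ADDRESS", window_start, window_end, 1000)
print(query)
```

Bounding the window keeps the result set fixed between runs, which makes it easier to tell whether a changing count comes from new data or from paging.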
from threatexchange.
I have been testing the ThreatExchange APIs using both the /threat_indicators/get_indicators.py script and the pytx module. Tonight, if I run "./get_indicators.py --type IP_ADDRESS -o foo.txt", I get no results. If I run "./get_indicators.py --type IP_ADDRESS --limit 1000 -o foo.txt", I get thousands of results. I see the same behavior with pytx: a "ThreatIndicator.objects(type_='IP_ADDRESS', dict_generator=True)" call produces no results, but adding a limit parameter to the call makes it produce results. Swapping the indicator type, for example DOMAIN for IP_ADDRESS, does not change the outcome; DOMAIN searches also need a limit parameter to return results.
Since the pytx and get_indicators.py approaches make similar GET requests and get similar results, my assumption is that the problem may be on the Facebook side.
The behavior has varied over the last few days. When I first tested both approaches on the 30th and the 1st, I got some results from each without specifying a limit value, although, similar to your experience, the number of results varied between sequential runs.
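One way to make the run-to-run comparison repeatable is a small harness that runs the same query several times and records the elapsed time and result count. This is only a sketch: measure_runs and counts_consistent are hypothetical helpers, and the data source is a stand-in range so the code runs without credentials. The pytx call in the comment is the one quoted above with a limit added, and is not executed here.

```python
import time
from typing import Callable, Iterable, List, Tuple


def measure_runs(make_results: Callable[[], Iterable], runs: int = 4) -> List[Tuple[float, int]]:
    """Run the same query several times; return (seconds, count) for each run."""
    measurements = []
    for _run in range(runs):
        start = time.perf_counter()
        count = sum(1 for _item in make_results())
        measurements.append((time.perf_counter() - start, count))
    return measurements


def counts_consistent(measurements: List[Tuple[float, int]]) -> bool:
    """True when every run returned the same number of results."""
    return len({count for _seconds, count in measurements}) <= 1


# Stand-in data source. With pytx this might instead be something like:
#   lambda: ThreatIndicator.objects(type_='IP_ADDRESS', limit=1000, dict_generator=True)
measurements = measure_runs(lambda: iter(range(4678)))
for seconds, count in measurements:
    print(("time: ", seconds))
    print(("count: ", count))
print(counts_consistent(measurements))
```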
from threatexchange.
We found and fixed a bug today which was impacting the number of search results on ThreatExchange endpoints. Some results were being hidden. The fix will go live this weekend and should be on 100% of the production servers on Monday morning.
Can you please try these tests again and see if the problem still exists? I'm happy to keep digging if it is, but I'm hoping we nailed it.
from threatexchange.
Will do!
from threatexchange.
I apologize for this taking so long, but here are the results using pytx!
# Looping over ThreatDescriptor objects with the text "facebook.com"
# Four runs setting "limit" to 1000
('time: ', 8.108134984970093)
('count: ', 4678)
('time: ', 7.476968050003052)
('count: ', 4678)
('time: ', 8.544885873794556)
('count: ', 4678)
('time: ', 8.227381944656372)
('count: ', 4678)
# Four runs not setting limit
('time: ', 127.16218400001526)
('count: ', 4678)
('time: ', 91.82453489303589)
('count: ', 4678)
('time: ', 101.06061697006226)
('count: ', 4678)
('time: ', 96.52747583389282)
('count: ', 4678)
The consistent result count is perfect (although I have no way to verify that it's the actual count I should be seeing). The runtime with no limit varies more, but it agrees with our previous conclusion: the higher the limit we set, the better the performance, because fewer API calls are needed.
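The effect of the limit on the number of API calls can be estimated with simple arithmetic. The sketch below assumes each request returns exactly "limit" results until the data runs out; requests_needed is a hypothetical helper, and 25 is used only because it is the smaller limit mentioned earlier in this thread, not necessarily the value used when no limit is set.

```python
import math


def requests_needed(total_results: int, limit: int) -> int:
    """Number of paged requests needed to fetch total_results at a given page size."""
    return math.ceil(total_results / limit)


for limit in (25, 1000):
    print(limit, requests_needed(4678, limit))
# 25 188
# 1000 5
```

Under these assumptions, fetching 4678 results takes 188 round trips at a limit of 25 versus 5 at a limit of 1000, which is consistent with the large gap in the runtimes above.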
I don't know if anyone else is seeing the same improvement but I think the fix worked! Thanks for all of the hard work :)
from threatexchange.
Hooray! I still think we can improve the timing, but I think we can close this issue for now. If you start seeing this behavior again, please reopen the issue and let me know.
from threatexchange.