GithubHelp home page GithubHelp logo

Comments (1)

Simran-B avatar Simran-B commented on May 26, 2024

For a simple query like FOR doc IN s FILTER doc._key == "12345" RETURN doc, the profiling output looks like this:

Execution plan:
 Id   NodeType                    Site  Calls   Par   Items   Filtered   Runtime [s]   Comment
  1   SingletonNode               COOR      1     -       1          0       0.00000   * ROOT
  7   SingleRemoteOperationNode   COOR      1     -       1          0       0.00072     - FOR doc IN s FILTER doc.`_key` == "12345" /* primary index scan */
  5   ReturnNode                  COOR      1     -       1          0       0.00000     - RETURN doc

Indexes used:
 By   Name      Type      Collection   Unique   Sparse   Cache   Selectivity   Fields       Stored values   Ranges
  7   primary   primary   s            true     false    false      100.00 %   [ `_key` ]   [  ]            "12345"

Optimization rules applied:
 Id   Rule Name                                           Id   Rule Name                                  
  1   use-indexes                                          3   remove-unnecessary-calculations-2          
  2   remove-filter-covered-by-index                       4   optimize-cluster-single-document-operations

Query Statistics:
 Writes Exec      Writes Ign      Doc. Lookups      Scan Full      Scan Index      Cache Hits/Misses      Filtered      Requests      Peak Mem [b]      Exec Time [s]
           0               0                 0              0               1                  0 / 0             0             0                 0            0.00106

The optimize-cluster-single-document-operations rule is active and you don't find any nodes in the execution plan with DBS in the Site column, respectively no REMOTE/GATHER operations. Calls and Items are 1. The Requests count under Query Statistics is 0.

The explain output doesn't include Calls, Items, and Requests, but it does show you which optimizer rules are active and you can see whether the execution plan contains REMOTE/GATHER nodes.

The optimizer rule is not eligible if you filter for two different documents, like FOR doc IN s FILTER doc._key == "12345" OR doc._key == "54321" RETURN doc (not a single document operation). The profile shows REMOTE/GATHER, several Calls, and Requests is e.g. 9 in case of a collection with 3 shards, which tells you that all DB-Servers were at least contacted once.

Execution plan:
 Id   NodeType        Site  Calls   Par   Items   Filtered   Runtime [s]   Comment
  1   SingletonNode   DBS       3     -       3          0       0.00001   * ROOT
  7   IndexNode       DBS       3     0       2          0       0.00033     - FOR doc IN s   /* primary index scan, index scan + document lookup, 3 shard(s) */    
 10   RemoteNode      COOR      9     -       2          0       0.00005       - REMOTE
 11   GatherNode      COOR      4     -       2          0       0.00005       - GATHER   /* parallel, unsorted */
  5   ReturnNode      COOR      4     -       2          0       0.00001       - RETURN doc

Indexes used:
 By   Name      Type      Collection   Unique   Sparse   Cache   Selectivity   Fields       Stored values   Ranges
  7   primary   primary   s            true     false    false      100.00 %   [ `_key` ]   [  ]            (doc.`_key` IN [ "12345", "54324" ])

Optimization rules applied:
 Id   Rule Name                                 Id   Rule Name                                 Id   Rule Name                        
  1   replace-or-with-in                         4   remove-unnecessary-calculations-2          7   parallelize-gather               
  2   use-indexes                                5   scatter-in-cluster                         8   async-prefetch                   
  3   remove-filter-covered-by-index             6   remove-unnecessary-remote-scatter

Query Statistics:
 Writes Exec      Writes Ign      Doc. Lookups      Scan Full      Scan Index      Cache Hits/Misses      Filtered      Requests      Peak Mem [b]      Exec Time [s]
           0               0                 2              0               2                  0 / 0             0             9                 0            0.00330

The rule is also not applicable when you query a View, even when asking only for a single document.


When using custom shard keys (not _key but let's say sk), you can observe that the query explain output tells you the shard if you ask for a single shard key like FOR doc IN t FILTER doc.sk == 4 RETURN doc (shard: s8010016):

Execution plan:
 Id   NodeType                  Site  Par     Est.   Comment
  1   SingletonNode             DBS              1   * ROOT
  2   EnumerateCollectionNode   DBS        100000     - FOR doc IN t   /* full collection scan, shard: s8010016  */   FILTER (doc.`sk` == 4)   /* early pruning */
  8   RemoteNode                COOR        100000       - REMOTE
  9   GatherNode                COOR        100000       - GATHER   /* parallel, unsorted */
  5   ReturnNode                COOR        100000       - RETURN doc

When asking for multiple shards, like FILTER doc.sk IN [1, 4], all shards are contacted, however, even if the shard keys belong to a single shard (3 shard(s)):

Execution plan:
 Id   NodeType                  Site  Par     Est.   Comment
  1   SingletonNode             DBS              1   * ROOT
  2   EnumerateCollectionNode   DBS        100000     - FOR doc IN t   /* full collection scan, 3 shard(s)  */   FILTER (doc.`sk` IN [ 1, 4 ])   /* early pruning */
  8   RemoteNode                COOR        100000       - REMOTE
  9   GatherNode                COOR        100000       - GATHER   /* parallel, unsorted */
  5   ReturnNode                COOR        100000       - RETURN doc

You can also observe the difference in the profiling output by looking at the Calls and Requests, although you should add a LIMIT 1 to either get 3 or 9 requests (without, you get some higher number that is dependent on the number of documents, possibly also the batchSize, etc. and you would need to compare the stats).

Using an arangosearch View, search-alias View, or an inverted index, the profiling output shows similar numbers for requests even if asking only for a single shard key and document, so all shards appear to be contacted (for the inverted index, it does show 3 shard(s)).

from arangodb.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.