
Comments (8)

aboodman commented on August 25, 2024

Here is an idea for how to proceed here:

  1. Create a representative benchmark that scans 20 MB of random 1 KB JSON objects and see where we currently are, both cold (using the purge command above) and warm.
  2. Put the same data in LevelDB and run the same test.
  3. Put the same data in Noms on the CLI and run the same test.
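Step 1 could start from something like the TypeScript sketch below. It is a hypothetical harness, not existing Replicache code: `makeDataset` builds random JSON objects that each serialize to exactly 1 KB (taking 20 MB to mean 20 × 1024 × 1024 bytes), and `timeScan` times a full drain of any `AsyncIterable`, so the same function could wrap a Replicache scan, a LevelDB iterator, or a Noms-backed reader. The cold-cache purge step is not included.

```typescript
// Hypothetical benchmark harness (not part of Replicache).
type Entry = { id: number; payload: string };

const ALPHABET = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

function randomString(length: number): string {
  const chars: string[] = [];
  for (let i = 0; i < length; i++) {
    chars.push(ALPHABET[Math.floor(Math.random() * ALPHABET.length)]);
  }
  return chars.join("");
}

// Builds an object whose JSON serialization is exactly `targetBytes` long.
// The payload uses only ASCII letters and digits, so characters == bytes and
// JSON.stringify adds no escape sequences.
function randomEntry(id: number, targetBytes: number): Entry {
  const overhead = JSON.stringify({ id, payload: "" }).length;
  return { id, payload: randomString(Math.max(0, targetBytes - overhead)) };
}

// totalBytes / objectBytes entries, keyed so lexicographic order matches id order.
function makeDataset(totalBytes: number, objectBytes: number): Map<string, Entry> {
  const data = new Map<string, Entry>();
  const count = Math.floor(totalBytes / objectBytes);
  for (let id = 0; id < count; id++) {
    data.set(`key/${String(id).padStart(8, "0")}`, randomEntry(id, objectBytes));
  }
  return data;
}

// Drains `scan()` completely and reports how many items it saw and how long it took.
async function timeScan(scan: () => AsyncIterable<unknown>): Promise<{ items: number; ms: number }> {
  const start = performance.now();
  let items = 0;
  for await (const _item of scan()) {
    items++;
  }
  return { items, ms: performance.now() - start };
}
```

For the 20 MB case, `makeDataset(20 * 1024 * 1024, 1024)` yields 20,480 entries, and an in-memory baseline can be timed with `timeScan(async function* () { yield* data.values(); })`.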

from replicache.

arv commented on August 25, 2024

I've been working on making scan return an AsyncIterable.

```typescript
class ScanResult implements AsyncIterable<JSONValue> { ... }

scan(options?: ScanOptions): ScanResult;
```
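To make the idea concrete, here is one way a `ScanResult` could satisfy `AsyncIterable<JSONValue>` by fetching results in pages. This is a sketch, not the actual implementation: `FetchPage` is a hypothetical stand-in for whatever request the real code sends (a key to resume after plus a `limit`), and `JSONValue` is defined locally. A page shorter than `pageSize` is treated as the end of the scan, so a page size larger than the number of entries fetches everything in one request.

```typescript
type JSONValue =
  | null
  | boolean
  | number
  | string
  | JSONValue[]
  | { [key: string]: JSONValue };

// Hypothetical transport: returns up to `limit` [key, value] entries whose keys
// sort strictly after `after` (or from the start when `after` is undefined).
type FetchPage = (after: string | undefined, limit: number) => Promise<Array<[string, JSONValue]>>;

class ScanResult implements AsyncIterable<JSONValue> {
  private readonly fetchPage: FetchPage;
  private readonly pageSize: number;

  constructor(fetchPage: FetchPage, pageSize: number) {
    this.fetchPage = fetchPage;
    this.pageSize = pageSize;
  }

  // Each for-await loop starts a fresh scan and pulls one page per request.
  // If the entry count is an exact multiple of pageSize, the final request
  // returns an empty page, which is how the end is detected.
  async *[Symbol.asyncIterator](): AsyncIterator<JSONValue> {
    let after: string | undefined;
    while (true) {
      const page = await this.fetchPage(after, this.pageSize);
      for (const [, value] of page) {
        yield value;
      }
      if (page.length < this.pageSize) {
        return; // a short (or empty) page means there is nothing left
      }
      after = page[page.length - 1][0];
    }
  }
}
```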

One consequence is that the transaction escapes query if we return the ScanResult from query (instead of draining the async iterator inside query):

```typescript
const iter = await rep.query(tx => tx.scan());
for await (const v of iter) { // does _invoke with the tx ID from the tx in the query
  console.log(v);
}
```

If we do this, the async iterator uses the transaction ID to read more values, but as things are currently structured, we close the transaction at the end of the async query function, before the iterator is consumed.

My proposed solution is to ref count the transaction and, when the count drops to 0, call _invoke with 'closeTransaction'. We could also use WeakRef, but it is not widely supported yet, and there is no guarantee that the referenced object is ever garbage collected.

@aboodman Feedback wanted: is this worth it?


aboodman commented on August 25, 2024

I don't quite follow the refcounting or WeakRef ideas, but it is important that the tx has a well-defined lifetime. Once we have GC implemented inside the datastore, we need to know what is safe to collect. Tx lifetimes are part of this: we can't collect anything that an open tx can still read.

In other words, the scan iterator should stop working if it escapes the tx. Later on, we might decide to have a different tx API, like:

```
tx = replicache.read()
tx.scan()
tx.close()
```

Even in that case, scan must stop working once its associated tx has closed.
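One way to enforce this is to have every scan iterator re-check its transaction on each step and fail once the transaction is closed. The sketch below uses a hypothetical in-memory `ReadTransaction` with `scan()` and `close()`, mirroring the `read()`/`scan()`/`close()` shape above; none of these names are an existing Replicache API.

```typescript
// Hypothetical in-memory read transaction over a sorted snapshot.
class ReadTransaction {
  private closed = false;
  private readonly snapshot: ReadonlyArray<readonly [string, unknown]>;

  constructor(snapshot: ReadonlyArray<readonly [string, unknown]>) {
    this.snapshot = snapshot;
  }

  close(): void {
    this.closed = true;
  }

  // The iterator checks the transaction before producing each value, so an
  // iterator that outlives its transaction fails instead of reading data the
  // datastore may already have collected.
  async *scan(): AsyncGenerator<unknown> {
    for (const [, value] of this.snapshot) {
      if (this.closed) {
        throw new Error("scan used after its transaction was closed");
      }
      yield value;
    }
  }
}
```

The same check covers the escaping case from earlier in the thread: if query closes the transaction and returns the iterator, the first read outside query fails instead of silently reusing a dead transaction ID.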


aboodman commented on August 25, 2024

To be more precise than "the scan iterator should stop working if it escapes the tx": the scan iterator should stop working after the associated tx closes.


arv commented on August 25, 2024

There is really no guarantee that closeTransaction gets called.

The API we currently have allows scan to be used outside of query. That works fine, since we close the implicit transaction when the iterator is closed. The problem arises when we use scan inside query, because query manages the transaction and closes it as soon as query is done.

What I'm suggesting is to keep track of the iterators and close the transaction only once query is done and all of its iterators are done.

This is easiest to do with ref counting (not to be confused with the ref counting used for memory management).
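A minimal sketch of that bookkeeping, assuming a hypothetical `onClose` callback that stands in for the `_invoke` call sending `closeTransaction`: `query` holds one reference while its callback runs, each iterator created through `track` holds another until it finishes, and the transaction closes when the count reaches zero. `RefCountedTransaction`, `track`, and this `query` are illustrative names, not the real code.

```typescript
// Hypothetical ref-counted transaction handle. The count tracks who still needs
// the transaction open; it is not memory-management ref counting.
class RefCountedTransaction {
  private refs = 0;
  private closed = false;
  private readonly onClose: () => void;

  constructor(onClose: () => void) {
    this.onClose = onClose;
  }

  acquire(): void {
    if (this.closed) throw new Error("transaction already closed");
    this.refs++;
  }

  release(): void {
    if (this.refs === 0) throw new Error("release without a matching acquire");
    this.refs--;
    if (this.refs === 0) {
      this.closed = true;
      this.onClose(); // stands in for sending closeTransaction
    }
  }

  // Takes a reference immediately (so query's own release cannot close the
  // transaction first) and drops it when the iterator finishes, is exited
  // early with break, or throws. Caveat: an iterator that is never started
  // never runs its finally block, so its reference is never released.
  track<T>(source: AsyncIterable<T>): AsyncGenerator<T> {
    this.acquire();
    const release = () => this.release();
    return (async function* () {
      try {
        yield* source;
      } finally {
        release();
      }
    })();
  }
}

// Hypothetical query wrapper: holds one reference while `body` runs.
async function query<R>(
  tx: RefCountedTransaction,
  body: (tx: RefCountedTransaction) => R | Promise<R>,
): Promise<R> {
  tx.acquire();
  try {
    return await body(tx);
  } finally {
    tx.release();
  }
}
```

The caveat in `track` is the same limitation raised above: nothing guarantees closeTransaction is sent if a caller abandons an iterator without ever iterating it.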


arv commented on August 25, 2024

Sorry, my comments should probably have been on #30.


arv commented on August 25, 2024

I did some performance testing.

I had a DB of 1000 key-value pairs, each value around 50 characters when serialized to JSON.

Setting the page size to > 2000 makes us fetch all scan items in one go:

Total time to scan: 45 ms (according to JS)

02 Jul 20 16:19:17.8586 -0700 DBG rpc --> data={} db=perf gr=18 req=openTransaction rid=2049
02 Jul 20 16:19:17.8587 -0700 DBG rpc <-- cid=7y8QzWw7idnhCy5AhSHUBG db=perf dur=0.167184 gr=18 req=openTransaction rid=2049
02 Jul 20 16:19:17.8672 -0700 DBG rpc --> data="{\"transactionId\":22,\"prefix\":\"\",\"limit\":10000}" db=perf gr=18 req=scan rid=2050
02 Jul 20 16:19:17.8888 -0700 DBG rpc <-- cid=7y8QzWw7idnhCy5AhSHUBG db=perf dur=21.604977 gr=18 req=scan rid=2050

From openTransaction to the end of scan: 30 ms
Actual time in Dispatch: 22 ms
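The 30 ms and 22 ms figures can be re-derived from the four log lines above. These hypothetical helpers (not part of Replicache's tooling) sum the `dur=` fields on the `rpc <--` response lines, assumed to be milliseconds since they add up to about 22 ms, and measure the wall-clock span between the first and last timestamps.

```typescript
// Hypothetical helpers for reading the Go RPC debug log shown above.

// Sums the `dur=` field of every response line (`rpc <--`).
// The unit is assumed to be milliseconds.
function dispatchMs(log: string): number {
  let total = 0;
  for (const line of log.split("\n")) {
    if (!line.includes("rpc <--")) continue;
    const match = /\bdur=([\d.]+)/.exec(line);
    if (match) total += Number(match[1]);
  }
  return total;
}

// Milliseconds between the first and last HH:MM:SS.ffff timestamps in the log,
// assuming all lines fall on the same day.
function elapsedMs(log: string): number {
  const times: number[] = [];
  for (const line of log.split("\n")) {
    const m = /(\d{2}):(\d{2}):(\d{2}(?:\.\d+)?)/.exec(line);
    if (m) {
      times.push((Number(m[1]) * 3600 + Number(m[2]) * 60 + Number(m[3])) * 1000);
    }
  }
  return times.length < 2 ? 0 : times[times.length - 1] - times[0];
}
```

Fed the four log lines, `dispatchMs` returns about 21.77 (0.167184 + 21.604977) and `elapsedMs` about 30.2.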

The two HTTP requests took:
Duration: 9.07 ms (7.55 ms network transfer + 1.51 ms resource loading)
Duration: 34.43 ms (27.11 ms network transfer + 7.32 ms resource loading)

The self JS time for the parts is 0.19 ms + 0.44 ms + 1.56 ms = 2.19 ms.

Summary

At this point, Go takes up about 50% of the time and HTTP the other 50%.


arv commented on August 25, 2024

If I change the page size to 100 (i.e. 10 scan requests):

The total time is 176 ms according to JS.

According to the Go log, openTransaction through the last scan takes 167 ms.

The total time spent in JS is now 91 ms.

Actual time spent in repm.Dispatch: 30 ms

There is a lot of queueing and waiting for responses. One example request:

[Image: one example request]
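A rough way to compare the two runs, assuming total scan time grows linearly with the number of scan requests (a simplification, not something measured here): going from 1 to 10 requests added 176 - 45 = 131 ms, or about 14.6 ms per extra round trip, while time in Dispatch only grew from 22 ms to 30 ms. `fitLinear` and `requestsFor` are illustrative helpers.

```typescript
// Hypothetical back-of-the-envelope model: totalMs ≈ baseMs + perRequestMs * requests.
// Fitting a line through two runs is a simplification, not a measurement.
function fitLinear(
  requestsA: number, totalMsA: number,
  requestsB: number, totalMsB: number,
): { baseMs: number; perRequestMs: number } {
  const perRequestMs = (totalMsB - totalMsA) / (requestsB - requestsA);
  return { baseMs: totalMsA - perRequestMs * requestsA, perRequestMs };
}

// Number of scan requests (pages) needed for `items` entries at a given page size.
function requestsFor(items: number, pageSize: number): number {
  return Math.ceil(items / pageSize);
}
```

With the numbers from this thread, `fitLinear(1, 45, 10, 176)` gives `perRequestMs` of about 14.6 and `baseMs` of about 30.4, and `requestsFor(1000, 100)` is 10.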
