Paging support? about fsharp.cosmosdb CLOSED

aaronpowell avatar aaronpowell commented on June 15, 2024
Paging support?

from fsharp.cosmosdb.

Comments (13)

aaronpowell avatar aaronpowell commented on June 15, 2024

I'm going to admit that I'm still wrapping my head around how the IAsyncEnumerable and AsyncSeq work, given we don't have the await foreach in F# that C# would tend to unpack.

Because of that, I'm not completely sure that AsyncSeq understands pagination the same way.

Do you have an example of what you're trying to do?

BrianVallelunga avatar BrianVallelunga commented on June 15, 2024

What I'm doing right now (with the v3 SDK) is this:

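open FSharp.Control          // asyncSeq / AsyncSeq, from the FSharp.Control.AsyncSeq package
open Microsoft.Azure.Cosmos  // FeedIterator<'a>, from the v3 SDK

// Drain a v3 FeedIterator into a single AsyncSeq, one page at a time.
let public fetchAllItems (feedIterator: FeedIterator<'a>) =
    asyncSeq {
        while feedIterator.HasMoreResults do
            // Each ReadNextAsync call fetches the next page of results.
            let! response = feedIterator.ReadNextAsync() |> Async.AwaitTask
            // Splice the items of this page into the outer sequence.
            yield! response |> AsyncSeq.ofSeq
    }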

This returns an AsyncSeq<'a> and seems to work, but I have my doubts if it is the "right" way. I believe someone helped me with the final yield! line.

BrianVallelunga avatar BrianVallelunga commented on June 15, 2024

For a bit more explanation, response is a FeedResponse<'a> which implements IEnumerable<'a>. response |> AsyncSeq.ofSeq then gives AsyncSeq<'a> which is then merged into the parent sequence via yield!.
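To make the difference concrete, here is a minimal sketch that uses an ordinary seq instead of asyncSeq and a hard-coded list of batches standing in for the pages a feed iterator would return (both are assumptions for illustration; no Cosmos SDK is involved). The same yield versus yield! distinction applies inside asyncSeq.

```fsharp
// Stand-in for the pages a feed iterator would return (made-up data).
let batches = [ [ 1; 2; 3 ]; [ 4; 5 ]; [ 6 ] ]

// `yield` emits each batch as one element: the result is a sequence of lists.
let nested =
    seq {
        for batch in batches do
            yield batch
    }

// `yield!` splices every element of the batch into the outer sequence.
let flattened =
    seq {
        for batch in batches do
            yield! batch
    }

printfn "%A" (List.ofSeq nested)    // [[1; 2; 3]; [4; 5]; [6]]
printfn "%A" (List.ofSeq flattened) // [1; 2; 3; 4; 5; 6]
```

With yield the caller would still have to unpack each batch; with yield! the caller sees one flat stream of items, which is what fetchAllItems returns.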

See: https://theburningmonk.com/2011/09/fsharp-yield-vs-yield/

aaronpowell avatar aaronpowell commented on June 15, 2024

Looking into the v3 SDK source and comparing it to v4, it looks like it works a bit differently. In v4 there isn't the FeedIterator<'T> that v3 used; instead it's AsyncPageable<'T>.

Now, digging through this a bit further, it turns out that this is really a wrapper over FeedIterator and hides away the paging unless you call AsyncPageable.AsPages(), where you provide the size of the pages you need.

So, I probably would have to have execPagedAsync where you can provide the right info and that could return it as an AsyncSeq then, but I'll have to play (trying to work out how to handle the Insert API presently).

seankearon avatar seankearon commented on June 15, 2024

Could you not just use OFFSET and LIMIT in your query "SQL" and remember what page index you're on? Or have I missed something here?

Edit: I believe that's the official CosmosDB approach for paging.

BrianVallelunga avatar BrianVallelunga commented on June 15, 2024

@seankearon I don't think these two types of paging are the same thing, though I may be wrong. The type I'm referring to is more akin to batching. The Cosmos SDK client won't return everything all at once. You have to continually call it to get the next batch of data. I'll take a look at what's in V4 when I get a chance.

seankearon avatar seankearon commented on June 15, 2024

Yeah, I'm wondering where the difference is between those two.

If an async stream is like using a drinking straw to drink from a lake, then paging/batching is like using a bucket to drink from a lake.

If I'm passing you the bucket to drink from, do you care whether I fill it all at once or in little steps using my drinking straw? Probably not - you just want the next bucket of water.

I'm thinking that the way to fill up bucket n using the straw would be something like this:

let usersByPage (page: int) =
    host
    |> Cosmos.host
    |> Cosmos.connect key
    |> Cosmos.database "UserDb"
    |> Cosmos.container "UserContainer"
    // xyz and pqr are placeholders for the offset and page size derived from `page`
    |> Cosmos.query "SELECT u.FirstName, u.LastName FROM u WHERE u.LastName = @name OFFSET xyz LIMIT pqr"
    // |> etc. etc. etc.
    |> Cosmos.execAsync<User>
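As a rough sketch of how the xyz and pqr placeholders could be filled in from a page index: pageClause and usersQuery are hypothetical helper names, not part of FSharp.CosmosDb, and the page index is assumed to be zero-based. The offset is simply page * pageSize.

```fsharp
// Hypothetical helper: turn a zero-based page index into an OFFSET/LIMIT clause.
let pageClause (pageSize: int) (page: int) =
    if pageSize <= 0 then invalidArg "pageSize" "pageSize must be positive"
    if page < 0 then invalidArg "page" "page must be zero or greater"
    // Skip all items on earlier pages, then take one page worth of items.
    sprintf "OFFSET %d LIMIT %d" (page * pageSize) pageSize

// Hypothetical helper: build the query text for one page of users.
let usersQuery (pageSize: int) (page: int) =
    sprintf
        "SELECT u.FirstName, u.LastName FROM u WHERE u.LastName = @name %s"
        (pageClause pageSize page)

printfn "%s" (usersQuery 20 0) // ends with: OFFSET 0 LIMIT 20
printfn "%s" (usersQuery 20 2) // ends with: OFFSET 40 LIMIT 20
```

The resulting string would go where the Cosmos.query text is above. For pages to be stable between calls, the query would most likely also need an ORDER BY, which this sketch leaves out.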

But then, it's been a loooong day! :)

aaronpowell avatar aaronpowell commented on June 15, 2024

I've finally had some time to come back and revisit this issue and work out if it's possible/viable to do pagination support.

TL;DR: Use OFFSET and LIMIT as @seankearon has suggested; I don't think I can put anything into the API to do it for you. The best I can do is have a way to get batched results per iteration of the AsyncSeq.

We're going to dive through a rabbit hole now, so choose if you want to read on as I'm partially writing this down for my own sake. I'm going to trace through a bunch of the Azure.Cosmos code as it currently stands.

When you execute a GET query you call the method GetItemQueryResultsAsync (Note: the Async suffix is added after -preview4, so my code doesn't use it, but it will eventually). This creates a FeedIteratorCore, which is what handles the ReadNextAsync operation to fetch records from CosmosDB.

This type is then wrapped in FuncAsyncPageable to return the AsyncPageable<Page<T>> response that is consumed by AsyncSeq in F# to make our nice API.

AsyncPageable, the base class of FuncAsyncPageable, has the AsPages and MoveNext methods defined on it, MoveNext being what is ultimately called by the state machine to go over the iterator (it bubbles through a few other types, but it's ultimately where we land). What's interesting about the implementation is that it actually calls AsPages anyway, so the AsPages method is our important one.

Our AsPages calls the func passed in here which is a call to GetPagesAsync on PageIteratorCore.

Now, if we trace through here, AsPages takes a continuationToken and pageSizeHint, but GetPagesAsync on our iterator only takes the continuationToken; the pageSizeHint is dropped along the way. My guess is that the pageSizeHint doesn't map to anything that is available on the CosmosDB REST API, so it can't be used and is discarded in turn.

So, what's the difference between iterating over the AsyncPageable<T> and IAsyncEnumerator<Page<T>> (how it currently works vs calling .AsPages)? Whether you get a single result or a batch of results. Page<T> has a Values property which will contain up to 100 T items that you need to unpack. This means it comes down to "do you want to work with a single result each iteration, or with a batch of items?" (nowhere can I find a "HasMore" property exposed; that's just determined by whether you keep iterating).

I tested this against a large Cosmos store I have with the following code:

async Task Main()
{
	var client = new CosmosClient("...");
	
	var container = client.GetDatabase("...").GetContainer("...");
	
	var qd = new QueryDefinition("SELECT c.id FROM c");

	var nonCount = 0;
	"Non-paged query".Dump();
	await foreach (var response in container.GetItemQueryIterator<Dictionary<string, string>>(qd))
	{
		nonCount++;
	}

	"Paged query".Dump();
	var pageCount = 0;
	await foreach (var response in container.GetItemQueryIterator<Dictionary<string, string>>(qd).AsPages())
	{
		pageCount++;
	}
	
	$"Non-Paged ran {nonCount} times to Paged {pageCount}".Dump();
}

And here's the response:

Non-paged query
Paged query
Non-Paged ran 3811 times to Paged 39

The iteration count dropped, and a network trace showed the same number of network requests happening in both cases.
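Those two counts are consistent with pages of up to 100 items each. Here is a small sketch of the arithmetic using in-memory data, where Seq.chunkBySize stands in for AsPages and every page except the last is assumed to be full (real page sizes may vary):

```fsharp
// Stand-in for the 3811 query results; no Cosmos SDK involved.
let items = Seq.init 3811 id

// Assumption: every page except the last holds exactly 100 items.
let pages = items |> Seq.chunkBySize 100

let itemIterations = Seq.length items // one iteration per item
let pageIterations = Seq.length pages // one iteration per page

// Flattening the pages again yields the same number of items.
let roundTrip = pages |> Seq.collect id |> Seq.length

printfn "Non-Paged ran %d times to Paged %d" itemIterations pageIterations
// Non-Paged ran 3811 times to Paged 39
```

38 full pages cover 3800 items and the last page holds the remaining 11, which gives 39 page iterations over the same data.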

I might look at putting in a queryBatch or something like that which returns AsyncSeq<Page<T>> to give feature parity with the underlying API.

BrianVallelunga avatar BrianVallelunga commented on June 15, 2024

Thanks for the detailed write-up. I'm waiting to use this on my project until the v4 Cosmos API SDK is fully released, but this looks great.

aaronpowell avatar aaronpowell commented on June 15, 2024

I've added a "pagination" option in a new branch: https://github.com/aaronpowell/FSharp.CosmosDb/tree/pagination

Basically all it does is add a new method, Cosmos.execBatchAsync, which returns AsyncSeq<Page<T>> so you can get some paged results, but it's not really paged due to what I mentioned above.

aaronpowell avatar aaronpowell commented on June 15, 2024

This will be coming in the next release.

aaronpowell avatar aaronpowell commented on June 15, 2024

If anyone wants to test, grab the 0.3.0 pre-release packages from https://github.com/aaronpowell?tab=packages&repo_name=FSharp.CosmosDb

aaronpowell avatar aaronpowell commented on June 15, 2024

Available on NuGet now.
