Comments (26)

davidmoten avatar davidmoten commented on June 14, 2024 1

Re SMTP MIME format, see https://github.com/davidmoten/odata-client#download-an-email-in-smtp-mime-format

from odata-client.

davidmoten avatar davidmoten commented on June 14, 2024

Here's the documentation https://github.com/davidmoten/odata-client#delta-collections.

CollectionPage<Message> delta = OdataFactory
                    .request()
                    .users("[email protected]")
                    .mailFolders("inbox")
                    .messages()
                    .delta()
                    .get();

delta.stream()...

// a while later
delta = delta.nextDelta();
...

davidmoten avatar davidmoten commented on June 14, 2024

BTW, I think the first call to delta() lists all messages, so you might want to use .deltaTokenLatest() as per the documentation.

davidmoten avatar davidmoten commented on June 14, 2024

Is this question about serialization? If you serialize the CollectionPage<Message> to json it should have the @odata.deltaLink field in it.

madanbisht avatar madanbisht commented on June 14, 2024

Thanks for the response.
We are expecting millions of messages in a mail folder, and the service will not be able to process them all locally, so we are streaming the messages received from the Exchange server. Initially the service pulls k records (where k is the page size) and sends them to the agent, then fetches the next k records and sends those, and so on until it gets a 'deltaLink' in the response. When the service receives the 'deltaLink', it sends the records from that last page to the agent and stops pulling data.

Now we need to send the 'deltaLink' to the agent along with the last set of records (from the page containing the deltaLink). Can you please suggest how to fulfill this requirement?

davidmoten avatar davidmoten commented on June 14, 2024

You can use the CollectionPage.deltaLink() method. Note also that when you serialize a CollectionPage to JSON it will include the deltaLink field if present.

davidmoten avatar davidmoten commented on June 14, 2024

BTW in terms of page size you should also read https://github.com/davidmoten/odata-client#your-own-page-size.

madanbisht avatar madanbisht commented on June 14, 2024

Thanks David for the consideration,

Here we are using stream() (as given below), and it seems that the library automatically serializes only the mail message records, like collectionPage.currentPage(), since the data on the agent doesn't show the nextLink or deltaLink. Any suggestions here?


public Flux<Stream<Message>> GetMails()
{
       Stream<Message> collectionPage = OdataFactory
                    .request()
                    .users("[email protected]")
                    .mailFolders("inbox")
                    .messages()
                    .delta()
                    .maxPageSize(20)
                    .stream();

       return Flux.just(collectionPage);
}

davidmoten avatar davidmoten commented on June 14, 2024

Ah well, this boils down to how a Stream serializes, doesn't it? A Stream is not a List but it is like one for serialization purposes (the library you are using presumably does something predictable for Streams). Just imagine you take all the Message objects out of a CollectionPage object and add them to an ArrayList. What makes you think that the metadata will go across with it? It won't!

What's the maximum number of messages you want to go across in one call to getMails? It certainly won't be 20 with your current code, because the stream keeps fetching more pages. You could just return Flux<CollectionPage<Message>> if you were happy with returning one page per call. Then the serialization would include nextLink, and deltaLink (if at the last page). It's ugly to couple an API to an internal library's classes, though, so you could always create your own Page object and return Flux<Page<Message>>. Of course, if you only care about the JSON representation across the network then it doesn't matter.
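A minimal sketch of what such a custom page object might look like, so the API does not expose the library's own classes. The class name `MailPage` and its fields are assumptions made for illustration; they are not part of odata-client.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

// Hypothetical API-level page type; NOT an odata-client class.
final class MailPage<T> {
    private final List<T> items;
    private final String nextLink;  // null when no further pages exist
    private final String deltaLink; // null unless this is the last page

    MailPage(List<T> items, String nextLink, String deltaLink) {
        this.items = Collections.unmodifiableList(items);
        this.nextLink = nextLink;
        this.deltaLink = deltaLink;
    }

    List<T> items() { return items; }
    Optional<String> nextLink() { return Optional.ofNullable(nextLink); }
    Optional<String> deltaLink() { return Optional.ofNullable(deltaLink); }
    boolean isLast() { return nextLink == null; }
}

final class MailPageDemo {
    public static void main(String[] args) {
        // The link is a placeholder value, not a real Graph URL.
        MailPage<String> last = new MailPage<>(
                Arrays.asList("m1", "m2"), null, "https://example.invalid/delta");
        System.out.println(last.items() + " last=" + last.isLast()
                + " delta=" + last.deltaLink().orElse("none"));
    }
}
```

Serializing one such object per call keeps the link metadata next to the records it belongs to.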

BTW, you do realize that the Message object won't include email attachments, especially large ones, and there is weirdness with special attachments like reference attachments? If you want to capture the whole SMTP message reliably you can get it in MIME format as a stream from the Graph API using odata-client-msgraph. I can show you how if it is of interest.

madanbisht avatar madanbisht commented on June 14, 2024

Apologies, '.maxPageSize(20)' is not relevant in the function ('GetMails') shared above, as the stream will automatically pull pages until the deltaLink is received.

Agreed, we can pull all the records using 'nextPage().get()' until the deltaLink is received and finally add these records to a CustomPage along with the deltaLink. With this approach the response on the agent will have the records as well as the deltaLink. However, it requires storing all the records in a list locally. If a mail folder has millions of messages, this will hurt performance, and the service may not have enough memory to hold all of them.

This is why we are going for a stream. The only challenge with a stream is how to send the 'deltaLink' in the response to the agent, so that next time the agent can request only the incremental changes.
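To make the memory concern concrete, here is a rough sketch of forwarding one page at a time so that only a single page is held in memory, while still ending up with the deltaLink. `Batch`, `PageForwarder` and the `fetch` function are hypothetical stand-ins for the real paging calls, not odata-client API.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical stand-in for one page of results; NOT an odata-client type.
final class Batch {
    final List<String> messages;
    final String nextLink;  // non-null while more pages remain
    final String deltaLink; // non-null only on the final page

    Batch(List<String> messages, String nextLink, String deltaLink) {
        this.messages = messages;
        this.nextLink = nextLink;
        this.deltaLink = deltaLink;
    }
}

final class PageForwarder {
    /**
     * Fetches pages one by one and hands each to the agent before fetching
     * the next, so at most one page is held in memory at a time.
     * Returns the deltaLink of the final page, or null if there was none.
     */
    static String forwardAll(String firstLink,
                             Function<String, Batch> fetch,
                             Consumer<Batch> agent) {
        String link = firstLink;
        String deltaLink = null;
        while (link != null) {
            Batch batch = fetch.apply(link); // assumed to do the HTTP call
            agent.accept(batch);             // forward immediately
            deltaLink = batch.deltaLink;     // only set on the last page
            link = batch.nextLink;           // null ends the loop
        }
        return deltaLink;
    }
}
```

The trade-off compared with a single stream is that the agent sees page boundaries, which also gives it a natural point to record progress.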

davidmoten avatar davidmoten commented on June 14, 2024

Thanks for the detail, I'll give you an example of what to do shortly.

davidmoten avatar davidmoten commented on June 14, 2024

I'm adding a method with the signature

Stream<ObjectOrDeltaLink<T>> streamWithDeltaLink()

to CollectionPage<T>.

The approach to solve your problem is to take a stream of n Message objects and convert that to a stream of n+1 wrapper objects where the first n objects contain a Message and the last object has an Optional deltaLink (after all, not every stream has a deltaLink at the end).

When Stream<ObjectOrDeltaLink<Message>> is serialized you should see:

[
{ "object": MESSAGE_JSON, "deltaLink": null}, 
...
{ "object": MESSAGE_JSON, "deltaLink": null},
{ "object": null, "deltaLink": "https://blahblah" }
]

I'll finish tests and then let you try it out.
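For readers who want to see the n to n+1 idea in isolation, below is a small self-contained sketch using only the JDK. `ItemOrDeltaLink` and `withDeltaLink` are names invented for this illustration; the library's actual `ObjectOrDeltaLink` implementation may differ. The delta link is passed as a supplier on the assumption that it is only known once the items have been consumed.

```java
import java.util.Optional;
import java.util.function.Supplier;
import java.util.stream.Stream;

// Hypothetical wrapper mirroring the idea above; NOT the library's class.
final class ItemOrDeltaLink<T> {
    final T item;           // null on the trailing element
    final String deltaLink; // null on item elements (and if no link exists)

    private ItemOrDeltaLink(T item, String deltaLink) {
        this.item = item;
        this.deltaLink = deltaLink;
    }

    static <T> ItemOrDeltaLink<T> ofItem(T item) {
        return new ItemOrDeltaLink<>(item, null);
    }

    static <T> ItemOrDeltaLink<T> ofDeltaLink(String link) {
        return new ItemOrDeltaLink<>(null, link);
    }
}

final class DeltaLinkStreams {
    /** Turns n items into n+1 wrappers; the last one carries the delta link. */
    static <T> Stream<ItemOrDeltaLink<T>> withDeltaLink(
            Stream<T> items, Supplier<Optional<String>> deltaLink) {
        Stream<ItemOrDeltaLink<T>> head = items.map(x -> ItemOrDeltaLink.ofItem(x));
        // map() is lazy, so the supplier runs only after the head is exhausted.
        Stream<ItemOrDeltaLink<T>> tail = Stream.of(deltaLink)
                .map(s -> ItemOrDeltaLink.<T>ofDeltaLink(s.get().orElse(null)));
        return Stream.concat(head, tail);
    }

    public static void main(String[] args) {
        // Placeholder link value for demonstration only.
        withDeltaLink(Stream.of("m1", "m2"),
                () -> Optional.of("https://example.invalid/delta"))
                .forEach(e -> System.out.println(e.item + " | " + e.deltaLink));
    }
}
```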

davidmoten avatar davidmoten commented on June 14, 2024

I've merged the change into master. So now you can do this:

public Flux<ObjectOrDeltaLink<Message>> getMails()
{
       Stream<ObjectOrDeltaLink<Message>> stream = 
               client
                    .users("[email protected]")
                    .mailFolders("inbox")
                    .messages()
                    .delta()
                    .deltaTokenLatest()
                    .streamWithDeltaLink();

       return Flux.fromStream(stream);
}

Note the use of deltaTokenLatest (are you familiar with the effects of that option?) and you should not need to rebuild a client from a factory every call. A client built once can be used for the lifetime of an application and is threadsafe. Note also that you were returning a Flux using Flux.just and you should be using Flux.fromStream.

What library are you using to expose a Flux across the network (what library is doing the serialization of a Flux into JSON)? Is it WebFlux?

madanbisht avatar madanbisht commented on June 14, 2024

Thank you David, yes we are using WebFlux.
I hope the above code changes will be part of release 0.1.35. When will it be released?

davidmoten avatar davidmoten commented on June 14, 2024

I'll look at a release in the next day or so. In the meantime you can just do this to use the SNAPSHOT version locally:

git clone https://github.com/davidmoten/odata-client.git
cd odata-client
mvn clean install

madanbisht avatar madanbisht commented on June 14, 2024

Hi David,
Thanks a lot. We have tested, and the new code changes produce output as per our expectation.

Just a small request: can the method 'streamWithDeltaLink' also be part of 'CollectionPageNonEntityRequest' and 'CollectionEntityRequestOptionsBuilder'?

Otherwise we are OK with the current code changes as well. Below is the invocation:

public Flux<ObjectOrDeltaLink<Message>> getMails()
{
       Stream<ObjectOrDeltaLink<Message>> stream = 
               client
                    .users("[email protected]")
                    .mailFolders("inbox")
                    .messages()
                    .delta()
                    .get()
                    .streamWithDeltaLink();

       return Flux.fromStream(stream);
}

davidmoten avatar davidmoten commented on June 14, 2024

Glad to hear it works @madanbisht, thanks for testing the change. I've added the extra methods as requested and I'll build a release shortly.

davidmoten avatar davidmoten commented on June 14, 2024

0.1.35 is on Maven Central now.

madanbisht avatar madanbisht commented on June 14, 2024

Thank you, we are now using build 0.1.35 and it's working perfectly.

Just one small request: like the deltaLink, is it possible to add the nextLink to the response? It would be useful for handling the failure scenario below.

Consider a scenario where a mailbox has, let's say, 2 million mails. Using the stream we will be able to pull all mails, including the deltaLink (required for incremental messages).
If the connection breaks partway through, let's say after pulling 1 million mails, then the agent has no information about what data it has received. I think in this situation the agent has to make the request again and pull all the data, including mails which it has already received.

Adding the nextLink to the response will enable the agent to request only the remaining mails (which it hasn't received yet) using the nextLink.

davidmoten avatar davidmoten commented on June 14, 2024

Ha, getting complicated for you! To solve that problem I would call nextDelta more often. Then if you get a failure, you have fewer messages to repeat processing for. The other reason that you should call nextDelta more often is that the delta tokens themselves have limited lifetimes: they expire!

Are you trying to guarantee processing of every email? If so, then I doubt using deltas is an appropriate method, especially as delta tokens expire. At my work we guarantee processing of emails by pulling down unread emails from the mailbox and only marking them as read once they have been persisted to a queue for processing locally. There appears to be an index on the read/unread status, because performance is still good for this request even though the mailbox has grown a lot (in our case 2000 emails a day). Good luck with getting O365 to scale to millions of messages per day in one mailbox. Have you tested this? What's the plan for removing messages from the mailbox? If you are doing that anyway, why don't you just stream all messages from the mailbox and delete them when they are processed successfully?
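The "mark as read only after persisting" pattern described above can be sketched roughly as follows. The `Mailbox` interface and the in-memory implementation are hypothetical stand-ins; a real version would issue Graph requests for the unread filter and the read flag update.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

final class UnreadProcessor {
    // Hypothetical mailbox abstraction, assumed for this sketch only.
    interface Mailbox {
        List<String> unreadIds();
        void markRead(String id);
    }

    // In-memory stand-in so the sketch can run without a mail server.
    static final class InMemoryMailbox implements Mailbox {
        private final Set<String> unread;

        InMemoryMailbox(List<String> ids) { this.unread = new LinkedHashSet<>(ids); }

        public List<String> unreadIds() { return new ArrayList<>(unread); }

        public void markRead(String id) { unread.remove(id); }
    }

    /**
     * Persists each unread message and marks it read only afterwards.
     * If persist throws, the message stays unread and is retried on the
     * next run, which gives at-least-once processing.
     */
    static int drain(Mailbox mailbox, Consumer<String> persist) {
        int done = 0;
        for (String id : mailbox.unreadIds()) {
            persist.accept(id);   // may throw; nothing is marked read then
            mailbox.markRead(id); // reached only after a successful persist
            done++;
        }
        return done;
    }
}
```

Because a crash between the persist and the mark-read step replays the message, the downstream store has to tolerate duplicates.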

madanbisht avatar madanbisht commented on June 14, 2024

Hi David,
Actually we are not assuming millions of messages per day in one mailbox; in fact that's impossible. Below is the use case I was talking about.

The intent of our application is to retrieve mails from an account's mailbox and store them on disk, so that the mails can be restored back into the mailbox whenever required. If an account is 5 to 10 years old, it can have millions of messages, and our application needs to pull all of them.

Once all mails for an account have been retrieved successfully, the application will use the deltaLink to pull incremental mails in future. If the next (incremental) pull happens 1 year after the very first pull, then the account can again have millions of new mails.

madanbisht avatar madanbisht commented on June 14, 2024

One more concern, please correct me if I'm wrong: as you said that delta tokens have limited lifetimes, I think the deltaLink will no longer be usable after a week or a month.

davidmoten avatar davidmoten commented on June 14, 2024

There's not much out there but here's something that talks about delta links expiring:

https://stackoverflow.com/questions/51933002/syncstatenotfound-error-how-to-fix-or-avoid

and this says that they expire within 7 days:

http://www.msfttoday.com/duration-of-change-tracking-tokens-for-identity-and-education-resources/
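One way to act on token expiry is to store the time a deltaLink was saved and fall back to a full sync when it looks too old. The sketch below assumes the 7-day figure from the article linked above; that lifetime is not confirmed here for mail messages, and `DeltaCheckpoint` is a hypothetical name.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical checkpoint record; the lifetime below is an assumption.
final class DeltaCheckpoint {
    // Taken from the 7-day figure mentioned above; verify for your resource.
    static final Duration ASSUMED_MAX_AGE = Duration.ofDays(7);

    final String deltaLink;
    final Instant savedAt;

    DeltaCheckpoint(String deltaLink, Instant savedAt) {
        this.deltaLink = deltaLink;
        this.savedAt = savedAt;
    }

    /** True if the link is younger than the assumed lifetime. */
    boolean isProbablyUsable(Instant now) {
        return Duration.between(savedAt, now).compareTo(ASSUMED_MAX_AGE) < 0;
    }
}
```

A caller would check `isProbablyUsable(Instant.now())` before an incremental pull and should still be prepared for the server to reject a link that passed this check.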

madanbisht avatar madanbisht commented on June 14, 2024

Thank you David for sharing it; it has an impact on our use case.

Your suggestions on the above use case would also be appreciated, i.e. error handling (for connection errors) in scenarios where the application uses a stream to process millions of messages for an account (which is 5 to 10 years old).

davidmoten avatar davidmoten commented on June 14, 2024

> The intent of our application is to retrieve mails from an account mailbox and store it somewhere in a disk, so that the mails can be restored back into the mailbox whenever required. Considering a scenario where an account is 5 to 10 years old, the account can have millions of messages and our application need pull all these messages.

Use raw SMTP format as I've already commented, not Message JSON. That way you retain everything about the email, including all attachments no matter how big, and the SMTP headers. You'll need to confirm the practicality of this for restoring an email to an account.

> Once all mails for an account retrieved successfully, application will use deltalink to pull incremental mails in future. If the next pull(incremental) happens after 1 year from the last pull(very first pull) then again the account can have Millions of new mails.

The solution for your error handling scenario is to pull more often to reduce the stream size, to account for expiry of the deltaLink, and to handle duplicates sensibly. Duplication is inevitable, so make sure you account for it. Microsoft suggests using a webhook for notification of changes so you don't have to poll for them. Worth looking at too.
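As an illustration of handling duplicates sensibly, here is a minimal sketch of an idempotent sink keyed by message id. `IdempotentSink` is a hypothetical name, and the in-memory set is an assumption for brevity; a real backup would need to persist the set of seen ids.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical de-duplicating sink; ids are kept in memory only in this sketch.
final class IdempotentSink {
    private final Set<String> seenIds = new HashSet<>();
    private final Consumer<String> store;

    IdempotentSink(Consumer<String> store) {
        this.store = store;
    }

    /** Stores the message once; returns false if the id was already stored. */
    boolean accept(String messageId) {
        if (seenIds.contains(messageId)) {
            return false; // replayed after a retry or an overlapping pull
        }
        store.accept(messageId);
        seenIds.add(messageId); // recorded only after the store succeeded
        return true;
    }
}
```

With a sink like this, re-pulling a range after a broken connection costs bandwidth but does not create duplicate copies on disk.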

madanbisht avatar madanbisht commented on June 14, 2024

Thanks David for your kind support and understanding.

We are also interested in streaming the SMTP message from the Graph API using odata-client-msgraph. If required, we will create a new thread for it.
