GithubHelp home page GithubHelp logo

Comments (16)

snow2zhou avatar snow2zhou commented on May 8, 2024 4

Maybe you can use function GeneratePdf() to get byte[] first:
byte[] data = document.GeneratePdf();
Then you can transfer byte[] to stream:
Stream stream = new MemoryStream(data);
Finally, you can return a FileStreamResult to user :
FileStreamResult actionresult = new FileStreamResult(stream, "application/pdf");
actionresult.FileDownloadName = "myPDF.pdf";

from questpdf.

DerAlbertCom avatar DerAlbertCom commented on May 8, 2024 3

Here an Example:

public class QuestPdfResult : ActionResult
{
    private readonly IDocument _document;
    private readonly string _filename;


    public QuestPdfResult(IDocument document, string filename)
    {
        _document = document;
        _filename = filename;
    }

    public override async Task ExecuteResultAsync(ActionContext context)
    {
        var httpContext = context.HttpContext;
        var streamManager = httpContext.RequestServices.GetRequiredService<RecyclableMemoryStreamManager>();

        using var memoryStream = streamManager.GetStream();
        _document.GeneratePdf(memoryStream);

        httpContext.Response.ContentType = "application/pdf";
        httpContext.Response.Headers.ContentDisposition = $"attachment; filename=\"{_filename}\"";
        memoryStream.Position = 0;
        await memoryStream.CopyToAsync(httpContext.Response.Body);
    }
}

you have to Add the RecyclableMemoryStreamManager to DI.

services.AddSingleton<RecyclableMemoryStreamManager>();

from questpdf.

casperOne avatar casperOne commented on May 8, 2024 1

@MarcinZiabek Thanks for the prompt response.

That does work, but as mentioned, I'm looking to avoid allocations by writing to an intermediate stream (the new MemoryStream allocation is the same as returning to a new byte array).

I suspect that it may be due to the fact that the GeneratePdf overload that writes to a stream does not do so async, and that is a requirement if a stream is being written to directly in ASP.NET core.

It would be great if this was supported, but I can make do with copying for now.

from questpdf.

MarcinZiabek avatar MarcinZiabek commented on May 8, 2024

Hello,

I think that you are incorrectly utilizing the streaming capability within the asp mvc. Please try this and let me know if it works 😁

public async Task<ActionResult> DownloadReport()
{
    await using var stream = new MemoryStream();
    myReport.GeneratePdf(stream);
    return File(stream, "application/pdf", "myReport.pdf");
}

from questpdf.

MarcinZiabek avatar MarcinZiabek commented on May 8, 2024

That does work, but as mentioned, I'm looking to avoid allocations by writing to an intermediate stream (the new MemoryStream allocation is the same as returning to a new byte array).

Can you please provide more details and justification for why it may work this way? So far, I have been using this method to stream large files up to several gigabytes (from blob storage, through asp core, to the client) and never saw any significant memory allocations, in contrast to using just a byte array. Maybe I just miss some important detail here...

from questpdf.

MarcinZiabek avatar MarcinZiabek commented on May 8, 2024

I suspect that it may be due to the fact that the GeneratePdf overload that writes to a stream does not do so async, and that is a requirement if a stream is being written to directly in ASP.NET core.

That is correct, QuestPDF does not support async operations. The only place where the async pattern would be useful is streaming images. However, because of the layouting algorithm, the library needs to know the exact size of the image at the very beginning of the process (which usually require just loading the image into memory) and then keep it to nearly the very end when the PDF file is produced with SkiaSharp. Alternatively, the library would need to load the image into memory twice which is no good either.

from questpdf.

casperOne avatar casperOne commented on May 8, 2024

That does work, but as mentioned, I'm looking to avoid allocations by writing to an intermediate stream (the new MemoryStream allocation is the same as returning to a new byte array).

Can you please provide more details and justification for why it may work this way? So far, I have been using this method to stream large files up to several gigabytes (from blob storage, through asp core, to the client) and never saw any significant memory allocations, in contrast to using just a byte array. Maybe I just miss some important detail here...

I have been able to get around my use case by rendering to a local array; because of the size of the PDFs, I worry about allocations building up on the LOH.

from questpdf.

MarcinZiabek avatar MarcinZiabek commented on May 8, 2024

I have been able to get around my use case by rendering to a local array; because of the size of the PDFs, I worry about allocations building up on the LOH.

Can you please provide more information? I was thinking about this use case and indeed, you may be right, the stream API does not help much when used with the MomeryStream object. It would be great to improve the library to reduce its memory consumption and GA pressure :)

from questpdf.

casperOne avatar casperOne commented on May 8, 2024

@MarcinZiabek The documents that I'm generating are currently ~1.6 MB, so greater than 85K which is required to go on the LOH.

In a high-throughput situation, this can easily cause problems, as the LOH:

  • Does not get collected until generation 1 and 2 are collection
  • Is not always compacted (which leads potential difficulty with further allocations on the LOH later on)

Source:

https://docs.microsoft.com/en-us/dotnet/standard/garbage-collection/large-object-heap#when-is-a-large-object-collected

from questpdf.

MarcinZiabek avatar MarcinZiabek commented on May 8, 2024

This is truly an interesting case, thank you for sharing this link! I am planning to read more about this concept. We also need to make sure that the core platform operates in a similar fashion. It would be great to have any real benchmarks showing this is indeed a problem.

  1. The overall size of the PDF document is a result of embedded fonts. It is possible to manually perform font subsetting to reduce PDF file size. It can be done by any online tool. It creates a new font file with only required glyphs. This may reduce PDF file size tenfold or more. I am slowly working on the process of performing this automatically.
  2. You can also reduce the file size by using image compression - by default, QuestPDF uses PNG file format.

I am still not sure if optimizing this streaming access will result in any significant results at this point. When you generate the document with images, those images need to be present in memory during the entire generation process. So, by average, we can reduce LOH overhead only by half in the most optimistic scenario. And this is assuming that SkiaSharp does not introduce any buffering or workload on its own.

I am not against any optimisation. In fact, any help is always more than welcome 😁

from questpdf.

schulz3000 avatar schulz3000 commented on May 8, 2024

I can reproduce the issue with HttpContext.Response.Body stream.
The problem is that the underlying Stream implementation does not support the Position property.
But the SkiaSharp implementation call this property in SkiaSharp.SKManagedWStream.OnBytesWritten

I think there are two possible solutions.

  • throw an exception early if we detect an input stream has set CanSeek to false
  • write a wrapper to provide the Position property to SkiaSharp

I will provide a PR for the wrapper solution.

from questpdf.

MarcinZiabek avatar MarcinZiabek commented on May 8, 2024

Thank you for providing this solution and improvement. I will analyse it as soon as I can. I understand your explanation.

Is there any possibility that the wrapper idea may introduce any bugs in other areas? After all:

  1. It hides real position property.
  2. It disables/removes some stream capabilities (e.g. CanRead, CanSeek, etc.) that may be required by SkiaSharp in the future (e.g. for performance optimization scenarios).

from questpdf.

Usergitbit avatar Usergitbit commented on May 8, 2024

Is this fixed? Trying to pass HttpContext.Response.Body as the stream still throws a null reference exception for me.

from questpdf.

MarcinZiabek avatar MarcinZiabek commented on May 8, 2024

I will take a look within a couple of days. Maybe something was incorrectly merged. Thank you for rising my attention!

from questpdf.

MarcinZiabek avatar MarcinZiabek commented on May 8, 2024

I decided to roll back changes developed in #65 in Quest 2022.2.5.

Indeed, in some cases it allows streaming directly to the Response.Body stream. However, this also introduces a significant risk. When an exception is thrown, the QuestPDF library attempts to close and dispose the SkiaSharp SkDocument object. This involves disposing the managed stream provided as an argument. If the provided stream happens to be Response.Body, a momery security issue happens and the FatalException is thrown. This is unaccaptable on production environments.

In other cases, the solution works but the "Synchronous operations are disallowed. Call WriteAsync or set AllowSynchronousIO to true instead" exception is thrown - this is caused by ASP.NET. Basically, it excepts that when we directly stream to the response stream, we should use the stream.WriteAsync method. Unfortunatelly, the SkiaSharp calls just the stream.Write method internally and this cannot be easily changed.

It is possible that I do not fully understand the whole problem. For now, I want to mitigate the risk. If anyone wants to help me analyse this problem with more details and on all environments, please do 😁

from questpdf.

DerAlbertCom avatar DerAlbertCom commented on May 8, 2024

You can use https://github.com/Microsoft/Microsoft.IO.RecyclableMemoryStream as MemoryStream Replacement for less LOH Fragmentation.

from questpdf.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.