rwf2 / multer-rs

An async parser for multipart/form-data content-type in Rust

License: MIT License

Rust 100.00%
multipart-formdata multipart-uploads async rust multipart-parser

multer-rs's Introduction

multer-rs

An async parser for multipart/form-data content-type in Rust.

It accepts a Stream of Bytes as its source, so it can be plugged into any async Rust environment, e.g. any async server.

Docs

Install

Add this to your Cargo.toml:

[dependencies]
multer = "2.0"

Basic Example

use bytes::Bytes;
use futures::stream::Stream;
// Import multer types.
use multer::Multipart;
use std::convert::Infallible;
use futures::stream::once;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Generate a byte stream and the boundary from somewhere e.g. server request body.
    let (stream, boundary) = get_byte_stream_from_somewhere().await;

    // Create a `Multipart` instance from that byte stream and the boundary.
    let mut multipart = Multipart::new(stream, boundary);

    // Iterate over the fields, use `next_field()` to get the next field.
    while let Some(mut field) = multipart.next_field().await? {
        // Get field name.
        let name = field.name();
        // Get the field's filename if provided in "Content-Disposition" header.
        let file_name = field.file_name();

        println!("Name: {:?}, File Name: {:?}", name, file_name);

        // Process the field data chunks e.g. store them in a file.
        while let Some(chunk) = field.chunk().await? {
            // Do something with field chunk.
            println!("Chunk: {:?}", chunk);
        }
    }

    Ok(())
}

// Generate a byte stream and the boundary from somewhere e.g. server request body.
async fn get_byte_stream_from_somewhere() -> (impl Stream<Item = Result<Bytes, Infallible>>, &'static str) {
    let data = "--X-BOUNDARY\r\nContent-Disposition: form-data; name=\"my_text_field\"\r\n\r\nabcd\r\n--X-BOUNDARY--\r\n";
    let stream = once(async move { Result::<Bytes, Infallible>::Ok(Bytes::from(data)) });
    
    (stream, "X-BOUNDARY")
}

Prevent Denial of Service (DoS) Attacks

This crate also provides APIs for fine-grained control over input sizes to prevent potential DoS attacks. It's recommended to constrain field sizes (especially for text fields) so that an attacker cannot exhaust the server's memory.

An example:

use multer::{Multipart, Constraints, SizeLimit};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create some constraints to be applied to the fields to prevent DoS attack.
    let constraints = Constraints::new()
         // We only accept `my_text_field` and `my_file_field` fields,
         // For any unknown field, we will throw an error.
         .allowed_fields(vec!["my_text_field", "my_file_field"])
         .size_limit(
             SizeLimit::new()
                 // Set 15 MiB as the size limit for the whole stream body.
                 .whole_stream(15 * 1024 * 1024)
                 // Set 10 MiB as the size limit for each field.
                 .per_field(10 * 1024 * 1024)
                 // Set 30 KiB as the size limit for our text field only.
                 .for_field("my_text_field", 30 * 1024),
         );

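    // Build a sample stream; in a real server this would be the request body.
    let data = "--X-BOUNDARY\r\nContent-Disposition: form-data; name=\"my_text_field\"\r\n\r\nabcd\r\n--X-BOUNDARY--\r\n";
    let some_stream = futures::stream::once(async move {
        Ok::<bytes::Bytes, std::convert::Infallible>(bytes::Bytes::from(data))
    });

    // Create a `Multipart` instance from a stream and the constraints.
    let mut multipart = Multipart::with_constraints(some_stream, "X-BOUNDARY", constraints);

    // Propagate parse and constraint errors with `?` instead of panicking.
    while let Some(field) = multipart.next_field().await? {
        let content = field.text().await?;
        assert_eq!(content, "abcd");
    }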
   
    Ok(())
}

Usage with hyper.rs server

An example showing usage with hyper.rs.

For more examples, please visit examples.

Contributing

Your PRs and suggestions are always welcome.

multer-rs's People

Contributors

atouchet, davidpdrsn, dbrgn, fishrock123, jaysonsantos, jebrosen, jmjoy, kestrer, nickelc, paolobarbolini, rousan, sabrinajewson, sergiobenitez, teymour-aldridge

multer-rs's Issues

Unable to support utf8 file name

Filename: 你好.txt

curl 'localhost:8000' --form 'operations={"query": "mutation ($file: Upload!) { singleUpload(file: $file) { id } }","variables": { "file": null }}' --form 'map={ "0": ["variables.file"] }' --form '0=@你好.txt' 

Consider allowing whitespace after the boundary and before the CRLF

(This is a continuation of #24 )

Hello. I open this issue to ask if the following kind of boundary may be supported:

[Image: example payload, where the boundary is simple_boundary]

This boundary ends with SPACE+CRLF.

Currently I get the following error when trying to parse a payload with that boundary:

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: multer: Multipart stream is incomplete'

Is that kind of boundary standard compliant?

My current workaround is to make multer process an arbitrary boundary. It's not ideal, but it will work in the meantime.

For reference, I tried the multipart crate with a similar result, but Python's requests-toolbelt processes that boundary without issues.

Thanks :)
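
For reference, RFC 2046 (section 5.1.1) defines a boundary line as the delimiter followed by optional transport padding (linear whitespace) and then CRLF, so trailing spaces are allowed by the grammar. Below is a minimal, std-only sketch of a tolerant check. `BoundaryLine` and `classify_line` are hypothetical names used for illustration only; they are not part of multer's API and do not reflect how multer scans its buffer.

```rust
/// Classification of one line of a multipart body (CRLF already stripped).
#[derive(Debug, PartialEq)]
enum BoundaryLine {
    /// `--boundary` plus optional padding: another part follows.
    Delimiter,
    /// `--boundary--` plus optional padding: the body ends here.
    Close,
    /// Anything else, e.g. a header line or field data.
    NotBoundary,
}

/// Hypothetical helper: accept spaces or tabs between the boundary and the CRLF.
fn classify_line(line: &[u8], boundary: &str) -> BoundaryLine {
    // The line must start with `--` followed by the boundary itself.
    let rest = match line
        .strip_prefix(b"--")
        .and_then(|r| r.strip_prefix(boundary.as_bytes()))
    {
        Some(rest) => rest,
        None => return BoundaryLine::NotBoundary,
    };

    // A second `--` marks the closing delimiter.
    let (is_close, padding) = match rest.strip_prefix(b"--") {
        Some(p) => (true, p),
        None => (false, rest),
    };

    // Whatever remains must be transport padding only.
    if !padding.iter().all(|&b| b == b' ' || b == b'\t') {
        return BoundaryLine::NotBoundary;
    }

    if is_close {
        BoundaryLine::Close
    } else {
        BoundaryLine::Delimiter
    }
}
```

With this check, `--simple_boundary ` (note the trailing space) is classified as a delimiter, while `--simple_boundaryX` is treated as ordinary data.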

Support for async-std

I tried to use multer in an async-std application.

# Cargo.toml
[package]
name = "multer-async-std"
version = "0.1.0"
authors = ["Danilo <[email protected]>"]
edition = "2018"

[dependencies]
anyhow = "1"
async-h1 = "2"
async-std = { version = "1", features = ["attributes"] }
http-types = "2"
multer = "1.2"
// src/main.rs

use async_std::{
    net::{TcpListener, TcpStream},
    prelude::*,
    task,
};
use http_types::{headers, Response, Result, StatusCode};

#[async_std::main]
async fn main() -> anyhow::Result<()> {
    let listener = TcpListener::bind("127.0.0.1:8080").await?;

    let mut incoming = listener.incoming();
    while let Some(stream) = incoming.next().await {
        let stream = stream?;
        task::spawn(async {
            if let Err(e) = accept(stream).await {
                eprintln!("Error: {}", e);
            }
        });
    }

    Ok(())
}

async fn accept(stream: TcpStream) -> Result<()> {
    async_h1::accept(stream.clone(), |mut req| {
        async move {
            let content_type = req
                .header(headers::CONTENT_TYPE)
                .map(|ct| ct.last().as_str())
                .unwrap();
            let boundary = multer::parse_boundary(content_type).unwrap();
            let body = req.take_body();
            let multipart = multer::Multipart::new(body.bytes(), boundary);
            Ok(Response::new(StatusCode::Ok))
        }
    })
    .await?;
    Ok(())
}

This fails to compile because Multipart::new wants a stream of Bytes, not a stream of u8.

error[E0277]: the trait bound `bytes::bytes::Bytes: std::convert::From<u8>` is not satisfied
  --> src/main.rs:34:29
   |
34 |             let multipart = multer::Multipart::new(body.bytes(), boundary);
   |                             ^^^^^^^^^^^^^^^^^^^^^^ the trait `std::convert::From<u8>` is not implemented for `bytes::bytes::Bytes`
   |
  ::: /home/danilo/.cargo/registry/src/github.com-1ecc6299db9ec823/multer-1.2.2/src/multipart.rs:59:12
   |
59 |         O: Into<Bytes> + 'static,
   |            ----------- required by this bound in `multer::multipart::Multipart::new`
   |
   = help: the following implementations were found:
             <bytes::bytes::Bytes as std::convert::From<&'static [u8]>>
             <bytes::bytes::Bytes as std::convert::From<&'static str>>
             <bytes::bytes::Bytes as std::convert::From<bytes::bytes_mut::BytesMut>>
             <bytes::bytes::Bytes as std::convert::From<http::byte_str::ByteStr>>
           and 4 others
   = note: required because of the requirements on the impl of `std::convert::Into<bytes::bytes::Bytes>` for `u8`

I cannot use from_reader either since that's a Tokio thing.

What would be the appropriate way to use multer with async-std? (If I can get it to work, I can also contribute an example.)

New release v2.0.0 ?

@SergioBenitez, @jebrosen, are there any pending issues still to be solved? I think we should make a new release as soon as possible. Please let me know whether the master branch is stable, so I can make a new release.

Supporting unquoted name and filename broke semicolons in filenames

We rely on our customers uploading files with metadata in the filenames (please don't ask, yes we know this is a bad idea, yes we've considered all the other options). For separating different properties we decided on using ; because it's both very clear and supported in filenames on all major platforms.

I just tracked down a mysterious issue we were having while using async-graphql and it led me to this line https://github.com/rousan/multer-rs/blob/master/src/constants.rs#L38

It looks like in 677ce5c, support for filenames with semicolons in them was completely broken, which means the file doesn't even get through to our code and early-fails with a mysterious, not even graphql compliant error (the latter is on async-graphql though).

It would be great if this could somehow be fixed - until then we have to rely on pinning an older version of multer (as this breaking change was released with the 2.0.3 release which only exists on crates.io).
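
One way to support both unquoted values and semicolons inside quoted values is to treat the two forms separately: a quoted value runs to its closing quote, while an unquoted token stops at the next `;`. The following is a minimal sketch under that assumption. `parse_params` is a hypothetical helper, not multer's implementation (which is regex-based, per the linked line), and it deliberately ignores escapes and malformed input.

```rust
/// Hypothetical helper: extract `key=value` parameters from a
/// `Content-Disposition` value such as `form-data; name="a"; filename="b;c"`.
fn parse_params(value: &str) -> Vec<(String, String)> {
    let mut params = Vec::new();

    // Skip the disposition type (e.g. `form-data`) up to the first `;`.
    let mut rest = match value.find(';') {
        Some(i) => &value[i + 1..],
        None => return params,
    };

    loop {
        rest = rest.trim_start();
        let eq = match rest.find('=') {
            Some(i) => i,
            None => break,
        };
        let key = rest[..eq].trim().to_ascii_lowercase();
        rest = &rest[eq + 1..];

        let val;
        if let Some(quoted) = rest.strip_prefix('"') {
            // Quoted value: runs to the next `"`, so any `;` inside is kept.
            let end = quoted.find('"').unwrap_or(quoted.len());
            val = quoted[..end].to_string();
            rest = quoted.get(end + 1..).unwrap_or("");
        } else {
            // Unquoted token: runs to the next `;` or the end of the input.
            let end = rest.find(';').unwrap_or(rest.len());
            val = rest[..end].trim().to_string();
            rest = &rest[end..];
        }
        params.push((key, val));

        // Move past the `;` that separates this parameter from the next.
        match rest.find(';') {
            Some(i) => rest = &rest[i + 1..],
            None => break,
        }
    }

    params
}
```

Under this scheme `filename="a;b.txt"` keeps its semicolon, and `name=upload_file_minidump` is still accepted without quotes.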

reading from stdin?

hey there, thanks for the code!
I've been playing around for quite some time trying to get multer reading from stdin.
My lack of success is down to my own lack of knowledge, not any fault of multer-rs...

this is one quick GPT attempt.

use futures_util::stream::StreamExt; // Import StreamExt trait
use tokio::io::AsyncBufReadExt;
use multer::Multipart;
use tokio::io;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Read the stream from stdin.
    let stdin = io::stdin();
    let stream = io::BufReader::new(stdin).lines();

    // Define the boundary string (you may need to adjust this to match your input).
    let boundary = "X-BOUNDARY";

    // Create a `Multipart` instance from the byte stream and boundary.
    let mut multipart = Multipart::new(stream.compat(), boundary); // Use compat() to convert the stream

    // Iterate over the fields, use `next_field()` to get the next field.
    while let Some(field) = multipart.next_field().await? {
        // Get field name.
        let name = field.name();
        // Get the field's filename if provided in "Content-Disposition" header.
        let file_name = field.file_name();

        println!("Name: {:?}, File Name: {:?}", name, file_name);

        // Read field content as text.
        let content = field.text().await?;
        println!("Content: {:?}", content);
    }

    Ok(())
}
error[E0599]: no method named `compat` found for struct `tokio::io::Lines` in the current scope
  --> examples\simple_stdin.rs:16:47
   |
16 |     let mut multipart = Multipart::new(stream.compat(), boundary); // Use compat() to convert the stream
   |                                               ^^^^^^ method not found in `Lines<BufReader<Stdin>>`

I find reading from stdin complicated.
async_stdin's example works for me, but I don't know how to turn it into a proper Stream either.

use async_stdin::recv_from_stdin;

#[tokio::main]
async fn main() {
     let mut rx = recv_from_stdin(10);
     while let Some(s) = rx.recv().await {
         println!("Received: {}", s);
     }
}

Maybe this is easy for others? Any help would be appreciated, if I do come up with something I can provide an example.
thanks
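
The underlying mismatch is that `lines()` yields text lines (dropping the CRLFs the parser needs), whereas multer expects raw byte chunks. The following is a synchronous, std-only sketch of the chunking idea; `ChunkReader` is a hypothetical name. In a tokio program, an adapter such as `tokio_util::io::ReaderStream` wrapped around `tokio::io::stdin()` would likely play the same role, yielding `io::Result<Bytes>` items.

```rust
use std::io::{self, Read};

/// Hypothetical adapter: turns any reader into an iterator of raw byte
/// chunks of at most `chunk_size` bytes, preserving every byte (including CRLF).
struct ChunkReader<R> {
    reader: R,
    chunk_size: usize,
}

impl<R: Read> Iterator for ChunkReader<R> {
    type Item = io::Result<Vec<u8>>;

    fn next(&mut self) -> Option<Self::Item> {
        let mut buf = vec![0u8; self.chunk_size];
        match self.reader.read(&mut buf) {
            // A zero-length read signals end of input.
            Ok(0) => None,
            Ok(n) => {
                // Keep only the bytes that were actually read.
                buf.truncate(n);
                Some(Ok(buf))
            }
            Err(e) => Some(Err(e)),
        }
    }
}
```

For stdin, this could be constructed as `ChunkReader { reader: std::io::stdin(), chunk_size: 8192 }`; the chunk size here is an arbitrary choice.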

Remove `format!` allocations during parsing

At present, the parser uses format! to generate strings during parsing. These should be removed.

https://github.com/rousan/multer-rs/blob/fe853d64f3bb7763866bdaa90c617a11a6bde505/src/multipart.rs#L255

https://github.com/rousan/multer-rs/blob/fe853d64f3bb7763866bdaa90c617a11a6bde505/src/multipart.rs#L312

https://github.com/rousan/multer-rs/blob/fe853d64f3bb7763866bdaa90c617a11a6bde505/src/buffer.rs#L108

There are a couple of approaches:

  1. Change the scanning code so that the parts of the joined string are searched for individually in succession, i.e., find BOUNDARY_EXT and check whether the next bytes are the boundary.
  2. Cache the formatted strings in something like Storage. This is probably the easier approach.
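
The first approach can be sketched as follows. This is a naive, std-only illustration: `find_delimiter` is a hypothetical function, and the value of `BOUNDARY_EXT` is assumed to be `--` here. It looks for the extension bytes and then compares the following bytes against the boundary, so no joined string is ever built.

```rust
/// Assumed value of the boundary extension (the `--` prefix).
const BOUNDARY_EXT: &[u8] = b"--";

/// Hypothetical helper: index of the first `--` + `boundary` in `haystack`,
/// found without allocating a combined needle.
fn find_delimiter(haystack: &[u8], boundary: &[u8]) -> Option<usize> {
    let needle_len = BOUNDARY_EXT.len() + boundary.len();
    if haystack.len() < needle_len {
        return None;
    }

    // Try each candidate start; compare the two parts one after the other.
    (0..=haystack.len() - needle_len).find(|&i| {
        haystack[i..].starts_with(BOUNDARY_EXT)
            && haystack[i + BOUNDARY_EXT.len()..].starts_with(boundary)
    })
}
```

This scan is quadratic in the worst case; a real parser would probably use a faster substring search for the first part and keep the same two-step comparison.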

Doesn't support 'name=' without double quotes

.NET System.Net.Http sends form-data; name= without double quotes. e.g. form-data; name=upload_file_minidump. This is allowed by the HTTP spec, and double quotes are only marked as SHOULD when it contains any special characters (RFC 2183, section 2.? see dotnet/runtime#24932 (comment), dotnet/runtime#26585 (comment)).

This library doesn't handle this:

https://github.com/rousan/multer-rs/blob/541e5ea46d0bb3891cec0efeff27811f03e61995/src/constants.rs#L21

cc getsentry/symbolicator#544

Release 2.0.3

@rousan Could we get a release of the current tip? I've already bumped the version, so all we need is a cargo publish. If you'd like for me to handle this in the future, please feel free to add me as an owner on crates.io.

StreamBuffer can use unlimited memory if the underlying stream is always ready

This is probably quite hard to encounter in the real world with a real network; I hit this while testing with artificial streams made from futures::stream::repeat. I was trying to write a unit test showing that if the network dropped I was properly cleaning up half-written files. Unfortunately, I struggled to get my unit test to produce a half-written file because the entire thing was being buffered in memory.

I think the problem is the following function in buffer.rs. If self.stream never returns pending, self.buf.extend_from_slice(&data) keeps being called and more and more data is buffered. Breaking out of the loop once the buffer has reached some decent size might be a good idea? I'm not sure what "decent size" would be defined as here.

I fully understand if you don't agree that this is a problem; I appreciate that this is very much an edge case.

    pub fn poll_stream(&mut self, cx: &mut Context<'_>) -> Result<(), crate::Error> {
        if self.eof {
            return Ok(());
        }

        loop {
            match self.stream.as_mut().poll_next(cx) {
                Poll::Ready(Some(Ok(data))) => {
                    self.stream_size_counter += data.len() as u64;

                    if self.stream_size_counter > self.whole_stream_size_limit {
                        return Err(crate::Error::StreamSizeExceeded {
                            limit: self.whole_stream_size_limit,
                        });
                    }

                    self.buf.extend_from_slice(&data)
                }
                Poll::Ready(Some(Err(err))) => return Err(err),
                Poll::Ready(None) => {
                    self.eof = true;
                    return Ok(());
                }
                Poll::Pending => return Ok(()),
            }
        }
    }
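
The suggested early exit can be illustrated with a simplified, std-only model in which an iterator of `Poll` values stands in for the stream. `fill_buffer` and `max_buffered` are hypothetical names, and the cap is left to the caller since the issue does not define a "decent size".

```rust
use std::task::Poll;

/// Hypothetical model of the polling loop: drain `source` into `buf` until it
/// is pending, exhausted, or `buf` holds at least `max_buffered` bytes.
/// Returns `true` when the loop stopped because the cap was reached.
fn fill_buffer<I>(source: &mut I, buf: &mut Vec<u8>, max_buffered: usize) -> bool
where
    I: Iterator<Item = Poll<Vec<u8>>>,
{
    while buf.len() < max_buffered {
        match source.next() {
            // Data is ready: buffer it and check the cap again.
            Some(Poll::Ready(data)) => buf.extend_from_slice(&data),
            // Nothing more for now (or ever): stop without hitting the cap.
            Some(Poll::Pending) | None => return false,
        }
    }
    true
}
```

In a real `poll_stream`, stopping early would also need to make sure the task gets polled again once the consumer has drained the buffer; that part is not modeled here.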

No release entry for v3.0.0

It would be good to know what changed in v3.0.0 so that upgrading to the latest version is less uncertain.

Clarify Field.name in relation to the RFC

Hi there, this is not a bug per se, but I would like to clarify Field.name(), which returns an Option even though the linked doc says the name is mandatory in the headers of a multipart body part. Is it optional in multer because the same API can be used to represent both the multipart body's headers and the HTTP headers?

"Multipart stream is incomplete" when form data has leading newlines.

When trying to read fields from a Multipart whose original stream has leading data such as newlines, parsing fails with "Multipart stream is incomplete". This behavior differs from other web libraries, such as Python's Flask, where those leading newlines are ignored.

[Screenshot: example data where the issue happens]

This request was made by the UnityWebRequest, so it's a completely valid real world scenario where this issue could happen, and it should be supported.

Currently the solution is to skip the first 2 bytes of the stream before passing it to the field iterator, but this is not ergonomic, and in some situations (such as being combined with rocket), the stream becomes useless as the entire data had to be written from the AsyncRead to accomplish this.
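
A streaming version of that workaround only needs to touch the chunks that arrive before the first non-newline byte, so the rest of the stream can pass through untouched. Here is a synchronous, std-only sketch in which an iterator of chunks stands in for the async stream; `SkipLeadingNewlines` is a hypothetical adapter, not something multer provides.

```rust
/// Hypothetical adapter: drops CR/LF bytes at the very start of a chunked
/// byte source and passes every later chunk through unchanged.
struct SkipLeadingNewlines<I> {
    inner: I,
    /// Set once the first non-newline byte has been seen.
    started: bool,
}

impl<I: Iterator<Item = Vec<u8>>> Iterator for SkipLeadingNewlines<I> {
    type Item = Vec<u8>;

    fn next(&mut self) -> Option<Vec<u8>> {
        loop {
            let chunk = self.inner.next()?;
            if self.started {
                return Some(chunk);
            }
            match chunk.iter().position(|&b| b != b'\r' && b != b'\n') {
                // First real byte found: emit the chunk from there onwards.
                Some(start) => {
                    self.started = true;
                    return Some(chunk[start..].to_vec());
                }
                // The chunk was all newlines: drop it and keep looking.
                None => continue,
            }
        }
    }
}
```

Only the leading chunks are copied; once `started` is set, each chunk is forwarded as-is, so the whole body never has to be read up front.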

Couldn't work with bytes 1.0

error[E0277]: the trait bound `bytes::bytes::Bytes: From<bytes::Bytes>` is not satisfied
  --> src/handlers/my_handler.rs:82:25
   |
82 |     let mut multipart = Multipart::new(stream, boundary);
   |                         ^^^^^^^^^^^^^^ the trait `From<bytes::Bytes>` is not implemented for `bytes::bytes::Bytes`
   | 
  ::: /Users/user/.cargo/registry/src/mirrors.tuna.tsinghua.edu.cn-df7c3c540f42cdbd/multer-1.2.2/src/multipart.rs:59:12
   |
59 |         O: Into<Bytes> + 'static,
   |            ----------- required by this bound in `Multipart::new`
   |
   = help: the following implementations were found:
             <bytes::bytes::Bytes as From<&'static [u8]>>
             <bytes::bytes::Bytes as From<&'static str>>
             <bytes::bytes::Bytes as From<Vec<u8>>>
             <bytes::bytes::Bytes as From<bytes::bytes_mut::BytesMut>>
           and 4 others
   = note: required because of the requirements on the impl of `Into<bytes::bytes::Bytes>` for `bytes::Bytes`

Does this crate work with multipart/mixed ?

Hello.

I would like to know if this crate can work with multipart/mixed data. I am consuming a REST API whose response is in the following format:

--simple_boundary 
Content-type: application/json

[JSON HERE]
--simple_boundary 
Content-Disposition: form-data
Content-type: application/octet-stream
Content-Transfer-Encoding: base64
Content-ID:[ID]

[LONG B64 HERE]
--simple_boundary--

This is the content-type header:

"content-type": "multipart/mixed; boundary=\"simple_boundary\""

I am using reqwest 0.10.10 to send the request to the API and to get the response as a stream of bytes, as multer requires, but when calling next_field I get the following error:

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: multer: Multipart stream is incomplete'

So I am not sure whether multer fails on this response because its content-type is multipart/mixed.

Thank you for your help on this. All the best!

Not extracting `file_name` correctly if filename contains double quotes

Hi, thanks for this great library!

I'm uploading files via multipart/form-data within axum and see that if my filename contains double quotes, it's not properly parsed when using the file_name() method of the Field struct.

Example starting with "

Example sample code (modified from simple_example.rs)
use bytes::Bytes;
use futures_util::stream::Stream;
use multer::Multipart;
use std::convert::Infallible;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let (stream, boundary) = get_byte_stream_from_somewhere().await;
    let mut multipart = Multipart::new(stream, boundary);

    while let Some(field) = multipart.next_field().await? {
        println!("Name: {:?}, File Name: {:?}", field.name(), field.file_name());
        println!("Content: {:?}", field.text().await?);
    }

    Ok(())
}

async fn get_byte_stream_from_somewhere() -> (impl Stream<Item = Result<Bytes, Infallible>>, &'static str) {
    let data = "--X-BOUNDARY\r\nContent-Disposition: form-data; name=\"\"; filename=\"\"Exclusive Offer\"_ Last chance To Get microsoft-office-365.eml\"\r\n\r\nabcd\r\n--X-BOUNDARY--\r\n";
    let stream = futures_util::stream::iter(
        data.chars()
            .map(|ch| ch.to_string())
            .map(|part| Ok(Bytes::copy_from_slice(part.as_bytes()))),
    );

    (stream, "X-BOUNDARY")
}

The file name on the file system:

"Exclusive Offer"_ Last chance To Get microsoft-office-365.eml

Prints the following:

Name: Some(""), File Name: Some("")
Content: "abcd"

I would have expected the following output (or something similar to):

Name: Some("\"\""), File Name: Some("\"\"Exclusive Offer\"_ Last chance To Get microsoft-office-365.eml\"")
Content: "abcd"

Example containing "

The file name on the file system:

[URGENT] "Exclusive Offer"_ Last chance To Get microsoft-office-365.eml

Prints the following:

Name: Some(""), File Name: Some("[URGENT] ")
Content: "abcd"

I would have expected the following output (or something similar to):

Name: Some("\"\""), File Name: Some("\"[URGENT] \"Exclusive Offer\"_ Last chance To Get microsoft-office-365.eml\"")
Content: "abcd"

Looking in the codebase, it seems like the issue is within this file and I think is related to the FIXME comment https://github.com/rousan/multer-rs/blob/8746d3bd876ddfcc9df9cd1d30783a87873345a8/src/constants.rs#L35-L38

I'm fiddling around with that part of the code, but it's tricky getting this right.
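
Part of the difficulty is that raw, unescaped quotes inside a quoted value are ambiguous: a parser cannot tell an embedded quote from the closing one. The quoted-string grammar that RFC 2183 inherits does allow backslash escapes, and those can be handled deterministically. The sketch below assumes the client sends `\"` for embedded quotes; `read_quoted` is a hypothetical helper, not multer's code.

```rust
/// Hypothetical helper: read a quoted-string that starts at the opening `"`,
/// honoring backslash escapes. Returns the unescaped value and the rest of
/// the input after the closing quote, or `None` if the input is malformed.
fn read_quoted(input: &str) -> Option<(String, &str)> {
    let body = input.strip_prefix('"')?;
    let mut out = String::new();
    let mut chars = body.char_indices();

    while let Some((i, c)) = chars.next() {
        match c {
            // A backslash makes the next character literal.
            '\\' => out.push(chars.next()?.1),
            // An unescaped quote ends the value.
            '"' => return Some((out, &body[i + 1..])),
            _ => out.push(c),
        }
    }

    // The closing quote was never found.
    None
}
```

With escaped input such as `"\"Exclusive Offer\"_ Last.eml"`, the embedded quotes survive. With raw quotes, as in the examples above, the value still ends at the first inner quote, which matches the truncated `[URGENT] ` result reported here.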
