
Comments (4)

jmikola avatar jmikola commented on July 2, 2024

Tracking in PHPLIB-1206.

@fgm: Can you share some more details about the API you're actually trying to create atop PHPLIB's internal StreamWrapper? Specifically, I'm curious about how you'd intend to use it in an existing application and whether you expect the entire file (for both reads and writes) to be identified by a path string.

IIRC, the current design depends on passing a stream context to fopen() because it was the only way to provide both the CollectionWrapper (abstraction for the files and chunks collections) and other necessary options (file document for reads, filename and options to writes). Unlike other stream wrappers (Amazon S3 comes to mind), we could not rely on a path string alone.

from mongo-php-library.

fgm avatar fgm commented on July 2, 2024

Hello @jmikola. Thanks for answering so fast. The API is "just" full support for any file operation, without the caller having to pass a stream context. That is the Drupal core expectation for any stream wrapper: so, for example, being able to just do file_get_contents("$scheme://$path").

In the meantime, I read more of the library implementation, and AFAICS it would not be directly applicable, because it only handles the fopen modes r and w, while file_get_contents(), for example, uses mode rb. And that is without even considering all the combined modes like r+ that we support in Drupal stream wrappers. The existing S3 wrapper, for example, normalizes modes by dropping b and t, thus always using the implicit b modifier; and it has three kinds of stream: read, append, and write, all of which support reads. It supports operations in the middle of files because S3 supports Range reads/writes; for GridFS the equivalent would likely be to selectively access chunks instead of rebuilding the whole file on every operation.
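For what it's worth, the S3-style mode normalization mentioned above fits in a few lines. This is a hypothetical helper (not part of PHPLIB or the S3 wrapper), just to show that accepting rb, wt, r+b, etc. is cheap once the binary/text distinction is dropped:

```php
<?php
// Hypothetical mode normalization, mirroring what the AWS S3 stream
// wrapper does: drop the 'b' and 't' modifiers, since GridFS (like S3)
// is always binary, keeping only the base mode and any '+' flag.
function normalizeMode(string $mode): string
{
    return str_replace(['b', 't'], '', $mode);
}

normalizeMode('rb');  // 'r'
normalizeMode('r+b'); // 'r+'
normalizeMode('wt');  // 'w'
```

With this in place, stream_open() can switch on the normalized mode instead of rejecting anything that is not exactly "r" or "w".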

I likely haven't understood enough of the library at this point, but I do not see why a stream context was the only way to provide both the CollectionWrapper (abstraction for the files and chunks collections) and other necessary options (file document for reads, filename and options for writes). It seems to me that, assuming you have access to the underlying database, one could implement StreamWrapper::initReadableStream like this (just an idea, I don't pretend it works):

class StreamWrapper {
    // ...snip...
    protected static Database $database;

    protected CollectionWrapper $cw;

    public function getCollectionWrapper(): CollectionWrapper {
        if (!isset($this->cw)) {
            $this->cw = new CollectionWrapper(self::$database->getManager(), self::$database->getDatabaseName(), 'fs', []);
        }
        return $this->cw;
    }

    public function stream_open($path, $mode, $options, &$openedPath)
    {
        $this->initProtocol($path);
        $this->mode = $mode;

        if ($mode === 'r') {
            return $this->initReadableStream($path);
        }

        if ($mode === 'w') {
            return $this->initWritableStream($path);
        }

        return false;
    }

    private function initReadableStream(string $path)
    {
        $this->stream = new ReadableStream(
            $this->getCollectionWrapper(),
            $this->getCollectionWrapper()->findFileByFilenameAndRevision($path, 0), // Not sure how revisions work yet
        );

        return true;
    }
}

// ... and then initialize like this (in the Drupal case); other users get their database however they normally do.
StreamWrapper::$database = \Drupal::service('mongodb.database_factory')->get('files');
StreamWrapper::register();

Of course, in a real version, one would access the CollectionWrapper from the context if any, and only fall back to that mechanism when it is not present: that's just to give an idea.
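That context-first lookup with a static fallback could look roughly like this. The 'gridfs' context key matches how PHPLIB passes its options today (to my understanding); the fallback branch is hypothetical:

```php
<?php
// Sketch of a context-first lookup with a static fallback. Inside a real
// stream wrapper, $context would be the $this->context property that PHP
// populates when fopen() receives a stream context.
function resolveCollectionWrapper($context, $fallback)
{
    if (is_resource($context)) {
        $options = stream_context_get_options($context);
        // PHPLIB passes its options under the 'gridfs' protocol key.
        if (isset($options['gridfs']['collectionWrapper'])) {
            return $options['gridfs']['collectionWrapper'];
        }
    }

    // No context provided (e.g. a bare file_get_contents() call): fall
    // back to the statically configured database/bucket.
    return $fallback;
}

$ctx = stream_context_create(['gridfs' => ['collectionWrapper' => 'from-context']]);
resolveCollectionWrapper($ctx, 'fallback'); // 'from-context'
resolveCollectionWrapper(null, 'fallback'); // 'fallback'
```

That way, existing PHPLIB callers passing a context keep working unchanged, and only context-less callers hit the static state.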


jmikola avatar jmikola commented on July 2, 2024

because it only handles fopen modes r and w, while for example using file_get_contents() uses mode rb

The decision to only support the "r" and "w" modes is rather arbitrary, because the StreamWrapper is internal to the Bucket, which only uses those two modes. All reads and writes are implicitly binary, so there's no reason the "b" and "t" modifiers could not be accepted and ignored, as you mentioned they are in the S3 stream wrapper.

They support operations in the middle of files because S3 supports Range read/writes; for GridFS the equivalent is likely to selectively access chunks instead of rebuilding the whole file on every operation.

The current implementation in PHPLIB is based on the cross-driver GridFS spec, which discusses append operations under future work. At the time the spec was conceived, MongoDB did not implement transactions. It's possible this could now be revisited, although it will still be some time until all server versions supported by drivers also provide transaction support. In any event, I would not move forward with implementing something in PHPLIB directly without first addressing this in the spec, in order to ensure we cover as many edge cases as possible and don't end up with an API/design that eventually conflicts with the common driver implementation.

I do not see why a stream context was the only way to provide both the CollectionWrapper and other necessary options.

GridFS file identifiers can be arbitrary BSON types, which cannot be conveniently expressed in a file path. Embedding the BSON value as extended JSON or an encoded binary string (e.g. base64) is feasible, but that also wouldn't be very readable. I expect it also wouldn't play nicely with other libraries that intend to work with actual file paths using an arbitrary stream wrapper protocol. This is ultimately why we passed the BSON value via the stream context.

I'll note that the Bucket does produce file paths for uploaded and downloaded files, like:

gridfs://{databaseName}/{bucketName}/{some _id representation}

However, that's just for informational purposes (e.g. debugging, an exception message). The file path is not used internally for anything.
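To illustrate the point about identifiers: an ObjectId has an unambiguous 24-character hex form that survives in such a path, but an _id that is itself a BSON document or binary value does not. A hypothetical parser for the informational path format above (not anything PHPLIB actually does):

```php
<?php
// Hypothetical parser for the informational gridfs:// path format. This
// only works when the _id has an unambiguous string form (e.g. an
// ObjectId hex string); an _id that is itself a BSON document or binary
// value has no such readable representation, which is why PHPLIB passes
// the _id through the stream context instead of the path.
function parseGridfsPath(string $path): ?array
{
    if (preg_match('#^gridfs://([^/]+)/([^/]+)/(.+)$#', $path, $m) !== 1) {
        return null;
    }

    return ['database' => $m[1], 'bucket' => $m[2], 'id' => $m[3]];
}

parseGridfsPath('gridfs://mydb/fs/65a1b2c3d4e5f60718293a4b');
// ['database' => 'mydb', 'bucket' => 'fs', 'id' => '65a1b2c3d4e5f60718293a4b']
```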

As for the CollectionWrapper also being passed via the stream context, that was done to avoid having any static state on the stream wrapper. Using PHPLIB, applications can create any number of buckets and each file operation constructs a new stream wrapper instance. The library makes no assumptions about there being a singleton database connection, and we did not want to start doing so in a stream wrapper.

As much as I would have liked to provide a GridFS API like gridfs://database/collection/filename akin to many of the other stream abstractions I surveyed, it didn't seem possible at the time we originally implemented this.


I see you followed up with a suggested design in PHPLIB-1206, so I'll respond to that in the JIRA issue.


GromNaN avatar GromNaN commented on July 2, 2024

Hello @fgm, thanks for the additional context.

In your PR fgm/mongodb#76, I see MongoDB\GridFS\Bucket::openUploadStream() is already used in stream_write. You can also use MongoDB\GridFS\Bucket::openDownloadStreamByName() to implement stream_read.
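That suggestion could be sketched as follows. This is only an outline, not a working implementation: it assumes the mongodb/mongodb library, delegates to the public Bucket methods named above, and uses a static $bucket property (mirroring fgm's static-database idea) as one possible injection point, since PHP instantiates stream wrappers itself:

```php
<?php
// Sketch: a Drupal-style wrapper delegating to the public Bucket API
// (openDownloadStreamByName / openUploadStream) instead of PHPLIB's
// internal StreamWrapper. Error handling and most wrapper methods are
// omitted; openDownloadStreamByName() throws if the file is not found.
use MongoDB\GridFS\Bucket;

class DrupalGridfsWrapper
{
    public static Bucket $bucket;

    /** @var resource Underlying GridFS stream resource. */
    private $stream;

    public function stream_open(string $path, string $mode, int $options, ?string &$openedPath): bool
    {
        // "gridfs://dir/file.txt" -> "dir/file.txt"
        $filename = ltrim(substr($path, strpos($path, '://') + 3), '/');

        // Normalize 'rb', 'wt', etc. down to the base mode.
        switch (str_replace(['b', 't'], '', $mode)) {
            case 'r':
                $this->stream = self::$bucket->openDownloadStreamByName($filename);
                return true;

            case 'w':
                $this->stream = self::$bucket->openUploadStream($filename);
                return true;
        }

        return false;
    }

    public function stream_read(int $count): string
    {
        return fread($this->stream, $count);
    }

    public function stream_write(string $data): int
    {
        return (int) fwrite($this->stream, $data);
    }

    public function stream_eof(): bool
    {
        return feof($this->stream);
    }

    public function stream_close(): void
    {
        fclose($this->stream);
    }
}
```

Since the Bucket methods return plain stream resources, the remaining wrapper methods reduce to fread()/fwrite()/feof()/fclose() passthroughs.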

