Comments (4)
Tracking in PHPLIB-1206.
@fgm: Can you share some more details about the API you're actually trying to create atop PHPLIB's internal StreamWrapper? Specifically, I'm curious about how you'd intend to use it in an existing application and whether you expect the entire file (for both reads and writes) to be identified by a path string.
IIRC, the current design depends on passing a stream context to fopen() because it was the only way to provide both the CollectionWrapper (abstraction for the files and chunks collections) and other necessary options (file document for reads, filename and options to writes). Unlike other stream wrappers (Amazon S3 comes to mind), we could not rely on a path string alone.
from mongo-php-library.
Hello @jmikola. Thanks for answering so fast. The API is "just" full support for any file operation, without the caller having to pass a stream context. That is the Drupal core expectation for any stream wrapper. So, for example, being able to just do a file_get_contents("$scheme://$path").
In the meantime, I read more of the library implementation, and AFAICS it would not be directly applicable, because it only handles fopen modes r and w, while for example using file_get_contents() uses mode rb. And that is without even considering all the combined modes like r+ we support in Drupal stream wrappers. The existing S3 wrapper, for example, normalizes modes by dropping b and t, thus actually always using the implicit b modifier; and they have 3 kinds of stream: read, append, and write; with all of them supporting reads. They support operations in the middle of files because S3 supports Range read/writes; for GridFS the equivalent is likely to selectively access chunks instead of rebuilding the whole file on every operation.
I likely haven't understood enough of the library at this point, but I do not see why "it was the only way to provide both the CollectionWrapper (abstraction for the files and chunks collections) and other necessary options (file document for reads, filename and options to writes)". It seems to me that, assuming you have access to the underlying database, one can implement StreamWrapper::initReadableStream like this (just an idea, I don't pretend it works):
    class StreamWrapper {
        // ...snip...

        // Public so that it can be assigned from outside the class (see below).
        public static Database $database;
        protected CollectionWrapper $cw;

        public function getCollectionWrapper(): CollectionWrapper {
            if (!isset($this->cw)) {
                $this->cw = new CollectionWrapper(self::$database->getManager(), self::$database->getDatabaseName(), 'fs', []);
            }
            return $this->cw;
        }

        public function stream_open($path, $mode, $options, &$openedPath)
        {
            $this->initProtocol($path);
            $this->mode = $mode;
            if ($mode === 'r') {
                return $this->initReadableStream($path);
            }
            if ($mode === 'w') {
                return $this->initWritableStream($path);
            }
            return false;
        }

        private function initReadableStream(string $path)
        {
            $this->stream = new ReadableStream(
                $this->getCollectionWrapper(),
                $this->getCollectionWrapper()->findFileByFilenameAndRevision($path, 0), // Not sure how revisions work yet
            );
            return true;
        }
    }

    // and then initialize like this (in the Drupal case); other users get their database how they normally do.
    StreamWrapper::$database = \Drupal::service('mongodb.database_factory')->get('files');
    StreamWrapper::register();
Of course, in a real version, one would access the CollectionWrapper from the context if any, and only fall back to that mechanism when it is not present: that's just to give an idea.
from mongo-php-library.
> because it only handles fopen modes r and w, while for example using file_get_contents() uses mode rb
The decision to only support "r" and "w" modes is rather arbitrary, because the StreamWrapper is internal to the Bucket, which only uses those two modes. All reads and writes are implicitly binary, so there's no reason the "b" and "t" modifiers could not be accepted and ignored, as you mentioned they are with the S3 stream wrapper.
> They support operations in the middle of files because S3 supports Range read/writes; for GridFS the equivalent is likely to selectively access chunks instead of rebuilding the whole file on every operation.
The current implementation in PHPLIB is based on the cross-driver GridFS spec, which discusses append operations under future work. At the time the spec was conceived, MongoDB did not implement transactions. It's possible this could now be revisited, although it will still be some time until all server versions supported by drivers also provide transaction support. In any event, I would not move forward with trying to implement something in PHPLIB directly without first addressing this in the spec, in order to ensure we cover as many edge cases as possible and don't end up with an API/design that will eventually conflict with the common driver implementation.
> I do not see why a stream context was the only way to provide both the CollectionWrapper and other necessary options.
GridFS file identifiers can be arbitrary BSON types, which cannot be conveniently expressed in a file path. Embedding the BSON value as extended JSON or an encoded binary string (e.g. base64) is feasible, but that also wouldn't be very readable. I expect it also wouldn't play nicely with other libraries that intend to work with actual file paths using an arbitrary stream wrapper protocol. This is ultimately why we passed the BSON value via the stream context.
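To make the readability concern concrete, here is a rough sketch of the base64 variant. It is not how PHPLIB works: a plain PHP array stands in for the extended JSON form of an _id, and the helper names encodeIdForPath and decodeIdFromPath are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: turn an extended-JSON-like array into a URL-safe
 * base64 path segment (no "+", "/" or "=" characters).
 */
function encodeIdForPath(array $extendedJson): string
{
    $json = json_encode($extendedJson, JSON_THROW_ON_ERROR);

    return rtrim(strtr(base64_encode($json), '+/', '-_'), '=');
}

/**
 * Hypothetical inverse of encodeIdForPath().
 */
function decodeIdFromPath(string $segment): array
{
    // Restore the standard alphabet and the padding stripped by the encoder.
    $b64 = strtr($segment, '-_', '+/');
    $b64 .= str_repeat('=', (4 - strlen($b64) % 4) % 4);

    $json = base64_decode($b64, true);
    if ($json === false) {
        throw new InvalidArgumentException('Segment is not valid base64url.');
    }

    return json_decode($json, true, 512, JSON_THROW_ON_ERROR);
}

// Usage: the value round-trips, but the path segment is opaque to a reader.
$id = ['$oid' => '507f1f77bcf86cd799439011'];
$segment = encodeIdForPath($id);
$restored = decodeIdFromPath($segment);
```

The round trip is lossless for values that JSON can represent, which shows the approach is feasible. The resulting segment tells a reader nothing about the file, though, which is the drawback noted above.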
I'll note that the Bucket does produce file paths for uploaded and downloaded files, like:
gridfs://{databaseName}/{bucketName}/{some _id representation}
However, that's just for informational purposes (e.g. debugging, an exception message). The file path is not used internally for anything.
As for the CollectionWrapper also being passed via the stream context, that was done to avoid having any static state on the stream wrapper. Using PHPLIB, applications can create any number of buckets and each file operation constructs a new stream wrapper instance. The library makes no assumptions about there being a singleton database connection, and we did not want to start doing so in a stream wrapper.
As much as I would have liked to provide a GridFS API like gridfs://database/collection/filename akin to many of the other stream abstractions I surveyed, it didn't seem possible at the time we originally implemented this.
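For the filename-only case, splitting a path of that shape is straightforward. The sketch below is not part of PHPLIB: the function name parseGridFsPath and the returned array keys are assumptions, and it does nothing for files identified by an arbitrary BSON _id, which is the harder problem described above.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: split "gridfs://database/bucket/filename" into its
 * three parts. Returns null when the path does not have that shape.
 */
function parseGridFsPath(string $path): ?array
{
    $parts = parse_url($path);
    if ($parts === false || ($parts['scheme'] ?? '') !== 'gridfs') {
        return null;
    }
    if (!isset($parts['host'], $parts['path'])) {
        return null;
    }

    // First path segment is the bucket; the rest (slashes included) is the filename.
    $segments = explode('/', ltrim($parts['path'], '/'), 2);
    if (count($segments) !== 2 || $segments[0] === '' || $segments[1] === '') {
        return null;
    }

    return [
        'database' => $parts['host'],
        'bucket'   => $segments[0],
        'filename' => rawurldecode($segments[1]),
    ];
}
```

A wrapper built this way would still need some means of obtaining a database connection from the database name alone, which runs into the static-state concern raised earlier in this comment.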
I see you followed up with a suggested design in PHPLIB-1206, so I'll respond to that in the JIRA issue.
from mongo-php-library.
Hello @fgm, thanks for the additional context.
In your PR fgm/mongodb#76, I see MongoDB\GridFS\Bucket::openUploadStream() is already used in stream_write. You can also use MongoDB\GridFS\Bucket::openDownloadStreamByName() to implement stream_read.
from mongo-php-library.