Comments (6)
Matthias Scudlik commented
Fast content search would be great.
from spring-data-mongodb.
Mark Pollack commented
Can you elaborate a bit? Something like this: http://nosql.mypopescu.com/post/383437318/integrating-mongodb-with-solr ?
I believe that MongoDB itself will offer some text search feature in the future, but that might not cover GridFS...
Matthias Scudlik commented
Since GridFS deals with binary data, this certainly does not make sense for every kind of data.
I suggest there should be standard functionality for text files (text, xml, html, ...)
and the possibility to add "adapters" for custom data.
This should probably be done by indexing the binary content. An "adapter" should be
able to add custom index entries. For example, if you have a PDF document, you could implement a
custom adapter that opens the PDF and adds its text to the index.
On the other hand, you could also have a zip archive that contains the content you are looking for.
For the standard functionality, the mimetype (which GridFS is aware of) should be enough (text, xml, ...) to determine how the index should be created.
For zip files the mimetype is not enough. Imagine you have different kinds of zip archives:
one may contain images, another may contain Word or OpenOffice documents, or even a mix.
So my idea is that you can register such a custom adapter based on the mimetype and a file pattern.
images_001.zip -> don't index
documents_001.zip -> extract zip file, add index entries for PDF documents.
Something like that. I think this is a common use case and it would be really great to have that.
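To make the adapter idea more concrete, here is a minimal Java sketch of a registry that picks an adapter by mimetype plus file name pattern. Everything in it (`IndexAdapter`, `AdapterRegistry`, `register`, `find`) is a hypothetical name invented for illustration; none of it is existing spring-data-mongodb or GridFS API, and the actual text extraction and indexing are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Pattern;

/** Hypothetical contract: turns the stored binary content into indexable text. */
interface IndexAdapter {
    String extractText(byte[] content);
}

/** Hypothetical registry that selects an adapter by mimetype and file name pattern. */
final class AdapterRegistry {

    /** One registration: a mimetype, a file name pattern and the adapter to use. */
    private static final class Rule {
        final String mimeType;
        final Pattern filePattern;
        final IndexAdapter adapter;

        Rule(String mimeType, Pattern filePattern, IndexAdapter adapter) {
            this.mimeType = mimeType;
            this.filePattern = filePattern;
            this.adapter = adapter;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    /** Registers an adapter for files whose mimetype is equal and whose name matches the regex. */
    void register(String mimeType, String fileNameRegex, IndexAdapter adapter) {
        rules.add(new Rule(mimeType, Pattern.compile(fileNameRegex), adapter));
    }

    /** Returns the first matching adapter, or empty if the file should not be indexed. */
    Optional<IndexAdapter> find(String mimeType, String fileName) {
        for (Rule rule : rules) {
            if (rule.mimeType.equals(mimeType) && rule.filePattern.matcher(fileName).matches()) {
                return Optional.of(rule.adapter);
            }
        }
        return Optional.empty();
    }
}
```

With a rule registered for `application/zip` and the pattern `documents_.*\.zip`, a lookup for `documents_001.zip` yields an adapter, while `images_001.zip` yields an empty result and would simply be skipped.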
With Lucene, I would create that index in order to retrieve the id of the file/data and then fetch the file/data itself. But I haven't evaluated which technology I would use for that, and I haven't actually used Lucene. As far as I know, Solr is a full-text search server, and I would not recommend it because in my opinion a separate server should not be required.
I hope this was understandable.
Matthias Scudlik commented
I just had another idea that would be nice to have, but is probably hard: mimetype sniffing for the InputStreams that you store.
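As a rough illustration of what such sniffing could look like, the JDK already ships `java.net.URLConnection.guessContentTypeFromStream`, which inspects the first bytes of a mark-capable stream and resets it afterwards. The `MimeSniffer` class, its method names and the `application/octet-stream` fallback below are assumptions made for this sketch, and the JDK method only recognizes a small set of formats, so a real solution would likely need more.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URLConnection;

/** Hypothetical helper that guesses a mimetype from the leading bytes of a stream. */
final class MimeSniffer {

    /** Fallback used when the leading bytes are not recognized (an assumption of this sketch). */
    static final String UNKNOWN = "application/octet-stream";

    /** Wraps the stream if needed so that the sniffed bytes can be re-read afterwards. */
    static InputStream rewindable(InputStream in) {
        return in.markSupported() ? in : new BufferedInputStream(in);
    }

    /**
     * Guesses the mimetype without consuming the stream.
     * The stream must support mark/reset; use rewindable(...) first if unsure.
     */
    static String sniff(InputStream in) {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        try {
            String guessed = URLConnection.guessContentTypeFromStream(in);
            return guessed != null ? guessed : UNKNOWN;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The caller would pass the same (rewindable) stream on to the store operation afterwards, since the sniffed bytes are still available after the reset.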
Oliver Drotbohm commented
Current draft is at this GitHub branch. Feedback is welcome!
Oliver Drotbohm commented
Just merged the initial draft into the master branch and deployed a snapshot build. There is no integration with any indexing support yet. We might want to create a separate ticket for that if there's demand.