Comments (6)

spring-projects-issues avatar spring-projects-issues commented on July 21, 2024

Matthias Scudlik commented

Fast content search would be great.

from spring-data-mongodb.

spring-projects-issues avatar spring-projects-issues commented on July 21, 2024

Mark Pollack commented

Can you elaborate a bit? Something like this: http://nosql.mypopescu.com/post/383437318/integrating-mongodb-with-solr ?

I believe that MongoDB itself will offer a text search feature in the future, but that might not cover GridFS.

spring-projects-issues avatar spring-projects-issues commented on July 21, 2024

Matthias Scudlik commented

Since GridFS deals with binary data, content search certainly does not make sense for all kinds of data.

I suggest that there should be standard functionality for text files (text, XML, HTML, ...)
and the possibility to add "adapters" for custom data.

This should probably be done by indexing the binary content. An "adapter" should be
able to add custom index entries. For example, if you have a PDF document, you could implement a
custom adapter that opens the PDF and adds its text to the index.

On the other hand, you could also have a ZIP archive that contains the content you are looking for.
For the standard functionality, the MIME type (which GridFS is aware of) should be enough (text, XML, ...) to determine how the index should be created.

For ZIP files, the MIME type is not enough. Imagine you have different kinds of ZIP archives:
one may contain images, another may contain Word or OpenOffice documents, or even a mix.
So my idea is that you can register such a custom adapter based on the MIME type and a file name pattern.

images_001.zip -> don't index
documents_001.zip -> extract the ZIP file, add index entries for the PDF documents

Something like that. I think this is a common use case, and it would be really great to have that.
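The adapter lookup described above could be sketched roughly as follows. This is only an illustration in plain Java, not anything that exists in Spring Data MongoDB: `ContentIndexAdapter`, `AdapterRegistry`, and the glob-based matching are hypothetical names and choices, and the ZIP adapter is a placeholder that does not really unpack anything.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical contract: turns stored binary content into text that can be indexed.
interface ContentIndexAdapter {
    String extractText(byte[] content);
}

// Hypothetical registry: picks an adapter from the MIME type plus a file name glob.
class AdapterRegistry {

    private static final class Rule {
        final String mimeType;
        final PathMatcher filePattern;
        final ContentIndexAdapter adapter; // null marks a "don't index" rule

        Rule(String mimeType, PathMatcher filePattern, ContentIndexAdapter adapter) {
            this.mimeType = mimeType;
            this.filePattern = filePattern;
            this.adapter = adapter;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    // Rules are checked in registration order; the first match wins.
    AdapterRegistry register(String mimeType, String glob, ContentIndexAdapter adapter) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        rules.add(new Rule(mimeType, matcher, adapter));
        return this;
    }

    // Registers a rule that explicitly excludes matching files from indexing.
    AdapterRegistry skip(String mimeType, String glob) {
        return register(mimeType, glob, null);
    }

    // Returns an empty Optional when the file is skipped or when no rule matches.
    Optional<ContentIndexAdapter> find(String mimeType, String filename) {
        for (Rule rule : rules) {
            if (rule.mimeType.equals(mimeType) && rule.filePattern.matches(Paths.get(filename))) {
                return Optional.ofNullable(rule.adapter);
            }
        }
        return Optional.empty();
    }
}

class AdapterRegistryDemo {
    public static void main(String[] args) {
        AdapterRegistry registry = new AdapterRegistry()
                .skip("application/zip", "images_*.zip")
                // Placeholder adapter: a real one would unpack the archive and read the PDFs.
                .register("application/zip", "documents_*.zip", content -> "(text extracted from the archive)")
                .register("text/plain", "*", content -> new String(content, StandardCharsets.UTF_8));

        System.out.println(registry.find("application/zip", "images_001.zip").isPresent());
        System.out.println(registry.find("application/zip", "documents_001.zip").isPresent());
    }
}
```

Checking rules in registration order keeps the "don't index" case simple: a skip rule registered before a broader rule takes precedence.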

Using Lucene, I would create that index in order to look up the ID of the file/data and then fetch that file/data. But I haven't evaluated which technology I would use for that, and I haven't actually used Lucene. As far as I know, Solr is a full-text search server, and I would not recommend it because, in my opinion, a separate server should not be a requirement.

I hope this was understandable.
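To make the "index returns the file ID, then fetch the file" idea concrete, here is a minimal sketch that uses a plain in-memory inverted index as a stand-in for Lucene. It does not use the Lucene API; `FileContentIndex` and the example IDs are hypothetical, and tokenization is a naive split on non-word characters.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory index: maps each lowercased word to the IDs of the files containing it.
class FileContentIndex {

    private final Map<String, Set<String>> idsByTerm = new HashMap<>();

    // Adds the extracted text of one stored file under its file ID.
    void add(String fileId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!term.isEmpty()) {
                idsByTerm.computeIfAbsent(term, t -> new LinkedHashSet<>()).add(fileId);
            }
        }
    }

    // Returns the IDs of all files containing the term; the caller then loads those files.
    Set<String> search(String term) {
        return idsByTerm.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }
}

class FileContentIndexDemo {
    public static void main(String[] args) {
        FileContentIndex index = new FileContentIndex();
        index.add("file-1", "Invoice for the MongoDB training");
        index.add("file-2", "Notes about mongodb and GridFS");

        // The returned IDs would be used to fetch the actual files from the store.
        System.out.println(index.search("MongoDB"));
    }
}
```

The index only ever stores IDs, so the binary data stays in the file store and is read once a search hit identifies it.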

spring-projects-issues avatar spring-projects-issues commented on July 21, 2024

Matthias Scudlik commented

I just had another idea that would be nice to have, but is probably hard: MIME type sniffing for the InputStreams that you store.
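One possible way to approach such sniffing is to peek at the first few bytes of the stream and rewind it before it is stored. The sketch below is an assumption about how this could look, not an existing feature: `MimeSniffer` is a hypothetical name, only a handful of well-known signatures are checked, and the stream must support `mark`/`reset`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: guesses a MIME type from the leading "magic" bytes of a stream.
class MimeSniffer {

    private static final int MAX_HEADER = 8;

    // Peeks at the first bytes and rewinds, so the full stream can still be stored afterwards.
    static String sniff(InputStream in) {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        byte[] header = new byte[MAX_HEADER];
        int total = 0;
        try {
            in.mark(MAX_HEADER);
            while (total < MAX_HEADER) {
                int n = in.read(header, total, MAX_HEADER - total);
                if (n < 0) {
                    break; // stream is shorter than the header we wanted
                }
                total += n;
            }
            in.reset();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (startsWith(header, total, '%', 'P', 'D', 'F')) return "application/pdf";
        if (startsWith(header, total, 'P', 'K', 0x03, 0x04)) return "application/zip";
        if (startsWith(header, total, 0x89, 'P', 'N', 'G')) return "image/png";
        if (startsWith(header, total, '<', '?', 'x', 'm', 'l')) return "application/xml";
        return "application/octet-stream"; // unknown content
    }

    private static boolean startsWith(byte[] data, int length, int... prefix) {
        if (length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}

class MimeSnifferDemo {
    public static void main(String[] args) {
        InputStream in = new ByteArrayInputStream("%PDF-1.4 ...".getBytes(StandardCharsets.US_ASCII));
        System.out.println(MimeSniffer.sniff(in));
    }
}
```

A stream without mark support can be wrapped in a `BufferedInputStream` first. The JDK also ships `URLConnection.guessContentTypeFromStream`, which follows the same peek-and-reset approach for a fixed set of types.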

spring-projects-issues avatar spring-projects-issues commented on July 21, 2024

Oliver Drotbohm commented

Current draft is at this GitHub branch. Feedback is welcome!

spring-projects-issues avatar spring-projects-issues commented on July 21, 2024

Oliver Drotbohm commented

Just merged the initial draft into the master branch and deployed a snapshot build. There is no integration with any indexing support yet. We might want to create a separate ticket for that if there's demand.
