Comments (8)

jduss4 commented on August 17, 2024

Also related to #123

from datura.

techgique commented on August 17, 2024

The posting script could just skip indexing a file if it's already present and say so in the report printed at the end. Maybe also add a -f force option to overwrite? It could require that the id be explicitly named with the -f option, to be extra thorough and granular.
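A minimal sketch of that skip-and-report idea, written in Python purely for illustration. The function name `plan_posting` and its arguments are hypothetical and not part of datura; the sketch assumes the set of already-indexed ids is known before posting starts.

```python
def plan_posting(candidate_ids, existing_ids, forced_ids=frozenset()):
    """Split candidate ids into those to post and those to skip.

    An id that is already indexed is skipped unless it was explicitly
    named as forced (the "-f <id>" idea from the comment above).
    """
    to_post, skipped = [], []
    for doc_id in candidate_ids:
        if doc_id in existing_ids and doc_id not in forced_ids:
            skipped.append(doc_id)
        else:
            to_post.append(doc_id)
    return to_post, skipped


def format_report(to_post, skipped):
    """Build the summary that would be printed at the end of a run."""
    lines = [f"posted: {len(to_post)}", f"skipped (already indexed): {len(skipped)}"]
    lines.extend(f"  skipped {doc_id}" for doc_id in skipped)
    return "\n".join(lines)
```

Requiring the forced ids to be listed one by one keeps an accidental overwrite limited to the documents someone deliberately named.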

jduss4 commented on August 17, 2024

@techgique, do you mean if the id belongs to a different collection? In general, we always want to overwrite existing IDs, since we would consider them to be an update. For example, re-indexing Cather Letters or something.

jduss4 commented on August 17, 2024

I like the idea of a report or a -force override by ID. We could have a nuclear option somewhere, too, for cases like somebody having pushed a project under the wrong collection name and wanting to completely redo it by pushing to the correct collection, etc.

techgique commented on August 17, 2024

I wasn't thinking about the updating aspect at the time I wrote my comment, so good point. Without collection prefixing (an alternative to saying "namespacing"?), we wouldn't be able to tell if the id belongs to a different collection unless we open the file and look for evidence, right? That could slow posting down more than we'd like, though.

🤔 Not sure how else we'd tell it's a completely different file from a different project. Diff the files and check whether the count of differing lines is over a certain percentage of the file's total line count? Perhaps the file's last modified time and the index document's posted/modified time could help determine whether a file contains updates for the index. "It looks like you're trying to post an older file..." 📎
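Both heuristics can be sketched roughly as follows, in Python for illustration only. The function names and the 0.5 threshold are assumptions, and the diff check presumes the previously posted text is available for comparison, which the thread does not confirm.

```python
import difflib
from datetime import datetime


def changed_fraction(old_text: str, new_text: str) -> float:
    """Return a 0..1 estimate of how much two files differ, line by line."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    if not old_lines and not new_lines:
        return 0.0
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    # ratio() is 1.0 for identical sequences and 0.0 when no lines match.
    return 1.0 - matcher.ratio()


def looks_like_different_file(old_text: str, new_text: str, threshold: float = 0.5) -> bool:
    """Flag a file whose differences exceed the (assumed) threshold."""
    return changed_fraction(old_text, new_text) > threshold


def is_older_than_index(file_mtime: datetime, indexed_at: datetime) -> bool:
    """True when the local file was last modified before the document was indexed."""
    return file_mtime < indexed_at
```

The diff check costs a full read of both versions per file, which is the slowdown the previous paragraph worries about; the timestamp check is cheap but only says the file is older, not that it belongs to another project.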

Maybe only posting to production environment tries to do some of this fancier update-cautious checking that could be bypassed with a -f option?

jduss4 commented on August 17, 2024

I haven't given this a ton of thought, but I think we could gather the ids that are going to be posted, send one request to Elasticsearch that queries all of those ids and returns only the id and collection fields, and then quickly see whether any of those collections do not match the current name of the project. If so, we could filter those filenames out of the list that will be posted? If it's limited to one request and some filtering, it probably wouldn't take long. It might require a little re-architecting of when and how things currently happen in the "data manager" class in the data repo...
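A rough sketch of that approach, in Python for illustration only. It assumes the document id is the Elasticsearch `_id` and that each document has a `collection` field; the function names and sample filenames are hypothetical. The request body would be sent to the index's `_search` endpoint, which the sketch leaves out so it can run without a server.

```python
def build_lookup_query(ids):
    """Build one search body that fetches only the collection of each id."""
    ids = list(ids)
    return {
        "query": {"ids": {"values": ids}},  # match documents by _id
        "_source": ["collection"],          # return only this field
        "size": len(ids),                   # default size would cap results at 10
    }


def split_by_collection(files_by_id, hits, collection):
    """Separate files safe to post from files whose id belongs elsewhere.

    files_by_id maps document id -> filename; hits is the list found under
    hits.hits in the search response; collection is the current project.
    """
    owner = {hit["_id"]: hit.get("_source", {}).get("collection") for hit in hits}
    to_post, conflicts = [], []
    for doc_id, filename in files_by_id.items():
        existing = owner.get(doc_id)
        if existing is not None and existing != collection:
            conflicts.append((filename, existing))
        else:
            # New id, or an update within the same collection.
            to_post.append(filename)
    return to_post, conflicts
```

Ids already indexed under the same collection still get posted, which preserves the update behavior described earlier in the thread. For very large batches the single request may need to be split, since Elasticsearch limits how many results one search can return.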

techgique commented on August 17, 2024

That sounds good to me. I'm less familiar with constructing the Elasticsearch queries, so I hadn't thought of doing it that way. 👍

jduss4 commented on August 17, 2024

I think that you can query a list of things like that? I guess we'll find out :)
