
Comments (5)

mam10eks commented on August 26, 2024

cc @Parry-Parry, @heinrichreimer.

from ir_axioms.

mam10eks commented on August 26, 2024

Alright, for pretokenized indexes, the entry termpipelines= (with no value) is in the index/data.properties file, and in this case ir_axioms falls back to a default term pipeline that applies some normalization.

@heinrichreimer Do you have any preferences on how we could solve this? E.g., so that it is usable but maybe still compatible with the previous behaviour?
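
One backwards-compatible option could be sketched as follows, assuming the fallback happens because an empty termpipelines= value is treated like a missing key. The helpers read_properties and resolve_term_pipeline are hypothetical names for illustration, not part of ir_axioms or PyTerrier, and the default below only mirrors the "stopwords, Porter stemmer" pipeline mentioned in this thread:

```python
from typing import Dict, List, Optional

# Assumed default, mirroring the "stopwords, Porter stemmer" pipeline
# discussed in this thread (not read from any real index).
DEFAULT_TERM_PIPELINE = ["Stopwords", "PorterStemmer"]


def read_properties(text: str) -> Dict[str, str]:
    """Parse simple `key=value` lines, skipping blanks and `#` comments."""
    properties: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        properties[key.strip()] = value.strip()
    return properties


def resolve_term_pipeline(properties: Dict[str, str]) -> List[str]:
    """Use the default only when the key is absent, not when it is empty."""
    value: Optional[str] = properties.get("termpipelines")
    if value is None:
        # Key missing entirely: keep the previous default behaviour.
        return list(DEFAULT_TERM_PIPELINE)
    # Key present: an empty value yields an empty pipeline.
    return [name.strip() for name in value.split(",") if name.strip()]


pretokenized = read_properties("termpipelines=\n")
legacy = read_properties("index.something=1\n")
print(resolve_term_pipeline(pretokenized))
print(resolve_term_pipeline(legacy))
```

With this distinction, indexes that never set the key would keep the old normalization, while pretokenized indexes with an explicitly empty entry would get no term pipeline at all.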


Parry-Parry commented on August 26, 2024

@heinrichreimer @mam10eks So I assume the default pipeline is stopword removal followed by the Porter stemmer. This is always included in data.properties, so it shouldn't be an issue in the default case.


mam10eks commented on August 26, 2024

One possible suggestion could also be that we introduce a new PreTokenizedTerrierIndexContext that is a TerrierIndexContext and just overrides the termpipeline property?
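
A rough sketch of that idea, using a simplified stand-in for TerrierIndexContext rather than the real ir_axioms class. The property name term_pipeline, the TermFilter alias, and the lowercasing placeholder are assumptions made only for illustration:

```python
from typing import Callable, List, Sequence

# Hypothetical alias: one pipeline stage maps a token to a normalized token.
TermFilter = Callable[[str], str]


class TerrierIndexContext:
    """Simplified stand-in for the real class; only the parts needed here."""

    @property
    def term_pipeline(self) -> Sequence[TermFilter]:
        # Placeholder normalization standing in for the default pipeline.
        return (str.lower,)

    def terms(self, text: str) -> List[str]:
        tokens = text.split()
        # Apply every pipeline stage to every token, in order.
        for term_filter in self.term_pipeline:
            tokens = [term_filter(token) for token in tokens]
        return tokens


class PreTokenizedTerrierIndexContext(TerrierIndexContext):
    """Same context, but the term pipeline is overridden to be empty."""

    @property
    def term_pipeline(self) -> Sequence[TermFilter]:
        return ()


print(TerrierIndexContext().terms("Foo BAR"))
print(PreTokenizedTerrierIndexContext().terms("Foo BAR"))
```

Because the subclass only overrides one property, existing code that uses TerrierIndexContext would keep its previous behaviour, and users with pretokenized indexes would opt in explicitly.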


janheinrichmerker commented on August 26, 2024

I'd say it would be best to fix this in the PyTerrier backend here:

    def terms(
            self,
            query_or_document: Union[Query, Document]
    ) -> Sequence[str]:
        text = self.contents(query_or_document)
        return self._terms(text)

Is there a PyTerrier API to access the pre-tokenized terms given the document ID?
