Comments (6)
Thanks, this is very helpful! Charlie and I came up with a similar outline on Wednesday:
- Brief intro to Colab + Python (through text processing stuff on a sample MS MARCO passage)
- Explore the test collection, MS MARCO (this is where we want to use some of the functionalities from I-REX as you suggested)
- Indexing
- Interactive querying (similar to what we have in the README, again we can visualize the document vectors, similarities, etc.)
- Running batch experiments / evaluating TREC runs
- Re-ranking (with BERT might work, since we already have the Colab for the demo)
Do you have any other suggestions?
from pyserini.
Feedback inline:
- Brief intro to Colab + Python (through text processing stuff on a sample MS MARCO passage)
I assume this means things like tokenization, stemming, etc. with NLTK, spaCy, or whatever?
Do you want to do this from the raw JSON collection or by fetching docs from the index?
- Indexing
hrm... doing this from Python is going to be a challenge, I think...
- Interactive querying (similar to what we have in the README, again we can visualize the document vectors, similarities, etc.)
- Running batch experiments / evaluating TREC runs
- Re-ranking (with BERT might work, since we already have the Colab for the demo)
Sounds good.
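For the batch experiment / TREC run evaluation step, here is a rough sketch of what the notebook could show. It writes ranked results in the standard six-column TREC run format (`qid Q0 docid rank score tag`) and computes MRR@10 against a tiny set of judgments. The function names, the `demo` run tag, and the sample ids are made up for illustration; real evaluation would normally go through `trec_eval` or the MS MARCO eval script.

```python
def format_trec_run(results, tag="demo"):
    """Turn {qid: [(docid, score), ...]} into TREC run lines.

    Hits are assumed to be sorted by descending score already.
    """
    lines = []
    for qid, hits in results.items():
        for rank, (docid, score) in enumerate(hits, start=1):
            lines.append(f"{qid} Q0 {docid} {rank} {score:.4f} {tag}")
    return lines


def mrr_at_k(run_lines, qrels, k=10):
    """Mean reciprocal rank of the first relevant hit in the top k.

    qrels maps qid -> set of relevant docids; the mean is taken over
    the queries that have judgments.
    """
    first_relevant_rank = {}
    for line in run_lines:
        qid, _, docid, rank, _, _ = line.split()
        rank = int(rank)
        if rank <= k and docid in qrels.get(qid, set()):
            best = first_relevant_rank.get(qid, rank)
            first_relevant_rank[qid] = min(best, rank)
    if not qrels:
        return 0.0
    return sum(1.0 / r for r in first_relevant_rank.values()) / len(qrels)


# Hypothetical search output and relevance judgments.
results = {
    "q1": [("d3", 12.5), ("d1", 11.0), ("d7", 9.2)],
    "q2": [("d2", 8.0), ("d9", 7.5)],
}
qrels = {"q1": {"d1"}, "q2": {"d2"}}

run_lines = format_trec_run(results)
mrr = mrr_at_k(run_lines, qrels, k=10)
print("\n".join(run_lines))
print(f"MRR@10 = {mrr:.4f}")
```

Here q1 finds its relevant passage at rank 2 and q2 at rank 1, so the mean is (0.5 + 1.0) / 2.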
Adding to the wish-list of functionalities: pull a raw document from the collection, demonstrate analysis (tokenization/stemming) with different analyzers (e.g., porter vs. krovetz).
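To make the idea concrete, a toy sketch of "same passage, different analyzers" in plain Python. This is not Porter or Krovetz (those would come from the Lucene analyzers behind Anserini); the stemmer here is a small plural-stripping stand-in loosely following the S-stemmer rules, and the stopword list, function names, and sample passage are all made up for illustration.

```python
import re

# Tiny illustrative stopword list, not the one any real analyzer uses.
STOPWORDS = frozenset({"a", "an", "and", "are", "in", "is", "of", "the", "to"})


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def s_stem(token):
    """Strip plural endings; only the first matching suffix rule is used."""
    if token.endswith("ies"):
        return token if token.endswith(("eies", "aies")) else token[:-3] + "y"
    if token.endswith("es"):
        return token if token.endswith(("aes", "ees", "oes")) else token[:-1]
    if token.endswith("s"):
        return token if token.endswith(("us", "ss")) else token[:-1]
    return token


def analyze(text, stem=None, stopwords=frozenset()):
    """Tokenize, drop stopwords, then optionally stem each token."""
    tokens = [t for t in tokenize(text) if t not in stopwords]
    return [stem(t) if stem else t for t in tokens]


passage = "The studies of stemming analyzers are helping searches in libraries."

plain = analyze(passage)
stemmed = analyze(passage, stem=s_stem, stopwords=STOPWORDS)

print("no stemming:", plain)
print("toy stemmer:", stemmed)
```

The toy stemmer maps "studies" to "study" but turns "searches" into "searche", which is exactly the kind of difference a side-by-side comparison of real analyzers would surface.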
@lintool I have a collection of passages. How can I index it so that I can use the functions in pysearch?
Could you please share a link to the documentation that describes this step in pyserini?
@sipah00 easiest is probably this: https://github.com/castorini/anserini/blob/master/docs/experiments-msmarco-passage.md
Put it in the same format as the MS MARCO data, and then you can just reuse the instructions there and the "Pyserini demo on MS MARCO passage retrieval task" notebook here: https://github.com/castorini/anserini-notebooks
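As a rough sketch of the "same format" step: as far as I recall, the MS MARCO passage instructions end up with JSON Lines files where each line has an `id` and a `contents` field, so your own passages could be written out like this. The function name, output path, and sample passages are hypothetical, and the field names are my assumption; please double-check them against the linked doc before indexing.

```python
import json
import tempfile
from pathlib import Path


def write_jsonl_collection(passages, out_path):
    """Write (docid, text) pairs as one JSON object per line.

    Assumes the indexer expects "id" and "contents" fields.
    Returns the number of passages written.
    """
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    count = 0
    with out_path.open("w", encoding="utf-8") as f:
        for docid, text in passages:
            # Collapse whitespace so each passage stays on a single line.
            record = {"id": str(docid), "contents": " ".join(text.split())}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count


# Hypothetical passages standing in for your own collection.
my_passages = [
    ("p0", "A first passage,\nsplit across two lines."),
    ("p1", "A second short passage about indexing."),
]

# Written to a temporary directory here; use your own folder in practice.
out_dir = Path(tempfile.mkdtemp()) / "collection_jsonl"
out_file = out_dir / "docs00.json"
num_written = write_jsonl_collection(my_passages, out_file)
print(f"Wrote {num_written} passages to {out_file}")
```

The resulting folder would then take the place of the converted MS MARCO collection in the indexing instructions.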
Thank you