
Comments (8)

AmenRa commented on July 30, 2024

Hi, and thanks for your interest in ranx!

I think you may be missing that all numba-based ranx functions need to be compiled the first time you use them (there is a disclaimer at the top of each of my notebooks about that).
Also, Google Colab is very slow at compiling them.

If you run your notebook again (without reloading Colab), you should notice much lower computation times.

Unfortunately, I suspect you must recompile ranx's functions every time you start a new Colab instance.
On your local machine, the compiled functions should be automatically stored by numba for future use.

If you are stuck with Colab, I suggest you compile the functions with toy examples before using them with real-world data.
You should be absolutely fine then with 60k queries, especially with MRR, which is very optimized.

On my local machine, a four-year-old MacBook Pro, I get these execution times for 1M queries with 100 results each:

%%time
evaluate(qrels, run, ["mrr@1", "mrr@5", "mrr@10"])
CPU times: user 11.6 s, sys: 38.6 ms, total: 11.6 s
Wall time: 1.05 s
%%time
evaluate(qrels, run, ["map", "mrr", "ndcg"])
CPU times: user 25.8 s, sys: 73.8 ms, total: 25.9 s
Wall time: 2.3 s

Hope this answers your question.

Please, consider giving ranx a star if you like it!

Best,

Elias

from ranx.

celsofranssa commented on July 30, 2024

Wow, it is taking forever in a real example with about 60k queries!
Am I missing something?


celsofranssa commented on July 30, 2024

Hi @AmenRa,

That was the case. Thank you for your quick answer.
And it will be a pleasure to give ranx a star.


milyenpabo commented on July 30, 2024

Hi All,

I ran into the same problem. I'm not using Colab or any kind of notebook; I run my eval as a Python script (it will be part of an eval pipeline). For each execution, instantiating the Qrels objects takes a looooong time, even for a very tiny eval set. For a dict of ~10 entries, the Qrels object creation takes 10-20 seconds on a beefy machine.

Is there a way to speed this up? E.g., @AmenRa you say:

"On your local machine, the compiled functions should be automatically stored for future usage by numba."

I'm afraid this is not happening. Any hints on how to check/fix this?
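One way to check is to look for numba's on-disk cache files (`*.nbi` / `*.nbc`, written into `__pycache__` next to the compiled modules). A stdlib-only sketch (the helper name is mine; `ranx` is the package you would actually inspect):

```python
import importlib.util
from pathlib import Path

def numba_cache_files(package: str) -> list:
    """Return numba's on-disk cache files (*.nbi / *.nbc) for an installed package."""
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        return []
    pkg_dir = Path(spec.origin).parent
    return sorted(p for p in pkg_dir.rglob("__pycache__/*")
                  if p.suffix in {".nbi", ".nbc"})

# After one full (slow) run, this should be non-empty if caching works:
# print(len(numba_cache_files("ranx")))
```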


AmenRa commented on July 30, 2024

@milyenpabo Have you already tried with a dummy Qrels?
Could you please post a sample of your specific Qrels without any modification?


milyenpabo commented on July 30, 2024

Thanks @AmenRa for picking this up quickly. I distilled a minimal example:

#!/usr/bin/env python3

# Using stdlib logging here as a stand-in for my local logger helper.
import logging
import time

from ranx import Qrels

logging.basicConfig(level=logging.INFO,
                    format="[%(levelname)s] %(asctime)s %(message)s")
log = logging.getLogger(__name__)

qrels_dict = {}
qrels_dict['test-query'] = {
    'word0': 1,
    'word1': 1,
    'word2': 1,
    'word3': 1,
    'word4': 1,
    'word5': 1,
    'word6': 1,
    'word7': 1,
    'word8': 1,
    'word9': 1
}

t = time.time()
log.info('Creating a small Qrels object.')
qrels = Qrels(qrels_dict, name='Test')
log.info(f'Qrels object created in {time.time() - t:.2f} seconds.')

This program runs for roughly 17 seconds; the output is:

[INFO] 2023-11-14 20:30:44.025 generated new fontManager
[INFO] 2023-11-14 20:30:44.554 Creating a small Qrels object.
[INFO] 2023-11-14 20:31:01.500 Qrels object created in 16.95 seconds.

I'm using ranx-0.3.18.

I also ran the above program with DEBUG logs, and from them it seems that some compilation-related work is indeed eating up the 17 seconds. I attach the log file (7 MB, 64k lines):

ranx.log

I suspect I'm missing some basic numba knowledge here. I'd appreciate any hints on how to fix this issue.


milyenpabo commented on July 30, 2024

Ok, I'm reading up on Numba a bit, and I found the root cause.

  1. I found an option to disable JIT compilation:

https://numba.readthedocs.io/en/stable/user/troubleshoot.html#disabling-jit-compilation

I tried it, and my test program gets significantly faster:

[INFO] 2023-11-14 20:50:48.255 generated new fontManager
[INFO] 2023-11-14 20:50:48.785 Creating a small Qrels object.
[INFO] 2023-11-14 20:50:48.785 Qrels object created in 0.00 seconds.

So this works as a quick fix, although this way I guess we lose the benefits of Numba for larger eval sets. So a less pressing question is: is there a way to compile once and reuse the result for subsequent runs?

  2. I found the cache=True option in the Numba docs, precisely to allow reusing the numba-compiled code:

https://numba.readthedocs.io/en/stable/user/faq.html#there-is-a-delay-when-jit-compiling-a-complicated-function-how-can-i-improve-it

Checking the ranx source, it seems it uses the cache=True option (most of the time):

https://github.com/search?q=repo%3AAmenRa%2Franx+jit&type=code

  3. I realized I left out a seemingly minor detail from my previous post: I run the program in a Docker container. So it might be that every time the program terminates and the container is stopped and removed, I just lose the Numba cache with it. After a bit of digging, I found a way to specify the cache directory:

https://numba.readthedocs.io/en/stable/reference/envvars.html#numba-envvars-caching

Interestingly, the default should have worked well, because I'm volume-mounting the program directory from the host... then I realized that I'm mounting the directory in read-only mode, so the Numba cache cannot be written at all. Removing the read-only flag from the volume mount fixes the entire issue:

[INFO] 2023-11-14 21:20:37.417 generated new fontManager
[INFO] 2023-11-14 21:20:37.943 Crearting a small Qrels object.
[INFO] 2023-11-14 21:20:38.781 Qrels object created in 0.84 seconds.
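Both knobs found above can be set from the environment before launching the script; a sketch (the cache path is an example, not a ranx or numba default):

```shell
# Option A: quick fix - disable JIT entirely. No compile delay, but also
# no numba speed-up on large eval sets.
export NUMBA_DISABLE_JIT=1

# Option B: keep JIT, but point numba's on-disk cache at a writable path,
# e.g. when the code directory is volume-mounted read-only in a container.
export NUMBA_CACHE_DIR=/tmp/numba_cache
mkdir -p "$NUMBA_CACHE_DIR"
```

In Docker, the same can be done with `-e NUMBA_CACHE_DIR=...` plus a writable volume for that path.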

--

So, long story short: things should have worked out of the box, except they didn't... I'll leave this pitfall here, in case someone can learn from it later.


AmenRa commented on July 30, 2024

Hi, thanks for the information and debugging effort.
I bet it will be of help to other people.
If you like ranx, please give it a star.
Thank you!

