Comments (29)

UnixJunkie commented on July 21, 2024

@mooreryan the solution proposed by Thierry is very simple: it ensures that import_module will be evaluated only once.
Then you don't have to care about Py.initialize and its parameters.

from ocaml_python_bindgen.

UnixJunkie commented on July 21, 2024

The current function import_module must be renamed to something like import_module_internal.
Then you define:

let lazy_import_module = Lazy.from_fun import_module_internal
let import_module () =
  Lazy.force lazy_import_module

UnixJunkie commented on July 21, 2024

The Lazy module provides functionality that is sometimes called memoization.
I use this module very rarely, but it has its uses.
Reading the doc is recommended: https://ocaml.org/api/Stdlib.Lazy.html
In Haskell, everything is lazy; in OCaml, everything is strict.
However, the Lazy module lets you make parts of an OCaml program lazy (a value is only evaluated/computed if it is really required, and only once).
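A minimal sketch of that evaluate-once behaviour, using only the standard library. The counter import_count and the string "module-handle" are hypothetical stand-ins for the real Python import, so this says nothing about pyml itself:

```ocaml
(* Hypothetical stand-in for an expensive call such as a Python import. *)
let import_count = ref 0

let import_module_internal () =
  incr import_count;
  "module-handle"

(* Nothing runs here: the lazy value only records the computation. *)
let lazy_import_module = Lazy.from_fun import_module_internal

let import_module () = Lazy.force lazy_import_module

let () =
  (* Not evaluated until the first force. *)
  assert (!import_count = 0);
  let a = import_module () in
  let b = import_module () in
  (* Both calls return the same cached result; the body ran only once. *)
  assert (a = b);
  assert (!import_count = 1)
```

The second call to import_module returns the stored result without running import_module_internal again.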

UnixJunkie commented on July 21, 2024

Maybe there is a performance difference, but you'll have to bench to measure it.
I cannot tell in advance.

UnixJunkie commented on July 21, 2024

Thanks a lot Ryan!

mooreryan commented on July 21, 2024

Huh...that's pretty neat. So how would you hope to use that flag? Something like this...

$ pyml_bindgen specs.txt toto Toto --embed-python-source /path/to/toto.py

Also, do you know why the Py.compile function needs the filename?

mooreryan commented on July 21, 2024

(more info for reference)

thierry-martinez/pyml#25

mooreryan commented on July 21, 2024

Here is the relevant bit about filename from the Python docs:

The filename argument should give the file from which the code was read; pass some recognizable value if it wasn’t read from a file ('<string>' is commonly used).

UnixJunkie commented on July 21, 2024

Your proposed command-line example looks good.

mooreryan commented on July 21, 2024

Now that I think of it, I'm wondering whether calling Py.compile and exec_code_module on every function call is a good idea. I'm pretty sure those don't memoize/cache their results.

Currently import_module () is called on every function call that pyml_bindgen generates. That shouldn't be a big deal for regular imports, as Python caches imported modules (as far as I know).

I haven't benchmarked it to see how much of a problem it is, but now I'm wondering.

UnixJunkie commented on July 21, 2024

Currently, I see that import_module() is just called by the generated __init__ method.
In the worst case, the result of import_module() could be stored in a variable of the generated module.

let imported_module = import_module ()

mooreryan commented on July 21, 2024

Ah sorry, I misspoke: import_module isn't called on every function call. It isn't called for bound class methods, for example, but it is for certain others, such as module-associated functions (e.g., like this).

mooreryan commented on July 21, 2024

So I've got a first take at this feature on the embed-source branch.

There are some cram tests for it now, but if you would be willing to try it out, that would be great.

In the meantime, I will do a quick benchmark to see if that repeated import_module nonsense will be a problem.

UnixJunkie commented on July 21, 2024

Yeah, I would suggest doing the import_module call just once, at module initialization.
Cache it in a module variable, e.g. imported_module, then use this variable instead of
making more calls to import_module().

UnixJunkie commented on July 21, 2024

I'll do some tests.

UnixJunkie commented on July 21, 2024

I don't think you git pushed your commits into this branch 'embed-source'.
I don't see the new command line option in pyml_bindgen --help.

UnixJunkie commented on July 21, 2024

OK, after a git pull on the right branch, I got the commits.

UnixJunkie commented on July 21, 2024

It works as well as the version I had hand-crafted.
Maybe, once you can cache the result of import_module(), it would be worth making a new release in opam.

mooreryan commented on July 21, 2024

Hmm, I will have to think on this for a bit. If we just run the import_module function, for example something like this:

let imported_module = import_module ()

It will fail at runtime, as the import_module function needs the Python interpreter to already be initialized (e.g., with Py.initialize), which is why it is currently a function rather than a value bound directly in the module.

I could add a guard like if not (Py.is_initialized ()) then Py.initialize ..., but I don't really want initialize statements scattered throughout the modules, not to mention that the function takes many arguments that specific users may want to change.

Let me know if you have any ideas.

thierry-martinez commented on July 21, 2024

We could perhaps define import_module as a Py.Object.t Lazy.t instead of a unit -> Py.Object.t.
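As a rough illustration of why a lazy value sidesteps the initialization-order problem, here is a standard-library-only sketch. The names initialized, initialize and the string result are hypothetical stand-ins for the interpreter state and the imported module; they are not pyml functions:

```ocaml
(* Hypothetical stand-ins for the interpreter state and the import call. *)
let initialized = ref false
let initialize () = initialized := true

let import_module () =
  if not !initialized then failwith "interpreter not initialized";
  "module-handle"

(* A plain [let imported_module = import_module ()] would run right here,
   at module initialization, and raise before [initialize] could be called.
   Wrapping the call in [lazy] defers it until the first [Lazy.force]. *)
let imported_module = lazy (import_module ())

let () =
  (* Defining the lazy value did not trigger the import. *)
  assert (not (Lazy.is_val imported_module));
  (* The user initializes the interpreter whenever and however they like. *)
  initialize ();
  assert (Lazy.force imported_module = "module-handle");
  assert (Lazy.is_val imported_module)
```

With this shape the generated module never has to call the initializer itself, so the choice of arguments stays with the user.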

mooreryan commented on July 21, 2024

Current fix

So here is what I have so far (this was before I saw @thierry-martinez mention lazy as a possible solution).

This is for cases where the source is not embedded:

    let imported_module =
      if not (Py.is_initialized ()) then Py.initialize ();
      Py.Import.import_module "adder"

This (more or less) is for cases where the source is embedded:

    let imported_module =
      if not (Py.is_initialized ()) then Py.initialize ();
      let source =
        ...
      in
      let filename =
        ...
      in
      let bytecode = Py.compile ~filename ~source `Exec in
      Py.Import.exec_code_module
        ...

Bench

And then I benchmarked the new "cached" versions against the original in both cases.

┌───────────────────────────┬──────────┬─────────┬────────────┐
│ Name                      │ Time/Run │ mWd/Run │ Percentage │
├───────────────────────────┼──────────┼─────────┼────────────┤
│ Adder.add                 │   2.49us │ 103.00w │      6.36% │
│ Adder_embedded.add        │  39.12us │ 187.00w │    100.00% │
│ Adder_cached.add          │   1.20us │ 100.00w │      3.07% │
│ Adder_cached_embedded.add │   1.19us │ 100.00w │      3.03% │
└───────────────────────────┴──────────┴─────────┴────────────┘

As you can see, the cached versions are much faster (roughly 2x for the plain import and over 30x for the embedded source), even speeding up the original. So perhaps we don't even need to bother with lazy at all.

Specifying args to Py.initialize

So that actually seems pretty good. The question remains how to let users provide args to Py.initialize.

I'm thinking the default could be Py.initialize ~version:3 () (or maybe without version...idk). Then either pass the remaining arguments as optional CLI arguments to pyml_bindgen (not a bad idea, but it does add 6 new flags), or let the user specify the call directly, something like this:

$ pyml_bindgen ... --initialize 'Py.initialize ~interpreter:"python3" ~version:3 ~minor:5 ~debug_build:true ()'

And just pass that as-is into the generated code, though that does seem a bit error-prone.

mooreryan commented on July 21, 2024

Interesting...I have never used Lazy.t in OCaml...how would it be used here?

mooreryan commented on July 21, 2024

Ahh interesting...do you think there's any (practical) difference between the above

let imported_module = lazy (...)

And then wherever it's used (e.g., Py.Module.get imported_module "sparkles"), replacing that with Lazy.force like so

Py.Module.get (Lazy.force imported_module) "sparkles"

mooreryan commented on July 21, 2024

Reading the doc is recommended: https://ocaml.org/api/Stdlib.Lazy.html

Yep, I was just on there. I've heard the Lazy module mentioned a few times but never had a chance to try it; until now I've always used thunks to delay computation.

mooreryan commented on July 21, 2024

Maybe there is a performance difference, but you'll have to bench to measure it. I cannot tell in advance.

Gotcha...was mainly wondering if one was more common to see or not. I will see how it goes now.

mooreryan commented on July 21, 2024

I benched it and it didn't seem to make a difference. However, there is this line in the docs for Lazy.from_fun:

It should only be used if the function f is already defined. In particular it is always less efficient to write from_fun (fun () -> expr) than lazy expr.

So I went with something like that.
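For reference, a small standard-library sketch of the two spellings that doc line compares. The function expensive and the counter calls are hypothetical; the point is only that both forms give the same result and each evaluates its body once:

```ocaml
(* Hypothetical expensive computation with a call counter. *)
let calls = ref 0

let expensive () =
  incr calls;
  42

(* The form the docs advise against: a fresh closure wrapped by from_fun. *)
let via_from_fun = Lazy.from_fun (fun () -> expensive ())

(* The form the docs recommend when there is no pre-existing function. *)
let via_lazy = lazy (expensive ())

let () =
  assert (Lazy.force via_from_fun = 42);
  assert (Lazy.force via_lazy = 42);
  (* Forcing again reuses the stored value. *)
  ignore (Lazy.force via_lazy);
  assert (!calls = 2)
```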

The CI workflow is passing now, so all that's left is to make some examples and docs about it, and then cut a release.

Of course, if you have some time, feel free to give it a shot (this time it's on a different branch, embed-source2).

UnixJunkie commented on July 21, 2024

No problem, I will test.

UnixJunkie commented on July 21, 2024

It works as expected; import_module() is now just Lazy.force of something.
Great job! 🥇

mooreryan commented on July 21, 2024

This has been merged into the main branch...and included in a new release submitted to opam. Thanks for the helpful discussions!
