
Comments (8)

gpoore commented on June 4, 2024

No, there is not per-chunk caching, since that's not really practical across languages.

In the future, it may be possible to have per-chunk caching for some languages, as long as there is pre-existing software that can manage caching. I believe knitr has per-chunk caching for R, and possibly Julia and Python...there might be a way to leverage some existing solutions.

from codebraid.

grwlf commented on June 4, 2024

I am working on a project called Pylightnix which, in theory, should handle the required caching. If you don't mind, I could try to add this feature to Codebraid. I plan to find the place where Python code blocks are executed and attempt to wrap it in Pylightnix "stages".
Update: I realized that one would need to save the internal state of the interpreter to solve the problem. That could be genuinely hard, so now I am less sure I can handle it.


gpoore commented on June 4, 2024

I looked at Pylightnix, and was also wondering about dealing with global state. You can basically think about the code chunks as a list of strings, with each string being the code from a code chunk. If you can come up with a function that takes such a list of strings and executes them with caching, then that function can be incorporated into Codebraid. If you want to try to implement caching, I'd suggest working on a function that operates on a list like this first, before trying to build something within Codebraid itself. (Also, I'm working on adding new features to Codebraid that involve a lot of modifications, so the existing code is about to change significantly.)
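As a starting point, such a function might cache each chunk's captured stdout under a hash of the entire prefix of chunks that precedes it, replaying the prefix on a miss to rebuild interpreter state. This is a hypothetical sketch (the name `run_chunks_cached` and the cache layout are not part of Codebraid), and it also illustrates the core difficulty: only output can be cached, so any change still forces re-execution from the start.

```python
import contextlib, hashlib, io

def run_chunks_cached(chunks, cache):
    """Return the stdout of each chunk, reusing cached results for the
    longest unchanged prefix of chunks.  Because interpreter state is
    not cached, a miss forces silent re-execution of the prefix."""
    # A chunk's result depends on everything executed before it, so
    # key each chunk by the hash of the whole prefix chunks[:i+1].
    keys, h = [], hashlib.sha256()
    for code in chunks:
        h.update(code.encode())
        keys.append(h.hexdigest())
    # Longest prefix whose outputs are already cached.
    n_hit = 0
    while n_hit < len(chunks) and keys[n_hit] in cache:
        n_hit += 1
    outputs = [cache[k] for k in keys[:n_hit]]
    if n_hit == len(chunks):
        return outputs
    ns = {}
    # State was not saved, so replay the cached prefix with its
    # output discarded -- this is the expensive part caching cannot avoid.
    for code in chunks[:n_hit]:
        with contextlib.redirect_stdout(io.StringIO()):
            exec(code, ns)
    for code, key in zip(chunks[n_hit:], keys[n_hit:]):
        buf = io.StringIO()
        with contextlib.redirect_stdout(buf):
            exec(code, ns)
        cache[key] = buf.getvalue()
        outputs.append(cache[key])
    return outputs
```

A second call with unchanged chunks returns entirely from the cache without touching `exec` at all.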

There are a few ways that we might get some caching without full per-chunk caching. Let me know if any of these are of interest for what you are doing.

  • It would be possible for a session to depend on one or more other sessions. For example, you could put expensive calculations in a session that saves the output at the end, and then put visualization in a separate session that loads the saved output and plots it. The visualization session would specify that it depends on the calculation session. Any time the calculation session changes, it causes the visualization session to be re-executed as well. But the visualization session can be modified without affecting the calculations.
  • It would be possible to use a Jupyter kernel that is not restarted between document creation runs. Only modified code would be executed by the kernel. This would have the standard out-of-order execution downsides as a Jupyter notebook. However, it would also allow very fast iteration.
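The first option above can be sketched as a make-style driver: a session's identity is the hash of its own code combined with the identities of the sessions it depends on, so a change to the calculation session automatically invalidates the visualization session. The function `run_session` and the on-disk stamp files are hypothetical, not Codebraid's actual mechanism.

```python
import hashlib, pathlib

def run_session(name, code, deps, state_dir=pathlib.Path(".cb-cache")):
    """Re-run `code` only when it, or any dependency session, changed.
    Returns True if the session was executed, False if it was skipped."""
    state_dir.mkdir(exist_ok=True)
    # A session's identity = its own code + the ids of its dependencies,
    # so an upstream change cascades to every downstream session.
    h = hashlib.sha256(code.encode())
    for dep in deps:
        h.update((state_dir / ("%s.id" % dep)).read_bytes())
    session_id = h.hexdigest()
    stamp = state_dir / ("%s.id" % name)
    if stamp.exists() and stamp.read_text() == session_id:
        return False                      # up to date, skip execution
    exec(code, {})                        # the expensive work happens here
    stamp.write_text(session_id)
    return True
```

In this scheme the calculation session would save its results to a file at the end, and the visualization session (declared with `deps=["calc"]`) would load that file; editing only the visualization code re-runs only the visualization.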


grwlf commented on June 4, 2024

Thanks for your advice. I agree that it would be better for me to build a simplified proof of concept first. I have thought a bit more about the problem: I don't like Jupyter because I think it is too heavy to be manageable. Instead, it may be enough to open a pipe to a Python shell running in the background and save this pipe as a file. Then I could require users to pass the name of this file as an argument and call it a poor man's serialization of the interpreter state :) The rest of the demo should not be hard. I think we could assume that (a) the lines of code in each chunk are the "prerequisites" of that chunk; (b) the output received from the pipe during the last execution of a chunk is the "artifact" of that chunk that needs to be cached (I'll ignore stderr for simplicity); (c) the job is then to build dependencies between chunks, e.g. by saying that each chunk depends on all previous chunks in the file.

That could be a bit fragile, but I think it could work.
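The pipe idea can be sketched with a long-lived `python -i` subprocess and a sentinel line that marks the end of each chunk's output (when stdin is not a terminal, CPython writes the `>>>` prompts to stderr, so stdout stays clean). This is a minimal sketch and shares the fragility noted above; for instance, multi-line indented blocks would need extra blank-line handling in interactive mode.

```python
import subprocess, sys

class PipedInterpreter:
    """Keep one Python process alive and feed it chunks over a pipe:
    interpreter state lives in the background process, which is the
    'poor man's serialization' of the discussion above."""

    def __init__(self):
        self.proc = subprocess.Popen(
            [sys.executable, "-i", "-u"],   # interactive, unbuffered
            stdin=subprocess.PIPE, stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL, text=True)

    def run(self, code, marker="__CHUNK_DONE__"):
        # Send the chunk followed by a sentinel print, so we know
        # where this chunk's stdout ends.
        self.proc.stdin.write(code + "\nprint(%r)\n" % marker)
        self.proc.stdin.flush()
        out = []
        for line in self.proc.stdout:
            if line.rstrip("\n") == marker:
                break
            out.append(line)
        return "".join(out)
```

Each `run` call returns only that chunk's output, making it a natural "artifact" to hand to a per-chunk cache.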


gpoore commented on June 4, 2024

A pipe might work. Saving the pipe and then passing it as an argument for the next document build might not be necessary. I'm interested in adding a new mode where Codebraid runs continuously in the background and automatically rebuilds the document under various conditions. For example, when the document is saved it could be rebuilt with all code replaced by the text "waiting for results", and then every 10 seconds it could be rebuilt with all code results that are available by that time. This will ultimately allow for a (nearly) live preview mode.
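The rebuild loop described here could be as simple as polling the document's modification time and invoking a build callback on change. This is a hypothetical sketch of the idea, not Codebraid's implementation; `watch_and_rebuild` and its parameters are invented for illustration (`max_iters` exists only to make the loop testable).

```python
import pathlib, time

def watch_and_rebuild(doc, build, interval=1.0, max_iters=None):
    """Poll `doc` every `interval` seconds and call `build(doc)`
    whenever its modification time changes.  A real implementation
    might use filesystem notifications instead of polling."""
    last = None
    n = 0
    while max_iters is None or n < max_iters:
        mtime = pathlib.Path(doc).stat().st_mtime
        if mtime != last:
            build(doc)        # e.g. rebuild with placeholder results
            last = mtime
        time.sleep(interval)
        n += 1
```

The placeholder-then-refresh behavior described above would live inside the `build` callback: first emit "waiting for results", then re-emit with whatever results have arrived.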


grwlf commented on June 4, 2024

Got it. I'm aware that there are compilers which work this way; some of my colleagues used one for compiling Haskell code in the background. However, I have the impression that a Python environment will never be stable enough to withstand a moderately long editing session: for example, I have to restart my IPython console from time to time to let it reload files and fix internal problems with multiple versions of classes. Those doubts aside, I agree that it could be a nice feature.

Meanwhile, I've uploaded a small proof-of-concept application called MDRUN. It processes Markdown documents by sending code sections through the Python interpreter. It runs everything in one pass and uses non-trivial POSIX plumbing to keep the interpreter alive between runs. It also uses Pylightnix for per-chunk cache management, as planned. On every run, the program evaluates only the changed sections and their successors.

An example input document is here, and here is the result.

I'm going to keep the master branch of Pylightnix in a working state for some time, including this sample. Feel free to let me know when/if you think I could help with adding a similar feature to Codebraid.
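The "changed sections and their successors" rule fits in a few lines: since each chunk depends on all chunks before it, the first hash mismatch invalidates everything from that index on. This is a sketch of the rule as described, not MDRUN's actual code.

```python
import hashlib

def stale_chunks(old_hashes, chunks):
    """Return (stale_indices, new_hashes): the index of the first
    changed chunk and every successor must be re-evaluated, because
    each chunk depends on all previous chunks in the file."""
    new_hashes = [hashlib.sha256(c.encode()).hexdigest() for c in chunks]
    for i, h in enumerate(new_hashes):
        if i >= len(old_hashes) or old_hashes[i] != h:
            return list(range(i, len(chunks))), new_hashes
    return [], new_hashes
```

Editing the middle chunk of a three-chunk document thus re-runs chunks two and three but leaves chunk one cached.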


gpoore commented on June 4, 2024

The current built-in code execution system is based on templates. The code from the Markdown document is extracted from the Pandoc AST, then inserted into templates to create a source file that is executed. For this approach, adding new code execution features means creating new templates. This isn't ideal for what you need.

For some time I've been working on adding support for running code with interactive subprocesses, like a Python interactive shell. I'm currently in the midst of modifying the built-in code execution system to add better support for this, as well as some async-related features. Once this is finished, adding new code-execution features will be possible by specifying an executable that reads code from stdin (or potentially a file) and writes (properly formatted) code output to stdout (or potentially a file). This should make it straightforward to use a slightly modified version of your MDRUN.py with the built-in code execution system.
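The executable protocol described here might look roughly like the following: read delimiter-separated chunks from an input stream, execute them in one shared namespace, and echo each chunk's output back with a header line. The delimiter and output format are assumptions for illustration, not Codebraid's actual protocol.

```python
import contextlib, io

DELIM = "#<<chunk>>"   # assumed chunk delimiter, not Codebraid's

def execute_stream(instream, outstream):
    """Read delimiter-separated code chunks, run them in a single
    shared namespace, and write each chunk's stdout after a header
    line so the caller can match outputs back to chunks."""
    ns = {}
    for i, code in enumerate(instream.read().split(DELIM + "\n")):
        buf = io.StringIO()
        with contextlib.redirect_stdout(buf):
            exec(code, ns)
        outstream.write("%s %d\n%s" % (DELIM, i, buf.getvalue()))
```

Wrapped in a small `main()` that passes `sys.stdin` and `sys.stdout`, this would be the kind of standalone executable the new system could invoke.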

It will probably be at least a few weeks until the new features are finished...they're part of a larger set of features I've been working on for months. I will try to remember to add a note in this issue when they're available for experimentation. If I don't add a note in the next month or so, you might check back about progress.


grwlf commented on June 4, 2024

FYI: the request to cache results was mainly about enjoying partial document evaluation. I've now implemented that feature as a separate project, see LitREPL. The editor (currently Vim) sends the whole document to the backend, which extracts code/result sections using a lightweight parser, pipes the code through the background interpreter (Python and IPython are supported), produces the result, and finally sends the document back to the editor. The communication is performed via Unix pipes, so for now there is a POSIX-compatible OS requirement. I found that the Lark library greatly simplifies the parsing; with its help, the tool supports both Markdown and LaTeX document formats. Feel free to borrow the code if needed; I've used the same BSD 3-clause license as you do in Codebraid.
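For the simplest Markdown case, the extraction step LitREPL performs with a Lark grammar can be approximated with a short regex over fenced code blocks. This is a simplified stand-in for illustration, not LitREPL's actual parser, which also has to track result sections and LaTeX environments.

```python
import re

# Match ```python fenced blocks; DOTALL lets the body span lines,
# MULTILINE anchors the fences at line starts.
FENCE = re.compile(r"^```python\n(.*?)^```$", re.S | re.M)

def extract_code_sections(markdown):
    """Return the bodies of ```python fenced blocks, in document order.
    A regex stand-in for LitREPL's grammar-based parser."""
    return [m.group(1) for m in FENCE.finditer(markdown)]
```

Each extracted body can then be piped through the background interpreter, and the results spliced back into the document at the matching positions.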

