
bi-directional tangle daemon for literate programming

License: Apache License 2.0


entangled's Introduction

title: Entangled
author: Johan Hidding

Literate programming [/ˈlɪtəɹət ˈpɹəʊɡɹæmɪŋ/]{.phonetic} (computing) Literate programming is a programming paradigm introduced by Donald Knuth in which a program is given as an explanation of the program logic in a natural language, such as English, interspersed with snippets of macros and traditional source code, from which a compilable source code can be generated. (Wikipedia)

In short: you write Markdown containing code fragments. These code fragments are combined into working code in a process called tangling.

Entangled makes writing literate programs easier by keeping code blocks in Markdown up-to-date with generated source files. It monitors both the Markdown and the tangled source files, so a change on either side is reflected in the other. In practice this means:

Write well documented code using Markdown.
Use any programming language you like (or are forced to use).
Keep debugging and using other IDE features without change.
Generate a report in PDF or HTML from the same source (see examples at Entangled homepage).

Status

Entangled is approaching its 1.0 release! It has been tested on Linux, Windows and macOS. Still, it is highly recommended to use version control and commit often. If you encounter unexpected behaviour, please post an issue and describe the steps to reproduce.

Features:

live bi-directional updates
(reasonably) robust against wrongly edited source files
configurable with Dhall
hackable through SQLite
create PDF or HTML pages from literate source
line directives to point compilers to markdown source

Building

Entangled is written in Haskell, and uses the cabal build system. You can build an executable by running

# (requires cabal >= 3.x)
cabal build

Install the executable in your ~/.local/bin

cabal install

Run unit tests

cabal test

Using

Entangled should be run from the command-line. The idea is that you run it from the root folder of the project that you're working on. This folder should contain an entangled.dhall file that holds the configuration. You can get an example config file by running

entangled config

This config assumes that your Markdown files are in a folder named ./lit, and it stores information in a SQLite3 database located at ./.entangled/db. To run the daemon,

entangled daemon [files ...]

where [files ...] is a sequence of additional files that you want monitored.

Syntax (markdown side)

The markdown syntax Entangled uses is compatible with Pandoc's. This relies on the use of fenced code attributes. To tangle a code block to a file:

``` {.bash file=src/count.sh}
   ...
```

Composing a file using multiple code blocks is done through noweb syntax. You can reference a named code block in another code block by putting something like <<named-code-block>> on a single line. This reference may be indented. Such an indentation is then prefixed to each line in the final result.

A named code block should have an identifier given:

``` {.python #named-code-block}
   ...
```

If a name appears multiple times in the source, the code blocks are concatenated during tangling. When weaving, the first code block with a certain name will appear as <<name>>=, while consecutive code blocks with the same name will appear as <<name>>+=.
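The expansion rules above can be illustrated with a minimal Python sketch. This is not Entangled's implementation (which is written in Haskell); the `tangle` function, the `blocks` dictionary and the reference pattern are assumptions made for illustration. It concatenates blocks that share a name and prefixes the indentation of a reference to every expanded line.

```python
import re

# A line that consists only of an (optionally indented) <<name>> reference.
REFERENCE = re.compile(r"^(?P<indent>[ \t]*)<<(?P<name>[^>\s]+)>>[ \t]*$")


def tangle(blocks, root, _stack=()):
    """Expand the named block `root` into a list of source lines.

    `blocks` maps a name to the list of code-block bodies that share it;
    bodies with the same name are concatenated in order of appearance.
    """
    if root in _stack:
        raise ValueError("cyclic reference: " + " -> ".join(_stack + (root,)))
    if root not in blocks:
        raise KeyError(f"reference not found: {root}")
    result = []
    for body in blocks[root]:
        for line in body.splitlines():
            match = REFERENCE.match(line)
            if match is None:
                result.append(line)
                continue
            # Prefix the indentation of the reference to each expanded line.
            indent = match.group("indent")
            for expanded in tangle(blocks, match.group("name"), _stack + (root,)):
                result.append(indent + expanded)
    return result


blocks = {
    "main": ["def main():\n    <<body>>"],
    "body": ["print('Hello')", "print('World')"],  # same name, two blocks
}
print("\n".join(tangle(blocks, "main")))
```

The sketch rejects a block that references itself, directly or indirectly, instead of expanding forever.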

Please see the Hello World and other examples!

Syntax (source side)

In the source code we know exactly where the code came from, so there would be no strict need for extra syntax there. However, once we start to edit the source file it may not be clear where the extra code needs to end up. To make our life a little easier, named code blocks that were tangled into the file are marked with a comment at begin and end.

// ~\~ begin <<lit/story.md|main-body>>[0]
std::cout << "Hello, World!" << std::endl;
// ~\~ end

These comments should not be tampered with!
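To show why the markers matter, here is a rough Python sketch of how an edited source file could be split back into named blocks. The marker shapes are assumed from the example above; the `read_blocks` function and its regular expressions are hypothetical and only key on the `begin <<file|name>>[index]` and `end` parts of the comments.

```python
import re

# Assumed marker shapes, taken from the example above: a comment containing
# "begin <<file|name>>[index]" opens a block, a comment ending in "end" closes it.
BEGIN = re.compile(r"begin <<(?P<file>[^|>]+)\|(?P<name>[^>]+)>>\[(?P<index>\d+)\]")
END = re.compile(r"~ end\s*(\*/)?\s*$")


def read_blocks(source):
    """Return {(file, name, index): [lines]} for every marked block in `source`."""
    blocks = {}
    stack = []  # keys of the blocks that are currently open, innermost last
    for line in source.splitlines():
        begin = BEGIN.search(line)
        if begin:
            key = (begin.group("file"), begin.group("name"), int(begin.group("index")))
            blocks[key] = []
            stack.append(key)
        elif END.search(line):
            if not stack:
                raise ValueError("end marker without matching begin")
            stack.pop()
        elif stack:
            # Lines are attributed to the innermost open block only.
            blocks[stack[-1]].append(line)
    if stack:
        raise ValueError(f"unclosed block: {stack[-1]}")
    return blocks


tangled = "\n".join([
    "// ~\\~ begin <<lit/story.md|main-body>>[0]",
    'std::cout << "Hello, World!" << std::endl;',
    "// ~\\~ end",
])
print(read_blocks(tangled))
```

If a marker is deleted or altered, the sketch can no longer pair `begin` with `end` and raises an error, which is the kind of failure that tampering with the comments invites.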

Running `entangled`

Assuming you have created a Markdown file, say program.md, you can start entangled by running

entangled daemon ./program.md

in the shell. You may run entangled --help to get help on options, or check out the user manual.

Running `entangled` with Docker

Entangled is available as a Docker image.

Assuming you have created a Markdown file, say program.md, you can start entangled by running

docker run --rm --user $(id -u):$(id -g) --volume $PWD:/data nlesc/entangled daemon ./program.md

This command starts a Docker container with the current working directory mounted as /data and running with your user/group id so files are written with the correct ownership.

Distribution

If you've written literate code using Entangled and would like to distribute it, one way is to include the tangled source code in the tarball. You may also wish to use the pandoc filters included in entangled/filters.

Development

Credits

The following persons have made contributions to Entangled:

Michał J. Gajda (gh:mgajda), first implemented the line-directive feature
Danny Wilson (gh:vizanto), first implemented the project annotation

Generating manpage

pandoc lit/a2-manpage.md -s -t man | /usr/bin/man -l -

License

Entangled is distributed under the Apache v2 license.


entangled's Issues

develop branch: fix tangling from multiple files with identical names

Update to newer files when daemon starts

Currently, when entangled is started, the markdown is king, even when the markdown file is older than the source files. This is annoying (to say the least) if you start the daemon after you accidentally edited files without the daemon running.

Create folder if it doesn't exist

Currently, tangling to a file in a non-existent folder crashes the daemon.

develop: implement "print default config" option

tangling goes into unending loop with self-reference

Scriptable tangle output

In my repo I would like to make a git commit hook which will run tangle and stage any written and generated files into the commit. However to know which files to stage I need to parse an output like

• Writing 'src/py/tasks.py'
• Creating 'src/py/templates/form.html'

This is hard to parse. Could an option like --plain be added to the tangle subcommand to produce output that is easier to parse, for example:

src/py/tasks.py
src/py/templates/form.html
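Until such an option exists, a hook could extract the paths itself. The Python sketch below is a workaround under assumptions: it expects each reported file on its own line as a bullet, a verb (`Writing`, `Creating` or `Overwriting`, the verbs that appear in the outputs quoted on this page) and a single-quoted path. The `written_files` helper is hypothetical, and paths containing a single quote are not handled.

```python
import re

# Assumption: one reported file per line, "<bullet> <Verb> '<path>'".
REPORTED = re.compile(
    r"^\W*(?:Writing|Creating|Overwriting)\s+'(?P<path>[^']+)'\s*$"
)


def written_files(output):
    """Return the paths reported in the human-readable tangle output."""
    paths = []
    for line in output.splitlines():
        match = REPORTED.match(line)
        if match:
            paths.append(match.group("path"))
    return paths


sample = "• Writing 'src/py/tasks.py'\n• Creating 'src/py/templates/form.html'\n"
print("\n".join(written_files(sample)))
```

The printed list has the plain one-path-per-line shape requested above and could be handed to `git add` in a commit hook.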

documentation: restructuring

The documentation needs restructuring for 1.0 release.

Hello, World
Basic use
Literate programming techniques
Examples, go to separate repository: entangled/examples

develop: improve literacy

Much of the code is still undocumented

Why do files have an end marker?

First of all, great project!

I've just started experimenting to see what the limitations are.

Question: why do files have an end marker if it's unsafe to put things after it anyway?

Details

With a code block as follows:


    ```c++ {file=test.c}
    contents here
    ```

This file is generated:

// ------ language="C++" file="test.c"
contents here
// ------ end

Now changing the test.c file to the following:

// ------ language="C++" file="test.c"
contents here!
// ------ end
this will get deleted

the markdown gets the ! as expected.


    ```c++ {file=test.c}
    contents here!
    ```

All good so far.

Adding 2 more ! to the markdown, and then saving, the file ends up like this:

// ------ language="C++" file="test.c"
contents here!!!
// ------ end

Subcommand to verify tangle is up to date

In a CI job I would like to verify that the Markdown and tangled files are in sync with each other.

At the moment I am running the pandoc filter followed by git diff-index --quiet HEAD --. The git command will exit 1 when there are uncommitted files.

I would like to have a verify subcommand which will verify that all tangle files are updated. It should print the files that are not up to date and exit with 1 when not all files are up to date.
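The requested behaviour can be sketched in Python as follows. This is only an illustration of the check, not an existing Entangled subcommand: `stale_targets` and `verify` are hypothetical names, and the sketch assumes the content that tangling would produce is already available as a dictionary from relative path to text.

```python
from pathlib import Path


def stale_targets(expected, root="."):
    """Return the relative paths whose file on disk differs from `expected`.

    `expected` maps a relative path to the content tangling would produce.
    A missing file counts as out of date.
    """
    stale = []
    for relative, content in sorted(expected.items()):
        target = Path(root) / relative
        if not target.is_file() or target.read_text(encoding="utf-8") != content:
            stale.append(relative)
    return stale


def verify(expected, root="."):
    """Print out-of-date files and return the process exit code (0 or 1)."""
    stale = stale_targets(expected, root)
    for relative in stale:
        print(relative)
    return 1 if stale else 0
```

A CI job would pass the result of `verify` to `sys.exit`, so the job fails exactly when at least one file is out of date.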

develop: better unit testing

target file is not removed, even remains in memory if renamed

Warn about duplicate file=Target.hs statement

If file=Target.hs is used for two code fragments, enTangleD will update both when the source file is edited. This is confusing if the name clash was accidental.

One should probably issue a warning.

List code fragments that are not included anywhere

When a #code-fragment is never referenced, it currently ends up in no output file.
It would be nice to print a list of fragments that are never referenced, so that the author can fix this pre-emptively.
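One way to compute such a list is a reachability search that starts from the file blocks. The Python sketch below assumes the named blocks and file blocks have already been extracted into dictionaries; `unused_fragments` and the reference pattern are illustrative, not part of Entangled.

```python
import re

# A line that consists only of an (optionally indented) <<name>> reference.
REFERENCE = re.compile(r"^[ \t]*<<([^>\s]+)>>[ \t]*$", re.MULTILINE)


def unused_fragments(named, files):
    """Return the names in `named` that no file block reaches.

    `named` maps a fragment name to its body, `files` maps a target path
    to the body of the block that is tangled into it.
    """
    reachable = set()
    pending = [name for body in files.values() for name in REFERENCE.findall(body)]
    while pending:
        name = pending.pop()
        if name in reachable or name not in named:
            continue
        reachable.add(name)
        # Follow references nested inside the fragment we just reached.
        pending.extend(REFERENCE.findall(named[name]))
    return sorted(set(named) - reachable)


named = {"used": "x = 1\n<<helper>>", "helper": "y = 2", "orphan": "z = 3"}
files = {"src/main.py": "<<used>>"}
print(unused_fragments(named, files))
```

Here `helper` counts as used because it is reached through `used`, while `orphan` is reported.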

with only whitespace in between only every other file gets tangled

enable github-zenodo integration

Hi Johan,
would you please enable the GitHub-Zenodo integration so that we can harvest citation data and improve Search Engine Optimization on https://research-software.nl/software/entangled ? Thanks

For reference, https://guides.github.com/activities/citable-code/

You may need to click Grant here https://github.com/settings/connections/applications/c04ff9cf27ed8474bc1c

switch to stdio or rio

The stdio library is more complete (though lower level) than directory ~~and filepath~~.
Actually filepath does the very useful job of manipulating FilePaths in a cross-platform manner.
This requires building with GHC 8.6, due to dependencies in stdio -> primitive -> base < 4.13.
A good reference: https://www.fpcomplete.com/blog/enhancing-file-durability-in-programs

Named reference used in 2 output files: edited in one does not update the other

Setup

example.md

```js {#some-reference}
This code block is tangled into multiple files
```

```js {file=one.js}
<<some-reference>>
```

```js {file=other.js}
<<some-reference>>
```

Bug

Edit one.js.

The markdown file is updated correctly.
one.js remains as is.
Bug: other.js is not updated.

Output

* [11:38:35] Initializing
    * Monitoring 'example.md'
    * Creating 'one.js'
    * Creating 'other.js'

* [11:39:31] Untangling 'one.js'
    updated NameReferenceId "some-reference" "example" 0
    * Overwriting 'example.md'

feature request: ignore blocks

It is common to include smaller code blocks within an enumerated or itemized list in Markdown:

1. This is an item
2. This is another item
    ```{.haskell #myref}
    function2 x = x+1
    ```

In this case, the new version of enTangleD does not find #myref, because the code block is indented.

Unable to run examples (using .deb release)

$ docker run -it --rm ubuntu:bionic /bin/bash
$ apt-get update && apt-get install -y curl libatomic1 git
$ TMP_FILE=`mktemp`; curl -SL https://github.com/entangled/entangled/releases/download/v0.2.0/entangled_0.2.0.0_amd64.deb --output $TMP_FILE; dpkg -i $TMP_FILE; rm $TMP_FILE
$ git clone https://github.com/entangled/entangled.git
$ cd entangled/examples/99-bottles
$ entangled 99-bottles.md
enTangleD, version 0.2.0: https://jhidding.github.io/enTangleD/
Copyright 2018-2019, Johan Hidding, Netherlands eScience Center
Licensed under the Apache License, Version 2.0

entangled: <stdout>: commitAndReleaseBuffer: invalid argument (invalid character)

It also fails to reset the terminal color afterwards.

How to use this with haddock/doxygen/...?

I'm wondering how to use entangled with haddock for example. I don't really want the full API doc duplicated in the markdown (except maybe for the parameter docs Int -- ^ number of people). But I also don't want people to look up the entangled doc just to know how to use the API and it should render properly on hackage.

How to deal with this?

Make it possible to write scripts with shebang #! line

Currently the first line in the code block ends up third in the program output. Maybe there's a way to add support for setting a header area that goes before the first entangled comment. The following input would then give:

The most useless program

``` {.python file=myscript header=1}
#!/usr/bin/env python3

import sys

sys.exit(0)
```

generating the file myscript

#!/usr/bin/env python3
# ~\~ language=Python filename=myscript header=1
# ~\~ begin <<lit/index.md|myscript>>[0]

import sys

sys.exit(0)
# ~\~ end

The parser can see that the first line belongs to the #myscript block.
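The proposed layout can be sketched in a few lines of Python. The `header` attribute is the proposal from this issue, not an existing Entangled feature, and `with_header` is a hypothetical helper that moves the first `header` lines of the block in front of the marker comments.

```python
def with_header(body, header, preamble, end_marker):
    """Place the first `header` lines of `body` before the marker comments.

    `preamble` is the list of marker lines that normally opens the file,
    `end_marker` the comment that closes the block.
    """
    lines = body.splitlines()
    return "\n".join(lines[:header] + preamble + lines[header:] + [end_marker])


body = "#!/usr/bin/env python3\n\nimport sys\n\nsys.exit(0)"
preamble = [
    r"# ~\~ language=Python filename=myscript header=1",
    r"# ~\~ begin <<lit/index.md|myscript>>[0]",
]
print(with_header(body, 1, preamble, r"# ~\~ end"))
```

With `header=1` the shebang stays on the first line, so the operating system still recognises the script interpreter.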

develop: port source line pragma and project annotation features from master

tangle: reference not found crashes daemon

When we run the daemon, we expect to keep changing and improving the source file.
References will often be missing while we work; we then fix them and expect the output to update.
However, the latest version of enTangleD crashes when a reference is not found, instead of watching for changes and updating the list of errors:

Entangled, version 1.1.0: https://entangled.github.io/

entangled: TangleError "TangleError \"reference not found: ReferenceName {unReferenceName = \\\"treelike\\\"}\""

Commit a60d8d6acbf4b622bb92873c9535c627158a84ca.
We can also note that the error is printed with unnecessary double-quoting and double-escaping \\\"

list/tangle subcommands do not work without existing db

I tried to run entangled list and entangled tangle -a on the files in the gist https://gist.github.com/sverhoeven/30ffad9797a51fbf4ff0b45ad78a05c7

I expected the commands to tangle or list the test.sh file, but test.sh is neither generated nor listed.

If I first run entangled daemon and then remove the test.sh file, the list and tangle -a commands work as expected.

The .entangled/db.sqlite file does get created when running non-daemon subcommands, but only running the daemon inserts rows.

Option to run non-daemonised, once

During git rebase -i or in CI it's useful to be able to tangle/untangle once, without having to hit CTRL-C.

Support output languages without comments

JSON does not allow comments, yet it is often used for configuration or test examples.
It would be nice to support languages that do not have comments at all.

An alternative would be to allow defining a per-language postprocessor.

develop: get rid of compiler warnings

develop: unit test daemon

develop: switch to RIO completely

Currently the code is mixed standard Prelude / RIO. All the code should import RIO.

Print understandable error if no config is present

develop: make `entangled tangle -a` verbose

develop: delete target files that have gone

If a target file is renamed, the old target needs to be removed.

Include language highlighting for other renderers than pandoc

Code snippets with an info string like {.python #bla} do not get highlighted on GitHub or in VS Code, because those renderers don't understand it. Can we change entangled and the filter to also accept an info string that works for GitHub and VS Code?

Replace INotify with FSNotify

System.FSNotify

develop: daemon, eliminate `wait` statements

Currently, wait statements are used, just pausing the program for 0.1 sec before resuming fsnotify. This is to give the filesystem time to sync, but this is not a very good solution.

develop: pretty logging messages

develop: deduplicate code between command-line and daemon

develop: add trailing newline when writing files

Broken link in README

The link to Hello World in README.md is broken.

Makefile unknown language

When running entangled v0.2.0 from the Debian package, it fails to tangle a Makefile.

I tried to run entangled README.md INSTALL.md on the cpp2wasm repo, but got the error

Error tangling 'Makefile': Error: unknown language: <unknown-language>

The INSTALL.md includes a fenced code attribute of {.makefile file=Makefile}.

In the entangled code of v0.2.0 I see that this language is supported.

I was expecting a Makefile to be generated, but it was not.

Improve default style-sheet

The generated HTML could look a bit tighter, maybe some material-ui theme would improve it.

entangled.Annotate.Project puts link outside comment

I set annotate = entangled.Annotate.Project in my config, and the resulting JS comments looked like

/* ~\~ begin <<README.md|react-state>>[0] */ project://README.md#1002

My web browser can not parse that.

I expected it to look like

/* ~\~ begin <<README.md|react-state>>[0] project://README.md#1002 */

It seems to me the project directive is put after the entangled.Comment.Block:end instead of before.

Workflows: get gnuplot example to work

Supporting workflows is a task in itself, but for starters, it would be nice to get a Gnuplot script to expand into a plot.

develop: test editing target code with duplicate snippets

The stitcher should recognize which of the duplicate snippets has been edited, and update to the changed version. If multiple instances have been edited, the operation should bounce.
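The rule described above can be written down as a small Python function. This is a sketch of the intended behaviour, not the stitcher's code: `reconcile` is a hypothetical name, and "bounce" is read here as rejecting the update with an error.

```python
def reconcile(original, instances):
    """Decide the new content of a snippet that was tangled to several places.

    `original` is the content as last tangled, `instances` are the contents
    read back from each place the snippet appears in the target code.
    """
    edited = [text for text in instances if text != original]
    if not edited:
        return original  # nothing changed
    if len(edited) == 1:
        return edited[0]  # exactly one instance was edited: take it
    # More than one instance was edited: refuse to guess which one wins.
    raise ValueError("multiple instances of a duplicate snippet were edited")


print(reconcile("x = 1", ["x = 1", "x = 2"]))
```

A more lenient variant could accept several edited instances when they are all identical; the strict version follows the wording above.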

Unable to build: AesonException "Error in $.packages.cassava.constraints.flags['bytestring--lt-0_10_4']: Invalid flag name: \"bytestring--lt-0_10_4\""

Steps to reproduce:

$ docker run -it --rm ubuntu:bionic /bin/bash
$ apt-get update && apt-get install -y git haskell-stack ghc pandoc
$ git clone https://github.com/entangled/entangled.git
$ cd entangled
$ stack build
Downloaded lts-12.9 build plan.    
AesonException "Error in $.packages.cassava.constraints.flags['bytestring--lt-0_10_4']: Invalid flag name: \"bytestring--lt-0_10_4\""

$ stack --version | head -n1
Version 1.5.1 x86_64
$ stack upgrade
$ /root/.local/bin/stack --version
Version 2.1.3, Git revision 636e3a759d51127df2b62f90772def126cdf6d1f (7735 commits) x86_64 hpack-0.31.2
$ /root/.local/bin/stack build
Could not parse '/entangled/stack.yaml':
Aeson exception:
Error in $.packages[1]: failed to parse field 'packages': expected Text, encountered Object
See http://docs.haskellstack.org/en/stable/yaml_configuration/

Pass source line number to the compiler

Currently language-specific configuration only includes a comment prefix, like:

-- -------

However, that means that the line number and source file name are not passed to the compiler,
so the locations reported in compiler errors refer to the tangled file rather than the Markdown source.
Haskell has a compiler directive for this:

{-# LINE 42 "foo.hs" #-}

C/C++ has a similar directive:

#line 42 "foo.c"

Including this line number would mean that compiler output could automatically jump to the line number with the error, and the pace of work would become faster.
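As an illustration, the two directives quoted above could be generated from per-language templates. The Python sketch below is an assumption about how such a feature might be configured; `line_directive` and the language keys are hypothetical, and file names containing quotes or backslashes are not escaped.

```python
# Templates for the two directives quoted above; doubled braces are
# literal braces in str.format.
TEMPLATES = {
    "haskell": '{{-# LINE {line} "{filename}" #-}}',
    "c": '#line {line} "{filename}"',
    "c++": '#line {line} "{filename}"',
}


def line_directive(language, line, filename):
    """Return the line directive for `language`, or None if none is known."""
    template = TEMPLATES.get(language.lower())
    if template is None:
        return None
    return template.format(line=line, filename=filename)


print(line_directive("Haskell", 42, "foo.hs"))
print(line_directive("C", 42, "foo.c"))
```

Emitting such a line before each tangled block would let the compiler report positions in the file and line given by the directive.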