orgapp / orgajs


parse org-mode content into AST

Home Page: https://orga.js.org

License: MIT License

JavaScript 41.57% Shell 0.03% TypeScript 56.67% Astro 1.68% CSS 0.04%

org-mode ast javascript unified gatsbyjs

orgajs's Introduction

Orga

What Is It

Orga is a flexible org-mode syntax parser. It parses org content into an AST (Abstract Syntax Tree 🌲), and it’s written in JavaScript.

Why

org-mode is simply a superior format to other, more popular ones, but it’s mostly trapped inside Emacs. It’s so good that it was the #1 reason to learn and use Emacs for a lot of people (me included). But it’s too good not to share with the rest of the world. If it can run in JavaScript, it can run on anything.

Compatible Eco-systems

It integrates natively with popular tools.

Unified

☔️ interface for parsing, inspecting, transforming, and serializing content through syntax trees

The orga parser is completely compatible with unified, which means you can take advantage of the work others have put into the pipeline: linting natural language, correcting your writing, writing music, etc. Here is an example.

Webpack

@orgajs/loader is a webpack loader that makes orga a native citizen of the webpack ecosystem. Coupled with plugins, it works smoothly. Take a look at the example project.

React

A JavaScript library for building user interfaces

You can render react components directly in your org file. Something like this:

* Hello World

Let's render *the box*.

#+begin_export jsx
<div style={{
  backgroundColor: 'gold',
  padding: '1em',
  border: '1px solid black',
  boxShadow: '5px 5px'
}}>I am a box with shadow</div>
#+end_export

Try it out yourself in the playground.

Gatsby

Build blazing fast, modern apps and websites with React

gatsby-plugin-orga is a powerful plugin that plugs org-mode into the gatsby system. This website is built with gatsby and the gatsby-theme-orga-docs theme. Here is a minimal example project.

Nextjs

The React Framework

Because orga is native to webpack, it’s fairly simple to integrate with nextjs. Example project.

Examples

Take a look at the collection of examples to quickly get started.

Contribute

See the contributing file for ways to get started.

orgajs's Issues

Preserve as much of the original structure as possible

Hello there,

Thanks again for such an awesome project. It would be great to have an orga-stringify utility to fit more completely into the unified ecosystem and open ourselves up to using more transform tools. Then we could parse org files to an AST, transform them and then re-render the org. Ideally minimal transformations would re-render something pretty close to the original. To do that, we'd need to preserve as much of the original structure as possible.

I propose something like these changes. I'm after the effect more than the approach so I'm happy to discuss/modify/whatever. If you'd like me to make this a pull request, please let me know.

My thinking is that the extra structure in the AST can always be stripped when not needed. For instance, you could filter out whitespace/keyword nodes as well as trim() inner text if desired. But having it in the AST allows us to (nearly) faithfully re-render the original org file.

In there is a separate commit with the changes to the snapped files if you just want to see the effect on the AST. I think in a couple of cases it even renders a bit more accurately.

Again, more than happy to discuss.

Thanks again,
-Doug

P.S. I have a prototype for orga-stringify as well which I'll add to my fork as soon as I figure out how lerna works.

Expose meta content for filters

Problem

Currently, orga does not support variable graphql queries since only internal is exposed.

Details

For remark this is valid:

{
  allMarkdownRemark(
    filter: { frontmatter: { category: { in: ["Pandas"] } } }
  ) {
    totalCount
    edges {
      node {
        frontmatter {
          title
          category
          date
        }
      }
    }
  }
}

However, the only option for filtering orga nodes is allOrga ( filter: { internal: {content: { regex: "/CATEGORY:/"}}} )

A query that takes a variable, such as query CategoryPage($category: String) { ( filter: { internal: {content: { regex: "/CATEGORY:/" regex: $category }}} ), also does not work.

Suggestion

If the meta object could be exposed to be similar to the remark frontmatter object, it would be greatly appreciated.

Tags can contain '@' symbol

According to docs: https://orgmode.org/guide/Tags.html

Should the term of description list display be bolded?

If I use a description list,

- Term 1 :: description
- Term 2 :: description

it is rendered as below:

Term 1 :: description
Term 2 :: description

Maybe the term name should be displayed in bold, with just one colon:

Term 1 : description
Term 2 : description

Orgmode Description list

[Question] Will support TOC?

In Emacs, I can export an org file to HTML with a table of contents. Will this be supported in the future?

newline breaks italic markup

I'm using Orga with GatsbyJS. I have the following in my post:

... or /The C++ Programming
Language/.

It is rendered as

... or /The C++ Programming Language/.

But if I remove the newline after Programming, it is correctly rendered as

... or The C++ Programming Language.

I suspect other formatting markup may be broken too.

Performance degrades quickly with line length and line count, possibly line content

I noticed that for some files (e.g. > 10k lines) the parser practically cannot complete. This seems to be a function of number of lines and line lengths. For example, a large file with many blank lines can still complete. I don't know if this is specific to orga, or to unifyjs.

Here's a quick test on how the total parsing time for a trivial file changes with the line length, content, and line count:

const { parse } = require("orga");

// Left-pad a number with zeros up to the given width.
function pad(n, width) {
    return String(n).padStart(width, "0");
}

// Each templater produces the content of one generated line.
[
    (n) => `_${n}___${n}_`,
    (n) => `(${n})_(${n})`,
    (_) => 'xxxxxx_xxxxxx',
    (_) => 'xxxxxx_xxxxxx######_######',
].forEach((templater) => {
    for (let nlines = 500; nlines <= 5000; nlines += 500) {
        // One headline followed by generated lines numbered 2..nlines.
        const lines = ['* blah\n'];
        for (let n = 2; n <= nlines; n++) {
            lines.push(templater(pad(n, 4)));
        }
        const text = lines.join('\n');
        const t0 = Date.now();
        parse(text);
        const dt = Date.now() - t0; // elapsed parse time in ms
        const lastLine = lines[lines.length - 1];
        console.log(`|\t${lastLine}\t|\t${nlines}\t|\t${dt}\t|`);
    }
});

Plotting the output shows this:

I was wondering whether the same regex issue was affecting lines with ( and ). That does not seem to be the case, although the lines with parens are clearly slower than the lines without parens.

This is the table from the script (first column is added afterwards):

|under| _0500___0500_              |  500 |   162 |
|under| _1000___1000_              | 1000 |   626 |
|under| _1500___1500_              | 1500 |  1376 |
|under| _2000___2000_              | 2000 |  2474 |
|under| _2500___2500_              | 2500 |  3865 |
|under| _3000___3000_              | 3000 |  5475 |
|under| _3500___3500_              | 3500 |  7521 |
|under| _4000___4000_              | 4000 |  9850 |
|under| _4500___4500_              | 4500 | 12427 |
|under| _5000___5000_              | 5000 | 15190 |
|paren| (0500)_(0500)              |  500 |   217 |
|paren| (1000)_(1000)              | 1000 |   858 |
|paren| (1500)_(1500)              | 1500 |  1955 |
|paren| (2000)_(2000)              | 2000 |  3483 |
|paren| (2500)_(2500)              | 2500 |  5439 |
|paren| (3000)_(3000)              | 3000 |  7850 |
|paren| (3500)_(3500)              | 3500 | 10711 |
|paren| (4000)_(4000)              | 4000 | 13873 |
|paren| (4500)_(4500)              | 4500 | 17560 |
|paren| (5000)_(5000)              | 5000 | 21619 |
|xxxxx| xxxxxx_xxxxxx              |  500 |   215 |
|xxxxx| xxxxxx_xxxxxx              | 1000 |   854 |
|xxxxx| xxxxxx_xxxxxx              | 1500 |  1923 |
|xxxxx| xxxxxx_xxxxxx              | 2000 |  3429 |
|xxxxx| xxxxxx_xxxxxx              | 2500 |  5471 |
|xxxxx| xxxxxx_xxxxxx              | 3000 |  7843 |
|xxxxx| xxxxxx_xxxxxx              | 3500 | 10564 |
|xxxxx| xxxxxx_xxxxxx              | 4000 | 13859 |
|xxxxx| xxxxxx_xxxxxx              | 4500 | 17445 |
|xxxxx| xxxxxx_xxxxxx              | 5000 | 21576 |
|xand#| xxxxxx_xxxxxx######_###### |  500 |   815 |
|xand#| xxxxxx_xxxxxx######_###### | 1000 |  3211 |
|xand#| xxxxxx_xxxxxx######_###### | 1500 |  7253 |
|xand#| xxxxxx_xxxxxx######_###### | 2000 | 12967 |
|xand#| xxxxxx_xxxxxx######_###### | 2500 | 20093 |
|xand#| xxxxxx_xxxxxx######_###### | 3000 | 28973 |
|xand#| xxxxxx_xxxxxx######_###### | 3500 | 39765 |
|xand#| xxxxxx_xxxxxx######_###### | 4000 | 51717 |
|xand#| xxxxxx_xxxxxx######_###### | 4500 | 65081 |
|xand#| xxxxxx_xxxxxx######_###### | 5000 | 80055 |

Is there an easy way to fix this?
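The timings in the table look close to quadratic in the number of lines: ten times as many lines cost roughly a hundred times as long. A small sketch that estimates the growth exponent from the first and last row of each series (the numbers are copied from the table above; `growthExponent` is a name introduced here for illustration):

```javascript
// Timings copied from the table above: [line count, elapsed ms].
const series = {
  under: [[500, 162], [5000, 15190]],
  paren: [[500, 217], [5000, 21619]],
  xxxxx: [[500, 215], [5000, 21576]],
  "xand#": [[500, 815], [5000, 80055]],
};

// If time ~ c * n^k, then k = log(t2 / t1) / log(n2 / n1).
function growthExponent([[n1, t1], [n2, t2]]) {
  return Math.log(t2 / t1) / Math.log(n2 / n1);
}

for (const [name, points] of Object.entries(series)) {
  console.log(name, growthExponent(points).toFixed(2));
}
```

All four series come out close to 2, which suggests work that repeats over the whole input for every line rather than a cost tied to one particular character.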

robust timestamp parsing

Currently, the timestamp parsing is pretty simple, which does not support repeaters. Need a more robust version.

Another issue related to #4 is that orga does not handle repeated dates correctly. For a date like DEADLINE: <2017-12-10 Sun .+1W>, orga ignores the timestamp and returns {keyword: "DEADLINE", type: "planning"} with no date stamp.

from #36
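A minimal sketch of what a repeater-aware timestamp parser could look like. This is not orga's implementation: `parseTimestamp`, the regular expression, and the shape of the returned object are all assumptions made for illustration. The unit is matched case-insensitively because the example above uses an uppercase W.

```javascript
// Hypothetical helper, not part of orga: parse an active org timestamp,
// including an optional repeater such as "+1w", "++2d" or ".+1m".
const TIMESTAMP = /^<(\d{4}-\d{2}-\d{2})(?:\s+([^\s\d>+.-]+))?(?:\s+(\d{1,2}:\d{2}))?(?:\s+(\.\+|\+\+|\+)(\d+)([hdwmy]))?>$/i;

function parseTimestamp(input) {
  const m = TIMESTAMP.exec(input.trim());
  if (!m) return null;
  const [, date, day, time, mark, value, unit] = m;
  return {
    date,                 // "YYYY-MM-DD", kept as a string
    day: day || null,     // day name, if present
    time: time || null,   // "HH:MM", if present
    repeater: mark
      ? { mark, value: Number(value), unit: unit.toLowerCase() }
      : null,
  };
}

console.log(parseTimestamp("<2017-12-10 Sun .+1W>"));
```

Keeping the date as a string avoids time zone surprises when it is later converted to a Date object.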

unist-builder and prismjs should be dependencies?

just trying out the package (working towards this issue for a vscode plugin: vscode-org-mode/vscode-org-mode#133). Tests failed until I installed unist-builder and prismjs, so I'm assuming they need to be added to the dependencies list?

Broken links from gatsby documentation

Hello,

After the transfer to the orgapp organization, links from https://www.gatsbyjs.org/packages/gatsby-transformer-orga/ no longer work. It took me some time to find the documentation again; Google wasn't much help :)

It looks like the page takes its info from package.json, as the same info is presented on npmjs.

link in table does not work

With the latest version (0.2.7) on npm, links in a table cell do not work.

const { parse } = require(`orga`);

const orgLink = `[[https://some.link][Some Link]]`;

var ast = parse(orgLink);

console.log(ast.children[0].children[0]); // this works fine

const orgLinkInTable = `
| Text            | [[http://I.should.be.link][link in header]]                            |
|-------------+---------------------------------------------------------------|
| Text   Text  | [[https://I.should.be.link][link in row]]                                |
`;

ast = parse(orgLinkInTable);

console.log(ast.children[0].children[0].children[1].children[0]); // link in header
console.log(ast.children[0].children[2].children[1].children[0]); // link in body

output:

{ type: 'link',
  children: [],
  uri:
   { raw: 'https://some.link',
     protocol: 'https',
     location: '//some.link' },
  desc: 'Some Link',
  parent:
   { type: 'paragraph',
     children: [ [Circular] ],
     parent: { type: 'root', children: [Array], meta: {} } } }


{ type: 'text',
  children: [],
  value: '[[http://I.should.be.link][link in header]]',
  parent:
   { type: 'tableCell',
     children: [ [Circular] ],
     parent: { type: 'tableRow', children: [Array], parent: [Object] } } }


{ type: 'text',
  children: [],
  value: '[[https://I.should.be.link][link in row]]',
  parent:
   { type: 'tableCell',
     children: [ [Circular] ],
     parent: { type: 'tableRow', children: [Array], parent: [Object] } } }
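The output above shows the cell content arriving as a single text node. A minimal sketch of an inline pass that could be run over cell text to split it into text and link nodes follows. It is not orga's implementation: `parseCellInline` is a hypothetical helper, and the node fields (`type`, `uri.raw`, `desc`) are modelled on the printed output.

```javascript
// Hypothetical inline pass for table cells (not orga's implementation).
// Matches [[uri]] and [[uri][description]].
const LINK = /\[\[([^\]]+)\](?:\[([^\]]+)\])?\]/g;

function parseCellInline(text) {
  const nodes = [];
  let last = 0;
  for (const m of text.matchAll(LINK)) {
    // Plain text before the link, if any.
    if (m.index > last) {
      nodes.push({ type: "text", value: text.slice(last, m.index) });
    }
    // Falling back to the URI when there is no description is an assumption.
    nodes.push({ type: "link", uri: { raw: m[1] }, desc: m[2] || m[1] });
    last = m.index + m[0].length;
  }
  // Plain text after the last link, if any.
  if (last < text.length) {
    nodes.push({ type: "text", value: text.slice(last) });
  }
  return nodes;
}

console.log(parseCellInline("[[http://I.should.be.link][link in header]]"));
```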

ReferenceError: regeneratorRuntime is not defined

Running gatsby develop with gatsby: 2.0.37 fails with ReferenceError: regeneratorRuntime is not defined. There is no problem with gatsby: 1.9.277.

bug: parser unrecoverably fails if there is a `#+TODO: (...)` directive

const { parse } = require("orga")
parse("#+TODO: (whatever)\n* something");

will throw

TypeError: Cannot read property 'value' of undefined
    at _parse (./orgajs/node_modules/orga/lib/inline.js:101:19)
    at parse (./orgajs/node_modules/orga/lib/inline.js:25:10)
    at Parser.process (./orgajs/node_modules/orga/lib/processors/headline.js:49:32)
    at Parser.parseSection (./orgajs/node_modules/orga/lib/parser.js:96:24)
    at Parser.process (./orgajs/node_modules/orga/lib/processors/blank.js:9:15)
    at Parser.parseSection (./orgajs/node_modules/orga/lib/parser.js:96:24)
    at Parser.process (./orgajs/node_modules/orga/lib/processors/keyword.js:38:15)
    at Parser.parseSection (./orgajs/node_modules/orga/lib/parser.js:96:24)
    at Parser.parse (./orgajs/node_modules/orga/lib/parser.js:80:15)
    at parse (./orgajs/node_modules/orga/lib/index.js:13:17)

This is unrecoverable, meaning that after getting this error, the parser will continue to throw the same error on any string with a headline, such as parse("* something"). This issue persists even if orga is re-required after removing it from the require cache using delete require.cache[require.resolve("orga")]

I have not chased this very far and don't understand why the error persists across reimporting, but the undefined comes from somewhere around
https://github.com/xiaoxinghu/orgajs/blob/master/packages/orga/src/lexer.js#L116
where pattern for matching headline gets set to a regex containing (whatever) and basically fails for all headlines.

Thanks for your work.
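A minimal sketch of the usual defence when keywords from a file end up inside a regular expression: escape them first. The headline pattern and its group layout (stars, keyword, content) are assumptions for this sketch, not orga's actual lexer. If the parentheses are inserted unescaped, they add a capture group and shift the later group indices, which is one plausible route to the undefined value in the stack trace.

```javascript
// Escape characters that are special in a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical headline pattern built from TODO keywords.
// Groups: 1 = stars, 2 = keyword (optional), 3 = content.
function headlinePattern(keywords) {
  const alternatives = keywords.map(escapeRegExp).join("|");
  return new RegExp(`^(\\*+)\\s+(?:(${alternatives})\\s+)?(.*)$`);
}

const pattern = headlinePattern(["(whatever)"]);
console.log(pattern.exec("* something")[3]);
console.log(pattern.exec("* (whatever) something")[2]);
```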

Why no DEADLINE: and SCHEDULED: ?

I was just trying this on your site: https://orga.js.org

* TODO [#B] Review code :work:
DEADLINE: <2020-07-03 Fri> SCHEDULED: <2020-07-03 Fri>

I get:

{ type: 'root',
  children: 
   [ { type: 'section',
       children: 
        [ { type: 'headline',
            children: 
             [ { type: 'text',
                 children: [],
                 value: 'Review code',
                 parent: [Circular] },
               { type: 'planning',
                 children: [],
                 keyword: 'DEADLINE',
                 parent: [Circular] } ],
            level: 1,
            keyword: 'TODO',
            priority: 'B',
            tags: [ 'work' ],
            parent: [Circular] } ],
       level: 1,
       parent: [Circular] } ],
  meta: {} }

When I put a newline between the two keywords it looks a bit better, but the SCHEDULED is still not there:

* TODO [#B] Review code :work:
DEADLINE: <2020-07-03 Fri> 
SCHEDULED: <2020-07-03 Fri>

results in:

{ type: 'root',
  children: 
   [ { type: 'section',
       children: 
        [ { type: 'headline',
            children: 
             [ { type: 'text',
                 children: [],
                 value: 'Review code',
                 parent: [Circular] },
               { type: 'planning',
                 children: [],
                 keyword: 'DEADLINE',
                 date: Thu Jul 02 2020 00:00:00 GMT+0200 (Central European Summer Time),
                 end: undefined,
                 parent: [Circular] } ],
            level: 1,
            keyword: 'TODO',
            priority: 'B',
            tags: [ 'work' ],
            parent: [Circular] } ],
       level: 1,
       parent: [Circular] } ],
  meta: {} }

I am wondering why they do not get parsed as of https://orgmode.org/manual/Inserting-deadline_002fschedule.html

Did I understand something wrong, or is it a not yet implemented feature, or is it a bug?

Thank you for this work! It seems very promising!
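A minimal sketch of reading every planning entry from one line, so that DEADLINE and SCHEDULED on the same line are both kept. `parsePlanning` and the returned node shape are assumptions for illustration, not orga's API; the timestamp is left as a raw string.

```javascript
// Hypothetical helper (not orga's API): collect every planning entry on a
// line. Matches active <...> and inactive [...] timestamps.
const PLANNING = /\b(DEADLINE|SCHEDULED|CLOSED):\s*([<[][^>\]\n]*[>\]])/g;

function parsePlanning(line) {
  return Array.from(line.matchAll(PLANNING), ([, keyword, timestamp]) => ({
    type: "planning",
    keyword,
    timestamp,
  }));
}

console.log(parsePlanning("DEADLINE: <2020-07-03 Fri> SCHEDULED: <2020-07-03 Fri>"));
```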

Table header is not supported

Hi. Thanks for the great work on Orga.

I'm using gatsby-transformer-orga for my sites. Today I found that table headers are not supported. For example, the following table:

| A | B |
|---+---|
| 1 | 2 |

should translate to:

<table>
  <thead>
    <tr>
      <th>A</th>
      <th>B</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>1</td>
      <td>2</td>
    </tr>
  </tbody>
</table>

But gatsby-transformer-orga puts everything in <table><tbody>...</tbody></table>.
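A minimal sketch of the requested behaviour, working directly on the table text: rows above the first horizontal rule go into thead, everything else into tbody. `orgTableToHtml` is a hypothetical function, not code from gatsby-transformer-orga, and it does no HTML escaping.

```javascript
// Hypothetical renderer (not gatsby-transformer-orga's code).
function orgTableToHtml(src) {
  const lines = src.trim().split("\n").map((l) => l.trim());
  const isRule = (l) => /^\|[-+]+\|?$/.test(l);
  // Strip the outer pipes, then split into trimmed cells.
  const cells = (l) =>
    l.replace(/^\||\|$/g, "").split("|").map((c) => c.trim());
  const row = (l, tag) =>
    `<tr>${cells(l).map((c) => `<${tag}>${c}</${tag}>`).join("")}</tr>`;

  // Rows before the first rule form the header; without a rule there is none.
  const rule = lines.findIndex(isRule);
  const head = rule > 0 ? lines.slice(0, rule) : [];
  const body = lines.slice(rule > 0 ? rule + 1 : 0).filter((l) => !isRule(l));

  const thead = head.length
    ? `<thead>${head.map((l) => row(l, "th")).join("")}</thead>`
    : "";
  const tbody = `<tbody>${body.map((l) => row(l, "td")).join("")}</tbody>`;
  return `<table>${thead}${tbody}</table>`;
}

console.log(orgTableToHtml("| A | B |\n|---+---|\n| 1 | 2 |"));
```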

Custom URL for pathnames doesn't work after last upgrade

Hello,

After upgrading to the latest version, I'm unable to use a custom URL (as shown in https://github.com/xiaoxinghu/gatsby-orga/blob/master/gatsby-node.js where the code is commented out now).

My Org pages are stored in src/help and I want them accessible on https://siteurl/help/<title>/. The problem is that after the migration from orga to OrgContent, it looks like slugs are created directly in the plugin (the owner of the slug field is the orga transform plugin) and I'm no longer able to add a slug field through onCreateNode.

IMHO this should work:

const { createFilePath } = require(`gatsby-source-filesystem`)

// Add custom url pathname for blog posts.
exports.onCreateNode = ({ node, getNode, actions }) => {
  const { createNodeField } = actions

  if (node.internal.type === `OrgContent`) {
    const path = createFilePath({ node, getNode })
    const slug = `/help${path}`

    createNodeField({
      node,
      name: `slug`,
      value: slug,
    })
  }
}

But I'm getting error Error: A plugin tried to update a node field that it doesn't own

Is there any chance to get the old behaviour back, or is there a correct way to proceed? I have tried many approaches so far without success.

Code block indentation

Problem

It seems that code blocks aren't stripped of (common) leading whitespace. This has the effect that as you get deeper down the hierarchy and content gets more and more indented, the exported code blocks also become more indented.

For instance, the below content would have each code block indented one space more than the previous.

* Header level one
  Content.
  #+begin_src elisp
    (message "one")
  #+end_src
  
** Header level 2
   Content
   #+begin_src elisp
     (message "two")
   #+end_src

*** Header level 3
    Content
    #+begin_src elisp
      (message "three")
    #+end_src

Solution?

I think it would make more sense if all code blocks had their shared leading whitespace removed. As an example, consider this bit of Org:

* Header level one
  Content.
  #+begin_src js
    const f = x => {
      return x * x
    }

  #+end_src

The code snippet should render flush against the left margin like this

const f = x => {
  return x * x
}

instead of indented like this.

    const f = x => {
      return x * x
    }
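A minimal sketch of the proposed behaviour: remove the indentation shared by all non-blank lines of a block. `dedent` is a hypothetical helper, not an existing orga function.

```javascript
// Hypothetical helper: strip the common leading whitespace of a code block.
function dedent(code) {
  const lines = code.split("\n");
  // Blank lines do not count towards the shared indentation.
  const indents = lines
    .filter((l) => l.trim() !== "")
    .map((l) => l.match(/^[ \t]*/)[0].length);
  const common = indents.length ? Math.min(...indents) : 0;
  return lines.map((l) => l.slice(common)).join("\n");
}

const block = "    const f = x => {\n      return x * x\n    }\n";
console.log(dedent(block));
```

Relative indentation inside the block is preserved, so nested code keeps its shape. Mixed tabs and spaces are counted per character here, which is a simplification.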

Notes

Is there a way to achieve this in Orga today? If not, I'd be happy to take a look at implementing this if you could point me in the right direction. (Though looking at the source, it looks a little intimidating; I've never really written a parser before, so it might take more time than expected.)

Thanks for the great package, by the way!

Parsing in emacs

Thanks for introducing me to unified. Your package looks great. Would it be possible to use org-mode's own parser to generate the unified syntax tree?

Parse elements inside drawers - clocks, properties

Hello, thanks for the great work on Orga.

Currently, property elements ("key: value" pairs in the Properties drawer) and clock elements ("CLOCK: [..]") are not recognized/parsed as specific types in the AST output.
It would be nice to have them parsed as a specific type (similar to tags, tables, lists).

For example, the following orgmode file

* modularization of custom functions, speed-up and optimalization :optimalition:module:package:
:LOGBOOK:  
CLOCK: [2019-05-24 pá 11:18]--[2019-05-24 pá 11:18] =>  0:00
- general info on python modules and packaging
:END:      
:PROPERTIES:
:type:     enhancement
:dated:    <2017-12-02 so>
:END:
package various code snippets to the regular package
modules
  - string processsing
  - sklearn custom transformers

and its AST (in json)

{
    "type": "root",
    "children": [
        {
            "type": "section",
            "children": [
                {
                    "type": "headline",
                    "children": [
                        {
                            "type": "text",
                            "children": [],
                            "value": "modularization of custom functions, speed-up and optimalization",
                            "parent": "[Circular ~.children.0.children.0]"
                        },
                        {
                            "type": "drawer",
                            "children": [],
                            "name": "LOGBOOK",
                            "value": "CLOCK: [2019-05-24 pá 11:18]--[2019-05-24 pá 11:18] =>  0:00\n- general info on python modules and packaging",
                            "parent": "[Circular ~.children.0.children.0]"
                        },
                        {
                            "type": "drawer",
                            "children": [],
                            "name": "PROPERTIES",
                            "value": ":type:     enhancement\n:dated:    <2017-12-02 so>",
                            "parent": "[Circular ~.children.0.children.0]"
                        }
                    ],
                    "level": 1,
                    "tags": [
                        "optimalition",
                        "module",
                        "package"
                    ],
                    "parent": "[Circular ~.children.0]"
                },
                {
                    "type": "paragraph",
                    "children": [
                        {
                            "type": "text",
                            "children": [],
                            "value": "package various code snippets to the regular package modules",
                            "parent": "[Circular ~.children.0.children.1]"
                        }
                    ],
                    "parent": "[Circular ~.children.0]"
                },
                {
                    "type": "list",
                    "children": [
                        {
                            "type": "list.item",
                            "children": [
                                {
                                    "type": "text",
                                    "children": [],
                                    "value": "string processsing",
                                    "parent": "[Circular ~.children.0.children.2.children.0]"
                                }
                            ],
                            "ordered": false,
                            "parent": "[Circular ~.children.0.children.2]"
                        },
                        {
                            "type": "list.item",
                            "children": [
                                {
                                    "type": "text",
                                    "children": [],
                                    "value": "sklearn custom transformers",
                                    "parent": "[Circular ~.children.0.children.2.children.1]"
                                }
                            ],
                            "ordered": false,
                            "parent": "[Circular ~.children.0.children.2]"
                        }
                    ],
                    "ordered": false,
                    "parent": "[Circular ~.children.0]"
                }
            ],
            "level": 1,
            "parent": "[Circular ~]"
        }
    ],
    "meta": {}
}
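A minimal sketch of how the raw `value` of the two drawer nodes above could be turned into structured data. `parseProperties`, `parseClocks`, and the returned shapes are assumptions for illustration, not part of orga.

```javascript
// Hypothetical helpers (not part of orga).

// ":key: value" lines of a PROPERTIES drawer become an object.
function parseProperties(value) {
  const props = {};
  for (const line of value.split("\n")) {
    const m = /^\s*:([^:\s]+):\s*(.*?)\s*$/.exec(line);
    if (m) props[m[1]] = m[2];
  }
  return props;
}

// "CLOCK: [start]--[end] => h:mm" lines of a LOGBOOK drawer become entries.
// A running clock has only a start timestamp.
function parseClocks(value) {
  const CLOCK =
    /^\s*CLOCK:\s*\[([^\]]+)\](?:--\[([^\]]+)\]\s*=>\s*(\d+:\d{2}))?/;
  const clocks = [];
  for (const line of value.split("\n")) {
    const m = CLOCK.exec(line);
    if (m) {
      clocks.push({ start: m[1], end: m[2] || null, duration: m[3] || null });
    }
  }
  return clocks;
}

console.log(parseProperties(":type:     enhancement\n:dated:    <2017-12-02 so>"));
```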

Handle task progress indicator in headlines

Currently, progress indicators such as [/] and [%] are treated as plain text. They should be separate elements within the AST.
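A minimal sketch of recognising these indicators in headline text. `findProgress` and the node shape are assumptions for illustration, not orga's tokenizer.

```javascript
// Hypothetical tokenizer step (not orga's code): find progress indicators
// such as [/], [2/5], [%] or [40%] in headline text.
const COOKIE = /\[(?:(\d*)\/(\d*)|(\d*)%)\]/g;

function findProgress(text) {
  // Empty counters, as in "[/]", are reported as null.
  const toNumber = (s) => (s === "" ? null : Number(s));
  return Array.from(text.matchAll(COOKIE), (m) =>
    m[3] !== undefined
      ? { type: "progress", kind: "percent", value: toNumber(m[3]), raw: m[0] }
      : {
          type: "progress",
          kind: "fraction",
          done: toNumber(m[1]),
          total: toNumber(m[2]),
          raw: m[0],
        }
  );
}

console.log(findProgress("Write docs [2/5] [40%]"));
```

A priority marker such as [#B] is not matched, because the pattern requires a slash or a percent sign inside the brackets.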

Footnotes don't link back up to their 'usage'

Hey,

Just saw the comments around V2 and the good news around footnotes in #42. When trying out the example that was linked to, I noticed that the footnote doesn't link back up to where it was referenced in the text.

I think that the footnote should link back to all the places in the text where it has been referenced, so that the reader / end user can easily jump back to where it was used.

Just so we're clear about what I mean: here's an image of references from Wikipedia. The little ^ in front of every reference links back to the bit of the text that references it.

I suppose that if a footnote is referenced multiple times, then there should also be multiple targets you can jump to.

Thanks for all your hard work; it's much appreciated!

[Question] Can the Regexps be re-written to avoid backreferences?

I'm trying out orga for a project that would let me embed the parser in the client-side. Unfortunately, the regexps in the parser use backreferences - something only Chrome supports at the moment.

Is it possible to re-write those in a more browser-compliant way?

❤️

WIP code for cli interface

I noticed:

build a emacs-less cli for org-mode (working on it)

Wondering where I might find the WIP code.

Incorrect TODO parsing with multiple todo sets in one file

Often it may make sense to have multiple sets of todo keywords in a file, but orga handles this incorrectly, with only the last parsed set of TODO keywords taking effect.

Example from the org mode docs ( https://orgmode.org/manual/Per_002dfile-keywords.html )

#+TODO: TODO | DONE
#+TODO: REPORT BUG KNOWNCAUSE | FIXED
#+TODO: | CANCELED

Given how orga's updateTODOs function works, the previous TODOs are simply replaced with the newer ones. Instead, they should be collected then, if none are found before parsing headlines, it should default to ["TODO", "DONE"]
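A minimal sketch of the collecting behaviour described above, using the example from the org-mode docs. `collectTodoKeywords` is a hypothetical function, not orga's updateTODOs; splitting keywords into active and done states is an added detail.

```javascript
// Hypothetical replacement for "the last #+TODO line wins": accumulate
// keywords from every #+TODO line, and fall back to TODO/DONE if none exist.
function collectTodoKeywords(text) {
  const active = [];
  const done = [];
  let found = false;
  for (const line of text.split("\n")) {
    const m = /^#\+TODO:\s*(.*)$/i.exec(line.trim());
    if (!m) continue;
    found = true;
    const [before, after] = m[1].split("|");
    const left = before.trim().split(/\s+/).filter(Boolean);
    if (after === undefined) {
      // Without a "|", org-mode treats the last keyword as the done state.
      if (left.length > 0) done.push(left.pop());
      active.push(...left);
    } else {
      active.push(...left);
      done.push(...after.trim().split(/\s+/).filter(Boolean));
    }
  }
  return found ? { active, done } : { active: ["TODO"], done: ["DONE"] };
}

const example = [
  "#+TODO: TODO | DONE",
  "#+TODO: REPORT BUG KNOWNCAUSE | FIXED",
  "#+TODO: | CANCELED",
].join("\n");
console.log(collectTodoKeywords(example));
```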

global `timestamp` recognition

Orga does not recognize date tags that are not either placed after DEADLINE or SCHEDULED and handles them as text. Orga is clearly able to recognize these tags so it should be easy to extend this to date objects outside of DEADLINE or SCHEDULED.

Since Emacs recognizes timestamps globally, we should be able to do it as part of our inline tokenization.

As requested at #36

Empty table cells are dropped

If I parse a table with empty cells, these are simply dropped and never make it to the ast. Here is an example:

> var stringify = require('json-stringify-safe');
undefined
> const parse = require('orga')
undefined
> stringify(parse.parse('|         | something |'))
'{"type":"root","children":[{"type":"table","children":[{"type":"table.row","children":[{"type":"table.cell","children":[{"type":"text","children":[],"value":"something","parent":"[Circular ' +
  '~.children.0.children.0.children.0]"}],"parent":"[Circular ' +
  '~.children.0.children.0]"}],"parent":"[Circular ' +
  '~.children.0]"}],"parent":"[Circular ~]"}],"meta":{}}'
> stringify(parse.parse('| nonempty | something |'))
'{"type":"root","children":[{"type":"table","children":[{"type":"table.row","children":[{"type":"table.cell","children":[{"type":"text","children":[],"value":"nonempty","parent":"[Circular ' +
  '~.children.0.children.0.children.0]"}],"parent":"[Circular ' +
  '~.children.0.children.0]"},{"type":"table.cell","children":[{"type":"text","children":[],"value":"something","parent":"[Circular ' +
  '~.children.0.children.0.children.1]"}],"parent":"[Circular ' +
  '~.children.0.children.0]"}],"parent":"[Circular ' +
  '~.children.0]"}],"parent":"[Circular ~]"}],"meta":{}}'
>

As you can see, the first parse only returns a single cell whereas the second returns two (which is the correct behaviour)
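A minimal sketch of a row splitter that keeps empty cells. `splitRow` is a hypothetical helper; whether orga drops the cell by filtering out empty strings is an assumption that the source does not confirm.

```javascript
// Hypothetical row splitter (not orga's code): empty cells are kept as "".
function splitRow(line) {
  return line
    .trim()
    .replace(/^\||\|$/g, "") // drop the outer pipes only
    .split("|")
    .map((cell) => cell.trim());
}

console.log(splitRow("|         | something |"));
console.log(splitRow("| nonempty | something |"));
```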

Table formula tags are not bound to the table node

#+TBLFM after a table should be inside the table node.

📢 v2 is on its way!

I have been spending a fair amount of time working on v2 for a while now. It's close to feature complete, so I think it's time to give you some ideas about what's been going on and where this project is heading.
TL;DR: it's going to be awesome. 👍

Why v2

As you can see, there are some issues queuing up, for different reasons. Lots of them are due to the limitations of the current design. We hack around them (thank you all for the contributions), but the fixes are not ideal. I am currently working on an org-mode related iOS app on the side, so I have created a similar parser in Swift (several times...). Switching to a very different language provides a much clearer sense of the thing you are trying to build. With the lessons learnt there, I decided to make orgajs great ~~again~~ (it was not bad before).

The Improvements

Strong Types
The project was converted to TypeScript in the cheapest way possible a couple of months back. (e.g. const data: any = {...}). Now it's much closer to a real TypeScript project. This will make collaboration a lot easier.
Full OAST Spec
Taking advantage of the type system, we now have the full spec of OAST (Orga Abstract Syntax Tree).
Position (line, column)
All tokens and nodes will have position info built-in. 🎉

{
  type: 'headline',
  position: {
    start: { line: 1, column: 0 },
    end: { line: 1, column: 15 },
  },
  // ...
}

Flexible Parsing Process
orgajs used to rely heavily on complicated regex matching in the scope of individual lines of text. Now it's a much more flexible process. The benefits are:
- easier to implement/modify/reason about parsing logic
- room for performance optimization
- tokens are not bound to lines
Better DX
Strong types, auto-completion, easy(er) to read codebase.
Test! Test! Test!
We need more tests. You want it to be more badass? Write tests! I am also working on optimizing the test writing experience.

Breaking Changes

v2 is breaking compatibility with the current version. The broken parts are:

- token names
- syntax tree structure
- nodes/tokens now have position info

Basically, everything that's important... The bright side is that, with the monorepo setup, I can fix everything up together pretty easily.

The spec was loosely defined in v1. Hopefully, with strongly typed components in v2, the API will be much more resilient against breaking changes in future releases.

Contribution

All contributions are welcome; it's non-trivial to write a parser for something as powerful as org-mode. orgajs needs all the help it can get from org-mode lovers. We need help with documentation, testing, bug fixes, new features ... With the architecture of this project solidifying during the development of v2, I can see a more collaborative future.

Implement 2-pass parsing to properly handle in-buffer settings that affect parsing of content

TODO is the only current in-buffer setting affected by this, but another unimplemented example would be PRIORITIES.

This is inconsistent with actual org-mode behavior and can cause issues when parsing org files. Correct behavior requires two passes: one to discover in-buffer settings and another to actually parse the file into an AST.

Example:

> const { parse } = require('orga')
[...]
> parse('* TADA Header\n#+TODO: TADA | DANE').children[0].children[0].keyword
undefined
> parse('#+TODO: TADA | DANE\n* TADA Header').children[0].children[0].keyword
'TADA'
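A minimal sketch of the two-pass idea on a toy headline parser. None of this is orga's code: `collectKeywords`, `parseHeadlines`, and the node shape are assumptions for illustration. The first pass scans the whole buffer for #+TODO settings, and the second pass parses headlines with them, so the position of the setting no longer matters.

```javascript
// Escape characters that are special in a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Pass 1: scan the whole buffer for #+TODO settings.
function collectKeywords(text) {
  const keywords = [];
  for (const line of text.split("\n")) {
    const m = /^#\+TODO:\s*(.*)$/.exec(line);
    if (m) keywords.push(...m[1].split(/[\s|]+/).filter(Boolean));
  }
  return keywords.length ? keywords : ["TODO", "DONE"];
}

// Pass 2: parse headlines using the settings found in pass 1.
function parseHeadlines(text) {
  const alternatives = collectKeywords(text).map(escapeRegExp).join("|");
  const headline = new RegExp(`^(\\*+)\\s+(?:(${alternatives})\\s+)?(.*)$`);
  const nodes = [];
  for (const line of text.split("\n")) {
    const m = headline.exec(line);
    if (m) {
      nodes.push({
        type: "headline",
        level: m[1].length,
        keyword: m[2],
        content: m[3],
      });
    }
  }
  return nodes;
}

console.log(parseHeadlines("* TADA Header\n#+TODO: TADA | DANE")[0].keyword);
```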

Allow adding unified plugins to AST processing?

Hey there,

Thanks again for an awesome project.

Do you think it might be a good idea to allow overriding the AST parsers/transformers for configurability? If someone wanted to touch up the produced HTML they could simply add a transformer plugin to edit the resulting tree. I'm thinking something like this change.

I don't have strong opinions for how to configure use of plugins (where they should be placed, how they are fetched, etc.). I don't have a need for this, so this is more food for thought than anything else. Browsing through the other issues, I wonder if something like this might put the power in the end user's hands. The obvious downside is the potential for incompatibility.

Please let me know what you think.

Regards,
-Doug

Missing features: checkboxes, and repeating timestamps

Hello! Thank you for the wonderful project that is orga. I am starting a project built on top of orga and so far it has been a pleasure to work with. I have had a few issues though, which are very minor. These include:

Currently, orga does not recognize [/] or [%] as separate objects in the headline. It currently only shows up as part of the headline text.
In org mode #+TBLFMs need to be placed after the table they act on for them to have any effect. Orga does not respect this and places the #+TBLFM in the root meta. This is an issue because there is no way of knowing which table this formula is acting on. #+TBLFMs should really be placed in the table node instead of just thrown in the root node.
Orga does not recognize date tags that are not either placed after DEADLINE or SCHEDULED and handles them as text. Orga is clearly able to recognize these tags so it should be easy to extend this to date objects outside of DEADLINE or SCHEDULED.
Another issue related to #4 is that orga does not handle repeated dates correctly. Actually for a date like this:

DEADLINE: <2017-12-10 Sun .+1W>

Orga completely ignores this and returns {keyword: "DEADLINE", type: "planning"} with no date stamp.
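
As an illustration of what parsing such a timestamp could involve, here is a small sketch that extracts the date, day name, optional time and repeater. The pattern and the shape of the returned object are assumptions, not orga's implementation. The unit is matched case-insensitively only so that the `.+1W` example above is accepted.

```javascript
// Assumed pattern: <YYYY-MM-DD [day] [H:MM] [repeater]> where the repeater
// is +, ++ or .+ followed by a number and a unit (h, d, w, m, y).
const TIMESTAMP = /<(\d{4}-\d{2}-\d{2})(?: +([^\s\d>+\-.]+))?(?: +(\d{1,2}:\d{2}))?(?: +(\+\+|\.\+|\+)(\d+)([hdwmy]))?>/i

function parseTimestamp(text) {
  const m = TIMESTAMP.exec(text)
  if (!m) return null
  const [, date, day, time, mark, value, unit] = m
  return {
    date,
    day: day || null,
    time: time || null,
    repeater: mark ? { mark, value: Number(value), unit: unit.toLowerCase() } : null,
  }
}

console.log(parseTimestamp('DEADLINE: <2017-12-10 Sun .+1W>'))
```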

Really, thank you for your work on orga. If you need any help fixing these issues I would be happy to be of assistance. Thanks!

bug: emphasis is not working for words with fewer than 3 characters

const { parse } = require('orga')

console.log(parse('*12*').children[0].children[0])
// { type: 'text',
//   value: '*12*',
//   ... }

console.log(parse('*123*').children[0].children[0])
// { type: 'bold',
//   children:
//    [ { type: 'text', children: [], value: '123', parent: [Circular] } ],
//   ... }

The example above is for bold, but this bug appears with all kinds of emphasis (bold, verbatim, underline, etc).

The problem is that the regex you're using expects at least three characters between the markers (one border character, at least one character in the middle, and another border character):

/\*([^,'"\s].+?[^,'"\s])\*/m.exec('*12*')
// null

/\*([^,'"\s].+?[^,'"\s])\*/m.exec('*123*')
// not null
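
One possible adjustment, shown here only as a sketch: make everything after the first border character optional, so that one- and two-character contents also match. This is a variant of the pattern quoted above, not the fix that was adopted in orga, and the helper name `matchBold` is made up.

```javascript
// The first border character is required; the middle and the closing border
// character are wrapped in an optional group.
const bold = /\*([^,'"\s](?:.*?[^,'"\s])?)\*/m

function matchBold(text) {
  const m = bold.exec(text)
  return m ? m[1] : null
}

console.log(matchBold('*1*'))
console.log(matchBold('*12*'))
console.log(matchBold('*123*'))
```

Content that starts or ends with whitespace is still rejected, as with the original pattern.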

Rows and offsets

I'm wondering, since I'd like to be able to use orga to parse text then decorate the original text without modifying it-- would there be some way for orga to keep track of which characters a specific node maps back to in the source text?

So for example, being able to say that row 6, characters 1 - 10, correspond to a bold node.

Did you ever attempt to go down that route, or from your perspective do you think it's feasible?
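
For illustration, the mapping from a character offset back to a row and column can be done independently of the parser, assuming each node carried the start and end offsets of its source text. The helper below is hypothetical. It returns 1-indexed lines and columns, which is the convention unist uses for node positions.

```javascript
// Precompute where each line starts, then binary-search an offset into a
// { line, column } pair (both 1-indexed).
function createLocator(text) {
  const lineStarts = [0]
  for (let i = 0; i < text.length; i++) {
    if (text[i] === '\n') lineStarts.push(i + 1)
  }
  return (offset) => {
    let lo = 0
    let hi = lineStarts.length - 1
    while (lo < hi) {
      const mid = (lo + hi + 1) >> 1
      if (lineStarts[mid] <= offset) lo = mid
      else hi = mid - 1
    }
    return { line: lo + 1, column: offset - lineStarts[lo] + 1 }
  }
}

const locate = createLocator('abc\n*bold*\n')
console.log(locate(4), locate(9)) // start and end of "*bold*"
```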

[Question] Converting GNU Extended Regular Expressions to PCRE

Hi @xiaoxinghu

I am starting to work on the similar project, but in Rust. I stumbled upon your project and I was very happy to learn that I am not the only one crazy about org-mode.

The first thing I learned is that emacs uses a different flavor of regexps (namely GNU ERE, while most other programming languages use Perl Compatible Regexps).

I am currently going through https://code.orgmode.org/bzg/org-mode/src/master/lisp/org-element.el, trying to write something similar in Rust.

How did you convert regexps in org-element-paragraph-separate and org-element--object-regexp to PCRE? Did you use any automatic tools?

Secondly, since you basically did the same job of writing a parser - is there any advice you can share?
It will be my first attempt at writing a parser.

Thanks

The values of Org meta keys are overwritten if used multiple times

Issue

Given the following valid Org file which simply includes two other Org files

#+INCLUDE: file.org
#+INCLUDE: anotherfile.org

the following GraphQL entry is created:

"node": {
  "fields": {
    "slug": "/blog/test/"
  },
  "meta": {
    "include": "anotherfile.org"
  }
}

The reference to file.org is not saved. It seems to only save the last entry under that meta key.

According to the Org mode manual

Org . . . accepts multiple lines for a keyword

Expected behaviour:

The GraphQL entry should include both values under the key as an array:

"node": {
  "fields": {
    "slug": "/blog/test/"
  },
  "meta": {
    "include": [
      "file.org",
      "anotherfile.org"
    ]
  }
}
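
A minimal sketch of the expected behaviour, assuming meta is collected line by line from `#+KEY: value` keywords. The function name `collectMeta` is hypothetical and this is not how orga builds its meta object: a key seen once stays a string, and a repeated key is promoted to an array.

```javascript
// Collect in-buffer keywords; repeated keys accumulate instead of being
// overwritten by the last occurrence.
function collectMeta(text) {
  const meta = {}
  for (const line of text.split(/\r?\n/)) {
    const m = /^#\+(\w+):\s*(.*)$/.exec(line)
    if (!m) continue
    const key = m[1].toLowerCase()
    const value = m[2].trim()
    if (Object.prototype.hasOwnProperty.call(meta, key)) {
      meta[key] = [].concat(meta[key], value)
    } else {
      meta[key] = value
    }
  }
  return meta
}

console.log(collectMeta('#+INCLUDE: file.org\n#+INCLUDE: anotherfile.org'))
```

One trade-off of this shape is that consumers have to handle both a string and an array for the same key.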

progress indicator in headlines

Currently, orga does not recognize [/] or [%] as separate objects in the headline. It currently only shows up as part of the headline text.

as requested at #36

No inline markup in blocks

This org source:

Becomes:

This might be the correct behaviour in #+BEGIN_EXAMPLE blocks, but I think it's pretty common to have emphasis in blockquotes.

Minimal example does not work

The minimal example at https://github.com/xiaoxinghu/orgajs/tree/master/examples/example does not work.

repro on node 10.6.0:

mkdir /tmp/orga-test
cd /tmp/orga-test
yarn add orga-unified oast-to-hast unified to-vfile vfile-reporter rehype-document rehype-stringify

cat > test.js <<EOF
var vfile = require('to-vfile')
var report = require('vfile-reporter')
var unified = require('unified')
var parse = require('orga-unified')
var mutate = require('oast-to-hast')
var stringify = require('rehype-stringify')
var doc = require('rehype-document')

unified()
  .use(parse)
  .use(mutate)
  .use(doc, {title: 'Hi!'})
  .use(stringify)
  .process(vfile.readSync('./README.org'), function (err, file) {
    console.error(report(err || file))
    console.log(String(file))
  })
EOF

cat > README.org <<EOF
* hello
  world
EOF

node test.js

results in

/tmp/orga-test/node_modules/oast-to-hast/lib/index.js:31
  var meta = tree.meta || {};
                  ^

TypeError: Cannot read property 'meta' of undefined
    at Function.toHAST (/tmp/orga-test/node_modules/oast-to-hast/lib/index.js:31:19)
    at freeze (/tmp/orga-test/node_modules/unified/index.js:123:28)
    at Function.process (/tmp/orga-test/node_modules/unified/index.js:360:5)
    at Object.<anonymous> (/tmp/orga-test/test.js:14:4)
    at Module._compile (internal/modules/cjs/loader.js:689:30)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:700:10)
    at Module.load (internal/modules/cjs/loader.js:599:32)
    at tryModuleLoad (internal/modules/cjs/loader.js:538:12)
    at Function.Module._load (internal/modules/cjs/loader.js:530:3)
    at Function.Module.runMain (internal/modules/cjs/loader.js:742:12)

Would you be able to provide some updated guidance on how to set up orga for a simple org-string to html conversion?

For context, I am playing around with https://github.com/xiaoxinghu/orgajs/blob/master/packages/orga/src/inline.js to address the performance in #6. I don't understand the unifiedjs toolchain so I am just trying to infer the structure from inline.js alone. By using a character based parser and bypassing the regex for markups we easily get a 5x speedup, which for large files is still not fast, but no longer prohibitive.

I think another problem with the regex matching is that syntax like /*italic-bold*/ and */bold-italic/* will not get matched, although both are valid org. A stack based parser should be able to handle these, but I'm not familiar with the testing setup here, so I am looking for a way to get the rendered html.

FWIW, the parser I am using looks like this now:

// inline.js

// after `markups` is defined
var inlineMarkups = {};
markups.forEach(function(nameMarker) {
  inlineMarkups[nameMarker.marker.replace('\\', '')] = nameMarker.name;
});

var StackParser = function(patterns) {
    var self = this;
    self.patterns = patterns;
    var ch, last_marker;
    self.parse = function(text) {

      if(Array.isArray(text)) {
        return text.reduce(function (all, node) {
          if (node.hasOwnProperty('type') && node.type != 'text') {
            return all.concat(node);
          }
          return all.concat(self.parse(node));
        }, []);
      }
      
      var string = text.value;
      var buffer = [];
      var stack = [];
      var nodes = [];
      for(var i=0; i<string.length; ++i) {
        ch = string[i];
        if(stack.length > 0) {
          last_marker = stack[stack.length-1];
        }
        if(ch == last_marker) {
          // end markup
          stack.pop();
          if(buffer.length > 0) {
            nodes.push(new _node2.default(patterns[ch], buffer.join('')));
            buffer = [];
            last_marker = null;
          }
        } else if(inlineMarkups[ch]) {
          // begin markup
          if(buffer.length > 0) {
            nodes.push(new _node2.default('text').with({ value: buffer.join('')}));
          }
          stack.push(ch);
          buffer = [];
        } else {
          buffer.push(ch);
        }
      }
      if(buffer.length > 0) {
        nodes.push(new _node2.default('text').with({ value: buffer.join('')}));
      }
      return nodes;
    };
  };

...

    for (var _iterator = markups[Symbol.iterator](), _step; !(_iteratorNormalCompletion = (_step = _iterator.next()).done); _iteratorNormalCompletion = true) {
      var _ref = _step.value;
      var name = _ref.name;
      var marker = _ref.marker;
      // bypass the old parser
      break
      _loop(name, marker);
    }
    var inlineparser = new StackParser(inlineMarkups);
    text = inlineparser.parse(text);

I don't know if it emits the AST correctly. It works for trivial cases, but it breaks on URLs, for example (slashes become italics). Currently I am comparing the result of hast -> hiccup using an ad-hoc converter from a clojurescript environment, so the testing is tricky. FYI, my converter code is here, but the unified stuff is magic to me. That is why I'm trying the minimal example.
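
For comparison, here is a compact, self-contained sketch of a stack-based inline parser that produces nested nodes, so that /*italic-bold*/ becomes an italic node containing a bold node. The marker table and node shapes are assumptions for illustration and do not use orga's Node class. Like the parser above, it ignores org's rules about which characters may surround a marker, so URLs would still be misread.

```javascript
// Assumed marker table; orga's own list of markups may differ.
const MARKERS = { '*': 'bold', '/': 'italic', '_': 'underline', '+': 'strikeThrough' }

function parseInline(text) {
  // Each frame collects the children of one open markup; the root has no marker.
  const stack = [{ marker: null, children: [] }]
  let buffer = ''
  const top = () => stack[stack.length - 1]
  const flush = () => {
    if (buffer) top().children.push({ type: 'text', value: buffer })
    buffer = ''
  }

  for (const ch of text) {
    if (ch === top().marker) {
      // closing marker: wrap the frame's children in a markup node
      flush()
      const frame = stack.pop()
      top().children.push({ type: MARKERS[ch], children: frame.children })
    } else if (MARKERS[ch]) {
      // opening marker: start a new frame
      flush()
      stack.push({ marker: ch, children: [] })
    } else {
      buffer += ch
    }
  }
  flush()
  // Markers that were never closed are put back as literal text.
  while (stack.length > 1) {
    const frame = stack.pop()
    top().children.push({ type: 'text', value: frame.marker }, ...frame.children)
  }
  return stack[0].children
}

console.log(JSON.stringify(parseInline('/*italic-bold*/')))
```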

Incorrect todo keyword parsing when using keybinds or special markers (for adding timestamps, etc) in keyword list

This is due to issues with todo keyword parsing, I assume.

In #5 you mentioned that "Normally we don't expect to see non-alphanumeric characters in todo keywords", but this isn't true. There's a common use-case for these: https://orgmode.org/manual/Fast-access-to-TODO-states.html and https://orgmode.org/manual/Tracking-TODO-state-changes.html, both of which I'm basically always using.

Because this behavior isn't handled (I assume), the parser takes the todo keyword in the file to be "DONE(d!)", etc., rather than "DONE" with a keybind of "d" that logs the time when it's switched to, and gets confused.
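
A rough sketch of how the keyword list could be split into the keyword proper and its parenthesized fast-access/logging options. The function name and return shape are assumptions. It does not handle the case without a `|`, where org-mode treats the last keyword as the done state.

```javascript
// Split "TODO(t) WAIT(w@/!) | DONE(d!)" into keywords plus their options.
function parseTodoKeywords(value) {
  const parsePart = (part) =>
    part
      .trim()
      .split(/\s+/)
      .filter(Boolean)
      .map((token) => {
        const m = /^([^()\s]+)(?:\(([^)]*)\))?$/.exec(token)
        return m ? { keyword: m[1], options: m[2] || '' } : { keyword: token, options: '' }
      })
  const [active, done = ''] = value.split('|')
  return { active: parsePart(active), done: parsePart(done) }
}

console.log(parseTodoKeywords('TODO(t) WAIT(w@/!) | DONE(d!)'))
```

With this, a headline starting with DONE would be compared against the keyword "DONE" rather than against "DONE(d!)".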

`#+TBLFM` support for tables

In org mode #+TBLFMs need to be placed after the table they act on for them to have any effect. Orga does not respect this and places the #+TBLFM in the root meta. This is an issue because there is no way of knowing which table this formula is acting on. #+TBLFMs should really be placed in the table node instead of just thrown in the root node.
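
As a sketch of the requested behaviour, a line-based scan can attach each #+TBLFM line to the table that immediately precedes it. The function name and the `{ rows, formulas }` shape are made up for illustration and do not reflect orga's table node.

```javascript
// Group consecutive "|" rows into tables and attach directly following
// #+TBLFM lines to the table they belong to.
function groupTables(text) {
  const tables = []
  let current = null
  for (const line of text.split(/\r?\n/)) {
    const trimmed = line.trim()
    if (trimmed.startsWith('|')) {
      // a row after a formula line starts a new table
      if (!current || current.formulas.length > 0) {
        current = { rows: [], formulas: [] }
        tables.push(current)
      }
      current.rows.push(trimmed)
    } else if (current && /^#\+TBLFM:/i.test(trimmed)) {
      current.formulas.push(trimmed.replace(/^#\+TBLFM:\s*/i, ''))
    } else {
      current = null
    }
  }
  return tables
}

console.log(groupTables('| a | b |\n| 1 | 2 |\n#+TBLFM: $2=$1*2\n\n| c |'))
```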

as requested at #36

parser: latex support?

Request:

The parser should recognize LaTeX formulas: $a+b$, \[a+b\]
The HTML generated by orga-rehype should display the math formulas directly, for example with KaTeX

Input org:

* test

  $a+b$


  \[a+b+\alpha\]

html
<div class="section"><h1>test</h1><p>$a+b$</p><p>\[a+b+\alpha\]</p></div>

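Yes, this parser will stay as consistent as possible with the org-mode syntax. For the parser itself, I think the implementation that makes the most sense is to return a node directly:

{
  type: "latex",
  name: "equation",
  value: "x=\sqrt{b}"
}

The actual rendering can be done by other packages at the org-mode to HTML layer. That way there isn't much work on the parser side.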
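
A small sketch of how inline text could be split into text and latex nodes for the two fragment styles in the request, `$...$` and `\[...\]`. The `name` values `inline` and `display` and the function name are assumptions, and the pattern is deliberately simpler than org-mode's full LaTeX fragment rules.

```javascript
// Split a string into text nodes and latex nodes.
function splitLatex(text) {
  const pattern = /\$([^$\n]+)\$|\\\[([\s\S]+?)\\\]/g
  const nodes = []
  let last = 0
  for (const m of text.matchAll(pattern)) {
    if (m.index > last) nodes.push({ type: 'text', value: text.slice(last, m.index) })
    const inline = m[1] !== undefined
    nodes.push({ type: 'latex', name: inline ? 'inline' : 'display', value: inline ? m[1] : m[2] })
    last = m.index + m[0].length
  }
  if (last < text.length) nodes.push({ type: 'text', value: text.slice(last) })
  return nodes
}

console.log(splitLatex('sum $a+b$ and \\[a+b+\\alpha\\]'))
```

A renderer such as KaTeX could then be applied to the latex nodes in the HTML conversion step.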

footnotes are not parsed

Hello,

Thank you for your effort to provide us with this software! A bridge between org and the web really adds to my org mode experience.

I am currently using orga as part of https://www.gatsbyjs.org/packages/gatsby-transformer-orga/. It works very well, except for footnotes. Markup like [fn:this is a footnote] is rendered fine by Emacs but not at all by orga. See this example.

Your example suggests orga supports footnotes. So, what am I doing wrong?

License file?

Hi there,

The license is clear in all the respective package.json files, but it would be clearer to add a LICENSE.org file at the top level in the repository. It would also work better with various tools.

[Feature Request] Allow for JS Injection

When working with gatsby, it's very helpful to embed components within the markup language. A great example of this is mdx. It'd be nice if orga could support this and then perhaps later support a custom "provider" for replacing various tags.

I think a quick scroll through this will better explain my request. Essentially I'd like an "orgx" robustness to org as mdx is to md.

This may relate to #49 but it felt separate enough to make an issue for.

Org paragraphs are rendering as `<div>` elements instead of `<p>` elements

The example readme.org file is transformed into:

<div>This is an example project.</div><div>#+BEGIN_SRC sh npm install npm run build #+END_SRC</div><div>Take a look at <code>readme.html</code> file.</div>

These divs aren't able to be styled as <p> components, and so css like paragraph spacing is not applied. It should render to:

<p>This is an example project.</p><p>#+BEGIN_SRC sh npm install npm run build #+END_SRC</p><p>Take a look at <code>readme.html</code> file.</p>

Should the paragraph lines be joined?

Hi. Really like your work.
I have a question regarding paragraphs. When I parse the following two strings I get the same result.

const ast = parse(`
* This is heading
  This is line one
  This is line two
`);

and

const ast = parse(`
* This is heading
  This is line one This is line two
`);

They both produce the same AST:

console.log(ast.children[0].children[1].children[0]);
// { type: 'text',
//   children: [],
//   value: 'This is line one This is line two',
//   parent: 
//    { type: 'paragraph',
//      children: [ [Circular] ],
//      parent: { type: 'section', children: [Array], level: 1, parent: [Object] } } }

I would expect the value to have newlines for the first example. Maybe my logic is wrong, but if we join with a newline instead of a space here, it should be okay.
Let me know if you would like a pull request, and have a nice day!

Not handling files with CRLF

Parsing this example produces odd results. More specifically, the table isn't parsed properly. I think changing this split to be /\r\n|\n/ fixes this issue. I'm happy to make a pull request if you'd like.
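
A quick illustration of the difference, using the pattern suggested above. The helper name `splitLines` is made up.

```javascript
// Splitting on "\n" alone leaves a trailing "\r" on every line of a CRLF
// file, which can keep table rows from being recognized.
function splitLines(text) {
  return text.split(/\r\n|\n/)
}

const crlf = '| a |\r\n| b |\r\n'
console.log(JSON.stringify(crlf.split('\n')))
console.log(JSON.stringify(splitLines(crlf)))
```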

Issue with React Native

Hi,

Thanks for this great package. I'm trying to use orga-rehype for my react native project. However whenever I import orga-rehype, this error is printed out forever.

Any idea why?

TIA

add example of usage in a browser

I'd like to use org-js in a simple html page to render either a string or a file. Do you have a working example of how to do this directly in an html page, using browserify, babel, webpack, or whatever else? Thanks!

Maximum call stack size exceeded with orga transformer on gatsbyjs 2.3.4

After package upgrade to gatsbyjs version 2.3.4 I'm getting UNHANDLED REJECTION when gatsby-transformer-orga is enabled.

❯ gatsby develop                                                        
success open and validate gatsby-configs — 0.050 s
success load plugins — 0.235 s           
success onPreInit — 0.188 s                                             
success initialize cache — 0.012 s
success copy gatsby files — 0.033 s
success onPreBootstrap — 0.010 s
success source and transform nodes — 0.275 s
error UNHANDLED REJECTION                                               

                                         
  RangeError: Maximum call stack size exceeded                          

  - RegExp.test 

  - date.js:93 looksLikeADate
    [igloonet.hosting]/[gatsby]/dist/schema/types/date.js:93:109
                                            
  - example-value.js:161 getType
    [igloonet.hosting]/[gatsby]/dist/schema/infer/example-value.js:161:14
                              
  - example-value.js:40 nodes.map.node
    [igloonet.hosting]/[gatsby]/dist/schema/infer/example-value.js:40:20
                                   
  - Array.map                   
                                            
  - example-value.js:38 Array.from.reduce
    [igloonet.hosting]/[gatsby]/dist/schema/infer/example-value.js:38:27
                                       
  - Array.reduce                     
                               
  - example-value.js:37 getExampleObject   
    [igloonet.hosting]/[gatsby]/dist/schema/infer/example-value.js:37:44                              

  - example-value.js:96 Array.from.reduce
    [igloonet.hosting]/[gatsby]/dist/schema/infer/example-value.js:96:29
                        
  - Array.reduce
                  
  - example-value.js:37 getExampleObject
    [igloonet.hosting]/[gatsby]/dist/schema/infer/example-value.js:37:44

  - example-value.js:96 Array.from.reduce                                 
    [igloonet.hosting]/[gatsby]/dist/schema/infer/example-value.js:96:29

  - Array.reduce
                                                 
  - example-value.js:37 getExampleObject                                                                                              
    [igloonet.hosting]/[gatsby]/dist/schema/infer/example-value.js:37:44
                                     
  - example-value.js:96 Array.from.reduce
    [igloonet.hosting]/[gatsby]/dist/schema/infer/example-value.js:96:29

  - Array.reduce
