orgapp / orgajs

parse org-mode content into AST

Home Page: https://orga.js.org

License: MIT License

JavaScript 41.57% Shell 0.03% TypeScript 56.67% Astro 1.68% CSS 0.04%
org-mode ast javascript unified gatsbyjs

orgajs's People

Contributors

agilecreativity, blackglory, chaseadamsio, dependabot[bot], floscr, github-actions[bot], guiltydolphin, lpan, shroomist, vinhowe, whacked, woofers, xiaoxinghu, yusaira-khan


orgajs's Issues

bug: parser unrecoverably fails if there is a `#+TODO: (...)` directive

const { parse } = require("orga")
parse("#+TODO: (whatever)\n* something");

will throw

TypeError: Cannot read property 'value' of undefined
    at _parse (./orgajs/node_modules/orga/lib/inline.js:101:19)
    at parse (./orgajs/node_modules/orga/lib/inline.js:25:10)
    at Parser.process (./orgajs/node_modules/orga/lib/processors/headline.js:49:32)
    at Parser.parseSection (./orgajs/node_modules/orga/lib/parser.js:96:24)
    at Parser.process (./orgajs/node_modules/orga/lib/processors/blank.js:9:15)
    at Parser.parseSection (./orgajs/node_modules/orga/lib/parser.js:96:24)
    at Parser.process (./orgajs/node_modules/orga/lib/processors/keyword.js:38:15)
    at Parser.parseSection (./orgajs/node_modules/orga/lib/parser.js:96:24)
    at Parser.parse (./orgajs/node_modules/orga/lib/parser.js:80:15)
    at parse (./orgajs/node_modules/orga/lib/index.js:13:17)

This is unrecoverable: after this error, the parser continues to throw the same error on any string with a headline, such as parse("* something"). The issue persists even if orga is re-required after clearing the require cache with delete require.cache[require.resolve("orga")].

I have not chased this very far and don't understand why the error persists across reimporting, but the undefined comes from somewhere around
https://github.com/xiaoxinghu/orgajs/blob/master/packages/orga/src/lexer.js#L116
where the pattern for matching headlines gets set to a regex containing (whatever), which then fails for all headlines.
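To illustrate the suspected mechanism, here is a minimal sketch. It assumes the headline pattern is built by splicing TODO keywords into a regular expression; `buildHeadlinePattern` and `escapeRegExp` are hypothetical helpers, not orga's actual lexer code. Unescaped parentheses in a keyword add a capture group and shift the index of every group after it, so code that reads the headline content from a fixed group index gets undefined.

```javascript
// Hypothetical sketch; orga's real lexer may build its pattern differently.

// Escape characters that have special meaning inside a RegExp.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Intended groups: 1 = stars, 2 = TODO keyword, 3 = headline content.
function buildHeadlinePattern(todoKeywords, escape) {
  const alternatives = todoKeywords
    .map((keyword) => (escape ? escapeRegExp(keyword) : keyword))
    .join('|');
  return new RegExp('^(\\*+)\\s+(?:(' + alternatives + ')\\s+)?(.*)$');
}

const naive = buildHeadlinePattern(['(whatever)'], false);
const escaped = buildHeadlinePattern(['(whatever)'], true);

// Unescaped: group 3 is the extra group introduced by "(whatever)".
const naiveContent = naive.exec('* something')[3];
// Escaped: group 3 is the headline content again.
const escapedContent = escaped.exec('* something')[3];

console.log(naiveContent, escapedContent);
```

This only shows why the group lookup can come back undefined; it does not explain why the error persists across re-imports.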

Thanks for your work.

Preserve as much of the original structure as possible

Hello there,

Thanks again for such an awesome project. It would be great to have an orga-stringify utility to fit more completely into the unified ecosystem and open ourselves up to using more transform tools. Then we could parse org files to an AST, transform them and then re-render the org. Ideally minimal transformations would re-render something pretty close to the original. To do that, we'd need to preserve as much of the original structure as possible.

I propose something like these changes. I'm after the effect more than the approach so I'm happy to discuss/modify/whatever. If you'd like me to make this a pull request, please let me know.

My thinking is that the extra structure in the AST can always be stripped when not needed. For instance, you could filter out whitespace/keyword nodes as well as trim() inner text if desired. But having it in the AST allows us to (nearly) faithfully re-render the original org file.

In there is a separate commit with the changes to the snapped files if you just want to see the effect on the AST. I think in a couple of cases it even renders a bit more accurately.

Again, more than happy to discuss.

Thanks again,
-Doug

P.S. I have a prototype for orga-stringify as well which I'll add to my fork as soon as I figure out how lerna works.

Code block indentation

Problem

It seems that code blocks aren't stripped of (common) leading whitespace. This has the effect that as you get deeper down the hierarchy and content gets more and more indented, the exported code blocks also become more indented.

For instance, the below content would have each code block indented one space more than the previous.

* Header level one
  Content.
  #+begin_src elisp
    (message "one")
  #+end_src
  
** Header level 2
   Content
   #+begin_src elisp
     (message "two")
   #+end_src

*** Header level 3
    Content
    #+begin_src elisp
      (message "three")
    #+end_src

Solution?

I think it would make more sense if all code blocks had their shared leading whitespace removed. As an example, consider this bit of Org:

* Header level one
  Content.
  #+begin_src js
    const f = x => {
      return x * x
    }

  #+end_src

The code snippet should render flush against the left margin like this

const f = x => {
  return x * x
}

instead of indented like this.

    const f = x => {
      return x * x
    }
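The proposed behaviour can be sketched as a small post-processing step on the block's text. This is only an illustration of the idea; `stripCommonIndent` is a hypothetical helper, not part of orga.

```javascript
// Remove the indentation shared by all non-blank lines of a code block.
function stripCommonIndent(code) {
  const lines = code.split('\n');
  // Measure leading spaces/tabs, ignoring blank lines.
  const indents = lines
    .filter((line) => line.trim() !== '')
    .map((line) => line.match(/^[ \t]*/)[0].length);
  const common = indents.length > 0 ? Math.min(...indents) : 0;
  return lines.map((line) => line.slice(common)).join('\n');
}

const block = [
  '    const f = x => {',
  '      return x * x',
  '    }',
  '',
].join('\n');

console.log(stripCommonIndent(block));
```

Relative indentation inside the block is preserved because only the minimum shared prefix length is removed.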

Notes

Is there a way to achieve this in Orga today? If not, I'd be happy to take a look at implementing this if you could point me in the right direction. (Though looking at the source, it looks a little intimidating; I've never really written a parser before, so it might take more time than expected.)

Thanks for the great package, by the way!

robust timestamp parsing

Currently, the timestamp parsing is pretty simple, which does not support repeaters. Need a more robust version.

Another issue related to #4 is that orga does not handle repeated dates correctly. For a date like DEADLINE: <2017-12-10 Sun .+1W>, orga ignores the timestamp and returns {keyword: "DEADLINE", type: "planning"} with no date.

from #36
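As a rough illustration of what repeater-aware parsing could look like, here is a minimal sketch. The `parseTimestamp` function and the shape of its result are assumptions for this example, not orga's API. It keeps the date as a string to avoid time-zone conversion.

```javascript
// Hypothetical sketch; field names are illustrative, not orga's AST.
// Groups: date, optional day name, optional time, optional repeater.
const TIMESTAMP =
  /^<(\d{4}-\d{2}-\d{2})(?:\s+([^\d\s>]+))?(?:\s+(\d{1,2}:\d{2}))?(?:\s+(\.\+|\+\+|\+)(\d+)([hdwmy]))?>$/i;

function parseTimestamp(text) {
  const match = TIMESTAMP.exec(text.trim());
  if (!match) return null;
  const [, date, day, time, mark, value, unit] = match;
  return {
    date,
    day: day || null,
    time: time || null,
    // mark is one of "+", "++" or ".+"; unit is normalised to lower case.
    repeater: mark
      ? { mark, value: Number(value), unit: unit.toLowerCase() }
      : null,
  };
}

console.log(parseTimestamp('<2017-12-10 Sun .+1W>'));
```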

Footnotes don't link back up to their 'usage'

Hey,

Just saw the comments around V2 and the good news around footnotes in #42. When trying out the example that was linked to, I noticed that the footnote doesn't link back up to where it was referenced in the text.

I think that the footnote should link back to all the places in the text where it has been referenced, so that the reader / end user can easily jump back to where it was used.

Just so we're clear about what I mean: here's an image of references from Wikipedia. The little ^ in front of every reference links back to the bit of the text that references it.

[image: Screenshot 2020-08-14 at 08 25 54]

I suppose that if a footnote is referenced multiple times, then there should also be multiple targets you can jump to.
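One way to get multiple jump targets is to give every reference its own anchor id and let the footnote definition link back to each of them. The sketch below only shows the bookkeeping; `collectBacklinks` and the `fnref-` id scheme are made up for illustration and are not orga's output.

```javascript
// Hypothetical sketch: assign one anchor id per footnote reference, in
// document order, so a footnote can link back to every place it was used.
function collectBacklinks(referenceLabels) {
  const backlinks = new Map();
  for (const label of referenceLabels) {
    const ids = backlinks.get(label) || [];
    ids.push('fnref-' + label + '-' + (ids.length + 1));
    backlinks.set(label, ids);
  }
  return backlinks;
}

// Footnote "1" is referenced twice, footnote "2" once.
const backlinks = collectBacklinks(['1', '2', '1']);
console.log(backlinks.get('1'), backlinks.get('2'));
```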

Thanks for all your hard work; it's much appreciated!

Table header is not supported

Hi. Thanks for the great work on Orga.

I'm using gatsby-transformer-orga for my sites. Today I found that table headers are not supported. For example, the following table:

| A | B |
|---+---|
| 1 | 2 |

should translate to:

<table>
  <thead>
    <tr>
      <th>A</th>
      <th>B</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>1</td>
      <td>2</td>
    </tr>
  </tbody>
</table>

But gatsby-transformer-orga puts everything in <table><tbody>...</tbody></table>.
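A minimal sketch of the expected split, assuming rows above the first horizontal rule form the header. `splitHeader` and its result shape are hypothetical and unrelated to orga's actual table nodes.

```javascript
// Hypothetical sketch; row and table shapes are illustrative, not orga's AST.
const RULE = /^\|[-+]+\|$/; // a horizontal rule such as |---+---|

function splitHeader(tableLines) {
  const rows = tableLines.map((line) => line.trim());
  const toCells = (row) =>
    row.slice(1, -1).split('|').map((cell) => cell.trim());
  const ruleIndex = rows.findIndex((row) => RULE.test(row));
  if (ruleIndex <= 0) {
    // No rule, or a rule on the first line: treat every row as body.
    return { head: [], body: rows.filter((r) => !RULE.test(r)).map(toCells) };
  }
  return {
    head: rows.slice(0, ruleIndex).map(toCells),
    body: rows.slice(ruleIndex + 1).filter((r) => !RULE.test(r)).map(toCells),
  };
}

console.log(splitHeader(['| A | B |', '|---+---|', '| 1 | 2 |']));
```

A renderer could then emit `head` rows inside thead with th cells and `body` rows inside tbody.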

global `timestamp` recognition

Orga does not recognize date tags that are not either placed after DEADLINE or SCHEDULED and handles them as text. Orga is clearly able to recognize these tags so it should be easy to extend this to date objects outside of DEADLINE or SCHEDULED.

Since Emacs recognizes timestamps globally, we should be able to do the same as part of our inline tokenization.

As requested at #36
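A minimal sketch of recognizing timestamps anywhere in a line of text follows. The `tokenizeTimestamps` function and its token shapes are assumptions for illustration, not orga's tokenizer.

```javascript
// Hypothetical sketch of inline tokenization; token shapes are illustrative.
// Matches active <...> and inactive [...] timestamps that start with a date.
const ANY_TIMESTAMP =
  /<\d{4}-\d{2}-\d{2}[^>\n]*>|\[\d{4}-\d{2}-\d{2}[^\]\n]*\]/g;

function tokenizeTimestamps(text) {
  const tokens = [];
  let cursor = 0;
  for (const match of text.matchAll(ANY_TIMESTAMP)) {
    if (match.index > cursor) {
      tokens.push({ type: 'text', value: text.slice(cursor, match.index) });
    }
    tokens.push({ type: 'timestamp', value: match[0] });
    cursor = match.index + match[0].length;
  }
  if (cursor < text.length) {
    tokens.push({ type: 'text', value: text.slice(cursor) });
  }
  return tokens;
}

console.log(tokenizeTimestamps('Meet on <2020-07-03 Fri> at noon'));
```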

📢 v2 is on its way!

I have been spending a fair amount of time on v2 for a while now. It's close to feature complete, so I think it's time to share what's been going on and where this project is heading.
TL;DR: it's going to be awesome. 👍

Why v2

As you can see, there are some issues queuing up, for different reasons. Lots of them are due to the limitations of the current design. We hack around them (thank you all for the contributions), but the fixes are not ideal. I am currently working on an org-mode related iOS app on the side, so I have created a similar parser in Swift (several times...). Switching to a very different language gives a much clearer sense of the thing you are trying to build. With the lessons learnt there, I decided to make orgajs great again (it was not bad before).

The Improvements

  • Strong Types
    The project was converted to TypeScript in the cheapest way possible a couple of months back. (e.g. const data: any = {...}). Now it's much closer to a real TypeScript project. This will make collaboration a lot easier.

  • Full OAST Spec
    Taking advantage of the type system, we now have the full spec of OAST (Orga Abstract Syntax Tree).

  • Position (line, column)
    All tokens and nodes will have position info built-in. 🎉

{
  type: 'headline',
  position: {
    start: { line: 1, column: 0 },
    end: { line: 1, column: 15 },
  },
  ...
}
  • Flexible Parsing Process
    orgajs used to rely heavily on complicated regex matching within individual lines of text. Now it's a much more flexible process. The benefits are:

    • easier to implement/modify/reason about parsing logic
    • room for performance optimization
    • tokens are not bound to lines
  • Better DX
    Strong types, auto-completion, easy(er) to read codebase.

  • Test! Test! Test!
    We need more tests. You want it to be more badass? Write tests! I am also working on optimizing the test writing experience.

Breaking Changes

v2 is breaking compatibility with the current version. The broken parts are:

  • token names
  • syntax tree structure
  • nodes/tokens now have position info

Basically, everything that's important... The bright side is that, with the monorepo setup, I can fix everything up together pretty easily.

The spec was loosely defined in v1. Hopefully, with strongly typed components in v2, the API will be much more resilient to breaking changes in future releases.

Contribution

All contributions are welcome. It's non-trivial to write a parser for something as powerful as org-mode, and orgajs needs all the help it can get from org-mode lovers. We need help with documentation, testing, bug fixes, new features ... With the architecture of this project solidifying during the development of v2, I can see a more collaborative future.

Custom URL for pathnames doesn't work after last upgrade

Hello,

After upgrading to the latest version, I'm unable to use a custom URL (as shown in https://github.com/xiaoxinghu/gatsby-orga/blob/master/gatsby-node.js, where the code is now commented out).

My Org pages are stored in src/help and I want them accessible at https://siteurl/help/<title>/. The problem is that after the migration from orga to OrgContent, it looks like slugs are created directly in the plugin (the owner of the slug field is the orga transform plugin), so I'm no longer able to add a slug field through onCreateNode.

IMHO this should work:

const { createFilePath } = require(`gatsby-source-filesystem`)

// Add custom url pathname for blog posts.
exports.onCreateNode = ({ node, getNode, actions }) => {
  const { createNodeField } = actions

  if (node.internal.type === `OrgContent`) {
    const path = createFilePath({ node, getNode })
    const slug = `/help${path}`

    createNodeField({
      node,
      name: `slug`,
      value: slug,
    })
  }
}

But I'm getting the error: Error: A plugin tried to update a node field that it doesn't own

Is there any chance to get the old behaviour back, or is there a correct way to proceed? I have tried many approaches so far without success.

bug: emphasis is not working for words with fewer than 3 characters

const { parse } = require('orga')

console.log(parse('*12*').children[0].children[0])
// { type: 'text',
//   value: '*12*',
//   ... }

console.log(parse('*123*').children[0].children[0])
// { type: 'bold',
//   children:
//    [ { type: 'text', children: [], value: '123', parent: [Circular] } ],
//   ... }

The example above is for bold, but this bug appears with all kinds of emphasis (bold, verbatim, underline, etc).

The problem is that the regex you're using expects at least three characters between the markers:

/\*([^,'"\s].+?[^,'"\s])\*/m.exec('*12*')
// null

/\*([^,'"\s].+?[^,'"\s])\*/m.exec('*123*')
// not null
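One possible relaxation is to make the middle and closing characters optional as a unit, so that one or two characters between the markers also match. This is only a sketch of the idea, not the fix adopted by orga.

```javascript
// Pattern from the report: needs at least three characters between markers.
const original = /\*([^,'"\s].+?[^,'"\s])\*/m;

// Relaxed variant (an assumption for illustration): the first character is
// required, the rest of the emphasized text is optional.
const relaxed = /\*([^,'"\s](?:.*?[^,'"\s])?)\*/m;

for (const sample of ['*1*', '*12*', '*123*']) {
  const before = original.exec(sample);
  const after = relaxed.exec(sample);
  console.log(sample, before && before[1], after && after[1]);
}
```

The rule that the text must not start or end with whitespace, a comma, or a quote is kept as in the quoted pattern.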

[Feature Request] Allow for JS Injection

When working with gatsby, it's very helpful to embed components within the markup language. A great example of this is mdx. It'd be nice if orga could support this and then perhaps later support a custom "provider" for replacing various tags.

I think a quick scroll through this will better explain my request. Essentially I'd like an "orgx" that is to org what mdx is to md.

This may relate to #49 but it felt separate enough to make an issue for.

Parsing in emacs

Thanks for introducing me to unified. Your package looks great. Would it be possible to use org-mode's own parser to generate the unified syntax tree?

Rows and offsets

I'd like to use orga to parse text and then decorate the original text without modifying it. Would there be some way for orga to keep track of which characters a specific node maps back to in the source text?

So for example, being able to say that row 6, characters 1 - 10, correspond to a bold node.

Did you ever attempt to go down that route, or from your perspective do you think it's feasible?
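If a parser records character offsets for each node, mapping them back to rows and columns is straightforward. A minimal sketch, where `offsetToPosition` is a hypothetical helper that uses 1-based lines and 0-based columns:

```javascript
// Convert a character offset into a { line, column } position.
// Lines are 1-based and columns are 0-based in this sketch.
function offsetToPosition(source, offset) {
  const before = source.slice(0, offset);
  const lastNewline = before.lastIndexOf('\n');
  return {
    line: before.split('\n').length,
    column: offset - (lastNewline + 1),
  };
}

const source = 'first\nsome *bold* text';
const start = source.indexOf('*bold*');
console.log(offsetToPosition(source, start));
```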

Empty table cells are dropped

If I parse a table with empty cells, these are simply dropped and never make it to the ast. Here is an example:

> var stringify = require('json-stringify-safe');
undefined
> const parse = require('orga')
undefined
> stringify(parse.parse('|         | something |'))
'{"type":"root","children":[{"type":"table","children":[{"type":"table.row","children":[{"type":"table.cell","children":[{"type":"text","children":[],"value":"something","parent":"[Circular ' +
  '~.children.0.children.0.children.0]"}],"parent":"[Circular ' +
  '~.children.0.children.0]"}],"parent":"[Circular ' +
  '~.children.0]"}],"parent":"[Circular ~]"}],"meta":{}}'
> stringify(parse.parse('| nonempty | something |'))
'{"type":"root","children":[{"type":"table","children":[{"type":"table.row","children":[{"type":"table.cell","children":[{"type":"text","children":[],"value":"nonempty","parent":"[Circular ' +
  '~.children.0.children.0.children.0]"}],"parent":"[Circular ' +
  '~.children.0.children.0]"},{"type":"table.cell","children":[{"type":"text","children":[],"value":"something","parent":"[Circular ' +
  '~.children.0.children.0.children.1]"}],"parent":"[Circular ' +
  '~.children.0.children.0]"}],"parent":"[Circular ' +
  '~.children.0]"}],"parent":"[Circular ~]"}],"meta":{}}'
>

As you can see, the first parse only returns a single cell whereas the second returns two (which is the correct behaviour)
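For comparison, a row splitter that keeps empty cells can be sketched as follows. `splitRow` is a hypothetical helper, not orga's implementation.

```javascript
// Split an org table row into cells, keeping empty ones.
function splitRow(row) {
  // Drop the outer pipes, then split on the inner ones.
  const inner = row.trim().replace(/^\|/, '').replace(/\|$/, '');
  return inner.split('|').map((cell) => cell.trim());
}

console.log(splitRow('|         | something |'));
console.log(splitRow('| nonempty | something |'));
```

Both rows yield two cells here; the first cell of the first row is an empty string rather than being dropped.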

The value of Org meta keys are overwritten if used multiple times

Issue

Given the following valid Org file which simply includes two other Org files

#+INCLUDE: file.org
#+INCLUDE: anotherfile.org

the following GraphQL entry is created:

"node": {
  "fields": {
    "slug": "/blog/test/"
  },
  "meta": {
    "include": "anotherfile.org"
  }
}

The reference to file.org is not saved. It seems to only save the last entry under that meta key.

According to the Org mode manual

Org . . . accepts multiple lines for a keyword

Expected behaviour:

The GraphQL entry should include both values under the key as an array:

"node": {
  "fields": {
    "slug": "/blog/test/"
  },
  "meta": {
    "include": [
      "file.org",
      "anotherfile.org"
    ]
  }
}
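The expected behaviour amounts to accumulating repeated keys instead of overwriting them. A minimal sketch, where `collectMeta` and the `{ key, value }` input shape are assumptions for illustration:

```javascript
// Collect keyword lines into a meta object; repeated keys become arrays.
function collectMeta(keywords) {
  const meta = {};
  for (const { key, value } of keywords) {
    const name = key.toLowerCase();
    if (!Object.prototype.hasOwnProperty.call(meta, name)) {
      meta[name] = value; // first occurrence stays a plain string
    } else if (Array.isArray(meta[name])) {
      meta[name].push(value);
    } else {
      meta[name] = [meta[name], value]; // second occurrence: promote to array
    }
  }
  return meta;
}

console.log(collectMeta([
  { key: 'INCLUDE', value: 'file.org' },
  { key: 'INCLUDE', value: 'anotherfile.org' },
]));
```

A field that is sometimes a string and sometimes an array can be awkward for GraphQL schema inference, so always using arrays for such keys may be the safer design.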

Expose meta content for filters

Problem

Currently, orga does not support GraphQL queries that filter on metadata with variables, since only the internal field is exposed.

Details

For remark this is valid:

{
  allMarkdownRemark(
    filter: { frontmatter: { category: { in: ["Pandas"] } } }
  ) {
    totalCount
    edges {
      node {
        frontmatter {
          title
          category
          date
        }
      }
    }
  }
}

However, the only option for filtering orga is allOrga ( filter: { internal: {content: { regex: "/CATEGORY:/"}}} )

A parameterized query such as query CategoryPage($category: String) { ( filter: { internal: {content: { regex: "/CATEGORY:/" regex: $category }}} ) also does not work.

Suggestion

If the meta object could be exposed to be similar to the remark frontmatter object, it would be greatly appreciated.

parser: latex support?

Wish:

  1. The parser should parse LaTeX formulas: $a+b$, \[a+b\]
  2. The HTML generated by orga-rehype should display the math formulas directly, for example using KaTeX

Input org:

* test

  $a+b$


  \[a+b+\alpha\]

html
<div class="section"><h1>test</h1><p>$a+b$</p><p>\[a+b+\alpha\]</p></div>

Yes, this parser will try to stay consistent with org-mode syntax 12. As for the parser itself, I think the implementation that makes the most sense is to return a node directly:

{
type: "latex",
name: "equation",
value: "x=\sqrt{b}"
}
The actual rendering can be implemented with other packages in the org-mode to HTML layer. That way there isn't much work on the parser side.
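A minimal sketch of the parser side, assuming only the two delimiters from the request ($...$ and \[...\]). `extractLatex` is hypothetical, and the name "inline" for the dollar form is an assumption; only the "latex" type and "equation" name come from the node shape above.

```javascript
// Hypothetical sketch: find LaTeX fragments and return them as nodes.
// Group 1 captures $...$ content, group 2 captures \[...\] content.
const LATEX = /\$([^$\n]+)\$|\\\[([\s\S]+?)\\\]/g;

function extractLatex(text) {
  const nodes = [];
  for (const match of text.matchAll(LATEX)) {
    const inline = match[1] !== undefined;
    nodes.push({
      type: 'latex',
      name: inline ? 'inline' : 'equation', // "inline" is an assumed name
      value: inline ? match[1] : match[2],
    });
  }
  return nodes;
}

console.log(extractLatex('$a+b$ and \\[a+b+\\alpha\\]'));
```

Rendering, for example with KaTeX, would then happen in the HTML layer and is not shown here.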

No inline markup in blocks

This org source:
[image: org source, not preserved]

Becomes:
[image: rendered output, not preserved]

This might be the correct behaviour in #+BEGIN_EXAMPLE blocks, but I think it's pretty common to have emphasis in blockquotes.

Missing features: checkboxes, and repeating timestamps

Hello! Thank you for the wonderful project that is orga. I am starting a project built on top of orga and so far it has been a pleasure to work with. I have had a few issues though, which are very minor. These include:

  1. Currently, orga does not recognize [/] or [%] as separate objects in the headline. It currently only shows up as part of the headline text.

  2. In org mode #+TBLFMs need to be placed after the table they act on for them to have any effect. Orga does not respect this and places the #+TBLFM in the root meta. This is an issue because there is no way of knowing which table the formula acts on. #+TBLFMs should really be placed in the table node instead of just thrown in the root node.

  3. Orga does not recognize date tags that are not either placed after DEADLINE or SCHEDULED and handles them as text. Orga is clearly able to recognize these tags so it should be easy to extend this to date objects outside of DEADLINE or SCHEDULED.

  4. Another issue related to #4 is that orga does not handle repeated dates correctly. Actually for a date like this:

DEADLINE: <2017-12-10 Sun .+1W>

Orga completely ignores this and returns {keyword: "DEADLINE", type: "planning"} with no date stamp.

Really, thank you for your work on orga. If you need any help fixing these issues I would be happy to be of assistance. Thanks!

Why no DEADLINE: and SCHEDULED: ?

I was just trying this on your site: https://orga.js.org

* TODO [#B] Review code :work:
DEADLINE: <2020-07-03 Fri> SCHEDULED: <2020-07-03 Fri>

I get:

{ type: 'root',
  children: 
   [ { type: 'section',
       children: 
        [ { type: 'headline',
            children: 
             [ { type: 'text',
                 children: [],
                 value: 'Review code',
                 parent: [Circular] },
               { type: 'planning',
                 children: [],
                 keyword: 'DEADLINE',
                 parent: [Circular] } ],
            level: 1,
            keyword: 'TODO',
            priority: 'B',
            tags: [ 'work' ],
            parent: [Circular] } ],
       level: 1,
       parent: [Circular] } ],
  meta: {} }

When I put a newline between the two keywords it looks a bit better, but the SCHEDULED is still not there:

* TODO [#B] Review code :work:
DEADLINE: <2020-07-03 Fri> 
SCHEDULED: <2020-07-03 Fri>

results in:

{ type: 'root',
  children: 
   [ { type: 'section',
       children: 
        [ { type: 'headline',
            children: 
             [ { type: 'text',
                 children: [],
                 value: 'Review code',
                 parent: [Circular] },
               { type: 'planning',
                 children: [],
                 keyword: 'DEADLINE',
                 date: Thu Jul 02 2020 00:00:00 GMT+0200 (Central European Summer Time),
                 end: undefined,
                 parent: [Circular] } ],
            level: 1,
            keyword: 'TODO',
            priority: 'B',
            tags: [ 'work' ],
            parent: [Circular] } ],
       level: 1,
       parent: [Circular] } ],
  meta: {} }

I am wondering why they do not get parsed as described in https://orgmode.org/manual/Inserting-deadline_002fschedule.html

Did I misunderstand something, is it a not-yet-implemented feature, or is it a bug?
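For reference, picking up several planning keywords from one line can be sketched with a global pattern. `parsePlanning` and its result shape are assumptions for illustration; the timestamp is kept as a raw string.

```javascript
// Hypothetical sketch: extract every planning keyword and its timestamp.
const PLANNING =
  /\b(DEADLINE|SCHEDULED|CLOSED):\s*(<[^>\n]+>|\[[^\]\n]+\])/g;

function parsePlanning(line) {
  return Array.from(line.matchAll(PLANNING), (match) => ({
    type: 'planning',
    keyword: match[1],
    timestamp: match[2],
  }));
}

console.log(
  parsePlanning('DEADLINE: <2020-07-03 Fri> SCHEDULED: <2020-07-03 Fri>')
);
```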

Thank you for this work! It seems very promising!

newline breaks italic markup

I'm using Orga with GatsbyJS. I have the following in my post:

... or /The C++ Programming
Language/.

It is rendered as

... or /The C++ Programming Language/.

But if I remove the newline after Programming, it is correctly rendered as

... or The C++ Programming Language.

I suspect other formatting markup may be broken too.
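A likely cause, assuming the italic pattern uses `.` to match the emphasized text: in JavaScript, `.` does not match a newline unless the `s` flag is set. The two patterns below are simplified stand-ins, not orga's actual regexes.

```javascript
// Simplified stand-in patterns; orga's real ones may differ.
const singleLine = /\/(\S.+?\S)\//;      // "." stops at a newline
const multiLine = /\/(\S[\s\S]+?\S)\//;  // [\s\S] also matches a newline

const text = '... or /The C++ Programming\nLanguage/.';

console.log(singleLine.exec(text)); // no match across the line break
console.log(multiLine.exec(text)[1]);
```

Org itself limits how many lines emphasis may span, so a real fix would probably want a bound rather than unlimited matching.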

add example of usage in a browser

I'd like to use org-js in a simple html page to render either a string or a file. Do you have a working example of how to do this directly in an html page, using browserify, babel, webpack, or whatever else? Thanks!

[Question] Converting GNU Extended Regular Expressions to PCRE

Hi @xiaoxinghu

I am starting to work on a similar project, but in Rust. I stumbled upon your project and I was very happy to learn that I am not the only one crazy about org-mode.

The first thing I learned is that Emacs uses a different flavour of regexps (namely GNU ERE, while all normal programming languages use Perl Compatible Regexps).

I am currently going through https://code.orgmode.org/bzg/org-mode/src/master/lisp/org-element.el, trying to write something similar in Rust.

How did you convert regexps in org-element-paragraph-separate and org-element--object-regexp to PCRE? Did you use any automatic tools?

Secondly, since you basically did the same job of writing a parser - is there any advice you can share?
It will be my first attempt at writing a parser.

Thanks

Implement 2-pass parsing to properly handle in-buffer settings that affect parsing of content

TODO is the only current in-buffer setting affected by this, but another unimplemented example would be PRIORITIES.

This is inconsistent with actual org-mode behavior and can cause issues when parsing org files. Correct behavior requires two passes: one to discover in-buffer settings and another to actually parse the file into an AST.

Example:

> const { parse } = require('orga')
[...]
> parse('* TADA Header\n#+TODO: TADA | DANE').children[0].children[0].keyword
undefined
> parse('#+TODO: TADA | DANE\n* TADA Header').children[0].children[0].keyword
'TADA'
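The two-pass idea can be sketched on a toy headline parser. Everything here (`collectTodoKeywords`, `parseHeadlines`, the result shape) is hypothetical and far simpler than orga's parser.

```javascript
const DEFAULT_TODO = ['TODO', 'DONE'];

// Pass 1: scan the whole buffer for #+TODO settings.
function collectTodoKeywords(lines) {
  const keywords = [];
  for (const line of lines) {
    const match = /^#\+TODO:\s*(.*)$/i.exec(line);
    if (match) {
      keywords.push(...match[1].split(/\s+/).filter((w) => w && w !== '|'));
    }
  }
  return keywords.length > 0 ? keywords : DEFAULT_TODO;
}

// Pass 2: parse headlines using the settings found in pass 1.
function parseHeadlines(text) {
  const lines = text.split('\n');
  const todo = new Set(collectTodoKeywords(lines));
  const headlines = [];
  for (const line of lines) {
    const match = /^(\*+)\s+(.*)$/.exec(line);
    if (!match) continue;
    const [first, ...rest] = match[2].split(/\s+/);
    const hasKeyword = todo.has(first);
    headlines.push({
      level: match[1].length,
      keyword: hasKeyword ? first : undefined,
      title: hasKeyword ? rest.join(' ') : match[2],
    });
  }
  return headlines;
}

console.log(parseHeadlines('* TADA Header\n#+TODO: TADA | DANE')[0].keyword);
console.log(parseHeadlines('#+TODO: TADA | DANE\n* TADA Header')[0].keyword);
```

With the settings collected first, the position of the #+TODO line no longer affects the result.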

Incorrect todo keyword parsing when using keybinds or special markers (for adding timestamps, etc) in keyword list

This is due to issues with todo keyword parsing, I assume.

In #5 you mentioned that "Normally we don't expect to see non-alphanumeric characters in todo keywords", but this isn't true. There's a common use-case for these: https://orgmode.org/manual/Fast-access-to-TODO-states.html and https://orgmode.org/manual/Tracking-TODO-state-changes.html, both of which I'm basically always using.

Because this behavior isn't handled (I assume), the parser takes the todo keyword "DONE" in the file to be "DONE(d!)", etc., rather than "DONE" with a keybind of "d" that logs the time when it's switched to, and gets confused.
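Separating the keyword from its fast-access key and logging markers could look roughly like this. `parseTodoKeywords` and its result shape are hypothetical, not orga's code.

```javascript
// Hypothetical sketch: split "DONE(d!)" into keyword, key and raw options.
function parseTodoKeywords(value) {
  return value
    .split(/\s+/)
    .filter((word) => word !== '' && word !== '|')
    .map((word) => {
      const match = /^([^()]+)(?:\(([^)]*)\))?$/.exec(word);
      if (!match) return { keyword: word, key: null, options: '' };
      const options = match[2] || '';
      // The key is the first character unless it is a logging marker.
      const key = /^[^@!\/]/.test(options) ? options[0] : null;
      return { keyword: match[1], key, options };
    });
}

console.log(parseTodoKeywords('TODO(t) WAIT(w@/!) | DONE(d!)'));
```

Only the `keyword` part would then be used when matching headlines.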

Parse elements inside drawers - clocks, properties

Hello, thanks for the great work on Orga.

Currently, property elements ("key: value" pairs in the PROPERTIES drawer) and clock elements ("CLOCK: [..]") are not recognized/parsed as specific types in the AST output.
It would be nice to have them parsed as specific types (similar to tags, tables, lists).

For example, the following org-mode file

* modularization of custom functions, speed-up and optimalization :optimalition:module:package:
:LOGBOOK:  
CLOCK: [2019-05-24 pá 11:18]--[2019-05-24 pá 11:18] =>  0:00
- general info on python modules and packaging
:END:      
:PROPERTIES:
:type:     enhancement
:dated:    <2017-12-02 so>
:END:
package various code snippets to the regular package
modules
  - string processsing
  - sklearn custom transformers

and its AST (in json)

{
    "type": "root",
    "children": [
        {
            "type": "section",
            "children": [
                {
                    "type": "headline",
                    "children": [
                        {
                            "type": "text",
                            "children": [],
                            "value": "modularization of custom functions, speed-up and optimalization",
                            "parent": "[Circular ~.children.0.children.0]"
                        },
                        {
                            "type": "drawer",
                            "children": [],
                            "name": "LOGBOOK",
                            "value": "CLOCK: [2019-05-24 pá 11:18]--[2019-05-24 pá 11:18] =>  0:00\n- general info on python modules and packaging",
                            "parent": "[Circular ~.children.0.children.0]"
                        },
                        {
                            "type": "drawer",
                            "children": [],
                            "name": "PROPERTIES",
                            "value": ":type:     enhancement\n:dated:    <2017-12-02 so>",
                            "parent": "[Circular ~.children.0.children.0]"
                        }
                    ],
                    "level": 1,
                    "tags": [
                        "optimalition",
                        "module",
                        "package"
                    ],
                    "parent": "[Circular ~.children.0]"
                },
                {
                    "type": "paragraph",
                    "children": [
                        {
                            "type": "text",
                            "children": [],
                            "value": "package various code snippets to the regular package modules",
                            "parent": "[Circular ~.children.0.children.1]"
                        }
                    ],
                    "parent": "[Circular ~.children.0]"
                },
                {
                    "type": "list",
                    "children": [
                        {
                            "type": "list.item",
                            "children": [
                                {
                                    "type": "text",
                                    "children": [],
                                    "value": "string processsing",
                                    "parent": "[Circular ~.children.0.children.2.children.0]"
                                }
                            ],
                            "ordered": false,
                            "parent": "[Circular ~.children.0.children.2]"
                        },
                        {
                            "type": "list.item",
                            "children": [
                                {
                                    "type": "text",
                                    "children": [],
                                    "value": "sklearn custom transformers",
                                    "parent": "[Circular ~.children.0.children.2.children.1]"
                                }
                            ],
                            "ordered": false,
                            "parent": "[Circular ~.children.0.children.2]"
                        }
                    ],
                    "ordered": false,
                    "parent": "[Circular ~.children.0]"
                }
            ],
            "level": 1,
            "parent": "[Circular ~]"
        }
    ],
    "meta": {}
}
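Given the raw drawer value shown above, the property pairs could be extracted with a small helper. This is a sketch only; `parseProperties` is hypothetical and values are kept as raw strings.

```javascript
// Hypothetical sketch: turn a PROPERTIES drawer value into key/value pairs.
function parseProperties(value) {
  const properties = {};
  for (const line of value.split('\n')) {
    // A property line looks like ":key:   value".
    const match = /^\s*:([^:\s]+):\s*(.*?)\s*$/.exec(line);
    if (match) properties[match[1]] = match[2];
  }
  return properties;
}

console.log(
  parseProperties(':type:     enhancement\n:dated:    <2017-12-02 so>')
);
```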

progress indicator in headlines

Currently, orga does not recognize [/] or [%] as separate objects in the headline. It currently only shows up as part of the headline text.

as requested at #36
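A minimal sketch of pulling the progress indicator out of the headline text follows. `extractProgress` and its result shape are assumptions for illustration, not orga's AST.

```javascript
// Matches [/], [2/5], [%] and [40%]; priority cookies like [#B] do not match.
const COOKIE = /\[(\d*)\/(\d*)\]|\[(\d*)%\]/;

function toNumber(digits) {
  return digits === '' ? null : Number(digits);
}

function extractProgress(headline) {
  const match = COOKIE.exec(headline);
  if (!match) return { text: headline, progress: null };
  // Remove the cookie from the text and tidy up the whitespace.
  const text = (
    headline.slice(0, match.index) +
    headline.slice(match.index + match[0].length)
  ).replace(/\s+/g, ' ').trim();
  const progress = match[3] !== undefined
    ? { kind: 'percent', value: toNumber(match[3]) }
    : { kind: 'fraction', done: toNumber(match[1]), total: toNumber(match[2]) };
  return { text, progress };
}

console.log(extractProgress('Organize party [2/5]'));
```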

Minimal example does not work

The minimal example on https://github.com/xiaoxinghu/orgajs/tree/master/examples/example

repro on node 10.6.0:

mkdir /tmp/orga-test
cd /tmp/orga-test
yarn add orga-unified oast-to-hast unified to-vfile vfile-reporter rehype-document rehype-stringify

cat > test.js <<EOF
var vfile = require('to-vfile')
var report = require('vfile-reporter')
var unified = require('unified')
var parse = require('orga-unified')
var mutate = require('oast-to-hast')
var stringify = require('rehype-stringify')
var doc = require('rehype-document')

unified()
  .use(parse)
  .use(mutate)
  .use(doc, {title: 'Hi!'})
  .use(stringify)
  .process(vfile.readSync('./README.org'), function (err, file) {
    console.error(report(err || file))
    console.log(String(file))
  })
EOF

cat > README.org <<EOF
* hello
  world
EOF

node test.js

results in

/tmp/orga-test/node_modules/oast-to-hast/lib/index.js:31
  var meta = tree.meta || {};
                  ^

TypeError: Cannot read property 'meta' of undefined
    at Function.toHAST (/tmp/orga-test/node_modules/oast-to-hast/lib/index.js:31:19)
    at freeze (/tmp/orga-test/node_modules/unified/index.js:123:28)
    at Function.process (/tmp/orga-test/node_modules/unified/index.js:360:5)
    at Object.<anonymous> (/tmp/orga-test/test.js:14:4)
    at Module._compile (internal/modules/cjs/loader.js:689:30)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:700:10)
    at Module.load (internal/modules/cjs/loader.js:599:32)
    at tryModuleLoad (internal/modules/cjs/loader.js:538:12)
    at Function.Module._load (internal/modules/cjs/loader.js:530:3)
    at Function.Module.runMain (internal/modules/cjs/loader.js:742:12)

Would you be able to provide some updated guidance on how to set up orga for a simple org-string to html conversion?

For context, I am playing around with https://github.com/xiaoxinghu/orgajs/blob/master/packages/orga/src/inline.js to address the performance in #6. I don't understand the unifiedjs toolchain so I am just trying to infer the structure from inline.js alone. By using a character based parser and bypassing the regex for markups we easily get a 5x speedup, which for large files is still not fast, but no longer prohibitive.

I think another problem with the regex matching is that syntax like /*italic-bold*/ and */bold-italic/* will not get matched, although both are valid org. A stack based parser should be able to handle these, but I'm not familiar with the testing setup here, so I am looking for a way to get the rendered html.

FWIW, the parser I am using looks like this now:

// inline.js

// after `markups` is defined
var inlineMarkups = {};
markups.forEach(function(nameMarker) {
  inlineMarkups[nameMarker.marker.replace('\\', '')] = nameMarker.name;
});

var StackParser = function(patterns) {
    var self = this;
    self.patterns = patterns;
    var ch, last_marker;
    self.parse = function(text) {

      if(Array.isArray(text)) {
        return text.reduce(function (all, node) {
          if (node.hasOwnProperty('type') && node.type != 'text') {
            return all.concat(node);
          }
          return all.concat(self.parse(node));
        }, []);
      }
      
      var string = text.value;
      var buffer = [];
      var stack = [];
      var nodes = [];
      for(var i=0; i<string.length; ++i) {
        ch = string[i];
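        // track the innermost open marker; clear it when nothing is open,
        // so a stale marker cannot pop from an empty stack
        last_marker = stack.length > 0 ? stack[stack.length-1] : null;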
        if(ch == last_marker) {
          // end markup
          stack.pop();
          if(buffer.length > 0) {
            nodes.push(new _node2.default(patterns[ch], buffer.join('')));
            buffer = [];
            last_marker = null;
          }
        } else if(inlineMarkups[ch]) {
          // begin markup
          if(buffer.length > 0) {
            nodes.push(new _node2.default('text').with({ value: buffer.join('')}));
          }
          stack.push(ch);
          buffer = [];
        } else {
          buffer.push(ch);
        }
      }
      if(buffer.length > 0) {
        nodes.push(new _node2.default('text').with({ value: buffer.join('')}));
      }
      return nodes;
    };
  };

...

    for (var _iterator = markups[Symbol.iterator](), _step; !(_iteratorNormalCompletion = (_step = _iterator.next()).done); _iteratorNormalCompletion = true) {
      var _ref = _step.value;
      var name = _ref.name;
      var marker = _ref.marker;
      // bypass the old parser
      break
      _loop(name, marker);
    }
    var inlineparser = new StackParser(inlineMarkups);
    text = inlineparser.parse(text);    

I don't know if it emits the AST correctly. It works for trivial cases, but it breaks on URLs, for example (slashes become italics). Currently I am comparing the result of hast -> hiccup using an ad-hoc converter from a ClojureScript environment, so testing is tricky. FYI, my converter code is here, but the unified stuff is magic to me. That is why I'm trying the minimal example.
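
For reference, here is a minimal, self-contained sketch of a stack-based inline parser that nests markers and only treats a marker as markup at a word boundary, which is one way to keep URL slashes from turning into italics. Everything here is an assumption for illustration: `MARKERS`, `parseInline`, the node shapes, and the boundary rules are hypothetical and are not orga's actual implementation or Org's full emphasis rules.

```javascript
// Hypothetical marker table for illustration; not orga's actual `markups` list.
const MARKERS = { '*': 'bold', '/': 'italic', '_': 'underline', '+': 'strikeThrough' };

function parseInline(text) {
  const root = { type: 'root', children: [] };
  const stack = [{ node: root, marker: null }];
  let buffer = '';
  let lastOpenIndex = -2; // index of the most recently consumed opening marker

  const top = () => stack[stack.length - 1];
  const flush = () => {
    if (buffer) top().node.children.push({ type: 'text', value: buffer });
    buffer = '';
  };

  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    const prev = i > 0 ? text[i - 1] : ' ';
    const next = i + 1 < text.length ? text[i + 1] : ' ';
    // Close only the innermost marker, and only at the end of a word.
    const canClose = stack.length > 1 && ch === top().marker &&
      !/\s/.test(prev) && (/[\s.,;:!?)]/.test(next) || next in MARKERS);
    // Open only at the start of a word, or directly after another opening marker.
    const canOpen = ch in MARKERS && !/\s/.test(next) &&
      (/\s/.test(prev) || lastOpenIndex === i - 1);

    if (canClose) {
      flush();
      const closed = stack.pop();
      top().node.children.push(closed.node);
    } else if (canOpen) {
      flush();
      stack.push({ node: { type: MARKERS[ch], children: [] }, marker: ch });
      lastOpenIndex = i;
    } else {
      buffer += ch;
    }
  }
  flush();
  // Markers that were never closed fall back to literal text.
  while (stack.length > 1) {
    const open = stack.pop();
    top().node.children.push({ type: 'text', value: open.marker }, ...open.node.children);
  }
  return root.children;
}
```

With these rules, `parseInline('/*italic-bold*/')` yields an italic node wrapping a bold node, while a string such as `'see https://example.com/a/b now'` stays a single text node because none of its slashes sit at a word boundary.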

Should the paragraph lines be joined?

Hi. Really like your work.
I have a question regarding paragraphs. When I parse the following two strings, I get the same result.

const ast = parse(`
 * This is heading
   This is line one
   This is line two
 `);

and

 const ast = parse(`
 * This is heading
   This is line one This is line two
 `);

They both produce the same AST:

console.log(ast.children[0].children[1].children[0]);
// { type: 'text',
//   children: [],
//   value: 'This is line one This is line two',
//   parent: 
//    { type: 'paragraph',
//      children: [ [Circular] ],
//      parent: { type: 'section', children: [Array], level: 1, parent: [Object] } } }

I would expect the value to contain a newline in the first example. Maybe my logic is wrong, but if we join with a newline instead of a space here, it should be okay.
Let me know if you would like a pull request, and have a nice day!
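
To make the difference concrete, here is a tiny sketch of the two joining strategies. `joinParagraphLines` is a hypothetical helper written for this example, not a function from orga.

```javascript
// Hypothetical helper: join the trimmed lines of one paragraph.
// The separator decides whether the original line breaks survive.
function joinParagraphLines(lines, separator = ' ') {
  return lines.map((line) => line.trim()).join(separator);
}

const lines = ['   This is line one', '   This is line two'];
const joinedWithSpace = joinParagraphLines(lines);         // line breaks are lost
const joinedWithNewline = joinParagraphLines(lines, '\n'); // line breaks are kept
```

Joining with `'\n'` keeps enough information for a consumer to tell the two inputs apart; a renderer can still collapse the newline to a space when producing HTML.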

`#+TBLFM` support for tables

In org mode, a #+TBLFM line needs to be placed directly after the table it acts on to have any effect. Orga does not respect this and places the #+TBLFM in the root meta. This is an issue because there is no way of knowing which table the formula acts on. A #+TBLFM should really be placed in its table node instead of being thrown into the root node.

As requested in #36.
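
A rough sketch of the requested behaviour, assuming a simple line-based pass: a `#+TBLFM:` line that directly follows table rows is stored on that table. The function name `attachTableFormulas` and the node shapes are made up for this example and do not reflect orga's real node types.

```javascript
// Hypothetical line-based pass: attach each #+TBLFM line to the table right above it.
function attachTableFormulas(lines) {
  const nodes = [];
  let table = null; // the table that immediately precedes the current line, if any
  for (const line of lines) {
    const trimmed = line.trim();
    const formula = /^#\+TBLFM:\s*(.*)$/i.exec(trimmed);
    if (trimmed.startsWith('|')) {
      // Rows after a formula line start a new table.
      if (!table || table.formulas.length > 0) {
        table = { type: 'table', rows: [], formulas: [] };
        nodes.push(table);
      }
      table.rows.push(trimmed);
    } else if (formula && table) {
      table.formulas.push(formula[1]);
    } else {
      table = null; // any other line, including a blank one, ends the table
      if (trimmed !== '') nodes.push({ type: 'other', value: trimmed });
    }
  }
  return nodes;
}

const tables = attachTableFormulas([
  '| a | b |',
  '| 1 | 2 |',
  '#+TBLFM: $2=$1*2',
  '',
  '| c |',
  '#+TBLFM: $1=1',
]);
```

Each entry in `tables` then carries its own `formulas` array, so a consumer knows which table a formula belongs to.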

Org paragraphs are rendering as `<div>` elements instead of `<p>` elements

The example readme.org file is transformed into:

<div>This is an example project.</div><div>#+BEGIN_SRC sh npm install npm run build #+END_SRC</div><div>Take a look at <code>readme.html</code> file.</div>

These divs can't be styled as <p> elements, so CSS such as paragraph spacing is not applied. It should render to:

<p>This is an example project.</p><p>#+BEGIN_SRC sh npm install npm run build #+END_SRC</p><p>Take a look at <code>readme.html</code> file.</p>

Maximum call stack size exceeded with orga transformer on gatsbyjs 2.3.4

After upgrading to gatsbyjs version 2.3.4, I'm getting an UNHANDLED REJECTION when gatsby-transformer-orga is enabled.

โฏ gatsby develop                                                        
success open and validate gatsby-configs โ€” 0.050 s
success load plugins โ€” 0.235 s           
success onPreInit โ€” 0.188 s                                             
success initialize cache โ€” 0.012 s
success copy gatsby files โ€” 0.033 s
success onPreBootstrap โ€” 0.010 s
success source and transform nodes โ€” 0.275 s
error UNHANDLED REJECTION                                               

                                         
  RangeError: Maximum call stack size exceeded                          

  - RegExp.test 

  - date.js:93 looksLikeADate
    [igloonet.hosting]/[gatsby]/dist/schema/types/date.js:93:109
                                            
  - example-value.js:161 getType
    [igloonet.hosting]/[gatsby]/dist/schema/infer/example-value.js:161:14
                              
  - example-value.js:40 nodes.map.node
    [igloonet.hosting]/[gatsby]/dist/schema/infer/example-value.js:40:20
                                   
  - Array.map                   
                                            
  - example-value.js:38 Array.from.reduce
    [igloonet.hosting]/[gatsby]/dist/schema/infer/example-value.js:38:27
                                       
  - Array.reduce                     
                               
  - example-value.js:37 getExampleObject   
    [igloonet.hosting]/[gatsby]/dist/schema/infer/example-value.js:37:44                              

  - example-value.js:96 Array.from.reduce
    [igloonet.hosting]/[gatsby]/dist/schema/infer/example-value.js:96:29
                        
  - Array.reduce
                  
  - example-value.js:37 getExampleObject
    [igloonet.hosting]/[gatsby]/dist/schema/infer/example-value.js:37:44

  - example-value.js:96 Array.from.reduce                                 
    [igloonet.hosting]/[gatsby]/dist/schema/infer/example-value.js:96:29

  - Array.reduce
                                                 
  - example-value.js:37 getExampleObject                                                                                              
    [igloonet.hosting]/[gatsby]/dist/schema/infer/example-value.js:37:44
                                     
  - example-value.js:96 Array.from.reduce
    [igloonet.hosting]/[gatsby]/dist/schema/infer/example-value.js:96:29

  - Array.reduce

License file?

Hi there,

The license is clear in all the respective package.json files, but it would be clearer to add a LICENSE.org file at the top level in the repository. It would also work better with various tools.

Allow adding unified plugins to AST processing?

Hey there,

Thanks again for an awesome project.

Do you think it might be a good idea to allow overriding the AST parsers/transformers for configurability? If someone wanted to touch up the produced HTML they could simply add a transformer plugin to edit the resulting tree. I'm thinking something like this change.

I don't have strong opinions for how to configure use of plugins (where they should be placed, how they are fetched, etc.). I don't have a need for this, so this is more food for thought than anything else. Browsing through the other issues, I wonder if something like this might put the power in the end user's hands. The obvious downside is the potential for incompatibility.
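
To illustrate what such a plugin could look like, here is a small sketch written in the usual unified plugin shape: a function that returns a transformer, which receives the tree and edits it in place. The plugin name `externalLinks` is hypothetical, the tree is a hand-written hast-like object, and nothing here is wired into orga itself.

```javascript
// Hypothetical transformer plugin: open every link in a new tab.
function externalLinks() {
  return function transformer(tree) {
    const visit = (node) => {
      if (node.type === 'element' && node.tagName === 'a') {
        node.properties = { ...node.properties, target: '_blank', rel: ['noopener'] };
      }
      (node.children || []).forEach(visit);
    };
    visit(tree);
  };
}

// Hand-written hast-like tree, standing in for what the HTML pipeline would produce.
const tree = {
  type: 'root',
  children: [
    {
      type: 'element',
      tagName: 'p',
      properties: {},
      children: [
        {
          type: 'element',
          tagName: 'a',
          properties: { href: 'https://orga.js.org' },
          children: [{ type: 'text', value: 'orga' }],
        },
      ],
    },
  ],
};

externalLinks()(tree);
const link = tree.children[0].children[0];
```

If the pipeline accepted extra plugins, a user could pass something like this instead of patching the generated HTML afterwards.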

Please let me know what you think.

Regards,
-Doug

Incorrect TODO parsing with multiple todo sets in one file

Often it may make sense to have multiple sets of TODO keywords in a file, but orga handles this incorrectly: only the last parsed set of TODO keywords takes effect.

Example from the org mode docs ( https://orgmode.org/manual/Per_002dfile-keywords.html )

#+TODO: TODO | DONE
#+TODO: REPORT BUG KNOWNCAUSE | FIXED
#+TODO: | CANCELED

Given how orga's updateTODOs function works, each new set of TODO keywords simply replaces the previous one. Instead, the sets should be collected, and if none are found before headlines are parsed, the keywords should default to ["TODO", "DONE"].
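
A minimal sketch of the collecting behaviour described above. `collectTodoKeywords` is a hypothetical function, not orga's `updateTODOs`; it assumes keywords are separated by whitespace, that `|` separates active from done states, and that a line without `|` uses its last keyword as the done state. Fast-access keys such as `TODO(t)` are not handled.

```javascript
// Hypothetical collector: merge every #+TODO line instead of keeping only the last one.
function collectTodoKeywords(lines) {
  const todo = [];
  const done = [];
  for (const line of lines) {
    const match = /^#\+TODO:\s*(.*)$/.exec(line.trim());
    if (!match) continue;
    const words = match[1].split(/\s+/).filter(Boolean);
    const bar = words.indexOf('|');
    if (bar === -1) {
      // No separator: the last keyword is treated as the done state.
      const last = words.pop();
      if (last) done.push(last);
      todo.push(...words);
    } else {
      todo.push(...words.slice(0, bar));
      done.push(...words.slice(bar + 1));
    }
  }
  // Fall back to the default set when the file declares no keywords at all.
  if (todo.length === 0 && done.length === 0) {
    return { todo: ['TODO'], done: ['DONE'] };
  }
  return { todo, done };
}

const keywords = collectTodoKeywords([
  '#+TODO: TODO | DONE',
  '#+TODO: REPORT BUG KNOWNCAUSE | FIXED',
  '#+TODO: | CANCELED',
]);
const defaults = collectTodoKeywords(['* just a headline']);
```

For the three lines from the manual, `keywords.todo` holds TODO, REPORT, BUG and KNOWNCAUSE, and `keywords.done` holds DONE, FIXED and CANCELED.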

Issue with React Native

Hi,

Thanks for this great package. I'm trying to use orga-rehype in my React Native project. However, whenever I import orga-rehype, this error is printed repeatedly and never stops.

[Image: screenshot of the error]

Any idea why?

TIA

WIP code for cli interface

I noticed:

build a emacs-less cli for org-mode (working on it)

Wondering where I might find the WIP code.

Performance degrades quickly with line length and line count, possibly line content

I noticed that for some files (e.g. > 10k lines) the parser practically cannot complete. This seems to be a function of the number of lines and the line lengths. For example, a large file with many blank lines can still complete. I don't know if this is specific to orga or to unifiedjs.

Here's a quick test on how the total parsing time for a trivial file changes with the line length, content, and line count:

const { parse } = require("orga");

function pad(n, width) {
    n += '';
    return n.length >= width ? n : new Array(width - n.length + 1).join('0') + n;
}

[
    (n) => `_${n}___${n}_`,
    (n) => `(${n})_(${n})`,
    (_) => 'xxxxxx_xxxxxx',
    (_) => 'xxxxxx_xxxxxx######_######',
].forEach(function(templater) {
    for (var nlines = 500; nlines <= 5000; nlines += 500) {
        var lines = [
        '* blah\n',
        ];
        // emit exactly `nlines` generated lines, numbered 1..nlines
        for (var n = 1; n <= nlines; ++n) {
            lines.push(templater(pad(n, 4)));
        }
        var text = lines.join('\n');
        var t0 = new Date().getTime();
        var ast = parse(text)
        var tN = new Date().getTime();
        var dt = tN - t0;
        var lastLine = lines[lines.length-1];
        console.log(`|\t${lastLine}\t|\t${nlines}\t|\t${dt}\t|`);
    }
});

Plotting the output shows this:

[Image: speedtest plot]

I was wondering whether the same regex issue was affecting lines with ( and ). That does not seem to be the case, but the lines with parens are clearly slower than lines without parens.

This is the table from the script (first column is added afterwards):

|label| last line                  | lines|    ms |
|-----+----------------------------+------+-------|
|under| _0500___0500_              |  500 |   162 |
|under| _1000___1000_              | 1000 |   626 |
|under| _1500___1500_              | 1500 |  1376 |
|under| _2000___2000_              | 2000 |  2474 |
|under| _2500___2500_              | 2500 |  3865 |
|under| _3000___3000_              | 3000 |  5475 |
|under| _3500___3500_              | 3500 |  7521 |
|under| _4000___4000_              | 4000 |  9850 |
|under| _4500___4500_              | 4500 | 12427 |
|under| _5000___5000_              | 5000 | 15190 |
|paren| (0500)_(0500)              |  500 |   217 |
|paren| (1000)_(1000)              | 1000 |   858 |
|paren| (1500)_(1500)              | 1500 |  1955 |
|paren| (2000)_(2000)              | 2000 |  3483 |
|paren| (2500)_(2500)              | 2500 |  5439 |
|paren| (3000)_(3000)              | 3000 |  7850 |
|paren| (3500)_(3500)              | 3500 | 10711 |
|paren| (4000)_(4000)              | 4000 | 13873 |
|paren| (4500)_(4500)              | 4500 | 17560 |
|paren| (5000)_(5000)              | 5000 | 21619 |
|xxxxx| xxxxxx_xxxxxx              |  500 |   215 |
|xxxxx| xxxxxx_xxxxxx              | 1000 |   854 |
|xxxxx| xxxxxx_xxxxxx              | 1500 |  1923 |
|xxxxx| xxxxxx_xxxxxx              | 2000 |  3429 |
|xxxxx| xxxxxx_xxxxxx              | 2500 |  5471 |
|xxxxx| xxxxxx_xxxxxx              | 3000 |  7843 |
|xxxxx| xxxxxx_xxxxxx              | 3500 | 10564 |
|xxxxx| xxxxxx_xxxxxx              | 4000 | 13859 |
|xxxxx| xxxxxx_xxxxxx              | 4500 | 17445 |
|xxxxx| xxxxxx_xxxxxx              | 5000 | 21576 |
|xand#| xxxxxx_xxxxxx######_###### |  500 |   815 |
|xand#| xxxxxx_xxxxxx######_###### | 1000 |  3211 |
|xand#| xxxxxx_xxxxxx######_###### | 1500 |  7253 |
|xand#| xxxxxx_xxxxxx######_###### | 2000 | 12967 |
|xand#| xxxxxx_xxxxxx######_###### | 2500 | 20093 |
|xand#| xxxxxx_xxxxxx######_###### | 3000 | 28973 |
|xand#| xxxxxx_xxxxxx######_###### | 3500 | 39765 |
|xand#| xxxxxx_xxxxxx######_###### | 4000 | 51717 |
|xand#| xxxxxx_xxxxxx######_###### | 4500 | 65081 |
|xand#| xxxxxx_xxxxxx######_###### | 5000 | 80055 |

Is there an easy way to fix this?
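
As a quick sanity check on the numbers above: if parse time grows like n^k, then k can be estimated from two measurements as log(t2/t1) / log(n2/n1). The sketch below applies that to the first and last row of each series in the table; the helper name `growthExponent` is made up for this example.

```javascript
// Estimate k in t ~ n^k from two (lines, milliseconds) measurements.
function growthExponent(n1, t1, n2, t2) {
  return Math.log(t2 / t1) / Math.log(n2 / n1);
}

// Timings in ms at 500 and 5000 lines, copied from the table above.
const timings = {
  under: { t500: 162, t5000: 15190 },
  paren: { t500: 217, t5000: 21619 },
  xxxxx: { t500: 215, t5000: 21576 },
  'xand#': { t500: 815, t5000: 80055 },
};

const exponents = Object.fromEntries(
  Object.entries(timings).map(([label, t]) => [
    label,
    growthExponent(500, t.t500, 5000, t.t5000),
  ])
);
console.log(exponents);
```

All four series come out close to 2, so the total time looks roughly quadratic in the number of lines. Line content mostly changes the constant factor.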

link in table does not work

With the latest version (0.2.7) on npm, links in table cells do not work:

const { parse } = require(`orga`);

const orgLink = `[[https://some.link][Some Link]]`;

var ast = parse(orgLink);

console.log(ast.children[0].children[0]); // this works fine

const orgLinkInTable = `
| Text            | [[http://I.should.be.link][link in header]]                            |
|-------------+---------------------------------------------------------------|
| Text   Text  | [[https://I.should.be.link][link in row]]                                |
`;

ast = parse(orgLinkInTable);

console.log(ast.children[0].children[0].children[1].children[0]); // link in header
console.log(ast.children[0].children[2].children[1].children[0]); // link in body

output:

{ type: 'link',
  children: [],
  uri:
   { raw: 'https://some.link',
     protocol: 'https',
     location: '//some.link' },
  desc: 'Some Link',
  parent:
   { type: 'paragraph',
     children: [ [Circular] ],
     parent: { type: 'root', children: [Array], meta: {} } } }


{ type: 'text',
  children: [],
  value: '[[http://I.should.be.link][link in header]]',
  parent:
   { type: 'tableCell',
     children: [ [Circular] ],
     parent: { type: 'tableRow', children: [Array], parent: [Object] } } }


{ type: 'text',
  children: [],
  value: '[[https://I.should.be.link][link in row]]',
  parent:
   { type: 'tableCell',
     children: [ [Circular] ],
     parent: { type: 'tableRow', children: [Array], parent: [Object] } } }
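
For comparison, here is a sketch of the output one would expect if cell text went through link parsing. `parseCellText` and its regular expression are assumptions for illustration only; the node shape loosely follows the link node printed above but keeps just `uri.raw` and `desc`.

```javascript
// Matches [[uri]] and [[uri][description]]. Illustrative only, not orga's own pattern.
const LINK = /\[\[([^\]]+)\](?:\[([^\]]+)\])?\]/g;

// Hypothetical helper: split a cell's text into text and link nodes.
function parseCellText(value) {
  const nodes = [];
  let last = 0;
  for (const m of value.matchAll(LINK)) {
    if (m.index > last) {
      nodes.push({ type: 'text', value: value.slice(last, m.index) });
    }
    nodes.push({ type: 'link', uri: { raw: m[1] }, desc: m[2] || m[1] });
    last = m.index + m[0].length;
  }
  if (last < value.length) {
    nodes.push({ type: 'text', value: value.slice(last) });
  }
  return nodes;
}

const headerCell = parseCellText('[[http://I.should.be.link][link in header]]');
const mixedCell = parseCellText('see [[https://some.link]] ok');
```

Here `headerCell` contains a single link node rather than the raw `[[...][...]]` text, which is what I would expect the tableCell children to look like.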

[Question] Can the Regexps be re-written to avoid backreferences?

I'm trying out orga for a project that would let me embed the parser in the client-side. Unfortunately, the regexps in the parser use backreferences - something only Chrome supports at the moment.

Is it possible to re-write those in a more browser-compliant way?

โค๏ธ
