orgapp / orgajs
parse org-mode content into AST
Home Page: https://orga.js.org
License: MIT License
const { parse } = require("orga")
parse("#+TODO: (whatever)\n* something");
will throw
TypeError: Cannot read property 'value' of undefined
at _parse (./orgajs/node_modules/orga/lib/inline.js:101:19)
at parse (./orgajs/node_modules/orga/lib/inline.js:25:10)
at Parser.process (./orgajs/node_modules/orga/lib/processors/headline.js:49:32)
at Parser.parseSection (./orgajs/node_modules/orga/lib/parser.js:96:24)
at Parser.process (./orgajs/node_modules/orga/lib/processors/blank.js:9:15)
at Parser.parseSection (./orgajs/node_modules/orga/lib/parser.js:96:24)
at Parser.process (./orgajs/node_modules/orga/lib/processors/keyword.js:38:15)
at Parser.parseSection (./orgajs/node_modules/orga/lib/parser.js:96:24)
at Parser.parse (./orgajs/node_modules/orga/lib/parser.js:80:15)
at parse (./orgajs/node_modules/orga/lib/index.js:13:17)
This is unrecoverable: after this error, the parser keeps throwing the same error on any string with a headline, such as parse("* something"). The issue persists even if orga is re-required after clearing the require cache with delete require.cache[require.resolve("orga")].
I have not chased this very far and don't understand why the error persists across re-importing, but the undefined comes from somewhere around https://github.com/xiaoxinghu/orgajs/blob/master/packages/orga/src/lexer.js#L116, where the pattern for matching headlines gets set to a regex containing (whatever) and then fails for all headlines.
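One way to avoid this kind of failure, assuming the headline pattern is built by joining the TODO keywords into a regex, is to escape each keyword first. This is only a sketch: `escapeRegExp` and `buildHeadlinePattern` are hypothetical names, not orga's actual API.

```javascript
// Hypothetical helpers, not part of orga: escape regex metacharacters in
// each TODO keyword before joining them into a headline pattern.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

function buildHeadlinePattern(todoKeywords) {
  const alternatives = todoKeywords.map(escapeRegExp).join('|');
  // Stars, an optional keyword, then the rest of the headline.
  return new RegExp(`^(\\*+)\\s+(?:(${alternatives})\\s+)?(.*)$`);
}

const pattern = buildHeadlinePattern(['(whatever)', 'TODO']);
const match = pattern.exec('* something');
console.log(match[1], match[2], match[3]);
```

With the keyword escaped, "(whatever)" is matched literally instead of being treated as a capture group, so a plain headline still matches.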
Thanks for your work.
Currently, progress indicators such as [/] and [%] are treated as plain text. They should be separate elements within the AST.
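A rough sketch of how progress indicators could be separated from the surrounding text. The `splitProgress` function and the `progress` node type are assumptions made for illustration, not part of orga.

```javascript
// Hypothetical tokenizer sketch: split headline text into text tokens and
// progress tokens such as [/], [2/5], [%] or [40%].
const COOKIE = /\[(\d*)\/(\d*)\]|\[(\d*)%\]/g;

function splitProgress(text) {
  const tokens = [];
  let last = 0;
  for (const m of text.matchAll(COOKIE)) {
    if (m.index > last) {
      tokens.push({ type: 'text', value: text.slice(last, m.index) });
    }
    tokens.push({ type: 'progress', value: m[0] });
    last = m.index + m[0].length;
  }
  if (last < text.length) {
    tokens.push({ type: 'text', value: text.slice(last) });
  }
  return tokens;
}

console.log(splitProgress('Tasks [1/3] done [33%]'));
```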
Hello there,
Thanks again for such an awesome project. It would be great to have an orga-stringify utility to fit more completely into the unified ecosystem and open ourselves up to using more transform tools. Then we could parse org files to an AST, transform them and then re-render the org. Ideally minimal transformations would re-render something pretty close to the original. To do that, we'd need to preserve as much of the original structure as possible.
I propose something like these changes. I'm after the effect more than the approach so I'm happy to discuss/modify/whatever. If you'd like me to make this a pull request, please let me know.
My thinking is that the extra structure in the AST can always be stripped when not needed. For instance, you could filter out whitespace/keyword nodes as well as trim() inner text if desired. But having it in the AST allows us to (nearly) faithfully re-render the original org file.
In there is a separate commit with the changes to the snapped files if you just want to see the effect on the AST. I think in a couple of cases it even renders a bit more accurately.
Again, more than happy to discuss.
Thanks again,
-Doug
P.S. I have a prototype for orga-stringify as well which I'll add to my fork as soon as I figure out how lerna works.
It seems that code blocks aren't stripped of (common) leading whitespace. This has the effect that as you get deeper down the hierarchy and content gets more and more indented, the exported code blocks also become more indented.
For instance, the below content would have each code block indented one space more than the previous.
* Header level one
Content.
#+begin_src elisp
(message "one")
#+end_src
** Header level 2
Content
#+begin_src elisp
(message "two")
#+end_src
*** Header level 3
Content
#+begin_src elisp
(message "three")
#+end_src
I think it would make more sense if all code blocks had their shared leading whitespace removed. As an example, consider this bit of Org:
* Header level one
Content.
#+begin_src js
const f = x => {
return x * x
}
#+end_src
The code snippet should render flush against the left margin like this
const f = x => {
  return x * x
}
instead of indented like this.
  const f = x => {
    return x * x
  }
Is there a way to achieve this in Orga today? If not, I'd be happy to take a look at implementing this if you could point me in the right direction. (Though looking at the source, it looks a little intimidating; I've never really written a parser before, so it might take more time than expected.)
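For reference, removing the shared leading whitespace can be sketched as a small post-processing step on the block's text. `stripCommonIndent` is a hypothetical helper, not an existing orga function, and where it would hook into the parser is left open.

```javascript
// Hypothetical helper: remove the indentation shared by all non-blank lines.
function stripCommonIndent(code) {
  const lines = code.split('\n');
  const indents = lines
    .filter((line) => line.trim() !== '')
    .map((line) => line.match(/^[ \t]*/)[0].length);
  const common = indents.length > 0 ? Math.min(...indents) : 0;
  return lines.map((line) => line.slice(common)).join('\n');
}

const block = '  const f = x => {\n    return x * x\n  }';
console.log(stripCommonIndent(block));
```

Relative indentation inside the block is preserved; only the prefix common to every non-blank line is dropped.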
Thanks for the great package, by the way!
If I use the description list,
- Term 1 :: description
- Term 2 :: description
it is rendered as below:
Maybe the term name should be displayed in bold, with just one colon:
Hey,
Just saw the comments around V2 and the good news around footnotes in #42. When trying out the example that was linked to, I noticed that the footnote doesn't link back up to where it was referenced in the text.
I think that the footnote should link back to all the places in the text where it has been referenced, so that the reader / end user can easily jump back to where it was used.
Just so we're clear about what I mean: here's an image of references from Wikipedia. The little ^ in front of every reference links back to the part of the text that references it.
I suppose that if a footnote is referenced multiple times, then there should also be multiple targets you can jump to.
Thanks for all your hard work; it's much appreciated!
Hi. Thanks for the great work on Orga.
I'm using gatsby-transformer-orga for my sites. Today I found that table headers are not supported. For example, the following table:
| A | B |
|---+---|
| 1 | 2 |
should translate to:
<table>
<thead>
<tr>
<th>A</th>
<th>B</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>2</td>
</tr>
</tbody>
</table>
But gatsby-transformer-orga puts everything in <table><tbody>...</tbody></table>.
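The expected behaviour can be sketched as follows, assuming rows before the first rule line form the header. `tableToHtml` is a hypothetical standalone function, not how gatsby-transformer-orga is implemented, and it does no HTML escaping.

```javascript
// Hypothetical sketch: rows before the first rule line (|---+---|) become
// <thead>, the remaining rows become <tbody>. Cell text is not escaped.
function tableToHtml(lines) {
  const isRule = (line) => /^\s*\|-/.test(line);
  const cells = (line) =>
    line.trim().replace(/^\||\|$/g, '').split('|').map((c) => c.trim());
  const ruleIndex = lines.findIndex(isRule);
  const head = ruleIndex > 0 ? lines.slice(0, ruleIndex) : [];
  const body = lines.slice(ruleIndex + 1).filter((line) => !isRule(line));
  const row = (tag) => (line) =>
    `<tr>${cells(line).map((c) => `<${tag}>${c}</${tag}>`).join('')}</tr>`;
  const thead = head.length ? `<thead>${head.map(row('th')).join('')}</thead>` : '';
  return `<table>${thead}<tbody>${body.map(row('td')).join('')}</tbody></table>`;
}

console.log(tableToHtml(['| A | B |', '|---+---|', '| 1 | 2 |']));
```

A table without any rule line falls back to a body-only table, which matches the current output.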
Orga does not recognize date tags that are not either placed after DEADLINE or SCHEDULED and handles them as text. Orga is clearly able to recognize these tags so it should be easy to extend this to date objects outside of DEADLINE or SCHEDULED.
Since Emacs recognizes timestamps globally, we should be able to do it as part of our inline tokenization.
As requested at #36
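A minimal sketch of inline timestamp tokenization, under the assumption that a single regex pass over the text is acceptable. `findTimestamps` and the shape of the returned objects are hypothetical, and the pattern is simplified (it does not handle ranges or repeaters, and does not check that the brackets match).

```javascript
// Hypothetical inline tokenizer: find active <...> and inactive [...]
// timestamps anywhere in a line of text.
const TIMESTAMP =
  /([<\[])(\d{4}-\d{2}-\d{2})(?: [^\s\d>\]]+)?(?: (\d{1,2}:\d{2}))?[>\]]/g;

function findTimestamps(text) {
  return Array.from(text.matchAll(TIMESTAMP), (m) => ({
    type: 'timestamp',
    active: m[1] === '<',
    date: m[2],
    time: m[3] || null,
    raw: m[0],
  }));
}

console.log(findTimestamps('Meet on <2020-07-03 Fri 10:30> or [2020-07-04 Sat].'));
```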
In Emacs, I can export an org file to HTML with a Table of Contents. Will this be supported in the future?
Parsing this example produces odd results. More specifically, the table isn't parsed properly. I think changing this split to be /\r\n|\n/ fixes this issue. I'm happy to make a pull request if you'd like.
I have been spending a fair amount of time working on v2 for a while now. It's close to feature complete, so I think it's time to give you some idea of what's been going on and where this project is heading.
TL;DR: it's going to be awesome.
As you can see, there are some issues queuing up, for different reasons. Lots of them are due to the limitations of the current design. We hack around to fix them (thank you all for the contributions), but that is not ideal. I am currently working on an org-mode related iOS app on the side, so I have created a similar parser in Swift (several times...). Switching to a very different language provides a much clearer sense of the thing you are trying to build. With the lessons learnt there, I decided to make orgajs great again (it was not bad before).
Strong Types
The project was converted to TypeScript in the cheapest way possible a couple of months back (e.g. const data: any = {...}). Now it's much closer to a real TypeScript project. This will make collaboration a lot easier.
Full OAST Spec
Taking advantage of the type system, we now have the full spec of OAST (Orga Abstract Syntax Tree).
Position (line, column)
All tokens and nodes will have position info built-in.
{
  type: 'headline',
  position: {
    start: { line: 1, column: 0 },
    end: { line: 1, column: 15 },
  }
  ...
}
Flexible Parsing Process
orgajs used to rely heavily on complicated regex matching within the scope of individual lines of text. Now it's a much more flexible process. The benefits are:
Better DX
Strong types, auto-completion, easy(er) to read codebase.
Test! Test! Test!
We need more tests. You want it to be more badass? Write tests! I am also working on optimizing the test writing experience.
v2 is breaking compatibility with the current version. The broken parts are:
Basically, everything that's important... The bright side is that, with the monorepo setup, I can fix everything up together pretty easily.
The spec was loosely defined in v1. Hopefully, with strongly typed components in v2, the API will be much more resilient against breaking changes in future releases.
All contributions are welcome. It's non-trivial to write a parser for something as powerful as org-mode, and orgajs needs all the help it can get from org-mode lovers. We need help with documentation, testing, bug fixes, new features ... With the architecture of this project solidifying during the development of v2, I can see a more collaborative future.
Hello,
after upgrading to the latest version, I'm unable to use a custom URL (as shown in https://github.com/xiaoxinghu/gatsby-orga/blob/master/gatsby-node.js, where the code is now commented out).
My Org pages are stored in src/help and I want them accessible at https://siteurl/help/<title>/. The problem is that after the migration from orga to OrgContent, it looks like slugs are created directly in the plugin (the owner of the slug field is the orga transform plugin) and I'm no longer able to add a slug field through onCreateNode.
IMHO this should work:
const { createFilePath } = require(`gatsby-source-filesystem`)
// Add custom url pathname for blog posts.
exports.onCreateNode = ({ node, getNode, actions }) => {
  const { createNodeField } = actions
  if (node.internal.type === `OrgContent`) {
    const path = createFilePath({ node, getNode })
    const slug = `/help${path}`
    createNodeField({
      node,
      name: `slug`,
      value: slug,
    })
  }
}
But I'm getting the error Error: A plugin tried to update a node field that it doesn't own
Is there any chance to get the old behaviour back, or is there a correct way to proceed? I have tried many approaches so far without success.
const { parse } = require('orga')
console.log(parse('*12*').children[0].children[0])
// { type: 'text',
// value: '*12*',
// ... }
console.log(parse('*123*').children[0].children[0])
// { type: 'bold',
// children:
// [ { type: 'text', children: [], value: '123', parent: [Circular] } ],
// ... }
The example above is for bold, but this bug appears with all kinds of emphasis (bold, verbatim, underline, etc.).
The problem is that the regex you're using expects at least three characters between the markers:
/\*([^,'"\s].+?[^,'"\s])\*/m.exec('*12*')
// null
/\*([^,'"\s].+?[^,'"\s])\*/m.exec('*123*')
// not null
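One possible relaxation, shown only as a sketch and not as the fix adopted by orga, is to make everything after the first inner character optional, so that one- and two-character spans also match:

```javascript
// Sketch of a relaxed pattern: the inner group accepts a single character,
// or a longer span whose first and last characters are not whitespace
// or one of , ' "
const bold = /\*([^,'"\s](?:.*?[^,'"\s])?)\*/m;

console.log(bold.exec('*1*')[1]);
console.log(bold.exec('*12*')[1]);
console.log(bold.exec('*123*')[1]);
console.log(bold.exec('* not bold *'));
```

The last call still returns null, because the character right after the opening marker must not be whitespace.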
When working with gatsby, it's very helpful to embed components within the markup language. A great example of this is mdx. It'd be nice if orga could support this and then perhaps later support a custom "provider" for replacing various tags.
I think a quick scroll through this will better explain my request. Essentially I'd like an "orgx" robustness to org as mdx is to md.
This may relate to #49 but it felt separate enough to make an issue for.
Thanks for introducing me to unified. Your package looks great. Would it be possible to use org-mode's own parser to generate the unified syntax tree?
I'm wondering, since I'd like to be able to use orga to parse text and then decorate the original text without modifying it: would there be some way for orga to keep track of which characters a specific node maps back to in the source text?
So for example, being able to say that row 6, characters 1 - 10, correspond to a bold node.
Did you ever attempt to go down that route, or from your perspective do you think it's feasible?
If I parse a table with empty cells, these are simply dropped and never make it to the ast. Here is an example:
> var stringify = require('json-stringify-safe');
undefined
> const parse = require('orga')
undefined
> stringify(parse.parse('| | something |'))
'{"type":"root","children":[{"type":"table","children":[{"type":"table.row","children":[{"type":"table.cell","children":[{"type":"text","children":[],"value":"something","parent":"[Circular ' +
'~.children.0.children.0.children.0]"}],"parent":"[Circular ' +
'~.children.0.children.0]"}],"parent":"[Circular ' +
'~.children.0]"}],"parent":"[Circular ~]"}],"meta":{}}'
> stringify(parse.parse('| nonempty | something |'))
'{"type":"root","children":[{"type":"table","children":[{"type":"table.row","children":[{"type":"table.cell","children":[{"type":"text","children":[],"value":"nonempty","parent":"[Circular ' +
'~.children.0.children.0.children.0]"}],"parent":"[Circular ' +
'~.children.0.children.0]"},{"type":"table.cell","children":[{"type":"text","children":[],"value":"something","parent":"[Circular ' +
'~.children.0.children.0.children.1]"}],"parent":"[Circular ' +
'~.children.0.children.0]"}],"parent":"[Circular ' +
'~.children.0]"}],"parent":"[Circular ~]"}],"meta":{}}'
>
As you can see, the first parse returns only a single cell, whereas the second returns two (which is the correct behaviour).
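For comparison, a row splitter that keeps empty cells could look like the sketch below. `splitRow` is a hypothetical helper and its flat `value` field is a simplification of the table.cell nodes shown above.

```javascript
// Hypothetical row splitter that keeps empty cells instead of dropping them.
function splitRow(line) {
  const inner = line.trim().replace(/^\|/, '').replace(/\|$/, '');
  return inner
    .split('|')
    .map((cell) => ({ type: 'table.cell', value: cell.trim() }));
}

console.log(splitRow('| | something |'));
```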
According to docs: https://orgmode.org/guide/Tags.html
Given the following valid Org file which simply includes two other Org files
#+INCLUDE: file.org
#+INCLUDE: anotherfile.org
the following GraphQL entry is created:
"node": {
"fields": {
"slug": "/blog/test/"
},
"meta": {
"include": "anotherfile.org"
}
}
The reference to file.org is not saved. It seems to only save the last entry under that meta key.
According to the Org mode manual, "Org . . . accepts multiple lines for a keyword".
The GraphQL entry should include both values under the key as an array:
"node": {
"fields": {
"slug": "/blog/test/"
},
"meta": {
"include": [
"file.org",
"anotherfile.org"
]
}
}
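The collecting behaviour asked for here can be sketched as below. `collectMeta` is a hypothetical function, not the plugin's actual code; it keeps a keyword that appears once as a string and switches to an array when the keyword repeats.

```javascript
// Hypothetical meta collector: a keyword seen once stays a string, a
// keyword seen several times becomes an array of all its values.
function collectMeta(text) {
  const meta = {};
  for (const line of text.split(/\r\n|\n/)) {
    const m = /^#\+(\w+):\s*(.*)$/.exec(line);
    if (!m) continue;
    const key = m[1].toLowerCase();
    const value = m[2].trim();
    if (!Object.prototype.hasOwnProperty.call(meta, key)) {
      meta[key] = value;
    } else {
      meta[key] = [].concat(meta[key], value);
    }
  }
  return meta;
}

console.log(collectMeta('#+INCLUDE: file.org\n#+INCLUDE: anotherfile.org'));
```

Whether single values should also be wrapped in an array for a stable GraphQL type is a design choice this sketch leaves open.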
Currently, orga does not support variable GraphQL queries since only internal is exposed.
For remark this is valid:
{
  allMarkdownRemark(
    filter: { frontmatter: { category: { in: ["Pandas"] } } }
  ) {
    totalCount
    edges {
      node {
        frontmatter {
          title
          category
          date
        }
      }
    }
  }
}
However, the only option for filtering orga is allOrga ( filter: { internal: {content: { regex: "/CATEGORY:/"}}} )
A variable query such as query CategoryPage($category: String) { ( filter: { internal: {content: { regex: "/CATEGORY:/" regex: $category }}} ) also does not work.
If the meta object could be exposed, similar to the remark frontmatter object, it would be greatly appreciated.
Hello,
Thank you for your effort to provide us with this software! A bridge between org and the web really adds to my org mode experience.
I am currently using orga as part of https://www.gatsbyjs.org/packages/gatsby-transformer-orga/. It works very well, except for footnotes. Markup like [fn:this is a footnote] is rendered fine by Emacs but not at all by orga. See this example.
Your example suggests orga supports footnotes. So, what am I doing wrong?
#+TBLFM after a table should be inside the table node.
Request:
Input org:
* test
$a+b$
\[a+b+\alpha\]
HTML:
<div class="section"><h1>test</h1><p>$a+b$</p><p>\[a+b+\alpha\]</p></div>
Yes, this parser will try to stay as consistent with the org-mode syntax as possible. As for the parser itself, I think the implementation that makes the most sense is to return a node directly:
{
  type: 'latex',
  name: 'equation',
  value: 'x=\sqrt{b}'
}
The actual rendering can then be implemented in the org-mode-to-HTML layer with other packages. That way there is not much work on the parser side.
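A minimal sketch of such a parser-side pass, assuming only `$...$` and `\[...\]` fragments need to be recognized. `parseLatex` and the `inline` name are hypothetical; the `latex` type and `equation` name follow the node shape proposed above. The real org-mode rules for `$` delimiters are stricter than this pattern.

```javascript
// Hypothetical inline pass: turn $...$ and \[...\] fragments into latex
// nodes and leave everything else as text.
const LATEX = /\$([^$\n]+)\$|\\\[([\s\S]+?)\\\]/g;

function parseLatex(text) {
  const nodes = [];
  let last = 0;
  for (const m of text.matchAll(LATEX)) {
    if (m.index > last) {
      nodes.push({ type: 'text', value: text.slice(last, m.index) });
    }
    const inline = m[1] !== undefined;
    nodes.push({
      type: 'latex',
      name: inline ? 'inline' : 'equation',
      value: inline ? m[1] : m[2],
    });
    last = m.index + m[0].length;
  }
  if (last < text.length) {
    nodes.push({ type: 'text', value: text.slice(last) });
  }
  return nodes;
}

console.log(parseLatex('see \\[a+b+\\alpha\\] here'));
```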
Hello! Thank you for the wonderful project that is orga. I am starting a project built on top of orga and so far it has been a pleasure to work with. I have had a few issues though, which are very minor. These include:
Currently, orga does not recognize [/] or [%] as separate objects in the headline. It currently only shows up as part of the headline text.
In org mode, #+TBLFMs need to be placed after the table they act on for them to have any effect. Orga does not respect this and places the #+TBLFM in the root meta. This is an issue because there is no way of knowing which table the formula is acting on. #+TBLFMs should really be placed in the table node instead of just thrown in the root node.
Orga does not recognize date tags that are not either placed after DEADLINE or SCHEDULED and handles them as text. Orga is clearly able to recognize these tags so it should be easy to extend this to date objects outside of DEADLINE or SCHEDULED.
Another issue related to #4 is that orga does not handle repeated dates correctly. Actually for a date like this:
DEADLINE: <2017-12-10 Sun .+1W>
Orga completely ignores this and returns {keyword: "DEADLINE", type: "planning"} with no date stamp.
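A sketch of a planning pattern that tolerates a repeater. `parsePlanning` is hypothetical and not orga's implementation; the unit letters accept both cases only because the example above uses an uppercase W.

```javascript
// Hypothetical planning parser: keyword, date, optional day name, and an
// optional repeater such as +1w, ++2d or .+1W.
const PLANNING =
  /^\s*(DEADLINE|SCHEDULED|CLOSED):\s*<(\d{4}-\d{2}-\d{2})(?: [^\s\d>+.-]+)?(?: ((?:\.\+|\+\+|\+)\d+[hdwmyHDWMY]))?>/;

function parsePlanning(line) {
  const m = PLANNING.exec(line);
  if (!m) return null;
  return {
    type: 'planning',
    keyword: m[1],
    date: m[2],
    repeater: m[3] || null,
  };
}

console.log(parsePlanning('DEADLINE: <2017-12-10 Sun .+1W>'));
```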
Really, thank you for your work on orga. If you need any help fixing these issues I would be happy to be of assistance. Thanks!
I was just trying this on your site: https://orga.js.org
* TODO [#B] Review code :work:
DEADLINE: <2020-07-03 Fri> SCHEDULED: <2020-07-03 Fri>
I get:
{ type: 'root',
children:
[ { type: 'section',
children:
[ { type: 'headline',
children:
[ { type: 'text',
children: [],
value: 'Review code',
parent: [Circular] },
{ type: 'planning',
children: [],
keyword: 'DEADLINE',
parent: [Circular] } ],
level: 1,
keyword: 'TODO',
priority: 'B',
tags: [ 'work' ],
parent: [Circular] } ],
level: 1,
parent: [Circular] } ],
meta: {} }
When I put a newline between the two keywords it looks a bit better, but the SCHEDULED is still not there:
* TODO [#B] Review code :work:
DEADLINE: <2020-07-03 Fri>
SCHEDULED: <2020-07-03 Fri>
results in:
{ type: 'root',
children:
[ { type: 'section',
children:
[ { type: 'headline',
children:
[ { type: 'text',
children: [],
value: 'Review code',
parent: [Circular] },
{ type: 'planning',
children: [],
keyword: 'DEADLINE',
date: Thu Jul 02 2020 00:00:00 GMT+0200 (Central European Summer Time),
end: undefined,
parent: [Circular] } ],
level: 1,
keyword: 'TODO',
priority: 'B',
tags: [ 'work' ],
parent: [Circular] } ],
level: 1,
parent: [Circular] } ],
meta: {} }
I am wondering why they do not get parsed as described in https://orgmode.org/manual/Inserting-deadline_002fschedule.html
Did I misunderstand something, or is it a not-yet-implemented feature, or is it a bug?
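To illustrate the expected result, here is a sketch that scans one line for every planning keyword and its timestamp. `parsePlanningLine` is a hypothetical function and keeps the timestamp as a raw string rather than a Date.

```javascript
// Hypothetical sketch: scan one line for every planning keyword/timestamp
// pair instead of stopping at the first keyword.
const PLANNING_ITEM =
  /\b(DEADLINE|SCHEDULED|CLOSED):\s*(<[^>\n]+>|\[[^\]\n]+\])/g;

function parsePlanningLine(line) {
  return Array.from(line.matchAll(PLANNING_ITEM), (m) => ({
    type: 'planning',
    keyword: m[1],
    timestamp: m[2],
  }));
}

console.log(
  parsePlanningLine('DEADLINE: <2020-07-03 Fri> SCHEDULED: <2020-07-03 Fri>')
);
```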
Thank you for this work! It seems very promising!
I'm using Orga with GatsbyJS. I have the following in my post:
... or /The C++ Programming
Language/.
It is rendered as
... or /The C++ Programming Language/.
But if I remove the newline after Programming, it is correctly rendered as
... or The C++ Programming Language.
I suspect other formatting markup may be broken too.
Hello,
after the transfer to the orgapp organization, links from https://www.gatsbyjs.org/packages/gatsby-transformer-orga/ are not working. It took me some time to find the documentation again; Google wasn't much help :)
It looks like it is taking the info from package.json, as the same info is presented on npmjs.
I'd like to use org-js in a simple html page to render either a string or a file. Do you have a working example of how to do this directly in an html page, using browserify, babel, webpack, or whatever else? Thanks!
Hi @xiaoxinghu
I am starting to work on a similar project, but in Rust. I stumbled upon your project and I was very happy to learn that I am not the only one crazy about org-mode.
The first thing I learned is that Emacs uses a different flavor of regexps (namely GNU ERE, while most programming languages use Perl Compatible Regexps).
I am currently going through https://code.orgmode.org/bzg/org-mode/src/master/lisp/org-element.el, trying to write something similar in Rust.
How did you convert the regexps in org-element-paragraph-separate and org-element--object-regexp to PCRE? Did you use any automatic tools?
Secondly, since you basically did the same job of writing a parser - is there any advice you can share?
It will be my first attempt at writing a parser.
Thanks
TODO is the only current in-buffer setting affected by this, but another unimplemented example would be PRIORITIES.
This is inconsistent with actual org-mode behavior and can cause issues when parsing org files. Correct behavior requires two passes: one to discover in-buffer settings and another to actually parse the file into an AST.
Example:
> const { parse } = require('orga')
[...]
> parse('* TADA Header\n#+TODO: TADA | DANE').children[0].children[0].keyword
undefined
> parse('#+TODO: TADA | DANE\n* TADA Header').children[0].children[0].keyword
'TADA'
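The two-pass idea can be sketched as follows. `parseHeadlines` is a hypothetical, heavily simplified stand-in for the real parser: it only handles #+TODO lines and headlines, and falls back to TODO and DONE when no setting is present.

```javascript
// Hypothetical two-pass sketch: pass 1 collects #+TODO keywords from the
// whole buffer, pass 2 reads headlines using that complete set.
function parseHeadlines(text) {
  const lines = text.split(/\r\n|\n/);
  const todos = [];
  for (const line of lines) {
    const m = /^#\+TODO:\s*(.*)$/.exec(line);
    if (m) todos.push(...m[1].split(/\s+/).filter((w) => w && w !== '|'));
  }
  const keywords = todos.length > 0 ? todos : ['TODO', 'DONE'];
  const headlines = [];
  for (const line of lines) {
    const m = /^(\*+)\s+(.*)$/.exec(line);
    if (!m) continue;
    const [first, ...rest] = m[2].split(/\s+/);
    const hasKeyword = keywords.includes(first);
    headlines.push({
      level: m[1].length,
      keyword: hasKeyword ? first : undefined,
      title: hasKeyword ? rest.join(' ') : m[2],
    });
  }
  return headlines;
}

console.log(parseHeadlines('* TADA Header\n#+TODO: TADA | DANE')[0].keyword);
```

Because the keywords are collected before any headline is read, the position of the #+TODO line in the file no longer matters.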
There is an error, ReferenceError: regeneratorRuntime is not defined, when running gatsby develop with gatsby: 2.0.37; there is no problem on gatsby: 1.9.277.
This is due to issues with todo keyword parsing, I assume.
In #5 you mentioned that "Normally we don't expect to see non-alphanumeric characters in todo keywords", but this isn't true. There's a common use-case for these: https://orgmode.org/manual/Fast-access-to-TODO-states.html and https://orgmode.org/manual/Tracking-TODO-state-changes.html, both of which I'm basically always using.
Because this behavior isn't handled (I assume), the parser takes the todo keyword in the file to be "DONE(d!)", etc., rather than "DONE" with a keybind of "d" that logs the time when it's switched to, and gets confused.
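Separating the keyword from its parenthesized setting could be sketched like this. `parseTodoSpec` is a hypothetical helper and does not interpret the setting beyond returning it as a string.

```javascript
// Hypothetical helper: split a spec such as "DONE(d!)" into the keyword
// and its fast-access/logging setting.
function parseTodoSpec(spec) {
  const m = /^([^\s(]+)(?:\(([^)]*)\))?$/.exec(spec);
  if (!m) return null;
  return { keyword: m[1], setting: m[2] === undefined ? null : m[2] };
}

console.log(parseTodoSpec('DONE(d!)'));
console.log(parseTodoSpec('TODO'));
```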
Hello, thanks for the great work on Orga.
Currently, property elements ("key: value" pairs in the Properties drawer) and clock elements ("CLOCK: [..]") are not recognized/parsed as specific types in the AST output.
It would be nice to have them parsed as a specific type (similar to tags, tables, lists).
For example, the following org-mode file
* modularization of custom functions, speed-up and optimalization :optimalition:module:package:
:LOGBOOK:
CLOCK: [2019-05-24 pá 11:18]--[2019-05-24 pá 11:18] => 0:00
- general info on python modules and packaging
:END:
:PROPERTIES:
:type: enhancement
:dated: <2017-12-02 so>
:END:
package various code snippets to the regular package
modules
- string processsing
- sklearn custom transformers
and its AST (in JSON)
{
"type": "root",
"children": [
{
"type": "section",
"children": [
{
"type": "headline",
"children": [
{
"type": "text",
"children": [],
"value": "modularization of custom functions, speed-up and optimalization",
"parent": "[Circular ~.children.0.children.0]"
},
{
"type": "drawer",
"children": [],
"name": "LOGBOOK",
"value": "CLOCK: [2019-05-24 pá 11:18]--[2019-05-24 pá 11:18] => 0:00\n- general info on python modules and packaging",
"parent": "[Circular ~.children.0.children.0]"
},
{
"type": "drawer",
"children": [],
"name": "PROPERTIES",
"value": ":type: enhancement\n:dated: <2017-12-02 so>",
"parent": "[Circular ~.children.0.children.0]"
}
],
"level": 1,
"tags": [
"optimalition",
"module",
"package"
],
"parent": "[Circular ~.children.0]"
},
{
"type": "paragraph",
"children": [
{
"type": "text",
"children": [],
"value": "package various code snippets to the regular package modules",
"parent": "[Circular ~.children.0.children.1]"
}
],
"parent": "[Circular ~.children.0]"
},
{
"type": "list",
"children": [
{
"type": "list.item",
"children": [
{
"type": "text",
"children": [],
"value": "string processsing",
"parent": "[Circular ~.children.0.children.2.children.0]"
}
],
"ordered": false,
"parent": "[Circular ~.children.0.children.2]"
},
{
"type": "list.item",
"children": [
{
"type": "text",
"children": [],
"value": "sklearn custom transformers",
"parent": "[Circular ~.children.0.children.2.children.1]"
}
],
"ordered": false,
"parent": "[Circular ~.children.0.children.2]"
}
],
"ordered": false,
"parent": "[Circular ~.children.0]"
}
],
"level": 1,
"parent": "[Circular ~]"
}
],
"meta": {}
}
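As an illustration of the requested output, the drawer values above could be post-processed as in the sketch below. `parseProperties` and `parseClock` are hypothetical helpers, and the `clock` node shape is an assumption rather than an agreed format.

```javascript
// Hypothetical drawer-content parsers for PROPERTIES and LOGBOOK values.
function parseProperties(value) {
  const properties = {};
  for (const line of value.split('\n')) {
    const m = /^\s*:([^:\s]+):\s*(.*)$/.exec(line);
    if (m) properties[m[1]] = m[2].trim();
  }
  return properties;
}

function parseClock(line) {
  const m =
    /^\s*CLOCK:\s*\[([^\]]+)\](?:--\[([^\]]+)\])?(?:\s*=>\s*(\d+:\d{2}))?/.exec(line);
  if (!m) return null;
  return {
    type: 'clock',
    start: m[1],
    end: m[2] || null,
    duration: m[3] || null,
  };
}

console.log(parseProperties(':type: enhancement\n:dated: <2017-12-02 so>'));
console.log(parseClock('CLOCK: [2019-05-24 Fri 11:18]--[2019-05-24 Fri 11:18] =>  0:00'));
```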
Currently, orga does not recognize [/] or [%] as separate objects in the headline. It currently only shows up as part of the headline text.
as requested at #36
The minimal example at https://github.com/xiaoxinghu/orgajs/tree/master/examples/example fails for me. Repro on node 10.6.0:
mkdir /tmp/orga-test
cd /tmp/orga-test
yarn add orga-unified oast-to-hast unified to-vfile vfile-reporter rehype-document rehype-stringify
cat > test.js <<EOF
var vfile = require('to-vfile')
var report = require('vfile-reporter')
var unified = require('unified')
var parse = require('orga-unified')
var mutate = require('oast-to-hast')
var stringify = require('rehype-stringify')
var doc = require('rehype-document')
unified()
.use(parse)
.use(mutate)
.use(doc, {title: 'Hi!'})
.use(stringify)
.process(vfile.readSync('./README.org'), function (err, file) {
console.error(report(err || file))
console.log(String(file))
})
EOF
cat > README.org <<EOF
* hello
world
EOF
node test.js
results in
/tmp/orga-test/node_modules/oast-to-hast/lib/index.js:31
var meta = tree.meta || {};
^
TypeError: Cannot read property 'meta' of undefined
at Function.toHAST (/tmp/orga-test/node_modules/oast-to-hast/lib/index.js:31:19)
at freeze (/tmp/orga-test/node_modules/unified/index.js:123:28)
at Function.process (/tmp/orga-test/node_modules/unified/index.js:360:5)
at Object.<anonymous> (/tmp/orga-test/test.js:14:4)
at Module._compile (internal/modules/cjs/loader.js:689:30)
at Object.Module._extensions..js (internal/modules/cjs/loader.js:700:10)
at Module.load (internal/modules/cjs/loader.js:599:32)
at tryModuleLoad (internal/modules/cjs/loader.js:538:12)
at Function.Module._load (internal/modules/cjs/loader.js:530:3)
at Function.Module.runMain (internal/modules/cjs/loader.js:742:12)
Would you be able to provide some updated guidance on how to set up orga for a simple org-string to html conversion?
For context, I am playing around with https://github.com/xiaoxinghu/orgajs/blob/master/packages/orga/src/inline.js to address the performance in #6. I don't understand the unifiedjs toolchain so I am just trying to infer the structure from inline.js alone. By using a character-based parser and bypassing the regex for markups we easily get a 5x speedup, which for large files is still not fast, but no longer prohibitive.
I think another problem with the regex matching is that syntax like /*italic-bold*/ and */bold-italic/* will not get matched, although both are valid org. A stack-based parser should be able to handle these, but I'm not familiar with the testing setup here, so I am looking for a way to get the rendered HTML.
FWIW, the parser I am using looks like this now:
// inline.js
// after `markups` is defined
var inlineMarkups = {};
markups.forEach(function(nameMarker) {
  inlineMarkups[nameMarker.marker.replace('\\', '')] = nameMarker.name;
});
var StackParser = function(patterns) {
  var self = this;
  self.patterns = patterns;
  var ch, last_marker;
  self.parse = function(text) {
    if(Array.isArray(text)) {
      return text.reduce(function (all, node) {
        if (node.hasOwnProperty('type') && node.type != 'text') {
          return all.concat(node);
        }
        return all.concat(self.parse(node));
      }, []);
    }
    var string = text.value;
    var buffer = [];
    var stack = [];
    var nodes = [];
    for(var i=0; i<string.length; ++i) {
      ch = string[i];
      if(stack.length > 0) {
        last_marker = stack[stack.length-1];
      }
      if(ch == last_marker) {
        // end markup
        stack.pop();
        if(buffer.length > 0) {
          nodes.push(new _node2.default(patterns[ch], buffer.join('')));
          buffer = [];
          last_marker = null;
        }
      } else if(inlineMarkups[ch]) {
        // begin markup
        if(buffer.length > 0) {
          nodes.push(new _node2.default('text').with({ value: buffer.join('')}));
        }
        stack.push(ch);
        buffer = [];
      } else {
        buffer.push(ch);
      }
    }
    if(buffer.length > 0) {
      nodes.push(new _node2.default('text').with({ value: buffer.join('')}));
    }
    return nodes;
  };
};
...
for (var _iterator = markups[Symbol.iterator](), _step; !(_iteratorNormalCompletion = (_step = _iterator.next()).done); _iteratorNormalCompletion = true) {
var _ref = _step.value;
var name = _ref.name;
var marker = _ref.marker;
// bypass the old parser
break
_loop(name, marker);
}
var inlineparser = new StackParser(inlineMarkups);
text = inlineparser.parse(text);
I don't know if it emits the AST correctly. It works for trivial cases, but it breaks on, for example, URLs (slashes become italics). Currently I am comparing the result of hast -> hiccup using an ad-hoc converter from a ClojureScript environment, so the testing is tricky. FYI, my converter code is here, but the unified stuff is magic to me. That is why I'm trying the minimal example.
Hi. Really like your work.
I have a question regarding paragraphs. When I parse the following two strings I get the same result.
const ast = parse(`
* This is heading
This is line one
This is line two
`);
and
const ast = parse(`
* This is heading
This is line one This is line two
`);
Both produce the same AST:
console.log(ast.children[0].children[1].children[0]);
// { type: 'text',
// children: [],
// value: 'This is line one This is line two',
// parent:
// { type: 'paragraph',
// children: [ [Circular] ],
// parent: { type: 'section', children: [Array], level: 1, parent: [Object] } } }
I would expect the value to have newlines for the first example. Maybe my logic is wrong, but if we join with a newline instead of a space here, it should be okay.
Let me know if you would like a pull request, and have a nice day!
In org mode, #+TBLFMs need to be placed after the table they act on for them to have any effect. Orga does not respect this and places the #+TBLFM in the root meta. This is an issue because there is no way of knowing which table the formula is acting on. #+TBLFMs should really be placed in the table node instead of just thrown in the root node.
as requested at #36
The example readme.org file is transformed into:
<div>This is an example project.</div><div>#+BEGIN_SRC sh npm install npm run build #+END_SRC</div><div>Take a look at <code>readme.html</code> file.</div>
These divs can't be styled as <p> components, so CSS like paragraph spacing is not applied. It should render to:
<p>This is an example project.</p><p>#+BEGIN_SRC sh npm install npm run build #+END_SRC</p><p>Take a look at <code>readme.html</code> file.</p>
After upgrading gatsbyjs to version 2.3.4 I'm getting UNHANDLED REJECTION when gatsby-transformer-orga is enabled.
❯ gatsby develop
success open and validate gatsby-configs - 0.050 s
success load plugins - 0.235 s
success onPreInit - 0.188 s
success initialize cache - 0.012 s
success copy gatsby files - 0.033 s
success onPreBootstrap - 0.010 s
success source and transform nodes - 0.275 s
error UNHANDLED REJECTION
RangeError: Maximum call stack size exceeded
- RegExp.test
- date.js:93 looksLikeADate
[igloonet.hosting]/[gatsby]/dist/schema/types/date.js:93:109
- example-value.js:161 getType
[igloonet.hosting]/[gatsby]/dist/schema/infer/example-value.js:161:14
- example-value.js:40 nodes.map.node
[igloonet.hosting]/[gatsby]/dist/schema/infer/example-value.js:40:20
- Array.map
- example-value.js:38 Array.from.reduce
[igloonet.hosting]/[gatsby]/dist/schema/infer/example-value.js:38:27
- Array.reduce
- example-value.js:37 getExampleObject
[igloonet.hosting]/[gatsby]/dist/schema/infer/example-value.js:37:44
- example-value.js:96 Array.from.reduce
[igloonet.hosting]/[gatsby]/dist/schema/infer/example-value.js:96:29
- Array.reduce
- example-value.js:37 getExampleObject
[igloonet.hosting]/[gatsby]/dist/schema/infer/example-value.js:37:44
- example-value.js:96 Array.from.reduce
[igloonet.hosting]/[gatsby]/dist/schema/infer/example-value.js:96:29
- Array.reduce
- example-value.js:37 getExampleObject
[igloonet.hosting]/[gatsby]/dist/schema/infer/example-value.js:37:44
- example-value.js:96 Array.from.reduce
[igloonet.hosting]/[gatsby]/dist/schema/infer/example-value.js:96:29
- Array.reduce
Hi there,
The license is clear in all the respective package.json files, but it would be clearer to add a LICENSE.org file at the top level in the repository. It would also work better with various tools.
Hey there,
Thanks again for an awesome project.
Do you think it might be a good idea to allow overriding the AST parsers/transformers for configurability? If someone wanted to touch up the produced HTML they could simply add a transformer plugin to edit the resulting tree. I'm thinking something like this change.
I don't have strong opinions for how to configure use of plugins (where they should be placed, how they are fetched, etc.). I don't have a need for this, so this is more food for thought than anything else. Browsing through the other issues, I wonder if something like this might put the power in the end user's hands. The obvious downside is the potential for incompatibility.
Please let me know what you think.
Regards,
-Doug
Often it makes sense to have multiple sets of TODO keywords in a file, but orga handles this incorrectly: only the last parsed set of TODO keywords takes effect.
Example from the org mode docs ( https://orgmode.org/manual/Per_002dfile-keywords.html )
#+TODO: TODO | DONE
#+TODO: REPORT BUG KNOWNCAUSE | FIXED
#+TODO: | CANCELED
Given how orga's updateTODOs function works, each new #+TODO line simply replaces the keywords parsed from the previous ones. Instead, the keywords should be accumulated across all #+TODO lines; if none have been found by the time headlines are parsed, the set should default to ["TODO", "DONE"].
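A minimal sketch of the accumulating behaviour, assuming a standalone helper that scans the raw text. `collectTodoKeywords` is a hypothetical name and does not reflect how orga's updateTODOs is actually wired into the parser; it also ignores fast-access markers such as `TODO(t)` and does not track which keywords are "done" states.

```javascript
// Hypothetical helper, not part of orga: accumulate keywords from every
// #+TODO line instead of letting the last line win.
function collectTodoKeywords(text) {
  const keywords = [];
  for (const line of text.split("\n")) {
    const match = /^#\+TODO:(.*)$/i.exec(line.trim());
    if (!match) continue;
    for (const word of match[1].split(/\s+/)) {
      // "|" only separates active states from done states; skip it,
      // along with empty strings and duplicates.
      if (word && word !== "|" && !keywords.includes(word)) {
        keywords.push(word);
      }
    }
  }
  // Fall back to the default pair when the file declares nothing.
  return keywords.length > 0 ? keywords : ["TODO", "DONE"];
}

const sample = [
  "#+TODO: TODO | DONE",
  "#+TODO: REPORT BUG KNOWNCAUSE | FIXED",
  "#+TODO: | CANCELED",
  "* REPORT something",
].join("\n");

console.log(collectTodoKeywords(sample));
// [ 'TODO', 'DONE', 'REPORT', 'BUG', 'KNOWNCAUSE', 'FIXED', 'CANCELED' ]
```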
I noticed:
build a emacs-less cli for org-mode (working on it)
Wondering where I might find the WIP code.
I'm just trying out the package (working towards this issue for a VS Code plugin: vscode-org-mode/vscode-org-mode#133). Tests failed until I installed unist-builder and prismjs, so I'm assuming they need to be added to the dependencies list?
I noticed that for some files (e.g. > 10k lines) the parser practically cannot complete. This seems to depend on both the number of lines and their length: a large file with many blank lines can still complete. I don't know if this is specific to orga or comes from unified.
Here's a quick test on how the total parsing time for a trivial file changes with the line length, content, and line count:
const { parse } = require("orga");

// Left-pad a number with zeros to the given width.
function pad(n, width) {
  return String(n).padStart(width, "0");
}

// Each templater builds one line of body text from a zero-padded line number.
const templaters = [
  (n) => `_${n}___${n}_`,
  (n) => `(${n})_(${n})`,
  (_) => "xxxxxx_xxxxxx",
  (_) => "xxxxxx_xxxxxx######_######",
];

templaters.forEach((templater) => {
  for (let nlines = 500; nlines <= 5000; nlines += 500) {
    // The headline counts as line 1, so body lines run from 2 to nlines.
    const lines = ["* blah\n"];
    for (let n = 2; n <= nlines; n++) {
      lines.push(templater(pad(n, 4)));
    }
    const text = lines.join("\n");
    const t0 = Date.now();
    parse(text);
    const dt = Date.now() - t0; // elapsed parse time in milliseconds
    const lastLine = lines[lines.length - 1];
    console.log(`|\t${lastLine}\t|\t${nlines}\t|\t${dt}\t|`);
  }
});
Plotting the output shows this:
I was wondering whether the same regex issue was affecting lines with ( and ), but that does not seem to be the case, although the lines with parens are clearly slower than lines without parens.
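This is the table from the script (the first column was added afterwards; times are in milliseconds):
| set | last line | lines | time (ms) |
|-----+-----------+-------+-----------|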
|under| _0500___0500_ | 500 | 162 |
|under| _1000___1000_ | 1000 | 626 |
|under| _1500___1500_ | 1500 | 1376 |
|under| _2000___2000_ | 2000 | 2474 |
|under| _2500___2500_ | 2500 | 3865 |
|under| _3000___3000_ | 3000 | 5475 |
|under| _3500___3500_ | 3500 | 7521 |
|under| _4000___4000_ | 4000 | 9850 |
|under| _4500___4500_ | 4500 | 12427 |
|under| _5000___5000_ | 5000 | 15190 |
|paren| (0500)_(0500) | 500 | 217 |
|paren| (1000)_(1000) | 1000 | 858 |
|paren| (1500)_(1500) | 1500 | 1955 |
|paren| (2000)_(2000) | 2000 | 3483 |
|paren| (2500)_(2500) | 2500 | 5439 |
|paren| (3000)_(3000) | 3000 | 7850 |
|paren| (3500)_(3500) | 3500 | 10711 |
|paren| (4000)_(4000) | 4000 | 13873 |
|paren| (4500)_(4500) | 4500 | 17560 |
|paren| (5000)_(5000) | 5000 | 21619 |
|xxxxx| xxxxxx_xxxxxx | 500 | 215 |
|xxxxx| xxxxxx_xxxxxx | 1000 | 854 |
|xxxxx| xxxxxx_xxxxxx | 1500 | 1923 |
|xxxxx| xxxxxx_xxxxxx | 2000 | 3429 |
|xxxxx| xxxxxx_xxxxxx | 2500 | 5471 |
|xxxxx| xxxxxx_xxxxxx | 3000 | 7843 |
|xxxxx| xxxxxx_xxxxxx | 3500 | 10564 |
|xxxxx| xxxxxx_xxxxxx | 4000 | 13859 |
|xxxxx| xxxxxx_xxxxxx | 4500 | 17445 |
|xxxxx| xxxxxx_xxxxxx | 5000 | 21576 |
|xand#| xxxxxx_xxxxxx######_###### | 500 | 815 |
|xand#| xxxxxx_xxxxxx######_###### | 1000 | 3211 |
|xand#| xxxxxx_xxxxxx######_###### | 1500 | 7253 |
|xand#| xxxxxx_xxxxxx######_###### | 2000 | 12967 |
|xand#| xxxxxx_xxxxxx######_###### | 2500 | 20093 |
|xand#| xxxxxx_xxxxxx######_###### | 3000 | 28973 |
|xand#| xxxxxx_xxxxxx######_###### | 3500 | 39765 |
|xand#| xxxxxx_xxxxxx######_###### | 4000 | 51717 |
|xand#| xxxxxx_xxxxxx######_###### | 4500 | 65081 |
|xand#| xxxxxx_xxxxxx######_###### | 5000 | 80055 |
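One way to read the table above is to look at how the time changes each time the line count doubles. The sketch below copies the timings at 500, 1000, 2000 and 4000 lines and prints the ratio between consecutive entries; a ratio near 4 per doubling is what roughly quadratic growth would look like. The helper name `doublingRatios` is only illustrative.

```javascript
// Timings (ms) copied from the table above at 500, 1000, 2000 and 4000 lines.
const timings = {
  under: { 500: 162, 1000: 626, 2000: 2474, 4000: 9850 },
  paren: { 500: 217, 1000: 858, 2000: 3483, 4000: 13873 },
  xxxxx: { 500: 215, 1000: 854, 2000: 3429, 4000: 13859 },
  "xand#": { 500: 815, 1000: 3211, 2000: 12967, 4000: 51717 },
};

// For one series, return time(2n) / time(n) for each consecutive pair of sizes.
function doublingRatios(series) {
  const sizes = Object.keys(series).map(Number).sort((a, b) => a - b);
  const ratios = [];
  for (let i = 1; i < sizes.length; i++) {
    ratios.push(series[sizes[i]] / series[sizes[i - 1]]);
  }
  return ratios;
}

for (const [name, series] of Object.entries(timings)) {
  const formatted = doublingRatios(series).map((r) => r.toFixed(2)).join(" ");
  console.log(`${name}: ${formatted}`);
}
```

Every ratio comes out close to 4, for all four line templates, which suggests the cost grows with the square of the line count regardless of the line content; the content mostly changes the constant factor.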
Is there an easy way to fix this?
With the latest version on npm (0.2.7), links inside table cells are not parsed:
const { parse } = require(`orga`);
const orgLink = `[[https://some.link][Some Link]]`;
var ast = parse(orgLink);
console.log(ast.children[0].children[0]); // this works fine
const orgLinkInTable = `
| Text | [[http://I.should.be.link][link in header]] |
|-------------+---------------------------------------------------------------|
| Text Text | [[https://I.should.be.link][link in row]] |
`;
ast = parse(orgLinkInTable);
console.log(ast.children[0].children[0].children[1].children[0]); // link in header
console.log(ast.children[0].children[2].children[1].children[0]); // link in body
output:
{ type: 'link',
children: [],
uri:
{ raw: 'https://some.link',
protocol: 'https',
location: '//some.link' },
desc: 'Some Link',
parent:
{ type: 'paragraph',
children: [ [Circular] ],
parent: { type: 'root', children: [Array], meta: {} } } }
{ type: 'text',
children: [],
value: '[[http://I.should.be.link][link in header]]',
parent:
{ type: 'tableCell',
children: [ [Circular] ],
parent: { type: 'tableRow', children: [Array], parent: [Object] } } }
{ type: 'text',
children: [],
value: '[[https://I.should.be.link][link in row]]',
parent:
{ type: 'tableCell',
children: [ [Circular] ],
parent: { type: 'tableRow', children: [Array], parent: [Object] } } }
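Until table cells are parsed inline, a caller could work around this by post-processing the text nodes found in cells. The sketch below is a hypothetical helper, not part of orga: `splitLinks` only handles the `[[uri][description]]` form, and the link nodes it builds are simplified (only `uri.raw` and `desc` are filled in, unlike the full node shown in the output above).

```javascript
// Hypothetical post-processing helper, not part of orga.
// Matches links of the form [[uri][description]].
const LINK = /\[\[([^\]]+)\]\[([^\]]+)\]\]/g;

// Split a text value into text nodes and simplified link nodes.
function splitLinks(value) {
  const nodes = [];
  let last = 0;
  for (const match of value.matchAll(LINK)) {
    if (match.index > last) {
      nodes.push({ type: "text", value: value.slice(last, match.index) });
    }
    nodes.push({ type: "link", uri: { raw: match[1] }, desc: match[2] });
    last = match.index + match[0].length;
  }
  if (last < value.length) {
    nodes.push({ type: "text", value: value.slice(last) });
  }
  return nodes;
}

console.log(splitLinks("see [[https://I.should.be.link][link in row]] here"));
```

A caller would replace each cell's text child with the nodes returned for its `value`.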
I'm trying out orga for a project that would let me embed the parser on the client side. Unfortunately, the regexps in the parser use lookbehind assertions, something only Chrome supports at the moment.
Is it possible to rewrite those in a more browser-compatible way?
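For illustration, assuming the constructs in question are lookbehind assertions such as `(?<=...)`, a common rewrite is to capture the preceding character in an ordinary group and put it back in the replacement. The pattern and the `markBold` function below are made up for this example and are not taken from orga's source.

```javascript
// Illustrative pattern only; this is not one of orga's actual regexes.
// With lookbehind it would be written as: (?<=^|\s)\*([^*\s]+)\*
// Without lookbehind: capture the preceding character and re-emit it.
const boldWithoutLookbehind = /(^|\s)\*([^*\s]+)\*/g;

function markBold(text) {
  return text.replace(
    boldWithoutLookbehind,
    (_, before, word) => `${before}<b>${word}</b>`
  );
}

console.log(markBold("a *bold* word, but 2*3*4 stays"));
// a <b>bold</b> word, but 2*3*4 stays
```

The trade-off is that the match now includes the preceding character, so any code relying on match offsets has to account for the length of that extra group.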