condenast / atjson
atjson is a living content format for annotating content
Home Page: https://atjson.condenast.io
License: Apache License 2.0
We have a utility in our Commonmark renderer to adjust the boundaries of certain annotations when they would produce an invalid delimiter run. This logic had assumed that the rules for valid delimiter runs were the same regardless of what the specific delimiter character was, but this is not the case.
Here are the rules for delimiters, from least to most restrictive:
If the delimiter is ^ or ~:
If the delimiter is *, **, or ~~:
If the delimiter run is _ or __:
Here are some examples of the correct behavior. Square brackets represent the delimiter boundary, an underscore represents a whitespace character, and a dash represents a punctuation character:
Original | Split for ^, ~ | Split for *, **, ~~ | Split for _, __ |
---|---|---|---|
[_a_b] | _[a_b] | _[a_b] | _[a_b] |
a[-b] | a[-b] | a-[b] | a-[b] |
a[b_c] | a[b_c] | a[b_c] | ab_[c] |
a[bc] | a[bc] | a[bc] | abc[] |
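The rows of the table can be reproduced with a small function. This is only a sketch: the three rules in it are inferred from the examples in the table, not taken from the renderer, and `adjustStart`, `DelimiterKind`, and the ASCII-only character classes are hypothetical names and simplifications.

```typescript
// Hypothetical sketch: the rules below are inferred from the example table,
// not copied from @atjson/renderer-commonmark.
type DelimiterKind = "caret-tilde" | "star" | "underscore";

const isWhitespace = (ch: string | undefined): boolean =>
  ch !== undefined && /\s/.test(ch);
// ASCII punctuation only; a real implementation would also need Unicode classes.
const isPunctuation = (ch: string | undefined): boolean =>
  ch !== undefined && /[!-\/:-@\[-`{-~]/.test(ch);
const isWordChar = (ch: string | undefined): boolean =>
  ch !== undefined && !isWhitespace(ch) && !isPunctuation(ch);

// Returns the adjusted start offset for an annotation covering [start, end).
function adjustStart(
  text: string,
  start: number,
  end: number,
  kind: DelimiterKind
): number {
  let s = start;
  const skipWhitespace = (): void => {
    while (s < end && isWhitespace(text[s])) s++;
  };

  // All delimiters: the opening delimiter can't be followed by whitespace.
  skipWhitespace();
  if (kind === "caret-tilde") return s;

  // *, **, ~~: directly after a word character, don't open before punctuation.
  while (s < end && isWordChar(text[s - 1]) && isPunctuation(text[s])) s++;
  if (kind === "star") return s;

  // _, __: don't open in the middle of a word.
  while (s < end && isWordChar(text[s - 1])) s++;
  skipWhitespace();
  return s;
}
```

With a real space and dash substituted for `_` and `-`, `adjustStart("ab c", 1, 4, "underscore")` moves the boundary to offset 3, which corresponds to the `ab_[c]` cell.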
During our meeting, we discussed improving the developer ergonomics of the @atjson/renderer-react package, and wanted to make subdocuments usable in a more natural React way.
This change requires looking at an annotation to see if it has subdocuments, rendering each subdocument using the components given to the top-level render, and then returning the result as that property. This does change compatibility with the AttributesOf type definition.
So, to summarize, using an annotation like so:
import { ObjectAnnotation } from "@atjson/document";
export class Image extends ObjectAnnotation<{
  src: string;
  caption: CaptionSource;
}> {
  static vendorPrefix = "test";
  static type = "image";
  static subdocuments = { caption: CaptionSource };
}
Previously, the React component would require calling ReactRenderer.render again:
import { AttributesOf } from "@atjson/document";
import ReactRenderer from "@atjson/renderer-react";
import * as React from "react";
import { FC } from "react";
import { Image as Annotation } from "./annotation";
import components from '../components';
export const Image: FC<AttributesOf<Annotation>> = props => {
  return (
    <figure>
      <img src={props.src} />
      <figcaption>{ReactRenderer.render(props.caption, components)}</figcaption>
    </figure>
  );
};
With this proposal, this code could be simplified:
import { AttributesOf } from "@atjson/renderer-react";
import * as React from "react";
import { FC } from "react";
import { Image as Annotation } from "./annotation";
export const Image: FC<AttributesOf<Annotation>> = props => {
  return (
    <figure>
      <img src={props.src} />
      <figcaption>{props.caption}</figcaption>
    </figure>
  );
};
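The renderer-side half of this proposal could look roughly like the following. It is a sketch without any React dependency: `AnnotationLike`, `propsFor`, and the `render` callback are hypothetical stand-ins for the real annotation classes and for `ReactRenderer.render`.

```typescript
// Hypothetical stand-ins; the real @atjson types are richer than this.
interface AnnotationLike {
  attributes: Record<string, unknown>;
  // Mirrors the static `subdocuments` field: keys name subdocument attributes.
  subdocuments: Record<string, unknown>;
}

// Builds component props: every attribute named in `subdocuments` is replaced
// by its rendered output, everything else is passed through unchanged.
function propsFor<R>(
  annotation: AnnotationLike,
  render: (subdocument: unknown) => R
): Record<string, unknown> {
  const props: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(annotation.attributes)) {
    props[key] = key in annotation.subdocuments ? render(value) : value;
  }
  return props;
}

// Usage: the caption arrives at the component already rendered.
const imageProps = propsFor(
  {
    attributes: { src: "test.jpg", caption: { content: "A caption" } },
    subdocuments: { caption: true },
  },
  subdocument => `rendered:${(subdocument as { content: string }).content}`
);
```

In the real renderer the callback would presumably return React nodes rather than strings, which is why the props type of the component has to change.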
To summarize, the suggested changes here are:
@atjson/renderer-react for a React-aware AttributesOf
ReactRenderer.render on any subdocuments that are found.
Currently we don't have this readily available for our performance profiles, which makes it a bit difficult in cases like #394, where function names were changed along with the change. The cumulative summary with the confidence interval would help in this case, so we're more aware of the effect of the changes.
When the annotation being compared has additional attributes, Annotation.equals returns a false positive.
Annotation.equals should return false when the attributes are not strictly equal.
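One way such a false positive can arise is a comparison that only walks the keys of one side, so extra attributes on the other side are never seen. Whether that is the actual cause here is an assumption. The sketch below shows a symmetric deep comparison; `attributesEqual` is a hypothetical helper, not the library's implementation.

```typescript
// Hypothetical helper: strict, symmetric deep equality for attribute values.
function attributesEqual(a: unknown, b: unknown): boolean {
  if (a === b) return true;
  if (typeof a !== "object" || a === null) return false;
  if (typeof b !== "object" || b === null) return false;
  if (Array.isArray(a) !== Array.isArray(b)) return false;

  const left = a as Record<string, unknown>;
  const right = b as Record<string, unknown>;
  const leftKeys = Object.keys(left);

  // Comparing key counts catches attributes that exist on only one side.
  if (leftKeys.length !== Object.keys(right).length) return false;

  return leftKeys.every(
    key =>
      Object.prototype.hasOwnProperty.call(right, key) &&
      attributesEqual(left[key], right[key])
  );
}
```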
We were seeing usage of shortClose causing parsing issues, because the HTML spec restricts short-closed elements to void elements.
The shortClose property should be removed from the $ method in the HTML renderer, and we should short close automatically according to whether the tag name is in the list of valid void elements.
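A minimal sketch of that behaviour follows. The void element list is the one from the HTML Living Standard; `renderTag` and `escapeAttribute` are hypothetical names and do not reflect the signature of the renderer's `$` method.

```typescript
// Void elements as listed in the HTML Living Standard.
const VOID_ELEMENTS = new Set([
  "area", "base", "br", "col", "embed", "hr", "img",
  "input", "link", "meta", "source", "track", "wbr",
]);

function escapeAttribute(value: string): string {
  return value.replace(/&/g, "&amp;").replace(/"/g, "&quot;");
}

// Hypothetical helper: short closes only when the tag is a void element,
// so callers no longer pass a shortClose flag.
function renderTag(
  tagName: string,
  attributes: Record<string, string>,
  children = ""
): string {
  const attrs = Object.entries(attributes)
    .map(([name, value]) => ` ${name}="${escapeAttribute(value)}"`)
    .join("");
  if (VOID_ELEMENTS.has(tagName.toLowerCase())) {
    return `<${tagName}${attrs} />`;
  }
  return `<${tagName}${attrs}>${children}</${tagName}>`;
}
```

With this shape, an empty `div` is always emitted as `<div></div>` and never short closed.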
Commonmark has a concept of "tight" lists, which is a shorthand that says whether there should be surrounding whitespace in a list item or not. We've adopted this into the offset-annotations package, which I believe is a mistake. It has the potential to cause changes in behavior, because a client is unaware of how they should render the markup with / without the tight parameter.
In addition to this, it changes how annotations should be rendered according to the outer context.
The proposal is to wrap list items in paragraphs if tight is false in the converter from markdown to atjson.
The where API (for documents, not collections) is like a relational db api, so indexes make sense for us to keep. We shouldn't create a dsl (currently) for creating indexes.
Reindexing has to happen less often than we call where queries to get the performance benefits.
This will only be beneficial if the index is long-lived. (cf. if the index exists on a collection, the index isn't super useful). In the case of converters, we create a new collection and discard the collection along with the annotations. To see a large improvement with this, we may need to rearchitect consuming code.
Cache busting will be hard, and we'll need to cache-bust the indexes when addAnnotations, removeAnnotations, and replaceAnnotations are called.
Most of the where queries we do are faceted by type, and we suspect that indexing by type will make all consuming code of atjson faster, since we could facet query results by type at a cost of O(1).
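The indexing and cache-busting idea can be sketched as below. This is an illustration under assumptions, not the planned design: `TypeIndexedAnnotations` and `whereType` are hypothetical names, and only `addAnnotations` and `removeAnnotations` are modelled.

```typescript
interface AnnotationLike {
  type: string;
  start: number;
  end: number;
}

// Hypothetical sketch: a lazily built index keyed by annotation type.
class TypeIndexedAnnotations {
  private annotations: AnnotationLike[] = [];
  private byType: Map<string, AnnotationLike[]> | null = null;

  addAnnotations(...added: AnnotationLike[]): void {
    this.annotations.push(...added);
    this.byType = null; // cache-bust: the index is rebuilt on the next query
  }

  removeAnnotations(...removed: AnnotationLike[]): void {
    this.annotations = this.annotations.filter(a => !removed.includes(a));
    this.byType = null; // cache-bust
  }

  // One Map lookup per query once the index exists.
  whereType(type: string): AnnotationLike[] {
    let index = this.byType;
    if (index === null) {
      index = new Map<string, AnnotationLike[]>();
      for (const annotation of this.annotations) {
        const bucket = index.get(annotation.type);
        if (bucket) bucket.push(annotation);
        else index.set(annotation.type, [annotation]);
      }
      this.byType = index;
    }
    return index.get(type) ?? [];
  }
}
```

Rebuilding costs a full pass over the annotations, so the index only pays off when queries outnumber mutations, which matches the caveat about long-lived indexes.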
⚠️ The functional API will not use this and this should not be a breaking change to atjson.
We're working on a CKEditor integration, and strikethrough is failing to work.
A strikethrough annotation is rendered into <s>strikethrough</s>
Strikethroughs are rendered as strikethrough (no tags).
This is a master issue to work through the Condé Nast Open Source Checklist.
Some annotation definitions don't have an annotation schema defined on them.
The following packages have incomplete / incorrectly defined annotation definitions on them:
@atjson/source-gdocs-paste
@atjson/source-mobiledoc
@atjson/source-prism
@atjson/source-url
When I visit https://github.com/CondeNast-Copilot/atjson and click on the link in the repo description (https://condenast-copilot.github.io/atjson), I get redirected to https://condenast-copilot.github.io/atjson/latest and a 404.
I expect to see the latest docs, perhaps like https://condenast-copilot.github.io/atjson/v0.8.2?
404 page
N/A
github
firefox, ubuntu (not relevant here)
As a kickoff of #183, let's have a discussion of how schemas should be structured.
In the branch that I created, the schema interface looks like:
interface SchemaDefinition {
  type: string;
  version: string;
  annotations: {
    [key: string]: typeof Annotation;
  }
}
The other option here is the most minimal approach, which is an annotation lookup table:
interface SchemaDefinition {
  [key: string]: typeof Annotation;
}
Before we start writing this, we want to solicit some feedback and ensure that we understand requirements that we may want on this.
Some general questions:
type to retain this behaviour, or should we sunset that pattern?
@bachbui, @colinarobinson, @blaine have all contributed to this discussion prior to this issue being opened
❤️ Thank you ❤️
In a Google Doc, you can create a nested list like
1. List item
2. List item
   a. Nested list item
   b. Nested list item
3. List item
This is represented in GDocs as a single list with 5 list items, where the outer items have attributes ls_nest: 0 and the nested items have attributes ls_nest: 1. When converting this from the GDocs source to Offset, we drop the ls_nest attribute and just produce a list with 5 elements.
Produce annotations like:
List item\nList item\nNested list item\nNested list item\nList item
                      ^-----item-----^  ^-----item-----^
                      ^------ List { level: 2 } -------^
^-item--^  ^-item--^  ^------------- item -------------^  ^-item--^
^---------------------- List { level: 1 } ------------------------^
Produced annotations like:
List item\nList item\nNested list item\nNested list item\nList item
^-item--^  ^-item--^  ^-----item-----^  ^-----item-----^  ^-item--^
^----------------------------- List ------------------------------^
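The List ranges in the expected output could be derived from the flat items with a stack. The sketch below covers only the List annotations, not the wrapping of nested lists in an item, and it assumes exclusive end offsets; `GDocsListItem`, `ListRange`, and `nestedLists` are hypothetical names.

```typescript
interface GDocsListItem {
  start: number;
  end: number; // exclusive
  ls_nest: number; // 0 for outer items, 1 for the first nested level
}

interface ListRange {
  start: number;
  end: number; // exclusive
  level: number; // ls_nest + 1
}

// Hypothetical sketch: turns a flat run of items into nested List ranges.
function nestedLists(items: GDocsListItem[]): ListRange[] {
  const open: ListRange[] = []; // stack of lists still being extended
  const closed: ListRange[] = [];
  let previousEnd = 0;

  const closeInnermost = (): void => {
    const list = open.pop();
    if (list) {
      list.end = previousEnd;
      closed.push(list);
    }
  };

  for (const item of items) {
    const depth = item.ls_nest + 1;
    // Leaving a nested level: close lists deeper than this item.
    while (open.length > depth) closeInnermost();
    // Entering a nested level: open lists until the depth matches.
    while (open.length < depth) {
      open.push({ start: item.start, end: item.end, level: open.length + 1 });
    }
    previousEnd = item.end;
  }
  while (open.length > 0) closeInnermost();

  return closed.sort((a, b) => a.level - b.level || a.start - b.start);
}
```

For the five items in the example this yields one level 1 list spanning every item and one level 2 list spanning the two nested items.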
Software | Version(s) |
---|---|
Node | 10 |
Lerna | 3.2 |
npm | 6.4 |
Browser | Chrome 80 |
I'd like to propose primarily using the Pandoc document model for source and rendering. Pandoc is a universal document converter covering Markdown, Office, TeX and many more. It has been developed and heavily used for years, so it also covers most edge cases and pitfalls of these formats. Pandoc internally converts document formats to its document model, which can be read and written as JSON, e.g.:
echo '# Hello _World_' | pandoc -t json
For a reference of the model, see this Haskell package. This mapping to Perl, which I wrote a few years ago, may also be of use. So the workflow for converting documents to and from atjson would be:
Once the atjson specification is finished, support for atjson could also be added to the Pandoc source code.
What do you think about using Pandoc for conversion from and to atjson?
This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.
Warning
These dependencies are deprecated:
These updates are awaiting their schedule. Click on a checkbox to get an update now.
(@typescript-eslint/eslint-plugin, @typescript-eslint/parser, eslint, eslint-config-prettier, eslint-plugin-jest)
(@types/react, @types/react-dom, react, react-dom)
(@docusaurus/core, @docusaurus/preset-classic)
(@typescript-eslint/eslint-plugin, @typescript-eslint/parser, eslint, eslint-config-prettier, eslint-plugin-jest, eslint-plugin-prettier)
(@ckeditor/ckeditor5-build-classic, @ckeditor/ckeditor5-engine, @commitlint/cli, @commitlint/config-conventional, @types/markdown-it, @types/node, @types/parse5, @types/prettier, @wordpress/shortcode, actions/checkout, actions/github-script, actions/setup-node, actions/upload-artifact, conventional-changelog-core, entities, husky, jsdom, lerna, lint-staged, prettier, react, react-dom)
These updates have all been created already. Click a checkbox below to force a retry/rebase of any.
(@ckeditor/ckeditor5-build-classic, @ckeditor/ckeditor5-engine)
.github/workflows/ci.yml
actions/checkout v3
actions/setup-node v3
actions/checkout v3
actions/setup-node v3
.github/workflows/docs.yml
actions/checkout v3
actions/setup-node v3
.github/workflows/perf.yml
actions/setup-node v3
actions/checkout v3
actions/checkout v3
actions/github-script v6
actions/upload-artifact v3
.github/workflows/prerelease.yml
actions/github-script v6
actions/github-script v3
actions/checkout v3
actions/setup-node v3
actions/github-script v6
actions/github-script v6
.github/workflows/release.yml
actions/checkout v3
actions/setup-node v3
package.json
@babel/core 7.24.7
@babel/plugin-proposal-class-properties 7.18.6
@babel/preset-env 7.24.7
@babel/preset-react 7.24.7
@babel/preset-typescript 7.24.7
@ckeditor/ckeditor5-build-classic 37.0.1
@ckeditor/ckeditor5-engine 35.3.2
@commitlint/cli 17.8.1
@commitlint/config-conventional 17.8.1
@condenast/perf-kit 0.1.4
@types/chance 1.1.6
@types/entities 2.0.0
@types/jest 29.5.12
@types/jsdom 21.1.7
@types/markdown-it 12.2.3
@types/minimist 1.2.5
@types/node 18.19.39
@types/parse5 6.0.3
@types/prettier 2.7.3
@types/react 18.2.70
@types/react-dom 18.2.22
@types/sax 1.2.7
@types/wordpress__shortcode 2.3.6
@typescript-eslint/eslint-plugin 5.58.0
@typescript-eslint/parser 5.58.0
babel-jest 29.7.0
chance 1.1.11
commonmark 0.31.0
commonmark-spec 0.31.2
conventional-changelog-core 4.2.4
eslint 8.38.0
eslint-config-prettier 8.8.0
eslint-plugin-jest 27.2.1
eslint-plugin-prettier 4.2.1
husky 8.0.3
jest 29.7.0
jest-environment-jsdom 29.7.0
jsdom 21.1.2
lerna 6.6.2
lint-staged 13.2.3
markdown-it 14.1.0
minimist 1.2.8
prettier 2.8.8
react 17.0.2
react-dom 17.0.2
ts-loader 9.4.4
typescript 5.4.5
uuid-random 1.3.2
packages/@atjson/document/package.json
uuid-random ^1.3.0
packages/@atjson/hir/package.json
packages/@atjson/offset-annotations/package.json
packages/@atjson/react/package.json
react *
packages/@atjson/renderer-commonmark/package.json
packages/@atjson/renderer-graphviz/package.json
packages/@atjson/renderer-hir/package.json
packages/@atjson/renderer-html/package.json
entities ^4.3.1
packages/@atjson/renderer-plain-text/package.json
packages/@atjson/renderer-react/package.json
react *
packages/@atjson/renderer-webcomponent/package.json
packages/@atjson/source-ckeditor/package.json
packages/@atjson/source-commonmark/package.json
entities ~4.5.0
markdown-it 14.1.0
packages/@atjson/source-gdocs-paste/package.json
packages/@atjson/source-html/package.json
parse5 ^7.1.2
packages/@atjson/source-mobiledoc/package.json
packages/@atjson/source-prism/package.json
@types/sax ^1.2.7
entities ^4.5.0
sax ^1.3.0
packages/@atjson/source-url/package.json
packages/@atjson/source-wordpress-shortcode/package.json
@wordpress/shortcode 3.54.0
packages/@atjson/util/package.json
website/package.json
@babel/plugin-proposal-class-properties 7.18.6
@babel/plugin-proposal-object-rest-spread 7.20.7
@babel/preset-typescript 7.24.7
@docusaurus/core 3.0.0
@docusaurus/preset-classic 3.0.0
classnames 2.3.2
react 18.2.0
react-dom 18.2.0
resize-observer 1.0.4
styled-components 6.1.11
@types/react 18.2.70
@types/styled-components 5.1.34
.nvmrc
node 20.12.2
The character position within a Unicode string depends on whether the string is normalized and which Unicode normalization form is used. atjson should specify that content strings be normalized, to avoid character position mismatches and to ensure that the same content results in the same character sequence.
When are two content strings assumed to be equivalent? Does atjson recommend or require a Unicode normalization form, and if so, which?
I recommend NFC (Normalization Form Canonical Composition).
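The position mismatch is easy to show with the standard `String.prototype.normalize` method. The `normalizeContent` wrapper is a hypothetical name, and normalizing once before any offsets are computed is an assumption about where this would happen.

```typescript
// Assumption: content is normalized once, before any offsets are computed.
function normalizeContent(content: string): string {
  return content.normalize("NFC");
}

// "e" followed by a combining acute accent (two code units).
const decomposed = "cafe\u0301 au lait";
// NFC composes the pair into the single code unit U+00E9.
const composed = normalizeContent(decomposed);

// The same word starts at a different offset in each form, so an annotation
// computed against one form points at the wrong text in the other.
const offsetInDecomposed = decomposed.indexOf("au"); // 6
const offsetInComposed = composed.indexOf("au"); // 5
```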
We currently have some fairly rudimentary performance regression testing in renderer-commonmark. It turns out that it's really hard to catch these regressions, because they don't cause any test failures, nor do the tests give any indication of whether a change in atjson caused any relative change in performance.
The proposal here is the following:
tests folder
Implement schema interface as decided in #311
Please include type "macros" for grabbing annotation names and annotation classes.
Given a schema:
import { Bold, Italic } from "@atjson/offset-annotations";
const MySchema = {
  annotations: {
    Bold,
    Italic
  }
};
The annotations name type should return a type of "Bold" | "Italic" and the annotations class type should return typeof Bold | typeof Italic.
You can reference the document-in-test branch on this repository for some examples of how to do this. Ask @tim-evans if you have questions on handling this via conditional types.
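One possible shape for these type "macros" is sketched below. It uses indexed access types rather than conditional types, so it may differ from what the document-in-test branch does; `SchemaLike`, `AnnotationNamesOf`, `AnnotationClassesOf`, and `annotationNames` are hypothetical names, and the `Bold` and `Italic` classes are local stand-ins for the ones in @atjson/offset-annotations.

```typescript
// Local stand-ins for the real annotation classes.
class Bold {
  static type = "bold";
}
class Italic {
  static type = "italic";
}

const MySchema = {
  annotations: {
    Bold,
    Italic,
  },
};

// Hypothetical constraint: a schema is anything with a table of constructors.
type SchemaLike = {
  annotations: Record<string, new (...args: any[]) => unknown>;
};

// Union of the annotation names, e.g. "Bold" | "Italic".
type AnnotationNamesOf<S extends SchemaLike> = keyof S["annotations"] & string;

// Union of the annotation classes, e.g. typeof Bold | typeof Italic.
type AnnotationClassesOf<S extends SchemaLike> =
  S["annotations"][keyof S["annotations"]];

// Runtime counterpart of the name type.
function annotationNames<S extends SchemaLike>(schema: S): AnnotationNamesOf<S>[] {
  return Object.keys(schema.annotations) as AnnotationNamesOf<S>[];
}

// Both assignments type-check only because the unions are what we expect.
const someName: AnnotationNamesOf<typeof MySchema> = "Bold";
const someClass: AnnotationClassesOf<typeof MySchema> = Italic;
```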
Conceptually, renderers are a more general case of converters, being essentially a function from a document to any type. With the idea from #285 of adding additional safeguards and guarantees around renderer implementations, writing converters as renderers could help ensure that the converter satisfies some useful properties such as handling all the possible annotations in the source document.
(Written by @colinarobinson ❤️)
There is potential to use atjson strictly as a document manipulation tool for any format for which we have a source defined. For example:
// Add target "_blank" to all links
let doc = HTMLSource.fromRaw(someHtmlString);
let links = doc.where({type: "-html-a"});
links.update(link => {
  link.attributes.target = "_blank";
});
One thing that makes this difficult currently is that in order to render this back to HTML, we have to convert the document to a common format, which seems unnecessary. It would be nice if each source could define its own "natural" or "native" renderer which acts on the schema of the source rather than on a common format schema. We have currently only been writing renderers acting on a common format schema in order to avoid the temptation of writing converters between every pair of document formats, but it seems this proposed natural renderer still is in line with that philosophy since it only acts on the source schema.
Currently, there is nothing stopping users from adding these renderer definitions. I wonder if we would want to formalize it in the api somehow:
// Add target "_blank" to all links
let doc = HTMLSource.fromRaw(someHtmlString);
let links = doc.where({type: "-html-a"});
links.update(link => {
  link.attributes.target = "_blank";
});
let newHtmlString = doc.render(); // calls the natural renderer of the source doc
It is possible that the natural rendering of a document produces different results than converting to a common format and using the existing renderers. I'm not sure if that is a problem or not.
         Converter
     Source ---> Common Format
        \             |
         \            |
          \           |  HTMLRenderer
  Natural  \          |
  Renderer  \         |
             === html?
We often have to determine whether two documents are equivalent, which is exposed as document.equals(). This is implemented by comparing the canonical versions of the documents and checking that their content and annotations are equal. For annotations, this equivalency is implemented by checking their start and end positions match, and then doing a deep comparison of their attributes properties.
However, an annotation might have some properties, particularly in their attributes, which might not represent a meaningful difference. For example, if an annotation was created during a conversion, it is sometimes useful to include some properties from the original annotation in the converted version as signposts for verification. These properties should be ignored when determining if two annotations are equivalent.
It's currently possible to override equals on the annotation but we could provide nicer hooks. One possibility is to add a declarative API to annotations where one could list these 'non-data' attributes.
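A declarative list of 'non-data' attributes could feed an equality check like the one sketched here. The names `AnnotationLike`, `equalsIgnoring`, and `nonDataAttributes` are hypothetical, and the comparison of attribute values is shallow for brevity, whereas the real check is described above as a deep comparison.

```typescript
interface AnnotationLike {
  start: number;
  end: number;
  attributes: Record<string, unknown>;
}

// Hypothetical sketch: equality that skips attributes listed as non-data.
function equalsIgnoring(
  a: AnnotationLike,
  b: AnnotationLike,
  nonDataAttributes: string[]
): boolean {
  if (a.start !== b.start || a.end !== b.end) return false;

  const ignored = new Set(nonDataAttributes);
  const dataKeys = (annotation: AnnotationLike): string[] =>
    Object.keys(annotation.attributes).filter(key => !ignored.has(key));

  const aKeys = dataKeys(a);
  // Equal counts plus "every key of a matches in b" implies the same key set.
  if (aKeys.length !== dataKeys(b).length) return false;
  return aKeys.every(
    key => key in b.attributes && a.attributes[key] === b.attributes[key]
  );
}
```

An annotation class could then declare something like a static list of attribute names, and the default equals would pass that list through.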
There is an error with this repository's Renovate configuration that needs to be fixed. As a precaution, Renovate will stop PRs until it is resolved.
Error type: undefined. Note: this is a nested preset so please contact the preset author if you are unable to fix it yourself.
Package names and project layout after #57 is merged.
Annotation classes begin the process of binding sources and renderers more tightly together. It's becoming clearer to me that it's easier to move renderers and sources into the same package.
renderer-commonmark + source-commonmark = format-commonmark
There are also some questions about where render code for annotations should go. One thought is that rendering code for annotations should live next to the annotations. A possible file structure for this is:
📁 annotations
  📁 bold
    📄 annotation.ts
    📄 component.ts
    📄 style.css
    📄 template.html
This is a tracking issue for all documentation that needs some love!
Please comment or add your handle next to the line item that you're taking ❤️
Our brands use Apple News as a distribution, and we have expertise in publishing to Apple News. We have a service at Condé Nast that supports this, and I'm interested in outlining a more holistic approach to Apple News that provides better support for folks.
From a high level, I'd like to provide the following:
article.json that can be sent to Apple for use on the News app.
article.json using a React application by leveraging atjson's React renderer.
article.json so there can be rich preview (and eventual addition / manipulation) using atjson
I played around with type definitions for the Apple News Format, and ended up creating a programmatically generated type definition file from Apple's own documentation, which can be found in this gist. The code used to generate this file can be found here.
Using these definitions, we can create annotations for the Apple News format. I recommend that the renderer and source for Apple News use the same annotations. This means that if you want to render to Apple News, you should convert your schema into the Apple News annotations.
At a high level, I expect us to have annotations for Components and the ArticleDocument. There's some additional parsing for handling inline formatting provided by HTML or markdown, which is fairly minimal (bold, italic, strikethrough, links, and a few others).
We can then use the React renderer to render a facsimile of how the article would appear on the News app. This would involve a bunch of work building out components for Apple News. For our teams at Condé, it would be very beneficial, because they can preview their content as it would (approximately) appear on Apple News before publishing or sending it to Apple.
In addition, they would be able to catch errors with the document, because the preview would be using the same toolchain as our service delivering the article.json to Apple.
The goal here is to:
I was asked by @balaclark about writing a test against a React component that maps to the Image annotation, and looked into the commonmark spec again for what the alt text should do. It turns out that commonmark expects the alt text to have markdown stripped when parsing.
Image#attributes.description should be a string.
Note: We are also seeing failures related to this on our nightly job that verifies that we can rewrite our content using converters that we've defined.
Image#attributes.description is a Document and renders markdown, which is supposed to be stripped according to https://spec.commonmark.org/0.29/#image-description
![Markdown **is stripped** from *this*](test.jpg)
{
  "content": "",
  "contentType": "application/vnd.atjson+commonmark",
  "annotations": [
    {
      "type": "-commonmark-image",
      "start": 0,
      "end": 0,
      "attributes": {
        "-commonmark-src": "test.jpg",
        "-commonmark-alt": "Markdown is stripped from this"
      }
    },
    {
      "type": "-commonmark-paragraph",
      "start": 0,
      "end": 0,
      "attributes": {}
    }
  ]
}
@neilius encountered an issue where it's impossible to insert text between two adjacent annotations, and have the text not covered by either.
Given two annotations, { start: 10, end: 20 } and { start: 20, end: 30 }, it should be possible to insert text at position 20 and have the resulting annotations be { start: 10, end: 20 } and { start: 21, end: 31 }.
This is not the default insertion behaviour, but it should be an option.
Instead, we're able to modify the boundary behaviour for one or the other annotation, but not both. The default behaviour of insertText results in { start: 10, end: 21 } and { start: 21, end: 31 } (the first annotation is extended right, and the second annotation's coverage is unmodified), and the AdjacentBoundaryBehaviour.preserve behaviour results in { start: 10, end: 20 } and { start: 20, end: 31 } (i.e., the first annotation is unmodified and the second annotation is extended left to include the new text).
Before: https://files.slack.com/files-pri/T5Y8VC3HU-FT9GKNJ9M/before.json
After (with preserve): https://files.slack.com/files-pri/T5Y8VC3HU-FT7MA4HMK/after.json
@tim-evans @neilius and I discussed, and agreed that a good possible solution is to add a (backwards-compatible) way to specify boundary behaviour for text insertion for adjacent annotations both before and after the insertion. This should clarify the relevant bit of code, since currently both before and after boundaries are handled in the same method, despite being subtly different in their handling.
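Specifying the behaviour separately for each side of the insertion could look like the sketch below. It is not the existing insertText implementation: `insertTextAt`, `Boundary`, and the `endingAt` / `startingAt` option names are hypothetical, and zero-length annotations are not treated specially.

```typescript
type Boundary = "exclude" | "extend";

interface Range {
  start: number;
  end: number;
}

// Hypothetical sketch: shifts annotation ranges for `length` characters of
// text inserted at `position`, with one boundary rule per side.
function insertTextAt(
  ranges: Range[],
  position: number,
  length: number,
  options: { endingAt: Boundary; startingAt: Boundary }
): Range[] {
  return ranges.map(({ start, end }) => {
    // An annotation starting at the insertion point either moves right
    // ("exclude") or keeps its start and swallows the new text ("extend").
    const moveStart =
      start > position ||
      (start === position && options.startingAt === "exclude");
    // An annotation ending at the insertion point either keeps its end
    // ("exclude") or grows to cover the new text ("extend").
    const moveEnd =
      end > position || (end === position && options.endingAt === "extend");
    return {
      start: moveStart ? start + length : start,
      end: moveEnd ? end + length : end,
    };
  });
}
```

Under these assumptions, the current default corresponds to `{ endingAt: "extend", startingAt: "exclude" }`, preserve corresponds to `{ endingAt: "exclude", startingAt: "extend" }`, and the requested behaviour is `"exclude"` on both sides.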
A lot of websites have embed codes to embed their content directly into other apps, and we currently don't do anything smart about it for atjson.
The proposal here is to detect and convert a set of very common embed codes so folks can paste these codes directly in and have it be understood correctly.
<iframe width="560" height="315" src="https://www.youtube.com/embed/BriBDiBxaMY" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
<blockquote class="instagram-media" data-instgrm-captioned data-instgrm-permalink="https://www.instagram.com/p/B3M8RM-HqdP/?utm_source=ig_embed&utm_campaign=loading" data-instgrm-version="12" style=" background:#FFF; border:0; border-radius:3px; box-shadow:0 0 1px 0 rgba(0,0,0,0.5),0 1px 10px 0 rgba(0,0,0,0.15); margin: 1px; max-width:540px; min-width:326px; padding:0; width:99.375%; width:-webkit-calc(100% - 2px); width:calc(100% - 2px);"><div style="padding:16px;"> <a href="https://www.instagram.com/p/B3M8RM-HqdP/?utm_source=ig_embed&utm_campaign=loading" style=" background:#FFFFFF; line-height:0; padding:0 0; text-align:center; text-decoration:none; width:100%;" target="_blank"> <div style=" display: flex; flex-direction: row; align-items: center;"> <div style="background-color: #F4F4F4; border-radius: 50%; flex-grow: 0; height: 40px; margin-right: 14px; width: 40px;"></div> <div style="display: flex; flex-direction: column; flex-grow: 1; justify-content: center;"> <div style=" background-color: #F4F4F4; border-radius: 4px; flex-grow: 0; height: 14px; margin-bottom: 6px; width: 100px;"></div> <div style=" background-color: #F4F4F4; border-radius: 4px; flex-grow: 0; height: 14px; width: 60px;"></div></div></div><div style="padding: 19% 0;"></div> <div style="display:block; height:50px; margin:0 auto 12px; width:50px;"><svg width="50px" height="50px" viewBox="0 0 60 60" version="1.1" xmlns="https://www.w3.org/2000/svg" xmlns:xlink="https://www.w3.org/1999/xlink"><g stroke="none" stroke-width="1" fill="none" fill-rule="evenodd"><g transform="translate(-511.000000, -20.000000)" fill="#000000"><g><path d="M556.869,30.41 C554.814,30.41 553.148,32.076 553.148,34.131 C553.148,36.186 554.814,37.852 556.869,37.852 C558.924,37.852 560.59,36.186 560.59,34.131 C560.59,32.076 558.924,30.41 556.869,30.41 M541,60.657 C535.114,60.657 530.342,55.887 530.342,50 C530.342,44.114 535.114,39.342 541,39.342 C546.887,39.342 551.658,44.114 551.658,50 C551.658,55.887 546.887,60.657 
541,60.657 M541,33.886 C532.1,33.886 524.886,41.1 524.886,50 C524.886,58.899 532.1,66.113 541,66.113 C549.9,66.113 557.115,58.899 557.115,50 C557.115,41.1 549.9,33.886 541,33.886 M565.378,62.101 C565.244,65.022 564.756,66.606 564.346,67.663 C563.803,69.06 563.154,70.057 562.106,71.106 C561.058,72.155 560.06,72.803 558.662,73.347 C557.607,73.757 556.021,74.244 553.102,74.378 C549.944,74.521 548.997,74.552 541,74.552 C533.003,74.552 532.056,74.521 528.898,74.378 C525.979,74.244 524.393,73.757 523.338,73.347 C521.94,72.803 520.942,72.155 519.894,71.106 C518.846,70.057 518.197,69.06 517.654,67.663 C517.244,66.606 516.755,65.022 516.623,62.101 C516.479,58.943 516.448,57.996 516.448,50 C516.448,42.003 516.479,41.056 516.623,37.899 C516.755,34.978 517.244,33.391 517.654,32.338 C518.197,30.938 518.846,29.942 519.894,28.894 C520.942,27.846 521.94,27.196 523.338,26.654 C524.393,26.244 525.979,25.756 528.898,25.623 C532.057,25.479 533.004,25.448 541,25.448 C548.997,25.448 549.943,25.479 553.102,25.623 C556.021,25.756 557.607,26.244 558.662,26.654 C560.06,27.196 561.058,27.846 562.106,28.894 C563.154,29.942 563.803,30.938 564.346,32.338 C564.756,33.391 565.244,34.978 565.378,37.899 C565.522,41.056 565.552,42.003 565.552,50 C565.552,57.996 565.522,58.943 565.378,62.101 M570.82,37.631 C570.674,34.438 570.167,32.258 569.425,30.349 C568.659,28.377 567.633,26.702 565.965,25.035 C564.297,23.368 562.623,22.342 560.652,21.575 C558.743,20.834 556.562,20.326 553.369,20.18 C550.169,20.033 549.148,20 541,20 C532.853,20 531.831,20.033 528.631,20.18 C525.438,20.326 523.257,20.834 521.349,21.575 C519.376,22.342 517.703,23.368 516.035,25.035 C514.368,26.702 513.342,28.377 512.574,30.349 C511.834,32.258 511.326,34.438 511.181,37.631 C511.035,40.831 511,41.851 511,50 C511,58.147 511.035,59.17 511.181,62.369 C511.326,65.562 511.834,67.743 512.574,69.651 C513.342,71.625 514.368,73.296 516.035,74.965 C517.703,76.634 519.376,77.658 521.349,78.425 C523.257,79.167 525.438,79.673 528.631,79.82 
C531.831,79.965 532.853,80.001 541,80.001 C549.148,80.001 550.169,79.965 553.369,79.82 C556.562,79.673 558.743,79.167 560.652,78.425 C562.623,77.658 564.297,76.634 565.965,74.965 C567.633,73.296 568.659,71.625 569.425,69.651 C570.167,67.743 570.674,65.562 570.82,62.369 C570.966,59.17 571,58.147 571,50 C571,41.851 570.966,40.831 570.82,37.631"></path></g></g></g></svg></div><div style="padding-top: 8px;"> <div style=" color:#3897f0; font-family:Arial,sans-serif; font-size:14px; font-style:normal; font-weight:550; line-height:18px;"> View this post on Instagram</div></div><div style="padding: 12.5% 0;"></div> <div style="display: flex; flex-direction: row; margin-bottom: 14px; align-items: center;"><div> <div style="background-color: #F4F4F4; border-radius: 50%; height: 12.5px; width: 12.5px; transform: translateX(0px) translateY(7px);"></div> <div style="background-color: #F4F4F4; height: 12.5px; transform: rotate(-45deg) translateX(3px) translateY(1px); width: 12.5px; flex-grow: 0; margin-right: 14px; margin-left: 2px;"></div> <div style="background-color: #F4F4F4; border-radius: 50%; height: 12.5px; width: 12.5px; transform: translateX(9px) translateY(-18px);"></div></div><div style="margin-left: 8px;"> <div style=" background-color: #F4F4F4; border-radius: 50%; flex-grow: 0; height: 20px; width: 20px;"></div> <div style=" width: 0; height: 0; border-top: 2px solid transparent; border-left: 6px solid #f4f4f4; border-bottom: 2px solid transparent; transform: translateX(16px) translateY(-4px) rotate(30deg)"></div></div><div style="margin-left: auto;"> <div style=" width: 0px; border-top: 8px solid #F4F4F4; border-right: 8px solid transparent; transform: translateY(16px);"></div> <div style=" background-color: #F4F4F4; flex-grow: 0; height: 12px; width: 16px; transform: translateY(-4px);"></div> <div style=" width: 0; height: 0; border-top: 8px solid #F4F4F4; border-left: 8px solid transparent; transform: translateY(-4px) translateX(8px);"></div></div></div></a> <p 
style=" margin:8px 0 0 0; padding:0 4px;"> <a href="https://www.instagram.com/p/B3M8RM-HqdP/?utm_source=ig_embed&utm_campaign=loading" style=" color:#000; font-family:Arial,sans-serif; font-size:14px; font-style:normal; font-weight:normal; line-height:17px; text-decoration:none; word-wrap:break-word;" target="_blank">4 of 6 “I consider myself [to be] a crazy cat lady: I have six cats and a dog. I have a strong connection with them. They call me the ‘cat whisperer’. We understand each other without understanding each other. We don’t need language. It’s another level of love. ⠀ “I have a [tattoo of] cat on my arm that says: ‘Doing things? No thanks.’ I have an Arabic tattoo that says: ‘Why are you frowning?’ I have a cartoon by Coco Capitán that says: ‘Is it tomorrow yet?’ I have a lot of sarcastic tattoos. I know this is a bit macabre—but I know I’m going to die. I know that my body is something I can play with in the meantime.” ⠀ Follow Beirut-based model @noursaliba_ and her story this week on @vogue.</a></p> <p style=" color:#c9c8cd; font-family:Arial,sans-serif; font-size:14px; line-height:17px; margin-bottom:0; margin-top:8px; overflow:hidden; padding:8px 0 7px; text-align:center; text-overflow:ellipsis; white-space:nowrap;">A post shared by <a href="https://www.instagram.com/vogue/?utm_source=ig_embed&utm_campaign=loading" style=" color:#c9c8cd; font-family:Arial,sans-serif; font-size:14px; font-style:normal; font-weight:normal; line-height:17px;" target="_blank"> Vogue International</a> (@vogue) on <time style=" font-family:Arial,sans-serif; font-size:14px; line-height:17px;" datetime="2019-10-04T16:00:27+00:00">Oct 4, 2019 at 9:00am PDT</time></p></div></blockquote> <script async src="//www.instagram.com/embed.js"></script>
<blockquote class="twitter-tweet"><p lang="en" dir="ltr">i just keep getting hotter and smarter</p>— skelejenn (@jennschiffer) <a href="https://twitter.com/jennschiffer/status/708888255828250625?ref_src=twsrc%5Etfw">March 13, 2016</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
<iframe src="https://www.facebook.com/plugins/post.php?href=https%3A%2F%2Fwww.facebook.com%2FCondeNastTraveler%2Fposts%2F10157914140568982&width=500" width="500" height="491" style="border:none;overflow:hidden" scrolling="no" frameborder="0" allowTransparency="true" allow="encrypted-media"></iframe>
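Detection for the two iframe-based codes above could start as simply as the sketch below. It is a rough illustration under assumptions: `detectIframeEmbed` and the `Embed` shape are hypothetical, the URL patterns are derived only from the samples above, and a real implementation would more likely work on parsed HTML than on a regular expression.

```typescript
interface Embed {
  service: "youtube" | "facebook";
  url: string;
}

// Hypothetical sketch: recognizes an embed from the src of the first iframe.
function detectIframeEmbed(html: string): Embed | null {
  const match = /<iframe\b[^>]*\bsrc="([^"]+)"/i.exec(html);
  if (!match) return null;

  const url = match[1];
  if (/^https:\/\/www\.youtube\.com\/embed\//.test(url)) {
    return { service: "youtube", url };
  }
  if (/^https:\/\/www\.facebook\.com\/plugins\/post\.php\?/.test(url)) {
    return { service: "facebook", url };
  }
  return null; // unknown iframe: leave it to the generic HTML handling
}
```

The blockquote-based Instagram and Twitter codes would need a different entry point, for example matching on the `class` attribute of the blockquote.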
<iframe>
code.
We currently have a pretty dope profiling framework, thanks mostly to @bachbui and @colinarobinson, which we'd like to use to profile other parts of the atjson ecosystem. (Also for other JS libs that we'd like to profile!)
Tasks:
#269 introduced some checking to atjson that was resilient to whether the annotation constructor matched. We have also found during our testing that instanceof is a fairly expensive operation, and we would like to do this as little as possible.
I had the idea to generalize the problem of isBold / isItalic to an is method:
function is<T extends AnnotationConstructor>(
  annotation: Annotation<any>,
  Class: T
): annotation is InstanceType<T> {
  let AnnotationClass = annotation.constructor as AnnotationConstructor;
  return AnnotationClass.vendorPrefix === Class.vendorPrefix &&
    annotation.type === Class.type;
}
We could then use this like so:
import { is } from "@atjson/document";
if (is(annotation, Bold)) {
  // This is now type narrowed to "Bold"!
}
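To make the difference from instanceof concrete, here is a minimal, self-contained sketch. The Annotation, Bold, Italic, and ForeignBold classes below are simplified stand-ins invented for illustration, not the real @atjson/document classes; only the is function follows the proposal above.

```typescript
// Hypothetical, simplified stand-ins for atjson's annotation classes.
type AnnotationConstructor = {
  new (start: number, end: number): Annotation;
  vendorPrefix: string;
  type: string;
};

abstract class Annotation {
  static vendorPrefix: string;
  static type: string;
  constructor(public start: number, public end: number) {}
  // The instance type is read from the constructor's static `type`.
  get type(): string {
    return (this.constructor as AnnotationConstructor).type;
  }
}

class Bold extends Annotation {
  static vendorPrefix = "test";
  static type = "bold";
}

class Italic extends Annotation {
  static vendorPrefix = "test";
  static type = "italic";
}

// A second class with the same identity, e.g. loaded from a duplicate bundle.
class ForeignBold extends Annotation {
  static vendorPrefix = "test";
  static type = "bold";
}

// Compares vendor prefix and type instead of walking the prototype chain.
function is<T extends AnnotationConstructor>(
  annotation: Annotation,
  Class: T
): annotation is InstanceType<T> {
  const AnnotationClass = annotation.constructor as AnnotationConstructor;
  return (
    AnnotationClass.vendorPrefix === Class.vendorPrefix &&
    annotation.type === Class.type
  );
}

const annotation: Annotation = new ForeignBold(0, 4);
console.log(annotation instanceof Bold); // false
console.log(is(annotation, Bold)); // true
console.log(is(annotation, Italic)); // false
```

The ForeignBold case is the one that matters for the resilience described in #269: the constructors differ, but the vendor prefix and type agree.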
Currently, when we're doing conversions, we have a big grab bag of attributes that are semi-messily put into the same property on the annotation: attributes.
We've encountered a bunch of issues doing so, where we've added properties that aren't meaningful to the output to the annotation for metrics purposes, which have caused some issues with determining annotation equality. (Side note: we're using canonical document equality to determine whether we can edit a document in our rich text editor)
Another issue that has come up a few times is a TypeScript issue where we are casting an attribute as any to get around the type system, because we're using one of these "grab bag" attributes.
The suggestion here is to add a graveyard / tombstoned attributes property to the Annotation class that will serve as a way to safely access these attributes. These attributes will not affect annotation equality (that is, if two annotations are identical except for their tombstoned attributes, they will be treated as if they were equal).
Add a new property to annotations that is set on initialization of an annotation called $attributes. This will contain all attributes that are in a different "vendor space" than the current annotation. (This is the first pass of this feature, since doing this more correctly requires us to have a schema that we can directly read from in the code).
abstract class Annotation<Attributes = {}> {
  id: string;
  start: number;
  end: number;
  attributes: Attributes;
  $attributes: { [key: string]: unknown };
}
$attributes will be serialized as-is into JSON, resulting in a JSON object that looks like: attributes & $attributes.
For example:
import HTMLSource from "@atjson/source-html";
import OffsetSource from "@atjson/offset-annotations";
let doc = HTMLSource
  .fromRaw(`<h1 style="text-align:center;">Guilt</h1>`)
  .convertTo(OffsetSource);
let [heading] = doc.annotations.where({ type: "-offset-heading" });
console.log(heading);
// Heading {
//   start: 0,
//   end: 41,
//   attributes: {},
//   $attributes: {
//     "-html-style": "text-align:center;"
//   }
// }
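As a rough illustration of the two behaviours described above (splitting attributes by vendor space on initialization, and ignoring $attributes for equality), here is a minimal sketch on plain objects. The helper names splitAttributes and annotationsEqual, the AnnotationLike shape, and the assumption that serialized keys look like -vendor-name are all hypothetical and not part of atjson's API.

```typescript
type AttributeBag = { [key: string]: unknown };

interface SplitAttributes {
  attributes: AttributeBag;
  $attributes: AttributeBag;
}

// Assumption: keys are serialized as "-<vendorPrefix>-<name>".
// Keys in the annotation's own vendor space become `attributes` (prefix
// stripped); everything else is kept verbatim in `$attributes`.
function splitAttributes(vendorPrefix: string, raw: AttributeBag): SplitAttributes {
  const ownPrefix = `-${vendorPrefix}-`;
  const result: SplitAttributes = { attributes: {}, $attributes: {} };
  for (const [key, value] of Object.entries(raw)) {
    if (key.startsWith(ownPrefix)) {
      result.attributes[key.slice(ownPrefix.length)] = value;
    } else {
      result.$attributes[key] = value;
    }
  }
  return result;
}

interface AnnotationLike extends SplitAttributes {
  type: string;
  start: number;
  end: number;
}

// Shallow, key-order-insensitive comparison of two attribute bags.
function shallowEqual(a: AttributeBag, b: AttributeBag): boolean {
  const aKeys = Object.keys(a);
  return (
    aKeys.length === Object.keys(b).length &&
    aKeys.every(
      (key) => Object.prototype.hasOwnProperty.call(b, key) && a[key] === b[key]
    )
  );
}

// Equality deliberately never looks at `$attributes`.
function annotationsEqual(a: AnnotationLike, b: AnnotationLike): boolean {
  return (
    a.type === b.type &&
    a.start === b.start &&
    a.end === b.end &&
    shallowEqual(a.attributes, b.attributes)
  );
}

const withStyle: AnnotationLike = {
  type: "heading",
  start: 0,
  end: 41,
  ...splitAttributes("offset", { "-offset-level": 1, "-html-style": "text-align:center;" }),
};
const withoutStyle: AnnotationLike = {
  type: "heading",
  start: 0,
  end: 41,
  ...splitAttributes("offset", { "-offset-level": 1 }),
};
console.log(annotationsEqual(withStyle, withoutStyle)); // true
```

A real implementation would presumably compare nested attribute values deeply; the shallow comparison here only keeps the sketch short.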
Following up on #307, we should handle a longer list of unicode whitespace characters:
Name | Code Point | Entity | Size |
---|---|---|---|
No Break Space | \u00A0 |   | 👉 👈 |
En Quad | \u2000 |   | 👉 👈 |
Em Quad | \u2001 |   | 👉 👈 |
En Space | \u2002 |   | 👉 👈 |
Em Space | \u2003 |   | 👉 👈 |
Thick Space | \u2004 |   | 👉 👈 |
Mid Space | \u2005 |   | 👉 👈 |
Six-per-em Space | \u2006 |   | 👉 👈 |
Figure Space | \u2007 |   | 👉 👈 |
Punctuation Space | \u2008 |   | 👉 👈 |
Thin Space | \u2009 |   | 👉 👈 |
Hair Space | \u200A |   | 👉 👈 |
Zero Width Space | \u200B | ​ | 👉​👈 |
Narrow No-break Space | \u202F |   | 👉 👈 |
Medium Mathematical Space | \u205F |   | 👉 👈 |
Ideographic Space | \u3000 |   | 👉 👈 |
Zero Width No-break Space | \uFEFF | ﻿ | 👉﻿👈 |
I think this is a fairly exhaustive list of spaces, but if any more should be added, please comment 😄
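One way the list could be encoded is as a lookup set. The sketch below is only an illustration: the names UNICODE_SPACES, isUnicodeSpace, and findUnicodeSpaces are made up here and are not existing atjson helpers. The code points are copied from the table above.

```typescript
// Code points from the table above; all are in the Basic Multilingual Plane,
// so each one is a single UTF-16 code unit.
const UNICODE_SPACES: ReadonlySet<string> = new Set([
  "\u00A0", // No Break Space
  "\u2000", // En Quad
  "\u2001", // Em Quad
  "\u2002", // En Space
  "\u2003", // Em Space
  "\u2004", // Thick Space
  "\u2005", // Mid Space
  "\u2006", // Six-per-em Space
  "\u2007", // Figure Space
  "\u2008", // Punctuation Space
  "\u2009", // Thin Space
  "\u200A", // Hair Space
  "\u200B", // Zero Width Space
  "\u202F", // Narrow No-break Space
  "\u205F", // Medium Mathematical Space
  "\u3000", // Ideographic Space
  "\uFEFF", // Zero Width No-break Space
]);

// True when `char` is one of the listed unicode spaces.
// Note: the plain ASCII space is intentionally not part of this list.
function isUnicodeSpace(char: string): boolean {
  return UNICODE_SPACES.has(char);
}

// Returns the offsets of every listed space character in `text`.
function findUnicodeSpaces(text: string): number[] {
  const offsets: number[] = [];
  for (let i = 0; i < text.length; i++) {
    if (UNICODE_SPACES.has(text.charAt(i))) {
      offsets.push(i);
    }
  }
  return offsets;
}

console.log(findUnicodeSpaces("a\u00A0b\u3000")); // [ 1, 3 ]
```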
Commented-out code has been removed from the codebase and has found a more appropriate location.
While looking at the text insertion, I noticed a few instances of commented-out code still lingering in the codebase. We should get rid of this, either relying on version control for instances where we need to reference this code in the future or moving it to documentation if it's intended to illustrate functionality for users of the library.
A few examples: in insertText and in deleteText
As per a discussion that we had here, we were finding that it's particularly annoying when unknown annotations occur, and when rendering cases aren't handled.
Unknown annotations are created automatically by atjson, and we'd like to make this an explicit decision by the user, because often this is not intentional and causes bugs.
Sussing out what to do about this structurally will take some time, but for now, we'd like to do the following:
UnknownAnnotation in a Renderer.
@atjson/renderer-react.
We expect that these will make the ergonomics of rendering and converting a bit easier.
I'm curious about what, if anything, there is to do about no longer relevant parse tokens after a document has been modified.
Let's say we have the following Markdown:
{
"body": "I **feel** happy."
}
When we ingest it and look at the document annotations, we have:
[ { type: 'parse-token',
start: 0,
end: 1,
attributes: { type: 'paragraph_open' } },
{ type: 'parse-token',
start: 3,
end: 4,
attributes: { type: 'strong_open' } },
{ type: 'parse-token',
start: 8,
end: 9,
attributes: { type: 'strong_close' } },
{ type: 'bold', start: 3, end: 9, attributes: {} },
{ type: 'parse-token',
start: 16,
end: 17,
attributes: { type: 'paragraph_close' } },
{ type: 'paragraph', start: 0, end: 17, attributes: {} } ]
Now if we modify the document and delete the bold type annotation, the strong_ parse-token type annotations remain.
doc.where({ type: 'bold' }).transform(bold => doc.removeAnnotation(bold as Annotation));
[ { type: 'parse-token',
start: 0,
end: 1,
attributes: { type: 'paragraph_open' } },
{ type: 'parse-token',
start: 3,
end: 4,
attributes: { type: 'strong_open' } },
{ type: 'parse-token',
start: 8,
end: 9,
attributes: { type: 'strong_close' } },
{ type: 'parse-token',
start: 16,
end: 17,
attributes: { type: 'paragraph_close' } },
{ type: 'paragraph', start: 0, end: 17, attributes: {} } ]
This could lead to confusion or bloated documents down the road, but is this likely enough to be a concern?
Currently, our type checking around annotation constructors isn't strict enough and allows invalid construction of annotations.
Instantiating a new annotation should be a type error when the annotation requires attributes and the attributes are omitted from the output.
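One possible way to get this behaviour from the type system is a conditional type on the constructor argument. This is a sketch under assumptions, not how atjson's Annotation is actually typed: RequiredKeys, AnnotationInit, and the simplified Annotation, Bold, and Link classes below are all invented for illustration.

```typescript
// The union of keys in T that are required (resolves to never if all are optional).
type RequiredKeys<T> = {
  [K in keyof T]-?: {} extends Pick<T, K> ? never : K;
}[keyof T];

// `attributes` may be omitted only when no attribute is required.
type AnnotationInit<Attributes> = [RequiredKeys<Attributes>] extends [never]
  ? { start: number; end: number; attributes?: Attributes }
  : { start: number; end: number; attributes: Attributes };

// Simplified stand-in for atjson's Annotation class.
class Annotation<Attributes = {}> {
  start: number;
  end: number;
  attributes: Attributes;

  constructor(init: AnnotationInit<Attributes>) {
    this.start = init.start;
    this.end = init.end;
    // Safe for annotations without required attributes; the types above
    // reject the omission everywhere else.
    this.attributes = (init.attributes ?? {}) as Attributes;
  }
}

class Bold extends Annotation {}

class Link extends Annotation<{ url: string; title?: string }> {}

// No required attributes, so they can be left out.
const bold = new Bold({ start: 0, end: 4 });

// Required attributes must be provided.
const link = new Link({
  start: 0,
  end: 4,
  attributes: { url: "https://example.com" },
});

// The following line would not compile, because `attributes` is missing:
// new Link({ start: 0, end: 4 });

console.log(bold.attributes, link.attributes.url);
```

The trade-off is a slightly harder-to-read constructor signature in exchange for catching the omission at compile time.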
When rendering a document to HTML, quoted strings aren't properly escaped.
Text should be escaped when rendering to HTML.
import OffsetSource, { Link } from "@atjson/offset-annotations";
import HTMLRenderer from "@atjson/renderer-html";
let doc = new OffsetSource({
  content: "Malika Favre’s “Sweeping Into Fall”",
  annotations: [
    new Link({
      start: 0,
      end: 35,
      attributes: {
        url: "https://www.newyorker.com/culture/cover-story/cover-story-2019-09-09",
        title: "Malika Favre’s \"Sweeping Into Fall\""
      }
    })
  ]
});
HTMLRenderer.render(doc);
Expected result:
<a
  href="https://www.newyorker.com/culture/cover-story/cover-story-2019-09-09"
  title="Malika Favre’s &quot;Sweeping Into Fall&quot;">Malika Favre’s “Sweeping Into Fall”</a>
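For reference, HTML attribute values have no backslash escapes, so a double quote inside a double-quoted attribute has to be written as a character reference such as &quot;. A minimal sketch of that kind of escaping follows; escapeHtml and renderLink are hypothetical helpers written for this example, not functions from @atjson/renderer-html.

```typescript
// Characters that are unsafe in HTML text or attribute values,
// mapped to their character references.
const HTML_ESCAPES: { [char: string]: string } = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

// Replaces every unsafe character with its character reference.
function escapeHtml(text: string): string {
  return text.replace(/[&<>"']/g, (char) => HTML_ESCAPES[char]);
}

// Renders a link, escaping both the attribute values and the text content.
function renderLink(url: string, title: string, text: string): string {
  return `<a href="${escapeHtml(url)}" title="${escapeHtml(title)}">${escapeHtml(text)}</a>`;
}

console.log(renderLink("https://example.com", 'A "quoted" title', "Link text"));
// <a href="https://example.com" title="A &quot;quoted&quot; title">Link text</a>
```

Typographic quotes such as “ and ” need no escaping, which is why the link text in the example above can stay as it is.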
I'm keen to learn how to use this - had spoken with @blaine about it at a recent p2p gathering. I've visited this repo a couple of times but got swamped reading the source.
Here's my current best attempt from reading and copying examples in the code. It's not yet working / I don't know what it looks like when it is working:
import Document from '@atjson/document'
// Web components in the registry can't be redefined,
// so reload the page on every change
if (module.hot) {
  module.hot.dispose(() => {
    window.location.reload()
  })
}
document.addEventListener('DOMContentLoaded', () => {
  let editor = document.querySelector('offset-editor')
  let doc = new Document({
    content: 'Some text that is both bold and italic plus something after.',
    annotations: [
      { type: 'bold', display: 'inline', start: 23, end: 31 },
      { type: 'link', display: 'inline', start: 20, end: 24, attributes: { url: 'https://google.com' } },
      { type: 'italic', display: 'inline', start: 28, end: 38 },
      { type: 'underline', display: 'inline', start: 28, end: 38 },
      { type: 'paragraph', display: 'block', start: 0, end: 61 }
    ]
  })
  editor.setDocument(doc)
  console.log('done!')
})
I get an error:
TypeError: options.annotations is undefined
Are these modules still in use? Are they being deprecated?
I would love a demo repo showing me how to wire things together. My ideal use case is being able to have a rich accessible editor, a way to output the content/annotations, then a way to render content/annotations once it's "published" (to scuttlebutt).
I care about this because we need to move beyond markdown to be more accessible for a wider range of peers ... and the promise of tidy extensibility is really exciting.
Right now the Document is the primary typed entity in atjson, which works fairly well in practice but is conceptually a little confusing. When we convert documents between different sources what we're really converting is the collection of annotations associated with that document. I think it would be conceptually simpler if the notion of a type or annotation schema existed primarily at the level of the annotation.
I propose adding a few types to the library:
AnnotationCollection<T extends AnnotationSchema> with public members AnnotationCollection<T>.convertTo(schema: <U extends AnnotationSchema>) : AnnotationCollection<U> and AnnotationCollection<T>.in(schema: <U extends AnnotationSchema>) : AnnotationCollection<U>
convertTo(schema: <U extends AnnotationSchema>) : AnnotationCollection<U> is Document.convertTo, just scoped to work on a list of annotations instead
in(schema: <U extends AnnotationSchema>) : AnnotationCollection<U> returns a new AnnotationCollection where any unknown annotations in schema from the original collection are made known, and all other annotations are made unknown.
AnnotationSchema is the parent type of schema definitions. This would take over much of the current role occupied by Document and its subtypes. Converters would be defined between subtypes of AnnotationSchema rather than subtypes of Document. They would also obviously 'own' the definition of their annotations, and would have a content type string for marking their annotations during de/serialization.
This proposal doesn't give the library any additional power (since a document is just a piece of un-annotated media and an AnnotationCollection, and the media portion doesn't at all complicate the associated schema), but I think it would be helpful for explaining the library and would help generally separate independent concerns within the system.
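To make the in(schema) part of the proposal a little more tangible, here is a very rough sketch. Everything in it is an assumption made for illustration: the shape of AnnotationSchema (a content type string plus a list of type names), the known flag on annotations, and the example schemas. convertTo is left out because it depends on converters that are not sketched here.

```typescript
// Hypothetical shape: a schema owns a content type and its annotation types.
interface AnnotationSchema {
  contentType: string;
  types: string[];
}

// Hypothetical annotation record; `known` marks whether the annotation is
// understood by the collection's current schema.
interface CollectedAnnotation {
  type: string;
  start: number;
  end: number;
  known: boolean;
}

class AnnotationCollection<T extends AnnotationSchema> {
  constructor(
    readonly schema: T,
    readonly annotations: CollectedAnnotation[]
  ) {}

  // Returns a new collection viewed through `schema`: annotations whose type
  // the schema defines become known, all others become unknown.
  in<U extends AnnotationSchema>(schema: U): AnnotationCollection<U> {
    const annotations = this.annotations.map((annotation) => ({
      ...annotation,
      known: schema.types.indexOf(annotation.type) !== -1,
    }));
    return new AnnotationCollection(schema, annotations);
  }
}

const markupSchema = { contentType: "application/vnd.example+markup", types: ["bold", "italic"] };
const linkSchema = { contentType: "application/vnd.example+links", types: ["link"] };

const collection = new AnnotationCollection(markupSchema, [
  { type: "bold", start: 0, end: 4, known: true },
  { type: "link", start: 2, end: 8, known: false },
]);

const asLinks = collection.in(linkSchema);
console.log(asLinks.annotations.map((a) => `${a.type}:${a.known}`));
// [ 'bold:false', 'link:true' ]
```

Note that the original collection is left untouched, which matches the "returns a new AnnotationCollection" wording above.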
We have quite robust unit testing in atjson, but we have encountered quite a few cases where we want to establish systemic properties of atjson as a set of tools and have found that difficult to do in unit tests. A good example of testing such a property is our check that commonmark rendering and parsing comply with the commonmark spec.
We'd also like to test additional properties of atjson in a full-mesh approach where we can test the rigorousness of source libraries. This may be best done with property-based testing, as suggested by @colinarobinson, to make the testing sufficiently generic while having full coverage.
Attendees: @blaine / @balaclark / @gmedina / @tim-evans
Basic idea is to follow an open source model; submit a ticket / issue to Github, it's discussed and then we implement!
How do we manage versioning of the content format (change over time)?
props.caption could be used directly in react instead of having to render them again)
One lesson I learned in decades of working with data: without specification and validation, eventually data will not conform to an expected data format anymore. If atjson format is going to be used beyond its current software, it needs a formal specification.
I've started a JSON Schema in my specification branch. The format is quite simple so the schema is not complex. One open question I stumbled upon is whether the end position is optional (I don't think so).
To validate atjson documents against the schema I am looking for actual examples and use cases. Validation should best be added to unit tests at least.