(We'll try to give more frequent updates going forward. Something was broken with my GitHub notifications, so I just noticed your question today.)
Status: I'm currently working on the initial architecture and prototype for the parser library. Once we have a working prototype, it will be in a good state to accept pull requests / feedback from the community. Until then it's a little hard to collaborate unless someone is in the Seattle area and wants to meet up.
I was making pretty good progress until I ran into some questions about how to model the AST, specifically the handling of multiline tokens whose "lines" are embedded inside a doc comment, for example:
/**
* `abc
* def`
*/
(Is `abc def` a single lexical token? Is the `*` part of that token?)
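To make the question concrete, here is a minimal sketch of one possible answer: strip the comment framing first, so that a token spanning two lines never contains the `*`. The helper name `extractCommentLines` is hypothetical and is not part of any tsdoc API; it only illustrates the idea under that assumption.

```typescript
// Hypothetical helper (an assumption for illustration, not the tsdoc API):
// returns the content of each line inside a /** ... */ comment, with the
// comment delimiters and the leading "*" of each line removed.
function extractCommentLines(comment: string): string[] {
  const trimmed = comment.trim();
  if (!trimmed.startsWith('/**') || !trimmed.endsWith('*/')) {
    throw new Error('Expected a /** ... */ doc comment');
  }
  // Drop the opening "/**" and the closing "*/".
  const body = trimmed.slice(3, trimmed.length - 2);
  return body
    .split(/\r?\n/)
    // Remove leading whitespace, one optional "*", and one optional space.
    .map((line) => line.replace(/^\s*\*? ?/, '').trimEnd())
    // Discard the empty first and last lines left over from the delimiters.
    .filter((line, index, all) => !(line === '' && (index === 0 || index === all.length - 1)));
}

const extractedLines = extractCommentLines('/**\n * `abc\n * def`\n */');
console.log(extractedLines); // the two content lines, without the "*" prefixes
```

With the framing removed, a later stage can decide whether the backtick-delimited span is one token or two without ever seeing the `*`.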
We met with the TypeScript compiler owners, and they gave a bunch of really helpful feedback. They also suggested that we build the tsdoc NPM package as an extension of the compiler's AST, rather than having it act as a standalone parser. It would still be its own library, but the input would be a TypeScript compiler data structure instead of a plain text string (as in the current approach).
Integrating with the compiler has some benefits:
- More involvement and help from the compiler team
- Leverage a sophisticated existing framework that includes a lot of helpful utilities
- Simpler roadmap for integrating TSDoc support into the compiler and VS Code
But there are also drawbacks:
- A bigger learning curve to get off the ground
- The tsdoc package would become coupled to a specific release of the compiler
- If we need to make patches to the compiler to get our problems solved, we'd have to depend on an unstable release of the compiler until our patches finally reached production
- Any changes we need to make in the compiler would demand a high quality bar, and force us to consider requirements that aren't immediately relevant to a documentation tool (e.g. IntelliSense)
It's a tough call. My coding progress stalled while I researched this decision, and then recently I got sidetracked by some other priorities. TSDoc is now my primary focus again for the rest of June, but then in July I will be away on vacation for a few weeks... so the short answer is that it's moving along, but probably won't take off until later this summer.
I will say that we now feel reasonably confident about the initial language spec for TSDoc, and also the overall parsing strategy.
I'd love to get involved in any way I can.
The easiest way to contribute right now is to open/answer GitHub issues that help us flesh out the requirements and syntax for the TSDoc notation. There are still plenty of interesting edge cases and cool feature ideas. If you maintain a documentation tool, you can also compare/contrast your syntax and features against what we're proposing for TSDoc and identify any gaps.
from tsdoc.
I have added a "Where are we on the roadmap?" section to the root-level README file, which people can use to track our progress until we get a proper newsfeed set up.
Here's what I posted there as of today:
Already completed:
- Write up all the interesting design questions as "RFC" GitHub issues to collect community feedback
- Arrive at an initial consensus on the basic approach and strategy for TSDoc
- Develop an initial feature-complete prototype of the @microsoft/tsdoc library and publish the NPM package
- Convert Microsoft's API Extractor tool to use @microsoft/tsdoc (replacing its proprietary AEDoc engine); this demonstrates that TSDoc can meet the needs of a large production documentation web site
What's next:
- Write up an initial draft of the TSDoc spec document, which outlines the proposed standard
- Collect community feedback and integrate it into the draft, then publish the first "official" 1.0 spec
- Review the @microsoft/tsdoc API with various integrators, including TypeScript and VS Code
- Publish the first "1.0.0" stable release of the @microsoft/tsdoc package
- Help onboard various partners
As such, I'm going to close this GitHub issue.
BTW I'm also excited to announce that we made a cool little TSDoc Playground with an interactive demo of the parser. (Big thanks to @iclanton and @KevinTCoughlin for their work on this!) Enjoy! :-P
Hi folks.. I'm just going to jump in here as I've been following `tsdoc` since the initial announcement. I've been on a long sabbatical, but am releasing a major overhaul of ESDoc aptly named TJSDoc, as I'm supporting JSDoc and TypeScript transparently, or at least that is the goal. As things go, the ESDoc maintainer is hostile to outside contributions, hence a major fork. A considerable amount of work (8 months full time) was already done in '17, and I drastically overhauled the ESDoc infrastructure for the better, reducing all technical debt and adding many new features. I took a break / sabbatical before finishing things off for a public release and will be back on things in a month or so.
Anyway, `tsdoc` is an important project that really fills a hole in well formatted evaluation of doc comments / `text only`. IMHO `tsdoc` should take a text string (comment node) and parse it, generating an AST; nothing else, including no other dependencies such as a markdown library, etc.
"It would still be its own library, but the input would be a TypeScript compiler data structure instead of a plain text string (as in the current approach)" - this will be no use to me and will likely limit adoption from a wider set of documentation tooling efforts. TJSDoc and its TypeScript support are based on Babylon 7. I'll gladly contribute to `tsdoc` if things progress in a text parsing direction. My general plan is to create a "sister" project to `tsdoc` that parses JSDoc text comments and generates an AST as well.
So what I see as most beneficial and generalized is `tsdoc` accepting text / comment nodes which results in an AST for further parsing by the documentation tooling at hand. No outside dependencies.
"So what I see as most beneficial and generalized is tsdoc accepting text / comment nodes which results in an AST for further parsing by the documentation tooling at hand. No outside dependencies."
Thank you for this feedback! I find it highly persuasive. We were unsure whether anyone would want to use the tsdoc library without first invoking the TypeScript compiler engine. Certainly you shouldn't be forced to write your own parser just because your tool isn't based on the TypeScript engine.
BTW if you plan to be contributing, feel free to create a PR to add your project to the "Who's involved?" list.
@pgonzal It has definitely been a pleasure following the progression thus far with the sussing out of potential features. I'd say in an ideal world the result of `tsdoc` and any accompanying JSDoc related module would be a well defined and -shared- AST for both angles. Given that this is a major pain point in my efforts I'd be glad to get to work in earnest on `tsdoc` and any potential JSDoc related module that outputs a common AST.
I must say that for expediency in the case of creating a JSDoc module I'd likely fork `tsdoc`, tweak the regex processing as necessary which outputs the AST + type information from comments, etc. I guess the bonus is that I'd finally have to embrace TS myself to implement said module.. 😄
Another way to look at this is, the best projects I've participated in often took the form of a 2.0 reboot of a working 1.0 implementation. Only with the hindsight of having actually coded everything, do we learn how to code everything well. So if standalone-tsdoc requires us to later redo the entire parser inside the compiler's code base, perhaps that's a benefit and not a wasted duplication. :-)
@pgonzal I'll definitely do a PR when I get back into the full swing of things, which should be in August. TJSDoc uses event-based inter-module communication. Based on the file type, comment nodes would be dispatched to `tsdoc` for `.ts` files, and for JavaScript files to a forthcoming JSDoc module which also generates a compatible AST for comment nodes and which, among other things, should support type information in comments, etc. This would address #23 (hah you CCed me there too!), i.e. the AST nodes from Babylon 7 will provide the type information for TS files.
I bit my tongue w/ other discussion regarding potential markdown dependencies for `tsdoc` in other issues, but definitely had to jump in w/ your response above regarding dependence on the TypeScript compiler engine. IMHO regarding markdown, the documentation tooling should decide on a markdown module and simply parse the `tsdoc` AST markdown nodes with the markdown module of choice. I was pleasantly surprised to see TypeScript / AST parsing in Babylon 7 emerge as possible during my sabbatical. I originally created things w/ TJSDoc to potentially have to support the TypeScript compiler engine, but if I can pull things off for JS / TS using Babylon 7 transparently that is an ideal scenario.
Nonetheless, a well defined TypeScript comment format and a module to generate an accompanying AST for consumption by documentation tooling are highly needed. This is one of the sore spots in ESDoc and certainly a latent sore spot in TJSDoc (and I assume a lot of other documentation tooling!), as the current implementation is just a bunch of ad hoc regex processing instead of separate modules that provide a well defined AST for processing comments.
As painful as it is to accept regex processing as the way forward, it is a generalized approach. It's certainly hard to get right, but worth the effort: if there is a standard module implementing things for TypeScript comments and a separate module for JSDoc that generates a compatible AST for documentation tools to consume, then all tooling can offload this aspect to common efforts, etc.
"IMHO regarding markdown the documentation tooling should decides on a markdown module and simply parse the AST of tsdoc AST markdown nodes with the markdown module of choice"
So far we've come to the same opinion, see #12 (comment). Thanks again for sharing your use case, very informative!
Status update:
Today we published the first release of the @microsoft/tsdoc NPM package! This is “alpha quality” and still pretty rough around the edges, but the major approach and model are worked out. The project is now (finally!) in a state where people can give useful feedback on the implementation and API design. I’ve put together a small demo project that illustrates the basic usage. If you get a chance, please try it out and open GitHub issues with your feedback/ideas.
In the coming weeks we’ll be working on updating API Extractor to use this library for parsing doc comments, and fixing lots of bugs and feature gaps along the way.
Thanks again to everyone who’s been contributing ideas and input!
And here's a quick summary of the major ideas:
- We start with a TextRange that is similar to `ts.TextRange`.
- Then the LineExtractor finds the text ranges for the lines within the comment.
- Next the Tokenizer breaks these lines into primitive Token objects, which are really just symbol characters, newlines, whitespace, and blobs of text.
- The NodeParser uses a TokenReader to build an AST of DocNode subclasses.
- The `DocNode` tree has two roles: For scenarios where you need to generate documentation comments and e.g. emit them into *.d.ts files, the `DocNode` acts like a high-level DOM API for building up a conceptual tree and transforming it. But it can also provide a detailed grammatical analysis of a parsed input (e.g. tell me the coordinates of the `=` for an HTML attribute). This is accomplished by including a bunch of DocParticle nodes in the tree. A visitor sees them when using `DocNode.getChildNodes()`, but otherwise the particles are invisible for the everyday API interactions. You never need to create them explicitly.
- Each `DocNode` has an optional associated Excerpt which tracks the corresponding source file coordinates for a parsed input. The `DocNode.excerpt` will be undefined for manually constructed nodes, and for abstract nodes (e.g. DocSection) which don't correspond to any input tokens.
- The `Excerpt` class tracks `content` and an optional `spacingAfterContent` (similar to TypeScript compiler trivia). These aren't `TextRange` or `Token` objects, but are instead represented as TokenSequence objects. The `TokenSequence` allows very precise highlighting, for example a `TokenSequence` might correspond to `"Hello,` and `world!"` in this example, but NOT the newline or `*` in between:
/**
 * <data-item description="Hello,
 * world!" />
 */
- All of these aspects come back in the ParserContext returned by the TSDocParser main API.
- The root of the AST is a DocComment object that has everything nicely rolled up into summary/remarks/parameters/returns and a ModifierTagSet for checking for the presence of modifier tags such as `@readonly`.
- Error messages come in two variants: General errors are reported via `ParserContext.log.messages`. But if specific tokens cannot be parsed as expected (e.g. `<badTag abc=" />`), then the first misinterpreted characters (e.g. `<`) will get represented as a DocErrorText node (i.e. treated as literal text) and parsing will resume with the next token. A log message will also be generated. (This is the "infinite lookahead" aspect of a Markdown parser.)
- In this initial prototype, the parsing is very strict. But we've included an optional TSDocParserConfiguration that (1) provides a place to define custom tags, and (2) in the future will provide a rich set of options and switches to flexibly handle degenerate inputs. Later on, we'll open up the `NodeParser` to support custom syntaxes via plug-in rules. (I'm eager to do this, but the engine will need to be relatively mature if we want a stable API.)
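The tokenizing stage described in the list above can be sketched roughly as follows. This is a toy illustration only: `ToyToken`, `ToyTokenKind`, `classify`, and `tokenizeLines` are hypothetical names invented for this sketch and are not the @microsoft/tsdoc API, and the character classes are an assumption about what counts as a "blob of text".

```typescript
// Hypothetical token model for illustration (not the library's Token class).
type ToyTokenKind = 'Newline' | 'Spacing' | 'Symbol' | 'Text';

interface ToyToken {
  kind: ToyTokenKind;
  text: string;
}

// Classifies one matched piece by looking at its first character.
function classify(text: string): ToyTokenKind {
  if (/^[ \t]/.test(text)) return 'Spacing';
  if (/^[A-Za-z0-9]/.test(text)) return 'Text';
  return 'Symbol';
}

// Splits each already-extracted comment line into runs of spacing, runs of
// alphanumeric text, and single symbol characters, then appends a Newline
// token after every line.
function tokenizeLines(lines: string[]): ToyToken[] {
  const tokens: ToyToken[] = [];
  const pattern = /[ \t]+|[A-Za-z0-9]+|[^ \tA-Za-z0-9]/g;
  for (const line of lines) {
    const pieces = line.match(pattern) || [];
    for (const piece of pieces) {
      tokens.push({ kind: classify(piece), text: piece });
    }
    tokens.push({ kind: 'Newline', text: '\n' });
  }
  return tokens;
}

// The two content lines of the <data-item> example, after line extraction.
const toyTokens = tokenizeLines(['<data-item description="Hello,', 'world!" />']);
console.log(toyTokens.map((t) => t.kind + ':' + JSON.stringify(t.text)).join(' '));
```

Because the `*` prefixes are removed before tokenizing, a range of tokens can cover the attribute value while a parser simply skips the Newline token in between.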
If you want to see some examples of all these pieces together, the Jest snapshots are somewhat informative:
- For high-level `DocComment` anatomy, you might take a look at: ParsingBasics.test.ts.snap
- For lower-level node parsing: NodeParserTags.test.ts.snap
- For the tokenizer: Tokenizer.test.ts.snap
- For the line extractor: LineExtractor.test.ts.snap
@pgonzal How are memory efficiency and speed looking? Those are my two big concerns with a more "heavyweight" / precise solution like `tsdoc` regarding potential integration with a documentation pipeline, which can consume a lot of memory to begin with for large projects. I.e. it seems your / the project's main impetus is thoroughness / accuracy in parsing at the expense of???
I've done a ton of optimization on both fronts over ESDoc for my forthcoming documentation effort, but it can still be a beast on very large projects. Perhaps tests can be added to stress `tsdoc` for speed and memory consumption over successive executions?
Hey @typhonrt
Performance is certainly an important goal for this project, however we have not yet invested in measurements or optimization work in that area. The first priority is still to get the main usage scenarios to be feature complete and running. (If someone else wanted to set up some performance tests, that would be very appreciated!)
That said, the current architecture is designed with performance in mind. Like the TypeScript compiler, the core parser works primarily with integers (i.e. indexes into an array of tokens, or indexes into an array of characters) instead of allocating and comparing text strings. Here I am referring to the `TextRange`, `Token`, `TokenSequence`, and `Excerpt` classes. By contrast the `DocNode` tree does allocate strings, because it is intended to support a builder scenario as well as parsing (i.e. `DocNode` --> comment text, instead of comment text --> `DocNode`). But it would be straightforward to optimize this for the parser-only scenario.
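As a rough illustration of the "integers instead of strings" idea, here is a minimal sketch of a range type that stores only a shared buffer plus two indexes. `SimpleTextRange` and its members are stand-ins invented for this example; the library's actual `TextRange` class may be shaped differently.

```typescript
// Simplified stand-in for illustration; not the library's TextRange class.
class SimpleTextRange {
  public readonly buffer: string;
  public readonly pos: number; // index of the first character in the range
  public readonly end: number; // index just past the last character

  public constructor(buffer: string, pos: number, end: number) {
    if (pos < 0 || end < pos || end > buffer.length) {
      throw new RangeError('Invalid range');
    }
    this.buffer = buffer;
    this.pos = pos;
    this.end = end;
  }

  public get length(): number {
    return this.end - this.pos;
  }

  // Narrowing a range copies two integers, not the characters.
  public subRange(pos: number, end: number): SimpleTextRange {
    return new SimpleTextRange(this.buffer, pos, end);
  }

  // Compares against a literal in place, without allocating a substring.
  public startsWith(literal: string): boolean {
    return literal.length <= this.length && this.buffer.startsWith(literal, this.pos);
  }

  // A string is only materialized when a caller explicitly asks for one.
  public toString(): string {
    return this.buffer.substring(this.pos, this.end);
  }
}

const wholeComment = new SimpleTextRange('/** hello */', 0, 12);
const word = wholeComment.subRange(4, 9);
console.log(word.length, word.startsWith('he'), word.toString());
```

Every range shares the one input buffer, which is also why holding any single range keeps the whole buffer alive.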
The parser's algorithmic time complexity should be analogous to a CommonMark parser: It sometimes performs infinite lookahead, e.g. parsing `<tag1 abc="<tag2 abc="<tag3 />` may require scanning to the end of the input two times before finally finding a well-formed `<tag3 />`. But generally the grammar rules and human behavior tend to make those cases rare or even exotic. In the current approach, TSDoc does not support CommonMark nesting blocks at all, so it completely avoids the problem of walking up and down a list of scopes to find the right place to insert a node. (This isn't just an unimplemented feature: When I write up the spec, I will try to argue persuasively that these CommonMark features are counterproductive for a documentation comment, and can be reasonably avoided while still ensuring that TSDoc constructs will generally be handled correctly by CommonMark implementations. Some initial ideas about that were posted in #29.)
The last aspect that comes to mind from a performance standpoint is circular references in the `DocNode` API. For example, `ParserContext.docComment` has child nodes with `DocNode.excerpt` properties that point back to the `ParserContext`. This means that any reference held to any `DocNode` object will keep the entire graph alive, which could lead to memory leaks. There are ways to break these loops, but I haven't had time to think about it yet in depth. However I believe it can be solved without significantly altering the current API design.
Hope that helps!