kcartlidge / markdowner

(ALPHA) Fast-enough and efficient-enough Markdown parsing in pure C# for .NET Core and .NET Framework.

License: MIT License

C# 100.00%

Markdowner

Fast-enough and efficient-enough Markdown parsing in pure C# for .NET Core and .NET Framework.

This was written as a personal challenge. I've done Markdown parsers before and have used parser generators. I've even written my own equivalents to yacc, bison, etc. (not just the grammars).

This was different: a challenge to write a parser by hand, without using techniques such as recursive descent or compiler compilers.

It is not feature complete, but is perfectly fine for many simpler cases, supporting:

  • Inline (nested) formatting
    • Bold, Italics, Underline, Code
  • Headers from 1 to 6
  • Paragraphs
    • Adjacent lines are merged into single paragraphs
  • Ordered lists
  • Unordered lists
  • Preformatted blocks
  • Block quotes
  • Horizontal lines
  • Tracking line numbers
    • Line number from the original Markdown
    • Line number in the parsed lines
  • Unit tests

Sample usage

Check out the example project in the Example folder.

Progress

It's at an ALPHA stage and is a working proof of concept.

Missing:

  • Packaging for NuGet
  • Nested lists
  • Tables
  • Links
  • Images

Methodology

There is a 2-step process:

  1. Convert the input text into a MarkdownDocument containing a collection of lines.
    • Each line is assigned a type, such as Paragraph or UnorderedList.
    • Each line has a Token collection holding all its text and/or formatting.
  2. Generate an output from that document.
    • An HTML output generator is included.

The caller does not need to know it is a 2-step process; they can simply feed the original source to the output generator. The work is split into 2 steps because parsing into lines and tokens is the computationally expensive part. Doing it separately (a) provides a form of advance compilation and (b) means the document can be fed through multiple output generators without the overhead of parsing again each time.
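The 2-step split can be pictured with a small, self-contained sketch. It is an illustration only and does not use Markdowner's actual API: every type name here (SketchDocument, SketchLine, SketchToken, ISketchGenerator and so on) is hypothetical, and the parser handles just level-1 headers and paragraphs. The point it shows is that the source is parsed once and the resulting document is then passed to two different output generators.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;

// Hypothetical names throughout; none of these are Markdowner's real types.
public enum SketchLineType { Header1, Paragraph }

public sealed class SketchToken
{
    public string Text { get; }
    public SketchToken(string text) { Text = text; }
}

public sealed class SketchLine
{
    public SketchLineType Type { get; }
    public List<SketchToken> Tokens { get; } = new List<SketchToken>();
    public SketchLine(SketchLineType type) { Type = type; }
}

public sealed class SketchDocument
{
    public List<SketchLine> Lines { get; } = new List<SketchLine>();
}

public static class SketchParser
{
    // Step 1: parse the source into typed lines holding tokens (done once).
    public static SketchDocument Parse(string source)
    {
        var doc = new SketchDocument();
        foreach (var raw in source.Split('\n'))
        {
            var text = raw.Trim();
            if (text.Length == 0) continue; // empty lines carry no content here
            var isHeader = text.StartsWith("# ", StringComparison.Ordinal);
            var line = new SketchLine(isHeader ? SketchLineType.Header1 : SketchLineType.Paragraph);
            line.Tokens.Add(new SketchToken(isHeader ? text.Substring(2) : text));
            doc.Lines.Add(line);
        }
        return doc;
    }
}

// Step 2: any number of generators can consume the same parsed document.
public interface ISketchGenerator { string Generate(SketchDocument doc); }

public sealed class HtmlSketchGenerator : ISketchGenerator
{
    public string Generate(SketchDocument doc) =>
        string.Concat(doc.Lines.Select(l =>
        {
            var tag = l.Type == SketchLineType.Header1 ? "h1" : "p";
            var body = string.Concat(l.Tokens.Select(t => WebUtility.HtmlEncode(t.Text)));
            return $"<{tag}>{body}</{tag}>";
        }));
}

public sealed class PlainTextSketchGenerator : ISketchGenerator
{
    public string Generate(SketchDocument doc) =>
        string.Join("\n", doc.Lines.Select(l => string.Concat(l.Tokens.Select(t => t.Text))));
}

public static class Program
{
    public static void Main()
    {
        var doc = SketchParser.Parse("# Title\n\nHello & welcome"); // parsed once
        Console.WriteLine(new HtmlSketchGenerator().Generate(doc));      // rendered as HTML
        Console.WriteLine(new PlainTextSketchGenerator().Generate(doc)); // rendered as text
    }
}
```

Keeping the generators behind a single interface is one possible way to let a caller swap outputs without re-parsing; the real project may organise this differently.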

Parsing follows a simple flow:

  • Ignore leading empty lines
  • Treat runs of empty lines as one single line
  • Derive the line types from the line start characters
  • Merge consecutive paragraphs (where not separated by empty lines)
  • Generate a set of tokens to represent the line content
  • Add line-level flags for start/stop markers in quote/pre blocks
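The first four steps of that flow could look roughly like the sketch below. It is not Markdowner's code: FlowLine, FlowLineType and FlowParser are hypothetical names, the start-character rules are deliberately simplified, and tokenising plus the quote/pre start/stop flags are left out.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical names; a sketch of the flow, not Markdowner's real code.
public enum FlowLineType { Empty, Header, UnorderedList, OrderedList, BlockQuote, HorizontalLine, Paragraph }

public sealed class FlowLine
{
    public FlowLineType Type;
    public string Text;
    public int SourceLine; // 1-based line number in the original Markdown
}

public static class FlowParser
{
    // Derive the line type from the characters at the start of the line.
    // Simplified rules: any leading '#' counts as a header, for example.
    public static FlowLineType Classify(string line)
    {
        var t = line.Trim();
        if (t.Length == 0) return FlowLineType.Empty;
        if (t == "---" || t == "***") return FlowLineType.HorizontalLine;
        if (t[0] == '#') return FlowLineType.Header;
        if (t.StartsWith("- ", StringComparison.Ordinal) ||
            t.StartsWith("* ", StringComparison.Ordinal)) return FlowLineType.UnorderedList;
        if (t[0] == '>') return FlowLineType.BlockQuote;

        // One or more digits followed by ". " marks an ordered list item.
        int i = 0;
        while (i < t.Length && char.IsDigit(t[i])) i++;
        if (i > 0 && i + 1 < t.Length && t[i] == '.' && t[i + 1] == ' ')
            return FlowLineType.OrderedList;

        return FlowLineType.Paragraph;
    }

    public static List<FlowLine> Parse(string source)
    {
        var result = new List<FlowLine>();
        var rawLines = source.Replace("\r\n", "\n").Split('\n');
        for (int n = 0; n < rawLines.Length; n++)
        {
            var type = Classify(rawLines[n]);
            var text = rawLines[n].Trim();
            FlowLine last = result.Count > 0 ? result[result.Count - 1] : null;

            if (type == FlowLineType.Empty)
            {
                // Ignore leading empty lines and collapse runs of them into one.
                if (last == null || last.Type == FlowLineType.Empty) continue;
            }
            else if (type == FlowLineType.Paragraph && last != null && last.Type == FlowLineType.Paragraph)
            {
                // Paragraph lines not separated by an empty line are merged.
                last.Text += " " + text;
                continue;
            }

            result.Add(new FlowLine { Type = type, Text = text, SourceLine = n + 1 });
        }
        return result;
    }
}

public static class Program
{
    public static void Main()
    {
        var lines = FlowParser.Parse("\n\n# Title\n\n\nfirst line\nsecond line\n- item");
        foreach (var l in lines)
            Console.WriteLine($"{l.SourceLine}: {l.Type} '{l.Text}'");
    }
}
```

Each parsed line keeps the number of the source line it started on, which matches the two kinds of line tracking listed earlier: the position in the original Markdown is stored on the line, and the position in the parsed output is simply its index in the list.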
