
olinkcheck's Issues

Use the AST-based API to annotate broken links

Currently, each format provides an annotate_in_str method, which takes a string and parses it to extract the links. However, when the broken links are written back into the source annotated with their status (for example, [link](http://www.google.com/does-not-exist) becomes [link](http://www.google.com/does-not-exist - [404 Not Found])), the replacement is done directly on the source string using regular-expression matching.
Ideally, this would be done by replacing the links in the AST and converting the AST back to a string, but this poses some challenges:

  • Not every parser supports converting its AST back to a string
  • Even if they do, they may discard whitespace that does not change the semantics. For example, in a markdown file
# Heading 1




# Heading 2

is the same as

# Heading 1
# Heading 2

for most practical purposes. But one of our use cases is to open a PR with the broken links annotated, and a parser that discards such characters would produce a noisy diff full of whitespace changes.

Ideally, every parser we use would provide location information so that we could write a to_string that preserves the original structure, but this is currently not the case.
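If a parser did report where each link sits in the source, the annotated file could be rebuilt by splicing replacement text into the original string at those byte offsets instead of pretty-printing an AST, so blank lines like the ones between the two headings above would survive untouched. A minimal sketch, assuming a hypothetical parser that reports each link as a [start, stop) byte range; the edit type and splice function are illustrative and not part of olinkcheck:

```ocaml
(* Hypothetical edit: replace bytes [start, stop) of the source with [text]. *)
type edit = { start : int; stop : int; text : string }

(* Apply non-overlapping edits to [src]. Every byte outside the edited spans
   is copied verbatim, so whitespace and layout are preserved exactly. *)
let splice (src : string) (edits : edit list) : string =
  let edits = List.sort (fun a b -> compare a.start b.start) edits in
  let buf = Buffer.create (String.length src) in
  let last =
    List.fold_left
      (fun pos e ->
        if e.start < pos || e.stop < e.start || e.stop > String.length src then
          invalid_arg "splice: overlapping or out-of-range edit";
        (* Copy the unchanged text between the previous edit and this one. *)
        Buffer.add_substring buf src pos (e.start - pos);
        Buffer.add_string buf e.text;
        e.stop)
      0 edits
  in
  (* Copy whatever follows the last edit. *)
  Buffer.add_substring buf src last (String.length src - last);
  Buffer.contents buf
```

Because only the link spans are rewritten, the resulting diff would contain just the annotations, which is what the PR use case needs.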

PR in ocamlorg

  • Add olinkcheck as a dependency of ocamlorg (as a pin, not via the opam repository)
  • Add a Makefile target to run the tests

Chained parsers

Most md files in ocaml.org have a YAML header followed by either markdown or HTML. A chained parser should allow us to handle two different formats in the same file given a regex to separate them.
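A minimal sketch of such a chaining step, under a few stated assumptions: the header is YAML front matter delimited by "---" lines, a plain string search stands in for the separating regex (the Str library shipped with OCaml could supply a real one), and naive_urls is a toy stand-in for the real YAML and markdown/HTML parsers. The names link, split_front_matter, chain and naive_urls are all hypothetical:

```ocaml
(* Hypothetical link record: byte offset within the whole file, plus URL. *)
type link = { pos : int; url : string }

(* Toy stand-in for a real format parser: a link is any run of characters
   starting with "http" and ending before a space, newline or ')'. *)
let naive_urls (s : string) : link list =
  let n = String.length s in
  let is_end c = c = ' ' || c = '\n' || c = ')' in
  let rec scan i acc =
    if i >= n then List.rev acc
    else if i + 4 <= n && String.sub s i 4 = "http" then begin
      let j = ref i in
      while !j < n && not (is_end s.[!j]) do incr j done;
      scan !j ({ pos = i; url = String.sub s i (!j - i) } :: acc)
    end
    else scan (i + 1) acc
  in
  scan 0 []

(* Split off a front-matter header delimited by "---" lines.
   Returns (header_offset, header, body_offset, body). *)
let split_front_matter (s : string) =
  let delim = "---\n" in
  let dl = String.length delim in
  let n = String.length s in
  let delim_at i = i + dl <= n && String.sub s i dl = delim in
  (* The closing delimiter must start at the beginning of a line. *)
  let rec find_close i =
    if i + dl > n then None
    else if s.[i - 1] = '\n' && delim_at i then Some i
    else find_close (i + 1)
  in
  if not (delim_at 0) then (0, "", 0, s)
  else
    match find_close dl with
    | None -> (0, "", 0, s)
    | Some close ->
      let body_off = close + dl in
      (dl, String.sub s dl (close - dl), body_off,
       String.sub s body_off (n - body_off))

(* Run one parser on the header and another on the body, shifting positions
   so they refer to the original file. *)
let chain ~header_parser ~body_parser (s : string) : link list =
  let shift off = List.map (fun l -> { l with pos = l.pos + off }) in
  let h_off, header, b_off, body = split_front_matter s in
  shift h_off (header_parser header) @ shift b_off (body_parser body)
```

Keeping positions relative to the whole file means the results could later feed a location-preserving annotation step.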

Refactor code

Tasks

Async requests

Making all the requests in sequence is way too slow. This needs to be parallelized.
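A minimal sketch of running the checks in parallel, assuming OCaml 5, where Domain is part of the standard library. chunk and parallel_map are hypothetical helpers, and f stands in for whatever function checks a single link. Since link checking is I/O-bound, cooperative concurrency such as Lwt's Lwt_list.map_p (or a bounded variant of it) may well be the better fit in practice; domains are used here only because they need no extra dependency:

```ocaml
(* Split [xs] into at most [n] chunks of roughly equal size, keeping order. *)
let chunk n xs =
  let n = max 1 n in
  let size = max 1 ((List.length xs + n - 1) / n) in
  let rec go acc cur k = function
    | [] -> List.rev (match cur with [] -> acc | _ -> List.rev cur :: acc)
    | x :: rest ->
      if k = size then go (List.rev cur :: acc) [ x ] 1 rest
      else go acc (x :: cur) (k + 1) rest
  in
  go [] [] 0 xs

(* Apply [f] to every element, running each chunk on its own domain.
   Results come back in the original order. *)
let parallel_map ?(domains = 4) f xs =
  chunk domains xs
  |> List.map (fun c -> Domain.spawn (fun () -> List.map f c))
  |> List.concat_map Domain.join
```

Each domain still checks its chunk sequentially, so the speed-up is bounded by the number of domains; an Lwt-based version would instead keep many requests in flight on a single thread.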

Update README

Document all the functionality and command line options in the README.

Nested parsers

Some files contain nested formats, e.g. markdown inside YAML or HTML inside XML. Nested parsers should allow us to handle these, given a function that identifies the text in the "inner" format.
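A minimal sketch of combining an outer and an inner parser, assuming every parser returns links with byte offsets and that the identifying function returns (offset, text) pairs for the inner-format regions. The link type and the nested combinator are hypothetical:

```ocaml
(* Hypothetical link record: byte offset within the whole file, plus URL. *)
type link = { pos : int; url : string }

(* Combine an outer parser with an inner one. [find_inner] returns the
   regions of [s] written in the inner format as (offset, text) pairs; links
   the inner parser finds there are shifted by the region's offset so every
   position refers to the whole file. Results are sorted by position. *)
let nested ~outer ~find_inner ~inner (s : string) : link list =
  let inner_links =
    find_inner s
    |> List.concat_map (fun (off, text) ->
           List.map (fun l -> { l with pos = l.pos + off }) (inner text))
  in
  List.sort (fun a b -> compare a.pos b.pos) (outer s @ inner_links)
```

Whether the outer parser should skip the inner regions is left to it; this sketch simply merges both result lists.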

Output formats

Several output formats should be considered.

  • Text input -> list of (faulty link * position)
  • Structured input -> list of faulty links * function of type int -> link -> structured data

Here is what it could look like:

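  (* Proposed API: the broken links plus a function to substitute links back. *)
  let broken_links, f = Olinkcheck.parse_and_search data in
  let fixed_links = List.map (fun link -> ... ) broken_links in
  (* Substitute each fixed link back, in order, starting at link index 0. *)
  snd (List.fold_left f (0, data) fixed_links)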

The invariant of such an API would be:

  snd (List.fold_left f (0, data) broken_links) = data

For markdown, the corresponding signature could be:

  val check_md : md -> (http_status * url) list * (int -> url -> md)
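A minimal sketch of the fold-based variant on plain strings, to check that the invariant can hold. It assumes f has type int * string -> string -> int * string so that it fits List.fold_left directly (the check_md signature above returns a different function type), and it uses a toy link finder in place of a real parser; find_links and this parse_and_search are hypothetical:

```ocaml
(* Toy link finder: a link is any run of characters starting with "http" and
   ending before a space, newline or ')'. Returns (offset, url) pairs. *)
let find_links (s : string) : (int * string) list =
  let n = String.length s in
  let is_end c = c = ' ' || c = '\n' || c = ')' in
  let rec scan i acc =
    if i >= n then List.rev acc
    else if i + 4 <= n && String.sub s i 4 = "http" then begin
      let j = ref i in
      while !j < n && not (is_end s.[!j]) do incr j done;
      scan !j ((i, String.sub s i (!j - i)) :: acc)
    end
    else scan (i + 1) acc
  in
  scan 0 []

(* Returns the links and a function for List.fold_left:
   [f (i, data) link] replaces the i-th link of [data] with [link]. *)
let parse_and_search (original : string) =
  let links = Array.of_list (find_links original) in
  let f (i, data) new_link =
    let start0, old = links.(i) in
    (* Links are replaced left to right, so the total length change so far
       comes entirely from earlier links and is the shift for this one. *)
    let start = start0 + String.length data - String.length original in
    let stop = start + String.length old in
    let data' =
      String.sub data 0 start ^ new_link
      ^ String.sub data stop (String.length data - stop)
    in
    (i + 1, data')
  in
  (Array.to_list (Array.map snd links), f)
```

Folding f over the unchanged links reproduces the input exactly, which is the invariant stated above; folding over annotated links touches only the link spans.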

Use at least omd 2.0.0

You probably didn't even notice that opam installed an obsolete version of omd. There is no Omd_representation module in the new version.
