
olinkcheck's People

Contributors

cuihtlauac, sabine, shreyas-21


olinkcheck's Issues

Nested parsers

Some files contain nested formats, e.g. Markdown inside YAML or HTML inside XML. Nested parsers should let us handle these, given a function that identifies the text in the "inner" format.
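A minimal sketch of the idea, assuming a "parser" is simply a function from a string to the links it contains (the real olinkcheck types are not shown in this issue). The `nested` combinator, the toy `yaml_links` and `md_links` parsers, and the `body_fields` identifier are all hypothetical:

```ocaml
(* Hypothetical: a parser maps a document to the links it contains. *)
type parser = string -> string list

(* Run [outer] on the whole text, then run [inner] on every fragment that
   [extract_inner] identifies as being in the inner format. *)
let nested ~(outer : parser) ~(inner : parser)
    ~(extract_inner : string -> string list) : parser =
 fun text -> outer text @ List.concat_map inner (extract_inner text)

(* Split text into words on newlines and spaces. *)
let words s =
  String.split_on_char '\n' s |> List.concat_map (String.split_on_char ' ')

(* Toy outer ("YAML") parser: bare words that start with "http". *)
let yaml_links : parser =
 fun s -> List.filter (fun w -> String.starts_with ~prefix:"http" w) (words s)

(* Toy inner ("Markdown") parser: words of the form [label](url). *)
let md_links : parser =
 fun s ->
  List.filter_map
    (fun w ->
      match String.index_opt w '(' with
      | Some i when i > 0 && w.[i - 1] = ']' && String.ends_with ~suffix:")" w ->
          Some (String.sub w (i + 1) (String.length w - i - 2))
      | _ -> None)
    (words s)

(* Hypothetical inner-text identifier: the value of every "body: " line. *)
let body_fields s =
  let p = "body: " in
  String.split_on_char '\n' s
  |> List.filter_map (fun line ->
         if String.starts_with ~prefix:p line then
           Some (String.sub line (String.length p)
                   (String.length line - String.length p))
         else None)

let check_yaml_with_md =
  nested ~outer:yaml_links ~inner:md_links ~extract_inner:body_fields
```

With these definitions, `check_yaml_with_md "url: https://a.org\nbody: see [docs](https://b.org) here"` returns `["https://a.org"; "https://b.org"]`: the bare URL comes from the outer pass and the Markdown link from the inner one.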

Use at least omd 2.0.0

You probably didn't even notice that opam installed an obsolete version of omd. There's no Omd_representation module in the new version.

Update README

Document all the functionality and command line options in the README.

Chained parsers

Most md files in ocaml.org have a YAML header followed by either markdown or HTML. A chained parser should allow us to handle two different formats in the same file given a regex to separate them.
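A sketch of the split step, assuming the header is delimited by `---` lines as in typical front matter. A fixed delimiter line stands in for the regex mentioned above, so only the standard library is needed; `split_front_matter` and `chained` are hypothetical names, and `\n` line endings are assumed:

```ocaml
(* Split a document of the form
     ---
     <yaml header>
     ---
     <body>
   into (header, body). Without front matter, the header is empty. *)
let split_front_matter (doc : string) : string * string =
  match String.split_on_char '\n' doc with
  | "---" :: rest ->
      let rec go acc = function
        | [] -> ("", doc) (* unterminated header: treat everything as body *)
        | "---" :: body ->
            (String.concat "\n" (List.rev acc), String.concat "\n" body)
        | line :: tl -> go (line :: acc) tl
      in
      go [] rest
  | _ -> ("", doc)

(* Chain two parsers: one for the header, one for the body. *)
let chained ~header_parser ~body_parser doc =
  let header, body = split_front_matter doc in
  header_parser header @ body_parser body
```

For example, `split_front_matter "---\ntitle: x\n---\n# Hi"` returns `("title: x", "# Hi")`, and a document without a leading `---` line is passed to the body parser unchanged.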

Async requests

Making all the requests in sequence is way too slow. This needs to be parallelized.
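For I/O-bound HTTP checks, a cooperative concurrency library such as Lwt or Eio would be the usual choice. To stay dependency-free, the sketch below instead spreads the checks over OCaml 5 domains (it needs OCaml >= 5.0). `check_url` is a hypothetical stub standing in for the real request, and `parallel_map` is likewise illustrative:

```ocaml
(* Hypothetical stand-in for the real HTTP request: returns a status code
   without touching the network. *)
let check_url (url : string) : int =
  if String.ends_with ~suffix:"/does-not-exist" url then 404 else 200

(* Apply [f] to every element of [xs] using up to [domains] domains.
   Element i is handled by domain (i mod domains); each domain writes only
   its own cells of [results], so the output keeps the input order. *)
let parallel_map ?(domains = 4) (f : 'a -> 'b) (xs : 'a list) : 'b list =
  let input = Array.of_list xs in
  let n = Array.length input in
  if n = 0 then []
  else begin
    let domains = max 1 (min domains n) in
    let results = Array.make n None in
    let worker k () =
      let i = ref k in
      while !i < n do
        results.(!i) <- Some (f input.(!i));
        i := !i + domains
      done
    in
    let spawned = List.init domains (fun k -> Domain.spawn (worker k)) in
    List.iter Domain.join spawned;
    Array.to_list results |> List.map Option.get
  end
```

`parallel_map check_url urls` then returns the statuses in the same order as `urls`. Since each request mostly waits on the network, the real speed-up would come from overlapping those waits, which is what an Lwt- or Eio-based client does more cheaply than one domain per worker.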

PR in ocamlorg

  • Add dependency (as a pin, not an opam repo thing) from ocamlorg to olinkcheck
  • Add a makefile target to launch the tests

Use the AST-based API to annotate broken links

Currently, every format has an annotate_in_str function that takes a string and parses it to extract the links. However, writing the broken links, annotated with their status, back into the source (e.g. turning [link](http://www.google.com/does-not-exist) into [link](http://www.google.com/does-not-exist - [404 Not Found])) is done directly on the source string using regular expression matching.
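The string-based approach can be sketched as follows. A plain substring search stands in for the regular expression, and `annotate_url` is a hypothetical name, not the project's `annotate_in_str`:

```ocaml
(* Replace every occurrence of [pattern] in [s] with [by]. *)
let replace_all ~pattern ~by s =
  let plen = String.length pattern in
  let buf = Buffer.create (String.length s) in
  let rec go i =
    if i > String.length s - plen then
      Buffer.add_substring buf s i (String.length s - i)
    else if String.sub s i plen = pattern then (
      Buffer.add_string buf by;
      go (i + plen))
    else (
      Buffer.add_char buf s.[i];
      go (i + 1))
  in
  if plen = 0 then s else (go 0; Buffer.contents buf)

(* Hypothetical: rewrite [label](url) as [label](url - [status]). *)
let annotate_url ~url ~status s =
  replace_all ~pattern:("(" ^ url ^ ")")
    ~by:("(" ^ url ^ " - [" ^ status ^ "])") s
```

Because it matches raw text rather than parsed links, such a replacement would also rewrite the same URL where it merely appears inside a code block, which is one reason an AST- or location-based approach is preferable.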
Ideally, this would be done by replacing the links in the AST and converting the AST back to a string, but this poses some challenges:

  • Not all parsers may support converting their AST back to a string
  • Even if they do, they may discard whitespace that does not change the semantics. For example, in a Markdown file
# Heading 1




# Heading 2

is the same as

# Heading 1
# Heading 2

for most practical purposes. However, one of our use cases is to create a PR with the broken links annotated, and a parser that discards such characters would produce a noisy diff full of whitespace changes.

Ideally, every parser we use should give us some location information so that we could write a to_string preserving the original structure, but this is currently not the case.
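If a parser did report where each link's URL sits in the source, annotating could splice text in at those offsets and copy every other byte verbatim, so no whitespace is lost. A minimal sketch; the `located_link` record (byte offset, length, status) is a hypothetical shape for such location information, and `annotate_by_location` is an illustrative name:

```ocaml
(* Hypothetical location info: [start] is the byte offset of the URL in the
   source and [len] its length. *)
type located_link = { start : int; len : int; status : string }

(* Insert " - [status]" right after each located URL and copy every other
   byte of [src] unchanged. Assumes the links do not overlap. *)
let annotate_by_location (src : string) (links : located_link list) : string =
  let links = List.sort (fun a b -> compare a.start b.start) links in
  let buf = Buffer.create (String.length src + 32) in
  let pos =
    List.fold_left
      (fun pos l ->
        let stop = l.start + l.len in
        Buffer.add_substring buf src pos (stop - pos);
        Buffer.add_string buf (" - [" ^ l.status ^ "]");
        stop)
      0 links
  in
  Buffer.add_substring buf src pos (String.length src - pos);
  Buffer.contents buf
```

For instance, with the source `"x\n\n\n\n[l](u)"` and a link at offset 9 of length 1, the result is `"x\n\n\n\n[l](u - [404])"`: the blank lines are kept exactly, so the resulting diff only touches the annotated link.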

Refactor code

Tasks

Output formats

Several output formats should be considered.

  • Text input -> list of (faulty links * position)
  • Structured input -> list of faulty links * function of type int -> link -> structured data
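A sketch of the first, text-based format, assuming links are Markdown-style `[label](url)` targets and positions are byte offsets of the URL. `links_with_positions`, `faulty_links` and the `is_broken` predicate (standing in for the real HTTP check) are hypothetical:

```ocaml
(* Find every "](url)" target in [text]; return (url, byte offset of url). *)
let links_with_positions (text : string) : (string * int) list =
  let n = String.length text in
  let rec scan i acc =
    if i + 1 >= n then List.rev acc
    else if text.[i] = ']' && text.[i + 1] = '(' then
      match String.index_from_opt text (i + 2) ')' with
      | Some j -> scan (j + 1) ((String.sub text (i + 2) (j - i - 2), i + 2) :: acc)
      | None -> List.rev acc
    else scan (i + 1) acc
  in
  scan 0 []

(* Text input -> list of (faulty link * position). *)
let faulty_links ~(is_broken : string -> bool) (text : string) =
  List.filter (fun (url, _) -> is_broken url) (links_with_positions text)
```

For example, `faulty_links ~is_broken:(fun u -> u = "http://b.org/x") "see [a](http://a.org) and [b](http://b.org/x)"` returns `[("http://b.org/x", 30)]`.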

Here is what it could look like:

  let broken_links, f = Olinkcheck.parse_and_search data in
  let fixed_links = List.map (fun link -> ... ) broken_links in
  snd (List.fold_left f (0, data) fixed_links)

The invariant of such an API would be:

  snd (List.fold_left f (0, data) (List.map snd broken_links)) = data

  val check_md : md -> (http_status * url) list * (int * md -> url -> int * md)
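A toy implementation of the structured variant, to make the invariant concrete. Here a document is a plain string, links are `[label](url)` targets, and `status` is a stub instead of a real HTTP check; `url_spans`, `replace_nth` and `check` are hypothetical names. The folding function takes an `(index, document)` accumulator, as `List.fold_left f (0, data)` requires, and rewrites the index-th broken link:

```ocaml
(* Byte ranges (start, stop) of every link target, in document order;
   [stop] is the index of the closing ')'. *)
let url_spans (doc : string) : (int * int) list =
  let n = String.length doc in
  let rec scan i acc =
    if i + 1 >= n then List.rev acc
    else if doc.[i] = ']' && doc.[i + 1] = '(' then
      match String.index_from_opt doc (i + 2) ')' with
      | Some j -> scan (j + 1) ((i + 2, j) :: acc)
      | None -> List.rev acc
    else scan (i + 1) acc
  in
  scan 0 []

(* Replace the [k]-th link target of [doc] with [url]. Assumes [url]
   contains no ')' or "](", so rescanning finds the same links. *)
let replace_nth doc k url =
  match List.nth_opt (url_spans doc) k with
  | None -> doc
  | Some (start, stop) ->
      String.sub doc 0 start ^ url ^ String.sub doc stop (String.length doc - stop)

(* Return the broken links with their status, plus a folding function:
   [f (k, doc) url] replaces the k-th broken link with [url]. Calling it
   more times than there are broken links raises Invalid_argument. *)
let check ~(status : string -> int) (doc : string) =
  let all =
    url_spans doc |> List.mapi (fun i (a, b) -> (i, String.sub doc a (b - a)))
  in
  let broken = List.filter (fun (_, u) -> status u >= 400) all in
  let positions = Array.of_list (List.map fst broken) in
  let f (k, d) url = (k + 1, replace_nth d positions.(k) url) in
  (List.map (fun (_, u) -> (status u, u)) broken, f)
```

Folding the unchanged URLs of the broken links gives back the original document, which is the invariant; folding corrected URLs rewrites only the broken targets.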