link-parse's People

Contributors

mahanama94

link-parse's Issues

Parsing link-formatted TimeMaps almost works with RegexLinkParser, but not quite

One of the largest problems we have is parsing link-formatted TimeMaps. The RegexLinkParser gets farther than any other implementation I've seen, besides the solution I implemented in AIU, and even AIU has issues with the occasional TimeMap.

It parses a TimeMap into the appropriate relationships, but with the following unexpected behaviors:

  • it appends a newline and the indentation whitespace that follows it to every URI except the first (for example, 'http://arxiv.example.net/timegate/http://a.example.org\n      '); a user can strip this off, but will not expect to have to
  • it includes a title key in every resulting dict, even when the value is empty; I realize that title is part of RFC 5988, but link-parse does not produce empty entries for anchor, rev, or the other link parameters listed in that RFC, so this is inconsistent and confusing
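Until this is fixed, a caller could post-process each result. The following is only a sketch of that workaround, not part of link-parse: `clean_link` is a hypothetical helper name, and it assumes the input is a plain dict like the `result.__dict__` values printed further down.

```python
def clean_link(record):
    """Return a copy of a parsed link dict with string values stripped
    of surrounding whitespace and empty entries dropped.

    Hypothetical helper for illustration; not part of link-parse.
    """
    cleaned = {}
    for key, value in record.items():
        if isinstance(value, str):
            value = value.strip()
        if value is None or value == "":
            continue  # skip parameters the link did not carry
        cleaned[key] = value
    return cleaned


# One entry copied from the parser output reported in this issue.
raw = {
    "datetime": "",
    "link_from": "",
    "link_type": "",
    "link_until": "",
    "relationship": "timegate",
    "title": "",
    "uri": "http://arxiv.example.net/timegate/http://a.example.org\n      ",
}

print(clean_link(raw))
```

With the parser from the session below, this would be applied as `clean_link(result.__dict__)` for each result.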

Here's what I did with RegexLinkParser:

  1. start ipython
  2. import the RegexLinkParser
  3. copy the TimeMap from Figure 28 of RFC 7089
  4. paste the TimeMap into a variable named timemap
  5. parse it with RegexLinkParser
  6. print the results as shown in the Usage section of the link-parse README

Here's the code in ipython:

In [1]: from linkparse.regex_parser import RegexLinkParser

In [2]: timemap = """  <http://a.example.org>;rel="original",
   ...:     <http://arxiv.example.net/timemap/http://a.example.org>
   ...:       ; rel="self";type="application/link-format"
   ...:       ; from="Tue, 20 Jun 2000 18:02:59 GMT"
   ...:       ; until="Wed, 09 Apr 2008 20:30:51 GMT",
   ...:     <http://arxiv.example.net/timegate/http://a.example.org>
   ...:       ; rel="timegate",
   ...:     <http://arxiv.example.net/web/20000620180259/http://a.example.org>
   ...:       ; rel="first memento";datetime="Tue, 20 Jun 2000 18:02:59 GMT"
   ...:       ; license="http://creativecommons.org/publicdomain/zero/1.0/",
   ...:     <http://arxiv.example.net/web/20091027204954/http://a.example.org>
   ...:        ; rel="last memento";datetime="Tue, 27 Oct 2009 20:49:54 GMT"
   ...:        ; license="http://creativecommons.org/publicdomain/zero/1.0/",
   ...:     <http://arxiv.example.net/web/20000621011731/http://a.example.org>
   ...:       ; rel="memento";datetime="Wed, 21 Jun 2000 01:17:31 GMT"
   ...:       ; license="http://creativecommons.org/publicdomain/zero/1.0/",
   ...:     <http://arxiv.example.net/web/20000621044156/http://a.example.org>
   ...:       ; rel="memento";datetime="Wed, 21 Jun 2000 04:41:56 GMT"
   ...:       ; license="http://creativecommons.org/publicdomain/zero/1.0/",
   ...:       """

In [3]: parser = RegexLinkParser()

In [4]: parser_results = parser.parse(timemap)

In [5]: from pprint import pprint

In [6]: for result in parser_results:
   ...:     pprint(result.__dict__)
   ...:
{'datetime': '',
 'link_from': '',
 'link_type': '',
 'link_until': '',
 'relationship': 'original',
 'title': '',
 'uri': 'http://a.example.org'}
{'datetime': '',
 'link_from': 'Tue, 20 Jun 2000 18:02:59 GMT',
 'link_type': 'application/link-format',
 'link_until': 'Wed, 09 Apr 2008 20:30:51 GMT',
 'relationship': 'self',
 'title': '',
 'uri': 'http://arxiv.example.net/timemap/http://a.example.org\n      '}
{'datetime': '',
 'link_from': '',
 'link_type': '',
 'link_until': '',
 'relationship': 'timegate',
 'title': '',
 'uri': 'http://arxiv.example.net/timegate/http://a.example.org\n      '}
{'datetime': 'Tue, 20 Jun 2000 18:02:59 GMT',
 'link_from': '',
 'link_type': '',
 'link_until': '',
 'relationship': 'first memento',
 'title': '',
 'uri': 'http://arxiv.example.net/web/20000620180259/http://a.example.org\n'
        '      '}
{'datetime': 'Tue, 27 Oct 2009 20:49:54 GMT',
 'link_from': '',
 'link_type': '',
 'link_until': '',
 'relationship': 'last memento',
 'title': '',
 'uri': 'http://arxiv.example.net/web/20091027204954/http://a.example.org\n'
        '       '}
{'datetime': 'Wed, 21 Jun 2000 01:17:31 GMT',
 'link_from': '',
 'link_type': '',
 'link_until': '',
 'relationship': 'memento',
 'title': '',
 'uri': 'http://arxiv.example.net/web/20000621011731/http://a.example.org\n'
        '      '}
{'datetime': 'Wed, 21 Jun 2000 04:41:56 GMT',
 'link_from': '',
 'link_type': '',
 'link_until': '',
 'relationship': 'memento',
 'title': '',
 'uri': 'http://arxiv.example.net/web/20000621044156/http://a.example.org\n'
        '      '}
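For comparison, here is a minimal sketch of the output I would expect: clean URIs, and only the parameters that are actually present on each link. This is not how RegexLinkParser is implemented. The names `LINK_RE`, `PARAM_RE`, and `parse_link_format` are made up for this example, and it assumes every parameter value is double-quoted, as in the RFC 7089 example above.

```python
import re

# One link-value: <URI> followed by zero or more ;name="value" parameters.
# Whitespace (including newlines) is allowed around ';' and '='.
LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*"[^"]*")*)')
PARAM_RE = re.compile(r';\s*([\w-]+)\s*=\s*"([^"]*)"')


def parse_link_format(text):
    """Parse link-formatted text into a list of dicts.

    Each dict has a 'uri' key plus one key per parameter found on that
    link. Sketch only: unquoted parameter values are not handled.
    """
    links = []
    for uri, params in LINK_RE.findall(text):
        link = {"uri": uri.strip()}
        for name, value in PARAM_RE.findall(params):
            link[name] = value
        links.append(link)
    return links


# The first two entries of the TimeMap used above.
sample = """  <http://a.example.org>;rel="original",
    <http://arxiv.example.net/timemap/http://a.example.org>
      ; rel="self";type="application/link-format"
      ; from="Tue, 20 Jun 2000 18:02:59 GMT"
      ; until="Wed, 09 Apr 2008 20:30:51 GMT",
"""

links = parse_link_format(sample)
for link in links:
    print(link)
```

Because quoted values are matched as a unit, the commas inside dates such as "Tue, 20 Jun 2000 18:02:59 GMT" are not mistaken for link separators.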
