microformats / microformats-ruby

Ruby gem that parses HTML containing microformats/microformats2 and returns Ruby objects, a Ruby hash, or a JSON hash

Home Page: https://rubygems.org/gems/microformats

License: Creative Commons Zero v1.0 Universal

Ruby 75.94% JavaScript 9.86% HTML 14.21%
indieweb microformat ruby parsing rubygems

microformats-ruby's People

Contributors

adactio, barnabywalters, brimil01, calebhearth, ckruse, cweiske, dissolve, dpetersen, hugopeixoto, jeena, jessicard, jgarber623, jlsuttles, martymcguire, olleolleolle, quady, shleeable, terreii, tommorris, veganstraightedge, ykzts

microformats-ruby's Issues

Add deprecation warnings

Add deprecation warnings and forward compatible versions of functions so users can upgrade their function calls early.
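
One common way to do this in Ruby is to keep the old method as a thin wrapper that warns and forwards to the new one. The sketch below is illustrative only: the module and the method names `parse_html` and `parse` are hypothetical and are not taken from the gem.

```ruby
# Hypothetical module; `parse_html` (old) and `parse` (new) are
# illustrative names, not the gem's actual API.
module ExampleParser
  # New, forward-compatible entry point.
  def self.parse(html)
    "parsed #{html.length} bytes"
  end

  # Old entry point kept as a wrapper: it prints a warning to stderr
  # and then forwards to the new method, so results stay identical.
  def self.parse_html(html)
    warn "[DEPRECATION] `parse_html` is deprecated; use `parse` instead."
    parse(html)
  end
end
```

Callers can then switch to the new name at any time before the old one is removed.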

Implement nested microformat without associated property

http://microformats.org/wiki/microformats-2#h-card_org_h-card

A microformat that is not attached to a specific property:

<div class="h-card">
  <a class="p-name u-url"
     href="http://blog.lizardwrangler.com/" 
    >Mitchell Baker</a> 
  (<a class="h-card" 
      href="http://mozilla.org/"
     >Mozilla Foundation</a>)
</div>
{
  "items": [{ 
    "type": ["h-card"],
    "properties": {
      "name": ["Mitchell Baker"],
      "url": ["http://blog.lizardwrangler.com/"]
    },
    "children": [{
      "type": ["h-card"],
      "properties": {
        "name": ["Mozilla Foundation"],
        "url": ["http://mozilla.org/"]
      }  
    }]
  }]
}
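
For consumers of the parsed output, the nested h-card would then be reachable through the "children" key rather than through a property. A minimal sketch, working on the JSON above as a plain Ruby hash (the constant and helper names are made up for illustration, and nothing here uses the gem itself):

```ruby
require "json"

# The expected output from above, as a JSON string.
EXAMPLE_JSON = <<~JSON
  {
    "items": [{
      "type": ["h-card"],
      "properties": {
        "name": ["Mitchell Baker"],
        "url": ["http://blog.lizardwrangler.com/"]
      },
      "children": [{
        "type": ["h-card"],
        "properties": {
          "name": ["Mozilla Foundation"],
          "url": ["http://mozilla.org/"]
        }
      }]
    }]
  }
JSON

# Returns the first "name" value of every child of the first top-level
# item. Items without a "children" key yield an empty array.
def child_names(result)
  first_item = result["items"].first
  first_item.fetch("children", []).map do |child|
    child.dig("properties", "name", 0)
  end
end
```

Calling `child_names(JSON.parse(EXAMPLE_JSON))` walks the structure without assuming the child is attached to any property of the outer h-card.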

Add spec for "p-foo h-foo" classes on same element

I don't think this case works right now.

<!DOCTYPE html>
<html>
<body>
  <div class="h-card">
    <div class="p-adr h-adr">
      <span class="p-street-address">
        123 Rainbow Lane
      </span>
    </div>
  </div>
</body>
</html>

collection.entry.author.name.to_s gives a NoMethodError: undefined method `name' error

I'm not sure how to use it. It seems to have all the data parsed, because I can call .to_json and the output looks right, but I can't get the data out. I am trying to get the author name, the author's URL, the content, and the h-entry URL, so nothing fancy.

I am trying to do it with: https://brid-gy.appspot.com/like/twitter/jeena/424554756917702656/109427493

I know it should be possible because this looks exactly like I want to have it: http://indiewebify.me/validate-h-entry/?url=https%3A%2F%2Fbrid-gy.appspot.com%2Flike%2Ftwitter%2Fjeena%2F424554756917702656%2F109427493

collection.entry.content.to_s works fine (I get "favorited this."), and so does collection.entry.url.to_s. But when I try to get the author, I get tag:twitter.com,2013:schnarfed, which is understandable since it is the u-id. However, I can't call collection.entry.author.name.to_s because I get:

NoMethodError: undefined method `name' for #<Microformats2::Property::Text:0x007fbbfc88ac60>

The same happens with collection.entry.author.url.to_s. So either I don't understand how this should be called or something is not working. Could someone help me out here?

Don't add time precision unless authored

HTML:

<div class="h-entry">
  <time class="dt-published" value="2018-01-12">January 12</time>
</div>

Expected result:

{
  "items": [
    {
      "type": [
        "h-entry"
      ],
      "properties": {
        "published": [
          "2018-01-12"
        ],
        "name": [
          "January 12"
        ]
      }
    }
  ],
  "rels": {
  },
  "rel-urls": {
  }
}

Current result:

{
  "items": [
    {
      "type": [
        "h-entry"
      ],
      "properties": {
        "published": [
          "2018-01-12 00:00:00"
        ],
        "name": [
          "January 12"
        ]
      }
    }
  ],
  "rels": {
  },
  "rel-urls": {
  }
}

http://microformats.org/wiki/value-class-pattern#Date_and_time_parsing
http://microformats.org/wiki/value-class-pattern##If+by+parsing+the+%22value%22+element(s)
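
A rough sketch of the intended behaviour: a date-only value is validated but never padded with a time. The helper name `format_published` is hypothetical, and this is not how the gem is implemented; it only illustrates keeping the authored precision.

```ruby
require "date"

DATE_ONLY = /\A\d{4}-\d{2}-\d{2}\z/

# Hypothetical helper: returns the authored value with its original
# precision. Date-only values are checked with Date.iso8601 and emitted
# as a date again, so no "00:00:00" is ever appended. Anything else is
# passed through unchanged.
def format_published(authored)
  value = authored.strip
  return Date.iso8601(value).iso8601 if value.match?(DATE_ONLY)

  value
end
```

With this approach `format_published("2018-01-12")` stays a plain date, matching the expected result above.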

Change gem name to microformats

  • write 3.1 post install note that future releases will be on the microformats gem
  • include explicit install / uninstall instructions:
The name of the Microformats Ruby Parser is changing from "microformats2" to "microformats". This is a one time change. (Thanks to @chrisjpowers for transferring the namespace to us.)

Follow these instructions to migrate.

1. Install the new gem. Uninstall the old gem.

    gem install microformats
    gem uninstall microformats2


2. Change any Gemfiles from:

    gem "microformats2"

to

    gem "microformats"
    bundle


3. Change any requires from:

    require "microformats2"

to 

    require "microformats"
  • release 3.1
  • change the release target from microformats2 to microformats
  • release 4.0 of microformats (exact same code base as 3.1, except for post install note)
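
Code that has to run during the transition, with either gem installed, could try the new name first and fall back to the old one. This is only a sketch; the helper `require_first` is a made-up name and not part of either gem.

```ruby
# Tries each library name in order and returns the first one that loads,
# or nil if none of them can be required.
def require_first(*names)
  names.each do |name|
    begin
      require name
      return name
    rescue LoadError
      next # not installed, try the next candidate
    end
  end
  nil
end

# Intended use during the migration (commented out so the sketch runs
# without either gem installed):
# loaded = require_first("microformats", "microformats2")
```

Note that the return value only says which file was loaded; the calling code would still have to account for any difference in the top-level module name between the two gems.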

Supported Ruby versions?

Based on some of the conversation in #84

Which versions of Ruby should we continue supporting?

Some data:

  • Ruby 2.2.10 and 2.3.7 are both in security maintenance phase and will EOL soon (according to ruby-lang.org).
  • Ruby 2.4.4 and 2.5.1 are listed as stable releases.
  • Ruby 2.6.0 is in preview with (I believe) a target release date around the end of December 2018.

Some related questions:

  • For folks using microformats-ruby, what versions of Ruby are they using in their relying projects?
  • Who, if anyone, would be left behind if we restricted microformats-ruby to more recent Ruby versions (e.g. >2.4)?
  • Which version should we develop against? (apart from the versions we test against with Travis CI)

Return only fragment of page

Had the idea of parsing a page and pulling out only a specific comment, and wondered how that would work (assuming the comment isn't posted from somewhere else). The idea would be to give a URL that has a fragment, and the result items would contain anything from the element with that id and below.

Would have to look at how this would work exactly, would likely need the whole page for rels and base and such.
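
The first step, separating the page URL to fetch from the element id to filter on, can be done with Ruby's standard URI library. This is a sketch of that step only; the helper name `split_fragment` is made up, and filtering the parsed items by id is not shown.

```ruby
require "uri"

# Splits a URL into the page URL (without fragment) and the fragment.
# The fragment is nil when the URL has none.
def split_fragment(url)
  uri = URI.parse(url)
  fragment = uri.fragment
  uri.fragment = nil # the whole page is still needed for rels and base
  [uri.to_s, fragment]
end
```

For example, `split_fragment("https://example.com/post#comment-1")` gives the page URL to parse in full and the id `"comment-1"` to restrict the returned items to.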

Discussion: microformats-ruby logo

As part of #77, I originally introduced a logo for microformats-ruby. It's a derivation of the official logo mark (found on the Spread Microformats wiki page):

The Idea

Conceptually, this design marries the microformats logo mark with a color scheme similar to that found throughout the Ruby community. Basically: everything's red.

In the course of discussion on #77 (and in IRC/Slack), some licensing issues were brought up. Turns out those aren't an issue as the original author of the logo mark has dedicated it to the public domain, pursuant to the microformats wiki's licensing.

For Discussion

I'm opening this issue to track conversation around the logo:

  1. Do we add it to README.md as I initially did?
  2. Do we create a GitHub Pages site for the gem and use it there?
  3. Do we abandon this entirely? (totally fine with me, of course!)

Problems with the parser

I am having problems with the parser's output; it looks like there is a bug. I am parsing this website: http://snarfed.org/2014-02-03_re-barnaby-walters-latest-php-library-extracted-from-taproo. The PHP parser mf2 parses it as expected: http://pin13.net/mf2/?url=http://snarfed.org/2014-02-03_re-barnaby-walters-latest-php-library-extracted-from-taproo but with G5/microformats2 I get this JSON: https://jeena.net/t/mf2.json

You can see that h-entry here has two URLs:

    "url": [
      "http://snarfed.org/",
      "http://snarfed.org/2014-02-03_re-barnaby-walters-latest-php-library-extracted-from-taproo"
    ]

even though the first URL shouldn't be there, because it belongs either to the h-card at the top or to the author h-card in the middle.

Parse href and title attributes on <link> elements for u- and p- properties respectively.

The microformats parsing spec was updated yesterday to reflect a 2015 resolution on <link> elements. The Ruby parser should be updated to reflect this.

Example

An HTML document that uses an existing rel-canonical <link> element to communicate the page’s u-url and p-name:

<!doctype html>
<html class="h-entry">
  <head>
    <link rel="canonical" class="u-url p-name" href="https://example.com/" title="Example.com homepage">
  </head>
  <body></body>
</html>

Expected parser output (rels and rel-urls have been left out):

{
  "items": [
    {
      "type": [ "h-entry" ],
      "properties": {
        "name": [ "Example.com homepage" ], 
        "url": [ "https://example.com/" ]
      }
    }
  ]
}

Live example

CI doesn't accurately test everything

Version 4.0.8 was released and passed all CI tests, but it failed when tested on a different system because 'require "set"' was not in the code base. Why do all tests pass despite this?

No author URL in 4.0.9 (Breaking change)

undefined method `url' for #<Microformats::ParserResult:0x00005556e27ddaa0>

when accessing mf2.entry.comment.author.url in 4.0.9. This is a breaking change from 4.0.7 where it worked fine.

blank value not returning blank string

<article class="h-entry"> <data class="p-content" value="">This should not be the value of the content attribute</data></article>

This should have content=[""].
Instead, it's not returning content at all.

via @aaronpk
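
Whatever the actual cause in the gem, the distinction that matters is between an attribute that is present but empty and one that is missing. A minimal sketch, where `attributes` is a plain hash standing in for an element's attributes and `data_value` is a hypothetical helper:

```ruby
# Hypothetical helper: picks the value for a <data> element.
# `attributes` is a plain hash standing in for the element's attributes.
def data_value(attributes, text)
  # key? distinguishes value="" (present, empty) from a missing attribute,
  # so an empty string is returned instead of falling back to the text.
  attributes.key?("value") ? attributes["value"] : text
end
```

A check such as `attributes["value"].empty?` would treat `value=""` like a missing attribute, which is the kind of test this sketch avoids.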

Implied name property for author is empty

The parser does not seem to implement implied properties correctly.

When parsing https://commentpara.de/comment/43.htm it does not fill the author name property with its value:

$ ./tmp/ruby-microformats2-test.rb https://commentpara.de/comment/43.htm
{
  "items": [
    {
      "type": [
        "h-entry"
      ],
      "properties": {
        "author": [
          {
            "value": "Anonymous",
            "type": [
              "h-card"
            ],
            "properties": {
              "photo": [
                "https://commentpara.de/img/anonymous.svg"
              ],
              "name": [
                ""
              ],
              "url": [
                "https://commentpara.de/user/3.htm"
              ]
            }
          }
        ],
        "url": [
          "https://commentpara.de/comment/43.htm"
        ],
        "content": [
          "Digikam is able to auto-tag images with the people that are on the photo. This already exists and is open source."
        ],
        "name": [
          "Digikam is able to auto-tag images with the people that are on the photo. This already exists and is open source."
        ]
      }
    }
  ],
  "rels": {
    "in-reply-to": [
      "https://jeena.net/notes/754"
    ],
    "u-in-reply-to": [
      "https://jeena.net/notes/754"
    ]
  }
}

filter out style tags

Style tags are not being removed properly.

The tag itself is removed, but the content of the style tag is left in the results.

Resolve relative URLs when parsing

When parsing a document, all URLs returned should be absolute URLs based off of the URL that is being parsed.

If running Microformats2.parse with a URL as an argument, just use that as the URL to compute relative URLs from. Otherwise, if parsing an HTML string directly, need an additional parameter to pass the URL into.
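
Ruby's standard library already covers the resolution step with `URI.join`. A minimal sketch, with a made-up helper name `absolutize` and the fallback behaviour chosen here as an assumption:

```ruby
require "uri"

# Resolves `href` against `base_url` and returns an absolute URL string.
# Falls back to the original href if either value cannot be handled as
# a URI (this fallback is an assumption, not specified behaviour).
def absolutize(base_url, href)
  URI.join(base_url, href).to_s
rescue URI::Error
  href
end
```

When parsing an HTML string directly, `base_url` would be the additional parameter mentioned above; values that are already absolute come back unchanged.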

Question: Vendor'ed test suite as git submodule?

Now that this gem's test suite is back in sync with microformats/tests, would it be possible to directly include the code in that repo into this project via a git submodule or some mechanism other than copy/paste?

Git submodules aren't without their overhead, but we could mitigate potential problems by updating some documentation with details on how to handle initial project check out, updating the test suite version, etc.

Configure Travis CI / Code Climate coverage reporting

In order to report coverage to Code Climate, someone with access to this project's Code Climate and Travis CI configuration will need to add a CC_TEST_REPORTER_ID environment variable. To do that…

  1. Navigate to the settings page on Code Climate for microformats-ruby (via https://codeclimate.com/github/indieweb/microformats-ruby). The URLs aren't predictable, but it'll look something like https://codeclimate.com/repos/<some-long-string>/edit.
  2. Navigate to the "Test Coverage" page (beneath "Analysis" on the left) and grab the "Test Reporter ID" value from that page.
  3. Over on Travis CI, navigate to the settings page (URL should be https://travis-ci.org/indieweb/microformats-ruby/settings).
  4. Scroll down to Environment Variables, fill in CC_TEST_REPORTER_ID for the name and paste in the value grabbed from Code Climate in step 2.
  5. Leave "display value in build log" OFF and click "Add."
  6. Cool! Test coverage should be submitted during the next build.

Improve code clarity

We are getting a Code Climate score of 0.7, which is pretty abysmal.
While I don't agree with all of the rules they use, there is a lot of work that can be done to improve the readability of the code base.

command line test tool

It'd be nice if there were a CLI tool that I could pass a URL to, and it would dump the extracted microformats.

Goal would be to have tool to quickly debug potential parsing problems.

How to install?

When I install the gem like you suggest with gem install microformats2 and then run the example code from this page (I have it in test.rb) then I am just getting an error. I see that it installs version 1.0.2 instead of 2.0.0, so how can I install version 2.x?

➜ jeena@Lala gem install microformats2
Successfully installed microformats2-1.0.2
Parsing documentation for microformats2-1.0.2
Done installing documentation for microformats2 after 0 seconds
1 gem installed
➜ jeena@Lala ruby test.rb
test.rb:6:in `<main>': undefined method `card' for #<Hash:0x007fe66bb2a3a0> (NoMethodError)

Breaking: Microformats.parse(url) explodes

require 'microformats'

doc = Microformats.parse("http://tantek.com")
puts JSON.pretty_generate(doc.to_h)

This worked before. Something changed / broke.

TypeError: no implicit conversion of nil into String

method value=	in parser.rb at line 35
method block in parse	in parser.rb at line 35
method traverse	in node.rb at line 585
method block in traverse	in node.rb at line 584
method block in each	in node_set.rb at line 187
method upto	in node_set.rb at line 186
method each	in node_set.rb at line 186
method traverse	in node.rb at line 584
method block in traverse	in node.rb at line 584
method block in each	in node_set.rb at line 187
method upto	in node_set.rb at line 186
method each	in node_set.rb at line 186
method traverse	in node.rb at line 584
method block in traverse	in node.rb at line 584
method block in each	in node_set.rb at line 187
method upto	in node_set.rb at line 186
method each	in node_set.rb at line 186
method traverse	in node.rb at line 584
method block in traverse	in node.rb at line 584
method block in each	in node_set.rb at line 187
method upto	in node_set.rb at line 186
method each	in node_set.rb at line 186
method traverse	in node.rb at line 584
method block in traverse	in node.rb at line 584
method block in each	in node_set.rb at line 187
method upto	in node_set.rb at line 186
method each	in node_set.rb at line 186
method traverse	in node.rb at line 584
method parse	in parser.rb at line 28
method parse	in microformats.rb at line 19
method <main>	in mf.rb at line 3

value-class-pattern parsing does not handle timezone in separate element

(See also microformats/php-mf2#126 for the same bug in php-mf2)

The following snippet has date, time and timezone offset in three different elements, using the value-class-pattern.

<div class="h-event">
 <span class="e-summary">HomebrewWebsiteClub Berlin</span> will be next on 
 <span class="dt-start">
  <span class="value">2017-05-31</span>, from
  <span class="value">19:00</span> (UTC<span class="value">+02:00</span>)
</span> to  <span class="dt-end">21:00</span>.</div>

I would have expected a value of 2017-05-31 19:00+02:00 for start, per http://microformats.org/wiki/value-class-pattern#Date_and_time_parsing

[…] the parser assembles the overall datetime value by concatenating the specific date, " " (space character) and specific time (if time was specified, with 00 minutes implied if no minutes are provided), and specific timezone (if timezone and a specific time was specified)

Instead the timezone is dropped:

"start": [ "2017-05-31 19:00" ]
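
The concatenation rule quoted above can be sketched as follows. This is not the gem's code: the helper `assemble_datetime` and the regular expressions are assumptions for illustration, and they only cover simple numeric values (no am/pm times).

```ruby
DATE_RE = /\A\d{4}-\d{2}-\d{2}\z/
TIME_RE = /\A\d{1,2}(:\d{2}){0,2}\z/
ZONE_RE = /\A(Z|[+-]\d{1,2}(:?\d{2})?)\z/

# Assembles date, time and timezone from the text of the "value" elements.
# Returns nil without a date, and the bare date when no time is given.
def assemble_datetime(values)
  date = values.find { |v| v.match?(DATE_RE) }
  time = values.find { |v| v.match?(TIME_RE) }
  zone = values.find { |v| v.match?(ZONE_RE) }
  return nil unless date
  return date unless time # a timezone without a time is ignored

  time += ":00" unless time.include?(":") # 00 minutes implied
  "#{date} #{time}#{zone}"
end
```

For the snippet in this issue, `assemble_datetime(["2017-05-31", "19:00", "+02:00"])` keeps the offset instead of dropping it.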
