php-mf2 is a pure, generic microformats-2 parser for PHP. It makes HTML as easy to consume as JSON.

License: Creative Commons Zero v1.0 Universal

PHP 87.34% HTML 12.55% Dockerfile 0.11%

microformats indieweb microformats2 php parser

php-mf2's Introduction

php-mf2

php-mf2 is a pure, generic microformats-2 parser. It makes HTML as easy to consume as JSON.

Instead of having a hard-coded list of all the different microformats, it follows a set of procedures to handle different property types (e.g. p- for plaintext, u- for URL, etc). This allows for a very small and maintainable parser.

Installation

There are two ways of installing php-mf2. We highly recommend installing php-mf2 using Composer. The rest of the documentation assumes that you have done so.

To install using Composer, run

composer require mf2/mf2

If you can’t or don’t want to use Composer, php-mf2 can be installed the old way by downloading /Mf2/Parser.php, adding it to your project and requiring it from any file that calls its functions, like this:

<?php

require_once 'Mf2/Parser.php';

// Now all the functions documented below are available, for example:
$mf = Mf2\fetch('https://waterpigs.co.uk');

We recommend installing the masterminds/html5 parser for proper handling of HTML5 elements. Using Composer, run

composer require masterminds/html5

If this library is added to your project, the php-mf2 parser will use it automatically instead of the built-in HTML parser.

Signed Code Verification

From v0.2.9, php-mf2’s version tags are signed using GPG, allowing you to cryptographically verify that you’re using code which hasn’t been tampered with. To verify the code you will need the GPG keys for one of the people in the list of code signers:

Barnaby Walters [email protected] 1C00 430B 19C6 B426 922F E534 BEF8 CE58 118A D524
Aaron Parecki [email protected] F384 12A1 55FB 8B15 B7DD 8E07 4225 2B5E 65CE 0ADD
Bear [email protected] 0A93 9BA7 8203 FCBC 58A9 E8B5 9D1E 0661 8EE5 B4D8

To import the relevant keys into your GPG keychain, execute the following command:

gpg --recv-keys 1C00430B19C6B426922FE534BEF8CE58118AD524 F38412A155FB8B15B7DD8E0742252B5E65CE0ADD 0A939BA78203FCBC58A9E8B59D1E06618EE5B4D8

Then verify the installed files like this:

# in your project root
cd vendor/mf2/mf2
git tag -v v0.3.0

If nothing went wrong, you should see the tag commit message, ending something like this:

gpg: Signature made Wed  6 Aug 10:04:20 2014 GMT using RSA key ID 2B2BBB65
gpg: Good signature from "Barnaby Walters <[email protected]>"
gpg:                 aka "[jpeg image of size 12805]"

Possible issues:

Git complains that there’s no such tag: check whether the source folder contains .git; odds are you have the prefer-dist setting enabled and Composer is just extracting a zip rather than checking out from git.
Git complains the gpg command doesn’t exist: If you successfully imported my key then you obviously do have gpg installed, but you might have gpg2, whereas git looks for gpg. Solution: tell git which binary to use: git config --global gpg.program 'gpg2'

Usage

php-mf2 is PSR-0 autoloadable, so simply include Composer’s auto-generated autoload file (/vendor/autoload.php) and you can start using it. These two functions cover most situations:

To fetch microformats from a URL, call Mf2\fetch($url)
To parse microformats from HTML, call Mf2\parse($html, $url), where $url is the URL from which $html was loaded, if any. This parameter is required for correct relative URL parsing and must not be left out unless parsing HTML which is not loaded from the web.

All parsing functions return a canonical microformats 2 representation of any microformats found on the page, as an array. For a general guide to safely and successfully processing parsed microformats data, see How to Consume Microformats 2 Data.

Examples

Fetching Microformats from a URL

<?php

namespace YourApp;

require __DIR__ . '/vendor/autoload.php';

use Mf2;

// (Above code (or equivalent) assumed in future examples)

$mf = Mf2\fetch('http://microformats.org');

// $mf is either a canonical mf2 array, or null on an error.
if (is_array($mf)) {
  foreach ($mf['items'] as $microformat) {
    // Note: in real code, never assume that a property exists, or that a particular property value is a string!
    echo "A {$microformat['type'][0]} called {$microformat['properties']['name'][0]}\n";
  }
}
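To follow that note in practice, here is a minimal sketch of a defensive accessor. firstPropertyString() is a hypothetical helper, not part of php-mf2. It assumes the canonical array shape shown in this README, where a property value is either a plain string or an array carrying a 'value' key. That covers e-* dictionaries, img values with alt text, and nested microformats.

```php
<?php

// Hypothetical helper, not part of php-mf2: returns the first value of a
// property as a string, or null if it is missing or not usable as a string.
function firstPropertyString(array $item, $name)
{
    $properties = isset($item['properties']) && is_array($item['properties']) ? $item['properties'] : [];
    $values = isset($properties[$name]) && is_array($properties[$name]) ? $properties[$name] : [];
    if (count($values) === 0) {
        return null;
    }
    $value = reset($values);
    if (is_string($value)) {
        return $value;
    }
    // e-* dictionaries, img {value, alt} structures and nested microformats
    // expose their plain-text or URL form under the 'value' key.
    if (is_array($value) && isset($value['value']) && is_string($value['value'])) {
        return $value['value'];
    }
    return null;
}

$item = [
    'type' => ['h-card'],
    'properties' => [
        'name' => ['Barnaby Walters'],
        'photo' => [['value' => 'https://example.org/photo.png', 'alt' => '']],
    ],
];

echo firstPropertyString($item, 'name'), "\n";  // Barnaby Walters
echo firstPropertyString($item, 'photo'), "\n"; // https://example.org/photo.png
var_dump(firstPropertyString($item, 'url'));    // NULL
```

Returning null instead of throwing lets the caller decide how to handle items that lack a property. Since v0.4.0, name itself may be absent.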

Parsing Microformats from a HTML String

Here we demonstrate microformats2 implied property parsing, where an entire h-card with name and URL properties is created from a single h-card class.

<?php

$html = '<a class="h-card" href="https://waterpigs.co.uk/">Barnaby Walters</a>';
$output = Mf2\parse($html, 'https://waterpigs.co.uk/');

$output is a canonical microformats2 array structure, shown here as JSON:

{
  "items": [
    {
      "type": ["h-card"],
      "properties": {
        "name": ["Barnaby Walters"],
        "url": ["https://waterpigs.co.uk/"]
      }
    }
  ],
  "rels": {},
  "rel-urls": {}
}

If no microformats are found, items will be an empty array.

Note that, whilst the property prefixes are stripped, the h-* classnames in the "type" array keep their h- prefix.
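Because the prefix is kept, you compare against the full classname (for example 'h-card', not 'card') when looking for items of a given type. The following is a rough sketch that walks top-level items, their "children", and microformats nested as property values. findItemsByType() is a hypothetical helper, not a php-mf2 function.

```php
<?php

// Hypothetical helper, not part of php-mf2: collects every item whose "type"
// array contains $type (including the "h-" prefix), searching recursively.
function findItemsByType(array $items, $type)
{
    $found = [];
    foreach ($items as $item) {
        // Plain strings and e-*/img dictionaries are values, not microformats.
        if (!is_array($item) || !isset($item['type']) || !is_array($item['type'])) {
            continue;
        }
        if (in_array($type, $item['type'], true)) {
            $found[] = $item;
        }
        if (isset($item['children']) && is_array($item['children'])) {
            $found = array_merge($found, findItemsByType($item['children'], $type));
        }
        if (isset($item['properties']) && is_array($item['properties'])) {
            foreach ($item['properties'] as $values) {
                if (is_array($values)) {
                    $found = array_merge($found, findItemsByType($values, $type));
                }
            }
        }
    }
    return $found;
}

$mf = [
    'items' => [[
        'type' => ['h-entry'],
        'properties' => [
            'name' => ['A post'],
            'author' => [[
                'type' => ['h-card'],
                'properties' => ['name' => ['Barnaby Walters']],
                'value' => 'Barnaby Walters',
            ]],
        ],
    ]],
    'rels' => [],
    'rel-urls' => [],
];

echo count(findItemsByType($mf['items'], 'h-card')), "\n"; // 1
echo count(findItemsByType($mf['items'], 'card')), "\n";   // 0
```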

Parsing a Document with Relative URLs

Most of the time you’ll be getting your input HTML from a URL. You should pass that URL as the second parameter to Mf2\parse() so that any relative URLs in the document can be resolved. For example, say you got the following HTML from http://example.org/post/1:

<div class="h-card">
  <h1 class="p-name">Mr. Example</h1>
  <img class="u-photo" alt="" src="/photo.png" />
</div>

Parsing like this:

$output = Mf2\parse($html, 'http://example.org/post/1');

will result in the following output, with relative URLs made absolute:

{
  "items": [{
    "type": ["h-card"],
    "properties": {
      "name": ["Mr. Example"],
      "photo": [{
        "value": "http://example.org/photo.png",
        "alt": ""
      }]
    }
  }],
  "rels": {},
  "rel-urls": {}
}

php-mf2 correctly handles relative URL resolution according to the URI and HTML specs, including correct use of the <base> element.

Parsing Link `rel` Values

php-mf2 also parses any link relations in the document, placing them into two top-level arrays. For convenience and completeness, one is indexed by each individual rel value, and the other by each URL.

For example, this HTML:

<a rel="me" href="https://twitter.com/barnabywalters">Me on twitter</a>
<link rel="alternate etc" href="http://example.com/notes.atom" />

parses to the following canonical representation:

{
  "items": [],
  "rels": {
    "me": ["https://twitter.com/barnabywalters"],
    "alternate": ["http://example.com/notes.atom"],
    "etc": ["http://example.com/notes.atom"]
  },
  "rel-urls": {
    "https://twitter.com/barnabywalters": {
      "text": "Me on twitter",
      "rels": ["me"]
    },
    "http://example.com/notes.atom": {
      "rels": ["alternate", "etc"]
    }
  }
}
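The two arrays hold the same information indexed in opposite directions. As a small illustration, the rels index can be rebuilt from rel-urls by inverting it. relsFromRelUrls() is a hypothetical helper written for this example, not something php-mf2 provides.

```php
<?php

// Hypothetical helper: rebuilds a rels-style index (rel value => list of URLs)
// from a rel-urls-style index (URL => ['rels' => [...], ...]).
function relsFromRelUrls(array $relUrls)
{
    $rels = [];
    foreach ($relUrls as $url => $info) {
        $relValues = isset($info['rels']) && is_array($info['rels']) ? $info['rels'] : [];
        foreach ($relValues as $rel) {
            $rels[$rel][] = $url;
        }
    }
    return $rels;
}

// The rel-urls data from the example output above, as a PHP array.
$relUrls = [
    'https://twitter.com/barnabywalters' => ['text' => 'Me on twitter', 'rels' => ['me']],
    'http://example.com/notes.atom' => ['rels' => ['alternate', 'etc']],
];

print_r(relsFromRelUrls($relUrls));
```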

If you’re not bothered about the microformats2 data and just want rels and alternates, you can (very slightly) improve performance by creating a Mf2\Parser object (see below) and calling ->parseRelsAndAlternates() instead of ->parse(), e.g.

<?php

$parser = new Mf2\Parser('<link rel="…');
$relsAndAlternates = $parser->parseRelsAndAlternates();

Debugging Mf2\fetch

Mf2\fetch() will attempt to parse any response served with “HTML” in the content-type, regardless of what the status code is. If it receives a non-HTML response it will return null.

To learn what the HTTP status code for any request was, or learn more about the request, pass a variable as the third parameter to Mf2\fetch(). It is passed by reference and will be filled with the contents of curl_getinfo(), e.g.:

<?php

$mf = Mf2\fetch('http://waterpigs.co.uk/this-page-doesnt-exist', true, $curlInfo);
if ($curlInfo['http_code'] == '404') {
  // This page doesn’t exist.
}

If it was HTML then it is still parsed, as there are cases where error pages contain microformats — for example a deleted h-entry resulting in a 410 Gone response containing a stub h-entry with an explanation for the deletion.
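The decision described above can be sketched as two checks on the curl_getinfo() array. wouldBeParsed() and isErrorStatus() are hypothetical helpers, not php-mf2's code. The case-insensitive match on "html" is an assumption about how the content type is tested.

```php
<?php

// Hypothetical sketch: a response is parsed whenever its Content-Type
// mentions "html" (matched case-insensitively here), whatever the status code.
function wouldBeParsed(array $curlInfo)
{
    $contentType = isset($curlInfo['content_type']) ? (string) $curlInfo['content_type'] : '';
    return stripos($contentType, 'html') !== false;
}

// True for 4xx/5xx responses, which may still contain microformats,
// e.g. a 410 Gone page with a stub h-entry.
function isErrorStatus(array $curlInfo)
{
    $code = isset($curlInfo['http_code']) ? (int) $curlInfo['http_code'] : 0;
    return $code >= 400;
}

$gone = ['http_code' => 410, 'content_type' => 'text/html; charset=UTF-8'];
$feed = ['http_code' => 200, 'content_type' => 'application/atom+xml'];

var_dump(wouldBeParsed($gone), isErrorStatus($gone));
var_dump(wouldBeParsed($feed));
```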

Getting more control by creating a Parser object

The Mf2\parse() function covers the most common usage patterns by internally creating an instance of Mf2\Parser and returning the output all in one step. For some advanced usage you can also create an instance of Mf2\Parser yourself.

The constructor takes the input HTML (or a DOMDocument) and the URL to use as a base URL; an optional third argument enables JSON mode (see below). Once you have a parser, there are a few other things you can do:

Selectively Parsing a Document

There are several ways to selectively parse microformats from a document. If you wish to only parse microformats from an element with a particular ID, Parser::parseFromId($id) is the easiest way.

If your needs are more complex, Parser::parse accepts an optional context DOMNode as its second parameter. Typically you’d use Parser::query to run XPath queries on the document to get the element you want to parse from under, then pass it to Parser::parse. Example usage:

$doc = 'More microformats, more microformats <div id="parse-from-here"><span class="h-card">This shows up</span></div> yet more ignored content';
$parser = new Mf2\Parser($doc);

$parser->parseFromId('parse-from-here'); // returns a document with only the h-card descended from div#parse-from-here

$elementIWant = $parser->query('an xpath query')->item(0);

$parser->parse(true, $elementIWant); // returns a document with only the Microformats under the selected element
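To show the XPath side of this on its own, the sketch below uses only PHP's DOM extension, not php-mf2. It selects a context element, then lists the h-* roots beneath it. With php-mf2 you would pass that element to Parser::parse() instead. The markup is a made-up variant of the example above.

```php
<?php

// Plain DOM/XPath illustration (no php-mf2): select a context element,
// then find descendants that carry an h-* root class.
$html = 'More microformats <div id="parse-from-here"><span class="h-card">This shows up</span></div>'
      . ' <span class="h-card">Ignored</span>';

libxml_use_internal_errors(true); // tolerate fragment markup quietly
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$context = $xpath->query('//*[@id="parse-from-here"]')->item(0);

// Descendants of $context with at least one class token starting with "h-".
$roots = $xpath->query('.//*[contains(concat(" ", normalize-space(@class)), " h-")]', $context);

foreach ($roots as $root) {
    echo trim($root->textContent), "\n"; // This shows up
}
```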

Experimental Language Parsing

There is still ongoing brainstorming around how HTML language attributes should be added to the parsed result. In order to use this feature, you will need to set a flag to opt in.

$doc = '<div class="h-entry" lang="sv" id="postfrag123">
  <h1 class="p-name">En svensk titel</h1>
  <div class="e-content" lang="en">With an <em>english</em> summary</div>
  <div class="e-content">Och <em>svensk</em> huvudtext</div>
</div>';
$parser = new Mf2\Parser($doc);
$parser->lang = true;
$result = $parser->parse();

{
  "items": [
    {
      "type": ["h-entry"],
      "properties": {
        "name": ["En svensk titel"],
        "content": [
          {
            "html": "With an <em>english</em> summary",
            "value": "With an english summary",
            "lang": "en"
          },
          {
            "html": "Och <em>svensk</em> huvudtext",
            "value": "Och svensk huvudtext",
            "lang": "sv"
          }
        ]
      },
      "lang": "sv"
    }
  ],
  "rels": {},
  "rel-urls": {}
}

Note that this option is still considered experimental and in development, and the parsed output may change between minor releases.
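The "lang" values in this output follow ordinary HTML inheritance. Each element takes the nearest lang attribute on itself or an ancestor, which is why the second e-content reports "sv". The sketch below shows that rule using PHP's DOM extension. effectiveLang() is a hypothetical helper, and php-mf2's experimental implementation may differ in details.

```php
<?php

// Hypothetical sketch of HTML language inheritance: returns the nearest lang
// attribute on $element or its ancestors, or null if there is none.
function effectiveLang(DOMElement $element)
{
    for ($n = $element; $n instanceof DOMElement; $n = $n->parentNode) {
        if ($n->hasAttribute('lang')) {
            return $n->getAttribute('lang');
        }
    }
    return null;
}

$html = '<div class="h-entry" lang="sv">'
      . '<div class="e-content" lang="en">With an <em>english</em> summary</div>'
      . '<div class="e-content">Och <em>svensk</em> huvudtext</div>'
      . '</div>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$contents = $xpath->query('//*[contains(concat(" ", normalize-space(@class), " "), " e-content ")]');
foreach ($contents as $el) {
    echo effectiveLang($el), ': ', trim($el->textContent), "\n";
}
```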

Generating output for JSON serialization with JSON-mode

Due to a quirk with the way PHP arrays work, there is an edge case (reported by Tom Morris) in which a document with no rel values, when serialised as JSON, results in an empty array as the rels value rather than an empty object. Replacing this in code with a stdClass breaks PHP code that treats the values as arrays.

As of version 0.2.6, the default behaviour is back to being PHP-friendly, so if you want to produce results specifically for serialisation as JSON (for example if you run a HTML -> JSON service, or want to run tests against JSON fixtures), enable JSON mode:

// …by passing true as the third constructor argument:
$jsonParser = new Mf2\Parser($html, $url, true);

Classic Microformats Markup

php-mf2 has some support for parsing classic microformats markup. It’s enabled by default, but can be turned off by calling Mf2\parse($html, $url, false); or $parser->parse(false); if you’re instantiating a parser yourself.

If the built-in mappings don’t successfully parse some classic microformats markup, please raise an issue and we’ll fix it.

Security

No filtering of content takes place in Mf2\Parser, so treat its output as you would any untrusted data from the source of the parsed document.

Some tips:

Apart from the 'html' key in dictionaries produced by parsing e-* properties, all values are plain text and are not HTML-escaped. For example, <span class="p-name">&lt;code&gt;</span> will result in "name": ["<code>"]. At the very least, HTML-escape all property values before echoing them out in HTML.
If you’re using the raw HTML content under the 'html' key of dictionaries produced by parsing e-* properties, you SHOULD purify the HTML before displaying it to prevent injection of arbitrary code. For PHP we recommend using HTML Purifier.
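For the first tip, PHP's built-in htmlspecialchars() is sufficient for plaintext values. The sketch wraps it in a hypothetical escapeHtml() helper. ENT_QUOTES makes the result safe inside quoted attributes as well as text. Purifying the 'html' key still needs a dedicated library such as HTML Purifier and is not shown here.

```php
<?php

// Hypothetical helper: escapes an untrusted plaintext value for HTML text or
// quoted attribute contexts.
function escapeHtml($value)
{
    return htmlspecialchars((string) $value, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

$name = '<code>'; // e.g. parsed from <span class="p-name">&lt;code&gt;</span>
echo '<p>' . escapeHtml($name) . '</p>', "\n"; // <p>&lt;code&gt;</p>
```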

Contributing

Issues and bug reports are very welcome. If you know how to write tests then please do so as code always expresses problems and intent much better than English, and gives me a way of measuring whether or not fixes have actually solved your problem. If you don’t know how to write tests, don’t worry :) Just include as much useful information in the issue as you can.

Pull requests are very welcome. Please try to maintain stylistic, structural and naming consistency with the existing codebase, and don’t be too upset if we make naming changes :)

How to make a Pull Request

Fork the repo to your GitHub account
Clone a copy to your computer (simply installing php-mf2 using composer only works for using it, not developing it)
Install the dev dependencies with composer install.
Run PHPUnit with ./vendor/bin/phpunit
Add PHPUnit tests for your changes, either in an existing test file if suitable, or a new one
Make your changes
Make sure your tests pass (./vendor/bin/phpunit) and that your code is compatible with all supported versions of PHP (./vendor/bin/phpcs -p)
Go to your fork of the repo on github.com and make a pull request, preferably with a short summary, detailed description and references to issues/parsing specs as appropriate
Bask in the warm feeling of having contributed to a piece of free software (optional)

Testing

There are currently two separate test suites. The first, in tests/Mf2, is written with PHPUnit and contains many microformats parsing examples as well as internal parser tests and regression tests for specific issues over php-mf2’s history. Run it with ./vendor/bin/phpunit. If you do not have a live internet connection, you can exclude tests that depend on it: ./vendor/bin/phpunit --exclude-group internet.

The other, in tests/test-suite, is a custom test harness which hooks up php-mf2 to the cross-platform microformats test suite. To run these tests you must first install the tests with ./composer.phar install. Each test consists of a HTML file and a corresponding JSON file, and the suite can be run with php ./tests/test-suite/test-suite.php.
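Comparing parser output against an expected JSON file works best when key order does not matter. The sketch below shows one way to do it. normalizeKeys() is a hypothetical helper, not the harness's actual code. It sorts the keys of associative arrays recursively and keeps the order of list values.

```php
<?php

// Hypothetical helper: recursively sorts associative-array keys so two mf2
// structures compare equal regardless of key order; list order is preserved.
function normalizeKeys($value)
{
    if (!is_array($value)) {
        return $value;
    }
    $isList = $value === [] || array_keys($value) === range(0, count($value) - 1);
    $normalized = array_map('normalizeKeys', $value);
    if (!$isList) {
        ksort($normalized, SORT_STRING);
    }
    return $normalized;
}

// json_decode(..., true) turns {} into [], which also sidesteps the
// empty-object/empty-array mismatch described in the JSON-mode section.
$expected = json_decode('{"items":[],"rel-urls":{},"rels":{}}', true);
$actual = ['items' => [], 'rels' => [], 'rel-urls' => []];

var_dump(normalizeKeys($expected) === normalizeKeys($actual)); // bool(true)
```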

Currently php-mf2 passes the majority of its own test cases, and a good percentage of the cross-platform tests. Contributors should ALWAYS test against the PHPUnit suite to ensure any changes don’t negatively impact php-mf2, and SHOULD run the cross-platform suite, especially if you’re changing parsing behaviour.

Changelog

v0.5.0

Breaking changes:

Bumped minimum PHP version from 5.4 to 5.6 (#220)
#214 parse an img element for src and alt — i.e. all property values parsed as image URLs where the img element has an alt attribute will now be a {'value': 'url', 'alt': 'the alt value'} structure rather than a single URL string
Renamed master branch to main. Anyone who had been installing the latest development version with dev-master will need to change their requirements to dev-main

Other changes:

#195 Fix backcompat parsing for geo property
#182 Fix parsing for iframe.u-*[src]
#206 Add optional ID for h-* elements
#198 reduce instances where photo is implied
Internal: switched from Travis to GitHub Actions for CI

v0.4.6

Bugfixes:

Don't include img src attribute in implied p-name (#180)
Normalize ordinal dates in VCP values (#167)
Fix for accidental array access of stdClass in deeply nested structures (#196)
Reduce instances where u-url is implied according to a spec update (#183 and parsing issue #36)
Fix for wrongly implied photo property (#190)

Other Updates:

Adds a filter to avoid running tests that require a live internet connection (#194)
Refactor implied name code to match new implied name handling of photo and url (#193)
Moved this repo to the microformats GitHub organization (#179)

v0.4.5

2018-08-02

Bugfixes:

Fix for parsing empty e- elements

Other Updates:

Added .editorconfig to the project and cleaned up whitespace across all files

v0.4.4

2018-08-01

Bugfixes:

Ensure empty properties is an object {} rather than array [] (#171)
Ensure the parser does not mutate the DOMDocument passed in (#174)
Fix for multiple class names in backcompat parsing (#156)

Microformats Parsing Updates:

New algorithm for plaintext values (#168 and parsing issue #15)
Always resolve URLs from u- properties even when not from a link element (Parsing issue #10)

Other Updates:

Improved test coverage

v0.4.3

2018-03-29

If the masterminds/html5 HTML5 parser is available, the Mf2 parser will use that instead of the built-in HTML parser. This enables proper handling of HTML5 elements such as <article>.

To include the HTML5 parser in your project, run:

composer require masterminds/html5

v0.4.2

2018-03-29

Fixes:

#165 - Prevents inadvertently adding whitespace to the html value
#158 - Allows numbers in vendor prefixed names
#160 - Ignores class names with consecutive dashes
#159 - Remove duplicate values from type and rels arrays
#162 - Improved rel attribute parsing

Backcompat:

#157 - Parse rel=tag as p-category for hEntry and hReview

v0.4.1

2018-03-15

Fixes:

#153 - Fixes parsed timestamps authored with a Z timezone offset
#151 - Adds back "value" of nested microformats when no matching property exists

v0.4.0

2018-03-13

Breaking changes:

#125 - Add rel-urls to parsed result. Removes alternates by default but still available behind a feature flag.
#142 - Reduce instances of implied p-name. See Microformats issue #6. This means it is now possible for the parsed result to not have a name property, whereas before there was always a name property on an object. Make sure consuming code can handle an object without a name now.

Fixes:

#124 - Fix for experimental lang parsing
#127 - Fix for parsing h-* class names containing invalid characters.
#131 - Improved dt- parsing. Issues #126 and #115.
#130 - Fix for implied properties with empty attributes.
#135 - Trim leading and trailing whitespace from HTML value as well as text value.
#137 - Fix backcompat hfeed parsing.
#134 - Fix rel=bookmark backcompat parsing.
#116 - Fix backcompat parsing for summary property in hreview
#149 - Fix for datetime parsing, no longer tries to interpret the value and passes through instead

v0.3.2

2017-05-27

Fixed how the Microformats tests repo is loaded via composer
Moved experimental language parsing feature behind an opt-in flag
#121 Fixed language detection to support parsing of HTML fragments

v0.3.1

2017-05-24

#89 - Fixed parsing empty img alt="" attributes
#91 - Ignore rel values from HTML tags that don't allow rel values
#57 - Implement hAtom rel=bookmark backcompat
#94 - Fixed HTML output when parsing e-* properties
#97 - Experimental language parsing
#88 - Fix for implied photo parsing
#102 - Ignore classes with numbers or capital letters
#111 - Improved backcompat parsing
#106 - Send Accept: text/html header when using the fetch method
#114 - Parse poster attribute for video tags
#118 - Fixes parsing elements with missing attributes
Tests now use microformats/tests repo

Many thanks to @gRegorLove for the major overhaul of the backcompat parsing!

v0.3.0

2016-03-14

Requires PHP 5.4 at minimum (PHP 5.3 is EOL)
Licensed under CC0 rather than MIT
Merges Pull requests #70, #73, #74, #75, #77, #80, #82, #83, #85 and #86.
Variety of small bug fixes and features including improved whitespace support, removal of style and script contents from plaintext properties
All PHPUnit tests passing finally

Many thanks to @aaronpk, @diplix, @dissolve, @dymcx, @gRegorLove, @jeena, @veganstraightedge and @voxpelli for all your hard work opening issues and sending and merging PRs!

v0.2.12

2015-07-12

Merges pull requests #65, #66 and #67.
Fixes issue #64.

Many thanks to @aaronpk, @gRegorLove and @kylewm for contributions, @aaronpk and @kevinmarks for PR management and @tantek for issue reporting!

v0.2.11

2015-07-10

v0.2.10

2015-04-29

Merged #58, fixing some parsing bugs and adding support for area element parsing. Thanks so much for your hard work and patience, Ben!

v0.2.9

2014-08-06

Added backcompat classmap for hProduct, associated tests
Started GPG signing version tags as [email protected], fingerprint CBC7 7876 BF7C 9637 B6AE 77BA 7D49 834B 0416 CFA3

v0.2.8

2014-07-17

Fixed issue #51 causing php-mf2 to not work with PHP 5.3
Fixed issue #52 correctly handling the <template> element by ignoring it
Fixed issue #53 improving the plaintext parsing of <img> elements

v0.2.7

2014-06-18

Added Mf2\fetch() which fetches content from a URL and returns parsed microformats
Added implied dt-end discovery (thanks for all your hard work, @gRegorLove!)
Fixed issue causing classnames like blah e- blah to produce properties with numeric keys (thanks @aaronpk and @gRegorLove)
Fixed issue causing resolved URLs to not include port numbers (thanks @aaronpk)

v0.2.6

Added JSON mode as long-term fix for #29
Fixed bug causing microformats nested under multiple property names to be parsed only once

v0.2.5

Removed conditional replacing empty rel list with stdClass. Original purpose was to make JSON-encoding the output from the parser correct, but it also caused Fatal Errors due to trying to treat stdClass as array.

v0.2.4

v0.2.3

Made p-* parsing consistent with implied name parsing
Stopped collapsing whitespace in p-* properties
Implemented unicodeTrim which removes characters as well as regex \s
Added support for implied name via abbr[title]
Prevented excessively nested value-class elements from being parsed incorrectly, removed incorrect separator which was getting added in some cases
Updated u-* parsing to be spec-compliant, matching [href] before value-class and only attempting URL resolution for URL attributes
Added support for input[value] parsing
Tests for all the above

v0.2.2

Made resolveUrl method public, allowing advanced parsers and subclasses to make use of it
Fixed bug causing multiple duplicate property values to appear

v0.2.1

Fixed bug causing classic microformats property classnames to not be parsed correctly

v0.2.0 (BREAKING CHANGES)

Namespace change from mf2 to Mf2, for PSR-0 compatibility
Mf2\parse() function added to simplify the most common case of just parsing some HTML
Updated e-* property parsing rules to match mf2 parsing spec — instead of producing inconsistent HTML content, it now produces dictionaries like
```
{
  "html": "The Content",
  "value": "The Content"
}
```
Removed htmlSafe options as new e-* parsing rules make them redundant
Moved a whole load of static functions out of the class and into standalone functions
Changed autoloading to always include Parser.php instead of using classmap

v0.1.23

Made some changes to the way back-compatibility with classic microformats are handled, ignoring classic property classnames inside mf2 roots and outside classic roots
Deprecated ability to add new classmaps, removed twitter classmap. Use php-mf2-shim instead, it’s better

v0.1.22

Converts classic microformats by default

v0.1.21

Removed webignition dependency, also removing ext-intl dependency. php-mf2 is now a standalone, single file library again
Replaced webignition URL resolving with custom code passing almost all tests, courtesy of Aaron Parecki

v0.1.20

Added in almost-perfect custom URL resolving code

v0.1.19 (2013-06-11)

Required stable version of webignition/absolute-url-resolver, hopefully resolving versioning problems

v0.1.18 (2013-06-05)

Fixed problems with isElementParsed, causing elements to be incorrectly parsed
Cleaned up some test files

v0.1.17

Rewrote some PHP 5.4 array syntax which crept into 0.1.16 so php-mf2 still works on PHP 5.3
Fixed a bug causing weird partial microformats to be added to parent microformats if they had doubly property-nested children
Finally actually licensed this project under a real license (MIT, in composer.json)
Suggested barnabywalters/mf-cleaner in composer.json

v0.1.16

Ability to parse from only an ID
Context DOMElement can be passed to Parser::parse()
Parser::query runs XPath queries on the current document
When parsing e-* properties, elements with @src, @data or @href have relative URLs resolved in the output

v0.1.15

Added html-safe options
Added rel+rel-alternate parsing

License

php-mf2 is dedicated to the public domain using Creative Commons -- CC0 1.0 Universal.

http://creativecommons.org/publicdomain/zero/1.0

php-mf2's Issues

Problems with implicit "name" and h-feed

It seems that the parser adds the whole content of the h-feed node as "implicit name". I tested it on http://notizblog.org/

Ignore class names that end in a hyphen

Currently this HTML parses to a properties array with a numeric key:

<span class="h-entry">
  <span class="e-">foo</span>
</span>

{
    "items": [
        {
            "type": [
                "h-entry"
            ],
            "properties": {
                "0": [
                    {
                        "html": "foo",
                        "value": "foo"
                    }
                ],
                "name": [
                    "foo"
                ]
            }
        }
    ],
    "rels": {

    }
}

It should just ignore the bad e- property
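One way to ignore such classes is to validate each candidate property classname before using it. The sketch below uses a simplified rule: a p-, u-, dt- or e- prefix followed by lowercase words joined by single hyphens. isValidPropertyClass() is hypothetical. The rule deliberately leaves out vendor-prefixed names containing digits, which later php-mf2 releases accept.

```php
<?php

// Hypothetical, simplified check for mf2 property classnames: rejects empty
// names ("e-"), trailing or doubled hyphens, digits and capital letters.
function isValidPropertyClass($class)
{
    return preg_match('/^(?:p|u|dt|e)-[a-z]+(?:-[a-z]+)*$/', $class) === 1;
}

foreach (['e-', 'p-name', 'p-in-reply-to', 'p--x', 'dt-published-'] as $class) {
    printf("%-14s %s\n", $class, isValidPropertyClass($class) ? 'property' : 'ignored');
}
```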

Already-parsed properties of nested items cause partial arrays to be added to the parent array for that property

E.g.:

.h-entry > .p-in-reply-to.h-entry > .p-author.h-card

The p-in-reply-to h-entry’s author values array will contain the full p-author microformat, but the parent h-entry will contain a partial array containing only the value key.

cc @sandeepshetty

Implement include pattern

Implement classic hAtom rel-bookmark backcompat

Came up against some sites where this is needed to work properly with feedreaders, so time to finally implement this.

Easiest technique is going to be just adding class="u-url" to each hAtom-scoped a[rel~=bookmark]
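That technique can be sketched with PHP's DOM extension. The code finds each a[rel~=bookmark] inside an element with the classic hentry class and appends u-url to its class attribute. This is an illustration with made-up markup, not the code php-mf2 ships.

```php
<?php

// Hypothetical sketch: add class="u-url" to every rel~=bookmark link that
// sits inside a classic hAtom entry (class "hentry").
$html = '<div class="hentry"><h2 class="entry-title">'
      . '<a rel="bookmark" href="/post/1">A post</a></h2></div>'
      . '<a rel="bookmark" href="/elsewhere">Outside</a>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$links = $xpath->query(
    '//*[contains(concat(" ", normalize-space(@class), " "), " hentry ")]'
    . '//a[contains(concat(" ", normalize-space(@rel), " "), " bookmark ")]'
);

foreach ($links as $a) {
    // Append rather than overwrite, keeping any existing classes.
    $a->setAttribute('class', trim($a->getAttribute('class') . ' u-url'));
}

echo $doc->saveHTML($xpath->query('//a')->item(0)), "\n";
```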

Strip out leading and trailing spaces in u-* properties

As per https://twitter.com/jkphl/status/424165079664578560 and http://developers.whatwg.org/urls.html#valid-url-potentially-surrounded-by-spaces, when parsed as URLs (i.e. relative URL resolution is done) u-* properties should have leading and trailing spaces stripped.
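A minimal sketch of that stripping step follows. stripUrlSpaces() is a hypothetical helper. It uses an explicit character list (space, tab, LF, FF, CR), the ASCII whitespace set used by the WHATWG definition. PHP's default trim() would also strip NUL and vertical tab, which are not HTML space characters.

```php
<?php

// Hypothetical helper: strips leading/trailing ASCII whitespace from a URL
// attribute value before relative URL resolution.
function stripUrlSpaces($value)
{
    return trim($value, " \t\n\f\r");
}

var_dump(stripUrlSpaces("  \t/photo.png\n"));
```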

Resolve relative URLs inside e-* properties?

Should we do this? Seems like a no-brainer but there may be implications I have not considered.

No rels leads to empty array rather than empty object in output

This is probably unfixable because of PHP, but I wanted to file it so it is on record.

If you parse a document that has no rels, you get back an empty JSON array in the json_encoded output. If you parse a document that has rels, you get back a JSON object. The specification says you should get back an empty object.

Obviously, in PHP lists and dictionaries (in Python terminology) are unified into a single array type, which can represent list-style arrays (numerically-keyed arrays in PHP), dictionary-style arrays (associative arrays) and mixed list and dictionary style. json_encode lets you specify a flag for how you encode empty arrays.

One possible solution to this is to json_encode each piece separately and then stitch them together. Or just ignore the problem and operate on the basis that PHP will never quite be spec-compliant.

Only convert classic microformats classnames if there aren’t microformats2 classnames

In theory this had already been fixed as of #32, but it appears there are some residual cases causing problems.

Example: http://notizblog.org/2014/02/20/wir-brauchen-metadaten-fuer-telefonnummern/

Somewhat reduced case:

<body class="h-entry hentry h-as-article" itemscope="" itemtype="http://schema.org/BlogPosting">
<div id="page">
 <article id="post-7546" class="post-7546 post type-post status-publish format-standard category-web tag-dezentral tag-email-to-id tag-facebook tag-whatsapp tag-xmpp">
  <header class="entry-header">
    <h1 class="entry-title p-name" itemprop="name headline"><a href="http://notizblog.org/2014/02/20/wir-brauchen-metadaten-fuer-telefonnummern/" class="u-url url" title="Permalink to Wir brauchen Metadaten für Telefonnummern" rel="bookmark" itemprop="url">Wir brauchen Metadaten für Telefonnummern</a></h1>

        <div class="entry-meta">      
      <span class="sep">Ver&ouml;ffentlicht am </span><a href="http://notizblog.org/2014/02/20/wir-brauchen-metadaten-fuer-telefonnummern/" title="10:30" rel="bookmark" class="url u-url"><time class="entry-date updated published dt-updated dt-published" datetime="2014-02-20T10:30:40+00:00" itemprop="dateModified">20. Februar 2014</time></a><address class="byline"> <span class="sep"> von </span> <span class="author p-author vcard hcard h-card" itemprop="author" itemscope itemtype="http://schema.org/Person"><img alt='' src='http://1.gravatar.com/avatar/b36983a5651df2c413e264ad4d5cc1a1?s=40&amp;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D40&amp;r=G' class='u-photo avatar avatar-40 photo' height='40' width='40' /> <a class="url uid u-url u-uid fn p-name" href="http://notizblog.org/author/matthias-pfefferle/" title="Alle Beitr&auml;ge von Matthias Pfefferle ansehen" rel="author" itemprop="url"><span itemprop="name">Matthias Pfefferle</span></a></span></address>    </div><!-- .entry-meta -->
      </header><!-- .entry-header -->

      <div class="entry-content e-content" itemprop="description articleBody">
    <p><a href="http://netzwertig.com/2014/02/19/zuckerberg-bekommt-wieder-was-er-will-facebook-uebernimmt-whatsapp-fuer-bis-zu-19-milliarden-dollar/">Facebook kauft WhatsApp</a> und ich hab nur wenig Möglichkeiten meine Konsequenzen daraus zu ziehen. Leider sind alle aktuell populären &#8220;Chat&#8221; Systeme direkt an die App gekoppelt und ich &#8220;muss&#8221; zwangsläufig die App benutzen die mein Freundeskreis bevorzugt.</p>
<p><a href="http://www.whatsapp.com/">WhatsApp</a> benutzt intern das <a href="http://de.wikipedia.org/wiki/WhatsApp#cite_note-10">XMPP-Protokoll</a> und arbeitet dadurch ja theoretisch dezentral und auch <a href="https://telegram.org">Telegram</a> hat beispielsweise <a href="https://core.telegram.org/mtproto">eine Art offenes Protokoll</a> gebaut&#8230; Das Problem: Woher wissen auf welchem Server der Andere angemeldet ist.</p>
<p>Seit WhatsApp die Identifizierung über die Telefonnummer (statt einer z.B. E-Mail Adresse) eingeführt hat, sind viele anderen diesem Beispiel gefolgt und es gibt nichts Verwerfliches daran. Jeder der eine solche App nutzt hat zwangsläufig ein Telefon, was bedeutet dass er auch eine Telefonnummer hat und die Wahrscheinlichkeit dass in seinem (Telefon-)Adressbuch mehr Telefonnummern als E-Mail Adressen stehen ist auch sehr hoch. Prinzipiell also eine gute Idee! Leider kann man aber anhand einer Telefonnummer nicht auf einen Server (mal abgesehen vom Telekommunikations-unternehmen) schließen und das bedeutet, dass das Verfahren leider auch nur zentral funktionieren kann. Nutze ich WhatsApp, kann man mich nur über die WhatsApp-Server erreichen, für Telegram läuft die Kommunikation nur über die Telegram-Server usw.</p>
<p>Um mit XMPP oder anderen Protokollen wirklich dezentral arbeiten zu können, müsste man über die Telefonnummer erfahren können welchen Chat-Server der Andere benutzt. Vielleicht über so eine Art <a href="http://notizblog.org/2008/07/27/email-address-to-url-transformation/"><em>Tel to Id</em></a> &#8211; Service oder über andere Protokolle wie z.B. SMS. Damit könnte sich jeder selbst den Client seines Vertrauens aussuchen und alles wäre <del datetime="2014-02-20T08:59:56+00:00">gut</del> <ins datetime="2014-02-20T08:59:56+00:00">besser</ins> <img src="http://notizblog.org/wp-includes/images/smilies/icon_wink.gif" alt=";)" class="wp-smiley" /> </p>

<div class="social-buttons">
  <a class="FlattrButton" style="display:none;"
     data-flattr-button="compact"
     data-flattr-uid="pfefferle"
     data-flattr-category="text"
     data-flattr-language="de_DE"
     href="http://notizblog.org/2014/02/20/wir-brauchen-metadaten-fuer-telefonnummern/"
     rel="donation payment"></a>

  <div class="g-plusone" data-size="medium" data-lang="de-DE" data-href="http://notizblog.org/2014/02/20/wir-brauchen-metadaten-fuer-telefonnummern/"></div>
</div>      </div><!-- .entry-content -->

  <footer class="entry-meta">
  Published    <span class="cat-links">
    in <a href="http://notizblog.org/category/web/" title="View all posts in Open Web" rel="category tag">Open Web</a>  </span>

    <span class="sep"> | </span>
  <span class="tag-links" itemprop="keywords">
    Tags: <a href="http://notizblog.org/tag/dezentral/" rel="tag">dezentral</a>, <a href="http://notizblog.org/tag/email-to-id/" rel="tag">Email to ID</a>, <a href="http://notizblog.org/tag/facebook/" rel="tag">Facebook</a>, <a href="http://notizblog.org/tag/whatsapp/" rel="tag">WhatsApp</a>, <a href="http://notizblog.org/tag/xmpp/" rel="tag">XMPP</a>  </span>

    <span class="sep"> | </span>
  <span class="comments-link"><a href="http://notizblog.org/2014/02/20/wir-brauchen-metadaten-fuer-telefonnummern/#comments" title="Comment on Wir brauchen Metadaten für Telefonnummern">7 comments</a></span>

  </footer><!-- #entry-meta --></article><!-- #post-7546 -->
          <nav id="nav-below">
    <h1 class="assistive-text section-heading">Post navigation</h1>


    <div class="nav-previous"><a href="http://notizblog.org/2014/02/13/amber-case-ueber-privacy-und-das-indieweb/" rel="prev"><span class="meta-nav">&larr;</span> Amber Case über Privacy und das IndieWeb</a></div>    

  </nav><!-- #nav-below -->

          <div id="comments">


      <h2 id="comments-title">
      7 thoughts on &ldquo;<span>Wir brauchen Metadaten für Telefonnummern</span>&rdquo;    </h2>


    <ol class="commentlist">
        <li class="comment even thread-even depth-1 h-as-comment p-comment h-entry" id="li-comment-466758">
    <article id="comment-466758" class="comment " itemprop="comment" itemscope itemtype="http://schema.org/UserComments">
      <footer>
        <address class="comment-author p-author author vcard hcard h-card" itemprop="creator" itemscope itemtype="http://schema.org/Person">
          <img alt='' src='http://1.gravatar.com/avatar/1d6a0566df7760e7d1507810b71a363e?s=50&amp;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D50&amp;r=G' class='u-photo avatar avatar-50 photo' height='50' width='50' />          <cite class="fn p-name" itemprop="name"><a href='http://dentaku.wazong.de' rel='external' class='u-url url'>Dentaku</a></cite> <span class="says">meant:</span>        </address><!-- .comment-author .vcard -->

        <div class="comment-meta commentmetadata">
          <a href="http://notizblog.org/2014/02/20/wir-brauchen-metadaten-fuer-telefonnummern/#comment-466758"><time class="updated published u-updated u-published" datetime="2014-02-20T10:36:01+00:00" itemprop="commentTime">
          20 February 2014 at 10:36          </time></a>
                  </div><!-- .comment-meta .commentmetadata -->
      </footer>

      <div class="comment-content e-content p-summary p-name" itemprop="commentText name description"><p>ENUM (<a href="https://tools.ietf.org/html/rfc6116">RFC6116</a>) does exactly that. It is intended for SIP, but it fits this requirement too.</p>
</div>

      <div class="reply">
        <a class='comment-reply-link' href='/2014/02/20/wir-brauchen-metadaten-fuer-telefonnummern/?replytocom=466758#respond' onclick='return addComment.moveForm("comment-466758", "466758", "respond", "7546")'>Reply</a>      </div><!-- .reply -->
    </article><!-- #comment-## -->
  <ul class="children">
  <li class="comment byuser comment-author-matthias-pfefferle bypostauthor odd alt depth-2 h-as-comment p-comment h-entry" id="li-comment-466780">
    <article id="comment-466780" class="comment " itemprop="comment" itemscope itemtype="http://schema.org/UserComments">
      <footer>
        <address class="comment-author p-author author vcard hcard h-card" itemprop="creator" itemscope itemtype="http://schema.org/Person">
          <img alt='' src='http://1.gravatar.com/avatar/b36983a5651df2c413e264ad4d5cc1a1?s=50&amp;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D50&amp;r=G' class='u-photo avatar avatar-50 photo' height='50' width='50' />          <cite class="fn p-name" itemprop="name"><a href='http://notizblog.org' rel='external' class='u-url openid_link url'>Matthias Pfefferle</a></cite> <span class="says">meant:</span>        </address><!-- .comment-author .vcard -->

        <div class="comment-meta commentmetadata">
          <a href="http://notizblog.org/2014/02/20/wir-brauchen-metadaten-fuer-telefonnummern/#comment-466780"><time class="updated published u-updated u-published" datetime="2014-02-20T10:41:31+00:00" itemprop="commentTime">
          20 February 2014 at 10:41          </time></a>
                  </div><!-- .comment-meta .commentmetadata -->
      </footer>

      <div class="comment-content e-content p-summary p-name" itemprop="commentText name description"><p>That was a quick answer <img src="http://notizblog.org/wp-includes/images/smilies/icon_smile.gif" alt=":)" class="wp-smiley" /> </p>
<p>Many thanks for the tip about ENUM (never heard of it) and for the link&#8230; I'll fight my way through the RFC later&#8230;</p>
</div>

      <div class="reply">
        <a class='comment-reply-link' href='/2014/02/20/wir-brauchen-metadaten-fuer-telefonnummern/?replytocom=466780#respond' onclick='return addComment.moveForm("comment-466780", "466780", "respond", "7546")'>Reply</a>      </div><!-- .reply -->
    </article><!-- #comment-## -->
  </li><!-- #comment-## -->
</ul><!-- .children -->
</li><!-- #comment-## -->
  <li class="comment even thread-odd thread-alt depth-1 h-as-comment p-comment h-entry" id="li-comment-466867">
    <article id="comment-466867" class="comment " itemprop="comment" itemscope itemtype="http://schema.org/UserComments">
      <footer>
        <address class="comment-author p-author author vcard hcard h-card" itemprop="creator" itemscope itemtype="http://schema.org/Person">
          <img alt='' src='http://0.gravatar.com/avatar/2d4d94afbc593569446625c02e7a2f73?s=50&amp;d=http%3A%2F%2F0.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D50&amp;r=G' class='u-photo avatar avatar-50 photo' height='50' width='50' />          <cite class="fn p-name" itemprop="name"><a href='http://lukasrosenstock.net/' rel='external' class='u-url url'>Lukas Rosenstock</a></cite> <span class="says">meant:</span>        </address><!-- .comment-author .vcard -->

        <div class="comment-meta commentmetadata">
          <a href="http://notizblog.org/2014/02/20/wir-brauchen-metadaten-fuer-telefonnummern/#comment-466867"><time class="updated published u-updated u-published" datetime="2014-02-20T11:04:26+00:00" itemprop="commentTime">
          20 February 2014 at 11:04          </time></a>
                  </div><!-- .comment-meta .commentmetadata -->
      </footer>

      <div class="comment-content e-content p-summary p-name" itemprop="commentText name description"><p>I was just about to say ENUM too. It converts the phone number into a DNS name. If you want to play with it, you can register a German number in ENUM for free at <a href="http://www.portunity.de/access/produkte/telefonie/enum-domains.html" >http://www.portunity.de/access/produkte/telefonie/enum-domains.html</a>.</p>
</div>

      <div class="reply">
        <a class='comment-reply-link' href='/2014/02/20/wir-brauchen-metadaten-fuer-telefonnummern/?replytocom=466867#respond' onclick='return addComment.moveForm("comment-466867", "466867", "respond", "7546")'>Reply</a>      </div><!-- .reply -->
    </article><!-- #comment-## -->
  <ul class="children">
  <li class="comment byuser comment-author-matthias-pfefferle bypostauthor odd alt depth-2 h-as-comment p-comment h-entry" id="li-comment-466932">
    <article id="comment-466932" class="comment " itemprop="comment" itemscope itemtype="http://schema.org/UserComments">
      <footer>
        <address class="comment-author p-author author vcard hcard h-card" itemprop="creator" itemscope itemtype="http://schema.org/Person">
          <img alt='' src='http://1.gravatar.com/avatar/b36983a5651df2c413e264ad4d5cc1a1?s=50&amp;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D50&amp;r=G' class='u-photo avatar avatar-50 photo' height='50' width='50' />          <cite class="fn p-name" itemprop="name"><a href='http://notizblog.org' rel='external' class='u-url openid_link url'>Matthias Pfefferle</a></cite> <span class="says">meant:</span>        </address><!-- .comment-author .vcard -->

        <div class="comment-meta commentmetadata">
          <a href="http://notizblog.org/2014/02/20/wir-brauchen-metadaten-fuer-telefonnummern/#comment-466932"><time class="updated published u-updated u-published" datetime="2014-02-20T11:18:25+00:00" itemprop="commentTime">
          20 February 2014 at 11:18          </time></a>
                  </div><!-- .comment-meta .commentmetadata -->
      </footer>

      <div class="comment-content e-content p-summary p-name" itemprop="commentText name description"><p>Crazy that this passed me by completely&#8230; Are there any production applications that use ENUM, for example for chat or something similar?</p>
<p>&#8230;I really should blog more!</p>
</div>

      <div class="reply">
        <a class='comment-reply-link' href='/2014/02/20/wir-brauchen-metadaten-fuer-telefonnummern/?replytocom=466932#respond' onclick='return addComment.moveForm("comment-466932", "466932", "respond", "7546")'>Reply</a>      </div><!-- .reply -->
    </article><!-- #comment-## -->
  <ul class="children">
  <li class="comment even depth-3 h-as-comment p-comment h-entry" id="li-comment-467106">
    <article id="comment-467106" class="comment " itemprop="comment" itemscope itemtype="http://schema.org/UserComments">
      <footer>
        <address class="comment-author p-author author vcard hcard h-card" itemprop="creator" itemscope itemtype="http://schema.org/Person">
          <img alt='' src='http://0.gravatar.com/avatar/2d4d94afbc593569446625c02e7a2f73?s=50&amp;d=http%3A%2F%2F0.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D50&amp;r=G' class='u-photo avatar avatar-50 photo' height='50' width='50' />          <cite class="fn p-name" itemprop="name"><a href='http://lukasrosenstock.net/' rel='external' class='u-url url'>Lukas Rosenstock</a></cite> <span class="says">meant:</span>        </address><!-- .comment-author .vcard -->

        <div class="comment-meta commentmetadata">
          <a href="http://notizblog.org/2014/02/20/wir-brauchen-metadaten-fuer-telefonnummern/#comment-467106"><time class="updated published u-updated u-published" datetime="2014-02-20T12:01:54+00:00" itemprop="commentTime">
          20 February 2014 at 12:01          </time></a>
                  </div><!-- .comment-meta .commentmetadata -->
      </footer>

      <div class="comment-content e-content p-summary p-name" itemprop="commentText name description"><p>So far, ENUM has only been seen as a way to bypass the carriers and save money, so naturally it is not supported by the carriers or by the hardware and software vendors close to them. As a result, it is not reaching the mainstream. At the moment I (sadly) see it as a pure &#8220;nerd tool&#8221;, just like Diaspora, OpenID, IndieWeb &#8230;<br />
But the idea of a &#8220;decentralized WhatsApp&#8221; based on ENUM has occurred to me too. An interesting project, but not ready for the mass market either, because of chicken-and-egg problems.</p>
</div>

      <div class="reply">
        <a class='comment-reply-link' href='/2014/02/20/wir-brauchen-metadaten-fuer-telefonnummern/?replytocom=467106#respond' onclick='return addComment.moveForm("comment-467106", "467106", "respond", "7546")'>Reply</a>      </div><!-- .reply -->
    </article><!-- #comment-## -->
  <ul class="children">
  <li class="comment byuser comment-author-matthias-pfefferle bypostauthor odd alt depth-4 h-as-comment p-comment h-entry" id="li-comment-467346">
    <article id="comment-467346" class="comment " itemprop="comment" itemscope itemtype="http://schema.org/UserComments">
      <footer>
        <address class="comment-author p-author author vcard hcard h-card" itemprop="creator" itemscope itemtype="http://schema.org/Person">
          <img alt='' src='http://1.gravatar.com/avatar/b36983a5651df2c413e264ad4d5cc1a1?s=50&amp;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D50&amp;r=G' class='u-photo avatar avatar-50 photo' height='50' width='50' />          <cite class="fn p-name" itemprop="name"><a href='http://notizblog.org' rel='external' class='u-url openid_link url'>Matthias Pfefferle</a></cite> <span class="says">meant:</span>        </address><!-- .comment-author .vcard -->

        <div class="comment-meta commentmetadata">
          <a href="http://notizblog.org/2014/02/20/wir-brauchen-metadaten-fuer-telefonnummern/#comment-467346"><time class="updated published u-updated u-published" datetime="2014-02-20T13:22:57+00:00" itemprop="commentTime">
          20 February 2014 at 13:22          </time></a>
                  </div><!-- .comment-meta .commentmetadata -->
      </footer>

      <div class="comment-content e-content p-summary p-name" itemprop="commentText name description"><p>Hmmm&#8230; Support from all the carriers would of course really be needed to build mass-market products&#8230;</p>
<p>It would be great if every number automatically got a URI, and at that URI you could find a kind of &#8220;registry&#8221; that apps could extend as well. Sort of a WebFinger for phone numbers&#8230;</p>
</div>

      <div class="reply">
        <a class='comment-reply-link' href='/2014/02/20/wir-brauchen-metadaten-fuer-telefonnummern/?replytocom=467346#respond' onclick='return addComment.moveForm("comment-467346", "467346", "respond", "7546")'>Reply</a>      </div><!-- .reply -->
    </article><!-- #comment-## -->
  </li><!-- #comment-## -->
</ul><!-- .children -->
</li><!-- #comment-## -->
</ul><!-- .children -->
</li><!-- #comment-## -->
</ul><!-- .children -->
</li><!-- #comment-## -->
  <li class="comment even thread-even depth-1 h-as-comment p-comment h-entry" id="li-comment-505365">
    <article id="comment-505365" class="comment " itemprop="comment" itemscope itemtype="http://schema.org/UserComments">
      <footer>
        <address class="comment-author p-author author vcard hcard h-card" itemprop="creator" itemscope itemtype="http://schema.org/Person">
          <img alt='' src='http://1.gravatar.com/avatar/f7a7b6a59e64d4b8c4a3ded1f85a9879?s=50&amp;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D50&amp;r=G' class='u-photo avatar avatar-50 photo' height='50' width='50' />          <cite class="fn p-name" itemprop="name"><a href='http://www.maexoticde/' rel='external' class='u-url url'>Markus Stumpf</a></cite> <span class="says">meant:</span>        </address><!-- .comment-author .vcard -->

        <div class="comment-meta commentmetadata">
          <a href="http://notizblog.org/2014/02/20/wir-brauchen-metadaten-fuer-telefonnummern/#comment-505365"><time class="updated published u-updated u-published" datetime="2014-03-08T21:45:26+00:00" itemprop="commentTime">
          8 March 2014 at 21:45          </time></a>
                  </div><!-- .comment-meta .commentmetadata -->
      </footer>

      <div class="comment-content e-content p-summary p-name" itemprop="commentText name description"><p>This kind of interoperability is commonly called &#8220;federation&#8221;: <a href="http://en.wikipedia.org/wiki/Federation_(information_technology)" >http://en.wikipedia.org/wiki/Federation_(information_technology)</a></p>
<p>WhatsApp does not use XMPP. XMPP is an absolute nightmare on mobile devices, because it is based on TCP, so the client needs a persistent TCP connection, which drains the battery heavily. On top of that, there are constant reconnects whenever the client's IP address changes, which happens all the time.<br />
For this reason, you want a connectionless push system behind it.</p>
<p>Google and Facebook use XMPP, but Facebook has never taken part in s2s (server-to-server) connections, and Google switched it off about a year ago. This means, for example, that from your own XMPP server, and therefore your own XMPP account, you can no longer talk to Google users; you have to use a Google account instead.<br />
I, for instance, have both my Facebook and my Google account configured in Pidgin.</p>
<p>TextSecure (clients currently only for Android) is IMHO the best system in this area at the moment:<br />
- open source<br />
- strong crypto<br />
- multi-device (you can use one account on several devices)<br />
- soon for iOS and desktop<br />
and: it supports federation, so you can run your own server and communicate through it.<br />
See: <a href="https://whispersystems.org/blog/the-new-textsecure/" >https://whispersystems.org/blog/the-new-textsecure/</a></p>
<p>Of course, I still need to know the other participant's account &#8230;</p>
</div>

      <div class="reply">
        <a class='comment-reply-link' href='/2014/02/20/wir-brauchen-metadaten-fuer-telefonnummern/?replytocom=505365#respond' onclick='return addComment.moveForm("comment-505365", "505365", "respond", "7546")'>Reply</a>      </div><!-- .reply -->
    </article><!-- #comment-## -->
  </li><!-- #comment-## -->
    </ol>




                                <div id="respond" class="comment-respond">
                <h3 id="reply-title" class="comment-reply-title">Leave a Reply <small><a rel="nofollow" id="cancel-comment-reply-link" href="/2014/02/20/wir-brauchen-metadaten-fuer-telefonnummern/#respond" style="display:none;">Cancel reply</a></small></h3>
                                    <form action="http://notizblog.org/wp-comments-post.php" method="post" id="commentform" class="comment-form" novalidate>
                                                                            <p class="comment-notes">Your email address will not be published. Required fields are marked <span class="required">*</span></p>                            <p class="comment-form-author"><label for="author">Name <span class="required">*</span></label> <input autocomplete="nickname name"  id="author" name="author" type="text" value="" size="30" aria-required='true' /></p>
<p class="comment-form-email"><label for="email">Email <span class="required">*</span></label> <input autocomplete="email"  id="email" name="email" type="email" value="" size="30" aria-required='true' /></p>
<p class="comment-form-url"><label for="url">Website</label> <input autocomplete="url"  id="url" name="url" type="url" value="" size="30" /></p>
                                                <p class="comment-form-comment"><label for="comment">Comment</label> <textarea id="comment" name="comment" cols="45" rows="8" aria-required="true"></textarea></p>                        <p class="form-allowed-tags">You can use the following <abbr title="HyperText Markup Language">HTML</abbr> tags:  <code>&lt;a href=&quot;&quot; title=&quot;&quot;&gt; &lt;abbr title=&quot;&quot;&gt; &lt;acronym title=&quot;&quot;&gt; &lt;b&gt; &lt;blockquote cite=&quot;&quot;&gt; &lt;cite&gt; &lt;code&gt; &lt;del datetime=&quot;&quot;&gt; &lt;em&gt; &lt;i&gt; &lt;q cite=&quot;&quot;&gt; &lt;strike&gt; &lt;strong&gt; </code></p>                       <p class="form-submit">
                            <input name="submit" type="submit" id="submit" value="Post Comment" />
                            <input type='hidden' name='comment_post_ID' value='7546' id='comment_post_ID' />
<input type='hidden' name='comment_parent' id='comment_parent' value='0' />
                        </p>
                        <p style="display: none;"><input type="hidden" id="akismet_comment_nonce" name="akismet_comment_nonce" value="9552e566de" /></p><p class="comment-subscription-form"><input type="checkbox" name="subscribe_comments" id="subscribe_comments" value="subscribe" style="width: auto; -moz-appearance: checkbox; -webkit-appearance: checkbox;" /> <label class="subscribe-label" id="subscribe-label" for="subscribe_comments">Notify me of follow-up comments by email.</label></p><p class="comment-subscription-form"><input type="checkbox" name="subscribe_blog" id="subscribe_blog" value="subscribe" style="width: auto; -moz-appearance: checkbox; -webkit-appearance: checkbox;" /> <label class="subscribe-label" id="subscribe-blog-label" for="subscribe_blog">Notify me of new posts by email.</label></p><script type='text/javascript' src='http://notizblog.org/wp-content/plugins/akismet/_inc/form.js?ver=3.0.0'></script>
<p style="display: none;"><input type="hidden" id="ak_js" name="ak_js" value="76"/></p>                 </form>
                            </div><!-- #respond -->
                <form id="webmention-form" action="http://notizblog.org/?webmention=endpoint" method="post">
      <p>
        <label for="webmention-source">Responding with a post on your own blog? Send me a <a href="http://indiewebcamp.com/webmention">WebMention</a> <sup>(<a href="http://adactio.com/journal/6469/">?</a>)</sup></label>
        <input id="webmention-source" type="url" name="source" placeholder="URL/Permalink of your article" />
      </p>
      <p>
        <input id="webmention-submit" type="submit" name="submit" value="Ping me!" />
      </p>
      <input id="webmention-target" type="hidden" name="target" value="http://notizblog.org/2014/02/20/wir-brauchen-metadaten-fuer-telefonnummern/" />
    </form>
    <p>
    <label for="cite-shortlink">Shortlink</label>
    <input id="cite-shortlink" class="u-url url shortlink" type="text" value="http://notizblog.org/b/25m" />
  </p>
  <p>
    <label for="cite-permalink">Permalink</label>
    <input id="cite-permalink" class="u-url url u-uid uid bookmark" type="text" value="http://notizblog.org/2014/02/20/wir-brauchen-metadaten-fuer-telefonnummern/" />
  </p>
  <p>
    <label for="cite-cite">HTML</label>
    <input id="cite-cite" class="code" type="text" size="70" value="&lt;cite class=&quot;h-cite&quot;&gt;&lt;a class=&quot;u-url p-name&quot; href=&quot;http://notizblog.org/2014/02/20/wir-brauchen-metadaten-fuer-telefonnummern/&quot;&gt;Wir brauchen Metadaten für Telefonnummern&lt;/a&gt; (&lt;span class=&quot;p-author h-card&quot; title=&quot;Matthias Pfefferle&quot;&gt;Matthias Pfefferle&lt;/span&gt; &lt;time class=&quot;dt-published&quot; datetime=&quot;2014-02-20T10:30:40+00:00&quot;&gt;20 February 2014&lt;/time&gt;)&lt;/cite&gt;">
  </p>

</div><!-- #comments -->
</body>
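The ENUM mapping suggested in the comments of this sample page (RFC 6116) turns an E.164 phone number into a DNS domain name. Below is a minimal PHP sketch of that mapping. It assumes the input is already a full international number, and enumDomain() is a hypothetical helper name. The resulting name could then be queried for NAPTR records, for example with PHP's dns_get_record($domain, DNS_NAPTR); that lookup is not shown here.

```php
<?php
// Hypothetical helper: map an E.164 number to its ENUM domain (RFC 6116).
// Keep only the digits, reverse them, separate them with dots, append "e164.arpa".
function enumDomain(string $e164): string
{
    $digits = preg_replace('/\D/', '', $e164); // "+44 20 7946 0148" -> "442079460148"
    if ($digits === '') {
        throw new InvalidArgumentException('No digits in phone number');
    }
    return implode('.', str_split(strrev($digits))) . '.e164.arpa';
}

echo enumDomain('+44 20 7946 0148'), "\n"; // 8.4.1.0.6.4.9.7.0.2.4.4.e164.arpa
```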

Add support for classic microformats

The only approach which will allow php-mf2 to continue to be a true generic parser but still support classic microformats (which combine syntax and vocabulary) is to convert classic classnames to their µf2 equivalents as per the BC tables on http://microformats.org/wiki/microformats2#combining_microformats
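A rough PHP sketch of the conversion idea follows. Roots map to h-* names. Property classes map according to the classic root they appear in. The table is only a small excerpt of the backcompat mappings, and classicToMf2() is a hypothetical function name, not part of php-mf2 (the sketch assumes PHP 7+).

```php
<?php
// Hypothetical sketch: rewrite classic classnames to their mf2 equivalents.
// Only a small excerpt of the backcompat tables is included here.
function classicToMf2(string $classAttr, string $classicRoot = ''): array
{
    $roots = [
        'hentry' => 'h-entry',
        'vcard'  => 'h-card',
        'hfeed'  => 'h-feed',
        'vevent' => 'h-event',
    ];
    // Property mappings depend on which classic root the element sits in.
    $props = [
        'hentry' => [
            'entry-title'   => 'p-name',
            'entry-summary' => 'p-summary',
            'entry-content' => 'e-content',
            'published'     => 'dt-published',
            'updated'       => 'dt-updated',
        ],
        'vcard' => [
            'fn'    => 'p-name',
            'url'   => 'u-url',
            'photo' => 'u-photo',
            'email' => 'u-email',
        ],
    ];
    $out = [];
    foreach (preg_split('/\s+/', trim($classAttr), -1, PREG_SPLIT_NO_EMPTY) as $class) {
        if (isset($roots[$class])) {
            $out[] = $roots[$class];
        } elseif (isset($props[$classicRoot][$class])) {
            $out[] = $props[$classicRoot][$class];
        }
    }
    return array_values(array_unique($out));
}
```

Passing the enclosing classic root keeps the conversion context-sensitive. For example, "fn" only becomes p-name inside a vcard, not inside an hentry.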

Use implied date for dt-end when none is specified

This is based on the proposed spec here http://microformats.org/wiki/value-class-pattern#microformats2_parsers.

Note that in mf2py, I chose to only implement the first bullet point — use most recently parsed date as the default. The second, using the next date if there was no previous date, is sufficiently difficult that I thought it would make sense to wait for an actual example in the wild.

Here are a couple of test cases that I pulled from @gRegorLove's work on event templates.

 <div class="h-event">
   <h1 class="p-name">Implied Date w/o Timezone</h1>
   This test case and the next are courtesy of event templates on
   http://indiewebcamp.com/User:Gregorlove.com/sandbox
   <p> When:
     <span class="dt-start">
       <span class="value" title="May 21, 2014">2014-05-21</span>
       <span class="value" title="18:30">18:30</span>
       –
       <span class="dt-end">19:30</span>
     </span> (local time)
   </p>
 </div>
 <div class="h-event">
   <h1 class="p-name">Implied Date w/ Timezone</h1>
   <p> When:
     <span class="dt-start">
       <span class="value" title="June 1, 2014">2014-06-01</span>
       <span class="value" title="12:30">12:30<span style="display: none;">-06:00</span></span>
       –
       <span class="dt-end">19:30<span style="display: none;">-06:00</span></span>
     </span> (-06:00 <abbr>UTC</abbr>)
   </p>

 </div>
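A minimal PHP sketch of the first rule only, which uses the most recently parsed date. It takes dt-* values in document order, and a time-only value borrows the last date seen. The second rule (borrowing the next date) is deliberately left out, as in mf2py. Joining date and time with a space is an assumption, and applyImpliedDates() is a hypothetical name.

```php
<?php
// Sketch of the "most recently parsed date" rule only.
// $values are dt-* values in document order; the space separator is an assumption.
function applyImpliedDates(array $values): array
{
    $lastDate = null;
    $out = [];
    foreach ($values as $value) {
        if (preg_match('/^(\d{4}-\d{2}-\d{2})/', $value, $m)) {
            $lastDate = $m[1];                 // remember the date for later values
            $out[] = $value;
        } elseif ($lastDate !== null && preg_match('/^\d{1,2}:\d{2}/', $value)) {
            $out[] = $lastDate . ' ' . $value; // time-only: borrow the previous date
        } else {
            $out[] = $value;                   // nothing to borrow from
        }
    }
    return $out;
}
```

For the first test case above, ['2014-05-21 18:30', '19:30'] would give a dt-end of '2014-05-21 19:30'.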

Write own relative URL resolving code

The webignition/url dependency requires the i18n extension, which, whilst a valid requirement, can be a pain, so it would be nice to write our own native implementation of that routine.
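A minimal sketch of a native resolver in plain PHP (7.2+, no intl), following the reference-resolution algorithm in RFC 3986 section 5. The names splitUri(), removeDotSegments(), mergePaths() and resolveUrl() are hypothetical, not php-mf2's actual API. The authority component is copied verbatim rather than rebuilt from parse_url() parts, so ports and user-info such as "user:@" are preserved. Protocol-relative references ("//domain.com/") take the base scheme.

```php
<?php
// Split a URI reference with the regex from RFC 3986 Appendix B.
// Absent components are null, so an empty query ("?") differs from no query.
function splitUri(string $uri): array
{
    preg_match(
        '~^(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?$~s',
        $uri,
        $m,
        PREG_UNMATCHED_AS_NULL
    );
    return [
        'scheme'    => $m[1] ?? null,
        'authority' => $m[2] ?? null,
        'path'      => $m[3] ?? '',
        'query'     => $m[4] ?? null,
        'fragment'  => $m[5] ?? null,
    ];
}

// remove_dot_segments (section 5.2.4). Each output segment keeps its leading
// "/", so popping a segment also removes its preceding slash.
function removeDotSegments(string $path): string
{
    $out = [];
    while ($path !== '') {
        if (strpos($path, '../') === 0) {
            $path = substr($path, 3);
        } elseif (strpos($path, './') === 0) {
            $path = substr($path, 2);
        } elseif (strpos($path, '/./') === 0) {
            $path = substr($path, 2);
        } elseif ($path === '/.') {
            $path = '/';
        } elseif (strpos($path, '/../') === 0) {
            $path = substr($path, 3);
            array_pop($out);
        } elseif ($path === '/..') {
            $path = '/';
            array_pop($out);
        } elseif ($path === '.' || $path === '..') {
            $path = '';
        } else {
            // Move the first segment (with its leading "/", if any) to the output.
            $end = strpos($path, '/', 1);
            $end = $end === false ? strlen($path) : $end;
            $out[] = substr($path, 0, $end);
            $path = substr($path, $end);
        }
    }
    return implode('', $out);
}

// merge (section 5.2.3).
function mergePaths(?string $baseAuthority, string $basePath, string $refPath): string
{
    if ($baseAuthority !== null && $basePath === '') {
        return '/' . $refPath;
    }
    $slash = strrpos($basePath, '/');
    return $slash === false ? $refPath : substr($basePath, 0, $slash + 1) . $refPath;
}

// Resolve $ref against $base (sections 5.2.2 and 5.3).
function resolveUrl(string $base, string $ref): string
{
    $b = splitUri($base);
    $r = splitUri($ref);
    if ($r['scheme'] !== null || $r['authority'] !== null) {
        // Absolute ("tel:...") or protocol-relative ("//host/...") reference.
        $t = $r;
        $t['path'] = removeDotSegments($r['path']);
        $t['scheme'] = $r['scheme'] ?? $b['scheme'];
    } else {
        $t = ['scheme' => $b['scheme'], 'authority' => $b['authority'],
              'query' => $r['query'], 'fragment' => $r['fragment']];
        if ($r['path'] === '') {
            $t['path'] = $b['path'];
            $t['query'] = $r['query'] ?? $b['query'];
        } elseif ($r['path'][0] === '/') {
            $t['path'] = removeDotSegments($r['path']);
        } else {
            $t['path'] = removeDotSegments(mergePaths($b['authority'], $b['path'], $r['path']));
        }
    }
    // Recompose; the authority string (host, port, user-info) is kept verbatim.
    $url = $t['scheme'] !== null ? $t['scheme'] . ':' : '';
    $url .= $t['authority'] !== null ? '//' . $t['authority'] : '';
    $url .= $t['path'];
    $url .= $t['query'] !== null ? '?' . $t['query'] : '';
    $url .= $t['fragment'] !== null ? '#' . $t['fragment'] : '';
    return $url;
}
```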

Make classic microformats support much stricter

If any mf2 classnames are found in a document, don't convert any classic mf classnames.

E.g., parsing this URL with classic support turned on currently produces bad results because the classic mf classnames are on different elements from the mf2 classnames; with stricter checking this would not be a problem.
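A possible shape for the stricter check is a single document-level decision made before any backcompat conversion runs. This is a PHP sketch using the built-in DOM extension. shouldParseClassic() is a hypothetical name. Treating any class that starts with "h-" plus a letter as an mf2 root is a simplification of the spec's root class name syntax.

```php
<?php
// Sketch: only convert classic classnames when the document contains no mf2
// root class at all. The "h-" + letter test is a simplification.
function shouldParseClassic(DOMDocument $doc): bool
{
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//*[@class]') as $el) {
        $classes = preg_split('/\s+/', trim($el->getAttribute('class')), -1, PREG_SPLIT_NO_EMPTY);
        foreach ($classes as $class) {
            if (preg_match('/^h-[a-z]/', $class)) {
                return false; // mf2 found: skip backcompat for the whole document
            }
        }
    }
    return true;
}
```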

Relative URL parsing code leaves port number out of resolved URL

e.g. see photo property of http://pin13.net/mf2/?url=http%3A%2F%2Fdev.bdesham.info%3A8080%2Fabout.html

cc @aaronpk

phpunit testing fails

The phpunit tests fail in two places. The first (IgnoresUppercaseClassnames) looks like an incorrect test, as the parser code clearly calls strtolower() on classnames before comparing them. Also, is it even correct mf2 parsing to care about case?

The second failure seems to be caused by a change in PHP, as I only get it with certain versions:

phpunit 
PHPUnit 3.7.28 by Sebastian Bergmann.

Configuration read from /var/www/mf2-master/phpunit.xml

...............................................................  63 / 166 ( 37%)
......................F........................................ 126 / 166 ( 75%)
..F.....................................

Time: 3.06 seconds, Memory: 5.25Mb

There were 2 failures:

1) Mf2\Parser\Test\ParserTest::testMicroformatNamesFromClassIgnoresUppercaseClassnames
Failed asserting that two arrays are equal.
--- Expected
+++ Actual
@@ @@
 Array (
+    0 => 'H-ENTRY'
 )

/var/www/mf2-master/tests/Mf2/ParserTest.php:77

2) Mf2\Parser\Test\UrlTest::testReturnsUrlIfAbsolute with data set #16 ('relative add scheme host user from base', 'http://user:@www.example.com', 'server.php', 'http://user:@www.example.com/server.php')
relative add scheme host user from base
Failed asserting that two strings are equal.
--- Expected
+++ Actual
@@ @@
-'http://user:@www.example.com/server.php'
+'http://user@www.example.com/server.php'

/var/www/mf2-master/tests/Mf2/URLTest.php:158

FAILURES!
Tests: 166, Assertions: 287, Failures: 2.



php --version
PHP 5.5.9-1ubuntu4.5 (cli) (built: Oct 29 2014 11:59:10) 
Copyright (c) 1997-2014 The PHP Group
Zend Engine v2.5.0, Copyright (c) 1998-2014 Zend Technologies
    with Zend OPcache v7.0.3, Copyright (c) 1999-2014, by Zend Technologies
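On the first failure: as far as I understand the microformats2 parsing spec, class names are case-sensitive, and root class names follow the lowercase-only pattern h-([a-z0-9]+-)?[a-z]+(-[a-z]+)*. On that reading, the test's expectation that "H-ENTRY" is ignored is correct, and lowercasing before matching is the bug. Below is a PHP sketch that filters without strtolower(). mfRootNamesFromClass() is a hypothetical name, not the parser's actual method.

```php
<?php
// Sketch: extract mf2 root class names case-sensitively (no strtolower()),
// so "H-ENTRY" does not match. Pattern as understood from the parsing spec.
function mfRootNamesFromClass(string $class): array
{
    $names = [];
    foreach (preg_split('/\s+/', trim($class), -1, PREG_SPLIT_NO_EMPTY) as $token) {
        if (preg_match('/^h-([a-z0-9]+-)?[a-z]+(-[a-z]+)*$/', $token)) {
            $names[] = $token;
        }
    }
    return array_values(array_unique($names));
}
```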

problem with e-content

It seems that the parser has a bug with e-content when it is on the same element as p-summary and/or p-name:

http://pin13.net/mf2/?url=http://notizblog.org/replies/converspace-activity-syntax/

Update e-* parsing to new parsing spec

As outlined here: https://etherpad.mozilla.org/microformats2parsing (will be wikified later)

Basically, parsing an e-* property no longer adds a string to the values array for that property; instead, it adds a hash like:

{
  "value": "Plaintext < > &",
  "html": "<p>Plaintext &lt; &gt; &amp;</p>"
}
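A minimal PHP sketch of producing that hash from a DOM element. Here "value" is simply the trimmed textContent, which is a simplification of the spec's text parsing. "html" is the serialised inner HTML, where text is re-escaped on output. parseEProperty() is a hypothetical name.

```php
<?php
// Sketch of the proposed e-* value shape: text content plus inner HTML.
function parseEProperty(DOMElement $el): array
{
    $html = '';
    foreach ($el->childNodes as $child) {
        // saveHTML($node) serialises a single node, escaping text as needed.
        $html .= $el->ownerDocument->saveHTML($child);
    }
    return ['value' => trim($el->textContent), 'html' => trim($html)];
}
```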

rel-values shouldn't be global

In the current version of the parser, rel values are attached to the root of the output, which makes it impossible to associate them with an h-entry or h-card.

value-class being incorrectly parsed

On http://tantek.com/2013/317/t1/awake-second-novemberproject — the datetime is being parsed as "2013-11-13T", which is wrong.

Value Class Pattern for date and time values not working/implemented?

Not seeing dt-published in tantek's feed:
http://pin13.net/mf2/?url=http://tantek.com

Markup on tantek.com:

<a href="2013/178/t1/surreal-meeting-dpdpdp-trondisc" rel="bookmark" class="dt-published published dt-updated updated u-url u-uid"><time class="value">10:17</time> on <time class="value">2013-06-27</time></a>

I don't know enough about this, but Tantek said php-mf2 requires support for:
http://microformats.org/wiki/value-class-pattern#Date_and_time_values
which is required by:
http://microformats.org/wiki/microformats2-parsing#parsing_a_dt-_property
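A PHP sketch of the value-class combination step for dt-* properties. It takes the text of each .value element in document order, for example "10:17" and "2013-06-27" from the tantek.com markup above. It then picks the first date, time and timezone offset and joins them. The function only appends a "T" when a time is present, which avoids output like "2013-11-13T". Only a few formats are recognised, the "T" separator is an assumption, and vcpDatetime() is a hypothetical name.

```php
<?php
// Sketch: combine value-class parts into one datetime string.
function vcpDatetime(array $parts): ?string
{
    $date = $time = $tz = null;
    foreach ($parts as $part) {
        $part = trim($part);
        if ($date === null && preg_match('/^\d{4}-\d{2}-\d{2}$/', $part)) {
            $date = $part;
        } elseif ($time === null && preg_match('/^\d{1,2}:\d{2}(:\d{2})?$/', $part)) {
            $time = $part;
        } elseif ($tz === null && preg_match('/^(Z|[+-]\d{2}:?\d{2})$/', $part)) {
            $tz = $part;
        }
    }
    if ($date === null) {
        return $time; // time-only (or nothing): leave it to implied-date handling
    }
    // Append "T" only when a time exists, so a lone date stays a plain date.
    return $time === null ? $date : $date . 'T' . $time . ($tz ?? '');
}
```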

Twitter mapping

I can't get the Twitter mapping to work. I tried using convertTwitter() and addTwitterClassMap(), but neither worked. (The convertTwitter() method doesn't seem to exist; does the README need to be updated?)

Prevent nested value-class pattern elements from being parsed incorrectly

E.g., given this snippet:

<div class="h-card"><span class="p-tel"><span class="value">1234</span><span class="h-card"><span class="p-tel"><span class="value">5678</span></span></span></span></div>

Currently, the first h-card gets a tel property of "1234 5678" and the nested one gets "5678". The nested element should be marked as already parsed to prevent its value from leaking up the tree.
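The sketch below uses a slightly different technique from marking nodes as parsed. When it collects .value descendants of a property element, it skips any that sit inside a nested microformat root (a class starting with "h-"). This is a PHP sketch under that assumption, and valueClassParts() is a hypothetical name.

```php
<?php
// Sketch: collect ".value" text for a property, ignoring .value elements that
// belong to a nested microformat between them and the property element.
function valueClassParts(DOMElement $property): array
{
    $xpath = new DOMXPath($property->ownerDocument);
    $query = './/*[contains(concat(" ", normalize-space(@class), " "), " value ")]';
    $parts = [];
    foreach ($xpath->query($query, $property) as $valueEl) {
        $nested = false;
        // Walk up from the .value element until we reach the property element.
        for ($n = $valueEl->parentNode; $n !== null && !$n->isSameNode($property); $n = $n->parentNode) {
            if ($n instanceof DOMElement && preg_match('/(^|\s)h-[a-z]/', $n->getAttribute('class'))) {
                $nested = true; // inside a nested h-* root: not ours
                break;
            }
        }
        if (!$nested) {
            $parts[] = $valueEl->textContent;
        }
    }
    return $parts;
}
```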

Relative URL parsing code doesn’t handle protocol-relative URLs

E.g. <a class="u-url" href="//domain.com/"> on http://example.com/page.html gets resolved to http://example.com/page.html//domain.com/ instead of http://domain.com/

cc @aaronpk

Ditch PHP Datetime in value-title and value-class pattern

PHP DateTime doesn’t support the breadth and fuzziness of datetime values that might be authored in HTML5. Ditch it in parsing and instead use regexes based on the spec here: http://microformats.org/wiki/value-class-pattern#Date_and_time_parsing
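As one example of the regex approach, here is a PHP sketch that normalises time strings without DateTime. It accepts "H:MM", "HH:MM:SS" and 12-hour forms such as "5pm" or "5:30 p.m.", and returns a 24-hour "HH:MM[:SS]" string or null. The accepted formats are this sketch's assumption rather than a complete reading of the spec. Timezone offsets are out of scope, and normalizeTime() is a hypothetical name.

```php
<?php
// Sketch: regex-based time normalisation (no DateTime).
function normalizeTime(string $time): ?string
{
    if (!preg_match('/^(\d{1,2})(?::(\d{2}))?(?::(\d{2}))?\s*(?:([ap])\.?m\.?)?$/i', trim($time), $m)) {
        return null;
    }
    $hour     = (int) $m[1];
    $minute   = isset($m[2]) && $m[2] !== '' ? $m[2] : null;
    $second   = isset($m[3]) && $m[3] !== '' ? ':' . $m[3] : '';
    $meridiem = isset($m[4]) ? strtolower($m[4]) : '';

    if ($meridiem === '') {
        if ($minute === null) {
            return null; // a bare number such as "5" is not a time
        }
    } else {
        if ($hour < 1 || $hour > 12) {
            return null; // "13pm" or "0am" are not valid 12-hour times
        }
        if ($meridiem === 'p' && $hour < 12) {
            $hour += 12;
        } elseif ($meridiem === 'a' && $hour === 12) {
            $hour = 0;
        }
    }
    $minute = $minute ?? '00';
    if ($hour > 23 || (int) $minute > 59) {
        return null;
    }
    return sprintf('%02d:%s%s', $hour, $minute, $second);
}
```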

Array notation / PHP version

composer.json allows PHP 5.3, but line 126 of Parser.php uses the short array syntax ([]), which is a parse error in PHP < 5.4.

value-title parsing broken

It seems php-mf2 does not parse value-title correctly:

<div class="h-entry">
 <h1 class="p-name">test</h1>
 <span class="dt-published"><span class="value-title" title="2012-02-16T16:14:47+00:00"> </span>16.02.2012</span>
</div>

http://waterpigs.co.uk/php-mf2/ tells me the date is

                "published": [
                    " 16.02.2012"
                ]

instead of the machine-readable form.

see http://microformats.org/wiki/value-class-pattern#Parsing_value_from_a_title_attribute
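A minimal PHP sketch of the value-title rule. If the property element has a child with class "value-title", the property value is that child's title attribute, and the visible text ("16.02.2012") is ignored. The sketch checks direct children only, which is its own simplification. valueTitleOrText() is a hypothetical name.

```php
<?php
// Sketch: prefer a value-title child's @title over the element's text.
function valueTitleOrText(DOMElement $property): string
{
    foreach ($property->childNodes as $child) {
        if ($child instanceof DOMElement
            && preg_match('/(^|\s)value-title(\s|$)/', $child->getAttribute('class'))) {
            return $child->getAttribute('title');
        }
    }
    return trim($property->textContent);
}
```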

Not all h-feeds are found

As can be seen in the following gists:

real site (waterpigs.co.uk): no h-feed found, https://gist.github.com/jonnybarnes/4ddf5606867db75fb0f0
very simple HTML: h-feed found, https://gist.github.com/jonnybarnes/61a2bc9e360b12d53b1f

Add support for implied abbr p-name

As per http://microformats.org/wiki/microformats2-parsing#parsing_for_implied_properties
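A PHP sketch of the first implied-name checks, including the abbr rule. For example, <abbr class="h-card" title="Full Name">FN</abbr> takes its name from @title. The later rules (only-child img/area/abbr and nested cases) are omitted, and impliedName() is a hypothetical name.

```php
<?php
// Sketch of the first implied p-name rules; later rules are omitted.
function impliedName(DOMElement $root): string
{
    $tag = strtolower($root->tagName);
    if (($tag === 'img' || $tag === 'area') && $root->hasAttribute('alt')) {
        return $root->getAttribute('alt');
    }
    if ($tag === 'abbr' && $root->hasAttribute('title')) {
        return $root->getAttribute('title');
    }
    // Fallback: trimmed text content.
    return trim($root->textContent);
}
```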

it appears that some e-* properties are not being correctly parsed

…in the latest dev version. This is probably due to the changes I made when fixing the partial nested microformat property problem, but apparently my test suite didn’t cover these particular cases.

Does not work with PHP 5.3

Upgraded to the latest version:

Loading composer repositories with package information
Updating dependencies (including require-dev)

Removing mf2/mf2 (v0.1.14)

Installing mf2/mf2 (v0.1.16)
Downloading: 100%

I got this error:

PHP Parse error: syntax error, unexpected '[' in vendor/mf2/mf2/mf2/Parser.php on line 671

where line 671 is: 'rel' => implode(' ', array_diff($linkRels, ['alternate']))

That's using the [] short syntax for arrays. That's not supported in PHP 5.3.

Yet vendor/mf2/mf2/composer.json declares:

"require": {
"php": ">=5.3.0",

This declared requirement is incompatible with the code.

This breaks support for PHP 5.3 (which is what I have on AppFog).
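For code that must still run on PHP 5.3, the quoted line can use the long `array()` syntax, which parses on every PHP version. A minimal sketch; `$linkRels` holds made-up sample data, not the parser's real value at that point.

```php
<?php
// PHP 5.3-compatible form of the quoted line: array() instead of the
// short [] literal, which requires PHP >= 5.4.
// $linkRels is made-up sample data for illustration.
$linkRels = array('alternate', 'feed');
$rel = implode(' ', array_diff($linkRels, array('alternate')));
echo $rel, "\n";
```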

Add option to html-encode non e-* properties

Otherwise there is no way for library users to tell what level of encoding any given property is at. If we provide an option to html-encode the < > & " characters in non e-* properties, all output will be at the same level and can be treated as HTML, instead of as a mixture of HTML and plain text.
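A minimal sketch of what such an option could do, assuming properties are arrays of values keyed by property name and that the caller knows which names came from e-* classes. `encodeProperties()` and its `$htmlEncode` flag are hypothetical, not an existing php-mf2 option.

```php
<?php
/**
 * Hypothetical sketch: when $htmlEncode is true, escape < > & " in every
 * string value of every property not listed in $ePropertyNames.
 * Non-string values (e.g. nested microformats) are left as they are.
 */
function encodeProperties(array $properties, array $ePropertyNames, bool $htmlEncode): array
{
    if (!$htmlEncode) {
        return $properties;
    }
    foreach ($properties as $name => $values) {
        if (in_array($name, $ePropertyNames, true)) {
            continue; // e-* values are already HTML
        }
        foreach ($values as $i => $value) {
            if (is_string($value)) {
                // ENT_COMPAT escapes < > & and double quotes, leaving single quotes alone
                $properties[$name][$i] = htmlspecialchars($value, ENT_COMPAT, 'UTF-8');
            }
        }
    }
    return $properties;
}

$props = ['name' => ['Tom & Jerry <3'], 'content' => ['<p>Hi</p>']];
print_r(encodeProperties($props, ['content'], true));
```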

Add option to only parse rel values

When using this library to look for rel values in the HTML, such as a webmention endpoint or authorization server, there is no need to parse the full microformats vocabulary. It would be great to be able to optionally return only the "rels" object.
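A sketch of rel-only parsing with PHP's built-in DOM, skipping microformats entirely. It returns a map of rel value to a list of hrefs, similar in shape to the "rels" object. `parseRelsOnly()` is a hypothetical name; relative hrefs are not resolved and duplicates are kept.

```php
<?php
/**
 * Hypothetical sketch: collect rel values from a, link and area elements.
 * Returns [rel value => [href, ...]] in document order.
 */
function parseRelsOnly(string $html): array
{
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $rels = [];
    $query = '//a[@rel and @href] | //link[@rel and @href] | //area[@rel and @href]';
    foreach ($xpath->query($query) as $el) {
        // rel is a space-separated list, e.g. rel="me nofollow"
        $relValues = preg_split('/\s+/', trim($el->getAttribute('rel')), -1, PREG_SPLIT_NO_EMPTY);
        foreach ($relValues as $rel) {
            $rels[$rel][] = $el->getAttribute('href');
        }
    }
    return $rels;
}

$html = '<html><head><link rel="webmention" href="https://example.com/webmention"></head>'
    . '<body><a rel="me nofollow" href="https://example.org/">me</a></body></html>';
print_r(parseRelsOnly($html));
```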

Put nested µf with an associated property name under that property name

At the moment all nested µf are just put into .children

Add support for partial datetimes

As per the IRC conversation on #microformats

Doesn’t yet support value-title

Implement microformats test suite

@barnabywalters mentioned php-mf2 should run tests against the microformats test suite: https://github.com/tobiastom/tests. I have volunteered to take on this task.

Return everything as a string

PHP DateTime does not support the full breadth of values authors might write, so leave it to consuming apps to parse datetimes however they want.

Relative URL resolution probably breaks URLs with weird schemes

Off the top of my head, I'm not sure how much effect this will have, but it's certainly worth adding tests for schemes such as geo: and tel:.
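One guard against this is to treat anything that already has a scheme as absolute and resolve only scheme-less values. In the sketch below, `hasScheme()` follows the RFC 3986 scheme syntax. `resolveUrl()` is a deliberately simplified, hypothetical resolver that only handles protocol-relative, root-relative and plain relative paths. It does not handle empty, "../", "./", query-only or fragment-only references, so a real parser needs the full RFC 3986 algorithm.

```php
<?php
/**
 * True if $url starts with an RFC 3986 scheme:
 * ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) ":"
 */
function hasScheme(string $url): bool
{
    return preg_match('/^[a-z][a-z0-9+.-]*:/i', $url) === 1;
}

/**
 * Hypothetical, simplified resolver: values with a scheme (http:, geo:,
 * tel:, mailto:, ...) are returned unchanged; only scheme-less values are
 * resolved against $base.
 */
function resolveUrl(string $base, string $url): string
{
    if (hasScheme($url)) {
        return $url;
    }
    $b = parse_url($base);
    if (substr($url, 0, 2) === '//') {
        return $b['scheme'] . ':' . $url; // protocol-relative
    }
    $origin = $b['scheme'] . '://' . $b['host'] . (isset($b['port']) ? ':' . $b['port'] : '');
    if (substr($url, 0, 1) === '/') {
        return $origin . $url; // root-relative
    }
    $path = $b['path'] ?? '/';
    // plain relative path: replace everything after the last "/" of the base path
    return $origin . substr($path, 0, strrpos($path, '/') + 1) . $url;
}

echo resolveUrl('http://example.com/notes/1', 'geo:51.5,-0.1'), "\n";
echo resolveUrl('http://example.com/notes/1', 'photo.jpg'), "\n";
```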

Make implied name parsing consistent with p- property parsing

Currently whitespace is collapsed in implied names, but not in p-* properties.

Update u-* parsing to support the spec

Currently value-class parsing is applied before looking in @href. This causes problems when a post's publishing datetime uses the value-class pattern and is nested within a.u-url: the u-* property ends up as a malformed datetime instead of the URL.

Example: http://tantek.com

Solution: follow the mf2 parsing spec more closely. Specifically, look in @href, @data, @src etc. before parsing for value-class.
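A sketch of u-* parsing in the proposed order: URL-bearing attributes first, the value-class pattern only afterwards, then fallbacks. The tag-to-attribute table and the simplified value-class handling (concatenating the trimmed text of descendants with class `value`) are assumptions, and `parseU()` is a hypothetical name, not php-mf2's code.

```php
<?php
/**
 * Hypothetical sketch of u-* parsing with attributes checked before the
 * (simplified) value-class pattern.
 */
function parseU(DOMXPath $xpath, DOMElement $el): string
{
    $attrForTag = [
        'a' => 'href', 'area' => 'href', 'link' => 'href',
        'img' => 'src', 'audio' => 'src', 'video' => 'src', 'source' => 'src', 'iframe' => 'src',
        'object' => 'data',
    ];
    $tag = $el->tagName;

    // 1. URL-bearing attributes win, so a nested datetime cannot override @href.
    if (isset($attrForTag[$tag]) && $el->hasAttribute($attrForTag[$tag])) {
        return $el->getAttribute($attrForTag[$tag]);
    }

    // 2. Simplified value-class pattern.
    $valueEls = $xpath->query('.//*[contains(concat(" ", normalize-space(@class), " "), " value ")]', $el);
    if ($valueEls->length > 0) {
        $parts = [];
        foreach ($valueEls as $valueEl) {
            $parts[] = trim($valueEl->textContent);
        }
        return implode('', $parts);
    }

    // 3. Fallbacks.
    if ($tag === 'abbr' && $el->hasAttribute('title')) {
        return $el->getAttribute('title');
    }
    return trim($el->textContent);
}

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML('<a class="u-url" href="http://example.com/2014/01/01/post">'
    . '<span class="dt-published"><span class="value">2014-01-01</span> <span class="value">12:00</span></span></a>');
$xpath = new DOMXPath($doc);
$link = $doc->getElementsByTagName('a')->item(0);
echo parseU($xpath, $link), "\n";
```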

Add support for parsing only portions of a document

For example, by passing an ID and parsing only that element, or by passing a DOMElement.

The easiest approach is to pass an element to use as the context node for the initial .h-* XPath query.
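A sketch of the context-node approach with PHP's built-in DOMXPath. `findRoots()` is a hypothetical name. It returns every h-* element under (and including) the context node, so a real parser would still need to separate top-level roots from nested ones.

```php
<?php
/**
 * Hypothetical sketch: find h-* elements relative to a context node instead
 * of the whole document. descendant-or-self lets the context node itself be
 * a root.
 */
function findRoots(DOMXPath $xpath, DOMNode $context): array
{
    $query = 'descendant-or-self::*[contains(concat(" ", normalize-space(@class), " "), " h-")]';
    $roots = [];
    foreach ($xpath->query($query, $context) as $node) {
        $roots[] = $node;
    }
    return $roots;
}

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML('<div class="h-card">Site owner</div>'
    . '<div id="main"><div class="h-entry"><span class="p-name">Post</span></div></div>');
$xpath = new DOMXPath($doc);

// Look the context element up by ID; a DOMElement could be passed in directly instead.
$main = $xpath->query('//*[@id="main"]')->item(0);
foreach (findRoots($xpath, $main) as $root) {
    echo $root->getAttribute('class'), "\n";
}
```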

CC @tantek

Classic conversion problems when using mf and mf2 on the same HTML element

Discussion: http://indiewebcamp.com/irc/2014-05-20/line/1400594738

It seems that the following markup:

<span class="author p-author vcard hcard h-card" itemprop="author" itemscope="" itemtype="http://schema.org/Person">
  <img alt="" src="http://1.gravatar.com/avatar/b36983a5651df2c413e264ad4d5cc1a1?s=40&amp;d=http%3A%2F%2F1.gravatar.com%2Favatar%2Fad516503a11cd5ca435acc9bb6523536%3Fs%3D40&amp;r=G" class="u-photo avatar avatar-40 photo" height="40" width="40"> 
  <a class="url uid u-url u-uid fn p-name" href="http://notizblog.org/author/matthias-pfefferle/" title="Alle Beiträge von Matthias Pfefferle ansehen" rel="author" itemprop="url">
    <span itemprop="name">Matthias Pfefferle</span>
  </a>
</span>

is parsed twice: http://pin13.net/mf2/?url=http://notizblog.org/2014/02/20/wir-brauchen-metadaten-fuer-telefonnummern/
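One way to avoid the double parse is to consider classic (backcompat) root classes only when the element carries no mf2 h-* root class. A minimal sketch; `usesClassicParsing()` is a hypothetical name, and the list of classic roots is a partial, illustrative subset.

```php
<?php
/**
 * Hypothetical sketch: decide whether an element should be parsed with
 * classic microformats rules. Any h-* class means mf2 parsing only.
 */
function usesClassicParsing(string $classAttr): bool
{
    $classes = preg_split('/\s+/', trim($classAttr), -1, PREG_SPLIT_NO_EMPTY);
    foreach ($classes as $class) {
        if (preg_match('/^h-[a-z0-9]+(?:-[a-z0-9]+)*$/', $class) === 1) {
            return false; // mf2 root present: ignore classic class names entirely
        }
    }
    $classicRoots = ['vcard', 'hentry', 'hfeed', 'hevent', 'hreview']; // partial list
    return count(array_intersect($classes, $classicRoots)) > 0;
}

var_dump(usesClassicParsing('author p-author vcard hcard h-card')); // bool(false)
```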

Add support for multiple h-* names in `type`

As per http://microformats.org/wiki/microformats2#h-card_org_h-card

<div class="h-card">
  <a class="p-name u-url"
     href="http://blog.lizardwrangler.com/" 
    >Mitchell Baker</a> 
  (<a class="p-org h-card h-org" 
      href="http://mozilla.org/"
     >Mozilla Foundation</a>)
</div>

Should return:

{
  "items": [{
    "type": ["h-card"],
    "properties": {
      "name": ["Mitchell Baker"],
      "url": ["http://blog.lizardwrangler.com/"],
      "org": [{
        "value": "Mozilla Foundation",
        "type": ["h-card", "h-org"],
        "properties": {
          "name": ["Mozilla Foundation"],
          "url": ["http://mozilla.org/"]
        }
      }]
    }
  }]
}
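A sketch of collecting every h-* class on a root element into `type` instead of only the first one. Non-root classes such as `p-org` are ignored, and the result is de-duplicated and sorted alphabetically. `rootTypes()` is a hypothetical name.

```php
<?php
/**
 * Hypothetical sketch: extract all h-* root class names from a class
 * attribute, unique and sorted alphabetically.
 */
function rootTypes(string $classAttr): array
{
    // A class must start the attribute or follow whitespace, and end at whitespace or the end.
    preg_match_all('/(?:^|\s)(h-[a-z0-9]+(?:-[a-z0-9]+)*)(?=\s|$)/', $classAttr, $m);
    $types = array_values(array_unique($m[1]));
    sort($types);
    return $types;
}

print_r(rootTypes('p-org h-card h-org'));
```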

Parsing <img> in e-content

In the value key of a parsed e-* property, img elements should be replaced with their alt attribute, or with their src attribute if there is no alt.

Reference:
http://microformats.org/wiki/microformats2-parsing#parsing_an_e-_property
http://indiewebcamp.com/irc/2014-07-15#t1405450308
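A sketch of building the plain-text value of an e-* property as described: each nested `img` is replaced with its `alt`, or its `src` when there is no `alt`. It works on a clone so the original DOM is left untouched. `eValue()` is a hypothetical name, and relative `src` values are not resolved here.

```php
<?php
/**
 * Hypothetical sketch: text value of an e-* element with <img> elements
 * replaced by their alt (or src) text.
 */
function eValue(DOMElement $el): string
{
    $copy = $el->cloneNode(true);

    // Collect the images first so that replacing them cannot disturb the iteration.
    $images = [];
    foreach ($copy->getElementsByTagName('img') as $img) {
        $images[] = $img;
    }

    foreach ($images as $img) {
        if ($img->hasAttribute('alt')) {
            $text = $img->getAttribute('alt');
        } elseif ($img->hasAttribute('src')) {
            $text = $img->getAttribute('src');
        } else {
            $text = '';
        }
        $img->parentNode->replaceChild($copy->ownerDocument->createTextNode($text), $img);
    }

    return trim($copy->textContent);
}

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML('<div class="e-content">Look: <img src="/a.png" alt="a cat"> and <img src="/b.png"></div>');
$content = $doc->getElementsByTagName('div')->item(0);
echo eValue($content), "\n";
```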

Need to handle <template> tag according to the HTML parsing rules

The new template tag in HTML needs to be parsed according to the HTML parsing rules and not treated as part of the DOM for the purpose of microformat parsing.

I've created a failing test case for mf2py. There's IRC discussion logged here.
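As a crude approximation, template elements can be removed before looking for microformats. This sketch assumes the HTML parser in use left the template's contents in the main tree as ordinary children, which is what a parser without HTML5 template support does; a spec-compliant HTML5 parser keeps them out of the tree in the first place. `stripTemplates()` is a hypothetical name.

```php
<?php
/**
 * Hypothetical sketch: remove every <template> element and its contents.
 */
function stripTemplates(DOMDocument $doc): void
{
    $templates = [];
    foreach ($doc->getElementsByTagName('template') as $template) {
        $templates[] = $template; // collect first, then remove
    }
    foreach ($templates as $template) {
        $template->parentNode->removeChild($template);
    }
}

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML('<div class="h-card">Real</div><template><span class="h-card">Template</span></template>');
stripTemplates($doc);

$xpath = new DOMXPath($doc);
$cards = $xpath->query('//*[contains(concat(" ", normalize-space(@class), " "), " h-card ")]');
echo $cards->length, "\n";
```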

Add support for input[value] parsing

As per the mf2 parsing spec.
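A sketch of where input[value] could fit among the element-specific p-* steps, assuming they run in the order abbr/link title, data/input value, img/area alt, then text content. The value-class step that comes first in the spec is omitted for brevity, and `parseP()` is a hypothetical name.

```php
<?php
/**
 * Hypothetical sketch of element-specific p-* parsing, including input[value].
 */
function parseP(DOMElement $el): string
{
    $tag = $el->tagName;
    if (in_array($tag, ['abbr', 'link'], true) && $el->hasAttribute('title')) {
        return $el->getAttribute('title');
    }
    if (in_array($tag, ['data', 'input'], true) && $el->hasAttribute('value')) {
        return $el->getAttribute('value');
    }
    if (in_array($tag, ['img', 'area'], true) && $el->hasAttribute('alt')) {
        return $el->getAttribute('alt');
    }
    return trim($el->textContent);
}

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML('<form class="h-card"><input class="p-name" value="Jane Doe"><span class="p-note">Hello</span></form>');
$input = $doc->getElementsByTagName('input')->item(0);
echo parseP($input), "\n";
```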

Multiple properties for a nested object not parsed correctly

Simple HTML example:

<article class="h-entry"><div class="p-like-of p-in-reply-to h-cite"></div></article>

Should result in the h-cite object appearing in both a "like-of" and "in-reply-to" property:

{
    "items": [
        {
            "type": [
                "h-entry"
            ],
            "properties": {
                "like-of": [
                    {
                        "type": [
                            "h-cite"
                        ],
                        "properties": {
                            "name": [
                                ""
                            ]
                        },
                        "value": ""
                    }
                ],
                "in-reply-to": [
                    {
                        "type": [
                            "h-cite"
                        ],
                        "properties": {
                            "name": [
                                ""
                            ]
                        },
                        "value": ""
                    }
                ],
                "name": [
                    ""
                ]
            }
        }
    ],
    "rels": [

    ]
}
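One way to get that result is to read every property class on the nested root and file the parsed object under each property name, falling back to `children` only when there are none. This sketch assumes the nested object has already been parsed; `placeNested()` is a hypothetical helper, not php-mf2's API.

```php
<?php
/**
 * Hypothetical sketch: attach an already-parsed nested microformat to its
 * parent under every p-*, u-*, dt-* or e-* property name on its root
 * element, or under "children" if it has no property classes.
 */
function placeNested(array $parent, string $classAttr, array $nested): array
{
    preg_match_all('/(?:^|\s)(?:p|u|dt|e)-([a-z0-9]+(?:-[a-z0-9]+)*)(?=\s|$)/', $classAttr, $m);
    if (empty($m[1])) {
        $parent['children'][] = $nested;
        return $parent;
    }
    foreach (array_unique($m[1]) as $propertyName) {
        $parent['properties'][$propertyName][] = $nested;
    }
    return $parent;
}

$entry = ['type' => ['h-entry'], 'properties' => []];
$cite = ['type' => ['h-cite'], 'properties' => ['name' => ['']], 'value' => ''];
$entry = placeNested($entry, 'p-like-of p-in-reply-to h-cite', $cite);
print_r(array_keys($entry['properties']));
```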

Parsing Link `rel` Values