AMPBench: AMP URL validation and troubleshooting tools (DEPRECATED)

License: Apache License 2.0

JavaScript 64.87% CSS 0.07% HTML 21.97% TypeScript 12.86% Shell 0.23%

AMPBench: AMP URL Validation and Troubleshooting

Status

The hosted version of ampbench was shut down in February 2020. For the rationale and alternatives, see issue #126.

There are no plans to remove the source code from its canonical location at https://github.com/ampproject/ampbench, but if it is important to you, we recommend making a copy.

Guides

Walkthrough article: Debug AMP pages with AMPBench, an open source app from the AMP Project.

What does it do?

AMPBench is a web application and service that validates AMP URLs + their associated Structured Data.

During AMP URL validation, it builds referable, support-friendly sharable URLs such as the following:

AMPBench in action

License

AMPBench is licensed under the Apache 2.0 LICENSE.

Disclaimer

AMPBench is not an official Google product.

Getting the code and running it

Install the Node.js Active LTS version on your system, for example by downloading it, by using a package manager, or by using NVM.

Now do the following from a terminal command-line session:

$ git clone https://github.com/ampproject/ampbench.git
$ cd ampbench
$ npm install
$ node ampbench_main.js

Also try navigating to these links from your web browser:

Even try this from the command-line:

$ curl http://localhost:8080/version/
$ curl http://localhost:8080/raw?url=https://ampbyexample.com/
$ curl http://localhost:8080/api?url=https://ampbyexample.com/
$ curl http://localhost:8080/api1?url=https://ampbyexample.com/
$ curl http://localhost:8080/api2?url=https://ampbyexample.com/
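The same requests can be built from Node.js. The sketch below assumes a local instance on port 8080, as in the curl examples; `buildAmpbenchUrl` is a hypothetical helper and not part of AMPBench. It percent-encodes the target URL before placing it in the `url` query parameter.

```javascript
// Hypothetical helper (not part of AMPBench): builds a request URL for a
// locally running instance. The base URL and the route names are taken from
// the curl examples in this README.
function buildAmpbenchUrl(route, targetUrl, base = 'http://localhost:8080') {
  const endpoint = new URL(route, base);
  // searchParams.set percent-encodes the target URL.
  endpoint.searchParams.set('url', targetUrl);
  return endpoint.href;
}

console.log(buildAmpbenchUrl('/api2', 'https://ampbyexample.com/'));
```

The resulting string can be passed to curl or to any HTTP client.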

Utilities

AMPBench includes some useful debug utility commands that can in some cases help with troubleshooting, such as when a full validation fails on a URL by returning unexpected server responses.

The /debug... commands attempt to follow fetch requests and display relevant request and response details in a similar spirit to the curl -I [--head]... utility.

Use these as follows in the browser:

and:

Or, with the command-line compatible _cli equivalents, in a terminal session:

$ curl https://ampbench.appspot.com/debug_cli?url=https://ampbyexample.com

and:

$ curl https://ampbench.appspot.com/debug_curl_cli?url=https://ampbyexample.com

The /debug and /debug_cli versions use a smartphone HTTP User Agent. The /debug_curl and /debug_curl_cli variants use the curl (desktop and server-side) User Agent.

The applied User Agent is reported in the output and can be seen in the resulting HTTP request headers as in the following examples.

For /debug...:

{"User-Agent":"Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2725.0 Mobile Safari/537.36","host":"ampbyexample.com"}

and for /debug_curl...:

{"User-Agent":"curl/7.43.0","host":"ampbyexample.com"}
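The route-to-User-Agent mapping described above could be sketched as follows. The two User Agent strings are copied from the example output; `userAgentForRoute` is a hypothetical name and may not match how AMPBench selects the header internally.

```javascript
// User Agent strings copied from the example output in this README.
const UA_MOBILE = 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) ' +
  'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2725.0 Mobile Safari/537.36';
const UA_CURL = 'curl/7.43.0';

// Hypothetical helper: /debug_curl and /debug_curl_cli get the curl User
// Agent, every other /debug... route gets the smartphone one.
function userAgentForRoute(route) {
  return route.startsWith('/debug_curl') ? UA_CURL : UA_MOBILE;
}

console.log(userAgentForRoute('/debug_curl_cli'));
console.log(userAgentForRoute('/debug_cli'));
```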

Deploying AMPBench to the Cloud

Deploying AMPBench to Google Compute Engine

To deploy AMPBench to the App Engine flexible environment, you need to have a Google Cloud Platform Console project.

Please review the following documentation:

From within the ampbench source root folder, deployment to Google Compute Engine (App Engine flexible environment) should be similar to the following sequence.

Run gcloud init:

$ gcloud init

Deploy and run:

$ gcloud app deploy 

Deploying AMPBench to Amazon Web Services (AWS)

AWS Elastic Beanstalk uses highly reliable and scalable services that are available in the AWS Free Usage Tier and supports apps developed in Node.js, such as AMPBench, out-of-the-box.

Please review the following documentation:

Make sure to set up AWS with your account credentials:

The Elastic Beanstalk Command Line Interface (EB CLI) is configured as follows:

From within the ampbench source root folder, deployment to an AWS Elastic Beanstalk environment should be similar to the following:

$ eb init # only initially or when the configuration changes
$ eb deploy

Configuring AMPBench via the environment variables

AMPBench reads the port to listen on from the PORT environment variable, for example:

    PORT=8080 npm start

AMPBench also supports Google Analytics tracking using gtag configuration. To enable it, define the GTAG_ID environment variable, for example:

    GTAG_ID=UA-123456789-1 npm start
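As an illustration only, reading these two variables in Node.js might look like the sketch below. `readConfig` is a hypothetical function, and the fallback port 8080 is an assumption based on the localhost examples earlier in this README, not a documented default.

```javascript
// Hypothetical sketch: reads PORT and GTAG_ID from an environment object.
// The fallback port 8080 is an assumption, not a documented default.
function readConfig(env = process.env) {
  const port = Number.parseInt(env.PORT, 10);
  return {
    port: Number.isInteger(port) && port > 0 ? port : 8080,
    gtagId: env.GTAG_ID || null, // analytics stays disabled when unset
  };
}

console.log(readConfig({ PORT: '3000', GTAG_ID: 'UA-123456789-1' }));
```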

ampbench's People

Contributors

alannawalton, andreban, chenshay, dakkad, erwinmombay, gianmarcobrunialtimrf, ithinkihaveacat, jeffjose, juanchaur, mrjoro, nygellyndley, philkrie, pietergreyling, powdercloud

ampbench's Issues

Integrate cache validation

We see it pretty frequently that hosts accidentally "cloak" (based on UA) content when delivering to the AMP Cache.

It would be very useful to also check whether content passes validation on the cache for that reason.

Publisher Logo and Article Image are reported as "FAIL" while it passed SDTT

AMPBench fails to validate structured data for ImageGallery

Looks like the ampbench is failing to validate valid ld+json structured data for ImageGalleries. The goal is for us to get image gallery news content to show up in the carousel, but it seems as if we have errors that prevent this from happening.

The structured data for our AMP image galleries contains an array with two objects, one for NewsArticle and one for ImageGallery. According to the structured data testing tool results this is perfectly valid. But according to the ampbench results it is not.

I couldn't tell from the source code whether structured data with arrays are supported.

How come AMPBench validation fails for us? Is the AMPBench validation the same one Google uses for the Top News carousel?

Thanks!

AMPBench is using the old Google Cache URL format. It should use the new format

Canonical AMP page with amphtml and canonical links triggering infinite redirects

Triggering validation for a canonical AMP page whose canonical and amphtml links both point to itself makes AMPBench loop infinitely.

The issue is being triggered by the test in this part of the code https://github.com/ampproject/ampbench/blob/master/ampbench_handlers.js#L218-L219

It tests whether the canonical link is the same as the validation URL and whether the amphtml attribute exists, and then triggers a redirect to the URL in the amphtml link.

In this case, since the amphtml link is the same as the url being validated, we end up in an infinite redirect.
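One way to break the loop would be to skip the redirect when the amphtml link points at the URL already being validated. The sketch below is an assumption about how such a guard could look, not the code in ampbench_handlers.js; `shouldRedirectToAmphtml` is a hypothetical name.

```javascript
// Hypothetical guard (not the actual ampbench_handlers.js logic): redirect
// to the amphtml link only when it differs from the URL being validated.
function shouldRedirectToAmphtml(validationUrl, canonicalUrl, amphtmlUrl) {
  if (!amphtmlUrl) return false;
  // Compare parsed URLs so that trivial differences, such as a missing
  // trailing slash on the origin, do not matter.
  const same = (a, b) => new URL(a).href === new URL(b).href;
  return same(canonicalUrl, validationUrl) && !same(amphtmlUrl, validationUrl);
}

// Canonical AMP page: every link points to the page itself, so no redirect.
console.log(shouldRedirectToAmphtml(
  'https://example.com/post', 'https://example.com/post', 'https://example.com/post'));
```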

amp-compare-slider implementation

These are the things that will be implemented for right now

  • Horizontal slider touch interaction
  • Disappearing slider education message
  • Divider styling options
  • Tap to snap on Divider

Work with amphtml-validator Node.js package to make it a drop-in compatible validator backend for AMPBench

Validator returns fail in mobile UA if page has a redirection on mobile device

Hi,

If a URL redirects from desktop to mobile, the ampbench validator returns the error "AMP link in Canonical page does not refer to the current AMP page". On desktop everything is OK.

Mobile :
https://ampbench.appspot.com/validate?url=https%3A%2F%2Fm.mynet.com%2Fsampiyonlar-ligi-yari-final-eslesmeleri-belli-oldu-spor-155570 (we redirect desktop request to mobile page if user agent detected as mobile)

Desktop :
https://ampbench.appspot.com/validate_ua_desktop?url=https%3A%2F%2Fm.mynet.com%2Fsampiyonlar-ligi-yari-final-eslesmeleri-belli-oldu-spor-155570 (no canonical refer error)

Mobile with no redirection:
https://ampbench.appspot.com/validate?url=https%3A%2F%2Fm.mynet.com%2Fdoganin-agir-sikletleri-trend-1115366 (again no canonical refer error)

crash with url returning status Code != 200

When the "/api" route is used with a URL that returns statusCode != 200, the server crashes with the following error message:

[AMPBench:v.1.0][2017-02-04T13:01:20.863Z] [validator-signature:96c9247f6c997613302517643841f1f96e17bcf14ca3d51ebffae43b6274e648][HTTP:401] /api https://passwordprotected.com/amp.htm
_http_outgoing.js:356
    throw new Error('Can\'t set headers after they are sent.');
    ^

Error: Can't set headers after they are sent.
    at ServerResponse.OutgoingMessage.setHeader (_http_outgoing.js:356:11)
    at ServerResponse.header (/home/gpaes/git/ampbench/node_modules/express/lib/response.js:719:10)
    at ServerResponse.send (/home/gpaes/git/ampbench/node_modules/express/lib/response.js:164:12)
    at ServerResponse.json (/home/gpaes/git/ampbench/node_modules/express/lib/response.js:250:15)
    at on_output (/home/gpaes/git/ampbench/ampbench_routes.js:442:29)
    at IncomingMessage.res.on (/home/gpaes/git/ampbench/ampbench_lib.js:926:21)
    at emitNone (events.js:91:20)
    at IncomingMessage.emit (events.js:185:7)
    at endReadableNT (_stream_readable.js:974:12)
    at _combinedTickCallback (internal/process/next_tick.js:74:11)

To fix it, on_output_callback must not be called twice in ampbench_lib.js (lines 911 and 926). A return or an else would be enough.

if (!http_response.statusIsOK()) { // NOT (200 == this.http_response_code)
    http_response.http_response_body = '';
    return on_output_callback(http_response, [CHECK_FAIL]); // !!! RETURN to front-end  - - - - - - - - - - - -
}
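A more defensive alternative, shown here only as a sketch, is to wrap the callback so that it can run at most once. `once` and `onOutput` are hypothetical names and do not exist in ampbench_lib.js.

```javascript
// Hypothetical wrapper: the returned function calls `callback` on its first
// invocation only, so a second call cannot try to send the response again.
function once(callback) {
  let called = false;
  return function (...args) {
    if (called) return undefined;
    called = true;
    return callback.apply(this, args);
  };
}

// Usage: the counter stands in for "send the HTTP response".
let sent = 0;
const onOutput = once(() => { sent += 1; });
onOutput();
onOutput(); // ignored
console.log(sent);
```

The early return in the snippet above fixes the known path; a wrapper like this would also cover double calls that have not been found yet.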

Fix proposal - encodeUri used twice in ampbench_routes

The bad URL check is performed twice.

Globally through app.use(), and locally, through the assert_url() private function called within each and every validation endpoint. This is causing the queried URL to be encoded twice, thus resulting in 404 or 500 errors.

As assert_url seems to be always used in a "rule-of-thumb" fashion and contains more in-depth checks for the URL's to be validated, I'd propose to remove the global one.

I've opened a PR for it.
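The effect of encoding twice is easy to reproduce with the standard `encodeURI` function. The URL below is a made-up example:

```javascript
// Made-up example URL containing a space.
const raw = 'https://example.com/some page';

const encodedOnce = encodeURI(raw);
// The second pass encodes the "%" of "%20" again, producing "%2520".
const encodedTwice = encodeURI(encodedOnce);

console.log(encodedOnce);  // https://example.com/some%20page
console.log(encodedTwice); // https://example.com/some%2520page
```

The doubly encoded URL names a different resource, which would explain the 404 and 500 responses.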

Server Error (502 Bad Gateway) Analysing Blogger Content

Actual:

When validating a non AMP page https://www.damienallison.com/ the AMP bench tool returns a 502 bad gateway.

Steps to reproduce:

  1. Go to ampbench.appspot.com
  2. Enter https://www.damienallison.com/ into the validate URL.
  3. https://ampbench.appspot.com/validate?url=https%3A%2F%2Fwww.damienallison.com%2F shows 502 Bad Gateway.

Expected:

Expected the analysis to show that the page is not an AMP page.

Debug output:

https://ampbench.appspot.com/debug?url=https%3A%2F%2Fwww.damienallison.com%2F

==> GET: https://www.damienallison.com/

{"User-Agent": UA_MOBILE_ANDROID_CHROME_52}


==> REQUEST: https://www.damienallison.com/

{"User-Agent":"Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2725.0 Mobile Safari/537.36","host":"www.damienallison.com"}


==> REDIRECT: 302 https://www.damienallison.com/?m=1

{"location":"https://www.damienallison.com/?m=1","content-type":"text/html; charset=UTF-8","date":"Tue, 21 Aug 2018 11:52:32 GMT","expires":"Tue, 21 Aug 2018 11:52:32 GMT","cache-control":"private, max-age=0","x-content-type-options":"nosniff","x-frame-options":"SAMEORIGIN","x-xss-protection":"1; mode=block","server":"GSE","accept-ranges":"none","vary":"Accept-Encoding","connection":"close"}


==> REQUEST: https://www.damienallison.com/?m=1

{"User-Agent":"Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2725.0 Mobile Safari/537.36","referer":"https://www.damienallison.com/","host":"www.damienallison.com"}


==> RESPONSE: 200

{"content-type":"text/html; charset=UTF-8","expires":"Tue, 21 Aug 2018 11:52:32 GMT","date":"Tue, 21 Aug 2018 11:52:32 GMT","cache-control":"private, max-age=0","last-modified":"Thu, 16 Aug 2018 10:36:11 GMT","x-content-type-options":"nosniff","x-xss-protection":"1; mode=block","server":"GSE","accept-ranges":"none","vary":"Accept-Encoding","connection":"close"}


Checker for multiple amphtml links in the canonical

I've seen a fairly common mistake on some websites: they have multiple <link rel="amphtml"> tags in their canonical page.

One of the main reasons I've seen is that those websites are WordPress-based and have two or more plugins that add the markup.

Here is an example

I think it makes sense for ampbench to warn about this in the canonical check, and I'd like to send a PR.
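A rough sketch of what such a check could look like is below. `countAmphtmlLinks` is a hypothetical name, and the regular expressions are a shortcut for illustration; a real checker would more likely use an HTML parser.

```javascript
// Hypothetical check: counts <link> tags whose rel attribute is "amphtml".
// Regular expressions are a simplification that is good enough for a sketch.
function countAmphtmlLinks(html) {
  const linkTags = html.match(/<link\b[^>]*>/gi) || [];
  return linkTags.filter((tag) => /\brel\s*=\s*["']?amphtml\b/i.test(tag)).length;
}

const page = `
  <link rel="canonical" href="https://example.com/post">
  <link rel="amphtml" href="https://example.com/post/amp">
  <link rel="amphtml" href="https://example.com/post?amp=1">
`;
const count = countAmphtmlLinks(page);
if (count > 1) {
  console.log(`WARNING: found ${count} amphtml links in the canonical page`);
}
```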

Can't send headers after they are sent

_http_outgoing.js:344
throw new Error('Can\'t set headers after they are sent.');
^

Error: Can't set headers after they are sent.
at ServerResponse.OutgoingMessage.setHeader (_http_outgoing.js:344:11)
at ServerResponse.header (D:\Projects\Main\ampbench\node_modules\express\lib\response.js:719:10)
at ServerResponse.send (D:\Projects\Main\ampbench\node_modules\express\lib\response.js:164:12)
at ServerResponse.json (D:\Projects\Main\ampbench\node_modules\express\lib\response.js:250:15)
at on_output (D:\Projects\Main\ampbench\ampbench_routes.js:418:29)
at IncomingMessage.<anonymous> (D:\Projects\Main\ampbench\ampbench_lib.js:915:21)
at emitNone (events.js:72:20)
at IncomingMessage.emit (events.js:166:7)
at endReadableNT (_stream_readable.js:913:12)
at nextTickCallbackWith2Args (node.js:442:9)

npm ERR! Windows_NT 6.1.7601
npm ERR! argv "C:\Program Files\nodejs\node.exe" "C:\Program Files\nodejs\node_modules\npm\bin\npm-cli.js" "start"
npm ERR! node v4.4.4
npm ERR! npm v2.15.1
npm ERR! code ELIFECYCLE
npm ERR! [email protected] start: node ampbench_main.js
npm ERR! Exit status 1
npm ERR!
npm ERR! Failed at the [email protected] start script 'node ampbench_main.js'.
npm ERR! This is most likely a problem with the ampbench package,
npm ERR! not with npm itself.
npm ERR! Tell the author that this fails on your system:
npm ERR! node ampbench_main.js
npm ERR! You can get information on how to open an issue for this project with:
npm ERR! npm bugs ampbench
npm ERR! Or if that isn't available, you can get their info via:
npm ERR!
npm ERR! npm owner ls ampbench
npm ERR! There is likely additional logging output above.

npm ERR! Please include the following file with any support request:
npm ERR! D:\Projects\Main\ampbench\npm-debug.log

This happens during an API call.

Update deploy process to use progressive rollout

Refactor dynamic includes of AMP validate to use a fixed version. Also change the roll-out process for the pinned version to support migrating traffic to new versions to test rolling forward.

The process should be repeated regularly to include ongoing changes to dependencies like the AMP validate dependencies rather than run-time updates which could result in unintended issues.

Schema Markup using URLs for article images fail validation.

On the Article specification, the documentation says that the image field can be "Repeated field of ImageObject or URL".

AMPBench validation for the field currently fails if a URL is used, or if a sequence of objects is used. The validation seems to only accept ImageObjects (as per this code section).

The example below, extracted from the docs, fails:

{
  "@context": "http://schema.org",
  "@type": "NewsArticle",
  "image": [
    "https://example.com/photos/1x1/photo.jpg",
    "https://example.com/photos/4x3/photo.jpg",
    "https://example.com/photos/16x9/photo.jpg"
  ]
}
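A validator that follows the documented wording ("Repeated field of ImageObject or URL") could look roughly like this. Both function names are hypothetical and this is not the AMPBench implementation; requiring a `url` property on ImageObject is an assumption of the sketch.

```javascript
// Hypothetical helper: true for strings that parse as http(s) URLs.
function isHttpUrl(value) {
  if (typeof value !== 'string') return false;
  try {
    const { protocol } = new URL(value);
    return protocol === 'http:' || protocol === 'https:';
  } catch (err) {
    return false;
  }
}

// Hypothetical validator: accepts a URL string, an ImageObject carrying a
// URL, or a non-empty array of either.
function isValidImageField(image) {
  if (Array.isArray(image)) {
    return image.length > 0 && image.every(isValidImageField);
  }
  if (isHttpUrl(image)) return true;
  return image !== null && typeof image === 'object' &&
    image['@type'] === 'ImageObject' && isHttpUrl(image.url);
}

console.log(isValidImageField([
  'https://example.com/photos/1x1/photo.jpg',
  'https://example.com/photos/4x3/photo.jpg',
]));
```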

Cleanup AMPBench UI

Here are a few suggestions (collecting these into a single issue to make it easier to discuss):

  • clean up homepage (remove intro, only show the URL input field in the center)
  • don't show AMP Story section if it's not an amp story
  • don't embed validator.ampproject.org (duplicates the normal validation results). Add a link instead.
  • don't show server response time as it's confusing to users (see latest discussion on slack)
  • remove Structured Data Tester and link to the tool instead.
  • rename "Indexed in Google AMP Cache" to "Google AMP Cache". Indexing gives the wrong impression.
  • cleanup report messages (e.g. remove brackets as they're hard to read).
  • Don't show a warning for canonical amp pages: WARNING | [Canonical URL is reachable][WARNING: AMP link not found in the Canonical page]

amp-mustache src was updated

I am getting this error message:

: line 26, col 4: The attribute 'src' in tag 'amp-mustache extension .js script' is set to the invalid value 'https://cdn.ampproject.org/v0/amp-mustache-0.2.js'. (see https://www.ampproject.org/docs/reference/components/amp-mustache)

But looking at https://www.ampproject.org/docs/reference/components/amp-mustache:

Required Script | <script async custom-template="amp-mustache" src="https://cdn.ampproject.org/v0/amp-mustache-0.2.js"></script>

Add Custom Error Handler Pages

In order to track common errors like 404, 500 etc add a custom error handler to the project.

The custom error handler should offer useful feedback to the customer and can also be tracked in Analytics.
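Since the stack traces in other issues show that AMPBench runs on Express, the handlers might be shaped like the sketch below. The function names and the message wording are hypothetical; the analytics hook is only indicated by a comment.

```javascript
// Hypothetical 404 handler: runs when no route matched the request.
function notFoundHandler(req, res) {
  res.status(404).send(`Not found: ${req.originalUrl}`);
}

// Hypothetical 500 handler. Express treats middleware that declares four
// arguments as an error handler, so `next` must stay in the signature.
function errorHandler(err, req, res, next) {
  console.error(err); // a place to log the error or report it to analytics
  res.status(500).send('Something went wrong while processing the request.');
}

// Registration would come after all routes, for example:
//   app.use(notFoundHandler);
//   app.use(errorHandler);
```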

Check for amphtml link showing a PASS when it doesn't exist

Desktop page:
https://www.alibaba.com/photo-detail/Antiqued-Copper-Plated-Lobster-Claw-Swivel_60532600944.html

  • Has canonical link pointing to itself, and amphtml link pointing to the m. page
  • 302 redirects to m. page on a mobile user agent

Mobile page (that has been AMP'ed):
https://m.alibaba.com/photo-detail/60532600944/Antiqued-Copper-Plated-Lobster-Claw-Swivel.html

  • Has canonical link pointing to www. page, and does not have amphtml link

Running AMPBench on the desktop page with a mobile user agent shows the check for the AMP URL PASSing with a link to the www. page. Why is that? The mobile page that it redirects to does not have an amphtml URL, so it looks like there's something wonky here.
http://ampbench.appspot.com/validate?url=https://www.alibaba.com/photo-detail/Antiqued-Copper-Plated-Lobster-Claw-Swivel_60532600944.html

URL's with UTF-8 characters fail

I get the error message "[ERROR: INVALID URL] Please check the formatting of the requested URL" from ampbench when a URL contains a UTF-8 character.

For example: [this one](https://ebela.in/amp/joyi-aka-debadrita-basu-belongs-to-a-family-deeply-connected-to-theatre-dgtl- -1.733642) has a white space, and this one contains greek characters

I could encode the URI before calling Ampbench, but then some checks fail, like the canonical link.

Could ampbench deal with unencoded URLs that contain UTF-8 characters? If it's feasible, I could take a look at it and submit a PR.
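One possible approach, sketched under the assumption that normalizing the input once at the entry point is acceptable, is the WHATWG `URL` class built into Node.js. It percent-encodes spaces and non-ASCII characters in the path and leaves existing percent-escapes untouched. `normalizeUrl` is a hypothetical name and the URL is a made-up example.

```javascript
// Hypothetical helper: parses the input with the WHATWG URL parser, which
// percent-encodes spaces and non-ASCII path characters. Input that is
// already encoded comes back unchanged, so nothing is encoded twice.
function normalizeUrl(input) {
  return new URL(input.trim()).href;
}

const normalized = normalizeUrl('https://example.com/amp/αβγ page');
console.log(normalized);
console.log(normalizeUrl(normalized) === normalized);
```

Applying the same normalization to the canonical link before comparing might also avoid the mismatch mentioned above, though that has not been verified against AMPBench.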

AMP Bench warns about missing ETag and informs about presence of ETag

https://ampbench.appspot.com/validate?url=http%3A%2F%2Fwww.ampproject.org%2F

[WARNING] Site does not support either "If-Modified-Since" or "ETag" headers: these make amp serving more efficient
[WARNING] Header entry for If-Modified-Since not found
[PASS] Found header entry for ETag"eac69a0d2ab2423341565dd518ec23d4"

Note that in the same report we first warn that ETag headers are not set, and then state that an ETag header was found. The site does appear to support ETags.
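A consistent report would raise the combined warning only when neither signal is present. The sketch below is hypothetical and not the AMPBench code; it assumes lower-case header names, as Node.js provides them, and it treats the Last-Modified response header as the sign that If-Modified-Since is supported.

```javascript
// Hypothetical report builder. Using Last-Modified as the indicator for
// If-Modified-Since support is an assumption of this sketch.
function cachingHeaderReport(headers) {
  const etag = headers['etag'];
  const lastModified = headers['last-modified'];
  const report = [];
  // Combined warning only when both signals are missing.
  if (!etag && !lastModified) {
    report.push('[WARNING] Site does not appear to support conditional requests');
  }
  report.push(etag
    ? `[PASS] Found header entry for ETag ${etag}`
    : '[WARNING] Header entry for ETag not found');
  report.push(lastModified
    ? `[PASS] Found header entry for Last-Modified ${lastModified}`
    : '[WARNING] Header entry for Last-Modified not found');
  return report;
}

console.log(cachingHeaderReport({ etag: '"eac69a0d2ab2423341565dd518ec23d4"' }));
```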

Error while starting AMP NPM package

[email protected] start C:\Users\amanpreet.oberoi\ampbench
node ampbench_main.js

module.js:442
throw err;
^

Error: Cannot find module './amp-story/linter'
at Function.Module._resolveFilename (module.js:440:15)
at Function.Module._load (module.js:388:25)
at Module.require (module.js:468:17)
at require (internal/module.js:20:19)
at Object.<anonymous> (C:\Users\amanpreet.oberoi\ampbench\ampbench_handlers.js:23:16)
at Module._compile (module.js:541:32)
at Object.Module._extensions..js (module.js:550:10)
at Module.load (module.js:458:32)
at tryModuleLoad (module.js:417:12)
at Function.Module._load (module.js:409:3)
npm ERR! code ELIFECYCLE
npm ERR! errno 1
npm ERR! [email protected] start: node ampbench_main.js
npm ERR! Exit status 1
npm ERR!
npm ERR! Failed at the [email protected] start script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.

npm ERR! A complete log of this run can be found in:
npm ERR! C:\Users\amanpreet.oberoi\AppData\Roaming\npm-cache_logs\2018-10-16T09_46_17_221Z-debug.log

C:\Users\amanpreet.oberoi\ampbench>

When an AMP or a canonical is being served with a noindex directive then flag it as a warning

Pages (AMPs and canonicals) can include directives such as the following:
<meta content="noindex" data-app="true" name="robots" property="robots"/>

These do NOT make them invalid but will lead to problems in AMP consumption systems such as search engines.

AMPBench should detect this and raise it as a warning ("Using noindex means that your AMPs will likely fail to be consumed by search engines").
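Detection could start from something as simple as the sketch below. `hasNoindexDirective` is a hypothetical name, the regular expressions are a simplification, and the X-Robots-Tag response header, which can also carry noindex, is not covered.

```javascript
// Hypothetical check: true when a <meta name="robots"> tag lists noindex in
// its content attribute. A real implementation would use an HTML parser.
function hasNoindexDirective(html) {
  const metaTags = html.match(/<meta\b[^>]*>/gi) || [];
  return metaTags.some((tag) =>
    /\bname\s*=\s*["']?robots\b/i.test(tag) &&
    /\bcontent\s*=\s*["'][^"']*\bnoindex\b/i.test(tag));
}

const sample = '<meta content="noindex" data-app="true" name="robots" property="robots"/>';
if (hasNoindexDirective(sample)) {
  console.log('WARNING: Using noindex means that your AMPs will likely fail to be consumed by search engines');
}
```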

Indexed in amp cache test pass status

When reviewing multiple versions of the same page delivered over http and https the "Indexed in Google AMP Cache?" test seems confusing.

Actual result:

Expected result:

  • Unreachable url would not be in the cache.
  • Reachable pages would either 404 as they had not been indexed or 200 if they had (this seems to be the case at the moment).

N.B. This may be due to the semantics of the AMP cache rather than amp-bench => won't fix.

Indexed in Google checker fails when the AMP page is over HTTP

When the AMP page is over HTTP, like this one, "Indexed in Google" checker fails because it also tries to find the URL in the AMP cache using HTTP, which is incorrect.

Since URLs in AMP Cache are always HTTPs, this checker should check the following URL
https://amp.ewn.co.za/2018/05/31/mokholo-report-on-r2-4bn-sa-express-fuel-deal-will-determine-if-case-is-opened

instead of this one

http://amp.ewn.co.za/2018/05/31/mokholo-report-on-r2-4bn-sa-express-fuel-deal-will-determine-if-case-is-opened
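A minimal sketch of the scheme switch, assuming the checker has the AMP page URL as a string; `toHttps` is a hypothetical helper and the URL in the usage line is a made-up example:

```javascript
// Hypothetical helper: upgrades an http URL to https before the cache
// lookup and leaves every other URL as it is.
function toHttps(url) {
  const parsed = new URL(url);
  if (parsed.protocol === 'http:') {
    parsed.protocol = 'https:';
  }
  return parsed.href;
}

console.log(toHttps('http://amp.example.com/2018/05/31/some-article'));
```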
