iaincollins / structured-data-testing-tool

A library and command line tool to help inspect and test for Structured Data.

Home Page: https://www.npmjs.com/package/structured-data-testing-tool

License: ISC License

JavaScript 93.25% HTML 6.75%

structured-data json-ld schema-org

structured-data-testing-tool's Introduction

Structured Data Testing Tool

Inspect and test web pages for Structured Data.

Includes both a Command Line Interface for easy ad-hoc testing of URLs and a library with an extendable API for use when writing tests or building other tools.

Install

To install the command line tool (sdtt), include the -g (global) flag when installing:

npm i structured-data-testing-tool -g

Features

Command Line Interface (sdtt) and API that can be used with any test framework.
Accepts a URL, file, string, buffer or stream containing HTML or JSON.
Automatically detects all Schema.org schemas, in HTML (microdata), JSON-LD and RDFa.
Can test <meta> tags (and custom schemas) for specific tags / fields / values.
Built-in presets for testing for Twitter, Facebook and Google structured data.
Supports creation of custom presets to test any schema or tests specific to your site.
Use with a headless browser to test Structured Data injected by client side JavaScript (e.g. Google Tag Manager).

Usage

Command Line Interface

Usage: sdtt --url <url> [--presets <presets>] [--schemas <schemas>]

Options:
  -u, --url      Inspect a URL
  -f, --file     Inspect a file
  -p, --presets  Test for specific markup from a list of presets
  -s, --schemas  Test for a specific schema from a list of schemas
  -i, --info     Show more detailed information about structured data found
  -o, --output   Output test results to a file
  -h, --help     Show help
  -v, --version  Show version number

Usage: sdtt --url <url> [--presets <presets>] [--schemas <schemas>]

Examples:
  sdtt --url "https://example.com/article"                Inspect a URL
  sdtt --url <url> --presets SocialMedia                  Test a URL for social media metatags
  sdtt --url <url> --presets Google                       Test a URL for markup inspected by Google
  sdtt --url <url> --presets "Twitter,Facebook"           Test a URL with multiple presets
  sdtt --url <url> -p Twitter -p Facebook                 Test a URL with multiple presets (alternative)
  sdtt --url <url> --schemas Article                      Test a URL for the Article schema
  sdtt --url <url> --schemas "jsonld:Article"             Test a URL for the Article schema in JSON-LD
  sdtt --url <url> --schemas "microdata:Article"          Test a URL for the Article schema in microdata/HTML
  sdtt --url <url> --schemas "rdfa:Article"               Test a URL for the Article schema in RDFa
  sdtt --url <url> --schemas "Article,WPHeader,WPFooter"  Test a URL for multiple schemas
  sdtt --url <url> -s Article -s WPHeader -s WPFooter     Test a URL for multiple schemas (alternative)
  sdtt --url <url> --output results.json                  Output test results to a JSON file
  sdtt --file <path-to-file>.html                         Test file containing HTML
  sdtt --file <path-to-file>.json                         Test file containing JSON-LD
  sdtt --presets                                          List all built-in presets
  sdtt --schemas                                          List all supported schemas

Inspect a URL to see what markup is found:

sdtt --url <url>

Inspect a file to see what markup is found:

sdtt --file <path to file>

Test a URL contains specific markup:

sdtt --url <url> --presets "Twitter,Facebook"

Test a URL contains specific schema:

sdtt --url <url> --schemas "Article"

Test a URL contains specific schema in both JSON-LD and in microdata/HTML:

sdtt --url <url> --schemas "jsonld:Article,microdata:Article"

Run sdtt --presets to list the built-in presets:

NAME                      DESCRIPTION
Google                    Check for common markup used by Google
Twitter                   Suggested metatags for Twitter
Facebook                  Suggested metatags for Facebook
SocialMedia               Suggested markup for integration with social media sites
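
If you want to run these commands from a Node.js script, the documented flags can be assembled into an argument list. This is a minimal sketch that assumes sdtt is installed globally; buildSdttArgs is a hypothetical helper and not part of the tool.

```javascript
// Hypothetical helper (not part of structured-data-testing-tool):
// builds an argument list using only the flags documented above.
function buildSdttArgs({ url, file, presets = [], schemas = [], output } = {}) {
  const args = []
  if (url) args.push('--url', url)
  if (file) args.push('--file', file)
  if (presets.length > 0) args.push('--presets', presets.join(','))
  if (schemas.length > 0) args.push('--schemas', schemas.join(','))
  if (output) args.push('--output', output)
  return args
}

const args = buildSdttArgs({
  url: 'https://example.com/article',
  presets: ['Twitter', 'Facebook'],
  schemas: ['jsonld:Article', 'microdata:Article']
})

// Show the equivalent command line
console.log(['sdtt', ...args].join(' '))
```

The resulting array can be passed to require('child_process').execFile('sdtt', args, callback), which avoids shell quoting problems with URLs.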

Example output from CLI

$ sdtt --url https://www.bbc.co.uk/news/world-us-canada-49060410 --presets Google,SocialMedia
Tests

  Schema.org > ReportageNewsArticle - 100% (1 passed, 0 failed)
    ✓  schema in jsonld
    •  @context
    •  @type
    •  url
    •  publisher.@type
    •  publisher.name
    •  publisher.publishingPrinciples
    •  publisher.logo.@type
    •  publisher.logo.url
    •  datePublished
    •  dateModified
    •  headline
    •  image.@type
    •  image.width
    •  image.height
    •  image.url
    •  thumbnailUrl
    •  author.@type
    •  author.name
    •  author.logo.@type
    •  author.logo.url
    •  author.noBylinesPolicy
    •  mainEntityOfPage
    •  video.@list[0].@type
    •  video.@list[0].name
    •  video.@list[0].description
    •  video.@list[0].duration
    •  video.@list[0].thumbnailUrl
    •  video.@list[0].uploadDate
    •  video.@list[1].@type
    •  video.@list[1].name
    •  video.@list[1].description
    •  video.@list[1].duration
    •  video.@list[1].thumbnailUrl
    •  video.@list[1].uploadDate

  Google > ReportageNewsArticle > #0 (jsonld) - 100% (12 passed, 0 failed)
    ✓  ReportageNewsArticle
    ✓  @type
    ✓  author
    ✓  datePublished
    ✓  headline
    ✓  image
    ✓  publisher.@type
    ✓  publisher.name
    ✓  publisher.logo
    ✓  publisher.logo.url
    ✓  dateModified
    ✓  mainEntityOfPage

  Facebook - 100% (8 passed, 0 failed)
    ✓  must have page title
    ✓  must have page type
    ✓  must have url
    ✓  must have image url
    ✓  must have image alt text
    ✓  should have page description
    ✓  should have account username
    ✓  should have locale

  Twitter - 100% (7 passed, 0 failed)
    ✓  must have card type
    ✓  must have title
    ✓  must have description
    ✓  must have image url
    ✓  must have image alt text
    ✓  should have account username
    ✓  should have username of content creator

Statistics

  Number of Metatags: 38
  Schemas in JSON-LD: 1
     Schemas in HTML: 0
      Schema in RDFa: 0
  Schema.org schemas: ReportageNewsArticle
       Other schemas: 0
     Test groups run: 5
  Optional tests run: 71
 Pass/Fail tests run: 28

Results

    Passed: 28 	(100%)
  Warnings: 0 	(0%)
    Failed: 0 	(0%)

  ✓ 28 tests passed with 0 warnings.

Use the option '-i' to display additional detail.

API

How to test a URL

You can integrate Structured Data Testing Tool with a CI/CD pipeline by using the API.

const { structuredDataTest } = require('structured-data-testing-tool')
const { Google, Twitter, Facebook } = require('structured-data-testing-tool/presets')

const url = 'https://www.bbc.co.uk/news/world-us-canada-49060410'
 
let result

structuredDataTest(url, { 
  // Check for compliance with Google, Twitter and Facebook recommendations
  presets: [ Google, Twitter, Facebook ],
  // Check the page includes a specific Schema (see https://schema.org/docs/full.html for a list)
  schemas: [ 'ReportageNewsArticle' ]
})
.then(res => {
  console.log('✅ All tests passed!')
  result = res
})
.catch(err => {
  if (err.type === 'VALIDATION_FAILED') {
    console.log('❌ Some tests failed.')
    result = err.res
  } else {
    console.log(err) // Handle other errors here (e.g. an error fetching a URL)
  }
})
.finally(() => {
  if (result) {
    console.log(
      `Passed: ${result.passed.length},`,
      `Failed: ${result.failed.length},`,
      `Warnings: ${result.warnings.length}`,
    )
    console.log(`Schemas found: ${result.schemas.join(',')}`)

    // Loop over validation errors
    if (result.failed.length > 0)
      console.log("⚠️  Errors:\n", result.failed.map(test => test))
  }
})
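
In a CI job you usually want the process to exit with a non-zero code when validation fails. The sketch below assumes only that the result object has the passed, failed and warnings arrays used in the example above; toExitCode and the sample data are hypothetical.

```javascript
// Hypothetical helper: map a result object to a process exit code.
// Assumes `result` has `passed`, `failed` and `warnings` arrays,
// as in the example above.
function toExitCode(result, { failOnWarnings = false } = {}) {
  if (!result) return 2 // No result at all (e.g. the URL could not be fetched)
  if (result.failed.length > 0) return 1
  if (failOnWarnings && result.warnings.length > 0) return 1
  return 0
}

// Stand-in data for illustration only (not real output from the tool)
const sampleResult = {
  passed: [{ test: 'Article' }],
  failed: [],
  warnings: [{ test: 'Article[*].author' }]
}

console.log(toExitCode(sampleResult)) // 0
console.log(toExitCode(sampleResult, { failOnWarnings: true })) // 1
```

Inside the .finally() callback you could then set process.exitCode = toExitCode(result) so the pipeline step fails when any test fails.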

How to test a local HTML file

You can also test HTML in a file by passing it as a string, a stream or a readable buffer.

const fs = require('fs')
const { structuredDataTest } = require('structured-data-testing-tool')

const html = fs.readFileSync('./example.html')
structuredDataTest(html)
.then(response => { /* … */ })
.catch(err => { /* … */ })

How to define your own tests

The built-in presets only cover some use cases and are only able to check if values are defined (not what they contain).

With the API you can use JMESPath query syntax to define your own tests to check for additional properties and specific values. You can mix and match tests with presets.

const testUrl = 'https://www.bbc.co.uk/news/world-us-canada-49060410'

const options = {
  tests: [
    // Check 'NewsArticle' schema exists in JSON-LD
    { test: 'NewsArticle', expect: true, type: 'jsonld' },

    // Check a 'NewsArticle' schema exists with 'url' property set to the value of the variable 'testUrl'
    { test: 'NewsArticle[*].url', expect: testUrl },

    // A similar check as above, but won't fail (only warn) if the test doesn't pass
    { test: 'NewsArticle[*].mainEntityOfPage', expect: testUrl, warning: true },

    // Test for a Twitter meta tag with specific value
    { test: '"twitter:domain"', expect: 'www.bbc.co.uk', type: 'metatag' }
  ]
}

structuredDataTest(testUrl, options)
.then(response => { /* … */ })
.catch(err => { /* … */ })
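
When many properties of the same schema must be present, the test objects can be generated rather than written by hand. A small sketch: requireProperties is a hypothetical helper that only produces plain test objects in the format shown above.

```javascript
// Hypothetical helper: one existence test per property of a schema.
// `expect` is left out, so the documented default (true) applies.
function requireProperties(schema, properties, extra = {}) {
  return properties.map(property => ({
    test: `${schema}[*].${property}`,
    ...extra
  }))
}

const tests = [
  { test: 'NewsArticle', type: 'jsonld' },
  ...requireProperties('NewsArticle', ['headline', 'datePublished']),
  // A missing author should only produce a warning
  ...requireProperties('NewsArticle', ['author'], { warning: true })
]

console.log(tests.map(t => t.test))
```

The resulting array can be passed as the tests option, alongside any presets you also want to run.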

How to define your own presets

A preset is a collection of tests.

There are built-in presets you can use, which you can list with the --presets option in the CLI. You can also easily define your own custom presets when using the API. The Command Line Interface only supports built-in presets.

Presets must have a name (which should ideally be unique, but does not have to be), a description and an array of test objects in tests. Both name and description can be arbitrary strings; tests should be an array of valid test objects.

You can optionally group tests by specifying a value for group and set a default schema to use for all tests in schema. These can be arbitrary strings, though it's recommended schemas reflect Schema.org schema names.

If a test explicitly defines its own group or schema, that will override the default value for the preset for that specific test (which may impact how results are grouped).

Presets can contain other presets using the presets property (an array).

Presets can have a conditional property, which contains a test object, in which case the tests in the preset will only be run if the conditional test passes.
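
The rules above can be summarised as a quick shape check before handing a preset to the API. This is a sketch based only on the description in this section; isValidPreset and both sample presets are hypothetical.

```javascript
// Hypothetical helper: 'name' and 'description' are required strings,
// plus at least one of 'tests' or 'presets' (both arrays).
function isValidPreset(preset) {
  const hasText = key => typeof preset[key] === 'string' && preset[key].length > 0
  const hasList = key => Array.isArray(preset[key])
  return hasText('name') && hasText('description') && (hasList('tests') || hasList('presets'))
}

const TwitterCard = {
  name: 'Twitter Card',
  description: 'Checks the twitter:card metatag is present',
  tests: [ { test: '"twitter:card"', type: 'metatag' } ]
}

// A preset with its own tests that also re-uses another preset.
// The conditional gates the tests in this preset on an Article being found.
const ArticlePage = {
  name: 'Article Page',
  description: 'Checks expected on every article page',
  schema: 'Article',
  conditional: { test: 'Article' },
  tests: [ { test: 'Article[*].headline' } ],
  presets: [ TwitterCard ]
}

console.log(isValidPreset(TwitterCard), isValidPreset(ArticlePage)) // true true
console.log(isValidPreset({ name: 'Incomplete' })) // false
```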

Preset Example 1

const url = 'https://www.bbc.co.uk/news/world-us-canada-49060410'

// This test shows how you can use different types of tests in one preset.
const MyCustomPreset = {
  name: 'My Custom Preset', // Required
  description: 'Test ReportageNewsArticle JSON-LD data is defined and twitter metadata was found', // Required
  tests: [ // Required (unless 'presets' is specified)
    { test: 'ReportageNewsArticle', type: 'jsonld' },
    { test: '"twitter:card"', type: 'metatag' },
    { test: '"twitter:domain"', expect: 'www.bbc.co.uk', type: 'metatag', }
  ],
  // Additional options you can use in a preset:
  // group: 'My Group Name', // A group name can be used to group tests in a preset (defaults to preset name)
  // schema: 'NewsArticle', // Specify a schema at the top level if all the tests in the preset apply to the same schema
  // presets: [] // A preset can also invoke other presets, making it easy to re-use custom tests
  // conditional: {} // Define a conditional `test`, which is evaluated to determine if the preset should run
}

const options = {
  // The 'presets' argument should be an array of preset objects
  presets: [ MyCustomPreset ],
  // If you just want to detect a schema exists, you can populate the
  // 'schemas' option with a list of schema names (as strings).
  schemas: [ 'ReportageNewsArticle' ],
  // By default, any structured data detected will automatically be tested.
  // Set 'auto' to 'false' if you want to disable this (defaults to 'true').
  // This may mean you miss some errors, but may make debugging easier.
  auto: false
}

structuredDataTest(url, options)
.then(response => { /* … */ })
.catch(err => { /* … */ })

Preset Example 2

This is the code for one of the built-in presets. It tests for the ClaimReview schema.

It shows how to write a preset that will automatically run against all instances of a given schema found.

This is useful when you have multiple instances of the same schema on a page.

NB: This example is quite simple and doesn't try and validate the contents of the properties in the schema or check for invalid properties on the schema.

const ClaimReview = {
  name: 'ClaimReview',
  description: 'A fact-checking review of claims made (or reported) in some creative work (referenced via itemReviewed).',
  // If you add 'schema' property to a preset **and** write tests that start with a selector like `ClaimReview[*]`
  // (i.e. with the schema name followed by an asterisk in the selector) then those tests will automatically
  // be run against every instance of that schema found, so you can easily find where an error is if there are
  // multiple instances of the same schema on a page.
  schema: 'ClaimReview',
  // A 'conditional' on a preset or test is just a normal test object. If it fails to pass, the tests in the
  // preset (or the individual test, if it is used on a test) will not be run.
  conditional: {
    test: 'ClaimReview'
  },
  tests: [
    // Expected by Google
    { test: `ClaimReview` },
    { test: `ClaimReview[*]."@type"`, expect: 'ClaimReview' },
    { test: `ClaimReview[*].url` },
    { test: `ClaimReview[*].reviewRating` },
    { test: `ClaimReview[*].claimReviewed` },
    // Warnings
    { test: `ClaimReview[*].author`, warning: true },
    { test: `ClaimReview[*].datePublished`, warning: true },
    { test: `ClaimReview[*].itemReviewed`, warning: true },
  ],
}

module.exports = {
  ClaimReview
}

Test options

test

Type: string
Required: true

The value for test should be a valid JMESPath query.

Examples of JMESPath queries:

Article - Test Article schema found.

Article[*].url - Test url property of any Article schema found.

Article[0].headline - Test headline property of first Article schema found.

Article[1].headline - Test headline property of second Article schema found.

Article[*].publisher.name - Test name value of publisher on any Article schema found.

Article[*].publisher."@type" - Test @type value of publisher on any Article schema found.

"twitter:image" || "twitter:image:src" - Check for a metatag named either twitter:image or twitter:image:src.

Tips:

Use double quotes to escape special characters in property names.
You can console.log() the structuredData property of the response object from structuredDataTest() to see what sort of meta tags and structured data was found to help with writing your own tests.
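
Quoting is easy to get wrong when queries are built dynamically. The sketch below applies JMESPath's rule for unquoted identifiers (letters, digits and underscores, not starting with a digit) and quotes everything else; quoteKey and buildQuery are hypothetical helpers, not part of the tool.

```javascript
// JMESPath allows unquoted identifiers made of letters, digits and
// underscores (not starting with a digit); anything else needs double quotes.
const UNQUOTED = /^[A-Za-z_][A-Za-z0-9_]*$/

// Hypothetical helper: quote a property name only when required
const quoteKey = key => (UNQUOTED.test(key) ? key : JSON.stringify(key))

// Hypothetical helper: build a query for a schema and a property path
function buildQuery(schema, path, index = '*') {
  return `${schema}[${index}].` + path.map(quoteKey).join('.')
}

console.log(buildQuery('Article', ['publisher', '@type'])) // Article[*].publisher."@type"
console.log(buildQuery('Article', ['headline'], 0))        // Article[0].headline
console.log(quoteKey('twitter:image'))                     // "twitter:image"
```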

type

Type: string ('jsonld'|'rdfa'|'microdata'|'metatag'|'any')
Required: false
Default: 'any'

You can specify a type to indicate if markup should be in jsonld, rdfa or microdata (HTML) format.

You can also specify a value of metatag to check <meta> tags.

If you do not specify a type for a test, a default of any will be assumed and all types will be checked (and if any source matches, the test will pass).

If you specifically want to test for a value and you know if it is JSON-LD, RDFa or microdata you should specify the explicit type for the test to check.

expect

Type: boolean|string|RegExp
Required: false
Default: true

You can specify a value for expect that is either a boolean, a string or a Regular Expression object (defaults to true).

A value of true indicates the property must exist (but does not check its value).
A value of false indicates the property must not exist.
A Regular Expression is tested against the value returned by the test query (the test passes if the expression matches).
Any other value is treated as a string and the value of the property should exactly match it.

When using a Regular Expression, if the query points to an array then the test will pass if any item in the array matches the Regular Expression.

Examples of how to use Regular Expressions with the expect option:

expect: /^[0-9]+$/ // Value being tested should only contain numbers
expect: /^[A-Za-z]+$/ // Value being tested should only contain letters
expect: /^[A-Za-z0-9 ]+$/ // Value should only contain letters, numbers and spaces

You can use regular expressions to validate dates, specific values, URLs, etc.
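
For example, date and URL checks might look like the sketch below. The patterns are illustrative assumptions to adapt to your own data, and matchesExpect is a hypothetical re-implementation of the behaviour described above (including the array case) for trying patterns locally; it is not the library's own code.

```javascript
// Illustrative patterns (assumptions, adjust to your own data)
const isoDate = /^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2})?)?$/
const httpsUrl = /^https:\/\/[^\s]+$/

const tests = [
  { test: 'Article[*].datePublished', expect: isoDate },
  { test: 'Article[*].url', expect: httpsUrl }
]

// Hypothetical helper: passes if the value, or any item of an array
// value, matches the pattern.
function matchesExpect(value, pattern) {
  const values = Array.isArray(value) ? value : [value]
  return values.some(item => pattern.test(String(item)))
}

console.log(matchesExpect('2019-07-21T09:00:06.000Z', isoDate))   // true
console.log(matchesExpect(['not a date', '2019-07-21'], isoDate)) // true (one item matches)
console.log(matchesExpect('http://example.com', httpsUrl))        // false
```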

warning

Type: boolean
Required: false
Default: false

When warning is set to true, if the test does not pass it will only result in a warning.

The default is false, meaning if the test fails it will be counted as a failure.

optional

Type: boolean
Required: false
Default: false

When the optional property is set to true on a test, the test will not count as either passed or failed, but it will still be run and the result can be inspected.

Optional tests do not count towards the total number of tests run, tests passed or tests failed. They will show up in results in the Command Line Interface if they pass, but not if they fail; however, passing optional tests appear differently to other tests in the results to make it clear they are optional checks.

You can use --info/-i on the CLI or inspect the optional property on the response from the API to see the result of any test that has the optional property set on it. However, if an optional test fails because the property it was testing does not exist, it will not be displayed in the CLI. If a property is optional but recommended, use the warning option instead.

Note: Strictly speaking, in principle no specific properties on Schema.org objects are "required" but in practice implementations by vendors like Google have some "required" or expected properties and also respect some "optional" properties; this option is useful for writing tests that don't fail if a valid, but not necessarily required, property is not found.

The default is false.
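
How warning and optional change the way an outcome is counted can be summarised in a few lines. The sketch mirrors the descriptions above and is not the tool's implementation; classify is a hypothetical function.

```javascript
// Hypothetical helper: how a single test outcome would be counted,
// following the descriptions of 'warning' and 'optional' above.
function classify(test, passed) {
  if (test.optional) return 'optional' // Run, but never counted as passed or failed
  if (passed) return 'passed'
  return test.warning ? 'warning' : 'failed'
}

const tests = [
  { test: 'Article[*].headline' },                    // Required
  { test: 'Article[*].author', warning: true },       // Recommended
  { test: 'Article[*].thumbnailUrl', optional: true } // Nice to have
]

// Pretend every test did not pass
console.log(tests.map(t => classify(t, false))) // [ 'failed', 'warning', 'optional' ]
```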

conditional

Type: object
Required: false
Default: undefined

A conditional object can contain a conditional test to be run, to determine if the test itself should be run.

If the conditional test fails, the test will not be run (and it will not be included in the test results). If the conditional test passes, the test will be run as it otherwise would be if the condition wasn't specified.

This is considered advanced usage, to help avoid having to write overly complex test statements. Conditional test objects use the same syntax as regular test objects, but conditional tests are not included in the results.

It is particularly useful for checking if it is appropriate to run a group of tests. For example, it is used by internal presets to check if a schema exists; if it does then all the tests for that schema are run (and required tests must pass), but if a schema does not exist then none of the tests for that schema are run.
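
As an illustration, a conditional can be attached to an individual test so that it only runs when a related value is present. The queries and the expected card type below are hypothetical examples that use the test object syntax from this section.

```javascript
const tests = [
  // Only check for a logo if the Article has a publisher at all
  {
    test: 'Article[*].publisher.logo',
    conditional: { test: 'Article[*].publisher' }
  },
  // A conditional is a normal test object, so it can use 'expect' and 'type'
  {
    test: '"twitter:image"',
    type: 'metatag',
    conditional: { test: '"twitter:card"', expect: 'summary_large_image', type: 'metatag' }
  }
]

// Every conditional here is itself a test object with a 'test' query
console.log(tests.every(t => typeof t.conditional.test === 'string')) // true
```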

group

Type: string
Required: false
Default: undefined

You can pass a string for the group value to indicate how tests should be grouped when displaying results. You do not need to specify a group if tests are in a preset; by default the preset name will be used.

groups

Type: array of strings
Required: false
Default: undefined

You can pass an array of strings to be used to group tests. This is used internally by the structured data testing tool to group tests and is considered advanced usage for edge case situations like creating tests dynamically.

schema

Type: string
Required: false
Default: undefined

You can pass a schema value that indicates what schema a test is for. Tests in different presets can test the same schema, and tests in the same preset can test multiple schemas.

This is intended as an option to control how tests are grouped when displaying results. The value is not checked for validity, and using it is considered advanced usage for edge case situations.

Testing with client side rendering

If a page uses JavaScript with client side rendering to generate Structured Data, you can use a tool like Puppeteer (a headless Chrome API) to fetch the HTML and allow any client side JavaScript to run and then test the rendered page with the Structured Data Testing Tool.

This can be used to test pages that rely on client side injection with tools like Google Tag Manager to add Structured Data to pages.

Notes:

Puppeteer is a large package (~272 MB) and must be installed separately.
You can only use Puppeteer with the API, not the Command Line Interface.

Example of how to use puppeteer with structured-data-testing-tool to write a test that relies on client side JavaScript:

const { structuredDataTest } = require('structured-data-testing-tool')
const puppeteer = require('puppeteer');

(async () => {
  const url = 'https://www.bbc.co.uk/news/world-us-canada-49060410'
  
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(url, { waitUntil: 'networkidle2' });
  const html = await page.content(); // Full rendered HTML, including <head> (where metatags and JSON-LD often live)
  await browser.close();
  
  await structuredDataTest(html)
  .then(response => { console.log("All tests passed.") })
  .catch(err => { console.log("Some tests failed.") })
})();

Contributing

Contributions are welcome - especially additions and improvements to the built-in presets.

This can include bug reports, feature requests, ideas, pull requests, examples of how you have used this tool (etc).

Please see the Code of Conduct and complete the issue and/or Pull Request templates when reporting bugs, requesting enhancements or contributing code.

Feedback and insight on how you use Structured Data Testing Tool is also very helpful.

structured-data-testing-tool's Issues

Add options to test object to evaluate if a test or preset should run

How could the tool be better? Please describe.
Individual tests and presets (collections of tests) should be able to specify a conditional test that must match the expected result for the test to run.

The purpose of this is to be able to conditionally run a test or a preset, so that a preset only applies if the response matches a precondition; for example if the preset "Google" is specified, it should only validate schemas found in the input.

Describe the solution you'd like
This could be something like another test object (the same syntax for "test" and "expect", with the same default assumed value for "expect" of "true") although supporting this may require refactoring of the existing method that invokes tests.

{
  test: 'test-to-run',
  expect: 'value-to-expect',
  conditional: {
    test: 'evaluate-before-running-test',
    expect: 'value-to-expect-conditional-test-to-return'
  }
}

Additional context

To check a schema is present, this will still be possible by specifying it with --schemas.
Not all tests will need a conditional (e.g. "Twitter" and "Facebook" presets expect specific values).
Conditionals on presets are probably more useful in presets that are contained within other presets.

Error: Cannot find module './presets'

I'm failing with

Error: Cannot find module './presets'

if I try this code in the documentation.

const { structuredDataTest } = require('structured-data-testing-tool')
const { ReportageNewsArticle, Twitter, Facebook } = require('./presets')
 
const url = 'https://www.bbc.co.uk/news/world-us-canada-49060410'
 
structuredDataTest(url, { presets: [ ReportageNewsArticle, Twitter, Facebook ] })
.then(res => {
  // If you end up here, then there were no errors
  console.log("All tests passed.")
  console.log('Passed:',res.passed.length)
  console.log('Failed:',res.failed.length)
  console.log('Warnings:',res.warnings.length)
})
.catch(err => {
  // If any test fails, the promise is rejected
  if (err.type === 'VALIDATION_FAILED') {
    console.log("Some tests failed.")
    console.log('Passed:',err.res.passed.length)
    console.log('Failed:',err.res.failed.length)
    console.log('Warnings:',err.res.warnings.length)  
    // Loop over validation errors
    err.res.failed.forEach(test => {
      console.error(test)
    })
  } else {
    // Handle other errors here (e.g. an error fetching a URL)
    console.log(err)
  }
})

Error "The property _____ was not found" is received if expect value different from actual

Describe the bug
Incorrect error is displayed

To Reproduce
After running

const { structuredDataTest } = require('structured-data-testing-tool')
const url = 'https://www.bbc.co.uk/news/world-us-canada-49060410'
const testUrl = 'https://www.bbc.co.uk/news/world-us-canada-49060410_EXTRA_TEXT'

const CustomPreset = {
    name: 'Non-premium About Page',
    description: 'Test all the structured data is correct for Non-premium about tab',
    tests: [
        // Check 'NewsArticle' schema exists in JSON-LD
        { test: 'ReportageNewsArticle', expect: true, type: 'jsonld' },

        // Check a 'NewsArticle' schema exists with 'url' property set to the value of the  variable 'url'
        { test: 'ReportageNewsArticle[*].url', expect: testUrl },

        // A similar check as above, but won't fail (only warn) if the test doesn't pass
        { test: 'ReportageNewsArticle[*].mainEntityOfPage', expect: testUrl, warning: true },

        // Test for a Twitter meta tag with specific value
        { test: '"twitter:domain"', expect: 'www.bbc.co.uk', type: 'metatag' }
      ]
};

let result

structuredDataTest(url, {
  // Check for compliance with Google, Twitter and Facebook recommendations
  presets: [ CustomPreset ],
  // Check the page includes a specific Schema (see https://schema.org/docs/full.html for a list)
})
.then(res => {
  console.log('✅ All tests passed!')
  result = res
})
.catch(err => {
  if (err.type === 'VALIDATION_FAILED') {
    console.log('❌ Some tests failed.')
    result = err.res
  } else {
    console.log(err) // Handle other errors here (e.g. an error fetching a URL)
  }
})
.finally(() => {
  if (result) {
    console.log(
      `Passed: ${result.passed.length},`,
      `Failed: ${result.failed.length},`,
      `Warnings: ${result.warnings.length}`,
    )
    console.log(`Schemas found: ${result.schemas.join(',')}`)

    // Loop over validation errors
    if (result.failed.length > 0)
      console.log("⚠️  Errors:\n", result.failed.map(test => test))
  }
})

I receive

❌ Some tests failed.
Passed: 3, Failed: 1, Warnings: 1
Schemas found: ReportageNewsArticle
⚠️  Errors:
 [ { test: 'ReportageNewsArticle[*].url',
    expect:
     'https://www.bbc.co.uk/news/world-us-canada-49060410_EXTRA_TEXT',
    group: 'Non-premium About Page',
    passed: false,
    type: 'any',
    value: null,
    error:
     { type: 'NOT_FOUND',
       message: 'The property "ReportageNewsArticle[*].url" was not found' } } ]

Expected behavior
Should it be the following?
type: 'INCORRECT_VALUE',
message: `Incorrect value found for "${path}"`,

vulnerabilities with "css-what" module

For version ^4.5.0 I have vulnerabilities with css-what

Rewrite and collaboration

@iaincollins A week ago I searched (again) for resources to validate html markup and meta data. As always, the most found answer was to use Google's structured data testing tool, which might be a workaround, but it is a bad answer. So I searched through npm packages to find existing tools or good approaches and I found your tool.

Well, I already mentioned in #13 that I'm not the biggest fan of your coding style. I apologise if I insulted you with "Your code is a mess.". After looking at a dictionary, I discovered some more meanings than "not well structured and really hard to read"...

What I've done

I rewrote your tool and the web-auto-extractor into an object oriented style. It is readable, modular and easy to extend. It has two entry points for browser and for cli usage and I covered most of your existing features. It needs some cleanup here and there, the schema handling isn't fully implemented and it only works with urls for the moment. It shouldn't be much work to add the missing features.

I published it today as an alpha release: https://github.com/raffaelj/seo-meta-validator
See README.md and code comments for more details, or ask me anything you want to know about it.

What I want

My main goal is to validate websites on localhost before I publish them, and I need a browser-based UI. I am also thinking about splitting the project into multiple packages/repos, e.g. core, CLI tool and browser-based test suite.

What do you think about a complete rewrite?

It doesn't have to match my style, but I really need some structure to be able to contribute.

Are you interested in a collaboration?

I'm very interested in your knowledge about data validation and I'm not familiar with publishing npm packages. I also like your style of writing detailed issues and documentation.

Differentiate between Presets and Schemas?

Currently the preset option in the API (-p or --preset in the CLI) is used for both Schema.org schemas (e.g. Article, NewsArticle) and for test suites that check for meta tags for sites like Twitter and Facebook.

It is worth considering at this early stage if it is worth splitting up these options and having a list of schemas specifically for JSON-LD/RDFa/Microdata and a list of presets that can do wider tasks, such as check meta tags and possibly inspect HTML markup too.

If presets should be renamed something else - e.g. "test suites" - or if a term like "test suites" should apply to any group of tests is also open for discussion.

It would likely be easier to refactor this now, before version 1.0, rather than later.

Handle when an itemProp contains multiple properties

Neither the API nor the CLI finds properties in an itemProp when it contains multiple properties.

In these examples from a NYT article the datePublished and publisher properties are not found as they are combined with other properties:

<meta data-rh="true" property="article:published" itemprop="datePublished dateCreated" content="2019-07-21T09:00:06.000Z"/>

<span itemProp="publisher copyrightHolder provider sourceOrganization" itemscope="" itemType="http://schema.org/NewsMediaOrganization" itemID="https://www.nytimes.com">

See also this example of an image property from this Guardian article which is also not detected:

<figure itemprop="associatedMedia image" itemscope itemtype="http://schema.org/ImageObject" data-component="image" class="element element-image img--landscape  fig--narrow-caption fig--has-shares " data-media-id="f82028d62b1edd7417d7d3773c4abf0d4fa86174" id="img-3">
  <meta itemprop="url" content="https://i.guim.co.uk/img/media/f82028d62b1edd7417d7d3773c4abf0d4fa86174/0_272_6435_3861/master/6435.jpg?width=700&amp;quality=85&amp;auto=format&amp;fit=max&amp;s=016df6a3f33eabe3cbca39eb389a60fb">
</figure>

This is an edge case usage scenario that passes in the Google Structured Data Testing Tool but does not pass in this tool. This bug is only known to happen when parsing HTML/microdata but could potentially be triggered by RDFa or JSON-LD markup.

Add `url` property to tests

Summary of proposed feature
Add a url property to tests and presets that can be used to link to documentation, and display the URL in the CLI when a test does not pass (i.e. on error or warning).

Purpose of proposed feature
Provide a way to reference online documentation to resolve issues, especially when a test fails.

Detail of the proposed feature

If defined on a preset, the url property for a preset should be displayed in the console under the preset name.
If defined on a test, the url property should be displayed under the name of the property if the test does not pass (ie on error or warning).

This property should link to pages that document a schema and/or can help resolve the specific problem. It would work well with the new schema.org property tests (which are in master but not in the release version).

Potential problems
None.

Describe any alternatives you've considered
None.

Additional context
URL properties for Schema objects could also be automatically generated.

New Schema.org tests should work with nested objects

Describe the bug
The new Schema.org property tests, which check whether property names are valid, should work with nested objects.

Note: This applies to the new Schema.org property tests, which are in master but not in the release version.

To Reproduce
Nested property names (ie any property that is not a top level property) are not currently checked for validity. See existing invalid properties fixture for an example of an error that is not caught (only some of the errors in the example are caught by the current tests).

Expected behaviour
Nested property names should be checked to see if they are valid and should raise an error if they are not valid on a nested property. This might prove tricky to implement as some properties can be one of many different types and that might complicate the logic.
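
One possible shape for such a check is a recursive walk over the parsed data. This is only a sketch: `VALID_PROPERTIES` is a made-up lookup table (the real tool would need to derive it from Schema.org), and properties that accept several types are not handled.

```javascript
// Hypothetical lookup table for illustration only.
const VALID_PROPERTIES = {
  Product: ['name', 'offers'],
  Offer: ['name', 'price', 'priceCurrency'],
};

// Recursively collect the paths of property names that are not valid for
// the @type of the object they appear on. Objects without a known @type
// are not checked themselves, but their children are still visited.
function findInvalidProperties(node, path = '') {
  if (Array.isArray(node)) {
    return node.flatMap((item, i) => findInvalidProperties(item, `${path}[${i}]`));
  }
  if (node === null || typeof node !== 'object') return [];
  const valid = VALID_PROPERTIES[node['@type']];
  const invalid = [];
  for (const [key, value] of Object.entries(node)) {
    const childPath = path ? `${path}.${key}` : key;
    if (valid && !key.startsWith('@') && !valid.includes(key)) invalid.push(childPath);
    invalid.push(...findInvalidProperties(value, childPath));
  }
  return invalid;
}
```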

Not detecting schemas

I tried both the CLI and the API to check and validate schemas on multiple WordPress sites, but the tests were failing.

Sites

https://growth.cx has 4 schemas when checking manually (WebSite,ImageObject,WebPage,BreadcrumbList)

sdtt --url https://growth.cx --schemas "WebSite,ImageObject,WebPage,BreadcrumbList"

https://crawlq.ai has 5 schemas when checking manually (Organization,WebSite,ImageObject,WebPage,BreadcrumbList)

sdtt --url https://crawlq.ai --schemas "Organization,WebSite,ImageObject,WebPage,BreadcrumbList"

The schemas were generated by the Yoast SEO plugin.

Any advice?

Add option to disable testing of automatically detected schemas

The current default behaviour is to perform tests for all known schemas found when inspecting a page.

This should remain the default, but it should be possible to override it (via both the CLI and the API) so that only explicitly requested tests (and presets) are run.

Schemas should still always be automatically detected but tests for them should not be automatically run if an option requesting they not be is passed.

There is no proposal for what to call this option yet.
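
To make the intended behaviour concrete, here is a minimal sketch. The option name `skipDetectedSchemaTests` is purely a placeholder, since no name has been proposed, and `selectTests` is a hypothetical helper rather than part of the tool.

```javascript
// Decide which tests to run. Schemas are still detected either way; the
// placeholder option only controls whether tests for them run automatically.
function selectTests(detectedSchemaTests, requestedTests, options = {}) {
  return options.skipDetectedSchemaTests
    ? [...requestedTests]
    : [...detectedSchemaTests, ...requestedTests];
}
```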

Schema.org tests should only apply to Schema.org schemas

Describe the bug
The new Schema.org property tests (which are in master but not in the release version) should only apply to Schema.org schemas.

Currently the tool assumes schemas are Schema.org schemas, which is what is most common, but to be technically correct the tool should check the schema type.
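
A check along these lines could be as simple as inspecting the host of the JSON-LD `@context` or the microdata `itemtype` URL. The helper name `isSchemaOrgUrl` is an assumption, and the sketch only handles string values (a JSON-LD `@context` can also be an object or array).

```javascript
// Returns true when a @context string or itemtype URL points at schema.org.
function isSchemaOrgUrl(value) {
  if (typeof value !== 'string') return false;
  try {
    const { hostname } = new URL(value);
    return hostname === 'schema.org' || hostname === 'www.schema.org';
  } catch (err) {
    return false; // Not an absolute URL
  }
}
```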

Add instructions on how to contribute

Add some simple instructions on how to contribute, especially by adding tests for schemas.

These should be in a CONTRIBUTING.md file, which is referenced in README.md

Article Schema not detected & errors not reported

Describe the bug

When validating Google structured data SDTT is:

only recognising 2 Schema.org schemas instead of 4

  Schema.org > Article - 0% (0 passed, 1 failed)
    ✕  schema found [Article[*]]
        └─  NOT_FOUND

passing tests that fail when run in the Google Structured Data Testing Tool

✓ 2 tests passed with 0 warnings.

For this article, the Google tool shows 2 errors:

VideoObject > uploadDate -> A value for the uploadDate field is required.
Article > image > url -> A value for the url field is required.

To Reproduce

2 tests passed with 0 warnings:

sdtt -u "https://amp.sbs.com.au/v1/article/as-qantas-and-international-airlines-prepare-to-ramp-up-flights-how-safe-is-flying/36b7ce75-ece7-42cb-95a1-7f63a9848955?amp=1" -p Google

Schema.org > Article - 0% (0 passed, 1 failed):

sdtt -u "https://amp.sbs.com.au/v1/article/as-qantas-and-international-airlines-prepare-to-ramp-up-flights-how-safe-is-flying/36b7ce75-ece7-42cb-95a1-7f63a9848955?amp=1" --schemas Article -i

Expected behavior

✕ 2 of 4 tests failed with 0 warnings.

SDTT should report the following 2 errors:

VideoObject > uploadDate -> A value for the uploadDate field is required.
Article > image > url -> A value for the url field is required.

  Schema.org > Article - 100% (1 passed, 0 failed)
    ✓  schema in microdata

The Organization & Article schemas should also be recognised in addition to VideoObject & CreativeWork

Additional context

SDTT does report that some optional values are empty:

  2 optional structured data properties found had an empty value.
    ├─ Metatags > "uploadDate" 
    └─ VideoObject > VideoObject[0]."uploadDate"

Were these values optional in v7.0 but are now required in v8.0?

How to test nested Object

Describe the bug
Given we have this structured data:

{
  Product: [
    { name: "p1", "offers": [ {"@type": "Offer", "name": "p1o1"}, {"@type": "Offer", "name": "p1o2"} ] },
    { name: "p2", "offers": [ {"@type": "Offer", "name": "p2o1"}, {"@type": "Offer", "name": "p2o2"} ] },
  ]
}

Question:
How can I define a preset that tests that Product[*].offers[*].name exists for every product and every offer?

For example the schema: https://schema.org/offers
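
Independently of how the tool itself resolves test paths, the check being asked for can be sketched in plain JavaScript. `resolvePath` and `allPresent` are hypothetical helpers written for this illustration, not part of the tool's API.

```javascript
// Resolve a path such as "Product[*].offers[*].name" and return every value
// it matches (undefined where a property is missing).
function resolvePath(data, path) {
  let current = [data];
  for (const segment of path.split('.')) {
    const wildcard = segment.endsWith('[*]');
    const key = wildcard ? segment.slice(0, -3) : segment;
    current = current.flatMap((node) => {
      const value = node == null ? undefined : node[key];
      if (!wildcard) return [value];
      return Array.isArray(value) ? value : [value];
    });
  }
  return current;
}

// True only when at least one value matched and none are missing or empty.
const allPresent = (values) =>
  values.length > 0 && values.every((v) => v !== undefined && v !== '');
```

With the data above, `allPresent(resolvePath(data, 'Product[*].offers[*].name'))` would be true, and it would become false as soon as one offer lacked a name.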

Add option to check for and test schemas by name

How could the tool be better? Please describe.
Add --schemas <schemas> option to test for specific schemas.

The purpose of this is to provide a way to validate schema markup to ensure it's correct according to the specifications, independent of considerations for specific implementations (e.g. irrespective of what Google considers 'valid'), so that it's more widely useful.

Describe the solution you'd like

This would work like the existing --presets option, except it would only validate against the Schema.org spec (i.e. it would not consider what search engines consider 'required' fields or acceptable values for properties).

The command would fail if the schema specified was unknown.

A test for a specified schema would generate an error if:

The schema was not found in the input
A property appeared to contain an invalid value (in violation of the spec at Schema.org)

A test for a specified schema would generate a warning if:

An object contained a property not listed as valid for the schema at Schema.org

Project Status

Hi Iain!

Very cool-looking tool. I have the CLI running locally, and the social media checks work great. I am having a bit of trouble getting the Schemas working, but that is probably me.

My question is what is the status of this project moving forward? It looks like there hasn't been any contribution in about a year. I ask because what a year it has been! And completely understand the nature of open source... I'm considering adding it to my team's CI process.

Thanks for your time and this great tool!

Jim

Support referencing schemas using @id

Summary of proposed feature
Allow @id to be used to reference one schema from another schema on the same page.

Purpose of proposed feature
This format exists to allow publishers to avoid repeating the same information.

It is valid syntax used by some publishers (although it is not especially common).

For a more detailed explanation, see:
https://webmasters.stackexchange.com/questions/98569/what-is-the-use-of-id-in-json-ld-syntax

Detail of the proposed feature

e.g. markup for a news article might look like:

{
  "@context": "http://schema.org",
  "@type": "NewsArticle",
  "publisher": {
    "@id": "https://www.nytimes.com/#publisher"
  }
}

And it might reference an organisation object on the page which looks like this:

{
  "@context": "http://schema.org",
  "@type": "NewsMediaOrganization",
  "@id": "https://www.nytimes.com/#publisher",
  "name": "The New York Times",
  "logo": {
    "@context": "http://schema.org",
    "@type": "ImageObject",
    "url": "https://static01.nyt.com/images/misc/NYT_logo_rss_250x40.png",
    "height": 40,
    "width": 250
  }
}

For a real-world example, see:
https://www.nytimes.com/2020/04/14/us/coronavirus-updates-usa.html

Potential problems

This could be done either by making the test logic more sophisticated or (perhaps more easily) by 'unflattening' the data structures after parsing the page. With the second approach the test logic would not need to change, which keeps it simpler and avoids special-case handling for this feature in an already complicated section of code.
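
A minimal sketch of the 'unflattening' idea follows. `resolveIdReferences` is a hypothetical helper; it assumes that a reference is an object containing only `@id`, and that there are no circular references between objects.

```javascript
// Replace objects that contain only an "@id" with the full object that
// declares the same "@id" elsewhere in the parsed data.
function resolveIdReferences(schemas) {
  const byId = new Map();
  // First pass: index every object that declares an @id alongside other properties.
  const index = (node) => {
    if (Array.isArray(node)) return node.forEach(index);
    if (node === null || typeof node !== 'object') return;
    if (node['@id'] && Object.keys(node).length > 1) byId.set(node['@id'], node);
    Object.values(node).forEach(index);
  };
  // Second pass: build a copy with references expanded in place.
  const expand = (node) => {
    if (Array.isArray(node)) return node.map(expand);
    if (node === null || typeof node !== 'object') return node;
    const keys = Object.keys(node);
    if (keys.length === 1 && keys[0] === '@id' && byId.has(node['@id'])) {
      return expand(byId.get(node['@id']));
    }
    return Object.fromEntries(Object.entries(node).map(([k, v]) => [k, expand(v)]));
  };
  index(schemas);
  return expand(schemas);
}
```

After this step, a test for the article's publisher name could read the value directly from the article object, without knowing that it was originally declared elsewhere on the page.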

Describe any alternatives you've considered

None

Additional context

None

Support for validating schema properties

Summary of proposed feature

Schema properties should be checked for validity.

Purpose of proposed feature

Currently properties are only checked to see if they exist, and not if the value they contain is valid.

Detail of the proposed feature

The value of properties should be checked.

This may include primitive types (strings, numbers) as well as specific types (dates, URLs) and complex objects (including nested types).

Strings should be valid
Numbers should be valid
Dates should be valid
URLs should be valid
Objects should be valid
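
For the primitive and easily checked types, validators might look something like the sketch below. The rules are assumptions chosen for illustration (for example, dates must start with an ISO 8601 calendar date and URLs must be http or https); they are not taken from the tool or from Schema.org.

```javascript
// Illustrative validators for a few primitive and easily checked types.
const validators = {
  string: (v) => typeof v === 'string' && v.trim().length > 0,
  number: (v) =>
    (typeof v === 'number' || (typeof v === 'string' && v.trim() !== '')) &&
    Number.isFinite(Number(v)),
  // Require an ISO 8601 style prefix because Date.parse alone is lenient.
  date: (v) =>
    typeof v === 'string' && /^\d{4}-\d{2}-\d{2}/.test(v) && !Number.isNaN(Date.parse(v)),
  url: (v) => {
    try {
      const { protocol } = new URL(v);
      return protocol === 'http:' || protocol === 'https:';
    } catch (err) {
      return false;
    }
  },
};
```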

Potential problems

As per the outline for the milestone for version 5.0, doing this for all schemas is expected to involve extending the schema.org scraper and writing a parser to handle scraping and using meta-programming to create tests that apply validation rules to properties.

Initial versions may include simple handling for primitive types and easily checkable types, but supporting complex types and properties that can be one of many types will be more difficult and support for that will likely come later. There may be edge cases it is not practical to support.

Describe any alternatives you've considered

It would be nice to have a list of valid templates that are parsable (e.g. in JSON Schema format) but I have not been able to find a suitable library of these and it does not appear there is a list of them published by Schema.org.

Additional context

Is there value in creating JSON Schema profiles, as something other people could reuse?

This would require extra work to integrate schema validation into this tool, but that is something I am familiar with from other projects.

test fails for urls without tld (http://localhost)

Describe the bug
When the input URL doesn't end with a TLD, it is treated as HTML and all tests fail.

To Reproduce

Windows, Xampp, Apache, test website is in /htdocs/mysite
run apache, website is available under http://localhost/mysite
data test is in /htdocs/datatest

cd /path/to/htdocs/datatest
npm init
npm install structured-data-testing-tool

Now I tried an example snippet from README.md

/datatest/index.js:

const { structuredDataTest } = require('structured-data-testing-tool');
const { Twitter, Facebook } = require('structured-data-testing-tool/presets');

const url = 'http://localhost/mysite';

structuredDataTest(url, { presets: [ Twitter, Facebook ] })
.then(res => {
  console.log("All tests passed.");
})
.catch(err => {
  if (err.type === 'VALIDATION_FAILED') {
    err.res.failed.forEach(test => {
      console.error(test);
    })
  } else {
    console.log(err);
  }
});

run node index.js --> all tests failed

If I use the HTML output of my site as the input string, everything works as expected.

Solution 1

When I write my test, I know that I always expect a URL, so I can skip that check by calling structuredDataTestUrl directly.

const { structuredDataTestUrl } = require('structured-data-testing-tool');
const url = 'http://localhost/mysite';
structuredDataTestUrl (url, {/* ... */}).then(/* ... */).catch(/* ... */);

Solution 2

Add options to isURL check to ignore tld:

https://github.com/glitchdigital/structured-data-testing-tool/blob/master/index.js#L434

if (validator.isURL(input, { require_tld: false })) {/* ... */}

And thanks for your tool. Now that I have found my issue, I can fiddle and explore some more of its features.

Testing tool skips invalid JSON should report as a failure

Describe the bug
sdtt does not report invalid JSON syntax errors as a failure and skips those JSON-LD blocks.

$ sdtt --url https://supplydrop.com
Error in jsonld parse - SyntaxError: Unexpected end of JSON input
Tests

  Schema.org > FAQPage - 100% (1 passed, 0 failed)
    ✓  schema in jsonld

Statistics

  Number of Metatags: 19
  Schemas in JSON-LD: 1
     Schemas in HTML: 0
      Schema in RDFa: 0
  Schema.org schemas: FAQPage
       Other schemas: 0
     Test groups run: 2
  Optional tests run: 44
 Pass/Fail tests run: 1

Results

    Passed: 1 	(100%)
  Warnings: 0 	(0%)
    Failed: 0 	(0%)

  ✓ 1 tests passed with 0 warnings.

The google validator reports that as an error:
https://search.google.com/structured-data/testing-tool/u/0/#url=https%3A%2F%2Fsupplydrop.com

Missing ',' or '}' in object declaration.
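
One way to surface this as a failure, sketched under the assumption that the JSON-LD blocks have already been extracted from the page as strings (`parseJsonLdBlocks` is a hypothetical helper, not the tool's parser):

```javascript
// Parse each JSON-LD block and record a failure for any block that is not
// valid JSON, instead of silently skipping it.
function parseJsonLdBlocks(blocks) {
  const schemas = [];
  const failures = [];
  blocks.forEach((text, index) => {
    try {
      schemas.push(JSON.parse(text));
    } catch (err) {
      failures.push({ index, error: err.message });
    }
  });
  return { schemas, failures };
}
```

A caller could then treat any entry in `failures` as a failed test, so that the overall result is no longer reported as passing.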

Google preset not running

Describe the bug
It seems that when using the Google preset option, no Google-specific tests run.

To Reproduce
sdtt --url https://www.techradar.com --presets "Google"

Expected behavior
Tests run under Google heading (as happens when using Facebook, Twitter )

Validating JSON blobs against schemas?

Hello,

Is there a way to use the API to validate a JSON/HTML file with the following data, or do I need to write a custom schema and rules?

{
  "@context": "https://schema.org/",
  "@type": "Product",
  "@ids": "https://www.blah.org/#blahorg",
  "urls": "https://www.blah.org/en-US/developer/",
  "logod": "https://www.example.com/example-logo.jpg",
  "images": [
    "https://example.com/photos/1x1/photo.jpg",
    "https://example.com/photos/4x3/photo.jpg",
    "https://example.com/photos/16x9/photo.jpg"
  ],
  "named": "Blah Org",
  "alternateName": "Blah",
  "brand": {
    "@type": "Brand",
    "@id": "https://www.blah.org/#brand",
    "name": "Firefox"
  },
  "sameAs": ["https://en.wikipedia.org/wiki/blah"],
  "offers": {
    "@type": "Offer",
    "url": "https://www.blah.org/developer/",
    "priceCurrency": "USD",
    "price": "0",
    "availability": "https://schema.org/InStock"
  }
}

Currently I'm using the following function and looping over a glob of .json files, but I'm getting zero warnings or failed rules:

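const fs = require("fs");
const { structuredDataTestHtml } = require("structured-data-testing-tool");

// Wrap the raw JSON in a JSON-LD script tag so it can be tested as HTML.
async function lintFile(file, options = {}) {
  const txt = fs.readFileSync(file, "utf-8");
  const html = `<script type="application/ld+json">${txt}</script>`;
  return structuredDataTestHtml(html, options);
}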

Consolidate presets - allow presets to contain other presets

How could the tool be better? Please describe.
Presets should be able to contain other suites (with no limit on the number of levels down), so you can have a suite like "Google" which contains the existing presets and specifically raise warnings and errors that reflect how Google validates structured data.

The purpose of this is to separate testing general schema validity (e.g. valid props and valid values) from checking for specific properties / values expected by specific implementations, such as by Google, Bing, etc.

This will make it possible to evaluate pages for technical correctness, independently of checking for compatibility with specific implementations - only some (or none) of which may be relevant.

Describe the solution you'd like
The implementation may need a way to run a test suite only if a corresponding schema was actually found. For example, this could be done by adding a conditional property at the top level of a preset, which contains a test (and optionally an expect) property.

As part of this work, existing "presets" (except Twitter and Facebook) should be consolidated under a test suite called "Google", with some sort of check evaluated to determine if a suite (e.g. "Article", "NewsArticle", etc) should run.
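
The nesting and the conditional check could be sketched roughly as follows. The `presets` array on a preset, the preset-level `conditional`, and the `schemaFound` callback are all assumptions made for this illustration.

```javascript
// Flatten a preset that may contain other presets into a flat list of tests.
// A preset with a `conditional` is skipped (along with everything nested in
// it) unless `schemaFound` reports that the corresponding schema exists.
function collectTests(preset, schemaFound) {
  if (preset.conditional && !schemaFound(preset.conditional.test)) return [];
  const own = (preset.tests || []).map((t) => ({ ...t, group: preset.name }));
  const nested = (preset.presets || []).flatMap((child) => collectTests(child, schemaFound));
  return [...own, ...nested];
}
```

For example, a "Google" preset containing "Article" and "NewsArticle" presets would only contribute the Article tests on a page where only an Article schema was found.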
