
Comments (8)

andydavies commented on May 27, 2024

Here are two cases I came across recently:

  1. Page-level TTFB exceeds the range of smallint (it was a site in China tested from Dulles, but I can't find the waterfall ATM).
  2. The response code for a request is -2, so the import fails because requests.status is unsigned.

This test on Ilya's site is one example - http://www.webpagetest.org/result/140601_Q7_PTB/1/details/

Edit:

The second issue might be fixed by catchpoint/WebPageTest#256
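
Both cases are range problems at import time. A minimal sketch of a pre-import range check, assuming MySQL's standard integer ranges. The `fits` helper, the type labels, and the 40000 ms sample value are hypothetical, and this thread does not say which integer type each real column uses:

```python
# Standard MySQL integer ranges. Which of these the real columns use
# is an assumption, not something stated in the thread.
RANGES = {
    "smallint": (-32768, 32767),
    "smallint unsigned": (0, 65535),
    "int": (-2147483648, 2147483647),
    "int unsigned": (0, 4294967295),
}


def fits(value: int, column_type: str) -> bool:
    """Return True if value can be stored in a column of the given type."""
    low, high = RANGES[column_type]
    return low <= value <= high


if __name__ == "__main__":
    # An illustrative 40-second TTFB in milliseconds against a signed smallint.
    print(fits(40000, "smallint"))
    # A status of -2 against an unsigned column.
    print(fits(-2, "smallint unsigned"))
```

A check like this could log the offending page instead of letting the whole import fail, but widening the column or making it signed is the more direct fix.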

from legacy.httparchive.org.

stevesouders commented on May 27, 2024

Here's an example of the current crawl's batch_report:

failed:        3014 (11%)
                  0 - submission failed
                  0 - WPT test failed
                735 - test result processing failed
               2279 - HAR import failed

More detail is written to batch.log. I removed the unique info (pageid, url, time) and sorted by error type:

   1146 The first request () failed with status 12007.
    352 The first request () failed with status 12029.
    155 The first request () failed with status 404.
    100 The first request () failed with status 403.
     97 no first HTML URL found.
     97 AggregateStats failed. Purging pageid
     38 The first request () failed with status 12031.
     34 The first request () failed with status 503.
     25 The first request () failed with status 500.
     12 The first request () failed with status 12152.
     11 The first request () failed with status 504.
      9 The first request () failed with status 502.
      9 The first request () failed with status 400.
      8 The first request () failed with status 522.
      4 The first request () failed with status 401.
      3 The first request () failed with status 523.
      3 The first request () failed with status 520.
      2 The first request () failed with status 405.
      1 The first request () failed with status 521.
      1 The first request () failed with status 512.
      1 The first request () failed with status 505.
      1 The first request () failed with status 408.
      1 The first request () failed with status 12055.

The "first request () failed" errors occur midway during HAR parsing, so the HAR was successfully downloaded but there were no valid entries.
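
For anyone repeating this summary, here is a rough sketch of the "strip the unique info and count" step. The exact batch.log line format is not shown in this thread, so the two patterns below (a URL in parentheses, a number after "pageid") and the function names are assumptions:

```python
import re
from collections import Counter

# Assumed shapes of the unique parts of a batch.log message.
PAREN = re.compile(r"\([^)]*\)")
PAGEID = re.compile(r"pageid\s*\d+", re.IGNORECASE)


def normalize(line: str) -> str:
    """Blank out the parenthesized URL and any number following 'pageid'."""
    line = PAREN.sub("()", line.strip())
    return PAGEID.sub("pageid", line)


def summarize(lines):
    """Count identical normalized messages, most frequent first."""
    counts = Counter(normalize(line) for line in lines if line.strip())
    return counts.most_common()


if __name__ == "__main__":
    sample = [
        "The first request (http://a.example/) failed with status 12007.",
        "The first request (http://b.example/) failed with status 12007.",
        "AggregateStats failed. Purging pageid 42",
    ]
    for message, count in summarize(sample):
        print(f"{count:7d} {message}")
```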

I found these 12xxx error code definitions in chromeExtensionUtils.js:

  'net::ERR_NAME_NOT_RESOLVED': 12007,
  'net::ERR_CONNECTION_ABORTED': 12030,
  'net::ERR_ADDRESS_UNREACHABLE': 12029,
  'net::ERR_CONNECTION_REFUSED': 12029,
  'net::ERR_CONNECTION_TIMED_OUT': 12029,
  'net::ERR_CONNECTION_RESET': 12031

What do 12055 and 12152 mean?

I increased the "retry" value to "3" (so it makes 2 more passes through the failed URLs). The failure rate drops from ~11% to ~5% (need to verify this once the current crawl is done). It would be interesting to track the URLs that failed once or twice and then worked. That might help indicate why they failed initially.

Then we should add some code to track URLs that always fail and remove them from the crawl.
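
A small sketch of that bookkeeping, assuming we record one pass/fail flag per URL per pass. The input shape and the function name are hypothetical and not part of the current crawl code:

```python
def classify_retries(results):
    """Split URLs by retry outcome.

    results maps each URL to a list of booleans, one per pass, in order
    (True means the pass succeeded). URLs with no recorded pass are skipped.
    """
    flaky = {}           # URL -> number of failed passes before the first success
    always_failing = []  # URLs that failed on every pass
    for url, passes in results.items():
        if not passes:
            continue
        if not any(passes):
            always_failing.append(url)
        elif not passes[0]:
            flaky[url] = passes.index(True)
    return flaky, always_failing
```

The `flaky` map is the set worth inspecting for transient causes, and `always_failing` is the candidate list for removal from the crawl.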


pmeenan commented on May 27, 2024

Here are the wininet error codes: http://msdn.microsoft.com/en-us/library/windows/desktop/aa385465%28v=vs.85%29.aspx

12055 - The SSL certificate contains errors
12152 - The server response could not be parsed.
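
Putting the codes from this thread in one place, a lookup like the following could label the statuses in the batch report. The descriptions are paraphrased from the comments above rather than from the full WinINet table, and `describe_status` is a hypothetical helper:

```python
# Codes and meanings as discussed in this thread only; not a complete table.
WININET_CODES = {
    12007: "name not resolved",
    12029: "address unreachable, connection refused, or connection timed out",
    12030: "connection aborted",
    12031: "connection reset",
    12055: "the SSL certificate contains errors",
    12152: "the server response could not be parsed",
}


def describe_status(status: int) -> str:
    """Return a short human-readable label for a first-request status."""
    if status in WININET_CODES:
        return f"network error: {WININET_CODES[status]}"
    if 400 <= status < 600:
        return f"HTTP error {status}"
    return f"unrecognized status {status}"
```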

We might also be able to do a pre-crawl step where we take the URL list and run it through a script that uses CURL with an IE user agent on the base page to weed out broken domains/pages.

In particular we can use it to figure out if www. works or if we should just use the bare domain.

That won't necessarily help if there is a transient server or network issue but it would help reduce the amount of time spent testing invalid pages.
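
A rough sketch of such a pre-crawl check, using Python's urllib in place of curl. The user agent string is only a placeholder IE-style value, the function names are hypothetical, and treating any status below 400 as "working" is an assumption:

```python
import http.client
import urllib.error
import urllib.request

# Placeholder IE-style user agent; substitute whatever the test agents send.
IE_USER_AGENT = "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)"


def fetch_status(url: str, timeout: float = 10.0):
    """Return the HTTP status for url, or None on a network-level failure."""
    request = urllib.request.Request(url, headers={"User-Agent": IE_USER_AGENT})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404 or 503.
        return err.code
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # DNS failure, refused connection, timeout, unparseable response, etc.
        return None


def pick_working_url(domain: str, fetch=fetch_status):
    """Try http://www.<domain>/ first, then the bare domain; None if both fail."""
    for candidate in (f"http://www.{domain}/", f"http://{domain}/"):
        status = fetch(candidate)
        if status is not None and status < 400:
            return candidate
    return None
```

The `fetch` parameter exists so the selection logic can be exercised without touching the network.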

For the cases where no first URL is found it would be nice to know if the HAR file itself failed to generate (maybe retry in case there was a WPT server issue) or something else.


pmeenan commented on May 27, 2024

Is the batch log somewhere I can download it? I'd like to see what the errors were that were fixed with a re-run (or is that already included?). If page X was re-submitted twice and failed all 3 times are there 3 entries in the counts above or just the last failure?


stevesouders commented on May 27, 2024

The batch log is not downloadable. A failed page is only counted once, but the errors will show up 3 times in the log.


stevesouders commented on May 27, 2024

Some notes: The "urls" table has some helpful columns like "optout" and "urlFixed". These could be used to address this bug. For example,

  • If the Alexa zip file contains a domain that no longer exists, you could set optout=true in the urls table and it will no longer be tested
  • If our HA code incorrectly converted a domain to http://www.foo.com and that produces an error but http://foo.com works, then you could set urlFixed to http://foo.com. I believe the crawl code already prefers urlFixed and then falls back to urlOrig.
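
The selection rule described in these two bullets could be sketched as below. The column names (optout, urlFixed, urlOrig) come from the comment above, but the dict-per-row shape and the `crawl_url` helper are hypothetical, and the real crawl code may differ:

```python
def crawl_url(row: dict):
    """Return the URL to test for a urls-table row, or None if it is opted out."""
    if row.get("optout"):
        return None
    # Prefer the manually corrected URL, then fall back to the original.
    return row.get("urlFixed") or row.get("urlOrig")
```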


Themanwithoutaplan commented on May 27, 2024

Not sure if this is directly related, but I picked up a couple of really minor errors with the September 1st run (both mobile and desktop). A couple of sites have NULL numDomains, which is a constraint violation when I import them into Postgres. I think this constraint is correct, hence these are errors.

DETAIL: Failing row contains (3448296, 1473341436, All, Sep 1 2016, , 0, http://www.mc361.com/, 1473336845, 12088, 11430, 36559, null, null, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, null, 2016-09-01, 4576, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2226, 0, 0, 0, 0, 0, 0, 0, 38050, null, 14523, 40100, 221861, 7247, null, null, 454, null, ).
CONTEXT: COPY urls, line 3868: "3448296 1473341436      All     Sep 1 2016      454         0       http://www.mc361.com/   \N      1473336845      4576    12088   11430   36559   ..."
2016-09-12T18:05:11.142000+02:00 ERROR Database error 23502: null value in column "numDomains" violates not-null constraint
DETAIL: Failing row contains (3448299, 1473341436, All, Sep 1 2016, , 0, http://www.kurogal.com/, 1473337075, 12192, 19818, 25634, null, null, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, null, 2016-09-01, 8366, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1305, 0, 0, 0, 0, 0, 0, 0, 27839, null, 20503, 26100, 361553, 19184, null, null, 454, null, width=device-width, initial-scale=1.0, maximum-scale=1.0).
CONTEXT: COPY urls, line 3: "3448299    1473341436      All     Sep 1 2016      454         0       http://www.kurogal.com/ \N      1473337075      8366    12192   19818   2563..."
2016-09-12T18:05:11.548000+02:00 ERROR Database error 23502: null value in column "numDomains" violates not-null constraint
DETAIL: Failing row contains (3448300, 1473341437, All, Sep 1 2016, , 0, http://www.digitalsummit.com/, 1473336869, 7662, 4288, 25830, null, null, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, null, 2016-09-01, 2059, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 488, 0, 0, 0, 0, 0, 0, 0, 33855, null, 8677, 11400, 1289606, 81990, null, null, 454, null, width=device-width, initial-scale=1, maximum-scale=1).
CONTEXT: COPY urls, line 1: "3448300    1473341437      All     Sep 1 2016      454         0       http://www.digitalsummit.com/   \N      1473336869      2059    7662    4288    ..."
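
To list the offending rows before running the COPY, something like the following could scan the dump. It assumes a tab-delimited file that uses `\N` for NULL (as the CONTEXT lines suggest); the helper name is hypothetical and the position of numDomains in the dump has to be supplied by the caller:

```python
def rows_with_null(lines, column_index, null_marker=r"\N"):
    """Yield (line_number, fields) for rows whose chosen column is NULL."""
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        if column_index < len(fields) and fields[column_index] == null_marker:
            yield number, fields
```

Passing an open file object as `lines` works, since the function only iterates over it line by line.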


rviscomi commented on May 27, 2024

Closing as obsolete. Feel free to reopen if there is still interest in this.

FYI the error rate is lower now that we are using the CrUX corpus.

