httparchive / bigquery

BigQuery import and processing pipelines

Ruby 0.12% Shell 1.15% Java 1.09% Jupyter Notebook 96.98% Python 0.63% JavaScript 0.04%

bigquery's Introduction

The HTTP Archive tracks how the Web is built

!! Important: This repository is deprecated. Please see HTTPArchive/httparchive.org for the latest development !!

This repo contains the source code powering the HTTP Archive data collection.

What is the HTTP Archive?

Successful societies and institutions recognize the need to record their history - this provides a way to review the past, find explanations for current behavior, and spot emerging trends. In 1996 Brewster Kahle realized the cultural significance of the Internet and the need to record its history. As a result he founded the Internet Archive which collects and permanently stores the Web's digitized content.

In addition to the content of web pages, it's important to record how this digitized content is constructed and served. The HTTP Archive provides this record. It is a permanent repository of web performance information such as size of pages, failed requests, and technologies utilized. This performance information allows us to see trends in how the Web is built and provides a common data set from which to conduct web performance research.

bigquery's People

Contributors

dependabot[bot], igrigorik, malchata, rreverser, rviscomi, tomayac, tunetheweb


bigquery's Issues

Automate CDN HTTPS cert renewal

HTTPArchive/httparchive.org#14 documents the steps needed to renew the cert.

The certificate currently has a 3 month expiration. The next expiration date is April 9, 2018. Ensure that it is renewed before then and ideally automate the process.

I'm assigning this issue to the bigquery repo because I think it makes the most sense to have a cron job on the GCE instance doing the automation, as opposed to the GAE web server itself.

blink_features.usage has null rank column

Since we have this column, can we populate it with the new CrUX ranking? It's confusing not to have it in here, makes joins more difficult, and means you need an extra join to the summary_pages table to get the ranking.

@rviscomi / @pmeenan I'm not sure what populates this table, so where would this change need to be made?
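
In the meantime, the workaround looks something like the sketch below: join out to a summary_pages table to recover the rank. Table and column names on the blink_features side are assumptions on my part, not verified against the live schema, and the date is only an example.

#standardSQL
-- Hedged sketch of the extra join described above; blink_features column names
-- and the example date are assumptions (a yyyymmdd/client filter would also be
-- needed in practice).
SELECT
  f.feature,
  sp.rank,
  COUNT(DISTINCT f.url) AS num_urls
FROM
  `httparchive.blink_features.features` AS f
JOIN
  `httparchive.summary_pages.2022_01_01_mobile` AS sp
ON
  f.url = sp.url
GROUP BY
  f.feature,
  sp.rank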

Duplicates in BigQuery

Hi folks,

There are too many duplicates in the technology tables. For example, in many rows WordPress is stored as both CMS and Blog for the same URL. Some JavaScript libraries are likewise stored as both JavaScript Framework and JavaScript Library. In these duplicates, only the category name differs. I think these duplicates should be removed in the next releases (at least).
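
For reference, a hedged sketch of how to surface these duplicates (column names follow the public technologies tables; the date is just an example):

#standardSQL
-- List URL/app pairs reported under more than one category.
SELECT
  url,
  app,
  ARRAY_AGG(DISTINCT category ORDER BY category) AS categories
FROM
  `httparchive.technologies.2022_01_01_mobile`
GROUP BY
  url,
  app
HAVING
  COUNT(DISTINCT category) > 1
LIMIT
  10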

Generate new tables for JS library results

We had a scratch space table created for ad hoc analysis of the JS library results, but we need something more permanent.

Similar to the Lighthouse tables, generate a new table for JS libraries.

Make it easier to automatically generate CrUX reports for YYYYMM-1 dates

Example: in early June the May CrUX dataset is available under the YYYYMM format of 201805. When we generate the reports in June the YYYYMM value is 201806, whose corresponding dataset is not yet available. So we need a better way to generate CrUX reports for YYYYMM-1 automatically.

Maybe after the last sync_har/sync_csv job completes, we run generate_reports.sh -h YYYYMM as well as something like generate_report.sh -d YYYYMM-1/crux[fp,fcp,dcl,ol].json to generate each CrUX report individually for the previous month.
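
For what it's worth, the YYYYMM-1 arithmetic itself is trivial to express in SQL, e.g. as a sketch:

#standardSQL
-- Resolve the previous month's CrUX release name (e.g. returns 201805 when run in June 2018).
SELECT
  FORMAT_DATE('%Y%m', DATE_SUB(DATE_TRUNC(CURRENT_DATE(), MONTH), INTERVAL 1 MONTH)) AS crux_yyyymm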

Monthly batch jobs hosted under older user directory

The monthly batch jobs are still hosted in /home/igrigorik/code and the cron runs under that user (though connecting to a different BigQuery user).

Additionally, this is currently not authenticated to GCP:

Looks like it was using your account last, @rviscomi, but I obviously can't re-authenticate that:

igrigorik@worker:~/code$ bq show "httparchive:pages.2021_12_01_desktop"
ERROR: (bq) Your current active account [[email protected]] does not have any valid credentials

We need to fix the BigQuery authentication issue before the January run finishes next month, or it won't process the pipeline nor run the reports, as they are triggered by this cron.

Longer term we should probably also move these out of the /home/igrigorik/code directory and run the cron under a generic account on the server (create an httparchive user?), ideally with an equivalent BigQuery account it can use that won't expire.

Clean up errors

I hate regular errors in log files. They make it too easy to miss real errors and make it difficult for new people to support something, as they don't know whether these are expected errors or something has gone wrong.

Currently a number of SQL queries cannot run, including:

  • CrUX histograms are based on CrUX data rather than the HTTP Archive crawl, so they fail because the data is usually missing - see HTTPArchive/httparchive.org#306
  • Some of the SQL queries do not work for lenses - for example, the Blink Usage queries for the Capabilities report do not have URL-level data to apply lenses to.

There are a few things we could do here:

  • Move reports like CrUX into a separate folder so they can be run separately
  • Fix queries so they do run (e.g. CrUX could be changed to look at the previous month's data, and the Blink Usage reports could add a dummy URL column so at least the query doesn't fail, even if it doesn't return data)
  • Add exclusion functionality so certain reports do not run for certain dates or lenses

Any thoughts?

Deduplicate generate_report.sh and generate_reports.sh

The only difference between the scripts is that _reports iterates through the histograms/timeseries directories and queries each SQL file.

Rewrite _reports so it simply calls _report for each metric, passing through all of the flags. This way all of the query/lens/storage logic is in one script.

Investigate why EOM report generation runs multiple days in a row

Each big blue spike is the BigQuery analysis cost, which coincides with the end of the month when we generate reports for httparchive.org.

image

The most troubling thing to me is not so much the height of the bars (a lot of money) but that they seem to be repeating unnecessarily over consecutive days. Report generation should happen once after the data is ready and subsequent cron jobs should see that it's already been done and stop. That doesn't seem to be happening.

Reports have not generated for January 2022

So the January reports have not run. This happens every so often and I ran them manually, but it's bugged me, and I think I've finally figured it out.

We run the following in the cron:

$ crontab -l
0 15 * * * /bin/bash -l -c 'cd /home/igrigorik/code && ./sync_csv.sh `date +\%b_1_\%Y`'  >> /var/log/HAimport.log 2>&1
0  8 * * * /bin/bash -l -c 'cd /home/igrigorik/code && ./sync_csv.sh mobile_`date +\%b_1_\%Y`'  >> /var/log/HAimport.log 2>&1
0 10 * * * /bin/bash -l -c 'cd /home/igrigorik/code && ./sync_har.sh chrome' >> /var/log/HA-import-har-chrome.log 2>&1
0 11 * * * /bin/bash -l -c 'cd /home/igrigorik/code && ./sync_har.sh android' >> /var/log/HA-import-har-android.log 2>&1

The CSV jobs generate the summary tables and then attempt to run the reports if all the other data is there.
The HAR jobs generate the non-summary tables and then attempt to run the reports if all the other data is there.

So the last job to upload the data should run the reports, because at that point all 4 sets of tables are there.
The other 3 jobs only do the imports and fail on the report generation as not all the tables are there.

Running this shows the completion date of each upload:

	bq show "httparchive:summary_pages.${YYYY_MM_DD}_desktop" | head -5
	bq show "httparchive:summary_pages.${YYYY_MM_DD}_mobile" | head -5
	bq show "httparchive:pages.${YYYY_MM_DD}_desktop" | head -5
	bq show "httparchive:pages.${YYYY_MM_DD}_mobile" | head -5

Which is summarised below:

dataset                                        completed
httparchive:summary_pages.2022_01_01_desktop   19 Jan 01:04:59
httparchive:summary_pages.2022_01_01_mobile    25 Jan 22:16:00
httparchive:pages.2022_01_01_desktop           24 Jan 16:54:34
httparchive:pages.2022_01_01_mobile            25 Jan 07:16:24

The last job to complete was the mobile summary_pages import, so it should have kicked off the reports.

However the logs show this:

Attempting to generate reports...
The BigQuery tables for 2022_01_01_mobile are not available.

This is because the date passed to the sql/generate_reports.sh script is 2022_01_01_mobile instead of 2022_01_01. This is due to a bug in the sync_csv.sh script that sets this to the _date_client (for other reasons in the script).

The net effect is that if the mobile CSV/summary_pages import finishes last, the reports are not generated automatically; if any of the other tables finish last, they are.

Will submit a fix for this, and rerun the reports.

Hopefully this whole hacky script will be rewritten soon, but this is a simple fix for now.

Support manual backfilling with sync_har.sh of non-standard dates

The 12/1 test batch didn't actually run until 12/2, so things got a bit screwed up and 12/1 never appeared in BigQuery. Manually running the sync_har.sh script doesn't work because it only expects the standard [1, 15] dates.

  • support non-standard dates
  • support mapping the BigQuery table name to a standardized date

Context

Add field comparable to firstHtml to the har.request tables

The runs.request tables include a firstHtml field to indicate that the request is for the parent document.

Queries on the har.request tables must join on the corresponding runs table to get this info. There are tens of millions of requests in each table, so the join is expensive.

To simplify queries and make them less expensive, add a boolean field comparable to firstHtml to the har.request tables. It should share the same logic as the runs table: the first 200 response with an HTML mime type.
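
As a hedged sketch, the intended logic expressed as a query over the HAR payload (JSON paths follow the standard HAR entry format; the table name is only illustrative):

#standardSQL
-- Flag the first 200 response with an HTML mime type on each page.
SELECT
  page,
  url,
  ROW_NUMBER() OVER (
    PARTITION BY page
    ORDER BY JSON_EXTRACT_SCALAR(payload, '$.startedDateTime')
  ) = 1 AS firstHtml
FROM
  `httparchive.requests.2018_12_01_mobile`
WHERE
  SAFE_CAST(JSON_EXTRACT_SCALAR(payload, '$.response.status') AS INT64) = 200
  AND JSON_EXTRACT_SCALAR(payload, '$.response.content.mimeType') LIKE '%html%'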

response_bodies.2018_12_01_mobile missing data

The 12/1 mobile table is much smaller and missing a lot of data compared to the previous crawl and the current desktop crawl.

2018_11_15_mobile: 35,441,289 rows, 1.41 TB
2018_12_01_mobile: 18,084,199 rows, 152 GB

2018_11_15_desktop: 45,975,086 rows, 2.00 TB
2018_12_01_desktop: 46,284,186 rows, 2.01 TB

I just reran the 12/1 mobile HAR dataflow pipeline and it produced identical results.

cc @jeffposnick

Truncate request_bodies at 10 MB

The row limit seems to be 10 MB, not 2 MB per this error message: Row size is larger than: 10485760

If that's the case, we can raise the ceiling on request (response?) bodies rows.
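
A hedged way to sanity-check this before changing the pipeline is to look at how close the stored bodies currently come to the ceilings (column names assumed from the response_bodies tables; the date is just an example):

#standardSQL
-- Largest stored body and how many rows sit at or above the old 2 MB mark.
SELECT
  MAX(BYTE_LENGTH(body)) AS largest_body_bytes,
  COUNTIF(BYTE_LENGTH(body) >= 2 * 1024 * 1024) AS at_or_over_2mb
FROM
  `httparchive.response_bodies.2018_12_01_desktop`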

Mobile HAR pipeline failing

        [...]
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
	at com.httparchive.dataflow.BigQueryImport$DataExtractorFn.processElement(BigQueryImport.java:222)

https://github.com/HTTPArchive/bigquery/blob/master/dataflow/java/src/main/java/com/httparchive/dataflow/BigQueryImport.java#L222

This code processes LH results. I think in the latest LH release there were some changes to the JSON LHR, so I'll have to update this.

(PS: Yay for Stackdriver error notifications!)

`httparchive.urls.*` tables schema change

The schema of the httparchive.urls.* tables seems to have changed from…

I used to be able to quickly get historical ranks by querying httparchive.urls.* and extracting the date as the _TABLE_SUFFIX, but this is now no longer possible. Was this announced anywhere? If so, I missed it and also can't find it now.
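
For context, the kind of query that used to work looked roughly like the sketch below (the rank and url column names are assumed from the description above):

#standardSQL
-- Historical ranks for one URL, with the crawl date taken from the table suffix.
SELECT
  _TABLE_SUFFIX AS date,
  rank,
  url
FROM
  `httparchive.urls.*`
WHERE
  url = 'https://www.example.com/'
ORDER BY
  date DESC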

Pages table has two entries for each URL

The httparchive:runs.2016_09_15_pages table seems to have two entries for each page:

SELECT rank, url FROM [httparchive:runs.2016_09_15_pages] 
WHERE rank > 0
ORDER BY rank ASC
LIMIT 10
>>
1   1   http://www.google.com/   
2   1   http://www.google.com/   
3   2   http://www.youtube.com/      
4   2   http://www.youtube.com/      
5   3   http://www.facebook.com/     
6   3   http://www.facebook.com/     
7   4   http://www.baidu.com/    
8   4   http://www.baidu.com/    
9   5   http://www.yahoo.com/    
10  5   http://www.yahoo.com/

SELECT count(rank) FROM [httparchive:runs.2016_09_15_pages] 
>> 980874
SELECT count(DISTINCT rank) FROM [httparchive:runs.2016_09_15_pages] 
>> 471366

If this is intentional, it would be nice to note it in http://httparchive.org/about.php#testchanges
PS: Apologies if this is not the right place to file the issue.
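
Until the root cause is fixed, a hedged workaround for analysis is to keep one row per URL, e.g. (written in standard SQL rather than the legacy syntax above):

#standardSQL
-- Keep a single row per URL from the duplicated pages table.
SELECT
  rank,
  url
FROM (
  SELECT
    rank,
    url,
    ROW_NUMBER() OVER (PARTITION BY url ORDER BY rank) AS rn
  FROM
    `httparchive.runs.2016_09_15_pages`)
WHERE
  rn = 1
ORDER BY
  rank
LIMIT
  10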

BigQuery import is paused due to a DataFlow SDK issue

@igrigorik said:

@RByers there was a regression in latest DataFlow SDK so I paused the automated imports... I can work around it though if its high priority.

We're beginning to rely on this data more and more in blink API owner discussions (e.g. this particular case came up in the context of this intent to remove). So having the data flowing regularly is definitely valuable. In many cases the Feb data is good enough though, so I can't really say it's that urgent (yet).

Apr_15_2018 dir on GCS doesn't exist

The CSV pipeline failed because gs://httparchive/Apr_15_2018/ doesn't exist. This is created at the end of the crawl which should have happened by now since May 1 just started. Still investigating.

cc @pmeenan

Incomplete page resources in httparchive.response_bodies.* tables

The site Wired registers a service worker located at https://www.wired.com/sw.js and references a web app manifest located at https://www.wired.com/manifest.json. While the service worker is available in several runs, the manifest is not:

SELECT
  *
FROM
  `httparchive.response_bodies.*`
WHERE
  url = "https://www.wired.com/manifest.json"
  OR url = "https://www.wired.com/sw.js"

Web app manifests are generally indexed, as can be seen in this quick test:

SELECT
  url,
  body
FROM
  `httparchive.response_bodies.*`
WHERE
  url LIKE "%/manifest.json"
  AND body LIKE "%\"short_name\":%"
LIMIT
  10

I am not sure what causes the manifest not to be included, but I think Wired's manifest should be indexed as well; the HTML referencing it definitely was indexed:

SELECT
  page,
  url,
  body
FROM
  `httparchive.response_bodies.*`
WHERE
  page = "http://www.wired.com/"
  AND (url = "http://www.wired.com/"
    OR url = "https://www.wired.com/")
  AND body LIKE "%manifest.json%"

(CC: @rviscomi and @jeffposnick)

Too few summary_requests.2019_07_01_desktop rows

In the most recent 2019_07_01 crawl, which we will use for the Almanac, I'm only seeing 240,411,901 rows in the desktop summary_requests table. For reference, the corresponding requests table has 420,510,876 rows. The corresponding mobile tables are much more aligned: 468,544,640 vs 463,862,666.

I'll try rerunning the CSV sync to ensure the Almanac queries are accurate.

image

Clean up processed data after each crawl

Downloading and processing CSV files eats up a lot of disk space and can cause the pipeline to stall if it runs out of space.

We need to clean up the CSVs and processed data when we're done generating the BigQuery tables.

CommandException: Invalid command "application/json".

While running generateReports.sh, I noticed the following error at the end of each metric being generated:

CommandException: Invalid command "application/json".

Ensure that the JSON files are being uploaded to GCS correctly.

Upgrade to the latest Apache Beam SDK version to prevent job disruption

Upgrade to the latest Apache Beam SDK version or add your project to an “allow” list to ensure continuity of current workflow.

Hello Rick,

This is a reminder that we will soon discontinue support for the JSON-RPC protocol and Global HTTP Batch and, as a result, will decommission the following SDK versions on March 31, 2020:

  • Apache Beam SDK for Java, versions 2.4.0 and below (inclusive)
  • Apache Beam SDK for Python, versions 2.4.0 and below (inclusive)
  • Cloud Dataflow SDK for Java, versions 2.4.0 and below (inclusive)
  • Cloud Dataflow SDK for Python, 2.4.0 and below (inclusive)

Timeline for decommissioning:

  1. January 31, 2020 - Deadline to add a project(s) to the “allow” list (see instructions below).
  2. February 2020 - Jobs using the SDKs listed above will start to fail unless added to the allow list. Jobs that have been upgraded to the latest Apache Beam SDK version will not be affected.
  3. March 31, 2020 - Any job still running on the SDKs listed above will fail, even if the project was added to the allow list.

What do I need to know?

Jobs will start failing in February 2020 as a way to notify all users of the requirement to upgrade/migrate affected pipelines to supported SDKs before March 31, 2020. Adding your project to the allow list lets us know that you got the message, and are working on migrating your projects before the March deadline. After March 31, 2020, any job still running on Apache Beam or Cloud Dataflow SDK versions 2.4.0 or earlier will fail.
Your projects listed below will be affected by this change:

  • HTTP Archive (httparchive)

What do I need to do?

  1. To exempt jobs running affected SDKs from failure between February and March 2020, request that the project ID(s) be added to the “allow” list. Requests must be submitted by January 31, 2020.
    If you have a technical account manager (TAM) or a strategic cloud engineer (SCE), contact them directly to have your project(s) added to the allow list. Include the project ID(s) for the job(s) to be exempted.
    If you do not have a TAM or SCE, reply to this email to request a project be added to the allow list. Include the project ID(s) for the job(s) to be exempted.

  2. Migrate your affected jobs to the latest Apache Beam SDK version by March 31, 2020.
    If you have any questions or require assistance, please reply to this email to contact Google Cloud Support.

Thanks for choosing Apache Beam and Cloud Dataflow.

—The Google Apache Beam and Cloud Dataflow Teams

Cookie values missing from recent runs

It seems that starting with the September 2019 run, cookies were stripped from the WebPageTest results due to a change in the required net-log command line arguments, resulting in a message such as "[x bytes were stripped]" being stored in place of ~85% of all cookie values.

The command line argument used in wptagent was updated by @pmeenan. I'm just opening this issue as a reminder to check the cookie data in next month's data.

Create and maintain a 10k-row subset table

Suggested in the HTTP Archive Slack channel:

Was wondering if it makes sense to add a "sample" dataset that contains data for the first ~1000 pages. This way you can easily test out a query on httparchive.latest.response_bodies_desktop using something smaller like httparchive.sample.response_bodies_desktop. I manually create sample datasets for the same reason when working with the larger tables.

having an official 10K subset would make this process cheaper for non-Google folks, and would make it feasible to create an occasional query without hitting the free plan limits

Just need to figure out which tables to subset, how to organize them, and how to keep them updated with the latest release.
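
A hedged sketch of what materializing one of these could look like (the sample dataset name and the plain LIMIT are illustrative, not an agreed design):

#standardSQL
-- Build a small sample table from the corresponding "latest" table.
CREATE OR REPLACE TABLE `httparchive.sample.response_bodies_mobile` AS
SELECT
  *
FROM
  `httparchive.latest.response_bodies_mobile`
LIMIT
  10000

Note that a plain LIMIT still scans the full source table when the sample is built; sampling a fixed set of pages instead would keep the refresh cheap and make the subset consistent across tables.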

Ensure 100% table coverage in BigQuery

https://discuss.httparchive.org/t/missing-2016-02-15-chrome-requests/1310 is a bug report that some 2016_02_15 tables are missing.

We should take inventory of all tables across all dates and reprocess anything that's missing.

This can be a good first bug for first-time contributors. Overview of the expected workflow (a query-based variant of the inventory step is sketched after the list):

  • use the bq command line interface to list the contents of each dataset
  • export results to a spreadsheet
    • graph the results to make it obvious if there are any gaps
  • or write a script to check if any YYYY_MM_[01, 15] tables are missing
    • some early tables are not necessarily DD=[01, 15]
  • ignore tables that are expected to be missing, eg lighthouse.YYYY_MM_DD_desktop, or others missing as a result of known data loss bugs (citation needed)
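
As a hedged sketch of the query-based variant, the __TABLES_SUMMARY__ meta-table (the same one the scheduled "latest" queries rely on) can list what exists per dataset:

#standardSQL
-- List the crawl dates present in one dataset so gaps stand out; repeat per dataset.
SELECT
  SUBSTR(table_id, 0, 10) AS date,
  ARRAY_AGG(table_id ORDER BY table_id) AS tables
FROM
  `httparchive.requests.__TABLES_SUMMARY__`
GROUP BY
  date
ORDER BY
  date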

Reduce size of Lighthouse payload

The latest lighthouse.2018_10_15 table is 237 GB. Querying all lighthouse tables currently processes 4.15 TB and takes several minutes to run.

image

  1. identify parts of the JSON payload that are unnecessary or unlikely to have analytical value and are also significant contributors to the payload size (see the sketch after this list)
  2. modify the Dataflow pipeline to omit these parts of the payload
  3. profit
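
For step 1, a hedged starting point is to surface the heaviest reports and inspect their JSON by hand (this assumes the lighthouse tables expose the raw JSON in a report column):

#standardSQL
-- Find the largest Lighthouse reports in one table.
SELECT
  url,
  ROUND(BYTE_LENGTH(report) / POW(1024, 2), 1) AS report_mb
FROM
  `httparchive.lighthouse.2018_10_15_mobile`
ORDER BY
  report_mb DESC
LIMIT
  20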

Add HTTP Archive to the public datasets in BigQuery

The new experimental UI of BigQuery doesn't seem to allow adding external sources, unless they're part of either the organisation or the catalogue of public datasets.

While, for now, it's possible to switch to the old/stable UI and so use httparchive from BigQuery, we should look into adding HTTPArchive to public datasets to make it accessible in the future too.

Update the "latest" tables from Dataflow

Forked from #76

Currently we use scheduled queries to scan each dataset/client combo for the latest release and save that to its respective latest.<dataset>_<client> table.

For example, here's the scheduled query that generates the latest.response_bodies_mobile table:

#standardSQL
SELECT
  *
FROM
  `httparchive.response_bodies.*`
WHERE
  ENDS_WITH(_TABLE_SUFFIX, 'mobile') AND
  SUBSTR(_TABLE_SUFFIX, 0, 10) = (
  SELECT
    SUBSTR(table_id, 0, 10) AS date
  FROM
    `httparchive.response_bodies.__TABLES_SUMMARY__`
  ORDER BY
    table_id DESC
  LIMIT
    1)

BigQuery usually has some heuristics to help minimize the number of bytes processed by a query if the WHERE clause clearly limits the _TABLE_SUFFIX pseudocolumn to a particular table. But I'm not sure that's happening here, because the estimated cost of this query is over $1,000 (200 TB): "This query will process 202.9 TB when run."

Queries for each dataset/client combo are scheduled to run on the first couple of days of every month. They become more expensive over time as we add new tables to every dataset.

A much more efficient approach would be to overwrite the latest.* tables in the Dataflow pipeline when we create the tables for each release. Rather than updating the deprecated Java pipeline, add this as a feature to #79.
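
In the meantime, a hedged sketch of an interim mitigation is to resolve the latest release outside the query and pin _TABLE_SUFFIX to a literal, so only one table is scanned (the date below would be templated in by whatever schedules the query):

#standardSQL
-- Rebuild one "latest" table while scanning only the newest source table.
CREATE OR REPLACE TABLE `httparchive.latest.response_bodies_mobile` AS
SELECT
  *
FROM
  `httparchive.response_bodies.*`
WHERE
  _TABLE_SUFFIX = '2022_01_01_mobile'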

Investigate 3x increase in response body rows as of 2021_07_01

Now that we've got the first response_bodies data in several months, it's strange to see a steep increase in the number of rows per table despite the table size (TB) not growing by as much: https://datastudio.google.com/u/0/reporting/1jh_ScPlCIbSYTf2r2Y6EftqmX9SQy4Gn/page/5ike

image

Investigate the cause of the increased rows and deduplicate if needed. This table will be used by the 2021 Web Almanac, so it's important to make sure it doesn't introduce any data errors.

A couple of theories to start on (a quick check for both is sketched after the list):

  • Bisecting the HARs results in some null rows
  • Bisecting the HARs results in some duplicate rows
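
A hedged quick check for both theories at once (column names assumed: page, url, body):

#standardSQL
-- Count surplus page/url rows (duplicates) and null/empty bodies in the new table.
SELECT
  COUNT(*) AS total_rows,
  COUNT(*) - COUNT(DISTINCT CONCAT(page, '|', url)) AS surplus_page_url_rows,
  COUNTIF(body IS NULL OR body = '') AS empty_bodies
FROM
  `httparchive.response_bodies.2021_07_01_mobile`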

Making the HTTP2 query cheaper

We have an HTTP/2 requests graph which does a lookup on the $._protocol field in the requests.payload column. This currently scans 211 TB at an estimated cost of $1,058 (yes, one thousand bucks!) and counting, and is re-run every month, which is quite frankly ridiculous. It also takes forever to run and sometimes times out.

I wanted to add an HTTP/3 graph since it's getting out there, but can't justify doubling that cost! While our generous benefactor may be able to absorb that, others can't, and I think we should be setting a better example here.

If we use the summary_requests table and use the reqHttpVersion or respHttpVersion fields (or both!) then the cost plummets to 363 GB, or an estimated $1.77! And the data looks pretty similar (not exactly the same, as requests and summary_requests have slightly different row counts, but close enough).

However, there is an issue: these fields had bad data for a long time (see the relevant WPT issue) and this was only fixed from October 2020. I would prefer to track the growth for longer than that, ideally back to 2015 when HTTP/2 was launched.

So we've a few choices:

  1. Fix up the bad data. Ideally we'd join requests to summary_requests and update the bad reqHttpVersion or respHttpVersion values from the $._protocol field, but I can't figure out how to do that.
  2. Patch the bad data by saying ori:, us:, od:, me: or : / values are effectively HTTP/2. This isn't always the case and there are a small number of HTTP/1.1 connections which give those values, but it's close enough and a lot easier to run this clean-up than option 1 (unless there is a way to join these two tables I'm not seeing?).
  3. Use a hacky SQL query (see below) to patch it at query time instead. Seems a bit of a hack.
  4. Add the protocol column to the summary_requests table and backfill all the old values. Seems like quite an effort.
  5. Wait until we reorganise the tables like we've talked about.
  6. Leave as is and just implement the HTTP/3 query in a cheaper manner.

Thoughts?

#standardSQL
SELECT
  SUBSTR(_TABLE_SUFFIX, 0, 10) AS date,
  UNIX_DATE(CAST(REPLACE(SUBSTR(_TABLE_SUFFIX, 0, 10), '_', '-') AS DATE)) * 1000 * 60 * 60 * 24 AS timestamp,
  IF(ENDS_WITH(_TABLE_SUFFIX, 'desktop'), 'desktop', 'mobile') AS client,
  ROUND(SUM(IF(respHttpVersion = 'HTTP/2'
               OR respHttpVersion = 'ori' -- bad value that mostly means HTTP/2 (parsed incorrectly from :authority:)
               OR respHttpVersion = 'us:' -- bad value that mostly means HTTP/2 (parsed incorrectly from :status:)
               OR respHttpVersion = 'od:' -- bad value that mostly means HTTP/2 (parsed incorrectly from :method:)
               OR respHttpVersion = 'me:' -- bad value that mostly means HTTP/2 (parsed incorrectly from :scheme:)
               OR respHttpVersion = ': /' -- bad value that mostly means HTTP/2 (parsed incorrectly from :path:)
               OR reqHttpVersion = 'HTTP/2'
               OR reqHttpVersion = 'ori' -- bad value that mostly means HTTP/2 (parsed incorrectly from :authority:)
               OR reqHttpVersion = 'us:' -- bad value that mostly means HTTP/2 (parsed incorrectly from :status:)
               OR reqHttpVersion = 'od:' -- bad value that mostly means HTTP/2 (parsed incorrectly from :method:)
               OR reqHttpVersion = 'me:' -- bad value that mostly means HTTP/2 (parsed incorrectly from :scheme:)
               OR reqHttpVersion = ': /' -- bad value that mostly means HTTP/2 (parsed incorrectly from :path:)
             , 1, 0)) * 100 / COUNT(0), 2) AS percent
FROM
  `httparchive.summary_requests.*`
GROUP BY
  date,
  timestamp,
  client
ORDER BY
  date DESC,
  client

Here's the comparison of what that comes back with compared to the current production site:

date timestamp client percent curr_pct diff
2021_05_01 1.6198E+12 desktop 64.55 64.8 0.25
2021_05_01 1.6198E+12 mobile 64.96 65.3 0.34
2021_04_01 1.6172E+12 desktop 68.46 68.6 0.14
2021_04_01 1.6172E+12 mobile 67.47 67.6 0.13
2021_03_01 1.6146E+12 desktop 68.5 68.6 0.1
2021_03_01 1.6146E+12 mobile 68.15 68.3 0.15
2021_02_01 1.6121E+12 desktop 68.15 68.3 0.15
2021_02_01 1.6121E+12 mobile 68.05 68.2 0.15
2021_01_01 1.6095E+12 desktop 67.19 67.3 0.11
        67.5 67.5
2020_12_01 1.6068E+12 desktop 66.75 66.9 0.15
2020_12_01 1.6068E+12 mobile 67.11 67.3 0.19
2020_11_01 1.6042E+12 desktop 65.95 66.1 0.15
2020_11_01 1.6042E+12 mobile 66.24 66.4 0.16
2020_10_01 1.6015E+12 desktop 65.57 65.7 0.13
2020_10_01 1.6015E+12 mobile 65.46 65.6 0.14
2020_09_01 1.5989E+12 desktop 63.52 64.8 1.28
2020_09_01 1.5989E+12 mobile 65.61 64.9 -0.71
2020_08_01 1.5962E+12 desktop 62.53 63.7 1.17
2020_08_01 1.5962E+12 mobile 65.09 63.8 -1.29
2020_07_01 1.5936E+12 desktop 62.23 64.2 1.97
2020_07_01 1.5936E+12 mobile 64.43 64.2 -0.23
2020_06_01 1.591E+12 desktop 61.46 64.4 2.94
2020_06_01 1.591E+12 mobile 62.34 64.5 2.16
2020_05_01 1.5883E+12 desktop 60.63 63.4 2.77
2020_05_01 1.5883E+12 mobile 61.79 63.8 2.01
2020_04_01 1.5857E+12 desktop 59.6 62.2 2.6
2020_04_01 1.5857E+12 mobile 60.6 62.4 1.8
2020_03_01 1.583E+12 desktop 59.79 62.3 2.51
2020_03_01 1.583E+12 mobile 60.68 62.5 1.82
2020_02_01 1.5805E+12 desktop 60.32 63.5 3.18
2020_02_01 1.5805E+12 mobile 60.91 63.1 2.19
2020_01_01 1.5778E+12 desktop 55.1 59.2 4.1
2020_01_01 1.5778E+12 mobile 55.11 59.3 4.19
2019_12_01 1.5752E+12 desktop 54.37 58.9 4.53
2019_12_01 1.5752E+12 mobile 54.27 58.9 4.63
2019_11_01 1.5726E+12 desktop 47.22 58 10.78
2019_11_01 1.5726E+12 mobile 53.51 58.2 4.69
2019_10_01 1.5699E+12 desktop 52.55 57.1 4.55
2019_10_01 1.5699E+12 mobile 52.43 56.9 4.47
2019_09_01 1.5673E+12 desktop 51.8 56.2 4.4
2019_09_01 1.5673E+12 mobile 53.47 56 2.53
2019_08_01 1.5646E+12 desktop 51.4 55.7 4.3
2019_08_01 1.5646E+12 mobile 55.16 55.5 0.34
2019_07_01 1.5619E+12 desktop 51.81 54.9 3.09
2019_07_01 1.5619E+12 mobile 54.53 54.8 0.27
2019_06_01 1.5593E+12 desktop 50.83 53.8 2.97
2019_06_01 1.5593E+12 mobile 50.21 53.3 3.09
2019_05_01 1.5567E+12 desktop 48.16 53.1 4.94
2019_05_01 1.5567E+12 mobile 47.38 52.6 5.22
2019_04_01 1.5541E+12 desktop 45.57 52.3 6.73
2019_04_01 1.5541E+12 mobile 44.19 52 7.81
2019_03_01 1.5514E+12 desktop 48.49 50.6 2.11
2019_03_01 1.5514E+12 mobile 47.34 50.7 3.36
2019_02_01 1.549E+12 desktop 49.63 49.7 0.07
2019_02_01 1.549E+12 mobile 49.79 49.8 0.01
        48.3 48.3
        48.3 48.3
2018_12_15 1.5448E+12 desktop 32.8 47.8 15
2018_12_15 1.5448E+12 mobile 36.73 48.9 12.17
        49.1 49.1
        48.8 48.8
2018_11_15 1.5422E+12 desktop 46.92 48.4 1.48
2018_11_15 1.5422E+12 mobile 46.87 48.4 1.53
2018_11_01 1.541E+12 desktop 46.27 47.8 1.53
        47.5 47.5
2018_10_15 1.5396E+12 desktop 45.53 46.5 0.97
2018_10_15 1.5396E+12 mobile 45.13 46.2 1.07
2018_10_01 1.5384E+12 desktop 45.89 46 0.11
2018_10_01 1.5384E+12 mobile 45.53 45.5 -0.03
2018_09_15 1.537E+12 desktop 45.66 45.8 0.14
2018_09_15 1.537E+12 mobile 45.19 45.2 0.01
2018_09_01 1.5358E+12 desktop 44.83 45 0.17
2018_09_01 1.5358E+12 mobile 44.6 44.6 0
2018_08_15 1.5343E+12 desktop 44.65 44.8 0.15
        44.9 44.9
2018_08_01 1.5331E+12 desktop 44.26 44.4 0.14
2018_08_01 1.5331E+12 mobile 44.61 44.6 -0.01
2018_07_15 1.5316E+12 desktop 43.77 44 0.23
2018_07_15 1.5316E+12 mobile 44.3 44.3 0
2018_07_01 1.5304E+12 desktop 43.42 43.6 0.18
2018_07_01 1.5304E+12 mobile 41.37 41.6 0.23
2018_06_15 1.529E+12 desktop 38.59 38.8 0.21
2018_06_15 1.529E+12 mobile 40.36 40.6 0.24
2018_06_01 1.5278E+12 desktop 38.17 38.2 0.03
2018_06_01 1.5278E+12 mobile 39.9 40.1 0.2
2018_05_15 1.5263E+12 desktop 38.16 38.3 0.14
2018_05_15 1.5263E+12 mobile 39.56 39.7 0.14
2018_05_01 1.5251E+12 desktop 37.94 38 0.06
2018_05_01 1.5251E+12 mobile 39.21 39.4 0.19
2018_04_15 1.5238E+12 desktop 37.59 37.6 0.01
2018_04_15 1.5238E+12 mobile 39.16 39.4 0.24
        37.1 37.1
        38.7 38.7
2018_03_15 1.5211E+12 desktop 36.67 36.8 0.13
2018_03_15 1.5211E+12 mobile 37.82 38 0.18
2018_03_01 1.5199E+12 desktop 35.9 35.9 0
2018_03_01 1.5199E+12 mobile 37.1 37.3 0.2
2018_02_15 1.5187E+12 desktop 35.46 35.5 0.04
2018_02_15 1.5187E+12 mobile 36.39 36.5 0.11
2018_02_01 1.5174E+12 desktop 35.23 35.3 0.07
2018_02_01 1.5174E+12 mobile 35.98 36.1 0.12
2018_01_15 1.516E+12 desktop 33.9 34 0.1
2018_01_15 1.516E+12 mobile 34.69 34.8 0.11
2018_01_01 1.5148E+12 desktop 33.3 33.7 0.4
2018_01_01 1.5148E+12 mobile 34.3 34.7 0.4
2017_12_15 1.5133E+12 desktop 33 33.4 0.4
2017_12_15 1.5133E+12 mobile 34.03 34.4 0.37
2017_12_01 1.5121E+12 desktop 31.92 32.4 0.48
2017_12_01 1.5121E+12 mobile 32.58 33.1 0.52
2017_11_15 1.5107E+12 desktop 31.39 31.8 0.41
        32.6 32.6
2017_11_01 1.5095E+12 desktop 31.11 31.5 0.39
2017_11_01 1.5095E+12 mobile 31.76 32.4 0.64
2017_10_15 1.508E+12 desktop 30.19 30.6 0.41
2017_10_15 1.508E+12 mobile 31.06 31.5 0.44
2017_10_01 1.5068E+12 desktop 29.89 30.2 0.31
2017_10_01 1.5068E+12 mobile 30.54 31.1 0.56
2017_09_15 1.5054E+12 desktop 28.88 29.2 0.32
2017_09_15 1.5054E+12 mobile 29.43 30 0.57
2017_09_01 1.5042E+12 desktop 28.21 0 -28.21
2017_09_01 1.5042E+12 mobile 29 0.1 -28.9
2017_08_15 1.5028E+12 desktop 27.25 0 -27.25
2017_08_15 1.5028E+12 mobile 28.07 0 -28.07
2017_08_01 1.5015E+12 desktop 26.76 0 -26.76
2017_08_01 1.5015E+12 mobile 27.41 0 -27.41
2017_07_15 1.5001E+12 desktop 26.63 26.5 -0.13
2017_07_15 1.5001E+12 mobile 27.02 27.1 0.08
2017_07_01 1.4989E+12 desktop 26.14 26 -0.14
2017_07_01 1.4989E+12 mobile 26.44 26.5 0.06
2017_06_15 1.4975E+12 desktop 25.29 25.2 -0.09
2017_06_15 1.4975E+12 mobile 25.88 26 0.12
2017_06_01 1.4963E+12 desktop 25.05 25 -0.05
2017_06_01 1.4963E+12 mobile 25.47 25.7 0.23
2017_05_15 1.4948E+12 desktop 25.02 24.9 -0.12
2017_05_15 1.4948E+12 mobile 25.29 25.5 0.21
2017_05_01 1.4936E+12 desktop 24.87 23.9 -0.97
2017_05_01 1.4936E+12 mobile 24.49 23.8 -0.69
2017_04_15 1.4922E+12 desktop 25.12 24.9 -0.22
2017_04_15 1.4922E+12 mobile 25.41 25.2 -0.21
2017_04_01 1.491E+12 desktop 24.55 24.7 0.15
2017_04_01 1.491E+12 mobile 24.69 24.9 0.21
2017_03_15 1.4895E+12 desktop 23.78 24 0.22
2017_03_15 1.4895E+12 mobile 23.69 23.9 0.21
2017_03_01 1.4883E+12 desktop 23.4 23.4 0
2017_03_01 1.4883E+12 mobile 23.3 23.4 0.1
2017_02_15 1.4871E+12 desktop 23.07 23.1 0.03
2017_02_15 1.4871E+12 mobile 22.91 23.1 0.19
2017_02_01 1.4859E+12 desktop 22.74 22.8 0.06
2017_02_01 1.4859E+12 mobile 22.85 22.9 0.05
        22 22
2017_01_15 1.4844E+12 mobile 22 22 0
        21.3 21.3
2017_01_01 1.4832E+12 mobile 21.58 21.6 0.02
2016_12_15 1.4818E+12 desktop 19.68 20.9 1.22
        21.3 21.3
        20.7 20.7
        21.2 21.2
2016_11_15 1.4792E+12 desktop 20.54 20.3 -0.24
2016_11_15 1.4792E+12 mobile 20.55 20.6 0.05
2016_11_01 1.478E+12 desktop 20.25 20.3 0.05
2016_11_01 1.478E+12 mobile 19.91 20 0.09
2016_10_15 1.4765E+12 desktop 18.66 18.6 -0.06
2016_10_15 1.4765E+12 mobile 19.37 19.7 0.33
2016_10_01 1.4753E+12 desktop 18.5 18.7 0.2
2016_10_01 1.4753E+12 mobile 19.32 19.5 0.18
2016_09_15 1.4739E+12 desktop 17.11 17.4 0.29
2016_09_15 1.4739E+12 mobile 17.29 17.5 0.21
2016_09_01 1.4727E+12 desktop 16.45 16.5 0.05
2016_09_01 1.4727E+12 mobile 16.66 16.5 -0.16
2016_08_15 1.4712E+12 desktop 16.49 16.5 0.01
2016_08_15 1.4712E+12 mobile 16.4 16.4 0
2016_08_01 1.47E+12 desktop 16.36 16.4 0.04
        16.2 16.2
2016_07_15 1.4685E+12 desktop 15.9 0 -15.9
        0 0
2016_07_01 1.4673E+12 desktop 15.47 0 -15.47
        0 0
2016_06_15 1.4659E+12 desktop 15.16 0 -15.16
        0 0
2016_06_01 1.4647E+12 desktop 13.72 0 -13.72
        0 0
2016_05_15 1.4633E+12 desktop 13.15 0 -13.15
        0 0
2016_05_01 1.4621E+12 desktop 0 0 0
        0 0
2016_04_15 1.4607E+12 desktop 0 0 0
        0 0
2016_04_01 1.4595E+12 desktop 0 0 0
        0 0
2016_03_15 1.458E+12 desktop 0 0 0
        0 0
2016_03_01 1.4568E+12 desktop 0 0 0
2016_03_01 1.4568E+12 mobile 0 0 0
2016_02_15 1.4555E+12 desktop 0 0 0
2016_02_15 1.4555E+12 mobile 0 0 0
2016_02_01 1.4543E+12 desktop 0 0 0
2016_02_01 1.4543E+12 mobile 0 0 0
2016_01_15 1.4528E+12 desktop 0 0 0
2016_01_15 1.4528E+12 mobile 0 0 0
2016_01_01 1.4516E+12 desktop 0 0 0
2016_01_01 1.4516E+12 mobile 0 0 0
