vipulnaik / wikipediaviews
Underlying code of https://wikipediaviews.org with sensitive parts redacted
Home Page: https://wikipediaviews.org
License: Other
When submitting "Cumulative Facebook shares", the numbers for redirects are the same as those of the destination pages because Facebook merges redirects with the actual articles. This is confusing unless you can tell that a page is a redirect.
For instance, submit the following pages:
Quora
Timeline of Quora
at http://wikipediaviews.org/multiplemonths.php (I would post a link but Wikipedia Views can't do this currently).
Notice that the "Cumulative Facebook shares" are the same because the timeline page redirects to the main page.
I would suggest coloring redirects in rgb(255, 137, 33) (#FF8921), which is the color Wikipedia uses in the "Display links to disambiguation pages in orange" gadget.
Redirects can be detected using the MediaWiki API. Compare https://en.wikipedia.org/w/api.php?action=query&titles=Timeline%20of%20Quora&redirects&format=jsonfm
{
"batchcomplete": "",
"query": {
"redirects": [
{
"from": "Timeline of Quora",
"to": "Quora",
"tofragment": "Timeline"
}
],
"pages": {
"26749224": {
"pageid": 26749224,
"ns": 0,
"title": "Quora"
}
}
}
}
with https://en.wikipedia.org/w/api.php?action=query&titles=Quora&redirects&format=jsonfm:
{
"batchcomplete": "",
"query": {
"pages": {
"26749224": {
"pageid": 26749224,
"ns": 0,
"title": "Quora"
}
}
}
}
See https://www.mediawiki.org/wiki/API:Query#Resolving_redirects for more.
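A minimal sketch of how the two responses above could be told apart programmatically. It assumes the response is requested with format=json rather than format=jsonfm (jsonfm wraps the output in HTML for human reading), and the function name `resolve_redirects` is hypothetical, not something in the Wikipedia Views codebase.

```python
import json


def resolve_redirects(api_response_text):
    """Return {redirect title: target title} from an action=query&redirects response.

    An empty dict means none of the requested titles was a redirect.
    """
    data = json.loads(api_response_text)
    redirects = data.get("query", {}).get("redirects", [])
    return {entry["from"]: entry["to"] for entry in redirects}


# The two responses quoted in this issue, as they would look with format=json.
redirect_response = (
    '{"batchcomplete": "", "query": {"redirects": [{"from": "Timeline of Quora", '
    '"to": "Quora", "tofragment": "Timeline"}], "pages": {"26749224": '
    '{"pageid": 26749224, "ns": 0, "title": "Quora"}}}}'
)
plain_response = (
    '{"batchcomplete": "", "query": {"pages": {"26749224": '
    '{"pageid": 26749224, "ns": 0, "title": "Quora"}}}}'
)

print(resolve_redirects(redirect_response))
print(resolve_redirects(plain_response))
```

Any title that appears as a key in the returned mapping could then be rendered in the suggested color.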
Using the "Alternative page specification", only the tag method works. The "category", "user" and "linking page" methods return "There are no pages for the . . .-language combination."
ETA: I only tested this on http://wikipediaviews.org/
The file should test our in-built functions as well as our data fetching routines, and should detect breaks in our ability to read from MediaWiki and stats.grok.se.
The original version of the plotting code is at https://gist.github.com/riceissa/8726e6a90d4a7634f3dd79cc1fdb63d5. It would be good to do one or more of the following:
The repository has no LICENSE file or similar.
Single quotes aren't properly escaped when making SQL insertions, causing some inconsistent behavior. Don't HTML-encode, just escape.
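One way to sidestep manual escaping is to bind values as query parameters and let the database driver handle quotes. The sketch below illustrates that swapped-in technique with Python's sqlite3 module and a hypothetical `pages` table. It is not the project's own schema or database layer, and the equivalent in the actual codebase would depend on the driver it uses.

```python
import sqlite3


def insert_title(conn, title):
    # The "?" placeholder keeps the title out of the SQL text entirely,
    # so a single quote in the title cannot break the statement.
    conn.execute("INSERT INTO pages (title) VALUES (?)", (title,))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (title TEXT)")  # hypothetical table
insert_title(conn, "Hawai'i")
insert_title(conn, "Schrödinger's cat")

stored_titles = [row[0] for row in conn.execute("SELECT title FROM pages ORDER BY rowid")]
print(stored_titles)
```

The titles are stored exactly as given, with no HTML encoding and no doubled quotes.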
Done on most of the important parts of the code, but not fully rolled out.
Instead of using regex matching on the HTML source, use PHP's in-built JSON parser, making the code more robust and not sensitive to formatting and ordering changes.
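To illustrate why a JSON parser is less brittle than regex matching, here is a small Python sketch. The payload shape (a `daily_views` object keyed by date) is an assumption made for illustration, and the function names are hypothetical. The same idea applies to a JSON decoder in any language.

```python
import json


def daily_views(payload_text):
    """Parse a JSON payload and return {date: view count}."""
    payload = json.loads(payload_text)
    return {day: int(count) for day, count in payload["daily_views"].items()}


def monthly_total(payload_text):
    return sum(daily_views(payload_text).values())


# Same data, different key order and whitespace: a regex tuned to one
# layout could miss the other, while the parser treats them identically.
compact = '{"title": "Quora", "daily_views": {"2014-01-01": 120, "2014-01-02": 95}}'
reordered = """{
  "daily_views": {"2014-01-02": 95, "2014-01-01": 120},
  "title": "Quora"
}"""

print(monthly_total(compact), monthly_total(reordered))
```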
There are vertical blue and red lines in the plot, but no explanation of what they are. I would suggest explaining that they show the addition of mobile/spider pageviews and the API switch.
There is currently a normalization option, "Daily average (for days in the month when stats are available)", for HTML output. This option should be available for other output formats as well.
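As a rough sketch of what this normalization computes, assuming missing days are represented as `None` (the function name and data layout are hypothetical, not taken from the codebase):

```python
def daily_average(views_by_day):
    """Average views over only the days for which stats are available.

    Returns None when no day in the month has data.
    """
    available = [count for count in views_by_day.values() if count is not None]
    if not available:
        return None
    return sum(available) / len(available)


# Three days in the month, one of them missing stats.
january = {"2014-01-01": 120, "2014-01-02": None, "2014-01-03": 80}
print(daily_average(january))  # 100.0
```

Keeping the computation in one function independent of the output format would let HTML, CSV and other outputs share it.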
When exporting as CSV, page titles that contain the delimiter character create extra columns that shouldn't exist.
Fix: add delimiter option or quote page titles.
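A minimal sketch of both suggested fixes, assuming the extra columns come from page titles that contain the delimiter. It uses Python's csv module, where the writer quotes such fields automatically and the delimiter is configurable. The helper name `rows_to_csv` is hypothetical.

```python
import csv
import io


def rows_to_csv(rows, delimiter=","):
    """Serialize rows to CSV text, quoting any field that contains the delimiter."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, quoting=csv.QUOTE_MINIMAL)
    writer.writerows(rows)
    return buffer.getvalue()


rows = [["Page", "Views"], ["Washington, D.C.", 12345]]

csv_text = rows_to_csv(rows)
parsed = list(csv.reader(io.StringIO(csv_text)))
print(parsed[1])  # the title stays in one column after a round trip

tab_text = rows_to_csv(rows, delimiter="\t")
print(tab_text)
```

With quoting in place the default comma delimiter is safe. The delimiter option is still useful for consumers that prefer tab-separated output.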
Currently we have pretty much a giant pool of functions split across many files, and it's often not clear what file a given function being called belongs to. This is okay for the current codebase size but is not good software engineering practice. Figure out how to fix this within PHP, otherwise just add comments identifying function sources.
Basically read from pages like https://stats.wikimedia.org/archive/squid_reports/2018-01/SquidReportPageViewsPerCountryBreakdownHuge.htm and output SQL like the one at https://github.com/vipulnaik/wikipediaviews/blob/master/sql/country-language-data.sql
See the announcement at https://blog.wikimedia.org/2018/01/16/wikipedia-rabbit-hole-clickstream/
Folder: https://dumps.wikimedia.org/other/clickstream/
We previously incorporated some clickstream data into WV as a one-off project; however, it was not published regularly at the time. The format may have changed by now. For the historical work, see issa-bae/wikipedia-clickstream/
Tables should be sortable.
I suggest http://tablesorter.com/docs/, which is what gwern.net uses. This requires jQuery.
This should require a 2-3 day concentrated stretch to make sure there is no break in dependencies/compatibility, but it will benefit the codebase by making it more human-readable.
https://meta.wikimedia.org/wiki/Complete_list_of_Wikimedia_projects
Likely column SQL:
project enum('wikipedia','wiktionary','wikiquote','wikinews','wikisource','wikibooks','wikijunior','wikiversity','wikivoyage','wikimedia','wikidata','wikispecies') default 'wikipedia';
The actual implementation logic for the multilingual projects will be a little different.