riccardobucco / wikipedia_downloader
Download Wikipedia data dumps
License: MIT License
Description: Some columns might be incompatible with utf-8 (for example: categorylinks.cl_sortkey).
Current behavior: All the columns whose values are strings are decoded with utf-8.
Expected behavior: Values that represent a raw sequence of bytes can't be decoded with utf-8, so those columns should be left undecoded.
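A minimal sketch of one way to get this behavior, assuming values arrive as Python bytes objects. The helper name decode_value and its binary flag are hypothetical and not part of the module:

```python
def decode_value(value, binary=False):
    """Decode bytes as UTF-8 unless the column is flagged as binary.

    `binary` is a hypothetical per-column flag; values that are not bytes
    are returned unchanged.
    """
    if binary or not isinstance(value, bytes):
        return value
    try:
        return value.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: keep the raw bytes instead of failing.
        return value
```

Falling back to the raw bytes keeps the row loadable even when a column has not been flagged as binary in advance.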
Current behavior: When you install the module through pip in a clean virtual environment, and then you try to import it, an error occurs due to a missing dependency (pandas).
Expected behavior: All the required dependencies should be automatically installed.
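With setuptools, runtime dependencies are usually declared through install_requires so that pip installs them together with the package. The sketch below assumes the project is packaged with a setup.py; the name and version values are placeholders, not taken from the project:

```python
# setup.py (sketch; name and version are placeholders)
SETUP_KWARGS = {
    "name": "wikipedia_downloader",
    "version": "0.0.0",
    # Declaring pandas here lets pip install it automatically.
    "install_requires": ["pandas"],
}

if __name__ == "__main__":
    # Imported here so the metadata above can be inspected without setuptools.
    from setuptools import setup

    setup(**SETUP_KWARGS)
```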
Description: It would be nice to have the possibility to specify the dtypes of each column of the dataframe returned by the get_dataframe function.
Current behavior: The get_dataframe function automatically infers the dtype of each column.
Expected behavior: The user should be able to specify the dtypes. There could also be some predefined dtypes, based on the table and column names.
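One possible shape for this, sketched under assumptions: a table of predefined dtypes keyed by table and column name, with user-supplied dtypes taking precedence. The PREDEFINED_DTYPES entries and the resolve_dtypes helper are hypothetical and only illustrate the idea:

```python
# Hypothetical defaults keyed by (table, column); not taken from the module.
PREDEFINED_DTYPES = {
    ("page", "page_id"): "int64",
    ("page", "page_title"): "string",
}


def resolve_dtypes(table, columns, user_dtypes=None):
    """Return a column -> dtype mapping for the columns with a known dtype.

    User-specified dtypes override the predefined ones; columns with neither
    are omitted so that their dtype can still be inferred.
    """
    user_dtypes = user_dtypes or {}
    resolved = {}
    for column in columns:
        if column in user_dtypes:
            resolved[column] = user_dtypes[column]
        elif (table, column) in PREDEFINED_DTYPES:
            resolved[column] = PREDEFINED_DTYPES[(table, column)]
    return resolved
```

A mapping of this form could then be passed to pandas DataFrame.astype, which accepts a dict of column names to dtypes.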
Description: The get_dataframe function doesn't work when the where parameter is not explicitly specified.
Current behavior: When you try to call the function without specifying the where parameter (for example wpd.get_dataframe("en", "page")), you get the error
TypeError: 'NoneType' object is not iterable
Expected behavior: The absence of the where parameter means that no filters have to be applied.
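The error suggests that a default of None is being iterated over. A minimal sketch of the usual fix follows, assuming for illustration that where maps column names to required values; the real format of where and the helper filter_rows are assumptions, not the module's API:

```python
def filter_rows(rows, where=None):
    """Return the rows matching every condition in `where`.

    A missing `where` is treated as "no filters" instead of iterating
    over None.
    """
    conditions = where if where is not None else {}
    return [
        row
        for row in rows
        if all(row.get(column) == value for column, value in conditions.items())
    ]
```

With an empty set of conditions, all() is true for every row, so calling the function without where returns all rows.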