lahwaacz / arch-wiki-docs
A script to download pages from Arch Wiki for offline browsing
License: GNU General Public License v3.0
How do I use this?
Subpages now have the language tag in both parts (parent page and subpage), following the discussion in help_talk:I18n#Localized_subpages and the update to Help:I18n#Page_titles.
For subpages that have the language tag in both parts, arch-wiki-docs still downloads the subpage correctly, but the subpage is kept in the language root directory under an unrecognizable filename (e.g. "2252df9ed53fcf7fbb097a1d379a8158.html"), and no subdirectory is created to hold the subpages.
Subpages without a language tag in the parent page part are not affected: they receive the correct filename and go to the expected subdirectory.
Is this possible?
Well, this is definitely a weird one...
When updating the arch-wiki-docs package to the latest commit (b6c20fd, from 216a217), I'm getting the following TypeError:
==> Starting prepare()...
Downloading CSS...
ArchWikiOffline.css
Available namespaces:
-2 -- Media
-1 -- Special
0 -- Main
1 -- Talk
2 -- User
3 -- User talk
4 -- ArchWiki
5 -- ArchWiki talk
6 -- File
7 -- File talk
8 -- MediaWiki
9 -- MediaWiki talk
10 -- Template
11 -- Template talk
12 -- Help
13 -- Help talk
14 -- Category
15 -- Category talk
3000 -- DeveloperWiki
3001 -- DeveloperWiki talk
Processing namespace 0...
[skipping] .NET
[downloading] .NET Core (Español)
Traceback (most recent call last):
File "/build/arch-wiki-docs/src/arch-wiki-docs/arch-wiki-docs.py", line 41, in <module>
downloader.process_namespace(ns)
File "/build/arch-wiki-docs/src/arch-wiki-docs/ArchWiki/downloader.py", line 96, in process_namespace
text = self.optimizer.optimize(fname, r.text)
File "/build/arch-wiki-docs/src/arch-wiki-docs/ArchWiki/optimizer.py", line 31, in optimize
self.update_links(root, relbase)
File "/build/arch-wiki-docs/src/arch-wiki-docs/ArchWiki/optimizer.py", line 102, in update_links
href += "#" + fragment
TypeError: unsupported operand type(s) for +=: 'NoneType' and 'str'
arch-wiki-docs/ArchWiki/optimizer.py, lines 80 to 109 in b6c20fd
This is a bit confusing, because one of the first if statements in that function checks that href is not None.
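The traceback shows that href was None at the moment the fragment was appended, so whatever the earlier check does, the value can still be missing by line 102. A minimal sketch of a defensive join follows; join_fragment is a hypothetical helper for illustration and is not part of arch-wiki-docs.

```python
from typing import Optional


def join_fragment(href: Optional[str], fragment: Optional[str]) -> Optional[str]:
    """Append '#fragment' to href, tolerating a missing href or fragment.

    Hypothetical helper: the real update_links() may be structured differently.
    """
    if href is None:
        # Nothing to link to; let the caller decide how to handle the link.
        return None
    if fragment:
        return href + "#" + fragment
    return href
```

Returning None instead of raising lets the caller leave the original link untouched when no local target could be determined.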
For example, en/Syslinux.html has a link "Boot loaders" that should point to en/Category:Boot_loaders.html, but instead it points to the non-existent en/Boot_loaders.html.
The goal is to have the wiki on a USB flash drive while installing Arch Linux, with only a Windows 8 machine available.
Steps:
> git clone ...
> pip install simplemediawiki
> py .\arch-wiki-docs.py
Traceback (most recent call last):
File ".\arch-wiki-docs.py", line 6, in <module>
from simplemediawiki import build_user_agent
File "C:\Python\lib\site-packages\simplemediawiki.py", line 179
print test_api_url
^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print(test_api_url)?
Necessary for compatibility of Arch packages with the C locale.
Categories cannot be handled the same way as regular pages, because the wiki text is usually empty.
Redirect pages are currently not managed, which leaves many broken links.
All links point to file:///title/***, which is not where the redirect pages are located.
version: arch-wiki-docs 20211116-1
Hello,
Running
./arch-wiki-docs.py --output-directory dir --clean
works perfectly, but running
./arch-wiki-docs.py --output-directory ./dir --clean
'clean's and deletes all the files except the CSS.
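One plausible explanation (an assumption, not confirmed by the report) is that the paths of freshly downloaded files and the paths found on disk are compared as raw strings, so "./dir/en/Foo.html" and "dir/en/Foo.html" are treated as different files. A minimal sketch of comparing normalized paths instead; is_expected is a hypothetical name used only for illustration.

```python
import os
from typing import Iterable


def is_expected(path: str, expected_paths: Iterable[str]) -> bool:
    """Return True if path refers to one of expected_paths after normalization.

    Hypothetical helper: os.path.normpath() collapses a leading './' and
    redundant separators, so both spellings compare equal.
    """
    normalized = {os.path.normpath(p) for p in expected_paths}
    return os.path.normpath(path) in normalized
```

With this check, a file would only be considered for deletion when its normalized path is absent from the set of expected files.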
Hello,
Thank you for the free software!
However, I am having some problems using it. First, I had issues with the unmaintained simplemediawiki package: although it was installed via pip3, I still had to run the 2to3 command against it to make it more compliant with Python 3 (for example, it still uses 'print' without parentheses and imports old packages).
Once I got that seemingly working, I tried to run arch-wiki-docs again and I am still receiving errors, the latest of which is:
./arch-wiki-docs.py --output-directory /tmp/
Downloading CSS...
ArchWikiOffline.css
Traceback (most recent call last):
File "./arch-wiki-docs.py", line 39, in <module>
aw.print_namespaces()
File "/home/username/scripts/arch-wiki-docs/ArchWiki/ArchWiki.py", line 152, in print_namespaces
nsmap = self.namespaces()
File "/home/username/scripts/arch-wiki-docs/ArchWiki/ArchWiki.py", line 147, in namespaces
self._namespaces = super().namespaces()
File "/home/username/.local/lib/python3.7/site-packages/simplemediawiki.py", line 271, in namespaces
'siprop': 'namespaces'})
File "/home/username/.local/lib/python3.7/site-packages/simplemediawiki.py", line 149, in call
return json.loads(self._fetch_http(self._api_url, params))
File "/home/username/.local/lib/python3.7/site-packages/simplemediawiki.py", line 124, in _fetch_http
response = self._opener.open(request)
File "/usr/lib/python3.7/urllib/request.py", line 523, in open
req = meth(req)
File "/usr/lib/python3.7/urllib/request.py", line 1254, in do_request_
raise TypeError(msg)
TypeError: POST data should be bytes, an iterable of bytes, or a file object. It cannot be of type str.
So I guess my question is: Is arch-wiki-docs currently in a working state? I just want to know before I try to fix more issues just to see another one pop up.
Thanks!
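The TypeError in the traceback above comes from urllib in Python 3, which requires the POST body to be bytes rather than str. A minimal sketch of preparing such a request with the standard library; build_post_request and the example URL are assumptions for illustration and do not reflect how simplemediawiki is actually structured.

```python
import urllib.parse
import urllib.request
from typing import Mapping


def build_post_request(api_url: str, params: Mapping[str, str]) -> urllib.request.Request:
    """Build a POST request whose body is URL-encoded and encoded to bytes."""
    # urlencode() returns str; Python 3's urllib needs bytes for POST data.
    data = urllib.parse.urlencode(params).encode("utf-8")
    return urllib.request.Request(api_url, data=data)


# Example parameters mirroring the namespaces query seen in the traceback.
request = build_post_request(
    "https://example.invalid/api.php",  # placeholder URL, not the real endpoint
    {"action": "query", "meta": "siteinfo", "siprop": "namespaces"},
)
```

Constructing the Request does not perform any network access; the request would only be sent when passed to an opener.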
I get an error when building the arch-wiki-docs package, but I'm not sure whether the problem is on ArchWiki or in the software.
$ LC_ALL=C makepkg -fsC
==> Making package: arch-wiki-docs 20170822-1 (Wed Nov 22 21:49:44 -02 2017)
==> Checking runtime dependencies...
==> Checking buildtime dependencies...
==> Retrieving sources...
-> Updating arch-wiki-docs git repo...
Fetching origin
==> Validating source files with md5sums...
arch-wiki-docs ... Skipped
==> Removing existing $srcdir/ directory...
==> Extracting sources...
-> Creating working copy of arch-wiki-docs git repo...
Cloning into 'arch-wiki-docs'...
done.
Switched to a new branch 'makepkg'
==> Starting prepare()...
Available namespaces:
-2 -- Media
-1 -- Special
0 -- Main
1 -- Talk
2 -- User
3 -- User talk
4 -- ArchWiki
5 -- ArchWiki talk
6 -- File
7 -- File talk
8 -- MediaWiki
9 -- MediaWiki talk
10 -- Template
11 -- Template talk
12 -- Help
13 -- Help talk
14 -- Category
15 -- Category talk
Processing namespace 0...
[downloading] .NET Core
Traceback (most recent call last):
File "arch-wiki-docs.py", line 31, in <module>
downloader.process_namespace(ns)
File "/home/rffontenelle/builds/arch-wiki-docs/src/arch-wiki-docs/ArchWiki/downloader.py", line 87, in process_namespace
self.cb_download(fullurl, fname)
File "/home/rffontenelle/builds/arch-wiki-docs/src/arch-wiki-docs/ArchWiki/optimizer.py", line 23, in optimize_url
self.optimize(urllib.request.urlopen(url), fout)
File "/home/rffontenelle/builds/arch-wiki-docs/src/arch-wiki-docs/ArchWiki/optimizer.py", line 42, in optimize
self.fix_footer()
File "/home/rffontenelle/builds/arch-wiki-docs/src/arch-wiki-docs/ArchWiki/optimizer.py", line 138, in fix_footer
f_list = self.root.cssselect("#f-list")[0]
IndexError: list index out of range
==> ERROR: A failure occurred in prepare().
Aborting...
Arch Linux has announced its newest public service: a manual page indexing site at man.archlinux.org that publishes the man pages of all packages.
I hope these manual HTML pages can be added to arch-wiki-docs.
Cheers.
Title says it all.
Pages not existing on ArchWiki should be removed from the local output directory.
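A minimal sketch of how stale files could be detected, assuming the downloader knows the relative paths of all pages that currently exist on the wiki. The name find_stale_files and the idea of passing the expected paths as a list are assumptions for illustration, not the project's actual design.

```python
import os
from typing import Iterable, List


def find_stale_files(output_dir: str, expected_relpaths: Iterable[str]) -> List[str]:
    """List files under output_dir that are not among expected_relpaths.

    expected_relpaths are paths relative to output_dir (hypothetical input).
    """
    expected = {os.path.normpath(p) for p in expected_relpaths}
    stale = []
    for dirpath, _dirnames, filenames in os.walk(output_dir):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.normpath(os.path.relpath(full, output_dir))
            if rel not in expected:
                stale.append(full)
    return sorted(stale)
```

The function only reports candidates; the caller could then pass each returned path to os.remove(), ideally after excluding files such as the CSS that are not wiki pages.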
The idea is to simply have a dark_mode.css file in the same directory as ArchWikiOffline.css and a button on each page to toggle between light and dark mode.
I am not suggesting to try and copy what the live wiki site does, but rather just append some lines of code to each file.
I have extracted and written the CSS file needed and a JS function to do that with no page reloads. The code also saves the preference to local storage, so it will persist over closing and opening the browser.
Unfortunately, I don't have the time currently to get into the project and make a pull request. I will provide the CSS and JS below for anyone who wants to undertake this.
*.html
<head>
...
<!-- Link to the stylesheet location -->
<link rel="stylesheet" type="text/css" href="dark_mode.css" id="darkModeLink">
...
</head>
...
<!-- The button that will toggle the mode -->
<button id="toggleDarkLightModeButton" onclick="toggleDarkLightMode()"></button>
...
<!-- Keep the script after the elements it references so that they exist when it runs -->
<!-- The defer attribute has no effect on inline scripts; another way to wait for the elements is document.addEventListener('DOMContentLoaded', () => {}) -->
<script>
// reference to the link tag
const darkModeLinkTag = document.getElementById('darkModeLink');
// load the preference of the user; the dark stylesheet stays disabled unless dark mode was stored
darkModeLinkTag.disabled = localStorage.getItem('wantsDarkMode') !== 'true';
// reference to the toggle button
const toggleDarkLightModeButton = document.getElementById('toggleDarkLightModeButton');
// set the initial text of the button based on the state of the tag
toggleDarkLightModeButton.innerText = darkModeLinkTag.disabled ? "Dark Mode" : "Light Mode";
function toggleDarkLightMode() {
// check for the current state of the link tag
if (darkModeLinkTag.disabled) {
// enable the link tag and update the button text to say that by clicking it you will go to light mode
darkModeLinkTag.disabled = false;
toggleDarkLightModeButton.innerText = "Light Mode";
} else {
// disable the link tag and update the button text to say that by clicking it you will go to dark mode
darkModeLinkTag.disabled = true;
toggleDarkLightModeButton.innerText = "Dark Mode";
}
// store the preference of the user
localStorage.setItem('wantsDarkMode', darkModeLinkTag.disabled ? 'false' : 'true');
}
</script>
dark_mode.css
body
{
color: white;
background-color: black;
}
.sidebar-toc
{
background-color: #161617;
}
#footer, .mw-footer li
{
color: #bbbbbb;
}
body.skin-vector div.mw-page-container, #content
{
background-color: black;
}
#content pre:not([class*="CodeMirror"]), #content code, #content tt
{
background-color: #131314;
}
.wikitable, .wikitable > * > tr > th
{
background-color: #0d0d0c;
}
td[data-sort-value="1"]
{
background-color: #f44 !important;
}
td[data-sort-value="3"]
{
background-color: #ff4 !important;
}
td[data-sort-value="5"]
{
background-color: #4d4 !important;
}
td[data-sort-value]
{
color: #2a2a2a !important;
}
#content table, #content h1, #content h2, #content h3, #content h4, #content h5, #content pre, #content code, #content tt
{
color: #cccccc;
}
.catlinks
{
background-color: #181819;
}
Could you please include the pretty blue CSS from the Arch Wiki?
Hi,
Some distros (Fedora, CentOS) don't have Python 3 as the default yet.
Please specify python3 in the shebang line so that the proper Python version is chosen.
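A minimal sketch of what the top of the script could look like: an explicit python3 shebang plus a runtime guard for the case where the file is still started with an older interpreter. The require_python3 helper is hypothetical and only illustrates the idea.

```python
#!/usr/bin/env python3
import sys


def require_python3(version_info=sys.version_info):
    """Exit with a readable message when not running under Python 3."""
    if tuple(version_info)[:1] < (3,):
        sys.exit("This script requires Python 3.")


require_python3()
```

The guard is deliberately written with syntax that Python 2 can also parse, so the message is shown instead of a SyntaxError.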
Unless MediaWiki:Common.css is downloaded, some templates will look like this:
https://wiki.archlinux.org/index.php?action=render&title=Help:Reading#Pseudo-variables_in_code_examples
Can you please add a simple README.md file preferably with installation/dependency instructions?
I'm getting an error when I try to run arch-wiki-docs.py.
I already ran these commands before (found on arch package page):
sudo pip install cssselect
sudo pip install lxml
sudo pip install simplemediawiki
I'm on Ubuntu 14.04.
Error from python3 arch-wiki-docs.py:
Traceback (most recent call last):
File "arch-wiki-docs.py", line 6, in <module>
from simplemediawiki import build_user_agent
ImportError: No module named 'simplemediawiki'
sudo pip install simplemediawiki
Requirement already satisfied (use --upgrade to upgrade): simplemediawiki in /usr/local/lib/python2.7/dist-packages
Cleaning up...
Thanks for arch-wiki-docs.
I installed python2-simplemediawiki and python2-kitchen,
then:
$python2 ./arch-wiki-docs.py
Traceback (most recent call last):
File "./arch-wiki-docs.py", line 8, in <module>
import ArchWiki
File "/home/archives/System/Distro/Arch/arch-wiki-docs-git/ArchWiki/__init__.py", line 3
SyntaxError: Non-ASCII character '\xc3' in file /home/archives/System/Distro/Arch/arch-wiki-docs-git/ArchWiki/__init__.py on line 3, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
There are some images in the File namespace, used in Template:Expansion etc. They should be included in the next release.
The simple index page should list available Main Page and Table of Contents pages.
It seems arch-wiki-lite isn't really maintained at the moment. It would be cool if this script could also generate a plaintext version, which would save a lot of space and would not require a browser.
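A minimal sketch of how a plaintext version could be derived from the downloaded HTML using only the Python standard library. The TextExtractor class and html_to_text function are hypothetical names; a real implementation would likely need more careful handling of tables, code blocks and links.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of script and style tags."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style elements.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return "\n".join(self._chunks)


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML document, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Each text node ends up on its own line, which is crude but keeps the sketch short; joining inline elements into paragraphs would be a natural refinement.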