iftechfoundation / ifarchive-ifmap-py
The tool that generates the index files at http://ifarchive.org/ .
It might be nice to include the full pathname of the added entry in the new-additions RSS feed. Currently the title is just the bare filename, and the body is the index description (if any). We could put the path in the title, or maybe as the first line of the description.
We're good on files with URL-escapable characters. (E.g. "Brain_Guzzlers_from_Beyond!.gblorb".) Those work.
However, I'm not sure we've tested:
That third one is a minefield. Linux and MacOS may not agree on how unicode is stored in the filesystem; Apache may have its own ideas. I hear that MacOS 10.13 is changing the filesystem again, too. So that will require very careful testing.
But the first two are definitely achievable.
The index template can generate output like "1 Items" and "1 Subdirectories". Fix.
We could either add hacky map properties like "count1" and "subdircount1" (bools), or build a general tag query syntax {?count=1}.
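As a rough illustration of the first option, the boolean flags could be derived from the existing counts before the template is rendered. This is only a sketch: the function names are hypothetical, and it assumes the template property map is a plain dict holding "count" and "subdircount".

```python
def add_singular_flags(props):
    """Return a copy of a template property map with the (hypothetical)
    'count1' and 'subdircount1' booleans added."""
    out = dict(props)
    out["count1"] = (props.get("count") == 1)
    out["subdircount1"] = (props.get("subdircount") == 1)
    return out


def item_label(props):
    """Pick '1 Item' vs 'N Items' using the flag, as a template might."""
    word = "Item" if props["count1"] else "Items"
    return f"{props['count']} {word}"


props = add_singular_flags({"count": 1, "subdircount": 3})
print(item_label(props))
```

A general query syntax would avoid adding one flag per counter, at the cost of more template-parsing code.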
The upload.py script was hacked out of somebody's example that supported uploading several files at once. I suspect that was implemented with some JS code to create new <input type="file"> entries in the form. This is worth re-creating, I think.
They're always at the top and nobody cares.
If you write an Index entry that looks like
# Games/Dr Ludwig and the Devil.zip
...then ifmap chokes with an error: "Index entry without file". The file exists but ifmap doesn't know to look down the directory tree.
This should be valid, and generate an index page with the correct link.
Note that subdirectory entries with slashes don't have this problem. (See https://ifarchive.org/if-archive/games/competition97/Index .)
Generating two sets of index pages with different directory structures is a bad idea. It makes it hard to write sensible links in Markdown descriptions.
We can make the X-pages into external redirects at this point. I hope.
(It may be easiest to do this with an Apache rewrite config line.)
Currently the Index files have Unicode characters in HTML-escaped form -- &oslash;, &amp;, etc. (But not consistently; there are some bare & as well.)
I would like to change these to literal Unicode characters, declaring that the Index files (and Master-Index) are all UTF-8. Then we can have ifmap.py escape them consistently when generating HTML and XML output.
(I think we're already serving plain text files with a UTF-8 content type header. Check this.)
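A minimal sketch of what consistent escaping at output time could look like, assuming the Index text has already been decoded as UTF-8. The two function names are hypothetical; html.escape and the "xmlcharrefreplace" error handler are standard-library features.

```python
import html


def escape_for_html(text):
    """Escape markup characters; non-ASCII characters stay literal,
    which is fine when the page is served as UTF-8."""
    return html.escape(text)


def escape_for_ascii_output(text):
    """Escape markup characters, then turn any non-ASCII character
    into a numeric character reference."""
    return html.escape(text).encode("ascii", "xmlcharrefreplace").decode("ascii")


line = "S\u00f8ren & friends <beta>"  # \u00f8 is the letter o-with-stroke
print(escape_for_html(line))
print(escape_for_ascii_output(line))
```

Either way, the Index file itself holds only literal characters, and the escaping decision is made in one place per output format.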
(Not actually an issue with this repo, but I have to file it somewhere.)
We get 40-50% cache coverage from CloudFlare. We can probably improve that by turning up the cache lifetime.
I originally tried to balance the cache lifetime against the problem of old cache data lingering after a file was replaced. But now the admin tool has a "clear cache" button so this problem is much reduced.
Footnote: should we also add index pages to the cache now? (With a shorter lifetime, say 24 hours.) It would be fairly simple to cache-bump selected index pages when we update them. On the other hand, it would be a giant pain to cache-bump all of them, which we occasionally have to do. (E.g. when changing a web page template.)
Like file symlinks.
If we print a warning, it'll be visible in the admin tool.
We could shave another 1.4 seconds (25% or so) on Index-only updates if we skipped the date_X.html pages. (They have no metadata or description lists, so they don't need to be touched if only Index files have changed. In particular, the every-file list at date.html is enormous.)
Index-only updates are unfortunately not easy to detect.
Rename {name} to {htmlname} and {rawname} to {name}.
Conditional-check on {desc} (in File-List-Entry) so we don't have to create blank {desc} entries.
This business of sorting by ASCII (so that capitalized files are all on top) is so 20th-century.
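For comparison, here is a small sketch of ASCII order against a case-insensitive order in Python. The filenames are made-up samples, and using str.casefold with the raw name as a tie-breaker is just one possible choice, not necessarily what ifmap.py should adopt.

```python
names = ["Zork.z5", "adventure.z5", "Bureaucracy.z4", "curses.z5"]

# Plain sorted() compares code points, so capitalized names come first.
ascii_order = sorted(names)

# Case-insensitive order; the raw name breaks ties deterministically.
folded_order = sorted(names, key=lambda n: (n.casefold(), n))

print(ascii_order)
print(folded_order)
```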
We don't have to rewrite all 8000-odd index.html files every time we rebuild. Some smart date-checking could reduce that to a few dozen.
(And 16000 metadata files, don't forget those.)
We'd have to check the timestamp of all Index files, as well as all data files. Easy enough as a plan.
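The date check might look roughly like the following. It is a sketch under assumptions: needs_rebuild is a hypothetical helper, and it takes for granted that the caller can list which Index and data files feed a given output page.

```python
import os


def needs_rebuild(output_path, source_paths):
    """Return True if output_path is missing, or older than any source."""
    try:
        out_mtime = os.path.getmtime(output_path)
    except FileNotFoundError:
        return True
    for src in source_paths:
        try:
            if os.path.getmtime(src) > out_mtime:
                return True
        except FileNotFoundError:
            # A source that has vanished also invalidates the page.
            return True
    return False
```

Note that this does not notice a file deleted from a directory unless the directory itself is included among the sources.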
E.g., in https://ifarchive.org/indexes/if-archive/programming/ , ifp, inform6, and inform7 should really be up above.
Master-Index.xml has this info: <symlink type="dir"> so it should be a matter of tweaking ifmap.py logic.
EDIT: Quite a bit of tweaking though. dir.subdirs wants to be a list of real subdirs, no symlinks. Do we include symlinks in subdircount? Etc.
We include MD5 checksums in the Master-Index file, but MD5 is old and busted. Don't drop that, but include a newer checksum algorithm.
Not sure what the current sweet spot is. SHA3 exists in Python. Do all languages support it?
https://en.wikipedia.org/wiki/SHA-3#Comparison_of_SHA_functions
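Whichever algorithm is chosen, Python's hashlib can compute several digests in a single pass over a file. The sketch below is illustrative only: file_checksums is a hypothetical helper, and the default of MD5 plus SHA-512 is a placeholder, not a recommendation. Names such as "sha3_256" can be passed instead.

```python
import hashlib


def file_checksums(path, algorithms=("md5", "sha512"), chunk_size=65536):
    """Return {algorithm_name: hex_digest}, reading the file only once."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as fl:
        # Read in chunks so large archive files are not loaded whole.
        for chunk in iter(lambda: fl.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```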
You can edit the whole Index file at the root, and individual file/subdir entries, but not the "." entry.
Looking at https://ifarchive.org/indexes/if-archive/games/zcode/
The "View contents" link for Apollo18+20.zip isn't URL-encoded.
<a href="https://unbox.ifarchive.org?url=/if-archive/games/zcode/Apollo18+20.zip">View contents</a>
This doesn't work. Gotta be Apollo18%2B20.zip .
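A small sketch of the fix using the standard library's urllib.parse.quote, which leaves "/" alone by default but percent-encodes "+". The unbox_link helper is a hypothetical name, and keeping the slashes unescaped is an assumption based on the link shown above.

```python
from urllib.parse import quote


def unbox_link(archive_path):
    """Build the 'View contents' URL with the path percent-encoded.
    quote() keeps '/' as-is by default and escapes characters like '+'."""
    return "https://unbox.ifarchive.org?url=" + quote(archive_path)


print(unbox_link("/if-archive/games/zcode/Apollo18+20.zip"))
```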
E.g. https://ifdb.org/viewcomp?id=jhpnw6ta74dbak9k .
Currently a tuid line always loads an IFDB game page. They're separate namespaces.
My hacked-up template system wants to be Jinja, and the admin tool has Jinja as a requirement anyhow.
Jinja is probably faster although we should test that.
Nobody likes URLs like http://ifarchive.org/indexes/if-archiveXgamesXcompetition2016.html. Change these everywhere to http://ifarchive.org/indexes/if-archive/games/competition2016.html, creating a shadow directory tree in /indexes with the same structure as the main tree.
It should also create symlinks in the old locations to preserve old links. (Except where the old and new location are identical, e.g. http://ifarchive.org/indexes/if-archive.html.)
There are commands in build-indexes which chmod/chgrp all the index files after creation. These should walk the trees correctly, but I'll have to test that.
CloudFlare? CloudFront (Amazon)? See what's easy to set up.
We would deprecate the old mirror network. Mirrors could continue to operate, but we'd drop the list from the front page and make mirror.ifarchive.org a synonym of the main site.
Then remove robots.txt and see if everything survives. :)
Relying on ls -lR is an old hack. It should just walk the directory tree, looking for Index files.
Then we could get rid of the LC_COLLATE hack, too. (Sort in Python code rather than using ls order.)
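The tree walk could be sketched with os.walk, as below. The find_index_dirs generator is a hypothetical name, and plain sorted() stands in for whatever ordering the index pages end up using.

```python
import os


def find_index_dirs(root):
    """Yield (dirpath, sorted_filenames) for each directory under root
    that contains an Index file. The Index file itself is left out."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Sorting dirnames in place makes the traversal order
        # independent of the filesystem and the locale.
        dirnames.sort()
        if "Index" in filenames:
            yield dirpath, sorted(f for f in filenames if f != "Index")
```

By default os.walk does not descend into symlinked directories, so those would still need separate handling.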
The HTML layout doesn't have a maximum column width. Also the way it handles narrow (phone-size) displays is hacky.
Dan suggested flexbox at one point; that's probably better.
Note that we have a few basic layouts:
Right-hand info column: https://ifarchive.org/
Single column: https://ifarchive.org/misc/about.html
Left-hand info column: https://ifarchive.org/indexes/if-archive/
Doc page (currently uses the admin tool CSS): https://ifarchive.org/misc/org-overview.html