iftechfoundation / ifarchive-ifmap-py

The tool that generates the index files at http://ifarchive.org/ .

Python 80.12% HTML 12.73% Shell 7.15%

ifarchive-ifmap-py's Issues

Master-Index link is illegible if you click on it in Google Chrome

When you click on the link to Master-Index.xml in Google Chrome, it just renders the file using Chrome's default stylesheet, which hides all tag names. It's illegible.

Include the full pathname in the RSS entries.

It might be nice to include the full pathname of the added entry in the new-additions RSS feed. Currently the title is just the bare filename, and the body is the index description (if any). We could put the path in the title, or maybe as the first line of the description.

Support files and directories with Unicode characters in the name

We're good on files with URL-escapable characters. (E.g. "Brain_Guzzlers_from_Beyond!.gblorb".) Those work.

However, I'm not sure we've tested:

Files with & in the name
Directories with space, &, ! in the name
Anything with Unicode characters in the name

That third one is a minefield. Linux and MacOS may not agree on how Unicode is stored in the filesystem; Apache may have its own ideas. I hear that MacOS 10.13 is changing the filesystem again, too. So that will require very careful testing.

But the first two are definitely achievable.
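On the Unicode point: one concrete source of disagreement is that the same visible filename can be stored as different code-point sequences (composed vs. decomposed). A minimal sketch of comparing names in a single normal form follows. `normalize_name` is a hypothetical helper, not something ifmap currently has, and choosing NFC is an assumption.

```python
import unicodedata

def normalize_name(name: str) -> str:
    """Return the NFC form of a filename so that composed and
    decomposed spellings of the same name compare equal."""
    return unicodedata.normalize("NFC", name)

# Two spellings that look identical on screen:
composed = "caf\u00e9.zip"     # 'é' as a single code point
decomposed = "cafe\u0301.zip"  # 'e' followed by a combining acute accent

print(composed == decomposed)                                   # raw strings differ
print(normalize_name(composed) == normalize_name(decomposed))   # normalized forms match
```

This only covers string comparison inside the script. What the filesystem and Apache actually do with such names would still need the careful testing mentioned above.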

Correct plurals

The index template can generate output like "1 Items" and "1 Subdirectories". Fix.

We could either add hacky map properties like "count1" and "subdircount1" (bools), or build a general tag query syntax {?count=1}.
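If the label were computed in Python before it reaches the template, the logic is tiny. This is only a sketch: `count_label` is a hypothetical helper, and how its result would be exposed to the template (as a map property or otherwise) is left open.

```python
def count_label(count: int, singular: str, plural: str) -> str:
    """Return a count followed by the grammatically correct noun form."""
    noun = singular if count == 1 else plural
    return f"{count} {noun}"

print(count_label(1, "Item", "Items"))
print(count_label(3, "Subdirectory", "Subdirectories"))
```

This sidesteps both the "count1" booleans and a general query syntax, at the cost of putting display text in the Python code rather than the template.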

Support multiple file upload

The upload.py script was hacked out of somebody's example that supported uploading several files at once. I suspect that was implemented with some JS code to create new <input type="file"> entries in the form. This is worth re-creating, I think.

Omit ls-lR and Master-Index from the by-date lists

They're always at the top and nobody cares.

Support subdir files

If you write an Index entry that looks like

# Games/Dr Ludwig and the Devil.zip

...then ifmap chokes with an error: "Index entry without file". The file exists but ifmap doesn't know to look down the directory tree.

This should be valid, and generate an index page with the correct link.

Note that subdirectory entries with slashes don't have this problem. (See https://ifarchive.org/if-archive/games/competition97/Index .)

Legacy -X-ified pages should be redirects

Generating two sets of index pages with different directory structures is a bad idea. It makes it hard to write sensible links in Markdown descriptions.

We can make the X-pages into external redirects at this point. I hope.

(It may be easiest to do this with an Apache rewrite config line.)

Can <wbr> tags be used in the index?

Could <wbr> tags be added after slashes in the indexes, to avoid breaking words across lines? It would also avoid the awkward "if-" being on a line by itself.
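A minimal sketch of what this could look like on the generating side, assuming the path is available as a plain string when the page is built. `path_with_breaks` is a hypothetical name, not an existing ifmap function.

```python
import html

def path_with_breaks(path: str) -> str:
    """Escape a path for HTML, then add a <wbr> break opportunity
    after every slash. Escaping happens first so the inserted tags
    are not themselves escaped."""
    return html.escape(path).replace("/", "/<wbr>")

print(path_with_breaks("if-archive/games/zcode"))
```

Because `html.escape` leaves slashes alone, replacing after escaping is safe. The result is markup, so it must not be escaped a second time by the template.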

Correctly escape & in Index files

Currently the Index files have Unicode characters in HTML-escaped form -- &oslash;, &amp;, etc. (But not consistently; there are some bare & as well.)

I would like to change these to literal Unicode characters, declaring that the Index files (and Master-Index) are all UTF-8. Then we can have ifmap.py escape them consistently when generating HTML and XML output.

(I think we're already serving plain text files with a UTF-8 content type header. Check this.)
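A rough sketch of the two halves of this plan, using only the standard library: a one-time conversion of entity-escaped Index text to literal Unicode, and consistent escaping at output time. The function names are hypothetical, and this is not how ifmap.py is currently structured.

```python
import html
from xml.sax.saxutils import escape as xml_escape

def index_text_to_unicode(text: str) -> str:
    """One-time migration: turn HTML entities in Index text into
    literal characters. Bare ampersands are left as they are."""
    return html.unescape(text)

def for_html(text: str) -> str:
    """Escape literal text for inclusion in generated HTML."""
    return html.escape(text)

def for_xml(text: str) -> str:
    """Escape literal text for inclusion in generated XML."""
    return xml_escape(text)

raw = index_text_to_unicode("Br&oslash;derbund &amp; Co & more")
print(raw)
print(for_html(raw))
```

The Index files themselves would then be read and written with `encoding="utf-8"`, so escaping only ever happens in one place.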

Increase CloudFlare cache times?

(Not actually an issue with this repo, but I have to file it somewhere.)

We get 40-50% cache coverage from CloudFlare. We can probably improve that by turning up the cache lifetime.

I originally tried to balance the cache lifetime against the problem of old cache data lingering after a file was replaced. But now the admin tool has a "clear cache" button so this problem is much reduced.

Footnote: should we also add index pages to the cache now? (With a shorter lifetime, say 24 hours.) It would be fairly simple to cache-bump selected index pages when we update them. On the other hand, it would be a giant pain to cache-bump all of them, which we occasionally have to do. (E.g. when changing a web page template.)

Add/delete directory symlinks

Like file symlinks.

Report if an Index file contains a duplicate entry

If we print a warning, it'll be visible in the admin tool.
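A sketch of the check, assuming entry headers are lines that start with "# " followed by the filename (as in the example entry in the "Support subdir files" issue). The real Index parser may have more rules than this, and `find_duplicate_entries` is a hypothetical name.

```python
from collections import Counter

def find_duplicate_entries(index_text: str) -> list[str]:
    """Return the entry names that appear more than once in an Index file.
    Assumes each entry begins with a line of the form '# <filename>'."""
    names = [
        line[2:].strip()
        for line in index_text.splitlines()
        if line.startswith("# ")
    ]
    counts = Counter(names)
    return sorted(name for name, n in counts.items() if n > 1)

sample = "# a.zip\nFirst description.\n# b.zip\n# a.zip\nSecond description.\n"
for name in find_duplicate_entries(sample):
    print(f"Warning: duplicate Index entry: {name}")
```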

More optimization

We could shave another 1.4 seconds (25% or so) on Index-only updates if we skipped the date_X.html pages. (They have no metadata or description lists, so they don't need to be touched if only Index files have changed. In particular, the every-file list at date.html is enormous.)

Index-only updates are unfortunately not easy to detect.

Various template cleanups

Rename {name} to {htmlname} and {rawname} to {name}.

Conditional-check on {desc} (in File-List-Entry) so we don't have to create blank {desc} entries.

Sort files alphabetically

This business of sorting by ASCII (so that capitalized files are all on top) is so 20th-century.
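For illustration, a case-insensitive sort is a one-line change to the sort key. `sort_names` is a hypothetical helper; the original name is used as a tie-breaker so the order stays deterministic when two names differ only in case.

```python
def sort_names(names):
    """Sort filenames alphabetically, ignoring case.
    Ties (names equal except for case) fall back to plain ordering."""
    return sorted(names, key=lambda n: (n.casefold(), n))

files = ["Zork.z5", "advent.z5", "Bureaucracy.z4"]
print(sorted(files))      # plain ordering: capitalized names first
print(sort_names(files))  # alphabetical regardless of case
```

This does not attempt locale-aware collation (accented characters and so on), which would need something beyond `casefold`.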

Smart rewrite

We don't have to rewrite all 8000-odd index.html files every time we rebuild. Some smart date-checking could reduce that to a few dozen.

(And 16000 metadata files, don't forget those.)

We'd have to check the timestamp of all Index files, as well as all data files. Easy enough as a plan.
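The date check could be sketched as follows. This assumes modification times are trustworthy and that we know which source files feed each output page; `needs_rebuild` is a hypothetical helper.

```python
import os

def needs_rebuild(output_path: str, source_paths: list[str]) -> bool:
    """Return True if the output file is missing or older than any
    of the source files (Index files, data files) it depends on."""
    if not os.path.exists(output_path):
        return True
    out_mtime = os.path.getmtime(output_path)
    return any(os.path.getmtime(src) > out_mtime for src in source_paths)
```

A template change would still force a full rebuild, since every page depends on the template; that could be handled by adding the template files to `source_paths`.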

List directory symlinks under Subdirs rather than Files

E.g., in https://ifarchive.org/indexes/if-archive/programming/ , ifp, inform6, and inform7 should really be up above.

Master-Index.xml has this info: <symlink type="dir"> so it should be a matter of tweaking ifmap.py logic.

EDIT: Quite a bit of tweaking though. dir.subdirs wants to be a list of real subdirs, no symlinks. Do we include symlinks in subdircount? Etc.

Add SHA256 or something

We include MD5 checksums in the Master-Index file, but MD5 is old and busted. Don't drop that, but include a newer checksum algorithm.

Not sure what the current sweet spot is. SHA3 exists in Python. Do all languages support it?

https://en.wikipedia.org/wiki/SHA-3#Comparison_of_SHA_functions
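For reference, computing a second digest alongside MD5 costs very little in Python, since both can be fed from a single read of the file. A sketch using SHA-256 from the standard `hashlib` module follows; `file_checksums` is a hypothetical name, and which algorithm to settle on is still the open question above.

```python
import hashlib

def file_checksums(path: str, chunk_size: int = 65536) -> dict[str, str]:
    """Compute MD5 and SHA-256 hex digests in a single pass over the file."""
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large archive files are not loaded into memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
            sha256.update(chunk)
    return {"md5": md5.hexdigest(), "sha256": sha256.hexdigest()}
```

`hashlib.sha3_256` can be added the same way if SHA-3 turns out to be the better choice.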

Edit . button for root page

You can edit the whole Index file at the root, and individual file/subdir entries, but not the . entry.

Incorrect escaping for unbox links

Looking at https://ifarchive.org/indexes/if-archive/games/zcode/

The "view contents" link for Apollo18+20.zip doesn't urlencode.

<a href="https://unbox.ifarchive.org?url=/if-archive/games/zcode/Apollo18+20.zip">View contents</a>

This doesn't work. Gotta be Apollo18%2B20.zip.
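A minimal sketch of the fix using `urllib.parse.quote`, which percent-encodes "+" while leaving the slashes intact. `unbox_link` is a hypothetical helper; the base URL is taken from the link above.

```python
from urllib.parse import quote

def unbox_link(path: str) -> str:
    """Build the unbox URL for an archive path, percent-encoding
    characters such as '+' but keeping '/' as a path separator."""
    return "https://unbox.ifarchive.org?url=" + quote(path, safe="/")

print(unbox_link("/if-archive/games/zcode/Apollo18+20.zip"))
```

The resulting URL would still need HTML escaping when it is written into the href attribute.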

Add a metadata option for IFDB comps

E.g. https://ifdb.org/viewcomp?id=jhpnw6ta74dbak9k.

Currently a tuid line always links to an IFDB game page, but game IDs and competition IDs are separate namespaces.

Switch to Jinja

My hacked-up template system wants to be Jinja, and the admin tool has Jinja as a requirement anyhow.

Jinja is probably faster although we should test that.

Get rid of the X-slash convention

Nobody likes URLs like http://ifarchive.org/indexes/if-archiveXgamesXcompetition2016.html. Change these everywhere to http://ifarchive.org/indexes/if-archive/games/competition2016.html, creating a shadow directory tree in /indexes with the same structure as the main tree.

It should also create symlinks in the old locations to preserve old links. (Except where the old and new location are identical, e.g. http://ifarchive.org/indexes/if-archive.html.)

There are commands in build-indexes which chmod/chgrp all the index files after creation. These should walk the trees correctly, but I'll have to test that.
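To make the mapping concrete, here is a sketch that derives both locations from a directory path and decides whether a legacy link is needed. The rule that the old name is simply the path with each "/" replaced by "X" is inferred from the example URL above, and both function names are hypothetical.

```python
def index_locations(dirpath: str) -> tuple[str, str]:
    """Return (new, old) index page paths, relative to /indexes,
    for an archive directory such as 'if-archive/games'."""
    new = dirpath + ".html"
    old = dirpath.replace("/", "X") + ".html"
    return new, old

def needs_legacy_link(dirpath: str) -> bool:
    """A legacy symlink is only needed when the two locations differ."""
    new, old = index_locations(dirpath)
    return new != old

print(index_locations("if-archive/games/competition2016"))
print(needs_legacy_link("if-archive"))
```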

Begin using a CDN

CloudFlare? CloudFront (Amazon)? See what's easy to set up.

We would deprecate the old mirror network. Mirrors could continue to operate, but we'd drop the list from the front page and make mirror.ifarchive.org a synonym of the main site.

Then remove robots.txt and see if everything survives. :)

make-master-index.py should walk the tree

Relying on ls -lR is an old hack. It should just walk the directory tree, looking for Index files.

Then we could get rid of the LC_COLLATE hack, too. (Sort in Python code rather than using ls order.)
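A sketch of the tree walk with `os.walk`, sorting directory names in Python so the traversal order no longer depends on ls or the locale. It assumes the per-directory files are literally named "Index"; `find_index_files` is a hypothetical name.

```python
import os

def find_index_files(root: str) -> list[str]:
    """Walk the tree under root and return the paths of all Index files,
    visiting subdirectories in sorted order."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Sorting in place controls the order in which os.walk descends.
        dirnames.sort()
        if "Index" in filenames:
            found.append(os.path.join(dirpath, "Index"))
    return found
```

Note that `os.walk` does not follow directory symlinks by default, which is probably the desired behaviour here but is worth confirming.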

Improve CSS layout

The HTML layout doesn't have a maximum column width. Also the way it handles narrow (phone-size) displays is hacky.

Dan suggested flexbox at one point; that's probably better.

Note that we have a few basic layouts:

Right-hand info column: https://ifarchive.org/
Single column: https://ifarchive.org/misc/about.html
Left-hand info column: https://ifarchive.org/indexes/if-archive/
Doc page (currently uses the admin tool CSS): https://ifarchive.org/misc/org-overview.html
