plaidweb / publ

Flexible publishing system for the web

License: MIT License

Python 84.35% Makefile 0.35% HTML 12.35% Shell 0.33% C++ 0.03% JavaScript 0.02% CSS 2.47% Batchfile 0.05% MDX 0.04%

cms site-generator flask-application html-template blog-engine python3 hacktoberfest2020 hacktoberfest hacktoberfest2021

publ's Introduction

Publ

A personal publishing platform. Like a static publishing system, only dynamic.

Motivation

I make a lot of different things — comics, music, art, code, games — and none of the existing content management systems I found quite satisfied my use cases. Either they don't allow enough flexibility in the sorts of content that they can provide, or the complexity in managing the content makes it more complicated than simply hand-authoring a site.

I wanted to bring the best of the classic static web to a more dynamic publishing system; scheduled posts, private posts, category-based templates, and built-in support for image renditions (including thumbnails, high-DPI support, and image galleries). And I want to do it all using simple Markdown files organized in a sensible file hierarchy.

Basic tenets

Containerized web app that's deployable with little friction (hopefully)
Do one thing (present heterogeneous content), do it well (hopefully)
Use external tools for site content editing
Be CDN-friendly
High-DPI images and image sets as first-class citizens
Interoperate with everything that's open for interoperation (especially IndieWeb)

See it in action

The main demonstration site is at https://beesbuzz.biz/ — it is of course a work in progress! The documentation site for Publ itself (which is also a work in progress) lives at https://publ.plaidweb.site/

Operating requirements

I am designing this to work in any WSGI-capable environment with a supported version of Python. This means that it will, for example, be deployable on any shared hosting which has Passenger support (such as Dreamhost), as well as on Heroku, Google AppEngine, S3, or any other simple containerized deployment target.

The file system is the ground truth for all site data, and while it does use a database as a content index, the actual choice of database doesn't matter all that much. A typical deployment will use SQLite, but MySQL, Postgres, Oracle, and Cockroach are also supported.

Developing Publ

In order to develop Publ itself, you'll need to install its dependencies; see the getting started guide for more information. In particular, make sure you have compatible versions of Python and Poetry installed, and, if on Windows, you'll probably need to install the Visual C++ build tools.

As far as developing Publ itself goes, cloning this repository and running ./runTests.sh (Linux/macOS/etc.) or wintests.cmd (Windows) should get you up and running. The runtime manual test suite site lives in tests/ (with the actual site content in content/, templates/ and static/).

For developing CLI functionality, you'll have to override the FLASK_APP environment variable to be test_app.py.

Additional resources

The Publ-site repository stores all of the templates, site content, and configuration for the Publ site.

The Publ-templates-beesbuzz.biz repository provides a stripped-down sample site based on my personal homepage.

Authors

In order of first contribution:


publ's Issues

Fix default atom feed

The root URN comes up as / which makes feedvalidator unhappy.

Investigate caching issue on Dreamhost deployment

Expected Behavior

When a page or any of its dependencies is updated, the change should be reflected after some amount of time.

Current Behavior

On Dreamhost, prior responses seem to be cached indefinitely.

Possible Solution

Probably a matter of setting response.headers['Cache-Control']; see these potential solutions. It is unclear if this caching is occurring in Flask or in Passenger (probably the latter).
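As a rough illustration of the header-setting idea, here is a minimal sketch of a WSGI middleware that adds a Cache-Control header when the application has not set one. The class name, the `max_age` default, and the `demo_app`/`call` helpers are hypothetical and exist only for this example; whether Passenger honours the header on Dreamhost is exactly the open question above.

```python
class CacheControlMiddleware:
    """Wrap a WSGI app and add a Cache-Control header if none is set."""

    def __init__(self, app, max_age=300):
        self.app = app
        self.value = f"public, max-age={max_age}"

    def __call__(self, environ, start_response):
        def patched_start_response(status, headers, exc_info=None):
            # Leave any header the application already chose untouched.
            if not any(name.lower() == "cache-control" for name, _ in headers):
                headers = list(headers) + [("Cache-Control", self.value)]
            return start_response(status, headers, exc_info)

        return self.app(environ, patched_start_response)


def demo_app(environ, start_response):
    """Tiny stand-in application used only to exercise the middleware."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


def call(app):
    """Invoke a WSGI app once and capture its status, headers and body."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app({}, start_response))
    return captured, body
```

With Flask, a wrapper like this would typically be attached as `app.wsgi_app = CacheControlMiddleware(app.wsgi_app)`, which keeps the header logic independent of any individual view.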

Ability to have entry and view links to a specific pagination type

It would be nice to have archive links for entries and view, e.g.

entry.archive(paging='count',template='') where paging is one of day month year count, with optional template arg that specifies the template to use (defaults to '' i.e. index). (If archive=None this has no effect.)

Maybe this could be added to view.link as well although that lacks specificity for which date/reference/etc. to use, so requiring the template to do e.g. view.first.archive() would probably be ideal.

Relatedly, it would be nice if view.type returned the type of pagination (day month year count, or None if no pagination is in effect).

Enhanced path-alias

Currently we can map single specific legacy URLs to a specific entry or to a category template. It would also be great if we could map a legacy URL via some higher-level scripting functionality from the templates directly.

On beesbuzz.biz I have this in my main.py to do a quick-and-dirty version of this:

@app.route('/d/<int:date>.php')
def redirect_date(date):
    return flask.redirect(flask.url_for('category', category='comics', date=date))

but it would be great if this could be supported more directly in Publ itself.

Add support for table of contents rendering

Expected Behavior

There should be some means of signaling that we should render a table of contents in the more-text.

With this mechanism, we would set nesting_level on the HtmlRenderer and also run an HtmlTocRenderer with the same nesting level.

Possible Solution

entry._get_markup() could also take a show_toc=N parameter, which is set to None on entry.text and to a value indicated by a Toc-Level header on the entry itself. Perhaps it should take the minimum of the entry's value and the template's value (so both the template and the entry need to want a TOC for it to display).

The TOC content itself could be wrapped in a <div class="toc"> (or maybe an id with the caveat that it only makes sense to render a single entry's more content on a single page).

Context

Would be great for the manual, among other things.

Publ could really use some actual automated tests

Unit tests where possible, integration tests in general.

Integration tests could be in the form of a test runner which is able to run through all of the test pages to capture their output and compare it with a reference output (with some added stuff for scenario testing).

Improve configuration/deployment management

Configuration is currently handled with a config.py and potentially deploying multiple git repositories to a server, which isn't particularly Heroku-friendly, among other things. To make this work well it'll probably be necessary to make a deployment wrapper project that people can fork or the like, which will add Publ and the site-specific files as git submodules or the like (and write out the config.py file as necessary).

It'd probably be a good idea to move to an actual configuration framework, for that matter, so that a single config file can be provided to both Flask and Publ.

entry.previous_in/next_in should allow better filtering

Expected Behavior

The previous_in/next_in functionality on entry template objects should also allow for filtering based on entry type.

Current Behavior

Currently it can only filter on category, which means things like sidebar links etc. can't be excluded from a flow.

Possible Solution

Suggestion: add the entry_type and entry_type_not query generators into the queries utility class to make this functionality a bit more commonizable.

Tagging system

It'd be helpful to be able to tag/filter posts (separately from Entry-Type); Entry-Type generally is used only for affecting layout/visibility, whereas tags are used for filtering with one or more attributes. (For example, in a comic, being able to tag entries based on characters or subject matters.)

An efficient strategy for implementing this would start with adding Tag and EntryTag tables to the database, where Tag provides the plaintext strings and EntryTag associates them with the entry. When an entry is scanned, its existing EntryTag associations would be destroyed before the new ones are added. Adding the appropriate joined expression to the view query is not something I want to think about right now.
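The two-table layout and the rescan step can be sketched with the standard-library `sqlite3` module. This is only an illustration under assumptions: the table and column names, and the helper functions `make_index`, `set_tags` and `entries_with_all_tags`, are hypothetical and are not Publ's actual schema or API.

```python
import sqlite3


def make_index():
    """Create an in-memory index with hypothetical Entry/Tag/EntryTag tables."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE Entry (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE Tag (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE EntryTag (
            entry_id INTEGER REFERENCES Entry(id),
            tag_id INTEGER REFERENCES Tag(id),
            PRIMARY KEY (entry_id, tag_id)
        );
    """)
    return db


def set_tags(db, entry_id, tags):
    """On rescan: drop the entry's old associations, then add the new ones."""
    db.execute("DELETE FROM EntryTag WHERE entry_id = ?", (entry_id,))
    for name in set(tags):
        db.execute("INSERT OR IGNORE INTO Tag (name) VALUES (?)", (name,))
        (tag_id,) = db.execute(
            "SELECT id FROM Tag WHERE name = ?", (name,)).fetchone()
        db.execute(
            "INSERT INTO EntryTag (entry_id, tag_id) VALUES (?, ?)",
            (entry_id, tag_id))


def entries_with_all_tags(db, tags):
    """Return IDs of entries carrying every one of the given tags."""
    tags = sorted(set(tags))
    marks = ",".join("?" * len(tags))
    rows = db.execute(
        f"""SELECT e.id FROM Entry e
            JOIN EntryTag et ON et.entry_id = e.id
            JOIN Tag t ON t.id = et.tag_id
            WHERE t.name IN ({marks})
            GROUP BY e.id
            HAVING COUNT(DISTINCT t.id) = ?
            ORDER BY e.id""",
        (*tags, len(tags)))
    return [row[0] for row in rows]
```

The `GROUP BY ... HAVING COUNT` form is one common way to express "has all of these tags" as a single joined query; an "any of these tags" filter would simply drop the `HAVING` clause.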

Implement view pagination, sorting

view.View needs to be able to take in restrictions that help us paginate, including:

limit count
entry-relative
date-relative

and the relative pagination types need to also generate sensible automatic previous/next links. Sort criteria also need to be a part of this.

Support image renditions

Supporting multi-resolution image renditions.

Requirements:

tags for image (<img src="..." srcset="...">), gallery (using lightbox), rendition URL (for templated stylesheets)
ability to configure overall rendition at the template level (via the markdown processor, arguments passed to entry.body and entry.more) and on the gallery or individual image request
should be able to look at images as relative path to entry file, category, or image store (based on their respective positions in the filesystem) so that images can be moved around with their .md or template files that reference them
configurable output directory and URL mapping

Template meta-object

Templates need a meta-object about themselves. It should include things like:

file path
modification time

Better mechanism for scanning content changes

Expected Behavior

When scanning new content, the watchdog process should probably just add stuff to a "things to process" queue that gets serviced in order, rather than trying to do it asynchronously.

Current Behavior

Right now, if you edit an entry on a live server, watchdog sees the file get written out, causes it to be processed, then it sees that file get written out and due to order of operations ends up reassigning the ID again, leaving turds behind in the index. This also happens when doing a git pull onto the server, among other things.

There was a previous mitigation for the issue but something caused that to break.

Possible Solution

Both scan_index and IndexWatchdog should simply visit the filesystem nodes and put them into a reprocess queue. Each step of the os.walk thing should be a separate async task, as well.
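A minimal sketch of such a reprocess queue, using only `queue` and `threading` from the standard library. The `ReprocessQueue` class and its `process` callback are hypothetical names for illustration; the point is that both the startup scan and the watchdog handler would only call `submit()`, and a single worker services paths in order.

```python
import logging
import queue
import threading


class ReprocessQueue:
    """Serialize file reprocessing through a single worker thread."""

    def __init__(self, process):
        self._process = process      # callable taking one path (hypothetical)
        self._queue = queue.Queue()
        self._pending = set()        # paths queued but not yet picked up
        self._lock = threading.Lock()

    def start(self):
        """Start the background worker that drains the queue in order."""
        threading.Thread(target=self._run, daemon=True).start()

    def submit(self, path):
        """Queue a path; return False if it is already waiting."""
        with self._lock:
            if path in self._pending:
                return False
            self._pending.add(path)
        self._queue.put(path)
        return True

    def _run(self):
        while True:
            path = self._queue.get()
            # Un-mark before processing so a write that lands mid-processing
            # queues one more pass instead of being lost.
            with self._lock:
                self._pending.discard(path)
            try:
                self._process(path)
            except Exception:
                logging.exception("failed to process %s", path)
            finally:
                self._queue.task_done()

    def join(self):
        """Block until every queued path has been processed."""
        self._queue.join()

    def backlog(self):
        """Approximate number of paths still waiting."""
        return self._queue.qsize()
```

Collapsing duplicate submissions is what would keep a burst of write events for the same file from triggering repeated ID reassignment, and `backlog()` is the kind of number a status page could report.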

Other nice things to have

First-scan deployments on sites with a few thousand entries can take quite some time, which isn't ideal. Even if a site is incomplete as it starts up, it should still be able to show something.

There could also be a status dashboard (routed to /_status or whatever) that shows the current task queue length and so on.

Asynchronously generate image renditions

Expected Behavior

Rather than blocking the renderer on all images being present, it'd be nice to have them get asynchronously generated by a background worker.

When a new rendition is being generated the resulting URL should point to a proxy that waits for the rendition to be finished before serving up the file, so that they don't 404.

This will also probably require enabling async/non-blocking stuff to Flask to really be worthwhile, and requires further research along those lines.
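One way to picture the background-worker idea is with `concurrent.futures`. The sketch below is an assumption-laden illustration, not Publ's design: `RenditionCache`, its `render` callback, and the key format are all hypothetical. The renderer calls `request()` and moves on, while the proxy endpoint calls `wait_for()` and blocks until the file exists.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class RenditionCache:
    """Generate renditions on a worker pool, at most once per key."""

    def __init__(self, render, max_workers=2):
        self._render = render        # hypothetical callable: key -> file URL
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._futures = {}
        self._lock = threading.Lock()

    def request(self, key):
        """Schedule a rendition without blocking; returns its Future."""
        with self._lock:
            if key not in self._futures:
                self._futures[key] = self._pool.submit(self._render, key)
            return self._futures[key]

    def wait_for(self, key, timeout=None):
        """What a proxy route would call: block until the rendition is done."""
        return self.request(key).result(timeout=timeout)
```

Because the future is stored per key, two pages that ask for the same rendition share one generation job rather than rendering it twice.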

Context

First render of a large image gallery can take quite some time, or even result in an ISE or process death due to a watchdog timer. And in most deployments the first render of a complex page will make the whole site freeze while it generates.

First/last entry in category

It would be useful to have category.first(recurse=False) and category.last(recurse=False) as convenience functions (returning the appropriate Entry), even if they’d be the same basic thing as get_view(category=...,limit=1).entries[0] (which is unwieldy).

Add support for math markdown

Expected Behavior

Things inside of Markdown math markup should be rendered as math expressions.

Current Behavior

Math expressions get converted to raw LaTeX expressions, which are ugly and nonsensical.

Possible Solution

Look into existing math render libraries or LaTeX bindings for Python.

Steps to Reproduce (for bugs)

Make an entry with math content, e.g.

\\( x^2 + y^2 = z^2 \\)

$ x^2 + y^2 = z^2 $

Context

Lots of folks want math to be easy to deal with.

Try to detect multiple entries with the same ID

Current Behavior

Sometimes it's possible to accidentally assign the same ID multiple times, like if you're working on an entry in a branch and work on another entry in a different branch, and the IDs get generated on different machines with different existing indexes.

Possible Solution

One possible approach: when scanning an entry, if the entry record already exists and has a different pathname, see if the other pathname is still valid. If so, throw an error so it ends up in the log. Or something.
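The check described above could look roughly like the following sketch. The `index` here is a plain dict from entry ID to path standing in for the real index, and `check_id_collision` is a hypothetical helper; the `exists` parameter is only there so the check can be exercised without touching the filesystem.

```python
import logging
import os


def check_id_collision(index, entry_id, path, exists=os.path.isfile):
    """Record entry_id -> path unless another live file already owns the ID.

    Returns True if the path was recorded, False if a collision was logged.
    """
    known = index.get(entry_id)
    if known is not None and known != path and exists(known):
        # Both files are present on disk and claim the same ID.
        logging.error("Entry ID %s claimed by both %s and %s",
                      entry_id, known, path)
        return False
    # Either a new ID, the same file again, or the old file was moved/deleted.
    index[entry_id] = path
    return True
```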

Another possibility: stop using autoincrement indexes, you silly

Context

This seems unlikely to happen on real sites but I've run into it a few times while developing the example site content.

Provide human-readable name for views

Expected Behavior

A view object should have a way of getting some sort of string that indicates what the view is of; for example, providing the applicable portions of the formatted date for a date range, or the range of entry dates in the case of a limit page.

Ideally this would be in a way that's reasonable to format for different languages and so on.

Context

I would like paginated views to have a better label than just "previous page" or "next page"

HTML not being properly escaped for atom feed

It appears that Jinja's "safe" markup doesn't actually re-encode & characters correctly, and as such Atom feeds showing code blocks don't get properly escaped out.

Example entry: http://publ.beesbuzz.biz/324

The entry looks like this:

But the resulting Atom item renders like this (in Feed On Feeds):

Get rid of SelfStrCall

Current Behavior

The functionality exposed by utils.SelfStrCall is now mostly redundant with utils.CallableProxy, which also does more useful stuff. Convert all of the SelfStrCall subclasses to CallableProxy equivalents.

Change to MIT license

Serves me right for just grabbing the first one that came to mind. MIT is GPLv2 compatible, Apache is not.

Remove deleted files from the mtime cache

Not that it particularly matters, but it'd be nice to remove files from the mtime cache when they're deleted.

Support for embeds via image tag

Expected Behavior

Per the blog post on extending Markdown, an "image" that can be handled by an embedding plugin should hand the URL off to the embedding logic.

For example, ![](https://www.youtube.com/watch?v=oHg5SJYRHA0) should embed the linked video inline. It would also be able to use arguments the same as images, for example ![{320,240}](https://www.youtube.com/watch?v=oHg5SJYRHA0).

Current Behavior

The only way to embed stuff is by copy-pasting the website's own embed <iframe> code or whatever. Which admittedly isn't the worst thing in the world.

Category metadata mechanism

Expected Behavior

Right now it's up to templates to provide default information about categories and entries and so on. It'd be nice to have some sort of metadata mechanism that provides a key-value storage mechanism for defaults for unspecified headers; for example, have a default Author header for the template to draw from (unless overridden in the entry), or make a category default to Publish-Status: DRAFT or whatever. It should also provide a means for adding metadata to the category itself, e.g. category.name, category.description and so on.

Possible Solution

category.meta can be a category-mapped file that is used when constructing the Category object. The Entry object can also proxy to it as a fallback for headers. So for example it could look like:

Name: Frumious Bandersnatch
Description: Everything you want to know about the ongoing adventures of the Bandersnatch, who is
    quite frumious when they play with their best friend, the mome rath.
Entry-Defaults:
    Author: fluffy
    Publish-Status: DRAFT

Considerations

One potential issue is that because this might affect things like Publish-Status, these headers will need to be available at index time, and if they change all files will have to be reindexed.

Also there is an implicit bit of weirdness when thinking about setting a default Category, which would certainly be useful in many cases (such as photo galleries), but then what does it mean if a category's entries are automatically assigned to a different category that has different defaults? Perhaps entry defaults should be in a different file, e.g. defaults.meta, which explicitly only inherits based on filesystem location and not based on category.

entry.next/previous doesn't respect item visibility

Expected Behavior

entry.next and entry.previous should only show entries which are PUBLISHED

Current Behavior

It will happily link to entries which are DRAFT (which can't be seen at all), or ones which are SCHEDULED and in the future (which shouldn't be seen). And, for that matter, ones which are HIDDEN which probably shouldn't be part of the entry flow.

Possible Solution

entry.Entry.__getattr__() should add the appropriate constraints to the next/previous getters

Change `getattr` ad-hoc binding to `@property` where applicable

Rationale

Ad-hoc __getattr__ for populating properties both makes PyLint unhappy and is difficult to properly document. It'd be a good idea to switch to @cached_property, which exists for a reason.
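For comparison, here is a small sketch of the decorator style, assuming the standard-library `functools.cached_property` (Python 3.8+); other implementations with the same name behave similarly. The `Entry` class, its `record` dict and the `loads` counter are hypothetical and only demonstrate that the body runs once.

```python
from functools import cached_property


class Entry:
    """Toy entry object; `record` stands in for whatever backs the real one."""

    def __init__(self, record):
        self._record = record
        self.loads = 0  # counts how often the property body actually runs

    @cached_property
    def title(self):
        """The entry's title, computed on first access and then cached."""
        self.loads += 1
        return self._record["title"].strip()
```

Each property now has its own docstring and is visible to linters, which is the documentation and PyLint benefit the rationale is after.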

Entry auto-excerpts

Expected Behavior

For the purpose of generating link cards and the like it'd be useful for the entry template object to take parameters that allow enabling of an auto-generated summary.

Current Behavior

Lots of things can't figure out where the actual entry content starts. See this Patreon excerpt for example, where the autogenerated excerpt is:

Because it's easy to get started with and it's what I know, and provides some pretty decent flexibility while also having a nice ecosystem of modules that I might be using in the future. This decision isn't set in stone, though, and the number of specific dependencies on Flask are pretty minimal.

which appears to be the first content section below an <h4> tag. Why they chose <h4> for the autoexcerpt cutoff is a mystery!

Possible Solution

The CallableProxy handler registered for Entry.body could take an excerpt=True argument, which should provide an HTML-stripped version of the first paragraph-looking part of the body.
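A rough sketch of the "first paragraph-looking part" extraction, using the standard-library `html.parser`. The `excerpt` function and the helper class are hypothetical; this version takes the first `<p>` that contains any text, so headings and image-only paragraphs are skipped.

```python
from html.parser import HTMLParser


class _FirstParagraph(HTMLParser):
    """Collect the text of the first <p> element that contains any text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.inside = False   # True while within a candidate <p>
        self.done = False     # True once a non-empty paragraph was closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "p" and not self.done:
            self.inside = True

    def handle_endtag(self, tag):
        if tag == "p" and self.inside:
            self.inside = False
            if "".join(self.parts).strip():
                self.done = True
            else:
                self.parts = []  # e.g. a paragraph holding only an image

    def handle_data(self, data):
        if self.inside and not self.done:
            self.parts.append(data)


def excerpt(html):
    """Return an HTML-stripped, whitespace-normalized first paragraph."""
    parser = _FirstParagraph()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())
```

For example, `excerpt("<h4>Why Flask?</h4><p>Because it's <em>easy</em>.</p>")` yields the paragraph text without the heading or the inline markup.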

UTF-8 titles get round-tripped weirdly

Expected Behavior

An entry's rewritten Title element shouldn't get mangled if it contains UTF-8

Current Behavior

When an entry gets rewritten, something (probably the email.message parser) puts out a bunch of junk that signifies the string as being a UTF-8 encoding; but then it doesn't decode this correctly on reload.

Steps to reproduce

Make a new entry with something like:

Title: UTF-8 title rewriting 😀

and save it out. The indexer rewrites the file and it reloads as:

Title: =?utf-8?q?UTF-8_title_rewriting_=F0=9F=98=80?=
Date: 2018-05-10 19:10:33-07:00
Entry-ID: 149
UUID: 2634a71d-e6c1-48de-9c6c-26ab467f8624

and the mangled title is what displays with the entry.
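The junk in the rewritten header is an RFC 2047 encoded-word, which the standard `email.header` module can turn back into text. The sketch below only shows the decoding half; `decode_title` is a hypothetical helper, and where such a call would belong in Publ's load path is an assumption.

```python
from email.header import decode_header, make_header


def decode_title(raw):
    """Decode an RFC 2047 encoded-word header value back to plain text."""
    return str(make_header(decode_header(raw)))


print(decode_title("=?utf-8?q?UTF-8_title_rewriting_=F0=9F=98=80?="))
```

Values that were never encoded pass through unchanged, so applying the decode step on every load would be harmless for plain ASCII titles.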

Automatically generate manual from pydoc

Trying to manually keep the manual in sync with the code is getting a bit annoying. This needs to be automated.

Best served by also implementing #51

Entry previous/next links

Implement previous/next entries, primarily to be used on entry templates.

Requirements:

Returns a new entry.Entry object (rather than just a link)
Can specify the category it's relative to (defaults to the entry's category) and possibly the adjacency criteria (defaults to date-based)
- If a category is specified it should probably be assumed to be recursive

Add ability to paginate from empty limited views

Expected Behavior

Trying to get pagination from an empty view (e.g. where it was built from a date with no entries, or a limit ID that exceeds the entry sequence) should still return a valid pagination.

Current Behavior

When there are no entries in the view, the pagination simply gives up, which is bad UX.

Possible Solution

If there are no entries to paginate from, next and previous should refer to the first entries in either direction that would appear. This is straightforward on date views (simply do a <= and >= query on the date) but for limit views it'll require some finagling based on the current view limit, and if the specified ID doesn't actually exist we'll probably still need to just throw up our hands.
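For the date-view case, the lookup in either direction can be sketched with `bisect` over a sorted list of entry dates. This stands in for the database queries; the `neighbors` function and the in-memory list are assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right
from datetime import date


def neighbors(entry_dates, start, end):
    """Given sorted entry dates and an empty range [start, end], return the
    nearest entry date before the range and the nearest one after it."""
    before = bisect_left(entry_dates, start)   # entries strictly before start
    after = bisect_right(entry_dates, end)     # index of first entry after end
    previous = entry_dates[before - 1] if before > 0 else None
    following = entry_dates[after] if after < len(entry_dates) else None
    return previous, following


dates = [date(2018, 4, 15), date(2018, 4, 17), date(2018, 5, 1)]
print(neighbors(dates, date(1929, 1, 15), date(1929, 1, 15)))
```

A view far in the past, like the 1929 example, would then get no previous link and a next link pointing at the earliest entry.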

Steps to Reproduce (for bugs)

Go to any date-based view with no entries (e.g. http://publ.beesbuzz.biz/blog/?date=1929-01-15)

There are an awful lot of wish-it-were-ternary-expressions

Because I am so new to Python I wasn't aware that Python actually has a ternary expression.

test and foo or bar

is an antipattern that I use way too much, and I need to find and fix those to be:

foo if test else bar

instead.

Reference: https://stackoverflow.com/a/394814/318857
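A short demonstration of why the and/or form is a bug waiting to happen: it silently falls through to the second value whenever the first one is falsy. The two wrapper functions are hypothetical and exist only to put the forms side by side.

```python
def and_or(test, foo, bar):
    """The antipattern: breaks whenever foo is falsy (0, '', [], None)."""
    return test and foo or bar


def ternary(test, foo, bar):
    """The real conditional expression."""
    return foo if test else bar


print(and_or(True, 0, 5))    # 5, although the test passed
print(ternary(True, 0, 5))   # 0
```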

Item deletion isn't working correctly

Trying to delete an item from disk causes an ISE on Dreamhost.

Support non-English slug text

The current slug text generation makes some pretty bad assumptions about languages. It should probably use a Unicode/locale-aware non-character stripper and then URLencode the result.
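A minimal sketch of a Unicode-aware slug generator along those lines, using only the standard library. The `slugify` name and the exact stripping rules are assumptions; it relies on `\w` being Unicode-aware for `str` patterns in Python 3, and percent-encodes the result with `urllib.parse.quote`.

```python
import re
import unicodedata
from urllib.parse import quote


def slugify(title):
    """Strip punctuation in a Unicode-aware way, then URL-encode the result."""
    text = unicodedata.normalize("NFKC", title).lower()
    # Drop anything that is not a word character, whitespace or hyphen.
    text = re.sub(r"[^\w\s-]", "", text)
    # Collapse runs of whitespace, underscores and hyphens into one hyphen.
    text = re.sub(r"[\s_-]+", "-", text).strip("-")
    return quote(text)


print(slugify("Hello, World!"))
```

Letters from any script survive the stripping step, so a Japanese or accented title produces a percent-encoded slug rather than an empty one.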

Find workaround for bogus Dreamhost .php routing rule

I sent this support ticket to Dreamhost regarding an issue with their Passenger setup that prevents PathAlias from working fully; I am not optimistic that they are going to solve this, however, and so a workaround might be necessary:

Path mapping problem with Passenger and .php extensions

Hi, I'm trying to set up a new Passenger-based site on http://publ.beesbuzz.biz, and it seems like the way that the /public routing rules get applied gets in the way of Passenger path mapping.

One of the things I'm trying to provide in this application is the ability to map legacy URLs to the new routing system, so for example there is a redirection from http://publ.beesbuzz.biz/some-old-url.php to http://publ.beesbuzz.biz/tests/321-bladlkfjal. However, what seems to happen is that in the Passenger frontend, it's seeing that the URL ends in .php and attempts to load the PHP file first, and when that fails, Passenger gets a request to retrieve /missing.html instead.

To make things more irritating, the Passenger app never even sees the original file request - it only sees the (erroneous) request for /missing.html, and so I have no way to work around this.

Would it be possible to fix Passenger to only pass in the original request for the URL, rather than the request for the error page? I'm not sure how one would configure that behavior in Passenger but the error handler needs to make the fallback request based on the original URL, NOT based on what the Apache ErrorDocument directive says.

For an example of the request routing working correctly, http://publ.beesbuzz.biz/some-old-url.html works - the problem seems to ONLY be with incoming URLs that end in .php, which tells me it's a PHP-specific configuration issue, where a missing .php file is internally/silently rewriting the request URL to the ErrorDocument handler's result.

Possible workarounds (if they don't fix this):

use RewriteRule and a special path handler for path aliases
maybe ErrorDocument can be configured for this instead?
???

Investigate better Markdown engine

Python-Markdown kinda sucks. There exists GitHub-flavored Markdown for Python (including code formatting), and it seems to support some sort of extensions API for our custom tags as well.

Date-based filtering isn't working right

Expected Behavior

The date-based filter should filter with the same timezone as the site's configuration.

Current Behavior

Filtering seems to take place based on UTC, while pagination happens based on the local timezone. This causes extra-fun problems where this happens, e.g. on http://publ.beesbuzz.biz/blog/?date=2018-04-17

and then the next page http://publ.beesbuzz.biz/blog/?date=2018-04-15 turns up blank.
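To make the mismatch concrete: if entry dates are stored in UTC, a filter for a local calendar day has to be converted into a UTC range first. The sketch below uses a fixed UTC-7 offset purely as an example; the function name and the assumption that the index stores UTC timestamps are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def utc_bounds_for_local_day(year, month, day, tz):
    """Return the [start, end) UTC range covering one local calendar day."""
    start_local = datetime(year, month, day, tzinfo=tz)
    end_local = start_local + timedelta(days=1)
    return (start_local.astimezone(timezone.utc),
            end_local.astimezone(timezone.utc))


site_tz = timezone(timedelta(hours=-7))  # example offset only
start, end = utc_bounds_for_local_day(2018, 4, 17, site_tz)
print(start.isoformat(), end.isoformat())
```

An entry posted at 22:00 local time on the 17th is already on the 18th in UTC, which is the kind of entry that ends up on the wrong page when filtering and pagination disagree about the timezone.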

Explain template mapping strategy in understandable English

The template-mapping algorithm makes logical sense until you try explaining it. This needs to be fixed in the manual.

Figure out why watchdog isn't working on Dreamhost

Expected Behavior

watchdog should be notifying the index of updates on Dreamhost like it does on desktop

Current Behavior

It isn't.

But the command line tool (watchmedo) seems to work fine.

Maybe Passenger does something weird to threads, but the Sqlite background journaling thread is still working...

File mtime table for speeding up startup

There should be a thing that tracks the mtime of the last parsing of all files, stored at the index level (rather than entry, image, etc.), that drives the startup scan. No sense parsing unchanged files, right?

Counterpoint: there might be some weirdness with cases of ID collision. But touching the affected files would put it right anyway.
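The mtime table can be sketched as a plain mapping from path to last-seen modification time. Here a dict stands in for the index-level table, and `changed_files` is a hypothetical helper; it also drops paths that no longer exist so the table does not accumulate deleted files.

```python
import os


def changed_files(paths, mtimes):
    """Return the paths whose mtime differs from the recorded value,
    updating the table as a side effect."""
    changed = []
    for path in paths:
        try:
            mtime = os.stat(path).st_mtime_ns
        except FileNotFoundError:
            mtimes.pop(path, None)  # forget files that were deleted
            continue
        if mtimes.get(path) != mtime:
            mtimes[path] = mtime
            changed.append(path)
    return changed
```

On startup only the returned paths would need to be parsed, and touching a file is enough to force it back through the scan.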

If app is not in debug mode, handle internal errors

When running in production it'd be nice to have the ISE map to our own error 500 handler rather than getting the default ugly page.

view should generate a reasonable last_modified

The view.View object currently just returns arrow.now() for last_modified. This is not ideal.

Clean up Publ site templates

Right now there's a lot of duplicated code; it'd be nice to use Jinja's blocks/inheritance functionality to simplify things.

Return error (404?) for empty/nonexistent categories

If someone requests a category page for a category which doesn't exist (i.e. there are no entries in that category or any subcategories), we should return an error rather than an empty category.

Entry link styles

Entry links can come in multiple forms, including ID-only or fully-expanded, and they could be handled as relative or absolute. It would be great if we could get those from the entry.link attribute in templates.

Caching configuration isn’t being pulled in

Expected Behavior

The cache configuration should be read after the configuration is applied

Current Behavior

caching.py is using the initial cache config values when it’s first imported, meaning it’s always {}

Need to specify a temporary directory

Using Python best-practices for temporary files is causing problems on Dreamhost, where $HOME/var can be on a different filesystem meaning atomic moves don't work correctly. The application should be configurable with a temporary directory that is on the same filesystem as the content files.
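A sketch of what a same-filesystem write could look like: create the temporary file in a configurable directory (defaulting to the destination's own directory) and then rename it into place. The `atomic_write` function and its `tmp_dir` parameter are hypothetical; `os.replace` is only atomic when source and destination are on the same filesystem, which is why the directory matters.

```python
import os
import tempfile


def atomic_write(path, text, tmp_dir=None):
    """Write text to path via a temp file created on the same filesystem."""
    directory = tmp_dir or os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
        # Atomic only if tmp_path and path share a filesystem.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything went wrong before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```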

Upgrade arrow to > 0.12.1

This is a placeholder to remind myself to check for Arrow upgrades; 0.12.1 doesn't support tzinfo specification for strings, but the next version will.

Fix <p> nesting for image sets

Expected Behavior

Image sets should be their own block-level <div>, or at least shouldn't wrap a <div> inside a <p>.

Current Behavior

Image sets get nested inside a containing <p>, which is valid for single images but not valid for a <div>

Possible Solution

Cheesy approach: emit a <span> instead (which satisfies the letter if not the spirit of the nesting rules)

Less cheesy approach: somehow get Misaka to not emit a <p> in the case of an image set, or have it replace the containing <p> with the <div> for the image set.

Extreme cheese approach: close the <p> and open a new one (and then maybe post-filter empty <p> out, which actually makes some amount of sense I guess?)

Steps to Reproduce (for bugs)

https://validator.w3.org/nu/?showsource=yes&showoutline=yes&doc=http%3A%2F%2Fpubl.beesbuzz.biz%2Fblog%2F249#l50

Date-based pagination isn't overriding the limit

Expected Behavior

Specified date should override limit-based pagination

Current Behavior

It doesn't; see http://publ.beesbuzz.biz/blog/?date=2018-04 for example

Blocks completion of #13
