spdx / fsf-api

This project forked from wking/fsf-api

FSF License Metadata API

License: MIT License

Python 96.00% HTML 2.58% Shell 1.42%

fsf-api's Issues

Use pathlib for path munging for easier and more robust handling than string paths

Python 3.6+ includes pathlib with full support across the stdlib, which makes path munging (which the script does a lot of) much easier, more maintainable and more robust, with fewer edge cases. It would be a good idea (and fairly easy) to port the existing old-style os.path and string path code over to it.
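As a rough illustration of the kind of port proposed here, the sketch below builds the same path with os.path and with pathlib. The directory layout and the function names are made up for the example and are not taken from the script.

```python
import os.path
from pathlib import Path


def license_path_old(base_dir, license_id):
    # Old style: string joins and manual extension handling
    return os.path.join(base_dir, "licenses", license_id + ".json")


def license_path_new(base_dir, license_id):
    # pathlib: the / operator joins path segments
    return Path(base_dir) / "licenses" / f"{license_id}.json"


old = license_path_old("output", "GPL-3.0")
new = license_path_new("output", "GPL-3.0")

# Both approaches point at the same location
assert Path(old) == new

# Common queries become attributes instead of os.path helper calls
print(new.name)    # GPL-3.0.json
print(new.stem)    # GPL-3.0
print(new.suffix)  # .json
```

Since Python 3.6, stdlib functions such as open() accept Path objects directly, so a port like this can usually be done one call site at a time.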

Add basic pre-commit checks to check and fix issues and conform style

A suite of basic checks using Pre-Commit would be a good idea to add to automatically spot likely errors, bugs, security problems, bad practices, style issues and more. This could also automatically check that commit messages are descriptive and follow the standard format, and commits are signed off.

Pre-commit makes this super easy: just drop in the config file and add the action to the CIs, and it takes care of the rest. It can run both locally with each commit and on CIs, dramatically reducing the number of nitpicky issues that need to be handled manually in code review. Pre-commit automatically installs, updates and runs the deps, and even the required runtimes, in isolated environments, so setting up a dev environment takes no extra work beyond running pre-commit install.

I would also take the opportunity to clean up some remaining linting issues and add a few security scanners like Semgrep and CodeQL to the CIs, given this is a web API after all (perhaps in a separate PR), and add the necessary prefab sections to the contributing guide explaining this.

Add all licenses (free and non-free) from https://www.gnu.org/licenses/license-list.en.html and add FSF classification

Classifications of licenses at https://www.gnu.org/licenses/license-list.en.html

Free licenses, compatible with the GNU GPL
Free licenses, compatible with the FDL
Free licenses, incompatible with the GNU GPL and FDL
Nonfree licenses
Licenses for works stating a viewpoint
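One possible way to carry these classifications in the code is an enum keyed by the section headings listed above. This is only a sketch: the class and member names are hypothetical, and how the classification would actually appear in the API output is left open.

```python
from enum import Enum


class FSFCategory(Enum):
    """Hypothetical enum; the values are the section headings listed above."""

    GPL_COMPATIBLE = "Free licenses, compatible with the GNU GPL"
    FDL_COMPATIBLE = "Free licenses, compatible with the FDL"
    GPL_FDL_INCOMPATIBLE = "Free licenses, incompatible with the GNU GPL and FDL"
    NONFREE = "Nonfree licenses"
    VIEWPOINT = "Licenses for works stating a viewpoint"


def category_from_heading(heading):
    # Enum lookup by value; raises ValueError for an unknown heading,
    # which would flag a change in the page structure
    return FSFCategory(heading.strip())


category = category_from_heading("Nonfree licenses")
print(category.name)  # NONFREE
```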

Fully move repository to SPDX organization

As suggested in spdx/LicenseListPublisher#77 (comment), we could re-fork this repo so that folks make changes to this repo rather than the no-longer maintained upstream repo.

An email was sent to the SPDX tech list to seek feedback before moving the repo. If we don't hear any concerns, I'll move over the repo.

Make code a proper Python package for better organization, installation and use

Currently, the repo contents are just one big standalone script. Making it a proper Python package would be fairly simple, would allow better code and data organization, and make installation, setup and running easier and more automated.

It could optionally be published to PyPI in the future to make it easier for a wider audience to install and run. At least for now, though, unless we make major breaking changes to the API, that's likely not worth the maintainer overhead, given the currently highly-integrated continuous deployment of the primary use case.

Move code to src/fsf_api and organize into modules
Add __init__ with version and __main__
Add basic setup.py and pyproject.toml
Add setup.cfg with package config and package data (+ MANIFEST.in)
Modify CIs to test packaging

FSF-SPDX issues in other repositories

These issues are referenced in https://git.savannah.gnu.org/cgit/directory.git/tree/subprojects/spdx/ISSUES

Add a security policy (SECURITY.md) for this repo

Adding a security policy might be a good idea given the code both consumes and exposes web-facing data, and GitHub makes it easy to do. @goneall , if SPDX already has something of this sort, that would be preferable; otherwise I can draft something based on the standard template and go ahead with a PR so you can review it.

Separate the data from the code, e.g. into JSON files, for easier maintenance

Currently, over half of the pull.py script is just data stored as long Python lists and dicts in module-level constants. It would be easier to navigate, maintain and edit if we separated these into proper data files, and would also allow us to more easily validate the resulting schema for any errors (e.g. with jsonschema, pydantic, dataclasses, etc).

For the data file format, the prime contenders are JSON, YAML or TOML. However, I'm thinking JSON makes the most obvious sense for the following reasons:

Same format as API output
Built into Python, no additional deps
Widely used and understood
Existing Python dicts/lists are already valid JSON or nearly so with minor changes (trailing comma, quote char)
Could even be exposed as part of the API if desired

On the other hand, there are a few downsides vs the other two formats (and to a much lesser extent, the Python status quo), but for this application I'm thinking the aforementioned upsides outweigh them:

Harder for humans to read and write
No trailing comma and overall stricter syntax
Style less standardized

Add some basic functional tests and run them in CIs

It would be good to have some proper tests and run them on PRs and pushes; right now we have a smoke test that the script runs without errors/warnings and at least outputs something, but that's implemented as a CI action and can't as easily be run locally, and doesn't check that the output looks anything like what's expected.

Function-by-function unit tests are likely overkill, but it would be pretty straightforward to set up a few basic high-level functional tests with Pytest that check not only that the code runs without error, but also that the output data at least roughly matches what we expect, the major CLI options are parsed correctly, etc. This would greatly increase confidence that PRs don't break anything, and thus aid long-term maintenance, as well as making it easy to test changes locally with a single command and get detailed, user-friendly debugging output if tests fail.
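A high-level test of this kind might look roughly like the sketch below. build_api is a hypothetical stand-in for the script's real entry point, and the licenses.json file name and entry structure are assumptions made for the example. The test function follows Pytest conventions (a test_ prefix and the built-in tmp_path fixture) but uses plain asserts, so it can also be driven by hand.

```python
import json
import tempfile
from pathlib import Path


def build_api(output_dir):
    """Hypothetical stand-in for the real build step; writes one JSON file."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    data = {"Example-1.0": {"name": "Example License 1.0"}}
    (output_dir / "licenses.json").write_text(json.dumps(data), encoding="utf-8")


def test_output_is_plausible(tmp_path):
    # Under Pytest, tmp_path is a fresh temporary directory for each test
    build_api(tmp_path)
    output_file = tmp_path / "licenses.json"
    assert output_file.is_file()

    licenses = json.loads(output_file.read_text(encoding="utf-8"))
    # The output should be a non-empty mapping of IDs to entries with a name
    assert isinstance(licenses, dict) and licenses
    for license_id, entry in licenses.items():
        assert isinstance(license_id, str)
        assert entry.get("name")


# Outside Pytest, the same check can be run directly:
with tempfile.TemporaryDirectory() as tmp:
    test_output_is_plausible(Path(tmp))
```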

FSF mentions 1.0 for SISSL but links to the 1.1 text

Copied from wking#8

$ curl -s https://www.gnu.org/licenses/license-list.html | grep -A2 'id="SISSL"'
<dt><a id="SISSL"
       href="http://www.openoffice.org/licenses/sissl_license.html">
       Sun Industry Standards Source License 1.0</a>
$ curl -s https://www.openoffice.org/licenses/sissl_license.html | grep '<title>.*Version'
<title>Sun Industry Standards Source License - Version 1.1</title>

From @wking:

I suspect the FSF linked that page when it was hosting the 1.0 license or they typo'ed their title. The Internet Archive has 1.1 content there for their first archive on 2001-04-17. I haven't been able to turn up a copy of the 1.0 text.

SPDX carries identifiers for 1.1 and 1.2 (https://github.com/spdx/license-list-XML/blob/9f4432fbb660510859417b3d78a795beeeb8279b/src/SISSL-1.2.xml), so it would be nice to know what the FSF thinks about both versions. But with SISSL retired in 2005 it may not matter.
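The mismatch shown in the curl output above could in principle be detected automatically. The sketch below compares the version number in the name the FSF lists with the one in the linked page's title, using the two strings from that output. The regex and the function name are illustrative assumptions, not part of the existing script.

```python
import re

# Matches dotted version numbers such as 1.0 or 2.1.3
VERSION_RE = re.compile(r"\b(\d+(?:\.\d+)+)\b")


def extract_version(text):
    """Return the first dotted version number in text, or None."""
    match = VERSION_RE.search(text)
    return match.group(1) if match else None


# Strings taken from the curl output above
fsf_name = "Sun Industry Standards Source License 1.0"
linked_title = "Sun Industry Standards Source License - Version 1.1"

fsf_version = extract_version(fsf_name)
linked_version = extract_version(linked_title)

if fsf_version != linked_version:
    print(
        f"Version mismatch: FSF lists {fsf_version}, "
        f"linked page says {linked_version}"
    )
```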

Build failure from Actions update

The following error was reported after merging PR #9

Error: Unable to resolve action `JamesIves/github-pages-deploy-action@v4`, unable to find version `v4`

SPDX full names missing

Copied from wking#22

I need the full names for the SPDX licenses in the JSON file.

Add support for building, testing and deploying the API contents via GitHub Actions

Currently, the API contents in the gh-pages branch must be manually rebuilt. Building it in PRs and deploying it on pushes to the main branch will avoid the need to manually perform this convoluted and maintenance-intensive process, as well as document it in the Contributing Guide, and will keep the API content in sync with the code and serve as a basic test of PRs.

If desired, Netlify support could also be added, in order to preview the API output from PRs; my current GH workflows include that as well. The one difficulty there, aside from granting app permission for this repo, is managing the Netlify account, as it doesn't seamlessly integrate with GitHub's authentication. I'd either need to run the preview off my own Netlify account, or we could create one specifically for this project and share the credentials with each other. Not the cleanest way of accomplishing things, but I'm not sure of a great alternative. So I suggest saving that for a followup, since it isn't critical.
