spdx / fsf-api
This project forked from wking/fsf-api
FSF License Metadata API
License: MIT License
Python 3.6+ includes pathlib and full support for it across the stdlib, which makes path munging (which the script does a lot of) much easier, more maintainable and more robust, with fewer edge cases. It would be a good idea (and fairly easy) to port the existing old-style os.path and string path code over to use it.
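As a rough illustration of such a port, here is one hypothetical helper written both ways. The function names and the licenses/<id>.json output layout are assumptions for this sketch, not code taken from pull.py.

```python
import json
import os
from pathlib import Path


def write_license_old(output_dir, license_id, data):
    """Old style: string paths built with os.path (hypothetical helper)."""
    directory = os.path.join(output_dir, "licenses")
    if not os.path.isdir(directory):
        os.makedirs(directory)
    path = os.path.join(directory, license_id + ".json")
    with open(path, "w", encoding="utf-8") as stream:
        json.dump(data, stream, indent=2, sort_keys=True)
    return path


def write_license_new(output_dir, license_id, data):
    """The same logic with pathlib (hypothetical helper)."""
    directory = Path(output_dir) / "licenses"
    # One call replaces the isdir/makedirs check and its race condition.
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{license_id}.json"
    path.write_text(json.dumps(data, indent=2, sort_keys=True), encoding="utf-8")
    return path
```

Because Path implements os.PathLike, the returned objects can still be passed to open(), shutil and the rest of the stdlib, so the port could be done incrementally.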
A suite of basic checks using pre-commit would be a good idea, to automatically spot likely errors, bugs, security problems, bad practices, style issues and more. It could also automatically check that commit messages are descriptive and follow the standard format, and that commits are signed off.
Pre-commit makes this easy: just drop in the config file and add the action to the CI, and it takes care of the rest. It can run both locally on each commit and on CI, and it dramatically reduces the number of nitpicky issues that need to be handled manually in code review. Pre-commit automatically installs, updates and runs the dependencies, and even the runtimes they require, in isolated environments, so setting up a dev environment takes no extra work aside from running pre-commit install.
I would also take the opportunity to clean up some remaining linting issues and add a few security scanners like Semgrep and CodeQL to the CI (this is a web API, after all), perhaps in a separate PR, and add the necessary standard sections to the contributing guide explaining all this.
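The commit-message checks mentioned above could run as a local hook in pre-commit's commit-msg stage. A minimal sketch, assuming that the "standard format" means a descriptive summary of at most 72 characters, a blank second line and a Signed-off-by trailer; those thresholds, the script and the function names are all assumptions, and an existing published hook may well cover this already.

```python
import sys
from pathlib import Path

# Assumed conventions, not an SPDX rule.
MAX_SUMMARY_LENGTH = 72
MIN_SUMMARY_LENGTH = 10


def check_commit_message(message):
    """Return a list of problems found in a commit message (empty if OK)."""
    # Git strips "#" comment lines before committing by default; ignore them too.
    lines = [line for line in message.splitlines() if not line.startswith("#")]
    problems = []
    summary = lines[0].strip() if lines else ""
    if len(summary) < MIN_SUMMARY_LENGTH:
        problems.append("summary line is missing or too short to be descriptive")
    if len(summary) > MAX_SUMMARY_LENGTH:
        problems.append(f"summary line is longer than {MAX_SUMMARY_LENGTH} characters")
    if len(lines) > 1 and lines[1].strip():
        problems.append("second line should be blank")
    if not any(line.startswith("Signed-off-by: ") for line in lines):
        problems.append("missing Signed-off-by trailer (use git commit -s)")
    return problems


def main(argv=None):
    argv = sys.argv[1:] if argv is None else argv
    # A commit-msg hook receives the path of the commit message file.
    message = Path(argv[0]).read_text(encoding="utf-8")
    problems = check_commit_message(message)
    for problem in problems:
        print(f"commit-msg: {problem}", file=sys.stderr)
    return 1 if problems else 0


if __name__ == "__main__":
    sys.exit(main())
```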
Classifications of licenses at https://www.gnu.org/licenses/license-list.en.html
As suggested in spdx/LicenseListPublisher#77 (comment), we could re-fork this repo so that folks make changes here rather than in the no-longer-maintained upstream repo.
An email was sent to the SPDX tech list to seek feedback before moving the repo. If we don't hear any concerns, I'll move the repo over.
Currently, the repo contents are just one big standalone script. Making it a proper Python package would be fairly simple, would allow better code and data organization, and make installation, setup and running easier and more automated.
It could optionally be published to PyPI in the future so that a wider audience can install and run it more easily. At least for now, though, unless we make major breaking changes to the API, that's likely not worth the maintainer overhead, given that the primary use case is currently tightly integrated with continuous deployment.
__init__ with version and __main__
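A minimal sketch of what the __init__ (with a version) and __main__ pieces could look like, written out by a small script so it can be tried in a scratch directory. The package name fsf_api, the version string and the --version flag are all hypothetical; how pull.py is actually split up is left to the PR.

```python
import os
import subprocess
import sys
import textwrap
from pathlib import Path

# Hypothetical package name and contents.
PACKAGE_FILES = {
    "fsf_api/__init__.py": '''
        """FSF License Metadata API."""

        __version__ = "0.1.0"
        ''',
    "fsf_api/__main__.py": '''
        """Entry point, so the package can be run with: python -m fsf_api"""

        import sys

        from fsf_api import __version__


        def main(argv=None):
            argv = sys.argv[1:] if argv is None else argv
            if "--version" in argv:
                print(__version__)
                return 0
            # The existing pull.py logic would be called from here.
            return 0


        if __name__ == "__main__":
            sys.exit(main())
        ''',
}


def build_skeleton(root):
    """Write the package files under root."""
    for relative_path, source in PACKAGE_FILES.items():
        path = Path(root) / relative_path
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(textwrap.dedent(source).lstrip(), encoding="utf-8")


def run_package(root, *args):
    """Run the package as `python -m fsf_api` and return its stdout."""
    env = dict(os.environ, PYTHONPATH=str(root))
    result = subprocess.run(
        [sys.executable, "-m", "fsf_api", *args],
        cwd=str(root), env=env, stdout=subprocess.PIPE,
        universal_newlines=True, check=True,
    )
    return result.stdout.strip()
```

Keeping the version in __init__ gives one place to bump it, and __main__ means the tool runs the same way whether it is installed or just checked out.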
Adding a security policy might be a good idea, given that the code both consumes and exposes web-facing data, and GitHub makes it easy to do. @goneall, if SPDX already has something of this sort, that would be preferable; otherwise I can draft one based on the standard template and open a PR for you to review.
Currently, over half of the pull.py script is just data stored as long Python lists and dicts in module-level constants. It would be easier to navigate, maintain and edit if we separated these into proper data files, which would also let us more easily validate them against a schema (e.g. with jsonschema, pydantic, dataclasses, etc.).
For the data file format, the prime contenders are JSON, YAML or TOML. However, I'm thinking JSON makes the most obvious sense for the following reasons:
On the other hand, there are a few downsides vs. the other two formats (and, to a much lesser extent, the Python status quo), but for this application I think the aforementioned upsides outweigh them.
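As a rough sketch of loading and validating such a JSON data file with only the stdlib (dataclasses), assuming a hypothetical identifiers.json that maps each FSF license id to an object with an optional "spdx" list; the real file names and record layout would follow whatever the pull.py constants currently hold.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LicenseEntry:
    """One record from a hypothetical identifiers.json data file."""

    fsf_id: str
    spdx: List[str] = field(default_factory=list)

    def __post_init__(self):
        # Basic type checks; jsonschema or pydantic could replace these.
        if not self.fsf_id:
            raise ValueError("empty FSF id")
        if not isinstance(self.spdx, list) or not all(isinstance(i, str) for i in self.spdx):
            raise ValueError(f"{self.fsf_id}: 'spdx' must be a list of strings")


def load_entries(text: str) -> Dict[str, LicenseEntry]:
    """Parse and validate the data file, failing loudly on schema errors."""
    raw = json.loads(text)
    if not isinstance(raw, dict):
        raise ValueError("top level must be a JSON object")
    entries = {}
    for fsf_id, record in raw.items():
        if not isinstance(record, dict):
            raise ValueError(f"{fsf_id}: record must be a JSON object")
        unknown = set(record) - {"spdx"}
        if unknown:
            raise ValueError(f"{fsf_id}: unknown keys {sorted(unknown)}")
        entries[fsf_id] = LicenseEntry(fsf_id=fsf_id, **record)
    return entries
```

Rejecting unknown keys catches typos such as "sdpx" that would otherwise be silently ignored.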
It would be good to have some proper tests and run them on PRs and pushes. Right now we have a smoke test that the script runs without errors or warnings and outputs something, but it is implemented as a CI action, so it can't easily be run locally, and it doesn't check that the output looks anything like what's expected.
Function-by-function unit tests are likely overkill, but setting up a few basic high-level functional tests with Pytest would be pretty straightforward: checking not only that the code runs without error, but also that the output data at least roughly matches what we expect, that the major CLI options are parsed correctly, etc. This would greatly increase confidence that PRs don't break anything, aiding long-term maintenance, and would make it easy to test changes locally with a single command, with detailed, user-friendly debugging output if tests fail.
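A minimal sketch of such functional tests. Since pull.py isn't reproduced here, build_parser and generate_api are hypothetical stand-ins, and the licenses.json output with id, identifiers.spdx and tags fields is an assumed shape; the real tests would import the actual code instead.

```python
import argparse
import json
from pathlib import Path

# --- Hypothetical stand-ins for the code under test ---


def build_parser():
    parser = argparse.ArgumentParser(description="Build the FSF license API")
    parser.add_argument("output_dir", nargs="?", type=Path, default=Path("."))
    return parser


def generate_api(output_dir):
    # The real script scrapes gnu.org; a fixed record keeps this sketch offline.
    licenses = {
        "Expat": {
            "id": "Expat",
            "identifiers": {"spdx": ["MIT"]},
            "tags": ["gpl-2-compatible", "gpl-3-compatible", "libre"],
        },
    }
    output_dir.mkdir(parents=True, exist_ok=True)
    (output_dir / "licenses.json").write_text(
        json.dumps({"licenses": licenses}, indent=2), encoding="utf-8")


# --- Functional tests; pytest collects test_* functions and injects tmp_path ---


def test_cli_default_output_dir():
    assert build_parser().parse_args([]).output_dir == Path(".")


def test_cli_explicit_output_dir():
    assert build_parser().parse_args(["public"]).output_dir == Path("public")


def test_output_roughly_matches_expectations(tmp_path):
    generate_api(tmp_path)
    data = json.loads((tmp_path / "licenses.json").read_text(encoding="utf-8"))
    licenses = data["licenses"]
    assert licenses, "expected at least one license"
    for license_id, record in licenses.items():
        assert record["id"] == license_id
        assert isinstance(record["tags"], list) and record["tags"]
        assert all(isinstance(i, str) for i in record["identifiers"]["spdx"])
```

Saved under a tests/ directory, the whole suite would then run locally with a single pytest command, and the same command could replace the CI-only smoke test.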
Copied from wking#8
$ curl -s https://www.gnu.org/licenses/license-list.html | grep -A2 'id="SISSL"'
<dt><a id="SISSL"
href="http://www.openoffice.org/licenses/sissl_license.html">
Sun Industry Standards Source License 1.0</a>
$ curl -s https://www.openoffice.org/licenses/sissl_license.html | grep '<title>.*Version'
<title>Sun Industry Standards Source License - Version 1.1</title>
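The same check against the FSF list could be scripted. A minimal sketch using the stdlib's html.parser, fed the snippet shown above; AnchorCollector and find_anchor are hypothetical names, and fetching the live page (e.g. with urllib.request) is left out so the sketch runs offline.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Map each <a id="..."> on the page to its (href, link text)."""

    def __init__(self):
        super().__init__()
        self.anchors = {}
        self._current_id = None  # id of the <a> element we are inside, if any
        self._current_href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and "id" in attributes:
            self._current_id = attributes["id"]
            self._current_href = attributes.get("href")
            self._text = []

    def handle_data(self, data):
        if self._current_id is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_id is not None:
            # Collapse the line breaks and indentation used in the page source.
            title = " ".join("".join(self._text).split())
            self.anchors[self._current_id] = (self._current_href, title)
            self._current_id = None


def find_anchor(html, anchor_id):
    """Return (href, link text) for anchor_id, or None if it is absent."""
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    return parser.anchors.get(anchor_id)
```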
From @wking:
I suspect the FSF linked that page when it was hosting the 1.0 license or they typo'ed their title. The Internet Archive has 1.1 content there for their first archive on 2001-04-17. I haven't been able to turn up a copy of the 1.0 text.
SPDX carries identifiers for 1.1 and 1.2 (https://github.com/spdx/license-list-XML/blob/9f4432fbb660510859417b3d78a795beeeb8279b/src/SISSL-1.2.xml), so it would be nice to know what the FSF thinks about both versions. But with SISSL retired in 2005, it may not matter.
The following error was reported after merging PR #9:
Error: Unable to resolve action `JamesIves/github-pages-deploy-action@v4`, unable to find version `v4`
Copied from wking#22
I need the full names for the SPDX licenses in the JSON file.
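One possible approach, sketched under assumptions: SPDX publishes its license list as JSON (licenses.json in the license-list-data repository, with a licenseId and a name per entry), so the names could be looked up from there and attached to each FSF record. The spdx_names output field and both function names are hypothetical, not part of the current API.

```python
def spdx_names(spdx_license_list):
    """Map SPDX identifiers to full names from SPDX's licenses.json data."""
    return {entry["licenseId"]: entry["name"] for entry in spdx_license_list["licenses"]}


def add_full_names(record, names):
    """Return a copy of an FSF API record with a hypothetical 'spdx_names' field."""
    ids = record.get("identifiers", {}).get("spdx", [])
    enriched = dict(record)
    # Identifiers missing from the SPDX list are skipped rather than guessed.
    enriched["spdx_names"] = {i: names[i] for i in ids if i in names}
    return enriched
```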
Currently, the API contents in the gh-pages branch must be rebuilt manually. Building them in PRs and deploying them on pushes to the main branch would eliminate the need to perform this convoluted, maintenance-intensive process by hand (and to document it in the Contributing Guide), keep the API content in sync with the code, and serve as a basic test of PRs.
If desired, Netlify support could also be added to preview the API output from PRs; my current GitHub workflows include that as well. Aside from granting the app permission for this repo, the one difficulty is managing the Netlify account, as it doesn't integrate seamlessly with GitHub's authentication. Either I'd run the previews off my own Netlify account, or we could create one specifically for this project and share the credentials. That's not the cleanest approach, but I'm not sure of a great alternative, so I suggest saving it for a follow-up, since it isn't critical.