creativecommons / ccos-scripts
Scripts used to maintain various pieces of CC's open source presence.
License: MIT License
We need Intern/GSoC/Outreachy tags so that repositories with issues tagged that way have consistent tags.
Tags are maintained in normalize_repos/labels.py
List repositories
list_repos.py:
#!/usr/bin/env python3
# Standard library
import os
# Third-party
from github import Github
GITHUB_TOKEN = os.environ["ADMIN_GITHUB_TOKEN"]
github_client = Github(GITHUB_TOKEN)
cc = github_client.get_organization("creativecommons")
repos = []
for repo in cc.get_repos():
    repos.append(repo.name)
repos.sort()
for repo in repos:
    print(repo)
The following line uses a logger named sync_community_teams.py:
ccos-scripts/sync_community_teams.py
Line 16 in cebc132
However, the other modules use a logger named sync_community_teams:
ccos-scripts/ccos/teams/get_community_team_data.py
Lines 34 to 35 in cebc132
In some of the files, a global logger variable is defined (logger), but never used.
Defined:
ccos-scripts/ccos/teams/get_community_team_data.py
Lines 34 to 35 in cebc132
Never used (logging is used instead of logger):
The global variable should be UPPERCASE and shortened. If the ccos.log module is imported in the full namespace then there is no collision (technically, it's more about avoiding misunderstandings, as LOG and log differ in Python).
Using LOG instead of Logger would also make it harder to confuse with logging.
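The naming suggestion above could look roughly like the following. This is a minimal sketch, not the repository's code: the shared name "sync_community_teams" is taken from the issue text, while LOGGER_NAME and describe_team are hypothetical names used only for illustration.

```python
import logging

# Assumption: one shared logger name for the whole workflow, without ".py".
LOGGER_NAME = "sync_community_teams"

# Upper-case module-level constant; hard to confuse with the `logging`
# module or with a module named `log`.
LOG = logging.getLogger(LOGGER_NAME)


def describe_team(team_name):
    """Hypothetical helper that logs through LOG, not the root `logging` calls."""
    LOG.info("Processing team %s", team_name)
    return LOG.name
```

Because logging.getLogger returns the same object for the same name, every module that asks for "sync_community_teams" shares one logger and one configuration.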
functional and easier to use, please
Manage Issues in Projects (move_closed_issues.py) is repeatedly failing (since 2023-07-01):
Unhandled exception: Traceback (most recent call last):
File "/home/runner/work/ccos-scripts/ccos-scripts/./move_closed_issues.py", line 75, in move_cards
done.create_card(
File "/opt/hostedtoolcache/Python/3.10.12/x64/lib/python3.10/site-packages/github/ProjectColumn.py", line 141, in create_card
headers, data = self._requester.requestJsonAndCheck(
File "/opt/hostedtoolcache/Python/3.10.12/x64/lib/python3.10/site-packages/github/Requester.py", line 353, in requestJsonAndCheck
return self.__check(
File "/opt/hostedtoolcache/Python/3.10.12/x64/lib/python3.10/site-packages/github/Requester.py", line 378, in __check
raise self.__createException(status, responseHeaders, output)
github.GithubException.GithubException: 422 {"message": "Validation Failed", "errors": [{"resource": "ProjectCard", "code": "unprocessable", "field": "data", "message": "Project already has the associated issue"}], "documentation_url": "https://docs.github.com/v3/projects/cards/#create-a-project-card"}
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/runner/work/ccos-scripts/ccos-scripts/./move_closed_issues.py", line 99, in <module>
main()
File "/home/runner/work/ccos-scripts/ccos-scripts/./move_closed_issues.py", line 94, in main
move_cards(args, github, backlog, done)
File "/home/runner/work/ccos-scripts/ccos-scripts/./move_closed_issues.py", line 79, in move_cards
except github.GithubException as e:
AttributeError: 'Github' object has no attribute 'GithubException'
GitHub Action should complete successfully an overwhelming majority of the time.
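The second exception in the traceback arises because the name github inside move_cards refers to the client object passed in as an argument, not to the PyGithub module, so github.GithubException cannot be resolved. A minimal sketch of one way around this, using stand-in classes so that it runs without PyGithub (in the real script the exception class would come from `from github import GithubException`; FakeColumn and move_card are hypothetical):

```python
class GithubException(Exception):
    """Stand-in for the exception class that PyGithub exports."""


class FakeColumn:
    """Hypothetical stand-in for a project column that rejects duplicates."""

    def create_card(self, content_id):
        raise GithubException("422 Project already has the associated issue")


def move_card(github_client, done, content_id):
    # The client parameter is named `github_client`, so it cannot shadow a
    # module called `github`; the exception is referenced by its own name.
    try:
        done.create_card(content_id)
    except GithubException as e:
        return f"skipped: {e}"
    return "moved"
```

With the exception name resolved correctly, the 422 "already has the associated issue" response can be handled as a skip instead of crashing the run.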
The README has not been updated with information about the newly added "Add Community PRs to Project" workflow.
When doing issue triage, it's hard to tell the difference between issues that have not been triaged (and are therefore marked as not ready for work) and issues that have been triaged but actually are not ready for work.
We could add an awaiting triage label as default to issue templates.
We could use blocked for what we're currently using not ready for work for, but an awaiting triage label would be much more specific.
Our logger uses explicit ANSI color coding for log states like success, warning and error, but it does not use the GitHub notation. This makes the (extremely verbose) logs less navigable and harder to review when debugging.
GitHub Workflows have some special notations for different outputs that lead to different presentation of those outputs in the UI. They are documented in the GitHub docs.
Notable notations include the ones for DEBUG, WARNING and ERROR messages:
::debug::{message}
::warning file={name},line={line},col={col}::{message}
::error file={name},line={line},col={col}::{message}
Configuring the customised logger to output messages in this format can improve the logging system drastically in terms of readability and improve debugging experience.
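As an illustration of the idea, a logging formatter can prefix messages with the notations listed above. This is a sketch under assumptions: WorkflowCommandFormatter is a hypothetical name, not the existing ccos logger, the col field is omitted because log records do not carry a column, and escaping of multi-line messages is left out.

```python
import logging

# Map logging levels to the workflow command names quoted above.
# Levels without an entry (for example INFO) are emitted unchanged.
_COMMANDS = {
    logging.DEBUG: "debug",
    logging.WARNING: "warning",
    logging.ERROR: "error",
}


class WorkflowCommandFormatter(logging.Formatter):
    """Hypothetical formatter that emits GitHub workflow command prefixes."""

    def format(self, record):
        message = super().format(record)
        command = _COMMANDS.get(record.levelno)
        if command is None:
            return message
        if command == "debug":
            return f"::debug::{message}"
        # file and line come from the log record itself.
        return (
            f"::{command} file={record.pathname},"
            f"line={record.lineno}::{message}"
        )
```

A handler would use it via handler.setFormatter(WorkflowCommandFormatter()), ideally only when running inside GitHub Actions so that local runs keep the colored output.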
The Push issues to CC Open Source workflow has been failing for the last 8 days (maybe more).
Go here
Somewhat related to #81
The error seems to be an authentication failure in both cases.
sync_community_teams.py adds additional lines with the same pattern
For example:
* @creativecommons/technology @creativecommons/ct-vocabulary-maintainers @creativecommons/ct-vocabulary-core-committers
* @creativecommons/ct-vocabulary-collaborators
About code owners - GitHub Docs:
If you want to match two or more code owners with the same pattern, all the code owners must be on the same line. If the code owners are not on the same line, the pattern matches only the last mentioned code owner.
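One way to satisfy that rule is to merge owners that share a pattern before writing the file. The following is a simplified sketch: merge_codeowners is a hypothetical helper, it drops comments and blank lines, and it keeps each pattern at the position of its first occurrence.

```python
def merge_codeowners(lines):
    """Combine CODEOWNERS rules that share a pattern onto a single line."""
    owners_by_pattern = {}  # dicts keep the insertion order of patterns
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        pattern, *owners = stripped.split()
        merged = owners_by_pattern.setdefault(pattern, [])
        for owner in owners:
            if owner not in merged:
                merged.append(owner)
    return [" ".join([p, *o]) for p, o in owners_by_pattern.items()]
```

Since the last matching pattern takes precedence in CODEOWNERS, moving a repeated pattern up to its first position can change which rule wins when other patterns sit in between; for the single `*` pattern in the example this does not matter.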
Both labels.json and skills.json are not very readable for the following reasons:
The labels.json file uses emoji as Unicode glyphs, which look like two codepoints in a string. It's not possible to identify the emojis from the text. GitHub allows using "colon-style markup" instead of native emoji. We should use that as the value of the emoji key. Additionally, YAML would be a much better option for files that are manually read or written.
The current implementation works. Not working on the issue is the alternative.
Currently the workflow scripts, notably those in push_data_to_ccos/ and sync_community_team/, are riddled with print() statements serving the function of logging. While this is a perfectly workable solution, it is by no means elegant. Also, the process of manually indenting the messages in the print() statements is messy and error-prone and makes for hard-to-comprehend logs.
Python comes with a nifty logging module that should be used instead. Sprinkle some ANSI colouring and 🤯.
The unit test task in Vocabulary is named test, and a functionally identical task in Vue Vocabulary is named unit.
Two tasks that perform the same function should be named the same for consistency. The name unit for unit tests seems to be a good one.
The Python packages of the repository have older versions of black in Pipfile which makes various packages of the repositories. It can be a better approach to optimise these configurations by
The configuration could be optimised with that of the configs in the creativecommons/cc-licenses
While labels are neatly placed in their own file, the other constants are still put in with the main source code.
The main benefit to be gained from putting them in separate files is separation of functions from constants, verbs from nouns.
During handling of the #164 exception, another exception occurred:
Traceback (most recent call last):
File "/opt/homebrew/Cellar/python@3.10/3.10.10_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/logging/__init__.py", line 1100, in emit
msg = self.format(record)
File "/opt/homebrew/Cellar/python@3.10/3.10.10_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/logging/__init__.py", line 943, in format
return fmt.format(record)
File "/Users/timidrobot/CreativeCommons/git/ccos-scripts/ccos/log.py", line 88, in format
record.function = stack[self.cut].function
IndexError: list index out of range
do not break
For various ethical and technical reasons, we should establish best practices when writing scripts that automate code actions:
We have a @cc-open-source-bot account that should be used as the author for all actions, workflows, and scripts. A few reasons to do this:
Pull requests from the community (anyone other than CC staff) on CC Search related repos need to be moved to the "In Progress (Community)" column in the Active Sprint - CC Search project.
We need to create a GitHub workflow in this repo that runs every hour and checks for PRs on the CC Search related repositories that don't have an associated project (PRs by CC staff are automatically moved by a separate GitHub workflow that's already deployed). We need to move these PRs into the correct project and column.
Non-engineering repos, identified by engineering_project set to false in the CC metadata file present in each repo, should not be subjected to branch protection normalisation in the normalize_repos workflow.
While Python does not see the labels "hacktoberfest" and "Hacktoberfest" as equal, GitHub does. So if there is a case difference, the script will try to create the label and GitHub will raise an error that it already exists.
ccos-scripts/normalize_repos/models.py
Line 106 in 7d7ee1d
Python should be able to sync the names to their correct values, while accounting for different cases when matching labels to the JSON file.
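A minimal sketch of case-insensitive matching, assuming the existing label names and the desired names from the JSON file are available as plain lists of strings; plan_label_changes is a hypothetical helper and not part of models.py.

```python
def plan_label_changes(existing_names, desired_names):
    """Return (to_create, to_rename), treating names that differ only by
    case as the same label."""
    existing_by_key = {name.casefold(): name for name in existing_names}
    to_create = []
    to_rename = {}  # current name -> desired name
    for desired in desired_names:
        current = existing_by_key.get(desired.casefold())
        if current is None:
            to_create.append(desired)
        elif current != desired:
            to_rename[current] = desired
    return to_create, to_rename
```

Labels that only differ by case end up in to_rename, so the script would edit the existing label rather than attempt to create one that GitHub considers a duplicate.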
Implement GitHub retries for 5xx status codes
The 502 error is on GitHub's end, and as you say it is transient. It can happen any time. Retrying your request should be the best strategy here.
Provide a urllib3.util.retry.Retry instance to PyGithub that is configured to retry 5xx responses on all types of requests:
from github import Github
from urllib3.util.retry import Retry
g = Github(
    "access_token",
    retry=Retry(
        status_forcelist=list(range(500, 600)),
        allowed_methods={"DELETE", "GET", "HEAD", "OPTIONS", "POST", "PUT", "TRACE"},
    ),
)
Then PyGithub will retry requests that fail with 5xx errors for you.
https://urllib3.readthedocs.io/en/stable/reference/urllib3.util.html
In the future, the default value of the retry parameter will be better. The code necessary has been merged:
However, at the time of writing, the code has not yet been released to stable (see Release v2.0.0-preview · PyGithub/PyGithub).
See occasional errors in GitHub Actions logs
<module>: Unhandled exception: Traceback (most recent call last):
File "/home/runner/work/ccos-scripts/ccos-scripts/./move_closed_issues.py", line 101, in <module>
main()
File "/home/runner/work/ccos-scripts/ccos-scripts/./move_closed_issues.py", line 96, in main
move_cards(args, github, backlog, done)
File "/home/runner/work/ccos-scripts/ccos-scripts/./move_closed_issues.py", line 70, in move_cards
content = card.get_content(content_type="Issue")
File "/opt/hostedtoolcache/Python/3.10.12/x64/lib/python3.10/site-packages/github/ProjectCard.py", line 135, in get_content
headers, data = self._requester.requestJsonAndCheck("GET", url)
File "/opt/hostedtoolcache/Python/3.10.12/x64/lib/python3.10/site-packages/github/Requester.py", line 442, in requestJsonAndCheck
return self.__check(
File "/opt/hostedtoolcache/Python/3.10.12/x64/lib/python3.10/site-packages/github/Requester.py", line 487, in __check
raise self.__createException(status, responseHeaders, data)
github.GithubException.GithubException: 502 {"message": "Server Error"}
Mitigate intermittent server errors so that only "serious" errors are reported.
Currently the code in the repository is organised in directories on the basis of the GitHub Actions workflow they enable, each with its own Python environment and each independent of the other. This however does not take into account that multiple workflows may share code, utility functions and constants with each other.
For example, in #58, a number of constants and functions were duplicated from push_data_to_ccos/ to sync_community_team/ because the way the repos are organised prohibits sharing. I believe the code can be better organised to be more DRY, all the while maintaining a reasonable amount of separation between the different workflows.
Even when concerns are separated, constants and common functions like setting up a client to GitHub or Asana should be placed in a central location and shared.
Now that Hacktoberfest is all done and dusted, we can go ahead and remove the Hacktoberfest labels. We'll also have to run the necessary scripts with admin rights in order to remove the labels attached to the existing Hacktoberfest issues.
I can go ahead and remove the Hacktoberfest associated labels from the labels.json file as suggested by @dhruvkb.
The use of the strict mode in branch protections requires any branch to be up to date with the main branch before it can be merged. This causes all the CI checks to run all over again, and a lot of time is wasted waiting for checks to pass. Ideally branches should be up to date, but we should not be enforcing that. Conflicts will still be reported and CI will run again once the conflicts are resolved.
Referring to creativecommons/creativecommons.github.io-source#481
On the Community Team page, the roles under the CC Catalog API and Community Building teams seem to be listed in the reverse order compared to all the others (this is only the case with these two categories).
It looks a little inconsistent with the other projects, which are ordered differently.
Visit the Community Team page (https://opensource.creativecommons.org/community/community-team/members/)
The order should be reversed to have decreasing priority of roles.
The normalize repos script is only run manually currently. We want it to automatically run so that our GitHub repos are always up to date with the latest labels and settings.
Set up a GitHub Actions workflow in this repo to run the script every day.
The git push action in the Generate Project Pages action is not working because of authentication issues. The push command from GitPython does not use the credentials supplied earlier in the file.
See https://github.com/creativecommons/ccos-scripts/runs/550457801?check_suite_focus=true for example.
origin.push() should use the credentials that the repo was initialized with earlier in the file.
Adding a label in the normalize_repos script for issues that are meant to track a meta discussion or larger goal rather than a specific issue or feature, or an issue representing a large task that's meant to track several smaller tasks. I don't think there's any label that quite covers this exactly.
I'm not sure if "meta" is quite the right word for this, although I think it could work.
We could widen the scope of an existing label, but I don't think that any of the existing labels' descriptions would really work for this purpose if widened.
I could do this, but it would also be a quick and easy opportunity for outside contributors.
Skill labels need to be manually set up in skills.json. This [...] cc-metadata.yml under the technologies key in each repo.
The technologies key could be expanded to a more verbose structure to describe languages, libraries and frameworks and then used to eliminate the need for skills.json entirely. This presents benefits such as
The improvements to .cc-metadata.yml can be a good-to-have part of the feature if there isn't a consensus to proceed with that. But the automation of skill labels is still beneficial. Also, it does not have to be binary; a combination of .cc-metadata.yml and skills.json would also be a fine solution.
Traceback (most recent call last):
File "/Users/timidrobot/CreativeCommons/git/ccos-scripts/./sync_community_teams.py", line 36, in <module>
main()
File "/Users/timidrobot/CreativeCommons/git/ccos-scripts/./sync_community_teams.py", line 30, in main
create_codeowners_for_data(get_community_team_data())
File "/Users/timidrobot/CreativeCommons/git/ccos-scripts/ccos/teams/set_codeowners.py", line 63, in create_codeowners_for_data
check_and_fix_repo(organization, repo, teams)
File "/Users/timidrobot/CreativeCommons/git/ccos-scripts/ccos/teams/set_codeowners.py", line 109, in check_and_fix_repo
set_up_repo(repo)
File "/Users/timidrobot/CreativeCommons/git/ccos-scripts/ccos/teams/set_codeowners.py", line 210, in set_up_repo
origin.pull()
File "/Users/timidrobot/.local/share/virtualenvs/ccos-scripts-8EVzyQrj/lib/python3.10/site-packages/git/remote.py", line 1045, in pull
res = self._get_fetch_info_from_stderr(proc, progress, kill_after_timeout=kill_after_timeout)
File "/Users/timidrobot/.local/share/virtualenvs/ccos-scripts-8EVzyQrj/lib/python3.10/site-packages/git/remote.py", line 848, in _get_fetch_info_from_stderr
proc.wait(stderr=stderr_text)
File "/Users/timidrobot/.local/share/virtualenvs/ccos-scripts-8EVzyQrj/lib/python3.10/site-packages/git/cmd.py", line 604, in wait
raise GitCommandError(remove_password_if_present(self.args), status, errstr)
git.exc.GitCommandError: Cmd('git') failed due to: exit code(1)
cmdline: git pull -v -- origin
The use of tempfile.TemporaryDirectory() would make this more robust/less fragile
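A minimal sketch of that suggestion, assuming the fragility comes from reusing a working directory between runs (the issue does not state the cause); with_scratch_checkout is a hypothetical helper and the clone step is represented by an arbitrary callable.

```python
import os
import tempfile


def with_scratch_checkout(work):
    """Run work(path) inside a fresh temporary directory.

    The directory and everything in it is removed when the block exits,
    even if work() raises.
    """
    with tempfile.TemporaryDirectory(prefix="ccos-") as path:
        return work(path), path
```

Each run then starts from an empty directory, so a clone can replace the pull into a possibly stale checkout.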
Remove duplicate files (CODE_OF_CONDUCT.md, CONTRIBUTING.md) from the repository to ensure alignment with organizational defaults as outlined in the GitHub Repo Guidelines.
Also, update the readme file.
Related to:
The instructions in README mention standardize_label.py whereas the correct script name is normalize_repos.py
Master-slave is an oppressive metaphor that will and should never become fully detached from history. Aside from being unprofessional and oppressive it stifles participation
(1.1. Master-slave - Terminology, Power and Oppressive Language)
The normalize_repos/normalize_repos.py script should be updated to support both main and master as the primary branch.
normalize_repos.py creates issue spam when an issue has 🏷️ status: label work required by removing and adding that label
We shouldn't spam issues with bad script logic
2024-03-31 UTC: Normalize Repos #1549: Scheduled
2024-03-31 00:20:07,734 - INFO - .....get_invalid_issues_in_repo: Checking
labels on '[Functionality] make `og:title` differ from `<title>`, so that
third-parties which support Open Graph data import/display utilize an alternate
non-hyphenated format'...
2024-03-31 00:20:09,481 - INFO - ......are_issue_labels_valid: Issue
'[Functionality] make `og:title` differ from `<title>`, so that third-parties
which support Open Graph data import/display utilize an alternate
non-hyphenated format' has missing labels.
2024-03-31 00:20:09,482 - SUCCESS - .....get_invalid_issues_in_repo: done.
vocabulary-theme:
- issue: '[Functionality] make `og:title` differ from `<title>`, so that third-parties
which support Open Graph data import/display utilize an alternate non-hyphenated
format'
reason: 'Missing labels from label groups: status'
url: https://github.com/creativecommons/vocabulary-theme/issues/41
2024-03-30 UTC: Normalize Repos #1548: Scheduled
2024-03-30 00:18:23,462 - INFO - .....get_invalid_issues_in_repo: Checking labels on '[Functionality] make `og:title` differ from `<title>`, so that third-parties which support Open Graph data import/display utilize an alternate non-hyphenated format'...
2024-03-30 00:18:25,141 - INFO - ......are_issue_labels_valid: Issue '[Functionality] make `og:title` differ from `<title>`, so that third-parties which support Open Graph data import/display utilize an alternate non-hyphenated format' is OK.
2024-03-30 00:18:25,141 - SUCCESS - .....get_invalid_issues_in_repo: done.
vocabulary-theme: []
2024-03-29 UTC: Normalize Repos #1547: Scheduled
2024-03-29 00:17:18,515 - INFO - .....get_invalid_issues_in_repo: Checking
labels on '[Functionality] make `og:title` differ from `<title>`, so that
third-parties which support Open Graph data import/display utilize an alternate
non-hyphenated format'...
2024-03-29 00:17:20,129 - INFO - ......are_issue_labels_valid: Issue
'[Functionality] make `og:title` differ from `<title>`, so that third-parties
which support Open Graph data import/display utilize an alternate
non-hyphenated format' has missing labels.
2024-03-29 00:17:20,129 - SUCCESS - .....get_invalid_issues_in_repo: done.
- issue: '[Functionality] make `og:title` differ from `<title>`, so that third-parties
which support Open Graph data import/display utilize an alternate non-hyphenated
format'
reason: 'Missing labels from label groups: status'
url: https://github.com/creativecommons/vocabulary-theme/issues/41
2024-03-28 UTC: Normalize Repos #1546: Scheduled
2024-03-28 00:18:18,045 - INFO - .....get_invalid_issues_in_repo: Checking
labels on '[Functionality] add formats and canonical url meta areas to top
section of the `deed` and `legal code` page contexts as patch styles'...
2024-03-28 00:18:18,863 - INFO - ......are_issue_labels_valid: Issue
'[Functionality] add formats and canonical url meta areas to top section of the
`deed` and `legal code` page contexts as patch styles' is OK.
2024-03-28 00:18:18,864 - SUCCESS - .....get_invalid_issues_in_repo: done.
vocabulary-theme: []
creativecommons/vocabulary-theme#41:
GitHub Actions
sync_community_teams.py adds teams without write permissions to CODEOWNERS
For example:
Unknown owner on line 6: make sure the team @creativecommons/ct-vocabulary-collaborators exists, is publicly visible, and has write access to the repository
* @creativecommons/ct-vocabulary-collaborators
Filter teams so that only those with write permissions (push permission according to PyGithub) are added to CODEOWNERS.
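The filter could be sketched as follows, assuming the script has already collected each team's permission level on the repository as a string. The level names follow the GitHub REST API (pull, triage, push, maintain, admin); the input mapping and the function name are hypothetical.

```python
# Permission levels that allow pushing to the repository.
WRITE_LEVELS = {"push", "maintain", "admin"}


def teams_with_write_access(team_permissions):
    """Keep the team slugs whose permission level allows pushing.

    team_permissions maps a team slug to its permission level
    on the repository.
    """
    return [
        slug
        for slug, level in team_permissions.items()
        if level in WRITE_LEVELS
    ]
```

Only the teams returned here would be written to CODEOWNERS, which avoids the "Unknown owner" validation error shown above.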
The "add community PR" workflow shows an error when a PR is already in a project e.g.
Error while adding Pull Request Card To Column [Project already has the associated issue]
(See: https://github.com/creativecommons/ccos-scripts/runs/552801037?check_suite_focus=true)
PRs that are already in a Project should not be added to a project at all, and so there should be no error.
normalize_repos.py fails due to archived repositories
pipenv run ./normalize_repos.py -r project_creativecommons.org
Logging excerpt:
<module>: Unhandled exception: Traceback (most recent call last):
File "/Users/timidrobot/CreativeCommons/git/ccos-scripts/./normalize_repos.py", line 183, in <module>
main()
File "/Users/timidrobot/CreativeCommons/git/ccos-scripts/./normalize_repos.py", line 177, in main
validate_issue_labels(args, repos)
File "/Users/timidrobot/CreativeCommons/git/ccos-scripts/./normalize_repos.py", line 107, in validate_issue_labels
validate_issues(repos, required_label_groups)
File "/Users/timidrobot/CreativeCommons/git/ccos-scripts/ccos/norm/validate_issues.py", line 107, in validate_issues
invalid_issues[repo.name] = get_invalid_issues_in_repo(
File "/Users/timidrobot/CreativeCommons/git/ccos-scripts/ccos/norm/validate_issues.py", line 87, in get_invalid_issues_in_repo
are_valid, reason = are_issue_labels_valid(
File "/Users/timidrobot/CreativeCommons/git/ccos-scripts/ccos/norm/validate_issues.py", line 55, in are_issue_labels_valid
issue.add_to_labels(LABEL_WORK_REQUIRED_LABEL)
File "/Users/timidrobot/.local/share/virtualenvs/ccos-scripts-8EVzyQrj/lib/python3.10/site-packages/github/Issue.py", line 329, in add_to_labels
headers, data = self._requester.requestJsonAndCheck(
File "/Users/timidrobot/.local/share/virtualenvs/ccos-scripts-8EVzyQrj/lib/python3.10/site-packages/github/Requester.py", line 442, in requestJsonAndCheck
return self.__check(
File "/Users/timidrobot/.local/share/virtualenvs/ccos-scripts-8EVzyQrj/lib/python3.10/site-packages/github/Requester.py", line 487, in __check
raise self.__createException(status, responseHeaders, data)
github.GithubException.GithubException: 403 {"message": "Repository was archived so is read-only.", "documentation_url": "https://docs.github.com/rest/issues/labels#add-labels-to-an-issue"}
Archived repositories should be skipped
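A minimal sketch of such a filter. PyGithub repository objects expose an archived attribute; here SimpleNamespace objects stand in for them so the example runs without network access, and active_repos is a hypothetical helper name.

```python
from types import SimpleNamespace


def active_repos(repos):
    """Yield only repositories that are not archived (archived repositories
    are read-only, so label changes on them fail with a 403)."""
    for repo in repos:
        if getattr(repo, "archived", False):
            continue
        yield repo


# Stand-in objects instead of real PyGithub repositories.
repos = [
    SimpleNamespace(name="vocabulary-theme", archived=False),
    SimpleNamespace(name="project_creativecommons.org", archived=True),
]
print([repo.name for repo in active_repos(repos)])
```

Applying the filter once, where the list of repositories is built, keeps every later step (labels, branch protections, issue validation) from touching read-only repositories.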
Many Python files in the repo use the __all__ list to limit exported objects. Every item in the list should be a string containing the name of an exported variable.
Currently the variable contains the variables themselves instead of their names, like this.
def set_labels(*args):
    pass
__all__ = [set_labels]
Instead, the correct way would be this.
- __all__ = [set_labels]
+ __all__ = ["set_labels"]
from set_labels import *
TypeError: Item in set_labels.__all__ must be str, not function
Wildcard imports, while discouraged, should still work.
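A small check along these lines could catch the mistake across the repository. It is only a sketch: non_string_exports is a hypothetical helper, and the throwaway module below repeats the error from the example above.

```python
import types


def non_string_exports(module):
    """Return the entries of module.__all__ that are not strings."""
    exported = getattr(module, "__all__", [])
    return [item for item in exported if not isinstance(item, str)]


# Build a throwaway module that repeats the mistake described above.
demo = types.ModuleType("demo")


def set_labels(*args):
    pass


demo.set_labels = set_labels
demo.__all__ = [set_labels]  # wrong: the object itself, not its name
```

Calling non_string_exports(demo) returns the offending function object; after changing the list to ["set_labels"] it returns an empty list.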
GitHub token authentication fails.
I think something changed between Python versions that impacted the behavior of global variables.
The end result is that the following will result in https://None:None:
ccos-scripts/ccos/teams/set_codeowners.py
Line 279 in cebc132
Ope, I was in the middle of too many errors and didn't save good artifacts for this issue
Authentication is a necessity
In the README, under the section Workflows, it is mentioned that generate_projects_page.yml uses the normalize_repos directory and normalize_repos.yml uses the generate_project_pages directory.
It should be the other way round.
I am interested to solve this issue
This isn't actually an issue as it's fine to not have an empty slack key but the assignment made is meaningless, hence a reordering would be needed.
We need an accessibility label for the following repos:
There might be more repos that need it.
Tickets related to accessibility can be tagged with this label for ease-of-classification.
I'd prefer a label like aspect:a11y for consistency with the labelling scheme in Vocabulary repos, but that would not fit in with the other repos in the list.
The script that gathers information from CC repositories doesn't recognize some repo licenses, for instance wp-theme-summit at the bottom of the list. It says "GNU General Public License v2.0".
We currently use AWS Lambda to run the generate project page script, but it's set up manually and is probably unnecessary. It's also hard to add new dependencies to the environment.
We do want the script run every day so that the projects page is always up to date.
Set up a GitHub Actions workflow in this repo to run the script every day.
boto3 since that's only used on AWS Lambda.
lambda-git package that's used with something else.
I need to set up a second account with admin permissions and 2FA to use for CC Open Source automation. Currently, automation is performed using my token and it's not clear what I do and what happens automatically.
Thanks to @dhruvkb for raising this.