snyk-playground / snyk-sync

A way to ensure your GitHub Repos are monitored by Snyk

Python 93.05% Dockerfile 2.16% Shell 4.79%
snyk org-ie-playground team-cse

snyk-sync's Introduction

REPOSITORY ARCHIVED

Snyk Sync has been renamed to snyk-scm-mapper. This repository (and its Docker images) will be kept as a public archive; all new development will happen in the new repository.

Snyk Sync

A way to ensure that your GitHub repos containing projects that can be monitored by Snyk are in fact monitored by Snyk.

How does this work?

Snyk Sync connects to GitHub to retrieve a list of repositories from one or more GitHub organizations, cross references that list with the projects it can detect in a Snyk Group, and generates a list of Targets for Snyk API Import to have Snyk attempt to monitor those unmonitored repositories.
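The cross-referencing step described above can be pictured as a set difference. The following is a minimal sketch, not Snyk Sync's actual code: the function name and the plain lists of `owner/name` strings are assumptions made for illustration, as is the case-insensitive comparison.

```python
def unmonitored_repos(github_repos, snyk_project_repos):
    """Return repos that exist in GitHub but have no project in Snyk.

    Both arguments are iterables of "owner/name" strings (a hypothetical,
    simplified stand-in for the data Snyk Sync pulls from the two APIs).
    Names are compared case-insensitively, which is an assumption.
    """
    monitored = {name.lower() for name in snyk_project_repos}
    return sorted(repo for repo in github_repos if repo.lower() not in monitored)


github_repos = ["acme/api", "acme/web", "acme/docs"]
snyk_project_repos = ["acme/api"]
print(unmonitored_repos(github_repos, snyk_project_repos))
# prints ['acme/docs', 'acme/web']
```

The repositories left over are the candidates that would be turned into import targets.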

Snyk Sync checks whether a repository has an import.yaml file in the .snyk.d/ directory at the repository root. This file specifies the Snyk Organization that any projects imported from the repository will be added to, and any tags that should be added to those projects.

If there is no import.yaml file, or the organization specified is not in the snyk-orgs.yaml approved list, the projects will go to the default organization as configured in the snyk-sync.yaml file.
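The fallback rule can be sketched in a few lines. This is an illustration under assumptions: `resolve_org` and its arguments are hypothetical names, and the approved list is modelled as a plain collection of org names rather than the parsed snyk-orgs.yaml.

```python
def resolve_org(import_org, approved_orgs, default_org):
    """Pick the Snyk org for a repository.

    import_org:    org named in the repo's import.yaml, or None if absent
    approved_orgs: org names found in snyk-orgs.yaml (hypothetical shape)
    default_org:   default org from snyk-sync.yaml
    """
    if import_org is not None and import_org in approved_orgs:
        return import_org
    # No import.yaml, or the requested org is not approved: use the default.
    return default_org


approved = {"ie-playground", "platform-team"}
print(resolve_org("platform-team", approved, "ie-playground"))  # prints platform-team
print(resolve_org("unknown-team", approved, "ie-playground"))   # prints ie-playground
print(resolve_org(None, approved, "ie-playground"))             # prints ie-playground
```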

Snyk Sync can be run by hand, by a scheduler, or in a GitHub workflow (see: config-repo for a GitHub workflow implementation).

Assumptions:

  • A repository is considered monitored if it already has a single project (there are tools such as scm-refresh that will allow one to reprocess existing repositories and it is on the Snyk roadmap to reprocess them natively)
  • Tags are additive: Any tags specified in the import.yaml will be added to all projects from the same repository. If the tag already exists as an exact match, it will not be added, and existing tags not declared in import.yaml will not be removed. Snyk allows for duplicate Key names, so "application:database" and "application:frontend" are both valid K:V tags that could be on the same project. This is not a suggestion to do this, but pointing out it is possible.
  • Forks: Because of how GitHub's indexing works, it will not search forks. Snyk Sync uses GitHub's search functionality to detect import.yaml files (to keep API calls to a minimum). In order to add forks, use the --forks flag to have Snyk Sync search each fork individually for the import.yaml file. CAUTION: This will incur an API cost of at least one request per fork, and two if the fork contains an import.yaml.
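The additive tag rule from the assumptions above can be sketched as follows. The function name and the representation of tags as `(key, value)` tuples are assumptions for illustration only.

```python
def tags_to_add(existing, declared):
    """Return the declared tags that are not already on the project.

    Tags are (key, value) tuples. Only exact matches count as duplicates,
    so two tags may share a key, and nothing is ever removed.
    """
    present = set(existing)
    missing = []
    for tag in declared:
        if tag not in present:
            missing.append(tag)
            present.add(tag)  # avoid adding the same declared tag twice
    return missing


existing = [("application", "database")]
declared = [("application", "database"), ("application", "frontend")]
print(tags_to_add(existing, declared))
# prints [('application', 'frontend')]
```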

Topics

If an Org in the snyk-orgs.yaml file includes a list of topics, any repo that has matching topics will be assigned that org instead of the default. This happens before import.yaml evaluation. If multiple orgs match a single repo, the org with the highest number of matching topics is assigned. If there is a tie in matches, the first org by alphabetical order of org name is selected.
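A minimal sketch of this selection rule, assuming the org topics have already been loaded into a plain dictionary of org name to topic list. That dictionary is a hypothetical stand-in and says nothing about how topics are actually written in snyk-orgs.yaml.

```python
def org_for_topics(repo_topics, org_topics, default_org):
    """Choose the org whose topic list overlaps most with the repo's topics.

    org_topics: dict mapping org name -> list of topics (hypothetical shape).
    Ties go to the alphabetically first org name; no match keeps the default.
    """
    repo_topics = set(repo_topics)
    best_org, best_count = default_org, 0
    # Walking orgs alphabetically and requiring a strictly higher count
    # means the first org in a tie keeps its place.
    for org in sorted(org_topics):
        count = len(repo_topics & set(org_topics[org]))
        if count > best_count:
            best_org, best_count = org, count
    return best_org


org_topics = {"beta": ["python", "web"], "alpha": ["python", "api"]}
print(org_for_topics(["python"], org_topics, "default"))         # prints alpha
print(org_for_topics(["python", "web"], org_topics, "default"))  # prints beta
print(org_for_topics(["rust"], org_topics, "default"))           # prints default
```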

Order of Precedence (most specific wins)

If a repo has a topic matching an org's topics list, and an import.yaml listing an org, who wins? The import.yaml does.

In the import.yaml, a top level org definition applies to all repos, unless a branch has an org listed.

An instance can be considered a prefilter to the import.yaml, because it is applied to the import.yaml first, then the branch overrides are evaluated.

A Repo is found -> Does an org match the topics? Yes -> Change org ->
  Does an import.yaml exist? Yes -> Evaluate for Instance ->
  Is an Org declared? -> Change org for all listed branches ->
  Does a branch have an org? -> Change org for specific branch
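The flow above amounts to applying candidates from least to most specific and keeping the last one that is set. A sketch under that reading (the function and argument names are hypothetical, and instance filtering is left out):

```python
def effective_org(default_org, topic_org=None, import_org=None, branch_org=None):
    """Apply org overrides in order of increasing specificity.

    topic_org:  org matched through repository topics, if any
    import_org: top-level org declared in import.yaml, if any
    branch_org: org declared for the specific branch, if any
    """
    org = default_org
    for candidate in (topic_org, import_org, branch_org):
        if candidate:
            org = candidate  # a more specific setting replaces the earlier one
    return org


print(effective_org("default", topic_org="team-a"))                       # prints team-a
print(effective_org("default", topic_org="team-a", import_org="team-b"))  # prints team-b
print(effective_org("default", import_org="team-b", branch_org="team-c")) # prints team-c
```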

Caching

If one has a large organization with many hundreds or thousands of repositories, the process of discovering all of them can be time-consuming. In order to speed up this process, Snyk Sync builds a 'watchlist' in a cache directory (by default cache). It will only perform a sync (querying both GitHub and Snyk APIs) if the data is more than 60 minutes old (change with: --cache-timeout) or a sync is forced (--sync). This allows the targets and tags subcommands to operate much more quickly. Depending on the size of the targets list given to snyk-api-import, it may take a long time for the project imports to complete, after which another sync should be performed and the tags command run to ensure that any newly created projects are updated with their associated tags.
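The decision of when to sync can be sketched as a simple age check. This is an illustration, not the project's implementation: the function name and the idea of passing the last sync time in directly are assumptions.

```python
from datetime import datetime, timedelta


def needs_sync(last_sync, now, cache_timeout_minutes=60, force=False):
    """Return True if the cache should be refreshed.

    last_sync: datetime of the last successful sync, or None if never synced
    force:     mirrors the --sync flag, which bypasses the age check
    """
    if force or last_sync is None:
        return True
    return now - last_sync > timedelta(minutes=cache_timeout_minutes)


now = datetime(2022, 1, 1, 12, 0)
print(needs_sync(datetime(2022, 1, 1, 11, 30), now))              # prints False
print(needs_sync(datetime(2022, 1, 1, 10, 59), now))              # prints True
print(needs_sync(datetime(2022, 1, 1, 11, 30), now, force=True))  # prints True
```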

Setup

See scenarios

Snyk Sync expects the GITHUB_TOKEN and SNYK_TOKEN environment variables to be present, along with a snyk-sync.yaml file, a snyk-orgs.yaml file, and a folder to store the cache in (it will not create this folder). See the example directory for a starting point.

example
├── cache
├── snyk-orgs.yaml
└── snyk-sync.yaml

  • GITHUB_TOKEN: this access token must have read access to all repositories in all GitHub organizations one wishes to import
  • SNYK_TOKEN: this should be a group level service account that has admin access to create new projects and tag them

Minimum snyk-sync.yaml contents:

---
schema: 1
github_orgs:
  - <<Name of GitHub Org>>
snyk:
  group: <<Group ID from Snyk>>
default:
  orgName: ie-playground
  integrationName: github-enterprise

Example minimum snyk-orgs.yaml:

---
ie-playground:
  orgId: 39ddc762-b1b9-41ce-ab42-defbe4575bd6
  integrations:
    github-enterprise: b87e1473-37ab-4f09-a4e3-a0139a50e81e

To get the Organization ID, navigate to the settings page of the organization in question: https://app.snyk.io/org/<org-name>/manage/settings

To get the GitHub Enterprise integration ID (currently the GitHub Enterprise integration is the only supported integration for Snyk Sync, but it can be used with a GitHub.com Org as well), navigate to: https://app.snyk.io/org/<org-name>/manage/integrations/github-enterprise

Help

Base snyk-sync flags/environment variables

Usage: cli.py [OPTIONS] COMMAND [ARGS]...

Options:
  --cache-dir DIRECTORY    Cache location  [env var: SNYK_SYNC_CACHE_DIR;
                           default: cache]
  --cache-timeout INTEGER  Maximum cache age, in minutes  [env var:
                           SNYK_SYNC_CACHE_TIMEOUT; default: 60]
  --forks / --no-forks     Check forks for import.yaml files  [env var:
                           SNYK_SYNC_FORKS; default: no-forks]
  --conf FILE              [env var: SNYK_SYNC_CONFIG; default: snyk-
                           sync.yaml]
  --targets-file FILE      [env var: SNYK_SYNC_TARGETS_FILE]
  --snyk-orgs-file FILE    Snyk orgs to watch  [env var: SNYK_SYNC_ORGS]
  --default-org TEXT       Default Snyk Org to use from Orgs file.  [env var:
                           SNYK_SYNC_DEFAULT_ORG]
  --default-int TEXT       Default Snyk Integration to use with Default Org.
                           [env var: SNYK_SYNC_DEFAULT_INT]
  --snyk-group UUID        Group ID, required but will scrape from ENV  [env
                           var: SNYK_SYNC_GROUP; required]
  --snyk-token UUID        Snyk access token  [env var: SNYK_TOKEN; required]
  --sync                   Forces a sync regardless of cache status
  --github-token TEXT      GitHub access token  [env var: GITHUB_TOKEN;
                           required]
  --help                   Show this message and exit.

Commands:
  status   Return if the cache is out of date
  sync     Force a sync of the local cache of the GitHub / Snyk data.
  tags     Returns list of project id's and the tags said projects are...
  targets  Returns valid input for api-import to consume

targets command: Outputs the list of targets to stdout or saves them to a file. The output is formatted JSON that snyk-api-import accepts.

Usage: cli.py targets [OPTIONS]

  Returns valid input for api-import to consume

Options:
  --save  Write targets to disk, otherwise print to stdout
  --help  Show this message and exit.
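As a rough picture of what such output could look like, the sketch below builds a targets document. The field names (`targets`, `orgId`, `integrationId`, `target.owner`, `target.name`, `target.branch`) reflect the author's understanding of the snyk-api-import input format and should be checked against that tool's documentation; the function and the repository data are hypothetical. The IDs are the sample values from the snyk-orgs.yaml example.

```python
import json


def build_targets(repos, org_id, integration_id):
    """Build an import-targets document for a list of repositories.

    repos: list of dicts with "owner", "name" and "branch" keys
           (a hypothetical, simplified repository record).
    """
    targets = []
    for repo in repos:
        targets.append(
            {
                "orgId": org_id,
                "integrationId": integration_id,
                "target": {
                    "owner": repo["owner"],
                    "name": repo["name"],
                    "branch": repo["branch"],
                },
            }
        )
    return {"targets": targets}


doc = build_targets(
    [{"owner": "acme", "name": "web", "branch": "main"}],
    org_id="39ddc762-b1b9-41ce-ab42-defbe4575bd6",
    integration_id="b87e1473-37ab-4f09-a4e3-a0139a50e81e",
)
print(json.dumps(doc, indent=2))
```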
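
tags command: Outputs the project IDs and the tags those projects are missing, or updates the tags on the projects when --update is passed.

Usage: cli.py tags [OPTIONS]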

  Returns list of project id's and the tags said projects are missing

Options:
  --update  Updates tags on projects instead of outputting them
  --save    Write tags to disk, otherwise print to stdout
  --help    Show this message and exit.

Container Build Steps

This pushes to GitHub's container registry.

docker build --force-rm -f Dockerfile -t snyk-sync:latest .
docker tag snyk-sync:latest ghcr.io/snyk-playground/snyk-sync:latest
docker push ghcr.io/snyk-playground/snyk-sync:latest

Container Run Steps

docker pull ghcr.io/snyk-playground/snyk-sync:latest
docker tag ghcr.io/snyk-playground/snyk-sync:latest snyk-sync:latest
docker run --rm -it -e GITHUB_TOKEN -e SNYK_TOKEN -v "${PWD}":/runtime snyk-sync:latest --sync targets

Using a custom CA Root Certificate / proxies

If using a proxy, ensure that you are passing HTTP_PROXY and HTTPS_PROXY environment variables to the container runtimes.

-e HTTP_PROXY -e HTTPS_PROXY will create env variables with the same values as the machine that ran the docker command.

Custom Certificates

If you name the custom certificate bundle custom-ca.crt and place it in the base directory you are mounting as runtime, both the snyk-sync and api-import entrypoints will detect it and set the appropriate environment variables for Python and Node respectively. In most cases this directory is the config-repo itself, mounted to /runtime, which is the container's workdir.

So in most cases:

  • Rename your custom certificate bundle as custom-ca.crt
  • Ensure custom-ca.crt is in the root of your config-repo

If you want to specify your own path to the certificate bundle, ensure that the file is present before the entrypoints run, and set the following environment variables:

REQUESTS_CA_BUNDLE="/custom/path/to/ca-bundle.crt"
NODE_EXTRA_CA_CERTS="/custom/path/to/ca-bundle.crt"
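The detection the entrypoints perform can be approximated as below. This is a sketch of the idea only: the real entrypoints are shell scripts whose contents are not shown here, and the function name is hypothetical.

```python
from pathlib import Path


def ca_environment(runtime_dir, bundle_name="custom-ca.crt"):
    """Return the CA-related environment variables for a runtime directory.

    If the bundle is present, both Python (requests) and Node are pointed
    at it; otherwise nothing is set and the system defaults apply.
    """
    bundle = Path(runtime_dir) / bundle_name
    if not bundle.is_file():
        return {}
    return {
        "REQUESTS_CA_BUNDLE": str(bundle),
        "NODE_EXTRA_CA_CERTS": str(bundle),
    }


print(ca_environment("/runtime"))
```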

snyk-sync's People

Contributors

mrzarquon, nathan-roys, scott-es


snyk-sync's Issues

add support for filtering candidate repos for import

In order to operationalize, some snyk administrators will need the ability to filter which repositories this automation will consider for import into Snyk.

The first iteration could filter by repository name which would work well for repos using standardized naming conventions.

Could potentially filter on other metadata about the repository such as github topics or labels if that data is available.

A "jq-style" filter would be flexible enough to either exclude or include candidate repos as needed.
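For the first iteration (filtering by repository name), shell-style glob patterns may be enough. The sketch below uses globs rather than the jq-style filter proposed above, and the function and its parameters are hypothetical.

```python
from fnmatch import fnmatchcase


def filter_repos(names, include=("*",), exclude=()):
    """Keep repo names matching any include pattern and no exclude pattern."""
    kept = []
    for name in names:
        included = any(fnmatchcase(name, pattern) for pattern in include)
        excluded = any(fnmatchcase(name, pattern) for pattern in exclude)
        if included and not excluded:
            kept.append(name)
    return kept


names = ["svc-api", "svc-web", "sandbox-test"]
print(filter_repos(names, include=["svc-*"], exclude=["*-web"]))
# prints ['svc-api']
```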

Feat: Add support for multi group importing

Need to investigate how to pull down the data properly / decide on it.

It should be possible to build a map of Token -> Group -> Orgs in group

Then have a separate targets task that just pulls out the targets to import that match whatever orgs the one active SNYK_TOKEN can see, and/or those for which the --importer flag matches?

Do not load cache when forcing sync

Errors are sometimes thrown if there is a bad cache entry in cache/data.json, but we don't need to attempt to load the cache at all when forcing a sync (--sync) since it's not used. We should avoid loading the cache so that errors related to bad cached objects are not shown to the user (there is no real issue here).

@Rocco-Hash1

Repository organization not updated when topics changed in snyk-orgs.yaml

When there is already a cached list of repositories and the list of topics in snyk-orgs.yaml is changed in a manner that causes a repository to be mapped to a new Snyk organization, no change is made.

  • sync.py(107) if self.has_repo(repo.id): returns true
  • sync.py(111) if existing_repo.is_older(repo.updated_at): returns false
  • The add_repo function then returns without making any change. The old snyk organization is left in place, even though it changed.

The "is_older" if statement should probably contain an additional check to see if the org_name changed as well. Also note that snyk.py(125) may have a bug as well - if the repository changes back to the default that code will be skipped.
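The suggested extra check could look roughly like this. The classes in sync.py are not shown here, so `CachedRepo` and `should_replace` are hypothetical stand-ins that only illustrate the combined condition.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class CachedRepo:
    # Hypothetical, minimal stand-in for a cached repository entry.
    id: int
    org_name: str
    updated_at: datetime


def should_replace(existing, incoming):
    """Replace the cached entry if it is older OR its org mapping changed."""
    is_older = existing.updated_at < incoming.updated_at
    org_changed = existing.org_name != incoming.org_name
    return is_older or org_changed


cached = CachedRepo(1, "default-org", datetime(2022, 1, 1))
remapped = CachedRepo(1, "team-org", datetime(2022, 1, 1))
print(should_replace(cached, remapped))  # prints True
```

Comparing `org_name` in both directions also covers the case where a repository moves back to the default org.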

Document 'topics' structure in snyk-orgs.yaml

The README.md mentions that "If an Org in the snyk-orgs.yaml file includes a list of topics, any repo that has matching topics will be assigned that org instead of default".

However, there are no examples or documentation of how to specify topics in the snyk-orgs.yaml file. Can this please be documented?

Targets command fails when using integration other than github-enterprise

Running the command python cli.py --github-token <token here> --conf snyk-sync.yaml --snyk-orgs-file snyk-orgs.yaml --cache-dir cache targets results in an unhandled exception:

Traceback (most recent call last):
  File "C:\Users\username\documents\source\prodsec-snyk-config\snyk-sync\snyk_sync\cli.py", line 670, in <module>
    app()
  File "C:\Users\username\documents\source\prodsec-snyk-config\snyk-sync\.venv\lib\site-packages\typer\main.py", line 214, in __call__
    return get_command(self)(*args, **kwargs)
  File "C:\Users\username\documents\source\prodsec-snyk-config\snyk-sync\.venv\lib\site-packages\click\core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "C:\Users\username\documents\source\prodsec-snyk-config\snyk-sync\.venv\lib\site-packages\click\core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "C:\Users\username\documents\source\prodsec-snyk-config\snyk-sync\.venv\lib\site-packages\click\core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "C:\Users\username\documents\source\prodsec-snyk-config\snyk-sync\.venv\lib\site-packages\click\core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "C:\Users\username\documents\source\prodsec-snyk-config\snyk-sync\.venv\lib\site-packages\click\core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "C:\Users\username\documents\source\prodsec-snyk-config\snyk-sync\.venv\lib\site-packages\typer\main.py", line 500, in wrapper
    return callback(**use_params)  # type: ignore
  File "C:\Users\username\documents\source\prodsec-snyk-config\snyk-sync\snyk_sync\cli.py", line 439, in targets
    int_id = branch.integrations["github-enterprise"]
KeyError: 'github-enterprise'

The snyk-sync.yaml file has the following:

default:
   orgName: orgNameHere
   integrationName: github

Lines 436 and 439 of cli.py appear to have a bug: the YAML key github-enterprise is hardcoded. For users who are using regular GitHub and not GitHub Enterprise, these lines will always crash with an error. They should be updated to select the proper integration from configuration rather than being hard-coded.
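One possible shape for the fix is to look the integration up by its configured name and fail with a readable message. This is a sketch, not a patch against cli.py: `integration_id` is a hypothetical helper, and `integrations` stands for the mapping read from snyk-orgs.yaml.

```python
def integration_id(integrations, integration_name):
    """Look up an integration ID by the name configured in snyk-sync.yaml.

    integrations: dict of integration name -> ID for one org.
    Raises ValueError with a readable message instead of a bare KeyError.
    """
    try:
        return integrations[integration_name]
    except KeyError:
        known = ", ".join(sorted(integrations)) or "none"
        raise ValueError(
            f"integration '{integration_name}' is not defined for this org "
            f"(available: {known})"
        ) from None


integrations = {"github": "b87e1473-37ab-4f09-a4e3-a0139a50e81e"}
print(integration_id(integrations, "github"))
# prints b87e1473-37ab-4f09-a4e3-a0139a50e81e
```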

Improvement: Refactor to individual modules per command

Per: https://typer.tiangolo.com/tutorial/commands/context/

The POC phase left this codebase with all the logic inside the single cli.py file. As the features grow, this should be broken up into individual modules for each command.

The main changes would be:
Remove the global data (watchlist / s) and replace it with contexts.
Instead of referencing this common data, just store and pass the arguments (or the parsed data?) as a context obj, which all the downstream modules would get. For example:

file: main.py

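import typer
import subcommand

app = typer.Typer()
app.add_typer(subcommand.app, name="subcommand")

@app.callback(invoke_without_command=True)
def main(ctx: typer.Context, otherinput: str = ""):
  # parse() is a placeholder for whatever turns the options into shared state
  ctx.obj = parse(otherinput)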

Then the subcommand, which only has one command itself:

file: subcommand.py

import typer

app = typer.Typer()

@app.callback(invoke_without_command=True)
def generate(ctx: typer.Context):
  # ctx.obj was set by main() before this callback runs
  parsed_configuration_from_main = ctx.obj

This doesn't make the subcommand stand alone, but would allow main.py --foo=bar subcommand to access foo in subcommand.py easily enough.

Add support for GitHub workflow usage without pull requests

The example at https://github.com/snyk-playground/config-repo/blob/main/.github/workflows/perform-import.yml shows how to use a GitHub workflow to run the snyk-sync tool. To store its cache, it requires continual commits (via Pull Requests) back to the repository.

Instead, using the official actions/cache GitHub action could allow the sync to work using native GitHub workflow caching. Ideally, the "sync" or "config" repository shouldn't ever require commits for regular operation - perhaps with an update to the example workflow and some code adjustments this would be possible?

Deleted Snyk projects not removed from cache

The sync() function in cli.py will load all existing repos from the cache (load_watchlist()), list all GitHub repositories, then list all Snyk projects matching those repositories and add or update information as appropriate.

However, it does not check whether a Snyk project no longer exists for a repository and, if so, remove it from the cache. Regardless of the intent, it leads to multiple issues:

  • If a Snyk project is accidentally deleted or intentionally deleted to force the importer to reimport it, the importer might believe it's already imported and skip it.
  • The "tags" command will fail with a 404 when it tries to add a tag to a project that does not exist.

The latter is causing our synchronization to fail: it continually tries to add tags to projects that don't exist and crashes.
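A pruning step along these lines could address both symptoms. The cache layout used here (repository name mapped to a list of project IDs) is a hypothetical simplification, as is the function name.

```python
def prune_deleted_projects(cached_projects, live_project_ids):
    """Drop cached project IDs that Snyk no longer reports.

    cached_projects:  dict of repo name -> list of cached project IDs
    live_project_ids: iterable of project IDs returned by the latest sync
    Repos left with an empty list are unmonitored again and can be re-imported.
    """
    live = set(live_project_ids)
    return {
        repo: [pid for pid in project_ids if pid in live]
        for repo, project_ids in cached_projects.items()
    }


cache = {"acme/api": ["p1", "p2"], "acme/web": ["p3"]}
print(prune_deleted_projects(cache, ["p1"]))
# prints {'acme/api': ['p1'], 'acme/web': []}
```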

support custom CA cert when connecting to SCM

urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='snyk.io', port=443): Max retries exceeded with url: /api/v1/orgs (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1129)')))

Add support for recording github repo ID/node id

While we access and work with repos by name, a repository can be renamed, but the id and node id should remain the same. We should record those so that we can later detect a renamed repo (instead of treating it as one repo removed and one added):
API Snippet:

"id": 1296269,
"node_id": "MDEwOlJlcG9zaXRvcnkxMjk2MjY5"
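With the id recorded, a rename shows up as the same id carrying a different name between two syncs. A minimal sketch, assuming each sync yields a dictionary of repository id to full name (a hypothetical structure; the names in the example are made up):

```python
def detect_renames(previous, current):
    """Return {repo_id: (old_name, new_name)} for repos whose name changed.

    previous, current: dicts of GitHub repository id -> full name.
    Ids present in only one of the two are real additions or removals.
    """
    return {
        repo_id: (previous[repo_id], current[repo_id])
        for repo_id in previous.keys() & current.keys()
        if previous[repo_id] != current[repo_id]
    }


previous = {1296269: "acme/hello-world", 42: "acme/api"}
current = {1296269: "acme/hello-universe", 42: "acme/api", 77: "acme/new"}
print(detect_renames(previous, current))
# prints {1296269: ('acme/hello-world', 'acme/hello-universe')}
```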

Allow topics in snyk-orgs.yaml to specify Snyk tags

We're planning on using a centralized snyk-orgs.yaml file to drive the majority of our "snyk-sync" imports. Our GitHub repositories will have topics that act a bit like team/platform/application tags. We also currently tag our projects in Snyk so that our vulnerability management platform can integrate all of our data on a team-by-team basis.

I'd like the ability to specify one or more Snyk key-value tag pairs for each topic in the snyk-orgs.yaml file. For example:

  1. Does an org match the topics for a repo? Yes -> include any tags specified for that org-topic match in the snyk-orgs.yaml file.
  2. Does an import.yaml file exist? Yes -> include any additional tags specified in the import.yaml file.

Bonus points if it doesn't add multiple tags for the same key, case-insensitive. This is a weird quirk of Snyk that is different than every other tagging system out there, and causes us quite a few headaches :)
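The "bonus" behaviour could be sketched as a merge keyed on the lower-cased tag key. Which source should win on a conflict is not stated above; this sketch assumes later (more specific) sources such as import.yaml override earlier ones, and the function name and tuple representation are hypothetical.

```python
def merge_tags(*sources):
    """Merge tag lists so that each key appears once, case-insensitively.

    Each source is a list of (key, value) tuples, ordered from least to
    most specific. A later tag replaces an earlier one with the same key.
    """
    merged = {}
    for tags in sources:
        for key, value in tags:
            merged[key.lower()] = (key, value)
    return list(merged.values())


topic_tags = [("Team", "payments"), ("platform", "aws")]
import_yaml_tags = [("team", "checkout")]
print(merge_tags(topic_tags, import_yaml_tags))
# prints [('team', 'checkout'), ('platform', 'aws')]
```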

support per branch import inclusion/exclusion

With adding support for branches, we should add support for per branch tags / importer:

branches:
  - development
  - main:
     importer: 
        - <snyk-sync import instance>
     tags:
       - key:value

With the above metadata added to an import.yaml file, snyk-sync targets --custom-branches --importer 'appsec-team' will import default branch or a custom branch with importer: appsec-team. If a branch is imported, its tags override the repo tags in the instance of duplicate key names.

Running snyk-sync targets --custom-branches without --importer specified will pull in all custom branches, --no-custom-branches will just pull in default branches

--importer and --no-importer can be chained:

--no-importer 'appsec-team' will invert the match and exclude appsec-team

snyk-sync targets --custom-branches --no-default --no-importer 'appsec-team' --importer 'dev-team' --importer 'devops'

would exclude default branches, branches with importer='appsec-team', but would include branches flagged as 'dev-team' and 'devops'

After filtering if no branches remain, the target is removed from the targets list.
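One way to read the proposed filtering rules is sketched below. The branch records, the parameter names, and the choice to apply `--no-importer` exclusions before anything else are all assumptions made for illustration; the proposal above does not pin these details down.

```python
def select_branches(branches, custom_branches=True, include_default=True,
                    importers=(), no_importers=()):
    """Return the names of branches that survive the proposed filters.

    branches: list of dicts with "name", optional "default" (bool) and
              optional "importers" (list of importer names).
    """
    wanted, unwanted = set(importers), set(no_importers)
    selected = []
    for branch in branches:
        tagged = set(branch.get("importers", ()))
        if tagged & unwanted:
            continue  # excluded by --no-importer
        if branch.get("default", False):
            if include_default:
                selected.append(branch["name"])
            continue
        if not custom_branches:
            continue  # --no-custom-branches keeps default branches only
        if wanted and not (tagged & wanted):
            continue  # --importer given, but this branch is not flagged for it
        selected.append(branch["name"])
    return selected  # an empty list means the target is dropped


branches = [
    {"name": "main", "default": True},
    {"name": "development", "importers": ["dev-team"]},
    {"name": "release", "importers": ["appsec-team"]},
    {"name": "hotfix", "importers": ["devops"]},
]
print(select_branches(branches, include_default=False,
                      importers=["dev-team", "devops"],
                      no_importers=["appsec-team"]))
# prints ['development', 'hotfix']
```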

Do not require "groups" in snyk-sync

In the current implementation, there's a design conflict between the snyk-sync.yaml file and the CLI parameters.

I tried invoking the CLI with the argument --snyk-token and leaving out either the token_env_name parameter for a group or the groups entirely. In both cases, I was given an error.

This means that the --snyk-token CLI argument is effectively useless; it's currently a requirement to place the token in an environment variable and then reference the environment variable from the config file. However, the vast majority of users will only ever have one group, so a simplified way of invoking the CLI without having to provide a list of groups would be advantageous. If that change doesn't make sense, then the --snyk-token argument should be removed.
