18f / concourse-compliance-testing

Concourse CI assets for Compliance Toolkit

Home Page: https://compliance-viewer.18f.gov/

License: Other

Shell 5.83% Ruby 94.17%

concourse-compliance-testing's Introduction

Concourse CI Compliance Testing

This is a Concourse pipeline that scans sites for vulnerabilities using OWASP ZAP. This is part of 18F's Compliance Toolkit project, and is essentially the back end of Compliance Viewer.

Adding a Project

The file config/targets.json lists the projects to be scanned. Since ZAP can inject junk data into a site when it successfully finds certain vulnerabilities, we suggest using a staging URL. To get a new project added:

  1. Submit a pull request to this repository to add an entry in config/targets.json like this:

    {
      // Needs to be all lower-case.
      "name": "NAME",
      // (optional) Channel in the 18F Slack to get notifications in.
      "slack_channel": "CHANNEL",
      // Links to scan.
      "links": [
        {
          "url": "URL"
        }
      ]
    }
  2. After the pull request is merged, ask someone in #cloud-gov-highbar to run

    TARGET=<fly_target> rake init_targets
    TARGET=<fly_target> rake deploy

Attributes

  • name - This should be all lowercase.
  • slack_channel (optional) - This should be the channel where you'd like to get alerts for completed scans. If left out, the alerts will be sent to the default channel, currently #ct-bot-attack.
  • links - An array of links that should be scanned with ZAP. The results will be concatenated together.
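
The attribute rules above can be checked mechanically. Below is a minimal sketch of a validator for a single targets.json entry; `validate_target` is a hypothetical helper written for this illustration, not part of the repository.

```ruby
require "json"

# Hypothetical validator for one targets.json entry, following the
# documented attributes: lowercase name, optional slack_channel, and a
# non-empty links array of {"url": ...} objects.
def validate_target(entry)
  errors = []
  name = entry["name"].to_s
  errors << "name is required" if name.empty?
  errors << "name must be all lowercase" unless name == name.downcase
  links = entry["links"]
  if links.is_a?(Array) && !links.empty?
    links.each { |l| errors << "each link needs a url" unless l.is_a?(Hash) && l["url"] }
  else
    errors << "links must be a non-empty array"
  end
  errors
end

entry = JSON.parse('{"name": "example", "links": [{"url": "https://staging.example.gov/"}]}')
validate_target(entry) # => []
```

Running this in a pull-request check would catch the most common mistakes (uppercase names, missing links) before a deploy.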

Process Overview

Inputs

The running pipeline depends on this repository for the tasks to be performed and targets to scan. By default, the pipeline pulls the master branch for these tasks, but it can be pointed at a different branch for testing.

Outputs

Normal users of Compliance Toolkit do not need access to the Concourse CI. The pipeline publishes output in a few different modes.

Primarily, the pipeline publishes the ZAP scan results as a JSON file to S3. This is the information that is consumed by the user via Compliance Viewer.

The pipeline also publishes two types of Slack notifications. The first is a heartbeat notification; it is published to a central channel (currently #ct-bot-attack, but configurable in the pipeline) after every run to confirm that the run happened. This lets the Compliance Toolkit team monitor that the process is functioning.

The second is for the project teams. It is published to the channel defined in targets.json, or the central channel (as the above notifications) if no channel is defined. It is only published if there is a change in the results. It also includes a link to the results in Compliance Viewer.
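
The routing just described can be sketched in a few lines of Ruby. This is illustrative only; `notifications` and its message-hash format are invented here and are not the pipeline's actual code.

```ruby
DEFAULT_CHANNEL = "#ct-bot-attack" # the central channel named above

# Heartbeat always goes to the central channel; the project notification
# goes to the project's channel (or the central one if none is defined)
# and only fires when the results changed.
def notifications(target, previous_results, current_results)
  messages = [{ channel: DEFAULT_CHANNEL, type: :heartbeat }]
  if previous_results != current_results
    messages << { channel: target["slack_channel"] || DEFAULT_CHANNEL, type: :notification }
  end
  messages
end
```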

Process

For each project, there are two jobs defined: a scheduled job and an on-demand job. This is due to an oddity in the way Concourse jobs are triggered: if a job has a time-based trigger, it cannot also be run at another time. The scheduled job runs every day at midnight. All of the project scans are triggered simultaneously, but there are a limited number of workers available, so scans queue until a worker becomes free.

Each scan is a multi-step process:

  1. Triggered at 12:00 AM.
  2. Retrieves scripts to run from the GitHub repository.
  3. Retrieves the prior scan results from S3.
  4. Performs some filtering/scrubbing of the prior scan results.
  5. Runs the ZAP scan via zap-cli. The scan has several sub-steps of its own:
    1. Runs the spider against the current target.
    2. Runs the AJAX spider against the current target.
    3. Scans the target.
    4. Outputs the detected alerts.
  6. Repeats the sub-steps above for every target defined for the project in targets.json.
  7. Concatenates the results files into a single file.
  8. Uploads the results file to S3.
  9. Summarizes the results and the difference between the prior and current scan.
  10. Posts the two Slack messages (heartbeat and notification, described above).
  11. Uploads the summary results to S3.

These steps are performed for each project in a parallelized fashion.
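
Steps 7 and 9 above (concatenating per-target results and diffing against the prior scan) can be sketched roughly as follows. The alert fields used for matching are assumptions for illustration, not the pipeline's actual schema.

```ruby
# Concatenate per-target alert lists into one results set (step 7).
def concatenate(result_sets)
  result_sets.flatten
end

# Summarize the difference between the prior and current scan (step 9),
# matching alerts by name and URL -- an assumed key, for illustration.
def summarize_diff(prior, current)
  key = ->(a) { [a["alert"], a["url"]] }
  { new: (current.map(&key) - prior.map(&key)).size,
    resolved: (prior.map(&key) - current.map(&key)).size }
end
```

A non-empty `new` count is what would trigger the project-channel Slack notification described above.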

Feedback

Give us your feedback! We'd love to hear it. Open an issue and tell us what you think.

Public domain

This project is in the worldwide public domain. As stated in CONTRIBUTING:

This project is in the public domain within the United States, and copyright and related rights in the work worldwide are waived through the CC0 1.0 Universal public domain dedication.

All contributions to this project will be released under the CC0 dedication. By submitting a pull request, you are agreeing to comply with this waiver of copyright interest.

concourse-compliance-testing's People

Contributors

adborden, adelevie, afeld, ctro, davidebest, dlapiduz, gbinal, jcscottiii, jeremiak, jmcarp, jseppi, juliaelman, linuxbozo, mogul, mzia, rogeruiz, shawnbot, stvnrlly


concourse-compliance-testing's Issues

Move to a new CI/CD system

This repository has been using Travis CI; however, access to Travis CI is being turned off. When you need to use this repository again, convert it to a CI/CD system that is in the ITSP, such as CircleCI.

remove dependence on Team API

A few issues:

  • Team API development has been stalled for a while (for various reasons), and the data isn't getting updated.
  • We are overriding the Team API data for most entries in targets.json, so there's cognitive and technical complexity in merging the data.
  • The Team API isn't relevant for projects not created/maintained by 18F, so there's diminishing benefit as we open up Compliance Toolkit to a broader range of users.
  • This project will likely need to point to staging URLs rather than production (as scanning sites can be destructive/disruptive), so Compliance Toolkit will want a different list of URLs than the ones teams provide for the Team API.

@mtorres253 FYI!

shrink the pipeline

Short of Concourse having additional features to handle what I'm calling multi-object builds, having the large number of groups in a single pipeline is no longer working; for starters, the group names no longer wrap, so the ones off to the right are no longer clickable.

(Screenshot from 2017-01-12: group names overflowing off the right edge of the pipeline UI.)

Since Concourse now supports teams (and #127 will recommend having a dedicated team for these scans), I suggest we instead generate one pipeline per site, rather than one group per site. There are tradeoffs (like not being able to easily pause all at once), but I think it's worthwhile.

look into failing builds for new projects

Seems to be consistent across all new projects after following the new project steps. In filter-project-data:

/tmp/build/af6792a7/scripts/lib/team_data_filterer.rb:32:in `initialize': No such file or directory @ rb_sysopen - /tmp/build/af6792a7/project-data/project.json (Errno::ENOENT)
    from /tmp/build/af6792a7/scripts/lib/team_data_filterer.rb:32:in `new'
    from /tmp/build/af6792a7/scripts/lib/team_data_filterer.rb:32:in `read_json'
    from ./scripts/tasks/filter-project-data/task.rb:6:in `<main>'

e.g. https://ci.cloud.gov/pipelines/zap/jobs/zap-ondemand-cg-landing/builds/2

/cc @dlapiduz
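
One way to make the failure above less opaque (a sketch, not the actual fix for this issue) is to guard the read and fall back to an empty document when project-data/project.json is missing:

```ruby
require "json"

# Defensive variant of reading project.json: a missing file yields an
# empty hash instead of crashing with Errno::ENOENT.
def read_json(path)
  return {} unless File.exist?(path)
  JSON.parse(File.read(path))
end
```

Whether to fall back silently or fail with a clearer error message is a judgment call; the point is to handle the missing input deliberately rather than let the exception surface from deep inside the task.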

filter-team-data is case sensitive, but it shouldn't be?

I added c2 to targets.json, which caused me to get

WARN: `c2` is missing from Team API data.

in Concourse.

The project's name in the Team API is C2. Confusingly (but only tangentially related), https://team-api.18f.gov/public/api/projects/C2/ returns a 404, but https://team-api.18f.gov/public/api/projects/c2/ returns successfully.
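
A case-insensitive lookup on the Team API side would avoid the warning. A minimal sketch, with an assumed data shape:

```ruby
# Match a targets.json name against Team API project entries without
# regard to case, so "c2" finds the project named "C2".
def find_project(team_projects, name)
  team_projects.find { |p| p["name"].to_s.downcase == name.downcase }
end
```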

clean up repository

  • Remove unused tasks/pipelines
  • Put tasks/pipelines into respective folders

@DavidEBest Happy to help with this – any quick pointers on which aren't needed?

test the tests

Would be really nice to be able to automatically verify that a pipeline works in a pull request (and have the commit status updated), without needing to take the contributor's word for it or pulling down and running the pipeline locally. Not sure how difficult this would be to do.

atf-eregs scan hangs creating a new ZAP session.

We saw this problem before, when we were relying on the new session command to loop over all the projects. When I ran fly intercept I found that it was stuck in ZAP, attempting to create a new session.

The ATF project consists of two VERY LARGE sites. I suspect there might be a bug in the new-session creation code when there is a lot of data in the previous session. A possible workaround (and test) would be to remove the new-session mechanism and just recreate the ZAP instance for every site. That incurs a greater startup penalty, but that is less of a concern now that we aren't looping over all the projects within a single session.

http://ci-tooling.cloud.gov/pipelines/zap/jobs/zap-scheduled-atf-eregs/builds/1

https://trello.com/c/od2IZwIu/134-atf-eregs-scan-hangs-creating-a-new-zap-session
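
The suggested workaround (a fresh ZAP instance per site instead of `session new`) could look roughly like this. The `run` hook is introduced here for illustration, and the exact zap-cli flags in the real pipeline may differ.

```ruby
# Sketch: restart ZAP for every site rather than reusing one long-lived
# instance with `session new`. `run` defaults to shelling out to zap-cli.
def scan_site(url, run: ->(*cmd) { system("zap-cli", *cmd) })
  run.("start")
  run.("spider", url)
  run.("ajax-spider", url)
  run.("active-scan", url)
  run.("alerts")
ensure
  run.("shutdown") # pay the startup cost again for the next site
end
```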

rename repository?

This repository is difficult to remember/find...maybe something like compliance-viewer-pipeline would make more sense? Or maybe we should actually merge this into https://github.com/18F/compliance-viewer? That being said, "Compliance Viewer" could be more aptly named "Compliance Checker" or something...

Create test-only rubocop config

The rubocop config we have is good, but too strict in some ways for running on test code. We should create a more relaxed version for running on the tests.

reduce false positives in uptime-check

Currently, that task reports "No links" for a number of projects that we want to ignore:

No `links` for about_yml.
No `links` for authdelegate.
No `links` for 18f-identity.
No `links` for dodsbir-scrape.
No `links` for fec-cms.
No `links` for fec-style.
No `links` for hmacauth.
No `links` for laptop.
No `links` for openFEC-web-app.
No `links` for SBIR-EZ.
No `links` for sbirez.
No `links` for team_api.
No `links` for uscis.
https://team-browser.18f.gov/ is NOT up.
https://18f.gsa.gov/team-browser/ is NOT up.
https://www.google.com/calendar/embed?src=gsa.gov_0samf7guodi7o2jhdp0ec99aks%40group.calendar.google.com&ctz=America/Los_Angeles is NOT up.
https://team-browser.18f.gov/team-browser/ is NOT up.

Issues identified in that list:

  • SBIR-EZ is a defunct project.
    • Is there a good way to indicate this in the about.yml?
  • about_yml and laptop don't have standalone sites—should links include a reference to the repository? We might want to say explicitly one way or another in the about.yml instructions.

Ideas for improvement:

  • Reach out to projects/teams where the about.yml seems out-of-date
  • Filter by status
    • I see this in some of the entries in /projects, but it's not listed in the instructions.
  • Filter by ____?
    • My initial thinking was "oh, there's just some field

Resources:

@mtorres253 @ertzeid @ccostino The background here is that I'm pulling a list of URLs from the links provided in the Projects list of the Team API, for automatically checking if the sites are up, doing security scans, etc. Any advice on how to do better filtering of that list?
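
As one concrete form of "filter by status", the uptime check could skip entries with no links and entries whose status marks them inactive. Both field names and the status vocabulary below are assumptions about the data, not the Team API's documented schema.

```ruby
SKIP_STATUSES = ["defunct", "decommissioned"] # assumed vocabulary

# Keep only projects worth checking: a non-empty links array and a
# status that isn't a known "don't scan" value.
def checkable(projects)
  projects.select do |p|
    links = p["links"]
    links.is_a?(Array) && !links.empty? && !SKIP_STATUSES.include?(p["status"])
  end
end
```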

Add cloud.gov sites to scheduled scans

Can we add:

  • cloud.gov
  • login.cloud.gov / uaa.cloud.gov
  • docs.cloud.gov
  • logs.cloud.gov
  • ci.cloud.gov
  • api.cloud.gov
  • console.cloud.gov
  • metrics.cloud.gov
  • community.cloud.gov

to the regular scans?

Thanks!

Bug: No such file or directory - project-data/project.json

filter-project-data step seems to be broken for all jobs 😢

https://ci.cloud.gov/pipelines/zap/jobs/zap-scheduled-cg-landing/builds/14

/tmp/build/af6792a7/scripts/lib/team_data_filterer.rb:32:in `initialize': No such file or directory @ rb_sysopen - /tmp/build/af6792a7/project-data/project.json (Errno::ENOENT)
    from /tmp/build/af6792a7/scripts/lib/team_data_filterer.rb:32:in `new'
    from /tmp/build/af6792a7/scripts/lib/team_data_filterer.rb:32:in `read_json'
    from ./scripts/tasks/filter-project-data/task.rb:6:in `<main>'
