18f / site-scanning

The code base for the first Site Scanning engine

Home Page: https://digital.gov/site-scanning

Shell 12.08% Python 59.70% CSS 0.04% HTML 27.51% Dockerfile 0.66%

site-scanning's People

Contributors

afeijoo, alexbielen, anjunainaustin, danielnaab, dependabot[bot], gbinal, heymatthenry, iamjolly, michelle-rago, ondrae, timothy-spencer, vickimcfadden

site-scanning's Issues

add further metadata to the 200 scanner

As a consumer of the 200 scanner output, I want to be able to easily identify which agency and branch of government a domain belongs to so that I can better track agency compliance. I also want to know the destination location of scan results that successfully resolve but at a different location than the original target so that I can understand how agencies are implementing this work.
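
One way to picture the extra metadata is as a richer result record. The sketch below is only an illustration: the `PageScanResult` fields, the `build_result` helper, and the shape of the `domain_info` lookup are all hypothetical, not the scanner's actual output format.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class PageScanResult:
    domain: str
    path: str
    status_code: int
    final_url: str          # where the request actually ended up
    redirected: bool        # True if final_url differs from the original target
    agency: Optional[str]   # None when the domain is not in the lookup
    branch: Optional[str]


def _normalize(url: str) -> str:
    # Ignore case and a trailing slash when comparing locations.
    return url.rstrip("/").lower()


def build_result(domain: str, path: str, status_code: int, final_url: str,
                 domain_info: Dict[str, Dict[str, str]]) -> PageScanResult:
    """Attach agency/branch metadata and the destination URL to a scan result."""
    requested = f"https://{domain}{path}"
    info = domain_info.get(domain.lower(), {})
    return PageScanResult(
        domain=domain,
        path=path,
        status_code=status_code,
        final_url=final_url,
        redirected=_normalize(final_url) != _normalize(requested),
        agency=info.get("agency"),
        branch=info.get("branch"),
    )
```

The `domain_info` mapping is assumed to come from some list of domains with agency and branch columns; where that list lives is left open here.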

Data.json pages for OPP data.gov product owner

As an OPP data.gov product owner, I need to discover agency.gov/data.json pages, so that I can incorporate the results into data.gov.

Acceptance criteria:

  • Records are updated at least daily
  • Data is available as JSON
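
A 200 response at /data.json does not by itself mean a catalog is there, since some sites return an HTML page for any path. A minimal sketch of a follow-up check, assuming the file follows the Project Open Data catalog layout with a top-level "dataset" list (the `looks_like_data_json` name is hypothetical):

```python
import json


def looks_like_data_json(body):
    """Return True if the response body parses as a catalog-like JSON object."""
    try:
        doc = json.loads(body)
    except (TypeError, ValueError):
        # Not JSON at all (for example an HTML error page served with a 200).
        return False
    # Assumption: a catalog is an object with a top-level "dataset" list.
    return isinstance(doc, dict) and isinstance(doc.get("dataset"), list)
```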

Separate site presents data for TTS domains

As a TTS senior leader, I need to see a summary of my products and their performance, so that I can make informed decisions about the products and serve as a model for gov-wide implementation of site scanning.

As an agency webmaster, I need to see an example of how someone has visualized the results of site scanning, so that I can easily replicate these steps to stand up my own visualization layer for my agency's performance.

TTS maintains a simple yml or json (or whatever) file and we design a way to pull in the results of our scans for those domains into a table that combines the various results. Imagine a table with the leftmost column being a list of domains and then each scan has a column for its results.
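
The table described above could be assembled roughly like this. It is a sketch under assumptions: the `build_summary_table` name and the `domain` / `scan_type` / `result` keys are made up for illustration and are not the project's schema.

```python
def build_summary_table(domains, scan_results):
    """Combine per-scan results into one row per domain.

    domains: the list maintained in the TTS file, in display order.
    scan_results: dicts with hypothetical keys 'domain', 'scan_type', 'result'.
    Returns a list of rows; the first row is the header.
    """
    scan_types = sorted({r["scan_type"] for r in scan_results})
    by_key = {(r["domain"], r["scan_type"]): r["result"] for r in scan_results}

    header = ["domain"] + scan_types
    rows = [
        # An empty cell means that scan has no result for that domain.
        [d] + [by_key.get((d, s), "") for s in scan_types]
        for d in domains
    ]
    return [header] + rows
```

The output is a plain list of lists so it can be handed to an HTML template or a CSV writer without further changes.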

Privacy pages for OMB examiner

As an OMB examiner, I need to confirm whether an agency has an agency.gov/privacy page, so that I can confirm compliance with the OMB memo.

Task: review memo, applicable language
https://www.whitehouse.gov/sites/whitehouse.gov/files/omb/memoranda/2017/m-17-06.pdf

“Each agency must maintain a central resource page dedicated to its privacy program on the agency’s principal website. The agency’s Privacy Program Page must serve as a central source for information about the agency’s practices with respect to PII. The agency’s Privacy Program Page must be located at www.[agency].gov/privacy and must be accessible through the agency’s “About” page.”

Acceptance criteria:

  • Records are updated at least daily
  • Data is available as JSON

Brainstorm subdomain and URL structure

As a product owner, I want to give the project proper URLs so that the site looks good and is easy to use and share.

  • Brainstorm possible subdomains for the project.
  • Begin to map out a folder structure for site.

Developer pages for external developer

As an external developer, I need to discover open APIs, so that I can access underlying data assets and incorporate them in my software application.

Acceptance criteria:

  • Records are updated at least daily
  • Data is available as JSON

Developer pages for OMB examiner

As an OMB examiner, I need to discover agency.gov/developer pages, so that I can ensure agency compliance with the OMB memo.

Acceptance criteria:

  • Records are updated at least daily
  • Data is available as JSON

Document program permissions

As a product owner, I would like to document the desired permission levels for each part of the program so that we can easily set those up and so that future collaborators can understand how the project works more readily.

USWDS usage for USWDS product owner

As the USWDS product owner, I need to know which agencies are using the USWDS, so that I can track usage.

Note: The PO thinks looking for the style sheet might be the easiest way, which starts with the prefix "-usa". Another possible avenue is to look for the font, "Public Sans."

Acceptance criteria:

  • Records are updated at least daily
  • Data is available as JSON
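
A rough sketch of the two signals from the note above, run against a page's HTML. It assumes the prefix in question is the "usa-" prefix that USWDS uses on its CSS class names, which is a reading of the note rather than something this issue confirms. The `detect_uswds` name and the returned keys are hypothetical, and a regex pass over raw HTML is only a heuristic.

```python
import re

# Assumption: USWDS components carry class names such as "usa-header".
USA_CLASS = re.compile(r'class\s*=\s*["\'][^"\']*\busa-[\w-]+', re.IGNORECASE)
# The font may appear as "Public Sans", "Public Sans Web", "public-sans" paths, etc.
PUBLIC_SANS = re.compile(r'public[\s_-]*sans', re.IGNORECASE)


def detect_uswds(html):
    """Return simple USWDS indicators found in a page's HTML."""
    return {
        # Number of class attributes that mention at least one usa- class.
        "usa_classes": len(USA_CLASS.findall(html)),
        "public_sans": bool(PUBLIC_SANS.search(html)),
    }
```

A count rather than a yes/no makes it possible to tell a full implementation from a page that borrows a single component.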

User stories for TTS leadership

As TTS leadership (TTS Deputy Commissioner and OPP Assistant Commissioner), I need to see data about my programs in a useful way, so that I can advocate for my programs (impact, additional resources, new gov-wide mandates, etc) effectively.

Front-end time for USWDS investigation

As the USWDS product owner, I need an engineer to look at implementations of USWDS to identify common components, so I can search for those components in my site scan.

Acceptance criteria:

  • person identified, not necessarily that their work is complete

Write API for Django

As somebody who is interested in finding out about US Government websites, I need to be able to find the results of scans being conducted by this 10x project.

Acceptance criteria:

  • All scan results can be listed
  • Scans can be filtered by domain or by scan type
  • Scan results are in json
  • API can be run locally or in cloud.gov
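
The filtering criteria can be sketched without any framework. This is not the Django implementation the issue asks for, only an illustration of the expected behavior. The `filter_scans` name and the record keys are hypothetical.

```python
import json


def filter_scans(scans, domain=None, scan_type=None):
    """Return matching scan records as a JSON string.

    With no filters, every scan is listed. Either filter narrows the list,
    and both can be combined.
    """
    matches = [
        s for s in scans
        if (domain is None or s["domain"] == domain)
        and (scan_type is None or s["scan_type"] == scan_type)
    ]
    return json.dumps(matches, sort_keys=True)
```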

Automate API deploy and scans

As a user of this 10x project, I would like the scanner API to be running, up to date with the latest code, and with scans run daily.

Acceptance Criteria:

  • The scanner API should be automatically deployed to cloud.gov whenever there is a change to the master branch.
  • The scanner engine that collects the data should be run daily.
  • The process for setting this up should be documented so that others can do it.

MVP the CSV and/or HTML 'presentation layers'

As a TTS product owner (data.gov, api.data.gov, search.gov), I want to see a non-JSON presentation of the data, so that I'm not intimidated and can easily find the information that I'm looking for.

Acceptance criteria:

  • CSV and/or HTML for 200 scanner available for usability testing
  • CSV and/or HTML for the USWDS scanner available for usability testing

Make the s3 bucket public and document links

As a user of this service, I want to be able to easily access the underlying JSON flat files as well as be able to point others to them.

To that end, we should make the s3 bucket public and document in /docs/ the permanent location of the individual files or folder.

Link scans to docs

As a site scanning user, I want to easily be able to find the results of the scans, so that I can use them to improve my website performance.

Acceptance criteria:

  • documentation about where to find/how to use scans is easily findable

Account for sloppy but close agency results in the 200 scanner

For any of the 200 scans, as a stakeholder, I would like to note results that the scanner misses because they are not at exactly the required location but are close to it. These results represent important agency work, and flagging them would let me know which agencies are mostly compliant and just need a tweak to go the rest of the way.

For example, OMB policy requires agency API hubs to be found at agency.gov/developer, but in many cases, the agency instead placed it at agency.gov/developers. We would like to work with such an agency to set up a redirect so that /developer and /developers both resolve to the same page, but it is still very helpful to know about the hub.
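
One possible way to flag such near misses is fuzzy matching on the paths that do resolve. The sketch below uses `difflib` from the Python standard library. The `near_miss_paths` name and the 0.8 cutoff are assumptions picked for illustration, not tested thresholds.

```python
import difflib


def near_miss_paths(expected_path, found_paths, cutoff=0.8):
    """Return paths that resemble the required path without matching it exactly."""
    # An exact hit is already caught by the scanner, so leave it out.
    candidates = [p for p in found_paths if p != expected_path]
    return difflib.get_close_matches(expected_path, candidates, n=3, cutoff=cutoff)
```

This only works on paths the scanner already knows about, so the list of candidate paths (from a sitemap or a crawl, for instance) would have to come from somewhere else.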

Generate list of v0.1 user stories

As the engineer creating a data presentation prototype, I need to understand how the data may be used, so that I can present options that make sense for the users.

E.g. As the owner of data.gov, I want to see a list of every domain that has a /data.json file.

Get 10x cloud.gov credentials

As the site scanning product owner, I need access to the sandbox version of cloud.gov, to allow more flexibility to bring on resources over time.

Proper indexing for agency webmaster

As an agency webmaster, I need to know that spiders are receiving proper instructions about how to index my websites and what to index from them, so that I can shape public access to the site via search engines, and the spiders’ behavior while they’re working.

Acceptance criteria:

  • Records are updated at least daily
  • Data is available as JSON
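
For the robots.txt part of this story, the Python standard library already includes a parser. A minimal sketch, assuming the scanner has fetched the file's text (the `summarize_robots` name and the returned keys are hypothetical):

```python
from urllib.robotparser import RobotFileParser


def summarize_robots(robots_txt, test_paths, user_agent="*"):
    """Report which paths a crawler may fetch and which sitemaps are declared."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return {
        "allowed": {p: parser.can_fetch(user_agent, p) for p in test_paths},
        # site_maps() returns None when no Sitemap lines are present.
        "sitemaps": parser.site_maps() or [],
    }
```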

Move from sandbox to 10x cloud.gov

As the site scanning product owner, I'd like to control the permissions of the product, so that I can maintain flexibility as the product matures.

Set up a project email address

As a product owner, I want to create a team Google Group so that we can circulate emails to the team easily, but also so that we can publish it externally for outside agencies and the public to get in touch if they have questions.

USWDS implementation range

As a 10x product manager, I want to know what a good, healthy range of implementation is so that we can get the scanner to detect those results.

Data.json pages for OMB examiner

As an OMB examiner, I need to discover agency.gov/data.json pages, so that I can ensure agency compliance with law.

Acceptance criteria:

  • Records are updated at least daily
  • Data is available as JSON

Contributions for USWDS product owner

As the USWDS product owner, I need to know which agencies are using the USWDS, so that I can inquire with agencies and have them contribute back to USWDS.

Note: The PO thinks looking for the style sheet might be the easiest way, which starts with the prefix "-usa". Another possible avenue is to look for the font, "Public Sans."

Acceptance criteria:

  • Records are updated at least daily
  • Data is available as JSON

Stand up the pilot USWDS code and run a trial scan

As a project team member, I'm interested in researching the code that we already have access to for USWDS scanning in order to evaluate its potential and flaws.

There's one (possibly two) pilot code projects to see:

top-level domains that are using USWDS:

CitizenScience.gov
ClinicalTrials.gov
code.mil
cloud.gov
cbp.gov
dds.mil
dnfsb.gov
commerce.gov
dhs.gov
dietaryguidelines.gov
dotgov.gov
epa.gov
fca.gov
fcsic.gov
fec.gov
ffb.gov
fpc.gov
fedramp.gov
foia.gov
gsa.gov
healthcare.gov
imls.gov
iawg.gov
irs.gov
itdashboard.gov
login.gov
manufacturing.gov
medicaid.gov
move.mil
mymedicare.gov
nih.gov
floodsmart.gov
opioids.gov
performance.gov
plainlanguage.gov
pclob.gov
search.gov
sba.gov
stopbullying.gov
supremecourt.gov
tsa.gov
usagm.gov
usaid.gov
usda.gov
dol.gov
treasury.gov
va.gov
usds.gov
flra.gov
uscis.gov
uscourts.gov
usich.gov
unlocktalent.gov
usa.gov
usajobs.gov
usaspending.gov
usgs.gov
vote.gov
whitehouse.gov
worker.gov

USWDS implementation for OMB Examiner

As an OMB examiner, I need to confirm which agencies have implemented USWDS, so I can compare this data to new .gov domains and ensure compliance with the law (IDEA).

Acceptance criteria:

  • Records are updated at least daily
  • Data is available as JSON

Developer pages for api.data.gov product owner

As an OPP API product owner, I need to discover agency.gov/developer pages, so that I can find potential users of the api.data.gov shared service.

Acceptance criteria:

  • Records are updated at least daily
  • Data is available as JSON

Developer pages for agency CIO

As an agency CIO or CTO, I need to discover agency.gov/developer pages, so that I can discover API programs at my agency and integrate them into my agency API coordination effort.

Acceptance criteria:

  • Records are updated at least daily
  • Data is available as JSON

Implement updated permissions

As a product owner, I would like the engineer to update the permissions to our system so that we can ensure redundancy and best collaborate on the project.

Implement the Cloud.gov and AWS permission changes listed here - #32

The GitHub ones are already done.

SEO support elements for Search.gov Product Owner

As the OPP Search.gov product owner, I need to know which agencies are providing core SEO support elements, so I can support them appropriately in making their content findable.

Acceptance criteria:

  • Records are updated at least daily
  • Data is available as JSON

CSV/json export from search UI

As a user of the scan search UI, I need to be able to download scan data that I have selected in the various scan search pages in some standard formats so that I can import the data into software that I can use to visualize the data in customized ways.

I imagine that users would like to be able to export the scans of the sites that they have searched for as json and CSV.
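
A sketch of what the export step might look like once the search UI has a list of selected scans. The `export_scans` name and the flat-dict record shape are assumptions. Nested scan data would need flattening before it fits in a CSV.

```python
import csv
import io
import json


def export_scans(scans, fmt="csv"):
    """Serialize the selected scan records as CSV or JSON text."""
    if fmt == "json":
        return json.dumps(scans, indent=2)
    if fmt == "csv":
        # Use every key seen in any record so no column is silently dropped.
        fieldnames = sorted({key for scan in scans for key in scan})
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
        writer.writeheader()
        writer.writerows(scans)  # missing keys are written as empty cells
        return buf.getvalue()
    raise ValueError(f"unsupported export format: {fmt}")
```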
