18F / site-scanning
The code base for the first Site Scanning engine
Home Page: https://digital.gov/site-scanning
Bring notes over from Mural and put them in GitHub: https://github.com/18F/site-scanning/blob/master/docs/candidate-scans.md
As a consumer of the 200 scanner output, I want to be able to easily identify which agency and branch of government a domain belongs to, so that I can better track agency compliance. I also want to know the final destination of scanned URLs that resolve successfully but at a different location than the original target, so that I can understand how agencies are implementing this work.
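A minimal sketch of how a 200 scan might record where a request finally lands. It assumes a hypothetical injected `Fetcher` that returns the final status and URL after redirects. This lets the logic be tested without network access. `urllib_fetch` is one possible real fetcher built on the standard library, whose `urlopen` follows redirects by default. `ScanResult` and `scan_path` are illustrative names, not the project's own.

```python
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable, Optional, Tuple
from urllib.parse import urlsplit

# Hypothetical fetcher type: URL -> (final status, final URL after redirects).
Fetcher = Callable[[str], Tuple[int, str]]


@dataclass
class ScanResult:
    domain: str
    target_url: str
    status: Optional[int]
    final_url: Optional[str]
    redirected_elsewhere: bool


def urllib_fetch(url: str) -> Tuple[int, str]:
    """Standard-library fetcher; urlopen follows redirects automatically."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status, resp.geturl()
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses still have a status and a final URL worth recording.
        return err.code, err.geturl()


def _host_and_path(url: str) -> Tuple[str, str]:
    # Ignore a trailing slash so /privacy and /privacy/ count as the same place.
    parts = urlsplit(url)
    return parts.netloc.lower(), parts.path.rstrip("/")


def scan_path(domain: str, path: str, fetch: Fetcher) -> ScanResult:
    """Request https://<domain><path> and note where it finally resolved."""
    target = f"https://{domain}{path}"
    try:
        status, final_url = fetch(target)
    except OSError:
        # DNS failures, refused connections and timeouts leave no result.
        return ScanResult(domain, target, None, None, False)
    moved = _host_and_path(final_url) != _host_and_path(target)
    return ScanResult(domain, target, status, final_url, moved)
```

For example, `scan_path("example.gov", "/privacy", urllib_fetch)` would flag a redirect to `www.example.gov` as `redirected_elsewhere`. Here `example.gov` is a placeholder domain.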
As an OPP data.gov product owner, I need to discover agency.gov/data.json pages, so that I can incorporate results onto data.gov
Acceptance criteria:
As a TTS senior leader, I need to see a summary of my products and their performance, so that I can make informed decisions about the products and serve as a model for gov-wide implementation of site scanning.
As an agency webmaster, I need to see an example of how someone has visualized the results of site scanning, so that I can easily replicate these steps and stand up my own visualization layer for my agency's performance.
TTS maintains a simple YAML or JSON file (or similar) listing its domains, and we design a way to pull the results of our scans for those domains into a single table that combines the various results. Imagine a table whose leftmost column is the list of domains, with one column per scan holding that scan's results.
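The combined table described above can be sketched in a few lines. This assumes the TTS file is a JSON array of domain names. It also assumes each scan's output has already been loaded as a mapping from domain to a display value. Both shapes are hypothetical, as are the function names.

```python
import csv
import io
import json
from typing import Dict, List

# Hypothetical shape: {scan_name: {domain: result}}.
ScanResults = Dict[str, Dict[str, str]]


def load_domains(json_text: str) -> List[str]:
    """ASSUMPTION: the TTS-maintained file is a JSON array of domain names."""
    return [d.strip().lower() for d in json.loads(json_text)]


def build_table(domains: List[str], results: ScanResults) -> List[List[str]]:
    """One row per domain: the domain first, then one column per scan."""
    scan_names = sorted(results)
    rows = [["domain"] + scan_names]
    for domain in domains:
        # A scan with no result for this domain yields an empty cell.
        rows.append([domain] + [results[name].get(domain, "") for name in scan_names])
    return rows


def to_csv(rows: List[List[str]]) -> str:
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()
```

Keeping domains as rows and scans as columns means a new scan only adds a column. The list of domains stays the single source of truth.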
As an OMB examiner, I need to confirm whether an agency has an agency.gov/privacy page, so that I can confirm compliance with the OMB memo.
Task: review the memo and its applicable language:
https://www.whitehouse.gov/sites/whitehouse.gov/files/omb/memoranda/2017/m-17-06.pdf
“Each agency must maintain a central resource page dedicated to its privacy program on the agency’s principal website. The agency’s Privacy Program Page must serve as a central source for information about the agency’s practices with respect to PII. The agency’s Privacy Program Page must be located at www.[agency].gov/privacy and must be accessible through the agency’s “About” page.”
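The memo also requires that the Privacy Program Page be reachable from the agency's "About" page. Below is a sketch of that second check using the standard-library HTML parser. It assumes the About page HTML has already been fetched. It treats `www.agency.gov` and `agency.gov` as the same site, which is a simplifying assumption. The function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def _bare_host(host: str) -> str:
    # ASSUMPTION: www.agency.gov and agency.gov are the same site.
    host = host.lower()
    return host[4:] if host.startswith("www.") else host


def about_links_to_privacy(about_url: str, about_html: str) -> bool:
    """True if the About page links to /privacy on the agency's own site."""
    parser = LinkCollector()
    parser.feed(about_html)
    site = _bare_host(urlsplit(about_url).netloc)
    for href in parser.hrefs:
        # Resolve relative links against the About page URL first.
        parts = urlsplit(urljoin(about_url, href))
        if _bare_host(parts.netloc) == site and parts.path.rstrip("/") == "/privacy":
            return True
    return False
```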
Acceptance criteria:
As a product owner, I want to give the project proper URLs so that the site looks good and is easy to use and share.
As an external developer, I need to discover open APIs, so that I can access underlying data assets and incorporate them in my software application.
Acceptance criteria:
As an OMB examiner, I need to discover agency.gov/developer pages, so that I can ensure agency compliance with the OMB memo.
Acceptance criteria:
As a product owner, I would like to document the desired permission levels for each part of the program so that we can easily set those up and so that future collaborators can understand how the project works more readily.
As a federal website owner, I want to be able to benefit from the 200 scanner for my website, regardless of which agency I work at or branch of government I'm in, so that I can improve my website and monitor my compliance with laws and mandates.
To that end, we should transition to using this source file for our scans:
https://github.com/GSA/data/blob/master/dotgov-domains/current-federal.csv
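A sketch of reading that file so that each scanned domain carries its agency and branch of government. The column names (`Domain Name`, `Domain Type`, `Agency`) and the `"Federal Agency - Executive"` style of `Domain Type` values are assumptions about the file's format. Check them against the live header before relying on them.

```python
import csv
import io
from typing import Dict, List

# ASSUMPTION: column names as they appear in current-federal.csv.
DOMAIN_COL = "Domain Name"
TYPE_COL = "Domain Type"
AGENCY_COL = "Agency"


def load_federal_domains(csv_text: str) -> List[Dict[str, str]]:
    """Return one record per domain with its agency and branch of government."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        domain_type = row.get(TYPE_COL) or ""
        # ASSUMPTION: types look like "Federal Agency - Executive", so the
        # branch is the text after the last " - ".
        branch = domain_type.rsplit(" - ", 1)[-1] if " - " in domain_type else ""
        records.append({
            "domain": row[DOMAIN_COL].strip().lower(),
            "agency": (row.get(AGENCY_COL) or "").strip(),
            "branch": branch.strip(),
        })
    return records
```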
As the USWDS product owner, I need to know which agencies are using the USWDS, so that I can track usage.
Note: The PO thinks looking for the style sheet might be the easiest way, since USWDS class names start with the prefix "usa-". Another possible avenue is to look for the font "Public Sans".
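The two signals in the note can be checked with regular expressions over a page's HTML. This is a heuristic sketch. It assumes USWDS class names begin with `usa-` (for example `usa-banner`) and that the font appears as "Public Sans" or "public-sans" in markup, CSS or font URLs. A hit suggests, but does not prove, that the page uses USWDS.

```python
import re
from typing import Dict

# ASSUMPTION: USWDS classes start with "usa-"; the USWDS typeface is Public Sans.
USA_CLASS = re.compile(r'class\s*=\s*["\'][^"\']*\busa-[a-z]', re.IGNORECASE)
PUBLIC_SANS = re.compile(r"public[\s-]?sans", re.IGNORECASE)


def uswds_signals(html: str) -> Dict[str, bool]:
    """Report which of the two hinted USWDS signals appear in the HTML."""
    return {
        "usa_class_prefix": bool(USA_CLASS.search(html)),
        "public_sans_font": bool(PUBLIC_SANS.search(html)),
    }
```

Reporting the two signals separately, rather than as one yes/no, keeps the result useful when a site adopts only the typography or only the components.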
Acceptance criteria:
As the site scanning developer, I want a subdomain list to work off of, so that it's not extremely difficult to stand up a prototype for subdomain scans.
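Once a raw list of hostnames exists, one way to make it workable is to group each hostname under a known base domain, for example the domains from current-federal.csv. The sketch assumes such a set of base domains is available. The input list and function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def group_subdomains(hostnames: Iterable[str], base_domains: Set[str]) -> Dict[str, List[str]]:
    """Map each known base domain to the subdomains found under it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for raw in hostnames:
        host = raw.strip().lower().rstrip(".")
        labels = host.split(".")
        # Check proper suffixes from longest to shortest, so the most specific
        # known base domain wins; a base domain is not its own subdomain.
        for i in range(1, len(labels) - 1):
            candidate = ".".join(labels[i:])
            if candidate in base_domains:
                groups[candidate].append(host)
                break
    return {base: sorted(set(hosts)) for base, hosts in groups.items()}
```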
As TTS leadership (TTS Deputy Commissioner and OPP Assistant Commissioner), I need to see data about my programs in a useful way, so that I can advocate for my programs (impact, additional resources, new gov-wide mandates, etc.) effectively.
Design what the output will look like.
As the USWDS product owner, I need an engineer to look at implementations of USWDS to identify common components, so I can search for those components in my site scan.
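To find common components across known USWDS implementations, an engineer could count how many sample pages use each `usa-` class. This sketch assumes the pages' HTML has already been collected. It also assumes USWDS's BEM-style names (`usa-button--big`, `usa-banner__header`) should be folded into their base component. Both are choices for illustration, not the project's method.

```python
import re
from collections import Counter
from typing import Iterable, Set

# Captures the value of each class="..." or class='...' attribute.
CLASS_ATTR = re.compile(r'class\s*=\s*["\']([^"\']*)["\']', re.IGNORECASE)


def usa_components(html: str) -> Set[str]:
    """Base USWDS component names used on one page, e.g. 'usa-banner'."""
    found = set()
    for value in CLASS_ATTR.findall(html):
        for cls in value.split():
            if cls.startswith("usa-"):
                # ASSUMPTION: strip BEM element (__) and modifier (--) suffixes.
                found.add(re.split(r"__|--", cls)[0])
    return found


def component_counts(pages: Iterable[str]) -> Counter:
    """Number of pages using each component, most common first via most_common()."""
    counts: Counter = Counter()
    for html in pages:
        counts.update(usa_components(html))
    return counts
```

`component_counts(pages).most_common(10)` would give a shortlist of candidate components to search for in the site scan.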
Acceptance criteria:
As somebody who is interested in finding out about US Government websites, I need to be able to find the results of scans being conducted by this 10x project.
Acceptance criteria:
As a user of this 10x project, I would like the scanner API to be running, up to date with the latest code, and with scans run daily.
Acceptance Criteria:
As a TTS product owner (data.gov, api.data.gov, search.gov), I want to see a non-JSON presentation of the data, so that I'm not intimidated and can easily find the information that I'm looking for.
Acceptance criteria:
As a user of this service, I want to be able to easily access the underlying JSON flat files as well as be able to point others to them.
To that end, we should make the S3 bucket public and document the permanent location of the individual files or folder in /docs/.
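A sketch of the two artifacts this involves: a standard S3 bucket policy granting anonymous `s3:GetObject`, and the virtual-hosted-style URL under which each file would then be readable. The bucket name below is hypothetical. How the policy is applied (directly in AWS or via a cloud.gov service plan) is not covered here.

```python
import json
from urllib.parse import quote

# HYPOTHETICAL bucket name; substitute the real one before documenting it.
BUCKET = "example-site-scanning-results"


def public_read_policy(bucket: str) -> str:
    """Bucket policy allowing anyone to read (but not list or write) objects."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "PublicReadGetObject",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{bucket}/*",
        }],
    }
    return json.dumps(policy, indent=2)


def public_url(bucket: str, key: str) -> str:
    """Permanent URL for an object; '/' in the key is kept, other characters escaped."""
    return f"https://{bucket}.s3.amazonaws.com/{quote(key.lstrip('/'))}"
```

Note that the account's S3 Block Public Access settings can override a public bucket policy, so those need to permit it as well.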
As a site scanning user, I want to easily be able to find the results of the scans, so that I can use them to improve my website performance.
Acceptance criteria:
For any of the 200 scans, as a stakeholder, I would like to note results that are close but not at exactly the right location, and so are not caught by the scanner. These near misses still represent important agency work. Noting them lets me know which agencies are mostly compliant and just need a tweak to go the rest of the way.
For example, OMB policy requires agency API hubs to be found at agency.gov/developer, but in many cases, the agency instead placed it at agency.gov/developers. We would like to work with such an agency to set up a redirect so that /developer and /developers both resolve to the same page, but it is still very helpful to know about the hub.
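Near misses like /developers could be recorded by checking a small table of known variants whenever the required path fails. In this sketch the fetcher is a hypothetical injected function returning the final HTTP status, and `NEAR_MISSES` is an illustrative table seeded only with the /developers case from the example above.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical fetcher: URL -> final HTTP status (None on connection errors).
StatusFetcher = Callable[[str], Optional[int]]

# Illustrative variant table; extend it as new near misses are discovered.
NEAR_MISSES: Dict[str, List[str]] = {
    "/developer": ["/developers"],
}


def classify(domain: str, required: str, fetch: StatusFetcher) -> Dict[str, Optional[str]]:
    """'compliant' if the required path returns 200, 'near_miss' if a variant does."""
    if fetch(f"https://{domain}{required}") == 200:
        return {"status": "compliant", "found_at": required}
    for variant in NEAR_MISSES.get(required, []):
        if fetch(f"https://{domain}{variant}") == 200:
            return {"status": "near_miss", "found_at": variant}
    return {"status": "missing", "found_at": None}
```

An agency that later adds a redirect from /developer to /developers would then show up as `compliant`, because the final status after redirects is 200.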
As the engineer creating a data presentation prototype, I need to understand how the data may be used, so that I can present options that make sense for the users.
For example: as the owner of data.gov, I want to see a list of every domain that has a /data.json file.
As a product owner, I want to begin getting feedback from potential stakeholders of the 200 scanner results so that we can best iterate the initial MVP and ensure that it's useful to those stakeholders.
As the site scanning product owner, I need access to the sandbox version of cloud.gov, to allow more flexibility to bring on resources over time.
As an agency webmaster, I need to know that spiders are receiving proper instructions about how to index my websites and what to index from them, so that I can shape public access to the site via search engines, and the spiders’ behavior while they’re working.
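The instructions spiders receive usually come from /robots.txt. Python's standard `urllib.robotparser` can summarize a fetched robots.txt body. This sketch assumes the body has already been downloaded. It uses `site_maps()`, which requires Python 3.8 or later. The chosen summary fields are illustrative.

```python
from urllib.robotparser import RobotFileParser


def summarize_robots(robots_txt: str, site: str, user_agent: str = "*") -> dict:
    """Summarize what a robots.txt body tells crawlers about a site."""
    parser = RobotFileParser()
    # parse() also marks the file as read, so can_fetch() gives real answers.
    parser.parse(robots_txt.splitlines())
    return {
        # Can this crawler fetch the home page at all?
        "homepage_allowed": parser.can_fetch(user_agent, f"{site}/"),
        # None when no Crawl-delay applies to this user agent.
        "crawl_delay": parser.crawl_delay(user_agent),
        # site_maps() returns None when there are no Sitemap lines.
        "sitemaps": parser.site_maps() or [],
    }
```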
Acceptance criteria:
As the TTS IDEA POCs, I need to ensure the optics on the scanners are appropriate, so that TTS doesn't inadvertently alarm agencies with the results.
https://gsa-tts.slack.com/archives/CJ9JT2F2T/p1564151623064900
As a project archeologist, I would like to dive back into Slack logs to find intelligence on past scanning efforts, so that we can make better-informed plans for future scans.
As the site scanning product owner, I'd like to control the permissions of the product, so that I can maintain flexibility as the product matures.
As a product owner, I want to create a team Google Group so that we can circulate emails to the team easily, and also publish it externally so that outside agencies and the public can get in touch if they have questions.
For #48.
As a website user, I want the site to stay live and not rebuild when files in the /docs/ folder are edited.
As the site scanning developer, I need to get a sketch of what the site scanning product owner is envisioning, so that I can appropriately build out the visualization layer.
As a 10x product manager, I want to know what a good, healthy range of implementation looks like, so that we can decide which results the scanner should detect.
As the site scanning product owner, I want to experiment with scans at the subdomain level, so that I can test whether results are useful to agencies.
As an OMB examiner, I need to discover agency.gov/data.json pages, so that I can ensure agency compliance with the law.
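A status-200 /data.json is not necessarily a data catalog; some sites return an HTML error page with a 200 status. A light check on the body can help. This sketch assumes Project Open Data v1.1 catalogs are JSON objects with a `dataset` array and an optional `conformsTo` value, and treats a bare array as an older v1.0-style file (also an assumption). It is not a full schema validation.

```python
import json
from typing import Optional, Union


def check_data_json(body: str) -> dict:
    """Light-weight check that a /data.json response looks like a data catalog."""
    result: dict = {"valid_json": False, "dataset_count": None, "conforms_to": None}
    try:
        doc = json.loads(body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return result
    result["valid_json"] = True
    datasets: Optional[Union[list, object]]
    if isinstance(doc, dict):
        # ASSUMPTION: v1.1 catalog object with a "dataset" array.
        datasets = doc.get("dataset")
        result["conforms_to"] = doc.get("conformsTo")
    else:
        # ASSUMPTION: older files are a bare array of datasets.
        datasets = doc
    if isinstance(datasets, list):
        result["dataset_count"] = len(datasets)
    return result
```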
Acceptance criteria:
As the USWDS product owner, I need to know which agencies are using the USWDS, so that I can inquire with agencies and have them contribute back to USWDS.
Note: The PO thinks looking for the style sheet might be the easiest way, since USWDS class names start with the prefix "usa-". Another possible avenue is to look for the font "Public Sans".
Acceptance criteria:
As a project team member, I'm interested in researching the code that we already have access to for USWDS scanning, in order to evaluate its potential and flaws.
There is one pilot code project (possibly two) to review:
Top-level domains that are using USWDS:
CitizenScience.gov
ClinicalTrials.gov
code.mil
cloud.gov
cbp.gov
dds.mil
dnfsb.gov
commerce.gov
dhs.gov
dietaryguidelines.gov
dotgov.gov
epa.gov
fca.gov
fcsic.gov
fec.gov
ffb.gov
fpc.gov
fedramp.gov
foia.gov
gsa.gov
healthcare.gov
imls.gov
iawg.gov
irs.gov
itdashboard.gov
login.gov
manufacturing.gov
medicaid.gov
move.mil
mymedicare.gov
nih.gov
floodsmart.gov
opioids.gov
performance.gov
plainlanguage.gov
pclob.gov
search.gov
sba.gov
stopbullying.gov
supremecourt.gov
tsa.gov
usagm.gov
usaid.gov
usda.gov
dol.gov
treasury.gov
va.gov
usds.gov
flra.gov
uscis.gov
uscourts.gov
usich.gov
unlocktalent.gov
usa.gov
usajobs.gov
usaspending.gov
usgs.gov
vote.gov
whitehouse.gov
worker.gov
As the USWDS product owner, I need to expand upon the known USWDS list, so that I can make my results more accurate.
As an OMB examiner, I need to confirm which agencies have implemented USWDS, so I can compare this data to new .gov domains and ensure compliance with the law (IDEA).
Acceptance criteria:
As an OPP API product owner, I need to discover agency.gov/developer pages, so that I can find potential users of the api.data.gov shared service.
Acceptance criteria:
As an agency CIO or CTO, I need to discover agency.gov/developer pages, so that I can discover API programs at my agency and integrate them into my agency API coordination effort.
Acceptance criteria:
As the USWDS product owner, I need to understand and provide feedback on which components to search for in a USWDS scan, so that the results present the most accurate data feasible for my decision-making.
As a product owner, I would like the engineer to update the permissions of our system so that we can ensure redundancy and collaborate best on the project.
Implement the Cloud.gov and AWS permission changes listed in #32.
The GitHub ones are already done.
As the OPP Search.gov product owner, I need to know which agencies are providing core SEO support elements, so I can support them appropriately in making their content findable.
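Which elements count as "core SEO support" isn't defined above. This sketch assumes the page title, meta description, robots meta tag and canonical link, and extracts them with the standard-library HTML parser. The element list and names are illustrative choices to refine with the Search.gov team.

```python
from html.parser import HTMLParser
from typing import Dict, Optional


class SEOParser(HTMLParser):
    """Records a few assumed core SEO elements from a page's HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.meta: Dict[str, str] = {}
        self.canonical: Optional[str] = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and a.get("name"):
            self.meta[a["name"].lower()] = a.get("content") or ""
        elif tag == "link" and (a.get("rel") or "").lower() == "canonical":
            self.canonical = a.get("href")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def seo_elements(html: str) -> dict:
    p = SEOParser()
    p.feed(html)
    return {
        "title": p.title.strip() or None,
        "description": p.meta.get("description"),
        "robots_meta": p.meta.get("robots"),
        "canonical": p.canonical,
    }
```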
Acceptance criteria:
As a user of the scan search UI, I need to be able to download scan data that I have selected in the various scan search pages in some standard formats so that I can import the data into software that I can use to visualize the data in customized ways.
I imagine that users would like to be able to export the scans of the sites they have searched for as JSON and CSV.
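Below is a sketch of both export formats for a list of selected scan records. It assumes records are flat-ish dictionaries. Nested values (for example a per-path result object) are written into CSV cells as JSON strings, which is one possible choice among several. The function names are hypothetical.

```python
import csv
import io
import json
from typing import Dict, List


def export_json(rows: List[Dict]) -> str:
    return json.dumps(rows, indent=2, sort_keys=True)


def export_csv(rows: List[Dict]) -> str:
    """Flatten selected scan records into CSV; nested values become JSON strings."""
    # Union of keys across all rows, so records with different scans still fit.
    fields = sorted({key for row in rows for key in row})
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow({
            k: json.dumps(v) if isinstance(v, (dict, list)) else v
            for k, v in row.items()
        })
    return buf.getvalue()
```

Missing fields are left as empty cells, which most spreadsheet and visualization tools import without complaint.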