Comments (10)
Marina, what do you envision going in this field? Free-form descriptive text? I don't know anything about the processes within government that will track this documentation—is the most logical way to relate a dataset to its private-only rationale to do so within a field like this?
from project-open-data.github.io.
Hi Marina,
Apparently I replied to the wrong address. Resending.
Begin forwarded message:
From: Bernadette Hyland [email protected]
Subject: Re: [project-open-data.github.io] Add operatingUnit Field (#89)
Date: July 29, 2013 4:31:50 PM EDT
To: "project-open-data/project-open-data.github.io" [email protected]
Cc: Fadi Maali [email protected], John Erickson [email protected]
Hi Marina,
A lot is known about the problem you describe & some really smart people have already cracked this nut. Agencies collect, curate and publish data in all sorts of ways with all sorts of contact information, often office emails & telephone numbers. Contact info comes in a wide variety of formats containing no name, first name, last name, salutation, title, agency, organization, and addresses -- the bane of many data administrators' existence. There are policy & technology issues at play here.
IMO, the Open Data Project could do a great service by getting behind, and providing input to, a standard vocabulary for describing government published datasets. One such effort that has benefited from some really smart people dedicated to Web standards & government transparency is the RDF vocabulary called DCAT [1]. I'm sure there are others too, but I'm familiar with this project.
DCAT is nearing publication as an open Web standard and has been produced in a transparent, peer-reviewed manner. I encourage you to post your questions & feedback to [email protected] so we can work cooperatively to advance open government publication efforts. If you're facing some things that haven't been contemplated by DCAT, now would be a great time to address this.
Cheers,
Bernadette Hyland
CEO, 3 Round Stones, Inc
co-chair W3C Government Linked Data WG
[1] http://www.w3.org/TR/vocab-dcat/
On Jul 22, 2013, at 8:51 PM, MarinaMartin [email protected] wrote:
While datasets are ultimately owned by an agency, they are really collected and maintained on an operating unit basis. While contact names and emails may change, a dataset's associated operating unit probably will not. Making this a new, required field makes it clearer where to go with questions for the public consuming the data, the agency officials responsible for updating the metadata, and other agencies looking to access the data. It can also help agencies assess internal compliance with publishing data, and is likely to be part of an agency internal data management system for workflow purposes.
Different agencies call their sub-units different things: departments, POCs, bureaus, etc. In asking around, "operating unit" was most generic, but I'm open to an even more generic term.
What do you all think?
Ideally, there would be something like a FOIA-type system, where if data doesn't meet one of a number of criteria for nonrelease it would be required to be released, and thus this field would need to be one of the predefined criteria. Logistically, however, this may be too ambitious. It may be good for us to create a set of "acceptable" criteria that we could give to agencies as suggested guidance for why a dataset may not be releasable (and the reverse).
What about NonReleaseJustification or RestrictionJustification?
@waldoj Yes, I envision it as a free-text field. Going forward, the agencies already have to collect this information for each new dataset created/collected that's not going to be released. So isn't it logical to store this reason in the enterprise data inventory (which, remember, is private -- not the public inventory)? They're storing it anyway -- but without a field they will, if I were to guess, store it separately and in a harder-to-find-internally spot.
@seanherron I think the list of options here is way too broad and will be defined by agencies' general counsels. I would suggest leaving this as a free text field and not providing criteria.
P.S. I have no problem with changing the name of this suggested field.
@BernHyland We made great efforts to match DCAT in this schema wherever possible -- the only two existing fields that do not match DCAT are accessLevel and systemOfRecord. This issue is specifically about giving agencies a place to document the reason for NOT releasing a particular dataset, in their internal-only enterprise data asset inventories. I'm not so sure that is widely applicable enough to warrant inclusion in a standard like DCAT, but I appreciate the reminder to stay involved in those conversations!
Does the benefit of encouraging better behavior outweigh the complexity that adding this brings? I don't think it's an overly large addition to the agency workload but it is an added lift. In general, I always worry about the Christmas tree effect when it comes to adding further to what each agency is required to do.
@gbinal I think if some sort of rationale isn't included people will either a) assume that the intent is nefarious and that we are hiding the data for no good reason, b) email the POC and ask for clarification/release, or c) forget about it entirely. For high-volume and frequently desired datasets (maybe some of the HHS data that has potential for PII, etc) putting a reasonable statement out there as to why it's private is good for transparency and will reduce the number of queries to the POC and angry tweets if they assume it's private for a questionable reason.
My concern would be that agencies wouldn't provide this information for legal reasons or would provide obtuse legalese that is difficult to parse and understand. If it's not going to be used, then there's not a lot of value in adding it.
I understand the Christmas tree argument, but in this case it seems merited. It'll only add work for non-released datasets (which are already saving a lot of work by not being released!), and agencies should have an on-the-record reason for not releasing a dataset anyway.
I also support keeping this a free text field, rather than selecting a preset exemption, to encourage a descriptive rationale. The field won't mean much if it doesn't communicate more than a category.
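To make the free-text proposal concrete, here is a minimal sketch of how an internal inventory check might look. It is only an illustration: the field name `restrictionJustification` is hypothetical (adapted from one of the names floated above), the `accessLevel` values and sample entries are assumed, and nothing here is part of the actual schema.

```python
from typing import Any

# Hypothetical field name, adapted from a name suggested in this thread.
JUSTIFICATION_FIELD = "restrictionJustification"


def missing_justification(entries: list[dict[str, Any]]) -> list[str]:
    """Return titles of non-public entries that lack a free-text rationale."""
    missing = []
    for entry in entries:
        # Assumed convention: "public" datasets need no rationale.
        if entry.get("accessLevel") == "public":
            continue
        text = entry.get(JUSTIFICATION_FIELD, "")
        # Free text only: any non-empty string is accepted, no preset categories.
        if not isinstance(text, str) or not text.strip():
            missing.append(entry.get("title", "<untitled>"))
    return missing


# Invented sample entries for an internal (private) enterprise data inventory.
inventory = [
    {"title": "Facility locations", "accessLevel": "public"},
    {
        "title": "Patient survey responses",
        "accessLevel": "non-public",
        JUSTIFICATION_FIELD: "Contains PII that cannot be reliably de-identified.",
    },
    {"title": "Claims microdata", "accessLevel": "non-public"},
]

print(missing_justification(inventory))
```

Because the check only asks for a non-empty string, agencies stay free to write a descriptive rationale rather than pick from a fixed list of exemptions.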
The discussion has moved towards combining the intent of this proposed field with the accessDetails field in #90. I'm closing this discussion -- please chat over at #90.