
ckanext-geodatagov's Introduction


Data.gov

Data.gov is an open data website created by the U.S. General Services Administration that is based on two robust open source projects: CKAN and WordPress. The data catalog at catalog.data.gov is powered by CKAN, while the content seen at Data.gov is powered by WordPress.

For all code, bugs, and feature requests related to Data.gov, see the project-wide Data.gov issue tracker.

Currently this repository is only used for source version control on the code for the CKAN extension for geospatial data, but you can see all of the Data.gov relevant repos listed in the GSA Data.gov README file.

CKAN Extension for Geospatial Data

Most Data.gov specific CKAN customizations are contained within this extension, but the extension also provides additional geospatial capabilities.

Customization

Due to CKAN 2.3 and 2.8 migrations, some features should be removed or moved to the official community versions:

Requirements

Package            Notes
ckanext-harvest    --
ckanext-spatial    --
PyZ3950            --
werkzeug           This only affects the tests. For all intents and purposes, this should be tracking upstream.

This extension is compatible with these versions of CKAN.

CKAN version    Compatibility
<=2.8           no
2.9             0.1.37 (last supported)
2.10            >=0.2.0

Tests

All the tests live in the /ckanext/geodatagov/tests folder. GitHub Actions is configured to run the tests against CKAN 2.10 when you open a pull request.

Using the Docker Dev Environment

Build Environment

To start the environment, run:

docker-compose build
docker-compose up

CKAN will start at localhost:5000

To shut down the environment, run:

docker-compose down

To open a shell inside the running CKAN container, run:

docker-compose exec app /bin/bash

Testing

The tests follow the guidelines for testing CKAN extensions.

To run the extension tests, start the containers with make up, then:

$ make test

To lint the code:

$ make lint

Debugging

We have not found a good way to use most IDEs' native debuggers; however, you can use the built-in Python pdb debugger. Run make debug, which runs Docker with an interactive shell. Add import pdb; pdb.set_trace() anywhere you want to start debugging; when that code is triggered, you should see a command prompt waiting in the shell. If you are new to pdb, a pdb cheat sheet is a helpful reference.
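As a minimal sketch of where such a breakpoint might go, the function below guards pdb.set_trace() behind an environment variable so it never blocks a non-interactive run. Both the helper parse_page_size and the GEODATAGOV_DEBUG variable are hypothetical names used only for illustration; they are not part of this extension.

```python
import os
import pdb


def parse_page_size(raw_value, default=1000):
    """Hypothetical helper, used only to show where a breakpoint can go."""
    # GEODATAGOV_DEBUG is an assumed, illustrative variable name. Guarding the
    # breakpoint keeps it from blocking runs that have no interactive shell.
    if os.environ.get("GEODATAGOV_DEBUG") == "1":
        pdb.set_trace()  # execution pauses here and a (Pdb) prompt appears
    try:
        return int(raw_value)
    except (TypeError, ValueError):
        return default
```

With the variable unset the function behaves normally; with it set to 1 inside the make debug shell, the prompt appears as soon as the function is called.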

When you edit/add/remove code, the server is smart enough to restart. If you are editing logic that is not part of the webserver (ckan command, etc) then you should be able to run the command after edits and get the same debugger prompt.

Matrix builds

The existing development environment assumes a full catalog.data.gov test setup. Because everything is tightly coupled, developing and testing against a new version of CKAN (or any other dependency) would require upgrading everything at once, which is not practical. To address this, a new make target, test-new, is introduced along with a new docker-compose file.

The "new" development environment drops as many dependencies as possible. It is not meant to have feature parity with GSA/catalog.data.gov. Tests should mock external dependencies where possible.

In order to support multiple versions of CKAN, or even upgrade to new versions of CKAN, we support development and testing through the CKAN_VERSION environment variable.

$ make CKAN_VERSION=2.10 test
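To run the same target across several CKAN versions, a small driver can build one make invocation per version. This is only a sketch: the CKAN_VERSIONS list is an assumption (only 2.10 is named in this README), and the commands are printed rather than executed.

```python
import shlex

# Assumed list of versions to exercise; extend it as new versions are supported.
CKAN_VERSIONS = ["2.10"]


def make_test_command(ckan_version):
    """Build the argv list for one matrix entry."""
    return ["make", f"CKAN_VERSION={ckan_version}", "test"]


for version in CKAN_VERSIONS:
    # Printing keeps the sketch side-effect free; passing the list to
    # subprocess.run(..., check=True) would actually run the tests.
    print(shlex.join(make_test_command(version)))
```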

Command line interface

The following operations can be run from the command line as described below:

  geodatagov sitemap-to-s3 [{upload_to_s3}] [{page_size}] [{max_per_page}]
    - Generates sitemap and uploads to s3

  geodatagov db-solr-sync [{dryrun}] [{cleanup_solr}] [{update_solr}]
    - DB Solr sync. 

  geodatagov tracking-update [{start_date}]
    - ckan tracking update with customized options and output

Credit / Copying

Original work written by the HealthData.gov team. It has been modified in support of Data.gov.

As a work of the United States Government, this package is in the public domain within the United States. Additionally, we waive copyright and related rights in the work worldwide through the CC0 1.0 Universal public domain dedication (which can be found at http://creativecommons.org/publicdomain/zero/1.0/).

Ways to Contribute

We're so glad you're thinking about contributing to ckanext-geodatagov!

Before contributing to ckanext-geodatagov we encourage you to read our CONTRIBUTING guide, our LICENSE, and our README (you are here), all of which should be in this repository. If you have any questions, you can email the Data.gov team at [email protected].

ckanext-geodatagov's People

Contributors

adborden, ajturner, amercader, avdata99, btylerburton, chris-macdermaid, david-blubaugh, dependabot[bot], fuhuxia, hkdctol, jbrown-xentity, jin-sun-tts, jjediny, joetsoi, johnmartin, kalxas, kindly, kvuppala, mogul, nickumia-reisys, philipashlock, robert-bryson, rossjones, rshewitt, thejuliekramer, tobes, ydave-reisys, ziadhaddad


ckanext-geodatagov's Issues

Collections Templates

What we need:

name needs to be coupled to online resource

We need the URLs pulled from digform elements in FGDC records to have their names tracked along with the URLs.

Example FGDC record: http://catalog.data.gov/harvest/object/4e56965f-e304-43bc-87ad-7692f5bc5327/original
Record it creates: http://catalog.data.gov/dataset/usgs-water-quality-data-for-the-nation-national-water-information-system-nwis

The issue is that the call to the CI_OnlineResource template (https://github.com/GSA/ckanext-geodatagov/blob/master/conversiontool/data/fgdcrse2iso19115-2.xslt#L2514) is made at the networka level, where the formname element is not available. The formname needs to be attached to the networka element in ISO and on data.gov.
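The coupling being asked for can be illustrated outside the XSLT. The sketch below is in Python, not the transform's own language, and is not the project's implementation: it walks each digform element and pairs its formname with every networkr URL found beneath it. The sample record and example.com URLs are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up FGDC fragment with two digform entries, each with its own formname.
FGDC_SAMPLE = """
<metadata>
  <distinfo>
    <stdorder>
      <digform>
        <digtinfo><formname>CSV</formname></digtinfo>
        <digtopt><onlinopt><computer><networka>
          <networkr>http://example.com/data.csv</networkr>
        </networka></computer></onlinopt></digtopt>
      </digform>
      <digform>
        <digtinfo><formname>WMS</formname></digtinfo>
        <digtopt><onlinopt><computer><networka>
          <networkr>http://example.com/wms</networkr>
        </networka></computer></onlinopt></digtopt>
      </digform>
    </stdorder>
  </distinfo>
</metadata>
""".strip()


def named_resources(fgdc_xml):
    """Return (formname, url) pairs, resolving the name at the digform level."""
    root = ET.fromstring(fgdc_xml)
    pairs = []
    for digform in root.iter("digform"):
        # formname is a sibling branch of networka, so read it from digform.
        name = digform.findtext("digtinfo/formname", default="").strip()
        for url in digform.iterfind(".//networka/networkr"):
            pairs.append((name, (url.text or "").strip()))
    return pairs


print(named_resources(FGDC_SAMPLE))
```

The point of the sketch is the iteration level: starting from digform rather than networka keeps formname in scope when each URL is emitted.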

Map viewer enhancements

  • External WMS/KML viewer
    • Advanced viewer filling whole space
    • Link to advanced viewer on resource page
  • Preview ArcGIS Online Maps

Harvest source user experience and workflows

OK, so in order to improve the workflow of the harvest source forms and org admin stuff there needs to be:

  • A public listing for harvest sources within an organization so that the breadcrumbs link to a public page
  • Within the harvest source admin area of an org, there should be an error count/link to last job (if a job exists)
  • When in admin mode for a harvest source and said harvest source is owned by an org there needs to be a way to get back to the "org/admin/harvest source" page
  • When a harvest source gets created and it is linked to an organization, the redirect should go to the "org/admin/harvest source" page

Additional Info dataset tweaks

Change additional info to something less debug-like, using English names, remove bracket quotes, for the following items:

Harvest Source
Resource Type
Data Publication Date (dataset-reference-date)
Tags
Responsible Party
Contact Email

Select which fields to extract in import stage.

We need to find out which core fields need to be extracted. This should include the extra resources found at this XPath:

gmd:identificationInfo/srv:SV_ServiceIdentification/gmd:citation/gmd:CI_Citation/gmd:citedResponsibleParty/gmd:CI_ResponsibleParty/gmd:contactInfo/gmd:CI_Contact/gmd:onlineResource/gmd:CI_OnlineResource
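As a rough illustration of evaluating that path, the sketch below uses Python's standard xml.etree.ElementTree with the usual ISO 19139 namespace URIs. The sample document and the online_resource_urls helper are assumptions made for the example; the extension's real import stage may extract these values differently.

```python
import xml.etree.ElementTree as ET

# Standard ISO 19139 namespace URIs for the prefixes used in the path.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "srv": "http://www.isotc211.org/2005/srv",
}

# The path quoted above, relative to the gmd:MD_Metadata root element.
ONLINE_RESOURCE_PATH = (
    "gmd:identificationInfo/srv:SV_ServiceIdentification/gmd:citation/"
    "gmd:CI_Citation/gmd:citedResponsibleParty/gmd:CI_ResponsibleParty/"
    "gmd:contactInfo/gmd:CI_Contact/gmd:onlineResource/gmd:CI_OnlineResource"
)

# Made-up record containing a single online resource.
ISO_SAMPLE = """
<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
                 xmlns:srv="http://www.isotc211.org/2005/srv">
  <gmd:identificationInfo><srv:SV_ServiceIdentification>
    <gmd:citation><gmd:CI_Citation>
      <gmd:citedResponsibleParty><gmd:CI_ResponsibleParty>
        <gmd:contactInfo><gmd:CI_Contact>
          <gmd:onlineResource><gmd:CI_OnlineResource>
            <gmd:linkage><gmd:URL>http://example.com/service</gmd:URL></gmd:linkage>
          </gmd:CI_OnlineResource></gmd:onlineResource>
        </gmd:CI_Contact></gmd:contactInfo>
      </gmd:CI_ResponsibleParty></gmd:citedResponsibleParty>
    </gmd:CI_Citation></gmd:citation>
  </srv:SV_ServiceIdentification></gmd:identificationInfo>
</gmd:MD_Metadata>
""".strip()


def online_resource_urls(iso_xml):
    """Collect the linkage URL of every CI_OnlineResource on the path."""
    root = ET.fromstring(iso_xml)
    urls = []
    for resource in root.findall(ONLINE_RESOURCE_PATH, NS):
        url = resource.findtext("gmd:linkage/gmd:URL", default="", namespaces=NS)
        urls.append(url.strip())
    return urls


print(online_resource_urls(ISO_SAMPLE))
```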

transformation error causing USGS records to not get harvested

Many records from USGS fail with a transformation error caused by a bug in the FGDC-to-ISO XSL.

This is one of the CSDGM records causing this error:
http://data.usgs.gov/metadata/Mineral_Resources_On-Line_Spatial_Data/535e99ace4b08e65d60f8e2b.xml

This is the error message:

Transformation to ISO failed
The transformation service returned an error for object {0}: [409] net.sf.saxon.trans.XPathException: A sequence of more than one item is not allowed as the first argument of normalize-space() ("http://pubs.usgs.gov/of/1997/o...", "http://pubs.usgs.gov/of/1997/o...", ...)

This is where the transform is taking all URLs from all elements into ONE CI_OnlineResource/linkage/URL element:
https://github.com/GSA/ckanext-geodatagov/blob/master/conversiontool/fgdc2iso/fgdcrse2iso19115-2.xslt#L4308
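The error message indicates that normalize-space() received several URLs at once. One way to picture the intended behavior is to normalize each URL separately and emit one online resource per URL. The sketch below shows that idea in Python rather than XSLT, with a made-up FGDC fragment; it is an illustration of the approach, not the fix applied to the transform.

```python
import xml.etree.ElementTree as ET

# Made-up FGDC fragment with two onlink elements in one citation.
FGDC_SAMPLE = """
<metadata><idinfo><citation><citeinfo>
  <onlink>  http://example.com/a  </onlink>
  <onlink>http://example.com/b</onlink>
</citeinfo></citation></idinfo></metadata>
""".strip()


def online_resources(fgdc_xml):
    """Build one resource entry per onlink instead of merging them all."""
    root = ET.fromstring(fgdc_xml)
    resources = []
    for onlink in root.iter("onlink"):
        # Splitting and re-joining mirrors normalize-space() on ONE string.
        url = " ".join((onlink.text or "").split())
        if url:
            resources.append({"linkage": url})
    return resources


print(online_resources(FGDC_SAMPLE))
```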

Thanks in advance for looking into this. I'm available for questions or further testing if needed.

Expand and contract facet lists

Should be able to expand and contract facet lists by clicking the title.

Things we need to do:

  • Change facet icons to chevrons icons to show collapsibility
  • Add class for hiding facet contents

Unable to create 'PyZ3950_parsetab.py'

I've installed the latest version of ckanext-geodatagov as of 10/25/2016 with "pip install -e git+https://github.com/GSA/ckanext-geodatagov#egg=ckanext-geodatagov". This is running as part of CKAN 2.3.4. I'm getting this error:

[:error] [pid 1532] Unable to create 'PyZ3950_parsetab.py'
[:error] [pid 1532] [Errno 13] Permission denied: 'PyZ3950_parsetab.py'

I also have an older version of ckanext-geodatagov (v1.1-alpha) running with CKAN 2.1a, and it has been reporting this error for years without noticeable issues. It was installed before I took over maintaining CKAN.

I've tried everything I can think of to get this error to go away. It doesn't seem to be a permission or ownership problem at least within the ckanext-geodatagov directory structure and I've disabled SELinux. Where is ckanext-geodatagov trying to create this file? What issues will this cause if it's not fixed?
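The error reports a bare filename with no directory, which suggests the path is being resolved against the current working directory of the server process rather than the extension directory. Assuming that is the case, a small check like the one below, run from within the same process, would show where the file would land and whether that directory is writable. The helper name is hypothetical.

```python
import os


def can_create_in_cwd(filename="PyZ3950_parsetab.py"):
    """Report where a relative filename resolves and whether it is writable."""
    target = os.path.abspath(filename)  # resolved against os.getcwd()
    directory = os.path.dirname(target)
    return target, os.access(directory, os.W_OK)


target, writable = can_create_in_cwd()
print(f"would write to: {target} (directory writable: {writable})")
```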

Transform issues

I received an email with someone asking about the transform issues and an error that they were getting. (I am the original author of the transform and worked with Doug, Anna, and Ted on subsequent versions.) Here is an excerpt from the email about the error:
Error:
The transformation service returned an error for object {0}: [409] net.sf.saxon.trans.XPathException: A sequence of more than one item is not allowed as the value in 'cast as' expression
Transformation to ISO failed

Kishore has informed me that this error relates specifically to repeated occurrences of within a single section. He says that the ISO Transform rule allows only a single occurrence within a section in order for a CSDGM record to successfully transform to ISO.

There seem to be issues with the data.gov transform when it comes to multiple links and multiple originator tags. I have narrowed the issues down to sections that were commented out and replaced with rewritten ones.

Look at lines 2498-2534 and lines 921-3928. These edits seem to be the issues. I am wondering if there were some issues that caused these edits to the functioning transforms.

I'd be happy to work with someone on fixing transform issues as time allows.

Thanks!

Dashboard update

The dashboard needs to be updated on data.gov to be:

  • Only 5 organizations with a link to see all my orgs
  • Also should have groups under the same logic

Homepage data tweaks

  • Adding dataset counts: Total datasets, Datasets including collection datasets, & counts of: “map services (esriRest + WMS), data services (WFS, WPS, Esri Feature service), collections, data sets, applications (executable and web, incl webmaps), web pages”
  • Adding latest modified/added datasets (resources) ‘feed’

Onlink mandatory for FGDC records?

We've finally been able to start testing harvests of native geospatial records, and are finding that FGDC records that have no onlink parameter fail the harvest with an error message of "No resources invalid metadata". Since online linkage is a mandatory-if-applicable field, is there a reason this particular rule is being enforced? Could it possibly be relaxed?

Spatial filter for dataset search

At the geodatagov extension level this includes:

  • Custom simple geocoding service with the locations and bboxes provided.
  • Text autocomplete powered by the previous service on the dataset search field
  • Map widget to allow drawing an area of interest
  • Integration between the two widgets (e.g., draw the corresponding bbox when selecting an item on the locations list)
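The first two items, a simple geocoding lookup and prefix-based autocomplete on top of it, can be sketched as follows. The location names and bounding boxes are invented placeholders, and the function names are hypothetical; the real service would use the locations and bboxes provided.

```python
# Placeholder data: name -> (min_lon, min_lat, max_lon, max_lat).
# These names and coordinates are invented for illustration only.
LOCATIONS = {
    "Alpha County": (0.0, 0.0, 1.0, 1.0),
    "Alpine Valley": (2.0, 2.0, 3.0, 3.0),
    "Beta Bay": (4.0, 4.0, 5.0, 5.0),
}


def suggest(prefix, limit=5):
    """Return location names starting with the typed prefix (case-insensitive)."""
    needle = prefix.strip().lower()
    if not needle:
        return []
    matches = [n for n in sorted(LOCATIONS) if n.lower().startswith(needle)]
    return matches[:limit]


def bbox_for(name):
    """Return the bbox the map widget should draw, or None if unknown."""
    return LOCATIONS.get(name)


print(suggest("al"))
print(bbox_for("Beta Bay"))
```

Selecting a suggestion and passing its bbox to the map widget corresponds to the integration described in the last item.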

Search placeholders

  • Global search input should be: "Search datasets..."
  • Search for /datasets to be: "Search datasets..."
  • Search with collection: "Search collection..."

Small UI tweaks

They are:

  • Smaller font-size on <h1>
  • Dataset listings shouldn't truncate the dataset title
  • Move /dataset helper text into tooltip
  • Remove last line of homepage alert (the one about blog post)
  • Remove 'Geospatial Data' image replace with standard font
  • Change all serif text to sans-serif on homepage (+ others)
  • Remove search label on homepage replace placeholder on input with: "Search geospatial datasets..."

Change harvest "reharvest" confirmation message

Generic:

This will re-run the harvesting for this source. Any updates at the source will overwrite the local datasets. Please confirm you would like to start reharvesting.

Specific

This will re-run the harvesting for this source. Any updates at the source will overwrite the local datasets. Large collections may take a couple of days to finish harvesting. Please confirm you would like to start reharvesting.

See ckan/ckanext-harvest#26 (comment) for context

Resource items within dataset listings tweaks

  • Display item for all resources that are associated (not condensed) to a dataset
  • Resources should link to either the download or resource page (see logic on dataset page)
  • When there are > 6 resources display extras as condensed link to dataset page
  • Also data-type="data" should be html image and should be Web page within dataset listings
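The rule for more than six resources can be sketched as a small helper that splits a dataset's resources into the ones listed individually and a count for the condensed link. The function and constant names are hypothetical, and the actual logic would live in the listing template or a helper.

```python
MAX_LISTED_RESOURCES = 6  # threshold taken from the requirement above


def split_resources(resources, limit=MAX_LISTED_RESOURCES):
    """Return (resources to list individually, number folded into one link)."""
    shown = resources[:limit]
    hidden_count = max(len(resources) - limit, 0)
    return shown, hidden_count


shown, hidden = split_resources([f"resource-{i}" for i in range(8)])
print(len(shown), hidden)
```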

"Harvest Sources" root nav is marked as selected within /datasets (and vice-versa)

OK this one is a complicated one. I'm going to write a quick fix within ckanext-geodatagov but we really probably should fix this in ckan or ckanext-harvest

Here's some background:

On ckanext-geodatagov we've added "Harvest Sources" as a main nav item. See: https://github.com/okfn/ckanext-geodatagov/blob/master/ckanext/geodatagov/templates/header.html#L86

Now when you visit http://geo.gov.ckan.org/ and visit the harvest source or dataset index pages you get:

[Screenshot: Screen Shot 2013-03-12 at 14 33 27]

Not good.

After some investigation, it all comes down to this: https://github.com/okfn/ckan/blob/master/ckan/lib/helpers.py#L302

Basically, _link_active returns true in both instances because, within ckanext-harvest, the routes harvest_search and package_search use the same controller + action.

@amercader @tobes I'll write the template fix quickly now, but what's the correct fix for this behaviour?
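A simplified model makes the behavior easier to see. The sketch below is not CKAN's _link_active; it is a stand-in that compares only controller and action, with assumed route values, followed by a hypothetical path-aware variant that tells the two pages apart.

```python
# Assumed route definitions: same controller + action, different URL paths.
ROUTES = {
    "package_search": {"controller": "package", "action": "search", "path": "/dataset"},
    "harvest_search": {"controller": "package", "action": "search", "path": "/harvest"},
}


def link_active(current, link):
    """Simplified stand-in: a link is active if controller and action match."""
    return (current["controller"], current["action"]) == (
        link["controller"],
        link["action"],
    )


def link_active_by_path(current, link):
    """Hypothetical variant that also requires the URL path to match."""
    return link_active(current, link) and current["path"] == link["path"]


on_dataset_page = ROUTES["package_search"]
for name, route in ROUTES.items():
    print(name, link_active(on_dataset_page, route),
          link_active_by_path(on_dataset_page, route))
```

With the first check both nav items are highlighted on the dataset page; the path-aware check highlights only the matching one.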
