ckanapi's Issues

LocalCKAN seems to be broken

Traceback (most recent call last):
  File "3500_unclassified_metadata.py", line 8, in <module>
    ckan = ckanapi.LocalCKAN()
  File "/home/vagrant/projects/ckan/ckanapi/ckanapi/localckan.py", line 19, in __init__
    username = self.get_site_username()
  File "/home/vagrant/projects/ckan/ckanapi/ckanapi/localckan.py", line 25, in get_site_username
    user = self._get_action('get_site_user')({'ignore_auth': True}, ())
  File "/home/vagrant/projects/ckan/ckan/ckan/logic/__init__.py", line 419, in wrapped
    result = _action(context, data_dict, **kw)
  File "/home/vagrant/projects/ckan/ckan/ckan/logic/action/get.py", line 1962, in get_site_user
    user = model.User.get(site_id)
  File "/home/vagrant/projects/ckan/ckan/ckan/model/user.py", line 61, in get
    return query.first()
  File "/home/vagrant/.virtualenvs/ckan/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2156, in first
    ret = list(self[0:1])
  File "/home/vagrant/.virtualenvs/ckan/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2023, in __getitem__
    return list(res)
  File "/home/vagrant/.virtualenvs/ckan/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2227, in __iter__
    return self._execute_and_instances(context)
  File "/home/vagrant/.virtualenvs/ckan/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2240, in _execute_and_instances
    close_with_result=True)
  File "/home/vagrant/.virtualenvs/ckan/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2231, in _connection_from_session
    **kw)
  File "/home/vagrant/.virtualenvs/ckan/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 727, in connection
    bind = self.get_bind(mapper, clause=clause, **kw)
  File "/home/vagrant/.virtualenvs/ckan/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 975, in get_bind
    ', '.join(context)))
sqlalchemy.exc.UnboundExecutionError: Could not locate a bind configured on mapper Mapper|User|user, SQL expression or this Session

Same traceback on CKAN 2.2, 2.2.1 and master.

This is probably CKAN's fault, not ckanapi's.

When importing resources between CKAN instances, url is not preserved if url_type='upload'

Reproducing:

  1. Create a dataset on CKAN1, uploading a file for a resource
  2. Import this dataset into CKAN2:
ckan1 = ckanapi.RemoteCKAN('http://ckan1.org', apikey=...)
ckan2 = ckanapi.RemoteCKAN('http://ckan2.org', apikey=...)

package = ckan1.action.package_show(id='package1')
ckan2.action.package_create(**package)

The expected result is one of the following:
a. The dataset is migrated along with its resource files
b. The dataset metadata is migrated, with the resource file links still pointing to CKAN1

Instead, uploaded resource files are not migrated, but the base of their URLs is changed from http://ckan1.org to http://ckan2.org.

This happens when resource['url_type'] is set to upload. The bug can be fixed by migrating the dataset like so:

for res in package['resources']:
  res.pop('url_type')

ckan2.action.package_create(**package)
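
A slightly more defensive variant of this workaround, as a sketch only: it works on a copy of the package dict and tolerates resources that have no url_type key. The helper name strip_upload_url_type and the sample package values are hypothetical and not part of ckanapi.

```python
import copy

def strip_upload_url_type(package):
    """Return a deep copy of `package` with url_type removed from uploaded resources.

    Hypothetical helper, not part of ckanapi.
    """
    cleaned = copy.deepcopy(package)
    for res in cleaned.get('resources', []):
        if res.get('url_type') == 'upload':
            # Per the report above, dropping url_type keeps the original url
            res.pop('url_type')
    return cleaned

# Example with a hand-written package dict (made-up values)
package = {
    'name': 'package1',
    'resources': [
        {'url': 'http://ckan1.org/dataset/package1/resource/r1/download/a.csv',
         'url_type': 'upload'},
        {'url': 'http://example.com/b.csv'},
    ],
}
cleaned = strip_upload_url_type(package)
```

The cleaned dict could then be passed on as ckan2.action.package_create(**cleaned), leaving the dict fetched from CKAN1 unmodified.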

I am not sure whether this is a CKAN issue, or a ckanapi issue, so I'm filing it here. The relevant code in CKAN is here.

Another thing to note, which does seem to be a CKAN bug, is that the resource is saved in the database with its proper URL, pointing to http://ckan1.org. It is only indexed in Solr with the wrong URL.

Error while using "dump datasets"

I'm getting this error while trying to download the datasets from a ckan instance

Traceback (most recent call last):
File "c:\Anaconda\Scripts\ckanapi-script.py", line 9, in
load_entry_point('ckanapi==3.3', 'console_scripts', 'ckanapi')()
File "c:\Anaconda\lib\site-packages\ckanapi\cli\main.py", line 95, in main
return dump_things(ckan, thing[0], arguments)
File "c:\Anaconda\lib\site-packages\ckanapi\cli\dump.py", line 67, in dump_things
for job_ids, finished, result in pool:
File "c:\Anaconda\lib\site-packages\ckanapi\cli\workers.py", line 93, in worker_poo
readable, _, _ = select.select(worker_fds, [], [])
select.error: (10038, 'An operation was attempted on something that is not a socket')

ckanapi dump datasets --all doesn't work for catalog.data.gov

I am trying to download all metadata from catalog.data.gov. The following command returns an API error.

ckanapi dump datasets --all -O catalog.data.gov.jsonl -p 1 -r http://catalog.data.gov

The error:

Traceback (most recent call last):
  File "/home/ekzhu/anaconda/bin/ckanapi", line 11, in <module>
    sys.exit(main())
  File "/home/ekzhu/anaconda/lib/python2.7/site-packages/ckanapi/cli/main.py", line 128, in main
    return dump_things(ckan, thing[0], arguments)
  File "/home/ekzhu/anaconda/lib/python2.7/site-packages/ckanapi/cli/dump.py", line 60, in dump_things
    names = ckan.call_action(get_thing_list, {})
  File "/home/ekzhu/anaconda/lib/python2.7/site-packages/ckanapi/remoteckan.py", line 83, in call_action
    return reverse_apicontroller_action(url, status, response)
  File "/home/ekzhu/anaconda/lib/python2.7/site-packages/ckanapi/common.py", line 131, in reverse_apicontroller_action
    raise CKANAPIError(repr([url, status, response]))
ckanapi.errors.CKANAPIError: ['http://catalog.data.gov/api/action/package_list', 302, u'<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">\n<html><head>\n<title>302 Found</title>\n</head><body>\n<h1>Found</h1>\n<p>The document has moved <a href="http://catalog.data.gov/api/action/package_search">here</a>.</p>\n</body></html>\n']

It looks like catalog.data.gov redirects package_list to package_search. Is there a way to handle this? Am I missing something?

Weird Error On Resource Upload

import StringIO
import ckanapi
import json
import os

CKAN_URL = 'http://www.opendatadc.org'
CKAN_API_KEY = os.environ['CKAN_API_KEY']

ckan = ckanapi.RemoteCKAN(
    CKAN_URL,
    apikey=CKAN_API_KEY,
)
payload = {'description': 'Scraped from http://geospatial.dcgis.dc.gov/ocf/getData.aspx',
 'format': 'json',
 'id': u'808ebae5-d23c-451f-970e-a657b3d3d540',
 'mimetype': 'application/json',
 'name': 'Committees Running Per Year',
 'package_id': 'campaign-finance',
 'upload': StringIO.StringIO('["dfd", "dfd"]'),
 'url': 'http://www.opendatadc.orgdatset/campaign-finance/resource/808ebae5-d23c-451f-970e-a657b3d3d540'}

ckan.action.resource_update(**payload)

Gives me a weird exception

CKANAPIError                              Traceback (most recent call last)
<ipython-input-4-0261da11b7f6> in <module>()
     19  'url': 'http://www.opendatadc.orgdatset/campaign-finance/resource/808ebae5-d23c-451f-970e-a657b3d3d540'}
     20
---> 21 ckan.action.resource_update(**payload)

/Users/saul/.virtualenvs/tempenv-28c0218950914/lib/python2.7/site-packages/ckanapi/common.pyc in action(**kwargs)
     47                 return self._ckan.call_action(name,
     48                     data_dict=nonfiles,
---> 49                     files=files)
     50             return self._ckan.call_action(name, data_dict=kwargs)
     51         return action

/Users/saul/.virtualenvs/tempenv-28c0218950914/lib/python2.7/site-packages/ckanapi/remoteckan.pyc in call_action(self, action, data_dict, context, apikey, files)
     80         else:
     81             status, response = self._request_fn(url, data, headers, files)
---> 82         return reverse_apicontroller_action(url, status, response)
     83
     84     def _request_fn(self, url, data, headers, files):

/Users/saul/.virtualenvs/tempenv-28c0218950914/lib/python2.7/site-packages/ckanapi/common.pyc in reverse_apicontroller_action(url, status, response)
    104
    105     # don't recognize the error
--> 106     raise CKANAPIError(repr([url, status, response]))

CKANAPIError: ['http://www.opendatadc.org/api/action/resource_update', 400, u'"Bad request - JSON Error: Error decoding JSON data. Error: JSONDecodeError(\'Expecting value: line 1 column 2 (char 1)\',) JSON data extracted from the request: \'--11da1b66348f4c54b15ed923a219bd17;\\\\r\\\\nContent-Disposition: form-data; name=\\"mimetype\\"\\\\r\\\\n\\\\r\\\\napplication/json\\\\r\\\\n--11da1b66348f4c54b15ed923a219bd17;\\\\r\\\\nContent-Disposition: form-data; name=\\"description\\"\\\\r\\\\n\\\\r\\\\nScraped from http://geospatial.dcgis.dc.gov/ocf/getData.aspx\\\\r\\\\n--11da1b66348f4c54b15ed923a219bd17;\\\\r\\\\nContent-Disposition: form-data; name=\\"format\\"\\\\r\\\\n\\\\r\\\\njson\\\\r\\\\n--11da1b66348f4c54b15ed923a219bd17;\\\\r\\\\nContent-Disposition: form-data; name=\\"url\\"\\\\r\\\\n\\\\r\\\\nhttp://www.opendatadc.orgdatset/campaign-finance/resource/808ebae5-d23c-451f-970e-a657b3d3d540\\\\r\\\\n--11da1b66348f4c54b15ed923a219bd17;\\\\r\\\\nContent-Disposition: form-data; name=\\"package_id\\"\\\\r\\\\n\\\\r\\\\ncampaign-finance\\\\r\\\\n--11da1b66348f4c54b15ed923a219bd17;\\\\r\\\\nContent-Disposition: form-data; name=\\"id\\"\\\\r\\\\n\\\\r\\\\n808ebae5-d23c-451f-970e-a657b3d3d540\\\\r\\\\n--11da1b66348f4c54b15ed923a219bd17;\\\\r\\\\nContent-Disposition: form-data; name=\\"name\\"\\\\r\\\\n\\\\r\\\\nCommittees Running Per Year\\\\r\\\\n--11da1b66348f4c54b15ed923a219bd17;\\\\r\\\\nContent-Disposition: form-data; name=\\"upload\\"; filename=\\"upload\\"\\\\r\\\\nContent-type: text/plain\\\\r\\\\n\\\\r\\\\n[\\"dfd\\", \\"dfd\\"]\\\\r\\\\n--11da1b66348f4c54b15ed923a219bd17;--\'"']

resource_create not working for 2.3

I'm attempting to add new resources to packages following the api instructions but I'm getting:

ckanapi.errors.CKANAPIError: ['http://xxxx/api/action/resource_create', 400, u'"Bad request - JSON Error: No request body data"']

Here's my code.

import ckanapi
mysite = ckanapi.RemoteCKAN('http://xxx',
    apikey=api_key)
mysite.action.resource_create(
    package_id='XX-XXX',
    upload=open('/Users/s/stuff/stuff.csv'))

Any ideas?

Handle CLI errors more gracefully

Things like:

  • errors raised as part of a ckanapi action call
  • badly formatted JSONL input
  • unexpected errors when using ckanapi dump or ckanapi load with a local CKAN instance

all result in tracebacks, but aren't ckanapi bugs. We should catch these and display them better. Perhaps with an attractive shade of red text.
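
A rough sketch of what such handling could look like, assuming a single wrapper around the CLI entry point. The names run_gracefully and CLIError here are stand-ins defined locally for illustration, not ckanapi's actual code, and the set of exceptions to catch is an assumption.

```python
import io
import json
import sys

RED = '\033[31m'
RESET = '\033[0m'

class CLIError(Exception):
    """Stand-in for an expected, user-facing error."""

def run_gracefully(command, stderr=None):
    """Run `command()`; report expected errors on stderr and return an exit code."""
    stderr = stderr if stderr is not None else sys.stderr
    try:
        command()
    except (CLIError, ValueError) as e:
        # ValueError covers json.loads failures on badly formatted JSONL
        message = 'error: %s' % e
        if hasattr(stderr, 'isatty') and stderr.isatty():
            # Only colour the text when writing to a real terminal
            message = RED + message + RESET
        stderr.write(message + '\n')
        return 1
    return 0

# Example: a malformed JSONL line becomes a one-line message, not a traceback
buf = io.StringIO()
exit_code = run_gracefully(lambda: json.loads('{not json'), stderr=buf)
```

Unexpected exception types still propagate with a full traceback, which keeps genuine ckanapi bugs visible.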

Ckan API fails on OS X; missing pylons dependency

Latest version of CKAN and Pylons is installed in my virtualenv, yet I get:

import ckanapi
File "/Users/peder/source/ckanapi/ckanapi.py", line 64, in <module>
from ckan.logic import (ParameterError, NotAuthorized, NotFound,
File "/Users/peder/Envs/ckan-datatools/lib/python2.7/site-packages/ckan-2.0b-py2.7.egg/ckan/logic/__init__.py", line 8, in <module>
import ckan.lib.base as base
File "/Users/peder/Envs/ckan-datatools/lib/python2.7/site-packages/ckan-2.0b-py2.7.egg/ckan/lib/base.py", line 9, in <module>
from pylons import c, cache, config, g, request, response, session
ImportError: cannot import name c

re-installing with Travis as part of the mix?

Ian, I had previously installed version 2.0 just with 'python setup.py install'. I think you've introduced a Travis build, with which I have little experience. Do I need to have Travis CI installed in order to run the yml so that the requirements.txt gets installed properly? Sorry, I'm a bit green around these details, so if you were able to spell it out in the readme, that would be great!

Regarding your thread on the ckan-dev forum (https://lists.okfn.org/pipermail/ckan-dev/2014-February/006774.html), it seems like the functionality has expanded a lot, which is great. I just need to figure things out for myself from the few examples you gave. In terms of the bulk load from datasets.jsonl.gz, is there any way you might be able to add a sample file to the repo so I can see what this format looks like? Is there a dump feature in the CLI that outputs datasets in this format?

I'm still not clear on whether I can use ckanapi to modify data in the datastore. Is it limited to operations on the metadata in the main CKAN metadata database? If not, would you have an example of how ckanapi can be used against a table for a resource, maybe using demo.ckan.org?

Thanks for your help with this. Colum

Running actions fails with missing dependencies

I installed the ckanapi package as follows

$ pip install ckanapi

Running the package_list action fails as follows. Note this is a remote ckan instance that I'm accessing as a user. I don't want to install or run a ckan server locally:

$ ckanapi action package_list http://ckanhost/api
Traceback (most recent call last):
(...)
    raise DistributionNotFound(req)
pkg_resources.DistributionNotFound: PasteScript

Installing PasteScript reveals a further dependency on ckan itself:

$ pip install PasteScript
$ ckanapi action package_list http://ckanhost/api
Traceback (most recent call last):
  File ".virtualenvs/geops/local/lib/python2.7/site-packages/ckanapi/cli/paster.py", line 2, in <module>
    from ckan.lib.cli import CkanCommand
ImportError: No module named ckan.lib.cli

I did continue installing ckan, which however pulls in more and more dependencies, eventually ending up with a Pylons import error:

$ <lots of pip install's>
$ ckanapi action package_list http://ckanhost/api
    from pylons import g, c, request, session, response
ImportError: cannot import name g

IMHO one should not be required to install the server (AFAIK that's what the ckan package provides, if I'm not mistaken) just to use the API client.

What am I missing?

python3 issue

I got this error while trying to run the following command from the readme:

ckanapi action group_list -r http://dados.gov.br
Traceback (most recent call last):
  File "/usr/local/bin/ckanapi", line 11, in <module>
    load_entry_point('ckanapi==3.6', 'console_scripts', 'ckanapi')()
  File "/usr/local/lib/python3.5/dist-packages/pkg_resources/__init__.py", line 564, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/usr/local/lib/python3.5/dist-packages/pkg_resources/__init__.py", line 2621, in load_entry_point
    return ep.load()
  File "/usr/local/lib/python3.5/dist-packages/pkg_resources/__init__.py", line 2281, in load
    return self.resolve()
  File "/usr/local/lib/python3.5/dist-packages/pkg_resources/__init__.py", line 2287, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/usr/local/lib/python3.5/dist-packages/ckanapi/cli/main.py", line 112
    except CLIError, e:
                   ^
SyntaxError: invalid syntax

the except clause has not been made compatible with python 3.x syntax
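
For reference, the form that parses on both Python 2.6+ and Python 3 is "except CLIError as e:". A minimal sketch, with a stand-in CLIError class defined locally rather than imported from ckanapi:

```python
class CLIError(Exception):
    """Stand-in for the exception class named in the traceback."""

def describe(fn):
    """Call fn() and turn a CLIError into a short message."""
    try:
        fn()
    except CLIError as e:  # the "as" form is valid on Python 2.6+ and 3.x
        return 'CLI error: %s' % e
    return 'ok'

def failing():
    raise CLIError('bad arguments')

result = describe(failing)
```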

SSLError: hostname 'datahub.io' doesn't match either of '*.ckan.io', 'ckan.io'

Hi,

I am trying to connect to https://datahub.io/ using the ckanapi Python module. My code is this:

import ckanapi

ckan = ckanapi.RemoteCKAN('http://datahub.io/api',
apikey = 'my_api_key_here',
user_agent='ckanapiexample/1.0 (+http://example.com/my/website)')

packages = ckan.action.package_list()
print packages

I get the following error:
Traceback (most recent call last):
File "test-ckanapi.py", line 7, in
packages = ckan.action.package_list()
File "C:\Program Files (x86)\Python27\lib\site-packages\ckanapi-3.6_dev-py2.7.egg\ckanapi\common.py", line 51, in action
return self._ckan.call_action(name, data_dict=kwargs)
File "C:\Program Files (x86)\Python27\lib\site-packages\ckanapi-3.6_dev-py2.7.egg\ckanapi\remoteckan.py", line 82, in call_action
status, response = self._request_fn(url, data, headers, files, requests_kwargs)
File "C:\Program Files (x86)\Python27\lib\site-packages\ckanapi-3.6_dev-py2.7.egg\ckanapi\remoteckan.py", line 86, in _request_fn
r = requests.post(url, data=data, headers=headers, files=files, **requests_kwargs)
File "C:\Program Files (x86)\Python27\lib\site-packages\requests-2.2.0-py2.7.egg\requests\api.py", line 88, in post
return request('post', url, data=data, **kwargs)
File "C:\Program Files (x86)\Python27\lib\site-packages\requests-2.2.0-py2.7.egg\requests\api.py", line 44, in request
return session.request(method=method, url=url, **kwargs)
File "C:\Program Files (x86)\Python27\lib\site-packages\requests-2.2.0-py2.7.egg\requests\sessions.py", line 383, in request
resp = self.send(prep, **send_kwargs)
File "C:\Program Files (x86)\Python27\lib\site-packages\requests-2.2.0-py2.7.egg\requests\sessions.py", line 506, in send
history = [resp for resp in gen] if allow_redirects else []
File "C:\Program Files (x86)\Python27\lib\site-packages\requests-2.2.0-py2.7.egg\requests\sessions.py", line 168, in resolve_redirects
allow_redirects=False,
File "C:\Program Files (x86)\Python27\lib\site-packages\requests-2.2.0-py2.7.egg\requests\sessions.py", line 486, in send
r = adapter.send(request, **kwargs)
File "C:\Program Files (x86)\Python27\lib\site-packages\requests-2.2.0-py2.7.egg\requests\adapters.py", line 389, in send
raise SSLError(e)
requests.exceptions.SSLError: hostname 'datahub.io' doesn't match either of '*.ckan.io', 'ckan.io'

Am I doing something wrong?

How to migrate users between CKANs using ckanapi CLI

I'm trying to migrate users between CKAN catalogues using the ckanapi CLI.

Is there a way to migrate users, seeing that ckanapi dump users does not dump their password hashes? Without those, naturally ckanapi load users will fail like this (usernames anonymised):

ckanapi dump users --all -r http://sourceckan -a my-sysadmin-api-key-source | ckanapi load users -r http://targetckan -a my-sysadmin-api-key-target
0 [1] --- None asdasdf
1 [2] 0.61s None asdfasdf
2 [3] 0.28s None asdfasdf
3 [4] 0.26s None asdfasdf
4 [5] 0.25s None asdfasdf
5 [6] 0.26s None asdasdf
6 [7] 0.25s None asdfasdf
7 [None] 0.33s None asdfasdf
1 [2] --- create ValidationError {"password":["Missing value"],"__type":"Validation Error"}
2 [3] 0.68s update ValidationError {"__type":"Validation Error","email":["Missing value"]}
3 [4] 0.73s create ValidationError {"password":["Missing value"],"__type":"Validation Error"}
4 [5] 0.68s create ValidationError {"password":["Missing value"],"__type":"Validation Error"}
5 [6] 0.67s create ValidationError {"password":["Missing value"],"__type":"Validation Error"}
6 [7] 0.71s create ValidationError {"password":["Missing value"],"__type":"Validation Error"}
7 [8] 0.70s create ValidationError {"password":["Missing value"],"__type":"Validation Error"}
8 [None] 0.68s create ValidationError {"password":["Missing value"],"__type":"Validation Error"}

This was run on a machine separate from the two (source and target) CKAN instances, using the respective sysadmin's API key.
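
The thread does not include a solution. One possible workaround, sketched under the assumption that temporary passwords are acceptable and users reset them afterwards, is to add a random password to each dumped record before loading. The function name add_temporary_passwords is hypothetical, and the sketch does not address the missing email value reported for one record above.

```python
import json
import secrets

def add_temporary_passwords(lines):
    """Yield JSONL user records with a random 'password' added when missing."""
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        user = json.loads(line)
        # setdefault leaves any existing password value untouched
        user.setdefault('password', secrets.token_urlsafe(16))
        yield json.dumps(user, sort_keys=True)

# Example with one made-up user record
sample = ['{"name": "alice", "email": "alice@example.com"}\n']
patched = [json.loads(line) for line in add_temporary_passwords(sample)]
```

A filter like this could sit between the dump and load commands in the pipeline, reading sys.stdin and printing each yielded line.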

Validation/missing value error on `ckanapi dump datasets` (on data.noaa.gov)

ckanapi usage and error below. @wardi on IRC suggested I open this issue

Thank you in advance!

$ ckanapi dump datasets -r http://data.noaa.gov --all -O json.json
    Traceback (most recent call last):
      File "/usr/local/bin/ckanapi", line 9, in <module>
        load_entry_point('ckanapi==3.3', 'console_scripts', 'ckanapi')()
      File "/Library/Python/2.7/site-packages/ckanapi/cli/main.py", line 95, in main
        return dump_things(ckan, thing[0], arguments)
      File "/Library/Python/2.7/site-packages/ckanapi/cli/dump.py", line 33, in dump_things
        return dump_things_worker(ckan, thing, arguments)
      File "/Library/Python/2.7/site-packages/ckanapi/cli/dump.py", line 135, in dump_things_worker
        'include_datasets': False})
      File "/Library/Python/2.7/site-packages/ckanapi/remoteckan.py", line 82, in call_action
        return reverse_apicontroller_action(url, status, response)
      File "/Library/Python/2.7/site-packages/ckanapi/common.py", line 103, in reverse_apicontroller_action
        raise ValidationError(err)
    ckanapi.errors.ValidationError: {u'__type': u'Validation Error', u'name_or_id': u'Missing value'}

Harvest sources crash dump datasets

Tested environments:

  • datacats CKAN, latest master (2.5.0a)
  • source install CKAN, latest master (2.5.0a), upgraded since 2.3 master
  • ckanext-harvest with a few harvest sources defined
  • custom schema via ckanext-scheming (no "extra" fields allowed in dataset dict)
ckanapi dump datasets --all -p 6 -r http://SOURCE > datasets.jsonl

This will crash with a server error and broken pipe on the harvest sources, which are specialised datasets and collide with ckanext-scheming's well-defined, but rigid idea of dataset schemas.

Workarounds:

  • delete harvest sources at source CKAN
  • ckanapi dump chunks of datasets (with offset and max. parameters), omitting only the harvest sources.

On a side note, some old datasets, which have been edited last before we switched to ckanext-scheming, still have "extra" fields, which will crash during ckanapi load. Had to manually remove the extra keys from those datasets to make ckanapi load them.
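
The manual clean-up mentioned in the side note could be scripted along these lines. This is a sketch that assumes the offending fields live under a top-level "extras" key of each dataset dict in the JSONL dump; the function name strip_extras is hypothetical.

```python
import json

def strip_extras(lines):
    """Yield JSONL dataset records with the top-level 'extras' key removed."""
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        dataset = json.loads(line)
        # pop with a default so datasets without extras pass through unchanged
        dataset.pop('extras', None)
        yield json.dumps(dataset, sort_keys=True)

# Example with two made-up dataset records
sample = [
    '{"name": "old-dataset", "extras": [{"key": "legacy", "value": "1"}]}\n',
    '{"name": "new-dataset"}\n',
]
cleaned = [json.loads(line) for line in strip_extras(sample)]
```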

Can't purge a dataset

I can't seem to use dataset_purge (as stated in the API), but I can delete. Am I missing something?

ckan.action.dataset_purge(id='s1a_iw_grdh_1sdv_20141205t060219_20141205t060244_003580_00439e_129b')
ckanapi.errors.CKANAPIError: ['IP:5000/api/action/dataset_purge', 400, u'"Bad request - Action name not known: dataset_purge"']

ckan.action.package_delete(id='s1a_iw_grdh_1sdv_20141205t060219_20141205t060244_003580_00439e_129b')
No errors

Data exported with dump datasets should be importable with load datasets

Documentation says:

Example: dumping datasets from CKAN into a local file with 4 processes:

$ ckanapi dump datasets --all -O datasets.jsonl.gz -z -p 4 -r http://localhost

Example: load datasets from a dataset dump file with 3 processes in parallel:

$ ckanapi load datasets -I datasets.jsonl.gz -z -p 3 -c /etc/ckan/production.ini

But unfortunately what was exported with dump datasets can't be imported to an empty CKAN database with load datasets. There are some workarounds, but I think importing data that was exported using same tool should just work without any workarounds.

Trailing newline in JSONL file raises exception

If a JSONL input file has a trailing newline an exception is raised:

  File "/usr/lib/ckan/default/bin/ckanapi", line 9, in <module>
    load_entry_point('ckanapi==3.6-dev', 'console_scripts', 'ckanapi')()
  File "/usr/lib/ckan/default/src/ckanapi/ckanapi/cli/main.py", line 86, in main
    return _switch_to_paster(arguments)
  File "/usr/lib/ckan/default/src/ckanapi/ckanapi/cli/main.py", line 132, in _switch_to_paster
    sys.exit(load_entry_point('PasteScript', 'console_scripts', 'paster')())
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/paste/script/command.py", line 102, in run
    invoke(command, command_name, options, args[1:])
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/paste/script/command.py", line 141, in invoke
    exit_code = runner.run(args)
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/paste/script/command.py", line 236, in run
    result = self.command()
  File "/usr/lib/ckan/default/src/ckanapi/ckanapi/cli/paster.py", line 29, in command
    return main.main(running_with_paster=True)
  File "/usr/lib/ckan/default/src/ckanapi/ckanapi/cli/main.py", line 118, in main
    return load_things(ckan, thing[0], arguments)
  File "/usr/lib/ckan/default/src/ckanapi/ckanapi/cli/load.py", line 47, in load_things
    return load_things_worker(ckan, thing, arguments)
  File "/usr/lib/ckan/default/src/ckanapi/ckanapi/cli/load.py", line 162, in load_things_worker
    obj = json.loads(line.decode('utf-8'))
  File "/usr/lib/python2.7/json/__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python2.7/json/decoder.py", line 366, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python2.7/json/decoder.py", line 384, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded

However, the JSONL standard does allow a trailing newline.
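
A tolerant reader could simply skip blank lines before decoding. A minimal sketch of that idea follows; the function name iter_jsonl is hypothetical and this is not the code in load.py.

```python
import json

def iter_jsonl(lines):
    """Yield one decoded object per non-blank line of JSONL input."""
    for number, line in enumerate(lines, 1):
        if isinstance(line, bytes):
            line = line.decode('utf-8')
        if not line.strip():
            continue  # tolerate trailing or stray blank lines
        try:
            yield json.loads(line)
        except ValueError as e:
            # Report which input line was malformed
            raise ValueError('line %d: %s' % (number, e))

# Example: input ending in an empty line
records = list(iter_jsonl([b'{"name": "a"}\n', b'\n', b'{"name": "b"}\n', b'\n']))
```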

Dump command hangs when using -c

I can dump fine using -r localhost:5000, but not with -c ckan.ini: it just hangs after printing the first response to the console. ckanapi is waiting for output on stdout, but nothing arrives because the output ends up on stderr, which is where sys.stdout is pointing:

(Pdb) print sys.stdout
<open file '<stderr>', mode 'w' at 0x7f7a69dd4270>

due to this in dump.py:

    # hack so that "print debugging" can work in extension/ckan
    # code called by this worker
    sys.stdout = sys.stderr
/home/co/ckan/local/lib/python2.7/site-packages/sqlalchemy/engine/default.py:585: SAWarning: Unicode type received non-unicodebind param value.
  processors[key](compiled_params[key])
2016-12-07 08:41:19,553 DEBUG [ckanext.spatial.model.package_extent] Spatial tables defined in memory
2016-12-07 08:41:19,561 DEBUG [ckanext.spatial.model.package_extent] Spatial tables already exist
2016-12-07 08:41:19,573 DEBUG [ckanext.harvest.model] Harvest tables defined in memory
2016-12-07 08:41:19,577 DEBUG [ckanext.harvest.model] Harvest tables already exist
/home/co/ckan/local/lib/python2.7/site-packages/sqlalchemy/engine/default.py:585: SAWarning: Unicode type received non-unicodebind param value.
  processors[key](compiled_params[key])
2016-12-07 08:41:27,289 DEBUG [ckanext.spatial.model.package_extent] Spatial tables defined in memory
2016-12-07 08:41:27,293 DEBUG [ckanext.spatial.model.package_extent] Spatial tables already exist
2016-12-07 08:41:27,309 DEBUG [ckanext.harvest.model] Harvest tables defined in memory
2016-12-07 08:41:27,313 DEBUG [ckanext.harvest.model] Harvest tables already exist
["2016-12-07T08:41:27.913396",null,{"license_title":"UK Open Government Licence (OGL)","maintainer":null,"groups":[],"temporal_coverage-
...
e2a63c182527","resource_type":"file"}],"update_frequency":"","revision_id":"000ef760-c074-41f1-94ab-8ceb7e33f3ac","date_released":"25/2/2009","foi-phone":"","sla":"","theme-primary":"Health"}]
^CTraceback (most recent call last):
  File "/home/co/ckan/bin/ckanapi", line 11, in <module>
    load_entry_point('ckanapi', 'console_scripts', 'ckanapi')()
  File "/src/ckanapi/ckanapi/cli/main.py", line 94, in main
    return _switch_to_paster(arguments)
  File "/src/ckanapi/ckanapi/cli/main.py", line 141, in _switch_to_paster
    sys.exit(load_entry_point('PasteScript', 'console_scripts', 'paster')())
  File "/home/co/ckan/local/lib/python2.7/site-packages/paste/script/command.py", line 104, in run
    invoke(command, command_name, options, args[1:])
  File "/home/co/ckan/local/lib/python2.7/site-packages/paste/script/command.py", line 143, in invoke
    exit_code = runner.run(args)
  File "/home/co/ckan/local/lib/python2.7/site-packages/paste/script/command.py", line 238, in run
    result = self.command()
  File "/src/ckanapi/ckanapi/cli/paster.py", line 29, in command
    return main.main(running_with_paster=True)
  File "/src/ckanapi/ckanapi/cli/main.py", line 128, in main
    return dump_things(ckan, thing[0], arguments)
  File "/src/ckanapi/ckanapi/cli/dump.py", line 39, in dump_things
    return dump_things_worker(ckan, thing, arguments)
  File "/src/ckanapi/ckanapi/cli/dump.py", line 164, in dump_things_worker
    for line in iter(stdin.readline, b''):
KeyboardInterrupt

Problems with package_create in api 2.5.2

I have a script which seeds CKAN instances with groups, organizations, datasets and resources. After creating resources it downloads the resource dictionary and then uploads a file (I could not find a way to do it in one step). My script works perfectly when run against CKAN 2.4.0 but fails when I run it against 2.5.2. Was there an API change between versions that affects ckanapi?

In the newer version it appears that the package is not being returned as a dictionary in this line:
dataset = ckan.action.package_create

When I attempt to look at dataset with pprint it returns -1.

Otherwise it fails with this error:
Traceback (most recent call last):
File "C:/Users/akoebrick/PycharmProjects/NG911/seed.py", line 160, in <module>
resourceId = dataset['resources'][0]['id']
TypeError: 'int' object has no attribute '__getitem__'

Here is my relevant code:

# Now add the packages
##############################
for package in datasets:
    packageTitle = package['title'] + " - " + countyPlain
    packageName = munge(package['title'])

    #Add county name to packageName to make it unique
    packageName = packageName + "-" + countyName
    #Add the datasets
    try:
        dataset = ckan.action.package_create(name=packageName, title=packageTitle, notes=package['notes'], groups=package['group'],resources=package['resources'], owner_org=orgId)
    except:
        e = sys.exc_info()[0]
        print "Error: %s : " %e
        exc_type, exc_value, exc_traceback = sys.exc_info()
        lines = traceback.format_exception(exc_type, exc_value, exc_traceback)
        print ''.join('!! ' + line for line in lines)  # Log it or whatever here


    #Update the resource so it is a file upload rather than a url.  Does not seem to be possible in initial loop
    resourceId = dataset['resources'][0]['id']
    fileName = dataset['resources'][0]['name']
    fileName = 'data/' + fileName
    resourceUpdate = ckan.action.resource_update(id=resourceId, upload=open(fileName, "rb"))

How to install and use ckanapi?

I'm reading through the readme and the open and closed issues, but I'm not seeing what the recommended way of using this is.

I don't see this being deployed anywhere as a pip package or as a single Python file I can just include in my directory.

I don't want to clone the whole project into my current project because the requirements.txt and various other things will collide, and I'm not sure I want to clone it into a subdirectory because I don't know how importing that in my Python program would work...

I see the releases here on GitHub, but how do I use them? Do I unzip one? Where in my project folder should I put it?

pip install not installing requests

Did a "pip install ckanapi", but when I tried to import it in Python 3, got an error about "requests" module. After "pip install requests", it worked fine.
Maybe a missing dep?

syncing.SynchronizationClient is obscure

It seems like something really useful, to automatically sync datasets to a CKAN instance, but I can't figure out how it works. The documentation is very vague. It says:

data – Data to be synchronized. Should be a dict (or dict-like) with top level keys coresponding to the object type, mapping to dictionaries of {'id': object}.

So what is <object> supposed to look like? What dictionary should I call that method with? An example usage would be really welcome.

Error uploading file

Hi, I was just trying ckanapi, and while I can create a dataset without problems, I get an error when uploading a file (CKAN 2.2a). This is the script that I am using (pretty much the one in the readme):

import ckanapi

mysite = ckanapi.RemoteCKAN('http://url',
    apikey='personalapikey',
    user_agent='test')
mysite.action.resource_create(
    package_id='dataset_prova',
    description='prova', 
    upload=open('/Users/Nicola/Downloads/Contatti.csv'))

This is the error

Traceback (most recent call last):
File "upload.py", line 9, in <module>
upload=open('/Users/Nicola/Downloads/Contatti.csv'))
File "/Users/Nicola/anaconda/lib/python2.7/site-packages/ckanapi/common.py", line 49,    in action
files=files)
File "/Users/Nicola/anaconda/lib/python2.7/site-packages/ckanapi/remoteckan.py", line 74, in call_action
return reverse_apicontroller_action(url, status, response)
File "/Users/Nicola/anaconda/lib/python2.7/site-packages/ckanapi/common.py", line 106, in reverse_apicontroller_action
raise CKANAPIError(url, status, response)
TypeError: __init__() takes at most 2 arguments (4 given)

Am I missing something?

dump datasets performance: use package_search for ckan >= 2.2

with package_search we can dump all datasets in far fewer API calls.

issues:

  • ckan < 2.2 returns different dataset data from package_search and package_show so we'll need to maintain the old code as well
  • we need to request the datasets ordered by id, not modification date, so that we know we have a complete dump and to replicate the current behaviour
  • ckan sites may have limited the number of packages returned from package_search in different ways, maybe detect the limit and work with what we're given, or just revert to package_show method?
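
The paging scheme in the list above could be sketched roughly as follows. `search` stands in for something like `ckan.action.package_search`, and `rows`, `start` and `sort` are its paging parameters. The helper name, the `'id asc'` sort string and the limit detection (treating a short first page as the site's cap) are assumptions made for illustration, not existing ckanapi behaviour.

```python
def dump_all_datasets(search, requested_rows=1000):
    """Yield every dataset from a package_search-style callable.

    `search` is assumed to accept rows/start/sort keyword arguments and to
    return a dict with 'count' and 'results' keys.
    """
    start = 0
    page_size = requested_rows
    while True:
        # Ordering by id keeps pages stable while datasets are being modified.
        page = search(rows=page_size, start=start, sort='id asc')
        results = page['results']
        if not results:
            break
        for dataset in results:
            yield dataset
        # Assumption: a short first page with more results pending means the
        # site caps `rows`, so continue with the size it actually returned.
        if start == 0 and len(results) < page_size and len(results) < page['count']:
            page_size = len(results)
        start += len(results)
        if start >= page['count']:
            break
```

If the cap cannot be detected reliably, falling back to the package_show method remains the safer option.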

UnicodeEncodeError when uploading a file

When calling resource_create and passing the name argument with a unicode string containing non-ascii characters and passing an upload argument, for example:

                    resource = ckan.action.resource_create(
                        name=u"tëßt resource",
                        package_id=dataset_name,
                        url=None,
                        upload=open("test_data_file.txt"),
                    )

We get a UnicodeEncodeError:

Traceback (most recent call last):
  File "create_test_resources.py", line 130, in <module>
    main()
  File "create_test_resources.py", line 124, in main
    upload=open("test_data_file.txt"),
  File "/home/seanh/.virtualenvs/ckan/local/lib/python2.7/site-packages/ckanapi/common.py", line 50, in action
    files=files)
  File "/home/seanh/.virtualenvs/ckan/local/lib/python2.7/site-packages/ckanapi/remoteckan.py", line 81, in call_action
    status, response = self._request_fn(url, data, headers, files)
  File "/home/seanh/.virtualenvs/ckan/local/lib/python2.7/site-packages/ckanapi/remoteckan.py", line 85, in _request_fn
    r = requests.post(url, data=data, headers=headers, files=files)
  File "/home/seanh/.virtualenvs/ckan/local/lib/python2.7/site-packages/requests/api.py", line 87, in post
    return request('post', url, data=data, **kwargs)
  File "/home/seanh/.virtualenvs/ckan/local/lib/python2.7/site-packages/requests/api.py", line 44, in request
    return session.request(method=method, url=url, **kwargs)
  File "/home/seanh/.virtualenvs/ckan/local/lib/python2.7/site-packages/requests/sessions.py", line 276, in request
    prep = req.prepare()
  File "/home/seanh/.virtualenvs/ckan/local/lib/python2.7/site-packages/requests/models.py", line 224, in prepare
    p.prepare_body(self.data, self.files)
  File "/home/seanh/.virtualenvs/ckan/local/lib/python2.7/site-packages/requests/models.py", line 369, in prepare_body
    (body, content_type) = self._encode_files(files, data)
  File "/home/seanh/.virtualenvs/ckan/local/lib/python2.7/site-packages/requests/models.py", line 107, in _encode_files
    new_fields.append((field, builtin_str(val)))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 1-2: ordinal not in range(128)

Error on uploading file to datahub.io

I am having a problem uploading a resource file to datahub.io

import StringIO
import ckanapi
import os

CKAN_URL = 'http://datahub.io'
CKAN_DATASET_ID = 'dc-campaign-finance'
CKAN_API_KEY = os.environ['CKAN_API_KEY']

ckan = ckanapi.RemoteCKAN(
    CKAN_URL,
    apikey=CKAN_API_KEY,
)

ckan.call_action('resource_create', {'package_id': CKAN_DATASET_ID}, files={'upload': StringIO.StringIO('text')})
# ckan.action.resource_create(package_id=CKAN_DATASET_ID, upload=StringIO.StringIO('text')) gives me the same error

Gives me this exception.

---------------------------------------------------------------------------
CKANAPIError                              Traceback (most recent call last)
<ipython-input-3-18f64da4d932> in <module>()
     12 )
     13
---> 14 ckan.call_action('resource_create', {'package_id': CKAN_DATASET_ID}, files={'upload': StringIO.StringIO('text')})

/Users/saul/.virtualenvs/finance-scraper-pusher/lib/python2.7/site-packages/ckanapi/remoteckan.pyc in call_action(self, action, data_dict, context, apikey, files)
     80         else:
     81             status, response = self._request_fn(url, data, headers, files)
---> 82         return reverse_apicontroller_action(url, status, response)
     83
     84     def _request_fn(self, url, data, headers, files):

/Users/saul/.virtualenvs/finance-scraper-pusher/lib/python2.7/site-packages/ckanapi/common.pyc in reverse_apicontroller_action(url, status, response)
    104
    105     # don't recognize the error
--> 106     raise CKANAPIError(repr([url, status, response]))

CKANAPIError: ['http://datahub.io/api/action/resource_create', 400, u'"Bad request - JSON Error: Error decoding JSON data. Error: JSONDecodeError(\'Expecting value: line 1 column 2 (char 1)\',) JSON data extracted from the request: \'--e9074b3226cf431b92b6f52d5aa65461;\\\\r\\\\nContent-Disposition: form-data; name=\\"package_id\\"\\\\r\\\\n\\\\r\\\\ndc-campaign-finance\\\\r\\\\n--e9074b3226cf431b92b6f52d5aa65461;\\\\r\\\\nContent-Disposition: form-data; name=\\"upload\\"; filename=\\"upload\\"\\\\r\\\\nContent-type: text/plain\\\\r\\\\n\\\\r\\\\ntext\\\\r\\\\n--e9074b3226cf431b92b6f52d5aa65461;--\'"']

user_role_update api - no error but no use

Hello,

I used the user_role_update API to update a user's role to "editor"; however, the user still can't edit the dataset.
The API call with user_role_update ran successfully.

What I am trying to do is grant a user permission to edit only one dataset within the assigned organisation. Do I understand the purpose of this API correctly?

Thank you
Peddie

some problems on using ckanapi for large files ... any suggestion of better approach?

Ian,
I've been experimenting a bit over the past week with your excellent ckanapi. I'm hitting some obstacles in handling large files though, in terms of running out of memory on a 4GB ubuntu VM and I was hoping you might be able to point me in the right direction to resolve this. In order to debug, I've been using an ipython notebook using pandas and ckanapi, as well as a few other modules. My use case is to be able to push 500k records (22 fields) to a datastore table. This equates to about 140MB in csv form and >300MB in what I think is the equivalent of your jsonl format.

I abandoned trying to feed a file upload, since CKAN doesn't even attempt to ingest this file size. I also tried pointing at a URL to the file on S3, but again, datapusher doesn't even try to tackle this.

So I'm trying to use the datastore action API commands. At present, in order to prevent the kernel on the notebook from constantly crashing from being out of memory, I'm splitting a dataframe containing the circa 500k records into various pieces ... adding a single line in conjunction with a datastore_create and then the rest in 100k chunks using datastore ... and while this kind of works ... it still comes back with a 504 error (see bottom) and tries to emit back all the added records in the 'out' cell in the notebook (I'm not sure if there's a recommended way to suppress this).

dfprev1=dfprev[:1]
dfprev2=dfprev[1:100000]
dfprev3=dfprev[100000:200000]
dfprev4=dfprev[200000:300000]
dfprev5=dfprev[300000:400000]
dfprev6=dfprev[400000:]

This is some of the code I'm using to transform the dataframe object in pandas into something equivalent to jsonl. I referenced this SO thread: http://stackoverflow.com/questions/20639631/how-to-convert-pandas-dataframe-to-the-desired-json-format

output = StringIO.StringIO() # a stringio used to convert to jsonl
dfprev6.to_json(path_or_buf=output, date_format='iso', orient='records') #dfprev6 is a slice of the larger dataframe
contents = output.getvalue() # this is used to bring back the json
records_new = pd.json.loads(contents) # and then assign this to a records string
mysite.action.datastore_upsert(resource_id='0a8462d3-4c81-474a-bf84-3f2941ac67c0',
records=records_new,
force=True, primary_key=['ID_BB_GLOBAL'])

Presumably some sort of streaming method to feed the large dataframe in chunks and passing this to the ckanapi would work better, but I'm not sure the best approach. I was wondering if you might have some sample code that would achieve this.
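
One way the chunked approach might look is sketched below. `upsert` is assumed to be a callable such as `mysite.action.datastore_upsert`; the helper names and the default chunk size are hypothetical, and a size small enough to avoid the gateway timeout would have to be found by experiment.

```python
def iter_chunks(records, chunk_size):
    """Yield consecutive slices of `records`, each at most `chunk_size` long."""
    for start in range(0, len(records), chunk_size):
        # Half-open slices [start, start + chunk_size) cover every row once.
        yield records[start:start + chunk_size]


def upsert_in_chunks(upsert, resource_id, records, chunk_size=5000):
    """Send `records` through `upsert` one chunk at a time; return rows sent."""
    sent = 0
    for chunk in iter_chunks(records, chunk_size):
        # The response is deliberately discarded so a notebook does not echo it.
        upsert(resource_id=resource_id, records=chunk, force=True)
        sent += len(chunk)
    return sent
```

Because only a row count is returned, calling this from a notebook cell should not print the inserted records back. Memory use on the client still depends on how `records` itself is built.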

In your readme, you have some command-line examples of feeding a jsonl file. It's not clear to me if I can use this for datastore data ... if so, would I include a resource_id at the beginning of the file? I couldn't find a sample jsonl file in the repo to see the structure that would include ckan meta-data. I can see from jsonl.org that the format I generate and assign to records (described above) aligns. Would I still have to find some mechanism to chunk the data to overcome the memory issues?

Thanks for your input on this. Colum

This is the 504 error I get back when trying to push 100k records in increments using datastore_upsert.

CKANAPIError Traceback (most recent call last)
in ()
1 mysite.action.datastore_upsert(resource_id='0a8462d3-4c81-474a-bf84-3f2941ac67c0',
2 records=records_new,
----> 3 force=True, primary_key=['ID_BB_GLOBAL'])

/usr/local/lib/python2.7/dist-packages/ckanapi-3.3_dev-py2.7.egg/ckanapi/common.pyc in action(**kwargs)
48 data_dict=nonfiles,
49 files=files)
---> 50 return self._ckan.call_action(name, data_dict=kwargs)
51 return action
52

/usr/local/lib/python2.7/dist-packages/ckanapi-3.3_dev-py2.7.egg/ckanapi/remoteckan.pyc in call_action(self, action, data_dict, context, apikey, files)
80 else:
81 status, response = self._request_fn(url, data, headers, files)
---> 82 return reverse_apicontroller_action(url, status, response)
83
84 def _request_fn(self, url, data, headers, files):

/usr/local/lib/python2.7/dist-packages/ckanapi-3.3_dev-py2.7.egg/ckanapi/common.pyc in reverse_apicontroller_action(url, status, response)
104
105 # don't recognize the error
--> 106 raise CKANAPIError(repr([url, status, response]))

CKANAPIError: ['http://172.17.0.2/api/action/datastore_upsert', 504, u'\r\n<title>504 Gateway Time-out</title>\r\n\r\n

504 Gateway Time-out

\r\n
nginx/1.1.19\r\n\r\n\r\n']

"url" parameter is mandatory for resource_create

Hi !

In the README.md you have the following snippet of code as an example:

mysite.action.resource_create(
    package_id='my-dataset-with-files',
    upload=open('/path/to/file/to/upload.csv'))

But that call doesn't work, because it appears that the "url" parameter is mandatory.
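
A possible workaround, sketched under the assumption that the server only needs `url` to be present and replaces it once the upload is stored. The helper name and the `'dummy-value'` placeholder are hypothetical.

```python
def resource_upload_kwargs(package_id, fileobj, **extra):
    """Build resource_create keyword arguments for a file upload.

    Assumption: 'url' is required by the server but its value is not used
    when 'upload' is supplied, so an arbitrary placeholder is sent.
    """
    kwargs = {
        'package_id': package_id,
        'url': 'dummy-value',  # placeholder only; not a real location
        'upload': fileobj,
    }
    kwargs.update(extra)
    return kwargs
```

It would be used as `mysite.action.resource_create(**resource_upload_kwargs('my-dataset-with-files', open('/path/to/file/to/upload.csv', 'rb')))`.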

unicode(v) doesn't work on python 3+

When iterating over data_dict if there is a file to upload, there is a call to unicode that breaks on Python 3.

            if isinstance(v, (int, float)):
                v = unicode(v)
            data_dict[k.encode('utf-8')] = v.encode('utf-8')

https://github.com/ckan/ckanapi/blob/master/ckanapi/common.py#L80-L82

adding

from builtins import str as text

and changing to

            if isinstance(v, (int, float)):
                v = text(v)
            data_dict[k.encode('utf-8')] = v.encode('utf-8')

Seems to resolve the issue. PR forthcoming.
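
A variant that avoids the extra `builtins` import is sketched below. The name `text_type` and the standalone `encode_fields` function are illustrative only and are not the actual code in `common.py`.

```python
try:
    text_type = unicode  # Python 2
except NameError:
    text_type = str  # Python 3, where `unicode` no longer exists


def encode_fields(data_dict):
    """Return a copy of `data_dict` with keys and values encoded as UTF-8."""
    encoded = {}
    for k, v in data_dict.items():
        if isinstance(v, (int, float)):
            # Numbers have no .encode(), so turn them into text first.
            v = text_type(v)
        encoded[k.encode('utf-8')] = v.encode('utf-8')
    return encoded
```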

datastore_search call fails in LocalCKAN mode

try:
local = ckanapi.LocalCKAN()
local.action.datastore_search(resource_id = '...')

same call works when RemoteCKAN() is used with a valid api key

This seems to happen when the call is executed as part of a Paste command. (example: https://github.com/deniszgonjanin/ckanext-recombinant/blob/master/ckanext/recombinant/commands.py#L31)

When running this code inside the paster shell with the full CKAN environment loaded, the same command works.

Here is a stack trace:
File "/source/virtualenv/statscan2/lib/python2.6/site-packages/ckanapi/localckan.py", line 44, in call_action
return self._get_action(action)(dict(context), dict(data_dict))
File "/source/virtualenv/statscan2/src/ckan/ckan/logic/__init__.py", line 329, in wrapped
return _action(context, data_dict, **kw)
File "/source/virtualenv/statscan2/src/ckan/ckan/logic/__init__.py", line 386, in wrapper
return action(context, data_dict)
File "/source/virtualenv/statscan2/src/ckan/ckanext/datastore/logic/action.py", line 277, in datastore_search
result = db.search(context, data_dict)
File "/source/virtualenv/statscan2/src/ckan/ckanext/datastore/db.py", line 1124, in search
return search_data(context, data_dict)
File "/source/virtualenv/statscan2/src/ckan/ckanext/datastore/db.py", line 930, in search_data
_insert_links(data_dict, limit, offset)
File "/source/virtualenv/statscan2/src/ckan/ckanext/datastore/db.py", line 837, in _insert_links
if not toolkit.request.environ:
File "/source/virtualenv/statscan2/lib/python2.6/site-packages/paste/registry.py", line 137, in __getattr__
return getattr(self._current_obj(), attr)
File "/source/virtualenv/statscan2/lib/python2.6/site-packages/paste/registry.py", line 197, in _current_obj
'thread' % self.__name)
TypeError: No object (name: request) has been registered for this thread

Any ideas?

load datasets: unexpected field 'id', for ckan < 2.3

If you do a wholesale ckanapi dump datasets --all -O data.jl and then ckanapi load datasets -I data.jl, the command will fail on a base CKAN instance because CKAN does not allow you to set the ID of a new dataset.

Curiously, this works just fine for resources, and groups/organizations.
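
Until that is handled, one stopgap might be to filter the dump before loading it. This is a sketch only: the function name is hypothetical, and dropping `id` means the target site assigns fresh ids, so anything that refers to datasets by id would no longer match.

```python
import json


def strip_dataset_ids(lines):
    """Yield JSON lines with the top-level dataset 'id' removed."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the dump
        dataset = json.loads(line)
        # Only the dataset's own id is dropped; nested resources are untouched.
        dataset.pop('id', None)
        yield json.dumps(dataset, sort_keys=True)
```

The filtered lines could then be written to a new file and passed to `ckanapi load datasets -I`.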

Ability to handle URL to an online file location for resource_create

Hi. I haven't been able to find detailed documentation beyond the README and some other references on the ckan-dev forum. I was playing around with this syntax, trying to get the 'open' to work against a URL to an online file. Is this possible? I see there is another StringIO method too. In what cases is that used?

mysite.call_action('resource_create',
{'package_id': 'my-dataset-with-files'},
files={'upload': open('/path/to/file/to/upload.csv')})

I was also wondering what format does the '/path/to/file' have to be in for windows. I tried various approaches .. 'C:\Users\Downloads\file.txt.zip' and 'C://Users/Downloads/file.txt.zip', but none seemed to work. Any ideas?
Thanks for your work on this API.
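
On the Windows path question, a likely culprit is backslash escaping in the string literal rather than ckanapi itself. The sketch below reuses the path from the question and compares the two safe spellings; `open_for_upload` is a hypothetical helper.

```python
from pathlib import PureWindowsPath

# A raw string keeps every backslash literally.
raw_path = r'C:\Users\Downloads\file.txt.zip'

# Forward slashes are an alternative spelling of the same location.
forward_path = 'C:/Users/Downloads/file.txt.zip'

# Both spellings name the same file.
same_file = PureWindowsPath(raw_path) == PureWindowsPath(forward_path)

# Without the r prefix, '\f' in '...\file.txt.zip' is a form-feed character.
form_feed_escape = '\f' == '\x0c'


def open_for_upload(path):
    """Open in binary mode so archives such as .zip are read unmodified."""
    return open(path, 'rb')
```

The result of `open_for_upload(raw_path)` could then be passed as the `upload` value in the call above.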

Pagination

I'm trying to figure out how to page through results.

Create the object:

dg = ckanapi.RemoteCKAN('http://catalog.data.gov', apikey=_API_KEY)

Execute query:

r = dg.action.package_search(q='fish')

This gives me the first ten results. However, there doesn't appear to be a way to get the next ten results. I get the exact same results whether I call the last command, or:

dg.action.package_search(q='fish', offset=20)

or

dg.action.package_search(q='fish', XYZ=20)

They all return successfully.
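
package_search pages with `start` (a zero-based offset) and `rows` (the page size) rather than `offset`. Unrecognised parameters such as `offset` or `XYZ` appear to be ignored, which would explain the identical results. A small helper for turning a page number into those parameters might look like this; the helper name is hypothetical.

```python
def page_params(page, rows=10):
    """Return package_search keyword arguments for a 1-based page number."""
    if page < 1:
        raise ValueError('page numbers start at 1')
    return {'rows': rows, 'start': (page - 1) * rows}
```

For example, `dg.action.package_search(q='fish', **page_params(3))` would request results 21 to 30, and the `count` key of the response gives the total number of matches.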

Unspecified dependencies

Installing via pip on Python2, I ran into the following missing dependencies:

  • formencode
  • vdm
  • sqlalchemy (and when installed, failed because the version was above 0.8)

I did not have any problems after a pip install in Python3.

Best way to copy datasets to another instance?

Hello, I'm trying to copy all the dataset metadata from opendataphilly.org to a local CKAN instance I have running on a digitalocean droplet. I'm using the command from the README:

$ ckanapi dump datasets --all -q -r https://opendataphilly.org | ckanapi load datasets -c $CKAN_INI

And I'm getting the error create ValidationError {"owner_org":["Organization does not exist"]} repeatedly. I've tried creating the organization, but I assume that because owner_org references the organization's ID instead of its slug/name, and the one I just created has a brand new unique ID, it's still not working.

I've tried doing a dump & load of the organizations but I get a stack trace of a python error:

Traceback (most recent call last):
  File "/usr/lib/ckan/default/bin/ckanapi", line 9, in <module>
    load_entry_point('ckanapi==3.4', 'console_scripts', 'ckanapi')()
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/ckanapi/cli/main.py", line 70, in main
    return _switch_to_paster(arguments)
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/ckanapi/cli/main.py", line 112, in _switch_to_paster
    sys.exit(load_entry_point('PasteScript', 'console_scripts', 'paster')())
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/paste/script/command.py", line 104, in run
    invoke(command, command_name, options, args[1:])
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/paste/script/command.py", line 143, in invoke
    exit_code = runner.run(args)
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/paste/script/command.py", line 238, in run
    result = self.command()
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/ckanapi/cli/paster.py", line 29, in command
    return main.main(running_with_paster=True)
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/ckanapi/cli/main.py", line 98, in main
    return load_things(ckan, thing[0], arguments)
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/ckanapi/cli/load.py", line 41, in load_things
    return load_things_worker(ckan, thing, arguments)
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/ckanapi/cli/load.py", line 184, in load_things_worker
    r = ckan.call_action(thing_create, obj)
  File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/ckanapi/localckan.py", line 50, in call_action
    return self._get_action(action)(dict(context), dict(data_dict))
  File "/usr/lib/ckan/default/src/ckan/ckan/logic/__init__.py", line 424, in wrapped
    result = _action(context, data_dict, **kw)
  File "/usr/lib/ckan/default/src/ckan/ckan/logic/action/create.py", line 857, in group_create
    return _group_or_org_create(context, data_dict)
  File "/usr/lib/ckan/default/src/ckan/ckan/logic/action/create.py", line 723, in _group_or_org_create
    group = model_save.group_dict_save(data, context)
  File "/usr/lib/ckan/default/src/ckan/ckan/lib/dictization/model_save.py", line 389, in group_dict_save
    group = d.table_dict_save(group_dict, Group, context)
  File "/usr/lib/ckan/default/src/ckan/ckan/lib/dictization/__init__.py", line 139, in table_dict_save
    setattr(obj, key, value)
AttributeError: can't set attribute

Any idea the best way to copy these datasets over?

Also note that it doesn't have to be perfect as this is just for testing a script before using it on the production version of opendataphilly.
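
If the mismatch really is between organization ids, one possible approach is to rewrite `owner_org` in each dumped dataset before loading it. The sketch below assumes two lookup tables built beforehand (source org id to name, and name to target org id); the function and both tables are hypothetical, not part of ckanapi.

```python
def remap_owner_org(dataset, source_org_names, target_org_ids):
    """Return a copy of `dataset` whose owner_org points at the target site.

    source_org_names: dict mapping source organization id -> name
    target_org_ids:   dict mapping organization name -> target organization id
    """
    remapped = dict(dataset)
    name = source_org_names.get(dataset.get('owner_org'))
    if name in target_org_ids:
        remapped['owner_org'] = target_org_ids[name]
    else:
        # Assumption: an unowned dataset is acceptable for a test copy.
        remapped.pop('owner_org', None)
    return remapped
```

Whether dropping `owner_org` is allowed depends on the target site's configuration, so for some sites every organization would need to exist first.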

Command to migrate data

The migrate data command would act on all resources of a dataset, and if those resources are referencing an external file, the command would download that file and store them in CKAN instead.

Use cases:

  • When you want your data to live inside CKAN, for whatever purpose
  • When you are migrating metadata from one CKAN to another. Right now only the metadata is migrated, and you need to migrate the data via other methods

SSL error when dumping

There seems to be an error when dumping CKAN catalogues that use SSL:

/usr/local/lib/python2.7/dist-packages/requests-2.7.0-py2.7.egg/requests/packages/urllib3/util/ssl_.py:90: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
  InsecurePlatformWarning
Traceback (most recent call last):
  File "/usr/local/bin/ckanapi", line 9, in <module>
    load_entry_point('ckanapi==3.6-dev', 'console_scripts', 'ckanapi')()
  File "/usr/local/lib/python2.7/dist-packages/ckanapi-3.6_dev-py2.7.egg/ckanapi/cli/main.py", line 105, in main
    return dump_things(ckan, thing[0], arguments)
  File "/usr/local/lib/python2.7/dist-packages/ckanapi-3.6_dev-py2.7.egg/ckanapi/cli/dump.py", line 58, in dump_things
    names = ckan.call_action(get_thing_list, {})
  File "/usr/local/lib/python2.7/dist-packages/ckanapi-3.6_dev-py2.7.egg/ckanapi/remoteckan.py", line 80, in call_action
    status, response = self._request_fn_get(url, data_dict, headers, requests_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/ckanapi-3.6_dev-py2.7.egg/ckanapi/remoteckan.py", line 90, in _request_fn_get
    r = requests.get(url, params=data_dict, headers=headers, **requests_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests-2.7.0-py2.7.egg/requests/api.py", line 69, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests-2.7.0-py2.7.egg/requests/api.py", line 50, in request
    response = session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests-2.7.0-py2.7.egg/requests/sessions.py", line 465, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests-2.7.0-py2.7.egg/requests/sessions.py", line 573, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests-2.7.0-py2.7.egg/requests/adapters.py", line 431, in send
    raise SSLError(e, request=request)
requests.exceptions.SSLError: [Errno 1] _ssl.c:504: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed

add support for file uploads to ckans < 2.2

Example client code:
https://github.com/OCHA-DAP/dap-scrapers/blob/master/ckan/endtoend.py

"""
curl $CKAN_INSTANCE/api/storage/auth/form/$DIRECTORY/$FILENAME -H Authorization:$CKAN_APIKEY > phase1
curl $CKAN_INSTANCE/storage/upload_handle -H Authorization:$CKAN_APIKEY --form file=@$FILENAME --form "key=$DIRECTORY/$FILENAME" > phase2
curl http://ckan.megginson.com/api/3/action/resource_create --data '{"package_id":"51b25ca0-9c2e-4e66-85e3-37a13c19a85d", "url":"'$CKAN_INSTANCE'/storage/f/'$DIRECTORY'/'$FILENAME'"}' -H Authorization:$CKAN_APIKEY > phase3
"""

import sys
import os
import requests
import datetime
import lxml.html
import json

def do(f, args=[], kwargs={}):
    while True:
        try:
            x=f(*args, **kwargs)
            return x
        except Exception, e:
            print "EXCEPTION: ",e
            pass

def raise_for_status(response):
    try:
        response.raise_for_status()
    except:
        print "FAILED RFS: %r" % response.content
        raise

def get_parameters(filepath=None):
    params = {}
    params['CKAN_INSTANCE'] = os.getenv("CKAN_INSTANCE")
    params['CKAN_APIKEY'] = os.getenv("CKAN_APIKEY")
    if not params['CKAN_INSTANCE'] or not params['CKAN_APIKEY']:
        raise RuntimeError("Environment variables CKAN_INSTANCE / CKAN_APIKEY not set.")

    if filepath is None:
        if len(sys.argv) != 2:
            raise RuntimeError("Takes one argument: filename")
        else:
            filepath = sys.argv[1]
    params['FILEPATH'] = filepath
    params['FILENAME'] = os.path.basename(params['FILEPATH'])

    params['NOW'] = datetime.datetime.now().isoformat()
    params['DIRECTORY'] = params['NOW'].replace(":", "").replace("-", "")
    return params

def request_permission():  # phase1
    response = requests.get("{CKAN_INSTANCE}/api/storage/auth/form/{DIRECTORY}/{FILENAME}".format(**params), headers=headers)
    response.raise_for_status()
    j = response.json()
    assert "action" in j
    assert "fields" in j
    print j
    return j

def upload_file(permission):  # phase 2
    response = requests.post("{CKAN_INSTANCE}{action}".format(action=permission['action'], **params),
                             headers=headers,
                             files={'file': (params['FILENAME'], open(params['FILEPATH']))},
                             data={permission['fields'][0]['name']: permission['fields'][0]['value']}
                             )
    response.raise_for_status()
    root = lxml.html.fromstring(response.content)
    h1, = root.xpath("//h1/text()")
    assert " Successful" in h1
    url, = root.xpath("//h1/following::a[1]/@href")
    assert params['FILENAME'] in url  # might be issues with URLencoding
    return url

def create_resource(url, **kwargs):  # phase 3
    data = {"url": url,
            "package_id": "51b25ca0-9c2e-4e66-85e3-37a13c19a85d"}
    newheader = dict(headers)
    newheader['Content-Type'] = "application/x-www-form-urlencoded"  # http://trac.ckan.org/ticket/2942
    data.update(kwargs)
    print data
    response = requests.post("{CKAN_INSTANCE}/api/3/action/resource_create".format(**params),
    # response = requests.post("http://httpbin.org/post".format(**params),
                             headers=newheader,
                             data=json.dumps(data)
                             )
    #response.raise_for_status()
    try:
        assert response.json()["success"]
    except:
        print "FAILED: %r" % response.content
        raise
    print response.content


def upload(resource_info=None, filename=None):
    global params
    global headers
    params = get_parameters(filename)
    headers = {"Authorization": params['CKAN_APIKEY']}
    if resource_info is None:
        print "No resource_info specified, using defaults"
        resource_info = {
            "package_id": "51b25ca0-9c2e-4e66-85e3-37a13c19a85d",
            "revision_id": params['NOW'],
            "description": "Indicators scraped from a variety of sources by ScraperWiki",
            "format": "Zipped CSV",
            # "hash": None,
            "name": "scraped.csv.zip",
            # "resource_type": None,
            "mimetype": "application/zip",
            "mimetype_inner": "text/csv",
            # "webstore_url": None,
            # "cache_url": None,
            # "size": None,
            "created": params['NOW'],
            "last_modified": params['NOW'],
            # "cache_last_updated": None,
            # "webstore_last_updated": None,
            }
    j = do(request_permission)
    url = do(upload_file, [j])
    do(create_resource, [url], resource_info)

if __name__=="__main__": upload()
