ckan / ckanapi
A command line interface and Python module for accessing the CKAN Action API
License: Other
Traceback (most recent call last):
File "3500_unclassified_metadata.py", line 8, in <module>
ckan = ckanapi.LocalCKAN()
File "/home/vagrant/projects/ckan/ckanapi/ckanapi/localckan.py", line
19, in __init__
username = self.get_site_username()
File "/home/vagrant/projects/ckan/ckanapi/ckanapi/localckan.py", line
25, in get_site_username
user = self._get_action('get_site_user')({'ignore_auth': True}, ())
File "/home/vagrant/projects/ckan/ckan/ckan/logic/__init__.py", line
419, in wrapped
result = _action(context, data_dict, **kw)
File "/home/vagrant/projects/ckan/ckan/ckan/logic/action/get.py", lin
e 1962, in get_site_user
user = model.User.get(site_id)
File "/home/vagrant/projects/ckan/ckan/ckan/model/user.py", line 61,
in get
return query.first()
File "/home/vagrant/.virtualenvs/ckan/local/lib/python2.7/site-packag
es/sqlalchemy/orm/query.py", line 2156, in first
ret = list(self[0:1])
File "/home/vagrant/.virtualenvs/ckan/local/lib/python2.7/site-packag
es/sqlalchemy/orm/query.py", line 2023, in __getitem__
return list(res)
File "/home/vagrant/.virtualenvs/ckan/local/lib/python2.7/site-packag
es/sqlalchemy/orm/query.py", line 2227, in __iter__
return self._execute_and_instances(context)
File "/home/vagrant/.virtualenvs/ckan/local/lib/python2.7/site-packag
es/sqlalchemy/orm/query.py", line 2240, in _execute_and_instances
close_with_result=True)
File "/home/vagrant/.virtualenvs/ckan/local/lib/python2.7/site-packag
es/sqlalchemy/orm/query.py", line 2231, in _connection_from_session
**kw)
File "/home/vagrant/.virtualenvs/ckan/local/lib/python2.7/site-packag
es/sqlalchemy/orm/session.py", line 727, in connection
bind = self.get_bind(mapper, clause=clause, **kw)
File "/home/vagrant/.virtualenvs/ckan/local/lib/python2.7/site-packag
es/sqlalchemy/orm/session.py", line 975, in get_bind
', '.join(context)))
sqlalchemy.exc.UnboundExecutionError: Could not locate a bind configure
d on mapper Mapper|User|user, SQL expression or this Session
Same traceback on CKAN 2.2, 2.2.1 and master.
This is probably CKAN's fault, not ckanapi's.
So how do you pass your authentication key with ckanapi? It seems like you just need to add it to the header, as specified here: http://docs.ckan.org/en/latest/api/index.html#authentication-and-api-keys
Reproducing:
ckan1 = ckanapi.RemoteCKAN('http://ckan1.org', apikey=...)
ckan2 = ckanapi.RemoteCKAN('http://ckan2.org', apikey=...)
package = ckan1.action.package_show(id='package1')
ckan2.action.package_create(**package)
Expected result should be that, either:
a. The dataset will be migrated, along with its resource files
b. The dataset metadata will be migrated, with the resource file links still pointing to CKAN1
Instead, uploaded resource files are not migrated but their url bases are changed to http://ckan2.org instead of http://ckan1.org
This happens when resource['url_type'] is set to upload. The bug can be fixed by migrating the dataset like so:
for res in package['resources']:
    res.pop('url_type', None)
ckan2.action.package_create(**package)
I am not sure whether this is a CKAN issue, or a ckanapi issue, so I'm filing it here. The relevant code in CKAN is here.
Another thing to note, which does seem to be a CKAN bug, is that the resource is saved in the database with its proper url, pointing to http://ckan1.org. It is only indexed in Solr with the wrong url.
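A minimal sketch of the workaround described in this report, assuming the package dict comes from package_show. The helper name without_url_type is hypothetical and not part of ckanapi; it works on a copy so the source dict stays untouched, and it tolerates resources that have no url_type key.

```python
import copy


def without_url_type(package):
    """Return a deep copy of `package` whose resources carry no 'url_type' key."""
    cleaned = copy.deepcopy(package)
    for res in cleaned.get('resources', []):
        # pop with a default so resources lacking the key do not raise KeyError
        res.pop('url_type', None)
    return cleaned


# Usage against two RemoteCKAN instances (not executed here):
#   package = ckan1.action.package_show(id='package1')
#   ckan2.action.package_create(**without_url_type(package))
```

With url_type removed, the resource links are expected to keep pointing at the source site, which matches option b above.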
I'm getting this error while trying to download the datasets from a CKAN instance:
Traceback (most recent call last):
File "c:\Anaconda\Scripts\ckanapi-script.py", line 9, in
load_entry_point('ckanapi==3.3', 'console_scripts', 'ckanapi')()
File "c:\Anaconda\lib\site-packages\ckanapi\cli\main.py", line 95, in main
return dump_things(ckan, thing[0], arguments)
File "c:\Anaconda\lib\site-packages\ckanapi\cli\dump.py", line 67, in dump_things
for job_ids, finished, result in pool:
File "c:\Anaconda\lib\site-packages\ckanapi\cli\workers.py", line 93, in worker_poo
readable, _, _ = select.select(worker_fds, [], [])
select.error: (10038, 'An operation was attempted on something that is not a socket')
I am trying to download all metadata from catalog.data.gov. The following command returns an API error.
ckanapi dump datasets --all -O catalog.data.gov.jsonl -p 1 -r http://catalog.data.gov
The error:
Traceback (most recent call last):
File "/home/ekzhu/anaconda/bin/ckanapi", line 11, in <module>
sys.exit(main())
File "/home/ekzhu/anaconda/lib/python2.7/site-packages/ckanapi/cli/main.py", line 128, in main
return dump_things(ckan, thing[0], arguments)
File "/home/ekzhu/anaconda/lib/python2.7/site-packages/ckanapi/cli/dump.py", line 60, in dump_things
names = ckan.call_action(get_thing_list, {})
File "/home/ekzhu/anaconda/lib/python2.7/site-packages/ckanapi/remoteckan.py", line 83, in call_action
return reverse_apicontroller_action(url, status, response)
File "/home/ekzhu/anaconda/lib/python2.7/site-packages/ckanapi/common.py", line 131, in reverse_apicontroller_action
raise CKANAPIError(repr([url, status, response]))
ckanapi.errors.CKANAPIError: ['http://catalog.data.gov/api/action/package_list', 302, u'<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">\n<html><head>\n<title>302 Found</title>\n</head><body>\n<h1>Found</h1>\n<p>The document has moved <a href="http://catalog.data.gov/api/action/package_search">here</a>.</p>\n</body></html>\n']
It looks like catalog.data.gov has redirected package_list to package_search. Is there a way to handle this? Am I missing something?
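One possible approach, sketched under assumptions: if a site only serves package_search, the full list of datasets can be collected by paging through it with the rows and start parameters. The function name iter_datasets and the page size are hypothetical, and this does not change how the ckanapi CLI handles the 302 response; it only illustrates the paging logic from the Python module.

```python
def iter_datasets(ckan, rows=100):
    """Yield every dataset dict by paging through package_search.

    `ckan` is expected to behave like ckanapi.RemoteCKAN: calling
    ckan.action.package_search(...) returns a dict with a 'results' list.
    """
    start = 0
    while True:
        page = ckan.action.package_search(rows=rows, start=start)
        results = page['results']
        if not results:
            # an empty page means we have walked past the last dataset
            break
        for dataset in results:
            yield dataset
        start += len(results)


# Usage (not executed here; the site may cap `rows` at a lower value):
#   import ckanapi
#   ckan = ckanapi.RemoteCKAN('http://catalog.data.gov')
#   for dataset in iter_datasets(ckan):
#       print(dataset['name'])
```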
import StringIO
import ckanapi
import json
import os
CKAN_URL = 'http://www.opendatadc.org'
CKAN_API_KEY = os.environ['CKAN_API_KEY']
ckan = ckanapi.RemoteCKAN(
    CKAN_URL,
    apikey=CKAN_API_KEY,
)
payload = {'description': 'Scraped from http://geospatial.dcgis.dc.gov/ocf/getData.aspx',
           'format': 'json',
           'id': u'808ebae5-d23c-451f-970e-a657b3d3d540',
           'mimetype': 'application/json',
           'name': 'Committees Running Per Year',
           'package_id': 'campaign-finance',
           'upload': StringIO.StringIO('["dfd", "dfd"]'),
           'url': 'http://www.opendatadc.orgdatset/campaign-finance/resource/808ebae5-d23c-451f-970e-a657b3d3d540'}
ckan.action.resource_update(**payload)
This gives me a weird exception:
CKANAPIError Traceback (most recent call last)
<ipython-input-4-0261da11b7f6> in <module>()
19 'url': 'http://www.opendatadc.orgdatset/campaign-finance/resource/808ebae5-d23c-451f-970e-a657b3d3d540'}
20
---> 21 ckan.action.resource_update(**payload)
/Users/saul/.virtualenvs/tempenv-28c0218950914/lib/python2.7/site-packages/ckanapi/common.pyc in action(**kwargs)
47 return self._ckan.call_action(name,
48 data_dict=nonfiles,
---> 49 files=files)
50 return self._ckan.call_action(name, data_dict=kwargs)
51 return action
/Users/saul/.virtualenvs/tempenv-28c0218950914/lib/python2.7/site-packages/ckanapi/remoteckan.pyc in call_action(self, action, data_dict, context, apikey, files)
80 else:
81 status, response = self._request_fn(url, data, headers, files)
---> 82 return reverse_apicontroller_action(url, status, response)
83
84 def _request_fn(self, url, data, headers, files):
/Users/saul/.virtualenvs/tempenv-28c0218950914/lib/python2.7/site-packages/ckanapi/common.pyc in reverse_apicontroller_action(url, status, response)
104
105 # don't recognize the error
--> 106 raise CKANAPIError(repr([url, status, response]))
CKANAPIError: ['http://www.opendatadc.org/api/action/resource_update', 400, u'"Bad request - JSON Error: Error decoding JSON data. Error: JSONDecodeError(\'Expecting value: line 1 column 2 (char 1)\',) JSON data extracted from the request: \'--11da1b66348f4c54b15ed923a219bd17;\\\\r\\\\nContent-Disposition: form-data; name=\\"mimetype\\"\\\\r\\\\n\\\\r\\\\napplication/json\\\\r\\\\n--11da1b66348f4c54b15ed923a219bd17;\\\\r\\\\nContent-Disposition: form-data; name=\\"description\\"\\\\r\\\\n\\\\r\\\\nScraped from http://geospatial.dcgis.dc.gov/ocf/getData.aspx\\\\r\\\\n--11da1b66348f4c54b15ed923a219bd17;\\\\r\\\\nContent-Disposition: form-data; name=\\"format\\"\\\\r\\\\n\\\\r\\\\njson\\\\r\\\\n--11da1b66348f4c54b15ed923a219bd17;\\\\r\\\\nContent-Disposition: form-data; name=\\"url\\"\\\\r\\\\n\\\\r\\\\nhttp://www.opendatadc.orgdatset/campaign-finance/resource/808ebae5-d23c-451f-970e-a657b3d3d540\\\\r\\\\n--11da1b66348f4c54b15ed923a219bd17;\\\\r\\\\nContent-Disposition: form-data; name=\\"package_id\\"\\\\r\\\\n\\\\r\\\\ncampaign-finance\\\\r\\\\n--11da1b66348f4c54b15ed923a219bd17;\\\\r\\\\nContent-Disposition: form-data; name=\\"id\\"\\\\r\\\\n\\\\r\\\\n808ebae5-d23c-451f-970e-a657b3d3d540\\\\r\\\\n--11da1b66348f4c54b15ed923a219bd17;\\\\r\\\\nContent-Disposition: form-data; name=\\"name\\"\\\\r\\\\n\\\\r\\\\nCommittees Running Per Year\\\\r\\\\n--11da1b66348f4c54b15ed923a219bd17;\\\\r\\\\nContent-Disposition: form-data; name=\\"upload\\"; filename=\\"upload\\"\\\\r\\\\nContent-type: text/plain\\\\r\\\\n\\\\r\\\\n[\\"dfd\\", \\"dfd\\"]\\\\r\\\\n--11da1b66348f4c54b15ed923a219bd17;--\'"']
I get errors when trying to upload a file into CKAN from a URL using resource_create with the code below:
from ckanapi import RemoteCKAN
ua = 'ckanapiexample/1.0 (+http://example.com/my/website)'
fileurl = "http://goog.com/go/file.csv"
mysite = RemoteCKAN('http://myckan.example.com', apikey='real-key', user_agent=ua)
mysite.action.resource_create(
    package_id='my-dataset-with-files',
    url='dummy-value',  # ignored but required by CKAN<=2.5.x
    upload=open(fileurl, 'rb'))
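A likely cause of the errors in the snippet above is that the built-in open() reads local file paths, not URLs, so it fails before any request reaches CKAN. A minimal sketch of one way around it, assuming the remote file is small enough to hold in memory: download the bytes first, then hand CKAN an in-memory file object. The helper name url_to_upload and its fetch parameter are hypothetical.

```python
import io


def url_to_upload(fileurl, fetch):
    """Return an in-memory binary file holding the bytes found at `fileurl`.

    `fetch` is any callable that maps a URL to bytes, for example
    lambda u: requests.get(u).content
    """
    return io.BytesIO(fetch(fileurl))


# Usage (not executed here):
#   import requests
#   mysite.action.resource_create(
#       package_id='my-dataset-with-files',
#       url='dummy-value',
#       upload=url_to_upload(fileurl, lambda u: requests.get(u).content))
#
# If the goal is only to link to the remote file rather than store a copy,
# passing url=fileurl and leaving out `upload` may be enough.
```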
I'm attempting to add new resources to packages following the api instructions but I'm getting:
ckanapi.errors.CKANAPIError: ['http://xxxx/api/action/resource_create', 400, u'"Bad request - JSON Error: No request body data"']
Here's my code.
import ckanapi
mysite = ckanapi.RemoteCKAN('http://xxx',
    apikey=api_key)
mysite.action.resource_create(
    package_id='XX-XXX',
    upload=open('/Users/s/stuff/stuff.csv'))
Any ideas?
Things like ckanapi action calls, ckanapi dump or ckanapi load with a local CKAN instance all result in tracebacks, but aren't ckanapi bugs. We should catch these and display them better, perhaps with an attractive shade of red text.
Latest version of CKAN and Pylons is installed in my virtualenv, yet I get:
import ckanapi
File "/Users/peder/source/ckanapi/ckanapi.py", line 64, in <module>
from ckan.logic import (ParameterError, NotAuthorized, NotFound,
File "/Users/peder/Envs/ckan-datatools/lib/python2.7/site-packages/ckan-2.0b-py2.7.egg/ckan/logic/__init__.py", line 8, in <module>
import ckan.lib.base as base
File "/Users/peder/Envs/ckan-datatools/lib/python2.7/site-packages/ckan-2.0b-py2.7.egg/ckan/lib/base.py", line 9, in <module>
from pylons import c, cache, config, g, request, response, session
ImportError: cannot import name c
this makes print statements and interactive debugging of code the worker calls fail
Ian, I had previously installed version 2.0 just with 'python setup.py install'. I think you've introduced a Travis build, with which I have little experience. Do I need to have Travis CI installed to run the yml so that the requirements.txt gets installed properly? Sorry, I'm a bit green around these details, so if you were able to spell it out in the readme, that would be great!
Ref your thread on the ckan-dev forum (https://lists.okfn.org/pipermail/ckan-dev/2014-February/006774.html), it seems like the functionality has expanded a lot, which is great. I just need to figure things out for myself from the few examples you gave. In terms of the bulk load from datasets.jsonl.gz, is there any way you might be able to add a sample file to the repo so I can see what this format looks like? Is there a dump feature in the CLI that outputs datasets in this format?
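As a rough illustration of the format asked about here: a .jsonl.gz file is gzip-compressed JSON Lines, with one JSON object per line. The sketch below writes and reads such a file in memory. The helper names and the two tiny sample records are invented for illustration; a real dump is expected to hold one full dataset dict per line, with many more keys.

```python
import gzip
import io
import json


def write_jsonl_gz(fileobj, datasets):
    """Write each dataset dict as one JSON line into a gzip stream."""
    with gzip.GzipFile(fileobj=fileobj, mode='wb') as gz:
        for dataset in datasets:
            gz.write(json.dumps(dataset, sort_keys=True).encode('utf-8') + b'\n')


def read_jsonl_gz(fileobj):
    """Read a gzip-compressed JSON Lines stream back into a list of dicts."""
    with gzip.GzipFile(fileobj=fileobj, mode='rb') as gz:
        return [json.loads(line.decode('utf-8')) for line in gz if line.strip()]


# Invented sample records, far smaller than real dataset dicts
sample = [{'name': 'dataset-one', 'title': 'One'},
          {'name': 'dataset-two', 'title': 'Two'}]
buf = io.BytesIO()
write_jsonl_gz(buf, sample)
buf.seek(0)
roundtrip = read_jsonl_gz(buf)
```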
I'm still not clear on whether I can use ckanapi to modify data in the datastore. Is it limited to operations on the metadata in the main CKAN metadata database? If not, would you have an example of how ckanapi can be used against a table for a resource, maybe using demo.ckan.org?
Thanks for your help with this. Colum
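On the datastore question, a hedged sketch: ckanapi forwards any action name to the CKAN Action API, so datastore actions such as datastore_search and datastore_upsert can be called the same way as package_show, provided the target site has the datastore extension enabled and the API key has write access. The helper names below are hypothetical, and datastore_upsert with method='upsert' assumes the table already defines a primary key.

```python
def upsert_rows(ckan, resource_id, records):
    """Insert or update rows in the datastore table behind `resource_id`."""
    return ckan.action.datastore_upsert(
        resource_id=resource_id,
        records=records,
        method='upsert')


def first_rows(ckan, resource_id, limit=5):
    """Return up to `limit` rows from the datastore table behind `resource_id`."""
    return ckan.action.datastore_search(
        resource_id=resource_id, limit=limit)['records']


# Usage (not executed here; the resource id and key are placeholders):
#   import ckanapi
#   ckan = ckanapi.RemoteCKAN('http://demo.ckan.org', apikey='my-key')
#   upsert_rows(ckan, 'some-resource-id', [{'id': 1, 'value': 'x'}])
#   print(first_rows(ckan, 'some-resource-id'))
```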
I installed the ckanapi package as follows
$ pip install ckanapi
Running the package_list action fails as follows. Note this is a remote ckan instance that I'm accessing as a user. I don't want to install or run a ckan server locally:
$ ckanapi action package_list http://ckanhost/api
Traceback (most recent call last):
(...)
raise DistributionNotFound(req)
pkg_resources.DistributionNotFound: PasteScript
Installing PasteScript reveals a further dependency on ckan itself:
$ pip install PasteScript
$ ckanapi action package_list http://ckanhost/api
Traceback (most recent call last):
File ".virtualenvs/geops/local/lib/python2.7/site-packages/ckanapi/cli/paster.py", line 2, in <module>
from ckan.lib.cli import CkanCommand
ImportError: No module named ckan.lib.cli
I did continue installing ckan, which however requires pulling in more and more dependencies, eventually ending up with a pylons import error:
$ <lots of pip install's>
$ ckanapi action package_list http://ckanhost/api
from pylons import g, c, request, session, response
ImportError: cannot import name g
IMHO one should not be required to install the server (AFAIK that's what the ckan package provides, if I'm not mistaken) just to use the API client.
What am I missing?
I got this error while trying to run the following command from the readme:
ckanapi action group_list -r http://dados.gov.br
Traceback (most recent call last):
File "/usr/local/bin/ckanapi", line 11, in <module>
load_entry_point('ckanapi==3.6', 'console_scripts', 'ckanapi')()
File "/usr/local/lib/python3.5/dist-packages/pkg_resources/__init__.py", line 564, in load_entry_point
return get_distribution(dist).load_entry_point(group, name)
File "/usr/local/lib/python3.5/dist-packages/pkg_resources/__init__.py", line 2621, in load_entry_point
return ep.load()
File "/usr/local/lib/python3.5/dist-packages/pkg_resources/__init__.py", line 2281, in load
return self.resolve()
File "/usr/local/lib/python3.5/dist-packages/pkg_resources/__init__.py", line 2287, in resolve
module = __import__(self.module_name, fromlist=['__name__'], level=0)
File "/usr/local/lib/python3.5/dist-packages/ckanapi/cli/main.py", line 112
except CLIError, e:
^
SyntaxError: invalid syntax
The except clause has not been made compatible with Python 3.x syntax.
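For reference, a small sketch of the syntax difference behind this SyntaxError. The form "except CLIError, e:" only parses on Python 2, while "except CLIError as e:" parses on Python 2.6+ and Python 3. The CLIError class below is a stand-in defined locally so the example is self-contained; it is not imported from ckanapi.

```python
class CLIError(Exception):
    """Stand-in exception, defined here only to keep the sketch self-contained."""


def run(command):
    """Call `command` and turn a CLIError into a readable message."""
    try:
        return command()
    except CLIError as e:  # portable spelling; "except CLIError, e:" is Python 2 only
        return 'error: %s' % e


def failing_command():
    raise CLIError('bad arguments')


message = run(failing_command)
```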
Hi,
I am trying to connect to https://datahub.io/ using the ckanapi Python module. My code is this:
import ckanapi
ckan = ckanapi.RemoteCKAN('http://datahub.io/api',
    apikey='my_api_key_here',
    user_agent='ckanapiexample/1.0 (+http://example.com/my/website)')
packages = ckan.action.package_list()
print packages
I get the following error:
Traceback (most recent call last):
File "test-ckanapi.py", line 7, in
packages = ckan.action.package_list()
File "C:\Program Files (x86)\Python27\lib\site-packages\ckanapi-3.6_dev-py2.7.egg\ckanapi\common.py", line 51, in action
return self._ckan.call_action(name, data_dict=kwargs)
File "C:\Program Files (x86)\Python27\lib\site-packages\ckanapi-3.6_dev-py2.7.egg\ckanapi\remoteckan.py", line 82, in call_action
status, response = self._request_fn(url, data, headers, files, requests_kwargs)
File "C:\Program Files (x86)\Python27\lib\site-packages\ckanapi-3.6_dev-py2.7.egg\ckanapi\remoteckan.py", line 86, in _request_fn
r = requests.post(url, data=data, headers=headers, files=files, *_requests_kwargs)
File "C:\Program Files (x86)\Python27\lib\site-packages\requests-2.2.0-py2.7.egg\requests\api.py", line 88, in post
return request('post', url, data=data, *_kwargs)
File "C:\Program Files (x86)\Python27\lib\site-packages\requests-2.2.0-py2.7.egg\requests\api.py", line 44, in request
return session.request(method=method, url=url, *_kwargs)
File "C:\Program Files (x86)\Python27\lib\site-packages\requests-2.2.0-py2.7.egg\requests\sessions.py", line 383, in request
resp = self.send(prep, *_send_kwargs)
File "C:\Program Files (x86)\Python27\lib\site-packages\requests-2.2.0-py2.7.egg\requests\sessions.py", line 506, in send
history = [resp for resp in gen] if allow_redirects else []
File "C:\Program Files (x86)\Python27\lib\site-packages\requests-2.2.0-py2.7.egg\requests\sessions.py", line 168, in resolve_redirects
allow_redirects=False,
File "C:\Program Files (x86)\Python27\lib\site-packages\requests-2.2.0-py2.7.egg\requests\sessions.py", line 486, in send
r = adapter.send(request, *kwargs)
File "C:\Program Files (x86)\Python27\lib\site-packages\requests-2.2.0-py2.7.egg\requests\adapters.py", line 389, in send
raise SSLError(e)
requests.exceptions.SSLError: hostname 'datahub.io' doesn't match either of '.ckan.io', 'ckan.io'
Am I doing something wrong?
I'm trying to migrate users between CKAN catalogues using the ckanapi CLI.
Is there a way to migrate users, seeing that ckanapi dump users does not dump their password hashes? Without those, naturally ckanapi load users will fail like this (usernames anonymised):
ckanapi dump users --all -r http://sourceckan -a my-sysadmin-api-key-source | ckanapi load users -r http://targetckan -a my-sysadmin-api-key-target
0 [1] --- None asdasdf
1 [2] 0.61s None asdfasdf
2 [3] 0.28s None asdfasdf
3 [4] 0.26s None asdfasdf
4 [5] 0.25s None asdfasdf
5 [6] 0.26s None asdasdf
6 [7] 0.25s None asdfasdf
7 [None] 0.33s None asdfasdf
1 [2] --- create ValidationError {"password":["Missing value"],"__type":"Validation Error"}
2 [3] 0.68s update ValidationError {"__type":"Validation Error","email":["Missing value"]}
3 [4] 0.73s create ValidationError {"password":["Missing value"],"__type":"Validation Error"}
4 [5] 0.68s create ValidationError {"password":["Missing value"],"__type":"Validation Error"}
5 [6] 0.67s create ValidationError {"password":["Missing value"],"__type":"Validation Error"}
6 [7] 0.71s create ValidationError {"password":["Missing value"],"__type":"Validation Error"}
7 [8] 0.70s create ValidationError {"password":["Missing value"],"__type":"Validation Error"}
8 [None] 0.68s create ValidationError {"password":["Missing value"],"__type":"Validation Error"}
This was run on a machine separate from the two (source and target) CKAN instances, using the respective sysadmin's API key.
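One possible stopgap for the missing password values, sketched under assumptions: give each dumped user record a random placeholder password before loading, so that user creation passes validation. This does not recover the original password hashes, so migrated users would still need a password reset, and it does not address the missing email value shown in the output above. The function name is hypothetical.

```python
import json
import secrets


def with_placeholder_password(line):
    """Return the JSON line for a user, adding a random password if none is set."""
    user = json.loads(line)
    if not user.get('password'):
        # random value only; the user's real password is not recoverable this way
        user['password'] = secrets.token_urlsafe(16)
    return json.dumps(user)


# Usage idea (not executed here): run every line of the users dump through
# with_placeholder_password() and feed the result to "ckanapi load users".
```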
Easy performance improvement: we just need to provide a way to close the session for long-running programs, and document it.
ckanapi usage and error below. @wardi on IRC suggested I open this issue.
Thank you in advance!
$ ckanapi dump datasets -r http://data.noaa.gov --all -O json.json
Traceback (most recent call last):
File "/usr/local/bin/ckanapi", line 9, in <module>
load_entry_point('ckanapi==3.3', 'console_scripts', 'ckanapi')()
File "/Library/Python/2.7/site-packages/ckanapi/cli/main.py", line 95, in main
return dump_things(ckan, thing[0], arguments)
File "/Library/Python/2.7/site-packages/ckanapi/cli/dump.py", line 33, in dump_things
return dump_things_worker(ckan, thing, arguments)
File "/Library/Python/2.7/site-packages/ckanapi/cli/dump.py", line 135, in dump_things_worker
'include_datasets': False})
File "/Library/Python/2.7/site-packages/ckanapi/remoteckan.py", line 82, in call_action
return reverse_apicontroller_action(url, status, response)
File "/Library/Python/2.7/site-packages/ckanapi/common.py", line 103, in reverse_apicontroller_action
raise ValidationError(err)
ckanapi.errors.ValidationError: {u'__type': u'Validation Error', u'name_or_id': u'Missing value'}
Tested environments:
ckanapi dump datasets --all -p 6 -r http://SOURCE > datasets.jsonl
This will crash with a server error and broken pipe on the harvest sources, which are specialised datasets and collide with ckanext-scheming's well-defined, but rigid idea of dataset schemas.
Workarounds:
On a side note, some old datasets, which were last edited before we switched to ckanext-scheming, still have "extra" fields, which will crash during ckanapi load. I had to manually remove the extra keys from those datasets to make ckanapi load them.
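The manual clean-up described above could be scripted along these lines. This is a sketch under assumptions: the key to remove is taken to be 'extras', which is the usual name in CKAN dataset dicts, but the post only says "extra", so the key names are a parameter. The function name is hypothetical.

```python
import json


def drop_keys(line, keys=('extras',)):
    """Return the JSON line for a dataset with the given top-level keys removed."""
    dataset = json.loads(line)
    for key in keys:
        # pop with a default so datasets without the key pass through unchanged
        dataset.pop(key, None)
    return json.dumps(dataset, sort_keys=True)


# Usage idea (not executed here): apply drop_keys() to every line of
# datasets.jsonl before passing the file to "ckanapi load datasets".
```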
I can't seem to use dataset_purge (as stated in the API), but I can delete. Am I missing something?
ckan.action.dataset_purge(id='s1a_iw_grdh_1sdv_20141205t060219_20141205t060244_003580_00439e_129b')
ckanapi.errors.CKANAPIError: ['IP:5000/api/action/dataset_purge', 400, u'"Bad request - Action name not known: dataset_purge"']
ckan.action.package_delete(id='s1a_iw_grdh_1sdv_20141205t060219_20141205t060244_003580_00439e_129b')
No errors
Documentation says:
Example: dumping datasets from CKAN into a local file with 4 processes:
$ ckanapi dump datasets --all -O datasets.jsonl.gz -z -p 4 -r http://localhost
Example: load datasets from a dataset dump file with 3 processes in parallel:
$ ckanapi load datasets -I datasets.jsonl.gz -z -p 3 -c /etc/ckan/production.ini
But unfortunately what was exported with dump datasets can't be imported to an empty CKAN database with load datasets. There are some workarounds, but I think importing data that was exported using the same tool should just work without any workarounds.
If a JSONL input file has a trailing newline an exception is raised:
File "/usr/lib/ckan/default/bin/ckanapi", line 9, in <module>
load_entry_point('ckanapi==3.6-dev', 'console_scripts', 'ckanapi')()
File "/usr/lib/ckan/default/src/ckanapi/ckanapi/cli/main.py", line 86, in main
return _switch_to_paster(arguments)
File "/usr/lib/ckan/default/src/ckanapi/ckanapi/cli/main.py", line 132, in _switch_to_paster
sys.exit(load_entry_point('PasteScript', 'console_scripts', 'paster')())
File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/paste/script/command.py", line 102, in run
invoke(command, command_name, options, args[1:])
File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/paste/script/command.py", line 141, in invoke
exit_code = runner.run(args)
File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/paste/script/command.py", line 236, in run
result = self.command()
File "/usr/lib/ckan/default/src/ckanapi/ckanapi/cli/paster.py", line 29, in command
return main.main(running_with_paster=True)
File "/usr/lib/ckan/default/src/ckanapi/ckanapi/cli/main.py", line 118, in main
return load_things(ckan, thing[0], arguments)
File "/usr/lib/ckan/default/src/ckanapi/ckanapi/cli/load.py", line 47, in load_things
return load_things_worker(ckan, thing, arguments)
File "/usr/lib/ckan/default/src/ckanapi/ckanapi/cli/load.py", line 162, in load_things_worker
obj = json.loads(line.decode('utf-8'))
File "/usr/lib/python2.7/json/__init__.py", line 338, in loads
return _default_decoder.decode(s)
File "/usr/lib/python2.7/json/decoder.py", line 366, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python2.7/json/decoder.py", line 384, in raw_decode
raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
However, the JSONL standard does allow a trailing newline.
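A minimal sketch of a tolerant reader for the case described above, assuming the failure comes from passing an empty line to json.loads. It skips blank lines instead of decoding them. The function name iter_jsonl is hypothetical and this is not the code in ckanapi's load.py.

```python
import json


def iter_jsonl(lines):
    """Yield one decoded object per non-blank line of a JSON Lines input."""
    for raw in lines:
        text = raw.decode('utf-8') if isinstance(raw, bytes) else raw
        if not text.strip():
            # blank or whitespace-only line, such as a trailing newline
            continue
        yield json.loads(text)


objects = list(iter_jsonl([b'{"name": "a"}\n', b'{"name": "b"}\n', b'\n']))
```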
I can dump OK using -r localhost:5000 but not -c ckan.ini: it just hangs after printing the first response to the console. ckanapi is waiting for output on stdout but nothing comes, because the output ends up on stderr, which is where sys.stdout is pointing:
(Pdb) print sys.stdout
<open file '<stderr>', mode 'w' at 0x7f7a69dd4270>
due to this in dump.py:
# hack so that "print debugging" can work in extension/ckan
# code called by this worker
sys.stdout = sys.stderr
/home/co/ckan/local/lib/python2.7/site-packages/sqlalchemy/engine/default.py:585: SAWarning: Unicode type received non-unicodebind param value.
processors[key](compiled_params[key])
2016-12-07 08:41:19,553 DEBUG [ckanext.spatial.model.package_extent] Spatial tables defined in memory
2016-12-07 08:41:19,561 DEBUG [ckanext.spatial.model.package_extent] Spatial tables already exist
2016-12-07 08:41:19,573 DEBUG [ckanext.harvest.model] Harvest tables defined in memory
2016-12-07 08:41:19,577 DEBUG [ckanext.harvest.model] Harvest tables already exist
/home/co/ckan/local/lib/python2.7/site-packages/sqlalchemy/engine/default.py:585: SAWarning: Unicode type received non-unicodebind param value.
processors[key](compiled_params[key])
2016-12-07 08:41:27,289 DEBUG [ckanext.spatial.model.package_extent] Spatial tables defined in memory
2016-12-07 08:41:27,293 DEBUG [ckanext.spatial.model.package_extent] Spatial tables already exist
2016-12-07 08:41:27,309 DEBUG [ckanext.harvest.model] Harvest tables defined in memory
2016-12-07 08:41:27,313 DEBUG [ckanext.harvest.model] Harvest tables already exist
["2016-12-07T08:41:27.913396",null,{"license_title":"UK Open Government Licence (OGL)","maintainer":null,"groups":[],"temporal_coverage-
...
e2a63c182527","resource_type":"file"}],"update_frequency":"","revision_id":"000ef760-c074-41f1-94ab-8ceb7e33f3ac","date_released":"25/2/2009","foi-phone":"","sla":"","theme-primary":"Health"}]
^CTraceback (most recent call last):
File "/home/co/ckan/bin/ckanapi", line 11, in <module>
load_entry_point('ckanapi', 'console_scripts', 'ckanapi')()
File "/src/ckanapi/ckanapi/cli/main.py", line 94, in main
return _switch_to_paster(arguments)
File "/src/ckanapi/ckanapi/cli/main.py", line 141, in _switch_to_paster
sys.exit(load_entry_point('PasteScript', 'console_scripts', 'paster')())
File "/home/co/ckan/local/lib/python2.7/site-packages/paste/script/command.py", line 104, in run
invoke(command, command_name, options, args[1:])
File "/home/co/ckan/local/lib/python2.7/site-packages/paste/script/command.py", line 143, in invoke
exit_code = runner.run(args)
File "/home/co/ckan/local/lib/python2.7/site-packages/paste/script/command.py", line 238, in run
result = self.command()
File "/src/ckanapi/ckanapi/cli/paster.py", line 29, in command
return main.main(running_with_paster=True)
File "/src/ckanapi/ckanapi/cli/main.py", line 128, in main
return dump_things(ckan, thing[0], arguments)
File "/src/ckanapi/ckanapi/cli/dump.py", line 39, in dump_things
return dump_things_worker(ckan, thing, arguments)
File "/src/ckanapi/ckanapi/cli/dump.py", line 164, in dump_things_worker
for line in iter(stdin.readline, b''):
KeyboardInterrupt
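One way to keep "print debugging" working without losing the worker's own output, sketched under assumptions: hold on to an explicit handle for the channel the parent process reads, and redirect sys.stdout to stderr only while the called code runs. The function name and the protocol_out parameter are hypothetical, contextlib.redirect_stdout needs Python 3.4 or later, and this is not the fix that ckanapi itself applied.

```python
import contextlib
import sys


def run_worker_step(work, protocol_out):
    """Run `work`, sending its prints to stderr and its result to `protocol_out`.

    `protocol_out` is the text stream the parent process reads from; it is
    passed in explicitly so that rebinding sys.stdout cannot affect it.
    """
    with contextlib.redirect_stdout(sys.stderr):
        # any print() inside work() lands on stderr during this block
        result = work()
    protocol_out.write(result + '\n')
    protocol_out.flush()


# Usage idea (not executed here): run_worker_step(do_one_job, sys.__stdout__)
```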
I have a script which seeds CKAN instances with groups, organizations, datasets and resources. After creating resources it downloads the resource dictionary and then uploads a file (I could not find a way to do it in one step). My script works perfectly when run against CKAN 2.4.0 but is failing when I run it against 2.5.2. Was there an API change between versions that affects ckanapi?
In the newer version it appears that the package is not being returned as a dictionary in this line:
dataset = ckan.action.package_create
When I attempt to look at dataset with pprint it returns -1.
Otherwise it fails with this error:
Traceback (most recent call last):
File "C:/Users/akoebrick/PycharmProjects/NG911/seed.py", line 160, in <module>
resourceId = dataset['resources'][0]['id']
TypeError: 'int' object has no attribute '__getitem__'
Here is my relevant code:
# Now add the packages
##############################
for package in datasets:
    packageTitle = package['title'] + " - " + countyPlain
    packageName = munge(package['title'])
    # Add county name to packageName to make it unique
    packageName = packageName + "-" + countyName
    # Add the datasets
    try:
        dataset = ckan.action.package_create(name=packageName, title=packageTitle, notes=package['notes'], groups=package['group'], resources=package['resources'], owner_org=orgId)
    except:
        e = sys.exc_info()[0]
        print "Error: %s : " % e
        exc_type, exc_value, exc_traceback = sys.exc_info()
        lines = traceback.format_exception(exc_type, exc_value, exc_traceback)
        print ''.join('!! ' + line for line in lines)  # Log it or whatever here
    # Update the resource so it is a file upload rather than a url. Does not seem to be possible in initial loop
    resourceId = dataset['resources'][0]['id']
    fileName = dataset['resources'][0]['name']
    fileName = 'data/' + fileName
    resourceUpdate = ckan.action.resource_update(id=resourceId, upload=open(fileName, "rb"))
The problem is that url is required on resource_create. I feel that's a problem with CKAN itself, so I created the issue ckan/ckan#2769.
I'm reading through the readme and the open and closed issues, but I'm not seeing what the recommended way of using this is.
I don't see this being deployed anywhere as a pip package or as a single python file I can just include in my directory.
I don't want to clone the whole project into my current project because the requirements.txt and various other things will collide, and I'm not sure if I want to clone it into a subdirectory because I don't know how importing that in my Python program would work...
I see the releases here on GitHub, but how do I use it? Do I unzip it? Where in my project folder should I put it?
Did a "pip install ckanapi", but when I tried to import it in Python 3, got an error about "requests" module. After "pip install requests", it worked fine.
Maybe a missing dep?
It seems like something really useful, to automatically sync datasets to a CKAN instance, but I can't figure out how it works. The documentation is very vague. It says:
data: Data to be synchronized. Should be a dict (or dict-like) with top level keys corresponding to the object type, mapping to dictionaries of {'id': object}.
But what is <object> supposed to look like? What dictionary should I call that method with? An example usage would really be welcome.
Hi, I was just trying ckanapi, and while I can create a dataset without problems, I get an error when uploading a file (CKAN 2.2a). This is the script that I am using (pretty much the one in the readme):
import ckanapi
mysite = ckanapi.RemoteCKAN('http://url',
    apikey='personalapikey',
    user_agent='test')
mysite.action.resource_create(
    package_id='dataset_prova',
    description='prova',
    upload=open('/Users/Nicola/Downloads/Contatti.csv'))
This is the error
Traceback (most recent call last):
File "upload.py", line 9, in <module>
upload=open('/Users/Nicola/Downloads/Contatti.csv'))
File "/Users/Nicola/anaconda/lib/python2.7/site-packages/ckanapi/common.py", line 49, in action
files=files)
File "/Users/Nicola/anaconda/lib/python2.7/site-packages/ckanapi/remoteckan.py", line 74, in call_action
return reverse_apicontroller_action(url, status, response)
File "/Users/Nicola/anaconda/lib/python2.7/site-packages/ckanapi/common.py", line 106, in reverse_apicontroller_action
raise CKANAPIError(url, status, response)
TypeError: __init__() takes at most 2 arguments (4 given)
Am I missing something?
with package_search we can dump all datasets in far fewer API calls.
issues:
When calling resource_create and passing the name argument with a unicode string containing non-ascii characters and passing an upload argument, for example:
resource = ckan.action.resource_create(
    name=u"tëßt resource",
    package_id=dataset_name,
    url=None,
    upload=open("test_data_file.txt"),
)
We get a UnicodeEncodeError:
Traceback (most recent call last):
File "create_test_resources.py", line 130, in <module>
main()
File "create_test_resources.py", line 124, in main
upload=open("test_data_file.txt"),
File "/home/seanh/.virtualenvs/ckan/local/lib/python2.7/site-packages/ckanapi/common.py", line 50, in action
files=files)
File "/home/seanh/.virtualenvs/ckan/local/lib/python2.7/site-packages/ckanapi/remoteckan.py", line 81, in call_action
status, response = self._request_fn(url, data, headers, files)
File "/home/seanh/.virtualenvs/ckan/local/lib/python2.7/site-packages/ckanapi/remoteckan.py", line 85, in _request_fn
r = requests.post(url, data=data, headers=headers, files=files)
File "/home/seanh/.virtualenvs/ckan/local/lib/python2.7/site-packages/requests/api.py", line 87, in post
return request('post', url, data=data, **kwargs)
File "/home/seanh/.virtualenvs/ckan/local/lib/python2.7/site-packages/requests/api.py", line 44, in request
return session.request(method=method, url=url, **kwargs)
File "/home/seanh/.virtualenvs/ckan/local/lib/python2.7/site-packages/requests/sessions.py", line 276, in request
prep = req.prepare()
File "/home/seanh/.virtualenvs/ckan/local/lib/python2.7/site-packages/requests/models.py", line 224, in prepare
p.prepare_body(self.data, self.files)
File "/home/seanh/.virtualenvs/ckan/local/lib/python2.7/site-packages/requests/models.py", line 369, in prepare_body
(body, content_type) = self._encode_files(files, data)
File "/home/seanh/.virtualenvs/ckan/local/lib/python2.7/site-packages/requests/models.py", line 107, in _encode_files
new_fields.append((field, builtin_str(val)))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 1-2: ordinal not in range(128)
I am having a problem uploading a resource file to datahub.io
import StringIO
import ckanapi
import os
CKAN_URL = 'http://datahub.io'
CKAN_DATASET_ID = 'dc-campaign-finance'
CKAN_API_KEY = os.environ['CKAN_API_KEY']
ckan = ckanapi.RemoteCKAN(
    CKAN_URL,
    apikey=CKAN_API_KEY,
)
ckan.call_action('resource_create', {'package_id': CKAN_DATASET_ID}, files={'upload': StringIO.StringIO('text')})
# ckan.action.resource_create(package_id=CKAN_DATASET_ID, upload=StringIO.StringIO('text')) gives me the same error
Gives me this exception.
---------------------------------------------------------------------------
CKANAPIError Traceback (most recent call last)
<ipython-input-3-18f64da4d932> in <module>()
12 )
13
---> 14 ckan.call_action('resource_create', {'package_id': CKAN_DATASET_ID}, files={'upload': StringIO.StringIO('text')})
/Users/saul/.virtualenvs/finance-scraper-pusher/lib/python2.7/site-packages/ckanapi/remoteckan.pyc in call_action(self, action, data_dict, context, apikey, files)
80 else:
81 status, response = self._request_fn(url, data, headers, files)
---> 82 return reverse_apicontroller_action(url, status, response)
83
84 def _request_fn(self, url, data, headers, files):
/Users/saul/.virtualenvs/finance-scraper-pusher/lib/python2.7/site-packages/ckanapi/common.pyc in reverse_apicontroller_action(url, status, response)
104
105 # don't recognize the error
--> 106 raise CKANAPIError(repr([url, status, response]))
CKANAPIError: ['http://datahub.io/api/action/resource_create', 400, u'"Bad request - JSON Error: Error decoding JSON data. Error: JSONDecodeError(\'Expecting value: line 1 column 2 (char 1)\',) JSON data extracted from the request: \'--e9074b3226cf431b92b6f52d5aa65461;\\\\r\\\\nContent-Disposition: form-data; name=\\"package_id\\"\\\\r\\\\n\\\\r\\\\ndc-campaign-finance\\\\r\\\\n--e9074b3226cf431b92b6f52d5aa65461;\\\\r\\\\nContent-Disposition: form-data; name=\\"upload\\"; filename=\\"upload\\"\\\\r\\\\nContent-type: text/plain\\\\r\\\\n\\\\r\\\\ntext\\\\r\\\\n--e9074b3226cf431b92b6f52d5aa65461;--\'"']
Hello,
I used the user_role_update API to update a user's role to "editor"; however, the user still can't edit the dataset.
The user_role_update call ran successfully.
What I am trying to do is grant a user permission to edit only one dataset within the assigned organisation. Do I understand the purpose of this API correctly?
Thank you
Peddie
Ian,
I've been experimenting a bit over the past week with your excellent ckanapi. I'm hitting some obstacles in handling large files though, in terms of running out of memory on a 4GB Ubuntu VM, and I was hoping you might be able to point me in the right direction to resolve this. In order to debug, I've been using an IPython notebook with pandas and ckanapi, as well as a few other modules. My use case is to be able to push 500k records (22 fields) to a datastore table. This equates to about 140MB in CSV form and >300MB in what I think is the equivalent of your jsonl format.
I abandoned trying to feed a file upload, since CKAN doesn't even attempt to ingest this file size. I also tried pointing at a URL to the file on S3, but again, datapusher doesn't even try to tackle this.
So I'm trying to use the datastore action API commands. At present, in order to prevent the kernel on the notebook from constantly crashing from being out of memory, I'm splitting a dataframe containing the circa 500k records into various pieces ... adding a single line in conjunction with a datastore_create and then the rest in 100k chunks using datastore ... and while this kind of works ... it still comes back with a 504 error (see bottom) and tries to emit back all the added records in the 'out' cell in the notebook (I'm not sure if there's a recommended way to suppress this).
# slices are half-open, so each start equals the previous stop
dfprev1=dfprev[:1]
dfprev2=dfprev[1:100000]
dfprev3=dfprev[100000:200000]
dfprev4=dfprev[200000:300000]
dfprev5=dfprev[300000:400000]
dfprev6=dfprev[400000:]
This is some of the code I'm using to transform the dataframe object in pandas into something equivalent to jsonl. I referenced this SO thread: http://stackoverflow.com/questions/20639631/how-to-convert-pandas-dataframe-to-the-desired-json-format
output = StringIO.StringIO() # a stringio used to convert to jsonl
dfprev6.to_json(path_or_buf=output, date_format='iso', orient='records') #dfprev6 is a slice of the larger dataframe
contents = output.getvalue() # this is used to bring back the json
records_new = pd.json.loads(contents) # and then assign this to a records string
mysite.action.datastore_upsert(resource_id='0a8462d3-4c81-474a-bf84-3f2941ac67c0',
records=records_new,
force=True, primary_key=['ID_BB_GLOBAL'])
Presumably some sort of streaming method to feed the large dataframe in chunks and passing this to the ckanapi would work better, but I'm not sure the best approach. I was wondering if you might have some sample code that would achieve this.
In your readme, you have some command-line examples of feeding a jsonl file. It's not clear to me if I can use this for datastore data ... if so, would I include a resource_id at the beginning of the file? I couldn't find a sample jsonl file in the repo to see the structure that would include ckan meta-data. I can see from jsonl.org that the format I generate and assign to records (described above) aligns. Would I still have to find some mechanism to chunk the data to overcome the memory issues?
Thanks for your input on this. Colum
CKANAPIError Traceback (most recent call last)
in ()
1 mysite.action.datastore_upsert(resource_id='0a8462d3-4c81-474a-bf84-3f2941ac67c0',
2 records=records_new,
----> 3 force=True, primary_key=['ID_BB_GLOBAL'])
/usr/local/lib/python2.7/dist-packages/ckanapi-3.3_dev-py2.7.egg/ckanapi/common.pyc in action(**kwargs)
48 data_dict=nonfiles,
49 files=files)
---> 50 return self._ckan.call_action(name, data_dict=kwargs)
51 return action
52
/usr/local/lib/python2.7/dist-packages/ckanapi-3.3_dev-py2.7.egg/ckanapi/remoteckan.pyc in call_action(self, action, data_dict, context, apikey, files)
80 else:
81 status, response = self._request_fn(url, data, headers, files)
---> 82 return reverse_apicontroller_action(url, status, response)
83
84 def _request_fn(self, url, data, headers, files):
/usr/local/lib/python2.7/dist-packages/ckanapi-3.3_dev-py2.7.egg/ckanapi/common.pyc in reverse_apicontroller_action(url, status, response)
104
105 # don't recognize the error
--> 106 raise CKANAPIError(repr([url, status, response]))
CKANAPIError: ['http://172.17.0.2/api/action/datastore_upsert', 504, u'\r\n<title>504 Gateway Time-out</title>\r\n\r\n
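One way to keep memory bounded for the upload described in this issue is to send the records in fixed-size batches, one datastore_upsert call per batch. The sketch below is only an illustration: `chunked`, `upsert_in_chunks`, `chunk_size` and the `ckan` argument are names chosen here, not part of ckanapi, and it assumes the table and its primary key already exist from an earlier datastore_create.

```python
from itertools import islice


def chunked(records, chunk_size):
    """Yield successive lists of at most chunk_size items from any iterable."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, chunk_size))
        if not batch:
            return
        yield batch


def upsert_in_chunks(ckan, resource_id, records, chunk_size=5000):
    """Send records to datastore_upsert one batch at a time.

    `ckan` is assumed to be a ckanapi RemoteCKAN or LocalCKAN instance.
    Returns the number of records sent; the API responses are discarded
    so they are not echoed back in a notebook cell.
    """
    sent = 0
    for batch in chunked(records, chunk_size):
        ckan.action.datastore_upsert(
            resource_id=resource_id,
            records=batch,
            method='upsert',
            force=True,
        )
        sent += len(batch)
    return sent
```

Because `records` can be any iterable, a generator that yields one dict per row avoids building the whole JSON document in memory. Smaller batches may also make a gateway timeout less likely, though a suitable size depends on the server.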
@mattleduc do you have time to do this, or would you like me to add some unit tests?
Hi !
In the README.md you have the following snippet of code as an example:
mysite.action.resource_create(
    package_id='my-dataset-with-files',
    upload=open('/path/to/file/to/upload.csv'))
But that call doesn't work, because it appears that the "url" parameter is mandatory.
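If the target CKAN version rejects resource_create without a url, a commonly used workaround is to pass a placeholder value alongside upload. The helper below is a hypothetical wrapper: its name and the placeholder string are assumptions, and whether the placeholder is needed at all depends on the CKAN version.

```python
def create_uploaded_resource(ckan, package_id, path, placeholder_url='dummy-value'):
    """Create a resource from a local file, sending a placeholder url.

    `ckan` is assumed to be a ckanapi RemoteCKAN instance. The file is
    opened in binary mode and closed once the call returns.
    """
    with open(path, 'rb') as fileobj:
        return ckan.action.resource_create(
            package_id=package_id,
            url=placeholder_url,  # assumed to be ignored when an upload is present
            upload=fileobj,
        )
```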
Allow specifying a tuple such as ('filename.xls', open('filename.xls')) for 'upload' or any other field.
Right now, when using upload: open('filename.xls'), CKAN won't receive the filename, which results in a dummy 'upload' file name, e.g. http://localhost:5000/dataset/1b469259-02e2-43fa-864a-bf018eeef059/resource/799fb449-6249-4ea6-8ddd-24550b5b8dba/download/upload
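The requests library, which ckanapi uses for remote calls (see the tracebacks earlier on this page), accepts a (filename, fileobj) tuple for each entry in files. A minimal sketch of building such an entry follows; `named_upload` is a hypothetical helper, and whether a given ckanapi version passes the tuple through unchanged is an assumption to verify.

```python
import io
import os


def named_upload(fileobj, filename=None, field='upload'):
    """Build a requests-style files mapping that carries an explicit filename."""
    if filename is None:
        # Fall back to the name of the underlying file, if it has one.
        filename = os.path.basename(getattr(fileobj, 'name', '') or field)
    return {field: (filename, fileobj)}


# In-memory stand-in for open('filename.xls', 'rb')
files = named_upload(io.BytesIO(b'col1,col2\n'), filename='filename.xls')
```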
When iterating over data_dict, if there is a file to upload, there is a call to unicode that breaks on Python 3:
if isinstance(v, (int, float)):
    v = unicode(v)
data_dict[k.encode('utf-8')] = v.encode('utf-8')
https://github.com/ckan/ckanapi/blob/master/ckanapi/common.py#L80-L82
adding
from builtins import str as text
and changing to
if isinstance(v, (int, float)):
    v = text(v)
data_dict[k.encode('utf-8')] = v.encode('utf-8')
Seems to resolve the issue. PR forthcoming.
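On Python 3 alone, the same loop can be written without the builtins import, since str is already the text type there. The function below is a sketch of that idea and not the code in ckanapi; `encode_form_fields` is a name invented for this example, and it builds a new dict rather than modifying data_dict while iterating over it.

```python
def encode_form_fields(data_dict):
    """Return a copy of data_dict with keys and values encoded as UTF-8 bytes."""
    encoded = {}
    for key, value in data_dict.items():
        if isinstance(value, (int, float)):
            value = str(value)  # numbers become their text form first
        encoded[key.encode('utf-8')] = value.encode('utf-8')
    return encoded
```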
try:
local = ckanapi.LocalCKAN()
local.action.datastore_search(resource_id = '...')
The same call works when RemoteCKAN() is used with a valid API key.
This seems to happen when the call is executed as part of a Paste command. (example: https://github.com/deniszgonjanin/ckanext-recombinant/blob/master/ckanext/recombinant/commands.py#L31)
When running this code inside the paster shell with the full CKAN environment loaded, the same command works.
Here is a stack trace:
File "/source/virtualenv/statscan2/lib/python2.6/site-packages/ckanapi/localckan.py", line 44, in call_action
return self._get_action(action)(dict(context), dict(data_dict))
File "/source/virtualenv/statscan2/src/ckan/ckan/logic/__init__.py", line 329, in wrapped
return _action(context, data_dict, **kw)
File "/source/virtualenv/statscan2/src/ckan/ckan/logic/__init__.py", line 386, in wrapper
return action(context, data_dict)
File "/source/virtualenv/statscan2/src/ckan/ckanext/datastore/logic/action.py", line 277, in datastore_search
result = db.search(context, data_dict)
File "/source/virtualenv/statscan2/src/ckan/ckanext/datastore/db.py", line 1124, in search
return search_data(context, data_dict)
File "/source/virtualenv/statscan2/src/ckan/ckanext/datastore/db.py", line 930, in search_data
_insert_links(data_dict, limit, offset)
File "/source/virtualenv/statscan2/src/ckan/ckanext/datastore/db.py", line 837, in _insert_links
if not toolkit.request.environ:
File "/source/virtualenv/statscan2/lib/python2.6/site-packages/paste/registry.py", line 137, in __getattr__
return getattr(self._current_obj(), attr)
File "/source/virtualenv/statscan2/lib/python2.6/site-packages/paste/registry.py", line 197, in _current_obj
'thread' % self.__name)
TypeError: No object (name: request) has been registered for this thread
Any ideas?
If you do a wholesale ckanapi dump datasets --all -O data.jl and then ckanapi load datasets -I data.jl, the command will fail on a base CKAN instance because CKAN does not allow you to set the ID of a new dataset.
Curiously, this works just fine for resources, and groups/organizations.
Hi. I haven't been able to find detailed documentation beyond the README and some other references on the ckan-dev forum. I was playing around with this syntax, trying to get the 'open' to work against a URL to an online file. Is this possible? I see there is another StringIO method too. In what situation is that used?
mysite.call_action('resource_create',
    {'package_id': 'my-dataset-with-files'},
    files={'upload': open('/path/to/file/to/upload.csv')})
I was also wondering what format the '/path/to/file' has to be in on Windows. I tried various approaches, such as 'C:\Users\Downloads\file.txt.zip' and 'C://Users/Downloads/file.txt.zip', but none seemed to work. Any ideas?
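On the Windows path question: in an ordinary Python string literal a backslash starts an escape sequence, so the `\f` in a path ending in `\file.txt.zip` becomes a form feed character rather than a backslash followed by `f`. Raw strings or forward slashes avoid this, and opening the file in binary mode is the safer choice for uploads. A small illustration, using a made-up relative path:

```python
from pathlib import PureWindowsPath

# '\f' is the form feed escape, so this literal does not contain "\file".
broken = 'Downloads\file.txt.zip'
raw = r'Downloads\file.txt.zip'       # raw string keeps the backslash
forward = 'Downloads/file.txt.zip'    # forward slashes are accepted on Windows

has_form_feed = '\x0c' in broken
same_path = PureWindowsPath(raw) == PureWindowsPath(forward)

# With a real file, the handle would then be opened in binary mode, e.g.
#   open(r'C:\Users\Downloads\file.txt.zip', 'rb')
```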
Thanks for your work on this API.
I'm trying to figure out how to page through results.
Create the object:
dg = ckanapi.RemoteCKAN('http://catalog.data.gov', apikey=_API_KEY)
Execute query:
r = dg.action.package_search(q='fish')
This gives me the first ten results. However, there doesn't appear to be a way to get the next ten results. I get the exact same results whether I call the last command, or:
dg.action.package_search(q='fish', offset=20)
or
dg.action.package_search(q='fish', XYZ=20)
They all return successfully.
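CKAN's package_search action pages with the start and rows parameters rather than offset, and its result includes a total count. A minimal sketch of walking through all results follows; `iter_search_results` and `page_size` are names chosen for this example, and `search` is assumed to be a callable such as dg.action.package_search.

```python
def iter_search_results(search, query, page_size=100):
    """Yield every dataset matching query, fetching page_size rows per call."""
    start = 0
    while True:
        page = search(q=query, start=start, rows=page_size)
        results = page['results']
        for dataset in results:
            yield dataset
        start += len(results)
        # Stop when the server returns nothing more or the total is reached.
        if not results or start >= page['count']:
            break
```

Usage would look like `for dataset in iter_search_results(dg.action.package_search, 'fish'): ...`; servers may cap the number of rows returned per request, which is why the loop advances by the number of results actually received.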
Installing via pip on Python2, I ran into the following missing dependencies:
I did not have any problems after a pip install in Python3.
Hello, I'm trying to copy all the dataset metadata from opendataphilly.org to a local CKAN instance I have running on a digitalocean droplet. I'm using the command from the README:
$ ckanapi dump datasets --all -q -r https://opendataphilly.org | ckanapi load datasets -c $CKAN_INI
And I'm getting the error create ValidationError {"owner_org":["Organization does not exist"]} repeatedly. I've tried creating the organization, but I assume that because owner_org references the organization's ID instead of its slug/name, and the one I just created has a brand new unique ID, it's still not working.
I've tried doing a dump & load of the organizations but I get a stack trace of a python error:
Traceback (most recent call last):
File "/usr/lib/ckan/default/bin/ckanapi", line 9, in <module>
load_entry_point('ckanapi==3.4', 'console_scripts', 'ckanapi')()
File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/ckanapi/cli/main.py", line 70, in main
return _switch_to_paster(arguments)
File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/ckanapi/cli/main.py", line 112, in _switch_to_paster
sys.exit(load_entry_point('PasteScript', 'console_scripts', 'paster')())
File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/paste/script/command.py", line 104, in run
invoke(command, command_name, options, args[1:])
File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/paste/script/command.py", line 143, in invoke
exit_code = runner.run(args)
File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/paste/script/command.py", line 238, in run
result = self.command()
File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/ckanapi/cli/paster.py", line 29, in command
return main.main(running_with_paster=True)
File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/ckanapi/cli/main.py", line 98, in main
return load_things(ckan, thing[0], arguments)
File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/ckanapi/cli/load.py", line 41, in load_things
return load_things_worker(ckan, thing, arguments)
File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/ckanapi/cli/load.py", line 184, in load_things_worker
r = ckan.call_action(thing_create, obj)
File "/usr/lib/ckan/default/local/lib/python2.7/site-packages/ckanapi/localckan.py", line 50, in call_action
return self._get_action(action)(dict(context), dict(data_dict))
File "/usr/lib/ckan/default/src/ckan/ckan/logic/__init__.py", line 424, in wrapped
result = _action(context, data_dict, **kw)
File "/usr/lib/ckan/default/src/ckan/ckan/logic/action/create.py", line 857, in group_create
return _group_or_org_create(context, data_dict)
File "/usr/lib/ckan/default/src/ckan/ckan/logic/action/create.py", line 723, in _group_or_org_create
group = model_save.group_dict_save(data, context)
File "/usr/lib/ckan/default/src/ckan/ckan/lib/dictization/model_save.py", line 389, in group_dict_save
group = d.table_dict_save(group_dict, Group, context)
File "/usr/lib/ckan/default/src/ckan/ckan/lib/dictization/__init__.py", line 139, in table_dict_save
setattr(obj, key, value)
AttributeError: can't set attribute
Any idea what the best way to copy these datasets over would be?
Also note that it doesn't have to be perfect as this is just for testing a script before using it on the production version of opendataphilly.
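Since the dump is JSON lines (one dataset per line), one option for a test copy is to rewrite or drop owner_org between the dump and load steps. The filter below is a sketch under that assumption; `remap_owner_org` and the `org_ids` mapping (source organization ID to local organization ID) are hypothetical and would need to be built by hand or from each site's organization listing.

```python
import json


def remap_owner_org(lines, org_ids):
    """Rewrite owner_org in dumped dataset lines.

    Datasets whose owner_org is missing from org_ids lose the field
    (assumed acceptable here because the copy is only for testing;
    some CKAN configurations require every dataset to have an organization).
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        dataset = json.loads(line)
        source_org = dataset.pop('owner_org', None)
        if source_org in org_ids:
            dataset['owner_org'] = org_ids[source_org]
        yield json.dumps(dataset, sort_keys=True)
```

The output lines could be written to a file and passed to ckanapi load datasets with -I, as in the dump and load commands quoted earlier on this page.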
The migrate data command would act on all resources of a dataset; if a resource references an external file, the command would download that file and store it in CKAN instead.
Use cases:
There seems to be an error when dumping CKAN catalogues that use SSL:
/usr/local/lib/python2.7/dist-packages/requests-2.7.0-py2.7.egg/requests/packages/urllib3/util/ssl_.py:90: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
InsecurePlatformWarning
Traceback (most recent call last):
File "/usr/local/bin/ckanapi", line 9, in <module>
load_entry_point('ckanapi==3.6-dev', 'console_scripts', 'ckanapi')()
File "/usr/local/lib/python2.7/dist-packages/ckanapi-3.6_dev-py2.7.egg/ckanapi/cli/main.py", line 105, in main
return dump_things(ckan, thing[0], arguments)
File "/usr/local/lib/python2.7/dist-packages/ckanapi-3.6_dev-py2.7.egg/ckanapi/cli/dump.py", line 58, in dump_things
names = ckan.call_action(get_thing_list, {})
File "/usr/local/lib/python2.7/dist-packages/ckanapi-3.6_dev-py2.7.egg/ckanapi/remoteckan.py", line 80, in call_action
status, response = self._request_fn_get(url, data_dict, headers, requests_kwargs)
File "/usr/local/lib/python2.7/dist-packages/ckanapi-3.6_dev-py2.7.egg/ckanapi/remoteckan.py", line 90, in _request_fn_get
r = requests.get(url, params=data_dict, headers=headers, **requests_kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests-2.7.0-py2.7.egg/requests/api.py", line 69, in get
return request('get', url, params=params, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests-2.7.0-py2.7.egg/requests/api.py", line 50, in request
response = session.request(method=method, url=url, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests-2.7.0-py2.7.egg/requests/sessions.py", line 465, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests-2.7.0-py2.7.egg/requests/sessions.py", line 573, in send
r = adapter.send(request, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests-2.7.0-py2.7.egg/requests/adapters.py", line 431, in send
raise SSLError(e, request=request)
requests.exceptions.SSLError: [Errno 1] _ssl.c:504: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed
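The traceback shows call_action forwarding a requests_kwargs dict to requests.get, so the verify option of requests can be supplied that way, for example to point at a CA bundle that includes the server's certificate chain. A sketch, assuming `ckan` is a RemoteCKAN instance from a version whose call_action accepts requests_kwargs (as 3.6-dev appears to, per the traceback); `list_datasets` and `ca_bundle` are names made up for this example.

```python
def list_datasets(ckan, ca_bundle=None):
    """Call package_list, optionally verifying TLS against a specific CA bundle."""
    requests_kwargs = {}
    if ca_bundle is not None:
        # requests accepts a path to a CA bundle file as the verify option.
        requests_kwargs['verify'] = ca_bundle
    return ckan.call_action('package_list', {}, requests_kwargs=requests_kwargs)
```

Passing verify=False instead would disable certificate checking entirely, which is best kept to throwaway testing.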
Example client code:
https://github.com/OCHA-DAP/dap-scrapers/blob/master/ckan/endtoend.py
"""
curl $CKAN_INSTANCE/api/storage/auth/form/$DIRECTORY/$FILENAME -H Authorization:$CKAN_APIKEY > phase1
curl $CKAN_INSTANCE/storage/upload_handle -H Authorization:$CKAN_APIKEY --form file=@$FILENAME --form "key=$DIRECTORY/$FILENAME" > phase2
curl http://ckan.megginson.com/api/3/action/resource_create --data '{"package_id":"51b25ca0-9c2e-4e66-85e3-37a13c19a85d", "url":"'$CKAN_INSTANCE'/storage/f/'$DIRECTORY'/'$FILENAME'"}' -H Authorization:$CKAN_APIKEY > phase3
"""
import sys
import os
import requests
import datetime
import lxml.html
import json
def do(f, args=[], kwargs={}):
    while True:
        try:
            x=f(*args, **kwargs)
            return x
        except Exception, e:
            print "EXCEPTION: ",e
            pass

def raise_for_status(response):
    try:
        response.raise_for_status()
    except:
        print "FAILED RFS: %r" % response.content
        raise

def get_parameters(filepath=None):
    params = {}
    params['CKAN_INSTANCE'] = os.getenv("CKAN_INSTANCE")
    params['CKAN_APIKEY'] = os.getenv("CKAN_APIKEY")
    if not params['CKAN_INSTANCE'] or not params['CKAN_APIKEY']:
        raise RuntimeError("Enviroment variables CKAN_INSTANCE / CKAN_APIKEY not set.")
    if filepath is None:
        if len(sys.argv) != 2:
            raise RuntimeError("Takes one argument: filename")
        else:
            filepath = sys.argv[1]
    params['FILEPATH'] = filepath
    params['FILENAME'] = os.path.basename(params['FILEPATH'])
    params['NOW'] = datetime.datetime.now().isoformat()
    params['DIRECTORY'] = params['NOW'].replace(":", "").replace("-", "")
    return params

def request_permission(): # phase1
    response = requests.get("{CKAN_INSTANCE}/api/storage/auth/form/{DIRECTORY}/{FILENAME}".format(**params), headers=headers)
    response.raise_for_status()
    j = response.json()
    assert "action" in j
    assert "fields" in j
    print j
    return j

def upload_file(permission): # phase 2
    response = requests.post("{CKAN_INSTANCE}{action}".format(action=permission['action'], **params),
                             headers=headers,
                             files={'file': (params['FILENAME'], open(params['FILEPATH']))},
                             data={permission['fields'][0]['name']: permission['fields'][0]['value']}
                             )
    response.raise_for_status()
    root = lxml.html.fromstring(response.content)
    h1, = root.xpath("//h1/text()")
    assert " Successful" in h1
    url, = root.xpath("//h1/following::a[1]/@href")
    assert params['FILENAME'] in url # might be issues with URLencoding
    return url

def create_resource(url, **kwargs): # phase 3
    data = {"url": url,
            "package_id": "51b25ca0-9c2e-4e66-85e3-37a13c19a85d"}
    newheader = dict(headers)
    newheader['Content-Type'] = "application/x-www-form-urlencoded" # http://trac.ckan.org/ticket/2942
    data.update(kwargs)
    print data
    response = requests.post("{CKAN_INSTANCE}/api/3/action/resource_create".format(**params),
    # response = requests.post("http://httpbin.org/post".format(**params),
                             headers=newheader,
                             data=json.dumps(data)
                             )
    #response.raise_for_status()
    try:
        assert response.json()["success"]
    except:
        print "FAILED: %r" % response.content
        raise
    print response.content

def upload(resource_info=None, filename=None):
    global params
    global headers
    params = get_parameters(filename)
    headers = {"Authorization": params['CKAN_APIKEY']}
    if resource_info is None:
        print "No resource_info specified, using defaults"
        resource_info = {
            "package_id": "51b25ca0-9c2e-4e66-85e3-37a13c19a85d",
            "revision_id": params['NOW'],
            "description": "Indicators scraped from a variety of sources by ScraperWiki",
            "format": "Zipped CSV",
            # "hash": None,
            "name": "scraped.csv.zip",
            # "resource_type": None,
            "mimetype": "application/zip",
            "mimetype_inner": "text/csv",
            # "webstore_url": None,
            # "cache_url": None,
            # "size": None,
            "created": params['NOW'],
            "last_modified": params['NOW'],
            # "cache_last_updated": None,
            # "webstore_last_updated": None,
        }
    j = do(request_permission)
    url = do(upload_file, [j])
    do(create_resource, [url], resource_info)

if __name__=="__main__": upload()
Our staging server requires basic authentication. Is this possible with ckanapi?
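In ckanapi versions whose call_action accepts requests_kwargs, those options are passed on to requests, and requests takes HTTP basic credentials through its auth option. The sketch below relies on that; `call_with_basic_auth` is a hypothetical helper, and since basic auth and the API key may both want the Authorization header, how the two interact on a given server is something to check rather than assume.

```python
def call_with_basic_auth(ckan, action, data_dict, username, password):
    """Invoke an action while sending HTTP basic credentials."""
    return ckan.call_action(
        action,
        data_dict,
        requests_kwargs={'auth': (username, password)},
    )
```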