gdcc / pydataverse
Python module for Dataverse Software (dataverse.org).
Home Page: http://pydataverse.readthedocs.io/
License: MIT License
The example at http://guides.dataverse.org/en/4.15/api/native-api.html#add-a-file-to-a-dataset illustrates that a jsonData object can be uploaded when adding a file:
curl -H "X-Dataverse-key:$API_TOKEN" -X POST -F '[email protected]' -F 'jsonData={"description":"My description.","directoryLabel":"data/subdir1","categories":["Data"], "restrict":"true"}' "https://example.dataverse.edu/api/datasets/:persistentId/add?persistentId=$PERSISTENT_ID"
The jsonData object allows the user to add additional metadata about the file. It would be great if api.upload_file supported this jsonData object.
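A minimal sketch of what such support could look like (hypothetical helper and keyword argument, not the current pyDataverse API; it simply mirrors the curl call above):
import json
import requests


def upload_file_with_metadata(base_url, api_token, pid, filename, json_data=None):
    """Sketch: add a file to a dataset, optionally passing the jsonData metadata object."""
    url = '{0}/api/datasets/:persistentId/add?persistentId={1}'.format(base_url, pid)
    # jsonData is sent as a form field next to the file, as in the curl example.
    data = {'jsonData': json.dumps(json_data)} if json_data else None
    with open(filename, 'rb') as f:
        resp = requests.post(
            url,
            headers={'X-Dataverse-key': api_token},
            data=data,
            files={'file': f},
        )
    return resp


# Usage (placeholder values):
# upload_file_with_metadata(
#     'https://example.dataverse.edu', '**API-TOKEN**', 'doi:10.5072/FK2/AAA000',
#     'data.tsv',
#     json_data={'description': 'My description.', 'directoryLabel': 'data/subdir1',
#                'categories': ['Data'], 'restrict': 'true'},
# )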
Afaik, at the current state (for me it's 4.11), DDI XML is the only way to get existing DOIs into Dataverse. Therefore this functionality would be very useful. I guess it's just the :importddi endpoint that would need to be added.
Add the passing of the API token to the Api() creation in the basic usage example in the Docs.
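Something along these lines (placeholder values):
from pyDataverse.api import Api

# Pass the API token directly when creating the Api instance.
api = Api('https://demo.dataverse.org', api_token='**YOUR-API-TOKEN**')
print(api.status)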
It should be possible to create a Dataverse object from a JSON file, such as this example file.
The following error is thrown:
FileNotFoundError: [Errno 2] No such file or directory: 'schemas/json/dataverse_upload_schema.json'
Include Dataverse schema
import json

from pyDataverse.api import NativeApi
from pyDataverse.models import Dataverse
from pyDataverse.exceptions import OperationFailedError  # assuming this is where the exception lives

napi = NativeApi(base_url=url, api_token=token)
dv = Dataverse()

with open('dataverse.json') as dataverse:
    data = json.load(dataverse)
dv.set(data)

try:
    napi.create_dataverse(identifier=dv.alias, metadata=dv.to_json())
    # Uploading the file directly works, though
    # napi.create_dataverse(identifier=dv.alias, metadata=data)
except OperationFailedError:
    print("Dataverse already created")
Branch - develop
Commit hash - 3b040ff
Environment - macOS using Pipenv
https://jenkins.dataverse.org is a new service being offered to the Dataverse community for automated testing, continuous integration and perhaps any other use you can dream up. 😄
For more about this Jenkins service, please see http://guides.dataverse.org/en/4.14/developers/testing.html#continuous-integration
I'm very glad to see that Travis tests are already set up for pyDataverse at https://travis-ci.com/AUSSDA/pyDataverse
I am not suggesting that we replace Travis with Jenkins. Rather, I'm suggesting a "belt and suspenders" approach. In fact, for Dataverse itself we are currently using Travis to know if our Java code even compiles (and if the unit tests pass) and Jenkins to know if our API test suite is passing.
The way to add pyDataverse is to talk to me and @donsizemore at http://chat.dataverse.org (we're both in the eastern timezone of the United States and don't work weekends 😛 ). We'll get the test suite passing (with help from @skasberger probably) and then add it as a job to https://github.com/IQSS/dataverse-jenkins . Actually, once I talk to Don I'll probably create an issue over in that issue tracker for adding the job definition (XML, I believe).
Check which API endpoints accept the Dataverse database ID and/or the PID as identifier.
API Endpoints
After that, update the requests so that both variants are possible and implemented.
See #71
Purpose:
Functionalities:
Resources
The dict() function outputs empty arrays, but it should not. Check whether this is also the case for Dataverses and Datasets.
The function get_datafile() is using the following call to get_request(). As auth is False by default when calling get_request(query_str, params=None, auth=False), the API token is not being sent and the server may return a 403 error.
The same behavior could be noticed by calling get_dataset_export(), get_datafiles(), get_datafile_bundle(), ...
Other functions like get_dataverse() work as expected by using get_request() with the extra auth parameter:
Is there any workaround besides changing auth=True in the get_request() definition?
Environment:
pyDataverse==0.2.1
requests==2.22.0
urllib3==1.25.7
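One possible workaround in the meantime (a sketch; the /access/datafile/<id> path is my assumption about what get_datafile() targets): call get_request() directly and pass auth=True explicitly.
from pyDataverse.api import Api

api = Api('https://demo.dataverse.org', api_token='**API-TOKEN**')

# Workaround sketch: bypass get_datafile() and force auth=True so the
# X-Dataverse-key header is sent with the request.
datafile_id = 42  # placeholder id
resp = api.get_request('/access/datafile/{0}'.format(datafile_id), auth=True)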
http://guides.dataverse.org/en/4.15/api/native-api.html#delete-published-dataset describes a "destroy" API that allows superusers to delete datasets even after they are published. I can think of a couple use cases for this.
Here are the curl command examples from the API Guide link above:
Destroy by Persistent ID (PID):
curl -H "X-Dataverse-key:$API_TOKEN" -X DELETE http://$SERVER/api/datasets/:persistentId/destroy/?persistentId=doi:10.5072/FK2/AAA000
Destroy by dataset ID:
curl -H "X-Dataverse-key:$API_TOKEN" -X DELETE http://$SERVER/api/datasets/999/destroy
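A sketch of what a wrapper could look like (hypothetical helper built directly on requests, mirroring the two curl commands above):
import requests


def destroy_dataset(base_url, api_token, identifier, is_pid=True):
    """Sketch: DELETE /api/datasets/.../destroy (superusers only), by PID or database id."""
    if is_pid:
        url = '{0}/api/datasets/:persistentId/destroy/?persistentId={1}'.format(
            base_url, identifier)
    else:
        url = '{0}/api/datasets/{1}/destroy'.format(base_url, identifier)
    return requests.delete(url, headers={'X-Dataverse-key': api_token})


# Usage (placeholder values):
# destroy_dataset('https://demo.dataverse.org', '**API-TOKEN**', 'doi:10.5072/FK2/AAA000')
# destroy_dataset('https://demo.dataverse.org', '**API-TOKEN**', 999, is_pid=False)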
I'm happy to make a pull request if you'd like. Please let me know.
Error: /usr/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'long_description_content_type'
The first argument to many functions is query_str, but maybe it should be path instead. This tripped me up a little. Here's the doc I was reading for v0.2.1:
Here's a reference on query strings vs path from https://en.wikipedia.org/wiki/Uniform_Resource_Identifier#Examples
It looks like I used the word "endpoint" instead of "path" here:
endpoint = '/builtin-users/' + username + '/api-token'
resp = api.get_request(endpoint, params=params, auth=True)
So maybe "path" or "endpoint"? To me, "query string" has a specific meaning: it's the key/value pairs after the "?", like tag=networking&order=newest in the example above.
In the function Dataset.export_metadata(), the format must be passed to self.dict(), i.e. self.dict(format=format).
Also check whether the same problem appears for Dataverse and Datafile.
When pulling a dataset, the response object contains the json of the dataset. Would it be attractive to instead return a Dataset object?
The Dataset class would simply contain getter and setter functions for all properties and the constructor only needs the json as an input.
I assume this would improve clarity (one could create a print function to show the metadata maybe using pandas?).
Additionally, a Dataset object could be passed to other functions like create_dataset.
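A rough sketch of the idea (class name and behaviour are assumptions, not existing pyDataverse API):
class DatasetResponse:
    """Sketch only: thin wrapper around the dataset JSON returned by the API."""

    def __init__(self, dataset_json):
        self._json = dataset_json

    @property
    def title(self):
        # Walk the citation metadata block for the 'title' field
        # (assumes the usual native API response layout).
        fields = self._json['data']['latestVersion']['metadataBlocks']['citation']['fields']
        for field in fields:
            if field['typeName'] == 'title':
                return field['value']
        return None

    def to_json(self):
        return self._json


# Usage sketch:
# resp = api.get_dataset(pid)
# ds = DatasetResponse(resp.json())
# print(ds.title)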
Add logging functionality to all modules.
For example, change "Dataset 10100 could not be created." to "Dataset 10100 could not be created via API."
Log levels: INFO, WARNING, ERROR
Snippets:
import requests
import json


def pretty_print_request(request):
    print('\n{}\n{}\n\n{}\n\n{}\n'.format(
        '-----------Request----------->',
        request.method + ' ' + request.url,
        '\n'.join('{}: {}'.format(k, v) for k, v in request.headers.items()),
        request.body)
    )


def pretty_print_response(response):
    print('\n{}\n{}\n\n{}\n\n{}\n'.format(
        '<-----------Response-----------',
        'Status code:' + str(response.status_code),
        '\n'.join('{}: {}'.format(k, v) for k, v in response.headers.items()),
        response.text)
    )


def test_post_headers_body_json():
    url = 'https://httpbin.org/post'
    # Additional headers.
    headers = {'Content-Type': 'application/json'}
    # Body
    payload = {'key1': 1, 'key2': 'value2'}
    # convert dict to json by json.dumps() for body data.
    resp = requests.post(url, headers=headers, data=json.dumps(payload, indent=4))

    # Validate response headers and body contents, e.g. status code.
    assert resp.status_code == 200
    resp_body = resp.json()
    assert resp_body['url'] == url

    # print full request and response
    pretty_print_request(resp.request)
    pretty_print_response(resp)
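For the logging itself, a minimal sketch of what module-level logging could look like, using nothing beyond the standard library logging module (the wrapper name and the 201 success check are assumptions):
import logging

# One logger per module, following the usual logging convention.
logger = logging.getLogger(__name__)


def create_dataset_logged(api, dataverse, metadata, identifier):
    """Sketch: wrap an API call with INFO/ERROR log messages."""
    logger.info('Creating Dataset %s via API.', identifier)
    resp = api.create_dataset(dataverse, metadata)
    if resp.status_code != 201:  # assuming 201 Created on success
        logger.error('Dataset %s could not be created via API.', identifier)
    else:
        logger.info('Dataset %s created via API.', identifier)
    return resp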
Using a Python script with create_dataset() I created a new dataset on demo.dataverse.org (and one more Dataverse server).
api = Api(base_url = dvserver, api_token = dvtoken)
api.create_dataset("1", dsmd)
Where dsmd is the content of dataset-finch1.json (and a slightly modified version of it for my last test) linked in the documentation.
dsmd = """{
"datasetVersion": {
"metadataBlocks": {
"citation": {
"fields": [
{
"value": "Dörwin's Fænches",
.
.
.
"""
Everything seems to work fine, but non-ASCII characters are not displayed correctly (they are replaced with �), neither when I open the dataverse through the browser nor when I download it back with get_dataset().
I'm on Windows 10 with Python 3.6.4 and pyDataverse 0.2.1. I tried to run it as a script from the command line and in Spyder with the same result.
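One thing that might be worth checking (an untested assumption, not a confirmed fix): build the payload with json.dumps() and its default ensure_ascii=True, so non-ASCII characters are sent as \uXXXX escapes and no longer depend on how the request body gets encoded.
import json

from pyDataverse.api import Api

api = Api('https://demo.dataverse.org', api_token='**API-TOKEN**')

# Build the metadata as a dict and serialize it with json.dumps();
# the default ensure_ascii=True produces pure-ASCII JSON with \u escapes.
metadata = {
    "datasetVersion": {
        "metadataBlocks": {
            "citation": {
                "fields": [
                    {"typeName": "title", "multiple": False,
                     "typeClass": "primitive", "value": "Dörwin's Fænches"},
                    # ... remaining citation fields ...
                ]
            }
        }
    }
}
dsmd = json.dumps(metadata)
resp = api.create_dataset("1", dsmd)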
This proposal is a bit out of scope for an issue tracker, but I really see a great opportunity for a synergy here.
If it became part of the effort to represent the Dataverse Native API in an OpenAPI Specification (formerly Swagger Specification, example: https://editor.swagger.io/ ), clients (or at least their interfaces) for many languages could be generated by a code generator like Swagger Codegen. At the same time it would be a major contribution to the Dataverse core project to have an OpenAPI definition.
I'd be glad to participate in this whole effort (OpenAPI or not), because I've been looking for a place to collaborate on such code, for example to duplicate (for visibility) metadata from Datacite to our institutional Dataverse, which I'm currently implementing at the WZB.
Best
Jonas
Allow downloading of unpublished draft dataset and its data files using the API token and its access credentials.
Export the metadata of a list of Dataverses, Datasets or Datafiles to a CSV file. The header should be the attribute names; one row = one Dataverse, Dataset or Datafile. The list must contain only one type of object, not a mixture of Dataverses and Datasets, for example.
Purpose:
Functionalities:
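A minimal sketch of the idea, assuming each object exposes its metadata via dict() as the existing models do (function name is illustrative):
import csv


def export_to_csv(objects, filename):
    """Sketch: write one row per Dataverse/Dataset/Datafile, columns = attribute names.

    Assumes all objects are of the same type and expose their metadata via dict().
    """
    dicts = [obj.dict() for obj in objects]
    # Collect every attribute name that occurs, so the header covers all columns.
    fieldnames = sorted({key for d in dicts for key in d})
    with open(filename, 'w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        for d in dicts:
            writer.writerow(d)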
Write a test which checks whether the content of Dataverse.dict() contains the expected structure and values.
As a developer I would like to have more control over the keyword arguments get_datafile uses in requests to download files. I have had issues downloading large files in the past and had to implement a custom get_request method to allow access to the stream parameter of requests.get.
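A sketch of what forwarding keyword arguments could look like (standalone illustration with requests; the path and signature are assumptions, not the current get_datafile()):
import requests


def get_datafile(base_url, api_token, datafile_id, **kwargs):
    """Sketch: pass extra requests keyword arguments (e.g. stream=True) straight through."""
    url = '{0}/api/access/datafile/{1}'.format(base_url, datafile_id)
    return requests.get(url, headers={'X-Dataverse-key': api_token}, **kwargs)


# Download a large file without loading it into memory at once:
# resp = get_datafile(base_url, api_token, 42, stream=True)
# with open('large_file.dat', 'wb') as f:
#     for chunk in resp.iter_content(chunk_size=8192):
#         f.write(chunk)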
Purpose
Functionality
history_ID.json
history.json structure (DRAFT):
metadata
- date-created: string, YYYY-MM-DD HH:MM:SS
- history_version: version of the history schema
history: [{}]
- dataset_id: string
- dataverse_dataset_version: string
- datafiles: [FILENAMES], without path
- timestamp: string, YYYY-MM-DD HH:MM:SS
- description: string
- object_type: dataset or datafile
- object_id
- creator
- change_type: {dataverse_release: , dataverse_process: }
  - dataverse: init, update, delete, move
  - release: init, update, delete, move
  - edit: major/minor release version change in Dataverse
  - delete: major/minor release version change in Dataverse
  - internal
- specific_change_type: more detailed description of the change type, e.g. aussda
Write a test which checks whether the content of Dataset.json() contains the expected structure and values.
Add a resources section to the Docs, where materials such as videos, presentations, tutorials, blog posts, screencasts, etc. about pyDataverse can be collected.
Write a test which checks whether the content of Dataset.dict() contains the expected structure and values.
Support for the following APIs would be appreciated:
Implement mapping from and to DDI XML.
Requirements
As I mentioned at IQSS/dataverse#5235 (comment) I'm curious if the "DVTree" (Dataverse Tree) format could be used to upload sample data to a brand new Dataverse installation for use in demos and usability testing.
I would love to see some docs. Or a pointer to the code for now. Thanks! 😄
Please support the "Show Contents of a Dataverse" API.
At http://guides.dataverse.org/en/4.15/api/native-api.html#show-contents-of-a-dataverse it is documented like this:
Lists all the DvObjects under dataverse id.
GET http://$SERVER/api/dataverses/$id/contents
As a workaround for now I'm using api.get_request: https://github.com/IQSS/dataverse-sample-data/blob/ed52c316f530229b0c40463dc18c5f16d07cf11d/destroy_all_dvobjects.py#L30
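A sketch of what a dedicated method could look like (hypothetical name, built on the same get_request() pattern as the workaround above):
def get_dataverse_contents(api, identifier, auth=True):
    """Sketch: list the DvObjects in a dataverse via GET /api/dataverses/$id/contents."""
    query_str = '/dataverses/{0}/contents'.format(identifier)
    return api.get_request(query_str, auth=auth)


# Usage sketch:
# resp = get_dataverse_contents(api, ':root')
# for dvobject in resp.json()['data']:
#     print(dvobject['type'], dvobject['id'])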
Implement mapping from and to custom JSON.
Requirements
Add import functionality for the Dataverse API download JSON formats for Dataverses, Datasets and Datafiles, as returned by API requests.
Clarify: Which requests should be used for this?
Explanation:
When you retrieve a Dataset via the API, you get more metadata for your Dataset than you had to send for its original creation (e.g. creation date, PID, UNF, etc.). So you need a separate mapping with its own schema file.
The import is more important than the export into this format (I cannot think of a use case for the export).
Functionalities:
The metadata attribute title for a Datafile is not working.
When uploading a file, it should be possible to add metadata to the file.
Also, feedback could be given whether a file with the same checksum already exists. (Should there be an option to force overwriting the file?)
Shorten the import function name. Also update docstrings.
Hi @skasberger,
for GDCC/dvcli I want to make heavy use of your great library.
As I wrote some comments to other issues already, I wonder if we should talk about the scope of your library. It doesn't make much sense to implement things in dvcli and then here again or the other way round.
I would be happy to either contribute here or give you access to the dvcli project.
If you would prefer talking over writing, hit me 😄
Review other Python API wrapper modules to learn about testing.
I'm using pyDataverse 0.2.1 and I can't publish a dataset. I'm getting the following error:
Traceback (most recent call last):
File "create_and_publish_dataset.py", line 15, in <module>
resp = api.publish_dataset(dataset_pid, type='major')
File "/home/pdurbin/envs/dataverse-sample-data/lib/python3.6/site-packages/pyDataverse/api.py", line 727, in publish_dataset
query_str += '?persistentId={0}&type={1}'.format(identifier, type)
NameError: name 'identifier' is not defined
Something like this should fix it:
dhcp-10-250-190-90:pyDataverse pdurbin$ git diff src/pyDataverse/api.py
diff --git a/src/pyDataverse/api.py b/src/pyDataverse/api.py
index 2bebc05..972e427 100644
--- a/src/pyDataverse/api.py
+++ b/src/pyDataverse/api.py
@@ -673,7 +673,7 @@ class Api(object):
print('Dataset {} created.'.format(identifier))
return resp
- def publish_dataset(self, pid, type='minor', auth=True):
+ def publish_dataset(self, identifier, type='minor', auth=True):
"""Publish dataset.
Publishes the dataset whose id is passed. If this is the first version
@@ -705,7 +705,7 @@ class Api(object):
Parameters
----------
- pid : string
+ identifier : string
Persistent identifier of the dataset (e.g.
``doi:10.11587/8H3N93``).
type : string
dhcp-10-250-190-90:pyDataverse pdurbin$
Here's the code I'm using to exercise the bug:
from pyDataverse.api import Api
import json
import dvconfig
base_url = dvconfig.base_url
api_token = dvconfig.api_token
api = Api(base_url, api_token)
print(api.status)
dataset_json = 'data/dataverses/open-source-at-harvard/datasets/open-source-at-harvard/open-source-at-harvard.json'
with open(dataset_json) as f:
metadata = json.load(f)
dataverse = ':root'
resp = api.create_dataset(dataverse, json.dumps(metadata))
print(resp.json())
dataset_pid = resp.json()['data']['persistentId']
resp = api.publish_dataset(dataset_pid, type='major')
print(resp.json())
The "dvconfig" stuff comes from https://github.com/IQSS/dataverse-sample-data
Implement mapping from and to DSpace JSON.
Requirements
I'm using pyDataverse 0.2.1 and trying to get ds.export_metadata working based on the example at https://pydataverse.readthedocs.io/en/v0.2.1/developer.html#pyDataverse.models.Dataset.export_metadata
Here's my code:
from pyDataverse.models import Dataset
ds = Dataset()
data = {
'title': 'pyDataverse study 2019',
'dsDescription': 'New study about pyDataverse usage in 2019',
'author': [{'authorName': 'LastAuthor1, FirstAuthor1'}],
'datasetContact': [{'datasetContactName': 'LastContact1, FirstContact1'}],
'subject': ['Engineering'],
}
ds.set(data)
ds.export_metadata('export_dataset.json')
Here's the error I'm getting:
Traceback (most recent call last):
File "exportds3.py", line 11, in <module>
ds.export_metadata('export_dataset.json')
File "/home/pdurbin/envs/dataverse-sample-data/lib/python3.6/site-packages/pyDataverse/models.py", line 1175, in export_metadata
return write_file_json(filename, self.dict())
File "/home/pdurbin/envs/dataverse-sample-data/lib/python3.6/site-packages/pyDataverse/models.py", line 945, in dict
'value': self.__generate_dicts(key, val)
File "/home/pdurbin/envs/dataverse-sample-data/lib/python3.6/site-packages/pyDataverse/models.py", line 1092, in __generate_dicts
for k, v in d.items():
AttributeError: 'str' object has no attribute 'items'
What am I doing wrong? Thanks.
reference: IQSS/dataverse#3068
The native API can be a bit verbose for non-expert users. Include an option to transform the Dataverse native API response to a more usable format.
To illustrate, here is the metadata information for a dataset title and author:
(sample code to transform the output below: https://github.com/IQSS/json-schema-test/blob/master/filemetadata/api_test/metadata_transformer.py)
{
    "citation": {
        "title": "North Carolina Vital Statistics -- Birth/Infant Deaths 1976",
        "author": [
            {
                "authorName": "State Center for Health Statistics"
            }
        ],
        "metadataBlocks": {
            "citation": {
                "displayName": "Citation Metadata",
                "fields": [
                    {
                        "typeName": "title",
                        "multiple": false,
                        "typeClass": "primitive",
                        "value": "North Carolina Vital Statistics -- Birth/Infant Deaths 1976"
                    },
                    {
                        "typeName": "author",
                        "multiple": true,
                        "typeClass": "compound",
                        "value": [
                            {
                                "authorName": {
                                    "typeName": "authorName",
                                    "multiple": false,
                                    "typeClass": "primitive",
                                    "value": "State Center for Health Statistics"
                                }
                            }
                        ]
                    },
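A rough sketch of such a transformation, written against the usual layout of the native JSON (field handling below is an assumption based on the example above, not the linked sample code):
def simplify_citation(native_json):
    """Sketch: flatten the citation metadata block into plain key/value pairs.

    Expects the 'metadataBlocks' structure shown above; compound multiple fields
    (like author) become lists of simple dicts.
    """
    simple = {}
    fields = native_json['metadataBlocks']['citation']['fields']
    for field in fields:
        name = field['typeName']
        value = field['value']
        if field['typeClass'] == 'compound':
            # e.g. author -> [{'authorName': 'State Center for Health Statistics'}]
            value = [
                {sub_name: sub_field['value'] for sub_name, sub_field in entry.items()}
                for entry in value
            ]
        simple[name] = value
    return simple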
cc/ @pdurbin
Get a DOI for the repo. Check how versioning of releases is done. When is a new DOI assigned to the repo/code?
[![DOI:10.7910/DVN/TJCLKP](https://img.shields.io/badge/DOI-10.7910%2FDVN%2FTJCLKP-orange.svg)](https://doi.org/10.7910/DVN/TJCLKP)
Purpose
Synchronize a local directory with a remote folder within a dataset at Dataverse.
User story
As a user of Dataverse, I would like to be able to continuously (e.g., daily, weekly) "mirror" ongoing data collections (e.g., by means of web scraping) with a (draft) version of my dataset at Dataverse. Currently, only one-time transfers are convenient to manage using PyDataverse.
Functionality
- get_datafiles(): use as argument a particular folder at the remote dataset (or the entire dataset, default), i.e. the folder that needs to be synchronized
- sync_folder() function, with arguments: local_folder (default: .), remote_folder (default: .), direction (one of: mirror local to remote but do not delete anything on remote; mirror remote to local but do not delete anything in local; synchronize both directories, and delete files where needed), comparison (only on the basis of file names, or also on the basis of file hashes; default: hash+filename). A sketch of such a function follows below.
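A minimal sketch of what sync_folder() could look like (names and defaults taken from the list above; the remote-listing helper and upload call are assumptions):
import hashlib
import os


def file_md5(path):
    """MD5 checksum of a local file (Dataverse reports MD5 checksums)."""
    h = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(8192), b''):
            h.update(chunk)
    return h.hexdigest()


def list_remote_files(api, pid, remote_folder):
    """Placeholder: return {filename: md5_checksum} for files under remote_folder.

    Left as a stub because it depends on the get_datafiles() extension above.
    """
    raise NotImplementedError


def sync_folder(api, pid, local_folder='.', remote_folder='.',
                direction='local-to-remote', comparison='hash+filename'):
    """Sketch: mirror a local directory into a (draft) dataset folder, upload-only."""
    remote = list_remote_files(api, pid, remote_folder)
    for name in os.listdir(local_folder):
        path = os.path.join(local_folder, name)
        if not os.path.isfile(path):
            continue
        changed = name not in remote
        if not changed and comparison == 'hash+filename':
            changed = file_md5(path) != remote[name]
        if changed and direction in ('local-to-remote', 'both'):
            # upload_file() exists in pyDataverse; per-file folder metadata is
            # what the jsonData issue above asks for.
            api.upload_file(pid, path)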
On this page, right here: Just a detail, but you know what they say: that's where God and the Devil are!
Write a test which checks whether the content of Dataverse.json() contains the expected structure and values.
Update pyDataverse to use the endpoint described in the 5.2 release, in this PR: IQSS/dataverse#7345
Hello,
when I try to import the API using this line of code:
from pyDataverse.api import Api
the following error occurs:
Traceback (most recent call last):
  File "C:/Users/MIsawe/PycharmProjects/untitled/main.py", line 1, in <module>
    from pyDataverse.api import Api
  File "C:\Users\MIsawe\PycharmProjects\untitled\venv\lib\site-packages\pyDataverse\__init__.py", line 6, in <module>
    from requests.packages import urllib3
ModuleNotFoundError: No module named 'requests'