
keboola / sapi-python-client


Keboola Connection Storage API client

Home Page: http://docs.keboola.apiary.io/

License: MIT License

Python 99.91% Dockerfile 0.09%

sapi-python-client's People

Contributors

kudj, kukant, odinuv, ogaday, pivnicek, tomasfejfar, ujovlado


sapi-python-client's Issues

Request content-type

Currently application/x-www-form-urlencoded is used everywhere. While acceptable, this may be a problem for requests that are potentially large (such as saving or updating configurations), where multipart/form-data has to be used instead. A simple solution is to use multipart/form-data everywhere and accept the slight performance hit. This probably matters only for POST requests.
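
For illustration, a minimal sketch of the difference with the requests library (the URL below is a placeholder, not a real configuration route):

import requests

url = 'https://connection.keboola.com/v2/storage/some-large-resource'  # placeholder URL
headers = {'X-StorageApi-Token': 'your-token'}
large_config = '{"parameters": {"key": "a very large value"}}'

# Current behaviour: form fields go out as application/x-www-form-urlencoded.
requests.post(url, headers=headers, data={'configuration': large_config})

# Proposed: passing the fields through `files` makes requests encode the body
# as multipart/form-data, which copes better with large payloads.
requests.post(url, headers=headers,
              files={'configuration': (None, large_config)})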

exporting gzipped table prepends ungzipped header

client.tables.export_to_file('in.c-test.sample_table', 'out/tables', is_gzip=True)

results in something like this:

"new_opt_account_vwocategorypicklist","ema_IsfromSuite"
?ٿ�&��$�������K�����uR�IQ�?�dB�e4>x�Ң��x�|>����2YΆTN�|�<'@��9����Y������^X�� jv

Un-gzipping the output table works fine once the plain-text header is removed. What are the options here?
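
I'm not sure how export_to_file assembles the file, but for illustration, here is a guess at the mechanism and one possible fix using only the standard gzip module (file names and sample data are made up):

import gzip

header = '"new_opt_account_vwocategorypicklist","ema_IsfromSuite"\n'
gzipped_body = gzip.compress(b'"a","b"\n"c","d"\n')  # stand-in for the gzipped data downloaded from storage

# What the output above looks like: a plain-text header followed by raw gzip bytes,
# so the file as a whole is not valid gzip.
with open('broken.csv.gz', 'wb') as f:
    f.write(header.encode('utf-8'))
    f.write(gzipped_body)

# One possible fix: gzip the header as well; concatenated gzip members are valid
# and decompress back to header + body in one stream.
with open('fixed.csv.gz', 'wb') as f:
    f.write(gzip.compress(header.encode('utf-8')))
    f.write(gzipped_body)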

Polling job status sleep time

Hello guys,
I believe there is a minor bug in the computation of sleep time between successive API calls.
The retries variable does not get incremented and therefore the sleep time is always 2 seconds.
This does not impact me in any significant way. I just noticed that this might not have been the original intention and wanted to let you know.

def block_until_completed(self, job_id):
    """
    Poll the API until the job is completed.

    Args:
        job_id (str): The id of the job

    Returns:
        response_body: The parsed json from the HTTP response
            containing a storage Job.

    Raises:
        requests.HTTPError: If any API request fails.
    """
    retries = 1
    while True:
        job = self.detail(job_id)
        if job['status'] in ('error', 'success'):
            return job
        time.sleep(min(2 ** retries, 20))

def block_for_success(self, job_id):
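
For reference, a minimal sketch of what I assume was intended, incrementing retries after each poll so the back-off grows up to the 20-second cap:

retries = 1
while True:
    job = self.detail(job_id)
    if job['status'] in ('error', 'success'):
        return job
    time.sleep(min(2 ** retries, 20))
    retries += 1  # without this, the sleep is always 2 ** 1 == 2 seconds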

Function load of Table class does not create table

I am trying to import data into Keboola from a CSV file with the load function of the Tables class. According to the documentation, it should create the table if it does not exist:

    def load(self, table_id, file_path, is_incremental=False, delimiter=',',
             enclosure='"', escaped_by='', columns=None,
             without_headers=False):
        """
        Create a new table from CSV file.

When I try to import data into a table that does not exist, I receive an error: 404 Client Error: Not Found for url. Is this expected, and should the load function be used only for updating existing tables? For now I have to check whether the table exists, create it if it does not, and then load the data (see the sketch below). In my opinion, it should create the table automatically.
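
A minimal sketch of the workaround I mean, assuming a Tables instance as in the examples elsewhere in this repo and using detail() to probe for existence (the exact exception type is an assumption):

import requests

from kbcstorage.tables import Tables

tables = Tables('https://connection.keboola.com', 'your-token')
table_id = 'in.c-test.sample_table'  # made-up table id

try:
    tables.detail(table_id)                  # does the table already exist?
    tables.load(table_id, 'data.csv')        # yes -> load into it
except requests.HTTPError:                   # 404 -> create it from the CSV instead
    tables.create(name='sample_table', bucket_id='in.c-test', file_path='data.csv')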

Contribution guide

Hi guys, this seems awesome and I'd love to contribute my few cents.
Is there a recommended workflow for contributions? I'd like to ask before I start working on a feature.

Right now I have some refactoring ideas. Would you prefer discussing them in advance or should I just make the changes and discuss them in a PR?

Mainly I would

  • rename the methods of the HttpHelper class from camelCase to pep8_case
  • make the Client class a subclass of HttpHelper to avoid awkward calls such as client.http.get_something() (the cleaner version would be client.get_something())
  • some minor changes, like turning HttpHelper.tokenheader() into a @property; IMO it makes more sense to use it as HttpHelper.tokenheader (see the sketch below)

This would change the API a lot and break a lot of existing code, so I'm asking in advance whether you are OK with that.

That's what I got just through quickly skimming the source, feel free to disagree with anything 🍺
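
For the @property point, a minimal sketch of what I mean (the header name and HttpHelper internals here are assumptions, not the actual class):

class HttpHelper:
    def __init__(self, token):
        self._token = token

    @property
    def tokenheader(self):
        # accessed as helper.tokenheader instead of helper.tokenheader()
        return {'X-StorageApi-Token': self._token}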

S3 retries

  • retries for put object and get object are necessary on transient errors
  • also retries for downloading manifest
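
A minimal sketch of the kind of retry wrapper meant here, independent of any particular S3 library (the callables passed to it are placeholders):

import time

def with_retries(func, *args, attempts=5, base_delay=1, **kwargs):
    """Call func, retrying with exponential back-off on transient errors."""
    for attempt in range(1, attempts + 1):
        try:
            return func(*args, **kwargs)
        except Exception:  # narrow this to the transient errors raised by the S3 client in use
            if attempt == attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))

# e.g. with_retries(download_manifest, manifest_url)  # placeholder callable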

Tests!

This thing needs tests.

make api_domain optional parameter in Client class

class Client:
    def __init__(self, token, api_domain='https://connection.keboola.com'):
        pass

so that you can init the client like this

client = Client('your-token')

What are the alternative api_domains anyway?

fix updated api response

======================================================================
FAIL: test_create_file (tests.functional.test_files.TestFiles)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/code/tests/functional/test_files.py", line 67, in test_create_file
    self.assertFalse('credentials' in file_info)
AssertionError: True is not false

Tables exported to csv have wrong headers when specific columns are selected

Hi,
I believe I have spotted a bug in the export_to_file() method of the Tables class.
The exported CSV always gets headers based on all columns of the table, even if I select only a subset of them. Moreover, the order of the header labels is based purely on the table detail in KBC, even though the columns might be exported in a different order depending on the columns argument.
I think this could be fixed by using the columns from the table detail only as a fallback for when the user does not specify any, i.e. by replacing line 422 in kbcstorage/tables.py with this:

            if columns is None:
                columns = table_detail['columns']

fix CS

The code style check doesn't pass with the latest flake8.

Test Enhancements

You may have noticed I've been looking at the tests (#43, #44, #45, #46); I hope these PRs are welcome! I want to make a few more enhancements to the tests to make them slightly easier to work with, and then I'll look at working on the issues in the milestone for v1.0.0.

  • Tidy up tests a little
    • class name ( #43 )
    • Remove deprecation warning ( #44 )
    • tests.base.TestEndpoint.test_get doesn't actually make a request (c8d7872)
  • Separate functional and mock tests into subpackages ( #45 )
  • Use subtests ( #46 )
  • Add support for skipping functional tests when there is no connection (I don't know how this would play with CI; a sketch of one approach follows this list). See:
  • Profile tests to hopefully identify and shorten long-running tests; reducing total test time will reduce development friction ( #47 & #49 )
  • Modify functional tests to allow running concurrently; this will also significantly reduce test time if test modules can be run in parallel. I know that green has support for this, and other test runners like pytest may too. Currently, because each test module makes requests against the same bucket name, they can't run concurrently ( #50 )
  • Increase test coverage - this is a longer term objective
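
A minimal sketch of the skipping idea, assuming the functional tests read their token from an environment variable (the variable name is an assumption):

import os
import unittest

# Skip the whole functional test case when no token/connection is configured.
@unittest.skipUnless(os.environ.get('KBC_TEST_TOKEN'),
                     'KBC_TEST_TOKEN not set; skipping functional tests')
class TestFiles(unittest.TestCase):
    def test_create_file(self):
        ...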

Implement the Client class

As discussed in some comments floating around here, the client would handle passing the base_url and auth_token to the endpoints and bundle all the endpoints in one place, so we are left with this simple and clean usage:

>>> from kbcstorage import Client
>>> client = Client('myapitoken')
>>> client = Client('myapitoken', root_url='https://connection.keboola.com') # alternatively 
>>> client.buckets.list()
# a list of all buckets
>>> client.workspaces.detail(1234)
# etc...

This still allows instantiating endpoints one by one if one wishes to.

from kbcstorage.tables import Tables
from kbcstorage.buckets import Buckets

tables = Tables('https://connection.keboola.com', 'your-token')
# get table data into local file
tables.export_to_file(table_id='in.c-demo.some-table', path_name='/data/')
# save data
tables.create(name='some-table-2', bucket_id='in.c-demo', file_path='/data/some-table')
# list buckets
buckets = Buckets('https://connection.keboola.com', 'your-token')
buckets.list()
# list bucket tables
buckets.list_tables('in.c-demo')
# get table info
tables.detail('in.c-demo.some-table')

I am happy to implement this after we deal with #28
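
A minimal sketch of such a Client, assuming the endpoint constructors keep the (root_url, token) signature shown above (the Workspaces import path is an assumption):

from kbcstorage.buckets import Buckets
from kbcstorage.tables import Tables
from kbcstorage.workspaces import Workspaces  # import path is an assumption


class Client:
    """Bundle the endpoints and share root_url and token between them."""
    def __init__(self, token, root_url='https://connection.keboola.com'):
        self.root_url = root_url
        self.token = token
        self.buckets = Buckets(root_url, token)
        self.tables = Tables(root_url, token)
        self.workspaces = Workspaces(root_url, token)

# client = Client('myapitoken')
# client.buckets.list()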

Architecture

Let's discuss the client architecture here (feel free to stop me if there has been too much discussion and not enough code!).

As mentioned in #3, I'd suggest structuring the client in the following spirit (this is just a mock-up):

import urllib.parse

import requests


class Endpoint:
    """Endpoint implements all HTTP methods and
    serves as a base class for the real endpoints.
    """
    def __init__(self, root, path, token):
        self.root = root
        self.path = path
        self.token = token

    def _get(self, **params):
        return requests.get(
            urllib.parse.urljoin(self.root, self.path),
            headers={'x-storage-token': self.token},
            params=params)


# either make a subclass
class BucketsEndpoint(Endpoint):
    def listBuckets(self, **kwargs):
        return self._get(**kwargs)


class StorageClient:
    def __init__(self, token):
        self.token = token
        self.root = 'http://something.com'
        self.buckets = BucketsEndpoint(self.root, 'buckets', self.token)


# and use like this:

client = StorageClient('apitoken')
client.buckets.listBuckets(id='1234')

but @Ogaday mentioned he was developing his own client, so he might already have an alternative. How do you feel about it?
