
keboola / sapi-python-client


Keboola Connection Storage API client

Home Page: http://docs.keboola.apiary.io/

License: MIT License

Python 99.91% Dockerfile 0.09%

sapi-python-client's People

Contributors

kudj, kukant, odinuv, ogaday, pivnicek, tomasfejfar, ujovlado


sapi-python-client's Issues

Request content-type

Currently application/x-www-form-urlencoded is used everywhere. While acceptable, this may be a problem for requests that are potentially large (such as saving or updating configurations), where multipart/form-data has to be used instead. A simple solution is to use multipart/form-data everywhere and accept the slight performance hit. This probably matters only for POST requests.
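
For illustration, a minimal sketch of the difference with the requests library (the URL below is a placeholder, not a real configuration route):

import requests

url = 'https://connection.keboola.com/v2/storage/some-large-resource'  # placeholder URL
headers = {'X-StorageApi-Token': 'your-token'}
large_config = '{"parameters": {"key": "a very large value"}}'

# Current behaviour: form fields go out as application/x-www-form-urlencoded.
requests.post(url, headers=headers, data={'configuration': large_config})

# Proposed: passing the fields through `files` makes requests encode the body
# as multipart/form-data, which copes better with large payloads.
requests.post(url, headers=headers,
              files={'configuration': (None, large_config)})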

exporting gzipped table prepends ungzipped header

client.tables.export_to_file('in.c-test.sample_table', 'out/tables', is_gzip=True)

results in something like this:

"new_opt_account_vwocategorypicklist","ema_IsfromSuite"
?ٿ�&��$�������K�����uR�IQ�?�dB�e4>x�Ң��x�|>����2YΆTN�|�<'@��9����Y������^X�� jv

Un-gzipping the output table works fine once the plain-text header is removed. What are the options here?
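
I'm not sure how export_to_file assembles the file, but for illustration, here is a guess at the mechanism and one possible fix using only the standard gzip module (file names and sample data are made up):

import gzip

header = '"new_opt_account_vwocategorypicklist","ema_IsfromSuite"\n'
gzipped_body = gzip.compress(b'"a","b"\n"c","d"\n')  # stand-in for the gzipped data downloaded from storage

# What the output above looks like: a plain-text header followed by raw gzip bytes,
# so the file as a whole is not valid gzip.
with open('broken.csv.gz', 'wb') as f:
    f.write(header.encode('utf-8'))
    f.write(gzipped_body)

# One possible fix: gzip the header as well; concatenated gzip members are valid
# and decompress back to header + body in one stream.
with open('fixed.csv.gz', 'wb') as f:
    f.write(gzip.compress(header.encode('utf-8')))
    f.write(gzipped_body)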

Polling job status sleep time

Hello guys,
I believe there is a minor bug in the computation of sleep time between successive API calls.
The retries variable does not get incremented and therefore the sleep time is always 2 seconds.
This does not impact me in any significant way. I just noticed that this might not have been the original intention and wanted to let you know.

def block_until_completed(self, job_id):
    """
    Poll the API until the job is completed.

    Args:
        job_id (str): The id of the job

    Returns:
        response_body: The parsed json from the HTTP response
            containing a storage Job.

    Raises:
        requests.HTTPError: If any API request fails.
    """
    retries = 1
    while True:
        job = self.detail(job_id)
        if job['status'] in ('error', 'success'):
            return job
        time.sleep(min(2 ** retries, 20))

def block_for_success(self, job_id):
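
For reference, a minimal sketch of what I assume was intended, incrementing retries after each poll so the back-off grows up to the 20-second cap:

retries = 1
while True:
    job = self.detail(job_id)
    if job['status'] in ('error', 'success'):
        return job
    time.sleep(min(2 ** retries, 20))
    retries += 1  # without this, the sleep is always 2 ** 1 == 2 seconds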

Function load of Table class does not create table

I am trying to import data into Keboola from a CSV file with the load function of the Tables class. According to the documentation, it should create the table if it does not exist:

    def load(self, table_id, file_path, is_incremental=False, delimiter=',',
             enclosure='"', escaped_by='', columns=None,
             without_headers=False):
        """
        Create a new table from CSV file.

When I try to import data into a table that does not exist, I receive an error: 404 Client Error: Not Found for url. Is this expected, and should the load function be used only for updating existing tables? For now I have to check whether the table exists, create it if it does not, and then load the data (see the sketch below). In my opinion, it should create the table automatically.
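
A minimal sketch of the workaround I mean, assuming a Tables instance as in the examples elsewhere in this repo and using detail() to probe for existence (the exact exception type is an assumption):

import requests

from kbcstorage.tables import Tables

tables = Tables('https://connection.keboola.com', 'your-token')
table_id = 'in.c-test.sample_table'  # made-up table id

try:
    tables.detail(table_id)                  # does the table already exist?
    tables.load(table_id, 'data.csv')        # yes -> load into it
except requests.HTTPError:                   # 404 -> create it from the CSV instead
    tables.create(name='sample_table', bucket_id='in.c-test', file_path='data.csv')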

Contribution guide

Hi guys, this seems awesome and I'd love to contribute my few cents.
Is there a recommended workflow for contributions? I'd like to ask before I start working on a feature.

Right now I have some refactoring ideas. Would you prefer discussing them in advance or should I just make the changes and discuss them in a PR?

Mainly I would

  • rename the methods of the HttpHelper class from camelCase to pep8_case
  • make the Client class a subclass of HttpHelper to avoid awkward calls such as client.http.get_something() (the cleaner version would be client.get_something())
  • some minor changes, like turning HttpHelper.tokenheader() into a @property; IMO it makes more sense to use it as HttpHelper.tokenheader (see the sketch below)

This would change the API a lot and break a lot of existing code, so I'm asking in advance whether you are OK with that.

That's what I got just through quickly skimming the source, feel free to disagree with anything 🍺
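
For the @property point, a minimal sketch of what I mean (the header name and HttpHelper internals here are assumptions, not the actual class):

class HttpHelper:
    def __init__(self, token):
        self._token = token

    @property
    def tokenheader(self):
        # accessed as helper.tokenheader instead of helper.tokenheader()
        return {'X-StorageApi-Token': self._token}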

S3 retries

  • retries for put object and get object are necessary on transient errors
  • also retries for downloading manifest
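
A minimal sketch of the kind of retry wrapper meant here, independent of any particular S3 library (the callables passed to it are placeholders):

import time

def with_retries(func, *args, attempts=5, base_delay=1, **kwargs):
    """Call func, retrying with exponential back-off on transient errors."""
    for attempt in range(1, attempts + 1):
        try:
            return func(*args, **kwargs)
        except Exception:  # narrow this to the transient errors raised by the S3 client in use
            if attempt == attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))

# e.g. with_retries(download_manifest, manifest_url)  # placeholder callable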

Tests!

This thing needs tests.

make api_domain optional parameter in Client class

class Client:
    def __init__(self, token, api_domain='https://connection.keboola.com'):
        pass

so that you can init the client like this

client = Client('your-token')

What are the alternative api_domains anyway?

fix updated api response

======================================================================
FAIL: test_create_file (tests.functional.test_files.TestFiles)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/code/tests/functional/test_files.py", line 67, in test_create_file
    self.assertFalse('credentials' in file_info)
AssertionError: True is not false

Tables exported to csv have wrong headers when specific columns are selected

Hi,
I believe I have spotted a bug in the export_to_file() method of the Tables class.
The exported CSV always gets headers based on all columns of the table, even if I select only a subset of them. Moreover, the order of the header labels is based purely on the table detail in KBC, even though the columns might be exported in a different order depending on the columns argument.
I think this could be fixed by using the columns from the table detail only as a fallback for when the user does not specify any, i.e. by replacing line 422 in kbcstorage/tables.py with this:

            if columns is None:
                columns = table_detail['columns']

fix CS

The code style check doesn't pass with the latest flake8.

Test Enhancements

You may have noticed I've been looking at the tests (#43, #44, #45, #46); I hope these PRs are welcome! I want to make a few more enhancements to the tests to make them slightly easier to work with, and then I'll look at working on the issues in the milestone for v1.0.0.

  • Tidy up tests a little
    • class name ( #43 )
    • Remove deprecation warning ( #44 )
    • tests.base.TestEndpoint.test_get doesn't actually make a request (c8d7872)
  • Separate functional and mock tests into subpackages ( #45 )
  • Use subtests ( #46 )
  • Add support for skipping functional tests when there is no connection (I don't know how this would play with CI; a sketch of one approach follows this list). See:
  • Profile tests to hopefully identify and shorten long-running tests; reducing total test time will reduce development friction ( #47 & #49 )
  • Modify functional tests to allow running concurrently; this will also significantly reduce test time if test modules can be run in parallel. I know that green has support for this, and other test runners like pytest may too. Currently, because each test module makes requests against the same bucket name, they can't run concurrently ( #50 )
  • Increase test coverage - this is a longer term objective
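
A minimal sketch of the skipping idea, assuming the functional tests read their token from an environment variable (the variable name is an assumption):

import os
import unittest

# Skip the whole functional test case when no token/connection is configured.
@unittest.skipUnless(os.environ.get('KBC_TEST_TOKEN'),
                     'KBC_TEST_TOKEN not set; skipping functional tests')
class TestFiles(unittest.TestCase):
    def test_create_file(self):
        ...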

Implement the Client class

As discussed in some comments floating around here, the client would handle passing the base_url and auth_token to the endpoints and bundle all the endpoints in one place, so we are left with this simple and clean usage:

>>> from kbcstorage import Client
>>> client = Client('myapitoken')
>>> client = Client('myapitoken', root_url='https://connection.keboola.com') # alternatively 
>>> client.buckets.list()
# a list of all buckets
>>> client.workspaces.detail(1234)
# etc...

This still allows instantiating endpoints one by one if one wishes to.

from kbcstorage.tables import Tables
from kbcstorage.buckets import Buckets

tables = Tables('https://connection.keboola.com', 'your-token')
# get table data into local file
tables.export_to_file(table_id='in.c-demo.some-table', path_name='/data/')
# save data
tables.create(name='some-table-2', bucket_id='in.c-demo', file_path='/data/some-table')
# list buckets
buckets = Buckets('https://connection.keboola.com', 'your-token')
buckets.list()
# list bucket tables
buckets.list_tables('in.c-demo')
# get table info
tables.detail('in.c-demo.some-table')

I am happy to implement this after we deal with #28
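
A minimal sketch of such a Client, assuming the endpoint constructors keep the (root_url, token) signature shown above (the Workspaces import path is an assumption):

from kbcstorage.buckets import Buckets
from kbcstorage.tables import Tables
from kbcstorage.workspaces import Workspaces  # import path is an assumption


class Client:
    """Bundle the endpoints and share root_url and token between them."""
    def __init__(self, token, root_url='https://connection.keboola.com'):
        self.root_url = root_url
        self.token = token
        self.buckets = Buckets(root_url, token)
        self.tables = Tables(root_url, token)
        self.workspaces = Workspaces(root_url, token)

# client = Client('myapitoken')
# client.buckets.list()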

Architecture

Let's discuss the client architecture here (feel free to stop me if there has been too much discussion and not enough code!).

As mentioned in #3, I'd suggest structuring the client in the following spirit (this is just a mock-up):

import urllib.parse

import requests


class Endpoint:
    """Endpoint implements all HTTP methods and
    serves as a base class for the real endpoints.
    """
    def __init__(self, root, path, token):
        self.root = root
        self.path = path
        self.token = token

    def _get(self, **params):
        return requests.get(
            urllib.parse.urljoin(self.root, self.path),
            headers={'x-storage-token': self.token},
            params=params)


# either make a subclass
class BucketsEndpoint(Endpoint):
    def listBuckets(self, **kwargs):
        return self._get(**kwargs)


class StorageClient:
    def __init__(self, token):
        self.token = token
        self.root = 'http://something.com'
        self.buckets = BucketsEndpoint(self.root, 'buckets', self.token)


# and use like this:

client = StorageClient('apitoken')
client.buckets.listBuckets(id='1234')

but @Ogaday mentioned he was developing his own client, so he might already have an alternative. How do you feel about it?
