datasift-python's People

Contributors

andimiller, arraypad, badmetacoder, chrisyoung77, dugjason, movermeyer, pbutler, quipo, roviedo, zcourts

datasift-python's Issues

Unexpected error

Exception in thread Thread-3:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
    self.run()
  File "/usr/local/lib/python2.7/dist-packages/datasift-0.5.1-py2.7.egg/datasift/streamconsumer_http.py", line 127, in run
    self._sock = resp.fp._sock.fp._sock
AttributeError: addinfourl instance has no attribute '_sock'

Any ideas?

Twisted error on Windows with Python 3.X

A customer was using Python 3.4 in a Windows environment and getting a traceback ending with

from twisted.python import lockfile, failure
File "C:\Python34\lib\site-packages\twisted\python\lockfile.py", line 52, in <module>
_open = file
NameError: name 'file' is not defined

when importing the datasift module.

Apparently this is due to a known problem on Windows with Python 3.X (glyph is a core maintainer of twisted): http://www.scriptscoop.net/t/7d436f5544a8/twisted-work-with-python-3-3.html

Can we update the README advising Windows users to use Python 2.7?

Typo stream_base_url in __init__.py

Here's the diff.

--- __init__.py 2013-03-20 00:14:48.000000000 -0700
+++ __init__.py.patch 2013-03-20 00:20:18.000000000 -0700
@@ -1458,7 +1458,7 @@
     if self._user.use_ssl():
         protocol = 'https'
     if isinstance(self._hashes, list):
-        return "%s://%smulti?hashes=%s" % (protocol, self._user.stream_base_url, ','.join(self._hashes))
+        return "%s://%smulti?hashes=%s" % (protocol, self._user._stream_base_url, ','.join(self._hashes))
     else:
         return "%s://%s%s" % (protocol, self._user._stream_base_url, self._hashes)

Ratelimit causing repeated reconnects

When getting the error: "The rate limit for twitter has been exceeded" the library repeatedly attempts to reconnect without regard for the connection delay.
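
One common way to respect a connection delay is to sleep for an exponentially growing, capped interval between attempts. The sketch below is only an illustration: `backoff_delays` and its parameters are hypothetical names that are not part of the datasift library, and the delay values are assumptions rather than figures recommended by DataSift.

```python
import itertools


def backoff_delays(initial=1.0, factor=2.0, maximum=320.0):
    """Yield an endless series of reconnect delays in seconds.

    The default values are illustrative assumptions only.
    """
    delay = initial
    while True:
        yield min(delay, maximum)
        delay = min(delay * factor, maximum)


# The first five delays a reconnect loop would sleep for.
first_five = list(itertools.islice(backoff_delays(), 5))
```

A reconnect loop would call `time.sleep()` with each value in turn and start a fresh generator after a successful connection.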

No mechanism exists to stop the client gracefully

There does not currently seem to be a way to cleanly stop Client once start_stream_subscriber() has been called. Ctrl+C causes it to throw KeyboardInterrupt exceptions even if handled in the calling code. It looks like this might be because Client uses twisted by calling reactor.run() in client.py but contains no code to call reactor.stop().

PushSubscription: update hash

Currently it is impossible to update the hash of an active push subscription (this may be a limitation of the API).

I think this code:

subscription = datasift_user.get_push_subscription(subscription_id)
subscription._hash = new_hash
subscription.save()

would work if you take the save method in the PushSubscription class:

    def save(self):
        """
        Save changes to the name and output parameters of this subscription.
        """
        params = {
            'id': self.get_id(),
            'name': self.get_name()
        }

        for key in self.get_output_params():
            params['%s%s' % (self.OUTPUT_PARAMS_PREFIX, key)] = self.get_output_param(key)

        self._init(self._user.call_api('push/update', params))

and add hash to params:

    def save(self):
        """
        Save changes to the name and output parameters of this subscription.
        """
        params = {
            'id': self.get_id(),
            'name': self.get_name(),
            'hash': self._hash
        }

        for key in self.get_output_params():
            params['%s%s' % (self.OUTPUT_PARAMS_PREFIX, key)] = self.get_output_param(key)

        self._init(self._user.call_api('push/update', params))

Thanks

WEBSOCKET_HOST decode error on Python 3

On Python 3 (3.4.3) a simple stream client like this:

from __future__ import print_function
from datasift import Client

ds = Client("Username", "API Key")

@ds.on_delete
def on_delete(interaction):
    print( 'Deleted interaction %s ' % interaction)

@ds.on_open
def on_open():
    print( 'Streaming ready, can start subscribing')
    csdl = 'interaction.content contains "music"'
    stream = ds.compile(csdl)['hash']

    @ds.subscribe(stream)
    def subscribe_to_hash(msg):
        print(msg)

@ds.on_closed
def on_close(wasClean, code, reason):
    print( 'Streaming connection closed')

@ds.on_ds_message
def on_ds_message(msg):
    print( 'DS Message %s' % msg)

ds.start_stream_subscriber()

throws the following error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.4/multiprocessing/process.py", line 254, in _bootstrap
    self.run()
  File "/usr/local/lib/python3.4/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/datasift-python/datasift/client.py", line 237, in _stream
    options = ssl.optionsForClientTLS(hostname=WEBSOCKET_HOST.decode("utf-8"))
AttributeError: 'str' object has no attribute 'decode'
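
The traceback shows `.decode("utf-8")` being called on a value that is already a `str` on Python 3. A minimal sketch of a helper that accepts both byte strings and text is shown below. `to_text` and `EXAMPLE_HOST` are hypothetical names for illustration, not part of the library.

```python
def to_text(value, encoding="utf-8"):
    """Return value as text, decoding only when it is a byte string."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Hypothetical host value, used only for illustration.
EXAMPLE_HOST = "stream.example.com"
host_from_str = to_text(EXAMPLE_HOST)
host_from_bytes = to_text(b"stream.example.com")
```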

Socket timeouts appear to cause failure

The python client hardcodes a socket timeout of 5s when it runs the initial request (L75 of streamconsumer_http.py at the time of writing):

resp = urllib2.urlopen(req, None, 5)

This appears to cause problems later on with low-volume streams. Once a socket timeout has occurred, the stream appears to never receive any more data.

Removing that socket timeout fixes the problem, but then the stream thread is difficult to shut down without terminating the whole process.

How often does the DataSift streaming endpoint send its 'connected' keepalives? 2 x that value is probably a sensible value for the timeout, even if it does mean that a stream may take a long time to shut down.
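
For discussion, here is a sketch of a read loop that treats a socket timeout as "no data yet" rather than as a failure, and checks a stop flag on every timeout so the thread can still be shut down. It is written against a plain socket, not the library's urllib2 response object, and `read_chunks` with its parameters is a hypothetical name.

```python
import socket
import threading


def read_chunks(sock, stop_event, timeout=1.0, bufsize=4096):
    """Yield data from sock until the peer closes or stop_event is set."""
    sock.settimeout(timeout)
    while not stop_event.is_set():
        try:
            data = sock.recv(bufsize)
        except socket.timeout:
            # A quiet stream is not an error: loop and re-check the flag.
            continue
        if not data:
            break  # the peer closed the connection
        yield data
```

With this shape the timeout only bounds how long shutdown takes. It no longer decides whether the stream survives a quiet period.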

Failed to install v2.6.0 on OSX using Python 2.7.10

➜  ~  sudo pip install datasift --upgrade
The directory '/Users/jason/Library/Caches/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
The directory '/Users/jason/Library/Caches/pip' or its parent directory is not owned by the current user and caching wheels has been disabled. check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
Collecting datasift
  Downloading datasift-2.6.0.tar.gz
Requirement already up-to-date: requests<3.0.0,>=2.8.0 in /Library/Python/2.7/site-packages (from datasift)
Requirement already up-to-date: autobahn<0.10.0,>=0.9.4 in /Library/Python/2.7/site-packages (from datasift)
Collecting six<2.0.0,>=1.6.0 (from datasift)
  Downloading six-1.10.0-py2.py3-none-any.whl
Collecting twisted<16.0.0,>=14.0.0 (from datasift)
Collecting pyopenssl<0.16.0,>=0.15.1 (from datasift)
  Downloading pyOpenSSL-0.15.1-py2.py3-none-any.whl (102kB)
    100% |████████████████████████████████| 106kB 1.3MB/s
Collecting python-dateutil<3,>=2.1 (from datasift)
  Downloading python_dateutil-2.4.2-py2.py3-none-any.whl (188kB)
    100% |████████████████████████████████| 192kB 1.3MB/s
Requirement already up-to-date: service-identity>=14.0.0 in /Library/Python/2.7/site-packages (from datasift)
Requirement already up-to-date: requests-futures>=0.9.5 in /Library/Python/2.7/site-packages (from datasift)
Collecting ndg-httpsclient>=0.4.0 (from datasift)
  Downloading ndg_httpsclient-0.4.0.tar.gz
Collecting zope.interface>=3.6.0 (from twisted<16.0.0,>=14.0.0->datasift)
Collecting cryptography>=0.7 (from pyopenssl<0.16.0,>=0.15.1->datasift)
  Downloading cryptography-1.1.1-cp27-none-macosx_10_6_intel.whl (1.3MB)
    100% |████████████████████████████████| 1.3MB 359kB/s
Requirement already up-to-date: characteristic>=14.0.0 in /Library/Python/2.7/site-packages (from service-identity>=14.0.0->datasift)
Collecting pyasn1-modules (from service-identity>=14.0.0->datasift)
  Downloading pyasn1_modules-0.0.8-py2.py3-none-any.whl
Collecting pyasn1 (from service-identity>=14.0.0->datasift)
  Downloading pyasn1-0.1.9-py2.py3-none-any.whl
Requirement already up-to-date: futures>=2.1.3 in /Library/Python/2.7/site-packages (from requests-futures>=0.9.5->datasift)
Collecting setuptools (from zope.interface>=3.6.0->twisted<16.0.0,>=14.0.0->datasift)
  Downloading setuptools-18.6.1-py2.py3-none-any.whl (462kB)
    100% |████████████████████████████████| 462kB 1.0MB/s
Collecting enum34 (from cryptography>=0.7->pyopenssl<0.16.0,>=0.15.1->datasift)
Collecting ipaddress (from cryptography>=0.7->pyopenssl<0.16.0,>=0.15.1->datasift)
  Downloading ipaddress-1.0.15-py27-none-any.whl
Collecting idna>=2.0 (from cryptography>=0.7->pyopenssl<0.16.0,>=0.15.1->datasift)
  Downloading idna-2.0-py2.py3-none-any.whl (61kB)
    100% |████████████████████████████████| 61kB 2.4MB/s
Collecting cffi>=1.1.0 (from cryptography>=0.7->pyopenssl<0.16.0,>=0.15.1->datasift)
  Downloading cffi-1.3.1-cp27-none-macosx_10_10_intel.whl (192kB)
    100% |████████████████████████████████| 196kB 1.8MB/s
Collecting pycparser (from cffi>=1.1.0->cryptography>=0.7->pyopenssl<0.16.0,>=0.15.1->datasift)
Installing collected packages: six, setuptools, zope.interface, twisted, enum34, ipaddress, pyasn1, idna, pycparser, cffi, cryptography, pyopenssl, python-dateutil, ndg-httpsclient, datasift, pyasn1-modules
  Found existing installation: six 1.6.1
    Uninstalling six-1.6.1:
      Successfully uninstalled six-1.6.1
  Found existing installation: setuptools 18.0.1
    Uninstalling setuptools-18.0.1:
      Successfully uninstalled setuptools-18.0.1
  Found existing installation: zope.interface 4.1.2
    Uninstalling zope.interface-4.1.2:
      Successfully uninstalled zope.interface-4.1.2
  Found existing installation: Twisted 14.0.2
    Uninstalling Twisted-14.0.2:
      Successfully uninstalled Twisted-14.0.2
  Found existing installation: pyasn1 0.1.8
    Uninstalling pyasn1-0.1.8:
      Successfully uninstalled pyasn1-0.1.8
  Found existing installation: pyOpenSSL 0.13.1
    DEPRECATION: Uninstalling a distutils installed project (pyopenssl) has been deprecated and will be removed in a future version. This is due to the fact that uninstalling a distutils project will only partially uninstall the project.
    Uninstalling pyOpenSSL-0.13.1:
Exception:
Traceback (most recent call last):
  File "/Library/Python/2.7/site-packages/pip/basecommand.py", line 211, in main
    status = self.run(options, args)
  File "/Library/Python/2.7/site-packages/pip/commands/install.py", line 311, in run
    root=options.root_path,
  File "/Library/Python/2.7/site-packages/pip/req/req_set.py", line 640, in install
    requirement.uninstall(auto_confirm=True)
  File "/Library/Python/2.7/site-packages/pip/req/req_install.py", line 716, in uninstall
    paths_to_remove.remove(auto_confirm)
  File "/Library/Python/2.7/site-packages/pip/req/req_uninstall.py", line 125, in remove
    renames(path, new_path)
  File "/Library/Python/2.7/site-packages/pip/utils/__init__.py", line 315, in renames
    shutil.move(old, new)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/shutil.py", line 302, in move
    copy2(src, real_dst)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/shutil.py", line 131, in copy2
    copystat(src, dst)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/shutil.py", line 103, in copystat
    os.chflags(dst, st.st_flags)
OSError: [Errno 1] Operation not permitted: '/tmp/pip-awPwpa-uninstall/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/pyOpenSSL-0.13.1-py2.7.egg-info'

User.list_historics() call not working

The error I get is -

Traceback (most recent call last):
  File "./run-historic.py", line 6, in <module>
    user.list_historics()
  File "/usr/local/lib/python2.7/dist-packages/datasift-0.5.3-py2.7.egg/datasift/__init__.py", line 198, in list_historics
    return Historic.list(self, page, per_page)
  File "/usr/local/lib/python2.7/dist-packages/datasift-0.5.3-py2.7.egg/datasift/__init__.py", line 523, in list
    retval['historics'].append(Historic(user, historic))
  File "/usr/local/lib/python2.7/dist-packages/datasift-0.5.3-py2.7.egg/datasift/__init__.py", line 541, in __init__
    self._init(hash)
  File "/usr/local/lib/python2.7/dist-packages/datasift-0.5.3-py2.7.egg/datasift/__init__.py", line 612, in _init
    raise InvalidDataError('The volume info is missing')
datasift.InvalidDataError: The volume info is missing

Invalid target in creating task

With the combination

target top_level : li.all.mentions.company_name
target child_1: li.subtype
target child_2: li.user.type

(actually for every combination with that top-level target) the library raises an exception

DataSiftApiException: The analysis configuration contains an invalid target: li.all.mentions.company_name

Our request json/dict is the following

{
   "parameters":{
      "child":{
         "child":{
            "parameters":{
               "threshold":5,
               "target":"li.user.type"
            },
            "analysis_type":"freqDist"
         },
         "parameters":{
            "threshold":10,
            "target":"li.subtype"
         },
         "analysis_type":"freqDist"
      },
      "parameters":{
         "threshold":200,
         "target":"li.all.mentions.company_name"
      },
      "analysis_type":"freqDist"
   }
}

The service is LinkedIn.
Unexpectedly, the same call works using the Pylon web interface.

What's happening?
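
To make the nesting in the request above easier to reproduce when testing other target combinations, here is a small sketch that builds the same dict from a list of (target, threshold) pairs. `nested_freq_dist` is a hypothetical helper, not a library function, and it does not explain why the target is rejected.

```python
def nested_freq_dist(levels):
    """Build a nested freqDist request from (target, threshold) pairs.

    The pairs are given outermost level first.
    """
    node = None
    for target, threshold in reversed(levels):
        current = {
            "analysis_type": "freqDist",
            "parameters": {"threshold": threshold, "target": target},
        }
        if node is not None:
            current["child"] = node
        node = current
    return {"parameters": node}


request = nested_freq_dist([
    ("li.all.mentions.company_name", 200),
    ("li.subtype", 10),
    ("li.user.type", 5),
])
```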

output_mapper mapping created_at to short date, expecting long date

My pull method is returning:

"twitter": {
    "created_at": "Mon, 17 Mar 2014 14:29:24 +0000",
    "filter_level": "medium", ...

output_mapper is expecting "created_at" to be in the date_handler_short format, "%Y-%m-%d %H:%M:%S", but my results are in the long format, "%a, %d %b %Y %H:%M:%S +0000".

Getting error: ValueError: time data 'Mon, 17 Mar 2014 15:01:08 +0000' does not match format '%Y-%m-%d %H:%M:%S'
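
One possible workaround is to try each known format in turn instead of assuming a single one. The sketch below uses only the two formats quoted in this issue. `parse_created_at` and `KNOWN_FORMATS` are hypothetical names and do not reflect how output_mapper is implemented.

```python
from datetime import datetime

# The short and long formats quoted in this issue.
KNOWN_FORMATS = (
    "%Y-%m-%d %H:%M:%S",
    "%a, %d %b %Y %H:%M:%S +0000",
)


def parse_created_at(value):
    """Parse value with the first format that matches."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError("unrecognised date format: %r" % value)


long_form = parse_created_at("Mon, 17 Mar 2014 14:29:24 +0000")
short_form = parse_created_at("2014-03-17 14:29:24")
```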

create_definition requires a bytestring

The User.create_definition() method requires a bytestring: there's an explicit isinstance check for str. Ideally it should also accept unicode and encode it itself, so that the application can focus on the data rather than worry about encoding and decoding.

If I get time, I'll prepare a small patch.
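
As a rough idea of what such a patch could do, here is a sketch in Python 3 terms, where `bytes` is the bytestring type and `str` is text (on Python 2 the corresponding types would be `str` and `unicode`). `to_bytes` is a hypothetical helper, and UTF-8 is an assumed encoding.

```python
def to_bytes(csdl, encoding="utf-8"):
    """Return csdl as a bytestring, encoding it only when it is text."""
    if isinstance(csdl, bytes):
        return csdl
    return csdl.encode(encoding)


encoded = to_bytes('interaction.content contains "café"')
unchanged = to_bytes(b'interaction.content contains "music"')
```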

Windows Support

Originally raised in http://dev.datasift.com/discussions/python-api-windows

I have started to use the Python API and it seems to be working fine. However, I have tried to run some of the examples in the documentation with little success.
I am especially interested in the Live-Stream example.
Once I run it, I get this error:

RuntimeError:
Attempt to start a new process before the current process
has finished its bootstrapping phase.

This probably means that you are on Windows and you have
forgotten to use the proper idiom in the main module:
    if __name__ == '__main__':
        freeze_support()
        ...

The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce a Windows executable.

I know that I managed to connect to my account and fetch the status data, so it is not a problem of connectivity.
I am a bit new so I might be missing something basic, but the documentation is scarce for Python users. Any hint or direction would be really appreciated.
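
The idiom the error message refers to looks like the sketch below. It is a generic multiprocessing example, not the Live-Stream example itself: `worker` and `main` are hypothetical names, and the call that starts the stream would go inside `main()` in place of the stand-in process.

```python
import multiprocessing


def worker():
    """Stand-in for the code that runs in the child process."""
    print("child process running")


def main():
    # Anything that starts a new process belongs here, not at module level.
    process = multiprocessing.Process(target=worker)
    process.start()
    process.join()


if __name__ == "__main__":
    # freeze_support() only matters for frozen Windows executables.
    multiprocessing.freeze_support()
    main()
```

On Windows each child process re-imports the main module, so the guard stops the script from trying to start the stream again during that import.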

Cannot install package due to openssl error

Hello!

I can't install the package due to a conflict with my openssl version and the version of pyopenssl you've pinned. Would it be possible to bump it to a version around 0.15 as that installs fine? :)

I managed to get the package installed by using --no-deps and then installing pyopenssl before datasift :)

Here's the related issue pyca/pyopenssl#276

Travis CI build fails for pull requests

Automated builds triggered by pull requests fail with the error:

Please export a github OAUTH token as GITHUB_TOKEN to run these tests

The successful builds from master all export GITHUB_TOKEN as part of the step:

Setting environment variables from .travis.yml

Unable To Post Large Entity of CSDL for validate and compile

Hi,

This is really a server-side issue, but I figured that if the change goes through on the server side, the client would need some modifications as well (hopefully in the near future).

Currently the validate and compile endpoints accept a POST request with URL parameters specifying the CSDL to be validated or compiled. This works well in normal cases. However, if the CSDL is too large for URL parameters to handle, I get a 414 Request-URI Too Long error (which, by the way, is not handled by DataSift's API endpoints: I still get a 200 response code in the header). The real solution, imho, would be to accept the CSDL as an HTTP body payload on the server side, which is what POST is mainly used for anyway.

Kindly,

woozyking

Better handling of permission errors

If a user tries to access an endpoint they do not have access to, the Python client throws an error along the lines of:

Traceback (most recent call last):
  File "historics.py", line 15, in <module>
    print(datasift.historics.status(start, end_time))
  File "/Library/Python/2.7/site-packages/datasift/historics.py", line 102, in status
    return self.request.get('status', params=params)
  File "/Library/Python/2.7/site-packages/datasift/request.py", line 39, in get
    return self.build_response(self('get', path, params=params, headers=headers), path=path)
  File "/Library/Python/2.7/site-packages/datasift/request.py", line 84, in build_response
    if int(response.headers.get("x-ratelimit-cost")) > int(response.headers.get("x-ratelimit-remaining")):
TypeError: int() argument must be a string or a number, not 'NoneType'

A new type of exception needs to be thrown when it sees '"error":"You do not have permission to access this endpoint"' as a response.
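
A sketch of what that check could look like is below. It works on a plain dict of headers and an already-decoded body instead of the library's response object. `EndpointPermissionError` and `rate_limit_exceeded` are hypothetical names, and returning False when the rate-limit headers are absent is an assumption.

```python
class EndpointPermissionError(Exception):
    """Hypothetical exception type for 'no permission' responses."""


def rate_limit_exceeded(headers, body):
    """Return True if the request cost exceeds the remaining allowance.

    Raises EndpointPermissionError when the body carries the permission
    error, and tolerates missing rate-limit headers.
    """
    error = body.get("error", "") if isinstance(body, dict) else ""
    if "do not have permission" in str(error):
        raise EndpointPermissionError(error)
    cost = headers.get("x-ratelimit-cost")
    remaining = headers.get("x-ratelimit-remaining")
    if cost is None or remaining is None:
        return False  # assumption: nothing to compare, so not exceeded
    return int(cost) > int(remaining)
```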

Issue with quickstart tutorial python3.4 str no attribute decode

Running the code from the quickstart tutorial, I get this error:

Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.4/multiprocessing/process.py", line 254, in _bootstrap
    self.run()
  File "/usr/lib/python3.4/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.4/dist-packages/datasift/client.py", line 237, in _stream
    options = ssl.optionsForClientTLS(hostname=WEBSOCKET_HOST.decode("utf-8"))
AttributeError: 'str' object has no attribute 'decode'

If I replace this line in the tutorial's code
client = datasift.Client('DATASIFT_USERNAME', 'DATASIFT_API_KEY')
with
client = datasift.Client(b'DATASIFT_USERNAME', b'DATASIFT_API_KEY')
I get this error instead:

Traceback (most recent call last):
  File "tutorial_datasift.py", line 9, in <module>
    fltr = client.compile(csdl)
  File "/usr/local/lib/python3.4/dist-packages/datasift/client.py", line 256, in compile
    return self.request.post('compile', data=dict(csdl=csdl))
  File "/usr/local/lib/python3.4/dist-packages/datasift/request.py", line 46, in post
    return self.build_response(self('post', path, params=params, headers=headers, data=data), path=path)
  File "/usr/local/lib/python3.4/dist-packages/datasift/request.py", line 86, in build_response
    raise AuthException(data)
datasift.exceptions.AuthException: {'error': 'Authorization failed'}

If I replace line 237 of client.py
options = ssl.optionsForClientTLS(hostname=WEBSOCKET_HOST.decode("utf-8"))
with
options = ssl.optionsForClientTLS(hostname=WEBSOCKET_HOST)
it seems to run.

Issue with twisted

After subscribing to a stream I get this error: "Stream subscriber shutting down because connection was closed uncleanly (peer dropped the TCP connection without previous WebSocket closing handshake)".

Warnings while importing datasift:
c:\Python27\lib\site-packages\zope.interface-4.1.2-py2.7-win32.egg\zope\__init__.py:3: UserWarning: Module twisted was already imported from c:\Python27\lib\site-packages\twisted\__init__.pyc, but c:\python27\lib\site-packages\autobahn-0.9.6-py2.7.egg is being added to sys.path
  import pkg_resources
c:\Python27\lib\site-packages\twisted\internet\win32eventreactor.py:64: UserWarning: Reliable disconnection notification requires pywin32 215 or later
  category=UserWarning)

Can someone suggest a fix for this?

account identity list unexpected data type

The dictionary that we get from client.account.identity.list() has an updated_at value that is an integer, not a datetime object as would be expected.

    {
        "api_key": "dff990e42c14ef5d5aa280b0e9fea9e2",
        "created_at": "Wed, 13 May 2015 10:46:05 GMT",
        "expires_at": null,
        "id": "5dbb799eea004fcb3e2d999d767e0a20",
        "label": "DataSift",
        "master": true,
        "status": "active",
        "updated_at": 1440604653
    }
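
Until the library maps this field, calling code can convert the integer itself, assuming it is a Unix timestamp in seconds (UTC). `epoch_to_datetime` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def epoch_to_datetime(seconds):
    """Convert a Unix timestamp in seconds to an aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# The updated_at value from the response above.
updated_at = epoch_to_datetime(1440604653)
```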

On running out of credit, library continues to try to reconnect.

On receiving the following error:
{"status":"failure","message":"You have insufficient credits available to consume the stream"}

The Python library continues trying to reconnect. It should recognise this message and stop reconnection attempts. We should also check that the same happens when invalid auth credentials are sent.
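
A sketch of the kind of check this would need is below. `should_reconnect` and `FATAL_MARKERS` are hypothetical names, and the list of messages treated as fatal is an assumption based on the errors quoted on this page, not a list published by DataSift.

```python
import json

# Assumed examples of failures that retrying cannot fix.
FATAL_MARKERS = ("insufficient credits", "authorization failed")


def should_reconnect(raw_message):
    """Return False when a failure message means retrying is pointless."""
    try:
        payload = json.loads(raw_message)
    except ValueError:
        return True  # not JSON: treat it as a transient problem
    if not isinstance(payload, dict) or payload.get("status") != "failure":
        return True
    text = str(payload.get("message", "")).lower()
    return not any(marker in text for marker in FATAL_MARKERS)
```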

Exception handler in StreamConsumer_HTTP_Thread is too broad

A large try/except captures much of the reading code, with the except line here:

https://github.com/datasift/datasift-python/blob/develop/datasift/streamconsumer_http.py#L110

This catches everything, including errors in client's handler code, making debugging much harder.

The try/except should be more targeted in terms of the code that it surrounds, and the exception type should be a lot more specific (perhaps just socket exceptions, if that's what it's trying to catch, which is what is implied by the error message).
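
To illustrate the shape being suggested, here is a sketch in which only the read is wrapped and only `OSError` (the base class of socket errors on Python 3) is caught, so exceptions raised by the client's handler propagate normally. `consume`, `read_chunk` and `handle` are hypothetical names, not the thread's real structure.

```python
def consume(read_chunk, handle):
    """Read chunks until the connection ends, passing each to handle."""
    while True:
        try:
            chunk = read_chunk()
        except OSError as exc:
            # Only connection-level failures are caught here.
            return "connection error: %s" % exc
        if not chunk:
            return "connection closed"
        # Outside the try block: handler bugs surface with a full traceback.
        handle(chunk)
```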

problem with python setup install

When running python setup.. library throws authentication error : operator not validate, shutting down to avoid lockup or fatal exception.

push.update fails to update subscription's output params

I've been unable to update a subscription's output params by calling client.push.update. I receive a response with status code 200 but the output_params of the subscription remain unchanged.

I'm passing the following dict to push.update (sensitive values edited away)

output_params: {
        'host': 'host.example.com',
        'port': 22,
        'auth': {
            'username': 'my_username',
            'password': 'my_password'
        },
        'directory': '/path/to/datasift/files',
        'file_prefix': 'datasift',
        'format': 'json_meta',
        'delivery_frequency': 300,
        'max_size': 10485760,
        'mark_in_progress': 0
    }

A prior call to push.validate with this same dict returns Validation Successful.
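
Whether this is the cause has not been established, but one thing that may be worth testing: the older save() method quoted earlier on this page sends each output parameter as a flat key built from a prefix plus the parameter name. If the endpoint expects that flat form, a nested dict could be flattened along these lines. `flatten_params` is a hypothetical helper and the `output_params.` prefix is an assumption.

```python
def flatten_params(params, prefix="output_params."):
    """Flatten a nested dict into dotted keys under the given prefix."""
    flat = {}
    for key, value in params.items():
        name = "%s%s" % (prefix, key)
        if isinstance(value, dict):
            flat.update(flatten_params(value, prefix=name + "."))
        else:
            flat[name] = value
    return flat


flat = flatten_params({
    "host": "host.example.com",
    "auth": {"username": "my_username", "password": "my_password"},
})
```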
