
Comments (8)

dw avatar dw commented on September 18, 2024

Hi Fabio,

There is a fundamental problem with the existing code and its use of urllib2. I have found it impossible to make reliable use of streaming HTTP with any version of Python (behaviour differs across versions, but worse, hangs occur depending on the size of received data frames). My current attempts to work around this are extremely messy (effectively patching socket module internals and urllib2 module internals).

I am working offline on a rewrite that does not use urllib2 (it takes an asynchronous approach instead). Until then, however, the existing code isn't very useful, Python 3 or otherwise. I suggest holding off on any work for a few weeks, as it would likely be wasted.

Thanks for getting in touch.

David

from py-lightstreamer.

lxnay avatar lxnay commented on September 18, 2024

Hi David,
thanks for your prompt answer.
I'll hold off on my patchwork for a while, then.

Out of curiosity, what kind of hang are you seeing? Could you
elaborate a bit more on this (perhaps providing a test case)? I'd love
to help out if possible.
I've seen that you're using threads (as I expected actually), maybe
you're just hitting a deadlock somewhere (I usually map SIGQUIT to a
simple thread dump function [1] to find out). Some parts of the Python
library are still behaving oddly in MT environments at times.
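
For illustration, a minimal sketch of such a handler might look like the following. This is not the pastebin code referenced as [1]; the function names are hypothetical, sys._current_frames() is a CPython implementation detail, and SIGQUIT is assumed to exist on the platform (it does not on Windows).

```python
import signal
import sys
import threading
import traceback


def format_thread_dump():
    """Return a text report with the current stack of every live thread."""
    names = {t.ident: t.name for t in threading.enumerate()}
    parts = []
    for ident, frame in sys._current_frames().items():
        parts.append("Thread %s (%s):" % (names.get(ident, "unknown"), ident))
        parts.append("".join(traceback.format_stack(frame)))
    return "\n".join(parts)


def install_dump_handler(signum=None):
    """Write a thread dump to stderr whenever `signum` is received.

    Must be called from the main thread, as signal.signal() requires.
    """
    if signum is None:
        signum = signal.SIGQUIT  # assumption: a POSIX platform

    def handler(received_signum, frame):
        sys.stderr.write(format_thread_dump())

    signal.signal(signum, handler)
```

After calling install_dump_handler(), sending SIGQUIT to the process (for example with Ctrl-\ in a terminal) should print where each thread is currently blocked, which helps tell a lock-related deadlock from a blocking read.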

Feel free to contact me if you need further help with Lightstreamer (I
can forward your questions to the support team as well if I can).

[1] http://pastebin.com/kuEEqDi4

Regards,

Fabio Erculiani
Software Engineer
www.lightstreamer.com :: Weswit Srl


dw avatar dw commented on September 18, 2024

Hi Fabio,

I haven't got a test case to hand right now, but the general idea was that
in particular versions of Python, fp.readline() on the urllib2 result
object was hanging. I only saw it on a slightly patched Debian 6 machine
(provided by ovh.net) and with a particular lightstreamer server. On my
local OS X machine, the problem did not appear at all.

Looking at socket._fileobject, I concluded that it is probably for the best
if use of this code is avoided altogether. _fileobject.readline() is doing
horrible things to get a line, urllib2 is doing horrible things to give me a
socket._fileobject, and I am in turn doing horrible things to get a
_fileobject with block buffering disabled (requires temporarily modifying
process-global state).

It's not a threading-related deadlock. The hang was on sock.recv(1)
(another terrible smell - socket._fileobject implements readline() by
invoking 1 syscall per byte!).
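
To make the contrast concrete, here is a minimal sketch of a line reader that pulls data in large chunks and splits lines in user space, so one recv() call can serve many lines. The class name and chunk size are hypothetical, and this only addresses the per-byte syscall overhead; it is not claimed to cure the hang described above, whose cause is not established in this thread.

```python
import socket


class LineReader:
    """Read newline-terminated lines from a socket using large recv() calls."""

    def __init__(self, sock, chunk_size=4096):
        self.sock = sock
        self.chunk_size = chunk_size
        self.buf = b""  # bytes received but not yet returned to the caller

    def readline(self):
        """Return the next line including its newline, or b"" at end of stream."""
        while b"\n" not in self.buf:
            chunk = self.sock.recv(self.chunk_size)
            if not chunk:
                # Peer closed the connection: return whatever is left.
                line, self.buf = self.buf, b""
                return line
            self.buf += chunk
        line, _, self.buf = self.buf.partition(b"\n")
        return line + b"\n"
```

The reader still blocks while waiting for data, so it suits a dedicated thread or a socket with a timeout set.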

I'm investigating alternatives at the moment. The problem with the
available asynchronous frameworks is just that - they are all-encompassing
libraries that assume they are first class actors in your design. For
example with Twisted, there is no easy way to make use of twisted.web2
(which provides streaming functionality) without causing the process-global
Twisted reactor to be touched. This isn't acceptable for a library, nor
would a huge dependency like Twisted be either.

Tornado is another option, but it is even less flexible than Twisted. For
the purposes of py-lightstreamer, it may make sense to simply write a small
self-contained select() loop and minimal HTTP client within the library
itself, and run that in a thread. You still get a thread-friendly
interface, but internally the library will multiplex connections onto a
single thread (my initial use case requires 5x simultaneous Lightstreamer
streams per account - the number of threads quickly adds up :)).
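
The multiplexing half of that idea can be sketched roughly as follows (Python 3). This is an illustration only, not the planned rewrite: the StreamPump name and its methods are hypothetical, the sockets are assumed to be already connected, and no HTTP handling is included.

```python
import select
import socket
import threading


class StreamPump:
    """Multiplex several connected sockets onto one background thread."""

    def __init__(self, on_line):
        self.on_line = on_line  # callback(sock, line_bytes), run on the pump thread
        self.buffers = {}       # sock -> bytes received but not yet split into lines
        self.lock = threading.Lock()
        self.stopped = threading.Event()
        self.thread = threading.Thread(target=self._run)
        self.thread.daemon = True

    def add(self, sock):
        with self.lock:
            self.buffers[sock] = b""

    def start(self):
        self.thread.start()

    def stop(self):
        self.stopped.set()
        self.thread.join()

    def _run(self):
        while not self.stopped.is_set():
            with self.lock:
                socks = list(self.buffers)
            if not socks:
                self.stopped.wait(0.1)
                continue
            # Short timeout so stop() and add() are noticed promptly.
            readable, _, _ = select.select(socks, [], [], 0.1)
            for sock in readable:
                self._read(sock)

    def _read(self, sock):
        chunk = sock.recv(4096)
        if not chunk:
            # Connection closed by the peer: stop watching it.
            with self.lock:
                del self.buffers[sock]
            return
        with self.lock:
            data = self.buffers[sock] + chunk
            *lines, rest = data.split(b"\n")
            self.buffers[sock] = rest
        for line in lines:
            self.on_line(sock, line)  # line excludes the trailing b"\n"
```

However many streams are added, only one extra thread is used; callers interact with it through add(), start(), stop() and the callback.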

I'm warm to new ideas. The major problem for me is producing a horrible
library that requires 100 dependencies just to run. My preference would be
for a small, self-contained, reliable, well tested library.

Thanks,

David

On 24 August 2012 16:16, Fabio Erculiani [email protected] wrote:


lxnay avatar lxnay commented on September 18, 2024

Hi David,
have you tried httplib2 [1]? I understand that this lib still uses socket, but maybe it sucks a bit less.
It would be very unfortunate if tons of dependencies were required just to implement a streaming client.

However, recv() on file descriptors (and therefore sockets) blocks by default on Linux. You may simply be hitting this when the receive buffer is empty, and then something else starts messing up the socket state (TCP state?). I'll have a look as soon as I can (probably next week, unlikely this weekend).

[1] http://pypi.python.org/pypi/httplib2


dannyclark avatar dannyclark commented on September 18, 2024

Hi,

I also need to connect to a Lightstreamer server from Python, so I've had a look at some of the issues in this thread and I'll share some thoughts, just in case they're of interest. I did start looking at creating a patch, but given the comment that this was being worked on offline and the fact that I wanted to use a very different concurrency model, I've implemented it from scratch as a separate project here (https://github.com/dannyclark/py-lightstreamer-lite/). It's really just a proof of concept for these ideas, nowhere near as resilient or fully featured as this library; however, it works well enough for the Lightstreamer server I was connecting to, in order to demonstrate the ideas.

  1. I had a look at streaming HTTP without using custom handlers for urllib2, and this example (http://docs.python-requests.org/en/latest/user/advanced/#streaming-requests) seemed a good way to go, so that's what I've used.
  2. For an asynchronous approach, I've kept the concurrency model out of the library, but I've got an example using gevent (https://github.com/dannyclark/py-lightstreamer-lite/blob/master/examples/multiple-sessions-gevent.py) which works.
  3. I haven't come across the socket hanging issue described above (and I realise the requests library uses httplib underneath), so I don't claim to have any solution to that one. I did come across a basic threading deadlock on startup with the example code from the top of the lightstreamer.py source. Maybe that's just because of latency to the Lightstreamer server I was connecting to? In any case, I found I could work around it reliably with a time.sleep(2) after the create_session() and before the send_control(): I'm sure there are more robust ways to do that.
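
One possibly more robust alternative to the fixed sleep in point 3 is to wait on a threading.Event that is set once the session actually exists. The sketch below is generic: SessionGate and its methods are hypothetical names, and whether py-lightstreamer offers a hook from which mark_ready() could be called is an assumption not confirmed in this thread.

```python
import threading


class SessionGate:
    """Let callers block until a session is established or a timeout passes."""

    def __init__(self):
        self._ready = threading.Event()
        self.session_id = None

    def mark_ready(self, session_id):
        # Call this from whichever thread learns that the session exists.
        self.session_id = session_id
        self._ready.set()

    def wait(self, timeout=10.0):
        # Returns True if the session became ready, False on timeout.
        return self._ready.wait(timeout)


if __name__ == "__main__":
    gate = SessionGate()
    # Stand-in for a background thread reporting the new session.
    threading.Timer(0.05, gate.mark_ready, args=("S123",)).start()
    if gate.wait(timeout=5.0):
        print("session ready:", gate.session_id)  # control requests can go out now
    else:
        print("timed out waiting for the session")
```

Unlike a fixed two-second sleep, this returns as soon as the session is ready and reports a timeout explicitly when it is not.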

Happy to combine efforts (i.e. submit this as a patch instead) if you're interested in taking a similar approach to concurrency: I think this is the main difference. It did sound, though, like you were headed down a very different route with e.g. Tornado or Twisted.

Cheers,
Dan.


dw avatar dw commented on September 18, 2024

Hi Danny!

Thanks for getting in touch. I'm impressed with how small your module is,
although there are subtleties in how py-lightstreamer is written that might
be unclear. Firstly, it's designed to be integrated into an existing
message-oriented system where control flow (or process model) is less
important than having reliable actors generating actionable events and
otherwise minding their own business (in this particular system multiple
similar components coexist, and minimizing the "behavioural wrapping" keeps
things neater downstream).

Another divergence is in attempting to thoroughly document and gently
abstract: the Lightstreamer protocol is surprisingly finickity despite its
size, and besides the protocol PDF, there isn't much protecting the
unsuspecting consumer from having a bad time or writing hacky code.
Subtlety appears in many places, e.g. the need to track prior rows for
certain table modes, the string escaping method, the case-sensitivity of
various strings ("raw" vs "RAW", "LS_snapshot=1" vs "LS_snapshot=true"
(leading to an HTTP 500!)), etc. Another factor is the total absence of
good error messages from Lightstreamer itself when something breaks.
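
As an illustration of the prior-row point, the sketch below keeps the last complete row per item and fills in fields that an update did not resend. It deliberately avoids wire details: the UNCHANGED marker, the RowTracker name and the already-decoded list form of an update are all assumptions made for this example, not the library's API or the protocol's encoding.

```python
UNCHANGED = object()  # hypothetical marker for "field not resent in this update"


class RowTracker:
    """Remember the last full row per item so partial updates can be completed."""

    def __init__(self, field_names):
        self.field_names = list(field_names)
        self.rows = {}  # item id -> list of last known values

    def apply(self, item_id, update):
        """Merge `update` into the prior row and return the full row as a dict."""
        if len(update) != len(self.field_names):
            raise ValueError("expected %d fields, got %d"
                             % (len(self.field_names), len(update)))
        prior = self.rows.get(item_id, [None] * len(self.field_names))
        merged = [old if new is UNCHANGED else new
                  for old, new in zip(prior, update)]
        self.rows[item_id] = merged
        return dict(zip(self.field_names, merged))
```

A consumer that skips this bookkeeping would see gaps wherever a value did not change between updates.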

I've been distracted by other work, but my goal would be to finish wrapping
away remaining wire details around one or two higher level classes (an
unfinished offline version has a separate Table class, for example).

A primary reason for my interest in converting to asynchronous IO is the
number of connections required; currently I'd need a minimum of 7 threads
just to maintain a connection to a single account with the service I wrote
the library to consume. To consume multiple accounts across several
providers, the thread count quickly becomes problematic.

I'm not sure I understood the comment about imposing a concurrency model;
it would be nice if you could expand on that. The API is fully
non-blocking, suiting it to consumption in various environments: Twisted,
GTK/Qt, multi-threaded code, or even gevent (modulo the phase of the moon
and assuming its morass of monkey-patches is functioning correctly today).
Indeed it's difficult to have another interface without exporting details
such as waits or network retries in the connection/subscription path to the
consumer.

I'd rather avoid a dependency on external libraries, but if you've avoided
the deadlock problem with Requests, then that definitely sounds like the
path of least resistance for the time being! :) If you're interested, then
by all means please make a patch, otherwise if you wait long enough I may
do it anyway.

Going forward I want to introduce a 'Table' class that gives cleaner
start()/stop()/listen()/delete() methods than the horrible make_op() API,
perhaps replace Dispatcher/WorkQueue with something less lame, although
beyond these and bug fixes in the form of ensuring retries, reconnects, and
fixing the insane-number-of-threads issue, it already feels relatively
useful.

I'd also like to re-licence it from AGPL3 to Apache, as the former is too
restrictive on commercial use.

David

On 12 October 2012 01:13, dannyclark [email protected] wrote:


dannyclark avatar dannyclark commented on September 18, 2024

David,

Thanks for the reply and thanks for explaining some of your design decisions and design goals. I do see your point about the subtleties of the protocol and indeed one of my reasons for getting in touch was so as not to end up just blindly re-implementing any of the work you've done there!

All I meant by "not imposing a concurrency model" was that I didn't want to use threading or multiprocessing or any async I/O library at all in my (minimal) implementation. The choice is left to the user of the library. I appreciate this means a little more work for the user, but they then have free rein to use threads, processes, green threads or whatever makes sense for the application as a whole. I've removed this comment from the README as it was a bit misleading.

I get your point about py-lightstreamer being suitable for different concurrency environments and I've since been able to use gevent with py-lightstreamer, which works fine (after replacing "while True: signal.pause()" with something like "while True: time.sleep(10)" at the end of the example code).

Cheers,
Dan.


femtotrader avatar femtotrader commented on September 18, 2024

see Lightstreamer/Lightstreamer-example-StockList-client-python#2
