
stream.py's Introduction

Hi there 👋

😄 I'm a software engineer & data scientist. I build innovative tools to democratize data science & machine learning.

💬 Ask me about DL/ML/dataviz & distributed systems

🌱 I’m looking to collaborate on SWE/ML/data for social good

🌱 I’m currently learning about real-time data architecture, AI/tech ethics, EV/battery, sustainability

💬 How to reach me: @climate_dad

😄 Pronouns: he/him

🔭 At Panasonic, I built an automated deep learning (AutoML) system for time-series & IoT data, which can train/tune DL models on ~1B data points. It helps data scientists train/tune LSTM, ResNet, Self-Normalizing Network & Mixture Density Network models on data from S3/Parquet with zero sweat :)

🔭 At Arimo (a top-ranked data science startup per FastCompany), I built an "Alexa for big data analytics" system that can answer questions about & visualize large datasets https://youtu.be/3RQDQApgz-4?t=225 (demo ~ 3:45), using Apache Spark, NLP, statistical-graphics best practices & d3.js.

🤾‍ Played with distributed deep learning https://github.com/adatao/tensorspark before Horovod, Ray or tf.distributed came around. I was not a principal instigator of this project, but I provided support & optimization.

🤾‍ I contributed to Golang in the early days, like 10 years ago :). My contribution is an efficient sieve of Eratosthenes using CSP channels, which Rob Pike wanted to keep as a demo/test use case in the main source code since it's quite an interesting concurrent system https://github.com/aht/gosieve.
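
The CSP sieve idea translates neatly to Python, with generators playing the role of channel-connected goroutines: each prime discovered spawns one more filter stage in the chain. This is a sketch of the concept, not a translation of gosieve's actual code.

```python
# Sketch of the concurrent-sieve idea: each prime adds a filter stage,
# just as each goroutine in the Go version filters its input channel.

def numbers(start=2):
    """Generate candidate integers, like the 'generate' goroutine."""
    n = start
    while True:
        yield n
        n += 1

def filtered(source, prime):
    """One 'filter' stage: pass along values not divisible by `prime`."""
    return (n for n in source if n % prime != 0)

def primes(count):
    """Yield the first `count` primes by chaining filter stages."""
    source = numbers()
    for _ in range(count):
        prime = next(source)
        yield prime
        source = filtered(source, prime)
```

For example, `list(primes(5))` yields `[2, 3, 5, 7, 11]`; the elegance (and the inefficiency) is that the pipeline grows by one stage per prime.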

🤾‍ Some Python hacking back in the days when I had lots of free time: fork-exec and pipe with I/O redirection; a lazily-evaluated, parallelizable Python pipeline; agents and functions that modify Python sequences in-place
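
The lazily-evaluated pipeline idea can be sketched as operator-overloaded stages that compose with `>>`. The class and helper names below are illustrative only, not stream.py's actual API.

```python
# Toy analogue of a lazily-evaluated pipeline chained with >>.
# (Illustrative names; not stream.py's real implementation.)

class Stage:
    """Wrap a generator-transforming function so stages compose with >>."""
    def __init__(self, fn):
        self.fn = fn
    def __rrshift__(self, iterable):
        # `iterable >> stage` applies the stage lazily.
        return self.fn(iter(iterable))
    def __rshift__(self, other):
        # `stage_a >> stage_b` composes two stages into one.
        return Stage(lambda it: other.fn(self.fn(it)))

def map_(f):
    return Stage(lambda it: (f(x) for x in it))

def filter_(pred):
    return Stage(lambda it: (x for x in it if pred(x)))

def take(n):
    return Stage(lambda it: (x for _, x in zip(range(n), it)))
```

For example, `list(range(100) >> filter_(lambda x: x % 2 == 0) >> map_(lambda x: x * x) >> take(3))` evaluates to `[0, 4, 16]`, and nothing is computed until the pipeline is consumed.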

🤾‍ I wrote a toy LISP interpreter that supports prefix, postfix & infix operators, just to annoy LISP people https://github.com/aht/olisp
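
As a flavor of how small such an interpreter can be, here is a minimal prefix-only s-expression evaluator. It is a standalone illustration, not olisp's code (which is the one adding the postfix/infix heresy).

```python
# Minimal prefix-only s-expression evaluator (illustration, not olisp).
import operator

OPS = {'+': operator.add, '-': operator.sub,
       '*': operator.mul, '/': operator.truediv}

def tokenize(src):
    return src.replace('(', ' ( ').replace(')', ' ) ').split()

def parse(tokens):
    tok = tokens.pop(0)
    if tok == '(':
        expr = []
        while tokens[0] != ')':
            expr.append(parse(tokens))
        tokens.pop(0)  # drop ')'
        return expr
    return int(tok) if tok.lstrip('-').isdigit() else tok

def evaluate(expr):
    if isinstance(expr, list):
        op, *args = expr
        fn = OPS[op]
        result = evaluate(args[0])
        for a in args[1:]:
            result = fn(result, evaluate(a))  # left-fold over the operands
        return result
    return expr

def lisp(src):
    return evaluate(parse(tokenize(src)))
```

For example, `lisp("(+ 1 (* 2 3))")` returns `7`.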

🤾‍ An esoteric programming gem: a self-hosting Fractran interpreter in 84 fractions. This was one of those things for which the space available is really too small to explain everything...
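
Fractran's whole execution model fits in a few lines: multiply the current integer by the first fraction in the program that yields an integer, and halt when none does. Here is a plain interpreter for illustration; the self-hosting 84-fraction program is the hard part.

```python
# A plain Fractran interpreter: the self-hosting 84-fraction program is the
# hard part, not this loop.
from fractions import Fraction

def fractran(program, n, max_steps=10000):
    """Run a Fractran program (a list of Fractions) on integer n."""
    for _ in range(max_steps):
        for f in program:
            if (n * f).denominator == 1:  # first fraction giving an integer
                n = int(n * f)
                break
        else:
            return n  # no fraction applies: the program halts
    return n
```

For example, the classic one-fraction addition program `[3/2]` turns `2**a * 3**b` into `3**(a+b)`: `fractran([Fraction(3, 2)], 2**3 * 3**1)` returns `81`, i.e. `3**4`.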

🌱 I gave a talk on "Visualization as Data and Data as Visualization" at Strata Hadoop World 2016. As the father of a little girl, I dove into a data-driven story about women leaving coding since the '80s and imagined a world where data & visualizations are easily sharable & infinitely collaborative.

🌱 I gave a talk on "Concurrent programming with Go" circa 2011

🌱 I gave a talk on "whatis git" the stupid content trackercirca 2012

stream.py's People

Contributors

aht


stream.py's Issues

PCollector cannot collect items from ProcessPool

The documentation states:

PCollectors can collect from ForkedFeeder's or ProcessPool's (via system pipes)
...
class PCollector([waittime=0.1]): Collect items from many ForkedFeeder's or ProcessPool's.

However, when you try to pipe the output of a ProcessPool into a PCollector, it fails:

AttributeError: 'ProcessPool' object has no attribute 'outpipe'

As the error suggests, ForkedFeeder has an attribute 'outpipe', but ProcessPool does not.

I'm not sure how I could be calling it incorrectly, but I'd be glad to hear I was.
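
For context, the mechanism the documentation describes can be sketched with the stdlib alone: worker processes push results through OS-level pipes, and a collector polls the receive ends. This mirrors what PCollector does with ForkedFeeder's `outpipe`; the bug is that ProcessPool never exposes an equivalent attribute for the collector to poll.

```python
# Stdlib sketch of collecting from several child processes over system
# pipes -- the pattern PCollector uses with ForkedFeeder's `outpipe`.
import multiprocessing as mp
from multiprocessing.connection import wait

def feeder(conn, items):
    """Child process: send each item down its pipe, then close it."""
    for item in items:
        conn.send(item)
    conn.close()

def collect(batches):
    """Spawn one process per batch and collect everything they send."""
    readers, procs = [], []
    for batch in batches:
        r, w = mp.Pipe(duplex=False)
        p = mp.Process(target=feeder, args=(w, batch))
        p.start()
        w.close()  # parent drops its copy so EOF can propagate
        readers.append(r)
        procs.append(p)
    results = []
    while readers:
        for r in wait(readers):  # block until some pipe has data or EOF
            try:
                results.append(r.recv())
            except EOFError:
                readers.remove(r)  # that feeder is done
    for p in procs:
        p.join()
    return results
```

The collector only needs the receive end of each pipe, which is exactly the attribute ProcessPool fails to provide.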

Neither ProcessPool nor ThreadPool respawn dead workers

If you are processing a stream that has a chance of failure, each worker in your ThreadPool/ProcessPool will eventually hit a failure case and die.

Once all of the workers have died, processing stops.

For example, the "Retrieving web pages concurrently" example only works because its ThreadPool has more workers than failures (4 workers vs. 2 failures). If you reduce the number of workers to 1, it only processes the first two URLs before dying.
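
The failure mode, and the usual workaround, can be sketched with stdlib threads: a worker thread dies with the first uncaught exception unless its loop catches failures and keeps going. This is a generic illustration, not stream.py's code.

```python
# Generic sketch: a worker loop that survives exceptions instead of dying
# on the first failing item (the behavior this issue asks for).
import queue
import threading

def run_pool(func, items, workers=1):
    """Process every item, recording exceptions instead of losing workers."""
    tasks = queue.Queue()
    for item in items:
        tasks.put(item)
    results = queue.Queue()

    def worker():
        while True:
            try:
                item = tasks.get_nowait()
            except queue.Empty:
                return  # no work left
            try:
                results.put((item, func(item)))
            except Exception as exc:
                # Without this catch, the first bad item would kill the
                # thread -- exactly the failure mode described above.
                results.put((item, exc))

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    out = []
    while not results.empty():
        out.append(results.get())
    return out
```

With this wrapping, even a single worker processes the whole stream: failures come back as exception objects rather than silently stopping the pool.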
