
stream.py's Introduction

Hi there 👋

😄 I'm a software engineer & data scientist. I build innovative tools to democratize data science & machine learning.

💬 Ask me about DL/ML/dataviz & distributed systems

🌱 I’m looking to collaborate on SWE/ML/data for social good

🌱 I’m currently learning about real-time data architecture, AI/tech ethics, EV/battery, sustainability

💬 How to reach me: @climate_dad

😄 Pronouns: he/him

🔭 At Panasonic, I built an automated deep learning (AutoML) system for time-series & IoT data, which can train/tune DL models on ~1B data points. It helps data scientists train/tune LSTM, ResNet, Self-Normalizing Network & Mixture Density Network models on data from S3/Parquet with zero sweat :)

🔭 At Arimo (a top-ranked data science startup per FastCompany), I built an "Alexa for big data analytics" system that can answer questions about & visualize large datasets https://youtu.be/3RQDQApgz-4?t=225 (demo ~ 3:45), using Apache Spark, NLP, statistical-graphics best practices & d3.js.

🤾‍ Played with distributed deep learning https://github.com/adatao/tensorspark before Horovod, Ray or tf.distributed came around. I was not a principal instigator of this project, but I provided support & optimization.

🤾‍ I contributed to Golang in the early days, like 10 years ago :). My contribution is an efficient sieve of Eratosthenes using CSP channels, which Rob Pike wanted to keep as a demo/test use case in the main source code since it's quite an interesting concurrent system https://github.com/aht/gosieve.
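
The CSP sieve idea translates neatly to Python, with generators playing the role of channel-connected goroutines: each prime discovered spawns one more filter stage in the chain. This is a sketch of the concept, not a translation of gosieve's actual code.

```python
# Sketch of the concurrent-sieve idea: each prime adds a filter stage,
# just as each goroutine in the Go version filters its input channel.

def numbers(start=2):
    """Generate candidate integers, like the 'generate' goroutine."""
    n = start
    while True:
        yield n
        n += 1

def filtered(source, prime):
    """One 'filter' stage: pass along values not divisible by `prime`."""
    return (n for n in source if n % prime != 0)

def primes(count):
    """Yield the first `count` primes by chaining filter stages."""
    source = numbers()
    for _ in range(count):
        prime = next(source)
        yield prime
        source = filtered(source, prime)
```

For example, `list(primes(5))` yields `[2, 3, 5, 7, 11]`; the elegance (and the inefficiency) is that the pipeline grows by one stage per prime.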

🤾‍ Some Python hacking back in the days when I had lots of free time: fork-exec and pipe with I/O redirection; a lazily-evaluated, parallelizable Python pipeline; agents and functions that modify Python sequences in-place
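
The lazily-evaluated pipeline idea can be sketched as operator-overloaded stages that compose with `>>`. The class and helper names below are illustrative only, not stream.py's actual API.

```python
# Toy analogue of a lazily-evaluated pipeline chained with >>.
# (Illustrative names; not stream.py's real implementation.)

class Stage:
    """Wrap a generator-transforming function so stages compose with >>."""
    def __init__(self, fn):
        self.fn = fn
    def __rrshift__(self, iterable):
        # `iterable >> stage` applies the stage lazily.
        return self.fn(iter(iterable))
    def __rshift__(self, other):
        # `stage_a >> stage_b` composes two stages into one.
        return Stage(lambda it: other.fn(self.fn(it)))

def map_(f):
    return Stage(lambda it: (f(x) for x in it))

def filter_(pred):
    return Stage(lambda it: (x for x in it if pred(x)))

def take(n):
    return Stage(lambda it: (x for _, x in zip(range(n), it)))
```

For example, `list(range(100) >> filter_(lambda x: x % 2 == 0) >> map_(lambda x: x * x) >> take(3))` evaluates to `[0, 4, 16]`, and nothing is computed until the pipeline is consumed.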

🤾‍ I wrote a toy LISP interpreter that supports prefix, postfix & infix operators, just to annoy LISP people https://github.com/aht/olisp
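
As a flavor of how small such an interpreter can be, here is a minimal prefix-only s-expression evaluator. It is a standalone illustration, not olisp's code (which is the one adding the postfix/infix heresy).

```python
# Minimal prefix-only s-expression evaluator (illustration, not olisp).
import operator

OPS = {'+': operator.add, '-': operator.sub,
       '*': operator.mul, '/': operator.truediv}

def tokenize(src):
    return src.replace('(', ' ( ').replace(')', ' ) ').split()

def parse(tokens):
    tok = tokens.pop(0)
    if tok == '(':
        expr = []
        while tokens[0] != ')':
            expr.append(parse(tokens))
        tokens.pop(0)  # drop ')'
        return expr
    return int(tok) if tok.lstrip('-').isdigit() else tok

def evaluate(expr):
    if isinstance(expr, list):
        op, *args = expr
        fn = OPS[op]
        result = evaluate(args[0])
        for a in args[1:]:
            result = fn(result, evaluate(a))  # left-fold over the operands
        return result
    return expr

def lisp(src):
    return evaluate(parse(tokenize(src)))
```

For example, `lisp("(+ 1 (* 2 3))")` returns `7`.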

🤾‍ An esoteric programming gem: a self-hosting Fractran interpreter in 84 fractions. This was one of those things for which the space available is really too small to explain everything...
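
Fractran's whole execution model fits in a few lines: multiply the current integer by the first fraction in the program that yields an integer, and halt when none does. Here is a plain interpreter for illustration; the self-hosting 84-fraction program is the hard part.

```python
# A plain Fractran interpreter: the self-hosting 84-fraction program is the
# hard part, not this loop.
from fractions import Fraction

def fractran(program, n, max_steps=10000):
    """Run a Fractran program (a list of Fractions) on integer n."""
    for _ in range(max_steps):
        for f in program:
            if (n * f).denominator == 1:  # first fraction giving an integer
                n = int(n * f)
                break
        else:
            return n  # no fraction applies: the program halts
    return n
```

For example, the classic one-fraction addition program `[3/2]` turns `2**a * 3**b` into `3**(a+b)`: `fractran([Fraction(3, 2)], 2**3 * 3**1)` returns `81`, i.e. `3**4`.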

🌱 I gave a talk on "Visualization as Data and Data as Visualization" at Strata Hadoop World 2016. As the father of a little girl, I dove into a data-driven story about women leaving coding since the '80s and imagined a world where data & visualizations are easily sharable & infinitely collaborative.

🌱 I gave a talk on "Concurrent programming with Go" circa 2011

🌱 I gave a talk on "whatis git" the stupid content trackercirca 2012

stream.py's People

Contributors

aht


stream.py's Issues

PCollector cannot collect items from ProcessPool

The documentation states:

PCollectors can collect from ForkedFeeder's or ProcessPool's (via system pipes)
...
class PCollector([waittime=0.1]): Collect items from many ForkedFeeder's or ProcessPool's.

However, when you try to pipe the output of a ProcessPool into a PCollector, it fails:

AttributeError: 'ProcessPool' object has no attribute 'outpipe'

As the error suggests, ForkedFeeder has an attribute 'outpipe', but ProcessPool does not.

I'm not sure how I could be calling it incorrectly, but I'd be glad to hear I was.
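
For context, the mechanism the documentation describes can be sketched with the stdlib alone: worker processes push results through OS-level pipes, and a collector polls the receive ends. This mirrors what PCollector does with ForkedFeeder's `outpipe`; the bug is that ProcessPool never exposes an equivalent attribute for the collector to poll.

```python
# Stdlib sketch of collecting from several child processes over system
# pipes -- the pattern PCollector uses with ForkedFeeder's `outpipe`.
import multiprocessing as mp
from multiprocessing.connection import wait

def feeder(conn, items):
    """Child process: send each item down its pipe, then close it."""
    for item in items:
        conn.send(item)
    conn.close()

def collect(batches):
    """Spawn one process per batch and collect everything they send."""
    readers, procs = [], []
    for batch in batches:
        r, w = mp.Pipe(duplex=False)
        p = mp.Process(target=feeder, args=(w, batch))
        p.start()
        w.close()  # parent drops its copy so EOF can propagate
        readers.append(r)
        procs.append(p)
    results = []
    while readers:
        for r in wait(readers):  # block until some pipe has data or EOF
            try:
                results.append(r.recv())
            except EOFError:
                readers.remove(r)  # that feeder is done
    for p in procs:
        p.join()
    return results
```

The collector only needs the receive end of each pipe, which is exactly the attribute ProcessPool fails to provide.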

Neither ProcessPool nor ThreadPool respawn dead workers

If you are processing a stream that has a chance of failure, each worker in your ThreadPool/ProcessPool will eventually hit a failure case and die.

Once all of the workers have died, processing stops.

For example, the "Retrieving web pages concurrently" example only works because its ThreadPool has more workers than failures (4 workers vs. 2 failures). If you reduce the number of workers to 1, it only processes the first two URLs before dying.
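
The failure mode, and the usual workaround, can be sketched with stdlib threads: a worker thread dies with the first uncaught exception unless its loop catches failures and keeps going. This is a generic illustration, not stream.py's code.

```python
# Generic sketch: a worker loop that survives exceptions instead of dying
# on the first failing item (the behavior this issue asks for).
import queue
import threading

def run_pool(func, items, workers=1):
    """Process every item, recording exceptions instead of losing workers."""
    tasks = queue.Queue()
    for item in items:
        tasks.put(item)
    results = queue.Queue()

    def worker():
        while True:
            try:
                item = tasks.get_nowait()
            except queue.Empty:
                return  # no work left
            try:
                results.put((item, func(item)))
            except Exception as exc:
                # Without this catch, the first bad item would kill the
                # thread -- exactly the failure mode described above.
                results.put((item, exc))

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    out = []
    while not results.empty():
        out.append(results.get())
    return out
```

With this wrapping, even a single worker processes the whole stream: failures come back as exception objects rather than silently stopping the pool.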
