mesosphere / rendler
A rendering web crawler for Apache Mesos.
New terminal state introduced in Mesos 0.21.0.
I'm not sure how this happened. I followed the advanced cluster course on the Mesosphere website and built out a four-node cluster. I then decided to try installing RENDLER on it to get a feel for how a custom framework's internals work.
After cloning the repo to my master node, I tried executing the Python script and was greeted with the following import error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: No module named native
So I went looking to see which modules Python is aware of. In the packages folder /usr/lib/python2.7/site-packages/mesos
I see just two module folders:
I don't see a native folder. This module must not have been installed when I followed the advanced course to build out the cluster. I spent a bit of time trying to figure out the issue, then settled on installing the .egg as follows:
# visit https://open.mesosphere.com/downloads/mesos/
# find the latest Python egg for my OS
wget http://downloads.mesosphere.io/master/centos/7/mesos-0.26.0-py2.7-linux-x86_64.egg
sudo easy_install mesos-0.26.0-py2.7-linux-x86_64.egg
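To confirm that the egg actually fixed the import, a quick check like the one below can help. It is a minimal sketch: check_import is a hypothetical helper, not part of Mesos. It uses importlib.import_module, which exists in both Python 2.7 and Python 3, and it performs a real import rather than just locating a file, so a mesos.native extension that is present on disk but fails to load should also be reported.

```python
import importlib


def check_import(name):
    """Try to import `name`; return (True, None) on success or (False, message) on failure."""
    try:
        importlib.import_module(name)
        return True, None
    except ImportError as exc:
        # Raised both for a missing module and for a compiled extension that fails to load.
        return False, str(exc)


if __name__ == "__main__":
    ok, err = check_import("mesos.native")
    print("mesos.native OK" if ok else "mesos.native not importable: " + err)
```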
We should be using the same VirtualBox image instead of a specific / custom box.
It's worth noting that you need a few things installed on your system to get the Python example working.
You will receive this error if you try to run it without the required modules:
File "crawl_executor.py", line 25, in <module>
from bs4 import BeautifulSoup
ImportError: No module named bs4
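# Python modules (crawl_executor.py imports bs4)
sudo pip install wget
sudo pip install beautifulsoup4
sudo pip install html5lib
# development headers needed to build lxml with pip
sudo yum install -y libxml2-devel
sudo yum install -y libxslt-devel
sudo yum install -y python-devel
sudo pip install lxml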
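One thing that trips people up is that the module name in the ImportError (bs4) differs from the pip package name (beautifulsoup4). The Python 3 sketch below checks which of the modules installed above are missing and prints the matching pip command. The REQUIRED mapping and the missing_packages helper are hypothetical illustrations built from the commands above, not part of RENDLER; importlib.util is not available on Python 2.7.

```python
import importlib.util

# Import name -> pip distribution name, taken from the install commands above.
# Note the mismatch for bs4: it is imported as "bs4" but installed as "beautifulsoup4".
REQUIRED = {
    "bs4": "beautifulsoup4",
    "html5lib": "html5lib",
    "lxml": "lxml",
    "wget": "wget",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of required top-level modules that cannot be found."""
    return [pip_name for mod, pip_name in required.items()
            if importlib.util.find_spec(mod) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("sudo pip install " + " ".join(missing))
    else:
        print("All Python dependencies found.")
```

Only top-level names are checked, so find_spec returns None for a missing module instead of raising.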
If PhantomJS is not installed, the render executor fails with errors like the following:
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
self.run()
File "/usr/lib64/python2.7/threading.py", line 764, in run
self.__target(*self.__args, **self.__kwargs)
File "render_executor.py", line 62, in run_task
if call(["phantomjs", "render.js", url, destination]) != 0:
File "/usr/lib64/python2.7/subprocess.py", line 524, in call
return Popen(*popenargs, **kwargs).wait()
File "/usr/lib64/python2.7/subprocess.py", line 711, in __init__
errread, errwrite)
File "/usr/lib64/python2.7/subprocess.py", line 1308, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory
To resolve that, you need a phantomjs binary on your PATH, which may mean building PhantomJS from source. If you can find a binary for your Linux distro, go with that; I used a binary I found for CentOS 7 here. Note that there are some issues with bundling binaries for PhantomJS; see the thread here. If you must build from source, follow the steps below; it can take an hour or so.
# packages needed to build phantomjs from source
sudo yum -y install gcc gcc-c++ make flex bison gperf ruby \
openssl-devel freetype-devel fontconfig-devel libicu-devel sqlite-devel \
libpng-devel libjpeg-devel
git clone --recurse-submodules https://github.com/ariya/phantomjs.git
cd phantomjs
./build.py
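The OSError [Errno 2] in the traceback above comes from subprocess not finding a phantomjs executable. A minimal Python 3 sketch of a friendlier check is shown below: it looks the binary up with shutil.which before running it, using the same argument order as the call in the traceback. The render function and its parameters are hypothetical; the rest of render_executor.py is not shown here and may differ.

```python
import shutil
import subprocess


def render(url, destination, phantomjs="phantomjs", script="render.js"):
    """Run PhantomJS, raising a clear error instead of OSError [Errno 2]
    when the binary is not on PATH. Returns True if the exit code is 0."""
    exe = shutil.which(phantomjs)
    if exe is None:
        raise RuntimeError(
            "%r not found on PATH; install a PhantomJS binary or build it from source"
            % phantomjs)
    # Same argument order as call(["phantomjs", "render.js", url, destination]).
    return subprocess.call([exe, script, url, destination]) == 0
```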
The executor also throws a warning about not explicitly specifying a parser for BS4, and the warning appears to halt the script:
Executor registered on slave 586d51bc-408a-4191-bce7-8527a6c0f2f4-S0
/usr/lib/python2.7/site-packages/bs4/__init__.py:166: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.
To get rid of this warning, change this (see PR #41):
BeautifulSoup([your markup])
to this:
BeautifulSoup([your markup], "lxml")
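If the executor might run on machines where lxml is not installed, one option is to make the parser choice explicit in code, as in the Python 3 sketch below. The choose_parser and make_soup helpers are hypothetical, not part of RENDLER; "html.parser" is the parser built into Python that BeautifulSoup can use.

```python
import importlib.util


def choose_parser():
    """Pick a parser name to pass to BeautifulSoup explicitly.

    Prefers "lxml" (installed above) and falls back to Python's built-in
    "html.parser", so the choice is made in code rather than guessed by bs4.
    """
    return "lxml" if importlib.util.find_spec("lxml") is not None else "html.parser"


def make_soup(markup):
    # Imported here so choose_parser() still works where bs4 is absent.
    from bs4 import BeautifulSoup
    return BeautifulSoup(markup, choose_parser())
```

Falling back to html.parser reintroduces exactly the cross-system difference the warning describes, so if consistent results matter, passing "lxml" unconditionally as shown above (and letting bs4 raise FeatureNotFound when lxml is missing) is the stricter choice.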