Comments (7)
The code works with PySpark 2.3.1 if you don't specify the `--py-files dependencies.zip` option. I'm not sure why it works with PySpark 2.2.1; it may have something to do with the directory structure or the files in the dependencies directory.
from pyspark-example-project.
OK, I found the issue: dependencies.zip contains a copy of py4j. If you remove the py4j directory from dependencies.zip, it works with PySpark 2.3.1.
Also, note that the contents of the dependencies directory don't match dependencies.zip.
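If you want to strip py4j out of the archive without rebuilding it from scratch, something like this works (a sketch using the standard-library `zipfile` module; the `strip_dir` helper name is mine, and the archive/prefix names are taken from this thread):

```python
import os
import zipfile

def strip_dir(archive, prefix):
    """Rewrite `archive` in place, dropping every entry under `prefix`."""
    tmp = archive + ".tmp"
    # zipfile cannot delete entries in place, so copy the survivors
    # into a fresh archive and swap it over the original.
    with zipfile.ZipFile(archive) as zin, \
         zipfile.ZipFile(tmp, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            if not item.filename.startswith(prefix):
                zout.writestr(item, zin.read(item.filename))
    os.replace(tmp, archive)

# e.g. strip_dir("dependencies.zip", "py4j/")
```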
Good morning, Peter.
Many thanks for trying to help me. I followed your suggestion and completely removed the py4j directory from the archive. Now I am facing the error `'module' object has no attribute 'Logger'` (see below). Any ideas on what might cause this?
```
(py2715env) al$ spark-submit --master local[*] --files etl_config.json etl_job.py
2018-08-27 10:21:27 WARN Utils:66 - Your hostname, al resolves to a loopback address: 127.0.0.1; using 172.21.80.16 instead (on interface en0)
2018-08-27 10:21:27 WARN Utils:66 - Set SPARK_LOCAL_IP if you need to bind to another address
2018-08-27 10:21:28 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Traceback (most recent call last):
  File "/Users/al/Desktop/dev/Spark/etl/etl_job.py", line 41, in <module>
    from pyspark import SparkFiles
  File "/usr/local/Cellar/apache-spark/2.3.1/libexec/python/lib/pyspark.zip/pyspark/__init__.py", line 46, in <module>
  File "/usr/local/Cellar/apache-spark/2.3.1/libexec/python/lib/pyspark.zip/pyspark/context.py", line 31, in <module>
  File "/usr/local/Cellar/apache-spark/2.3.1/libexec/python/lib/pyspark.zip/pyspark/accumulators.py", line 97, in <module>
  File "/usr/local/Cellar/apache-spark/2.3.1/libexec/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 221, in <module>
  File "/usr/local/Cellar/apache-spark/2.3.1/libexec/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 885, in CloudPickler
AttributeError: 'module' object has no attribute 'Logger'
2018-08-27 10:21:28 INFO ShutdownHookManager:54 - Shutdown hook called
2018-08-27 10:21:28 INFO ShutdownHookManager:54 - Deleting directory /private/var/folders/2k/1p2m03494rz7b16rp96zwf8m_86r_h/T/spark-f3a68ca0-d982-4390-b24a-30cbaa674ba4
```
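One possible cause of an error like this is something on the Python path shadowing the standard-library `logging` module that pyspark's bundled cloudpickle relies on. A quick diagnostic (my own suggestion, not part of the project) is to check which `logging` actually gets imported:

```python
import logging

# If this prints a path inside your project or inside dependencies.zip
# rather than the standard library, a local file or package named
# `logging` is shadowing the stdlib module.
print(logging.__file__)

# The stdlib module defines Logger, which cloudpickle expects:
print(hasattr(logging, "Logger"))
```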
Thanks, Peter.
I did what you suggested, but the error is still there (see below). What else should I check?
```
(py2715env) al$ spark-submit --master local[*] --py-files dependencies.zip --files etl_config.json etl_job.py
2018-08-27 12:41:04 WARN Utils:66 - Your hostname, al resolves to a loopback address: 127.0.0.1; using 172.21.80.16 instead (on interface en0)
2018-08-27 12:41:04 WARN Utils:66 - Set SPARK_LOCAL_IP if you need to bind to another address
2018-08-27 12:41:04 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Traceback (most recent call last):
  File "/Users/al/Desktop/dev/Spark/etl/etl_job.py", line 41, in <module>
    from pyspark import SparkFiles
  File "/usr/local/Cellar/apache-spark/2.3.1/libexec/python/lib/pyspark.zip/pyspark/__init__.py", line 46, in <module>
  File "/usr/local/Cellar/apache-spark/2.3.1/libexec/python/lib/pyspark.zip/pyspark/context.py", line 31, in <module>
  File "/usr/local/Cellar/apache-spark/2.3.1/libexec/python/lib/pyspark.zip/pyspark/accumulators.py", line 97, in <module>
  File "/usr/local/Cellar/apache-spark/2.3.1/libexec/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 221, in <module>
  File "/usr/local/Cellar/apache-spark/2.3.1/libexec/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 885, in CloudPickler
AttributeError: 'module' object has no attribute 'Logger'
2018-08-27 12:41:05 INFO ShutdownHookManager:54 - Shutdown hook called
2018-08-27 12:41:05 INFO ShutdownHookManager:54 - Deleting directory /private/var/folders/2k/1p2m03494rz7b16rp96zwf8m_86r_h/T/spark-85c1f328-9ddf-4130-bce2-479378ab0312
(py2715env) al$
```
Hello.
I have replicated the (original) error on my side. I'll try to fix it today or later this week.
Okay, delete dependencies.zip and then re-build it on your system using:
`./build_dependencies.sh dependencies venv`
assuming you've named the folders the same way that I have. It should then work. I will probably remove dependencies.zip from the repo, as it really ought to be built locally rather than kept under source control.
from pyspark-example-project.