
Comments (7)

pwrose commented on May 19, 2024

The code works with pyspark 2.3.1 if you don't specify the --py-files dependencies.zip option. I'm not sure why it works with pyspark 2.2.1. It may have something to do with the directory structure or files in the dependencies directory.

from pyspark-example-project.

pwrose commented on May 19, 2024

Ok, I found the issue. There is a copy of py4j inside the dependencies.zip file. If you remove the py4j directory from dependencies.zip, then it works with pyspark 2.3.1.

Also, note that the contents of the dependencies directory don't match dependencies.zip.
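One way to strip py4j out of the archive, as suggested above, is `zip -d`. A minimal sketch, assuming py4j sits at the top level of dependencies.zip (the module and file names below are illustrative, not the repo's actual contents):

```shell
# Build a throwaway archive laid out like the one described above:
# a bundled py4j/ alongside the project's own modules.
mkdir -p work/py4j work/mypkg
echo "pass" > work/py4j/java_gateway.py
echo "pass" > work/mypkg/helpers.py
(cd work && zip -qr ../dependencies.zip py4j mypkg)

# Delete the bundled py4j so spark-submit falls back to the copy
# shipped with pyspark itself.
zip -d dependencies.zip "py4j/*"

# List the remaining entries to confirm py4j is gone.
unzip -l dependencies.zip
```

After the delete, only the project's own modules remain in the archive.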


alexlusher commented on May 19, 2024

Good morning, Peter.

Many thanks for trying to help me. I followed your suggestion and completely removed the py4j directory from the archive. Now I am facing the error `'module' object has no attribute 'Logger'` (see below). Any ideas on what might cause this?

    (py2715env) al$ spark-submit --master local[*] --files etl_config.json etl_job.py
    2018-08-27 10:21:27 WARN Utils:66 - Your hostname, al resolves to a loopback address: 127.0.0.1; using 172.21.80.16 instead (on interface en0)
    2018-08-27 10:21:27 WARN Utils:66 - Set SPARK_LOCAL_IP if you need to bind to another address
    2018-08-27 10:21:28 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Traceback (most recent call last):
      File "/Users/al/Desktop/dev/Spark/etl/etl_job.py", line 41, in <module>
        from pyspark import SparkFiles
      File "/usr/local/Cellar/apache-spark/2.3.1/libexec/python/lib/pyspark.zip/pyspark/__init__.py", line 46, in <module>
      File "/usr/local/Cellar/apache-spark/2.3.1/libexec/python/lib/pyspark.zip/pyspark/context.py", line 31, in <module>
      File "/usr/local/Cellar/apache-spark/2.3.1/libexec/python/lib/pyspark.zip/pyspark/accumulators.py", line 97, in <module>
      File "/usr/local/Cellar/apache-spark/2.3.1/libexec/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 221, in <module>
      File "/usr/local/Cellar/apache-spark/2.3.1/libexec/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 885, in CloudPickler
    AttributeError: 'module' object has no attribute 'Logger'
    2018-08-27 10:21:28 INFO ShutdownHookManager:54 - Shutdown hook called
    2018-08-27 10:21:28 INFO ShutdownHookManager:54 - Deleting directory /private/var/folders/2k/1p2m03494rz7b16rp96zwf8m_86r_h/T/spark-f3a68ca0-d982-4390-b24a-30cbaa674ba4
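For what it's worth, one common way to produce this exact AttributeError is a module that shadows the standard library's logging module somewhere on sys.path (e.g. a stray logging.py in the working directory, or a bad package inside a --py-files archive), since cloudpickle expects logging.Logger to exist at import time. A quick sketch of the shadowing effect (the file and directory here are illustrative, not taken from the issue):

```shell
# An empty logging.py in the current directory shadows the stdlib
# logging module for "python3 -c" runs, so logging.Logger disappears --
# the same symptom cloudpickle trips over in the traceback above.
dir=$(mktemp -d)
: > "$dir/logging.py"
(cd "$dir" && python3 -c "import logging; print(hasattr(logging, 'Logger'))")
```

This prints `False`; remove the shadowing file and the same check prints `True`.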


pwrose commented on May 19, 2024


alexlusher commented on May 19, 2024

Thanks, Peter.

I did what you suggested, but the error is still there (see below). What else should I check?

    (py2715env) al$ spark-submit --master local[*] --py-files dependencies.zip --files etl_config.json etl_job.py
    2018-08-27 12:41:04 WARN Utils:66 - Your hostname, al resolves to a loopback address: 127.0.0.1; using 172.21.80.16 instead (on interface en0)
    2018-08-27 12:41:04 WARN Utils:66 - Set SPARK_LOCAL_IP if you need to bind to another address
    2018-08-27 12:41:04 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Traceback (most recent call last):
      File "/Users/al/Desktop/dev/Spark/etl/etl_job.py", line 41, in <module>
        from pyspark import SparkFiles
      File "/usr/local/Cellar/apache-spark/2.3.1/libexec/python/lib/pyspark.zip/pyspark/__init__.py", line 46, in <module>
      File "/usr/local/Cellar/apache-spark/2.3.1/libexec/python/lib/pyspark.zip/pyspark/context.py", line 31, in <module>
      File "/usr/local/Cellar/apache-spark/2.3.1/libexec/python/lib/pyspark.zip/pyspark/accumulators.py", line 97, in <module>
      File "/usr/local/Cellar/apache-spark/2.3.1/libexec/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 221, in <module>
      File "/usr/local/Cellar/apache-spark/2.3.1/libexec/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 885, in CloudPickler
    AttributeError: 'module' object has no attribute 'Logger'
    2018-08-27 12:41:05 INFO ShutdownHookManager:54 - Shutdown hook called
    2018-08-27 12:41:05 INFO ShutdownHookManager:54 - Deleting directory /private/var/folders/2k/1p2m03494rz7b16rp96zwf8m_86r_h/T/spark-85c1f328-9ddf-4130-bce2-479378ab0312
    (py2715env) al$


AlexIoannides commented on May 19, 2024

Hello.

I have replicated the (original) error on my side. I'll try to fix it today or later this week.


AlexIoannides commented on May 19, 2024

Okay, delete dependencies.zip and then re-build it on your system using:

 ./build_dependencies.sh dependencies venv

Assuming you've named the folders the same way I have, it should then work. I will probably remove dependencies.zip from the repo, as it really ought to be built locally rather than kept under source control.
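The rebuild essentially amounts to zipping the contents of the dependencies folder so that modules sit at the top level of the archive, where --py-files expects them. A minimal sketch of that idea, not the repo's actual build_dependencies.sh script:

```shell
# Illustrative rebuild (assumes a ./dependencies folder, per the
# comment above): zip its *contents*, not the folder itself, so
# modules land at the archive root; exclude any bundled py4j.
mkdir -p dependencies
rm -f dependencies.zip
(cd dependencies && zip -qr ../dependencies.zip . -x "py4j/*")

# Verify that entry names start at the module level, e.g. mypkg/...
unzip -l dependencies.zip
```

Running `cd` inside the subshell is what keeps the folder name itself out of the entry paths; zipping `dependencies/` directly would nest everything one level too deep for --py-files.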

