Bug
Expected behaviour
import wget succeeds and the notebook cell downloads iris.data.
Current behaviour
Running import wget in the notebook fails with:
ModuleNotFoundError: No module named 'wget'
Steps to reproduce
- Step 1
git clone (this repo)
docker-compose up
- Step 2
open JupyterLab at localhost:8888
- Step 3
Follow instructions:
In [1]:
from pyspark.sql import SparkSession
spark = SparkSession.\
builder.\
appName("pyspark-notebook").\
master("spark://spark-master:7077").\
config("spark.executor.memory", "512m").\
getOrCreate()
In [2]:
import wget
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
wget.download(url)
The import wget line fails with:
ModuleNotFoundError: No module named 'wget'
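To confirm that the module is really missing from the kernel's environment, and to get past cell In [2] in the meantime, something like the following could be run in the notebook. It is only a sketch: wget_available and fetch are hypothetical helper names, not part of the repo, and fetch uses the standard library instead of the wget module.

```python
import importlib.util
import shutil
import urllib.request
from pathlib import Path


def wget_available() -> bool:
    """Return True if the 'wget' module can be found by this kernel's interpreter."""
    return importlib.util.find_spec("wget") is not None


def fetch(url: str, dest: str) -> Path:
    """Download url to dest using only the standard library (hypothetical helper)."""
    target = Path(dest)
    # Stream the response body straight to disk instead of holding it in memory.
    with urllib.request.urlopen(url) as response, target.open("wb") as out:
        shutil.copyfileobj(response, out)
    return target


# Usage in the notebook (needs network access from the container):
# print("wget importable:", wget_available())
# url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
# fetch(url, "iris.data")
```

If wget_available() returns False, the package is simply not installed in the image, which matches the traceback above.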
Possible solutions (optional)
- Install pip in the base Dockerfile (apt-get install -y python3-pip; Debian has no package named pip3), then run pip3 install wget.
- I tried the above, but I am getting errors pulling down the Scala deb from https://www.lightbend.com/
Which brings me to another question: why are you using a bespoke Scala deb hosted on some random server? Attempts to rebuild docker/base/Dockerfile are failing because (I think) the Scala deb is no longer there:
Processing triggers for libc-bin (2.28-10) ...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   243    0   243    0     0    682      0 --:--:-- --:--:-- --:--:--   680
WARNING: apt does not have a stable CLI interface. Use with caution in scripts.
Reading package lists...
E: Sub-process Popen returned an error code (2)
E: Encountered a section with no Package: header
E: Problem with MergeList /scala.deb
E: The package lists or status file could not be parsed or opened.
The command '/bin/sh -c mkdir -p ${shared_workspace}/data && mkdir -p /usr/share/man/man1 && apt-get update -y && apt-get install -y curl python3 r-base && ln -s /usr/bin/python3 /usr/bin/python && curl https://downloads.lightbend.com/scala/${scala_version}/scala-${scala_version}.deb -k -o scala.deb && apt install -y ./scala.deb && rm -rf scala.deb /var/lib/apt/lists/*' returned a non-zero code: 100
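The curl progress line in the log shows only 243 bytes received, which suggests that scala.deb is an error or redirect page rather than a real package. That is an assumption, since the log does not show the file's contents. A .deb file is an ar archive and starts with the 8-byte magic string "!<arch>" followed by a newline, so a quick check along these lines could tell the two cases apart. The function name looks_like_deb is hypothetical.

```python
from pathlib import Path

# Every .deb is an ar archive, and ar archives begin with these 8 bytes.
AR_MAGIC = b"!<arch>\n"


def looks_like_deb(path: str) -> bool:
    """Return True if the file at path starts with the ar archive magic bytes."""
    with Path(path).open("rb") as f:
        return f.read(len(AR_MAGIC)) == AR_MAGIC


# Usage, run next to the downloaded file:
# print(looks_like_deb("scala.deb"))
```

If this returns False for the downloaded file, the apt errors are a symptom of the download, not of apt itself. In the Dockerfile, adding curl's --fail and --location options would probably make the build stop at the download step (or follow a redirect) instead of handing a non-package file to apt.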
Checklist
Output of docker version and docker-compose version:
Client: Docker Engine - Community
Version: 20.10.1
API version: 1.41
Go version: go1.13.15
Git commit: 831ebea
Built: Tue Dec 15 04:34:58 2020
OS/Arch: linux/amd64
Context: default
Experimental: true
Server: Docker Engine - Community
Engine:
Version: 20.10.1
API version: 1.41 (minimum version 1.12)
Go version: go1.13.15
Git commit: f001486
Built: Tue Dec 15 04:32:52 2020
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.4.3
GitCommit: 269548fa27e0089a8b8278fc4fc781d7f65a939b
runc:
Version: 1.0.0-rc92
GitCommit: ff819c7e9184c13b7c2607fe6c30ae19403a7aff
docker-init:
Version: 0.19.0
GitCommit: de40ad0
docker-compose version 1.25.0, build unknown
docker-py version: 4.1.0
CPython version: 3.8.5
OpenSSL version: OpenSSL 1.1.1f 31 Mar 2020