
code-challenge-2020's People

Contributors

haikane, kayibal, lusob, michcio1234, pedrocwb, pipatth, sienkowski, superlaza

code-challenge-2020's Issues

Cannot download data with docker-compose

Dear all,

when running "sudo docker-compose up orchestrator", I get the output posted below, but no file appears in
/usr/share/data/raw/ (in fact, there is no data/ directory in /usr/share). There is no file in data_root/raw either.

More info:
OS: Linux Mint 19.3 Tricia (based on Ubuntu Bionic)
Docker version 19.03.12, build 48a66213fe
docker-compose version 1.27.1, build 509cfb99

Using "docker network ls", I can see the network named "code-challenge-2020_default".

--
Here is the command output

$ sudo docker-compose up orchestrator
WARNING: The PWD variable is not set. Defaulting to a blank string.
Creating network "code-challenge-2020_default" with the default driver
Creating code-challenge-2020_dask-scheduler_1 ... done
Creating code-challenge-2020_luigid_1 ... done
Creating code-challenge-2020_orchestrator_1 ... done
Attaching to code-challenge-2020_orchestrator_1
orchestrator_1 | DEBUG: Checking if DownloadData(no_remove_finished=False, fname=wine_dataset, out_dir=/usr/share/data/raw/, url=https://github.com/datarevenue-berlin/code-challenge-2019/releases/download/0.1.0/dataset_sampled.csv) is complete
orchestrator_1 | WARNING: Failed connecting to remote scheduler 'http://luigid:8082'
orchestrator_1 | Traceback (most recent call last):
orchestrator_1 | File "/usr/local/lib/python3.6/site-packages/urllib3/connection.py", line 160, in _new_conn
orchestrator_1 | (self._dns_host, self.port), self.timeout, **extra_kw
orchestrator_1 | File "/usr/local/lib/python3.6/site-packages/urllib3/util/connection.py", line 84, in create_connection
orchestrator_1 | raise err
orchestrator_1 | File "/usr/local/lib/python3.6/site-packages/urllib3/util/connection.py", line 74, in create_connection
orchestrator_1 | sock.connect(sa)
orchestrator_1 | ConnectionRefusedError: [Errno 111] Connection refused
orchestrator_1 |
orchestrator_1 | During handling of the above exception, another exception occurred:
orchestrator_1 |
orchestrator_1 | Traceback (most recent call last):
orchestrator_1 | File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 677, in urlopen
orchestrator_1 | chunked=chunked,
orchestrator_1 | File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 392, in _make_request
orchestrator_1 | conn.request(method, url, **httplib_request_kw)
orchestrator_1 | File "/usr/local/lib/python3.6/http/client.py", line 1287, in request
orchestrator_1 | self._send_request(method, url, body, headers, encode_chunked)
orchestrator_1 | File "/usr/local/lib/python3.6/http/client.py", line 1333, in _send_request
orchestrator_1 | self.endheaders(body, encode_chunked=encode_chunked)
orchestrator_1 | File "/usr/local/lib/python3.6/http/client.py", line 1282, in endheaders
orchestrator_1 | self._send_output(message_body, encode_chunked=encode_chunked)
orchestrator_1 | File "/usr/local/lib/python3.6/http/client.py", line 1042, in _send_output
orchestrator_1 | self.send(msg)
orchestrator_1 | File "/usr/local/lib/python3.6/http/client.py", line 980, in send
orchestrator_1 | self.connect()
orchestrator_1 | File "/usr/local/lib/python3.6/site-packages/urllib3/connection.py", line 187, in connect
orchestrator_1 | conn = self._new_conn()
orchestrator_1 | File "/usr/local/lib/python3.6/site-packages/urllib3/connection.py", line 172, in _new_conn
orchestrator_1 | self, "Failed to establish a new connection: %s" % e
orchestrator_1 | urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f1dbb74c320>: Failed to establish a new connection: [Errno 111] Connection refused
orchestrator_1 |
orchestrator_1 | During handling of the above exception, another exception occurred:
orchestrator_1 |
orchestrator_1 | Traceback (most recent call last):
orchestrator_1 | File "/usr/local/lib/python3.6/site-packages/requests/adapters.py", line 449, in send
orchestrator_1 | timeout=timeout
orchestrator_1 | File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 727, in urlopen
orchestrator_1 | method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
orchestrator_1 | File "/usr/local/lib/python3.6/site-packages/urllib3/util/retry.py", line 439, in increment
orchestrator_1 | raise MaxRetryError(_pool, url, error or ResponseError(cause))
orchestrator_1 | urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='luigid', port=8082): Max retries exceeded with url: /api/add_task (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f1dbb74c320>: Failed to establish a new connection: [Errno 111] Connection refused',))
orchestrator_1 |
orchestrator_1 | During handling of the above exception, another exception occurred:
orchestrator_1 |
orchestrator_1 | Traceback (most recent call last):
orchestrator_1 | File "/usr/local/lib/python3.6/site-packages/luigi/rpc.py", line 163, in _fetch
orchestrator_1 | response = self._fetcher.fetch(full_url, body, self._connect_timeout)
orchestrator_1 | File "/usr/local/lib/python3.6/site-packages/luigi/rpc.py", line 116, in fetch
orchestrator_1 | resp = self.session.post(full_url, data=body, timeout=timeout)
orchestrator_1 | File "/usr/local/lib/python3.6/site-packages/requests/sessions.py", line 578, in post
orchestrator_1 | return self.request('POST', url, data=data, json=json, **kwargs)
orchestrator_1 | File "/usr/local/lib/python3.6/site-packages/requests/sessions.py", line 530, in request
orchestrator_1 | resp = self.send(prep, **send_kwargs)
orchestrator_1 | File "/usr/local/lib/python3.6/site-packages/requests/sessions.py", line 643, in send
orchestrator_1 | r = adapter.send(request, **kwargs)
orchestrator_1 | File "/usr/local/lib/python3.6/site-packages/requests/adapters.py", line 516, in send
orchestrator_1 | raise ConnectionError(e, request=request)
orchestrator_1 | requests.exceptions.ConnectionError: HTTPConnectionPool(host='luigid', port=8082): Max retries exceeded with url: /api/add_task (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f1dbb74c320>: Failed to establish a new connection: [Errno 111] Connection refused',))
orchestrator_1 | INFO: Retrying attempt 2 of 3 (max)
orchestrator_1 | INFO: Wait for 30 seconds
orchestrator_1 | INFO: Informed scheduler that task DownloadData_wine_dataset_False__usr_share_data__79bc385f2e has status PENDING
orchestrator_1 | INFO: Done scheduling tasks
orchestrator_1 | INFO: Running Worker with 1 processes
orchestrator_1 | DEBUG: Asking scheduler for work...
orchestrator_1 | DEBUG: Pending tasks: 1
orchestrator_1 | INFO: [pid 1] Worker Worker(salt=932162234, workers=1, host=793338ce4678, username=root, pid=1) running DownloadData(no_remove_finished=False, fname=wine_dataset, out_dir=/usr/share/data/raw/, url=https://github.com/datarevenue-berlin/code-challenge-2019/releases/download/0.1.0/dataset_sampled.csv)
orchestrator_1 | INFO: INFO:download-data:Downloading dataset
orchestrator_1 | INFO: INFO:download-data:Will write to /usr/share/data/raw/wine_dataset.csv
orchestrator_1 | INFO: [pid 1] Worker Worker(salt=932162234, workers=1, host=793338ce4678, username=root, pid=1) done DownloadData(no_remove_finished=False, fname=wine_dataset, out_dir=/usr/share/data/raw/, url=https://github.com/datarevenue-berlin/code-challenge-2019/releases/download/0.1.0/dataset_sampled.csv)
orchestrator_1 | DEBUG: 1 running tasks, waiting for next task to finish
orchestrator_1 | INFO: Informed scheduler that task DownloadData_wine_dataset_False__usr_share_data__79bc385f2e has status DONE
orchestrator_1 | DEBUG: Asking scheduler for work...
orchestrator_1 | DEBUG: Done
orchestrator_1 | DEBUG: There are no more tasks to run at this time
orchestrator_1 | INFO: Worker Worker(salt=932162234, workers=1, host=793338ce4678, username=root, pid=1) was stopped. Shutting down Keep-Alive thread
orchestrator_1 | INFO:
orchestrator_1 | ===== Luigi Execution Summary =====
orchestrator_1 |
orchestrator_1 | Scheduled 1 tasks of which:
orchestrator_1 | * 1 ran successfully:
orchestrator_1 | - 1 DownloadData(no_remove_finished=False, fname=wine_dataset, out_dir=/usr/share/data/raw/, url=https://github.com/datarevenue-berlin/code-challenge-2019/releases/download/0.1.0/dataset_sampled.csv)
orchestrator_1 |
orchestrator_1 | This progress looks :) because there were no failed tasks or missing dependencies
orchestrator_1 |
orchestrator_1 | ===== Luigi Execution Summary =====
orchestrator_1 |
code-challenge-2020_orchestrator_1 exited with code 0

Thank you

luigi: error: unrecognized arguments

In the file orchestrator/task.py there is a task class called "MakeDataset". It declares its Luigi parameters with underscores in their names, for example, out_dir = luigi.Parameter(). Luigi fails with "unrecognized arguments" when someone tries to run the task with:
docker-compose run orchestrator luigi --module task MakeDataset --out_dir mydir --scheduler-host luigid

I've found the problem and solution here:
spotify/luigi#1728
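
For anyone hitting the same error: on the command line, Luigi expects underscores in parameter names to be written as hyphens, so out_dir becomes --out-dir. Below is a minimal sketch of that naming convention using only the standard-library argparse. It is an illustration, not Luigi's actual parser, and build_parser is a hypothetical helper name.

```python
import argparse


def build_parser(param_names):
    """Expose each parameter as a hyphenated flag (hypothetical helper).

    This only mimics the naming convention; it is not Luigi's own code.
    """
    parser = argparse.ArgumentParser(prog="luigi-like")
    for name in param_names:
        # out_dir -> --out-dir; the parsed value is stored under the original name
        parser.add_argument("--" + name.replace("_", "-"), dest=name)
    return parser


parser = build_parser(["out_dir", "scheduler_host"])

# The hyphenated spelling is recognized.
args = parser.parse_args(["--out-dir", "mydir", "--scheduler-host", "luigid"])
print(args.out_dir, args.scheduler_host)

# The underscore spelling is not a registered flag, so it is left over
# as an unrecognized argument.
known, unknown = parser.parse_known_args(["--out_dir", "mydir"])
print(unknown)
```

Following that convention, the command from this issue would presumably be written as:
docker-compose run orchestrator luigi --module task MakeDataset --out-dir mydir --scheduler-host luigid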

Partial Submission .zip Year Change

Very minor, but the naming scheme instructions still say to name the file cc19_<first_name>_<last_name>.zip instead of cc20_<first_name>_<last_name>.zip.

Error Downloading dataset

In order to build the images and download the data, I executed ./build-task-images.sh 0.1 and then docker-compose up orchestrator, but I got these errors:

WARNING: The PWD variable is not set. Defaulting to a blank string.
Creating code-challenge-2019_luigid_1         ... done
Creating code-challenge-2019_dask-scheduler_1 ... done
Recreating code-challenge-2019_orchestrator_1 ... done
Attaching to code-challenge-2019_orchestrator_1
orchestrator_1    | DEBUG: Checking if DownloadData(no_remove_finished=False, fname=wine_dataset, out_dir=/usr/share/data/raw/, url=https://github.com/datarevenue-berlin/code-challenge-2019/releases/download/0.1.0/dataset_sampled.csv) is complete
orchestrator_1    | INFO: Informed scheduler that task   DownloadData_wine_dataset_False__usr_share_data__79bc385f2e   has status   PENDING
orchestrator_1    | INFO: Done scheduling tasks
orchestrator_1    | INFO: Running Worker with 1 processes
orchestrator_1    | DEBUG: Asking scheduler for work...
orchestrator_1    | DEBUG: Pending tasks: 1
orchestrator_1    | INFO: [pid 1] Worker Worker(salt=005178342, workers=1, host=49f018198416, username=root, pid=1) running   DownloadData(no_remove_finished=False, fname=wine_dataset, out_dir=/usr/share/data/raw/, url=https://github.com/datarevenue-berlin/code-challenge-2019/releases/download/0.1.0/dataset_sampled.csv)
orchestrator_1    | ERROR: [pid 1] Worker Worker(salt=005178342, workers=1, host=49f018198416, username=root, pid=1) failed    DownloadData(no_remove_finished=False, fname=wine_dataset, out_dir=/usr/share/data/raw/, url=https://github.com/datarevenue-berlin/code-challenge-2019/releases/download/0.1.0/dataset_sampled.csv)
orchestrator_1    | Traceback (most recent call last):
orchestrator_1    |   File "/usr/local/lib/python3.6/site-packages/docker/api/client.py", line 261, in _raise_for_status
orchestrator_1    |     response.raise_for_status()
orchestrator_1    |   File "/usr/local/lib/python3.6/site-packages/requests/models.py", line 940, in raise_for_status
orchestrator_1    |     raise HTTPError(http_error_msg, response=self)
orchestrator_1    | requests.exceptions.HTTPError: 404 Client Error: Not Found for url: http+docker://localhost/v1.35/containers/5ddaa92a0628f808540bcc84316fcb811524fcc25d238cc199a0e707adb5989d/start
orchestrator_1    | 
orchestrator_1    | During handling of the above exception, another exception occurred:
orchestrator_1    | 
orchestrator_1    | Traceback (most recent call last):
orchestrator_1    |   File "/usr/local/lib/python3.6/site-packages/luigi/worker.py", line 199, in run
orchestrator_1    |     new_deps = self._run_get_new_deps()
orchestrator_1    |   File "/usr/local/lib/python3.6/site-packages/luigi/worker.py", line 141, in _run_get_new_deps
orchestrator_1    |     task_gen = self.task.run()
orchestrator_1    |   File "/opt/orchestrator/util.py", line 352, in run
orchestrator_1    |     self._run_and_track_task()
orchestrator_1    |   File "/opt/orchestrator/util.py", line 364, in _run_and_track_task
orchestrator_1    |     self.configuration,
orchestrator_1    |   File "/opt/orchestrator/util.py", line 195, in run_container
orchestrator_1    |     raise e
orchestrator_1    |   File "/opt/orchestrator/util.py", line 185, in run_container
orchestrator_1    |     **configuration)
orchestrator_1    |   File "/usr/local/lib/python3.6/site-packages/docker/models/containers.py", line 809, in run
orchestrator_1    |     container.start()
orchestrator_1    |   File "/usr/local/lib/python3.6/site-packages/docker/models/containers.py", line 400, in start
orchestrator_1    |     return self.client.api.start(self.id, **kwargs)
orchestrator_1    |   File "/usr/local/lib/python3.6/site-packages/docker/utils/decorators.py", line 19, in wrapped
orchestrator_1    |     return f(self, resource_id, *args, **kwargs)
orchestrator_1    |   File "/usr/local/lib/python3.6/site-packages/docker/api/container.py", line 1095, in start
orchestrator_1    |     self._raise_for_status(res)
orchestrator_1    |   File "/usr/local/lib/python3.6/site-packages/docker/api/client.py", line 263, in _raise_for_status
orchestrator_1    |     raise create_api_error_from_http_exception(e)
orchestrator_1    |   File "/usr/local/lib/python3.6/site-packages/docker/errors.py", line 31, in create_api_error_from_http_exception
orchestrator_1    |     raise cls(e, response=response, explanation=explanation)
orchestrator_1    | docker.errors.NotFound: 404 Client Error: Not Found ("network code_challenge_default not found")
orchestrator_1    | DEBUG: 1 running tasks, waiting for next task to finish
orchestrator_1    | INFO: Informed scheduler that task   DownloadData_wine_dataset_False__usr_share_data__79bc385f2e   has status   FAILED
orchestrator_1    | DEBUG: Asking scheduler for work...
orchestrator_1    | DEBUG: Done
orchestrator_1    | DEBUG: There are no more tasks to run at this time
orchestrator_1    | DEBUG: There are 1 pending tasks possibly being run by other workers
orchestrator_1    | DEBUG: There are 1 pending tasks unique to this worker
orchestrator_1    | DEBUG: There are 1 pending tasks last scheduled by this worker
orchestrator_1    | INFO: Worker Worker(salt=005178342, workers=1, host=49f018198416, username=root, pid=1) was stopped. Shutting down Keep-Alive thread
orchestrator_1    | INFO: 
orchestrator_1    | ===== Luigi Execution Summary =====
orchestrator_1    | 
orchestrator_1    | Scheduled 1 tasks of which:
orchestrator_1    | * 1 failed:
orchestrator_1    |     - 1 DownloadData(no_remove_finished=False, fname=wine_dataset, out_dir=/usr/share/data/raw/, url=https://github.com/datarevenue-berlin/code-challenge-2019/releases/download/0.1.0/dataset_sampled.csv)
orchestrator_1    | 
orchestrator_1    | This progress looks :( because there were failed tasks
orchestrator_1    | 
orchestrator_1    | ===== Luigi Execution Summary =====
orchestrator_1    | 
code-challenge-2019_orchestrator_1 exited with code 0 
