jovianhq / opendatasets


A Python library for downloading datasets from Kaggle, Google Drive, and other online sources.

License: MIT License

Python 99.20% Shell 0.80%
data-science datasets machine-learning python

Contributors: aakashns, birajcoder

opendatasets's Issues

Downloading datasets behind network proxies fails due to timeout errors

For users behind network proxies, the following example in the main README.md fails due to timeout errors:

$ python
Python 3.9.18 (main, Sep 11 2023, 13:41:44) 
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import opendatasets as od
>>> dataset_url = 'https://www.kaggle.com/tunguz/us-elections-dataset'
>>> od.download(dataset_url)
Please provide your Kaggle credentials to download this dataset. Learn more: http://bit.ly/kaggle-creds
Your Kaggle username: ****
Your Kaggle Key: 
2024-01-12 06:45:08,854 WARNING Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f1a5408e490>: Failed to establish a new connection: [Errno 110] Connection timed out')': /api/v1/datasets/download/tunguz/us-elections-dataset?datasetVersionNumber=None

However, if the KAGGLE_PROXY environment variable is set properly, the example also works for users behind a network proxy.

Here's the code snippet that makes this work:

import os

# Point the Kaggle API client at the system HTTPS proxy, if one is configured
if 'https_proxy' in os.environ:
    os.environ['KAGGLE_PROXY'] = os.environ['https_proxy']
elif 'HTTPS_PROXY' in os.environ:
    os.environ['KAGGLE_PROXY'] = os.environ['HTTPS_PROXY']
else:
    os.environ['KAGGLE_PROXY'] = ''

import opendatasets as od
dataset_url = 'https://www.kaggle.com/tunguz/us-elections-dataset'
od.download(dataset_url)

And here's a sample run behind a network proxy:

$ python
Python 3.9.18 (main, Sep 11 2023, 13:41:44) 
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> if 'https_proxy' in os.environ.keys():
...     os.environ['KAGGLE_PROXY'] = os.environ['https_proxy']
... elif 'HTTPS_PROXY' in os.environ.keys():
...     os.environ['KAGGLE_PROXY'] = os.environ['HTTPS_PROXY']
... else:
...     os.environ['KAGGLE_PROXY'] = ''
... 
>>> import opendatasets as od
>>> dataset_url = 'https://www.kaggle.com/tunguz/us-elections-dataset'
>>> od.download(dataset_url)
Please provide your Kaggle credentials to download this dataset. Learn more: http://bit.ly/kaggle-creds
Your Kaggle username: ****
Your Kaggle Key: 
Downloading us-elections-dataset.zip to ./us-elections-dataset
  0%|                                                                        | 0.00/133k [00:00<?, ?B/s]
100%|████████████████████████████████████████████████████████████████| 133k/133k [00:00<00:00, 6.49MB/s]
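
The workaround above could also be wrapped in a small helper. This is only a sketch: `configure_kaggle_proxy` is a hypothetical name, not part of opendatasets, and it assumes (as reported in this issue) that the Kaggle client honors KAGGLE_PROXY. It differs from the snippet in one way: it leaves KAGGLE_PROXY alone when no proxy variable is set.

```python
import os


def configure_kaggle_proxy(environ=os.environ):
    """Copy the HTTPS proxy setting into KAGGLE_PROXY, if one is defined.

    Hypothetical helper, not part of opendatasets. Returns the proxy URL
    that was applied, or None when no proxy variable is set.
    """
    # Check the lowercase name first, then the uppercase one, in the same
    # order as the snippet in this issue.
    for name in ("https_proxy", "HTTPS_PROXY"):
        proxy = environ.get(name)
        if proxy:
            environ["KAGGLE_PROXY"] = proxy
            return proxy
    # No proxy configured: leave any existing KAGGLE_PROXY value untouched.
    return None
```

It would be called once, before od.download(dataset_url).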

I was planning to submit a PR to fix the issue, but I see that this repo was last updated over 2 years ago.

Clean up minimal example of README

The example in the README can be cleaned up from:

import opendatasets as od
dataset_url = 'https://www.kaggle.com/tunguz/us-elections-dataset'
od.download('https://www.kaggle.com/tunguz/us-elections-dataset')

to:

import opendatasets as od
od.download('https://www.kaggle.com/tunguz/us-elections-dataset')

or:

import opendatasets as od
dataset_url = 'https://www.kaggle.com/tunguz/us-elections-dataset'
od.download(dataset_url)

Kaggle JSON (in different place)

aakashns

This is a useful repository. Thank you for sharing.

FYI: Digging into the code, the method "read_kaggle_creds()" looks for kaggle.json in the current directory ("./kaggle.json"), but most of the time kaggle.json is stored at "~/.kaggle/kaggle.json".

Thanks.
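
A possible lookup that checks both locations is sketched below. `find_kaggle_creds` is a hypothetical helper written for illustration, not the library's read_kaggle_creds(). It assumes the standard kaggle.json layout with "username" and "key" fields.

```python
import json
from pathlib import Path


def find_kaggle_creds(search_dirs=None):
    """Return (username, key) from the first kaggle.json found, else None.

    Hypothetical helper, not the library's read_kaggle_creds().
    """
    if search_dirs is None:
        # Current directory first, then the conventional ~/.kaggle folder.
        search_dirs = [Path.cwd(), Path.home() / ".kaggle"]
    for directory in search_dirs:
        creds_file = Path(directory) / "kaggle.json"
        if creds_file.is_file():
            with creds_file.open() as f:
                creds = json.load(f)
            return creds["username"], creds["key"]
    return None
```

Checking the current directory first keeps the existing behavior, and ~/.kaggle only acts as a fallback.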

Return the output path

Return an object that contains the target data directory and the data files.
I would use it to load the data into the notebook.
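
Until something like this exists, a wrapper along these lines might do. It is a sketch under assumptions: `DownloadResult` and `describe_download` are hypothetical names, and the target folder is assumed to be named after the last segment of the dataset URL, as in the "./us-elections-dataset" path printed in the proxy issue's sample run.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass
class DownloadResult:
    """Hypothetical return value: the target directory and the files in it."""
    data_dir: Path
    files: List[Path]


def describe_download(dataset_url, data_dir="."):
    # Assumption: the dataset lands in <data_dir>/<last URL path segment>.
    dataset_id = dataset_url.rstrip("/").split("/")[-1]
    target = Path(data_dir) / dataset_id
    files = []
    if target.is_dir():
        # Collect every regular file under the target folder, sorted by path.
        files = sorted(p for p in target.rglob("*") if p.is_file())
    return DownloadResult(data_dir=target, files=files)
```

Calling describe_download(dataset_url) right after od.download(dataset_url) would give the notebook a list of paths to load.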

ApiException: (404) Reason: Not Found

I am using Google Colab to download this dataset with the opendatasets library: https://www.kaggle.com/datasets/hereisburak/pins-face-recognition

I've also uploaded my kaggle.json to the Files section of Google Colab, but when I run od.download(url) I get the following error:

[image]

I always follow this same process, but this time it is not working.
What could be the reason, and how can I resolve it?

Versions:
Python: 3.7.13
opendatasets: 0.1.20
