
oc_project02's Introduction

OC_project02

Use Python basics for market analysis

Overview

The goal of this project is to write a Python script capable of retrieving the data of every book listed on http://books.toscrape.com/ and grouping it by category into CSV files.
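Retrieving a book's data mostly means reading the fields of its product page. Below is a minimal sketch using only the standard library `html.parser`, assuming a product page exposes the title in an `<h1>` and the other fields as `<th>`/`<td>` pairs in a table. The class `ProductTableParser`, the function `parse_book_page` and the sample HTML are hypothetical illustrations, not the project's actual code.

```python
from html.parser import HTMLParser


class ProductTableParser(HTMLParser):
    """Hypothetical parser: collects the <h1> title and <th>/<td> pairs."""

    def __init__(self):
        super().__init__()
        self.data = {}
        self._current = None      # tag whose text is being captured ("h1", "th" or "td")
        self._buffer = ""
        self._last_header = None  # last <th> text, waiting for its <td>

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "th", "td"):
            self._current = tag
            self._buffer = ""

    def handle_data(self, data):
        if self._current:
            self._buffer += data

    def handle_endtag(self, tag):
        if tag != self._current:
            return
        text = self._buffer.strip()
        if tag == "h1":
            self.data["title"] = text
        elif tag == "th":
            self._last_header = text
        elif tag == "td" and self._last_header:
            self.data[self._last_header] = text
            self._last_header = None
        self._current = None


def parse_book_page(html):
    """Return a dict of the fields found on one (assumed) product page."""
    parser = ProductTableParser()
    parser.feed(html)
    parser.close()
    return parser.data


# Made-up sample markup, only shaped like a product page.
sample = """
<h1>Example Book</h1>
<table class="table table-striped">
  <tr><th>UPC</th><td>0000000000000000</td></tr>
  <tr><th>Availability</th><td>In stock (5 available)</td></tr>
</table>
"""
book = parse_book_page(sample)
```

If the real page contains more than one `<h1>`, the last one wins here, so the selector would need tightening.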

Usage

First, make sure that Python 3 and pip are installed on your machine, preferably up to date. Then install the dependencies required by the script:

pip install -r requirements.txt

The command to start scraping is:

python main.py

Once launched, the script creates an exports folder in which the retrieved data is saved, sorted by category.
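One way to produce that folder is to group the scraped books by category and write one CSV file per category. The sketch below assumes each book is a dict with a `category` key and that all books share the same keys; the function name `export_by_category` and the `<category>.csv` file naming are hypothetical, not necessarily what `main.py` does.

```python
import csv
from collections import defaultdict
from pathlib import Path


def export_by_category(books, export_dir="exports"):
    """Write one CSV file per category (hypothetical layout) and return the paths."""
    groups = defaultdict(list)
    for book in books:
        groups[book["category"]].append(book)

    out_dir = Path(export_dir)
    out_dir.mkdir(parents=True, exist_ok=True)  # create exports/ if missing

    written = []
    for category, rows in groups.items():
        path = out_dir / f"{category}.csv"
        # Assumes every book dict has the same keys as the first one.
        fieldnames = list(rows[0].keys())
        with path.open("w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
        written.append(path)
    return written
```

Opening the file with `newline=""` is what the `csv` module expects, and avoids blank lines between rows on Windows.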

oc_project02's People

Contributors

malloc0x3cc


oc_project02's Issues

Performance issue

Scraping 11 books from the travel_2 category took 14.203393459320068 seconds.
The scraping needs to be optimized as much as possible.
Multithreading should help.

Multithreading: DONE
Async: Pending
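Since most of the time is spent waiting on HTTP responses, fetching pages from a thread pool is a natural fit. A minimal sketch with `concurrent.futures.ThreadPoolExecutor`, assuming pages are served as UTF-8; `fetch`, `fetch_all` and the default of 8 workers are hypothetical choices, not the project's code.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Download one page and return its HTML (assumes UTF-8)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def fetch_all(urls, fetch_func=fetch, max_workers=8):
    """Fetch every URL concurrently; results keep the order of `urls`."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(fetch_func, urls))
```

`fetch_func` is a parameter so the pool can be exercised without network access, for example `fetch_all(["a", "b"], fetch_func=str.upper)`.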

Support for multiple pages

Only 20 books are shown on a single page, so the script should check whether there is a "next" button leading to another page of books.
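Following that button can be done with a simple loop that keeps fetching until no "next" link is found. The sketch assumes the button is rendered as `<li class="next"><a href="...">` with a relative link; the regex `NEXT_LINK` and the function `iter_category_pages` are hypothetical, and `fetch_func` can be any function returning a page's HTML.

```python
import re
from urllib.parse import urljoin

# Assumption: the pagination button looks like <li class="next"><a href="page-2.html">.
NEXT_LINK = re.compile(r'<li class="next">\s*<a href="([^"]+)"')


def iter_category_pages(first_url, fetch_func):
    """Yield (url, html) for the first page and every following page."""
    url = first_url
    while url:
        html = fetch_func(url)
        yield url, html
        match = NEXT_LINK.search(html)
        # Relative links are resolved against the current page URL.
        url = urljoin(url, match.group(1)) if match else None
```

Using `urljoin` keeps the loop independent of whether the site links pages with relative or absolute URLs.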

Export bug

Multithreading each book breaks the output.
There are probably too many occurrences running at the same time.
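The issue does not say exactly how the output gets corrupted. A common cause is several threads writing to the same file at once; one way to avoid it is to let the worker threads only scrape and return dicts, then write every row from the calling thread. The sketch below assumes each scrape returns a dict with the same keys; `scrape_and_write` and the lambda used as a scraper are hypothetical.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor


def scrape_and_write(urls, scrape_func, out_file, max_workers=8):
    """Scrape in worker threads, but write every CSV row from the calling thread."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Workers only return data; none of them touches out_file.
        rows = list(executor.map(scrape_func, urls))
    if not rows:
        return 0
    writer = csv.DictWriter(out_file, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return len(rows)


# Usage with a fake scraper and an in-memory file.
buffer = io.StringIO()
count = scrape_and_write(["u1", "u2"], lambda u: {"url": u, "title": u.upper()}, buffer)
```

Because `executor.map` returns results in input order, the rows also come out in the same order as the URLs, which a shared file written from several threads would not guarantee.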
