Project: TrouveTonStage

Contents:

  1. Description
  2. Installation
  3. Usage
  4. Documentation
  5. Notes

Description

This project was carried out as part of a Data Engineering unit at ESIEE Paris. The goal was to collect data, store it in databases for reuse, and produce value from it through graphical interpretation or a search engine.

The project, called "TrouveTonStage", aims to make the search for an internship simpler and more personalized. It consists of a few scrapers that first gather information about job offers from different websites. The collected data is stored as CSV, then loaded into an Elasticsearch database. A search engine built on top of it with Flask provides an efficient way to navigate through all these job offers according to chosen criteria.

For the technical details, see the README file in the 'scraper' folder, which contains all the scrapers used.

Installation

If you want to use this project, first clone the repo: git clone https://github.com/Leralix/TrouveTonStage.git

Scraping part

The scraping part is inside the 'scraper' folder. It has one requirement: before running it, install the additional packages listed in "requirements.txt": $ python -m pip install -r requirements.txt

Application part

The application part is inside the 'app' folder. To install it correctly you need Docker, because the application uses an Elasticsearch image. There are two ways to run the backend: either start a plain Elasticsearch image and run the program locally, or run everything with docker-compose.

Usage

Getting information

To run the project, first execute 'main.py' inside the 'scraper' folder; it runs the scripts that gather information about job offers. Inside 'main.py' you can modify the delay between page loads, the number of pages to scrape, the output files, and other settings. Once the data is gathered, the script cleans it and produces an output CSV. You can run it with: $ python main.py

Edit: Be careful, the process can take quite a while to run entirely, so make sure you have time or scrape only a few pages.
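To illustrate how the delay and the page limit interact, here is a minimal sketch of a paced scraping loop. It is not the project's actual code: `scrape_pages`, `fake_fetch_page`, and the parameter names `max_pages` and `delay_seconds` are hypothetical, and the fake fetcher stands in for a real scraper so the example runs without network access.

```python
import time
from typing import Callable, Dict, List


def scrape_pages(
    fetch_page: Callable[[int], List[Dict[str, str]]],
    max_pages: int = 3,
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Dict[str, str]]:
    """Collect offers page by page, pausing between page loads."""
    offers: List[Dict[str, str]] = []
    for page in range(1, max_pages + 1):
        page_offers = fetch_page(page)
        if not page_offers:
            break  # an empty page is treated as the end of the results
        offers.extend(page_offers)
        if page < max_pages:
            sleep(delay_seconds)  # be polite to the website between loads
    return offers


def fake_fetch_page(page: int) -> List[Dict[str, str]]:
    """Hypothetical stand-in for a real scraper: two made-up offers per page."""
    return [
        {"title": f"Intern {page}-{i}", "company": "Example Corp"}
        for i in range(2)
    ]
```

With these assumptions, the total pause time is roughly `delay_seconds * (max_pages - 1)`, which is why lowering the number of pages is the quickest way to shorten a run.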

The scraping process is designed so that every time 'main.py' is executed, it stores the information in a single CSV that keeps growing with new data.
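One way to get this "single growing CSV" behaviour is to open the file in append mode and write the header only when the file is new. The sketch below assumes hypothetical column names (`title`, `company`, `location`) and a hypothetical helper `append_offers`; the project's real columns and code may differ.

```python
import csv
from pathlib import Path
from typing import Dict, Iterable

# Hypothetical columns, chosen only for this illustration.
FIELDNAMES = ["title", "company", "location"]


def append_offers(csv_path: Path, offers: Iterable[Dict[str, str]]) -> int:
    """Append offers to csv_path, writing the header only for a new file."""
    is_new_file = not csv_path.exists() or csv_path.stat().st_size == 0
    written = 0
    with csv_path.open("a", newline="", encoding="utf-8") as handle:
        # Keys outside FIELDNAMES are ignored; missing keys become empty cells.
        writer = csv.DictWriter(
            handle, fieldnames=FIELDNAMES, extrasaction="ignore"
        )
        if is_new_file:
            writer.writeheader()
        for offer in offers:
            writer.writerow(offer)
            written += 1
    return written
```

Note that plain appending does not remove offers already collected in an earlier run; deduplication would have to happen in a separate cleaning step.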

Run the backend

The 'app' folder contains everything needed to run the backend. Its 'data' folder contains the CSV that will be loaded into Elasticsearch, so make sure you copy the CSV produced by the scraper into the 'app' folder.
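As a rough picture of what loading the CSV involves, the sketch below turns each CSV row into a bulk-index action. The index name `offers` and the function `csv_to_actions` are assumptions made for illustration, not names taken from the project.

```python
import csv
from pathlib import Path
from typing import Dict, Iterator

INDEX_NAME = "offers"  # hypothetical index name


def csv_to_actions(csv_path: Path, index_name: str = INDEX_NAME) -> Iterator[Dict]:
    """Yield one bulk-index action per CSV row."""
    with csv_path.open(newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            # Each action names the target index and carries the row as the document.
            yield {"_index": index_name, "_source": dict(row)}
```

With the official Python client, actions of this shape can be passed to `elasticsearch.helpers.bulk(client, actions)`; whether the project loads its data this way is an assumption.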

Make sure Docker is running, then go inside the 'app' folder and type: docker-compose up -d

This runs the docker-compose file. Wait a few seconds, the time it takes to load all the CSV data into Elasticsearch, then open port 5000 on your machine to see the Flask page.
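The Flask page lets you search the offers according to criteria. As a hedged illustration of what such a search could send to Elasticsearch, the sketch below builds a query body from a keyword string and an optional location. The field names (`title`, `description`, `location`) and the function `build_search_query` are hypothetical; the project's real mapping and query may differ.

```python
from typing import Dict, Optional


def build_search_query(
    keywords: str, location: Optional[str] = None, size: int = 20
) -> Dict:
    """Build an Elasticsearch search body from simple criteria."""
    bool_query: Dict = {
        "must": [
            # Full-text match on the (hypothetical) title and description fields.
            {"multi_match": {"query": keywords, "fields": ["title", "description"]}}
        ]
    }
    if location:
        # The filter narrows the results without affecting the relevance score.
        bool_query["filter"] = [{"match": {"location": location}}]
    return {"size": size, "query": {"bool": bool_query}}
```

Keeping the location in the `filter` clause rather than in `must` means it restricts the result set while the ranking is driven by the keywords alone.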

Documentation

The documentation was produced with Sphinx and is contained inside the 'docs' folder.

If the link doesn't work, go to docs/build/html and open the "index.html" file inside; you should then be able to view the documentation without problems.

Notes

  • The docstrings were written with the help of the "Mintlify Doc Writer" plugin in PyCharm.
  • The Dockerization of the 'app' could be changed to use a volume instead of keeping everything local.

Contributors

gabe-mgnt, leralix
