jobpilot

jobpilot is a straightforward job scraping library.

Current limitations

jobpilot currently supports only LinkedIn. Support for other platforms, such as Indeed, is planned.

Installation

jobpilot is available on PyPI and can be installed with any common Python package manager, such as pip, pipx, or poetry.

Usage

A basic usage script is available in the examples directory.

Development

If you're interested in contributing to jobpilot, start by cloning the repository:

git clone https://github.com/fabifont/jobpilot.git
cd jobpilot

Once cloned, install the development dependencies with:

poetry install

jobpilot uses pre-commit, ruff, black, and pyright to keep the code consistent and of high quality. After cloning the repository and installing the dependencies, set up the pre-commit hooks:

pre-commit install

The hooks run automatically on every commit and check for various common issues. If a hook detects a problem it cannot fix automatically, it aborts the commit and explains the problem.

To run the hooks manually against all files without committing, for example to validate your changes beforehand, use:

pre-commit run --all-files

All contributions must pass these checks. Code submitted in a pull request must also pass every Continuous Integration (CI) workflow; GitHub workflows verify this automatically. Pull requests with failing checks will not be merged.

jobpilot follows the Semantic Versioning 2.0.0 standard.

Contributing

Contributions to jobpilot are very welcome! Whether you're reporting bugs, enhancing documentation, or improving the code, every contribution helps.

Branching and commit naming conventions

To ensure consistency and clarity across the codebase, jobpilot follows a simplified naming convention for branches and commits. The naming conventions are based on the concepts shared in this article.

To help you write conforming commit messages, jobpilot includes commitizen in its development dependencies. After staging your changes, run:

cz commit

commitizen then guides you through writing a structured, consistent commit message that follows the project's conventions.
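
The convention's details live in the linked article, so the following is only a sketch under an assumption: that commit subjects follow the Conventional Commits shape `type(scope)!: subject`, which is the style commitizen's default rule set is built around. The type list and the `is_conventional` helper are hypothetical and may not match jobpilot's actual rules; `cz commit` remains the supported way to produce messages.

```python
import re

# Assumed commit types, modeled on common Conventional Commits usage;
# jobpilot's own list may differ.
COMMIT_TYPES = (
    "feat", "fix", "docs", "style", "refactor",
    "perf", "test", "build", "ci", "chore", "revert",
)

# type, optional "(scope)", optional "!" for breaking changes, then ": subject".
_PATTERN = re.compile(
    rf"^(?P<type>{'|'.join(COMMIT_TYPES)})"
    r"(?:\((?P<scope>[\w\-]+)\))?"
    r"(?P<breaking>!)?: (?P<subject>\S.*)$"
)


def is_conventional(message: str) -> bool:
    """Hypothetical check: does the first line match the assumed format?"""
    first_line = message.splitlines()[0] if message else ""
    return _PATTERN.match(first_line) is not None
```

Only the first line is checked, since the body and footer of a commit message are free-form in this style.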

License

jobpilot is licensed under the GPL-3.0. For full details, see the LICENSE file.

jobpilot's Issues

Logo

Create a logo for this project.

Bad EmploymentType bindings

Some members of the EmploymentType enum are bound to strings instead of tuples. For such a member, resolving its alias can raise a ValueError, and converting it to a string can produce an incorrect value.
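
A minimal sketch of how this failure can arise, assuming (hypothetically) that each member's value is meant to be a `(code, alias)` tuple that is unpacked on access. The `AliasedEnum` base, the codes, and the `bad_bindings` check are illustrative and are not jobpilot's actual code. Unpacking a one-character string raises ValueError, while a two-character string silently splits into the wrong code and alias, which matches both symptoms described above.

```python
from enum import Enum


class AliasedEnum(Enum):
    """Hypothetical base: each member's value is a (code, alias) tuple."""

    @property
    def alias(self) -> str:
        # A plain string only unpacks by accident: "T" raises ValueError,
        # while "IN" silently splits into "I" and "N".
        _, alias = self.value
        return alias

    def __str__(self) -> str:
        code, _ = self.value
        return code


class EmploymentType(AliasedEnum):
    # Hypothetical codes and aliases, for illustration only.
    FULL_TIME = ("F", "Full-time")
    PART_TIME = ("P", "Part-time")
    CONTRACT = ("C", "Contract")


class BadEmploymentType(AliasedEnum):
    FULL_TIME = ("F", "Full-time")
    TEMPORARY = "T"  # bug: string instead of tuple
    INTERNSHIP = "IN"  # bug: string instead of tuple


def bad_bindings(enum_cls: type[Enum]) -> list[str]:
    """Return the names of members whose value is not a 2-tuple."""
    return [
        member.name
        for member in enum_cls
        if not (isinstance(member.value, tuple) and len(member.value) == 2)
    ]
```

A check like `bad_bindings` could run in a unit test so that a mis-bound member is caught before it reaches users.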

Add docstrings

Add comprehensive docstrings adhering to the Google style.

Bad criteria parsing

Not every job posting lists all the criteria, so parsing them by position can raise an IndexError.
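
One way to avoid the IndexError is to key each criterion by its header instead of its position, leaving absent ones as `None`. The sketch below assumes the criteria have already been extracted as `(header, value)` string pairs; the header labels, the `JobCriteria` fields, and `parse_criteria` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JobCriteria:
    # Every field is optional because any criterion may be missing.
    seniority_level: Optional[str] = None
    employment_type: Optional[str] = None
    job_function: Optional[str] = None
    industries: Optional[str] = None


# Hypothetical mapping from criterion headers to JobCriteria fields.
_FIELDS = {
    "Seniority level": "seniority_level",
    "Employment type": "employment_type",
    "Job function": "job_function",
    "Industries": "industries",
}


def parse_criteria(items: list[tuple[str, str]]) -> JobCriteria:
    """Fill only the criteria that are present; unknown headers are ignored."""
    criteria = JobCriteria()
    for header, value in items:
        field = _FIELDS.get(header.strip())
        if field is not None:
            setattr(criteria, field, value.strip())
    return criteria
```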

Better strategy to stop concurrent tasks

It is probably better to start requests in chunks instead of starting them all at once. This makes it possible to stop earlier when a request returns 0 results.

Current behavior: with a limit of 1000, for example, 1000/25 = 40 tasks are started concurrently. In a bad scenario, where the tasks are scheduled out of order, they all run to completion even after one of them has found 0 results.

An alternative approach is to stop every task whose start offset is greater than that of the task that found 0 jobs.
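
A minimal sketch of the chunked strategy, assuming a hypothetical coroutine `fetch_page(start)` that returns the jobs on the page starting at offset `start`, with 25 jobs per page as in the example above. `fetch_in_chunks` and `chunk_size` are illustrative names, not jobpilot's API. Because `asyncio.gather` returns results in submission order, the loop can stop at the first empty page and discard later pages from the same chunk, which also covers the alternative approach.

```python
import asyncio
from typing import Awaitable, Callable

PAGE_SIZE = 25  # jobs per page, as in the example above


async def fetch_in_chunks(
    fetch_page: Callable[[int], Awaitable[list[str]]],
    limit: int,
    chunk_size: int = 4,
) -> list[str]:
    """Fetch pages chunk by chunk, stopping at the first empty page."""
    starts = list(range(0, limit, PAGE_SIZE))
    results: list[str] = []
    for i in range(0, len(starts), chunk_size):
        chunk = starts[i : i + chunk_size]
        # gather preserves submission order, so pages arrive sorted by start.
        pages = await asyncio.gather(*(fetch_page(s) for s in chunk))
        for page in pages:
            if not page:
                # No later page can contain results; skip the remaining chunks.
                return results[:limit]
            results.extend(page)
    return results[:limit]
```

Larger chunks waste more requests after the last non-empty page but finish sooner when many pages are available, so `chunk_size` trades latency for request count.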

Optimize concurrent fetching for LinkedIn's job endpoints

LinkedIn has distinct endpoints for retrieving the list of jobs and for fetching specific job details. Each endpoint comes with its own rate limiter.

The retrieval process could therefore be optimized with a concurrent mechanism that fetches job listings from one endpoint while fetching individual job details from the other.

This would make full use of the rate limits of both endpoints, maximizing throughput and minimizing retrieval time.
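
A sketch of one possible design, a producer/consumer pipeline built on `asyncio.Queue`: listing pages are fetched under their own concurrency cap, and each job ID is handed to a pool of detail workers as soon as its page arrives, so both endpoints are busy at the same time. `fetch_list`, `fetch_details`, and the parameter names are hypothetical stand-ins for the two LinkedIn requests.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def fetch_pipelined(
    fetch_list: Callable[[int], Awaitable[list[str]]],
    fetch_details: Callable[[str], Awaitable[dict]],
    pages: int,
    page_size: int = 25,
    list_concurrency: int = 2,
    details_workers: int = 5,
) -> list[dict]:
    # Caps concurrent requests to the listing endpoint.
    list_sem = asyncio.Semaphore(list_concurrency)
    queue: "asyncio.Queue[Optional[str]]" = asyncio.Queue()
    details: list[dict] = []

    async def list_page(start: int) -> None:
        async with list_sem:
            job_ids = await fetch_list(start)
        # Hand IDs to the detail workers as soon as each page arrives.
        for job_id in job_ids:
            queue.put_nowait(job_id)

    async def producer() -> None:
        await asyncio.gather(*(list_page(p * page_size) for p in range(pages)))
        for _ in range(details_workers):
            queue.put_nowait(None)  # one stop sentinel per worker

    async def worker() -> None:
        # The number of workers caps concurrent requests to the details endpoint.
        while (job_id := await queue.get()) is not None:
            details.append(await fetch_details(job_id))

    await asyncio.gather(producer(), *(worker() for _ in range(details_workers)))
    return details
```

The semaphore and the worker count only bound concurrency, not requests per second; to respect real rate limits they would be replaced by a per-endpoint limiter such as a token bucket. Details are collected in completion order, not listing order.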

Handle missing city or region

In theory, the city should always be present. However, the parser should still handle a missing city, as well as a missing region.

If the country is missing, the default value should be Country.WORLDWIDE.
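
A minimal sketch of the expected fallbacks, assuming (hypothetically) that the location arrives as a comma-separated string such as `"City, Region, Country"`. Only `Country.WORLDWIDE` comes from the issue; the other `Country` members, the `Location` dataclass, and `parse_location` are illustrative. The last part is treated as the country only when it is a known value; otherwise the country defaults to `Country.WORLDWIDE`.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Country(Enum):
    # Hypothetical subset; only WORLDWIDE is named in the issue.
    WORLDWIDE = "Worldwide"
    ITALY = "Italy"
    UNITED_STATES = "United States"


@dataclass
class Location:
    city: Optional[str] = None
    region: Optional[str] = None
    country: Country = Country.WORLDWIDE  # default when the country is missing


def parse_location(raw: str) -> Location:
    """Split "City, Region, Country", tolerating any missing part."""
    parts = [p.strip() for p in raw.split(",") if p.strip()]
    country = Country.WORLDWIDE
    if parts:
        try:
            # Consume the last part only if it names a known country.
            country = Country(parts[-1])
            parts = parts[:-1]
        except ValueError:
            pass
    city = parts[0] if parts else None
    region = parts[1] if len(parts) > 1 else None
    return Location(city, region, country)
```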

Rust

It is not written in Rust, which is a big issue.

Add tests and coverage report

Write tests for pytest and define a GitHub workflow that runs them automatically and generates a coverage report.
