tsy0716 / au-nz-jobs

A package to download, save and analyse jobs in Australia and New Zealand from SEEK.

License: GNU General Public License v3.0

au-nz-jobs

A package to download and save jobs in Australia and New Zealand from SEEK.

About

I am a data scientist based in AU/NZ. I found it quite overwhelming to search for jobs on SEEK. It takes lots of clicks and time to find the right jobs. Going deeper, if you want a better understanding of job market trends, there isn't a handy tool to download the jobs and do some analysis.

This package helps job seekers, HR staff and companies batch download jobs from SEEK and save them to local files.

It also provides some basic analysis and visualization tools to help you understand the job market better (roadmap).

Development Status

This package is still at an early stage of development. Use it at your own risk.

Features

downloader

Sub package to download jobs from SEEK.

  • Search jobs by:

    • multiple keywords in batch
    • multiple locations in batch
    • date range: last n days
    • job type: full-time, part-time, contract, casual
    • sort mode: relevance, date
  • The default search in SEEK yields too many results (including ads and unrelated jobs)

    • You can define a check_words list to filter out the irrelevant jobs
  • Full job details are then downloaded only for the jobs that pass the filter

  • Output: a dictionary of DataFrames, as below:

    • jobs_wide: a wide formatted DataFrame with one row per job including all downloaded job details.

      • If you want to get a single table containing all the information, this is the one.
    • jobs: similar to jobs_wide, but only the dimension_id columns are kept.

      • This is for those who will load the jobs data into a relational database. It needs to be joined with the dimension tables below.
    • dimension tables:

      • classification: SEEK job classification, generally the industry, e.g. Construction, Engineering, Information & Communication Technology
      • sub_classification: SEEK job sub-classification, more specific than classification, with NO parent-child relationship to classification, e.g. Water & Waste Engineering, Programme & Project Management
      • location: high-level location, e.g. Sydney, Melbourne, Brisbane
      • area: more specific location, e.g. Sydney CBD, Inner West
      • advertiser: the advertiser of the job, can be different from the actual company
      • company_review: only for the jobs which have company reviews

save_jobs

Sub package to save the downloaded jobs to local files.

  • save the downloaded jobs from downloaded DataFrames to local files

  • choose from a single table (jobs_wide) or relational database tables (jobs and dimension tables)

  • output as csv, excel, sqlite

    • csv: one csv file per table
    • excel: a single excel file, with one sheet for the single table or multiple sheets for the relational database tables
    • sqlite: a single sqlite file with multiple tables (coming soon)
  • SQLite is required for the further analysis and visualization modules. (coming soon)

  • NO other SQL databases will be supported. Please handle the data yourself.
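
Until the sqlite output is available, the relational tables can be loaded into a database by hand. The following is only a minimal sketch using Python's standard sqlite3 module; the column names (job_id, title, classification_id, name) and the sample rows are hypothetical and are not taken from the package's actual tables.

```python
import sqlite3

# Hypothetical rows; the real column names in the package's tables may differ.
classification_rows = [
    (1, "Information & Communication Technology"),
    (2, "Engineering"),
]
job_rows = [
    (101, "Data Scientist", 1),
    (102, "Data Engineer", 1),
    (103, "Water Engineer", 2),
]

# An in-memory database keeps the sketch self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE classification ("
    "classification_id INTEGER PRIMARY KEY, name TEXT)"
)
conn.execute(
    "CREATE TABLE jobs ("
    "job_id INTEGER PRIMARY KEY, title TEXT, "
    "classification_id INTEGER REFERENCES classification(classification_id))"
)
conn.executemany("INSERT INTO classification VALUES (?, ?)", classification_rows)
conn.executemany("INSERT INTO jobs VALUES (?, ?, ?)", job_rows)

# Join the jobs table to a dimension table and count jobs per classification.
counts = conn.execute(
    """
    SELECT c.name, COUNT(*) AS job_count
    FROM jobs AS j
    JOIN classification AS c USING (classification_id)
    GROUP BY c.name
    ORDER BY job_count DESC, c.name
    """
).fetchall()
conn.close()

print(counts)
```

The same join pattern would apply to the other dimension tables, each keyed by its own id column.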

analysis (roadmap)

visualization (roadmap)

Installation

pip install au-nz-jobs

NOTE for downloader!

Please CAREFULLY read the following limitations before using this package.

Implicit Steps for downloader

  1. For each keyword and location pair, the job listings within the given date range are downloaded first, without their details.
  2. The downloaded jobs from step 1 are then filtered by the check_words list.
  3. Further details of the jobs kept in step 2 are downloaded.
  4. Jobs data from step 3 is cleaned and restructured into DataFrames.
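
Step 2 can be pictured with the sketch below. It is an illustration only: the matching rule (a case-insensitive substring match on the job title, keeping a job if any check word matches) and the names matches_check_words and filter_jobs are assumptions, not the package's actual implementation.

```python
def matches_check_words(title, check_words):
    """Return True if the title contains at least one check word (case-insensitive)."""
    title_lower = title.lower()
    return any(word.lower() in title_lower for word in check_words)


def filter_jobs(jobs, check_words):
    """Keep only the jobs whose title matches at least one check word."""
    return [job for job in jobs if matches_check_words(job["title"], check_words)]


# Hypothetical search results from step 1 (summaries only, no details yet).
search_results = [
    {"id": 1, "title": "Senior Data Scientist"},
    {"id": 2, "title": "Forklift Driver"},
    {"id": 3, "title": "Data Engineer"},
]

filtered = filter_jobs(search_results, ["data", "scientist"])
print([job["id"] for job in filtered])  # [1, 3]
```

Filtering before step 3 matters because job details are only downloaded for the jobs that remain, so a tighter check_words list means fewer detail downloads.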

Limitations

  • This package is based on the API provided by SEEK.
  • The API is not officially supported by SEEK. Any changes to the API may break this package.
  • This package is ONLY for PERSONAL USE. Please do not use it for any commercial purpose.
  • Downloading jobs might take a long time. Please be patient.
  • Some suggestions to save you some time:
    • reduce keywords and locations, since every pair of keyword and location is searched separately
      • e.g. A download with 3 keywords and 3 locations will yield 9 searches!!!
    • reduce the date range; 31 days is the maximum, and it can take a long time to download
    • limit the location to a city rather than a state or country (searching by state or country still works)
      • e.g. Sydney rather than NSW or Australia
  • For a single keyword and location pair, regardless of the date range, the maximum number of jobs you can download is 550.
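
To estimate the size of a download before starting it, the pairing and the 550-job cap described above can be worked out in a few lines. This is a rough sketch: plan_searches is a hypothetical helper, not part of the package, and the result is only an upper bound before any check_words filtering.

```python
from itertools import product

# Cap per keyword/location pair, as stated in the limitations above.
MAX_JOBS_PER_SEARCH = 550


def plan_searches(keywords, locations):
    """Return every (keyword, location) pair that would be searched."""
    return list(product(keywords, locations))


keywords = ["data scientist", "data engineer", "data analyst"]
locations = ["Sydney", "Melbourne", "Auckland"]

searches = plan_searches(keywords, locations)
print(len(searches))                        # 9 searches
print(len(searches) * MAX_JOBS_PER_SEARCH)  # at most 4950 jobs
```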

Usage

from au_nz_jobs import Jobs, save_jobs

# define the keywords you want to search in a list
keywords = ['data scientist', 'data engineer']

# define the locations you want to search in a list
locations = ['Sydney', 'Melbourne']

# The default download will yield too many results (including ads and unrelated jobs)
# A check_words list is STRONGLY recommended to filter out the irrelevant jobs
# The check_words list should contain the most related words to the job you want to search
check_words = ['data', 'scientist']

# define the date_range for jobs to be downloaded, 3 means last 3 days
date_range = 3

# initiate the Jobs class
# parameters:
#   keywords: a list of keywords to search
#   locations: a list of locations to search
#   work_type: a list of work types to search, options: ['full-time', 'part-time', 'contract', 'casual'], all by default
data_jobs = Jobs(keywords, locations, work_type=['full-time', 'part-time', 'contract', 'casual'])

# download all dfs
# parameters:
#   date_range: the date range to search, 3 means last 3 days
#   check_words: a list of words to filter out the irrelevant jobs
#   sort_mode: the sort mode for the search, options: ['relevance', 'date'], date by default
df_dict = data_jobs.get_all_dfs(date_range, check_words=check_words)

# save the downloaded jobs to local files
# parameters:
#   format: csv, excel
#   single_table: True for single table, False for relational database tables
#   path: the path to save the files
#   NO need to specify the file name, the file name will be generated automatically
#   jobs.csv, jobs.xlsx, jobs.db for single table
#   [jobs,classification,sub_classification,location,area,advertiser,company_review].csv for relational database tables
save_jobs(df_dict, format='csv', single_table=True, path='data')

Roadmap

  • downloader
  • save_jobs: csv, excel
  • save_jobs: sqlite
  • add documentation to readthedocs
  • add tests
  • tableau public dashboard of data related jobs based on this package
  • analysis and visualization - will break down to smaller tasks

Contributing

If you have any questions or suggestions, please feel free to open an issue or pull request. Other developers are welcome to contribute to this project. You can also reach me by email: [email protected]

License

GPL-3.0

Credits

Credit to job-seeker for the idea.

Credit to seek/au and seek/nz for the api.

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.
