Welcome to the Green Job Titles Project repository! This project identifies job titles in a wide array of scientific literature and classifies them as "green" using advanced Natural Language Processing (NLP) tools. Below is an overview of the key components and data files included in this repository.
The primary goal of this project is to update and expand existing job classification systems such as O*NET to include new and emerging green roles shaped by technological advancements, regulatory changes, and increasing environmental awareness.
Files:
- data/Green Job Titles Lists.xlsx: This Excel file contains comprehensive lists of identified green job titles derived from our analysis. Each entry includes the job title, source references, the sectors these jobs are associated with, and the countries covered. This file also includes details of the quality checks conducted and raw results from the NLP processes.
- data/Green Job Titles Robustness Checks.xlsx: This file contains the results from our robustness checks, focusing on identifying any false positives or false negatives in our dataset. The aim is to ensure the accuracy and reliability of our job title identification process.
- data/scopus.xlsx: Metadata related to all text sources from Scopus analysed in the project.
- data/wos.xls: Metadata related to all text sources from Web of Science analysed in the project.
- data/Green Knowledge Database.xlsx: This Excel file contains all metadata related to articles from Scopus and Web of Science (WoS), along with information on the countries covered by the authors, obtained using an LLM.
- tr_robustness_checks.ipynb: This Jupyter notebook presents a list of pseudorandomly selected articles used for robustness checks.
The easiest way to view the Excel files is to download the whole repository and open them locally (see: https://docs.github.com/en/get-started/start-your-journey/downloading-files-from-github).
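If you prefer to inspect the files programmatically after downloading the repository, the following is a minimal sketch in Python. It assumes the script runs from the repository root and that pandas is installed together with an Excel engine (typically openpyxl for `.xlsx` and xlrd for `.xls`). The helper names `missing_files` and `load_workbook_sheets` are hypothetical and not part of this repository, and the sheet names and columns inside each workbook are not assumed.

```python
from pathlib import Path

# File names as listed above; running from the repository root is an assumption.
DATA_FILES = [
    "data/Green Job Titles Lists.xlsx",
    "data/Green Job Titles Robustness Checks.xlsx",
    "data/scopus.xlsx",
    "data/wos.xls",
    "data/Green Knowledge Database.xlsx",
]


def missing_files(repo_root="."):
    """Return the listed data files that are not present under repo_root."""
    root = Path(repo_root)
    return [name for name in DATA_FILES if not (root / name).is_file()]


def load_workbook_sheets(path):
    """Read every sheet of an Excel workbook into a dict of DataFrames."""
    # Imported lazily so the file check above works without pandas installed.
    import pandas as pd

    # sheet_name=None asks pandas for all sheets, keyed by sheet name.
    return pd.read_excel(path, sheet_name=None)


if __name__ == "__main__":
    absent = missing_files()
    if absent:
        print("Missing files:", absent)
    else:
        for name in DATA_FILES:
            for sheet, frame in load_workbook_sheets(name).items():
                # Print the size of each sheet as a quick sanity check.
                print(name, sheet, frame.shape)
```

Reading all sheets at once is a convenience for a first look; for the larger workbooks it may be quicker to pass a specific sheet name to `pd.read_excel` once you know which one you need.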