This project was forked from netop-team/teleqna.


TeleQnA: A Benchmark dataset for evaluating telecommunications knowledge of large language models

License: MIT License


TeleQnA

Introduction

TeleQnA is a comprehensive dataset tailored to assess the knowledge of Large Language Models (LLMs) in the field of telecommunications. It encompasses 10,000 multiple-choice questions distributed across five distinct categories:

  • Lexicon: This category comprises 500 questions that delve into the realm of general telecom terminology and definitions.

  • Research overview: Comprising 2,000 questions, this category provides a broad overview of telecom research, spanning a wide spectrum of telecom-related topics.

  • Research publications: With 4,500 questions, this category contains detailed inquiries regarding multidisciplinary research in telecommunications, drawing from a variety of sources such as transactions and conference proceedings.

  • Standards overview: This category consists of 1,000 questions related to summaries of standards from multiple standardization bodies like 3GPP and IEEE.

  • Standards specifications: With 2,000 questions, this category explores the technical specifications and practical implementations of telecommunications systems, leveraging information from standardization bodies like 3GPP and IEEE.

For more in-depth information about the dataset and the generation process, please refer to our paper (arXiv:2310.15051). To prevent inadvertent data contamination with models trained using GitHub data, we have implemented a password protection measure for unzipping the dataset. The password is teleqnadataset.
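The archive can also be extracted from Python itself: the standard-library zipfile module accepts the password via its pwd argument. A minimal sketch, assuming the archive contains a single JSON file named TeleQnA.txt (the inner file name is an assumption); a tiny stand-in archive is built first so the snippet is self-contained:

```python
import json
import os
import tempfile
import zipfile

# Build a small stand-in archive so the snippet runs end to end; in practice
# you would point `archive` at the repository's TeleQnA.zip instead.
sample = {"question 0": {"question": "placeholder", "answer": "option 1: placeholder"}}
workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, "TeleQnA.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("TeleQnA.txt", json.dumps(sample))  # inner file name is an assumption

# Extract the archive; pwd= supplies the password for the real, protected zip.
with zipfile.ZipFile(archive) as zf:
    zf.extractall(workdir, pwd=b"teleqnadataset")

# Load the extracted JSON into a dict of questions.
with open(os.path.join(workdir, "TeleQnA.txt")) as f:
    data = json.load(f)
print(len(data))
```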

Dataset Format

Each question is represented in JSON format, comprising five distinct fields:

  • Question: This field consists of a string that presents the question associated with a specific concept within the telecommunications domain.

  • Options: These fields ("option 1", "option 2", etc.) each contain a string representing one of the available answer options.

  • Answer: This field contains a string that adheres to the format "option ID: Answer" and presents the correct response to the question. A single option is correct; however, options may include choices like "All of the Above" or "Both options 1 and 2".

  • Explanation: This field encompasses a string that clarifies the reasoning behind the correct answer.

  • Category: This field includes a label identifying the source category (e.g., lexicon, research overview, etc.).

Dataset Instance

An example of the dataset is provided below:

"question 2045": {
    "question": "What is the maximum number of eigenmodes that the MIMO channel can support? (nt is the number of transmit antennas, nr is the number of receive antennas)",
    "option 1": "nt",
    "option 2": "nr",
    "option 3": "min(nt, nr)",
    "option 4": "max(nt, nr)",
    "answer": "option 3: min(nt, nr)",
    "explanation": "The maximum number of eigenmodes that the MIMO channel can support is min(nt, nr).",
    "category": "Research publications"
}
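The fields above can be consumed programmatically. The sketch below (illustrative, not code from the repository) holds the example entry as an in-memory dict and shows how the "option ID: Answer" format ties the answer back to its option field:

```python
# Illustrative entry, copied from the dataset instance above.
entry = {
    "question": "What is the maximum number of eigenmodes that the MIMO "
                "channel can support? (nt is the number of transmit antennas, "
                "nr is the number of receive antennas)",
    "option 1": "nt",
    "option 2": "nr",
    "option 3": "min(nt, nr)",
    "option 4": "max(nt, nr)",
    "answer": "option 3: min(nt, nr)",
    "explanation": "The maximum number of eigenmodes that the MIMO channel "
                   "can support is min(nt, nr).",
    "category": "Research publications",
}

# Gather the numbered option fields and split the answer into its ID and text.
options = {k: v for k, v in entry.items() if k.startswith("option ")}
option_id, _, answer_text = entry["answer"].partition(": ")

# The answer string is consistent with the option field it references.
assert options[option_id] == answer_text
print(option_id, "->", answer_text)  # option 3 -> min(nt, nr)
```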

Experiments Code

The provided code allows you to evaluate the performance of OpenAI's models (e.g., GPT-3.5). To do so, follow the steps below:

  • Clone the repository
  • Unzip TeleQnA.zip using the password teleqnadataset
  • Install the required dependencies using the following command:

pip install -r requirements.txt

  • Insert OpenAI's API key into the evaluation_tools script.
  • Run the command below:

python run.py

Upon completion, a .txt file in JSON format is generated. This file contains the original dataset, with two additional fields added to each question:

  • tested answer: This field contains the answer chosen by the tested model.

  • correct: This field is marked as "True" when the tested answer matches the designated correct answer in the dataset.
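Once the results file is generated, per-category accuracy can be computed from the two added fields. A minimal sketch, assuming the file parses to a dict of questions as shown earlier; the in-memory results here are stand-ins, and since the exact serialization of "correct" may vary, both the boolean and string forms are handled:

```python
from collections import defaultdict

# Stand-in results; in practice, load the generated .txt file with json.load().
results = {
    "question 1": {"category": "Lexicon", "tested answer": "option 2", "correct": True},
    "question 2": {"category": "Lexicon", "tested answer": "option 1", "correct": False},
    "question 3": {"category": "Standards overview", "tested answer": "option 3", "correct": "True"},
}

def is_correct(q):
    # Accept either a boolean or the string "True" (serialization may vary).
    return q["correct"] in (True, "True")

# Tally totals and correct answers per category.
totals, hits = defaultdict(int), defaultdict(int)
for q in results.values():
    totals[q["category"]] += 1
    hits[q["category"]] += is_correct(q)

for cat in totals:
    print(f"{cat}: {hits[cat] / totals[cat]:.2%}")

overall = sum(hits.values()) / sum(totals.values())
print(f"overall: {overall:.2%}")
```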

Citation

If you would like to use the data or code, please cite the paper:

@misc{maatouk2023teleqna,
      title={TeleQnA: A Benchmark Dataset to Assess Large Language Models Telecommunications Knowledge}, 
      author={Ali Maatouk and Fadhel Ayed and Nicola Piovesan and Antonio De Domenico and Merouane Debbah and Zhi-Quan Luo},
      year={2023},
      eprint={2310.15051},
      archivePrefix={arXiv},
      primaryClass={cs.IT}
     }

