quality-attributes / datasets

Official data sources for the Quality Attributes project

Databases

Official data sources for the Quality Attributes project, used to train, test, and validate whether non-functional requirements related to quality attributes can be found in GitHub issue reports.

Training Set

On December 14th, 2019, the site http://ctp.di.fct.unl.pt/RE2017/pages/submission/data_papers/ was visited to obtain the PROMISE dataset, which was published there as part of a data challenge.

Sayyad Shirabad, J. and Menzies, T.J. (2005) The PROMISE Repository of Software Engineering Databases. School of Information Technology and Engineering, University of Ottawa, Canada. Available: http://promise.site.uottawa.ca/SERepository

The requirement labels in this dataset (spanning 15 different projects) are distributed as follows:

| Class | Quantity | Percentage |
|---|---|---|
| Functional (F) | 255 | 40.80% |
| Availability (A) | 21 | 3.36% |
| Fault Tolerance (FT) | 10 | 1.60% |
| Legal (L) | 13 | 2.08% |
| Look & Feel (LF) | 38 | 6.08% |
| Maintainability (MN) | 17 | 2.72% |
| Operational (O) | 62 | 9.92% |
| Performance (PE) | 54 | 8.64% |
| Portability (PO) | 1 | 0.16% |
| Scalability (SC) | 21 | 3.36% |
| Security (SE) | 66 | 10.56% |
| Usability (US) | 67 | 10.72% |
| Total | 625 | 100% |

For this study, only a subset of the dataset was considered: the classes that correspond to quality attribute categories. The selection also responds to the class imbalance in the full dataset:

| Class | Quantity | Percentage |
|---|---|---|
| Availability (A) | 21 | 8.20% |
| Fault Tolerance (FT) | 10 | 3.91% |
| Maintainability (MN) | 17 | 6.64% |
| Performance (PE) | 54 | 21.09% |
| Scalability (SC) | 21 | 8.20% |
| Security (SE) | 66 | 25.78% |
| Usability (US) | 67 | 26.17% |
| Total | 256 | 100% |
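
The percentages in the two tables above can be recomputed from the raw counts. The following is a minimal sketch, not code taken from this repository; the names `FULL_COUNTS`, `SELECTED`, and `distribution` are illustrative assumptions, and the counts are copied from the first table.

```python
# Label counts copied from the full PROMISE table above.
FULL_COUNTS = {
    "F": 255, "A": 21, "FT": 10, "L": 13, "LF": 38, "MN": 17,
    "O": 62, "PE": 54, "PO": 1, "SC": 21, "SE": 66, "US": 67,
}

# Classes kept for the study (the quality attribute categories).
SELECTED = ["A", "FT", "MN", "PE", "SC", "SE", "US"]


def distribution(counts):
    """Map each label to (count, percentage of the total, rounded to 2 places)."""
    total = sum(counts.values())
    return {label: (n, round(100 * n / total, 2)) for label, n in counts.items()}


# Restrict the full counts to the selected classes, then recompute percentages.
subset = {label: FULL_COUNTS[label] for label in SELECTED}
subset_distribution = distribution(subset)

for label, (count, percent) in subset_distribution.items():
    print(f"{label:>2} {count:>3} {percent:6.2f}%")
```

Because each percentage is rounded independently, the rounded values do not necessarily add up to exactly 100.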

Test Set

Based upon the book:

Miller, Roxanne E., 2009, The Quest for Software Requirements, MavenMark Books, Milwaukee, WI

40 different non-functional requirements associated with quality attributes were collected from the following categories (matching those included in the training set):

  • Access Security
  • Availability
  • Usability
  • Maintainability
  • Scalability

Validation Set

According to the State of the Octoverse 2019 report, the open source projects with the most contributors on GitHub were as follows:

| Place | Repository | Contributors |
|---|---|---|
| 01 | microsoft/vscode | 19.1k |
| 02 | MicrosoftDocs/azure-docs | 14k |
| 03 | flutter/flutter | 13k |
| 04 | firstcontributions/first-contributions | 11.6k |
| 05 | tensorflow/tensorflow | 9.9k |
| 06 | facebook/react-native | 9.1k |
| 07 | kubernetes/kubernetes | 6.9k |
| 08 | DefinitelyTyped/DefinitelyTyped | 6.9k |
| 09 | ansible/ansible | 6.8k |
| 10 | home-assistant/home-assistant | 6.3k |

The selected repositories describe different software systems. Documentation repositories were excluded, and among projects with the same scope (e.g. flutter and react-native) only one was kept. Data were collected using quality-attributes/issue-collector for the following repositories:

  1. microsoft/vscode
  2. flutter/flutter
  3. tensorflow/tensorflow
  4. kubernetes/kubernetes
  5. ansible/ansible

Note: Only the latest 100 issues (as of February 20th, 2020) for each repository were collected, due to limitations of GitHub's API v4.
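
To illustrate the limit mentioned in the note, here is a hedged sketch of how the latest 100 issues of a repository could be requested from GitHub's GraphQL API (v4), which caps the `first` argument of a connection at 100 items per request. This is not the code of quality-attributes/issue-collector. The helper names `build_payload` and `fetch_latest_issues`, the selected issue fields, and the ordering by creation date are all assumptions made for this example.

```python
import json
import urllib.request

GITHUB_GRAPHQL_URL = "https://api.github.com/graphql"

# Assumption: "latest" is interpreted as most recently created.
QUERY = """
query($owner: String!, $name: String!, $count: Int!) {
  repository(owner: $owner, name: $name) {
    issues(first: $count, orderBy: {field: CREATED_AT, direction: DESC}) {
      nodes { number title body createdAt }
    }
  }
}
"""

REPOSITORIES = [
    "microsoft/vscode",
    "flutter/flutter",
    "tensorflow/tensorflow",
    "kubernetes/kubernetes",
    "ansible/ansible",
]


def build_payload(full_name, count=100):
    """Build the GraphQL request body for one "owner/name" repository."""
    if not 1 <= count <= 100:
        # A single connection request returns at most 100 nodes.
        raise ValueError("count must be between 1 and 100")
    owner, name = full_name.split("/", 1)
    return {"query": QUERY, "variables": {"owner": owner, "name": name, "count": count}}


def fetch_latest_issues(full_name, token, count=100):
    """POST the query and return the list of issue nodes (needs a personal access token)."""
    request = urllib.request.Request(
        GITHUB_GRAPHQL_URL,
        data=json.dumps(build_payload(full_name, count)).encode("utf-8"),
        headers={"Authorization": f"bearer {token}", "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    return result["data"]["repository"]["issues"]["nodes"]
```

Calling `fetch_latest_issues(name, token)` for each entry in `REPOSITORIES` would yield up to 100 issues per repository. Collecting more than that would require cursor-based pagination, which this sketch leaves out.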
