This project forked from signavio/sap-sam

Example source code for SAP Signavio Academic Models (SAP-SAM)

License: Apache License 2.0

SAP Signavio Academic Models (SAP-SAM)

This repository contains the source code for the paper SAP Signavio Academic Models: A Large Process Model Dataset by Diana Sola, Christian Warmuth, Bernhard Schäfer, Peyman Badakhshan, Jana-Rebecca Rehse, and Timotheus Kampik.

Link to the paper: https://arxiv.org/abs/2208.12223 (pre-print)

Link to the dataset: https://zenodo.org/record/7012043

License

The example code in this repository is licensed as follows. Note that a different license applies to the dataset itself!

Copyright (c) 2022 by SAP.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

The following license applies to the SAP-SAM dataset.

Copyright (c) 2022 by SAP.

SAP grants to Recipient a non-exclusive copyright license to the Model Collection to use the Model Collection for Non-Commercial Research purposes of evaluating Recipient’s algorithms or other academic research artefacts against the Model Collection. Any rights not explicitly granted herein are reserved to SAP. For the avoidance of doubt, no rights to make derivative works of the Model Collection is granted and the license granted hereunder is for Non-Commercial Research purposes only.

"Model Collection" shall mean all files in the archive (which are JSON, XML, or other representation of business process models or other models).

"Recipient" means any natural person receiving the Model Collection.

"Non-Commercial Research" means research solely for the advancement of knowledge whether by a university or other learning institution and does not include any commercial or other sales objectives.

Citing SAP-SAM

@misc{SAP-SAM-paper,
  doi = {10.48550/ARXIV.2208.12223},
  url = {https://arxiv.org/abs/2208.12223},
  author = {Sola, Diana and Warmuth, Christian and Schäfer, Bernhard and Badakhshan, Peyman and Rehse, Jana-Rebecca and Kampik, Timotheus},
  keywords = {Other Computer Science (cs.OH), Software Engineering (cs.SE), FOS: Computer and information sciences, FOS: Computer and information sciences},
  title = {SAP Signavio Academic Models: A Large Process Model Dataset},
  publisher = {arXiv},
  year = {2022},
  copyright = {arXiv.org perpetual, non-exclusive license}
}

or

@dataset{SAP-SAM-dataset,
  author       = {Kampik, Timotheus and Warmuth, Christian and Sola, Diana and Schäfer, Bernhard and Axworthy, Liz and Ivarsson, Erica and
                  Ouda, Karim and Eickhoff, David},
  title        = {SAP Signavio Academic Models},
  month        = aug,
  year         = 2022,
  publisher    = {Zenodo},
  version      = {0.5.1},
  doi          = {10.5281/zenodo.6964944},
  url          = {https://doi.org/10.5281/zenodo.6964944}
}

Setup

Download the dataset (see the link to the dataset above) and place it in the folder ./data/raw so that the models are in ./data/raw/sap_sam_2022/models.

We provide two conda environment.yml files that can be used to create a new environment and install the required dependencies:

  • environment.yml: contains the abstract dependencies (pandas, numpy, ...).
  • environment-lock.yml: contains versions for all dependencies and the transitive dependencies to ensure reproducible results.

You can use the following conda command to create the environment:

conda env create -f environment.yml  

or

conda env create -f environment-lock.yml  

Getting started

We provide a tutorial Jupyter Notebook that illustrates the dataset format in more detail and shows how to use the csv parsers developed in ./src.

The properties Jupyter Notebook gives an overview of selected properties of the dataset.

Dataset Format

The dataset consists of 103 CSV files, roughly 38 GB in total, containing process models (see the modeling notations listed below).
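As a quick sanity check after downloading, the number of CSV files and their total size can be compared against the figures above. The following is a minimal sketch, not part of the repository's code; the helper name summarize_csv_files is hypothetical, and the sketch assumes the files carry a .csv extension and sit directly in the models folder named in the Setup section.

```python
from pathlib import Path


def summarize_csv_files(models_dir):
    """Return (number of CSV files, total size in bytes) for a folder."""
    # Assumption: the dataset files end in .csv and are not nested in subfolders.
    csv_files = sorted(Path(models_dir).glob("*.csv"))
    total_bytes = sum(f.stat().st_size for f in csv_files)
    return len(csv_files), total_bytes


if __name__ == "__main__":
    # Folder taken from the Setup section; adjust it if your layout differs.
    models_dir = Path("./data/raw/sap_sam_2022/models")
    if models_dir.is_dir():
        count, total_bytes = summarize_csv_files(models_dir)
        print(f"{count} CSV files, {total_bytes / 1e9:.1f} GB")
    else:
        print(f"{models_dir} not found; download the dataset first")
```

If the reported count or size differs noticeably from the figures stated here, the download or extraction was probably incomplete.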

CSV Format

  1. csv columns:
    • Revision ID: Unique identifier for model revision
    • Model ID: Unique identifier for model
    • Organization ID: Unique identifier for organization this model originates from
    • Datetime: Date and time of creation
    • Model JSON: JSON containing model information
    • Description: Description of model (typically empty)
    • Name: Model name
    • Type: Model type (duplicate and less specific than namespace)
    • Namespace: Stencilset/modeling notation (e.g. BPMN, DMN, UML,...)
  2. Number of models: 1,021,471
  3. Number of models by modeling notation:
    Modeling notation                 Frequency
    BPMN 2.0                          618,807
    Value Chain                       194,078
    DMN 1.0                           98,286
    EPC                               32,369
    BPMN 1.0                          15,643
    UML 2.2 Class                     14,953
    Petri Net                         11,207
    ArchiMate 2.1                     10,956
    UML Use Case                      10,228
    Organigram                        4,568
    BPMN 2.0 Choreography             4,096
    BPMN 2.0 Conversation             2,788
    FMC Block Diagram                 1,398
    CMMN 1.0                          999
    CPN                               385
    Journey Map                       287
    YAWL 2.2                          238
    Process Documentation Template    86
    jBPM 4                            76
    XForms                            20
    Chen Notation                     3
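
The column layout described above can be explored with the Python standard library alone. The sketch below is an illustration under assumptions, not the parser shipped in ./src: it assumes each CSV file has a header row using exactly the column names listed above, and the function names count_models_by_namespace and iter_model_json are hypothetical.

```python
import csv
import json
from collections import Counter

# Assumption: the header row uses the column names listed in this README.
NAMESPACE_COLUMN = "Namespace"
MODEL_ID_COLUMN = "Model ID"
MODEL_JSON_COLUMN = "Model JSON"

# "Model JSON" cells can be larger than the csv module's default field limit.
csv.field_size_limit(2**31 - 1)


def count_models_by_namespace(csv_path):
    """Count distinct Model IDs per namespace in one CSV file."""
    seen_model_ids = set()
    counts = Counter()
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            model_id = row[MODEL_ID_COLUMN]
            if model_id not in seen_model_ids:
                seen_model_ids.add(model_id)
                counts[row[NAMESPACE_COLUMN]] += 1
    return counts


def iter_model_json(csv_path):
    """Yield (model ID, parsed model JSON) for every row of one CSV file."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            yield row[MODEL_ID_COLUMN], json.loads(row[MODEL_JSON_COLUMN])
```

Both functions read a file row by row rather than loading it whole, which matters given the size of the dataset. Summing the counters over all files should approximate the frequency table above, provided a model does not appear in more than one file.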

Dummy Data

To remove personal first and last names, email addresses, and in some cases matriculation numbers (which users added in violation of the T&Cs), we applied a simple replacement script. Specifically, we replaced, to the extent possible, email addresses, names, and (matriculation) numbers with the following dummy values:

Context Dummy
Email Dummy [email protected]
Name Dummy Jane Doe
Matriculation/Number Dummy 12345678
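
To give a rough idea of what such a replacement might look like, here is a minimal sketch. It is not the script that was applied to the dataset: the regular expressions, the function name replace_personal_data, and the dummy email address are assumptions made for illustration, and name replacement (which cannot be done reliably with a simple pattern) is left out.

```python
import re

# Hypothetical placeholder; the dummy email actually used may differ.
DUMMY_EMAIL = "dummy@example.org"
# Dummy number as listed in the table above.
DUMMY_NUMBER = "12345678"

# Assumption: a simple pattern that covers common email address shapes.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Assumption: standalone runs of 6 to 10 digits are treated as
# candidate matriculation numbers.
NUMBER_PATTERN = re.compile(r"\b\d{6,10}\b")


def replace_personal_data(text):
    """Replace email addresses and long digit runs with dummy values."""
    text = EMAIL_PATTERN.sub(DUMMY_EMAIL, text)
    return NUMBER_PATTERN.sub(DUMMY_NUMBER, text)
```

A pattern-based approach like this inevitably misses some cases and overmatches others, which is consistent with the "to the extent possible" caveat above.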

Project Organization

├── data
│   ├── interim           <- Intermediate data that has been transformed.
│   └── raw               <- The raw dataset should be placed in this folder.
├── notebooks             <- Jupyter notebooks.
├── reports            
│   └── figures           <- Generated graphics and figures used in the paper.
├── src               
│   └── sapsam            <- Source code and dictionaries for use in this project.
├── LICENSE               <- License that applies to the example code in this repository.
├── README.md             <- The top-level README for developers using this project.
├── environment-lock.yml  <- Contains versions for all dependencies and the transitive dependencies to ensure reproducible results.
├── environment.yml       <- Contains the abstract dependencies (pandas, numpy, ...).
└── setup.py              <- Makes project pip installable (pip install -e .) such that src can be imported.
