EGA-SE-user-docs

This repository contains user information on how to submit files to the Swedish Sensitive Data Archive (SDA) service. This document outlines the submission process and links to more detailed guides on the individual steps for creating a submission.

Preparing the data

There are currently no restrictions on the kinds of files that can be uploaded to the SDA. We still recommend spending some time checking whether file types intended for archiving are available for your data, and using those. For example, for alignment data, use the CRAM format instead of SAM/BAM.

Preparing the system

Submitting data requires two tools, crypt4gh for encrypting the data, and s3cmd for uploading to the SDA.

To prepare the system, follow the instructions here. This is a required step whether you work on Bianca or any other system.

Note: If you are using Bianca, see the Encrypting on Bianca instructions, which cover transferring the crypt4gh tool and the public key to Bianca.

Encrypting

The first step of a submission is to encrypt the submission data files with the crypt4gh encryption tool. This is fairly straightforward in the general case, but needs a few extra steps on Bianca, as that system isn't directly connected to the internet. The process for Bianca is detailed in the Encrypting on Bianca document.

Outside of Bianca, this can be limited to these two or three steps:

  • Download the crypt4gh public key

    For the SDA to be able to understand the encrypted files when they are uploaded, they need to be encrypted with the correct public key. This key can be downloaded from this repository with this command if you are uploading to the SDA:

    wget https://raw.githubusercontent.com/NBISweden/EGA-SE-user-docs/main/crypt4gh_key.pub

    or this command if you are uploading to Big Picture:

    wget https://raw.githubusercontent.com/NBISweden/EGA-SE-user-docs/main/crypt4gh_bp_key.pub
  • (Optional) Create a personal key pair

    The creation of a key pair is very simple using crypt4gh. Run the following command, replacing [my-key] with the name of the key (can be anything) and specifying a passphrase when requested:

    crypt4gh generate --name=[my-key]

    To verify that the key pair was created, run the ls command and make sure the keys you specified exist in the folder.

  • Encrypt the files

    Now that you have the public key, and the tools you need, you can encrypt the submission files. An encryption key will be created automatically by the tool, but if you prefer to use a specific key, you can provide one using the -s argument.

    ./crypt4gh encrypt -p [crypt4gh-public-key.pub] -f [my-file] [-s [my-key].sec.pem]

    where crypt4gh-public-key.pub is the crypt4gh public key you downloaded in the previous step.
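When a submission consists of many files, the encrypt command above can be wrapped in a loop. The following is a minimal sketch, not part of the official instructions: the function name encrypt_dir, the CRYPT4GH variable and the directory layout are illustrative assumptions, and it assumes the tool writes its output with a .c4gh suffix, as the upload examples later in this document suggest.

```shell
# Path to the crypt4gh binary; override if it is installed elsewhere (assumption).
CRYPT4GH="${CRYPT4GH:-./crypt4gh}"

# encrypt_dir PUBLIC_KEY DIRECTORY
# Encrypts every regular file directly inside DIRECTORY with PUBLIC_KEY.
encrypt_dir() {
    local pubkey="$1" dir="$2" f
    for f in "$dir"/*; do
        # Skip subdirectories and anything that is not a regular file.
        [ -f "$f" ] || continue
        # Skip files that already carry the .c4gh suffix.
        case "$f" in *.c4gh) continue ;; esac
        "$CRYPT4GH" encrypt -p "$pubkey" -f "$f" || return 1
    done
}

# Example call (placeholder paths):
# encrypt_dir crypt4gh_key.pub ./submission
```

The loop stops at the first file that fails to encrypt, so a partial run is easy to notice.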

Submitting

Once your files are encrypted, you are almost ready to start submitting to the SDA. Only one more thing is needed: checksums for the files.

Get checksums of the files

Once the files are uploaded, they need to be validated, which will require you to enter md5 and sha256 checksums for each file. To prepare for this later step, it's advised to create checksums for all files as you upload them.

Check our short guide on Calculating checksums of files for details.
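As a rough illustration of what collecting both checksums can look like, the sketch below prints the md5 and sha256 values for each file passed to it. It assumes the GNU coreutils tools md5sum and sha256sum are available; the function name print_checksums and the output layout are illustrative, and the linked guide remains the authority on which files to checksum.

```shell
# print_checksums FILE...
# Prints one line per file: "<md5>  <sha256>  <file name>".
print_checksums() {
    local f md5 sha
    for f in "$@"; do
        # Stop early if a file is missing or unreadable.
        [ -r "$f" ] || { echo "cannot read $f" >&2; return 1; }
        md5=$(md5sum < "$f" | cut -d ' ' -f 1)
        sha=$(sha256sum < "$f" | cut -d ' ' -f 1)
        printf '%s  %s  %s\n' "$md5" "$sha" "$f"
    done
}

# Example call (placeholder file names), saving the result for the validation step:
# print_checksums file1.c4gh file2.c4gh > checksums.txt
```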

Get the configuration file

The s3cmd tool requires a configuration file with the relevant settings. You can get the configuration file by logging in with your Elixir ID here for SDA and here for Big Picture.

If you choose not to use the downloaded configuration file, we recommend setting the multipart chunk size significantly higher than the default 5 Mbyte. It can be set up to 2 Gbytes but values above 100 Mbyte probably do very little to improve throughput.
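One way to set the chunk size without editing a configuration file is the s3cmd command-line option --multipart-chunk-size-mb. The sketch below wraps it in a small helper; the function name put_with_chunk_size, the S3CMD variable and the value 100 in the example call are illustrative choices, not settings prescribed by the SDA.

```shell
# Path to the s3cmd executable; adjust to your installation (assumption).
S3CMD="${S3CMD:-./s3cmd-2.2.0/s3cmd}"

# put_with_chunk_size SIZE_MB CONFIG FILE TARGET
# Uploads FILE to TARGET using multipart chunks of SIZE_MB megabytes.
put_with_chunk_size() {
    local size_mb="$1" config="$2" file="$3" target="$4"
    "$S3CMD" -c "$config" --multipart-chunk-size-mb="$size_mb" put "$file" "$target"
}

# Example call (placeholders in brackets):
# put_with_chunk_size 100 [my-path-to-s3-config] [my-path-to-file1.c4gh] "s3://[username]/file1.c4gh"
```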

NOTE: The following section requires the usage of [username] when uploading data. The username refers to the value of the secret_key in the downloaded configuration file. Make sure to get it from the configuration file and use it in all s3cmd commands.

Upload the file(s)

S3 allows for optional creation of folders. Folder creation is automatic when adding a directory name to a file upload.

For example, if file1.c4gh should be stored under the experiment1 folder, the command would look like:

./s3cmd-2.2.0/s3cmd -v -c [my-path-to-s3-config] put [my-path-to-file1.c4gh] "s3://[username]/experiment1/file1.c4gh"

while in case no folder needs to be created, the command would look like:

./s3cmd-2.2.0/s3cmd -v -c [my-path-to-s3-config] put [my-path-to-file1.c4gh] "s3://[username]/file1.c4gh"

Once the upload is finished, make sure the file was uploaded, by running the following command:

./s3cmd-2.2.0/s3cmd -v -c [my-path-to-s3-config] ls "s3://[username]/[my-s3-path]/"

You should be able to see the file and potentially others stored in the same location.
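The put and ls commands above can be combined to upload a whole directory of encrypted files into one folder and then list the result. This is a minimal sketch under a few assumptions: the function name upload_dir and the S3CMD variable are illustrative, and the encrypted files are assumed to end in .c4gh.

```shell
# Path to the s3cmd executable; adjust to your installation (assumption).
S3CMD="${S3CMD:-./s3cmd-2.2.0/s3cmd}"

# upload_dir CONFIG USERNAME FOLDER DIRECTORY
# Uploads every .c4gh file in DIRECTORY to s3://USERNAME/FOLDER/ and lists the folder.
upload_dir() {
    local config="$1" username="$2" folder="$3" dir="$4" f
    for f in "$dir"/*.c4gh; do
        # If nothing matches, the pattern stays unexpanded; skip it.
        [ -f "$f" ] || continue
        "$S3CMD" -v -c "$config" put "$f" "s3://$username/$folder/$(basename "$f")" || return 1
    done
    # List the folder so the uploaded files can be checked by eye.
    "$S3CMD" -v -c "$config" ls "s3://$username/$folder/"
}

# Example call (placeholders in brackets):
# upload_dir [my-path-to-s3-config] [username] experiment1 ./submission
```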

Download Instructions

This section provides guidelines and instructions on how to download the data files. It contains three steps: Generate Public and Secret Key, Decrypt the data files, and Validate the decrypted files.

Preparing the system

Downloading data files requires crypt4gh for decrypting the data files.

Generate Public and Secret Key

The first step is to generate a key pair with the crypt4gh tool, using the following command. Two keys are generated: user.sec is the secret key and user.pub is the public key. Reply to the FEGA-SE helpdesk with the generated public key, user.pub.

crypt4gh-keygen --sk user.sec --pk user.pub

Decrypt the data files

You will receive an email from the FEGA-SE helpdesk that contains a URL. This URL contains links to download the encrypted data files. Download the download_data script from here and run the following command.

./download_data.sh

If the text file has been renamed or is in a different directory than the download script, run:

./download_data.sh path/filename

This downloads all the encrypted files listed in the text file and preserves the structure of the dataset. Next, transfer the downloaded encrypted files and the secret key to a secure environment. Inside the secure environment, use the following command to decrypt the files with the secret key, user.sec, that was generated in the previous step.

crypt4gh decrypt --sk user.sec < encrypted-file.c4gh > encrypted-file

Ensure that the encrypted-file.c4gh and the corresponding encrypted-file are in the same folder.
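For datasets with many files, the decrypt command above can be run in a loop. The sketch below is an illustration only: the function name decrypt_dir and the CRYPT4GH variable are assumptions, it handles a single directory rather than nested folders, and it writes each decrypted file next to its .c4gh source with the suffix removed.

```shell
# Name or path of the crypt4gh command; override if needed (assumption).
CRYPT4GH="${CRYPT4GH:-crypt4gh}"

# decrypt_dir SECRET_KEY DIRECTORY
# Decrypts every .c4gh file directly inside DIRECTORY using SECRET_KEY.
decrypt_dir() {
    local seckey="$1" dir="$2" f
    for f in "$dir"/*.c4gh; do
        # If nothing matches, the pattern stays unexpanded; skip it.
        [ -f "$f" ] || continue
        # "${f%.c4gh}" is the same path without the .c4gh suffix.
        "$CRYPT4GH" decrypt --sk "$seckey" < "$f" > "${f%.c4gh}" || return 1
    done
}

# Example call (placeholder directory):
# decrypt_dir user.sec ./downloaded-dataset
```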

Validate the decrypted files

The next step is to validate the decrypted files. This can be done by calculating checksums of the decrypted files. Executing the download_data script downloads a file, checksums_list.sha256, that contains the list of checksums. The following command validates the decrypted files.

sha256sum -c checksums_list.sha256
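The check above can be wrapped so that it prints a single summary line and returns a non-zero status on failure, which is convenient in scripts. This sketch assumes GNU coreutils sha256sum (for the --quiet option) and that it is run from the directory the file names in the checksum list are relative to; the function name verify_download is illustrative.

```shell
# verify_download [CHECKSUM_LIST]
# Verifies the files named in CHECKSUM_LIST (default: checksums_list.sha256).
verify_download() {
    local list="${1:-checksums_list.sha256}"
    # --quiet suppresses the per-file OK lines; failures are still reported.
    if sha256sum -c --quiet "$list"; then
        echo "all files match"
    else
        echo "checksum mismatch or missing file" >&2
        return 1
    fi
}

# Example call from the dataset directory:
# verify_download checksums_list.sha256
```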

Finally, the requester replies to the FEGA-SE helpdesk to confirm successful download and decryption.

ega-se-user-docs's Issues

Update documentation for SDA

Update the current user documentation so it is in line with submitting to the standalone archive.

Focus on uploading from Bianca.

First Submission

Short Summary: Use API to get rid of extra files in the submission.

In the first submission to the Central EGA, the data owner chose to submit only 10 files to test the system, but the metadata he uploaded contained information about all files, which caused the submission to fail. It looks like it is possible to use the API to remove the extra entries.

Login URL for the API in the test system:
https://test.ega-archive.org/submission-api/v1/login

Here is documentation on deleting objects:
https://ega-archive.org/submission/programmatic_submissions/submitting-metadata#deletingObjects

And here is for logging in and so on:
https://ega-archive.org/submission/programmatic_submissions/how-to-use-the-api

(the full documentation is available on ega-archive.org and then menu Submission -> Tools -> Rest API)

Tasks:

  • Create a list of requests to delete objects
  • Email Adam with information
  • If requested, create a script for deleting the objects in the first submission
