snijderlab / stitch

Template-based assembly of proteomics short reads for de novo antibody sequencing and repertoire profiling

License: MIT License

C# 100.00%
mass-spectrometry antibody sequencing

stitch's Introduction

Stitch

Template-based assembly of proteomics short reads for de novo antibody sequencing and repertoire profiling.

Getting started

There are distributed executable files for Windows (x64, arm64), Linux (x64, arm64) and macOS (x64, arm64). If you use any other platform, see 'Building'. To use these, first download the latest package, found on the releases page. Unpack the archive for your system and run the executable from the command line, passing the filename of the batch file to be used.

Windows:

.\stitch.exe run batchfiles\monoclonal.txt           (x64)
.\stitch_arm.exe run batchfiles\monoclonal.txt       (arm64)

Linux:

(x64, should work on most distros)
chmod +x ./stitch.bin                      (give running permission to the binary)
./stitch.bin run batchfiles/monoclonal.txt

(arm64)
chmod +x ./stitch_arm                      (give running permission to the binary)
./stitch_arm run batchfiles/monoclonal.txt

OSX:

(x64, minimum version macOS 10.12 Sierra)
chmod +x ./stitch.bin                      (give running permission to the binary)
./stitch run batchfiles/monoclonal.txt

(arm64, minimum version macOS 11.0 Big Sur)
chmod +x ./stitch_arm                      (give running permission to the binary)
./stitch_arm run batchfiles/monoclonal.txt

For help creating batch files see manual.pdf, which can be found on the same page.

Different versions

Releases can be found on the releases page. Nightly versions, which contain all new features but are less stable, can be found on the action page.

Building

First retrieve the source code using git clone.

git clone https://github.com/snijderlab/stitch.git stitch

The project is built with dotnet (.NET 7.0). Development is done on Windows, but it should work on all major platforms. To run the project on your own machine (not using the precompiled binaries for Linux or Windows x64), install dotnet, stay in this folder (the root) and run:

dotnet run --project stitch <path to batchfile>

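It will warn you that the assets folder is missing. This can be fixed by creating a symbolic link called assets (mklink for Windows cmd, ln -s elsewhere) in the folder in which the dll will be placed (stitch\bin\Debug\net7.0\), pointing to .\assets. The same applies to the images folder.

Windows (cmd):

mklink /J stitch\bin\debug\net7.0\assets\ assets\
mklink /J stitch\bin\debug\net7.0\images\ images\
mklink /J stitch\bin\release\net7.0\assets\ assets\
mklink /J stitch\bin\release\net7.0\images\ images\

Linux and macOS (the link target is given as an absolute path, because a relative target is resolved from the folder that contains the link; on case-sensitive file systems the folder names have to match the case of the build output, for example Debug):

ln -s "$PWD/assets" stitch/bin/debug/net7.0/assets
ln -s "$PWD/images" stitch/bin/debug/net7.0/images
ln -s "$PWD/assets" stitch/bin/release/net7.0/assets
ln -s "$PWD/images" stitch/bin/release/net7.0/images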

To generate a single executable run:

dotnet publish stitch -c release -r [target] --self-contained

The target should be a valid 'RID' (runtime identifier) for the platform you choose. If it is omitted it will default to Windows x64. See this site for information about RIDs.

Testing

There are some unit tests provided. These can be found in the 'tests' folder. To run the unit tests run (from the root folder):

dotnet test tests

Examples

The 'batchfiles' folder contains some examples which can be run to see what the program is up to. These examples are present both with the built binaries and the source code.

  • basic.txt
  • monoclonal.txt
  • polyclonal.txt

The 'benchmarks' folder contains a set of examples with a known output which are used to benchmark the program continuously. The description of these examples can be found using the following doi 10.1021/acs.jproteome.1c00913.

Credits

  • Douwe Schulte - Software engineer - d.schulte{at}uu{dot}nl
  • Joost Snijder - Principal investigator
  • Bastiaan de Graaf - Code reviews
  • Wei Wei Peng - Testing and analysis

Acknowledgements

Dependencies

  • Hecklib core, public nuget package see nuget.config for more info on the exact url
  • Stitch assets git submodule, contains the css and js to make the html report shine. A separate submodule to simplify reuse of these files.

License

MIT License (see LICENSE.md)

stitch's Issues

Use error messages in reads reading

In GitLab by @nonnominandus on Feb 14, 2020, 10:52

When reading reads, fasta or peaks files use the same error messages.

  • Reads
  • Fasta
  • Peaks

Template matching multithreading

In GitLab by @nonnominandus on Dec 3, 2019, 16:35

The template matching step is fairly easy to multithread and is a huge part of the runtime. I would suggest multithreading using the individual Smith-Waterman alignments as blocks, as this also speeds up the process when only a single template is used.
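
A minimal sketch of this suggestion in C#. It assumes a hypothetical Align function that scores one read against one template; this is a placeholder that counts identical positions, not the alignment code used in Stitch. Every (template, read) pair is one block of work, so a single template is still spread over all cores.

```csharp
using System;
using System.Threading.Tasks;

public static class ParallelMatching
{
    // Hypothetical stand-in for one alignment: counts identical positions.
    // The real Smith-Waterman scoring is different.
    public static int Align(string template, string read)
    {
        int score = 0;
        for (int i = 0; i < Math.Min(template.Length, read.Length); i++)
            if (template[i] == read[i]) score++;
        return score;
    }

    // Scores every (template, read) pair; each pair is an independent work item.
    public static int[,] MatchAll(string[] templates, string[] reads)
    {
        var scores = new int[templates.Length, reads.Length];
        Parallel.For(0, templates.Length * reads.Length, index =>
        {
            int t = index / reads.Length;
            int r = index % reads.Length;
            // Each cell is written by exactly one work item, so no locking is needed.
            scores[t, r] = Align(templates[t], reads[r]);
        });
        return scores;
    }
}
```

Flattening the two loops into one index range is what keeps the single-template case parallel; parallelising over templates alone would leave it on one thread.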

Alignment on click get to position

In GitLab by @nonnominandus on Mar 12, 2020, 16:40

It would be very nice if when you clicked on an alignment the corresponding patch on the page you enter would be highlighted. This would really help in discovering where the alignment matches.

Enforce unique fasta identifiers

In GitLab by @nonnominandus on Mar 23, 2020, 19:03

Now it would be possible to create two reads with identical names (even more likely with identical EscapeIdentifiers).

This leads to the following problems

  • Only one will be included in the HTML page
    • Unclear for the user
    • It could clash when the two reads have to be written to the file at the same time, resulting in a hard to reproduce bug

The proposal is to create a single list to check all new names against and force names to be unique by giving identical names a numbered suffix.
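
The proposal could look roughly like the sketch below. The NameRegistry class and the "_2", "_3" suffix format are assumptions made for illustration; they are not taken from the Stitch code base.

```csharp
using System;
using System.Collections.Generic;

public sealed class NameRegistry
{
    // The single list of names that have been handed out so far.
    private readonly HashSet<string> used = new HashSet<string>();

    // Returns the name itself when it is new, otherwise the name with the
    // first free numbered suffix (hypothetical format: name_2, name_3, ...).
    public string MakeUnique(string name)
    {
        if (used.Add(name)) return name;
        for (int n = 2; ; n++)
        {
            string candidate = $"{name}_{n}";
            if (used.Add(candidate)) return candidate;
        }
    }
}
```

Generated names are registered as well, so a later read that is literally called read_2 cannot clash with a generated one. The sketch is not thread safe; if reads are registered from several threads the calls would need a lock.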

Sidebar fold in/out

In GitLab by @nonnominandus on Oct 13, 2019, 14:45

The sidebar seems to fold out to its maximum width immediately when trying to change the width of the sidebar. Maybe it would be nicer if the sidebar followed the mouse position and then, after release, snapped to a sensible place (or some other way to give more visual feedback).

Contextual information contigs + paths

In GitLab by @nonnominandus on Mar 25, 2020, 19:00

Since the move to separate pages the contextual information (the location in the graph) is removed for contigs and paths. It would be nice if the details pages could have included the subgraph in which they are located including highlights.

But maybe this will not be necessary if the reads alignment proves to be good enough.

Check links in alignment

In GitLab by @nonnominandus on Mar 13, 2020, 09:41

Sometimes they seem to be wrong. Find out where and why.

Generate subfolders when needed while saving reports

In GitLab by @nonnominandus on Mar 6, 2020, 11:08

It would be nice if the saving of reports could automatically create missing folders, especially because this gives the option to generate subfolders for specific subsets if needed, like "../{data}/{k}-{alp}.html" would create a subfolder per input data with all its runs inside.

Do not forget to add this in the manual if done.
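
A minimal sketch of creating the missing folders before writing, using only the standard System.IO calls. The ReportWriter class and Save method are hypothetical names; the expansion of placeholders such as {data} is assumed to have happened before this point.

```csharp
using System;
using System.IO;

public static class ReportWriter
{
    // Writes the report and creates any missing parent folders first.
    public static void Save(string path, string content)
    {
        var folder = Path.GetDirectoryName(Path.GetFullPath(path));
        if (!string.IsNullOrEmpty(folder))
            Directory.CreateDirectory(folder); // does nothing when the folder already exists
        File.WriteAllText(path, content);
    }
}
```

Directory.CreateDirectory creates all intermediate folders in one call, so nested subfolders per dataset need no extra handling.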

Consensus Tree

In GitLab by @nonnominandus on Apr 1, 2020, 18:30

It would be nice if at the top of the HTML page there would be a tree specifying the relationships between the generated consensus sequences (Reads alignment or recombination based on the used parameters). I am hopeful this would aid in detecting polyclonal mixtures.

Redo path sequence creation

In GitLab by @nonnominandus on Feb 14, 2020, 11:02

Create a full sequence alignment with joined reads, joined DOC etc.

Rethink terminology

In GitLab by @nonnominandus on Feb 14, 2020, 10:58

As some terms now muddle the understanding for new people.
Apply these terms in the docs, batchfile and code.

Reads alignment step

In GitLab by @nonnominandus on Mar 23, 2020, 18:58

Add an extra step after the recombination. This step should align all reads on the consensus sequences of the recombined templates.

Goal

  • Provide a more intuitive results page
  • Remove (all) errors related to faulty assembly by the de Bruijn assembly

Features

  • Two sets of input reads should be made
    • For recombination, with only the best reads to maximise speed and quality
    • For the reads align, with many more reads also including some erroneous ones, those should be filtered out by the software
  • The new alignment should be the first thing on the HTML page, the recombination should be accessible
  • In the HTML the alignment should link directly to the input reads (where it now links to the paths)

Sidebar sequence view bugs

In GitLab by @nonnominandus on Sep 19, 2019, 16:31

The sidebar does not always display correctly. Also it would be nice to be able to see the full length of the reads.

More use for the identifier system

In GitLab by @nonnominandus on Mar 23, 2020, 19:09

The new identifier system as implemented in fasta files could be expanded into the following ways:

  • Display the actual identifier instead of the escaped one (this needs differentiation in the code whether or not the AsideIdentifier needed is escaped or not)
  • Make this an essential part of the MetaData api, as such also use the identifiers of Peaks datasets and create identifiers for the simple datasets

Differentiate between Heavy and Light chain

In GitLab by @nonnominandus on Jun 10, 2020, 15:36

On the results page it would be nice if there was a differentiation between the heavy and light chain of the antibodies.

Use SIMD in AminoAcid[] comparison

In GitLab by @nonnominandus on Jan 20, 2020, 11:59

If AminoAcid could be saved as a 'pure' byte, SIMD would make a lot of sense in the comparison of two arrays. This would have the potential to decrease the runtime by at least 10x but likely even more.
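
A sketch of what such a comparison could look like with System.Numerics.Vector, assuming each amino acid is already encoded as one byte in a plain byte[] (that encoding is an assumption, not how AminoAcid is currently stored). It only tests two arrays for equality and makes no claim about the achievable speed-up.

```csharp
using System;
using System.Numerics;

public static class AminoAcidSimd
{
    // Compares two sequences that are stored as one byte per amino acid.
    public static bool AreEqual(byte[] left, byte[] right)
    {
        if (left.Length != right.Length) return false;

        int width = Vector<byte>.Count; // bytes per hardware vector, depends on the CPU
        int i = 0;

        // Compare one full vector of positions per iteration.
        for (; i <= left.Length - width; i += width)
        {
            if (!Vector.EqualsAll(new Vector<byte>(left, i), new Vector<byte>(right, i)))
                return false;
        }

        // Compare the remaining positions that do not fill a whole vector.
        for (; i < left.Length; i++)
            if (left[i] != right[i]) return false;

        return true;
    }
}
```

For plain equality the built-in SequenceEqual on spans is worth measuring first; a hand-written loop like this mainly pays off when the comparison needs custom logic per position.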

Progressbar Update

In GitLab by @nonnominandus on Mar 2, 2020, 10:05

A progressbar would be a nice addition, especially one that works for large runs.

--> | [15%]

Maybe also include substeps (de Bruijn, recombination, reports) in the calculation, so that even for one run there still is a somewhat reliable progress counter.
It would also be nice if the total runtime was given, automatically switching the unit.

--> | [15%] [XXs]

Then maybe after the runs the time per run can be given.
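
The automatic unit switching could be sketched as below. The RuntimeFormat class, the thresholds and the unit labels are all assumptions chosen for illustration.

```csharp
using System;
using System.Globalization;

public static class RuntimeFormat
{
    // Picks the largest unit in which the elapsed time is at least 1.
    public static string Format(TimeSpan elapsed)
    {
        var culture = CultureInfo.InvariantCulture;
        if (elapsed.TotalSeconds < 1) return elapsed.TotalMilliseconds.ToString("F0", culture) + "ms";
        if (elapsed.TotalMinutes < 1) return elapsed.TotalSeconds.ToString("F0", culture) + "s";
        if (elapsed.TotalHours < 1) return elapsed.TotalMinutes.ToString("F1", culture) + "min";
        return elapsed.TotalHours.ToString("F1", culture) + "h";
    }

    // Builds the text part of the progress line, for example "[15%] [15s]".
    public static string ProgressLine(double fractionDone, TimeSpan elapsed)
    {
        string percentage = (fractionDone * 100).ToString("F0", CultureInfo.InvariantCulture);
        return $"[{percentage}%] [{Format(elapsed)}]";
    }
}
```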

Sequence logo

In GitLab by @nonnominandus on Feb 14, 2020, 10:55

Create a sequence logo in the sidebars.

Use DOC to get better scores

In GitLab by @nonnominandus on Feb 14, 2020, 11:00

By using the DOC when assigning a Smith Waterman Score to an alignment, the score could be made more representative of the confidence in the alignment.

Create asides for the short reads

In GitLab by @nonnominandus on Mar 12, 2020, 17:13

It would be nice if the short reads also had a details page like normal paths.

Template matching

In GitLab by @nonnominandus on Nov 26, 2019, 10:35

Create a way to position reads/contigs on a given template. At first keep it to an alignment to all sequences in a list and test the runtime of this.

  • Visualisation of the template matching results for analysis, as an interface for the user
  • Compile the right databases of germline sequences, preferably with several options for the user: start completely blind, specify an organism, or supply a ready-made set of templates.
  • Test template matching with homology (a scoring matrix such as BLOSUM62 or similar; we still need to check whether the substitutions in antibodies follow different patterns than the proteome as a whole, there is probably literature about this)
  • Based on template matching against the database of germline sequences, automatically create the V-(D)-J-C recombinations for a second (third, fourth, 38th) round of template matching (so that contigs spanning the V-D-J-C junction can be used to determine the correct recombinations). I think that for the D segment you should simply place a gap instead of the germline sequence, because those sequences differ so much anyway.

Localise path not findable

In GitLab by @nonnominandus on Mar 5, 2020, 16:00

If a path is wrong it would be nice if the program could find out which part is wrong. Wrong drive, wrong folder, wrong file and maybe even provide possible names given the expectation that the user likely mistyped a couple of chars.
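
One way this could work, as a sketch: walk the path from its root, report the first part that does not exist, and suggest entries in the last existing folder that are within a small edit distance. The PathDiagnosis class, its method names and the default distance of 2 are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class PathDiagnosis
{
    // Returns the deepest existing folder and the first part of the path that is missing.
    // Missing is empty when the whole path exists.
    public static (string Existing, string Missing) FirstMissingPart(string path)
    {
        string full = Path.GetFullPath(path);
        string root = Path.GetPathRoot(full) ?? "";
        if (root.Length > 0 && !Directory.Exists(root)) return ("", root); // wrong drive

        string current = root;
        var separators = new[] { Path.DirectorySeparatorChar, Path.AltDirectorySeparatorChar };
        foreach (string part in full.Substring(root.Length)
                     .Split(separators, StringSplitOptions.RemoveEmptyEntries))
        {
            string next = Path.Combine(current, part);
            if (!Directory.Exists(next) && !File.Exists(next)) return (current, part);
            current = next;
        }
        return (current, "");
    }

    // Names in the existing folder that are at most maxDistance edits away from the missing part.
    public static List<string> Suggestions(string folder, string missing, int maxDistance = 2) =>
        Directory.EnumerateFileSystemEntries(folder)
            .Select(entry => Path.GetFileName(entry) ?? "")
            .Where(name => name.Length > 0 && Distance(name, missing) <= maxDistance)
            .ToList();

    // Levenshtein edit distance between two strings.
    public static int Distance(string a, string b)
    {
        var previous = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) previous[j] = j;
        for (int i = 1; i <= a.Length; i++)
        {
            var current = new int[b.Length + 1];
            current[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int substitution = previous[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
                current[j] = Math.Min(substitution, Math.Min(previous[j] + 1, current[j - 1] + 1));
            }
            previous = current;
        }
        return previous[b.Length];
    }
}
```

Whether the missing part is a drive, a folder or a file follows from its position: an empty Existing means the drive is wrong, the last part of the path is the file, and anything in between is a folder.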

HTML on very big projects

In GitLab by @nonnominandus on Feb 14, 2020, 10:56

With huge projects the HTML does not work anymore.

Graph compression

In GitLab by @nonnominandus on Apr 9, 2020, 15:32

It would be nice if a group of nodes indicating the variety of a single amino acid could be compressed into one node. This would speed up the whole process and make the HTML more useful.

Polyclonal

In GitLab by @nonnominandus on Apr 1, 2020, 12:22

Now that it seems to work for monoclonal data, it is time to test it with polyclonal data.

Tests

  • 3 Mix dataset (June 2019 Douwe)
  • 7 Mix dataset (Sem)

Use given position score for reads

In GitLab by @nonnominandus on Mar 9, 2020, 16:07

In Peaks data the given local confidence could be used to populate the initial DOC.

The same could be done for FASTQ files (for which support could be added).

Compress alignment in templates

In GitLab by @nonnominandus on Mar 11, 2020, 12:48

It would be very nice if the alignment in templates could be compressed into fewer lines. This would result in somewhat smaller file sizes but, more importantly, would help in overseeing all the paths.

Lack of unit tests

In GitLab by @nonnominandus on Sep 19, 2019, 16:27

It would be very nice if there would be unit tests.
See the link below for how it would work in dotnet.
link

Remove trailing gaps

In GitLab by @nonnominandus on Mar 13, 2020, 12:32

Sometimes gaps are trailing before or after, or even just free from, the path sequence. Of course these should be removed.

TEMPLATE
 --PLA   <- Front
 EM--    <- Back
 - --ATE <- Those are the weirdest
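
A sketch of the clean-up for the three cases above, assuming the aligned path is available as a string in which '-' marks a gap and ' ' marks padding (the internal representation in Stitch may well differ). The number of removed leading characters is returned so that the start position on the template can be shifted accordingly.

```csharp
using System;

public static class GapTrimming
{
    // Removes gaps and padding before and after the sequence.
    // StartShift is the number of characters removed at the front.
    public static (string Sequence, int StartShift) TrimGaps(string aligned)
    {
        char[] gapChars = { '-', ' ' };
        string withoutLeading = aligned.TrimStart(gapChars);
        int startShift = aligned.Length - withoutLeading.Length;
        return (withoutLeading.TrimEnd(gapChars), startShift);
    }
}
```

Gaps inside the sequence are left untouched, since those carry alignment information.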

ForceOnSingleTemplate Investigation

In GitLab by @nonnominandus on Jun 10, 2020, 15:33

Make sure that all paths are being placed; according to Joost, only a very low number seems to be placed.

Show distributions of lists | HTML Redesign

In GitLab by @nonnominandus on Mar 24, 2020, 12:48

It would be very nice if there were distributions (histograms) above some tables.

I would propose:

  • Length distribution for reads
  • Length distribution for paths
  • Score distributions

Plus look into some distributions that could be used in detail pages.

Plus look into how it would work with the new reads alignment

Huge RAM use

In GitLab by @nonnominandus on Mar 4, 2020, 11:04

The assembler seems to use a lot of RAM (up to ~2 GB). I guess this happens while generating the report in memory; maybe it would be nice to stream this report to the file instead of buffering it all the time.
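
A sketch of the streaming idea, assuming the report can be produced as a lazy sequence of HTML fragments (the StreamingReport class and the sections parameter are hypothetical). Each fragment is written as soon as it is generated, so the whole report is never held in memory at once.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class StreamingReport
{
    // Writes the report section by section instead of building one large string.
    public static void Write(string path, IEnumerable<string> sections)
    {
        using var writer = new StreamWriter(path, false, Encoding.UTF8);
        writer.WriteLine("<html><body>");
        foreach (string section in sections)
            writer.WriteLine(section); // only the current section is in memory
        writer.WriteLine("</body></html>");
    }
}
```

This only helps when the sections themselves are generated lazily, for example by an iterator method that yields one contig or path at a time.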

Update progressbar every x sec

In GitLab by @nonnominandus on Mar 4, 2020, 11:07

It would be nice if the progressbar would automatically refresh every x sec (5? slowly increasing?) to keep the user in touch with the actual runtime.

Then adding an ETA would be a nice option, just based on the percentage done in the given time, so very simple, but it could still be useful.
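
The simple ETA described here is a linear extrapolation: if a fraction f of the work took time t, the remaining time is estimated as t * (1 - f) / f. A sketch, with the Eta class as a hypothetical name:

```csharp
using System;

public static class Eta
{
    // Estimates the remaining time from the fraction done and the time spent so far.
    // Returns null when no estimate can be made (nothing done yet or an invalid fraction).
    public static TimeSpan? Remaining(double fractionDone, TimeSpan elapsed)
    {
        if (fractionDone <= 0 || fractionDone > 1) return null;
        return TimeSpan.FromSeconds(elapsed.TotalSeconds * (1 - fractionDone) / fractionDone);
    }
}
```

The estimate assumes all steps run at the same speed, so it will jump around when substeps differ a lot in cost.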

ForceOnSingleTemplate More Options

In GitLab by @nonnominandus on Jun 10, 2020, 15:35

This option could also be used with the alignment on the V database. So it would be nice if this option could be used in more places.

Implement scoring parameter

In GitLab by @nonnominandus on Mar 13, 2020, 09:42

For databases the new scoring parameter 'Scoring' should have the options 'Absolute' and 'Relative'.

Aligned reads full length

In GitLab by @nonnominandus on Nov 27, 2019, 13:44

The last part is cut off, very likely caused by the code deleted yesterday (lastposition += k-2).

Check gaps

In GitLab by @nonnominandus on Mar 13, 2020, 10:36

Check how gaps are scored in the consensus sequence. Sometimes a gap seems to be inserted in the consensus sequence when a lot more paths seem to support no gap.

Sidebar folding

In GitLab by @nonnominandus on Dec 3, 2019, 15:58

The sidebar seems to detect absolute positions of the mouse with respect to all monitors, not the one on which the report is opened.

Contextmenu in alignment

In GitLab by @nonnominandus on Mar 12, 2020, 16:39

It would be very useful if a context menu was in place on hover over an alignment patch.

Very simple mock up to show the idea

TEMPLATESEQUENCE
    LATUSEQU
     ^ ---------------
      | Score: XX     |
      | ID: P0006     |
      | DOC: .:.::... |
      | Contig: C0009 |
      ----------------

Useful information would be:

  • Score
  • ID (path)
  • ID (contig, where you hover over)
  • Depth Of Coverage for this patch (maybe full length would be overkill, so only for 5 at a time?)
  • Position on the path (numeric) to know where it matches
  • Small graphic of the alignment? (different coloured blocks for the different in/del/matches)

Speed up HTML generation for huge projects

In GitLab by @nonnominandus on Mar 5, 2020, 13:22

Now the generation of an HTML report with lots of contigs/paths can take minutes, it would be nice if this could take less time. Maybe just remove shit that is unnecessary?

Related

#13 and #24

Check cycles

In GitLab by @nonnominandus on Mar 13, 2020, 09:43

There seem to be a lot of small cycles (< K). Find out why they are here and hopefully restrain them from being generated.

Use mAb test sequences

In GitLab by @nonnominandus on Feb 14, 2020, 15:27

  • Create clean FASTA
  • Compare consensus sequence with real sequence
    • Find difference, determine what places, why, and how much difference
  • Test the effect of all variables
    • K, Cutoff, alphabet etc
  • Create plots

Assembler problems

In GitLab by @nonnominandus on Mar 13, 2020, 12:34

The assembler seems to have some problems still.

  • Complex subgraphs
  • Autocycles (cycles from a contig to itself)
  • Precycles (one contig with the same sequence as the one just after it, sometimes combined with autocycles)
  • Very short contigs (1 or sometimes even 0 AminoAcids)
  • Not fully extended contigs (showing overlap which should be merged into the real sequence)

To solve this the minimal subset of any real data that still gives these problems should be made and studied.

Open folder

In GitLab by @nonnominandus on Mar 2, 2020, 12:11

Opening a folder and using all files as input would be nice for huge work (systematic testing).

InputFolder ->
    Path: path/to/folder
    Name: name
<-
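
Expanding such a folder into its input files could be sketched as follows. The FolderInput class is a hypothetical name, and taking every file in the folder without filtering on extension is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class FolderInput
{
    // Lists all files directly inside the folder in a stable order,
    // so that systematic test runs are reproducible.
    public static List<string> InputFiles(string folder) =>
        Directory.EnumerateFiles(folder)
            .OrderBy(file => file, StringComparer.Ordinal)
            .ToList();
}
```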
