
Comments (51)

riederd commented on June 12, 2024

Ok, it seems that in principle mhcflurry works. Let's see if your run with fewer CPUs completes.

from nextneopi.

riederd commented on June 12, 2024

Sorry, I meant .command.run

riederd commented on June 12, 2024

Hmmm, this is strange, I do not see a big difference from our settings here. These two settings differ:

pending signals                 (-i) 12383285
max locked memory       (kbytes, -l) unlimited

But I don't think this is the problem.

What also puzzles me is that the featureCounts step takes so long in your case, more than 4 hrs. I would expect 10-20 min, as we see here in our environment.

Just to make sure that the "Resource temporarily unavailable" issue is really temporary, can you please try once more to run:

$ cd /data/SBCS-BessantLab/Antara/nextNEOpi/work/30/21453b69c882f412b58dee6149538c
$ bash .command.run

Thanks

riederd commented on June 12, 2024

That's good!
Now I suggest doing the following:

  • create a file in the nextNEOpi bin/ directory named set_limits.sh with the following content:
ulimit -n 4096
ulimit -u 8192
  • Edit the conf/process.config file in the nextNEOpi directory and look for:
withName:Neofuse {
    container = 'https://apps-01.i-med.ac.at/images/singularity/NeoFuse_dev_0d1d4169.sif'
    cpus = 10
}

change it to:

withName:Neofuse {
    beforeScript = 'source /data/SBCS-BessantLab/Antara/nextNEOpi/bin/set_limits.sh'
    container = 'https://apps-01.i-med.ac.at/images/singularity/NeoFuse_dev_0d1d4169.sif'
    cpus = 10
}
  • rerun the pipeline with the -resume option set.
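If it is easier, the first bullet can be done from the shell; this is just a convenience sketch of the step above, run from inside the nextNEOpi directory (paths assumed, not part of the pipeline itself):

```shell
# Create bin/set_limits.sh with the two ulimit calls suggested above.
mkdir -p bin
cat > bin/set_limits.sh <<'EOF'
ulimit -n 4096
ulimit -u 8192
EOF
chmod +x bin/set_limits.sh
# Sanity check: the file should contain exactly the two ulimit lines.
cat bin/set_limits.sh
```

The beforeScript addition to conf/process.config is still best done by hand in an editor, as shown above.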

riederd commented on June 12, 2024

Hi, can you also try to reduce the number of CPUs to 10 for pVACseq?

For a manual test you may do this by editing .command.sh in /data/SBCS-BessantLab/Antara/nextNEOpi/work/09/7305b172968e7a9bc25e0b59f2eb8a and changing the threads parameter from -t 40 to -t 10.
After this you may run:

$ cd /data/SBCS-BessantLab/Antara/nextNEOpi/work/09/7305b172968e7a9bc25e0b59f2eb8a
$ bash .command.run

If this works, you may change the cpus setting in conf/process.config.
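A hedged one-liner for the manual edit (it assumes the threads parameter appears literally as -t 40 in .command.sh; the -i.bak suffix keeps a backup of the original):

```shell
# In the pVACseq work directory: lower the thread count from 40 to 10.
sed -i.bak 's/-t 40/-t 10/' .command.sh
# Verify the substitution before re-running .command.run:
grep -- '-t 10' .command.sh
```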

riederd commented on June 12, 2024

can you try to do the following:

$ cd /data/SBCS-BessantLab/Antara/nextNEOpi/work/09/7305b172968e7a9bc25e0b59f2eb8a
$ rm -rf ./MHC_Class*
$ bash .command.run

riederd commented on June 12, 2024

Hmmm, I think you hit an issue in pVACseq, which might be solved in the newest version. I'll prepare an updated image this evening. Meanwhile, can you send me a tar archive of that working directory, so that I can test locally? You would need to create it as follows:

$ cd /data/SBCS-BessantLab/Antara/nextNEOpi/work/09
$ tar -chvzf testdata.tar.gz 7305b172968e7a9bc25e0b59f2eb8a

Please send me a private e-mail with a download link for the resulting testdata.tar.gz.

riederd commented on June 12, 2024

One more thing to try:

$ cd /data/SBCS-BessantLab/Antara/nextNEOpi/work/09/7305b172968e7a9bc25e0b59f2eb8a
$ rm -rf ./MHC_Class*
$ singularity exec --no-mount hostfs -B /data/SBCS-BessantLab/Antara/nextNEOpi -B "$PWD" --no-home -B /data/SBCS-BessantLab/Antara/nextNEOpi/assets -B /data/SBCS-BessantLab/Antara/nextNEOpi/tmpDir -B /data/SBCS-BessantLab/Antara/nextNEOpi/resources -B /data/SBCS-BessantLab/Antara/nextNEOpi/resources/databases/iedb:/opt/iedb -B /data/SBCS-BessantLab/Antara/nextNEOpi/resources/databases/mhcflurry_data:/opt/mhcflurry_data /data/SBCS-BessantLab/Antara/nextNEOpi/work/singularity/apps-01.i-med.ac.at-images-singularity-pVACtools_3.0.0_icbi_5dfca363.sif /bin/bash
Singularity> bash .command.sh

riederd commented on June 12, 2024

...and in case you get a netMHCstab error, please look for the following line in .command.sh:

--netmhc-stab

Remove it and re-run the commands above.
NetMHCstab relies on a web service which does not always work as expected. It can be disabled in nextNEOpi with the option --use_NetMHCstab false
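If you prefer not to edit the file by hand, a small sed sketch can strip the flag (this assumes --netmhc-stab appears with a leading space in .command.sh, and keeps a .bak copy as a safety net):

```shell
# Remove the --netmhc-stab flag from the staged pVACseq command script.
sed -i.bak 's/ --netmhc-stab//' .command.sh
# Confirm the flag is gone (prints 0 matching lines):
grep -c 'netmhc-stab' .command.sh || true
```

For a full pipeline run, the documented route is the --use_NetMHCstab false option mentioned above.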

riederd commented on June 12, 2024

This is interesting; you should not get those. Can you check the pandas version and path for me:

$ cd /data/SBCS-BessantLab/Antara/nextNEOpi/work/09/7305b172968e7a9bc25e0b59f2eb8a
$ rm -rf ./MHC_Class*
$ singularity exec --no-mount hostfs -B /data/SBCS-BessantLab/Antara/nextNEOpi -B "$PWD" --no-home -B /data/SBCS-BessantLab/Antara/nextNEOpi/assets -B /data/SBCS-BessantLab/Antara/nextNEOpi/tmpDir -B /data/SBCS-BessantLab/Antara/nextNEOpi/resources -B /data/SBCS-BessantLab/Antara/nextNEOpi/resources/databases/iedb:/opt/iedb -B /data/SBCS-BessantLab/Antara/nextNEOpi/resources/databases/mhcflurry_data:/opt/mhcflurry_data /data/SBCS-BessantLab/Antara/nextNEOpi/work/singularity/apps-01.i-med.ac.at-images-singularity-pVACtools_3.0.0_icbi_5dfca363.sif /bin/bash
Singularity> pip show pandas

and

Singularity> python
Python 3.8.5 (default, Sep  4 2020, 07:30:14) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> print(pd.__version__)

Can you then try the new test image that I prepared:

$ cd /data/SBCS-BessantLab/Antara/nextNEOpi/work/09/7305b172968e7a9bc25e0b59f2eb8a
$ rm -rf ./MHC_Class*
$ singularity exec --no-mount hostfs -B /data/SBCS-BessantLab/Antara/nextNEOpi -B "$PWD" --no-home -B /data/SBCS-BessantLab/Antara/nextNEOpi/assets -B /data/SBCS-BessantLab/Antara/nextNEOpi/tmpDir -B /data/SBCS-BessantLab/Antara/nextNEOpi/resources -B /data/SBCS-BessantLab/Antara/nextNEOpi/resources/databases/iedb:/opt/iedb -B /data/SBCS-BessantLab/Antara/nextNEOpi/resources/databases/mhcflurry_data:/opt/mhcflurry_data https://apps-01.i-med.ac.at/images/singularity/pVACtools_3.0.1_icbi_test_20220609.sif /bin/bash
Singularity> pip show pandas

and then:

Singularity> bash .command.sh

riederd commented on June 12, 2024

Thanks, I'll try to reproduce this

riederd commented on June 12, 2024

Hi, I'm sorry that you are running into these troubles.

In order to sort this out I'd need some more information:

  • Please tell us which version of nextNEOpi you are running
  • Post the .nextflow.log file from the failed run.
  • Post the .command.run file from /data/SBCS-BessantLab/Antara/nextNEOpi/work/a9/f5d9ded3bc5bc9dfbb2f33b7f58074

Thanks

antaralabiba97 commented on June 12, 2024

Thank you for your prompt response.

I am using version nextNEOpi_v1.3.1, which I believe is the latest? I set this up on the 21st of May using the latest documents and data available on GitHub. Also, I am running only one sample at the moment, from the TESLA consortium data used to benchmark your pipeline. I have WES normal and tumor data, and tumor RNA-seq data.

I have attached the files you asked for. I just changed the extension from .run to .txt so I could attach it here.

Please let me know if you require any other info.

Thank you.

nextflow.log
command.run.txt

riederd commented on June 12, 2024

Thanks for the information.
We are checking the relevant code in NeoFuse and trying to understand why you get this error. Meanwhile, can you help us with the following:

  • post the following log file /data/SBCS-BessantLab/Antara/nextNEOpi/work/a9/f5d9ded3bc5bc9dfbb2f33b7f58074/sample1/LOGS/sample1_8_MHCFlurry.log
  • check if /data/SBCS-BessantLab/Antara/nextNEOpi/work/a9/f5d9ded3bc5bc9dfbb2f33b7f58074/sample1/NeoFuse/tmp/MHC_I/sample1_8_NEK11_ALDH1L1_1_8.tsv exists
  • rerun (use -resume) with a smaller number of CPUs in NeoFuse (e.g. 8)

antaralabiba97 commented on June 12, 2024

sample1_8_MHCFlurry.log

Screenshot 2022-05-25 at 14 02 39

As you can see from the image the sample1_8_NEK11_ALDH1L1_1_8.tsv does not exist.

Also, I just changed the CPU parameter to 8, so I will try to resume and let you know what happens.

Thank you.

riederd commented on June 12, 2024

Thanks! It seems that mhcflurry is not running or is being stopped/killed just shortly after it starts up.

Can you try to run it manually, e.g.:

$ singularity exec --no-home /data/SBCS-BessantLab/Antara/nextNEOpi/work/singularity/apps-01.i-med.ac.at-images-singularity-NeoFuse_dev_0d1d4169.sif /bin/bash
singularity> mhcflurry-predict --affinity-only --alleles A*02:01 --peptides TPDPGAEV --out /tmp/test_1.txt --models /home/neofuse/.local/share/mhcflurry/4/2.0.0/models_class1_pan/models.combined
[...]
singularity> cat /tmp/test_1.txt

antaralabiba97 commented on June 12, 2024

Hi, I have run the above and have attached what the output looks like. Please let me know the next step. Thank you again! Really appreciate the help

Screenshot 2022-05-25 at 16 26 36

Screenshot 2022-05-25 at 16 33 50

antaralabiba97 commented on June 12, 2024

Hi,

So I was having issues with the jobs I had submitted to run on the cluster, so I decided to kill the previous runs and start fresh with the amended 8 CPUs defined for NeoFuse. I set up the directory as I did before, but noticed that this time the link to the resources file on your GitHub is unreachable. So I used the resources folder I had already created. However, when I run the pipeline again there is no route to the host to pull the singularity image (attached screenshot). I added the nextflow log too.

I understand this is a different issue and if you would rather me post on a new thread please let me know! Very keen to get this pipeline working and looking forward to hopefully doing this soon!

Thank you!

Screenshot 2022-05-25 at 22 57 43

nextflow.log

riederd commented on June 12, 2024

Hi, unfortunately we had an electricity issue last night which affects the server on which the resource is located. The bad thing is that there is a holiday and a long weekend now, so it might take until Monday to get this fixed, since not all of the things involved are in our hands. We are sorry for this.

riederd commented on June 12, 2024

The resource download should work again.

antaralabiba97 commented on June 12, 2024

Ah great, will test the new run shortly, thank you. Will let you know if I encounter the same problems with MHCflurry (hopefully not!)

antaralabiba97 commented on June 12, 2024

Hello,

So I tried running again with the CPUs changed to 8 for NeoFuse, but unfortunately I encountered the same error as before. I have added the .nextflow.log, sample1_8_MHCFlurry.log and sample1_MHCI_final.log. The sample1_8_NEK11_ALDH1L1_1_8.tsv does not exist in the location specified in the error. Also, I tried the singularity command you posted above again and got the same output.

Not sure what is causing the issue with this missing file but please do let me know on ways to get this sorted.

Thank you!

command.run.txt
sample1_8_MHCFlurry.log
sample1_MHCI_final.log
nextflow.log

antaralabiba97 commented on June 12, 2024

Hello,

I was just wondering if you have had a chance to take a look at this error? I appreciate you may be busy, but please let me know if there is a solution when you get the time! Thank you :)

riederd commented on June 12, 2024

Hi, we were out of the office for the last few days. We will continue to look into this and keep you updated.
Meanwhile, can you try to do the following:

$ cd /data/SBCS-BessantLab/Antara/nextNEOpi/work/30/21453b69c882f412b58dee6149538c
$ bash .command.run.sh

antaralabiba97 commented on June 12, 2024

Hi, no worries and thank you.

Also, I tried this, but there is no ".command.run.sh" file. Below are the files available in the directory you specified.

Screenshot 2022-05-31 at 11 06 32

antaralabiba97 commented on June 12, 2024

Hi, I ran the above and this is the error output...
Screenshot 2022-05-31 at 18 27 10

riederd commented on June 12, 2024

Hi, it seems you are running into a resource limit. Can you post the output of:

$ ulimit -a

Can you do this on the head node of your cluster and on one of the compute nodes, in case you are running nextNEOpi on a cluster?

antaralabiba97 commented on June 12, 2024

Hello, I am just running on the head node and not submitting a job to the cluster; this is the output on the head node.

Screenshot 2022-05-31 at 18 49 49

antaralabiba97 commented on June 12, 2024

Hi, I ran again and now get the error below

Screenshot 2022-06-01 at 20 15 46

riederd commented on June 12, 2024

So you really seem to be hitting some resource limit on your machine. Do you have many other processes running on that machine? Can you check with:

$ ps -eLf | wc -l
$ ps -eLf | grep hfy006 | wc -l

You can try to raise some limits:

$ ulimit -n 4096
$ ulimit -l unlimited
$ ulimit -u 8192

and then run the .command.run script again.

antaralabiba97 commented on June 12, 2024

I am unable to change "ulimit -l" to unlimited, as it is locked; however, I have changed the other two parameters and will re-run. The only other processes I had running were the mhcflurry-predict processes, which did not fully terminate after the previous run exited.

Will keep you updated. Thank you!

antaralabiba97 commented on June 12, 2024

After doing the above, running the .command.run script completed! I had to kill the previous processes which were still running from the nextflow run that exited with the MHCflurry error. Please let me know how to proceed from the stage at which the pipeline exited. Thank you for all your help so far, glad to be one step closer!

Screenshot 2022-06-02 at 21 24 07

antaralabiba97 commented on June 12, 2024

Hi,

So I did all of the above, and the NeoFuse part of the run completed; I have the output folder for this in my results! Thanks for the help on this part, it's really appreciated!

However, I now have an error during the pVACseq stage which is causing the process to exit. I have attached the files associated with the error.

nextflow.log
command.run.txt
command.sh.txt

Feels like I'm nearly there so I am very excited for the run to complete and then hopefully once I have a working pipeline I will be able to run my other samples!

antaralabiba97 commented on June 12, 2024

Hi, I tried doing the above, and the process aborts

Screenshot 2022-06-07 at 13 07 09

antaralabiba97 commented on June 12, 2024

The process exits with "Error: No command specified".

Screenshot 2022-06-07 at 14 49 12

antaralabiba97 commented on June 12, 2024

Sent the email, please let me know if you do not receive it. Thank you.

antaralabiba97 commented on June 12, 2024

I didn't get the netMHCstab error but the same error as before "Error: No command specified".

Screenshot 2022-06-08 at 18 56 45

riederd commented on June 12, 2024

Did you get pandas warnings with this?

antaralabiba97 commented on June 12, 2024

Yes I did, the same as before

antaralabiba97 commented on June 12, 2024

I did all the steps to check the python and pandas versions, which are the same as yours.

Tried the new test image and it worked!
Screenshot 2022-06-09 at 11 06 28

When I look in the folder with the outputs, I don't see the "sample1_tumor.filtered.tsv" file, just the filtered results file for HLA-A02:01: "sample1_tumor_HLA-A02:01.filtered.tsv".

Screenshot 2022-06-09 at 11 10 08

For now, I have not included "HLA-HD" in the pipeline so no MHC-II predictions are generated but once this run finishes completely I will go back to include it.

riederd commented on June 12, 2024

Cool. Thanks!

The final filtered result for the entire sample is generated by nextNEOpi after collecting the parallelized chunks. So what you see is expected.

I did not disclose my pandas version ;-) so I don't think you can state that it is the same as yours. Would it be possible for you to post the output of:

$ cd /data/SBCS-BessantLab/Antara/nextNEOpi/work/09/7305b172968e7a9bc25e0b59f2eb8a
$ singularity exec --no-mount hostfs -B /data/SBCS-BessantLab/Antara/nextNEOpi -B "$PWD" --no-home -B /data/SBCS-BessantLab/Antara/nextNEOpi/assets -B /data/SBCS-BessantLab/Antara/nextNEOpi/tmpDir -B /data/SBCS-BessantLab/Antara/nextNEOpi/resources -B /data/SBCS-BessantLab/Antara/nextNEOpi/resources/databases/iedb:/opt/iedb -B /data/SBCS-BessantLab/Antara/nextNEOpi/resources/databases/mhcflurry_data:/opt/mhcflurry_data /data/SBCS-BessantLab/Antara/nextNEOpi/work/singularity/apps-01.i-med.ac.at-images-singularity-pVACtools_3.0.0_icbi_5dfca363.sif /bin/bash
Singularity> pip show pandas

antaralabiba97 commented on June 12, 2024

Haha, you're right, I meant just the python version!

Here's the output of the above:
Screenshot 2022-06-09 at 11 59 01

riederd commented on June 12, 2024

Thanks!
This is very interesting: Python from the singularity image is using the pandas package that is installed in your home directory, which does not work with pVACseq in the image. In principle you should not see anything from your home directory from within the container, since we use the --no-home and --no-mount hostfs options to start up the container. This is working fine here; e.g., see what happens if I try to change to my home dir from within the container:

Singularity> cd ~
bash: cd: /home/rieder: No such file or directory

May I ask which version of singularity you use?

antaralabiba97 commented on June 12, 2024

Hmm, yes, I understand; I just had a read around this. This is the version:
This is the version:

Screenshot 2022-06-09 at 12 54 42

riederd commented on June 12, 2024

I think I have a clue about what happens at your site. Can you please post the output of:

grep "bind path" /etc/singularity/singularity.conf

antaralabiba97 commented on June 12, 2024

Here is the output for the above command:

Screenshot 2022-06-09 at 14 51 36

riederd commented on June 12, 2024

Yes, here we go:

bind path = /data

tells singularity to bind-mount /data from the host to /data in the container. Now, since your user home $HOME is located in /data, i.e. /data/home/hfy006, it will still be present in the container no matter whether we tell singularity not to mount the user home (--no-home), because it gets mounted by default via the explicit bind path = /data directive in the global config.

When importing a library, Python first looks in the user home under $HOME/.local/lib/... for a matching package, and if it finds one it will use it. Now, if the package has an incompatible version you will get warnings, errors, or all sorts of unexpected behavior.

So the quickest fix is to remove bind path = /data from /etc/singularity/singularity.conf, since you are likely to hit these package/library conflicts with other singularity containers as well; this can happen not only with Python packages but also, for example, with R libraries. However, I'm not sure whether this is something you or your admin are concerned about, and there may be important reasons why this configuration was set as it is.
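An alternative workaround that might be worth testing (an untested sketch, not nextNEOpi-specific): CPython honors the PYTHONNOUSERSITE environment variable, which keeps the user site-packages under $HOME/.local off sys.path even when the home directory is visible inside the container:

```shell
# With PYTHONNOUSERSITE set, Python ignores the user site-packages
# directory, so a pandas installed in ~/.local can no longer shadow
# the one shipped in the image.
export PYTHONNOUSERSITE=1
python3 -c 'import site; print(site.ENABLE_USER_SITE)'   # prints False
```

Exporting this inside the container before running bash .command.sh (or via a beforeScript, similar to set_limits.sh above) could be a less invasive test than changing the global singularity config.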

I need to check if there is any other way to avoid this situation. Since this is not a specific nextNEOpi bug I'll close the issue for now, but feel free to reopen it.

Thanks a lot for all your input!

antaralabiba97 commented on June 12, 2024

Ok, I understand the issue now.

For now, I am running nextNEOpi on the cluster, so I will ask the admin team to see if we can work around this. I do have my own custom-built PC arriving soon, which is designed to run pipelines like nextNEOpi locally without memory or performance problems, so I may be able to avoid the issue above.

I will get back to you once I am able to work off the cluster and hopefully be able to run the pipeline smoothly! Thanks for all your help thus far :)

riederd commented on June 12, 2024

One thing that may work would be to set a "fake home" in params.conf which points to the tmpDir,

e.g.

singularity {
    enabled = true
    autoMounts = true
    runOptions =  "--no-home" + " -H " + params.singularityTmpMount + " -B " +  params.singularityAssetsMount + " -B " + params.singularityTmpMount + " -B " + params.resourcesBaseDir + params.singularityHLAHDmount + " -B " + params.databases.IEDB_dir + ":/opt/iedb" + " -B " + params.databases.MHCFLURRY_dir + ":/opt/mhcflurry_data"
}

This might work, but it is untested, so I have no idea whether other problems will pop up with this hack.

antaralabiba97 commented on June 12, 2024

Ok, I will try and hope for the best 😅

Will let you know what happens.

antaralabiba97 commented on June 12, 2024

After doing what you suggested above, the pVACseq process did start more than 48 hours ago 😅, but many processes went defunct straight after starting, and the processes which are still running have been running for a very long time. I think the best thing to do is to run it on my PC when it arrives, as I am not sure how to overcome the issues with the cluster.

Screenshot 2022-06-12 at 22 15 56

Screenshot 2022-06-13 at 11 17 08
