Comments (15)

MANaranjo commented on August 30, 2024

from karyon.

KamilSJaron commented on August 30, 2024

Hi Miguel, just to be clear, I am one of the reviewers of the paper you submitted.

I should have submitted the review today, but I have not done so because I did not manage to run the pipeline. Would you prefer that I write the review without actually running the software, or that I wait until we resolve the installation and running issues?

MANaranjo commented on August 30, 2024

MANaranjo commented on August 30, 2024

KamilSJaron commented on August 30, 2024

Thanks for these quick fixes. I did manage to get much further (it's still running, maybe I will get some plots). Here are some more comments for now:

Regarding the light install, I don't think you pushed the script to the repo (I see no scripts/lightinstall.sh).

I also dug a bit into the error. I had messed up the config file: I had not specified the karyon directory correctly, and it tried to write tmp files to the install directory of karyon. So I set the real location of karyon (/scratch/kjaron/install/karyon), but I still got

python3 install/karyon/bin/allplots.py -f data/reference/genome.fa -v freebayes_Afus1_raw.vcf -p Afus1.mpileup -b data/mapped_reads/Afus1.rg.sorted.rmdup.bam -l data/trimmed_reads/Afus1/ERR5959256-trimmed-pair1.fastq.gz -o data/Afus1/karyon_plots
Traceback (most recent call last):
  File "/scratch/kjaron/install/karyon/bin/allplots.py", line 93, in <module>
    allplots(window_size,
  File "/scratch/kjaron/install/karyon/bin/karyonplots.py", line 496, in allplots
    os.mkdir(kitchen)
FileNotFoundError: [Errno 2] No such file or directory: '/scratch/kjaron/install/karyon/tmp/DETQV7'

So I also created the tmp directory, and it finally worked. The program ran for a while and generated two plots, lenlin and lenlog. But then another error occurred:

/ceph/users/kjaron/.conda/envs/karyon/lib/python3.9/site-packages/numpy/core/fromnumeric.py:3440: RuntimeWarning: Mean of empty slice.
  return _methods._mean(a, axis=axis, dtype=dtype,
/ceph/users/kjaron/.conda/envs/karyon/lib/python3.9/site-packages/numpy/core/_methods.py:189: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
Traceback (most recent call last):
  File "/scratch/kjaron/install/karyon/bin/allplots.py", line 93, in <module>
    allplots(window_size, 
  File "/scratch/kjaron/install/karyon/bin/karyonplots.py", line 504, in allplots
    var_v_cov(vcf, mpileup, window_size, newpath, lendict, scafminsize, scafmaxsize)
  File "/scratch/kjaron/install/karyon/bin/karyonplots.py", line 112, in var_v_cov
    mean_cov = extract_pileup_data(pileup_file, window_size, lendict, scafminsize, scafmaxsize)
  File "/scratch/kjaron/install/karyon/bin/karyonplots.py", line 103, in extract_pileup_data
    coverage.append(int(line[line.find("DP=")+3:].split(";")[0]))
ValueError: invalid literal for int() with base 10: 'th_1\t2\tG\t22\t.....................^B.\tJJJJJJJJFJFJJJJJJJJFJ<\n'

I suspect this has something to do with parsing the mpileup file. I made mine using

samtools mpileup -a --no-BAQ --fasta-ref data/reference/Afus1/genome.fa --output /scratch/$USER/Afus1.mpileup data/mapped_reads/Afus1.rg.sorted.rmdup.bam

but I see that in your workflow you use bcftools. Those details would be good to mention somewhere. I am rerunning it now with bcftools; let's see how that changes things.
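
The ValueError above is consistent with a parser that expects a DP= tag (as found in the INFO column of bcftools mpileup output) being fed the plain text format of samtools mpileup, where the depth is simply the fourth column. A minimal sketch of a parser that tolerates both layouts is shown here; depth_from_line is a hypothetical helper written for illustration and is not part of karyon.

```python
def depth_from_line(line):
    """Return the read depth from one pileup-like line, or None if unavailable.

    Handles two layouts (an assumption about the inputs, not karyon's code):
      * VCF text, e.g. from `bcftools mpileup`: depth is DP=<int> in INFO (column 8)
      * `samtools mpileup` text: depth is the fourth tab-separated column
    """
    if not line.strip() or line.startswith("#"):
        return None  # blank line or VCF header
    fields = line.rstrip("\n").split("\t")

    # VCF-style record: look for a DP= entry in the INFO column.
    if len(fields) >= 8:
        for entry in fields[7].split(";"):
            if entry.startswith("DP=") and entry[3:].isdigit():
                return int(entry[3:])

    # samtools mpileup record: chrom, pos, ref base, depth, bases, qualities.
    if len(fields) >= 4 and fields[3].isdigit():
        return int(fields[3])

    return None  # unknown layout: let the caller decide how to report it
```

Returning None instead of raising lets the caller skip headers and report unparseable lines with a clear message about the expected input format.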

By the way, regarding tmp files, I don't think it's nice to write tmp files to the installation destination. In plenty of cluster environments, shared installations are in read-only directories, and quite often in locations that are not well suited to big or numerous tmp files. What about writing the tmp files to the current directory, to a directory specified by an argument, or to the location specified by the environment variable TMPDIR? I am mentioning this because it would be a real problem on practically all the clusters I have ever worked on.
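
The three alternatives named above could be combined as in the following minimal sketch. The function name make_work_dir, the karyon_ prefix, and the priority order (explicit argument first, then TMPDIR, then the current directory) are assumptions made for illustration; none of them is taken from karyon itself.

```python
import os
import tempfile


def make_work_dir(user_dir=None):
    """Create and return a fresh temporary working directory.

    Assumed priority: an explicit user_dir, then $TMPDIR, then the
    current working directory. Nothing is written to the install location.
    """
    base = user_dir or os.environ.get("TMPDIR") or os.getcwd()
    # makedirs creates missing parent directories; exist_ok avoids an
    # error when the base directory is already there.
    os.makedirs(base, exist_ok=True)
    # mkdtemp picks a unique name, so parallel runs do not collide.
    return tempfile.mkdtemp(prefix="karyon_", dir=base)
```

A command-line option could pass its value as user_dir, so that users on a cluster can point the tool at a local scratch disk.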

MANaranjo commented on August 30, 2024

KamilSJaron commented on August 30, 2024

Hello Miguel,

I managed to get the first karyon plot. The plotting script still does not finish without errors (see below), but I think I got the important part done, and it is enough for me to finish the review.

Thank you for your assistance.

Best,
Kamil

python3 install/karyon/bin/allplots.py -f data/reference/genome.fa -v freebayes_Afus1_raw.vcf -p Afus1.bfctools.mpileup -b data/mapped_reads/Afus1.rg.sorted.rmdup.bam -l data/trimmed_reads/Afus1/ERR5959256-trimmed-pair[1,2].fastq.gz -o data/Afus1/karyon_plots
/ceph/users/kjaron/.conda/envs/karyon/lib/python3.9/site-packages/numpy/core/fromnumeric.py:3440: RuntimeWarning: Mean of empty slice.
  return _methods._mean(a, axis=axis, dtype=dtype,
/ceph/users/kjaron/.conda/envs/karyon/lib/python3.9/site-packages/numpy/core/_methods.py:189: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
/ceph/users/kjaron/.conda/envs/karyon/lib/python3.9/site-packages/seaborn/_decorators.py:36: FutureWarning: Pass the following variables as keyword args: x, y. From version 0.12, the only valid positional argument will be `data`, and passing other arguments without an explicit keyword will result in an error or misinterpretation.
  warnings.warn(
Traceback (most recent call last):
  File "/scratch/kjaron/install/karyon/bin/allplots.py", line 93, in <module>
    allplots(window_size,
  File "/scratch/kjaron/install/karyon/bin/karyonplots.py", line 506, in allplots
    fair_coin_global(vcf, window_size, newpath, lendict, scafminsize, scafmaxsize)
  File "/scratch/kjaron/install/karyon/bin/karyonplots.py", line 189, in fair_coin_global
    if (int(values[0])+int(values[1])) == 0:
IndexError: list index out of range

MANaranjo commented on August 30, 2024

KamilSJaron commented on August 30, 2024

Hello,

To check the new version, I redownloaded the repository and (naively) tried to run the installation script. It threw lots of errors, but I think I managed to overcome the problems by installing the necessary packages via conda and adjusting the config file accordingly. However, to be honest, I find the installation script problematic: it silently configures package managers and modifies user configuration files, which is unfortunate because many users will be confused by the altered behavior of their computing environment. If you intend the package to be widely used, you really need to make the installation procedure a lot more straightforward, for example by providing a single conda recipe, or by reducing the installation script to a few steps that are much easier for people to debug. If you are all right with that, I can give you more detailed feedback on the installation and how I imagine simplifying it.

However, perhaps it would be good to get the tool working first. The plotting script seems to work when I run the "--help" command, but when I tried to actually plot something I ran into an error:

Traceback (most recent call last):
  File "/ceph/users/kjaron/src/karyon/bin/allplots.py", line 95, in <module>
    df = allplots(window_size, 
  File "/ceph/users/kjaron/src/karyon/bin/karyonplots.py", line 511, in allplots
    df = window_walker(window_size, step, vcf, fasta_file, bam, nQuire, kitchen, newpath, counter, lendict, scafminsize, scafmaxsize, no_plot)
  File "/ceph/users/kjaron/src/karyon/bin/karyonplots.py", line 370, in window_walker
    ref, alt = float(refalt[0]), float(refalt[1])
IndexError: list index out of range

I am not sure whether the installation is messed up or Karyon does not like the data I am using (the same data I used before; I wanted to first see how different the plots would be). Do you have a dummy dataset I could try? Alternatively, do you know what might have gone wrong?
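
The traceback does not show how refalt is built. Assuming it comes from splitting a comma-separated pair of reference and alternative allele counts taken from the VCF, a record with fewer than two entries would produce exactly this IndexError. The following is a minimal sketch of a defensive parser under that assumption; parse_ref_alt is a hypothetical helper, not a function from karyon.

```python
def parse_ref_alt(field):
    """Parse a comma-separated 'ref,alt' count string.

    Returns (ref, alt) as floats, or None when the field is empty,
    has fewer than two entries, or holds non-numeric values such as '.'.
    """
    parts = field.strip().split(",") if field else []
    if len(parts) < 2:
        return None
    try:
        return float(parts[0]), float(parts[1])
    except ValueError:
        return None
```

A caller could skip records for which the helper returns None and count them, so that unexpected input shows up as a warning rather than a crash.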

I also looked into getting docker to work on our cluster, but I don't think that's an option for me.

Sorry for the hassle, but I would like to see the program working before I submit my review.

MANaranjo commented on August 30, 2024

MANaranjo commented on August 30, 2024

KamilSJaron commented on August 30, 2024

Hi Miguel,

thanks for looking into it and don't worry about the delay...

I re-installed the tool and re-ran it on my data, and hit exactly the same error. Again, I would encourage you to create a toy dataset that users can run to verify that their installation was successful. I am still not sure whether the problem lies with my data or with the installation.

Running out of ideas, I cleared some space on my personal computer and installed docker to try karyon. However, I found that even the docker installation does not seem to work at all: when I tried to pull the docker image, it complained about not finding the image.

docker pull gabaldonlab/karyon
Using default tag: latest
Error response from daemon: pull access denied for gabaldonlab/karyon, repository does not exist or may require 'docker login': denied: requested access to the resource is denied

Then I tried to check the docker GitHub repo, but it seems to be private, which might be the reason why I have not seen the docker image online. However, I have used docker only a few times, so I might be missing something basic.

Let me know if there is a quick fix to that.

Something I don't expect you to do right now, but I really think you should consider finding time to invest more in the installation. I am quite certain that nearly no one could run the installation script without errors, with an even slimmer chance when attempting to install the tool on a cluster. If you want the tool to be useful for the community, you need to make it a lot more straightforward to use.

I recommend writing a markdown document in which the installation steps are explained one by one, instead of an installation bash script. I had to dissect the script and run it bit by bit anyway. Having more comments would be really helpful. More specifically:

  • I would avoid installing anything via apt (or apt-get), which is usually not available on clusters. If you decide to keep the apt step, I would at least avoid adding more repositories without properly explaining what implications that has for users (these lines).
  • I would also advise against changing conda channels; you can add -c to the executed conda commands instead (these lines). Furthermore, you can stack the commands together: instead of having 15 conda install lines, you can run something like conda install -c bioconda -y biopython matplotlib ipython jupyter pandas sympy nose seaborn psutil pysam
  • I would also avoid changing .bashrc. The whole point of conda is the separation of environments, and kat should be in a separate environment (these lines). Other things you can simply link into the conda environment directory.
  • Your program also creates a temporary directory with os.mkdir(); replacing it with os.makedirs() could help avoid an error if the tmp directory is missing. However, I would also recommend allowing users to specify their own location for the temporary directory; on clusters that is typically a local scratch disk rather than the shared directory where I would normally install the software.
  • As mentioned above, provide a test dataset, so that I can verify that I installed the pipeline as expected.
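
Alongside a test dataset, a small self-check could confirm that the Python dependencies from the conda command above are present in the active environment. This is a minimal sketch: missing_modules is a hypothetical helper, and the import names in REQUIRED are assumptions about how each conda package is imported (for example, biopython is imported as Bio and ipython as IPython).

```python
import importlib.util

# Import names assumed for the conda packages listed in the thread.
REQUIRED = ["Bio", "matplotlib", "IPython", "jupyter", "pandas",
            "sympy", "nose", "seaborn", "psutil", "pysam"]


def missing_modules(names):
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Report what is missing without importing (and thus executing) any package.
print("Missing Python modules:", missing_modules(REQUIRED) or "none")
```

Such a check only verifies that the modules can be found; running the pipeline on a small dataset is still needed to confirm that the tools work together.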

MANaranjo commented on August 30, 2024

KamilSJaron commented on August 30, 2024

Hi @MANaranjo,

I am sorry for the late response this time!

I am really sorry to say it, but I am running into tons of problems again, and I have not even tried to install the software this time. Instead, I tried to run the docker image, as I thought that would be a lot more straightforward.

I pulled the image and ran it (note that I had to change the commands quite a bit compared to the manual):

docker pull cgenomics/karyon

docker run -dit --name=kamilyon -v /Volumes/dump/containers --rm cgenomics/karyon

and then I tried to at least get the help message and I got

docker exec kamilyon python bin/karyon.py --help
  File "bin/karyon.py", line 331
    try:
       ^
IndentationError: unindent does not match any outer indentation level

What's wrong? Does the container work for you? How is that even possible?!
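
An IndentationError like the one above is raised when the file is parsed, before any code runs, so it can be caught without executing the scripts. The following minimal sketch compiles every .py file in a directory and collects the failures; find_syntax_errors is a hypothetical helper, and the idea of running it on bin/ before building an image is a suggestion, not part of the karyon repository.

```python
import pathlib


def find_syntax_errors(directory):
    """Compile every .py file under directory and return {path: message}.

    Uses the built-in compile(), which parses the source without running it
    and without writing .pyc files. IndentationError is a subclass of
    SyntaxError, so both are caught here.
    """
    errors = {}
    for path in sorted(pathlib.Path(directory).rglob("*.py")):
        source = path.read_text(encoding="utf-8")
        try:
            compile(source, str(path), "exec")
        except SyntaxError as exc:
            errors[str(path)] = f"{type(exc).__name__} at line {exc.lineno}: {exc.msg}"
    return errors
```

The result depends on the Python version that runs the check, so it is most informative when run with the same interpreter that the container uses.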

I would also like to point out that your documentation is still full of typos: none of the commands in it works as written. The most problematic part is that you don't even specify the paths correctly: scripts/karyon.py instead of bin/karyon.py (which I figured out once I ran docker exec kamilyon ls scripts bin).

I will try to look at the installation too, but I suspect it still won't work: I don't have a suitable computer with a Debian-based system at my disposal (i.e. I don't have apt-get). I will get back to you about that too.

Thanks for including testing data. Once I get it to work I will give it a try.

KamilSJaron commented on August 30, 2024

I spent a few more hours trying to get it to run but I still did not manage.

I am sorry, I gave up.
