conchoecia / pauvre

Pauvre: QC and genome browser plotting Oxford Nanopore and PacBio long reads.

Python 31.08% Shell 0.05% Makefile 0.54% C 61.76% Roff 3.04% Yacc 3.53%

pauvre's People

Contributors

conchoecia, edwardbetts, emollier, mebbert, merwok, samstudio8, wdecoster

pauvre's Issues

Argument error after clean installation

Hey there,

Am getting the following after a clean installation of pauvre:

$ pauvre --version
Traceback (most recent call last):
  File "/usr/local/bin/pauvre", line 9, in <module>
    load_entry_point('pauvre==0.1.84', 'console_scripts', 'pauvre')()
  File "/usr/local/lib/python3.5/dist-packages/pauvre/pauvre_main.py", line 155, in main
    information to stdout.""")
  File "/usr/lib/python3.5/argparse.py", line 1353, in add_argument
    return self._add_action(action)
  File "/usr/lib/python3.5/argparse.py", line 1716, in _add_action
    self._optionals._add_action(action)
  File "/usr/lib/python3.5/argparse.py", line 1557, in _add_action
    action = super(_ArgumentGroup, self)._add_action(action)
  File "/usr/lib/python3.5/argparse.py", line 1367, in _add_action
    self._check_conflict(action)
  File "/usr/lib/python3.5/argparse.py", line 1506, in _check_conflict
    conflict_handler(action, confl_optionals)
  File "/usr/lib/python3.5/argparse.py", line 1515, in _handle_conflict_error
    raise ArgumentError(action, message % conflict_string)
argparse.ArgumentError: argument -q/--quiet: conflicting option strings: -q, --quiet
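
For anyone hitting the same traceback: argparse raises this error whenever the same option string is registered twice on one parser. The snippet below is a minimal sketch that reproduces the message with a throwaway parser; the parser name `demo` is made up, and it is only an assumption that pauvre 0.1.84 registers -q/--quiet twice in pauvre_main.py.

```python
import argparse


def build_parser_with_duplicate():
    """Register -q/--quiet twice on the same parser (hypothetical demo)."""
    parser = argparse.ArgumentParser(prog="demo")
    parser.add_argument("-q", "--quiet", action="store_true")
    # The second registration of the same option strings triggers the error.
    parser.add_argument("-q", "--quiet", action="store_true")
    return parser


try:
    build_parser_with_duplicate()
    conflict_message = None
except argparse.ArgumentError as err:
    conflict_message = str(err)

print(conflict_message)
```

If this is indeed the cause, removing one of the two add_argument calls would be the simplest fix.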

Non-ASCII quote marks.

which : {‘major’, ‘minor’, ‘both’}; which ticks to modify;

I am running into the following issue with your code:

SyntaxError: Non-ASCII character '\xe2' in file /usr/local/lib/python2.7/dist-packages/pauvre/marginplot.py on line 50, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

I believe this is due to fancy unicode quote marks.

Thanks!
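
A small sketch of how the offending characters could be located and replaced. The helper names are hypothetical and the sample string is just the docstring fragment quoted above. Under Python 2, declaring an encoding at the top of the file as described in PEP 263 would also avoid the SyntaxError.

```python
# Map the four common "smart" quote characters to their ASCII equivalents.
CURLY_TO_ASCII = {
    "\u2018": "'",
    "\u2019": "'",
    "\u201c": '"',
    "\u201d": '"',
}


def find_non_ascii(text):
    """Return (line, column, character) for every non-ASCII character."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                hits.append((lineno, col, ch))
    return hits


def asciify_quotes(text):
    """Replace curly quotes with plain ASCII quotes."""
    return text.translate(str.maketrans(CURLY_TO_ASCII))


sample = "which : {\u2018major\u2019, \u2018minor\u2019, \u2018both\u2019}"
hits = find_non_ascii(sample)
cleaned = asciify_quotes(sample)
print(len(hits), cleaned)
```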

Falling back to Bitstream Vera Sans

I get this error in a server environment:

/home/dschultz/python/anaconda3/lib/python3.5/site-packages/matplotlib/font_manager.py:1288: UserWarning: findfont: Font family ['sans-serif'] not found. Falling back to Bitstream Vera Sans
  (prop.get_family(), self.defaultFamily[fontext]))

test failure with biopython 1.80

Hi,

While introducing biopython 1.80 in Debian experimental, I noticed in Debian Bug#1024835 that pauvre fails when trying to run test suites with:

$ python3 -m unittest discover -v

The error log shows:

Traceback (most recent call last):
  File "/<<PKGBUILDDIR>>/.pybuild/cpython3_3.10_pauvre/build/pauvre/pauvre_main.py", line 636, in <module>
    main()
  File "/<<PKGBUILDDIR>>/.pybuild/cpython3_3.10_pauvre/build/pauvre/pauvre_main.py", line 630, in main
    args.func(parser, args)
  File "/<<PKGBUILDDIR>>/.pybuild/cpython3_3.10_pauvre/build/pauvre/pauvre_main.py", line 62, in run_subtool
    import pauvre.synplot as submodule
  File "/<<PKGBUILDDIR>>/.pybuild/cpython3_3.10_pauvre/build/pauvre/synplot.py", line 46, in <module>
    import Bio.SubsMat.MatrixInfo as MI
ModuleNotFoundError: No module named 'Bio.SubsMat'

This seems to be related to the module Bio.SubsMat being unmaintained in biopython since 1.75; it is quite possible it has been removed in version 1.80. Their NEWS.rst states regarding version 1.75:

A new module substitution_matrices was added to Bio.Align, which includes an Array class that can be used as a substitution matrix. As the Array class is a subclass of a numpy array, mathematical operations can be applied to it directly, and C code that makes use of substitution matrices can directly access the numerical values stored in the substitution matrices. This module is intended as a replacement of Bio.SubsMat, which is currently unmaintained.

In hope this helps,
Étienne.
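
One possible way to stay compatible with both old and new Biopython releases is to try the replacement module first and fall back to the old one. This is only a sketch: the function name is hypothetical, and BLOSUM62 is an assumed example, since the traceback does not show which matrix synplot.py actually reads.

```python
def load_blosum62():
    """Return a BLOSUM62 matrix from whichever Biopython API is available.

    Returns None if Biopython is missing entirely.
    """
    try:
        # Available since Biopython 1.75.
        from Bio.Align import substitution_matrices
        return substitution_matrices.load("BLOSUM62")
    except ImportError:
        pass
    try:
        # Older releases only; this is the import that fails in the log above.
        import Bio.SubsMat.MatrixInfo as MI
        return MI.blosum62
    except ImportError:
        return None


matrix = load_blosum62()
print("matrix loaded" if matrix is not None else "Biopython not available")
```

Note that the two APIs return different object types (an Array versus a dict keyed by residue pairs), so the calling code would need adapting as well.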

missing 1 required positional argument: 'histogram'

Hi conchoecia

I just updated pauvre to the most recent version, 0.1.7, and ran it, but it required scipy to be installed. After installing scipy (sudo pip3 install scipy), I now get the following error:

$ pauvre marginplot -f barcode07.fastq
Traceback (most recent call last):
  File "/usr/local/bin/pauvre", line 9, in <module>
    load_entry_point('pauvre==0.1.7', 'console_scripts', 'pauvre')()
  File "/usr/local/lib/python3.4/dist-packages/pauvre/pauvre_main.py", line 302, in main
    args.func(parser, args)
  File "/usr/local/lib/python3.4/dist-packages/pauvre/pauvre_main.py", line 57, in run_subtool
    submodule.run(args)
  File "/usr/local/lib/python3.4/dist-packages/pauvre/marginplot.py", line 370, in run
    margin_plot(args)
  File "/usr/local/lib/python3.4/dist-packages/pauvre/marginplot.py", line 185, in margin_plot
    stats(args.fastq, read_lengths, read_mean_quals)
TypeError: stats() missing 1 required positional argument: 'histogram'

Python version is Python 3.6.1 (miniconda version)
How can I fix this? Thank you!

best regards
sc

Best

matplotlib.use() must be called *before* pylab

Got the following error messages when running pauvre:
pauvre_env/lib/python3.5/site-packages/matplotlib/__init__.py:1405: UserWarning: This call to matplotlib.use() has no effect because the backend has already been chosen; matplotlib.use() must be called *before* pylab, matplotlib.pyplot, or matplotlib.backends is imported for the first time.

But the command worked nevertheless.

Turn off transparent background in plotting - problem

I attempted to turn off the transparent background to get a white background, but I got the error message below. If I don't use the --transparent option, the command works.
Command line:
pauvre marginplot --transparent False --fastq myfile.fq

Error message:

usage: pauvre [-h] [-v] {marginplot,redwood,stats,synplot} ...
pauvre: error: unrecognized arguments: --transparent False myfile.fq
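
For context, argparse flags declared with action="store_true" or "store_false" take no value, so passing "False" after such a flag is parsed as a stray argument. One common pattern is a pair of on/off flags sharing a destination, sketched below. The --no-transparent name and the `demo` parser are hypothetical and not part of pauvre's actual interface; the default of True matches the Namespace printed in another report on this page.

```python
import argparse

parser = argparse.ArgumentParser(prog="demo")
parser.add_argument("--fastq")
# Two flags write to the same destination; neither accepts a value.
parser.add_argument("--transparent", dest="transparent", action="store_true")
parser.add_argument("--no-transparent", dest="transparent", action="store_false")
parser.set_defaults(transparent=True)

args = parser.parse_args(["--no-transparent", "--fastq", "myfile.fq"])
print(args.transparent, args.fastq)
```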

TypeError: print_images() got an unexpected keyword argument 'base_output_name'

Installed using conda...no errors during install. BUT...while pauvre generates and outputs text stats to terminal, it gives an error when trying to generate/display the marginplot graphic image:

TypeError: print_images() got an unexpected keyword argument 'base_output_name'

Command line issued: pauvre marginplot --fastq ecoli_p4_filtered.fastq

This happens both on a Mac as well as a Linux system...

make a quiet version

For marginplot, make it quiet as @wdecoster requested. @wdecoster, I cannot access Gmail or Twitter for the next few days, so any further discussion until Friday of this week will have to take place through GitHub issues.

Problems in setup.py

Hello! Heard about this project on a Debian mailing list and I noticed issues in setup.py:

  1. The requirement name for scikit-learn is scikit-learn, not sklearn (that’s the importable module name); this is the original bug I saw fixed in the Debian package

  2. requires is deprecated and de facto useless (its job is done by install_requires for dependencies and python_requires for the Python version itself)

  3. adding scripts to packages is wrong and may not get the things installed at the right location or at all; for this case (scripts/test.sh) I think no packaging is needed, assuming your CI clones the whole repo and can run the file directly; the file is not useful for end-users.

For reference, entry_points is a good mechanism for Python scripts, scripts is another param useful for non-Python executables, package_data is for data files that the code needs to access at runtime, the dreaded MANIFEST.in file is used to get extra files in sdists that don’t get installed (like a tox.ini or doc file). Note that data_files is under-specified and best left to platform-specific packaging formats (i.e. trying to install things to /etc or /usr/share using setup.py is fraught with trouble).
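
To make the three points concrete, here is a rough sketch of what the setup() arguments could look like. The dependency list is pieced together from packages mentioned in other reports on this page and may be incomplete, and the entry-point target pauvre.pauvre_main:main is inferred from the tracebacks, so treat both as assumptions.

```python
# Keyword arguments for setuptools.setup(), kept in a dict so they can be
# inspected without running an installation.
SETUP_KWARGS = {
    "name": "pauvre",
    "packages": ["pauvre"],
    # Point 2: python_requires and install_requires replace `requires`.
    "python_requires": ">=3",
    # Point 1: the distribution is named scikit-learn, not sklearn.
    "install_requires": ["biopython", "matplotlib", "scipy", "scikit-learn"],
    # Point 3: expose the command through entry_points, not via packages.
    "entry_points": {
        "console_scripts": ["pauvre = pauvre.pauvre_main:main"],
    },
}


def run_setup():
    """Call setuptools with the arguments above (what setup.py would do)."""
    from setuptools import setup
    setup(**SETUP_KWARGS)
```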

Extra notes that you may find interesting to save future trouble:

Require python 3

Make the program require a python3 environment to run. Print out an error if in python2.
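
A minimal sketch of such a check, assuming it is called at the very top of the entry point. The function name and message text are hypothetical. The file containing the check has to remain valid Python 2 syntax itself, otherwise Python 2 fails with a SyntaxError before the check runs.

```python
import sys


def require_python3(version_info=None):
    """Exit with a readable message when run under Python 2."""
    if version_info is None:
        version_info = sys.version_info
    if version_info[0] < 3:
        sys.exit("pauvre requires Python 3; found Python %d.%d"
                 % (version_info[0], version_info[1]))


require_python3()
```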

Matplotlib text.latex.unicode rcparam deprecation warning

Greetings,

While running the test suite, the following warning appears repeatedly:

/usr/lib/python3.8/_collections_abc.py:832: MatplotlibDeprecationWarning:
The text.latex.unicode rcparam was deprecated in Matplotlib 3.0 and will be removed in 3.2.
  self[key] = other[key]

Reading through the Matplotlib API changes for version 3.0, it would seem that the parameter will be forced to True in the future. However, in pauvre/rcparams.py, it is currently set to False, so I'm not sure how you would wish to move forward on this topic. Anyway, for information...

Kind Regards

Phred quality scores affected by plotting

Hello there,

First of all, thanks for this piece of software! Love the summaries and how good the plots look!

I've been having issues with this old dataset I'm analysing though. It has been filtered with fast5-to-fastq (https://github.com/rrwick/Fast5-to-Fastq) to hold only reads with mean Phred scores > 9. However, plotting with pauvre gets me this:

[image: imagem1]

Trying to check whether the filtering script was to blame, I used NanoPlot (https://github.com/wdecoster/NanoPlot) to plot the same files and got this:
[image: bp_v1_qf9_nanoplot]
This means it's a matter of plotting. Funnily enough, using NanoPlot's --loglength option to log-transform read lengths when plotting, I got an approximation of what pauvre is currently plotting:
[image: bp_v1_qf9_nanoplot_log]

Any tips to get over this? Really wanted to use pauvre to generate my manuscript's figures...
Thanks!

Correct the mean quality score

Quality scores are currently inflated because we're taking the mean across the scores. The scores are on a log scale, so if we want the arithmetic mean of the non-log values, we need to convert each score to its error probability, average those probabilities, and take the log of that mean to get back to a Phred score. This is how albacore calculates the average score.

The question is whether the arithmetic mean or some other value is more appropriate. The arithmetic mean is only meaningful if the individual base quality scores are normally distributed. Otherwise, a geometric mean or the median would be more appropriate. Incidentally, the mean of the log scores (what we're currently calculating) corresponds to the geometric mean of the error probabilities.
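
To illustrate the difference between the two averages, here is a small sketch using the standard Phred definition Q = -10 * log10(p). The function names are hypothetical and the two-score input is a toy example rather than real read data.

```python
import math


def arithmetic_mean_phred(quals):
    """Plain mean of the Phred scores (the log-scale average)."""
    return sum(quals) / len(quals)


def probability_mean_phred(quals):
    """Average the error probabilities, then convert back to a Phred score."""
    probs = [10 ** (-q / 10) for q in quals]
    return -10 * math.log10(sum(probs) / len(probs))


quals = [10, 20]  # error probabilities 0.1 and 0.01
log_scale_mean = arithmetic_mean_phred(quals)
prob_scale_mean = probability_mean_phred(quals)
print(log_scale_mean, round(prob_scale_mean, 2))
```

The probability-based value can never exceed the plain mean of the scores, which is why averaging the scores directly makes reads look better than they are.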

Please use version tags

Hi,
I intend to package pauvre for Debian since NanoPlot depends on it and NanoPlot is a target in our Debian Med COVID-19 sprint. The versioned dependency in NanoPlot is pauvre>=0.1.86. I can find a v0.1.86 snapshot tag here, but no later tags. Looking at PyPI, the version scheme is different, and recently released versions seem to lack the '.' separating minor and micro versions. It would be very helpful if you would use a clear versioning scheme and tag your GitHub releases in sync with the PyPI versions.
Thanks a lot, Andreas.

wrong command in readme

Usage
stats
generate basic statistics about the fastq file. For example, if I want to know the number of bases and reads with AT LEAST a PHRED score of 5 and AT LEAST a read length of 500, run the program as below and look at the cells highlighted with .
pauvre marginplot --fastq miniDSMN15.fastq

The stats command is missing and instead there is marginplot.

output format

Hi There,
Running it without a specific output format, as follows: pauvre marginplot --fastq my.fastq
gave me the following error:
Traceback (most recent call last):
  File "/Users/ben/anaconda3/bin/pauvre", line 11, in <module>
    sys.exit(main())
  File "/Users/ben/anaconda3/lib/python3.5/site-packages/pauvre/pauvre_main.py", line 146, in main
    args.func(parser, args)
  File "/Users/ben/anaconda3/lib/python3.5/site-packages/pauvre/pauvre_main.py", line 47, in run_subtool
    submodule.run(parser, args)
  File "/Users/ben/anaconda3/lib/python3.5/site-packages/pauvre/marginplot.py", line 276, in run
    marginplot(args)
  File "/Users/ben/anaconda3/lib/python3.5/site-packages/pauvre/marginplot.py", line 273, in marginplot
    plt.savefig(outname, transparent=args.transparent)
  File "/Users/ben/anaconda3/lib/python3.5/site-packages/matplotlib/pyplot.py", line 696, in savefig
    res = fig.savefig(*args, **kwargs)
  File "/Users/ben/anaconda3/lib/python3.5/site-packages/matplotlib/figure.py", line 1563, in savefig
    self.canvas.print_figure(*args, **kwargs)
  File "/Users/ben/anaconda3/lib/python3.5/site-packages/matplotlib/backend_bases.py", line 2139, in print_figure
    canvas = self._get_output_canvas(format)
  File "/Users/ben/anaconda3/lib/python3.5/site-packages/matplotlib/backend_bases.py", line 2079, in _get_output_canvas
    '%s.' % (format, ', '.join(formats)))
ValueError: Format "p" is not supported.
Supported formats: bmp, eps, gif, jpeg, jpg, pdf, pgf, png, ps, raw, rgba, svg, svgz, tif, tiff.
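
A format of "p" looks like what happens when a default such as the string "png" is iterated character by character instead of being treated as a one-item list. That is a guess from the traceback, not something confirmed in the pauvre source. A sketch of a defensive normalisation step, with a hypothetical helper name:

```python
def normalize_formats(fileform):
    """Return the requested output formats as a list of strings."""
    if isinstance(fileform, str):
        # A bare string would otherwise be iterated one character at a time.
        return [fileform]
    return list(fileform)


buggy = [fmt for fmt in "png"]          # iterates over characters
fixed = normalize_formats("png")        # single format
several = normalize_formats(["pdf", "png"])
print(buggy, fixed, several)
```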

font missing in marginplot

When I run marginplot from pauvre 0.2.2 installed with conda, I get a font-missing error.

pauvre marginplot --fastq myreads.fastq.gz --fileform pdf png --filt_minlen 200 -o myreads

...
...
plotting in the following window:
        0 <= Q-score (x-axis) <= 29.0
        0 <= length  (y-axis) <= 22724
findfont: Font family ['sans-serif'] not found. Falling back to DejaVu Sans.
findfont: Generic family 'sans-serif' not found because none of the following families were found: Arial
findfont: Font family ['sans-serif'] not found. Falling back to DejaVu Sans.
findfont: Generic family 'sans-serif' not found because none of the following families were found: Arial

Mean Phred score

Is there a way to get a mean Phred score for the Phred Quality plot? I can visualize it and guess it's 9, etc, but having that number would be nice for a table.

Thanks in advance, and great tool you've created!

Dan

_tkinter.TclError: no display name and no $DISPLAY environment variable

Hi,

do you know where this error is coming from?

File "......./CentOS6/python64/3.5.2/lib/python3.5/tkinter/__init__.py", line 1868, in __init__
self.tk = _tkinter.create(screenName, baseName, className, interactive, wantobjects, useTk, sync, use)
_tkinter.TclError: no display name and no $DISPLAY environment variable
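
This error typically appears when matplotlib selects a GUI backend on a machine without a display. Below is a sketch of one way to choose the non-interactive Agg backend only when needed. The function names are hypothetical, and checking DISPLAY on Linux is a simple heuristic rather than a complete test. Setting the MPLBACKEND environment variable to Agg before running the command is another option that needs no code change.

```python
import os
import sys


def pick_backend(environ=None, platform=None):
    """Return "Agg" when no X display is available on Linux, else None."""
    environ = os.environ if environ is None else environ
    platform = sys.platform if platform is None else platform
    if platform.startswith("linux") and not environ.get("DISPLAY"):
        return "Agg"
    return None


def configure_matplotlib():
    """Apply the backend choice; call this before importing pyplot."""
    backend = pick_backend()
    if backend is not None:
        import matplotlib
        matplotlib.use(backend)
```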

Matplotlib

Hello,

Thank you for your package, I've been able to produce some awesome plots like those that I used to get with Metrichor (back when it was free).

When using Unix I could only get this to work by adding
'matplotlib.use("Agg")' after import matplotlib in the marginplot.py script.

I understand that this may cause issues for those not using unix but would you mind adding it in as a comment and inform people that they just need to remove the '#' when they get this error:
"raise RuntimeError('Invalid DISPLAY variable')"

Also, as someone who often stops reading the README as soon as they see 'pip...', would you mind changing 'pip' to 'pip3' in the README.md? My pip automatically reverts to my anaconda2 distribution instead. The #!/usr/bin/env python could also be #!/usr/bin/env python3, to save having to type python3 before any command.

Thanks again, I'll be sure to continue using this package.

Kind Regards,
Alexis.

plot not showing

When using the command pauvre marginplot --fastq file.fastq, I get the results from the stats, but the plot is not showing up.

I run the command locally on an Ubuntu machine with python3 and all dependencies installed.
I don't receive any errors. Any clue what I am missing?

Speed

Takes about 30 minutes to run on a 9 GB fastq file. I've been looking into how we can speed it up. SeqIO.parse populates full SeqRecord objects, but we only need the length and quality scores. Building those objects is likely the most time-consuming part of the parsing, so we might be able to shave some time off if we use FastqGeneralIterator.
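
A rough sketch of the FastqGeneralIterator approach, which yields plain (title, sequence, quality) string tuples instead of SeqRecord objects. The helper name, the Phred+33 offset, and the in-memory example records are assumptions; the fallback parser is only there so the sketch runs without Biopython and handles strict four-line records only.

```python
import io

try:
    from Bio.SeqIO.QualityIO import FastqGeneralIterator
except ImportError:
    def FastqGeneralIterator(handle):
        """Stand-in for strict 4-line FASTQ records (no line wrapping)."""
        while True:
            title = handle.readline()
            if not title:
                return
            seq = handle.readline().rstrip("\n")
            handle.readline()  # '+' separator line
            qual = handle.readline().rstrip("\n")
            yield title[1:].rstrip("\n"), seq, qual


def lengths_and_mean_quals(handle, offset=33):
    """Collect read lengths and per-read mean Phred scores from a FASTQ handle."""
    lengths, means = [], []
    for _title, seq, qual in FastqGeneralIterator(handle):
        if not qual:
            continue  # skip empty reads to avoid dividing by zero
        lengths.append(len(seq))
        means.append(sum(ord(c) - offset for c in qual) / len(qual))
    return lengths, means


example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nAC\n+\n!!\n")
lengths, means = lengths_and_mean_quals(example)
print(lengths, means)
```

Whether this actually recovers a meaningful share of the 30 minutes would need to be measured on a real file.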

RuntimeError: Invalid DISPLAY variable

Namespace(command='marginplot', dpi=600, fastq='1D_raw_1.0.3_bcd.fastq', fileform=['pdf', 'png'], func=<function run_subtool at 0x7f3bd8040f28>, lengthbin=None, maxlen=None, maxqual=None, qualbin=None, quiet=False, title='Read length vs mean quality.', transparent=True)
Traceback (most recent call last):
  File "/home/dschultz/python/anaconda3/bin/pauvre", line 11, in <module>
    sys.exit(main())
  File "/home/dschultz/python/anaconda3/lib/python3.5/site-packages/pauvre/pauvre_main.py", line 144, in main
    args.func(parser, args)
  File "/home/dschultz/python/anaconda3/lib/python3.5/site-packages/pauvre/pauvre_main.py", line 47, in run_subtool
    submodule.run(parser, args)
  File "/home/dschultz/python/anaconda3/lib/python3.5/site-packages/pauvre/marginplot.py", line 294, in run
    marginplot(args)
  File "/home/dschultz/python/anaconda3/lib/python3.5/site-packages/pauvre/marginplot.py", line 134, in marginplot
    fig = plt.figure(figsize=(figWidth,figHeight))
  File "/home/dschultz/python/anaconda3/lib/python3.5/site-packages/matplotlib/pyplot.py", line 527, in figure
    **kwargs)
  File "/home/dschultz/python/anaconda3/lib/python3.5/site-packages/matplotlib/backends/backend_qt4agg.py", line 46, in new_figure_manager
    return new_figure_manager_given_figure(num, thisFig)
  File "/home/dschultz/python/anaconda3/lib/python3.5/site-packages/matplotlib/backends/backend_qt4agg.py", line 53, in new_figure_manager_given_figure
    canvas = FigureCanvasQTAgg(figure)
  File "/home/dschultz/python/anaconda3/lib/python3.5/site-packages/matplotlib/backends/backend_qt4agg.py", line 76, in __init__
    FigureCanvasQT.__init__(self, figure)
  File "/home/dschultz/python/anaconda3/lib/python3.5/site-packages/matplotlib/backends/backend_qt4.py", line 68, in __init__
    _create_qApp()
  File "/home/dschultz/python/anaconda3/lib/python3.5/site-packages/matplotlib/backends/backend_qt5.py", line 138, in _create_qApp
    raise RuntimeError('Invalid DISPLAY variable')

I was running this on a server and encountered this error. It produced neither a PDF nor a PNG.

make something like Meraculous fasta_stats

Make something like Meraculous fasta_stats or poretools stats.

Poretools uses

total reads         2286.000000
total base pairs    8983574.000000
mean                3929.822397
median              4011.500000
min                 13.000000
max                 6864.000000

fasta_stats is more of a table with the amount of data in different size bins.
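
Both styles of summary can be sketched from a list of read lengths. The function names, the toy lengths, and the bin thresholds below are hypothetical, and the binned table only approximates the idea of reporting data per size bin rather than reproducing the fasta_stats output.

```python
import statistics


def length_stats(lengths):
    """Summary similar to the poretools-style output shown above."""
    return {
        "total reads": len(lengths),
        "total base pairs": sum(lengths),
        "mean": statistics.mean(lengths),
        "median": statistics.median(lengths),
        "min": min(lengths),
        "max": max(lengths),
    }


def binned_totals(lengths, min_lengths):
    """For each threshold, count reads and bases in reads at least that long."""
    rows = []
    for threshold in min_lengths:
        kept = [n for n in lengths if n >= threshold]
        rows.append((threshold, len(kept), sum(kept)))
    return rows


toy_lengths = [13, 4000, 4023, 6864]
summary = length_stats(toy_lengths)
table = binned_totals(toy_lengths, [0, 1000, 5000])
for key, value in summary.items():
    print(key, value)
for row in table:
    print(row)
```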

Add Biopython as a pre-requisite

Hi,

Thanks for the nice package. I hadn't installed Biopython, which is a pre-requisite for your code. Would be good to add that to your README.

Best,

Mark
