alexarnimueller / modlamp


Python package for peptide sequence generation, peptide descriptor calculation and sequence analysis.

Home Page: https://modlamp.org

License: Other

Makefile 0.05% Python 99.95%


modlamp's Issues

For the pepCATS descriptor, are the bit patterns listed somewhere?

Hello,
In the code, is there a table giving the bit pattern for each of the 20 amino acids?
I looked in the code but couldn't find it.
Thanks a lot,
F.

Improve documentation for descriptors.GlobalDescriptor

https://modlamp.org/modlamp.html#modlamp.descriptors.GlobalDescriptor.calculate_all

The method descriptors.GlobalDescriptor.calculate_all returns an array of 10 elements (everything except the molecular formula). However, the documented example shows an array of only 9 elements; the sequence charge is missing. This was quite baffling when I first used the method.

The documentation could be reworded for clarity, for example:
"Method combining all 10 global descriptors (except molecular formula)..."

Calculation of amino acid probabilities

How were the probabilities for amino acids in the class BaseSequence() calculated? Particularly, how was prob_ACPhel computed?
Thank You!

Issue in the modlAMP example script for peptide classification

@alexarnimueller There are a few issues in the example script for peptide classification. For example, line 17 raises a NameError for desc.

I have made some fixes to the script. Should I send a PR with them?

Difficulty in Analysis of Different Sequence Libraries

Dear Sir,
I am using the modlamp.analysis module to analyse a peptide sequence dataset. I can run g = GlobalAnalysis(['GLFDIVKKVVGALG', 'KLLKLLKKLLKLLK', ...], names=['Library1']) for the amino acid frequency calculation and the summary plot, but I am having difficulty passing the dataset in as a dataframe.

I converted a CSV file into a dataframe but could not run the analysis with the commands above; I get a syntax error. Can you please explain how to use a dataframe in this script?
Furthermore, does the analysis of different sequence libraries only accept input as a list or array? If so, how can I use my peptide dataframe for the analysis?
I have 3 peptide libraries, one per column.

Please help.
Regards,
Sandeep
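
One way to approach the question above is to turn each column of the CSV file into a plain Python list before handing it to the analysis class. The following is a minimal sketch using only the standard library; the helper name `columns_to_libraries`, the column names, and the second-row sequences are made up for illustration, and it assumes each column holds one library with one sequence per row.

```python
import csv
import io


def columns_to_libraries(csv_text):
    """Return (names, libraries) with one list of sequences per CSV column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    names = list(reader.fieldnames or [])
    libraries = {name: [] for name in names}
    for row in reader:
        for name in names:
            seq = (row.get(name) or "").strip().upper()
            if seq:  # skip empty cells so columns may differ in length
                libraries[name].append(seq)
    return names, [libraries[name] for name in names]


# Hypothetical file content: three libraries, one per column.
example = """Library1,Library2,Library3
GLFDIVKKVVGALG,KLLKLLKKLLKLLK,AAGAAGAAG
GLFDIVKKVV,KLLKLLKK,
"""

names, libs = columns_to_libraries(example)
for name, lib in zip(names, libs):
    print(name, lib)
```

With pandas, the equivalent per-column conversion would be `df[col].dropna().tolist()`. The resulting lists can then be tried in the same call shape as in the question, e.g. `GlobalAnalysis(libs[0], names=[names[0]])`; whether a nested list of several libraries is accepted directly should be checked against the modlamp documentation.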

GlobalAnalysis Plot

In the plot_summary plot of GlobalAnalysis, the legend grows over the amide and pH information if more than 3 libraries are plotted.
If only one library is plotted, the bars in the amino acid distribution plot are shifted too far to the right.

Incorrect formula for Glutamine using GlobalDescriptor

Using the code
from modlamp.descriptors import GlobalDescriptor

desc = GlobalDescriptor(['Q'])
desc.formula(amide=False)
for v in desc.descriptor:
    print(v[0])

I get:
C4 H7 N1 O4

The correct formula for Glutamine is:
C5H10N2O3
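
The expected value can be checked independently of modlamp by summing residue compositions and adding one water for the free termini. This is a minimal sketch, not modlamp's implementation: the function name `peptide_formula` and the small residue table are illustrative, and only a few of the 20 amino acids are included.

```python
from collections import Counter

# Elemental composition of amino acid residues (free amino acid minus H2O).
# Only a few residues are listed here; a full table would cover all 20.
RESIDUE_FORMULAS = {
    "G": {"C": 2, "H": 3, "N": 1, "O": 1},
    "A": {"C": 3, "H": 5, "N": 1, "O": 1},
    "D": {"C": 4, "H": 5, "N": 1, "O": 3},
    "E": {"C": 5, "H": 7, "N": 1, "O": 3},
    "N": {"C": 4, "H": 6, "N": 2, "O": 2},
    "Q": {"C": 5, "H": 8, "N": 2, "O": 2},
}


def peptide_formula(sequence, amide=False):
    """Return the molecular formula of a linear peptide as 'C.. H.. N.. O..'."""
    total = Counter({"H": 2, "O": 1})  # one water for the free N- and C-terminus
    for aa in sequence:
        total.update(RESIDUE_FORMULAS[aa])  # adds the residue's atom counts
    if amide:
        # C-terminal amidation replaces -OH with -NH2.
        total["O"] -= 1
        total["N"] += 1
        total["H"] += 1
    return " ".join(f"{el}{total[el]}" for el in "CHNO" if total[el])


print(peptide_formula("Q"))  # C5 H10 N2 O3
print(peptide_formula("D"))  # C4 H7 N1 O4
```

In this sketch, the value reported in the issue (C4 H7 N1 O4) is what comes out for 'D' (aspartic acid), which may hint at where the lookup goes wrong, though that is only a guess.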

py3?

Very nice project! Unfortunately, the rest of my code base is very much py3 based. Have you considered adding py3 support? Would you be interested in contributions?

pip's internal functions issue while building the Bioconda package

I have successfully added modlamp==4.1.2 to Bioconda. However, when I try to build a package for 4.1.4, the latest release, the conda build fails with the error shown below. I get a similar error when I try to install modlamp==4.1.4 with pip on my local computer, and the install eventually fails.

13:22:46 BIOCONDA INFO (OUT) Added file://$SRC_DIR to build tracker '/tmp/pip-req-tracker-u0iC5E'
13:22:46 BIOCONDA INFO (OUT) Running setup.py (path:$SRC_DIR/setup.py) egg_info for package from file://$SRC_DIR
13:22:46 BIOCONDA INFO (OUT) Running command python setup.py egg_info
13:22:46 BIOCONDA INFO (OUT) Created temporary directory: /tmp/pip-pip-egg-info-hJByB2
13:22:46 BIOCONDA INFO (OUT) Traceback (most recent call last):
13:22:46 BIOCONDA INFO (OUT) File "<string>", line 1, in <module>
13:22:46 BIOCONDA INFO (OUT) File "/opt/conda/conda-bld/modlamp_1588166316542/work/setup.py", line 11, in <module>
13:22:46 BIOCONDA INFO (OUT) reqs = [str(ir.req) for ir in install_reqs][:-1]
13:22:46 BIOCONDA INFO (OUT) AttributeError: 'ParsedRequirement' object has no attribute 'req'

Details can be found in bioconda/bioconda-recipes#21839.

After discussing this with the Bioconda community, it looks like setup.py uses some of pip's internal functions, which will break over time. Resolving this issue would help in building this package for conda.
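
A common way to avoid pip's internals in setup.py is to read the requirements file directly. The sketch below is an assumption about how this could be done, not the project's actual fix; the function name `parse_requirements` is illustrative. It also does not reproduce the `[:-1]` slice from the failing line, since the reason for dropping the last entry is not stated.

```python
def parse_requirements(text):
    """Return requirement specifiers from the contents of a requirements file."""
    reqs = []
    for line in text.splitlines():
        line = line.split(" #", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith(("#", "-")):
            continue  # skip blank lines, comment lines and pip options such as -r
        reqs.append(line)
    return reqs


# Hypothetical requirements file content, for illustration only.
example = """\
# core dependencies
numpy>=1.10
scipy  # optional note

-r other-requirements.txt
"""

print(parse_requirements(example))  # ['numpy>=1.10', 'scipy']

# In setup.py this could be used as:
#   with open("requirements.txt") as fh:
#       install_requires = parse_requirements(fh.read())
```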

Issue in obtaining descriptors for FASTA data

Hello @alexarnimueller

I am having trouble obtaining descriptor data for the FASTA data here: http://caps.ncbs.res.in/3dswap-pred/data/3dswap-pred_positive_dataset.fasta

Here is the program I am running:

from modlamp.descriptors import PeptideDescriptor
pepdesc = PeptideDescriptor('3dswap-pred_negative_dataset.fasta', 'eisenberg') 
pepdesc.calculate_global()
pepdesc.calculate_moment(append=True)  
pepdesc.load_scale('z3')
pepdesc.calculate_autocorr(1, append=True)
col_names = 'ID,Sequence,H_Eisenberg,uH_Eisenberg,Z3_1,Z3_2,Z3_3'
pepdesc.save_descriptor('neg_descriptors1.csv', header=col_names)

I get this error:

Traceback (most recent call last):
  File "desc_negative.py", line 8, in <module>
    pepdesc.calculate_global()  # calculate global Eisenberg hydrophobicity
  File "/usr/local/lib/python2.7/dist-packages/modlamp/descriptors.py", line 802, in calculate_global
    mtrx.append(self.scale[str(seq[l])])
KeyError: 'X'
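
The KeyError suggests that at least one sequence contains 'X', a residue for which the loaded scale has no value. One possible workaround is to screen the FASTA records before calculating descriptors. The following is a minimal standard-library sketch; the helper names and the example records are made up, and it assumes only the 20 canonical one-letter codes are acceptable.

```python
CANONICAL = set("ACDEFGHIKLMNPQRSTVWY")


def read_fasta(text):
    """Parse FASTA text into a list of (name, sequence) tuples."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        elif line:
            chunks.append(line.upper())
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def split_by_alphabet(records):
    """Separate records with only canonical residues from all others."""
    kept, rejected = [], []
    for name, seq in records:
        ok = bool(seq) and set(seq) <= CANONICAL
        (kept if ok else rejected).append((name, seq))
    return kept, rejected


# Hypothetical records, for illustration only.
example = """>seq1
GLFDIVKKVVGALG
>seq2
KLLKXLKKLL
KLLK
"""

kept, rejected = split_by_alphabet(read_fasta(example))
print("kept:", [name for name, _ in kept])
print("rejected:", [name for name, _ in rejected])
```

The kept sequences could then be written to a new FASTA file and used in place of the original input. Whether to drop such sequences or to handle 'X' some other way depends on the analysis.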

Running SVM on AMP vs UniProt shows an error.

@alexarnimueller I am getting an error when running the AMP classification using SVM.

Traceback (most recent call last):
  File "classify-amp.py", line 27, in <module>
    lib.generate_sequences()
  File "/home/ssouravsingh12/.local/lib/python2.7/site-packages/modlamp/sequences.py", line 536, in generate_sequences
    H.generate_sequences()
  File "/home/ssouravsingh12/.local/lib/python2.7/site-packages/modlamp/sequences.py", line 136, in generate_sequences
    seq = ['X'] * random.choice(range(self.lenmin, self.lenmax + 1))
  File "mtrand.pyx", line 1121, in mtrand.RandomState.choice (numpy/random/mtrand/mtrand.c:17200)
ValueError: a must be non-empty
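
The last frames show `random.choice` being called on `range(self.lenmin, self.lenmax + 1)`, and NumPy's `choice` raises this error when its input is empty. One possible cause, therefore, is a maximum length smaller than the minimum length. The sketch below shows a guard that reports this case explicitly; `pick_length` is a hypothetical helper, not part of modlamp.

```python
import random


def pick_length(lenmin, lenmax, rng=random):
    """Pick a random sequence length in [lenmin, lenmax], validating the bounds."""
    if lenmin < 1 or lenmax < lenmin:
        raise ValueError(
            f"invalid length range: lenmin={lenmin}, lenmax={lenmax} "
            "(need 1 <= lenmin <= lenmax)"
        )
    return rng.randint(lenmin, lenmax)  # both bounds inclusive


print(pick_length(7, 28))

try:
    pick_length(28, 7)  # swapped bounds give an empty range
except ValueError as err:
    print("rejected:", err)
```

Checking the values passed as minimum and maximum length in the classification script would confirm or rule out this explanation.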