About preprocessing data (camp, CLOSED)

lbwfff commented on August 18, 2024
About preprocessing data

from camp.

Comments (11)

twopin commented on August 18, 2024

Hi, you need to generate the PSSM features with PSI-BLAST and the intrinsic disorder features with IUPred from your own data.

lbwfff commented on August 18, 2024

Hi twopin,
Thank you for your reply. Are Peptide_Intrinsic_dict_v3 and Protein_Intrinsic_dict generated by the same process, with peptide and protein sequences as the respective inputs?
I also ran into a problem. I'm testing with a small file, and after changing fasta_filename to my input FASTA file I got the following error. How can I solve it?

(4, 16, 16)
(4, 0)
(4, 16, 16)
(4, 0)
Traceback (most recent call last):
  File "step3_generate_features.py", line 57, in <module>
    Intrinsic = raw_score_dict_long[key]
KeyError: '>sp|P31946|1433B_HUMAN 14-3-3 protein beta/alpha OS=Homo sapiens OX=9606 GN=YWHAB PE=1 SV=3'

Thanks,
LeeLee

lbwfff commented on August 18, 2024

My raw_score_dict seems to come out empty. Why is this happening? Below is one of the files output by IUPred. Is there a problem with this file?

# IUPred2A: context-dependent prediction of protein disorder as a function of redox state and protein binding
# Balint Meszaros, Gabor Erdos, Zsuzsanna Dosztanyi
# Nucleic Acids Research 2018;46(W1):W329-W337.
#
# Prediction type: short
# Prediction output
# POS	RES	IUPRED2
1	M	0.9141
2	T	0.8713
3	M	0.8311
4	D	0.7458
5	K	0.6870
6	S	0.6650
7	E	0.6374
8	L	0.5711
9	V	0.5473
10	Q	0.5084
...........

twopin commented on August 18, 2024
  1. https://github.com/twopin/CAMP/issues/12#issuecomment-992278927:
    Yes, Peptide_Intrinsic_dict_v3 and Protein_Intrinsic_dict are generated by the same code (with different input files, one for proteins and one for peptides). The key in "KeyError: '>sp|P31946|1433B_HUMAN 14-3-3 protein beta/alpha OS=Homo sapiens OX=9606 GN=YWHAB PE=1 SV=3'" should be a FASTA name from your FASTA file. The error message indicates that there is a sequence in your FASTA file whose FASTA name is not among the keys of raw_score_dict. I suspect something is wrong in the function 'extract_intrinsic_disorder'. You can run the function line by line and print the two dicts to check.
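
    One way to run that check, as a minimal sketch: parse an IUPred2A-style result into a list of scores, then list the FASTA names that have no entry in the score dict. The helper names (parse_iupred_scores, fasta_names, missing_keys) are hypothetical and are not taken from step3_generate_features.py. The parser assumes the POS/RES/score column layout of the IUPred output posted earlier in this thread.

```python
def parse_iupred_scores(text):
    """Return the per-residue scores from one IUPred2A-style result."""
    scores = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the '#' header block.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # Assumed layout: position, residue, score.
        scores.append(float(fields[2]))
    return scores


def fasta_names(fasta_text):
    """Return the header lines (including the leading '>') in file order."""
    return [ln.strip() for ln in fasta_text.splitlines() if ln.startswith(">")]


def missing_keys(names, score_dict):
    """Return the FASTA names that have no entry in score_dict."""
    return [name for name in names if name not in score_dict]


# Made-up demo data, not taken from the CAMP repository.
example_result = "# POS\tRES\tIUPRED2\n1\tM\t0.9141\n2\tT\t0.8713\n"
example_fasta = ">seqA demo\nMT\n>seqB demo\nMK\n"
score_dict = {">seqA demo": parse_iupred_scores(example_result)}
missing = missing_keys(fasta_names(example_fasta), score_dict)
print(missing)
```

    If the list is not empty, the names it contains are the ones that would raise the KeyError. An empty score dict usually points at the parsing step rather than at the FASTA file.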

twopin commented on August 18, 2024

Hi, here are two of the results I got when I wrote this code a year ago (I'm not sure whether the output format has changed recently). The FASTA names for each sequence can be changed according to the input FASTA files.
The original file names ended with ".result". I renamed them just for uploading (GitHub doesn't accept files ending with ".result").
cm4_pep_long.txt
cm4_pep_short.txt

lbwfff commented on August 18, 2024

Hi, thanks for your reply. It let me solve this problem. I still have a small question about this piece of code:

Intrinsic_score = {}
for seq in Intrinsic_score_short.keys():
    Intrinsic = Intrinsic_score_long[prot_seq][:,0]
    short_Intrinsic = Intrinsic_score_short[prot_seq]
    concat_Intrinsic = np.column_stack((long_Intrinsic,short_Intrinsic))
    Intrinsic_score[seq] = np.column_stack((long_Intrinsic,short_Intrinsic))

This raises NameError: name 'prot_seq' is not defined. Also, long_Intrinsic does not appear anywhere in the preceding code. I guess it should be Intrinsic?
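
For reference, a minimal sketch of how the loop reads once the names are made consistent. It assumes that prot_seq was meant to be the loop variable seq and that Intrinsic was meant to be long_Intrinsic. The wrapper function combine_intrinsic and the demo arrays are hypothetical, and the array shapes are an assumption.

```python
import numpy as np


def combine_intrinsic(score_long, score_short):
    """Stack the first long-mode column next to the short-mode scores."""
    combined = {}
    for seq in score_short:
        # First column of the long-mode scores for this sequence.
        long_intrinsic = np.asarray(score_long[seq])[:, 0]
        short_intrinsic = np.asarray(score_short[seq])
        # column_stack turns the 1-D vector into a column before joining.
        combined[seq] = np.column_stack((long_intrinsic, short_intrinsic))
    return combined


# Made-up demo values: 3 residues, 2 score columns per mode.
demo_long = {"MTM": np.array([[0.9, 0.1], [0.8, 0.2], [0.7, 0.3]])}
demo_short = {"MTM": np.array([[0.5, 0.4], [0.6, 0.3], [0.7, 0.2]])}
Intrinsic_score = combine_intrinsic(demo_long, demo_short)
print(Intrinsic_score["MTM"].shape)
```

The loop iterates over the keys of the short-mode dict, so any sequence missing from the long-mode dict would still raise a KeyError.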

lbwfff commented on August 18, 2024

Hi, I ran into another problem in preprocess_features.py. Here is my error:

(camp) leelee@ubuntu-PowerEdge-T440:~/tools/CAMP/testforadjust$ python -u preprocess_features.py test.tsv 
test.tsv
num of peptides 3 pad_pep_len 50
seq_set 4 pad_prot_len 247
num of peptide ss 3 pad_pep_len 50
seq_ss_set 4 pad_prot_len 247
Traceback (most recent call last):
  File "preprocess_features.py", line 137, in <module>
    f = open(datafile)
NameError: name 'datafile' is not defined

I guess datafile here is equivalent to input_file, is that right? After changing datafile to input_file, I got the following error:

(camp) leelee@ubuntu-PowerEdge-T440:~/tools/CAMP/testforadjust$ python -u preprocess_features.py test.tsv 
test.tsv
num of peptides 3 pad_pep_len 50
seq_set 4 pad_prot_len 247
num of peptide ss 3 pad_pep_len 50
seq_ss_set 4 pad_prot_len 247
MTMDKSELVQKAKLAEQAERYDDMAAAMKAVTEQGHELSNEERNLLSVAYKNVVGARRSSWRVISSIEQKTERNEKKQQMGKEYREKIEAELQDICNDVLELLDKYLIPNATQPESKVFYLKMKGDYFRYLSEVASGDNKQTTVSNSQQAYQEAFEISKKEMQPTHPIRLGLALNFSVFYYEILNSPEKACSLAKTAFDEAIAELDTLNEESYKDSTLIMQLLRDNLTLWTSENQGDEGDAGEGEN
Traceback (most recent call last):
  File "preprocess_features.py", line 148, in <module>
    feature = label_seq_ss(pep_ss, pad_pep_len, seq_ss_set)
  File "preprocess_features.py", line 49, in label_seq_ss
    X[i] = res_ind[res]
TypeError: 'set' object has no attribute '__getitem__'

How can I solve this problem? I look forward to your reply.
Best wishes,
LeeLee

twopin commented on August 18, 2024

It seems that the key 'MTMDKSELVQKAKLAEQAERYDDMAAAMKAVTEQGHELSNEERNLLSVAYKNVVGARRSSWRVISSIEQKTERNEKKQQMGKEYREKIEAELQDICNDVLELLDKYLIPNATQPESKVFYLKMKGDYFRYLSEVASGDNKQTTVSNSQQAYQEAFEISKKEMQPTHPIRLGLALNFSVFYYEILNSPEKACSLAKTAFDEAIAELDTLNEESYKDSTLIMQLLRDNLTLWTSENQGDEGDAGEGEN' is not in the dict? Actually I'm not sure about that, you can use the debug function to check.

twopin commented on August 18, 2024

#12 (comment): Yes

lbwfff commented on August 18, 2024

So datafile refers to the test_filename in Data curation? The amino acid sequence is just something else I printed. I guess the error comes from this piece of code:

def label_seq_ss(line, pad_prot_len, res_ind):
	line = line.strip().split(',')
	X = np.zeros(pad_prot_len)
	for i ,res in enumerate(line[:pad_prot_len]):
		X[i] = res_ind[res]
	return X

		if pep_ss not in peptide_ss_feature_dict:
			print(pep_ss)
			print(pad_pep_len)
			print(seq_ss_set)
			feature = label_seq_ss(pep_ss, pad_pep_len, seq_ss_set)
			peptide_ss_feature_dict[pep_ss] = feature

The following are my pep_ss, pad_pep_len and seq_ss_set:

"XC,YC,IE,QC,NC,CC,PC,LC,GC"
50
set(['"MC,VC,DC,RH,EH,QH,LH,VH,QH,KH,AH,RH,LH,AH,EH,QH,AC,EC,RC,YH,DH,DH,MH,AH,AH,AH,MH,KH,NH,VH,TH,EC,LC,NC,EC,PC,LC,SC,NH,EH,EH,RH,NH,LH,LH,SH,VH,AH,YH,KH,NH,VH,VH,GH,AH,RH,RH,SH,SH,WH,RH,VH,IH,SH,SH,IH,EH,QH,KH,TC,SC,AC,DC,GC,NC,EH,KH,KH,IH,EH,MH,VH,RH,AH,YH,RH,EH,KH,IH,EH,KH,EH,LH,EH,AH,VH,CH,QH,DH,VH,LH,SH,LH,LH,DH,NH,YH,LH,IH,KH,NH,CC,SC,EC,TC,QC,YH,EH,SH,KH,VH,FH,YH,LH,KH,MH,KH,GH,DH,YH,YH,RH,YH,LH,AH,EH,VH,AC,TC,GC,EH,KH,RH,AH,TH,VH,VH,EH,SH,SH,EH,KH,AH,YH,SH,EH,AH,HH,EH,IH,SH,KH,EH,HH,MC,QC,PC,TC,HC,PH,IH,RH,LH,GH,LH,AH,LH,NH,YH,SH,VH,FH,YH,YH,EH,IH,QC,NC,AC,PH,EH,QH,AH,CH,HH,LH,AH,KH,TH,AH,FH,DH,DH,AH,IH,AH,EC,LH,DH,TH,LC,NC,EC,DC,SC,YH,KH,DH,SH,TH,LH,IH,MH,QH,LH,LH,RH,DH,NH,LH,TH,LH,WH,TC,SC,DC,QC,QC,DC,DC,DC,GC,GC,EC,GC,NC,NC"', '"MC,GC,DC,RH,EH,QH,LH,LH,QH,RH,AH,RH,LH,AH,EH,QH,AC,EC,RC,YH,DH,DH,MH,AH,SH,AH,MH,KH,AH,VH,TH,EH,LC,NC,EC,PC,LC,SC,NH,EH,DH,RH,NH,LH,LH,SH,VH,AH,YH,KH,NH,VH,VH,GH,AH,RH,RH,SH,SH,WH,RH,VH,IH,SH,SH,IH,EH,QH,KH,TC,MC,AC,DC,GC,NC,EH,KH,KH,LH,EH,KH,VH,KH,AH,YH,RH,EH,KH,IH,EH,KH,EH,LH,EH,TH,VH,CH,NH,DH,VH,LH,SH,LH,LH,DH,KH,FH,LH,IH,KC,NC,CC,NC,DC,FC,QC,YH,EH,SH,KH,VH,FH,YH,LH,KH,MH,KH,GH,DH,YH,YH,RH,YH,LH,AH,EH,VH,AC,SC,GC,EH,KH,KH,NH,SH,VH,VH,EH,AH,SH,EH,AH,AH,YH,KH,EH,AH,FH,EH,IH,SH,KH,EH,QH,MC,QC,PC,TC,HC,PH,IH,RH,LH,GH,LH,AH,LH,NH,FH,SH,VH,FH,YH,YH,EH,IH,QC,NC,AC,PH,EH,QH,AH,CH,LH,LH,AH,KH,QH,AH,FH,DH,DH,AH,IH,AH,EC,LH,DH,TH,LC,NC,EC,DC,SC,YH,KH,DH,SH,TH,LH,IH,MH,QH,LH,LH,RH,DH,NH,LH,TH,LH,WH,TC,SC,DC,QC,QC,DC,EC,EC,AC,GC,EC,GC,NC"', 
'"MC,DC,DC,RH,EH,DH,LH,VH,YH,QH,AH,KH,LH,AH,EH,QH,AC,EC,RC,YH,DH,EH,MH,VH,EH,SH,MH,KH,KH,VH,AH,GC,MC,DC,VC,EC,LC,TC,VH,EH,EH,RH,NH,LH,LH,SH,VH,AH,YH,KH,NH,VH,IH,GH,AH,RH,RH,AH,SH,WH,RH,IH,IH,SH,SH,IH,EH,QH,KH,EH,EC,NC,KC,GC,GC,EH,DH,KH,LH,KH,MH,IH,RH,EH,YH,RH,QH,MH,VH,EH,TH,EH,LH,KH,LH,IH,CH,CH,DH,IH,LH,DH,VH,LH,DH,KH,HH,LH,IH,PH,AH,AC,NC,TC,GH,EH,SH,KH,VH,FH,YH,YH,KH,MH,KH,GH,DH,YH,HH,RH,YH,LH,AH,EH,FH,AC,TC,GC,NH,DH,RH,KH,EH,AH,AH,EH,NH,SH,LH,VH,AH,YH,KH,AH,AH,SH,DH,IH,AH,MH,TH,EH,LC,PC,PC,TC,HC,PH,IH,RH,LH,GH,LH,AH,LH,NH,FH,SH,VH,FH,YH,YH,EH,IH,LC,NC,SC,PH,DH,RH,AH,CH,RH,LH,AH,KH,AH,AH,FH,DH,DH,AH,IH,AH,EC,LH,DH,TH,LC,SC,EC,EC,SC,YH,KH,DH,SH,TH,LH,IH,MH,QH,LH,LH,RH,DH,NH,LH,TH,LH,WH,TC,SC,DC,MC,QC,GC,DC,GC,EC,EC,QH,NC,KC,EH,AH,LH,QH,DC,VC,EC,DC,EC,NC,QC"', '"MC,TC,MC,DC,KH,SH,EH,LH,VH,QH,KH,AH,KH,LH,AH,EH,QH,AC,EC,RC,YH,DH,DH,MH,AH,AH,AH,MH,KH,AH,VH,TH,EH,QC,GC,HC,EC,LC,SC,NH,EH,EH,RH,NH,LH,LH,SH,VH,AH,YH,KH,NH,VH,VH,GH,AH,RH,RH,SH,SH,WH,RH,VH,IH,SH,SH,IH,EH,QH,KH,TC,EC,RC,NC,EC,KH,KH,QH,QH,MH,GH,KH,EH,YH,RH,EH,KH,IH,EH,AH,EH,LH,QH,DH,IH,CH,NH,DH,VH,LH,EH,LH,LH,DH,KH,YH,LH,IH,PH,NH,AC,TC,QC,PH,EH,SH,KH,VH,FH,YH,LH,KH,MH,KH,GH,DH,YH,FH,RH,YH,LH,SH,EH,VC,AC,SC,GC,DH,NH,KH,QH,TH,TH,VH,SH,NH,SH,QH,QH,AH,YH,QH,EH,AH,FH,EH,IH,SH,KH,KH,EH,MC,QC,PC,TC,HC,PH,IH,RH,LH,GH,LH,AH,LH,NH,FH,SH,VH,FH,YH,YH,EH,IH,LC,NC,SC,PH,EH,KH,AH,CH,SH,LH,AH,KH,TH,AH,FH,DH,EH,AH,IH,AH,EC,LH,DH,TH,LC,NC,EC,EC,SC,YH,KH,DH,SH,TH,LH,IH,MH,QH,LH,LH,RH,DH,NH,LH,TH,LH,WH,TC,SC,EC,NC,QC,GC,DC,EC,GC,DC,AC,GC,EC,GC,EC,NC"'])

twopin commented on August 18, 2024

I think this bug is caused by the variable name seq_ss_set being used twice. I have just fixed it and revised the script.
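
To illustrate the failure mode: label_seq_ss looks up res_ind[res], so it needs a mapping from token to integer, while a set does not support indexing. The sketch below keeps the lookup dict and the set of sequences under separate names. The names build_ss_index and seq_ss_index, the 1-based labels built from the observed tokens, and the stripping of the surrounding double quotes are all assumptions for illustration; the revised script may handle these differently.

```python
import numpy as np


def build_ss_index(tokens):
    """Map each residue/secondary-structure token to a 1-based label."""
    return {tok: i + 1 for i, tok in enumerate(sorted(set(tokens)))}


def label_seq_ss(line, pad_len, res_ind):
    """Encode a comma-separated token string as a zero-padded vector."""
    # Assumption: drop the literal double quotes seen in the printed data.
    tokens = line.strip().strip('"').split(",")
    X = np.zeros(pad_len)
    for i, tok in enumerate(tokens[:pad_len]):
        X[i] = res_ind[tok]  # dict lookup; a set here would fail
    return X


pep_ss = '"XC,YC,IE,QC,NC,CC,PC,LC,GC"'
seq_ss_set = {pep_ss}  # the set of sequences keeps its own name
seq_ss_index = build_ss_index(pep_ss.strip('"').split(","))  # lookup dict
feature = label_seq_ss(pep_ss, 50, seq_ss_index)
print(feature[:9])
```

Label 0 is left free for padding, so positions past the end of the peptide stay distinguishable from real tokens.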
