Comments (16)
Hi, twopin. I'm also having trouble with the format of the crawl_results.csv file used in data_prepare/query-mapping.py. Could you provide template data for this file? Thanks.
Hi, I also don't know the crawl_results.csv format. Have you solved the crawl_results.csv format problem? Could you please provide this file or a sample data format?
from camp.
I did not find any file that fits the description, but I found a file that contains the UniProt IDs and the protein families:
https://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/docs/similar
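For anyone who wants to turn that file into a lookup from UniProt ID to protein family, here is a minimal sketch. It assumes the file lists each family name on an unindented line, followed by indented lines of `ENTRY_NAME (ACCESSION)` pairs separated by commas; check this against the file you actually download. The function name `parse_similar` and the sample entries are made up for illustration and are not part of the CAMP scripts.

```python
import re
from collections import defaultdict

# Matches "ENTRY_NAME (ACCESSION)", e.g. "PROT1_HUMAN (P00001)".
MEMBER = re.compile(r"(\w+)\s+\(([A-Z0-9]+)\)")


def parse_similar(lines):
    """Map each UniProt accession to the family names it is listed under."""
    acc_to_families = defaultdict(list)
    family = None
    for line in lines:
        if not line.strip():
            continue
        if not line[0].isspace():
            # Assumption: an unindented line starts a new family block.
            family = line.strip()
        elif family is not None:
            # Indented lines hold the members of the current family.
            for _entry_name, accession in MEMBER.findall(line):
                acc_to_families[accession].append(family)
    return dict(acc_to_families)


# Made-up sample in the assumed layout (not real UniProt entries).
sample = """\
Example family A
     PROT1_HUMAN (P00001)    , PROT2_MOUSE (P00002)
Example family B
     PROT3_YEAST (P00003)
"""
mapping = parse_similar(sample.splitlines())
print(mapping["P00002"])  # prints ['Example family A']
```

With the real file you would pass an open file handle instead of `sample.splitlines()`. Header or footer lines that do not contain `ENTRY_NAME (ACCESSION)` pairs add nothing to the mapping.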
Hi, you can download the sequence files from https://www.uniprot.org/downloads.
Sorry, I did not save the intermediate file, but you can use your own data. The important part of the script begins at line 49. You can adjust the script to match your own data format.
Hi, twopin, thanks for your reply. When I visit https://www.uniprot.org/downloads, the page looks as shown in the screenshot. Which link should I click to get the uniprot2seq_file? After downloading the file, do I need to clean the data further to get the columns Uniprot_id, Uniprot Sequence, Protein_name, Protein_families?
Dear twopin, could you share exactly how to get the uniprot2seq_file from https://www.uniprot.org/help/downloads?
Which link on that page is the one to use?
Thanks in advance!
Did you finally get the download link? If so, could you share it with us? Thanks!
I've uploaded the exact data I downloaded when doing this project. The file was downloaded directly from UniProt, simply by selecting the FASTA file with all sequences. You can use mine directly or download the latest version.
The crawl_results.csv file is the output of the crawling script.
The UniProt website has been updated since 2020, and here is how to download now:
@twopin thanks for your reply.
Did you get the Protein_families from https://www.uniprot.org/uniprotkb?query=reviewed:false or from https://www.uniprot.org/help/downloads?
I think the downloaded FASTA file only has ProteinName, which is not the protein family name, right? Its header format is:
db|UniqueIdentifier|EntryName ProteinName OS=OrganismName OX=OrganismIdentifier [GN=GeneName ]PE=ProteinExistence SV=SequenceVersion
We opened this issue to ask where to get the protein family for a given UniProt ID. Or do you treat ProteinName as the protein family name and filter out an entry if its ProteinName contains "MHC"?
We found protein family names in the file below.
https://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/docs/similar
BTW, could you tell us whether the paper uses the reviewed UniProt (sp) sequences or the unreviewed sequences? Your picture shows the unreviewed sequences, while your uploaded file and code seem to use the reviewed (sp) sequences.
Thanks for sharing!
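To make the header format above concrete, here is a minimal sketch that reads a UniProt FASTA file into records with the Uniprot_id, Protein_name and Uniprot Sequence columns discussed in this thread. The function name `read_uniprot_fasta` and the sample entry are made up for illustration; this is not the CAMP preprocessing code. The `sp` prefix marks reviewed (Swiss-Prot) entries and `tr` marks unreviewed (TrEMBL) entries. The header carries no protein family, so that column has to come from another source.

```python
import re

# >db|UniqueIdentifier|EntryName ProteinName OS=OrganismName OX=... [GN=...] PE=... SV=...
HEADER = re.compile(r"^>(sp|tr)\|([^|]+)\|(\S+)\s+(.*?)\s+OS=")


def read_uniprot_fasta(lines):
    """Yield one dict per FASTA entry: accession, name, reviewed flag, sequence."""
    record = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if record is not None:
                yield record
            match = HEADER.match(line)
            if match is None:
                raise ValueError(f"unexpected header: {line}")
            db, accession, _entry_name, protein_name = match.groups()
            record = {
                "Uniprot_id": accession,
                "Protein_name": protein_name,
                "reviewed": db == "sp",
                "Uniprot Sequence": "",
            }
        elif line and record is not None:
            # Sequence lines are wrapped, so join them back together.
            record["Uniprot Sequence"] += line
    if record is not None:
        yield record


# Made-up entry in the documented header format (not a real protein).
sample = """\
>sp|P00000|EXMP_HUMAN Example protein OS=Homo sapiens OX=9606 GN=EXMP PE=1 SV=1
MKTAYIAK
QRQISFVK
"""
records = list(read_uniprot_fasta(sample.splitlines()))
print(records[0]["Uniprot_id"], records[0]["Uniprot Sequence"])
```

The `reviewed` flag is one way to check whether a downloaded file contains Swiss-Prot or TrEMBL entries.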
Oh, actually you don't need the protein family information for CAMP. If you do need it for downstream analysis, you can customize the columns on UniProt when downloading the sequences.
please see my reply below.
@twopin, in your paper and your shared code, you filtered out protein sequences that belong to the MHC protein family. (●'◡'●)
Oh, you mean filtering... I thought you were talking about the scripts in this Git repo. You can get the protein family from UniProt. First, prepare your list of UniProt IDs, click "ID mapping", load your list, and click "map IDs". When the job has finished, click the job ID and you will see a big table. Then click "customize columns" (figures below) and select protein families. You will get what you want.
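Once the table is exported with the protein family column, the MHC filtering could be sketched as below. This assumes a tab-separated export with columns named "Entry" and "Protein families", and it treats any family annotation containing the substring "MHC" as an MHC protein; both are assumptions, so adjust them to your own export. The function name `drop_mhc` and the sample rows are made up, and this is not necessarily how the paper's filtering was implemented.

```python
import csv
import io


def drop_mhc(tsv_file, family_column="Protein families"):
    """Return the rows whose family annotation does not mention MHC."""
    reader = csv.DictReader(tsv_file, delimiter="\t")
    # Rows with an empty family annotation are kept.
    return [row for row in reader if "MHC" not in (row.get(family_column) or "")]


# Made-up rows standing in for a UniProt TSV export.
sample = io.StringIO(
    "Entry\tProtein families\n"
    "P00001\tMHC class I family\n"
    "P00002\tExample kinase family\n"
    "P00003\t\n"
)
kept = drop_mhc(sample)
print([row["Entry"] for row in kept])  # prints ['P00002', 'P00003']
```

For a real export, open the file with `open(path, newline="")` and pass the handle to `drop_mhc`.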