
Comments (16)

zhouruikang1024 avatar zhouruikang1024 commented on September 28, 2024 2

Hi, twopin. I am also having trouble with the format of the file crawl_results.csv used in data_prepare/query-mapping.py. Could you provide template data for this file? Thanks.

Hi, I also don't know the crawl_results.csv format. Have you solved the crawl_results.csv format problem? Could you please provide this file or a sample data format?

from camp.

Yiqiu-Zhang avatar Yiqiu-Zhang commented on September 28, 2024 2

I did not find any file that fits the description, but I found a file that contains the UniProt IDs and the protein families:
https://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/docs/similar


milkboylyf avatar milkboylyf commented on September 28, 2024

Hi, twopin. I am also having trouble with the format of the file crawl_results.csv used in data_prepare/query-mapping.py. Could you provide template data for this file? Thanks.


twopin avatar twopin commented on September 28, 2024

Hi, you can download the sequence files from https://www.uniprot.org/downloads.


twopin avatar twopin commented on September 28, 2024

Sorry I did not save the intermediate file but you can use your own data. The important part of the script begins from line 49. You can adjust the script according to your own data format.


milkboylyf avatar milkboylyf commented on September 28, 2024

Hi, you can download the sequence files from https://www.uniprot.org/downloads.

Hi, twopin, thanks for your reply. When I visit https://www.uniprot.org/downloads, the page looks as shown below. Which link should I click to get the file uniprot2seq_file? After downloading the file, do I need to clean the data further to get the columns Uniprot_id, Uniprot Sequence, Protein_name, Protein_families?
image

image


rocke2020 avatar rocke2020 commented on September 28, 2024

Dear twopin, could you tell us exactly how to get the uniprot2seq_file from https://www.uniprot.org/help/downloads?
Which link on https://www.uniprot.org/help/downloads is the right one?
Thanks in advance!


rocke2020 avatar rocke2020 commented on September 28, 2024

Hi, you can download the sequence files from https://www.uniprot.org/downloads.

Hi, twopin, thanks for your reply. When I visit https://www.uniprot.org/downloads, the page looks as shown below. Which link should I click to get the file uniprot2seq_file? After downloading the file, do I need to clean the data further to get the columns Uniprot_id, Uniprot Sequence, Protein_name, Protein_families?
image

image

Did you eventually find the download link? If so, could you share it with us? Thanks!


twopin avatar twopin commented on September 28, 2024

I've uploaded the exact data I downloaded when doing this project. The file was downloaded directly from UniProt by just selecting all of the FASTA sequences. You can use mine directly or download the latest version.


twopin avatar twopin commented on September 28, 2024

Hi, twopin. I am also having trouble with the format of the file crawl_results.csv used in data_prepare/query-mapping.py. Could you provide template data for this file? Thanks.

This file is the output of the crawling script.


twopin avatar twopin commented on September 28, 2024

The UniProt website has been updated since 2020. Here is how to download the data now:
image
image


rocke2020 avatar rocke2020 commented on September 28, 2024

@twopin thanks for your reply.
Did you get the Protein_families from
https://www.uniprot.org/uniprotkb?query=reviewed:false
or from https://www.uniprot.org/help/downloads?

I think the downloaded FASTA file only has ProteinName, which is not the protein family name, right?
We opened this issue to ask you where to get the protein family for each UniProt ID.
Or do you treat ProteinName as the protein family name and filter out an entry if its ProteinName contains "MHC"?
We found protein family names at the link below.
https://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/docs/similar

db|UniqueIdentifier|EntryName ProteinName OS=OrganismName OX=OrganismIdentifier [GN=GeneName ]PE=ProteinExistence SV=SequenceVersion
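
For anyone who wants to turn the downloaded FASTA file into the Uniprot_id, Uniprot Sequence and Protein_name columns mentioned earlier in this thread, here is a minimal Python sketch. It only assumes the header layout quoted above, with a leading ">" on each header line. The function name parse_uniprot_fasta and the sample records are made up for illustration and are not part of the CAMP scripts. The header carries no protein family, so Protein_families would have to come from another source.

```python
import re
from typing import Dict, Iterable, Iterator, List, Optional

# Header layout quoted in this thread:
# >db|UniqueIdentifier|EntryName ProteinName OS=OrganismName OX=...
HEADER_RE = re.compile(
    r"^>(?P<db>\w+)\|(?P<uid>[^|]+)\|(?P<entry>\S+)\s+(?P<name>.*?)\s+OS="
)


def parse_uniprot_fasta(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per FASTA record (hypothetical helper, not from CAMP)."""
    record: Optional[Dict[str, str]] = None
    seq_parts: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if record is not None:
                record["Uniprot Sequence"] = "".join(seq_parts)
                yield record
            match = HEADER_RE.match(line)
            if match is None:
                raise ValueError(f"Unrecognised FASTA header: {line}")
            record = {
                "Uniprot_id": match["uid"],
                "Protein_name": match["name"],
            }
            seq_parts = []
        elif record is not None:
            # Sequence lines are wrapped, so collect and join them.
            seq_parts.append(line)
    if record is not None:
        record["Uniprot Sequence"] = "".join(seq_parts)
        yield record


# Made-up sample records, only to show the call.
sample = [
    ">sp|X00001|DEMO_HUMAN Demo protein OS=Homo sapiens OX=9606 GN=DEMO PE=1 SV=1",
    "MKTAY",
    "IAKQR",
    ">tr|X00002|DEMO2_MOUSE Other demo OS=Mus musculus OX=10090 PE=4 SV=1",
    "GGG",
]
rows = list(parse_uniprot_fasta(sample))
```

With a real file, the same function can be called on an open file handle, for example `parse_uniprot_fasta(open(path))`, where `path` points at the FASTA file you downloaded.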

BTW, could you tell us whether the paper uses the reviewed UniProt (sp) sequences or the unreviewed ones? Your screenshot shows the unreviewed sequences, while your uploaded data and code seem to use the reviewed (sp) sequences.
Thanks for sharing!


twopin avatar twopin commented on September 28, 2024

Oh, actually you don't need the protein family information for CAMP, but if you need it for downstream analysis, you can customize the columns on UniProt when downloading the sequences.
image
image


twopin avatar twopin commented on September 28, 2024

@twopin thanks for your reply. Did you get the Protein_families from https://www.uniprot.org/uniprotkb?query=reviewed:false or from https://www.uniprot.org/help/downloads?

I think the downloaded FASTA file only has ProteinName, which is not the protein family name, right? We opened this issue to ask you where to get the protein family for each UniProt ID. Or do you treat ProteinName as the protein family name and filter out an entry if its ProteinName contains "MHC"? We found protein family names at the link below. https://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/docs/similar

db|UniqueIdentifier|EntryName ProteinName OS=OrganismName OX=OrganismIdentifier [GN=GeneName ]PE=ProteinExistence SV=SequenceVersion

BTW, could you tell us whether the paper uses the reviewed UniProt (sp) sequences or the unreviewed ones? Your screenshot shows the unreviewed sequences, while your uploaded data and code seem to use the reviewed (sp) sequences. Thanks for sharing!

Please see my reply below.


rocke2020 avatar rocke2020 commented on September 28, 2024

Oh, actually you don't need the protein family information for CAMP, but if you need it for downstream analysis, you can customize the columns on UniProt when downloading the sequences.

@twopin, in your paper and your shared code, you filtered out protein sequences that belong to the MHC protein family. (●'◡'●)


twopin avatar twopin commented on September 28, 2024

Oh, actually you don't need the protein family information for CAMP, but if you need it for downstream analysis, you can customize the columns on UniProt when downloading the sequences.

@twopin, in your paper and your shared code, you filtered out protein sequences that belong to the MHC protein family. (●'◡'●)

Oh, you mean the filtering... I thought you were talking about the scripts in this Git repo. You can get the protein family from UniProt. First, take your list of UniProt IDs, click "ID mapping", load your list, and click "Map IDs". When the job has finished, click the job ID and you will see a large table. Then click "Customize columns" (figures below) and select "Protein families". You will get what you want.
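
Once that table has been downloaded as a tab-separated file, the MHC filtering discussed above could be sketched in Python roughly as follows. This is not the code from the CAMP repository. The column headers "Entry" and "Protein families" are assumptions about how the export is labelled, so check them against your own file. The helper names read_uniprot_tsv and drop_family and the sample rows are made up for illustration.

```python
import csv
import io
from typing import Dict, List, TextIO


def read_uniprot_tsv(handle: TextIO) -> List[Dict[str, str]]:
    """Read a tab-separated export into a list of row dictionaries."""
    return list(csv.DictReader(handle, delimiter="\t"))


def drop_family(
    rows: List[Dict[str, str]],
    keyword: str = "MHC",
    family_column: str = "Protein families",  # assumed column header
) -> List[Dict[str, str]]:
    """Keep only rows whose family annotation does not mention `keyword`."""
    needle = keyword.lower()
    return [
        row
        for row in rows
        # Rows with an empty or missing family are kept.
        if needle not in (row.get(family_column) or "").lower()
    ]


# Made-up rows standing in for a real export.
demo = io.StringIO(
    "Entry\tProtein families\n"
    "X00001\tMHC class I family\n"
    "X00002\tDemo kinase family\n"
    "X00003\t\n"
)
kept = drop_family(read_uniprot_tsv(demo))
```

The match is a plain case-insensitive substring test, which is a simplification. If a family name merely contains the letters "mhc" for an unrelated reason it would also be dropped, so a stricter pattern may be needed for real data.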

