genescf / genescf
Gene Set Clustering based on Functional annotation
Hi, I am trying to prepare the database for Oryza species using the following commands.
./prepare_database -db=GO_all -org=dosa (RAPD)
./prepare_database -db=GO_all -org=osa (Refseq)
But the database output files are empty, and I am not sure why. It looks like the databases are not present at the link it is trying to download from. Kindly help me sort out this issue.
./prepare_database -db=GO_all -org=osa
Downloading GO database....
Extracting osa information...
gzip: /home/navi/software/GeneSCF/class/lib/db/osa/gene_association.osa.gz: unexpected end of file
cat: /home/navi/software/GeneSCF/class/lib/db/osa/gene_association.osa: No such file or directory
cat: /home/navi/software/GeneSCF/class/lib/db/osa/gene_association.osa: No such file or directory
Updating gene information...
Do not panic. The processing is going on...
gzip: /home/navi/software/GeneSCF/class/lib/db/gene_info.gz: invalid compressed data--format violated
cat: /home/navi/software/GeneSCF/class/lib/db/gene_info: No such file or directory
rm: cannot remove '/home/navi/software/GeneSCF/class/lib/db/gene_info': No such file or directory
rm: cannot remove '/home/navi/software/GeneSCF/class/lib/db/osa/gene_association.osa': No such file or directory
Database retreived..You are now ready to use geneSCF with organism osa from --database GO
Done....2022. 12. 08. (목) 08:59:36 KST
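The "unexpected end of file" message from gzip usually means the downloaded archive is incomplete. A minimal sketch for checking this before rerunning the tool is shown below. The helper name `check_gz` is hypothetical and not part of GeneSCF; the path is copied from the log above and will differ per install.

```shell
# Report whether a .gz file is complete.
# `gzip -t` only tests the archive; it does not extract anything.
check_gz() {
  f="$1"
  if [ ! -s "$f" ]; then
    echo "missing-or-empty"
  elif gzip -t "$f" 2>/dev/null; then
    echo "ok"
  else
    echo "corrupt"
  fi
}

# Path taken from the log above; adjust to your install location.
check_gz "/home/navi/software/GeneSCF/class/lib/db/osa/gene_association.osa.gz"
```

If the result is "corrupt" or "missing-or-empty", the download itself failed, which would explain the empty database files regardless of the final "Database retreived" message.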
I ran it in Cygwin and it says it completed successfully, but no output files are produced.
"$ ./geneSCF -i=./141120-Eb_publication_cnvs-Entrez.txt -t=gid -db=REACTOME -o=eb_publication_cnvs
Parameters used:
background genes: 30000
Identitiy: Entrez GeneID
Database used: REACTOME
Output file: eb_publication_cnvs141120-Eb_publication_cnvs-Entrez.txt_REACTOME_functional_classification.tsv
WARNING: Your output is not sorted with P-val/FDR.
Author: Santhilal Subhash
[email protected]
Last Updated: 2014 June 14
Fri, Nov 21, 2014 11:57:58 AM finished processing
rsicko@HR17423 ~/geneSCF-master
$ which eb_publication_cnvs141120-Eb_publication_cnvs-Entrez.txt_REACTOME_functional_classification.tsv
which: no eb_publication_cnvs141120-Eb_publication_cnvs-Entrez.txt_REACTOME_functional_classification.tsv in (/usr/local/bin:/usr/bin:/cygdrive/c/Program Files/ImageMagick-6.8.9-Q16:/cygdrive/c/Perl/site/bin:/cygdrive/c/Perl/bin:/cygdrive/c/Windows/system32:/cygdrive/c/Windows:/cygdrive/c/Windows/System32/Wbem:/cygdrive/c/Program Files (x86)/Common Files/Roxio Shared/DLLShared:/cygdrive/c/Program Files (x86)/Common Files/Roxio Shared/10.0/DLLShared:/cygdrive/c/Windows/System32/WindowsPowerShell/v1.0:/cygdrive/c/Program Files (x86)/Lotus/Notes:/cygdrive/c/Windows/System32/WindowsPowerShell/v1.0:/cygdrive/c/Program Files (x86)/MATLAB/MATLAB Compiler Runtime/v714/runtime/win32:/cygdrive/c/Program Files/SlikSvn/bin:/cygdrive/c/Applied Biosystems/PeakScanner)"
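Note that `which` only searches the directories in PATH for executables, so it cannot locate a data file. A small sketch for locating the result table instead is shown below. The helper name `find_results` is hypothetical; the file suffix is taken from the "Output file:" line in the log above.

```shell
# List GeneSCF result tables under a directory (default: current directory).
# The suffix comes from the "Output file:" line printed by geneSCF.
find_results() {
  find "${1:-.}" -type f -name '*_functional_classification.tsv'
}

# Search from the directory where geneSCF was run.
find_results .
```

The log also suggests that the value given to -o was used as a plain prefix of the file name (`eb_publication_cnvs` followed directly by the input name), so the file may sit in the working directory rather than in a separate output folder.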
Hi Lal and others,
I tried to use geneSCF for KEGG analysis and executed the command as described in the docs. The gene list is a CSV file with gene names in rows. It works for GO_all but not for KEGG.
Steps I additionally tried:
a. Created a directory manually.
b. Ran chmod 777 so that the program can create and write within the directories.
I still end up with this error.
Does anyone have a workaround? I can send you the gene list if you would like.
I tried to update the GO database, but it failed.
Command line: ./prepare_database -db=GO_all -org=goa_human
The error is:
gzip: //geneSCF-master-source-v1.1-p2/class/lib/db/goa_human/gene_association.goa_human.gz: unexpected end of file
cat: //geneSCF-master-source-v1.1-p2/class/lib/db/goa_human/gene_association.goa_human: No such file or directory
cat: //geneSCF-master-source-v1.1-p2/class/lib/db/goa_human/gene_association.goa_human: No such file or directory
Updating gene information...
Do not panic. The processing is going on...
rm: cannot remove '//geneSCF-master-source-v1.1-p2/class/lib/db/goa_human/gene_association.goa_human': No such file or directory
Database retreived..You are now ready to use geneSCF with organism goa_human from --database GO
Done....
The annotation files, like GO_all_gid.txt, are blank (0 bytes).
I am using Ubuntu 16.04. Updating the KEGG database works fine. Does anyone know how to fix this?
Thanks a lot.
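To see at a glance which database files came out empty after an update, something like the following can help. This is a sketch only; `list_empty` is a hypothetical helper, and the directory is the relative path seen in the log above.

```shell
# Print every zero-byte regular file below a directory.
list_empty() {
  if [ ! -d "$1" ]; then
    echo "no such directory: $1" >&2
    return 1
  fi
  find "$1" -type f -size 0
}

# Run from the GeneSCF install directory; path as shown in the log above.
list_empty "class/lib/db/goa_human" || true
```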
Hi, thank you for this excellent library, I've used it successfully for several years. Today I tried the "-m=update" command with GO_BP and goa_human, e.g.:
./GeneSCF-1.1-p3.beta/geneSCF -m=update -i="./GeneSCF-1.1-p3.beta/io/xxxxx.csv" -o="./GeneSCF-1.1-p3.beta/io/" -t=gid -db=GO_BP -bg=20000 --plot=no, -org=goa_human
But this seemingly failed to unpack the files correctly. It produced a gzip error:
/GeneSCF-1.1-p3.beta/class/lib/db/goa_human/gene_association.goa_human.gz: unexpected end of file
and yielded empty files (checked with ls -l).
I tried using prepare_database but received the same result. My workaround was a fresh install with the -m=normal parameter, which ran successfully. Could you please investigate and advise? Thank you.
I have the same problem as in #9.
I performed GO analysis on fish DEGs (DEGs.txt).
The database used was zebrafish (zfin).
./prepare_database -db=GO_BP -org=zfin
Head of DEGs.txt:
MROH1
TMEM67
PLAAT4
ENDOD1
FAM111A
The following is the output:
geneSCF -m=normal -i=DEGs.txt -o=test/output/ -t=sym -db=GO_BP -bg=20000 --plot=yes -org=zfin
=> Finished retriving database...
=> Calculating statistics...
Note:Only KEGG and Geneontology supports multiple organisms (GeneSCF-xx/org_codes_help). If you choose REACTOME/NCG database please specify organism as 'Hs'. Currently REACTOME and NCG in GeneSCF only supports Human (Hs).
GO_BP last updated 2021-06-07 18:30
Illegal division by zero at geneSCF-master-source-v1.1-p2/class/lib/List/Vectorize/lib/List.pl line 599, <IN2> chunk 1.
I changed the organism to human (goa_human) and it ran successfully.
geneSCF -m=normal -i=DEGs.txt -o=test/output/ -t=sym -db=GO_BP -bg=20000 --plot=yes -org=goa_human
=> Finished retriving database...
=> Calculating statistics...
Note:Only KEGG and Geneontology supports multiple organisms (GeneSCF-xx/org_codes_help). If you choose REACTOME/NCG database please specify organism as 'Hs'. Currently REACTOME and NCG in GeneSCF only supports Human (Hs).
=================
Run successful. Check your output directory test/output/
=================
Parameters used:
Organism: Human/Homo sapiens
background genes: 20000
Identitiy: Gene Symbol
Database used: GO_BP
Output file: test/output/DEGs.txt_GO_BP_functional_classification.tsv
WARNING: Your output is not sorted with P-val/FDR.
---------------------
Author: Santhilal Subhash
[email protected]
GeneSCF elapse time 3 seconds
Is there a problem with the zfin database?
Hi,
Thanks for the tool.
I tried to prepare the database using the following code:
perl prepare_database -db=GO_all -org=goa_uniprot
But then I get the following errors:
rm: cannot remove '/XX/XX/Desktop/geneSCF-master-source-v1.1-p2/class/lib/db/gene_info_limit.gz': No such file or directory
Downloading GO database....
Extracting goa_uniprot information...
gzip: /XX/XX/Desktop/geneSCF-master-source-v1.1-p2/class/lib/db/goa_uniprot/gene_association.goa_uniprot.gz: unexpected end of file
cat: /XX/XX/Desktop/geneSCF-master-source-v1.1-p2/class/lib/db/goa_uniprot/gene_association.goa_uniprot: No such file or directory
cat: /XX/XX/Desktop/geneSCF-master-source-v1.1-p2/class/lib/db/goa_uniprot/gene_association.goa_uniprot: No such file or directory
rm: cannot remove '/XX/XX/Desktop/geneSCF-master-source-v1.1-p2/class/lib/db/goa_uniprot/gene_association.goa_uniprot': No such file or directory
Database retreived..You are now ready to use geneSCF with organism goa_uniprot from --database GO
Afterwards, I tried to run the test data using this DB, but it keeps giving the same error.
Thanks
Dear Author,
I urgently need GeneSCF results but I cannot get it to work. I am working on E. coli MG1655, and when I prepared the database with ecocyc I realized that the file format was wrong (wrong order of information, a "t" instead of a "~", and other issues). I then decided to reconstruct a GO_BP_sym.txt from UniProt (see the attachment below). But I still get errors after running your script. I have pasted the message from the terminal window in case it helps.
Many thanks for your help!
Nicolas GINET, research scientist at CNRS
[GO_BP_sym.txt](https://github.com/genescf/GeneSCF/files/9657541/GO_BP_sym.txt)
root@lemale:/media/pc-ma-le/DATA/Bioinfo/GeneSCF# ./geneSCF -m=normal -i=Input/BACCHATposFC4.list -o=Output/ -t=sym -db=GO_BP -bg=4000 --plot=yes -org=eco
processing in 'normal' mode started....mardi 27 septembre 2022, 17:26:48 (UTC+0200)
=> Finished retriving database...
=> Calculating statistics...
Note:Only KEGG and Geneontology supports multiple organisms (GeneSCF-xx/org_codes_help). If you choose REACTOME/NCG database please specify organism as 'Hs'. Currently REACTOME and NCG in GeneSCF only supports Human (Hs).
GO_BP last updated 2022-09-27 16:54
Illegal division by zero at /media/pc-ma-le/DATA/Bioinfo/GeneSCF/class/lib/List/Vectorize/lib/List.pl line 599, chunk 1.
mardi 27 septembre 2022, 17:26:48 (UTC+0200) finished processing
root@lemale:/media/pc-ma-le/DATA/Bioinfo/GeneSCF#
I believe I have ggplot2 installed in my global environment; however, I don't see any bubble plots among my output files when I run my commands. Am I misunderstanding the installation requirements?
Command line: ./geneSCF -m=normal -i=/Users/.../.../geneID_list.txt -t=gid -o=simple1_CSF.txt -db=GO_all -p=yes -bg=24850 -org=mgi
Error:
processing in 'normal' mode started....Fri 9 Nov 2018 17:31:39 EST
=> Finished retriving database...
=> Calculating statistics...
find: -printf: unknown primary or operator
Note:Only KEGG and Geneontology supports multiple organisms (GeneSCF-xx/org_codes_help). If you choose REACTOME/NCG database please specify organism as 'Hs'. Currently REACTOME and NCG in GeneSCF only supports Human (Hs).
GO_all last updated
Illegal division by zero at /Users/.../.../geneSCF-master-source-v1.1-p2/class/lib/List/Vectorize/lib/List.pl line 599, chunk 1.
Fri 9 Nov 2018 17:31:40 EST finished processing
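The line "find: -printf: unknown primary or operator" in this log is what BSD find (the default on macOS) prints when it is given the GNU-only `-printf` option; the empty "GO_all last updated" line that follows is consistent with that call failing. A quick probe, as a sketch with a hypothetical helper name:

```shell
# Return success if the `find` on PATH accepts the GNU-only -printf option.
find_supports_printf() {
  find . -maxdepth 0 -printf '' >/dev/null 2>&1
}

if find_supports_printf; then
  echo "find supports -printf"
else
  echo "find lacks -printf (likely BSD/macOS find)"
fi
```

Whether putting a GNU find ahead of the system one on PATH is enough for GeneSCF is an assumption that has not been verified here.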
Hi,
I've just tried to run geneSCF on my own data and got the following error message:
Illegal division by zero at //class/lib/List/Vectorize/lib/List.pl line 599, chunk 1
Any suggestion?
Thank you in advance,
Tony
Hi there,
I'm trying to run geneSCF but get an error after downloading the relevant database and trying to run the go enrichment.
I first use the command:
./prepare_database -db=GO_BP -org=tair
and then run go enrichment with:
./geneSCF -m=normal -i=../1.ATtest -db=GO_BP -org=tair -o=1.go -t=gid --plot=no --background=27462
but get the following error:
GO_BP last updated 2021-01-15 15:58
Illegal division by zero at /ohta/julia.kreiner/waterhemp/data/fixed_assembly/reveal_psuedoassembly/toshare/permuted_outliers/geneSCF-master-source-v1.1-p2/class/lib/List/Vectorize/lib/List.pl line 599, <IN2> chunk 1.
Fri Jan 15 16:05:14 EST 2021 finished processing
Thanks for your help.
Hello,
I have been trying to use GeneSCF v1.1 and have faced issues from the beginning. Now, I can easily run the test provided by the package but I cannot open my own gene list file.
Initially I created this file as a text file using Excel on Windows, or by saving it from R as a text file. I noticed the problem of CRLF line endings on Windows versus LF line endings on Linux, so I made my list using Gedit, and I also converted my original list with dos2unix to get a Unix-formatted text file. But I am still unable to open my input file, and I get the following error whether I use update mode or normal mode after preparing the database.
/GSF/geneSCF-master-source-v1.1-p2# ./geneSCF -m=normal -i=test/DEG.list -o=/GSF/output/ -t=sym -db=GO_MF -bg=20000 --plot=yes -org=mgi
processing in 'normal' mode started....Sat Oct 12 16:45:41 EDT 2019
=> Finished retriving database...
=> Calculating statistics...
Note:Only KEGG and Geneontology supports multiple organisms (GeneSCF-xx/org_codes_help). If you choose REACTOME/NCG database please specify organism as 'Hs'. Currently REACTOME and NCG in GeneSCF only supports Human (Hs).
GO_MF last updated 2019-10-12 16:20
Error opening input file: test/DEG.list
Illegal division by zero at /root/GSF/geneSCF-master-source-v1.1-p2/class/lib/List/Vectorize/lib/List.pl line 599.
Sat Oct 12 16:45:42 EDT 2019 finished processing
I should also add that I use Ubuntu on a Windows machine.
Can anyone help me figure out the problem?
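Since the log says "Error opening input file: test/DEG.list", it may be worth confirming both that the path resolves from the directory where geneSCF is run and that no carriage returns remain. A minimal sketch follows; `check_input` and `strip_cr` are hypothetical helpers, not GeneSCF commands.

```shell
# Classify an input list: unreadable, still has Windows line endings, or ok.
check_input() {
  if [ ! -r "$1" ]; then
    echo "unreadable"
  elif grep -q "$(printf '\r')" "$1"; then
    echo "crlf"
  else
    echo "ok"
  fi
}

# Write a copy of the file to stdout with carriage returns removed.
strip_cr() {
  tr -d '\r' < "$1"
}

# Relative path from the command above; it is resolved against the
# current working directory.
check_input "test/DEG.list"
```

"unreadable" here would point to a path or permission problem rather than to line endings.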
Hi all,
I'm trying to run a gene ontology analysis on a set of differentially expressed genes of Prunus persica. To do so, I provide the program with a list in a .txt file containing all the Prupe IDs from my analysis and run the following command:
[CODE]
./geneSCF -m=update -i=GEnes_Go.txt -t=sym -o=/Go_Prupe -db=KEGG -p=yes -org=pper
[/CODE]
and I get the following:
[CODE]
Since you have selected 'update' mode. It will take a while to prepare new updated database
Connecting remote RUD..
processing started....mar ago 30 11:13:14 CLST 2016
Retreiving 129 KEGG pathways for pper
Do not panic. The processing is going on...
Database retreived..You are now ready to use geneSCF with organism pper from --database KEGG
Done....mar ago 30 11:20:23 CLST 2016
=>processing in update started....mar ago 30 11:20:23 CLST 2016
=> Finished retriving database...
=> Calculating statistics...
find: «pper/class/lib/db/yes/kegg_database.txt»: No existe el archivo o el directorio
Note:Only KEGG and Geneontology supports multiple organisms (GeneSCF-xx/org_codes_help). If you choose REACTOME/NCG database please specify organism as 'Hs'. Currently REACTOME and NCG in GeneSCF only supports Human (Hs).
KEGG last updated
gid | sym
=> Retreving gene list for yes from KEGG
sh: 1: cannot create pper/mapping/DB/GEnes_Go.txt_gene_list.txt: Directory nonexistent
=> Mapping user list
Can't open perl script "pper/class/scripts/mappingIDS.pl": No existe el archivo o el directorio
sh: 1: cannot create /Go_Prupe/GEnes_Go.txt_user_mapped.list: Directory nonexistent
cat: /Go_Prupe/GEnes_Go.txt_user_mapped.list: No existe el archivo o el directorio
sh: 1: cannot create pper/mapping/GEnes_Go.txt_input_list.txt: Directory nonexistent
wc: pper/mapping/GEnes_Go.txt_input_list.txt: No existe el archivo o el directorio
Note: There were genes mapped from 324 user provided unique genes (0 %)
Please cross-check your gene identifier.mar ago 30 11:20:25 CLST 2016 finished processing
[/CODE]
I don't know where to look for the missing files. Please help.
Hi,
I downloaded the software and got a file called downloads.php.
I'm not sure how to open or run the .php file.
There is something I don't understand.
I see that there are 362 genes in Gene Ontology annotated to GO:0006936, but the data downloaded with GeneSCF has a total of 104 genes. I guess only direct annotation data is downloaded. I'm wondering how I can download all the data from Gene Ontology, not only the direct annotations.
Otherwise, why should the data be used in this way?
When I ran geneSCF using this command, it gave me this error message.
Command:
./geneSCF -m=normal -i=test/annotation.tsv -o=test/output/ -t=sym -db=GO_MF -bg=20000 --plot=yes -org=goa_human
Message:
processing in 'normal' mode started....Sun May 24 08:41:35 PDT 2020
=> Finished retriving database...
=> Calculating statistics...
Note:Only KEGG and Geneontology supports multiple organisms (GeneSCF-xx/org_codes_help). If you choose REACTOME/NCG database please specify organism as 'Hs'. Currently REACTOME and NCG in GeneSCF only supports Human (Hs).
GO_MF last updated 2017-07-07 14:04
nohup: redirecting stderr to stdout
Run successful. Check your output directory test/output/
Parameters used:
Organism: Human/Homo sapiens
background genes: 20000
Identitiy: Gene Symbol
Database used: GO_MF
Output file: test/output/annotation.tsv_GO_MF_functional_classification.tsv
WARNING: Your output is not sorted with P-val/FDR.
Author: Santhilal Subhash
[email protected]
GeneSCF elapse time 6 seconds
Sun May 24 08:41:41 PDT 2020 finished processing
Furthermore, I saw in your previous post that such an error could be due to the database not being updated, so I tried using this command to update the database.
Command: ./prepare_database -db=[GO_all|GO_BP|GO_MF|GO_CC|KEGG|REACTOME] -org=[hsa]
Message:
KEGG: command not found
GO_CC: command not found
REACTOME]: command not found
GO_MF: command not found
GO_BP: command not found
Please help! Thank you so much for your time
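The "command not found" messages arise because the shell treats each unquoted `|` as a pipe, so `GO_BP`, `GO_MF` and the rest are run as separate commands. In usage text, `[A|B|C]` is notation for "choose one value". A dry-run sketch that prints one call per GO database is shown below; `print_prepare_cmds` is a hypothetical helper, and `goa_human` is the GO organism code used elsewhere on this page.

```shell
# Print (not run) one prepare_database call per GO sub-database.
print_prepare_cmds() {
  org="$1"
  for db in GO_BP GO_MF GO_CC; do
    echo "./prepare_database -db=$db -org=$org"
  done
}

print_prepare_cmds goa_human
```

Copying one of the printed lines and running it from the GeneSCF directory passes a single database and a single organism, with no brackets or pipes.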
Hello and thank you for creating GeneSCF. We are using it with great results, but when we try the NCG database it doesn't work on the latest version. I have reviewed the code and I see it has been disabled, but this is not stated in the documentation. Is it possible to make it work?
Best,
Luis.
Hi, and thanks for creating GeneSCF,
We are using GeneSCF to annotate genes with the KEGG database. However, we encountered an issue where 0 genes can be mapped, as shown in the attached screenshot. We guess there is a problem retrieving the KEGG database, as prepare_database retrieves 0 pathways and the kegg_database.txt in the class/lib/db/hsa folder is empty. Could you please tell us how to make it work?
Best,
Zhenwen