genescf / genescf

Gene Set Clustering based on Functional annotation

Perl 94.26% Shell 3.98% R 1.76%

gene enrichment-analysis gsea geneontology kegg-pathway reactome-pathway cancer real-time up-to-date download-pathways download-ontology functional-enrichment functional-enrichment-analysis gene-set-enrichment gene-set-clustering gene-set-analysis enrichment-testing

genescf's Issues

Preparing Database for Rice (ORYZA)

Hi, I am trying to prepare the database for Oryza species using the following commands:
./prepare_database -db=GO_all -org=dosa (RAPD)
./prepare_database -db=GO_all -org=osa (Refseq)

But the database output files are empty, and I am not sure why. It looks like the databases are not present at the link it is trying to download from. Kindly help me sort out this issue.

./prepare_database -db=GO_all -org=osa

Downloading GO database....
Extracting osa information...

gzip: /home/navi/software/GeneSCF/class/lib/db/osa/gene_association.osa.gz: unexpected end of file
cat: /home/navi/software/GeneSCF/class/lib/db/osa/gene_association.osa: No such file or directory
cat: /home/navi/software/GeneSCF/class/lib/db/osa/gene_association.osa: No such file or directory
Updating gene information...
Do not panic. The processing is going on...

gzip: /home/navi/software/GeneSCF/class/lib/db/gene_info.gz: invalid compressed data--format violated
cat: /home/navi/software/GeneSCF/class/lib/db/gene_info: No such file or directory
rm: cannot remove '/home/navi/software/GeneSCF/class/lib/db/gene_info': No such file or directory
rm: cannot remove '/home/navi/software/GeneSCF/class/lib/db/osa/gene_association.osa': No such file or directory
Database retreived..You are now ready to use geneSCF with organism osa from --database GO
Done....2022. 12. 08. (목) 08:59:36 KST
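
A note on the log above: gzip's "unexpected end of file" and "invalid compressed data" messages usually mean the downloaded archive is truncated or is not a gzip file at all, and the script still prints "Database retreived" afterwards. The following is a minimal sketch for checking a downloaded archive before trusting that message. `check_gz` is a hypothetical helper, not part of GeneSCF, and the example path is simply the one from the log.

```shell
# check_gz is a hypothetical helper, not part of GeneSCF.
# Prints "ok" if the file exists, is non-empty and passes gzip's integrity test;
# otherwise prints "missing", "empty" or "corrupt".
check_gz() {
    if [ ! -e "$1" ]; then
        echo "missing"
    elif [ ! -s "$1" ]; then
        echo "empty"
    elif gzip -t "$1" 2>/dev/null; then
        echo "ok"
    else
        echo "corrupt"
    fi
}

# Example (path taken from the log above):
# check_gz /home/navi/software/GeneSCF/class/lib/db/osa/gene_association.osa.gz
```

If this reports "corrupt" or "empty", the problem is most likely on the download side rather than in the extraction step.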

No output file - cygwin

I ran geneSCF in Cygwin. It reports that it completed successfully, but no output files appear.

"$ ./geneSCF -i=./141120-Eb_publication_cnvs-Entrez.txt -t=gid -db=REACTOME -o=eb_publication_cnvs

processing started....Fri, Nov 21, 2014 11:57:55 AM

Run successful. Check your output directory eb_publication_cnvs

Parameters used:

background genes: 30000
Identitiy: Entrez GeneID
Database used: REACTOME
Output file: eb_publication_cnvs141120-Eb_publication_cnvs-Entrez.txt_REACTOME_functional_classification.tsv
WARNING: Your output is not sorted with P-val/FDR.

Author: Santhilal Subhash
[email protected]
Last Updated: 2014 June 14
Fri, Nov 21, 2014 11:57:58 AM finished processing

rsicko@HR17423 ~/geneSCF-master
$ which eb_publication_cnvs141120-Eb_publication_cnvs-Entrez.txt_REACTOME_functional_classification.tsv
which: no eb_publication_cnvs141120-Eb_publication_cnvs-Entrez.txt_REACTOME_functional_classification.tsv in (/usr/local/bin:/usr/bin:/cygdrive/c/Program Files/ImageMagick-6.8.9-Q16:/cygdrive/c/Perl/site/bin:/cygdrive/c/Perl/bin:/cygdrive/c/Windows/system32:/cygdrive/c/Windows:/cygdrive/c/Windows/System32/Wbem:/cygdrive/c/Program Files (x86)/Common Files/Roxio Shared/DLLShared:/cygdrive/c/Program Files (x86)/Common Files/Roxio Shared/10.0/DLLShared:/cygdrive/c/Windows/System32/WindowsPowerShell/v1.0:/cygdrive/c/Program Files (x86)/Lotus/Notes:/cygdrive/c/Windows/System32/WindowsPowerShell/v1.0:/cygdrive/c/Program Files (x86)/MATLAB/MATLAB Compiler Runtime/v714/runtime/win32:/cygdrive/c/Program Files/SlikSvn/bin:/cygdrive/c/Applied Biosystems/PeakScanner)"
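
Two things stand out in the session above. First, `which` only searches the directories in PATH for executables, so it cannot locate a data file. Second, the "Output file:" line shows the `-o` value glued directly onto the file name, which suggests the value is used as a plain prefix when it lacks a trailing slash. A minimal sketch for locating the result table instead, assuming it was written somewhere below the directory geneSCF was started from (`find_results` is a hypothetical helper name):

```shell
# find_results is a hypothetical helper: list GeneSCF result tables under a directory.
# The file-name suffix is taken from the "Output file:" line in the log above.
find_results() {
    find "${1:-.}" -type f -name '*_functional_classification.tsv'
}

# Example: search the directory geneSCF was started from.
# find_results .
```

If this prints nothing, the file was probably never written, and the "Run successful" message should not be trusted on its own.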

CLI for KEGG analysis getting stopped with errors

Hi Lal and others,

I tried to use geneSCF for KEGG analysis and ran the command as described in the docs. The gene list is a CSV file with one gene name per row. It works for GO_all but not for KEGG.

Additional steps I tried:
a. Created the output directory manually.
b. Ran chmod 777 on it so that the program can create and write within directories.

I still end up with this error.

Does anyone have a workaround? I can send you the gene list if you would like.

unable to update GO database files

I tried to update GO database, but failed.
command line: ./prepare_database -db=GO_all -org=goa_human

The error is
gzip: //geneSCF-master-source-v1.1-p2/class/lib/db/goa_human/gene_association.goa_human.gz: unexpected end of file
cat: //geneSCF-master-source-v1.1-p2/class/lib/db/goa_human/gene_association.goa_human: No such file or directory
cat: //geneSCF-master-source-v1.1-p2/class/lib/db/goa_human/gene_association.goa_human: No such file or directory
Updating gene information...
Do not panic. The processing is going on...
rm: cannot remove '//geneSCF-master-source-v1.1-p2/class/lib/db/goa_human/gene_association.goa_human': No such file or directory
Database retreived..You are now ready to use geneSCF with organism goa_human from --database GO
Done....

The annotation files, such as GO_all_gid.txt, are blank (0 bytes).
I am running this on Ubuntu 16.04. Updating the KEGG database works fine. Does anyone know how to fix this?
Thanks a lot.
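
Since the script reports "Database retreived" even when the annotation files end up at 0 bytes, it may help to list the empty files explicitly after each update. A minimal sketch, assuming the database lives under `class/lib/db/` as the paths in the log suggest (`list_empty_db_files` is a hypothetical helper name):

```shell
# list_empty_db_files is a hypothetical helper: print every zero-byte file
# below the given directory. "-size 0" matches only files with no content.
list_empty_db_files() {
    find "$1" -type f -size 0
}

# Example (directory layout as seen in the log above):
# list_empty_db_files class/lib/db/goa_human
```

An empty result means every file in that folder has at least some content. It does not prove the content is valid.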

using "-m=update" yields empty database files?

Hi, thank you for this excellent library; I've used it successfully for several years. Today I tried the "-m=update" option with GO_BP and goa_human, e.g.:

./GeneSCF-1.1-p3.beta/geneSCF -m=update -i="./GeneSCF-1.1-p3.beta/io/xxxxx.csv" -o="./GeneSCF-1.1-p3.beta/io/" -t=gid -db=GO_BP -bg=20000 --plot=no, -org=goa_human

But this seemingly failed to unpack the files correctly. It produced a gzip error:

/GeneSCF-1.1-p3.beta/class/lib/db/goa_human/gene_association.goa_human.gz: unexpected end of file

and yielded empty files (ls -l):

I tried using /prepare_database but received the same result. My workaround was a fresh install with the -m=normal parameter, which returned successfully. Could you please investigate and advise? Thank you.

Doesn't work if I change the database

I have the same problem as in #9.
I performed GO analysis on fish DEGs (DEGs.txt).
The database I used was zebrafish (zfin).

./prepare_database -db=GO_BP -org=zfin

Head of DEGs.txt:

MROH1
TMEM67
PLAAT4
ENDOD1
FAM111A

The following is the output result.

geneSCF -m=normal -i=DEGs.txt -o=test/output/ -t=sym -db=GO_BP -bg=20000 --plot=yes -org=zfin

=> Finished retriving database...
=> Calculating statistics...
Note:Only KEGG and Geneontology supports multiple organisms (GeneSCF-xx/org_codes_help). If you choose REACTOME/NCG database please specify organism as 'Hs'. Currently REACTOME and NCG in GeneSCF only supports Human (Hs).
GO_BP last updated 2021-06-07 18:30
Illegal division by zero at geneSCF-master-source-v1.1-p2/class/lib/List/Vectorize/lib/List.pl line 599, <IN2> chunk 1.

I changed the organism to human (goa_human) and it ran successfully.

geneSCF -m=normal -i=DEGs.txt -o=test/output/ -t=sym -db=GO_BP -bg=20000 --plot=yes -org=goa_human

=> Finished retriving database...
=> Calculating statistics...
Note:Only KEGG and Geneontology supports multiple organisms (GeneSCF-xx/org_codes_help). If you choose REACTOME/NCG database please specify organism as 'Hs'. Currently REACTOME and NCG in GeneSCF only supports Human (Hs).

=================
Run successful. Check your output directory test/output/ 
=================

Parameters used:

Organism:		Human/Homo sapiens
background genes:	20000
Identitiy:		Gene Symbol
Database used:		GO_BP
Output file:		test/output/DEGs.txt_GO_BP_functional_classification.tsv
		WARNING: Your output is not sorted with P-val/FDR.


---------------------

Author: Santhilal Subhash
[email protected]
GeneSCF elapse time 3 seconds

Is there a problem with the zfin database?
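
One possible explanation, offered as a guess rather than a diagnosis: the symbols in DEGs.txt are written in upper case, as is usual for human genes, while zebrafish symbols are conventionally lower case, so perhaps none of the input genes match the zfin table. The sketch below counts how many input identifiers occur in an annotation table, with and without case sensitivity. `count_hits` is a hypothetical helper, and the table path in the example is an assumption based on file names mentioned elsewhere on this page.

```shell
# count_hits LIST DBFILE [-i]
# Hypothetical helper: count how many identifiers from LIST occur as whole
# words anywhere in DBFILE. Pass -i as third argument to ignore case.
count_hits() {
    hits=0
    while IFS= read -r id || [ -n "$id" ]; do
        [ -n "$id" ] || continue            # skip blank lines
        # $3 is left unquoted on purpose so that it vanishes when not given
        if grep -q -w -F $3 -- "$id" "$2"; then
            hits=$((hits + 1))
        fi
    done < "$1"
    echo "$hits"
}

# Example (the table path is an assumption):
# count_hits DEGs.txt class/lib/db/zfin/GO_BP_sym.txt
# count_hits DEGs.txt class/lib/db/zfin/GO_BP_sym.txt -i
```

If the first call prints 0 and the second prints a positive number, converting the input symbols to the organism's own naming convention would be the next thing to try.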

GO uniprot database not working

Hi,
Thanks for the tool.
I tried to prepare the database using the following code:
perl prepare_database -db=GO_all -org=goa_uniprot

But then I get the following errors:
rm: cannot remove '/XX/XX/Desktop/geneSCF-master-source-v1.1-p2/class/lib/db/gene_info_limit.gz': No such file or directory
Downloading GO database....
Extracting goa_uniprot information...

gzip: /XX/XX/Desktop/geneSCF-master-source-v1.1-p2/class/lib/db/goa_uniprot/gene_association.goa_uniprot.gz: unexpected end of file
cat: /XX/XX/Desktop/geneSCF-master-source-v1.1-p2/class/lib/db/goa_uniprot/gene_association.goa_uniprot: No such file or directory
cat: /XX/XX/Desktop/geneSCF-master-source-v1.1-p2/class/lib/db/goa_uniprot/gene_association.goa_uniprot: No such file or directory

rm: cannot remove '/XX/XX/Desktop/geneSCF-master-source-v1.1-p2/class/lib/db/goa_uniprot/gene_association.goa_uniprot': No such file or directory
Database retreived..You are now ready to use geneSCF with organism goa_uniprot from --database GO

Afterwards, I tried to run the test data using this DB, but it keeps giving the same error.

Thanks

Cannot run GeneSCF with a locally constructed database for E. coli

Dear Author,

I am desperately in need of GeneSCF results but I cannot manage to make it work. I am working on E. coli MG1655, and when I prepared the database with ecocyc I realized that the file format was wrong (wrong order of information, a "t" instead of a "~", and other problems). I then decided to reconstruct a GO_BP_sym.txt from UniProt (see the attachment below). But I still get errors after running your script. I have pasted the message from the terminal window in case it helps.
Many thanks for your help!
Nicolas GINET, research scientist at CNRS
[GO_BP_sym.txt](https://github.com/genescf/GeneSCF/files/9657541/GO_BP_sym.txt)

root@lemale:/media/pc-ma-le/DATA/Bioinfo/GeneSCF# ./geneSCF -m=normal -i=Input/BACCHATposFC4.list -o=Output/ -t=sym -db=GO_BP -bg=4000 --plot=yes -org=eco
processing in 'normal' mode started....mardi 27 septembre 2022, 17:26:48 (UTC+0200)
=> Finished retriving database...
=> Calculating statistics...
Note:Only KEGG and Geneontology supports multiple organisms (GeneSCF-xx/org_codes_help). If you choose REACTOME/NCG database please specify organism as 'Hs'. Currently REACTOME and NCG in GeneSCF only supports Human (Hs).
GO_BP last updated 2022-09-27 16:54
Illegal division by zero at /media/pc-ma-le/DATA/Bioinfo/GeneSCF/class/lib/List/Vectorize/lib/List.pl line 599, chunk 1.
mardi 27 septembre 2022, 17:26:48 (UTC+0200) finished processing
root@lemale:/media/pc-ma-le/DATA/Bioinfo/GeneSCF#

'ggplot2' issue

I believe I have ggplot2 installed in my global environment; however, I don't see any bubble plots among my output files when I run my commands. Am I misunderstanding the installation requirements?

Error: Illegal division by zero at List.pl line 599, <IN2> chunk 1

command line: ./geneSCF -m=normal -i=/Users/.../.../geneID_list.txt -t=gid -o=simple1_CSF.txt -db=GO_all -p=yes -bg=24850 -org=mgi

ERROR:
processing in 'normal' mode started....Fri 9 Nov 2018 17:31:39 EST
=> Finished retriving database...
=> Calculating statistics...
find: -printf: unknown primary or operator
Note:Only KEGG and Geneontology supports multiple organisms (GeneSCF-xx/org_codes_help). If you choose REACTOME/NCG database please specify organism as 'Hs'. Currently REACTOME and NCG in GeneSCF only supports Human (Hs).
GO_all last updated
Illegal division by zero at /Users/.../.../geneSCF-master-source-v1.1-p2/class/lib/List/Vectorize/lib/List.pl line 599, chunk 1.
Fri 9 Nov 2018 17:31:40 EST finished processing
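
The line `find: -printf: unknown primary or operator` is what BSD find (the default on macOS) prints when given `-printf`, which is a GNU extension. That would explain why "GO_all last updated" is followed by nothing. Whether it is related to the division by zero cannot be told from this log. One option is to install GNU findutils. Another is a portable way of reading a file's modification time, sketched below (`mtime` is a hypothetical helper, not something GeneSCF provides):

```shell
# mtime is a hypothetical helper: print a file's modification time.
# Tries the GNU coreutils syntax first, then falls back to the BSD/macOS syntax.
mtime() {
    stat -c '%y' "$1" 2>/dev/null || stat -f '%Sm' "$1"
}

# Example:
# mtime class/lib/db/mgi/GO_all_gid.txt
```

The path in the example is an assumption about where the mgi table is stored.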

Error: Illegal division by zero at //List.pl line 599, <IN2> chunk 1

Hi,

I've just tried to run geneSCF on my own data and got the following error message:
Illegal division by zero at //class/lib/List/Vectorize/lib/List.pl line 599, chunk 1
Any suggestion?
Thank you in advance,

Tony

Illegal division by zero at /geneSCF-master-source-v1.1-p2/class/lib/List/Vectorize/lib/List.pl line 599, <IN2> chunk 1.

Hi there,

I'm trying to run geneSCF but get an error after downloading the relevant database and running the GO enrichment.

I first use the command:
./prepare_database -db=GO_BP -org=tair
and then run go enrichment with:
./geneSCF -m=normal -i=../1.ATtest -db=GO_BP -org=tair -o=1.go -t=gid --plot=no --background=27462

but get the following error:

GO_BP last updated 2021-01-15 15:58
Illegal division by zero at /ohta/julia.kreiner/waterhemp/data/fixed_assembly/reveal_psuedoassembly/toshare/permuted_outliers/geneSCF-master-source-v1.1-p2/class/lib/List/Vectorize/lib/List.pl line 599, <IN2> chunk 1.
Fri Jan 15 16:05:14 EST 2021 finished processing

Thanks for your help.

GeneSCF - Error opening input file and Illegal division by zero

Hello,

I have been trying to use GeneSCF v1.1 and have faced issues from the beginning. I can now run the test provided with the package, but I cannot open my own gene list file.
Initially I created this file as a text file using Excel on Windows, or by saving it from R as a text file. I noticed the problem of CRLF line endings on Windows versus LF endings on Linux, so I recreated my list using gedit and also converted my original list with dos2unix to get a Unix-formatted text file. Still, the input file cannot be opened, and I get the following error whether I use update mode or normal mode after preparing the database.

~~/GSF/geneSCF-master-source-v1.1-p2# ./geneSCF -m=normal -i=test/DEG.list -o=~~/GSF/output/ -t=sym -db=GO_MF -bg=20000 --plot=yes -org=mgi
processing in 'normal' mode started....Sat Oct 12 16:45:41 EDT 2019
=> Finished retriving database...
=> Calculating statistics...
Note:Only KEGG and Geneontology supports multiple organisms (GeneSCF-xx/org_codes_help). If you choose REACTOME/NCG database please specify organism as 'Hs'. Currently REACTOME and NCG in GeneSCF only supports Human (Hs).
GO_MF last updated 2019-10-12 16:20

Error opening input file: test/DEG.list

Illegal division by zero at /root/GSF/geneSCF-master-source-v1.1-p2/class/lib/List/Vectorize/lib/List.pl line 599.
Sat Oct 12 16:45:42 EDT 2019 finished processing

I should add that I use Ubuntu on a Windows machine.

Can anyone help me figure out the problem?
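
Before digging further, it may be worth confirming two things from the same directory the command is run in: that the relative path `test/DEG.list` actually resolves to a readable file, and that no carriage returns are left in it. A minimal sketch (`check_input` is a hypothetical helper name):

```shell
# check_input is a hypothetical helper.
# Prints "unreadable" if the file cannot be read from the current directory,
# "crlf" if it still contains carriage returns, and "ok" otherwise.
check_input() {
    if [ ! -r "$1" ]; then
        echo "unreadable"
    elif grep -q "$(printf '\r')" "$1"; then
        echo "crlf"
    else
        echo "ok"
    fi
}

# Example, run from the directory where ./geneSCF is started:
# check_input test/DEG.list
```

"unreadable" here would match the "Error opening input file" message, in which case the division by zero may simply follow from an empty input.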

Unable to start geneSCF

Hi all,
I'm trying to run a gene ontology analysis on a set of differentially expressed genes of Prunus persica. To do so, I give the program a .txt file listing all the Prupe genes from my analysis and run the following command line:
[CODE]
./geneSCF -m=update -i=GEnes_Go.txt -t=sym -o=/Go_Prupe -db=KEGG -p=yes -org=pper
[/CODE]

and I get the following

[CODE]
Since you have selected 'update' mode. It will take a while to prepare new updated database
Connecting remote RUD..
processing started....mar ago 30 11:13:14 CLST 2016
Retreiving 129 KEGG pathways for pper
Do not panic. The processing is going on...
Database retreived..You are now ready to use geneSCF with organism pper from --database KEGG
Done....mar ago 30 11:20:23 CLST 2016
=>processing in update started....mar ago 30 11:20:23 CLST 2016
=> Finished retriving database...
=> Calculating statistics...
find: «pper/class/lib/db/yes/kegg_database.txt»: No existe el archivo o el directorio
Note:Only KEGG and Geneontology supports multiple organisms (GeneSCF-xx/org_codes_help). If you choose REACTOME/NCG database please specify organism as 'Hs'. Currently REACTOME and NCG in GeneSCF only supports Human (Hs).
KEGG last updated

Example input types

gid | sym
=> Retreving gene list for yes from KEGG
sh: 1: cannot create pper/mapping/DB/GEnes_Go.txt_gene_list.txt: Directory nonexistent
=> Mapping user list
Can't open perl script "pper/class/scripts/mappingIDS.pl": No existe el archivo o el directorio
sh: 1: cannot create /Go_Prupe/GEnes_Go.txt_user_mapped.list: Directory nonexistent
cat: /Go_Prupe/GEnes_Go.txt_user_mapped.list: No existe el archivo o el directorio
sh: 1: cannot create pper/mapping/GEnes_Go.txt_input_list.txt: Directory nonexistent
wc: pper/mapping/GEnes_Go.txt_input_list.txt: No existe el archivo o el directorio
Note: There were genes mapped from 324 user provided unique genes (0 %)
Please cross-check your gene identifier.mar ago 30 11:20:25 CLST 2016 finished processing
[/CODE]

I don't know where to look for the missing files. Please help.
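
Reading the log, `-o=/Go_Prupe` points at a directory directly under the filesystem root, and the "Directory nonexistent" messages say it does not exist. The paths beginning with `pper/` and containing `db/yes/` also suggest that the option values were picked up in a different order than intended. The other commands on this page pass `-bg` and `--plot` before `-org`, though that is only a guess at the cause. The sketch below covers just the output directory: it creates the directory and returns it with a trailing slash (`prepare_outdir` is a hypothetical helper name).

```shell
# prepare_outdir is a hypothetical helper: create the output directory if it
# is missing and print its path with exactly one trailing slash.
prepare_outdir() {
    dir=${1%/}                  # drop one trailing slash, if present
    mkdir -p "$dir" || return 1
    echo "$dir/"
}

# Example: a directory relative to the current one instead of /Go_Prupe
# out=$(prepare_outdir ./Go_Prupe)
```

The resulting value of `out` can then be passed as `-o="$out"`.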

run the .php file

Hi,
I downloaded the software and got a file named downloads.php.
I'm not sure how to open or run the .php file.

About the database

There is something I don't understand.
Gene Ontology lists 362 genes under GO:0006936, but the data downloaded with GeneSCF has a total of 104 genes. I guess only direct annotations are downloaded. How can I download all the data from Gene Ontology, not only the direct annotations?
Or is there a reason I should use the data in this way?

Unable to run geneSCF on linux

When I run geneSCF using this command, it gives me this error message.
Command:
./geneSCF -m=normal -i=test/annotation.tsv -o=test/output/ -t=sym -db=GO_MF -bg=20000 --plot=yes -org=goa_human
Message:
processing in 'normal' mode started....Sun May 24 08:41:35 PDT 2020
=> Finished retriving database...
=> Calculating statistics...
Note:Only KEGG and Geneontology supports multiple organisms (GeneSCF-xx/org_codes_help). If you choose REACTOME/NCG database please specify organism as 'Hs'. Currently REACTOME and NCG in GeneSCF only supports Human (Hs).
GO_MF last updated 2017-07-07 14:04
nohup: redirecting stderr to stdout

Run successful. Check your output directory test/output/

Parameters used:

Organism: Human/Homo sapiens
background genes: 20000
Identitiy: Gene Symbol
Database used: GO_MF
Output file: test/output/annotation.tsv_GO_MF_functional_classification.tsv
WARNING: Your output is not sorted with P-val/FDR.

Author: Santhilal Subhash
[email protected]
GeneSCF elapse time 6 seconds
Sun May 24 08:41:41 PDT 2020 finished processing

Furthermore, I saw in your previous post that such an error could be due to the database not being updated, so I tried using this command to update the database.
Command: ./prepare_database -db=[GO_all|GO_BP|GO_MF|GO_CC|KEGG|REACTOME] -org=[hsa]
Message:
KEGG: command not found
GO_CC: command not found
REACTOME]: command not found
GO_MF: command not found
GO_BP: command not found

Please help! Thank you so much for your time
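
The "command not found" lines come from the shell, not from GeneSCF. In the usage string, the brackets and `|` characters list the alternatives to choose from, but when typed literally the shell treats each `|` as a pipe and tries to run `GO_BP`, `GO_MF` and so on as separate commands. The sketch below picks a single database name and builds the command from it. `build_prepare_cmd` is a hypothetical helper, and the list of names is copied from the usage string above.

```shell
# build_prepare_cmd DB ORG
# Hypothetical helper: accept exactly one database name from the usage string
# and print the resulting prepare_database command.
build_prepare_cmd() {
    case "$1" in
        GO_all|GO_BP|GO_MF|GO_CC|KEGG|REACTOME)
            echo "./prepare_database -db=$1 -org=$2"
            ;;
        *)
            echo "unknown database: $1" >&2
            return 1
            ;;
    esac
}

# Example, matching the database and organism of the run shown above:
# build_prepare_cmd GO_MF goa_human
```

The printed command can be run as is once it looks right.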

NCG database not working

Hello and thank you for creating GeneSCF. We are using it with great results, but when we try the NCG database it doesn't work on the latest version. I have reviewed the code and I see it has been disabled, but this is not stated in the documentation. Is it possible to make it work?

Best,
Luis.

KEGG hsa database retrieving issue

Hi, and thanks for creating GeneSCF,

We are using GeneSCF to annotate genes with the KEGG database. However, we ran into an issue where 0 genes are mapped, as shown in the attached screenshot. We suspect a problem in retrieving the KEGG database, as prepare_database retrieves 0 pathways and the kegg_database.txt in the class/lib/db/hsa folder is empty. Could you please tell us how to make it work?

Best,
Zhenwen