sunpengchuan / wgdi

WGDI: A user-friendly toolkit for evolutionary analyses of whole-genome duplications and ancestral karyotypes

Home Page: https://wgdi.readthedocs.io/en/latest/

License: BSD 2-Clause "Simplified" License

Python 100.00%

bioinformatics collinearity polyploidy ancestral-chromosomal-karyotype

wgdi's Issues

Error when run wgdi alignment

Dear Sun,

WGDI is a wonderful tool for genomic evolution analysis, but I ran into trouble when running wgdi -a.
Here is the STDERR output from this command:

gff1  =  BhD.wgdi.gff
gff2  =  Bdis.wgdi.gff
lens1  =  BhD.wgdi.lens
lens2  =  Bdis.wgdi.lens
genome1_name  =  BhD
genome2_name  =  Bdis
markersize  =  0.5
position  =  order
colors  =  red
figsize  =  10,10
savefile  =  BhD-Bdis.alignment.csv
savefig  =  BhD-Bdis.alignment.png
blockinfo  =  BhD-Bdis.blockinfo.list.csv
classid  =  class1
ks_area  =  -1,0.3
blockinfo_reverse  =  false
Traceback (most recent call last):
  File "/data/00/user/user112/.local/bin//wgdi", line 11, in <module>
    sys.exit(main())
  File "/data/00/user/user112/.local/lib/python3.6/site-packages/wgdi/run.py", line 148, in main
    module_to_run(arg, value)
  File "/data/00/user/user112/.local/lib/python3.6/site-packages/wgdi/run.py", line 110, in module_to_run
    run_subprogram(program, conf, name)
  File "/data/00/user/user112/.local/lib/python3.6/site-packages/wgdi/run.py", line 78, in run_subprogram
    r.run()
  File "/data/00/user/user112/.local/lib/python3.6/site-packages/wgdi/align_dotplot.py", line 88, in run
    gff2 = base.gene_location(gff2, lens2, step2, self.position)
  File "/data/00/user/user112/.local/lib/python3.6/site-packages/wgdi/base.py", line 208, in gene_location
    gff.loc[:, 'loc'] = ''
  File "/data/00/user/user112/.local/lib/python3.6/site-packages/pandas/core/indexing.py", line 670, in __setitem__
    iloc._setitem_with_indexer(indexer, value)
  File "/data/00/user/user112/.local/lib/python3.6/site-packages/pandas/core/indexing.py", line 1589, in _setitem_with_indexer
    "cannot set a frame with no "
ValueError: cannot set a frame with no defined index and a scalar

All the files needed for this step are attached.
I don't know anything about pandas or Python, so I can't fix this problem myself. I would be very grateful if you could tell me what is wrong and how to solve it.

Nemo Wu
wgdi_nemowu.zip
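
A possible first check for the traceback above: pandas raises "cannot set a frame with no defined index and a scalar" when the table being assigned to is empty, so the gene table may have lost all its rows before `gff.loc[:, 'loc'] = ''`. One way this could happen is if no chromosome name in the gff file matches a name in the lens file. This is a guess from the traceback, not a confirmed diagnosis. The sketch below assumes tab-separated files with the chromosome name in the first column; the helper names and the sample data are hypothetical.

```python
import io

def chromosome_names(handle):
    """Collect the first tab-separated field of every non-empty line."""
    names = set()
    for line in handle:
        line = line.rstrip("\r\n")
        if line.strip():
            names.add(line.split("\t")[0])
    return names

def unmatched_chromosomes(gff_handle, lens_handle):
    """Return (names only in lens, names only in gff), both sorted."""
    gff_names = chromosome_names(gff_handle)
    lens_names = chromosome_names(lens_handle)
    return sorted(lens_names - gff_names), sorted(gff_names - lens_names)

# Hypothetical sample data: the gff uses "Bd1", the lens file uses "1".
gff_text = (
    "Bd1\tBd1g0001\t100\t900\t+\t1\toriginal1\n"
    "Bd2\tBd2g0001\t50\t700\t-\t1\toriginal2\n"
)
lens_text = "1\t75000000\t9000\n2\t59000000\t7000\n"

only_lens, only_gff = unmatched_chromosomes(
    io.StringIO(gff_text), io.StringIO(lens_text)
)
print("in lens but not in gff:", only_lens)
print("in gff but not in lens:", only_gff)
```

If both lists are non-empty for the real files, renaming the chromosomes consistently in one of them would be the thing to try first.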

How much time does 'wgdi -icl total.conf' need for vvi161s?

As above.

No "-bi" result file generation

Hello. @SunPengChuan

Just for this one comparison, I ran the comparison for both species but the result file generated with the "-bi" parameter is empty. I carefully checked the .conf file but no problem was found. The comparison of the two species by themselves and the comparison of the two species with any other species produces the results normally. So I don't really understand what is causing this. Can you take a look for me?

Ro_cac.conf_et_al.zip
Ro_cac.cds.zip
Ro_cac.pep.zip

PeaksFit do not show two peaks

Hi there,
I have the following pattern showing two peaks.

However, when I try to plot the PeaksFit, I have got this:

How to get two (or more) sets of Gaussian curves?

I was not able to follow the solution found in: #14

wgdi -icl error

When using wgdi, I encountered the following problem. Can you help me with it? Thanks.

gff1 = mgly.gff
gff2 = tyun.chr.gene.gff
lens1 = mgly.len
lens2 = tyun.len
blast = tyun_mgly.txt
blast_reverse = false
multiple = 1
process = 20
evalue = 1e-5
score = 100
grading = 50,40,25
mg = 25,25
pvalue = 1
repeat_number = 2
positon = order
savefile = tyun_mgly.collinearity.txt
Traceback (most recent call last):
File "/home/appl/anaconda3/envs/wgdi/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3621, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 136, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 163, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 2131, in pandas._libs.hashtable.Int64HashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 2140, in pandas._libs.hashtable.Int64HashTable.get_item
KeyError: 2

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/appl/anaconda3/envs/wgdi/bin/wgdi", line 10, in
sys.exit(main())
File "/home/appl/anaconda3/envs/wgdi/lib/python3.10/site-packages/wgdi/run.py", line 158, in main
module_to_run(arg, value)
File "/home/appl/anaconda3/envs/wgdi/lib/python3.10/site-packages/wgdi/run.py", line 118, in module_to_run
run_subprogram(program, conf, name)
File "/home/appl/anaconda3/envs/wgdi/lib/python3.10/site-packages/wgdi/run.py", line 84, in run_subprogram
r.run()
File "/home/appl/anaconda3/envs/wgdi/lib/python3.10/site-packages/wgdi/run_colliearity.py", line 65, in run
lens2 = base.newlens(self.lens2, 'order')
File "/home/appl/anaconda3/envs/wgdi/lib/python3.10/site-packages/wgdi/base.py", line 187, in newlens
lens = lens[2]
File "/home/appl/anaconda3/envs/wgdi/lib/python3.10/site-packages/pandas/core/frame.py", line 3505, in getitem
indexer = self.columns.get_loc(key)
File "/home/appl/anaconda3/envs/wgdi/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3623, in get_loc
raise KeyError(key) from err
KeyError: 2
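
The failing statement is `lens = lens[2]`, so the table read from the lens file appears to have fewer than three columns. A common reason would be fields separated by spaces instead of tabs, although that is an assumption about how the file is read, not something the traceback proves. The hypothetical helper below lists the lines of a lens file that do not split into three tab-separated fields.

```python
import io

def check_lens(handle, expected_fields=3):
    """Return (line_number, field_count, line) for lines that do not have
    the expected number of tab-separated fields."""
    problems = []
    for number, line in enumerate(handle, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue
        fields = line.split("\t")
        if len(fields) != expected_fields:
            problems.append((number, len(fields), line))
    return problems

# Hypothetical sample: the second line uses spaces instead of tabs.
sample = "1\t34000000\t4200\n2 29000000 3900\n"
problems = check_lens(io.StringIO(sample))
for number, count, line in problems:
    print(f"line {number}: {count} tab-separated field(s): {line!r}")
```

For a real file, `open("tyun.len")` can be passed in place of the `io.StringIO` object.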

ks result is empty

Hello
I prepared total.conf as follows, but the result is empty. Could you please advise?

[ks]
cds_file = /scratch/project_mnt/S0030/wgdi/male-female/male.cds.fa
pep_file = /scratch/project_mnt/S0030/wgdi/male-female/male.pep.fa
align software = muscle
pairs_file = /scratch/project_mnt/S0030/wgdi/male-female/collinearity_file
ks_file = ks_result

The collinearity file is from the improved collinearity step.

Also, how should I prepare the files for two species?

ERROR: inconsistency between the following pep and nuc seqs

When I run wgdi -ks, I get the following error:

#---  ERROR: inconsistency between the following pep and nuc seqs  ---#
>gene_name
cds seq
>gene_name
pep seq

which: no bl2seq in (xxx)
Run bl2seq (-p tblastn) or GeneWise to see the inconsistency.

When I searched for bl2seq, I found that it is only available in legacy BLAST (see https://www.biostars.org/p/17580/).

Can I ignore this error, or how can I solve the problem?
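
The message says that a protein sequence and its coding sequence do not agree, so one option is to find the offending pairs directly instead of installing bl2seq. The sketch below translates a CDS with the standard genetic code and reports the first disagreement with the protein. It is only an illustration: the function names are hypothetical, a trailing stop codon is ignored, and genomes that use a different genetic code would need a different table.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(cds):
    """Translate complete codons; unknown codons become 'X'."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    return "".join(
        CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3)
    )

def first_mismatch(cds, pep):
    """Return None if the CDS encodes the protein, else a short description."""
    if len(cds) % 3:
        return f"CDS length {len(cds)} is not a multiple of 3"
    translated = translate(cds).rstrip("*")
    pep = pep.upper().rstrip("*")
    if translated == pep:
        return None
    for position, (a, b) in enumerate(zip(translated, pep), start=1):
        if a != b:
            return f"residue {position}: CDS encodes {a}, protein has {b}"
    return f"length differs: {len(translated)} translated vs {len(pep)} protein residues"

print(first_mismatch("ATGGCCTAA", "MA"))
print(first_mismatch("ATGGCCTAA", "MG"))
print(first_mismatch("ATGGC", "M"))
```

Running this over every gene ID shared by the cds and pep files would show whether the inconsistency affects a few genes or the whole data set.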

issue with deal_gff.py

Hi, it's a great tool. I have an issue with wgdi.
I get the following error:

/home/wgdi-example/genome/Aquilegia_coerulea/Aquilegia_coerulea/deal_gff.py:24: FutureWarning: The default value of regex will change from True to False in a future version.
gff[0] = gff[0].str.replace('Chr0?','')
Traceback (most recent call last):
File "/home/.local/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3080, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 4554, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 4562, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 2

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/wgdi-example/genome/Aquilegia_coerulea/Aquilegia_coerulea/deal_gff.py", line 20, in
gff = gff[gff[2] == 'CDS']
File "/home/.local/lib/python3.10/site-packages/pandas/core/frame.py", line 3024, in getitem
indexer = self.columns.get_loc(key)
File "/home/.local/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3082, in get_loc
raise KeyError(key) from err
KeyError: 2

Thanks

KeyError: 'evm.model.Chr11.615.evm.model.Chr11.616'

wgdi -ks pto.conf
When running this command I encountered a problem, and I can't figure out why. The error output is as follows:

cds_file = pto.cds
pep_file = pto.pep
align_software = muscle
pairs_file = pto.collinearity.txt
ks_file = pto.ks
Traceback (most recent call last):
File "/home/lip/miniconda3/envs/wgdi/bin/wgdi", line 10, in
sys.exit(main())
File "/home/lip/miniconda3/envs/wgdi/lib/python3.9/site-packages/wgdi/run.py", line 218, in main
module_to_run(arg)
File "/home/lip/miniconda3/envs/wgdi/lib/python3.9/site-packages/wgdi/run.py", line 183, in module
return switcher.get(argument)()
File "/home/lip/miniconda3/envs/wgdi/lib/python3.9/site-packages/wgdi/run.py", line 151, in run_ca
calks.run()
File "/home/lip/miniconda3/envs/wgdi/lib/python3.9/site-packages/wgdi/ks.py", line 101, in run
kaks = self.pair_kaks(k)
File "/home/lip/miniconda3/envs/wgdi/lib/python3.9/site-packages/wgdi/ks.py", line 120, in pair_ka
kaks_new = [kaks[k[0]][k[1]]['NG86']['dN'], kaks[k[0]][k[1]]['NG86']
KeyError: 'evm.model.Chr11.615.evm.model.Chr11.616'

Problem of ancestral_karyotype_repertoire

Hello, thank you very much for this program !

I am running the ancestral_karyotype_repertoire step and have run into a problem.
Here is my configuration file (ANC.akr.conf):

[ancestral_karyotype_repertoire]
blockinfo =  Cipangopaludina_cathayensis.ANC.block.correspondence.csv
blockinfo_reverse = False
gff1 = Cipangopaludina_cathayensis.gff4
gff2 = allspe.ancestor.gff
gap = 5
mark = ANC.on.Cipangopaludina_cathayensis
ancestor = allspe.ancestor.txt
ancestor_new =  allspe.ancestor.on.Cipangopaludina_cathayensis.txt
ancestor_pep =  allspe.ancestor.on.Cipangopaludina_cathayensis.pep
ancestor_pep_new =  allspe.ancestor.on.Cipangopaludina_cathayensis.pep_new
ancestor_gff =  allspe.ancestor.on.Cipangopaludina_cathayensis.gff
ancestor_lens =  allspe.ancestor.on.Cipangopaludina_cathayensis.lens

I ran it with the command: wgdi -akr ANC.akr.conf
I got the following error message:

+ wgdi -akr ANC.akr.conf
/public1/home/stu_qs/miniconda3/lib/python3.9/site-packages/scipy/__init__.py:155: UserWarning: A NumPy version >=1.18.5 and <1.26.0 is required for this version of SciPy (detected version 1.26.0
  warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
blockinfo  =  Cipangopaludina_cathayensis.ANC.block.correspondence.csv
blockinfo_reverse  =  False
gff1  =  Cipangopaludina_cathayensis.gff4
gff2  =  allspe.ancestor.gff
gap  =  5
mark  =  ANC.on.Cipangopaludina_cathayensis
ancestor  =  allspe.ancestor.txt
ancestor_new  =  allspe.ancestor.on.Cipangopaludina_cathayensis.txt
ancestor_pep  =  allspe.ancestor.on.Cipangopaludina_cathayensis.pep
ancestor_pep_new  =  allspe.ancestor.on.Cipangopaludina_cathayensis.pep_new
ancestor_gff  =  allspe.ancestor.on.Cipangopaludina_cathayensis.gff
ancestor_lens  =  allspe.ancestor.on.Cipangopaludina_cathayensis.lens
Traceback (most recent call last):
  File "/public1/home/stu_qs/miniconda3/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3790, in get_loc
    return self._engine.get_loc(casted_key)
  File "index.pyx", line 152, in pandas._libs.index.IndexEngine.get_loc
  File "index.pyx", line 181, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 7080, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 7088, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: '1'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/public1/home/stu_qs/miniconda3/bin/wgdi", line 10, in <module>
    sys.exit(main())
  File "/public1/home/stu_qs/miniconda3/lib/python3.9/site-packages/wgdi/run.py", line 163, in main
    module_to_run(arg, value)
  File "/public1/home/stu_qs/miniconda3/lib/python3.9/site-packages/wgdi/run.py", line 122, in module_to_run
    run_subprogram(program, conf, name)
  File "/public1/home/stu_qs/miniconda3/lib/python3.9/site-packages/wgdi/run.py", line 87, in run_subprogram
    r.run()
  File "/public1/home/stu_qs/miniconda3/lib/python3.9/site-packages/wgdi/ancestral_karyotype_repertoire.py", line 63, in run
    ancestor.at[index, 2] = lens.at[str(row[0]),'order']
  File "/public1/home/stu_qs/miniconda3/lib/python3.9/site-packages/pandas/core/indexing.py", line 2488, in __getitem__
    return super().__getitem__(key)
  File "/public1/home/stu_qs/miniconda3/lib/python3.9/site-packages/pandas/core/indexing.py", line 2440, in __getitem__
    return self.obj._get_value(*key, takeable=self._takeable)
  File "/public1/home/stu_qs/miniconda3/lib/python3.9/site-packages/pandas/core/frame.py", line 4012, in _get_value
    row = self.index.get_loc(index)
  File "/public1/home/stu_qs/miniconda3/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3797, in get_loc
    raise KeyError(key) from err
KeyError: '1'

These are the files I used to run it. To avoid an upload error, I changed the suffix of the gff files to gff.txt.
allspe.ancestor.txt
Cipangopaludina_cathayensis.ANC.block.correspondence.csv
allspe.ancestor.gff.txt
Cipangopaludina_cathayensis.gff4.txt

Could you please help me check the problem?

Why is polyploidy classification written without an underscore (_)?

Hi,

All parameter names use an underscore (_) in place of spaces, except [polyploidy classification]. Why?

[polyploidy classification]
blockinfo = block information (*.csv)
ancestor_left = ancestor file
ancestor_top = ancestor file
classid = class1,class2
savefile = result file(.csv)

Best,
Kun

Preparation of input file

Hello SunPeng
I could not find the python files 01, 02, 03 for the preparation of .gff and .lens files.
Are they integrated with the WGDI tool?
Could you please give some example commands showing how to use these .py files to prepare the input files?
The videos are not very clear.
Regards

Error in calculate Ks

Hi Sun,
When I run:
wgdi -ks ks.conf.txt
WGDI reports an error:

cds_file  =  cds_represent.fa
pep_file  =  protein_represent.fa
align_software  =  muscle
pairs_file  =  collinearity.txt
ks_file  =  sk.ks
Traceback (most recent call last):
  File "/home/caigui/miniconda3/envs/wgdi/bin/wgdi", line 10, in <module>
    sys.exit(main())
  File "/home/caigui/miniconda3/envs/wgdi/lib/python3.10/site-packages/wgdi/run.py", line 148, in main
    module_to_run(arg, value)
  File "/home/caigui/miniconda3/envs/wgdi/lib/python3.10/site-packages/wgdi/run.py", line 110, in module_to_run
    run_subprogram(program, conf, name)
  File "/home/caigui/miniconda3/envs/wgdi/lib/python3.10/site-packages/wgdi/run.py", line 78, in run_subprogram
    r.run()
  File "/home/caigui/miniconda3/envs/wgdi/lib/python3.10/site-packages/wgdi/ks.py", line 67, in run
    df_pairs = self.auto_file()
  File "/home/caigui/miniconda3/envs/wgdi/lib/python3.10/site-packages/wgdi/ks.py", line 33, in auto_file
    p = pd.read_csv(self.pairs_file, sep='\n', header=None, nrows=30)
  File "/home/caigui/miniconda3/envs/wgdi/lib/python3.10/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/home/caigui/miniconda3/envs/wgdi/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 665, in read_csv
    kwds_defaults = _refine_defaults_read(
  File "/home/caigui/miniconda3/envs/wgdi/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1533, in _refine_defaults_read
    raise ValueError(
ValueError: Specified \n as separator or delimiter. This forces the python engine which does not accept a line terminator. Hence it is not allowed to use the line terminator as separator.

This is the file I use:
Archive.zip
Thank you for your tools and generous help!

Legend error for "-kf" results

Hello! @SunPengChuan

I have encountered a bug in the legend of the "-kf" results, could you suggest a solution? And could you suggest a way to change the fill colour of the lines to white?

all.kf.conf

[ksfigure]
ksfit = ks_fit_result.csv
labelfontsize = 15
legendfontsize = 15
xlabel = none
ylabel = none
title = none
area = 0,4
figsize = 10,6.18
savefig =  all_ks.svg

ks_fit_result.csv

Senior

Senior, whenever you see this, please reply to me. I'm working on this too now. This is Kangkang.

wgdi -kf legend

Hello,

I am using wgdi -kf all.conf to plot the Ks figure (ksfit file from wgdi -kf ?), but I found that the figure legend has a problem.

I don't know whether it's my mistake or a bug.

WGDI was installed through conda, and the Python version was 3.8.6 or 3.9.7.

Thanks.
Best wishes!

_tkinter.TclError: unknown color name "white" I don't know how to solve this error

Traceback (most recent call last):
File "/public/home/zhaoli/software/anaconda3/envs/wgdi/bin/wgdi", line 10, in
sys.exit(main())
^^^^^^
File "/public/home/zhaoli/software/anaconda3/envs/wgdi/lib/python3.11/site-packages/wgdi/run.py", line 158, in main
module_to_run(arg, value)
File "/public/home/zhaoli/software/anaconda3/envs/wgdi/lib/python3.11/site-packages/wgdi/run.py", line 118, in module_to_run
run_subprogram(program, conf, name)
File "/public/home/zhaoli/software/anaconda3/envs/wgdi/lib/python3.11/site-packages/wgdi/run.py", line 84, in run_subprogram
r.run()
File "/public/home/zhaoli/software/anaconda3/envs/wgdi/lib/python3.11/site-packages/wgdi/block_ks.py", line 71, in run
fig, ax = plt.subplots(figsize=self.figsize)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/public/home/zhaoli/software/anaconda3/envs/wgdi/lib/python3.11/site-packages/matplotlib/pyplot.py", line 1432, in subplots
fig = figure(**fig_kw)
^^^^^^^^^^^^^^^^
File "/public/home/zhaoli/software/anaconda3/envs/wgdi/lib/python3.11/site-packages/matplotlib/_api/deprecation.py", line 454, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/public/home/zhaoli/software/anaconda3/envs/wgdi/lib/python3.11/site-packages/matplotlib/pyplot.py", line 773, in figure
manager = new_figure_manager(
^^^^^^^^^^^^^^^^^^^
File "/public/home/zhaoli/software/anaconda3/envs/wgdi/lib/python3.11/site-packages/matplotlib/pyplot.py", line 349, in new_figure_manager
return _get_backend_mod().new_figure_manager(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/public/home/zhaoli/software/anaconda3/envs/wgdi/lib/python3.11/site-packages/matplotlib/backend_bases.py", line 3505, in new_figure_manager
return cls.new_figure_manager_given_figure(num, fig)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/public/home/zhaoli/software/anaconda3/envs/wgdi/lib/python3.11/site-packages/matplotlib/backend_bases.py", line 3510, in new_figure_manager_given_figure
return cls.FigureCanvas.new_manager(figure, num)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/public/home/zhaoli/software/anaconda3/envs/wgdi/lib/python3.11/site-packages/matplotlib/backend_bases.py", line 1703, in new_manager
return cls.manager_class.create_with_canvas(cls, figure, num)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/public/home/zhaoli/software/anaconda3/envs/wgdi/lib/python3.11/site-packages/matplotlib/backends/_backend_tk.py", line 482, in create_with_canvas
canvas = canvas_class(figure, master=window)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/public/home/zhaoli/software/anaconda3/envs/wgdi/lib/python3.11/site-packages/matplotlib/backends/_backend_tk.py", line 173, in init
self._tkcanvas = tk.Canvas(
^^^^^^^^^^
File "/public/home/zhaoli/software/anaconda3/envs/wgdi/lib/python3.11/tkinter/init.py", line 2744, in init
Widget.init(self, master, 'canvas', cnf, kw)
File "/public/home/zhaoli/software/anaconda3/envs/wgdi/lib/python3.11/tkinter/init.py", line 2628, in init
self.tk.call(
_tkinter.TclError: unknown color name "white"

Strange collinearity results compared to MCScanX.

Dear @SunPengChuan

Recently I've been using WGDI, but something is confusing me, mainly about the collinearity step (-icl).

MCScanX finds 583 collinearity blocks, whereas WGDI finds 1642 (parameters below), which puzzles me. I checked the collinearity block file generated by WGDI and found some possible problems that I would like you to comment on.

[collinearity]
gff1 = Rb.gff
gff2 = Rb.gff
lens1 = Rb1.len
lens2 = Rb1.len
blast = Rb.blast
blast_reverse = false
multiple  = 1
process = 30
evalue = 1e-10
score = 100
grading = 50,40,25
mg = 40,40
pvalue = 0.2
repeat_number = 20
positon = order
savefile = Rb.wgdi.collinearity1

For example, some of the collinearity blocks seem to be false positives. Some blocks are simply the same genes written in the opposite order:

# Alignment 1638: score=191 pvalue=0.0283 N=5 9&9 minus
Rb.9.5708 3051 Rb.9.5714 3056 1
Rb.9.5709 3052 Rb.9.5713 3055 1
Rb.9.5710 3053 Rb.9.5711 3054 1
Rb.9.5711 3054 Rb.9.5709 3052 1
Rb.9.5713 3055 Rb.9.5708 3051 1

Then there are collinearity blocks that seem to simply "slide" by one gene, as follows:

# Alignment 1589: score=438 pvalue=0.0841 N=11 9&9 plus
Rb.9.1324 888 Rb.9.1323 887 1
Rb.9.1326 889 Rb.9.1327 890 1
Rb.9.1328 891 Rb.9.1330 893 1
Rb.9.1331 894 Rb.9.1333 895 1
Rb.9.1346 904 Rb.9.1337 897 -1
Rb.9.1348 905 Rb.9.1349 906 1
Rb.9.1349 906 Rb.9.1352 909 1
Rb.9.1353 910 Rb.9.1362 913 1
Rb.9.1381 927 Rb.9.1382 928 1
Rb.9.1389 932 Rb.9.1390 933 1
Rb.9.1390 933 Rb.9.1392 934 1

But these problems do not seem to exist in the MCScanX results. They may be the reason why "wgdi -icl" reports more blocks than MCScanX.

Best Regards!
Sincerely,
Wen

Attached:
Rb.collinearity.txt (MCScanX results)
Rb.wgdi.collinearity.txt
Rb2.conf.txt
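
To quantify how many blocks in a self-comparison look like the two examples above, one option is to count the blocks whose left and right sides share gene IDs. The sketch below does this for the text layout quoted in the post (a header line starting with `#`, then rows of gene, order, gene, order, strand). It is a hypothetical helper for inspecting a result file, and it says nothing about how WGDI or MCScanX decide which blocks to report.

```python
import io

def overlapping_blocks(handle):
    """Return (header, shared_gene_ids) for every block whose two sides
    contain at least one gene ID in common."""
    results = []
    header, left, right = None, set(), set()
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # A new block starts: evaluate the one collected so far.
            if header is not None and left & right:
                results.append((header, sorted(left & right)))
            header, left, right = line, set(), set()
        else:
            fields = line.split()
            left.add(fields[0])
            right.add(fields[2])
    # Evaluate the final block in the file.
    if header is not None and left & right:
        results.append((header, sorted(left & right)))
    return results

# Block 1638 as quoted in the post.
sample = """# Alignment 1638: score=191 pvalue=0.0283 N=5 9&9 minus
Rb.9.5708 3051 Rb.9.5714 3056 1
Rb.9.5709 3052 Rb.9.5713 3055 1
Rb.9.5710 3053 Rb.9.5711 3054 1
Rb.9.5711 3054 Rb.9.5709 3052 1
Rb.9.5713 3055 Rb.9.5708 3051 1
"""
flagged = overlapping_blocks(io.StringIO(sample))
for header, shared in flagged:
    print(header)
    print("  shared genes:", ", ".join(shared))
```

Comparing the number of flagged blocks with the difference between the two tools' block counts would show how much of the gap this kind of block explains.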

wgdi -icl error

Hi!@SunPengChuan
When I use wgdi -icl *_collinearity.conf to find collinear genes within one genome, I get an error.

I don't know why.
Thank you for your reply!

ZeroDivisionError: division by zero and ancestor_location

Dear Chuan,
Thank you very much for your help.
I started with the example analysis vv1s and followed the instructions, but the following two errors occur in the retain and circos steps.
1)
/Vitis_vinifera$ wgdi -p total.conf
alignment = allignment.csv
gff = vv1s.gff
lens = vv1s.lens
colors = red,blue,gree
gap = 50
retention = 0.05
diff = 0.05
remove_delta = (true/false)
savefile = index.csv

Polyploidy-index between subgenomes are []
Traceback (most recent call last):
File "/home/sajjad/anaconda3/envs/wgd/bin/wgdi", line 10, in
sys.exit(main())
File "/home/sajjad/anaconda3/envs/wgd/lib/python3.9/site-packages/wgdi/run.py", line 208, in main
module_to_run(arg)
File "/home/sajjad/anaconda3/envs/wgd/lib/python3.9/site-packages/wgdi/run.py", line 174, in module_to_run
return switcher.get(argument)()
File "/home/sajjad/anaconda3/envs/wgd/lib/python3.9/site-packages/wgdi/run.py", line 131, in run_pindex
p.run()
File "/home/sajjad/anaconda3/envs/wgd/lib/python3.9/site-packages/wgdi/pindex.py", line 66, in run
p = self.cal_pindex(alignment)
File "/home/sajjad/anaconda3/envs/wgd/lib/python3.9/site-packages/wgdi/pindex.py", line 98, in cal_pindex
return sum(data)/len(data)
ZeroDivisionError: division by zero

Please help me to resolve this issue.
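
The line "Polyploidy-index between subgenomes are []" shows that the list passed to `sum(data)/len(data)` was empty, which is what triggers the division by zero. Why no values were produced cannot be told from the traceback alone; the printed configuration still contains the template placeholder `remove_delta = (true/false)`, which may be worth replacing with an actual value. The sketch below is not WGDI's code; it only illustrates how an empty result can be detected before averaging, using a hypothetical helper name.

```python
def mean_or_none(values):
    """Return the arithmetic mean, or None when there is nothing to average."""
    values = list(values)
    if not values:
        return None
    return sum(values) / len(values)

print(mean_or_none([0.2, 0.4]))
print(mean_or_none([]))
```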

The second query is about the final circos map. Where can I get the following files?
1) ancestor location
2) ancestor
3) It also shows that the column names attribute is missing.

wgdi -conf ./conf.ini error

It encounters the error below:

Traceback (most recent call last):
  File "/home/anaconda3/bin/wgdi", line 8, in <module>
    sys.exit(main())
  File "/home/anaconda3/lib/python3.7/site-packages/wgdi/run.py", line 146, in main
    module_to_run(arg, value)
  File "/home/anaconda3/lib/python3.7/site-packages/wgdi/run.py", line 107, in module_to_run
    program, conf, name = tuple(switcher.get(argument))

I modified line 107 in run.py, which fixed the problem:

if argument == 'configure':
    run_configure()
else:
    program, conf, name = tuple(switcher.get(argument))
    run_subprogram(program, conf, name)

Wgd events of multi-genomes

Hello
I am trying to use this tool. Can it be used for the WGD analysis of three or more different genomes, and can the graphs of multiple genomes be combined in one figure file?

sep="\n" cause ValueError?

Hi Pengchuan,

Thanks for your great work!

I had trouble when running WGDI. Pandas raises a ValueError as follows:

--skip--
File "/home/testchamber/anaconda3/envs/wgdi/lib/python3.10/site-packages/wgdi/ks.py", line 33, in auto_file
    p = pd.read_csv(self.pairs_file, sep='\n', header=None, nrows=30)
--skip--
File "/home/testchamber/anaconda3/envs/wgdi/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1533, in _refine_defaults_read
    raise ValueError(
ValueError: Specified \n as separator or delimiter. This forces the python engine which does not accept a line terminator. Hence it is not allowed to use the line terminator as separator.

Could you kindly give me some suggestions?
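
The message comes from pandas itself: the installed pandas version rejects `sep='\n'` in `read_csv`, so the call at ks.py line 33 fails before any data is read. Whether to change the pandas version or the WGDI version is for the maintainer to say. For reference, previewing the first 30 lines of a file does not need a separator at all; the sketch below shows a plain-Python equivalent with a hypothetical helper name and made-up sample data.

```python
import io
from itertools import islice

def preview_lines(handle, limit=30):
    """Return up to `limit` lines from an open text file, without line endings."""
    return [line.rstrip("\r\n") for line in islice(handle, limit)]

# Hypothetical pairs-file content, used only for illustration.
sample = (
    "# Alignment 1: score=100 pvalue=0.01 N=2 1&1 plus\n"
    "geneA 1 geneB 5 1\n"
    "geneC 2 geneD 6 1\n"
)
lines = preview_lines(io.StringIO(sample), limit=2)
print(lines)
```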

IndexError: index 2 is out of bounds for axis 0 with size 0 (ksfigure.conf)

Hello,
Could you kindly check the error below?
This error occurs with the example as well as my own data.

wgdi -kf ksfigure.conf
ksfit = ksnew.csv
labelfontsize = 9
legendfontsize = 9
xlabel = none
ylabel = none
title = none
area = 0,3
figsize = 10,10
savefig = image4.png
Traceback (most recent call last):
File "/home/sajjad/anaconda3/envs/wgd/bin/wgdi", line 10, in
sys.exit(main())
File "/home/sajjad/anaconda3/envs/wgd/lib/python3.9/site-packages/wgdi/run.py", line 208, in main
module_to_run(arg)
File "/home/sajjad/anaconda3/envs/wgd/lib/python3.9/site-packages/wgdi/run.py", line 174, in module_to_run
return switcher.get(argument)()
File "/home/sajjad/anaconda3/envs/wgd/lib/python3.9/site-packages/wgdi/run.py", line 125, in run_ksfigure
kf.run()
File "/home/sajjad/anaconda3/envs/wgd/lib/python3.9/site-packages/wgdi/ksfigure.py", line 48, in run
ax.plot(t, self.Gaussian_distribution(
File "/home/sajjad/anaconda3/envs/wgd/lib/python3.9/site-packages/wgdi/ksfigure.py", line 32, in Gaussian_distribution
if np.isnan(k[3 * i + 2]):
IndexError: index 2 is out of bounds for axis 0 with size 0
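
The failing expression `k[3 * i + 2]` indexes a parameter array of size 0, so the row read from the ksfit file seems to contain no fitted values at all. The indexing pattern suggests that each curve is described by three consecutive parameters. The sketch below assumes they are amplitude, mean and standard deviation of a Gaussian; that reading, the function name and the formula are assumptions made for illustration, not WGDI's implementation.

```python
import math

def gaussian_mixture(x, params):
    """Sum of Gaussians; `params` holds (amplitude, mean, sigma) triples
    laid out flat, e.g. [a1, m1, s1, a2, m2, s2]."""
    if len(params) == 0 or len(params) % 3:
        raise ValueError(
            f"expected a non-empty multiple of 3 parameters, got {len(params)}"
        )
    total = 0.0
    for i in range(0, len(params), 3):
        amplitude, mean, sigma = params[i:i + 3]
        # Skip triples padded with NaN (unused components).
        if any(math.isnan(v) for v in (amplitude, mean, sigma)):
            continue
        total += amplitude * math.exp(-((x - mean) ** 2) / (2 * sigma ** 2))
    return total

print(gaussian_mixture(1.0, [2.0, 1.0, 0.1]))
try:
    gaussian_mixture(1.0, [])
except ValueError as error:
    print("empty parameters:", error)
```

Checking that each data row of the ksfit file actually carries such triples would be a reasonable first step before plotting.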

Ancestor is unknown

Hi SunPeng

What is the best approach when the ancestor is unknown and there is no information about ancestor of testing plant species?

Regards

UnboundLocalError: local variable 'group' referenced before assignment

Hi,
I get the following stderr when running wgdi -icl for collinearity.

Traceback (most recent call last):
  File "/home/wgdi", line 10, in <module>
    sys.exit(main())
  File "/home/lib/python3.7/site-packages/wgdi/run.py", line 218, in main
    module_to_run(arg)
  File "/home/lib/python3.7/site-packages/wgdi/run.py", line 183, in module_to_run
    return switcher.get(argument)()
  File "/home/lib/python3.7/site-packages/wgdi/run.py", line 157, in run_collinearity
    col.run()
  File "/home/lib/python3.7/site-packages/wgdi/run_colliearity.py", line 82, in run
    del blast, group
UnboundLocalError: local variable 'group' referenced before assignment

I have tried my best to find a solution, but I have no idea. Can you help me?
Thanks.

Error in creating dotplot, ValueError: cannot reindex on an axis with duplicate labels

I am trying to create a dotplot from two genomes for whole-genome duplication detection. I get the error "ValueError: cannot reindex on an axis with duplicate labels" (screenshot attached). I have also attached screenshots of the GFF1, GFF2, Lens1, Lens2 and BLAST files used. Please look into it and advise how I can proceed with the analysis.

Below is the total.conf file:
[dotplot]
blast = /home/lilin/wgd-exp/dotplot/updated/blastp_output.blast
gff1 =/home/lilin/wgd-exp/dotplot/updated/X.ripnewide_gene.gff
gff2 = /home/lilin/wgd-exp/dotplot/updated/S. gregaria_updatedgene.gff
lens1 = /home/lilin/wgd-exp/dotplot/updated/lens1_X.riparia.lens
lens2 =/home/lilin/wgd-exp/dotplot/updated/lens2_S.gregaria.lens
genome1_name = Xya_riparia
genome2_name = Schistocerca_gregaria
multiple = 1
score = 100
evalue = 1e-5
repeat_number = 10
position = order
blast_reverse = false
ancestor_left = none
ancestor_top = none
markersize = 0.5
figsize = 10,10
savefig = /home/lilin/wgd-exp/dotplot/updated/dotplot.png

Attachments
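
The error text means that some table index contains the same label more than once. One candidate worth ruling out is a gene ID that appears on several lines of a gff file. That is only a guess, since the traceback is in a screenshot and not reproduced here. The hypothetical helper below counts repeated values in one tab-separated column (the second column by default, assuming that is where the gene ID sits).

```python
import io
from collections import Counter

def duplicated_ids(handle, column=1):
    """Return {value: count} for values seen more than once in `column`."""
    counts = Counter()
    for line in handle:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) > column:
            counts[fields[column]] += 1
    return {value: n for value, n in counts.items() if n > 1}

# Hypothetical sample: gene "g2" is listed twice.
sample = (
    "chr1\tg1\t100\t200\t+\t1\tg1\n"
    "chr1\tg2\t300\t400\t-\t2\tg2\n"
    "chr1\tg2\t300\t450\t-\t3\tg2.t2\n"
)
duplicates = duplicated_ids(io.StringIO(sample))
print(duplicates)
```

Setting `column=0` applies the same check to the chromosome names of a lens file.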

Is there an example to explain how to use wgdi to get cross-species Ks plot?

Hi,

Is there an example to explain how to use wgdi to get cross-species Ks plot?
All cases I found are single species Ks analysis.

Best,
Kun

Error in Dotplot stage

Hello SunPeng

I started Wgdi but after running wgdi -d total.conf I get the following error:

wgdi -d total.conf
blast = /scratch/project_mnt/S0030/wgdi/male-female/blast_results/male_blast
gff1 = /scratch/project_mnt/S0030/wgdi/male-female/male-v1.0.a4.62d0dba6b61fa-publish.genes.gff3
gff2 = /scratch/project_mnt/S0030/wgdi/male-female/male-v1.0.a4.62d0dba6b61fa-publish.genes.gff3
lens1 = /scratch/project_mnt/S0030/wgdi/male-female/jojoba_male.lens
lens2 = /scratch/project_mnt/S0030/wgdi/male-female/jojoba_male.lens
genome1_name = jojoba male
genome2_name = jojoba male
multiple = 1
score = 100
evalue = 1e-5
repeat_number = 10
position = order
blast_reverse = false
ancestor_left = none
ancestor_top = none
markersize = 0.5
figsize = 10,10
savefig = male_Jojoba(.png, .pdf, .svg)
Traceback (most recent call last):
File "/scratch/project/qaafi-cnafs/wgdi/bin/wgdi", line 10, in
sys.exit(main())
^^^^^^
File "/scratch/project/qaafi-cnafs/wgdi/lib/python3.11/site-packages/wgdi/run.py", line 158, in main
module_to_run(arg, value)
File "/scratch/project/qaafi-cnafs/wgdi/lib/python3.11/site-packages/wgdi/run.py", line 118, in module_to_run
run_subprogram(program, conf, name)
File "/scratch/project/qaafi-cnafs/wgdi/lib/python3.11/site-packages/wgdi/run.py", line 84, in run_subprogram
r.run()
File "/scratch/project/qaafi-cnafs/wgdi/lib/python3.11/site-packages/wgdi/dotplot.py", line 96, in run
gff1 = base.newgff(self.gff1)
^^^^^^^^^^^^^^^^^^^^^^
File "/scratch/project/qaafi-cnafs/wgdi/lib/python3.11/site-packages/wgdi/base.py", line 177, in newgff
gff['start'] = gff['start'].astype(np.int64)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/scratch/project/qaafi-cnafs/wgdi/lib/python3.11/site-packages/pandas/core/generic.py", line 6240, in astype
new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/scratch/project/qaafi-cnafs/wgdi/lib/python3.11/site-packages/pandas/core/internals/managers.py", line 450, in astype
return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/scratch/project/qaafi-cnafs/wgdi/lib/python3.11/site-packages/pandas/core/internals/managers.py", line 352, in apply
applied = getattr(b, f)(**kwargs)
^^^^^^^^^^^^^^^^^^^^^^^
File "/scratch/project/qaafi-cnafs/wgdi/lib/python3.11/site-packages/pandas/core/internals/blocks.py", line 526, in astype
new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/scratch/project/qaafi-cnafs/wgdi/lib/python3.11/site-packages/pandas/core/dtypes/astype.py", line 299, in astype_array_safe
new_values = astype_array(values, dtype, copy=copy)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/scratch/project/qaafi-cnafs/wgdi/lib/python3.11/site-packages/pandas/core/dtypes/astype.py", line 230, in astype_array
values = astype_nansafe(values, dtype, copy=copy)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/scratch/project/qaafi-cnafs/wgdi/lib/python3.11/site-packages/pandas/core/dtypes/astype.py", line 170, in astype_nansafe
return arr.astype(dtype, copy=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: invalid literal for int() with base 10: 'gene'
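
The final error shows that the column WGDI treats as `start` contained the word `gene`, which is what the third column of a standard GFF3 file holds. The configuration points gff1 and gff2 at a `.gff3` file, so the input is probably a raw annotation rather than the simplified table WGDI expects. The sketch below shows one way to derive such a table, assuming the layout chromosome, gene id, start, end, strand, order, original id for the gff and chromosome, length, gene count for the lens file. The function name is hypothetical, and the chromosome length is approximated by the end of the last gene, which is a stand-in for the real sequence length.

```python
import io
from collections import defaultdict

def gff3_to_wgdi(handle):
    """Build (gff_rows, lens_rows) from the 'gene' features of a GFF3 stream."""
    genes = defaultdict(list)
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != 9 or fields[2] != "gene":
            continue
        chrom, start, end, strand = fields[0], int(fields[3]), int(fields[4]), fields[6]
        attributes = dict(
            item.split("=", 1) for item in fields[8].split(";") if "=" in item
        )
        genes[chrom].append((start, end, strand, attributes.get("ID", "")))

    gff_rows, lens_rows = [], []
    for chrom in sorted(genes):
        ordered = sorted(genes[chrom])  # by start position
        for order, (start, end, strand, gene_id) in enumerate(ordered, start=1):
            gff_rows.append((chrom, gene_id, start, end, strand, order, gene_id))
        # Assumption: last gene end stands in for the chromosome length.
        last_end = max(end for _, end, _, _ in ordered)
        lens_rows.append((chrom, last_end, len(ordered)))
    return gff_rows, lens_rows

# Hypothetical GFF3 sample.
sample = (
    "##gff-version 3\n"
    "chr1\tsrc\tgene\t500\t900\t.\t-\t.\tID=geneB;Name=b\n"
    "chr1\tsrc\tmRNA\t500\t900\t.\t-\t.\tID=geneB.1;Parent=geneB\n"
    "chr1\tsrc\tgene\t100\t400\t.\t+\t.\tID=geneA\n"
)
gff_rows, lens_rows = gff3_to_wgdi(io.StringIO(sample))
for row in gff_rows:
    print("\t".join(map(str, row)))
print(lens_rows)
```

The gene IDs written this way also have to match the sequence names used in the BLAST, cds and pep files.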

error with -kf function in wgdi used to make ks figure

Hi, I am using WGDI and all steps completed successfully, but I get an error when I try to draw the Ks figure using the following command:
wgdi -kf total.conf

A chunk of the data is shared here:
ks_medain.csv

I used ks_medain, the output of kspeaks (-kp), as the input data.

[ksfigure]
ksfit = ks_medain
labelfontsize = 15
legendfontsize = 15
xlabel = none
ylabel = none
title = none
area = 0,2
figsize = 10,6.18
shadow = true
savefig = ksfigure

Here is the error:

  File "/home/tariqr/.local/bin/wgdi", line 33, in <module>
    sys.exit(load_entry_point('wgdi==0.6.5', 'console_scripts', 'wgdi')())
  File "/ibex/sw/rl9c/wgdi/0.6.3/rl9.1_conda3/Miniconda3/envs/python3.8/lib/python3.8/site-packages/wgdi/run.py", line 163, in main
    module_to_run(arg, value)
  File "/ibex/sw/rl9c/wgdi/0.6.3/rl9.1_conda3/Miniconda3/envs/python3.8/lib/python3.8/site-packages/wgdi/run.py", line 122, in module_to_run
    run_subprogram(program, conf, name)
  File "/ibex/sw/rl9c/wgdi/0.6.3/rl9.1_conda3/Miniconda3/envs/python3.8/lib/python3.8/site-packages/wgdi/run.py", line 87, in run_subprogram
    r.run()
  File "/ibex/sw/rl9c/wgdi/0.6.3/rl9.1_conda3/Miniconda3/envs/python3.8/lib/python3.8/site-packages/wgdi/ksfigure.py", line 50, in run
    ax.plot(t, self.Gaussian_distribution(
  File "/ibex/sw/rl9c/wgdi/0.6.3/rl9.1_conda3/Miniconda3/envs/python3.8/lib/python3.8/site-packages/wgdi/ksfigure.py", line 33, in Gaussian_distribution
    if np.isnan(k[3 * i + 2]):
IndexError: index 2 is out of bounds for axis 0 with size 0

KeyError: "None of [Int64Index ... in wgdi -c

Hi there,
I'm facing the error below when running the "wgdi -c " command
I'm using Ptrichocarpa from Phytozome

blockinfo = Ptrichocarpa_Ptrichocarpa.blockinfo.csv
lens1 = Ptrichocarpa.lens
lens2 = Ptrichocarpa.lens
tandem = false
tandem_length = 200
pvalue = 0.2
block_length = 5
tandem_ratio = 0.5
multiple = 1
homo = -1,1
savefile = Ptrichocarpa_Ptrichocarpa.blockinfo.new.csv
Traceback (most recent call last):
File "/home/amvarani/.local/bin/wgdi", line 8, in <module>
sys.exit(main())
File "/home/amvarani/.local/lib/python3.10/site-packages/wgdi/run.py", line 163, in main
module_to_run(arg, value)
File "/home/amvarani/.local/lib/python3.10/site-packages/wgdi/run.py", line 122, in module_to_run
run_subprogram(program, conf, name)
File "/home/amvarani/.local/lib/python3.10/site-packages/wgdi/run.py", line 87, in run_subprogram
r.run()
File "/home/amvarani/.local/lib/python3.10/site-packages/wgdi/block_correspondence.py", line 47, in run
arr = self.collinearity_region(cor, bkinfo, lens1)
File "/home/amvarani/.local/lib/python3.10/site-packages/wgdi/block_correspondence.py", line 70, in collinearity_region
df1[[int(k) for k in b1]] += 1
File "/home/amvarani/.local/lib/python3.10/site-packages/pandas/core/series.py", line 1007, in __getitem__
return self._get_with(key)
File "/home/amvarani/.local/lib/python3.10/site-packages/pandas/core/series.py", line 1042, in _get_with
return self.loc[key]
File "/home/amvarani/.local/lib/python3.10/site-packages/pandas/core/indexing.py", line 1073, in __getitem__
return self._getitem_axis(maybe_callable, axis=axis)
File "/home/amvarani/.local/lib/python3.10/site-packages/pandas/core/indexing.py", line 1301, in _getitem_axis
return self._getitem_iterable(key, axis=axis)
File "/home/amvarani/.local/lib/python3.10/site-packages/pandas/core/indexing.py", line 1239, in _getitem_iterable
keyarr, indexer = self._get_listlike_indexer(key, axis)
File "/home/amvarani/.local/lib/python3.10/site-packages/pandas/core/indexing.py", line 1432, in _get_listlike_indexer
keyarr, indexer = ax._get_indexer_strict(key, axis_name)
File "/home/amvarani/.local/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 6113, in _get_indexer_strict
self._raise_if_missing(keyarr, indexer, axis_name)
File "/home/amvarani/.local/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 6173, in _raise_if_missing
raise KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [Int64Index([8462, 8465, 8469, 8477, 8481, 8484, 8502, 8503, 8508, 8517, 8520,\n 8534, 8537, 8541, 8545, 8551, 8554, 8558, 8572, 8585, 8591],\n dtype='int64')] are in the [index]"
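
The KeyError means that none of the gene order numbers 8462 to 8591 exist in the index of the Series being incremented. One possible cause, which is an assumption and not confirmed in this report, is that the block info refers to gene orders beyond the gene count recorded for that chromosome in the lens file. A small sketch of that range check, assuming a 1..n index; `positions_outside_index` is a hypothetical name:

```python
def positions_outside_index(positions, n_genes):
    """Return the gene orders that fall outside a 1..n_genes index.

    Assumption: the per-chromosome index runs from 1 to the gene count
    given in the lens file.
    """
    valid = range(1, n_genes + 1)
    return [p for p in positions if p not in valid]


# Hypothetical numbers: a chromosome recorded with 8000 genes.
print(positions_outside_index([8462, 8465, 8469], 8000))
```

If this returns a non-empty list for a block, the lens and gff files are worth comparing against the files used to produce the block info.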

Problem of preparing the data

Hello, Thank you very much for this program !

I am at the first step, preparing the input, and I have an issue with deal_gff.py.
If I am not mistaken, I can modify and generate all required files with this single script, without using 0.1.py, 0.2.py and 03.py.
However, deal_gff.py returned empty cds and pep files, but a complete lens file. Could you please help me with this problem?

All my data files were downloaded from NCBI RefSeq (without any editing). Here is my command:
python deal_gff.py Tigriopus_californicus_GCF_007210705.1_Tcal_SD_v2.1_genomic.gff Tigriopus_californicus_GCF_007210705.1_Tcal_SD_v2.1_cds_from_genomic.cds.fasta Tigriopus_californicus_GCF_007210705.1_Tcal_SD_v2.1_protein.pep.fasta tig1

Thank you very much for helping !

Regards,
Alex

deal_gff.py improvement

Hello SunPeng,

Could deal_gff.py be changed so that it sorts the genome and gff files in descending order (by chromosome length in the fasta file, from largest to smallest) and also shortens the sequence names (seq_id)? For example, I used the names "Chr1", "Chr2", "Chr3", and they were still too long to fit nicely into the dotplot figure. Even the word "Chr" is long.

I reordered the chromosomes in the .lens file, but it did not help: the chromosomes still come out unsorted.

When I manually edited and sorted the lens and gff files to shorten the sequence ids and put them in descending order (largest Chr first), it did not help either: the plot still started from chr10, chr11, ... instead of chr1, chr2, etc.

I suggest changing deal_gff.py so that it either makes the fonts smaller or shortens the sequence ids so that they fit into the dot plot (whichever is easier), and also sorts the sequences (in both pep.fa and .gff) in descending order (Chr1 to ChrN, largest to smallest, respectively).

I hope this is possible; it would improve the quality of your great work.
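
An order that starts with chr10, chr11 and only later reaches chr1 is what a plain string sort produces. A minimal sketch of a natural sort key and a length-based sort, assuming lens rows of the form (name, length, gene count); the function names are hypothetical and this is not how deal_gff.py currently works:

```python
import re


def natural_key(name):
    """Split 'chr10' into ['chr', 10, ''] so digit runs compare as integers."""
    tokens = re.split(r"(\d+)", name)
    # With a capturing group, odd positions are always the digit runs.
    return [int(tok) if i % 2 else tok.lower() for i, tok in enumerate(tokens)]


def sort_lens(rows, by_length=False):
    """Sort (name, length, gene_count) rows by name or by descending length."""
    if by_length:
        return sorted(rows, key=lambda row: row[1], reverse=True)
    return sorted(rows, key=lambda row: natural_key(row[0]))


rows = [("chr10", 500, 40), ("chr2", 900, 80), ("chr1", 700, 60)]
print([r[0] for r in sort_lens(rows)])                  # by name
print([r[0] for r in sort_lens(rows, by_length=True)])  # largest first
```

Rewriting the lens file in the desired order with such a helper may be enough, on the assumption that the plot follows the row order of the lens file.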

Strange Ks distribution analysis results

Dear @SunPengChuan

The species I study is a rhododendron (Rhododendron bailiense). I use your software for research and recently ran into a question when analysing the distribution of Ks values in my species (figure below). I would appreciate your explanation and guidance.

Do you have any explanation or advice for this large number of small Ks values, which do not seem to have a similar distribution in other species of the same genus (e.g., R. simsii, figure below)? This confuses me, as I would expect species from the same genus to show broadly similar Ks distributions. Could it be due to a large number of recent tandem duplications in my species?

The Ks model I use is YN80.

I followed the workflow described here:
https://blog.csdn.net/u012110870/article/details/115511709?ops_request_misc=%257B%2522request%255Fid%2522%253A%2522168735417916800222864610%2522%252C%2522scm%2522%253A%252220140713.130102334.pc%255Fblog.%2522%257D&request_id=168735417916800222864610&biz_id=0&utm_medium=distribute.pc_search_result.none-task-blog-2~blog~first_rank_ecpm_v1~rank_v31_ecpm-6-115511709-null-null.268^v1^koosearch&utm_term=%E5%A6%82%E4%BD%95%E7%94%A8WGDI%E8%BF%9B%E8%A1%8C%E5%85%B1%E7%BA%BF%E6%80%A7%E5%88%86%E6%9E%90&spm=1018.2226.3001.4450

confs:
Rb1.conf.txt
peak1.conf.txt
peak2.conf.txt

ks.csv:
all_ks.csv

block information:
Rb.block.information4YN80.csv

.ks:
Rb.ks3.txt

Thank you very much for your valuable time and expertise, and I look forward to your reply and guidance.
Best regards!

_tkinter.TclError: couldn't connect to display "localhost:21.0"

Hi dear developer,

Thanks for the great tool. After I upgraded wgdi to 0.6.4, the error below occurred. I suppose it is because I use a remote SGE server without support for an interactive backend. Could you please give me some suggestions on how to handle the problem? Many thanks!

$ wgdi -d conf.dot
blast  =  pame.pep.dia
gff1  =  pame.gff
gff2  =  pame.gff
lens1  =  pame.len
lens2  =  pame.len
genome1_name  =  pame
genome2_name  =  pame
multiple  =  1
score  =  100
evalue  =  1e-5
repeat_number  =  10
position  =  end
blast_reverse  =  false
ancestor_left  =  none
ancestor_top  =  none
markersize  =  0.5
figsize  =  10,10
savefig  =  dot.pdf
Traceback (most recent call last):
  File "/export/home/ydn/.conda/envs/yty/bin/wgdi", line 8, in <module>
    sys.exit(main())
  File "/export/home/ydn/.conda/envs/yty/lib/python3.7/site-packages/wgdi/run.py", line 163, in main
    module_to_run(arg, value)
  File "/export/home/ydn/.conda/envs/yty/lib/python3.7/site-packages/wgdi/run.py", line 122, in module_to_run
    run_subprogram(program, conf, name)
  File "/export/home/ydn/.conda/envs/yty/lib/python3.7/site-packages/wgdi/run.py", line 87, in run_subprogram
    r.run()
  File "/export/home/ydn/.conda/envs/yty/lib/python3.7/site-packages/wgdi/dotplot.py", line 92, in run
    fig, ax = plt.subplots(figsize=self.figsize)
  File "/export/home/ydn/.conda/envs/yty/lib/python3.7/site-packages/matplotlib/cbook/deprecation.py", line 451, in wrapper
    return func(*args, **kwargs)
  File "/export/home/ydn/.conda/envs/yty/lib/python3.7/site-packages/matplotlib/pyplot.py", line 1288, in subplots
    fig = figure(**fig_kw)
  File "/export/home/ydn/.conda/envs/yty/lib/python3.7/site-packages/matplotlib/pyplot.py", line 694, in figure
    **kwargs)
  File "/export/home/ydn/.conda/envs/yty/lib/python3.7/site-packages/matplotlib/pyplot.py", line 316, in new_figure_manager
    return _backend_mod.new_figure_manager(*args, **kwargs)
  File "/export/home/ydn/.conda/envs/yty/lib/python3.7/site-packages/matplotlib/backend_bases.py", line 3494, in new_figure_manager
    return cls.new_figure_manager_given_figure(num, fig)
  File "/export/home/ydn/.conda/envs/yty/lib/python3.7/site-packages/matplotlib/backends/_backend_tk.py", line 885, in new_figure_manager_given_figure
    window = tk.Tk(className="matplotlib")
  File "/export/home/ydn/.conda/envs/yty/lib/python3.7/tkinter/__init__.py", line 2023, in __init__
    self.tk = _tkinter.create(screenName, baseName, className, interactive, wantobjects, useTk, sync, use)
_tkinter.TclError: couldn't connect to display "localhost:21.0"
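
The traceback ends in matplotlib's Tk backend trying to open a window on a display that is not reachable. Matplotlib reads the `MPLBACKEND` environment variable, and `Agg` is its non-interactive backend. A minimal sketch of launching the command with that variable set, assuming wgdi does not select a backend itself; `headless_env` and `run_headless` are hypothetical helpers:

```python
import os
import subprocess


def headless_env(base=None):
    """Copy an environment and ask matplotlib for the non-interactive Agg backend."""
    env = dict(os.environ if base is None else base)
    env["MPLBACKEND"] = "Agg"
    return env


def run_headless(cmd):
    """Run a command (for example ["wgdi", "-d", "conf.dot"]) without a display."""
    return subprocess.run(cmd, env=headless_env(), check=False)
```

Exporting `MPLBACKEND=Agg` in the job script before calling wgdi has the same effect as this wrapper.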

wgdi -d get Segmentation fault (core dumped)

I tried to run wgdi -d grape.total.conf, and it shows this:

blast  =  grape.blast.txt
gff1  =  grape_Chr_uniq.gff
gff2  =  grape_Chr_uniq.gff
lens1  =  grape_Chr.len
lens2  =  grape_Chr.len
genome1_name  =  Vitis_vinifera
genome2_name  =  Vitis_vinifera
multiple  =  1
score  =  100
evalue  =  1e-5
repeat_number  =  10
position  =  order
blast_reverse  =  false
ancestor_left  =  none
ancestor_top  =  none
markersize  =  0.5
figsize  =  10,10
savefig  =  grape.dot.png
failed to get the current screen resources
Segmentation fault (core dumped)

and I ran gdb core.76309:

(base) [root@localhost colinearlity]# gdb core.76309 
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-114.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
[New LWP 76309]
[New LWP 76396]
Core was generated by `/home/qinsong/anaconda3/bin/python /home/qinsong/anaconda3/bin/wgdi -d grape.to'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f9d55cd9f03 in ?? ()
"/home/qinsong/WGDI/colinearlity/core.76309" is a core file.
Please specify an executable to debug.

How can I pinpoint where it goes wrong?
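
The gdb output stops at "Please specify an executable to debug" because only the core file was given. An alternative that stays on the Python side is the standard `faulthandler` module, which prints the Python traceback when the interpreter receives a fatal signal. A minimal sketch that builds such a command line; the script path below is a placeholder and `faulthandler_command` is a hypothetical helper:

```python
import sys


def faulthandler_command(script_path, args, python=sys.executable):
    """Build a command that runs a Python entry script with faulthandler enabled."""
    return [python, "-X", "faulthandler", script_path, *args]


# Placeholder path: use the interpreter and wgdi script of the affected environment.
cmd = faulthandler_command("/path/to/bin/wgdi", ["-d", "grape.total.conf"])
print(" ".join(cmd))
```

The message "failed to get the current screen resources" printed just before the crash suggests the fault happens while a graphical backend starts, so the resulting traceback would be expected, though not guaranteed, to end inside matplotlib's backend code.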

We prepared three input files according to the example, but an error occurred after the last step "wgdi -d total.cof" was entered

We prepared three input files according to the example, but an error occurred after the last step "wgdi -d total.cof" was entered:
Traceback (most recent call last):
File "/home/bycl0009/miniconda3/envs/wgdi/bin/wgdi", line 10, in <module>
sys.exit(main())
^^^^^^
File "/home/bycl0009/miniconda3/envs/wgdi/lib/python3.11/site-packages/wgdi/run.py", line 163, in main
module_to_run(arg, value)
File "/home/bycl0009/miniconda3/envs/wgdi/lib/python3.11/site-packages/wgdi/run.py", line 122, in module_to_run
run_subprogram(program, conf, name)
File "/home/bycl0009/miniconda3/envs/wgdi/lib/python3.11/site-packages/wgdi/run.py", line 87, in run_subprogram
r.run()
File "/home/bycl0009/miniconda3/envs/wgdi/lib/python3.11/site-packages/wgdi/dotplot.py", line 98, in run
gff1 = base.gene_location(gff1, lens1, step1, self.position)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/bycl0009/miniconda3/envs/wgdi/lib/python3.11/site-packages/wgdi/base.py", line 214, in gene_location
gff.loc[group.index, 'loc'] = (dict_chr[name]+group[position])*step
~~~~~~~~^^^^^^
KeyError: ('LG1',)

How should I solve this problem? I would be grateful for a reply.
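
The missing key is the one-element tuple `('LG1',)`, not the string `'LG1'`. That is what pandas 2.x yields as the group name when iterating over `groupby(['chr'])`, while a dictionary keyed by chromosome name expects the bare string. A minimal sketch of the mismatch and one way to normalise the key; `normalize_group_key` and the offsets are hypothetical, and whether a newer wgdi release already handles this is not stated here:

```python
def normalize_group_key(name):
    """Unwrap the 1-tuple that newer pandas returns for groupby(['chr'])."""
    if isinstance(name, tuple) and len(name) == 1:
        return name[0]
    return name


dict_chr = {"LG1": 0, "LG2": 1200}  # hypothetical chromosome offsets

group_name = ("LG1",)               # what the loop receives under pandas 2.x
assert group_name not in dict_chr   # this lookup is the reported KeyError
print(dict_chr[normalize_group_key(group_name)])
```

Grouping by the plain column name (`groupby('chr')`) or installing a pandas version below 2.0 avoids the tuple altogether.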

example data

I wanted to recreate your results with the sample data starting with Vitis vinifera against Vitis vinifera, but I had this error:
[Errno 2] No such file or directory: '../../blast/vvi161s_vvi161s.blast'

I checked the directory and vvi161s_vvi161s.blast wasn't there. Can you provide the command you used to create vvi161s_vvi161s.blast? I looked for it and the documentation just says to use BLASTP, MMseqs2, or DIAMOND. I tried this command for rundiamond.py:
python ./rundiamond.py vvi161s.pep.fa vvi161s.pep.fa vvi161s vvi161s_vvi161s.blast

but I got this error:
sh: diamond: command not found
sh: diamond: command not found

Any help with this would be greatly appreciated
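
"sh: diamond: command not found" means the shell could not find a `diamond` executable on `PATH`. A minimal sketch that checks for external tools before running a helper script; the tool list is only an example and `missing_tools` is a hypothetical name:

```python
import shutil


def missing_tools(names):
    """Return the executables from `names` that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


# Example list; adjust to the aligner actually used.
print(missing_tools(["diamond"]))
```

An empty list means the tools are visible to the current shell; otherwise install the tool or add its directory to `PATH` first.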

The number of gene pairs in the KS result file is not equal to the number of gene pairs in collinearity file

Hi.@SunPengChuan

My collinear file has 40,528 gene pairs. Why does the result Ks file have only 23,960 gene pairs?

[ks]
cds_file = all.cds.fa
#cat all cds files together
pep_file = all.pep.fa
#cat all pep files together
align_software = mafft
pairs_file = A_A.collinear
ks_file = A_A.ks

A_A.collinear: 40,528 gene pairs
A_A.ks: 23,960 gene pairs
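
One way to investigate the gap, offered as a diagnostic and not as a confirmed explanation, is to count how many collinear pairs are duplicates of each other and how many involve a gene with no sequence in the cds file. The sketch assumes that (a, b) and (b, a) count as the same pair, which may or may not match what wgdi does; all names are hypothetical:

```python
def summarize_pairs(pairs, known_ids):
    """Count total pairs, unique unordered pairs with sequences, and pairs lacking a sequence."""
    unique = set()
    missing = set()
    for a, b in pairs:
        if a not in known_ids or b not in known_ids:
            missing.add((a, b))
            continue
        unique.add(tuple(sorted((a, b))))  # assumption: order within a pair is irrelevant
    return {
        "total": len(pairs),
        "unique_with_sequences": len(unique),
        "missing_sequence": len(missing),
    }


pairs = [("g1", "g2"), ("g2", "g1"), ("g1", "g3"), ("g4", "g9")]
print(summarize_pairs(pairs, known_ids={"g1", "g2", "g3", "g4"}))
```

Comparing `unique_with_sequences` with the number of rows in the Ks file shows whether duplicates and missing sequences account for the difference.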

Getting problem at karyotype mapping step

Hello Sir,
I am using the WGDI tool to study chromosomal rearrangements. I followed the tutorial, but I am stuck at the karyotype mapping step. I have five eudicot species and want to get the chromosomal rearrangements for multiple species. I have mapped my plant species against the AEK given in the example files. Please help me with this step.
So far I have performed the dotplot analysis, followed by collinearity detection with -icl, then WGDI with the "-bi" parameter and WGDI with the "-c" parameter.

For karyotype mapping I get this error:
File "/home/user/genome_hic/lib/python3.10/site-packages/pandas/core/groupby/grouper.py", line 888, in get_grouper
raise KeyError(gpr)

Please suggest.

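Several questions about constructing the ancestral karyotype

Hello Prof. Sun! I have several questions.

First, the first ancestral karyotype (the AEK) was obtained using only one species, T. sinense (tsi). Plants have undergone WGD, so for each set of chromosomes produced by WGD it is enough to pick one.
But for an animal without WGD, is the original lens file equivalent to the aek_tsi13s.txt you used (one chromosome chosen from each set of homologous chromosomes produced by WGD)?
Then I run -d -icl -bi -c -km and finally draw one more dot plot, and the basic workflow runs through;
On GitHub you mention "We used V. vinifera to validate this AEK result", i.e. a second species, V. vinifera (vvi), is used to validate the AEK result;
(dotplot/vvi161s_aek_tsi13s)
In total.conf under that path there is "ancestor_left = vvi161s.ancestor.txt". How was this file written?
I looked at your example, and it does not look like what is described in point 1, "keep the dotplot collinear blocks together as much as possible" "We separately extracted haplotypes with whole chromosomes as protochromosomes from different clusters."

If it followed point 1, this file should still have 7 lines, one protochromosome per line, but what you wrote is
1 1 69 #99CC00 1
1 70 209 red 1
1 210 1406 #99CC00 1
Chromosome 1 of vvi is split into three parts. Where does this split come from?

4. What does validating the AEK mean here? The AEK was generated using information from tsi alone.
dotplot/vvi161s_aek_tsi13s/toal.conf has no [ancestral_karyotype] step, which seems to assume that the ancestral karyotype built from tsi alone is reliable, and that the later steps only swap in other species to validate the AEK.

5. My question is:
I also ran the whole workflow with a single species D, but from the collinearity between sister species I can confirm that species D has its own lineage-specific chromosomal rearrangements.
The ancestral karyotype built by running the workflow with species D alone did not identify this specific rearrangement. I think that is reasonable, since only one species' information is used from start to finish.

However, I noticed that in your example the collinearity dot plot between tsi and the AEK already shows rearrangements. The earlier aek_tsi13s.txt does not contain this information, and there are no other species to compare against, so where do these rearrangements come from?
"At the same time, Chr1 of T. sinense can be formed by the insertion of AEK1 into AEK2 through the NCF model and then fused with another AEK1 again through the EEJ model"

Your study group already has a generally accepted ancestral chromosome number, which most animal groups lack. In that case (and given that animals have no WGD), when I have several extant species with different karyotypes, does that restrict the ancestral chromosome number wgdi can infer to that of one of the extant species?

Because the pipeline takes the data of one extant species directly as input, is it less applicable when the species used are distantly related? From this standpoint, wgdi seems applicable only to sister species (when the ancestral karyotype is unknown)?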

wgdi can not recognize the PATH of mafft

Dear developer,

I ran into the following issue when using wgdi to calculate Ks and need some help. Looking forward to your reply, thanks!

cds_file = tsin.cds
pep_file = tsin.prot
align_software = muscle
pairs_file = icl.txt
ks_file = ks.txt
Traceback (most recent call last):
File "/export/home/ydn/.conda/envs/cg/bin/wgdi", line 8, in <module>
sys.exit(main())
File "/export/home/ydn/.conda/envs/cg/lib/python3.9/site-packages/wgdi/run.py", line 158, in main
module_to_run(arg, value)
File "/export/home/ydn/.conda/envs/cg/lib/python3.9/site-packages/wgdi/run.py", line 118, in module_to_run
run_subprogram(program, conf, name)
File "/export/home/ydn/.conda/envs/cg/lib/python3.9/site-packages/wgdi/run.py", line 84, in run_subprogram
r.run()
File "/export/home/ydn/.conda/envs/cg/lib/python3.9/site-packages/wgdi/ks.py", line 101, in run
kaks = self.pair_kaks(k)
File "/export/home/ydn/.conda/envs/cg/lib/python3.9/site-packages/wgdi/ks.py", line 113, in pair_kaks
self.align()
File "/export/home/ydn/.conda/envs/cg/lib/python3.9/site-packages/wgdi/ks.py", line 134, in align
stdout, stderr = muscle_cline()
File "/export/home/ydn/.conda/envs/cg/lib/python3.9/site-packages/Bio/Application/__init__.py", line 574, in __call__
raise ApplicationError(return_code, str(self), stdout_str, stderr_str)
Bio.Application.ApplicationError: Non-zero return code 127 from 'C:\bio\muscle3.8.31_i86win32.exe -in pair.pep -out prot.aln -seqtype protein -clwstrict', message '/bin/sh: C:biomuscle3.8.31_i86win32.exe: command not found'

Traceback (most recent call last):
File "/export/home/ydn/.conda/envs/cg/bin/wgdi", line 8, in <module>
sys.exit(main())
File "/export/home/ydn/.conda/envs/cg/lib/python3.9/site-packages/wgdi/run.py", line 158, in main
module_to_run(arg, value)
File "/export/home/ydn/.conda/envs/cg/lib/python3.9/site-packages/wgdi/run.py", line 118, in module_to_run
run_subprogram(program, conf, name)
File "/export/home/ydn/.conda/envs/cg/lib/python3.9/site-packages/wgdi/run.py", line 84, in run_subprogram
r.run()
File "/export/home/ydn/.conda/envs/cg/lib/python3.9/site-packages/wgdi/ks.py", line 101, in run
kaks = self.pair_kaks(k)
File "/export/home/ydn/.conda/envs/cg/lib/python3.9/site-packages/wgdi/ks.py", line 113, in pair_kaks
self.align()
File "/export/home/ydn/.conda/envs/cg/lib/python3.9/site-packages/wgdi/ks.py", line 128, in align
stdout, stderr = mafft_cline()
File "/export/home/ydn/.conda/envs/cg/lib/python3.9/site-packages/Bio/Application/__init__.py", line 574, in __call__
raise ApplicationError(return_code, str(self), stdout_str, stderr_str)
Bio.Application.ApplicationError: Non-zero return code 127 from 'C:\bio\mafft-win\mafft.bat --auto pair.pep', message '/bin/sh: C:biomafft-winmafft.bat: command not found'
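
Both errors show a Windows-style path (`C:\bio\...`) being handed to `/bin/sh` on a Linux server. Return code 127 is the shell's "command not found", and the mangled name in the message is what POSIX quoting rules make of unquoted backslashes. The standard `shlex` module follows the same rules and reproduces it:

```python
import shlex

# The command string exactly as it appears in the error message.
configured = r"C:\bio\mafft-win\mafft.bat --auto pair.pep"

# POSIX rules drop each unquoted backslash and keep the character after it.
print(shlex.split(configured))
# → ['C:biomafft-winmafft.bat', '--auto', 'pair.pep']
```

So the problem is not `PATH` lookup: the configured aligner paths are Windows paths, presumably left over from a default setting (an assumption), and need to be replaced by the Linux executables, for example the locations reported by `which mafft` and `which muscle`.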

Hello friend, wgdi now fails with an error: ValueError: cannot reindex on an axis with duplicate labels

/public/home/zhaoli/software/anaconda3/envs/wgdi/lib/python3.11/site-packages/wgdi/base.py:213: FutureWarning: In a future version of pandas, a length 1 tuple will be returned when iterating over a groupby with a grouper equal to a list of length 1. Don't supply a list with a single grouper to avoid this warning.
for name, group in gff.groupby(['chr']):
/public/home/zhaoli/software/anaconda3/envs/wgdi/lib/python3.11/site-packages/wgdi/base.py:213: FutureWarning: In a future version of pandas, a length 1 tuple will be returned when iterating over a groupby with a grouper equal to a list of length 1. Don't supply a list with a single grouper to avoid this warning.
for name, group in gff.groupby(['chr']):
/public/home/zhaoli/software/anaconda3/envs/wgdi/lib/python3.11/site-packages/wgdi/base.py:214: FutureWarning: reindexing with a non-unique Index is deprecated and will raise in a future version.
gff.loc[group.index, 'loc'] = (dict_chr[name]+group[position])*step
Traceback (most recent call last):
File "/public/home/zhaoli/software/anaconda3/envs/wgdi/bin/wgdi", line 10, in <module>
sys.exit(main())
^^^^^^
File "/public/home/zhaoli/software/anaconda3/envs/wgdi/lib/python3.11/site-packages/wgdi/run.py", line 158, in main
module_to_run(arg, value)
File "/public/home/zhaoli/software/anaconda3/envs/wgdi/lib/python3.11/site-packages/wgdi/run.py", line 118, in module_to_run
run_subprogram(program, conf, name)
File "/public/home/zhaoli/software/anaconda3/envs/wgdi/lib/python3.11/site-packages/wgdi/run.py", line 84, in run_subprogram
r.run()
File "/public/home/zhaoli/software/anaconda3/envs/wgdi/lib/python3.11/site-packages/wgdi/dotplot.py", line 99, in run
gff2 = base.gene_location(gff2, lens2, step2, self.position)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/public/home/zhaoli/software/anaconda3/envs/wgdi/lib/python3.11/site-packages/wgdi/base.py", line 214, in gene_location
gff.loc[group.index, 'loc'] = (dict_chr[name]+group[position])*step
~~~~~~~^^^^^^^^^^^^^^^^^^^^
File "/public/home/zhaoli/software/anaconda3/envs/wgdi/lib/python3.11/site-packages/pandas/core/indexing.py", line 818, in __setitem__
iloc._setitem_with_indexer(indexer, value, self.name)
File "/public/home/zhaoli/software/anaconda3/envs/wgdi/lib/python3.11/site-packages/pandas/core/indexing.py", line 1795, in _setitem_with_indexer
self._setitem_with_indexer_split_path(indexer, value, name)
File "/public/home/zhaoli/software/anaconda3/envs/wgdi/lib/python3.11/site-packages/pandas/core/indexing.py", line 1816, in _setitem_with_indexer_split_path
value = self._align_series(indexer, Series(value))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/public/home/zhaoli/software/anaconda3/envs/wgdi/lib/python3.11/site-packages/pandas/core/indexing.py", line 2277, in _align_series
return ser.reindex(new_ix)._values
^^^^^^^^^^^^^^^^^^^
File "/public/home/zhaoli/software/anaconda3/envs/wgdi/lib/python3.11/site-packages/pandas/core/series.py", line 5094, in reindex
return super().reindex(**kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/public/home/zhaoli/software/anaconda3/envs/wgdi/lib/python3.11/site-packages/pandas/core/generic.py", line 5289, in reindex
return self._reindex_axes(
^^^^^^^^^^^^^^^^^^^
File "/public/home/zhaoli/software/anaconda3/envs/wgdi/lib/python3.11/site-packages/pandas/core/generic.py", line 5309, in _reindex_axes
obj = obj._reindex_with_indexers(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/public/home/zhaoli/software/anaconda3/envs/wgdi/lib/python3.11/site-packages/pandas/core/generic.py", line 5355, in _reindex_with_indexers
new_data = new_data.reindex_indexer(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/public/home/zhaoli/software/anaconda3/envs/wgdi/lib/python3.11/site-packages/pandas/core/internals/managers.py", line 729, in reindex_indexer
self.axes[axis]._validate_can_reindex(indexer)
File "/public/home/zhaoli/software/anaconda3/envs/wgdi/lib/python3.11/site-packages/pandas/core/indexes/base.py", line 4359, in _validate_can_reindex
raise ValueError("cannot reindex on an axis with duplicate labels")
ValueError: cannot reindex on an axis with duplicate labels
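
"cannot reindex on an axis with duplicate labels" is raised when the table being assigned to has repeated index labels. If the gff rows are indexed by gene ID, which is an assumption about wgdi's internals, a gene ID that occurs more than once in the gff file would trigger it. A minimal sketch that lists repeated IDs; the column position and `duplicated_ids` are hypothetical:

```python
from collections import Counter


def duplicated_ids(lines, column=1):
    """Return the IDs in the given whitespace-separated column that occur more than once."""
    ids = [line.split()[column] for line in lines if line.strip()]
    counts = Counter(ids)
    return sorted(gene for gene, n in counts.items() if n > 1)


# Hypothetical gff rows: chromosome, gene ID, start, end.
example = [
    "chr1 g1 100 200",
    "chr1 g2 300 400",
    "chr2 g1 500 600",
]
print(duplicated_ids(example))
```

If the real gff file reports duplicates, removing or renaming them before running `wgdi -d` is the first thing to try.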

separate Gaussian Fitting for two or more peaks

Dear teacher,
After obtaining the block correspondence output file c.csv with the -c module and then running the kspeaks module, I got kp.png, which shows two peaks.
I would like to fit the peaks of a Ks density distribution that contains two peaks. I therefore set the area in peaksfit.conf to 0,1.2 and to 1.2,2.5 in separate runs of the peaksfit module, and I did obtain two sets of Gaussian fitting parameters. However, these parameters do not seem to share a common frequency (Y axis) scale with kp.png.
How can I get multiple sets of Gaussian fitting parameters that share the same layout as kp.png?
By the way, this is an inter-species Ks distribution, and I do not really understand what the two peaks each mean. I think the peak at Ks≈0.1 represents the divergence of these two species; is that right? I have no idea about the first peak.
Many thanks!
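
To look at two separately fitted peaks on one axis, all components can be evaluated on the same Ks grid. The sketch below assumes a plain Gaussian of the form a * exp(-(x - mu)^2 / (2 * sigma^2)) per component, which may differ from the form peaksfit reports, and it does not rescale amplitudes between fitting windows; the parameter values are made up:

```python
import math


def gaussian_sum(x, components):
    """Sum of Gaussians, each given as an (amplitude, mean, sigma) tuple."""
    return sum(
        a * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))
        for a, mu, sigma in components
    )


# Made-up parameters for two peaks, evaluated on one shared grid from 0 to 2.5.
components = [(2.0, 0.1, 0.05), (0.8, 1.6, 0.30)]
grid = [i * 0.05 for i in range(51)]
curve = [gaussian_sum(x, components) for x in grid]
print(max(curve))
```

If the two fits were each normalised within their own `area` window, their amplitudes are not directly comparable, which would explain the mismatch with kp.png; that interpretation is an assumption.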

Karyotype Evolution

Dear @SunPengChuan , I need your help again

I was following your documentation at https://github.com/SunPengChuan/wgdi-example/blob/main/Karyotype_Evolution.md and trying to produce a paleogenomic analysis of my species, which is close to Populus-Salix.

Following the example, I got it

After using -bk my results are:

Finally, when I use -km to map onto the AEK, I get only seven chromosomes from the AEK, and this result:

I would also like to plot the karyotype-style figure. Can you help me?

How to install and run wgdi

I am new to Python.
Please guide me on how to install and run this software.
I tried, but it ended with the following error:

sajjad@sajjad-ThinkStation-P910:~/Downloads/wgdi-master$ pip install wgdi
DEPRECATION: Python 2.7 reached the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 is no longer maintained. pip 21.0 will drop support for Python 2.7 in January 2021. More details about Python 2 support in pip, can be found at https://pip.pypa.io/en/latest/development/release-process/#python-2-support
Defaulting to user installation because normal site-packages is not writeable
Collecting wgdi
Using cached WGDI-0.1.6.tar.gz (10 kB)
ERROR: Command errored out with exit status 1:
command: /usr/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-7IAocz/wgdi/setup.py'"'"'; file='"'"'/tmp/pip-install-7IAocz/wgdi/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-pip-egg-info-8CE1wk
cwd: /tmp/pip-install-7IAocz/wgdi/
Complete output (5 lines):
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-install-7IAocz/wgdi/setup.py", line 6, in <module>
with open("README.md", "r",encoding='utf-8') as fh:
TypeError: 'encoding' is an invalid keyword argument for this function
----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
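
The log shows pip running under Python 2.7, where the built-in `open()` has no `encoding` argument, so `setup.py` fails before anything is installed. A minimal sketch of a version check to run with the interpreter that will be used for the install; `supports_encoding_kwarg` is a hypothetical name, and the exact minimum Python 3 version wgdi needs is not stated here:

```python
import sys


def supports_encoding_kwarg(version_info=sys.version_info):
    """Built-in open() accepts encoding= only on Python 3."""
    return version_info[0] >= 3


if supports_encoding_kwarg():
    print("Python 3 detected: install with `python3 -m pip install wgdi`")
else:
    print("Python 2 detected: switch to a Python 3 interpreter first")
```

Calling pip as `python3 -m pip` ties the install to a Python 3 interpreter instead of the system `/usr/bin/python`.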

ksfigure becomes color-filled blocks instead of lines

Dear developers, thanks for your excellent software!
I tried the wgdi -kf command using the example ks_fit_result.csv, but I got a Ks figure with color-filled blocks instead of lines. Could you please tell me how to change them to lines? Many thanks!

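Question about mapping an extant species onto the ancestor and drawing the Ks plot

Hello Prof. Sun,
The tutorial at https://github.com/SunPengChuan/wgdi-example/blob/main/Karyotype_Evolution.md mentions mapping T. sinense onto the AEK and then running -icl -bi -c -bk in turn to obtain the Ks plot below; #45 also shows a similar figure.

However, -bi needs a Ks file as input, and the -ks analysis needs the species' cds file, while the assembled ancestor only has protein sequences. How should this step be reproduced?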
