yollct / spycone

Splicing-aware time-course network enricher - exploratory analysis for transcriptomics and/or proteomics time series data

License: GNU General Public License v3.0

Shell 0.14% Python 96.67% Cython 3.19%

transcriptomics alternative-splicing networks-biology systems-biology time-series isoform-switches

spycone's Introduction

🔭 I'm a PhD student in Bioinformatics

spycone's Issues

Installation error from biopython and not able to reproduce tutorial

Hi @yollct ,

I wanted to try spycone but ran into trouble during installation. I tried to install spycone in a virtual environment, following the instructions in the repository:

python -m venv .spycone
source .spycone/bin/activate
python -m pip install --upgrade pip
python -m pip install https://github.com/fraenkel-lab/pcst_fast/archive/refs/tags/1.0.7.tar.gz
python -m pip install spycone

I got the following warning and error message:

/home/louisphilippe/Documents/sno_splicing_analysis/.spycone/lib/python3.10/site-packages/tslearn/bases/bases.py:15: UserWarning: h5py not installed, hdf5 features will not be supported.
Install h5py to use hdf5 features: http://docs.h5py.org/
  warn(h5py_msg)

{
	"name": "ImportError",
	"message": "cannot import name 'GC' from 'Bio.SeqUtils' (/home/louisphilippe/Documents/sno_splicing_analysis/.spycone/lib/python3.10/site-packages/Bio/SeqUtils/__init__.py)",
	"stack": "---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
Cell In[1], line 1
----> 1 import spycone as spy

File ~/Documents/sno_splicing_analysis/.spycone/lib/python3.10/site-packages/spycone/__init__.py:11
      9 from .run_domino import run_domino, run_domain_domino
     10 from .DOMINO.src.core import domino
---> 11 from .splicingfactor import SF_coexpression, SF_motifsearch
     12 #from ._NEASE import nease

File ~/Documents/sno_splicing_analysis/.spycone/lib/python3.10/site-packages/spycone/splicingfactor.py:14
     12 from scipy.stats import pearsonr
     13 from scipy.stats import mannwhitneyu, fisher_exact, kruskal
---> 14 from Bio.SeqUtils import GC
     15 from joblib import Parallel, delayed
     16 import gc

ImportError: cannot import name 'GC' from 'Bio.SeqUtils' (/home/louisphilippe/Documents/sno_splicing_analysis/.spycone/lib/python3.10/site-packages/Bio/SeqUtils/__init__.py)"
}

After some digging, I found this GitHub issue, which also mentions biopython#4622. I downgraded Biopython from 1.83 to 1.80 with python -m pip install biopython==1.80 and the error message is gone, but I still get the warning about hdf5.

After that, I tried to reproduce the tutorial in your documentation and it didn't work. Both the gene-level and transcript-level workflows return the same error message. I strictly followed the documentation, but when I run the code for spy.dataset(...) it returns this error:

{
	"name": "TypeError",
	"message": "dataset.__init__() got an unexpected keyword argument 'keytype'",
	"stack": "---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[3], line 1
----> 1 flu_dset = spy.dataset(ts=flu_ts,
      2                         gene_id = gene_list,
      3                         symbs=gene_list,
      4                         species=9606,
      5                         keytype='entrezgeneid',
      6                         reps1 = 5,
      7                         timepts = 9)

TypeError: dataset.__init__() got an unexpected keyword argument 'keytype'"
}

I do not know what is wrong and any help would be appreciated! Spycone looks great, I would like to give it a try on my own data after that.
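A `TypeError` of this kind usually means the tutorial was written for a different version of the constructor than the one installed. One generic way to check which keyword arguments a callable accepts is `inspect.signature`. In the sketch below, `call_with_supported_kwargs` and `FakeDataset` are hypothetical names made up for illustration; `FakeDataset` is a stand-in so the example runs without spycone, and its parameter list is an assumption rather than the library's real signature.

```python
import inspect


def call_with_supported_kwargs(func, **kwargs):
    """Call func with only the keyword arguments its signature accepts.

    Returns the result together with the sorted names that were dropped.
    """
    params = inspect.signature(func).parameters
    # If func itself takes **kwargs, every keyword is accepted.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return func(**kwargs), []
    accepted = {k: v for k, v in kwargs.items() if k in params}
    dropped = sorted(set(kwargs) - set(accepted))
    return func(**accepted), dropped


class FakeDataset:
    """Hypothetical stand-in for spy.dataset; note there is no 'keytype'."""

    def __init__(self, ts, species, reps1, timepts, gene_id=None, symbs=None):
        self.ts, self.species = ts, species
        self.reps1, self.timepts = reps1, timepts


dset, dropped = call_with_supported_kwargs(
    FakeDataset,
    ts=[[1.0]],
    species=9606,
    keytype="entrezgeneid",  # the argument the installed version rejects
    reps1=1,
    timepts=1,
)
print("dropped:", dropped)
```

Silently dropping an argument can hide a real behavioral difference, so printing the dropped names and then removing them from the call by hand is the safer use of this check.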

How to deal with incomplete data set and not ENTREZ IDs (novel isoforms)

Hi There,
I am giving this tool a try instead of my normal R packages, and so far I have needed to modify some of the code in the documentation: https://spycone.readthedocs.io/en/latest/gene-level-workflow.html#Prepare-the-dataset
It would be great to have a more up-to-date version of it.

Now, my questions:

1. I normally do not use ENTREZ IDs (I use ENSEMBL IDs), and I am also using novel isoforms, which means that not all of them have regular gene names. How could this be handled in the pipelines (transcript and gene level)?
2. I have 5 time points and 5 replicates for each time point, but unfortunately one of the samples had to be removed from the data set due to quality issues. When I create the Spycone object, the function then complains. Is there a way to solve this?

Here is the error:
`Cell In[24], line 1
----> 1 tp5_dset = spy.dataset(ts=df_counts_sort,
      2                         gene_id = gene_list,
      3                         symbs=gene_list,
      4                         species=9606,
      5                         reps1 = 5,
      6                         timepts = 5)

File ~/miniconda3/envs/jypyTimeSeries/lib/python3.11/site-packages/spycone/DataSet.py:126, in dataset.__init__(self, ts, species, reps1, timepts, gtf, gene_id, transcript_id, timeserieslist, symbs, discretization_steps)
    123 self.ts[0] = np.array(self.ts[0], dtype="double")
    125 if self.timepts*self.reps1 != self.ts[0].shape[1]:
--> 126     raise ValueError("Number of columns is not the same as number of time points.")
    128 if self.species not in self.SPECIES:
    129     raise ValueError("Please provide a supported species ID.")

ValueError: Number of columns is not the same as number of time points.`

Thanks and all the best,
Nicolas
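The traceback shows the check that fails: the number of columns must equal `timepts * reps1`, so 5 time points with 5 replicates requires 25 columns, and a data set with one sample removed has only 24. One possible workaround, sketched below, is to locate the missing sample and pad it with the mean of the remaining replicates at that time point. This is an assumption about how one might proceed, not a method recommended by the Spycone authors: imputing a replicate is a statistical choice, the helper names are hypothetical, and the time-point-major column order is assumed rather than taken from the documentation.

```python
def missing_samples(present, timepts, reps):
    """Return the (timepoint, replicate) pairs absent from `present`."""
    expected = [(t, r) for t in range(timepts) for r in range(reps)]
    seen = set(present)
    return [pair for pair in expected if pair not in seen]


def pad_with_replicate_mean(row, present, timepts, reps):
    """Fill each missing sample with the mean of the other replicates.

    `row` holds one gene's values in the same order as `present`.
    The result is ordered time point by time point, replicate by replicate.
    Assumes every time point keeps at least one observed replicate.
    """
    values = dict(zip(present, row))
    padded = []
    for t in range(timepts):
        observed = [values[(t, r)] for r in range(reps) if (t, r) in values]
        fill = sum(observed) / len(observed)
        padded.extend(values.get((t, r), fill) for r in range(reps))
    return padded


timepts, reps = 5, 5
# Toy layout: every sample is present except replicate 3 of time point 2.
present = [(t, r) for t in range(timepts) for r in range(reps) if (t, r) != (2, 3)]
row = [float(t) for t, _ in present]  # toy values: each sample equals its time index

print(len(row), "columns, expected", timepts * reps)
print("missing:", missing_samples(present, timepts, reps))
padded = pad_with_replicate_mean(row, present, timepts, reps)
print(len(padded), "columns after padding")
```

Applied to every row of the count table, this yields the 25 columns the constructor expects; whether the padded replicate is acceptable for downstream clustering is a separate question.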

Test data error

Hi, @yollct
`subprocess.call("wget https://zenodo.org/record/7228475/files/tutorial_alt_sorted_bc_tpm.csv?download=1 -O alt_sorted_bc_tpm.csv", shell=True)
subprocess.call("wget https://zenodo.org/record/7228475/files/tutorial_alt_genelist.csv?download=1 -O alt_genelist.csv", shell=True)

data = pd.read_csv("alt_sorted_bc_tpm.csv", sep="\t")
genelist = pd.read_csv("alt_genelist.csv", sep="\t")

geneid= list(map(lambda x: str(int(x)) if not np.isnan(x) else x, genelist['gene'].tolist()))
transcriptid = genelist['isoforms'].to_list()

dset = spy.dataset(ts=data,
transcript_id=transcriptid,
gene_id = geneid,
species=9606,
timepts=4, reps1=3)`

The error message is:

Traceback (most recent call last):
File "", line 1, in
File "/public/home/zwliu/miniconda3/lib/python3.10/site-packages/spycone/DataSet.py", line 197, in __init__
self._get_gene_level()
File "/public/home/zwliu/miniconda3/lib/python3.10/site-packages/spycone/DataSet.py", line 226, in _get_gene_level
self.genelevel_symb.append(self.symbs[v[0]])
IndexError: too many indices for array: array is 0-dimensional, but 1 were indexed

What causes this, and how can I solve it? If the organism is not a model species, can the species parameter be omitted?
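The `IndexError` text is what NumPy raises when a 0-dimensional array is indexed. One way such an array can arise is by wrapping an unset value such as `None` in `np.array`. Since the call above passes no `symbs` argument while the failing line reads `self.symbs[v[0]]`, an unset `symbs` is a plausible cause, but that is an assumption and not something confirmed against the Spycone source. The sketch below only reproduces the NumPy behavior; the gene IDs are toy values.

```python
import numpy as np

# Wrapping an unset argument gives a 0-dimensional object array.
symbs = np.array(None)
print(symbs.ndim)

try:
    symbs[0]
except IndexError as err:
    message = str(err)
    print(message)

# A real list of identifiers gives a 1-dimensional array that can be indexed.
gene_ids = ["1017", "1019"]  # toy values for illustration
symbs_ok = np.array(gene_ids)
print(symbs_ok[0])
```

If this is indeed the cause, passing `symbs` explicitly, as the other reports on this page do with `symbs=gene_list`, may avoid the error.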

Clustering of total isoform usage missing one time point

Hi there,

Using the Spycone tutorial and my own data, I noticed that one time point is missing when plotting the clusters for the total isoform usage. Is this expected?
For instance, in the tutorial cluster figure at https://spycone.readthedocs.io/en/latest/alternative.html the test set has 4 time points in total, but the cluster plots show only 3. Any idea how to solve this?

Also, the parameters to change axis labels and x-axis tick labels are not working, so it is completely impossible to see which timepoint is missing.

Thanks and all the best,
Nicolas

species object

Hi, @yollct @kadam0
dset = spy.dataset(ts=data, transcript_id=transcriptid, gene_id = geneid, species=9606, keytype='entrezgeneid', timepts=4, reps1=3)
Can I use a non-model species here, such as cotton? I ask because the code has several parameters, and a species needs to be provided.

AttributeError: module 'spycone' has no attribute 'dataset'

I am attempting to execute the "Transcript-level Workflow" as described in the Spycone documentation on a Linux system using Python 3. However, I encounter the following error:

line 21, in <module>
dset = spy.dataset(ts=data,
AttributeError: module 'spycone' has no attribute 'dataset'

I have followed the steps outlined in the documentation and used the provided sample data. Here is the code I used:


import sys
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
sys.path.insert(0, "../../")
import spycone as spy
import subprocess
from gtfparse import read_gtf

#sample data
subprocess.call("wget https://zenodo.org/record/7228475/files/tutorial_alt_sorted_bc_tpm.csv?download=1 -O alt_sorted_bc_tpm.csv", shell=True)
subprocess.call("wget https://zenodo.org/record/7228475/files/tutorial_alt_genelist.csv?download=1 -O alt_genelist.csv", shell=True)

data = pd.read_csv("alt_sorted_bc_tpm.csv", sep="\t")
genelist = pd.read_csv("alt_genelist.csv", sep="\t")

geneid= list(map(lambda x: str(int(x)) if not np.isnan(x) else x,  genelist['gene'].tolist()))
transcriptid = genelist['isoforms'].to_list()

dset = spy.dataset(ts=data,
        transcript_id=transcriptid,
        gene_id = geneid,
        species=9606,
        # keytype='entrezgeneid',
        timepts=4, reps1=3)
        
bionet = spy.BioNetwork(path="human", data=(('weight',float),))

spy.preprocess(dset, bionet, cutoff=1)

iso = spy.iso_function(dset)
#run isoform switch
ascov=iso.detect_isoform_switch(filtering=False, min_diff=0.05, corr_cutoff=0.5, event_im_cutoff=0.1, p_val_cutoff=0.05)

ascov.head()

#%matplotlib inline
spy.switch_plot("CDK4", dset, ascov)

#%matplotlib inline
spy.switch_plot("BRCC3", dset, ascov, all_isoforms=True)

I have commented out the %matplotlib inline lines because they were causing a SyntaxError. Despite this, I still encounter the AttributeError mentioned earlier. I am uncertain about the cause of this issue and would appreciate your guidance on how to resolve it and successfully run the "Transcript-level Workflow".

Thank you for your assistance.
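One possible cause worth ruling out, stated here as an assumption and not a diagnosis, is that `sys.path.insert(0, "../../")` makes Python import a local directory named `spycone` instead of the installed package. The sketch below uses a hypothetical helper, `describe_module`, to report where a module was loaded from and whether it exposes a given attribute. It is demonstrated on the standard-library `json` module so that it runs anywhere.

```python
import importlib


def describe_module(name, attribute):
    """Report where `name` was imported from and whether it has `attribute`."""
    module = importlib.import_module(name)
    return {
        "file": getattr(module, "__file__", None),
        "path": list(getattr(module, "__path__", [])),
        "has_attribute": hasattr(module, attribute),
    }


info = describe_module("json", "dumps")
print(info)
```

Calling `describe_module("spycone", "dataset")` in the failing environment would show which copy is being imported. A `file` value of `None`, or a `path` outside `site-packages`, would suggest the import resolved to a local folder, in which case removing the `sys.path.insert` line is worth trying.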
