
yollct / spycone

Splicing-aware time-course network enricher - exploratory analysis for transcriptomics and/or proteomics time series data

License: GNU General Public License v3.0

Shell 0.14% Python 96.67% Cython 3.19%
transcriptomics alternative-splicing networks-biology systems-biology time-series isoform-switches

spycone's Introduction

🔭 I'm a PhD student in Bioinformatics


spycone's Issues

Installation error from biopython and not able to reproduce tutorial

Hi @yollct,

I wanted to try spycone, but ran into trouble during installation. I tried to install spycone in a virtual environment, following the instructions in the repository:

python -m venv .spycone
source .spycone/bin/activate
python -m pip install --upgrade pip
python -m pip install https://github.com/fraenkel-lab/pcst_fast/archive/refs/tags/1.0.7.tar.gz
python -m pip install spycone

I got the following warning and error message:

/home/louisphilippe/Documents/sno_splicing_analysis/.spycone/lib/python3.10/site-packages/tslearn/bases/bases.py:15: UserWarning: h5py not installed, hdf5 features will not be supported.
Install h5py to use hdf5 features: http://docs.h5py.org/
  warn(h5py_msg)
{
	"name": "ImportError",
	"message": "cannot import name 'GC' from 'Bio.SeqUtils' (/home/louisphilippe/Documents/sno_splicing_analysis/.spycone/lib/python3.10/site-packages/Bio/SeqUtils/__init__.py)",
	"stack": "---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
Cell In[1], line 1
----> 1 import spycone as spy

File ~/Documents/sno_splicing_analysis/.spycone/lib/python3.10/site-packages/spycone/__init__.py:11
      9 from .run_domino import run_domino, run_domain_domino
     10 from .DOMINO.src.core import domino
---> 11 from .splicingfactor import SF_coexpression, SF_motifsearch
     12 #from ._NEASE import nease

File ~/Documents/sno_splicing_analysis/.spycone/lib/python3.10/site-packages/spycone/splicingfactor.py:14
     12 from scipy.stats import pearsonr
     13 from scipy.stats import mannwhitneyu, fisher_exact, kruskal
---> 14 from Bio.SeqUtils import GC
     15 from joblib import Parallel, delayed
     16 import gc

ImportError: cannot import name 'GC' from 'Bio.SeqUtils' (/home/louisphilippe/Documents/sno_splicing_analysis/.spycone/lib/python3.10/site-packages/Bio/SeqUtils/__init__.py)"
}

After some digging, I found this GitHub issue, which also mentions biopython#4622. I downgraded Biopython from 1.83 to 1.80 with python -m pip install biopython==1.80 and the error message is gone, but I still get the warning about hdf5.
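
A version-tolerant import is one way to avoid pinning Biopython in code you control. The sketch below assumes that newer Biopython releases expose `gc_fraction` in `Bio.SeqUtils` (returning a fraction rather than a percentage) in place of `GC`. `load_gc` and the pure-Python fallback are hypothetical helpers, not part of spycone or Biopython.

```python
def load_gc():
    """Return a function mapping a sequence to its GC content in percent."""
    try:
        # Older Biopython releases (for example 1.80) still ship GC.
        from Bio.SeqUtils import GC
        return GC
    except ImportError:
        pass
    try:
        # Newer releases provide gc_fraction, which returns a value in [0, 1].
        from Bio.SeqUtils import gc_fraction
        return lambda seq: 100.0 * gc_fraction(seq)
    except ImportError:
        # Hypothetical fallback for environments without Biopython:
        # counts only unambiguous G and C bases.
        def gc_percent(seq):
            seq = str(seq).upper()
            if not seq:
                return 0.0
            return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)
        return gc_percent


GC = load_gc()
print(GC("ATGC"))
```

This does not change spycone itself: the traceback shows that `splicingfactor.py` imports `GC` directly, so the downgrade remains the practical workaround until the package is patched.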

After that, I tried to reproduce the tutorial in your documentation and it didn't work. Both the gene-level and transcript-level workflows return the same error message. I strictly followed the documentation, but when I run the code for spy.dataset(...) it returns this error:

{
	"name": "TypeError",
	"message": "dataset.__init__() got an unexpected keyword argument 'keytype'",
	"stack": "---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[3], line 1
----> 1 flu_dset = spy.dataset(ts=flu_ts,
      2                         gene_id = gene_list,
      3                         symbs=gene_list,
      4                         species=9606,
      5                         keytype='entrezgeneid',
      6                         reps1 = 5,
      7                         timepts = 9)

TypeError: dataset.__init__() got an unexpected keyword argument 'keytype'"
}

I do not know what is wrong and any help would be appreciated! Spycone looks great, I would like to give it a try on my own data after that.

LP
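
The `TypeError` suggests that the installed version's constructor does not accept `keytype`. One way to see which tutorial arguments an installed callable rejects is to compare them against its signature. In the sketch below, `dataset` is a hypothetical stand-in whose parameter names are copied from the `DataSet.py` traceback reported in another issue on this page; the default values are assumptions, and `unexpected_kwargs` is an illustrative helper.

```python
import inspect


class dataset:
    """Hypothetical stand-in; parameter names taken from the reported traceback."""

    def __init__(self, ts, species, reps1, timepts, gtf=None, gene_id=None,
                 transcript_id=None, timeserieslist=None, symbs=None,
                 discretization_steps=None):
        self.ts = ts


def unexpected_kwargs(func, kwargs):
    """List the keyword names in `kwargs` that `func` would reject."""
    params = inspect.signature(func).parameters
    # A **kwargs parameter accepts any keyword, so nothing is rejected.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return []
    return sorted(name for name in kwargs if name not in params)


tutorial_kwargs = {
    "ts": None, "gene_id": [], "symbs": [], "species": 9606,
    "keytype": "entrezgeneid", "reps1": 5, "timepts": 9,
}
print(unexpected_kwargs(dataset, tutorial_kwargs))
```

With the real package installed, the same helper could be pointed at `spy.dataset` instead of the stand-in, and any reported names dropped from the call.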

How to deal with incomplete data set and not ENTREZ IDs (novel isoforms)

Hi There,
I am giving this tool a try instead of my normal R packages, and so far I have needed to modify some of the code in the documentation https://spycone.readthedocs.io/en/latest/gene-level-workflow.html#Prepare-the-dataset. It would be great to have a more up-to-date version of this.

Now, my questions:

  1. I normally do not use ENTREZ IDs (I use ENSEMBL IDs), and I am also using novel isoforms, so not all of them have regular gene names. How could this be handled in the pipelines (transcript and gene level)?
  2. I have 5 time points and 5 replicates for each time point, but unfortunately, one of the samples needed to be removed from the data set due to quality issues. Then, when creating the Spycone object, the function complains. Is there a way to solve this?

Here is the error:
`Cell In[24], line 1
----> 1 tp5_dset = spy.dataset(ts=df_counts_sort,
2 gene_id = gene_list,
3 symbs=gene_list,
4 species=9606,
5 reps1 = 5,
6 timepts = 5)

File ~/miniconda3/envs/jypyTimeSeries/lib/python3.11/site-packages/spycone/DataSet.py:126, in dataset.__init__(self, ts, species, reps1, timepts, gtf, gene_id, transcript_id, timeserieslist, symbs, discretization_steps)
123 self.ts[0] = np.array(self.ts[0], dtype="double")
125 if self.timepts*self.reps1 != self.ts[0].shape[1]:
--> 126 raise ValueError("Number of columns is not the same as number of time points.")
128 if self.species not in self.SPECIES:
129 raise ValueError("Please provide a supported species ID.")

ValueError: Number of columns is not the same as number of time points.`

Thanks and all the best,
Nicolas
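
The check in the traceback requires the number of columns to equal `timepts * reps1`, so 24 columns cannot pass with `reps1 = 5` and `timepts = 5`. One possible workaround, sketched below, is to fill the dropped replicate with the mean of the remaining replicates at that time point before building the object. This is an assumption, not a documented spycone feature: the helper names are hypothetical, the columns are assumed to be ordered time point by time point, and imputing a replicate reduces the apparent variance at that time point.

```python
from statistics import mean


def check_layout(n_columns, timepts, reps1):
    """Mirror of the check in the traceback: columns must equal timepts * reps1."""
    return n_columns == timepts * reps1


def impute_missing_replicate(row, reps1, missing_tp):
    """Fill one missing replicate with the mean of its time point.

    `row` holds one gene's values ordered time point by time point, with the
    last replicate of time point `missing_tp` (0-based) absent.
    """
    start = missing_tp * reps1
    end = start + reps1 - 1          # only reps1 - 1 replicates are present
    present = row[start:end]
    return row[:end] + [mean(present)] + row[end:]


# Toy row: 5 time points x 5 replicates with one value missing (24 values).
row = [float(i) for i in range(24)]
print(check_layout(len(row), timepts=5, reps1=5))
filled = impute_missing_replicate(row, reps1=5, missing_tp=2)
print(check_layout(len(filled), timepts=5, reps1=5))
```

Applied to every row of the count table, this yields a matrix with the expected 25 columns. Whether an imputed replicate is acceptable depends on the downstream analysis.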

Test data error

Hi, @yollct
`subprocess.call("wget https://zenodo.org/record/7228475/files/tutorial_alt_sorted_bc_tpm.csv?download=1 -O alt_sorted_bc_tpm.csv", shell=True)
subprocess.call("wget https://zenodo.org/record/7228475/files/tutorial_alt_genelist.csv?download=1 -O alt_genelist.csv", shell=True)

data = pd.read_csv("alt_sorted_bc_tpm.csv", sep="\t")
genelist = pd.read_csv("alt_genelist.csv", sep="\t")

geneid= list(map(lambda x: str(int(x)) if not np.isnan(x) else x, genelist['gene'].tolist()))
transcriptid = genelist['isoforms'].to_list()

dset = spy.dataset(ts=data,
transcript_id=transcriptid,
gene_id = geneid,
species=9606,
timepts=4, reps1=3)`

The error message is:
Traceback (most recent call last):
File "", line 1, in
File "/public/home/zwliu/miniconda3/lib/python3.10/site-packages/spycone/DataSet.py", line 197, in __init__
self._get_gene_level()
File "/public/home/zwliu/miniconda3/lib/python3.10/site-packages/spycone/DataSet.py", line 226, in _get_gene_level
self.genelevel_symb.append(self.symbs[v[0]])
IndexError: too many indices for array: array is 0-dimensional, but 1 were indexed

What causes this, and how can I solve it? For a non-model species, can the species parameter be omitted?
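
The call above does not pass `symbs`, and the traceback fails at `self.symbs[v[0]]`, so one possible reading is that the constructor is indexing an empty placeholder. The following is a minimal sketch, assuming that supplying a `symbs` list with one entry per row avoids the error (not verified against spycone). `fallback_symbols` is a hypothetical helper that reuses the gene IDs as symbols and substitutes a placeholder where the ID is missing.

```python
import math


def fallback_symbols(gene_ids, prefix="unknown_"):
    """Build one symbol per gene ID, with a placeholder for missing IDs."""
    symbols = []
    for index, gene_id in enumerate(gene_ids):
        is_missing = gene_id is None or (
            isinstance(gene_id, float) and math.isnan(gene_id)
        )
        # Missing IDs get a unique placeholder so the list stays row-aligned.
        symbols.append(f"{prefix}{index}" if is_missing else str(gene_id))
    return symbols


# The geneid list built in the post keeps NaN for rows without an ID.
geneid = ["1019", float("nan"), "79184"]
print(fallback_symbols(geneid))
```

The result could then be passed as `symbs=fallback_symbols(geneid)` in the `spy.dataset(...)` call; `symbs` appears in the constructor signature shown in the `DataSet.py` traceback on this page.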

Clustering of total isoform usage missing one time point

Hi there,

Using the Spycone tutorial and my own data, I noticed that there is one missing time point when plotting the clusters for the total isoform usage. Is this expected?
For instance, in the tutorial cluster figure at https://spycone.readthedocs.io/en/latest/alternative.html, the test set has 4 time points in total, but the cluster plots show only 3. Any idea how to solve this?

Also, the parameters to change axis labels and x-axis tick labels are not working, so it is completely impossible to see which timepoint is missing.

Thanks and all the best,
Nicolas

species object

Hi, @yollct @kadam0
dset = spy.dataset(ts=data, transcript_id=transcriptid, gene_id = geneid, species=9606, keytype='entrezgeneid', timepts=4, reps1=3)
Can I use a non-model species here, such as cotton? I ask because the code appears to require a species parameter.

AttributeError: module 'spycone' has no attribute 'dataset'

I am attempting to execute the "Transcript-level Workflow" as described in the Spycone documentation on a Linux system using Python 3. However, I encounter the following error:

line 21, in <module>
dset = spy.dataset(ts=data,
AttributeError: module 'spycone' has no attribute 'dataset'

I have followed the steps outlined in the documentation and used the provided sample data. Here is the code I used:


import sys
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
sys.path.insert(0, "../../")
import spycone as spy
import subprocess
from gtfparse import read_gtf

#sample data
subprocess.call("wget https://zenodo.org/record/7228475/files/tutorial_alt_sorted_bc_tpm.csv?download=1 -O alt_sorted_bc_tpm.csv", shell=True)
subprocess.call("wget https://zenodo.org/record/7228475/files/tutorial_alt_genelist.csv?download=1 -O alt_genelist.csv", shell=True)

data = pd.read_csv("alt_sorted_bc_tpm.csv", sep="\t")
genelist = pd.read_csv("alt_genelist.csv", sep="\t")

geneid= list(map(lambda x: str(int(x)) if not np.isnan(x) else x,  genelist['gene'].tolist()))
transcriptid = genelist['isoforms'].to_list()

dset = spy.dataset(ts=data,
        transcript_id=transcriptid,
        gene_id = geneid,
        species=9606,
        # keytype='entrezgeneid',
        timepts=4, reps1=3)
        
bionet = spy.BioNetwork(path="human", data=(('weight',float),))

spy.preprocess(dset, bionet, cutoff=1)

iso = spy.iso_function(dset)
#run isoform switch
ascov=iso.detect_isoform_switch(filtering=False, min_diff=0.05, corr_cutoff=0.5, event_im_cutoff=0.1, p_val_cutoff=0.05)

ascov.head()

#%matplotlib inline
spy.switch_plot("CDK4", dset, ascov)

#%matplotlib inline
spy.switch_plot("BRCC3", dset, ascov, all_isoforms=True)

I have commented out the %matplotlib inline lines as they were causing a SyntaxError. Despite this, I am still encountering the "AttributeError" mentioned earlier. I am uncertain about the cause of this issue and would appreciate your guidance on how to resolve it and successfully run the "Transcript-level Workflow."

Thank you for your assistance.
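
One common cause of this kind of `AttributeError` is that `import spycone` resolves to something other than the installed package, for example a local directory or file named `spycone` picked up through `sys.path.insert(0, "../../")`. Whether that is what happened here is an assumption. The sketch below uses the standard library to report where a module name resolves from; `locate_module` is a hypothetical helper.

```python
import importlib.util


def locate_module(name):
    """Return (origin, search_locations) for a top-level module, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    # For a regular package, origin is the path to its __init__.py.
    # An origin of None typically indicates a namespace package, i.e. a bare
    # directory without __init__.py, which exposes no attributes of its own.
    return spec.origin, list(spec.submodule_search_locations or [])


print(locate_module("spycone"))
```

If the reported path points outside `site-packages`, removing the `sys.path.insert` line or renaming the local directory would be the first things to try.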
