yoseflab / cassiopeia
A Package for Cas9-Enabled Single Cell Lineage Tracing Tree Reconstruction
Home Page: https://cassiopeia-lineage.readthedocs.io/en/latest/
License: MIT License
Hello, I am following the preprocessing notebook in the refactor branch with my own GESTALT data. I believe it has been working well (or at least has not errored out; I haven't had a chance to dig into the various outputs to see if things make sense with my experiment), up until the last step in the notebook. When I ran allele_table = cassiopeia.pp.call_lineage_groups(umi_table, output_dir), I got the following error:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~/.local/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2897 try:
-> 2898 return self._engine.get_loc(casted_key)
2899 except KeyError as err:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'Sample'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
~/.local/lib/python3.6/site-packages/pandas/core/generic.py in _set_item(self, key, value)
3575 try:
-> 3576 loc = self._info_axis.get_loc(key)
3577 except KeyError:
~/.local/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2899 except KeyError as err:
-> 2900 raise KeyError(key) from err
2901
KeyError: 'Sample'
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
<ipython-input-10-2b5fa699a3d8> in <module>
----> 1 allele_table = cassiopeia.pp.call_lineage_groups(umi_table, output_dir)
~/Desktop/code/Cassiopeia/cassiopeia/preprocess/pipeline.py in call_lineage_groups(input_df, output_directory, min_umi_per_cell, min_avg_reads_per_umi, min_cluster_prop, min_intbc_thresh, inter_doublet_threshold, kinship_thresh, verbose, plot)
997 )
998
--> 999 allele_table = l_utils.filtered_lineage_group_to_allele_table(filtered_lgs)
1000
1001 if verbose:
~/Desktop/code/Cassiopeia/cassiopeia/preprocess/lineage_utils.py in filtered_lineage_group_to_allele_table(filtered_lgs)
408
409 final_df["Sample"] = final_df.apply(
--> 410 lambda x: x.cellBC.split(".")[0], axis=1
411 )
412
~/.local/lib/python3.6/site-packages/pandas/core/frame.py in __setitem__(self, key, value)
3042 else:
3043 # set column
-> 3044 self._set_item(key, value)
3045
3046 def _setitem_slice(self, key: slice, value):
~/.local/lib/python3.6/site-packages/pandas/core/frame.py in _set_item(self, key, value)
3119 self._ensure_valid_index(value)
3120 value = self._sanitize_column(key, value)
-> 3121 NDFrame._set_item(self, key, value)
3122
3123 # check if we are modifying a copy
~/.local/lib/python3.6/site-packages/pandas/core/generic.py in _set_item(self, key, value)
3577 except KeyError:
3578 # This item wasn't present, just insert at end
-> 3579 self._mgr.insert(len(self._info_axis), key, value)
3580 return
3581
~/.local/lib/python3.6/site-packages/pandas/core/internals/managers.py in insert(self, loc, item, value, allow_duplicates)
1196 value = _safe_reshape(value, (1,) + value.shape)
1197
-> 1198 block = make_block(values=value, ndim=self.ndim, placement=slice(loc, loc + 1))
1199
1200 for blkno, count in _fast_count_smallints(self.blknos[loc:]):
~/.local/lib/python3.6/site-packages/pandas/core/internals/blocks.py in make_block(values, placement, klass, ndim, dtype)
2742 values = DatetimeArray._simple_new(values, dtype=dtype)
2743
-> 2744 return klass(values, ndim=ndim, placement=placement)
2745
2746
~/.local/lib/python3.6/site-packages/pandas/core/internals/blocks.py in __init__(self, values, placement, ndim)
2398 values = np.array(values, dtype=object)
2399
-> 2400 super().__init__(values, ndim=ndim, placement=placement)
2401
2402 @property
~/.local/lib/python3.6/site-packages/pandas/core/internals/blocks.py in __init__(self, values, placement, ndim)
129 if self._validate_ndim and self.ndim and len(self.mgr_locs) != len(self.values):
130 raise ValueError(
--> 131 f"Wrong number of items passed {len(self.values)}, "
132 f"placement implies {len(self.mgr_locs)}"
133 )
ValueError: Wrong number of items passed 9, placement implies 1
Any insights as to what might have happened? I haven't changed any of the options or arguments for any of the steps in the preprocessing notebook. The only difference is that I started with a different bam file. Thanks for any help!
In the Makefile, the install rule doesn't refer to the pip variable.
Hello I'm trying to run reconstruct_lineages.ipynb but with the test data data/test_at.txt as the allele table. It seemed to output the allele table fine but when I ran reconstruct lineage I got an error. I am wondering what is the exact input format for the reconstruct-lineage section? Right now the test_lg4_character_matrix.txt I have is below (just first 3 lines). Below that is the error I encountered. Thank you.
cellBC r0 r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r13 r14 r15 r16 r17 r18 r19 r20 r21 r22 r23 r24 r25 r26 r27 r28 r29 r30 r31 r32 r33 r34 r35 r36 r37 r38 r39 r40 r41 r42 r43 r44 r45 r46 r47 r48 r49 r50
IVLT-2B_00.AAACCTGGTCTGGTCG-1 0 0 0 2 2 0 2 0 0 0 2 2 0 2 0 0 0 0 2 2 0 0 2 2 0 2 0 2 0 0 0 2 0 0 0 0 2 0 0 - - - - - - - - - - - -
IVLT-2B_00.AAACCTGTCCACTCCA-1 0 0 0 2 0 0 2 0 2 - - - 0 2 0 2 0 0 2 2 0 0 2 2 2 0 0 2 0 2 - - - 2 2 0 0 2 2 0 0 0 - - - - - - - - -
The command and error message
reconstruct-lineage test_lg4_character_matrix.txt test_lg4_tree.pkl --hybrid
running algorithm...
Using 1 threads, 48 available.
Sending off Target Sets: 1
Started new thread for: 0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0 (num targets = 10) , pid = ea70d39f9eb88a0a5b0739128f47b4d5
(2, 51)
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/isshamie/software/Cassiopeia/cassiopeia/TreeSolver/lineage_solver/lineage_solver.py", line 421, in wrapped
return func(*args, **kwargs)
File "/home/isshamie/software/Cassiopeia/cassiopeia/TreeSolver/lineage_solver/lineage_solver.py", line 445, in prune_unique_alleles
counts,
File "/home/isshamie/software/Cassiopeia/cassiopeia/TreeSolver/lineage_solver/lineage_solver.py", line 443, in <lambda>
if len(np.where(x[1] == 1)) > 0
IndexError: index 2 is out of bounds for axis 0 with size 2
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/isshamie/software/Cassiopeia/cassiopeia/TreeSolver/lineage_solver/lineage_solver.py", line 421, in wrapped
return func(*args, **kwargs)
File "/home/isshamie/software/Cassiopeia/cassiopeia/TreeSolver/lineage_solver/lineage_solver.py", line 572, in find_good_gurobi_subgraph
proot, targets_pruned, pruned_to_orig = prune_unique_alleles(root, targets)
File "/home/isshamie/software/Cassiopeia/cassiopeia/TreeSolver/lineage_solver/lineage_solver.py", line 423, in wrapped
traceback_str = traceback.format_exc(e)
File "/data/isshamie/software/anaconda3/envs/cass/lib/python3.6/traceback.py", line 167, in format_exc
return "".join(format_exception(*sys.exc_info(), limit=limit, chain=chain))
File "/data/isshamie/software/anaconda3/envs/cass/lib/python3.6/traceback.py", line 121, in format_exception
type(value), value, tb, limit=limit).format(chain=chain))
File "/data/isshamie/software/anaconda3/envs/cass/lib/python3.6/traceback.py", line 509, in __init__
capture_locals=capture_locals)
File "/data/isshamie/software/anaconda3/envs/cass/lib/python3.6/traceback.py", line 338, in extract
if limit >= 0:
TypeError: '>=' not supported between instances of 'IndexError' and 'int'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/data/isshamie/software/anaconda3/envs/cass/lib/python3.6/concurrent/futures/process.py", line 175, in _process_worker
r = call_item.fn(*call_item.args, **call_item.kwargs)
File "/home/isshamie/software/Cassiopeia/cassiopeia/TreeSolver/lineage_solver/lineage_solver.py", line 423, in wrapped
traceback_str = traceback.format_exc(e)
File "/data/isshamie/software/anaconda3/envs/cass/lib/python3.6/traceback.py", line 167, in format_exc
return "".join(format_exception(*sys.exc_info(), limit=limit, chain=chain))
File "/data/isshamie/software/anaconda3/envs/cass/lib/python3.6/traceback.py", line 121, in format_exception
type(value), value, tb, limit=limit).format(chain=chain))
File "/data/isshamie/software/anaconda3/envs/cass/lib/python3.6/traceback.py", line 498, in __init__
_seen=_seen)
File "/data/isshamie/software/anaconda3/envs/cass/lib/python3.6/traceback.py", line 509, in __init__
capture_locals=capture_locals)
File "/data/isshamie/software/anaconda3/envs/cass/lib/python3.6/traceback.py", line 338, in extract
if limit >= 0:
TypeError: '>=' not supported between instances of 'TypeError' and 'int'
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/data/isshamie/software/anaconda3/envs/cass/bin/reconstruct-lineage", line 33, in <module>
sys.exit(load_entry_point('cassiopeia-lineage', 'console_scripts', 'reconstruct-lineage')())
File "/home/isshamie/software/Cassiopeia/cassiopeia/TreeSolver/reconstruct_tree.py", line 245, in main
lookahead_depth=lookahead_depth,
File "/home/isshamie/software/Cassiopeia/cassiopeia/TreeSolver/lineage_solver/lineage_solver.py", line 228, in solve_lineage_instance
results, r, pid, graph_sizes = future.result()
File "/data/isshamie/software/anaconda3/envs/cass/lib/python3.6/concurrent/futures/_base.py", line 425, in result
return self.__get_result()
File "/data/isshamie/software/anaconda3/envs/cass/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
raise self._exception
TypeError: '>=' not supported between instances of 'TypeError' and 'int'
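One thing that can be read off the trace above, independently of the input format: the two TypeErrors come from the error handler itself, not from the data. traceback.format_exc takes limit as its first positional parameter, so traceback.format_exc(e) passes the exception object as limit, the comparison limit >= 0 fails, and the original IndexError is masked. Below is a minimal sketch of a wrapper that reports the original error instead. The names report_errors and failing_step are hypothetical and this is not Cassiopeia's code.

```python
import functools
import traceback


def report_errors(func):
    """Re-raise any exception from func with the full traceback text attached."""

    @functools.wraps(func)
    def wrapped(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as e:
            # format_exc() reads the active exception from sys.exc_info();
            # passing `e` positionally would be interpreted as `limit`.
            traceback_str = traceback.format_exc()
            raise RuntimeError(traceback_str) from e

    return wrapped


@report_errors
def failing_step():
    # Hypothetical stand-in for the pruning step: index past the end of a list.
    return [0, 1][2]
```

With a handler like this, the message that reaches the main process would mention the IndexError, which is the error that actually needs investigating.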
Hello,
I was following the preprocessing tutorial, and when I get to resolve_umi_sequence, the following error is thrown:
__init__() got an unexpected keyword argument 'basey'
I believe it's a deprecation of the params in the new matplotlib version.
Best,
Chang
From the following code snippet, we see that even though the cached edges do not change, the underlying backend networkx object has changed, with the singleton edge at the top of the tree being collapsed. Hence we believe that the behavior of the collapse_unifurcations function that is called on the (supposed) copy of the trees passed into triplets_correct and robinson_foulds somehow persists.
import pickle as pic
import cassiopeia as cas
simulated_tree_dir = "/data/yosef2/users/richardz/projects/CassiopeiaV2-Reproducibility/topologies/exponential_plus_c/400cells/no_fit/"
reconstructed_tree_dir = '/data/yosef2/users/richardz/projects/CassiopeiaV2-Reproducibility/reconstructed/exponential_plus_c/400cells/no_priors/no_fit/char40/'
ind = 0
tree = pic.load(open(f'{simulated_tree_dir}/topology{ind}.pkl', 'rb'))
print(len(tree.edges))
print(tree._CassiopeiaTree__network.number_of_edges())
print(tree.root, tree.children(tree.root))
recon_tree = cas.data.CassiopeiaTree(tree=f'{reconstructed_tree_dir}/greedy/recon{ind}')
triplets = cas.critique.triplets_correct(tree, recon_tree, min_triplets_at_depth = 50)[0] # or rf = cas.critique.robinson_foulds(tree, recon_tree)
print(len(tree.edges))
print(tree._CassiopeiaTree__network.number_of_edges())
print(tree.root, tree.children(tree.root))
Some target sites end up getting an empty string '' as their allele. Still not entirely sure where/why this is happening, but the pipeline should deal with these more appropriately.
Currently, if a table has such empty-string target site entries, reading the table with pd.read_csv causes these entries to be replaced with a floating-point NaN, which causes problems when we try to use the target site indels to construct a string. Specifically, if we try to join the target sites of each row into a single allele string, like so
molecule_table[['r1', 'r2', 'r3']].apply(lambda x: '_'.join(x), axis=1)
this will cause an error because all the arguments to join must be strings, but NaN is a float.
One (hacky?) way to get around this is to simply provide na_filter=False to pd.read_csv, which reads the empty strings literally. But it is unclear if this is the behavior we want. Or do we want to simply filter out all UMIs with missing target site indels?
Here is an example of a UMI that has a missing target site indel.
AAAAAACCGTCGTA_GTGGTCGGG_1 777.0 110M1D52M1D55M 0.0 0.0 AATCCAGCTAGCTGTGCAGCCGTGAGTCTCTGATATTCAACTGCAGTAATGCTACCTAGTACTCACGCTTTCCAAGTGCTTGGCGTCGCATCACGGTCCTTTGTACGCCGAAAATCGCCAGACAACTAAGCTACGGCACGCTGCCATGTTGGGTCATACCCAAAATCAGGTTACTCCTGGGCCGCACAAGTCATGGAGAAGTCGAGACTATTAATTCATGATTCCCAATCTGGACTTACTACATGTATACCCC GTGGTCGGG [111:1D][164:1D]AAAAAACCGTCGTA CGTGAGTCTCTGAT [111:1D] [164:1D] 1.0
After discussing with @mattjones315, this behavior is mainly due to the low sequencing quality of this dataset, compounded by technical artifacts in local alignment (no alignment for the particular target site)
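The join failure described above does not depend on pandas and can be reproduced with a plain float NaN. The sketch below shows the error and one possible workaround: mapping missing entries to a placeholder before joining. The helper name join_target_sites and the choice of an empty-string placeholder are assumptions for illustration, not part of the pipeline.

```python
import math


def join_target_sites(indels, missing_placeholder=""):
    """Join per-site indels with '_', replacing NaN/None by a placeholder."""
    cleaned = []
    for indel in indels:
        is_nan = isinstance(indel, float) and math.isnan(indel)
        if indel is None or is_nan:
            cleaned.append(missing_placeholder)
        else:
            cleaned.append(str(indel))
    return "_".join(cleaned)


# One row of target-site indels where the middle site came back as NaN.
row = ["[111:1D]", float("nan"), "[164:1D]"]

try:
    "_".join(row)
    join_failed = False
except TypeError:
    # str.join rejects non-string items such as a float NaN.
    join_failed = True

allele = join_target_sites(row)
```

Whether a placeholder or dropping the UMI entirely is the right policy is exactly the open question raised in this issue. The sketch only makes the failure mode explicit.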
Hi,
I used the ILP solver on a dataset of around 20 taxa. The output from the ILP solver is a DiGraph. When I try to export it to Newick format, it sometimes succeeds and sometimes does not. Why would this happen?
Another question: when I have the output as a Newick string, some subtrees/groups of taxa appear more than once, and the iTOL view displays them randomly. How should I choose the best tree?
Thank you for your answer.
Make an argument use_priors in CassiopeiaSolvers that dictates whether or not to use priors during reconstruction of a tree.
The current behavior is that if priors exist within a CassiopeiaTree, then they are used. This is a bit presumptuous of the user's intentions.
Hi @mattjones315, I am trying out the preprocess.ipynb notebook you mentioned in issue #79 from the refactor branch on some of my recently acquired GESTALT data. I am getting the following error: AttributeError: module 'cassiopeia' has no attribute 'pp' when trying to run any command that contains cassiopeia.pp. I'm not sure why; I was able to import cassiopeia no problem into my jupyter notebook (and as an installation check, I confirmed that running reconstruct-lineage -h in the command line gave me usage details). I remember when I first started using your tool back in March I was able to (mostly) run through the process_fastq.ipynb notebook and cassiopeia was working fine. Was there a recent update to cassiopeia that would be throwing things off, or did I perhaps miss something in the install?
Currently, there is no option to plot indel heatmaps only using the character matrix. As of now, the full allele table is required. Extend the plotting functionality to support this use case. #159
You mentioned that the character matrix:
This simple data structure is an N x M matrix, where we represent each of the N cells in a population by a vector of M characters that can take on a mutation. In the context of Cas9-based lineage tracers, each of these M characters is a specific cut-site that can take on one of several possible indels. The entry n_{i,j} represents the mutation observed in the i-th cell at the j-th cut-site. For simplicity, we abstract away actual identities and represent each unique mutation as an integer, so that these character matrices are filled with integers. Importantly, Cassiopeia represents missing data with the integer -1, though users can change this as long as they specify this to the CassiopeiaTree downstream.
I don't fully understand this structure. Does it mean that n_{ij} represents a particular form of mutation from cell i that occurs at the j-th cut-site? What if a particular mutation spans several cut-sites: will n_{ij} > 0 then hold for consecutive columns?
A more important question: I am now analyzing a Cas-9 based lineage tracing data from a different experimental design, and I have run the preprocessing using a different pipeline. Now, I have the (Cell ID, mutation) table. For a given cell, it may have several independent mutations (like cuts at different targets), and for a given mutation, it may be observed across several different cells. I wonder if it is more natural to just convert the data into a cell-by-mutation matrix, where each column is a different mutation, and the entry n_{ij} will be whether a particular mutation is observed at a given cell or not. Can I just pass this matrix to your pipeline as the character matrix, and run it?
Will be very happy to discuss more :)
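To make the structure concrete, here is a minimal sketch of how a (cell, cut-site, indel) table could be turned into a character matrix of the kind described in the quoted passage. Each unique indel at a cut-site gets its own positive integer and unobserved sites get -1, as in the quote. Encoding the unedited state as 0, the label "None" for an unedited site, and all names and toy values are assumptions of this sketch, not Cassiopeia's API.

```python
def build_character_matrix(observations, cut_sites, uncut_label="None", missing=-1):
    """Map (cell, site, indel) records to {cell: [state per cut-site]}."""
    state_ids = {site: {} for site in cut_sites}  # per-site indel -> integer
    observed = {}
    for cell, site, indel in observations:
        if indel == uncut_label:
            state = 0
        else:
            # Assign the next unused positive integer to a new indel at this site.
            state = state_ids[site].setdefault(indel, len(state_ids[site]) + 1)
        observed.setdefault(cell, {})[site] = state
    return {
        cell: [sites.get(site, missing) for site in cut_sites]
        for cell, sites in observed.items()
    }


# Toy records; the indel strings are made up for illustration.
observations = [
    ("cell_A", "r1", "[111:1D]"),
    ("cell_A", "r2", "None"),
    ("cell_B", "r1", "[111:1D]"),
    ("cell_B", "r2", "[164:1D]"),
    ("cell_B", "r3", "[200:2I]"),
    ("cell_C", "r1", "[113:4D]"),
]
matrix = build_character_matrix(observations, cut_sites=["r1", "r2", "r3"])
```

In this sketch the integers are only meaningful within a column: state 1 at r1 and state 1 at r2 denote unrelated indels.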
Implement a utility to compute the cophenetic correlation for a tree, i.e. the correlation between pairwise phylogenetic distances and some dissimilarity map computed from the samples' character information.
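As a rough sketch of what such a utility could compute: take every pair of leaves, measure their path length in the tree and their dissimilarity in character space, and report the Pearson correlation of the two lists. The tree encoding (child -> (parent, branch length)), the Hamming-style dissimilarity, and all function names below are assumptions for illustration, not Cassiopeia's API.

```python
import itertools
import math


def path_to_root(node, parents):
    """Return {ancestor: distance from node}, including the node itself."""
    distances, total = {node: 0.0}, 0.0
    while node in parents:
        node, length = parents[node][0], parents[node][1]
        total += length
        distances[node] = total
    return distances


def tree_distance(a, b, parents):
    """Path length between a and b through their lowest common ancestor."""
    up_a, up_b = path_to_root(a, parents), path_to_root(b, parents)
    return min(up_a[n] + up_b[n] for n in up_a if n in up_b)


def hamming(s1, s2):
    """Fraction of characters at which two state vectors differ."""
    return sum(x != y for x, y in zip(s1, s2)) / len(s1)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def cophenetic_correlation(parents, characters):
    """Correlate leaf-pair tree distances with leaf-pair dissimilarities."""
    pairs = list(itertools.combinations(sorted(characters), 2))
    phylo = [tree_distance(a, b, parents) for a, b in pairs]
    dissim = [hamming(characters[a], characters[b]) for a, b in pairs]
    return pearson(phylo, dissim)


# Toy tree: root -> (x -> (A, B), C), all branch lengths 1.
parents = {"x": ("root", 1.0), "A": ("x", 1.0), "B": ("x", 1.0), "C": ("root", 1.0)}
characters = {"A": [1, 1, 0, 0], "B": [1, 2, 0, 0], "C": [0, 0, 3, 0]}
correlation = cophenetic_correlation(parents, characters)
```

A real implementation would presumably accept an arbitrary dissimilarity function and handle missing states, which this toy version ignores.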
The preprocessing pipeline performs a lot of grouped iterations of the following form:
for cellBC, group in moleculetable.groupby("cellBC"):
    ...
Wherever possible, we should update these instances to use groupby operations instead of iterating over each group, for the following reasons.
This applies to several functions in pp.utilities, pp.UMI_utils, pp.lineage_utils, and pp.doublet_utils.
The bottom_solver doesn't see the logfile passed into the solve method in the HybridSolver.
Hello,
I wonder whether the Cassiopeia pipeline can be used for scRNAseq data obtained at different time points during the process of cancer transformation. Please let me know.
Thank you,
Shikha
Cassiopeia/cassiopeia/solver/NeighborJoiningSolver.py
Lines 16 to 20 in 4a4ba1a
solver_utilities is a file in solver but the other two are imported into that namespace via the init file.
Cassiopeia/cassiopeia/simulator/CompleteBinarySimulator.py
Lines 11 to 13 in 4a4ba1a
cassiopeia to remove the capital C so that the name reflects the python package name. All old urls will work still.
I was just trying out some examples, and in the 'Post-Process Tree & Add Redundant Leaves' section, the command:
g.newick = data_pipeline.convert_network_to_newick_format(g)
gives me a 'TypeError: 'Cassiopeia_Tree' object is not iterable' error.
Looking at the tree class, it seems the class function get_newick(self) calls this same function with the network, not the full object. Is the example notebook out of date? Thanks!
Hi, dear developers,
First, I want to thank you for your tool and shared code :).
I am wondering whether you could offer a link to some simulated data, like the test_possorted_genome_bam.bam in process_fastq.ipynb. After all, Cell Ranger is time-consuming, and I believe some prepared data may help people get acquainted faster.
## first specify the home directory, and possorted genome bam
home_dir = "."
genome_bam = "data/test_possorted_genome_bam.bam"
I have another question about test_possorted_genome_bam.bam. It looks like this BAM is the output of Cell Ranger, but in the Methods section of the paper I found another step between "Assigning cell barcode and UMIs to each read" and "Aligning to the target site reference":
Finding the consensus sequence for each UMI. To potentially increase the speed of consensus sequence finding, we attempt to trim reads to the same length for each UMI.
But I did not find this step in Cassiopeia.
Please forgive me if I misunderstand something :)
Best wishes
Guandong Shang
We've found that the local alignment strategy used in our Cassiopeia preprocessing pipeline is not ideal for some technologies like GESTALT. We'd like to implement a global alignment option in cas.pp.align_sequences as was described in the original GESTALT paper (McKenna et al, 2016).
It seems that the itol_utilities module requires a Path object. Passing a string results in an error:
itol.py:38, in Itol.add_file(self, file_path)
34 def add_file(self, file_path: Path) -> None:
35 """
36 Add a file to be uploaded, tree or dataset
37 """
---> 38 if not file_path.is_file():
39 raise IOError('%s is not a file' % file_path)
40 self.files.append(file_path)
AttributeError: 'str' object has no attribute 'is_file'
Passing a Path object solves this, e.g.
# in itol_utilities.py
# from pathlib import Path
itol_uploader = Itol()
itol_uploader.add_file(
    Path(os.path.join(temporary_directory, "tree_to_plot.tree"))
)
Thanks
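An alternative to converting at every call site would be for the method to accept either type and coerce its argument with pathlib.Path up front. A small sketch of that idea follows. The class FileCollector is a hypothetical stand-in, not the actual Itol class.

```python
import tempfile
from pathlib import Path
from typing import List, Union


class FileCollector:
    """Hypothetical stand-in for an uploader that tracks files to send."""

    def __init__(self) -> None:
        self.files: List[Path] = []

    def add_file(self, file_path: Union[str, Path]) -> None:
        # Path(...) converts plain strings and returns an equivalent Path
        # when it is already given one.
        file_path = Path(file_path)
        if not file_path.is_file():
            raise IOError("%s is not a file" % file_path)
        self.files.append(file_path)


with tempfile.TemporaryDirectory() as temporary_directory:
    tree_file = Path(temporary_directory) / "tree_to_plot.tree"
    tree_file.write_text("(A,B);")

    collector = FileCollector()
    collector.add_file(str(tree_file))  # a plain string now works
    collector.add_file(tree_file)       # and so does a Path
```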
Hi Matt,
I have a quick question about showing legend for colorstrip when using cassiopeia.pl.plot_matplotlib. For example, when the colorstrip represents the organ locations where the cells in the tree are collected, I hope to display which color corresponds to which location in a legend in the plot produced from cassiopeia.pl.plot_matplotlib. Is it possible to do so? I tried something like:
cas.pl.plot_matplotlib(cas_tree, add_root=True, meta_data=['sampleID'], colorstrip_kwargs=dict(showlegend=True))
but it doesn't work and returns an error. Thanks very much!
When creating a wrapper around a dissimilarity function from cas.solver.dissimilarity and passing it as a DistanceSolver's dissimilarity_function argument, I get a numba error.
To reproduce the issue, I used the pip install command from the repo's README and ran this Python script:
## From cass.py
from typing import Dict, List, Optional
import cassiopeia as cas
import pandas as pd
import pickle as pic
import os
gt_tree_dir = "/data/yosef2/users/richardz/projects/CassiopeiaV2-Reproducibility/trees/exponential_plus_c/400cells/no_fit/char40/"
gt_tree_file = os.path.join(gt_tree_dir, "tree0.pkl")
gt_tree = pic.load(open(gt_tree_file, "rb"))
cm_file = os.path.join(gt_tree_dir, f"cm0.txt")
cm = pd.read_table(cm_file, index_col = 0)
recon_tree = cas.data.CassiopeiaTree(
    character_matrix=cm,
    missing_state_indicator=-1
)
def my_distance_function(
    s1: List[int],
    s2: List[int],
    missing_state_indicator=-1,
    weights: Optional[Dict[int, Dict[int, float]]] = None,
) -> float:
    return cas.solver.dissimilarity.weighted_hamming_distance(
        s1,
        s2,
        missing_state_indicator=missing_state_indicator,
        weights=weights,
    )
solver = cas.solver.NeighborJoiningSolver(
    add_root=True,
    dissimilarity_function=my_distance_function
)
solver.solve(recon_tree)
Upon running the script above, the following error pops up:
## From stderr
Traceback (most recent call last):
File "cass.py", line 38, in <module>
solver.solve(recon_tree)
File "/home/eecs/ivalexander13/datadir/miniconda3/envs/fake_cass/lib/python3.7/site-packages/cassiopeia/solver/DistanceSolver.py", line 140, in solve
dissimilarity_map = self.get_dissimilarity_map(cassiopeia_tree, layer)
File "/home/eecs/ivalexander13/datadir/miniconda3/envs/fake_cass/lib/python3.7/site-packages/cassiopeia/solver/DistanceSolver.py", line 106, in get_dissimilarity_map
self.setup_dissimilarity_map(cassiopeia_tree, layer)
File "/home/eecs/ivalexander13/datadir/miniconda3/envs/fake_cass/lib/python3.7/site-packages/cassiopeia/solver/DistanceSolver.py", line 227, in setup_dissimilarity_map
self.setup_root_finder(cassiopeia_tree)
File "/home/eecs/ivalexander13/datadir/miniconda3/envs/fake_cass/lib/python3.7/site-packages/cassiopeia/solver/NeighborJoiningSolver.py", line 264, in setup_root_finder
self.dissimilarity_function, self.prior_transformation
File "/home/eecs/ivalexander13/datadir/miniconda3/envs/fake_cass/lib/python3.7/site-packages/cassiopeia/data/CassiopeiaTree.py", line 1855, in compute_dissimilarity_map
self.missing_state_indicator,
File "/home/eecs/ivalexander13/datadir/miniconda3/envs/fake_cass/lib/python3.7/site-packages/cassiopeia/data/utilities.py", line 214, in compute_dissimilarity_map
cm, C, missing_state_indicator, nb_weights
File "/home/eecs/ivalexander13/datadir/miniconda3/envs/fake_cass/lib/python3.7/site-packages/numba/core/dispatcher.py", line 468, in _compile_for_args
error_rewrite(e, 'typing')
File "/home/eecs/ivalexander13/datadir/miniconda3/envs/fake_cass/lib/python3.7/site-packages/numba/core/dispatcher.py", line 409, in error_rewrite
raise e.with_traceback(None)
numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Failed in nopython mode pipeline (step: nopython frontend)
Unknown attribute 'weighted_hamming_distance' of type Module(<module 'cassiopeia.solver.dissimilarity_functions' from '/home/eecs/ivalexander13/datadir/miniconda3/envs/fake_cass/lib/python3.7/site-packages/cassiopeia/solver/dissimilarity_functions.py'>)
File "cass.py", line 27:
def my_distance_function(
<source elided>
return cas.solver.dissimilarity.weighted_hamming_distance(
^
During: typing of get attribute at cass.py (27)
File "cass.py", line 27:
def my_distance_function(
<source elided>
return cas.solver.dissimilarity.weighted_hamming_distance(
^
During: resolving callee type: type(CPUDispatcher(<function my_distance_function at 0x7f1322269f80>))
During: typing of call at /home/eecs/ivalexander13/datadir/miniconda3/envs/fake_cass/lib/python3.7/site-packages/cassiopeia/data/utilities.py (197)
During: resolving callee type: type(CPUDispatcher(<function my_distance_function at 0x7f1322269f80>))
During: typing of call at /home/eecs/ivalexander13/datadir/miniconda3/envs/fake_cass/lib/python3.7/site-packages/cassiopeia/data/utilities.py (197)
During: resolving callee type: type(CPUDispatcher(<function my_distance_function at 0x7f1322269f80>))
During: typing of call at /home/eecs/ivalexander13/datadir/miniconda3/envs/fake_cass/lib/python3.7/site-packages/cassiopeia/data/utilities.py (197)
During: resolving callee type: type(CPUDispatcher(<function my_distance_function at 0x7f1322269f80>))
During: typing of call at /home/eecs/ivalexander13/datadir/miniconda3/envs/fake_cass/lib/python3.7/site-packages/cassiopeia/data/utilities.py (197)
File "../../../../../home/eecs/ivalexander13/datadir/miniconda3/envs/fake_cass/lib/python3.7/site-packages/cassiopeia/data/utilities.py", line 197:
def _compute_dissimilarity_map(cm, C, missing_state_indicator, nb_weights):
<source elided>
dm[k] = dissimilarity_func(
s1, s2, missing_state_indicator, nb_weights
^
When inspecting the source code, I noticed that in /home/eecs/ivalexander13/datadir/Cassiopeia/cassiopeia/data/utilities.py, there seem to be safeguards that are supposed to catch numba failures, as follows
## From utilities.py at lines 159 to 171
numbaize = True
try:
    dissimilarity_func = numba.jit(dissimilarity_function, nopython=True)
except TypeError:
    warnings.warn(
        "Failed to numbaize dissimilarity function. Falling back to Python.",
        CassiopeiaTreeWarning,
    )
    numbaize = False
    dissimilarity_func = dissimilarity_function
## From utilities.py at lines 206 to 215
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=numba.NumbaDeprecationWarning)
    warnings.simplefilter("ignore", category=numba.NumbaWarning)
    _compute_dissimilarity_map = numba.jit(
        _compute_dissimilarity_map, nopython=numbaize
    )
    return _compute_dissimilarity_map(
        cm, C, missing_state_indicator, nb_weights
    )
When these two snippets are changed to completely avoid using numba, the bug disappears. So I think the bug is due to the numbaization functions not working properly, and somehow bypassing the try-catch.
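One possible explanation, offered as a hypothesis rather than a confirmed diagnosis: numba.jit without an explicit signature compiles lazily, on the first call, so a try/except placed around the numba.jit(...) line never sees compilation errors. In addition, the exception in the trace is a numba TypingError, while the handler catches TypeError. The pure-Python analogy below involves no numba at all; lazy_compile, CompileError and both functions are made up for illustration. It shows the pattern and a fallback that guards the first call instead of the wrapping step.

```python
class CompileError(Exception):
    """Stand-in for a compilation failure raised at call time."""


def lazy_compile(func):
    """Analogy for a lazy JIT: wrapping succeeds, errors surface on first call."""

    def compiled(*args):
        if getattr(func, "uncompilable", False):
            raise CompileError(f"cannot compile {func.__name__}")
        return func(*args)

    return compiled


def my_distance(s1, s2):
    """Count positions at which the two state vectors differ."""
    return sum(a != b for a, b in zip(s1, s2))


my_distance.uncompilable = True  # pretend the compiler rejects this function

# Guarding only the wrapping step: no exception is raised here.
try:
    fast_distance = lazy_compile(my_distance)
    caught_at_wrap = False
except CompileError:
    caught_at_wrap = True


def call_with_fallback(compiled, original, *args):
    """Guard the first call instead, and fall back to plain Python."""
    try:
        return compiled(*args)
    except CompileError:
        return original(*args)


result = call_with_fallback(fast_distance, my_distance, [1, 2, 0], [1, 0, 0])
```

If the hypothesis holds, the equivalent change in utilities.py would be to catch the numba error around the call to the compiled function rather than around numba.jit.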
Currently, at the end of the solve procedure, the ILPSolver maintains the names of the internal nodes as tuples representing the character vector used in the Steiner solution. These should be changed to the "cassiopeia_internal_node" naming convention adopted by the other solvers.
Implement utilities from recent KP-Tracer manuscript into the Cassiopeia codebase. Specifically:
Hi Developers,
I failed to install Cassiopeia at the "python3 setup.py build" step. It looks like the arguments for PyCode_New are not in the correct format.
Please see the log below. Do you have any suggestions? Thanks!
running build
running build_py
running egg_info
writing cassiopeia_lineage.egg-info/PKG-INFO
writing dependency_links to cassiopeia_lineage.egg-info/dependency_links.txt
writing entry points to cassiopeia_lineage.egg-info/entry_points.txt
writing requirements to cassiopeia_lineage.egg-info/requires.txt
writing top-level names to cassiopeia_lineage.egg-info/top_level.txt
reading manifest file 'cassiopeia_lineage.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching '' under directory 'cython'
writing manifest file 'cassiopeia_lineage.egg-info/SOURCES.txt'
running build_ext
building 'cassiopeia.TreeSolver.lineage_solver.solver_utils' extension
gcc -pthread -B /data/xliu23/binaries/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/data/xliu23/binaries/anaconda3/include/python3.8 -c cassiopeia/TreeSolver/lineage_solver/solver_utils.c -o build/temp.linux-x86_64-3.8/cassiopeia/TreeSolver/lineage_solver/solver_utils.o
cassiopeia/TreeSolver/lineage_solver/solver_utils.c: In function ‘__Pyx_InitCachedConstants’:
cassiopeia/TreeSolver/lineage_solver/solver_utils.c:5934:3: warning: passing argument 6 of ‘PyCode_New’ makes pointer from integer without a cast [enabled by default]
__pyx_codeobj__11 = (PyObject)__Pyx_PyCode_New(2, 0, 6, 0, CO_OPTIMIZED|CO_NEWLOCALS, __pyx_empty_bytes, __pyx_empty_tuple, __pyx_empty_tuple, __pyx_tuple__10, __pyx_empty_tuple, __pyx_empty_tuple, __pyx_kp_s_cassiopeia_TreeSolver_lineage_so_2, __pyx_n_s_node_parent, 6, __pyx_empty_bytes); if (unlikely(!__pyx_codeobj__11)) __PYX_ERR(0, 6, __pyx_L1_error)
^
In file included from /data/xliu23/binaries/anaconda3/include/python3.8/compile.h:5:0,
from /data/xliu23/binaries/anaconda3/include/python3.8/Python.h:138,
from cassiopeia/TreeSolver/lineage_solver/solver_utils.c:16:
/data/xliu23/binaries/anaconda3/include/python3.8/code.h:122:28: note: expected ‘struct PyObject *’ but argument is of type ‘int’
PyAPI_FUNC(PyCodeObject ) PyCode_New(
^
cassiopeia/TreeSolver/lineage_solver/solver_utils.c:5934:3: warning: passing argument 14 of ‘PyCode_New’ makes integer from pointer without a cast [enabled by default]
__pyx_codeobj__11 = (PyObject)__Pyx_PyCode_New(2, 0, 6, 0, CO_OPTIMIZED|CO_NEWLOCALS, __pyx_empty_bytes, __pyx_empty_tuple, __pyx_empty_tuple, __pyx_tuple__10, __pyx_empty_tuple, __pyx_empty_tuple, __pyx_kp_s_cassiopeia_TreeSolver_lineage_so_2, __pyx_n_s_node_parent, 6, __pyx_empty_bytes); if (unlikely(!__pyx_codeobj__11)) __PYX_ERR(0, 6, __pyx_L1_error)
^
In file included from /data/xliu23/binaries/anaconda3/include/python3.8/compile.h:5:0,
from /data/xliu23/binaries/anaconda3/include/python3.8/Python.h:138,
from cassiopeia/TreeSolver/lineage_solver/solver_utils.c:16:
/data/xliu23/binaries/anaconda3/include/python3.8/code.h:122:28: note: expected ‘int’ but argument is of type ‘struct PyObject *’
PyAPI_FUNC(PyCodeObject ) PyCode_New(
^
cassiopeia/TreeSolver/lineage_solver/solver_utils.c:5934:3: warning: passing argument 15 of ‘PyCode_New’ makes pointer from integer without a cast [enabled by default]
__pyx_codeobj__11 = (PyObject)__Pyx_PyCode_New(2, 0, 6, 0, CO_OPTIMIZED|CO_NEWLOCALS, __pyx_empty_bytes, __pyx_empty_tuple, __pyx_empty_tuple, __pyx_tuple__10, __pyx_empty_tuple, __pyx_empty_tuple, __pyx_kp_s_cassiopeia_TreeSolver_lineage_so_2, __pyx_n_s_node_parent, 6, __pyx_empty_bytes); if (unlikely(!__pyx_codeobj__11)) __PYX_ERR(0, 6, __pyx_L1_error)
^
In file included from /data/xliu23/binaries/anaconda3/include/python3.8/compile.h:5:0,
from /data/xliu23/binaries/anaconda3/include/python3.8/Python.h:138,
from cassiopeia/TreeSolver/lineage_solver/solver_utils.c:16:
/data/xliu23/binaries/anaconda3/include/python3.8/code.h:122:28: note: expected ‘struct PyObject *’ but argument is of type ‘int’
PyAPI_FUNC(PyCodeObject ) PyCode_New(
^
cassiopeia/TreeSolver/lineage_solver/solver_utils.c:5934:3: error: too many arguments to function ‘PyCode_New’
__pyx_codeobj__11 = (PyObject)__Pyx_PyCode_New(2, 0, 6, 0, CO_OPTIMIZED|CO_NEWLOCALS, __pyx_empty_bytes, __pyx_empty_tuple, __pyx_empty_tuple, __pyx_tuple__10, __pyx_empty_tuple, __pyx_empty_tuple, __pyx_kp_s_cassiopeia_TreeSolver_lineage_so_2, __pyx_n_s_node_parent, 6, __pyx_empty_bytes); if (unlikely(!__pyx_codeobj__11)) __PYX_ERR(0, 6, __pyx_L1_error)
^
In file included from /data/xliu23/binaries/anaconda3/include/python3.8/compile.h:5:0,
from /data/xliu23/binaries/anaconda3/include/python3.8/Python.h:138,
from cassiopeia/TreeSolver/lineage_solver/solver_utils.c:16:
/data/xliu23/binaries/anaconda3/include/python3.8/code.h:122:28: note: declared here
PyAPI_FUNC(PyCodeObject *) PyCode_New(
Hi @mattjones315 and @Lioscro! I am running through your preprocessing pipeline with my GESTALT data, this time trying out the new global alignment option in cassiopeia.pp.align_sequences (thanks so much for including that!). Reading the API, it looks like for my GESTALT data I should leave gap_open_penalty and gap_extend_penalty at their default values (since those are what were originally used for the GESTALT technology) and set method = "global". I did this and got the following error:
AttributeError Traceback (most recent call last)
<ipython-input-14-84b24ae4ce70> in <module>
4 umi_table = cs.pp.align_sequences(umi_table,
5 ref_filepath = target_site_reference,
----> 6 method = "global")
~/.local/lib/python3.7/site-packages/ngs_tools/logging.py in inner(*args, **kwargs)
60 try:
61 self.namespace = namespace
---> 62 return func(*args, **kwargs)
63 finally:
64 self.namespace = previous
~/Desktop/code/Cassiopeia/cassiopeia/preprocess/utilities.py in wrapper(*args, **kwargs)
84 def wrapper(*args, **kwargs):
85 logger.debug(f"Keyword arguments: {kwargs}")
---> 86 return wrapped(*args, **kwargs)
87
88 return wrapper
~/Desktop/code/Cassiopeia/cassiopeia/preprocess/utilities.py in wrapper(*args, **kwargs)
62 logger.info("Starting...")
63 try:
---> 64 return wrapped(*args, **kwargs)
65 finally:
66 logger.info(f"Finished in {time.time() - t0} s.")
~/Desktop/code/Cassiopeia/cassiopeia/preprocess/pipeline.py in align_sequences(queries, ref_filepath, ref, gap_open_penalty, gap_extend_penalty, method, n_threads)
528 )(
529 delayed(align_partial)(ref, queries.loc[umi].seq)
--> 530 for umi in queries.index
531 ),
532 ):
~/.local/lib/python3.7/site-packages/ngs_tools/utils.py in __call__(self, *args, **kwargs)
221 def __call__(self, *args, **kwargs):
222 try:
--> 223 return Parallel.__call__(self, *args, **kwargs)
224 finally:
225 self._pbar.close()
~/.local/lib/python3.7/site-packages/joblib/parallel.py in __call__(self, iterable)
1039 # remaining jobs.
1040 self._iterating = False
-> 1041 if self.dispatch_one_batch(iterator):
1042 self._iterating = self._original_iterator is not None
1043
~/.local/lib/python3.7/site-packages/joblib/parallel.py in dispatch_one_batch(self, iterator)
857 return False
858 else:
--> 859 self._dispatch(tasks)
860 return True
861
~/.local/lib/python3.7/site-packages/joblib/parallel.py in _dispatch(self, batch)
775 with self._lock:
776 job_idx = len(self._jobs)
--> 777 job = self._backend.apply_async(batch, callback=cb)
778 # A job can complete so quickly than its callback is
779 # called before we get here, causing self._jobs to
~/.local/lib/python3.7/site-packages/joblib/_parallel_backends.py in apply_async(self, func, callback)
206 def apply_async(self, func, callback=None):
207 """Schedule a func to be run"""
--> 208 result = ImmediateResult(func)
209 if callback:
210 callback(result)
~/.local/lib/python3.7/site-packages/joblib/_parallel_backends.py in __init__(self, batch)
570 # Don't delay the application, to avoid keeping the input
571 # arguments in memory
--> 572 self.results = batch()
573
574 def get(self):
~/.local/lib/python3.7/site-packages/joblib/parallel.py in __call__(self)
261 with parallel_backend(self._backend, n_jobs=self._n_jobs):
262 return [func(*args, **kwargs)
--> 263 for func, args, kwargs in self.items]
264
265 def __reduce__(self):
~/.local/lib/python3.7/site-packages/joblib/parallel.py in <listcomp>(.0)
261 with parallel_backend(self._backend, n_jobs=self._n_jobs):
262 return [func(*args, **kwargs)
--> 263 for func, args, kwargs in self.items]
264
265 def __reduce__(self):
~/Desktop/code/Cassiopeia/cassiopeia/preprocess/alignment_utilities.py in align_global(ref, seq, substitution_matrix, gap_open_penalty, gap_extend_penalty)
78 aln = aligner.align(ref, seq)
79 return (
---> 80 ngs.sequence.alignment_to_cigar(aln.result_a, aln.result_b),
81 aln.pos_b,
82 aln.pos_a,
AttributeError: module 'ngs_tools.sequence' has no attribute 'alignment_to_cigar'
I just pulled the latest updates to the master branch this morning, so I am not sure if it is perhaps an issue on my end with a package being out of date. Thanks so much for any help!
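The AttributeError above says that the installed ngs_tools does not expose the function Cassiopeia is calling, which usually points at a version mismatch between the two packages. A minimal sketch for checking this with the standard library only follows. The helper name describe_attribute is hypothetical, and the distribution name "ngs-tools" is an assumption; the module and attribute names are taken from the traceback.

```python
import importlib
from importlib import metadata


def describe_attribute(module_name, attribute, distribution=None):
    """Return (installed_version, has_attribute) for a module.

    installed_version is None when no distribution metadata is found;
    has_attribute is None when the module cannot be imported at all.
    """
    try:
        version = metadata.version(distribution or module_name.split(".")[0])
    except metadata.PackageNotFoundError:
        version = None
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return version, None
    return version, hasattr(module, attribute)


# Names from the traceback above; "ngs-tools" as the distribution name is an assumption.
print(describe_attribute("ngs_tools.sequence", "alignment_to_cigar", "ngs-tools"))
```

If the second element of the result is False, the installed ngs_tools does not provide that function. Whether upgrading resolves it is not confirmed in this thread.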
Hi! I followed the installation instructions, and make test runs successfully. While I can run the command cassiopeia-preprocess --help, I could not run reconstruct-lineage --help. Any suggestions?
Here is my script for installation
conda create -n lineage_tree python=3.6 --yes
conda activate lineage_tree
pip install pytest
conda install cython --yes
make install
pip install --user ipykernel
python -m ipykernel install --user --name=lineage_tree
Hello! Thanks for creating this great tool to process data from single cell lineage tracing experiments. I am going through the process_fastq.ipynb notebook with my own example data (from Raj et al. 2020's scGESTALT paper). Things seemed to be working well until I hit the process.call_indels step, which returned an output .sam file that contained a header but was otherwise blank (i.e., no called indels for each sequence). Digging into this a little more, it looks like the input .sam file (generated from process.align_sequences) is not valid. The reference listed in the @SQ line is PCT48.ref, which is the reference used in the process_fastq.ipynb notebook. However, the alignment records all point to the dsRed reference, which is what I was using for my example data. PCT48.ref seems to be hardcoded in the pipeline_utils.py script (see line 18 and line 210). I am wondering if PCT48.ref being hardcoded is what causes the process.call_indels step to fail on my end with different example data?
I've included the following files from my example run: 1) sw_aligned.sam (output .sam file from process.align_sequences), 2) DsRed.fa (FASTA reference sequence), and 3) umi_table.sam (output .sam file from process.call_indels).
gestalt_ex.zip
Versions:
Python 3.7.4
Emboss 6.6.0.0
Gurobi 9.1.1
Mac OS 10.15.7
I don't see any version numbers for Cassiopeia, but I just finished downloading and installing Cassiopeia yesterday so I should be working with the most current versions of everything.
Thank you for any help or insight!
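The mismatch described above (an @SQ header naming one reference while the alignment records name another) can be checked without any SAM library. The sketch below relies only on the SAM text layout: header lines start with "@", @SQ lines carry an SN: tag, and the third tab-separated column of an alignment record is the reference name. The function name and the two-line example are made up for illustration.

```python
def unexpected_references(sam_text):
    """Return reference names used by alignments but absent from @SQ headers."""
    declared = set()
    used = set()
    for line in sam_text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        if line.startswith("@"):
            if fields[0] == "@SQ":
                for tag in fields[1:]:
                    if tag.startswith("SN:"):
                        declared.add(tag[3:])
            continue
        # Column 3 (index 2) of an alignment record is RNAME; "*" means unaligned.
        if len(fields) > 2 and fields[2] != "*":
            used.add(fields[2])
    return used - declared


# Made-up excerpt mirroring the situation in this issue.
example = (
    "@SQ\tSN:PCT48.ref\tLN:300\n"
    "read1\t0\tdsRed\t1\t255\t10M\t*\t0\t0\tACGTACGTAC\t*\n"
)
print(unexpected_references(example))
```

A non-empty result means the header and the records disagree, which downstream tools may treat as an invalid file.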
Hi @mattjones315, I am working through the new Reconstructing trees with Cassiopeia tutorial with my GESTALT data, and so far mostly good (I think). However, I am having a problem with cassiopeia.pp.compute_empirical_indel_priors if I open a new jupyter notebook and load in an allele_table that was saved in my preprocessing notebook:
My code:
indel_priors = cassiopeia.pp.compute_empirical_indel_priors(allele_table, grouping_variables=['intBC', 'lineageGrp'])
The error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-14-2d54866e2893> in <module>
----> 1 indel_priors = cassiopeia.pp.compute_empirical_indel_priors(allele_table, grouping_variables=['intBC', 'lineageGrp'])
~/Desktop/code/Cassiopeia/cassiopeia/preprocess/utilities.py in compute_empirical_indel_priors(allele_table, grouping_variables, cut_sites)
689 for g in groups.index:
690
--> 691 alleles = np.unique(np.concatenate(groups.loc[g].values))
692 for a in alleles:
693 if "none" not in a.lower():
<__array_function__ internals> in unique(*args, **kwargs)
~/.pyenv/versions/3.6.12/lib/python3.6/site-packages/numpy/lib/arraysetops.py in unique(ar, return_index, return_inverse, return_counts, axis)
259 ar = np.asanyarray(ar)
260 if axis is None:
--> 261 ret = _unique1d(ar, return_index, return_inverse, return_counts)
262 return _unpack_tuple(ret)
263
~/.pyenv/versions/3.6.12/lib/python3.6/site-packages/numpy/lib/arraysetops.py in _unique1d(ar, return_index, return_inverse, return_counts)
320 aux = ar[perm]
321 else:
--> 322 ar.sort()
323 aux = ar
324 mask = np.empty(aux.shape, dtype=np.bool_)
TypeError: '<' not supported between instances of 'float' and 'str'
I am not sure why this is happening. Using the same allele_table in my preprocessing notebook, I am able to run cassiopeia.pp.compute_empirical_indel_priors with no issue. And I guess related to this, is it appropriate for GESTALT data to be using compute_empirical_indel_priors? Based on the tutorial, it sounds like this might be something that is unique to your system with intBCs, and yet if I don't run compute_empirical_indel_priors then I have issues downstream generating the character_matrix. And if I tweak indel_priors = cassiopeia.pp.compute_empirical_indel_priors(allele_table, grouping_variables=['intBC', 'lineageGrp']) to instead be indel_priors = cassiopeia.pp.compute_empirical_indel_priors(allele_table, grouping_variables=['lineageGrp']) (removing intBC since the GESTALT system doesn't have them, so perhaps I wouldn't want to group by them), then I get a tree that looks weird (almost no branches or nodes or leaves). This is using the vanilla_greedy and 'nj' neighbor-joining solvers; I can't currently use the ilp-solver because my Gurobi license has expired, but I will look into renewing that if it looks like the ilp-solver will be the most useful to me.
All that said, I ran through the rest of your tutorial as a continuation of my preprocessing steps in the same notebook and everything seems to be working for the most part. I am getting one other error and I have a few questions, and was not sure if I should open them all as separate issues, or keep them running in this thread as a potential singular resource to troubleshoot GESTALT data at the reconstruction stage.
Thanks!
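The TypeError in the traceback comes from sorting a collection that mixes floats and strings: np.unique sorts its input, and plain Python sorting fails the same way. One plausible cause, which is an assumption and not confirmed in this thread, is that some allele entries come back as float NaN when the saved table is reloaded. The sketch below reproduces the comparison failure with the standard library and shows one way to normalise such values first. The allele strings and the "None" placeholder are made up; the right placeholder depends on what the missing entries mean in your data.

```python
import math

# Made-up allele values; the NaN stands in for an entry that was read back as missing.
alleles = ["ACGT[12:3D]", float("nan"), "None"]

try:
    sorted(alleles)
except TypeError as err:
    print("sorting failed:", err)


def fill_missing(values, placeholder="None"):
    """Replace float NaN entries with a string placeholder (hypothetical helper)."""
    return [
        placeholder if isinstance(v, float) and math.isnan(v) else v
        for v in values
    ]


# With only strings left, the unique values can be sorted again.
print(sorted(set(fill_missing(alleles))))
```

Checking the column dtypes of the reloaded allele_table against the in-memory one would show whether this is what is happening.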
Should be a simple fix, but just posting as an issue so that we get to it some time.
We should unify local importing to use relative imports (i.e. from .data import CassiopeiaTree) instead of specifying the full package name (i.e. from cassiopeia.data import CassiopeiaTree).
As of right now, most submodules use the absolute importing scheme.
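For illustration, here is a self-contained sketch of the relative form. It builds a throwaway package in a temporary directory; the names demo_pkg, data, solver and DemoTree are all made up and are not Cassiopeia modules.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "demo_pkg"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "data.py").write_text("class DemoTree:\n    pass\n")
    # The relative form resolves against the enclosing package,
    # so it does not repeat the top-level package name.
    (pkg / "solver.py").write_text("from .data import DemoTree\n")

    sys.path.insert(0, tmp)
    try:
        solver = importlib.import_module("demo_pkg.solver")
        data = importlib.import_module("demo_pkg.data")
        # Both routes lead to the very same class object.
        same_class = solver.DemoTree is data.DemoTree
    finally:
        sys.path.remove(tmp)

print(same_class)
```

The absolute form, from demo_pkg.data import DemoTree, would load the same object; the difference is only that the relative form survives a rename of the top-level package.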
Thanks for developing such a useful tool for reconstructing phylogenetic trees. While looking into the source code that assigns lineage groups to cells, I was unsure whether several details are correct. Two main questions are confusing me:
https://github.com/YosefLab/Cassiopeia/blob/master/cassiopeia/preprocess/lineage_utils.py
1. In the third-to-last line, should it be 'prev_clust_size = piv_nolg.shape[0]', because we only need to assign lineage groups to the undefined cells when there are more than 'min_clust_size' of them? Is there something important I've missed or misunderstood?
2. The above intBC set is used in the iterative assignment of lineage groups, and the bottom one, master_intBC, is used to calculate kinship_score. But these two intBC sets are not actually the same, because they are filtered by different criteria. So I am wondering: is this more efficient and accurate than using the same criteria to select both the intBC set and master_intBC?
I hope I have understood your code correctly, and I look forward to your reply.
Trying to run process_fastq.ipynb, I get an error regarding the Sequencing module:
ModuleNotFoundError: No module named 'cassiopeia.ProcessingPipeline.process.sequencing'
It seems like this module is loaded in cassiopeia/ProcessingPipeline/process/collapseUMIReadsByMSALargeFile.py from a local directory:
sys.path.append("/home/jah/projects/sequencing/code")
Could you please point me to where I can access / install that module?
Other solvers explicitly group duplicates together. At the beginning of the solving process, they remove the duplicates and solve on the duplicate-cleaned character matrix, appending duplicates at the end. The DistanceSolver should adopt this convention, instead of just solving on the character matrix with the duplicates still included.
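The convention described above can be sketched in plain Python as follows. This is an illustration only, not the code used by the other solvers: the function name and the dict-based character matrix are assumptions made for the example.

```python
from collections import defaultdict


def collapse_duplicates(character_matrix):
    """Group cells that share an identical row of character states.

    character_matrix: dict mapping cell name -> sequence of character states.
    Returns (unique_matrix, duplicates). unique_matrix keeps one representative
    cell per distinct row; duplicates maps each representative to the cells
    that were set aside and should be re-attached after solving.
    """
    groups = defaultdict(list)
    for cell, states in character_matrix.items():
        groups[tuple(states)].append(cell)

    unique_matrix = {}
    duplicates = {}
    for states, cells in groups.items():
        representative, *rest = cells
        unique_matrix[representative] = list(states)
        if rest:
            duplicates[representative] = rest
    return unique_matrix, duplicates


matrix = {"cellA": [1, 0, 2], "cellB": [1, 0, 2], "cellC": [0, 0, 1]}
unique_matrix, duplicates = collapse_duplicates(matrix)
print(unique_matrix)
print(duplicates)
```

A distance-based solver would then run on unique_matrix and attach each entry of duplicates next to its representative leaf at the end.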
Hello,
Congratulations on your publication! I’m so impressed by the article and was hoping to use the gastrulation compendium data as a reference for my study.
I used the accession number GSE122187 and got all the 10x data. Although you have provided the CellStates and CellStatesKernel files, I'm wondering if you could share a metadata file including the cell-by-cell annotations you used in the paper?
Really appreciate all your help!
#106
In the CassiopeiaTree, the character matrix attribute is accessed as self._character_matrix in some methods and as self.character_matrix in others.
Sorry I didn't catch this earlier!
Hello,
I was following the tutorial for the preprocessing, and when I get to the "align" step, I get an error thrown.
IndexError: list index out of range
Looking forward to your answers!
Best,
xinyi
Hi,
When I ran error_correct_intbcs_to_whitelist with 6 intBCs in the intbc_whitelist, I got an error. Generally, how many intBCs are in the intbc_whitelist? How can I get the intbc_whitelist used in the paper "Single-cell lineages reveal the rates, routes, and drivers of metastasis in cancer xenografts"?
Looking forward to your reply.
2 Issues with ILP
Add a method to the CassiopeiaTree class for computing:
This should take advantage of the layers functionality in CassiopeiaTree.
Line 34 in 4a4ba1a
This isn't required and you can't install python from pip?!
Currently we are invoking copy.deepcopy(self) when trying to copy a CassiopeiaTree. It would be better to implement a custom version of both copy and deepcopy for users. See https://docs.python.org/3/library/copy.html, specifically:
In order for a class to define its own copy implementation, it can define special methods __copy__() and __deepcopy__(). The former is called to implement the shallow copy operation; no additional arguments are passed. The latter is called to implement the deep copy operation; it is passed one argument, the memo dictionary. If the __deepcopy__() implementation needs to make a deep copy of a component, it should call the deepcopy() function with the component as first argument and the memo dictionary as second argument.
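A minimal sketch of the protocol quoted above, using a made-up DemoTree class as a stand-in (it is not CassiopeiaTree, and the edges and cache attributes are assumptions chosen for the example):

```python
import copy


class DemoTree:
    """Hypothetical stand-in for a tree class; not CassiopeiaTree itself."""

    def __init__(self, edges, cache=None):
        self.edges = edges
        self.cache = cache if cache is not None else {}

    def __copy__(self):
        # Shallow copy: a new object that shares the edge list; cache starts empty.
        return DemoTree(self.edges)

    def __deepcopy__(self, memo):
        new = DemoTree.__new__(DemoTree)
        memo[id(self)] = new  # register early so cyclic references resolve
        new.edges = copy.deepcopy(self.edges, memo)
        new.cache = {}  # derived data is rebuilt later instead of being copied
        return new


tree = DemoTree([("root", "a"), ("root", "b")], cache={"depth": 1})
shallow = copy.copy(tree)
deep = copy.deepcopy(tree)
print(shallow.edges is tree.edges, deep.edges is tree.edges, deep.cache)
```

The design choice illustrated here is that a custom __deepcopy__ can skip cached or derived state, which a blanket copy.deepcopy(self) would duplicate.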
I have a dataset with scarring-arrays and am trying to follow the preprocessing user guide here. When correcting the Cell UMIs to the whitelist, the program crashes very early with a segmentation fault during the error_correct_cellbcs_to_whitelist function call.
Code:
bam_fp = cas.pp.error_correct_cellbcs_to_whitelist(
    bam_fp,
    whitelist='data/3M-february-2018.txt',
    output_directory=output_dir,
    n_threads=n_threads,
)
stderr:
[3/4] Finding mismatches: 1%|1 | 2776/252224 [01:52<2:48:23, 24.69it/s]
/software/sge-2011.11/default/spool/fermat/job_scripts/9617950:
line 14: 11249 Segmentation fault python3 main.py
When rerunning the same code, the program crashes reproducibly at the same index (±2).
Segmentation faults in Python code are rare, so I suspect compiled code to be the origin of the segfault.
I traced the logging message back to ngs-tools, which uses numba to speed up some private functions. Against my expectation, disabling numba compilation with NUMBA_DISABLE_JIT=1 did not prevent the segfault, so it seems that function is not the origin of the segfault.
I'd be glad for pointers on how to make Cassiopeia work on my dataset. I looked into the bam file at the appropriate index, but the read looks like any other read. I'm currently running the pipeline on another sample to see if that runs through. I thought about tracing with gdb to see where the segfault comes from, or using a separate package to do the cell-UMI correction, but I hoped that you have some insight into how this problem could be fixed.
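Before reaching for gdb, the standard-library faulthandler module can report which Python call was active when the interpreter received the fatal signal. The sketch below enables it at the top of the script and writes the traceback to a log file; the temporary file is an arbitrary choice for the example. Running the script as python3 -X faulthandler main.py has the same effect without code changes, writing to stderr instead.

```python
import faulthandler
import tempfile

# Keep the file object alive for the whole run; faulthandler writes to its
# file descriptor when a fatal signal (such as SIGSEGV) arrives.
crash_log = tempfile.NamedTemporaryFile("w", suffix=".log", delete=False)
faulthandler.enable(file=crash_log, all_threads=True)

print("A traceback would be written to", crash_log.name)
```

This only shows the Python-level frames, so it narrows down which call enters the crashing compiled code but does not replace a native debugger.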
Hi all,
Firstly, thank you for creating an excellent pipeline and such a versatile set of tools! I have used your reconstruct.ipynb notebook extensively and it has been great!
However, I am having some issues with the preprocess.ipynb notebook, specifically with the initial conversion from fastqs to an unmapped .bam. The fastq files that I am using are from your previous Quinn et al. paper, but unfortunately, when I try to use the convert_fastqs_to_unmapped_bam() function, I am met with the following error:
File "/home/george/anaconda3/envs/lineage_tracing/lib/python3.7/site-packages/ngs_tools/chemistry/Chemistry.py", line 129, in parse
raise IndexError('string index out of range')
IndexError: string index out of range
I know that this is probably just a stupid mistake on my part but I can't work out where I am going wrong.
If you have any suggestions it would be greatly appreciated!
Thanks
George
Hi, it would be really useful if you could provide an example iTol config file ~/.itolconfig, so that we know how to make our own file!
Thanks for the well-documented software package!
I just wanted to check in about reading the data sets published on Zenodo (https://zenodo.org/record/3706351) with pickle. I am getting the error message below, potentially because the module was renamed from Cassiopeia to cassiopeia at some point. I was wondering which version of the Cassiopeia code on GitHub I should be using to read this data. Thank you!
>>> import cassiopeia
dir_path = /nfshomes/ekmolloy/.local/lib/python3.10/site-packages/cassiopeia/tools/fitness_estimator
>>> import pickle
>>> with open('true_network_characters_20_run_9.pkl', 'rb') as f:
...     x = pickle.load(f)
...
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
ModuleNotFoundError: No module named 'Cassiopeia'
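If the only problem is that the top-level package name changed, a pickle.Unpickler subclass that overrides find_class can redirect the old name to the new one. The sketch below demonstrates the technique with made-up module names (OldPkg and newpkg) so that it runs on its own. Applying it to the Zenodo files with "Cassiopeia" and "cassiopeia" is an assumption: it will still fail if classes were moved or changed between versions.

```python
import io
import pickle
import sys
import types


class RenamingUnpickler(pickle.Unpickler):
    """Unpickler that redirects a renamed top-level package."""

    def __init__(self, file, old_root, new_root):
        super().__init__(file)
        self.old_root = old_root
        self.new_root = new_root

    def find_class(self, module, name):
        if module == self.old_root or module.startswith(self.old_root + "."):
            module = self.new_root + module[len(self.old_root):]
        return super().find_class(module, name)


# Demonstration: pickle an object whose class lives in a module called "OldPkg".
Node = type("Node", (), {"__module__": "OldPkg"})
old_module = types.ModuleType("OldPkg")
old_module.Node = Node
sys.modules["OldPkg"] = old_module
node = Node()
node.label = "root"
payload = pickle.dumps(node)

# Simulate the rename: the old module name disappears, the new one holds the class.
del sys.modules["OldPkg"]
new_module = types.ModuleType("newpkg")
new_module.Node = Node
sys.modules["newpkg"] = new_module

restored = RenamingUnpickler(io.BytesIO(payload), "OldPkg", "newpkg").load()
print(restored.label)
```

With a real file, the same class would be used as RenamingUnpickler(f, "Cassiopeia", "cassiopeia").load() inside the with open(...) block.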
Hi! I want to know whether the target site reference fasta in the Science article "Single-cell lineages reveal the rates, routes, and drivers of metastasis in cancer xenografts" is the same as the file "PCT48.ref.fasta" in the data directory or not. Looking forward to your reply. Thank you very much.
Hi again @mattjones315, I wanted to follow up on the missing data issue that we discussed in issues #110 and #126. I went through the allele_table generated from my GESTALT data and counted up the number of instances where I had a "Missing" entry/missing allele identity for each of my cut sites (r1, r2, r3, ... r10 for GESTALT). For cut sites r1-r8 there were no "Missing" entries. For cut site r9, ~24% of the entries were "Missing" and for cut site r10, ~36% of the entries were "Missing". I know you mentioned here that I can probably get away with missing data for now so long as it is less than ~30%, which is the case for r9 but not for r10. It seems interesting to me that the first 8 cut sites have no issues, and that missing data only increases at the end of my barcode array. Does this mean that in general the alignment is probably working, or do you think that we'll need to implement the global-alignment strategy described in the original GESTALT paper? Thanks!
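The per-cut-site count described above can be expressed as a small helper. This is a sketch on made-up rows, not the allele_table itself: the function name is hypothetical, the rows are plain dicts keyed by cut-site name, and the 0.30 cutoff is the approximate threshold mentioned in this thread.

```python
def missing_fraction(rows, cut_sites, missing_label="Missing"):
    """Fraction of rows whose entry for each cut site equals missing_label."""
    if not rows:
        return {site: 0.0 for site in cut_sites}
    return {
        site: sum(row[site] == missing_label for row in rows) / len(rows)
        for site in cut_sites
    }


# Made-up rows with two cut sites for illustration.
rows = [
    {"r1": "None", "r2": "Missing"},
    {"r1": "ACGT[5:2D]", "r2": "None"},
    {"r1": "None", "r2": "Missing"},
    {"r1": "None", "r2": "None"},
]
fractions = missing_fraction(rows, ["r1", "r2"])
flagged = [site for site, frac in fractions.items() if frac > 0.30]
print(fractions, flagged)
```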