mrtomrod / scoary-2
Calculate associations between genes and traits
License: MIT License
Hello @MrTomRod,
I have been trying to run Scoary2 for a long time now. For some reason, Scoary2 throws a "columns missing" error, even though I copied and pasted the isolate names from the traits file into the gene presence/absence file. Why is this happening? I have pasted the error below for reference.
Cheers,
Arun Sai.
cols_missing={'A58', 'A131', 'A49', 'A147', 'A134', 'A113', 'A46', 'A53', 'A65', 'A117', 'A41', 'A75', 'A25', 'A83', 'A92', 'A12', 'A56', 'A59', 'A118', 'A132', 'A150', 'A51', 'A97', 'A32', 'A119', 'A23', 'A112', 'A72', 'A61', 'A68', 'A136', 'A96', 'A105', 'A43', 'A22', 'A101', 'A129', 'A2', 'A94', 'A116', 'A145', 'A37', 'A104', 'A88', 'A143', 'A8', 'A86', 'A26', 'A62', 'A144', 'A57', 'A149', 'A13', 'A63', 'A67', 'A9', 'A29', 'A142', 'A3', 'A15', 'A33', 'A107', 'A93', 'A74', 'A100', 'A78', 'A31', 'A89', 'A84', 'A98', 'A16', 'A139', 'A111', 'A120', 'A39', 'A4', 'A146', 'A45', 'A151', 'A121', 'A99', 'A36', 'A20', 'A77', 'A27', 'A87', 'A52', 'A109', 'A55', 'A6', 'A11', 'A40', 'A110', 'A73', 'A124', 'A141', 'A80', 'A64', 'A28', 'A30', 'A130', 'A95', 'A90', 'A138', 'A5', 'A127', 'A66', 'A21', 'A85', 'A79', 'A137', 'A17', 'A91', 'A81', 'A103', 'A35', 'A14', 'A18', 'A19', 'A54', 'A24', 'A133', 'A1', 'A44', 'A114', 'A82', 'A140', 'A106', 'A42', 'A125', 'A70', 'A38', 'A108', 'A47', 'A115', 'A122', 'A48', 'A102', 'A71', 'A10', 'A76', 'A69', 'A50', 'A34', 'A148', 'A126', 'A135', 'A7', 'A128', 'A60', 'A123'}
restrict_to={'A58', 'A131', 'A49', 'A147', 'A134', 'A113', 'A46', 'A53', 'A65', 'A117', 'A41', 'A75', 'A25', 'A83', 'A92', 'A12', 'A56', 'A59', 'A118', 'A132', 'A150', 'A51', 'A97', 'A32', 'A119', 'A23', 'A112', 'A72', 'A61', 'A68', 'A136', 'A96', 'A105', 'A43', 'A22', 'A101', 'A129', 'A2', 'A94', 'A116', 'A145', 'A37', 'A104', 'A88', 'A143', 'A8', 'A86', 'A26', 'A62', 'A144', 'A57', 'A149', 'A13', 'A63', 'A67', 'A9', 'A29', 'A142', 'A3', 'A15', 'A33', 'A107', 'A93', 'A74', 'A100', 'A78', 'A31', 'A89', 'A84', 'A98', 'A16', 'A139', 'A111', 'A120', 'A39', 'A4', 'A146', 'A45', 'A151', 'A121', 'A99', 'A36', 'A20', 'A77', 'A27', 'A87', 'A52', 'A109', 'A55', 'A6', 'A11', 'A40', 'A110', 'A73', 'A124', 'A141', 'A80', 'A64', 'A28', 'A30', 'A130', 'A95', 'A90', 'A138', 'A5', 'A127', 'A66', 'A21', 'A85', 'A79', 'A137', 'A17', 'A91', 'A81', 'A103', 'A35', 'A14', 'A18', 'A19', 'A54', 'A24', 'A133', 'A1', 'A44', 'A114', 'A82', 'A140', 'A106', 'A42', 'A125', 'A70', 'A38', 'A108', 'A47', 'A115', 'A122', 'A48', 'A102', 'A71', 'A10', 'A76', 'A69', 'A50', 'A34', 'A148', 'A126', 'A135', 'A7', 'A128', 'A60', 'A123'}
have_cols=set()
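An empty have_cols next to a full cols_missing suggests that no isolate column was recognized in the genes file at all, which often points to a delimiter mismatch rather than misspelled names. This is a guess, not a confirmed diagnosis. A minimal sketch for checking it outside Scoary2, using only the standard library; the function name and the sample strings are hypothetical:

```python
import csv
import io


def compare_isolates(genes_text, traits_text, genes_delim=",", traits_delim=","):
    """Return (missing_from_genes, genes_columns) for two delimited tables.

    Isolates are expected as column names in the genes table and as the
    first field of each data row in the traits table.
    """
    genes_header = next(csv.reader(io.StringIO(genes_text), delimiter=genes_delim))
    genes_cols = {name.strip() for name in genes_header[1:]}
    trait_rows = list(csv.reader(io.StringIO(traits_text), delimiter=traits_delim))
    trait_ids = {row[0].strip() for row in trait_rows[1:] if row}
    return trait_ids - genes_cols, genes_cols


genes = "Gene\tA1\tA2\nabc\t1\t0\n"          # tab-separated table
traits = "Isolate,resistant\nA1,1\nA2,0\n"   # comma-separated table

# Parsed with the wrong delimiter, the genes header is one single field,
# so every isolate from the traits file is reported as missing.
missing, cols = compare_isolates(genes, traits, genes_delim=",")
print(sorted(missing), cols)

# Parsed with the matching delimiter, nothing is missing.
missing, cols = compare_isolates(genes, traits, genes_delim="\t")
print(sorted(missing), sorted(cols))
```

If the first call reproduces the situation in the error (all names missing, no columns found), the delimiter passed to Scoary2 is worth checking before the names themselves.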
Hi sir, I hope you are doing great. Could you please help me with this:
(scoary-env) d@dpc:~/Documents/IMSAR/scoary$ scoary2 --genes gene_presence_absence.csv --traits traits.csv --outdir ./caca
Welcome to Scoary2! (0.0.11)
Loading traits...
Traceback (most recent call last):
File "/home/d/scoary-env/bin/scoary2", line 8, in <module>
sys.exit(main())
File "/home/d/scoary-env/lib/python3.10/site-packages/scoary/scoary.py", line 289, in main
fire.Fire(scoary)
File "/home/d/scoary-env/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/home/d/scoary-env/lib/python3.10/site-packages/fire/core.py", line 466, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/home/d/scoary-env/lib/python3.10/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home/d/scoary-env/lib/python3.10/site-packages/scoary/scoary.py", line 88, in scoary
numeric_df, traits_df = load_traits(
File "/home/d/scoary-env/lib/python3.10/site-packages/scoary/load_traits.py", line 410, in load_traits
traits_df = load_binary(
File "/home/d/scoary-env/lib/python3.10/site-packages/scoary/load_traits.py", line 58, in load_binary
assert traits_df.index.is_unique, f'{traits=}: index not unique'
AssertionError: traits='traits.csv': index not unique
Thanks, cheers!
Greetings from Chile, South America.
By the way, these are my input files:
gene_presence_absence.csv
traits.csv
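The assertion says that the first column of traits.csv (the isolate names) contains repeated values. A minimal standard-library sketch for listing the repeated names; the function name and sample data are hypothetical, and empty names (for example from trailing rows that contain only delimiters) are reported as well:

```python
import csv
import io
from collections import Counter


def duplicated_row_names(text, delimiter=","):
    """Return the row names that occur more than once in the first column."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    names = [row[0].strip() for row in rows[1:] if row]  # skip the header row
    return sorted(name for name, count in Counter(names).items() if count > 1)


sample = "Isolate,trait1\nS1,1\nS2,0\nS1,0\n,1\n,0\n"
print(duplicated_row_names(sample))  # ['', 'S1']
```

To run it on a real file, the text can be read with open("traits.csv", newline="").read() and passed to the function.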
We are trying to process a gene_presence_absence.csv file from Roary with Scoary2. Previously, we were using Scoary (v1) and were able to get results (albeit with a few errors in the log file), whereas with Scoary2, the exact same command is failing.
Here are the versions we are using for each of these packages:
Scoary: 1.6.16
Scoary2: 0.0.15
Roary: 3.13.0
Here is the command we have been using with Scoary (v1):
scoary --genes roary_output/85/gene_presence_absence.csv \
--traits traits.csv \
--outdir scoary1_test
The process completes successfully, although it does print the following error several times:
ERROR: Some isolates in your gene presence absence file were not represented in your traits file. These will count as MISSING data and will not be included.
But this does not prevent us from getting results for isolates that were not missing, so I consider this to be acceptable.
The Scoary2 usage guide suggests that we should be able to use the exact same command with the same inputs for Scoary2, so here is what we are running:
scoary2 --genes roary_output/85/gene_presence_absence.csv \
--traits traits.csv \
--outdir scoary2_test
This is failing with the following trace:
Loading traits...
Loading genes...
/home/ndusek/miniconda3/envs/scoary2/lib/python3.10/site-packages/scoary/load_genes.py:45: DtypeWarning: Columns (15,21,22,24,26,28,30,36,39,43,60,66,72,73,74,77,83,86,90,92,94,101,108,112,119,124,125,128,135,149,150,152,154,155,160,172,173,176,177,178,179,180,183) have mixed types. Specify dtype option on import or set low_memory=False.
count_df = pd.read_csv(path, delimiter=delimiter, index_col=0)
Welcome to Scoary2! (0.0.15)
Traceback (most recent call last):
File "/home/ndusek/miniconda3/envs/scoary2/bin/scoary2", line 8, in <module>
sys.exit(main())
File "/home/ndusek/miniconda3/envs/scoary2/lib/python3.10/site-packages/scoary/scoary.py", line 380, in main
fire.Fire(scoary)
File "/home/ndusek/miniconda3/envs/scoary2/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/home/ndusek/miniconda3/envs/scoary2/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/home/ndusek/miniconda3/envs/scoary2/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home/ndusek/miniconda3/envs/scoary2/lib/python3.10/site-packages/scoary/scoary.py", line 132, in scoary
genes_orig_df, genes_bool_df = load_genes(
File "/home/ndusek/miniconda3/envs/scoary2/lib/python3.10/site-packages/scoary/load_genes.py", line 146, in load_genes
genes_orig_df, genes_bool_df = load_gene_count_file(genes, delimiter, restrict_to, ignore)
File "/home/ndusek/miniconda3/envs/scoary2/lib/python3.10/site-packages/scoary/load_genes.py", line 54, in load_gene_count_file
assert not count_df.isna().values.any(), f'{path=}: contains NaN'
AssertionError: path='roary_output/85/gene_presence_absence.csv': contains NaN
The error "contains NaN" is clear enough, but I don't understand why Scoary2 would be complaining about this all of a sudden when the original Scoary had no problem with it.
Any idea what's going on here?
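One possible explanation, offered as a guess rather than a confirmed diagnosis: Roary leaves a cell empty when an isolate lacks a gene, pandas reads empty cells as NaN, and the traceback goes through load_gene_count_file, which suggests the file is being parsed as a numeric count table. It may be worth checking in the Scoary2 documentation whether a different --gene-data-type value is meant for Roary output. A small standard-library sketch to see which columns contain empty cells (the function name and the simplified sample table are made up):

```python
import csv
import io


def empty_cells_per_column(text, delimiter=","):
    """Count empty cells per column of a delimited table."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    counts = {name: 0 for name in header}
    for row in reader:
        for name, cell in zip(header, row):
            if cell.strip() == "":
                counts[name] += 1
    # Report only the columns that actually have empty cells.
    return {name: n for name, n in counts.items() if n > 0}


# Hypothetical excerpt in the style of a Roary table: isolate columns hold
# gene identifiers, and a cell is empty when the isolate lacks the gene.
sample = (
    "Gene,Annotation,iso1,iso2\n"
    "geneA,kinase,iso1_00010,iso2_00420\n"
    "geneB,,iso1_00011,\n"
)
print(empty_cells_per_column(sample))  # {'Annotation': 1, 'iso2': 1}
```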
Hi Tom,
I finally got around to working with Scoary-2.
I'm following your example command (using Docker on a Mac) in the tutorial but keep getting an error message stating that my traits file can't be found. The traits and genes files are in the working directory. Interestingly, and maybe this is on purpose, it seems that scoary looks for the traits file first; if I use a non-existent file name for the genes file, it will still call the traits file missing first. Below is the command that I used:
docker run --rm -v /Users/[Username]/Desktop/Scoary2/ troder/scoary-2 scoary --genes GeneCount_Scoary_Ecoli.tsv --gene-data-type 'gene-count:\t' --traits Ecoli_traits.tsv --trait-data-type 'gaussian:kmeans:\t' --outdir Output --n_permut 500 --n_cpus 1
Note that I also tried the above command with "/Users/[Username]/Desktop/Scoary2/", "/Scoary2/", and "./" prepended to the respective file names and the output directory.
Below is the error message, and I've also attached the files. Thanks in advance for your help!
Traceback (most recent call last):
File "/usr/local/bin/scoary", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.10/site-packages/scoary/scoary.py", line 283, in main
fire.Fire(scoary)
File "/usr/local/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/usr/local/lib/python3.10/site-packages/fire/core.py", line 466, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/usr/local/lib/python3.10/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/usr/local/lib/python3.10/site-packages/scoary/scoary.py", line 86, in scoary
numeric_df, traits_df = load_traits(
File "/usr/local/lib/python3.10/site-packages/scoary/load_traits.py", line 422, in load_traits
numeric_df = load_numeric(
File "/usr/local/lib/python3.10/site-packages/scoary/load_traits.py", line 85, in load_numeric
numeric_df = pd.read_csv(traits, delimiter=delimiter, index_col=0, dtype=dtypes, na_values=STR_NA_VALUES)
File "/usr/local/lib/python3.10/site-packages/pandas/util/_decorators.py", line 311, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 680, in read_csv
return _read(filepath_or_buffer, kwds)
File "/usr/local/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 575, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/usr/local/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 934, in __init__
self._engine = self._make_engine(f, self.engine)
File "/usr/local/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1218, in _make_engine
self.handles = get_handle( # type: ignore[call-overload]
File "/usr/local/lib/python3.10/site-packages/pandas/io/common.py", line 786, in get_handle
handle = open(
FileNotFoundError: [Errno 2] No such file or directory: 'Ecoli_traits.tsv'
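In the command above, the -v argument gives only a host path. Docker bind mounts are normally written as host_path:container_path, so the files are probably not visible at any known location inside the container. A minimal sketch that assembles the command with a bind mount and a working directory; the mount point /data is an arbitrary assumption, the function name is hypothetical, and the Scoary arguments are copied from the command above:

```python
import shlex


def docker_scoary_command(host_dir, scoary_args, image="troder/scoary-2", mount_point="/data"):
    """Build a `docker run` argument list with a bind mount and working dir.

    mount_point is an arbitrary path inside the container (an assumption,
    not something the Scoary2 image prescribes).
    """
    return [
        "docker", "run", "--rm",
        "-v", f"{host_dir}:{mount_point}",  # host path : container path
        "-w", mount_point,                  # relative file names resolve here
        image,
        *scoary_args,
    ]


cmd = docker_scoary_command(
    "/Users/[Username]/Desktop/Scoary2",
    ["scoary", "--genes", "GeneCount_Scoary_Ecoli.tsv",
     "--gene-data-type", "gene-count:\\t",
     "--traits", "Ecoli_traits.tsv",
     "--trait-data-type", "gaussian:kmeans:\\t",
     "--outdir", "Output", "--n_permut", "500", "--n_cpus", "1"],
)
print(shlex.join(cmd))  # shell-quoted command line
```

With the working directory set to the mount point, the relative file names and the Output directory should resolve inside the mounted folder, so results end up on the host as well.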
Hello! I have Roary output data (gene_presence_absence.csv) for about 450 isolates belonging to a single species. I also have a traits file, but the traits are continuous rather than binary. The preprint says that Scoary2 can work with continuous traits. How should I format my traits file, and which flag tells Scoary2 that my data is continuous?
P.S.: Think of something like the color of a petal, where there is incomplete dominance. The flower can be Pink (Dominant), White (Recessive) or Yellow (Hybrid). How do I pass these as traits?
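A note on the P.S.: petal color as described is a categorical trait rather than a continuous one. One common, generic way to feed an unordered categorical trait to a tool that works with binary traits is to create one 0/1 column per category. This is a sketch of that general technique, not the author's recommendation; the function name, isolate names and phenotypes are hypothetical:

```python
import csv
import io


def one_hot_traits(colors, delimiter=","):
    """Turn {isolate: category} into a binary table, one column per category."""
    categories = sorted(set(colors.values()))
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerow(["Isolate", *categories])
    for isolate, color in colors.items():
        # 1 if the isolate has this category, 0 otherwise
        writer.writerow([isolate, *(int(color == c) for c in categories)])
    return out.getvalue()


# Hypothetical isolates and phenotypes.
print(one_hot_traits({"iso1": "Pink", "iso2": "White", "iso3": "Yellow"}))
```

For traits that really are numeric, another command on this page passes --trait-data-type 'gaussian:kmeans:\t' together with a tab-separated traits file, which may be a starting point.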
Hi,
I am trying to get started with Scoary2 using Singularity in an HPC environment. However, I am facing some issues:
$ cd SingularityImages/
$ singularity pull docker://troder/scoary-2
$ singularity run scoary-2_latest.sif scoary2 --help
Traceback (most recent call last):
File "/usr/local/bin/scoary2", line 5, in <module>
from scoary.scoary import main
File "/usr/local/lib/python3.10/site-packages/scoary/__init__.py", line 1, in <module>
from .scoary import scoary
File "/usr/local/lib/python3.10/site-packages/scoary/scoary.py", line 7, in <module>
from .analyze_trait import analyze_trait, worker
File "/usr/local/lib/python3.10/site-packages/scoary/analyze_trait.py", line 7, in <module>
from fast_fisher.fast_fisher_numba import odds_ratio, test1t as fisher_exact_two_tailed
File "/usr/local/lib/python3.10/site-packages/fast_fisher/fast_fisher_numba.py", line 5, in <module>
cc = CC('fast_fisher_compiled')
File "/usr/local/lib/python3.10/site-packages/numba/pycc/cc.py", line 65, in __init__
self._toolchain = Toolchain()
File "/usr/local/lib/python3.10/site-packages/numba/pycc/platform.py", line 78, in __init__
self._raise_external_compiler_error()
File "/usr/local/lib/python3.10/site-packages/numba/pycc/platform.py", line 121, in _raise_external_compiler_error
raise RuntimeError(msg)
RuntimeError: Attempted to compile AOT function without the compiler used by `numpy.distutils` present. If using conda try:
#> conda install gcc_linux-64 gxx_linux-64
Thank you for the help
Best regards,
Adrián
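The error is raised by numba while fast_fisher is being imported, and it reports that no C compiler was found. A small sketch to check, with the Python interpreter inside the container, whether any compiler is on the PATH; the function name is hypothetical and the list of compiler names is an assumption about what the toolchain looks for:

```python
import shutil


def find_compilers(candidates=("cc", "gcc", "g++", "clang")):
    """Map each candidate compiler name to its location on PATH, or None."""
    return {name: shutil.which(name) for name in candidates}


found = find_compilers()
for name, path in found.items():
    print(f"{name}: {path or 'not found'}")
```

If every entry prints "not found", the message from numba is consistent with the image simply lacking a compiler at run time.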
Hi,
I am testing out Scoary-2 on some data I have previously used Scoary with, and I am getting the following error while it is running (but it does keep going)
Process Process-12:
Traceback (most recent call last):
File "/home/ubuntu/scratch/software/miniforge3/envs/scoary2/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/home/ubuntu/scratch/software/miniforge3/envs/scoary2/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/ubuntu/scratch/software/miniforge3/envs/scoary2/lib/python3.10/site-packages/scoary/analyze_trait.py", line 48, in worker
local_result_container[trait] = analyze_trait_fn(trait, new_ns, proc_id)
File "/home/ubuntu/scratch/software/miniforge3/envs/scoary2/lib/python3.10/site-packages/scoary/analyze_trait.py", line 165, in analyze_trait_step_2_pairpicking
result_df['empirical_p'] = permute_picking(
File "/home/ubuntu/scratch/software/miniforge3/envs/scoary2/lib/python3.10/site-packages/scoary/permutations.py", line 97, in permute_picking
CONFINT_CACHE.set(unique_topology, n_pos_assoc, n_permut, permuted_estimators)
File "/home/ubuntu/scratch/software/miniforge3/envs/scoary2/lib/python3.10/site-packages/scoary/permutations.py", line 36, in set
self.cur.execute(
sqlite3.OperationalError: database is locked
This appears multiple times, for different processes, while pair picking is running.
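"database is locked" is what SQLite reports when one process tries to write while another holds the write lock for longer than the connection's timeout. A generic sketch of how such writes can be made more tolerant, with a longer timeout and a few retries; this is not how Scoary2's CONFINT_CACHE is implemented, and the function, table and file names are hypothetical:

```python
import os
import sqlite3
import tempfile
import time


def execute_with_retry(db_path, sql, params=(), retries=5, wait=0.5, timeout=30.0):
    """Run one write statement, retrying while the database is locked.

    Returns the number of failed attempts before the statement succeeded.
    """
    for attempt in range(retries):
        try:
            # timeout: seconds SQLite waits for a lock before giving up
            con = sqlite3.connect(db_path, timeout=timeout)
            try:
                with con:  # commits on success, rolls back on error
                    con.execute(sql, params)
                return attempt
            finally:
                con.close()
        except sqlite3.OperationalError as err:
            if "locked" not in str(err) or attempt == retries - 1:
                raise
            time.sleep(wait * (attempt + 1))  # back off a little more each time


db = os.path.join(tempfile.mkdtemp(), "cache.db")
execute_with_retry(db, "CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, value TEXT)")
execute_with_retry(db, "INSERT OR REPLACE INTO cache VALUES (?, ?)", ("topology", "0.05"))
```

Since the run keeps going, the failed writes may only affect a cache, but that is an assumption based on the name CONFINT_CACHE in the traceback.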
Hi Thomas,
I'm hoping to use Scoary2, but during pip installation I am getting the following error:
~ pip install scoary-2
ERROR: Ignored the following versions that require a different python version: 0.0.10 Requires-Python >=3.10,<3.11; 0.0.11 Requires-Python >=3.10,<3.11; 0.0.12 Requires-Python >=3.10,<3.11; 0.0.13 Requires-Python >=3.10,<3.11; 0.0.3 Requires-Python >=3.10,<3.11; 0.0.4 Requires-Python >=3.10,<3.11; 0.0.5 Requires-Python >=3.10,<3.11; 0.0.6 Requires-Python >=3.10,<3.11; 0.0.7 Requires-Python >=3.10,<3.11; 0.0.9 Requires-Python >=3.10,<3.11
ERROR: Could not find a version that satisfies the requirement scoary-2 (from versions: none)
ERROR: No matching distribution found for scoary-2
Any advice on how to proceed with python versions? Thanks for your time!
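All releases listed in the message declare Requires-Python >=3.10,<3.11, so pip skips them under any other interpreter, for example 3.9 or 3.11. A quick sketch to check the interpreter that pip is running under; the function name is made up, and newer releases may declare a different range:

```python
import sys


def satisfies_requires_python(version=None, minimum=(3, 10), below=(3, 11)):
    """True if version falls in [minimum, below), i.e. '>=3.10,<3.11'."""
    major_minor = tuple((version or sys.version_info)[:2])
    return minimum <= major_minor < below


print(sys.version.split()[0], satisfies_requires_python())
```

If this prints False, creating a separate environment with Python 3.10 (for example with conda or venv) and running pip install scoary-2 there is one way forward.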
When trying to open the overview.html file to view results using the web app generated by Scoary2, I am seeing the following error:
Failed to load overview_plot.svg
This seems to be related to the following error message in the console:
Access to fetch at 'file:///Users/ndusek/Downloads/75_2/overview_plot.svg' from origin 'null' has been blocked by CORS policy: Cross origin requests are only supported for protocol schemes: http, data, isolated-app, chrome-extension, chrome, https, chrome-untrusted.
The above message is also shown for several other assets that cannot be served successfully. For reference, I am trying all this in Google Chrome 124.0.6367.119. Happy to provide any additional information that would be helpful.
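Browsers block fetch requests for file:// URLs, which matches the console message. A common workaround is to serve the output folder over HTTP from a local server and open the page through it. A minimal sketch using only the Python standard library; the function name and the port are arbitrary choices:

```python
import functools
import http.server
import threading


def serve_directory(directory, port=8000):
    """Serve `directory` over HTTP on localhost in a background thread."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    # Daemon thread: the server stops when the main program exits.
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Calling serve_directory with the Scoary2 output folder from an interactive Python session and then opening http://127.0.0.1:8000/overview.html should let the page load its assets; server.shutdown() stops it again. Running python3 -m http.server 8000 from inside the output folder achieves the same without any code.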