mrtomrod / scoary-2

Calculate associations between genes and traits

License: MIT License

Python 99.69% Dockerfile 0.31%

scoary-2's Issues

Scoary2 has problem with headers

Hello @MrTomRod,
I have been trying to run Scoary2 for a long time now. For some reason, it fails with a "columns missing" error, even though I copied the names from the traits file directly into the gene presence/absence file. Why is this happening? I have pasted the error below for reference.

Cheers,
Arun Sai.

cols_missing={'A58', 'A131', 'A49', 'A147', 'A134', 'A113', 'A46', 'A53', 'A65', 'A117', 'A41', 'A75', 'A25', 'A83', 'A92', 'A12', 'A56', 'A59', 'A118', 'A132', 'A150', 'A51', 'A97', 'A32', 'A119', 'A23', 'A112', 'A72', 'A61', 'A68', 'A136', 'A96', 'A105', 'A43', 'A22', 'A101', 'A129', 'A2', 'A94', 'A116', 'A145', 'A37', 'A104', 'A88', 'A143', 'A8', 'A86', 'A26', 'A62', 'A144', 'A57', 'A149', 'A13', 'A63', 'A67', 'A9', 'A29', 'A142', 'A3', 'A15', 'A33', 'A107', 'A93', 'A74', 'A100', 'A78', 'A31', 'A89', 'A84', 'A98', 'A16', 'A139', 'A111', 'A120', 'A39', 'A4', 'A146', 'A45', 'A151', 'A121', 'A99', 'A36', 'A20', 'A77', 'A27', 'A87', 'A52', 'A109', 'A55', 'A6', 'A11', 'A40', 'A110', 'A73', 'A124', 'A141', 'A80', 'A64', 'A28', 'A30', 'A130', 'A95', 'A90', 'A138', 'A5', 'A127', 'A66', 'A21', 'A85', 'A79', 'A137', 'A17', 'A91', 'A81', 'A103', 'A35', 'A14', 'A18', 'A19', 'A54', 'A24', 'A133', 'A1', 'A44', 'A114', 'A82', 'A140', 'A106', 'A42', 'A125', 'A70', 'A38', 'A108', 'A47', 'A115', 'A122', 'A48', 'A102', 'A71', 'A10', 'A76', 'A69', 'A50', 'A34', 'A148', 'A126', 'A135', 'A7', 'A128', 'A60', 'A123'}
restrict_to={'A58', 'A131', 'A49', 'A147', 'A134', 'A113', 'A46', 'A53', 'A65', 'A117', 'A41', 'A75', 'A25', 'A83', 'A92', 'A12', 'A56', 'A59', 'A118', 'A132', 'A150', 'A51', 'A97', 'A32', 'A119', 'A23', 'A112', 'A72', 'A61', 'A68', 'A136', 'A96', 'A105', 'A43', 'A22', 'A101', 'A129', 'A2', 'A94', 'A116', 'A145', 'A37', 'A104', 'A88', 'A143', 'A8', 'A86', 'A26', 'A62', 'A144', 'A57', 'A149', 'A13', 'A63', 'A67', 'A9', 'A29', 'A142', 'A3', 'A15', 'A33', 'A107', 'A93', 'A74', 'A100', 'A78', 'A31', 'A89', 'A84', 'A98', 'A16', 'A139', 'A111', 'A120', 'A39', 'A4', 'A146', 'A45', 'A151', 'A121', 'A99', 'A36', 'A20', 'A77', 'A27', 'A87', 'A52', 'A109', 'A55', 'A6', 'A11', 'A40', 'A110', 'A73', 'A124', 'A141', 'A80', 'A64', 'A28', 'A30', 'A130', 'A95', 'A90', 'A138', 'A5', 'A127', 'A66', 'A21', 'A85', 'A79', 'A137', 'A17', 'A91', 'A81', 'A103', 'A35', 'A14', 'A18', 'A19', 'A54', 'A24', 'A133', 'A1', 'A44', 'A114', 'A82', 'A140', 'A106', 'A42', 'A125', 'A70', 'A38', 'A108', 'A47', 'A115', 'A122', 'A48', 'A102', 'A71', 'A10', 'A76', 'A69', 'A50', 'A34', 'A148', 'A126', 'A135', 'A7', 'A128', 'A60', 'A123'}
have_cols=set()
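
One possible reading of the output above: `have_cols=set()` suggests that no isolate columns were recognised in the genes file at all, which is what happens when a file is parsed with the wrong delimiter. The sketch below uses only the standard library to show that effect. The function names and the sample data are hypothetical, and it assumes the genes file lists isolates in its header row while the traits file lists them in its first column.

```python
import csv
import io


def header_columns(text, delimiter):
    """Return the header names of a table, excluding the first (index) column."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    return [name.strip() for name in header[1:]]


def compare_isolates(genes_text, traits_text, genes_delim=",", traits_delim=","):
    """Return (isolates missing from the genes header, columns found in the genes header)."""
    gene_cols = set(header_columns(genes_text, genes_delim))
    reader = csv.reader(io.StringIO(traits_text), delimiter=traits_delim)
    next(reader)  # skip the traits header row
    isolates = {row[0].strip() for row in reader if row}
    return sorted(isolates - gene_cols), sorted(gene_cols)


# Hypothetical sample data: a tab-separated genes table and a comma-separated traits table.
genes_tsv = "Gene\tA1\tA2\ng1\t1\t0\n"
traits_csv = "Isolate,trait\nA1,1\nA2,0\n"

# Parsed with the wrong delimiter, the whole header is a single field, so no columns are found.
missing_wrong, found_wrong = compare_isolates(genes_tsv, traits_csv, genes_delim=",")
# Parsed with the right delimiter, every isolate is found.
missing_right, found_right = compare_isolates(genes_tsv, traits_csv, genes_delim="\t")

print(missing_wrong, found_wrong)
print(missing_right, found_right)
```

If the second call reports no missing isolates on the real files, the delimiter passed to Scoary2 is the first thing worth checking.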

AssertionError: traits='traits.csv': index not unique

Hi, I hope you are doing well. Could you please help me with this error?

(scoary-env) d@dpc:~/Documents/IMSAR/scoary$ scoary2 --genes gene_presence_absence.csv --traits traits.csv --outdir ./caca
Welcome to Scoary2! (0.0.11)
Loading traits...
Traceback (most recent call last):
  File "/home/d/scoary-env/bin/scoary2", line 8, in <module>
    sys.exit(main())
  File "/home/d/scoary-env/lib/python3.10/site-packages/scoary/scoary.py", line 289, in main
    fire.Fire(scoary)
  File "/home/d/scoary-env/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/d/scoary-env/lib/python3.10/site-packages/fire/core.py", line 466, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/d/scoary-env/lib/python3.10/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/d/scoary-env/lib/python3.10/site-packages/scoary/scoary.py", line 88, in scoary
    numeric_df, traits_df = load_traits(
  File "/home/d/scoary-env/lib/python3.10/site-packages/scoary/load_traits.py", line 410, in load_traits
    traits_df = load_binary(
  File "/home/d/scoary-env/lib/python3.10/site-packages/scoary/load_traits.py", line 58, in load_binary
    assert traits_df.index.is_unique, f'{traits=}: index not unique'
AssertionError: traits='traits.csv': index not unique

Thanks, cheers!

Greetings from Chile, South America

By the way, these are my input files:
gene_presence_absence.csv
traits.csv
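
The assertion in the traceback above fires when the first column of the traits file, which is used as the index, contains the same value more than once. A minimal standard-library sketch for listing such values follows; the function name and sample data are hypothetical, and it assumes a single header row.

```python
import csv
import io
from collections import Counter


def duplicated_index_values(text, delimiter=","):
    """Return {value: count} for first-column values that occur more than once."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    next(reader, None)  # skip the header row
    counts = Counter(row[0] for row in reader if row)
    return {name: n for name, n in counts.items() if n > 1}


# Hypothetical traits table in which isolate A1 appears twice.
sample = "Isolate,trait\nA1,1\nA2,0\nA1,1\n"
duplicates = duplicated_index_values(sample)
print(duplicates)
```

Rows that consist only of delimiters (for example trailing `,,` lines left behind by a spreadsheet export) show up here as a repeated empty string, which would also make the index non-unique.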

"contains NaN" error when running Scoary2 on gene_presence_absence.csv from Roary

We are trying to process a gene_presence_absence.csv file from Roary with Scoary2. Previously, we were using Scoary (v1) and were able to get results (albeit with a few errors in the log file), whereas with Scoary2, the exact same command is failing.

Here are the versions we are using for each of these packages:

Scoary: 1.6.16
Scoary2: 0.0.15
Roary: 3.13.0

Scoary (v1) results

Here is the command we have been using with Scoary (v1):

scoary --genes roary_output/85/gene_presence_absence.csv \
       --traits traits.csv \
       --outdir scoary1_test

The process completes successfully, although it does print the following error several times:

ERROR: Some isolates in your gene presence absence file were not represented in your traits file. These will count as MISSING data and will not be included.

But this does not prevent us from getting results for isolates that were not missing, so I consider this to be acceptable.

Scoary2 results

The Scoary2 usage guide suggests that we should be able to use the exact same command with the same inputs for Scoary2, so here is what we are running:

scoary2 --genes roary_output/85/gene_presence_absence.csv \
        --traits traits.csv \
        --outdir scoary2_test

This is failing with the following trace:

Loading traits...
Loading genes...
/home/ndusek/miniconda3/envs/scoary2/lib/python3.10/site-packages/scoary/load_genes.py:45: DtypeWarning: Columns (15,21,22,24,26,28,30,36,39,43,60,66,72,73,74,77,83,86,90,92,94,101,108,112,119,124,125,128,135,149,150,152,154,155,160,172,173,176,177,178,179,180,183) have mixed types. Specify dtype option on import or set low_memory=False.
  count_df = pd.read_csv(path, delimiter=delimiter, index_col=0)
Welcome to Scoary2! (0.0.15)
Traceback (most recent call last):
  File "/home/ndusek/miniconda3/envs/scoary2/bin/scoary2", line 8, in <module>
    sys.exit(main())
  File "/home/ndusek/miniconda3/envs/scoary2/lib/python3.10/site-packages/scoary/scoary.py", line 380, in main
    fire.Fire(scoary)
  File "/home/ndusek/miniconda3/envs/scoary2/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/ndusek/miniconda3/envs/scoary2/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/ndusek/miniconda3/envs/scoary2/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/ndusek/miniconda3/envs/scoary2/lib/python3.10/site-packages/scoary/scoary.py", line 132, in scoary
    genes_orig_df, genes_bool_df = load_genes(
  File "/home/ndusek/miniconda3/envs/scoary2/lib/python3.10/site-packages/scoary/load_genes.py", line 146, in load_genes
    genes_orig_df, genes_bool_df = load_gene_count_file(genes, delimiter, restrict_to, ignore)
  File "/home/ndusek/miniconda3/envs/scoary2/lib/python3.10/site-packages/scoary/load_genes.py", line 54, in load_gene_count_file
    assert not count_df.isna().values.any(), f'{path=}: contains NaN'
AssertionError: path='roary_output/85/gene_presence_absence.csv': contains NaN

The error "contains NaN" is clear enough, but I don't understand why Scoary2 would be complaining about this all of a sudden when the original Scoary had no problem with it.

Any idea what's going on here?
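
The assertion is raised in `load_gene_count_file`, so the file is apparently being read as a count table, and pandas turns every empty field into NaN when it parses a CSV. As a rough diagnostic, the sketch below counts empty cells per column using only the standard library. The function name and the sample rows are hypothetical; it only illustrates where NaN values could come from and does not reproduce Scoary2's loader.

```python
import csv
import io


def empty_cells_per_column(text, delimiter=","):
    """Return {column name: number of empty cells} for columns with at least one empty cell."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    counts = dict.fromkeys(header, 0)
    for row in reader:
        for name, value in zip(header, row):
            if value.strip() == "":
                counts[name] += 1
    return {name: n for name, n in counts.items() if n}


# Hypothetical presence/absence table: a gene that is absent from an isolate is an empty cell.
sample = "Gene,iso1,iso2\ngeneA,tag_1,\ngeneB,,tag_2\n"
empties = empty_cells_per_column(sample)
print(empties)
```

A table that marks absent genes with empty cells will report empties in almost every isolate column, which would be consistent with the "contains NaN" assertion.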

Trait file not found

Hi Tom,

I finally got around to working with Scoary-2.

I'm following your example command (using Docker on a Mac) in the tutorial but keep getting an error message stating that my traits file can't be found. The traits and genes files are in the working directory. Interestingly, and maybe this is on purpose, it seems that scoary looks for the traits file first; if I use a non-existent file name for the genes file, it will still call the traits file missing first. Below is the command that I used:

docker run --rm -v /Users/[Username]/Desktop/Scoary2/ troder/scoary-2 scoary --genes GeneCount_Scoary_Ecoli.tsv --gene-data-type 'gene-count:\t' --traits Ecoli_traits.tsv --trait-data-type 'gaussian:kmeans:\t' --outdir Output --n_permut 500 --n_cpus 1

Note that I also tried the above command with "/Users/[Username]/Desktop/Scoary2/", "/Scoary2/", and "./" added before the respective file names and the output directory.

Below is the error code and I've also attached the files. Thanks in advance for your help!

Traceback (most recent call last):
  File "/usr/local/bin/scoary", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/site-packages/scoary/scoary.py", line 283, in main
    fire.Fire(scoary)
  File "/usr/local/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/usr/local/lib/python3.10/site-packages/fire/core.py", line 466, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/usr/local/lib/python3.10/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/scoary/scoary.py", line 86, in scoary
    numeric_df, traits_df = load_traits(
  File "/usr/local/lib/python3.10/site-packages/scoary/load_traits.py", line 422, in load_traits
    numeric_df = load_numeric(
  File "/usr/local/lib/python3.10/site-packages/scoary/load_traits.py", line 85, in load_numeric
    numeric_df = pd.read_csv(traits, delimiter=delimiter, index_col=0, dtype=dtypes, na_values=STR_NA_VALUES)
  File "/usr/local/lib/python3.10/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 680, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/usr/local/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 575, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/usr/local/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 934, in __init__
    self._engine = self._make_engine(f, self.engine)
  File "/usr/local/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1218, in _make_engine
    self.handles = get_handle(  # type: ignore[call-overload]
  File "/usr/local/lib/python3.10/site-packages/pandas/io/common.py", line 786, in get_handle
    handle = open(
FileNotFoundError: [Errno 2] No such file or directory: 'Ecoli_traits.tsv'

Ecoli_traits.txt
GeneCount_Scoary_Ecoli.txt
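
In the command quoted above, `-v` receives only a host path, with no container path after a colon and no working directory, so relative file names may be resolved somewhere other than the mounted folder. The sketch below assembles a `docker run` argument list that binds the host folder to a container path and sets that path as the working directory. The host path `/path/to/Scoary2` and the mount point `/data` are placeholders, and the helper name is hypothetical; only the standard `docker run` options `--rm`, `-v`, and `-w` are used.

```python
import shlex


def docker_scoary_command(host_dir, scoary_args, image="troder/scoary-2", mount_point="/data"):
    """Build a docker run argument list that mounts host_dir and works inside it."""
    return [
        "docker", "run", "--rm",
        "-v", f"{host_dir}:{mount_point}",  # host folder -> container folder
        "-w", mount_point,                  # relative file names resolve here
        image,
        *scoary_args,
    ]


# Placeholder host path; the file names are the ones from the report above.
cmd = docker_scoary_command(
    "/path/to/Scoary2",
    ["scoary", "--genes", "GeneCount_Scoary_Ecoli.tsv",
     "--traits", "Ecoli_traits.tsv", "--outdir", "Output"],
)
print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list can be passed to `subprocess.run` unchanged.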

Continuous_traits

Hello! I have Roary output (gene_presence_absence.csv) for about 450 isolates belonging to a single species, along with a traits file. The traits are not binary but continuous. The preprint says that Scoary2 can work with continuous traits. How do I need to format my traits file, and which flag tells Scoary2 that my data is continuous?
P.S.: We are talking about something like the color of a petal, where there is incomplete dominance. The flower can be Pink (Dominant), White (Recessive) or Yellow (Hybrid). How do I pass these as traits?
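
A trait such as petal color is categorical rather than continuous, and one generic way to handle categories is to expand them into one binary column per level. The sketch below shows that encoding with the standard library only. The function name, the `is_` column prefix, and the sample isolates are hypothetical, and whether this is the approach the Scoary2 author would recommend is not stated anywhere on this page.

```python
def one_hot_encode(categories):
    """Turn {isolate: category} into a header and rows of 0/1 columns, one per category."""
    levels = sorted(set(categories.values()))
    header = ["Isolate"] + [f"is_{level}" for level in levels]
    rows = [
        [isolate] + [int(value == level) for level in levels]
        for isolate, value in categories.items()
    ]
    return header, rows


# Hypothetical isolates and petal colors.
colors = {"A1": "Pink", "A2": "White", "A3": "Yellow", "A4": "Pink"}
header, rows = one_hot_encode(colors)
print(header)
for row in rows:
    print(row)
```

Each resulting column can then be treated as an ordinary binary trait.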

Scoary2 won't run in HPC

Hi,

I am trying to get started with Scoary2 using Singularity in an HPC environment. However, I am facing some issues:

$ cd SingularityImages/
$ singularity pull docker://troder/scoary-2
$ singularity run scoary-2_latest.sif scoary2 --help
  Traceback (most recent call last):
    File "/usr/local/bin/scoary2", line 5, in <module>
      from scoary.scoary import main
    File "/usr/local/lib/python3.10/site-packages/scoary/__init__.py", line 1, in <module>
      from .scoary import scoary
    File "/usr/local/lib/python3.10/site-packages/scoary/scoary.py", line 7, in <module>
      from .analyze_trait import analyze_trait, worker
    File "/usr/local/lib/python3.10/site-packages/scoary/analyze_trait.py", line 7, in <module>
      from fast_fisher.fast_fisher_numba import odds_ratio, test1t as fisher_exact_two_tailed
    File "/usr/local/lib/python3.10/site-packages/fast_fisher/fast_fisher_numba.py", line 5, in <module>
      cc = CC('fast_fisher_compiled')
    File "/usr/local/lib/python3.10/site-packages/numba/pycc/cc.py", line 65, in __init__
      self._toolchain = Toolchain()
    File "/usr/local/lib/python3.10/site-packages/numba/pycc/platform.py", line 78, in __init__
      self._raise_external_compiler_error()
    File "/usr/local/lib/python3.10/site-packages/numba/pycc/platform.py", line 121, in _raise_external_compiler_error
      raise RuntimeError(msg)
  RuntimeError: Attempted to compile AOT function without the compiler used by `numpy.distutils` present. If using conda try:
  
  #> conda install gcc_linux-64 gxx_linux-64

Thank you for the help

Best regards,
Adrián
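
The error message above says that no compiler was found when the AOT function was being compiled. A quick way to see whether any common C compiler is visible on `PATH` inside the container is sketched below. The list of candidate names is an assumption and is not necessarily the set that numba itself searches for.

```python
import shutil


def find_c_compiler(candidates=("cc", "gcc", "clang"), which=shutil.which):
    """Return the path of the first compiler found on PATH, or None if none is found."""
    for name in candidates:
        path = which(name)
        if path:
            return path
    return None


print(find_c_compiler())
```

Running this with the Python interpreter inside the Singularity image shows whether the container sees a compiler at all.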

sqlite3.OperationalError: database is locked

Hi,

I am testing out Scoary-2 on some data I have previously used Scoary with, and I am getting the following error while it is running (but it does keep going)

Process Process-12:
Traceback (most recent call last):
  File "/home/ubuntu/scratch/software/miniforge3/envs/scoary2/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/home/ubuntu/scratch/software/miniforge3/envs/scoary2/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/scratch/software/miniforge3/envs/scoary2/lib/python3.10/site-packages/scoary/analyze_trait.py", line 48, in worker
    local_result_container[trait] = analyze_trait_fn(trait, new_ns, proc_id)
  File "/home/ubuntu/scratch/software/miniforge3/envs/scoary2/lib/python3.10/site-packages/scoary/analyze_trait.py", line 165, in analyze_trait_step_2_pairpicking
    result_df['empirical_p'] = permute_picking(
  File "/home/ubuntu/scratch/software/miniforge3/envs/scoary2/lib/python3.10/site-packages/scoary/permutations.py", line 97, in permute_picking
    CONFINT_CACHE.set(unique_topology, n_pos_assoc, n_permut, permuted_estimators)
  File "/home/ubuntu/scratch/software/miniforge3/envs/scoary2/lib/python3.10/site-packages/scoary/permutations.py", line 36, in set
    self.cur.execute(
sqlite3.OperationalError: database is locked

This appears multiple times for different processes while running pair picking
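
SQLite raises "database is locked" when one connection cannot obtain a lock that another connection is holding, which can happen when several worker processes write to the same cache file. A generic mitigation, sketched below, is to open the connection with a longer `timeout` and to retry writes that still fail. This is not Scoary2's code: the helper name, the table, and the column names are hypothetical.

```python
import sqlite3
import time


def execute_with_retry(conn, sql, params=(), retries=5, delay=0.1):
    """Run one statement, retrying with a growing pause while the database is locked."""
    for attempt in range(retries):
        try:
            with conn:  # commits on success, rolls back on error
                return conn.execute(sql, params)
        except sqlite3.OperationalError as exc:
            # Re-raise anything that is not a lock error, and the final failed attempt.
            if "locked" not in str(exc) or attempt == retries - 1:
                raise
            time.sleep(delay * (attempt + 1))


# timeout is the number of seconds SQLite itself waits for a lock before giving up.
conn = sqlite3.connect(":memory:", timeout=30.0)
execute_with_retry(conn, "CREATE TABLE cache (key TEXT PRIMARY KEY, value TEXT)")
execute_with_retry(conn, "INSERT INTO cache VALUES (?, ?)", ("topology", "[0.1, 0.2]"))
rows = execute_with_retry(conn, "SELECT value FROM cache WHERE key = ?", ("topology",)).fetchall()
print(rows)
```

The in-memory database is used here only so that the example runs anywhere; the locking problem itself only arises with a file shared between processes.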

to do

add metadata to dendrogram
cpd search feature
add dendrogram to trait view -> allow recoloring
in dendrogram: instead of only gene presence-absence, show gene count!
does Roary gene_presence_absence.csv include gene identifiers? (yes)

Installation with pip

Hi Thomas,

I'm hoping to use Scoary2 but during pip installation I am getting the following error:

~ pip install scoary-2
ERROR: Ignored the following versions that require a different python version: 0.0.10 Requires-Python >=3.10,<3.11; 0.0.11 Requires-Python >=3.10,<3.11; 0.0.12 Requires-Python >=3.10,<3.11; 0.0.13 Requires-Python >=3.10,<3.11; 0.0.3 Requires-Python >=3.10,<3.11; 0.0.4 Requires-Python >=3.10,<3.11; 0.0.5 Requires-Python >=3.10,<3.11; 0.0.6 Requires-Python >=3.10,<3.11; 0.0.7 Requires-Python >=3.10,<3.11; 0.0.9 Requires-Python >=3.10,<3.11
ERROR: Could not find a version that satisfies the requirement scoary-2 (from versions: none)
ERROR: No matching distribution found for scoary-2

Any advice on how to proceed with python versions? Thanks for your time!
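
The pip output above lists `Requires-Python >=3.10,<3.11` for every release it ignored, so the interpreter running pip is outside that range. The sketch below checks an interpreter version against those bounds. The function name is hypothetical, and the bounds are copied from the error message, so releases not listed there may have different requirements.

```python
import sys


def in_supported_range(version=None, minimum=(3, 10), below=(3, 11)):
    """Return True if the (major, minor) version satisfies minimum <= version < below."""
    major_minor = tuple((version or sys.version_info)[:2])
    return minimum <= major_minor < below


print(sys.version_info[:2], in_supported_range())
```

If this prints `False`, creating an environment with a Python 3.10 interpreter before running `pip install scoary-2` should let pip see the listed releases.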

Failed to load overview_plot.svg

When trying to open the overview.html file to view results using the web app generated by Scoary2, I am seeing the following error:

Failed to load overview_plot.svg

This seems to be related to the following error message in the console:

Access to fetch at 'file:///Users/ndusek/Downloads/75_2/overview_plot.svg' from origin 'null' has been blocked by CORS policy: Cross origin requests are only supported for protocol schemes: http, data, isolated-app, chrome-extension, chrome, https, chrome-untrusted.

The above message is also shown for several other assets that cannot be served successfully. For reference, I am trying all this in Google Chrome 124.0.6367.119. Happy to provide any additional information that would be helpful.
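
The console message says that the page was opened from a `file://` URL, for which the browser blocks `fetch` requests. A common workaround is to serve the output folder over HTTP on the local machine. The sketch below does this with Python's standard `http.server` module. The helper name and port are arbitrary, and the folder name in the usage comment is a placeholder.

```python
import functools
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_server(directory, port=8000):
    """Create a local-only HTTP server that serves the files in directory."""
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    return ThreadingHTTPServer(("127.0.0.1", port), handler)


# Usage (blocks until interrupted with Ctrl+C); the folder name is a placeholder:
#     make_server("path/to/scoary2_output", 8000).serve_forever()
# Then open http://127.0.0.1:8000/overview.html in the browser.
```

Because the server binds to 127.0.0.1, the results are only reachable from the same machine.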
