
scoary-2's People

Contributors

mrtomrod

scoary-2's Issues

Scoary2 has a problem with headers

Hello @MrTomRod,
I have been trying to run Scoary2 for a long time now. For some reason, it keeps throwing a "columns missing" error, even though I literally copy-pasted the names from the traits file into the gene presence/absence file. Why is this happening? I have pasted the error below for reference.

Cheers,
Arun Sai.

cols_missing={'A58', 'A131', 'A49', 'A147', 'A134', 'A113', 'A46', 'A53', 'A65', 'A117', 'A41', 'A75', 'A25', 'A83', 'A92', 'A12', 'A56', 'A59', 'A118', 'A132', 'A150', 'A51', 'A97', 'A32', 'A119', 'A23', 'A112', 'A72', 'A61', 'A68', 'A136', 'A96', 'A105', 'A43', 'A22', 'A101', 'A129', 'A2', 'A94', 'A116', 'A145', 'A37', 'A104', 'A88', 'A143', 'A8', 'A86', 'A26', 'A62', 'A144', 'A57', 'A149', 'A13', 'A63', 'A67', 'A9', 'A29', 'A142', 'A3', 'A15', 'A33', 'A107', 'A93', 'A74', 'A100', 'A78', 'A31', 'A89', 'A84', 'A98', 'A16', 'A139', 'A111', 'A120', 'A39', 'A4', 'A146', 'A45', 'A151', 'A121', 'A99', 'A36', 'A20', 'A77', 'A27', 'A87', 'A52', 'A109', 'A55', 'A6', 'A11', 'A40', 'A110', 'A73', 'A124', 'A141', 'A80', 'A64', 'A28', 'A30', 'A130', 'A95', 'A90', 'A138', 'A5', 'A127', 'A66', 'A21', 'A85', 'A79', 'A137', 'A17', 'A91', 'A81', 'A103', 'A35', 'A14', 'A18', 'A19', 'A54', 'A24', 'A133', 'A1', 'A44', 'A114', 'A82', 'A140', 'A106', 'A42', 'A125', 'A70', 'A38', 'A108', 'A47', 'A115', 'A122', 'A48', 'A102', 'A71', 'A10', 'A76', 'A69', 'A50', 'A34', 'A148', 'A126', 'A135', 'A7', 'A128', 'A60', 'A123'}
restrict_to={'A58', 'A131', 'A49', 'A147', 'A134', 'A113', 'A46', 'A53', 'A65', 'A117', 'A41', 'A75', 'A25', 'A83', 'A92', 'A12', 'A56', 'A59', 'A118', 'A132', 'A150', 'A51', 'A97', 'A32', 'A119', 'A23', 'A112', 'A72', 'A61', 'A68', 'A136', 'A96', 'A105', 'A43', 'A22', 'A101', 'A129', 'A2', 'A94', 'A116', 'A145', 'A37', 'A104', 'A88', 'A143', 'A8', 'A86', 'A26', 'A62', 'A144', 'A57', 'A149', 'A13', 'A63', 'A67', 'A9', 'A29', 'A142', 'A3', 'A15', 'A33', 'A107', 'A93', 'A74', 'A100', 'A78', 'A31', 'A89', 'A84', 'A98', 'A16', 'A139', 'A111', 'A120', 'A39', 'A4', 'A146', 'A45', 'A151', 'A121', 'A99', 'A36', 'A20', 'A77', 'A27', 'A87', 'A52', 'A109', 'A55', 'A6', 'A11', 'A40', 'A110', 'A73', 'A124', 'A141', 'A80', 'A64', 'A28', 'A30', 'A130', 'A95', 'A90', 'A138', 'A5', 'A127', 'A66', 'A21', 'A85', 'A79', 'A137', 'A17', 'A91', 'A81', 'A103', 'A35', 'A14', 'A18', 'A19', 'A54', 'A24', 'A133', 'A1', 'A44', 'A114', 'A82', 'A140', 'A106', 'A42', 'A125', 'A70', 'A38', 'A108', 'A47', 'A115', 'A122', 'A48', 'A102', 'A71', 'A10', 'A76', 'A69', 'A50', 'A34', 'A148', 'A126', 'A135', 'A7', 'A128', 'A60', 'A123'}
have_cols=set()
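The log ends with `have_cols=set()`, i.e. none of the trait names were found among the gene-table columns. Two common causes (assumptions, not confirmed in this thread) are a delimiter mismatch, which collapses the whole header into one field, or invisible characters such as a byte-order mark or trailing spaces in the header. A minimal stdlib sketch for checking this; `header_columns` and `compare_samples` are hypothetical helpers, not part of Scoary2:

```python
import csv
import io


def header_columns(text: str, delimiter: str = ",") -> list[str]:
    """Return the raw header fields, with no cleanup, so hidden characters stay visible."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return next(reader, [])


def compare_samples(trait_names: set[str], gene_header: list[str]) -> dict[str, set[str]]:
    """Split trait sample names into those found (exact match) in the gene header and those missing."""
    found = trait_names & set(gene_header)
    return {"have": found, "missing": trait_names - found}


if __name__ == "__main__":
    # A tab-separated table read as comma-separated yields a single header field,
    # so no sample name matches and "have" comes back empty.
    genes_tsv = "Gene\tA1\tA2\ncluster_1\t1\t0\n"
    print(compare_samples({"A1", "A2"}, header_columns(genes_tsv, delimiter=",")))
    print(compare_samples({"A1", "A2"}, header_columns(genes_tsv, delimiter="\t")))
    # Printing the list shows each field's repr, exposing e.g. '\ufeff' or 'A1 '.
    print(header_columns("\ufeffGene,A1 ,A2\n"))
```

Printing `header_columns(...)` on the real file makes stray characters visible, because a list prints the `repr` of each field.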

AssertionError: traits='traits.csv': index not unique

Hi sir, I hope you are doing great. Could you please help me with the following?

(scoary-env) d@dpc:~/Documents/IMSAR/scoary$ scoary2 --genes gene_presence_absence.csv --traits traits.csv --outdir ./caca
Welcome to Scoary2! (0.0.11)
Loading traits...
Traceback (most recent call last):
  File "/home/d/scoary-env/bin/scoary2", line 8, in <module>
    sys.exit(main())
  File "/home/d/scoary-env/lib/python3.10/site-packages/scoary/scoary.py", line 289, in main
    fire.Fire(scoary)
  File "/home/d/scoary-env/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/d/scoary-env/lib/python3.10/site-packages/fire/core.py", line 466, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/d/scoary-env/lib/python3.10/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/d/scoary-env/lib/python3.10/site-packages/scoary/scoary.py", line 88, in scoary
    numeric_df, traits_df = load_traits(
  File "/home/d/scoary-env/lib/python3.10/site-packages/scoary/load_traits.py", line 410, in load_traits
    traits_df = load_binary(
  File "/home/d/scoary-env/lib/python3.10/site-packages/scoary/load_traits.py", line 58, in load_binary
    assert traits_df.index.is_unique, f'{traits=}: index not unique'
AssertionError: traits='traits.csv': index not unique

Thanks, cheers!

Greetings from Chile, South America

By the way, these are my input files:
gene_presence_absence.csv
traits.csv
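The traceback shows `load_binary` asserting `traits_df.index.is_unique` after reading the traits file with its first column as the index, so every value in that first column (the isolate names) must be unique. A minimal stdlib sketch to list the offenders; `duplicated_ids` is a hypothetical helper, and rows that contain only delimiters (which show up as empty names here) are one plausible, unconfirmed culprit:

```python
import csv
import io
from collections import Counter


def duplicated_ids(text: str, delimiter: str = ",") -> dict[str, int]:
    """Return first-column values that occur more than once (header row excluded), with counts."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    next(reader, None)  # skip the header row
    # Names are compared as raw strings, without whitespace stripping.
    counts = Counter(row[0] for row in reader if row)
    return {name: n for name, n in counts.items() if n > 1}


if __name__ == "__main__":
    example = "Isolate,trait1\nA1,1\nA2,0\nA1,1\n"
    print(duplicated_ids(example))  # {'A1': 2}
```

On the real file this would be `duplicated_ids(open("traits.csv", newline="").read())`; an empty dict means the first column is unique.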

"contains NaN" error when running Scoary2 on gene_presence_absence.csv from Roary

We are trying to process a gene_presence_absence.csv file from Roary with Scoary2. Previously, we were using Scoary (v1) and were able to get results (albeit with a few errors in the log file), whereas with Scoary2, the exact same command is failing.

Here are the versions we are using for each of these packages:

Scoary: 1.6.16
Scoary2: 0.0.15
Roary: 3.13.0

Scoary (v1) results

Here is the command we have been using with Scoary (v1):

scoary --genes roary_output/85/gene_presence_absence.csv \
       --traits traits.csv \
       --outdir scoary1_test

The process completes successfully, although it does print the following error several times:

ERROR: Some isolates in your gene presence absence file were not represented in your traits file. These will count as MISSING data and will not be included.

But this does not prevent us from getting results for isolates that were not missing, so I consider this to be acceptable.

Scoary2 results

The Scoary2 usage guide suggests that we should be able to use the exact same command with the same inputs for Scoary2, so here is what we are running:

scoary2 --genes roary_output/85/gene_presence_absence.csv \
        --traits traits.csv \
        --outdir scoary2_test

This is failing with the following trace:

Loading traits...
Loading genes...
/home/ndusek/miniconda3/envs/scoary2/lib/python3.10/site-packages/scoary/load_genes.py:45: DtypeWarning: Columns (15,21,22,24,26,28,30,36,39,43,60,66,72,73,74,77,83,86,90,92,94,101,108,112,119,124,125,128,135,149,150,152,154,155,160,172,173,176,177,178,179,180,183) have mixed types. Specify dtype option on import or set low_memory=False.
  count_df = pd.read_csv(path, delimiter=delimiter, index_col=0)
Welcome to Scoary2! (0.0.15)
Traceback (most recent call last):
  File "/home/ndusek/miniconda3/envs/scoary2/bin/scoary2", line 8, in <module>
    sys.exit(main())
  File "/home/ndusek/miniconda3/envs/scoary2/lib/python3.10/site-packages/scoary/scoary.py", line 380, in main
    fire.Fire(scoary)
  File "/home/ndusek/miniconda3/envs/scoary2/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/ndusek/miniconda3/envs/scoary2/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/ndusek/miniconda3/envs/scoary2/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/ndusek/miniconda3/envs/scoary2/lib/python3.10/site-packages/scoary/scoary.py", line 132, in scoary
    genes_orig_df, genes_bool_df = load_genes(
  File "/home/ndusek/miniconda3/envs/scoary2/lib/python3.10/site-packages/scoary/load_genes.py", line 146, in load_genes
    genes_orig_df, genes_bool_df = load_gene_count_file(genes, delimiter, restrict_to, ignore)
  File "/home/ndusek/miniconda3/envs/scoary2/lib/python3.10/site-packages/scoary/load_genes.py", line 54, in load_gene_count_file
    assert not count_df.isna().values.any(), f'{path=}: contains NaN'
AssertionError: path='roary_output/85/gene_presence_absence.csv': contains NaN

The "contains NaN" message is clear enough, but I don't understand why Scoary2 complains about it when the original Scoary had no problem with the same file.

Any idea what's going on here?
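The traceback shows the file passing through `load_gene_count_file`, i.e. it was read as a gene-count table. In Roary's gene_presence_absence.csv, isolate columns hold locus tags and are left empty when an isolate lacks the gene, and annotation columns can be blank too; pandas reads empty cells as NaN, and the DtypeWarning about mixed types also suggests non-numeric columns. A minimal stdlib sketch to see where the empty cells are (the helper name is hypothetical, not part of Scoary2 or Roary):

```python
import csv
import io
from collections import Counter


def empty_cells_per_column(text: str, delimiter: str = ",") -> dict[str, int]:
    """Count empty (or whitespace-only) cells in each column of a delimited table."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    counts = Counter()
    for row in reader:
        # Pad short rows so missing trailing cells are counted as empty as well.
        row = row + [""] * (len(header) - len(row))
        for name, value in zip(header, row):
            if not value.strip():
                counts[name] += 1
    return dict(counts)


if __name__ == "__main__":
    roary_like = (
        "Gene,Annotation,iso1,iso2\n"
        "group_1,hypothetical protein,tag_1,\n"
        "group_2,,tag_2,tag_3\n"
    )
    print(empty_cells_per_column(roary_like))  # {'iso2': 1, 'Annotation': 1}
```

If the empty cells sit in the isolate columns, the table holds gene identifiers rather than counts, which would explain the mismatch with a gene-count reader.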

Trait file not found

Hi Tom,

I finally got around to working with Scoary-2.

I'm following the example command from your tutorial (using Docker on a Mac), but I keep getting an error stating that my traits file can't be found. Both the traits and genes files are in the working directory. Interestingly (and maybe this is intentional), Scoary seems to look for the traits file first: even if I give a non-existent file name for the genes file, it still reports the traits file as missing first. Below is the command I used:

docker run --rm -v /Users/[Username]/Desktop/Scoary2/ troder/scoary-2 scoary --genes GeneCount_Scoary_Ecoli.tsv --gene-data-type 'gene-count:\t' --traits Ecoli_traits.tsv --trait-data-type 'gaussian:kmeans:\t' --outdir Output --n_permut 500 --n_cpus 1

Note that I also tried the above command with "/Users/[Username]/Desktop/Scoary2/", "/Scoary2/", and "./" prepended to the respective file names and output directory.

Below is the error message; I've also attached the files. Thanks in advance for your help!

Traceback (most recent call last):
  File "/usr/local/bin/scoary", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/site-packages/scoary/scoary.py", line 283, in main
    fire.Fire(scoary)
  File "/usr/local/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/usr/local/lib/python3.10/site-packages/fire/core.py", line 466, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/usr/local/lib/python3.10/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/scoary/scoary.py", line 86, in scoary
    numeric_df, traits_df = load_traits(
  File "/usr/local/lib/python3.10/site-packages/scoary/load_traits.py", line 422, in load_traits
    numeric_df = load_numeric(
  File "/usr/local/lib/python3.10/site-packages/scoary/load_traits.py", line 85, in load_numeric
    numeric_df = pd.read_csv(traits, delimiter=delimiter, index_col=0, dtype=dtypes, na_values=STR_NA_VALUES)
  File "/usr/local/lib/python3.10/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 680, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/usr/local/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 575, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/usr/local/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 934, in __init__
    self._engine = self._make_engine(f, self.engine)
  File "/usr/local/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1218, in _make_engine
    self.handles = get_handle(  # type: ignore[call-overload]
  File "/usr/local/lib/python3.10/site-packages/pandas/io/common.py", line 786, in get_handle
    handle = open(
FileNotFoundError: [Errno 2] No such file or directory: 'Ecoli_traits.tsv'

Ecoli_traits.txt
GeneCount_Scoary_Ecoli.txt
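Two details in this report can be explained from the material above. The tracebacks in this thread show `load_traits` running before `load_genes` (Scoary2 prints "Loading traits..." before "Loading genes..."), so a missing traits file is reported first. Separately, `-v /Users/[Username]/Desktop/Scoary2/` gives Docker only one path, in which case it creates an anonymous volume at that path inside the container instead of mounting the host folder; relative names are then resolved in the container's own working directory. A minimal sketch that builds the command with an explicit `HOST:CONTAINER` bind mount and `-w`; the container path `/data` and the helper name are assumptions:

```python
import shlex


def scoary_docker_command(host_dir: str, scoary_args: list[str], container_dir: str = "/data") -> list[str]:
    """Build a `docker run` argv that bind-mounts host_dir and runs scoary from inside it."""
    return [
        "docker", "run", "--rm",
        "-v", f"{host_dir}:{container_dir}",  # HOST:CONTAINER bind mount
        "-w", container_dir,                  # relative paths now resolve inside the mount
        "troder/scoary-2",
        "scoary", *scoary_args,
    ]


if __name__ == "__main__":
    cmd = scoary_docker_command(
        "/Users/[Username]/Desktop/Scoary2",
        [
            "--genes", "GeneCount_Scoary_Ecoli.tsv",
            # Raw strings keep the literal backslash-t that the shell's single quotes pass.
            "--gene-data-type", r"gene-count:\t",
            "--traits", "Ecoli_traits.tsv",
            "--trait-data-type", r"gaussian:kmeans:\t",
            "--outdir", "Output", "--n_permut", "500", "--n_cpus", "1",
        ],
    )
    print(shlex.join(cmd))  # paste into a shell, or pass cmd to subprocess.run
```

The printed line is an ordinary shell command with `-v /Users/[Username]/Desktop/Scoary2:/data -w /data`, so it can also be typed by hand.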

Continuous_traits

Hello! I have Roary output (gene_presence_absence.csv) for about 450 isolates of a single species, and a traits file. However, the traits are not binary; they are continuous. The preprint says that Scoary2 can work with continuous traits. How should I format my traits file, and which flag tells Scoary2 that my data are continuous?
P.S.: We are talking about something like petal color, where there is incomplete dominance. The flower can be Pink (dominant), White (recessive), or Yellow (hybrid). How do I pass these as traits?
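For numeric traits, the Docker command in the "Trait file not found" issue above passes `--trait-data-type 'gaussian:kmeans:\t'`, which suggests the trait type is declared through that flag; the exact options should be checked in the Scoary2 documentation. A three-way categorical trait such as petal color is not numeric, so one option (an assumption, not something confirmed in this thread) is to split it into one binary trait per category, with isolates as rows and 0/1 values. A minimal sketch; the `Isolate` header and the helper name are hypothetical:

```python
import csv
import io


def one_hot_traits(rows: list[tuple[str, str]], delimiter: str = ",") -> str:
    """Turn (isolate, category) pairs into a binary trait table with one 0/1 column per category."""
    categories = sorted({category for _, category in rows})
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerow(["Isolate", *categories])
    for isolate, category in rows:
        writer.writerow([isolate, *("1" if c == category else "0" for c in categories)])
    return out.getvalue()


if __name__ == "__main__":
    petals = [("iso1", "Pink"), ("iso2", "White"), ("iso3", "Yellow"), ("iso4", "Pink")]
    print(one_hot_traits(petals), end="")
    # Isolate,Pink,White,Yellow
    # iso1,1,0,0
    # iso2,0,1,0
    # iso3,0,0,1
    # iso4,1,0,0
```

Each resulting column is then tested as a separate binary trait, e.g. "Pink versus not Pink".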

Scoary2 won't run on HPC

Hi,

I am trying to get started with Scoary2 using Singularity in an HPC environment. However, I am facing some issues:

$ cd SingularityImages/
$ singularity pull docker://troder/scoary-2
$ singularity run scoary-2_latest.sif scoary2 --help
  Traceback (most recent call last):
    File "/usr/local/bin/scoary2", line 5, in <module>
      from scoary.scoary import main
    File "/usr/local/lib/python3.10/site-packages/scoary/__init__.py", line 1, in <module>
      from .scoary import scoary
    File "/usr/local/lib/python3.10/site-packages/scoary/scoary.py", line 7, in <module>
      from .analyze_trait import analyze_trait, worker
    File "/usr/local/lib/python3.10/site-packages/scoary/analyze_trait.py", line 7, in <module>
      from fast_fisher.fast_fisher_numba import odds_ratio, test1t as fisher_exact_two_tailed
    File "/usr/local/lib/python3.10/site-packages/fast_fisher/fast_fisher_numba.py", line 5, in <module>
      cc = CC('fast_fisher_compiled')
    File "/usr/local/lib/python3.10/site-packages/numba/pycc/cc.py", line 65, in __init__
      self._toolchain = Toolchain()
    File "/usr/local/lib/python3.10/site-packages/numba/pycc/platform.py", line 78, in __init__
      self._raise_external_compiler_error()
    File "/usr/local/lib/python3.10/site-packages/numba/pycc/platform.py", line 121, in _raise_external_compiler_error
      raise RuntimeError(msg)
  RuntimeError: Attempted to compile AOT function without the compiler used by `numpy.distutils` present. If using conda try:
  
  #> conda install gcc_linux-64 gxx_linux-64

Thank you for your help.

Best regards,
Adrián
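The traceback shows `fast_fisher_numba` constructing `numba.pycc.CC` at import time, and numba's `Toolchain` raising because no usable C compiler is present. A minimal diagnostic sketch, run inside the image (for example with `singularity exec scoary-2_latest.sif python3 check_cc.py`, where `check_cc.py` is a hypothetical file name), reports which compiler executables are on `PATH` and which compiler Python itself was built with:

```python
from __future__ import annotations

import shutil
import sysconfig


def find_c_compilers(candidates: tuple[str, ...] = ("cc", "gcc", "clang")) -> dict[str, str | None]:
    """Map each candidate compiler name to its full path on PATH, or None if it is absent."""
    return {name: shutil.which(name) for name in candidates}


if __name__ == "__main__":
    print("compiler Python was built with:", sysconfig.get_config_var("CC"))
    for name, path in find_c_compilers().items():
        print(f"{name}: {path or 'NOT FOUND'}")
```

If every entry prints `NOT FOUND`, the message's own suggestion (installing a compiler, e.g. via the conda packages it names) applies to the environment the container runs in.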

sqlite3.OperationalError: database is locked

Hi,

I am testing Scoary-2 on some data I previously analyzed with Scoary, and I get the following error while it is running (although it does keep going):

Process Process-12:
Traceback (most recent call last):
  File "/home/ubuntu/scratch/software/miniforge3/envs/scoary2/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/home/ubuntu/scratch/software/miniforge3/envs/scoary2/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/scratch/software/miniforge3/envs/scoary2/lib/python3.10/site-packages/scoary/analyze_trait.py", line 48, in worker
    local_result_container[trait] = analyze_trait_fn(trait, new_ns, proc_id)
  File "/home/ubuntu/scratch/software/miniforge3/envs/scoary2/lib/python3.10/site-packages/scoary/analyze_trait.py", line 165, in analyze_trait_step_2_pairpicking
    result_df['empirical_p'] = permute_picking(
  File "/home/ubuntu/scratch/software/miniforge3/envs/scoary2/lib/python3.10/site-packages/scoary/permutations.py", line 97, in permute_picking
    CONFINT_CACHE.set(unique_topology, n_pos_assoc, n_permut, permuted_estimators)
  File "/home/ubuntu/scratch/software/miniforge3/envs/scoary2/lib/python3.10/site-packages/scoary/permutations.py", line 36, in set
    self.cur.execute(
sqlite3.OperationalError: database is locked

This appears multiple times, for different processes, during the pair-picking step.
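The traceback shows several worker processes writing permutation results into a shared SQLite cache (`CONFINT_CACHE.set` calling `self.cur.execute`), and SQLite refusing a write while another connection holds the lock. The sketch below is not Scoary2's code; it illustrates the usual general mitigations under that assumption: a longer busy `timeout` on `sqlite3.connect`, WAL journal mode, and a short retry with back-off. The `ResultCache` class and its schema are hypothetical:

```python
from __future__ import annotations

import sqlite3
import time


class ResultCache:
    """Hypothetical key/value cache that tolerates concurrent writers from several processes."""

    def __init__(self, path: str, timeout: float = 30.0) -> None:
        # timeout: seconds sqlite3 waits for a lock before raising "database is locked".
        self.con = sqlite3.connect(path, timeout=timeout)
        # WAL mode lets readers proceed while a single writer holds the lock.
        self.con.execute("PRAGMA journal_mode=WAL")
        self.con.execute("CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, value TEXT)")
        self.con.commit()

    def set(self, key: str, value: str, retries: int = 5) -> None:
        for attempt in range(retries):
            try:
                with self.con:  # commits on success, rolls back on error
                    self.con.execute(
                        "INSERT OR REPLACE INTO cache (key, value) VALUES (?, ?)", (key, value)
                    )
                return
            except sqlite3.OperationalError as err:
                # Re-raise anything that is not a lock error, or the final failed attempt.
                if "locked" not in str(err) or attempt == retries - 1:
                    raise
                time.sleep(0.1 * 2 ** attempt)  # exponential back-off before retrying

    def get(self, key: str) -> str | None:
        row = self.con.execute("SELECT value FROM cache WHERE key = ?", (key,)).fetchone()
        return row[0] if row else None
```

Each process should open its own `ResultCache`; SQLite connections should not be shared across forked processes.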

to do

  • add metadata to dendrogram
  • cpd search feature
  • add dendrogram to trait view -> allow recoloring
  • in dendrogram: instead of only gene presence-absence, show gene count!
  • does Roary gene_presence_absence.csv include gene identifiers? (yes)

Installation with pip

Hi Thomas,

I'm hoping to use Scoary2 but during pip installation I am getting the following error:

~ pip install scoary-2
ERROR: Ignored the following versions that require a different python version: 0.0.10 Requires-Python >=3.10,<3.11; 0.0.11 Requires-Python >=3.10,<3.11; 0.0.12 Requires-Python >=3.10,<3.11; 0.0.13 Requires-Python >=3.10,<3.11; 0.0.3 Requires-Python >=3.10,<3.11; 0.0.4 Requires-Python >=3.10,<3.11; 0.0.5 Requires-Python >=3.10,<3.11; 0.0.6 Requires-Python >=3.10,<3.11; 0.0.7 Requires-Python >=3.10,<3.11; 0.0.9 Requires-Python >=3.10,<3.11
ERROR: Could not find a version that satisfies the requirement scoary-2 (from versions: none)
ERROR: No matching distribution found for scoary-2

Any advice on which Python version to use? Thanks for your time!
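Every release pip lists in the error declares `Requires-Python >=3.10,<3.11`, so those versions only install on Python 3.10; the other tracebacks in this thread indeed come from `python3.10` environments. A minimal sketch of the same range check (the function name is hypothetical; newer releases may declare a different range):

```python
import sys


def scoary2_python_ok(version: tuple[int, ...] = tuple(sys.version_info[:2])) -> bool:
    """True if (major, minor) satisfies the >=3.10,<3.11 range reported by pip."""
    return (3, 10) <= tuple(version[:2]) < (3, 11)


if __name__ == "__main__":
    print(sys.version.split()[0], "->", "OK" if scoary2_python_ok() else "needs Python 3.10")
```

If it reports a mismatch, creating a fresh environment with a 3.10 interpreter (for example `python3.10 -m venv scoary-env`) and running `pip install scoary-2` inside it satisfies the constraint.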

Failed to load overview_plot.svg

When I open overview.html to view the results in the web app generated by Scoary2, I see the following error:

Failed to load overview_plot.svg

This seems to be related to the following error message in the console:

Access to fetch at 'file:///Users/ndusek/Downloads/75_2/overview_plot.svg' from origin 'null' has been blocked by CORS policy: Cross origin requests are only supported for protocol schemes: http, data, isolated-app, chrome-extension, chrome, https, chrome-untrusted.

The above message is also shown for several other assets that cannot be served successfully. For reference, I am trying all this in Google Chrome 124.0.6367.119. Happy to provide any additional information that would be helpful.
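The console message says Chrome blocks `fetch` requests from pages opened via `file://`, so the web app cannot load its own files. Serving the output folder over HTTP avoids this. The stdlib one-liner `python3 -m http.server --directory 75_2` does it from a shell; the sketch below does the same from Python (binding to 127.0.0.1 and port 8000 are arbitrary choices, and `75_2` is the folder from the reported path):

```python
import functools
import http.server
import threading


def start_results_server(directory: str, port: int = 8000) -> http.server.ThreadingHTTPServer:
    """Serve `directory` at http://127.0.0.1:<port>/ from a background daemon thread."""
    handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=directory)
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Usage: `srv = start_results_server("75_2")`, then open `http://127.0.0.1:8000/overview.html` in the browser; call `srv.shutdown()` when done. Passing `port=0` lets the OS pick a free port, readable from `srv.server_address[1]`.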
