davised / automlsa2

Automated Multi-Locus Sequence Analysis phylogenetic tree construction software

License: Other

Python 99.70% Shell 0.30%
bioinformatics phylogenetic-trees python3

automlsa2's Issues

Final tree file

Hi @davised,

I am currently conducting an MLSA on a large collection of bacterial genomes (~200). I managed to run your amazing pipeline successfully and obtained my final tree.

Is the .nex.treefile the final tree?

Cheers,
Pablo

np.float issue

Hi @davised

First off, what an amazing package!!!!

Now, I am trying to use it with a small set of nine genomes and when I run the following command:
automlsa2 --dir 9-agro-genomes --query queries-e-coli.fas -t 2 -- test-w-ecoli-markers

I get the following error:
Traceback (most recent call last)

/Users/pavlo/Downloads/test-10-genomes-agro-aug2023/automlsa2-env/bin/automlsa2:8 in <module>

      5 from automlsa2.__main__ import main
      6 if __name__ == '__main__':
      7     sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
    ❱ 8     sys.exit(main())
      9

/Users/pavlo/Downloads/test-10-genomes-agro-aug2023/automlsa2-env/lib/python3.10/site-packages/automlsa2/__main__.py:79 in main

     76     )
     77
     78     # BLAST output results, summary, and files --------------------------------
   ❱ 79     blastres: pd.DataFrame = read_blast_results(
     80         blastout, args.coverage, args.identity, args.threads
     81     )
     82     blastfilt: pd.DataFrame = print_blast_summary(

/Users/pavlo/Downloads/test-10-genomes-agro-aug2023/automlsa2-env/lib/python3.10/site-packages/automlsa2/blast_functions.py:161 in read_blast_results

    158         'qseqid': 'category',
    159         'sseqid': 'category',
    160         'saccver': 'string',
  ❱ 161         'pident': np.float,
    162         'qlen': np.int,
    163         'length': np.int,
    164         'bitscore': np.float,

/Users/pavlo/Downloads/test-10-genomes-agro-aug2023/automlsa2-env/lib/python3.10/site-packages/numpy/__init__.py:319 in __getattr__

    316                 "corresponding NumPy scalar.", FutureWarning, stacklevel=2)
    317
    318         if attr in __former_attrs__:
  ❱ 319             raise AttributeError(__former_attrs__[attr])
    320
    321         if attr == 'testing':
    322             import numpy.testing as testing

AttributeError: module 'numpy' has no attribute 'float'.
`np.float` was a deprecated alias for the builtin `float`. To avoid this error in existing
code, use `float` by itself. Doing this will not modify any behavior and is safe. If you
specifically wanted the numpy scalar type, use `np.float64` here.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the
original release note at:
https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations

test-w-ecoli-markers.log

I am adding the log just in case it is clearer like that.

It seems that something the package relies on is deprecated, right? How could I fix it?

Cheers,
Pablo
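
The error message above already names the remedy: replace the removed aliases (np.float, np.int) with the builtins float and int. These aliases were deprecated in NumPy 1.20 and removed in NumPy 1.24, so installing an older NumPy in the environment is one possible workaround; patching the affected lines is another. The snippet below is only a minimal sketch of the second route, not the maintainer's fix. The helper name replace_deprecated_aliases is hypothetical, and it assumes NumPy is imported as np in the file being patched.

```python
import re

# Deprecated NumPy aliases that simply pointed at the Python builtin of the
# same name. (Assumption: the code refers to NumPy as "np".)
_ALIASES = ("float", "int", "bool", "object", "complex", "str")

# Matches e.g. "np.float" but not "np.float64" or "np.float_": the trailing
# \b fails whenever another word character follows the alias.
_PATTERN = re.compile(r"\bnp\.(" + "|".join(_ALIASES) + r")\b")


def replace_deprecated_aliases(source: str) -> str:
    """Return source text with np.<alias> rewritten to the builtin <alias>."""
    return _PATTERN.sub(lambda match: match.group(1), source)


if __name__ == "__main__":
    # Lines modelled on the dtype mapping shown in the traceback.
    snippet = "'pident': np.float,\n'qlen': np.int,\n'bitscore': np.float64,"
    print(replace_deprecated_aliases(snippet))
```

Editing files inside site-packages is a stopgap at best, since a reinstall or upgrade overwrites them. Sized types such as np.float64 are left untouched on purpose, since they are still valid.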

multiple gene copies

Hi @davised

First off, great tool!!!!!! This is a very very useful tool for me tbh

I am conducting a phylogenetic analysis using the 16S rRNA gene, which is not a single-copy gene. As a result, some of the strains studied carry 1-6 copies of the gene. So what happens in the automlsa2 pipeline? Does the pipeline generate a consensus sequence from all the copies, or does it select a specific copy?

Thank you for your input.

Cheers,
Pablo

missing genes

Hi @davised

I am having a strange issue. I took a set of marker genes from a specific genome and then used them as queries against a collection of three genomes (one of which is the genome the marker genes came from).

I ran the following command
automlsa2 --dir K599_test1 --query automlsa2_markers/K599_Hangzhou_markers/mlsa_markers_K599.fasta -p blastn K599_mlsa_test2_K599_markers

However, for some reason two of the marker genes are not found in the genome they were taken from (see log file).
K599_mlsa_test2_K599_markers.log

I was just wondering if you could help me find out why these genes were not picked up.

Cheers,
Pablo

Missing genome in final output

Hi @davised,

I have (yet) another question.

After running the automlsa2 pipeline with all default parameters, I see that there is a list of genomes that were discarded. However, this list includes a genome that is very interesting to me (i.e. that genome is excluded from the analysis).

The presence_matrix.tsv shows that the gyrB gene is missing. My question is, what are my options to allow this genome to be included in my analysis? Should I lower the identity parameter? Alternatively, could I add the gyrB sequence to the gyrB multifasta file myself and then use the --checkpoint parameter with "prealign" as its value?

Cheers,
Pablo
