davised / automlsa2
Automated Multi-Locus Sequence Analysis phylogenetic tree construction software
License: Other
Hi @davised,
I am currently conducting an MLSA on a large collection of bacterial genomes (~200). I managed to run your amazing pipeline and obtained my final tree.
Is the .nex.treefile the final tree?
Cheers,
Pablo
Hi @davised
First off, what an amazing package!!!!
Now, I am trying to use it with a small set of nine genomes and when I run the following command:
automlsa2 --dir 9-agro-genomes --query queries-e-coli.fas -t 2 -- test-w-ecoli-markers
I get the following error:
Traceback (most recent call last):

/Users/pavlo/Downloads/test-10-genomes-agro-aug2023/automlsa2-env/bin/automlsa2:8 in <module>
      5 from automlsa2.__main__ import main
      6 if __name__ == '__main__':
      7     sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
    ❱ 8     sys.exit(main())
      9

/Users/pavlo/Downloads/test-10-genomes-agro-aug2023/automlsa2-env/lib/python3.10/site-packages/automlsa2/__main__.py:79 in main
     76     )
     77
     78     # BLAST output results, summary, and files --------------------------------
   ❱ 79     blastres: pd.DataFrame = read_blast_results(
     80         blastout, args.coverage, args.identity, args.threads
     81     )
     82     blastfilt: pd.DataFrame = print_blast_summary(

/Users/pavlo/Downloads/test-10-genomes-agro-aug2023/automlsa2-env/lib/python3.10/site-packages/automlsa2/blast_functions.py:161 in read_blast_results
    158         'qseqid': 'category',
    159         'sseqid': 'category',
    160         'saccver': 'string',
  ❱ 161         'pident': np.float,
    162         'qlen': np.int,
    163         'length': np.int,
    164         'bitscore': np.float,

/Users/pavlo/Downloads/test-10-genomes-agro-aug2023/automlsa2-env/lib/python3.10/site-packages/numpy/__init__.py:319 in __getattr__
    316                 "corresponding NumPy scalar.", FutureWarning, stacklevel=2)
    317
    318         if attr in __former_attrs__:
  ❱ 319             raise AttributeError(__former_attrs__[attr])
    320
    321         if attr == 'testing':
    322             import numpy.testing as testing

AttributeError: module 'numpy' has no attribute 'float'.
`np.float` was a deprecated alias for the builtin `float`. To avoid this error in existing code, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
    https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations

test-w-ecoli-markers.log
I am adding the log just in case it is clearer like that.
It seems that a command is deprecated, right? How could I fix it?
Cheers,
Pablo
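As the NumPy message above says, `np.float` was only an alias for the builtin `float` (and `np.int` likewise for `int`). These aliases were removed in NumPy 1.24, so the dtype mapping shown in the traceback needs the builtin names instead; pinning NumPy below 1.24 in the environment is another way around it. Below is a minimal sketch of that substitution applied to the lines from the traceback. The helper name `replace_removed_numpy_aliases` and the `np.float64` line in the sample are assumptions made for illustration and are not part of automlsa2.

```python
import re

# Only the two aliases that appear in the traceback are handled here.
# The trailing \b keeps np.float64, np.int64 and np.int_ untouched.
_ALIAS_RE = re.compile(r"\bnp\.(float|int)\b")


def replace_removed_numpy_aliases(source: str) -> str:
    """Swap the removed aliases np.float / np.int for the builtins."""
    return _ALIAS_RE.sub(r"\1", source)


# Lines copied from the traceback, plus one hypothetical np.float64 entry
# to show that real NumPy scalar types are left alone.
snippet = """
    'pident': np.float,
    'qlen': np.int,
    'length': np.int,
    'bitscore': np.float,
    'hypothetical': np.float64,
"""

patched = replace_removed_numpy_aliases(snippet)
print(patched)
```

The same two-word change can be made by hand in `blast_functions.py`; the regular expression is only there to make the before/after explicit.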
Hi @davised
First off, great tool!!!!!! This is a very very useful tool for me tbh
I am conducting a phylogenetic analysis using the 16S rRNA gene, which is not a single-copy gene. As a result, some of the strains studied carry 1 to 6 copies of the gene. So what happens in the automlsa2 pipeline? Does the pipeline generate a consensus sequence from all the copies, or does it select a specific copy?
Thank you for your input.
Cheers,
Pablo
Hi @davised
I am having a strange issue. I took a set of marker genes from a specific genome and then used them as queries against a collection of three genomes (one of which is the genome the marker genes came from).
I ran the following command:
automlsa2 --dir K599_test1 --query automlsa2_markers/K599_Hangzhou_markers/mlsa_markers_K599.fasta -p blastn K599_mlsa_test2_K599_markers
However, for some reason, two of the marker genes are not found in the genome they were taken from (see the log file):
K599_mlsa_test2_K599_markers.log
I was just wondering if you could help me find out why these genes were not picked up.
Cheers,
Pablo
Hi @davised,
I have (yet) another question.
After running the automlsa2 pipeline with all default parameters, I see that there is a list of genomes that were discarded. This list includes a genome that is very interesting to me, which means it is excluded from the analysis.
The presence_matrix.tsv shows that the gyrB gene is missing. My question is: what are my options for getting this genome included in my analysis? Should I lower the identity parameter? Alternatively, could I add the gyrB sequence myself to the gyrB multifasta file and use the --checkpoint parameter with "prealign"?
Cheers,
Pablo