sourmash-bio / sourmash
Quickly search, compare, and analyze genomic and metagenomic data sets.
Home Page: http://sourmash.readthedocs.io/en/latest/
License: Other
sourmash depends on PyYAML, which depends on libyaml, but somehow pip install sourmash "succeeds" even if libyaml isn't there. Perhaps this is a problem with PyYAML.
It is rare that we want to update signatures after saving them, unless we're merging etc. Also, the software requirements for building and saving signatures may sometimes be quite different from the requirements for loading and comparing signatures... what's the best way to split up the functionality?
The class defines a __len__(), but its result is not necessarily consistent with the length of what get_mins returns:
In [1]: import sourmash_lib._minhash
In [2]: mh = sourmash_lib._minhash.MinHash(1000, 21)
In [3]: len(mh)
Out[3]: 1000
In [4]: mh.get_mins()
Out[4]: []
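One way to make the two agree, sketched here in pure Python, is to have __len__ report how many hashes are currently stored and keep the capacity in a separate attribute. BottomKSketch is a hypothetical stand-in for illustration, not the sourmash_lib._minhash.MinHash class:

```python
import bisect


class BottomKSketch:
    """Hypothetical bottom-k sketch whose __len__ agrees with get_mins()."""

    def __init__(self, num, ksize):
        self.num = num        # capacity: the most hashes we will ever keep
        self.ksize = ksize
        self._mins = []       # retained hash values, kept sorted ascending

    def add_hash(self, h):
        i = bisect.bisect_left(self._mins, h)
        if i < len(self._mins) and self._mins[i] == h:
            return                          # already present
        if len(self._mins) < self.num:
            self._mins.insert(i, h)         # still filling up
        elif i < self.num:
            self._mins.insert(i, h)         # smaller than the current maximum:
            self._mins.pop()                # keep it and drop the largest

    def get_mins(self):
        return list(self._mins)

    def __len__(self):
        # Number of hashes actually stored, i.e. len(self.get_mins()).
        return len(self._mins)
```

With this design, BottomKSketch(1000, 21) has len() == 0 until hashes are added, while its num attribute still reports 1000.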
I get several fatal errors when I try to run what seems like a reasonable sequence of commands:
git clone {repo}/sourmash.git
cd sourmash
pip install . pytest matplotlib
py.test .
Here are the first few dozen lines of the error log from doing that:
(sourmash)jeremy@anjou:~/src/sourmash (master *=)$ py.test .
============================= test session starts ==============================
platform linux2 -- Python 2.7.11, pytest-2.9.2, py-1.4.31, pluggy-0.3.1
rootdir: /home/jeremy/src/sourmash, inifile: pytest.ini
collected 29 items / 1 errors
doc/api-example.rst F
doc/api.rst s
doc/command-line.rst s
doc/index.rst s
doc/more-info.rst s
doc/requirements.rst s
sourmash_lib/__init__.py FFFFFFFF
sourmash_lib/signature.py FFFFFFFF
sourmash_lib/test_sourmash.py .FFFFFF
==================================== ERRORS ====================================
________________ ERROR collecting sourmash_lib/test__minhash.py ________________
sourmash_lib/test__minhash.py:39: in <module>
from ._minhash import MinHash, hash_murmur
E ImportError: No module named _minhash
=================================== FAILURES ===================================
________________________ [doctest] doc/api-example.rst _________________________
007
008 Define two sequences:
009
010 >>> seq1 = "ATGGCA"
011 >>> seq2 = "AGAGCA"
012
013 Create two estimators using 3-mers, and add the sequences:
014
015 >>> import sourmash_lib
016 >>> E1 = sourmash_lib.Estimators(n=20, ksize=3)
UNEXPECTED EXCEPTION: ImportError('cannot import name _minhash',)
Traceback (most recent call last):
File "/usr/share/anaconda/anaconda2/envs/sourmash/lib/python2.7/doctest.py", line 1315, in __run
compileflags, 1) in test.globs
File "<doctest api-example.rst[3]>", line 1, in <module>
File "/home/jeremy/src/sourmash/sourmash_lib/__init__.py", line 27, in __init__
from . import _minhash
ImportError: cannot import name _minhash
/home/jeremy/src/sourmash/doc/api-example.rst:16: UnexpectedException
________________________________ test_jaccard_1 ________________________________
def test_jaccard_1():
> E1 = Estimators(n=5, ksize=20)
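The traceback suggests that py.test is importing the in-tree sourmash_lib directory, where the compiled _minhash extension was never built (pip install . builds it into site-packages, not into the checkout; python setup.py build_ext -i is the in-place alternative). A small, hypothetical helper like this one shows which file Python would actually import for each name, which makes that kind of mismatch easy to spot:

```python
import importlib.util


def locate(module_name):
    """Return the file Python would import module_name from, or None.

    find_spec() imports parent packages first, so a missing or broken
    parent package is also reported as None.
    """
    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        return None
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # Run from the repository root after "pip install .".
    for name in ("sourmash_lib", "sourmash_lib._minhash"):
        print(f"{name}: {locate(name)}")
```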
The command-line utility sourmash causes some structural grief in the git repo, since its name collides with the package directory. The conventional approach would be to put it instead in something like sourmash_cli.py and register it as an entry point in setup.py.
Moving the command-line tool to a different name might also allow you to rename $repo/sourmash_lib to $repo/sourmash, which is (at least for the pure-Python modules I'm familiar with) the usual approach.
❯❯❯ pip3 install sourmash
…
running build_ext
building 'sourmash_lib._minhash' extension
creating build/temp.linux-x86_64-3.5
creating build/temp.linux-x86_64-3.5/sourmash_lib
creating build/temp.linux-x86_64-3.5/third-party
creating build/temp.linux-x86_64-3.5/third-party/smhasher
/gsc/btl/linuxbrew/bin/gcc-5 -pthread -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -Os -w -pipe -march=core2 -fPIC -I/gsc/btl/linuxbrew/include -I/gsc/btl/linuxbrew/opt/openssl/include -I/gsc/btl/linuxbrew/opt/sqlite/include -I/gsc/btl/linuxbrew/opt/python3/include/python3.5m -c sourmash_lib/_minhash.cc -o build/temp.linux-x86_64-3.5/sourmash_lib/_minhash.o -std=c++11 -pedantic -O3
cc1plus: warning: command line option '-Wstrict-prototypes' is valid for C/ObjC but not for C++
sourmash_lib/_minhash.cc:66:23: fatal error: _minhash.hh: No such file or directory
compilation terminated.
error: command '/gsc/btl/linuxbrew/bin/gcc-5' failed with exit status 1
Trying to set up a sourmash dev environment for the first time, I'm getting the following error.
python setup.py build_ext -i
running build_ext
copying build/lib.macosx-10.12-x86_64-3.5/sourmash_lib/_minhash.cpython-35m-darwin.so -> sourmash_lib
pip install '.[test]'
Processing /Users/standage/Software/sourmash
Requirement already satisfied (use --upgrade to upgrade): sourmash==1.1 from file:///Users/standage/Software/sourmash in ./env/lib/python3.5/site-packages
Requirement already satisfied: screed>=0.9 in ./env/lib/python3.5/site-packages (from sourmash==1.1)
Requirement already satisfied: PyYAML>=3.11 in ./env/lib/python3.5/site-packages (from sourmash==1.1)
Requirement already satisfied: ijson in ./env/lib/python3.5/site-packages (from sourmash==1.1)
Requirement already satisfied: pytest in ./env/lib/python3.5/site-packages (from sourmash==1.1)
Requirement already satisfied: pytest-cov in ./env/lib/python3.5/site-packages (from sourmash==1.1)
Requirement already satisfied: numpy in ./env/lib/python3.5/site-packages (from sourmash==1.1)
Requirement already satisfied: matplotlib in ./env/lib/python3.5/site-packages (from sourmash==1.1)
Requirement already satisfied: scipy in ./env/lib/python3.5/site-packages (from sourmash==1.1)
Requirement already satisfied: bz2file in ./env/lib/python3.5/site-packages (from screed>=0.9->sourmash==1.1)
Requirement already satisfied: py>=1.4.29 in ./env/lib/python3.5/site-packages (from pytest->sourmash==1.1)
Requirement already satisfied: coverage>=3.7.1 in ./env/lib/python3.5/site-packages (from pytest-cov->sourmash==1.1)
Requirement already satisfied: pytz in ./env/lib/python3.5/site-packages (from matplotlib->sourmash==1.1)
Requirement already satisfied: pyparsing!=2.0.0,!=2.0.4,!=2.1.2,>=1.5.6 in ./env/lib/python3.5/site-packages (from matplotlib->sourmash==1.1)
Requirement already satisfied: cycler in ./env/lib/python3.5/site-packages (from matplotlib->sourmash==1.1)
Requirement already satisfied: python-dateutil in ./env/lib/python3.5/site-packages (from matplotlib->sourmash==1.1)
Requirement already satisfied: six in ./env/lib/python3.5/site-packages (from cycler->matplotlib->sourmash==1.1)
python -m pytest
============================= test session starts ==============================
platform darwin -- Python 3.5.2, pytest-3.0.5, py-1.4.32, pluggy-0.4.0
rootdir: /Users/standage/Software/sourmash, inifile: pytest.ini
plugins: cov-2.4.0
collected 0 items / 1 errors
==================================== ERRORS ====================================
______________________________ ERROR collecting _______________________________
env/lib/python3.5/site-packages/_pytest/config.py:325: in _getconftestmodules
return self._path2confmods[path]
E KeyError: local('/Users/standage/Software/sourmash/env/lib/python3.5/site-packages/sourmash_lib')
During handling of the above exception, another exception occurred:
env/lib/python3.5/site-packages/_pytest/config.py:356: in _importconftest
return self._conftestpath2mod[conftestpath]
E KeyError: local('/Users/standage/Software/sourmash/env/lib/python3.5/site-packages/sourmash_lib/conftest.py')
During handling of the above exception, another exception occurred:
env/lib/python3.5/site-packages/_pytest/config.py:362: in _importconftest
mod = conftestpath.pyimport()
env/lib/python3.5/site-packages/py/_path/local.py:680: in pyimport
raise self.ImportMismatchError(modname, modfile, self)
E py._path.local.LocalPath.ImportMismatchError: ('sourmash_lib.conftest', '/Users/standage/Software/sourmash/sourmash_lib/conftest.py', local('/Users/standage/Software/sourmash/env/lib/python3.5/site-packages/sourmash_lib/conftest.py'))
During handling of the above exception, another exception occurred:
env/lib/python3.5/site-packages/py/_path/common.py:366: in visit
for x in Visitor(fil, rec, ignore, bf, sort).gen(self):
env/lib/python3.5/site-packages/py/_path/common.py:415: in gen
for p in self.gen(subdir):
env/lib/python3.5/site-packages/py/_path/common.py:415: in gen
for p in self.gen(subdir):
env/lib/python3.5/site-packages/py/_path/common.py:415: in gen
for p in self.gen(subdir):
env/lib/python3.5/site-packages/py/_path/common.py:415: in gen
for p in self.gen(subdir):
env/lib/python3.5/site-packages/py/_path/common.py:404: in gen
dirs = self.optsort([p for p in entries
env/lib/python3.5/site-packages/py/_path/common.py:405: in <listcomp>
if p.check(dir=1) and (rec is None or rec(p))])
env/lib/python3.5/site-packages/_pytest/main.py:670: in _recurse
ihook = self.gethookproxy(path)
env/lib/python3.5/site-packages/_pytest/main.py:575: in gethookproxy
my_conftestmodules = pm._getconftestmodules(fspath)
env/lib/python3.5/site-packages/_pytest/config.py:339: in _getconftestmodules
mod = self._importconftest(conftestpath)
env/lib/python3.5/site-packages/_pytest/config.py:364: in _importconftest
raise ConftestImportFailure(conftestpath, sys.exc_info())
E _pytest.config.ConftestImportFailure: ImportMismatchError('sourmash_lib.conftest', '/Users/standage/Software/sourmash/sourmash_lib/conftest.py', local('/Users/standage/Software/sourmash/env/lib/python3.5/site-packages/sourmash_lib/conftest.py'))
!!!!!!!!!!!!!!!!!!! Interrupted: 1 errors during collection !!!!!!!!!!!!!!!!!!!!
=========================== 1 error in 0.46 seconds ============================
This was with a clean virtualenv, after cloning the repo, running pip install pytest, and then make test.
Any ideas?
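The ImportMismatchError shows pytest collecting a second copy of sourmash_lib/conftest.py from the virtualenv that lives inside the checkout (env/lib/python3.5/site-packages/...). One common workaround, sketched here as an assumption rather than a tested fix for this repo, is a root-level conftest.py that tells pytest not to descend into such directories via its collect_ignore list (the norecursedirs option in pytest.ini is the ini-file equivalent):

```python
# conftest.py at the repository root (sketch).
# pytest reads the module-level name collect_ignore from conftest.py files and
# skips these paths, which are resolved relative to this file's directory.
collect_ignore = [
    "env",     # virtualenv created inside the checkout, as in the log above
    "build",   # setup.py build output
    ".tox",    # tox environments
]
```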
Consider using argparse sub-commands (https://docs.python.org/3/library/argparse.html#sub-commands) rather than rolling your own in the sourmash CLI utility.
Not a blocking concern, but a site for future improvement (and a way of transferring maintenance debt to the core Python library!).
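A minimal sketch of what the argparse sub-command approach could look like. The compute/compare handlers and their arguments are placeholders, not sourmash's actual options:

```python
import argparse


def cmd_compute(args):
    # Placeholder handler; the real one would compute signatures.
    return f"compute {args.filenames} k={args.ksize}"


def cmd_compare(args):
    # Placeholder handler; the real one would compare signatures.
    return f"compare {args.signatures}"


def build_parser():
    parser = argparse.ArgumentParser(prog="sourmash")
    subparsers = parser.add_subparsers(dest="command")
    subparsers.required = True          # error out if no sub-command is given

    p = subparsers.add_parser("compute", help="compute signatures")
    p.add_argument("filenames", nargs="+")
    p.add_argument("-k", "--ksize", type=int, default=31)
    p.set_defaults(func=cmd_compute)    # dispatch target for this sub-command

    p = subparsers.add_parser("compare", help="compare signatures")
    p.add_argument("signatures", nargs="+")
    p.set_defaults(func=cmd_compare)
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    return args.func(args)
```

Because main() takes an optional argv list, it can serve as a console_scripts entry-point target and can also be called directly from tests.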
Note the existence of utils/compute-*-another-way.py, which use the mmh3 library to create sourmash MinHashes directly in Python.
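For reference, the general shape of such a script is a bottom-k sketch over canonical k-mers. This is a sketch under assumptions, not the code in utils/: it takes the first 64 bits of MurmurHash3 via mmh3.hash64 with seed 42 (assumed to match sourmash's default), exposes the seed as a parameter, and falls back to a keyed BLAKE2b hash, which will not reproduce sourmash's values, when mmh3 is not installed:

```python
import hashlib
import heapq

try:
    import mmh3  # third-party package used by the utils/ scripts
except ImportError:
    mmh3 = None

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def hash_kmer(kmer, seed=42):
    """Unsigned 64-bit hash of a k-mer string."""
    if mmh3 is not None:
        # First 64 bits of MurmurHash3 (x64, 128-bit variant).
        return mmh3.hash64(kmer, seed=seed, signed=False)[0]
    # Fallback only: keyed BLAKE2b, NOT MurmurHash, so values differ from sourmash.
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8,
                             key=seed.to_bytes(8, "little")).digest()
    return int.from_bytes(digest, "little")


def bottom_k_sketch(seq, ksize, num, seed=42):
    """Sorted list of the num smallest distinct canonical k-mer hashes."""
    seq = seq.upper()
    hashes = set()
    for i in range(len(seq) - ksize + 1):
        kmer = seq[i:i + ksize]
        canonical = min(kmer, revcomp(kmer))   # same hash for both strands
        hashes.add(hash_kmer(canonical, seed))
    return heapq.nsmallest(num, hashes)
```

Using canonical k-mers means a sequence and its reverse complement produce the same sketch.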
The following pattern should work, but draws errors:
git checkout v0.9.4
mkvirtualenv --python=/usr/bin/python2.7 sourmash-py27
make clean install test
But if the install target used pip install .[test], this would work. I've not quite got the PR formatted, so I'll just file the ticket here, as part of my second-pass review for JOSS.
I put together a notebook for tracking workflow using sourmash here:
https://github.com/bioinfonm/bat_metagenomics_2016/blob/master/python_code/sourmash_CAVE_bat_metagenomes.ipynb
Change log:
pylab.savefig now uses bbox_inches='tight'
pylab.savefig writes PDF
added two additional plots which use different metadata categories
network plot using networkx
Things to do:
Instead of separate label files, have a single tab-separated sheet with IDs and all metadata.
Get the union of each cluster
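A pure-Python sketch of the last to-do item, assuming each signature is treated as a plain set of hashes: link samples whose Jaccard similarity meets a threshold, take the connected components of that graph as clusters (the same graph a networkx plot would draw), and union the hashes within each cluster. The threshold and function names are illustrative:

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets of hash values."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def threshold_clusters(sketches, threshold):
    """Connected components of the graph linking samples with Jaccard >= threshold.

    sketches: dict mapping sample name -> set of hash values.
    """
    names = sorted(sketches)
    neighbors = {name: set() for name in names}
    for a, b in combinations(names, 2):
        if jaccard(sketches[a], sketches[b]) >= threshold:
            neighbors[a].add(b)
            neighbors[b].add(a)

    seen, clusters = set(), []
    for start in names:
        if start in seen:
            continue
        seen.add(start)
        stack, members = [start], []
        while stack:                      # depth-first walk of one component
            node = stack.pop()
            members.append(node)
            for other in neighbors[node] - seen:
                seen.add(other)
                stack.append(other)
        clusters.append(sorted(members))
    return clusters


def cluster_unions(sketches, clusters):
    """Union of all hash values within each cluster."""
    return [set().union(*(sketches[name] for name in members)) for members in clusters]
```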
In 422943f I had to switch to using 'load' to get past some problems that showed up only on Python 2.7 (to do with the use of Python types in YAML files). See https://stackoverflow.com/questions/9169025/how-can-i-add-a-python-tuple-to-a-yaml-file-using-pyyaml for an alternative resolution (I think we'd need 'long' and 'unicode').
Should we be using yaml's safe_load instead?
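A sketch of the safe_load route, assuming signatures are written using only plain YAML types (lists rather than tuples, and so on). It needs PyYAML; record is a made-up example, not the real signature layout:

```python
import yaml

# Made-up, signature-like record using only plain YAML types.
record = {"name": "example", "ksize": 31, "mins": [3, 14, 159]}

# safe_dump emits no Python-specific tags, so safe_load can read it back.
text = yaml.safe_dump(record)
loaded = yaml.safe_load(text)


def rejects_python_tags(document):
    """True if safe_load refuses the document (e.g. one using a !!python/... tag)."""
    try:
        yaml.safe_load(document)
    except yaml.YAMLError:
        return True
    return False


tuple_rejected = rejects_python_tags("!!python/tuple [1, 2]")
```

The trade-off is that anything written out must first be converted to plain types, in exchange for a loader that cannot construct arbitrary Python objects.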
I was wondering if it was worth looking at within-signature differences to get an idea of the diversity of hashes, e.g. using Jaccard, Hamming, or Levenshtein metrics.
I need to check my understanding of how this whole hashing thing works for a metagenome:
I hacked together a BK-tree script from someone else's code to look at which hashes might be similar within a given distance.
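For reference, a BK-tree stores items so that everything within a given radius of a query can be found without comparing against every item; it needs a distance that obeys the triangle inequality. This minimal sketch (not the script mentioned above) indexes integer hash values by the Hamming distance between their bit patterns. Note that MurmurHash-style hashes are designed so that similar inputs give unrelated outputs, so bit-level closeness of hash values says little about k-mer similarity; Levenshtein distance would have to be computed on the k-mers themselves, which the signatures do not store:

```python
def hamming(a, b):
    """Number of differing bits between two non-negative integers."""
    return bin(a ^ b).count("1")


class BKTree:
    """Minimal BK-tree; each node is (item, {edge_distance: child_node})."""

    def __init__(self, distance):
        self.distance = distance
        self.root = None

    def add(self, item):
        if self.root is None:
            self.root = (item, {})
            return
        node_item, children = self.root
        while True:
            d = self.distance(item, node_item)
            if d == 0:
                return                       # already stored
            if d not in children:
                children[d] = (item, {})
                return
            node_item, children = children[d]

    def search(self, query, radius):
        """All (distance, item) pairs with distance <= radius, sorted."""
        if self.root is None:
            return []
        results, stack = [], [self.root]
        while stack:
            node_item, children = stack.pop()
            d = self.distance(query, node_item)
            if d <= radius:
                results.append((d, node_item))
            # Triangle inequality: only children whose edge distance lies in
            # [d - radius, d + radius] can hold matches.
            for edge, child in children.items():
                if d - radius <= edge <= d + radius:
                    stack.append(child)
        return sorted(results)
```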
... for example, the urchin, utils, data, refseq, and sigs directories.
I could imagine that these directories are any of the following:
sourmash is used for
I wonder if a separate sourmash_demo package belongs in here somewhere. If it's useful for particular research questions, then maybe there should be addendum or supplementary packages, bundled separately for PyPI clients or sourmash developers (depending on who the audience is).
% ./sourmash compute -f data/GCF_000005845.2_ASM584v2_genomic.fna.gz
# running sourmash subcommand: compute
computing signatures for files: ['data/GCF_000005845.2_ASM584v2_genomic.fna.gz']
Computing signature for ksizes: [31]
Computing only DNA (and not protein) signatures.
Computing a total of 1 signatures.
Computing signature for ksizes: [31]
... reading sequences from data/GCF_000005845.2_ASM584v2_genomic.fna.gz
calculated 1 signatures for 1 sequences in data/GCF_000005845.2_ASM584v2_genomic.fna.gz
% ./sourmash sbt_index bar GCF_000005845.2_ASM584v2_genomic.fna.gz.sig
# running sourmash subcommand: sbt_index
loading 1 files into SBT
loaded 1 sigs; saving SBT under "bar".
% ./sourmash sbt_search bar GCF_000005845.2_ASM584v2_genomic.fna.gz.sig
# running sourmash subcommand: sbt_search
Traceback (most recent call last):
File "/Users/t/dev/jup/lib/python3.5/site-packages/khmer-2.0+706.g1745464-py3.5-macosx-10.6-intel.egg/khmer/__init__.py", line 140, in extract_nodegraph_info
"signature".format(filename) + str(signature))
ValueError: Node graph '.sbt.bar/bar.c7eda0a3534879b70046e3f66870ede2.sbt' is missing file type signatureb'[\n '
test_estimators.py::test_pickle only checks the values on the Estimator object and does not confirm that the new Estimator.mh object behaves properly; we should check all the behavior, including n, k, dna, protein, track_abundance, and (assuming #83) max_hash.
An alternative and better approach might be to have accessors for all the internal MinHash state, which I believe is being added in #79.
(possible duplicate ticket? I am having connectivity problems).
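A sketch of a more thorough round-trip check: besides comparing parameters, it feeds the original and the unpickled copy the same new hashes and checks that they still agree. TinySketch is a hypothetical stand-in so the example runs on its own, and the attribute names are placeholders for whatever Estimators actually exposes (dna, max_hash, and so on can be added to attrs):

```python
import pickle


class TinySketch:
    """Hypothetical stand-in for Estimators, just to give the test something to run on."""

    def __init__(self, n, ksize, protein=False, track_abundance=False):
        self.n = n
        self.ksize = ksize
        self.protein = protein
        self.track_abundance = track_abundance
        self._mins = []

    def add_hash(self, h):
        if h not in self._mins:
            self._mins = sorted(self._mins + [h])[: self.n]

    def get_mins(self):
        return list(self._mins)


def check_pickle_roundtrip(obj, probe_hashes,
                           attrs=("n", "ksize", "protein", "track_abundance")):
    """Unpickle a copy of obj and check that parameters, state, and behavior match.

    Note: adds probe_hashes to obj as well as to the copy.
    """
    clone = pickle.loads(pickle.dumps(obj))
    for attr in attrs:                                  # parameters survive
        assert getattr(clone, attr) == getattr(obj, attr), attr
    assert clone.get_mins() == obj.get_mins()           # state survives
    for h in probe_hashes:                              # behavior survives
        obj.add_hash(h)
        clone.add_hash(h)
    assert clone.get_mins() == obj.get_mins()
    return clone
```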
The test.sh file mentions sourmash clean; this subcommand doesn't exist.
Choices are (as far as I can see):
a clean subcommand
make clean (?)
The type field in the .sig file output is automatically type: mrnaseq, even when analyzing genomes.
'nuff said. Right now you need to wait until sourmash tries to search one of the leaf nodes to find out if you are searching a tree with the right kind of signature!
Why does a .sig signature file declare itself to be of type "mrnaseq" (when the input could well be genomic DNA)?
https://github.com/dib-lab/sourmash/blob/master/sourmash_lib/signature.py#L36
Is this cruft? Should it be removed?
Can't figure out where it's gone.
I did:
mkvirtualenv sourmash-py27
pip install -r requirements.txt # implied but not specified in docs
make clean all test
This drew a failure because sphinx wasn't installed.
Starting fresh:
pip install -r requirements.txt sphinx
make clean all test
succeeds (if you don't have a .tox directory lying around messing things up).
I might add that all is a funny name for a build target that builds the C++ library.
Minor concerns, but they still need to be addressed for ease of developer entry.
For example:
sourmash_lib/__init__.py is an unexpected place to find tests. Do they belong here, or can they be moved to a tests/test_*.py constellation?
Similarly, the files in sourmash_lib/ include test files, which are installed by pip; this is not appropriate for clients who are not developing the sourmash library itself. Probably all of sourmash_tst_utils.py and sourmash_lib/test_*.py belong in a $repo/tests directory, or they need to be renamed.
Encountered the error below with this command: sourmash compute --protein -k 18,21 /mnt/scratch/ljcohen/mmetsp_tmp/SRR1300520.left.fq.head. Looks like a problem with how 'N' characters are being handled?
(khmer_env)[ljcohen@dev-intel14 ~]$ sourmash compute --protein -k 18,21 /mnt/scratch/ljcohen/mmetsp_tmp/SRR1300520.left.fq.head
# running sourmash subcommand: compute
computing signatures for files: ['/mnt/scratch/ljcohen/mmetsp_tmp/SRR1300520.left.fq.head']
Computing signature for ksizes: [18, 21]
... reading sequences from /mnt/scratch/ljcohen/mmetsp_tmp/SRR1300520.left.fq.head
Traceback (most recent call last):
File "/mnt/home/ljcohen/khmer_env/bin/sourmash", line 9, in <module>
load_entry_point('sourmash==0.9.4', 'console_scripts', 'sourmash')()
File "/mnt/home/ljcohen/khmer_env/lib/python2.7/site-packages/sourmash_lib/__main__.py", line 338, in main
SourmashCommands()
File "/mnt/home/ljcohen/khmer_env/lib/python2.7/site-packages/sourmash_lib/__main__.py", line 42, in __init__
cmd(sys.argv[2:])
File "/mnt/home/ljcohen/khmer_env/lib/python2.7/site-packages/sourmash_lib/__main__.py", line 151, in compute
E.add_sequence(s, args.force)
File "/mnt/home/ljcohen/khmer_env/lib/python2.7/site-packages/sourmash_lib/__init__.py", line 65, in add_sequence
self.mh.add_sequence(seq, force)
ValueError: invalid DNA character in sequence: N
@ctb is on it, but I think there need to be a few pieces of better documentation on how developers can get access to the test environment:
openjournals/joss-reviews#27
For example, #12 and #10 both suggest that there are a few additional steps to take to work as an active developer (rather than a client) of the C bindings.
Figures in the command-line docs are not found, and I can't find the .png files in the source code either.
This is still true after running make; is there some other process that needs to be run on readthedocs (or on my own Ubuntu machine) so that the figures show up?
I've put together a quick notebook showing how to get at the hashes from the sourmash YAML output file.
It's available here:
https://github.com/bioinfonm/bat_metagenomics_2016/blob/master/python_code/sourmash_YAML_minhash_extract.ipynb
There is one signature file in the python_code folder for testing.
Figure generation, in both the internal library and the command line, requires that scipy and matplotlib be installed. I recommend an optional dependency named fig or figures.
The demo notebook is pretty great, and it would be nice to specify what's needed to run it by including another extras entry in setup.py. Then I could pip install sourmash[demo] and get everything in one go.
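One way to pair such extras with the code, sketched with the proposed names (figures and demo are hypothetical extras, and the demo list is a guess at what the notebook needs): declare them in setup.py's extras_require, and import the optional packages lazily so that a missing one produces a message pointing at the right extra:

```python
import importlib

# Would be passed to setup(..., extras_require=EXTRAS_REQUIRE) in setup.py.
EXTRAS_REQUIRE = {
    "figures": ["matplotlib", "scipy"],
    "demo": ["jupyter", "matplotlib", "scipy"],  # guess: list the notebook's real imports
}


def require(module_name, extra):
    """Import an optional dependency, or say which extra provides it."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name} is required for this feature; "
            f"install it with: pip install 'sourmash[{extra}]'"
        ) from exc
```

Inside plotting code this would read, for example, plt = require("matplotlib.pyplot", "figures"), so the base install never imports matplotlib.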
... pip install tries to install version 0.3.
Thus the README is wildly out of date (0.3 doesn't even compile on my machine).
Looks like tag v0.9.4 actually installs sourmash 0.9.4rc1, same as master.
The usual approach for prereleases is to have them on a branch (possibly 'master') and tag the exact release that sets the version number to v0.9.4. Subsequent branches should update the release number to v0.9.5rc1, etc.
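A check along these lines could run before tagging (or in CI) to catch a release tag that points at a pre-release version string. check_release_tag is hypothetical, and it only recognizes a/b/rc pre-release suffixes:

```python
import re

# PEP 440-style pre-release suffix such as "a1", "b2", or "rc1" at the end.
_PRERELEASE = re.compile(r"(a|b|rc)\d+$")


def check_release_tag(tag, version):
    """Return a list of problems; an empty list means the tag and version agree."""
    problems = []
    tag_version = tag[1:] if tag.startswith("v") else tag
    if tag_version != version:
        problems.append(f"tag {tag!r} does not match package version {version!r}")
    if _PRERELEASE.search(version) and not _PRERELEASE.search(tag_version):
        problems.append(f"release tag {tag!r} points at pre-release version {version!r}")
    return problems
```

For the situation described above, check_release_tag("v0.9.4", "0.9.4rc1") reports both problems.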
I've tried installing and compiling this package every which way I can, and nothing seems to get me past this error. Any help would be much appreciated!
(ENV)dd6/analysis/mash$ sourmash compute *.fa
# running sourmash subcommand: compute
computing signatures for files: ['11511_4#10.contigs_spades.fa', '11511_4#11.contigs_spades.fa']
Computing signature for ksizes: [31]
Traceback (most recent call last):
File "/lustre/scratch108/bacteria/dd6/cholera/ENV/bin/sourmash", line 339, in <module>
main()
File "/lustre/scratch108/bacteria/dd6/cholera/ENV/bin/sourmash", line 336, in main
SourmashCommands()
File "/lustre/scratch108/bacteria/dd6/cholera/ENV/bin/sourmash", line 43, in __init__
cmd(sys.argv[2:])
File "/lustre/scratch108/bacteria/dd6/cholera/ENV/bin/sourmash", line 138, in compute
protein=args.protein)
File "/lustre/scratch108/bacteria/dd6/cholera/ENV/local/lib/python2.7/site-packages/sourmash_lib/__init__.py", line 27, in __init__
from . import _minhash
ImportError: /lustre/scratch108/bacteria/dd6/cholera/ENV/local/lib/python2.7/site-packages/sourmash_lib/_minhash.so: undefined symbol: _ZSt24__throw_out_of_range_fmtPKcz
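The undefined symbol demangles to std::__throw_out_of_range_fmt(char const*, ...), a libstdc++ internal. A plausible cause, which is an assumption rather than a diagnosis from this thread, is that _minhash.so was compiled by a newer g++ than the libstdc++ loaded at run time. This sketch asks whether the libstdc++ that ctypes finds exports that symbol; it may not be the same copy the extension loads (LD_LIBRARY_PATH, conda or linuxbrew toolchains, etc.):

```python
import ctypes
import ctypes.util

# Mangled name of std::__throw_out_of_range_fmt(char const*, ...).
SYMBOL = "_ZSt24__throw_out_of_range_fmtPKcz"


def libstdcxx_exports(symbol=SYMBOL):
    """True/False: does the libstdc++ that ctypes finds export symbol?

    Returns None if no libstdc++ could be located or loaded.
    """
    path = ctypes.util.find_library("stdc++")
    if path is None:
        return None
    try:
        lib = ctypes.CDLL(path)
    except OSError:
        return None
    return hasattr(lib, symbol)


if __name__ == "__main__":
    print(ctypes.util.find_library("stdc++"), libstdcxx_exports())
```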
Having the hashing seed as a parameter could be helpful.
@ctb I remember reading in your blog that one use for sourmash and YAML would be to connect different researchers together.
I was wondering if such a database could help both to connect researchers and to allow searching against a bunch of metagenomes.
I currently have over 100 bat metagenomes from New Mexico and Arizona, and I have computed the .sig files for most of them. I would like to expand the metadata categories in the YAML files to include things like research group, contact, and maybe some metadata categories borrowed from MG-RAST. I would like to add a link to the Mash RefSeq70 data for each one as well, and eventually roll these into an online database.
I was wondering if GitHub could be leveraged for this kind of database, with a live Jupyter notebook as the interface?
Loading YAML signatures is unreasonably resource-intensive (therefore slow).
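One likely factor (not measured here): PyYAML installs without libyaml and then quietly uses its pure-Python loader, which is much slower. A sketch that uses the libyaml-backed CSafeLoader when it is available and falls back to SafeLoader otherwise:

```python
import yaml

# yaml.CSafeLoader only exists when PyYAML was built against libyaml.
FastSafeLoader = getattr(yaml, "CSafeLoader", yaml.SafeLoader)
USING_LIBYAML = FastSafeLoader is not yaml.SafeLoader


def load_yaml_fast(text_or_stream):
    """Safe-load YAML, using the C loader when PyYAML has one."""
    return yaml.load(text_or_stream, Loader=FastSafeLoader)
```

Printing USING_LIBYAML is also a quick way to see whether a given install actually picked up libyaml.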
http://sourmash.readthedocs.io/en/latest/api.html
The API docs apparently aren't picking up function docstrings.
I'm running sourmash compute on a number of read files and I get this error. Obviously one of the reads contains an N.
Traceback (most recent call last):
File "/home/cts/local/python34-virtualenv/bin/sourmash", line 11, in <module>
sys.exit(main())
File "/home/cts/local/python34-virtualenv/lib/python3.4/site-packages/sourmash_lib/__main__.py", line 338, in main
SourmashCommands()
File "/home/cts/local/python34-virtualenv/lib/python3.4/site-packages/sourmash_lib/__main__.py", line 42, in __init__
cmd(sys.argv[2:])
File "/home/cts/local/python34-virtualenv/lib/python3.4/site-packages/sourmash_lib/__main__.py", line 151, in compute
E.add_sequence(s, args.force)
File "/home/cts/local/python34-virtualenv/lib/python3.4/site-packages/sourmash_lib/__init__.py", line 65, in add_sequence
self.mh.add_sequence(seq, force)
ValueError: invalid DNA character in sequence: N
Is there a reason for throwing an error rather than simply dropping this k-mer? I can dig into the code to make changes, but first wanted to know whether this would affect the sketch too much.
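Dropping k-mers rather than raising is straightforward: a single N invalidates at most k overlapping k-mer windows, so the effect on the sketch of a long sequence is likely small. The traceback shows a force argument being passed to add_sequence, but its exact behavior isn't shown here. A pure-Python sketch of the skipping logic (iter_valid_kmers is hypothetical, not a sourmash function):

```python
def iter_valid_kmers(seq, ksize, alphabet="ACGT"):
    """Yield the k-mers of seq that contain only characters from alphabet.

    Windows overlapping an N (or any other character) are skipped
    instead of raising an error.
    """
    seq = seq.upper()
    last_bad = -1                       # index of the most recent invalid character
    for i, base in enumerate(seq):
        if base not in alphabet:
            last_bad = i
        start = i - ksize + 1           # current window is seq[start:i + 1]
        if start >= 0 and start > last_bad:
            yield seq[start:i + 1]
```

With k = 3, the single N in ACGNTACG removes exactly the three windows that overlap it (CGN, GNT, NTA), leaving ACG, TAC, ACG.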
I am sadly running RHEL6, which has g++ 4.4.7, and it does not like the -std=c++11 flag. Is there any way to install without upgrading the compiler on the server?
Running setup.py bdist_wheel for sourmash ... error
Complete output from command /home/cts/local/python34-virtualenv/bin/python -u -c "import setuptools, tokenize;__file__='/export/data1/tmp/pip-build-uf91zm09/sourmash/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" bdist_wheel -d /export/data1/tmp/tmpk2syou66pip-wheel- --python-tag cp34:
running bdist_wheel
running build
running build_py
creating build
creating build/lib.linux-x86_64-3.4
creating build/lib.linux-x86_64-3.4/sourmash_lib
copying sourmash_lib/__init__.py -> build/lib.linux-x86_64-3.4/sourmash_lib
copying sourmash_lib/fig.py -> build/lib.linux-x86_64-3.4/sourmash_lib
copying sourmash_lib/signature.py -> build/lib.linux-x86_64-3.4/sourmash_lib
copying sourmash_lib/sourmash_tst_utils.py -> build/lib.linux-x86_64-3.4/sourmash_lib
copying sourmash_lib/test__minhash.py -> build/lib.linux-x86_64-3.4/sourmash_lib
copying sourmash_lib/test_sourmash.py -> build/lib.linux-x86_64-3.4/sourmash_lib
running build_ext
building 'sourmash_lib._minhash' extension
creating build/temp.linux-x86_64-3.4
creating build/temp.linux-x86_64-3.4/sourmash_lib
creating build/temp.linux-x86_64-3.4/third-party
creating build/temp.linux-x86_64-3.4/third-party/smhasher
gcc -pthread -DDYNAMIC_ANNOTATIONS_ENABLED=1 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/opt/python34/include/python3.4m -c sourmash_lib/_minhash.cc -o build/temp.linux-x86_64-3.4/sourmash_lib/_minhash.o -std=c++11 -pedantic -O3
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for Ada/C/ObjC but not for C++ [enabled by default]
cc1plus: error: unrecognized command line option ‘-std=c++11’
error: command 'gcc' failed with exit status 1
I thought this might be of interest to folks using sourmash. I am sitting on 80+ metagenomes from bats. We swabbed their external surfaces (i.e., furred surfaces, ears, wings, and uropatagia). I ran sourmash on a subset from one field area.
We see the bats grouping by species, but the Myotis velifer samples are split into three groups. Each group corresponds to a different geographic region within the sampling area. Black text marks cave-caught bats and green text marks surface-netted bats.
Right now, --protein switches 'sourmash compute' over to computing only protein MinHashes. It would be nice to read the sequence once, and compute both protein and DNA signatures at the same time.
The functionality to store the protein/DNA signatures in a single .sig file is there, although it's a bit untested.
First thought -- provide a '--dna' boolean flag, and change the '--protein' flag to be bool as well.
Then change '--protein' to no longer "turn off" DNA sketch computation.
One other thought: in the command-line tool we should be really clear about how many signatures we're computing!
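A sketch of that flag behavior with argparse. --dna and --protein follow the proposal; --no-dna is an extra, hypothetical way to get protein-only output, and planned_signatures is a placeholder that makes it easy to report exactly how many signatures will be computed:

```python
import argparse


def parse_compute_args(argv):
    p = argparse.ArgumentParser(prog="sourmash compute")
    p.add_argument("filenames", nargs="+")
    p.add_argument("-k", "--ksizes", default="31", help="comma-separated k-mer sizes")
    # DNA is on by default; --protein no longer turns it off.
    p.add_argument("--dna", dest="dna", action="store_true", default=True)
    p.add_argument("--no-dna", dest="dna", action="store_false")  # hypothetical
    p.add_argument("--protein", action="store_true")
    args = p.parse_args(argv)
    args.ksizes = [int(k) for k in args.ksizes.split(",")]
    if not (args.dna or args.protein):
        p.error("nothing to compute: enable --dna and/or --protein")
    return args


def planned_signatures(args):
    """(molecule, ksize) pairs to compute, one signature each."""
    moltypes = [m for m, on in (("dna", args.dna), ("protein", args.protein)) if on]
    return [(m, k) for m in moltypes for k in args.ksizes]
```

Reading each sequence once and adding it to every planned sketch then gives both DNA and protein signatures in a single pass, and len(planned_signatures(args)) is the number to print up front.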
Please include "doi: http://dx.doi.org/10.1101/029827" in paper.bib.
Even while the tests don't run (see #10), it's still better to declare an optional dependency on pytest, or whatever test package does the right thing.
Hello, dev teams.
I have tried to install the package under Linux Mint 18 Sarah, both via pip and by following all the instructions here. The installation finished without error, but when I run it, I get the following error:
from ._minhash import MinHash
ImportError: /home/username/sourmash/sourmash_lib/_minhash.cpython-35m-x86_64-linux-gnu.so: undefined symbol: _ZSt24__throw_out_of_range_fmtPKcz
It seems a solution is needed. Thanks.
Thank you very much.
Hello,
Hopefully I am using the right words here. Is there a way to find out which minhashes are driving each split in the cluster (or the larger groups)? I went through the Hash website and I think I understand how each hash is being created.
Ideally I'd like to grab the set of hashes that drives a split in the cluster and then run those against RefSeq.msh to see what they are.
thanks,
ara
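One simple way to approach this, sketched under the assumption that each sample's signature is available as a set of hash values: take the hashes that are common in the samples on one side of a split and absent from every sample on the other side. This is a set-difference heuristic, not a readout of what the clustering itself used, and discriminating_hashes is hypothetical:

```python
from collections import Counter


def discriminating_hashes(group_a, group_b, min_fraction=1.0):
    """Hashes found in at least min_fraction of group_a's samples and in none of group_b's.

    group_a, group_b: lists of sets of hash values, one set per sample.
    """
    counts = Counter(h for sketch in group_a for h in sketch)
    seen_in_b = set().union(*group_b)
    needed = min_fraction * len(group_a)
    return sorted(h for h, c in counts.items() if c >= needed and h not in seen_in_b)
```

Lowering min_fraction trades specificity for more candidate hashes when the samples in a group are only loosely similar.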
A few things for @lgautier and @luizirber to weigh in on:
the sourmash_lib.signature module is badly named; I keep colliding names with sig, etc. I was thinking of renaming it sigutils or something.
the Estimators class is badly named. It's a legacy class anyway, held over from when MinHash didn't exist. It's still a convenient wrapper around the CPython MinHash class, at least for now, since pure Python is much easier to write, change, and test than new C code. One approach might be to deprecate the direct use of MinHash and wrap it more tightly in Estimators, and then rename Estimators to MinHash. Or, if that's too confusing, MinHashWrapper. Thoughts?
regardless of what we do with the names, Estimators should be moved out of __init__.py.
Not hugely urgent, but if you have super strong opinions, now is the time :)
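If Estimators is renamed, a common way to keep existing code working during the transition is to leave a deprecated alias behind. A sketch using the standard warnings module, with MinHashWrapper standing in for whichever new name is chosen (both classes here are hypothetical and stripped down to the constructor):

```python
import warnings


class MinHashWrapper:
    """Hypothetical new name for the pure-Python wrapper class."""

    def __init__(self, n, ksize):
        self.n = n
        self.ksize = ksize


class Estimators(MinHashWrapper):
    """Deprecated alias kept so that existing code keeps working."""

    def __init__(self, *args, **kwargs):
        warnings.warn(
            "Estimators is deprecated; use MinHashWrapper instead",
            DeprecationWarning,
            stacklevel=2,   # point the warning at the caller's line
        )
        super().__init__(*args, **kwargs)
```

Subclassing keeps isinstance checks against the new name working for objects created through the old one.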