ctb commented on June 8, 2024

from sourmash.

anmwinter commented on June 8, 2024

Titus,

I installed sourmash through pip. I am currently running it from the command line in a Jupyter notebook. Is getting the union easier by running it through Python?

Thanks!
ara

ctb commented on June 8, 2024

Right, it's not a built-in feature of the command-line interface, but it's
relatively easy to do via Python.
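A minimal sketch of what this can look like, assuming each signature has already been reduced to a plain Python set of integer hash values. The function names and toy hash values are illustrative only, not part of the sourmash API:

```python
def union_of(signatures: list[set[int]]) -> set[int]:
    """Every hash that appears in at least one signature."""
    return set().union(*signatures)


def intersection_of(signatures: list[set[int]]) -> set[int]:
    """Only the hashes shared by every signature."""
    if not signatures:
        return set()
    first, *rest = signatures
    return set(first).intersection(*rest)


# Toy hash sets standing in for three samples (made-up values).
sig_a = {11, 42, 97, 150}
sig_b = {42, 97, 200}
sig_c = {42, 97, 150, 300}

print(sorted(union_of([sig_a, sig_b, sig_c])))         # [11, 42, 97, 150, 200, 300]
print(sorted(intersection_of([sig_a, sig_b, sig_c])))  # [42, 97]
```

One caveat: with fixed-size (bottom-s) MinHash sketches, the raw union can hold more than s hashes; keeping only its s smallest values gives the bottom-s sketch of the combined samples.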

Can you provide me with an example of the sort of workflow you want to use?

e.g.

  • calculate signatures for a bunch of sequences
  • cluster signatures at some threshold
  • retrieve all signatures that cluster with a specific query signature
  • build union or intersection of signatures within a cluster
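The last three steps can be sketched in plain Python, assuming signatures are already available as sets of hash values keyed by sample name. Single-linkage clustering on exact Jaccard similarity, the 0.5 threshold, and all names and values here are illustrative choices for this sketch, not a description of how sourmash itself clusters:

```python
def jaccard(a: set[int], b: set[int]) -> float:
    """Exact Jaccard similarity of two hash sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(sigs: dict[str, set[int]], threshold: float) -> list[set[str]]:
    """Single-linkage clustering: join any pair whose Jaccard is >= threshold."""
    parent = {name: name for name in sigs}

    def find(x: str) -> str:
        # Walk up to the cluster root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    names = list(sigs)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if jaccard(sigs[a], sigs[b]) >= threshold:
                parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for name in names:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


def cluster_of(query: str, clusters: list[set[str]]) -> set[str]:
    """Return the cluster that contains the query signature."""
    for members in clusters:
        if query in members:
            return members
    raise KeyError(query)


# Toy signatures (made-up hash values).
sigs = {
    "bat1": {1, 2, 3, 4, 5},
    "bat2": {1, 2, 3, 4, 6},
    "bat3": {3, 4, 5, 6, 7},
    "bat4": {50, 51, 52},
}
clusters = cluster(sigs, threshold=0.5)
members = cluster_of("bat1", clusters)
cluster_union = set().union(*(sigs[m] for m in members))
cluster_core = set.intersection(*(sigs[m] for m in members))
print(sorted(members))        # ['bat1', 'bat2']
print(sorted(cluster_union))  # [1, 2, 3, 4, 5, 6]
print(sorted(cluster_core))   # [1, 2, 3, 4]
```

Single linkage is the simplest rule to write down; it can chain dissimilar samples together through intermediates, so average or complete linkage may suit real metagenomes better.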

anmwinter commented on June 8, 2024

The workflow you describe is pretty much what I am looking for:

  • calculate signatures for a bunch of sequences
  • cluster signatures at some threshold
  • retrieve all signatures that cluster with a specific query signature
  • build union or intersection of signatures within a cluster

Longer term I'd like to be able to see if there is a signature that occurs across all samples. I am trying to sort out the species signatures and any geographic signatures. Currently our metagenomes are clustering by bat species with some exceptions.
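One way to frame "a signature that occurs across all samples" is as the intersection of every sample's hash set, and a species- or site-specific signature as the hashes shared by all members of a group but absent from every other sample. A minimal sketch with made-up sample names and hash values; the `species` grouping dict is an assumption about how sample metadata might be supplied:

```python
def core_hashes(sigs: dict[str, set[int]]) -> set[int]:
    """Hashes present in every sample."""
    sets = list(sigs.values())
    return set(sets[0]).intersection(*sets[1:]) if sets else set()


def group_specific(sigs: dict[str, set[int]], group: set[str]) -> set[int]:
    """Hashes shared by all members of `group` and absent from every other sample."""
    inside = core_hashes({n: s for n, s in sigs.items() if n in group})
    outside = set().union(*(s for n, s in sigs.items() if n not in group))
    return inside - outside


# Toy data: three samples, two hypothetical species groups.
sigs = {
    "bat1": {1, 2, 3, 10},
    "bat2": {1, 2, 3, 11},
    "bat3": {1, 4, 5, 12},
}
species = {"A": {"bat1", "bat2"}, "B": {"bat3"}}

print(sorted(core_hashes(sigs)))                   # [1]
print(sorted(group_specific(sigs, species["A"])))  # [2, 3]
print(sorted(group_specific(sigs, species["B"])))  # [4, 5, 12]
```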

Does sourmash use the same procedure that Mash uses to find similar hashes? And if so, is that part coded in Python?

One thing I wanted to try to code for was a table of "fuzzy" hashes that occur in each sample.

        fuzzyhash1  fuzzyhash2  fuzzyhash3
bat1    4           1           0
bat2    8           0           0
bat3    3           2           3
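If some earlier step has already assigned hashes to representative "fuzzy" groups (how to build that grouping is the open question here, so `hash_to_group` below is a hypothetical input with made-up values), tallying the sample-by-group table is a small counting job:

```python
from collections import Counter


def fuzzy_table(
    sigs: dict[str, set[int]], hash_to_group: dict[int, str]
) -> tuple[list[str], dict[str, list[int]]]:
    """Count, per sample, how many of its hashes fall into each fuzzy group.

    Hashes with no group assignment are ignored.
    """
    groups = sorted(set(hash_to_group.values()))
    table: dict[str, list[int]] = {}
    for sample, hashes in sigs.items():
        counts = Counter(hash_to_group[h] for h in hashes if h in hash_to_group)
        table[sample] = [counts[g] for g in groups]
    return groups, table


# Hypothetical grouping and toy signatures, for illustration only.
hash_to_group = {101: "fuzzyhash1", 102: "fuzzyhash1", 103: "fuzzyhash2", 104: "fuzzyhash3"}
sigs = {"bat1": {101, 102, 103}, "bat2": {101, 999}, "bat3": {102, 103, 104}}

groups, table = fuzzy_table(sigs, hash_to_group)
print("sample", *groups, sep="\t")
for sample, row in table.items():
    print(sample, *row, sep="\t")
```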

anmwinter commented on June 8, 2024

Are signatures and hashes the same thing?

ctb commented on June 8, 2024

On Fri, Jul 29, 2016 at 09:55:22AM -0700, Ara Winter wrote:

> Are signatures and hashes the same thing?

Here's how I'm using the terms:

Hash: the hash value of an individual k-mer

Signature: collection of hashes
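To make the terminology concrete, here is a toy sketch in which each k-mer is hashed to a 64-bit integer and a signature is the collection of the smallest such hashes (a bottom-s MinHash, the scheme Mash uses). The hash function is an assumption: `hashlib.blake2b` stands in for the MurmurHash3 that Mash and sourmash use, so the numbers will not match real signature files.

```python
import hashlib

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def kmer_hash(kmer: str) -> int:
    """Hash one k-mer to a 64-bit integer.

    Uses the canonical form (the smaller of the k-mer and its reverse
    complement) so both strands give the same value. blake2b is a stand-in;
    Mash and sourmash use MurmurHash3, so values will not match theirs.
    """
    revcomp = kmer.translate(COMPLEMENT)[::-1]
    canonical = min(kmer, revcomp)
    digest = hashlib.blake2b(canonical.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def signature(seq: str, k: int = 21, num: int = 500) -> list[int]:
    """A bottom-`num` signature: the `num` smallest distinct k-mer hashes."""
    seq = seq.upper()
    hashes = {
        kmer_hash(seq[i:i + k])
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set("ACGT")  # skip k-mers containing N, etc.
    }
    return sorted(hashes)[:num]


seq = "ATGGCGTACGTTAGCCTAGGCTAACGTTGCAAGTCCGATGCA"
sig = signature(seq, k=21, num=10)
print(len(sig), sig[:3])
```

Because k-mers are canonicalized, a sequence and its reverse complement produce the same signature.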

ctb commented on June 8, 2024

On Fri, Jul 29, 2016 at 09:30:15AM -0700, Ara Winter wrote:

> The workflow you describe is pretty much what I am looking for:
>
> • calculate signatures for a bunch of sequences
> • cluster signatures at some threshold
> • retrieve all signatures that cluster with a specific query signature
> • build union or intersection of signatures within a cluster

ok! I'm not sure if I'll get to it this week but please do bump this issue
in a week or so.

> Longer term I'd like to be able to see if there is a signature that occurs across all samples. I am trying to sort out the species signatures and any geographic signatures. Currently our metagenomes are clustering by bat species with some exceptions.

ok - I can give you reasons why it might not work, but it's worth a try!

> Does sourmash use the same procedure that Mash uses to find similar hashes? And if so, is that part coded in Python?

Yes (it's Mash-compatible) and no (it's not coded in Python). It used to be, and
I could put together a Python description of the algorithm if you like.
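In that spirit, a minimal sketch of the Mash-style comparison, assuming two bottom-s sketches given as lists of integer hashes: merge them, keep the s smallest values of the union, and count how many of those appear in both sketches. The conversion D = -(1/k) ln(2j / (1 + j)) is the Mash distance from the Mash paper. This is an illustration only, not sourmash's compiled code:

```python
import math


def estimate_jaccard(a: list[int], b: list[int], s: int) -> float:
    """Mash-style Jaccard estimate from two bottom-s sketches."""
    set_a, set_b = set(a), set(b)
    union_bottom = sorted(set_a | set_b)[:s]  # bottom-s sketch of the union
    if not union_bottom:
        return 0.0
    shared = set_a & set_b
    return sum(1 for h in union_bottom if h in shared) / len(union_bottom)


def mash_distance(j: float, k: int) -> float:
    """Convert a Jaccard estimate into the Mash distance for k-mer size k."""
    if j <= 0:
        return 1.0  # no shared hashes: report a distance of 1
    # Equivalent to -(1/k) * ln(2j / (1 + j)).
    return math.log((1 + j) / (2 * j)) / k


a = [1, 3, 5, 7, 9]
b = [1, 2, 3, 4, 5]
j = estimate_jaccard(a, b, s=5)
print(j)  # 0.6
print(mash_distance(j, 21))
```

Restricting the count to the bottom s hashes of the union, rather than comparing the two sketches directly, is what keeps the estimate unbiased when sketches are truncated.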

> One thing I wanted to try to code for was a table of "fuzzy" hashes that occur in each sample.
>
>         fuzzyhash1  fuzzyhash2  fuzzyhash3
> bat1    4           1           0
> bat2    8           0           0
> bat3    3           2           3

Would the fuzzyhash1 / fuzzyhash2 lists of hashes come from some sort of
clustering or grouping of hashes in the signatures?

anmwinter commented on June 8, 2024

> ok - I can give you reasons why it might not work, but it's worth a try!

Oh, I'd like to hear why this might not work. I've read through the Mash paper and I am still trying to wrangle with the concepts in there.

> Would the fuzzyhash1 / fuzzyhash2 lists of hashes come from some sort of
> clustering or grouping of hashes in the signatures?

Yes, I was imagining a clustering plus picking a representative hash (similar to 16S OTU clustering).

I am in my second week of my post-doc and I have some time to develop/use new tools. Using signatures is at the top of my list since I stumbled across sourmash. I have a few other questions that I will start another thread for.

ctb commented on June 8, 2024

On Mon, Aug 01, 2016 at 07:30:31AM -0700, Ara Winter wrote:

>> ok - I can give you reasons why it might not work, but it's worth a try!

> Oh, I'd like to hear why this might not work. I've read through the Mash paper and I am still trying to wrangle with the concepts in there.

Basically, the hashes in the signature give you extraordinarily sensitive
ability to detect similar species, but this falls off quickly as species
diverge. The MetaPalette paper (http://msystems.asm.org/content/1/3/e00020-16) gives some good input here with respect to k-mer sizes and species/strain divergence.

So I'd worry about moderately distant genomes being completely disjoint
in signature space.
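A quick simulation illustrates the concern. Under simplifying assumptions (a random 10 kb sequence, uniform independent substitutions at rate p, exact forward-strand k-mer sets instead of sketches, a fixed random seed), a k-mer survives unchanged with probability about (1 - p)^k, so shared k-mers fall away fastest for large k and more divergent genomes:

```python
import random


def random_genome(n: int, rng: random.Random) -> str:
    """A uniformly random DNA sequence of length n."""
    return "".join(rng.choice("ACGT") for _ in range(n))


def mutate(seq: str, rate: float, rng: random.Random) -> str:
    """Apply independent point substitutions at the given per-base rate."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)


def kmer_jaccard(a: str, b: str, k: int) -> float:
    """Exact Jaccard similarity of the two sequences' k-mer sets."""
    ka = {a[i:i + k] for i in range(len(a) - k + 1)}
    kb = {b[i:i + k] for i in range(len(b) - k + 1)}
    return len(ka & kb) / len(ka | kb)


rng = random.Random(42)
genome = random_genome(10_000, rng)
for rate in (0.01, 0.05, 0.10):
    relative = mutate(genome, rate, rng)
    row = [f"k={k}: {kmer_jaccard(genome, relative, k):.3f}" for k in (11, 21, 31)]
    print(f"p={rate:.2f}", *row)
```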

>> Would the fuzzyhash1 / fuzzyhash2 lists of hashes come from some sort of
>> clustering or grouping of hashes in the signatures?

> Yes, I was imagining a clustering plus picking a representative hash (similar to 16S OTU clustering).

You'd probably want to work with as many hashes as possible, for sensitivity
reasons.

> I am in my second week of my post-doc and I have some time to develop/use new tools. Using signatures is at the top of my list since I stumbled across sourmash. I have a few other questions that I will start another thread for.

ok! note that the YAML signature files are easy to parse with many
languages, and the overall idea is surprisingly trivial, so you could
easily develop your own code to work with the output of sourmash -
I'd go with what you're comfortable with rather than relying too heavily
on this code :)
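sourmash has since switched its signature files from YAML to JSON. A minimal standard-library sketch, assuming the layout current releases write (a top-level list of records, each with a `name` and a `signatures` list whose entries carry `ksize` and `mins`); the 2016 YAML files discussed here may differ, so check the field names against your own files:

```python
import json


def load_hashes(text: str) -> dict[tuple[str, int], set[int]]:
    """Map (record name, ksize) to the set of hash values in a signature file."""
    result: dict[tuple[str, int], set[int]] = {}
    for record in json.loads(text):
        name = record.get("name") or record.get("filename", "")
        for sketch in record["signatures"]:
            result[(name, sketch["ksize"])] = set(sketch["mins"])
    return result


# A trimmed-down example record (made-up values; real files carry more fields).
example = """[{"class": "sourmash_signature", "name": "bat1",
  "signatures": [{"ksize": 21, "mins": [7, 19]},
                 {"ksize": 31, "mins": [12, 45, 78]}]}]"""
hashes = load_hashes(example)
print(sorted(hashes))  # [('bat1', 21), ('bat1', 31)]
```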

anmwinter commented on June 8, 2024

Thanks @ctb! I will read through the MetaPalette paper later today.

I just wrote a little Python script to parse the YAML signature files so I could start hacking away.

anmwinter commented on June 8, 2024

> So I'd worry about moderately distant genomes being completely disjoint
> in signature space.

So if you have a decently diverse metagenome, this same issue would crop up? Does increasing the number of hashes help with this?

ctb commented on June 8, 2024

anmwinter commented on June 8, 2024

> very cool! if you want to share at some point it could be useful to others
> (or you can tell me what I can provide through this project's docs to help
> people like you in the future!)

Gladly! Right now it's just parsing one file. I need to fix it so it loops through all the .sig files. I am not the best at using GitHub, so what is a good way to share the notebook with you through GitHub?
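Looping over every `.sig` file in a directory is a short `pathlib` job. A sketch, assuming JSON signature files as written by current sourmash releases (a top-level list of records with `name` and `signatures`, each sketch carrying `ksize` and `mins`) and that you want a single k-mer size; the function name and the `signatures` directory are hypothetical:

```python
import json
from pathlib import Path


def load_directory(directory: str, ksize: int = 31) -> dict[str, set[int]]:
    """Collect hashes for one k-mer size from every .sig file in `directory`."""
    sigs: dict[str, set[int]] = {}
    for path in sorted(Path(directory).glob("*.sig")):
        for record in json.loads(path.read_text()):
            # Fall back to the file name when a record has no name.
            name = record.get("name") or path.stem
            for sketch in record["signatures"]:
                if sketch["ksize"] == ksize:
                    sigs[name] = set(sketch["mins"])
    return sigs


if __name__ == "__main__":
    # Hypothetical directory name; point this at wherever your .sig files live.
    for name, hashes in load_directory("signatures").items():
        print(name, len(hashes))
```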

Thanks again.

ctb commented on June 8, 2024

anmwinter commented on June 8, 2024

Morning @ctb, I thought I would give the union-of-hashes question a little bump here.

What are the commands for running sourmash through Python? I saw a few .py files in the repo.

thanks!
ara

ctb commented on June 8, 2024

Documented all of this over in the API docs a while back, closing!

https://sourmash.readthedocs.io/en/latest/api-example.html#set-operations-on-hashes