gpmap's People

Contributors

harmsm, lgoldbach, lperezmo, zsailer

gpmap's Issues

Using VCF files?

I have DST data for drug-resistant and drug-susceptible bacteria. How can I convert it into a genotype-phenotype map for your gpmap and epistasis modules?

what to do about extra data?

We need to properly handle extra data that GPMap doesn't use. Right now, GPMap ignores any extra data annotated on each genotype. Users, however, will likely want to keep this information together with the genotypes.

The question is: should we expose these attributes through the GPMap interface, or should they live only in the underlying GPMap DataFrame?

Move data under "data" key in dictionary/json format.

The dictionary and JSON formats put metadata and data at the same level, i.e.

{
  "name": "my_data",
  "description": "a sentence about my data",
  "genotypes": [...],
  "phenotypes": [...],
  .
  .
  .
}

I suggest that we move the data underneath a "data" key. This aligns better with the GenotypePhenotypeMap anyway, which stores the data in a DataFrame under the data attribute, and it cleanly separates the data from the metadata.

This idea came up when thinking about how someone stores data in an Excel or CSV file. Metadata can't easily go next to the data in those formats, so it usually lives in a separate file (such as a JSON or YAML file). If someone were to merge, say, an Excel data file and a JSON metadata file into a single JSON file, I think it's clearer for the data to become a single field with subfields.

{
  "name": "my_data",
  "description": "a sentence about my data",
  "data" : {
    "genotypes": [...],
    "phenotypes": [...],
    .
    .
    .
  }
}
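
A minimal sketch of what that migration could look like, assuming "name" and "description" are the only metadata keys and every other key counts as data. The helper `nest_data` and its `metadata_keys` parameter are hypothetical names for illustration and are not part of gpmap.

```python
import json


def nest_data(flat, metadata_keys=("name", "description")):
    """Return a copy of `flat` with all non-metadata fields under a "data" key.

    `metadata_keys` is an assumption: adjust it to whatever the format
    actually treats as metadata.
    """
    nested = {k: v for k, v in flat.items() if k in metadata_keys}
    nested["data"] = {k: v for k, v in flat.items() if k not in metadata_keys}
    return nested


# Flat layout: metadata and data sit at the same level.
flat = {
    "name": "my_data",
    "description": "a sentence about my data",
    "genotypes": ["AA", "AV"],
    "phenotypes": [1.0, 1.1],
}

nested = nest_data(flat)
print(json.dumps(nested, indent=2))
```

Going the other way (reading an old flat file) would only need a check for whether a "data" key is present, so both layouts could be supported during a transition.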

Is it suitable for whole gene?

Is gpmap suitable for full-length sequences?
When I run the test_data and change it to amino acid genotypes, it works well. However, when I lengthen the test sequence to 242 aa, it fails.

I got the following error at the SequenceSpace() step:
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 14.2 GiB for an array with shape (1912602624,) and data type float64

I hope to use it to analyse random mutations in a gene that is 239 amino acids long.
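
For scale, here is a rough back-of-the-envelope check of the numbers in the error message. It only does arithmetic on the reported array shape; the second part assumes a full 20-letter amino acid alphabet at every one of the 242 positions, which is an illustrative assumption and not necessarily how SequenceSpace() enumerates genotypes.

```python
# Size of the array NumPy tried to allocate, taken from the error message.
n_elements = 1_912_602_624
bytes_per_float64 = 8
requested_gib = n_elements * bytes_per_float64 / 2**30
print(f"requested allocation: {requested_gib:.2f} GiB")

# Assumption: every position can hold any of 20 amino acids.
# The complete sequence space then has alphabet_size ** length members.
alphabet_size = 20
length = 242
full_space = alphabet_size ** length
print(f"full space has {len(str(full_space))} decimal digits")
```

Under that assumption, the complete space is far too large to enumerate in memory, so any analysis at this length would have to work only with the genotypes that were actually observed.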

GenotypePhenotypeMap throws error

Error:

AttributeError: GenotypePhenotypeMap instance has no attribute '_indices'

This happens when running the code from the tutorial:

from gpmap import GenotypePhenotypeMap

# Create list of genotypes and phenotypes

wildtype = "AA"
genotypes = ["AA", "AV", "AM", "VA", "VV", "VM"]
phenotypes = [1.0, 1.1, 1.4, 1.5, 2.0, 3.0]

# Create GenotypePhenotypeMap object

gpm = GenotypePhenotypeMap(wildtype, genotypes, phenotypes)

Attempted on two machines with same result.
