
uc-davis-molecular-computing / scadnano-python-package

Python scripting library for generating designs readable by scadnano.

Home Page: https://scadnano.org

License: MIT License

Python 73.15% SuperCollider 0.03% Shell 0.02% Scala 26.80%
dna-origami dna-structures cadnano dna-sequences

scadnano-python-package's People

Contributors

anelisecho, cesarg707, cgevans, daniel-h-, danielhader, dave-doty, pexatus, tcosmo, unhumbleben

scadnano-python-package's Issues

imported cadnano design fails to export

Importing this cadnano design in the web interface, and then exporting it to cadnano, fails with the error

Error exporting file: '>' not supported between instances of 'NoneType' and 'int'

I haven't tried it yet from the command line with the Python library directly.

cadnano file attached as zip.

Double Layer Origami 12 LQ edits.zip

circular Strands

Add a Boolean field Strand.circular and draw a crossover from last substrand to first. Any equivalent cyclic permutation of the substrands should display in the same way.

Even if one doesn't wish the final design to have circular strands, this will help to avoid the problem that sometimes an intermediate design temporarily creates a circular Strand, even though subsequent edits make it linear again. Currently such designs are simply not allowed.

See UC-Davis-molecular-computing/scadnano#5.
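
One way to make every cyclic permutation of a circular strand behave identically is to rotate its substrand list to a canonical form before drawing or comparing. The following is only a sketch: it assumes substrands can be represented as comparable tuples (helix, forward, start, end), and the name canonical_rotation is hypothetical, not part of scadnano.

```python
from typing import List, Tuple

# Hypothetical stand-in for a substrand: (helix, forward, start, end).
Substrand = Tuple[int, bool, int, int]


def canonical_rotation(substrands: List[Substrand]) -> List[Substrand]:
    """Return the lexicographically smallest rotation of the substrand list."""
    if not substrands:
        return []
    # Enumerate every rotation and pick the smallest, so that all cyclic
    # permutations of the same circular strand map to one representative.
    rotations = [substrands[i:] + substrands[:i] for i in range(len(substrands))]
    return min(rotations)


a = [(0, True, 0, 8), (1, False, 0, 8), (2, True, 0, 8)]
b = a[1:] + a[:1]  # same circular strand, listed from a different substrand
assert canonical_rotation(a) == canonical_rotation(b)
```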

re-arrange order of API documentation, and list classes somewhere (e.g., on the side)

Figure out how to configure Sphinx to re-arrange order of documentation in API.

Currently the API documentation puts things in the same order as the source code, which itself is not easy to change because of some dependencies of later type declarations on previous type declarations.

Also, see if there's a way to list the classes (Strand, Domain, etc.) on the side.

need tutorial

The API documentation for the scadnano Python scripting package is fairly extensive, but we need a simple tutorial that walks someone through the concepts one by one in a reasonable order, showing them how to write scripts to generate anything from simple designs up to a full-sized origami or non-origami system (e.g., tile-based design).

color should accept an integer

Currently the Strand.color field can be a hex string (e.g., "#ffaa12") or a map (e.g., {"r": 123, "g": 456, "b": 789}).

codenano also uses a decimal integer, e.g., 123456 interpreted as a 24-bit number encoding the RGB values 8 bits at a time. Accept this sort of color specification as well.
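
The 24-bit interpretation can be sketched as follows. The function names color_int_to_hex and color_int_to_rgb are hypothetical helpers for illustration, not existing scadnano functions; they assume the red channel occupies the most significant 8 bits.

```python
from typing import Dict


def _check_range(value: int) -> None:
    # A 24-bit color must fit in three 8-bit channels.
    if not 0 <= value <= 0xFFFFFF:
        raise ValueError(f"color integer {value} is outside the 24-bit range")


def color_int_to_hex(value: int) -> str:
    """Convert a decimal color integer to a hex string such as '#ffaa12'."""
    _check_range(value)
    return f"#{value:06x}"


def color_int_to_rgb(value: int) -> Dict[str, int]:
    """Split a decimal color integer into its r, g, b channels (8 bits each)."""
    _check_range(value)
    return {"r": (value >> 16) & 0xFF, "g": (value >> 8) & 0xFF, "b": value & 0xFF}


print(color_int_to_hex(123456))  # prints "#01e240"
print(color_int_to_rgb(123456))  # prints {'r': 1, 'g': 226, 'b': 64}
```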

from_cadnano_v2 keyerror: 'vstrands'

I've got a simple scadnano script to load a .dna file and write the corresponding cadnano .json file in the same directory, which can be found here: https://github.com/jcalumba/oxdna_relax/blob/master/export.py .
My example .dna file can be found here:
https://github.com/jcalumba/oxdna_relax/blob/master/export.dna .

When I try to run this script with any given *.dna file, I get the following error:
C:\Users\jcalumba\scadnano\scadnano-python-package\scadnano\backup>python ../export.py export.dna
2
Traceback (most recent call last):
  File "../export.py", line 16, in <module>
    origami = design.from_cadnano_v2('.', name)
  File "C:\Users\jcalumba\scadnano\scadnano-python-package\scadnano\scadnano.py", line 2207, in from_cadnano_v2
    num_bases = len(cadnano_v2_design['vstrands'][0]['scaf'])
KeyError: 'vstrands'

To fix it, I tried to pass in an empty list of vstrands when I instantiated my DNADesign object, seen here:
design = sc.DNADesign(helices=[], strands=[], vstrands=[], grid=sc.square)

but got a keyword error:

  File "../export.py", line 15, in <module>
    design = main()
  File "../export.py", line 5, in main
    design = sc.DNADesign(helices=[], strands=[], vstrands=[], grid=sc.square)
TypeError: __init__() got an unexpected keyword argument 'vstrands'

_from_scadnano_json fails on the attached json

{
  "version": "0.3.0",
  "helices": [
    {"grid_position": [0, 0]},
    {"max_offset": 32, "grid_position": [0, 1]}
  ],
  "strands": [
    {
      "color": "#0066cc",
      "substrands": [
        {"helix": 0, "forward": true, "start": 0, "end": 32}
      ],
      "is_scaffold": true
    }
  ]
}

Exception raised: '>' not supported between instances of 'NoneType' and 'int'

allow Helix.position to specify x,y,z under "origin"

scadnano represents position like this:

{ "x": 0, "y": 0, "z": 0, "pitch": 0 , "roll": 0 , "yaw": 0}

but codenano represents them like this:

{ "origin": { "x": 0, "y": 0, "z": 0}, "pitch": 0 , "roll": 0 , "yaw": 0}

Although I prefer the former to keep the file format flat and readable (and scadnano will continue to write in that format), we should be able to read the latter.
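
Reading both layouts could be handled by flattening the codenano form on input. This is a minimal sketch that works on plain dictionaries; flatten_position is a hypothetical helper name, not an existing scadnano function.

```python
from typing import Any, Dict


def flatten_position(position: Dict[str, Any]) -> Dict[str, Any]:
    """Return the flat scadnano-style position, accepting x, y, z nested under "origin"."""
    # Coordinates nested under "origin" (codenano style) are lifted to the top level.
    origin = position.get("origin", {})
    rest = {key: value for key, value in position.items() if key != "origin"}
    return {**origin, **rest}


nested = {"origin": {"x": 0, "y": 0, "z": 0}, "pitch": 0, "roll": 0, "yaw": 0}
flat = {"x": 0, "y": 0, "z": 0, "pitch": 0, "roll": 0, "yaw": 0}
assert flatten_position(nested) == flat
assert flatten_position(flat) == flat  # the flat form passes through unchanged
```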

remove DNAOrigamiDesign

The web interface has no concept of a special origami design type. A DNADesign is implicitly an origami if at least one strand is a scaffold, and multiple strands can be scaffolds.

The Python package is inconsistent, because many Strand's can have the field is_scaffold set to true, but only one of them can be equal to DNAOrigamiDesign.scaffold.

It would be cleaner just to remove DNAOrigamiDesign. There could still be convenience methods for scaffold(s) such as assigning M13 to the first strand labeled as a scaffold.
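
The implicit definition of an origami, and the kind of convenience method mentioned above, might look like the following sketch. The Strand class here is a simplified stand-in with only the fields needed for illustration, and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Strand:
    # Simplified stand-in; the real Strand has many more fields.
    name: str
    is_scaffold: bool = False


def is_origami(strands: List[Strand]) -> bool:
    """A design is implicitly an origami if at least one strand is a scaffold."""
    return any(strand.is_scaffold for strand in strands)


def first_scaffold(strands: List[Strand]) -> Optional[Strand]:
    """Return the first strand labeled as a scaffold, or None if there is none."""
    return next((strand for strand in strands if strand.is_scaffold), None)


strands = [Strand("staple0"), Strand("scaf0", True), Strand("scaf1", True)]
print(is_origami(strands))           # prints True
print(first_scaffold(strands).name)  # prints scaf0
```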

rewrite origami_rectangle to use add_nick and add_*_crossover

The current version of origami_rectangle.create specifies each Strand by explicitly listing its Substrands. This is tedious and error-prone.

It is simpler to draw two long strands, one in each direction, on each Helix, and then use the methods add_nick, add_half_crossover, and add_full_crossover, just as one would do using cadnano to manually design the origami.

See examples/6-helix-bundle-honeycomb.py for an example.

cadnano_v2 import shifts helix positions

I imported the "squarenut.json" origami from here: https://www.dropbox.com/s/zsm3xlnyurnffd9/Nature09.zip?file_subpath=%2FNature09

According to the included squarenut.svg (in the file squarenut.zip), the helices should be positioned this way:

image

But importing it in scadnano, they appear this way (as though each is shifted to the right by one):

image

Is this a bug in the import? Or is it a mistake in the way I am implementing the honeycomb lattice in scadnano? (Documented here.)

I intended for it to interpret honeycomb coordinates exactly the same as cadnano, so if I got that wrong, I'll just switch it. But first, I wanted to check to see if it is a bug in the import.

I don't have a working cadnano installation, so it's difficult for me to test this.

Put another way, scadnano assumes that the helix at the origin (helix 21 in the two designs shown) has neighbors above it, below and to the right, and below and to the left, with empty space below it, above and to the right, and above and to the left. The cadnano design seems to invert this.

The file squarenut.zip has the cadnano .json file, the cadnano exported SVG file, and the scadnano .dna file created after importing in the web interface (which calls the Python scripting interface, which is why I posted it in the Python repo).

support Extensions on the end of a Strand

This gives a nice way to specify toeholds and extensions common in DNA strand displacement designs.

See also UC-Davis-molecular-computing/scadnano#34.

This is not supported currently in scadnano, and it will take a lot of effort to support it, since much of the logic pervading the code assumes the first and last substrands are Domain's. There's nothing requiring this in principle, but it will be a headache to change it. Also, some design decisions will have to be made along the way. 

For example, the default staple name for exporting sequences is the same as cadnano, where the staple is named after the (helix,offset) pairs of its 5' and 3' ends. We could do the same thing, where if the end of a strand is an Extension, we use the adjacent Domain's to name the staple. But little decisions like this will probably have to happen all over the place.

add Strand.reverse() and DNADesign.reverse_all() methods

scadnano can create designs not describable in cadnano, for example using loopouts or parallel crossovers.

One incompatible feature is that scaffolds can go reverse on even-numbered helices; in cadnano they always go forward on even-numbered helices and reverse on odd-numbered helices. But if this is the only incompatibility, then it can easily be fixed by reversing the polarity of all strands: reverse the direction of each Substrand, and reverse the order of the list of all Substrands. A method to do this to a whole DNADesign would simply do this to all strands.
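
The reversal described above can be sketched with a simplified Substrand stand-in. The names reverse_strand and reverse_all are hypothetical, and the sketch ignores DNA sequences, loopouts, and other fields a real implementation would need to handle.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Substrand:
    # Simplified stand-in for a substrand bound to a helix.
    helix: int
    forward: bool
    start: int
    end: int


def reverse_strand(substrands: List[Substrand]) -> List[Substrand]:
    """Flip the direction of each substrand and reverse the order of the list."""
    return [replace(s, forward=not s.forward) for s in reversed(substrands)]


def reverse_all(strands: List[List[Substrand]]) -> List[List[Substrand]]:
    """Apply reverse_strand to every strand in a design."""
    return [reverse_strand(strand) for strand in strands]


strand = [Substrand(0, True, 0, 8), Substrand(1, False, 0, 8)]
reversed_strand = reverse_strand(strand)
# Reversing twice restores the original strand.
assert reverse_strand(reversed_strand) == strand
```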

custom extension in write_scadnano_file

Currently the filename can be customized. If extension is specified instead (it should be mutually exclusive with filename), then keep the same name as the script, but use the custom extension instead of the default. (This is mostly important for write_scadnano_file, but could also be used, e.g., when writing DNA sequences.)
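
The intended behavior could be sketched as below. The helper output_filename and its parameters are hypothetical and only illustrate the rule; the default extension "dna" is assumed from the .dna files mentioned elsewhere in these issues.

```python
import os
from typing import Optional


def output_filename(script_path: str,
                    filename: Optional[str] = None,
                    extension: Optional[str] = None,
                    default_extension: str = "dna") -> str:
    """Pick the output file name from the script name, a custom filename, or a custom extension."""
    if filename is not None and extension is not None:
        raise ValueError("specify at most one of filename and extension")
    if filename is not None:
        return filename
    # Keep the script's base name and swap only the extension.
    stem = os.path.splitext(os.path.basename(script_path))[0]
    chosen = extension if extension is not None else default_extension
    return f"{stem}.{chosen.lstrip('.')}"


print(output_filename("examples/my_design.py"))                    # prints my_design.dna
print(output_filename("examples/my_design.py", extension="json"))  # prints my_design.json
```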

pip install raises FileNotFoundError for README.md

With scadnano already installed, attempting to upgrade raises an exception:

(base) PS C:\Users\pexat> pip install --upgrade scadnano
Collecting scadnano
  Downloading https://files.pythonhosted.org/packages/02/56/74995fd209b99b246b652b4065bda926ef2c01254a94e814364a4e6e0b09/scadnano-0.2.0.tar.gz (46kB)
     |████████████████████████████████| 51kB 234kB/s
    ERROR: Complete output from command python setup.py egg_info:
    ERROR: Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\Users\pexat\AppData\Local\Temp\pip-install-9u2no0ri\scadnano\setup.py", line 8, in <module>
        with open(path.join(this_directory, 'README.md'), encoding='utf-8') as f:
    FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\pexat\\AppData\\Local\\Temp\\pip-install-9u2no0ri\\scadnano\\README.md'
    ----------------------------------------
ERROR: Command "python setup.py egg_info" failed with error code 1 in C:\Users\pexat\AppData\Local\Temp\pip-install-9u2no0ri\scadnano\

It appears to upgrade okay, however.

is_scaffold bug?

In Strand.to_json_serializable, currently at scadnano.py line 1249.

I strongly believe that you want to check hasattr(self, is_scaffold) and not hasattr(self, is_scaffold_key) as it is currently.

support DNA modifications such as biotins/fluorophores/etc.

The easiest way to do this is simply to ignore IDT-style modifications such as /5Biosg/ACGT written into the DNA sequence.

For the long term, support having these modifications specified as part of the Strand object, which can be displayed in scadnano.
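
The short-term approach of ignoring modifications could be as simple as removing slash-delimited codes before the sequence is interpreted. This is a sketch only: strip_idt_modifications is a hypothetical helper, and it assumes every modification is written between a pair of slashes with no whitespace inside, as in /5Biosg/.

```python
import re

# Matches one slash-delimited modification code such as /5Biosg/.
_IDT_MODIFICATION = re.compile(r"/[^/\s]+/")


def strip_idt_modifications(sequence: str) -> str:
    """Remove IDT-style modification codes, leaving only the bases."""
    return _IDT_MODIFICATION.sub("", sequence)


print(strip_idt_modifications("/5Biosg/ACGT"))  # prints ACGT
```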

organize examples into user-facing official examples and miscellaneous examples for internal testing

Currently, there are a large number of designs in the examples folder, many not well-commented. I've put those there whenever I made up a new design, typically to test a new feature in the Python package or the web interface.

These should be broken into a "miscellaneous" folder that's like the current one, and an official "user-friendly" examples folder. The latter should contain a small number of well-commented examples intended to showcase the features of scadnano.

switch honeycomb coordinate system to match cadnano

Currently, the scadnano honeycomb coordinate system is a subset of the hex honeycomb system, so simply omits certain (x,y) coordinates.

cadnano uses a bijection between pairs of integers and coordinates. Switching to this would involve more conversion between hex and honeycomb coordinates, but would perhaps make it easier to think about honeycomb coordinates as being uniquely identified by rows and columns.

See UC-Davis-molecular-computing/scadnano#169

allow chained commands for less verbose way to add Strands to DNADesign

codenano allows "chained commands" for a less verbose way to create strands. For example see here: https://docs.rs/codenano/0.5.1/codenano/

design.strand(0, 0).to(31)
    .cross(1).to(10)
    .cross(2).to(21);
// Now its reverse complement:
design.strand(2, 21).to(10)
   .cross(1).to(31)
   .cross(0).to(0);

This is equivalent to the more verbose Python code:

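import scadnano as sc

# First strand: helix 0 forward, then helix 1 reverse, then helix 2 forward.
domain_11 = sc.Domain(0, True, 0, 31)
domain_12 = sc.Domain(1, False, 10, 31)
domain_13 = sc.Domain(2, True, 10, 21)
strand1 = sc.Strand([domain_11, domain_12, domain_13])
# Second strand: its reverse complement, going from helix 2 back to helix 0.
domain_21 = sc.Domain(2, False, 10, 21)
domain_22 = sc.Domain(1, True, 10, 31)
domain_23 = sc.Domain(0, False, 0, 31)
strand2 = sc.Strand([domain_21, domain_22, domain_23])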

or the slightly less verbose

strand1 = sc.Strand([
    sc.Domain(0, True, 0, 31),
    sc.Domain(1, False, 10, 31),
    sc.Domain(2, True, 10, 21),
])
strand2 = sc.Strand([
    sc.Domain(2, False, 10, 21),
    sc.Domain(1, True, 10, 31),
    sc.Domain(0, False, 0, 31),
])

Note that this requires crossovers to be "vertical", i.e., they have the same offset on the from Helix and the to Helix. Perhaps that can be overridden with an optional second parameter to cross. But since the most common crossover is vertical, the less verbose method is superior, since it reduces the amount of redundant information that needs to be specified.
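
A Python version of the chained style might be sketched as follows. StrandBuilder is hypothetical and not part of the scadnano API; to keep the sketch self-contained it records each domain as a plain tuple (helix, forward, start, end) with the same numbers as the Domain calls above, and it supports only vertical crossovers.

```python
from typing import List, Tuple

DomainTuple = Tuple[int, bool, int, int]  # (helix, forward, start, end)


class StrandBuilder:
    """Hypothetical chained builder that collects domains for one strand."""

    def __init__(self, helix: int, offset: int) -> None:
        self.helix = helix
        self.offset = offset
        self.domains: List[DomainTuple] = []

    def to(self, offset: int) -> "StrandBuilder":
        """Extend along the current helix to the given offset, inferring the direction."""
        if offset == self.offset:
            raise ValueError("a domain must span more than one offset")
        forward = offset > self.offset
        start, end = min(self.offset, offset), max(self.offset, offset)
        self.domains.append((self.helix, forward, start, end))
        self.offset = offset
        return self

    def cross(self, helix: int) -> "StrandBuilder":
        """Vertical crossover: move to another helix at the same offset."""
        self.helix = helix
        return self


strand1 = StrandBuilder(0, 0).to(31).cross(1).to(10).cross(2).to(21).domains
strand2 = StrandBuilder(2, 21).to(10).cross(1).to(31).cross(0).to(0).domains
print(strand1)  # prints [(0, True, 0, 31), (1, False, 10, 31), (2, True, 10, 21)]
```

Returning self from each method is what allows the calls to be chained; the direction of each domain is inferred from whether the target offset is larger or smaller than the current one.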

cadnano_v2 import of squarenut design adds extra strands

In this paper, SI Figure S5 shows a "squarenut design". It is available as a cadnano file here.

I put together a zip file squarenut.zip with three files: the cadnano squarenut.json file, the imported scadnano squarenut.dna file, and the SVG image squarenut.svg showing how the design should appear in cadnano.

This is what the first two helices look like in the SVG file (but there are similar problems in every helix)

image

This is how they appear in scadnano:

image

As you can see, there are extra staple strands on the left and right ends of the two helices (in red in scadnano). Each appears to be confined to a single helix, but has long-range horizontal crossovers from one side of the helix to the other.

In the file squarenut.dna, here are the two extra strands on helix 0:

{
  "color": "#cc0000",
  "substrands": [
    {"helix": 0, "forward": false, "start": 9, "end": 35},
    {"helix": 0, "forward": false, "start": 133, "end": 135}
  ]
}

and

{
  "color": "#cc0000",
  "substrands": [
    {"helix": 0, "forward": false, "start": 112, "end": 133}
  ]
},

make ColorCycler part of DNADesign

Ensure that it starts on the first color for every newly created DNADesign.

This will ensure a consistent cycling of colors, instead of being dependent on the global ColorCycler variable.
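
The idea can be sketched with a per-design iterator in place of a global one. The Design class and palette below are simplified, hypothetical stand-ins for DNADesign and ColorCycler; the third palette color is purely illustrative.

```python
from itertools import cycle
from typing import List

# Illustrative palette; the real ColorCycler has its own list of colors.
PALETTE = ["#cc0000", "#0066cc", "#32b86c"]


class Design:
    """Stand-in for DNADesign that owns its own color cycle."""

    def __init__(self) -> None:
        # Each new design starts its own cycle at the first color.
        self._colors = cycle(PALETTE)
        self.strand_colors: List[str] = []

    def add_strand(self) -> str:
        """Assign the next color in this design's cycle to a new strand."""
        color = next(self._colors)
        self.strand_colors.append(color)
        return color


first = Design()
first.add_strand()
first.add_strand()
second = Design()
# The second design starts from the first color, regardless of the first design.
assert second.add_strand() == PALETTE[0]
```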

put version in only one place

I tried to make a file scadnano_version, that various other files such as scadnano.py, setup.py, conf.py, could import to see the version.

No matter how I did it, some code would fail to import it properly, whether in CI unit testing, building the docs, or packaging for distribution in PyPI.

Figure out a way to write the current version in exactly one file. Maybe this is as simple as making it a text file that is read, rather than imported, since the Python import rules are so Byzantine.
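
Reading the version from a text file, as suggested above, might look like this sketch. The file name VERSION.txt and the function read_version are assumptions for illustration; a temporary file stands in for the real one so the example runs anywhere.

```python
import os
import tempfile


def read_version(path: str) -> str:
    """Return the version string stored in a plain-text file, without importing anything."""
    with open(path, encoding="utf-8") as f:
        return f.read().strip()


# Demonstration with a temporary file standing in for the real version file.
with tempfile.TemporaryDirectory() as tmp:
    version_path = os.path.join(tmp, "VERSION.txt")
    with open(version_path, "w", encoding="utf-8") as f:
        f.write("0.3.0\n")
    version = read_version(version_path)

print(version)  # prints 0.3.0
```

Because the file is opened by path rather than imported, scadnano.py, setup.py, and conf.py could each call such a function with a path relative to their own location.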

save parts of JSON not used by scadnano

For each part of the design that is read from a JSON file, store all the fields that are not used by scadnano, and write them back out on serialization. This will allow scadnano to edit designs created by other programs that use fields scadnano doesn't use.

This is essentially the same as this feature in the web interface:

UC-Davis-molecular-computing/scadnano#6
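
A minimal sketch of the round trip on plain dictionaries follows. The helper names and the set of known keys are hypothetical; the key "custom_tool_data" stands in for any field written by another program.

```python
from typing import Any, Dict, Set, Tuple


def split_fields(json_map: Dict[str, Any], known_keys: Set[str]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Separate the fields scadnano uses from those it should merely preserve."""
    known = {k: v for k, v in json_map.items() if k in known_keys}
    unused = {k: v for k, v in json_map.items() if k not in known_keys}
    return known, unused


def merge_fields(known: Dict[str, Any], unused: Dict[str, Any]) -> Dict[str, Any]:
    """Write the preserved fields back out alongside the fields scadnano uses."""
    return {**known, **unused}


strand_json = {"color": "#0066cc", "is_scaffold": True, "custom_tool_data": {"id": 7}}
known, unused = split_fields(strand_json, {"color", "is_scaffold", "substrands"})
# Nothing is lost when the design is serialized again.
assert merge_fields(known, unused) == strand_json
```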

allow "domains" on each BoundSubstrand and Loopout

scadnano (and cadnano) logically break up each Strand into Substrand's based on which Helix they are bound in. It is common to furthermore break up each of these into logical "domains" based on which other Strand is bound.

However, some schemes such as the Wang/Thachuk/Soloveichik "leakless circuits" have several consecutive domains between the same Strand's on the same Helix. Thus, we would not want to enforce that when a BoundSubstrand switches from one Domain to another, this necessarily occurs at a point where the other Strand switches identity.

If Set<Domain> domains is present as a top-level field in DNADesign, then Strand.dna_sequence should be null for any Strand that has Substrand's with domains. A domain has two fields, String name and int length. For each Substrand with a List<Domain> domains field, the sum of the lengths must equal Substrand.dna_length(). Any domain with name name is considered complementary to any domain with name name + '*'.
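
The two rules in the last paragraph (lengths must sum to the substrand length, and a trailing '*' marks the complement) can be sketched as follows. NamedDomain and both functions are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class NamedDomain:
    # A logical domain as described above: a name and a length in bases.
    name: str
    length: int


def domains_match_length(domains: List[NamedDomain], substrand_length: int) -> bool:
    """Check that the domain lengths sum to the length of the substrand."""
    return sum(domain.length for domain in domains) == substrand_length


def are_complementary(name_a: str, name_b: str) -> bool:
    """Two domain names are complementary if one is the other followed by '*'."""
    return name_a == name_b + "*" or name_b == name_a + "*"


domains = [NamedDomain("a", 5), NamedDomain("b", 11)]
print(domains_match_length(domains, 16))  # prints True
print(are_complementary("a", "a*"))       # prints True
```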
