uc-davis-molecular-computing / scadnano-python-package
Python scripting library for generating designs readable by scadnano.
Home Page: https://scadnano.org
License: MIT License
The PyPI page for scadnano shows the markdown in README.md but does not format it properly. See if there is a way to specify something that is formatted properly.
Importing this cadnano design in the web interface, and then exporting it to cadnano, fails with the error
Error exporting file: '>' not supported between instances of 'NoneType' and 'int'
I haven't tried it yet from the command line with the Python library directly.
cadnano file attached as zip.
Allow helix indices to be non-consecutive.
This implies changing the type of DNADesign.helices from List<Helix> to Map<int, Helix>.
This is necessary to allow cadnano designs to be imported while preserving helix indices, since cadnano allows non-consecutive helix indices.
Add a Boolean field Strand.circular and draw a crossover from the last substrand to the first. Any equivalent cyclic permutation of the substrands should display in the same way.
Even if one doesn't wish the final design to have circular strands, this will help to avoid the problem that sometimes an intermediate design temporarily creates a circular Strand, even though subsequent edits make it linear again. Currently such designs are simply not allowed.
Currently the only option is to give a custom sequence, or use rotation 5588 of the standard "p7249" M13 sequence.
In addition to p7249, there are also p7560 and p8064 (https://www.tilibit.com/collections/scaffold-dna). Furthermore, the user should be able to specify the rotation (default 5588 for M13).
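A rotation of a circular scaffold is just a cyclic shift of its sequence, so supporting a user-specified rotation could be quite small. The sketch below is only an illustration: the function name rotate_sequence is hypothetical, and the convention that rotation k makes index k the new first base is an assumption, not something stated here.

```python
def rotate_sequence(seq: str, rotation: int) -> str:
    """Cyclically shift seq so that the base at index `rotation` comes first.

    Assumed convention (not confirmed by the library): rotation counts
    bases from the start of the unrotated sequence.
    """
    if not seq:
        return seq
    k = rotation % len(seq)  # wrap around for circular sequences
    return seq[k:] + seq[:k]


# Toy sequence rather than a real scaffold.
print(rotate_sequence("ACGTT", 2))
```

The same function would apply unchanged to p7249, p7560, or p8064 once their sequences are available.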
Currently DNA modifications can only be displayed as a string. Support a custom image to be specified, e.g., a star for a fluorophore.
Related to UC-Davis-molecular-computing/scadnano#226
As in codenano: https://docs.rs/codenano/0.5.1/codenano/
Figure out how to configure Sphinx to re-arrange order of documentation in API.
Currently the API documentation puts things in the same order as the source code, which itself is not easy to change because of some dependencies of later type declarations on previous type declarations.
Also, see if there's a way to list the classes (Strand, Domain, etc.) on the side.
Also with_sequence, but with_domain_sequence can do a partial assignment of a DNA sequence.
Fix Helix.__post_init__() to only check major_ticks against max_offset and min_offset if they are all non-None.
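A rough sketch of the guarded check, using a simplified stand-in for Helix (the real class has more fields, and treating both bounds as inclusive is an assumption made here). In Python 3, comparing None with an int using > or < raises a TypeError, so the guard has to come before any comparison.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Helix:
    # Simplified stand-in for illustration; not the library's actual class.
    min_offset: Optional[int] = None
    max_offset: Optional[int] = None
    major_ticks: Optional[List[int]] = None

    def __post_init__(self) -> None:
        # Only validate when all three values are known.
        if (self.major_ticks is None
                or self.min_offset is None
                or self.max_offset is None):
            return
        for tick in self.major_ticks:
            # Assumption: both bounds are inclusive.
            if not (self.min_offset <= tick <= self.max_offset):
                raise ValueError(
                    f"major tick {tick} is outside "
                    f"[{self.min_offset}, {self.max_offset}]")
```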
The API documentation for the scadnano Python scripting package is fairly extensive, but we need a simple tutorial that walks someone through the concepts one by one in a reasonable order, showing them how to write scripts to generate anything from simple designs up to a full-sized origami or non-origami system (e.g., tile-based design).
Currently the Strand.color field can be a hex string (e.g., "#ffaa12") or a map (e.g., {"r": 123, "g": 456, "b": 789}).
codenano also uses a decimal integer, e.g., 123456, interpreted as a 24-bit number encoding the RGB values 8 bits at a time. Accept this sort of color specification as well.
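Decoding such an integer is a matter of shifting and masking. The sketch below assumes red occupies the most significant byte, which is the usual convention but is not confirmed here for codenano; the function name is hypothetical.

```python
def color_int_to_hex(value: int) -> str:
    """Convert a 24-bit integer color to a '#rrggbb' hex string.

    Assumption: red is the most significant byte, blue the least.
    """
    if not 0 <= value <= 0xFFFFFF:
        raise ValueError(f"{value} is not a 24-bit color")
    r = (value >> 16) & 0xFF
    g = (value >> 8) & 0xFF
    b = value & 0xFF
    return f"#{r:02x}{g:02x}{b:02x}"


print(color_int_to_hex(123456))  # prints #01e240
```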
I've got a simple scadnano script to load a .dna file and write the corresponding cadnano .json file in the same directory, which can be found here: https://github.com/jcalumba/oxdna_relax/blob/master/export.py .
My example .dna file can be found here:
https://github.com/jcalumba/oxdna_relax/blob/master/export.dna .
When I try to run this script with any given *.dna file, I get the following error:
C:\Users\jcalumba\scadnano\scadnano-python-package\scadnano\backup>python ../export.py export.dna
2
Traceback (most recent call last):
File "../export.py", line 16, in
origami = design.from_cadnano_v2('.', name)
File "C:\Users\jcalumba\scadnano\scadnano-python-package\scadnano\scadnano.py", line 2207, in from_cadnano_v2
num_bases = len(cadnano_v2_design['vstrands'][0]['scaf'])
KeyError: 'vstrands'
To fix it, I tried to pass in an empty list of vstrands when I instantiated my DNADesign object, seen here:
design = sc.DNADesign(helices=[], strands=[], vstrands=[], grid=sc.square)
but got a keyword error:
File "../export.py", line 15, in
design = main()
File "../export.py", line 5, in main
design = sc.DNADesign(helices=[], strands=[], vstrands=[], grid=sc.square)
TypeError: __init__() got an unexpected keyword argument 'vstrands'
{ "version": "0.3.0", "helices": [ {"grid_position": [0, 0]}, {"max_offset": 32, "grid_position": [0, 1]} ], "strands": [ { "color": "#0066cc", "substrands": [ {"helix": 0, "forward": true, "start": 0, "end": 32} ], "is_scaffold": true } ] }
Exception raised: '>' not supported between instances of 'NoneType' and 'int'
scadnano represents position like this:
{ "x": 0, "y": 0, "z": 0, "pitch": 0 , "roll": 0 , "yaw": 0}
but codenano represents them like this:
{ "origin": { "x": 0, "y": 0, "z": 0}, "pitch": 0 , "roll": 0 , "yaw": 0}
Although I prefer the former to keep the file format flat and readable (and scadnano will continue to write in that format), we should be able to read the latter.
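Reading both layouts could be done by flattening the nested form on input. This is a minimal sketch on plain dicts; the function name normalize_position is hypothetical, and it assumes "origin", when present, holds only the x, y, z keys shown above.

```python
from typing import Any, Dict


def normalize_position(pos: Dict[str, Any]) -> Dict[str, Any]:
    """Return the flat form {x, y, z, pitch, roll, yaw}.

    Accepts either the flat layout or the nested layout with an
    "origin" sub-map. The input dict is not modified.
    """
    flat = dict(pos)
    origin = flat.pop("origin", None)
    if origin is not None:
        flat.update(origin)  # hoist x, y, z to the top level
    return flat


nested = {"origin": {"x": 1, "y": 2, "z": 3}, "pitch": 0, "roll": 0, "yaw": 0}
print(normalize_position(nested))
```

Writing would stay unchanged, since only the reader needs to know about the nested form.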
These look "nice"
kelly_colors = ['F2F3F4', '222222', 'F3C300', '875692', 'F38400', 'A1CAF1', 'BE0032', 'C2B280', '848482', '008856', 'E68FAC', '0067A5', 'F99379', '604E97', 'F6A600', 'B3446C', 'DCD300', '882D17', '8DB600', '654522', 'E25822', '2B3D26']
See https://medium.com/@rjurney/kellys-22-colours-of-maximum-contrast-58edb70c90d1
codenano has the same interpretation of y, but x and z are swapped.
Let's swap them. Then the main view shows x-y coordinates, and the side view shows z-y coordinates.
Related to UC-Davis-molecular-computing/scadnano#307
First we need to close issue #86.
The web interface has no concept of a special origami design type. A DNADesign is implicitly an origami if at least one strand is a scaffold, and multiple strands can be scaffolds.
The Python package is inconsistent, because many Strand's can have the field is_scaffold set to true, but only one of them can be equal to DNAOrigamiDesign.scaffold.
It would be cleaner just to remove DNAOrigamiDesign. There could still be convenience methods for scaffold(s) such as assigning M13 to the first strand labeled as a scaffold.
This isn't a big deal, but Design is a bit easier to type, and "DNA" is a bit redundant since this is all about DNA (and is not the same as the DNA sequence assigned to a strand.)
We need to be able to load a .dna file into the objects of the library.
Example usecase: would make my life easier when debugging the .dna <-> cadnano formats.
The current version of origami_rectangle.create specifies each Strand by explicitly listing its Substrands. This is tedious and error-prone.
It is simpler to draw two long strands, one in each direction, on each Helix, and then use the methods add_nick, add_half_crossover, and add_full_crossover, just as one would do using cadnano to manually design the origami.
See examples/6-helix-bundle-honeycomb.py for an example.
It should have the same fields and interpretation as Parameters in codenano:
https://docs.rs/codenano/0.5.1/codenano/struct.Parameters.html
I imported the "squarenut.json" origami from here: https://www.dropbox.com/s/zsm3xlnyurnffd9/Nature09.zip?file_subpath=%2FNature09
According to the included squarenut.svg (in the file squarenut.zip), the helices should be positioned this way:
But importing it in scadnano, they appear this way (as though each is shifted to the right by one):
Is this a bug in the import? Or is it a mistake in the way I am implementing the honeycomb lattice in scadnano? (Documented here.)
I intended for it to interpret honeycomb coordinates exactly the same as cadnano, so if I got that wrong, I'll just switch it. But first, I wanted to check to see if it is a bug in the import.
I don't have a working cadnano installation, so it's difficult for me to test this.
Put another way, scadnano assumes that the helix at the origin (helix 21 in the two designs shown) has neighbors above it, below and to the right, and below and to the left, with empty space below it, above and to the right, and above and to the left. The cadnano design seems to invert this.
The file squarenut.zip has the cadnano .json file, the cadnano exported SVG file, and the scadnano .dna file created after importing in the web interface (which calls the Python scripting interface, which is why I posted it in the Python repo).
This gives a nice way to specify toeholds and extensions common in DNA strand displacement designs.
See also UC-Davis-molecular-computing/scadnano#34.
This is not supported currently in scadnano, and it will take a lot of effort to support it, since much of the logic pervading the code assumes the first and last substrands are Domain's. There's nothing requiring this in principle, but it will be a headache to change it. Also, some design decisions will have to be made along the way.
For example, the default staple name for exporting sequences is the same as cadnano, where the staple is named after the (helix,offset) pairs of its 5' and 3' ends. We could do the same thing, where if the end of a strand is an Extension, we use the adjacent Domain's to name the staple. But little decisions like this will probably have to happen all over the place.
These were just added to the scadnano web interface to make it easier to edit tick marks manually. But they also give a shorter way to store common periodic tick marks in the .dna file, so they should also be implemented in the Python package as well.
scadnano can create designs not describable in cadnano, for example using loopouts or parallel crossovers.
One incompatible feature is that scaffolds can go reverse on even-numbered helices; in cadnano they always go forward on even-numbered helices and reverse on odd-numbered helices. But if this is the only incompatibility, then it can easily be fixed by reversing the polarity of all strands: reverse the direction of each Substrand, and reverse the order of the list of all Substrands. A method to do this to a whole DNADesign would simply do this to all strands.
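The polarity reversal described here can be sketched on the plain JSON-like structure of a .dna file. The function names are hypothetical, and the sketch assumes every substrand is helix-bound with a "forward" key (no loopouts) and that no DNA sequence has been assigned, since a sequence would need separate handling.

```python
from typing import Any, Dict


def reverse_strand(strand: Dict[str, Any]) -> Dict[str, Any]:
    """Flip the direction of each substrand and reverse their order."""
    flipped = [
        {**substrand, "forward": not substrand["forward"]}
        for substrand in reversed(strand["substrands"])
    ]
    return {**strand, "substrands": flipped}


def reverse_all_strands(design: Dict[str, Any]) -> Dict[str, Any]:
    """Apply reverse_strand to every strand; the input is left untouched."""
    return {**design,
            "strands": [reverse_strand(s) for s in design["strands"]]}
```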
Currently if the number of helices is h, then DNADesign.helices_view_order is a permutation of the list [0,1,...,h-1].
It would be more natural if it is a permutation of the set of helix indices.
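With non-consecutive helix indices, the validity check becomes "same elements as the set of indices, with no repeats". A minimal sketch, with a hypothetical function name:

```python
from typing import Iterable, List


def is_valid_view_order(view_order: List[int],
                        helix_indices: Iterable[int]) -> bool:
    """True if view_order is a permutation of the given helix indices."""
    indices = list(helix_indices)
    no_repeats = len(view_order) == len(set(view_order))
    return (no_repeats
            and len(view_order) == len(indices)
            and set(view_order) == set(indices))


# Helices indexed 2, 5, 7 rather than 0, 1, 2.
print(is_valid_view_order([7, 2, 5], [2, 5, 7]))
```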
Currently the filename can be customized. If extension is specified instead (it should be mutually exclusive with filename), then keep the same name as the script, but use the custom extension instead of the default. (Mostly important for write_scadnano_file, but it could also be used, e.g., when writing DNA sequences.)
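One possible shape for the filename/extension logic is sketched below. The helper name output_filename and its parameters are hypothetical, and "dna" is used as the default extension only because that is the extension mentioned elsewhere in these issues.

```python
from pathlib import Path
from typing import Optional


def output_filename(script_path: str,
                    filename: Optional[str] = None,
                    extension: Optional[str] = None,
                    default_extension: str = "dna") -> str:
    """Pick the output file name from the script name or explicit arguments."""
    if filename is not None and extension is not None:
        raise ValueError("specify at most one of filename and extension")
    if filename is not None:
        return filename
    ext = extension if extension is not None else default_extension
    # Keep the script's base name; accept "json" or ".json".
    return Path(script_path).stem + "." + ext.lstrip(".")


print(output_filename("designs/my_design.py", extension="json"))
```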
With scadnano already installed, attempting to upgrade raises an exception:
(base) PS C:\Users\pexat> pip install --upgrade scadnano
Collecting scadnano
Downloading https://files.pythonhosted.org/packages/02/56/74995fd209b99b246b652b4065bda926ef2c01254a94e814364a4e6e0b09/scadnano-0.2.0.tar.gz (46kB)
|████████████████████████████████| 51kB 234kB/s
ERROR: Complete output from command python setup.py egg_info:
ERROR: Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Users\pexat\AppData\Local\Temp\pip-install-9u2no0ri\scadnano\setup.py", line 8, in <module>
with open(path.join(this_directory, 'README.md'), encoding='utf-8') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\pexat\\AppData\\Local\\Temp\\pip-install-9u2no0ri\\scadnano\\README.md'
----------------------------------------
ERROR: Command "python setup.py egg_info" failed with error code 1 in C:\Users\pexat\AppData\Local\Temp\pip-install-9u2no0ri\scadnano\
It appears to upgrade okay, however.
In Strand.to_json_serializable, currently at scadnano.py line 1249.
I strongly believe that you want to check hasattr(self, is_scaffold) and not hasattr(self, is_scaffold_key) as it is currently.
The easiest way to do this is simply to ignore IDT-style modifications such as /5Biosg/ACGT written into the DNA sequence.
For the long term, support having these modifications specified as part of the Strand object, which can be displayed in scadnano.
This is similar to UC-Davis-molecular-computing/scadnano#145 in the web interface, but for the scripting package.
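For the short-term approach of ignoring modifications, a regular expression that drops slash-delimited codes may be enough. This sketch assumes every modification is written between a pair of forward slashes, as in /5Biosg/, and that slashes never appear otherwise in a sequence; the function name is hypothetical.

```python
import re

# Assumption: a modification is any run of non-slash characters
# enclosed in a pair of forward slashes.
_MODIFICATION = re.compile(r"/[^/]*/")


def strip_idt_modifications(sequence: str) -> str:
    """Remove slash-delimited modification codes, keeping only the bases."""
    return _MODIFICATION.sub("", sequence)


print(strip_idt_modifications("/5Biosg/ACGT"))  # prints ACGT
```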
Currently, there are a large number of designs in the examples folder, many not well-commented. I've put those there whenever I made up a new design, typically to test a new feature in the Python package or the web interface.
These should be broken into a "miscellaneous" folder that's like the current one, and an official "user-friendly" examples folder. The latter should contain a small number of well-commented examples intended to showcase the features of scadnano.
Currently, the scadnano honeycomb coordinate system is a subset of the hex honeycomb system, so it simply omits certain (x,y) coordinates.
cadnano uses a bijection between pairs of integers and coordinates. Switching to this would involve more conversion between hex and honeycomb coordinates, but would perhaps make it easy to think about honeycomb coordinates as being uniquely identified by rows and columns.
codenano allows "chained commands" for a less verbose way to create strands. For example see here: https://docs.rs/codenano/0.5.1/codenano/
design.strand(0, 0).to(31)
.cross(1).to(10)
.cross(2).to(21);
// Now its reverse complement:
design.strand(2, 21).to(10)
.cross(1).to(31)
.cross(0).to(0);
This is equivalent to the more verbose Python code:
domain_11 = sc.Domain(0, True, 0, 31)
domain_12 = sc.Domain(1, False, 10, 31)
domain_13 = sc.Domain(2, True, 10, 21)
strand1 = sc.Strand([domain_11, domain_12, domain_13])
domain_21 = sc.Domain(2, False, 10, 21)
domain_22 = sc.Domain(1, True, 10, 31)
domain_23 = sc.Domain(0, False, 0, 31)
strand2 = sc.Strand([domain_21, domain_22, domain_23])
or the slightly less verbose
strand1 = sc.Strand([
sc.Domain(0, True, 0, 31),
sc.Domain(1, False, 10, 31),
sc.Domain(2, True, 10, 21),
])
strand2 = sc.Strand([
sc.Domain(2, False, 10, 21),
sc.Domain(1, True, 10, 31),
sc.Domain(0, False, 0, 31),
])
Note that this requires crossovers to be "vertical", i.e., they have the same offset on the from Helix and the to Helix. Perhaps that can be overridden with an optional second parameter to cross. But since the most common crossover is vertical, the less verbose method is superior, since it reduces the amount of redundant information that needs to be specified.
Currently it is expected that cross/loopout and to alternate. Allow two consecutive to's, and add a new function (like to) where two consecutive calls do not make two domains, but merely change the current offset.
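To make the chained style concrete, here is a toy builder in plain Python. It is only a sketch: StrandBuilder is a hypothetical class, not part of scadnano, and it records domains as (helix, forward, start, end) tuples that mirror the argument order of the sc.Domain calls above. Direction is inferred from whether to moves to a larger or smaller offset, and cross keeps the current offset, i.e., it is a "vertical" crossover.

```python
from typing import List, Tuple

DomainTuple = Tuple[int, bool, int, int]  # (helix, forward, start, end)


class StrandBuilder:
    """Toy chained builder; hypothetical, for illustration only."""

    def __init__(self, helix: int, offset: int) -> None:
        self.helix = helix
        self.offset = offset
        self.domains: List[DomainTuple] = []

    def to(self, offset: int) -> "StrandBuilder":
        """Add a domain from the current offset to `offset` on this helix."""
        if offset == self.offset:
            raise ValueError("a domain must have nonzero length")
        forward = offset > self.offset
        start, end = min(self.offset, offset), max(self.offset, offset)
        self.domains.append((self.helix, forward, start, end))
        self.offset = offset
        return self

    def cross(self, helix: int) -> "StrandBuilder":
        """Move to another helix at the same offset (vertical crossover)."""
        self.helix = helix
        return self


strand1 = StrandBuilder(0, 0).to(31).cross(1).to(10).cross(2).to(21)
strand2 = StrandBuilder(2, 21).to(10).cross(1).to(31).cross(0).to(0)
print(strand1.domains)
print(strand2.domains)
```

In this sketch two consecutive to calls already produce two domains on the same helix; a separate method that only moves the current offset would cover the other case mentioned above.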
In this paper, SI Figure S5 shows a "squarenut design". It is available as a cadnano file here.
I put together a zip file squarenut.zip with three files: the cadnano squarenut.json file, the imported scadnano squarenut.dna file, and the SVG image squarenut.svg showing how the design should appear in cadnano.
This is what the first two helices look like in the SVG file (but there are similar problems in every helix)
This is how they appear in scadnano:
As you can see, there are extra staple strands on the left and right ends of the two helices (in red in scadnano). Each appears to be confined to a single helix, but has long-range horizontal crossovers from one side of the helix to the other.
In the file squarenut.dna, here are the two extra strands on helix 0:
{
"color": "#cc0000",
"substrands": [
{"helix": 0, "forward": false, "start": 9, "end": 35},
{"helix": 0, "forward": false, "start": 133, "end": 135}
]
}
and
{
"color": "#cc0000",
"substrands": [
{"helix": 0, "forward": false, "start": 112, "end": 133}
]
},
Ensure that it starts on the first color for every newly created DNADesign.
This will ensure a consistent cycling of colors, instead of being dependent on the global ColorCycler variable.
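One way to get per-design cycling is for each design object to own its own iterator over the palette. The sketch below uses a stand-in Design class and a placeholder three-color palette; neither reflects the library's actual ColorCycler.

```python
from itertools import cycle

# Placeholder palette for illustration only.
PALETTE = ["#cc0000", "#0066cc", "#32b86c"]


class Design:
    """Stand-in for DNADesign that owns its own color iterator."""

    def __init__(self) -> None:
        # A fresh cycle per design, so every design starts at PALETTE[0].
        self._colors = cycle(PALETTE)

    def next_color(self) -> str:
        return next(self._colors)


first = Design()
first.next_color()
first.next_color()
second = Design()
print(second.next_color())  # starts over at the first color
```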
Currently these are missing on most. Many functions/methods do discuss their parameters and return values. For these it may be most appropriate to move that discussion into :param: and :return:.
This is supported with a button in the web interface:
UC-Davis-molecular-computing/scadnano#289
It should also be supported with a method on DNADesign in the python interface, where it takes an optional iterable of crossovers and an optional iterable of helices (similar to how the user in the web interface can select some helices and some crossovers).
I tried to make a file scadnano_version, that various other files such as scadnano.py, setup.py, conf.py, could import to see the version.
No matter how I did it, some code would fail to import it properly, whether in CI unit testing, building the docs, or packaging for distribution in PyPI.
Figure out a way to write the current version in exactly one file. Maybe this is as simple as making it a text file that is read, rather than imported, since the Python import rules are so Byzantine.
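The text-file idea could look roughly like this. The function names are hypothetical; the point is only that scadnano.py, setup.py, and conf.py would each read the same file rather than import a module.

```python
from pathlib import Path
from typing import Union


def parse_version(text: str) -> str:
    """Strip surrounding whitespace and newlines from the file contents."""
    version = text.strip()
    if not version:
        raise ValueError("version file is empty")
    return version


def read_version(path: Union[str, Path]) -> str:
    """Read the version string from a plain-text file."""
    return parse_version(Path(path).read_text(encoding="utf-8"))
```

A caller would typically locate the file relative to its own location, e.g. with Path(__file__).parent, so the lookup does not depend on the current working directory.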
Use Helix.min_offset instead to specify that a helix starts at a different offset than 0.
Each part of the design, when it is read from a JSON file, should store all the fields that are not used by scadnano, and write them back out on serialization. This will allow scadnano to edit designs created by other programs that use fields scadnano doesn't use.
This is essentially the same as this feature in the web interface:
scadnano (and cadnano) logically break up each Strand into Substrand's based on which Helix they are bound in. It is common to furthermore break up each of these into logical "domains" based on which other Strand is bound.
However, some schemes such as the Wang/Thachuk/Soloveichik "leakless circuits" have several consecutive domains between the same Strand's on the same Helix. Thus, we would not want to enforce that when a BoundSubstrand switches from one Domain to another, this necessarily occurs at a point where the other Strand switches identity.
If Set<Domain> domains is present as a top-level field in DNADesign, then Strand.dna_sequence should be null for any Strand that has Substrand's with domains. A domain has two fields, String name and int length. For each Substrand with a List<Domain> domains field, the sum of the lengths must equal Substrand.dna_length(). Any domain with name name is considered complementary to any domain with name name + '*'.
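The two rules above (lengths must sum to the substrand length, and a trailing '*' marks the complement) can be sketched with plain tuples. All names below are hypothetical, and the example domain names and lengths are made up for illustration.

```python
from typing import List, Tuple

Domain = Tuple[str, int]  # (name, length)


def domains_fit(domains: List[Domain], substrand_length: int) -> bool:
    """True if the domain lengths add up to the substrand's length."""
    return sum(length for _, length in domains) == substrand_length


def complement_name(name: str) -> str:
    """Name of the complementary domain: add or remove a trailing '*'."""
    return name[:-1] if name.endswith("*") else name + "*"


def are_complementary(a: str, b: str) -> bool:
    return complement_name(a) == b


# Made-up example: a 32-base substrand split into a toehold and a body.
print(domains_fit([("t", 7), ("x", 25)], 32))
print(are_complementary("t", "t*"))
```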