
mixes's People

Contributors

cgevans, damienwoods, dave-doty


Forkers

scolobb

mixes's Issues

remove buffer mixline when buffer volume is 0

Currently, the table shows a line for buffer/water even if the volume is 0:

(screenshot omitted)

Remove the line in the table when the buffer volume is 0, unless a parameter buffer_line_if_absent is True.
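A sketch of the proposed behavior (the row representation and the `buffer_line_if_absent` parameter follow the issue text, but the function itself is hypothetical):

```python
def filter_buffer_line(rows, buffer_line_if_absent: bool = False):
    """Drop the buffer/water line from a mix table when its volume is zero.

    rows: list of (component_name, volume_ul) tuples; a hypothetical
    simplification of the real table structure.
    """
    def keep(row):
        name, volume = row
        if name.lower() in ("buffer", "water") and volume == 0:
            return buffer_line_if_absent  # drop unless explicitly requested
        return True

    return [row for row in rows if keep(row)]
```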

track volumes across mixes within an experiment, and across experiments (old 53)

(from @dave-doty)

Track volumes across mixes in one experiment

It would be useful to track volumes across different mixes to ensure that sufficient volumes are available.

First, when using an intermediate mix m1 in downstream mixes, we currently have fairly primitive error-handling. For example, if the volume of m1 is 100 uL, then an error is issued if more than 100 uL of m1 is required in a downstream mix m2. However, suppose m1 is required in two downstream mixes m2 and m3, each requiring 60 uL of m1. No error is given because 60 < 100, even though a total of 120 uL of m1 is required for both m2 and m3.

This could be handled by a new class Experiment, which can track several related mixes used in a single experiment and issue an error if the total volumes don't add up.
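The described check could be sketched as follows (the class shape, attribute names, and method signature are hypothetical simplifications, not the actual alhambra_mixes API):

```python
class Experiment:
    """Sketch: track how much of each intermediate mix downstream mixes consume."""

    def __init__(self):
        self.made = {}      # mix name -> volume produced (uL)
        self.consumed = {}  # mix name -> total volume used downstream (uL)

    def add_mix(self, name: str, volume_ul: float, uses=None):
        # 'uses' maps an upstream mix name to the volume of it this mix needs.
        # Raise if the running total across *all* downstream mixes exceeds
        # the volume actually made, even when each single use fits.
        for upstream, used_ul in (uses or {}).items():
            total = self.consumed.get(upstream, 0.0) + used_ul
            if total > self.made[upstream]:
                raise ValueError(
                    f"{total} uL of {upstream} needed in total, "
                    f"but only {self.made[upstream]} uL is made"
                )
            self.consumed[upstream] = total
        self.made[name] = volume_ul
```

This catches the m2/m3 example above: two 60 uL draws from a 100 uL mix pass individually but fail on the second `add_mix`.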

Improved buffer / filler handling

At the moment, buffer is treated as a special addition to mixes, not as a component.

I think it would make more sense to have a buffer / fill action, which would fill any remaining quantity with the component in that action. An experiment could have a default fill action to add to mixes.

This would allow handling situations where the filling component changes: for example, someone might store their DNA in water without buffer, add buffer at some point during mixing, and then want to preserve that buffer in later stages, even while adding other components that are stored in water without buffer.

splitting mixes

Sometimes one wants to make a single large mix and split it across several test tubes. Add a way to specify this where one simply indicates the desired volumes/concentrations in each individual test tube, as well as the number of tubes (and possibly their names); the software should then automatically figure out the volumes to mix in the large Mix, and print instructions for that and for splitting/aliquoting into the individual test tubes.

It should also support a small excess (say, a default of 5% extra) in case there are a large number of test tubes, so that the final one does not come up significantly short due to pipetting error on the other aliquots.
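The volume calculation itself is straightforward; a sketch with the suggested 5% default excess (the function name and signature are assumptions):

```python
def split_mix_total_volume(per_tube_ul: float, num_tubes: int,
                           excess: float = 0.05) -> float:
    """Total volume to prepare for a mix that will be split into aliquots.

    excess: fractional extra (default 5%) so the last tube is not short
    due to pipetting error on the earlier aliquots.
    """
    return per_tube_ul * num_tubes * (1 + excess)
```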

support for other protocol steps

This isn't really a specific issue, just a place to populate with ideas that can be used to create more specific issues.

Here are some things that would be useful for experiments I've done recently:

  1. Splitting: Sometimes one wants to make a single large mix and split it across several test tubes. Add a way to specify this where one simply indicates the desired volumes/concentrations in each individual test tube, as well as the number of tubes (and possibly their names); the software should then automatically figure out the volumes to mix in the large Mix, and print instructions for that and for splitting/aliquoting into the individual test tubes.
  2. Master mixes: For instance, if 10 sample tubes all contain water, buffer, scaffold, staples, and differ only in a 5th component, then you'd want to make a master mix of the shared components, with enough volume to split into 10 tubes to reduce the amount of pipetting. Even better (but maybe this is overcomplicating it) would be if we have an optional threshold parameter master_mix_combine to Experiment that does the following. If at least two (or some configurable number) Mixes contain identical volumes of at least master_mix_combine components, then the Experiment object can automatically create the master mix. This can be done by adding a method Experiment.instructions that normally would just iterate over the Mixes in order and call instructions on each of them, but in this special case would actually create new Mixes using the master mix approach. Or maybe a cleaner way to do this is to add a method Experiment.use_master_mixes that returns a new Experiment object with master mixes substituted where appropriate. (But would still be nice to have a method Experiment.instructions). This should use the splitting mentioned in the feature idea above.
  3. Tracking samples in gel lanes: This might be more general, but it's nice to specify in a protocol in advance which sample goes into which gel lane, so it's written down. Having a command of some sort for this that creates a Markdown table showing sample names in lanes, e.g.,
    (screenshot omitted)

`Experiment.add_mix` should have option not to raise exception on adding an existing mix

It is common to re-run notebook cells several times. If the cell contains a line like

experiment.add_mix(my_mix)

then because a Jupyter notebook stores all variables globally, the second time the cell is run, it raises a ValueError, reporting that my_mix already exists in the experiment.

Add a parameter check_existing that, if False, skips this check, so that it is not necessary to restart the kernel and re-run previous cells just to run the cell a second time.
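A sketch of the requested behavior (here a mix is modeled as a dict with a "name" key for brevity, and the internal `_mixes` dict is an assumption):

```python
class Experiment:
    def __init__(self):
        self._mixes = {}  # mix name -> mix object

    def add_mix(self, mix, check_existing: bool = True):
        # With check_existing=False, re-running a notebook cell simply
        # replaces the stored mix instead of raising ValueError.
        if check_existing and mix["name"] in self._mixes:
            raise ValueError(f"mix {mix['name']!r} already exists in this experiment")
        self._mixes[mix["name"]] = mix
```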

don't normalize pipetting volumes in mixes

The pint quantities are typically normalized to the "correct" power of 1000, i.e., 10 nM instead of 0.01 uM or 10,000 pM.

However, this is awkward for pipetting since we almost always want uL or mL:

(screenshot omitted)

Make the default behavior (perhaps configurable by an optional parameter somewhere) that pipetting volumes are normalized only when they are at least 1 uL (e.g., still normalize 2000 uL to 2 mL), but 0.75 uL, for instance, would be written as-is rather than as 750 nL.
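The proposed rule can be sketched without pint (a hypothetical helper operating on a volume already expressed in uL):

```python
def format_pipette_volume(volume_ul: float) -> str:
    """Format a pipetting volume, normalizing upward only.

    Volumes of 1000 uL or more become mL; sub-uL volumes stay in uL
    (0.75 uL, not 750 nL), since pipettes are marked in uL and mL.
    """
    if volume_ul >= 1000:
        return f"{volume_ul / 1000:g} mL"
    return f"{volume_ul:g} uL"
```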

fix KeyError on deserializing `Experiment` with `_SplitMix`

Create a mix using the function split_mix() from the mixes module, add that mix using Experiment.add_mix(), then save the experiment to a JSON file. Loading the JSON file using Experiment.load() results in this error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Input In [4], in <cell line: 6>()
      3 reference = load_reference('../alhambra_reference_30-monomers.csv')
      4 # reference.df
----> 6 old_experiment = Experiment.load('../sds14-30-monomers-qPCR/sds14.json')
      8 experiment = Experiment()
      9 monomer_np_mix_1uM = old_experiment['monomers np 1uM']

File /mnt/c/Dropbox/git/mixes/src/alhambra_mixes/experiments.py:243, in Experiment.load(cls, filename_or_stream)
    240     s = filename_or_stream
    241     close = False
--> 243 exp = cls._structure(json.load(s))
    244 if close:
    245     s.close()

File /mnt/c/Dropbox/git/mixes/src/alhambra_mixes/experiments.py:225, in Experiment._structure(cls, d)
    223 del d["class"]
    224 for k, v in d["components"].items():
--> 225     d["components"][k] = _structure(v)
    226 return cls(**d)

File /mnt/c/Dropbox/git/mixes/src/alhambra_mixes/dictstructure.py:24, in _structure(x, experiment)
     22 def _structure(x: dict[str, Any], experiment: "Experiment" | None = None) -> Any:
     23     if isinstance(x, dict) and ("class" in x):
---> 24         c = _STRUCTURE_CLASSES[x["class"]]
     25         del x["class"]
     26         if hasattr(c, "_structure"):

KeyError: '_SplitMix'

When I get more time I can come back and write a minimal reproducible example, but I thought maybe Constantine would see how to fix this right away.

It would also be good to write a unit test that captures this bug.

Track volumes across experiments

From #4:

A trickier problem is tracking volumes across different experiments for re-used mixes or strand stocks. This is trickier because, unlike the idea for the Experiment object above, this would require persisting the volume remaining in strand stocks or re-used mixes to disk in a database. The reason this is tricky is that one often re-runs the same code repeatedly to debug, and we would not want that to affect the stored volumes in the database.

One idea would be to use a "functional programming style": maintain the database as an ordered sequence of files, each associated with a single experiment. One could introduce a new object called Project, and just as multiple Mixes can be associated with an Experiment, multiple Experiments can be associated with a Project. The Project would be responsible for maintaining an order on the experiments; when running code for one experiment, it reads the database file of the previous experiment, say exp7, and writes a new one for exp8. If the code is re-run, this only overwrites the file for exp8.
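A sketch of that file-per-experiment scheme (the `Project` API and file layout are assumptions; volumes are plain uL deltas, positive for stocks made and negative for stocks used):

```python
import json
from pathlib import Path

class Project:
    """Orders experiments; each writes its own state file, functional-style.

    Running experiment N reads the volumes left after experiment N-1 and
    writes exp{N}.json, so re-running experiment N never double-counts
    its own usage.
    """

    def __init__(self, directory: str):
        self.dir = Path(directory)
        self.dir.mkdir(exist_ok=True)

    def run(self, index: int, deltas: dict) -> dict:
        prev = self.dir / f"exp{index - 1}.json"
        volumes = json.loads(prev.read_text()) if prev.exists() else {}
        for stock, delta_ul in deltas.items():
            volumes[stock] = volumes.get(stock, 0.0) + delta_ul
        (self.dir / f"exp{index}.json").write_text(json.dumps(volumes))
        return volumes
```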

Tracking samples in gel lanes

It's nice to specify in a protocol in advance which sample goes into which gel lane, so it's written down. Having a command of some sort for this that creates a Markdown table showing sample names in lanes, e.g.,
(screenshot omitted)

Write a function that takes as input a list of sample names, along with an optional num_lanes parameter (if missing then the length of the list of sample names is assumed to be the number of lanes).
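A sketch of such a function, emitting a one-row Markdown table of lane numbers over sample names (the name and exact layout are assumptions):

```python
def gel_lane_table(sample_names, num_lanes=None) -> str:
    """Return a Markdown table mapping gel lane numbers to sample names.

    If num_lanes is omitted, it defaults to len(sample_names); extra
    lanes are left empty.
    """
    if num_lanes is None:
        num_lanes = len(sample_names)
    names = list(sample_names) + [""] * (num_lanes - len(sample_names))
    header = "| " + " | ".join(str(i + 1) for i in range(num_lanes)) + " |"
    rule = "|" + "---|" * num_lanes
    row = "| " + " | ".join(names[:num_lanes]) + " |"
    return "\n".join([header, rule, row])
```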

support "enzyme units"

Some enzymes such as T4 DNA ligase use strange units of concentration such as "Weiss units" (https://www.thermofisher.com/document-connect/document-connect.html?url=https://assets.thermofisher.com/TFS-Assets%2FLSG%2Fmanuals%2FMAN0011987_T4_DNA_Ligase_5_Weiss_1000_Weiss_U_UG.pdf).

pint seems to support these units for concentration:

from pint import Quantity

q = Quantity('2 U/uL')
print(q)

prints 2.0 enzyme_unit / microliter

But alhambra_mixes doesn't:

mix_vol = 10
t4_ligase_2U = Component('T4 DNA ligase 2U/uL', concentration='2 U/uL')
atp_7p5mM_ligase_mix = Mix(
    actions=[
        FixedConcentration(components=[atp_stock], fixed_concentration="7.5 mM"),
        FixedConcentration(components=[t4_ligase_2U], fixed_concentration="1 U/uL"),
    ],
    name='lig; atp 7.5mM',
    test_tube_name='lig; atp 7.5mM',
    fixed_total_volume=f"{mix_vol} uL",
    min_volume="1 uL",
)

which generates this error:

ValueError: 2 U/uL is not a valid quantity here (should be molarity).

export only ASCII in `Experiment` json

On Windows there is a common error of this form: UndefinedUnitError: 'µM' is not defined in the unit registry. It appears to stem from the use of Unicode such as µ.

There may be some fix, I'm not sure, but an easy way to ensure people don't get these errors is to output only ASCII, e.g., uM instead of µM whenever JSON is exported.
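A sketch of the export side (the recursive replacement is an assumption about where the µ characters live; `ensure_ascii=True` is already json.dumps's default and escapes any remaining non-ASCII):

```python
import json

def export_ascii(obj) -> str:
    """Serialize to JSON with unit strings rewritten to plain ASCII.

    Replaces both the micro sign (U+00B5) and Greek mu (U+03BC) with 'u'
    so that e.g. '10 µM' round-trips as '10 uM' on any platform.
    """
    def asciify(x):
        if isinstance(x, str):
            return x.replace("\u00b5", "u").replace("\u03bc", "u")
        if isinstance(x, dict):
            return {asciify(k): asciify(v) for k, v in x.items()}
        if isinstance(x, list):
            return [asciify(v) for v in x]
        return x
    return json.dumps(asciify(obj), ensure_ascii=True)
```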

master mixes

For instance, if 10 sample tubes all contain water, buffer, scaffold, and staples, and differ only in a 5th component, then you'd want to make a master mix of the shared components, with enough volume to split into 10 tubes, to reduce the amount of pipetting.

Even better (but maybe this is overcomplicating it) would be an optional threshold parameter master_mix_combine to Experiment that does the following: if at least two (or some configurable number of) Mixes contain identical volumes of at least master_mix_combine components, then the Experiment object can automatically create the master mix. This can be done by adding a method Experiment.instructions that normally would just iterate over the Mixes in order and call instructions on each of them, but in this special case would actually create new Mixes using the master mix approach. Or maybe a cleaner way is to add a method Experiment.use_master_mixes that returns a new Experiment object with master mixes substituted where appropriate (though it would still be nice to have a method Experiment.instructions).

This should use the split_mix function implemented for issue #22.
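The detection step could be sketched like this (the flat name-to-volume mix representation is a simplification; the real Mix objects carry actions rather than volume dicts):

```python
def master_mix_candidates(mixes: dict, min_mixes: int = 2) -> dict:
    """Find (component, volume) pairs shared verbatim by several mixes.

    mixes: mix name -> {component name: volume in uL}.
    Returns {(component, volume): [mix names]} for every pair that
    appears identically in at least min_mixes mixes; those pairs are
    the candidates for pulling out into a master mix.
    """
    by_pair = {}
    for mix_name, components in mixes.items():
        for comp, vol in components.items():
            by_pair.setdefault((comp, vol), []).append(mix_name)
    return {pair: names for pair, names in by_pair.items()
            if len(names) >= min_mixes}
```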

generalized `LabStep`

Create a notion of a generalized "lab step": an object with an instructions method that can be called to get Markdown describing the lab step (much like Mix.instructions currently), or a string (in which case the string itself is the Markdown, used for "manually" described lab steps).

An Experiment can have a list of LabSteps, and it can also have an instructions method that simply iterates over each LabStep, calling instructions on each (also checking whether it's just a string).

This could be done with an explicit abstract superclass that has an abstract method instructions.

A simpler way (given Python's type system) that would still allow mypy to type-check is to use the typing.Protocol class to say that LabStep is a type alias for str | LabStepObject, where LabStepObject defines the type signature of the instructions method (e.g., taking a tablefmt parameter for any tables that appear in the instructions), e.g.

from typing import Any, Protocol, TypeAlias
from tabulate import TableFormat

class LabStepObject(Protocol):
    def instructions(self, tablefmt: str | TableFormat, **kwargs: Any) -> str:
        ...

LabStep: TypeAlias = str | LabStepObject

It's not obvious what the type signature of instructions should be. Currently for Mix it's a mish-mash of stuff that happens to be relevant for making mixes, and much of it would be irrelevant for, for example, specifying which samples go to which gel lanes (see #21, idea 3). So we want to brainstorm how to redesign the type signature to simplify implementing new lab steps. Perhaps some of those parameters can become fields in the object instead, so that they are specific to that type of lab step. (For example, all the stuff related to plates could be fields in Mix since a Mix might deal with plates, but it wouldn't make sense for anything about plates to be specified for a LabStep that tracks samples in gel lanes.)

One way to do this is make a Preferences class, which contains all the "preferences" for every situation we encounter (whether to combine plate actions, use well markers, tablefmt, etc.), and most LabStep objects ignore most of the fields in Preferences, and use only the fields relevant to their situation. For this to be usable, all of the fields should be optional and have a default value, which in many cases will simply be ignored.
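A sketch combining the Protocol idea with the Preferences bag (the field names are examples drawn from the discussion above, not a settled design):

```python
from dataclasses import dataclass
from typing import Optional, Protocol, Union

@dataclass
class Preferences:
    # One bag of optional display settings; each LabStep reads only
    # the fields relevant to it and ignores the rest.
    tablefmt: str = "pipe"
    combine_plate_actions: bool = True
    use_well_markers: bool = True

class LabStepObject(Protocol):
    def instructions(self, prefs: Preferences) -> str: ...

LabStep = Union[str, LabStepObject]

def all_instructions(steps, prefs: Optional[Preferences] = None) -> str:
    # A plain string is already Markdown; objects are asked for theirs.
    prefs = prefs or Preferences()
    parts = [s if isinstance(s, str) else s.instructions(prefs) for s in steps]
    return "\n\n".join(parts)
```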

Location tracking should be more general

At the moment, location tracking is based on plates and wells, and it is a bit annoying to change this, or even use 384 well plates instead of 96 (though the latter is possible).

It would be better to have a more general tracking of location, which might include, for example, different types of boxes of tubes, locations of plates/boxes/tubes in fridges/etc, and so on.
